Biothermodynamics

Author: Michael L. Johnson | Jo M. Holt | Gary K. Ackers

METHODS IN ENZYMOLOGY

Editors-in-Chief

JOHN N. ABELSON AND MELVIN I. SIMON
Division of Biology, California Institute of Technology, Pasadena, California, USA

Founding Editors

SIDNEY P. COLOWICK AND NATHAN O. KAPLAN

Academic Press is an imprint of Elsevier
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
32 Jamestown Road, London NW1 7BY, UK

First edition 2009

Copyright © 2009 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

For information on all Academic Press publications visit our website at elsevierdirect.com

ISBN: 978-0-12-374596-5
ISSN: 0076-6879

Printed and bound in United States of America
09 10 11 12 10 9 8 7 6 5 4 3 2 1

CONTRIBUTORS

Gary K. Ackers
Emeritus, Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri, USA

Tural Aksel
T. C. Jenkins Department of Biophysics, The Johns Hopkins University, Baltimore, Maryland, USA

David L. Bain
Department of Pharmaceutical Sciences, University of Colorado Denver, Denver, Colorado, USA

Elisar Barbar
Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, USA

Doug Barrick
T. C. Jenkins Department of Biophysics, The Johns Hopkins University, Baltimore, Maryland, USA

Gregory Benison
Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, USA

Philip C. Bevilacqua
Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania, USA

James U. Bowie
Department of Chemistry and Biochemistry, UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute, University of California, Los Angeles, California, USA

A. Clay Clark
Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina, USA

Keith D. Connaghan-Jones
Department of Pharmaceutical Sciences, University of Colorado Denver, Denver, Colorado, USA

John J. Correia
Department of Biochemistry, University of Mississippi Medical Center, Jackson, Mississippi, USA

Enrique M. De La Cruz
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA

David E. Draper
Department of Chemistry and Biophysics, Johns Hopkins University, Baltimore, Maryland, USA

Ernesto Freire
Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA

Dan Grilley
Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, Evanston, Illinois, USA

Michael T. Henzl
Department of Biochemistry, University of Missouri, Columbia, Missouri, USA

Vincent J. Hilser
Department of Biochemistry and Molecular Biophysics and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA

Jo M. Holt
Emeritus, Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri, USA

Heedeok Hong
Department of Chemistry and Biochemistry, UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute, University of California, Los Angeles, California, USA

Juyang Huang
Department of Physics, Texas Tech University, Lubbock, Texas, USA

Nathan H. Joh
Department of Chemistry and Biochemistry, UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute, University of California, Los Angeles, California, USA

Sarah Katen
Department of Biology, Indiana University, Bloomington, Indiana, USA

Ana Maria Soto
Department of Chemistry, Towson University, Towson, Maryland, USA

Sara L. Milam
Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina, USA

E. Michael Ostap
Department of Physiology, Pennsylvania Muscle Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA

Arne Schön
Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA

Nathan A. Siegfried
Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania, USA

Walter F. Stafford
Boston Biomedical Research Institute, Watertown, Massachusetts, USA

Lukas K. Tamm
Center for Membrane Biology and Department of Molecular Physiology and Biological Physics, University of Virginia Health System, Charlottesville, Virginia, USA

Adrian Velazquez-Campoy
Institute of Biocomputation and Physics of Complex Systems (BIFI), and Fundación Aragón I+D (ARAID-BIFI), Universidad de Zaragoza, Zaragoza, Spain

Jason Vertrees
Department of Biochemistry and Molecular Biophysics and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA

Jad Walters
Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina, USA

James O. Wrabl
Department of Biochemistry and Molecular Biophysics and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA

Adam Zlotnick
Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, and Department of Biology, Indiana University, Bloomington, Indiana, USA

PREFACE

Branches of the United States government have twice acknowledged Josiah Willard Gibbs for his contributions to thermodynamics, and thus indirectly acknowledged the importance of thermodynamics. The first acknowledgement came from the US Navy, with the USNS Josiah Willard Gibbs, which was a ship of the line between 1958 and 1971. The second came from the US Postal Service, which included him as one of four great American scientists on a series of postage stamps issued in 2005. “The greatest thermodynamicist of them all” (John Fenn, 2002 Nobel Prize in Chemistry).

Unfortunately, a large fraction of scientists have the impression that thermodynamic approaches are archaic and, at best, ancillary to the central issues of biochemistry. One reason for this misconception is that thermodynamics is commonly either poorly taught or not taught at all in departments of chemistry, biochemistry, etc. Steam engines come to mind when I think of my first thermodynamics course. Another reason for this narrow and insular perception is that thermodynamics is frequently equated with a single experimental technique (i.e., calorimetry). Sadly, thermodynamics has seldom been fused with developments in molecular biology, structural analysis, or computational chemistry.

However, all of these perceptions are far from accurate. The importance of thermodynamics is its use as a “logic tool.” One of many quintessential examples of such a use of thermodynamics is Wyman’s theory of linked functions.

This volume is one of a continuing series that fosters and develops this vision of how thermodynamics can be an important tool for the study of biological systems.

MICHAEL L. JOHNSON
JO M. HOLT
GARY K. ACKERS

METHODS IN ENZYMOLOGY

VOLUME I. Preparation and Assay of Enzymes
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME II. Preparation and Assay of Enzymes
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME III. Preparation and Assay of Substrates
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME IV. Special Techniques for the Enzymologist
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME V. Preparation and Assay of Enzymes
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME VI. Preparation and Assay of Enzymes (Continued); Preparation and Assay of Substrates; Special Techniques
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME VII. Cumulative Subject Index
Edited by SIDNEY P. COLOWICK AND NATHAN O. KAPLAN
VOLUME VIII. Complex Carbohydrates
Edited by ELIZABETH F. NEUFELD AND VICTOR GINSBURG
VOLUME IX. Carbohydrate Metabolism
Edited by WILLIS A. WOOD
VOLUME X. Oxidation and Phosphorylation
Edited by RONALD W. ESTABROOK AND MAYNARD E. PULLMAN
VOLUME XI. Enzyme Structure
Edited by C. H. W. HIRS
VOLUME XII. Nucleic Acids (Parts A and B)
Edited by LAWRENCE GROSSMAN AND KIVIE MOLDAVE
VOLUME XIII. Citric Acid Cycle
Edited by J. M. LOWENSTEIN
VOLUME XIV. Lipids
Edited by J. M. LOWENSTEIN
VOLUME XV. Steroids and Terpenoids
Edited by RAYMOND B. CLAYTON

VOLUME XVI. Fast Reactions
Edited by KENNETH KUSTIN
VOLUME XVII. Metabolism of Amino Acids and Amines (Parts A and B)
Edited by HERBERT TABOR AND CELIA WHITE TABOR
VOLUME XVIII. Vitamins and Coenzymes (Parts A, B, and C)
Edited by DONALD B. MCCORMICK AND LEMUEL D. WRIGHT
VOLUME XIX. Proteolytic Enzymes
Edited by GERTRUDE E. PERLMANN AND LASZLO LORAND
VOLUME XX. Nucleic Acids and Protein Synthesis (Part C)
Edited by KIVIE MOLDAVE AND LAWRENCE GROSSMAN
VOLUME XXI. Nucleic Acids (Part D)
Edited by LAWRENCE GROSSMAN AND KIVIE MOLDAVE
VOLUME XXII. Enzyme Purification and Related Techniques
Edited by WILLIAM B. JAKOBY
VOLUME XXIII. Photosynthesis (Part A)
Edited by ANTHONY SAN PIETRO
VOLUME XXIV. Photosynthesis and Nitrogen Fixation (Part B)
Edited by ANTHONY SAN PIETRO
VOLUME XXV. Enzyme Structure (Part B)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME XXVI. Enzyme Structure (Part C)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME XXVII. Enzyme Structure (Part D)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME XXVIII. Complex Carbohydrates (Part B)
Edited by VICTOR GINSBURG
VOLUME XXIX. Nucleic Acids and Protein Synthesis (Part E)
Edited by LAWRENCE GROSSMAN AND KIVIE MOLDAVE
VOLUME XXX. Nucleic Acids and Protein Synthesis (Part F)
Edited by KIVIE MOLDAVE AND LAWRENCE GROSSMAN
VOLUME XXXI. Biomembranes (Part A)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME XXXII. Biomembranes (Part B)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME XXXIII. Cumulative Subject Index Volumes I-XXX
Edited by MARTHA G. DENNIS AND EDWARD A. DENNIS
VOLUME XXXIV. Affinity Techniques (Enzyme Purification: Part B)
Edited by WILLIAM B. JAKOBY AND MEIR WILCHEK

VOLUME XXXV. Lipids (Part B)
Edited by JOHN M. LOWENSTEIN
VOLUME XXXVI. Hormone Action (Part A: Steroid Hormones)
Edited by BERT W. O’MALLEY AND JOEL G. HARDMAN
VOLUME XXXVII. Hormone Action (Part B: Peptide Hormones)
Edited by BERT W. O’MALLEY AND JOEL G. HARDMAN
VOLUME XXXVIII. Hormone Action (Part C: Cyclic Nucleotides)
Edited by JOEL G. HARDMAN AND BERT W. O’MALLEY
VOLUME XXXIX. Hormone Action (Part D: Isolated Cells, Tissues, and Organ Systems)
Edited by JOEL G. HARDMAN AND BERT W. O’MALLEY
VOLUME XL. Hormone Action (Part E: Nuclear Structure and Function)
Edited by BERT W. O’MALLEY AND JOEL G. HARDMAN
VOLUME XLI. Carbohydrate Metabolism (Part B)
Edited by W. A. WOOD
VOLUME XLII. Carbohydrate Metabolism (Part C)
Edited by W. A. WOOD
VOLUME XLIII. Antibiotics
Edited by JOHN H. HASH
VOLUME XLIV. Immobilized Enzymes
Edited by KLAUS MOSBACH
VOLUME XLV. Proteolytic Enzymes (Part B)
Edited by LASZLO LORAND
VOLUME XLVI. Affinity Labeling
Edited by WILLIAM B. JAKOBY AND MEIR WILCHEK
VOLUME XLVII. Enzyme Structure (Part E)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME XLVIII. Enzyme Structure (Part F)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME XLIX. Enzyme Structure (Part G)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME L. Complex Carbohydrates (Part C)
Edited by VICTOR GINSBURG
VOLUME LI. Purine and Pyrimidine Nucleotide Metabolism
Edited by PATRICIA A. HOFFEE AND MARY ELLEN JONES
VOLUME LII. Biomembranes (Part C: Biological Oxidations)
Edited by SIDNEY FLEISCHER AND LESTER PACKER

VOLUME LIII. Biomembranes (Part D: Biological Oxidations)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME LIV. Biomembranes (Part E: Biological Oxidations)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME LV. Biomembranes (Part F: Bioenergetics)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME LVI. Biomembranes (Part G: Bioenergetics)
Edited by SIDNEY FLEISCHER AND LESTER PACKER
VOLUME LVII. Bioluminescence and Chemiluminescence
Edited by MARLENE A. DELUCA
VOLUME LVIII. Cell Culture
Edited by WILLIAM B. JAKOBY AND IRA PASTAN
VOLUME LIX. Nucleic Acids and Protein Synthesis (Part G)
Edited by KIVIE MOLDAVE AND LAWRENCE GROSSMAN
VOLUME LX. Nucleic Acids and Protein Synthesis (Part H)
Edited by KIVIE MOLDAVE AND LAWRENCE GROSSMAN
VOLUME 61. Enzyme Structure (Part H)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME 62. Vitamins and Coenzymes (Part D)
Edited by DONALD B. MCCORMICK AND LEMUEL D. WRIGHT
VOLUME 63. Enzyme Kinetics and Mechanism (Part A: Initial Rate and Inhibitor Methods)
Edited by DANIEL L. PURICH
VOLUME 64. Enzyme Kinetics and Mechanism (Part B: Isotopic Probes and Complex Enzyme Systems)
Edited by DANIEL L. PURICH
VOLUME 65. Nucleic Acids (Part I)
Edited by LAWRENCE GROSSMAN AND KIVIE MOLDAVE
VOLUME 66. Vitamins and Coenzymes (Part E)
Edited by DONALD B. MCCORMICK AND LEMUEL D. WRIGHT
VOLUME 67. Vitamins and Coenzymes (Part F)
Edited by DONALD B. MCCORMICK AND LEMUEL D. WRIGHT
VOLUME 68. Recombinant DNA
Edited by RAY WU
VOLUME 69. Photosynthesis and Nitrogen Fixation (Part C)
Edited by ANTHONY SAN PIETRO
VOLUME 70. Immunochemical Techniques (Part A)
Edited by HELEN VAN VUNAKIS AND JOHN J. LANGONE

VOLUME 71. Lipids (Part C)
Edited by JOHN M. LOWENSTEIN
VOLUME 72. Lipids (Part D)
Edited by JOHN M. LOWENSTEIN
VOLUME 73. Immunochemical Techniques (Part B)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 74. Immunochemical Techniques (Part C)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 75. Cumulative Subject Index Volumes XXXI, XXXII, XXXIV–LX
Edited by EDWARD A. DENNIS AND MARTHA G. DENNIS
VOLUME 76. Hemoglobins
Edited by ERALDO ANTONINI, LUIGI ROSSI-BERNARDI, AND EMILIA CHIANCONE
VOLUME 77. Detoxication and Drug Metabolism
Edited by WILLIAM B. JAKOBY
VOLUME 78. Interferons (Part A)
Edited by SIDNEY PESTKA
VOLUME 79. Interferons (Part B)
Edited by SIDNEY PESTKA
VOLUME 80. Proteolytic Enzymes (Part C)
Edited by LASZLO LORAND
VOLUME 81. Biomembranes (Part H: Visual Pigments and Purple Membranes, I)
Edited by LESTER PACKER
VOLUME 82. Structural and Contractile Proteins (Part A: Extracellular Matrix)
Edited by LEON W. CUNNINGHAM AND DIXIE W. FREDERIKSEN
VOLUME 83. Complex Carbohydrates (Part D)
Edited by VICTOR GINSBURG
VOLUME 84. Immunochemical Techniques (Part D: Selected Immunoassays)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 85. Structural and Contractile Proteins (Part B: The Contractile Apparatus and the Cytoskeleton)
Edited by DIXIE W. FREDERIKSEN AND LEON W. CUNNINGHAM
VOLUME 86. Prostaglandins and Arachidonate Metabolites
Edited by WILLIAM E. M. LANDS AND WILLIAM L. SMITH
VOLUME 87. Enzyme Kinetics and Mechanism (Part C: Intermediates, Stereochemistry, and Rate Studies)
Edited by DANIEL L. PURICH
VOLUME 88. Biomembranes (Part I: Visual Pigments and Purple Membranes, II)
Edited by LESTER PACKER

VOLUME 89. Carbohydrate Metabolism (Part D)
Edited by WILLIS A. WOOD
VOLUME 90. Carbohydrate Metabolism (Part E)
Edited by WILLIS A. WOOD
VOLUME 91. Enzyme Structure (Part I)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME 92. Immunochemical Techniques (Part E: Monoclonal Antibodies and General Immunoassay Methods)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 93. Immunochemical Techniques (Part F: Conventional Antibodies, Fc Receptors, and Cytotoxicity)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 94. Polyamines
Edited by HERBERT TABOR AND CELIA WHITE TABOR
VOLUME 95. Cumulative Subject Index Volumes 61–74, 76–80
Edited by EDWARD A. DENNIS AND MARTHA G. DENNIS
VOLUME 96. Biomembranes [Part J: Membrane Biogenesis: Assembly and Targeting (General Methods; Eukaryotes)]
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 97. Biomembranes [Part K: Membrane Biogenesis: Assembly and Targeting (Prokaryotes, Mitochondria, and Chloroplasts)]
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 98. Biomembranes (Part L: Membrane Biogenesis: Processing and Recycling)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 99. Hormone Action (Part F: Protein Kinases)
Edited by JACKIE D. CORBIN AND JOEL G. HARDMAN
VOLUME 100. Recombinant DNA (Part B)
Edited by RAY WU, LAWRENCE GROSSMAN, AND KIVIE MOLDAVE
VOLUME 101. Recombinant DNA (Part C)
Edited by RAY WU, LAWRENCE GROSSMAN, AND KIVIE MOLDAVE
VOLUME 102. Hormone Action (Part G: Calmodulin and Calcium-Binding Proteins)
Edited by ANTHONY R. MEANS AND BERT W. O’MALLEY
VOLUME 103. Hormone Action (Part H: Neuroendocrine Peptides)
Edited by P. MICHAEL CONN
VOLUME 104. Enzyme Purification and Related Techniques (Part C)
Edited by WILLIAM B. JAKOBY

VOLUME 105. Oxygen Radicals in Biological Systems
Edited by LESTER PACKER
VOLUME 106. Posttranslational Modifications (Part A)
Edited by FINN WOLD AND KIVIE MOLDAVE
VOLUME 107. Posttranslational Modifications (Part B)
Edited by FINN WOLD AND KIVIE MOLDAVE
VOLUME 108. Immunochemical Techniques (Part G: Separation and Characterization of Lymphoid Cells)
Edited by GIOVANNI DI SABATO, JOHN J. LANGONE, AND HELEN VAN VUNAKIS
VOLUME 109. Hormone Action (Part I: Peptide Hormones)
Edited by LUTZ BIRNBAUMER AND BERT W. O’MALLEY
VOLUME 110. Steroids and Isoprenoids (Part A)
Edited by JOHN H. LAW AND HANS C. RILLING
VOLUME 111. Steroids and Isoprenoids (Part B)
Edited by JOHN H. LAW AND HANS C. RILLING
VOLUME 112. Drug and Enzyme Targeting (Part A)
Edited by KENNETH J. WIDDER AND RALPH GREEN
VOLUME 113. Glutamate, Glutamine, Glutathione, and Related Compounds
Edited by ALTON MEISTER
VOLUME 114. Diffraction Methods for Biological Macromolecules (Part A)
Edited by HAROLD W. WYCKOFF, C. H. W. HIRS, AND SERGE N. TIMASHEFF
VOLUME 115. Diffraction Methods for Biological Macromolecules (Part B)
Edited by HAROLD W. WYCKOFF, C. H. W. HIRS, AND SERGE N. TIMASHEFF
VOLUME 116. Immunochemical Techniques (Part H: Effectors and Mediators of Lymphoid Cell Functions)
Edited by GIOVANNI DI SABATO, JOHN J. LANGONE, AND HELEN VAN VUNAKIS
VOLUME 117. Enzyme Structure (Part J)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME 118. Plant Molecular Biology
Edited by ARTHUR WEISSBACH AND HERBERT WEISSBACH
VOLUME 119. Interferons (Part C)
Edited by SIDNEY PESTKA
VOLUME 120. Cumulative Subject Index Volumes 81–94, 96–101
VOLUME 121. Immunochemical Techniques (Part I: Hybridoma Technology and Monoclonal Antibodies)
Edited by JOHN J. LANGONE AND HELEN VAN VUNAKIS
VOLUME 122. Vitamins and Coenzymes (Part G)
Edited by FRANK CHYTIL AND DONALD B. MCCORMICK

VOLUME 123. Vitamins and Coenzymes (Part H)
Edited by FRANK CHYTIL AND DONALD B. MCCORMICK
VOLUME 124. Hormone Action (Part J: Neuroendocrine Peptides)
Edited by P. MICHAEL CONN
VOLUME 125. Biomembranes (Part M: Transport in Bacteria, Mitochondria, and Chloroplasts: General Approaches and Transport Systems)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 126. Biomembranes (Part N: Transport in Bacteria, Mitochondria, and Chloroplasts: Protonmotive Force)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 127. Biomembranes (Part O: Protons and Water: Structure and Translocation)
Edited by LESTER PACKER
VOLUME 128. Plasma Lipoproteins (Part A: Preparation, Structure, and Molecular Biology)
Edited by JERE P. SEGREST AND JOHN J. ALBERS
VOLUME 129. Plasma Lipoproteins (Part B: Characterization, Cell Biology, and Metabolism)
Edited by JOHN J. ALBERS AND JERE P. SEGREST
VOLUME 130. Enzyme Structure (Part K)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME 131. Enzyme Structure (Part L)
Edited by C. H. W. HIRS AND SERGE N. TIMASHEFF
VOLUME 132. Immunochemical Techniques (Part J: Phagocytosis and Cell-Mediated Cytotoxicity)
Edited by GIOVANNI DI SABATO AND JOHANNES EVERSE
VOLUME 133. Bioluminescence and Chemiluminescence (Part B)
Edited by MARLENE DELUCA AND WILLIAM D. MCELROY
VOLUME 134. Structural and Contractile Proteins (Part C: The Contractile Apparatus and the Cytoskeleton)
Edited by RICHARD B. VALLEE
VOLUME 135. Immobilized Enzymes and Cells (Part B)
Edited by KLAUS MOSBACH
VOLUME 136. Immobilized Enzymes and Cells (Part C)
Edited by KLAUS MOSBACH
VOLUME 137. Immobilized Enzymes and Cells (Part D)
Edited by KLAUS MOSBACH
VOLUME 138. Complex Carbohydrates (Part E)
Edited by VICTOR GINSBURG

VOLUME 139. Cellular Regulators (Part A: Calcium- and Calmodulin-Binding Proteins)
Edited by ANTHONY R. MEANS AND P. MICHAEL CONN
VOLUME 140. Cumulative Subject Index Volumes 102–119, 121–134
VOLUME 141. Cellular Regulators (Part B: Calcium and Lipids)
Edited by P. MICHAEL CONN AND ANTHONY R. MEANS
VOLUME 142. Metabolism of Aromatic Amino Acids and Amines
Edited by SEYMOUR KAUFMAN
VOLUME 143. Sulfur and Sulfur Amino Acids
Edited by WILLIAM B. JAKOBY AND OWEN GRIFFITH
VOLUME 144. Structural and Contractile Proteins (Part D: Extracellular Matrix)
Edited by LEON W. CUNNINGHAM
VOLUME 145. Structural and Contractile Proteins (Part E: Extracellular Matrix)
Edited by LEON W. CUNNINGHAM
VOLUME 146. Peptide Growth Factors (Part A)
Edited by DAVID BARNES AND DAVID A. SIRBASKU
VOLUME 147. Peptide Growth Factors (Part B)
Edited by DAVID BARNES AND DAVID A. SIRBASKU
VOLUME 148. Plant Cell Membranes
Edited by LESTER PACKER AND ROLAND DOUCE
VOLUME 149. Drug and Enzyme Targeting (Part B)
Edited by RALPH GREEN AND KENNETH J. WIDDER
VOLUME 150. Immunochemical Techniques (Part K: In Vitro Models of B and T Cell Functions and Lymphoid Cell Receptors)
Edited by GIOVANNI DI SABATO
VOLUME 151. Molecular Genetics of Mammalian Cells
Edited by MICHAEL M. GOTTESMAN
VOLUME 152. Guide to Molecular Cloning Techniques
Edited by SHELBY L. BERGER AND ALAN R. KIMMEL
VOLUME 153. Recombinant DNA (Part D)
Edited by RAY WU AND LAWRENCE GROSSMAN
VOLUME 154. Recombinant DNA (Part E)
Edited by RAY WU AND LAWRENCE GROSSMAN
VOLUME 155. Recombinant DNA (Part F)
Edited by RAY WU
VOLUME 156. Biomembranes (Part P: ATP-Driven Pumps and Related Transport: The Na, K-Pump)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER

VOLUME 157. Biomembranes (Part Q: ATP-Driven Pumps and Related Transport: Calcium, Proton, and Potassium Pumps)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 158. Metalloproteins (Part A)
Edited by JAMES F. RIORDAN AND BERT L. VALLEE
VOLUME 159. Initiation and Termination of Cyclic Nucleotide Action
Edited by JACKIE D. CORBIN AND ROGER A. JOHNSON
VOLUME 160. Biomass (Part A: Cellulose and Hemicellulose)
Edited by WILLIS A. WOOD AND SCOTT T. KELLOGG
VOLUME 161. Biomass (Part B: Lignin, Pectin, and Chitin)
Edited by WILLIS A. WOOD AND SCOTT T. KELLOGG
VOLUME 162. Immunochemical Techniques (Part L: Chemotaxis and Inflammation)
Edited by GIOVANNI DI SABATO
VOLUME 163. Immunochemical Techniques (Part M: Chemotaxis and Inflammation)
Edited by GIOVANNI DI SABATO
VOLUME 164. Ribosomes
Edited by HARRY F. NOLLER, JR., AND KIVIE MOLDAVE
VOLUME 165. Microbial Toxins: Tools for Enzymology
Edited by SIDNEY HARSHMAN
VOLUME 166. Branched-Chain Amino Acids
Edited by ROBERT HARRIS AND JOHN R. SOKATCH
VOLUME 167. Cyanobacteria
Edited by LESTER PACKER AND ALEXANDER N. GLAZER
VOLUME 168. Hormone Action (Part K: Neuroendocrine Peptides)
Edited by P. MICHAEL CONN
VOLUME 169. Platelets: Receptors, Adhesion, Secretion (Part A)
Edited by JACEK HAWIGER
VOLUME 170. Nucleosomes
Edited by PAUL M. WASSARMAN AND ROGER D. KORNBERG
VOLUME 171. Biomembranes (Part R: Transport Theory: Cells and Model Membranes)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 172. Biomembranes (Part S: Transport: Membrane Isolation and Characterization)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER

VOLUME 173. Biomembranes [Part T: Cellular and Subcellular Transport: Eukaryotic (Nonepithelial) Cells]
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 174. Biomembranes [Part U: Cellular and Subcellular Transport: Eukaryotic (Nonepithelial) Cells]
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 175. Cumulative Subject Index Volumes 135–139, 141–167
VOLUME 176. Nuclear Magnetic Resonance (Part A: Spectral Techniques and Dynamics)
Edited by NORMAN J. OPPENHEIMER AND THOMAS L. JAMES
VOLUME 177. Nuclear Magnetic Resonance (Part B: Structure and Mechanism)
Edited by NORMAN J. OPPENHEIMER AND THOMAS L. JAMES
VOLUME 178. Antibodies, Antigens, and Molecular Mimicry
Edited by JOHN J. LANGONE
VOLUME 179. Complex Carbohydrates (Part F)
Edited by VICTOR GINSBURG
VOLUME 180. RNA Processing (Part A: General Methods)
Edited by JAMES E. DAHLBERG AND JOHN N. ABELSON
VOLUME 181. RNA Processing (Part B: Specific Methods)
Edited by JAMES E. DAHLBERG AND JOHN N. ABELSON
VOLUME 182. Guide to Protein Purification
Edited by MURRAY P. DEUTSCHER
VOLUME 183. Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences
Edited by RUSSELL F. DOOLITTLE
VOLUME 184. Avidin-Biotin Technology
Edited by MEIR WILCHEK AND EDWARD A. BAYER
VOLUME 185. Gene Expression Technology
Edited by DAVID V. GOEDDEL
VOLUME 186. Oxygen Radicals in Biological Systems (Part B: Oxygen Radicals and Antioxidants)
Edited by LESTER PACKER AND ALEXANDER N. GLAZER
VOLUME 187. Arachidonate Related Lipid Mediators
Edited by ROBERT C. MURPHY AND FRANK A. FITZPATRICK
VOLUME 188. Hydrocarbons and Methylotrophy
Edited by MARY E. LIDSTROM
VOLUME 189. Retinoids (Part A: Molecular and Metabolic Aspects)
Edited by LESTER PACKER

VOLUME 190. Retinoids (Part B: Cell Differentiation and Clinical Applications)
Edited by LESTER PACKER
VOLUME 191. Biomembranes (Part V: Cellular and Subcellular Transport: Epithelial Cells)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 192. Biomembranes (Part W: Cellular and Subcellular Transport: Epithelial Cells)
Edited by SIDNEY FLEISCHER AND BECCA FLEISCHER
VOLUME 193. Mass Spectrometry
Edited by JAMES A. MCCLOSKEY
VOLUME 194. Guide to Yeast Genetics and Molecular Biology
Edited by CHRISTINE GUTHRIE AND GERALD R. FINK
VOLUME 195. Adenylyl Cyclase, G Proteins, and Guanylyl Cyclase
Edited by ROGER A. JOHNSON AND JACKIE D. CORBIN
VOLUME 196. Molecular Motors and the Cytoskeleton
Edited by RICHARD B. VALLEE
VOLUME 197. Phospholipases
Edited by EDWARD A. DENNIS
VOLUME 198. Peptide Growth Factors (Part C)
Edited by DAVID BARNES, J. P. MATHER, AND GORDON H. SATO
VOLUME 199. Cumulative Subject Index Volumes 168–174, 176–194
VOLUME 200. Protein Phosphorylation (Part A: Protein Kinases: Assays, Purification, Antibodies, Functional Analysis, Cloning, and Expression)
Edited by TONY HUNTER AND BARTHOLOMEW M. SEFTON
VOLUME 201. Protein Phosphorylation (Part B: Analysis of Protein Phosphorylation, Protein Kinase Inhibitors, and Protein Phosphatases)
Edited by TONY HUNTER AND BARTHOLOMEW M. SEFTON
VOLUME 202. Molecular Design and Modeling: Concepts and Applications (Part A: Proteins, Peptides, and Enzymes)
Edited by JOHN J. LANGONE
VOLUME 203. Molecular Design and Modeling: Concepts and Applications (Part B: Antibodies and Antigens, Nucleic Acids, Polysaccharides, and Drugs)
Edited by JOHN J. LANGONE
VOLUME 204. Bacterial Genetic Systems
Edited by JEFFREY H. MILLER
VOLUME 205. Metallobiochemistry (Part B: Metallothionein and Related Molecules)
Edited by JAMES F. RIORDAN AND BERT L. VALLEE

VOLUME 206. Cytochrome P450
Edited by MICHAEL R. WATERMAN AND ERIC F. JOHNSON
VOLUME 207. Ion Channels
Edited by BERNARDO RUDY AND LINDA E. IVERSON
VOLUME 208. Protein–DNA Interactions
Edited by ROBERT T. SAUER
VOLUME 209. Phospholipid Biosynthesis
Edited by EDWARD A. DENNIS AND DENNIS E. VANCE
VOLUME 210. Numerical Computer Methods
Edited by LUDWIG BRAND AND MICHAEL L. JOHNSON
VOLUME 211. DNA Structures (Part A: Synthesis and Physical Analysis of DNA)
Edited by DAVID M. J. LILLEY AND JAMES E. DAHLBERG
VOLUME 212. DNA Structures (Part B: Chemical and Electrophoretic Analysis of DNA)
Edited by DAVID M. J. LILLEY AND JAMES E. DAHLBERG
VOLUME 213. Carotenoids (Part A: Chemistry, Separation, Quantitation, and Antioxidation)
Edited by LESTER PACKER
VOLUME 214. Carotenoids (Part B: Metabolism, Genetics, and Biosynthesis)
Edited by LESTER PACKER
VOLUME 215. Platelets: Receptors, Adhesion, Secretion (Part B)
Edited by JACEK J. HAWIGER
VOLUME 216. Recombinant DNA (Part G)
Edited by RAY WU
VOLUME 217. Recombinant DNA (Part H)
Edited by RAY WU
VOLUME 218. Recombinant DNA (Part I)
Edited by RAY WU
VOLUME 219. Reconstitution of Intracellular Transport
Edited by JAMES E. ROTHMAN
VOLUME 220. Membrane Fusion Techniques (Part A)
Edited by NEJAT DÜZGÜNEŞ
VOLUME 221. Membrane Fusion Techniques (Part B)
Edited by NEJAT DÜZGÜNEŞ
VOLUME 222. Proteolytic Enzymes in Coagulation, Fibrinolysis, and Complement Activation (Part A: Mammalian Blood Coagulation Factors and Inhibitors)
Edited by LASZLO LORAND AND KENNETH G. MANN

VOLUME 223. Proteolytic Enzymes in Coagulation, Fibrinolysis, and Complement Activation (Part B: Complement Activation, Fibrinolysis, and Nonmammalian Blood Coagulation Factors)
Edited by LASZLO LORAND AND KENNETH G. MANN
VOLUME 224. Molecular Evolution: Producing the Biochemical Data
Edited by ELIZABETH ANNE ZIMMER, THOMAS J. WHITE, REBECCA L. CANN, AND ALLAN C. WILSON
VOLUME 225. Guide to Techniques in Mouse Development
Edited by PAUL M. WASSARMAN AND MELVIN L. DEPAMPHILIS
VOLUME 226. Metallobiochemistry (Part C: Spectroscopic and Physical Methods for Probing Metal Ion Environments in Metalloenzymes and Metalloproteins)
Edited by JAMES F. RIORDAN AND BERT L. VALLEE
VOLUME 227. Metallobiochemistry (Part D: Physical and Spectroscopic Methods for Probing Metal Ion Environments in Metalloproteins)
Edited by JAMES F. RIORDAN AND BERT L. VALLEE
VOLUME 228. Aqueous Two-Phase Systems
Edited by HARRY WALTER AND GÖTE JOHANSSON
VOLUME 229. Cumulative Subject Index Volumes 195–198, 200–227
VOLUME 230. Guide to Techniques in Glycobiology
Edited by WILLIAM J. LENNARZ AND GERALD W. HART
VOLUME 231. Hemoglobins (Part B: Biochemical and Analytical Methods)
Edited by JOHANNES EVERSE, KIM D. VANDEGRIFF, AND ROBERT M. WINSLOW
VOLUME 232. Hemoglobins (Part C: Biophysical Methods)
Edited by JOHANNES EVERSE, KIM D. VANDEGRIFF, AND ROBERT M. WINSLOW
VOLUME 233. Oxygen Radicals in Biological Systems (Part C)
Edited by LESTER PACKER
VOLUME 234. Oxygen Radicals in Biological Systems (Part D)
Edited by LESTER PACKER
VOLUME 235. Bacterial Pathogenesis (Part A: Identification and Regulation of Virulence Factors)
Edited by VIRGINIA L. CLARK AND PATRIK M. BAVOIL
VOLUME 236. Bacterial Pathogenesis (Part B: Integration of Pathogenic Bacteria with Host Cells)
Edited by VIRGINIA L. CLARK AND PATRIK M. BAVOIL
VOLUME 237. Heterotrimeric G Proteins
Edited by RAVI IYENGAR
VOLUME 238. Heterotrimeric G-Protein Effectors
Edited by RAVI IYENGAR

VOLUME 239. Nuclear Magnetic Resonance (Part C) Edited by THOMAS L. JAMES AND NORMAN J. OPPENHEIMER VOLUME 240. Numerical Computer Methods (Part B) Edited by MICHAEL L. JOHNSON AND LUDWIG BRAND VOLUME 241. Retroviral Proteases Edited by LAWRENCE C. KUO AND JULES A. SHAFER VOLUME 242. Neoglycoconjugates (Part A) Edited by Y. C. LEE AND REIKO T. LEE VOLUME 243. Inorganic Microbial Sulfur Metabolism Edited by HARRY D. PECK, JR., AND JEAN LEGALL VOLUME 244. Proteolytic Enzymes: Serine and Cysteine Peptidases Edited by ALAN J. BARRETT VOLUME 245. Extracellular Matrix Components Edited by E. RUOSLAHTI AND E. ENGVALL VOLUME 246. Biochemical Spectroscopy Edited by KENNETH SAUER VOLUME 247. Neoglycoconjugates (Part B: Biomedical Applications) Edited by Y. C. LEE AND REIKO T. LEE VOLUME 248. Proteolytic Enzymes: Aspartic and Metallo Peptidases Edited by ALAN J. BARRETT VOLUME 249. Enzyme Kinetics and Mechanism (Part D: Developments in Enzyme Dynamics) Edited by DANIEL L. PURICH VOLUME 250. Lipid Modifications of Proteins Edited by PATRICK J. CASEY AND JANICE E. BUSS VOLUME 251. Biothiols (Part A: Monothiols and Dithiols, Protein Thiols, and Thiyl Radicals) Edited by LESTER PACKER VOLUME 252. Biothiols (Part B: Glutathione and Thioredoxin; Thiols in Signal Transduction and Gene Regulation) Edited by LESTER PACKER VOLUME 253. Adhesion of Microbial Pathogens Edited by RON J. DOYLE AND ITZHAK OFEK VOLUME 254. Oncogene Techniques Edited by PETER K. VOGT AND INDER M. VERMA VOLUME 255. Small GTPases and Their Regulators (Part A: Ras Family) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 256. Small GTPases and Their Regulators (Part B: Rho Family) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL

VOLUME 257. Small GTPases and Their Regulators (Part C: Proteins Involved in Transport) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 258. Redox-Active Amino Acids in Biology Edited by JUDITH P. KLINMAN VOLUME 259. Energetics of Biological Macromolecules Edited by MICHAEL L. JOHNSON AND GARY K. ACKERS VOLUME 260. Mitochondrial Biogenesis and Genetics (Part A) Edited by GIUSEPPE M. ATTARDI AND ANNE CHOMYN VOLUME 261. Nuclear Magnetic Resonance and Nucleic Acids Edited by THOMAS L. JAMES VOLUME 262. DNA Replication Edited by JUDITH L. CAMPBELL VOLUME 263. Plasma Lipoproteins (Part C: Quantitation) Edited by WILLIAM A. BRADLEY, SANDRA H. GIANTURCO, AND JERE P. SEGREST VOLUME 264. Mitochondrial Biogenesis and Genetics (Part B) Edited by GIUSEPPE M. ATTARDI AND ANNE CHOMYN VOLUME 265. Cumulative Subject Index Volumes 228, 230–262 VOLUME 266. Computer Methods for Macromolecular Sequence Analysis Edited by RUSSELL F. DOOLITTLE VOLUME 267. Combinatorial Chemistry Edited by JOHN N. ABELSON VOLUME 268. Nitric Oxide (Part A: Sources and Detection of NO; NO Synthase) Edited by LESTER PACKER VOLUME 269. Nitric Oxide (Part B: Physiological and Pathological Processes) Edited by LESTER PACKER VOLUME 270. High Resolution Separation and Analysis of Biological Macromolecules (Part A: Fundamentals) Edited by BARRY L. KARGER AND WILLIAM S. HANCOCK VOLUME 271. High Resolution Separation and Analysis of Biological Macromolecules (Part B: Applications) Edited by BARRY L. KARGER AND WILLIAM S. HANCOCK VOLUME 272. Cytochrome P450 (Part B) Edited by ERIC F. JOHNSON AND MICHAEL R. WATERMAN VOLUME 273. RNA Polymerase and Associated Factors (Part A) Edited by SANKAR ADHYA VOLUME 274. RNA Polymerase and Associated Factors (Part B) Edited by SANKAR ADHYA

VOLUME 275. Viral Polymerases and Related Proteins Edited by LAWRENCE C. KUO, DAVID B. OLSEN, AND STEVEN S. CARROLL VOLUME 276. Macromolecular Crystallography (Part A) Edited by CHARLES W. CARTER, JR., AND ROBERT M. SWEET VOLUME 277. Macromolecular Crystallography (Part B) Edited by CHARLES W. CARTER, JR., AND ROBERT M. SWEET VOLUME 278. Fluorescence Spectroscopy Edited by LUDWIG BRAND AND MICHAEL L. JOHNSON VOLUME 279. Vitamins and Coenzymes (Part I) Edited by DONALD B. MCCORMICK, JOHN W. SUTTIE, AND CONRAD WAGNER VOLUME 280. Vitamins and Coenzymes (Part J) Edited by DONALD B. MCCORMICK, JOHN W. SUTTIE, AND CONRAD WAGNER VOLUME 281. Vitamins and Coenzymes (Part K) Edited by DONALD B. MCCORMICK, JOHN W. SUTTIE, AND CONRAD WAGNER VOLUME 282. Vitamins and Coenzymes (Part L) Edited by DONALD B. MCCORMICK, JOHN W. SUTTIE, AND CONRAD WAGNER VOLUME 283. Cell Cycle Control Edited by WILLIAM G. DUNPHY VOLUME 284. Lipases (Part A: Biotechnology) Edited by BYRON RUBIN AND EDWARD A. DENNIS VOLUME 285. Cumulative Subject Index Volumes 263, 264, 266–284, 286–289 VOLUME 286. Lipases (Part B: Enzyme Characterization and Utilization) Edited by BYRON RUBIN AND EDWARD A. DENNIS VOLUME 287. Chemokines Edited by RICHARD HORUK VOLUME 288. Chemokine Receptors Edited by RICHARD HORUK VOLUME 289. Solid Phase Peptide Synthesis Edited by GREGG B. FIELDS VOLUME 290. Molecular Chaperones Edited by GEORGE H. LORIMER AND THOMAS BALDWIN VOLUME 291. Caged Compounds Edited by GERARD MARRIOTT VOLUME 292. ABC Transporters: Biochemical, Cellular, and Molecular Aspects Edited by SURESH V. AMBUDKAR AND MICHAEL M. GOTTESMAN VOLUME 293. Ion Channels (Part B) Edited by P. MICHAEL CONN

VOLUME 294. Ion Channels (Part C) Edited by P. MICHAEL CONN VOLUME 295. Energetics of Biological Macromolecules (Part B) Edited by GARY K. ACKERS AND MICHAEL L. JOHNSON VOLUME 296. Neurotransmitter Transporters Edited by SUSAN G. AMARA VOLUME 297. Photosynthesis: Molecular Biology of Energy Capture Edited by LEE MCINTOSH VOLUME 298. Molecular Motors and the Cytoskeleton (Part B) Edited by RICHARD B. VALLEE VOLUME 299. Oxidants and Antioxidants (Part A) Edited by LESTER PACKER VOLUME 300. Oxidants and Antioxidants (Part B) Edited by LESTER PACKER VOLUME 301. Nitric Oxide: Biological and Antioxidant Activities (Part C) Edited by LESTER PACKER VOLUME 302. Green Fluorescent Protein Edited by P. MICHAEL CONN VOLUME 303. cDNA Preparation and Display Edited by SHERMAN M. WEISSMAN VOLUME 304. Chromatin Edited by PAUL M. WASSARMAN AND ALAN P. WOLFFE VOLUME 305. Bioluminescence and Chemiluminescence (Part C) Edited by THOMAS O. BALDWIN AND MIRIAM M. ZIEGLER VOLUME 306. Expression of Recombinant Genes in Eukaryotic Systems Edited by JOSEPH C. GLORIOSO AND MARTIN C. SCHMIDT VOLUME 307. Confocal Microscopy Edited by P. MICHAEL CONN VOLUME 308. Enzyme Kinetics and Mechanism (Part E: Energetics of Enzyme Catalysis) Edited by DANIEL L. PURICH AND VERN L. SCHRAMM VOLUME 309. Amyloid, Prions, and Other Protein Aggregates Edited by RONALD WETZEL VOLUME 310. Biofilms Edited by RON J. DOYLE VOLUME 311. Sphingolipid Metabolism and Cell Signaling (Part A) Edited by ALFRED H. MERRILL, JR., AND YUSUF A. HANNUN

VOLUME 312. Sphingolipid Metabolism and Cell Signaling (Part B) Edited by ALFRED H. MERRILL, JR., AND YUSUF A. HANNUN VOLUME 313. Antisense Technology (Part A: General Methods, Methods of Delivery, and RNA Studies) Edited by M. IAN PHILLIPS VOLUME 314. Antisense Technology (Part B: Applications) Edited by M. IAN PHILLIPS VOLUME 315. Vertebrate Phototransduction and the Visual Cycle (Part A) Edited by KRZYSZTOF PALCZEWSKI VOLUME 316. Vertebrate Phototransduction and the Visual Cycle (Part B) Edited by KRZYSZTOF PALCZEWSKI VOLUME 317. RNA–Ligand Interactions (Part A: Structural Biology Methods) Edited by DANIEL W. CELANDER AND JOHN N. ABELSON VOLUME 318. RNA–Ligand Interactions (Part B: Molecular Biology Methods) Edited by DANIEL W. CELANDER AND JOHN N. ABELSON VOLUME 319. Singlet Oxygen, UV-A, and Ozone Edited by LESTER PACKER AND HELMUT SIES VOLUME 320. Cumulative Subject Index Volumes 290–319 VOLUME 321. Numerical Computer Methods (Part C) Edited by MICHAEL L. JOHNSON AND LUDWIG BRAND VOLUME 322. Apoptosis Edited by JOHN C. REED VOLUME 323. Energetics of Biological Macromolecules (Part C) Edited by MICHAEL L. JOHNSON AND GARY K. ACKERS VOLUME 324. Branched-Chain Amino Acids (Part B) Edited by ROBERT A. HARRIS AND JOHN R. SOKATCH VOLUME 325. Regulators and Effectors of Small GTPases (Part D: Rho Family) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 326. Applications of Chimeric Genes and Hybrid Proteins (Part A: Gene Expression and Protein Purification) Edited by JEREMY THORNER, SCOTT D. EMR, AND JOHN N. ABELSON VOLUME 327. Applications of Chimeric Genes and Hybrid Proteins (Part B: Cell Biology and Physiology) Edited by JEREMY THORNER, SCOTT D. EMR, AND JOHN N. ABELSON VOLUME 328. Applications of Chimeric Genes and Hybrid Proteins (Part C: Protein–Protein Interactions and Genomics) Edited by JEREMY THORNER, SCOTT D. EMR, AND JOHN N. ABELSON

VOLUME 329. Regulators and Effectors of Small GTPases (Part E: GTPases Involved in Vesicular Traffic) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 330. Hyperthermophilic Enzymes (Part A) Edited by MICHAEL W. W. ADAMS AND ROBERT M. KELLY VOLUME 331. Hyperthermophilic Enzymes (Part B) Edited by MICHAEL W. W. ADAMS AND ROBERT M. KELLY VOLUME 332. Regulators and Effectors of Small GTPases (Part F: Ras Family I) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 333. Regulators and Effectors of Small GTPases (Part G: Ras Family II) Edited by W. E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 334. Hyperthermophilic Enzymes (Part C) Edited by MICHAEL W. W. ADAMS AND ROBERT M. KELLY VOLUME 335. Flavonoids and Other Polyphenols Edited by LESTER PACKER VOLUME 336. Microbial Growth in Biofilms (Part A: Developmental and Molecular Biological Aspects) Edited by RON J. DOYLE VOLUME 337. Microbial Growth in Biofilms (Part B: Special Environments and Physicochemical Aspects) Edited by RON J. DOYLE VOLUME 338. Nuclear Magnetic Resonance of Biological Macromolecules (Part A) Edited by THOMAS L. JAMES, VOLKER DÖTSCH, AND ULI SCHMITZ VOLUME 339. Nuclear Magnetic Resonance of Biological Macromolecules (Part B) Edited by THOMAS L. JAMES, VOLKER DÖTSCH, AND ULI SCHMITZ VOLUME 340. Drug–Nucleic Acid Interactions Edited by JONATHAN B. CHAIRES AND MICHAEL J. WARING VOLUME 341. Ribonucleases (Part A) Edited by ALLEN W. NICHOLSON VOLUME 342. Ribonucleases (Part B) Edited by ALLEN W. NICHOLSON VOLUME 343. G Protein Pathways (Part A: Receptors) Edited by RAVI IYENGAR AND JOHN D. HILDEBRANDT VOLUME 344. G Protein Pathways (Part B: G Proteins and Their Regulators) Edited by RAVI IYENGAR AND JOHN D. HILDEBRANDT VOLUME 345. G Protein Pathways (Part C: Effector Mechanisms) Edited by RAVI IYENGAR AND JOHN D. HILDEBRANDT

VOLUME 346. Gene Therapy Methods Edited by M. IAN PHILLIPS VOLUME 347. Protein Sensors and Reactive Oxygen Species (Part A: Selenoproteins and Thioredoxin) Edited by HELMUT SIES AND LESTER PACKER VOLUME 348. Protein Sensors and Reactive Oxygen Species (Part B: Thiol Enzymes and Proteins) Edited by HELMUT SIES AND LESTER PACKER VOLUME 349. Superoxide Dismutase Edited by LESTER PACKER VOLUME 350. Guide to Yeast Genetics and Molecular and Cell Biology (Part B) Edited by CHRISTINE GUTHRIE AND GERALD R. FINK VOLUME 351. Guide to Yeast Genetics and Molecular and Cell Biology (Part C) Edited by CHRISTINE GUTHRIE AND GERALD R. FINK VOLUME 352. Redox Cell Biology and Genetics (Part A) Edited by CHANDAN K. SEN AND LESTER PACKER VOLUME 353. Redox Cell Biology and Genetics (Part B) Edited by CHANDAN K. SEN AND LESTER PACKER VOLUME 354. Enzyme Kinetics and Mechanisms (Part F: Detection and Characterization of Enzyme Reaction Intermediates) Edited by DANIEL L. PURICH VOLUME 355. Cumulative Subject Index Volumes 321–354 VOLUME 356. Laser Capture Microscopy and Microdissection Edited by P. MICHAEL CONN VOLUME 357. Cytochrome P450, Part C Edited by ERIC F. JOHNSON AND MICHAEL R. WATERMAN VOLUME 358. Bacterial Pathogenesis (Part C: Identification, Regulation, and Function of Virulence Factors) Edited by VIRGINIA L. CLARK AND PATRIK M. BAVOIL VOLUME 359. Nitric Oxide (Part D) Edited by ENRIQUE CADENAS AND LESTER PACKER VOLUME 360. Biophotonics (Part A) Edited by GERARD MARRIOTT AND IAN PARKER VOLUME 361. Biophotonics (Part B) Edited by GERARD MARRIOTT AND IAN PARKER VOLUME 362. Recognition of Carbohydrates in Biological Systems (Part A) Edited by YUAN C. LEE AND REIKO T. LEE

VOLUME 363. Recognition of Carbohydrates in Biological Systems (Part B) Edited by YUAN C. LEE AND REIKO T. LEE VOLUME 364. Nuclear Receptors Edited by DAVID W. RUSSELL AND DAVID J. MANGELSDORF VOLUME 365. Differentiation of Embryonic Stem Cells Edited by PAUL M. WASSARMAN AND GORDON M. KELLER VOLUME 366. Protein Phosphatases Edited by SUSANNE KLUMPP AND JOSEF KRIEGLSTEIN VOLUME 367. Liposomes (Part A) Edited by NEJAT DÜZGÜNEŞ VOLUME 368. Macromolecular Crystallography (Part C) Edited by CHARLES W. CARTER, JR., AND ROBERT M. SWEET VOLUME 369. Combinatorial Chemistry (Part B) Edited by GUILLERMO A. MORALES AND BARRY A. BUNIN VOLUME 370. RNA Polymerases and Associated Factors (Part C) Edited by SANKAR L. ADHYA AND SUSAN GARGES VOLUME 371. RNA Polymerases and Associated Factors (Part D) Edited by SANKAR L. ADHYA AND SUSAN GARGES VOLUME 372. Liposomes (Part B) Edited by NEJAT DÜZGÜNEŞ VOLUME 373. Liposomes (Part C) Edited by NEJAT DÜZGÜNEŞ VOLUME 374. Macromolecular Crystallography (Part D) Edited by CHARLES W. CARTER, JR., AND ROBERT M. SWEET VOLUME 375. Chromatin and Chromatin Remodeling Enzymes (Part A) Edited by C. DAVID ALLIS AND CARL WU VOLUME 376. Chromatin and Chromatin Remodeling Enzymes (Part B) Edited by C. DAVID ALLIS AND CARL WU VOLUME 377. Chromatin and Chromatin Remodeling Enzymes (Part C) Edited by C. DAVID ALLIS AND CARL WU VOLUME 378. Quinones and Quinone Enzymes (Part A) Edited by HELMUT SIES AND LESTER PACKER VOLUME 379. Energetics of Biological Macromolecules (Part D) Edited by JO M. HOLT, MICHAEL L. JOHNSON, AND GARY K. ACKERS VOLUME 380. Energetics of Biological Macromolecules (Part E) Edited by JO M. HOLT, MICHAEL L. JOHNSON, AND GARY K. ACKERS VOLUME 381. Oxygen Sensing Edited by CHANDAN K. SEN AND GREGG L. SEMENZA

VOLUME 382. Quinones and Quinone Enzymes (Part B) Edited by HELMUT SIES AND LESTER PACKER VOLUME 383. Numerical Computer Methods (Part D) Edited by LUDWIG BRAND AND MICHAEL L. JOHNSON VOLUME 384. Numerical Computer Methods (Part E) Edited by LUDWIG BRAND AND MICHAEL L. JOHNSON VOLUME 385. Imaging in Biological Research (Part A) Edited by P. MICHAEL CONN VOLUME 386. Imaging in Biological Research (Part B) Edited by P. MICHAEL CONN VOLUME 387. Liposomes (Part D) Edited by NEJAT DÜZGÜNEŞ VOLUME 388. Protein Engineering Edited by DAN E. ROBERTSON AND JOSEPH P. NOEL VOLUME 389. Regulators of G-Protein Signaling (Part A) Edited by DAVID P. SIDEROVSKI VOLUME 390. Regulators of G-Protein Signaling (Part B) Edited by DAVID P. SIDEROVSKI VOLUME 391. Liposomes (Part E) Edited by NEJAT DÜZGÜNEŞ VOLUME 392. RNA Interference Edited by ENGELKE ROSSI VOLUME 393. Circadian Rhythms Edited by MICHAEL W. YOUNG VOLUME 394. Nuclear Magnetic Resonance of Biological Macromolecules (Part C) Edited by THOMAS L. JAMES VOLUME 395. Producing the Biochemical Data (Part B) Edited by ELIZABETH A. ZIMMER AND ERIC H. ROALSON VOLUME 396. Nitric Oxide (Part E) Edited by LESTER PACKER AND ENRIQUE CADENAS VOLUME 397. Environmental Microbiology Edited by JARED R. LEADBETTER VOLUME 398. Ubiquitin and Protein Degradation (Part A) Edited by RAYMOND J. DESHAIES VOLUME 399. Ubiquitin and Protein Degradation (Part B) Edited by RAYMOND J. DESHAIES VOLUME 400. Phase II Conjugation Enzymes and Transport Systems Edited by HELMUT SIES AND LESTER PACKER

VOLUME 401. Glutathione Transferases and Gamma Glutamyl Transpeptidases Edited by HELMUT SIES AND LESTER PACKER VOLUME 402. Biological Mass Spectrometry Edited by A. L. BURLINGAME VOLUME 403. GTPases Regulating Membrane Targeting and Fusion Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 404. GTPases Regulating Membrane Dynamics Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 405. Mass Spectrometry: Modified Proteins and Glycoconjugates Edited by A. L. BURLINGAME VOLUME 406. Regulators and Effectors of Small GTPases: Rho Family Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 407. Regulators and Effectors of Small GTPases: Ras Family Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 408. DNA Repair (Part A) Edited by JUDITH L. CAMPBELL AND PAUL MODRICH VOLUME 409. DNA Repair (Part B) Edited by JUDITH L. CAMPBELL AND PAUL MODRICH VOLUME 410. DNA Microarrays (Part A: Array Platforms and Web-Bench Protocols) Edited by ALAN KIMMEL AND BRIAN OLIVER VOLUME 411. DNA Microarrays (Part B: Databases and Statistics) Edited by ALAN KIMMEL AND BRIAN OLIVER VOLUME 412. Amyloid, Prions, and Other Protein Aggregates (Part B) Edited by INDU KHETERPAL AND RONALD WETZEL VOLUME 413. Amyloid, Prions, and Other Protein Aggregates (Part C) Edited by INDU KHETERPAL AND RONALD WETZEL VOLUME 414. Measuring Biological Responses with Automated Microscopy Edited by JAMES INGLESE VOLUME 415. Glycobiology Edited by MINORU FUKUDA VOLUME 416. Glycomics Edited by MINORU FUKUDA VOLUME 417. Functional Glycomics Edited by MINORU FUKUDA VOLUME 418. Embryonic Stem Cells Edited by IRINA KLIMANSKAYA AND ROBERT LANZA

VOLUME 419. Adult Stem Cells Edited by IRINA KLIMANSKAYA AND ROBERT LANZA VOLUME 420. Stem Cell Tools and Other Experimental Protocols Edited by IRINA KLIMANSKAYA AND ROBERT LANZA VOLUME 421. Advanced Bacterial Genetics: Use of Transposons and Phage for Genomic Engineering Edited by KELLY T. HUGHES VOLUME 422. Two-Component Signaling Systems, Part A Edited by MELVIN I. SIMON, BRIAN R. CRANE, AND ALEXANDRINE CRANE VOLUME 423. Two-Component Signaling Systems, Part B Edited by MELVIN I. SIMON, BRIAN R. CRANE, AND ALEXANDRINE CRANE VOLUME 424. RNA Editing Edited by JONATHA M. GOTT VOLUME 425. RNA Modification Edited by JONATHA M. GOTT VOLUME 426. Integrins Edited by DAVID CHERESH VOLUME 427. MicroRNA Methods Edited by JOHN J. ROSSI VOLUME 428. Osmosensing and Osmosignaling Edited by HELMUT SIES AND DIETER HAUSSINGER VOLUME 429. Translation Initiation: Extract Systems and Molecular Genetics Edited by JON LORSCH VOLUME 430. Translation Initiation: Reconstituted Systems and Biophysical Methods Edited by JON LORSCH VOLUME 431. Translation Initiation: Cell Biology, High-Throughput and Chemical-Based Approaches Edited by JON LORSCH VOLUME 432. Lipidomics and Bioactive Lipids: Mass-Spectrometry–Based Lipid Analysis Edited by H. ALEX BROWN VOLUME 433. Lipidomics and Bioactive Lipids: Specialized Analytical Methods and Lipids in Disease Edited by H. ALEX BROWN VOLUME 434. Lipidomics and Bioactive Lipids: Lipids and Cell Signaling Edited by H. ALEX BROWN VOLUME 435. Oxygen Biology and Hypoxia Edited by HELMUT SIES AND BERNHARD BRÜNE

VOLUME 436. Globins and Other Nitric Oxide-Reactive Proteins (Part A) Edited by ROBERT K. POOLE VOLUME 437. Globins and Other Nitric Oxide-Reactive Proteins (Part B) Edited by ROBERT K. POOLE VOLUME 438. Small GTPases in Disease (Part A) Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 439. Small GTPases in Disease (Part B) Edited by WILLIAM E. BALCH, CHANNING J. DER, AND ALAN HALL VOLUME 440. Nitric Oxide, Part F Oxidative and Nitrosative Stress in Redox Regulation of Cell Signaling Edited by ENRIQUE CADENAS AND LESTER PACKER VOLUME 441. Nitric Oxide, Part G Oxidative and Nitrosative Stress in Redox Regulation of Cell Signaling Edited by ENRIQUE CADENAS AND LESTER PACKER VOLUME 442. Programmed Cell Death, General Principles for Studying Cell Death (Part A) Edited by ROYA KHOSRAVI-FAR, ZAHRA ZAKERI, RICHARD A. LOCKSHIN, AND MAURO PIACENTINI VOLUME 443. Angiogenesis: In Vitro Systems Edited by DAVID A. CHERESH VOLUME 444. Angiogenesis: In Vivo Systems (Part A) Edited by DAVID A. CHERESH VOLUME 445. Angiogenesis: In Vivo Systems (Part B) Edited by DAVID A. CHERESH VOLUME 446. Programmed Cell Death, The Biology and Therapeutic Implications of Cell Death (Part B) Edited by ROYA KHOSRAVI-FAR, ZAHRA ZAKERI, RICHARD A. LOCKSHIN, AND MAURO PIACENTINI VOLUME 447. RNA Turnover in Bacteria, Archaea and Organelles Edited by LYNNE E. MAQUAT AND CECILIA M. ARRAIANO VOLUME 448. RNA Turnover in Eukaryotes: Nucleases, Pathways and Analysis of mRNA Decay Edited by LYNNE E. MAQUAT AND MEGERDITCH KILEDJIAN VOLUME 449. RNA Turnover in Eukaryotes: Analysis of Specialized and Quality Control RNA Decay Pathways Edited by LYNNE E. MAQUAT AND MEGERDITCH KILEDJIAN VOLUME 450. Fluorescence Spectroscopy Edited by LUDWIG BRAND AND MICHAEL L. JOHNSON

VOLUME 451. Autophagy: Lower Eukaryotes and Non-mammalian Systems (Part A) Edited by DANIEL J. KLIONSKY VOLUME 452. Autophagy in Mammalian Systems (Part B) Edited by DANIEL J. KLIONSKY VOLUME 453. Autophagy in Disease and Clinical Applications (Part C) Edited by DANIEL J. KLIONSKY VOLUME 454. Computer Methods (Part A) Edited by MICHAEL L. JOHNSON AND LUDWIG BRAND VOLUME 455. Biothermodynamics (Part A) Edited by MICHAEL L. JOHNSON, JO M. HOLT, AND GARY K. ACKERS

CHAPTER ONE

Practical Approaches to Protein Folding and Assembly: Spectroscopic Strategies in Thermodynamics and Kinetics

Jad Walters,* Sara L. Milam,* and A. Clay Clark*

Contents
1. Introduction
2. Equilibrium Unfolding
2.1. Practical considerations
2.2. Instrumentation
2.3. Preparation of 10 M urea stock
2.4. Confirm that the protein is completely unfolded
2.5. Establishing equilibration times and reversibility for folding reactions
2.6. Equilibrium unfolding
2.7. Interpretation of equilibrium-unfolding curves
2.8. Data analysis
3. Measuring Folding Kinetics
3.1. Experimental protocol
3.2. Differential quenching by acrylamide
3.3. Data analysis
References

Abstract

We describe here the use of several spectroscopies, such as fluorescence emission, circular dichroism, and differential quenching by acrylamide, in examining the equilibrium and kinetic folding of proteins. The first section regarding equilibrium techniques provides practical information for determining the conformational stability of a protein. In addition, several equilibrium-folding models are discussed, from two-state monomer to four-state homodimer, providing a comprehensive protocol for interpretation of folding curves. The second section focuses on the experimental design and interpretation of kinetic data, such as burst-phase analysis and exponential fits, used in elucidating kinetic folding pathways. In addition, simulation programs are used routinely to support folding models generated by kinetic experiments, and the fundamentals of simulations are covered.

* Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, North Carolina, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04201-8. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Protein folding is a central theme in structural biochemistry and in biotechnology. While the forces that stabilize protein structure have been examined for more than one hundred years (Clark, 2008), protein chemists still are unable to predict the native structure of a protein from a known amino acid sequence. Under physiological conditions, proteins exist in equilibrium between ensembles of unfolded states (U) and native states (N), where each ensemble is characterized by a closely related set of structures that fluctuate around a local (or global) energy minimum. Protein function depends on attaining the native conformation. While the forces that drive proteins to adopt their native conformations are, in general, defined as the difference between the unfavorable chain entropy and the favorable enthalpic interactions, the stability of protein native structures can vary drastically. Moreover, the kinetic pathway a protein utilizes to adopt its native conformation can vary from a relatively simple two-state process, where only the native and unfolded ensembles are populated significantly, to more complex reactions in which the structure passes through one or more nonnative, partially folded intermediates before reaching the native conformation. It has been well documented that the pathways can be sequential, in which the intermediates are found between the unfolded and native ensembles, or parallel, in which multiple intermediates form simultaneously and eventually coalesce to the native ensemble (Wallace and Matthews, 2002). In addition, not all intermediates lead to the native conformation but rather can lead to misfolded, or off-pathway, structures (Ikai and Tanford, 1971). Finally, proteins have been shown to fold over a wide range of time regimes, from microseconds to hours (Creighton, 1990). Consequently, characterizing the kinetic and thermodynamic folding of proteins can be a daunting task. However, the benefits of understanding the folding process can prove invaluable, for example, in revealing motifs or regions of the protein that are critical to function, as potential drug targets, or in determining the mechanisms for protein misfolding or aggregation (Cohen and Kelly, 2003; Soto, 2003).

Outlined in this chapter is practical information for characterizing the thermodynamic and kinetic folding properties of a protein by exploiting intrinsic probes such as fluorescence emission and circular dichroism.

One should note other excellent sources that describe the use of extrinsic probes in protein folding (Lakowicz, 2006; Weber, 1951; Waggoner, 1995). Fluorescence techniques are extremely useful for this application, and the advantages of fluorescence emission over other techniques make it an attractive method to examine protein tertiary structure. These include high sensitivity, the use of low protein concentrations, the ability to selectively monitor regions or motifs within a protein, and the use of a multitude of solution conditions (Eftink, 2000). Circular dichroism also is employed to examine the protein secondary structure and/or tertiary structure during unfolding and refolding and to validate the findings from the fluorescence emission experiments.

While proteins can be unfolded using a variety of agents, the focus here is on a well-characterized chaotrope, urea. The equilibrium-unfolding studies described here allow for the calculation of the conformational free energy, revealing the stability of the native conformation and intermediates (where applicable). The kinetic techniques aid in deciphering the folding pathway and in examining intermediates that may not be detectable in equilibrium experiments. This chapter aims to provide a comprehensive protocol for examining the thermodynamic and kinetic folding properties of simple, so-called two-state, systems as well as more complex systems where multiple intermediates are present. More complex analyses for parsing the conformational free energy into component parts (entropy, enthalpy, heat capacity) as well as studies of the transition state can be found elsewhere (Dill, 1990; Privalov, 1989; Royer, 2008).

Often, it is of interest to examine the conformational stability not only of wild-type proteins but also of mutants or other proteins that differ slightly in structure. This type of analysis can be useful in comparing proteins within the same family or comparing structural motifs in general. Such studies have revealed critical residues and regions of proteins that make significant contributions to the overall stability (Wilson and Wittung-Stafshede, 2005). The protocols outlined in this chapter also are useful for comparing multiple proteins in a family.
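
For orientation, the two-state case mentioned above can be written compactly. The relations below are standard textbook forms rather than equations quoted from this chapter: the symbols K_eq, f_U, m, and ΔG^H2O are notation assumed here, and the linear dependence of ΔG on urea concentration is the usual linear-extrapolation assumption, which may not hold for every protein.

```latex
% Two-state equilibrium between the native (N) and unfolded (U) ensembles.
% K_eq: equilibrium constant for unfolding; f_U: fraction of protein unfolded.
\begin{align}
K_{\mathrm{eq}} &= \frac{[\mathrm{U}]}{[\mathrm{N}]}, &
f_{\mathrm{U}} &= \frac{K_{\mathrm{eq}}}{1 + K_{\mathrm{eq}}} \\
% Conformational free energy of unfolding at a given urea concentration.
\Delta G &= -RT \ln K_{\mathrm{eq}} \\
% Assumed linear dependence on denaturant; m is the slope and
% \Delta G^{H2O} is the free energy extrapolated to zero denaturant.
\Delta G &= \Delta G^{\mathrm{H_2O}} - m\,[\mathrm{urea}]
\end{align}
```

In this notation a positive ΔG^H2O corresponds to a native state that is stable in the absence of denaturant. Systems that populate intermediates at equilibrium require one equilibrium constant per transition and are not described by these three relations alone.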

2. Equilibrium Unfolding

2.1. Practical considerations

Equilibrium unfolding is the process of disrupting the protein's native structure in favor of the unfolded ensemble by increasing the concentration of denaturant, either urea or guanidinium hydrochloride (GdmHCl), in a stepwise manner such that the protein reaches a balance between native and unfolded conformations. For this discussion, urea will be used as the denaturant; however, similar methodologies apply when using GdmHCl (for the properties, advantages, and disadvantages of urea and GdmHCl, see Pace, 1986). Increased temperature is another common technique used to induce unfolding. While the protocol is not outlined here, a thorough description of the technique and data analysis exists elsewhere (Pace and Scholtz, 1997; Pace et al., 2005).

There are many factors to consider prior to setting up folding reactions and taking measurements. Is unfolding reversible? That is, do the unfolding and refolding data overlay to validate reversibility? This is important because the experiments yield thermodynamic information on the stability of the protein. Is the folding reaction equilibrated? Because equilibrium experiments are being performed, sufficient time must be given to allow each reaction to come to equilibrium. It is common for proteins, even single-point mutants, to vary greatly in equilibration times. While there are various methods for determining equilibration times, two are described subsequently.

Other factors to consider include incubation temperature, the use of reductants, and the number of aromatic side chains in the protein. Temperature must be considered carefully, as the time required for the reaction to equilibrate can be temperature dependent (Pace, 1986). The experiments described here include the addition of 1 mM reducing agent because of the presence of free sulfhydryl groups and the possibility of forming disulfide bonds. We use dithiothreitol (DTT) routinely, but it is important to note that DTT degrades quickly. Therefore, if long incubation times are required (more than 24 h), then β-mercaptoethanol or tris(2-carboxyethyl)phosphine (TCEP) would serve well (Zahler and Cleland, 1968). If free sulfhydryls are not an issue, the reducing agent can be left out.

It is very important to obtain an accurate determination of the protein concentration because fluorescence emission is a sensitive technique. If the extinction coefficient is known for the protein of interest, the concentration of the native stock can be readily obtained by measuring the absorbance of the protein at 280 nm. If the extinction coefficient is unknown, then the protocol outlined by Pace and Schmid (1997) explains in great detail how to determine this parameter. The concentration of the protein required for equilibrium-unfolding experiments is typically in the low micromolar range but depends on the number of aromatic residues present in the protein. Protein fluorescence emission is dominated by tryptophans and tyrosines because of their high quantum yield at the wavelength of excitation (Schmid, 1997).

Finally, there are a variety of buffers that one may use for fluorescence and CD measurements. The absorbance properties of the buffer must be taken into consideration to ensure that it does not absorb in the spectral region of the protein. Buffer blanks (the sample without protein) should be scanned with each experiment and subtracted from the protein sample to remove artifacts that may be introduced by the buffer and urea. Typically, the following steps are used to set up an equilibrium-unfolding experiment, and several steps are described in detail subsequently.

1. Prepare 10 M urea stock.
2. Confirm that the protein is completely unfolded.
3. Optimize instrument settings using native and unfolded protein in their respective buffers.
4. Set up unfolding and refolding samples in varying concentrations of denaturant.
5. Establish equilibration times and reversibility.
6. Perform equilibrium-unfolding and equilibrium-refolding experiments.
7. Repeat experiments at different protein concentrations as needed.
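
The concentration determination and step 4 above lend themselves to a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes a native protein stock in urea-free buffer, a 10 M urea stock made in the same buffer, and simple volume additivity. The function names, the `SampleRecipe` type, and all numerical values (absorbance, extinction coefficient, volumes) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleRecipe:
    """Volumes (in microliters) to mix for one unfolding sample."""
    urea_molar: float
    urea_stock_ul: float
    protein_stock_ul: float
    buffer_ul: float


def protein_concentration_molar(a280, extinction_m1_cm1, path_cm=1.0):
    """Beer-Lambert law: concentration = A280 / (extinction coefficient * path length)."""
    if extinction_m1_cm1 <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (extinction_m1_cm1 * path_cm)


def unfolding_series(urea_targets_molar, final_volume_ul, protein_stock_molar,
                     protein_final_molar, urea_stock_molar=10.0):
    """Return mixing volumes for each target urea concentration.

    Assumes the protein stock contains no urea, so every sample receives the
    same protein volume and buffer makes up the remaining volume.
    """
    protein_ul = final_volume_ul * protein_final_molar / protein_stock_molar
    recipes = []
    for target in urea_targets_molar:
        urea_ul = final_volume_ul * target / urea_stock_molar
        buffer_ul = final_volume_ul - urea_ul - protein_ul
        if buffer_ul < -1e-9:
            # The stocks are too dilute to reach this urea concentration.
            raise ValueError(f"cannot reach {target} M urea in {final_volume_ul} uL")
        recipes.append(SampleRecipe(target, urea_ul, protein_ul, max(buffer_ul, 0.0)))
    return recipes


if __name__ == "__main__":
    # Hypothetical numbers: A280 of 0.52 with an extinction coefficient of
    # 26,000 M^-1 cm^-1 in a 1 cm cuvette.
    stock = protein_concentration_molar(0.52, 26000.0)
    for recipe in unfolding_series([0.0, 2.0, 4.0, 6.0, 8.0], 1000.0, stock, 2.0e-6):
        print(f"{recipe.urea_molar:4.1f} M urea: {recipe.urea_stock_ul:6.1f} uL urea, "
              f"{recipe.protein_stock_ul:6.1f} uL protein, {recipe.buffer_ul:6.1f} uL buffer")
```

Refolding samples would need a variant of this calculation, because the unfolded protein stock itself contributes urea to each sample. The check on the buffer volume flags target concentrations that the available stocks cannot reach.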

2.2. Instrumentation

For the studies described here, fluorescence emission is measured using a PTI C-61 spectrofluorometer (Photon Technology International, Birmingham, NJ), and circular dichroism is measured using a PiStar spectropolarimeter (Applied Photophysics, Surrey, UK). Both instruments are equipped with water jackets to maintain a constant temperature during the experiment. It is useful to monitor unfolding by different techniques because intermediates not revealed by one technique may appear using another. In general, if the equilibrium-unfolding curves from multiple techniques coincide, then the data from one technique are sufficient to determine the conformational free energy. Conversely, deviation in the unfolding curves, or noncoincidence of the data, from one technique to another implies intermediates are present under equilibrium conditions. General parameters and considerations for each technique are listed below. Note that quality quartz cuvettes are used in both techniques.

2.2.1. Fluorescence emission

Certain features can be instrument and/or software specific; however, the same general parameters apply to the setup regardless of the instrument used. Adjustment of the slit width allows only the desired amount of light to enter the sample chamber and/or detector and, in part, determines the signal-to-noise ratio. Protein concentration will have the greatest effect on this setting. The reader is referred to the manufacturers’ guidelines for setting slit widths. Emission scans are acquired between 300 nm and 400 nm, following excitation at 280 nm and at 295 nm. The former provides information on the environmental changes of both tryptophan and tyrosine side chains due to the absorption wavelengths of both amino acids (280 nm and 275 nm, respectively). The latter provides a method to follow tryptophan emission selectively because there is little absorbance of tyrosines at 295 nm (Lakowicz, 2006).
In general, there are two methods for collecting equilibrium unfolding data using fluorescence emission. In the first, one will obtain an emission spectrum for the native and unfolded samples (described subsequently).

Jad Walters et al.

Then one will choose a single wavelength that provides the largest difference between the two samples (Fig. 1.1A). In subsequent experiments, one will examine fluorescence emission intensity at the prescribed wavelength versus urea, where the signal typically is averaged for 30 s. In the second method, one will obtain an emission spectrum at each concentration of denaturant and calculate the average emission wavelength (AEW) for each sample (Royer et al., 1993). A description of the advantages and disadvantages of using AEW is provided by Eftink (1994).

2.2.2. Circular dichroism

Secondary structure is monitored during unfolding by CD. Minima at 208 nm and 222 nm indicate α-helical structure, whereas a minimum at 217 nm is characteristic of β-sheet (Woody, 1995). Circular dichroism is strongest at the aforementioned wavelengths; however, light scattering by buffer components may require monitoring CD at somewhat higher wavelengths. We routinely monitor CD at 228 nm, which allows for detection of secondary structural changes in urea while avoiding amplification of the voltage at higher denaturant concentrations. As described previously for fluorescence emission, however, one should determine the wavelength that provides the greatest difference in signal between the native and unfolded protein samples. Slit width and scanning speeds should be adjusted according to the manufacturers’ specifications.
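For the second fluorescence method of section 2.2.1, the average emission wavelength is commonly taken as the intensity-weighted mean of the scanned wavelengths, AEW = Σ(I_i λ_i) / Σ I_i. The following is a minimal sketch under that assumption; the function name and the two short spectra are hypothetical and serve only to show that a red-shifted spectrum gives a larger AEW.

```python
def average_emission_wavelength(wavelengths_nm, intensities):
    """Intensity-weighted mean wavelength of a buffer-corrected emission scan."""
    if not wavelengths_nm or len(wavelengths_nm) != len(intensities):
        raise ValueError("need matching, non-empty wavelength and intensity lists")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("summed intensity must be positive")
    weighted = sum(w * i for w, i in zip(wavelengths_nm, intensities))
    return weighted / total


# Hypothetical five-point spectra (real scans cover 300-400 nm in small steps).
wavelengths = [320, 330, 340, 350, 360]
native_scan = [1.0, 2.0, 3.0, 2.0, 1.0]      # symmetric about 340 nm
unfolded_scan = [1.0, 1.0, 2.0, 3.0, 2.0]    # weighted toward longer wavelengths

aew_native = average_emission_wavelength(wavelengths, native_scan)
aew_unfolded = average_emission_wavelength(wavelengths, unfolded_scan)
print(f"AEW native: {aew_native:.2f} nm, unfolded: {aew_unfolded:.2f} nm")
```

Plotting the AEW of each sample against the urea concentration then gives an unfolding curve that uses the whole spectrum rather than a single wavelength.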

2.3. Preparation of 10 M urea stock

This protocol, adapted from the method described by Pace (1986), describes the preparation of 100 mL of 10 M urea stock containing 50 mM potassium phosphate buffer, pH 7.5. One should use an analytical balance that measures accurately to 0.1 mg. Potassium phosphate buffer is prepared separately in a volumetric flask (100 mL) using the chemicals in step 1 below and distilled, deionized water.

1. Separately weigh 191 mg of potassium phosphate monobasic, KH2PO4, and 650 mg of dibasic, anhydrous dipotassium hydrogen phosphate, K2HPO4, on weigh paper.
2. Weigh 60.0 g of ultrapure urea (purity >99%) in a weigh boat.
3. Combine both phosphates in a beaker with a minimum capacity of 200 mL; add a clean, dry stir bar; place on a scale and tare. Add the urea into the beaker and record the weight.
4. Add distilled, deionized water into the beaker to a weight of 114.6 g. Record the weight.
5. Cover the beaker and stir until the urea dissolves. One should expect this to take 3 to 4 h.

[Figure 1.1, panels A–E. Axes: (A) fluorescence emission (A.U.) versus emission wavelength, 300–400 nm; (B) fluorescence emission (A.U.) versus [urea], 0–8 M, with the pretransition (YN), transition, and posttransition (YU) regions labeled; (C–E) relative signal, 0–1.0, versus [urea], 0–8 M.]

Figure 1.1 (A) Emission spectra following excitation at 280 nm. Data for 0 M (○) and 8 M (□) urea are shown. In this example, the dotted line indicates a wavelength at which unfolding may be monitored due to a large difference in signal between the native and unfolded protein. (B) Equilibrium unfolding curve monitored by fluorescence emission at 320 nm (excitation at 280 nm). The pretransition, transition, and posttransition regions are indicated. (C) Normalized data demonstrating three probes used in the unfolding experiments. Unfolding was monitored by fluorescence emission following excitation at 280 nm (○) or 295 nm (□) or by CD (△). Refolded protein (●) demonstrates reversibility. (D) Noncoincidence of the unfolding curves when monitored by different spectroscopic techniques, suggesting a more complex folding mechanism than the two-state model suggested by a single technique. (E) Example of a three-state equilibrium-unfolding curve. Continuous lines in panels C–E represent fits of the data either to a two-state (panels C and D) or three-state (panel E) monomer unfolding model as described in the text.

6. When dissolved, check the pH using a recently calibrated pH meter. If the pH needs to be adjusted, then correct to pH 7.5 accordingly.
7. Filter before use.

Once the urea stock is prepared, the molarity is determined based on the recorded weights from steps 3 and 4 and from the refractive index, as described by Pace (1986). If the difference in the calculated molarity from each method is less than 10%, then the urea may be used. If it is greater than 10%, then the urea must be prepared again. Upon completion, the urea may be stored at −80 °C until used. A reducing agent such as DTT is added from a stock solution to the buffer and urea just prior to use. Methods for making stock urea in other buffers are described by Pace and Scholtz (1997).
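The two molarity estimates and the 10% agreement check can be sketched as follows. This is an illustration, not the chapter's own procedure: the density polynomial and the refractive-index polynomial are the urea coefficients as they are commonly quoted from Pace (1986) and should be verified against that reference before use. The function names, the water density of 0.997 g/mL, the definition of the percent difference relative to the mean, and the refractive-index reading are assumptions, and the small mass of buffer salts is neglected.

```python
UREA_MW = 60.056  # g/mol


def molarity_from_weights(urea_g, solution_g, water_density_g_mL=0.997):
    """Urea molarity from the weight of urea and the weight of urea plus water.

    Assumes d/d0 = 1 + 0.2658*W + 0.0330*W^2, with W the weight fraction of
    urea (coefficients as quoted from Pace, 1986; verify before use).
    """
    w = urea_g / solution_g
    density_g_mL = water_density_g_mL * (1.0 + 0.2658 * w + 0.0330 * w * w)
    volume_L = solution_g / density_g_mL / 1000.0
    return (urea_g / UREA_MW) / volume_L


def molarity_from_refraction(delta_n):
    """Urea molarity from delta_n = n(urea solution) - n(buffer).

    Assumes M = 117.66*dN + 29.753*dN^2 + 185.56*dN^3 (as quoted from Pace, 1986).
    """
    return 117.66 * delta_n + 29.753 * delta_n ** 2 + 185.56 * delta_n ** 3


def stocks_agree(molarity_a, molarity_b, max_percent=10.0):
    """True when the two estimates differ by less than max_percent of their mean."""
    mean = (molarity_a + molarity_b) / 2.0
    return abs(molarity_a - molarity_b) / mean * 100.0 < max_percent


by_weight = molarity_from_weights(60.0, 114.6)     # weights from steps 2-4
by_refraction = molarity_from_refraction(0.0824)   # hypothetical reading
print(f"{by_weight:.2f} M by weight, {by_refraction:.2f} M by refractive index")
print("usable" if stocks_agree(by_weight, by_refraction) else "prepare again")
```

With the weights given in steps 2 to 4, the weight-based estimate comes out close to the nominal 10 M, which is a useful sanity check on the recorded values.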

2.4. Confirm that the protein is completely unfolded

Before starting an equilibrium unfolding experiment, it is important to confirm that the unfolded sample has reached equilibrium and that the protein is completely unfolded, because many of the experiments begin with the unfolded protein. If insufficient time is allowed for the protein to unfold, or if one uses too low a denaturant concentration to fully unfold the protein, then subsequent experiments would begin with a species other than the unfolded state, and the data may be incorrect. To confirm the protein is completely unfolded after a certain incubation period, add native protein to 8 M urea-containing buffer, as shown subsequently, so that the final protein concentration is 1 μM and the final volume is 1 mL.

Urea (800 μL of 10 M urea stock)
DTT (10 μL of a 0.1 M stock)
Buffer (185 μL)
Native protein (5 μL of a 200 μM stock)

A second sample should be prepared in which the urea is replaced with buffer. Take a fluorescence emission scan from 300–400 nm of the native protein (no urea) and of the unfolding sample. Place the samples back in the reaction tube and incubate at the desired temperature (typically 25 °C) for 10 min. Take a second emission scan of the unfolded protein. Repeat this process until no change in signal is observed. If the protein is resistant to chemical denaturation, longer incubation times may be required between scans. Also, it is recommended that an additional scan be taken after the sample has incubated in urea for 24 h. One considers the protein equilibrated when no further signal change is observed. Depending on the protein, one might expect a red shift in the emission peak of unfolded protein with respect to the native protein (Fig. 1.1A), which shows that the protein tertiary structure is disrupted by the denaturant.

In separate experiments, the samples should be excited at 280 nm and at 295 nm, as described previously. Also, one should verify the results using CD to examine changes in secondary structure. Repeat the experiments as described above until no further change in CD is observed. Finally, the fluorescence and CD experiments should be repeated at several final urea concentrations in order to determine the concentration of urea that is sufficient to unfold the protein. The example shown here uses 8 M urea-containing buffer, but the protein may unfold at much lower urea concentrations. Conversely, if the protein does not unfold fully at high urea concentrations (9 M), then one should use a different denaturant, such as GdmHCl.

2.5. Establishing equilibration times and reversibility for folding reactions

2.5.1. Method 1

This protocol would serve well when no prior information on the folding of the protein of interest is available. The main purpose of this experiment is to determine the amount of time required to equilibrate the protein incubated in intermediate concentrations of urea. In general, this method requires one to set up unfolding and refolding samples and to monitor fluorescence emission over time until equilibrium is observed, that is, when the signals from the unfolding and refolding reactions are identical. Similar experiments also should be done using CD, as noted earlier. The unfolding samples are set up following the protocol shown in Table 1.1. The refolding samples are set up similarly with the exception that the starting material is unfolded protein in 8 M urea-containing buffer, as shown in Table 1.2. In both cases, the final protein concentration is 1 μM. One should note that the urea in the unfolded protein stock must be accounted for in the setup of the refolding samples. We typically set up the experiments in 2-mL siliconized Eppendorf tubes to prevent protein from sticking to the tube. All samples should be incubated at the desired temperature in a water bath.

Once the unfolding and refolding samples are assembled, fluorescence emission scans are taken for each sample from 300 nm to 400 nm. In separate experiments, the samples are excited at 280 nm and at 295 nm, as described previously. If the signals of the unfolding and refolding samples are identical, then the reaction has reached equilibrium. If the signals differ, then place each sample back in the reaction tube and incubate for longer time periods. Following incubation, take another emission scan, and repeat this process until the unfolding and refolding signals match for each final urea concentration, at which point equilibrium has been reached.
As stated previously, equilibration may take only a few minutes, or it could potentially take several hours or days. The experiments should be repeated by monitoring CD, as described previously.
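The comparison of unfolding and refolding signals can be made explicit with a simple tolerance test. The sketch below is only one way to do this: the function name, the 2% relative tolerance, and the intensity values are hypothetical, and in practice the tolerance would be chosen from the noise of the instrument.

```python
def signals_match(unfolding, refolding, rel_tol=0.02):
    """True when paired signals agree within rel_tol at every urea concentration.

    Both lists hold the signal at the same final urea concentrations, in the
    same order; the difference is judged relative to the larger magnitude.
    """
    if len(unfolding) != len(refolding):
        raise ValueError("unfolding and refolding lists must be the same length")
    for u, r in zip(unfolding, refolding):
        scale = max(abs(u), abs(r), 1e-12)  # guard against division by zero
        if abs(u - r) / scale > rel_tol:
            return False
    return True


# Hypothetical emission intensities (A.U.) at three final urea concentrations.
unfold_10min = [400000.0, 350000.0, 120000.0]
refold_10min = [380000.0, 250000.0, 118000.0]
unfold_4h = [400000.0, 300000.0, 119000.0]
refold_4h = [398000.0, 298000.0, 118500.0]

print("10 min equilibrated:", signals_match(unfold_10min, refold_10min))
print("4 h equilibrated:", signals_match(unfold_4h, refold_4h))
```

In this made-up data set the samples near the transition midpoint are the last to converge, which is the behavior that Method 2 exploits.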

Table 1.1 Unfolding reaction setup

Urea (μL)   Phosphate buffer (μL)   DTT (μL)   Native protein (μL)   Final urea (M)
0           985                     10         5                     0
50          935                     10         5                     0.5
100         885                     10         5                     1.0
150         835                     10         5                     1.5
200         785                     10         5                     2.0
250         735                     10         5                     2.5
300         685                     10         5                     3.0
350         635                     10         5                     3.5
400         585                     10         5                     4.0
450         535                     10         5                     4.5
500         485                     10         5                     5.0
550         435                     10         5                     5.5
600         385                     10         5                     6.0
650         335                     10         5                     6.5
700         285                     10         5                     7.0
750         235                     10         5                     7.5
800         185                     10         5                     8.0

Calculations are based on stock concentrations of 200 μM native protein, 10 M urea, 100 mM DTT, and 50 mM phosphate buffer, pH 7.5. The final protein concentration is 1 μM.
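The volumes in Table 1.1, and those of the refolding layout in Table 1.2, follow from simple dilution arithmetic in a 1 mL final volume. The sketch below recomputes them so the layout can be adapted to other stock concentrations or to more samples in the transition region. The function names and default arguments are illustrative; the defaults mirror the stocks stated in the table notes, and the refolding function subtracts the urea carried in with the unfolded protein stock.

```python
def unfolding_row(final_urea_M, total_uL=1000.0, urea_stock_M=10.0,
                  dtt_uL=10.0, protein_uL=5.0):
    """Return (urea stock uL, buffer uL) for one unfolding sample."""
    urea_uL = final_urea_M / urea_stock_M * total_uL
    buffer_uL = total_uL - urea_uL - dtt_uL - protein_uL
    if buffer_uL < 0:
        raise ValueError("final urea concentration too high for this stock")
    return urea_uL, buffer_uL


def refolding_row(final_urea_M, total_uL=1000.0, urea_stock_M=10.0,
                  dtt_uL=10.0, protein_uL=62.5, protein_urea_M=8.0):
    """Return (urea stock uL, buffer uL) for one refolding sample.

    The unfolded protein stock already contains urea, so its contribution
    to the final concentration is subtracted before adding urea stock.
    """
    carried_M = protein_uL * protein_urea_M / total_uL
    urea_uL = (final_urea_M - carried_M) / urea_stock_M * total_uL
    if urea_uL < -1e-9:
        raise ValueError("protein stock alone exceeds the requested urea")
    urea_uL = max(urea_uL, 0.0)
    buffer_uL = total_uL - urea_uL - dtt_uL - protein_uL
    if buffer_uL < 0:
        raise ValueError("final urea concentration too high for this stock")
    return urea_uL, buffer_uL


for step in range(1, 17):
    urea_M = 0.5 * step
    u_urea, u_buf = unfolding_row(urea_M)
    r_urea, r_buf = refolding_row(urea_M)
    print(f"{urea_M:3.1f} M  unfolding: {u_urea:5.1f}/{u_buf:5.1f} uL"
          f"  refolding: {r_urea:5.1f}/{r_buf:5.1f} uL")
```

The same arithmetic gives the final protein concentration: 5 μL of a 200 μM stock, or 62.5 μL of a 16 μM stock, in 1 mL is 1 μM.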

2.5.2. Method 2

The longest equilibration times are those at the transition midpoint(s) because the rates of folding and unfolding are approximately equal (Pace, 1986). Therefore, one can monitor fluorescence emission over time for protein incubated in urea concentrations near the midpoint of the transition. One sample includes native protein in urea-containing buffer (unfolding sample). The second sample contains unfolded protein in urea-containing buffer (refolding sample). In both samples, the final urea concentration is identical and is close to the midpoint for unfolding. The samples are incubated as described earlier, and fluorescence emission scans of both samples are collected until the signals are identical. This method provides a quick and efficient approach for determining equilibration times when some information is known concerning the protein of interest.

Table 1.2 Refolding reaction setup

Urea (μL)   Phosphate buffer (μL)   DTT (μL)   Unfolded protein (μL)   Final urea (M)
0           927.5                   10         62.5                    0.5
50          877.5                   10         62.5                    1.0
100         827.5                   10         62.5                    1.5
150         777.5                   10         62.5                    2.0
200         727.5                   10         62.5                    2.5
250         677.5                   10         62.5                    3.0
300         627.5                   10         62.5                    3.5
350         577.5                   10         62.5                    4.0
400         527.5                   10         62.5                    4.5
450         477.5                   10         62.5                    5.0
500         427.5                   10         62.5                    5.5
550         377.5                   10         62.5                    6.0
600         327.5                   10         62.5                    6.5
650         277.5                   10         62.5                    7.0
700         227.5                   10         62.5                    7.5
750         177.5                   10         62.5                    8.0

Calculations are based on an unfolded protein stock of 16 μM in 8 M urea-containing buffer, 10 M urea stock, 100 mM DTT stock, and 50 mM phosphate buffer, pH 7.5. The final protein concentration is 1 μM.

2.6. Equilibrium unfolding

In general, equilibrium unfolding experiments are set up in one of two ways. First, the simplest method is to use a titrator that accurately adds protein and urea-containing buffer into the cuvette. For this method, one need prepare only two solutions, native protein in buffer and unfolded protein in urea-containing buffer. The two solutions are prepared and allowed to reach equilibrium before data are collected. In this way, the protein concentration remains constant, but the urea concentration changes with each mixing. This method is preferred for proteins with short equilibration times. Second, for proteins with longer equilibration times (several minutes to hours), individual sample tubes are set up in the same manner as described previously. A typical experimental setup is shown in Tables 1.1 and 1.2. In this case, the final protein concentration is 1 μM and the final urea concentration is varied between 0 and 8 M. One should note that the protocol should be adjusted for each protein in order to maximize the number of samples in the transition region. The reactions are incubated for the established equilibration time prior to measurements of fluorescence emission and CD. Three data sets are obtained from each sample shown in Tables 1.1 and 1.2, one each from fluorescence emission following excitation at 280 nm or 295 nm and one from far-UV CD.

If the protein under study is a monomer, then only one protein concentration will be needed for the final analysis. Several concentrations should be tested, however, to verify that the protein does not oligomerize at higher protein concentrations. For a monomer, there should be no difference in the unfolding curves from one protein concentration to the next. If this is the case, then the data may be averaged. If the protein is a dimer or higher-order oligomer, then multiple protein concentrations should be examined in order to determine the concentration-dependent transition. Typically, protein concentrations over a ten-fold range (at least) are used.

2.7. Interpretation of equilibrium-unfolding curves

The two-state equilibrium folding mechanism has been described in detail by Pace and others (Greene et al., 1974; Pace, 1986; Scholtz, 1995; Saito and Wada, 1983), where the native and unfolded ensembles are present in the absence of well-populated intermediate conformations. The goal of this section is to provide a comprehensive outline for analyzing and interpreting the data from an equilibrium folding experiment. We describe fitting for the following folding models: two-state monomer, three-state monomer, two-state dimer, three-state homodimer, three-state heterodimer, and four-state homodimer. More complicated mechanisms have been described for some proteins, such as the four-state monomeric model described by Enoki (2006), for example, but they are not considered here. While we will not derive the equations used in fitting the data, Tables 1.3 and 1.4 show the equations used in the fitting process for each respective model and the definition of each molar fraction. In addition, the references provided describe derivations for the models discussed here.

Table 1.3 Equilibrium-folding models for monomeric proteins

2-state model
  Mechanism: $N \overset{K_{eq}}{\rightleftharpoons} U$
  Equilibrium constant and total protein concentration: $K = [U]/[N]$; $P_T = [N] + [U]$
  Definition of molar fraction $f_N$: $f_N = \dfrac{1}{1 + K}$
  Definition of molar fraction $f_I$: not applicable
  Definition of molar fraction $f_U$: $f_U = \dfrac{K}{1 + K}$
  Fitting equation: $Y = Y_N f_N + Y_U f_U$

3-state model
  Mechanism: $N \overset{K_1}{\rightleftharpoons} I \overset{K_2}{\rightleftharpoons} U$
  Equilibrium constants and total protein concentration: $K_1 = [I]/[N]$; $K_2 = [U]/[I]$; $P_T = [N] + [I] + [U]$
  Definition of molar fraction $f_N$: $f_N = \dfrac{1}{1 + K_1 + K_1 K_2}$
  Definition of molar fraction $f_I$: $f_I = \dfrac{K_1}{1 + K_1 + K_1 K_2}$
  Definition of molar fraction $f_U$: $f_U = \dfrac{K_1 K_2}{1 + K_1 + K_1 K_2}$
  Fitting equation: $Y = Y_N f_N + Y_I f_I + Y_U f_U$

Notes: N, native state; I, intermediate state; U, unfolded state; PT, total molar concentration of the protein; fN, fI, and fU are the mole fractions of the respective species. YN, YI, and YU are the amplitudes of the spectroscopic signal for the specified species.
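The mole fractions in Table 1.3 translate directly into code. The following minimal sketch evaluates them for given equilibrium constants and combines them with signal amplitudes, as in the fitting equations. The function names and the numerical values in the example are illustrative choices, not part of the original protocol.

```python
def two_state_fractions(K):
    """Return (fN, fU) for N <-> U with K = [U]/[N]."""
    return 1.0 / (1.0 + K), K / (1.0 + K)


def three_state_fractions(K1, K2):
    """Return (fN, fI, fU) for N <-> I <-> U with K1 = [I]/[N], K2 = [U]/[I]."""
    denominator = 1.0 + K1 + K1 * K2
    return 1.0 / denominator, K1 / denominator, K1 * K2 / denominator


def observed_signal(fractions, amplitudes):
    """Signal as the sum of fractional contributions, Y = sum(Y_i * f_i)."""
    return sum(f * y for f, y in zip(fractions, amplitudes))


# Illustrative values: an intermediate that is ten-fold favored over N,
# with the unfolded state disfavored relative to I.
f_native, f_intermediate, f_unfolded = three_state_fractions(K1=10.0, K2=0.1)
y = observed_signal((f_native, f_intermediate, f_unfolded), (1.0, 0.6, 0.0))
print(f"fN={f_native:.3f} fI={f_intermediate:.3f} fU={f_unfolded:.3f} Y={y:.3f}")
```

Because the fractions always sum to one, a well-populated intermediate with its own amplitude YI is what produces the plateau of a biphasic curve.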

Table 1.4 Equilibrium-folding models for homodimeric proteins

2-state model
  Mechanism: $N_2 \overset{K_{eq}}{\rightleftharpoons} 2U$
  Equilibrium constant and total protein concentration: $K = [U]^2/[N_2]$; $P_T = 2[N_2] + [U]$
  Definition of molar fraction $f_{N_2}$: $f_{N_2} = 1 - f_U$
  Definition of molar fraction $f_{I_2}$: not applicable
  Definition of molar fraction $f_I$: not applicable
  Definition of molar fraction $f_U$: $f_U = \dfrac{-K + \sqrt{K^2 + 8 K P_T}}{4 P_T}$
  Fitting equation: $Y = Y_{N_2} f_{N_2} + Y_U f_U$

3-state model (monomeric intermediate)
  Mechanism: $N_2 \overset{K_1}{\rightleftharpoons} 2I \overset{K_2}{\rightleftharpoons} 2U$
  Equilibrium constants and total protein concentration: $K_1 = [I]^2/[N_2]$; $K_2 = [U]/[I]$; $P_T = 2[N_2] + [I] + [U]$
  Definition of molar fraction $f_{N_2}$: $f_{N_2} = 1 - f_I - f_U$
  Definition of molar fraction $f_{I_2}$: not applicable
  Definition of molar fraction $f_I$: $f_I = \dfrac{-(K_1 + K_1 K_2) + \sqrt{(K_1 + K_1 K_2)^2 + 8 K_1 P_T}}{4 P_T}$
  Definition of molar fraction $f_U$: $f_U = K_2 f_I$
  Fitting equation: $Y = Y_{N_2} f_{N_2} + Y_I f_I + Y_U f_U$

3-state model (dimeric intermediate)
  Mechanism: $N_2 \overset{K_1}{\rightleftharpoons} I_2 \overset{K_2}{\rightleftharpoons} 2U$
  Equilibrium constants and total protein concentration: $K_1 = [I_2]/[N_2]$; $K_2 = [U]^2/[I_2]$; $P_T = 2[N_2] + 2[I_2] + [U]$
  Definition of molar fraction $f_{N_2}$: $f_{N_2} = 1 - f_{I_2} - f_U$
  Definition of molar fraction $f_{I_2}$: $f_{I_2} = \dfrac{2 f_U^2 P_T}{K_2}$
  Definition of molar fraction $f_I$: not applicable
  Definition of molar fraction $f_U$: $f_U = \dfrac{-K_1 K_2 + \sqrt{(K_1 K_2)^2 + 8 P_T (K_1 K_2 + K_1^2 K_2)}}{4 P_T (1 + K_1)}$
  Fitting equation: $Y = Y_{N_2} f_{N_2} + Y_{I_2} f_{I_2} + Y_U f_U$

4-state model
  Mechanism: $N_2 \overset{K_1}{\rightleftharpoons} I_2 \overset{K_2}{\rightleftharpoons} 2I \overset{K_3}{\rightleftharpoons} 2U$
  Equilibrium constants and total protein concentration: $K_1 = [I_2]/[N_2]$; $K_2 = [I]^2/[I_2]$; $K_3 = [U]/[I]$; $P_T = 2[N_2] + 2[I_2] + [I] + [U]$
  Definition of molar fraction $f_{N_2}$: $f_{N_2} = 1 - f_{I_2} - f_I - f_U$
  Definition of molar fraction $f_{I_2}$: $f_{I_2} = \dfrac{2 f_I^2 P_T}{K_2}$
  Definition of molar fraction $f_I$: $f_I = \dfrac{f_U}{K_3}$
  Definition of molar fraction $f_U$: $f_U = \dfrac{-K_1 K_2 K_3 (1 + K_3) + \sqrt{K_1^2 K_2^2 K_3^2 (1 + K_3)^2 + 8 P_T (1 + K_1)(K_1 K_2 K_3^2)}}{4 P_T (1 + K_1)}$
  Fitting equation: $Y = Y_{N_2} f_{N_2} + Y_{I_2} f_{I_2} + Y_I f_I + Y_U f_U$

Notes: Abbreviations used are the same as described for Table 1.3, with the addition of N2 representing the native homodimer and I2 representing the homodimeric intermediate. fN2 and fI2 are the mole fractions of the native homodimer and of the dimeric intermediate, respectively, and YN2 and YI2 are the amplitudes of the spectroscopic signal for the specified species.
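The protein-concentration dependence built into the dimer models can be seen by evaluating the expressions of Table 1.4 numerically. The sketch below implements the two-state dimer and the three-state model with a dimeric intermediate; the function names and the example values of K and PT are illustrative, and concentrations are assumed to be in molar units so that K for the dissociation step has units of M.

```python
import math


def two_state_dimer_fU(K, PT):
    """Fraction unfolded for N2 <-> 2U, with K = [U]^2/[N2] and PT in monomer units."""
    return (-K + math.sqrt(K * K + 8.0 * K * PT)) / (4.0 * PT)


def dimeric_intermediate_fractions(K1, K2, PT):
    """Return (fN2, fI2, fU) for N2 <-> I2 <-> 2U.

    K1 = [I2]/[N2] is unitless; K2 = [U]^2/[I2] has units of concentration.
    """
    k1k2 = K1 * K2
    root = math.sqrt(k1k2 ** 2 + 8.0 * PT * (k1k2 + K1 ** 2 * K2))
    f_u = (-k1k2 + root) / (4.0 * PT * (1.0 + K1))
    f_i2 = 2.0 * f_u ** 2 * PT / K2
    return 1.0 - f_i2 - f_u, f_i2, f_u


# Same K, ten-fold different total protein: the dimer is favored at higher PT.
f_low = two_state_dimer_fU(K=1e-6, PT=1e-6)
f_high = two_state_dimer_fU(K=1e-6, PT=1e-5)
print(f"fU at 1 uM: {f_low:.2f}; fU at 10 uM: {f_high:.2f}")

f_n2, f_i2, f_u = dimeric_intermediate_fractions(K1=2.0, K2=1e-6, PT=1e-6)
print(f"fN2={f_n2:.3f} fI2={f_i2:.3f} fU={f_u:.3f}")
```

At a fixed denaturant concentration (fixed K), raising the total protein concentration lowers the fraction unfolded, which is why the transition midpoint moves to higher urea at higher protein concentration.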

During the following discussion we will refer the reader to several unfolding curves that were generated for visualization purposes and that describe what one may expect from a typical unfolding experiment. Raw data, corrected for buffer background (Fig. 1.1B), are normalized between zero (unfolded) and one (native), as shown in Eq. 1.1, in order to visualize different spectroscopic signals on a single scale.

\[ Y_{\mathrm{Normalized}} = \frac{Y_X - Y_U}{Y_N - Y_U} \tag{1.1} \]

In this case, YX is the signal being normalized, YU is the signal of the unfolded protein, and YN is the signal of the native protein. The latter two signals are in the absence of urea, where YN and YU are determined from linear fits of data in the pre- and posttransition regions and extrapolated to zero denaturant, as shown in Fig. 1.1B. A plot of the normalized signal versus the denaturant concentration generates the denaturation or unfolding curve (Fig. 1.1C) for each of the spectroscopic probes.

When analyzing denaturation curves, there are three regions one must take note of, regardless of the mechanism. The pretransition region shows the dependence of the native protein signal on denaturant concentration and is represented in the unfolding curve in Fig. 1.1B between 0 and 2.5 M urea. The transition region represents a mixture of the native ensemble and the unfolded ensemble (or intermediate if present). Once again, examining the data in Fig. 1.1B, this region is observed between 3 and 6 M urea. The posttransition region represents the denaturant concentrations where the unfolded ensemble is largely populated. This region occurs between 6 and 8 M urea in Fig. 1.1B. The posttransition must be sufficiently defined for two reasons. First, and most important, it shows that the protein is unfolded. Second, fits of the data will be inaccurate if no posttransition region is present. If a posttransition region is not observed, then one must use higher concentrations of denaturant or a stronger chaotrope to ensure that the protein is completely unfolded.

2.7.1. Monomeric models

The simplest mechanism by which a protein unfolds is a two-state process, where the native protein cooperatively unfolds to the unfolded ensemble, as shown in Eq. 1.2.

\[ N \overset{K_{eq}}{\rightleftharpoons} U \tag{1.2} \]

In this model, N represents the native ensemble, U represents the unfolded or denatured ensemble, and Keq represents the equilibrium constant for the reaction. As shown in Fig. 1.1C, the denaturation curve displays one cooperative transition between 3 and 6 M urea. The closed circles represent the refolding reaction, demonstrating that the protein folds reversibly. While the data imply a two-state mechanism, the model should be validated by employing other techniques to monitor unfolding. For example, the equilibrium-unfolding techniques described previously illustrate three probes for monitoring unfolding, two for tertiary structure, and one for secondary structure. Agreement among the different spectroscopic probes implies a two-state mechanism (Fig. 1.1C), while noncoincidence of the folding curves indicates a more complex unfolding mechanism, typically due to the population of nonnative, partially folded intermediates (Fig. 1.1D). One should note, however, that agreement in the unfolding curves is only consistent with a two-state mechanism and does not prove this model with certainty (Lumry, 1966).

A biphasic unfolding curve demonstrates a three-state unfolding mechanism, where the native protein unfolds through a partially structured intermediate before completely unfolding (Fig. 1.1E). A general three-state model of equilibrium unfolding can be described by Eq. 1.3.

\[ N \overset{K_1}{\rightleftharpoons} I \overset{K_2}{\rightleftharpoons} U \tag{1.3} \]

Here, N, I, and U represent the native, intermediate, and unfolded ensembles, respectively, and K1 and K2 represent the equilibrium constants for the two reactions. A typical biphasic unfolding curve is shown in Fig. 1.1E, where a plateau is observed between 3 M and 5.5 M urea in this example. The data demonstrate two transitions that correspond to the transition of N to I (1 M to 3 M urea) and of I to U (5.5 M to 7 M urea).

2.7.2. Dimeric models

Overall, the unfolding reaction must begin with the native dimer, N2, and end with two unfolded monomers. However, the pathway by which the dimer unfolds can include one or more intermediates (two-state dimer vs. three-state dimer mechanism). In addition, the intermediate can be dimeric or monomeric. The most basic model for dimer unfolding resembles the simple two-state mechanism for the monomer (Eq. 1.2), where the native dimer dissociates to the two unfolded monomers in a single transition, as shown in Eq. 1.4.

\[ N_2 \overset{K_{eq}}{\rightleftharpoons} 2U \tag{1.4} \]

In this model, N2 represents the native dimer, 2U corresponds to the unfolded monomers, and Keq represents the equilibrium constant for the reaction. Fig. 1.2A shows the expected sigmoidal unfolding curve for a dimer, where a single transition is observed. The apparent stability of the dimer is dependent on the concentration of monomer. In other words, as the concentration of protein increases, the equilibrium shifts toward N2. Therefore, one observes a shift to higher transition midpoints at higher protein concentrations (Fig. 1.2A). To validate a two-state dimer model, we recommend examining several structural probes as outlined above in section 2.7.1 for a two-state monomer.

[Figure 1.2, panels A–D. Axes: relative signal, 0–1.0, versus [urea], 0–8 M.]

Figure 1.2 (A) Example of an equilibrium-unfolding curve for a dimeric protein that follows a two-state unfolding model. Protein concentration dependence is demonstrated by an increase in the transition midpoint as the protein concentration is increased (○ > □ > △). (B) Example of a three-state dimer-unfolding model in which the protein concentration dependence is observed in the first transition, demonstrating the presence of a monomeric intermediate. (C) Example of a three-state dimer-unfolding model in which the protein concentration dependence is observed in the second transition, demonstrating the presence of a dimeric intermediate. (D) Example of a four-state dimer-unfolding model. In this case, the midpoint of the first transition is the same for each protein concentration, while the second transition midpoint increases with increasing protein concentration (○ > ◊). The relative signal observed between 3 M and 5 M urea also increases with increasing protein concentration, revealing a four-state unfolding process as described in the text.

If one establishes that the protein of interest follows a three-state model, then there are two pathways by which the protein can unfold. In the first case, a monomeric intermediate is populated (Eq. 1.5), and in the second case, a dimeric intermediate is populated (Eq. 1.6).

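\[ N_2 \overset{K_1}{\rightleftharpoons} 2I \overset{K_2}{\rightleftharpoons} 2U \tag{1.5} \]

\[ N_2 \overset{K_1}{\rightleftharpoons} I_2 \overset{K_2}{\rightleftharpoons} 2U \tag{1.6} \]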

Here, N2 represents the native dimer, 2I represents the monomeric intermediate, I2 represents the dimeric intermediate, 2U represents the unfolded monomers, and K1 and K2 represent the equilibrium constants for each transition. A biphasic unfolding curve would be expected in each case if the intermediate is well populated, and one of the two transitions will be dependent on the protein concentration (Fig. 1.2B–C). If the first step is protein concentration dependent (Eq. 1.5, Fig. 1.2B), then the dimer dissociates in the first transition to yield a monomeric intermediate. In contrast, protein concentration dependence in the second transition indicates that subunit dissociation occurs after formation of a dimeric intermediate (Eq. 1.6, Fig. 1.2C).

While we present several unfolding models for homodimers in Table 1.4, one should note that similar models exist for heterodimers, depending on how the subunits of the heterodimer are treated. In equilibrium studies of the heterodimeric bacterial luciferase (Clark et al., 1993) and of the histone H2A/H2B (Placek et al., 2005), for example, the subunits were assumed to be identical. In this case, the data are then treated as one would for a homodimer. Only when the subunits of the heterodimer are treated differently will the data analysis vary from that of a homodimer.

If one establishes the presence of two intermediates in the equilibrium-unfolding pathway of the dimer, then there are three possible four-state models. For these models, dimer dissociation occurs as the first step (Eq. 1.7), the second step (Eq. 1.8), or the third step (Eq. 1.9) in unfolding.

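\[ N_2 \overset{K_1}{\rightleftharpoons} 2I_x \overset{K_2}{\rightleftharpoons} 2I_y \overset{K_3}{\rightleftharpoons} 2U \tag{1.7} \]

\[ N_2 \overset{K_1}{\rightleftharpoons} I_2 \overset{K_2}{\rightleftharpoons} 2I \overset{K_3}{\rightleftharpoons} 2U \tag{1.8} \]

\[ N_2 \overset{K_1}{\rightleftharpoons} I_{2x} \overset{K_2}{\rightleftharpoons} I_{2y} \overset{K_3}{\rightleftharpoons} 2U \tag{1.9} \]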

For these models, N2 represents the native dimer, I2 represents the dimeric intermediate, 2I represents the monomeric intermediates, 2U represents the unfolded monomers, and K1 – K3 represent the equilibrium constants for the three transitions. Examples are known for the first two models, but we are not aware of examples in the literature for dimer dissociation in the third step (Eq. 1.9). The dimer of glycyl tRNA synthetase was shown to dissociate in the first step of unfolding, as described by Eq. 1.7 (Dignam et al., 2001), while the dimer of procaspase-3 was shown to dissociate in the second unfolding step, as described by Eq. 1.8 (Bose and Clark, 2001).

The unfolding data for procaspase-3 suggest a minimum three-state unfolding mechanism because two transitions are observed (Fig. 1.2D). However, when the experiments are carried out at several protein concentrations, one observes that the signal for the plateau shifts to higher values at higher protein concentrations. The curves in Fig. 1.2D represent increasing concentration of procaspase-3, between 0.25 μM and 2 μM. Thus, the native dimer isomerizes to a dimeric intermediate (urea$_{1/2}$ = 2.5 M), followed by dissociation of the dimer to two monomers (3–5 M urea). The monomers then unfold at higher urea concentrations (6–8 M urea). Taken together, these data reveal two important points. First, dimerization is considered a folding event because dimerization occurs as a result of the association of two monomeric intermediates. Second, fitting of the data to Eq. 1.8 reveals that dimerization contributes significantly to the overall stability of the protein (see subsequent section for further explanation of data fitting).

2.8. Data analysis

2.8.1. Equilibrium constants and fractions of species

For the monomeric folding models described by Eqs. 1.2 and 1.3, the equilibrium constants for the transitions are related to free energy as shown in Eq. 1.10.

\[ \Delta G = -RT \ln(K_{eq}) \tag{1.10} \]

Here, R is the gas constant and T is the absolute temperature in kelvin. If one assumes that the free energy change for each step of the reaction is linearly dependent on the denaturant concentration (Pace, 1986), then one may calculate the free energy change in the absence of denaturant, as shown in Eq. 1.11.

\[ \Delta G = \Delta G^{H_2O} - m[\mathrm{denaturant}] \tag{1.11} \]

In this case, $\Delta G^{H_2O}$ represents the free energy change in the absence of denaturant, and m represents the cooperativity index associated with the reaction. While not discussed here, m is related to the solvent accessible surface area for each transition, and a comparison of m-values for related proteins has been shown to be quite informative (Myers et al., 1995). For the two-state monomer described by Eq. 1.2, the sum of the fraction of N (fN) and of the fraction of U (fU) is one, and the total protein concentration is the sum of the concentrations of N and U (PT = [N] + [U]) at a given concentration of urea (Table 1.3). As a result, the apparent fraction of unfolded species is shown by Eq. 1.12. In this case, Y represents the signal obtained at each urea concentration.

$$f_{app} = \frac{Y_N - Y}{Y_N - Y_U} = \frac{K}{1 + K} \tag{1.12}$$
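As a minimal sketch of how Eqs. 1.10–1.12 combine, the following Python function computes the equilibrium constant and the fraction of unfolded protein at a given urea concentration. The function name, the kcal/mol units, the default temperature of 25 °C, and the parameter values in the loop are assumptions chosen only for illustration; they are not taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def fraction_unfolded(urea, dg_h2o, m, temp_k=298.15):
    """Two-state fraction of unfolded protein from Eqs. 1.10-1.12.

    urea:   denaturant concentration (M)
    dg_h2o: free energy of unfolding in the absence of denaturant (kcal/mol)
    m:      cooperativity index (kcal/mol/M)
    """
    dg = dg_h2o - m * urea                    # Eq. 1.11
    k_eq = math.exp(-dg / (R_KCAL * temp_k))  # Eq. 1.10 solved for K
    return k_eq / (1.0 + k_eq)                # Eq. 1.12


# Hypothetical parameters (not from the text): 5 kcal/mol and 1.25 kcal/mol/M.
for urea in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(urea, round(fraction_unfolded(urea, dg_h2o=5.0, m=1.25), 4))
```

With these assumed values the free energy change is zero at 4 M urea, so half of the protein is unfolded at that concentration.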

Taking into account Eqs. 1.10–1.12 and solving for Y, one can derive Eq. 1.13, as described previously (Santoro and Bolen, 1988).

$$Y = Y_N f_N + Y_U f_U = \frac{(Y_N^0 + m_N[\text{urea}]) + (Y_U^0 + m_U[\text{urea}])\exp\left(-\frac{\Delta G^{\mathrm{H_2O}} - m[\text{urea}]}{RT}\right)}{1 + \exp\left(-\frac{\Delta G^{\mathrm{H_2O}} - m[\text{urea}]}{RT}\right)} \tag{1.13}$$

Here, $m_N$ and $m_U$ account for changes in the signals of the pre- and posttransition regions with changes in denaturant, if any. $Y_N^0$ and $Y_U^0$ represent the signal of the native and unfolded species, respectively, in the absence of denaturant. Inherent in this description, as well as the subsequent ones, is that the measured signal is the sum of the fractional contribution of each species. Thus, Eq. 1.13 is used to describe the simple two-state model for equilibrium unfolding of a monomeric protein. Equations for the three-state equilibrium-folding model for a monomer are provided in Table 1.3, taking into account both equilibrium constants in terms of the fractions of the three species. Using a similar analysis, the data for homodimeric proteins are analyzed to obtain the free energy change with each step in unfolding. Equations are provided in Table 1.4 for the two-state Eq. 1.4 (Bowie and Sauer, 1989;


Gloss and Matthews, 1997), three-state Eqs. 1.5 and 1.6 (Clark et al., 1993; Grimsley et al., 1997; Harder et al., 2004; Hornby et al., 2000; Park and Bedouelle, 1998), and four-state Eq. 1.8 (Bose and Clark, 2001) models.

2.8.2. Fitting equilibrium-unfolding data

There are a multitude of programs available for fitting protein-folding data, and certain features vary depending on the software. Simple fitting procedures, such as that for the two-state monomer (Eq. 1.2 and Table 1.3), are readily performed in programs such as KaleidaGraph (Synergy Software), SigmaPlot (Systat Software), or Excel (Microsoft). However, fitting multiple data sets should be done globally. For example, data collected from the three spectroscopic probes described here, and especially data that suggest more complicated folding mechanisms (Eqs. 1.4–1.9, Table 1.4), should be fit simultaneously to one of the models shown in Tables 1.3 and 1.4, or to another appropriate model to describe the data. The advantage of global fitting is that parameters that are common to all data sets, ΔG and m-values, for example, are linked, whereas other parameters, such as $Y_N^0$ and $Y_U^0$ (Eq. 1.13), are set locally because they vary between data sets.
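The distinction between linked and local parameters can be sketched in a few lines of Python. The block below evaluates Eq. 1.13 and then sums the squared residuals over several data sets that share ΔG and m but keep their own baselines; this is the quantity a nonlinear least-squares routine would minimize in a global fit. The function names, the two synthetic "probes", and all numerical values are hypothetical, and no particular fitting program is implied.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def two_state_signal(urea, dg_h2o, m, yn0, mn, yu0, mu, temp_k=298.15):
    """Signal predicted by Eq. 1.13 for a two-state monomer."""
    k_eq = math.exp(-(dg_h2o - m * urea) / (R_KCAL * temp_k))
    native = yn0 + mn * urea      # pretransition baseline
    unfolded = yu0 + mu * urea    # posttransition baseline
    return (native + unfolded * k_eq) / (1.0 + k_eq)


def global_ssr(dg_h2o, m, local_params, datasets):
    """Sum of squared residuals over all data sets.

    dg_h2o and m are shared (linked) by every data set; each data set has
    its own (yn0, mn, yu0, mu) tuple in local_params.
    """
    total = 0.0
    for (yn0, mn, yu0, mu), points in zip(local_params, datasets):
        for urea, y_obs in points:
            y_calc = two_state_signal(urea, dg_h2o, m, yn0, mn, yu0, mu)
            total += (y_obs - y_calc) ** 2
    return total


# Synthetic, noise-free data for two hypothetical probes with the same
# dg_h2o (5 kcal/mol) and m (1.25 kcal/mol/M) but different baselines.
ureas = [0.5 * i for i in range(17)]
probes = [(1.0, 0.0, 0.2, 0.0), (-20.0, 0.1, -2.0, 0.05)]
data = [[(u, two_state_signal(u, 5.0, 1.25, *p)) for u in ureas] for p in probes]

print(global_ssr(5.0, 1.25, probes, data))        # zero at the generating values
print(global_ssr(4.0, 1.25, probes, data) > 0.0)  # any other dg_h2o fits worse
```

Because both data sets constrain the same ΔG and m, a wrong value of either parameter is penalized by every probe at once, which is the practical benefit of linking them.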

3. Measuring Folding Kinetics

In contrast to the equilibrium experiments described previously, kinetic experiments examine refolding or unfolding processes by monitoring changes in a spectroscopic signal over time following the initiation of the reaction. A number of methods have been developed to examine protein folding kinetics, from continuous flow instruments, which measure reactions on the microsecond time scale (Shastry et al., 1998), to stopped-flow instruments, which measure reactions on the millisecond to minute time scale. Stopped-flow instruments, the focus of this section, can employ absorbance, fluorescence emission, circular dichroism, or other spectroscopies as a detection method. We describe here experimental protocols and data-fitting procedures for kinetic folding experiments utilizing stopped-flow fluorescence emission and circular dichroism spectroscopies.

3.1. Experimental protocol

3.1.1. General considerations

Kinetic experiments with a stopped-flow instrument require rapid mixing of two solutions. For both refolding and unfolding studies, one routinely uses asymmetric mixing, 1:10 for example, either to dilute (refolding) or increase (unfolding) the amount of denaturant to span the unfolding or refolding transition region that was determined from equilibrium-folding


experiments. The 1:10 ratio is obtained typically by using one small drive syringe (e.g., 250 μL) and one large drive syringe (e.g., 2.5 mL). One should consult the instrument manufacturer to determine the mixing dead-time, which is the shortest time at which one can measure the kinetic signal due to the time required to mix the two solutions, typically 1–10 ms.

While the instrumental setup will vary depending on the manufacturer, there are several parameters common to all instruments, including signal detection, temperature, slit widths, wavelengths, and time scale. Each parameter will be discussed briefly.

As described previously, it is recommended to use multiple spectroscopic probes to study protein folding because each detection method provides different information about the structures formed during refolding or unfolding. For fluorescence emission studies, there are two options available. The detection photomultiplier tube can be attached directly to the sample-handling unit, or it can be attached to a monochromator. The advantage of the latter method is that it allows the user to select a particular detection wavelength. The disadvantage is that the overall signal will be decreased because other emission wavelengths are filtered out such that only a fraction of the total emission is detected. In contrast, a cutoff filter is used if the photomultiplier tube is attached to the sample-handling unit. For this case, there are a variety of filters available, where the most common for intrinsic protein fluorescence emission use cutoff wavelengths of 305 nm or 320 nm.

It is important to maintain a constant temperature around the observation cell and drive syringes. The most common temperature for kinetic folding experiments reported in the literature is 25 °C (Maxwell et al., 2005; Zarrine-Afsar and Davidson, 2004). This is because 25 °C is slightly above room temperature, which allows easy control with heating, and it is an adequate endpoint for temperature-jump studies. If a reaction is too fast to be detected by a stopped-flow instrument, it may be helpful to slow the chemistry by lowering the temperature. One should note that studies at low temperatures (<10 °C) will require low temperature syringes to prevent sample leakage during injections.

Kinetic traces should be collected until the signal reaches that of the native or unfolded control in order to examine the full time course for the reaction. Depending on the protein, equilibration could occur in milliseconds or in hours. One should note that longer collection times can result in artifacts due to remixing of solutions of different densities in the observation cell and flow lines. One can test for mixing artifacts using various fluorescent compounds, such as tryptophan. Hand mixing experiments, which are performed on a steady-state instrument set for time-based data collection, also can verify slow reactions.

All experiments described below should be completed at multiple protein concentrations in (at least) duplicate experiments in order to verify the apparent kinetic rates and amplitude changes. One should use a ten-fold or


greater variation in protein concentration, which will allow for the identification of protein concentration-dependent steps if the protein is an oligomer. If the protein is a monomer, then the apparent rate(s) should not depend on the protein concentration. If one observes this experimentally, then the data from the various protein concentrations can be averaged. If one observes a protein concentration-dependent rate during the folding or unfolding of a monomer, then the data indicate that the protein forms aggregates during the reaction.

3.1.2. Initial parameterization

Two samples are used to determine the initial parameters (final protein concentration and slit widths) needed for a kinetic experiment: native protein in buffer and unfolded protein in urea-containing buffer. First, native protein stock is placed in the small syringe, mixed with buffer (from the large syringe), and data are collected for several seconds. Second, unfolded protein stock is placed in the small syringe, mixed with urea-containing buffer (from the large syringe) such that there is no dilution of the urea, and data are collected for several seconds. For each experiment, there should be no change in signal over time because the samples serve as controls for the native and unfolded signals. The unfolded protein stock should contain the same final urea concentration as the urea-containing buffer with which it is mixed. The general procedure for these initial experiments is to examine one protein concentration and slit width for various time scales, then to change the slit widths and repeat the experiment until the maximum difference is observed between the native and unfolded control samples. One may find that the optimized slit widths and protein concentrations are similar to those used in equilibrium unfolding experiments.

In general, total volumes of 1 mL of protein and 10 mL of buffer solutions are sufficient for determining one set of parameters (e.g., slit width and voltage used for a particular final protein concentration). In the experiments described subsequently, unfolded protein will be mixed with different urea-containing buffers in order to examine refolding at several final urea concentrations. The number of urea steps and volumes required will vary, depending on the protein and on the refolding conditions. The procedure outlined here is useful for examining burst-phase kinetics, apparent rates and amplitudes of observable phases, and for generating chevron plots of the apparent rate constants.

3.1.3. Sample preparation for measuring refolding and unfolding kinetics

First, one should determine the final urea and protein concentrations. Eq. 1.14 describes the necessary calculation assuming a 1:10 mixing ratio.


$$\frac{([\text{urea}]_{\text{protein stock}} \times 1) + ([\text{urea}]_{\text{urea stock}} \times 10)}{11} = \text{final urea concentration} \tag{1.14}$$

For example, if the protein stock solution is in 8 M urea-containing buffer (1 part = 250 μL syringe) and is mixed with a 0 M urea-containing buffer (10 parts = 2.5 mL syringe), then the final urea concentration will be 0.73 M using Eq. 1.14. Similar calculations are used to determine the final protein concentration as well.

To measure refolding kinetics, one should prepare two stock protein samples (unfolded and native protein) and various urea-containing buffers that encompass the refolding transition. For example, in a typical experiment that measures the refolding of procaspase-3, we routinely use 17 urea concentrations between 0 M and 8 M at 0.5 M increments, and we average 20 injections per urea step. This requires 10 mL of urea-containing buffer for each final urea concentration and 15 mL of protein stock. Table 1.5 shows the procedure for preparing urea-containing buffer solutions as well as the final urea concentration calculated using Eq. 1.14. If necessary, the urea concentrations should be adjusted in order to cover all regions of the equilibrium-unfolding curve. Table 1.6 shows the calculations for making the unfolded and native protein stocks, assuming that the protein unfolds in 8 M urea-containing buffer and that the final protein concentration in the observation cell is 5 μM. A native protein stock is used as a control in order to determine the final signal in the refolding process (Table 1.6). The native stock should contain the same buffer as the final refolding sample: 0.73 M urea-containing buffer in this example. Incubate the protein samples for the predetermined equilibration time, and then follow the subsequent instrument procedure.

To measure unfolding kinetics, one will mix native protein with different urea-containing buffers. The sample preparation basically is the same as for the refolding experiment except for the initial protein stock. Prepare the urea-containing buffers as described above (Table 1.5) and the unfolded protein and controls as shown in Table 1.6. Note that the unfolded control should contain the same urea concentration as the final unfolding sample, 8 M in this example. Incubate the protein samples for the predetermined equilibration time, and then follow the subsequent instrument procedure.

3.1.4. Instrument procedure

The general procedure for a kinetic refolding experiment on a stopped-flow instrument is outlined here. The experimental protocol for unfolding follows the same steps except that one starts with native rather than unfolded protein. The basic idea is to refold or unfold the protein at several final urea


Table 1.5 Example calculations for urea and 0.77 M acrylamide/urea stock solutions

Urea stocks
Stock (M urea)   10 M Urea stock (mL)   Buffer (mL)   Final urea concentration (M)
0                0                      10            0.73
1                1                      9             1.64
2                2                      8             2.55
3                3                      7             3.45
4                4                      6             4.36
5                5                      5             5.27
6                6                      4             6.18
7                7                      3             7.09
8                8                      2             8

0.77 M Acrylamide/Urea stocks
Stock (M urea)   10 M Urea stock (mL)   Buffer (mL)   8 M Acrylamide stock (mL)   Final urea concentration (M)
0                0                      9.037         0.963                       0.73
1                1                      8.037         0.963                       1.64
2                2                      7.037         0.963                       2.55
3                3                      6.037         0.963                       3.45
4                4                      5.037         0.963                       4.36
5                5                      4.037         0.963                       5.27
6                6                      3.037         0.963                       6.18
7                7                      2.037         0.963                       7.09
8                8                      1.037         0.963                       8
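The entries of Table 1.5 follow from Eq. 1.14 and from simple dilution of the 10 M urea and 8 M acrylamide stocks into a total volume of 10 mL. The short Python sketch below recomputes them; the function names are hypothetical, and the 8 M urea in the protein stock is the refolding case described in the text.

```python
def final_concentration(c_small, c_large, ratio=10):
    """Eq. 1.14: concentration after mixing 1 part from the small syringe
    with `ratio` parts from the large syringe."""
    return (c_small * 1 + c_large * ratio) / (ratio + 1)


def stock_volumes(urea_m, total_ml=10.0, urea_stock_m=10.0,
                  acrylamide_m=0.0, acrylamide_stock_m=8.0):
    """Volumes (mL) of 10 M urea, 8 M acrylamide, and buffer for one stock."""
    v_urea = urea_m * total_ml / urea_stock_m
    v_acr = acrylamide_m * total_ml / acrylamide_stock_m
    return v_urea, v_acr, total_ml - v_urea - v_acr


# Acrylamide/urea stocks mixed 1:10 with unfolded protein in 8 M urea.
for urea_m in range(9):
    v_urea, v_acr, v_buf = stock_volumes(urea_m, acrylamide_m=0.77)
    final_urea = final_concentration(8.0, urea_m)
    print(urea_m, round(v_urea, 3), round(v_buf, 3), round(v_acr, 3),
          round(final_urea, 2))

# The protein stock contains no acrylamide, so 0.77 M is diluted to 0.7 M.
print(round(final_concentration(0.0, 0.77), 3))
```

Calling `stock_volumes(urea_m)` without the acrylamide argument gives the plain urea stocks in the upper half of the table.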

Table 1.6 Example calculations for 55 μM procaspase-3 protein stock solutions for refolding/unfolding kinetic experiments

Protein stock solution    10 M Urea (mL)   1 M DTT (μL)   825 μM Protein(a) (μL)   Buffer (mL)
Unfolded control (1 mL)   0.8              1              66.7                     0.1323
Native (15 mL)            0                15             1000                     13.985
Native control (1 mL)     0.073            1              66.7                     0.8593
Unfolded (15 mL)          12               15             1000                     1.985

(a) Initial protein concentration of 825 μM is diluted to 55 μM for kinetic folding experiments.
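The volumes in Table 1.6 can be reproduced with a dilution calculation of the form C1V1 = C2V2 for each component, with buffer making up the remaining volume. In the sketch below the function name is hypothetical, and the final DTT concentration of 1 mM is an assumption inferred from the table (1 μL of 1 M DTT per mL of stock) rather than a value stated in the text.

```python
def protein_stock(total_ml, urea_final_m, protein_final_um=55.0,
                  protein_stock_um=825.0, urea_stock_m=10.0,
                  dtt_final_mm=1.0, dtt_stock_m=1.0):
    """Return (urea mL, DTT uL, protein uL, buffer mL) for one protein stock."""
    urea_ml = urea_final_m * total_ml / urea_stock_m
    dtt_ul = dtt_final_mm * total_ml / dtt_stock_m   # mM * mL / M gives uL
    protein_ul = 1000.0 * protein_final_um * total_ml / protein_stock_um
    buffer_ml = total_ml - urea_ml - (dtt_ul + protein_ul) / 1000.0
    return urea_ml, dtt_ul, protein_ul, buffer_ml


rows = [("Unfolded control", 1.0, 8.0), ("Native", 15.0, 0.0),
        ("Native control", 1.0, 0.73), ("Unfolded", 15.0, 8.0)]
for name, total_ml, urea_m in rows:
    volumes = protein_stock(total_ml, urea_m)
    print(name, [round(v, 4) for v in volumes])
```

Mixing any of these 55 μM stocks 1:10 in the stopped-flow instrument gives the 5 μM final protein concentration used in the example.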


concentrations and to monitor the signal until no further change occurs. Final signals for the lowest and highest urea concentrations should match those of the controls (native and unfolded protein).

1. Flush the system with distilled, deionized water and then with buffer. Make sure that the drive syringes contain no air bubbles, as this will cause mixing artifacts in the data. For 1:10 asymmetric mixing, about ten injections are required to move samples and/or buffers completely through the system, depending on the tubing length and stop syringe volume.
2. Place the unfolded protein stock in the small drive syringe and the 8 M urea-containing buffer in the large drive syringe.
3. Fill the drive syringes and flush the system.
4. Set the instrument parameters determined previously (wavelength, temperature, slit width, signal detection of choice, voltage).
5. Obtain data by acquiring multiple injections at one set of conditions. The goal is to collect enough repetitions to average, the final number of which will depend on the signal-to-noise ratio. The averages also depend on the final protein concentration and detection method. Collect data for various time frames, such as 1, 10, 100, and 500 s, as needed.
6. Once the unfolded control sample is completed, remove the 8 M urea-containing buffer. For a refolding experiment, fill the large drive syringe with the next urea-containing buffer, in this case 7.5 M urea. One should note that whenever the solution conditions are drastically changed, one should rinse the syringe either with distilled, deionized water or with buffer to remove any remaining solution in the syringe.
7. Flush the system and acquire data as in step 5.
8. Continue to change the urea-containing buffer in the large drive syringe and acquire data by repeating steps 5–7 until the final buffer (0 M urea) is reached.
9. Remove unfolded protein and buffer solutions from the syringes.
10. For the native control, add the native protein (Table 1.6) to the small syringe and 0.73 M urea-containing buffer to the large syringe.
11. Flush the system and acquire data as in step 5.
12. Once the experiment is complete, clean the instrument by removing the protein and urea-containing buffer solutions from the reservoir syringes. Flush the system with buffer and distilled, deionized water as before.

3.2. Differential quenching by acrylamide

Differential quenching describes the relative solvent accessibility of aromatic residues during refolding or unfolding compared to the same residues in the native or unfolded states. The technique of differential quenching may


allow one to identify and characterize folding intermediates not detected by other methods such as stopped-flow fluorescence emission or CD spectroscopies. For example, Vanhove et al. (1998) used this technique to investigate a nonnative intermediate formed during the refolding of TEM-1 β-lactamase. Four tryptophan residues are located on the protein surface in the native state, and it is proposed that the tryptophans are less accessible to solvent during refolding than in the native state. This group verified, through quenching by acrylamide and other techniques, the presence of a hydrophobic collapse in the nonnative intermediate.

One should first identify the quencher of choice. It is desirable to use a quencher that reacts differentially to the native and unfolded states of the protein. Equilibrium quenching measurements should be conducted first to provide the optimal concentration of quencher for use in kinetic studies. In the case of procaspase-3, for example, it was shown that 0.7 M acrylamide provided the greatest difference between quenching of native and unfolded protein fluorescence emission (Bose et al., 2003). In kinetic experiments, however, it is important to verify that the quencher does not affect the kinetic folding process. One should test a variety of quencher concentrations and, if necessary, different quenchers. Also, if an ionic quencher (e.g., iodide) is used, it is important to verify that the increased ionic strength has no effect on the folding process. One can do this simply by increasing the ionic strength in the folding experiments described earlier (in the absence of quencher).

3.2.1. Sample preparation

Sample preparation for differential quenching by acrylamide is performed as described previously for the refolding and unfolding experiments except that acrylamide is included in the urea-containing buffers (Table 1.5). Protein stock solutions should be prepared as described in Table 1.6. Equation 1.14 is used to determine the initial concentration of acrylamide needed in the urea-containing buffers. Using procaspase-3 as a model, a final acrylamide concentration of 0.7 M requires the initial concentration to be 0.77 M, so the 0.77 M acrylamide urea-containing buffers are made as shown in Table 1.5. The acrylamide stock solution (8 M) is made by weighing 11.37 g of acrylamide (wear a mask) and bringing the volume up to 20 mL with buffer.

3.2.2. Experimental procedure

The experimental procedure for stopped-flow differential quenching is the same as those for the refolding and unfolding experiments described previously for stopped-flow fluorescence emission. Briefly, unfolded or native protein (Table 1.6) is placed in the small syringe and is mixed 1:10 with acrylamide- and urea-containing buffers (Table 1.5). Data also are collected for controls of native and unfolded protein (Table 1.6). In the examples


shown here, the fluorescence emission of the native protein control is examined in 0.7 M acrylamide, 0.73 M urea-containing buffer, and that of the unfolded protein control is examined in 0.7 M acrylamide, 8 M urea-containing buffer. The excitation wavelength should be set to 295 nm in order to avoid inner-filter effects from acrylamide (Lakowicz, 2006). The kinetic data in the presence of quencher and at various final urea concentrations are plotted with the native and unfolded controls, which also are in the presence of quencher (Fig. 1.3A). In this way, one observes the

[Figure 1.3: three panels. (A) Fluorescence signal in 0.7 M acrylamide versus time (s), logarithmic time axis. (B) Fluorescence signal versus time (s), logarithmic time axis. (C) Fluorescence signal versus [urea] (M).]

Figure 1.3 (A) Hypothetical plot of fluorescence signal in the presence of 0.7 M acrylamide versus time for a refolding reaction. The unfolded (+) and native (x) protein signals in the presence of acrylamide are shown. Data at several final urea concentrations are shown (△, ◊, □, ○). (B) Hypothetical plot of the fluorescence emission signal versus time for a refolding reaction. The unfolded (▪) and native () protein signals are shown. Final urea concentrations of 3.45 M (○), 4.36 M (□), 5.27 M (◊), and 6.18 M (△) are shown. The boxed area marks the time from 1–5 ms. The arrows show the extrapolation of the signal to time zero. (C) Plot of burst phase signal versus final urea concentration. The continuous line represents a fit to a two-state equilibrium-folding model as described in the text.


differential quenching of tryptophan fluorescence emission during refolding or unfolding as compared to the quenching of the native and unfolded proteins.

3.3. Data analysis

The first step in data analysis is to determine the number of kinetic phases in the experimental data. To achieve this, there are multiple data analysis programs available including KaleidaGraph (Synergy Software), Excel (Microsoft), and Origin (Origin Lab). A phase is defined here as any change in signal over time. A phase can be described by a single rate constant, or, in the case of a burst phase, a change in amplitude. In order to determine the number of kinetic phases, plot the signal versus time in seconds. A logarithmic scale can be used for short times (1 to 100 ms) or for combining multiple time scales. An example of a refolding experiment is shown in Fig. 1.3B, which shows the signal acquired versus time at six different final urea concentrations. One should note that the data will vary for different proteins. In this hypothetical refolding experiment, three phases are detected and are referred to here as the burst, fast, and slow phases in order to illustrate the following analysis. The burst phase occurs within the mixing dead time for the instrument and is observed at all final urea concentrations shown here. The fast phase, from 2 ms to 100 ms, occurs at 3.45, 4.36, 5.27, and 6.18 M urea. The slow phase, from 200 ms to 10 s, occurs at 3.45 and 4.36 M urea.

3.3.1. Burst phase

The burst phase is defined as the change in signal that occurs during the mixing dead time, and while illustrated here, not all kinetic experiments will display a burst phase. If the data display a change in the initial signal with different final urea concentrations, as shown in Fig. 1.3B, then one should plot the burst phase signal versus final urea concentration (Fig. 1.3C). In order to determine the signal, extrapolate the data from 1–5 ms (the boxed area in Fig. 1.3B) to a time of 0 s (arrows in Fig. 1.3B). These times are chosen because the signal typically is too noisy at shorter times. A plot of the burst phase signals versus urea (Fig. 1.3C) may show linear or nonlinear transitions. A linear transition indicates a noncooperative process and simply may represent a change in signal for the unfolded protein with a change in final urea concentration. In contrast, a nonlinear transition (Fig. 1.3C) indicates a cooperative folding process. In this case, one can determine the free energy and m-value for the formation of the burst phase species by fitting the data to a relevant equilibrium-folding model. Typically, one uses a two-state equilibrium model (Eq. 1.13), although more complicated models also can be used. An examination of data determined by fluorescence emission, following excitation at 280 nm and 295 nm, CD,


and/or differential quenching may show whether multiple species form in the burst phase (Georgescu et al., 1998; Zaidi et al., 1997). By comparing the free energy and m-values to those obtained from equilibrium unfolding experiments, one will obtain further information regarding the species formed during the burst phase.

3.3.2. Exponential fits

Most observable phases comprise exponential increases or decreases in signal. Therefore, this section will focus on the use of multiple exponential equations to fit the experimental data. Two valuable pieces of information will be obtained from these fits: the amplitude and the apparent rate of each phase. The amplitude provides information on the change in signal with each phase and is related to the population of species that occurs in a phase (Wallace and Matthews, 2002). The apparent rate constant provides the rate at which the transition occurs. One should note that because only the rate-limiting step is detected, a single phase may consist of multiple transitions with multiple rate constants. In addition, the amplitude of a particular phase may differ depending on the detection method used, while the rate will be consistent between all methods (so long as the phase is detected by each method). As stated previously, kinetic measurements are collected until the equilibrium signal is reached. So, in general, the total change in amplitudes of the kinetic phases should equal the difference in signals between native and unfolded control proteins. The data are fit as a sum of exponentials, as shown in Eq. 1.15:

$$A(t) - A(\infty) = \sum_{i=1}^{n} A_i e^{-k_i t} \tag{1.15}$$
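A minimal Python sketch of Eq. 1.15 is given below, together with a residual calculation of the kind used to judge whether enough phases have been included. The function names and the offset of 5.5 are assumptions made for illustration; the amplitudes and rates are the values quoted in the legend of Fig. 1.4, and the trace is synthetic and noise-free rather than experimental.

```python
import math


def multi_exponential(t, a_inf, phases):
    """Eq. 1.15 rearranged: A(t) = A(inf) + sum_i A_i * exp(-k_i * t).

    phases is a list of (amplitude, rate) pairs, with rates in 1/s.
    """
    return a_inf + sum(a * math.exp(-k * t) for a, k in phases)


def residuals(times, signal, a_inf, phases):
    """Observed minus calculated signal at each time point."""
    return [y - multi_exponential(t, a_inf, phases)
            for t, y in zip(times, signal)]


# Synthetic two-phase trace on a logarithmic time base from 1 ms to 10 s.
times = [10 ** (-3 + 0.1 * i) for i in range(41)]
two_phase = [(0.20, 50.0), (0.08, 0.3)]
trace = [multi_exponential(t, 5.5, two_phase) for t in times]

# A single exponential leaves systematic residuals; two phases leave none.
one_phase = [(0.22, 33.0)]
print(max(abs(r) for r in residuals(times, trace, 5.5, one_phase)))
print(max(abs(r) for r in residuals(times, trace, 5.5, two_phase)))
```

In a real analysis the amplitudes, rates, and offset would be adjusted by a nonlinear least-squares routine; the sketch only evaluates the model for fixed parameters.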

In this case, A(t) is the amplitude at time t, A(∞) is the offset value, $A_i$ is the change in signal for phase i, $k_i$ is the apparent rate of phase i, and t is time (Bieri and Kiefhaber, 2000; Utiyama and Baldwin, 1986). To fit exponential phases, it is best to use a nonlinear least squares fitting program, such as those mentioned previously. Two and three (or more) exponential equations have the same format as a single exponential but with additional terms to account for the total number of phases. That is, data that contain more than one phase are fit to a sum of exponentials. As an example, for the refolding experiment shown in Fig. 1.3B, each urea step is plotted and analyzed individually. Fig. 1.4 shows a plot of data collected at a final urea concentration of 4.36 M fit to either a one (Fig. 1.4A), two (Fig. 1.4B), or three (Fig. 1.4C) exponential equation. An analysis of the residuals to the fits (below each plot) shows the difference between the fit and the experimental data, where the distribution of the points should be random around zero. A single exponential (Fig. 1.4A) does


[Figure 1.4: three panels (A–C), each showing fluorescence signal versus time (s) on a logarithmic time axis, with the residuals to the fit plotted below each panel.]

Figure 1.4 The 4.36 M refolding data from Fig. 1.3B were fit to a one (A), two (B), or three (C) exponential equation. Residuals to the fits are shown below each panel. (A) A single exponential fit of the data provides an amplitude of 0.22 and rate of 33 sec⁻¹. (B) A two exponential fit of the data provides amplitudes of 0.20 and 0.08 and rates of 50 and 0.3 sec⁻¹, respectively. (C) A three exponential fit of the data provides amplitudes of 0.2, 0.08, and 0.05 and rates of 50, 0.3, and 3.3 × 10⁻⁷ sec⁻¹, respectively.


not adequately describe the data, and the residuals for the three exponential fit (Fig. 1.4C) are not significantly different from those of the two exponential fit (Fig. 1.4B). Thus, using the simplest model, one would describe the experimental data as containing two phases, each defined by an apparent rate constant and amplitude. Continue the fitting process until all of the urea steps are analyzed, and then plot the rates and amplitudes from the different phases and detection methods versus denaturant concentration.

Using a monomeric protein as a simple example (all observed phases are first-order reactions), we generated a plot of the log of apparent rate constant versus denaturant concentration. The data display a V-shape, and this plot is referred to as a chevron plot (Fig. 1.5A). The chevron analysis has been described extensively for a simple two-state folding model (Ferguson et al., 1999; Parker et al., 1995; Wallace and Matthews, 2002; Zarrine-Afsar and Davidson, 2004), and the reader is referred to previous reviews on the subject. The information obtained from the chevron analysis includes the folding rates (both refolding and unfolding) in the absence of denaturant, the m-values for refolding and unfolding, and information on the transition state for folding.

For an oligomeric protein, the phase in which the protein forms the oligomer will display a protein concentration dependence to the apparent rate constant. For a dimeric protein, for example, the apparent rate will be second order with regard to the protein concentration. To determine the rate of dimerization, plot the apparent rate (sec⁻¹) versus the protein concentration (μM). The second-order rate plot should show a linear dependence on protein concentration, with a slope equal to the rate of dimerization (Jaenicke and Rudolph, 1986).

3.3.3. Simulations

Creating a kinetic folding model is outside of the scope of this article but is covered in other reviews (Bieri and Kiefhaber, 2000; Creighton, 1988; Utiyama and Baldwin, 1986; Wallace and Matthews, 2002). Once a model is created, it can be tested using other stopped-flow techniques, such as sequential mixing (Eftink and Shastry, 1997; Schmid, 1986; Wallace and Matthews, 2002). The protein-folding mechanism could be very complex, depending on the number of kinetic phases and the oligomeric properties of the protein, and analysis of the data may yield several possible folding mechanisms. The overall goal is to devise the simplest mechanism that adequately explains the experimental data. In this regard, simulation programs are useful for distinguishing between possible mechanisms. It is important to note that simulations cannot prove a mechanism, but rather they allow one to determine whether a mechanism is supported by the experimental data. This section will describe the use of the simulation program KINSIM (Barshop et al., 1982), but other programs also are


[Figure 1.5: three panels. (A) Rate (sec⁻¹), logarithmic axis, versus [urea] (M). (B) KINSIM mechanism text: $ sequential pathway ! U == I I == N ! *OUTPUT X1*U X2*I X3*N (X1*U) + (X2*I) + (X3*N). (C) Signal versus time (s), with curves labeled U, I, N, and U + I + N.]

Figure 1.5 (A) Hypothetical chevron plot of the apparent refolding and unfolding rates versus the final urea concentration. The continuous line represents a fit to a two-state kinetic folding model (Ferguson et al., 1999) with the following parameters: $k_f^{\mathrm{H_2O}}$ = 200 sec⁻¹, $k_u^{\mathrm{H_2O}}$ = 0.015 sec⁻¹, $m_{\text{N-TS}}$ = 0.7, $m_{\text{U-TS}}$ = 0.6, $\Delta G^{\mathrm{H_2O}}$ = 5.62 kcal/mol, m-value = 1.3 kcal/mol/M, urea$_{1/2}$ = 4.3 M. (B) Example of a sequential pathway with one on-pathway intermediate written in the text format for KINSIM. X1, X2, and X3 are the extinction coefficients of U, I, and N, respectively. (C) Hypothetical example of a refolding reaction of 10 μM protein for 10 s. The populations of species, shown by the continuous lines, are labeled as U, I, N, and U + I + N. The rates of the U to I and I to N transitions are 1 and 0.3 sec⁻¹, respectively. The values of X1, X2, and X3 are 0.01, 0.05, and 0.06, respectively.
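The numbers quoted in this legend can be checked with a short calculation. The Python sketch below is not KINSIM and does not use its input format; it evaluates the standard two-state chevron expression for panel A and the analytical solution for two consecutive first-order steps for panel C. Several points are assumptions rather than statements from the text: the m-values are taken to be in kcal/mol/M, the temperature is taken as 25 °C, the steps U to I and I to N are treated as irreversible because no reverse rates are given, and all function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def chevron_rate(urea, kf_h2o, ku_h2o, m_f, m_u, temp_k=298.15):
    """Apparent rate (1/s) of a two-state folder at a given urea concentration."""
    rt = R_KCAL * temp_k
    k_fold = kf_h2o * math.exp(-m_f * urea / rt)    # refolding arm
    k_unfold = ku_h2o * math.exp(m_u * urea / rt)   # unfolding arm
    return k_fold + k_unfold


def sequential_populations(t, total, k1, k2):
    """Concentrations of U, I, N at time t for irreversible U -> I -> N (k1 != k2)."""
    u = total * math.exp(-k1 * t)
    i = total * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    return u, i, total - u - i


def observed_signal(populations, factors):
    """Sum over species of output factor times concentration."""
    return sum(x * c for x, c in zip(factors, populations))


# Panel A: free energy and midpoint implied by the two rates in water.
rt = R_KCAL * 298.15
dg_h2o = rt * math.log(200.0 / 0.015)
print(round(dg_h2o, 2), round(dg_h2o / (0.6 + 0.7), 1))
print([round(chevron_rate(u, 200.0, 0.015, 0.6, 0.7), 3) for u in (0, 4, 8)])

# Panel C: 10 uM protein, k1 = 1 and k2 = 0.3 per second.
for t in (0.0, 1.0, 5.0, 10.0):
    pops = sequential_populations(t, 10.0, 1.0, 0.3)
    print(t, [round(c, 3) for c in pops],
          round(observed_signal(pops, (0.01, 0.05, 0.06)), 3))
```

A simulation program integrates the rate equations numerically and therefore also handles reversible and higher-order steps, which this closed-form sketch does not.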


available, such as KinTekSim (KinTek Corp.), KINFITSIM (Svir et al., 2002), and DynaFit (BioKin). There are various examples in the literature of the use of KINSIM in the field of protein folding, ranging from determining if an intermediate is on- or off-pathway (Heidary et al., 2000) to examining the complex parallel folding pathway of a homodimeric protein (Mallam and Jackson, 2006). KINSIM is a free program available at http://www.biochem.wustl.edu/cflab/message.html, where there also is a help manual with instructions for the program, but a brief overview will be described here.

KINSIM allows the user to define a mechanism in a text file. A simple example is shown in Fig. 1.5B for the folding of a monomer with one on-pathway intermediate (U ⇌ I ⇌ N). In this sequential mechanism, the unfolded protein (U) folds to an intermediate species (I) before forming the native structure (N). As shown in Fig. 1.5B, a reversible reaction is described by two equal signs (==). The concentration of each species is multiplied by an output factor (X1–X3 in Fig. 1.5B), and the signal observed is the sum of the contribution of each species.

Once the mechanism is loaded into KINSIM, the user can change protein concentration (μM), rate constants (sec⁻¹), output factors (signal/[protein]), and simulation time (sec). The protein concentration and simulation time are chosen on the basis of the experimental conditions. The rate constants determined from fits of the experimental data (described previously) are used as initial guesses for the simulated rate constants. For example, if the data analysis reveals a rate of 50 sec⁻¹ for the transition of U to I, then one would use 50 for k1. Output factors are calculated by dividing the signal of the species, taken from the experimental data, by the protein concentration. Finally, one can output any combination of species. As an example, using the sequential mechanism in Fig. 1.5, the simulation screen will show four lines based on the four specified outputs, U, I, N, and U + I + N (Fig. 1.5C). Experimental data also can be uploaded in order to examine the agreement between simulated and real data, which allows the user to fine-tune rate constants and output factors and to distinguish between possible mechanisms.

As an example, KINSIM simulations were used to investigate several potential kinetic folding pathways of the caspase recruitment domain (CARD) of apoptotic protease activating factor 1 (Apaf-1) (Milam et al., 2007). Apaf-1 CARD is a small (<100 amino acids), monomeric protein consisting of six α-helices arranged in an α-Greek key folding topology (Weber and Vincenz, 2001). One sequential (Fig. 1.6A) and two parallel (Fig. 1.6B–C) kinetic mechanisms were proposed on the basis of three phases present in the experimental data (burst, fast, and slow). In order to demonstrate the ability of KINSIM to discriminate between possible mechanisms, this brief discussion will focus on simulations of the slow phase in the Apaf-1 CARD kinetic folding pathway. In the sequential

[Figure 1.6 graphic. Panels A–C: schematic folding pathways. Panels D–F: Signal versus Time (s), with time on a logarithmic axis from 0.001 to 10 s.]

Figure 1.6 One sequential (A) and two parallel folding pathways (B and C) used in KINSIM simulations. (A) The unfolded species folds through two intermediates before forming the native species. (B) Two unfolded and native species exist and interconvert. (C) Three unfolded species exist, where U1 and U2 fold to the native state and U3 interconverts with U1 or U2. (D–F) Plots of an Apaf-1 CARD refolding reaction in 3.64 M urea. KINSIM simulations for each plot are shown as a continuous line. For pathway A, the concentration of I1 was 10 µM. The extinction coefficients for U, I1, I2, and N were 0, 0.0007, 0.0058, and 0.0063, respectively. The rates for I1 to I2 and I2 to N were 45 and 0.6 sec⁻¹, respectively. For pathway B, the concentrations of U1 and U2 were 1 and 9 µM, respectively. The extinction coefficients for N1 and N2 were 0.0074 and 0.0064. The rates of U1 to N1, U1 to U2, U2 to N2, and N1 to N2 were 500, 100, 50, and 0.4 sec⁻¹, respectively. For pathway C, the concentrations of U1, U2, and U3 were 0.7, 8.3, and 1 µM. The extinction coefficient of N was 0.0086. The rates of U1 to N, U2 to N, and U3 to U1 or U2 were 5000, 37 (back rate of 13), and 0.6 (back rate of 0.01) sec⁻¹.


folding model (Fig. 1.6A), the unfolded protein folds through two intermediates before forming the native conformation. KINSIM simulations with this mechanism agree with the experimental data (Fig. 1.6D). However, more complicated mechanisms were proposed on the basis of stopped-flow sequential mixing studies that showed the presence of multiple unfolded species (Milam et al., 2007). The proposed parallel mechanisms, shown in Figs. 1.6B and 1.6C, consist of multiple unfolded and/or native conformations. In the model shown in Fig. 1.6B, two unfolded conformations, U1 and U2, form two native conformations, N1 and N2, and the unfolded and native species are able to interconvert. Simulations with this mechanism were not able to recapitulate the slow phase of refolding, assuming that the slow phase was due to the interconversion of U1 to U2 or of N1 to N2 (Fig. 1.6E). As a result, the folding model in Fig. 1.6C was proposed, with the inclusion of a third unfolded species, U3. In this parallel pathway, both unfolded conformations, U1 and U2, can form the native species (N). The conversion of U3 to U1 or to U2 represents the slow phase, as shown in Fig. 1.6F. Therefore, the models in Figs. 1.6A and 1.6C adequately explain the single-mixing stopped-flow data (Figs. 1.6D and 1.6F, respectively), but only the model in Fig. 1.6C agrees with the sequential-mixing stopped-flow data, which showed multiple unfolded species.
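The kind of calculation KINSIM performs for the sequential mechanism of Fig. 1.5B can be sketched in a few lines of Python. This is not KINSIM and does not use its input format; it is a minimal fixed-step fourth-order Runge-Kutta integration of first-order steps. The forward rate constants, starting concentration, and output factors are taken from the legend of Fig. 1.5C. The reverse rate constants are not given there and are assumed here to be zero, and the function name `simulate` and the step size are illustrative choices.

```python
def simulate(species, steps, conc0, t_end, dt=0.001):
    """Integrate first-order steps (src, dst, k) and return final concentrations."""
    index = {name: i for i, name in enumerate(species)}

    def deriv(conc):
        rates = [0.0] * len(conc)
        for src, dst, k in steps:
            flux = k * conc[index[src]]
            rates[index[src]] -= flux
            rates[index[dst]] += flux
        return rates

    def shifted(conc, rates, h):
        return [c + h * r for c, r in zip(conc, rates)]

    conc = list(conc0)
    for _ in range(round(t_end / dt)):
        k1 = deriv(conc)
        k2 = deriv(shifted(conc, k1, dt / 2))
        k3 = deriv(shifted(conc, k2, dt / 2))
        k4 = deriv(shifted(conc, k3, dt))
        conc = [c + dt / 6 * (a + 2 * b + 2 * g + e)
                for c, a, b, g, e in zip(conc, k1, k2, k3, k4)]
    return dict(zip(species, conc))


SPECIES = ["U", "I", "N"]
STEPS = [("U", "I", 1.0), ("I", "N", 0.3)]           # k in 1/s; reverse rates assumed 0
OUTPUT_FACTORS = {"U": 0.01, "I": 0.05, "N": 0.06}   # X1, X2, X3 from the legend

final = simulate(SPECIES, STEPS, [10.0, 0.0, 0.0], t_end=10.0)
signal = sum(OUTPUT_FACTORS[name] * final[name] for name in SPECIES)
print({name: round(c, 4) for name, c in final.items()}, "signal:", round(signal, 4))
```

Comparing alternative mechanisms, as in the Apaf-1 CARD example, amounts to changing the `SPECIES` and `STEPS` lists and checking which simulated signal tracks the measured trace.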

REFERENCES

Barshop, B. A., Wrenn, R. F., and Frieden, C. (1983). Analysis of numerical methods for computer simulation of kinetic processes: Development of KINSIM – a flexible, portable system. Anal. Biochem. 130, 134–145.
Bieri, O., and Kiefhaber, T. (2000). Kinetic models in protein folding. In “Mechanisms of Protein Folding” (R. H. Pain, ed.), pp. 34–64. Oxford University Press.
Bose, K., and Clark, A. C. (2001). Dimeric procaspase-3 unfolds via a four-state process. Biochem. 40, 14236–14242.
Bose, K., Pop, C., Feeney, B., and Clark, A. C. (2003). An uncleavable procaspase-3 mutant has a lower catalytic efficiency but an active site similar to that of mature caspase-3. Biochem. 42, 12298–12310.
Bowie, J. U., and Sauer, R. T. (1989). Equilibrium dissociation and unfolding of the arc repressor dimer. Biochem. 28, 7139–7143.
Clark, A. C. (2008). Protein folding: Are we there yet? Arch. Biochem. Biophys. 469, 1–3.
Clark, A. C., Sinclair, J. F., and Baldwin, T. O. (1993). Folding of bacterial luciferase involves a non-native heterodimeric intermediate in equilibrium with the native enzyme and the unfolded subunits. J. Biol. Chem. 268, 10773–10779.
Cohen, F. E., and Kelley, J. W. (2003). Therapeutic approaches to protein-misfolding diseases. Nature 426, 905–909.
Creighton, T. E. (1988). Toward a better understanding of protein folding pathways. Proc. Natl. Acad. Sci. 85, 5082–5086.
Creighton, T. E. (1990). Protein folding. Biochem. J. 270, 1–16.
Dignam, J. D., Qu, X., and Chaires, J. B. (2001). Equilibrium unfolding of Bombyx mori glycyl-tRNA synthetase. J. Biol. Chem. 276, 4028–4037.
Dill, K. A. (1990). Dominant forces in protein folding. Biochem. 29, 7133–7155.


Eftink, M. R. (1994). The use of fluorescence methods to monitor unfolding transitions in proteins. Biophys. J. 66, 482–501.
Eftink, M. R. (2000). Use of fluorescence spectroscopy as thermodynamics tool. Methods Enzymol. 323, 459–473.
Eftink, M. R., and Shastry, M. C. R. (1997). Fluorescence methods for studying kinetics of protein-folding reactions. Methods Enzymol. 278, 258–286.
Enoki, S., Maki, K., Inobe, T., Takahashi, K., Kamagata, K., Oroguchi, T., Nakatani, H., Tomoyori, K., and Kuwajima, K. (2006). The equilibrium unfolding intermediate observed at pH 4 and its relationship with the kinetic folding intermediates in green fluorescent protein. J. Mol. Biol. 361, 969–982.
Ferguson, N., Capaldi, A. P., James, R., Kleanthous, C., and Radford, S. E. (1999). Rapid folding with and without populated intermediates in the homologous four-helix proteins Im7 and Im9. J. Mol. Biol. 286, 1597–1608.
Georgescu, R. E., Li, J. H., Goldberg, M. E., Tasayco, M. L., and Chaffotte, A. F. (1998). Proline isomerization-independent accumulation of an early intermediate and heterogeneity of the folding pathways of a mixed alpha/beta protein, Escherichia coli thioredoxin. Biochem. 37, 10286–10297.
Gloss, L. M., and Matthews, C. R. (1997). Urea and thermal equilibrium denaturation studies on the dimerization domain of Escherichia coli trp repressor. Biochem. 36, 5612–5623.
Greene, R. F., and Pace, C. N. (1974). Urea and guanidine hydrochloride denaturation of ribonuclease, lysozyme, alpha-chymotrypsin, and beta-lactoglobulin. J. Biol. Chem. 249, 5388–5393.
Grimsley, J. K., Scholtz, J. M., Pace, C. N., and Wild, J. R. (1997). Organophosphorus hydrolase is a remarkably stable enzyme that unfolds through a homodimeric intermediate. Biochem. 36, 14366–14374.
Harder, M. E., Deinzer, M. L., Leid, M. E., and Schimerlik, M. I. (2004). Global analysis of three-state protein unfolding data. Protein Sci. 13, 2207–2222.
Heidary, D. K., O’Neill, J. C., Roy, M., and Jennings, P. A. (2000). An essential intermediate in the folding of dihydrofolate reductase. Proc. Natl. Acad. Sci. 97, 5866–5870.
Hornby, J. A., Luo, J. K., Stevens, J. M., Wallace, L. A., Kaplan, W., Armstrong, R. N., and Dirr, J. M. (2000). Equilibrium folding of dimeric class mu glutathione transferases involves a stable monomeric intermediate. Biochem. 39, 12336–12344.
Ikai, A., and Tanford, C. (1971). Kinetic evidence for incorrectly folded intermediate states in the refolding of denatured proteins. Nature 230, 100–102.
Jaenicke, R., and Rudolph, R. (1986). Refolding and association of oligomeric proteins. Methods Enzymol. 131, 218–250.
Lakowicz, J. R. (2006). “Principles of fluorescence spectroscopy.” Springer, New York.
Lumry, R., Biltonen, R., and Brandts, J. (1966). Validity of the “two-state” hypothesis for conformational transitions of proteins. Biopolymers 4, 917–944.
Mallam, A. L., and Jackson, S. E. (2006). Probing nature’s knots: The folding pathway of a knotted homodimeric protein. J. Mol. Biol. 359, 1420–1436.
Maxwell, K. L., Wildes, D., Zarrine-Afsar, A., De Los Rios, M. A., Brown, A. G., Friel, C. T., Hedberg, L., Horng, J. C., Bona, D., Miller, E. J., Vallee-Belisle, A., Main, E. R. G., et al. (2005). Protein folding: Defining a “standard” set of experimental conditions and a preliminary kinetic data set of two-state proteins. Protein Sci. 14, 602–616.
Milam, S. L., Nicely, N. I., Feeney, B., Mattos, C., and Clark, A. C. (2007). Rapid folding and unfolding of Apaf-1 CARD. J. Mol. Biol. 369, 290–304.
Myers, J. K., Pace, C. N., and Scholtz, J. M. (1995). Denaturant m values and heat capacity changes: Relation to changes in accessible surface areas of protein unfolding. Protein Sci. 4, 2138–2148.


Pace, C. N. (1986). Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 131, 266–280.
Pace, C. N., Grimsley, G. R., and Scholtz, J. M. (2005). Denaturation of proteins by urea and guanidine hydrochloride. In “Protein Folding Handbook” (J. Buchner and T. Kiefhaber, eds.), pp. 45–69. Wiley-VCH, Weinheim.
Pace, C. N., and Schmid, F. X. (1997). How to determine the molar absorption coefficient of a protein. In “Protein Structure: A Practical Approach” (T. E. Creighton, ed.), pp. 253–259. Oxford University Press, New York.
Pace, C. N., and Scholtz, J. M. (1997). Measuring the conformational stability of a protein. In “Protein Structure: A Practical Approach” (T. E. Creighton, ed.), pp. 299–321. Oxford University Press, New York.
Park, Y. C., and Bedouelle, H. (1998). Dimeric tyrosyl-tRNA synthetase from Bacillus stearothermophilus unfolds through a monomeric intermediate. J. Biol. Chem. 273, 18052–18059.
Parker, M. J., Spencer, J., and Clarke, A. R. (1995). An integrated kinetic analysis of intermediates and transition states in protein folding reactions. J. Mol. Biol. 253, 771–786.
Placek, B. J., Harrison, L. N., Villers, B. M., and Gloss, L. M. (2005). The H2A.Z/H2B dimer is unstable compared to the dimer containing the major H2A isoform. Protein Sci. 14, 514–522.
Privalov, P. L. (1989). Thermodynamic problems of protein structure. Ann. Rev. Biophys. Biophys. Chem. 18, 47–69.
Royer, C. A. (2008). The nature of the transition state ensemble and the mechanisms of protein folding. Arch. Biochem. Biophys. 469, 34–45.
Royer, C. A., Mann, C. J., and Matthews, C. R. (1993). Resolution of the fluorescence equilibrium unfolding profile of Trp aporepressor using single tryptophan mutants. Protein Sci. 2, 1844–1852.
Saito, Y., and Wada, A. (1983). Comparative study of GuHCl denaturation of globular proteins. I. Spectroscopic and chromatographic analysis of the denaturation curves of ribonuclease A, cytochrome c, and pepsinogen. Biopolymers 22, 2105–2122.
Santoro, M. M., and Bolen, D. W. (1988). Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of phenylmethanesulfonyl α-chymotrypsin using different denaturants. Biochem. 27, 8063–8068.
Schmid, F. X. (1986). Fast-folding and slow-folding forms of unfolded proteins. Methods Enzymol. 131, 70–82.
Schmid, F. X. (1997). Optical spectroscopy to characterize protein conformation and conformational changes. In “Protein Structure: A Practical Approach” (T. E. Creighton, ed.), pp. 261–297. Oxford University Press, New York.
Scholtz, J. M. (1995). Conformational stability of HPr: The histidine-containing phosphocarrier protein from Bacillus subtilis. Protein Sci. 4, 35–43.
Shastry, M. C. R., Luck, S. D., and Roder, H. (1998). A continuous-flow capillary mixing method to monitor reactions on the microsecond time scale. Biophys. J. 74, 2714–2721.
Soto, C. (2003). Unfolding the role of protein misfolding in neurodegenerative diseases. Nat. Rev. Neurosci. 4, 49–60.
Svir, I. B., Klymenko, O. V., and Platz, M. S. (2002). “KINFITSIM”: A software to fit kinetic data to a user selected mechanism. Computers and Chemistry 26, 379–386.
Utiyama, H., and Baldwin, R. L. (1986). Kinetic mechanisms of protein folding. Methods Enzymol. 131, 51–70.
Vanhove, M., Lejeune, A., Guillaume, G., Virden, R., Pain, R. H., Schmid, F. X., and Frere, J. M. (1998). A collapsed intermediate with nonnative packing of hydrophobic residues in the folding of TEM-1 beta-lactamase. Biochem. 37, 1941–1950.
Waggoner, A. (1995). Covalent labeling of proteins and nucleic acids with fluorophores. Methods Enzymol. 246, 362–373.


Wallace, L. A., and Matthews, C. R. (2002). Sequential vs. parallel protein-folding mechanisms: Experimental tests for complex folding reactions. Biophys. Chem. 101–102, 113–131.
Weber, C. H., and Vincenz, C. (2001). The death domain superfamily: A tale of two interfaces? TRENDS Biochem. Sci. 26, 475–481.
Weber, G. (1951). Polarization of the fluorescence of macromolecules. Biochem. J. 51, 145–155.
Wilson, C. J., and Wittung-Stafshede, P. (2005). Role of structural determinants in folding of the sandwich-like protein Pseudomonas aeruginosa azurin. Proc. Natl. Acad. Sci. 102, 3984–3987.
Woody, R. W. (1995). Circular dichroism. Methods Enzymol. 246, 34–71.
Zahler, W. L., and Cleland, W. W. (1968). A specific and sensitive assay for disulfides. J. Biol. Chem. 243, 716–719.
Zaidi, F. N., Nath, U., and Udgaonkar, J. B. (1997). Multiple intermediates and transition states during protein folding. Nat. Struct. Biol. 4, 1016–1024.
Zarrine-Afsar, A., and Davidson, A. R. (2004). The analysis of protein folding kinetic data produced in protein engineering experiments. Methods 34, 41–50.

CHAPTER TWO

Using Thermodynamics to Understand Progesterone Receptor Function: Method and Theory

Keith D. Connaghan-Jones* and David L. Bain*

Contents
1. Introduction
2. Assessing Protein Functional and Structural Homogeneity
3. Dissecting Linked Assembly Reactions
4. Analysis and Dissection of Natural Promoters
5. Measuring the Energetics of Coactivator Recruitment
6. Correlation to Biological Function
7. Conclusions and Future Directions
Acknowledgments
References

Abstract

Progesterone receptors (PRs) are members of the nuclear receptor superfamily of ligand-activated transcription factors. The mechanisms by which receptors such as PR assemble at a promoter and recruit coactivators are well understood at the biochemical level. However, a rigorous and thus quantitatively predictive understanding of function is entirely lacking. This is so in part because the study of receptor function has largely been carried out using semiquantitative or qualitative approaches. These types of analyses are limited in their ability to resolve thermodynamically valid and physically meaningful microscopic interaction parameters. This includes resolution of intrinsic binding constants and cooperativity terms, as well as the mathematical framework for integrating these values into a larger molecular code for function. Here we present our experimental and theoretical approach for dissecting the linked reactions associated with PR and coactivator assembly at complex promoter sequences. We discuss the use of analytical ultracentrifugation and quantitative DNase footprint titration and their coupling to exact theoretical treatments. We then highlight the major findings of these studies and their implications for understanding and reevaluating receptor function.

* Department of Pharmaceutical Sciences, University of Colorado Denver, Denver, Colorado, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04202-X. © 2009 Elsevier Inc. All rights reserved.


1. Introduction

We are attempting to elucidate the mechanisms underlying higher eukaryotic gene regulation. As a model system, we study the human progesterone receptor (PR) and its interactions with complex promoters and coactivators. PR is a member of the nuclear receptor superfamily of ligand-activated transcription factors (Tsai and O’Malley, 1994). It exists naturally as two functionally distinct isoforms, PR-A and PR-B (Meyer et al., 1992; Sartorius et al., 1994). The isoforms are identical except that the B-receptor has an additional 164 amino acids at its N-terminus. According to the standard model of receptor function, upon binding their cognate hormone, the isoforms translocate into the cell nucleus and assemble at progesterone response elements (PREs) located within the promoter DNA (Tsai and O’Malley, 1994). Receptor binding is coupled to recruitment of an array of cofactors, resulting in transcriptional activation (Liu et al., 2001).

Despite an enormous number of biochemical, structural, and cellular studies, our understanding of the functional mechanisms of receptors such as PR remains incomplete. In part, this is because transcriptional regulation is an extremely complex process that can involve the time-dependent interactions of approximately 50 different proteins (Metivier et al., 2003; Nagaich et al., 2004). Perhaps more important, however, the study of receptor action, a fundamentally quantitative process, has largely been carried out using semiquantitative or qualitative approaches. As such, a truly predictive foundation for understanding receptor function is lacking.

To better illuminate this position, we note several unresolved issues: The microscopic terms that govern nuclear receptor-promoter interactions, parameters such as the intrinsic DNA binding energetics and associated cooperativity terms, are unknown. Likewise, the physical and chemical forces that drive these interactions are either unknown or poorly understood. Adding to the complexity, the assembly reactions at the promoter are often nonlinearly coupled (at the structural, energetic, and functional levels) to subsequent reactions in order to generate an efficient molecular switch. Consequently, we have no mathematical framework to connect the many microscopic interactions to the macroscopic phenomenon of gene regulation. A fundamental question, then, is how to take the physical and chemical principles responsible for assembly at a promoter and integrate them in a way predictive of biological regulation.

Although a number of disciplines are available for quantitatively analyzing receptor-promoter interactions, our bias is to use a thermodynamic characterization coupled to statistical thermodynamic dissection. The value of thermodynamics is in its ability to constrain a family of potential mechanisms in a way that is impossible using semiquantitative approaches.

Thermodynamics of Progesterone Receptor Function


However, given this definition, how does one obtain microscopic insight into a complex system? Our solution to this problem is to employ statistical thermodynamics. When this approach is coupled to rigorous techniques and exact theory, it is possible to resolve microscopic parameters using physically meaningful binding models. Finally, these results can be correlated with nonquantitative cellular and biochemical data in order to further delineate the mechanisms governing transcriptional regulation. As a step toward obtaining a truly quantitative understanding of receptor function, we detail our theoretical and experimental research design for dissecting PR-promoter interactions. We then discuss our results in the context of recent biochemical and cellular data for the purpose of gaining insight into the mechanisms of gene regulatory processes. We also highlight a number of findings at odds with the traditional understanding of receptor function and suggest that a reassessment of the standard model may be in order.

2. Assessing Protein Functional and Structural Homogeneity

For any quantitative analysis, it is self-evident that the protein of interest must be highly pure. With regard to PR, we have previously published our purification protocols for both isoforms (Connaghan-Jones et al., 2006; Heneghan et al., 2005). Briefly, each receptor is purified in multiple steps, using affinity chromatography, ion-exchange chromatography, and size-exclusion chromatography. After the final purification step, densitometric analysis of a Coomassie-stained gel indicates that the isoforms are 95% pure. Unfortunately, knowledge of protein purity offers no insight into the degree of protein activity (the extent of functional homogeneity). Nor does it account for the presence of unfolded or aggregated material (the extent of structural homogeneity). This lack of information can add significant imprecision to subsequent data collection and analysis. Some examples include DNA-binding studies that may take into account only total protein concentration rather than the active concentration, or proteolytic analyses that may inadvertently probe a heterogeneous structural population. In the case of nuclear receptors, homogeneity has historically been demonstrated only in isolated domains (Bain et al., 2000, 2001; Chen et al., 1994). By contrast, evidence in support of homogeneous full-length receptor preparations has been lacking.

Although a number of techniques have been used to assess the functional (and structural) homogeneity of purified nuclear receptors, our approach is to use analytical, spectroscopic approaches. We favor these techniques over


more traditional assays such as charcoal adsorption because the latter has long been known to be incapable of accurately measuring the activity of purified material (Chen et al., 1994; Schrader, 1975). Moreover, the linearized equations commonly used to analyze such data (e.g., Scatchard analysis) propagate error in ways largely unaccounted for in the fitting process and thus considerably decrease the likelihood of obtaining accurate results (Johnson, 1992; Wyman and Gill, 1990).

Shown in Fig. 2.1 is a stoichiometric binding curve for RU-486 binding to PR-B (Heneghan et al., 2005). In this assay, the change in intrinsic fluorescence of unliganded PR-B was measured as a function of increasing concentration of the antiprogestin, RU-486. In order to assess the extent of binding activity of the receptor, the titration was carried out using a concentration of 0.5 µM PR-B. This concentration is much greater than the nanomolar binding affinity of RU-486 (Hurd and Moudgil, 1988) and thus ensures that the receptor-ligand interaction is carried out under stoichiometric conditions. As can be seen from the data, there is a monotonic increase in the fraction of receptor quenched until a plateau value is reached. The point at which the plateau value is achieved (i.e., the breakpoint) corresponds to saturation of the binding reaction. If the stoichiometry of the reaction is known, the transition serves as a quantitative measure of the functional activity of the protein. In this case, the universally accepted PR-hormone binding stoichiometry is 1:1. Because the breakpoint of the plot occurs at a ratio of 0.76 ± 0.1 [RU-486]/[PR-B], the protein is approximately 75% active in ligand binding. Similar approaches to assessing

[Figure 2.1 plot: Fraction quenched (−0.2 to 1.2) versus [RU-486]/[PR-B] (0 to 2.5).]

Figure 2.1 Stoichiometric titration of PR-B with the anti-progestin RU-486. Points represent the fractional intrinsic fluorescence quench of 0.5 µM PR-B at each concentration of RU-486. The line represents a fit of the data to a phenomenological breakpoint transition curve.
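Why the breakpoint of such a titration reads out the active fraction can be seen from the exact 1:1 binding expression. The sketch below is an illustration under stated assumptions, not the phenomenological curve used for the fit in Fig. 2.1. It assumes a dissociation constant of 1 nM (the text says only that the affinity is nanomolar), a total receptor concentration of 0.5 µM, an active fraction of 0.76, and a quench signal proportional to occupancy of the active receptor; all names are hypothetical.

```python
import math

KD_NM = 1.0                 # assumed dissociation constant, nM (illustrative)
RECEPTOR_TOTAL_NM = 500.0   # total PR-B, nM (0.5 uM)
ACTIVE_FRACTION = 0.76      # assumed fraction of receptor able to bind ligand


def fraction_quenched(ratio):
    """Occupancy of active receptor at a given [ligand]/[total receptor] ratio."""
    active = ACTIVE_FRACTION * RECEPTOR_TOTAL_NM
    ligand = ratio * RECEPTOR_TOTAL_NM
    total = active + ligand + KD_NM
    # Physically meaningful root of the 1:1 binding quadratic.
    bound = (total - math.sqrt(total * total - 4.0 * active * ligand)) / 2.0
    return bound / active


for ratio in (0.25, 0.50, 0.76, 1.00, 2.00):
    print(f"{ratio:4.2f}  {fraction_quenched(ratio):.3f}")
```

Because the receptor concentration is far above the assumed dissociation constant, nearly every added ligand molecule binds until the active sites are used up, so the curve rises almost linearly and then flattens close to a ratio equal to the active fraction.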


the quantitative extent of DNA-binding activity can be pursued using nitrocellulose filter binding or gel shift assays (Senear et al., 1993). The results of these studies can then be incorporated as correction terms to account for active protein concentration rather than total protein concentration.

Our approach to assessing structural homogeneity is to use analytical ultracentrifugation (AUC). AUC techniques (sedimentation velocity and sedimentation equilibrium) are distinguished from glycerol or sucrose gradient centrifugation by their rigor because the theory and analysis for AUC can be derived from first principles of physics. Shown in Fig. 2.2A is a series of sedimentation velocity scans of PR-A taken at multiple protein concentrations (Connaghan-Jones et al., 2006). The solid lines represent the results of a global fit to all three data sets using a monomer-dimer assembly model. The residuals of the fit are shown in Fig. 2.2B. Analysis of the data supports the presence of a 7-s dimer species in rapid equilibrium with a 4-s monomer species, with a free energy change of −7.1 kcal/mol or a dimerization constant of 2 µM (Connaghan-Jones et al., 2006). The results of the analysis also indicate that aggregated or unfolded material represents less than 5% of the population (data not shown). Furthermore, other models (e.g., monomer-trimer or stable dimer) were inadequate in their ability to

[Figure 2.2 plots. Panel A: Absorbance (A230) versus Radial distance (cm) for three loading concentrations. Panel B: residuals (Δ A230) versus Radial distance (cm).]

Figure 2.2 Sedimentation velocity analysis of PR-A. Presented in Panel A are three initial PR-A loading concentrations sedimented at 50,000 rpm. Left plot, 1.5 µM PR-A; middle plot, 1.0 µM PR-A; right plot, 0.5 µM PR-A. Squares represent absorbance scans as a function of time. Solid lines represent simultaneous analysis of all three loading concentrations using a monomer-dimer assembly model. The RMSD for the fit was 0.0037 absorbance units. Shown in Panel B are the residuals from the global analysis of the data sets presented in Panel A. For clarity, only every other scan from the analysis is shown.
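To give a feel for the populations behind these scans, the sketch below solves the monomer-dimer mass balance at the three loading concentrations. It assumes that the dimerization constant quoted in the text acts as a dissociation constant, Kd = [M]²/[D], with a value of 2 µM, and that loading concentrations are expressed per monomer chain. Both are interpretations made here for illustration, and the function name is hypothetical.

```python
import math

KD_UM = 2.0   # assumed dimer dissociation constant, uM (Kd = [M]^2 / [D])


def monomer_dimer(total_um, kd_um=KD_UM):
    """Return (free monomer, dimer) in uM for a total concentration of chains."""
    # Mass balance: total = M + 2*D with D = M^2 / Kd, so 2*M^2/Kd + M - total = 0.
    monomer = kd_um * (math.sqrt(1.0 + 8.0 * total_um / kd_um) - 1.0) / 4.0
    dimer = monomer * monomer / kd_um
    return monomer, dimer


for total in (1.5, 1.0, 0.5):
    m, d = monomer_dimer(total)
    print(f"{total:.1f} uM loaded: monomer {m:.3f} uM, dimer {d:.3f} uM, "
          f"{100 * 2 * d / total:.0f}% of chains in dimers")
```

Under these assumptions both species are substantially populated at every loading concentration, which is the regime in which a global monomer-dimer fit is well constrained.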


describe the data. It is important to recognize that the ability of the fit to predictively account for the presence of monomers and dimers strongly suggests that both populations are structurally homogeneous with respect to their hydrodynamic properties. We note that this conclusion would be impossible to reach using more biochemically oriented approaches, including small-zone size-exclusion chromatography experiments (Ackers, 1970).

Sedimentation velocity excels at characterizing the size and shape properties of the macromolecule of interest. However, in order to rigorously determine the energetics of a self-associating system, sedimentation equilibrium is the preferred method. Our sedimentation equilibrium experiments were conducted with three concentrations of protein equilibrated at three rotor speeds, under conditions identical to those of the velocity experiments described earlier (Connaghan-Jones et al., 2006). Global analysis of the nine data sets to a monomer-dimer model resolved a free energy change of −7.6 kcal/mol or a dissociation constant of 1 µM. Alternative models such as noninteracting dimer or higher-order assembly interactions (e.g., dimer-tetramer) either did not adequately describe the data or failed to converge. This result reflects the power of global analysis in discriminating between models. By contrast, analysis of a single data set can result in multiple and sometimes contradictory models being equally capable of describing the data.

In summary, independent sedimentation velocity and sedimentation equilibrium experiments support the same conclusion: PR isoforms exist as structurally homogeneous monomers and dimers in rapid equilibrium. Furthermore, the energetics of self-association are dramatically weaker than the nanomolar affinity estimated using cellular extracts and semiquantitative approaches (Skafar, 1991). Thus, taken together with the spectroscopic functional analyses, these results allow us to analyze and interpret subsequent studies with great confidence.
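The free energies and dissociation constants quoted for the two sedimentation methods are related by ΔG = RT ln Kd, with Kd in molar units and ΔG referring to dimer formation. The sketch below performs the conversion, taking the free energies as −7.1 and −7.6 kcal/mol. The temperature of 4 °C is an assumption made here, not a value given in this section; it was chosen because it reproduces micromolar dissociation constants of about 2 and 1, respectively, to within rounding.

```python
import math

R = 1.987e-3     # gas constant, kcal/(mol*K)
T = 277.15       # assumed temperature in K (4 C; not stated in this section)


def kd_from_dg(dg_kcal_per_mol, temperature=T):
    """Dissociation constant (M) from the free energy of association (kcal/mol)."""
    return math.exp(dg_kcal_per_mol / (R * temperature))


def dg_from_kd(kd_molar, temperature=T):
    """Free energy of association (kcal/mol) from a dissociation constant (M)."""
    return R * temperature * math.log(kd_molar)


for label, dg in (("sedimentation velocity", -7.1),
                  ("sedimentation equilibrium", -7.6)):
    print(f"{label}: dG = {dg} kcal/mol -> Kd = {kd_from_dg(dg) * 1e6:.1f} uM")
```

A difference of 0.5 kcal/mol corresponds to roughly a 2.5-fold change in the dissociation constant at this temperature, which helps in judging how closely the two methods agree.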

3. Dissecting Linked Assembly Reactions

The results of the spectroscopic and AUC studies serve two purposes. The most immediate is that one can assess the quality and integrity of the receptor prior to carrying out further analysis. Less obviously, an independent and explicit knowledge of the dimerization constant is crucial for determining the actual or intrinsic DNA binding affinity of the protein. This point is expanded on in Fig. 2.3A. Shown schematically is the standard biochemical model for PR-promoter binding. When cast in quantitative terms, the ligand-bound receptor dimerizes in the absence of DNA with a dimerization constant of kdim. The preformed dimer then binds to a palindromic response element with an intrinsic binding constant of k2.

[Figure 2.3 schematic. Panel A: dimer-binding pathway. Panel B: monomer-binding pathway. Labels in the schematic: kdim, k1, k2, kc1, kc2.]

Figure 2.3 Schematic depiction of selected PR:PRE2 assembly states—Panel A highlights the dimer-binding pathway: Circles represent hormone-bound PR-B structure. Squares represent PR-A solution dimers or PR-A bound to the PRE2 promoter (k2). Panel B highlights the monomer-binding pathway: Successive monomer binding at a palindromic response element (k1) is accompanied by an intrasite cooperative interaction (kc1), which is represented schematically by a transition from a filled circle to a filled square. Saturation of the two response elements is accompanied by an intersite cooperative interaction (kc2). Site 1 is indicated by the filled rectangle. Open rectangle represents site 2. White and black lines separate the two half-sites in site 1 and 2 respectively. Arrow refers to the direction of transcriptional start site.

Binding at multiple response elements is accompanied by cooperative interactions between the sites, kc2. Noting this, it is critical to recognize that an apparent dissociation constant (Kapp) determined by taking a half-saturation value from a titration curve (e.g., as in a gel shift assay) is typically a composite of these microscopic terms, much as a Km for an enzyme-substrate reaction is a composite of multiple rate constants. In other words, Kapp (or Km) is a macroscopic value that offers no microscopic and little mechanistic insight: the value cannot be assigned to any single reaction or interaction. Moreover, it is usually impossible to deconvolute the microscopic parameters from the macroscopic value because of high parameter correlation. Thus only with an explicit and independent determination of the dimerization constant, kdim, is it possible to resolve the microscopic DNA-binding parameters, k2 and kc2.

In order to determine the parameters associated with promoter binding, we use quantitative footprint titrations (Brenowitz et al., 1986; Connaghan-Jones et al., 2008b) coupled to statistical thermodynamic-based theory (Ackers et al., 1982). Particularly for complex promoters containing multiple PRE binding sites, quantitative footprinting may be the optimal technique. This is so because a footprint titration reports on binding at each site, and thus in principle allows resolution of microscopic interaction parameters. When this approach is carried out under conditions in which the DNase obeys “single-hit” kinetics (i.e., on average, no DNA fragment is nicked more than once), the resultant binding isotherms can be considered


thermodynamically valid. Under this condition, physically meaningful energetics can be extracted from the data. In order to describe this approach in some detail, we present an analysis of a PR-regulated promoter containing two identical and palindromic response elements (PRE2). Although not a naturally occurring promoter, several features make the PRE2 template worthy of examination. First, the landscape of the promoter allows one to resolve several types of microscopic energetic terms: intrinsic dimer binding affinity, intrinsic monomer binding affinity, and any microscopic cooperative interactions within and between palindromic PREs. Additionally, because the promoter contains only two binding sites of identical sequence, the mathematical formulations describing the binding reactions can be derived in a straightforward manner. Finally, there is an extensive database of cellular studies available for comparing to, and correlating with, the thermodynamic results. Shown in Fig. 2.4A is a quantitative footprint titration of the PRE2 promoter using the PR-A isoform. It is evident that protection from DNase activity is limited to two regions known by dideoxy sequencing analysis to include the palindromic response elements and one or two flanking nucleotides. Shown in Fig. 2.4B are the individual-site binding isotherms generated from the footprint. (The details of the quantification process are presented elsewhere; Connaghan-Jones et al., 2008b.) Also shown is the binding isotherm derived from a mutated or reduced-valency promoter containing only a single PRE (PRE1). The three binding isotherms can be analyzed as described subsequently in order to resolve the microscopic binding energetics of PR-A promoter assembly. Our goal is to formulate equations that describe a physically meaningful, microscopic binding model that also directly accounts for the observed binding data. 
We contrast this approach with the use of models of indeterminate molecular relevance (e.g., the Hill equation) and with fits to transformed or linearized binding curves (e.g., Scatchard plots).

As an introduction to this process, Table 2.1 lists the possible microstates (s) associated with PR-A binding to the PRE2 promoter. Two models are presented: the first assumes that only dimers are capable of binding and the second assumes that only monomers bind. Representative assembly states for each binding model are depicted schematically in Fig. 2.3. As previously described, in the dimer-binding model monomers first assemble in solution with an affinity of kdim. This value is determined independently using sedimentation techniques. The preformed dimer can bind at either PRE with an intrinsic affinity of k2. Upon binding two dimers at the promoter, there is a cooperative interaction between the two PREs (kc2). In the monomer-binding model, monomer interactions occur at each PRE half-site with an intrinsic affinity of k1. A second monomer can be cooperatively recruited to the same PRE (kc1), and upon saturation of all half-sites there is an intersite cooperative interaction between the PREs (kc2) identical to that described for the dimer-binding pathway.

[Figure 2.4. Panel A: footprint titration image with sites 1 and 2 marked and lanes ordered by increasing [PR-A]. Panel B: fractional saturation (Ȳ) versus [PR-A total] (M), 10^−11 to 10^−6.]

Figure 2.4 Quantitative footprint titration and individual site binding isotherms for the PR-A:PRE2 binding interaction. Shown in Panel A is an image of the footprint titration of the PRE2 promoter. A schematic of the promoter landscape is shown to the right of the image. Triangle at the bottom of the image indicates increasing PR-A concentration. Coincident with increasing protein concentration is the disappearance of bands previously identified as the nucleotides located in sites 1 (filled rectangle) and 2 (open rectangle). Shown in Panel B are the individual site-binding isotherms associated with sites 1 (filled squares) and 2 (open squares) of the PRE2 promoter. Also shown is the binding isotherm from the reduced-valency PRE1 promoter (open circles). Solid line represents the best fit for binding to sites 1 and 2 on the PRE2 promoter. Dashed line represents the best fit for binding to the sole site on the PRE1 promoter. Best fits of the data were obtained using the dimer-binding model presented in Table 2.1 and schematically depicted in Fig. 2.3. Identical results were achieved with the monomer-binding model.

Table 2.1 PRE2 species binding configurations and associated free energy changes

Model     Species (s)   Site 1 (a)   Site 2 (a)   Microscopic free energy contribution (b)   Macroscopic free energy (c)
Dimer     1             –            –            Reference state                            ΔGs(1)
Dimer     2             xx           –            ΔG2 + ΔGdim                                ΔGs(2)
Dimer     3             –            xx           ΔG2 + ΔGdim                                ΔGs(3)
Dimer     4             XX           XX           2ΔG2 + 2ΔGdim + ΔGc2                       ΔGs(4)
Monomer   5             x_           –            ΔG1                                        ΔGs(5)
Monomer   6             _x           –            ΔG1                                        ΔGs(6)
Monomer   7             –            x_           ΔG1                                        ΔGs(7)
Monomer   8             –            _x           ΔG1                                        ΔGs(8)
Monomer   9             xx           –            2ΔG1 + ΔGc1                                ΔGs(9)
Monomer   10            –            xx           2ΔG1 + ΔGc1                                ΔGs(10)
Monomer   11            x_           x_           2ΔG1                                       ΔGs(11)
Monomer   12            x_           _x           2ΔG1                                       ΔGs(12)
Monomer   13            _x           x_           2ΔG1                                       ΔGs(13)
Monomer   14            _x           _x           2ΔG1                                       ΔGs(14)
Monomer   15            xx           x_           3ΔG1 + ΔGc1                                ΔGs(15)
Monomer   16            xx           _x           3ΔG1 + ΔGc1                                ΔGs(16)
Monomer   17            x_           xx           3ΔG1 + ΔGc1                                ΔGs(17)
Monomer   18            _x           xx           3ΔG1 + ΔGc1                                ΔGs(18)
Monomer   19            XX           XX           4ΔG1 + 2ΔGc1 + ΔGc2                        ΔGs(19)

(a) A PR-A protomer bound to a site is indicated by "x." Cooperative binding between nonadjacent sites is indicated by "X." A dash (–) indicates a vacant palindrome. An underscore (_) indicates a vacant half-site adjacent to a bound half-site.
(b) The microscopic free energies describing each species, where ΔGdim is the independently determined free energy of solution dimerization, ΔG2 is the intrinsic free energy of binding a solution dimer, ΔG1 is the intrinsic free energy of binding a monomer, ΔGc1 is the cooperative free energy of binding a second monomer to a palindromic PRE, and ΔGc2 is the cooperative free energy between the sites induced upon saturation of the promoter. Free energy changes are related to the constants by the standard relationship ΔGi = −RT ln ki, where R is the gas constant and T is the temperature in Kelvin.
(c) The defined macroscopic free energy change for each species.

Quantitative footprinting measures the fractional saturation (Ȳ) of each individual binding site at the promoter. The extent of saturation is mathematically equivalent to the probability of binding. Thus, in order to connect each experimental binding isotherm to a mathematical model, equations must be derived that sum the probabilities of all protein-DNA microstates (fs) that contribute to the footprint signal at each site (Table 2.1). For example, the fractional saturation of site 1 (Ȳ1) on the PRE2 promoter via a dimer-binding model is:

$$\bar{Y}_1 = f_2 + f_4, \tag{2.1}$$

whereas the fractional saturation for site 2 (Ȳ2) is:

$$\bar{Y}_2 = f_3 + f_4. \tag{2.2}$$

From statistical thermodynamics, the probability of microstate s (fs) at any protein concentration is (Hill, 1960):

$$f_s = \frac{\exp\left(-\Delta G_{s(s)}/RT\right)[x]^{j}}{\sum_{s}\exp\left(-\Delta G_{s(s)}/RT\right)[x]^{j}}, \tag{2.3}$$

where ΔGs(s) is the sum of the microscopic free energy changes for species s as defined in Table 2.1 (for example, the microscopic free energy terms associated with two dimers binding to the PRE2 promoter are 2ΔGdim + 2ΔG2 + ΔGc2 = ΔGs(4)); [x] is the free monomer protein concentration as determined by solving the conservation of mass equation for a monomer-dimer equilibrium; j is the binding stoichiometry of species s; R is the gas constant; and T is the absolute temperature. Substituting Eq. (2.3) into Eq. (2.1) yields the equation describing binding at site 1 of the PRE2 promoter:

$$\bar{Y}_1 = \frac{k_{dim}k_2[x]^2 + k_{dim}^2k_2^2k_{c2}[x]^4}{1 + 2k_{dim}k_2[x]^2 + k_{dim}^2k_2^2k_{c2}[x]^4}, \tag{2.4}$$


where kdim, k2, kc2, and [x] are as defined previously. The equation describing binding to site 2 of the PRE2 promoter is identical to Eq. (2.4) because of the symmetric identity of the two sites. The equations for binding at sites 1 and 2 can then be used to fit the two individual-site binding isotherms. However, because of high correlation among the binding parameters, the isotherms from the wild-type template must be included in a simultaneous (global) analysis with the single isotherm from a reduced-valency template (PRE1). The equation describing binding of a preformed dimer to the PRE1 promoter is:

$$\bar{Y}_1 = \frac{k_{dim}k_2[x]^2}{1 + k_{dim}k_2[x]^2}. \tag{2.5}$$
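As a rough illustration of how Eqs. (2.4) and (2.5) are evaluated, the following Python sketch solves the monomer-dimer conservation of mass equation for the free monomer concentration and then computes the fractional saturation on each template. It is not the authors' fitting code. The function names are hypothetical, the temperature of 4 °C is an assumption (this section does not state it), and the Table 2.2 entries for PR-A are read as negative (favorable) free energies.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 277.15   # ASSUMPTION: 4 degrees C; this section does not state the temperature


def k_from_dg(dg_kcal, temp=T_KELVIN):
    """Association constant (M^-1) from a free energy, using dG = -RT ln k."""
    return math.exp(-dg_kcal / (R_KCAL * temp))


def free_monomer(total, k_dim):
    """Free monomer [x] from conservation of mass: total = [x] + 2*k_dim*[x]^2."""
    # Rationalized form of the positive quadratic root; stable when k_dim*total is small.
    return 2.0 * total / (1.0 + math.sqrt(1.0 + 8.0 * k_dim * total))


def y_pre2_site(x, k_dim, k_2, k_c2):
    """Eq. (2.4): fractional saturation of site 1 (or site 2) on PRE2."""
    one_dimer = k_dim * k_2 * x**2                  # species 2 or species 3
    two_dimers = (k_dim * k_2) ** 2 * k_c2 * x**4   # species 4
    return (one_dimer + two_dimers) / (1.0 + 2.0 * one_dimer + two_dimers)


def y_pre1(x, k_dim, k_2):
    """Eq. (2.5): fractional saturation of the single site on PRE1."""
    one_dimer = k_dim * k_2 * x**2
    return one_dimer / (1.0 + one_dimer)


# PR-A values from Table 2.2 (kcal/mol), read as negative (favorable) free energies.
K_DIM = k_from_dg(-7.6)
K_2 = k_from_dg(-11.4)
K_C2 = k_from_dg(-0.4)

for total in (1e-10, 1e-9, 1e-8, 1e-7, 1e-6):
    x = free_monomer(total, K_DIM)
    print(f"{total:.0e} M total: PRE2 site = {y_pre2_site(x, K_DIM, K_2, K_C2):.3f}, "
          f"PRE1 = {y_pre1(x, K_DIM, K_2):.3f}")
```

Under these assumptions the PRE1 curve should reach half-saturation at a few tens of nanomolar total protein, even though the intrinsic dimer affinity corresponds to roughly 1 nM. Setting k_c2 to 1 makes the PRE2 expression collapse to the PRE1 expression, which is a quick check that the two equations are consistent.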

The results of such an analysis are shown as the solid lines through the data in Fig. 2.4B. It is evident that the model does an excellent job of describing the three binding isotherms. Using a similar approach, expressions for the binding of sequential PR monomers to both the PRE2 and the PRE1 promoters can be derived and then fit to the experimental data (Heneghan et al., 2006). The resolved energetics associated with each binding model are presented in Table 2.2 (Connaghan-Jones et al., 2007). For comparison, the results of an identical analysis of PR-B interactions at the PRE2 promoter are also shown (Heneghan et al., 2006).

Table 2.2 Resolved free energy changes and differences for PR-A and PR-B:PRE2 binding interactions

Interaction free energy   PR-A (a) (kcal mol−1)   PR-B (b) (kcal mol−1)   ΔΔG (c) (kcal mol−1)
ΔG2                       −11.4 ± 0.1             −12.8 ± 0.1             −1.4 ± 0.1
ΔGc2 (d)                  −0.4 ± 0.2              −2.5 ± 0.1              −2.1 ± 0.2
ΔG1                       −8.4 ± 0.4              −9.4 ± 0.2              −1.0 ± 0.4
ΔGc1                      −1.7 ± 0.9              −0.9 ± 0.5              0.8 ± 1.0
ΔGc2 (e)                  −0.9 ± 0.5              −3.3 ± 0.5              −2.4 ± 0.7
ΔGdim                     −7.6 ± 0.6 (f)          −7.2 ± 0.7 (f)          0.4 ± 0.9

(a) Data taken from reference (Connaghan-Jones et al., 2007).
(b) Data taken from reference (Heneghan et al., 2006).
(c) Free energy difference for each PR-A and PR-B parameter.
(d) Resolved from the dimer binding pathway.
(e) Resolved from the monomer binding pathway.
(f) Free energy change for solution dimerization measured independently using sedimentation equilibrium (Connaghan-Jones et al., 2006; Heneghan et al., 2005).

Although visual inspection of the binding curves suggests consistency with earlier biochemical studies (Onate et al., 1994; Tsai et al., 1989), our computational analysis reveals a number of results inconsistent with the traditional model of receptor function. First, noting the values presented in Table 2.2, it should be evident that the 30 nM apparent binding affinity of PR-A for the PRE1 promoter (as determined by a half-saturation value; Fig. 2.4B) has little similarity to the −11.4 kcal/mol, or 1 nM, intrinsic binding affinity for actual dimer binding. This discrepancy arises because the data as presented are plotted in units of total protein concentration rather than dimer concentration. That is, the binding curves as visualized do not take into account the linked dimerization reaction as measured by sedimentation analysis. Thus, the apparent moderate binding affinity toward the promoter is a composite of the extremely weak dimerization affinity and extremely strong DNA binding affinity. Previous semiquantitative analyses of PR-A and PR-B interactions with a promoter response element suggested that the two isoforms bound to DNA with similar affinities (Onate et al., 1994). This conclusion was based on visual inspection of binding curves that had not taken into account the linked PR dimerization reaction that occurs in the absence of DNA. However, when a rigorous analysis is performed it becomes apparent that PR-B in fact has a greatly enhanced intrinsic binding affinity and cooperative binding ability relative to PR-A (see Table 2.2). Specifically, PR-B dimers have a nearly 13-fold increase in intrinsic binding energetics at an individual response element relative to PR-A. Furthermore, on the PRE2 promoter, the B-isoform is able to exhibit greatly increased cooperative interactions relative to PR-A.
Because the only difference between the two isoforms is the additional 164-residue B-unique sequence (BUS) found within PR-B (Sartorius et al., 1994), these residues must allosterically regulate the energetics of cooperative promoter assembly. The functional implications of these results are discussed later. Surprisingly, high-affinity binding of each isoform is opposed by a large energetic penalty. This penalty can be seen by comparing the dimerization free energy, ΔGdim, and the ΔGc1 cooperativity term (i.e., DNA-induced dimerization) for either PR-A or PR-B. The difference between the two terms amounts to approximately +6 kcal/mol, or an approximately 50,000-fold decrease in binding affinity. The penalty can also be seen as the difference between the energetics of successive monomers binding to a PRE relative to binding of a preformed dimer. In both cases, structural changes in the DNA (Connaghan-Jones et al., 2007; Heneghan et al., 2006) and protein (Bain et al., 2000, 2001) that are coupled to dimer assembly at a palindromic response element are likely explanations for the penalty.
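The conversions quoted above (a free energy to a dissociation constant, and a free energy difference to a fold change) follow directly from ΔG = −RT ln k. The short Python sketch below reproduces them for illustration only. The helper names are hypothetical, the 4 °C temperature is an assumption not stated in this section, and the Table 2.2 values are read as negative (favorable) free energies.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
T_KELVIN = 277.15   # ASSUMPTION: 4 degrees C; this section does not state the temperature


def dissociation_constant(dg_kcal, temp=T_KELVIN):
    """K_d in M for a binding free energy in kcal/mol (dG = -RT ln k, K_d = 1/k)."""
    return math.exp(dg_kcal / (R_KCAL * temp))


def fold_change(ddg_kcal, temp=T_KELVIN):
    """Ratio of two equilibrium constants whose free energies differ by |ddg_kcal|."""
    return math.exp(abs(ddg_kcal) / (R_KCAL * temp))


# PR-A values from Table 2.2, read as negative (favorable) free energies.
dg_2, dg_dim, dg_c1 = -11.4, -7.6, -1.7
penalty = dg_c1 - dg_dim   # DNA-induced dimerization relative to solution dimerization

print(f"intrinsic dimer K_d: {dissociation_constant(dg_2):.1e} M")
print(f"PR-B versus PR-A (1.4 kcal/mol): {fold_change(1.4):.0f}-fold")
print(f"penalty: {penalty:+.1f} kcal/mol, {fold_change(penalty):.0f}-fold")
```

At the assumed temperature these come out near 1 nM, roughly 13-fold, and tens of thousands-fold, in line with the figures quoted in the text. The same free energies evaluated at 25 °C would give noticeably different constants, so the temperature assumption matters.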


4. Analysis and Dissection of Natural Promoters

The data presented in Fig. 2.4 correspond to an entirely synthetic, PR-regulated promoter. However, a more complete understanding of PR function also requires the study of naturally occurring promoters. The mouse mammary tumor virus (MMTV) promoter has four clearly recognizable binding sites corresponding to one palindromic PRE and three half-site PREs (Fig. 2.5A). A meaningful dissection of this complex promoter requires answering several key questions. First, is it possible to formulate a rigorous model that is also consistent with the data? Second, is it possible to demonstrate that only one model (or small subset of models) is capable of describing MMTV-PR assembly? And finally, is it possible to demonstrate that the binding parameters associated with the model are well resolved, and thus demonstrate that the data support the level of model complexity? The initial step in analysis is development of a microscopic model. On the basis of our analysis of the PRE2 promoter (Connaghan-Jones et al., 2007; Heneghan et al., 2006), our sedimentation studies (Connaghan-Jones et al., 2006; Heneghan et al., 2005), and earlier biochemical studies (Bailly et al., 1991; Chalepakis et al., 1988; Perlmann et al., 1990), we formulated a PR-A:MMTV binding model depicted schematically in Fig. 2.5B and tabulated in Table 2.3 (Connaghan-Jones et al., 2008a). The rules that make up this model are as follows:

1. PR-A monomers assemble at half-sites with an intrinsic affinity described as ΔG1 (species 2–4 in Table 2.3).
2. A solution dimer of PR-A binds at the palindromic PRE (site 1) with an intrinsic affinity described by ΔG2 (species 5).
3. A second monomer binds to the remaining half-sites with a cooperative interaction described by ΔGc1 (species 6–8). There is no cooperativity between a monomer (or pairs of monomers) bound at a half-site and the dimer bound at the palindrome (species 9–14).
4. Nonadditive cooperativity associated with complete saturation of the half-sites is accounted for by ΔGc2 (species 15).
5. Cooperativity between the half-sites and the palindrome occurs only upon complete saturation of the three half-sites and is described by ΔGc3 (species 16).
6. PR-A undergoes a monomer-dimer association reaction described by the previously measured dimerization affinity (ΔGdim).

In the case of the MMTV promoter, seven templates were required to resolve all binding parameters. The specific templates used in the analysis were the wild-type promoter (MMTVwt); promoters lacking either site 1

[Figure 2.5. Panel A: map of the MMTV promoter with elements labeled in order PRE-1, PRE-2, PRE-3, PRE-4, NF-1, OCT-1, OCT-1, and TATA; positions −167, −111, −90, −75, −68, −48, −35, and −22 are marked. Panel B: assembly states linked by kdim, k1, k2, kc1, kc2, and kc3.]

Figure 2.5 Schematic of the MMTV promoter and select assembly states from the PR-A:MMTV binding model. Shown in Panel A is a schematic of the MMTV promoter. PREs are indicated by open rectangles and labeled 1–4. Site 1 corresponds to the palindromic PRE (GTTACAAACTGTTCT); sites 2–4 correspond to the three half-site PREs of identical sequence (TGTTCT). Binding sites for cofactors are indicated by shaded rectangles and labeled as NF-1, OCT-1, and TATA. The numbers above the schematic indicate bp position relative to the transcriptional start site. The location of the transcriptional start site is as indicated by the arrow above the schematic. Shown in Panel B is the binding model for the PR-A:MMTV interaction. Promoter layout is as described in Panel A. PR-A monomers (filled circle) can either dimerize in solution (depicted by the transition from filled circles to filled squares; kdim) or bind at a half-site (k1). A solution dimer can bind at the palindromic site (k2). Two monomers bound at half-sites cooperatively interact in a pairwise fashion (kc1). Nonadditive cooperativity induced by addition of a third monomer to the half-sites is accounted for by kc2. Saturation of the MMTV promoter is linked to a third cooperative interaction between the palindrome and the three half-sites (kc3).

Table 2.3 Model configurations and free energies for PR-A:MMTV binding

              Site occupancy (a)
Species (s)   Site 1   Site 2   Site 3   Site 4   Free energy contribution
1             –        –        –        –        Reference state
2             –        x        –        –        ΔG1
3             –        –        x        –        ΔG1
4             –        –        –        x        ΔG1
5             xx       –        –        –        ΔGdim + ΔG2
6             –        X        X        –        2ΔG1 + ΔGc1
7             –        X        –        X        2ΔG1 + ΔGc1
8             –        –        X        X        2ΔG1 + ΔGc1
9             xx       x        –        –        ΔGdim + ΔG2 + ΔG1
10            xx       –        x        –        ΔGdim + ΔG2 + ΔG1
11            xx       –        –        x        ΔGdim + ΔG2 + ΔG1
12            xx       X        X        –        ΔGdim + ΔG2 + 2ΔG1 + ΔGc1
13            xx       X        –        X        ΔGdim + ΔG2 + 2ΔG1 + ΔGc1
14            xx       –        X        X        ΔGdim + ΔG2 + 2ΔG1 + ΔGc1
15            –        X        X        X        3ΔG1 + 2ΔGc1 + ΔGc2
16            XX       X        X        X        ΔGdim + ΔG2 + 3ΔG1 + 2ΔGc1 + ΔGc2 + ΔGc3

17            –        X        X        X        3ΔG1 + ΔGc234
18            XX       X        X        X        ΔGdim + ΔG2 + 3ΔG1 + ΔGc1234

(a) Each site on the MMTVwt promoter is indicated as sites 1–4. Each row represents one of the possible microscopic configurations. A PR-A protomer bound to a site is indicated by "x." Cooperative PR-A binding to nonadjacent sites is indicated by "X." Dash (–) indicates a vacant site.

(MMTV1), site 3 (MMTV3) or site 4 (MMTV4); sites 1 and 3 (MMTV1,3) or sites 1 and 4 (MMTV1,4); and sites 2, 3 and 4 (MMTV2,3,4). Using quantitative footprinting, individual-site binding curves were generated for the interaction of PR-A with each template. The binding curves associated with each site and each template are presented in Fig. 2.6 (Connaghan-Jones et al., 2008a). Using this model, the data from the wild-type and six reduced-valency templates were fit simultaneously. The resolved parameter values are shown in Table 2.4 (Connaghan-Jones et al., 2008a). Visual inspection of the fit indicates that the model does an excellent job at describing the binding isotherms. Thus the first criterion of model validation, addressing whether the model can account for the experimental data, would appear to be satisfied. However, in order to rigorously validate the model, one must examine quantitatively the degree to which the model

[Figure 2.6, Panels A–G: fractional saturation (Ȳ, axis spanning −0.4 to 1.4) versus [PR-A total] (M), 10^−11 to 10^−6, one panel per promoter template.]

Figure 2.6 MMTV:PR-A individual site binding isotherms for the wild-type and six reduced-valency templates. The fractional saturation of each promoter template is shown as a function of total PR-A concentration. Each panel contains an inset depicting the specific reduced-valency template. The "X" indicates a nonfunctional site. Shown above a functional site is the symbol used to depict binding to that specific site (filled squares, site 1; open circles, site 2; open diamonds, site 3; open triangles, site 4). Solid lines represent best fit to site 1 from the global analysis. Dotted lines represent best fit to individual half-sites from the global analysis. Panels A, MMTVwt; B, MMTV1; C, MMTV3; D, MMTV4; E, MMTV1,3; F, MMTV1,4; G, MMTV2,3,4.

Table 2.4 Resolved free energy changes for PR-A:MMTV binding interactions (a)

Interaction free energy   kcal mol−1   68% confidence (kcal mol−1)
ΔG1                       −8.1         −7.9 to −8.2 (b)
ΔG2                       −11.2        −11.2 to −11.3 (b)
ΔGc1                      −2.0         −1.9 to −2.5 (b)
ΔGc2                      +1.3         +0.8 to +2.3 (b)
ΔGc3                      −0.9         −0.1 to −1.9 (b)
ΔGc234                    −3.0         ±0.7 (c)
ΔGc1234                   −3.9         ±0.2 (c)

(a) Values previously published in reference (Connaghan-Jones et al., 2008a).
(b) Errors represent 68% confidence intervals established from Monte Carlo analysis.
(c) Errors represent 68% confidence intervals reported from the program Scientist.

describes the data. We examine this issue by testing the various rules that make up the model. Perhaps the most unorthodox rule for PR-MMTV promoter interactions is that monomers bind at individual half-sites (rule 1). In order to confirm such stoichiometry, we used sedimentation equilibrium to measure


the reduced molecular weight of an MMTV promoter containing only the three half-sites when in the presence of increasing concentrations of PR-A. The resolved reduced molecular weights (σ) are shown in Fig. 2.7 (Connaghan-Jones et al., 2008a). Comparison of the resolved σ values with the predicted σ for saturation of monomers at the three half-sites (short dashed line) is clearly consistent with a binding stoichiometry of one monomer per site. More compellingly, the predicted binding isotherm based on the experimentally resolved binding energetics (Tables 2.3 and 2.4) is entirely consistent with the sedimentation data. By contrast, a calculated binding isotherm that assumes only dimers bind at half-sites is entirely at odds with the experimental data.

[Figure 2.7: σ (cm−2), axis spanning 2.0 to 7.0, versus [PR-A total] (μM), 0.0 to 0.4; the inset repeats the plot on a logarithmic concentration axis from 1.E−02 to 1.E+04.]

Figure 2.7 Sedimentation equilibrium analysis of MMTV1 binding stoichiometry. Plot of the measured sigma values from two independent experiments (closed squares and open circles) covering a range of 0 to 0.275 μM PR-A in the presence of 20 nM MMTV1 promoter DNA. Error bars represent 68% confidence intervals. Shown as a solid line is the predicted sigma value for a monomer binding to each half-site. Shown as a dashed line is the predicted sigma value for a dimer binding to each half-site. Dotted line represents calculated sigma for three monomers saturating the three half-sites. The inset presents the identical plot in logarithmic scale covering a broader concentration range. Figure reproduced with permission (Connaghan-Jones et al., 2008a).

As an additional check of the model, the remaining rules were examined for internal consistency by comparing the resolved microscopic values with macroscopic values incorporating the same interactions. As an example, shown in Table 2.3 is the cooperative free energy contribution to the saturated MMTV promoter described using a macroscopic interaction parameter (ΔGc1234, corresponding to species 18). Similarly, the cooperativity associated with saturation of the three half-sites can be described with a macroscopic parameter (ΔGc234) shown as species 17. The relationships between the microscopic and macroscopic cooperativity parameters are shown in Eqs. (2.6) and (2.7).

$$\Delta G_{c1234} = 2\Delta G_{c1} + \Delta G_{c2} + \Delta G_{c3}. \tag{2.6}$$

$$\Delta G_{c234} = 2\Delta G_{c1} + \Delta G_{c2}. \tag{2.7}$$
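The rules of the MMTV model and Eqs. (2.6) and (2.7) can be turned into a small calculation. The Python sketch below enumerates the 16 microscopic configurations of Table 2.3 from the six rules, weights each one as in Eq. (2.3), and evaluates the two microscopic sums. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the 4 °C temperature is assumed, ΔGdim is taken from the PR-A column of Table 2.2, and the Table 2.4 entries are read as negative unless marked "+".

```python
import math
from itertools import product

RT = 1.987e-3 * 277.15  # kcal/mol; ASSUMPTION: 4 degrees C, not stated in this section

# Free energies in kcal/mol. Entries are read as negative unless the source marks "+".
# "dim" is the PR-A dimerization value from Table 2.2; the rest are from Table 2.4.
DG = {"dim": -7.6, "1": -8.1, "2": -11.2, "c1": -2.0, "c2": 1.3, "c3": -0.9}


def species_free_energy(site1, half_sites, dg=DG):
    """Sum the microscopic terms for one configuration, following rules 1-6."""
    n = sum(half_sites)                    # monomers bound at half-sites 2-4
    total = n * dg["1"]
    if n == 2:
        total += dg["c1"]                  # pairwise cooperativity (species 6-8, 12-14)
    elif n == 3:
        total += 2 * dg["c1"] + dg["c2"]   # saturated half-sites (species 15, 16)
    if site1:
        total += dg["dim"] + dg["2"]       # solution dimer bound at the palindrome
        if n == 3:
            total += dg["c3"]              # palindrome:half-site term (species 16)
    return total


def site_saturation(x, dg=DG):
    """Fractional saturation of sites 1-4 at free monomer concentration x (M)."""
    partition = 0.0
    occupied = [0.0, 0.0, 0.0, 0.0]
    for state in product((0, 1), repeat=4):          # 16 configurations
        site1, half_sites = state[0], state[1:]
        j = 2 * site1 + sum(half_sites)              # monomers in the complex
        weight = math.exp(-species_free_energy(site1, half_sites, dg) / RT) * x**j
        partition += weight
        for i, bound in enumerate(state):
            if bound:
                occupied[i] += weight
    return [w / partition for w in occupied]


# Eqs. (2.6) and (2.7): macroscopic terms implied by the microscopic values
dg_c234 = 2 * DG["c1"] + DG["c2"]
dg_c1234 = dg_c234 + DG["c3"]
print(f"2*dGc1 + dGc2        = {dg_c234:+.1f} kcal/mol")
print(f"2*dGc1 + dGc2 + dGc3 = {dg_c1234:+.1f} kcal/mol")
print([round(y, 3) for y in site_saturation(1e-8)])
```

With the Table 2.4 values the two sums come to about −2.7 and −3.6 kcal/mol, which can be set against the fitted macroscopic values of −3.0 ± 0.7 and −3.9 ± 0.2 kcal/mol. Generating the configurations from the rules, rather than typing in the table, makes it easy to test an alternative rule by editing a single branch.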

The complete experimental data set can then be reanalyzed using the macroscopic parameters, ΔGc1234 and ΔGc234. The results are then used to check the accuracy of the microscopic formulation by examining whether the microscopic values sum to the macroscopic values. Because the resolved macroscopic parameters are in excellent agreement with their analogous sums of microscopic parameters (Table 2.4), we conclude that the model is internally consistent and that we have accounted for all major contributions to cooperativity.

The next issue is to determine the degree to which the model is unique in its ability to accurately describe the data. This necessitates testing differing models and analyzing subsets of the data. For example, the validity of rule 5 (cooperativity between the half-sites and the palindrome occurs only upon complete saturation of the three half-sites) was tested by fitting data from the MMTV3 and MMTV4 promoters. A model was used that allowed for a cooperative interaction between the half-sites and the palindrome. However, the resolved cooperativity term was found to be statistically equal to zero, consistent with the rule that palindrome:half-site cooperativity only occurs upon complete occupancy of the half-sites. A similar process was carried out for other rules of promoter binding (Connaghan-Jones et al., 2008a).

The third and final concern is whether the data support the level of model complexity. This was addressed using Monte Carlo simulations. The five model-dependent binding parameters resolved from the global analysis were used to generate simulated individual-site binding curves (covering the same protein concentration range and number of data points as the experimental isotherms) for each of the MMTV templates. Gaussian error was added to each data set such that the error in each in silico isotherm matched the error in the experimental data. The simulated data were then fit using the model described previously in order to resolve a new set of parameters. This process was carried out 100 times and the resultant parameter values were binned in a histogram distribution. The results of this analysis showed that the values of each model-dependent parameter (ΔG1, ΔG2, ΔGc1, ΔGc2, and ΔGc3) were grouped tightly about a peak value (Connaghan-Jones et al., 2008a). From these distributions the 68% confidence intervals were calculated. The confidence intervals (as shown in Table 2.4) were generally less than 15% of the resolved free energy changes, thus demonstrating that the wild-type and six reduced-valency templates were sufficient to resolve the five interaction terms.

Several important discoveries regarding PR promoter assembly were made upon model validation. First, in contrast to the traditional biochemical understanding that dimers are the only active binding species (Tsai and O'Malley, 1994), the approach used here demonstrates that it is monomers rather than dimers that bind at isolated half-sites. Furthermore, monomer rather than dimer binding accounts for the majority of binding energetics at the MMTV promoter. What this might mean functionally is still unclear; however, because DNA binding affinity is in the nanomolar range under conditions in which PR dimerization occurs in the micromolar range, there are very few dimers present upon initiation of DNA binding. It may be that monomers instead of dimers are the predominant binding species. Also, similar to what was observed for the PRE2 promoter, there are strong unfavorable forces associated with the assembly reaction. In this case, we again observe a large penalty associated with placing a PR dimer on a palindromic response element, but we also note that binding of monomers at the three half-sites is accompanied by a decrease in stability of +1.3 kcal/mol; thus, assembly at the promoter cannot occur via a simple, additive binding reaction. Finally, and again in contrast to previous conclusions (Bailly et al., 1991; Chalepakis et al., 1988; Perlmann et al., 1990), PR binds to the MMTV promoter with high cooperativity and with multiple types of cooperative interactions (ΔGc1, ΔGc2, and ΔGc3). It is important to note that the nature of the interactions indicates that the receptor follows a specific code or algorithm for assembling at the promoter.
The possible functional implications of cooperativity and this code are expanded upon subsequently.
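The Monte Carlo procedure used in this section to judge parameter resolution (simulate isotherms from the fitted parameters, add Gaussian error, refit, repeat 100 times, read off a 68% interval) can be sketched in a few lines of Python. The example is deliberately simplified and is not the authors' implementation: it substitutes a one-parameter, single-site isotherm for the full MMTV model, uses a golden-section search as the fitting routine, and assumes a temperature of 4 °C, a noise level of 0.05, and 16 log-spaced concentrations. All function names are hypothetical.

```python
import math
import random

RT = 1.987e-3 * 277.15  # kcal/mol; ASSUMPTION: 4 degrees C, not stated in this section


def isotherm(conc, dg):
    """Single-site fractional saturation for a binding free energy dg (kcal/mol)."""
    kx = math.exp(-dg / RT) * conc
    return kx / (1.0 + kx)


def fit_dg(concs, ys, lo=-16.0, hi=-4.0, tol=1e-6):
    """Least-squares estimate of dg by golden-section search on [lo, hi]."""
    def sse(dg):
        return sum((y - isotherm(c, dg)) ** 2 for c, y in zip(concs, ys))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)


def monte_carlo_interval(concs, dg_fit, noise_sd, n_trials=100, seed=0):
    """Refit noisy synthetic isotherms; return the central 68% interval of dg."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        noisy = [isotherm(c, dg_fit) + rng.gauss(0.0, noise_sd) for c in concs]
        estimates.append(fit_dg(concs, noisy))
    estimates.sort()
    lo_idx = int(round(0.16 * (n_trials - 1)))   # 16th percentile
    hi_idx = int(round(0.84 * (n_trials - 1)))   # 84th percentile
    return estimates[lo_idx], estimates[hi_idx]


concs = [10 ** (-11 + 5 * i / 15) for i in range(16)]   # 10^-11 to 10^-6 M
low, high = monte_carlo_interval(concs, dg_fit=-11.4, noise_sd=0.05)
print(f"68% interval for dG: {low:.2f} to {high:.2f} kcal/mol")
```

A narrow interval relative to the parameter value indicates that the simulated data constrain the parameter well. In a multi-parameter model the same loop would refit all parameters at once, so that correlation between parameters shows up as broadened distributions.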

5. Measuring the Energetics of Coactivator Recruitment

As alluded to in the introduction, a critical process in transcriptional activation is the localization of coactivating proteins to the promoter. Although localization is likely to occur through multiple mechanisms, evidence suggests that one way is through an allosteric process (Lefstin and Yamamoto, 1998), whereby receptor interactions at a promoter are coupled to enhanced interactions between the receptor and the coactivating protein. We have begun to address the origins of this phenomenon by studying the interaction of a domain of steroid receptor coactivator 2 (SRC2) with a PR-promoter complex. Shown in Fig. 2.8 is a thermodynamic cycle describing the pathways by which a coactivator could become recruited

[Figure 2.8: thermodynamic cycle with legs labeled ΔGa, ΔGb, ΔGc, and ΔGd.]

Figure 2.8 Thermodynamic cycle for SRC2 recruitment to the PRE1 promoter. SRC2 can be localized to the promoter through one of two pathways, by either binding to a preformed PR-A dimer:promoter complex (ΔGa + ΔGc) or interacting with an unbound PR-A dimer prior to promoter binding (ΔGb + ΔGd). Free energy changes are related to the equilibrium constants through the standard expression, ΔGi = −RT ln ki.

to a promoter containing a single PRE (PRE1). Simply summarized, the interaction of coactivator with free receptor can enhance the binding affinity of the receptor towards the promoter, or the DNA-bound receptor can enhance the binding of coactivator. The energetics of coactivator recruitment can be quantified as the difference in the PR-coactivator affinity in the absence and presence of DNA (ΔΔGrecruit = ΔGc − ΔGb, Fig. 2.8). Unfortunately, ΔGc and ΔGb are technically difficult to measure experimentally. By contrast, it is straightforward to measure ΔGa and ΔGd, and due to conservation of energy (i.e., ΔGa + ΔGc = ΔGb + ΔGd, so that ΔGc − ΔGb = ΔGd − ΔGa) it is possible to determine ΔΔGrecruit. Although not depicted, the identical approach can be applied to the parameters associated with binding at the PRE2 promoter, for example. From an experimental perspective, ΔGa can be determined by measuring the affinity of PR for a single PRE in the absence of SRC2 protein (see Tables 2.1 and 2.2). A measure of ΔGd is only slightly more complicated in that it entails measuring receptor-promoter interactions under saturating conditions of coactivator. That is, all PR, whether bound or not to the promoter, should be complexed with coactivator. In the case of PR-SRC2 interactions, measuring saturation was not directly possible due to the difficulty in quantitatively measuring the PR-coactivator interaction in isolation. However, saturating conditions were determined empirically by demonstrating that SRC2-induced effects on PR-promoter binding no longer changed when the coactivator concentration was increased. Finally, it is imperative to account for any effect that the coactivator might have on the self-association of PR in the absence of DNA (kdim). This issue can be examined using sedimentation analysis as described elsewhere (Heneghan et al., 2007).
Shown in Table 2.5 are the intrinsic and cooperative free energies for both PR-A monomer and dimer binding to the PRE2 promoter in the presence and absence of SRC2. Also shown are the free energy differences

Table 2.5 Resolved free energy changes and differences for PR-A:PRE2 binding in the presence and absence of SRC2 (a)

Interaction free energy   (+)SRC2 (b) (kcal/mol)   (−)SRC2 (b) (kcal/mol)   ΔΔG (c) (kcal/mol)
ΔG2                       −11.7 ± 0.1              −11.4 ± 0.1              −0.3 ± 0.1
ΔGc2                      −1.3 ± 0.2               −0.4 ± 0.2               −0.9 ± 0.3
ΔG1                       −8.7 ± 0.2               −8.3 ± 0.3               −0.4 ± 0.4
ΔGc1                      −1.4 ± 0.5               −2.1 ± 0.8               0.7 ± 0.9
ΔGc2                      −1.9 ± 0.3               −0.6 ± 0.2               −1.3 ± 0.4
ΔGdim                     −7.6 ± 0.6               −7.6 ± 0.6               –

(a) Values obtained from matched experiments previously published (Heneghan et al., 2007).
(b) Errors represent 68% confidence intervals as reported by the program Scientist (Micromath).
(c) Values represent parameter differences between (+) and (−) SRC2 experiments. Errors propagated using standard methods (Bevington, P. R., and Robinson, D. K. (1969). "Data Reduction and Error Analysis for the Physical Sciences." McGraw-Hill Higher Education, New York, NY).

(DDG) of PR-A:promoter interactions as a function of coactivator. These values revealed several previously unknown details with regard to coactivator recruitment. First, recruitment to an isolated PRE is possible but inefficient as judged by the modest 0.3 kcal/mol (two-fold) increase in the affinity of a PR-A dimer. More intriguing is a greatly enhanced recruitment of 0.9 to 1.3 kcal/mol (5- to 11-fold increase) associated with PR-A binding to the multisite PRE2 promoter. This increase in recruitment is not a simple result of an increased number of DNA-binding sites. Rather, it is due to significantly enhanced cooperative interaction between the two PREs. Thus recruitment of SRC2 is most efficient when cooperativity is allowed between PREs. To this end we are currently investigating the ability of PR to recruit coactivators on the MMTV promoter, attempting to determine which of the three types of cooperative interactions are associated with enhanced SRC2 binding.
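The conversion between a free energy difference and a fold change in affinity, and the propagation of errors into ΔΔG, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the temperature (277.15 K, i.e., 4 °C) is an assumption that is not stated in this section, and combining uncorrelated errors in quadrature is an assumed reading of "standard methods."

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)

def fold_change(ddg_kcal, temp_k=277.15):
    """Fold change in affinity for a free energy difference of magnitude ddg_kcal.

    temp_k defaults to an assumed experimental temperature of 4 degrees C.
    """
    return math.exp(abs(ddg_kcal) / (R_KCAL * temp_k))

def difference_with_error(value_plus, err_plus, value_minus, err_minus):
    """Difference of two fitted parameters, with errors combined in quadrature."""
    return value_plus - value_minus, math.hypot(err_plus, err_minus)

# Magnitudes from Table 2.5 (kcal/mol): dimer intrinsic binding with and without SRC2
ddg, ddg_err = difference_with_error(11.7, 0.1, 11.4, 0.1)
print(f"ddG = {ddg:.1f} +/- {ddg_err:.1f} kcal/mol, {fold_change(ddg):.1f}-fold")

# Cooperative contributions quoted in the text
for ddg_coop in (0.9, 1.3):
    print(f"{ddg_coop} kcal/mol -> {fold_change(ddg_coop):.1f}-fold")
```

Under the assumed temperature, the fold changes quoted in the text (roughly two-fold for 0.3 kcal/mol, and 5- to 11-fold for 0.9 to 1.3 kcal/mol) follow directly from exp(ΔΔG/RT).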

6. Correlation to Biological Function

The results of the studies presented here may offer new insight into the ability of PR to activate transcription. As discussed earlier, PR-B binds with enhanced affinity and cooperativity relative to PR-A. Using the experimentally determined free energies shown in Table 2.2, the probability of occupancy for the PRE2 promoter can be calculated for both isoforms as a function of receptor concentration (see Fig. 2.9) (Connaghan-Jones et al., 2007). The extent of biologically relevant promoter occupancy can then be approximated using the experimental estimate of intracellular PR concentration (Theofan and Notides, 1984). As shown in Fig. 2.9A, the fully occupied (i.e., transcriptionally active) microstate represents approximately 50% of the species distribution at physiological concentrations of PR-A. By contrast, for PR-B the fully ligated microstate accounts for nearly 100% of the population (Fig. 2.9B). Consistent with this, cellular assays of the transcriptional activity of each isoform on the PRE2 promoter demonstrate that PR-B is a two- to five-fold stronger activator (Meyer et al., 1992; Sartorius et al., 1994).

[Figure 2.9: Panels A and B plot probability (0.0 to 1.0) against total receptor concentration (10⁻¹⁰ to 10⁻⁶ M).]

Figure 2.9 Simulated species probabilities for PR-A and PR-B binding to the PRE2 promoter. Shown in Panel A is the species distribution for PR-A binding to the PRE2 promoter through the dimer pathway. Panel B depicts the same distribution for the PR-B:PRE2 interaction. Dashed line represents unbound PRE2 DNA. Solid line represents a single dimer bound to the promoter. Dotted line represents saturation of the PRE2 promoter by two dimers. Also shown is the proportion of the respective receptor isoform existing in a dimeric state in solution (+). The shaded box represents the experimental estimate of the intracellular PR concentration (Theofan and Notides, 1984).

Although one basis for the difference in isoform-specific binding occupancies is the increased intrinsic binding energetics of PR-B relative to PR-A (Connaghan-Jones et al., 2007), the more significant contributor is the difference in intersite cooperative free energy (ΔGc2; see Table 2.2). Thus, considering that cooperativity is part of the mechanism for both the PRE2 and MMTV promoters (Connaghan-Jones et al., 2007, 2008a; Heneghan et al., 2006), and that most promoters appear to contain multiple response elements capable of allowing cooperativity, our results predict that the B-isoform should regulate the majority of PR-regulated genes. Indeed, microarray analysis reveals that of 94 regulated genes, 65 were regulated by PR-B, 25 were regulated by both PR-B and PR-A, but only 4 were regulated by PR-A (Richer et al., 2002). Furthermore, computer simulations predict that promoter-specific modulation of cooperative interactions, and subtle changes in relative receptor concentrations, can potentially explain how PR-A can regulate genes while in the presence of PR-B (Connaghan-Jones et al., 2007).

The results of the simulations presented in Fig. 2.9 suggest that cooperativity is a key component of transcriptional regulation. This conclusion may take on increased importance when considering that cooperativity is necessary to efficiently recruit coactivators. Seen in Fig. 2.10 is the calculated probability of observing the fully ligated PRE1 and PRE2 promoters in either the presence or absence of SRC2 (Heneghan et al., 2007). Examination of the plot reveals several interesting features. The presence of SRC2 generates only an incremental increase in the probability of the fully ligated PRE1 microstate, whereas the probability of the fully ligated PRE2 microstate increases enormously, particularly in the range of 10⁻¹⁰ to 10⁻⁸ M PR dimer concentration. These results suggest that the linkage between cooperativity and coactivator recruitment can explain the long-observed synergistic increase in transcriptional activity that is associated with only a monotonic increase in promoter response elements (Meyer et al., 1992). These results may also shed new light on the functional significance of the multiple types of cooperativity seen for PR-A interactions at the MMTV promoter.

[Figure 2.10: Fractional saturation (Ȳ, 0.0 to 1.0) plotted against PR-A dimer concentration (10⁻¹⁴ to 10⁻⁶ M), with curves labeled (+)SRC2 and (−)SRC2.]

Figure 2.10 Simulated probabilities for the saturated PRE2 and PRE1 promoters in the presence and absence of SRC2. Fine solid line represents the fully saturated PRE2 promoter in the absence of SRC2. Fine dashed line represents the fully saturated PRE1 promoter in the absence of SRC2. Bold solid line represents the fully saturated PRE2 promoter in the presence of SRC2. Bold dashed line represents the fully saturated PRE1 promoter in the presence of SRC2. Probabilities were calculated based on the experimentally determined interaction energetics presented in Table 2.5.
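The kind of saturation curve shown in Fig. 2.10 can be approximated with a simple statistical-thermodynamic sketch. The code below is an illustration rather than the authors' simulation: it assumes a dimer-only binding pathway, two identical PREs sharing one intrinsic free energy, a single PRE with that same intrinsic free energy, a temperature of 277.15 K (4 °C), and that the Table 2.5 magnitudes for ΔG2 and the dimer ΔGc2 are favorable (negative) free energies. All function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)

def assoc_constant(dg_kcal, temp_k=277.15):
    """Equilibrium constant K = exp(-dG/RT); temp_k is an assumed temperature."""
    return math.exp(-dg_kcal / (R_KCAL * temp_k))

def p_saturated_one_site(dimer_m, dg2):
    """Probability that a single PRE is occupied by a dimer."""
    kx = assoc_constant(dg2) * dimer_m
    return kx / (1.0 + kx)

def p_saturated_two_sites(dimer_m, dg2, dgc2):
    """Probability that both PREs are occupied (identical sites plus cooperativity)."""
    kx = assoc_constant(dg2) * dimer_m
    kc = assoc_constant(dgc2)
    # Partition function: unbound, singly bound (two ways), doubly bound
    z = 1.0 + 2.0 * kx + kc * kx * kx
    return kc * kx * kx / z

# Assumed signs: Table 2.5 magnitudes entered as negative (favorable) values, kcal/mol
PLUS_SRC2 = {"dg2": -11.7, "dgc2": -1.3}
MINUS_SRC2 = {"dg2": -11.4, "dgc2": -0.4}

for dimer_m in (1e-10, 1e-9, 1e-8):
    pre1_minus = p_saturated_one_site(dimer_m, MINUS_SRC2["dg2"])
    pre1_plus = p_saturated_one_site(dimer_m, PLUS_SRC2["dg2"])
    pre2_minus = p_saturated_two_sites(dimer_m, **MINUS_SRC2)
    pre2_plus = p_saturated_two_sites(dimer_m, **PLUS_SRC2)
    print(f"{dimer_m:.0e} M  PRE1 {pre1_minus:.2f} -> {pre1_plus:.2f}"
          f"  PRE2 {pre2_minus:.2f} -> {pre2_plus:.2f}")
```

With these assumptions, the gain in saturation on adding SRC2 is larger for the two-site promoter than for the single site near nanomolar dimer concentrations, which is the qualitative behavior described in the text.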

7. Conclusions and Future Directions

A number of unexpected results arose out of the studies presented here. For example, although PR self-associates into dimers as previously thought, the energetics of dimerization are roughly 1000-fold weaker than estimated by biochemical approaches. As a consequence, at the concentrations at which DNA binding is initiated there is little to no dimer present (Fig. 2.9), raising the question of whether dimers are the predominant binding unit as typically assumed (Tsai and O’Malley, 1994). Furthermore, the traditional understanding that monomers assemble to generate a high-affinity dimeric binding species, while true, is also incomplete and misleading: although dimer assembly does create a high-affinity binding species, the binding affinity is much greater than could be predicted by visual inspection of binding curves, and yet it is also much weaker than could be anticipated by biochemical models of function. This is so because dimer binding at a palindrome (whether by successive monomer assembly or via a preformed dimer) is accompanied by an enormous energetic penalty.

With regard to binding at multiple PREs, despite visual evidence to the contrary, such binding is accompanied by significant cooperative contributions (Connaghan-Jones et al., 2007; Heneghan et al., 2006). Moreover, PR monomers not only are capable of binding DNA but also engage in cooperative interactions between half-sites of varying distances and orientations, and can account for the majority of the binding energetics on a natural promoter (Connaghan-Jones et al., 2008a). Interestingly, the strongly favorable and unfavorable binding energetics evoke comparison to the thermodynamics of protein folding. In particular, many of the penalties to binding that we have observed may be attributed to coupled folding reactions within receptor N-terminal sequences (Bain et al., 2000, 2001; Brodie and McEwan, 2005; Kumar and Thompson, 2003; Lefstin and Yamamoto, 1998; Warnmark et al., 2003). We note that none of these conclusions could easily be reached by traditional biochemical or cell biological studies of progesterone receptor function. Nonetheless, as we have discussed, these results can account for a number of previous observations seen in cellular studies.

Finally, many unresolved issues exist in our attempt to gain a more nuanced and comprehensive understanding of receptor-mediated transcription. First, how does each PR-promoter microstate contribute to function? Second, what is the role of monomers in PR function? Third, what are the kinetic (time-dependent) pathways to binding, cooperativity, and coactivator recruitment? Fourth, what is the role of chromatin? More broadly, how do we apply thermodynamics and kinetics to understand the steady-state behavior in a living cell? It is likely that new methods of experimentation, theory, and analysis will be required to examine all of these questions with appropriate detail and rigor.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health Grants DK061933 and DK071652 to D.L.B. and an American Foundation for Pharmaceutical Education Fellowship award to K.D.C.-J. We also thank Dr. Aaron F. Heneghan, Michael T. Miura, and Amie D. Moody for helpful input and discussion.

REFERENCES

Ackers, G. K. (1970). Analytical gel chromatography of proteins. In “Advances in protein chemistry” (Vol. 24, pp. 343–446). Academic Press, New York, NY.
Ackers, G. K., Johnson, A. D., and Shea, M. A. (1982). Quantitative model for gene regulation by lambda phage repressor. Proc. Natl. Acad. Sci. USA 79, 1129–1133.
Bailly, A., Rauch, C., Cato, A. C. B., and Milgrom, E. (1991). In two genes, synergism of steroid hormone action is not mediated by cooperative binding of receptors to adjacent sites. Mol. Cell. Endocrin. 82, 313–323.
Bain, D. L., Franden, M. A., McManaman, J. L., Takimoto, G. S., and Horwitz, K. B. (2000). The N-terminal region of the human progesterone A-receptor: Structural analysis and the influence of the DNA binding domain. J. Biol. Chem. 275, 7313–7320.
Bain, D. L., Franden, M. A., McManaman, J. L., Takimoto, G. S., and Horwitz, K. B. (2001). The N-terminal region of human progesterone B-receptors: Biophysical and biochemical comparison to A-receptors. J. Biol. Chem. 276, 23825–23831.
Brenowitz, M., Senear, D. F., Shea, M. A., and Ackers, G. K. (1986). Footprint titrations yield valid thermodynamic isotherms. Proc. Natl. Acad. Sci. USA 83, 8462–8466.
Brodie, J., and McEwan, I. J. (2005). Intra-domain communication between N-terminal and DNA-binding domains of the androgen receptor: Modulation of androgen response element DNA binding. J. Mol. Endocrinol. 34, 603–615.
Chalepakis, G., Arnemann, J., Slater, E., Bruller, H.-J., Gross, B., and Beato, M. (1988). Differential gene activation by glucocorticoids and progestins through the hormone regulatory element of mouse mammary tumor virus. Cell 53, 371–382.

Chen, Z., Shemshedini, L., Durand, B., Noy, N., Chambon, P., and Gronemeyer, H. (1994). Pure and functionally homogeneous recombinant retinoid X receptor. J. Biol. Chem. 269, 25770–25776.
Connaghan-Jones, K. D., Heneghan, A. F., Miura, M. T., and Bain, D. L. (2006). Hydrodynamic analysis of the human progesterone receptor A-isoform reveals that self-association occurs in the micromolar range. Biochemistry 45, 12090–12099.
Connaghan-Jones, K. D., Heneghan, A. F., Miura, M. T., and Bain, D. L. (2007). Thermodynamic analysis of progesterone receptor-promoter interactions reveals a molecular model for isoform-specific function. Proc. Natl. Acad. Sci. USA 104, 2187–2192.
Connaghan-Jones, K. D., Heneghan, A. F., Miura, M. T., and Bain, D. L. (2008a). Thermodynamic dissection of progesterone receptor interactions at the mouse mammary tumor virus promoter: Monomer binding and strong cooperativity dominate the assembly reaction. J. Mol. Biol. 377, 1144–1160.
Connaghan-Jones, K. D., Moody, A. D., and Bain, D. L. (2008b). Quantitative DNase footprint titration: A tool for analyzing the energetics of protein-DNA interactions. Nat. Protocols 3, 900–914.
Heneghan, A. F., Berton, N., Miura, M. T., and Bain, D. L. (2005). Self-association energetics of an intact, full-length nuclear receptor: The B-isoform of human progesterone receptor dimerizes in the micromolar range. Biochemistry 44, 9528–9537.
Heneghan, A. F., Connaghan-Jones, K. D., Miura, M. T., and Bain, D. L. (2006). Cooperative DNA binding by the B-isoform of human progesterone receptor: Thermodynamic analysis reveals strongly favorable and unfavorable contributions to assembly. Biochemistry 45, 3285–3296.
Heneghan, A. F., Connaghan-Jones, K. D., Miura, M. T., and Bain, D. L. (2007). Coactivator assembly at the promoter: Efficient recruitment of SRC2 is coupled to cooperative DNA binding by the progesterone receptor. Biochemistry 46, 11023–11032.
Hill, T. L. (1960). “An introduction to statistical thermodynamics.” Dover Publications, New York.
Hurd, C., and Moudgil, V. K. (1988). Characterization of R5020 and RU486 binding to progesterone receptor from calf uterus. Biochemistry 27, 3618–3623.
Johnson, M. L. (1992). Why, when, and how biochemists should use least squares. Anal. Biochem. 206, 215–225.
Kumar, R., and Thompson, E. B. (2003). Transactivation functions of the N-terminal domains of nuclear hormone receptors: Protein folding and coactivator interactions. Mol. Endocrin. 17, 1–10.
Lefstin, J. A., and Yamamoto, K. R. (1998). Allosteric effects of DNA on transcriptional regulators. Nature 392, 885–888.
Liu, Z., Wong, J., Tsai, S. Y., Tsai, M.-J., and O’Malley, B. W. (2001). Sequential recruitment of steroid receptor coactivator-1 (SRC-1) and p300 enhances progesterone receptor-dependent initiation and reinitiation of transcription from chromatin. Proc. Natl. Acad. Sci. USA 98, 12426–12431.
Metivier, R., Penot, G., Hubner, M. R., Reid, G., Brand, H., Kos, M., and Gannon, F. (2003). Estrogen receptor-alpha directs ordered, cyclical, and combinatorial recruitment of cofactors on a natural target promoter. Cell 115, 751–763.
Meyer, M. E., Quirin-Stricker, C., Lerouge, T., Bocquel, M. T., and Gronemeyer, H. (1992). A limiting factor mediates the differential activation of promoters by the human progesterone receptor isoforms. J. Biol. Chem. 267, 10882–10887.
Nagaich, A. K., Walker, D. A., Wolford, R., and Hager, G. L. (2004). Rapid periodic binding and displacement of the glucocorticoid receptor during chromatin remodeling. Mol. Cell 14, 163–174.

Onate, S. A., Prendergast, P., Wagner, J. P., Nissen, M., Reeves, R., Pettijohn, D. E., and Edwards, D. P. (1994). The DNA-bending protein HMG-1 enhances progesterone receptor binding to its target DNA sequence. Mol. Cell. Biol. 14, 3376–3391.
Perlmann, T., Eriksson, P., and Wrange, O. (1990). Quantitative analysis of the glucocorticoid receptor-DNA interaction at the mouse mammary tumor virus glucocorticoid response element. J. Biol. Chem. 265, 17222–17229.
Richer, J. K., Jacobsen, B. M., Manning, N. G., Abel, M. G., Wolf, D. M., and Horwitz, K. B. (2002). Differential gene regulation by the two progesterone receptor isoforms in human breast cancer cells. J. Biol. Chem. 277, 5209–5218.
Sartorius, C. A., Melville, M. Y., Hovland, A. R., Tung, L., Takimoto, G. S., and Horwitz, K. B. (1994). A third transactivation function (AF3) of human progesterone receptors located in the unique N-terminal segment of the B-isoform. Mol. Endocrin. 8, 1347–1360.
Schrader, W. T. (1975). Methods for extraction and quantification of receptors. Methods Enzymol. 36, 187–211.
Senear, D. F., Dalma-Weiszhausz, D. D., and Brenowitz, M. (1993). Effects of anomalous migration and DNA to protein ratios on resolution of equilibrium constants from gel mobility-shift assays. Electrophoresis 14, 704–712.
Skafar, D. F. (1991). Differential DNA binding by calf uterine estrogen and progesterone receptors results from differences in oligomeric states. Biochemistry 30, 6148–6154.
Theofan, G., and Notides, A. C. (1984). Characterization of the calf uterine progesterone receptor and its stabilization by nucleic acids. Endocrinology 114, 1173–1179.
Tsai, M. J., and O’Malley, B. W. (1994). Molecular mechanisms of action of steroid/thyroid receptor superfamily members. Annu. Rev. Biochem. 63, 451–486.
Tsai, S. Y., Tsai, M.-J., and O’Malley, B. W. (1989). Cooperative binding of steroid hormone receptors contributes to transcriptional synergism at target enhancer elements. Cell 57, 443–448.
Warnmark, A., Treuter, E., Wright, A. P. H., and Gustafsson, J.-A. (2003). Activation functions 1 and 2 of nuclear receptors: Molecular strategies for transcriptional activation. Mol. Endocrin. 17, 1901–1909.
Wyman, J., and Gill, S. J. (1990). “Binding and linkage: Functional chemistry of biological macromolecules.” University Science Books, Mill Valley, CA.

CHAPTER THREE

Direct Quantitation of Mg2+-RNA Interactions by Use of a Fluorescent Dye

Dan Grilley,* Ana Maria Soto,† and David E. Draper‡

Contents
1. Introduction
2. General Principles
   2.1. Ion-RNA interactions described by preferential interaction coefficients
   2.2. Using an Mg2+-binding dye to measure Γ2+
   2.3. Interaction coefficients vs. binding densities
3. Ion-Binding Properties of HQS
4. Preparation of Solutions and Reagents
   4.1. Reagents and stock solutions
   4.2. Sample preparation
5. Instrumentation and Data Collection Protocols
   5.1. Automated titrations
   5.2. Manual titrations
6. Data Analysis
7. Controls and Further Considerations
Acknowledgments
References

Abstract

The ionic composition of a solution strongly influences the folding of an RNA into its native structure; of particular importance, the stabilities of RNA tertiary structures are sharply dependent on the concentration of Mg2+. Most measurements of the extent of Mg2+ interaction with an RNA have relied on equilibrium dialysis or indirect measurements. Here we describe an approach, based on titrations in the presence of a fluorescent indicator dye, that accurately measures the excess Mg2+ ion neutralizing the charge of an RNA (the interaction or Donnan coefficient, Γ2+) and the total free energy of Mg2+-RNA interactions (ΔGRNA-2+). Automated data collection with computer-controlled titrators enables much larger data sets to be collected in a short time, compared with equilibrium dialysis. Γ2+ and ΔGRNA-2+ are thermodynamically rigorous quantities that are directly comparable with the results of theoretical calculations and simulations. In the event that RNA folding is coupled to the addition of MgCl2, the method directly monitors the uptake of Mg2+ associated with the folding transition.

* Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, Evanston, Illinois, USA
† Department of Chemistry, Towson University, Towson, Maryland, USA
‡ Department of Chemistry and Biophysics, Johns Hopkins University, Baltimore, Maryland, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04203-1. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Early RNA folding studies demonstrated the unusual sensitivity of tRNA tertiary structure to Mg2+ ions (Stein and Crothers, 1976a); it has subsequently been recognized as a general principle that strong stabilization of an RNA structure by Mg2+ is diagnostic of the formation of tertiary contacts (Tinoco and Bustamante, 1999). RNA-folding studies have generally measured the net uptake of Mg2+ ions upon folding, or used spectroscopic or chemical probes to detect direct ion-RNA contacts (Draper et al., 2005). To develop a comprehensive framework for understanding the ways in which Mg2+ stabilizes tertiary structures, it is necessary to quantitate the ion interactions taking place with both folded and unfolded states of the RNA. The interactions can be described both in terms of the number of excess Mg2+ ions neutralizing the RNA charge and the overall free energy of Mg2+ interactions. Both experimental parameters are useful benchmarks for comparing results of theoretical calculations and simulations (Grilley et al., 2007; Misra and Draper, 2002; Soto et al., 2007).

Several studies in the 1970s used equilibrium dialysis to follow the excess Mg2+ accumulated by transfer RNA (Bina-Stein and Stein, 1976; Stein and Crothers, 1976b). Dialysis is a rigorous way to measure thermodynamic quantities, but its application has been limited by requirements for long-term stability of the RNA solutions and the time needed to collect an extensive data set. As an alternative to equilibrium dialysis, chelating dyes have been used to sense the change in Mg2+ activity caused by the presence of an RNA. Eriochrome black T, which changes its extinction coefficient upon binding to Mg2+, was used in a set of studies examining Mg2+ association with homopolymer RNAs (Krakauer, 1971). The fluorescence of a quinoline derivative has also been used to follow Mg2+ interactions with RNA (Römer and Hach, 1975; Serebrov et al., 2001) and proteins (Pickett et al., 2003).
Here we present methods to measure Mg2+–nucleic acid interactions using a fluorescent dye. The procedure can be largely automated by the use of computer-controlled titrators, enabling the collection of large data sets (>100 points) within a few hours. With appropriate controls, the results are as accurate and as thermodynamically rigorous as those from equilibrium dialysis. A single titration data set can be analyzed to find the excess Mg2+ associated with an RNA, the free energy of Mg2+ interaction with an RNA, and/or the uptake of Mg2+ ions accompanying a folding reaction or RNA conformational change.

2. General Principles

2.1. Ion-RNA interactions described by preferential interaction coefficients

The high negative charge of a nucleic acid is neutralized by both an accumulation of mobile cations and an exclusion of mobile anions from the surrounding solution (Record et al., 1998). The accumulated cations may interact with the nucleic acid in a variety of ways: fully hydrated ions interact strongly with the RNA electrostatic potential at a distance from the RNA surface (the “diffuse ion atmosphere”), cations may be bound to specific sites within the RNA, or cations may contact the RNA surface via an intermediate layer of water (Draper, 2004; Misra et al., 2003). The depletion of anions near the RNA is caused primarily by repulsive interactions between the ions and RNA negative charges. To describe the accumulation or exclusion of ions in a model-independent way, we use the formalism of interaction coefficients. For Mg2+, the coefficient is defined as

$$\Gamma_{2+} \equiv \left(\frac{\partial C_{2+}}{\partial C_{\mathrm{RNA}}}\right)_{\mu_{2+},\,C_{\pm},\,T,\,P}, \tag{3.1}$$

where C2+, CRNA, and C± are the molar concentrations of Mg2+, RNA, and 1:1 salt, respectively; μ2+ is the chemical potential of the Mg2+ ion (Grilley et al., 2006; Record et al., 1998). Γ2+ is the number of Mg2+ ions that must accompany every RNA molecule added to a solution in order to maintain a constant chemical potential of the ion. These Γ2+ “excess” ions neutralize 2Γ2+ RNA charges. The remainder of the RNA charge is neutralized by an excess of monovalent ions, Γ+, and a deficiency of anions, Γ− (a negative quantity). The physical meaning of an interaction coefficient can be illustrated by an equilibrium dialysis experiment, which measures the Donnan coefficient:

$$\Gamma_{2+}^{D} \equiv \frac{C_{2+}^{\mathrm{in}} - C_{2+}^{\mathrm{out}}}{C_{\mathrm{RNA}}}. \tag{3.2}$$

$C_{2+}^{\mathrm{in}}$ is the molar Mg2+ ion concentration present “inside” the dialysis membrane with RNA (at concentration CRNA). $C_{2+}^{\mathrm{out}}$, also known as the “bulk” concentration of ions, is the molar Mg2+ ion concentration “outside” the dialysis membrane. ($C_{2+}^{\mathrm{in}} - C_{2+}^{\mathrm{out}}$) is the excess number of Mg2+ ions accumulated in response to the presence of the RNA. In a typical set of measurements, a series of RNA solutions would be equilibrated with solutions of increasing Mg2+ concentration (and thus increasing μ2+) to obtain $\Gamma_{2+}^{D}$ as a function of μ2+. Formally, $\Gamma_{2+}^{D}$ differs from Γ2+ (Eq. (3.1)) in that the chemical potential of water is constant across the membrane in a dialysis experiment, while Γ2+ is defined for constant pressure. For the concentrations of salt and RNA considered here, the two coefficients are essentially identical and will not be distinguished in the following discussion (Anderson et al., 2002).

2.2. Using an Mg2+-binding dye to measure Γ2+

For applications in analytical chemistry, chelators have been synthesized that exhibit large changes in their absorption or fluorescence emission spectra upon binding specific metal ions. In principle, low concentrations of such a Mg2+-binding dye could be used to sense the chemical potential of Mg2+ in a solution of RNA, yielding measurements of Γ2+ without the lengthy equilibration times needed for dialysis measurements.

The experimental approach is sketched in Fig. 3.1A. Sample and reference solutions are prepared with identical concentrations of dye and buffer; the sample solution also contains RNA. The two solutions are titrated with MgCl2 in parallel, and the fraction of dye molecules complexed with Mg2+, ν, is monitored spectroscopically. When ν is plotted against the total concentration of added MgCl2, C2+, a lag is observed in the curve for the sample solution (Fig. 3.1A) because favorable interactions between Mg2+ and RNA reduce the effective concentration (or activity) of the Mg2+. The premise of this method, justified subsequently, is that sample and reference solutions with identical values of ν also have the same Mg2+ ion chemical potential (μ2+) and would therefore be in thermodynamic equilibrium if separated by a dialysis membrane; they would be equivalent to the “inside” and “outside” solutions, respectively. The application of this premise is illustrated by the horizontal arrows in Fig. 3.1A, which compare C2+ of the two titration curves at points with the same ν value. The Mg2+ interaction coefficient is calculated from these concentrations (see Eq. (3.2)) as

$$\Gamma_{2+} = \frac{C_{2+}^{\mathrm{sample}} - C_{2+}^{\mathrm{reference}}}{C_{\mathrm{RNA}}}. \tag{3.3}$$

[Figure 3.1: Panel A plots fractional dye saturation ν against added MgCl2 (C2+, mM) for the reference and sample titrations, annotated with the sample and bulk C2+ values used to calculate Γ2+. Panel B plots Γ2+ (ions per nucleotide) against bulk C2+ (M), annotated ΔGRNA-2+ = −RT(area).]

Figure 3.1 Experimental approach for using an indicator dye to measure (A) Γ2+ and (B) ΔGRNA-2+ for an RNA. See text for a description.
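The comparison of sample and reference curves at equal ν (Eq. (3.3)) can be sketched in a few lines of Python. This is an illustrative outline, not the authors' analysis software: the function names, the synthetic titration data, the dye dissociation constant, and the RNA concentration are all hypothetical, and linear interpolation is an assumed way of locating matching ν values on the reference curve.

```python
from bisect import bisect_left

def conc_at_saturation(nu_target, conc, nu):
    """Interpolate the total Mg2+ concentration at which a curve reaches nu_target.

    conc and nu are parallel lists from one titration; nu must increase strictly.
    """
    if not nu[0] <= nu_target <= nu[-1]:
        raise ValueError("nu_target lies outside the measured range")
    j = bisect_left(nu, nu_target)
    if nu[j] == nu_target:
        return conc[j]
    frac = (nu_target - nu[j - 1]) / (nu[j] - nu[j - 1])
    return conc[j - 1] + frac * (conc[j] - conc[j - 1])

def interaction_coefficients(sample, reference, c_rna):
    """Apply Eq. (3.3) at every sample point that the reference curve spans.

    sample and reference are (conc, nu) pairs of lists.
    Returns a list of (bulk concentration, Gamma) pairs.
    """
    ref_conc, ref_nu = reference
    results = []
    for c_sample, nu in zip(*sample):
        if ref_nu[0] <= nu <= ref_nu[-1]:
            c_bulk = conc_at_saturation(nu, ref_conc, ref_nu)
            results.append((c_bulk, (c_sample - c_bulk) / c_rna))
    return results

# Synthetic example (all numbers hypothetical): 1:1 dye isotherm, constant Gamma of 0.1
KD_DYE, C_RNA, GAMMA = 5e-3, 1e-3, 0.1  # M, M nucleotides, ions per nucleotide
bulk = [i * 1e-5 for i in range(1, 201)]
nu_values = [c / (KD_DYE + c) for c in bulk]
reference = (bulk, nu_values)
sample = ([c + GAMMA * C_RNA for c in bulk], nu_values)

for c_bulk, gamma in interaction_coefficients(sample, reference, C_RNA)[::50]:
    print(f"bulk {c_bulk:.2e} M  Gamma {gamma:.3f}")
```

In real data Γ2+ varies with the bulk concentration; the constant value here only makes it easy to confirm that the procedure recovers what was put in.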

To justify the assumption made previously, that sample and reference solutions with the same value of ν would be in thermodynamic equilibrium across a dialysis membrane, imagine a Mg2+-RNA solution that is in dialysis equilibrium with buffer. Samples of the “inside” and “outside” solutions are then spectroscopically assayed with the Mg2+-binding dye, and a value for ν is obtained after suitable normalization of the absorption or fluorescence signal. If the dye-ion complex has a 1:1 stoichiometry, the fraction of dye molecules complexed with Mg2+ is

$$\nu = \frac{C_C}{C_D + C_C}, \tag{3.4}$$

where $C_C$ and $C_D$ are the molar concentrations of the complex and the unbound dye, respectively. Using the relationships between activity, concentration, and activity coefficient, $a_C \equiv \gamma_C C_C$ and $a_D \equiv \gamma_D C_D$, Eq. (3.4) becomes

$$\nu = \frac{a_C}{(\gamma_C/\gamma_D)\,a_D + a_C}. \tag{3.5}$$
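The step from Eq. (3.4) to Eq. (3.5) is a direct substitution, written out here for clarity (a worked restatement using only the symbols already defined in the text):

```latex
\begin{aligned}
\nu &= \frac{C_C}{C_D + C_C}
     = \frac{a_C/\gamma_C}{a_D/\gamma_D + a_C/\gamma_C} \\
    &= \frac{a_C}{(\gamma_C/\gamma_D)\,a_D + a_C}
\end{aligned}
```

The second line follows on multiplying numerator and denominator by $\gamma_C$, which shows that ν depends on the solution composition only through the two activities and the ratio $\gamma_C/\gamma_D$.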

The question is whether the same value of ν will be found in both solutions. Because the two solutions are in dialysis equilibrium, $a_C$ and $a_D$ must be identical between them. However, the activity coefficient $\gamma_D$ or $\gamma_C$ could differ between the solutions if either the free or complexed dye interacts with the RNA. For the dye used here, the Mg2+-dye complex is electroneutral and thus has no electrostatic interaction with the RNA. As long as there is no direct binding of the complex to the RNA, $\gamma_C$ is insensitive to the presence or absence of RNA. The dye itself has a negative charge, and thus should be excluded from the RNA-containing solution to approximately the same extent as the chloride ion that is also present. If these experiments are carried out with a large excess of monovalent salt over Mg2+, the overall fraction of negative ions excluded by the RNA is so small as to introduce an error in the measurements of Γ2+ that is within the reproducibility of the experiment (see section 6 for a calculation of typical error). Thus, under appropriate conditions, $\nu^{\mathrm{in}} \approx \nu^{\mathrm{out}}$ holds to an acceptable level of approximation, and the dye can be used to monitor Mg2+ activity.

An advantage of using a Mg2+-sensing dye is that one pair of sample and reference titrations yields measurements of Γ2+ over a wide range of bulk Mg2+ concentrations, as graphed in Fig. 3.1B. This Γ2+ curve can then be used to find the Mg2+-RNA interaction free energy by integration:

$$\Delta G_{\text{RNA-2+}} \cong -RT \int_{0}^{C_{2+}^{\mathrm{bulk}}} \Gamma_{2+}\, d\ln C_{2+}^{\mathrm{bulk}}. \tag{3.6}$$

ΔGRNA-2+ is the net free energy change that takes place when MgCl2 is added to an RNA solution until the bulk Mg2+ concentration reaches the upper limit of integration. A detailed derivation of this equation is available elsewhere (Grilley et al., 2006). Several approximations have been made in the derivation of Eqs. (3.3) and (3.6). The limitations these approximations impose on the measurement of Γ2+ and ΔGRNA-2+, as well as practical considerations in carrying out these measurements, are discussed in section 6.
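The integration in Eq. (3.6) can be carried out numerically on a measured Γ2+ curve. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function name is hypothetical, trapezoidal integration over ln C is an assumed choice, the integral is started at the lowest measured bulk concentration rather than at zero (so any contribution below that point is neglected), and the default temperature of 298.15 K is arbitrary.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol K)

def interaction_free_energy(c_bulk, gamma, temp_k=298.15):
    """Cumulative -RT * integral of Gamma d(ln C_bulk), by the trapezoidal rule.

    c_bulk: increasing bulk Mg2+ concentrations (M); gamma: Gamma_2+ at each point.
    Returns one free energy (kcal/mol) per input point; the first entry is zero
    because the integration starts at the lowest measured concentration.
    """
    if len(c_bulk) != len(gamma):
        raise ValueError("c_bulk and gamma must have the same length")
    rt = R_KCAL * temp_k
    energies = [0.0]
    for i in range(1, len(c_bulk)):
        d_ln_c = math.log(c_bulk[i] / c_bulk[i - 1])
        mean_gamma = 0.5 * (gamma[i] + gamma[i - 1])
        energies.append(energies[-1] - rt * mean_gamma * d_ln_c)
    return energies

# Hypothetical curve: Gamma rising linearly in ln C from 0.02 to 0.10 (1 uM to 100 uM)
concs = [1e-6 * 10 ** (k / 10) for k in range(21)]
gammas = [0.02 + 0.004 * k for k in range(21)]
dg = interaction_free_energy(concs, gammas)
print(f"dG(RNA-2+) at {concs[-1]:.0e} M: {dg[-1]:.3f} kcal/mol")
```

If Γ2+ is expressed per nucleotide, as in Fig. 3.1B, the resulting free energy is per mole of nucleotide.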

2.3. Interaction coefficients vs. binding densities Equilibrium dialysis experiments are most often interpreted in terms of a binding density n, the average number of ligands bound per macromolecule. A plot of n against the ‘‘free’’ or bulk ligand concentration yields a binding curve, which is then fit with parameters describing the stoichiometry and affinities of different classes of binding sites (Wyman and Gill, 1990). Though this approach requires assumptions that are difficult to reconcile with long-range electrostatic interactions of ions, it is nevertheless widely used in the RNA-folding literature. Here we contrast the binding density and interaction coefficient approaches, and note why the latter is preferable for the analysis of Mg2þ-RNA titrations. A fundamental conceptual difference between the two approaches is the way in which interaction free energies are described. In a binding

Quantitation of Mg2+-RNA Interactions

77

formalism, each ligand-bound state of the macromolecule is considered a distinct species with its own standard chemical potential (m ). The interaction free energy is then the difference in standard state chemical potentials between the ligand and molecule in free and bound states, for example:

DGo ¼ m0complex ðm0macromolecule þ m0ligand Þ ¼ RT ln K:

ð3:7Þ

This approach works well when there are a small number of well-defined binding interactions. It breaks down in several ways when applied to ion–nucleic acid interactions. First, the repulsive interactions between anions and nucleic acids are not readily described in terms of specific complexes and are therefore usually ignored in the binding density formalism. Second, long-range electrostatic interactions create concentration gradients of ions near the RNA surface (the so-called ion atmosphere); closer cations experience stronger interactions. Because the binding formalism allows any one ion in solution to exist in only one of two states, bound in a complex or free in solution, these concentration gradients are also ignored. Last, the binding formalism relies on formulation of a model that describes all the possible bound states of the macromolecule. The usual models for Mg2+-RNA interactions have postulated two or three classes of independent binding sites with strong or weak affinity (Stein and Crothers, 1976b); some models have included anticooperative interactions between ions (Laing et al., 1994; Leroy and Guéron, 1977). Given enough classes of binding sites, the binding polynomials generated by these models invariably provide a good fit to a data set, but in most cases the equilibrium constants and stoichiometries do not have readily interpretable physical significance, and in any case they are not unique descriptions: the data can always be fit by multiple models (Leroy and Guéron, 1977).

When using interaction coefficients, standard state chemical potentials are defined only for each electroneutral component added to the solution. For the cases considered here, there are formally four components: water, an RNA salt, monovalent (1:1) salt, and MgCl2. (Single-ion chemical potentials may also be defined, subject to the constraint of electroneutrality, e.g., μ(KCl) = μ(K+) + μ(Cl−).) Interactions between any two components appear as mutual changes in their thermodynamic activities, which precludes any need to distinguish between bound and free fractions of a component. Thus, in an equilibrium dialysis experiment with Mg2+ and RNA,

\[
\gamma^{\mathrm{in}}_{2+} C^{\mathrm{in}}_{2+} = \gamma^{\mathrm{out}}_{2+} C^{\mathrm{out}}_{2+}, \tag{3.8}
\]

because of the requirement that the Mg2+ activity (a2+ = γ2+C2+) be the same on both sides of the membrane. Upon rearrangement, this equation


Dan Grilley et al.

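shows how Mg2+-RNA interactions are related to changes in the Mg2+ activity coefficient,

\[
C^{\mathrm{in}}_{2+} - C^{\mathrm{out}}_{2+} = C^{\mathrm{out}}_{2+}\left[\left(\gamma^{\mathrm{out}}_{2+} / \gamma^{\mathrm{in}}_{2+}\right) - 1\right]. \tag{3.9}
\]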

Favorable interactions between Mg2+ ions and RNA decrease γ2+^in (relative to γ2+^out) and thus cause an accumulation of ions by the RNA (C2+^in > C2+^out). The left-hand side of Eq. (3.9) is the numerator of the Donnan coefficient, Γ2+^D (Eq. (3.2)). Thus, the Mg2+ activity coefficient (γ2+) is mathematically related to the interaction coefficient Γ2+, and Γ2+ is in turn related to the Mg2+-RNA interaction free energy (Eq. (3.6)). In contrast to the parameters obtained from a binding-density analysis, these three related thermodynamic quantities are model independent. As such, they may be directly compared with values of Γ2+ and ΔG(RNA-2+) that have been extracted from theoretical calculations and simulations of model systems (Ni et al., 1999; Soto et al., 2007).
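The link between equal Mg2+ activities on the two sides of the membrane (Eq. (3.8)) and the ion excess (Eq. (3.9)) can be checked numerically. The following is a minimal sketch; the function names and the example activity coefficients are hypothetical and chosen only for illustration.

```python
def inside_concentration(c_out, gamma_in, gamma_out):
    """Solve Eq. (3.8), gamma_in * C_in = gamma_out * C_out, for C_in."""
    if gamma_in <= 0 or gamma_out <= 0:
        raise ValueError("activity coefficients must be positive")
    return c_out * gamma_out / gamma_in


def excess_concentration(c_out, gamma_in, gamma_out):
    """Eq. (3.9): C_in - C_out = C_out * (gamma_out / gamma_in - 1)."""
    if gamma_in <= 0 or gamma_out <= 0:
        raise ValueError("activity coefficients must be positive")
    return c_out * (gamma_out / gamma_in - 1.0)


# Hypothetical numbers: the RNA lowers the Mg2+ activity coefficient
# inside the bag from 0.50 to 0.40; the outside concentration is 1 mM.
c_out = 1.0e-3
print(inside_concentration(c_out, 0.40, 0.50))
print(excess_concentration(c_out, 0.40, 0.50))
```

A ratio γ2+^out/γ2+^in greater than 1 gives a positive excess, which is the accumulation of ions by the RNA described above.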

3. Ion-Binding Properties of HQS

8-Hydroxyquinoline-5-sulfonic acid (HQS) was first described as a soluble quinoline derivative showing substantial changes in its absorption and fluorescence spectra upon chelation of a variety of metals with a valency of +2 or larger (Bishop, 1963a,b; Liu and Bailar, 1951). The structure of HQS and its complex with Mg2+ are shown in Fig. 3.2A. The sulfonic acid group serves to increase the quinoline solubility in water; with a pK of approximately 4.0 (Smith and Martell, 1975), it is fully ionized in the pH range of interest here. The complex with Mg2+ has a net neutral charge, because ion chelation is coupled to deprotonation of the quinoline hydroxyl. Free HQS, with a protonated hydroxyl at pH 6–7, has an absorption maximum at 306 nm, which is shifted to longer wavelengths (355–357 nm) in the presence of either high pH or saturating concentrations of Mg2+ (Tables 3.1 and 3.2). Superimposed spectra taken over the pH range 6.0–9.5, or with Mg2+ concentrations up to 110 mM, show clear isosbestic points (Fig. 3.2B–C). The latter behavior is consistent with the formation of a single, 1:1 Mg2+-HQS chelation complex.

Fluorescence is observed at 500 nm upon excitation of the 355-nm absorption of the Mg2+-HQS chelate. An approximately 100-fold increase in fluorescence is obtained upon titration of HQS with MgCl2 (Fig. 3.3A). Fluorescence intensity data as a function of the total concentration of Mg2+ ion, C2+, are fit very well by a single-site binding isotherm:


Quantitation of Mg2+-RNA Interactions

[Figure 3.2: panel A, reaction of HQS with Mg2+ (release of H+); panels B and C, extinction coefficient (mM−1 cm−1, 0–10) versus wavelength (250–400 nm).]

Figure 3.2 Properties of HQS. (A) HQS reaction with Mg2+ ion. (B) HQS extinction coefficient as a function of pH (20 mM buffer: MES, MOPS, EPPS or CHES; 40 mM K+; 25 °C). The pH values are 6.0 (gray), 6.4, 6.8, 7.0, 7.2, 7.4, 7.6, 8.0, 8.5, 9.0, and 9.5 (black). The absorbance does not change much for pH values less than 7.6. Extinction coefficients at high and low pH are compiled in Table 3.1. (C) HQS extinction coefficient as a function of Mg2+ concentration. The titration was performed in 20 mM MOPS pH 7.0, 10.0 mM K+, 6.43 mM Cl−. The gray line represents 0 mM MgCl2, and the black line 110 mM MgCl2. Extinction coefficients at saturating Mg2+ are compiled in Table 3.2.

Table 3.1 HQS absorption properties(a)

Maxima at high and low pH

Wavelength (nm)    ε (cm−1 M−1)    pH
240                26,300          6.0
253                24,000          9.5
270                3250            6.0
306                3600            6.0
357                3700            9.5

pH-independent isosbestic points

Wavelength (nm)    ε (cm−1 M−1)
244                19,000
270                3250
326                2600
418                135

(a) Spectra were taken at 25 °C in either 20 mM MES pH 6.0, 40.0 mM K+, 31.7 mM Cl−, or 20 mM CHES pH 9.5, 40 mM K+, 30 mM Cl−. All extinction coefficients are based on an extinction of 2600 cm−1 M−1 at 326 nm determined from three separate preparations of HQS in 10.0 mM EPPS, pH 8.0, 5.0 mM K+, 0.1 mM EDTA.

\[
I = \left(I_{\max} - I_{\min}\right) \frac{K_{\mathrm{HQS}} C_{2+}}{1 + K_{\mathrm{HQS}} C_{2+}} + I_{\min}, \tag{3.10}
\]

where Imax is the intensity of the Mg2+-HQS complex, Imin is the intensity of the free HQS, and KHQS is the apparent equilibrium constant for


Table 3.2 Mg2+-HQS absorption maxima and Mg2+-independent isosbestic points at pH 7.0(a)

Wavelength (nm)    ε (cm−1 M−1)    Type
244                17,700          isosbestic
255                24,200          maximum
267                3400            isosbestic
329                2250            isosbestic
355                3400            maximum
415                110             isosbestic

(a) HQS extinction coefficients were measured in the presence of saturating Mg2+ (170 mM MgCl2, 20 mM MOPS pH 7.0, 10.0 mM K+, 6.43 mM Cl−). Wavelengths, but not extinction coefficients, of Mg2+-HQS absorbance maxima and Mg2+-independent isosbestic points are constant with pH.

[Figure 3.3: panel A, relative fluorescence intensity versus [MgCl2] (M, logarithmic axis from 0.0001 to 0.1); panel B, log(KHQS) versus pH (6 to 9.5), annotated with the fitted equation KHQS = Ka K°HQS / ([H+] + Ka).]

Figure 3.3 (A) Titration of HQS with MgCl2, pH 6.8, with a total of 60 mM K+. The curve is the least-squares best fit to Eq. (3.10), with KHQS = 372 M−1. (B) Effective HQS binding constants as a function of pH: closed circles, Mg2+; open circle, Ca2+; open triangle, Ba2+. Error bars represent one standard deviation calculated from at least three independent measurements. The dark line is a fit to the equation shown and gives the proton dissociation constant (Ka, pKa = 8.43) and intrinsic HQS-Mg2+ binding constant (K°HQS = 11.3 mM−1). The K+ concentration is 40 mM.

formation of Mg2+-HQS. At the concentration of HQS typically used in these titrations, the fraction of added ions that are bound to HQS is very small, and accuracy is not compromised if the total ion concentration (C2+) is used in place of the free (unbound) ion concentration. KHQS is pH dependent because chelation of Mg2+ by HQS promotes deprotonation of the quinoline hydroxyl group. Log(KHQS) increases linearly with pH over the range 6–8 (Fig. 3.3B); the calculated pKa is 8.43, and the intrinsic K°HQS is 11.3 × 10^3 M−1, in agreement with literature values


(Smith and Martell, 1975). The large fluorescence enhancement and range of Mg2+-binding affinities around neutral pH make HQS a suitable indicator for experiments measuring Mg2+-RNA interactions.

Starting from Mg2+ and going down the periodic table column of group II ions, HQS-metal ion-binding constants become weaker and the fluorescence enhancements become smaller. At pH 8.0 the apparent equilibrium constant for Ca2+-HQS formation is 182 M−1 and the fluorescence enhancement is 75-fold (Fig. 3.3B). The Ba2+ complex is even weaker, with an apparent binding constant of 36 M−1 and a fluorescence enhancement of 10-fold. HQS may be a good sensor for Ca2+- and Mg2+-RNA interactions, but its affinity for Sr2+ or Ba2+ is probably too weak for this purpose.
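The pH dependence of the apparent Mg2+-HQS binding constant (the equation shown in Fig. 3.3B) is easy to tabulate when choosing a titration pH. The sketch below is illustrative only: the function name is hypothetical, and the defaults are the fitted values quoted above for 40 mM K+ (pKa = 8.43, intrinsic constant 11.3 × 10^3 M−1), so the output should not be expected to match measurements made at other salt concentrations.

```python
def apparent_k_hqs(ph, pka=8.43, k_intrinsic=11.3e3):
    """Apparent Mg2+-HQS binding constant (M^-1) at a given pH.

    Implements K_HQS = Ka * K_intrinsic / ([H+] + Ka), the equation shown
    in Fig. 3.3B. Defaults are the fitted values quoted for 40 mM K+.
    """
    ka = 10.0 ** (-pka)       # proton dissociation constant of HQS
    h = 10.0 ** (-ph)         # proton concentration (activity), M
    return ka * k_intrinsic / (h + ka)


for ph in (6.0, 6.8, 7.0, 8.0):
    print(ph, apparent_k_hqs(ph))
```

Well below the pKa, each unit increase in pH raises the apparent constant nearly 10-fold, which is the linear region of log(KHQS) versus pH noted in the text.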

4. Preparation of Solutions and Reagents

4.1. Reagents and stock solutions

Measurement of metal ion–RNA association is critically dependent on the purity of the components that are used. All solutions should be made from pure water with at least 18 MΩ resistivity. Purchased buffers and salts should be at least 99.5% pure, but because EDTA is included in all the buffers to scavenge transition metals, salts of higher purity (and expense) are not necessary.

HQS can be purchased from Sigma Chemicals in a relatively pure acid form. The small amount of metal contaminating the HQS can be removed by recrystallization at acid pH. First saturate 50 mL of pure water with the dye (5 g). To this solution add concentrated (12 N) HCl until no further color change or precipitation is perceptible. Heat the solution to boiling while stirring to fully redissolve the HQS. Cool the solution on ice, and drain the excess water, leaving the crystalline HQS as very fine needles. Add fresh pure water and HCl to the HQS crystals, then heat with stirring until the HQS is all dissolved. Cool, drain, and repeat the process seven times, adding less water and HCl at each step to account for the slight loss of HQS that remains dissolved at the end of the cooling step. After the final step, add 50 mL of water and raise the pH to approximately 7 with KOH. Continue to add water and maintain pH 7 until all crystals are dissolved. When HQS treated this way is diluted into freshly made buffered solution, no change in its absorbance spectrum is seen upon addition of 100 μM EDTA. The concentration of the stock solution can be determined using an extinction coefficient at the 326-nm isosbestic point (Table 3.1) of 2600 M−1 cm−1. The purified HQS can be divided into aliquots (100 mM stock is typical) and stored in acid-rinsed glassware for years.


The 99.99% purity MgCl2 hexahydrate available from Aldrich can be used as supplied to make 2 M MgCl2 stock solutions. MgCl2 solutions for titrations are prepared by diluting the stock 2 M MgCl2 solution into the appropriate titration buffer (see section 4.1). Titration buffers should be made from weighed salts and the acid form of the buffer; titrate with the appropriate 99.5% pure hydroxide salt to adjust the pH. Some convenient buffer concentrations and pHs for different monovalent cation concentrations are as follows:

(A) 20 mM MOPS pH 7.2, 20 μM EDTA, 20.0 mM K+, 10.0 mM Cl−
(B) 20 mM MOPS pH 7.0, 20 μM EDTA, 40.0 mM K+, 32.3 mM Cl−
(C) 20 mM MOPS pH 6.8, 20 μM EDTA, 60.0 mM K+, 54.3 mM Cl−
(D) 20 mM MES pH 6.15, 20 μM EDTA, 150 mM K+, 140 mM Cl−

(Both buffer and EDTA stock solutions were prepared from the respective free acids and adjusted to the desired pH with KOH. The K+ concentration is the sum of the weighed KCl and the KOH added to adjust the pH.) Lower pH, which weakens the affinity of HQS for Mg2+, is paired with higher salt concentration to match the weakened Mg2+-RNA interactions. These pairings have given reproducible measurements of Γ2+ with duplex DNA restriction fragments, RNAs that have Mg2+-dependent or Mg2+-independent tertiary structures, and RNAs with only secondary structures. Titrations at pH 8.0 (and thus larger KHQS) extend the range of useful data to bulk C2+ concentrations as low as 1 μM (D. Leipply, personal communication).

Glassware, plastic Eppendorf-style microcentrifuge tubes, and other preparative labware leach metals into solution. For that reason, 20 μM EDTA was included in all titration buffers. It is important to note that this level of EDTA serves only to scavenge transition metals, which bind EDTA much more tightly than does Mg2+; an order of magnitude higher EDTA concentration is needed before a significant amount of Mg2+ is bound.

The hygroscopic properties of MgCl2 affect the accuracy with which the salt can be weighed. Therefore either of the following two protocols is used to standardize MgCl2 solutions. Both are based on EDTA-Mg2+ complex formation but use different methods to detect the stoichiometric break point.

The first protocol uses HQS as a reporter of free Mg2+. An accurately known concentration of EDTA (1–5 mM) in 100 mM EPPS (pH 8.0) and 100 μM HQS is titrated with MgCl2 solution. Either fluorescence or absorbance of the HQS is plotted against added MgCl2 to find the amount of Mg2+ needed to titrate the EDTA to a stoichiometric endpoint. A solution of pH 8.0 provides an appropriate ratio of the EDTA and HQS binding affinities for accurate determination of the stoichiometry. A high buffer concentration is necessary because of the pH dependence of EDTA-Mg2+ binding (Martell and Smith, 1974).


The second protocol depends on the hypochromicity of EDTA in the low UV region (200–250 nm) upon metal ion binding. Titrations are conducted in 10 mM MOPS (pH 7) using semimicro quartz cuvettes and recording an absorbance spectrum between 200 and 350 nm. In a typical experiment, 800 μL of a 1 mM solution of the metal ion of interest is titrated with 1- to 3-μL aliquots of a 40 mM EDTA solution. Plots of the absorbance at 230 nm versus the concentration of EDTA yield stoichiometric titrations with sharp break points that are used to calculate the concentration of the metal ion.
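The arithmetic behind the break point in the EDTA titration is a simple mole balance. The sketch below assumes a 1:1 EDTA:metal complex at the stoichiometric break point and microliter volumes for the cuvette contents and EDTA aliquots; the function name and the example break-point volume are hypothetical.

```python
def metal_concentration_mM(sample_volume_uL, edta_mM, edta_volume_at_break_uL):
    """Metal ion concentration (mM) in the original sample.

    Assumes one EDTA binds one metal ion, so that at the break point
    moles of EDTA added = moles of metal ion originally present.
    """
    if sample_volume_uL <= 0:
        raise ValueError("sample volume must be positive")
    return edta_mM * edta_volume_at_break_uL / sample_volume_uL


# Hypothetical reading: the absorbance at 230 nm stops changing after
# 20 uL of 40 mM EDTA has been added to 800 uL of sample.
print(metal_concentration_mM(800.0, 40.0, 20.0))
```

With these nominal numbers the break point falls after roughly ten aliquots, which is why small aliquots of a concentrated EDTA solution give a well-defined endpoint.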

4.2. Sample preparation

Metal-free DNA or RNA fragments can be prepared using a DEAE matrix made by Qiagen, which comes prepackaged in disposable columns appropriate for small step gradients. A Qiatip-2500 column holds 2.5 mg of double-stranded nucleic acid (5 mg of single-stranded nucleic acid), and has a column volume of approximately 30 mL. The resin can be removed and repacked into a Biorad Econo-column for use with pumps or gradient makers. To use the column, first make the following two buffers:

Buffer A: 67 mM MOPS, pH 6.8; 533 mM NaCl; 20% EtOH
Buffer B: 67 mM MOPS, pH 6.8; 3.0 M NaCl; 20% EtOH

All the solutions that follow are adjusted to a final pH of 7.0 with NaOH after mixing buffer A or B with the other components. The resin is first washed with 0.15% Triton X-100, 25% water, and 75% buffer A, and then equilibrated with 25% water and 75% buffer A prior to loading the sample. Samples are diluted with 3 volumes of buffer A and loaded onto the column. If an RNA transcription mixture is being loaded, add sufficient EDTA to dissolve any magnesium pyrophosphate precipitate. After loading, rinse with 25% water and 75% buffer A until a stable baseline (monitored at 260 nm) is achieved. The column is then eluted in steps consisting of mixtures of buffers A and B in different ratios and 25% (v/v) urea (e.g., 100 mL of a 600 mM NaCl solution is made by mixing 67 mL of buffer A, 8 mL of buffer B, and 36.2 g of urea). Because addition of urea slightly shifts the pH, mix buffers A and B and dissolve urea before bringing the solution to pH 7.0 with NaOH. The Qiagen columns are particularly sensitive to pH. Double-stranded DNA tends to elute between 0.9 and 2 M NaCl, RNA between 0.4 and 1 M NaCl; larger nucleic acids elute at higher salt concentrations. Very large (greater than approximately 400 basepairs) nucleic acids may require higher pH for elution (e.g., 2 M salt and pH 8.0).
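The NaCl concentration of each elution step follows from the volumes of buffers A and B that are mixed. The sketch below reproduces the 600 mM example above and inverts the calculation for other targets. It assumes that buffers A and B together always make up 75% of the final volume, with urea accounting for the rest; the function names are hypothetical.

```python
NACL_A_MM = 533.0    # NaCl in buffer A (mM)
NACL_B_MM = 3000.0   # NaCl in buffer B (mM)


def step_buffer_nacl_mM(vol_a_mL, vol_b_mL, final_volume_mL):
    """NaCl concentration (mM) after mixing A and B to the final volume."""
    return (vol_a_mL * NACL_A_MM + vol_b_mL * NACL_B_MM) / final_volume_mL


def volumes_for_target(target_nacl_mM, final_volume_mL=100.0, buffer_fraction=0.75):
    """Volumes (mL) of buffers A and B that give the target NaCl concentration.

    Assumes A + B = buffer_fraction * final volume (75% in the example
    in the text; the remaining volume is taken up by urea).
    """
    total = buffer_fraction * final_volume_mL
    vol_b = (target_nacl_mM * final_volume_mL - NACL_A_MM * total) / (NACL_B_MM - NACL_A_MM)
    vol_a = total - vol_b
    if vol_a < 0 or vol_b < 0:
        raise ValueError("target NaCl concentration is outside the reachable range")
    return vol_a, vol_b


print(step_buffer_nacl_mM(67.0, 8.0, 100.0))   # the example from the text
print(volumes_for_target(600.0))
```

Under this assumption the reachable range runs from about 400 mM NaCl (buffer A only) to 2.25 M NaCl (buffer B only) in the final step solution.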
Nucleic acids should be precipitated out of column fractions by addition of 50% isopropanol, which minimizes precipitation of the salt and urea. RNA transcripts for titrations can also be purified by denaturing gel electrophoresis followed by electroelution from an Elutrap (Schleicher &


Schuell). The high concentrations of EDTA and urea used in standard electrophoresis buffers are as efficient in removing Mg2+ and transition metal contamination as the preceding column protocol.

Precipitated, purified nucleic acids should be resuspended in 50 mM EDTA, pH 7.0, 1 M ammonium acetate, to help remove any remaining metal contaminants. The nucleic acid can then be exchanged into titration buffer. Amicon centrifugal filtration devices (Millipore) are large enough (4 or 15 mL) to provide a reasonable dilution factor without concentrating the sample too much. Extremely high concentrations of nucleic acid (greater than 40 mM nucleotide) equilibrate slowly and should be avoided. A good final stock concentration is between 10 and 30 mM in nucleotide. Typical equilibrations involve at least eight fivefold dilutions.

Nucleic acids approach equilibrium with low salt buffers very slowly by either dialysis or repeated rounds of concentration and dilution. We therefore first attempt to equilibrate samples with buffers containing half of the desired final salt concentration, and then bring the salt concentration up to the final desired concentration in the final dilution/concentration cycles. The approach of the sample to the final salt concentration can be tracked by UV-monitored melting curves. Sample RNA concentrations are adjusted with equilibration buffer to an RNA concentration appropriate for a titration (1–4 mM nucleotide), and melting is monitored by absorbance at an appropriate wavelength in a 1-mm path-length cuvette. The same experiment repeated on a sample diluted 10-fold (0.1 mM nucleotide) in a 1-cm cuvette should have the same melting profile if equilibration has been achieved. Because of the high concentration of the sample, formation of RNA dimers is a potential artifact that could also cause differences between the melting profiles at two different RNA concentrations.

5. Instrumentation and Data Collection Protocols

5.1. Automated titrations

All HQS titrations were performed on an Aviv ATF-105 differential/ratio spectrofluorometer designed for computer-controlled titrations into reference and sample cells. The instrument includes a pair of automatic dispensers (Hamilton) and J-shaped tubes for titrant delivery into standard 1-cm² cuvettes. This setup works well for a high-density solution (e.g., concentrated urea), but the buffered, 12 mM MgCl2 titrant solutions used here easily mix with the cuvette solution during setup. The problem can be overcome by using a glass capillary tube that has been pulled into a very fine tip. The glass tips can be held in fittings with M6 threads by wrapping the tube with Teflon tape (Fig. 3.4). The assembly is held by a Teflon cap


[Figure 3.4: schematic of the titrator setup built around a Hamilton Microlab 500 dispenser, with components labeled A–G.]

Figure 3.4 Computer-controlled titrator setup. (A) Hamilton Microlab 500 Dispenser, with 5-mL gastight syringe; (B) FEP (Hamilton Company) tubing with M6 threaded fittings, 1-mL capacity; (C) Union, M6 threads; (D) Fittings, M6 thread; (E) Polymer cap, machined from Teflon with a cavity made to hold "D" and a hole made to accommodate the capillary tube; (F) glass capillary tube, held in place with Teflon tape, tip tapered and bent to fit into corner of cuvette and just long enough to reach solution but not interfere with light path; (G) cuvette and Teflon stir bar.

machined for the purpose (E, Fig. 3.4). This arrangement enhances wicking of sample up the cuvette walls and increases evaporation, but these problems are minimized by applying a hydrophobic coating to the walls. (Soak the inside of the cuvettes with 15 M nitric acid for at least 1 h. Rinse well with pure water and dry. Add Sigmacote (Sigma-Aldrich) to the cuvette and let sit for at least an hour. Rinse well with pure water.)

The Hamilton dispensers can be used with syringe sizes from 25 μL to 25 mL. With 1000 steps per stroke, the 25-μL syringe could have a resolution in volume delivery of as little as 0.025 μL. However, the 25-μL syringes tend to wear out more quickly than the larger volume syringes. Thus we use 50-μL syringes as a good compromise between resolution and long-term stability. Regardless, the valves and syringe plungers wear down over time, leading to inconsistent results. With daily use, syringe plungers should be replaced about once a month, the entire syringe about every six months, and the valves every three to four months. The syringes and valves on the dispensers also leach metals and need to be rinsed with 5 mM EDTA solution prior to each day of use.

In designing an HQS titration, the total time of the experiment and the number and volume of injections are important and related considerations. In general, we have found that the shorter the experiment, the more reproducible the data. For nucleic acid systems in which Mg2+ does not induce any folding reaction, the only limitation on the interval between


additions is the time taken for mixing, about 15 s at moderate stirring speeds. The folding kinetics of some RNAs can require equilibration times of five or more minutes for each Mg2+ addition. (A necessary control is to obtain time courses of RNA folding after Mg2+ addition in the UV spectrophotometer under solution conditions similar to those desired for the HQS-monitored titration; on the basis of such kinetic data, an appropriate interval can be designed for the HQS-monitored titration. Apparent folding rates are usually not uniform as one titrates across a folding transition.) The longest a titration should be is about five hours; longer titrations require corrections for drift in instrument signal (which is exacerbated by fluctuations in room temperature) and titrant and sample evaporation (to allow titrant addition, the cuvettes cannot be completely sealed). It is easier to repeat a titration several times with different nucleic acid concentrations and schedules of Mg2+ additions than to try to obtain a complete range of data in a single long titration.

A single titration experiment requires data from two cuvettes run in parallel: a reference cuvette with HQS and titration buffer and a sample cuvette with RNA. The same batch of titration buffer with which the RNA has been equilibrated is used in both cuvettes. When preparing the sample and reference cuvettes, absorption spectra should be taken to check the RNA concentration and test for metal ion contamination. First, scan buffer alone in both cuvettes (550–200 nm). Add the nucleic acid to the sample cuvette (typically a fivefold dilution) and perform a second scan. The absorbance at 260 nm will be far beyond the linear range of any spectrophotometer; use the absorbance values between 0.2 and 1.2 OD in the range of 295–310 nm to check the RNA concentration. Add HQS, typically 1–2 μL of stock to a final concentration of 20–50 μM, and perform a final scan. (HQS concentrations as low as 10 μM may be used, if accurate data at bulk Mg2+ concentrations in the μM range are to be collected.) Metal ion contamination of the RNA stock solution is evident in the shape of the HQS absorbance curve between 350 and 400 nm (Fig. 3.2). A final diagnosis of contamination may be made by adding EDTA to 100 μM and looking for changes in the HQS spectrum; however, the sample cannot then be used without repurification.

A typical titration protocol with the automatic titrators uses starting volumes of 2.00 mL in sample and reference cuvettes. The two titrant solutions are:

Sample: Titration Buffer + 12 mM MgCl2 (60 mM MgCl2 for pH 6.15 buffers)
Reference: Titration Buffer + 120 mM MgCl2

The same buffer used to equilibrate samples (see section 4.2) is also used for the titration buffer in each solution; unbuffered stock MgCl2 solutions are as described in section 4.1. Because titration buffer is slightly diluted


when making the sample and reference titrants, there will be an approximately 0.2% change in salt and buffer concentration over the course of the entire titration; the magnitude is not large enough to be of concern. A typical titration schedule is as follows (concentrations listed are for the reference cell, but the same volume additions are used for the sample cell):

Section             1         2         3
Final [MgCl2]       12 mM     20 mM     40 mM
Step size           200 μM    400 μM    1 mM
Number of points    60        20        20

The Aviv fluorometer software calculates the injection volumes needed to achieve constant increments in MgCl2 concentration; constant volume additions could be used as well. The standard stirring time between additions is 30 s; nucleic acid samples that fold may require adjustment in the stirring times programmed for each section. To reduce problems with self-quenching, data are collected on the edge of the HQS absorption maximum by exciting at 405 nm with a 2-nm bandwidth; emission is monitored at 500 nm with an 8-nm bandwidth. After the automatic titration is finished, manual additions are made to help determine the fluorescence intensity at saturating Mg2+ (Imax; see section 6). Unbuffered 2 M MgCl2 is added in the following volumes: 5, 5, 10, 20, 20, 20, and 20 μL.
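The injection volumes needed for constant concentration increments follow from a mole balance on the cuvette. The sketch below is not the Aviv software's algorithm, only one way the calculation could be done; the function name is hypothetical, and the example uses the reference-cell schedule above with the 120 mM titrant and a 2.00-mL starting volume.

```python
def injection_volumes(v0_mL, c_titrant_mM, targets_mM):
    """Volume (mL) of titrant to add at each step to hit each target.

    Mole balance for one addition dv to a cuvette of volume v at
    concentration c: (c*v + c_titrant*dv) / (v + dv) = target,
    so dv = v * (target - c) / (c_titrant - target).
    """
    volumes = []
    v, c = v0_mL, 0.0
    for target in targets_mM:
        if not (c < target < c_titrant_mM):
            raise ValueError("targets must increase and stay below the titrant concentration")
        dv = v * (target - c) / (c_titrant_mM - target)
        volumes.append(dv)
        v += dv
        c = target
    return volumes


# Reference-cell schedule: 60 x 0.2 mM, then 20 x 0.4 mM, then 20 x 1 mM.
schedule = ([0.2 * i for i in range(1, 61)]
            + [12.0 + 0.4 * i for i in range(1, 21)]
            + [20.0 + 1.0 * i for i in range(1, 21)])
vols = injection_volumes(2.00, 120.0, schedule)
print(len(vols), vols[0] * 1000.0, sum(vols))   # points, first addition in uL, total mL
```

Because the cuvette volume grows during the titration, later additions must be slightly larger than earlier ones to produce the same concentration step; for this schedule the additions sum to 1.0 mL.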

5.2. Manual titrations

HQS titrations can also be performed by manual additions of Mg2+ with standard pipetters. Manual titrations can be carried out in small-volume cuvettes, which is useful when the RNA of interest cannot be obtained in large amounts. In a typical titration, a stoppered microcuvette containing 400 μL of an HQS solution is titrated in parallel with a cuvette containing 400 μL of HQS and the RNA of interest. We recommend using a 10-μL pipetter during the entire experiment for consistency. Use three MgCl2 stock solutions (made in titration buffer): 7, 35, and 1000 mM. Titrate the RNA-containing cuvette initially with 7 mM MgCl2 (45 additions of 1–10 μL), then with 35 mM MgCl2 (three 10-μL additions), and finally with 1000 mM MgCl2 (five 10-μL additions). Titrate the cuvette containing only HQS with 35 mM MgCl2 (48 additions of 1–10 μL) and 1000 mM MgCl2 (five 10-μL additions). Add the same volume of titrant to both the RNA and HQS cuvettes. Monitor the fluorescence intensity until the HQS signal stabilizes, and take six additional measurements of the intensity to average for a single data point.


6. Data Analysis

An example titration data set is shown in Fig. 3.5. The most reliable and informative data are obtained near the beginning of the titration, where the difference between the sample and reference curves is most pronounced. In the example shown, this region corresponds to HQS normalized fluorescence (saturation) values between 0 and 0.15 (see Fig. 3.5, inset). To begin data analysis, measured intensities for the sample cuvette are first normalized:

\[
I_{\mathrm{norm}} = \frac{I - I_{\min}}{I_{\max} - I_{\min}}. \tag{3.11}
\]

Inorm is the fraction of Mg2+-bound HQS, which will be compared with the reference titration (Fig. 3.1A). Imin is taken as the initial data point (before MgCl2 addition) of the sample cuvette. Imax is obtained from the fit of a

[Figure 3.5: main panel, normalized fluorescence (0–1.0) versus added MgCl2 (0–0.008 M); inset, Γ2+ per nucleotide (0–0.4) versus bulk C2+ (10−6 to 10−2 M, logarithmic axis).]

Figure 3.5 An example titration showing reference HQS titration (filled circles) and sample RNA titration (open squares). The fluorescence data have been normalized (Eq. (3.11)). The data from both the sample and reference cuvette have been fit to Eq. (3.10) (black and gray curves, respectively). The fit to the sample cuvette data uses just the manual titration points, only one of which is visible on this scale (solid square). The inset shows the values of Γ2+ calculated from Eq. (3.5). Vertical lines (both main graph and inset) show the end of each section of the titration schedule in the sample cuvette. The most informative part of the titration curve for calculating Γ2+ is the initial, sharply curved lag.


single-site binding isotherm (Eq. (3.10)) to only the manual titration points of the sample cuvette (Fig. 3.5). More complicated schemes for obtaining Imax were tried; all gave identical results within error. The data from the reference titration are fit to a single-site binding isotherm (Eq. (3.10)), allowing KHQS, Imax, and Imin values to float. KHQS is then used to calculate the bulk Mg2+ concentration, C2+^bulk, for each data point in the sample curve:

\[
C^{\mathrm{bulk}}_{2+} = \frac{I_{\mathrm{norm}}}{K_{\mathrm{HQS}}\left(1 - I_{\mathrm{norm}}\right)}. \tag{3.12}
\]

Finally, the preferential interaction coefficient is computed by calculating the excess Mg2+ present in the sample cuvette over the calculated bulk Mg2+ concentration:

\[
\Gamma_{2+} = \frac{C^{\mathrm{sample}}_{2+} - C^{\mathrm{bulk}}_{2+}}{C_{\mathrm{RNA}}}. \tag{3.13}
\]
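The three steps of the analysis (normalization, conversion to a bulk Mg2+ concentration, and calculation of the excess per nucleotide) can be chained as in the sketch below. The function names and all numerical values are hypothetical; KHQS, Imin, and Imax would come from the fits described above, and concentrations are in molar units.

```python
def isotherm(c_mg, k_hqs, i_min, i_max):
    """Single-site binding isotherm, Eq. (3.10)."""
    return (i_max - i_min) * k_hqs * c_mg / (1.0 + k_hqs * c_mg) + i_min


def normalize(intensity, i_min, i_max):
    """Eq. (3.11): fraction of HQS that is bound to Mg2+."""
    return (intensity - i_min) / (i_max - i_min)


def bulk_mg(i_norm, k_hqs):
    """Eq. (3.12): bulk Mg2+ concentration from normalized fluorescence."""
    if not 0.0 <= i_norm < 1.0:
        raise ValueError("normalized intensity must be in [0, 1)")
    return i_norm / (k_hqs * (1.0 - i_norm))


def gamma_2plus(c_total, c_bulk, c_rna):
    """Eq. (3.13): excess Mg2+ per nucleotide."""
    return (c_total - c_bulk) / c_rna


# Hypothetical data point: 1.0 mM total Mg2+ added to 2.0 mM nucleotide,
# with K_HQS = 400 M^-1 from the reference fit.
i_norm = normalize(0.21, i_min=0.05, i_max=0.85)
c_bulk = bulk_mg(i_norm, k_hqs=400.0)
print(i_norm, c_bulk, gamma_2plus(1.0e-3, c_bulk, 2.0e-3))
```

Equation (3.12) is simply Eq. (3.10) inverted for the normalized signal, so passing a simulated reference intensity through `normalize` and `bulk_mg` should return the concentration used to generate it.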

C2+^bulk is identical to C2+ in Eq. (3.3); Eq. (3.12) simply calculates this value from the fitted reference HQS-Mg2+ binding curve.

Errors in four experimental measurements can affect the accuracy of the calculated Γ2+ values. These quantities and their estimated typical uncertainties are Imax (<1%), KHQS (<1%), nucleic acid concentration (3%), and stock MgCl2 concentration (1%). Fig. 3.6 shows the effect on the calculated value of Γ2+ when each of these quantities is perturbed by 3%–10%. Errors in Imax result in small changes in Γ2+ for the initial titration points, and much larger errors for the final titration points. Any error in determining the stock MgCl2 concentration will be partly compensated in the calculation of C2+^bulk by the fact that an overestimation of the total added MgCl2 decreases the calculated KHQS (and vice versa). Errors in the RNA concentration result in the same percentage change in Γ2+ at all C2+^bulk. Combining all of the typical estimated magnitudes of errors, their net effect on Γ2+ in the Fig. 3.6 data set is similar to the variation observed between titrations repeated under identical conditions in the range from 3 × 10−6 to nearly 10−3 M bulk Mg2+.

To average data from separate titrations, we choose a set of ln(C2+^bulk) values, calculate the corresponding Γ2+ for each data set by linear interpolation between neighboring data points, and average the Γ2+ values so obtained. Titrations carried out at different pH values can be averaged to extend the range of C2+^bulk values for which errors in Γ2+ are acceptable. Three to five independent titrations are typically averaged to obtain a reliable data set; the standard deviations of the averaged values give an


[Figure 3.6: panels A–D, Γ2+ per nucleotide (0–0.4) versus bulk C2+ (10−5 to 10−3 M, logarithmic axis).]

Figure 3.6 Effect of 3 (solid dark gray), 5 (dashed medium gray), and 10% (dotted light gray) perturbations of either Imax (A), KHQS (B), RNA concentration (C), or stock MgCl2 concentration (D) on Γ2+ and C2+^bulk. The arrows indicate the effect of an increase in each value.

estimate of the associated error. Integration of the averaged Γ2+ curve to obtain interaction free energies (Fig. 3.1B) is done either by using the trapezoidal rule of numerical integration or by fitting a polynomial that asymptotically approaches the x-axis, y = b(x − a)^2 + c(x − a)^3 + d(x − a)^4, and integrating the polynomial. The two methods generally agree within error. In either case, free energies calculated at low ln(C2+^bulk) are potentially subject to systematic error from the way the curve is extrapolated to the x-axis.
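The averaging and trapezoidal integration steps can be sketched as follows. The function names and the toy data are hypothetical. The code returns only the integral of Γ2+ over ln(C2+^bulk); the conversion of that integral to a free energy is given by Eq. (3.6), which is not reproduced in this section, so the prefactor is deliberately left out.

```python
import statistics


def interpolate(x_grid, xs, ys):
    """Linear interpolation of (xs, ys) at each grid point; xs must ascend."""
    out = []
    for x in x_grid:
        if x < xs[0] or x > xs[-1]:
            raise ValueError("grid point lies outside the measured range")
        for j in range(1, len(xs)):
            if x <= xs[j]:
                t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
                out.append(ys[j - 1] + t * (ys[j] - ys[j - 1]))
                break
    return out


def average_curves(ln_c_grid, titrations):
    """Mean and standard deviation of Gamma at each grid value of ln(C).

    titrations is a list of (ln_c_values, gamma_values) pairs, one per
    independent titration.
    """
    columns = [interpolate(ln_c_grid, x, y) for x, y in titrations]
    means, sds = [], []
    for values in zip(*columns):
        means.append(statistics.mean(values))
        sds.append(statistics.stdev(values) if len(values) > 1 else 0.0)
    return means, sds


def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys with respect to xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


# Toy data standing in for two titrations (not real measurements).
grid = [0.0, 1.0, 2.0]
means, sds = average_curves(grid, [([0.0, 2.0], [0.0, 2.0]),
                                   ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0])])
print(means, sds, trapezoid(grid, means))
```

The grid must lie within the range covered by every titration being averaged, which is why the interpolation raises an error rather than extrapolating.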

7. Controls and Further Considerations

Here we discuss a number of factors that should be taken into account in designing a titration experiment and analyzing the results, either because of practical limitations in the experimental set-up or assumptions that are inherent in Eqs. (3.6) and (3.12).

1. The possibility that RNA-RNA interactions bias the measurement of Γ2+ should be checked by making measurements over a range of RNA concentrations and, if necessary, extrapolating to infinite dilution (Strauss et al., 1967). We have made measurements in the range of 0.25 to 6 mM nucleotide (2 mg/ml, in 40 mM monovalent cation)


for an RNA without detecting a significant concentration dependence in Γ2+. The potential for RNA aggregation at high concentrations can be minimized by careful design of the sequence (Szewczak et al., 1990).

2. The dye or Mg2+-dye complex should not bind directly to the RNA. We check to make sure the dye absorbance and fluorescence spectra are not perturbed by RNA, and compare RNA melting profiles carried out in the presence and absence of dye. We have no evidence for HQS or HQS-Mg2+ binding to any RNA.

3. Eq. (3.13) is valid only when there is a large molar excess of monovalent salt (e.g., KCl or NaCl) over added MgCl2. As discussed earlier (in section 2.1), HQS can be used as a sensor of the Mg2+ activity only if repulsive electrostatic interactions between the RNA and the negatively charged dye negligibly perturb n, the fraction of dye molecules that are in complex with Mg2+ (Eq. (3.4)). The effect of RNA on the measurement of n is estimated as follows. Suppose that an equilibrium dialysis experiment is carried out with an RNA concentration of 5 mM nucleotides. For most RNAs, anion exclusion is on the order of Γ− ≈ −0.1; thus the difference in anion concentrations between the two sides of the membrane is C−^in − C−^out ≈ −0.5 mM. If 50 mM KCl is present, then the ratio of anion activity coefficients on the two sides of the membrane is about γ−^in/γ−^out ≈ 1.01 (see the analogous Eq. (3.9) for Mg2+). Because anion exclusion is entirely a result of unfavorable coulombic forces, the identity of the anion is unimportant and HQS and Cl− should be affected similarly. Thus γD (Eq. (3.5)) is about 1% different between the two sides of the membrane, and the corresponding error in the measurement of n is <1%, which is comparable to the experimental error (see section 6).

4. In the derivations of Eqs. (3.6) and (3.10), Mg2+ ion concentration (C2+) has been substituted for activity (a2+). This substitution is justified only if the Mg2+ activity coefficient, γ2+, is a constant over the course of the titration. In a solution of strong electrolytes, γ2+ is primarily affected by favorable electrostatic interactions with anions. If a large-enough concentration of NaCl or KCl is already present in solution, the Cl− concentration does not significantly change as MgCl2 is added, and γ2+ is approximately constant. Whether the Cl− concentration is in large enough excess over Mg2+ is checked when a single-site binding isotherm (Eq. (3.10)) is fit to the reference titration data. A nonrandom distribution of residuals is observed if γ2+ is not constant over the range of MgCl2 concentrations being used (D. Leipply, personal communication). A 10- to 30-fold excess of Cl− over the Mg2+ concentration at the midpoint of the titration generally reduces the error to levels comparable with experimental variation. Note that an excess of monovalent ions is also required in equilibrium dialysis or any other experiment that attempts to extract thermodynamic information from a titration of RNA with MgCl2.


5. The derivation of Eq. (3.6) assumes that RNA does not undergo any conformational changes as the Mg2+ concentration increases. If an RNA adopts additional structure in response to the presence of Mg2+, an inflection may be observed in the plot of Γ2+ against ln(C2+,bulk); the measured Γ2+ is then a weighted average over the RNA species present in solution. If the conformational change is two-state, it may be possible to find the number of Mg2+ ions taken up or released in the transition (Grilley et al., 2006). An RNA may also change conformation during the titration without any inflection appearing in the plot of Γ2+, for instance, if there is gradual compaction of an ensemble of extended RNA structures as Mg2+ is added (Grilley et al., 2007).

6. The HQS titration method can be applied, in principle, over a wide range of monovalent salt concentrations. However, there are some practical considerations. At low salt concentrations, it becomes difficult to ensure that equilibrium with the nucleic acid has been achieved in preparation of the sample, and it also becomes difficult to reduce adventitious metal ion contaminants to a sufficiently low level. These limitations set a lower limit of 20 mM monovalent ions. At high salt concentrations, the weakness of the Mg2+-nucleic acid interactions and the correspondingly high concentrations of nucleic acid that must be used in the experiments become limiting. Measurement of Imax and KHQS becomes difficult because of the high Mg2+ concentrations required to saturate HQS. The solution viscosity and concentration-dependent behavior of the nucleic acid may also cause problems. We have measured Γ2+ curves to >1 mM C2+,bulk for a 58-mer RNA with 150 mM K+ present, which is probably close to the practical limit (Grilley et al., 2007).
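The order-of-magnitude estimates in items 3 and 4 can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the variable names are invented here, the simple concentration ratio is an assumption that stands in for (and is not) the activity-coefficient relation of Eq. (3.9), and the titration midpoint is a hypothetical input rather than a measured value.

```python
# Rough magnitude checks for items 3 and 4 (illustrative values only).

nucleotide_mM = 5.0      # RNA concentration used in item 3 (mM nucleotide)
exclusion_per_nt = 0.1   # magnitude of anion exclusion per nucleotide (item 3)
kcl_mM = 50.0            # monovalent salt concentration used in item 3

# Difference in anion concentration between the two sides of the membrane.
delta_anion_mM = exclusion_per_nt * nucleotide_mM

# Fractional perturbation relative to the bulk anion concentration.
# Assumption: a plain ratio is used as a stand-in for the activity-coefficient estimate.
fractional_shift = delta_anion_mM / kcl_mM


def chloride_excess(cl_mM, mg_midpoint_mM):
    """Fold excess of Cl- over Mg2+ at the midpoint of the titration."""
    return cl_mM / mg_midpoint_mM


mg_midpoint_mM = 2.0     # hypothetical titration midpoint, not from the text
excess = chloride_excess(kcl_mM, mg_midpoint_mM)

print(f"anion concentration difference: {delta_anion_mM:.2f} mM")
print(f"fractional shift: {fractional_shift:.1%}")
verdict = "within" if excess >= 10 else "below"
print(f"Cl- excess at midpoint: {excess:.0f}-fold ({verdict} the 10- to 30-fold guideline)")
```

Chloride contributed by the added MgCl2 itself is ignored in this estimate, which is reasonable only when the monovalent salt is in large excess.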

ACKNOWLEDGMENTS

This work was supported by NIH grant GM58545 (D.E.D.) and a Burroughs Wellcome Fellowship (A.M.S.).

REFERENCES

Anderson, C. F., Felitsky, D. J., Hong, J., and Record, M. T. (2002). Generalized derivation of an exact relationship linking different coefficients that characterize thermodynamic effects of preferential interactions. Biophys. Chem. 101–102, 497–511.

Bina-Stein, M., and Stein, A. (1976). Allosteric interpretation of Mg2+ binding to the denaturable Escherichia coli tRNAGlu2. Biochemistry 15, 3912–3917.

Bishop, J. A. (1963a). Complex formation and fluorescence 1. Complexes of 8-hydroxyquinoline-5-sulfonic acid. Anal. Chim. Acta 29, 172.

Bishop, J. A. (1963b). Complex formation and fluorescence 2. Use of 8-hydroxyquinoline-5-sulfonic acid as an indicator. Anal. Chim. Acta 29, 178.

Draper, D. E. (2004). A guide to ions and RNA structure. RNA 10, 335–343.

Draper, D. E., Grilley, D., and Soto, A. M. (2005). Ions and RNA folding. Annu. Rev. Biophys. Biomol. Struct. 34, 221–243.

Grilley, D., Soto, A. M., and Draper, D. E. (2006). Mg2+-RNA interaction free energies and their relationship to the folding of RNA tertiary structures. Proc. Natl. Acad. Sci. USA 103, 14003–14008.

Grilley, D., Misra, V., Caliskan, G., and Draper, D. E. (2007). Importance of partially unfolded conformations for Mg(2+)-induced folding of RNA tertiary structure: Structural models and free energies of Mg(2+) interactions. Biochemistry 46, 10266–10278.

Krakauer, H. (1971). The binding of Mg++ ions to polyadenylate, polyuridylate, and their complexes. Biopolymers 10, 2459–2490.

Laing, L. G., Gluick, T. C., and Draper, D. E. (1994). Stabilization of RNA structure by Mg2+ ion: Specific and non-specific effects. J. Mol. Biol. 237, 577–587.

Leroy, J. L., and Guéron, M. (1977). Electrostatic effects in divalent ion binding to tRNA. Biopolymers 16, 2429–2446.

Liu, J. C. I., and Bailar, J. C. (1951). The stereochemistry of complex inorganic compounds II. The resolution of bis-(8-quinolinolo-5-sulfonic acid) zinc(II). J. Am. Chem. Soc. 73, 5432–5433.

Martell, A. E., and Smith, R. M. (1974). “Critical stability constants,” Plenum Press, New York.

Misra, V. K., and Draper, D. E. (2002). The linkage between magnesium binding and RNA folding. J. Mol. Biol. 317, 507–521.

Misra, V. K., Shiman, R., and Draper, D. E. (2003). A thermodynamic framework for the magnesium-dependent folding of RNA. Biopolymers 69, 118–136.

Ni, H., Anderson, C. F., and Record, M. T., Jr. (1999). Quantifying the thermodynamic consequences of cation (M2+, M+) accumulation and anion (X−) exclusion in mixed salt solutions of polyanionic DNA using Monte Carlo and Poisson-Boltzmann calculations of ion-polyion preferential interaction coefficients. J. Phys. Chem. B 103, 3489–3504.

Pickett, J. S., Bowers, K. E., and Fierke, C. A. (2003). Mutagenesis studies of protein farnesyltransferase implicate aspartate beta 352 as a magnesium ligand. J. Biol. Chem. 278, 51243–51250.

Record, M. T., Jr., Zhang, W., and Anderson, C. F. (1998). Analysis of effects of salts and uncharged solutes on protein and nucleic acid equilibria and processes: A practical guide to recognizing and interpreting polyelectrolyte effects, Hofmeister effects, and osmotic effects of salts. Adv. Protein Chem. 51, 281–353.

Römer, R., and Hach, R. (1975). tRNA conformation and magnesium binding. A study of a yeast phenylalanine-specific tRNA by a fluorescent indicator and differential melting curves. Eur. J. Biochem. 55, 271–284.

Serebrov, V., Clarke, R. J., Gross, H. J., and Kisselev, L. (2001). Mg2+-induced tRNA folding. Biochemistry 40, 6688–6698.

Smith, R. M., and Martell, A. E. (1975). “Critical stability constants,” Plenum Press, New York.

Soto, A. M., Misra, V., and Draper, D. E. (2007). Tertiary structure of an RNA pseudoknot is stabilized by “diffuse” Mg(2+) ions. Biochemistry 46, 2973–2983.

Stein, A., and Crothers, D. M. (1976a). Conformational changes of transfer RNA: The role of magnesium(II). Biochemistry 15, 160–167.

Stein, A., and Crothers, D. M. (1976b). Equilibrium binding of magnesium(II) by Escherichia coli tRNAfMet. Biochemistry 15, 157–160.

Strauss, U. P., Helfgott, C., and Pink, H. (1967). Interactions of polyelectrolytes with simple electrolytes. II. Donnan equilibria obtained with DNA in solutions of 1-1 electrolytes. J. Phys. Chem. 71, 2550–2556.

Szewczak, A. A., White, S. A., Gewirth, D. T., and Moore, P. B. (1990). On the use of T7 RNA polymerase transcripts for physical investigation. Nucleic Acids Res. 18, 4139–4142.

Tinoco, I., Jr., and Bustamante, C. (1999). How RNA folds. J. Mol. Biol. 293, 271–281.

Wyman, J., and Gill, S. (1990). “Binding and linkage: Functional chemistry of biological macromolecules,” University Science Books, Mill Valley, CA.

CHAPTER FOUR

Analysis of Repeat-Protein Folding Using Nearest-Neighbor Statistical Mechanical Models

Tural Aksel* and Doug Barrick*

Contents
1. Historical Overview of Ising Models and Motivation for the Present Review
1.1. Origins
1.2. Application to linear biopolymers
2. Linear Repeat Proteins and Their Connection to Linear Ising Models
3. Formulating a Homopolymer Partition Function and the Zipper Approximation
4. Matrix Approach: Homopolymers
5. Matrix Approach: Heteropolymers
6. Solvability Criteria for Ising Models Applied to Repeat-Protein Folding
7. Matrix Homopolymer Analysis of Consensus TPR Folding
8. Matrix Heteropolymer Analysis of Consensus Ankyrin Repeat Folding
9. Summary and Future Directions
Acknowledgments
References

Abstract

The linear “Ising” model, which has been around for nearly a century, treats the behavior of linear arrays of repetitive, interacting subunits. Linear “repeat-proteins” have only been described in the last decade or so, and their folding energies have only been characterized very recently. Owing to their repetitive structures, linear repeat-proteins are particularly well suited for analysis by the nearest-neighbor Ising formalism. After briefly describing the historical origins and applications of the Ising model to biopolymers, and introducing repeat protein structure, this chapter will focus on the application of the linear Ising model to repeat proteins. When applied to homopolymers, the model can be represented and applied in a fairly simplified form. When applied to heteropolymers, where differences in energies among individual subunits (i.e., repeats) must be included, some (but not all) of this simplicity is lost. Derivations of the linear Ising model for both homopolymer and heteropolymer repeat-proteins will be presented. With the increased complexity required for analysis of heteropolymeric repeat proteins, the ability to resolve different energy terms from experimental data can be compromised. Thus, a simple matrix approach will be developed to help inform on the degree to which different thermodynamic parameters can be extracted from a particular set of unfolding curves. Finally, we will describe the application of these models to analyze repeat-protein folding equilibria, focusing on simplified repeat proteins based on “consensus” sequence information.

* T. C. Jenkins Department of Biophysics, The Johns Hopkins University, Baltimore, Maryland, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04204-3. © 2009 Elsevier Inc. All rights reserved.

1. Historical Overview of Ising Models and Motivation for the Present Review

1.1. Origins

The history of the “Ising” model, or perhaps more appropriately, the Ising-Lenz model, has been described extensively (Brush, 1967; Niss, 2005). Originally developed to study ferromagnetism, the model can be traced to the dissertation of Ernst Ising (Ising, 1925), and to an earlier proposal by Wilhelm Lenz (Lenz, 1920). At the time, Ising was directly connected to Lenz, as Ising carried out his dissertation work on the model under Lenz’s guidance at Hamburg University. Since that time, the model (with which Ising’s name is almost exclusively associated) has been applied to study a wide range of cooperative phenomena in one, two, and three dimensions, including phase separation in mixtures, phase transitions in single-component systems (the lattice gas model), and cooperative phenomena in linear biopolymers. It seems unfortunate that Ising did not continue in this area, in part because he was discouraged that, in his view, the model could not capture ferromagnetic transitions (Brush, 1967).

1.2. Application to linear biopolymers

Although the Ising model has been used to describe order-disorder transitions in a wide variety of diverse systems, the one-dimensional Ising model has been particularly useful for conformational transitions in linear polymers. These transitions, which can be categorized as helix-coil transitions, include the equilibria between the α-helix and coil in peptides

Analysis of Folding with Nearest-Neighbor Models


(Schellman, 1958; Zimm and Bragg, 1959; Lifson and Roig, 1961), and various equilibria of DNA and RNA, including double-helix formation (Zimm, 1960; Crothers and Kallenbach, 1966), and stacking transitions of single strands (Applequist and Damle, 1965; Poland et al., 1966). This literature, along with a very clear development of analytical models, is presented in a beautiful monograph by Poland and Scheraga (Poland and Scheraga, 1970). More recent applications include binding of protein ligands to repetitive structures such as DNA and protein filaments (McGhee and von Hippel, 1974; De La Cruz, 2005). In this review, we develop aspects of the nearest-neighbor or Ising model in the context of linear repeat proteins, emphasizing key features that are pertinent to recent experimental studies (including heterogeneous, homogeneous, and capped structures; see subsequent sections). We focus both on the theory and on how it can be used to analyze experimental data. It is our aim to provide enough detail so that all steps of the derivation can be followed (from the basic model to the development of the partition function, and then to modeling equilibrium-unfolding transitions) while avoiding specific features that apply exclusively to other types of linear biopolymers. In addition, we will include a discussion of some practical issues associated with determining the model-dependent parameters, emphasizing the relationship between these parameters and the data needed for their accurate determination.

2. Linear Repeat Proteins and Their Connection to Linear Ising Models

The units of repeat proteins described here are constructed from tandem elements of secondary structure (α-helix, β-strand, PII helix, turn) arranged in a large loop. The length of individual repeats is approximately 20–40 residues, depending on the type of repeat. Typically, individual repeats show primary sequence similarity, and in most cases repeats were identified by primary sequence before structural details were available. However, some repeats show little or no obvious repetition at the primary sequence level. Even when there is repetition, sequence identity from one repeat to the next is typically around 25%. Thus, although consensus sequences can be identified, sequences of natural repeats differ significantly from the consensus.

Three types of repeat proteins that have been amenable to structural and thermodynamic analysis and simplification through consensus information are ankyrin (ANK), leucine-rich (LRR), and tetratricopeptide (TPR) repeat proteins (see Kloss et al., 2008, for review; see also Courtemanche and Barrick, 2008; Kloss and Barrick, 2008). TPR and ANK repeats are composed of α-helices and turns, with two short turns connecting the TPR helices, and one short turn and one extended loop connecting the ANK helices. In contrast, LRR proteins contain a β-strand that packs against strands of neighboring repeats to form a contiguous sheet. Depending on the subtype, LRRs contain either an α-helix, a 3₁₀ helix, or an extended PPII (Kajava, 2001).

In linear repeat proteins, adjacent repeat units pack against their neighbors in a roughly linear array (Fig. 4.1). Depending on the shape and


Figure 4.1 The modular architectures of repeat proteins. (A) Crystal structure of the Notch ankyrin domain (1ot8.pdb, chain A) consisting of six structured ANK repeats (sequence repeats 2–7) and an N-terminal partly structured repeat. (B) Crystal structure of a consensus ankyrin repeat protein (2qyj.pdb) containing three consensus repeats (green) and N- and C-terminal caps (red and blue, respectively). (C) Crystal structure of a consensus-based TPR protein (1na0.pdb, chain A) containing three consensus repeats and a C-terminal cap (blue). (D) Crystal structure of YopM, a leucine-rich repeat protein containing 15 full LRR repeats of the bacterial subtype (15jl.pdb). For naturally occurring (heterogeneous) proteins, individual repeats are shown in different colors; selection of boundaries between repeats (color changes) is somewhat arbitrary and is based on considerations such as intron position, interresidue contact density, surface area, and visual impression. For the consensus ankyrin and TPR proteins, consensus repeats are shown with the same color but alternate in color saturation. This figure was prepared using PyMol (DeLano, 2003).


packing of repeats, different types of repeats typically show regular deviation from linearity (Kobe and Kajava, 2000), displaying twist from repeat to repeat (particularly pronounced for TPRs) and/or curvature along the entire stack (particularly pronounced for some LRR subtypes). For some repeat proteins, such as WD40 domains and TIM barrels, curvature is so extreme that a closed or circular structure is formed. Because such closed proteins have numerous sequence-distant interactions, they are not easily analyzed using nearest-neighbor thermodynamic models and will not be discussed here.

Linear repeat proteins have two features that make them ideal subjects for simple nearest-neighbor models. First, as described previously, they are constructed of a repeating unit at the level of secondary and tertiary structure; repetition can be extended to the level of primary sequence using consensus information (see subsequent sections). This translational symmetry reduces the number and type of energy terms required to describe stability, allowing for different regions of the molecule to be described in the same way. Second, as can be seen in interresidue contact maps, direct contacts are limited to repeats that are immediately adjacent in sequence, which justifies using a nearest-neighbor approximation to describe folding.

Given this structural simplicity, the free energy of repeat protein folding may be expected to have two dominant contributions: the intrinsic folding of individual units (which we will call ΔGi) and the interfacial interaction of neighboring repeats (ΔGi,i+1; Fig. 4.2). Thermodynamically, the second term is similar to a cooperative term describing short-range interactions in the peptide helix-coil transition (although the statistics are often formulated differently to capture backbone hydrogen bonding between residues i and i+4), and to the stacking interactions in DNA duplex formation.

Figure 4.2 A nearest-neighbor thermodynamic description of repeat-protein stability. The first two lines (A, B) show single-repeat steps in folding (individual folded repeats are shown as blocks), whereas the last two lines (C, D) show overall folding reactions from the fully denatured to the fully native state. Green repeats depict identical sequences, such as consensus repeats, and the red and blue repeats represent N- and C-terminal caps. The free energies given for the four panels are:

$$
\begin{aligned}
\text{(A)}\quad \Delta G^{\circ} &= \Delta G_{i} = -RT \ln k \\
\text{(B)}\quad \Delta G^{\circ} &= \Delta G_{i} + \Delta G_{i,i+1} = -RT \ln kt \\
\text{(C)}\quad \Delta G^{\circ} &= \Delta G_{N} + \Delta G_{C} + 2\Delta G_{i} + 3\Delta G_{i,i+1} = -RT \ln k_{N} k_{C} k_{i}^{2} t^{3} \\
\text{(D)}\quad \Delta G^{\circ} &= 4\Delta G_{i} + 3\Delta G_{i,i+1} = -RT \ln k_{i}^{4} t^{3}
\end{aligned}
$$

As in these simpler systems, various levels of approximation can be used to analyze unfolding transitions with nearest-neighbor models. Because naturally occurring repeat proteins are quite heterogeneous at the primary sequence level, a homopolymer approach (treating all the repeats as identical) may not be appropriate. However, studies from a number of labs have shown that stable repeat proteins of various types (ANK, TPR, LRR) can be built of repeat arrays that are nearly identical in sequence, typically matching very closely to the consensus sequence for that particular repeat (Mosavi et al., 2002; Binz et al., 2003; Main et al., 2003). In principle, such consensus arrays can be well-modeled using a homopolymer approach (Fig. 4.2B; Main et al., 2003), although in most cases polar substitutions at the terminal repeats are required to maintain solubility, introducing an intermediate level of heterogeneity (Wetzel et al., 2008).
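The bookkeeping behind Fig. 4.2 (one intrinsic term per folded repeat, one interfacial term per adjacent folded pair) can be sketched in a few lines. The function name and all numerical energies below are hypothetical, chosen only to illustrate the sums; they are not values from the chapter.

```python
def array_free_energy(intrinsic, interfacial):
    """Free energy (kcal/mol) of a fully folded array, following Fig. 4.2.

    intrinsic   -- intrinsic folding free energy of each repeat, N- to C-terminus
    interfacial -- interfacial free energy of each adjacent pair of repeats
    """
    if len(interfacial) != len(intrinsic) - 1:
        raise ValueError("an array of n repeats has n - 1 interfaces")
    return sum(intrinsic) + sum(interfacial)


# Hypothetical energies (kcal/mol), for illustration only.
dG_i, dG_ii = 5.0, -8.0    # internal repeat, interface
dG_N, dG_C = 6.0, 6.5      # N- and C-terminal caps

# Four identical repeats, three interfaces (as in Fig. 4.2D).
homopolymer = array_free_energy([dG_i] * 4, [dG_ii] * 3)

# Two caps plus two internal repeats, three interfaces (as in Fig. 4.2C).
capped = array_free_energy([dG_N, dG_i, dG_i, dG_C], [dG_ii] * 3)

print(f"homopolymer: {homopolymer:+.1f} kcal/mol, capped: {capped:+.1f} kcal/mol")
```

With these made-up numbers, unfavorable intrinsic terms are outweighed by favorable interfaces only once several repeats are folded together, which is the qualitative behavior the nearest-neighbor description is meant to capture.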

3. Formulating a Homopolymer Partition Function and the Zipper Approximation

The partition function, or sum over states, is central to analysis of the thermodynamic properties of repeat proteins, their populations, and their folding. Here the partition function will be developed for a homopolymeric linear system as a summation. As articulated by Zimm and Bragg in the late 1950s (Zimm and Bragg, 1959), this summation is particularly useful for short chains, for which the number of terms in the sum remains manageable. The summation also simplifies to a useful approximate (closed) form in the high cooperativity limit.

One intuitive way to build a molecular partition function, q, for repeat protein folding, is to represent the statistical weight of each conformation (for a linear Ising model there will be 2^n in total) as the concentration of that conformation, compared (as a ratio) to an arbitrary reference conformation. By choosing the state in which all n repeats are unfolded (Un) as the reference state, such ratios are equivalent to equilibrium constants for folding, and are thus related exponentially to the intrinsic folding energy of each repeat (ΔGi) and the interfacial pairing energy between neighbors (ΔGi,i+1). With this reference, the molecular partition function can be written as follows:

$$
q = \frac{1}{[U_n]} \sum_{i=0}^{n} \sum_{\text{configs}} [F_i, U_{n-i}]. \tag{4.1}
$$

The inside sum in Eq. (4.1) is taken over all microscopic configurations that have i folded repeats (Fi). Because of the dependence of overall folding energies on interfacial interactions, these microscopic configurations can differ in energy even though they have the same number of folded repeats. The number of interfaces is maximized when folded repeats are clustered together, whereas gaps separating folded repeats decrease the number of interfaces. Thus, converting Eq. (4.1) to a sum of equilibrium constants k and t for intrinsic folding and interfacial interaction (or exponentials in energies) requires the number of gaps between folded segments to be explicitly stated:

$$
q = 1 + \sum_{i=1}^{n} \sum_{g=0}^{i-1} \Omega_{i,g}\, k^{i}\, t^{\,i-1-g}. \tag{4.2}
$$

In this equation, Ωi,g is the number of ways that i out of n folded repeats can be arranged with g gaps. Unfortunately, the degeneracy in Eq. (4.2) is rather complex even in open form and is not particularly useful except for short arrays (low n), where each term in q can be given explicitly. However, in the limit of high interfacial stability, which eliminates gaps between folded repeats, the degeneracy (Ωi,g with g = 0) and the partition function become particularly simple. When all i folded repeats are coalesced into one structured segment (g = 0), there are n − i + 1 ways to arrange the structured segment. This approximation is often referred to as the zipper model because structure (folded repeats in this case) zips up as a single block. The partition function for the zipper model can be written as:

$$
\begin{aligned}
q &= 1 + \sum_{i=1}^{n} (n-i+1)\, k^{i} t^{\,i-1} \\
  &= 1 + t^{-1} \sum_{i=1}^{n} (n-i+1)\,(kt)^{i} \\
  &= 1 + t^{-1}(n+1) \sum_{i=1}^{n} (kt)^{i} - t^{-1} \sum_{i=1}^{n} i\,(kt)^{i} \\
  &= 1 + t^{-1}(n+1) \sum_{i=1}^{n} (kt)^{i} - k\,\frac{d}{d(kt)} \sum_{i=1}^{n} (kt)^{i}.
\end{aligned}
\tag{4.3}
$$

Both sums in the last line of Eq. (4.3) express partial geometric series in the variable kt, which can be written in closed form as:

$$
\sum_{i=1}^{n} (kt)^{i} = \frac{kt\,\{(kt)^{n} - 1\}}{kt - 1}. \tag{4.4}
$$

Substituting this closed form expression into Eq. (4.3) gives:

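$$
q = 1 + \frac{k(n+1)\{(kt)^{n} - 1\}}{kt - 1} - k\,\frac{d}{d(kt)}\left[\frac{kt\,\{(kt)^{n} - 1\}}{kt - 1}\right]. \tag{4.5}
$$

Differentiating the second term and rearranging gives a closed form of the partition function:

$$
q = 1 + \frac{k\left\{(kt)^{n+1} - (n+1)\,kt + n\right\}}{(kt - 1)^{2}}. \tag{4.6}
$$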
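As a numerical sanity check, the zipper partition function can be evaluated both by direct summation (the first line of Eq. (4.3)) and from the closed form of Eq. (4.6). This is a minimal sketch; the function names and the values of k, t, and n are illustrative choices, not fitted parameters.

```python
import math


def zipper_q_sum(k, t, n):
    """Zipper partition function by explicit summation over i folded repeats."""
    return 1.0 + sum((n - i + 1) * k**i * t**(i - 1) for i in range(1, n + 1))


def zipper_q_closed(k, t, n):
    """Closed form of the zipper partition function; singular at kt == 1."""
    x = k * t
    if math.isclose(x, 1.0):
        raise ValueError("closed form is singular at kt = 1; use the explicit sum")
    return 1.0 + k * (x**(n + 1) - (n + 1) * x + n) / (x - 1.0) ** 2


# Illustrative weights: an unfavorable intrinsic term and a favorable interface.
k, t, n = 0.01, 500.0, 5
q_sum = zipper_q_sum(k, t, n)
q_closed = zipper_q_closed(k, t, n)
print(q_sum, q_closed)  # the two routes should agree to within rounding error
```

The closed form divides by (kt − 1)², so near kt = 1 it loses precision; the explicit sum is the safer route in that region.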

With this relatively simple expression for the partition function, populations and associated observable properties can be calculated. Of primary importance is the fraction of repeats that are folded, θ, which is given as:

$$
\begin{aligned}
\theta &= \frac{1}{n} \sum_{i=0}^{n} i\,p_{i}
        = \frac{1}{n} \sum_{i=0}^{n} i\,\frac{(n-i+1)\,k^{i} t^{\,i-1}}{q}
        = \frac{k}{nq} \sum_{i=1}^{n} (n-i+1)\, i\, k^{i-1} t^{\,i-1} \\
       &= \frac{k}{nq}\,\frac{d}{dk}\left\{ 1 + \sum_{i=1}^{n} (n-i+1)\, k^{i} t^{\,i-1} \right\} \\
       &= \frac{k}{nq}\,\frac{dq}{dk} \\
       &= \frac{1}{n}\,\frac{d \ln q}{d \ln k}
\end{aligned}
\tag{4.7}
$$

where pi is the fractional population of the ith partly folded macrostate. Finding θ by differentiating q with respect to k can be understood by recognizing that k serves as a counter for folded repeats. For example, conformations with four folded repeats will have four powers of k. The penultimate expression, which is general, and applies even when the zipper approximation does not hold, provides the simplest form for calculation of the fraction folded as a function of k, t, and n, given Eq. (4.6):

$$
\theta = \frac{k\left\{ n\,(kt)^{n+2} - (n+2)\,(kt)^{n+1} + (n+2)\,kt - n \right\}}
              {n\,(kt - 1)\left[ (kt - 1)^{2} + k\left\{ (kt)^{n+1} - (n+1)\,kt + n \right\} \right]}. \tag{4.8}
$$
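The closed-form fraction folded of Eq. (4.8) can be checked against the population-weighted average it was derived from. The sketch below assumes the zipper weights (n − i + 1) k^i t^(i−1) for a block of i contiguous folded repeats; function names and parameter values are illustrative.

```python
def theta_closed(k, t, n):
    """Fraction of folded repeats from the closed-form zipper expression."""
    x = k * t
    numerator = k * (n * x**(n + 2) - (n + 2) * x**(n + 1) + (n + 2) * x - n)
    denominator = n * (x - 1.0) * ((x - 1.0) ** 2
                                   + k * (x**(n + 1) - (n + 1) * x + n))
    return numerator / denominator


def theta_from_populations(k, t, n):
    """Fraction folded as (1/n) * sum_i i * p_i over zipper macrostates."""
    weights = [(n - i + 1) * k**i * t**(i - 1) for i in range(1, n + 1)]
    q = 1.0 + sum(weights)  # the leading 1 is the fully unfolded reference state
    return sum(i * w for i, w in enumerate(weights, start=1)) / (n * q)


k, t, n = 0.01, 500.0, 5  # illustrative values only
print(theta_closed(k, t, n), theta_from_populations(k, t, n))
```

Like Eq. (4.6), the closed form is singular at kt = 1, so the explicit population sum is preferable when scanning k through that point.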


Equilibrium-unfolding transitions can be derived from (or fitted using) Eq. (4.8) by introducing an explicit dependence on an external variable (temperature, pressure, or denaturant) to either k, t, or both parameters. In this review we will primarily focus on denaturant-induced unfolding. In Ising analysis of repeat protein unfolding, statistical weights have been assumed to vary exponentially with denaturant (linear in terms of free energy):

$$
k(x) = e^{-\Delta G_{i}/RT} = e^{-(\Delta G_{i,\mathrm{H_2O}} - m_{i}[x])/RT} \tag{4.9A}
$$

$$
t(x) = e^{-\Delta G_{i,i+1}/RT} = e^{-(\Delta G_{i,i+1,\mathrm{H_2O}} - m_{i,i+1}[x])/RT}. \tag{4.9B}
$$

Here, [x] represents molar denaturant concentration, mi and mi,i+1 are denaturant sensitivities of the intrinsic and interfacial terms, and ΔGi,H2O and ΔGi,i+1,H2O are intrinsic folding and interfacial interaction energies in the absence of denaturant. This form of denaturant dependence has been used extensively for globular protein folding studies (Pace, 1986; Street et al., 2008). Although in principle both the intrinsic and interfacial stability may be affected, most studies of repeat-protein denaturation have attributed the effect of denaturant solely to the intrinsic folding constant, k (Mello and Barrick, 2004; Kajander et al., 2005; Wetzel et al., 2008). Assuming intrinsic folding involves formation of secondary structure elements (Fig. 4.2), whereas the nearest-neighbor interaction corresponds to packing of neighboring repeats, this partitioning is consistent with a growing body of evidence suggesting that denaturants destabilize proteins largely by acting on the backbone, and thus should primarily destabilize units of secondary structure rather than packing interactions between such structures (Scholtz et al., 1995; Auton et al., 2007; Bolen and Rose, 2008). Moreover, this partitioning is consistent with recent global analysis from our laboratory of large numbers of denaturant-induced unfolding transitions of consensus ankyrin repeat proteins (TA & DB, in preparation).

The first application of the 1D-Ising model to repeat protein folding involved a series of constructs in which ankyrin repeats were deleted from one or both ends of the Notch ankyrin domain (Mello and Barrick, 2004). By analyzing the free energies of unfolding of these constructs using a set of linear equations, a free energy contribution originating from each repeat was obtained. Because of the way the deletion series was constructed, analysis yielded an estimate of the intrinsic stability (ΔGi) of one of the repeats of +6.6 kcal/mol and an average interfacial stability (ΔGi,i+1) of −9.1 kcal/mol. These parameters were used to evaluate the populations of folded, unfolded, and partly folded states as a function of denaturant concentration, using the zipper approximation, which confirmed the all-or-none nature of the unfolding transition observed experimentally (Mello and Barrick, 2004).
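To illustrate how a denaturant dependence in k alone produces an unfolding transition, the sketch below evaluates the zipper-model fraction folded over a range of denaturant concentrations. Several inputs are assumptions made here for illustration: the interfacial energy is taken as −9.1 kcal/mol (stabilizing), the array length n = 7 and the temperature are arbitrary, and the m-value is hypothetical. With ΔGi written as ΔGi,H2O − mi[x], a destabilizing denaturant corresponds to a negative mi in this sketch; other sign conventions for m are in common use.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def weights(denaturant_M, dG_i_water, m_i, dG_ii, temperature_K=298.15):
    """Statistical weights k(x) and t, with denaturant acting on k only."""
    rt = R_KCAL * temperature_K
    k = math.exp(-(dG_i_water - m_i * denaturant_M) / rt)
    t = math.exp(-dG_ii / rt)
    return k, t


def fraction_folded(k, t, n):
    """Zipper-model fraction of folded repeats from the explicit sum over i."""
    w = [(n - i + 1) * k**i * t**(i - 1) for i in range(1, n + 1)]
    q = 1.0 + sum(w)
    return sum(i * wi for i, wi in enumerate(w, start=1)) / (n * q)


def unfolding_curve(concentrations, n, dG_i_water, m_i, dG_ii):
    """Fraction folded at each denaturant concentration (M)."""
    return [fraction_folded(*weights(x, dG_i_water, m_i, dG_ii), n)
            for x in concentrations]


concs = [0.5 * j for j in range(13)]  # 0 to 6 M in 0.5 M steps
curve = unfolding_curve(concs, n=7, dG_i_water=6.6, m_i=-0.5, dG_ii=-9.1)
for x, theta in zip(concs, curve):
    print(f"{x:4.1f} M  fraction folded = {theta:.3f}")
```

The explicit sum is used instead of Eq. (4.8) because the titration sweeps kt through 1, where the closed form is singular.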


4. Matrix Approach: Homopolymers

The zipper model assumes that the folding of each repeat is highly coupled to its neighbors. High coupling allows conformations in which stretches of folded repeats are separated by unfolded repeats to be ignored. However, if cooperativity between adjacent repeats is low, or if repeat arrays are long, these intermediates will be significantly populated and must be accounted for. In this section we will present a simple matrix-based derivation of the partition function for the folding reaction of homopolymeric repeat proteins (i.e., all repeats are the same) that accounts for all partly folded conformations in a very compact way. This matrix method has been widely used to study one-dimensional interacting biological systems (Zimm and Bragg, 1959; Poland and Scheraga, 1970). In addition to providing a full description of all partly folded states, this matrix-based form can be used to analyze experimental unfolding transitions to determine ΔGi and ΔGi,i+1.

Before we show how the matrix representation of the partition function can be manipulated to analyze unfolding curves, we will use a recursion-based approach that justifies the matrix form of the partition function. Although the matrix-based form of the partition function can easily be used without a detailed understanding of its origin, and its form is often justified simply by the fact that the rules of matrix multiplication combine statistical weights in the appropriate way, we feel that an understanding of the origins of the matrix method will result in a deeper understanding of its application.

In the homopolymer approximation, each repeat has the same intrinsic folding energy (ΔGi), and the same interaction energy with its neighbors (ΔGi,i+1 for all i repeats; we will retain the subscript i for use below, although the homopolymer approximation makes all n repeats identical). The free energy of any particular configuration, relative to the fully denatured state (Un earlier), can be written as:

$$
\Delta G^{\circ} = \sum_{j=1}^{n} \delta_{j}\, \Delta G_{i} + \sum_{j=1}^{n-1} \delta_{j}\, \delta_{j+1}\, \Delta G_{i,i+1} \tag{4.10}
$$

where δj = 1 if repeat j is folded, and 0 if it is unfolded. With this free energy relationship, the partition function of the homopolymer system with n identical repeats can be written as:

$$
q(n) = \sum_{\text{state}=1}^{2^{n}} e^{-\Delta G^{\circ}/RT}. \tag{4.11}
$$
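For short arrays, Eqs. (4.10) and (4.11) can be evaluated directly by enumerating every configuration. The following is a minimal sketch; the function names, the energies in the usage lines, and the temperature are illustrative assumptions.

```python
import itertools
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def configuration_free_energy(state, dG_i, dG_ii):
    """Free energy of one configuration; state holds 0/1 flags, 1 = folded."""
    intrinsic = sum(state) * dG_i
    # An interface exists only where two adjacent repeats are both folded.
    interfaces = sum(a * b for a, b in zip(state, state[1:])) * dG_ii
    return intrinsic + interfaces


def partition_function_enumerated(n, dG_i, dG_ii, temperature_K=298.15):
    """Sum Boltzmann factors over all 2**n folded/unfolded configurations."""
    rt = R_KCAL * temperature_K
    return sum(
        math.exp(-configuration_free_energy(state, dG_i, dG_ii) / rt)
        for state in itertools.product((0, 1), repeat=n)
    )


# Hypothetical energies in kcal/mol, for illustration only.
q4 = partition_function_enumerated(4, dG_i=1.0, dG_ii=-2.0)
print(f"q(4) = {q4:.3f} from {2**4} configurations")
```

The cost doubles with every added repeat, which is the practical motivation for the more compact forms developed next.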


Long repeat proteins (large n) lead to a very large number (2^n) of terms in the sum, which makes it impractical for calculations and analysis of data. Instead, a simpler, more compact form of q(n) in terms of ΔGi and ΔGi,i+1 is needed. One approach to simplifying the sum is to derive an expression for q(n) in terms of the partition function of a construct that contains fewer repeats (e.g., q(n − 1)). Repeating this method recursively defines q(n) in terms of progressively smaller (and simpler) partition functions and generates the matrix representation of the partition function in terms of ΔGi and ΔGi,i+1 in the process.

Starting with q(n) in terms of q(n − 1), the nth repeat can be added to an n − 1 array in one of two states: folded (the partition function that counts all such states will be called qf(n)) or unfolded (qu(n)). Applying the same dichotomy to the n − 1 state divides q(n − 1) into two parts, one in which the last (n − 1) repeat is folded (qf(n − 1)), and one in which the last repeat is unfolded (qu(n − 1)). When the nth repeat is added to the C-terminal end in a folded state, qf(n) can be written in terms of qf(n − 1) and qu(n − 1):

$$
q_{f}(n) = q_{f}(n-1)\, e^{-(\Delta G_{i} + \Delta G_{i,i+1})/RT} + q_{u}(n-1)\, e^{-\Delta G_{i}/RT}. \tag{4.12}
$$

This equation simply states that if repeat n − 1 is folded (with partition function qf(n − 1)), adding a folded repeat (with intrinsic energy ΔGi) at position n creates a new interface (ΔGi,i+1). However, if repeat n − 1 is unfolded (with partition function qu(n − 1)), adding a folded repeat at position n does not create a new interface. Likewise, when the nth repeat is added to the C-terminal end in an unfolded state, qu(n) can be calculated using the same approach:

$$
\begin{aligned}
q_{u}(n) &= q_{f}(n-1)\, e^{-0/RT} + q_{u}(n-1)\, e^{-0/RT} \\
         &= q_{f}(n-1) + q_{u}(n-1).
\end{aligned}
\tag{4.13}
$$
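The recursion in Eqs. (4.12) and (4.13) translates directly into a short loop. In this sketch the Boltzmann factors are written as k = exp(−ΔGi/RT) and t = exp(−ΔGi,i+1/RT), the single-repeat starting values (k for folded, 1 for unfolded) follow from the choice of the unfolded state as reference, and the function name and numbers are illustrative.

```python
def q_recursive(k, t, n):
    """Partition function of n identical repeats built one repeat at a time.

    qf and qu are the partial sums over configurations whose last repeat is
    folded or unfolded, respectively.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    qf, qu = k, 1.0  # a single repeat: folded weight k, unfolded weight 1
    for _ in range(n - 1):
        # A new folded repeat gains an interface (factor t) only if the
        # previous repeat is folded; a new unfolded repeat contributes weight 1.
        qf, qu = k * t * qf + k * qu, qf + qu
    return qf + qu


k, t = 0.3, 4.0  # illustrative weights
for n in (1, 2, 3, 10):
    print(n, q_recursive(k, t, n))
```

Each added repeat costs a fixed number of multiplications, so the work grows linearly with n rather than as 2^n.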

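The expressions for qf(n) and qu(n) are linear equations in the variables qf(n − 1) and qu(n − 1):

$$
\begin{aligned}
q_{f}(n) &= e^{-(\Delta G_{i} + \Delta G_{i,i+1})/RT}\, q_{f}(n-1) + e^{-\Delta G_{i}/RT}\, q_{u}(n-1) \\
q_{u}(n) &= q_{f}(n-1) + q_{u}(n-1),
\end{aligned}
\tag{4.14}
$$

and can be consolidated with a simple matrix relationship:

$$
\begin{aligned}
\begin{bmatrix} q_{f}(n) \\ q_{u}(n) \end{bmatrix}
&= \begin{bmatrix} e^{-(\Delta G_{i} + \Delta G_{i,i+1})/RT} & e^{-\Delta G_{i}/RT} \\ 1 & 1 \end{bmatrix}
   \begin{bmatrix} q_{f}(n-1) \\ q_{u}(n-1) \end{bmatrix} \\
&= \begin{bmatrix} kt & k \\ 1 & 1 \end{bmatrix}
   \begin{bmatrix} q_{f}(n-1) \\ q_{u}(n-1) \end{bmatrix}.
\end{aligned}
\tag{4.15}
$$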


The second line comes from substituting the statistical weights κ = e^{−ΔG_i/RT} and τ = e^{−ΔG_{i,i+1}/RT} for the free energy terms. Continuing the recursion to repeat n−2 gives:

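$$
\begin{aligned}
\begin{bmatrix} q_f(n) \\ q_u(n) \end{bmatrix}
&= \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}
   \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}
   \begin{bmatrix} q_f(n-2) \\ q_u(n-2) \end{bmatrix} \\
&= \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{2}
   \begin{bmatrix} q_f(n-2) \\ q_u(n-2) \end{bmatrix}.
\end{aligned} \tag{4.16}
$$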

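This recursion can be continued all the way to the first (N-terminal) repeat to give:

$$
\begin{bmatrix} q_f(n) \\ q_u(n) \end{bmatrix}
= \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n-1}
  \begin{bmatrix} q_f(1) \\ q_u(1) \end{bmatrix}, \tag{4.17}
$$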

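where q_f(1) and q_u(1) are the statistical weights for a single N-terminal repeat in the folded and unfolded states, respectively, and are simply

$$
q_f(1) = \kappa, \qquad q_u(1) = 1. \tag{4.18}
$$

Thus:

$$
\begin{bmatrix} q_f(n) \\ q_u(n) \end{bmatrix}
= \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n-1}
  \begin{bmatrix} \kappa \\ 1 \end{bmatrix}. \tag{4.19}
$$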
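As a quick check on Eq. (4.19), the case n = 2 can be multiplied out by hand. This worked example is an illustration added here; it writes κ and τ for the statistical weights e^{−ΔG_i/RT} and e^{−ΔG_{i,i+1}/RT}:

$$
\begin{bmatrix} q_f(2) \\ q_u(2) \end{bmatrix}
= \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} \kappa \\ 1 \end{bmatrix}
= \begin{bmatrix} \kappa^{2}\tau + \kappa \\ \kappa + 1 \end{bmatrix},
\qquad
q_f(2) + q_u(2) = \kappa^{2}\tau + 2\kappa + 1 .
$$

The four terms correspond to the four conformations of a two-repeat array: both repeats folded with one interface (κ²τ), either single repeat folded (κ each), and both repeats unfolded (1).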

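Left-multiplying both sides by the row vector [1 1] sums q_f(n) and q_u(n) to give the full partition function, q(n):

$$
q(n) = \begin{bmatrix} 1 & 1 \end{bmatrix}
       \begin{bmatrix} q_f(n) \\ q_u(n) \end{bmatrix}
     = \begin{bmatrix} 1 & 1 \end{bmatrix}
       \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n-1}
       \begin{bmatrix} \kappa \\ 1 \end{bmatrix}. \tag{4.20}
$$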
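The recursion in Eqs. (4.12) and (4.13) is easy to check numerically. The following Python sketch is an illustration added here, not code from the original study: it compares the recursion with a brute-force sum over all 2^n conformations. The function names and the numerical values of κ and τ are assumptions chosen only for the example.

```python
from itertools import product


def q_enumerate(n, kappa, tau):
    """Sum Boltzmann factors over all 2**n folded/unfolded patterns."""
    total = 0.0
    for state in product((0, 1), repeat=n):  # 1 = folded, 0 = unfolded
        n_folded = sum(state)
        # an interface exists only between two adjacent folded repeats
        n_interfaces = sum(a and b for a, b in zip(state, state[1:]))
        total += kappa ** n_folded * tau ** n_interfaces
    return total


def q_recursive(n, kappa, tau):
    """Apply Eqs. (4.12) and (4.13) n-1 times, starting from q_f(1)=kappa, q_u(1)=1."""
    qf, qu = kappa, 1.0
    for _ in range(n - 1):
        qf, qu = kappa * tau * qf + kappa * qu, qf + qu
    return qf + qu


if __name__ == "__main__":
    kappa, tau = 0.05, 500.0  # hypothetical statistical weights
    for n in (2, 4, 8):
        print(n, q_enumerate(n, kappa, tau), q_recursive(n, kappa, tau))
```

The enumeration costs on the order of 2^n operations, whereas the recursion costs on the order of n, which is the practical motivation for the matrix form.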

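By expanding the column vector on the right-hand side in terms of the statistical weight matrix, q(n) can be expressed in terms of the nth power of the matrix:

$$
\begin{aligned}
q(n) &= \begin{bmatrix} 1 & 1 \end{bmatrix}
        \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n-1}
        \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}
        \begin{bmatrix} 0 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 1 & 1 \end{bmatrix}
        \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n}
        \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\end{aligned} \tag{4.21}
$$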


One final rearrangement of q(n), which will be helpful for further calculations, is given by taking the transpose of the equation above (as q(n) is a scalar, it is unaffected by transposition):

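$$
\begin{aligned}
q(n) &= \left( \begin{bmatrix} 1 & 1 \end{bmatrix}
        \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{n}
        \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right)^{T} \\
     &= \begin{bmatrix} 0 \\ 1 \end{bmatrix}^{T}
        \left( \begin{bmatrix} \kappa\tau & \kappa \\ 1 & 1 \end{bmatrix}^{T} \right)^{n}
        \begin{bmatrix} 1 & 1 \end{bmatrix}^{T} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix}
        \begin{bmatrix} \kappa\tau & 1 \\ \kappa & 1 \end{bmatrix}^{n}
        \begin{bmatrix} 1 \\ 1 \end{bmatrix}
      \equiv \begin{bmatrix} 0 & 1 \end{bmatrix} W^{n} \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\end{aligned} \tag{4.22}
$$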

where the weight matrix is represented by W. The preceding equation allows q(n) to be computed without having to enumerate all 2^n terms explicitly. Moreover, it can be simplified by treating it as an eigenvalue problem, which greatly simplifies the product of the statistical weight matrices. In this treatment, W is replaced by a matrix product,

$$
W = T D T^{-1}, \tag{4.23}
$$

where D is a diagonal matrix of the eigenvalues (λ_1, λ_2) of W, and T is an invertible matrix of its eigenvectors (Strang, 2005). This substitution leads to:

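$$
\begin{aligned}
q(n) &= \begin{bmatrix} 0 & 1 \end{bmatrix} \left( T D T^{-1} \right)^{n} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} (T D T^{-1})(T D T^{-1}) \cdots (T D T^{-1}) \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} T D T^{-1} T D T^{-1} \cdots T D T^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} T D D \cdots D T^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} T D^{n} T^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} T \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^{n} T^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
     &= \begin{bmatrix} 0 & 1 \end{bmatrix} T \begin{bmatrix} \lambda_1^{n} & 0 \\ 0 & \lambda_2^{n} \end{bmatrix} T^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\end{aligned} \tag{4.24}
$$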


The eigenvalues of W are obtained by solving the characteristic equation det(W − λI) = 0, yielding the two roots:

$$
\lambda_1 = \frac{\kappa\tau + 1 + \sqrt{(\kappa\tau - 1)^{2} + 4\kappa}}{2},
\qquad
\frac{d\lambda_1}{d\kappa} = \frac{\tau}{2} + \frac{\kappa\tau^{2} - \tau + 2}{2\sqrt{(\kappa\tau - 1)^{2} + 4\kappa}}, \tag{4.25A}
$$

$$
\lambda_2 = \frac{\kappa\tau + 1 - \sqrt{(\kappa\tau - 1)^{2} + 4\kappa}}{2},
\qquad
\frac{d\lambda_2}{d\kappa} = \frac{\tau}{2} - \frac{\kappa\tau^{2} - \tau + 2}{2\sqrt{(\kappa\tau - 1)^{2} + 4\kappa}} \tag{4.25B}
$$

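(the derivatives will be used subsequently). Two corresponding eigenvectors of W are

$$
\vec{t}_1 = \begin{bmatrix} 1 - \lambda_1 \\ -\kappa \end{bmatrix},
\qquad
\vec{t}_2 = \begin{bmatrix} 1 - \lambda_2 \\ -\kappa \end{bmatrix},
$$

and combine to give:

$$
T = \begin{bmatrix} \vec{t}_1 & \vec{t}_2 \end{bmatrix}
  = \begin{bmatrix} 1 - \lambda_1 & 1 - \lambda_2 \\ -\kappa & -\kappa \end{bmatrix} \tag{4.26}
$$

and

$$
T^{-1} = \frac{1}{\kappa(\lambda_1 - \lambda_2)}
         \begin{bmatrix} -\kappa & \lambda_2 - 1 \\ \kappa & 1 - \lambda_1 \end{bmatrix}. \tag{4.27}
$$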

Combining these eigenvalues and eigenvectors into Eq. (4.24) gives a relatively simple closed-form expression for q(n):

$$
q(n) = \frac{\kappa(1 - \tau)\left(\lambda_1^{n} - \lambda_2^{n}\right) + \lambda_1^{n+1} - \lambda_2^{n+1}}{\lambda_1 - \lambda_2}. \tag{4.28}
$$
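A brief numerical check may help when implementing the closed form. The Python sketch below is an illustration added here, with hypothetical function names and test values: it evaluates Eq. (4.28) from the eigenvalues in Eqs. (4.25A) and (4.25B) and compares it with the explicit product [0 1] W^n [1 1]^T of Eq. (4.22).

```python
import math


def eigenvalues(kappa, tau):
    """Roots of det(W - lambda*I) = 0 for W = [[kappa*tau, 1], [kappa, 1]]."""
    root = math.sqrt((kappa * tau - 1.0) ** 2 + 4.0 * kappa)
    return (kappa * tau + 1.0 + root) / 2.0, (kappa * tau + 1.0 - root) / 2.0


def q_closed_form(n, kappa, tau):
    """Closed-form homopolymer partition function, Eq. (4.28)."""
    l1, l2 = eigenvalues(kappa, tau)
    num = kappa * (1.0 - tau) * (l1 ** n - l2 ** n) + l1 ** (n + 1) - l2 ** (n + 1)
    return num / (l1 - l2)


def q_matrix_product(n, kappa, tau):
    """[0 1] W^n [1 1]^T by repeated row-vector multiplication, Eq. (4.22)."""
    row = [0.0, 1.0]
    for _ in range(n):
        # row vector times W = [[kappa*tau, 1], [kappa, 1]]
        row = [row[0] * kappa * tau + row[1] * kappa, row[0] + row[1]]
    return row[0] + row[1]


if __name__ == "__main__":
    kappa, tau = 0.05, 500.0  # hypothetical statistical weights
    for n in (1, 3, 10):
        print(n, q_closed_form(n, kappa, tau), q_matrix_product(n, kappa, tau))
```

For fitting, the closed form has the advantage that its cost does not grow with n, although the difference of large powers can lose a few digits of precision for long arrays.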

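By differentiating q(n) with respect to κ as in Eq. (4.7), the fraction of folded repeats (θ) can be calculated as:

$$
\theta = \frac{\kappa}{n}\left[
\frac{(1 - \tau)\left(\lambda_1^{n} - \lambda_2^{n}\right)
      + \kappa n (1 - \tau)\left(\lambda_1^{n-1}\dfrac{\partial\lambda_1}{\partial\kappa} - \lambda_2^{n-1}\dfrac{\partial\lambda_2}{\partial\kappa}\right)
      + (n + 1)\left(\lambda_1^{n}\dfrac{\partial\lambda_1}{\partial\kappa} - \lambda_2^{n}\dfrac{\partial\lambda_2}{\partial\kappa}\right)}
     {\kappa(1 - \tau)\left(\lambda_1^{n} - \lambda_2^{n}\right) + \lambda_1^{n+1} - \lambda_2^{n+1}}
+ \frac{\dfrac{\partial\lambda_2}{\partial\kappa} - \dfrac{\partial\lambda_1}{\partial\kappa}}{\lambda_1 - \lambda_2}
\right]. \tag{4.29}
$$

Values of λ_1 and λ_2, along with their derivatives with respect to κ, can be inserted into Eq. (4.29) from Eqs. (4.25A) and (4.25B). The denaturant dependence of the fraction of folded repeats can be obtained by combining Eqs. (4.9A) (and if necessary, 4.9B) into Eq. (4.29). Finally, the fraction of folded repeats can be used to analyze experimental equilibrium denaturation curves to determine the underlying thermodynamic parameters through the equation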

$$
Y_{obs}([x], n) = \left(A_f [x] + B_f\right)\theta([x], n) + \left(A_u [x] + B_u\right)\left(1 - \theta([x], n)\right), \tag{4.30}
$$

where Yobs represents an observed signal (often far-UV circular dichroism or tryptophan fluorescence). The As and Bs allow for a linear denaturant dependence of the signals from folded and unfolded repeats, and combine to give native and denatured baselines. In principle, when analyzing multiple repeat proteins of different length (n), all the baseline parameters should be describable using a single pair of values for each baseline. However, owing to modest uncertainties in concentration, fitting separate baseline parameters may be preferable to introducing such a constraint, which may degrade the quality of the fit in the equilibrium transition region and compromise fitted thermodynamic parameters (Johnson, 2008).
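To make the route from thermodynamic parameters to a simulated titration concrete, the following Python sketch evaluates the fraction folded from Eq. (4.29) and the observed signal from Eq. (4.30). It is an illustration added here under stated assumptions: the function and argument names are hypothetical, and the denaturant dependence is assumed to be linear and to act on the intrinsic energy only, ΔG_i([x]) = ΔG_i + m_i[x], which may differ from the form used elsewhere in this chapter.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def theta_homopolymer(n, kappa, tau):
    """Fraction of folded repeats for a homopolymer, Eq. (4.29)."""
    s = math.sqrt((kappa * tau - 1.0) ** 2 + 4.0 * kappa)
    l1 = (kappa * tau + 1.0 + s) / 2.0
    l2 = (kappa * tau + 1.0 - s) / 2.0
    d = (kappa * tau ** 2 - tau + 2.0) / (2.0 * s)
    dl1, dl2 = tau / 2.0 + d, tau / 2.0 - d  # derivatives in Eqs. (4.25A, B)
    num = ((1.0 - tau) * (l1 ** n - l2 ** n)
           + kappa * n * (1.0 - tau) * (l1 ** (n - 1) * dl1 - l2 ** (n - 1) * dl2)
           + (n + 1) * (l1 ** n * dl1 - l2 ** n * dl2))
    den = kappa * (1.0 - tau) * (l1 ** n - l2 ** n) + l1 ** (n + 1) - l2 ** (n + 1)
    return (kappa / n) * (num / den - (dl1 - dl2) / (l1 - l2))


def y_obs(x, n, dG_i, dG_ii, m_i, Af, Bf, Au, Bu, T=298.15):
    """Observed signal, Eq. (4.30), at denaturant concentration x (molar).

    Assumes dG_i(x) = dG_i + m_i * x (an assumption made for this sketch).
    """
    kappa = math.exp(-(dG_i + m_i * x) / (R * T))
    tau = math.exp(-dG_ii / (R * T))
    th = theta_homopolymer(n, kappa, tau)
    return (Af * x + Bf) * th + (Au * x + Bu) * (1.0 - th)


if __name__ == "__main__":
    # hypothetical parameters, flat baselines (folded signal 1, unfolded signal 0)
    for x in (0.0, 2.0, 4.0, 6.0):
        print(x, y_obs(x, 5, 2.3, -4.5, 0.57, 0.0, 1.0, 0.0, 0.0))
```

With flat baselines as in the usage lines, the returned signal is simply θ, so the same function can be used to simulate fraction-folded curves for constructs of different length.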

5. Matrix Approach: Heteropolymers

A primary motivation for analyzing consensus repeat-protein unfolding is that each repeat can be considered to have the same stability and the same interaction energy with its neighbors, greatly decreasing the number of unknown thermodynamic parameters. However, repeat-protein arrays built of a single consensus sequence seem to have solubility problems, likely owing to large hydrophobic interfaces present at the ends of each array. In crystal structures of a fragment of the Notch ankyrin domain, a head-to-head crystallographic dimer is seen (Lubman et al., 2005), suggesting that the end repeats can indeed mediate association by such an interface. Such associations are also seen crystallographically in superhelical consensus TPR arrays and actually displace the C-terminal capping helix (Kajander et al., 2007). Capping one or both termini with repeats bearing polar or charged substitutions solves this problem but introduces new thermodynamic parameters, and more importantly, requires more complex models for analysis.

In this section, we will describe how the partition function for a heterogeneous repeat protein can be manipulated to simulate populations and folding transitions, and more importantly, fitted to equilibrium-folding transitions. As above, we will use a matrix representation of the partition function, which again can be simplified from an open sum that enumerates each conformation. For generality, our derivation will treat each repeat as different, having different intrinsic folding (ΔG_i) and interaction energies (ΔG_{i,i+1}). For many repeat-protein-folding studies (especially capped consensus arrays), an intermediate level of complexity, in which some terms are identical and some are unique, should be sufficient to model folding and determine underlying energetic parameters, and may provide a more convenient representation. We will start with the same matrix formulation we presented for homopolymers, and define a unique weight matrix for each repeat:

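$$
\begin{aligned}
q(n) &= \begin{bmatrix} 0 & 1 \end{bmatrix} W_1 W_2 \cdots W_n \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \\
W_i &= \begin{bmatrix} \kappa_i \tau_{i-1,i} & 1 \\ \kappa_i & 1 \end{bmatrix}, \\
\kappa_i &= e^{-\Delta G_i/RT}, \qquad \tau_{i-1,i} = e^{-\Delta G_{i-1,i}/RT}.
\end{aligned} \tag{4.31}
$$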

As demonstrated above for a homopolymer, the rules of matrix multiplication combine statistical weights in such a way as to produce the appropriate Boltzmann factor for each conformation. That derivation, which considered q(n) in terms of q(n−1), q(n−2), and so on, can easily accommodate unique, position-specific coefficients, rather than a single value for κ and for τ, to generate q(n) as in Eq. (4.31). The index on the interaction parameter in Eq. (4.31) represents the interaction between repeat i and the previous repeat (i−1) because the rows of the statistical weight matrix represent the folding status of the previous repeat. In the partition function for the homopolymer, diagonalization provides a huge simplification, converting a product of n identical matrices to a product of only three (T D^n T^{−1}). This is not possible for the heteropolymer partition function, because the n weight matrices are different (as are their eigenvalues and eigenvectors). Thus, we are stuck with a product of n matrices as the partition function for a heteropolymeric repeat protein. Although when multiplied out this product has no fewer terms than a general summation such as Eq. (4.11), owing to its compactness it is considerably easier to generate and manipulate using matrix manipulation programs such as Matlab (http://www.mathworks.com/) and Scilab (http://www.scilab.org/).

As described previously, the quantity of greatest interest in terms of connecting with experiments is the fraction of the repeats folded, θ. For homopolymeric systems, an expression for θ could be generated by differentiating the partition function with respect to κ, and dividing by q (see Eq. (4.7)). With the closed-form homopolymer partition function, this operation is mathematically quite simple. Here, not only is the partition function more complex, but there is also no single value of κ that can be used as a counter of folded repeats. Moreover, the option of calculating an open sum of populations for all possible conformations and multiplying by the number of folded repeats is cumbersome (2^n terms), and for large arrays of repeats, fitting requires significant computer memory.


Instead, we favor a summation over the n positions of the folded repeat, calculating the probability that each of the n repeats is folded, instead of the probability of each of the 2^n conformations. Clearly, the fraction of repeats that are folded is simply the average probability that each of the repeats is folded:

$$
\theta = \frac{1}{n}\sum_{i=1}^{n} \theta_i, \tag{4.32}
$$

where θ_i is the probability of finding the ith repeat in the folded state. θ_i can be connected to q(n) through a subpartition function q_i, which sums over all the conformational states in which the ith repeat is folded. These quantities can be related by recognizing that the probability of finding the ith repeat folded is simply the sum over conformations in which it is folded divided by the sum over all conformations, or:

$$
\theta_i = \frac{q_i}{q(n)}, \tag{4.33}
$$

giving

$$
\theta = \frac{1}{n\,q(n)}\sum_{i=1}^{n} q_i. \tag{4.34}
$$

This summation emphasizes the fact that q(n) only needs to be calculated once. In contrast, qi needs to be calculated n times (once at each position), but it can also be calculated in matrix form:

$$
q_i = \begin{bmatrix} 0 & 1 \end{bmatrix} W_1 W_2 \cdots W_{i-1}
      \begin{bmatrix} \kappa_i \tau_{i-1,i} & 0 \\ \kappa_i & 0 \end{bmatrix}
      W_{i+1} W_{i+2} \cdots W_n \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{4.35}
$$

In the statistical weight matrix, the second column corresponds to all of the conformations where the ith repeat is unfolded. Setting this column to zero in the W_i matrix of q_i eliminates all of these conformations without affecting the terms for conformations where the ith repeat is folded.
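The heteropolymer expressions above translate directly into a short routine. The Python sketch below is an illustration added here, not the authors' implementation: it evaluates q(n) as the product in Eq. (4.31), obtains each q_i by zeroing the second column of W_i as in Eq. (4.35), and averages according to Eq. (4.34). The function names, the zero-based indexing, and the convention that `taus[i]` couples repeat i to repeat i−1 are assumptions of the sketch.

```python
def q_hetero(kappas, taus, zero_unfolded_at=None):
    """[0 1] W_1 W_2 ... W_n [1 1]^T with W_i = [[k_i*t_i, 1], [k_i, 1]].

    kappas[i] is the intrinsic weight of repeat i (zero-based) and taus[i]
    is the interfacial weight between repeats i-1 and i. taus[0] has no
    effect, because the starting row vector [0 1] selects the second row
    of W_1. If zero_unfolded_at is an index, the second column of that
    weight matrix is set to zero, which gives the subpartition function q_i.
    """
    row = [0.0, 1.0]
    for i, (k, t) in enumerate(zip(kappas, taus)):
        unfolded = 0.0 if i == zero_unfolded_at else 1.0
        # row vector times [[k*t, unfolded], [k, unfolded]]
        row = [row[0] * k * t + row[1] * k, (row[0] + row[1]) * unfolded]
    return row[0] + row[1]


def theta_hetero(kappas, taus):
    """Fraction of folded repeats, Eq. (4.34)."""
    n = len(kappas)
    q = q_hetero(kappas, taus)
    q_sum = sum(q_hetero(kappas, taus, zero_unfolded_at=i) for i in range(n))
    return q_sum / (n * q)


if __name__ == "__main__":
    # hypothetical capped array: weaker terminal repeats, uniform interfaces
    kappas = [0.01, 0.05, 0.05, 0.05, 0.02]
    taus = [1.0, 400.0, 400.0, 400.0, 400.0]
    print(q_hetero(kappas, taus), theta_hetero(kappas, taus))
```

Each call costs on the order of n operations, so θ costs on the order of n² per denaturant concentration, compared with 2^n for explicit enumeration.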

6. Solvability Criteria for Ising Models Applied to Repeat-Protein Folding

The preceding sections derive equations for nearest-neighbor partition functions for repeat-protein folding. These partition functions can be used to evaluate populations of partly folded states and generate folding curves, given a set of thermodynamic parameters (ΔG_i, ΔG_{i,i+1}, and denaturant dependences). Subsequent sections will show how these models can be applied to analyze experimental folding curves, and will analyze fitted thermodynamic parameters for different repeat types and sequences. However, in this section, we will describe a way to evaluate whether a set of thermodynamic parameters is likely to be determined with any meaningful accuracy, given a set of data (folding transitions for constructs of different length, and potentially different sequence). This analysis will also connect to a closely related issue of determining whether a chosen model is mechanistically correct.

Much has been written regarding criteria for testing different models and estimating uncertainties of parameter values, given a set of experimental data (see Johnson, 2008, for a recent review). Models are typically rejected on the basis of nonrandom residuals and/or physically unreasonable fitted parameter values. Confidence intervals on parameter values can be estimated by statistical methods such as bootstrap analysis, jack-knife analysis, or simple repetition of the experiment (all resampling methods that differ in their severity), analysis of the parameter covariance matrix, systematic exploration of how the variance of the fit increases as parameters are varied, and Monte Carlo simulation (Johnson, 2008). It is an unfortunate fact that these critical tests usually come after data have been collected. Experimental analysis of repeat-protein folding is a laborious undertaking (involving cloning of multiple genes; expression and purification of multiple proteins of different length; and quantitative analysis of each protein, preferably multiple times, by denaturant titrations), and it would be good to know in advance whether such efforts are likely to yield significant thermodynamic insight.
Although many aspects of the sequence in which data acquisition precedes parameter and model testing are largely unavoidable, it is often the case that experiments can be designed a priori so that parameters of interest can be determined with confidence, and alternative models can be compared and discriminated. This is particularly true for repeat proteins, given their simple linear architecture and the simple form of the linear free energy relationships implicit in the linear Ising model. Here we will describe how equilibrium-folding studies on repeat proteins can be designed to maximize the information content of the results, given the framework of a particular thermodynamic model. In addition to helping to design future experiments, these ideas help to interpret published studies on repeat-protein folding. By considering the free energies of folding of a collection of repeat proteins of different length as a system of linear equations, simple ideas from linear algebra relating to solvability can be used to determine whether parameters are likely to be well determined, and if not, what additional constructs would be required to improve the situation. For a set of repeat proteins of different length and composition, the free energy difference between the fully folded and fully unfolded states can be written as:


Table 4.1 Free energies of folding of capped consensus repeat protein constructs

Line   n_rep   Construct   Folding free energy (D → N)
A      3       R3          ΔG = 3ΔG_R + 2ΔG_{i,i+1}
B      4       R4          ΔG = 4ΔG_R + 3ΔG_{i,i+1}
C      5       R5          ΔG = 5ΔG_R + 4ΔG_{i,i+1}
D      3       NRC         ΔG = 1ΔG_N + 1ΔG_R + 1ΔG_C + 2ΔG_{i,i+1}
E      4       NR2C        ΔG = 1ΔG_N + 2ΔG_R + 1ΔG_C + 3ΔG_{i,i+1}
F      5       NR3C        ΔG = 1ΔG_N + 3ΔG_R + 1ΔG_C + 4ΔG_{i,i+1}
G      6       NR4C        ΔG = 1ΔG_N + 4ΔG_R + 1ΔG_C + 5ΔG_{i,i+1}
H      4       NR3         ΔG = 1ΔG_N + 3ΔG_R + 3ΔG_{i,i+1}
I      5       NR4         ΔG = 1ΔG_N + 4ΔG_R + 4ΔG_{i,i+1}
J      4       R3C         ΔG = 3ΔG_R + 1ΔG_C + 3ΔG_{i,i+1}
K      5       R4C         ΔG = 4ΔG_R + 1ΔG_C + 4ΔG_{i,i+1}

Notes: N- and C-terminal caps are assumed to differ in intrinsic folding energy from consensus repeats (R), but have the same interfacial energy (ΔG_{i,i+1}). Relaxing this restriction would introduce two additional interfacial energy terms (N:R and R:C as well as R:R).

$$
\Delta G = \sum_{k\ \text{repeat types}} n_k\,\Delta G_{i,k} + \sum_{j\ \text{interface types}} n_j\,\Delta G_{i,i+1,j}. \tag{4.36}
$$

The first sum takes into account the different intrinsic energy terms, and the second sum takes into account the different interaction terms. Table 4.1 provides some examples, both for a homopolymeric repeat protein and for a heteropolymeric repeat protein with unique N- and C-terminal caps. For a set of consensus repeats without caps (lines A–C, Table 4.1), the three free energy equations can be written as:

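$$
\begin{bmatrix} 3 & 2 \\ 4 & 3 \\ 5 & 4 \end{bmatrix}
\begin{bmatrix} \Delta G_R \\ \Delta G_{i,i+1} \end{bmatrix}
= \begin{bmatrix} \Delta G^{\circ}_{A} \\ \Delta G^{\circ}_{B} \\ \Delta G^{\circ}_{C} \end{bmatrix}, \tag{4.37}
$$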

where ΔG°_A is the free energy difference between the native and denatured states for the reaction defined on line A, and the other ΔG° values are analogously defined. Based on simple linear equation theory, this set of linear equations has a unique solution, because the columns of the matrix on the left-hand side are independent. (If there are experimental errors associated with the column on the right-hand side, the solution will be inexact, but can be found using least squares.) As a result, the matrix has full column rank (r = 2); that is, elimination produces a pivot in every column (Strang, 2005). This is fundamentally a result of the fact that linear repeat proteins have one more repeat than interface, and thus a length dependence can resolve these two parameters. For a set of consensus repeats with caps (lines D–G, Table 4.1), the free energy equations can be written as:

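$$
\begin{bmatrix} 1 & 1 & 1 & 2 \\ 1 & 2 & 1 & 3 \\ 1 & 3 & 1 & 4 \\ 1 & 4 & 1 & 5 \end{bmatrix}
\begin{bmatrix} \Delta G_N \\ \Delta G_R \\ \Delta G_C \\ \Delta G_{i,i+1} \end{bmatrix}
= \begin{bmatrix} \Delta G^{\circ}_{D} \\ \Delta G^{\circ}_{E} \\ \Delta G^{\circ}_{F} \\ \Delta G^{\circ}_{G} \end{bmatrix}. \tag{4.38}
$$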

Although there are enough equations to solve for four unknowns (the column vector on the left-hand side), the columns of the coefficient matrix are not independent. The first and third columns are equal; moreover, the sum of the first and second columns is equal to the fourth. Thus, the matrix lacks full column rank (r = 2, although there are four unknowns). As a result, this set of linear equations has an infinite number of solutions, and the parameters cannot be uniquely determined by elimination. This problem will not be rectified by including additional equations (constructs) that retain both N- and C-terminal capping repeats. Instead, if a set of four (or more) constructs is considered in which the caps vary along with the length, unique intrinsic folding energies can be determined for both the N- and C-terminal caps. For example, lines B, F, H, and J of Table 4.1 define the system of equations

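$$
\begin{bmatrix} 0 & 4 & 0 & 3 \\ 1 & 3 & 1 & 4 \\ 1 & 3 & 0 & 3 \\ 0 & 3 & 1 & 3 \end{bmatrix}
\begin{bmatrix} \Delta G_N \\ \Delta G_R \\ \Delta G_C \\ \Delta G_{i,i+1} \end{bmatrix}
= \begin{bmatrix} \Delta G^{\circ}_{B} \\ \Delta G^{\circ}_{F} \\ \Delta G^{\circ}_{H} \\ \Delta G^{\circ}_{J} \end{bmatrix}. \tag{4.39}
$$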
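This kind of rank check can be automated before any constructs are made. The Python sketch below is an illustration added here: it builds coefficient rows in the column order (ΔG_N, ΔG_R, ΔG_C, ΔG_{i,i+1}) from construct names written as in Table 4.1, and computes the rank by exact Gaussian elimination. The helper names `design_row` and `rank`, and the name-parsing convention, are assumptions of the sketch, and a single interface type is assumed as in the table.

```python
import re
from fractions import Fraction


def design_row(construct):
    """Coefficients [n_N, n_R, n_C, n_interfaces] for names like 'R4', 'NRC', 'NR3C'."""
    m = re.fullmatch(r"(N?)R(\d*)(C?)", construct)
    if m is None:
        raise ValueError(f"unrecognized construct name: {construct}")
    n_cap = 1 if m.group(1) else 0
    n_rep = int(m.group(2)) if m.group(2) else 1
    c_cap = 1 if m.group(3) else 0
    # a linear array has one fewer interface than it has repeats
    return [n_cap, n_rep, c_cap, n_cap + n_rep + c_cap - 1]


def rank(rows):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    mat = [[Fraction(v) for v in r] for r in rows]
    r = 0
    for col in range(len(mat[0])):
        pivot = next((i for i in range(r, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(r + 1, len(mat)):
            f = mat[i][col] / mat[r][col]
            mat[i] = [a - f * b for a, b in zip(mat[i], mat[r])]
        r += 1
    return r


if __name__ == "__main__":
    for names in (("NRC", "NR2C", "NR3C", "NR4C"), ("R4", "NR3C", "NR3", "R3C")):
        print(names, rank([design_row(c) for c in names]))
```

Run on lines D–G the rank is 2, and on lines B, F, H, and J it is 4, in agreement with the discussion of Eqs. (4.38) and (4.39).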

The columns of this matrix are now independent, showing full column (and row) rank (r = 4). Thus, the four thermodynamic parameters can be uniquely determined (although adding equations by including additional constructs will likely improve the robustness of the solution, given uncertainties in free energy measurements). In principle, this type of analysis could be applied directly to experimental unfolding free energies determined by linear extrapolation from denaturant-induced unfolding transitions (Pace, 1986; Street et al., 2008) assuming a two-state (high cooperativity) model. However, if partly folded states are populated in the transition, either because of moderate values of ΔG_{i,i+1} or because stability is unevenly distributed along the repeat array, such free energy estimates will be incorrect. In such cases, globally fitting the denaturation transitions directly using an Ising model, which takes partly folded states into account, can improve estimates of free energy terms, in favorable cases providing access to parameters that could not be determined on the basis of considerations of matrix rank above (see the following discussion of consensus ankyrin arrays). Nonetheless, this simple analysis is extremely useful both for thinking about what constructs need to be studied to analyze a particular model, and for thinking about why certain parameters do not appear to be well determined, given a set of data. This type of rank analysis can also be applied to models that can accommodate differences between interfaces, models that include non-nearest-neighbor interactions, and by differentiation with respect to denaturant concentration, partitioning of m-values into intrinsic and interfacial components.

7. Matrix Homopolymer Analysis of Consensus TPR Folding

The first study in which a homopolymeric Ising model was used to analyze repeat-protein folding involved a collection of consensus TPR arrays of different lengths (Kajander et al., 2005). As described earlier, TPR units are composed of two antiparallel α-helices (termed A and B) and are arranged in a linear array in which adjacent repeats twist along the long axis of the domain, like the steps in a spiral staircase (Fig. 4.1C). Using TPR units of identical consensus sequence (termed CTPRan by the authors, where n represents the number of full 34-residue TPR units in a given construct), Regan and coworkers created a series of constructs of different lengths that were amenable to analysis using a homopolymeric Ising model (see Section 3). However, as with other consensus repeat arrays, to make their CTPR proteins soluble, the authors added an additional polar C-terminal capping helix (a variant of helix A with four polar substitutions). By monitoring helical structure using CD spectroscopy as a function of guanidine hydrochloride concentration, Kajander et al. were able to generate and analyze unfolding transitions for constructs containing from two to ten full TPR repeats, as well as the C-terminal cap (CTPRa2 to CTPRa10; data reproduced from Fig. 4.2 of ref (Kajander et al., 2005)). The authors developed a homopolymer partition function in which each helix, rather than each repeat, is treated as the single repeating unit. Applying the homopolymer approximation at the single-helix level treats the A and B helices (and the C-terminal capping helix) as energetically equivalent, both in terms of intrinsic stability and in terms of nearest-neighbor interaction. Using this model, Kajander et al. were able to globally fit all of these transitions (and in a subsequent paper included even longer constructs (Kajander et al., 2007)) to a single intrinsic folding and interfacial interaction term (Kajander et al., 2005), clearly demonstrating the applicability of the linear Ising model to repeat-protein folding.

Several aspects of this seminal study warrant further discussion. First, Kajander et al. phrased the interaction energies in a way that is closer to the original magnetic spin-spin interactions (Kajander et al., 2005). Although at first glance the two representations look different, they can be shown to be identical, and the CTPR-unfolding data can be fitted equally well with the two formulations of the homopolymer Ising model. The curves in Fig. 4.3 were generated by fitting the model derived above to data from Kajander et al. (2005); nearly identical fits and χ² values are obtained with their representation of the model. Moreover, parameters from the two different formulations are nearly identical, when converted using relationships given previously (Kloss et al., 2008). Second, fitted parameter values (ΔG_i, ΔG_{i,i+1}, and the denaturant dependence, which the authors assigned entirely to intrinsic folding) appear to be very well determined. Kajander et al. reported errors of 1% (Kajander et al., 2005), although no description was given for how these error margins were determined. To help compare the confidence levels of these parameters with

[Figure 4.3 plot. Vertical axis: normalized CD signal at 222 nm; horizontal axis: [gdnHCl]; curves: CTPR2, CTPR3, CTPR4, CTPR6, CTPR8, CTPR10.]

Figure 4.3 Unfolding and 1D-Ising analysis of consensus TPR proteins. Data are from Kajander et al. (2005), and were obtained using the program DigitizeIt 1.5 for Mac OSX (http://www.digitizeit.de). Solid lines result from fitting a homopolymer partition function, with single helices as individual lattice sites, to the guanidine unfolding transitions. Fitted parameters are very close to those determined by Kajander et al. (Table 4.2; fitted parameters are recast to ΔG_i and ΔG_{i,i+1}), and are well determined by the data.


those from other studies and from other models, we have evaluated parameter confidence intervals by bootstrap analysis (Johnson, 2008). Briefly, the best-fitted parameters for the model used by Kajander et al. were used to generate "error-free" data at each experimental denaturant concentration for each construct. Residuals (observed minus fitted) were then used as a source of random error, by randomly sampling (with replacement) from the experimental residual set. The new data set (error-free plus randomized residuals) was then re-fitted using the same model to generate a new set of fitted parameters. By repeating this procedure many times (1000–10,000, depending on the distribution in parameter space) for different random data sets, a distribution of fitted values was generated, from which confidence intervals were approximated at the 95% level. Using the bootstrap method, we find fitted values of ΔG_i, ΔG_{i,i+1}, and m_i to be determined to within 2–3% at the 95% confidence level (Table 4.2), quite similar to the bounds provided by Kajander et al. (2005). These narrowly bounded parameters provide significant insight into the origins of TPR folding and cooperativity. The parameters indicate that each helix has an unfavorable free energy of folding (+2.2 kcal mol⁻¹; Table 4.2),

Table 4.2 Parameters from 1D-Ising analysis of consensus repeat proteins

Parameter      Consensus TPR                Consensus ankyrin            Consensus ankyrin^d
               (Kajander et al., 2005)^a    (Wetzel et al., 2008)^b
ΔG_N           n.d.                         10.6 ± 0.6^c                 5.2 ± 0.1
                                            9.2 ± 1.1                    5.4 ± 0.1
ΔG_R           2.30 ± 0.04                  3.3 ± 0.2                    4.4 ± 0.1
               2.26 ± 0.07                  1.9 ± 1.2                    4.4 ± 0.1
ΔG_C           n.d.                         10.6 ± 0.6^c                 6.8 ± 0.1
                                            9.3 ± 1.9                    6.8 ± 0.1
ΔG_{i,i+1}     −4.52 ± 0.04                 −14.2 ± 0.7                  −11.2 ± 0.2
               −4.13 ± 0.10                 −11.8 ± 1.5                  −11.2 ± 0.2
m_i            0.57 ± 0.01                  1.1 ± 0.1                    0.75 ± 0.01
               0.57 ± 0.01                  1.1 ± 0.2                    0.75 ± 0.02
m_cap          n.d.                         0.83 ± 0.04                  n.d.
                                            0.65 ± 0.09

Notes: Energies are in kcal mol⁻¹; m-values are in kcal mol⁻¹ M⁻¹. n.d., not determined in the model used.
a Parameters for CTPR folding are for single helices. Parameters have been converted from the original formulation to ΔG_i and ΔG_{i,i+1}, to allow comparison with other data. Errors were propagated assuming the published H and J values to be uncorrelated.
a,b The top line for each parameter gives estimated parameter values and uncertainties given by the authors. To facilitate comparison, the bottom line gives parameter values based on our fits, with uncertainties determined by bootstrap analysis as described above.
c For the consensus ankyrin repeats of Wetzel et al., ΔG_N and ΔG_C are assumed identical, and are fitted as a single parameter.
d For the consensus ankyrin repeats from our laboratory, parameters and errors in the top line come from resampling guanidine titrations as described; errors in the bottom line come from bootstrap analysis as described.


which is more than offset by a favorable helix-helix pairing energy (−4.5 kcal mol⁻¹). As was found for the Notch ankyrin domain, and for consensus ankyrin constructs (see the following section), this leads to cooperative folding. Third, although the treatment of the A and B helices as identical is clearly consistent with the published data, it would be surprising if the two helices were thermodynamically identical. The A and B helices have virtually no sequence similarity (Main et al., 2003). Moreover, structural analysis shows that the packing interactions of helices A and B differ substantially. Equally important, whereas the B-helices interact mostly with A-helices, lacking contacts with one another, the A-helices contact neighboring A-helices from adjacent TPRs, as well as their adjacent B-helices, as can be seen from the zigzag patterns in CTPR contact maps (Kajander et al., 2007). Adjacent A-helices have a two-unit separation in a single-helix Ising model; thus, close contacts between adjacent A-helices would suggest a more complex model that has non-nearest-neighbor terms (ΔG_{i,i+2}). Moreover, the C-terminal polar cap may introduce further complexity, as its folding energy may differ significantly even from the A-helix from which it is derived.

Given all of these sequence complexities, why not use a more complicated model to describe CTPR folding? One answer to this question is that a simple model works just fine. But does that mean the simple model is right? Given the differences between the two types of helices, a more complex model in which the A and B helices are treated differently makes more physical sense. Unfortunately, in all of the CTPR constructs in Kajander et al., A and B helices are added only in pairs (each additional repeat contributes one of each), and thus it is not possible to separate the relative contributions of the two. Consideration of the free energy equations describing these constructs in terms of separate A and B helices makes this clear:

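$$
\begin{bmatrix} 3 & 2 & 4 \\ 4 & 3 & 6 \\ 5 & 4 & 8 \\ 7 & 6 & 12 \\ 9 & 8 & 16 \\ 11 & 10 & 20 \end{bmatrix}
\begin{bmatrix} \Delta G_A \\ \Delta G_B \\ \Delta G_{i,i+1} \end{bmatrix}
= \begin{bmatrix} \Delta G_{CTPRa2} \\ \Delta G_{CTPRa3} \\ \Delta G_{CTPRa4} \\ \Delta G_{CTPRa6} \\ \Delta G_{CTPRa8} \\ \Delta G_{CTPRa10} \end{bmatrix}. \tag{4.40}
$$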

The matrix on the left-hand side has a rank of only 2, and thus there are an infinite number of solutions to the set of equations. Treating each helix as identical simply adds columns 1 and 2, making the unknown corresponding to this column the sum of ΔG_A and ΔG_B. If instead a single A helix were deleted from one of the constructs (e.g., from the longest construct, making the last row [10 10 19]), the matrix would gain full column rank (r = 3), and ΔG_A and ΔG_B would be resolved. This illustrates that in order to determine a particular parameter, the structural element corresponding to that parameter must be varied relative to those corresponding to the other fitted parameters.
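The residual-resampling bootstrap used in this section to estimate parameter confidence intervals can be sketched in a few lines of Python. The sketch below is an illustration added here, not the code used for Table 4.2: `fit` and `model` are hypothetical callables standing in for a global Ising fit and its predicted signal, and a straight-line fit is used only so that the example runs on its own.

```python
import random


def residual_bootstrap(x, y, fit, model, n_boot=1000, seed=0):
    """Resample residuals with replacement, refit, and return 95% intervals.

    fit(x, y) returns a tuple of parameters; model(x_i, params) returns the
    predicted value at x_i. Both are placeholders for a real fitting routine.
    """
    rng = random.Random(seed)
    best = fit(x, y)
    y_fit = [model(xi, best) for xi in x]           # "error-free" data
    resid = [yi - fi for yi, fi in zip(y, y_fit)]   # observed minus fitted
    samples = []
    for _ in range(n_boot):
        y_new = [fi + rng.choice(resid) for fi in y_fit]
        samples.append(fit(x, y_new))
    intervals = []
    for j in range(len(best)):
        vals = sorted(s[j] for s in samples)
        intervals.append((vals[int(0.025 * (n_boot - 1))],
                          vals[int(0.975 * (n_boot - 1))]))
    return best, intervals


def fit_line(x, y):
    """Closed-form least-squares line; a stand-in for a global Ising fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return (slope, my - slope * mx)


def line(xi, params):
    """Model prediction for the stand-in straight-line fit."""
    return params[0] * xi + params[1]


if __name__ == "__main__":
    noise = random.Random(1)
    xs = [0.5 * i for i in range(20)]
    ys = [2.0 * xi + 1.0 + noise.gauss(0.0, 0.1) for xi in xs]  # synthetic data
    print(residual_bootstrap(xs, ys, fit_line, line, n_boot=500, seed=2))
```

For a global fit of several constructs, residuals would be pooled or resampled per curve according to the noise model assumed; that choice is left open here.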

8. Matrix Heteropolymer Analysis of Consensus Ankyrin Repeat Folding

Consensus ankyrin repeats have been available for some time (Mosavi et al., 2002; Binz et al., 2003), and have been used successfully as a platform for protein design (Steiner et al., 2008). However, the application of Ising analysis to the folding of consensus ankyrin repeats has been relatively recent (Wetzel et al., 2008). To maintain solubility, Plückthun and coworkers added capping repeats on both termini (called N and C, respectively). This modification is similar to the C-terminal TPR-capping helix of Regan and coworkers, although the capping N and C ankyrin repeats designed by Plückthun and coworkers are significantly different from their consensus sequences, with only 15/33 and 8/24 identities, respectively. Using guanidine hydrochloride-induced unfolding, Plückthun and coworkers obtained complete reversible unfolding transitions that could be used for Ising analysis for three constructs, NI1C, NI2C, and NI3C (where I denotes internal consensus ankyrin repeats; Wetzel et al., 2008; Fig. 4.4). These three transitions were analyzed using a linear Ising model in which the N- and C-terminal capping repeats have intrinsic free energies (ΔG_cap) that differ from the internal consensus repeats but are identical to one another. In contrast, a single interfacial interaction energy was used (given the large number of sequence changes in the capping repeats, this may or may not be a valid assumption). As with the CTPR analysis, the denaturant dependence was attributed entirely to intrinsic parameters, although different denaturant sensitivities were assumed for the cap and internal repeats (m_cap and m_i, respectively). As can be seen from the solid lines in Fig. 4.4, this model describes the three fitted unfolding transitions reasonably well. Fitted parameters from Wetzel et al. (2008) are listed in Table 4.2, along with confidence intervals provided by the authors.

Again, there is no description of how these confidence intervals were determined. Using the heteropolymer partition function described above, and the same bootstrap method for error analysis used for the CTPR arrays, we obtain intrinsic and interfacial energies that agree within 1–2.5 kcal mol⁻¹, although we find significantly greater margins of uncertainty on the fitted parameters than the authors; these uncertainties are also larger than those we obtained by the same error analysis of the CTPR data. One reason for the high level of parameter uncertainty may be that none of the three analyzed constructs have their


[Figure 4.4 plot. Vertical axis: normalized CD signal at 222 nm; horizontal axis: [gdnHCl]; curves: NI1C, NI2C, NI3C.]

Figure 4.4 Unfolding and 1D-Ising analysis of capped consensus ankyrin repeat proteins. Data are from Plückthun and coworkers (Wetzel et al., 2008), and were obtained using the program DigitizeIt 1.5 for Mac OSX (http://www.digitizeit.de). Solid lines result from fitting a heteropolymer partition function, assuming the N- and C-caps have identical intrinsic folding energies that differ from the value for the internal repeats. Likewise, the effect of guanidine is partitioned into intrinsic folding energies and is allowed to vary between the capping and internal repeats. The pretransition for NI3C appears to partly resolve the parameters from the capping and internal repeats.

caps removed, making it difficult to separate their contribution to free energy from the other parameters. Representing the constructs as a system of linear equations with a single cap free energy gives:

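$$
\begin{pmatrix} 2 & 1 & 2 \\ 2 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix}
\begin{pmatrix} \Delta G_{cap} \\ \Delta G_i \\ \Delta G_{i,i+1} \end{pmatrix}
=
\begin{pmatrix} \Delta G^{\circ}_{NI_1C} \\ \Delta G^{\circ}_{NI_2C} \\ \Delta G^{\circ}_{NI_3C} \end{pmatrix}.
\tag{4.41}
$$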
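As a quick check on the linear system in Eq. (4.41), the following minimal Python sketch computes the rank of its coefficient matrix with exact rational arithmetic. The helper names `matrix_rank` and `construct_row` are illustrative and not part of the original analysis. The extra row for an uncapped I3 construct is a hypothetical addition, included only to show how a cap-free construct would change the rank.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a nonzero pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def construct_row(n_caps, n_internal):
    """Coefficients of (dG_cap, dG_i, dG_i,i+1) for one construct."""
    n_interfaces = n_caps + n_internal - 1
    return [n_caps, n_internal, n_interfaces]

# NI1C, NI2C, NI3C: the three rows of Eq. (4.41).
capped_only = [construct_row(2, k) for k in (1, 2, 3)]
# Hypothetical cap-free construct I3 (assumption, for illustration only).
with_uncapped = capped_only + [construct_row(0, 3)]

print(matrix_rank(capped_only))    # rank with capped constructs only
print(matrix_rank(with_uncapped))  # rank after adding the I3 row
```

With three unknowns, a rank below 3 means the individual energies cannot be separated from the total free energies of these constructs alone.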

In the coefficient matrix, half the first column plus the second column is equal to the third column, giving a rank of only 2, and again, an infinite number of solutions. Although at face value, this would severely compromise the accuracy of the fitted parameters, one feature of the unfolding transitions of Wetzel et al. may significantly narrow parameter confidence intervals: the appearance of a partial unfolding transition in the long native baseline of NI3C. Interpreted as a separate unfolding event involving one or both caps, this pretransition provides additional information about the stability of the caps relative to the internal repeats. It is as if, from this region of the unfolding transition, the authors had prepared the construct I3 for analysis, which would give the coefficient matrix above full column rank. As described in section 6, this study illustrates the value of analyzing complete denaturant unfolding transitions using the full partition function. Nonetheless, it appears that even with this information, the fitted parameter values are not as well determined as for the analysis of CTPR folding.

A more direct way to obtain information on the contribution of the caps would be to prepare constructs that lack the caps. Although ankyrin consensus arrays lacking both caps show poor solubility, we have been able to prepare arrays that lack either one cap or the other (we will refer to these as NRn and RnC, because in this partly exposed context these proteins lose their internal nature; TA & DB, manuscript in preparation). Unlike the cap sequences of Wetzel et al., these caps differ by only four nonpolar → polar/charged substitutions on the "outside" face of the array, which should result in NR and RC interfaces that are much closer to full consensus (RR) interfaces. By combining these constructs with NRnC constructs, we have been able to obtain guanidine-induced folding transitions for ten constructs that are fully resolved and fully reversible (Fig. 4.5).

[Plot: normalized CD signal at 222 nm versus [gdnHCl] (M) for constructs NR1, NR2, NR3, NR4, R2C, R3C, R4C, NR1C, NR2C, and NR3C.]

Figure 4.5 Unfolding and 1D-Ising analysis of consensus ankyrin repeat proteins with and without terminal caps. Constructs are described in the text (TA and DB, in preparation). Solid lines result from fitting a heteropolymer partition function, assuming N- and C-caps and the internal consensus repeats (R) all have different intrinsic folding energies. The effect of guanidine is partitioned into intrinsic folding energies and is assumed to be the same for all types of repeats. Different contributions of the N-, C-, and R repeats can be seen by noting the shifts between constructs containing the same number of repeats, but different identities.

The difference in the contributions of the three types of repeats (N, R, C) to stability is clearly illustrated in the unfolding transitions. For constructs that have the same number of repeats (same symbols, Fig. 4.5), the least stable construct has both caps, indicating that the caps are less stable than the consensus repeats. Between any pair of single-cap constructs, the one with the N-terminal cap is more stable than the one with the C-terminal cap. By independently removing the capping repeats, we have been able to test a number of different parameterizations of the Ising model to determine the relative intrinsic stabilities and contributions of the caps to denaturant-induced unfolding. The fits shown in Fig. 4.5 are from an Ising model with separate intrinsic free energies for each cap and consensus sequence ($\Delta G_N$, $\Delta G_R$, $\Delta G_C$), a single interfacial energy ($\Delta G_{i,i+1}$), and a single m-value that affects only intrinsic folding energies. Fitted parameters are included in Table 4.2. In matrix form, the linear free energy equations for this data set show full column rank, allowing each parameter to be determined with minimal parameter correlation. To permit comparison to the other analyses described above, we have calculated errors using the bootstrap method (see footnote 2). Using this method, uncertainties (at the 95% confidence level) on fitted energy values are approximately 2% of the fitted parameters, about the same as for the CTPR studies, but significantly better than for analysis of NI1C–NI3C. Overall, the two consensus ankyrin repeat studies show a similar view of cooperativity in which the individual repeats are unstable, and the interfacial interaction is highly stabilizing (Table 4.2). Again, this is consistent with the high degree of cooperativity seen in solution, because single folded repeats should be rare, and conformations with a large number of interfaces (blocks of consecutive folded repeats) should be maximized.
Although this is qualitatively similar to what was seen in the CTPR study, cooperativity is much higher for the consensus ankyrin arrays. This is especially clear when the fitted Ising parameters from the CTPR studies are converted to whole-repeat (rather than single-helix) parameters. The intrinsic folding energy of an entire CTPR ($\Delta G_{i,helix} + \Delta G_{i,i+1,helix}$) is 0.1 kcal/mol (nearly half-folded), whereas the interfacial energy is 4.5 kcal/mol. Thus, individual CTPR repeats are moderately less stable than consensus ankyrin repeats (the latter at 3–4 kcal/mol), whereas the interfaces for CTPRs are significantly less stable than those of consensus ankyrin repeats (12 to 14 kcal/mol).

The fitted Ising parameters for the two ankyrin consensus arrays are in reasonable agreement (Table 4.2). Fitted $\Delta G_i$ values for consensus (noncapping) repeats and fitted $\Delta G_{i,i+1}$ values are within 1–2 kcal/mol. Given the differences between consensus sequences from the two studies (67% identity), these modest differences are not surprising. These parameters both favor folding more (by about 3 kcal/mol) than the parameters extracted from the deletion analysis (Mello and Barrick, 2004), which may also be a reflection of the substantial deviation from consensus seen for naturally occurring ankyrin repeat proteins. Fitted $\Delta G_i$ values for the capping repeats for the two ankyrin studies show larger differences: the capping repeats of Wetzel et al. are considerably less stable. This difference may result from the greater number of sequence differences, compared to the consensus, in that study.

Footnote 2. To obtain a more rigorous measure of parameter uncertainties, we have measured each unfolding transition at least three times, allowing us to use resampling methods to fit separate transitions and compare results. This resampling approach, which employs more data, cannot be directly compared with the other studies analyzed here, but it gives confidence intervals similar to those from the bootstrap method.

9. Summary and Future Directions

The studies featured in this article show quite clearly that a simple nearest-neighbor model that has been highly successful in describing a wide variety of cooperative phenomena can be used to study repeat-protein folding and extract quantitative interaction energies from real data. Although Ising-like models have been applied to model globular protein folding (see, e.g., Munoz, 2001), the heterogeneity of globular proteins and their intrachain contacts makes such models overparameterized, requiring assumptions about energy terms that come from informatics or from native state structures, rather than from first principles or measurements. A recent retrospective from Harold Scheraga, one of the major contributors to the application of Ising analysis to biopolymers, states of his epic research trajectory: "It was soon realized that the helix–coil transition is not a good model for conformational changes in globular proteins, because the one-dimensional Ising model does not capture the cooperative features, embodied in the interplay between short- and long-range interactions, of the folding/unfolding transition of globular proteins" (Scheraga, 2008).

Although repeat proteins differ from globular proteins in that they have structural simplicity and are somewhat elongated, they are the same in many other key respects. They have large, continuous hydrophobic cores, they have significant medium and long-range electrostatic interactions (Kloss and Barrick, 2008; Merz et al., 2008), and they are highly cooperative (Kloss et al., 2008). Thus, repeat proteins provide a unique experimental system to dissect protein folding using this elegant model. One of the most exciting aspects of the work featured here is that it provides an opportunity to understand protein-folding cooperativity in quantitative and structural detail. Determination of $\Delta G_{i,i+1}$ provides a direct measure of long-range coupling within a folded protein. Further analysis of repeat proteins using the 1D Ising model should reveal not only the structural origins of this cooperativity, but how such cooperativity influences the kinetics of folding.

ACKNOWLEDGMENTS

This work was supported by NIH grant RO1GM068462 to DB.

REFERENCES Applequist, J., and Damle, V. (1965). Thermodynamics of the Helix-Coil Equilibrium in Ologoadenylic Acid from Hypochromicity Studies. J. Am. Chem. Soc. 87, 1450–1458. Auton, M., Holthauzen, L. M., and Bolen, D. W. (2007). Anatomy of energetic changes accompanying urea-induced protein denaturation. Proc. Natl. Acad. Sci. USA 104, 15317–15322. Binz, H. K., Stumpp, M. T., Forrer, P., Amstutz, P., and Pluckthun, A. (2003). Designing repeat proteins: Well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 332, 489–503. Bolen, D. W., and Rose, G. D. (2008). Structure and energetics of the hydrogen-bonded backbone in protein folding. Annu. Rev. Biochem. 77, 339–362. Brush, S. G. (1967). History of the Lenz-Ising Model. Rev. Mod. Phys. 39, 883–893. Courtemanche, N., and Barrick, D. (2008). Folding thermodynamics and kinetics of the leucine-rich repeat domain of the virulence factor Internalin B. Protein Sci. 17, 43–53. Crothers, D. M., and Kallenbach, N. R. (1966). On the Helix-Coil Transition in Heterogeneous Polymers. J. Chem. Phys. 45, 917–927. De La Cruz, E. M. (2005). Cofilin Binding to Muscle and Non-muscle Actin Filaments: Isoform-dependent Cooperative Interactions. J. Mol. Biol. 346, 557–564. DeLano, W. L. (2003). MacPyMOL: PyMOL Enhanced for Mac OS X Palo Alto, DeLano Scientific. Groves, M. R., and Barford, D. (1999). Topological characteristics of helical repeat proteins. Curr. Opin. Struct. Biol. 9, 383–389. Ising, E. (1925). Title Unavailable. Z. Physik 31, 253. Johnson, M. L. (2008). Nonlinear least-squares fitting methods. Methods Cell Biol. 84, 781–805. Kajander, T., Cortajarena, A. L., Main, E. R., Mochrie, S. G., and Regan, L. (2005). A new folding paradigm for repeat proteins. J. Am. Chem. Soc. 127, 10188–10190. Kajander, T., Cortajarena, A. L., Mochrie, S., and Regan, L. (2007). 
Structure and stability of designed TPR protein superhelices: Unusual crystal packing and implications for natural TPR proteins. Acta Crystallogr. D Biol. Crystallogr. 63, 800–811. Kajava, A. V. (2001). Review: Proteins with repeated sequence–structural prediction and modeling. J. Struct. Biol. 134, 132–144. Kloss, E., and Barrick, D. (2008). Thermodynamics, kinetics, and salt dependence of folding of YopM, a large leucine-rich repeat protein. J. Mol. Biol. 383, 1195–1209. Kloss, E., Courtemanche, N., and Barrick, D. (2008). Repeat-protein folding: New insights into origins of cooperativity, stability, and topology. Arch. Biochem. Biophys. 469, 83–99. Kobe, B., and Kajava, A. V. (2000). When protein folding is simplified to protein coiling: The continuum of solenoid protein structures. Trends Biochem. Sci. 25, 509–515. Lenz, W. (1920). Title Unavailable. Physik. Z. 21, 613. Lifson, S., and Roig, A. (1961). On the Theory of Helix-Coil Transition in Polypeptides. J. Chem. Phys. 34, 1963–1974. Lubman, O. Y., Kopan, R., Waksman, G., and Korolev, S. (2005). The crystal structure of a partial mouse Notch-1 ankyrin domain: Repeats 4 through 7 preserve an ankyrin fold. Protein Sci. 14, 1274–1281.

Main, E. R., Lowe, A. R., Mochrie, S. G., Jackson, S. E., and Regan, L. (2005). A recurring theme in protein engineering: The design, stability and folding of repeat proteins. Curr. Opin Struct. Biol. 15, 464–471. Main, E. R., Xiong, Y., Cocco, M. J., D’Andrea, L., and Regan, L. (2003). Design of stable alpha-helical arrays from an idealized TPR motif. Structure 11, 497–508. McGhee, J. D., and von Hippel, P. H. (1974). Theoretical Aspects of DNA-protein interadtions: Co-operative and non-co-operative binding of large ligands to a onedimensional homogeneous lattice. J. Mol. Biol. 86, 469–489. Mello, C. C., and Barrick, D. (2004). An experimentally determined protein folding energy landscape. Proc. Natl. Acad. Sci. USA 101, 14102–14107. Merz, T., Wetzel, S. K., Firbank, S., Pluckthun, A., Grutter, M. G., and Mittl, P. R. (2008). Stabilizing ionic interactions in a full-consensus ankyrin repeat protein. J. Mol. Biol. 376, 232–240. Mosavi, L. K., Cammett, T. J., Desrosiers, D. C., and Peng, Z. Y. (2004). The ankyrin repeat as molecular architecture for protein recognition. Protein Sci. 13, 1435–1448. Mosavi, L. K., Minor, D. L. Jr., and Peng, Z. Y. (2002). Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl. Acad. Sci. USA 99, 16029–16034. Munoz, V. (2001). What can we learn about protein folding from Ising-like models. Curr Opin. Struct. Biol. 11, 212–216. Niss, M. (2005). History of the Lenz-Ising Model 1920–1950: From Ferromagnetic to Cooperative Phenomena. Arch. Hist. Exact. Sci. 59, 267–318. Pace, C. N. (1986). Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 131, 266–280. Poland, D., and Scheraga, H. A. (1970). Theory of Helix-Coil Transitions in Biopolymers Academic Press, New York. Poland, D., Vournakis, J. N., and Scheraga, H. A. (1966). Cooperative Interactions in Single-Strand Oligomers of Adenylic Acid. Biopolymers 4, 223–235. Schellman, J. A. (1958). 
The Factors Affecting the Stability of Hydrogen-Bonded Polypeptide Structures in Solution. J. Phys. Chem. 62, 1485–1494. Scheraga, H. A. (2008). From helix-coil transitions to protein folding. Biopolymers 89, 479–485. Scholtz, J. M., Barrick, D., York, E. J., Stewart, J. M., and Baldwin, R. L. (1995). Urea unfolding of peptide helices as a model for interpreting protein unfolding. Proc. Natl. Acad. Sci. USA 92, 185–189. Steiner, D., Forrer, P., and Pluckthun, A. (2008). Efficient selection of DARPins with sub-nanomolar affinities using SRP phage display. J. Mol. Biol. 382, 1211–1227. Strang, G. (2005). Introduction to Linear Algebra Wellesly-Cambridge Press, Wellesly, MA. Street, T. O., Courtemanche, N., and Barrick, D. (2008). Protein folding and stability using denaturants. Met. Cell Biol. 84, 295–325. Wetzel, S. K., Settanni, G., Kenig, M., Binz, H. K., and Pluckthun, A. (2008). Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins. J. Mol. Biol. 376, 241–257. Zimm, B., and Bragg, J. (1959). Theory of the Phase Transition between Helix and Random Coil in Polypeptide Chains. J. Chem. Phys. 31, 526–535. Zimm, B. H. (1960). Theory of ‘‘Melting’’ of the Helical Form in Double Chains of the DNA Type. J. Chem. Phys. 33, 1349–1356.

CHAPTER FIVE

Isothermal Titration Calorimetry: General Formalism Using Binding Polynomials

Ernesto Freire,* Arne Schön,* and Adrian Velazquez-Campoy†

* Department of Biology, Johns Hopkins University, Baltimore, Maryland, USA
† Institute of Biocomputation and Physics of Complex Systems (BIFI), and Fundación Aragón I+D (ARAID-BIFI), Universidad de Zaragoza, Zaragoza, Spain

Methods in Enzymology, Volume 455, ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04205-5. © 2009 Elsevier Inc. All rights reserved.

Contents
1. Introduction
2. The Binding Polynomial
3. Microscopic Constants and Cooperativity
4. Independent or Cooperative Binding?
5. Analysis of ITC Data Using Binding Polynomials
6. A Typical Case: Macromolecule with Two Ligand-Binding Sites
7. Data Analysis
8. Data Interpretation
8.1. Independent ligand binding: Two identical binding sites
8.2. Independent ligand binding: Two nonidentical binding sites
8.3. Cooperative ligand binding: Two identical binding sites
9. An Experimental Example
10. Experimental Situations from the Literature
11. Macromolecule with Three Ligand-Binding Sites
12. Conclusions
Appendix
Acknowledgment
References

Abstract

The theory of the binding polynomial constitutes a very powerful formalism by which many experimental biological systems involving ligand binding can be analyzed under a unified framework. The analysis of isothermal titration calorimetry (ITC) data for systems possessing more than one binding site has been cumbersome because it required the user to develop a binding model to fit the data. Furthermore, in many instances, different binding models give rise to identical binding isotherms, making it impossible to discriminate binding mechanisms using binding data alone. One of the main advantages of the binding polynomials is that experimental data can be analyzed by employing a general model-free methodology that provides essential information about the system behavior (e.g., whether there exists binding cooperativity, whether the cooperativity is positive or negative, and the magnitude of the cooperative energy). Data analysis utilizing binding polynomials yields a set of binding association constants and enthalpy values that conserve their validity after the correct model has been determined. In fact, once the correct model is validated, the binding polynomial parameters can be immediately translated into the model specific constants. In this chapter, we describe the general binding polynomial formalism and provide specific theoretical and experimental examples of its application to isothermal titration calorimetry.

1. Introduction

The introduction of the binding polynomial theory several decades ago by Jeffries Wyman provided a general statistical thermodynamic framework for studying ligand binding to macromolecules (Wyman, 1948, 1964; Wyman and Gill, 1990). Being equivalent to a partition function, the binding polynomial contains all the information about the system and allows derivation of all thermodynamic experimental observables (e.g., average number of ligand molecules bound, average excess enthalpy). Contrary to model-dependent parameters, the parameters that define the binding polynomial have a general validity. Consequently, unless a binding model has been validated, the binding polynomial should be the preferred starting point for the analysis of complex binding situations. There are experimental situations that cannot be assigned to a particular model. For two or more binding sites, different binding models can give rise to mathematically equal binding equations. In those cases, the discrimination between models cannot be made on the basis of binding data alone and requires extrathermodynamic arguments. The binding polynomial represents the basis for a general, model-independent analysis of a binding experiment. The same methodology is applicable to a system with one or any arbitrary number of binding sites without the user having to decide on any particular binding mechanism. It is the preferred analysis protocol unless a specific binding model has been validated for the system under consideration.

Among the experimental techniques employed for studying ligand binding, isothermal titration calorimetry (ITC) exhibits several features that render it a unique experimental tool: (1) the signal measured (heat of reaction) is a universal probe, avoiding the use of nonnatural spectroscopic labels; (2) the interacting molecules are in solution; and (3) it allows determining simultaneously the association constant, the enthalpy, and the stoichiometry of binding in a single experiment. Accordingly, ITC data is ideally suited to be analyzed using the binding polynomial formalism. This chapter discusses the theoretical basis of the binding polynomials and their application to the analysis of ITC data.

2. The Binding Polynomial

The equilibrium of a ligand with a macromolecule with n ligand binding sites can be described in terms of two different sets of association constants, the overall association constants, $\beta_i$, or, alternatively, the stepwise association constants, $K_i$:

$$
\begin{aligned}
M + iL \rightleftharpoons ML_i &\qquad \beta_i = \frac{[ML_i]}{[M][L]^i}, \\
ML_{i-1} + L \rightleftharpoons ML_i &\qquad K_i = \frac{[ML_i]}{[ML_{i-1}][L]}.
\end{aligned}
\tag{5.1}
$$

The two sets of descriptors are equivalent, and they are related through the following relationships:

$$
\beta_i = \prod_{j=1}^{i} K_j, \qquad K_i = \frac{\beta_i}{\beta_{i-1}}.
\tag{5.2}
$$
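The interconversion in Eq. (5.2) can be sketched in a few lines of Python. The function names and the numerical values of the stepwise constants below are hypothetical, chosen only for illustration, and the convention $\beta_0 = 1$ for the free macromolecule is assumed.

```python
from itertools import accumulate
from operator import mul

def overall_from_stepwise(K):
    """beta_i = K_1 * K_2 * ... * K_i; returns [beta_1, ..., beta_n]."""
    return list(accumulate(K, mul))

def stepwise_from_overall(beta):
    """K_i = beta_i / beta_(i-1), taking beta_0 = 1; beta = [beta_1, ..., beta_n]."""
    previous = [1.0] + list(beta[:-1])
    return [b / p for b, p in zip(beta, previous)]

# Hypothetical stepwise constants for a three-site macromolecule (M^-1).
K = [2.0e6, 5.0e5, 1.0e4]
beta = overall_from_stepwise(K)        # units: M^-1, M^-2, M^-3
K_back = stepwise_from_overall(beta)   # round trip recovers K

print(beta)
print(K_back)
```

Note that the units of $\beta_i$ change with i (M^-i), whereas every $K_i$ has units of M^-1.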

Because the stepwise binding constants and the overall association constants are related and can be transformed into each other by using Eq. (5.2), for convenience this chapter will use the overall association constants, $\beta_i$. In a binding experiment, the main parameter is the average number of ligand molecules bound per macromolecule, $n_{LB}$, which can be calculated as a simple enumeration:

$$
n_{LB} = \frac{[L]_B}{[M]_T} = \frac{[ML] + 2[ML_2] + 3[ML_3] + \ldots}{[M] + [ML] + [ML_2] + [ML_3] + \ldots} = \frac{\sum_{i=0}^{n} i\,[ML_i]}{\sum_{i=0}^{n} [ML_i]},
\tag{5.3}
$$

where n is the number of binding sites in the macromolecule, $[M]_T$ is the total concentration of macromolecule, and $[L]_B$ is the concentration of ligand bound to the macromolecule. According to its definition, $n_{LB}$ takes values between 0 and n. In terms of the overall association constants, the binding parameter $n_{LB}$ can be expressed as:

$$
n_{LB} = \frac{\sum_{i=0}^{n} i\,\beta_i [L]^i}{\sum_{i=0}^{n} \beta_i [L]^i}.
\tag{5.4}
$$

This equation is the so-called Adair's equation, which was first used for analyzing the oxygen binding to hemoglobin (Adair, 1925). The binding polynomial is defined as the partition function, P, of the system, and therefore is the sum of the different species concentrations relative to that of the free macromolecule, which is defined as the reference:

$$
P = \sum_{i=0}^{n} \frac{[ML_i]}{[M]},
\tag{5.5}
$$

and it can be expressed in terms of the association constants:

$$
P = \sum_{i=0}^{n} \beta_i [L]^i.
\tag{5.6}
$$

Therefore, the binding polynomial of a macromolecule with n ligand binding sites is a finite power series (nth-order polynomial) in the free ligand concentration, each term representing the relative concentration of a macromolecular species with a given number of bound ligands. Because P is the partition function of the system, the thermodynamic parameters of the system are obtained from P as follows (Schellman, 1975; Wyman, 1964; Wyman and Gill, 1990):

$$
\begin{aligned}
n_{LB} &= RT\left(\frac{\partial \ln P}{\partial \mu_L}\right)_{T,p} = \left(\frac{\partial \ln P}{\partial \ln [L]}\right)_{T,p}, \\
\langle \Delta G \rangle &= -RT \ln P, \\
\langle \Delta H \rangle &= -R\left(\frac{\partial \ln P}{\partial (1/T)}\right)_{p,[L]} = RT^2\left(\frac{\partial \ln P}{\partial T}\right)_{p,[L]},
\end{aligned}
\tag{5.7}
$$

where $\mu_L$ is the chemical potential of the free ligand, and $\langle \Delta G \rangle$ and $\langle \Delta H \rangle$ are the average excess molar Gibbs energy and enthalpy of binding at constant pressure, p, taking the unliganded macromolecule as the reference state. The last expression in Eq. (5.7) is equivalent to the Gibbs-Helmholtz equation. The temperature derivative of the association constants is evaluated using the van't Hoff equation, which links the temperature derivative of a given association constant, $\beta_i$, with the enthalpy change, $\Delta H_i$, associated with the equilibrium process. The fraction or population of each species, $F_i$, can be obtained from the expression of the binding polynomial:

$$
F_i = \frac{[ML_i]}{[M]_T} = \frac{\beta_i [L]^i}{P},
\tag{5.8}
$$

and these populations exhibit two properties: (1) their sum is equal to 1; and (2) as the system is saturated with an increasing concentration of ligand, $F_i$ reaches a maximum when $n_{LB}$ equals i (Wyman and Gill, 1990). The average number of ligand molecules bound per macromolecule and the excess molar thermodynamic parameters may be expressed as:

$$
\begin{aligned}
\langle \Delta G \rangle &= -RT \ln\left(\sum_{i=0}^{n} \beta_i [L]^i\right), \\
\langle \Delta H \rangle &= \frac{\sum_{i=0}^{n} \beta_i [L]^i\,\Delta H_i}{P} = \sum_{i=0}^{n} F_i\,\Delta H_i, \\
n_{LB} &= \frac{\sum_{i=0}^{n} \beta_i [L]^i\, i}{P} = \sum_{i=0}^{n} F_i\, i = \langle i \rangle,
\end{aligned}
\tag{5.9}
$$

where it can be clearly seen that $\langle \Delta H \rangle$ and $n_{LB}$ are statistically weighted averages of the enthalpy of binding, $\Delta H_i$, and the number of ligand molecules bound, i.
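The relationships in Eqs. (5.6) to (5.9) can be illustrated with a minimal Python sketch. All function names and numerical values below are assumptions made for illustration. The overall constants and enthalpies describe a hypothetical two-site macromolecule, and the gas constant is expressed in kcal/(mol K). The last function estimates the derivative in Eq. (5.7) numerically, so it can be compared with the population-weighted sum in Eq. (5.9).

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def binding_polynomial(beta, L):
    """P = sum_i beta_i [L]^i (Eq. 5.6); beta[0] = 1 is the free macromolecule."""
    return sum(b * L**i for i, b in enumerate(beta))

def species_fractions(beta, L):
    """F_i = beta_i [L]^i / P (Eq. 5.8)."""
    P = binding_polynomial(beta, L)
    return [b * L**i / P for i, b in enumerate(beta)]

def n_bound(beta, L):
    """n_LB = sum_i i F_i (Eq. 5.9)."""
    return sum(i * F for i, F in enumerate(species_fractions(beta, L)))

def mean_enthalpy(beta, dH, L):
    """<dH> = sum_i F_i dH_i (Eq. 5.9)."""
    return sum(F * h for F, h in zip(species_fractions(beta, L), dH))

def mean_gibbs(beta, L, T=298.15):
    """<dG> = -RT ln P (Eq. 5.7)."""
    return -R * T * math.log(binding_polynomial(beta, L))

def n_bound_from_derivative(beta, L, h=1e-6):
    """Central-difference estimate of d ln P / d ln[L] (Eq. 5.7)."""
    up = math.log(binding_polynomial(beta, L * math.exp(h)))
    down = math.log(binding_polynomial(beta, L * math.exp(-h)))
    return (up - down) / (2.0 * h)

# Hypothetical two-site example (values chosen for illustration only).
beta = [1.0, 2.0e6, 1.0e12]   # beta_0, beta_1 (M^-1), beta_2 (M^-2)
dH = [0.0, -8.0, -18.0]       # kcal/mol, relative to the free macromolecule
L = 1.0e-6                    # free ligand concentration, M

print(species_fractions(beta, L))
print(n_bound(beta, L), n_bound_from_derivative(beta, L))
print(mean_enthalpy(beta, dH, L), mean_gibbs(beta, L))
```

The two estimates of $n_{LB}$ should agree to within the accuracy of the finite-difference step, which is one way to check an implementation of the binding polynomial before using it in a fit.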

3. Microscopic Constants and Cooperativity

The overall and stepwise association constants are macroscopic association constants, and no mechanistic interpretation about the ligand binding can be inferred from them. Therefore, they are considered phenomenological or model-free association constants. Besides macroscopic association constants, there is another type of association constant, the microscopic binding constants $k_i$, which are intrinsically related to ligand binding at the individual binding sites and therefore reflect the intrinsic binding affinity of each site. In the case of independent binding, they can be readily obtained from the macroscopic association constants, as the binding polynomial factorizes into n first-order polynomials with n negative real roots:

$$
P = \sum_{i=0}^{n} \beta_i [L]^i = \prod_{i=1}^{n} \left(1 + k_i [L]\right).
\tag{5.10}
$$

This conclusion represents a particular case of the more general statistical thermodynamic result that the partition function of a system composed of independent subsystems is equal to the product of the partition functions for the independent subsystems ($P = \prod_i P_i$).

In the case of nonindependent (cooperative) binding, the binding polynomial does not factorize, and in addition to the microscopic binding constants, interaction or cooperative constants must be included in the description. Factorability of the binding polynomial into n first-order polynomials with n negative real roots is not guaranteed. Accordingly, even though it is always possible to estimate values for the macroscopic association constants, it is not always possible to extract microscopic intrinsic association constants in a straightforward manner (Krell et al., 2007; Tochtrop et al., 2002). The cooperative system will be defined in terms of microscopic and interaction constants mathematically related according to a specific model.
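The factorization in Eq. (5.10) can be made concrete with a small Python sketch. The first function expands a product of independent-site terms into overall constants. The second attempts the reverse for the two-site case only, using the quadratic formula, and returns `None` when the polynomial has no real roots. Function names and numerical values are hypothetical, and the two-site restriction is a simplifying assumption rather than a general factorization routine.

```python
import math

def beta_from_site_constants(k):
    """Expand prod_i (1 + k_i [L]) into coefficients [beta_0, ..., beta_n]."""
    beta = [1.0]
    for ki in k:
        # Multiply the current polynomial by (1 + ki * [L]).
        shifted = [0.0] + [ki * b for b in beta]
        beta = [a + b for a, b in zip(beta + [0.0], shifted)]
    return beta

def two_site_constants(beta1, beta2):
    """Site constants (k1, k2) with k1 + k2 = beta1 and k1 * k2 = beta2,
    or None if 1 + beta1 [L] + beta2 [L]^2 has no real roots."""
    discriminant = beta1**2 - 4.0 * beta2
    if discriminant < 0.0:
        return None
    root = math.sqrt(discriminant)
    return (beta1 + root) / 2.0, (beta1 - root) / 2.0

# Hypothetical independent sites with different affinities (M^-1).
independent = beta_from_site_constants([1.0e6, 1.0e4])
print(independent)                            # [beta_0, beta_1, beta_2]
print(two_site_constants(independent[1], independent[2]))

# Hypothetical overall constants that do not factorize over the reals.
print(two_site_constants(2.0e5, 1.0e12))
```

A real factorization is necessary but not sufficient evidence for independent sites, in line with the caution above that mechanism cannot always be inferred from the macroscopic constants.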

4. Independent or Cooperative Binding?

In the analysis of systems with two or more binding sites, one of the most important questions is to assess whether the sites are independent of each other or whether cooperative interactions affect the ligand affinity of different sites. A qualitative analysis can be performed in a straightforward way once the overall association constants have been determined. For a macromolecule with n binding sites, a set of n − 1 parameters, $\rho_i$ ($i = 2, \ldots, n$), can be calculated from the macroscopic association constants determined experimentally (Wyman and Gill, 1990; Wyman and Phillipson, 1974):

$$
\rho_i = \frac{\beta_i \Big/ \binom{n}{i}}{\left(\beta_{i-1} \Big/ \binom{n}{i-1}\right)^{\frac{i}{i-1}}},
\tag{5.11}
$$

with $i = 2, \ldots, n$. A $\rho$ value of 1 indicates that the binding sites are identical and independent, because in such a situation:

$$
P = \sum_{i=0}^{n} \beta_i [L]^i = \left(1 + k[L]\right)^n \;\Rightarrow\; \beta_i = \binom{n}{i} k^i,
\tag{5.12}
$$

and every parameter $\rho_i$ is equal to 1. Any deviation from 1 in the $\rho$ parameters indicates that the binding sites are not identical or that they behave cooperatively: (1) $\rho$ values less than 1 indicate that not all the binding sites are identical or that they may exhibit negative cooperativity; and (2) $\rho$ values greater than 1 indicate that the binding exhibits positive cooperativity. Thus, the binding polynomials provide a way to characterize a binding reaction in a similar way to that employed in differential scanning calorimetry, in which the van't Hoff-calorimetric enthalpy ratio is used for identifying the two-state or non-two-state character of the reaction.
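A minimal Python sketch of Eq. (5.11) follows. It evaluates the $\rho_i$ parameters from a list of overall constants and checks the identical-independent case of Eq. (5.12). The function names and the numerical constants are hypothetical and serve only to illustrate the three regimes described above.

```python
import math

def rho_parameters(beta):
    """rho_i for i = 2..n from beta = [beta_0 = 1, beta_1, ..., beta_n] (Eq. 5.11)."""
    n = len(beta) - 1
    rho = {}
    for i in range(2, n + 1):
        numerator = beta[i] / math.comb(n, i)
        denominator = (beta[i - 1] / math.comb(n, i - 1)) ** (i / (i - 1))
        rho[i] = numerator / denominator
    return rho

def identical_independent(n, k):
    """beta_i = C(n, i) k^i for n identical, independent sites (Eq. 5.12)."""
    return [math.comb(n, i) * k**i for i in range(n + 1)]

# Three identical independent sites (hypothetical k, M^-1): every rho_i is 1.
print(rho_parameters(identical_independent(3, 1.0e5)))

# Hypothetical two-site cases: rho_2 below 1 and rho_2 above 1.
print(rho_parameters([1.0, 1.01e6, 1.0e10]))
print(rho_parameters([1.0, 2.0e5, 1.0e12]))
```

Because each $\rho_i$ is dimensionless, the same function can be applied regardless of the concentration units used for the fitted constants, as long as they are consistent.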

5. Analysis of ITC Data Using Binding Polynomials

The binding polynomial (Eqs. (5.5) and (5.6)) provides the starting point in the analysis of ITC data. As in the standard analysis, the total concentration of ligand is written as the sum of the concentrations of free and bound ligand, and expressed in terms of the binding polynomial:

$$
[L]_T = [L] + [L]_B = [L] + [M]_T\, n_{LB} = [L] + [M]_T \frac{\partial \ln P}{\partial \ln [L]}.
\tag{5.13}
$$

Eq. (5.13) is the basis for the analysis of the binding experiment; knowing the total concentrations of ligand and macromolecule, the values of the macroscopic association constants ($\beta_i$) will determine the free ligand concentration and the concentration of each complex. The values of the association constants are obtained through nonlinear least squares regression analysis of the experimental binding data. For any given system, the general analysis procedure follows the same lines used for specific models and consists of the following steps: (1) define the number of binding sites and corresponding binding polynomial; (2) calculate the total concentration of macromolecule and ligand for each experimental point; (3) solve (analytically or numerically) the ligand conservation equation for each experimental point, assuming certain values of the macroscopic association constants; (4) calculate the concentration of the different complexes for each experimental point, assuming certain values for the association constants; (5) calculate the expected signal, assuming certain values for the binding enthalpies, which are also floating parameters in the nonlinear least squares analysis; and (6) obtain the optimal set of macroscopic association constants that reproduce the experimental data using an iterative method. In ITC, the total concentrations of ligand and macromolecule in the calorimetric cell after the injection k are given by:

$$
\begin{aligned}
[L]_{T,k} &= [L]_0\left(1 - \left(1 - \frac{v}{V}\right)^k\right), \\
[M]_{T,k} &= [M]_0\left(1 - \frac{v}{V}\right)^k,
\end{aligned}
\tag{5.14}
$$

where $[M]_0$ and $[L]_0$ are the initial macromolecule concentration in the cell and the concentration of ligand in the syringe, respectively, and V and v are the cell volume and the injection volume, respectively. The average excess molar enthalpy of the system can be calculated as previously mentioned, and the total accumulated heat until injection k is given by:

$$
Q_k = V [M]_{T,k} \langle \Delta H \rangle_k = V \sum_{i=1}^{n} \Delta H_i\, [ML_i]_k.
\tag{5.15}
$$

Then, the heat effect associated with injection k, $q_k$, is calculated from the difference between the total heats after injections k and k − 1; that is, it is proportional to the change in the concentration of each macromolecule–ligand complex between injections k and k − 1:

$$
\begin{aligned}
q_k &= Q_k - Q_{k-1}\left(1 - \frac{v}{V}\right) \\
&= V\left([M]_{T,k}\langle \Delta H \rangle_k - [M]_{T,k-1}\langle \Delta H \rangle_{k-1}\left(1 - \frac{v}{V}\right)\right) \\
&= V \sum_{i=1}^{n} \Delta H_i \left([ML_i]_k - [ML_i]_{k-1}\left(1 - \frac{v}{V}\right)\right),
\end{aligned}
\tag{5.16}
$$

where $\Delta H_i$ is the enthalpy of formation of complex $ML_i$, and the concentration of each type of complex is calculated according to the fraction corresponding to each species (Eq. (5.8)). The values for the macroscopic association constants ($\beta_i$) and the binding enthalpies ($\Delta H_i$) are obtained through nonlinear least squares regression analysis of the experimental binding data ($q_k$).
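Steps (2) to (5) of the procedure above can be sketched as a forward simulation in Python, under stated assumptions. The ligand conservation equation (Eq. 5.13) is solved by bisection, which is one possible numerical choice rather than the method prescribed by the chapter. The function names and all experimental values (cell volume, injection volume, concentrations, constants, enthalpies) are hypothetical. The regression in step (6) is not shown.

```python
def free_ligand(beta, L_total, M_total, tol=1e-12):
    """Solve [L]_T = [L] + [M]_T n_LB([L]) (Eq. 5.13) for [L] by bisection."""
    def excess(L):
        P = sum(b * L**i for i, b in enumerate(beta))
        n_lb = sum(i * b * L**i for i, b in enumerate(beta)) / P
        return L + M_total * n_lb - L_total
    lo, hi = 0.0, L_total  # the free ligand lies between 0 and [L]_T
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo <= tol * L_total:
            break
    return 0.5 * (lo + hi)

def simulate_itc(beta, dH, M0, L0, V, v, n_injections):
    """Heats q_k per injection from Eqs. (5.14)-(5.16); beta[0] = 1, dH[0] = 0."""
    dilution = 1.0 - v / V
    heats, Q_previous = [], 0.0
    for k in range(1, n_injections + 1):
        M_t = M0 * dilution**k              # Eq. (5.14)
        L_t = L0 * (1.0 - dilution**k)      # Eq. (5.14)
        L = free_ligand(beta, L_t, M_t)
        P = sum(b * L**i for i, b in enumerate(beta))
        complexes = [M_t * b * L**i / P for i, b in enumerate(beta)]  # [ML_i]_k
        Q = V * sum(h * c for h, c in zip(dH, complexes))             # Eq. (5.15)
        heats.append(Q - Q_previous * dilution)                       # Eq. (5.16)
        Q_previous = Q
    return heats

# Hypothetical experiment: two identical independent sites, k = 1e6 M^-1.
beta = [1.0, 2.0e6, 1.0e12]
dH = [0.0, -10.0, -20.0]   # kcal/mol
heats = simulate_itc(beta, dH, M0=20e-6, L0=600e-6, V=1.4e-3, v=10e-6,
                     n_injections=25)
print(heats[0], heats[-1])  # kcal per injection (V in L, concentrations in M)
```

In a fitting routine, a function like `simulate_itc` would be called repeatedly with trial values of the constants and enthalpies, and the residuals against the measured heats would be minimized.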

6. A Typical Case: Macromolecule with Two Ligand-Binding Sites

The binding polynomial for a macromolecule with two ligand binding sites is equal to (see Fig. 5.1):

$$
P = 1 + \beta_1 [L] + \beta_2 [L]^2.
\tag{5.17}
$$

The average number of ligand molecules bound per macromolecule, nLB, and the average excess molar enthalpy, , are written in terms of the macroscopic association constants and binding enthalpies as follows: General model

Identical independent

Nonidentical independent

Identical cooperative

1

1

1

1

b1[L]

2k[L]

k1[L]+k2[L]

2k[L]

b 2[L]2

k2[L]2

k1k2[L]2

k k2[L]2

Figure 5.1 Scheme for computing the binding polynomial for a macromolecule with two binding sites. The different liganded states are shown with their statistical factor or relative concentration, taking the free macromolecule as the reference state. The sum of the different terms in each column provides the expression of the binding polynomial for each model.


Ernesto Freire et al.

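$$
\begin{aligned}
n_{LB} &= \frac{\beta_1[L] + 2\beta_2[L]^2}{1 + \beta_1[L] + \beta_2[L]^2} = F_1 + 2F_2 \\
\langle\Delta H\rangle &= \frac{\beta_1[L]\,\Delta H_1 + \beta_2[L]^2\,\Delta H_2}{1 + \beta_1[L] + \beta_2[L]^2} = F_1\,\Delta H_1 + F_2\,\Delta H_2.
\end{aligned}
\tag{5.18}
$$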

These expressions are completely general for any macromolecule with two ligand-binding sites. Although the number of binding sites can be treated as a fitting variable, this is not recommended unless absolutely necessary. If it is done, a statistical F-test is needed to determine whether the improvement obtained by increasing the number of fitting parameters (two per binding site) reflects the nature of the system or merely the larger number of adjustable parameters. In the analysis of ITC data, the number of binding sites is often treated as an adjustable parameter, which yields fractional values. In reality, the parameter being adjusted is the effective amount of active protein relative to the nominal value entered as the protein concentration. In an analysis based on binding polynomials, the effective protein concentration needs to be adjusted so that the number of binding sites is correctly represented by an integer. Once this is achieved, analysis of the ITC data provides accurate values of $\beta_i$ and $\Delta H_i$. For a system with two binding sites there is only one cooperativity parameter, $\rho_2$:

$$\rho_2 = \frac{4\beta_2}{\beta_1^2}. \tag{5.19}$$

If $\rho_2$ is equal to 1, the binding sites are identical and independent. If $\rho_2$ is less than 1, the binding sites are either independent but different or identical with negative cooperativity; if $\rho_2$ is greater than 1, the binding sites exhibit positive cooperativity. For cooperative binding, $\rho_2$ is equal to the cooperativity association constant $\kappa$, as discussed subsequently. If the two binding sites behave independently (i.e., if binding to one site does not influence binding to the other site), the binding polynomial factorizes into two first-order polynomials, each corresponding to a binding site with a microscopic binding constant $k_i$ (see Fig. 5.1):

$$P = (1 + k_1[L])(1 + k_2[L]) = 1 + (k_1 + k_2)[L] + k_1k_2[L]^2. \tag{5.20}$$

If the two binding sites are identical (equal microscopic thermodynamic binding parameters), the binding polynomial simplifies to (see Fig. 5.1):

$$P = (1 + k[L])^2 = 1 + 2k[L] + k^2[L]^2, \tag{5.21}$$

where the factor 2 in the second term represents the degeneracy of the state with one ligand bound. If two identical binding sites show cooperativity, the binding polynomial does not factorize because of the cooperativity constant, $\kappa$ (see Fig. 5.1):

$$P = 1 + 2k[L] + \kappa k^2[L]^2. \tag{5.22}$$

The cooperativity constant reflects the energy penalty or gain due to simultaneous ligand binding to both binding sites. A value of $\kappa$ greater than 1 means positive cooperativity (for equivalent degrees of saturation, the concentration of the singly liganded species is lower than for independent binding), whereas a value of $\kappa$ less than 1 means negative cooperativity (the concentration of the singly liganded species is higher than for independent binding). Each situation corresponds to a different model; however, the binding polynomial written in terms of macroscopic association constants is the same in all of them. This example illustrates the value and generality of binding polynomials.

7. Data Analysis

The equations described here have been implemented for an arbitrary number of binding sites in the analysis software distributed by manufacturers of isothermal titration calorimeters, or they can be employed as user-defined fitting functions in commercially available software. Sometimes stepwise association constants are estimated (e.g., MicroCal) rather than the overall association constants discussed here. In those cases, Eq. (5.2) should be used to calculate them. Thermodynamic binding parameters ($\beta_i$ and $\Delta H_i$) are obtained through nonlinear least squares regression. Computer-simulated calorimetric titrations covering different representative situations for a macromolecule with two ligand-binding sites are shown in Fig. 5.2. Nonlinear least squares analysis of these titrations in terms of binding polynomials yields the following results:

(A) $\beta_1 = 1.9 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = 10.0$ kcal/mol, $\beta_2 = 9.5 \times 10^{13}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 20.0$ kcal/mol.

(B) $\beta_1 = 1.0 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = 10.0$ kcal/mol, $\beta_2 = 1.0 \times 10^{12}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 15.0$ kcal/mol.

(C) $\beta_1 = 1.9 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = 10.0$ kcal/mol, $\beta_2 = 9.60 \times 10^{14}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 25.0$ kcal/mol.

(D) $\beta_1 = 2.0 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = 10.0$ kcal/mol, $\beta_2 = 10.0 \times 10^{12}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 20.0$ kcal/mol.

Figure 5.2 Simulated titrations for a macromolecule with two binding sites. Four representative cases (panels A to D) are illustrated, covering the different possibilities. In each panel, the upper plot shows dQ/dt (μcal/s) versus time (min) and the lower plot shows Q (kcal/mol of injectant) versus $[L]_T/[M]_T$.

Accordingly, the $\rho_2$ values are 1.0 for (A), 0.04 for (B), 10.3 for (C), and 0.1 for (D). Therefore, case (A) corresponds to two identical and independent binding sites; cases (B) and (D) correspond to either two different independent binding sites or two identical sites with negative cooperativity; and case (C) corresponds to two binding sites with positive cooperativity. To inspect the behavior of the system during the titration, the populations of the different complexes are shown as a function of the binding saturation for each model (Fig. 5.3). Several characteristics can be highlighted: (1) the population of each liganded species, $ML_i$, reaches a maximum when $n_{LB} = i$; (2) different independent binding sites behave similarly to identical binding sites with negative cooperativity; (3) different binding sites and identical binding sites with negative cooperativity exhibit a greater concentration of single-ligand bound macromolecules than independent binding for equal overall binding saturation; (4) identical binding sites with positive cooperativity exhibit a lower concentration of single-ligand bound macromolecules than independent binding for equal overall binding saturation.
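The cooperativity parameter of Eq. (5.19) is straightforward to evaluate from fitted constants. The following sketch, in Python, recomputes it for the four simulated cases; the function names and the tolerance used to decide that a value is "equal to 1" are illustrative assumptions. The results agree with the values quoted above to within the rounding of the reported constants.

```python
def rho2(beta1: float, beta2: float) -> float:
    """Cooperativity parameter of Eq. (5.19): rho2 = 4*beta2 / beta1**2."""
    return 4.0 * beta2 / beta1 ** 2


def classify(r: float, tol: float = 0.1) -> str:
    """Qualitative reading of rho2; tol is an arbitrary illustrative tolerance."""
    if abs(r - 1.0) <= tol:
        return "identical and independent"
    if r < 1.0:
        return "nonidentical independent or negative cooperativity"
    return "positive cooperativity"


# (beta1 in M^-1, beta2 in M^-2) as reported for the simulated titrations
cases = {"A": (1.9e7, 9.5e13), "B": (1.0e7, 1.0e12),
         "C": (1.9e7, 9.60e14), "D": (2.0e7, 10.0e12)}

for label, (b1, b2) in cases.items():
    r = rho2(b1, b2)
    print(f"{label}: rho2 = {r:.3g} -> {classify(r)}")
```

With experimental data the tolerance should reflect the uncertainty of the fitted constants rather than a fixed number.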

Figure 5.3 Fraction or population of each liganded species, $F_i = [ML_i]/[M]_T$, as a function of $n_{LB}$ along the titrations shown in Fig. 5.2 (panels A to D). The populations of free (solid), single-ligand bound (dashed), and double-ligand bound (dotted) macromolecules are calculated according to Eq. (5.8).

The binding polynomial $P$ and the saturation function $n_{LB}$ for the titrations in Fig. 5.2 are shown in Fig. 5.4 as a function of the free ligand concentration. If the case of identical and independent sites is used as the reference, a lower numerical value of the binding polynomial is observed for different binding sites and for negative cooperativity. The opposite is observed for the model with identical binding sites with positive cooperativity. This is an important conclusion because the binding polynomial can always be calculated by numerical integration of $n_{LB}$ (see Eq. (5.7)) and compared to the one expected for independent binding without the need for fitting the data. The binding saturation parameter $n_{LB}$ can be obtained by applying Eq. (5.7). In the case of different binding sites and negative cooperativity, two inflection points may be observed, indicating two binding events with significantly different binding affinities. If the two binding sites differ by less than a factor of ten in affinity, only one inflection point would be observed, but with a broader transition toward saturation.

Figure 5.4 (A) Binding polynomial, $P$, and (B) number of ligand molecules bound per macromolecule, $n_{LB}$, as a function of the free ligand concentration, [L] (M), for each model along the titrations shown in Figs. 5.2–5.3: identical and independent binding sites (solid), nonidentical and independent binding sites (dashed), identical binding sites with positive cooperativity (short dashed), identical binding sites with negative cooperativity (dotted).

The discussion can be extended by considering the binding capacity ($\partial n_{LB}/\partial \ln[L]$) and the slope of the Hill plot ($\log(n_{LB}/(2 - n_{LB}))$ versus $\log[L]$) at half saturation ($n_{LB} = 1$) (Di Cera et al., 1988; Hill, 1910; Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990; see also the appendix). These two parameters take values of 0.5 and 1, respectively, if the ligand-binding sites are identical and independent. The calculated values of these two parameters for the titrations shown in Fig. 5.2 are (A) 0.5 and 1; (B) 0.17 and 0.34; (C) 0.75 and 1.5; and (D) 0.24 and 0.48. The relative deviations of these two parameters from the values corresponding to the reference case (identical and independent binding sites) are (A) 0%, (B) −66%, (C) +50%, and (D) −52%. If the relative change in the fractional population of the intermediate complex, ML, is calculated at half saturation (Fig. 5.3), the same values are obtained: (A) 0%, (B) −66%, (C) +50%, and (D) −52%. As demonstrated in the appendix, the experimentally accessible Hill slope or binding capacity can be used to estimate the population of intermediate liganded species and also the cooperative energy.
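The numerical integration of the saturation function mentioned above can be sketched as follows. Eq. (5.7) lies outside this section; the sketch assumes it has the usual form $n_{LB} = \partial \ln P/\partial \ln[L]$, so that $\ln P$ is the integral of $n_{LB}$ over $\ln[L]$. The trapezoidal rule and the function names are illustrative choices, not part of the original procedure.

```python
import math


def n_lb(L: float, beta1: float, beta2: float) -> float:
    """Two-site saturation function, Eq. (5.18)."""
    return (beta1 * L + 2 * beta2 * L ** 2) / (1 + beta1 * L + beta2 * L ** 2)


def ln_P_by_integration(L_values, n_values):
    """Trapezoidal integral of n_LB over ln[L].

    Assumes the first point is at a ligand concentration low enough that
    ln P is well approximated by n_LB there (both are close to zero).
    """
    lnP = [n_values[0]]  # leading-order estimate of ln P at the first point
    for i in range(1, len(L_values)):
        dlnL = math.log(L_values[i] / L_values[i - 1])
        lnP.append(lnP[-1] + 0.5 * (n_values[i] + n_values[i - 1]) * dlnL)
    return lnP


# Illustrative check with the constants reported for simulated case (A)
b1, b2 = 1.9e7, 9.5e13
grid = [10 ** (-12 + 8 * i / 1999) for i in range(2000)]  # 1e-12 to 1e-4 M
saturation = [n_lb(x, b1, b2) for x in grid]
lnP_numeric = ln_P_by_integration(grid, saturation)
lnP_exact = [math.log(1 + b1 * x + b2 * x * x) for x in grid]
print(max(abs(a - b) for a, b in zip(lnP_numeric, lnP_exact)))
```

In practice the saturation values would come from experiment rather than from a model, and the resulting $\ln P$ curve would be compared with the one expected for identical and independent sites.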

8. Data Interpretation

As discussed earlier, the precise binding mechanism for systems with two or more binding sites is difficult to derive and usually requires extrathermodynamic information. Statistical fitting of the data to a model does not validate the appropriateness of the model. Even for hemoglobin, the most widely studied binding system in history, there are still lingering questions about the exact oxygen-binding mechanism (Holt and Ackers, 2005; Ackers and Holt, 2006). Even a macromolecule with two binding sites presents at least six different possible binding mechanisms: (1) two identical and independent binding sites, (2) two identical and negatively cooperative binding sites, (3) two identical and positively cooperative binding sites, (4) two nonidentical and independent binding sites, (5) two different and negatively cooperative binding sites, and (6) two different and positively cooperative binding sites. Not all these cases are distinguishable experimentally (i.e., some give rise to exactly the same binding curve, and extrathermodynamic information is required to elucidate the binding mechanism). For example, a macromolecule with two different binding sites exhibiting positive cooperativity might resemble a macromolecule with two identical and independent binding sites, because the two features have compensating effects. Also, as mentioned earlier, a macromolecule with two different and independent binding sites is mathematically equivalent to a macromolecule with two identical binding sites with negative cooperativity. This result can be demonstrated by considering Eqs. (5.20)–(5.22) and obtaining the mathematical relationship between $k_1$, $k_2$, $k$, and $\kappa$:

$$
\begin{aligned}
k_1 &= k\left(1 + \sqrt{1 - \kappa}\right) \\
k_2 &= k\left(1 - \sqrt{1 - \kappa}\right).
\end{aligned}
\tag{5.23}
$$

If $\kappa = 1$, the system is noncooperative and corresponds to a macromolecule with two identical binding sites. If $\kappa < 1$, the cooperative system is equivalent to a macromolecule with two different and independent binding sites. If $\kappa > 1$, the cooperative system cannot be equated with a system with different sites. Consequently, the distinction between a macromolecule with different and independent binding sites and a macromolecule with identical and negatively cooperative binding sites cannot be made from binding data. Different binding models for a macromolecule with two binding sites are briefly reviewed in the following subsections.
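The equivalence expressed by Eq. (5.23) can be checked numerically. The sketch below, in Python, maps an identical-cooperative model onto the independent-site model with the same macroscopic constants; the function name and the numerical values are illustrative assumptions.

```python
import math


def equivalent_independent_sites(k: float, kappa: float):
    """Eq. (5.23): site constants (k1, k2) of the independent-site model that
    reproduces the identical-cooperative model (k, kappa).

    Only defined for kappa <= 1, because the square root must be real.
    """
    if kappa > 1.0:
        raise ValueError("kappa > 1 has no independent-site equivalent")
    root = math.sqrt(1.0 - kappa)
    return k * (1.0 + root), k * (1.0 - root)


k, kappa = 1.0e7, 0.1  # illustrative values (M^-1, dimensionless)
k1, k2 = equivalent_independent_sites(k, kappa)

# Both models share the same macroscopic constants beta1 and beta2,
# hence the same binding polynomial and the same binding curve.
print(k1 + k2, 2 * k)            # beta1 from each model
print(k1 * k2, kappa * k ** 2)   # beta2 from each model
```

The error raised for values of the cooperativity constant above 1 mirrors the statement that positive cooperativity cannot be reproduced by two different independent sites.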

8.1. Independent ligand binding: Two identical binding sites

This case represents the reference model, and the other models may be considered deviations from independence. The binding polynomial for a macromolecule with two identical and independent binding sites is given by Eq. (5.21), from which the average number of ligand molecules bound per macromolecule is:

$$n_{LB} = \frac{2k[L] + 2k^2[L]^2}{1 + 2k[L] + k^2[L]^2} = F_1 + 2F_2, \tag{5.24}$$

and the average excess molar enthalpy of the system is:

$$\langle\Delta H\rangle = \frac{2k[L]\,\Delta h + 2k^2[L]^2\,\Delta h}{1 + 2k[L] + k^2[L]^2} = F_1\,\Delta h + F_2\,2\Delta h, \tag{5.25}$$

where $\Delta h$ is the binding enthalpy for either of the two binding sites. The concentration of any macromolecular state is calculated as follows:

$$
\begin{aligned}
[ML] &= [M]_T\,\frac{2k[L]}{1 + 2k[L] + k^2[L]^2} = [M]_T\,F_1 \\
[ML_2] &= [M]_T\,\frac{k^2[L]^2}{1 + 2k[L] + k^2[L]^2} = [M]_T\,F_2.
\end{aligned}
\tag{5.26}
$$

8.2. Independent ligand binding: Two nonidentical binding sites

The binding polynomial for a macromolecule with two different and independent binding sites is given by Eq. (5.20), from which the average number of ligand molecules bound per macromolecule is:

$$n_{LB} = \frac{(k_1 + k_2)[L] + 2k_1k_2[L]^2}{1 + (k_1 + k_2)[L] + k_1k_2[L]^2} = F_1 + F_2 + 2F_{12}, \tag{5.27}$$

where $F_1$ and $F_2$ are the populations of macromolecule with only one ligand bound in either binding site (ML and LM), and $F_{12}$ is the population of macromolecule with two ligands bound (LML). The average excess molar enthalpy of the system is:

$$
\begin{aligned}
\langle\Delta H\rangle &= \frac{k_1[L]\,\Delta h_1 + k_2[L]\,\Delta h_2 + k_1k_2[L]^2\,(\Delta h_1 + \Delta h_2)}{1 + (k_1 + k_2)[L] + k_1k_2[L]^2} \\
&= F_1\,\Delta h_1 + F_2\,\Delta h_2 + F_{12}\,(\Delta h_1 + \Delta h_2),
\end{aligned}
\tag{5.28}
$$

where $\Delta h_1$ and $\Delta h_2$ are the binding enthalpies for the two binding sites. The concentration of any complex is calculated as follows:

$$
\begin{aligned}
[ML] &= [M]_T\,\frac{k_1[L]}{1 + (k_1 + k_2)[L] + k_1k_2[L]^2} = [M]_T\,F_1 \\
[LM] &= [M]_T\,\frac{k_2[L]}{1 + (k_1 + k_2)[L] + k_1k_2[L]^2} = [M]_T\,F_2 \\
[LML] &= [M]_T\,\frac{k_1k_2[L]^2}{1 + (k_1 + k_2)[L] + k_1k_2[L]^2} = [M]_T\,F_{12}.
\end{aligned}
\tag{5.29}
$$

8.3. Cooperative ligand binding: Two identical binding sites

The binding polynomial for a macromolecule with two identical and cooperative binding sites is given by Eq. (5.22), from which the average number of ligand molecules bound per macromolecule is:

$$n_{LB} = \frac{2k[L] + 2\kappa k^2[L]^2}{1 + 2k[L] + \kappa k^2[L]^2} = F_1 + 2F_2, \tag{5.30}$$

and the average excess molar enthalpy of the system is:

$$\langle\Delta H\rangle = \frac{2k[L]\,\Delta h + \kappa k^2[L]^2\,(2\Delta h + \Delta\eta)}{1 + 2k[L] + \kappa k^2[L]^2} = F_1\,\Delta h + F_2\,(2\Delta h + \Delta\eta), \tag{5.31}$$

where $\Delta h$ and $\Delta\eta$ are the binding enthalpy for either of the two binding sites and the cooperativity enthalpy, respectively. The concentration of any complex is calculated as follows:

$$
\begin{aligned}
[ML] &= [M]_T\,\frac{2k[L]}{1 + 2k[L] + \kappa k^2[L]^2} = [M]_T\,F_1 \\
[LML] &= [M]_T\,\frac{\kappa k^2[L]^2}{1 + 2k[L] + \kappa k^2[L]^2} = [M]_T\,F_2.
\end{aligned}
\tag{5.32}
$$

The comparison of the binding polynomial and the average excess molar enthalpy written in terms of the overall and the microscopic parameters provides links between the overall and the microscopic parameters. In the case of a macromolecule with two identical and independent binding sites:

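$$
\begin{aligned}
\beta_1 &= 2k \\
\beta_2 &= k^2 \\
\Delta H_1 &= \Delta h \\
\Delta H_2 &= 2\Delta h,
\end{aligned}
\tag{5.33}
$$

and the microscopic parameters can be calculated from the overall parameters:

$$
\begin{aligned}
k &= \frac{\beta_1}{2} = \sqrt{\beta_2} \\
\Delta h &= \Delta H_1 = \frac{\Delta H_2}{2}.
\end{aligned}
\tag{5.34}
$$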

In the case of a macromolecule with two nonidentical and independent binding sites:

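$$
\begin{aligned}
\beta_1 &= k_1 + k_2 \\
\beta_2 &= k_1k_2 \\
\Delta H_1 &= \frac{k_1\,\Delta h_1 + k_2\,\Delta h_2}{k_1 + k_2} \\
\Delta H_2 &= \Delta h_1 + \Delta h_2,
\end{aligned}
\tag{5.35}
$$

and the microscopic parameters can be calculated from the overall parameters:

$$
\begin{aligned}
k_1 &= \frac{\beta_1}{2}\left(1 + \sqrt{1 - \frac{4\beta_2}{\beta_1^2}}\right) \\
k_2 &= \frac{\beta_1}{2}\left(1 - \sqrt{1 - \frac{4\beta_2}{\beta_1^2}}\right) \\
\Delta h_1 &= \frac{(k_1 + k_2)\,\Delta H_1 - k_2\,\Delta H_2}{k_1 - k_2} \\
\Delta h_2 &= \frac{k_1\,\Delta H_2 - (k_1 + k_2)\,\Delta H_1}{k_1 - k_2}.
\end{aligned}
\tag{5.36}
$$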

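Finally, in the case of a macromolecule with two identical and cooperative binding sites:

$$
\begin{aligned}
\beta_1 &= 2k \\
\beta_2 &= \kappa k^2 \\
\Delta H_1 &= \Delta h \\
\Delta H_2 &= 2\Delta h + \Delta\eta,
\end{aligned}
\tag{5.37}
$$

and the microscopic parameters can be calculated from the overall parameters:

$$
\begin{aligned}
k &= \frac{\beta_1}{2} \\
\kappa &= \frac{4\beta_2}{\beta_1^2} \\
\Delta h &= \Delta H_1 \\
\Delta\eta &= \Delta H_2 - 2\Delta H_1.
\end{aligned}
\tag{5.38}
$$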

The preceding equations allow for estimation of the microscopic thermodynamic parameters from the macroscopic thermodynamic parameters once a specific model has been validated. Because model validation is sometimes difficult, it is advisable to publish the macroscopic association constants ($\beta_i$), which have universal validity.
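The conversions from macroscopic to microscopic parameters can be collected in a few small functions. The following Python sketch implements Eqs. (5.36) and (5.38); the function names, the variable name `d_eta` for the cooperativity enthalpy, and the numerical values are illustrative assumptions rather than fitted results.

```python
import math


def nonidentical_independent(beta1, beta2, dH1, dH2):
    """Eq. (5.36): returns (k1, k2, dh1, dh2).

    Requires 4*beta2/beta1**2 < 1 so that the two site constants are
    real and distinct.
    """
    disc = 1.0 - 4.0 * beta2 / beta1 ** 2
    if disc <= 0.0:
        raise ValueError("requires 4*beta2/beta1**2 < 1")
    root = math.sqrt(disc)
    k1 = 0.5 * beta1 * (1.0 + root)
    k2 = 0.5 * beta1 * (1.0 - root)
    dh1 = ((k1 + k2) * dH1 - k2 * dH2) / (k1 - k2)
    dh2 = (k1 * dH2 - (k1 + k2) * dH1) / (k1 - k2)
    return k1, k2, dh1, dh2


def identical_cooperative(beta1, beta2, dH1, dH2):
    """Eq. (5.38): returns (k, kappa, dh, d_eta)."""
    return beta1 / 2.0, 4.0 * beta2 / beta1 ** 2, dH1, dH2 - 2.0 * dH1


# Illustrative macroscopic parameters (M^-1, M^-2, kcal/mol, kcal/mol)
params = (1.0e7, 1.0e12, -10.0, -15.0)
print(nonidentical_independent(*params))
print(identical_cooperative(*params))
```

Both conversions start from the same macroscopic parameters, which illustrates why the macroscopic set is the model-independent quantity to report.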

9. An Experimental Example

The binding of ferric ions to ovotransferrin is an example of a reaction that involves two sites. Fig. 5.5 shows a microcalorimetric titration of ovotransferrin with ferric ions, chelated with nitrilotriacetate, at 25 °C in 100 mM HEPES, pH 7.5. Nonlinear least squares analysis of the experimental data using the binding polynomial formalism yields $\beta_1 = 1.7 \times 10^{6}\ \mathrm{M^{-1}}$, $\Delta H_1 = -7.4$ kcal/mol, $\beta_2 = 1.1 \times 10^{12}\ \mathrm{M^{-2}}$, and $\Delta H_2 = -12.2$ kcal/mol. The calculated value of $\rho_2$ is 1.5, suggesting positive cooperativity between the two sites, in agreement with earlier observations (Taniguchi et al., 1990). The enthalpy changes are also in good agreement with a previous calorimetric study (Lin et al., 1991). A cooperativity index $\rho_2$ of 1.5 is not very high and is consistent with a cooperative energy of only −0.2 kcal/mol and a degree of cooperativity of 10% (that is, the maximal population of the singly liganded state ML is 45%, and the Hill slope at half saturation is 1.1). In fact, the data can also be fitted to a model of two different and independent sites, albeit with a slightly larger sum of squared residuals. Lin and coworkers (Lin et al., 1991) also noticed that the model of two different independent sites could not produce a good fit unless the numbers of sites were allowed to vary independently of each other. On the other hand, because the two binding sites are different, compensatory effects are possible (i.e., the positive cooperativity might be higher than that observed, as a result of the masking effect of two sites with different intrinsic affinities). The results with ovotransferrin help to illustrate a frequently encountered situation. It is common for researchers to use a model with two different and independent binding sites as a starting point in the analysis of a system with two binding sites. This practice is dangerous because a model of different and independent binding sites cannot account for positive cooperativity. We recommend a fit with a binding polynomial as a starting point.

Figure 5.5 Binding of ferric ions, chelated with nitrilotriacetate, to ovotransferrin measured by isothermal titration calorimetry at 25 °C using a high-precision VP-ITC titration calorimetric system from MicroCal, LLC (Northampton, MA). The upper panel shows the raw data, dQ/dt (μcal/s) versus time (min), for the titration of 1.4 mL of 25 μM ovotransferrin with 1.03 mM ferric ions in steps of 5 μL (100 mM HEPES, pH 7.5). The lower panel shows the integrated heats per mole of Fe3+, Q (kcal/mol of injectant) versus [Fe3+-NTA]T/[OT]T, after subtraction of the heats of dilution (filled circles). The values for the affinities and enthalpy changes according to a binding polynomial for a molecule with two sites are obtained from a nonlinear regression of the data (solid line).

10. Experimental Situations from the Literature

Other examples of ITC studies of macromolecules with two ligand-binding sites can be found in the literature. Several illustrative cases corresponding to different scenarios have been selected and are shown in Fig. 5.6. The published experimental titration data were digitally extracted and analyzed following the procedure outlined in this work.

Figure 5.6 Experimental calorimetric titrations of macromolecules with two ligand-binding sites, plotted as Q (kcal/mol of injectant) versus molar ratio: (A) α-methyl fucoside binding to the fucose-binding lectin, RSL, from Ralstonia solanacearum, versus [α-methyl fucoside]T/[RSL]T (data extracted from Kostlanova et al., 2005); (B) nitrilotriacetate-chelated ferric ion binding to human transferrin, HT, versus [Fe(III)-NTA]T/[HT]T (data extracted from Lin et al., 1993); (C) cAMP binding to cAMP receptor protein, cAMP RP, from Escherichia coli, versus [cAMP]T/[cAMP RP]T (data extracted from Gorshkova et al., 1995); (D) telomere-binding alpha protein N-terminal domain, TBPan, from Oxytricha nova binding to a single-strand telomere fragment, versus [TBPan]T/[d(T4G4T4G4)]T (data extracted from Buczek and Horvath, 2006). The values for the affinities and enthalpy changes according to a binding polynomial for a molecule with two sites are obtained from a nonlinear regression of the data (solid line), and $\rho_2$ is calculated from the $\beta_i$ values.

In Fig. 5.6A, RSL is a fucose-binding lectin from Ralstonia solanacearum, a gram-negative β-proteobacterium causing lethal wilt in plants. RSL is a trimer, each monomer containing two carbohydrate-binding sites (Kostlanova et al., 2005). Nonlinear least squares analysis of the experimental data using the binding polynomial formalism yields $\beta_1 = 3.0 \times 10^{6}\ \mathrm{M^{-1}}$, $\Delta H_1 = -10.1$ kcal/mol, $\beta_2 = 2.4 \times 10^{12}\ \mathrm{M^{-2}}$, and $\Delta H_2 = -19.1$ kcal/mol. Although the binding sites in each subunit are slightly different, they can be considered identical and independent for α-methyl fucoside binding ($\rho_2 = 1.1$). Data analysis with any model other than identical and independent binding sites results in overparameterization and high parameter dependency.

In Fig. 5.6B, human transferrin, HT, is structurally very similar to ovotransferrin: two different structural domains, each containing an iron-binding site. Nitrilotriacetate-chelated ferric ion binding to human transferrin exhibits the same features as binding to ovotransferrin (Lin et al., 1993). Nonlinear least squares analysis of the experimental data using the binding polynomial formalism yields $\beta_1 = 1.0 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = -9.5$ kcal/mol, $\beta_2 = 5.4 \times 10^{13}\ \mathrm{M^{-2}}$, and $\Delta H_2 = -15.0$ kcal/mol. As in the case of ovotransferrin, the two binding sites are slightly different and iron binding shows low positive cooperativity ($\rho_2 = 2.2$). This cooperative effect corresponds to a cooperative Gibbs energy of −0.5 kcal/mol and a degree of cooperativity of 17% (i.e., the maximal population of the singly liganded state ML is 41.5%, and the Hill slope at half saturation is 1.17). Analysis using a model of two different and independent sites is also possible, but it gives a poorer fit.

Fig. 5.6C shows cAMP receptor protein, cAMP RP, a homodimeric protein that binds DNA after undergoing a conformational change induced by cAMP binding. Each identical subunit presents a cAMP-binding domain and a DNA-binding domain (Gorshkova et al., 1995). Nonlinear least squares analysis of the experimental data using the binding polynomial formalism yields $\beta_1 = 5.5 \times 10^{4}\ \mathrm{M^{-1}}$, $\Delta H_1 = 1.9$ kcal/mol, $\beta_2 = 4.2 \times 10^{9}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 9.9$ kcal/mol. cAMP binding to cAMP receptor protein shows positive cooperativity ($\rho_2 = 5.4$). This cooperative effect corresponds to a cooperative Gibbs energy of −1 kcal/mol and a degree of cooperativity of 40% (i.e., the maximal population of the singly liganded state ML is 30%, and the Hill slope at half saturation is 1.4). If the analysis is performed using the model with two nonidentical and independent binding sites, the result is a poorer fit with fractional stoichiometries. In this case, however, there is no reason for using such a model, as the protein is known to be a homodimer with two binding sites that are identical in the absence of ligand.

Fig. 5.6D shows how telomere-binding alpha protein N-terminal domain, TBPan, from the ciliate Oxytricha nova binds to single-strand DNA repeats at telomeres (Buczek and Horvath, 2006). Nonlinear least squares analysis of the experimental data using the binding polynomial formalism yields $\beta_1 = 2.5 \times 10^{7}\ \mathrm{M^{-1}}$, $\Delta H_1 = 3.4$ kcal/mol, $\beta_2 = 3.3 \times 10^{12}\ \mathrm{M^{-2}}$, and $\Delta H_2 = 2.5$ kcal/mol. TBPan binding to the single-strand telomere fragment d(T4G4T4G4) is consistent with either two binding sites in the fragment with negative cooperativity or two nonidentical and independent binding sites ($\rho_2 = 0.022$). If the cooperative model is assumed, the cooperative effect corresponds to a cooperative Gibbs energy of +2.1 kcal/mol and a degree of cooperativity of 74% (i.e., the maximal population of the singly liganded state ML is 87%, and the Hill slope at half saturation is 0.26). If the model with two nonidentical and independent binding sites is assumed, the affinities of the two binding sites differ by a factor of approximately 200. Given that the DNA fragment is not completely symmetric (end effects), the model with nonidentical and independent binding sites would be preferred. However, given the small size of the DNA fragment, steric or other unfavorable interactions responsible for negative cooperativity may arise when two proteins are bound to the same oligonucleotide.

11. Macromolecule with Three Ligand-Binding Sites

As an additional example, three different possible models for a macromolecule with three ligand-binding sites are illustrated in Fig. 5.7. In this case, a set of two parameters, $\rho_2$ and $\rho_3$ (see Eq. (5.11)), provides information about the behavior of the binding sites in the macromolecule. Data analysis based on the binding polynomial formalism using overall association constants is fairly simple and straightforward. It requires solving an $(n + 1)$th-order polynomial equation in the free ligand concentration, which can easily be done numerically (e.g., by the Newton-Raphson, secant, or bisection methods) for $n \geq 2$ using commercially available software.
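The numerical step described above can be sketched in a few lines of Python. Rather than expanding the polynomial explicitly, this sketch applies bisection directly to the ligand mass balance, $[L]_T = [L] + [M]_T\,n_{LB}([L])$, which has the same root; the mass-balance form, the function names, and the tolerance are assumptions made for illustration.

```python
from typing import List, Sequence


def free_ligand(betas: Sequence[float], M_T: float, L_T: float,
                tol: float = 1e-12, max_iter: int = 200) -> float:
    """Solve L_T = [L] + M_T * n_LB([L]) for the free ligand by bisection.

    betas = [beta_1, ..., beta_n] are the overall association constants.
    """
    def n_lb(L: float) -> float:
        terms = [b * L ** (i + 1) for i, b in enumerate(betas)]
        P = 1.0 + sum(terms)                      # binding polynomial
        return sum((i + 1) * t for i, t in enumerate(terms)) / P

    lo, hi = 0.0, L_T      # the root is bracketed: 0 <= [L] <= L_T
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # The left-hand side increases monotonically with [L]
        if mid + M_T * n_lb(mid) > L_T:
            hi = mid
        else:
            lo = mid
        if hi - lo <= tol * max(L_T, 1e-300):
            break
    return 0.5 * (lo + hi)


def fractions(betas: Sequence[float], L: float) -> List[float]:
    """Populations F_0 ... F_n of M, ML, ..., ML_n at free ligand L."""
    terms = [1.0] + [b * L ** (i + 1) for i, b in enumerate(betas)]
    P = sum(terms)
    return [t / P for t in terms]
```

Bisection is slower than Newton-Raphson but cannot leave the physically meaningful interval, which makes it a convenient default inside a regression loop.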

| Liganded state | General model | Identical independent | Nonidentical independent | Identical cooperative |
|---|---|---|---|---|
| $M$ | 1 | 1 | 1 | 1 |
| $ML$ | $\beta_1[L]$ | $3k[L]$ | $k_1[L] + k_2[L] + k_3[L]$ | $3k[L]$ |
| $ML_2$ | $\beta_2[L]^2$ | $3k^2[L]^2$ | $k_1k_2[L]^2 + k_1k_3[L]^2 + k_2k_3[L]^2$ | $3\kappa_1 k^2[L]^2$ |
| $ML_3$ | $\beta_3[L]^3$ | $k^3[L]^3$ | $k_1k_2k_3[L]^3$ | $\kappa_1^3\kappa_2 k^3[L]^3$ |

Figure 5.7 Scheme for computing the binding polynomial for a macromolecule with three binding sites. The different liganded states are shown with their statistical factor or relative concentration, taking the free macromolecule as the reference state. The sum of the different terms in each column provides the expression of the binding polynomial for each model.

12. Conclusions

The binding polynomial provides a general framework for describing ligand-binding equilibria to biological macromolecules using tools from statistical thermodynamics. The methodology can be easily applied to nearly any kind of system. Binding experiments can be analyzed using a model-free methodology, which allows determination of phenomenological overall macroscopic association constants, $\beta_i$, and binding enthalpies, $\Delta H_i$. The particular values of a set of parameters, $\rho_i$, provide information on the ligand-binding process: identical or nonidentical, independent or cooperative binding sites. Once a binding model is developed, the microscopic binding parameters, $k_i$ and $\Delta h_i$, and their relationships with the macroscopic binding parameters can be employed to describe in detail the ligand binding to the macromolecule. These relationships can be derived for more complicated cases using the procedures outlined in this chapter.

Binding cooperativity in a macromolecule with several binding sites emerges as a result of interactions between ligand-binding sites and/or a conformational equilibrium, modulated by ligand binding, between conformations with different ligand-binding affinities (Wyman and Gill, 1990). Cooperativity is reflected in a dependency of the thermodynamic binding parameters for each binding site on the occupancy of the other binding sites. Homotropic interactions occur if the binding sites bind the same type of ligand, whereas heterotropic interactions occur if the binding sites bind different types of ligand (Velazquez-Campoy et al., 2006). This chapter has addressed the description and analysis of homotropic interactions only. The description and analysis of heterotropic interactions is rather more complex, even for simple systems (Velazquez-Campoy et al., 2006).

Appendix

For a protein with two binding sites the number of ligand molecules bound per macromolecule is given by:

$$n_{LB} = \frac{\beta_1[L] + 2\beta_2[L]^2}{1 + \beta_1[L] + \beta_2[L]^2}, \tag{5.39}$$

and the concentration of ligand required for achieving half saturation is given by:

$$[L]_{n_{LB}=1} = \frac{1}{\sqrt{\beta_2}}. \tag{5.40}$$

The fractional population of the complex ML is given by:

$$\frac{[ML]}{[M]_T} = \frac{\beta_1[L]}{1 + \beta_1[L] + \beta_2[L]^2}, \tag{5.41}$$

which at half saturation takes the value:

$$\left.\frac{[ML]}{[M]_T}\right|_{n_{LB}=1} = \frac{\dfrac{\beta_1}{\sqrt{\beta_2}}}{2 + \dfrac{\beta_1}{\sqrt{\beta_2}}}. \tag{5.42}$$

Traditionally, two indexes have been used for expressing numerically the degree of cooperativity: the binding capacity ($\partial n_{LB}/\partial \ln[L]$) (Di Cera et al., 1988; Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990) and the Hill slope, $n_H$, which is the slope of the Hill plot ($\log(n_{LB}/(2 - n_{LB}))$ vs. $\log[L]$) (Hill, 1910; Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990). The binding capacity is a measure of the ability of the macromolecule to accept or deliver large quantities of ligand for relatively small changes in ligand concentration, and it is equal to the fluctuation, or variance, in the number of ligand molecules bound to the macromolecule (Di Cera et al., 1988; Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990). The parameter $n_H$ is equal to the ratio between the observed binding capacity and the one corresponding to identical and independent binding sites (Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990). Both parameters represent a measure of the efficiency of the biochemical signal transduction (response of the system to changes in ligand concentration). Because the binding capacity and the Hill slope are functions of the ligand concentration, they are usually reported as values at half saturation. For a macromolecule with two ligand-binding sites, the binding capacity is given by:

$$\frac{\partial n_{LB}}{\partial \ln[L]} = \frac{\beta_1[L] + 4\beta_2[L]^2 + \beta_1\beta_2[L]^3}{\left(1 + \beta_1[L] + \beta_2[L]^2\right)^2}, \tag{5.43}$$

which takes values between 0 (maximal negative cooperativity) and 1 (maximal positive cooperativity), and at half saturation takes the value (Wyman, 1967):

$$\left.\frac{\partial n_{LB}}{\partial \ln[L]}\right|_{n_{LB}=1} = \frac{2}{2 + \dfrac{\beta_1}{\sqrt{\beta_2}}}. \tag{5.44}$$

The binding capacity is related to the Hill slope (Schellman, 1990; Wyman, 1964; Wyman and Gill, 1990):

$$\frac{\partial n_{LB}}{\partial \ln[L]} = n_H\,n_{LB}\left(1 - \frac{n_{LB}}{2}\right). \tag{5.45}$$

The Hill slope takes values between 0 (maximal negative cooperativity) and 2 (maximal positive cooperativity), and at half saturation it is twice the binding capacity:

$$\left.\frac{\partial n_{LB}}{\partial \ln[L]}\right|_{n_{LB}=1} = \frac{1}{2}\,\left. n_H\right|_{n_{LB}=1}. \tag{5.46}$$

The binding capacity and the slope of the Hill plot at half saturation give qualitative and quantitative information on the ligand-binding process: (1) if the binding sites are identical and independent, these two parameters have values of 0.5 and 1, respectively; (2) if the binding sites are nonidentical and independent, or show negative cooperativity, they have values less than 0.5 and 1, respectively; and (3) if the binding sites show positive cooperativity, they have values greater than 0.5 and 1, respectively. In addition, the relative deviation of these two parameters from the values corresponding to identical and independent sites coincides with the relative deviation of the population of intermediate liganded states, as shown subsequently. The relative change in the fractional population of complex ML at half saturation taking as a reference the case with identical and independent binding sites is given by:

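$$\frac{\left.\dfrac{[ML]}{[M]_T}\right|_{n_{LB}=1,\,\mathrm{ident+indep}} - \left.\dfrac{[ML]}{[M]_T}\right|_{n_{LB}=1}}{\left.\dfrac{[ML]}{[M]_T}\right|_{n_{LB}=1,\,\mathrm{ident+indep}}} = \frac{2 - \dfrac{\beta_1}{\sqrt{\beta_2}}}{2 + \dfrac{\beta_1}{\sqrt{\beta_2}}} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}, \qquad (5.47)$$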

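where we have used the identity κ = 4β2/β1². On the other hand, the relative change in the binding capacity (or the Hill slope, as they are proportional) at half saturation, taking as a reference the case with identical and independent binding sites, is also:

$$\frac{\left.\dfrac{\partial n_{LB}}{\partial \ln[L]}\right|_{n_{LB}=1} - \left.\dfrac{\partial n_{LB}}{\partial \ln[L]}\right|_{n_{LB}=1,\,\mathrm{ident+indep}}}{\left.\dfrac{\partial n_{LB}}{\partial \ln[L]}\right|_{n_{LB}=1,\,\mathrm{ident+indep}}} = \frac{\left. n_H \right|_{n_{LB}=1} - \left. n_H \right|_{n_{LB}=1,\,\mathrm{ident+indep}}}{\left. n_H \right|_{n_{LB}=1,\,\mathrm{ident+indep}}} = \frac{2 - \dfrac{\beta_1}{\sqrt{\beta_2}}}{2 + \dfrac{\beta_1}{\sqrt{\beta_2}}} = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}. \qquad (5.48)$$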

Therefore, the relative change of the binding capacity or the Hill slope at half saturation, taking the values of 0.5 and 1.0 as reference values (for the case of identical and independent binding sites), is equal to the relative change in the fractional population of the intermediate liganded state ML at half saturation, taking a fractional population of 50% as a reference value (for the case of identical and independent binding sites). Thus, the connection between cooperativity and changes in the population of intermediate states is quantitative (Eqs. (5.44)–(5.48)): positive cooperativity (κ > 1) causes a reduction in the fractional population of ML, whereas negative cooperativity (κ < 1) causes an increase in the fractional population of ML, relative to the 50% population at half saturation for identical and independent binding sites. For example, if a macromolecule with two ligand-binding sites exhibits a Hill slope of 1.35 at half saturation, then the binding sites present 35% cooperativity, and the maximal population of the intermediate liganded state ML is 32.5% (0.65 × 50%) at half saturation. Furthermore, the cooperative association constant is κ = 4.3, and the corresponding cooperative Gibbs energy is −0.9 kcal/mol.
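The arithmetic of this worked example can be retraced with a short script. This is only an illustrative sketch: it assumes a temperature of 25 °C and the usual definition of the cooperative Gibbs energy as −RT ln κ, neither of which is restated in this passage, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal K^-1 mol^-1
T_KELVIN = 298.15  # assumed temperature (25 °C)

def cooperativity_from_hill(n_h):
    """Return (relative change, kappa, dG in kcal/mol, ML fraction) at half saturation.

    Inverts rel = (sqrt(kappa) - 1) / (sqrt(kappa) + 1), Eqs. (5.47)-(5.48),
    with the Hill slope reference value of 1 for identical, independent sites.
    """
    rel = n_h - 1.0
    sqrt_kappa = (1.0 + rel) / (1.0 - rel)
    kappa = sqrt_kappa**2
    dg = -R_KCAL * T_KELVIN * math.log(kappa)  # assumed definition: -RT ln(kappa)
    ml_fraction = 0.5 * (1.0 - rel)            # reference population is 50%
    return rel, kappa, dg, ml_fraction

rel, kappa, dg, ml = cooperativity_from_hill(1.35)
print(f"relative change = {rel:.2f}, kappa = {kappa:.1f}, "
      f"dG = {dg:.1f} kcal/mol, [ML]/[M]T = {ml:.3f}")
```

Under these assumptions the script returns the values quoted in the text (κ of about 4.3, roughly −0.9 kcal/mol, and an ML population of 32.5%).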

ACKNOWLEDGMENT

We acknowledge financial support from grant SAF2004-07722 (Ministry of Education and Science) to A.V.-C., and grants from the National Institutes of Health (GM56550 and GM57144) and the National Science Foundation (MCB0641252) to E.F. A.V.-C. was supported by a Ramón y Cajal Research Contract from the Spanish Ministry of Science and Technology, and Fundación Aragón I+D (Diputación General de Aragón).

REFERENCES

Ackers, G. K., and Holt, J. M. (2006). Asymmetric cooperativity in a symmetric tetramer: Human hemoglobin. J. Biol. Chem. 281, 11441–11443.
Adair, G. S. (1925). The hemoglobin system, VI: The oxygen dissociation curve of hemoglobin. J. Biol. Chem. 63, 529–545.
Buczek, P., and Horvath, M. P. (2006). Thermodynamic characterization of binding Oxytricha nova single strand telomere DNA with the alpha protein n-terminal domain. J. Mol. Biol. 359, 1217–1234.
Di Cera, E., Gill, S. J., and Wyman, J. (1988). Binding capacity: Cooperativity and buffering in biopolymers. Proc. Natl. Acad. Sci. USA 85, 449–452.
Gorshkova, I., Moore, J. L., McKenney, K. H., and Schwarz, F. P. (1995). Thermodynamics of cyclic nucleotide binding to the cAMP receptor protein and its T127L mutant. J. Biol. Chem. 270, 21679–21683.
Hill, A. V. (1910). The possible effects of the aggregation of the molecules of hemoglobin on the dissociation curves. J. Physiol. (London) 40, iv–vii.
Holt, J. M., and Ackers, G. K. (2005). Asymmetric distribution of cooperativity in the binding cascade of normal human hemoglobin. 2. Stepwise cooperative free energy. Biochemistry 44, 11939–11949.
Kostlanova, N., Mitchell, E. P., Lortat-Jacob, H., Oscarson, S., Lahmann, M., Gilboa-Garber, N., Chambat, G., Wimmerova, M., and Imberty, A. (2005). The fucose-binding lectin from Ralstonia solanacearum. A new type of β-propeller architecture formed by oligomerization and interacting with fucoside, fucosyllactose, and plant xyloglucan. J. Biol. Chem. 280, 27839–27849.
Krell, T., Teran, W., Lopez-Mayorga, O., Rivas, G., Jimenez, M., Daniels, C., Molina-Henares, A. J., Martinez-Bueno, M., Gallegos, T., and Ramos, J. L. (2007). Optimization of the palindromic order of the TtgR operator enhances binding cooperativity. J. Mol. Biol. 369, 1188–1199.
Lin, L.-N., Mason, A. B., Woodworth, R. C., and Brandts, J. F. (1991). Calorimetric studies of the binding of ferric ions to ovotransferrin and interactions between binding sites. Biochemistry 30, 11660–11669.
Lin, L.-N., Mason, A. B., Woodworth, R. C., and Brandts, J. F. (1993). Calorimetric studies of the binding of ferric ions to human serum transferrin. Biochemistry 32, 9398–9406.
Schellman, J. (1975). Macromolecular binding. Biopolymers 14, 999–1018.
Schellman, J. (1990). Fluctuation and linkage relations in macromolecular solution. Biopolymers 29, 215–224.
Taniguchi, T., Ichimura, K., Kawashima, S., Yamamura, T., Tachi'iri, Y., Satake, K., and Kihara, H. (1990). Binding of Cu(II), Tb(III) and Fe(III) to chicken ovotransferrin. A kinetic study. Eur. Biophys. J. 18, 1–8.
Tochtrop, G. P., Richter, K., Tang, C., Toner, J. J., Covey, D. F., and Cistola, D. P. (2002). Energetics by NMR: Site-specific binding in a positively cooperative system. Proc. Natl. Acad. Sci. USA 99, 1847–1852.
Velazquez-Campoy, A., Goñi, G., Peregrina, J. R., and Medina, M. (2006). Exact analysis of heterotropic interactions in proteins: Characterization of cooperative ligand binding by isothermal titration calorimetry. Biophys. J. 91, 1887–1904.
Wyman, J. (1948). Heme proteins. Adv. Prot. Chem. 4, 407–531.
Wyman, J. (1964). Linked functions and reciprocal effects in hemoglobin: A second look. Adv. Prot. Chem. 19, 223–286.
Wyman, J. (1967). Allosteric linkage. J. Am. Chem. Soc. 89, 2202–2218.
Wyman, J., and Gill, S. J. (1990). "Binding and linkage: Functional chemistry of biological macromolecules." University Science Books, Mill Valley, CA, USA.
Wyman, J., and Phillipson, P. (1974). A probabilistic approach to cooperativity of ligand binding by a polyvalent molecule. Proc. Natl. Acad. Sci. USA 71, 3431–3434.

CHAPTER SIX

Kinetic and Equilibrium Analysis of the Myosin ATPase

Enrique M. De La Cruz* and E. Michael Ostap†

Contents
1. Introduction
2. Reagents and Equipment Used for all Assays
3. Steady-State ATPase Activity of Myosin
3.1. High salt ATPase activity of myosin
3.2. Actin-activated Mg²⁺-ATPase activity of myosin
4. Steady-State Measurement of Actomyosin Binding Affinities
4.1. Sedimentation assays
4.2. Pyrene fluorescence measurements
5. Transient Kinetic Analysis of the Individual ATPase Cycle Transitions
5.1. Myosin binding to and dissociation from actin
5.2. ATP binding to actomyosin
5.3. ATP binding and hydrolysis by myosin
5.4. Actin-activated Pi release
5.5. ADP release
6. Kinetic Simulations
Acknowledgments
References

Abstract

The myosin superfamily consists of more than 35 classes (each consisting of multiple isoforms) that have diverse cellular activities. The reaction pathway of the actin-activated myosin ATPase appears to be conserved for all myosin isoforms, but the rate and equilibrium constants that define the ATPase pathway vary significantly across the myosin superfamily, resulting in kinetic differences that allow myosins to carry out diverse mechanical functions. Therefore, it is important to determine the lifetimes and relative populations of the key biochemical intermediates to obtain an understanding of a particular myosin's cellular function. This chapter provides procedures for determining the overall and individual rate and equilibrium constants of the actomyosin ATPase cycle, including actomyosin binding and dissociation, ATP binding, ATP hydrolysis, phosphate release, and ADP release and binding. Many of the methods described in the chapter are applicable to the characterization of other ATPase enzymes.

* Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
† Department of Physiology, Pennsylvania Muscle Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04206-7. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Myosins are motor proteins that use ATP hydrolysis to generate force and power motility along actin filaments. The myosin superfamily consists of more than 35 classes, each consisting of multiple isoforms, which have diverse cellular activities (Berg et al., 2001; Foth et al., 2006). The reaction pathway for the actomyosin ATPase cycle appears to be conserved for all myosin isoforms (De La Cruz and Ostap, 2004); that is, the kinetic intermediates, and the order in which these intermediates are populated, are the same (Fig. 6.1). In the absence of ATP, myosin binds tightly to actin. ATP binding induces a conformational change in myosin that weakens its actin affinity and causes myosin to detach from actin. ATP is hydrolyzed to ADP and inorganic phosphate (Pi), and the hydrolysis products remain bound to myosin. Myosin rebinds to actin, and the force-generating power stroke accompanies subsequent phosphate release. ADP is released, and the cycle repeats upon ATP binding (Bagshaw et al., 1974; Geeves and Holmes, 1999; Johnson and Taylor, 1978; Lymn and Taylor, 1971; Rosenfeld and Taylor, 1987).

The rate and equilibrium constants that define the ATPase pathway vary significantly across the myosin superfamily, resulting in kinetic differences that allow myosins to carry out diverse mechanical functions (De La Cruz and Ostap, 2004; De La Cruz et al., 1999, 2001; El Mezgueldi et al., 2002; Kovacs et al., 2003). Therefore, it is important to determine the lifetimes and relative populations of the key biochemical intermediates to obtain an understanding of a particular myosin's cellular role. The rates of the conversion between intermediates must be determined by transient kinetic analysis, and steady-state properties must be determined in terms of rate and equilibrium constants. This chapter provides procedures for determining the key rate and equilibrium constants of the actin-activated myosin ATPase outlined in Fig. 6.1.

[Figure 6.1: reaction scheme connecting the intermediates A.M, A.M(ATP), A.M.ATP, M.ATP, M.ADP.Pi, A.M.ADP.Pi, A.M.ADP, A.M(ADP), and M, with the constants K′1T, K′2T, Kdiss, KH, KAPi, K′Pi, K′2D, K′1D, and KA. Boxed labels: Sections 3 and 4, steady-state ATPase and binding measurements; Section 5A, actomyosin binding; Section 5B, ATP binding and actomyosin dissociation; Section 5C, ATP hydrolysis; Section 5D, Pi release; Section 5E, ADP release.]

Figure 6.1 The minimum actomyosin ATPase cycle reaction scheme. The individual reaction steps and corresponding sections in which they are described are boxed and colored. For clarity, we have omitted some of the biochemical intermediates that are not significantly populated during steady-state ATP cycling in the presence of actin. The rate and equilibrium constants are defined as the reaction proceeds from left to right.

Single-molecule and actin-filament gliding assays provide valuable information regarding force generation and work output (i.e., motility) driven by myosin motors and can reveal how external loads affect specific ATPase cycle transitions (Laakso et al., 2008; Oguchi et al., 2008; Uemura et al., 2004; Veigel et al., 2003). However, we limit our discussion in this chapter to the experimental determination of the rate and equilibrium constants governing ATP utilization in solution, in the absence of external load. Although the assays described in the chapter are optimized for studying actomyosin interactions, many of the methods described are applicable to the characterization of other ATPase enzymes.

2. Reagents and Equipment Used for all Assays

A. Solution conditions and temperature: We typically perform experiments in KMg50 buffer (50 mM KCl, 2 mM MgCl2, 1 mM EGTA, 2 mM dithiothreitol, and 10 mM imidazole, pH 7.0) at 25 °C so that we can compare the behavior of different myosin types. Similar experimental conditions have been used to measure numerous non-muscle myosins including myosin-Is (El Mezgueldi et al., 2002; Lewis et al., 2006; Ostap and Pollard, 1996), myosin-III (Dose et al., 2007), chicken myosin V (De La Cruz et al., 1999), porcine myosin VI (De La Cruz et al., 2001; Robblee et al., 2004, 2005), and human myosin VIIb (Henn and De La Cruz, 2005), but the relatively high ionic strength makes experiments with some myosins difficult.

B. Actin: Actin is purified from rabbit skeletal muscle (Spudich and Watt, 1971), pyrene-labeled with pyrenyl iodoacetamide as needed (Kouyama and Mihashi, 1981), gel-filtered over Sephacryl S-300HR equilibrated in G-buffer (5 mM Tris (pH 8.0), 0.2 mM ATP, 0.5 mM DTT, 1 mM NaN3, and 0.1 mM CaCl2), and stored at 4 °C. The bound Ca²⁺ should be exchanged by adding 0.2 mM EGTA and 50 μM MgCl2 (excess over [total actin]) immediately prior to polymerization by dialysis against KMg50 buffer. The dialysis step polymerizes actin and also removes free ATP, which could interfere with some of the assays. It is therefore important to ensure proper dialysis so that the final polymerized actin sample has essentially no free ATP. Phalloidin (1.1 molar equivalents) should be added to stabilize polymerized actin filaments. This stock should be stored at 4 °C and used within 2 days.

C. Myosin: Kinetic assays typically use myosin fragments prepared by proteolysis or recombinant myosins that contain only the motor and regulatory domains. Myosins that form filaments or higher-order oligomers may be problematic due to protein aggregation and lack of solubility, which complicate data analysis. Some myosins have very high nucleotide affinities, resulting in the copurification of ATP or ADP. Apyrase (grade VII, <0.1 units mL⁻¹) can be added to stock solutions to convert contaminating ADP and ATP to AMP. When using apyrase, ensure that low concentrations are used so as not to interfere with the kinetic measurements.

D. Nucleotides: We prepare stock solutions of nucleotides from dry powder of the free acid form, adjust the pH to 7.0 with KOH, and store as 20–100 mM solutions at −20 °C or −80 °C. Nucleotide concentrations are determined by absorbance using molar extinction coefficients (ε) of 15,400 M⁻¹ cm⁻¹ at 259 nm for unlabeled nucleotides (ATP and ADP), and 23,300 M⁻¹ cm⁻¹ at 255 nm for mant-labeled nucleotides (mantATP and mantADP, both mixed and single 2′-deoxy or 3′-deoxy isomers). We add a molar equivalent of MgCl2 to nucleotide solutions immediately before use. When characterizing myosins with high ADP affinities, it is critical to purify ATP from contaminating ADP (1%–2%) by HPLC (De La Cruz et al., 2000a). Similarly, it is critical to purify ATPγS from contaminating ADP generated by spontaneous hydrolysis immediately before use (Yengo et al., 2002).

E. Fluorescence spectrophotometer: Equilibrium-binding measurements that monitor changes in fluorescence or light scattering require an instrument with fluorescence detection capabilities. Numerous instruments are commercially available for measuring samples either in optical cuvettes or in multiwell plates. Instruments equipped with excitation and emission monochromators are best because they permit evaluation of changes in wavelength as well as intensity, but instruments equipped with optical filters are also adequate provided the changes in intensity are significant. These and all instruments used for kinetic and equilibrium analysis should be temperature controlled, as many of the ATPase cycle rate and equilibrium-binding constants have significant enthalpic components and are therefore sensitive to temperature.

F. Stopped-flow apparatus: Real-time acquisition of reaction time courses with millisecond time resolution is needed to measure the myosin ATPase cycle rate constants, as many of the experimentally observed and elementary rate constants are typically on the order of several hundred per second or faster. A variety of rapid mixers with absorbance and fluorescence detection are commercially available. Those driven by compressed air or stepper motors offer the most efficient perturbation and rapid mixing time. We have used instruments from KinTek, Applied Photophysics, and Hi-Tech, as well as in-house assembled instruments, with satisfactory and reproducible results. Actin-activated Pi release from myosin is measured following two mixing events (first with ATP, then with actin; discussed subsequently) and requires an instrument capable of performing experiments in a sequential mixing configuration. Manually driven rapid mixers that can be used with absorbance or fluorescence spectrophotometers are also available (De La Cruz and Pollard, 1994), but they are subject to longer mixing and dead times and are therefore only adequate for measuring slow reactions (<100 s⁻¹). In addition, manually driven mixers require more material than the conventional stopped-flow instruments and are therefore not practical for characterization of myosin motors that can be purified only in small quantities.

G. Quenched-flow apparatus: Several rapid mixing chemical-quench-flow instruments are commercially available. As with the stopped-flow, millisecond time resolution is needed to measure rapid rates and rate constants. We use the KinTek Model RQF-3 instrument.
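The nucleotide concentration determination in item D is a direct application of the Beer-Lambert law. A minimal sketch follows, using the extinction coefficients quoted above; the function name, the dilution factor, and the example absorbance reading are hypothetical and serve only to illustrate the calculation.

```python
# Molar extinction coefficients quoted in the text (M^-1 cm^-1)
EPSILON = {
    "ATP": 15400.0,      # unlabeled ATP or ADP, read at 259 nm
    "ADP": 15400.0,
    "mantATP": 23300.0,  # mant-labeled nucleotides, read at 255 nm
    "mantADP": 23300.0,
}

def stock_concentration_mM(absorbance, nucleotide, dilution_factor, path_cm=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), scaled back by the dilution."""
    conc_molar = absorbance / (EPSILON[nucleotide] * path_cm)
    return conc_molar * dilution_factor * 1000.0  # M -> mM

# Hypothetical reading: stock diluted 1:1000, A259 = 0.77 in a 1-cm cuvette
print(stock_concentration_mM(0.77, "ATP", dilution_factor=1000))
```

A reading within the linear range of the spectrophotometer is assumed; the path length is kept as a parameter because microcuvettes often differ from 1 cm.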

3. Steady-State ATPase Activity of Myosin

3.1. High salt ATPase activity of myosin

The ATPase activities of some myosins are activated by the presence of high salt and divalent cation chelators in the absence of actin (Chalovich and Eisenberg, 1982). These assays measure nonphysiological ATPase activities but are useful for determining the number of active myosins in a preparation and for measuring relative myosin concentrations in binding assays (e.g., Section 4). The following procedure detects the steady-state hydrolysis of radioactive ATP and has the advantage of being insensitive to phosphate-contaminated myosin samples (Lynch et al., 1991). Nonradioactive methods for measuring phosphate are also available (Pollard, 1982), including commercially available detection methods (Lanzetta et al., 1979; Webb, 1992).

Stock solutions
1. 2× K⁺/EDTA assay solution (1.0 M KCl, 30 mM Tris, pH 7.5, 10 mM EDTA) or 2× NH4⁺/EDTA assay solution (0.8 M NH4Cl, 50 mM Tris, pH 7.5, 70 mM EDTA).
2. Isobutanol-benzene, 1:1 mixture.
3. Silicotungstic-sulfuric acid. Mix 2 parts of 10 N sulfuric acid with 5 parts of 6% aqueous silicotungstic acid.

Working solutions (freshly made)
1. Stop mixture. For each reaction tube, mix 1 mL of the isobutanol-benzene mixture with 0.25 mL of the silicotungstic-sulfuric acid mixture. This is most easily done using repipetter bottles. Aqueous and organic phases will form in each tube, with the organic phase on top.
2. Ammonium molybdate (10% solution). Dissolve 1 g of ammonium molybdate in 10 mL of water.
3. ATPase assay solution. Prepare by adding 1 mM [γ-³²P]ATP (1 Ci mol⁻¹) to the 2× assay solution and diluting to 1×, keeping in mind the volume of myosin to be added (see subsequent sections).

Method
1. ATPase reaction: Equilibrate the ATPase reaction mixture, from which time points will be removed, to the experimental temperature. Initiate the reaction by adding myosin.
2. Time points: Every minute for 5 min, remove 100 μL of the ATPase reaction, quench by adding to a predispensed stop-mixture tube, vortex vigorously for 10 s, add 100 μL of the ammonium molybdate solution, and vortex for 10 s.
3. Separate phases: Separate the organic and aqueous phases by centrifuging briefly for 1 min at 1000 × g. This step minimizes background counts from unhydrolyzed ATP.
4. Determine phosphate concentration: Transfer 500 μL of the organic phase (the top phase, which contains free phosphate) to a scintillation vial with the appropriate scintillation fluid. Determine the specific radioactivity of the [γ-³²P]ATP by adding aliquots of the ATPase reaction solution directly to scintillation vials.
5. Determine the ATPase rate: The slope of a plot of the phosphate concentration as a function of time yields the ATPase rate of the reaction mixture in units of Pi liberated per unit time, typically seconds. Dividing the ATPase rate by the concentration of myosin yields the concentration-normalized ATPase rate in units of ATP hydrolyzed s⁻¹ myosin⁻¹.
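Steps 4 and 5 of the method amount to a unit conversion followed by a linear fit. The sketch below is illustrative only: it assumes that extraction of the phosphomolybdate complex is quantitative, that the organic phase has a volume of 1 mL (so the 500-μL aliquot is half of it), and that the specific activity has been expressed as counts per minute per nmol of ATP. All numerical values and function names are hypothetical.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y versus x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def counts_to_pi_uM(cpm, cpm_per_nmol, counted_fraction=0.5, sample_uL=100.0):
    """Convert counts in the organic-phase aliquot to [Pi] in the reaction (uM).

    counted_fraction is the share of the organic phase that was counted;
    0.5 assumes 500 uL taken from a 1-mL organic phase (an assumption).
    """
    nmol_pi = cpm / counted_fraction / cpm_per_nmol
    return nmol_pi / sample_uL * 1000.0  # nmol per uL equals mM; convert to uM

# Hypothetical data: one time point per minute for 5 min, 0.1 uM myosin
times_s = [60.0, 120.0, 180.0, 240.0, 300.0]
cpm = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
pi_uM = [counts_to_pi_uM(c, cpm_per_nmol=2000.0) for c in cpm]

rate_uM_per_s = linear_slope(times_s, pi_uM)
rate_per_myosin = rate_uM_per_s / 0.1  # ATP hydrolyzed s^-1 myosin^-1
print(rate_per_myosin)
```

Only the slope is used, so a constant background (for example, phosphate already present at time zero) does not affect the result.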

3.2. Actin-activated Mg²⁺-ATPase activity of myosin

Determination of the actin-activated steady-state ATPase activity of myosin is the important first step in understanding the kinetic properties of the motor, and simply requires measuring the products of ATP hydrolysis (ADP or inorganic phosphate) as a function of time, as explained for the high-salt ATPase activity of myosin described previously. When performed under the appropriate conditions, the assay reports the maximum rate at which myosin hydrolyzes ATP in the absence of actin (v₀), the maximum ATPase rate of myosin in the presence of saturating actin (kcat), and the actin concentration-dependence of the myosin ATPase activity (KATPase).

We prefer the NADH-coupled assay to measure the steady-state ATPase activity of myosin motors over other familiar detection methods, including the colorimetric assay and the radiolabeled ATP assay (discussed previously), because of the real-time detection, sensitivity, and regeneration of ATP from liberated ADP (see De La Cruz et al., 2000a). The assay relies on monitoring the change in absorbance or fluorescence that is coupled to the oxidation of NADH through a series of coupled enzymatic reactions. Pyruvate kinase converts phospho(enol)pyruvate (PEP) and the ADP generated from the steady-state ATP hydrolysis of myosin or actomyosin to ATP and pyruvate via a phosphoryl transfer reaction. The pyruvate is subsequently converted to lactate by lactate dehydrogenase (LDH) in a reaction that is coupled to the oxidation of NADH to NAD⁺. NADH absorbs light at 340 nm, but NAD⁺ does not; NADH is also fluorescent. Therefore, the NADH concentration can be readily monitored by absorbance or fluorescence. The overall reaction stoichiometry is such that one NADH molecule is consumed per ADP, permitting the concentration of ADP liberated, and therefore ATP hydrolyzed by myosin or actomyosin, to be readily determined from the loss of NADH.

Stock solutions and instrumentation
1. Absorbance (or fluorescence) spectrophotometer equipped with time-based data acquisition
2. Lactate dehydrogenase (LDH; 4000 U mL⁻¹ in KMg50 containing 50% glycerol)
3. Pyruvate kinase (PK; 10,000 U mL⁻¹ in KMg50 containing 50% glycerol)
4. Phospho(enol)pyruvate (PEP; 100 mM, pH adjusted to 7.0)
5. NADH (we use lyophilized 1-mg aliquots and prepare the 5× cocktail solution in the vial)
6. 5× cocktail solution: KMg50 buffer supplemented with 1 mM NADH, 100 U mL⁻¹ LDH, 500 U mL⁻¹ PK, and 2.5 mM PEP.
7. Note: LDH, PK, and PEP can be stored at −20 °C. NADH should be stored in the dark.

Method
1. The method described is one for manual mixing using an absorbance (or fluorescence) spectrophotometer in time-based acquisition mode and an optical microcuvette, but it can be easily adapted for other volumes or for automated mixing using a stopped-flow apparatus.
2. In a cuvette, mix 20 μL of 5× cocktail solution and myosin (at a final concentration of 20–200 nM) with 60 μL of KMg50 buffer.
3. Add 20 μL of 10 mM ATP in KMg50 to the cuvette and mix by pipetting. We prefer to aliquot large volumes because it makes it easier to mix by pipetting.
4. Start recording the time course of absorbance (or fluorescence) change at 340 nm. Continue for 100–200 s. The time course should be linear with a negative slope and a starting value corresponding to 200 μM NADH and the optical path length. If the initial absorbance is low, it is possible that contaminating ADP in one of the solutions consumed the NADH before the start of the experiment.
5. Repeat steps 1–4 in the presence of a range of [actin]. Actin should be added in place of the KMg50 buffer in step 2. Be certain that the [myosin] is identical in all samples and that the [actin] is the only variable among samples. Although the ATPase rate of actin alone is typically negligible, it can be significant at high concentrations and should be accounted for when determining the true ATPase rate of myosin.
6. Use the extinction coefficient of NADH (ε340 = 6220 M⁻¹ cm⁻¹) to convert absorbance at 340 nm to [ADP]. When using fluorescence detection, a standard curve with known ADP concentrations must be obtained.
7. Generate a plot of the time course of ADP production ([ADP] versus time) and fit to a linear function (Fig. 6.2A). The slope yields the steady-state ATPase rate in units of [ADP] per unit time, typically expressed in seconds ([ADP] s⁻¹). Divide this observed rate by the [myosin]. This normalized rate has units of ADP myosin⁻¹ s⁻¹, but is often expressed in units of s⁻¹, as one ATP is hydrolyzed per catalytic cycle of myosin.
8. Generate a plot of the steady-state rate of ADP production (in units of [ADP] [myosin]⁻¹ s⁻¹) versus the [actin]. A hallmark of myosin motors is that they are activated by binding to actin, so the ATPase rate will increase as a function of the [actin]. The data should follow a rectangular hyperbola (Fig. 6.2B) and be fitted to the Briggs-Haldane steady-state equation:

$$\text{Rate} = v_0 + \frac{k_{cat}\,[\text{Actin}]}{K_{ATPase} + [\text{Actin}]}. \qquad (6.1)$$

The intercept is the ATPase rate of myosin alone in the absence of actin (v₀), but it is usually indistinguishable from the origin and therefore an unreliable measurement; v₀ is best measured by a single turnover experiment (see De La Cruz et al., 1999). The kcat (determined from the best fit, not the data) is the maximum actin-activated ATPase rate of myosin (i.e., the catalytic turnover number), and KATPase is the concentration of actin needed to reach half-maximal activation of the myosin ATPase activity (i.e., the apparent KM for actin). Note that reliable determination of the active myosin concentration is critical for determining the kcat, and any uncertainties in myosin concentration or the presence of catalytically inactive motors will introduce uncertainties in kcat. If the data appear linear, then samples at higher [actin] are needed. If this cannot be achieved due to the high viscosity of the samples, lowering the ionic strength could lower the KATPase. Data points acquired at actin concentrations that are not at least 10 times greater than the myosin concentration should be fitted to a quadratic form of Eq. (6.1) (Henn et al., 2008).

[Figure 6.2: panel A, [ATP] hydrolyzed (μM) versus time (s), traces a–f; panel B, ATPase rate (s⁻¹) versus [actin] (μM).]

Figure 6.2 Steady-state ATPase activity of myosin VI. (A) Time course of ATP turnover by 100 nM myosin VI at 16 (a), 8 (b), 4 (c), 2 (d), 1 (e), and 0 μM (f) actin filaments using the NADH-coupled assay. (B) Actin filament concentration dependence of the steady-state turnover rate of myosin VI. The solid line through the data points is the best fit to Eq. (6.1), with kcat = 8.3 ± 0.2 s⁻¹ and KATPase = 2.8 ± 0.3 μM. Data are from De La Cruz et al. (2001).

9. Controls and troubleshooting. The final [ATP] according to this method is 2 mM. It is important that this be sufficient to saturate myosin and yield the maximum rate possible. Whether this condition is fulfilled can be evaluated by following the procedure described earlier, but measuring the KM for ATP from the [ATP]-dependence of the ATPase rate of myosin in the presence of saturating [actin]. When measuring the KATPase, the [ATP] in the samples should be >10 times greater than the KM for ATP. Similarly, when measuring the KM for ATP, the [actin] should be >10 times greater than the KATPase.
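Steps 6 and 7 of the method (conversion of the absorbance trace to [ADP], linear fit, and normalization by [myosin]) can be sketched as follows. The extinction coefficient is the one given in the text; the function names, the 1-cm default path length, and the synthetic trace are assumptions made for illustration, and any correction for the ATPase activity of actin alone is left to the user.

```python
EPS_NADH_340 = 6220.0  # M^-1 cm^-1, extinction coefficient of NADH at 340 nm

def linear_slope(x, y):
    """Ordinary least-squares slope of y versus x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def atpase_rate_per_myosin(times_s, a340, myosin_uM, path_cm=1.0):
    """Steady-state ATPase rate (s^-1) from an A340 time course.

    One NADH is consumed per ADP produced, so the loss of absorbance
    relative to the first point gives [ADP] directly.
    """
    adp_uM = [(a340[0] - a) / (EPS_NADH_340 * path_cm) * 1e6 for a in a340]
    return linear_slope(times_s, adp_uM) / myosin_uM

# Synthetic trace: 200 uM NADH at t = 0, 0.1 uM myosin turning over at 5 s^-1
times_s = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
a340 = [1.244 - 0.00311 * t for t in times_s]
print(atpase_rate_per_myosin(times_s, a340, myosin_uM=0.1))
```

The path length is a parameter because microcuvettes frequently have path lengths shorter than 1 cm, which changes the absorbance expected for 200 μM NADH.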


It is essential to confirm that the overall coupled assay reaction is more rapid than the ATPase rate of myosin or actomyosin. This is done by simply adding a small volume of concentrated ADP and monitoring the change in absorbance or fluorescence. The rate of absorbance change reflects the rate at which the coupled assay can convert free ADP in solution to the observed spectroscopic signal change. This rate must be more rapid than the ATPase rate measured for (acto)myosin in order for the experimentally measured ATPase rate to accurately reflect that of (acto)myosin. Addition of ADP is also useful as a troubleshooting aid. If weak or no ATPase activity is detected, simply add ADP to the reaction mix. If there is a change in absorbance upon addition of ADP, then a lack of an observed ATPase activity is not due to a faulty reaction component in the assay mix or instrument configuration. Rather, the sample itself has little ATPase activity. Increasing the myosin concentration will accelerate the ATPase rate proportionally and could provide a reliable signal.
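The fit of the actin dependence to Eq. (6.1) can be illustrated without any fitting package. The sketch below is one possible approach, not the authors' procedure: for a trial KATPase the model is linear in v₀ and kcat, so those are solved exactly, and KATPase is located by a golden-section search that assumes the sum of squared residuals has a single minimum within the search bounds. The bounds, function name, and synthetic data are hypothetical.

```python
import math

def fit_hyperbola(actin, rate, k_lo=1e-3, k_hi=1e3, tol=1e-9):
    """Least-squares fit of rate = v0 + kcat*[A]/(K + [A]), Eq. (6.1).

    Returns (v0, kcat, K). K is searched on a log10 scale between
    k_lo and k_hi (same units as actin).
    """
    def linear_part(K):
        # With K fixed, regress rate on x = [A]/(K + [A]) to get kcat and v0
        x = [a / (K + a) for a in actin]
        n = len(x)
        mx, my = sum(x) / n, sum(rate) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, rate))
        kcat = sxy / sxx
        v0 = my - kcat * mx
        sse = sum((v0 + kcat * xi - yi) ** 2 for xi, yi in zip(x, rate))
        return sse, v0, kcat

    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden ratio conjugate
    lo, hi = math.log10(k_lo), math.log10(k_hi)
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if linear_part(10.0 ** c)[0] < linear_part(10.0 ** d)[0]:
            hi = d
        else:
            lo = c
    K = 10.0 ** ((lo + hi) / 2.0)
    _, v0, kcat = linear_part(K)
    return v0, kcat, K

# Synthetic, noise-free data; parameter values are for illustration only
actin_uM = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]
rates = [0.05 + 8.3 * a / (2.8 + a) for a in actin_uM]
v0, kcat, k_atpase = fit_hyperbola(actin_uM, rates)
print(f"v0 = {v0:.2f} s^-1, kcat = {kcat:.2f} s^-1, K_ATPase = {k_atpase:.2f} uM")
```

In practice a general nonlinear regression program that also reports parameter uncertainties is preferable; this sketch only shows what such a program minimizes.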

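4. Steady-State Measurement of Actomyosin Binding Affinities

The steady-state binding of actin and myosin is defined as:

$$A + \langle M \rangle \rightleftharpoons \langle AM \rangle \quad \text{with} \quad K_d = \frac{[A]\,[\langle M \rangle]}{[A\langle M \rangle]}, \qquad (6.2)$$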

where Kd is the dissociation equilibrium constant, and ⟨ ⟩ signifies a distribution of chemical states. The Kd value depends on the conformational state of the myosin (as determined by the bound nucleotide, scheme 6.1), so the overall, observed Kd during steady-state ATPase cycling depends on the affinities and distribution of the myosin intermediate states. The affinity of myosin for actin is highly dependent on the ionic strength of the solution, and in many cases, binding experiments are performed at ionic strengths lower than the physiological condition to allow for experimentally measurable Kd values. Actin sedimentation assays, pioneered by Chalovich and Eisenberg (1982), and the pyrene-actin fluorescence quenching assay, first described by Kouyama and Mihashi (1981) and developed for myosin by Criddle et al. (1985) and Geeves (1989), are the most commonly used techniques to measure steady-state actomyosin affinity.

4.1. Sedimentation assays

Sedimentation assays quantitate the concentration of myosin that pellets with actin after high-speed centrifugation. This method is most useful for determining the effective binding constants of a population of myosin states, for example during steady-state ATP hydrolysis.

Method
1. Experimental mix. Mix myosin, at a constant concentration that is at least 50-fold lower than the Kd for the interaction, and actin directly in a 200-μL TLA-100 centrifuge tube (Beckman). Actin concentrations are titrated to a final concentration that is 5-fold greater than the Kd. Appropriate control samples include (a) myosin in the absence of actin, (b) actin in the absence of myosin, and (c) actin and myosin in the absence of nucleotide.
2. Add nucleotide. Nucleotides that are hydrolyzed by myosin (e.g., ATP and ATPγS) should be added to the centrifuge tubes immediately before centrifugation. The actomyosin steady-state ATPase rate should be considered to ensure that (a) the nucleotide substrate is not depleted during the experiment and (b) products of the ATP hydrolysis reaction do not affect the population of the steady-state intermediates. When assaying nucleotide analogs, one should ensure that the analogs are pure and do not contain contaminants that bind with a tighter affinity than the analog of interest. If the fraction of bound myosin is going to be determined via SDS-PAGE (see subsequent sections), a small amount of each sample should be saved before centrifugation (precentrifuged control).
3. Centrifuge. Centrifuge samples at 250,000 × g for 20 min at the appropriate temperature. Carefully remove supernatants from the tubes as they are taken out of the rotor. Keep track of the expected location of the protein pellets, as they may be difficult to see. Transfer supernatants to 1.5-mL microfuge tubes on ice without disturbing or contacting the pellet. Save the centrifuge tubes containing the pellet.
4. Detection of fraction of actin-bound myosin. The most common methods for determining the fraction of myosin bound to actin are (a) measurement of the ATPase activity of myosin in the supernatant and (b) resolution and quantitation via SDS-PAGE.
a. ATPase activity. Measurement of the NH4⁺/EDTA ATPase activity (Section 3.1) provides a sensitive method for determining the concentration of myosin that remains in the supernatant after centrifugation and does not detect inactive myosins that may contaminate the protein preparation. The fraction of myosin bound (fb) at each condition is determined by:

$$ f_b = 1 - \frac{\text{ATPase activity of supernatant}}{\text{ATPase activity of no-actin supernatant}} \tag{6.3} $$

b. SDS-PAGE. The protein pellets are resuspended in SDS-PAGE sample buffer to the original sample volume, heated to 100 °C for 2 min, and resolved by SDS-PAGE. Pelleted samples are resolved side by side with the precentrifuged controls (see step 2, earlier). The fb is

Enrique M. De La Cruz and E. Michael Ostap

determined by dividing the quantity of myosin in the pellet by the quantity of myosin in the precentrifuged control. Myosin concentrations are determined by scanning stained gels, so protein standards of known concentration must be run on every gel to ensure linearity of detection. Coomassie blue is the most commonly used gel stain, and it is easily quantitated using a standard gel-documentation system or flatbed scanner. However, we have found Sypro protein dyes to be more sensitive at low protein concentrations and to give more precise quantitation when scanned with a fluorescence scanner (e.g., Typhoon Imager; Lin et al., 2005; Manceva et al., 2007).
5. Determination of the dissociation equilibrium-binding constant (Kd). Binding experiments are performed with the myosin concentration held constant at 10-fold lower than the Kd and with a maximum actin concentration of 5-fold the Kd. The actin affinities of different myosin isoforms vary >100-fold, so the protein concentrations depend on the isoform being investigated. Multiple experiments may be required to determine the appropriate concentration ranges. The Kd is determined by fitting the fraction bound (fb) to the hyperbolic relationship:

$$ f_b = \frac{[\text{Actin}]}{K_d + [\text{Actin}]} \tag{6.4} $$

The Kd values are determined by nonlinear least-squares fitting using any of the widely available commercial or Web-based fitting programs. Linear transformations of the data are problematic because they make error analysis difficult; given the accessibility of computers and nonlinear regression software, they are not recommended.
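As a minimal sketch of the analysis in steps 4 and 5 (not the authors' software), the Python below computes fb from supernatant ATPase activities as in Eq. (6.3) and estimates Kd by nonlinear least squares against Eq. (6.4). The function names, the golden-section search used as the minimizer, the search bounds, and the titration values are illustrative assumptions; any standard nonlinear regression package would serve equally well.

```python
import math

def fraction_bound(atpase_supernatant, atpase_no_actin):
    """Eq. (6.3): fraction of myosin that pelleted with actin."""
    return 1.0 - atpase_supernatant / atpase_no_actin

def hyperbola(actin, kd):
    """Eq. (6.4): fraction bound at a given actin concentration."""
    return actin / (kd + actin)

def fit_kd(actin, fb, lo=1e-3, hi=1e3):
    """Least-squares Kd by golden-section search on log(Kd).

    Assumes (hypothetically) that the true Kd lies between lo and hi,
    expressed in the same concentration units as `actin`.
    """
    def sse(log_kd):
        kd = math.exp(log_kd)
        return sum((f - hyperbola(a, kd)) ** 2 for a, f in zip(actin, fb))

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = math.log(lo), math.log(hi)
    while right - left > 1e-8:
        x1 = right - ratio * (right - left)
        x2 = left + ratio * (right - left)
        if sse(x1) < sse(x2):
            right = x2   # minimum lies in [left, x2]
        else:
            left = x1    # minimum lies in [x1, right]
    return math.exp(0.5 * (left + right))

# Synthetic, noise-free titration (µM) generated with Kd = 5 µM; not real data.
actin_uM = [0.5, 1.0, 2.0, 5.0, 10.0, 25.0]
fb_obs = [hyperbola(a, 5.0) for a in actin_uM]
kd_fit = fit_kd(actin_uM, fb_obs)
```

The fit is performed directly on the untransformed fb values, in keeping with the recommendation against linear transformations; with real data, parameter uncertainties would also need to be estimated, which this sketch omits.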

4.2. Pyrene fluorescence measurements

The fluorescence of pyrene-labeled actin is linearly quenched by the binding of one myosin in the ‘‘strong binding’’ AM or AM.ADP state to one polymerized actin subunit (scheme 6.1). Pyrene-actin fluorescence is not quenched by binding of myosins in the ‘‘weak binding’’ AM.ATP or AM.ADP.Pi states (scheme 6.1). Thus, this method is best suited for (a) measuring the affinity of the strong-binding AM or AM.ADP states, and (b) measuring the fraction of myosins in these strong-binding states during steady-state ATPase cycling (De La Cruz et al., 2000b; Henn and De La Cruz, 2005). It is not suitable for measuring the total fraction of myosin bound during steady-state ATPase cycling.

Method
1. Titration and detection of binding. Add pyrene-actin at the desired concentration to a fluorescence cuvette. Most fluorometers are very sensitive and detect linear changes of pyrene-actin fluorescence at concentrations <100 nM in a 100-µL cuvette. The excitation peak of pyrene-actin is 365 nm. Single-wavelength readings at 410 nm are sufficient for performing titrations, but scanning the emission wavelength for each sample between 375 nm and 450 nm is preferable, as it allows one to detect wavelength shifts and anomalous scattering due to air bubbles or protein aggregation. Experiments are performed by adding concentrated myosin solutions directly to the pyrene-actin-containing fluorescence cuvette or by preparing separate samples for each titration point. One must ensure that the actin concentration is not diluted as a result of myosin additions. Dilution of the sample is avoided by including pyrene-actin with the titrated myosin. The fractional saturation of pyrene-actin filaments (fb) is calculated by:

$$ f_b = \frac{F_o - F}{F_o - F_\infty}, \tag{6.5} $$

where Fo is the fluorescence signal in the absence of myosin, F is the signal at a given myosin concentration, and F∞ is the signal at infinite myosin concentration.

Determination of dissociation constant: Binding experiments are performed with the actin concentration held constant at 10-fold lower than the Kd and with a maximum myosin concentration of 5-fold the Kd. The actin concentration is kept constant to avoid changes in Fo and F∞ during the titration. The Kd is determined by fitting a plot of fb versus myosin concentration to the hyperbolic relationship of Eq. (6.4), with the myosin concentration as the independent variable. If myosin binds actin with high affinity in the AM and AM.ADP states (Kd < 0.1 µM), the pyrene-actin concentration (as defined previously) may be below the detection limit of the fluorometer. If this is the case, the pyrene-actin in the cuvette is increased to a concentration 2-fold lower than the Kd. Under these conditions the binding curve is no longer hyperbolically related to the myosin concentration, and it is instead fit using the following quadratic Eq. (6.6):

$$ f_b = \frac{[A]_o + [M]_o + K_d - \sqrt{[A]_o^2 - 2[A]_o[M]_o + 2[A]_o K_d + [M]_o^2 + 2[M]_o K_d + K_d^2}}{2[A]_o}, \tag{6.6} $$

where [Ao] and [Mo] are the total actin and myosin concentrations, respectively. It is important to ensure that the experiments are being performed at concentrations in which the curve is sensitive to the Kd.
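To illustrate why the quadratic form is needed at higher actin concentrations, the following sketch evaluates Eq. (6.6) and compares it with the hyperbolic form of Eq. (6.4) written in terms of total myosin. The function names and the concentrations (in µM) are hypothetical choices for illustration, not values from the experiments described here.

```python
import math

def fb_quadratic(a_total, m_total, kd):
    """Eq. (6.6): fraction of actin occupied, allowing for myosin depletion."""
    s = a_total + m_total + kd
    # s*s - 4*a*m equals the expanded expression under the root in Eq. (6.6).
    return (s - math.sqrt(s * s - 4.0 * a_total * m_total)) / (2.0 * a_total)

def fb_hyperbolic(m_total, kd):
    """Eq. (6.4) with myosin as the titrant; valid when [A]o << Kd."""
    return m_total / (kd + m_total)

kd = 0.1          # µM, hypothetical tight-binding myosin
m_total = 0.1     # µM, one titration point at [M]o = Kd

fb_low_actin = fb_quadratic(kd / 10.0, m_total, kd)   # [A]o 10-fold below Kd
fb_high_actin = fb_quadratic(kd / 2.0, m_total, kd)   # [A]o 2-fold below Kd
fb_ideal = fb_hyperbolic(m_total, kd)                 # 0.5 by construction
```

With actin 10-fold below the Kd the two expressions agree to within about 0.01 in fb, whereas with actin only 2-fold below the Kd the hyperbola overestimates the fractional saturation noticeably, because a significant fraction of the added myosin is bound rather than free.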

5. Transient Kinetic Analysis of the Individual ATPase Cycle Transitions

Steady-state characterization of the actin-activated myosin ATPase is an important step toward understanding motor activity. Quantitative determination of kcat and the (apparent) KM for actin and ATP reliably describes the overall cycling behavior of myosin and can yield valuable insight into the physiological roles of various myosin isoforms. However, the simplified two- to three-step reaction mechanism generally assumed for Michaelis-Menten or Briggs-Haldane steady-state kinetic analysis does not account for even the minimal ATPase cycle of (acto)myosin (scheme 6.1). Steady-state characterization provides no information regarding the identity and distribution of the transiently populated intermediates that play critical roles in contractility, force generation, and motility, nor regarding the lifetimes of these intermediates. As a result, diversity in function and motor activity that arises from differential population of biochemical cycling intermediates is invisible to steady-state characterization. For example, the steady-state ATPase activities of high-duty-ratio myosins are qualitatively similar to those of some low-duty-ratio myosins in that they are activated by micromolar actin concentrations (De La Cruz et al., 1999, 2001), yet these myosins have dramatically different kinetic behaviors that confer unique motor activity and allow them to perform different functions. Limiting the kinetic analysis of these myosin motors to steady-state ATPase characterization would never have revealed their important differences. While steady-state characterization provides valuable information regarding the overall behavior of myosin, transient kinetics focuses on measuring the individual ATPase cycle reactions in isolation.
The objective of transient kinetic analysis is to directly observe and identify biochemical intermediates populated during ATP cycling, determine the lifetime and distribution of these intermediates, and define the preferred reaction pathway through the ATPase cycle. A final goal of a thorough transient kinetic analysis is to account for the observed steady-state behavior and constants in terms of the fundamental rate and equilibrium constants of the ATPase cycle pathway shown in scheme 6.1. The transient kinetic approach is rather straightforward and requires only three things: (1) a signal, chemical (e.g., chemical cleavage of ATP) or optical (e.g., fluorescence, light scattering, absorbance, anisotropy),

(2) rapid physical (temperature or pressure) or chemical (changing reactant concentrations) disruption of a system at equilibrium, and (3) observation of the time course of approach to the new equilibrium with appropriate time resolution. We describe in the following sections the signals to monitor for particular ATPase cycle reactions and focus our discussion on rapid mixing techniques, in which perturbation arises from the rapid mixing of reactants in single (two solutions) or double (three solutions) mixing events and the time courses of approach to new equilibria are measured as a function of the reactant concentration. When performing rapid-mixing, stopped-flow measurements, it is important to adjust the experimental conditions so that the concentration of one reactant is essentially unchanged during the reaction. Otherwise, time courses of even simple, one-step reactions will deviate from single exponential behavior. This condition is easily achieved by ensuring that one reactant is mixed at 10 times the concentration of the other, so that the concentration of the reactant in excess changes by no more than 10% on approach to equilibrium. This is referred to as the pseudo-first-order condition and should be maintained whenever possible. Note that it is the concentration of the reactant in excess that is used to determine the rate constants. Reaction time courses will follow a single exponential or a sum of exponentials and should be fitted to the following function describing a linear sum of exponentials:

$$ S(t) = S_\infty + \sum_{i=1}^{n} A_i e^{-k_i t}, \tag{6.7} $$

where S(t) is the signal at time t, S∞ is the final signal intensity, Ai is the amplitude, ki is the observed rate constant (kobs) characterizing the ith relaxation process, and n is the total number of observed kinetic phases. The value of n can vary among experiments but will usually be either one (single exponential) or two (double exponential). In some cases there will be an additional, linear component in the data. In this case, Eq. (6.7) is modified to include a term for this linear phase (discussed subsequently).
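The fitting function of Eq. (6.7) and the pseudo-first-order criterion can be written down compactly. The sketch below is illustrative only: the function names, amplitudes, and rate constants are hypothetical, and the depletion estimate simply assumes the worst case in which all of the limiting reactant binds.

```python
import math

def signal(t, s_inf, amplitudes, rate_constants):
    """Eq. (6.7): final signal plus a linear sum of decaying exponentials."""
    return s_inf + sum(a * math.exp(-k * t)
                       for a, k in zip(amplitudes, rate_constants))

def max_fractional_depletion(excess_fold):
    """Worst-case fractional drop in the excess reactant's concentration,
    assuming every molecule of the limiting reactant binds one of the excess."""
    return 1.0 / excess_fold

# Hypothetical double-exponential transient: phases of 100 s^-1 and 10 s^-1.
s_start = signal(0.0, 1.0, [0.5, 0.2], [100.0, 10.0])   # S_inf + A1 + A2
s_end = signal(10.0, 1.0, [0.5, 0.2], [100.0, 10.0])    # decays to S_inf
depletion_at_10x = max_fractional_depletion(10.0)       # at most 10%
```

At time zero the signal equals S∞ plus the sum of the amplitudes, and at long times it relaxes to S∞, which is a quick sanity check to apply to any fitted parameter set.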

5.1. Myosin binding to and dissociation from actin

Determination of the rate constants of myosin binding to and dissociation from actin allows the calculation of the actomyosin rigor affinity (KA; Fig. 6.1). These rate constants also reveal information about myosin structural transitions that occur upon actin binding (e.g., De La Cruz et al., 1999; Taylor, 1991). Binding is most simply measured by monitoring the fluorescence of pyrene-actin upon mixing with myosin, and dissociation is

measured by mixing pyrene-actin-myosin complexes with excess unlabeled actin. It is essential for proteins to be free of contaminating ADP or ATP when measuring binding of rigor myosin, so treatment of samples with apyrase is advisable (see previous sections). Inclusion of saturating ADP permits measuring myosin-ADP binding to actin. Myosin (and myosin-ADP) binding to actin can often be modeled as a simple one-step binding process:

$$ A^{**} + M \overset{K_A}{\rightleftharpoons} A^{*}\cdot M, \tag{Scheme 6.1} $$

where A** is the high fluorescence state and A* is the low fluorescence (quenched) state of pyrene-actin. Although one-step binding is considered here, there is evidence for multistep binding (Henn and De La Cruz, 2005; Taylor, 1991) and additional actin-bound states for several myosin isoforms (Hannemann et al., 2005; McKillop and Geeves, 1993; Rosenfeld et al., 2000).
Method
1. Myosin binding to pyrene-actin. Myosin (syringe A) is mixed with phalloidin-stabilized pyrene-actin (syringe B) in a stopped-flow fluorometer. Fluorescence time courses are acquired at multiple pyrene-actin concentrations. The myosin concentration after mixing is typically 0.1 µM. To satisfy pseudo-first-order conditions, the actin concentration should be 10-fold in excess of the myosin concentration. To retain adequate signal-to-noise in the fluorescence transients, the myosin concentration may have to be increased as the actin concentration is increased.
2. Dissociation of myosin from pyrene-actin. Myosin bound to pyrene-actin is mixed with 50- to 100-fold higher concentrations of unlabeled actin in a stopped-flow fluorometer. The pyrene-actomyosin concentration is typically 0.5 µM after mixing.
3. Data analysis. Fluorescence time courses of myosin binding to actin are usually best fit to single exponential functions, yielding actin-concentration-dependent kobs values. A linear fit of a plot of kobs versus actin concentration yields an apparent second-order rate constant for binding in units of M⁻¹ s⁻¹, which ranges from 1 × 10⁶ M⁻¹ s⁻¹ to 1 × 10⁷ M⁻¹ s⁻¹. Many myosins have a hyperbolic dependence of kobs on the actin concentration, indicating two-step binding (see Taylor, 1991). Fluorescence time courses of the dissociation of myosin from pyrene-actin are usually best fit by a single exponential function. The dissociation rate constant varies significantly among myosins, ranging from 0.1 s⁻¹ for muscle myosins to <0.01 s⁻¹ for nonmuscle myosins.
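The data analysis in step 3 can be sketched as follows. The code performs an ordinary least-squares line fit of kobs versus actin concentration to obtain the apparent second-order binding rate constant, then combines it with a separately measured dissociation rate constant to give the affinity as a dissociation constant, which assumes the one-step model above. All names and numbers are hypothetical, and the data are synthetic.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y versus x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic kobs values (s^-1) at several actin concentrations (µM),
# generated with a slope of 5 µM^-1 s^-1 (5 x 10^6 M^-1 s^-1); not real data.
actin_uM = [1.0, 2.0, 4.0, 6.0, 8.0]
kobs = [5.05, 10.05, 20.05, 30.05, 40.05]

k_on, intercept = linear_fit(actin_uM, kobs)   # k_on in µM^-1 s^-1

# Dissociation rate constant from the unlabeled-actin chase experiment
# (hypothetical value, s^-1). The chase is used rather than the intercept,
# which is usually poorly determined when it is small relative to kobs.
k_off = 0.05
kd_uM = k_off / k_on   # one-step model: Kd = k_off / k_on
```

If kobs saturates at high actin concentration, a line is not the appropriate model and the two-step analysis cited in step 3 applies instead.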

5.2. ATP binding to actomyosin

ATP binding weakens the affinity of myosin for actin and dissociates the complex (Lymn and Taylor, 1971). ATP binding to all myosins (M) occurs in at least two steps, with formation of a nonspecific collision complex, AM(ATP), that exists in rapid equilibrium (K1T′) with nucleotide-free actomyosin (AM), followed by an isomerization (K2T′) to a state (AMT) that dissociates rapidly from actin (kdiss):

$$ A\cdot M + \mathrm{ATP} \overset{K_{1T}'}{\rightleftharpoons} A\cdot M(\mathrm{ATP}) \underset{k_{-2T}'}{\overset{k_{+2T}'}{\rightleftharpoons}} A\cdot M\cdot \mathrm{ATP} \overset{K_{diss}}{\rightleftharpoons} M\cdot \mathrm{ATP} + A \tag{Scheme 6.2} $$

Three of the most efficient methods to measure ATP binding to actomyosin are (1) ATP-dependent dissociation of myosin from pyrene-labeled actin filaments, (2) monitoring the reduction in light scattering that occurs when myosin dissociates from (unlabeled) actin filaments, and (3) the fluorescence enhancement associated with binding of the nucleotide analogue mantATP (Hiratsuka, 1983). MantATP is valuable because one monitors the nucleotide and/or myosin directly (via energy transfer to mant from tryptophans at the nucleotide binding site), but it can be used only over a limited concentration range. Pyrene fluorescence provides the best signal-to-noise ratio and a broad ATP concentration working range (up to 20 mM). However, it has the disadvantage that it cannot distinguish between weakly bound, attached states (scheme 6.1) and those that are detached from actin. Light scattering is also very sensitive and has the advantages that it monitors actin attachment, regardless of whether myosin is strongly or weakly bound (scheme 6.1), and does not require fluorescent modification of actin or nucleotide. The combination of the three methods will identify whether fluorescent modification of actin or nucleotides disrupts the intrinsic properties of the natural, unmodified substrates. We limit our discussion to the use of pyrene-actin fluorescence or light scattering, as these detection methods are the most sensitive and can be used to measure ADP binding (discussed subsequently).
Method
Determination of equilibrium/completely detached intensity: Before starting an experiment, measure the scattering or fluorescence intensity of actin alone. Rapidly mix (pyrene) actin (0.2–1 µM) with KMg50 buffer in a stopped-flow instrument (λex = 365 nm, emission measured through a 400-nm long-pass filter for pyrene; light scattering can be measured at 300–320 nm). The time course should be relatively flat. The signal intensity reflects that of actin

alone after mixing and will be the final, equilibrium intensity of the experimental ATP-induced actomyosin dissociation transients, provided all of the bound myosin dissociates from actin upon addition of ATP. It should be noted that not all of the myosin need dissociate in the presence of ATP, and an equilibrium signal that differs from that of actin alone can be indicative of physiological function. This behavior is characteristic of high-duty-ratio motors that remain attached to actin (even when actin is present at low micromolar concentrations) in the presence of ATP (De La Cruz et al., 1999, 2000b, 2001; Henn and De La Cruz, 2005). However, inactive myosin heads that bind but do not dissociate will also yield this behavior, so independent measurements of myosin activity should be acquired.
Determination of starting point: (Pyrene) actomyosin (0.2–1 µM at a 1:1 stoichiometry; syringe A) is rapidly mixed with buffer (syringe B). This establishes the low pyrene fluorescence or high light-scattering intensity from which kinetic transients should originate.
Experiment: Acquire time courses of fluorescence change after mixing with a range of ATP concentrations (0–several mM).
Data analysis and interpretation: Time courses of pyrene fluorescence enhancement (Fig. 6.3A), mantATP fluorescence, and reduction in light scattering (Fig. 6.3A) usually follow a single exponential with observed rate constants that depend hyperbolically on the ATP concentration (Fig. 6.3B) and should be fitted to the following equation:

$$ k_{obs} = \frac{K_{1T}'[\mathrm{ATP}]\,k_{+2T}'}{1 + K_{1T}'[\mathrm{ATP}]}. \tag{6.8} $$

The best-fit parameters yield K1T′ (as an association equilibrium constant in units of M⁻¹) and the maximum isomerization rate constant achieved at saturating ATP (k+2T′). The values of K1T′ and k+2T′ have significance. Myosins that bind ATP weakly and slowly, i.e., myosins I (Coluccio and Geeves, 1999; Lewis et al., 2006) and VI (De La Cruz et al., 2001; Robblee et al., 2005), do so because the observed K1T′ is very weak (i.e., 1/K1T′ is large). Consequently, physiological ATP concentrations can be nonsaturating. Formation of a (nucleotide-free) actomyosin state that cannot bind nucleotide (closed versus open) contributes to the weak K1T′ of myosins I and VI. Actin-attached myosin VI dimers communicate allosterically by shifting K1T′ to favor the open state, which accelerates ATP binding (Robblee et al., 2004). Intramolecular load can also affect the k+2T′ value of some myosins (Robblee et al., 2004). There are some cases in which ATP binding to actomyosin does not follow a single exponential. Deviations from a single exponential, including double exponential behavior (Coluccio and Geeves, 1999; Lewis et al., 2006) and lag phases (Robblee et al., 2005), can be indicative of an

[Figure 6.3. Panel A: signal intensity versus time (0–0.1 s) for pyrene-actin fluorescence and light scattering. Panel B: kobs (s⁻¹) versus ATP concentration (0–3000 µM) after mixing pyrene-actomyosin with ATP.]

Figure 6.3 ATP-induced dissociation of pyrene-actomyosin. (A) Pyrene fluorescence and light-scattering transients obtained by mixing 0.25 µM actomyo1e with 100 µM ATP. Solid lines are single exponential fits (Eq. (6.7)) to determine kobs. (B) ATP concentration dependence of kobs determined from pyrene fluorescence and light-scattering (▪) transients is shown for the full range of ATP concentrations. Solid lines are fits to Eq. (6.8). Data are from El Mezgueldi et al. (2002).

isomerization between a closed actomyosin state that cannot bind nucleotide and an open one that can. But one must be careful when interpreting data, as contaminating nucleotide could also yield deviations from a single exponential (De La Cruz et al., 2001).
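The behavior of Eq. (6.8) at its two limits is worth making explicit: at low ATP the dependence is linear with slope K1T′k+2T′ (the apparent second-order rate constant for ATP binding), at saturating ATP kobs approaches k+2T′, and kobs is half-maximal when [ATP] equals 1/K1T′. The sketch below evaluates these cases; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def kobs_atp(atp_molar, k1t, k2t):
    """Eq. (6.8): observed rate constant for ATP-induced dissociation.

    atp_molar : ATP concentration in M
    k1t       : K1T' as an association constant, M^-1
    k2t       : k+2T', s^-1
    """
    return k1t * atp_molar * k2t / (1.0 + k1t * atp_molar)

K1T = 1.0e3    # M^-1, hypothetical (1/K1T' = 1 mM)
K2T = 500.0    # s^-1, hypothetical

half_max = kobs_atp(1.0 / K1T, K1T, K2T)         # equals K2T / 2
low_atp_slope = kobs_atp(1.0e-6, K1T, K2T) / 1.0e-6  # approaches K1T * K2T
near_plateau = kobs_atp(1.0, K1T, K2T)           # approaches K2T
```

For a myosin with a weak K1T′, as described above for myosins I and VI, 1/K1T′ can be comparable to or larger than the cellular ATP concentration, in which case the observed rate constant lies well below k+2T′.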

5.3. ATP binding and hydrolysis by myosin

ATP hydrolysis (KH) is the biochemical transition closely linked to the structural change that results in the reverse power stroke, or repriming, step. The rate and equilibrium constants for this transition are most commonly determined in the absence of actin, because (a) ATP is normally hydrolyzed when myosins are detached from actin during normal cycling (scheme 6.1), and (b) the posthydrolysis M.ADP.Pi state has a dramatically longer lifetime in the absence of actin, due to slow phosphate release (k+Pi′), which simplifies experimental design and interpretation (Johnson and Taylor, 1978; Lymn and Taylor, 1970). The two most common methods for

measuring the rate of ATP hydrolysis are detection of changes in the intrinsic tryptophan fluorescence of myosin using stopped flow, and detection of inorganic phosphate production using quenched flow.

5.3.1. Intrinsic tryptophan fluorescence

ATP binding to some myosins that contain a tryptophan at position 512 results in an enhancement in the intrinsic myosin fluorescence as follows:

$$ M + \mathrm{ATP} \overset{K_{1T}K_{2T}}{\rightleftharpoons} [M\cdot \mathrm{ATP}\ \text{or}\ M^{*}\cdot \mathrm{ATP}] \underset{k_{-H}}{\overset{k_{+H}}{\rightleftharpoons}} M^{**}\cdot \mathrm{ADP}\cdot P_i \rightarrow, \tag{Scheme 6.3} $$

where M* and M** are enhanced fluorescence states of myosin. The magnitude and origin of the intrinsic tryptophan fluorescence enhancement are myosin-isoform dependent. For example, the fluorescence of skeletal and smooth-muscle myosin II increases upon population of the M.ATP state and increases further upon population of the M.ADP.Pi state. Vertebrate myosin I, Dictyostelium myosin II, and vertebrate myosin V have fluorescence enhancements that correlate only with the population of the M.ADP.Pi state, while the fluorescence of myosin VI is relatively insensitive to nucleotide binding (De La Cruz et al., 2001).
Method
1. Myosin (syringe A) is rapidly mixed with ATP (syringe B) in a stopped-flow fluorometer (λex = 280–295 nm, emission measured through a 320-nm long-pass filter). The required myosin concentration depends on the size of the signal change upon binding of nucleotide and on the sensitivity of the instrument. For most published investigations, 0.1–0.5 µM has been sufficient. Myosin must be free of contaminants, particularly contaminants that contain tryptophans that contribute to the fluorescence signal. The myosin preparation must also be homogeneous; otherwise the fluorescence transient may contain multiple components that are difficult to interpret.
2. Acquire time courses of fluorescence change at multiple ATP concentrations. Fluorescence time courses are acquired at concentrations from 1 µM to 1–2 mM ATP.
3. Data analysis. Most myosins bind ATP rapidly and irreversibly, and release phosphate very slowly in the absence of actin. Therefore, interpretation of the ATP dependence of the fluorescence time courses is straightforward, in that the reaction can be considered a two-step pathway in which an irreversible ATP-binding step is followed by a reversible first-order hydrolysis reaction. The two common cases for interpretation and analysis of the fluorescence signal are as follows.

a. Case 1: Individual high fluorescence state. When the ATP-induced change in the fluorescence originates from the population of the M.ADP.Pi state, the fluorescence time course at each ATP concentration is best fit by a single exponential function (Eq. (6.7)). A plot of kobs versus the ATP concentration is hyperbolic, with the maximum value of kobs corresponding to the sum of the forward and reverse rate constants of ATP hydrolysis (k+H + k−H). A linear fit of the plot at low ATP concentrations yields the apparent second-order rate constant for ATP binding (K1T′k+2T′).
b. Case 2: Multiple high fluorescence states. Data analysis is more complicated when the fluorescence signal in the presence of ATP is the linear combination of multiple conformational states. The best characterized example is skeletal-muscle myosin II, where the M*.ATP state has a fluorescence enhancement intermediate between those of the M and M**.ADP.Pi states (Johnson and Taylor, 1978). As the ATP concentration is increased, the fluorescence time course will be best fit to the sum of two exponentials, where kobs1 reports the rate of population of the M*.ATP state and kobs2 reports the rate of population of the M**.ADP.Pi state. A linear fit of kobs1 versus the ATP concentration reveals the apparent second-order rate constant for ATP binding, and the maximum rate of kobs2 yields k+H + k−H. At high ATP concentrations (>500 µM), the time course of the population of the M*.ATP state might be too fast (>1000 s⁻¹) to be recorded by most stopped-flow instruments, resulting in the resolution of only a signal that is best fit by a single exponential function, corresponding to k+H + k−H.

5.3.2. Quench flow

The rate of ATP hydrolysis and the equilibrium constant are determined by measuring the time dependence of phosphate production using a quench-flow apparatus. In this technique, myosin is mixed with ATP, aged for a specified time, and then quenched with acid, which denatures the myosin and stops the ATPase reaction. Because myosin is denatured, phosphate that was sequestered in the active site in the M.ADP.Pi state is also measured. Quenched-flow experiments are more labor-intensive than stopped-flow experiments, as a single time course requires phosphate determinations from multiple time points at relatively high myosin concentrations. However, the advantage is that it is a direct measurement of ATP hydrolysis.
Method
1. Myosin is mixed with [γ-³²P]-ATP (1 Ci mol⁻¹) in a quenched-flow instrument. When using the KinTek RQF-3 instrument, 35 µL of myosin is mixed with 35 µL of [γ-³²P]-ATP, the reaction is aged for a specified time, then mixed with quenching solution (2 M HCl,

0.35 mM NaH2PO4). The myosin concentration is typically >1 µM after mixing, with higher concentrations being better. The ATP concentration should be high enough that the rate of ATP binding does not limit the rate of ATP hydrolysis. For example, if the rate constant of ATP binding is 5 × 10⁶ M⁻¹ s⁻¹ and the rate of ATP hydrolysis is 50 s⁻¹, one would select an ATP concentration of at least 25 µM.
2. Acquire multiple time points. Enough points are acquired to resolve the time course of ATP hydrolysis. For example, if ATP hydrolysis occurs at a rate of 50 s⁻¹, points every 10 ms are acquired for 100–150 ms. Longer time courses are acquired to resolve the steady-state phase of ATP hydrolysis. Quenched time points are kept on ice, and the free phosphate concentration is determined as soon as possible to minimize acid hydrolysis of the ATP.
3. Determine phosphate concentration. There are multiple methods for determining the free phosphate concentration, including thin-layer chromatography (Gilbert et al., 1995; Henn et al., 2008) and the molybdate method described earlier. We prefer the method developed by White (White and Rayment, 1993), in which equal volumes of the quenched reaction are mixed with a 10% activated charcoal slurry in quench solution and centrifuged at 12,000g. The supernatant contains phosphate, and the charcoal fraction contains ATP and ADP. The supernatant and a volume of the total reaction mix are separately added to scintillation vials and counted. The radioactivity counted from each time point is normalized against the total counts in the total reaction mix to account for pipetting errors.
4. Data analysis. Phosphate release from most myosins in the absence of actin (k+Pi′) is very slow, so time courses of phosphate formation are composed of burst and linear phases (Fig. 6.4). The burst phase reports formation of the M.ADP.Pi state, and the linear phase reports the rate of steady-state ATP turnover. When the phosphate concentration is normalized by dividing by the myosin concentration, the time course is fit by:

$$ \frac{[P_i]}{[\text{Myosin}]} = \frac{K_H}{1 + K_H}\left(1 - e^{-k_{obs}t}\right) + k_{ss}t, \tag{6.9} $$

where kobs = (k+H + k−H) and kss is the steady-state turnover rate. The burst amplitude is given by KH/(1 + KH). In most cases, kss is slow and can be ignored. Knowing KH and kobs, one can calculate the hydrolysis (k+H) and ATP resynthesis (k−H) rate constants:

$$ k_{+H} = \frac{K_H k_{obs}}{1 + K_H} \quad \text{and} \quad k_{-H} = k_{obs} - k_{+H}. \tag{6.10} $$
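As a worked illustration of Eqs. (6.9) and (6.10), the sketch below takes the best-fit values quoted in the Figure 6.4 caption (an observed rate constant of 84 s⁻¹ and a burst amplitude of 0.43 Pi/myosin), inverts the burst amplitude to obtain KH, and then computes k+H and k−H. The function names are hypothetical, the quoted uncertainties and kss are ignored, and the calculation assumes the burst amplitude equals KH/(1 + KH), that is, that all myosin active sites are competent.

```python
import math

def burst_time_course(t, k_h, k_obs, k_ss=0.0):
    """Eq. (6.9): Pi formed per myosin at time t (s)."""
    return k_h / (1.0 + k_h) * (1.0 - math.exp(-k_obs * t)) + k_ss * t

def hydrolysis_constants(burst_amplitude, k_obs):
    """Invert the burst amplitude KH/(1 + KH), then apply Eq. (6.10)."""
    k_h = burst_amplitude / (1.0 - burst_amplitude)
    k_plus_h = k_h * k_obs / (1.0 + k_h)
    k_minus_h = k_obs - k_plus_h
    return k_h, k_plus_h, k_minus_h

# Best-fit values from the Figure 6.4 caption (uncertainties omitted).
k_h, k_plus_h, k_minus_h = hydrolysis_constants(0.43, 84.0)
# k_h is about 0.75, k_plus_h about 36 s^-1, and k_minus_h about 48 s^-1.
```

Because k+H works out to the burst amplitude multiplied by kobs, the uncertainty in the amplitude propagates directly into both rate constants, which is one reason the text recommends higher myosin concentrations in the quench-flow experiment.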

[Figure 6.4. Plot of [Pi]/[myosin] (0–0.8) versus time (0–0.2 s), with the burst phase and linear phase indicated; schematic labels: Myosin, ATP, Quench.]

Figure 6.4 Time course of ATP hydrolysis and ADP-Pi burst of myosin V. Time course of ADP-Pi formation by a single-headed myosin V construct after mixing with 100 µM ATP. The solid line is the best fit of the data to Eq. (6.9) with an observed rate constant of 84 ± 13 s⁻¹ and a burst amplitude of 0.43 ± 0.03 Pi/myosin. Data are from De La Cruz et al. (2000b).

5.4. Actin-activated Pi release

Multiple methods for measuring Pi are available, including the molybdate and charcoal extraction assays described earlier. These assays are not suited for real-time measurements and offer poor time resolution because they are typically done by manual mixing. In addition, the assays involve denaturation of the myosin and determination of total Pi formed, and therefore cannot distinguish bound Pi from free Pi released to the solution. The MESG/phosphorylase assay (Webb, 1992) is advantageous in that it provides real-time acquisition of free Pi in solution, monitored by absorbance. However, it has a sensitivity of 1 µM and can measure rates only up to 90 s⁻¹, and then only when the purine nucleoside phosphorylase enzyme is present at very high concentrations (>50 µM). While this is adequate for many experimental systems, it is too slow for measuring actin-activated Pi release from many myosin isoforms, which can be >100 s⁻¹ (De La Cruz et al., 1999; White et al., 1997). Actin-activated Pi release from myosin-ADP-Pi can be rapid (>100 s⁻¹) and can be measured only using the fluorescently labeled mutant of the Pi-binding protein (MDCC-labeled PiBiP; (7-diethylamino-3-((((2-maleimidyl)ethyl)amino)carbonyl) coumarin)-labeled phosphate binding protein) developed by Martin Webb (Brune et al., 1994; White et al., 1997), with the stopped-flow instrument in sequential mixing mode. PiBiP has the advantage over other detection methods in its sensitivity (10 nM) and its ability to measure rapid rates and rate constants (>700 s⁻¹) in real time, though its use can be difficult if significant Pi contaminates the solutions and glassware (which it always does). Background Pi must be removed from all solutions, syringes, and the instrument with Pi ‘‘mop’’ solution: 7-methylguanosine (0.2–0.5 mM) and purine nucleoside phosphorylase (0.1 units mL⁻¹), for at

least 1 h. We treat the instrument with mop solution overnight before performing an experiment. To accurately measure transient Pi release from actomyosin, the ATP binding (K1k+2) and hydrolysis (k+H + k−H) rate constants must be known. The experiment is done by first mixing myosin with ATP under single or multiple turnover conditions, aging for sufficient time to allow ATP binding and hydrolysis to occur (typically milliseconds to seconds), and then rapidly mixing with a range of actin filament concentrations in the second mix. There is a 5-fold enhancement in the fluorescence of PiBiP upon Pi binding (λex = 430 nm, 455-nm long-pass emission filter). PiBiP should be present at 5–10 µM and preferably included in the myosin, nucleotide, and actin solutions so that Pi binding to PiBiP is more rapid than Pi release from (acto)myosin. The rate and equilibrium constants of Pi binding to MDCC-PiBiP in KMg50 buffer at 25 °C are k+ = 117 (±8) µM⁻¹ s⁻¹, k− = 24 s⁻¹, and Kd = 0.20 µM (Henn and De La Cruz, 2005).
Method
1. Configure the stopped-flow instrument in sequential (i.e., double) mixing mode.
2. Treat the instrument with mop solution.
3. First mix and aging time: Rapidly mix myosin (syringe A) with ATP (syringe B) and age for sufficient time that ATP binding and hydrolysis (but not Pi release from myosin) occur. The myosin concentration needed will depend on the enzymatic behavior of the myosin, particularly the equilibrium constant for ATP hydrolysis (KH). In our experience, an initial mix with 4 µM myosin is a good starting point if KH favors the posthydrolysis states. If the value of KH is such that a significant fraction of the myosin-bound nucleotide will remain as ATP, higher myosin concentrations will be needed. The ATP concentration can be less than (single turnover) or greater than (multiple turnover) that of myosin.
Myosin motors that bind ADP rapidly and with high affinity can be forced to undergo a single turnover (De La Cruz et al., 2001; De La Cruz et al., 1999) even when [ATP] >> [myosin] by including excess (mM) ADP in the actin syringe (discussed subsequently). 4. Second mix: Rapidly mix the aged myosin-ATP/ADP.Pi with actin (syringe C) over a broad concentration range (0–tens of micromolar). If the myosin motor being characterized binds ADP strongly and rapidly, including mM ADP with the actin solution will ensure that ATP will not bind myosin after the initial round of product release and that a single turnover of Pi release will be measured. 5. Convert fluorescence intensity to Pi concentration by acquiring a standard curve. 6. Data analysis: Experimental time courses will follow single or double exponentials under single turnover conditions and single (or double)

181

Kinetic and Equilibrium Analysis of the Myosin ATPase

exponentials with a slope (steady state) under multiple turnover conditions in the presence of actin. Time courses in the absence of actin will appear flat over the seconds timescale, as Pi release from myosin alone is very slow (0.02 s⁻¹). We will limit our discussion of the analysis to time courses that follow either a single exponential or a single exponential with a steady-state slope (Fig. 6.5A) but refer the reader to notable exceptions (White et al., 1997). Plot the observed rate constant of the exponential phase versus the [actin]. The [actin] dependence should be either linear or hyperbolic (Fig. 6.5B). If linear, the slope yields the second-order association rate constant of myosin-ADP-Pi binding to actin (KAPi k′+Pi; Scheme 6.1), and the maximum rate of actin-activated Pi release (k′+Pi) is more rapid than the fastest experimentally observed rate constant. If hyperbolic, the maximum observed rate constant reflects the rate of Pi release from actomyosin (k′+Pi; Pi rebinding, k′−Pi, does not contribute since Pi release is irreversible in the presence of PiBiP) and the [actin] needed to reach half of k′+Pi reflects the dissociation constant of myosin-ADP-Pi binding to actin (KAPi). The slope under multiple turnover conditions reflects the steady-state ATPase activity at the actin and ATP concentrations present and should be comparable to that measured by other methods.

Figure 6.5 Pi release from actomyosin VI. (A) Time course of transient Pi release (fluorescence intensity versus time, 0–0.32 s) from a truncated myosin VI construct after mixing with 20 (curve a), 9 (curve b), or 0 µM (curve c) actin filaments. (B) Actin filament concentration dependence of the Pi release burst rate (kobs, s⁻¹, versus [actin], µM). Final concentrations at t = 0 were 1.5 µM myosin VI, 4.5 µM PiBiP, 100 µM ATP, and the indicated actin filament concentrations. Data are from De La Cruz et al. (2001).
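The PiBiP constants quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming one-step Pi binding under pseudo-first-order conditions ([PiBiP] >> [Pi]), so that the observed binding rate constant is k+[PiBiP] + k−. The function names and the hyperbola parameters (`k_max`, `k_half`) are illustrative, not part of the published method.

```python
# Rate constants for Pi binding to MDCC-PiBiP quoted in the text
# (KMg50 buffer, 25 C; Henn and De La Cruz, 2005).
K_ON = 117.0   # association rate constant, uM^-1 s^-1
K_OFF = 24.0   # dissociation rate constant, s^-1


def pibip_kd(k_on=K_ON, k_off=K_OFF):
    """Equilibrium dissociation constant (uM) from the two rate constants."""
    return k_off / k_on


def pibip_kobs(pibip_uM, k_on=K_ON, k_off=K_OFF):
    """Observed rate constant (s^-1) for Pi binding to PiBiP.

    Assumption: one-step binding under pseudo-first-order conditions.
    """
    return k_on * pibip_uM + k_off


def hyperbolic_kobs(actin_uM, k_max, k_half):
    """Hyperbolic [actin] dependence of the Pi-release burst rate constant.

    k_max plays the role of the maximum rate constant and k_half the [actin]
    at half-maximum; both are hypothetical fitting parameters here.
    """
    return k_max * actin_uM / (k_half + actin_uM)


if __name__ == "__main__":
    print(f"Kd = {pibip_kd():.3f} uM")
    for conc in (5.0, 10.0):
        print(f"{conc:.0f} uM PiBiP: kobs = {pibip_kobs(conc):.0f} s^-1")
```

Under these assumptions, the ratio k−/k+ reproduces the quoted Kd of about 0.2 µM, and 5–10 µM PiBiP gives binding rate constants of several hundred s⁻¹ or more, which is why Pi binding to the probe does not limit the observed release transient for most myosins.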

5.5. ADP release

The most reliable and informative way to measure ADP binding to actomyosin is by evaluating how it affects the pyrene fluorescence enhancement associated with ATP binding (described previously). The fluorescence of mantADP can also be monitored, as described for mantATP earlier, but it is less sensitive than the pyrene fluorescence assays we describe subsequently. ADP binding to pyrene actomyosin is not associated with a fluorescence change, so competition approaches are used to obtain the kinetic and equilibrium binding parameters. There are two different ways to design an experiment, both of which involve keeping the [ATP] constant and varying the [ADP]. One approach is to measure how ATP binds to a preequilibrated mixture of actomyosin and ADP (i.e., mix actomyosin-ADP with a solution of ATP). In this case, the [ADP] equilibrated with actomyosin is varied (Geeves, 1989). The second is to see how actomyosin responds to addition of a solution containing both ATP and ADP (i.e., mix actomyosin with a solution of ATP and ADP). The observed time courses will follow single or multiple exponentials in either case, depending on the ADP (and ATP) binding properties (Hannemann et al., 2005). Measuring exactly how the time courses vary with [ADP] permits determination of the binding mechanism and constants.

We note that performing the ADP/ATP competition experiments both ways can serve as a useful diagnostic tool. If ADP binds actomyosin in a rapid equilibrium, time courses of pyrene fluorescence enhancement will follow single exponentials with observed rate constants that become slower as [ADP] increases, regardless of how the mixing is done. If ADP dissociation from actomyosin is slow and not a rapid equilibrium, time courses will follow multiple exponentials at subsaturating [ADP] (on the order of the Kd for ADP binding) and single exponentials at high (i.e., saturating) [ADP], regardless of how the mixing is done. Subsequently we describe how time courses could behave for each experiment and how to analyze each possible case.

5.5.1. ATP binding to an equilibrated mixture of actomyosin and ADP

The rate of ADP release and the affinity of ADP for actomyosin are most commonly determined by measuring the rate of ATP-induced dissociation of myosin from pyrene-actin in the presence of ADP, as given by the following scheme:


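\[
\mathrm{A{\cdot}M{\cdot}ADP}
\;\overset{K'_{2D}K'_{1D}}{\rightleftharpoons}\;
\mathrm{A{\cdot}M} + \mathrm{ATP}
\;\xrightarrow{\;k_0\;}\;
\mathrm{A} + \mathrm{M{\cdot}ATP}
\qquad \text{(Scheme 6.4)}
\]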

The nucleotide concentrations and the method of data analysis depend on the affinity of ADP for the actomyosin complex. Two approaches are described subsequently.

Case 1: This procedure is used for myosins that have relatively weak ADP affinities. Experimental conditions are set so the rates of binding and dissociation of ADP are rapid compared with the rate of ATP binding.

Method
1. Determination of ADP affinity. Pyrene-actomyosin (1.0 µM) equilibrated with ADP is mixed with ATP in a stopped-flow fluorometer. Final ADP and ATP concentrations depend on the myosin properties, but 50 µM ATP and 0–500 µM ADP are suitable starting concentrations.
2. Fluorescence intensities of transients increase due to the dissociation of myosin from pyrene-actin, and the transients should be fit to a single exponential function to obtain a rate (kobs). If the time course is best fit by two exponentials, see Case 2. A plot of kobs versus ADP concentration should be hyperbolic (Fig. 6.6), where kobs is related to the ADP concentration by:

\[
k_{obs} = \frac{k_0}{1 + \dfrac{[\mathrm{ADP}]}{K'_{2D}K'_{1D}}},
\tag{6.11}
\]

where k0 is the observed rate constant of ATP binding and actomyosin dissociation in the absence of ADP at a given ATP concentration (K′1T k′+2T [ATP]; Fig. 6.1), and K′2D K′1D is the overall dissociation constant for ADP binding to actomyosin (Fig. 6.1).
3. Determination of the ADP dissociation rate constant (k′+2D). Pyrene-actomyosin (1.0 µM) equilibrated with ADP is mixed with ATP in a stopped-flow fluorometer. The ADP concentration should be high enough that all myosin active sites have a bound ADP. For example, if the Kd is 20 µM, the ADP concentration should be 100 µM. The ATP concentration should be high enough to displace the ADP. In this example, 1 mM ATP is sufficient. The fluorescence time course should fit a single exponential function, where the rate is equal to the rate of ADP dissociation (Fig. 6.1). The rate constant of ADP association can now be calculated by dividing the dissociation rate constant by the Kd determined earlier.
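The two calculations in the Case 1 method (the hyperbolic slowing of kobs with [ADP] in Eq. (6.11), and the association rate constant obtained from the dissociation rate constant and the Kd) can be sketched as follows. The Kd of 20 µM is the example value used in the text; the values of `k0` and the dissociation rate constant are hypothetical, and the function names are illustrative.

```python
def kobs_with_adp(adp_uM, k0, kd_adp_uM):
    """Eq. (6.11): observed rate constant of ATP-induced dissociation.

    k0        observed rate constant without ADP (s^-1)
    kd_adp_uM overall ADP dissociation constant, K'2D * K'1D (uM)
    """
    return k0 / (1.0 + adp_uM / kd_adp_uM)


def adp_association_rate_constant(k_dissociation, kd_adp_uM):
    """Association rate constant (uM^-1 s^-1) = dissociation rate constant / Kd."""
    return k_dissociation / kd_adp_uM


if __name__ == "__main__":
    kd = 20.0      # uM, example Kd from the text
    k0 = 50.0      # s^-1, hypothetical value
    for adp in (0.0, 20.0, 100.0, 500.0):
        print(f"[ADP] = {adp:5.0f} uM  kobs = {kobs_with_adp(adp, k0, kd):6.2f} s^-1")
    # Hypothetical ADP dissociation rate constant of 100 s^-1
    print(adp_association_rate_constant(100.0, kd), "uM^-1 s^-1")
```

In this form kobs falls to half of k0 when [ADP] equals the overall dissociation constant, which is a convenient visual check on a fitted curve.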


Figure 6.6 ADP dissociation determined by ATP-induced dissociation of pyrene-actomyo1b. (A) Pyrene fluorescence transient obtained by mixing 0.15 µM myo1b equilibrated with 2 µM ADP with 1 mM ATP. The time course is presented on a log scale (0.001–10 s) to show the fast phase (ATP binding to nucleotide-free myosin, kfast) and the slow phase (ADP release, kslow). (B) Normalized amplitude of the slow phase (Aslow) obtained by fitting pyrene transients to double exponential functions (Eq. (6.7)) as a function of ADP concentration ([ADP], µM). The solid line is a fit of the data to Eq. (6.12). Data are from Lewis et al. (2006).

Case 2: This procedure is used for myosins that have tight ADP affinities. Experimental conditions are set so the rate of ADP dissociation is slow compared with the rate of ATP binding.

Method
1. Pyrene-actomyosin (0.20 µM) equilibrated with ADP is mixed with ATP in a stopped-flow fluorometer, and the pyrene fluorescence is monitored as a function of time. Final ADP and ATP concentrations depend on the myosin properties, but 1.0 mM ATP and 0–50 µM ADP are suitable starting concentrations. The fluorescence increases over the time course due to ATP-induced dissociation of myosin from pyrene-actin. At low ADP concentrations, the data should be best fit by the sum of two exponentials (Eq. (6.7)). The observed rate constant of the fast component (kfast) reports
rapid binding of ATP to the fraction of nucleotide-free pyrene-actomyosin (Fig. 6.6A). The slow observed rate constant (kslow) reports ADP release. At high ADP concentrations, the pyrene-actomyosin should be saturated with ADP, and the transient is dominated by the slow component. A plot of the relative amplitude of the slow component (Aslow) versus the ADP concentration is hyperbolic (Fig. 6.6B), and the overall ADP affinity (K′2D K′1D) is obtained by fitting the data to:

\[
A_{slow} = \frac{[\mathrm{ADP}]}{K'_{2D}K'_{1D} + [\mathrm{ADP}]}.
\tag{6.12}
\]
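Eq. (6.12) can be rearranged to give a quick point-by-point estimate of the overall ADP dissociation constant before a full nonlinear fit is attempted. The sketch below is only an illustration under that assumption: the averaging estimator is a crude stand-in for nonlinear regression, the data are synthetic, and the function names are hypothetical.

```python
def slow_phase_amplitude(adp_uM, kd_uM):
    """Eq. (6.12): normalized slow-phase amplitude at a given [ADP]."""
    return adp_uM / (kd_uM + adp_uM)


def kd_from_amplitude(adp_uM, a_slow):
    """Eq. (6.12) rearranged for one data point: Kd = [ADP] * (1 - A) / A."""
    return adp_uM * (1.0 - a_slow) / a_slow


def estimate_kd(points):
    """Average the single-point estimates over (adp_uM, a_slow) pairs.

    Points with amplitudes of exactly 0 or 1 carry no information and are skipped.
    """
    estimates = [kd_from_amplitude(adp, a) for adp, a in points if 0.0 < a < 1.0]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    true_kd = 5.0  # uM, hypothetical value used to generate synthetic data
    data = [(adp, slow_phase_amplitude(adp, true_kd)) for adp in (1, 2, 5, 10, 20, 35)]
    print(f"estimated Kd = {estimate_kd(data):.2f} uM")
```

With real, noisy amplitudes the single-point estimates at very low and very high [ADP] are poorly determined, so a weighted nonlinear fit to Eq. (6.12) remains the appropriate final analysis.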

5.5.2. ATP and ADP binding to actomyosin

ADP binding can also be measured by kinetic competition, in which ATP and ADP in the same solution race to bind free actomyosin. In this case, nucleotide-free actomyosin is rapidly mixed with solutions of ATP supplemented with ADP. The ADP concentration is varied over a broad range. As described for the experiments earlier, time courses will follow single or double exponentials depending on the ADP-binding mechanism and constants. We discuss both possible cases and how to analyze the data to extract the actomyosin-ADP binding constants.

Case 1. Time courses follow single exponentials with kobs that gets slower with [ADP]. If time courses of fluorescence change after mixing pyrene actomyosin with a solution of ATP and ADP follow single exponentials at all [ADP] examined, ADP binds in a rapid equilibrium. As observed when ATP is mixed with a pre-equilibrated actomyosin and ADP solution, the observed rate constants should become slower as [ADP] increases. Identical results are obtained by both mixing methods because ADP equilibrates rapidly with actomyosin during the mixing time in this experiment. That is, equilibrium between actomyosin and ADP is reached during the mixing time, as it would be if the sample were allowed to equilibrate before mixing with ATP. The actomyosin-ADP binding affinity can be determined as described earlier.

Case 2. Time courses follow double exponentials. When ADP release is slower than ATP binding and not in rapid equilibrium, time courses of pyrene fluorescence enhancement after mixing a solution of ADP and ATP with actomyosin are biphasic and follow double exponentials (Fig. 6.7A and B) with fast and slow phases that depend on the [ADP] when the [ATP] is held constant (Fig. 6.7C and D). The method simply involves mixing actomyosin with a solution of ATP in which ADP is varied over a broad concentration range. The fast phase observed rate constant may depend hyperbolically on the [ADP] (Fig. 6.7C), indicating that ADP binding, like ATP binding, occurs via a two-step binding process and that

Figure 6.7 ADP binding and dissociation from actomyosin VIIb measured by kinetic competition with ATP. (A) Time courses of fluorescence enhancement (fraction weakly bound versus time, 0–0.08 s) after mixing 0.1 µM pyrene actomyosin VIIb with 100 µM ATP supplemented with either no MgADP (curve a) or 10 µM (curve b), 20 µM (curve c), 50 µM (curve d), or 300 µM (curve e) MgADP. Curve f is actomyosin VIIb mixed with no nucleotide. Concentrations are final after mixing. Smooth lines through the data represent best fits to a double exponential. (B) Time courses shown on a logarithmic time scale (0.001–100 s). (C) [ADP]-dependence of the fast phase observed rate constant (kfast, s⁻¹, versus [ADP], µM) measured by kinetic competition. The solid line is the best fit to Eq. (6.14). (D) [ADP]-dependence of the slow phase observed rate constant (kslow, s⁻¹, versus [ADP], µM). The solid line is the best fit to Eq. (6.15). Data are from Henn and De La Cruz (2005).

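competitive ATP and ADP binding to actomyosin (AM) can be described by the following parallel reaction mechanism:

\[
\begin{array}{ccccc}
\mathrm{A{\cdot}M} + \mathrm{ATP}
 & \overset{K'_{1T}}{\rightleftharpoons}
 & \mathrm{A{\cdot}M(ATP)}
 & \underset{k'_{-2T}}{\overset{k'_{+2T}}{\rightleftharpoons}}
 & \mathrm{A^{**}{\cdot}M{\cdot}ATP} \\[4pt]
{\scriptstyle K'_{1D}}\,\updownarrow & & & & \\[4pt]
\mathrm{A{\cdot}M(ADP)} & & & & \\[4pt]
{\scriptstyle k'_{+2D}}\,\updownarrow\,{\scriptstyle k'_{-2D}} & & & & \\[4pt]
\mathrm{A{\cdot}M{\cdot}ADP} & & & &
\end{array}
\qquad \text{(Scheme 6.5)}
\]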

where A** denotes a high (unquenched) pyrene fluorescence and the parentheses indicate collision complexes in rapid equilibrium with
dissociated species. K′1T denotes an association constant and K′1D denotes a dissociation constant to reflect progression through the ATPase cycle (Fig. 6.1). Note that this nomenclature differs from that of the papers in which this method was described (Henn and De La Cruz, 2005; Olivares et al., 2006; Robblee et al., 2004, 2005). The observed rate constant of the fast phase (kfast) reflects the depletion of free actomyosin and therefore depends on the sum of the observed rate constants for ATP (kATP) and ADP (kADP) binding, which can be expressed as:

\[
k_{ATP} = \frac{K'_{1T}[\mathrm{ATP}]\,k'_{+2T}}{1 + K'_{1T}[\mathrm{ATP}] + \dfrac{[\mathrm{ADP}]}{K'_{1D}}}
\quad\text{and}\quad
k_{ADP} = \frac{\dfrac{[\mathrm{ADP}]\,k'_{-2D}}{K'_{1D}}}{1 + K'_{1T}[\mathrm{ATP}] + \dfrac{[\mathrm{ADP}]}{K'_{1D}}},
\tag{6.13}
\]

when nucleotide binding is irreversible, a condition that is fulfilled when nucleotide dissociation is slower than binding. The [ADP]-dependence of the fast phase observed rate constants (kfast) should be fitted to a rectangular hyperbola in the form of the following expression:

\[
k_{fast} = \frac{K'_{1T}[\mathrm{ATP}]\,k'_{+2T} + \dfrac{[\mathrm{ADP}]\,k'_{-2D}}{K'_{1D}}}{1 + K'_{1T}[\mathrm{ATP}] + \dfrac{[\mathrm{ADP}]}{K'_{1D}}},
\tag{6.14}
\]

with the ATP binding constants (K′1T and k′+2T) constrained to the values obtained independently from ATP binding experiments (described previously). These constraints allow the ADP binding parameters (K′1D and k′−2D) to be readily obtained.

The slow phase of the reaction arises from actomyosin-ADP that is formed by kinetic partitioning during the fast phase and that subsequently releases its bound ADP and then binds ATP to populate the high-fluorescence, weak-binding states. The [ADP]-dependence of the slow phase observed rate constant (kslow) also follows a hyperbola (Fig. 6.7D), but with negative amplitude (i.e., it becomes slower as [ADP] increases). The slow observed rate constant is equal to the rate constant of ADP dissociation (k′+2D) times the probability that ATP will bind instead of ADP (kATP/(kATP + kADP)) and should be fitted to the following equation with the ATP-binding parameters (K′1T and k′+2T) constrained, as when fitting the fast phase (described previously):

\[
k_{slow} = \frac{k'_{+2D}\,k_{ATP}}{k_{ATP} + k_{ADP}}.
\tag{6.15}
\]

When kATP >> kADP (e.g., when [ADP] approaches zero and/or [ATP] >> [ADP]), ADP rebinding (kADP) is insignificant, ADP release is essentially irreversible, and kslow simplifies to k′+2D. The rate constant of ADP release from actomyosin (k′+2D) can therefore be readily obtained by extrapolating the best fit of kslow versus [ADP] to the limit of [ADP] = 0 (i.e., the intercept, Fig. 6.7D). The overall ADP binding affinity is given by the product of both equilibrium constants (K′1D K′2D) and can be determined from the values of K′1D and k′−2D (obtained from fast-phase analysis) and k′+2D (obtained from slow-phase analysis). The final amplitudes reflect the equilibrium partitioning among strong and weak binding states as dictated by the nucleotide binding affinities and concentrations (Henn and De La Cruz, 2005; Robblee et al., 2004).
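The relationships among Eqs. (6.13)–(6.15) can be made concrete with a short sketch. All parameter values below are hypothetical and chosen only for illustration. The last function assumes that K′2D is the ratio k′+2D/k′−2D (the isomerization written in the ADP-release direction), which is an interpretation consistent with the text rather than a formula stated in it.

```python
def _denominator(atp, adp, K1T, K1D):
    # Shared denominator of Eqs. (6.13) and (6.14)
    return 1.0 + K1T * atp + adp / K1D


def k_atp(atp, adp, K1T, k_plus2T, K1D):
    """Eq. (6.13), left: observed rate constant for ATP binding."""
    return K1T * atp * k_plus2T / _denominator(atp, adp, K1T, K1D)


def k_adp(atp, adp, K1T, K1D, k_minus2D):
    """Eq. (6.13), right: observed rate constant for ADP binding."""
    return (adp * k_minus2D / K1D) / _denominator(atp, adp, K1T, K1D)


def k_fast(atp, adp, K1T, k_plus2T, K1D, k_minus2D):
    """Eq. (6.14): the fast phase is the sum of the two binding fluxes."""
    return (k_atp(atp, adp, K1T, k_plus2T, K1D)
            + k_adp(atp, adp, K1T, K1D, k_minus2D))


def k_slow(atp, adp, K1T, k_plus2T, K1D, k_minus2D, k_plus2D):
    """Eq. (6.15): ADP release times the probability that ATP binds next."""
    katp = k_atp(atp, adp, K1T, k_plus2T, K1D)
    kadp = k_adp(atp, adp, K1T, K1D, k_minus2D)
    return k_plus2D * katp / (katp + kadp)


def overall_adp_kd(K1D, k_plus2D, k_minus2D):
    """Overall dissociation constant K'1D * K'2D, assuming K'2D = k'+2D / k'-2D."""
    return K1D * k_plus2D / k_minus2D


if __name__ == "__main__":
    # Hypothetical constants: K1T in uM^-1, K1D in uM, rate constants in s^-1
    p = dict(K1T=0.002, k_plus2T=1000.0, K1D=100.0, k_minus2D=400.0)
    for adp in (0.0, 50.0, 300.0):
        fast = k_fast(100.0, adp, **p)
        slow = k_slow(100.0, adp, k_plus2D=1.5, **p)
        print(f"[ADP] = {adp:5.0f} uM  kfast = {fast:7.1f}  kslow = {slow:5.2f}")
    print("overall Kd =", overall_adp_kd(100.0, 1.5, 400.0), "uM")
```

Evaluating `k_slow` at [ADP] = 0 returns k′+2D itself, which is the extrapolation to the intercept described above.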

6. Kinetic Simulations

We have focused most of this chapter on designing and carrying out steady-state binding and transient kinetic experiments to directly measure the individual myosin ATPase cycle reactions, and on analyzing the concentration dependence of the observed behavior by nonlinear regression to extract the fundamental rate and equilibrium constants. There are often instances in which individual ATPase cycle reactions cannot be measured and/or the experimental conditions required for the fitting equations to apply (e.g., pseudo-first-order conditions) are not fulfilled. In these cases, one must rely on kinetic simulations and global fitting to analyze the experimental data. Although analyzing experimental data with analytical solutions of the rate equations is ideal, deriving them can be complex and labor intensive (Henn et al., 2008; Johnson, 1986), particularly if analysis of the amplitudes is desired (Hannemann et al., 2005). Kinetic simulations can therefore be viewed as a practical and extremely valuable alternative. We would argue that kinetic simulations should be considered an essential part of any transient kinetic analysis, at a minimum to confirm that the derived model and associated binding constants reliably account for the experimentally observed data, including the amplitudes. We also regularly use kinetic simulations to help design experiments (e.g., to choose the concentrations and timescales over which to collect data). Because of space limitations, we will not discuss the use of kinetic simulations and global fitting in the kinetic analysis of myosin motors.


Rather, we direct the reader to some key papers in which kinetic simulations have been used to characterize complex reaction pathways (Frieden, 1983; Moore and Lohman, 1994), authoritative reviews (Frieden, 1994), and tutorials (Wachsstock and Pollard, 1994), and we recommend various kinetic simulation programs that can be readily accessed for free through the Internet. We also remind readers that it is very unlikely that the constants used to fit a complex, multistep mechanism are a unique solution to the data, as only a small subset of the kinetic parameters may influence the observed signal of a given experiment. Therefore, constraining measured constants to the experimentally determined values will minimize the number of unknown fitting parameters and increase the likelihood of reliably identifying and characterizing unobservable chemical transitions. Most kinetic simulation programs available are modern, user-friendly programs based on the original KINSIM program developed by Carl Frieden and colleagues (Dang and Frieden, 1997). These programs simulate reaction time courses of a molecular mechanism provided by the user by deriving and numerically solving the differential equations for the concentrations and flux of all chemical species identified in the mechanism. KINSIM has a companion program, FITSIM, which permits fitting mechanism parameters to real data. In general, we use KINSIM to identify plausible mechanisms that can account for the experimental data (and eliminate many that cannot) then use FITSIM to fit the data and extract the rate and equilibrium constants that best account for the data according to the defined reaction mechanism. The more recent kinetic simulation programs incorporate both simulation and fitting modules into a single program. We recommend KINSIM/FITSIM (www.biochem.wustl.edu/ cflab/message.html), Tenua (http://www.geocities.com/tenua4java/), and Dynafit (http://www.biokin.com/dynafit/). 
We also recommend the KinTek Global Kinetic Explorer (http://www.kintek-corp.com/) and Berkeley-Madonna (http://www.berkeleymadonna.com/), but these must be purchased to access the complete software package with importing and saving options.
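The numerical approach described above (writing the rate equations for every species and integrating them) can be illustrated in a few lines. The following is a minimal sketch, not KINSIM or any of the programs listed: it integrates a simplified version of Scheme 6.4 with a hand-written fourth-order Runge-Kutta step. The rate constants, concentrations, step size, and function names are all hypothetical values chosen for illustration.

```python
import math


def derivatives(state, k_off, k_on, k_atp):
    """Rate equations for: AM.ADP <-> AM + ADP ; AM + ATP -> A + M.ATP."""
    am_adp, am, adp, atp, dissociated = state
    release = k_off * am_adp          # ADP release from actomyosin.ADP
    rebind = k_on * am * adp          # ADP rebinding to free actomyosin
    atp_step = k_atp * am * atp       # ATP binding and actomyosin dissociation
    return (
        -release + rebind,
        release - rebind - atp_step,
        release - rebind,
        -atp_step,
        atp_step,
    )


def rk4_step(state, dt, rates):
    """Advance all concentrations by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(s + h * k for s, k in zip(base, slope))

    k1 = derivatives(state, *rates)
    k2 = derivatives(shifted(state, k1, dt / 2.0), *rates)
    k3 = derivatives(shifted(state, k2, dt / 2.0), *rates)
    k4 = derivatives(shifted(state, k3, dt), *rates)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, t_end, dt, rates):
    """Integrate from t = 0 to t_end and return the final concentrations."""
    state = initial_state
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, rates)
    return state


if __name__ == "__main__":
    # Hypothetical conditions: 0.1 uM actomyosin.ADP mixed with 1000 uM ATP
    rates = (1.0, 1.0, 1.0)   # k_off (s^-1), k_on (uM^-1 s^-1), k_atp (uM^-1 s^-1)
    start = (0.1, 0.0, 0.0, 1000.0, 0.0)
    end = simulate(start, math.log(2.0) / rates[0], 1e-4, rates)
    print(f"fraction dissociated after one half-time: {end[4] / 0.1:.3f}")
```

With ATP in large excess and ADP release rate-limiting, the simulated dissociation follows the ADP release rate constant, so about half of the actomyosin has dissociated after one half-time of ADP release. Dedicated programs add adaptive integration and global fitting, which this sketch deliberately omits.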

ACKNOWLEDGMENTS

We gratefully acknowledge support from the various funding agencies that support the research activities of our laboratories. E.M.D.L.C. thanks the National Institutes of Health for supporting myosin research activities under award GM071688, and the National Science Foundation (MCB-0546353), the American Heart Association (Grant 0655849T), and the Hellman Family Foundation for supporting other research. E.M.D.L.C. is an American Heart Association Established Investigator (0940075N) and recipient of a National Science Foundation CAREER Award. E.M.O. is supported by grants from the National Institutes of Health (GM057247 and AR051174).


REFERENCES

Bagshaw, C. R., Eccleston, J. F., Eckstein, F., Goody, R. S., Gutfreund, H., and Trentham, D. R. (1974). The magnesium ion-dependent adenosine triphosphatase of myosin. Two-step processes of adenosine triphosphate association and adenosine diphosphate dissociation. Biochem. J. 141, 351–364.
Berg, J. S., Powell, B. C., and Cheney, R. E. (2001). A millennial myosin census. Mol. Biol. Cell 12, 780–794.
Brune, M., Hunter, J. L., Corrie, J. E., and Webb, M. R. (1994). Direct, real-time measurement of rapid inorganic phosphate release using a novel fluorescent probe and its application to actomyosin subfragment 1 ATPase. Biochemistry 33, 8262–8271.
Chalovich, J. M., and Eisenberg, E. (1982). Inhibition of actomyosin ATPase activity by troponin-tropomyosin without blocking the binding of myosin to actin. J. Biol. Chem. 257, 2432–2437.
Coluccio, L. M., and Geeves, M. A. (1999). Transient kinetic analysis of the 130-kDa myosin I (MYR-1 gene product) from rat liver. A myosin I designed for maintenance of tension? J. Biol. Chem. 274, 21575–21580.
Criddle, A. H., Geeves, M. A., and Jeffries, T. (1985). The use of actin labelled with N-(1-pyrenyl)iodoacetamide to study the interaction of actin with myosin subfragments and troponin/tropomyosin. Biochem. J. 232, 343–349.
Dang, Q., and Frieden, C. (1997). New PC versions of the kinetic-simulation and fitting programs, KINSIM and FITSIM. Trends Biochem. Sci. 22, 317.
De La Cruz, E., and Pollard, T. D. (1994). Transient kinetic analysis of rhodamine phalloidin binding to actin filaments. Biochemistry 33, 14387–14392.
De La Cruz, E. M., and Ostap, E. M. (2004). Relating biochemistry and function in the myosin superfamily. Curr. Opin. Cell. Biol. 16, 61–67.
De La Cruz, E. M., Ostap, E. M., and Sweeney, H. L. (2001). Kinetic mechanism and regulation of myosin VI. J. Biol. Chem. 276, 32373–32381.
De La Cruz, E. M., Sweeney, H. L., and Ostap, E. M. (2000a). ADP inhibition of myosin V ATPase activity. Biophys. J. 79, 1524–1529.
De La Cruz, E. M., Wells, A. L., Rosenfeld, S. S., Ostap, E. M., and Sweeney, H. L. (1999). The kinetic mechanism of myosin V. Proc. Natl. Acad. Sci. USA 96, 13726–13731.
De La Cruz, E. M., Wells, A. L., Sweeney, H. L., and Ostap, E. M. (2000b). Actin and light chain isoform dependence of myosin V kinetics. Biochemistry 39, 14196–14202.
Dose, A. C., Ananthanarayanan, S., Moore, J. E., Burnside, B., and Yengo, C. M. (2007). Kinetic mechanism of human myosin IIIA. J. Biol. Chem. 282, 216–231.
El Mezgueldi, M., Tang, N., Rosenfeld, S. S., and Ostap, E. M. (2002). The kinetic mechanism of Myo1e (human myosin-IC). J. Biol. Chem. 277, 21514–21521.
Foth, B. J., Goedecke, M. C., and Soldati, D. (2006). New insights into myosin evolution and classification. Proc. Natl. Acad. Sci. USA 103, 3681–3686.
Frieden, C. (1983). Polymerization of actin: Mechanism of the Mg2+-induced process at pH 8 and 20 degrees C. Proc. Natl. Acad. Sci. USA 80, 6513–6517.
Frieden, C. (1994). Analysis of kinetic data: Practical applications of computer simulation and fitting programs. Methods Enzymol. 240, 311–322.
Geeves, M. A. (1989). Dynamic interaction between actin and myosin subfragment 1 in the presence of ADP. Biochemistry 28, 5864–5871.
Geeves, M. A., and Holmes, K. C. (1999). Structural mechanism of muscle contraction. Annu. Rev. Biochem. 68, 687–728.
Gilbert, S. P., Webb, M. R., Brune, M., and Johnson, K. A. (1995). Pathway of processive ATP hydrolysis by kinesin. Nature 373, 671–676.
Hannemann, D. E., Cao, W., Olivares, A. O., Robblee, J. P., and De La Cruz, E. M. (2005). Magnesium, ADP, and actin binding linkage of myosin V: Evidence for multiple myosin V-ADP and actomyosin V-ADP states. Biochemistry 44, 8826–8840.
Henn, A., Cao, W., Hackney, D. D., and De La Cruz, E. M. (2008). The ATPase cycle mechanism of the DEAD-box rRNA helicase, DbpA. J. Mol. Biol. 377, 193–205.
Henn, A., and De La Cruz, E. M. (2005). Vertebrate myosin VIIb is a high duty ratio motor adapted for generating and maintaining tension. J. Biol. Chem. 280, 39665–39676.
Hiratsuka, T. (1983). New ribose-modified fluorescent analogs of adenine and guanine nucleotides available as substrates for various enzymes. Biochim. Biophys. Acta 742, 496–508.
Johnson, K. A. (1986). Rapid kinetic analysis of mechanochemical adenosinetriphosphatases. Methods Enzymol. 134, 677–705.
Johnson, K. A., and Taylor, E. W. (1978). Intermediate states of subfragment 1 and actosubfragment 1 ATPase: Reevaluation of the mechanism. Biochemistry 17, 3432–3442.
Kouyama, T., and Mihashi, K. (1981). Fluorimetry study of N-(1-pyrenyl)iodoacetamide-labelled F-actin. Local structural change of actin protomer both on polymerization and on binding of heavy meromyosin. Eur. J. Biochem. 114, 33–38.
Kovacs, M., Wang, F., Hu, A., Zhang, Y., and Sellers, J. R. (2003). Functional divergence of human cytoplasmic myosin II: Kinetic characterization of the non-muscle IIA isoform. J. Biol. Chem. 278, 38132–38140.
Laakso, J. M., Lewis, J. H., Shuman, H., and Ostap, E. M. (2008). Myosin I can act as a molecular force sensor. Science 321, 133–136.
Lanzetta, P. A., Alvarez, L. J., Reinach, P. S., and Candia, O. A. (1979). An improved assay for nanomole amounts of inorganic phosphate. Anal. Biochem. 100, 95–97.
Lewis, J. H., Lin, T., Hokanson, D. E., and Ostap, E. M. (2006). Temperature dependence of nucleotide association and kinetic characterization of myo1b. Biochemistry 45, 11589–11597.
Lin, T., Tang, N., and Ostap, E. M. (2005). Biochemical and motile properties of Myo1b splice isoforms. J. Biol. Chem. 280, 41562–41567.
Lymn, R. W., and Taylor, E. W. (1970). Transient state phosphate production in the hydrolysis of nucleoside triphosphates by myosin. Biochemistry 9, 2975–2983.
Lymn, R. W., and Taylor, E. W. (1971). Mechanism of adenosine triphosphate hydrolysis by actomyosin. Biochemistry 10, 4617–4624.
Lynch, T. J., Brzeska, H., Baines, I. C., and Korn, E. D. (1991). Purification of myosin I and myosin I heavy chain kinase from Acanthamoeba castellanii. Methods Enzymol. 196, 12–23.
Manceva, S., Lin, T., Pham, H., Lewis, J. H., Goldman, Y. E., and Ostap, E. M. (2007). Calcium regulation of calmodulin binding to and dissociation from the myo1c regulatory domain. Biochemistry 46, 11718–11726.
McKillop, D. F., and Geeves, M. A. (1993). Regulation of the interaction between actin and myosin subfragment 1: Evidence for three states of the thin filament. Biophys. J. 65, 693–701.
Moore, K. J., and Lohman, T. M. (1994). Kinetic mechanism of adenine nucleotide binding to and hydrolysis by the Escherichia coli Rep monomer. 2. Application of a kinetic competition approach. Biochemistry 33, 14565–14578.
Oguchi, Y., Mikhailenko, S. V., Ohki, T., Olivares, A. O., De La Cruz, E. M., and Ishiwata, S. (2008). Load-dependent ADP binding to myosins V and VI: Implications for subunit coordination and function. Proc. Natl. Acad. Sci. USA 105, 7714–7719.
Olivares, A. O., Chang, W., Mooseker, M. S., Hackney, D. D., and De La Cruz, E. M. (2006). The tail domain of myosin Va modulates actin binding to one head. J. Biol. Chem. 281, 31326–31336.
Ostap, E. M., and Pollard, T. D. (1996). Biochemical kinetic characterization of the Acanthamoeba myosin-I ATPase. J. Cell Biol. 132, 1053–1060.
Pollard, T. D. (1982). Assays for myosin. Methods Enzymol. 85(Pt. B), 123–130.
Robblee, J. P., Cao, W., Henn, A., Hannemann, D. E., and De La Cruz, E. M. (2005). Thermodynamics of nucleotide binding to actomyosin V and VI: A positive heat capacity change accompanies strong ADP binding. Biochemistry 44, 10238–10249.
Robblee, J. P., Olivares, A. O., and De La Cruz, E. M. (2004). Mechanism of nucleotide binding to actomyosin VI: Evidence for allosteric head-head communication. J. Biol. Chem. 279, 38608–38617.
Rosenfeld, S. S., and Taylor, E. W. (1987). The mechanism of regulation of actomyosin subfragment 1 ATPase. J. Biol. Chem. 262, 9984–9993.
Rosenfeld, S. S., Xing, J., Whitaker, M., Cheung, H. C., Brown, F., Wells, A., Milligan, R. A., and Sweeney, H. L. (2000). Kinetic and spectroscopic evidence for three actomyosin:ADP states in smooth muscle. J. Biol. Chem. 275, 25418–25426.
Spudich, J. A., and Watt, S. (1971). The regulation of rabbit skeletal muscle contraction. I. Biochemical studies of the interaction of the tropomyosin-troponin complex with actin and the proteolytic fragments of myosin. J. Biol. Chem. 246, 4866–4871.
Taylor, E. W. (1991). Kinetic studies on the association and dissociation of myosin subfragment 1 and actin. J. Biol. Chem. 266, 294–302.
Uemura, S., Higuchi, H., Olivares, A. O., De La Cruz, E. M., and Ishiwata, S. (2004). Mechanochemical coupling of two substeps in a single myosin V motor. Nat. Struct. Mol. Biol. 11, 877–883.
Veigel, C., Molloy, J. E., Schmitz, S., and Kendrick-Jones, J. (2003). Load-dependent kinetics of force production by smooth muscle myosin measured with optical tweezers. Nat. Cell. Biol. 5, 980–986.
Wachsstock, D. H., and Pollard, T. D. (1994). Transient state kinetics tutorial using the kinetics simulation program, KINSIM. Biophys. J. 67, 1260–1273.
Webb, M. R. (1992). A continuous spectrophotometric assay for inorganic phosphate and for measuring phosphate release kinetics in biological systems. Proc. Natl. Acad. Sci. USA 89, 4884–4887.
White, H. D., Belknap, B., and Webb, M. R. (1997). Kinetics of nucleoside triphosphate cleavage and phosphate release steps by associated rabbit skeletal actomyosin, measured using a novel fluorescent probe for phosphate. Biochemistry 36, 11828–11836.
White, H. D., and Rayment, I. (1993). Kinetic characterization of reductively methylated myosin subfragment 1. Biochemistry 32, 9859–9865.
Yengo, C. M., De La Cruz, E. M., Safer, D., Ostap, E. M., and Sweeney, H. L. (2002). Kinetic characterization of the weak binding states of myosin V. Biochemistry 41, 8508–8517.

CHAPTER SEVEN

The Hill Coefficient: Inadequate Resolution of Cooperativity in Human Hemoglobin

Jo M. Holt¹ and Gary K. Ackers

Contents
1. Introduction
2. Cooperativity and Intrinsic Binding
3. The Macroscopic Binding Isotherm
4. The Hill Coefficient
   4.1. Formulation of the Adair constants
   4.2. Redefinition of the Hill coefficient by Wyman
5. Microscopic Cooperativity in Hemoglobin
   5.1. The hemoglobin binding cascade
   5.2. Insensitivity of the binding isotherm
   5.3. Insensitivity of the Hill coefficient
6. Summary
References

Abstract

The Hill coefficient nH is a dimensionless parameter that has long been used as a measure of the extent of cooperativity. Originally derived from the oxygen-binding curve of human hemoglobin (Hb) by A. V. Hill in 1910, and reinvented by J. Wyman several decades later, nH is indexed to the stoichiometry of ligation and is indirectly related to the overall cooperative free energy for binding all four oxygen ligands. However, the overall cooperative free energy of Hb ligation can be measured directly by experimental methods. The microscopic cooperative free energies that relate to energetic coupling between specific subunit pairs can also be experimentally determined, while the Hill coefficient is, by its nature, a macroscopic parameter that cannot detect differences among specific subunit-subunit couplings. Its continued use in studies of the mechanism of cooperativity in Hb is therefore of increasingly limited value.

Emeritus, Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri, USA
¹ Corresponding author

Methods in Enzymology, Volume 455 ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04207-9

© 2009 Elsevier Inc. All rights reserved.


1. Introduction

Experimental and theoretical studies regarding the cooperative binding of O2 to human hemoglobin (Hb) have a long history, dating back more than 100 years. Like many historical analytical methods, the Hill coefficient, nH, purports to describe an entire macromolecular system with a single determinant. The value of nH continues to be commonly used in Hb studies, if only to conclude that an applied perturbation resulted in increased or decreased overall cooperativity. This provides no information as to how the perturbation affected the individual energetic couplings among the four subunits. Over the decades, experimental protocols have been invented, optimized, and reinvented to yield greater and greater resolution of the functionality and structure of Hb. This evolution in methodology has been necessitated by the fact that cooperativity is manifested at the level of specific subunit-subunit coupling, whose individual constants lie beyond the information available from binding isotherms alone. Because of its remarkable longevity and its continued use today, it is of interest to retrace the origins of the Hill coefficient, its demise, and its rebirth, and to document its insensitivity to modern microscopic binding constants. First, it is necessary to define both cooperativity and the binding isotherm for Hb.

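2. Cooperativity and Intrinsic Binding

Many regulatory mechanisms are based upon communication between sites within the same molecule. The term cooperativity is commonly reserved for a subset of this general category of intramolecular coupling, specifically, the response of one site to ligation at another site. In the case of human hemoglobin (Hb), sequential binding of O2 to all four sites occurs with increasing affinity:

$$
\begin{aligned}
\mathrm{Hb} + \mathrm{O_2} &\overset{K_{0\to1}}{\rightleftharpoons} \mathrm{HbO_2}\\
\mathrm{HbO_2} + \mathrm{O_2} &\overset{K_{1\to2}}{\rightleftharpoons} \mathrm{Hb(O_2)_2}\\
\mathrm{Hb(O_2)_2} + \mathrm{O_2} &\overset{K_{2\to3}}{\rightleftharpoons} \mathrm{Hb(O_2)_3}\\
\mathrm{Hb(O_2)_3} + \mathrm{O_2} &\overset{K_{3\to4}}{\rightleftharpoons} \mathrm{Hb(O_2)_4}.
\end{aligned}
\tag{7.1}
$$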

The stepwise binding constants are denoted as Ki→i+1, where i is the number of O2 bound. The binding constants are measured in the order K3→4 > K2→3 > K1→2 > K0→1, and the binding curve for tetrameric Hb exhibits the distinct sigmoidal shape characteristic of positive cooperativity (Fig. 7.1). The shape of the curve tells us that the total free energy for binding O2 to all four heme sites is not the same value as the sum of the intrinsic free energies of binding to the four sites:

$$
\Delta G_{0\to4} \neq \Delta G_{\alpha1} + \Delta G_{\alpha2} + \Delta G_{\beta1} + \Delta G_{\beta2}.
\tag{7.2a}
$$

Figure 7.1 Oxygen binding by human Hb. Upper: The deoxy Hb tetramer (subunits α1, β2, β1, and α2), viewed down its central axis. The four heme groups are shown in black. The tetramer is in equilibrium with free, noncooperative αβ dimers (α1β1 and α2β2). Each O2 binds to the Fe²⁺ of the heme prosthetic group, which is noncovalently associated with each subunit. Lower: The tetrameric binding curve (fractional saturation versus ligand concentration, [O2], in mm Hg), with experimental error denoted by the gray shaded area. Values for the Adair constants under these high-affinity conditions (21.5 °C, pH 7.4, 0.18 M Cl⁻) are 3e4, 10e8, 1.5e14, and 2e20 M⁻¹, for K1, K2, K3, and K4, respectively, and the Hill coefficient is 3.5 (Holt et al., 2005).


The intrinsic free energy for site α1, for example, is defined as the change in free energy for O2 binding to α1 in the absence of binding to α2, β1, and β2 (Ackers and Halvorson, 1974; Pauling, 1935). For Hb, it is necessary to add an additional term, ΔGc, to Eq. (7.2a) to yield the equality:

$$
\Delta G_{0\to4} = \Delta G_{\alpha1} + \Delta G_{\alpha2} + \Delta G_{\beta1} + \Delta G_{\beta2} + \Delta G_c.
\tag{7.2b}
$$

The term ΔGc is the cooperative free energy, which is the change in free energy that occurs due to the interactions between subunits upon O2 binding. This is the energetic component of Hb cooperativity that is of interest in understanding how one subunit communicates with another subunit within the tetramer. The value of the intrinsic free energy of binding in Hb is designated as the binding constant of the free αβ dimer (Mills et al., 1976), which is in reversible equilibrium with the α2β2 tetramer under all conditions:

$$
2(\alpha\beta) \overset{K_{int}}{\rightleftharpoons} \alpha_2\beta_2.
\tag{7.3}
$$

The free αβ dimer acts as an ideal thermodynamic reference state for the Hb system, as it binds O2 with no cooperativity, and Kint can be measured simultaneously with Ki→i+1, under the same exact experimental conditions. In normal human Hb, under the standard conditions used herein for temperature (21.5 °C), pH (7.4), and ionic strength (0.18 M chloride), the four intrinsic free energies have the same value (−8.35 ± 0.05 kcal/mol), and the ΔGc for overall binding (i.e., binding all four O2) is +6.3 ± 0.2 kcal/mol (Ackers, 1998). The total change in free energy for binding all four O2 ligands is therefore:

$$
\begin{aligned}
\Delta G_{0\to4} &= \Delta G_{\alpha1} + \Delta G_{\alpha2} + \Delta G_{\beta1} + \Delta G_{\beta2} + \Delta G_c\\
&= 4(-8.35) + 6.3 = -27.1\ \text{kcal/mol}.
\end{aligned}
\tag{7.4}
$$

The total free energy is related to the equilibrium binding constants as:

$$
\Delta G_{0\to4} = -RT \ln K_{0\to4},
$$

where

$$
K_{0\to4} = K_{\alpha1} K_{\alpha2} K_{\beta1} K_{\beta2} K_c,
$$

and with equal intrinsics

$$
K_{int} = K_{\alpha1} = K_{\alpha2} = K_{\beta1} = K_{\beta2},
$$

the expression in terms of stepwise binding is:

$$
K_{0\to4} = K_{int}K_{c(0\to1)} \cdot K_{int}K_{c(1\to2)} \cdot K_{int}K_{c(2\to3)} \cdot K_{int}K_{c(3\to4)}.
\tag{7.5}
$$

Thus, each binding constant is partitioned into two components.
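The arithmetic of Eqs. (7.4) and (7.5) can be checked with a short Python sketch. It is only an illustration, not part of the original analysis: it takes the intrinsic free energy as −8.35 kcal/mol per site and the cooperative free energy as +6.3 kcal/mol, and it assumes R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹ and a 1 M standard state, so the resulting constants are in molar units. The function name `free_energy_to_constant` is a hypothetical helper.

```python
import math

R_KCAL = 1.987e-3          # gas constant in kcal/(mol*K) (assumed value)
T_KELVIN = 21.5 + 273.15   # standard temperature quoted in the text

dG_int = -8.35  # intrinsic free energy per site, kcal/mol
dG_c = 6.3      # overall cooperative free energy, kcal/mol


def free_energy_to_constant(dG, temperature=T_KELVIN):
    """Convert a free energy (kcal/mol) to an equilibrium constant via dG = -RT ln K."""
    return math.exp(-dG / (R_KCAL * temperature))


# Eq. (7.4): four equal intrinsic terms plus the cooperative term.
dG_total = 4 * dG_int + dG_c

# Eq. (7.5) in product form: K(0->4) = Kint**4 * Kc.
K_int = free_energy_to_constant(dG_int)
K_c = free_energy_to_constant(dG_c)
K_total = free_energy_to_constant(dG_total)

print(f"dG(0->4) = {dG_total:.1f} kcal/mol")
print(f"Kint = {K_int:.3g}, Kc = {K_c:.3g}, K(0->4) = {K_total:.3g}")
```

Because the free energies add, the constants multiply: `K_total` equals `K_int**4 * K_c` to within rounding, and the positive ΔGc appears as an overall cooperativity constant smaller than one.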

3. The Macroscopic Binding Isotherm

In practice, one of the first experiments performed on a purified sample of Hb is measuring the equilibrium binding curve over a range of O2 concentration. Because O2 is in reversible equilibrium with Hb, binding is measured under equilibrium conditions, holding the temperature constant, and thus the binding curve is frequently referred to as the binding isotherm. The curve is plotted in terms of the fractional saturation of binding sites, Ȳ, versus the free ligand concentration:

$$
\bar{Y} = \frac{[\text{bound sites}]}{[\text{total sites}]}
= \frac{[\mathrm{HbO_2}] + 2[\mathrm{Hb(O_2)_2}] + 3[\mathrm{Hb(O_2)_3}] + 4[\mathrm{Hb(O_2)_4}]}
{4\{[\mathrm{Hb}] + [\mathrm{HbO_2}] + [\mathrm{Hb(O_2)_2}] + [\mathrm{Hb(O_2)_3}] + [\mathrm{Hb(O_2)_4}]\}}.
\tag{7.6}
$$

Note that, in the numerator, each Hb concentration term must be multiplied by the stoichiometric number of bound ligand, and in the denominator, the total number of sites available for binding is four times the concentration of all Hb species.


From the reactions in Eq. (7.1), the four stepwise macroscopic binding equilibria are:

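$$
\begin{aligned}
K_{0\to1} &= \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]} &
K_{1\to2} &= \frac{[\mathrm{Hb(O_2)_2}]}{[\mathrm{HbO_2}][\mathrm{O_2}]}\\
K_{2\to3} &= \frac{[\mathrm{Hb(O_2)_3}]}{[\mathrm{Hb(O_2)_2}][\mathrm{O_2}]} &
K_{3\to4} &= \frac{[\mathrm{Hb(O_2)_4}]}{[\mathrm{Hb(O_2)_3}][\mathrm{O_2}]}.
\end{aligned}
\tag{7.7}
$$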

The cascade of sequential binding steps is illustrated in Fig. 7.2A. To express the Hb concentrations of Eq. (7.6) in terms of binding constants and ligand concentration, the stepwise binding constants are first transformed into their corresponding product constants, K0→1, K0→2, K0→3, and K0→4, historically known as Adair constants:

$$
\begin{aligned}
K_{0\to1} &= K_{0\to1}\\
K_{0\to2} &= K_{0\to1}K_{1\to2}\\
K_{0\to3} &= K_{0\to1}K_{1\to2}K_{2\to3}\\
K_{0\to4} &= K_{0\to1}K_{1\to2}K_{2\to3}K_{3\to4}.
\end{aligned}
\tag{7.8}
$$

Figure 7.2 Hb binding cascades. (A) Hb macroscopic cascade: the four-step binding sequence (K0→1, K1→2, K2→3, K3→4) proposed by Adair in 1925 (Adair, 1925b). The binding constants represent the statistical average of all configurations of bound ligands among the four heme sites at the singly, doubly, and triply ligated intermediate steps. (B) Hill’s Hb monomer aggregate: single-step binding (+4O2, K1) proposed by Hill in 1910 (Barcroft and Hill, 1910). The Hb monomer aggregates into groups of four, and four O2 are bound simultaneously with the same binding constant.

The four product binding equilibria are:

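$$
\begin{aligned}
K_{0\to1} &= \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]}\\
K_{0\to2} &= \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_2}]}{[\mathrm{HbO_2}][\mathrm{O_2}]}
= \frac{[\mathrm{Hb(O_2)_2}]}{[\mathrm{Hb}][\mathrm{O_2}]^2}\\
K_{0\to3} &= \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_2}]}{[\mathrm{HbO_2}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_3}]}{[\mathrm{Hb(O_2)_2}][\mathrm{O_2}]}
= \frac{[\mathrm{Hb(O_2)_3}]}{[\mathrm{Hb}][\mathrm{O_2}]^3}\\
K_{0\to4} &= \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_2}]}{[\mathrm{HbO_2}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_3}]}{[\mathrm{Hb(O_2)_2}][\mathrm{O_2}]}\cdot\frac{[\mathrm{Hb(O_2)_4}]}{[\mathrm{Hb(O_2)_3}][\mathrm{O_2}]}
= \frac{[\mathrm{Hb(O_2)_4}]}{[\mathrm{Hb}][\mathrm{O_2}]^4}.
\end{aligned}
$$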

When the concentration of O2 is substituted with x, the concentrations of the four stoichiometric Hb species become:

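$$
\begin{aligned}
[\mathrm{HbO_2}] &= K_{0\to1}[\mathrm{Hb}]\,x\\
[\mathrm{Hb(O_2)_2}] &= K_{0\to2}[\mathrm{Hb}]\,x^2\\
[\mathrm{Hb(O_2)_3}] &= K_{0\to3}[\mathrm{Hb}]\,x^3\\
[\mathrm{Hb(O_2)_4}] &= K_{0\to4}[\mathrm{Hb}]\,x^4.
\end{aligned}
$$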


The expression for Ȳ is further simplified by factoring out [Hb], yielding the standard equation for fractional saturation of Hb:

$$
\bar{Y} = \frac{K_{0\to1}x + 2K_{0\to2}x^2 + 3K_{0\to3}x^3 + 4K_{0\to4}x^4}
{4\left(1 + K_{0\to1}x + K_{0\to2}x^2 + K_{0\to3}x^3 + K_{0\to4}x^4\right)}.
\tag{7.9a}
$$

The four Adair binding constants can be experimentally determined from nonlinear least-squares analysis of the O2-binding isotherm. Because each binding constant is a product of the intrinsic binding constant, Kint, and the cooperativity constant Kc, the fractional saturation can also be written:

$$
\bar{Y} = \frac{K_{c(0\to1)}(K_{int}x) + 2K_{c(0\to2)}(K_{int}x)^2 + 3K_{c(0\to3)}(K_{int}x)^3 + 4K_{c(0\to4)}(K_{int}x)^4}
{4\left[1 + K_{c(0\to1)}(K_{int}x) + K_{c(0\to2)}(K_{int}x)^2 + K_{c(0\to3)}(K_{int}x)^3 + K_{c(0\to4)}(K_{int}x)^4\right]}.
\tag{7.9b}
$$

In the absence of the cooperativity constants, Eq. (7.9b) reduces to the fractional saturation of a noncooperative Hb, such as Hb Ypsilanti (Ackers, 1998).
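Equation (7.9a) translates directly into code. The following Python sketch is an illustration only: the function name `adair_saturation` is hypothetical, the constants are the rounded Adair values quoted in the caption of Fig. 7.1, and the ligand concentration is assumed to be expressed in molar units so that it matches those constants.

```python
def adair_saturation(x, K1, K2, K3, K4):
    """Fractional saturation from Eq. (7.9a).

    K1..K4 are the product (Adair) constants K0->1 .. K0->4 and
    x is the free ligand concentration in matching units.
    """
    numerator = K1 * x + 2 * K2 * x**2 + 3 * K3 * x**3 + 4 * K4 * x**4
    denominator = 4 * (1 + K1 * x + K2 * x**2 + K3 * x**3 + K4 * x**4)
    return numerator / denominator


# Rounded Adair constants from the caption of Fig. 7.1 (molar units assumed).
FIG_7_1 = (3e4, 10e8, 1.5e14, 2e20)

for x in (1e-6, 5e-6, 1e-5, 2e-5, 1e-4):
    y = adair_saturation(x, *FIG_7_1)
    print(f"x = {x:.0e} M -> Y = {y:.3f}")
```

As a sanity check on the implementation, four independent and identical sites with microscopic constant k correspond to product constants 4k, 6k², 4k³, and k⁴ (binomial statistical factors, a standard result assumed here rather than taken from the text), and for that choice the function returns the hyperbola kx/(1 + kx).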

4. The Hill Coefficient

The extent of cooperativity can be expressed as a ratio of cooperativity constants Kc or simply as the value of ΔGc (Eq. (7.4)). Historically, the Hill coefficient, nH, has also been used to gauge the extent of cooperativity in Hb. Researchers familiar with Hb modifications, mutations, and modulation by allosteric effectors have a sense of the overall functional difference between a Hb with nH = 1.5 and normal Hb, whose nH = 2.8–3.4 (depending on conditions and types of measurement). As a measurement of the extent of cooperativity, however, the precise meaning of nH is less clear, as evidenced in the literature.

To better understand the meaning of the Hill coefficient, it is necessary to begin with its original formulation. In 1910, A. V. Hill was a student in the laboratory of J. Barcroft in Cambridge who took on the task of reconciling two conflicting pieces of data regarding the binding of O2 to Hb. On the one hand, it was strongly felt, among the scientific community at the time, that the molecular weight of Hb was 16,670 (Edsall, 1980). Work by Hüfner (1901) had demonstrated the 1:1 stoichiometry of heme Fe and bound O2 in Hb, and so it was generally accepted that Hb was a monomer that bound a single O2, which would generate a rectangular hyperbola (Fig. 7.3A):

$$
\bar{Y} = \frac{K_1 x}{1 + K_1 x}.
\tag{7.10}
$$

So convinced of this was Hüfner that he chose not to measure the curve experimentally. In 1904 Christian Bohr, along with his colleagues Hasselbalch and Krogh, had made the decision to actually measure the Hb-binding curve, and found a distinct sigmoidal shape (Fig. 7.3A), very different from that assumed by Hüfner (Bohr et al., 1904). The sigmoid shape was not recognized by anyone at the time as an indicator of cooperativity, as there was no foundation for even thinking about multi-subunit proteins and site-site interactions. But it was clear that a monomer binding a single ligand could not generate a sigmoidal binding curve. Thus, Bohr’s work disproved the view of Hb as a single molecule containing a single heme that bound a single O2.

Hill, who had trained in mathematics as an undergraduate at Cambridge, realized that raising the concentration of O2 to an exponential power greater than one would generate a sigmoidal binding curve (Edsall, 1980). He proposed that the Hb monomers aggregated into groups of four, and that this aggregate bound four O2 ligands simultaneously (Barcroft and Hill, 1910). In modern terms, the act of aggregation would immediately suggest interaction between the components of the aggregate, possibly generating differing binding constants. But this was not the thinking in 1910. In Hill’s hypothesis, the aggregate of n monomers bound O2 with the monomeric binding constant (Fig. 7.2B), so that Hb could still be viewed as a monomer, while the ligand concentration could be raised to the nth power, generating the sigmoidal curve:

$$
4\,\mathrm{Hb} + 4\,\mathrm{O_2} \overset{K_1}{\rightleftharpoons} \mathrm{(Hb)_4(O_2)_4}.
\tag{7.11}
$$

The expression for fractional saturation for Hill’s binding equation is:

$$
\bar{Y} = \frac{[\text{bound sites}]}{[\text{total sites}]} = \frac{4[\mathrm{HbO_2}]}{4\{[\mathrm{Hb}] + [\mathrm{HbO_2}]\}}.
$$

[Figure 7.3 panels: (A) Ȳ versus ligand concentration, [O2], mm Hg, with the shaded area labeled ΔGc; (B) log[Ȳ/(1 − Ȳ)] versus log [O2], with the distance N marked; (C) nH versus Ȳ.]

Figure 7.3 The Hill plot. (A) The shaded area between the tetrameric (sigmoidal) isotherm and the noncooperative dimeric (hyperbolic) isotherm represents the cooperative free energy of binding all four ligands. (B) The Hill plot of the tetrameric binding curve (solid line). The Hill coefficient is obtained from the slope of the line transitioning from the lower O2 concentration limit to the upper O2 concentration limit (extrapolations of the limiting lines are dashed). Wyman argued that the cooperative free energy was equal to $2.303\,RT\sqrt{2}\,N$, where N is the distance between the two straight lines of the limiting low and high ligand concentrations. (C) The Hill coefficient nH varies with Ȳ. The value reported for nH in Hb studies is the maximum. The coefficient calculated for the cooperative isotherm of panel A (solid line) is contrasted with that calculated for the noncooperative dimeric isotherm (dashed line).

The binding equilibrium for each monomer is:

$$
K_1 = \frac{[\mathrm{HbO_2}]}{[\mathrm{Hb}][\mathrm{O_2}]},
$$

and the concentration of product is:

$$
[\mathrm{HbO_2}] = K_1[\mathrm{Hb}]\,x,
$$

where [O2] is substituted with x. Thus, the fractional saturation is:

$$
\bar{Y} = \frac{K_1[\mathrm{Hb}]\,x^4}{[\mathrm{Hb}] + K_1[\mathrm{Hb}]\,x^4} = \frac{K_1 x^4}{1 + K_1 x^4}.
\tag{7.12}
$$

In the Hill equation, the parameter n was used as the exponent, and Hill was able to fit the available O2-binding curve with n = 2.5 (Barcroft and Hill, 1910):

$$
\bar{Y} = \frac{K_1 x^n}{1 + K_1 x^n}.
\tag{7.13}
$$
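The effect of the exponent in Eq. (7.13) is easy to see numerically. The sketch below is illustrative only: `hill_saturation` and `half_saturation` are hypothetical names, and K = 1 is an arbitrary constant chosen so that all three curves share the same half-saturation point. It tabulates Ȳ for n = 1 (Hüfner’s hyperbola), n = 2.5 (Hill’s fitted value), and n = 4.

```python
def hill_saturation(x, K, n):
    """Fractional saturation from the Hill equation, Eq. (7.13)."""
    return K * x**n / (1 + K * x**n)


def half_saturation(K, n):
    """Ligand concentration at which Y = 0.5, i.e. where K * x**n = 1."""
    return K ** (-1.0 / n)


K = 1.0  # arbitrary illustrative constant, not a measured value
fractions = (0.25, 0.5, 1.0, 2.0, 4.0)  # ligand concentration relative to x_half

for n in (1.0, 2.5, 4.0):
    x_half = half_saturation(K, n)
    row = ", ".join(f"{hill_saturation(f * x_half, K, n):.3f}" for f in fractions)
    print(f"n = {n}: Y at {fractions} x x_half -> {row}")
```

With n = 1 the saturation rises gradually on both sides of the midpoint, whereas larger exponents suppress binding below the midpoint and steepen the transition through it, which is the sigmoidal behavior Hill set out to reproduce.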

4.1. Formulation of the Adair constants

After World War I, George S. Adair showed that the true molecular weight of Hb was four times that thought by Hüfner and others (Adair, 1925a). His finding of 64,500 for the molecular weight of Hb was initially met with scepticism (Edsall, 1980), until Svedburg independently found the same molecular weight by equilibrium ultracentrifugation (Svedburg and Fahreus, 1926). Adair’s significant improvements of osmotic pressure measurements and his systematic approach to thermodynamic problems resulted in the modern equation for fractional saturation in terms of sequential binding (Adair, 1925b), shown previously as Eq. (7.9a). Adair proposed that the four binding sites of Hb interacted, changing the affinity of each other as O2 bound. The Adair equation replaced Hill’s equation and pioneered the quantitative measurement of binding constants for all work on Hb and other cooperative proteins that followed.

Adair compared his results to those of Hill as follows (see Schejter and Margoliash, 1985). The Hill equation (Eq. (7.13)) was based on the assumption that the intermediate species either did not exist or did not contribute significantly to the binding curve. Adair showed that, if K0→4 was large in comparison to the other binding constants, the equation for fractional saturation (Eq. (7.9a)) would reduce to Hill’s equation (Eq. (7.13)). In this case, the Hill coefficient would be equal to the number of binding sites. As the intermediate constants K0→1, K0→2, and K0→3 became larger relative to K0→4, the Hill coefficient would decrease. Even though Adair showed in 1925 that the Hill coefficient is not a measure of the stoichiometry of binding, the erroneous idea that nH represents the number of cooperating sites in the tetramer persists to the present day, a remarkable span of more than 80 years. Adair also pointed out that if all the stepwise binding constants were equal (the condition of noncooperativity), the equation for fractional saturation (Eq. (7.9a)) reduces to Hüfner’s equation for a rectangular hyperbola (Eq. (7.10)).

4.2. Redefinition of the Hill coefficient by Wyman

The work of Adair in 1925 provided proof that the equation for fractional saturation of Hb developed by Hill in 1910 was not correct. However, the Hill coefficient was reinvented several decades later by Jeffries Wyman (1948, 1964). Wyman introduced the Hill plot, based on a rearrangement of the Hill equation:

$$
\bar{Y} = \frac{K_1 x^n}{1 + K_1 x^n}
\quad\text{becomes}\quad
K_1 x^n = \frac{\bar{Y}}{1 - \bar{Y}}.
\tag{7.14}
$$

Wyman’s Hill plot is ln[Ȳ/(1 − Ȳ)] plotted versus ln x, as shown in Fig. 7.3B. This transformation of the sigmoidal binding curve yields a plot with the appearance of a straight line in the midrange ligand concentration values. The slope of the plot varies with Ȳ, as illustrated in Fig. 7.3C, and the value reported as nH is the maximum slope, found in the approximately linear portion of the plot. The Hill coefficient itself is therefore:

$$
n_H = \frac{d \ln\left[\bar{Y}/(1 - \bar{Y})\right]}{d \ln x}.
\tag{7.15}
$$
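Equation (7.15) can be evaluated numerically for any binding curve. The Python sketch below is a minimal illustration, not the procedure used in the original studies: it differentiates the Hill plot by central differences for an Adair isotherm built from the rounded constants in the caption of Fig. 7.1 (molar units assumed), and it takes the maximum slope over a hypothetical concentration grid from 10⁻⁷ to 10⁻³ M. The names `hill_slope`, `grid`, and `rel_step` are illustrative choices.

```python
import math


def adair_saturation(x, K1, K2, K3, K4):
    """Fractional saturation from the Adair equation, Eq. (7.9a)."""
    numerator = K1 * x + 2 * K2 * x**2 + 3 * K3 * x**3 + 4 * K4 * x**4
    denominator = 4 * (1 + K1 * x + K2 * x**2 + K3 * x**3 + K4 * x**4)
    return numerator / denominator


def hill_slope(x, constants, rel_step=1e-4):
    """Central-difference estimate of d ln[Y/(1-Y)] / d ln x at concentration x."""
    def logit(conc):
        y = adair_saturation(conc, *constants)
        return math.log(y / (1 - y))

    # Step symmetrically in ln x by +/- rel_step.
    upper = logit(x * math.exp(rel_step))
    lower = logit(x * math.exp(-rel_step))
    return (upper - lower) / (2 * rel_step)


# Rounded Adair constants from the caption of Fig. 7.1 (molar units assumed).
FIG_7_1 = (3e4, 10e8, 1.5e14, 2e20)

# Logarithmically spaced concentrations from 1e-7 M to 1e-3 M.
grid = [10 ** (-7 + 4 * i / 400) for i in range(401)]
slopes = [hill_slope(x, FIG_7_1) for x in grid]

n_H = max(slopes)
x_at_max = grid[slopes.index(n_H)]
print(f"maximum Hill slope = {n_H:.2f} at x = {x_at_max:.2e} M")
print(f"slope at the low-concentration limit = {slopes[0]:.2f}")
```

The slope approaches one at both concentration limits and peaks in the midrange, so a single reported nH summarizes only the steepest part of the Hill plot.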

The Hill coefficient is indirectly related to ΔGc. Consider the difference between the rectangular hyperbola of noncooperative binding and the sigmoid (Fig. 7.3A). The difference between these curves is directly related to the cooperativity constants of Eq. (7.9b), which are, in turn, related to ΔGc. Wyman (1964) proposed that nH is ‘‘closely related to’’ the average free energy of interaction of the sites. Another view argued that nH is related instead to the difference between cooperative free energies of the first and last binding steps, but only when supplemented with a value for intrinsic binding (Saroff and Minton, 1972). Another explanation of the meaning of Wyman’s Hill coefficient is that it represents the variance of Ȳ, as discussed in Holt and Ackers (2005). The Hill coefficient can be expressed as:

$$
n_H = \frac{n\left[\overline{Y^2} - (\bar{Y})^2\right]}{\bar{Y}\,(n - \bar{Y})},
\tag{7.16}
$$

where n is the number of binding sites. The expression $\overline{Y^2} - (\bar{Y})^2$ is proportional to the variance of the fractional saturation (Cohn and Edsall, 1943; Wyman, 1964). Cooperativity increases the variance in Ȳ, thus increasing the value of nH. A caveat is necessary here, as factors other than cooperativity can cause a change in the variance of Ȳ (i.e., dispersion of binding constants), such as different intrinsic binding constants of α- and β-subunits. However, as is evident in Eq. (7.16), nH is not purely a variance, but is also ratioed to a function of Ȳ and indexed to the stoichiometry of ligand binding.

What was the advantage to reintroducing such a complex dimensionless parameter as the Hill coefficient? The only real advantage appears to be that it provides, via the Hill plot, a means of measuring the extent of cooperativity in a binding curve without measuring the individual binding constants, as well as the intrinsic constant(s), which requires considerable additional effort. But because Wyman’s Hill coefficient is only indirectly related to the free energy of cooperativity, while simultaneously indexed to the stoichiometry of the system, its relationship to ΔGc is not of practical value, even at the macroscopic level.
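The variance form of Eq. (7.16) can be evaluated directly from the populations of the ligation states. The sketch below is an illustration under stated assumptions: Ȳ in Eq. (7.16) is interpreted as the mean number of ligands bound per tetramer (0 to n), the statistical weight of the species with i ligands is taken as K0→i xⁱ, and the constants are hypothetical, with K0→4 fixed at 1 and the intermediates scaled by a factor f relative to the binomial values (4, 6, 4) of four independent sites.

```python
def hill_coefficient_from_variance(x, adair_constants):
    """Hill coefficient at ligand concentration x from the variance form, Eq. (7.16).

    Y is taken here as the mean number of ligands bound per molecule (0..n).
    """
    n = len(adair_constants)
    # Statistical weights of the species with 0, 1, ..., n ligands bound.
    weights = [1.0] + [K * x ** (i + 1) for i, K in enumerate(adair_constants)]
    total = sum(weights)
    mean = sum(i * w for i, w in enumerate(weights)) / total
    mean_square = sum(i * i * w for i, w in enumerate(weights)) / total
    variance = mean_square - mean**2
    return n * variance / (mean * (n - mean))


# Hypothetical constants: K0->4 = 1, intermediates scaled by f.
# f = 0 removes the intermediates (Hill's limit); f = 1 is four independent sites.
for f in (0.0, 0.1, 1.0):
    constants = (4 * f, 6 * f, 4 * f, 1.0)
    n_H = hill_coefficient_from_variance(1.0, constants)
    print(f"f = {f}: nH at half saturation = {n_H:.3f}")
```

At half saturation the value falls from 4 when the intermediates are absent, through about 2.59 at f = 0.1, to 1 for independent sites, in line with Adair’s observation that nH decreases as the intermediate constants grow relative to K0→4.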

5. Microscopic Cooperativity in Hemoglobin

The question of the molecular mechanism of cooperativity is inherently microscopic in nature: for example, when O2 binds to subunit α1, how does the O2 binding constant for β2 adjust? The stepwise Adair constants K0→1, K1→2, K2→3, and K3→4, which form the basis of Ȳ, which itself is the basis of nH, are all macroscopic parameters that are composites of microscopic binding constants. Given the values of the microstate binding constants, all macroscopic parameters can be calculated. However, the reverse is not true (Fig. 7.4). It is therefore necessary to experimentally determine the microstate binding constants.

[Figure 7.4 labels: microscopic parameters kij, ijkc, and ijΔGc; macroscopic parameters Ki→i+1, i→i+1Kc, i→i+1ΔGc, and nH.]

Figure 7.4 Macroscopic parameters are composed of higher-resolution microscopic parameters. Microscopic parameters can be measured experimentally and used to calculate macroscopic parameters. Macroscopic parameters can also be measured experimentally, and the measured versus calculated parameters can be compared. However, it is not possible to calculate a unique set of microscopic parameters from macroscopic experiments.

5.1. The hemoglobin binding cascade

The microscopic binding constants are readily identified by considering all possible individual site or microscopic binding reactions in the Hb tetramer (Fig. 7.5). Each tetrameric species is denoted as ij, where i is the number of bound ligands and j is a numeral assigned to a specific configuration of bound ligand(s); the markers a and b are used to distinguish one isomer from another. The total number of microscopic binding steps among the 16 tetrameric species is 32, distributed as follows: 4 first binding reactions, 12 second binding reactions, 12 third binding reactions, and 4 fourth binding reactions. The large number of binding reactions arises from counting each configurational isomer within the Hb tetrameric species, since each species, with the exception of species 01, 23, 24, and 41, is present in two isomeric forms. The 32 microscopic constants completely describe the Hb binding cascade, forming the basis of the four stepwise macroscopic constants, as shown in Table 7.1.

[Figure 7.5 species labels, with subunits α1, β2, β1, and α2: 01; 11a, 11b, 12a, 12b; 21a, 21b, 22a, 22b, 23, 24; 31a, 31b, 32a, 32b; 41.]

Figure 7.5 The complete binding cascade for human Hb. All possible configurations of bound ligand are shown, including the (redundant) isomeric species.

The 32-step cascade can be simplified by grouping the isomeric species and by including the experimental observation that some species have very similar binding constants (Ackers, 1998). Under standard conditions (pH 7.4, 21.5 °C), the cumulative binding constants for ligation to the following species are equal within experimental error: species 11 = 12, species 22 = 23 = 24, and species 31 = 32. Even under nonstandard conditions, the relative O2 affinities of these species are very close (Ackers, 1998). The cascade is greatly streamlined when these redundancies are grouped (Table 7.1), resulting in a simplified cascade with six unique binding steps (Fig. 7.6A). It should be noted that, in the process of reducing the cascade from 32 to 6 reaction steps, the redundant tetrameric species are still counted and are present in the statistical factors of the final binding equation, Eq. (7.9a). In addition, each microscopic binding constant in Table 7.1 has a corresponding microscopic binding free energy, designated ΔGij, which is the sum of the intrinsic free energy of binding and the cooperative free energy:


Table 7.1 Relationship between the microscopic and macroscopic constants of the Hb-binding cascade (a)

Stepwise microscopic      Grouped by   With experimental     Stepwise macroscopic
binding constants         isomers      equalities (b)        binding constants (c)
------------------------  -----------  --------------------  -------------------------------------------
k01→11a, k01→11b          2k01→11      4k01→11/12            K0→1 = 4k01→11/12
k01→12a, k01→12b          2k01→12

k11a→21a, k11b→21b        2k11→21      4k11/12→21            K1→2 = 4k11/12→21 + 8k11/12→22/23/24
k12a→21a, k12b→21b        2k12→21
k11a→22a, k11b→22b        2k11→22      8k11/12→22/23/24
k12a→22b, k12b→22a        2k12→22
k11a→23, k11b→23          2k11→23
k12a→24, k12b→24          2k12→24

k21a→31a, k21b→31b        2k21→31      4k21→31/32            K2→3 = 4k21→31/32 + 8k22/23/24→31/32
k21a→32a, k21b→32b        2k21→32
k22a→31a, k22b→31b        2k22→31      8k22/23/24→31/32
k22a→32b, k22b→32a        2k22→32
k23→32a, k23→32b          2k23→32
k24→31a, k24→31b          2k24→31

k31a→41, k31b→41          2k31→41      4k31/32→41            K3→4 = 4k31/32→41
k32a→41, k32b→41          2k32→41

(a) Microscopic binding reactions are illustrated in Fig. 7.6.
(b) The O2 affinities for some species were found to be within experimental error under most conditions: 11 = 12, 22 = 23 = 24, and 31 = 32, as discussed in the text.
(c) As in Eq. (7.7).

[Figure 7.6 panels: (A) Simplified microscopic cascade, nH = 3.46; (B) Simplified macroscopic cascade, nH = 3.46.]

Figure 7.6 Comparison of microscopic and macroscopic binding cascades. (A) The microscopic binding constants result in a branched cascade, due to the presence of intradimer cooperativity (Holt et al., 2005). (B) The macroscopic constants cannot detect intradimer cooperativity, and result in a linear cascade. The value of nH is identical for both cascades.

$$
\Delta G_{ij} = \Delta G_{int} + {}^{ij}\Delta G_c.
\tag{7.17}
$$
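The species and step counts quoted in this section (16 tetrameric species and 32 microscopic binding steps, distributed 4, 12, 12, 4) follow from simple enumeration. The sketch below is a counting illustration only, with hypothetical site labels, and treats each heme site as either empty or ligated. It does not reproduce the ij species numbering of Fig. 7.5.

```python
from collections import Counter
from itertools import product

SITES = ("alpha1", "alpha2", "beta1", "beta2")  # illustrative labels

# Every ligation configuration: one 0/1 flag per heme site.
configurations = list(product((0, 1), repeat=len(SITES)))

# Number of configurations with 0, 1, 2, 3, and 4 ligands bound.
species_by_ligation = Counter(sum(config) for config in configurations)

# A microscopic binding step adds one ligand to one empty site of a configuration.
steps_by_ligation = Counter()
for config in configurations:
    for occupied in config:
        if not occupied:
            # Key is the number of ligands bound after the step.
            steps_by_ligation[sum(config) + 1] += 1

print("species per ligation level:", dict(sorted(species_by_ligation.items())))
print("binding steps per level:   ", dict(sorted(steps_by_ligation.items())))
print("total species:", len(configurations), " total steps:", sum(steps_by_ligation.values()))
```

The species counts (1, 4, 6, 4, 1) are the binomial coefficients that act as statistical factors when the microscopic species are grouped into the five macroscopic ligation levels.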

5.2. Insensitivity of the binding isotherm

The binding isotherm is remarkably insensitive to the binding constants for the intermediate, partially ligated species of Hb. This was noted in the 1960s, when the fits for the sequential model of Koshland were found to be comparable to those for the two-state concerted model of Monod, Wyman, and Changeux (Koshland et al., 1966). The insensitivity is also evident in the macroscopic binding constants for the second and third binding steps, which exhibit overlapping experimental errors even under the most rigorous techniques of analysis (Johnson and Ackers, 1977; Johnson et al., 1976; Mills et al., 1976). And yet if the intermediate binding constants are equal to one another, as in a noncooperative tetramer, the shape of the isotherm changes dramatically. The work of Hill and Adair provides an interesting insight into this apparent irony. Hill showed that the shape of the isotherm is due to the exponent of the O2 concentration term of Ȳ having a value greater than one. Adair showed that this exponential term will be present in Ȳ as long as the four binding constants have different values, regardless of the magnitude of the different values. Thus, the Hill version of Ȳ, which does not include any intermediate binding constants, still provides a remarkably close approximation to the shape of the binding curve, as illustrated for Hb under standard conditions in Fig. 7.7.

[Figure 7.7 equations, plotted as Ȳ versus ligand concentration, [O2], mm Hg:]

$$
\text{(1)}\quad \bar{Y} = \frac{K_{0\to1}x + 2K_{0\to2}x^2 + 3K_{0\to3}x^3 + 4K_{0\to4}x^4}{4\left[1 + K_{0\to1}x + K_{0\to2}x^2 + K_{0\to3}x^3 + K_{0\to4}x^4\right]}
$$

$$
\text{(2) Simultaneous O}_2\text{ binding:}\quad \bar{Y} = \frac{Kx^4}{1 + Kx^4}
$$

$$
\text{(3) Single O}_2\text{ binding:}\quad \bar{Y} = \frac{Kx}{1 + Kx}
$$

Figure 7.7 The basis of the sigmoidal shape of the Hb-binding curve is the exponent of the O2 concentration. The experimental binding curve (solid line) is represented by the upper equation for Ȳ (1). Removal of the intermediate terms from this equation generates the Hill equation for Ȳ (2) and a sigmoidal binding curve (dashed-dotted line), using the same binding constant K0→4. Setting the exponent of x equal to 1 (3), as would be the case for single-site binding, results in the loss of the sigmoid shape (dashed line).
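The comparison drawn in Fig. 7.7 can be sketched numerically. The Python example below is illustrative only: it uses the rounded Adair constants from the caption of Fig. 7.1 (molar units assumed) for equation (1), the same K0→4 for equation (2), and, as a hypothetical choice not stated in the caption, a single-site constant equal to the fourth root of K0→4 for equation (3), so that all three curves share roughly the same midpoint.

```python
def adair_saturation(x, K1, K2, K3, K4):
    """Equation (1) of Fig. 7.7: the four-step Adair isotherm."""
    numerator = K1 * x + 2 * K2 * x**2 + 3 * K3 * x**3 + 4 * K4 * x**4
    denominator = 4 * (1 + K1 * x + K2 * x**2 + K3 * x**3 + K4 * x**4)
    return numerator / denominator


def hill_saturation(x, K, n):
    """Equations (2) and (3) of Fig. 7.7 for n = 4 and n = 1, respectively."""
    return K * x**n / (1 + K * x**n)


# Rounded Adair constants from the caption of Fig. 7.1 (molar units assumed).
FIG_7_1 = (3e4, 10e8, 1.5e14, 2e20)
K4 = FIG_7_1[3]
K_single = K4 ** 0.25  # hypothetical single-site constant with a matching midpoint

# Logarithmically spaced concentrations from 1e-7 M to 1e-3 M.
grid = [10 ** (-7 + 4 * i / 400) for i in range(401)]

max_diff_hill = max(
    abs(adair_saturation(x, *FIG_7_1) - hill_saturation(x, K4, 4)) for x in grid
)
max_diff_hyperbola = max(
    abs(adair_saturation(x, *FIG_7_1) - hill_saturation(x, K_single, 1)) for x in grid
)

print(f"largest |Y(1) - Y(2)| on the grid: {max_diff_hill:.3f}")
print(f"largest |Y(1) - Y(3)| on the grid: {max_diff_hyperbola:.3f}")
```

Dropping the intermediate terms shifts the curve by only a few percent of full saturation, whereas setting the exponent to one changes it several times more, which is the sense in which the isotherm carries little information about the intermediate constants.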


5.3. Insensitivity of the Hill coefficient

The fact that the shape of the binding curve is substantially insensitive to the microstate constants means that the macrostate constants, as well as any parameter derived from the macrostate constants or Ȳ, exhibit a significant insensitivity to the values of the microstate constants. The microstate constants were first measured in the 1980s and 1990s by the application of linkage thermodynamics to analogues of the tetrameric species that bore nonlabile ligands, as reviewed elsewhere (Ackers, 1998; Ackers and Holt, 2006). The results showed two different pathways of cooperativity through the cascade, based on the sequential and sometimes asymmetric ligation of each αβ dimer within the tetramer (Fig. 7.6A). Because of the structural organization of Hb as a dimer of dimers, it was necessary to designate two types of cooperative interactions: cross-dimer cooperativity for coupling between the two dimers (α1β1 and α2β2) and intradimer cooperativity for coupling within the αβ dimer (between α1 and β1 as well as between α2 and β2). Having values for microstate constants, it becomes possible to address the relative contribution of each type of cooperativity to the overall cooperativity in Hb (Ackers and Holt, 2006).

The macroscopic cascade differs fundamentally from the microscopic cascade, in that the former is linear and the latter is branched, due to asymmetric distribution of O2 within many of the partially ligated intermediate tetramers. There is no need to consider cross-dimer versus intradimer coupling in the macroscopic cascade. And yet, as different as these cascades are from a mechanistic perspective, they both exhibit the same Hill coefficient (Fig. 7.6). The failure of the Hill coefficient to resolve the linear from the branched cascade is due, as explained earlier, to its insensitivity to the microscopic binding constants and to its insensitivity to asymmetric perturbation of the tetramer.

6. Summary

Although it is experimentally feasible and desirable to measure both macroscopic and microscopic O2-binding constants for human Hb, only the higher resolution of the microscopic constants is useful in identifying specific pathways of cooperativity. The O2-binding isotherm is largely insensitive to microscopic binding constants, and it follows that macroscopic parameters based on the binding curve, such as the Hill coefficient, do not provide the resolution needed to address modern issues of the mechanism of cooperativity. Even though the measurement of microscopic constants is substantially more labor intensive than that for the macroscopic, there is no substitute for the direct comparison of relevant binding constants in the analysis of subunit-subunit coupling.


REFERENCES

Ackers, G. K. (1998). Deciphering the molecular code of hemoglobin allostery. Adv. Protein Chem. 51, 185–248.
Ackers, G. K., and Halvorson, H. R. (1974). The linkage between oxygenation and subunit dissociation in human hemoglobin. Proc. Natl. Acad. Sci. USA 71, 4312–4316.
Ackers, G. K., and Holt, J. M. (2006). Asymmetric cooperativity in a symmetric tetramer: Human hemoglobin. J. Biol. Chem. 281, 11441–11443.
Adair, G. S. (1925a). A critical study of the direct method of measuring the osmotic pressure of hemoglobin. Proc. R. Soc. London Ser. A 108A, 627–637.
Adair, G. S. (1925b). The hemoglobin system. J. Biol. Chem. 63, 493–546.
Barcroft, J., and Hill, A. V. (1910). The nature of oxyhaemoglobin, with a note on its molecular weight. J. Physiol. (London) 39, 411–428.
Bohr, C., Hasselbalch, K. A., and Krogh, A. (1904). Ueber einen in biologischer Beziehung wichtigen Einfluss, den die Kohlensäurespannung des Blutes auf dessen Sauerstoffbindung übt. Skand. Arch. Physiol. 16, 402–412.
Cohn, E. J., and Edsall, J. T. (1943). "Proteins, amino acids and peptides as ions and dipolar ions." Reinhold Publishing, New York.
Edsall, J. T. (1980). Hemoglobin and the origins of the concept of allosterism. FASEB 39, 226–235.
Holt, J. M., and Ackers, G. K. (2005). Asymmetric distribution of cooperativity in the binding cascade of normal human hemoglobin. 2. Stepwise cooperative free energy. Biochemistry 44, 11939–11949.
Holt, J. M., Klinger, A. L., Yarian, C. S., Keelara, V., and Ackers, G. K. (2005). Asymmetric distribution of cooperativity in the binding cascade of normal human hemoglobin. 1. Cooperative and noncooperative oxygen binding in Zn-substituted hemoglobin. Biochemistry 44, 11925–11938.
Hüfner, G. (1901). Arch. Anat. Physiol., Anat. Abt. 5, 187–217.
Johnson, M. L., and Ackers, G. K. (1977). Resolvability of Adair constants from oxygenation curves measured at low hemoglobin concentration. Biophys. Chem. 7, 77–80.
Johnson, M. L., Halvorson, H. R., and Ackers, G. K. (1976). Oxygenation-linked subunit interactions in human hemoglobin: Analysis of linkage functions for constituent energy terms. Biochemistry 15, 5363–5371.
Koshland, D. E., Nemethy, G., and Filmer, D. (1966). Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 5, 365–385.
Mills, F. C., Johnson, M. L., and Ackers, G. K. (1976). Oxygenation-linked subunit interactions in human hemoglobin: Experimental studies on the concentration dependence of oxygenation curves. Biochemistry 15, 5350–5362.
Pauling, L. (1935). The oxygen equilibrium of hemoglobin and its structural interpretation. Proc. Natl. Acad. Sci. USA 21, 186–191.
Saroff, H. A., and Minton, A. P. (1972). The Hill plot and energy of interaction in hemoglobin. Science 175, 1253–1255.
Schejter, A., and Margoliash, E. (1985). The Adair hypothesis. TIBS 10, 490–492.
Svedburg, T., and Fahreus, R. (1926). A new method for the determination of the molecular weights of the proteins. J. Am. Chem. Soc. 48, 430–438.
Wyman, J. (1948). Heme proteins. Adv. Protein Chem. 4, 407–531.
Wyman, J. (1964). Linked functions and reciprocal effects in hemoglobin: A second look. Adv. Protein Chem. 19, 223–286.

CHAPTER EIGHT

Methods for Measuring the Thermodynamic Stability of Membrane Proteins

Heedeok Hong,*,1 Nathan H. Joh,*,1 James U. Bowie,* and Lukas K. Tamm†

Contents
1. Introduction
2. Two Classes of Membrane Proteins
3. Methods for Measuring Transmembrane Domain Oligomer Stability
3.1. Analytical ultracentrifugation
3.2. Förster resonance energy transfer (FRET)
3.3. Disulfide cross-linking
3.4. Genetic assay systems (TOXCAT, POSSYCAT, and GALLEX)
4. Methods for Measuring Multipass α-helical Membrane Protein Stability
5. Methods to Study the Stability of β-barrel Membrane Proteins
5.1. SDS denaturation
5.2. Thermal denaturation
5.3. Solvent denaturation with urea or GdnHCl
6. A Few Salient Results on Forces that Stabilize Membrane Proteins
6.1. Van der Waals/packing interactions
6.2. Hydrogen-bonding interactions
6.3. Electrostatic interactions
6.4. Aromatic-aromatic interactions
6.5. Elastic lipid bilayer forces
7. Conclusion and Outlook
Acknowledgements
References

* Department of Chemistry and Biochemistry, UCLA-DOE Center for Genomics and Proteomics, Molecular Biology Institute, University of California, Los Angeles, California, USA
† Center for Membrane Biology and Department of Molecular Physiology and Biological Physics, University of Virginia Health System, Charlottesville, Virginia, USA
1 These two authors contributed equally

Methods in Enzymology, Volume 455 ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04208-0

© 2009 Elsevier Inc. All rights reserved.


Heedeok Hong et al.

Abstract

Learning how amino acid sequences define protein structure has been a major challenge for molecular biology since the first protein structures were determined in the 1960s. In contrast to the staggering progress with soluble proteins, investigations of membrane protein folding have long been hampered by the lack of high-resolution structures and the technical challenges associated with studying the folding process in vitro. In the past decade, however, there has been an explosion of new membrane protein structures and a slower but notable increase in efforts to study the factors that define these structures. Here we review the methods that have been used to evaluate the thermodynamic stability of membrane proteins and provide some salient examples of how the methods have been used to begin to understand the energetics of membrane protein folding.

1. Introduction

Exploring the molecular forces that govern the structure and function of proteins is essential to many of the fundamental pursuits of biochemistry, including structure prediction and design, understanding evolution, disease etiology, and drug design. Although integral membrane proteins are prevalent, accounting for roughly a third of the proteins encoded in genomes, and carry out important biological functions, our understanding of the folding and stability determinants of this special class of proteins remains rudimentary. A major challenge in the study of membrane protein folding is developing experimental systems that allow for controlled examination of the reaction. Folding studies require experimental conditions that drive unfolding but still enable complete refolding. Unlike those of water-soluble proteins, folding studies of membrane proteins are complicated by the physical and chemical heterogeneity of the bilayer environment, which is matched by the equally varied properties of membrane proteins themselves. The physical forces that control folding also vary with the environment. It is therefore hard to find convenient experimental systems that can satisfy all the different constraints. In this chapter we will discuss the in vitro experimental approaches that have been used to study membrane protein folding thermodynamics. We will first introduce the two main classes of membrane proteins, namely α-helical and β-barrel proteins, and review the methods for thermodynamic characterization of folding and assembly for the two classes. We will not explicitly discuss studies of folding kinetics here, which have been covered in previous reviews (Booth and Curnow, 2006; Tamm et al., 2001).


Membrane Protein Folding

2. Two Classes of Membrane Proteins

Two classes of membrane protein structures have been observed to date: α-helical membrane proteins, composed of bundles of transmembrane helices, and β-barrel membrane proteins, built from membrane-spanning β-strands (Fig. 8.1). The helical membrane proteins are generally found in the inner membranes of bacterial cells or the plasma membrane of eukaryotes, while the β-barrel class appears in the outer membrane of bacteria or mitochondria. Both architectures can satisfy the requirement for hydrophobic matching to the bilayer and the need to satisfy backbone hydrogen bonds, but they impose very different folding imperatives (Fig. 8.2). In an α-helix, backbone hydrogen bonds can be satisfied locally, so an isolated hydrophobic helix can effectively be a stable domain within the bilayer (Engelman et al., 1986). During helical membrane protein biogenesis, the translocon can shuttle individual helices or pairs of helices into the bilayer (Rapoport, 2007), enabling final folding to proceed after membrane insertion (Engelman et al., 2003; Popot and Engelman, 1990). While we still do not have an experimental view of an unfolded membrane protein in a bilayer, the fact that individual secondary-structure elements can be stable suggests that it is reasonable to envision the unfolding of a helical membrane protein as a loss of tertiary structure in the bilayer, but not a complete loss of stable helical transmembrane segments.

Figure 8.1 Structures of representative α-helical and β-barrel membrane proteins in a lipid bilayer. Left (α-helical protein): bacteriorhodopsin (bR). Right (β-barrel protein): transmembrane domain of outer membrane protein A (OmpA).

Figure 8.2 Thermodynamic folding pathways for α-helical and β-barrel membrane proteins. Left: two-stage model of α-helical membrane protein folding (insertion followed by folding), adapted from Popot and Engelman (1990). Right: coupled process of folding and insertion for β-barrel proteins.

Unlike α-helices, individual β-strands of β-barrel membrane proteins are generally not stable in the hydrocarbon core of the bilayer. The backbone hydrogen bonds are not internally satisfied and the sequences tend to be relatively hydrophilic, with one face of the strand lining a polar pore and the other side facing the apolar core of the bilayer. Consequently, the biogenesis of β-barrel membrane proteins is very different from the biogenesis of α-helical proteins: it involves chaperones that ferry the protein to the membrane, and folding and insertion are highly coupled processes (Kleinschmidt and Tamm, 2002). Thus, unlike that of helical proteins, the unfolded form of a β-barrel protein is not likely to be inserted across the bilayer.

Because α-helical membrane protein folding and β-barrel membrane protein folding are different, they need to be studied in different ways. In this review we will discuss the techniques that have been applied to study membrane protein folding, their limitations, and prospects for the future.

3. Methods for Measuring Transmembrane Domain Oligomer Stability

One way to access information about the energetics of molecular interactions in membrane proteins is by measuring dissociation constants of membrane protein subunits or isolated transmembrane (TM) helices. The assembly of individual TM helices also provides a model for the folding of larger helical membrane proteins from the unfolded, membrane-inserted form with intact transmembrane helices.


3.1. Analytical ultracentrifugation

After centrifugation of a protein to equilibrium, the concentration distribution in the cell depends on the effective mass of the protein (i.e., the mass of the molecule corrected for buoyancy). Because the equilibrium distribution does not depend on the shape of the molecule, equilibrium sedimentation is an effective technique for obtaining molecular weights of well-behaved proteins. If the protein is an oligomer that dissociates in the concentration range of the centrifugation experiment, the concentration distribution will reflect the oligomerization equilibrium and the distribution can be fit to obtain dissociation constants. In the case of membrane proteins, the situation is complicated because the sedimenting species is not the protein alone but the detergent-protein complex. This can be dealt with either by adjusting the solvent density (Choma et al., 2000; Tanford et al., 1974) or by judiciously choosing a detergent (Fleming et al., 1997; Ludwig et al., 1982). In an ideal case, the solvent density matches that of the detergent so that the detergent does not contribute to the effective mass of the protein in the centrifuge cell (Fleming, 2008). Another significant difference between soluble and membrane proteins is the appropriate concentration units, and consequently, the standard state (Fleming, 2002). A membrane protein is generally confined to the volume defined by the micelle (Sehgal et al., 2005), not the total solvent volume, so that increasing the detergent concentration decreases the effective concentration of the membrane protein, even if the bulk concentration has not been changed. Thus, the most appropriate concentration units are mole fraction units in the micelle phase.
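The density-matching idea described above can be illustrated with a minimal sketch. It assumes the textbook expressions for the buoyant mass of a complex, M(1 - v̄ρ) summed over protein and bound detergent, and for the equilibrium distribution of a single ideal species. The function names, the partial specific volumes, and the masses are hypothetical values chosen only for illustration; none of them come from the studies cited in this section.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def buoyant_mass(m_protein, vbar_protein, m_detergent, vbar_detergent, rho):
    """Effective (buoyant) mass in g/mol of a protein-detergent complex.

    Masses are in g/mol, partial specific volumes in mL/g, rho in g/mL.
    """
    protein_term = m_protein * (1.0 - vbar_protein * rho)
    detergent_term = m_detergent * (1.0 - vbar_detergent * rho)
    return protein_term + detergent_term


def equilibrium_concentration(c_ref, r_cm, r_ref_cm, m_buoyant, rpm, temp_k=293.15):
    """Concentration at radius r_cm for one ideal species at sedimentation equilibrium."""
    omega = 2.0 * math.pi * rpm / 60.0                     # angular velocity, rad/s
    mass_kg = m_buoyant / 1000.0                           # kg/mol
    dr2 = (r_cm / 100.0) ** 2 - (r_ref_cm / 100.0) ** 2    # m^2
    return c_ref * math.exp(mass_kg * omega ** 2 * dr2 / (2.0 * GAS_CONSTANT * temp_k))


# Hypothetical complex: 20 kDa protein carrying 30 kDa of bound detergent.
vbar_det = 0.94                  # assumed detergent partial specific volume, mL/g
matched_rho = 1.0 / vbar_det     # solvent density at which the detergent is neutrally buoyant
mb_matched = buoyant_mass(20000.0, 0.75, 30000.0, vbar_det, matched_rho)
```

At the matched density the detergent term vanishes, so the fitted mass reflects the protein alone; this is the "ideal case" referred to in the text.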

3.2. Förster resonance energy transfer (FRET)

Förster resonance energy transfer (FRET) measurements have been used to analyze transmembrane helix dimerization energetics in various detergents. If the Förster distance for a selected donor and acceptor pair is greater than the interchromophore distance in the oligomer, an approximate average degree of association at a particular peptide-detergent ratio can be determined by measuring the variation in the fluorescence intensity of the donor attached to the peptide as a function of the acceptor concentration while keeping the total peptide-detergent ratio constant (Adair and Engelman, 1994; Chung et al., 1992; Gallivan and Dougherty, 1999; Reddy et al., 1999). Then, by carrying out the measurements at different peptide-detergent ratios, dissociation constants can be determined.

Hristova and coworkers established that FRET methods could be applied to measuring free energies of helix-helix interactions in bilayers by demonstrating homogenization and equilibration of transmembrane peptides integrated in either multilamellar or large unilamellar vesicles (You et al., 2005). FRET measurements for different donor-acceptor ratios have to be made from individually prepared samples because homogenization by titration is hard to achieve in the vesicle system. The ability to measure dissociation constants in bilayers is a major advantage over equilibrium sedimentation, which can use only detergent systems, but because there is a limited dilution range possible in vesicle systems, high-affinity interactions are inaccessible with this approach.
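To make the link between the degree of association and the dissociation constant concrete, the following is a minimal sketch, assuming a simple monomer-dimer equilibrium (2M ⇌ D) with concentrations expressed as mole fractions, together with the standard Förster distance dependence of transfer efficiency. The function names, the temperature, and the numerical values are illustrative assumptions, not taken from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fret_efficiency(distance, forster_distance):
    """Standard Forster relation: E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (distance / forster_distance) ** 6)


def dimer_fraction(total_mole_fraction, kd):
    """Fraction of peptide found in dimers for 2M <-> D with Kd = [M]^2 / [D].

    Solves X_total = [M] + 2[M]^2 / Kd for the monomer mole fraction [M].
    """
    monomer = kd * (math.sqrt(1.0 + 8.0 * total_mole_fraction / kd) - 1.0) / 4.0
    return 1.0 - monomer / total_mole_fraction


def association_free_energy(kd, temp_k=298.15):
    """Standard free energy of association, RT ln Kd (mole-fraction standard state)."""
    return R_KCAL * temp_k * math.log(kd)


# Hypothetical Kd of 1e-3 (mole fraction); scan three peptide-detergent ratios.
for x_total in (1e-4, 1e-3, 1e-2):
    print(x_total, round(dimer_fraction(x_total, 1e-3), 3))
```

Varying the peptide-detergent ratio moves the system along this curve, which is why measurements at several ratios are needed before a dissociation constant can be fit.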

3.3. Disulfide cross-linking

DeGrado and coworkers introduced a disulfide cross-linking method to measure the free energy of transmembrane helix oligomerization in detergent micelles and, importantly, lipid vesicles (Cristian et al., 2003). In this method, cysteines are introduced into the oligomerizing transmembrane peptide at positions where they can form disulfide bonds when they are in close proximity. If placed appropriately, disulfide formation is more favorable in the oligomer than in the monomer. Thus, by measuring the fraction of disulfide formed as a function of reduction potential, it is possible to measure dimerization constants. Again, the ability to measure dissociation constants in bilayers is a major advantage, but high-affinity interactions may be inaccessible because of the limited dilution range possible.

3.4. Genetic assay systems (TOXCAT, POSSYCAT, and GALLEX)

A number of genetic screens and selections have been developed to probe transmembrane helix oligomerization. Most of the methods tether the TM domain to a DNA-binding domain that binds DNA as a dimer. In this manner, DNA binding is coupled to TM domain oligomerization. By coupling gene expression to DNA binding, it is possible to assess TM domain oligomerization by monitoring gene expression. Langosch and coworkers developed the first system, using the transcriptional activator ToxR fused to the lacZ gene, which can be either used in a selection or readily screened using the well-developed technology for detecting β-galactosidase activity (Gurezka and Langosch, 2001; Langosch et al., 1996). In 1999, Russ and Engelman converted the approach into a genetic selection, called TOXCAT, in which the ToxR dimer regulates expression of chloramphenicol acetyltransferase (CAT), conferring chloramphenicol resistance (Russ and Engelman, 1999). Langosch and coworkers developed a similar system, called POSSYCAT, in which the CAT gene is in single copy on the chromosome (Gurezka and Langosch, 2001). Leeds and Beckwith developed a system in which the TM domain is fused to the N-terminal domain of λ-repressor, which can confer resistance to the lytic growth of phage λ (Leeds and Beckwith, 1998). Schneider and Engelman expanded the approach to hetero-oligomers, employing a LexA DNA-binding domain, and named the assay GALLEX (Schneider and Engelman, 2003).

Protein fragment complementation assays have also been developed to assess oligomerization in membranes (Remy and Michnick, 1999). Here two fragments of a protein that are inactive separately are fused to a membrane protein. Oligomerization brings the inactive fragments together where they can assemble and reconstitute activity. Thus, activity is coupled to oligomerization. To our knowledge, no genetic screens or selections have been implemented for antiparallel TM helix interactions, which would be a useful advance.

A major advantage of these genetic screens and selection systems is that a huge number of TM variants can be tested for their ability to oligomerize. Moreover, oligomerization is assessed in a natural membrane rather than a membrane mimetic environment. A disadvantage, however, is that free energies of association cannot be measured directly. Nevertheless, free energies are correlated with CAT expression in the TOXCAT system, which allows for approximate free energies to be inferred (Duong et al., 2007; Russ and Engelman, 1999).

4. Methods for Measuring Multipass α-helical Membrane Protein Stability

Reversible folding, an essential requirement for making thermodynamic stability measurements, is not easily achieved for larger polytopic helical membrane proteins. Unfolding of helical membrane proteins induced by most methods, such as thermal and chemical approaches, is irreversible, as reviewed by Stanley and Fleming (2008). Currently the only viable method for measuring complex α-helical membrane protein folding energetics is an SDS unfolding assay (Lau and Bowie, 1997). Khorana and coworkers made the seminal observation that bacteriorhodopsin (bR) can be refolded from an SDS-denatured state (London and Khorana, 1982). On the basis of this observation, Paula Booth pioneered studies of the mechanism of membrane protein folding by studying the kinetics of SDS unfolding and refolding (Booth et al., 1996). Lau and Bowie (1997) developed a thermodynamic stability assay for the protein diacylglycerol kinase by monitoring unfolding as a function of SDS concentration. A similar assay was later used for measuring the stability of bR (Chen and Gouaux, 1999; Faham et al., 2004) and the disulfide-bond thiooxidoreductase DsbB (Otzen, 2003), though in the latter case equilibrium constants were inferred from kinetic measurements (see also Curnow and Booth, 2007). The SDS unfolding assay is similar to urea and GdnHCl denaturation of soluble proteins, except that a denaturing detergent drives unfolding.


As with any method to monitor unfolding and refolding reactions, it is necessary to have an experimental probe that is sensitive to the conformational change. Methods that have been used include the far UV CD signal (Curnow and Booth, 2007; Lau and Bowie, 1997), the absorbance or fluorescence of Trp residues (Booth et al., 1996; Otzen, 2003), and the retinal chromophore of bacteriorhodopsin (Booth et al., 1996; Faham et al., 2004). Equilibrium unfolding with SDS is best illustrated by bR, as it is the best characterized and simplest system. A typical unfolding curve for bacteriorhodopsin monitored by retinal absorbance is shown in Fig. 8.3A. The unfolding curves for bR are fit under a number of assumptions that have varying levels of support. First, we assume that the system is in equilibrium throughout the experiment. This seems well justified by the finding that essentially the same curves are observed starting from the native state and adding SDS, or starting from the SDS denatured state and diluting the SDS (Lau and Bowie, 1997). Second, the unfolding reaction is assumed to be essentially two state, with minimal contributions from unfolding intermediates. The two-state assumption appears to be an excellent approximation because unfolding curves obtained by retinal absorbance and by far-UV circular dichroism, probes sensitive to very different structural parameters, show essentially the same unfolding curves (Curnow and Booth, 2007; Faham et al., 2004) (see Fig. 8.3B). Third, we assume that the unfolding free energy is linear with SDS concentration using mole fraction units. As pointed out by Otzen and coworkers (Sehgal et al., 2005), the best concentration units are micellar mole fraction, but we have used the bulk mole fraction. We originally applied this approach only because it is simple and appeared to fit the data well in the transition zones. 
More recently, measurements of folding and unfolding rates as a function of SDS concentration in the Booth lab also appear consistent with this simple analysis (Curnow and Booth, 2007). To our knowledge, the theoretical justification remains unknown. As long as we keep extrapolations to a minimum, however, by calculating only unfolding free energies near the transition zones, large errors are unlikely. Fourth, we assume that the spectroscopic changes of the native state as a function of SDS concentration are linear. There is no justification for this assumption, but there is also no justification for a more complex model. We therefore apply the simplest model that fits the data. Applying these assumptions provides excellent fits to the unfolding curves and allows us to extract unfolding free energies in the transition zones (Fig. 8.3).

The nature of the unfolded state in SDS remains somewhat murky (Renthal, 2006). Neutron-scattering experiments with soluble proteins support a model where SDS micelles bind to the polypeptide chain, reminiscent of beads on a string (Ibel et al., 1990). The polypeptide generally occupies the micelle surface, but presumably hydrophobic portions of the protein are more buried in the apolar micelle core. This model is consistent with the results from the Otzen group that the heat capacity decreases upon unfolding in SDS, which suggests additional shielding from solvent by SDS micelle binding (Sehgal and Otzen, 2006). It seems reasonable to suggest that, for membrane proteins, the hydrophobic TM helical segments can remain helical and somewhat buried in the micelle while the hydrophilic portions become associated with the polar head groups, but this is largely speculation.

[Figure 8.3 panels: (A) normalized OD560 (%) versus SDS mole fraction; (B) fraction folded versus SDS mole fraction, with data series OD 560 and CD 228.]

Figure 8.3 SDS unfolding of bR. (A) Unfolding of bR at a concentration of 0.1 mg/ml in bicelles composed of 15 mM 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC), 6 mM 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propanesulfonate (CHAPSO) and 10 mM sodium phosphate (pH 6.0), induced by titrating in 20% (w/v) SDS in the same bicelle mixture. Unfolding was monitored by detecting the absorption of the retinal chromophore at 560 nm. The fitted curve is obtained using the assumptions described in the text (i.e., two-state folding, linear dependence of unfolding free energy on SDS concentration, and linear dependence of the native-state absorbance on SDS concentration). (B) Unfolding curves for bR generated by monitoring the retinal absorption at 560 nm (○) and far-UV CD at 228 nm (□). The curves are essentially identical, consistent with a two-state assumption.
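The fitting assumptions listed for the bR SDS unfolding curves (two-state unfolding, an unfolding free energy that is linear in SDS mole fraction, and a native-state signal that is linear in SDS mole fraction) can be written as a small model function. This is only a sketch under stated assumptions: the parameterization through a midpoint `x_mid` and slope `m_value`, the constant unfolded-state signal, the temperature, and all numbers are hypothetical and are not the authors' fitting code or fitted values.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fraction_folded(x_sds, m_value, x_mid, temp_k=298.15):
    """Two-state folded fraction with an unfolding free energy linear in SDS mole fraction.

    dG_unfold(x) = m_value * (x_mid - x_sds), so dG_unfold = 0 at the midpoint x_mid.
    """
    dg_unfold = m_value * (x_mid - x_sds)                 # kcal/mol
    k_unfold = math.exp(-dg_unfold / (R_KCAL * temp_k))   # [unfolded] / [folded]
    return 1.0 / (1.0 + k_unfold)


def observed_signal(x_sds, m_value, x_mid, native_intercept, native_slope,
                    unfolded_signal, temp_k=298.15):
    """Population-weighted signal with a linearly sloping native baseline.

    The unfolded-state signal is taken as constant, which is an assumption of this sketch.
    """
    f_native = fraction_folded(x_sds, m_value, x_mid, temp_k)
    native_signal = native_intercept + native_slope * x_sds
    return f_native * native_signal + (1.0 - f_native) * unfolded_signal


# Hypothetical parameters: midpoint at 0.7 mole fraction SDS, m = 30 kcal/mol per unit mole fraction.
curve = [observed_signal(x / 100.0, 30.0, 0.7, 100.0, -10.0, 0.0) for x in range(40, 101, 5)]
```

In practice such a function would be handed to a nonlinear least-squares routine; restricting the analysis to SDS concentrations near the transition zone keeps the linear free-energy assumption from being extrapolated far beyond the data.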


How much structure remains within these putative micelle and polypeptide complexes? According to a straightforward interpretation of CD spectra, DsbB shows essentially no loss of helical content (Otzen, 2003), whereas bR and diacylglycerol kinase lose about 40% and 15% of helical content, respectively, upon unfolding in SDS (Faham et al., 2004; Lau and Bowie, 1997). Renthal (2006) has pointed out that the helical content obtained from NMR structures in SDS is consistently higher than that observed by CD measurements. He suggests that CD underestimates helical content in SDS, perhaps because of changes in peptide absorbance. On the other hand, nuclear Overhauser effects (NOEs) that are used to calculate nuclear magnetic resonance (NMR) structures are sometimes observed for transiently stabilized conformations and therefore may underestimate unraveled helices that may also be present in the ensemble. Moreover, the close correspondence of unfolding curves measured by far-UV CD and other unfolding probes strongly suggests that CD changes reflect conformational changes rather than simply environmental effects on extinction coefficients (Faham et al., 2004). The maintenance of considerable helical structure is an advantage of SDS unfolding because it somewhat resembles the presumed unfolded state in membranes in which transmembrane helix domains can remain folded (Popot and Engelman, 1990). It is an open question how much residual tertiary structure remains. Many membrane proteins, including bR, run normally in SDS-PAGE, suggesting that the properties of the detergent and protein complexes are similar to those of soluble proteins, which appears inconsistent with a compact denatured state (Renthal, 2006). In our own unpublished hydrogen exchange results, we see an increase in water accessibility throughout bR, further suggesting a loss of folded structure. Moreover, an NMR structure of a two-helix fragment of bR in SDS shows considerable maintenance of the TM helical structure, but no helix-helix interactions (Pervushin et al., 1994). Nevertheless, it is clear that stable tertiary interactions can be maintained in SDS, as various oligomers remain intact in SDS complexes. Thus, the possibility of unbroken tertiary interactions in SDS remains a caveat to the interpretation of these unfolding experiments (Renthal, 2006).

5. Methods to Study the Stability of β-barrel Membrane Proteins

Because of the completely different design principles of β-barrel membrane proteins, their unfolded reference state is different from that of α-helical membrane proteins. Therefore, both the numerical values of measured stabilities and the methods used to obtain these values are quite different. In favorable cases, the unfolded reference state of β-barrel membrane proteins is a completely denatured form that is no longer associated with lipids or other amphiphiles.


5.1. SDS denaturation

β-barrel membrane proteins are unusually resistant to denaturation by SDS due to their extensive cross-strand H-bonding network. Like many individual TM α-helices, unboiled samples of β-barrel membrane proteins do not unfold and show an anomalous migration behavior by SDS-PAGE. SDS molecules do not bind proportionally to the length of the polypeptide chain, which leads to faster or slower migration by SDS-PAGE than would be expected given their molecular mass. However, when membrane proteins of this class are boiled in SDS, they unfold completely, losing most of their secondary and tertiary structure. This phenomenon has been known since the early 1970s as heat modifiability of bacterial outer membrane proteins (Omps). The detailed mechanism of this denaturation is not well understood, and different forms of even the same membrane protein can be modified up or down from the true molecular mass upon omission of the boiling step in SDS-PAGE. For example, folded full-length OmpA migrates at 30 kD, faster than unfolded 35 kD OmpA (Surrey and Jahnig, 1992), whereas the folded TM domain of OmpA migrates at 21 kD, slower than the 19 kD unfolded form of this domain (Arora et al., 2000). Complete heat modification requires and, therefore, is an indicator of the correct tertiary structure (i.e., closure of the β-barrel). Folding intermediates such as a membrane-surface adsorbed form and a partially inserted form, which have much of their native secondary structures already developed, migrate by unboiled SDS-PAGE as if they were completely denatured or at intermediate values (i.e., at 35 and 32 kD, respectively, for OmpA) (Kleinschmidt and Tamm, 1996). Although SDS does not affect the folded structure of unboiled Omps, it does reduce their thermal stability. The thermal transition temperature (Tm) decreases approximately linearly over several decades as the mole fraction of SDS is increased in a mixed SDS and nondenaturing detergent micelle system (Mogensen et al., 2005).

5.2. Thermal denaturation

Measurements of thermal denaturation can provide the Gibbs free energy ($\Delta G$), enthalpy ($\Delta H$), entropy ($\Delta S$), and heat-capacity ($\Delta C_p$) changes between the folded and unfolded states of proteins. While calorimetric and spectroscopic measurements have been employed to characterize these parameters for many soluble proteins (and to determine whether their unfolding is truly two state), reversible thermal denaturation of membrane proteins has so far not been achieved. However, there have been numerous studies using irreversible thermal denaturation to get more qualitative insights into the stability of β-barrel membrane proteins. A recent comparative study of a range of β-barrel membrane proteins of different sizes illustrates this nicely (Burgess et al., 2008).


Some active transport systems of bacterial outer membranes have large 22-stranded β-barrels with an embedded plug domain. Thermal denaturation studies by differential scanning calorimetry (DSC) have demonstrated that the plug domain and the surrounding β-barrel are autonomous folding units. For example, the plug domain of the iron-siderophore transporter FhuA unfolded reversibly at 65 °C and the β-barrel denatured irreversibly at 74 °C (Bonhivers et al., 2001). Substrate binding increased the reversible transition to 71 °C while the higher Tm transition remained unchanged. When the plug domain was deleted, the irreversible transition of the β-barrel decreased to 62 °C, indicating that the presence of the plug stabilized the barrel structure. The autotransporter AIDA has a β-barrel TM domain (β2) and a surface-located β1 domain. Thermal denaturation in detergent micelles showed that the β1 domain stabilizes the β2 domain (Mogensen et al., 2005). Similarly, the interfacial α-helix of the lipid A biosynthesis protein PagP stabilized its β-barrel TM domain (Huysmans et al., 2007). Subunit interactions between monomers of trimeric porins may also be studied by thermal denaturation. For example, mutations breaking intersubunit salt bridges have been shown to decrease the trimer-monomer Tm of OmpF from 72 to about 50 °C and $\Delta H_{\mathrm{cal}}$ from 430 to about 280 kcal/mol (Phale et al., 1998). These examples show that even in the absence of full thermodynamic descriptions, thermal denaturation studies are quite useful for analyzing domain and subunit interactions in β-barrel membrane proteins.

5.3. Solvent denaturation with urea or GdnHCl

In favorable cases, reversible refolding from a completely denatured state in solution to the native state in lipid bilayers can be achieved with β-barrel membrane proteins. This was first demonstrated for OmpA in lipid bilayers of different lipid compositions (Hong et al., 2004). In these experiments, completely solubilized unfolded protein in 8 M urea (or 6 M GdnHCl) in the absence of detergent was refolded in the lipid bilayer of interest. Besides the aforementioned SDS-PAGE assay with unboiled samples, fluorescence spectroscopy, limited proteolysis, and single-channel conductance measurements in planar lipid bilayers were used to ascertain quantitative conversion to the native structure (Arora et al., 2000). Unfolding of OmpA was monitored by SDS-PAGE, Trp fluorescence, and CD spectroscopy at different concentrations of denaturant. A plot of unfolded fraction versus denaturant concentration showed sigmoidal curves with fairly sharp transitions (Fig. 8.4). To prove reversibility, the reverse experiment was also performed: unfolded protein was incubated with lipid bilayers at increasing amounts of denaturant and the unfolded fraction was determined. The unfolding and refolding curves were practically identical (see also Fig. 8.4A).

[Figure 8.4 panels: (A) unboiled SDS-PAGE gels showing the 35 kD and 30 kD bands for unfolding and refolding at [urea] from 0 to 8.0 M; (B) unfolded fraction versus [urea] (M) at pH 7.0, 7.5, 8.0, 8.5, 9.2, and 10.0; (C) $\Delta G^{\circ}_{u,\mathrm{H_2O}}$ (kcal mol−1) and m-value (kcal mol−1 M−1) versus pH.]

Figure 8.4 Two-state equilibrium folding of OmpA in bilayers at different values of pH. (A) Equilibrium unfolding (upper gel) and refolding (lower gel) of OmpA in C16:0C18:1PC: C16:0C18:1PG lipid bilayers (92.5:7.5) measured by SDS-PAGE of unboiled samples. The approximate midpoints of transition are indicated by arrows. The unfolding and refolding reactions were incubated overnight in 10 mM HEPES buffer (pH 7.5) containing 2 mM EDTA. The protein concentration and the lipid-to-protein ratio were 5.6 μM and 800, respectively. (B) pH-dependent equilibrium unfolding measured by Trp fluorescence. The protein concentration and the lipid-to-protein ratio were 1.4 μM and 800, respectively. The unfolding curves at pH 10.0 obtained by Trp fluorescence (filled circles), far-UV circular dichroism (crosses), and the SDS-PAGE shift assay (open diamonds), which are measures of lipid binding, secondary structure, and tertiary structure, respectively, superimpose in equilibrium measurements although they are not all synchronized in kinetic experiments. (C) Free energy of unfolding $\Delta G^{\circ}_{u,\mathrm{H_2O}}$ of OmpA in C16:0C18:1PC: C16:0C18:1PG bilayers as a function of pH. The free energies and m-values were obtained from best fits of the data of panel B to the two-state model described by Eqs. (8.3) and (8.4).

A small amount (typically 5 to 10 mol%) of the negatively charged lipid POPG was included in the bilayer and the experiments were performed under basic and low-salt conditions (pH > 7.5, [NaCl] < 30 mM). This ensured that the denatured state of the protein became completely dissociated from the membrane surface by electrostatic repulsion (the calculated pI of OmpA is 5.6). Figures 8.4B and 8.4C show the pH dependence of the equilibrium folding of OmpA in lipid bilayers. The fact that the unfolding and refolding curves monitored by Trp fluorescence, far-UV CD, and SDS-PAGE were reversible and exactly superimposed (shown for pH 10 only, bold curve in Fig. 8.4B) strongly suggests that the transition is in two-state equilibrium because the three detection methods were previously shown to report on different kinetic phases of the folding pathway of OmpA (Kleinschmidt and Tamm, 2002). For a two-state equilibrium transition the free energy of unfolding as a function of denaturant is defined as follows:

$$\Delta G^{\circ}_{u} = -RT \ln K_{u} = -RT \ln\left(\frac{[\text{unfolded}]}{[\text{folded}]}\right) \qquad (8.1)$$

Linear extrapolation of the free energy to 0 M urea allows one to calculate the free energy of unfolding in water, ΔG°u,H2O, and the proportionality constant m (Greene and Pace, 1974).

$$\Delta G^{\circ}_{u,\mathrm{H_2O}} = \Delta G^{\circ}_{u} + m\,[\text{urea}] \qquad (8.2)$$
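As a rough illustration of Eqs. (8.1) and (8.2), the following Python sketch converts unfolded fractions from the transition region into free energies and extrapolates them linearly to 0 M urea. The data points, the temperature, and all function names are hypothetical and serve only to show the arithmetic; they are not the OmpA measurements.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def delta_g_unfolding(fraction_unfolded, temperature=298.15):
    """Eq. (8.1): dG_u = -RT ln([unfolded]/[folded])."""
    k_u = fraction_unfolded / (1.0 - fraction_unfolded)
    return -R * temperature * math.log(k_u)

def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def extrapolate_to_water(urea, fraction_unfolded, temperature=298.15):
    """Return (dG_u_H2O, m), using Eq. (8.2) as dG_u = dG_u_H2O - m [urea]."""
    dg = [delta_g_unfolding(f, temperature) for f in fraction_unfolded]
    intercept, slope = linear_fit(urea, dg)
    return intercept, -slope  # intercept at 0 M urea; m is the negative slope

# Hypothetical transition-region data (urea in M, fraction unfolded)
urea = [3.0, 3.5, 4.0, 4.5, 5.0]
f_u = [0.08, 0.22, 0.50, 0.78, 0.92]
dg_h2o, m_value = extrapolate_to_water(urea, f_u)
print(f"dG_u,H2O = {dg_h2o:.2f} kcal/mol, m = {m_value:.2f} kcal/(mol M)")
```

Only points inside the transition region are used here, because the logarithm in Eq. (8.1) becomes very sensitive to noise when the unfolded fraction approaches 0 or 1.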

In practice, the equilibrium unfolding curve monitored by the average emission wavelength <λ>, defined as <λ> = Σ(Fiλi)/Σ(Fi), where λi and Fi are the wavelength and the corresponding fluorescence intensity at the ith measuring step in the spectrum, respectively, is fitted to the following form of the two-state model (Mann et al., 1993).

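$$\langle\lambda\rangle = \frac{\langle\lambda\rangle_{F} + \langle\lambda\rangle_{U}\,Q_{R}^{-1}\exp\left[m\left([\text{denaturant}] - C_{m}\right)/RT\right]}{1 + Q_{R}^{-1}\exp\left[m\left([\text{denaturant}] - C_{m}\right)/RT\right]} \qquad (8.3)$$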

Here, <λ>F and <λ>U are the average emission wavelengths of the folded and unfolded states, respectively, determined from linear extrapolations of the two plateau values of the transition curve to 0 M urea. Cm is the urea concentration where folded and unfolded fractions are equal. QR is the relative ratio of the total fluorescence intensity of the native state to that of the unfolded state and is needed for normalization when one uses <λ>'s to represent species concentrations. The free energy of unfolding is obtained from the fitted values of Cm and m.

$$\Delta G^{\circ}_{u,\mathrm{H_2O}} = m\,C_{m} \qquad (8.4)$$
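The quantities entering Eqs. (8.3) and (8.4) can be sketched in a few lines of Python. This is a minimal illustration, assuming constant plateau values and hypothetical parameter values; the function names are ours, and a real analysis would fit the model to measured spectra with a nonlinear least-squares routine.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def average_emission_wavelength(wavelengths, intensities):
    """<lambda> = sum(F_i * lambda_i) / sum(F_i) over one emission spectrum."""
    weighted = sum(f * lam for lam, f in zip(wavelengths, intensities))
    return weighted / sum(intensities)

def two_state_average_wavelength(denaturant, lam_f, lam_u, q_r, m, c_m,
                                 temperature=298.15):
    """Eq. (8.3): <lambda> predicted by the two-state model."""
    # Unfolded/folded ratio, weighted by the relative intensity 1/Q_R
    weight = math.exp(m * (denaturant - c_m) / (R * temperature)) / q_r
    return (lam_f + lam_u * weight) / (1.0 + weight)

def free_energy_in_water(m, c_m):
    """Eq. (8.4): dG_u,H2O = m * C_m."""
    return m * c_m

# Hypothetical parameters: plateaus in nm, m in kcal/(mol M), C_m in M
params = dict(lam_f=330.0, lam_u=350.0, q_r=1.5, m=1.1, c_m=3.0)
for urea in (0.0, 2.0, 3.0, 4.0, 6.0):
    lam = two_state_average_wavelength(urea, **params)
    print(f"{urea:.1f} M urea: <lambda> = {lam:.2f} nm")
print(f"dG_u,H2O = {free_energy_in_water(params['m'], params['c_m']):.2f} kcal/mol")
```

With QR different from 1, the midpoint of the <λ> curve no longer coincides with Cm, which is why the intensity ratio has to appear explicitly in the model.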

When the data of Fig. 8.4 were analyzed with this model, it was found that the stability (ΔG°u,H2O) of OmpA decreased from 4.5 kcal/mol to 3.4 kcal/mol when the pH increased from 7 to 10 (Fig. 8.4C). The significance of the m-value, which did not vary much with pH, has been debated extensively in the protein folding literature. For soluble proteins, it is often thought that this value, which is also a measure of the cooperativity of the system, is related to the residue hydrophobic surface area that becomes exposed to solvent upon denaturation. What this means exactly for membrane proteins is not so clear at this time. The rather small stabilities of OmpA (3.4 kcal/mol at pH 10 reported by Hong and Tamm (2004) and 9.3 kcal/mol for a different bilayer system reported by Hong et al. (2007)) are of the same order of magnitude as for water-soluble proteins of similar size. This might be surprising at first sight when one considers the extreme heat resistance of this and other β-barrel membrane proteins. However, if one simply calculates the free energy of transfer of all residues that are transferred into the lipid bilayer with the augmented Wimley-White hydrophobicity scale (Jayasinghe et al., 2001), one finds that the net ΔG° amounts to only about 1 kcal/mol. Cross-strand hydrogen bonding in the membrane likely drives the reaction further but is counteracted by favorable hydrogen bonding with water in the denatured state in solution. Obviously, and as is true for soluble protein folding, the energetics of membrane protein folding are driven by a delicate balance between large numbers of much larger attractive and repulsive forces.

6. A Few Salient Results on Forces that Stabilize Membrane Proteins

Although this is a review on thermodynamic methods to study membrane protein folding, we include a few salient results to better illustrate the usefulness of these methods. In this section we intentionally cherry-pick a few examples and do not intend to provide a comprehensive review on this rather broad topic.

6.1. Van der Waals/packing interactions

Van der Waals packing is clearly an important factor stabilizing helical membrane proteins. Indeed, TM helices with no polar side chains can form stable oligomers (Popot and Engelman, 2000). Intimate packing provided by the GxxxG (Russ and Engelman, 1999), glycine zipper (Kim et al., 2004; Wu et al., 2005), and leucine zipper motifs (Gurezka et al., 1999) provides the necessary structural complementarity for packing of TM helices (MacKenzie et al., 1997). Faham et al. (2004) found very similar energetic contributions of both polar and nonpolar side chains to the stability of bacteriorhodopsin. As nonpolar side chains constitute the vast majority of residues in the membrane, the results suggest that packing forces dominate.

6.2. Hydrogen-bonding interactions

It has been widely assumed that hydrogen bonds in membrane proteins should be strong because of the lack of competition from water and the low dielectric environment inside the bilayer, which should strengthen electrostatic interactions. This idea is supported by the increased hydrogen-bond strength seen for model compounds in apolar solvents relative to water (Klotz and Franzen, 1962). Most hydrogen-bonding interactions between side chains occur within a protein environment, however, not a membrane, so that an apolar solvent may not be a good model for these bonds. Indeed, the elimination of hydrogen bonds between oligomer subunits usually leads to quite modest changes in stability (Duong et al., 2007; Gratkowski et al., 2001; Hristova, 2008; Li et al., 2006; Stanley and Fleming, 2007), although some contribute more than 1 kcal/mol. Eight hydrogen-bonded side-chain interactions in bacteriorhodopsin were recently evaluated by double mutant-cycle analysis and found to contribute only 0.6 kcal/mol on average (Joh et al., 2008). Thus, hydrogen bonds between side chains appear to be a net stabilizing force in membrane proteins, albeit not a dominant one.

6.3. Electrostatic interactions

OmpA contains a cluster of charged residues consisting of Glu52, Arg138, Glu128 and Lys82 surrounded by aromatic residues Tyr8, Phe40 and Tyr94 in the center of the β-barrel (Fig. 8.5). A salt bridge between Glu52 and Arg138 on opposite walls of the barrel interior forms a complete barrier or gate for ionic conduction through this channel protein. Using double mutant-cycle analysis combined with urea-induced equilibrium unfolding, Hong et al. (2006) determined the strength of this salt bridge to be 5.6 kcal/mol. This is as strong as the strongest salt bridges observed deeply buried inside soluble proteins. Other pairwise electrostatic interaction energies in this charge tetrad were found to range from 0.6 to 3.5 kcal/mol (Fig. 8.5).
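The logic of a double mutant cycle can be sketched as follows. This is a generic illustration, not the analysis code of Hong et al. (2006): the function name, the sign convention (unfolding free energies in, positive result for a stabilizing interaction), and the numbers are all assumptions made for the example.

```python
def coupling_energy(dg_wt, dg_mut_a, dg_mut_b, dg_double):
    """Pairwise interaction energy between residues A and B.

    All inputs are unfolding free energies in kcal/mol for the wild type,
    the two single mutants, and the double mutant. With this convention a
    positive result means the A-B interaction stabilizes the folded state.
    """
    effect_of_a_with_b_present = dg_wt - dg_mut_a
    effect_of_a_with_b_absent = dg_mut_b - dg_double
    return effect_of_a_with_b_present - effect_of_a_with_b_absent

# Hypothetical unfolding free energies (kcal/mol), not measured values
coupled = coupling_energy(dg_wt=4.0, dg_mut_a=2.5, dg_mut_b=3.0, dg_double=2.5)
additive = coupling_energy(dg_wt=4.0, dg_mut_a=3.0, dg_mut_b=3.5, dg_double=2.5)
print(coupled, additive)
```

When the two mutations act independently, the single-mutant effects simply add up and the coupling energy is zero; any nonzero remainder is attributed to the interaction between the two side chains.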

[Figure 8.5: structural diagram of the charge cluster in OmpA showing Arg138, Glu52, Glu128, Lys82, and Tyr8, annotated with pairwise interaction energies of −5.6, −3.5, −1.7, −0.9, and −0.6 kcal/mol.]

Figure 8.5 Electrostatic interactions in gating region of OmpA. The interaction energies were determined from double mutant-cycle analysis (used with permission from Hong et al., 2006).

6.4. Aromatic-aromatic interactions

Statistical analysis of membrane proteins of known structure and genomic sequence searches for identifying transmembrane segments of α-helical and β-barrel membrane proteins show that aromatic residues are dramatically enriched in regions of the protein that contact the membrane-water interface (Adamian et al., 2005; Granseth et al., 2005; Landolt-Marticorena et al., 1993; Senes et al., 2007; Ulmschneider et al., 2005; Wimley, 2002). This prevalence is recapitulated in partition experiments of aromatic residue-containing model peptides to the membrane interface (Wimley and White, 1996). The first thermodynamic measurements of aromatic side-chain contribution to membrane protein stability in bilayers of a bona fide integral membrane protein were performed with OmpA (Hong et al., 2007). It was found that isolated Trp, Tyr, and Phe residues (with no neighboring aromatic residues within a 7 Å radius) contribute on average 2.0, 2.6, and 1.0 kcal/mol, respectively, to the stability of this membrane protein. An unexpected new discovery of this study was that pairs of aromatic residues within a 7 Å range contribute even more stability than they would individually. Pairwise interaction energies in the range from 0.7 to 1.4 kcal/mol were measured between aromatic residues of OmpA that reside in the lipid interface of OmpA. This is in the same range known for similar interactions in water soluble proteins (Burley and Petsko, 1985; Serrano et al., 1991).

6.5. Elastic lipid bilayer forces

The molecular packing of lipids in a fluid bilayer is maintained by a combination of several forces: headgroup repulsion in the polar region, surface tension in the polar-nonpolar interface, and chain repulsion in the core region (Marsh, 2007). These forces create a lateral pressure profile along the membrane normal, which cannot be directly measured but has been theoretically calculated (Cantor, 1999). It is not surprising that this pressure profile modulates the function of many integral membrane proteins, including receptors, ion channels, and enzymes (Botelho et al., 2006; Perozo, 2002; Rostovtseva et al., 2006). Internal membrane pressures also modulate the thermodynamic stability of membrane proteins, as was demonstrated with OmpA (Hong and Tamm, 2004). Including short-chain lipids in a reference bilayer increases the pressure in the interface region and including long-chain lipids with small headgroups and/or increasing the number of double bonds in the acyl chains increases the pressure in the core region of the bilayer. When the bilayer thickness was increased the stability of OmpA increased by 0.34 kcal/mol per Å of additional bilayer thickness (Fig. 8.6). With the known circumference of the OmpA barrel this converts to 4 cal/mol per Å² of increased hydrophobic contact area (i.e., about 20% of what would be expected from the hydrophobic effect) (Tanford, 1979). Another approximately 1.4 kcal/mol per Å bilayer thickness is probably counteracted by an elastic lipid deformation energy due to a hydrophobic mismatch between the hydrophobic thickness of the protein and the equilibrium bilayer thickness in the absence of the protein. Because it can be estimated that about 25 lipids form the first shell of boundary lipid around OmpA, the energy for stretching or compressing a lipid molecule should be around 50–60 cal/mol/Å if the first lipid shell absorbed all mismatch deformation. 
In reality this energy would probably be distributed into further layers of lipid around the protein, decaying quite rapidly from the perimeter of the protein.
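The numbers quoted in this paragraph can be cross-checked with a short calculation. In the sketch below the barrel circumference is not taken from the source; it is back-calculated from the two quoted values (0.34 kcal/mol per Å and 4 cal/mol per Å²), so it should be read as an implied estimate rather than a measured dimension.

```python
# Values quoted in the text
observed_per_angstrom = 0.34      # kcal/mol per Å of added bilayer thickness
observed_per_area = 4.0           # cal/mol per Å^2 of hydrophobic contact area
counteracted_per_angstrom = 1.4   # kcal/mol per Å, attributed to lipid deformation
first_shell_lipids = 25           # estimated number of boundary lipids

# Barrel circumference implied by the first two numbers (Å)
circumference = observed_per_angstrom * 1000.0 / observed_per_area

# Deformation energy per lipid if the first shell absorbs all of it (cal/mol/Å)
per_lipid = counteracted_per_angstrom * 1000.0 / first_shell_lipids

# Total hydrophobic driving force per unit area (cal/mol/Å^2) and the
# fraction of it that shows up as observed stability
total_per_area = (observed_per_angstrom + counteracted_per_angstrom) * 1000.0 / circumference
observed_fraction = observed_per_area / total_per_area

print(f"implied circumference: {circumference:.0f} Å")
print(f"deformation energy per lipid: {per_lipid:.0f} cal/mol/Å")
print(f"observed fraction of hydrophobic effect: {observed_fraction:.0%}")
```

Under these assumptions the per-lipid value falls inside the 50–60 cal/mol/Å range given in the text, and the observed fraction comes out close to the quoted 20%.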

[Figure 8.6: plot of ΔG°u,H2O (kcal mol−1; axis range −2 to 8) against hydrophobic thickness dhydrophobic (Å; axis range 15 to 35) for di-C10PC, di-C12PC, di-C14PC, C16:0C18:1PC, C18:0C18:1PC, di-C14:1PC, di-C16:1PC, di-C18:1PC, and di-C20:1PC bilayers.]

Figure 8.6 The stability of OmpA depends on bilayer thickness and curvature strain. Dependence of ΔG°u,H2O on the hydrophobic thickness of PC bilayers with saturated and monounsaturated acyl chains (filled circles) and cis-double-unsaturated acyl chains (open circles).

7. Conclusion and Outlook

Although the biogenic pathways of inserting membrane proteins into the bilayers of biological membranes are guided by chaperones and specific insertion machineries in vivo, it is fundamentally important to understand the forces that ultimately determine the final structures that membrane proteins adopt in lipid bilayers. Understanding these forces not only is of academic interest but also can guide future design of membrane proteins with altered functions and, from a practical standpoint of structural biologists, with better properties for forming two- and three-dimensional crystals or improved stabilities for NMR studies. With the advent of methods for evaluating the energetic effects of mutations on membrane protein thermodynamic stability, we have started to develop a quantitative, experiment-based picture of how protein sequences drive the formation of membrane protein structure. This is important because the elementary interactions that determine the folds of membrane proteins cannot be derived a priori from the vast existing knowledge of such forces in the soluble-protein-folding field. Some forces are similar, but others are very different in the complex milieu of lipid bilayers. Moreover, even for those elementary interactions that are similar, different sets of forces likely dominate the determination of the ultimate fold of membrane and soluble proteins. A challenge in this field has been to find appropriate conditions to generate unfolded states that refold reversibly into native states. As illustrated in this chapter, substantial progress has been made in this regard in the last few years for both α-helical and β-barrel membrane proteins. Despite this progress a lot of work remains. The unfolded states, especially for α-helical membrane proteins, are still not very well defined.
Because the denatured states of helical membrane proteins harbor significant amounts of secondary structures associated with SDS or other denaturing detergents, it is probably wise to directly compare only measurements done on the same protein with each other rather than try to make comparisons between different α-helical membrane proteins. However, the double mutant-cycle approaches that have been developed for both α-helical and β-sheet membrane proteins elegantly circumvent this problem (Hong et al., 2007; Hong et al., 2006; Joh et al., 2008). It does not matter what the denatured state really is as long as the effects of the mutations are independent. With β-barrel membrane proteins we are also beginning to understand the complexities that the lipid bilayer imparts on the folding reaction. Not surprisingly, the stability of these and probably also α-helical proteins depends on bilayer properties in a major way. Bilayer thickness, intrinsic curvature, specific chemistries of headgroup structures, and so on affect the
folding of membrane proteins. Biological membranes contain thousands of different lipid species. So, what is the best lipid background for folding studies of these proteins? The answer to this question is not clear at this point, and it may be that different lipid mixtures will have to be defined as appropriate reference states for membrane proteins that reside in different membranes in the cell. Although TM helix interactions can be studied in bilayers, there are still no methods for studying the folding of polytopic membrane proteins within a membrane. More needs to be done to explore the contribution of bilayer properties and how the energetics of molecular interactions vary with bilayer depth. The development of methods for unfolding and folding helical proteins in bilayers should be a major goal for the field. The tools for studying the folding thermodynamics discussed in this chapter have enabled our first forays into the energetics of membrane protein folding. However, there are vast new territories that will need to be explored in this field for decades to come. Membrane proteins have to catch up with 40 years of tremendous activity and accumulated knowledge on the folding and energetics of soluble proteins. No doubt time will add new tools and new, increasingly sophisticated insights. The field is still in its infancy, and we look forward to substantial growth as well as practical applications as it matures.

ACKNOWLEDGEMENTS

Supported by grants GM063919 and GM081783 (J.U.B.) and GM051329 (L.K.T.) from the National Institutes of Health.

REFERENCES

Adair, B., and Engelman, D. (1994). Glycophorin A helical transmembrane domains dimerize in phospholipid bilayers: A resonance energy transfer study. Biochemistry 33, 5539–5544. Adamian, L., Nanda, V., DeGrado, W. F., and Liang, J. (2005). Empirical lipid propensities of amino acid residues in multispan alpha helical membrane proteins. Proteins 59, 496–509. Arora, A., Rinehart, D., Szabo, G., and Tamm, L. K. (2000). Refolded outer membrane protein A of Escherichia coli forms ion channels with two conductance states in planar lipid bilayers. J. Biol. Chem. 275, 1594–1600. Bonhivers, M., Desmadril, M., Moeck, G. S., Boulanger, P., Colomer-Pallas, A., and Letellier, L. (2001). Stability studies of FhuA, a two-domain outer membrane protein from Escherichia coli. Biochemistry 40, 2606–2613. Booth, P., Farooq, A., and Flitsch, S. (1996). Retinal binding during folding and assembly of the membrane protein bacteriorhodopsin. Biochemistry 35, 5902–5909. Booth, P. J., and Curnow, P. (2006). Membrane proteins shape up: Understanding in vitro folding. Curr. Opin. Struct. Biol. 16, 480–488. Epub 2006 July 3.

Botelho, A. V., Huber, T., Sakmar, T. P., and Brown, M. F. (2006). Curvature and hydrophobic forces drive oligomerization and modulate activity of rhodopsin in membranes. Biophys. J. 91, 4464–4477. Burgess, N. K., Dao, T. P., Stanley, A. M., and Fleming, K. G. (2008). Beta-barrel proteins that reside in the E. coli outer membrane in vivo demonstrate varied folding behavior in vitro. J. Biol. Chem. 283, 26748–26758. Burley, S. K., and Petsko, G. A. (1985). Aromatic-aromatic interaction: A mechanism of protein structure stabilization. Science 229, 23–28. Cantor, R. S. (1999). Lipid composition and the lateral pressure profile in bilayers. Biophys. J. 76, 2625–2639. Chen, G. Q., and Gouaux, E. (1999). Probing the folding and unfolding of wild-type and mutant forms of bacteriorhodopsin in micellar solutions: Evaluation of reversible unfolding conditions. Biochemistry 38, 15380–15387. Choma, C., Gratkowski, H., Lear, J. D., and DeGrado, W. F. (2000). Asparagine-mediated self-association of a model transmembrane helix. Nat. Struct. Biol. 7, 161–166. Chung, L., Lear, J., and DeGrado, W. (1992). Fluorescence studies of the secondary structure and orientation of a model ion channel peptide in phospholipid vesicles. Biochemistry 31, 6608–6616. Cristian, L., Lear, J. D., and DeGrado, W. F. (2003). Use of thiol-disulfide equilibria to measure the energetics of assembly of transmembrane helices in phospholipid bilayers. Proc. Natl. Acad. Sci. USA 100, 14772–14777. Curnow, P., and Booth, P. (2007). Combined kinetic and thermodynamic analysis of alpha-helical membrane protein unfolding. Proc. Natl. Acad. Sci. USA 104, 18970–18975. Duong, M. T., Jaszewski, T. M., Fleming, K. G., and MacKenzie, K. R. (2007). Changes in apparent free energy of helix-helix dimerization in a biological membrane due to point mutations. J. Mol. Biol. 371, 422–434. 
Engelman, D., Chen, Y., Chin, C., Curran, A., Dixon, A., Dupuy, A., Lee, A., Lehnert, U., Matthews, E., Reshetnyak, Y., Senes, A., and Popot, J. (2003). Membrane protein folding: Beyond the two stage model. FEBS Lett. 555, 122–125. Engelman, D. M., Steitz, T. A., and Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Chem. 15, 321–353. Faham, S., Yang, D., Bare, E., Yohannan, S., Whitelegge, J., and Bowie, J. (2004). Side-chain contributions to membrane protein structure and stability. J. Mol. Biol. 335, 297–305. Fleming, K., Ackerman, A., and Engelman, D. (1997). The effect of point mutations on the free energy of transmembrane alpha-helix dimerization. J. Mol. Biol. 272, 266–275. Fleming, K. G. (2002). Standardizing the free energy change of transmembrane helix-helix interactions. J. Mol. Biol. 323, 563–571. Fleming, K. G. (2008). Determination of membrane protein molecular weight using sedimentation equilibrium analytical ultracentrifugation. Curr. Protoc. Protein Sci. Unit 7.12.1–7.12.13. Gallivan, J. P., and Dougherty, D. A. (1999). Cation-pi interactions in structural biology. Proc. Natl. Acad. Sci. USA 96, 9459–9464. Granseth, E., von Heijne, G., and Elofsson, A. (2005). A study of the membrane-water interface region of membrane proteins. J. Mol. Biol. 346, 377–385. Gratkowski, H., Lear, J., and DeGrado, W. (2001). Polar side chains drive the association of model transmembrane peptides. Proc. Natl. Acad. Sci. USA 98, 880–885. Greene, R. F. Jr., and Pace, C. N. (1974). Urea and guanidine hydrochloride denaturation of ribonuclease, lysozyme, alpha-chymotrypsin, and beta-lactoglobulin. J. Biol. Chem. 249, 5388–5393.

Gurezka, R., Laage, R., Brosig, B., and Langosch, D. (1999). A heptad motif of leucine residues found in membrane proteins can drive self-assembly of artificial transmembrane segments. J. Biol. Chem. 274, 9265–9270. Gurezka, R., and Langosch, D. (2001). In vitro selection of membrane-spanning leucine zipper protein-protein interaction motifs using POSSYCCAT. J. Biol. Chem. 276, 45580–45587. Hong, H., Park, S., Jimenez, R. H., Rinehart, D., and Tamm, L. K. (2007). Role of aromatic side chains in the folding and thermodynamic stability of integral membrane proteins. J. Am. Chem. Soc. 129, 8320–8327. Hong, H., Szabo, G., and Tamm, L. K. (2006). Electrostatic couplings in OmpA ion-channel gating suggest a mechanism for pore opening. Nat. Chem. Biol. 2, 627–635. Hong, H., and Tamm, L. K. (2004). Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc. Natl. Acad. Sci. USA 101, 4065–4070. Hristova, K. (2008). Pathogenic activation of receptor dimers in mammalian membranes. In “FASEB summer research conferences; Molecular biophysics of cellular membranes” (J. U. Bowie, ed.), Saxtons River, Vermont. Huysmans, G. H., Radford, S. E., Brockwell, D. J., and Baldwin, S. A. (2007). The N-terminal helix is a post-assembly clamp in the bacterial outer membrane protein PagP. J. Mol. Biol. 373, 529–540. Ibel, K., May, R. P., Kirschner, K., Szadkowski, H., Mascher, E., and Lundahl, P. (1990). Protein-decorated micelle structure of sodium-dodecyl-sulfate–protein complexes as determined by neutron scattering. Eur. J. Biochem. 190, 311–318. Jayasinghe, S., Hristova, K., and White, S. H. (2001). Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol. 312, 927–934. Joh, N. H., Min, A., Faham, S., Whitelegge, J. P., Yang, D., Woods, V. L., and Bowie, J. U. (2008). Modest stabilization by most hydrogen-bonded side-chain interactions in membrane proteins. Nature 453, 1266–1270. Kim, S., Chamberlain, A., and Bowie, J. (2004).
Membrane channel structure of Helicobacter pylori vacuolating toxin: Role of multiple GXXXG motifs in cylindrical channels. Proc. Natl. Acad. Sci. USA 101, 5988–5991. Kleinschmidt, J. H., and Tamm, L. K. (1996). Folding intermediates of a beta-barrel membrane protein: Kinetic evidence for a multi-step membrane insertion mechanism. Biochemistry 35, 12993–13000. Kleinschmidt, J. H., and Tamm, L. K. (2002). Secondary and tertiary structure formation of the beta-barrel membrane protein OmpA is synchronized and depends on membrane thickness. J. Mol. Biol. 324, 319–330. Klotz, I. M., and Franzen, J. S. (1962). Hydrogen bonds between model peptide groups in solution. J. Am. Chem. Soc. 84, 3461–3466. Landolt-Marticorena, C., Williams, K. A., Deber, C. M., and Reithmeier, R. A. (1993). Non-random distribution of amino acids in the transmembrane segments of human type I single span membrane proteins. J. Mol. Biol. 229, 602–608. Langosch, D., Brosig, B., Kolmar, H., and Fritz, H. J. (1996). Dimerisation of the glycophorin A transmembrane segment in membranes probed with the ToxR transcription activator. J. Mol. Biol. 263, 525–530. Lau, F., and Bowie, J. (1997). A method for assessing the stability of a membrane protein. Biochemistry 36, 5884–5892. Leeds, J. A., and Beckwith, J. (1998). Lambda repressor N-terminal DNA-binding domain as an assay for protein transmembrane segment interactions in vivo. J. Mol. Biol. 280, 799–810. Li, E., You, M., and Hristova, K. (2006). FGFR3 dimer stabilization due to a single amino acid pathogenic mutation. J. Mol. Biol. 356, 600–612. London, E., and Khorana, H. (1982). Denaturation and renaturation of bacteriorhodopsin in detergents and lipid-detergent mixtures. J. Biol. Chem. 257, 7003–7011.

Ludwig, B., Grabo, M., Gregor, I., Lustig, A., Regenass, M., and Rosenbusch, J. P. (1982). Solubilized cytochrome c oxidase from Paracoccus denitrificans is a monomer. J. Biol. Chem. 257, 5576–5578. MacKenzie, K., Prestegard, J., and Engelman, D. (1997). A transmembrane helix dimer: Structure and implications. Science 276, 131–133. Mann, C. J., Royer, C. A., and Matthews, C. R. (1993). Tryptophan replacements in the trp aporepressor from Escherichia coli: Probing the equilibrium and kinetic folding models. Protein Sci. 2, 1853–1861. Marsh, D. (2007). Lateral pressure profile, spontaneous curvature frustration, and the incorporation and conformation of proteins in membranes. Biophys. J. 93, 3884–3899. Mogensen, J. E., Kleinschmidt, J. H., Schmidt, M. A., and Otzen, D. E. (2005). Misfolding of a bacterial autotransporter. Protein Sci. 14, 2814–2827. Otzen, D. (2003). Folding of DsbB in mixed micelles: A kinetic analysis of the stability of a bacterial membrane protein. J. Mol. Biol. 330, 641–649. Perozo, E., Cortes, D. M., Sompornpisut, P., Kloda, A., and Martinac, B. (2002). Open channel structure of MscL and the gating mechanism of mechanosensitive channels. Nature 418, 942–948. Pervushin, K. V., Orekhov, V., Popov, A. I., Musina, L., and Arseniev, A. S. (1994). Three-dimensional structure of (1-71)bacterioopsin solubilized in methanol/chloroform and SDS micelles determined by 15N-1H heteronuclear NMR spectroscopy. Eur. J. Biochem. 219, 571–583. Phale, P. S., Philippsen, A., Kiefhaber, T., Koebnik, R., Phale, V. P., Schirmer, T., and Rosenbusch, J. P. (1998). Stability of trimeric OmpF porin: The contributions of the latching loop L2. Biochemistry 37, 15663–15670. Popot, J., and Engelman, D. (1990). Membrane protein folding and oligomerization: The two-stage model. Biochemistry 29, 4031–4037. Popot, J., and Engelman, D. (2000). Helical membrane protein folding, stability, and evolution. Annu. Rev. Biochem. 69, 881–922. Rapoport, T. A. (2007).
Protein translocation across the eukaryotic endoplasmic reticulum and bacterial plasma membranes. Nature 450, 663–669. Reddy, L., Jones, L., and Thomas, D. (1999). Depolymerization of phospholamban in the presence of calcium pump: A fluorescence energy transfer study. Biochemistry 38, 3954–3962. Remy, I., and Michnick, S. W. (1999). Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc. Natl. Acad. Sci. USA 96, 5394–5399. Renthal, R. (2006). An unfolding story of helical transmembrane proteins. Biochemistry 45, 14559–14566. Rostovtseva, T. K., Kazemi, N., Weinrich, M., and Bezrukov, S. M. (2006). Voltage gating of VDAC is regulated by nonlamellar lipids of mitochondrial membranes. J. Biol. Chem. 281, 37496–37506. Russ, W., and Engelman, D. (1999). TOXCAT: A measure of transmembrane helix association in a biological membrane. Proc. Natl. Acad. Sci. USA 96, 863–868. Schneider, D., and Engelman, D. M. (2003). GALLEX, a measurement of heterologous association of transmembrane helices in a biological membrane. J. Biol. Chem. 278, 3105–3111. Epub 2002 November 21. Sehgal, P., Mogensen, J., and Otzen, D. (2005). Using micellar mole fractions to assess membrane protein stability in mixed micelles. Biochim. Biophys. Acta 1716, 59–68. Sehgal, P., and Otzen, D. (2006). Thermodynamics of unfolding of an integral membrane protein in mixed micelles. Protein Sci. 15, 890–899.

Senes, A., Chadi, D. C., Law, P. B., Walters, R. F., Nanda, V., and Degrado, W. F. (2007). E(z), a depth-dependent potential for assessing the energies of insertion of amino acid side-chains into membranes: Derivation and applications to determining the orientation of transmembrane and interfacial helices. J. Mol. Biol. 366, 436–448. Serrano, L., Bycroft, M., and Fersht, A. R. (1991). Aromatic-aromatic interactions and protein stability. Investigation by double-mutant cycles. J. Mol. Biol. 218, 465–475. Stanley, A., and Fleming, K. (2007). The role of a hydrogen bonding network in the transmembrane beta-barrel OMPLA. J. Mol. Biol. 370, 912–924. Stanley, A., and Fleming, K. (2008). The process of folding proteins into membranes: Challenges and progress. Arch. Biochem. Biophys. 469, 46–66. Surrey, T., and Jahnig, F. (1992). Refolding and oriented insertion of a membrane protein into a lipid bilayer. Proc. Natl. Acad. Sci. USA 89, 7457–7461. Tamm, L. K., Arora, A., and Kleinschmidt, J. H. (2001). Structure and assembly of beta-barrel membrane proteins. J. Biol. Chem. 276, 32399–32402. Epub 2001 June 29. Tanford, C. (1979). Interfacial free energy and the hydrophobic effect. Proc. Natl. Acad. Sci. USA 76, 4175–4176. Tanford, C., Nozaki, Y., Reynolds, J., and Makino, S. (1974). Molecular characterization of proteins in detergent solutions. Biochemistry 13, 2369–2376. Ulmschneider, M. B., Sansom, M. S., and Di Nola, A. (2005). Properties of integral membrane protein structures: Derivation of an implicit membrane potential. Proteins 59, 252–265. Wimley, W. C. (2002). Toward genomic identification of beta-barrel membrane proteins: Composition and architecture of known structures. Protein Sci. 11, 301–312. Wimley, W. C., and White, S. H. (1996). Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat. Struct. Biol. 3, 842–848. Wu, T., Malinverni, J., Ruiz, N., Kim, S., Silhavy, T. J., and Kahne, D. (2005).
Identification of a multicomponent complex required for outer membrane biogenesis in Escherichia coli. Cell 121, 235–245. You, M., Li, E., Wimley, W., and Hristova, K. (2005). Forster resonance energy transfer in liposomes: Measurements of transmembrane helix dimerization in the native bilayer environment. Anal. Biochem. 340, 154–164.

CHAPTER NINE

NMR Analysis of Dynein Light Chain Dimerization and Interactions with Diverse Ligands

Gregory Benison* and Elisar Barbar*,1

Contents
1. NMR Methodology
2. Monomer-dimer Equilibrium Coupled to Electrostatics
3. Dimerization is Coupled to Ligand Binding
4. Folding is Coupled to Binding
5. Allostery in LC8
6. Summary
References

Abstract

NMR is a powerful tool for quantitative measurement of the thermodynamic properties of biological systems. In this review, we discuss the role NMR has played in understanding the various coupled equilibria in dimerization of dynein light chain LC8 and in its interactions with its ligands. LC8, a very highly conserved 89-residue homodimer also known as DYNLL, is an essential component of the dynein and Myosin V molecular motors and is also found in various other complexes. LC8 binds to disordered segments of its partners, promoting them to dimerize and form more ordered structures, often coiled coils. The monomer-dimer equilibrium is controlled by electrostatic interactions at the dimer interface, such as by phosphorylation of residue Ser88, which is a regulatory mechanism for LC8 in vivo. NMR experiments have uncovered several subtle interactions (weak dimerization of a phosphomimetic mutant, and allosteric interaction between the LC8 binding sites) that have been overlooked by other methods. NMR has also provided a residue-specific view of the titration of histidine residues at the LC8 dimer interface, and of a nascent helix in one of the binding partners, the primarily disordered dynein intermediate chain IC74. We give special attention to methods for quantitative interpretation of NMR spectra, an important consideration when using NMR to measure equilibria.

* Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, USA
1 Corresponding author: [email protected] (541)-737-4143, (541)-737-0481 (fax)

Methods in Enzymology, Volume 455 ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04209-2

© 2009 Elsevier Inc. All rights reserved.

Dynein light chain LC8 is a highly conserved, essential component of the microtubule-based molecular motor dynein. As a dynein subunit, LC8 is involved in fundamental processes including retrograde vesicular trafficking, ciliary/flagellar motility and cell division. LC8 also interacts with nondynein proteins in diverse systems, including some with roles in apoptosis, viral pathogenesis, enzyme regulation, and kidney development. LC8 is a moderately tight homodimer (Barbar et al., 2001, Liang et al., 1999). Its interactions with a number of non-dynein proteins led to the widely held view that LC8 functions as a cargo adaptor. However, based on recent structural and thermodynamic studies we proposed that LC8 is not primarily a dynein subunit, but is an essential component of diverse protein complexes that play roles in a variety of cellular systems (Barbar, 2008). In its role in these diverse systems, LC8 fits the definition of a hub protein with a common mode of action. In dynein and in all other complexes, LC8 acts as a dimerization engine, promoting the dimerization and ordering of the natively disordered monomeric proteins with which it interacts (Benison et al., 2006; Nyarko et al., 2004; Wang et al., 2004). Dimerization of LC8 is required for this activity because the monomer lacks the groove that is necessary for binding (Makokha et al., 2004; Wang et al., 2003). Interestingly, dimerization is disrupted by phosphorylation of a specific Ser residue at the interface, resulting in formation of an inactive monomer (Song et al., 2007, 2008). This review will focus on the use of NMR to understand four types of coupled equilibria: monomer-dimer equilibrium coupled to the electrostatic charge of a single interface residue, monomer-dimer equilibrium coupled to ligand binding, disorder to order transition in both LC8 and its binding partners coupled to ligand binding, and structural change in one subunit of the LC8 dimer coupled to ligand binding to the other subunit. 
These linkages were not apparent using other biochemical and biophysical techniques. We will not address in this review NMR sample preparation or data collection. Rather, we will focus on NMR data analysis and in particular accurate measurement of peak intensities, a topic that has received less attention in the literature but is very important for thermodynamics measurements.

1. NMR Methodology

Exchanging populations. NMR is a useful tool for studying systems that exist in multiple interconvertible states. In proteins, these might be folded and unfolded conformations, or occupied and unoccupied binding sites. Remarkably, because of the minute energies involved in nuclear magnetic

NMR Analysis of Dynein Light Chain


transitions and their weak coupling with the rest of the system, NMR is capable of measuring populations and exchange rates in systems that remain unperturbed from thermodynamic equilibrium (Bain, 2003). The choice of NMR experiment and the type of conclusion that can be drawn depend on the relative populations of the different states and on the rate of exchange between them. Chemical exchange can be classified as fast, intermediate, or slow on the NMR time scale, and NMR can be used to study exchange processes in all of these regimes (Bain, 2008). When the exchange rate is significantly smaller than the chemical shift difference, the system is in slow exchange, and the NMR spectrum is simply the sum of the spectra of each population in the absence of exchange. The relative populations of the different states can be determined from the relative NMR signal intensities. Binding sites can be mapped by chemical shift perturbations. When the chemical shift difference and exchange rate are approximately equal, the system is in intermediate exchange, and the line widths become broader than what they would be without the exchange process. In practice the lines can become so broad that they are not observable above the noise level. With intermediate exchange, though populations cannot be as easily measured as in the case of slow exchange, it is still often possible to map binding sites to the residues that experience the most broadening. For systems in fast exchange the exchange rate is significantly larger than the chemical shift differences. The observed peak position is an average of its value in the various exchanging states weighted by the relative populations, and can therefore be used to measure the relative populations if the peak positions at the end points are known. Even when only one state is significantly populated, conformational exchange with minor states can still be observed through its influence on NMR relaxation parameters (Palmer et al., 2001). 
In particular, the exchange-derived contribution to transverse relaxation known as Rex is useful for studying exchange processes in proteins on the μsec–msec time scale.

Data reduction. For thermodynamic studies by NMR, we need quantitative measurements of abstract peak parameters such as peak positions, intensities, and linewidths. These parameters are not measured directly; rather, they must be extracted from spectra which in their raw form are simply a collection of intensity measurements sampled regularly on a frequency grid (Fig. 9.1). We call the process of extracting useful parameters from raw spectra data reduction, a term borrowed from X-ray crystallography where it refers to the analogous process of converting raw diffraction images into collections of structure factors (Leslie, 2006). The simple and robust tools available for data reduction in crystallography have contributed to its success as a method in structural biology. In NMR, despite less emphasis on development of robust automated methods (Malmodin and Billeter, 2005), the data reduction process is just as critical.


Gregory Benison and Elisar Barbar

Figure 9.1 The data reduction problem. A spectrum (S), which is a set of m intensities {I0, ...}, usually sampled on a rectangular grid, is mapped to a set of n parameters {P0, ...}. Parameters include peak intensities, positions, and linewidths. Which parameters are determined depends on the data reduction method used.

There are several methods available for extracting peak parameters from raw spectra and the choice of method depends on the application. For example, in collecting NOE restraints for structure determination by NMR, chemical shifts must be determined only well enough to allow assignment (Guntert, 1998). A typical tolerance for matching proton resonances is 0.02 ppm (approximately 12 Hz at a typical field strength); trying to achieve tighter tolerance may not result in improved assignments (Malmodin and Billeter, 2005). In contrast, meaningful measurement of residual dipolar couplings requires very accurate determination of peak positions: within 2 Hz or less (Bax et al., 2001). In structure determination, intensities are often measured only accurately enough to place signals into coarse categories such as strong, medium, or weak. For the thermodynamic applications described in this review, it is desirable to measure populations (and therefore peak intensities) with the highest possible accuracy (5% error is typical). In our studies of dynein regulation and assembly we have used several data reduction strategies: ad-hoc analysis, integration, and modeling. An ad-hoc method relies on converting a spectrum to a visual representation such as a contour plot or a one-dimensional trace and then making an estimate of a parameter such as intensity or line width by looking at that visual representation. Ad-hoc methods can be very effective because they can leverage the ability of human visual processing to account subjectively for artifacts and overlap; however, this same subjectivity can introduce bias and is prone to over-interpreting the data, especially for weaker signals. In integration, a subset of spectral intensities Ii is chosen and added to obtain a peak intensity parameter.
An optimum integration box size can be chosen that makes the best compromise between including more points (to capture the most signal) and excluding points at the edge (to minimize the inclusion of noise) (Rischel, 1995). A smaller-than-optimum box may be chosen to minimize the problem of overlap. In the extreme case, the integration box can be shrunk to a single point (the local maximum of


the peak). This approach has the disadvantage of excluding some points that contain useful signal, and it does not solve completely the problem of overlap. In modeling, the data reduction problem is performed in reverse: the peak parameters P0 . . . are used to reconstruct the spectral intensities I0, . . ., with some criterion such as the least-squared error being used to select the set of parameters that results in the best match between the reconstructed and the original spectrum. Modeling has been applied to NMR spectra both in the time domain (Andrec and Prestegard, 1998) and in the frequency domain (Denk et al., 1986). Modeling is somewhat more complex than other data reduction methods but is the best strategy in cases of overlap, and also can take advantage of inherent relationships between signals to obtain better parameter estimates (Andrec and Prestegard, 1998). For example, a cross-peak and an auto-peak might rigorously share a common chemical shift and linewidth in one dimension; or, in an experiment involving decay as a function of mixing time, only the intensity may change as a function of mixing time and not the peak position. A reduction in the number of independent parameters is beneficial because there is generally a trade-off between the complexity of a model and how accurately its parameters can be determined. Often, information obtained from strong peaks can be used to help constrain the fitting of weaker but more interesting peaks. This also has an analogy in X-ray crystallography: low-resolution reflections are critical in determining the unit cell parameters, which are then used to constrain the measurement of the high-resolution reflections that provide the most information about the structure (Otwinowski and Minor, 1997). All of the data reduction methods discussed above can be performed using the NMR visualization package burrow-owl (Benison et al., 2007a), available from the authors or at http://burrow-owl.sourceforge.net. 
Given the minimal (by today’s standards) computational resources needed, we recommend that full modeling be used whenever quantitative accuracy is important, making use of constraints from peak relationships where possible. We will now review several insights about LC8 that have been gained through NMR, the role that chemical exchange has played, and the data reduction methods used.
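The difference between integration and modeling in the presence of overlap can be illustrated with a minimal sketch. It is not the burrow-owl implementation; the function names, the Lorentzian line shape, and all peak positions, linewidths, and heights are assumptions chosen for illustration. Positions and linewidths are treated as already known (for example, from stronger related peaks), so only the two peak heights are fitted, which makes the least-squares problem linear.

```python
def lorentzian(x, center, width):
    """Unit-height Lorentzian; width is the full width at half maximum."""
    half = width / 2.0
    return half * half / ((x - center) ** 2 + half * half)


def box_integral(xs, ys, lo, hi):
    """Integration: add up the intensities of grid points inside [lo, hi]."""
    return sum(y for x, y in zip(xs, ys) if lo <= x <= hi)


def fit_two_heights(xs, ys, peak_a, peak_b):
    """Modeling: least-squares heights of two peaks with known (center, width).

    With positions and linewidths held fixed, the model is linear in the two
    heights, so the 2x2 normal equations are solved directly.
    """
    fa = [lorentzian(x, *peak_a) for x in xs]
    fb = [lorentzian(x, *peak_b) for x in xs]
    saa = sum(a * a for a in fa)
    sbb = sum(b * b for b in fb)
    sab = sum(a * b for a, b in zip(fa, fb))
    say = sum(a * y for a, y in zip(fa, ys))
    sby = sum(b * y for b, y in zip(fb, ys))
    det = saa * sbb - sab * sab
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det


# Hypothetical noise-free trace: strong peak at 50 Hz, weak peak at 58 Hz.
STRONG, WEAK = (50.0, 6.0), (58.0, 6.0)
xs = [float(i) for i in range(101)]
ys = [10.0 * lorentzian(x, *STRONG) + 1.0 * lorentzian(x, *WEAK) for x in xs]

h_strong, h_weak = fit_two_heights(xs, ys, STRONG, WEAK)
box_overlapped = box_integral(xs, ys, 55.0, 61.0)
box_isolated = box_integral(xs, [lorentzian(x, *WEAK) for x in xs], 55.0, 61.0)
print(h_strong, h_weak, box_overlapped / box_isolated)
```

In this synthetic case the fitted heights recover the values used to build the trace, whereas the integration box around the weak peak is dominated by the tail of its strong neighbor. Real spectra add noise and uncertainty in the shared parameters, which this sketch ignores.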

2. Monomer-dimer Equilibrium Coupled to Electrostatics

LC8 is a symmetric dimer, with a β-sheet at the subunit interface composed of four strands from one subunit and one strand swapped over from the other subunit (Liang et al., 1999). Dimerization is moderately


tight (Kd is 12 μM at 4 °C and neutral pH) and becomes weaker at low pH (Barbar et al., 2001). The structure of the monomer in solution is quite similar to one chain of the dimer, except the swapped-over strand β3 is a flexible loop rather than a structured β-strand (Makokha et al., 2004; Wang et al., 2003). The monomer-dimer equilibrium can be modulated by electrostatic interactions of two residues at the dimer interface: His55, which becomes protonated as a function of pH, and Ser88, which is a target for phosphorylation (Fig. 9.2). NMR has been important in elucidating the roles of both of these residues in controlling dimerization, as shown below.

Histidine pKa measurements. The wild-type dimer dissociates to a monomer at low pH, with a titration midpoint of pH 4.8 (Barbar et al., 2001). This pH-induced dissociation is linked to titration of residue His55 (Nyarko et al., 2005). When protonated, His55 inhibits dimerization by charge-charge repulsion with His55′. The histidine Hε and Cε chemical shifts, easily measured in 1H-13C HSQC spectra (Fig. 9.3A), are sensitive indicators of both protonation state of the histidine residue and monomer-dimer equilibrium. The dimerization reaction is slow on the NMR time scale, giving rise to separate peaks for the monomeric and dimeric forms. The monomer and dimer populations as a function of pH can be determined from the relative intensities of the corresponding peaks (Fig. 9.3B). In contrast, the protonation reaction is fast on the NMR time scale, giving rise to a single peak with a pH-dependent chemical shift. Titration curves for each histidine side chain individually can be measured by following the position of the Cε-Hε peak (Fig. 9.3C). These experiments demonstrate that systems often undergo multiple exchange processes on different time scales, and that NMR can be used to observe multiple exchange processes simultaneously.
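For a fast-exchange protonation equilibrium, the observed peak position is the population-weighted average of the protonated and neutral end points, so a titration curve follows directly from the Henderson-Hasselbalch relation. The sketch below illustrates this under stated assumptions: the end-point shifts, the pH values, and the simple grid search are hypothetical choices for illustration, not the fitting procedure used in the cited studies.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of the protonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def observed_shift(ph, pka, shift_protonated, shift_neutral):
    """Fast-exchange peak position: population-weighted average of end points."""
    f = protonated_fraction(ph, pka)
    return f * shift_protonated + (1.0 - f) * shift_neutral


def estimate_pka(ph_values, shifts, shift_protonated, shift_neutral):
    """Grid search (0.01 pH steps, pKa 2 to 10) minimizing the squared error."""
    best_pka, best_err = None, float("inf")
    for i in range(200, 1001):
        pka = i / 100.0
        err = sum(
            (observed_shift(ph, pka, shift_protonated, shift_neutral) - s) ** 2
            for ph, s in zip(ph_values, shifts)
        )
        if err < best_err:
            best_pka, best_err = pka, err
    return best_pka


# Hypothetical titration: end-point shifts of 8.6 and 7.7 ppm, true pKa of 6.0.
ph_values = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]
shifts = [observed_shift(ph, 6.0, 8.6, 7.7) for ph in ph_values]
fitted_pka = estimate_pka(ph_values, shifts, 8.6, 7.7)
print(fitted_pka)
```

The end points are assumed known here; in practice they would usually be fitted together with the pKa.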
Uniquely among the three histidines of LC8, the protonation state of His55 was identified as being coupled to dimerization because it has a pKa of 4.8, which corresponds to the macroscopic pKa determined from sedimentation equilibrium studies (Barbar et al., 2001), and it shows

[Figure 9.2 image. Labeled secondary-structure elements: α1, α2, β1, β2, β3, β4, β5.]

Figure 9.2 The symmetric dimer LC8. Residues Ser88 and His55, shown in red, influence the monomer-dimer equilibrium through electrostatic interactions at the dimer interface. Bound ligands are shown in yellow in the binding groove.


[Figure 9.3 image. Panel A: 1H-13C HSQC, axes 1H (ppm) and 13C (ppm), with peaks labeled 55, 68, and 72 and suffix d or m. Panel B: normalized intensity versus pH, with monomer and dimer curves. Panel C: 1H chemical shift (ppm) versus pH.]

Figure 9.3 Titration of residue His55 is linked to dimerization [taken from (Nyarko et al., 2005)]. (A) 1H-13C HSQC spectrum of wild-type LC8, showing Hε-Cε peaks for the three histidines, at pH 3 (black), pH 5 (blue), and pH 7 (red). ‘d’ and ‘m’ indicate peaks arising from dimer and monomer populations, respectively. (B) Monomer and dimer population as a function of pH, determined by following the intensities of the peaks in (A). (C) Titration curves for the histidines in the dimeric state, determined by following the proton chemical shifts of the peaks in (A).

no evidence of titrating in the dimeric state. The other two histidine residues (His68 and His72) exhibit typical pKa values of 6.0 in the dimeric state. The mutant LC8H55K, a nontitratable analogue for protonated His55, behaves as a monomer by size exclusion chromatography, and over a wide pH range has a spectrum resembling that of the pH-induced wild-type monomer (Nyarko et al., 2005).

Measurement of dimer association and dissociation rates. Phosphorylation of LC8 is an important regulatory mechanism in vivo, as phosphorylation at Ser88 by Pak1 inhibits apoptosis and promotes cancerous phenotypes (Puthalakath et al., 1999; Song et al., 2008; Vadlamudi et al., 2004). Dimerization is disrupted in the phosphomimetic mutant LC8S88E, which elutes as a monomer on a gel-filtration column (Song et al., 2007, 2008). 1H-15N HSQC spectra of LC8S88E collected at 1 mM, however, reveal the presence of a dimeric population (Fig. 9.4) in slow exchange with a monomeric population. From quantitation of peak intensities, the Kd for dimerization of LC8S88E is 1.4 mM, 100 times weaker than for wild type LC8 (Song et al., 2007). The association and dissociation rates for LC8S88E can be measured by monitoring exchange of NZ magnetization (Farrow et al., 1994). This is possible for systems where exchange is slow on the NMR time scale, yet not so slow that NMR signals decay before cross-peaks can build up (kex ≈ 0.5 sec⁻¹ to 5 sec⁻¹). LC8S88E falls into this favorable regime and crosspeaks are easily observed in NZ exchange experiments (Fig. 9.5). In this experiment, magnetization is frequency-labeled with the 15N chemical shift, transferred to the Z axis, then allowed to undergo chemical exchange for periods of up to 500 msec. The NZ experiment is a good example of the benefits of spectral modeling as a means of extracting peak parameters (see earlier). The experiment is


[Figure 9.4 image. Panels A and B: 1H-15N HSQC spectra, axes 1H (ppm) and 15N (ppm), with peaks labeled for residues 9, 57, 59, 63, 76, 82, and 84, some with suffix d or m.]

Figure 9.4 Monomeric and dimeric populations of LC8S88E. (A) 1H-15N HSQC spectra of monomeric LC8H55K (red) overlaid on wild type LC8 (black). (B) HSQC spectrum of LC8S88E at 1 mM, which has the appearance of the superposition of the spectra in (A), showing that the sample contains a monomeric and a dimeric population in slow exchange. Taken from (Song et al., 2007).

collected as a series of two-dimensional spectra taken at different mixing times. For each residue, each spectrum contains two crosspeaks and two autopeaks. Therefore, modeling a single residue in a typical experiment containing a series of six mixing times involves fitting 6 × 4 × (2 × 2 + 1) = 120 independent parameters if each peak is fit without considering its relationship to the others. The number of free parameters can be reduced to just 32 by recognizing that the peaks are related: for each residue, there is a 1H chemical shift and linewidth for the monomeric and dimeric states, and an 15N chemical shift and linewidth. These eight parameters (four chemical shifts, four linewidths) are sufficient to determine the positions of all the peaks in all the spectra, because the peak position does not change with mixing time and because the crosspeaks share positional parameters with the autopeaks. The intensity of each peak is still an independent function of mixing time. The total number of free parameters is therefore 8 + (6 × 4) = 32. Association and dissociation rate constants kon and koff are determined by fitting the time course of the intensity of the cross- and auto-peaks. Magnetization evolves via a system of linear differential equations, and chemical exchange simply adds linear terms to this system (Bain, 2008). Because the system remains at thermodynamic equilibrium, the contribution of chemical exchange to the overall magnetization exchange rate remains constant. In the NZ exchange experiment, two-site exchange of longitudinal magnetization between a monomeric environment (IM) and a dimeric environment (ID) is given by:


Figure 9.5 LC8S88E monomer-dimer exchange measured by NZ-exchange spectra. (A) Excerpts around residues Asp37 and Gly59, respectively, from the longitudinal magnetization exchange experiment, with mixing times of (left to right) 15 msec, 80 msec, 150 msec, and 250 msec. Auto-peaks for the monomer and the dimer are labeled m and d, respectively. (B) One-dimensional profile through an auto-peak (right) and a cross-peak (left) at 150 msec mixing time (circles, solid line) and 60 msec mixing time (triangles, dashed line). Solid lines correspond to the model described in the text from which the peak parameters are taken. (C) Peak intensities as a function of mixing time. The dimer-dimer autopeak and dimer-monomer crosspeak intensities are shown with dashed lines and solid symbols. The monomer-monomer autopeak and monomer-dimer crosspeak intensities are shown with solid lines and open symbols. The lines are fits to a monomer-dimer chemical exchange model.

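$$
\begin{aligned}
\frac{d[I_M]}{dt} &= -k_{+}[I_M] + k_{-}[I_D] - R_M[I_M] \\
\frac{d[I_D]}{dt} &= k_{+}[I_M] - k_{-}[I_D] - R_D[I_D]
\end{aligned}
\tag{9.1}
$$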

where RM and RD are the longitudinal relaxation rates for monomer and dimer, respectively. For this bimolecular reaction, the magnetization exchange rate constants k+ and k− must be distinguished from the chemical exchange rate constants kon and koff of the LC8S88E dimerization reaction:

$$
2\,\mathrm{M} \;\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}}\; \mathrm{D}
$$

The magnetic and chemical rate constants are related by the equations k+ = kon·M and k− = koff. The monomer concentration M is a constant and can be calculated from the (known) total protein concentration and the rate constants kon and koff; therefore kon and koff are sufficient to determine k+


and k−. Because temperature can be controlled precisely in the NMR experiment, the dependence of kon and koff on temperature can be measured, which in turn allows determination of ΔH°, ΔS°, and the activation energy for dimerization (Benison et al., 2009, in preparation).
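The time course implied by the two-site exchange equations (9.1) can be simulated numerically, which is the forward model that a fit of auto- and cross-peak intensities would use. The sketch below is only an illustration: the fourth-order Runge-Kutta integrator, the function name, and every rate and concentration value are assumptions, not measured LC8S88E parameters. Magnetization is assumed to start entirely on the monomer, so the monomer term plays the role of the auto-peak and the dimer term the role of the cross-peak.

```python
def nz_exchange(i_m0, i_d0, k_plus, k_minus, r_m, r_d, t_mix, steps=2000):
    """Integrate the two-site exchange equations (9.1) with fixed-step RK4.

    Returns (I_M, I_D) after mixing time t_mix (seconds); rates are in 1/s.
    """
    def deriv(i_m, i_d):
        return (-(k_plus + r_m) * i_m + k_minus * i_d,
                k_plus * i_m - (k_minus + r_d) * i_d)

    h = t_mix / steps
    i_m, i_d = i_m0, i_d0
    for _ in range(steps):
        a1, b1 = deriv(i_m, i_d)
        a2, b2 = deriv(i_m + 0.5 * h * a1, i_d + 0.5 * h * b1)
        a3, b3 = deriv(i_m + 0.5 * h * a2, i_d + 0.5 * h * b2)
        a4, b4 = deriv(i_m + h * a3, i_d + h * b3)
        i_m += h * (a1 + 2.0 * a2 + 2.0 * a3 + a4) / 6.0
        i_d += h * (b1 + 2.0 * b2 + 2.0 * b3 + b4) / 6.0
    return i_m, i_d


# Hypothetical values, chosen only to give rates of order 1 per second.
k_on = 2000.0          # 1/(M s), assumed
monomer_conc = 5.0e-4  # M, assumed constant at equilibrium
k_off = 2.0            # 1/s, assumed
k_plus = k_on * monomer_conc   # k+ = kon * M, as stated in the text
k_minus = k_off                # k- = koff

# Mixing times of 15, 80, 150, and 250 msec, expressed in seconds.
for t_mix in (0.015, 0.080, 0.150, 0.250):
    auto_peak, cross_peak = nz_exchange(1.0, 0.0, k_plus, k_minus, 1.5, 1.5, t_mix)
    print(t_mix, round(auto_peak, 4), round(cross_peak, 4))
```

Fitting measured intensities would wrap this forward model in a least-squares search over kon, koff, RM, and RD; that step is omitted here.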

3. Dimerization is Coupled to Ligand Binding

The ligand-binding groove of LC8 is at the interface of the two subunits, and the bound ligand makes contacts with both subunits (Benison et al., 2007b; Liang et al., 1999). The swapped-over β-strand, which forms part of the binding site, becomes less ordered in the monomer (Makokha et al., 2004). These observations suggest that only the dimer has ligand-binding ability. Consistent with this, the mutant LC8H55K, which is entirely monomeric, has no ligand-binding ability (Fig. 9.6). Surprisingly, the mutant LC8S88E, which is primarily monomeric, retains considerable binding capability in a GST pulldown assay (Fig. 9.6). In a fluorescence assay of 50 μM LC8S88E and 1 μM of the ligand Bim, where the LC8S88E is mostly monomeric, less than 10% of the ligand is bound (Song et al., 2008), confirming that the ligand cannot bind the monomeric form. However, at higher concentrations of LC8S88E in NMR experiments, a population with a spectrum resembling that of the wild-type complex appears as the ligands IC and Swa are titrated in (Fig. 9.7): the ligands bind to the dimeric form and eventually shift the equilibrium entirely to dimer (Song et al., 2007). Since the monomeric, dimeric, and ligated forms are all in slow exchange, their populations as a function of ligand concentration can be measured from the peak intensities.

Figure 9.6 GST pulldown assays showing interaction of IC(92–237) fused to GST with LC8 WT and mutants. The presence of a low-molecular-weight band indicates binding, and the intensity of the band indicates the efficiency of binding. (lane 1): Purified GST-IC(92–237) in the absence of any lysates; (lane 2): with wild-type LC8 as a positive control; (lane 3): with monomeric LC8H55K, showing no binding; (lane 4): with LC8S88E, showing partial binding.


[Figure 9.7 image. Panels A-C and E-G: 1H-15N HSQC excerpts, axes 1H (ppm) and 15N (ppm), with peaks labeled M, D, and C. Panels D and H: mole fraction versus ligand equivalents (L, equiv.).]

Figure 9.7 Binding of ligands promotes the dimerization of LC8S88E. (A-D), (E-H) Titration with short peptides corresponding to the LC8-binding sites of the proteins Swallow (Swa) and IC74, respectively. Excerpts of 1H-15N HSQC spectra containing residue Gly63 at 0 equivalents (A, E), 0.4 equivalents (B, F), and 1.0 equivalents (C, G) show populations of monomeric LC8 (M), dimeric LC8 (D), and the LC8/peptide complex (C) in slow exchange. (D, H) Mole fractions of monomer (circle), dimer (square), and complex (triangle) as a function of ligand equivalents. Curves were calculated from the law of mass action using the Kd for LC8S88E association and a Kd for ligand binding. More tightly-bound ligands such as Swa are more efficient in shifting the monomer-dimer equilibrium towards the dimer. Reproduced from (Song et al., 2007).

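Affinities for dimerization and ligand binding can be determined by fitting the intensities to theoretical titration curves derived from mass-action laws:

$$
2\,(\mathrm{LC8}) \;\overset{K}{\rightleftharpoons}\; (\mathrm{LC8})_2, \qquad
(\mathrm{LC8})_2 + 2\,\mathrm{X} \;\overset{K_1}{\rightleftharpoons}\; (\mathrm{LC8})_2\mathrm{X}_2
$$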

where X is the ligand, $K = [(\mathrm{LC8})_2]/[(\mathrm{LC8})]^2$ is the monomer-dimer equilibrium constant and $K_1 = [\mathrm{X}][(\mathrm{LC8})_2]/[(\mathrm{LC8})_2\mathrm{X}] = [\mathrm{X}][(\mathrm{LC8})_2\mathrm{X}]/[(\mathrm{LC8})_2\mathrm{X}_2]$ is the ligand dissociation constant. Despite the high similarity of the structures of LC8/Swa and LC8/IC (Benison et al., 2007b), Swa binds LC8S88E with 100-fold greater affinity than IC. Note that in the preceding analysis a single dissociation constant is used to describe the first and second ligand-binding steps. A more complicated model, in which these are allowed to be different, is not justified by the LC8S88E titration data, but other experiments (described below) can detect small differences in the dissociation constants of the first and second binding steps.
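Theoretical titration curves of this kind can be generated by solving the mass-action equations at each ligand concentration. The following is a minimal sketch under stated assumptions, not the published fitting code: K is taken as an association constant in units of 1/M, total protein is counted in monomer units, a single K1 describes both binding steps as in the text, the ligand dissociation constant of 1 μM is hypothetical, and the solver (a quadratic for the monomer nested inside a bisection on free ligand) is one convenient choice among several.

```python
import math


def species(x_free, p_total, k_dim, k1):
    """Concentrations (M, D, DX, DX2) at a given free-ligand concentration.

    k_dim = [D]/[M]^2 (association constant); k1 = [X][D]/[DX] = [X][DX]/[DX2].
    p_total is counted in monomer units: M + 2*(D + DX + DX2).
    """
    u = x_free / k1
    a = 2.0 * k_dim * (1.0 + u + u * u)      # a*M^2 + M - p_total = 0
    m = (-1.0 + math.sqrt(1.0 + 4.0 * a * p_total)) / (2.0 * a)
    d = k_dim * m * m
    return m, d, d * u, d * u * u


def solve_titration_point(p_total, l_total, k_dim, k1, iterations=200):
    """Bisect on free ligand until the ligand mass balance is satisfied."""
    lo, hi = 0.0, l_total
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        _, _, dx, dx2 = species(mid, p_total, k_dim, k1)
        if mid + dx + 2.0 * dx2 > l_total:
            hi = mid
        else:
            lo = mid
    x_free = 0.5 * (lo + hi)
    return (x_free,) + species(x_free, p_total, k_dim, k1)


def mole_fractions(p_total, l_total, k_dim, k1):
    """Fractions of LC8 chains that are monomeric, free dimer, and complexed."""
    _, m, d, dx, dx2 = solve_titration_point(p_total, l_total, k_dim, k1)
    return m / p_total, 2.0 * d / p_total, 2.0 * (dx + dx2) / p_total


P_TOTAL = 1.0e-3          # M, total LC8 in monomer units
K_DIM = 1.0 / 1.4e-3      # 1/M, from a dimer dissociation constant of 1.4 mM
K1 = 1.0e-6               # M, hypothetical ligand dissociation constant

for equiv in (0.0, 0.4, 1.0):
    print(equiv, mole_fractions(P_TOTAL, equiv * P_TOTAL, K_DIM, K1))
```

With these inputs the monomer fraction falls and the complex fraction rises as ligand is added, which is the qualitative behavior described for the LC8S88E titrations.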

4. Folding is Coupled to Binding

The N-terminal segment of dynein intermediate chain IC74 is an excellent example of a natively disordered protein that forms a partially folded structure as part of a larger complex. A significant number of such


proteins are now recognized, many of them having regulatory functions (Dyson and Wright, 2005b). The role of LC8 as a folding scaffold for IC74 has been described using a variety of techniques including limited proteolysis, circular dichroism spectroscopy, fluorescence, and NMR (Benison et al., 2006; Makokha et al., 2002; Nyarko et al., 2004). NMR has been useful both in mapping the direct interaction of LC8 and IC74 and in describing induced folding in IC74 distant from the binding site. Similar phenomena have been recognized or predicted in other LC8 binding partners (Barbar, 2008; Wang et al., 2004).

Mapping the binding site in IC74. The N-terminal segment of the dynein intermediate chain IC74 is variable and subject to alternative splicing, yet contains a highly conserved "TQT box" (Nurminsky et al., 1998). This TQT box is protected from limited proteolysis by the binding of LC8 (Makokha et al., 2002). In a series of IC74 deletion mutants, only those containing the TQT box had LC8 binding affinity, and a small peptide corresponding to the TQT box region bound to LC8 (Lo et al., 2001). With NMR, all the specific residues in IC74 perturbed upon binding can be assigned. The construct IC74(84–143) contains the LC8 binding site and some flanking residues. When a 15N-labeled segment of IC74(84–143) is mixed with an unlabeled sample of LC8, certain signals of IC74(84–143) vanish (Fig. 9.8) due to an intermediate exchange process between free and complexed IC74(84–143) or between ordered and disordered conformations (Benison et al., 2006). The most-broadened peaks correspond to residues with the largest change in chemical environment upon binding LC8. The remaining residues show no change in chemical shift, which demonstrates that their conformation does not change appreciably upon binding: in a 60-residue segment around the LC8 binding site, IC74 remains disordered outside the small recognition motif (residues 126–134).

Nascent order in a distal site.
Although the segment immediately adjacent to the binding site does not gain secondary structure upon forming the complex, circular dichroism shows that binding of LC8 causes a modest increase in the helical content of the N-terminal segment of IC74, indicating the formation of a 24-residue helix. There are two predicted coiled-coil domains in the disordered N-terminal domain of IC74: one C-terminal to the LC8 binding site (residues 210–240) and one N-terminal (residues 1–30). In CD spectra of smaller domains containing the LC8 binding site and just one of these predicted coiled coils, the construct IC74(1–143) showed little change upon binding LC8, but the segment IC74(114–260) showed increased helical content similar to that seen for the full N-terminal domain (Nyarko et al., 2004). The residues in the predicted coiled-coil around residue 230 are also protected from proteolysis by binding of LC8, despite being 100 residues distant from the LC8 binding site. Though this domain containing the predicted coiled coil is


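[Figure 9.8 image. Panel A: 1H-15N HSQC overlay, axes 1H (ppm) and 15N (ppm), with labeled peaks 87, 123, 125, 126, 127, 128, 129, 130, 132, 134, and 137. Panel B: schematic labeled IC and LC8.]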

Figure 9.8 Binding of LC8 to IC74(84–143). (A) Overlay of 1H-15N HSQC spectra of 15N-labeled IC74(84–143) (black) and a 1:1 mixture of IC74(84–143) with unlabeled LC8 (blue). All peaks arise from IC74(84–143), because LC8 is not isotopically labeled. Labeled peaks correspond to those that are more than 80% attenuated in the complex. Taken from (Benison et al., 2006). (B) A model of the LC8 dimer bound to two chains of IC74. Residues in the binding site undergo intermediate conformational exchange, leading to peak broadening. Residues adjacent to the binding site remain disordered.

disordered completely in the absence of LC8, it has some intrinsic propensity to fold into a helix: addition of the osmolyte TMAO induces an increase in helicity that somewhat mimics what is observed upon binding of LC8. NMR spectroscopy provides a per-residue picture of this induced folding process. In unfolded proteins, the observation of sequential NOE signals often indicates the presence of latent secondary structure (Dyson and Wright, 2005a). In the construct IC74(198–237), which contains the predicted coiled-coil but not the LC8 binding site, the presence of sequential HN-HN NOEs clearly indicates a nascent helix (Fig. 9.9). Thus, even though this segment appears completely unfolded by CD, there is clear evidence for its propensity to form a coiled coil (Benison et al., 2006).

Induced folding in LC8. LC8 is a well-ordered protein, leading to the view that it is a stable scaffold for the folding of disordered partners such as the dynein intermediate chain. However, the reverse is also true: LC8 itself is only fully folded when bound to one of its ligands. Apo-LC8 exhibits a small degree of flexibility that decreases upon forming a complex. Because the degree of


Figure 9.9 Evidence for a nascent helix in IC74. Strip plots from 3D 1H-15N NOESY-HSQC experiments recorded on 15N-labeled IC74(198–237) showing sequential amide-amide NOE connectivities indicated by horizontal lines. A complete set of strong amide-amide NOEs for residues 223–228 at 5 °C is typical of the 2.6–2.8 Å distances in α-helices. Taken from (Benison et al., 2006).

flexibility in free LC8 is so much smaller than in the free ligands, it cannot be detected by methods like CD that report only on the overall ordered structure, but it has been well described by NMR experiments, which can focus on the disordered regions and measure disorder on different time scales. The disorder in apo-LC8 is reflected by backbone 15N relaxation, which is a sensitive indicator of disorder and dynamics. In particular, the relaxation rates of apo-LC8 cannot be accounted for by the tumbling of a single rigid conformation; it is necessary to include Rex terms indicating conformational exchange on the μs–ms time scale (Fan et al., 2002; Hall et al., 2008). The Rex terms are not distributed evenly over the sequence, but localized to the binding groove, indicating that these are the residues most affected by the conformational exchange. Upon forming a complex with a peptide derived from the KXTQT ligand Bim, nearly all of the conformational exchange vanishes, and the complex behaves nearly like a rigid, single conformation. A similar result is observed for LC8/Swa (Hall et al., 2008). The disappearance of Rex terms demonstrates that binding incurs an entropic cost due to changes in the protein itself in addition to those arising from the solvent and ligand. The increased order of LC8 complexes relative to apo-LC8 is also reflected in better protection from amide proton exchange with the solvent, measured by NMR as H/D exchange (Benison et al., 2007b; Fan et al., 2002). Amide exchange rates are (as expected) reduced in the β-strand that directly contacts the ligand, but are also reduced in the interior β-strands, with the effect decreasing with increasing distance from the ligand-binding site. Thus ligand binding appears to reduce flexibility of the entire β-sheet.


The conserved glutamine residue of the ligand also forms a cap for the N-terminus of LC8 helix α2, and the amide exchange rates of the first few residues of this helix are greatly reduced in the complex. Interestingly, 15N relaxation demonstrates some disorder-to-order transition upon the monomer-to-dimer transition as well. There are residues distant from the dimer interface which display Rex behavior in the constitutively monomeric mutant LC8H55K, but not in the wild-type dimer (Hall et al., 2008). Thus LC8 exists in three states, ranging from least to most ordered: monomeric, dimeric, and complexed; with the monomeric form mostly ordered but with extensive heterogeneous dynamics, and the complexed form behaving almost as a rigid body.

5. Allostery in LC8

NMR evidence for allosteric interaction. LC8 has two identical ligand-binding sites at the dimer interface, which raises the interesting possibility of allosteric interaction (Fig. 9.10A). Crystal structures of free LC8 and several of its complexes have revealed a possible mechanism for such allostery: ligand binding is associated with an expansion of the peptide-binding groove due to shear motion at the dimer interface (Benison et al., 2008). Since this is primarily a change in the quaternary structure, binding of the first ligand could cause a global conformational shift that pre-organizes the second ligand binding site (Fig. 9.10B). Many oligomeric proteins which can undergo a shift in quaternary structure exhibit this type of allostery (Changeux and Edelstein, 2005). Crystallography has therefore defined a possible mechanism for allostery but has not provided any direct evidence for it, because there are no crystal structures of singly bound intermediates. Through NMR, however, it has been possible to characterize singly bound intermediates in solution. Titration monitored by NMR can provide evidence for allosteric interactions, because the number of unique conformational environments is reflected in the number of distinct chemical shifts that can be observed for each atom (Stevens et al., 2001). Theoretically, four distinct environments are possible for each residue: one for the free state, one for the doubly bound state, and two for the singly bound state (which lacks symmetry). However, in practice, for most residues fewer than four peaks are observed due to chemical shift degeneracy.
If it is assumed that there is no allosteric interaction (i.e., binding at one site is accompanied by small local conformational changes, but no global changes that affect the conformation at the other site), each atom is expected to have only two possible chemical shifts because the two conformational environments in the singly bound state are degenerate with those of the free and doubly bound states. If binding of the first ligand is associated with a global conformational change (as in Fig. 9.10B), it is expected that for some residues this degeneracy will be removed, giving rise to intermediate peaks during titration. Such intermediate peaks are observed during titration of 15N-labeled LC8 with unlabeled ligands (Fig. 9.10C), confirming that there is allosteric interaction.

Figure 9.10 Model and evidence for allostery in LC8. (A) The two binding steps of LC8. X is the ligand. The two dissociation constants are defined as K1d = [LC8][X]/[LC8-X] and K2d = [LC8-X][X]/[X-LC8-X]. The binding sites and ligands are identical, so [LC8-X] = [X-LC8]. (B) A model for allostery in LC8 binding. Different polygonal shapes represent different conformations of LC8 and a black dot indicates an occupied ligand-binding site. (C) NMR evidence for allostery. Excerpts of 1H-15N HSQC spectra of 15N-labeled LC8 titrated with unlabeled nNOS peptide at (left to right) 0, 0.4, and 1.0 equivalents. Peaks for free LC8 (apo) and bound LC8 (doubly occupied) are labeled ‘f’ and ‘b’, respectively. In the middle of the titration curve (middle panel), new peaks appear, which arise from singly bound LC8. (D) Titration curve for the resonances shown in (C). Crosses: free peak; circles: sum of intermediate peaks; squares: bound peaks. Curves represent populations predicted by the two-site binding model for K1d/K2d = 2.5.

An intriguing consequence of allosteric interaction between the binding sites is that the first and second binding constants K1d and K2d can be different. The ratio K1d/K2d is sufficient for calculating a theoretical titration curve by solving the following system of equations:

\[
\begin{aligned}
\frac{[\mathrm{LC8}]\,[\mathrm{X\text{-}LC8\text{-}X}]}{[\mathrm{X\text{-}LC8}]\,[\mathrm{LC8\text{-}X}]} &= \frac{K_{1d}}{K_{2d}} \\
[\mathrm{X\text{-}LC8}] &= [\mathrm{LC8\text{-}X}] \\
[\mathrm{LC8}] + [\mathrm{X\text{-}LC8}] + [\mathrm{LC8\text{-}X}] + [\mathrm{X\text{-}LC8\text{-}X}] &= 1
\end{aligned}
\tag{9.2}
\]

For titration with the ligand nNOS (Fig. 9.10D), K1d/K2d = 2.5 is the smallest (i.e., most conservative) ratio that provides a good fit to the data. The modest increase in binding affinity for the second binding event is consistent with an additional expansion of the binding groove accompanying the second ligation.

NMR Data Reduction. The NMR titration experiments pose a challenge for NMR data reduction for several reasons. First, the most useful information (the population of the singly bound form) is carried by weaker signals (the intermediate peaks) in the presence of stronger signals (the apo and bound peaks); to be accurate, the weak signals must be measured with minimum interference from the strong ones. Second, peak overlap is an issue: although LC8 contains only a moderate number of residues (89), the HSQC spectrum becomes crowded during titration due to the presence of up to four peaks per residue. Thus for LC8/nNOS, only one residue (Gly59) shows four peaks entirely well separated from each other and from all other signals (Fig. 9.10C). For all other residues with all ligands tested, the intermediate peaks suffer some degree of overlap. For this reason, modeling is the most useful data reduction method. A typical case is shown in Fig. 9.11 for residue 37 during titration with nNOS; it is clear that full modeling is necessary to obtain reliable estimates of intensity.
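As a rough numerical illustration of the two-site binding model (Eq. 9.2), the sketch below computes the populations of free, singly bound, and doubly bound LC8 for a given ratio K1d/K2d. It is not the authors' fitting code. It assumes, beyond what the text states, that binding is tight enough for essentially all added ligand to be bound, so that the fractional site occupancy theta can stand in for the ligand:LC8 ratio on the titration axis. The function and variable names are hypothetical. Writing s for the population of each singly bound species and r for K1d/K2d, Eq. (9.2) reduces to (r - 1)s^2 + s - theta(1 - theta) = 0.

```python
import math

def two_site_populations(theta, ratio):
    """Populations (free, each singly bound species, doubly bound) of a
    symmetric two-site dimer at fractional site occupancy `theta`,
    given ratio = K1d/K2d. Total protein is normalized to 1."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie between 0 and 1")
    if ratio <= 0.0:
        raise ValueError("ratio must be positive")
    q = theta * (1.0 - theta)
    # Positive root of (ratio - 1)*s**2 + s - q = 0, written in a form
    # that stays well defined when ratio == 1 (no allostery).
    single = 2.0 * q / (1.0 + math.sqrt(1.0 + 4.0 * (ratio - 1.0) * q))
    free = 1.0 - theta - single
    double = theta - single
    return free, single, double

# Compare no allostery (ratio 1) with the ratio reported for nNOS (2.5).
for ratio in (1.0, 2.5):
    for theta in (0.0, 0.25, 0.5, 0.75, 1.0):
        free, single, double = two_site_populations(theta, ratio)
        print(f"r={ratio:3.1f} theta={theta:4.2f} free={free:.3f} "
              f"intermediate={2 * single:.3f} bound={double:.3f}")
```

In this sketch a ratio of 1 gives the statistical (binomial) distribution, and larger ratios depress the summed intermediate population at mid-titration, which is the feature the intermediate-peak intensities are sensitive to.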

Figure 9.11 Reduction of NMR titration data. (A) Small excerpts of HSQC spectra (centered around the peak from residue 37) taken during the titration of 15N-labeled LC8 with a peptide derived from the ligand nNOS. Top to bottom: 0.2, 0.4, and 0.8 equivalents of ligand. (B) 1D slices along the 15N dimension, indicated by the dashed lines in (A). Dots represent the experimental data and the solid lines represent the models used for profile fitting, which are the sum of two 2D Gaussian lineshapes, one for the apo peak and one for the intermediate peak (which can be seen immediately downfield from the apo peak). (C) Integrated intensities for the apo (crosses), intermediate (squares), and bound (circles) peaks. Due to overlap, the measured relative populations add up to a total greater than 1.0, and the agreement with the theoretical titration curve is poor. (D) Intensities derived from profile fitting to a sum of 2D Gaussians. The theoretical titration curves correspond to a K1d/K2d ratio of 2.5, and agree well with curves from other residues with better-resolved peaks.
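To make the idea of profile fitting concrete, the following minimal sketch recovers the heights of two overlapping peaks by least squares. It is a deliberate simplification and not the software used for Fig. 9.11: it works on a 1D slice rather than a 2D spectrum, it holds peak positions and widths fixed so that only the two heights are fitted, and all numerical values (positions, widths, heights) are invented for illustration.

```python
import math

def gaussian(x, center, sigma):
    """Unit-height Gaussian lineshape."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def fit_two_amplitudes(xs, ys, center1, sigma1, center2, sigma2):
    """Least-squares heights of two overlapping Gaussians whose positions
    and widths are held fixed (solved via the 2x2 normal equations)."""
    g1 = [gaussian(x, center1, sigma1) for x in xs]
    g2 = [gaussian(x, center2, sigma2) for x in xs]
    a11 = sum(u * u for u in g1)
    a12 = sum(u * v for u, v in zip(g1, g2))
    a22 = sum(v * v for v in g2)
    b1 = sum(u * y for u, y in zip(g1, ys))
    b2 = sum(v * y for v, y in zip(g2, ys))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("the two lineshapes are indistinguishable")
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical 15N slice: a strong apo peak with a weak overlapping shoulder.
xs = [119.6 + 0.01 * i for i in range(81)]
apo = (120.00, 0.06)            # (center in ppm, sigma in ppm), invented
intermediate = (120.10, 0.06)   # invented
ys = [1.0 * gaussian(x, *apo) + 0.2 * gaussian(x, *intermediate) for x in xs]

h_apo, h_int = fit_two_amplitudes(xs, ys, *apo, *intermediate)
print(f"apo height = {h_apo:.3f}, intermediate height = {h_int:.3f}")
```

Because the two lineshapes are fitted simultaneously, the weak component is not inflated by the tail of the strong one, which is the failure mode of simple integration described in panel (C).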

In the titration experiment, as in the NZ exchange experiment described above, it is useful to exploit relationships between peaks to reduce the number of free parameters to be determined during modeling. Treating each peak independently, for each residue there are 4 × 5 × n independent parameters (where n is the number of titration points): four peaks, each described at every titration point by five parameters (a position and a linewidth in each of the two dimensions, plus an intensity). However, if the system behaves in the limit of slow exchange, then peak position and linewidth do not change throughout the titration, so that the number of independent parameters is 4 × (4 + n). Thus for a typical titration of 8 data points, the number of free parameters can be reduced from 160 to 48. This is an example of the well-known tradeoff between the complexity of a model and how accurately its parameters can be determined: assuming constant peak positions and widths allows more accurate determination of minor populations, because the positions of weak peaks are constrained by the corresponding peaks at other points along the titration curve where the population is higher. Alternatively, treating the peaks independently allows for the possibility that the system does not behave strictly as a system in slow exchange (i.e., peak position and linewidth can vary with titration). In the case of LC8 titrations, we find the simpler model the more appropriate tradeoff given the degree of overlap in the spectra.
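The parameter counts quoted above can be checked with a few lines of arithmetic. In the sketch below the function name is hypothetical, and the split of each peak into four shape parameters plus one intensity is an interpretation of the counts given in the text (position and linewidth in each of two dimensions, plus intensity).

```python
def free_parameters(n_points, n_peaks=4, shared_shape=True):
    """Number of fitted parameters for one residue across a titration."""
    shape = 4  # position and linewidth in each of the two dimensions
    if shared_shape:
        # Slow-exchange limit: shape fitted once per peak,
        # plus one intensity per peak per titration point.
        return n_peaks * (shape + n_points)
    # Independent peaks: shape and intensity fitted at every titration point.
    return n_peaks * (shape + 1) * n_points

print(free_parameters(8, shared_shape=False))  # 160
print(free_parameters(8, shared_shape=True))   # 48
```

The saving grows with the number of titration points, since only the intensities scale with n in the constrained model.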

6. Summary

Our understanding of the role of LC8 in the assembly of dynein and other complexes has improved greatly in recent years, and much of this understanding has come from NMR. Contributions from NMR have been important in elucidating: the role of electrostatics at the dimer interface in LC8 monomer-dimer equilibrium; the coupling of ligand binding and dimerization; the coupling of folding and binding in LC8 and its partners; and allostery in LC8 binding.

NMR has several unique advantages as a tool for studying protein complex assembly. Often, NMR is complementary to other methods: for example, a nascent helix in IC74 was predicted from sequence analysis and CD spectroscopy, and then observed by NMR. Crystallography suggested a mechanism for allostery in LC8 binding, and NMR provided evidence for it. More importantly, NMR can be used to observe processes that are difficult to observe by other methods, either because they involve only small changes in energy (such as the weak dimerization of LC8S88E) or because they involve only minor populations (such as the residual disorder in apo-LC8 or the nascent order in IC74). NMR can provide a per-residue view of a process understood in coarser detail by other methods: for example, sedimentation analysis suggested a connection between LC8 dimerization and titration of a histidine residue, and NMR identified His55 as the single histidine of the three in LC8 to behave this way. Sequence comparison and limited proteolysis roughly mapped a binding site for LC8 on IC74, and NMR then delineated the binding site precisely.

Several specific strategies have contributed to the success of NMR as a tool for studying assembly in the dynein complex. Despite advances in NMR analysis of large systems, it is still desirable to work with the smallest system possible. Small constructs often retain the essential features of the larger system from which they are derived; in IC74, a successful strategy was to study the binding domain and the nascent helix domain individually. Differential labeling has been useful for simplifying the analysis of complexes: many of the experiments described in this review involve forming a complex where only one component is labeled with NMR-active nuclei.

Finally, careful attention to data reduction techniques is crucial when studying exchanging populations where weak signals are often the most interesting. Modeling is the most general technique and the best for leveraging the natural relationships between NMR signals. Even complicated spectra can be modeled quite quickly by modern computers, so we find it worthwhile in most quantitative NMR studies. In conclusion, NMR can be the key to a more rigorous, residue-specific, thermodynamic characterization of a biomolecular system, possibly uncovering phenomena such as weak associations or allosteric interactions that have been missed by other methods.

REFERENCES

Andrec, M., and Prestegard, J. (1998). A Metropolis Monte Carlo implementation of Bayesian time-domain parameter estimation: Application to coupling constant estimation from antiphase multiplets. J. Magn. Reson. 130, 217–232.
Bain, A. (2003). Chemical exchange in NMR. Prog. Nucl. Magn. Reson. Spectrosc. 43, 63–103.
Bain, A. (2008). Chemical exchange. Annual Reports on NMR Spectroscopy 63, 23–48.
Barbar, E. (2008). Dynein light chain LC8 is a dimerization hub essential in diverse protein networks. Biochemistry 47, 503–508.
Barbar, E., Kleinman, B., Imhoff, D., Li, M., Hays, T., and Hare, M. (2001). Dimerization and folding of LC8, a highly conserved light chain of cytoplasmic dynein. Biochemistry 40, 1596–1605.
Bax, A., Kontaxis, G., and Tjandra, N. (2001). Dipolar couplings in macromolecular structure determination. Methods Enzymol. 339, 127–174.
Benison, G., Berkholz, D., and Barbar, E. (2007a). Protein assignments without peak lists using higher-order spectra. J. Magn. Reson. 189, 173–181.
Benison, G., Karplus, P., and Barbar, E. (2007b). Structure and dynamics of LC8 complexes with KXTQT-motif peptides: Swallow and dynein intermediate chain compete for a common site. J. Mol. Biol. 371, 457–468.
Benison, G., Nyarko, A., and Barbar, E. (2006). Heteronuclear NMR identifies a nascent helix in intrinsically disordered dynein intermediate chain: Implications for folding and dimerization. J. Mol. Biol. 362, 1082–1093.
Benison, G., Karplus, P. A., and Barbar, E. (2008). The interplay of quaternary structure and ligand binding in the diverse interactions of dynein light chain LC8. J. Mol. Biol. 384, 954–966.
Changeux, J., and Edelstein, S. (2005). Allosteric mechanisms of signal transduction. Science 308, 1424–1428.
Denk, W., Baumann, R., and Wagner, G. (1986). Quantitative evaluation of cross peak intensities by projection of two-dimensional NOE spectra on a linear space spanned by a set of reference resonance lines. J. Magn. Reson. 67, 386–390.
Dyson, H., and Wright, P. (2005a). Elucidation of the protein folding landscape by NMR. Methods Enzymol. 394, 299–321.
Dyson, H., and Wright, P. (2005b). Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 6, 197–208.
Fan, J., Zhang, Q., Tochio, H., and Zhang, M. (2002). Backbone dynamics of the 8 kDa dynein light chain dimer reveals molecular basis of the protein’s functional diversity. J. Biomol. NMR 23, 103–114.

Farrow, N. A., Zhang, O. W., Forman-Kay, J. D., and Kay, L. E. (1994). A heteronuclear correlation experiment for simultaneous determination of 15N longitudinal decay and chemical-exchange rates of systems in slow equilibrium. J. Biomol. NMR 4, 727–734.
Güntert, P. (1998). Structure calculation of biological macromolecules from NMR data. Q. Rev. Biophys. 31, 145–237.
Hall, J., Hall, A., Pursifull, N., and Barbar, E. (2008). Differences in dynamic structure of LC8 monomer, dimer, and dimer-peptide complexes. Biochemistry 47, 11940–11952.
Leslie, A. (2006). The integration of macromolecular diffraction data. Acta Crystallogr. D Biol. Crystallogr. 62, 48–57.
Liang, J., Jaffrey, S., Guo, W., Snyder, S., and Clardy, J. (1999). Structure of the PIN/LC8 dimer with a bound peptide. Nat. Struct. Biol. 6, 735–740.
Lo, K., Naisbitt, S., Fan, J., Sheng, M., and Zhang, M. (2001). The 8-kDa dynein light chain binds to its targets via a conserved (K/R)XTQT motif. J. Biol. Chem. 276, 14059–14066.
Makokha, M., Hare, M., Li, M., Hays, T., and Barbar, E. (2002). Interactions of cytoplasmic dynein light chains Tctex-1 and LC8 with the intermediate chain IC74. Biochemistry 41, 4302–4311.
Makokha, M., Huang, Y., Montelione, G., Edison, A., and Barbar, E. (2004). The solution structure of the pH-induced monomer of dynein light-chain LC8 from Drosophila. Protein Sci. 13, 727–734.
Malmodin, D., and Billeter, M. (2005). High-throughput analysis of protein NMR spectra. Prog. Nucl. Magn. Reson. Spectrosc. 46, 109–129.
Nurminsky, D., Nurminskaya, M., Benevolenskaya, E., Shevelyov, Y., Hartl, D., and Gvozdev, V. (1998). Cytoplasmic dynein intermediate-chain isoforms with different targeting properties created by tissue-specific alternative splicing. Mol. Cell Biol. 18, 6816–6825.
Nyarko, A., Cochrun, L., Norwood, S., Pursifull, N., Voth, A., and Barbar, E. (2005). Ionization of His 55 at the dimer interface of dynein light-chain LC8 is coupled to dimer dissociation. Biochemistry 44, 14248–14255.
Nyarko, A., Hare, M., Hays, T., and Barbar, E. (2004). The intermediate chain of cytoplasmic dynein is partially disordered and gains structure upon binding to light-chain LC8. Biochemistry 43, 15595–15603.
Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.
Palmer, A., Kroenke, C., and Loria, J. (2001). Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules. Methods Enzymol. 339, 204–238.
Puthalakath, H., Huang, D., O’Reilly, L., King, S., and Strasser, A. (1999). The proapoptotic activity of the Bcl-2 family member Bim is regulated by interaction with the dynein motor complex. Mol. Cell 3, 287–296.
Rischel, C. (1995). Fundamentals of peak integration. J. Magn. Reson. A 116, 255–258.
Song, C., Wen, W., Rayala, S., Chen, M., Ma, J., Zhang, M., and Kumar, R. (2008). Serine 88 phosphorylation of the 8-kDa dynein light chain 1 is a molecular switch for its dimerization status and functions. J. Biol. Chem. 283, 4004–4013.
Song, Y., Benison, G., Nyarko, A., Hays, T., and Barbar, E. (2007). Potential role for phosphorylation in differential regulation of the assembly of dynein light chains. J. Biol. Chem. 282, 17272–17279.
Stevens, S., Sanker, S., Kent, C., and Zuiderweg, E. (2001). Delineation of the allosteric mechanism of a cytidylyltransferase exhibiting negative cooperativity. Nat. Struct. Biol. 8, 947–952.
Vadlamudi, R., Bagheri-Yarmand, R., Yang, Z., Balasenthil, S., Nguyen, D., Sahin, A., den Hollander, P., and Kumar, R. (2004). Dynein light chain 1, a p21-activated kinase 1-interacting substrate, promotes cancerous phenotypes. Cancer Cell 5, 575–585.

Wang, L., Hare, M., Hays, T., and Barbar, E. (2004). Dynein light chain LC8 promotes assembly of the coiled-coil domain of swallow protein. Biochemistry 43, 4611–4620.
Wang, W., Lo, K., Kan, H., Fan, J., and Zhang, M. (2003). Structure of the monomeric 8-kDa dynein light chain and mechanism of the domain-swapped dimer assembly. J. Biol. Chem. 278, 41491–41499.

CHAPTER TEN

Characterization of Parvalbumin and Polcalcin Divalent Ion Binding by Isothermal Titration Calorimetry

Michael T. Henzl*

Contents
1. Introduction
2. Practical Aspects of Data Collection
2.1. Buffer selection
2.2. Standardization of metal ion and chelator solutions
2.3. Removal of metal ions from buffers and protein solutions
2.4. Preparation of EDTA-agarose
2.5. Removal of Ca2+ from protein samples
2.6. Binding parameters for competing chelators
2.7. ITC data collection
2.8. Data set preparation
2.9. Preparation of the parameter file
2.10. General comments on ITC model development
2.11. Binding in the presence of a competing chelator
2.12. Binding in the presence of a competing metal ion
2.13. Least-squares minimization
2.14. Error analysis
3. Illustrative Global ITC Analyses of Divalent Ion Binding
3.1. The independent two-site model
3.2. Competing chelator
3.3. Competing metal ion
3.4. Analysis of the divalent ion binding by the S55D/E59D variant of rat α-parvalbumin
3.5. Analysis of positively cooperative divalent ion binding
3.6. Modeling divalent ion binding by Phl p 7
4. Conclusion
Acknowledgment
References

* Department of Biochemistry, University of Missouri, Columbia, Missouri, USA

Methods in Enzymology, Volume 455 ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04210-9
© 2009 Elsevier Inc. All rights reserved.

Abstract

The elucidation of structure-affinity relationships in EF-hand proteins requires a reliable assay of divalent ion affinity. In principle, isothermal titration calorimetry (ITC) should be capable of furnishing estimates for Ca2+- and Mg2+-binding constants in these systems. And because the method yields the binding enthalpy directly, ITC can provide a more detailed view of binding energetics than methods that rely on 45Ca2+ or fluorescent indicators. For several reasons, however, it is generally not possible to extract reliable binding parameters from single ITC experiments. Ca2+ affinity is often too high, and Mg2+ affinity is invariably too low. Moreover, least-squares minimization of multisite systems may not afford a unique fit because of strong parameter correlations. This chapter outlines a strategy for analyzing two-site systems that overcomes these obstacles. The method, which involves simultaneous, or global, least-squares analysis of direct and competitive ITC data, yields binding parameters for both Ca2+ and Mg2+. Application of the method is demonstrated for two systems. The S55D/E59D variant of rat α-parvalbumin, noteworthy for its elevated metal ion affinity, binds divalent ions noncooperatively and is amenable to analysis using an independent two-site model. On the other hand, Phl p 7, a pollen-specific EF-hand protein from timothy grass, binds Ca2+ with positive cooperativity. Divalent ion-binding data for the protein must be analyzed using a two-site Adair model.

1. Introduction

The ability to selectively replace amino acid residues within a protein provides a powerful tool for investigating its function. Effective incorporation of site-specific mutagenesis into a structure-function study, however, requires an effective assay for altered function. For an enzyme, perturbations can be readily diagnosed with a standard activity assay, and the nature of the perturbation can often be interpreted from its relative impact on kcat and/or kcat/Km. However, the biological activity of many physiologically significant proteins is restricted to noncovalent interaction with one or more target ligands. For those systems, it is important to have an accurate, convenient assay for altered ligand affinity.

Isothermal titration calorimetry (ITC) offers a particularly attractive method for studying protein-ligand reactions (Freyer and Lewis, 2008; Lewis and Murphy, 2005; Leavitt and Freire, 2001; Ladbury and Chowdhry, 1996). In an ITC experiment, automated additions of the ligand are made at regular intervals to a solution of the protein at constant temperature. The modern commercial instruments used to study protein-ligand interactions are so-called power-compensation calorimeters. They maintain a constant, minuscule temperature difference between the sample cell and a reference cell filled with buffer (or water). The magnitude of the heat flux that accompanies an injection of titrant is estimated from the change in electrical power required to maintain the temperature difference.

Because the vast majority of protein-ligand interactions are accompanied by detectable absorption or evolution of heat, ITC is a nearly universal method for monitoring binding reactions. ITC is the only technique that yields binding enthalpy directly. Because the enthalpy is measured as a function of the ligand concentration, the titration can also provide estimates for the overall binding free energy and stoichiometry. Given the free energy change and the reaction enthalpy, the entropic contribution can be calculated by difference. Protonation events can be readily diagnosed from the observation of a buffer-dependent reaction enthalpy. Moreover, the apparent ΔCp for the interaction, which is obtained from the temperature dependence of the reaction enthalpy, can furnish insight into the molecular details of the protein-ligand interaction.

For a number of years, this laboratory has sought to understand the physical and structural basis for variations in divalent ion affinity in EF-hand proteins (Celio, Pauls, and Schwaller, 1996; Kawasaki and Kretsinger, 1995; Kretsinger, 1980; Strynadka and James, 1989), using the parvalbumin molecule as a model. Parvalbumins are small (Mr 12,000), vertebrate-specific proteins that contain two EF-hand binding motifs (Heizmann and Kagi, 1989; Kretsinger and Nockolds, 1973; Pauls, Cox, and Berchtold, 1996). Although they are generally viewed as interchangeable Ca2+ buffer proteins, in fact there are significant differences in divalent ion-binding properties among the various parvalbumin isoforms. We have attempted to exploit these differences to further our understanding of determinants of divalent ion affinity. We have also conducted numerous site-specific mutagenesis studies. The ability to measure divalent ion affinity readily has been crucial to this project.

The lab purchased a microtitration calorimeter from Hart Scientific in 1993. However, we continued to rely on 45Ca2+ flow-dialysis to evaluate Ca2+ affinity until 2001. The calorimeter was used exclusively for measuring binding enthalpies, employing the binding constants determined by corresponding flow-dialysis measurements in least-squares modeling. Mg2+ affinity was largely ignored, although occasionally an attempt was made to extract Mg2+ affinity by competition in flow-dialysis assays. This approach seems archaic in retrospect. However, it was simply not possible to extract reliable estimates for the parvalbumin divalent ion-binding constants from single ITC experiments or to accurately assess small changes in these values. The Ca2+ affinities in these systems approach or exceed the upper limit for accurate estimation by ITC. By contrast, the Mg2+ affinities are generally too low to determine unambiguously. For both ions, the least-squares treatment often fails to find a unique solution, because of correlations between the binding parameters for the two binding sites.

In recent years, we have employed an ITC-based strategy that surmounts the obstacles. The method involves simultaneous, or global, nonlinear least-squares modeling of direct and competitive ITC data. The decision to adopt this approach was, in large part, motivated by the success of global NLLS methods for modeling sedimentation equilibrium (Johnson et al., 1981), sedimentation velocity (Philo, 1997; Schuck, 2000), and time-resolved fluorescence (Beechem, 1992) data. Analysis of single-site systems by ITC is generally straightforward, in the absence of confounding ancillary issues (e.g., limited protein availability or solubility, high ligand affinities). Although the complexity of the problem increases exponentially with the number of binding sites, we herein demonstrate that the treatment of two-site systems remains quite tractable. We currently employ a MicroCal VP-ITC for our calorimetric analyses. However, the first application of the global ITC approach, to measure the divalent ion-binding parameters for wild-type rat β-parvalbumin, was actually performed with data collected on a Hart 4209 microtitration calorimeter. Although the 4200 series has been vastly improved in the interim, our 1993-vintage instrument had a signal-to-noise ratio fully an order of magnitude lower than that of the VP-ITC. That the analyses with data from the two calorimeters yielded comparable parameter values is testimony to the robustness of the global fitting strategy.

The motivation for writing this chapter was to provide a reasonably detailed road map for implementing the global ITC analysis of calcium-binding proteins harboring two binding sites. The chapter begins with a discussion of some practical issues related to data acquisition. We then outline the general approach for modeling ITC data, including descriptions of the in-house-generated software that we employ for our analyses. The chapter concludes with two applications of the method. In the first, we describe the analysis of the S55D/E59D variant of rat α-parvalbumin, noteworthy for its extremely high affinity for Ca2+ and Mg2+. The binding of divalent ions in this case is macroscopically noncooperative, permitting the use of an independent two-site model. In the second example, we analyze divalent ion binding by Phl p 7, a member of the polcalcin family. Ca2+ binding in this system is positively cooperative, necessitating the application of a general two-site model.
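The statement in this Introduction that the entropic contribution follows by difference from the free energy and the enthalpy can be illustrated with the standard relations ΔG = -RT ln K and -TΔS = ΔG - ΔH. The sketch below is only an illustration: the binding constant, enthalpy, and temperature are hypothetical inputs, not values from this chapter, and the function name is invented.

```python
import math

R_KCAL = 1.98720425864e-3  # gas constant in kcal/(mol K)

def binding_thermodynamics(k_assoc, delta_h, temp_kelvin=298.15):
    """Return (dG, -T*dS) in kcal/mol from an association constant (M^-1)
    and a measured binding enthalpy (kcal/mol)."""
    delta_g = -R_KCAL * temp_kelvin * math.log(k_assoc)
    minus_t_delta_s = delta_g - delta_h  # entropic term, by difference
    return delta_g, minus_t_delta_s

# Hypothetical example: K = 1.0e8 M^-1, dH = -4.0 kcal/mol, 25 degrees C.
dg, mtds = binding_thermodynamics(1.0e8, -4.0)
print(f"dG = {dg:.2f} kcal/mol, -T*dS = {mtds:.2f} kcal/mol")
```

The temperature default of 298.15 K is an assumption of the sketch; the appropriate value is the temperature at which the titration was run.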

2. Practical Aspects of Data Collection

To obtain estimates of the binding enthalpies and binding constants for both Ca2+ and Mg2+, aliquots of the protein of interest are subjected to a battery of titrations. Typically, Ca2+ titrations are performed at two or more protein concentrations. Mg2+ titrations are likewise performed at one or more protein concentrations. The protein is titrated with Ca2+ at several fixed levels of Mg2+. It is also titrated with Ca2+ in the presence of competitive chelators, typically EDTA, EGTA, and NTA. Finally, the protein is titrated with Mg2+ in the presence of EDTA.

2.1. Buffer selection

The analysis buffer should have low affinity for Ca2+ and Mg2+. Phosphate and bicarbonate are unsuitable. Although di- and tricarboxylic acids (e.g., malonate and citrate, respectively) could potentially be used as buffers between pH 5 and 6, they have the potential to chelate divalent ions and should be avoided. At pH 7.4, the pH at which we conduct our analyses, Hepes is an effective buffer. Other buffers in this series (e.g., Mes, Pipes, Mops) would be logical candidates at other pH values.

Protonation phenomena associated with binding can be diagnosed by titration of the protein with Ca2+ in two or more buffers differing in ionization enthalpy. A buffer-dependent binding enthalpy signals (de)protonation. If the apparent binding enthalpy is plotted against the buffer ionization enthalpy, the slope equals the number of protons involved at the pH of the analysis. Ionization enthalpies have been tabulated for many common buffer systems (Fukada and Takahashi, 1998; Goldberg, Kishore, and Lennen, 2002).

The composition of titrant and sample buffers should be as closely matched as possible, to minimize artifactual mixing heats. Unfortunately, when dealing with low-molecular-weight ligands, dialysis of the protein and ligand against a common pool of buffer is not an option. Alternatively, the protein could be dialyzed to equilibrium against a pool of buffer, which is subsequently used to prepare the titrant solutions. However, this approach requires time-consuming standardization of titrant solutions for every protein preparation. In our experience, with exercise of due care in their preparation, the variation between individual buffer solutions is acceptably low.
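The proton-linkage analysis described in this section amounts to a straight-line fit of apparent binding enthalpy against buffer ionization enthalpy. The sketch below shows one way such a fit might be set up. The three data points are invented for illustration and are not taken from this chapter or from the cited tabulations; the names are hypothetical, and the sign of the slope (proton uptake versus release) depends on the convention adopted.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented values (kcal/mol) for three buffers; not data from this chapter.
buffer_ionization_enthalpy = [1.0, 5.0, 11.0]
apparent_binding_enthalpy = [-3.5, -1.5, 1.5]

protons, buffer_independent_enthalpy = linear_fit(
    buffer_ionization_enthalpy, apparent_binding_enthalpy
)
print(f"protons involved (slope) = {protons:.2f}")
print(f"enthalpy at zero buffer ionization enthalpy = "
      f"{buffer_independent_enthalpy:.2f} kcal/mol")
```

A slope indistinguishable from zero would indicate that binding is not coupled to protonation at the pH of the analysis.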

2.2. Standardization of metal ion and chelator solutions

Careful consideration should be given to the preparation of reagents. In this lab, 1.0 M solutions of analytical reagent-grade CaCl2·2H2O and MgCl2·2H2O are prepared gravimetrically in the analysis buffer. Preparing the concentrated stock solutions in the analysis buffer helps to ensure that heat-of-mixing artifacts arising from adding the diluted titrant to the protein solution will be minimal. Aliquots of the stock Mg2+ solution are diluted in analysis buffer to obtain 100 mM and 2.0 mM solutions. Then 20 mM and 1.0 mM solutions of Ca2+ are similarly prepared. We store aliquots of the stock solutions and dilutions in tightly sealed polypropylene containers at 20 °C.

We treat analytical-grade Na2EDTA·2H2O as a primary standard. A 100.0 mM stock solution is prepared in the analysis buffer and then appropriately diluted, with the analysis buffer, to obtain 10.0 mM and 0.15 mM solutions. The latter is titrated with the 1.0 mM Ca2+ and 2.0 mM Mg2+ solutions to obtain more precise estimates of the respective metal ion concentrations. Stock solutions (100 mM) of EGTA and NTA are likewise prepared and diluted with buffer to obtain 10.0 mM solutions, which are used in the sample preparation. In addition, 0.15 mM EGTA and 1.0 mM NTA solutions are prepared. The former is standardized by titration with 1.0 mM Ca2+, the latter by titration with 20 mM Ca2+.
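The dilution and standardization arithmetic in this section can be sketched as follows. This is a simplified illustration, not the laboratory's procedure: it assumes 1:1 metal:EDTA stoichiometry at the endpoint, neglects any displaced-volume correction in the calorimeter cell, and uses invented volumes throughout. The function names are hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume, final_volume):
    """C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock_conc * stock_volume / final_volume

def titrant_concentration(standard_conc, standard_volume, equivalence_volume):
    """Titrant concentration from a 1:1 endpoint against a primary standard."""
    return standard_conc * standard_volume / equivalence_volume

# 100.0 mM EDTA stock -> 10.0 mM -> 0.15 mM (volumes in mL, hypothetical).
edta_10 = diluted_concentration(100.0, 1.0, 10.0)
edta_015 = diluted_concentration(edta_10, 0.15, 10.0)

# Hypothetical endpoint: 1.40 mL of 0.15 mM EDTA consumed 0.216 mL of the
# nominally 1.0 mM Ca2+ titrant.
ca_actual = titrant_concentration(edta_015, 1.40, 0.216)
print(f"EDTA working solution: {edta_015:.3f} mM")
print(f"standardized Ca2+ titrant: {ca_actual:.3f} mM")
```

In this invented case the titrant would be recorded as about 0.97 mM rather than its nominal 1.0 mM, and the corrected value would be carried into subsequent analyses.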

2.3. Removal of metal ions from buffers and protein solutions

For the majority of cases, metal ions can be sequestered from protein preparations by passage over EDTA-derivatized agarose. The chelating matrix is prepared by carbodiimide-mediated coupling of EDTA to aminohexyl agarose.

2.4. Preparation of EDTA-agarose

Aminohexyl agarose is prepared by activation with 1,4-butanediol diglycidyl ether (Pepper, 1998), followed by reaction with 1,6-diaminohexane at pH 13. For example, 100 mL (moist cake) of Sepharose 4B is placed in a 500-mL polypropylene reaction bottle. In a fume hood, 100 mL of 0.6 N NaOH and 100 mL of 1,4-butanediol diglycidyl ether are added, and the suspension is incubated overnight at 25 °C with vigorous agitation (required to keep the minimally soluble ether in suspension). The suspension is filtered through a sintered-glass funnel and washed extensively with hot tap water (e.g., 10 vols) until the smell of the 1,4-butanediol diglycidyl ether is faint. The resulting epoxy-activated agarose is ready for modification with 1,6-diaminohexane.

Then, 1.0 g of 1,6-diaminohexane is dissolved in 100 mL of 1 M sodium carbonate, and the pH is adjusted to 13 with concentrated NaOH. This solution is added to the epoxy-activated gel cake, and the pH is readjusted to 13, if necessary. The resulting suspension is incubated for 24–48 h with vigorous agitation at 25 °C. At the end of the incubation, excess reagents are removed by filtration, and the gel is washed extensively with water.

After removing excess water, the aminohexyl agarose is resuspended in 250 mL of 0.5 M EDTA, pH 6.0. Then 5.0 g of EDAC are added to the gel suspension, which is then incubated at room temperature with constant agitation. After 4 h, the pH is measured and readjusted to 6.0 if necessary. A second 5.0-g aliquot of EDAC is added, and incubation is continued for another 4 h. The resulting material is washed extensively with water and then 10 mM Ca2+ to remove excess EDTA.

The binding capacity is determined by saturating an aliquot of the gel with Ca2+, rinsing off the excess Ca2+, eluting the bound Ca2+ with dilute HCl (pH 1), and measuring the Ca2+ content of the eluate by flame atomic absorption spectrometry. One mL of EDTA-agarose thus prepared binds 20–30 µmol Ca2+. The EDTA-derivatized matrix retains a net positive charge due to the presence of unreacted amino groups, and it can act as an anion exchanger at low ionic strength. To prevent nonspecific electrostatic interactions between the acidic EF-hand proteins and the EDTA-agarose, a solvent ion concentration of at least 0.15 M is recommended.

2.5. Removal of Ca2+ from protein samples

Prior to loading the protein, residual divalent ions are stripped from the column with four bed volumes of dilute HCl (pH 2). The HCl in the head space is then removed and replaced with the analysis buffer, and the column is eluted with the analysis buffer until the pH of the eluate matches the buffer pH. The protein (2.5 μmol, at a concentration of 80 μM) is loaded onto the column at 0.5 mL/min. After the protein has loaded, elution is continued at the same flow rate with buffer. Fractions of the eluate exhibiting significant UV absorbance are combined, and the resulting solution is assayed for residual Ca2+ by flame atomic-absorption spectrometry. Although the protein is diluted approximately 50% by passage over the column, recovery exceeds 90%. The residual Ca2+ concentration is typically less than 0.5 μM. Thus, if the protein concentration is 25 μM, removal is 99% complete. Using this protocol, we have successfully removed the Ca2+ from rat α-parvalbumin, which exhibits an average Ca2+-binding constant of 1.2 × 10⁸ M⁻¹ in Hepes-buffered saline.

The key to quantitative removal of divalent cations is to avoid overloading the column. Because a carboxylate is sacrificed in the coupling reaction, immobilized EDTA exhibits fairly modest affinity for Ca2+. The titration of an aliquot of EDTA-agarose with 20 mM Ca2+, in Hepes-buffered saline at pH 7.4, is displayed in Fig. 10.1. Least-squares analysis of the integrated data indicates the existence of two populations of binding sites. The major population binds with an average enthalpy of −3.8 kcal/mol and an average binding constant of just 2.7 (±0.4) × 10⁵ M⁻¹. A minor population (<5%) binds with marginally higher affinity, 7.7 (±1.0) × 10⁵ M⁻¹, and greater exothermicity, ΔH = −9.5 kcal/mol. The latter may represent adjacent chelating groups that are capable of functioning cooperatively.
Clearly, the ability of the EDTA-agarose matrix to extract Ca2+ from proteins with binding constants exceeding 10⁸ M⁻¹ is strongly dependent on mass action. Assuming 25 μmol of chelator per mL of gel, a 100-mL column would have a total capacity of 2.5 mmol. We typically load 2.5 μmol of protein, which corresponds to 5 μmol of divalent ion-binding sites. Thus, the ratio of chelator sites to protein sites is on the order of 500:1.
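The mass-action argument above reduces to a short calculation. The sketch below simply restates it, reading the gel capacity and the protein load in micromoles (the reading that reproduces the quoted 500:1 ratio) and assuming two divalent ion-binding sites per protein molecule; the variable names are illustrative only.

```python
# Quantities quoted in the text; names are illustrative.
chelator_umol_per_ml = 25.0   # immobilized EDTA per mL of gel (umol)
bed_volume_ml = 100.0         # column bed volume (mL)
protein_umol = 2.5            # protein loaded (umol)
sites_per_protein = 2         # assumed: two EF-hand sites per molecule

chelator_umol = chelator_umol_per_ml * bed_volume_ml    # 2500 umol = 2.5 mmol
protein_sites_umol = protein_umol * sites_per_protein   # 5 umol
ratio = chelator_umol / protein_sites_umol

print(f"chelator sites : protein sites = {ratio:.0f}:1")
```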


Michael T. Henzl

[Figure 10.1: panel A, power (μcal/s) versus time (min); panel B, kcal/mole of injectant versus molar ratio.]

Figure 10.1 Titration of an aliquot of EDTA-derivatized agarose with Ca2+. The raw and integrated ITC data are displayed in panels A and B, respectively.

To date, we have encountered two proteins for which the EDTA-derivatized agarose was unable to quantitatively remove the bound Ca2+. Because K+ does not compete with Ca2+ for the EF-hand sites in rat α-parvalbumin, the protein exhibits substantially higher affinity for Ca2+ when K+, rather than Na+, is the major solvent cation. The average Ca2+-binding constant is 1.6 × 10⁹ M⁻¹ in 0.15 M KCl, 0.025 M Hepes, pH 7.4. After passage over EDTA-agarose in this buffer, the protein retains roughly 0.5 equivalents of Ca2+. In this case, the Ca2+ was removed in the presence of 0.15 M NaCl, 0.025 M Hepes, pH 7.4. The Na+ was subsequently replaced by K+ by repeated cycles of concentration and dilution, in an ultrafiltration cell, with (Ca2+-free) K+-containing buffer.

The S55D/E59D variant of rat α-parvalbumin was the second protein that resisted Ca2+ removal with EDTA-agarose. As discussed later in the chapter, this protein exhibits extremely high affinity for Ca2+. The binding constant for Ca2+ at the engineered CD site exceeds 10⁹ M⁻¹ in Hepes-buffered saline and 10¹⁰ M⁻¹ when K+ replaces Na+ as the major solvent cation. To remove residual Ca2+ in these cases, it was necessary to dialyze the protein (20 mL, at 1.5 mg/mL) against a large excess of EDTA (1 L of NaCl/Hepes, containing 0.01 M EDTA), then dialyze away the bulk of the EDTA (against 1 L of buffer containing 0.001 M EDTA). EDTA removal and solvent cation exchange, if required, were then completed by repeated cycles of concentration and dilution with the appropriate Ca2+-free buffer in an ultrafiltration cell.

2.6. Binding parameters for competing chelators

Because the divalent ion-binding parameters extracted from titrations conducted in the presence of a competing chelator are dependent on the corresponding values for the chelator, accurate knowledge of the chelator binding behavior is important. The three competitive chelators that we routinely employ are EDTA, EGTA, and NTA. The relevant binding constants and enthalpies that we observe in 0.15 M NaCl, 0.025 M Hepes, pH 7.4, are listed in Table 10.2.

The Ca2+-binding parameters for NTA were obtained by titrating 1.0 mM, 0.5 mM, and 0.25 mM aliquots of the chelator with 20 mM or 5 mM Ca2+. The resulting data were analyzed globally with a single-site model to extract the binding constant and binding enthalpy. The Ca2+-binding parameters for EGTA were determined by titrating EGTA with Ca2+ in the presence of 2.0 mM NTA. The equations used to treat these data are described subsequently. To determine the binding parameters associated with EDTA, the chelator was first titrated with Mg2+. The binding constant and enthalpy for Mg2+, 5.5 × 10⁵ M⁻¹ and 4.2 kcal/mol, respectively, can be readily estimated from direct titrations with Mg2+. The corresponding values for Ca2+ were then obtained by titrating samples of EDTA with Ca2+ and with Ca2+ in the presence of Mg2+. The equations used to treat these data are likewise described subsequently.

Clearly, the competitive chelators employed in the experiments should not bind to the protein. Although we have not encountered this problem to date in our analyses, interactions between EF-hand proteins and Ca2+ indicators have been reported. For example, Haiech et al. reported that EGTA binds to hake parvalbumin with an apparent dissociation constant of 35 mM at pH 7.55 (Haiech et al., 1979). Moreover, whether such interactions occur may be strongly influenced by solution conditions.
To determine whether protein-chelator interactions are occurring, the protein should be titrated with a solution of the chelator. Ideally, the observed heat fluxes should be identical to those accompanying addition of chelator to buffer alone. In practice, there may be small differences because the frictional heat generated by injecting the titrant into the stirred sample cell will differ slightly in the presence of the protein. However, the difference should be small, and the heat effects should be constant.


2.7. ITC data collection

We routinely employ a set of 10 titrations, listed in Table 10.1, for the evaluation of parvalbumin/polcalcin Ca2+- and Mg2+-binding parameters. The significance of the experiment number is described subsequently. Protein concentrations between 50 and 60 μM are optimal. If the concentrations in the experiments numbered 11 and 13 significantly exceed 60 μM, the amount of titrant in the buret will not be sufficient to reach the endpoint for the reaction. When the protein concentration is significantly less than 50 μM, the endpoint is reached early in the titration, with a resultant decrease in the informational content of the experiment.

Because overfilling the VP-ITC sample cell requires approximately 1.8 mL of solution, we typically prepare 2.0-mL samples in 5.0-mL plastic vials. To prepare sample 7, for example, 20 μL of a 100 mM Mg2+ solution is combined with 1.98 mL of (60 μM) protein solution. Sample 9 is prepared similarly by combining 20 μL of 1.0 M Mg2+ and 1.98 mL of protein. We find it convenient to prepare the entire sample series at the same time. The samples are then rapidly frozen in liquid nitrogen and stored at −20 °C prior to use.
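The sample recipes above are simple dilutions, and a quick check confirms that they give the Mg2+ levels listed for experiments 7 and 9. The helper below is an illustrative sketch (the function name is not from the chapter); it reads the Mg2+ additions as 20 μL and the protein stock as 60 μM.

```python
def mix(stock_conc, stock_vol, diluent_vol):
    """Concentration after combining stock_vol of stock with diluent_vol of diluent."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)

# Volumes in mL; Mg2+ in mM; protein in uM.
mg_sample7 = mix(100.0, 0.020, 1.98)    # 100 mM Mg2+ stock -> about 1.0 mM
mg_sample9 = mix(1000.0, 0.020, 1.98)   # 1.0 M Mg2+ stock  -> about 10 mM
protein = mix(60.0, 1.98, 0.020)        # 60 uM protein     -> about 59.4 uM

print(mg_sample7, mg_sample9, protein)
```

The 1% dilution of the protein by the Mg2+ addition is small, but it is of the same order as the concentration corrections discussed later for the least-squares analysis.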

Table 10.1 Titrations used to analyze parvalbumin and polcalcin divalent ion binding

Exp. no.   Protein concentration   Competing ion or chelator   Titrant       Injection volume   Number of injections
1          60 μM                   none                        1.0 mM Ca2+   7 μL               42
2          30 μM                   none                        1.0 mM Ca2+   5 μL               30
4          60 μM                   none                        2.0 mM Mg2+   10 μL              29
7          60 μM                   1.0 mM Mg2+                 1.0 mM Ca2+   7 μL               42
8          60 μM                   5.0 mM Mg2+                 1.0 mM Ca2+   7 μL               42
9          60 μM                   10.0 mM Mg2+                1.0 mM Ca2+   7 μL               42
11         60 μM                   60 μM EDTA                  1.0 mM Ca2+   7 μL               42
13         60 μM                   60 μM EGTA                  1.0 mM Ca2+   7 μL               42
15         60 μM                   0.10 mM NTA                 1.0 mM Ca2+   7 μL               42
17         60 μM                   EDTA                        2.0 mM Mg2+   7 μL               42

2.8. Data set preparation

Although one can probably imagine alternative strategies for supplying data to the fitting program, we combine the data from the 10 experiments described previously into a single composite data file. As currently configured, that file includes 13 columns. An excerpt from a typical file is displayed in Fig. 10.2. These values correspond to the first three points from a titration with Ca2+ in the presence of EDTA. Clearly, the data file must include the injection heat (Col. 1) for each titration point and an estimated standard deviation (Col. 2), as well as the associated total ligand (Col. 3 for Ca2+, Col. 4 for Mg2+) and total protein concentrations (Col. 5). If there is a competing chelator, as in this case, that concentration must also be supplied (Cols. 6, 7, and 8 for EDTA, EGTA, and NTA, respectively). The experiment number, listed in Col. 9, tells the fitting program which model is to be used to simulate the injection heat. Experiments 1–3 correspond to direct titrations with Ca2+, 4–6 to direct titrations with Mg2+, 7–10 to titrations with Ca2+ in the presence of Mg2+, 11–12 to titrations with Ca2+ in the presence of EDTA, 13–14 to titrations with Ca2+ in the presence of EGTA, 15–16 to titrations with Ca2+ in the presence of NTA, and 17–18 to titrations with Mg2+ in the presence of EDTA. The injection number for the particular experiment is listed in Col. 10. The model number appears in Col. 11. It will differ depending on whether independent-site (Scatchard) or general (Adair) fitting equations are to be employed for the least-squares minimization. Columns 12 and 13 provide the injection volume and total volume, respectively.

Col.   Quantity           Line 1        Line 2        Line 3
1      Inj. heat (μcal)   −7.46189      −43.0542      −41.3975
2      Std. dev. (μcal)   1.00E+07      0.350000      0.350000
3      [Ca2+]tot (M)      1.5645E−06    7.0228E−06    1.2454E−05
4      [Mg2+]tot (M)      0.0000        0.0000        0.0000
5      [protein] (M)      5.7800E−05    5.7718E−05    5.7432E−05
6      [EDTA] (M)         5.7918E−05    5.7632E−05    5.7350E−05
7      [EGTA] (M)         0.0000        0.0000        0.0000
8      [NTA] (M)          0.0000        0.0000        0.0000
9      Expt. no.          11            11            11
10     Inj. no.           1             2             3
11     Model no.          4             4             4
12     Inj. vol. (L)      2.0000E−06    7.0000E−06    7.0000E−06
13     Tot. vol. (L)      1.4193E−03    1.4123E−03    1.4263E−03

Figure 10.2 Excerpt from a composite data file, showing the first three lines from the titration of a parvalbumin sample with Ca2+ in the presence of EDTA. (The 13 columns of the file are listed here as rows.) Note that the standard deviation associated with the first titrant addition, the 2.0-μL preinjection, is assigned a value of 10⁷, to ensure that it receives no weight in the least-squares analysis.

Preparation of the data file requires some care. We integrate the data from separate experiments and perform baseline smoothing, if necessary, using the Origin-based software supplied with the VP-ITC. The topic of baseline smoothing deserves comment. Obviously, one manipulates raw data at one's peril, as there is always the danger of reducing the information content of the data set (Johnson and Faunt, 1992). However, the MicroCal baseline algorithm frequently cuts a sawtooth pattern through the data, greatly exaggerating the variance. In these cases, we adjust the baseline so that it bisects the perceived average in the point-to-point noise.

Because there is generally insufficient protein sample available to prerinse the cell, the protein sample is diluted by residual buffer in the sample cell at the time of loading. Consequently, the actual and calculated protein concentrations will differ slightly.
This situation can compromise the quality of the least-squares fit. Although it is possible to correct these inaccuracies during the least-squares analysis, as described subsequently, it is preferable to begin with the most accurate estimates for the protein concentration. In many of the titrations, the equivalence point is well-defined. These individual data sets are subjected to a preliminary analysis using the MicroCal software, the motivation being to correct minor errors in protein concentration. The sample concentration is varied until the fitted stoichiometries for both sites are 1.00.

The MicroCal conversion software produces a spreadsheet listing the relevant information associated with each injection of titrant. The first four columns list the injection heat (μcal), the injection volume (μL), the dilution-corrected ligand concentration (mM) in the cell following the injection, and the corresponding dilution-corrected protein concentration (mM). The four columns of data are exported as an ASCII file and subsequently used in the construction of the master data file. We employ a Fortran-based utility called "itcdatafile" for this task. This program asks for the total number of titrations to be included in the composite data set. Then, for each experiment, it prompts for the name of the corresponding ASCII file, the experiment number, and a standard deviation.

Except for the first point in the titration, which is always a 2.0-μL preinjection, we generally utilize a uniform standard deviation for each data point. This number was determined by repeatedly titrating a solution of EDTA with Ca2+, calculating the standard deviation associated with each point in the titration, and averaging the values. For our calorimeter, this value turned out to be 0.35 μcal. The use of a uniform variance is not particularly realistic, as the signal-to-noise ratio varies from experiment to experiment and even from replicate to replicate. However, it is approximately correct and permits the calculation of a reduced χ² value:

\[ \chi_\nu^2 = \frac{1}{\nu} \sum_i \frac{1}{\sigma_i^2} \left[ y(i) - y(x_i) \right]^2 \tag{10.1} \]

In Eq. (10.1), ν is the number of degrees of freedom (the number of data points minus the number of variable parameters), σi is the standard deviation of the ith point, y(i) is the observed injection heat, and y(xi) is the calculated injection heat. The heat associated with the 2-μL preinjection is invariably low due to diffusion of titrant from the buret during the relatively lengthy thermal equilibration period. In order to effectively ignore this point during the least-squares minimization process, it is assigned a standard deviation of 10⁷. If the experiment number corresponds to a titration in the presence of a fixed level of Mg2+, the program asks for the initial concentration. Likewise, if a competing chelator is present, it prompts for the initial chelator concentration. When the information has been collected for all of the titrations, the program asks for an output file name and subsequently writes out the file.
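To make the weighting scheme concrete, the following sketch evaluates the reduced χ² of Eq. (10.1) for a handful of made-up injection heats. The function name and the numerical values are hypothetical; only the uniform 0.35-μcal standard deviation and the 10⁷ value assigned to the preinjection come from the text.

```python
def reduced_chi_square(observed, calculated, sigmas, n_params):
    """Eq. (10.1): weighted sum of squared residuals divided by the degrees of freedom."""
    if not (len(observed) == len(calculated) == len(sigmas)):
        raise ValueError("all three sequences must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more data points than variable parameters")
    total = sum(((y - yc) / s) ** 2
                for y, yc, s in zip(observed, calculated, sigmas))
    return total / dof

# Hypothetical injection heats (ucal); the first point is the preinjection.
observed = [-7.5, -43.1, -41.4, -40.2, -38.9]
calculated = [-44.0, -42.9, -41.7, -40.3, -38.5]
sigmas = [1.0e7, 0.35, 0.35, 0.35, 0.35]

chi2 = reduced_chi_square(observed, calculated, sigmas, n_params=2)
print(f"reduced chi-square = {chi2:.3f}")
```

Although the preinjection residual is by far the largest (36.5 μcal), dividing it by 10⁷ makes its contribution to the sum negligible, which is the intended effect.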


2.9. Preparation of the parameter file

The parameter file contains the initial estimates of the Ca2+- and Mg2+-binding enthalpies and binding constants for the protein of interest. It also contains the corresponding binding parameters for the competing chelators. These are so-called global parameters, which apply to the entire data set. There are also some experiment-specific, or local, parameters. These include binding stoichiometries for the proteins and chelators in each of the titrations and baseline offsets for each experiment. The stoichiometry values are set to 1.0 by default. As the χ² minimum is approached, experiments for which the assumed protein concentration is incorrect become obvious, exhibiting large χ² contributions in the vicinity of the equivalence point. For these cases, the corresponding stoichiometry coefficients can be adjusted by trial and error to find the optimal values. The baseline offsets are all set to 0.0 by default. As the χ² minimum is approached, it may become apparent that one or more of the experiments requires a baseline correction. This is likewise done empirically, adjusting the value until no further improvement in χ² is observed.

We employ a utility called "itcparfile" to produce the parameter file. As currently configured, the resulting parameter file contains four columns. An excerpt from a file is displayed in Fig. 10.3. The first column is a shorthand label for the parameter. The second column is a text descriptor. The third column contains the actual parameter values, and the final column, consisting of 1s and 0s, indicates whether the corresponding parameter is to be varied (1) or not (0). The file contains 80 lines. Lines 1–4 present the binding enthalpies and binding constants for Ca2+; lines 5–8 present the corresponding values for Mg2+. Lines 9 and 10, used only in the Adair model, correspond to the enthalpy change and equilibrium constant, respectively, for the formation of a mixed Ca2+-Mg2+ complex. Lines 11–18 present the binding parameters for the competing chelators. Lines 19–38 contain stoichiometry values (n1, n2) for experiments 1–10. Lines 39–62 contain stoichiometry values (n1, n2, nC) for experiments 11–18. Lines 63–80 correspond to the baseline adjustments for experiments 1–18.
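The 80-line layout just described can be mirrored in a small data structure, which makes the bookkeeping easy to check. The sketch below is only an illustration of that layout, not the format written by "itcparfile": the `Parameter` class, the descriptor strings, and the `NC` label for the chelator stoichiometry are assumptions, while the 18 global labels are those shown in Fig. 10.3.

```python
from dataclasses import dataclass

@dataclass
class Parameter:
    label: str          # shorthand label (column 1)
    description: str    # text descriptor (column 2)
    value: float        # parameter value (column 3)
    vary: bool          # varied (1) or held fixed (0) (column 4)

GLOBAL_LABELS = ("H1", "K1", "H2", "K2", "HM1", "KM1", "HM2", "KM2",
                 "HCAMG", "BCAMG", "HEDTA", "KEDTA", "HMEDTA", "KMEDTA",
                 "HEGTA", "KEGTA", "HNTA", "KNTA")

def default_layout():
    """Return 80 records in the line order described in the text."""
    pars = [Parameter(label, "GLOBAL", 0.0, False) for label in GLOBAL_LABELS]  # lines 1-18
    for expt in range(1, 11):                       # lines 19-38: n1, n2
        for label in ("N1", "N2"):
            pars.append(Parameter(label, f"EXPT {expt}", 1.0, False))
    for expt in range(11, 19):                      # lines 39-62: n1, n2, nC
        for label in ("N1", "N2", "NC"):            # "NC" is a hypothetical label
            pars.append(Parameter(label, f"EXPT {expt}", 1.0, False))
    for expt in range(1, 19):                       # lines 63-80: baseline offsets
        pars.append(Parameter("BL", f"BASELINE EXPT {expt}", 0.0, False))
    return pars

layout = default_layout()
print(len(layout))
```

Counting the blocks (18 + 20 + 24 + 18) reproduces the stated total of 80 lines.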

1        2                                  3               4
H1       SITE 1 CA ENTHALPY                 −4100.000       1
K1       SITE 1 CA-BINDING CONSTANT         22000000.000    1
H2       SITE 2 CA ENTHALPY                 −3460.000       1
K2       SITE 2 CA-BINDING CONSTANT         1500000.000     1
HM1      SITE 1 MG ENTHALPY                 3010.000        1
KM1      SITE 1 MG-BINDING CONSTANT         9200.000        1
HM2      SITE 2 MG ENTHALPY                 4160.000        1
KM2      SITE 2 MG-BINDING CONSTANT         170.000         1
HCAMG    MIXED-SPECIES BINDING ENTHALPY     0.000           0
BCAMG    MIXED-SPECIES BINDING CONSTANT     0.000           0
HEDTA    EDTA CA-BINDING ENTHALPY           −6300.000       0
KEDTA    EDTA CA-BINDING CONSTANT           44000000.000    0
HMEDTA   EDTA MG-BINDING ENTHALPY           4400.000        0
KMEDTA   EDTA CA-BINDING CONSTANT           550000.000      0
HEGTA    EDTA CA-BINDING ENTHALPY           −7500.000       0
KEGTA    EGTA CA-BINDING CONSTANT           17000000.000    0
HNTA     NTA CA-BINDING ENTHALPY            −1900.000       0
KNTA     NTA CA-BINDING CONSTANT            11000.000       0
N1       EXPT 1 CA TITRATION                1.000           0
N2       EXPT 1 CA TITRATION                1.000           0
N1       EXPT 2 CA TITRATION                1.000           0
N2       EXPT 2 CA TITRATION                1.000           0
...
BL       BASELINE EXPT 15                   0.000           0
BL       BASELINE EXPT 16                   0.000           0
BL       BASELINE EXPT 17                   0.000           0
BL       BASELINE EXPT 18                   0.000           0

Figure 10.3 Excerpt from a typical parameter file. Binding enthalpies and binding constants for the protein and chelators occupy the first 18 lines. Binding stoichiometries for 18 potential experiments occupy the following 44 lines. Baseline adjustment parameters occupy the final 18 lines.

2.10. General comments on ITC model development

We herein offer a brief introduction to ITC model development. For simplicity, we consider a molecule M having a single site for ligand X:

\[ \mathrm{M} + \mathrm{X} \rightleftharpoons \mathrm{MX} \tag{10.2} \]

For relevance, we further assume that M is a small-molecule chelator (e.g., EDTA or EGTA) and that X is either Ca2+ or Mg2+. The model development process begins with the equation for the cumulative heat of binding following the ith titrant addition:

\[ Q_i = \Delta H \, \alpha_i \tag{10.3} \]

In this general equation, ΔH is the enthalpy of the reaction and αi is the fraction of binding sites occupied on M after the ith addition. ΔH is commonly expressed in cal/mol, whereas Qi is typically measured in μcal. Thus, for consistency of units, the right side of the equation must be multiplied by 10⁶ Vo[M]t,i, where Vo represents the sample cell volume and [M]t,i is the total concentration of M:

\[ Q_i = 10^6 \, V_o \, [\mathrm{M}]_{t,i} \, \Delta H \, \alpha_i \tag{10.4} \]

The factor of 10⁶ serves to convert calories to microcalories, and the product Vo[M]t,i equals the total moles of M in the sample cell. For a single-site system, the value of αi is given by the binding isotherm:

\[ \alpha_i = \frac{K x_i}{1 + K x_i} \tag{10.5} \]


where K is the association constant and xi is the free molar concentration of ligand after the ith injection, shorthand for [X]i. The denominator in this equation corresponds to the binding partition function, or binding polynomial. It reflects the statistical weight of each of the ligated forms at ligand concentration xi. For the single-site system, there are just two forms, apo and bound.

Note that Eq. (10.5) requires knowledge of the free ligand concentration, whereas the independent variable in an ITC experiment is the total ligand concentration, [X]t,i. However, given [X]t,i, the total concentration of M, [M]t,i, and an estimate for K, we can solve for the free ligand concentration. For a one-site system, the solution can be obtained either analytically or numerically. In the former case, we can write an expression for the association constant K, taking advantage of the fact that xi = [X]t,i − [MX]i and [M]i = [M]t,i − [MX]i:

\[ K = \frac{[\mathrm{MX}]_i}{[\mathrm{M}]_i \, x_i} = \frac{[\mathrm{MX}]_i}{\left( [\mathrm{M}]_{t,i} - [\mathrm{MX}]_i \right) \left( [\mathrm{X}]_{t,i} - [\mathrm{MX}]_i \right)} \tag{10.6} \]

Clearing the denominator and collecting terms yields a quadratic equation that can be solved for [MX]i, yielding:

\[ [\mathrm{MX}]_i = \frac{1}{2} \left\{ \left( [\mathrm{M}]_{t,i} + [\mathrm{X}]_{t,i} + K^{-1} \right) - \left[ \left( [\mathrm{M}]_{t,i} + [\mathrm{X}]_{t,i} + K^{-1} \right)^2 - 4 \, [\mathrm{M}]_{t,i} [\mathrm{X}]_{t,i} \right]^{1/2} \right\} \tag{10.7} \]

The resulting value can then be used to calculate αi because

\[ \alpha_i = \frac{[\mathrm{MX}]_i}{[\mathrm{M}]_{t,i}} \tag{10.8} \]
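The analytical route of Eqs. (10.6)–(10.8) is easy to check numerically. The sketch below evaluates the smaller root of the quadratic and confirms that it reproduces the input binding constant. The function names and concentrations are illustrative; the binding constant, 1.1 × 10⁴ M⁻¹, is the NTA starting value that appears in the parameter file excerpt.

```python
import math

def bound_complex(m_tot, x_tot, k):
    """Eq. (10.7): the physically meaningful root of the quadratic for [MX]."""
    b = m_tot + x_tot + 1.0 / k
    return 0.5 * (b - math.sqrt(b * b - 4.0 * m_tot * x_tot))

def fractional_occupancy(m_tot, x_tot, k):
    """Eq. (10.8): alpha = [MX] / [M]t."""
    return bound_complex(m_tot, x_tot, k) / m_tot

# Illustrative values: 1.0 mM chelator, 0.5 mM total ligand (both in M).
m_tot, x_tot, k = 1.0e-3, 0.5e-3, 1.1e4

mx = bound_complex(m_tot, x_tot, k)
x_free = x_tot - mx                        # x_i = [X]t - [MX]
k_back = mx / ((m_tot - mx) * x_free)      # Eq. (10.6) evaluated with the solution
alpha = fractional_occupancy(m_tot, x_tot, k)

print(f"[MX] = {mx:.4e} M, free X = {x_free:.4e} M, alpha = {alpha:.3f}")
```

The smaller root is the one to keep because the larger root would exceed the total concentration of M or X.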

Given a value for αi and an estimate for ΔH, Qi can be calculated via Eq. (10.4). Alternatively, the free ligand concentration can be estimated numerically. To do this, we define a function, f(xi), equal to the estimated free-ligand concentration (xi) plus the estimated bound-ligand concentration minus the (known) total ligand concentration:

\[ f(x_i) = x_i + [\mathrm{MX}]_i - [\mathrm{X}]_{t,i} \tag{10.9} \]


We can substitute for [MX]i, making use of Eqs. (10.5) and (10.8), to obtain an expression in terms of xi and K:

\[ f(x_i) = x_i + [\mathrm{M}]_{t,i} \, \frac{K x_i}{1 + K x_i} - [\mathrm{X}]_{t,i} \tag{10.10} \]

If the initial estimate for xi is correct, f(xi) will equal zero. If not, we try other values of xi until the magnitude of f(xi) is acceptably small, say 10⁻⁶[X]t,i. The bisection method is an efficient way to proceed. We first identify a value for xi that yields a positive value of f(xi). The total ligand concentration, [X]t,i, is a logical choice. We refer to that value of xi as up (for upper limit). The value of xi is then decremented until the value of f(xi) is negative. That value of xi is referred to as lo (for lower limit). Next, we calculate the average of up and lo (i.e., we bisect them) to obtain a better estimate for xi. We then recalculate f(xi). If it is sufficiently small, we are done. Otherwise, we continue. If f(xi) is positive, we assign xi to up; if f(xi) is negative, xi becomes the new value for lo. The updated values for up and lo are once again averaged to yield a new value of xi. The bisection routine is continued in this manner until the criterion for acceptability is met.

Regardless of the model used to simulate the cumulative heat, the injection heat for the ith titrant addition is obtained from the difference between the cumulative heats associated with the ith and (i − 1)th data points:

\[ q_i = \left( Q_i - Q_{i-1} \right) + \frac{dV}{V_o} \left( \frac{Q_i + Q_{i-1}}{2} \right) + bl \tag{10.11} \]

In both the VP-ITC and Nano ITC, the sample cell is overfilled, so that the observed reaction volume is constant. With this configuration, however, each addition of titrant displaces an equal volume of the reaction mixture from the cell. The second term in Eq. (10.11) corrects for the undetected heat of reaction associated with that displaced volume. The "bl" term, short for baseline, corresponds to the mixing heat associated with the titrant injection.

To summarize, ITC model development involves four basic steps: (1) Express the cumulative heat in terms of the binding enthalpies and fractional occupancies. (2) Express the fractional occupancies in terms of the binding constants and free ligand concentrations. (3) Solve for the free ligand concentration, employing the binding constants and total concentrations of M and X. (4) Calculate the injection heat from the difference between the cumulative heats for the ith and (i − 1)th titrant additions. Implementation of this process is described for two cases: binding of Ca2+ to a small-molecule chelator in the presence of a competing chelator and binding of Ca2+ to a chelator in the presence of a competing metal ion, Mg2+.
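The four steps can be strung together for the single-site case in a few lines. The following is a minimal sketch, not the author's Fortran code. The function names are invented. The lower bisection limit is taken as zero, where f is necessarily negative, rather than found by decrementing. The concentrations after each injection come from a simple displacement model adopted here only for illustration. The NTA starting values from the parameter file excerpt (K = 1.1 × 10⁴ M⁻¹, ΔH = −1900 cal/mol) supply the binding parameters.

```python
def free_ligand(m_tot, x_tot, k, rel_tol=1e-6, max_iter=200):
    """Step 3: bisection on Eq. (10.10), f(x) = x + [M]t*K*x/(1 + K*x) - [X]t."""
    if x_tot <= 0.0:
        return 0.0

    def f(x):
        return x + m_tot * k * x / (1.0 + k * x) - x_tot

    up, lo = x_tot, 0.0            # f(up) >= 0 and f(lo) < 0 bracket the root
    x = 0.5 * (up + lo)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= rel_tol * x_tot:
            break
        if fx > 0.0:
            up = x
        else:
            lo = x
        x = 0.5 * (up + lo)
    return x

def cumulative_heat(m_tot, x_tot, k, dh, v_cell):
    """Steps 1 and 2: Eq. (10.4) with the isotherm of Eq. (10.5); result in ucal."""
    x = free_ligand(m_tot, x_tot, k)
    alpha = k * x / (1.0 + k * x)
    return 1.0e6 * v_cell * m_tot * dh * alpha

def injection_heats(points, k, dh, v_cell, bl=0.0):
    """Step 4: Eq. (10.11); points holds ([M]t, [X]t, dV) after each addition."""
    heats, q_prev = [], 0.0
    for m_tot, x_tot, dv in points:
        q = cumulative_heat(m_tot, x_tot, k, dh, v_cell)
        heats.append((q - q_prev) + (dv / v_cell) * 0.5 * (q + q_prev) + bl)
        q_prev = q
    return heats

# Hypothetical titration: 1.0 mM chelator, 20 mM titrant, 42 injections of 7 uL.
v_cell, dv, syringe = 1.4e-3, 7.0e-6, 20.0e-3      # L, L, M
m_tot, x_tot, points = 1.0e-3, 0.0, []
for _ in range(42):
    # Assumed displacement model: each injection replaces dv of cell contents.
    m_tot *= 1.0 - dv / v_cell
    x_tot = x_tot * (1.0 - dv / v_cell) + syringe * dv / v_cell
    points.append((m_tot, x_tot, dv))

heats = injection_heats(points, k=1.1e4, dh=-1900.0, v_cell=v_cell)
print(f"first injection {heats[0]:.1f} ucal, last injection {heats[-1]:.2f} ucal")
```

In practice the dilution-corrected concentrations would be taken from the data file rather than recomputed, so the displacement model above matters only for generating a self-contained example.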


2.11. Binding in the presence of a competing chelator

Often, the ligand-binding affinity is too high to permit determination of the binding constant by direct titration. For example, the Ca2+ affinity of EGTA is sufficiently high that direct determination of the binding constant is problematic. One solution to this problem involves titrating the molecule of interest in the presence of a competitor molecule having lower affinity for the ligand. For example, EGTA can be titrated with Ca2+ in the presence of NTA. In this case, the cumulative heat for the ith titrant addition includes contributions from the binding of Ca2+ to both EGTA and NTA, as expressed by the following equation:

\[ Q_i = \Delta H_{\mathrm{EGTA}} \, \alpha_{i,\mathrm{EGTA}} + \Delta H_{\mathrm{NTA}} \, \alpha_{i,\mathrm{NTA}} \tag{10.12} \]

where αi,EGTA and αi,NTA represent the fractions of the chelators coordinating Ca2+ after the ith addition. They are calculated as follows:

\[ \alpha_{i,\mathrm{EGTA}} = \frac{K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i}, \qquad \alpha_{i,\mathrm{NTA}} = \frac{K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i} \tag{10.13} \]

Employing these expressions and making the units consistent, Qi can then be written as

\[ Q_i = 10^6 \, V_o \left( [\mathrm{EGTA}]_{t,i} \, \frac{\Delta H_{\mathrm{EGTA}} K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i} + [\mathrm{NTA}]_{t,i} \, \frac{\Delta H_{\mathrm{NTA}} K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i} \right) \tag{10.14} \]

The next step is to solve for the free ligand concentration. For this experiment, the free Ca2+ concentration can be obtained numerically by minimizing the following function (i.e., driving it to zero):

\[ f\left( [\mathrm{Ca}^{2+}]_i \right) = [\mathrm{Ca}^{2+}]_i + [\mathrm{EGTA}]_{t,i} \, \frac{K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{EGTA}} [\mathrm{Ca}^{2+}]_i} + [\mathrm{NTA}]_{t,i} \, \frac{K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{NTA}} [\mathrm{Ca}^{2+}]_i} - [\mathrm{Ca}^{2+}]_{t,i} \tag{10.15} \]

The injection heats are obtained by substituting the expressions for Qi (and Qi−1) from Eq. (10.14) into Eq. (10.11).
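A short numerical sketch may help to show how Eqs. (10.13)–(10.15) fit together. It assumes the same bisection approach described for the single-site case. The function names, the dictionary keys, and the 0.1 mM EGTA concentration are illustrative, whereas the binding constants and enthalpies are the starting values shown in the parameter file excerpt, and 2.0 mM NTA is the level mentioned earlier for this experiment.

```python
def free_calcium(ca_tot, chelators, rel_tol=1e-6, max_iter=200):
    """Bisection on Eq. (10.15); chelators is a list of (total conc, K) pairs."""
    if ca_tot <= 0.0:
        return 0.0

    def f(ca):
        bound = sum(c_tot * k * ca / (1.0 + k * ca) for c_tot, k in chelators)
        return ca + bound - ca_tot

    lo, up = 0.0, ca_tot           # f(lo) < 0 and f(up) >= 0
    ca = 0.5 * (lo + up)
    for _ in range(max_iter):
        fx = f(ca)
        if abs(fx) <= rel_tol * ca_tot:
            break
        if fx > 0.0:
            up = ca
        else:
            lo = ca
        ca = 0.5 * (lo + up)
    return ca

def cumulative_heat(ca_tot, egta_tot, nta_tot, p, v_cell):
    """Eq. (10.14): cumulative heat in ucal for enthalpies in cal/mol."""
    ca = free_calcium(ca_tot, [(egta_tot, p["KEGTA"]), (nta_tot, p["KNTA"])])
    a_egta = p["KEGTA"] * ca / (1.0 + p["KEGTA"] * ca)     # Eq. (10.13)
    a_nta = p["KNTA"] * ca / (1.0 + p["KNTA"] * ca)
    return 1.0e6 * v_cell * (egta_tot * p["HEGTA"] * a_egta
                             + nta_tot * p["HNTA"] * a_nta)

pars = {"HEGTA": -7500.0, "KEGTA": 1.7e7, "HNTA": -1900.0, "KNTA": 1.1e4}
ca_tot, egta_tot, nta_tot = 5.0e-5, 1.0e-4, 2.0e-3          # M; EGTA level is hypothetical

ca_free = free_calcium(ca_tot, [(egta_tot, pars["KEGTA"]), (nta_tot, pars["KNTA"])])
q = cumulative_heat(ca_tot, egta_tot, nta_tot, pars, v_cell=1.4e-3)
print(f"free Ca2+ = {ca_free:.3e} M, Q = {q:.1f} ucal")
```

Because f increases monotonically with the free Ca2+ concentration, a single bracketed root exists between zero and the total Ca2+ concentration, which is what makes bisection safe here.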


2.12. Binding in the presence of a competing metal ion

Another strategy for estimating the binding constant for a high-affinity ligand involves titrating the molecule of interest with ligand X in the presence of low-affinity ligand Y (Eatough, 1970; Hu and Eftink, 1994; Sigurskjold, Berland, and Svensson, 1994). For example, whereas the Ca2+ affinity of EDTA is too high to measure directly, the Mg2+ affinity is readily measured by direct titration. Thus, the Ca2+ association constant for EDTA can be determined by titrating the chelator with Ca2+ in the presence of a fixed level of Mg2+. For this system, the cumulative heat for the ith titrant addition will include contributions from both the binding of Ca2+ and the displacement of Mg2+:

\[ Q_i = \Delta H_{\mathrm{Ca}} \, \alpha_{i,\mathrm{Ca}} + \Delta H_{\mathrm{Mg}} \, \alpha_{i,\mathrm{Mg}} \tag{10.16} \]

where αi,Ca and αi,Mg represent the fractions of chelator coordinating Ca2+ and Mg2+, respectively. These fractional occupancies are given by the following expressions:

\[ \alpha_{i,\mathrm{Ca}} = \frac{K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i}, \qquad \alpha_{i,\mathrm{Mg}} = \frac{K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i} \tag{10.17} \]

Note that the binding partition function (denominator) contains an additional term corresponding to the Mg2+-bound form of the chelator. Clearly, to have a discernible impact on the affinity for Ca2+, the Mg2+ must be present at a concentration comparable to, or exceeding, its dissociation constant. Substituting the terms from Eq. (10.17) into Eq. (10.16) yields the following expression for Qi:

\[ Q_i = 10^6 \, V_o \, [\mathrm{EDTA}]_{t,i} \left( \frac{\Delta H_{\mathrm{Ca}} K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i} + \frac{\Delta H_{\mathrm{Mg}} K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i} \right) \tag{10.18} \]

Application of this equation requires knowledge of both the free Ca2+ and Mg2+ concentrations. Sigurskjold has described an analytical solution for the free ligand concentration in this type of competitive ITC analysis (Sigurskjold, 2000). Alternatively, these values can be obtained by simultaneously minimizing these two equations:

\[ f\left( [\mathrm{Ca}^{2+}]_i \right) = [\mathrm{Ca}^{2+}]_i + [\mathrm{EDTA}]_{t,i} \, \frac{K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i} - [\mathrm{Ca}^{2+}]_{t,i} \tag{10.19} \]

\[ f\left( [\mathrm{Mg}^{2+}]_i \right) = [\mathrm{Mg}^{2+}]_i + [\mathrm{EDTA}]_{t,i} \, \frac{K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i}{1 + K_{\mathrm{Ca}} [\mathrm{Ca}^{2+}]_i + K_{\mathrm{Mg}} [\mathrm{Mg}^{2+}]_i} - [\mathrm{Mg}^{2+}]_{t,i} \tag{10.20} \]

A strategy for performing the simultaneous minimization is described in a subsequent section.
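One simple way to satisfy both mass-balance equations at once is to nest two bisections: for any trial free Ca2+ concentration, the Mg2+ balance is solved for free Mg2+, and the outer loop then adjusts free Ca2+ until the Ca2+ balance is also met. The sketch below illustrates that idea only; it is an assumption made here for illustration and not necessarily the strategy the author describes later. The function names and concentrations are hypothetical, and the EDTA binding constants are the starting values from the parameter file excerpt.

```python
def bisect(func, lo, hi, n_iter=60):
    """Root of an increasing function with func(lo) <= 0 <= func(hi)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def free_ions(ca_tot, mg_tot, chel_tot, k_ca, k_mg):
    """Free Ca2+ and Mg2+ for one chelator binding either ion at a single site."""
    def bound(k_own, own, k_other, other):
        return chel_tot * k_own * own / (1.0 + k_own * own + k_other * other)

    def mg_given_ca(ca):
        # Inner problem: the Mg2+ mass balance at a fixed free Ca2+ concentration.
        return bisect(lambda mg: mg + bound(k_mg, mg, k_ca, ca) - mg_tot, 0.0, mg_tot)

    def f_ca(ca):
        # Outer problem: the Ca2+ mass balance, with Mg2+ already equilibrated.
        return ca + bound(k_ca, ca, k_mg, mg_given_ca(ca)) - ca_tot

    ca = bisect(f_ca, 0.0, ca_tot)
    return ca, mg_given_ca(ca)

def cumulative_heat(ca, mg, chel_tot, k_ca, k_mg, dh_ca, dh_mg, v_cell):
    """Cumulative heat (ucal) from the two fractional occupancies."""
    denom = 1.0 + k_ca * ca + k_mg * mg
    return 1.0e6 * v_cell * chel_tot * (dh_ca * k_ca * ca + dh_mg * k_mg * mg) / denom

k_ca, k_mg = 4.4e7, 5.5e5                  # EDTA starting values from the parameter file
ca_tot, mg_tot, edta_tot = 5.0e-5, 1.0e-3, 1.0e-4   # M; illustrative concentrations
ca, mg = free_ions(ca_tot, mg_tot, edta_tot, k_ca, k_mg)
print(f"free Ca2+ = {ca:.3e} M, free Mg2+ = {mg:.3e} M")
```

The nesting costs a few thousand function evaluations per titration point, which is negligible for a data set of this size; an analytical solution would be faster where speed matters.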

2.13. Least-squares minimization

If an appropriate mathematical model is available, the exercise of curve fitting reduces to a search for optimal parameter values. The least-squares strategy for parameter optimization assumes that the set of parameters that provides the best agreement with observation (i.e., the smallest variance) has the highest probability of being correct (Johnson and Faunt, 1992). The validity of this assumption is conditional. It requires that the fitting model be correct, that the data points be independent of one another, and that there be sufficient data points to adequately sample the experimental uncertainty. It further requires that the data be free of systematic error, that the experimental uncertainty can be described by a Gaussian distribution, and that the uncertainty can be attributed entirely to the dependent variables.

The term "least-squares minimization" actually encompasses a number of numerical strategies (Bevington and Robinson, 1992; Johnson and Faunt, 1992). In so-called grid-search methods, after calculating an initial χ² value, one of the fitting parameters aj is incremented by Δaj (where the sign is chosen such that χ² decreases) and χ² is recalculated. This step is repeated until χ² no longer decreases and begins to increase. This process is carried out for each fitting parameter in turn. The entire procedure is repeated until no further significant reduction in χ² is achieved. Although reliable, grid-search algorithms converge slowly, especially when the parameters are highly correlated.

In the more efficient gradient-search or steepest-descent methods, all of the fitting parameters aj are adjusted simultaneously. The relative magnitudes of the adjustments are chosen so that the movement in parameter space is in the direction of maximum reduction in χ² (i.e., along the χ² gradient, ∇χ²). The gradient ∇χ² is a vector pointing in the direction along which χ² changes most rapidly:

\[ \nabla \chi^2 = \sum_{j=1}^{n} \left[ \frac{\partial \chi^2}{\partial a_j} \right] \hat{a}_j \tag{10.21} \]

where âj is a unit vector in the direction of the aj coordinate axis. The partial derivatives of χ² with respect to each of the parameters are calculated numerically:

\[ \frac{\partial \chi^2}{\partial a_j} = \frac{\chi^2 \left( a_j + \delta a_j \right) - \chi^2 \left( a_j \right)}{\delta a_j} \tag{10.22} \]

where δaj is some fraction of the parameter value. The sign of each δaj increment is chosen so that χ² decreases. So-called expansion methods offer yet another approach to minimizing χ². In this approach, the fitting function y(x) is approximated by means of a truncated Taylor series about the point aj⁰:

\[ y(x) = y^0(x) + \sum_{j=1}^{m} \left[ \frac{\partial y^0(x)}{\partial a_j} \, \delta a_j \right] \tag{10.23} \]

In Eq. (10.23), y⁰(x) is the value of the fitting function when the parameters have starting values aj⁰, and the derivatives are evaluated at those values. Then, χ² can be expressed explicitly as a function of the parameter increments δaj:

\[ \chi^2 = \sum_i \left\{ \frac{1}{\sigma_i^2} \left[ y_i - y^0(x_i) - \sum_{j=1}^{m} \left( \frac{\partial y^0(x_i)}{\partial a_j} \, \delta a_j \right) \right]^2 \right\} \tag{10.24} \]

Then χ² is minimized with respect to each of the parameter increments δak to obtain the optimal values of the increments:

\[ \frac{\partial \chi^2}{\partial \, \delta a_k} = -2 \sum_i \left\{ \frac{1}{\sigma_i^2} \left[ y_i - y^0(x_i) - \sum_{j=1}^{m} \left( \frac{\partial y^0(x_i)}{\partial a_j} \, \delta a_j \right) \right] \frac{\partial y^0(x_i)}{\partial a_k} \right\} = 0 \tag{10.25} \]

The resulting m simultaneous equations are solved to obtain the optimal parameter increments, which are then used to update the starting parameter estimates. This process is carried out iteratively until no further significant reduction in χ² is obtained. The advantage of this approach is that it converges very rapidly in the neighborhood of a minimum.
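A single pass of the expansion method can be written compactly. The sketch below forms forward-difference derivatives of the fitting function, builds the m simultaneous equations that follow from Eq. (10.25), and solves them for the parameter increments. It is a generic illustration rather than the CURFIT routine: the function names, the small Gaussian-elimination solver, and the straight-line test model are all assumptions made for the example.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def expansion_step(model, pars, xs, ys, sigmas, frac=1e-6):
    """One iteration: linearize the model about pars and solve for the increments."""
    m, n = len(pars), len(xs)
    y0 = [model(x, pars) for x in xs]
    derivs = []                                   # derivs[j][i] = dy0(x_i)/da_j
    for j in range(m):
        step = frac * abs(pars[j]) or frac
        shifted = list(pars)
        shifted[j] += step
        derivs.append([(model(x, shifted) - y) / step for x, y in zip(xs, y0)])
    # Setting Eq. (10.25) to zero gives: sum_j alpha[k][j] * da_j = beta[k].
    alpha = [[sum(derivs[k][i] * derivs[j][i] / sigmas[i] ** 2 for i in range(n))
              for j in range(m)] for k in range(m)]
    beta = [sum((ys[i] - y0[i]) * derivs[k][i] / sigmas[i] ** 2 for i in range(n))
            for k in range(m)]
    return [p + d for p, d in zip(pars, solve(alpha, beta))]

def line(x, pars):
    return pars[0] + pars[1] * x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]                  # noise-free data from a0 = 2, a1 = 3
new_pars = expansion_step(line, [1.0, 1.0], xs, ys, [0.35] * len(xs))
print(new_pars)
```

For a model that is linear in its parameters, the truncated Taylor series is exact, so one step lands on the minimum. For a nonlinear binding model, the step is only as good as the linearization, which is why the method works best close to a minimum.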


One of the most effective least-squares fitting strategies, the Levenberg-Marquardt algorithm, combines the most desirable features of the steepest-descent and expansion methods. When χ² is far from a minimum, the Marquardt algorithm behaves like a gradient or steepest-descent search. However, in the neighborhood of a minimum, it behaves more like the expansion method.

Ideally, least-squares minimization should begin with good initial parameter estimates. Often, the binding parameters are known for a closely related protein, an ortholog or site-specific variant. In our experience, however, even with reasonably good starting parameter estimates, the Levenberg-Marquardt algorithm often fails to converge to a global minimum. This failure might reflect the ruggedness of the χ² hypersurface resulting from the variation of eight parameters (four enthalpies and four binding constants) and the presence of multiple local minima.

Whatever the reason, our least-squares minimization strategy relies heavily on Monte Carlo simulation. The program "itcmonte" prompts for the data file and parameter file names, the number of parameter sets desired (typically 100), and a value for "delta." Initially, delta should be set to 0.20. The program generates random numbers between −1.0 and +1.0. Each of the variable parameters is multiplied by a random number and the value for delta. This product is then added to the current value of the parameter. After this process has been carried out for each of the adjustable parameters, the resulting modified parameter set is used to calculate simulated injection heats for each of the data points in the composite data set. If the resulting χ² value is less than the starting value, the updated parameter values are written to a file and used as the starting point for another round of modification. Otherwise, they are discarded. This procedure is repeated until the specified number of parameter sets has been collected.
Upon completion, the program displays the $\chi^2$ minimum and the associated parameter values, and writes the latter to a file ("bestpars"). It also performs a simulation with the optimal parameter values and writes an ASCII file called "monteplot" containing the experimental values, the calculated values, and residuals for each experiment. This file can be imported into a plotting program to evaluate the quality of the fit. The program then asks whether you wish to continue with additional simulations and, if so, the number of parameter sets desired and the value of delta to be employed. As the $\chi^2$ limit decreases, the value of delta should likewise be decreased: as one approaches the minimum in the error space, and the parameter values approach their optimal values, smaller corrections to the parameter values should be used. In the absence of systematic errors, and given an accurate estimate of the standard deviation for each point, the reduced $\chi^2$ value for a perfect least-squares fit would be 1.0. In practice, we obtain values between 2 and 5. Occasionally, the Monte Carlo simulations will fail to yield a $\chi^2$ of this magnitude. This situation might reflect a problem in the dataset, perhaps mistyping protein or chelator concentrations or inadvertently entering concentrations in the wrong units (e.g., millimolar rather than molar). Alternatively, if the initial estimates were extremely poor, the simulations may simply have wandered into the wrong sector of parameter space. If the Monte Carlo simulations stall at a high $\chi^2$ value, it is recommended that the data and simulated fit be examined. Nonconvergence of the $\chi^2$ values can also be a signal that the fitting model is incorrect. This situation was encountered during the initial analysis of the polcalcin Phl p 7, described subsequently. Initially, we tried the two-site independent-site model that we typically employ for analysis of parvalbumin binding data. When it failed to yield an acceptable $\chi^2$ value, we modified the data set to use the Adair fitting functions, which readily yielded a solution.

Once the Monte Carlo simulations have identified a set of parameter values that yield a reduced $\chi^2$ value in the acceptable range, a round of least-squares minimization is performed. We utilize a standard Levenberg-Marquardt algorithm, adapted from the CURFIT routine in the original Bevington monograph (Bevington, 1969), for minimization. The program called "itcmarqmin" prompts for the names of the data set and the parameter file. At the conclusion of the minimization, it writes a file called "marqplot", comparable to "monteplot", which can be imported into a plotting program for visualization.

2.14. Error analysis

The least-squares analysis is not concluded with the acquisition of a set of parameter values. The importance of knowing the uncertainties associated with the estimated binding parameters cannot be overstated (Johnson and Faunt, 1992). Tellinghuisen has discussed this issue in the context of analyzing ITC data (Tellinghuisen, 2004). In our work, we have conducted statistical error analyses by Monte Carlo analysis, simulated data sets, and confidence intervals.

In the Monte Carlo analysis, the optimized parameter values are modified with the aid of a random-number generator. A simulation is conducted with the resulting parameter set, and the $\chi^2$ value is recorded. This procedure is repeated perhaps 5,000 to 10,000 times. At the end, the F-statistic is used to estimate the $\chi^2$ value associated with, say, the 95% confidence interval. The range of parameter values observed at this limit is accepted as the uncertainty associated with a given parameter.

Alternatively, synthetic data sets can be used to estimate parameter uncertainty. A set of simulated data is generated with the optimized parameter values using the program "syndat." Random noise is introduced into this data set by "Gaussian data smearing." This procedure is repeated until 525 simulated datasets have been collected, which is equivalent, in principle, to having repeated the actual experimental analysis 525 times. Each data set in this ensemble is then subjected to Levenberg-Marquardt minimization. The distribution of parameter values can be used as an estimate of the parameter uncertainty.
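The synthetic-data approach can be illustrated with a deliberately small stand-in. The sketch below assumes a one-parameter linear toy model in place of the ITC binding model, so that the refit has a closed form; the names `smear` and `fit_slope`, the noise level, and the parameter value are hypothetical and do not come from "syndat." Only the overall procedure (simulate ideal data, add Gaussian noise, refit each of 525 data sets, examine the spread of the fitted parameter) follows the text.

```python
import random
import statistics

def smear(ideal, sigma, rng):
    """Add zero-mean Gaussian noise of standard deviation sigma to each ideal value."""
    return [q + rng.gauss(0.0, sigma) for q in ideal]

def fit_slope(t, q):
    """Closed-form least-squares slope for the toy model q = a * t."""
    return sum(ti * qi for ti, qi in zip(t, q)) / sum(ti * ti for ti in t)

rng = random.Random(1)
t = [float(i) for i in range(1, 21)]      # toy independent variable
a_opt = -5.0                              # toy "optimized" parameter value
ideal = [a_opt * ti for ti in t]          # noise-free simulated data

# 525 synthetic data sets, each refit independently
estimates = [fit_slope(t, smear(ideal, 0.5, rng)) for _ in range(525)]
a_mean = statistics.mean(estimates)
a_sd = statistics.stdev(estimates)        # spread taken as the parameter uncertainty
```

In a real analysis the closed-form `fit_slope` would be replaced by a full nonlinear minimization of each smeared data set.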


Our preferred method of error analysis involves generation of confidence intervals for each parameter. Following identification of the optimal parameter values and the lowest chi-square value, $\chi^2(\mathrm{min})$, confidence intervals are obtained by incrementing the parameter of interest, fixing its value, and repeating the least-squares minimization, allowing the remaining parameters to vary. This procedure is repeated until the chi-square value, $\chi^2(\mathrm{par})$, exceeds a specified threshold, given by

$$\chi^2(\mathrm{par}) = \left[1 + \frac{p}{v}\,F(p, v, P)\right]\chi^2(\mathrm{min}). \tag{10.26}$$

In this equation, $p$ is the number of fitting parameters, $v$ the degrees of freedom, $P$ the probability that the increase in $\chi^2$ could be the result of random errors (e.g., 0.10 for the 90% confidence limit), and $F(p, v, P)$ the corresponding F statistic. After the upper limit has been determined, the parameter value is decremented from its optimal value until $\chi^2(\mathrm{par})$ again exceeds the specified threshold. The program "itcconfint" is used for this procedure. At the conclusion of the analysis, the confidence intervals are written to a file called "parlimits." In addition to defining confidence intervals for each of the parameters, the program will occasionally reveal a combination of parameters that produces a (slightly) improved $\chi^2$ value. If this happens, the improved parameter estimates are written to a file called "newoptpars."
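The threshold of Eq. (10.26) and the stepping search for one confidence limit can be sketched as follows. This is not the "itcconfint" code: the F value is assumed to be supplied by the caller from a table or statistics package, `refit_chi2` is a hypothetical callable that fixes the parameter at a trial value and re-minimizes over the others, and the quadratic profile used in the example is a toy.

```python
def chi2_threshold(chi2_min, p, v, f_stat):
    """Eq. (10.26): chi-square limit for the chosen confidence level.

    f_stat is the F(p, v, P) value, looked up by the caller.
    """
    return (1.0 + (p / v) * f_stat) * chi2_min

def scan_limit(best_value, step, refit_chi2, threshold, max_steps=10_000):
    """Step one parameter away from its optimum until the refit chi-square exceeds threshold.

    refit_chi2 -- hypothetical callable: fix the parameter at the given value,
                  re-minimize over the remaining parameters, return chi-square
    step       -- positive for the upper limit, negative for the lower limit
    """
    value = best_value
    for _ in range(max_steps):
        value += step
        if refit_chi2(value) > threshold:
            return value
    raise RuntimeError("threshold not reached within max_steps")

# Toy profile (assumption): chi-square rises quadratically around the optimum at 2.0
profile = lambda val: 2.0 + (val - 2.0) ** 2

limit = chi2_threshold(chi2_min=2.0, p=8, v=64, f_stat=2.0)   # (1 + 0.125 * 2.0) * 2.0
upper = scan_limit(2.0, 0.25, profile, limit)
lower = scan_limit(2.0, -0.25, profile, limit)
```

The step size sets the resolution of the reported limit, so a smaller step gives a tighter bracket at the cost of more refits.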

3. Illustrative Global ITC Analyses of Divalent Ion Binding

3.1. The independent two-site model

For all of the parvalbumin analyses that we have performed, it has been possible to employ an independent two-site model, sometimes called a Scatchard model. For a protein with two ligand-binding sites, the cumulative heat after the ith titrant addition can be expressed as

$$Q_i = \Delta H_1\,\alpha_{1,i} + \Delta H_2\,\alpha_{2,i}, \tag{10.27}$$

where $\Delta H_1$ and $\Delta H_2$ are the binding enthalpies associated with sites 1 and 2 and $\alpha_{1,i}$ and $\alpha_{2,i}$ are the corresponding fractional occupancies. The extent of binding after the ith addition, $\bar{X}_i$, for a two-site system is given by:

$$\bar{X}_i = \frac{n_1 k_1 x_i}{1 + k_1 x_i} + \frac{n_2 k_2 x_i}{1 + k_2 x_i}. \tag{10.28}$$

In this equation, the first term, $n_1 k_1 x_i/(1 + k_1 x_i)$, describes the fractional occupancy of site 1, $\alpha_{1,i}$, and the second term describes the fractional occupancy of site 2, $\alpha_{2,i}$. The $n_1$ and $n_2$ values allow for nonunity binding-site stoichiometries, which could result from minor errors in the protein concentration. The denominator in each term is the binding polynomial for the particular site. The equation for the free ligand concentration is given by

$$x_i = [X]_{t,i} - [M]_{t,i}\left(\frac{n_1 k_1 x_i}{1 + k_1 x_i} + \frac{n_2 k_2 x_i}{1 + k_2 x_i}\right). \tag{10.29}$$

Multiplying both sides of the equation by (1 + k1xi) and (1 + k2xi) yields a cubic equation that can be solved exactly for xi, employing the following system of equations:

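$$A = k_1 k_2. \tag{10.30}$$

$$B = k_1 + k_2 + k_1 k_2\left([M]_{t,i}(n_1 + n_2) - [X]_{t,i}\right). \tag{10.31}$$

$$C = 1 + [M]_{t,i}(n_1 k_1 + n_2 k_2) - [X]_{t,i}(k_1 + k_2). \tag{10.32}$$

$$D = -[X]_{t,i}. \tag{10.33}$$

$$F = \frac{1}{3}\left(\frac{3C}{A} - \frac{B^2}{A^2}\right). \tag{10.34}$$

$$G = \frac{1}{27}\left(\frac{2B^3}{A^3} - \frac{9BC}{A^2} + \frac{27D}{A}\right). \tag{10.35}$$

$$H = \frac{G^2}{4} + \frac{F^3}{27}. \tag{10.36}$$

$$I = \left(\frac{G^2}{4} - H\right)^{0.5}. \tag{10.37}$$

$$J = I^{1/3}. \tag{10.38}$$

$$K = \cos^{-1}\left(-\frac{G}{2I}\right). \tag{10.39}$$

$$x_i = 2J\cos\left(\frac{K}{3}\right) - \frac{B}{3A}. \tag{10.40}$$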
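The closed-form route can be checked numerically with a short Python sketch. It assumes the coefficients are those obtained by expanding the free-ligand equation into $A x^3 + B x^2 + C x + D = 0$ (so that $D = -[X]_{t,i}$), and that the cubic has three real roots, in which case the trigonometric solution returns the largest root, the only positive one. The function name `free_ligand_cubic` and the concentrations and binding constants in the example are illustrative, not taken from the author's programs.

```python
import math

def free_ligand_cubic(k1, k2, n1, n2, m_tot, x_tot):
    """Free ligand concentration via the trigonometric solution of the cubic.

    k1, k2 -- site binding constants (M^-1); n1, n2 -- site stoichiometries
    m_tot  -- total protein concentration (M); x_tot -- total ligand concentration (M)
    """
    a = k1 * k2
    b = k1 + k2 + k1 * k2 * (m_tot * (n1 + n2) - x_tot)
    c = 1.0 + m_tot * (n1 * k1 + n2 * k2) - x_tot * (k1 + k2)
    d = -x_tot
    f = (3.0 * c / a - (b / a) ** 2) / 3.0
    g = (2.0 * (b / a) ** 3 - 9.0 * b * c / a ** 2 + 27.0 * d / a) / 27.0
    h = g * g / 4.0 + f ** 3 / 27.0
    i = math.sqrt(g * g / 4.0 - h)          # equals sqrt(-f^3 / 27); needs f < 0
    j = i ** (1.0 / 3.0)
    # clamp guards against round-off pushing the argument just outside [-1, 1]
    k = math.acos(max(-1.0, min(1.0, -g / (2.0 * i))))
    return 2.0 * j * math.cos(k / 3.0) - b / (3.0 * a)

# Illustrative values (assumptions): 0.1 mM protein and ligand, k1 = 1e6, k2 = 1e5 M^-1
x_free = free_ligand_cubic(1.0e6, 1.0e5, 1.0, 1.0, 1.0e-4, 1.0e-4)

# Mass-balance residual: free + bound - total should vanish
residual = (x_free
            + 1.0e-4 * (1.0e6 * x_free / (1.0 + 1.0e6 * x_free)
                        + 1.0e5 * x_free / (1.0 + 1.0e5 * x_free))
            - 1.0e-4)
```

Substituting the result back into the mass balance, as done for `residual`, is a cheap guard against sign or transcription errors in the coefficients.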

Alternatively, the value of xi can be obtained by bisection, as described earlier, minimizing the following function:

$$f(x_i) = x_i + [M]_{t,i}\left(\frac{n_1 k_1 x_i}{1 + k_1 x_i} + \frac{n_2 k_2 x_i}{1 + k_2 x_i}\right) - [X]_{t,i}. \tag{10.41}$$

Once the free-ligand concentration has been obtained, the cumulative heat can be calculated with the expression:

$$Q_i = 10^6\,V_o\,[M]_{t,i}\left(\frac{\Delta H_1 n_1 k_1 x_i}{1 + k_1 x_i} + \frac{\Delta H_2 n_2 k_2 x_i}{1 + k_2 x_i}\right), \tag{10.42}$$

which may be substituted into Eq. (10.11) to calculate the injection heat. These equations can be used to model data obtained by direct titration of the protein with either Ca2+ or Mg2+.
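The bisection route and the cumulative-heat expression can be sketched together in Python. The search interval $[0, [X]_{t,i}]$ is an assumption that relies on the mass-balance function increasing monotonically with $x_i$; the function names, the cell volume `v0`, and the enthalpy values are hypothetical, and the units simply follow whatever convention is used for $\Delta H$ and $V_o$ elsewhere in the chapter.

```python
def bound_per_protein(x, k1, k2, n1=1.0, n2=1.0):
    """Moles of ligand bound per mole of protein for two independent sites."""
    return n1 * k1 * x / (1.0 + k1 * x) + n2 * k2 * x / (1.0 + k2 * x)

def free_ligand_bisect(k1, k2, m_tot, x_tot, n1=1.0, n2=1.0, iters=200):
    """Root of f(x) = x + [M]t * bound(x) - [X]t on [0, x_tot] by bisection."""
    lo, hi = 0.0, x_tot
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = mid + m_tot * bound_per_protein(mid, k1, k2, n1, n2) - x_tot
        if f_mid > 0.0:
            hi = mid        # too much ligand accounted for: root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def cumulative_heat(x, k1, k2, dh1, dh2, m_tot, v0, n1=1.0, n2=1.0):
    """Cumulative heat: 10^6 * Vo * [M]t * (enthalpy-weighted site occupancies)."""
    site1 = dh1 * n1 * k1 * x / (1.0 + k1 * x)
    site2 = dh2 * n2 * k2 * x / (1.0 + k2 * x)
    return 1.0e6 * v0 * m_tot * (site1 + site2)

# Illustrative values (assumptions, not data from the chapter)
k1, k2 = 1.0e6, 1.0e5
m_tot, x_tot, v0 = 1.0e-4, 1.0e-4, 1.4e-3
x_free = free_ligand_bisect(k1, k2, m_tot, x_tot)
q = cumulative_heat(x_free, k1, k2, dh1=-4000.0, dh2=-4000.0, m_tot=m_tot, v0=v0)
```

Bisection is slower than the closed-form solution but needs no case analysis, and the same routine carries over unchanged when further terms are added to the mass balance.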

3.2. Competing chelator

If the titration is carried out in the presence of a competing chelator (e.g., titration of the protein with Ca2+ in the presence of EDTA), the cumulative heat includes contributions from binding of the metal ion to both the protein and the chelator:

$$Q_i = \Delta H_1\,\alpha_{1,i} + \Delta H_2\,\alpha_{2,i} + \Delta H_C\,\alpha_{C,i}, \tag{10.43}$$

where $\Delta H_C$ and $\alpha_{C,i}$ are the binding enthalpy and fractional occupancy of the chelator. Substituting the relevant fractional occupancies for the protein and chelator binding sites (given by Eqs. (10.23) and (10.5), respectively) yields:

$$Q_i = 10^6\,V_o\left[[M]_{t,i}\left(\frac{\Delta H_1 n_1 k_1 x_i}{1 + k_1 x_i} + \frac{\Delta H_2 n_2 k_2 x_i}{1 + k_2 x_i}\right) + [C]_{t,i}\left(\frac{\Delta H_C n_C k_C x_i}{1 + k_C x_i}\right)\right]. \tag{10.44}$$

In this equation, [C]t,i represents the total chelator concentration, and the expression in parentheses immediately following describes the binding of ligand to the chelator. To apply Eq. (10.44), it is of course necessary to solve for the free ligand concentration. The relevant equation to be minimized has the following general form:

$$f(x_i) = x_i + [M]_{t,i}\left(\frac{n_1 k_1 x_i}{1 + k_1 x_i} + \frac{n_2 k_2 x_i}{1 + k_2 x_i}\right) + [C]_{t,i}\,\frac{n_C k_C x_i}{1 + k_C x_i} - [X]_{t,i}. \tag{10.45}$$

3.3. Competing metal ion

When a protein is titrated with Ca2+ in the presence of Mg2+, the cumulative heat expression must include the contributions from the binding of both ions:

$$Q_i = \Delta H_{1\mathrm{Ca}}\,\alpha_{1\mathrm{Ca},i} + \Delta H_{2\mathrm{Ca}}\,\alpha_{2\mathrm{Ca},i} + \Delta H_{1\mathrm{Mg}}\,\alpha_{1\mathrm{Mg},i} + \Delta H_{2\mathrm{Mg}}\,\alpha_{2\mathrm{Mg},i}, \tag{10.46}$$

where, for example, $\Delta H_{1\mathrm{Ca}}$ and $\alpha_{1\mathrm{Ca},i}$ represent the Ca2+-binding enthalpy for site 1 and the fractional occupancy of that site by Ca2+, respectively. The extent of Ca2+ binding in the presence of a fixed concentration of Mg2+ is described by this equation:

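$$\bar{X}_i = \frac{n_1 k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i}{1 + k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i} + \frac{n_2 k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i}{1 + k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}, \tag{10.47}$$

where $k_{1\mathrm{Ca}}$ and $k_{2\mathrm{Ca}}$ represent the Ca2+-binding constants for sites 1 and 2 and $k_{1\mathrm{Mg}}$ and $k_{2\mathrm{Mg}}$ represent the corresponding Mg2+-binding constants. The two terms on the right-hand side of Eq. (10.47) correspond to $\alpha_{1\mathrm{Ca},i}$ and $\alpha_{2\mathrm{Ca},i}$ in Eq. (10.46). An analogous equation can be written for Mg2+:

$$\bar{Y}_i = \frac{n_1 k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}{1 + k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i} + \frac{n_2 k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}{1 + k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}, \tag{10.48}$$

where the two terms on the right side of the equation correspond to $\alpha_{1\mathrm{Mg},i}$ and $\alpha_{2\mathrm{Mg},i}$ in Eq. (10.46).

Substituting these fractional occupancies into Eq. (10.46) yields the following expression for the cumulative heat of binding:

$$Q_i = n_1\,\frac{\Delta H_{1\mathrm{Ca}} k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}] + \Delta H_{1\mathrm{Mg}} k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]}{1 + k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}] + k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]} + n_2\,\frac{\Delta H_{2\mathrm{Ca}} k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}] + \Delta H_{2\mathrm{Mg}} k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]}{1 + k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}] + k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]}. \tag{10.49}$$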

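Application of Eq. (10.49) requires knowledge of both the free Ca2+ and free Mg2+ concentrations. We begin by solving for $[\mathrm{Ca}^{2+}]_i$ in the absence of Mg2+ and for $[\mathrm{Mg}^{2+}]_i$ in the absence of Ca2+. These estimates, which can be obtained analytically using Eqs. (10.30) to (10.40), constitute lower limits because the presence of the competing ion will displace some of the primary ligand from the protein, and vice versa. However, they can be used as the starting point in a nested bisection routine to obtain the true $[\mathrm{Ca}^{2+}]_i$ and $[\mathrm{Mg}^{2+}]_i$ values. The two equations that must be simultaneously minimized are:

$$f\left([\mathrm{Ca}^{2+}]_i\right) = [\mathrm{Ca}^{2+}]_i + [M]_{t,i}\left(\frac{n_1 k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i}{1 + k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i} + \frac{n_2 k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i}{1 + k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}\right) - [\mathrm{Ca}^{2+}]_{t,i}. \tag{10.50}$$

$$f\left([\mathrm{Mg}^{2+}]_i\right) = [\mathrm{Mg}^{2+}]_i + [M]_{t,i}\left(\frac{n_1 k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}{1 + k_{1\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{1\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i} + \frac{n_2 k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}{1 + k_{2\mathrm{Ca}}[\mathrm{Ca}^{2+}]_i + k_{2\mathrm{Mg}}[\mathrm{Mg}^{2+}]_i}\right) - [\mathrm{Mg}^{2+}]_{t,i}. \tag{10.51}$$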

Typically, we begin with Eq. (10.50), using our preliminary estimates for $x_i$ and $y_i$ (the free Ca2+ and free Mg2+ concentrations, respectively). The resulting improved value of $x_i$ is used in Eq. (10.51) to obtain an improved value of $y_i$. That improved value for $y_i$ is then recycled back into Eq. (10.50) to further refine our estimate of $x_i$. The process continues until the changes in $x_i$ and $y_i$ become acceptably small.
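The alternating refinement of the two free-ion concentrations can be sketched in Python as follows. The sketch assumes two sites with unit stoichiometry by default, uses bisection for each one-dimensional solve, and takes the competitor-free solutions as starting estimates, as in the text. All function names, the convergence tolerance, and the binding constants in the example are hypothetical.

```python
def site_occupancy(lig, other, k_lig, k_other, n):
    """Occupancy of one site by `lig` when `other` competes for the same site."""
    return n * k_lig * lig / (1.0 + k_lig * lig + k_other * other)

def bisect_free(total, m_tot, other_free, k_self, k_other, n, iters=100):
    """Solve one mass balance for one free ion with the competitor held fixed."""
    lo, hi = 0.0, total
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        bound = m_tot * sum(
            site_occupancy(mid, other_free, k_self[s], k_other[s], n[s]) for s in range(2)
        )
        if mid + bound - total > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def free_ca_mg(ca_tot, mg_tot, m_tot, k_ca, k_mg, n=(1.0, 1.0), tol=1e-12, max_cycles=500):
    """Alternate between the Ca2+ and Mg2+ balances until both free values settle."""
    # Starting estimates: each ion solved in the absence of its competitor
    x = bisect_free(ca_tot, m_tot, 0.0, k_ca, k_mg, n)
    y = bisect_free(mg_tot, m_tot, 0.0, k_mg, k_ca, n)
    for _ in range(max_cycles):
        x_new = bisect_free(ca_tot, m_tot, y, k_ca, k_mg, n)       # Ca2+ balance
        y_new = bisect_free(mg_tot, m_tot, x_new, k_mg, k_ca, n)   # Mg2+ balance
        converged = abs(x_new - x) <= tol * ca_tot and abs(y_new - y) <= tol * mg_tot
        x, y = x_new, y_new
        if converged:
            break
    return x, y

# Illustrative constants (assumptions): Ca2+ binds 100-fold more tightly than Mg2+
k_ca = (1.0e6, 1.0e5)
k_mg = (1.0e4, 1.0e3)
ca_free, mg_free = free_ca_mg(ca_tot=1.0e-4, mg_tot=1.0e-3, m_tot=1.0e-4, k_ca=k_ca, k_mg=k_mg)
ca_free_no_mg = bisect_free(1.0e-4, 1.0e-4, 0.0, k_ca, k_mg, (1.0, 1.0))
```

In this example the free Ca2+ concentration with Mg2+ present exceeds the competitor-free estimate, which is the sense in which the starting values are lower limits.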

3.4. Analysis of the divalent ion binding by the S55D/E59D variant of rat α-parvalbumin

The CD site of the S55D/E59D variant is an example of a "pentacarboxylate" ligand array. Five of the six metal-ion coordination positions are occupied by side-chain carboxylates: aspartate at +x, +y, +z, and −x, and glutamate at −z. Although this ligand constellation displays reduced metal ion affinity, relative to a four-carboxylate site, in a model-peptide system (Marsden, Hodges, and Sykes, 1988), heightened Ca2+ affinity is observed when it is introduced into either parvalbumin binding site (Henzl, Hapak, and Goodpasture, 1996; Henzl, Agah, and Larson, 2004; Lee et al., 2004).

As mentioned earlier, the S55D/E59D variant of rat α-parvalbumin exhibits very high affinity for Ca2+. A representative titration of α 55/59 with Ca2+ in 0.15 M KCl, 0.025 M Hepes-KOH, pH 7.4, is displayed in Fig. 10.4A. Note that the injection heats return abruptly to baseline values with the addition of two molar equivalents of Ca2+, indicative of a high-affinity interaction. Fig. 10.4B presents raw data for the titration with Ca2+ in the presence of EGTA. Under our experimental conditions, the enthalpy change associated with the binding of Ca2+ to EGTA is substantially greater than that associated with either EF-hand motif in the parvalbumin. Thus, the appearance of the thermogram suggests that Ca2+ binding occurs in a highly ordered manner, with occupancy of the protein binding sites preceding that of the chelator, an indication that the Ca2+ affinity of the protein greatly exceeds that of EGTA.

Figure 10.4 Representative raw data for the titration of rat α S55D/E59D with Ca2+ (A) and with Ca2+ in the presence of EGTA (B). Both panels plot power (μcal/s) against time (min).

Samples of α 55/59 were subjected to a battery of titrations comparable to that outlined in Table 10.1. Because the Ca2+ affinity is extremely high, the titrations in the presence of NTA were omitted. Instead, the protein was titrated with Ca2+ in the presence of two different levels of EDTA and EGTA. Integrated data for the titrations are displayed in Fig. 10.5. Analysis of these data with the independent-site models described above yielded the optimal least-squares fit indicated by the solid lines in the figure. The optimal parameter values and associated uncertainties are listed in Table 10.2. The first Ca2+-binding constant, presumably corresponding to occupancy of the mutated CD binding site, approaches $2 \times 10^{10}\ \mathrm{M}^{-1}$.

Clearly, the global analysis strategy is applicable to systems exhibiting very high divalent ion affinity. However, we have also applied it to systems having compromised Ca2+ affinity. For example, the combined D94S and G98E mutations in rat α-parvalbumin produce a CD-site ligand array in the environment of the EF site. The divalent ion affinity of the resulting "pseudo-CD site" is seriously attenuated, however, with the association constant down from $1.2 \times 10^{8}\ \mathrm{M}^{-1}$ for the wild-type EF site to $5.4 \times 10^{5}\ \mathrm{M}^{-1}$. Nevertheless, the global fitting method works equally well for this protein (Tanner et al., 2005).

Figure 10.5 Integrated ITC data for the titration of rat α S55D/E59D with Ca2+ (A), with Ca2+ in the presence of EDTA (B, white symbols) and EGTA (B, grey symbols), with Mg2+ (C, white and grey), with Mg2+ in the presence of EDTA (C, black), and with Ca2+ in the presence of Mg2+ (D) at 1 mM (white), 3 mM (grey), and 10 mM (black) Mg2+. The solid lines reflect the best least-squares fit to the data. Each panel plots heat per injection (μcal), with residuals, against [Ca2+] or [Mg2+] (mM).

3.5. Analysis of positively cooperative divalent ion binding

Many ligand-binding phenomena exhibit positive cooperativity, the binding of oxygen by hemoglobin being the archetypal example. In these systems, the initial binding event provokes a conformational change that facilitates subsequent binding events. The Scatchard model, predicated on the apparent absence of binding-site interactions, fails for these systems. Instead, a more general binding equation is required. For the two-site systems treated here, the extent of binding is described by the following equation:

$$\bar{X} = n\,\frac{K_1 x_i + 2K_2 K_1 x_i^2}{1 + K_1 x_i + K_2 K_1 x_i^2}, \tag{10.52}$$

where the factor $n$ allows for nonunity binding stoichiometries. The binding constants in Eq. (10.52), represented by capital letters, are stepwise macroscopic binding constants. The term "macroscopic" is applied because they describe the formation of the three macrostates of the protein: apo (M), singly bound (MX), and doubly bound (MX2). The term "stepwise" emphasizes their relevance to the individual ligation steps (i.e., M → MX → MX2). Alternatively, the extent of binding can be described in terms of the overall macroscopic binding constants, $\beta_1$ and $\beta_2$:

$$\bar{X} = n\,\frac{\beta_1 x_i + 2\beta_2 x_i^2}{1 + \beta_1 x_i + \beta_2 x_i^2}. \tag{10.53}$$

The overall binding constants pertain to the overall conversion from the apo-protein to a given binding state (i.e., $\beta_1$ governs the M → MX reaction; $\beta_2$ governs the M → MX2 reaction).
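The stepwise and overall descriptions are related term by term. Matching the coefficients of $x_i$ and $x_i^2$ in the two forms of the binding polynomial gives the relations below; the symbol $k$ for an intrinsic (site) constant in the second line is introduced here only for illustration.

$$\beta_1 = K_1, \qquad \beta_2 = K_1 K_2, \qquad K_2 = \frac{\beta_2}{\beta_1}.$$

For two equivalent, noninteracting sites with intrinsic constant $k$, the binding polynomial is $(1 + k x_i)^2 = 1 + 2k x_i + k^2 x_i^2$, so that

$$K_1 = 2k, \qquad K_2 = \frac{k}{2}, \qquad \frac{K_2}{K_1} = \frac{1}{4}.$$

A ratio $K_2/K_1$ above 1/4 therefore indicates positive macroscopic cooperativity under the equivalent-site assumption.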

Table 10.2 Divalent ion-binding parameters (a)

| Parameter | EDTA | EGTA | NTA | Rat α S55D/E59D (b) | Parameter (Phl p 7) | Phl p 7 (c) |
|---|---|---|---|---|---|---|
| ΔH1,Ca | −6.3 (−6.4, −5.9) | −7.50 (−7.7, −7.3) | −1.9 (−2.1, −1.7) | −4.6 (−4.8, −4.4) | ΔH1,Ca | −1.9 (−2.0, −1.6) |
| k1,Ca | 4.4 × 10^7 (4.3, 4.5) | 1.7 × 10^7 (1.6, 1.8) | 1.1 × 10^4 (0.9, 1.2) | 1.8 × 10^10 (1.4, 2.2) | K1,Ca | 1.7 × 10^6 (1.5, 2.0) |
| ΔH2,Ca | – | – | – | −4.6 (−4.8, −4.4) | ΔH2,Ca | −4.0 (−4.1, −3.7) |
| k2,Ca | – | – | – | 2.0 × 10^9 (1.6, 2.4) | K2,Ca | 8.1 × 10^6 (6.8, 9.4) |
| ΔH1,Mg | 4.3 (4.2, 4.5) | – | – | 3.5 (3.4, 3.6) | ΔH1,Mg | 2.8 (2.6, 2.9) |
| k1,Mg | 5.5 × 10^5 | – | – | 2.7 × 10^6 (2.1, 3.3) | K1,Mg | 2.8 × 10^4 (2.4, 3.1) |
| ΔH2,Mg | – | – | – | 3.9 (3.8, 4.0) | ΔH2,Mg | 5.7 (5.2, 6.1) |
| k2,Mg | – | – | – |  | K2,Mg | 170 (147, 197) |
| – | – | – | – |  | βCa,Mg | 1.08 × 10^9 (0.49, 1.55) |

(a) Enthalpies are expressed in units of kcal/mol, binding constants in units of M^-1. Numbers in parentheses following the parameter values reflect the 95% confidence intervals.
(b) Values originally reported in Lee et al. (2004).
(c) Values originally reported in Henzl et al. (2008).

In terms of the stepwise constants, the function that must be minimized to obtain the free ligand concentration is given by:

$$f(x_i) = x_i + [M]_{t,i}\,n\,\frac{K_1 x_i + 2K_2 K_1 x_i^2}{1 + K_1 x_i + K_2 K_1 x_i^2} - [X]_{t,i}. \tag{10.54}$$

The corresponding expression for the cumulative heat of binding is:

$$Q_i = 10^6\,V_o\,[M]_{t,i}\,n\,\frac{\Delta H_1 K_1 x_i + (\Delta H_1 + \Delta H_2)K_2 K_1 x_i^2}{1 + K_1 x_i + K_2 K_1 x_i^2}. \tag{10.55}$$

If the titration is performed in the presence of a competing chelator, the function to be minimized becomes

$$f(x_i) = x_i + [M]_{t,i}\,n\,\frac{K_1 x_i + 2K_2 K_1 x_i^2}{1 + K_1 x_i + K_2 K_1 x_i^2} + [C]_{t,i}\,\frac{n_C K_C x_i}{1 + K_C x_i} - [X]_{t,i}, \tag{10.56}$$

where $n_C$ is included to allow for a nonunity binding stoichiometry for the chelator. The corresponding expression for the cumulative heat of binding is then given by

$$Q_i = 10^6\,V_o\left[[M]_{t,i}\,n\,\frac{\Delta H_1 K_1 x_i + (\Delta H_1 + \Delta H_2)K_2 K_1 x_i^2}{1 + K_1 x_i + K_2 K_1 x_i^2} + [C]_{t,i}\,\frac{\Delta H_C n_C K_C x_i}{1 + K_C x_i}\right]. \tag{10.57}$$
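The stepwise (Adair) expressions for the chelator-free case can be evaluated with a short Python sketch. The function names are hypothetical, the total concentrations and enthalpies in the example are illustrative rather than experimental, and the free ligand is found by bisection on $[0, [X]_{t,i}]$ on the assumption that the mass-balance function increases monotonically with $x_i$. The two binding constants are the Phl p 7 Ca2+ values quoted in Table 10.2.

```python
def adair_extent(x, K1, K2, n=1.0):
    """Ligand bound per protein for the two-step Adair model."""
    poly = 1.0 + K1 * x + K2 * K1 * x * x          # binding polynomial
    return n * (K1 * x + 2.0 * K2 * K1 * x * x) / poly

def adair_heat_term(x, K1, K2, dH1, dH2, n=1.0):
    """Enthalpy-weighted population term that multiplies 10^6 * Vo * [M]t in the heat."""
    poly = 1.0 + K1 * x + K2 * K1 * x * x
    return n * (dH1 * K1 * x + (dH1 + dH2) * K2 * K1 * x * x) / poly

def adair_free_ligand(K1, K2, m_tot, x_tot, n=1.0, iters=200):
    """Bisection root of f(x) = x + [M]t * extent(x) - [X]t on [0, x_tot]."""
    lo, hi = 0.0, x_tot
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid + m_tot * adair_extent(mid, K1, K2, n) - x_tot > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

K1, K2 = 1.7e6, 8.1e6                 # stepwise Ca2+ constants for Phl p 7 (M^-1)
m_tot, x_tot = 1.0e-4, 1.0e-4         # illustrative totals (assumption)
x_free = adair_free_ligand(K1, K2, m_tot, x_tot)
saturation = adair_extent(1.0, K1, K2)                      # approaches 2 at high ligand
heat_limit = adair_heat_term(1.0, K1, K2, -2.0, -4.0)       # approaches dH1 + dH2
```

At saturating ligand the extent of binding tends to $2n$ and the heat term to $n(\Delta H_1 + \Delta H_2)$, which gives two quick limiting checks on an implementation.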

If the protein is titrated with ligand X in the presence of competing ion Y, then, as described earlier, we are faced with the task of obtaining self-consistent estimates for both $x_i$ and $y_i$. The situation is somewhat more complicated, however, owing to the possibility of formation of a mixed MXY species. The partition function, or binding polynomial, for this case must include the contributions from M, MX, MX2, MY, MY2, and MXY. The extent of binding of ligand X for this case becomes:

$$\bar{X} = n\,\frac{K_1 x_i + 2K_2 K_1 x_i^2 + \beta_{xy} x_i y_i}{1 + K_1 x_i + K_2 K_1 x_i^2 + \beta_{xy} x_i y_i + K_{1y} y_i + K_{2y} K_{1y} y_i^2}. \tag{10.58}$$

The corresponding expression for Y is

$$\bar{Y} = n\,\frac{K_{1y} y_i + 2K_{2y} K_{1y} y_i^2 + \beta_{xy} x_i y_i}{1 + K_1 x_i + K_2 K_1 x_i^2 + \beta_{xy} x_i y_i + K_{1y} y_i + K_{2y} K_{1y} y_i^2}. \tag{10.59}$$

Thus, the following two equations must be simultaneously minimized:

$$f(x_i) = x_i + [M]_{t,i}\,n\,\frac{K_1 x_i + 2K_2 K_1 x_i^2 + \beta_{xy} x_i y_i}{1 + K_1 x_i + K_2 K_1 x_i^2 + \beta_{xy} x_i y_i + K_{1y} y_i + K_{2y} K_{1y} y_i^2} - [X]_{t,i}. \tag{10.60}$$

$$f(y_i) = y_i + [M]_{t,i}\,n\,\frac{K_{1y} y_i + 2K_{2y} K_{1y} y_i^2 + \beta_{xy} x_i y_i}{1 + K_1 x_i + K_2 K_1 x_i^2 + \beta_{xy} x_i y_i + K_{1y} y_i + K_{2y} K_{1y} y_i^2} - [Y]_{t,i}. \tag{10.61}$$

The expression for the cumulative heat must include the contributions from binding of both ions:

$$Q_i = \Delta H_1\,\alpha_{1,i} + \Delta H_{1y}\,\alpha_{1y,i} + \Delta H_2\,\alpha_{2,i} + \Delta H_{2y}\,\alpha_{2y,i} + \Delta H_{xy}\,\alpha_{xy,i}, \tag{10.62}$$

where $\Delta H_{xy}$ is the enthalpy associated with formation of the mixed complex and $\alpha_{xy,i}$ is the fraction of the macromolecule present as the mixed complex.

3.6. Modeling divalent ion binding by Phl p 7

Phl p 7 belongs to the family of proteins called polcalcins, small pollen-specific proteins containing two EF-hand motifs (Tinghino et al., 2002; Valenta et al., 1998). Although their functions are presently unknown, they are believed to function in pollen-tube elongation. Interestingly, they are potent allergens. Phl p 7 is expressed naturally in timothy grass, Phleum pratense. The cDNA sequence, which was expressed in E. coli, codes for 77 amino acids plus the initiation codon. Although the protein crystallizes as an unusual domain-swapped dimer at pH 3.4 (Verdino et al., 2003), sedimentation analysis at neutral pH shows the protein to be exclusively monomeric in both apo- and Ca2+-loaded states (Henzl et al., 2008). Our interest in the protein grew from the observation that EF-hand 2 in Phl p 7 has a pentacarboxylate ligand array identical to that in the S55D or G98D variants of rat β-parvalbumin (i.e., aspartyl carboxylates at +x, +y, +z, and −x; main-chain carbonyl at −y; and glutamate at −z). As discussed earlier, this ligand array imparts heightened divalent ion affinity on the parvalbumin CD and EF sites. It was therefore of interest to see how it performed in an entirely different protein background.

Samples of Phl p 7 were subjected to a battery of titrations similar to that employed for the S55D/E59D variant of rat α-parvalbumin, but including titrations in the presence of NTA. Raw data for the titration with Ca2+ and with Ca2+ in the presence of EGTA are displayed in Fig. 10.6. The difference in the appearance between the raw data for this protein and that of the S55D/E59D variant offers some insight into the capacity of this method for extracting binding parameters from very diverse systems. Integrated data for the entire suite of titrations are presented in Fig. 10.7.

Figure 10.6 Representative raw data for the titration of the polcalcin Phl p 7 with Ca2+ (A) and with Ca2+ in the presence of EGTA (B). Both panels plot power (μcal/s) against time (min).

Figure 10.7 Integrated ITC data for the titration of Phl p 7 with Ca2+ (A, white and grey), with Ca2+ in the presence of EGTA (B, white), EDTA (B, grey), and 0.1 mM NTA (B, black), with Mg2+ (C, white), with Mg2+ in the presence of EDTA (C, black), with Ca2+ in the presence of Mg2+ at 1.0 mM (D, white), 5.0 mM (D, grey), and 10 mM (D, black), and with Ca2+ in the presence of 1.0 mM NTA (D, white rectangles). The solid lines represent the optimal least-squares fit. Each panel plots injection heat (μcal), with residuals, against the total titrant concentration (mM).

The independent two-site model described above failed to produce a satisfactory fit to the data. In retrospect, this result was not surprising. Because the polcalcins are proposed to function as Ca2+-dependent regulatory proteins, it was quite plausible that Ca2+ binding would be accompanied by a substantial conformational change, which might well be manifested in positively cooperative metal-ion binding. Because positive macroscopic cooperativity cannot be described by a Scatchard-type model, it was necessary to employ a general Adair model. The optimal fit obtained with this system of equations is indicated by the solid lines in Fig. 10.7. The corresponding parameter values are listed in Table 10.2.

The results yielded substantial insight into this protein. The binding of Ca2+ was decidedly cooperative, with $K_1$ and $K_2$ values of $1.7 \times 10^6$ and $8.0 \times 10^6$, respectively. These values yield a ratio of $K_2 : K_1$ of 4.7. For equivalent binding sites, in the absence of cooperativity, this ratio would be 0.25. Thus, the minimum coupling free energy for Ca2+ binding to Phl p 7 is $RT \ln(4.7/0.25) = 1.7$ kcal/mol. In fact, the coupling energy could be substantially larger because the two EF sites are decidedly nonequivalent. Whereas site 1 has just three anionic ligands, carboxylates occupy five of the six coordination sites in site 2. As discussed previously, in the rat α- and β-parvalbumin isoforms, the pentacarboxylate ligand arrays exhibit heightened divalent ion affinity. It is therefore likely that the initial Ca2+-binding event occurs at site 2.

The positively cooperative Ca2+-binding behavior can be interpreted in terms of a classic allosteric model, in which the protein is capable of occupying two distinct conformations having different affinities for ligand. In this case, the molecule resides largely in the low-affinity T state in the absence of ligand. Binding of the first ligand is discouraged because the preferred form for binding is present at very low concentrations and because the net binding free energy includes a penalty for the conformational change to the high-affinity R state. By contrast, binding of the second ligand occurs more readily because the protein resides almost exclusively in the R state. Thus, the net affinity associated with binding the second ligand can exceed that for the first, even though the second site has intrinsically lower affinity for the ligand. Spectroscopic data indicate that, indeed, Ca2+ binding is accompanied by a substantial conformational change. Far-UV circular dichroism measurements reveal a significant increase in molar ellipticity in the presence of Ca2+. Moreover, ANS fluorescence increases markedly in the presence of Ca2+-bound Phl p 7, whereas it is virtually unchanged in the presence of the apo-protein.

In contrast to Ca2+ binding, the binding of Mg2+ to Phl p 7 is highly sequential. The two association constants, $2.78 \times 10^4$ and $170\ \mathrm{M}^{-1}$, differ by more than two orders of magnitude. Consistent with the apparent absence of cooperativity, the binding of Mg2+ does not appear to provoke a significant conformational change. The CD spectrum is unchanged in the presence of Mg2+, and ANS fluorescence does not increase upon addition of the Mg2+-bound Phl p 7.

Typically, the ratio of Ca2+- to Mg2+-binding constants for an EF-hand site is on the order of 1–2 × 10^4. In Phl p 7, however, the ratio of binding constants ($K_1 : K_{1M}$) associated with site 2 is a mere 60. Two factors conspire to produce this anomalously low value. First, whereas Ca2+ binding at site 2 is linked to an energetically costly conformational change, the Mg2+-binding event evidently is not. Thus, whereas $K_1$ is substantially smaller than the intrinsic binding constant, $K_{1M}$ approaches the intrinsic value. Secondly, the pentacarboxylate ligand array confers exceptionally high intrinsic Mg2+ affinity on site 2. By contrast, the ratio of Ca2+ and Mg2+ binding constants associated with site 1 is atypically large, at $4.7 \times 10^4$. Presumably, this elevated ratio reflects the enhancement of site 1 Ca2+ affinity, but not Mg2+ affinity, by the positively cooperative interaction with site 2.
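The arithmetic behind the coupling free energy and the Ca2+/Mg2+ selectivity ratios quoted in this section can be reproduced with a few lines of Python. The temperature of 25 °C is an assumption made here, since the section does not restate it; the variable names are illustrative, and the result is reported as a magnitude.

```python
import math

R_KCAL = 1.987e-3      # gas constant, kcal mol^-1 K^-1
T = 298.15             # assumed temperature, K (25 degrees C)

K1_CA, K2_CA = 1.7e6, 8.0e6      # stepwise Ca2+ constants quoted in the text (M^-1)
K1_MG, K2_MG = 2.78e4, 170.0     # stepwise Mg2+ constants quoted in the text (M^-1)

ratio = K2_CA / K1_CA                     # about 4.7
reference = 0.25                          # K2/K1 for equivalent, noninteracting sites
coupling = R_KCAL * T * math.log(ratio / reference)   # magnitude, kcal/mol

selectivity_first = K1_CA / K1_MG         # first binding step (assigned to site 2), about 60
selectivity_second = K2_CA / K2_MG        # second binding step (site 1), about 4.7e4
```

With these inputs the coupling term evaluates to roughly 1.7 kcal/mol, in line with the value given in the text.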


4. Conclusion

Global analysis of direct and competitive ITC experiments offers a robust alternative to radiochemical or fluorescence-based methods for measuring divalent ion affinity in EF-hand proteins. The approach can be used to model either cooperative or noncooperative binding data. It has been used to estimate Ca2+-binding constants exceeding $10^{10}\ \mathrm{M}^{-1}$, as well as Mg2+-binding constants on the order of $10^{2}\ \mathrm{M}^{-1}$. The global ITC strategy has the advantage over other methods that it furnishes estimates of the binding enthalpies for Ca2+ and Mg2+ directly, thereby providing a more complete thermodynamic description of divalent ion binding.

ACKNOWLEDGMENT

This work was supported by NSF award MCB0543476 (to M.T.H.).

REFERENCES

Beechem, J. M. (1992). Global analysis of biochemical and biophysical data. Methods Enzymol. 210, 37–54.
Bevington, P. R. (1969). Data Reduction and Error Analysis for the Physical Sciences, McGraw-Hill, New York.
Bevington, P. R., and Robinson, D. K. (1992). Data Reduction and Error Analysis for the Physical Sciences, 2nd ed. WCB/McGraw-Hill, Boston.
Celio, M. R., Pauls, T., and Schwaller, B. (1996). Guidebook to the Calcium-Binding Proteins, Oxford University Press, Oxford.
Eatough, D. (1970). Calorimetric determination of equilibrium constants for very stable metal-ligand complexes. Anal. Chem. 42, 635–639.
Freyer, M. W., and Lewis, E. A. (2008). Isothermal titration calorimetry: Experimental design, data analysis, and probing macromolecule/ligand binding and kinetic interactions. Methods Cell Biol. 84, 79–113.
Fukada, H., and Takahashi, K. (1998). Enthalpy and heat capacity changes for the proton dissociation of various buffer components in 0.1 M potassium chloride. Proteins 33, 159–166.
Goldberg, R. N., Kishore, N., and Lennen, R. M. (2002). Thermodynamic quantities for the ionization reactions of buffers. J. Phys. Chem. Ref. Data 31, 231–370.
Haiech, J., Derancourt, J., Pechere, J. F., and Demaille, J. G. (1979). Magnesium and calcium binding to parvalbumins: Evidence for differences between parvalbumins and an explanation of their relaxing function. Biochemistry 18, 2752–2758.
Heizmann, C. W., and Kagi, U. (1989). Structure and function of parvalbumin. Adv. Exper. Med. Biol. 255, 215–222.
Henzl, M. T., Agah, S., and Larson, J. D. (2004). Rat α- and β-parvalbumins: Comparison of their pentacarboxylate and site-interconversion variants. Biochemistry 43, 9307–9319.
Henzl, M. T., Davis, M. E., and Tan, A. (2008). Divalent ion binding properties of the timothy grass allergen, Phl p 7. Biochemistry 47, 7846–7856.
Henzl, M. T., Hapak, R. C., and Goodpasture, E. A. (1996). Introduction of a fifth carboxylate ligand heightens the affinity of the oncomodulin CD and EF sites for Ca2+. Biochemistry 35, 5856–5869.
Hu, D. D., and Eftink, M. R. (1994). Thermodynamic studies of the interaction of trp aporepressor with tryptophan analogs. Biophys. Chem. 49, 233–239.
Johnson, M. L., Correia, J. J., Yphantis, D. A., and Halvorson, H. R. (1981). Analysis of data from the analytical ultracentrifuge by nonlinear least-squares techniques. Biophys. J. 36, 575–588.
Johnson, M. L., and Faunt, L. M. (1992). Parameter estimation by least-squares methods. Methods Enzymol. 210, 1–37.
Kawasaki, H., and Kretsinger, R. H. (1995). Calcium-binding proteins 1: EF-hands. Protein Profile 2, 297–490.
Kretsinger, R. H. (1980). Structure and evolution of calcium-modulated proteins. CRC Critical Reviews in Biochemistry 8, 119–174.
Kretsinger, R. H., and Nockolds, C. E. (1973). Carp muscle calcium-binding protein. II. Structure determination and general description. J. Biol. Chem. 248, 3313–3326.
Ladbury, J. E., and Chowdhry, B. Z. (1996). Sensing the heat: The application of isothermal titration calorimetry to thermodynamic studies of biomolecular interactions. Chem. Biol. 3, 791–801.
Leavitt, S., and Freire, E. (2001). Direct measurement of protein binding energetics by isothermal titration calorimetry. Curr. Opin. Struct. Biol. 11, 560–566.
Lee, Y. H., Tanner, J. J., Larson, J. D., and Henzl, M. T. (2004). Crystal structure of a high-affinity variant of rat α-parvalbumin. Biochemistry 43, 10008–10017.
Lewis, E. A., and Murphy, K. P. (2005). Isothermal titration calorimetry. Methods Mol. Biol. 305, 1–16.
Marsden, B. J., Hodges, R. S., and Sykes, B. D. (1988). 1H NMR studies of synthetic peptide analogues of calcium-binding site III of rabbit skeletal troponin C: Effect on the lanthanum affinity of the interchange of aspartic acid and asparagine residues at the metal ion coordinating positions. Biochemistry 27, 4198–4206.
Pauls, T. L., Cox, J. A., and Berchtold, M. W. (1996). The Ca2+-binding proteins parvalbumin and oncomodulin and their genes: New structural and functional findings. Biochim. Biophys. Acta 1306, 39–54.
Pepper, D. S. (1998). Some Alternative Coupling Chemistries for Affinity Chromatography, in Protein Protocols on CD-ROM, (Walker, ed.), Humana Press, Totowa, N.J.
Philo, J. S. (1997). An improved function for fitting sedimentation velocity data for low-molecular-weight solutes. Biophys. J. 72, 435–444.
Schuck, P. (2000). Size-distribution analysis of macromolecules by sedimentation velocity ultracentrifugation and Lamm equation modeling. Biophys. J. 78, 1606–1619.
Sigurskjold, B. W. (2000). Exact analysis of competition ligand binding by displacement isothermal titration calorimetry. Anal. Biochem. 277, 260–266.
Sigurskjold, B. W., Berland, C. R., and Svensson, B. (1994). Thermodynamics of inhibitor binding to the catalytic site of glucoamylase from Aspergillus niger determined by displacement titration calorimetry. Biochemistry 33, 10191–10199.
Strynadka, N. C. J., and James, M. N. G. (1989). Crystal structures of the helix-loop-helix calcium-binding proteins. Ann. Rev. Biochem. 58, 951–998.
Tanner, J. J., Agah, S., Lee, Y. H., and Henzl, M. T. (2005). Crystal structure of the D94S/G98E variant of rat α-parvalbumin. An explanation for the reduced divalent ion affinity. Biochemistry 44, 10966–10976.
Tellinghuisen, J. (2004). Statistical error in isothermal titration calorimetry. Methods Enzymol. 383, 245–282.
Tinghino, R., Twardosz, A., Barletta, B., Puggioni, E. M., Iacovacci, P., Butteroni, C., Afferni, C., Mari, A., Hayek, B., Di, F. G., Focke, M., Westritschnig, K., et al. (2002). Molecular, structural, and immunologic relationships between different families of recombinant calcium-binding pollen allergens. J. Allergy & Clin. Immunol. 109, 314–320.
Valenta, R., Hayek, B., Seiberler, S., Bugajska-Schretter, A., Niederberger, V., Twardosz, A., Natter, S., Vangelista, L., Pastore, A., Spitzauer, S., and Kraft, D. (1998). Calcium-binding allergens: From plants to man. Intl. Arch. Allergy & Immunol. 117, 160–166.
Verdino, P., Westritschnig, K., Valenta, R., and Keller, W. (2003). Three-dimensional structure of the panallergen Phl p 7. Intl. Arch. Allergy & Immunol. 130, 10–11.

CHAPTER ELEVEN

Energetic Profiling of Protein Folds

Jason Vertrees,*,† James O. Wrabl,*,† and Vincent J. Hilser*,†

Contents
1. Introduction
2. Modeling the Native State Ensemble of Proteins using Statistical Thermodynamics
3. Energetic Profiles of Proteins Derived from Thermodynamics of the Native State Ensemble
4. Principal Components Analysis of Energetic Profile Space
5. Energetic Profiles are Conserved Between Homologous Proteins
6. Direct Alignment of Energetic Profiles Based on a Variant of the CE Algorithm
7. CE Algorithm Described for Structure Coordinates
8. Necessary Deviations from the CE Algorithm to Accommodate Energetic Profiles
9. Towards a Thermodynamic Homology of Fold Space: Clustering Energetic Profiles using STEPH
10. Energetic Profiles Provide a Vehicle to Discover Conserved Substructures in the Absence of Known Homology
11. Conclusion
Acknowledgments
References

Abstract

Current protein classification methods treat high-resolution structures as static entities. However, experiments have well documented the dynamic nature of proteins. With knowledge that thermodynamic fluctuations around the high-resolution structure contribute to a more physically accurate and biologically meaningful picture of a protein, the concept of a protein's energetic profile is introduced. It is demonstrated on a large scale that energetic profiles are both diagnostic of a protein fold and evolutionarily relevant. Development of Structural Thermodynamic Ensemble-based Protein Homology (STEPH), an algorithm that searches for local similarities between energetic profiles, constitutes a first step towards a long-term goal of our laboratory to integrate thermodynamic information into protein-fold classification approaches.

* Department of Biochemistry and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
† Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04211-0. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Fold classification is indispensable for rationalizing evolutionary relationships between proteins (Kinch and Grishin, 2002; Lecomte et al., 2005; Russell, 2002). Many classification schemes exist, each organizing fold space from a different viewpoint (Finn et al., 2008; Greene et al., 2007; Heger et al., 2007; Kriventseva et al., 2001; Murzin et al., 1995; Shindyalov and Bourne, 2000; Tatusov et al., 2003). Classification can be based on criteria including sequence, structure, or function, with the most useful schemes integrating information from several sources and supplemented by expert curation. Naturally, schemes based on different criteria exhibit differences in classification (Alva et al., 2008; Day et al., 2003; Kolodny et al., 2006; Taylor, 2007). These differences sometimes result in conflicting biological interpretations.

It is our hypothesis that a subset of the differences between classification schemes is caused by neglect of a fundamental property of proteins; namely, that proteins exhibit dynamic fluctuations. Fluctuations are amply documented by experiment (Henzler-Wildman and Kern, 2007; Igumenova et al., 2006; Mittermaier and Kay, 2006) and indicate the presence of an ensemble of conformations under physiological conditions. Thus, the biologically relevant character of even the most stable proteins is most accurately described by the formalism of statistical thermodynamics. However, all current classification schemes that we are aware of ignore the energetic and dynamic variations within individual proteins, treating each high-resolution structure as merely a single rigid entity out of the billions of conformers sampled by each protein. At best, this treatment results in an incomplete understanding of fold space. At worst, it is possible that important evolutionary relationships, based on shared energetic patterns, are overlooked or incorrectly inferred.
Integration of dynamic information into protein-fold classification schemes is a long-term goal of our laboratory. Through a combination of proven algorithms and emerging methodology, two initial steps have now been achieved. First, detailed descriptions of local energetics have been calculated for a sampling of the complete protein-fold space. These descriptions are termed "energetic profiles" of proteins. Second, it has become clear that energetic profiles are thermodynamic fingerprints, diagnostic of a protein fold without regard to sequence, structure, or functional information. Interestingly, energetic profiles often, but not always, complement such information. Therefore, energetic profiling of fold space has become crucial to development of a future evolutionary classification scheme integrating sequence, structure, and function with thermodynamic information. It is hoped that such a classification will ultimately reconcile differences in existing schemes and lead to a greater understanding of principles of protein structure and mechanisms of protein evolution.

2. Modeling the Native State Ensemble of Proteins using Statistical Thermodynamics

The energetic profile of a protein is defined by thermodynamic properties as a function of residue position (Fig. 11.1). In contrast to the more common sequence-based profile of a protein family, which contains only amino acid frequencies (Dunbrack, 2006), an energetic profile may contain one or more thermodynamic properties, such as Gibbs free energy of stability, enthalpy, and entropy. These properties can be determined experimentally for individual proteins in favorable cases, for example, by NMR-detected hydrogen exchange (Ferreon et al., 2003; Milne et al., 1999; Orban et al., 1995). However, it is difficult to obtain experimental measurements on large numbers of proteins. Therefore, energetic profiling of fold space demands a computational approach.

Our approach is to model the native state ensemble of a protein using statistical thermodynamics (Hilser, 2001; Hilser et al., 2006). The basic recipe is as follows: make many virtual copies of the protein structure; slightly perturb each copy to create many microstates, each indexed below by the subscript i; estimate the Gibbs free energy and statistical weight for each microstate (Eq. (11.1)); and, finally, calculate the relative Boltzmann-weighted probability of each microstate by comparing its statistical weight to the sum over all microstates in the ensemble (Eq. (11.2)). Formally, for a single microstate i, its statistical weight K_i is defined as follows:

\[ K_i = e^{-\Delta G_i / RT}, \tag{11.1} \]

where ΔG_i is the Gibbs free energy of the microstate, R is the gas constant, and T is the temperature. We can now calculate the Boltzmann-weighted probability P_i of each microstate as the ratio of the statistical weight of that state to the sum of the statistical weights of all N microstates:

\[ P_i = \frac{K_i}{Q} = \frac{K_i}{\sum_{i=1}^{N} K_i}, \tag{11.2} \]

where K_i was calculated in Eq. (11.1) and Q is the sum of all the statistical weights, also known as the partition function.

The COREX algorithm was designed to implement the strategy described above (Hilser and Freire, 1996). COREX models the native state ensemble of a protein using as input a high-resolution structure and a surface-area parameterized energy function. The protein is partitioned, by use of a sliding window, into regions of local folding and unfolding.
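As a concrete illustration of Eqs. (11.1) and (11.2), the following minimal Python sketch converts a list of microstate free energies into Boltzmann-weighted probabilities. It is not the COREX implementation; the function name, the value of the gas constant in kcal/(mol K), and the four free energies are assumptions chosen only for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K); assumed unit convention


def boltzmann_probabilities(delta_g, temperature=298.15):
    """Return the probability of each microstate from its free energy (kcal/mol)."""
    rt = R_KCAL * temperature
    # Eq. (11.1): statistical weight of each microstate
    weights = [math.exp(-dg / rt) for dg in delta_g]
    # Partition function Q: the sum of all statistical weights
    q = sum(weights)
    # Eq. (11.2): probability is the statistical weight divided by Q
    return [k / q for k in weights]


# Hypothetical free energies (kcal/mol) for a four-microstate toy ensemble
toy_delta_g = [0.0, 1.5, 1.5, 6.0]
toy_probabilities = boltzmann_probabilities(toy_delta_g)
```

Because every weight is divided by the same partition function, the probabilities sum to one, and microstates of equal free energy receive equal probability.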

[Figure 11.1, panels A–C: energetic value (kcal/mol) plotted against residue number for (A) engrailed homeodomain (α), (B) SH3 domain (β), and (C) class σ GST, N-terminal domain (α/β). Curves shown: ΔG, ΔHapol, ΔHpol, and −TΔSconf. N and C termini are marked on each structure.]

Figure 11.1 Energetic profiles of three diverse protein structures. These profiles consist of local stability (ΔG), apolar solvation enthalpy (ΔHapol), polar solvation enthalpy (ΔHpol), and conformational entropy (−TΔSconf). The COREX algorithm (window size 5 residues, minimum window size 4 residues, entropy weighting factor 0.5, simulated pH of 7.0, temperature 25.0 °C) was run on three proteins: (A) Drosophila engrailed homeodomain (PDB code 1p7iA, SCOP sid d1p7ia, SCOP structural class all-alpha, a.4.1.1), (B) mouse SH3 domain (PDB code 1ckaA, SCOP sid d1ckaa1, SCOP structural class all-beta, b.34.2.1), (C) human class sigma glutathione S-transferase, N-terminal domain (PDB code 1iyhA, SCOP sid d1iyha2, SCOP structural class alpha/beta, c.47.1.5). DSSP secondary structure (Kabsch and Sander, 1983) is indicated immediately above the x-axis, helices as cylinders and strands as arrows. Rainbow colors indicate progression from N to C terminus to aid in the reader's mapping of locations along the primary sequence to locations in the tertiary structure. All energetic values vary as a function of location in the protein structure, a result observed by experiment but not anticipated by treatment of the structure as a rigid entity.

These regions, when taken in all possible combinations between a fully folded and a fully unfolded protein, result in an exponentially large number of conformational microstates varying in surface area exposure. Each microstate can then be Boltzmann weighted by the parameterized energy function based on the Gibbs-Helmholtz expression:

\[ \Delta G_i(T) = \Delta H_i(T_{ref}) - T\,\Delta S_i(T_{ref}) + \Delta C_{p,i}\left[(T - T_{ref}) - T \ln\left(\frac{T}{T_{ref}}\right)\right]. \tag{11.3} \]

In Eq. (11.3), T is the temperature of the simulated ensemble, T_ref is a reference temperature for the parameterization, ΔH_i(T_ref) is the enthalpy of microstate i at the reference temperature, ΔS_i(T_ref) is the entropy of microstate i at the reference temperature, and ΔC_p,i is the heat capacity of microstate i. The enthalpy, entropy, and heat capacity have long been experimentally parameterized as functions of surface-area exposure (Hilser and Freire, 1996; Hilser et al., 2006), and it was shown (Wrabl et al., 2002) that Eq. (11.3) can be rearranged to obtain:

\[ \Delta G_i = \Delta H_{apol,i} + \Delta H_{pol,i} - T\Delta S_{apol,i} - T\Delta S_{pol,i} - T\Delta S_{conf,i}. \tag{11.4} \]

In Eq. (11.4), apol and pol refer to apolar and polar surface area exposure, respectively, and ΔS_conf is a separately parameterized conformational entropy term. The output of the algorithm is thus an energetically reasonable model of the protein's native state ensemble. These calculations have been implemented as a community resource in a Web-based server, BEST (Biological Ensemble-based Statistical Thermodynamics, http://www.best.utmb.edu) (Vertrees et al., 2005); input is simply a PDB-style coordinate file of a protein and output is the ensemble-based energetic calculation. Furthermore, recent developments largely obviate the need for a crystal structure, as good approximations to COREX results can be obtained from primary sequence alone (Gu and Hilser, 2008).

It is important to emphasize that the COREX algorithm has been extensively validated by experiment. The original version was shown to accurately recapitulate the observed native-state hydrogen-exchange protection factors for five proteins, including a blind prediction (Hilser and Freire, 1996). Later work modeled the pH dependence of protein stability (Whitten et al., 2005) and the temperature dependence of protein stability (Babu et al., 2004; Whitten et al., 2006), predicted functional sites in proteins (Liu et al., 2007), and rationalized the relationship between cooperativity and ligand binding (Pan et al., 2000; Whitten et al., 2008).
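The temperature dependence in Eq. (11.3) can be sketched in a few lines of Python. This is only an illustration of the Gibbs-Helmholtz expression itself, not of the surface-area parameterization used by COREX; the function name and all numerical inputs below are hypothetical.

```python
import math


def gibbs_helmholtz(dh_ref, ds_ref, dcp, temperature, t_ref):
    """Eq. (11.3): free energy of one microstate at `temperature` (kelvin).

    dh_ref and ds_ref are the enthalpy and entropy at the reference
    temperature t_ref; dcp is the heat capacity, assumed constant in T.
    """
    heat_capacity_term = (temperature - t_ref) - temperature * math.log(temperature / t_ref)
    return dh_ref - temperature * ds_ref + dcp * heat_capacity_term


# Hypothetical microstate: values in kcal/mol and kcal/(mol*K)
dg_at_25c = gibbs_helmholtz(dh_ref=20.0, ds_ref=0.05, dcp=0.3,
                            temperature=298.15, t_ref=333.15)
```

At the reference temperature the heat capacity term vanishes and the expression reduces to ΔH_i(T_ref) − TΔS_i(T_ref), which provides a quick sanity check on any implementation.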

3. Energetic Profiles of Proteins Derived from Thermodynamics of the Native State Ensemble

With the native-state ensemble in hand, the rich detail of a protein's thermodynamic character can be observed. Notably, this detail is invisible from treatment of the protein as a static structure. Ensemble-average stabilities, enthalpies, and entropies can be computed for a residue j of interest as follows (Wrabl et al., 2002):

\[ \langle \Delta G \rangle_j = \langle \Delta H \rangle_j - \langle T\Delta S \rangle_j. \tag{11.5} \]

\[ \langle \Delta H \rangle_j = \langle \Delta H_{apol} \rangle_j + \langle \Delta H_{pol} \rangle_j. \tag{11.6} \]

\[ \langle T\Delta S \rangle_j = \langle T\Delta S_{apol} \rangle_j + \langle T\Delta S_{pol} \rangle_j + \langle T\Delta S_{conf} \rangle_j. \tag{11.7} \]

Additionally, the individual apolar and polar surface area terms of the energy function may be parsed, as well as the conformational entropy term (Wrabl et al., 2002):

\[ \langle \Delta H_{apol} \rangle_j = \sum_{i=1}^{N_{folded,j}} P_i^{folded,j}\,\Delta H_{apol,i} - \sum_{i=1}^{N_{unfolded,j}} P_i^{unfolded,j}\,\Delta H_{apol,i}. \tag{11.8} \]

\[ \langle \Delta H_{pol} \rangle_j = \sum_{i=1}^{N_{folded,j}} P_i^{folded,j}\,\Delta H_{pol,i} - \sum_{i=1}^{N_{unfolded,j}} P_i^{unfolded,j}\,\Delta H_{pol,i}. \tag{11.9} \]

\[ \langle \Delta S_{apol} \rangle_j = \sum_{i=1}^{N_{folded,j}} P_i^{folded,j}\,\Delta S_{apol,i} - \sum_{i=1}^{N_{unfolded,j}} P_i^{unfolded,j}\,\Delta S_{apol,i}. \tag{11.10} \]

\[ \langle \Delta S_{pol} \rangle_j = \sum_{i=1}^{N_{folded,j}} P_i^{folded,j}\,\Delta S_{pol,i} - \sum_{i=1}^{N_{unfolded,j}} P_i^{unfolded,j}\,\Delta S_{pol,i}. \tag{11.11} \]

\[ \langle \Delta S_{conf} \rangle_j = \sum_{i=1}^{N_{folded,j}} P_i^{folded,j}\,\Delta S_{conf,i} - \sum_{i=1}^{N_{unfolded,j}} P_i^{unfolded,j}\,\Delta S_{conf,i}. \tag{11.12} \]
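The residue-level quantities of Eqs. (11.8)–(11.12) all share one pattern: an average over the microstates in which residue j is folded, compared with an average over the microstates in which it is unfolded. The Python sketch below illustrates that pattern under two assumptions that should be checked against Wrabl et al. (2002) before any real use: that the superscripted probabilities are microstate probabilities renormalized within each subensemble, and that the two sums are combined as folded minus unfolded. The data layout (a list of dictionaries) and all numbers are hypothetical.

```python
def subensemble_average(states, key):
    """Average of states[key], weighted by probabilities renormalized within the subset."""
    total = sum(s["P"] for s in states)
    if total == 0.0:
        raise ValueError("subensemble has zero total probability")
    return sum(s["P"] / total * s[key] for s in states)


def residue_quantity(states, residue, key):
    """Folded-subensemble average minus unfolded-subensemble average for one residue."""
    # Assumed convention: a residue is folded in a microstate if it is in s["folded"]
    folded = [s for s in states if residue in s["folded"]]
    unfolded = [s for s in states if residue not in s["folded"]]
    return subensemble_average(folded, key) - subensemble_average(unfolded, key)


# Hypothetical three-microstate ensemble of a three-residue protein
toy_states = [
    {"P": 0.5, "folded": {1, 2, 3}, "dH_pol": 0.0},
    {"P": 0.3, "folded": {1, 2}, "dH_pol": 2.0},
    {"P": 0.2, "folded": {3}, "dH_pol": 10.0},
]
value_residue_1 = residue_quantity(toy_states, 1, "dH_pol")
```

Each microstate contributes to exactly one of the two subensembles for a given residue, so the numbers of folded and unfolded microstates always add up to the size of the full ensemble.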

In Eqs. (11.8–11.12), i is the microstate index over the subsets of microstates in which residue j is either folded or unfolded. (N_folded,j + N_unfolded,j = N, the total number of microstates in the ensemble, identical to the N of Eq. (11.2).) All of these thermodynamic quantities potentially comprise the energetic profile of a protein. Note that many quantities are presently impossible to obtain by experiment.

Three detailed examples of energetic profiles are displayed in Fig. 11.1. The examples were chosen to represent different secondary structural classes of protein: the all-alpha engrailed homeodomain (Fig. 11.1A), the all-beta SH3 domain (Fig. 11.1B), and the alpha/beta glutathione S-transferase N-terminal domain (Fig. 11.1C). Despite the obvious differences in secondary and tertiary structure among the three proteins, there are no obvious differences reflected in the energetic profiles. In other words, it is impossible to distinguish the all-alpha protein from the all-beta protein by the graphical features of the energetic profiles. Indeed, we view energetic profiles as transcending the traditional definitions of secondary and tertiary structure (Wrabl et al., 2001).

Two features are evident upon inspection of the stability profiles. First, there is significant variability in the local stability within a particular protein. Second, the stability can be nearly equal across the residues of a secondary structural element (e.g., helix 2 in Fig. 11.1C), or in many cases it can vary widely (e.g., helix 3 in Fig. 11.1A). The latter point undermines the notion that secondary structures correspond to de facto cooperative units.

The four thermodynamic quantities displayed in Fig. 11.1 are the overall stability and the three component parameters that fully define the COREX energy function (i.e., the apolar and polar enthalpy and conformational entropy; Eq. (11.4)). These four canonical parameters have been historically used with good success in fold recognition experiments (Larson and Hilser, 2004; Wang et al., 2008; Wrabl et al., 2002), suggesting that the properties provide an adequate thermodynamic description of a protein. However, it turns out that these four parameters are not mathematically independent, and thus do not provide mechanistic details about the underlying origins of the different processes responsible for the energetics at each position in a protein. To elucidate these underlying processes, we applied principal components analysis.

4. Principal Components Analysis of Energetic Profile Space

Clustering of the energetic profile space of a number of structurally diverse proteins indicates that it is meaningful to simplify the space into a small number of so-called thermodynamic environments (Larson and Hilser, 2004). Furthermore, it was discovered that specific amino acid types nonrandomly distribute into certain thermodynamic environments (Larson and Hilser, 2004; Wrabl et al., 2001, 2002). Although these thermodynamic propensities were found to be useful for protein-fold recognition, the biophysical explanation remains elusive. Fortunately, principal components analysis (PCA) of energetic profile space provides an explanation.

PCA was performed on 120 human proteins, comprising more than 17,000 residues. Briefly, ΔG, ΔHapol, ΔHpol, and TΔSconf were computed as a function of residue for each protein using the COREX algorithm as described previously. These four-dimensional data were centered and the eigenvectors and eigenvalues (Table 11.1) were computed from the covariance matrix using standard methods (Manly, 1986; Press et al., 1992; Vertrees, 2008). Of note is that the first three principal components captured 99.9% of the total variance in the dataset (Table 11.1, Fig. 11.2).

Table 11.1 Eigenvectors (principal components, PC) and eigenvalues (variances) of four-dimensional energetic profile space

Eigenvector            ΔG      ΔHapol   ΔHpol    TΔSconf   Eigenvalue   Fractional variance
PC1                    0.55    0.65     0.51     0.09      24.07        0.752
PC2                    0.15    0.69     0.70     0.11      7.04         0.220
PC3                    0.59    0.22     0.23     0.74      0.85         0.027
PC4                    0.57    0.23     0.44     0.66      0.02         0.001
Average energetics(a)  8.14    9.53     11.72    4.56

(a) In units of kcal/mol, at simulated temperature and pH of 25.0 °C and 7.0, respectively.

[Figure 11.2: bar chart of percentage of total variance (0–100) for PC1, PC2, PC3, and PC4.]

Figure 11.2 Variance of four-dimensional energetic profile space explained by principal components. Percentage variance of each principal component (i.e., eigenvalue in Table 11.1) is displayed. Clearly the first principal component accounts for the majority of variance and the first two components account for almost all variance. Therefore, subsequent energetic profiles may be greatly simplified by using only the first component instead of the four thermodynamic quantities ΔG, ΔHapol, ΔHpol, and TΔSconf.

There was a sharp decrease in the magnitude of the eigenvalues, indicating a nonrandom signal and supporting the use of principal components analysis in this application. Notably, principal component 1 (PC1) explained more than 75% of the variance. Interpretation of the eigenvectors as a rotation matrix indicated the linear combination of energetic terms that defined the principal axes (Table 11.1). These results provided a biophysical understanding of the energetic profile space of proteins (J. Vertrees, J. O. Wrabl, and V. J. Hilser, manuscript in preparation).
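The relationship between the eigenvalues and the fractional variances of Table 11.1 can be checked with a short Python sketch. The eigenvalues are those printed in the table; the `project` helper, which centers one residue's four energetic values and projects them onto a single eigenvector, is a hypothetical illustration of the rotation step and is not taken from the authors' code.

```python
# Eigenvalues (variances along PC1..PC4) as printed in Table 11.1
eigenvalues = [24.07, 7.04, 0.85, 0.02]

total_variance = sum(eigenvalues)
# Fractional variance of each component: eigenvalue divided by the total
fractional_variance = [ev / total_variance for ev in eigenvalues]

# Cumulative variance explained by the first k components
cumulative_variance = []
running = 0.0
for fraction in fractional_variance:
    running += fraction
    cumulative_variance.append(running)


def project(profile_row, column_means, eigenvector):
    """Center one residue's energetic values and project onto one eigenvector."""
    centered = [x - m for x, m in zip(profile_row, column_means)]
    return sum(c * e for c, e in zip(centered, eigenvector))
```

Applying `project` to every residue of a protein with the first eigenvector would yield a one-dimensional PC1 profile; a residue whose energetic values equal the dataset means projects to zero on every component.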

5. Energetic Profiles are Conserved Between Homologous Proteins

As Fig. 11.2 suggests, the first three principal components provide a set of noncorrelated descriptors from which to generate energetic profiles of proteins. Fig. 11.3 shows the proteins described in Fig. 11.1, with their redefined energetic profiles resulting from centering and rotating according to the eigenvectors in Table 11.1. It is important to note that PCA orders the resulting eigenvectors by decreasing magnitude of their corresponding eigenvalues. Therefore, the amount of variance explained by the first principal component is greatest, the amount explained by the second next greatest, and so forth. This is seen in Figs. 11.3A–C in the following way: once the data are projected onto the principal components, the projected data points have the largest range (or highest variance) along the first principal component. Consequently, the PC1 curves in Figs. 11.3A–C always have larger ranges than either PC2 or PC3, a reflection of the fact that PC1 contains the most information of the three. Interestingly, stability and the apolar and polar enthalpies contributed approximately equally to PC1.

Could energetic profiles, though apparently transcendent of the secondary and tertiary structural details of proteins, illuminate our understanding of the physical principles of protein structure and evolution? Comparison of two or more energetic profiles was necessary to address this question. Two identical structures are expected to result in identical energetic profiles, given the nature of the COREX algorithm. That similar structures should likewise yield similar profiles, though perhaps predictable for highly structurally similar proteins without insertions or deletions, was not obvious in general due to the difficulty of optimally aligning multidimensional energetic profiles of varying length.
[Figure 11.3, panels A–C: principal component value plotted against residue number for (A) engrailed homeodomain (α), (B) SH3 domain (β), and (C) class σ GST, N-terminal domain (α/β). Curves shown: PC1, PC2, and PC3.]

Figure 11.3 Principal components transformed energetic profiles of three diverse protein structures. Energetic profiles of the structures of Fig. 11.1 are displayed as three transformed datasets, derived from the four thermodynamic quantities in Fig. 11.1 and transformed by the first three eigenvectors of Table 11.1.

At least two approaches for aligning energetic profiles could be envisioned: (1) analyze the proteins and directly align the thermodynamic profiles, or (2) align their sequences or high-resolution structures and investigate the correlated thermodynamics of the equivalenced residues. We have actively pursued both approaches, and they have converged on a common result: energetic profiles are largely conserved between homologous proteins, diagnostic of particular folds, and thus appear to be relevant in protein evolution.

Using the structural alignment approach, conservation of energetic profiles was demonstrated over a large-scale representative sample of fold space, described by Fig. 11.4. Diverse protein domains from the ASTRAL compendium (Chandonia et al., 2004), as classified in the SCOP hierarchy (Murzin et al., 1995), were pairwise structurally aligned and their energetic profiles (from PC1) were equivalenced according to the structural alignment (Fig. 11.4A). Importantly, in this experiment the SCOP classification provided a reasonable assessment of the evolutionary relationship between two proteins: if two proteins belonged to the same SCOP family, they were likely homologous. If they belonged to different SCOP secondary-structure classes, they were likely nonhomologous. A Pearson linear correlation coefficient (Press et al., 1992) was then computed between the two sets of equivalenced energetic values (Fig. 11.4B). These correlations were tabulated for both homologous and nonhomologous pairs exhaustively taken from the ASTRAL representative database, over 1 million pairs total. The probability distributions in Fig. 11.4C clearly demonstrate that homologous proteins have highly correlated energetic profiles, while nonhomologous proteins were less correlated. It is emphasized that the homologous proteins analyzed in Fig. 11.4 were relatively distant evolutionarily, on average exhibiting twilight zone (i.e., <25%) pairwise sequence identity, signifying that energetic comparisons were nontrivial.

Does the conservation of energetic profiles extend deeper than just pairs of proteins? This question was addressed by similar experiments using multiple sequence alignments of entire protein families as proxy for pairwise equivalencies of energetic profiles. Results in Fig. 11.5 demonstrate similar results for protein families composed of many aligned sequences: energetic profiles were correlated when the proteins were homologous but were less correlated when the sequences were unrelated.
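The comparison step described above can be sketched in Python as follows. The sketch assumes that an alignment is available as a list of (index in protein A, index in protein B) pairs, which is a hypothetical representation and not the DALI output format, and it drops aligned positions that fall within the first or last four residues of either profile, following the Fig. 11.4 legend. All function names and the toy data are illustrative.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def equivalenced_values(profile_a, profile_b, aligned_pairs, trim=4):
    """Collect PC1 values at aligned positions, skipping `trim` residues at each end."""
    xs, ys = [], []
    for ia, ib in aligned_pairs:
        inside_a = trim <= ia < len(profile_a) - trim
        inside_b = trim <= ib < len(profile_b) - trim
        if inside_a and inside_b:
            xs.append(profile_a[ia])
            ys.append(profile_b[ib])
    return xs, ys


# Hypothetical PC1 profiles: B is a shifted, linearly rescaled copy of A
toy_profile_a = [float(i * i) for i in range(12)]
toy_profile_b = [0.0] + [2.0 * v + 1.0 for v in toy_profile_a]
toy_alignment = [(i, i + 1) for i in range(12)]
toy_x, toy_y = equivalenced_values(toy_profile_a, toy_profile_b, toy_alignment)
toy_r = pearson(toy_x, toy_y)
```

Because the Pearson coefficient is invariant to linear rescaling, two profiles that differ only by an offset and a positive scale factor correlate perfectly once their residues are correctly equivalenced.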

[Figure 11.4, panels A–C: (A) principal component 1 value plotted against aligned residue position for domains d1t56a1 and d1pb6a1; (B) scatter plot of d1pb6a1 principal component 1 value against d1t56a1 principal component 1 value, r = 0.75; (C) probability density plotted against Pearson correlation coefficient, r, for homologs (same SCOP family) and non-homologs (different SCOP class).]

Figure 11.4 Probability densities of Pearson correlations of energetic profiles between homologous and nonhomologous proteins. 1866 protein domains of 100 residues or less were taken from the ASTRAL 1.69 database of 40% maximum sequence identity representatives (Chandonia et al., 2004); this set constitutes an arguably exhaustive sampling of known fold space. The domains were structurally aligned using DALI (Holm and Park, 2000) in an all-versus-all fashion (Fig. 11.4A, inset; the example shows a homologous engrailed homeodomain pair superimposed with an RMSD of 1.5 Å according to the DALI alignment). Then, energetic profiles were computed for each domain using COREX (run under the parameters listed in the Fig. 11.1 legend) and the first eigenvector given in Table 11.1 was used to transform each four-dimensional profile into one-dimensional principal component space. First principal components from each member of all possible pairs of proteins were equivalenced according to the DALI structure alignment (Fig. 11.4A) and a Pearson linear correlation coefficient, r, was computed for each pair (Fig. 11.4B). The first and last four residues in each energetic profile, if part of the structural alignment, were ignored in the correlation because of sliding window end effects in the COREX algorithm. To reduce noise in the correlations, only pairs of proteins with resolutions less than 2.5 Å and structure alignments greater than 20 residues were considered. The densities of correlations corresponding to homologous and nonhomologous protein pairs (as defined by belonging to the same SCOP family or different SCOP secondary structure classes, respectively) were normalized such that their total areas equaled 1 (Fig. 11.4C). There were a total of 3715 homologous pairs and 547,600 nonhomologous pairs analyzed. Clearly, homologous protein pairs exhibited similar energetic profiles, as the median correlation between energetic profiles of homologous proteins was approximately r = 0.6, and the median correlation for nonhomologs was approximately r = 0.3.

[Figure 11.5, panels A–C: number of protein pairs plotted against Pearson correlation coefficient, r, for (A) engrailed homeodomain (α): homeodomains versus random domains; (B) SH3 domain (β): SH3 domains versus random domains; (C) class σ GST, N-terminal domain (α/β): GST-N domains versus random domains.]

Figure 11.5 Distributions of Pearson correlations of energetic profiles within three protein-fold families. All domains (≤100 residues) of three SCOP families contained in the ASTRAL 1.69 40% representatives database were extracted: homeodomain (a.4.1.1, 11 members), SH3 domains (b.34.2.1, 29 members), and glutathione S-transferase, N-terminal domain (c.47.1.5, 16 members). Each family was subjected to multiple sequence alignment using PROMALS3D (Pei et al., 2008). A randomly chosen set of nonhomologous domains, equal in members and chain lengths, was also multiply aligned as a control for each family. Then, pairwise Pearson correlation coefficients were computed in an all-versus-all fashion for each family and control, as described in the Fig. 11.4 legend. Distributions of these correlations clearly demonstrated that energetic profiles were similar within protein families. Median correlation coefficients for all families were greater than r = 0.6, and median correlations for all unrelated proteins in the control alignments were less than r = 0.3.

Note in both Figs. 11.4C and 11.5 that in some cases even nonhomologous proteins exhibited moderately high correlations, while in other cases homologous proteins exhibited moderately low correlations. We have observed at least two artifactual reasons for these observations. First, an unavoidable property of the Pearson correlation coefficient is that high correlations are statistically more likely with fewer numbers of points, as is often the case for alignments of nonhomologous proteins with low sequence or structure similarity. Second, the ASTRAL compendium by design contains examples of clearly homologous proteins that undergo conformational changes across multiple crystal structures. These homologs were not removed during our analyses, and the different structures correctly resulted in different computed energetics, thus weakening specific correlations. However, we believe that a subset of these moderately high correlations between nonhomologs reflects regions of true thermodynamic or structural similarity between ostensibly unrelated proteins, one example of which is described subsequently in this chapter. This result suggests that thermodynamic information, while diagnostic of known evolutionary relationships, could complement existing classifications by uncovering new relationships. This subset of regions of true thermodynamic similarity between unrelated proteins will be the focus of future investigations.


Energetic Profiling of Protein Folds

6. Direct Alignment of Energetic Profiles Based on a Variant of the CE Algorithm

A second strategy for comparing two energetic profiles involves treating the profiles as hyperdimensional structures for direct alignment. This is a difficult task because of the large problem space and the inclusion of gaps for two proteins of different lengths. In fact, any practical solution must formally be labeled "near-optimal" because, analogous to structure alignment, this problem is NP complete (Holm and Sander, 1995a; Lathrop, 1994). There exist numerous methods to determine near-optimal subsets of residues to be aligned from two proteins. Some include difference of distance matrices (DALI) (Holm and Sander, 1995b), combinatorial extension (CE) of the optimal path (Shindyalov and Bourne, 1998), genetic algorithms (Szustakowski and Weng, 2000), deterministic annealing (Chen et al., 2005), iterated double dynamic programming (Taylor, 1999), and heuristics combined with dynamic programming (Zhang and Skolnick, 2005). The two most often applied methods are DALI and CE, and we have adapted both methods to allow multidimensional energetic profiles as input. Although a systematic comparison of the utility of each method is beyond the scope of this chapter, CE was generally preferred because, in our hands, it exhibited greater efficiency and produced longer energetic alignments. CE also makes no assumptions about secondary structure (as is the case with the current version of DALI), and we desired alignments based solely on thermodynamic criteria without regard to sequence or three-dimensional structure characteristics.

To modify the CE algorithm for our purposes, we considered the following. CE is based on operations that are performed on two important parameter matrices, the distance matrix and the similarity matrix. Let V be a given vector set with m vectors of dimension n. These vectors might be (x, y, z) structure coordinates; in such a case n would equal 3 and m would equal the number of residues observed in the crystal structure. In contrast, these vectors could be energetic profiles where n equals 4 principal components and m equals the number of residues in the protein. In either case, a distance matrix for the vector set V is the matrix D such that each element $d_{i,j}$ of D represents the Euclidean distance from vector $v_i$ to vector $v_j$. Symbolically,

$$d_{i,j} = \sqrt{\sum_{k=1}^{n} \left(v_{i,k} - v_{j,k}\right)^{2}}. \tag{11.13}$$

In Eq. (11.13), indices i and j cover all possible pairs of residues in the protein. Because the distance from point vi to point vj is identical to the distance from vj to vi, distance matrices are symmetric.
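The distance matrix of Eq. (11.13) can be illustrated with a minimal sketch in plain Python. The function name `distance_matrix` and the toy two-dimensional vectors are assumptions made for illustration only; real input would be structure coordinates (n = 3) or energetic profiles (n = 4 principal components).

```python
import math

def distance_matrix(vectors):
    """Return the symmetric m x m matrix of Euclidean distances between vectors."""
    m = len(vectors)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            # The distance from v_i to v_j equals that from v_j to v_i.
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical three-"residue" vector set with n = 2 dimensions.
example = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(example)
```

Only the upper triangle is computed and then mirrored, which uses the symmetry noted above.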


Jason Vertrees et al.

Next, let a and b be two vector sets, with m and p entries of dimension n, respectively. Furthermore, let A and B be their respective distance matrices of sizes m × m and p × p, respectively, created from Eq. (11.13). The similarity matrix, S, is defined as:

$$S_{i,j} = \frac{1}{w} \sum_{i=1}^{m-w} \sum_{j=1}^{p-w} \sum_{k=1}^{w} \left| A_{i,j+k} - B_{i,j+k} \right|. \tag{11.14}$$

where w is a fixed window size. In the original CE algorithm w = 8 residues, and in our work w is usually less than 30. The similarity matrix therefore measures the similarity between two coordinate or energetic substructures, one from vector set a anchored at residue i to i + k and the other from vector set b anchored at j to j + k. High values denote dissimilar regions, low values denote similar regions, and zero is an exact match. Because the distance from points $(a_i, a_j)$ in distance matrix A is not necessarily the same as the distance from points $(b_i, b_j)$ in distance matrix B, not all similarity matrices are symmetric. In fact, they are only symmetric when A is equivalent to B. Good scoring sets of contiguous residues, or paths, can be quickly read off a similarity matrix. For example, to find the best path through the similarity matrix, one starts in the lower left corner and traces to the upper right corner along the lowest scoring diagonal path. Every (i, j) element in the similarity matrix deemed a "good" match according to a user-defined score threshold then pairs residue i in protein A with residue j in protein B. Gaps are naturally allowed: move one or more positions up or to the right in the similarity matrix. Thus, two protein structures, or energetic profiles of any number of dimensions, may be optimally aligned from their similarity matrix.
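As a rough illustration of how a similarity matrix might be built from two distance matrices, the sketch below assumes one particular reading of Eq. (11.14): the outer sums are taken as the ranges of the anchors i and j, and each entry compares the distances from anchor i in A with the distances from anchor j in B over a window of w positions. The function names and the one-dimensional toy data are hypothetical.

```python
def line_distances(points):
    """Distance matrix for points on a line (a stand-in for Eq. 11.13)."""
    return [[abs(x - y) for y in points] for x in points]

def similarity_matrix(A, B, w):
    """Assumed reading of Eq. (11.14): S[i][j] = (1/w) * sum_k |A[i][i+k] - B[j][j+k]|."""
    m, p = len(A), len(B)
    S = [[0.0] * (p - w) for _ in range(m - w)]
    for i in range(m - w):
        for j in range(p - w):
            total = sum(abs(A[i][i + k] - B[j][j + k]) for k in range(1, w + 1))
            S[i][j] = total / w
    return S

# Two hypothetical "proteins" of different lengths with identical spacing.
A = line_distances([0.0, 1.0, 2.0, 3.0, 4.0])
B = line_distances([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
S_same = similarity_matrix(A, B, w=2)

# Doubling the spacing in the second set makes every window dissimilar.
B_wide = line_distances([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
S_wide = similarity_matrix(A, B_wide, w=2)
```

With identical spacing every entry is zero (an exact match), while the stretched set gives a uniformly higher, that is less similar, score. The matrix is rectangular, (m − w) by (p − w), because the two sets differ in length.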

7. CE Algorithm Described for Structure Coordinates

CE finds the longest path of lowest score through the similarity matrix for two given protein structures. It can start anywhere in the similarity matrix (although $S_{1,1}$ is a good choice as it ensures consideration of the longest possible path first) and compares that starting entry, for example $S_{i,j}$, with a predefined cutoff measure of 3.0 Å. If $S_{i,j}$ > 3.0 Å, then the substructures anchored at i and j in proteins A and B, respectively, are not a good match and are thus ignored. If this is the case, $S_{i+1,j}$ is considered. Once i is exhausted, or the path has gapped 30 times, i is reset and j is incremented. If $S_{i,j}$ ≤ 3.0 Å, then that residue pair (i, j) is stored and (i + w, j + w) (i.e., (i + 8, j + 8)) is considered. The algorithm is repeated until the similarity matrix is exhaustively searched (using their heuristics to prune spurious low-scoring paths). At this point, the result is the longest possible, best scoring subsets of residues from the two structures. One can then move on to the next step of optimal rigid-body alignment. This variant of CE has been made available to the community on the PyMOL Wiki site (http://www.pymolwiki.org/index.php/Cealign).

CE uses knowledge of protein structures to introduce a few heuristics to decrease the running time. First, CE partitions the protein sequence into windows of eight residues; this decreases the number of combinations to inspect. Second, CE ignores the distances between a residue and its two sequence neighbors because this distance varies little between residues. For example, the distance between an alpha carbon and its sequence neighbor's alpha carbon is, with high probability, 3.8 Å; any deviation is biophysically unlikely or of insufficient magnitude to impact the calculation and is thus ignored.
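The search described above can be sketched in a deliberately simplified form. The code below is not the CE implementation: it omits gaps, the pruning heuristics, and any comparison of path scores, and simply extends each candidate start along the diagonal in steps of w while the similarity entry stays at or below the cutoff. The function name and the toy matrix are hypothetical.

```python
def longest_diagonal_path(S, w, cutoff):
    """Gap-free sketch: keep the longest run of (i, j) pairs with S[i][j] <= cutoff."""
    rows = len(S)
    cols = len(S[0]) if rows else 0
    best = []
    for i0 in range(rows):
        for j0 in range(cols):
            path = []
            i, j = i0, j0
            # Store a good pair, then jump one full window along the diagonal.
            while i < rows and j < cols and S[i][j] <= cutoff:
                path.append((i, j))
                i += w
                j += w
            if len(path) > len(best):
                best = path
    return best

# Hypothetical 5 x 5 similarity matrix: mostly poor matches (9.0),
# with two good entries lying one window (w = 2) apart on a diagonal.
S = [[9.0] * 5 for _ in range(5)]
S[0][1] = 1.0
S[2][3] = 2.0
path = longest_diagonal_path(S, w=2, cutoff=3.0)
```

Here the search returns the two anchored pairs (0, 1) and (2, 3), each of which would stand for a window of w aligned residues.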

8. Necessary Deviations from the CE Algorithm to Accommodate Energetic Profiles

Not surprisingly, numerical values of energetic profiles do not conform to assumptions made in the CE algorithm regarding the numerical values of protein structure coordinates. This necessitated investigation of every structure-based assumption in CE; the assumptions and necessary changes are enumerated here. First, because sequence neighbors may have large discontinuities in energetic space, during the equivalence step a modified algorithm must consider neighbor-neighbor distances while the original CE algorithm did not. Second, also because of the extremely large dynamic range of values, some normalization of the energetic data was necessary. Without normalization, large numerical jumps (e.g., 15 energetic units versus 3.0 Å in Cartesian coordinates) would dominate the similarity measures and cause spurious matches. Our variant of CE was built such that during construction of the similarity matrix, the distance matrices were locally scaled such that the largest distance in each submatrix is 1.0. This modification enabled window-based structure scaling. Though it increased the running time of the algorithm, it allowed the algorithm to be more tolerant of the vagaries of scale and produced more accurate energetic profile alignments. A third deviation from the original CE algorithm occurred during the refinement step. CE called for a sequence-based refinement step. The variant algorithm removed the refinement step as alignments based solely on energetic information were desired. Finally, the last set of deviations dealt with the empirically defined parameters enforced by CE. The original algorithm required a window size (eight residues was determined to be optimal), a maximum gap of 30 residues, a minimum cutoff of 3.0 Å, and a minimum path cutoff of 4.0 Å.
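The window-based scaling described in this section can be sketched as follows. The exact submatrix used in the authors' variant is not specified here, so the sketch assumes a square w × w block of the distance matrix anchored at a given residue; the function name and the toy values are hypothetical.

```python
def scaled_window(D, start, w):
    """Return the w x w block of D anchored at `start`, scaled so its largest entry is 1.0."""
    idx = range(start, start + w)
    block = [[D[r][c] for c in idx] for r in idx]
    largest = max(max(row) for row in block)
    if largest == 0.0:
        return block  # all points coincide, so there is nothing to scale
    return [[value / largest for value in row] for row in block]

# Hypothetical one-dimensional "energetic" values with a 15-unit jump.
values = [0.0, 15.0, 16.0, 17.0]
D = [[abs(x - y) for y in values] for x in values]

jump_block = scaled_window(D, 0, 3)    # window containing the jump
smooth_block = scaled_window(D, 1, 3)  # window after the jump
```

After scaling, both windows span the same range of 0 to 1, so a window containing a large numerical jump no longer outweighs a smoother one merely because of its units.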


The original CE parameter values were determined by fine-tuning CE against known structure databases. Unfortunately, we did not have the luxury of known thermodynamic homology; indeed, that was the question we wished to address. We estimated these tunable parameters by comparison of three known structure families and clustering the results (data not shown). The optimal distance cutoffs chosen were 0.040w, where w was the window size in residues. Both cutoffs were set to this value in the energetic comparisons described subsequently. The longest continuous energetic substructures were found by setting the maximum gap to 0, with iterative increases of the window size from 6 to 26. This empirical procedure allowed initial detection of smaller as well as larger regions of energetic similarity. In protein structure comparison, the root-mean-square deviation (RMSD) of the equivalenced atoms is often used when checking the quality of the alignment. RMSD provides a quantitative measure of how good the alignment is but does not consider either the number of gaps or the length of the alignment. Thus, RMSD alone can be a misleading measure of quality; for example, an alignment with 200 equivalenced atoms and a 1.4 Å RMSD with no internal gaps would be expected to be more biologically relevant than a gapped alignment exhibiting the same RMSD over only 20 atoms. Therefore, in scoring the quality of energetic profile alignments we employed the CE score, which has been demonstrated to better represent alignment quality (Jia et al., 2004). The CE score is defined as,

$$\text{CE score} = \frac{\mathrm{RMSD}}{L}\left(1.0 + \frac{G}{L}\right). \tag{11.15}$$

In Eq. (11.15), RMSD is the minimum RMSD (Kabsch, 1976, 1978) of the energetic profile-equivalenced CA atoms, L is the number of equivalenced residues, and G is the number of internal gaps in the energetic profile alignment. An implementation of the Kabsch minimum RMSD algorithm used in this work has been made freely available to the community on the PyMOLWiki site (http://www.pymolwiki.org/index.php/Kabsch).
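A short worked example may help show why the CE score separates alignments that RMSD alone cannot. The sketch assumes Eq. (11.15) reads RMSD/L multiplied by (1.0 + G/L); the function name is hypothetical, and the gap count of 2 for the short alignment is an invented value, since the text only says that alignment is gapped.

```python
def ce_score(rmsd, n_equivalenced, n_gaps):
    """CE score under the assumed form (RMSD / L) * (1.0 + G / L)."""
    if n_equivalenced <= 0:
        raise ValueError("the number of equivalenced residues must be positive")
    return (rmsd / n_equivalenced) * (1.0 + n_gaps / n_equivalenced)

# The long, ungapped alignment from the text: 200 equivalenced atoms, 1.4 A RMSD.
long_score = ce_score(1.4, 200, 0)

# A short alignment with the same RMSD over 20 atoms and 2 assumed gaps.
short_score = ce_score(1.4, 20, 2)
```

Under this form the two alignments share an RMSD of 1.4 Å, yet the long ungapped one scores about 0.007 and the short gapped one about 0.077, so the score penalizes both short length and internal gaps.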

9. Towards a Thermodynamic Homology of Fold Space: Clustering Energetic Profiles using STEPH

Given energetic profiles of proteins and a method to compare them, we could now consider the organization of thermodynamic fold space. Which proteins were related and why? What families of folds could be identified using only thermodynamic information? Answering such questions may provide a novel viewpoint to understand the evolution of proteins. We employed our CE variant in an all-against-all fashion versus a subset of a thermodynamic database of 120 proteins previously studied in our laboratory (Larson and Hilser, 2004). We then analyzed these results using agglomerative hierarchical clustering tools to determine apparent thermodynamic relationships. Finally, we compared thermodynamic relationships with standard structure-based relationships.

Our CE variant, Structural Thermodynamic Ensemble-based Protein Homology (STEPH), compared each energetic profile to every other, providing a dissimilarity matrix of CE scores where each protein was represented as a point in a high-dimensional energetic fold space. We next desired to divide the space into mutually exclusive regions and thereby define thermodynamic fold families. We started with the straightforward clustering method of Ward (1963), as implemented in the R language (http://www.r-project.org), to cluster the dissimilarity matrix. As mentioned earlier, we chose to restrict the gap parameter to zero in STEPH; otherwise, highly fragmented alignments were observed. To find longer alignments, we iterated over a window size of 6 to 26 residues, retaining optimal energetic alignments. Each iteration's results were turned into a distance matrix for that window size. Subsequently, each distance matrix was averaged into a global distance matrix and the global matrix was clustered. In detail, for each window size, the following binary mask M was applied to the alignment score matrix:

$$M_{A,B} = \begin{cases} 0 & \text{if } (A,B) \le \text{25th quantile} \\ 1 & \text{otherwise} \end{cases}. \tag{11.16}$$
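The masking and averaging steps around Eq. (11.16) can be sketched with the Python standard library. Several details here are assumptions rather than the authors' procedure: the 25th quantile is taken over the off-diagonal scores of each matrix, `statistics.quantiles` with its default interpolation stands in for whatever quantile definition was used in R, and the score matrices are invented. The clustering step itself (Ward's method via `hclust` in R, as stated in the text) is not reproduced.

```python
import statistics

def binary_mask(scores):
    """Eq. (11.16): 0 where a pair's score is within the 25th quantile, 1 otherwise."""
    n = len(scores)
    off_diagonal = [scores[a][b] for a in range(n) for b in range(a + 1, n)]
    q25 = statistics.quantiles(off_diagonal, n=4)[0]  # first quartile cut point
    return [[0 if a == b or scores[a][b] <= q25 else 1 for b in range(n)]
            for a in range(n)]

def average_masks(masks):
    """Element-wise mean of the masked matrices, one per window size."""
    n = len(masks[0])
    return [[sum(mask[a][b] for mask in masks) / len(masks) for b in range(n)]
            for a in range(n)]

def symmetric(n, pairs):
    """Build a symmetric n x n score matrix from {(a, b): score} (helper for the example)."""
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), score in pairs.items():
        matrix[a][b] = matrix[b][a] = score
    return matrix

# Hypothetical CE-score matrices for four proteins at two window sizes.
window_6 = symmetric(4, {(0, 1): 0.1, (0, 2): 0.9, (0, 3): 0.8,
                         (1, 2): 0.7, (1, 3): 0.6, (2, 3): 0.2})
window_8 = symmetric(4, {(0, 1): 0.5, (0, 2): 0.9, (0, 3): 0.8,
                         (1, 2): 0.7, (1, 3): 0.6, (2, 3): 0.05})

masks = [binary_mask(window_6), binary_mask(window_8)]
global_matrix = average_masks(masks)
```

In this toy case proteins 0 and 1 are close at one window size and proteins 2 and 3 at the other, so each pair ends with an averaged distance of 0.5, while pairs that are never close keep a distance of 1.0. The averaged matrix is what would then be passed to a hierarchical clustering routine.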

Therefore, if the energetic profile of a protein was close to another (i.e., within the 25th quantile), then its distance was reset to 0. If the profile was not close, its distance was reset to 1. In other words, for each window size, specific relationship information was retained and noise was disregarded, as defined by the 25th quantile. The 25th quantile was empirically chosen to retain some longer distances, as the 5th quantile, for example, retained too few profiles and was not a useful cutoff. The binary-masked matrices were added to form an averaged matrix for distance representation, and this averaged matrix was clustered using Ward's algorithm as implemented by the hclust function in R.

This procedure was applied to five small families of homologous proteins. The procedure was run on both structure coordinates as well as energetic profiles. Results are displayed in Fig. 11.6 as phylogenetic trees, although presently we interpret these trees simply as clusters and do not necessarily consider branch lengths as evolutionarily meaningful.

[Figure 11.6, panels A and B: cluster trees of 13 domains from SCOP families b.1.1.4 (d2fcba1, d1fnla1), b.55.1.1 (d1faoa, d1eaza), b.34.2.1 (d2ab1a1, d1phta), b.18.1.2 (d1d7pm, d1czta, d1kexa), and c.37.1.8 (d1n6ha, d1kaoa, d1mh1a, d1m7ba). Panel A: clustering from structure coordinates. Panel B: clustering from energetic profiles.]

Figure 11.6 Cluster trees built from either structure coordinates or energetic profiles of five homologous protein-fold families. Protein domains were contained in our previously studied human protein thermodynamic database (Larson and Hilser, 2004). SCOP families represented are G proteins (c.37.1.8), discoidin (b.18.1.2), SH3 domains (b.34.2.1), Pleckstrin-homology domains (b.55.1.1), and I set domains (b.1.1.4). Trees were built using agglomerative hierarchical clustering, as described in text, with input data from CE structure alignments or STEPH, our variant of CE, energetic profile alignments. (A) Clustering from three-dimensional structure coordinates. SCOP families are clearly segregated with all-beta proteins on a separate branch from the alpha/beta proteins. (B) Clustering from three-dimensional energetic profiles. SCOP families are again properly segregated but the I set domains have moved to the alpha/beta branch, for thermodynamic reasons described in the text.

The major difference between the arrangement of the structure-based clustering and the energetic profile-based clustering is the location of the 1fnlA/2fcbA branch. In structural terms it has been determined to have some propinquity with the folds on the right side of the tree (Fig. 11.6A). The results from the clustering show the branch moving to the opposite side of the tree (Fig. 11.6B). What is the origin of this observation? To answer this, we consider the following. First, inspection of the optimal alignments and scores showed that the 1fnlA/2fcbA branch scored well thermodynamically with respect to the 1m7bA branch. Fig. 11.7 indicates why; the overlapping segments are comprised of residues 115–168 in 1fnlA and 39–92 in 1m7bA. In Figs. 11.7A–D the two regions are aligned according to the STEPH alignment and their residues renumbered from 1 to more easily discuss the overlapping features. Though the overlap in general appears acceptable, we focus on two regions: spanning residues 10–18 and 35–42. The first region shows high overlap on the apolar enthalpy dimension with some variation on the polar enthalpy dimension. Inspection of the structural fragments is revealing (Fig. 11.8). The region from 1m7bA consists mainly of apolar residues in a loop fully exposed to solvent. Thus, upon unfolding of this region, the energetic contribution will be a relatively low amount of polar surface area with a relatively low amount of apolar complementary surface area being exposed as well. This effectively balances the apolar to polar enthalpy ratio: low direct polar exposure, low complementary apolar exposure. Thus, the apolar enthalpy dimensions for the two proteins should be relatively similar throughout this region, thus the overlap of apolar enthalpy. This also explains why the conformational entropy values for this region are vastly different, considering their locations in secondary and tertiary structure. The second region of importance spans residues 35–42. This region in both proteins exhibits similar values for all four descriptors, resulting in a good alignment.
Thus, exploring these two subsequences shows that proteins have at their disposal various combinations of mechanisms to modulate local stability, enthalpy and entropy values. As reinforced by the energetic profiles shown in Fig. 11.1, the various mechanisms are simply not apparent in the tenets of ‘‘helices and sheets are stable while loops are unstable’’.

10. Energetic Profiles Provide a Vehicle to Discover Conserved Substructures in the Absence of Known Homology

Not infrequently, substantial structural overlap was observed between two proteins with no known homology. This phenomenon occurs in the 1fnl and 1m7b proteins contained in the example of Fig. 11.6. They exhibited structural overlap of 24 residues, determined from their energetic profiles alone. Fig. 11.9 shows the two proteins with their thermodynamically aligned segments highlighted. Notice when structurally superimposed, these segments also aligned well (3.1 Å RMSD). We believe this phenomenon is characteristic in general of fold space, as exemplified by the presence of moderately high energetic correlations between nonhomologous proteins mentioned above (Fig. 11.4C). A second observation from these two overlapping substructures is that they were centered upon loops or coils. This hinted at an underlying principle of energetic profile alignments being anchored at loops or coils with different intervening secondary structure elements. In other words, if the loops/coils aligned well, then the intervening secondary structure elements aligned because of shared topological features affecting the local energetics, instead of shared secondary structure type. Because loops and coils are typically highly solvent accessible in globular proteins, perhaps it is not surprising that they exhibit thermodynamically uniform characteristics, in contrast to helices and strands which may have varying degrees of solvent exposure.

[Figure 11.7, panels A–D: normalized energetic value (0 to 1.0) versus residue number in aligned fragment (0 to 60) for d1fnla1 115–168 and d1m7ba1 39–92. Panels: (A) ΔG, (B) ΔHapol, (C) ΔHpol, (D) TΔSconf.]

11. Conclusion

Energetic profiles, derived from an experimentally validated statistical mechanical representation of a protein's native state ensemble, are demonstrated to be a novel diagnostic of diverse protein folds. Consequently, new tools, such as the STEPH algorithm described in this work, are necessary to compute, compare, and classify energetic profiles. Initial analysis of a large representative sample of protein-fold space suggests that energetic profiles are evolutionarily relevant. However, the information in energetic profiles, although mostly complementary to the sequence, structure and functional descriptors used in current databases, also has the potential to reclassify protein-fold space. We believe the integration of this thermodynamic information into existing classification schemes will increase understanding of the physical principles of protein structure and the mechanisms of protein evolution.

Figure 11.7 Energetic profiles of two fragments of STEPH aligned nonhomologous proteins. A subset of 54 residues from the complete energetic alignment is shown: residues 115–168 (PDB numbering) of d1fnla1 and residues 39–92 (PDB numbering) of d1m7ba1. The aligned subset has been renumbered starting from 1. For clarity, the principal components transformed profiles are displayed as the original four energetic quantities: (A) local stability (ΔG), (B) apolar solvation enthalpy (ΔHapol), (C) polar solvation enthalpy (ΔHpol), and (D) conformational entropy (TΔSconf). Energetic profiles are normalized so that minimum values within each protein equal 0 and maximum values equal 1. Regions 10–18 and 35–42, discussed in text, are boxed in each panel.


Figure 11.8 Molecular rationalization of STEPH aligned fragments of nonhomologous proteins. Fragments of d1fnla (PDB 1fnlA residues 115–168) and d1m7ba1 (PDB 1m7bA residues 39–92) as energetically aligned by STEPH are displayed in green and light blue, respectively. Renumbered residues 10–18 of the 1m7bA aligned fragment discussed in the text are displayed in yellow, with apolar solvent exposed side chains explicitly shown.

Figure 11.9 Structurally similar regions of two nonhomologous proteins revealed by alignment of energetic profiles. Two structurally similar regions of 24 residues each, determined to be energetically similar from STEPH alignment as discussed in the text, are highlighted in red (d1fnla1) and yellow (d1m7ba). The remainders of both proteins are displayed in blue.


ACKNOWLEDGMENTS

Supported by NIH grant (GM63747), NSF grant (MCB0406050), and the Robert A. Welch Foundation (H-1461).

REFERENCES Alva, V., Koretke, K. K., Coles, M., and Lupas, A. N. (2008). Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr. Opin. Struct. Biol. 18, 358–365. Babu, C. R., Hilser, V. J., and Wand, A. J. (2004). Direct access to the cooperative substructure of proteins and the protein ensemble via cold denaturation. Nat. Struct. Mol. Biol. 11, 352–357. Chandonia, J. M., Hon, G., Walker, N. S., Lo Conte, L., Koehl, P., Levitt, M., and Brenner, S. E. (2004). The ASTRAL compendium in 2004. Nucleic Acids Res. 32, D189–D192. Chen, L., Zhou, T., and Tang, Y. (2005). Protein structure alignment by deterministic annealing. Bioinformatics 21, 51–62. Day, R., Beck, D. A., Armen, R. S., and Daggett, V. (2003). A consensus view of fold space: Combining SCOP, CATH, and the Dali Domain Dictionary. Protein Sci. 12, 2150–2160. Dunbrack, R. L., Jr. (2006). Sequence comparison and protein structure prediction. Curr. Opin. Struct. Biol. 16, 374–384. Ferreon, J. C., Volk, D. E., Luxon, B. A., Gorenstein, D. G., and Hilser, V. J. (2003). Solution structure, dynamics, and thermodynamics of the native state ensemble of the Sem-5 C-terminal SH3 domain. Biochemistry 42, 5582–5591. Finn, R. D., Tate, J., Mistry, J., Coggill, P. C., Sammut, S. J., Hotz, H. R., Ceric, G., Forslund, K., Eddy, S. R., Sonnhammer, E. L., and Bateman, A. (2008). The Pfam protein families database. Nucleic Acids Res. 36, D281–D288. Greene, L. H., Lewis, T. E., Addou, S., Cuff, A., Dallman, T., Dibley, M., Redfern, O., Pearl, F., Rekha Nambudiry., Reid, A., Sillitoe, I., Yeats, C., et al. (2007). The CATH domain structure database: New protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. 35, D291–D297. Gu, J., and Hilser, V. J. (2008). Predicting the energetics of conformational fluctuations in proteins from sequence: A strategy for profiling the proteome. Structure 16, 1627–1637. 
Heger, A., Mallick, S., Wilton, C., and Holm, L. (2007). The global trace graph, a novel paradigm for searching protein sequence databases. Bioinformatics 23, 2361–2367. Henzler-Wildman, K., and Kern, D. (2007). Dynamic personalities of proteins. Nature 450, 964–972. Hilser, V. J. (2001). Modeling the native state ensemble. Methods Mol. Biol. 168, 93–116. Hilser, V. J., and Freire, E. (1996). Structure-based calculation of the equilibrium folding pathway of proteins. Correlation with hydrogen exchange protection factors. J. Mol. Biol. 262, 756–772. Hilser, V. J., Garcia-Moreno, E. B., Oas, T. G., Kapp, G., and Whitten, S. T. (2006). A statistical thermodynamic model of the protein ensemble. Chem. Rev. 106, 1545–1558. Holm, L., and Sander, C. (1995a). 3-D lookup: Fast protein structure database searches at 90% reliability. Proc. Int. Conf. Intell. Syst. Mol. Biol. 3, 179–187. Holm, L., and Sander, C. (1995b). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123–138.


Igumenova, T. I., Frederick, K. K., and Wand, A. J. (2006). Characterization of the fast dynamics of protein amino acid side chains using NMR relaxation in solution. Chem. Rev. 106, 1672–1699. Jia, Y., Dewey, T. G., Shindyalov, I. N., and Bourne, P. E. (2004). A new scoring function and associated statistical significance for structure alignment by CE. J. Comp. Biol. 11, 787–799. Kabsch, W. (1976). A solution for the best rotation to relate two vector sets. Acta Cryst. Sec. A 32, 922–923. Kabsch, W. (1978). A discussion of the solution for the best rotation to relate two vector sets. Acta Cryst. Sec. A 34A, 827–828. Kinch, L. N., and Grishin, N. V. (2002). Evolution of protein structures and functions. Curr. Opin. Struct. Biol. 12, 400–408. Kolodny, R., Petrey, D., and Honig, B. (2006). Protein structure comparison: Implications for the nature of fold space, and structure and function prediction. Curr. Opin. Struct. Biol. 16, 393–398. Kriventseva, E. V., Fleischmann, W., Zdobnov, E. M., and Apweiler, R. (2001). CluSTr: A database of clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res. 29, 33–36. Larson, S. A., and Hilser, V. J. (2004). Analysis of the ‘‘thermodynamic information content’’ of a Homo sapiens structural database reveals hierarchical thermodynamic organization. Protein Sci. 13, 1787–1801. Lathrop, R. H. (1994). The protein threading problem with sequence amino acid interaction preferences is NP complete. Protein Eng. 7, 1059–1068. Lecomte, J. T., Vuletich, D. A., and Lesk, A. M. (2005). Structural divergence and distant relationships in proteins: Evolution of the globins. Curr. Opin. Struct. Biol. 15, 290–301. Liu, T., Whitten, S. T., and Hilser, V. J. (2007). Functional residues serve a dominant role in mediating the cooperativity of the protein ensemble. Proc. Natl. Acad. Sci. USA 104, 4347–4352. Manly, B. F. J. (1986). Multivariate statistical methods: A primer Chapman and Hall, New York. Milne, J. S., Xu, Y., Mayne, L. C., and Englander, S. 
W. (1999). Experimental study of the protein folding landscape: Unfolding reactions in cytochrome c. J. Mol. Biol. 290, 811–822. Mittermaier, A., and Kay, L. E. (2006). New tools provide new insights in NMR studies of protein dynamics. Science 312, 224–228. Murzin, A. G., Brenner, S. E., Hubbard, T., and Chothia, C. (1995). SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540. Orban, J., Alexander, P., Bryan, P., and Khare, D. (1995). Assessment of stability differences in the protein G. B1 and B2 domains from hydrogen-deuterium exchange: Comparison with calorimetric data. Biochemistry 34, 15291–15300. Pan, H., Lee, J. C., and Hilser, V. J. (2000). Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc. Natl. Acad. Sci. USA 97, 12020–12025. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge University Press, New York. Russell, R. B. (2002). Classification of protein folds. Mol. Biotechnol. 20, 17–28. Shindyalov, I. N., and Bourne, P. E. (1998). Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747. Shindyalov, I. N., and Bourne, P. E. (2000). An alternative view of protein fold space. Proteins 38, 247–260. Szustakowski, J. D., and Weng, Z. (2000). Protein structure alignment using a genetic algorithm. Proteins. Struct. Funct. Genet. 38, 428–440.


Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., et al. (2003). The COG database: An updated version includes eukaryotes. BMC Bioinformatics 4, 41. Taylor, W. R. (1999). Protein structure comparison using iterated double dynamic programming. Protein Science 8, 654–665. Taylor, W. R. (2007). Evolutionary transitions in protein fold space. Curr. Opin. Struct. Biol. 17, 354–361. Vertrees, J. (2008). A thermodynamic definition of protein folds., Department of Biochemistry and Molecular Biology, Vol. Doctor of Philosophy. University of Texas Medical Branch, Galveston. pp. 159. Vertrees, J., Barritt, P., Whitten, S., and Hilser, V. J. (2005). COREX/BEST server: A web browser-based program that calculates regional stability variations within protein structures. Bioinformatics 21, 3318–3319. Wang, S., Gu, J., Larson, S. A., Whitten, S. T., and Hilser, V. J. (2008). Denatured-state energy landscapes of a protein structural database reveal the energetic determinants of a framework model for folding. J. Mol. Biol. 381, 1184–1201. Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244. Whitten, S. T., Garcia-Moreno, B. E., and Hilser, V. J. (2008). Ligand effects on the protein ensemble: Unifying the descriptions of ligand binding, local conformational fluctuations, and protein stability. Methods Cell Biol. 84, 871–891. Whitten, S. T., Garcia-Moreno, E. B., and Hilser, V. J. (2005). Local conformational fluctuations can. modulate the coupling between proton binding and global structural transitions in proteins. Proc. Natl. Acad. Sci. USA 102, 4282–4287. Whitten, S. T., Kurtz, A. J., Pometun, M. S., Wand, A. J., and Hilser, V. J. (2006). Revealing the nature of the native state ensemble through cold denaturation. Biochemistry 45, 10163–10174. Wrabl, J. O., Larson, S. A., and Hilser, V. J. 
(2001). Thermodynamic propensities of amino acids in the native state ensemble: Implications for fold recognition. Protein Sci. 10, 1032–1045. Wrabl, J. O., Larson, S. A., and Hilser, V. J. (2002). Thermodynamic environments in proteins: Fundamental determinants of fold specificity. Protein Sci. 11, 1945–1957. Zhang, Y., and Skolnick, J. (2005). TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309.

CHAPTER TWELVE

Model Membrane Thermodynamics and Lateral Distribution of Cholesterol: From Experimental Data to Monte Carlo Simulation

Juyang Huang*

Contents
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. Liposome preparation for COD activity measurement
2.3. Liposome preparation for X-ray diffraction measurement
2.4. X-ray diffraction
2.5. Cholesterol oxidase activity assay
2.6. Monte Carlo simulation of lipid membranes using a lattice model
2.7. Pairwise interactions and multibody interactions
2.8. Calculation of chemical potential of cholesterol from simulation
3. Result and Discussion
3.1. Maximum solubility of cholesterol in PC bilayers
3.2. The competition between cholesterol and ceramide in POPC bilayers
3.3. Measurement and simulation of the chemical potential of cholesterol in PC bilayers
4. Concluding Remarks
Acknowledgments
References

Abstract

Thermodynamic analysis and Monte Carlo simulation techniques were used to study cholesterol-lipid interactions in lipid membranes. Experimental data, including the maximum solubility of cholesterol in lipid bilayers, the 1-to-1 displacement of cholesterol by ceramide, and the chemical activity of cholesterol measured with cholesterol oxidase (COD), were systematically analyzed using thermodynamic principles. A conceptual model, the umbrella model, is presented to describe the key cholesterol-lipid interaction in lipid membranes. In a lipid membrane, nonpolar cholesterol relies on coverage by polar phospholipid headgroups to avoid the unfavorable free energy of cholesterol contact with water. This coverage requirement leads to cholesterol's strong tendency not to cluster in a bilayer, its preferential association with large-headgroup lipids with saturated acyl chains, and its competition with ceramide for large-headgroup lipids. The umbrella model was parameterized as a multibody (i.e., nonpairwise) interaction for Monte Carlo simulation, and the measured chemical potentials of cholesterol agreed well with the predictions from the simulation. Under the right conditions, the multibody interactions can also lead to the formation of cholesterol superlattices. In addition, an intrinsic thermodynamic connection between a jump in chemical potential and a regular distribution (RD) of membrane molecules was uncovered. This study shows that combining thermodynamics with computer simulation can be a productive approach for analyzing and interpreting complex experimental data, and that thermodynamics can provide predictive power in bioscience research.

* Department of Physics, Texas Tech University, Lubbock, Texas, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04212-2. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

Synthetic liposomes provide a valuable platform for studying the behavior of membrane molecules in a controlled environment with a well-defined membrane composition and heterogeneity. Under normal experimental conditions, multicomponent liposomes are sufficiently large and are in thermodynamic equilibrium. Therefore, thermodynamics can be a powerful tool for analyzing and interpreting complex experimental results. Cholesterol is a major constituent of mammalian plasma membranes. It has been shown that cholesterol has a remarkable ability to alter the physicochemical properties of membranes and to induce membrane heterogeneity. The presence of cholesterol in a lipid membrane can drastically increase lipid acyl chain order, induce regular distributions (RD) of lipids or lipid raft domains, and modulate the activities of surface-acting enzymes (Ahn and Sampson, 2004; Brown and London, 2000; Chong, 1994; Vist and Davis, 1990). In general, the interactions between cholesterol and other membrane molecules are not ideal. Mean-field regular solution theories are often inadequate to describe the complexity of these systems, because lipid mixtures are often heterogeneous, and distributional as well as conformational entropies play important roles. To overcome this shortcoming, computer simulation has become a valuable tool for exploring the molecular interactions between membrane molecules and for simulating membrane domain distributions as well as phase separations.

Thermodynamics of Cholesterol-Lipid Interaction


In this article, we show that thermodynamics plays an important role in understanding complex experimental data, including the maximum solubility of cholesterol in lipid membranes, the competition between ceramide and cholesterol, the driving forces of cholesterol regular distributions (superlattices), and cholesterol's chemical activity with cholesterol oxidase (COD). Combining thermodynamic principles with Monte Carlo simulation allowed us to describe the key cholesterol-lipid interactions quantitatively and to predict the behavior of cholesterol in lipid membranes.

2. Materials and Methods

2.1. Materials

Phosphatidylcholines (PC), phosphatidylethanolamine (PE), and brain ceramide were purchased from Avanti Polar Lipids (Alabaster, AL). Cholesterol was purchased from Nu Chek Prep (Elysian, MN). Lipid purity (>99%) was confirmed by thin-layer chromatography (TLC) on washed, activated silica-gel plates (Alltech Associates, Deerfield, IL) developed with a 65/25/4 chloroform/methanol/water mixture for phospholipid analysis or with a 7/3/3 petroleum ether/ethyl ether/chloroform mixture for cholesterol analysis. Concentrations of phospholipid stock solutions were determined with a phosphate assay (Kingsley and Feigenson, 1979). Aqueous buffer [5 mM PIPES, 200 mM KCl, and 1 mM NaN3 (pH 7.0)] was prepared from deionized water (18 MΩ) and filtered through a 0.1-μm filter before use. Recombinant cholesterol oxidase (COD) expressed in Escherichia coli (C-1235), peroxidase (P-8250) from horseradish, and other chemicals for the cholesterol oxidation measurements were obtained from Sigma (St. Louis, MO).

2.2. Liposome preparation for COD activity measurement

The cholesterol content in all samples was kept at 60 μg, and the cholesterol mole fractions of the samples were adjusted by adding appropriate amounts of PC or ceramide. Liposomes were prepared by the rapid solvent exchange (RSE) method (Ali et al., 2006; Buboltz and Feigenson, 1999). First, lipids were dissolved in 70 μL of chloroform. The lipid solution was then heated briefly to 55 °C in a glass tube, and 1.3 mL of aqueous buffer was added. While the mixture was vortexed vigorously in the glass tube, the bulk solvent was removed by gradually reducing the pressure to 3 cm of Hg using a home-built vacuum attachment. The remaining trace chloroform was removed by vortexing for an additional 1 min at the same pressure. The liposomes prepared by these procedures were all sealed under argon. Sample tubes were placed in a programmable water bath (VWR, model 1187P), preheated to 50 °C, for the subsequent heating and cooling cycle. The samples were first cooled to 24 °C at a rate of 10 °C/h and then heated again to 50 °C at the same rate. The samples were kept at 50 °C for an additional 1 h before finally being cooled to room temperature at a rate of 1.5 °C/h. Finally, the liposomes were stored at room temperature on a mechanical shaker for 10 days in the dark before the cholesterol oxidation measurements. The majority of liposomes made by the RSE method are large unilamellar vesicles and settle in a test tube under gravity within a few hours (Buboltz and Feigenson, 1999). Compared with the ethanol injection and extrusion methods, RSE is a convenient method with no lipid-binding, solvent-contamination, or lipid-demixing concerns.

2.3. Liposome preparation for X-ray diffraction measurement

The liposome samples for X-ray diffraction measurement were made either by the original rapid solvent exchange (RSE) method or by the low temperature trapping (LTT) method. The original RSE method is described in detail in Buboltz and Feigenson (1999). Briefly, lipids were codissolved in 10–100 μl of dichloromethane (0.1% MeOH, 0.05% H2O) and then sprayed into vortexing buffer at reduced pressure, rapidly vaporizing the solvent and precipitating the lipid mixture in an aqueous environment. For mixtures containing DPPC, the buffer was maintained at 50 °C throughout the RSE procedure before cooling to room temperature. All samples were sealed under argon immediately following RSE.

In the LTT method, lipids were dissolved in CHCl3 and the solvent was removed under vacuum at 30 millitorr for 10 h. Lipids were redissolved in dry chloroform containing 1% methanol and then frozen in liquid nitrogen. Samples were lyophilized at low temperature, with the temperature carefully controlled so that the chloroform remained solid. After the bulk solvent had been removed, the lipid powders were kept cool (20 °C) during continued vacuum incubation (12 h) to remove residual solvent. Just before hydration, the sample was warmed to room temperature in a stirring water bath for 1 min, and buffer was then added to the dry powder. The suspension was immediately vortexed for 1 min. Samples containing DPPC were hydrated and vortexed at 50 °C. All samples were sealed under argon following hydration.

Hydrated liposome dispersions made by LTT were first pelleted at 1,000g for 10–25 min; dispersions made by RSE were pelleted at 20,000g for 15 min. The lipid sediment was loaded into thin-walled 1.0-mm glass X-ray capillaries and further centrifuged in a buoyant support apparatus (Buboltz and Feigenson, 1999) at 20,000g for 15 min to produce a uniformly dense pellet. Capillaries were sealed with paraffin wax under argon gas. A typical sample contained about 1.5 mg of lipid.


2.4. X-ray diffraction

X-ray diffraction experiments were carried out at the A-1 and F-1 beamlines at the Macromolecular Diffraction Facility at the Cornell High Energy Synchrotron Source (MacCHESS). Samples were illuminated by an intense synchrotron X-ray beam, with a wavelength of 0.908 Å, passing through a 0.2-mm collimator. Diffraction images were collected with a Princeton 2K CCD detector containing 2048 × 2048 41-μm pixels. Both the low-angle and wide-angle diffraction patterns (from 3.8 Å to 110 Å) were captured simultaneously on the same image. Depending on beam intensity and sample density, the exposure time for hydrated samples varied from 10 to 80 s. Samples were scanned 2–3 mm along the capillary axis during exposure, using a stepping motor. This procedure reduces radiation damage to the lipid and achieves more representative sampling. Image files were corrected for geometric distortions introduced by the CCD camera. To transform the powder patterns into radial profiles of diffraction intensity, each image was circularly integrated (using the IMP program provided by CHESS). The center of the beam and the tilt angle of the detector surface were precisely determined to prevent line broadening.

2.5. Cholesterol oxidase activity assay

Cholesterol oxidase (COD) is a water-soluble monomeric enzyme that catalyzes the conversion of cholesterol to cholest-4-en-3-one. The initial rate of oxidation of cholesterol by the oxidase enzyme was determined through a coupled enzyme assay scheme:

$$\mathrm{Cholesterol} + \mathrm{O_2} \xrightarrow{\;\mathrm{COD}\;} \mathrm{H_2O_2} + \text{cholest-4-en-3-one}$$

$$2\,\mathrm{H_2O_2} + \text{4-aminoantipyrine} + \text{phenol} \xrightarrow{\;\mathrm{POD}\;} \text{red quinoneimine} + \mathrm{H_2O}$$

The total reaction involves two steps. In the first step, the COD-mediated oxidation of membrane cholesterol produces two products, hydrogen peroxide and cholest-4-en-3-one (Ahn and Sampson, 2004). In the second step, catalyzed by peroxidase (POD), the hydrogen peroxide produced in the first step reacts with 4-aminoantipyrine and phenol to produce red-colored quinoneimine, which has a distinctive absorption peak at 500 nm.

For each cholesterol oxidation measurement, 1.3 mL of liposomes was first mixed with 0.2 mL of a 140 mM phenol solution. After mixing, 1 mL of reaction buffer [1.64 mM aminoantipyrine and 10 units/mL peroxidase in PBS buffer (pH 7.40)] was added. The mixture was then incubated at 37 °C for at least 10 min and finally transferred to a cuvette preheated to 37 °C in a heating block. The sample in the cuvette was maintained at 37 °C and stirred with a home-built magnetic mini-stirrer during the measurement. An HP-8453 (UV-vis) linear diode array spectrophotometer (Agilent Technologies, Wilmington, DE) was used to measure the absorption spectra of the samples. The reaction was started by injecting 20 μL of a COD solution (5 units/mL) into the cuvette. The spectra were collected at a rate of two spectra per second, and the total data collection time per sample was 120 s. The background-corrected, time-dependent absorption of quinoneimine was determined by calculating the difference between the absorption at 500 nm and the background average over the range of 700–800 nm as a function of time. The initial oxidation rate (i.e., the rate of change in the quinoneimine absorption at time zero) was determined using a second-order polynomial fit to the first 40 s of the absorption data. All data acquisition and data analysis were performed using the UV-visible ChemStation Software provided by Agilent Technologies.
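The initial-rate estimate described above can be sketched in a few lines. This is only an illustration, not the ChemStation procedure used in the study: the function name `initial_rate`, the synthetic absorbance trace, and the use of NumPy's `polyfit` are all assumptions made for the example. The linear coefficient of the fitted quadratic is the slope at time zero.

```python
import numpy as np

def initial_rate(t, absorbance, window_s=40.0):
    """Slope dA/dt at t = 0 from a second-order polynomial fit to the first window_s seconds."""
    t = np.asarray(t, dtype=float)
    a = np.asarray(absorbance, dtype=float)
    mask = t <= window_s
    # np.polyfit returns coefficients from highest power to lowest: a2, a1, a0
    a2, a1, a0 = np.polyfit(t[mask], a[mask], 2)
    return a1  # derivative of a2*t^2 + a1*t + a0 evaluated at t = 0

# Synthetic (hypothetical) progress curve sampled at two points per second for 120 s
t = np.arange(0.0, 120.0, 0.5)
true_rate = 2.0e-3
A = 0.01 + true_rate * t - 5.0e-6 * t**2
rate = initial_rate(t, A)
```

Fitting a quadratic rather than a straight line lets the curvature of the progress curve be absorbed by the second-order term, so the linear coefficient is not biased by the slowing of the reaction within the fitting window.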

2.6. Monte Carlo simulation of lipid membranes using a lattice model

The cholesterol/phospholipid bilayer is modeled as a two-dimensional hexagonal lattice. Each lattice site can be occupied by either a phospholipid acyl chain or a cholesterol molecule. Because the sizes and packing details are surely not identical for cholesterol and phospholipid acyl chains, a distorted hexagonal lattice seems likely. Such a distorted hexagonal lattice will serve our purpose well, as long as each site has six nearest neighbors. The effects of water and conformational entropies are taken into account via the energy parameters. The main advantages of this Monte Carlo simulation are that it can handle a large simulation size and that a true equilibrium distribution of the molecules can be reached, thereby making possible the connection to equilibrium thermodynamic data, such as chemical potentials. All the simulations were performed on a 120 × 120 hexagonal lattice with a standard periodic boundary condition. As demonstrated earlier (Huang and Feigenson, 1993), such a large-scale simulation makes the simulation size effect negligible. Neighboring cholesterols and acyl chains can exchange their positions with a probability given by the Metropolis method. All simulations started from a random mixture of the given composition. Equilibrium conditions were established after an initial 25,000–70,000 Monte Carlo steps. The ensemble average of the data was obtained in 10,000 Monte Carlo steps after equilibrium. Data were averaged from three independent runs.
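A minimal sketch of a nearest-neighbor exchange move with Metropolis acceptance is given below. It is not the author's code. It assumes that the six-neighbor lattice is stored in a square array with skewed axes, that energies are expressed in units of kT, and that only a pairwise excess mixing energy per unlike contact (called `dE_m` here) is present; the names `NEIGHBORS`, `unlike_contacts`, and `metropolis_exchange`, as well as the small lattice size, are hypothetical.

```python
import numpy as np

# Offsets of the six nearest neighbors on a hexagonal lattice stored in a skewed square array
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def unlike_contacts(lat, i, j):
    """Number of nearest neighbors of site (i, j) whose occupant differs from it."""
    n = lat.shape[0]
    return sum(lat[(i + di) % n, (j + dj) % n] != lat[i, j] for di, dj in NEIGHBORS)

def metropolis_exchange(lat, dE_m, rng):
    """Attempt to swap one randomly chosen nearest-neighbor pair; returns True if accepted."""
    n = lat.shape[0]
    i, j = rng.integers(n, size=2)
    di, dj = NEIGHBORS[rng.integers(6)]
    k, l = (i + di) % n, (j + dj) % n      # periodic boundary condition
    if lat[i, j] == lat[k, l]:
        return False                        # swapping identical occupants changes nothing
    before = unlike_contacts(lat, i, j) + unlike_contacts(lat, k, l)
    lat[i, j], lat[k, l] = lat[k, l], lat[i, j]
    after = unlike_contacts(lat, i, j) + unlike_contacts(lat, k, l)
    delta = dE_m * (after - before)         # energy change in units of kT
    if delta <= 0 or rng.random() < np.exp(-delta):
        return True
    lat[i, j], lat[k, l] = lat[k, l], lat[i, j]   # rejected: restore the old configuration
    return False

# Random starting mixture of fixed composition (1 = cholesterol, 0 = acyl chain)
rng = np.random.default_rng(0)
n = 24
n_chol = 200
flat = np.zeros(n * n, dtype=np.int8)
flat[:n_chol] = 1
rng.shuffle(flat)
lat = flat.reshape(n, n)
for _ in range(20 * n * n):
    metropolis_exchange(lat, dE_m=-1.0, rng=rng)
```

Because every move is an exchange, the numbers of cholesterols and acyl chains are conserved, which corresponds to a simulation at fixed composition.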

2.7. Pairwise interactions and multibody interactions

The Hamiltonian has two major components: one describing pairwise interactions and another describing the multibody interactions of cholesterol with its nearest neighbors (Huang and Feigenson, 1999):

$$H_{\mathrm{total}} = H_{\mathrm{pair}} + H_{\mathrm{multi}}. \tag{12.1}$$

The pairwise part of the Hamiltonian includes the interactions of acyl chain-acyl chain, cholesterol-cholesterol, and acyl chain-cholesterol contact pairs:

$$H_{\mathrm{pair}} = \frac{1}{2}\sum_{i,j} E_{aa} L_{ai} L_{aj} + \frac{1}{2}\sum_{i,j} E_{cc} L_{ci} L_{cj} + \frac{1}{2}\sum_{i,j} E_{ac}\,(L_{ai} L_{cj} + L_{ci} L_{aj}), \tag{12.2}$$

where $E_{aa}$, $E_{cc}$, and $E_{ac}$ are the interaction energies between acyl chains, between cholesterols, and between an acyl chain and a cholesterol, respectively; $L_{ai}$ and $L_{ci}$ are the occupation variables (= 0 or 1) of acyl chains and cholesterols, respectively. The summation over $i$ runs over all lattice sites, and the summation over $j$ runs over the nearest-neighbor sites of $i$ only. The factor 1/2 is necessary to avoid counting each contact pair twice. For this lattice system, the three interaction energies in Eq. (12.2) can be further reduced to just one independent variable. Eq. (12.2) is rewritten as:

$$H_{\mathrm{pair}} = \frac{Z}{2}\sum_{i} E_{aa} L_{ai} + \frac{Z}{2}\sum_{i} E_{cc} L_{ci} + \frac{1}{2}\sum_{i,j} \Delta E_m\,(L_{ai} L_{cj} + L_{ci} L_{aj}), \tag{12.3}$$

where $Z$ is the number of nearest neighbors of a lattice site, which is 6 for a hexagonal lattice. $\Delta E_m$ is the pairwise-additive excess mixing energy of acyl chains and cholesterols, defined as

$$\Delta E_m = E_{ac} - (E_{aa} + E_{cc})/2. \tag{12.4}$$
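The step from Eq. (12.2) to Eq. (12.3) is not written out in the text. One way to see it, sketched here under the assumption that every site is occupied by either an acyl chain or a cholesterol (so that $L_{aj} + L_{cj} = 1$ on each of the $Z$ neighbors), and writing $N_a = \sum_i L_{ai}$ and $N_c = \sum_i L_{ci}$, is:

```latex
\begin{align*}
\sum_{i,j} L_{ai} L_{aj} &= \sum_i L_{ai}\Bigl(Z - \sum_j L_{cj}\Bigr) = Z N_a - \sum_{i,j} L_{ai} L_{cj},\\
\sum_{i,j} L_{ci} L_{cj} &= \sum_i L_{ci}\Bigl(Z - \sum_j L_{aj}\Bigr) = Z N_c - \sum_{i,j} L_{ci} L_{aj},\\
\sum_{i,j} L_{ai} L_{cj} &= \sum_{i,j} L_{ci} L_{aj} = \tfrac{1}{2}\sum_{i,j}\,(L_{ai} L_{cj} + L_{ci} L_{aj}).
\end{align*}
```

Substituting these identities into Eq. (12.2) collects all composition-dependent contact terms into a single sum with the prefactor $E_{ac} - (E_{aa} + E_{cc})/2$, which is the $\Delta E_m$ of Eq. (12.4), and leaves the two terms proportional to $Z N_a$ and $Z N_c$.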


In a canonical Monte Carlo simulation, the total number of lattice sites ($N$), the number of acyl chains ($N_a = \sum_i L_{ai}$), the number of cholesterols ($N_c = \sum_i L_{ci}$), and the temperature ($T$) are all fixed for each simulation. Therefore, the first two terms in Eq. (12.3) are independent of the lipid lateral distribution, and the entire contribution of pairwise-additive interactions to the lipid mixing behavior is determined by the value of $\Delta E_m$ in the third term. The cholesterol multibody interaction with its six nearest neighbors is given by:

$$H_{\mathrm{multi}} = \sum_{i}\sum_{s=0}^{6} \Delta E_c\, c_s L_{si} L_{ci}, \tag{12.5}$$

where $\Delta E_c$ is the strength of the cholesterol multibody interaction, $c_s$ are the energy scaling factors, and $L_{si}$ is the environment variable of a lattice site, which is defined as

$$L_{si} = \begin{cases} 1, & \text{if site } i \text{ has } s \text{ cholesterols as its nearest neighbors} \\ 0, & \text{otherwise.} \end{cases}$$

In Eq. (12.5), the summation over $s$ runs over the seven possible environments for lattice site $i$: a site can have zero to six cholesterols as nearest neighbors. Thus, if a cholesterol molecule is surrounded by $s$ other cholesterols, the multibody interaction energy for this cholesterol is $\Delta E_c c_s$. No energy difference is assumed for the different arrangements of these $s$ cholesterols among the nearest-neighbor sites. Seven energy scaling factors ($c_0, c_1, \ldots, c_6$) define the relative magnitude of the multibody interaction in the seven possible situations, and $\Delta E_c$ determines the overall strength of the cholesterol multibody interaction.
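As an illustration of Eq. (12.5), the sketch below evaluates the multibody energy of a given configuration. It is not the author's implementation: the skewed-array representation of the hexagonal lattice, the periodic boundaries implemented with `np.roll`, and the names `NEIGHBORS` and `multibody_energy` are assumptions. The example coefficients are the second parameter set (MIEP II) listed in Table 12.2.

```python
import numpy as np

# Offsets of the six nearest neighbors on a hexagonal lattice stored in a skewed square array
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def multibody_energy(lat, c, dE_c):
    """Eq. (12.5): sum of dE_c * c[s] over cholesterol sites, s = number of cholesterol neighbors."""
    lat = np.asarray(lat, dtype=int)        # 1 = cholesterol, 0 = acyl chain
    c = np.asarray(c, dtype=float)          # scaling factors c_0 ... c_6
    # s[i, j] counts cholesterols among the six neighbors of (i, j), with periodic boundaries
    s = sum(np.roll(lat, shift=(di, dj), axis=(0, 1)) for di, dj in NEIGHBORS)
    return dE_c * float(np.sum(c[s] * lat))

c_miep2 = [0, 1, 9, 12, 15, 18, 21]         # MIEP II coefficients from Table 12.2

dimer = np.zeros((6, 6), dtype=int)
dimer[2, 2] = dimer[2, 3] = 1               # an isolated cholesterol dimer
trimer = dimer.copy()
trimer[2, 4] = 1                            # a linear trimer: the middle cholesterol has s = 2

e_dimer = multibody_energy(dimer, c_miep2, dE_c=0.5)
e_trimer = multibody_energy(trimer, c_miep2, dE_c=0.5)
```

With these coefficients the dimer costs 2 × 0.5 × c1 = 1.0 kT, whereas the trimer costs 0.5 × (c1 + c2 + c1) = 5.5 kT, which shows how the energy grows nonlinearly with the number of contacts.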

2.8. Calculation of chemical potential of cholesterol from simulation

To relate our microscopic interaction model to experimental data, it is crucial to be able to calculate the chemical potential of cholesterol from computer simulations. We applied the Kirkwood coupling parameter method (Chialvo, 1990; Haile, 1986) to calculate the excess Gibbs free energy of the cholesterol/phospholipid mixtures. Although this approach is computationally intensive, it provides complete information: the mixing free energy, enthalpy, entropy, and chemical potentials of cholesterol and acyl chains.


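Following steps similar to those described in our earlier paper (see Huang et al., 1993, appendix), the excess Gibbs free energy of a cholesterol/phospholipid mixture is given by

$$\Delta G^{E}(\Delta E_m, \Delta E_c) = N_0 \int_0^{\Delta E_m} \frac{\Bigl\langle \frac{1}{2}\sum_{i,j} (L_{ai} L_{cj} + L_{ci} L_{aj}) \Bigr\rangle_{\lambda_C = 0}}{N}\, d\lambda_E + N_0 \int_0^{\Delta E_c} \frac{\Bigl\langle \sum_{i}\sum_{s=0}^{6} c_s L_{si} L_{ci} \Bigr\rangle_{\lambda_E = \Delta E_m}}{N}\, d\lambda_C - N_0\, \Delta E_c\, c_6 \frac{N_c}{N}, \tag{12.6}$$

where $N_0$ is Avogadro's number; $\lambda_E$ and $\lambda_C$ are coupling parameters; and angle brackets denote an ensemble average from Monte Carlo simulations. By performing numerical integration, $\Delta G^{E}$ can be calculated. The excess enthalpy $\Delta H^{E}$ and entropy $\Delta S^{E}$ are given by:

$$\Delta H^{E} = N_0 \Bigl\langle \frac{1}{2}\sum_{i,j} (L_{ai} L_{cj} + L_{ci} L_{aj}) \Bigr\rangle \frac{\Delta E_m}{N} + N_0 \Bigl\langle \sum_{i}\sum_{s=0}^{6} c_s L_{si} L_{ci} \Bigr\rangle \frac{\Delta E_c}{N} - N_0\, \Delta E_c\, c_6 \frac{N_c}{N}, \tag{12.7}$$

and

$$\Delta S^{E} = (\Delta H^{E} - \Delta G^{E})/T. \tag{12.8}$$

The excess chemical potentials of cholesterol and phospholipid ($\mu^{E}_{\mathrm{chol}}$ and $\mu^{E}_{\mathrm{lipid}}$) can be obtained by differentiating $\Delta G^{E}$,

$$\Delta G^{E} = \mu^{E}_{\mathrm{chol}}\, N_c/N + \mu^{E}_{\mathrm{lipid}}\, N_a/N. \tag{12.9}$$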
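The numerical integration in Eq. (12.6) can be sketched as follows. This is a hedged illustration rather than the procedure actually used: it assumes that the two ensemble averages have already been obtained from separate simulations on a grid of coupling-parameter values, it uses a simple trapezoid rule, and it returns the free energy per lattice site without the factor $N_0$. The function names and the grid values are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid-rule integral of samples y over grid x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def excess_gibbs_per_site(lam_E, pair_avg, lam_C, multi_avg, c6, x_c):
    """Eq. (12.6) divided by N0.

    lam_E     : grid of the pairwise coupling parameter, from 0 to dE_m
    pair_avg  : <(1/2) sum_ij (La_i Lc_j + Lc_i La_j)> / N at each lam_E (lambda_C = 0)
    lam_C     : grid of the multibody coupling parameter, from 0 to dE_c
    multi_avg : <sum_i sum_s c_s L_si L_ci> / N at each lam_C (lambda_E = dE_m)
    c6        : scaling factor for a cholesterol with six cholesterol neighbors
    x_c       : N_c / N, fraction of lattice sites occupied by cholesterol
    """
    dE_c = float(lam_C[-1])
    return trapezoid(pair_avg, lam_E) + trapezoid(multi_avg, lam_C) - dE_c * c6 * x_c

# Consistency check on a limiting case: a lattice filled only with cholesterol has no
# acyl chain-cholesterol contacts, and every cholesterol has s = 6, so the result is zero.
lam_E = np.linspace(0.0, -3.0, 7)
lam_C = np.linspace(0.0, 0.5, 6)
g_pure = excess_gibbs_per_site(lam_E, np.zeros(7), lam_C, 21.0 * np.ones(6), c6=21.0, x_c=1.0)
```

The vanishing result for pure cholesterol reflects the role of the last term in Eq. (12.6), which removes the multibody energy of the pure-cholesterol reference state.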

For convenience, we choose the standard state of $\mu^{E}_{\mathrm{chol}}$ as that in which cholesterols are infinitely dilute in a phospholipid bilayer, and the standard state of $\mu^{E}_{\mathrm{lipid}}$ as that in a pure phospholipid bilayer. Because each phospholipid has two acyl chains, the mole fraction of cholesterol in a bilayer is given by

$$\chi_{\mathrm{chol}} = 2N_c/(2N_c + N_a). \tag{12.10}$$
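Equation (12.10) can be inverted to find the fraction of lattice sites occupied by cholesterol at a given mole fraction. The short sketch below uses exact fractions; the function names are hypothetical, and the inversion $N_c/N = \chi_{\mathrm{chol}}/(2 - \chi_{\mathrm{chol}})$ follows from Eq. (12.10) with $N = N_c + N_a$.

```python
from fractions import Fraction

def mole_fraction(n_c, n_a):
    """Eq. (12.10): cholesterol mole fraction from lattice occupancies (two chains per lipid)."""
    return Fraction(2 * n_c, 2 * n_c + n_a)

def site_fraction(chi):
    """Fraction of lattice sites occupied by cholesterol, N_c / N with N = N_c + N_a."""
    return chi / (2 - chi)

# Cholesterol:phospholipid ratios of 1:1, 4:3, and 2:1
chi_values = [Fraction(1, 2), Fraction(4, 7), Fraction(2, 3)]
sites = [site_fraction(chi) for chi in chi_values]
```

For example, a mole fraction of 2/3 corresponds to cholesterol occupying exactly half of the lattice sites, and a mole fraction of 1/2 corresponds to one third of the sites.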

3. Result and Discussion

3.1. Maximum solubility of cholesterol in PC bilayers

The maximum solubility of cholesterol in a lipid bilayer ($\chi^{*}_{\mathrm{chol}}$) is the upper limit of the mole fraction of cholesterol that can be incorporated into a lipid bilayer. When the cholesterol mole fraction of a sample exceeds this limit, excess cholesterol precipitates from the bilayer to form cholesterol monohydrate crystals. In terms of thermodynamics, this limit defines the phase boundary that separates a 1-phase region containing lipid bilayers in the liquid-ordered phase from a 2-phase region containing lipid bilayers saturated with cholesterol and cholesterol monohydrate crystals. Making a truly equilibrium phospholipid and cholesterol suspension at high cholesterol mole fraction is a challenging task, and the key to making an accurate measurement of $\chi^{*}_{\mathrm{chol}}$ is good sample preparation. It has been shown that artifactual demixing of cholesterol can occur during conventional sample preparation (either by a dry film method or by lyophilization) and that this demixed cholesterol may produce artifactual cholesterol crystals in a sample at a $\chi_{\mathrm{chol}}$ far below the true cholesterol solubility limit. Therefore, a falsely low $\chi^{*}_{\mathrm{chol}}$ value could result from conventional sample preparation. Two novel preparative methods, LTT (Huang et al., 1999) and RSE (Buboltz and Feigenson, 1999), have been developed to prevent the demixing. Both methods can produce truly equilibrium phospholipid and cholesterol suspensions.

We found that X-ray diffraction can quantitatively and sensitively detect the formation of cholesterol monohydrate crystals. Experiments based on LTT or RSE sample preparation methods yield reproducible and precise cholesterol solubility limits. Fig. 12.1 shows a typical result of the X-ray diffraction experiment: the diffraction intensity of cholesterol monohydrate crystals remains zero until $\chi_{\mathrm{chol}}$ in POPC bilayers exceeds 0.66, which is therefore taken as the maximum solubility of cholesterol in POPC bilayers. Table 12.1 summarizes the measured values of $\chi^{*}_{\mathrm{chol}}$ in various lipid bilayers. Here, a clear pattern emerges: $\chi^{*}_{\mathrm{chol}}$ is 0.66 for many phosphatidylcholine bilayers and 0.51 for a phosphatidylethanolamine bilayer (Huang et al., 1999).

[Figure 12.1 plot. Left axis: cholesterol crystal diffraction intensity (AU), X-ray data. Right axis: COD initial reaction rate (AU), COD data. Horizontal axis: χchol. Inset: absorbance versus time (s) for χc = 0.54, 0.60, and 0.64.]

Figure 12.1 Experimental determination of the maximum solubility of cholesterol, $\chi^{*}_{\mathrm{chol}}$, in POPC bilayers. Circles: Cholesterol crystal diffraction intensity versus cholesterol mole fraction. The intensity increases steadily for $\chi_{\mathrm{chol}} > 0.66$. Squares: The initial reaction rate of COD as a function of $\chi_{\mathrm{chol}}$. The rate has a sharp peak at $\chi_{\mathrm{chol}} = 0.67$. Inset: COD reaction progress curves at $\chi_{\mathrm{chol}} = 0.54$, 0.60, and 0.64.

Table 12.1 Maximum solubility of cholesterol in PC and PE bilayers measured by X-ray diffraction or COD activity assay

Bilayer type             Maximum solubility by      Maximum solubility by
                         X-ray diffraction          COD activity assay
16:0, 18:1-PE (POPE)     0.51 (0.01)a
16:0, 18:1-PC (POPC)     0.66 (0.01)a               0.67 (0.015)b
di22:1-PC                0.66 (0.01)a
di16:0-PC (DPPC)         0.66 (0.01)a               0.68 (0.015)b
di12:0-PC (DLPC)         0.66 (0.01)a               0.67 (0.015)c
di14:0-PC (DMPC)                                    0.67 (0.015)c
di18:1-PC (DOPC)                                    0.66 (0.015)b

a From Huang et al., 1999.
b From Ali et al., 2007.
c Unpublished data.

Recently, we developed a cholesterol oxidase (COD) activity assay, which can also be used to measure $\chi^{*}_{\mathrm{chol}}$ in various lipid mixtures. After the injection of the COD enzyme, the height of the absorption peak of the COD assay product, red quinoneimine, at 500 nm was found to increase steadily with time, indicating the progress of the COD-mediated cholesterol oxidation reaction. The inset of Fig. 12.1 shows the absorption at 500 nm versus time for three POPC/cholesterol mixtures: $\chi_{\mathrm{chol}} = 0.54$, 0.60, and 0.64. Because the accumulation of the oxidized cholesterol product (i.e., cholest-4-en-3-one) in the lipid bilayer may alter the membrane properties, only the first 40 s of data was used to fit a second-order polynomial and to calculate the initial rate (i.e., initial slope) of the reaction. Fig. 12.1 also shows the initial rate of the COD reaction as a function of $\chi_{\mathrm{chol}}$ in POPC bilayers. The initial reaction rate has a sharp peak at $\chi_{\mathrm{chol}}$ of 0.67, which coincides, within experimental uncertainty, with $\chi^{*}_{\mathrm{chol}}$ in POPC bilayers measured by X-ray diffraction.

Why does the initial rate of the COD reaction peak at the composition corresponding to the maximum solubility of cholesterol? Below $\chi^{*}_{\mathrm{chol}}$, the initial rate increases with $\chi_{\mathrm{chol}}$. Ahn and Sampson have pointed out that the initial rate of the COD reaction is related to the chemical potential of cholesterol ($\mu_{\mathrm{chol}}$) in a lipid bilayer (Ahn and Sampson, 2004). As discussed in later sections, $\mu_{\mathrm{chol}}$ should increase sharply near $\chi^{*}_{\mathrm{chol}}$ to approach that in cholesterol monohydrate crystals. When the overall cholesterol mole fraction is above $\chi^{*}_{\mathrm{chol}}$, the COD initial reaction rate no longer reflects the behavior of $\mu_{\mathrm{chol}}$ and drops sharply for the following reasons: (1) because every sample has an identical amount of cholesterol (60 μg), when the overall cholesterol mole fraction of a sample is above $\chi^{*}_{\mathrm{chol}}$, some cholesterol is in the crystal form, and less cholesterol remains in the bilayer phase available to react with COD; (2) cholesterol microcrystals could physically change the packing order of lipids and the curvature of lipid bilayers and thereby affect COD activity.
In previous X-ray diffraction experiments, it was found that as soon as the cholesterol mole fraction passes the solubility limit and cholesterol monohydrate crystals begin to form, the broad wide-angle diffraction peak at 4.9 Å, corresponding to the acyl chain packing in the bilayers, quickly disappears (Huang et al., 1999). Because the location of the sharp peak in the COD initial reaction rate coincides with the cholesterol solubility limit, the COD activity assay allows us to measure $\chi^{*}_{\mathrm{chol}}$ conveniently and sensitively. Table 12.1 also lists some $\chi^{*}_{\mathrm{chol}}$ values measured by the COD activity assay, and the numbers agree with those obtained by X-ray diffraction within experimental uncertainty.

The values of $\chi^{*}_{\mathrm{chol}}$ in binary mixtures of cholesterol and phospholipid show a very interesting pattern: for many PCs studied, regardless of acyl chain type (di12:0, di14:0, di16:0, 16:0,18:1, or di22:1), the value of $\chi^{*}_{\mathrm{chol}}$ is 0.67. In contrast, we found the distinctly different result that $\chi^{*}_{\mathrm{chol}}$ for POPE is 0.5. These results naturally lead to the following questions: Why are the values of $\chi^{*}_{\mathrm{chol}}$ insensitive to the type of phospholipid acyl chains? Why do the values of $\chi^{*}_{\mathrm{chol}}$ occur close to cholesterol:phospholipid mole ratios of 1:1 and 2:1? What is the lateral packing of cholesterol and phospholipids at the cholesterol solubility limit? What kind of microscopic interactions in cholesterol/phospholipid mixtures could give rise to the observed values of 0.67 and 0.5? It turns out that thermodynamics is a key tool for answering these questions.

3.1.1. General picture of the chemical potential of cholesterol near $\chi^{*}_{\mathrm{chol}}$

The chemical potential of cholesterol ($\mu_{\mathrm{chol}}$), an important thermodynamic quantity, is a function of $\chi_{\mathrm{chol}}$. $\mu_{\mathrm{chol}}$ can be interpreted as the molar free energy cost of adding more cholesterol to a lipid bilayer; thus, its value reflects the interactions of cholesterol molecules with their surrounding molecules. Given the experimental data for $\chi^{*}_{\mathrm{chol}}$ in Table 12.1, what is the general picture of the way in which the cholesterol chemical potential changes as its mole fraction increases in a bilayer? First, the chemical potential of cholesterol in the monohydrate crystal, $\mu^{\mathrm{crystal}}_{\mathrm{chol}}$, is a constant, because cholesterol monohydrate crystal is a pure substance. On the other hand, the chemical potential of cholesterol in a lipid bilayer, $\mu_{\mathrm{chol}}$, is a function of the bilayer composition. When $\chi_{\mathrm{chol}} < \chi^{*}_{\mathrm{chol}}$, $\mu_{\mathrm{chol}}$ must be less than $\mu^{\mathrm{crystal}}_{\mathrm{chol}}$ for the bilayer to be the only stable phase. When $\chi_{\mathrm{chol}} \geq \chi^{*}_{\mathrm{chol}}$, the lipid bilayer phase and the cholesterol monohydrate crystal phase coexist. In this 2-phase region, the chemical potential of cholesterol in the bilayer must be equal to the chemical potential of cholesterol in the monohydrate crystal, $\mu_{\mathrm{chol}} = \mu^{\mathrm{crystal}}_{\mathrm{chol}}$. Thus, as $\chi_{\mathrm{chol}}$ increases to the value of $\chi^{*}_{\mathrm{chol}}$, the chemical potential of cholesterol in the bilayer must increase to equal $\mu^{\mathrm{crystal}}_{\mathrm{chol}}$. Because the $\chi^{*}_{\mathrm{chol}}$ values for PCs with very different acyl chains are almost identical, whatever the nature of the microscopic interaction that induces cholesterol precipitation, it must cause the chemical potential of cholesterol in the bilayer to increase so sharply at $\chi^{*}_{\mathrm{chol}}$ that other contributions can be neglected.
The experimental measurement of mchol (see section 3.3) showed that this general picture of mchol is correct. 3.1.2. Multibody interactions and jumps in cholesterol chemical potential What kind of molecular interaction between cholesterol and PC can produce a steep increase of chemical potential of cholesterol at wchol ¼ 0.50 and 0.67 (or at cholesterol:PC ratio of 1:1 and 2:1)? We explored various molecular interactions using Monte Carlo simulation, and reached a very interesting conclusion. A relatively simple form of the cholesterol multibody interaction can produce a sharp increase of mchol either at wchol ¼ 0.50, or 0.571, or 0.667 (i.e., at cholesterol:PC ratio of 1:1, or 4:3, or 2:1). Accompanying each steep increase, the lateral distribution of cholesterol in the bilayer adopts a well-defined regular distribution (RD) pattern: a hexagonal monomer pattern at wchol ¼ 0.5, an aligned dimer pattern at 0.571, and a maze pattern at 0.667 (Fig. 12.4). In general, pairwise

342

Juyang Huang

interactions cannot produce these steep increases. Thus, the data of maximum solubility of cholesterol is only consistent with the assumption that the key interaction between cholesterol and phospholipid is a multibody interaction. For a pairwise interaction, the number of interaction pairs is counted, and the total interaction energy increases linearly with the number of interaction pairs. For example in Eq. (12.2), Ecc is the interaction energy for a cholesterol-cholesterol pair. If there are n cholesterol-cholesterol contacts, then the total energy is simply nEcc. Thus, the total pairwise energy is the sum of the energies of each individual pair. In contrast, the multibody interaction energy is a description of the interactions of all nearest neighbors considered as a group. The multibody interaction Hamiltonian in Eq. (12.5) allows the total interaction energy to increase nonlinearly with the number of cholesterol-cholesterol contacts. For example, if a cholesterol has two cholesterol-cholesterol contacts, the interaction energy would be c2DEc instead of 2DEc. Here c2 is a chosen parameter. Thus, six parameters, c1. . .c6 (c0 is always 0) need to be specified for the cholesterol multibody interaction in a hexagonal lattice. Table 12.2 lists four sets of multibody interaction energy parameters (MIEP), which can produce jumps in mEchol at certain cholesterol mole fractions. It is interesting to notice that a pairwise interaction is actually a special case of multibody interaction. For example, in the first parameter set, MIEP I, the energy scaling factors of cholesterol multibody interaction are assigned as (c0, c1, . . ., c6) ¼ (0, 1, 2, 3, 4, 5, 6) (i.e., the total energy increases linearly with the number of cholesterolcholesterol contacts), and the multibody interaction Hamiltonian is reduced to a pairwise Hamiltonian. 
Thus, the term, ‘‘cholesterol multibody interaction’’, precisely means the nonlinear increase of the total interaction energy with the number of cholesterol-cholesterol contacts. The second set, MIEP II, is constructed to create a relatively small energy cost for each pair of cholesterols that are in contact but completely surrounded by phospholipids (i.e., a dimer), with a much greater energy cost for three cholesterols in contact. The first cholesterol-cholesterol Table 12.2 Examples of cholesterol multibody interaction energy parameter set for MC simulation

Multibody interaction energy parameter set

Energy cost for each additional cholesterol-cholesterol contact

Eq. (12.5) coefficients c0 c1 c2 c3 c4 c5 c6

MIEP I MIEP II MIEP III MIEP IV

0111111 0183333 0 1 1 10 3 3 3 0123456

0123456 0 1 9 12 15 18 21 0 1 2 12 15 18 21 0 1 3 6 10 15 21

Thermodynamics of Cholesterol-Lipid Interaction

343

Figure 12.2 Snapshots of phospholipid and cholesterol lateral distribution simulated using MIEP II. (A) wchol ¼ 0.51 and DEC ¼ 0.5 kT; (B) wchol ¼ 0.57 and DEC ¼ 0.5 kT. Cholesterols form an aligned dimer pattern; (C) wchol ¼ 0.62 and DEC ¼ 0.5 kT; (D) wchol ¼ 0.57 and DEC ¼ 0.2 kT. Filled circles: cholesterol. Open circles: Acyl chains of PC.

contact costs 1 unit of energy, whereas the next one costs 8 energy units, much higher than the first. Higher-order contacts are less critical, so each is assigned 3 units. Thus, c_1 = 1, c_2 = 1 + 8 = 9, c_3 = 1 + 8 + 3 = 12, and so on. Fig. 12.2 shows snapshots of cholesterol and acyl chain lateral distributions simulated with MIEP II. With a low magnitude of ΔE_c (0.2 kT), as shown in Fig. 12.2D, the distribution at χ_chol = 0.57 shows no particular pattern. However, at the same composition with ΔE_c = 0.5 kT, cholesterol molecules form a regular distribution (RD): an aligned dimer pattern, in which each cholesterol has exactly one cholesterol-cholesterol contact (Fig. 12.2B). Fig. 12.3B shows the excess chemical potential of cholesterol versus χ_chol: μ^E_chol has a sharp jump at 0.57 for ΔE_c ≥ 0.5 kT. Because μ_chol = kT ln χ_chol + μ^E_chol, and the first term (i.e., the ideal mixing chemical potential) is a smooth function of χ_chol, a jump in μ^E_chol is equivalent to a jump in μ_chol. Understanding the mechanism of dimer pattern formation helps one understand how other multibody interaction sets work. The formation of the dimer pattern and the abrupt increase of μ^E_chol at χ_chol = 0.57 are direct results of this particular choice of multibody interaction parameters. In Table 12.2, MIEP II is deliberately formulated to make the second cholesterol-cholesterol contact cost much more energy than the first. The mixture responds to MIEP II with a distribution that minimizes the number of second cholesterol-cholesterol contacts, as shown in Fig. 12.2B. At χ_chol = 0.57, the cholesterol:PC ratio is 4:3.
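The relation between the two halves of Table 12.2 can be checked with a short script. The following is only a sketch, not the simulation code of the original study: it treats each Eq. (12.5) coefficient as the running total of the per-contact costs, as the worked values c_1 = 1, c_2 = 9, c_3 = 12 suggest. The function names are illustrative assumptions.

```python
from itertools import accumulate

# Energy cost of each additional cholesterol-cholesterol contact (entries for
# 0..6 contacts, in units of ΔE_c), transcribed from Table 12.2.
INCREMENTAL_COST = {
    "MIEP I":   [0, 1, 1, 1, 1, 1, 1],
    "MIEP II":  [0, 1, 8, 3, 3, 3, 3],
    "MIEP III": [0, 1, 1, 10, 3, 3, 3],
    "MIEP IV":  [0, 1, 2, 3, 4, 5, 6],
}

def coefficients(costs):
    """Return c_0..c_6 as running totals of the per-contact costs."""
    return list(accumulate(costs))

def is_pairwise(c):
    """True when c_n = n, i.e. the energy is linear in the number of contacts."""
    return all(value == n for n, value in enumerate(c))

def is_accelerating(costs):
    """True when every additional contact costs strictly more than the previous one."""
    steps = costs[1:]  # the first entry (zero contacts) is not a contact cost
    return all(later > earlier for earlier, later in zip(steps, steps[1:]))

for name, costs in INCREMENTAL_COST.items():
    c = coefficients(costs)
    print(f"{name:9s} c = {c}  pairwise = {is_pairwise(c)}  "
          f"accelerating = {is_accelerating(costs)}")
```

Under these assumptions only MIEP I is flagged as pairwise, and only MIEP IV has per-contact costs that rise at every step.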


[Figure 12.3 plots omitted: five panels (A to E) with χ_chol on the horizontal axis and μ^E_chol/RT (panels A, B, C, E) or μ^E_lipid/RT (panel D) on the vertical axis. Panel titles: (A) Pairwise-additive or MIEP I; (B) MIEP II; (C, D) MIEP III; (E) MIEP IV.]

Figure 12.3 Excess chemical potential of cholesterol, μ^E_chol, or of phospholipid, μ^E_lipid, as a function of χ_chol. (A) μ^E_chol, simulated using MIEP I (equivalent to pairwise); (B) μ^E_chol, simulated using MIEP II; (C) μ^E_chol and (D) μ^E_lipid, simulated using MIEP III; (E) μ^E_chol, simulated using MIEP IV. From Huang and Feigenson (1999), with permission.

The dimer pattern is the only pattern at this composition in which no cholesterol has a second cholesterol-cholesterol contact (i.e., it is the minimum-energy distribution at this composition). MIEP III can produce a steep increase of μ^E_chol and the maze regular distribution pattern at χ_chol = 0.67. As shown in Fig. 12.4F, at ΔE_c = 0.6 kT, cholesterol molecules form the maze pattern, in which the majority of cholesterols have two cholesterol-cholesterol contacts. MIEP III is formulated so that the first two cholesterol-cholesterol contacts cost only 1 energy unit each, but the third costs 10 units. The mixture responds to MIEP III with another RD, which minimizes the number of third cholesterol-cholesterol contacts, as shown in Fig. 12.4F. The excess chemical potentials of the mixture components are shown in Fig. 12.3C,D.


Figure 12.4 Six regular distributions of cholesterol that have been successfully simulated using MC simulation based on the umbrella model. Black circles: cholesterol. Open circles: acyl chains of PC. (A) χ_chol = 0.154; (B) χ_chol = 0.25; (C) χ_chol = 0.40; (D) the "monomer" pattern at χ_chol = 0.50; (E) the dimer pattern at χ_chol = 0.571; (F) the "maze" pattern at χ_chol = 0.667, which is the maximum solubility limit of cholesterol in PC.

With ΔE_c ≥ 0.3 kT, μ^E_chol has a steep increase at χ_chol = 0.67. The excess chemical potential of the acyl chains is plotted in Fig. 12.3D; it shows the decrease at χ_chol = 0.67 expected from the Gibbs-Duhem equation. Similarly, MIEP I can produce a steep increase of μ^E_chol at χ_chol = 0.50. Fig. 12.4D is a snapshot of the lipid distribution simulated with MIEP I. When ΔE_c ≥ 4 kT (i.e., ΔE_m ≤ −2 kT, using Eq. (12.4)), cholesterol molecules form a hexagonal monomer pattern, in which the majority of cholesterols have no cholesterol-cholesterol contact. In this case, the energy cost of the first cholesterol-cholesterol contact is so high that cholesterol avoids any clustering. The excess chemical potential of cholesterol is shown in Fig. 12.3A: for ΔE_m < −2 kT, μ^E_chol has a steep increase at χ_chol = 0.50. MIEP I, MIEP II, and MIEP III were each designed to produce a single regular distribution pattern and a cholesterol chemical potential jump, at χ_chol = 0.50, 0.57, and 0.67, respectively. It is also possible to have an MIEP set that produces all three regular distribution patterns and jumps. The basic requirement is an accelerating increase of interaction energy with the number of cholesterol-cholesterol contacts, i.e., (c_{i+1} − c_i) > (c_i − c_{i−1}). MIEP IV in Table 12.2 satisfies this requirement. In MIEP IV, the energy cost for each additional cholesterol-cholesterol contact was chosen to be 1 unit higher than the preceding one. Thus, c_i increases nonlinearly from 0 to 21. Fig. 12.3E shows the excess chemical


[Figure 12.5 plot omitted: ΔG^E, ΔH^E, and −TΔS^E (in units of RT) plotted against χ_chol from 0.4 to 0.75.]

Figure 12.5 ΔG^E, ΔH^E, and −TΔS^E as functions of cholesterol mole fraction, for ΔE_c = 0.6 kT, simulated with MIEP II. From Huang and Feigenson (1999), with permission.

potential of cholesterol versus χ_chol. This curve has three steep increases for ΔE_c ≥ 3 kT: a steep jump at χ_chol = 0.5, and more gradual rises at 0.57 and 0.67. The steepness of each increase can be adjusted by fine-tuning the MIEP parameters. Fig. 12.5 shows a plot of ΔG^E and ΔH^E, together with −TΔS^E, for ΔE_c = 0.6 kT, simulated with MIEP II. ΔG^E and ΔH^E are both at global minima at χ_chol = 0.57. On the other hand, −TΔS^E has a peak at χ_chol = 0.57. This indicates that by forming the aligned dimer pattern at χ_chol = 0.57, as in Fig. 12.2B, the excess entropy ΔS^E is drastically reduced, so the entropic part of the Gibbs free energy, −TΔS^E, rises to a peak there. In general, whenever a regular distribution is formed, the excess entropy ΔS^E reaches a minimum, and therefore −TΔS^E is always at a peak. In contrast, ΔH^E is always at a local (sometimes global) minimum. ΔG^E, on the other hand, is not always at a minimum, but usually has a sudden change of slope at a regular distribution composition.

3.1.3. Physical origin of the cholesterol multibody interaction: The umbrella model

The key characteristic of the multibody interaction is that the energy cost for some higher-order cholesterol-cholesterol contacts must become higher than that for lower-order contacts, which produces steep increases in cholesterol chemical potential at χ_chol = 0.50 and 0.67. The umbrella


model was proposed to explain the physical origin of the multibody interactions (Huang and Feigenson, 1999). Cholesterol has a large nonpolar steroid ring body and a relatively small polar hydroxyl headgroup. In water, cholesterol forms monohydrate crystals instead of bilayers, because the small hydroxyl group is unable to protect its large nonpolar body from water. When cholesterols are incorporated into a phospholipid bilayer, neighboring phospholipid headgroups provide cover to shield the nonpolar part of cholesterol from exposure to water to avoid the unfavorable free energy. Cholesterol hydroxyl groups also interact at the aqueous interface to provide partial coverage, but these hydroxyls cannot completely cover the nonpolar part of cholesterol without help from phospholipid headgroups. This is illustrated schematically in Fig. 12.6A. Phospholipid headgroups act like umbrellas. The space under the headgroups is shared by acyl chains and cholesterols. As the cholesterol content in a bilayer increases, phospholipids laterally redistribute and their polar headgroups reorient to provide more coverage per headgroup for the increasing fraction of cholesterol molecules, as drawn

[Figure 12.6 schematic omitted. Panel titles: (A) Low concentration; (B) Saturation concentration; (C) Unfavorable situation; (D) Crystal precipitation, with cholesterol crystals labeled.]

Figure 12.6 The umbrella model and the physical interpretation of the cholesterol maximum solubility limit. (A) At low cholesterol concentration, headgroups of neighboring PCs easily cover the hydrophobic bodies of cholesterol. Motion of acyl chains next to cholesterol is restricted by the rigid body of cholesterol. (B) At the saturation concentration, headgroups of PCs work together to cover the maximum amount of cholesterol. The acyl chains of PCs become highly ordered. (C) If more cholesterol were added and stayed in the bilayer, patches of pure cholesterol would form, and some cholesterol would be exposed to water, which is highly unfavorable. (D) To avoid forming cholesterol patches, excess cholesterol precipitates and forms cholesterol crystals, and the lipid bilayer retains the maximum amount of cholesterol that can be covered.


in Fig. 12.6B. The headgroup umbrellas are stretched to provide more coverage area. Under the umbrella, acyl chains and cholesterol molecules become tightly packed. No cholesterol is exposed to water at this point. It is much easier for neighboring phospholipids to cover a cholesterol monomer than a cholesterol cluster, so the free energy cost of coverage must increase rapidly with the size of the cholesterol cluster. This is likely the physical origin of the large increase of multibody interaction energy for some higher-order cholesterol-cholesterol contacts. As cholesterol concentration increases, fewer and fewer lateral distributions of lipid can satisfy the coverage requirement. This forces cholesterol to form regular distributions (e.g., hexagonal or maze patterns) in the bilayer. As illustrated in Fig. 12.6C, once the phospholipid headgroups are stretched to their limits, they can no longer shield additional cholesterols. Exposure of cholesterol to water is very unfavorable. This may well be the large increase in energy cost for acquiring an additional cholesterol-cholesterol contact that is needed to produce the regular distribution of cholesterol and the steep jump of μ_chol at critical mole fractions. To lower the overall free energy, instead of allowing the hydrophobic regions of bilayers to be exposed to water, excess cholesterol molecules precipitate, forming cholesterol monohydrate crystals, as shown in Fig. 12.6D. Therefore, in the umbrella model, χ*_chol has a clear physical meaning: it is the cholesterol mole fraction at which the capability of phospholipid headgroups to shield cholesterol molecules from water has reached its maximum. Any additional cholesterol in the bilayer would be exposed to water. The difference between the χ*_chol values of PE and PC may originate from the size of their headgroups. Here, we mean the effective size of the headgroup, including bound water.
The smaller headgroup of PE would be less effective than that of PC in providing shielding, so PE/cholesterol mixtures would have a high energy cost even for covering a cholesterol dimer cluster. This is equivalent to a relatively high energy cost for the first cholesterol-cholesterol contact in our simulation, which results in μ_chol rising sharply at 0.50, and thus χ*_chol = 0.50. At its maximum solubility limit in a PE bilayer, cholesterol forms the hexagonal lateral distribution pattern, in which all cholesterols stay as monomers. In contrast, the larger headgroup of PC might well accommodate the first two cholesterol-cholesterol contacts but be overwhelmed by the third. The energy cost profile could then be modeled by MIEP III (i.e., a very high energy cost for the third cholesterol-cholesterol contact). This would result in χ*_chol = 0.67. At its maximum solubility limit in a PC bilayer, cholesterol forms the maze lateral distribution pattern. If the size of the phospholipid headgroup is the dominating factor, then the lack of acyl chain dependence of χ*_chol for PCs is explained. Thus, the cholesterol multibody interaction described here is the parameterization of the free energy cost for the bilayer to cover a cholesterol


cluster, and this cost indeed should increase nonlinearly with the size of the cholesterol cluster.

3.1.4. The intrinsic connection between a regular distribution and a jump in chemical potential

Is there, in general, an intrinsic connection between a regular distribution and a jump in chemical potential? The answer is yes. This connection is best explained by Widom's (1963) test particle insertion method. As pointed out by Widom, the excess chemical potential of a particle can be estimated by an ensemble average

$$\mu^{E} = -kT \ln \left\langle \exp\left(-V_{\text{test}}/kT\right) \right\rangle, \tag{12.11}$$

where V_test is the potential energy change that would result from the addition of a ghost particle to the system. We will use the regular distribution at χ_chol = 0.57 (Fig. 12.2B) as an example. This regular distribution is characterized by each cholesterol having exactly one cholesterol as its nearest neighbor. Because MIEP II is deliberately formulated to make the second contact cost much more energy than the first, cholesterol molecules avoid having two or more cholesterol-cholesterol contacts in order to lower the energy of the whole system. At χ_chol below 0.57, for example at χ_chol = 0.51 (Fig. 12.2A), the concentration of cholesterol is relatively low, and there are many sites into which a ghost cholesterol can be inserted without creating multiple cholesterol-cholesterol contacts. These low-energy insertions make the dominant contribution to the ensemble average in Eq. (12.11), so the cholesterol chemical potential is low. However, at χ_chol = 0.57 or greater (Fig. 12.2B or 12.2C), any insertion of a ghost cholesterol into the distribution creates multiple cholesterol-cholesterol contacts. Thus, the energy cost of insertion must suddenly become higher, and the chemical potential evaluated by Eq. (12.11) must have a jump at χ_chol = 0.57. The magnitude of the jump is largely determined by the energy difference between the first and the second cholesterol-cholesterol contacts, i.e., (c_2 − c_1)ΔE_c.

To take the preceding argument a step further, a general statement can be made: any stable regular distribution must result in a jump in chemical potential, regardless of the details of the distribution and the molecular interactions. This statement can be justified as follows. Comparing a highly ordered RD with disordered distributions at the same composition, the difference in free energy, ΔG = G(RD) − G(other) = ΔH − TΔS, must be negative for the given molecular interactions. The entropy of the RD state must be lower, so −TΔS must be positive, as demonstrated in Fig. 12.5. Thus, ΔH must be negative (i.e., the energy of a regular distribution must be lower than that of any disordered distribution). At a cholesterol mole fraction slightly below the RD composition, there will be


some areas not covered by the RD pattern, and the energy cost of inserting a cholesterol in these areas will be low. Again, these low-energy insertions make the dominant contribution to the ensemble average in Eq. (12.11). However, at an RD composition, the entire membrane is in the regular distribution. Adding a cholesterol to a regular distribution now carries a higher energy cost, because it creates a more disordered state, and this results in a jump in chemical potential.
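The insertion argument can be made concrete with a toy calculation. The sketch below evaluates the Widom average of Eq. (12.11) for two made-up sets of ghost-insertion energies; the numbers (0.5 kT and 4.5 kT, and the 20% share of low-energy sites) are illustrative assumptions, not simulation output, and the function name is hypothetical.

```python
import math

def widom_excess_mu(insertion_energies, kT=1.0):
    """Estimate mu^E = -kT ln<exp(-V_test/kT)> from sampled insertion energies."""
    # Factor out the lowest energy so the exponentials stay well scaled.
    v_min = min(insertion_energies)
    mean_boltzmann = sum(
        math.exp(-(v - v_min) / kT) for v in insertion_energies
    ) / len(insertion_energies)
    return v_min - kT * math.log(mean_boltzmann)

LOW, HIGH = 0.5, 4.5  # illustrative insertion energies in kT (assumed values)

# Slightly below the RD composition: a minority of sites still accept a ghost
# cholesterol cheaply. At the RD composition: every insertion is expensive.
below_rd = [LOW] * 20 + [HIGH] * 80
at_rd = [HIGH] * 100

mu_below = widom_excess_mu(below_rd)
mu_at = widom_excess_mu(at_rd)
print(f"below RD: {mu_below:.2f} kT   at RD: {mu_at:.2f} kT")
```

Even though only one site in five is cheap in the first case, those sites dominate the exponential average, so the estimate stays well below the high insertion energy until they disappear altogether.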

3.2. The competition between cholesterol and ceramide in POPC bilayers

More insight into the physical packing of lipid bilayers at the cholesterol solubility limit was provided by the effect of ceramide on the value of χ*_chol in a lipid bilayer (Ali et al., 2006). Although the molecular structure of ceramide is quite different from that of cholesterol, both molecules have a small polar headgroup and a large nonpolar body. Like cholesterol, ceramide cannot form a bilayer by itself and is present as crystals in water. Also like cholesterol, a substantial amount of ceramide can be incorporated into a phospholipid bilayer. Using optical microscopy, the maximum solubility of brain ceramide in a POPC bilayer was determined to be 0.68 ± 0.02 (Ali et al., 2006). This number is interesting in itself, because it is similar to the maximum solubility of cholesterol in POPC bilayers. On the basis of the umbrella model, it is likely that ceramide in a lipid bilayer, like cholesterol, seeks the coverage of the neighboring PC headgroups to shield its large nonpolar body from water exposure. If cholesterol and ceramide compete for the coverage of neighboring PC headgroups, then the more ceramide is present in a PC bilayer, the less cholesterol the bilayer can accommodate. Thus, one expects a consistent decline of χ*_chol in response to an increasing ceramide content in the lipid bilayer. Using the COD activity assay, the maximum solubility of cholesterol in POPC bilayers with various amounts of brain ceramide has been investigated. Fig. 12.7 shows the maximum solubility of cholesterol in ternary mixtures of POPC/cholesterol/ceramide as a function of the molar ratio R, defined as ceramide/(ceramide + POPC). As shown in Fig. 12.7, at R = 0 there is no ceramide, the mixtures are POPC/cholesterol binary mixtures, and the χ*_chol value is 0.67. As the ratio R increases (i.e., more ceramide is present in the lipid bilayers), the maximum solubility of cholesterol continuously decreases, and eventually reaches zero at R = 0.68 (i.e., at the maximum solubility of ceramide in a POPC bilayer). It is more revealing to plot the ratio (chol + cer)/(chol + cer + POPC) at the cholesterol solubility limit, which is given by R(1 − χ*_chol) + χ*_chol, as a function of R. As shown in Fig. 12.7, this ratio essentially stays constant at 0.67. Therefore, at the cholesterol solubility limit, the ratio (chol + cer):POPC is simply 2:1, regardless of the amount of ceramide in the mixtures.
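The bookkeeping behind this ratio can be sketched in a few lines. The code below assumes that cholesterol plus ceramide always fill the same fraction of the bilayer at the solubility limit, idealized here as exactly 2/3 rather than the measured 0.67, and solves R(1 − χ*_chol) + χ*_chol = 2/3 for χ*_chol. The function names and the `limit` parameter are illustrative assumptions.

```python
def combined_fraction(chi_star, R):
    """(chol + cer)/(chol + cer + POPC) at the cholesterol solubility limit."""
    return R * (1.0 - chi_star) + chi_star

def chi_star_constant_coverage(R, limit=2.0 / 3.0):
    """Cholesterol solubility limit if chol + cer always fill the fraction `limit`.

    Obtained by solving R*(1 - chi) + chi = limit for chi; valid for R < 1 and
    clipped at zero once ceramide alone saturates the bilayer.
    """
    return max(0.0, (limit - R) / (1.0 - R))

for R in (0.0, 0.2, 0.4, 0.6):
    chi = chi_star_constant_coverage(R)
    print(f"R = {R:.1f}  chi* = {chi:.3f}  combined = {combined_fraction(chi, R):.3f}")
```

In this idealization the predicted solubility limit falls from 2/3 at R = 0 to zero at R = 2/3, while the combined fraction stays fixed, which is the behavior described for Fig. 12.7.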


[Figure 12.7 plot omitted: χ*_chol or (Cer + Chol)/(Cer + Chol + PC) on the vertical axis (0.0 to 0.7) against R = Cer/(Cer + PC) on the horizontal axis (0.0 to 0.7).]

Figure 12.7 The maximum solubility of cholesterol, χ*_chol, or the ratio (ceramide + cholesterol)/(ceramide + cholesterol + PC) as a function of the ceramide/(ceramide + POPC) ratio. Circles: χ*_chol, measured by COD activity assay. Triangle: the maximum solubility of ceramide in POPC bilayers determined by optical microscopy. The dashed line is the theoretical χ*_chol curve under the assumption that one ceramide molecule displaces one cholesterol from the bilayer phase into the cholesterol crystal phase. Squares: the ratio (ceramide + cholesterol)/(ceramide + cholesterol + PC) at the cholesterol solubility limits.

The results showed that, at its solubility limit, cholesterol is displaced by ceramide from the bilayer phase into the crystal phase. Most significantly, the data also demonstrated that each ceramide molecule displaces exactly one cholesterol molecule from the bilayer phase. This 1-to-1 displacement relationship keeps the ratio (chol + cer):POPC constant at 2:1. According to the umbrella model, at the solubility limit the coverage capability of PCs has been stretched to the limit and lipids form a highly ordered maze-pattern lateral distribution in the bilayer, so covering one additional ceramide comes at the cost of covering one fewer cholesterol. A POPC bilayer could accommodate up to 67 mol % cholesterol, 67 mol % ceramide, or a combined 67 mol % cholesterol and ceramide. The 1-to-1 displacement validates the umbrella model's physical interpretation of χ*_chol. The data also suggested that ceramide has a much higher affinity for the ordered bilayer phase than does cholesterol, and that cholesterol cannot displace brain ceramide from the lipid bilayer phase. One possible reason is that the


long saturated chains of ceramide allow a tight packing of the acyl chains of PC around ceramide, which is more difficult with the sterol rings of cholesterol. Megha and London (2004) have shown that the tight lipid packing is important for the displacement of cholesterol. Another possible contribution to ceramide’s high affinity is the fact that ceramide has a higher headgroup/body ratio than cholesterol (Ali et al., 2006). It should be easier for the neighboring PCs to cover ceramide than cholesterol, or equivalently, the free energy cost for covering ceramide should be lower than that for covering cholesterol.

3.3. Measurement and simulation of the chemical potential of cholesterol in PC bilayers

The focus of the preceding sections is the thermodynamics of cholesterol-phospholipid interaction at the cholesterol solubility limit. In this section, cholesterol-phospholipid interaction over the entire range of χ_chol is discussed. Despite significant technical advances in lipid membrane research in recent years, the detailed nature of cholesterol-lipid interactions is still a subject of ongoing debate. Existing conceptual models, including the condensed complex model, the superlattice model, and the umbrella model, identify different molecular mechanisms as the key to cholesterol-lipid interactions in biomembranes. Here, an important thermodynamic parameter, the chemical potential of cholesterol over a wide range of cholesterol mole fraction, is analyzed, and its implications for cholesterol-lipid interaction are discussed.

3.3.1. Current conceptual models of cholesterol-lipid interactions

(i) The condensed complex model. The model was initially proposed based on a study of lipid monolayers at the air-water interface (Radhakrishnan and McConnell, 1999). The model hypothesizes the existence of low free-energy stoichiometric cholesterol-lipid chemical complexes that occupy smaller molecular lateral areas (Radhakrishnan et al., 2000; Radhakrishnan and McConnell, 1999). At a stoichiometric composition, such as cholesterol:lipid = 1:1 or 1:2, a sharp jump in cholesterol chemical potential has been predicted from a mean-field calculation, as shown in Fig. 12.8A. Because the proposed condensed complex has a compact low-energy structure, the model explicitly predicted that cholesterol is more likely to form condensed complexes with lipids with which it can mix favorably, such as phosphatidylcholine (PC) with long saturated chains, or sphingomyelins. It has also been suggested that cholesterol superlattices as well as lipid rafts are examples of the proposed condensed complexes (Radhakrishnan et al., 2000; Radhakrishnan and McConnell, 2005). In this model, the strong tendency to form cholesterol-lipid condensed complexes represents an essential feature of cholesterol-lipid interactions.


[Figure 12.8 schematic omitted: three panels (A to C) sketching μ_c against χ_c; the marked compositions include 0.15, 0.25, 0.33, 0.4, 0.5, 0.57, and 0.67.]

Figure 12.8 Schematic of the chemical potential of cholesterol as a function of χ_chol predicted by various models. (A) The condensed complex model predicted a jump in μ_chol at a stoichiometric composition. (B) The superlattice model predicted dips in free energy at superlattice compositions, which also implied sharp spikes in μ_chol. (C) The umbrella model predicted a cascade of jumps in μ_chol, each corresponding to a stable cholesterol regular distribution.

(ii) The superlattice model. This model suggests that the difference in cross-sectional area between cholesterol and other lipid molecules can result in a long-range repulsive force among cholesterols and thereby produce superlattice distributions (Chong, 1994; Somerharju et al., 1985). Many superlattice patterns, either hexagonal or centered rectangular, have been predicted from a set of algebraic equations based on a geometric-symmetry argument. At the cholesterol mole fractions where superlattices occur, dips in free energy have also been suggested (Somerharju et al., 1999). Because μ_chol can be calculated by taking the derivative of the free-energy profile (Huang and Feigenson, 1999), this prediction also implies that μ_chol should have sharp spikes at those predicted mole fractions (Fig. 12.8B). The superlattice model emphasizes that the long-range repulsive force among cholesterols plays the dominant role in cholesterol-lipid interactions.

(iii) The umbrella model. As described in the previous section, the umbrella model suggests that the mismatch between the small polar headgroup of cholesterol and its large nonpolar body determines its preferential association with large-headgroup lipids, such as PCs or sphingomyelins. Because it costs much more free energy to cover a cholesterol cluster than a single cholesterol, cholesterol molecules have a strong tendency not to cluster, or at least not to form large clusters. In the high-cholesterol region (χ_chol > 0.45), to reduce the free-energy cost, cholesterol in a bilayer distributes so as to minimize cholesterol cluster size. This mechanism can be formulated as a cholesterol multibody interaction. Monte Carlo simulations based on this interaction showed that cholesterol can form a hexagonal monomer pattern at χ_chol = 0.50, a dimer pattern at 0.571, or a maze pattern at 0.667 (Fig. 12.4). Any stable regular distribution is always accompanied by a jump in μ_chol. In addition, with the assumption that there is an accelerating increase of the coverage cost with


the size of cholesterol clusters, a cascade of jumps in μ_chol is possible (Fig. 12.8C). In the low-cholesterol region (χ_chol < 0.45), three more regular distributions (superlattices) have been successfully simulated. They occur at cholesterol mole fractions of 0.154, 0.25, and 0.4, corresponding to cholesterol:phospholipid ratios of 2:11, 1:3, and 2:3, respectively. The study showed conclusively that pairwise repulsive forces between cholesterols cannot produce these regular distributions. Two requirements must be met to generate these cholesterol superlattices: (i) a large interaction against any cholesterol clustering; and (ii) a smaller unfavorable acyl chain multibody interaction, which increases nonlinearly with the number of chain-cholesterol contacts and tends to minimize acyl chain contact with cholesterol (Huang, 2002). A delicate balance must be maintained between the magnitudes of the two interactions, and their combined effect must still favor cholesterol-phospholipid mixing. The first interaction likely originates from the requirement for PC to cover cholesterol, and the second, unfavorable interaction likely arises from the sharp decrease of acyl chain conformational entropy due to chain-cholesterol contact. Although mixing cholesterol with phospholipids must have an overall favorable free energy, from the point of view of acyl chains, cholesterol molecules are aggressive invaders. Cholesterol molecules have to partially hide under the headgroups of phospholipids and occupy lateral space that would otherwise be available to acyl chains. The rigid, bulky sterol rings of cholesterol can significantly reduce the number of possible conformations of neighboring acyl chains, as evidenced by the increase in chain order parameter when cholesterol is added to a bilayer (Vist and Davis, 1990). As with the regular distributions in the high-cholesterol region, the formation of each regular distribution in the low-cholesterol region is accompanied by a jump in the chemical potential of cholesterol (unpublished data). This finding is consistent with our earlier argument, based on Widom's test particle insertion interpretation of chemical potential, that any stable regular distribution must result in a jump in chemical potential. Thus, experimentally, a jump in μ_chol can be considered a thermodynamic indicator of a regular distribution.

Interestingly, all three models share some common features: (i) the overall interaction between cholesterol and phospholipid is attractive, or equivalently, the interaction between cholesterols is repulsive; (ii) there could be some special lateral distributions of molecules (i.e., SL, RD, or condensed complex) at some well-defined bilayer compositions, at which the ratio cholesterol:phospholipid = m:n, where m and n are both integers. As a result, there is some confusion about the validity of each model as well as the differences between the models. However, the models make different predictions for the chemical potential profile. Therefore, an experimental measurement of μ_chol can provide a strong test of the models.
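The compositions quoted for the regular distributions follow directly from their integer ratios. The snippet below is a small arithmetic check, using exact fractions, that a cholesterol:phospholipid ratio m:n corresponds to a mole fraction m/(m + n); the helper name is an illustrative assumption.

```python
from fractions import Fraction

def mole_fraction(chol, lipid):
    """Cholesterol mole fraction for a cholesterol:phospholipid ratio chol:lipid."""
    return Fraction(chol, chol + lipid)

# Ratios mentioned in the text for the simulated regular distributions.
RATIOS = [(2, 11), (1, 3), (2, 3), (1, 1), (4, 3), (2, 1)]

for chol, lipid in RATIOS:
    chi = mole_fraction(chol, lipid)
    print(f"{chol}:{lipid:<2d} -> chi_chol = {chi} = {float(chi):.3f}")
```

Rounded to three decimals, these ratios give the values 0.154, 0.25, 0.4, 0.5, 0.571, and 0.667 cited for the six regular distributions.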


3.3.2. Chemical potential of cholesterol in DOPC, POPC, and DPPC bilayers explored by COD activity assay

The COD activity assay has been used to measure the chemical potential profile of cholesterol (Ali et al., 2007). It has been shown that a COD enzyme first physically associates with lipid bilayers without perturbing the membrane structure (Ahn and Sampson, 2004). The enzyme then goes through conformational changes and provides a hydrophobic binding cavity that allows favorable partitioning of cholesterol from the membrane into the COD cavity. The cavity can be viewed as a standard state, and the rate of cholesterol partitioning into the cavity depends on the chemical activity of cholesterol in the bilayer. The initial rate of the oxidation reaction should depend on the COD concentration, the cholesterol concentration (i.e., the substrate concentration), the binding affinity of COD for lipid vesicles, and the cholesterol chemical activity in the lipid bilayer. Because the COD concentration and cholesterol content in our samples were kept constant, they should not contribute to changes in the initial rate. Ahn and Sampson (2004) have shown that the binding affinity of COD for vesicles is only a weak function of χ_chol. Thus, changes in the initial rate essentially reflect the behavior of the chemical activity of cholesterol, which is related to the chemical potential of cholesterol in lipid bilayers, μ_chol, by exp(μ_chol/kT). μ_chol is directly related to the interaction of cholesterol with surrounding lipids and to the lateral organization within the bilayer. Fig. 12.9 shows the COD initial reaction rate as a function of cholesterol mole fraction in the high-cholesterol region (χ_chol > 0.45) (Ali et al., 2007). The initial rate has a global peak around χ_chol = 0.67 in all three PC bilayers, indicating the maximum solubility of cholesterol in these bilayers. The values are in good agreement with previous measurements by X-ray diffraction, light scattering, and fluorescence spectroscopy (Huang et al., 1999; Parker et al., 2004). As χ_chol approaches the maximum solubility limit, μ_chol in lipid bilayers increases sharply until it equals μ_chol in cholesterol monohydrate crystals, which results in a sharp increase of the COD initial reaction rate. Above the maximum solubility limit, the bilayer phase and the cholesterol crystal phase coexist, and μ_chol should remain constant. However, once the crystal phase appears, the COD initial reaction rate no longer follows the behavior of μ_chol and displays a sharp decline, as discussed earlier. An obvious feature in Fig. 12.9 is that, at the same χ_chol, the COD initial rate is highest in DOPC bilayers and lowest in DPPC bilayers. Because the initial rate largely reflects μ_chol, this indicates that μ_chol is lowest in a PC bilayer with all saturated chains (DPPC), higher in a PC bilayer with mixed chains (POPC), and highest in a PC bilayer with all unsaturated chains (DOPC). A higher μ_chol reflects a stronger tendency for cholesterol to escape from the bilayer. This result indicates that cholesterol interacts more favorably with saturated chains than with unsaturated chains.
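If one assumes, more strongly than the text claims, that the initial rate is strictly proportional to the cholesterol activity exp(μ_chol/kT) and that all other factors are held fixed, then the ratio of two initial rates gives a chemical potential difference directly. The sketch below illustrates that conversion; the function name and the example rates are hypothetical and are not measured values.

```python
import math

def delta_mu_from_rates(rate_a, rate_b, kT=1.0):
    """Return mu_b - mu_a (in units of kT) implied by two initial rates.

    Assumes rate = const * exp(mu_chol / kT) with the same constant for both
    samples, which is an idealization of the assay described in the text.
    """
    if rate_a <= 0 or rate_b <= 0:
        raise ValueError("initial rates must be positive")
    return kT * math.log(rate_b / rate_a)

# Hypothetical initial rates (arbitrary units) at two compositions.
step = delta_mu_from_rates(1.0e-3, 2.0e-3)
print(f"a doubling of the initial rate corresponds to {step:.3f} kT")
```

Under this idealization, a doubling of the initial rate corresponds to an increase of about 0.69 kT in μ_chol, so even the sharp steps in the rate curves represent modest changes in chemical potential.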

Figure 12.9 COD initial reaction rate (arbitrary units) as a function of cholesterol mole fraction, χc, in the high-cholesterol region for DOPC, POPC, and DPPC bilayers. The shaded bars indicate the locations of expected jumps at 0.50 and 0.571 and the cholesterol maximum solubility limit at 0.667 predicted from the MC simulations based on the umbrella model. The width of the bars reflects the experimental uncertainty in χchol in our samples (0.015). (A) Average curves, each obtained from three independent sample sets. (B) Individual curves. The jumps at 0.5 and 0.57 appear sharper. Inset: COD reaction progress curves (absorbance versus time in seconds) of DOPC/cholesterol mixtures. The reaction rate is higher at χchol = 0.60 than at 0.50. From Ali et al. (2007), with permission.

Fig. 12.9A shows the average curves of COD initial rates in DOPC, POPC, and DPPC bilayers, each obtained from three independent sample sets. The COD initial rate shows several interesting jumps. In POPC bilayers, the initial rate has clear jumps at χchol = 0.52 and 0.58. In DOPC bilayers, the initial rate shows jumps at χchol = 0.51, 0.57, and 0.62.
In contrast, the initial rate in DPPC bilayers has only a large jump at χchol = 0.63 and a tiny jump at 0.58. It should be pointed out that these jumps are usually steeper in individual sample sets, as shown in Fig. 12.9B. Because of the experimental uncertainty in cholesterol mole fraction in our samples (0.015), the jumps in the average curves (Fig. 12.9A) are smoothed out considerably. The vertical shaded bars in Fig. 12.9 indicate the positions of expected chemical potential jumps at χchol = 0.50 and 0.571 and the cholesterol maximum solubility limit at 0.667 predicted from the MC simulations based on the umbrella model. The width of the bars represents the experimental uncertainty in χchol in our samples (0.015). The locations of the jumps therefore agree favorably with the predicted values within the standard deviations of the data.

According to the MC simulations, the molecular driving force for the regular distribution at χchol = 0.50 is that the free-energy cost of covering a cholesterol dimer is significantly higher than that of covering a cholesterol monomer (Huang and Feigenson, 1999). If the difference is sufficiently large, cholesterol would stay as a monomer as long as the composition allows. This can result in a monomer regular distribution pattern at 0.50 (Fig. 12.4D). The height of the jump in μchol is related to that energy difference. Because an unsaturated chain containing a cis double bond is bulkier or less compressible than a saturated chain, DPPC, with two saturated chains, should have the largest headgroup/body ratio (or excess headgroup capacity to cover neighboring cholesterol clusters) among the three PCs. It is therefore reasonable to assume that DPPC can cover a cholesterol dimer more easily (i.e., at a lower free-energy cost) than POPC or DOPC. Thus, the driving force to form a regular distribution at χchol = 0.50 is weakest in DPPC and strongest in DOPC. This is indeed the case, as shown in Fig. 12.9.
No jump is seen at χchol = 0.50 in DPPC bilayers, and the jump is larger in DOPC than in POPC (Fig. 12.9B). Similarly, the driving force to form the dimer regular distribution pattern at χchol = 0.571 (Fig. 12.4E) is that the free-energy cost of covering a larger cholesterol cluster is much higher than that of covering a cholesterol dimer. As shown in Fig. 12.9, a small jump was observed at χchol = 0.58 in DPPC, and bigger jumps were present in POPC and DOPC bilayers. The data show that all three PCs are capable of covering cholesterol clusters up to the solubility limit (0.67), but the free-energy cost of the coverage in DOPC increases much more rapidly with the number of cholesterol-cholesterol contacts than for the other PCs because of the chain unsaturation. The measured chemical potential profile in DOPC is similar to that simulated with MIEP IV in Fig. 12.3E. On the other hand, the measured chemical potential profile in DPPC is more similar to that simulated with MIEP III in Fig. 12.3C, indicating that the free-energy cost of covering cholesterol monomers or dimers is low in DPPC bilayers because of its large headgroup/body ratio.
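The comparison between observed and predicted jump positions in the high-cholesterol region can be tabulated with a few lines of Python. This is only an illustrative sketch: the jump positions and predicted compositions are the ones quoted in the text, but the function names are hypothetical, and treating 0.015 as a symmetric tolerance around each predicted value is an assumption (the text gives 0.015 as the experimental uncertainty in the cholesterol mole fraction without specifying how it should be applied).

```python
# Predicted critical compositions from the MC simulations quoted in the text.
PREDICTED = (0.50, 0.571, 0.667)

# Jump positions in the averaged curves, as reported in the text.
OBSERVED = {
    "POPC": (0.52, 0.58),
    "DOPC": (0.51, 0.57, 0.62),
    "DPPC": (0.58, 0.63),
}

# Assumption: use the quoted uncertainty (0.015) as a symmetric tolerance.
TOLERANCE = 0.015


def nearest_prediction(x, predicted=PREDICTED):
    """Return the predicted composition closest to the observed value x."""
    return min(predicted, key=lambda p: abs(p - x))


def within_tolerance(x, tol=TOLERANCE, predicted=PREDICTED):
    """True if x lies within tol of its nearest predicted composition."""
    return abs(x - nearest_prediction(x, predicted)) <= tol


if __name__ == "__main__":
    for lipid, jumps in OBSERVED.items():
        for x in jumps:
            p = nearest_prediction(x)
            status = "within" if within_tolerance(x) else "outside"
            print(f"{lipid}: jump at {x:.2f}, nearest prediction {p:.3f}, "
                  f"offset {x - p:+.3f} ({status} tolerance)")
```

Under this strict reading, some averaged jump positions fall slightly outside the tolerance, which mainly illustrates how much the comparison depends on how the quoted uncertainty is interpreted and on the smoothing of the averaged curves.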

It is interesting that the COD initial rate in DPPC bilayers jumps at χchol = 0.63. In several measurements with DPPC and DOPC, jumps around 0.63 have also been observed. Previously, small dips in fluorescence energy transfer efficiency and DPH-PC fluorescence anisotropy have been found at 0.63 in DOPC bilayers (Parker et al., 2004). Whether this previously unassigned critical composition of 0.63 represents a regular distribution made of a trimer or tetramer pattern remains elusive. More study is needed to verify this interesting possibility.

Fig. 12.10 shows the COD initial reaction rate in the low-cholesterol region (χchol < 0.45) (Ali et al., 2007). To resolve small jumps at very low cholesterol concentrations, three times more COD (0.3 unit per sample) was used for the low-cholesterol samples. As in the high-cholesterol region, the dominant feature of the figure is that the initial rate is highest in DOPC bilayers and lowest in DPPC bilayers. The differences in rate are quite spectacular. For example, at χchol = 0.35, the rates in POPC and DOPC bilayers are about 11 and 44 times higher than that in DPPC bilayers, respectively. The data indicate that cholesterol interaction with unsaturated acyl chains is very unfavorable compared with saturated chains in this region. Although the COD initial rate generally increases with χchol in all three PC bilayers, the rate behaves quite differently in different bilayers. Clear jumps can be seen at χchol = 0.25 and 0.40 in DOPC and POPC bilayers, whereas the rate in DPPC bilayers is featureless and flat. A close look at low χchol shows that the rate also has a small but distinct jump at χchol = 0.15 in DOPC but not in the others (Fig. 12.10, inset).
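Because the initial rate is taken to track the chemical activity of cholesterol, which depends exponentially on its chemical potential divided by kT, a ratio of initial rates at the same composition can be converted into an approximate difference in chemical potential. The sketch below is illustrative only. It assumes the rate is strictly proportional to the activity with all other factors (COD binding, substrate amount) equal; the function names are hypothetical, and the temperature of 310.15 K used for the unit conversion is an assumption, not a value taken from the text.

```python
import math

# Gas constant in kcal/(mol*K), used only for the optional unit conversion.
GAS_CONSTANT_KCAL = 0.0019872


def delta_mu_in_kT(rate_ratio):
    """Chemical potential difference, in units of kT, implied by a rate ratio.

    Assumes rate is proportional to activity = exp(mu / kT), so that
    mu_1 - mu_2 = kT * ln(rate_1 / rate_2).
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return math.log(rate_ratio)


def delta_mu_kcal_per_mol(rate_ratio, temperature_kelvin):
    """Same difference expressed in kcal/mol at the given temperature."""
    return GAS_CONSTANT_KCAL * temperature_kelvin * delta_mu_in_kT(rate_ratio)


if __name__ == "__main__":
    assumed_temperature = 310.15  # K; an assumption for illustration only
    for lipid, ratio in (("POPC", 11.0), ("DOPC", 44.0)):
        print(f"{lipid} vs DPPC: {delta_mu_in_kT(ratio):.2f} kT, "
              f"{delta_mu_kcal_per_mol(ratio, assumed_temperature):.2f} kcal/mol")
```

With the rate ratios of about 11 and 44 quoted above, the implied differences are roughly 2.4 kT and 3.8 kT relative to DPPC, under the stated assumptions.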
In the low-cholesterol region, according to MC simulation, two opposing types of interactions are required to produce a regular distribution: a large unfavorable interaction for any cholesterol clustering, and a smaller unfavorable acyl chain multibody interaction, which increases nonlinearly with the number of chain-cholesterol contacts. To satisfy the first requirement, the headgroup/body ratio of a PC must be in some optimal range to create a strong tendency for cholesterol to avoid clustering in that particular bilayer. This strong tendency can be experimentally verified by examining whether cholesterol forms the monomer (i.e., hexagonal) regular distribution at χchol = 0.50. Thus, forming a stable regular distribution at χchol = 0.50 would meet the first necessary condition for forming regular distributions in the low-cholesterol region, because it indicates that cholesterol tries to avoid forming dimer clusters in the bilayer due to the high free-energy cost of covering a cholesterol dimer cluster. The second necessary interaction is likely from the reduction of acyl chain conformational entropy due to cholesterol contact. Because cholesterol does not form stable regular distributions at χchol = 0.50 in DPPC bilayers, the first necessary condition is not met. This explains why there is no stable regular distribution in the low-cholesterol region in DPPC. On the other hand, cholesterol does form regular distributions at χchol = 0.50 in POPC and DOPC bilayers.

Figure 12.10 COD initial reaction rate (arbitrary units) as a function of χchol in the low-cholesterol region for DOPC, POPC, and DPPC bilayers. The shaded bars indicate the locations of expected jumps at 0.154, 0.25, and 0.40 predicted from the MC simulations. The width of the bars reflects the experimental uncertainty in χchol. From Ali et al. (2007), with permission.

COD initial reaction rate curves show that cholesterol forms regular distributions at 0.25 and 0.40 in both bilayers. The data clearly demonstrate the direct correlation between the regular distribution at χchol = 0.50 and the regular distributions in the low-cholesterol region (0.40, 0.25, and 0.154).

The composition dependence of the chemical activity of cholesterol can provide valuable information about cholesterol-lipid interaction. As shown previously, it can be used to compare the relative affinity of cholesterol for various bilayers and to detect possible superlattices or condensed complexes. In addition, μchol increases with χchol in all three PC bilayers, which indicates that mixing of cholesterol with PC is favorable, and the overall interaction between cholesterol and PC is attractive, not repulsive.

3.3.3. Comparison of models of cholesterol-lipid interaction

The condensed complex model: In Figs. 12.9 and 12.10, the COD initial reaction rate shows a series of jumps in POPC and DOPC bilayers. Could the corresponding cholesterol regular distributions (i.e., superlattices) at these compositions be the condensed complexes, as suggested previously (Radhakrishnan et al., 2000; Radhakrishnan and McConnell, 2005)? A careful analysis of the data shows that regular distributions cannot be
condensed complexes: (1) Cholesterol regular distributions are neither condensed nor low-free-energy aggregates. In Figs. 12.9 and 12.10, there is no jump in the initial rate in DPPC bilayers for χchol < 0.57, but four jumps appear in DOPC bilayers in the same region. Clearly, the tendency to form cholesterol regular distributions is much stronger in DOPC bilayers, with which cholesterol has a less favorable interaction, as indicated by the high chemical potential. The condensed complexes are supposed to be low-free-energy aggregates of cholesterol and lipids that occupy a smaller membrane lateral area. According to the condensed complex model (Radhakrishnan and McConnell, 2005), the condensed complexes should be formed in a PC bilayer with which cholesterol can mix very favorably, such as DPPC or sphingomyelins, but not with DOPC or POPC. The data obtained directly contradict the model. It is well known that PC with unsaturated chains occupies more lateral area. Therefore, the regular distributions in DOPC or POPC bilayers are not structurally condensed. In addition, μchol is higher in DOPC or POPC than in DPPC. Thus, the observed regular distributions in DOPC and POPC actually have higher free energies than the cholesterol/DPPC mixtures with the same lipid compositions. (2) Regular distributions lack lipid specificity. The same regular distribution can exist in DOPC, POPC, and other bilayers. In fact, not only are the phospholipids in regular distributions exchangeable, but even cholesterol can be partially or completely replaced by ceramide in the regular distributions occurring at the cholesterol solubility limit. The lack of molecular specificity is inconsistent with the idea of a stoichiometric chemical complex. (3) Cholesterol is a relatively simple molecule.
It is conceptually difficult to justify that cholesterol can form six different low-free-energy stoichiometric complexes with one type of lipid (e.g., DOPC), and also that μchol in these complexes monotonically increases with χchol.

The superlattice model: The superlattice model correctly describes a general picture of cholesterol and PC mixing: cholesterol molecules tend to keep a distance from each other in a PC bilayer. The model suggests that superlattices result from the physical shape of lipid molecules, not from specific chemical complexes, which is consistent with experimental data. The model also predicts many highly symmetrical regular distribution patterns (superlattices) at some well-defined lipid compositions, and a number of those distributions have been verified experimentally through various techniques (Chong and Olsher, 2004; Somerharju et al., 1999). However, the model does have a few deficiencies: (1) It incorrectly requires a long-range repulsive force among cholesterols to produce superlattices. While this mechanism does favor the mixing of cholesterol with phospholipids, it is not necessarily able to produce a regular distribution. MC simulations have demonstrated that multibody interactions are required for superlattice formation; (2) the superlattice model predicts superlattices based on geometric
symmetry, not based on a quantitative free-energy calculation. The model does correctly predict the superlattices at χchol = 0.154, 0.25, 0.40, and 0.50, but it fails to predict the regular distributions at χchol = 0.571 (dimer pattern) and 0.667 (maze pattern) because these two patterns are neither hexagonal nor centered rectangular. In addition, it predicts many other superlattices that are unlikely to exist, particularly at low cholesterol mole fractions; (3) it has been hypothesized that a superlattice distribution produces a dip in free energy. The corresponding chemical potential profile (Fig. 12.8B) has been proved incorrect. In fact, free energy has a sudden change of slope at each regular distribution composition, not necessarily a local minimum, as shown in Huang and Feigenson (1999). However, this free-energy minimum prediction was not a fundamental assumption of the model, which is largely a geometrical model without any explicit assumptions of specific molecular interactions among the lipid molecules.

The umbrella model: The umbrella model and the consequent MC simulations have proved quite successful in predicting and explaining cholesterol’s mixing behavior with PCs: (1) The measured chemical potential profiles are in excellent agreement with the profiles calculated from the MC simulation based on the model. This shows that the umbrella model captures the key cholesterol-lipid interaction. (2) The jumps in μchol at χchol = 0.154, 0.25, 0.40, 0.50, 0.571, and 0.667 predicted from MC simulation based on the umbrella model have all been observed experimentally, which indicates that stable regular distributions can exist at these compositions. (3) The umbrella model naturally explains the detailed molecular interactions required for each regular distribution. The experimental results strongly support the explanation of the driving forces for regular distributions.
It shows that both the headgroup/body ratio of PC and the acyl-chain conformational entropy play key roles in cholesterol superlattice formation. (4) Unlike the condensed complex model, which assumes specific chemical complex formation between cholesterol and other lipids, the umbrella model suggests that the key cholesterol-lipid interaction is a hydrophobic interaction, which arises from the shape of a cholesterol molecule: a small polar headgroup and a large nonpolar body. The model can be generalized and applied to the interactions between other small-headgroup molecules (e.g., diacylglycerol or ceramide) and large-headgroup lipids (e.g., PC or sphingomyelins). In addition, the umbrella model also provides a possible driving force for the formation of lipid rafts (Parker et al., 2004).

3.3.4. Quantitative indication of cholesterol affinity to lipid bilayers

The umbrella model together with the MC simulations showed that the change in the χchol value is likely to be discontinuous in response to the change of headgroup/body ratio of host lipids. Some small changes in the headgroup/body ratio would not alter the χchol value (e.g., from DPPC to DOPC), but
a sufficiently large change can cause membrane lipids to adopt a new regular distribution packing pattern and result in an abrupt change in χchol (e.g., from POPE to POPC). χchol could jump discretely from one regular distribution to another (e.g., from the 0.50 hexagonal pattern to the 0.57 dimer pattern and then to the 0.67 maze pattern) as the headgroup/body ratio increases continuously. The 1-to-1 replacement indicates that the lateral packing of ceramide with POPC is likely to be the same maze pattern, which suggests that the difference in the headgroup/body ratio of ceramide and cholesterol does not cause the bilayers to adopt a different lateral packing pattern to accommodate ceramide. Therefore, the value of χchol in a particular lipid bilayer is just a rough measure of the affinity of cholesterol for that bilayer: a higher χchol value does indicate that the mixing of cholesterol with that particular lipid is more favorable. However, if the value of χchol is the same in several lipid bilayers, such as in POPC, DOPC, and DPPC bilayers, it does not indicate that the mixings are equally favorable, because the value of χchol cannot reflect relatively small differences in affinity. On the other hand, a lower cholesterol chemical potential indicates a more favorable mixing of cholesterol with that particular bilayer. Thus, measuring the chemical potential of cholesterol in lipid bilayers is the more precise thermodynamic method for determining the relative affinity. In addition, it can also reveal the formation of regular distributions of lipids.

4. Concluding Remarks

Cholesterol is the most fascinating and the most studied membrane molecule. Despite decades of research, our understanding of lipid-cholesterol-protein interactions at the molecular level in systems of three or more components is still quite primitive. In this article, we showed that in the process of uncovering the key cholesterol-lipid interaction in binary mixture systems, a thermodynamic quantity (i.e., the chemical potential of cholesterol) served as an essential link between various experimental data and the microscopic molecular interactions. We are optimistic that the combination of experimental data, thermodynamics, and computer simulation can be a powerful approach to uncover the key molecular interactions and to predict the behaviors of biomolecules in more complex systems.

ACKNOWLEDGMENTS

The author would like to thank Dr. Gerald W. Feigenson for many stimulating discussions and reading of this manuscript, and Dr. Jeffery T. Buboltz for developing and refining the RSE method. This work was supported by National Science Foundation Grants MCB-9722818 and MCB-0344463, Petroleum Research Fund Grant ACS PRF 41300-AC6, and National Institutes of Health Grant 1 R01 GM077198-01A1 Subaward 49238-8402.

REFERENCES

Ahn, K. W., and Sampson, N. S. (2004). Cholesterol oxidase senses subtle changes in lipid bilayer structure. Biochemistry 43, 827–836.
Ali, M. R., Cheng, K. H., and Huang, J. (2006). Ceramide drives cholesterol out of the ordered lipid bilayer phase into the crystal phase in 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine/cholesterol/ceramide ternary mixtures. Biochemistry 45, 12629–12638.
Ali, M. R., Cheng, K. H., and Huang, J. (2007). Assess the nature of cholesterol-lipid interactions through the chemical potential of cholesterol in phosphatidylcholine bilayers. Proc. Natl. Acad. Sci. USA 104, 5372–5377.
Brown, D. A., and London, E. (2000). Structure and function of sphingolipid- and cholesterol-rich membrane rafts. J. Biol. Chem. 275, 17221–17224.
Buboltz, J. T., and Feigenson, G. W. (1999). A novel strategy for the preparation of liposomes: Rapid solvent exchange. Biochim. Biophys. Acta 1417, 232–245.
Chialvo, A. A. (1990). Determination of excess Gibbs free energy from computer simulation by the single charging-integral approach I. Theory. J. Chem. Phys. 92, 673–679.
Chong, P. L. (1994). Evidence for regular distribution of sterols in liquid crystalline phosphatidylcholine bilayers. Proc. Natl. Acad. Sci. USA 91, 10069–10073.
Chong, P. L. G., and Olsher, M. (2004). Fluorescence studies of the existence and functional importance of regular distributions in liposomal membranes. Soft Materials 2, 85–108.
Haile, J. M. (1986). On the use of computer-simulation to determine the excess free-energy in fluid mixtures. Fluid Phase Equilibria 26, 103–127.
Huang, J. (2002). Exploration of molecular interactions in cholesterol superlattices: Effect of multibody interactions. Biophys. J. 83, 1014–1025.
Huang, J., Buboltz, J. T., and Feigenson, G. W. (1999). Maximum solubility of cholesterol in phosphatidylcholine and phosphatidylethanolamine bilayers. Biochim. Biophys. Acta 1417, 89–100.
Huang, J., and Feigenson, G. W. (1993). Monte Carlo simulation of lipid mixtures: Finding phase separation. Biophys. J. 65, 1788–1794.
Huang, J., and Feigenson, G. W. (1999). A microscopic interaction model of maximum solubility of cholesterol in lipid bilayers. Biophys. J. 76, 2142–2157.
Kingsley, P. B., and Feigenson, G. W. (1979). The synthesis of a perdeuterated phospholipid: 1,2-dimyristoyl-sn-glycero-3-phosphocholine-d72. Chem. Phys. Lipids 24, 135–147.
Megha, E., and London, E. (2004). Ceramide selectively displaces cholesterol from ordered lipid domains (rafts): Implications for lipid raft structure and function. J. Biol. Chem. 279, 9997–10004.
Parker, A., Miles, K., Cheng, K. H., and Huang, J. (2004). Lateral distribution of cholesterol in dioleoylphosphatidylcholine lipid bilayers: Cholesterol-phospholipid interactions at high cholesterol limit. Biophys. J. 86, 1532–1544.
Radhakrishnan, A., Anderson, T. G., and McConnell, H. M. (2000). Condensed complexes, rafts, and the chemical activity of cholesterol in membranes. Proc. Natl. Acad. Sci. USA 97, 12422–12427.
Radhakrishnan, A., and McConnell, H. (2005). Condensed complexes in vesicles containing cholesterol and phospholipids. Proc. Natl. Acad. Sci. USA 102, 12662–12666.
Radhakrishnan, A., and McConnell, H. M. (1999). Condensed complexes of cholesterol and phospholipids. Biophys. J. 77, 1507–1517.
Somerharju, P., Virtanen, J. A., and Cheng, K. H. (1999). Lateral organisation of membrane lipids. The superlattice view. Biochim. Biophys. Acta 1440, 32–48.
Somerharju, P. J., Virtanen, J. A., Eklund, K. K., Vainio, P., and Kinnunen, P. K. (1985). 1-Palmitoyl-2-pyrenedecanoyl glycerophospholipids as membrane probes: Evidence for regular distribution in liquid-crystalline phosphatidylcholine bilayers. Biochemistry 24, 2773–2781.
Vist, M. R., and Davis, J. H. (1990). Phase equilibria of cholesterol/dipalmitoylphosphatidylcholine mixtures: 2H nuclear magnetic resonance and differential scanning calorimetry. Biochemistry 29, 451–464.
Widom, B. (1963). Some topics in the theory of fluids. J. Chem. Phys. 39, 2808–2812.

CHAPTER THIRTEEN

Thinking Inside the Box: Designing, Implementing, and Interpreting Thermodynamic Cycles to Dissect Cooperativity in RNA and DNA Folding

Nathan A. Siegfried* and Philip C. Bevilacqua*

Contents
1. Introduction
2. Folding Cooperativity Defined
3. Thermodynamic Boxes: Design, Implementation, and Interpretation
4. Thermodynamic Cubes: Design, Implementation, and Interpretation
5. Examples of Cooperativity in RNA
5.1. Secondary structure
5.2. Tertiary structure
6. Measuring Thermodynamic Parameters by UV Melting
6.1. Sample preparation
6.2. Wavelength selection
6.3. Extinction coefficient
6.4. Concentration independence
6.5. Buffer choice
6.6. Running an experiment
6.7. Melt fit equations
6.8. Nonlinear curve fitting with KaleidaGraph
7. Concluding Remarks
Acknowledgment
References

Abstract

Double and triple mutant thermodynamic cycles provide a means to dissect the cooperativity of RNA and DNA folding at both the secondary and tertiary structural levels through use of the thermodynamic box or cube. In this article, we describe three steps for applying thermodynamic cycles to nucleic acid folding, with considerations of both conceptual and experimental features. The first step is design of an appropriate system and development of hypotheses regarding which residues might interact. Next is implementing this design in terms of a tractable experimental strategy, with an emphasis on UV melting. The final step, and the one we emphasize the most, is interpreting mutant cycles in terms of coupling between specific residues in the RNA or DNA. Coupling free energy in the absence and presence of changes elsewhere in the molecule is discussed in terms of specific folding models, including stepwise folding and concerted changes. Last, we provide a practical section on the use of commercially available software (KaleidaGraph) to fit melting data, along with a consideration of error propagation. Along the way, specific examples are chosen from the literature to illustrate the methods. This article is intended to be accessible to the biochemist or biologist without extensive thermodynamics background.

* Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania, USA

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04213-4. © 2009 Elsevier Inc. All rights reserved.

1. Introduction

The collective strength of multiple weak bonds allows remarkably complex biomolecular structures to form. In folding, weak bonding interactions (hydrogen bonds and stacking) often interact in a cooperative fashion. Relationships among bonding interactions can be complex. It is often assumed that weak bonds have little influence on one another; for example, that an interaction is weakened only slightly, or not at all, upon loss of a neighboring hydrogen bond. However, this interpretation is oversimplified. It has become increasingly clear that biopolymers, including nucleic acids, comprise complex, interdependent interactions that serve structural and functional roles.

Both RNA and DNA form large and complex structures. An RNA has more than 50% more atoms than a protein with the same number of residues. Along with this increased size come more dihedral bonds where rotation can occur, which introduces correspondingly greater potential for alternative conformations. Indeed, Turner (2000) pointed out that initiation of stacking in a dinucleotide requires a conformational entropy loss of 13 eu, equivalent to a TΔS° contribution to ΔG° of approximately +4.0 kcal/mol at 37 °C. The resultant complex conformational landscape of nucleic acids has been treated with Ramachandran-like formalisms by Duarte and Pyle (1998), and Draper, Rose and coworkers (Murthy et al., 1999). Contributing to conformational entropy loss, RNA and DNA helices bear significant negative charge along the phosphodiester backbone that causes them to stiffen, as evidenced by long persistence lengths (Kebbekus et al., 1995). Such stiffening is often tempered, however, by defects in the helix, such as bulges and internal loops (Kahn et al., 1994), as well as by ionic strength (Lu et al., 2001). Indeed, shielding the negative backbone through cation association promotes structure formation, with divalent cations
providing greater charge neutralization than monovalents (Misra and Draper, 2002). This ability of RNA and DNA to be stiff or flexible depending on sequence and solution conditions adds to functional diversity. As a result, DNA helices can spool around histones and RNAs often adopt compact and complex functional structures. These physical traits create a unique environment in which to probe the cooperativity of folding. The goal of this article is to provide conceptual and experimental frameworks for the biochemist or biologist who is interested in dissecting biopolymers in terms of functional group contributions to stability and structure. Although we focus on nucleic acids, the principles apply equally to proteins or nucleic acid–protein complexes. We assume that the reader has a basic grounding in thermodynamics and in RNA and DNA biochemistry, but that he or she is not necessarily an expert in thermodynamic measurements or interpretations. The article is designed to aid the design, execution, and interpretation of experimental data in terms of physical models. Of course, it is important to keep in mind that for all of the following discussions, thermodynamic cycles as manifested in boxes or cubes are path independent and therefore hold independent of the physical model used to interpret them.
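The entropy figure quoted from Turner (2000) in the Introduction can be checked with a short calculation. The sketch below is illustrative: the function name is hypothetical, 1 eu is taken as 1 cal/(mol K), and the free-energy penalty is computed as the negative of the temperature times the entropy change, with the entropy change negative because entropy is lost.

```python
CAL_PER_KCAL = 1000.0


def entropy_penalty_kcal_per_mol(entropy_loss_eu, temperature_celsius):
    """Free-energy penalty, -T*dS, for an entropy loss given in eu.

    1 eu = 1 cal/(mol*K). A loss of entropy means dS < 0, so -T*dS > 0.
    """
    temperature_kelvin = temperature_celsius + 273.15
    delta_s = -entropy_loss_eu  # cal/(mol*K); negative because entropy is lost
    return -temperature_kelvin * delta_s / CAL_PER_KCAL


if __name__ == "__main__":
    penalty = entropy_penalty_kcal_per_mol(13.0, 37.0)
    print(f"-T*dS = {penalty:+.1f} kcal/mol")  # prints "-T*dS = +4.0 kcal/mol"
```

A loss of 13 eu at 37 °C therefore corresponds to an unfavorable contribution of about +4.0 kcal/mol, in line with the value quoted in the text.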

2. Folding Cooperativity Defined In this section, we outline the use of thermodynamic boxes to determine the cooperativity of structure formation, or folding cooperativity, in RNA. We begin by defining cooperativity in the context of folding. In the extreme limit, an RNA system is highly cooperative (or nonadditive) if just two states, say folded (F) and unfolded (U), are populated, with any folding intermediates being poorly populated (Dill and Bromberg, 2003). In the other limit, a system is noncooperative (or additive) if other states, such as an intermediate (I), populate. (The origin of the terms additive and nonadditive is described subsequently.) Experimentally, folding cooperativity is assessed by testing whether disruption of one structural interaction affects the thermodynamic worth of another interaction. Cooperativity must also be defined in terms of the relative positioning of the interactions being examined. In one scenario, disruption of a particular hydrogen bond has little effect on overall structure (Fig. 13.1A). This would represent a non-cooperative case. In another scenario, disruption of a hydrogen bond weakens local interactions (Fig. 13.1B). If the second interaction queried is local, then the system would be deemed cooperative, but if the second interaction queried is distal, little or no cooperativity would be scored. Thus, the presence of cooperativity depends on how the question is asked. In a third scenario, disruption of a given hydrogen bond perturbs

368

Nathan A. Siegfried and Philip C. Bevilacqua

A

Break one bp

B

C or

Figure 13.1 Examples of cooperativity in hairpin structure formation. The fully paired hairpin on the left has a single base pair (in bold) disrupted by functional group modification, base substitution, or sugar modification. Response of nearby structure is used to quantify folding cooperativity. (A) Low cooperativity. Disruption of the single base pair does not appreciably disturb local or distant structure. (B) Intermediate cooperativity. Disruption of the single base pair weakens nearby structure. Probing nearby interactions reveals weakened stability and intermediate cooperativity. (C) Large cooperativity. Disruption of the single base pair breaks local structure, as depicted in the leftmost structure, and potentially long-range structure, as depicted in the ensemble of states in brackets. Probing local interactions should reveal no stability and high cooperativity, while probing distal interactions might reveal intermediate to high cooperativity.

local and distal interactions (Fig. 13.1C). In this case, the system would be deemed highly cooperative both locally and distally. Hess’s law states that the overall enthalpy change of a reaction is the sum of the enthalpy changes for all steps that lead from reactants to products, and that this overall enthalpy change is path independent. This allows a complex reaction to be treated as a series of smaller, experimentally accessible steps, and forms the basis for the thermodynamic box. An early application of thermodynamic boxes to biopolymers was to residues in the active site of a tRNA synthetase (Carter et al., 1984). The affinity of tyrosyl-tRNA synthetase for ATP was probed using double mutants. A threonine to proline substitution significantly increased substrate affinity, and thermodynamic boxes revealed the molecular basis for this effect. While the systems being studied with this method have broadened significantly over the years, including applications to RNA and DNA (see subsequently), the general concepts have not. A final word is in order about the thermodynamic boxes treated herein. We focus on methods to dissect folding cooperativity for unimolecular processes. However, there are many interesting systems involving bimolecular processes, such as duplex formation or ligand binding (e.g., proton or


Thermodynamic Cycles for RNA and DNA Folding

Mg2+) (Bevilacqua et al., 2004; Misra et al., 2003; Moody et al., 2005). The discussion here applies equally to these, although it is important to be mindful that opposite edges in thermodynamic boxes, as discussed in the next section, must be of the same molecularity.

3. Thermodynamic Boxes: Design, Implementation, and Interpretation

For the sake of experimental tractability, thermodynamic boxes are typically built up from two mutations to an RNA or DNA molecule of interest, and are thus often referred to as double mutant cycles (Fig. 13.2). If mutants are to be made to a larger RNA by a cloning method such as QuikChange (Stratagene), only two sets of mutant primers are needed, which provides experimental convenience (Chadalavada et al., 2002). A typical thermodynamic box consists of the following four sequences displayed at the corners of the box: wild-type (M00, M for mutant, and 0 for no change) representing the unperturbed molecule; double mutant (M11, 1 for change) representing the molecule in which both of the functional groups of interest, A and B, have been altered; and two single mutants (M10 and M01) representing the molecule in which just one of the functional groups,

[Figure 13.2 diagram: square with corners M00, M10, M01, and M11. Edges: M00→M10, $\Delta G_A$; M00→M01, $\Delta G_B$; M10→M11, ${}^{A}\Delta G_B$ ($= \Delta G_B + \delta_{AB}$); M01→M11, ${}^{B}\Delta G_A$ ($= \Delta G_A + \delta_{AB}$). Pathway 1 runs through M10 and Pathway 2 through M01.]

Figure 13.2 Thermodynamic box for a double mutant cycle. Each edge of the box represents the energetic consequence of a modification or mutation. For example, the left-hand edge corresponds to a change at position A in the background of wild-type with free energy $\Delta G_A$, and the right-hand edge to a change at position A in the background of a change at position B, ${}^{B}\Delta G_A$. The extent to which these two terms differ, or couple, is $\delta_{AB}$, the coupling free energy, which is shown in bold. $\Delta G_{AB}$ represents the energetic impact of both mutations. Progression from M00 to M11 can follow either Pathway 1 or 2. See text for further analysis. Adapted from Moody and Bevilacqua (2003) with permission from the American Chemical Society.


Nathan A. Siegfried and Philip C. Bevilacqua

A or B respectively, has been modified (Fig. 13.2). This notation is adapted from Di Cera (1998). It is important to mention that as experimentalists we can only add or delete functional groups; we cannot add or delete interactions per se. As such, it is straightforward to state the thermodynamic worth of a given functional group in a given context, but it is not straightforward to state the thermodynamic worth of a given interaction in a given context. This is because deletion of a functional group can perturb (weaken or strengthen) multiple interactions. In actuality, physical models for interpretation of the box rest on the effect of functional group alteration on all other bonding interactions in the molecule. These considerations have been used to search for functional groups whose protonation (effectively the synthesis of a new functional group) might result in a large free-energy change, and therefore a large pKa shift (Bevilacqua et al., 2004; Moody et al., 2005). It is useful to consider some of the experimental and conceptual implications of the thermodynamic box. We define pathway 1 as progressing from M00 to M10 and ending at M11, and pathway 2 as passing through M01 and ending at M11 (see Fig. 13.2). Note that the arrows for the four individual steps (edges along the box) are in a single direction. This is a convention that defines the algebraic sign on $\Delta G$ and is not meant to imply that the system is not at equilibrium. The arrow convention is also applied to the two large pathway arrows flanking the box. Each edge of the box is assigned a free energy, which is the difference in free energy between the state at the corner with the arrowhead and the state at the corner with the beginning of the same arrow.
The left-hand edge (going from M00 to M10) is assigned a free energy of $\Delta G_A$, the free energy associated with changing functional group A, while the opposite edge (going from M01 to M11) is assigned ${}^{B}\Delta G_A$, the free energy associated with changing A in the background of functional group B being changed. Similar definitions are made for the bottom and top edges in the box, corresponding to $\Delta G_B$ and ${}^{A}\Delta G_B$, respectively. In addition, we draw a diagonal from M00 to M11, which is assigned the difference in free energy between the double mutant and wild-type, $\Delta G_{AB}$. It should be noted that the free energy change along each of these arrows is between two folded states. Because the free energy at a given corner is typically determined between folded and unfolded states, an assumption underlying thermodynamic boxes is commonality of the unfolded state for each mutant. This assumption should be valid if there is no residual structure in the unfolded state. Discussion to this point has been grounded in experimentally determined values: the free energies at each vertex are those measured from experiment, while the free energies along each edge are subtractions of measurements. Next, we can begin to interpret these data using physical models.
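The role of the common unfolded state can be made explicit with a short worked equation for the left-hand edge. The symbols $G_F$ and $G_U$, standing for the free energies of the folded and unfolded states of a given sequence, are introduced here only for illustration and are not part of the notation adapted from Di Cera (1998).

```latex
\begin{align*}
\Delta G_A &= \Delta G^{\circ}(M_{10}) - \Delta G^{\circ}(M_{00}) \\
           &= \left[ G_F(M_{10}) - G_U(M_{10}) \right] - \left[ G_F(M_{00}) - G_U(M_{00}) \right] \\
           &= G_F(M_{10}) - G_F(M_{00}) \qquad \text{if } G_U(M_{10}) = G_U(M_{00}).
\end{align*}
```

Under this assumption the unfolded-state terms cancel, so the edge value reports only on the difference between the two folded states. If residual structure in the unfolded state differs between mutants, the cancellation is incomplete and the edge value contains an unfolded-state contribution.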


We evaluate the degree of cooperativity between A and B by comparing measured free energies between the two edges of the box across from each other. For example, the free energy associated with change A in the background of a change at site B (${}^{B}\Delta G_A$ on the right-hand edge) is compared to the free energy associated with change A in the wild-type background ($\Delta G_A$ on the left-hand edge) (see Fig. 13.2). Interpretation enters in asking whether these two changes have the same value, and if not by how much they differ. This is essentially a question of equality: “Does ${}^{B}\Delta G_A = \Delta G_A$?” A simple way to evaluate this equality is to solve the following equation, which introduces $\delta_{AB}$, the coupling free energy,

$${}^{B}\Delta G_A = \Delta G_A + \delta_{AB}. \tag{13.1}$$

If the right- and left-hand edges of the box are equal in free energy, then $\delta_{AB} = 0$, but if the two edges are unequal, then $\delta_{AB} \neq 0$. We can infer from the simple form of Eq. (13.1) (i.e., from the mathematical operation of addition between $\Delta G_A$ and $\delta_{AB}$) that $\delta_{AB}$ must be a type of free energy, consistent with its assignment as coupling free energy. In fact, $\delta_{AB}$ gives the magnitude of the energetic impact that mutation at one site (site B) has on the other (site A). If the sign on $\delta_{AB}$ is negative, then removal of the first interaction weakens the second (positive coupling), while if the sign is positive, removal of the first interaction strengthens the second (negative coupling) (Di Cera, 1998). A value of zero, on the other hand, indicates no coupling (complete additivity). The most common type of coupling observed in nucleic acids is positive coupling (see Section 5). This treatment leads us to two equations:

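$$\delta_{AB} = \Delta G^{\circ}(M_{00}) + \Delta G^{\circ}(M_{11}) - \left[ \Delta G^{\circ}(M_{10}) + \Delta G^{\circ}(M_{01}) \right] \tag{13.2}$$

$$\delta_{AB} = \Delta G_{AB} - \left[ \Delta G_A + \Delta G_B \right] \tag{13.3}$$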

Equation (13.2) is obtained by substituting appropriate free energies from the box into Eq. (13.1), while Eq. (13.3) can be arrived at by adding the four reference-state terms $2\Delta G^{\circ}(M_{00}) - 2\Delta G^{\circ}(M_{00})$ ($= 0$) to the right-hand side of Eq. (13.2). As such, Eq. (13.3) does not add anything new mathematically to our treatment; in fact, this equation should be avoided for error analysis because of overcounting, as described subsequently. However, Eq. (13.3) is conceptually valuable in that it states that coupling free energy is a quantitative evaluation of how the free energy associated with the double mutation ($\Delta G_{AB}$) differs from the sum of the free energies associated with the two single mutations ($\Delta G_A + \Delta G_B$). Practically speaking, it is often convenient


to place a $\Delta G_A + \Delta G_B$ column adjacent to a $\Delta G_{AB}$ column in a table for comparative purposes (Klostermeier and Millar, 2002; Moody and Bevilacqua, 2003). If the free energy changes associated with the two single mutations add up to the free-energy change associated with the double mutation, then Eq. (13.3) indicates that $\delta_{AB} = 0$, and the system is deemed additive (i.e., A and B do not interact and the system is noncooperative); otherwise the system is nonadditive (i.e., A and B interact and the system is cooperative). This is the origin of the term “additive.” It is important to note that additivity is applicable only for free energy (or enthalpy, entropy, and so on), not for equilibrium constants, which should be converted into free energies before carrying out such an analysis. A graphical corollary of Eq. (13.3) is that if the system is additive, then the diagonal arrow emanating from M00 will be the sum of the two edge arrows emanating from M00 (Fig. 13.2). Finally, we note that Eq. (13.2) (and therefore Eq. (13.3)) can also be arrived at by considering the bottom and top edges of the thermodynamic box, beginning with the question, “Does ${}^{A}\Delta G_B = \Delta G_B$?” Because of this, we can write down an equation analogous to Eq. (13.1),

$${}^{A}\Delta G_B = \Delta G_B + \delta_{AB}. \tag{13.4}$$

Solving Eqs. (13.1) and (13.4) for $\delta_{AB}$ and setting them equal leads to the equation

$${}^{B}\Delta G_A - \Delta G_A = {}^{A}\Delta G_B - \Delta G_B, \tag{13.5}$$

which is a statement that the difference between the right- and left-hand edges of the box is the same as the difference between the top and bottom edges of the box. A corollary of these expressions is that the value of the coupling free energy along the right-hand and upper edges of the box will always turn out to be the same. The discussion to this point has been fairly abstract. To aid implementation of these ideas into experiments, we now provide a specific example from the literature, the treatment of which will be built up in a manner that parallels the conceptual treatment above. This treatment is provided in Fig. 13.3, which can be referenced to Fig. 13.2. We proceed from direct recording of experimental values for the free energy differences between folded and unfolded states (i.e., values from thermal denaturation, or melting experiments, which are described subsequently), to differences between two folded states (assuming a common unfolded state, see earlier), to interpretation of these differences in terms of a coupling free-energy model. The system under consideration is a DNA hairpin with a GCA triloop (Moody and Bevilacqua, 2003). The changes were to two functional groups thought to hydrogen bond to each other in the loop; details beyond this are


[Figure 13.3 diagram, three panels; all values in kcal/mol.
(A) Corners: M00, −3.60; M10, −1.78; M01, −1.95; M11, −1.91.
(B) Edges: M00→M10, +1.82; M00→M01, +1.65; M10→M11, −0.13; M01→M11, 0.04; diagonal M00→M11, +1.70.
(C) Interpreted edges: M10→M11, −0.13 (= +1.65 − 1.78); M01→M11, 0.04 (= +1.82 − 1.78).]

Figure 13.3 Implementation of the thermodynamic box from Fig. 13.2. The free-energy values in this example come from a study of a GCA triloop hairpin in DNA (Moody and Bevilacqua, 2003). All values are in units of kcal/mol. (A) The first step is to measure the thermodynamic stability of the wild-type (M00), two single mutants (M10 and M01), and the double mutant (M11). Values were generated from UV-melting studies. These are the only experimental inputs. (B) The next step is to subtract the appropriate corners to give values associated with a given mutation, which are placed along the diagonal and each of the four edges of the box. (C) The final step is to evaluate equality between the right- and left-hand edges of the box using Eq. (13.1), and between the top and bottom of the box using Eq. (13.4). These terms are enclosed in parentheses to emphasize their interpretive nature. The value for $\delta_{AB}$, −1.78 kcal/mol in this instance, is shown in bold as in Fig. 13.2.

of a lesser concern here, and the interested reader is referred to the original article. The first step in our treatment is to record the experimental free energies for folding of the four sequences of interest: the wild-type, the two single mutants, and the double mutant. These values, which were obtained by UV melting experiments, are placed at the appropriate corners of the box (Fig. 13.3A). It should be noted that these four values are the only experimental inputs for the box; the remainder of the following treatment is simply mathematical manipulation of these values. Next, we subtract the appropriate corners to give values associated with each mutation and write these along the diagonal and each of the four edges


of the box (Fig. 13.3B). At this point, we note that no coupling model has been invoked. Last, we evaluate equality between the right- and left-hand edges of the box using Eq. (13.1), and between the top and bottom of the box using Eq. (13.4). This begins the process of data interpretation (Fig. 13.3C). These last terms are added in parentheses to the thermodynamic box to emphasize that they arise from interpretation of experimental measurements. Note also that the $\delta_{AB}$ values on the top and right-hand edges of the box turn out the same and that the final box in Fig. 13.3C is built up just from the results of four melting experiments (Fig. 13.3A). The final step in this process is to interpret these values in terms of a structural model. We see that the second change along either pathway has essentially no thermodynamic worth (Fig. 13.3B), and that the magnitude of $\delta_{AB}$ is large (1.78 kcal/mol) (compare Fig. 13.3C to Fig. 13.2) and approximately equal to the first change along either pathway. This outcome of high cooperativity, or nonadditivity, is congruent with the model that A and B participate in the same hydrogen bond, in that the hydrogen bond can be removed by deleting A or B, and that once it is deleted it cannot be deleted again. As discussed, a negative sign on $\delta_{AB}$ denotes positive coupling, consistent with the two functional groups participating in the same hydrogen bond. This example has also been referred to as fully cooperative or completely nonadditive because the magnitude of $\delta_{AB}$ equals the smaller free energy of the two edges emanating from M00 (Moody and Bevilacqua, 2004; Siegfried et al., 2007). “Fully cooperative” is an apt term because this relationship causes either ${}^{B}\Delta G_A$ or ${}^{A}\Delta G_B$ to approach zero (Eqs. (13.1) or (13.4)), consistent with loss of one interaction causing loss of the other. The nonadditive nature of this example can also be seen in that the diagonal emanating from M00 is not the sum of the two edge arrows emanating from M00.
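The three steps of this worked example can be sketched in a few lines of Python. This is a minimal illustration, not code from the original study: the class name `DoubleMutantCycle` and the dictionary keys are hypothetical, and the only inputs are the four corner values reported in Fig. 13.3A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleMutantCycle:
    """Folding free energies (kcal/mol) at the four corners of the box."""
    m00: float  # wild-type
    m10: float  # functional group A changed
    m01: float  # functional group B changed
    m11: float  # both A and B changed

    def edges(self) -> dict:
        """Edge values: corner at the arrowhead minus corner at the arrow tail."""
        return {
            "dG_A": self.m10 - self.m00,    # left-hand edge
            "dG_B": self.m01 - self.m00,    # bottom edge
            "A_dG_B": self.m11 - self.m10,  # top edge, B changed in background of A
            "B_dG_A": self.m11 - self.m01,  # right-hand edge, A changed in background of B
            "dG_AB": self.m11 - self.m00,   # diagonal
        }

    def coupling(self) -> float:
        """Coupling free energy directly from the four corners, Eq. (13.2)."""
        return self.m00 + self.m11 - (self.m10 + self.m01)


# Corner values from Fig. 13.3A (Moody and Bevilacqua, 2003).
box = DoubleMutantCycle(m00=-3.60, m10=-1.78, m01=-1.95, m11=-1.91)
edges = box.edges()

delta_ab = box.coupling()                                          # Eq. (13.2)
delta_from_right = edges["B_dG_A"] - edges["dG_A"]                 # Eq. (13.1)
delta_from_top = edges["A_dG_B"] - edges["dG_B"]                   # Eq. (13.4)
delta_from_sum = edges["dG_AB"] - (edges["dG_A"] + edges["dG_B"])  # Eq. (13.3)

for name, value in edges.items():
    print(f"{name:7s} {value:+.2f}")
print(f"delta_AB {delta_ab:+.2f} kcal/mol")
```

All four routes to the coupling free energy agree, as Eq. (13.5) requires. The diagonal computed from the rounded corner values comes out as +1.69 kcal/mol rather than the +1.70 shown in the figure, presumably a rounding difference in the published corner values.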

4. Thermodynamic Cubes: Design, Implementation, and Interpretation

Hess’s law can also be used to develop more complex thermodynamic relationships. In particular, thermodynamic cubes can be constructed from triple mutant cycles on the basis of Hess’s law (Fig. 13.4) (Klostermeier and Millar, 2002; Moody and Bevilacqua, 2003).¹ Thermodynamic cubes are built up from the 8 sequences displayed at the vertices of the cube: wild-type (M000); three single mutants (M100, M010, M001) with changes at positions A, B, and C, respectively; three double mutants (M110, M101,

¹ In actuality, the term “thermodynamic box” is a misnomer for the model in Fig. 13.2, which is a square rather than a box, while the cube in Fig. 13.4 is really a box. However, usage of “thermodynamic box” for the model in Fig. 13.2 is entrenched in the literature.


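[Figure 13.4 diagram: cube with vertices M000, M100, M010, M001, M110, M101, M011, and M111. Edges from M000: $\Delta G_A$, $\Delta G_B$, $\Delta G_C$. Edges in the background of one change: ${}^{B}\Delta G_A$ ($= \Delta G_A + \delta_{AB}$), ${}^{C}\Delta G_A$ ($= \Delta G_A + \delta_{AC}$), ${}^{A}\Delta G_B$ ($= \Delta G_B + \delta_{AB}$), ${}^{C}\Delta G_B$ ($= \Delta G_B + \delta_{BC}$), ${}^{A}\Delta G_C$ ($= \Delta G_C + \delta_{AC}$), ${}^{B}\Delta G_C$ ($= \Delta G_C + \delta_{BC}$). Edges into M111: ${}^{BC}\Delta G_A$ ($= {}^{B}\Delta G_A + {}^{B}\delta_{AC} = {}^{C}\Delta G_A + {}^{C}\delta_{AB}$), ${}^{AC}\Delta G_B$ ($= {}^{A}\Delta G_B + {}^{A}\delta_{BC} = {}^{C}\Delta G_B + {}^{C}\delta_{AB}$), ${}^{AB}\Delta G_C$ ($= {}^{A}\Delta G_C + {}^{A}\delta_{BC} = {}^{B}\Delta G_C + {}^{B}\delta_{AC}$).]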

Figure 13.4 Thermodynamic cube for a triple mutant cycle. This higher-order thermodynamic cycle is constructed from thermodynamic boxes as illustrated in Fig. 13.2. Three interactions are examined; accordingly M000 is the wild-type (front of cube, lower left corner) and M111 is the triple mutant (rear of cube, top right corner). The large number of pathways from M000 to M111 provides additional points to evaluate structural cooperativity. See text for further analysis. From Moody and Bevilacqua (2003) with permission from the American Chemical Society.

and M011); and a triple mutant (M111) (Moody and Bevilacqua, 2003, 2004). The concepts developed for the thermodynamic box have direct analogies in the cube. The front, left, and bottom faces of the cube are double mutant cycles analogous to Fig. 13.2 and comprise discrete thermodynamic boxes. These three boxes share the M000 vertex. The back, right, and top faces of the cube are double mutant cycles too, but are in the background of a change at a third site. For example, the back face is the same double mutant cycle as the front face, but in the background of a mutation at position C. These three boxes share the M111 vertex. The higher dimensionality of the cube requires us to define higher order coupling free energies, such as ${}^{C}\delta_{AB}$, which is the coupling free energy


between A and B in the background of a mutation at position C. For graphical assignment of this coupling energy to a specific face, the reader is referred to Moody et al. (2004). In analogy to the thermodynamic box, we compare the right- and left-hand edges of the back face, and ask, “Does ${}^{BC}\Delta G_A = {}^{C}\Delta G_A$?”, where ${}^{BC}\Delta G_A$ is the free energy associated with a change at position A in the background of changes at positions B and C. We begin with Eq. (13.6), which is analogous to Eq. (13.1),

$${}^{BC}\Delta G_A = {}^{C}\Delta G_A + {}^{C}\delta_{AB}. \tag{13.6}$$

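Likewise, in analogy with the simple thermodynamic box, we have two additional equations:

$${}^{C}\delta_{AB} = \Delta G^{\circ}(M_{001}) + \Delta G^{\circ}(M_{111}) - \left[ \Delta G^{\circ}(M_{101}) + \Delta G^{\circ}(M_{011}) \right] \tag{13.7}$$

$${}^{C}\delta_{AB} = {}^{C}\Delta G_{AB} - \left[ {}^{C}\Delta G_A + {}^{C}\Delta G_B \right] \tag{13.8}$$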

Di Cera defined coupling between A and B as direct when it is independent of changes elsewhere in the molecule, and indirect when it depends on other changes in the molecule. Mathematically, we can evaluate the direct or indirect nature of a coupling by asking whether ${}^{C}\delta_{AB} = \delta_{AB}$. If the answer is yes, then coupling is likely to be direct, and if no, then coupling is definitely indirect. In principle, direct coupling requires equality of ${}^{i}\delta_{AB}$ and $\delta_{AB}$ for any ith site, although in practice this is very hard to achieve because of the vast number of mutants that would need to be evaluated. Somewhat paradoxically, for directly coupled systems, C can couple with both A and B, even though C does not affect the coupling between A and B (Di Cera, 1998; Moody et al., 2004). Accordingly, directly coupled systems often fold in a stepwise manner and obey nearest-neighbor rules. Indirectly coupled systems, on the other hand, are defined by a network of interdependent interactions, and as such typically fold in a concerted manner.
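The direct-versus-indirect test can be sketched as a comparison of the front and back faces of the cube. In the following Python sketch, the eight free energies are invented purely for illustration, and the helper name `coupling` and the `TOLERANCE` threshold (standing in for combined experimental uncertainty) are assumptions rather than values or methods from the studies cited.

```python
def coupling(wild_type, single_a, single_b, double_ab):
    """Coupling free energy for one face of the cube (form of Eqs. 13.2 and 13.7)."""
    return wild_type + double_ab - (single_a + single_b)


# Hypothetical folding free energies (kcal/mol), keyed by the state of sites A, B, C.
dG = {
    "000": -4.0, "100": -2.5, "010": -2.6, "001": -3.0,
    "110": -2.4, "101": -2.2, "011": -2.3, "111": -2.1,
}

# Front face: A/B cycle in the wild-type background, Eq. (13.2).
delta_ab = coupling(dG["000"], dG["100"], dG["010"], dG["110"])

# Back face: the same A/B cycle in the background of a change at C, Eq. (13.7).
c_delta_ab = coupling(dG["001"], dG["101"], dG["011"], dG["111"])

TOLERANCE = 0.2  # assumed combined measurement uncertainty, kcal/mol
is_indirect = abs(c_delta_ab - delta_ab) > TOLERANCE

print(f"delta_AB   = {delta_ab:+.2f} kcal/mol")
print(f"C_delta_AB = {c_delta_ab:+.2f} kcal/mol")
print("indirect coupling" if is_indirect else "consistent with direct coupling")
```

A result within tolerance is only consistent with direct coupling, since equality would in principle have to hold for every third site, whereas a difference beyond tolerance at a single site is enough to establish indirect coupling.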

5. Examples of Cooperativity in RNA

It is not the intention of this article to provide a comprehensive review of the literature on thermodynamic coupling in nucleic acids. Nonetheless, it is instructive to briefly examine the scope of studies done to date on RNA and DNA, both in terms of size of the nucleic acids and the nature of the coupling revealed. These examples provide a useful reference from which


to design, implement, and interpret thermodynamic cycles in systems of interest to the reader. We begin with smaller model oligonucleotides and move to larger functional RNAs.

5.1. Secondary structure

Perhaps the most distinguishing feature of RNA is that it is typically single stranded and so can fold back on itself and form structural motifs with high diversity, including hairpin loops, bulges, and internal loops (Bevilacqua and Blose, 2008). Recent studies have begun to probe the molecular and energetic origin of these structures through thermodynamic cycles. Studies from our lab comparing model RNA and DNA hairpin loops have revealed extensive coupling within DNA triloops and tetraloops (Moody and Bevilacqua, 2003, 2004). Single mutations provide large free energy changes on the order of 1.5 kcal/mol; double mutant cycles reveal nearly complete nonadditivity, with $\delta_{AB}$ values of 1 kcal/mol; and triple mutant cycles show indirect coupling with $\delta_{AB}$ and ${}^{C}\delta_{AB}$ differing by more than 0.5 kcal/mol. These studies support concerted folding of these DNA loops. There are relatively few structural interactions in these DNA loops (Hirao et al., 1994), thus loss of any one interaction may lead to loss of the others in analogy to a three-legged stool. RNA tetraloops, on the other hand, revealed much less cooperative behavior. Single mutations provided much smaller free energy changes, as previously reported (SantaLucia et al., 1992), double mutant cycles gave less nonadditivity, and coupling was largely direct (Moody et al., 2004). These studies supported a less-concerted, stepwise folding of RNA loops, consistent with the significantly greater number of interactions in these loops, which has been likened to an eight-legged stool. A take-home message from this study is that nucleic acid folding can be related to single mutants, double mutant cycles, and triple mutant cycles in ways that are congruent with structure. Such studies are especially powerful when conducted in a comparative fashion between two related systems.
As a final point, we would like to return to the axiom that an experimentalist can only delete a functional group, not an interaction. As such, one must exercise care in assigning a resultant free energy to a specific interaction. A cautionary tale is that deletion of a hydrogen bonding functional group in a sheared GA in a DNA hairpin triloop or tetraloop is worth 1.5–1.6 kcal/mol (Moody and Bevilacqua, 2003, 2004), while deletion of the same hydrogen bonding functional group in a sheared GA in an RNA hairpin loop is worth just 0.65 to 0.75 kcal/mol (Moody et al., 2004; SantaLucia et al., 1992). Given these data, we should not conclude that hydrogen bonding is worth more in DNA, unless a full thermodynamic cycle at the level of a cube is conducted. What actually seems to be happening is that deletion of the functional group in DNA leads to loss of


more than just this one hydrogen bond, as evidenced by the greater cooperativity of the system. An alternative explanation, that the hydrogen bond is worth just as much in RNA but is masked by strengthening of other hydrogen bonds upon removal of the first interaction, seems less attractive because deletion of other hydrogen bonding groups in the absence of the first interaction does not lead to a greater thermodynamic penalty, but a lesser one (Moody et al., 2004). We also conducted a study on folding cooperativity within model DNA and RNA helices (Siegfried et al., 2007). In this study, the position of the double mutant cycle was varied along the length of the helix, and the spacing of the base pairs in the cycle was also tested. In external registers (those near the base of the helix), folding cooperativity was large for both RNA and DNA, while in internal registers, folding cooperativity was large for RNA only. The former observation was reconciled with helix initiation phenomena common to RNA and DNA, while the latter observation was reconciled with steric and electrostatic properties of RNA (Fig. 13.1). This study emphasizes the importance of defining the relative positioning of the two interactions in a coupling, as well as the importance of conducting comparative analyses, this time both along a given helix and between RNA and DNA.

5.2. Tertiary structure

Three studies on larger RNAs provide insight into thermodynamic cycles, especially because each was implemented in a unique manner. First was a study from Klostermeier and Millar (2002) on the hairpin ribozyme, a 100-nt self-cleaving ribozyme. Free energies were measured in this study by time-resolved fluorescence resonance energy transfer (trFRET) using fluorescently labeled hairpin ribozymes. Deletion of single hydrogen-bonding groups destabilized the RNA by only 0.5 kcal/mol; double mutant cycles suggested that hydrogen bonds adjacent to the cleavage site (near G+1) or near a distal tertiary folding site (near U42) behave cooperatively, with $\delta_{AB}$ values of 0.25 to 0.62 kcal/mol, while triple mutant cycles showed indirect coupling, with $\delta_{AB}$ and ${}^{C}\delta_{AB}$ differing by 0.25 kcal/mol at the cleavage site. Interestingly, no coupling was found between the G+1 and U42 sites, suggesting that, at least for these two regions of the ribozyme, coupling is not global in nature. The small magnitude of the free energy changes from double mutant cycles and single mutations is more similar to the findings from the RNA tetraloop study (Moody et al., 2004; SantaLucia et al., 1992), consistent with the greater number of interactions in an RNA, as expected given the 2′-OH; indeed, G+1 and U42 engage in 5 interactions each in the wild-type ribozyme. The other two studies on larger RNAs considered here were conducted on the 160 nt independently folding P4-P6 domain from the group I


intron from Tetrahymena thermophila. Silverman and Cech (1999) examined contributions of individual hydrogen bonds to folding of a ribose zipper motif in P4-P6. Free energies were measured by analysis of relative mobility in native gels containing different concentrations of Mg2+. Removal of hydrogen bonding groups destabilized this RNA by only 0.5 kcal/mol, similar to the aforementioned study by Klostermeier and Millar (2002). Double mutant cycles revealed that contributions were fairly additive, consistent with absence of cooperativity in folding of the ribose zipper motif. More recently, Sattin and coworkers (2008) evaluated cooperativity between two distant tertiary contacts, a metal core/metal core receptor and a tetraloop/tetraloop receptor, in P4-P6 folding using single-molecule FRET (smFRET) techniques, yet another experimental implementation of cooperativity measurements. Single-molecule studies, these authors point out, have the advantage that they directly measure populations of states, and so are intrinsically more accurate than other methods (e.g., Mg2+ titrations and UV melts) that require extrapolation to conditions far from the midpoint of the measurement. These authors reported a large coupling free energy, which they refer to as tertiary cooperativity, of 3.2 kcal/mol, which is comparable to tertiary cooperativity for protein folding. Differences in cooperativity between the two P4-P6 studies appear to be a manifestation of the scale over which the changes are assayed. When assayed locally, as within the ribose zipper, no cooperativity was seen, but when assayed globally, as in two distal tertiary contacts, high cooperativity was found. To summarize, smaller nucleic acid systems can show large or small cooperativity, as can large nucleic acids. Multiple experimental methods can be implemented to make the requisite free energy measurements, including UV melts, trFRET, Mg2+-dependent PAGE, and smFRET.
The outcome of a given study depends on experimental design, both the system chosen and the length scale over which the questions are asked. Interpretation of the results depends on the molecular features and physical interactions of the system at hand.
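Whatever method supplies the measurements, the box is built from free energies rather than from equilibrium constants or populations. The following Python sketch illustrates the conversion under a two-state assumption; the folded-state populations, the temperature of 298.15 K, and the function name `folding_free_energy` are all hypothetical choices made for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)
TEMPERATURE_K = 298.15  # assumed temperature


def folding_free_energy(fraction_folded, temperature_k=TEMPERATURE_K):
    """Two-state folding free energy, dG = -RT ln([folded]/[unfolded])."""
    k_fold = fraction_folded / (1.0 - fraction_folded)
    return -R_KCAL * temperature_k * math.log(k_fold)


# Hypothetical folded-state populations at the four corners of a box.
populations = {"M00": 0.99, "M10": 0.90, "M01": 0.92, "M11": 0.88}
dG = {name: folding_free_energy(p) for name, p in populations.items()}

# Coupling free energy from free energies, Eq. (13.2).
delta_ab = dG["M00"] + dG["M11"] - (dG["M10"] + dG["M01"])

# Written with equilibrium constants, the same quantity is a ratio, not a sum,
# which is why additivity must be judged on free energies.
K = {name: p / (1.0 - p) for name, p in populations.items()}
ratio = K["M00"] * K["M11"] / (K["M10"] * K["M01"])
delta_ab_from_k = -R_KCAL * TEMPERATURE_K * math.log(ratio)

print(f"delta_AB = {delta_ab:+.2f} kcal/mol")
```

An additive system corresponds to a ratio of exactly 1 among the four equilibrium constants, which is equivalent to a coupling free energy of zero.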

6. Measuring Thermodynamic Parameters by UV Melting

A variety of experimental methods can be implemented to provide thermodynamic parameters for constructing thermodynamic boxes, as illustrated by the preceding examples. Perhaps the most common method for determining nucleic acid thermodynamics is thermal denaturation, often referred to as UV-melting or absorbance melting, as the unfolding transitions have analogies to phase transitions and the experimental observable is the intrinsic hyperchromicity of the nucleobases. Hyperchromicity is a


convenient property since potentially perturbing labels do not have to be introduced to the RNA or DNA. It arises because the intrinsic electric dipole transition moments tend to cancel when bases stack or pair, giving rise to a smaller extinction coefficient in the folded state (Cantor and Schimmel, 1980). This method is common both because of its simplicity in data acquisition and fitting, and because UV-melting apparatuses are commercially available and relatively inexpensive. To carry out UV melts, we need a UV-vis spectrometer with good S/N that houses microcells, can vary temperature, and has a cell changer. Such spectrometers are commercially available from Beckman Coulter, Cary, Jasco, and Shimadzu. It is advisable that an instrument be calibrated on a regular basis. A small thermocouple can be placed in the cell and the temperature read out on an electronic thermometer. In addition, melting of a standard sample with well-fit parameters, such as tRNA, should be done periodically (Draper et al., 2000). We provide here methods for conducting UV-melts and obtaining thermodynamic values from unimolecular unfolding nucleic acids, such as the small hairpins and larger ribozymes described previously. Puglisi and Tinoco (1989) published an insightful review article on UV melting that takes into account systems of higher molecularity, as well as alternative data fitting methods.

6.1. Sample preparation

The concentration of RNA or DNA is an important consideration. Overly concentrated or dilute samples will result in poor S/N from either too few photons reaching the detector, or the transmitted light being too similar to the incident light. Ideally, absorbance values measured at high temperature (i.e., for the unfolded state) should fall between 0.3 and 1.0, although we have obtained acceptable data with absorbance as low as 0.1 or as high as 2.0 if the melting transition is sharp. Percentage hyperchromicity, defined as $(A_F - A_U)/A_F$ at temperatures near the melting transition, is typically between 15% and 20% in magnitude but can vary depending upon wavelength (see subsequently), the fraction of RNA structured, and sequence. The Beer–Lambert law is used to determine oligonucleotide concentration and ensure that absorbance values will be optimal for melting,

$$A(T) = \varepsilon(T)\, b\, c, \tag{13.9}$$

where $A$ is absorbance and is a function of temperature; $\varepsilon$ is extinction coefficient and is a function of temperature; $b$ is pathlength of the cuvette; and $c$ is concentration of the sample.
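As a minimal sketch of how Eq. (13.9) is used when preparing a sample, the following Python snippet solves for concentration and checks a high-temperature absorbance against the ranges quoted above. The function names, the extinction coefficient of 1.0e5 M^-1 cm^-1, and the 1 cm pathlength are assumptions for illustration only.

```python
def concentration_molar(absorbance, epsilon, pathlength_cm=1.0):
    """Solve Eq. (13.9), A = epsilon * b * c, for c in mol/L."""
    if epsilon <= 0 or pathlength_cm <= 0:
        raise ValueError("epsilon and pathlength must be positive")
    return absorbance / (epsilon * pathlength_cm)


def absorbance_rating(a_unfolded):
    """Classify an unfolded-state absorbance against the ranges quoted in the text."""
    if 0.3 <= a_unfolded <= 1.0:
        return "preferred"
    if 0.1 <= a_unfolded <= 2.0:
        return "usable if the transition is sharp"
    return "outside the recommended range"


# Hypothetical sample: unfolded-state absorbance of 0.50 in a 1 cm cell,
# with an assumed unfolded-state extinction coefficient of 1.0e5 M^-1 cm^-1.
c = concentration_molar(0.50, 1.0e5, pathlength_cm=1.0)
print(f"{c * 1e6:.1f} uM, {absorbance_rating(0.50)}")
```

Because both absorbance and extinction coefficient depend on temperature, the two values passed to `concentration_molar` should refer to the same state of the sample, for example the unfolded state at the high-temperature baseline.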


6.2. Wavelength selection

Absorbance (A) is typically monitored at 260 or 280 nm for nucleic acids, although other wavelengths can occasionally be used (see subsequently). Wavelength selection depends on nucleobase composition. GU- and AU-rich sequences have maximum hyperchromicity at 260 nm and hyperchromicity near 0 at 280 nm; maximal hypochromicity for GC-rich structures, on the other hand, occurs at 280 nm, although significant hypochromicity persists at 260 nm (Fresco et al., 1963; Gray and Ratliff, 1977). In fact, melts conducted at both wavelengths, which some spectrometers are capable of doing in a single experiment, can give insight into which portion of a large RNA is unfolding in a given transition (Gluick and Draper, 1994; Gluick et al., 1997; Schaak et al., 2003). The thermodynamic parameters determined at the two wavelengths (and any other wavelengths) should be in good agreement if the sample melts in a two-state fashion. Last, the maximal change in hyperchromicity can be found on a case-by-case basis by comparing the UV-vis spectra of the folded and unfolded states. This could be helpful for choosing a wavelength that would lower the absorbance of especially concentrated samples (see subsequently).

6.3. Extinction coefficient

The extinction coefficient, as it is dependent on base-base interactions, is the parameter in Eq. (13.9) that changes with melting of the nucleic acid. Because the extinction coefficient is dominated by base-base interactions, a nearest neighbor model can be used to predict its value with good accuracy. (Accuracy in concentration is especially important for bimolecular and higher molecularity systems (Puglisi and Tinoco, 1989).) Tables of terms for extinction coefficient calculation and formulas for calculating them have been compiled at 260 nm (Borer, 1975), as well as every 5 nm between 230 nm and 300 nm (Cantor and Schimmel, 1980; Richards, 1975). It is important to note that these extinction coefficients were tabulated with the assumption that the nucleic acid would be unfolded. This can be accomplished conveniently during each melt by simply using an absorbance value from the high temperature baseline for a concentration calculation. Concentration can be calculated with 260 or 280 nm melts, as tables report both of these values (Borer, 1975; Richards, 1975). If so desired, an extinction coefficient at another temperature can be calculated by the simple formula $\varepsilon_T = (A_T/A_U)\,\varepsilon_U$, where $A_T$ is the absorbance of the same sample at the temperature of interest. Often a convenient temperature is room temperature as heating or cooling of the sample will not be needed. A number of vendors have calculators available on the Internet for extinction coefficient calculation. Before using these, however, it is important to ascertain


Nathan A. Siegfried and Philip C. Bevilacqua

whether these apply to low or high temperatures, as oligonucleotides with self-structure can lead to inaccuracies of 15%–20% from hypochromicity. The extinction coefficient of RNA or DNA can also be determined experimentally by degradation of the nucleic acid (Zaug et al., 1988). A room-temperature absorbance reading of the folded RNA or DNA is taken, then the nucleic acid is cleaved to completion with nucleases, or with alkali in the case of RNA (overnight in 0.3 M NaOH at 37 °C) (Cavaluzzi and Borer, 2004). The absorbance of the cleaved nucleic acid is recorded, and the extinction coefficient is calculated from the formula ε_T = (A_T/A_U)ε_U, where ε_U is determined from the sum of the nucleoside monophosphate extinction coefficients, which have been recently updated (Cavaluzzi and Borer, 2004). Last, it should be noted that the extinction coefficients of RNA and DNA are large. Average extinction coefficients at 260 nm for the purines and pyrimidines are 1.3 × 10⁴ and 0.8 × 10⁴ M⁻¹ cm⁻¹, respectively (Cavaluzzi and Borer, 2004). As such, extinction coefficients of oligonucleotides are often 10⁵ M⁻¹ cm⁻¹ and those of functional RNAs are 10⁶ or 10⁷ M⁻¹ cm⁻¹. The magnitude of these values leads to optimal absorbance for melts with sample concentrations of just a few μM for a typical oligonucleotide and nM for large RNAs. Thus, UV melting is a sensitive technique, especially when compared to ITC and NMR.
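The two calculations described in this section, concentration from the high-temperature baseline and rescaling of the extinction coefficient to another temperature, can be sketched in a few lines of Python. This is only an illustration: the function names and all numerical values are hypothetical, and Eq. (13.9) is assumed to be the Beer-Lambert law, A = εbc.

```python
def concentration_from_unfolded(a_unfolded, eps_unfolded, pathlength_cm=1.0):
    """Strand concentration (M) from the unfolded-state absorbance.

    Assumes the Beer-Lambert law, A = eps * b * c, with eps in M^-1 cm^-1
    and b in cm. The absorbance should come from the high-temperature
    baseline because tabulated coefficients refer to unfolded nucleic acid.
    """
    return a_unfolded / (eps_unfolded * pathlength_cm)


def extinction_at_temperature(a_t, a_unfolded, eps_unfolded):
    """Extinction coefficient at another temperature: eps_T = (A_T / A_U) * eps_U."""
    return (a_t / a_unfolded) * eps_unfolded


# Hypothetical numbers for illustration only.
eps_u = 1.0e5      # M^-1 cm^-1, e.g. from nearest-neighbor tables
a_u = 0.200        # absorbance on the high-temperature baseline
a_room = 0.165     # absorbance of the same sample at room temperature

conc = concentration_from_unfolded(a_u, eps_u)
eps_room = extinction_at_temperature(a_room, a_u, eps_u)
print(f"concentration = {conc:.2e} M, eps(room) = {eps_room:.3e} M^-1 cm^-1")
```

Once ε at room temperature is known for a given sequence, later samples can be quantified from a room-temperature reading without heating.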

6.4. Concentration independence In contrast to systems of higher molecularity, nucleic acids that unfold unimolecularly show no concentration dependence of the melting temperature (T_M). Thus, an important control is to melt the RNA or DNA over a range of concentrations to assure concentration independence of the T_M. The actual range of concentrations that should be tested depends on the thermodynamic properties of the particular RNA or DNA, and should be examined on a case-by-case basis. We recommend that the user calculate the expected change in T_M using nearest-neighbor parameters for RNA and DNA (Mathews et al., 1999; SantaLucia, 1998; Xia et al., 1998), and the following formula for self-complementary duplexes (Turner, 2000), which is the state into which a hairpin or ribozyme is most likely to misfold,

$$T_M\,(^{\circ}\mathrm{C}) = \frac{\Delta H^{\circ}}{\Delta S^{\circ} + R \ln C} - 273.15, \tag{13.10}$$

where R is the universal gas constant, 0.001987 kcal K⁻¹ mol⁻¹, and C is the concentration of nucleic acid (in M). Equation (13.10) requires inputs of ΔH° and ΔS° from RNA and DNA studies (Mathews et al., 1999;

Thermodynamic Cycles for RNA and DNA Folding


SantaLucia, 1998; Xia et al., 1998). These values are generally unavailable at present on Web servers, but a particularly user-friendly guide to calculating the parameters for an oligonucleotide of interest is available (Serra and Turner, 1995). As an example, we calculated that a set of 12-mer RNA oligonucleotides, designed to form a stem of 4 bp and a tetraloop, would have a T_M change of 9–12 °C over a 250-fold concentration range if they were forming self-complementary duplexes (Proctor et al., 2002). Upon melting these RNAs over this concentration range, the T_M was found to be independent of concentration and fluctuated by only 1 °C. This gave confidence that these particular oligonucleotides fold unimolecularly over this range. Nonetheless, most of the data for thermodynamic parameters in that study were collected at the low end of this concentration range, both because duplexes will often form when the concentration is high enough and to minimize sample requirement. It is important to note that the change in T_M over a given concentration range for a self-complementary duplex will be smaller for a larger base-pairing region, which can be confirmed by substitution into Eq. (13.10). Thus, longer stretches of potential base pairing require a larger range of concentrations to assure concentration independence. Note also that if the concentration of salt is varied in the study, then this control should be repeated at high salt, as duplexes are preferentially favored at higher salt for short oligonucleotides (Proctor et al., 2002, 2003). To cover an appropriate range of concentrations while maintaining an ideal absorbance reading, the pathlength b or the extinction coefficient ε in Eq. (13.9) should be adjusted. To reach the lowest concentrations possible, melting should be conducted in a 1-cm pathlength cell at 260 nm, which is typically near the maximum in the absorbance spectrum. To reach the highest concentrations possible, melting should be done at 280 nm, using a shorter pathlength cell.
We typically conduct melts in 1-, 0.5-, and 0.1-cm quartz cuvettes, which are commercially available from Beckman-Coulter, Nova Biotech, Starna, or Hellma. Quartz cells are used because they are UV transparent and can withstand a wide range of temperatures. If very high concentrations are needed, it is possible to conduct melts in pathlengths down to 0.1 mm by inserting a 0.9-mm quartz spacer into 1-mm cuvettes, although the length of the spacer should be calibrated by dilution of stocks (Bevilacqua and Turner, 1991); also, wavelengths higher than 280 nm can be chosen if they give acceptable hyperchromicity.
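The expected T_M shift for a competing self-complementary duplex can be estimated directly from Eq. (13.10). The sketch below is a minimal illustration in Python; the function names and the ΔH° and ΔS° values are hypothetical placeholders, not nearest-neighbor predictions for any particular sequence.

```python
import math

R_KCAL = 0.001987  # universal gas constant, kcal K^-1 mol^-1


def tm_self_complementary(dh, ds, conc):
    """Eq. (13.10): T_M in degrees C for a self-complementary duplex.

    dh in kcal/mol, ds in kcal K^-1 mol^-1, conc in M.
    """
    return dh / (ds + R_KCAL * math.log(conc)) - 273.15


def tm_shift(dh, ds, c_low, fold_range):
    """Change in T_M between c_low and c_low * fold_range."""
    return (tm_self_complementary(dh, ds, c_low * fold_range)
            - tm_self_complementary(dh, ds, c_low))


# Hypothetical duplex parameters; the second duplex stands in for a longer
# base-pairing region with twice the enthalpy and entropy of the first.
shift_short = tm_shift(dh=-40.0, ds=-0.110, c_low=1e-6, fold_range=250.0)
shift_long = tm_shift(dh=-80.0, ds=-0.220, c_low=1e-6, fold_range=250.0)
print(f"short duplex: {shift_short:.1f} C, long duplex: {shift_long:.1f} C")
```

With these placeholder inputs the longer duplex shows the smaller shift over the same 250-fold range, which is the behavior noted above and the reason longer potential pairing regions call for a wider concentration range.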

6.5. Buffer choice Optimal buffer conditions are system dependent. For melting of oligonucleotides that form only secondary structure, divalent ions are typically omitted, and so phosphate is a good buffer choice given its pKa near neutrality and simple structure (Antao et al., 1991; Siegfried et al., 2007).


Phosphate, however, should be avoided for Mg²⁺-containing buffers, as Mg₃(PO₄)₂(s) forms at high temperatures and reveals itself as strong scattering of the signal. A common buffer is P10E0.1 (= 10 mM sodium phosphate and 0.1 mM Na₂EDTA, pH 7.0, with a final [Na⁺] of 14 mM), often with addition of a monovalent salt such as 100 mM KCl. To exchange buffer, samples are dialyzed for 3–4 h using a Gibco BRL (Invitrogen) microdialysis apparatus with a flow rate of 1 L/h and a Spectra-Por (Spectrum Labs) membrane with a 1000 molecular weight cutoff (Antao et al., 1991; Siegfried et al., 2007). The small amount of EDTA is included to bind trace polyvalent metals that otherwise might catalyze hydrolysis of RNA’s sugar-phosphate backbone, especially at the higher temperatures of a melt. In cases where higher-order folding of RNA or DNA is being studied, Mg²⁺ should be added, though it should be noted that these melts are generally only carried out from low to high temperature due to RNA hydrolysis (see subsequently). It is also important to choose a buffer whose pH is not significantly dependent on temperature and which does not bind divalent ions. Many of Good’s buffers are excellent candidates, as tabulated by Good et al. (1966), and Rorabacher and colleagues have prepared buffers based on tertiary amines that do not interact with metal ions (Yu et al., 1997).

6.6. Running an experiment Following is a brief guide to carrying out UV melting experiments. Step 1: We typically store solutions of RNA and DNA oligonucleotides at −20 °C, either in water or melt buffer. It is critical that samples removed from the freezer be renatured before each experiment. This helps ensure a unimolecular and homogeneous folding population. We have observed extra transitions in melts for samples that were not renatured, presumably because dimerization occurs during freezing. It is also critical that samples that have been concentrated, either in a Speed-vac or by precipitation, be renatured prior to melting, as dimers can form during the concentration step and become kinetically trapped. If no Mg²⁺ is present, renaturation is typically accomplished by heating the sample to 95 °C for 2 min and then allowing it to cool to room temperature for 15 min. A convenient way to do this is in the cuvette in the UV-vis spectrometer itself. If RNA is being melted and its melting temperature is known, it is typically sufficient to heat to a temperature just above the T_M, thereby minimizing hydrolysis of the backbone. If Mg²⁺ is needed for tertiary structure formation, it is often sufficient to heat the sample to


90 °C for 2 min in 1× melt buffer without Mg²⁺, cool at room temperature, then add Mg²⁺ and incubate at 55 °C for 10 min. This is a seminal renaturation that was developed for the Tetrahymena ribozyme (Herschlag and Cech, 1990). A more detailed consideration of preparation of a uniform ribozyme population can be found in Protocol 1 of Bevilacqua et al. (2003).

Step 2: Place sample in a quartz cuvette. A volume should be chosen such that the sample is higher than the height of the beam. A convenient way to check this is by using a wavelength in the visible region, inserting an index card just after the sample, and visualizing both the beam and sample with a dentist’s mirror. In anticipation of changes in sample volume as a function of temperature, we fill our cuvettes with a volume above the minimum requirement but below the maximum cuvette volume, which leaves room both for sample contraction and expansion. The sample must be covered in some way to prevent evaporation. This can be done either by placing a drop of mineral oil on the sample or by covering the cuvette with a Teflon stopper. In the latter case, we also put a small piece of Teflon tape around the stopper to ensure a good seal.

Step 3: Load samples into the spectrophotometer and set up the experiment. We perform absorbance melts on a Gilford Response II UV/VIS spectrophotometer equipped with a temperature-controlled multicell changer. The multicell changer allows simultaneous analysis of up to six samples, while Peltier thermoelectric heating and cooling provides temperature control. A blank containing only buffer is loaded for each sample buffer condition. Set the temperature range to be scanned and the temperature change rate. Choose temperatures to give baselines of at least 10 °C if possible. Most spectrometers can heat at rates from 0.1 to 1 °C/min. A new sample should be checked at two heating rates, and the same melting profiles should be obtained if the sample is in equilibrium.
In addition, we often melt from low to high temperature, and then from high to low temperature to ensure reversibility of the scans, as expected for a path-independent equilibrium process. (Reverse melts should not be carried out above 75–80 °C in Mg²⁺-containing RNA solutions due to hydrolysis.) A typical melt for an RNA is shown in Fig. 13.5.

6.7. Melt fit equations Here we provide equations that will be called upon in the next section for nonlinear curve fitting. Other methods of fitting melt data have been discussed elsewhere, some of which are useful in specialized cases such as where baselines are poorly defined or for bimolecular melts (Puglisi and Tinoco, 1989). We assume that a homogenous folded state, F, has been prepared from the renaturation and that it unfolds in a two-state manner to


[Figure 13.5: plot of absorbance at 260 nm (y-axis, 0.160–0.200) versus temperature (x-axis, 40–100 °C), with the fitted curve drawn through the data. Inset: KaleidaGraph fit output for y = melt1 (4E-6,0.1,0.0004,.2...]

Parameter   Value         Error
a           0.00035276    6.4667e-06
b           0.14967       0.00032141
c           5.002e-06     1.7768e-05
d           0.19942       0.0015772
f           −65.413       1.0787
g           72.285        0.089939
Chisq       2.8128e-06    NA
R           0.99992       NA

Figure 13.5 Sample UV melt data and fit. Data are for unfolding a 20-nt RNA hairpin with an AA homopurine mismatch (Siegfried et al., 2007). A sigmoidal transition occurs at 260 nm with a T_M of 72.3 ± 0.09 °C. Data are fit to a two-state model wherein unfolding results in 20% hyperchromicity near the melting transition. Baselines are linear, with the folded baseline having a greater slope. Fitting (solid line) was by nonlinear least-squares in KaleidaGraph using the parametric equations described in the text. Fit parameters as output by KaleidaGraph are shown in the figure along with statistical errors, an R-value, and a Chi-squared. Fit parameters are 3.5 × 10⁻⁴, 0.15, 5.0 × 10⁻⁶, 0.20, −65.4 kcal/mol, and 72.3 °C, for mF, bF, mU, bU, ΔH°, and T_M, respectively. Percentage errors are reasonable on all parameters. The percentage error on mU is large only because its value is so close to zero. Values for ΔS° and ΔG°37 can be calculated from the standard thermodynamic relationships, as described in the text.

the unfolded state, U. The absorbance measured at low temperatures is therefore that of the folded state, AF. As unfolding progresses and the unfolded state populates, and assuming U and F do not interact, the absorbance changes in a manner that reflects the population-weighted average of AF and AU,

$$A(T) = A_U(T)\,f_U(T) + A_F(T)\,f_F(T), \tag{13.11}$$

which is simply a restatement of the Beer-Lambert law for noninteracting mixtures. Each of the four terms on the right-hand side of Eq. (13.11) is a function of temperature, as described subsequently. An alternative arrangement of Eq. (13.11) is often found in the literature, A = A_U + (A_F − A_U)f_F, which is equally applicable to fitting two-state melting curves. However, we choose to use the expression in Eq. (13.11) because this form is amenable to three-state melts as well (Nakano et al., 2003).


The fraction of a species can be written as the concentration of that species divided by the total concentration, for example,

$$f_U = \frac{[\mathrm{U}]}{[\mathrm{U}] + [\mathrm{F}]}. \tag{13.12}$$

At this point we introduce a partition function, which helps describe how a molecule populates its various states. The equilibrium constant for this two-state system, defined in the direction of folding as is typical for oligonucleotides, is

$$K = [\mathrm{F}]/[\mathrm{U}]. \tag{13.13}$$

To relate Eqs. (13.12) and (13.13), we divide the numerator and denominator of Eq. (13.12) by [U], which we call the reference state, and then substitute Eq. (13.13) to give,

$$f_U = \frac{1}{1 + K}, \tag{13.14}$$

where the denominator is the partition function, Q, given by

$$Q = 1 + K. \tag{13.15}$$

The partition function has two terms in it for the two states of the system. In general, Q will have as many terms as there are states of the system. The beauty of this equation is that the fraction of any species can be represented simply by putting its corresponding weight in the numerator, and Q in the denominator. For example, the fraction of molecules in the folded state is the second term in Q, divided by Q,

$$f_F = \frac{K}{1 + K}. \tag{13.16}$$
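The rule that the fraction of any species is its weight divided by Q extends naturally to any number of states. As a minimal sketch in Python (the function name and the value of K are hypothetical, chosen only for illustration):

```python
def state_fractions(weights):
    """Fraction of each state from its statistical weight.

    weights[0] is the reference state (weight 1 for U here); every other
    entry is the equilibrium constant of that state relative to the
    reference. The partition function Q is the sum of all weights.
    """
    q = sum(weights)
    return [w / q for w in weights]


K = 4.0  # hypothetical folding equilibrium constant, [F]/[U]
f_u, f_f = state_fractions([1.0, K])
print(f"f_U = {f_u:.2f}, f_F = {f_f:.2f}")
```

For a two-state system this reproduces Eqs. (13.14) and (13.16); a three-state system would simply pass a third weight in the list.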

Experimental data will be fit to Eq. (13.11), but we first need to write down the temperature dependence of each of the four terms on its right-hand side. Two of the four terms, absorbance of the unfolded and folded states, are commonly assumed to be linear functions of temperature, which is backed up by extensive data,

$$A_U(T) = m_U T + b_U. \tag{13.17}$$

$$A_F(T) = m_F T + b_F. \tag{13.18}$$


The other two terms in Eq. (13.11) that are a function of temperature, fU and fF, are defined in Eqs. (13.14) and (13.16), and the term of K in these equations is a function of temperature as follows. The van’t Hoff equation,

$$\frac{\partial \ln K}{\partial (1/T)} = -\frac{\Delta H^{\circ}}{R}, \tag{13.19}$$

can be integrated from 1/T_M to 1/T; knowing that K = 1 at T_M, and assuming ΔH° is not temperature dependent, we get,

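$$\ln K = \frac{\Delta H^{\circ}}{R}\left(\frac{1}{T_M} - \frac{1}{T}\right), \quad \text{or} \tag{13.20}$$

$$K = \exp\left[\frac{\Delta H^{\circ}}{R}\left(\frac{1}{T_M} - \frac{1}{T}\right)\right]. \tag{13.21}$$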
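For readers who want the intermediate step, the integration of the van’t Hoff equation can be written out as follows. This is a sketch under the same assumptions stated in the text (ΔH° independent of temperature, K = 1 at T_M); the primed integration variable is introduced here only for clarity.

$$\int_{\ln K(T_M)}^{\ln K(T)} d\ln K = -\frac{\Delta H^{\circ}}{R}\int_{1/T_M}^{1/T} d\!\left(\frac{1}{T'}\right)$$

$$\ln K(T) - \ln 1 = -\frac{\Delta H^{\circ}}{R}\left(\frac{1}{T} - \frac{1}{T_M}\right) = \frac{\Delta H^{\circ}}{R}\left(\frac{1}{T_M} - \frac{1}{T}\right)$$

Because K is defined in the direction of folding, ΔH° is negative, so ln K is positive below T_M (the folded state dominates) and negative above it.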

Last, we note that for a unimolecular system,

$$T_M = \Delta H^{\circ}/\Delta S^{\circ}, \tag{13.22}$$

which can be arrived at by evaluating ΔG° = ΔH° − TΔS° at the T_M, where K = 1 and therefore ΔG° = 0. With the preceding equations, we can fit experimental melting data of absorbance versus temperature to Eq. (13.11), as planned. In our lab, this is done by modifying the library in the nonlinear curve-fitting program, KaleidaGraph (v. 3.6) (Synergy Software) (see the subsequent section), which provides values for the six variables: mU, bU, mF, bF, ΔH°, and T_M. The parameters ΔS° and ΔG° can be obtained from ΔH° and T_M using Eqs. (13.22) and (13.23):

$$\Delta G^{\circ} = \Delta H^{\circ}(1 - T/T_M). \tag{13.23}$$
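As a worked illustration of Eqs. (13.22) and (13.23), the short Python sketch below converts the fitted ΔH° and T_M reported in the caption of Fig. 13.5 into ΔS° and ΔG° at 37 °C. The function names are hypothetical; the only inputs taken from the text are ΔH° = −65.4 kcal/mol and T_M = 72.3 °C.

```python
KELVIN_OFFSET = 273.15


def entropy_from_fit(dh, tm_celsius):
    """Eq. (13.22) rearranged: dS = dH / T_M, with T_M in kelvin."""
    return dh / (tm_celsius + KELVIN_OFFSET)


def free_energy(dh, tm_celsius, t_celsius=37.0):
    """Eq. (13.23): dG(T) = dH * (1 - T / T_M), temperatures in kelvin."""
    t = t_celsius + KELVIN_OFFSET
    tm = tm_celsius + KELVIN_OFFSET
    return dh * (1.0 - t / tm)


dh_fit = -65.4   # kcal/mol, from the caption of Fig. 13.5
tm_fit = 72.3    # degrees C, from the caption of Fig. 13.5

ds_fit = entropy_from_fit(dh_fit, tm_fit)   # kcal K^-1 mol^-1
dg37 = free_energy(dh_fit, tm_fit)          # kcal/mol at 37 C
print(f"dS = {ds_fit * 1000:.1f} cal/(K mol), dG37 = {dg37:.2f} kcal/mol")
```

Note that T_M must be converted to kelvin before either relationship is applied; the fit itself reports T_M in degrees Celsius.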

It is worth pointing out why we solve directly for T_M in the melts rather than ΔS°. Attempts to solve directly for ΔS° resulted in getting trapped in local minima, while solving for T_M did not. Presumably this is because T_M is well defined within the experimental data set, but ΔS° requires a long extrapolation to infinite temperature. We conclude this section by pointing out that our approach is to use Eqs. (13.11), (13.14)–(13.18), and (13.21) parametrically in the KaleidaGraph library. Although cumbersome, it is also possible to substitute Eqs. (13.14)–(13.18) and (13.21) directly into Eq. (13.11) and use this directly in the Curve Fit window of KaleidaGraph. The parametric approach, however, is more transparent, as well as applicable to systems with more than two states (Draper et al., 2000; Nakano et al., 2003).
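The parametric form of the two-state model can also be written as an ordinary function. The Python sketch below is an illustration only, not the authors' KaleidaGraph implementation: the function and variable names are hypothetical, and the parameter values are the rounded fit results quoted in the caption of Fig. 13.5.

```python
import math

R_KCAL = 0.001987  # kcal K^-1 mol^-1


def melt_absorbance(t_celsius, m_f, b_f, m_u, b_u, dh, tm_celsius):
    """Two-state melt model, Eq. (13.11), built up parametrically."""
    base_f = m_f * t_celsius + b_f                      # Eq. (13.18)
    base_u = m_u * t_celsius + b_u                      # Eq. (13.17)
    k = math.exp((dh / R_KCAL) * (1.0 / (tm_celsius + 273.15)
                                  - 1.0 / (t_celsius + 273.15)))  # Eq. (13.21)
    q = 1.0 + k                                         # Eq. (13.15)
    # Weights: 1 for U (reference state) and K for F, each divided by Q.
    return (base_u * 1.0 + base_f * k) / q


# Rounded fit parameters from the caption of Fig. 13.5.
FIG_13_5 = dict(m_f=3.5e-4, b_f=0.15, m_u=5.0e-6, b_u=0.20,
                dh=-65.4, tm_celsius=72.3)

for t in range(40, 101, 10):
    print(t, round(melt_absorbance(float(t), **FIG_13_5), 4))
```

A function of this shape could in principle be handed to any nonlinear least-squares routine (for example `scipy.optimize.curve_fit`, which is an assumption outside the original text) with T_M, rather than ΔS°, as the fitted parameter, for the reason given above.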


6.8. Nonlinear curve fitting with KaleidaGraph To fit melting data we use KaleidaGraph. Curve fitting is done to the aforementioned two-state model, using the equations derived previously. We transcribed these equations into the program’s library, which is a text file that contains preprogrammed fitting equations, with user-supplied equations at its end. This file can be reached in the KaleidaGraph program in one of two ways. If a plot is open, pull down the Curve Fit menu and choose General, Library; otherwise, pull down the Macros menu and choose Library. The original Library file should be copied as a backup. Actual edits to the file should be made through the KaleidaGraph program using the pull downs mentioned previously. The following should be added to the end of the Library, where everything after the semicolon is treated as a comment:

;MELTFIT FOR U=F
;mf=a=m1; bf=b=m2; mu=c=m3; bu=d=m4; H1=f=m5; Tm1=g=m6; T=x=m0
;
basef=a*x+b; defines the folded baseline (=lower baseline)
baseu=c*x+d; defines the unfolded baseline (=upper baseline)
K1=exp((f/R)*(1/(g+273.15)-1/(x+273.15)));
Q1=1+K1;
melt(a0,b0,c0,d0,f0,g0)=((baseu)*1+(basef)*K1)/(Q1)\;a=a0\;b=b0\;c=c0\;d=d0\;f=f0\;g=g0;

In addition, the following portion should be added near the top of the Library, after the title and version,

;__Variable Definitions
x = m0; a = m1; b = m2; c = m3; d = m4; f = m6; g = m7; h = m8; i = m9; j = m5;
;
;__Constant Definitions
e = exp(1); R = 0.001987


The first line of the MELTFIT portion of the Library is a comment that identifies the equilibrium of interest. The second line is a comment that defines the six variables mentioned above, as well as the independent variable T. As an example, the slope of the folded baseline is "a," which is assigned to m1 during curve fitting. The third line is just a spacer. The fourth and fifth lines define the folded and unfolded baselines, respectively, as per Eqs. (13.18) and (13.17). The sixth line corresponds to Eq. (13.21), in which temperature is entered in Celsius, while the seventh line corresponds to the partition function in Eq. (13.15). The eighth and final line corresponds to Eq. (13.11) with appropriate substitutions. Once the Library has been exited, the data are fit. Enter your data into the program and generate a Scatter Plot. Next select the Curve Fit menu and choose General, Edit General. Click Add and select New Fit, entering a name for this equation, such as Melt (U = F), then click Edit. A window titled General Curve Fit Definition will open. This is where the equation and initial guesses for the variables will be entered. As an example, one could enter, melt(2e-5, 1, 2e-5, 0.85, -50, 70). Lower the Allowable Error to 0.00001% to get a better fit, and choose OK. (Note that the section beginning Curve Fit will only have to be done once.) One of the most important considerations in getting a fit to the data is the initial guesses. The T_M is relatively easy to estimate from the midpoint of the transition, while ΔH° (in kcal/mol and for folding) can be estimated from nearest-neighbor parameters (Serra and Turner, 1995). The upper and lower baselines can be estimated by fitting just the appropriate portion of the data to a line. With these guesses, it should be possible to get an excellent fit. A typical melt and fitted parameters are shown in Fig. 13.5. A final comment is in order on error propagation.
Errors in ΔG° for a given melt should be propagated from errors in ΔH° and T_M, which are statistical errors from KaleidaGraph. We provided a full description of error propagation in the Supporting Information in Siegfried et al. (2007). In general, a given experiment should be repeated three or more times. Propagation of error into the average ΔG° from these trials is also provided in Siegfried et al. (2007). Errors in the coupling free energy should be propagated from Eq. (13.2) rather than Eq. (13.3) to avoid overcounting (Siegfried et al., 2007). Typical errors in the coupling free energy are 0.1 kcal/mol.
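Returning to the initial guesses discussed above, the baseline slopes and intercepts can be estimated by fitting a line to only the low-temperature and only the high-temperature portions of a melt. The Python sketch below illustrates the idea with an ordinary least-squares line; the function names, the temperature cutoffs, and the synthetic data are all hypothetical and are not taken from the original study.

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) pairs: (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def baseline_guesses(data, folded_max_t, unfolded_min_t):
    """Fit lines to the folded (low-T) and unfolded (high-T) regions only."""
    folded = [(t, a) for t, a in data if t <= folded_max_t]
    unfolded = [(t, a) for t, a in data if t >= unfolded_min_t]
    return fit_line(folded), fit_line(unfolded)


# Synthetic, noise-free baseline points for illustration (no transition region).
data = ([(float(t), 3.5e-4 * t + 0.15) for t in range(40, 56)]
        + [(float(t), 5.0e-6 * t + 0.20) for t in range(90, 101)])

(m_f, b_f), (m_u, b_u) = baseline_guesses(data, folded_max_t=55.0,
                                          unfolded_min_t=90.0)
print(m_f, b_f, m_u, b_u)
```

The four numbers returned would serve as starting values for a, b, c, and d in the melt(...) definition, with the T_M guess read from the midpoint of the transition.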

7. Concluding Remarks Investigations into RNA and DNA folding have increased tremendously over the past 10 years. New insights have been gained into the folding of RNA and DNA secondary and tertiary structures. The function


of many proteins is dictated by cooperativity, and this is likely to be true for RNA and DNA as well. A few studies have appeared in recent years that support the presence of cooperativity in nucleic acid folding, although it does not manifest itself in all systems or all parts of a given system. The importance of cooperativity in RNA and DNA folding will be better defined as additional studies are conducted. The goal of this article was to provide conceptual and experimental frameworks for the design, implementation, and interpretation of thermodynamic cycles in nucleic acids to help further such studies.

ACKNOWLEDGMENT This research is supported by NSF Grant 0527102.

REFERENCES
Antao, V. P., Lai, S. Y., and Tinoco, I. Jr. (1991). A thermodynamic study of unusually stable RNA and DNA hairpins. Nucleic Acids Res. 19, 5901–5905.
Bevilacqua, P. C., and Blose, J. M. (2008). Structures, kinetics, thermodynamics, and biological functions of RNA hairpins. Annu. Rev. Phys. Chem. 59, 79–103.
Bevilacqua, P. C., Brown, T. S., Chadalavada, D., Parente, A. D., and Yajima, R. (2003). "Kinetic analysis of ribozyme cleavage." Oxford University Press, Oxford.
Bevilacqua, P. C., Brown, T. S., Nakano, S., and Yajima, R. (2004). Catalytic roles for proton transfer and protonation in ribozymes. Biopolymers 73, 90–109.
Bevilacqua, P. C., and Turner, D. H. (1991). Comparison of binding of mixed ribose-deoxyribose analogues of CUCU to a ribozyme and to GGAGAA by equilibrium dialysis: Evidence for ribozyme specific interactions with 2′ OH groups. Biochemistry 30, 10632–10640.
Borer, P. N. (1975). In "Handbook of biochemistry and molecular biology: Nucleic acids" (G. D. Fasman, ed.), Vol. 1, p. 589. CRC Press, Cleveland, OH.
Cantor, C. R., and Schimmel, P. R. (1980). In "Biophysical chemistry, part II: Techniques for the study of biological structure and function" W. H. Freeman, San Francisco, CA.
Carter, P. J., Winter, G., Wilkinson, A. J., and Fersht, A. R. (1984). The use of double mutants to detect structural changes in the active site of the tyrosyl-tRNA synthetase (Bacillus stearothermophilus). Cell 38, 835–840.
Cavaluzzi, M. J., and Borer, P. N. (2004). Revised UV extinction coefficients for nucleoside-5′-monophosphates and unpaired DNA and RNA. Nucleic Acids Res. 32, e13.
Chadalavada, D. M., Senchak, S. E., and Bevilacqua, P. C. (2002). The folding pathway of the genomic hepatitis delta virus ribozyme is dominated by slow folding of the pseudoknots. J. Mol. Biol. 317, 559–575.
Di Cera, E. (1998). Site-specific thermodynamics: Understanding cooperativity in molecular recognition. Chem. Rev. 98, 1563–1592.
Dill, K. A., and Bromberg, S. (2003). "Molecular driving forces: Statistical thermodynamics in chemistry and biology." Garland Science, New York.
Draper, D. E., Bukhman, Y. V., and Gluick, T. C. (2000). Thermal Methods for the Analysis of RNA Folding Pathways. In "Current Protocols in Nucleic Acid Chemistry" (S. L. Beaucage, D. E. Bergstrom, G. D. Glick, and R. A. Jones, eds.) pp. 11.13.11–11.13.13. John Wiley & Sons, New York.
Duarte, C. M., and Pyle, A. M. (1998). Stepping through an RNA structure: A novel approach to conformational analysis. J. Mol. Biol. 284, 1465–1478.
Fresco, J. R., Klotz, L. C., and Richards, E. G. (1963). A new spectroscopic approach to the determination of helical secondary structure in ribonucleic acids. Cold Spring Harb. Symp. Quant. Biol. 28, 83–90.
Gluick, T. C., and Draper, D. E. (1994). Thermodynamics of folding a pseudoknotted mRNA fragment. J. Mol. Biol. 241, 246–262.
Gluick, T. C., Wills, N. M., Gesteland, R. F., and Draper, D. E. (1997). Folding of an mRNA pseudoknot required for stop codon readthrough: Effects of mono- and divalent ions on stability. Biochemistry 36, 16173–16186.
Good, N. E., Winget, G. D., Winter, W., Connolly, T. N., Izawa, S., and Singh, R. M. (1966). Hydrogen ion buffers for biological research. Biochemistry 5, 467–477.
Gray, D. M., and Ratliff, R. L. (1977). Circular dichroism evidence for G-U and G-T base pairing in poly[r(G-U)] and poly[d(G-T)]. Biopolymers 16, 1331–1342.
Herschlag, D., and Cech, T. R. (1990). Catalysis of RNA cleavage by the Tetrahymena thermophila ribozyme. 1. Kinetic description of the reaction of an RNA substrate complementary to the active site. Biochemistry 29, 10159–10171.
Hirao, I., Kawai, G., Yoshizawa, S., Nishimura, Y., Ishido, Y., Watanabe, K., and Miura, K. (1994). Most compact hairpin-turn structure exerted by a short DNA fragment, d(GCGAAGC) in solution: An extraordinarily stable structure resistant to nucleases and heat. Nucleic Acids Res. 22, 576–582.
Kahn, J. D., Yun, E., and Crothers, D. M. (1994). Detection of localized DNA flexibility. Nature 368, 163–166.
Kebbekus, P., Draper, D. E., and Hagerman, P. (1995). Persistence length of RNA. Biochemistry 34, 4354–4357.
Klostermeier, D., and Millar, D. P. (2002). Energetics of hydrogen bond networks in RNA: Hydrogen bonds surrounding G+1 and U42 are the major determinants for the tertiary structure stability of the hairpin ribozyme. Biochemistry 41, 14095–14102.
Lu, Y., Weers, B., and Stellwagen, N. C. (2001). DNA persistence length revisited. Biopolymers 61, 261–275.
Mathews, D. H., Sabina, J., Zuker, M., and Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940.
Misra, V. K., and Draper, D. E. (2002). The linkage between magnesium binding and RNA folding. J. Mol. Biol. 317, 507–521.
Misra, V. K., Shiman, R., and Draper, D. E. (2003). A thermodynamic framework for the magnesium-dependent folding of RNA. Biopolymers 69, 118–136.
Moody, E. M., and Bevilacqua, P. C. (2003). Folding of a stable DNA motif involves a highly cooperative network of interactions. J. Am. Chem. Soc. 125, 16285–16293.
Moody, E. M., and Bevilacqua, P. C. (2004). Structural and energetic consequences of expanding a highly cooperative stable DNA hairpin loop. J. Am. Chem. Soc. 126, 9570–9577.
Moody, E. M., Feerrar, J. C., and Bevilacqua, P. C. (2004). Evidence that folding of an RNA tetraloop hairpin is less cooperative than its DNA counterpart. Biochemistry 43, 7992–7998.
Moody, E. M., Lecomte, J. T., and Bevilacqua, P. C. (2005). Linkage between proton binding and folding in RNA: A thermodynamic framework and its experimental application for investigating pKa shifting. RNA 11, 157–172.
Murthy, V. L., Srinivasan, R., Draper, D. E., and Rose, G. D. (1999). A complete conformational map for RNA. J. Mol. Biol. 291, 313–327.
Nakano, S., Cerrone, A. L., and Bevilacqua, P. C. (2003). Mechanistic characterization of the HDV genomic ribozyme: Classifying the catalytic and structural metal ion sites within a multichannel reaction mechanism. Biochemistry 42, 2982–2994.
Proctor, D. J., Kierzek, E., Kierzek, R., and Bevilacqua, P. C. (2003). Restricting the conformational heterogeneity of RNA by specific incorporation of 8-bromoguanosine. J. Am. Chem. Soc. 125, 2390–2391.
Proctor, D. J., Schaak, J. E., Bevilacqua, J. M., Falzone, C. J., and Bevilacqua, P. C. (2002). Isolation and characterization of a family of stable RNA tetraloops with the motif YNMG that participate in tertiary interactions. Biochemistry 41, 12062–12075.
Puglisi, J. D., and Tinoco, I. Jr. (1989). Absorbance melting curves of RNA. Methods Enzymol. 180, 304–325.
Richards, E. G. (1975). In "Handbook of biochemistry and molecular biology: Nucleic acids" (G. D. Fasman, ed.), Vol. 1, p. 597. CRC Press, Cleveland, OH.
SantaLucia, J. Jr. (1998). A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. USA 95, 1460–1465.
SantaLucia, J. Jr., Kierzek, R., and Turner, D. H. (1992). Context dependence of hydrogen bond free energy revealed by substitutions in an RNA hairpin. Science 256, 217–219.
Sattin, B. D., Zhao, W., Travers, K., Chu, S., and Herschlag, D. (2008). Direct measurement of tertiary contact cooperativity in RNA folding. J. Am. Chem. Soc. 23, 23.
Schaak, J. E., Yakhnin, H., Bevilacqua, P. C., and Babitzke, P. (2003). A Mg²⁺-dependent RNA tertiary structure forms in the Bacillus subtilis trp operon leader transcript and appears to interfere with trpE translation control by inhibiting TRAP binding. J. Mol. Biol. 332, 555–574.
Serra, M. J., and Turner, D. H. (1995). Predicting thermodynamic properties of RNA. Methods Enzymol. 259, 242–261.
Siegfried, N. A., Metzger, S. L., and Bevilacqua, P. C. (2007). Folding cooperativity in RNA and DNA is dependent on position in the helix. Biochemistry 46, 172–181.
Silverman, S. K., and Cech, T. R. (1999). Energetics and cooperativity of tertiary hydrogen bonds in RNA structure. Biochemistry 38, 8691–8702.
Turner, D. H. (2000). Conformational changes. In "Nucleic acids: Structure, properties, and functions" (V. A. Bloomfield, D. M. Crothers, and I. Tinoco Jr., eds.) pp. 259–334. University Science Books, Sausalito, California.
Xia, T., SantaLucia, J. Jr., Burkard, M. E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox, C., and Turner, D. H. (1998). Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37, 14719–14735.
Yu, Q., Kandegedara, A., Xu, Y., and Rorabacher, D. B. (1997). Avoiding interferences from Good’s buffers: A contiguous series of noncomplexing tertiary amine buffers covering the entire range of pH 3-11. Anal. Biochem. 253, 50–56.
Zaug, A. J., Grosshans, C. A., and Cech, T. R. (1988). Sequence-specific endoribonuclease activity of the Tetrahymena ribozyme: Enhanced cleavage of certain oligonucleotide substrates that form mismatched ribozyme-substrate complexes. Biochemistry 27, 8924–8931.

CHAPTER FOURTEEN

The Thermodynamics of Virus Capsid Assembly

Sarah Katen* and Adam Zlotnick*,†

Contents
1. Introduction
2. The Structural Basis of Capsid Stability
2.1. Virus capsid geometry
2.2. Structures of virus capsid proteins
2.3. Interaction specificity as a means of assembly regulation
2.4. Subunit interactions
3. Analysis of Capsid Stability
3.1. Macromolecular polymerization and classical nucleation theory
3.2. Thermodynamic theory of capsid assembly
3.3. Methods of analysis
4. Applications of Thermodynamic Evaluation of Virus Capsid Stability
4.1. Cowpea chlorotic mottle virus
4.2. Hepatitis B virus
4.3. Human papillomavirus
4.4. Bacteriophages
5. Concluding Remarks
References

Abstract

Virus capsid assembly is a critical step in the viral life cycle. The underlying basis of capsid stability is key to understanding this process. Capsid subunits interact with weak individual contact energies to form a globally stable icosahedral lattice; this structure is ideal for encapsidating the viral genome and host partners and protecting its contents upon secretion, yet the unique properties of its assembly and inter-subunit contacts allow the capsid to dissociate upon entering a new host cell. The stability of the capsid can be analyzed by treating capsid assembly as an equilibrium polymerization reaction, modified from the traditional polymer model to account for the fact that a separate nucleus is formed for each individual capsid. From the concentrations of reactants and products in an equilibrated assembly reaction, it is possible to extract the thermodynamic parameters of assembly for a wide array of icosahedral viruses using well-characterized biochemical and biophysical methods. In this chapter we describe this basic analysis and provide examples of thermodynamic assembly data for several different icosahedral viruses. These data provide new insights into the assembly mechanisms of spherical virus capsids, as well as into the biology of the viral life cycle.

* Department of Biology, Indiana University, Bloomington, Indiana, USA
† Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma

Methods in Enzymology, Volume 455. ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04214-6
© 2009 Elsevier Inc. All rights reserved.

1. Introduction

A generic virus self-assembles within the cell, packaging the viral genome, viral proteins, and relevant host partners within a protein capsid for export from the cell. Spherical virus capsids typically comprise many copies of a few or even a single gene product. As such, the virus uses relatively little of its genome to code for the capsid, while the host generates the many copies of capsid subunit(s), thus exploiting the host resources to maximum effect with minimum cost to the virus. While this strategy leaves the virus dependent on the concerted assembly of hundreds of individual subunits that must recognize and specifically package the virus components with high fidelity, assembly usually proceeds to successfully form many complete and infectious virions.

Besides protecting the genome, virus capsids may also be involved with other steps in the virus life cycle, interactions that are likely to be affected by capsid physical properties. Stability of protein-protein interactions is required for assembly. However, many viruses uncoat in response to exogenous signals to release their genome (e.g., poliovirus, Flockhouse virus). This requires sampling the environment, often accomplished by transiently exposing internal components of the virus. In many viruses (e.g., Hepatitis B virus) transiently exposed signals play a role in intracellular trafficking (Ganem and Prince, 2004). Thus, at odds with the need for stability, the transient exposure of internal residues implies that a capsid is a flexible, dynamic structure (Bothner et al., 1998; Hilmer et al., 2008; Lewis et al., 1998). The seemingly contradictory roles of protection and release imply a physical chemical explanation that should be evident in the thermodynamic stability of the capsid.

In this contribution we describe methods for determining a thermodynamic description of virus capsid assembly based on treating assembly as a polymerization reaction.
Values determined experimentally are consistent with in silico assembly approaches that are outside the scope of this review: master equations (Zlotnick et al., 1999), discrete event simulation (Zhang and Schwartz, 2006), and coarse-grained molecular dynamics (Hagan and Chandler, 2006; Nguyen et al., 2007; Rapaport, 2004). Within the scope of this review, we examine the thermodynamics of assembly of several viruses and emphasize the underlying physical chemistry.

2. The Structural Basis of Capsid Stability

2.1. Virus capsid geometry

Most small virus capsids take the shape of rods or spheres. More than 50 years ago it was first observed that tomato bushy stunt virus was icosahedral (Caspar, 1956), in support of the hypothesis that capsids are constructed of symmetrical repeating arrays of smaller subunits (Crick and Watson, 1956). It has since been observed that almost all spherical viruses are based on icosahedral symmetry. An icosahedron is a geometric solid composed of 20 equilateral triangular facets (Fig. 14.1A). An icosahedron has 12 vertices, each corresponding to a five-fold symmetry axis. Ten three-fold symmetry axes pass through the 20 triangular faces, and there are 15 two-fold axes that pass through the 30 edge-to-edge contacts between facets. Viruses with no apparent common host or ancestry have independently evolved to form this structure. An icosahedron is a simple repeating lattice of a single subunit, allowing a proportionally small surface area to enclose a large volume (Crick and Watson, 1956). For a virus, this means that a relatively small protein provides sufficient space to package the gene that encodes it as well as the rest of the genome.

Figure 14.1 Icosahedral Symmetry in Virus Capsids. (A) Icosahedra and icosahedral symmetry, with views down the twofold, threefold, and fivefold symmetry axes (Reprinted from JMR 8(6):480 Copyright 2005 by Wiley Interscience). (B) Selected icosahedral facets (T = 1, T = 3, T = 4, and T = 7), drawn to show quasi equivalence in icosahedral assembly (Reprinted from PNAS, 101(44):15540, copyright 2004 by Proceedings of the National Academy of Sciences).

As a protein cannot have the intrinsic three-fold symmetry of an equilateral triangle, one facet of an icosahedral capsid must consist of at least three proteins. Thus, virus capsids are assembled from 60 subunits (3 proteins × 20 facets = 60 proteins) or a multiple thereof. The theory of quasi equivalence (Caspar and Klug, 1962) provides a mechanism for building larger icosahedra: triangular facets can be extracted from a hexagonal lattice such that the vertices of each facet fall on lattice vertices, and each facet encompasses an integral number of repeating units. The resulting icosahedra display a mix of strict icosahedral five-fold axes and an array of quasi-six-fold axes (axes with local but not global six-fold symmetry). Quasi equivalence is described in terms of a T number (triangulation number), where T is the number of subunits in the repeating asymmetric unit, three of which form a triangular icosahedral facet (Fig. 14.1B); the icosahedron is made up of 60 asymmetric units and thus of 60T subunits. An icosahedral T number is restricted to an integer that fits the equation:

$$T = h^{2} + hk + k^{2}, \tag{14.1}$$

where h and k are integers, with h ≥ 1 and k ≥ 0; the restriction of the T number to fit this equation retains the quasi equivalence of the virus structure. The subunits forming an icosahedral facet are termed quasi-equivalent, as the repeated protein is in similar but different environments (Caspar and Klug, 1962). The same contacts that form a hexagonal sheet can, through slight structural and environmental variation, form the five-fold contacts required to induce curvature in the otherwise-flat hexagonal lattice (Caspar, 1980). With this freedom to arrange identical subunits into similar functional geometries, a relatively small protein can enclose a very large volume.

Some spherical viruses stretch the rules of quasi equivalence. The polyoma- and papillomaviruses are assembled exclusively from pentamers, which cannot support the hexagonal repeat required for quasi equivalence. And yet, upon structure determination, it was shown that 12 pentamers occupied the predicted five-fold vertices and 60 pentamers occupied hexavalent positions predicted for a T = 7 quasi-equivalent lattice, with distinctly nonequivalent interactions of the identical chains within an icosahedral asymmetric unit (Baker and Caspar, 1984; Baker et al., 1989). Another variation was observed in the como- and picornaviruses; the subunits in the icosahedral asymmetric unit of these viruses are structurally similar but not identical, resulting in a pseudo-T = 3, or P = 3, lattice (Rossmann and Johnson, 1989).
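The restriction that Eq. (14.1) places on T can be made concrete by enumerating the allowed values. The following is a minimal Python sketch, not part of the original chapter; the function names are hypothetical, and the subunit count assumes the strict Caspar-Klug rule of 60T subunits (which, as noted above, the all-pentamer polyoma- and papillomaviruses do not follow).

```python
def triangulation_numbers(t_max: int) -> list[int]:
    """Return every distinct T = h^2 + h*k + k^2 (h >= 1, k >= 0) with T <= t_max."""
    found = set()
    h = 1
    while h * h <= t_max:
        k = 0
        while h * h + h * k + k * k <= t_max:
            found.add(h * h + h * k + k * k)
            k += 1
        h += 1
    return sorted(found)


def subunit_count(t_number: int) -> int:
    """Subunits in a strictly quasi-equivalent capsid: 60 asymmetric units of T subunits each."""
    return 60 * t_number


if __name__ == "__main__":
    for t in triangulation_numbers(13):
        print(f"T = {t:2d}: {subunit_count(t)} subunits")
```

Under these assumptions T = 3 corresponds to 180 chains (90 dimers) and T = 4 to 240 chains (120 dimers), while values such as T = 2 or T = 5 never appear in the list.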

2.2. Structures of virus capsid proteins

Evolutionary forces seem to have favored a few structural motifs in virus capsid proteins. Most small, nonenveloped viruses share a common protein fold: an eight-strand antiparallel β-barrel. This "jelly-roll" capsid protein is found across RNA picornaviruses, DNA parvoviruses, unrelated families of nonenveloped RNA plant viruses, DNA polyomaviruses and papillomaviruses, and some DNA bacteriophages (Rossmann and Johnson, 1989). However, not all nonenveloped spherical viruses share this β-barrel subunit fold. The small RNA bacteriophage MS2 has a five-strand β-sheet flanked by two C-terminal alpha-helices (Golmohammadi et al., 1993); this fold is shared by other members of the Leviviridae family, such as Qβ, R17, and PP7. Many large, multicomponent phages (and also the herpesviruses) share a common fold that was first discovered in bacteriophage HK97 (Jiang and Lin, 2003; Wikoff et al., 1998). In the hepatitis B virus (HBV), an enveloped DNA virus, the capsid protein has a unique alpha-helical fold (Wynne et al., 1999); HBV has been one of the major systems used for investigating virus assembly thermodynamics.

Despite the advantages of stability and structure afforded by the icosahedral geometry, retroviruses, which also have a predominantly helical capsid protein, are notable for their irregular capsids. However, even these viruses display some measure of symmetry. The mature cores of the human immunodeficiency virus (HIV) have the form of a fullerene cone (Ganser et al., 1999), whereas the immature virions display a patchy hexagonal lattice suggestive of quasi equivalence (Wright et al., 2007).

2.3. Interaction specificity as a means of assembly regulation

Despite the inter-subunit contact degeneracy that allows a single subunit to form multiple types of interactions, a virus nonetheless must assemble with high fidelity to successfully form an infectious particle. In some cases changes to capsid geometry may be a means of assembly regulation (Maxwell et al., 2002). Alternatively, Flockhouse virus uses RNA-peptide interactions as a switch for conformational change (Dong et al., 1998). In the case of cowpea chlorotic mottle virus, a conformational switch directs assembly toward the correct T = 3 products rather than toward aberrant T = 1 or pseudo-T = 2 particles (Tang et al., 2006). Scaffold proteins in phages φX174 and P22 constrain conformational change and direct the coat protein into the correct icosahedral shape (Fane and Prevelige, 2003).


Correct capsid geometry and interaction stability thus become a matter not merely of forming a shell for packaging but also of assembly regulation. Subunits require a switch to activate them for assembly, package the correct material, and select the right conformation for the quasi-equivalent environment. Disruption of the native geometry is therefore a serious liability to a virus: it can upset the biological timing of assembly or result in aberrant assembly that fails to package the correct viral contents (Stray et al., 2005).

2.4. Subunit interactions

By definition an icosahedron requires both hexameric and pentameric contacts to build a complete structure; contact domains must be capable of sustaining interactions in the necessarily different local geometries (Caspar, 1980). The assembly of virus capsids tends to be driven by the burial of hydrophobic surface area at the inter-subunit contact points (Bahadur et al., 2007); this predominance can be seen in the VIPER database of virus structures (Reddy et al., 2001). The burial of hydrophobic residues at the subunit interfaces is consistent with the observation that assembly (of at least those viruses that have been rigorously tested) is driven by an increase in system entropy (i.e., despite the ordering of the capsid subunits, the burial of the hydrophobic regions results in a more disordered solvent) (Ceres and Zlotnick, 2002; Prevelige et al., 1994). On average a single capsid inter-subunit contact buries some 1750 Å² of surface area; this is less than one would find in a typical homodimeric protein complex but significantly more than a simple crystal contact (Bahadur et al., 2007). This relatively small contact area argues for the formation of inherently unstable assembly intermediates that are stabilized upon completion of a network of otherwise relatively weak interactions (Bahadur et al., 2007; Ceres and Zlotnick, 2002).

The hepatitis B virus capsid is the system to which many of the thermodynamic models of assembly were first applied and will be used here as an example. The structure of this capsid has been solved to 3.3 Å (Wynne et al., 1999); HBV forms a T = 4 capsid based on hydrophobic contacts, with each asymmetric unit comprised of two copies of the homodimeric capsid protein (with a small population of 90-mer T = 3 capsids). Assembly is driven by increasing ionic strength, temperature, and/or capsid protein concentration. In the crystal structure, it was observed that the inter-subunit contacts were distinct and separate, facilitating an estimation of buried surface area and contact energy (Ceres and Zlotnick, 2002). On average a single HBV subunit-subunit contact buries 1500 Å².


3. Analysis of Capsid Stability

3.1. Macromolecular polymerization and classical nucleation theory

While the process of macromolecular polymerization has been studied extensively, the models for describing these infinite reactions fall short of describing virus capsid assembly. At the most basic level, models of assembly of macromolecular polymers are methods of describing a large population of discrete association reactions. Classical nucleation theory is built on the theoretical platform of an open-ended infinite polymer or lattice. Assembly begins with a nucleation event, and the polymer is then elongated through a series of faster and/or more stable additions of subunit, until an equilibrium between the polymer and the free subunits is reached (Frieden, 1985). As the ends of the polymer or lattice remain open, subunits can freely associate with and dissociate from the polymer, reflecting a critical concentration of subunit that must be present to maintain the polymer as a unique phase (Zlotnick, 2005). In the simplest case of classical polymerization, the initial rate of nucleus formation is determined by the rate of forward nucleation and the (assumed first-order) rate of nucleus dissociation and can be expressed as follows:

$$\frac{d[\text{nucleus}]}{dt} = k_{\text{nucleation}}[\text{subunit}]^{n} - k_{\text{dissoc}}[\text{nucleus}]. \tag{14.2}$$

Nuclei are the first species to form in the reaction and are subsequently depleted as they elongate into polymers. Once a nucleus has been formed, elongation can proceed at the ends of the molecule by the stepwise addition of equivalent subunits:

$$\frac{d[\text{ends}]}{dt} = k_{\text{assoc}}[\text{ends}][\text{subunit}] - k_{\text{dissoc}}[\text{ends}]. \tag{14.3}$$

When the polymerization reaction reaches steady state, the net growth of the molecule is 0; thus the critical concentration of free subunit remaining in solution can be expressed as:

$$\frac{[\text{ends}][\text{subunit}]}{[\text{ends}]} = \frac{k_{\text{dissoc}}}{k_{\text{assoc}}}. \tag{14.4}$$

In this classical view of macromolecular polymerization, a polymer grows from a single nucleation event, and at equilibrium the final polymer’s ends remain in a state of flux but achieve no net growth. A reaction initiated by a single nucleus gives rise to the sigmoidal kinetics (Fig. 14.2A), where there is a lag phase while the nucleus forms, followed by rapid addition of subunits, and ending with a flat steady-state region wherein the critical subunit concentration has been reached.
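The approach to the critical concentration in Eq. (14.4) can be illustrated numerically. The sketch below is an illustration rather than the authors' method: it treats the right-hand side of Eq. (14.3) as the net rate at which subunits add to ends, assumes a constant concentration of ends and a polymer long enough never to disappear, and uses forward-Euler integration. The function name and all rate constants and concentrations are hypothetical values chosen for illustration.

```python
def relax_free_subunit(subunit0, ends, k_assoc, k_dissoc, dt=1.0, steps=20000):
    """Integrate the free subunit concentration (M) for a fixed concentration of ends.

    Assumptions (not from the source): [ends] is constant, the polymer never
    runs out of subunits to shed, and dt is small enough for forward Euler.
    """
    subunit = subunit0
    for _ in range(steps):
        # Net growth at the ends, following the right-hand side of Eq. (14.3).
        net_growth = k_assoc * ends * subunit - k_dissoc * ends
        # Each subunit added to an end is removed from the free pool.
        subunit -= net_growth * dt
    return subunit


if __name__ == "__main__":
    k_assoc = 1.0e5   # M^-1 s^-1, hypothetical
    k_dissoc = 1.0    # s^-1, hypothetical
    critical = k_dissoc / k_assoc  # Eq. (14.4)
    for start in (5.0e-5, 0.0):
        final = relax_free_subunit(start, ends=1.0e-8, k_assoc=k_assoc, k_dissoc=k_dissoc)
        print(f"start {start:.1e} M -> final {final:.3e} M (critical {critical:.1e} M)")
```

Whether the free subunit pool starts above or below k_dissoc/k_assoc, it relaxes to that value in this model, which is the defining feature of a true critical concentration.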

Figure 14.2 Modeling Capsid Assembly. (A) Classical nucleation model (filament/crystal polymerization), wherein a single nucleus may give rise to a single infinite polymer. (B) Capsid assembly kinetics (capsid polymerization), wherein each discrete capsid must arise from an individual nucleus. Top: Typical light-scattering trace of an assembling capsid population. Bottom: Representation of the reaction rate continuum within the population (A and B, top reprinted from JMR 8(6):483 Copyright 2005 by Wiley Interscience).
[Plots not reproduced: assembly versus time in panels A and B; rates of nucleation, elongation, and capsid formation versus time in panel B, bottom.]

Superficially, capsid growth kinetics display a similar pattern (Fig. 14.2B, top), and the classical polymerization model has been applied to capsid growth (Zandi et al., 2006). However, we suggest that a very different reaction is taking place requiring a different mathematical treatment (Zlotnick, 1994). First, the very nature of assembly of a capsid population is distinct from the assembly of a single large polymer; capsids are discrete, individual units. Each individual capsid is the endpoint of a unique nucleus, rather than the entire population growing from a single point as with a classic polymer. Moreover, capsids do not exist in the same state of flux with open ends, but are closed, complete, and distinct structures. In a classical infinite polymer, all ends are equivalent, whereas the gradual completion of the capsid requires a progressive change in the average number of ligands per subunit, which engenders an apparent cooperativity of assembly. Last, nuclei and capsid intermediates continually form during the reaction, approximating a steady-state concentration of free subunits and intermediates through all but the initial stages of the reaction (Fig. 14.2B, bottom). These distinct differences in capsid polymerization as compared to the classical infinite polymerization define a specific formalism for capsid assembly.

3.2. Thermodynamic theory of capsid assembly

The capsid assembly reaction can be simply described as:

$$\text{Subunits} \rightleftharpoons \text{nuclei} + \text{subunits} \rightleftharpoons \text{capsids}. \tag{14.5}$$

This reaction appears very similar to the assembly of an infinite polymer. The assembly of an individual capsid initially follows the classical nucleation scheme, with a single nucleation event followed by the addition of subunits until the icosahedron is completed. However, within a population of assembling capsids, these reactions are concurrent but not synchronized and sum as shown in Fig. 14.2B (Zlotnick, 2005). With a classical polymer, the lag phase ends upon formation of the polymer nucleus. In capsid assembly, the apparent lag phase is characterized by the formation of many individual nuclei that then begin to elongate into progressively larger and more complete intermediates. The lag phase ends when an assembly line of intermediates is complete and capsids begin to accumulate. The elongation phase of a classical polymer is characterized by the addition of subunits to the single nucleus. However, in capsid assembly, the rapid growth phase is characterized by continuing and concurrent nucleation and elongation with the subsequent formation of capsids. The reaction gradually slows as the free subunit population is consumed, eventually plateauing when the free subunit concentration reaches the pseudocritical point.


The assembly of a population of capsids is thus characterized by concurrent reactions for nucleation, elongation, and capsid completion. The phases apparent in a light-scattering experiment (Fig. 14.2B) do not correspond to particular steps in the assembly reaction. All reaction steps have a certain sameness as the individual capsid assembly steps overlap through the reaction, with intermediates forming and then being consumed in the formation of progressively larger, more complete intermediates; all nuclei with one subunit added are used to form nuclei with two subunits added, and so forth, to a complete capsid. Thus, the rate of accumulation of an intermediate of a given size depends on the rates of formation from smaller intermediates and the rates of dissociation of subunits from larger ones, described as follows:

$$\frac{d[\text{nuc}+n]}{dt} = k_{\text{elong},n-1}[\text{nuc}+n-1][\text{subunit}] + k_{\text{dissoc},n+1}[\text{nuc}+n+1][\text{subunit}] - k_{\text{elong},n}[\text{nuc}+n][\text{subunit}] + (\text{other terms}). \tag{14.6}$$

As these are reversible reactions and may follow multiple paths, "other terms" encompasses all of the other association and dissociation reactions leading to and from an intermediate of size nucleus + n subunits. As the reaction proceeds, it approaches steady-state concentrations of intermediates. Ideally, assembly reactions gradually reach equilibrium with the final concentration of capsid dependent on stability, which for a capsid of N subunits can be described in terms of an overall association constant expressed as:

$$K_{\text{capsid}} = \frac{[\text{capsid}]}{[\text{subunit}]^{N}}. \tag{14.7a}$$

For practical reasons, Eq. (14.7a) is calculated in its logarithmic form:

$$\log(K_{\text{capsid}}) = \log[\text{capsid}] - N \log([\text{subunit}]). \tag{14.7b}$$

Kinetic trapping is a critical barrier to reaction equilibration. It can result (1) from conditions that cause excessively high association energies, which in turn can deplete the subunit pool without successfully forming complete capsids (Endres and Zlotnick, 2002); (2) when nucleation occurs so rapidly compared to elongation that there are many metastable intermediates even in the presence of free subunits (Stray et al., 2004); or (3) when off-path assembly leads to formation of metastable intermediates, intermediates that may even be icosahedral (Johnson et al., 2005; Tang et al., 2006).

Given equilibration, Eq. (14.7) demonstrates a steep Nth power dependence on concentration. This does not strictly limit the maximum concentration of free subunit (i.e., there is no true critical concentration), but in practical terms it generates a pseudocritical concentration. The value Kcapsid is a global capsid association constant and thus is very large, in units of M^(1−N), and impractical for calculations. From the value of Kcapsid, one can derive two terms that are much more convenient. The first is the apparent dissociation constant for a complete capsid, KD,apparent, which is approximately equal to the pseudocritical concentration and is defined as the concentration at which the concentration of subunit equals that of capsid, written as:

$$K_{D,\text{apparent}} = \left(K_{\text{capsid}}\right)^{1/(1-N)}. \tag{14.8}$$
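The exponent in Eq. (14.8) follows directly from the definition just given. As a short worked step, using only Eq. (14.7a) and the condition [capsid] = [subunit] = K_D,apparent:

```latex
\begin{aligned}
K_{\text{capsid}}
  &= \frac{[\text{capsid}]}{[\text{subunit}]^{N}}
   = \frac{K_{D,\text{apparent}}}{\left(K_{D,\text{apparent}}\right)^{N}}
   = \left(K_{D,\text{apparent}}\right)^{1-N} \\[4pt]
\Longrightarrow\quad
K_{D,\text{apparent}}
  &= \left(K_{\text{capsid}}\right)^{1/(1-N)} .
\end{aligned}
```

Because 1 − N is negative for any capsid, a very large Kcapsid corresponds to a small KD,apparent expressed in ordinary molar units.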

Alternatively, one can partition Kcapsid into the average association energy between two subunits; one now has in hand a small term that can be used to evaluate the stability of a single inter-subunit contact. If one assumes that all contacts in the virus capsid are identical, then Kcapsid can be expressed in terms of a statistical factor accounting for reaction degeneracy and some power of the single contact equilibrium constant, Kcontact. The number of contacts forming a capsid, the exponential term for Kcontact, is cN/2, where N is the number of subunits, c is the number of contact surfaces per subunit, and the factor of one-half accounts for the fact that a single subunit contributes only half of each pairwise interaction. The statistical factor accounting for j-fold degeneracy with N subunits is of the general form j^(N−1)/N. Thus, Kcontact can be expressed in terms of Kcapsid by the following equation:

$$K_{\text{capsid}} = \left(\frac{j^{\,N-1}}{N}\right)\left(K_{\text{contact}}\right)^{cN/2}. \tag{14.9}$$

The association energy per contact is:

$$\Delta G_{\text{contact}} = -RT \ln(K_{\text{contact}}). \tag{14.10}$$
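Equations (14.7) through (14.10) chain together into a short calculation. The following Python sketch is one way to carry it out, working in natural logarithms throughout because Kcapsid itself is far too large to represent as a floating-point number. The function name, the dictionary keys, and the example inputs are hypothetical: the concentrations are made-up numbers, and N = 120, c = 4, j = 2 are assumed here as an illustration of a capsid built from 120 tetravalent dimers, not values taken from this section.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def capsid_thermodynamics(capsid_molar, subunit_molar, n_subunits,
                          contacts_per_subunit, degeneracy, temp_kelvin=298.15):
    """Apply Eqs. (14.7)-(14.10) to equilibrium concentrations (both in M)."""
    # Eq. (14.7b), in natural-log form.
    ln_k_capsid = math.log(capsid_molar) - n_subunits * math.log(subunit_molar)
    # Eq. (14.8): K_D,apparent = K_capsid^(1/(1-N)).
    kd_apparent = math.exp(ln_k_capsid / (1 - n_subunits))
    # Eq. (14.9): remove the statistical factor j^(N-1)/N, then divide by cN/2 contacts.
    ln_statistical = (n_subunits - 1) * math.log(degeneracy) - math.log(n_subunits)
    n_contacts = contacts_per_subunit * n_subunits / 2
    ln_k_contact = (ln_k_capsid - ln_statistical) / n_contacts
    # Eq. (14.10).
    dg_contact = -R_KCAL * temp_kelvin * ln_k_contact
    return {
        "ln_K_capsid": ln_k_capsid,
        "KD_apparent_molar": kd_apparent,
        "K_contact": math.exp(ln_k_contact),
        "dG_contact_kcal_per_mol": dg_contact,
    }


if __name__ == "__main__":
    # Hypothetical equilibrium: 5 uM free dimer, 0.05 uM capsid.
    result = capsid_thermodynamics(5.0e-8, 5.0e-6, n_subunits=120,
                                   contacts_per_subunit=4, degeneracy=2)
    for name, value in result.items():
        print(f"{name}: {value:.4g}")
```

With these illustrative inputs the per-contact energy comes out at a few kcal/mol in magnitude, while ln(Kcapsid) is in the thousands, which is why the per-contact and KD,apparent forms are the convenient ones to report.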

Using an assembly-derived ΔGcontact, one can readily dissect the energy of association into its entropic and enthalpic components by van't Hoff analysis, illuminating the driving forces behind capsid assembly.
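A van't Hoff analysis of this kind amounts to a straight-line fit of ln(Kcontact) against 1/T. The sketch below shows the simplest version, assuming that ΔH and ΔS are independent of temperature (no heat capacity term), which may not hold for a real capsid; the function name and the synthetic data are hypothetical and serve only to show the mechanics.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def vant_hoff_fit(temps_kelvin, k_contacts):
    """Least-squares fit of ln K = -dH/(R T) + dS/R.

    Returns (dH in kcal/mol, dS in kcal/(mol K)), assuming both are
    temperature independent over the range of the data.
    """
    x = [1.0 / t for t in temps_kelvin]
    y = [math.log(k) for k in k_contacts]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals -dH/R
    intercept = y_mean - slope * x_mean    # equals dS/R
    return -slope * R_KCAL, intercept * R_KCAL


if __name__ == "__main__":
    # Synthetic, hypothetical data generated from dH = +4 kcal/mol, dS = +25 cal/(mol K).
    true_dh, true_ds = 4.0, 0.025
    temps = [283.15, 298.15, 310.15]
    ks = [math.exp(-(true_dh - t * true_ds) / (R_KCAL * t)) for t in temps]
    dh, ds = vant_hoff_fit(temps, ks)
    print(f"dH = {dh:.3f} kcal/mol, dS = {ds * 1000:.2f} cal/(mol K)")
```

A positive ΔH combined with a positive ΔS, as in this made-up data set, is the signature of an entropy-driven association: Kcontact increases with temperature.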

3.3. Methods of analysis

The thermodynamic parameters of capsid assembly can be readily determined from the concentrations of free subunits and assembled capsids in an equilibrated assembly reaction. However, it is necessary to verify that the reaction has in fact reached equilibrium. The ideal demonstration of equilibration would be a comparison of assembly to disassembly to show that both reactions reach the same end point. Unfortunately, this generally does not work due to the hysteresis to dissociation, which is believed to derive from the difficulty of removing the first few subunits from the capsid (Singh and Zlotnick, 2003) as well as depletion of intermediates (Bruinsma et al., 2003). A time-course measurement of the amount of reactants and products can give a strong indication that a reaction has reached steady state (e.g., Fig. 14.4B). It is also necessary to ensure that the reaction is not kinetically trapped. Kinetic traps may be present when there is sufficiently high association energy such that very little free subunit remains in solution, when there are many intermediates but few capsids, or when there is off-path assembly. These last two possibilities can still be analyzed by treating the concentration of free subunit as a critical concentration (Bourne et al., 2008).

Perhaps the most obvious and simple method for evaluating assembly reactions is size-exclusion chromatography (SEC) (Ackers, 1970). Provided that capsids do not dissociate during chromatography, an equilibrated assembly reaction can be separated into reactants and products (Fig. 14.3A, inset, and Fig. 14.4A). Once the equilibrated reaction components have been separated, the integrated areas of the chromatogram peaks are proportional to the total protein concentration that is assembled or free in solution (Ceres and Zlotnick, 2002; Mukherjee et al.; Zlotnick et al., 2000). SEC fractionation followed by quantitative gel densitometry was used to quantify the assembly products of multicompetent icosahedral virus-like particles of bacteriophage P22 (Parent et al., 2006b). After assembly, reactions were fractionated, and the concentrations of free subunits and scaffold were quantified by densitometry of an SDS polyacrylamide gel. In the case of phages like P22, the role of the scaffold protein is an added concern for evaluation.

Differential scanning calorimetry (DSC), only rarely applied to viruses, was used very successfully with bacteriophage HK97.
The energy required to dissociate a capsid can be directly measured. In the case of HK97, DSC was also used to monitor the stability of the intermediate forms of the capsid head found in the maturation process, allowing further quantification of energetics of capsid function that tend to be otherwise experimentally inaccessible (Conway et al., 2007). An advantage of this method is that the addition of heat can allow one to overcome the barriers of hysteresis to dissociation, allowing for measurement of both assembly and disassembly under similar conditions and without the use of denaturants. However, difficulties can arise when there is little separation of the transitions for disassembly and unfolding of the subunits themselves, as the two reactions can occur simultaneously (Galisteo and King, 1993; Wingfield et al., 1995). An added complication, particularly evident with phages, is that maturation may result in irreversible conformational changes that will skew the determination of capsid stability.

Another classic method of measuring protein stability, denaturant titration, is also fraught with these limitations when quantifying capsid stability. Titrations of HBV capsid with GuHCl or urea did show a two-phase transition, demonstrably the separate dissociation and unfolding reactions (Singh and Zlotnick, 2003). However, the denaturant titrations returned contact energy values roughly twice those determined from assembly studies. This and the change of contact energy with subunit concentration indicated that denaturant titration did not effect a true equilibrium between capsids and subunits, but merely demonstrated the hysteresis of disassembly that prevented accurate measurement of subunit contact energies (Singh and Zlotnick, 2003).

It should be noted that indirect signals have also been correlated with assembly and can be useful tools for examining these reactions. The intrinsic fluorescence and/or anisotropy of tryptophan and/or tyrosine may change during assembly, correlating with changes in the burial and/or mobility of these residues (Da Poian et al., 1993; Prevelige et al., 1994; Silva and Weber, 1988; Singh and Zlotnick, 2003). A recently described assay used proteins labeled with a self-quenching fluorophore, BODIPY-FL, so that assembly correlated with the loss of fluorescence, thus generating a high-throughput screen for assembly-directed antivirals (Zlotnick et al., 2007).

Figure 14.3 The Mechanism of Assembly of CCMV. (A) Assembly induced by acidification illustrates the critical concentration of protein required for assembly, as quantified by SEC (inset). (B) Assembly induced by acidification of the CCMV NΔ34 mutant capsid protein shows kinetic trapping and failure of the reaction to reach equilibrium (Reprinted from Nano Letters, 5(4):766, copyright 2005 by American Chemical Society). (C) Titration of CCMV RNA1 with CCMV CP showing weak cooperativity of assembly (Panels A and B reprinted from JMB 335:456, copyright 2004 from Elsevier).

Figure 14.4 Quantifying the thermodynamics of HBV capsid assembly. (A) SEC of a typical assembly reaction of 15 μM HBV Cp, 0.3 M NaCl (Reprinted with permission from Biochemistry 41(39):11527 copyright 2002 by American Chemical Society). (B) Comparison of assembly kinetics of the wild-type capsid protein to the F97L mutation, which displays more rapid assembly into more stable capsids as compared to wild-type (Reprinted from J. Virology 78:9540 copyright 2004 by American Society for Microbiology).

4. Applications of Thermodynamic Evaluation of Virus Capsid Stability

With the basic framework outlined, one can apply the described methods to a wide array of viruses. The results have shed light on certain aspects of virus capsid behavior that had previously been experimentally inaccessible.


4.1. Cowpea chlorotic mottle virus

Cowpea chlorotic mottle virus (CCMV) is a nonenveloped RNA virus that forms T = 3 capsids composed of 90 homodimeric subunits. In vitro, the purified capsid proteins can be induced to form a variety of empty polymorphic structures by altering the chemical environment, specifically pH and ionic strength (Bancroft et al., 1967). This has made it a popular system for the study of icosahedral capsid assembly. Early studies of CCMV assembly have been expanded and quantified. Purified dimer was induced to assemble by lowering pH with sodium acetate. Equilibrated reactions were then separated by SEC, and capsid and dimer were quantified (Fig. 14.3A, inset). Consistent with predictions, at very low concentrations capsids did not assemble. With increasing total protein concentration, capsid was observed and free dimer reached a pseudocritical concentration of approximately 6 μM (Fig. 14.3A). Reactions at concentrations near the pseudocritical concentration tend to be slower to equilibrate and thus can result in less accurate measurement of Kcontact. From these SEC data, it was determined that the average pairwise association energy between subunits was very weak, only 3.1 kcal/mol at pH 5.25, 3.4 kcal/mol at pH 5.0, and 3.7 kcal/mol at pH 4.75. Nonetheless, in acidic solutions, capsids were globally stable. These data can relate the free energy of the inter-subunit contacts to the biology of the virus. Under physiological conditions, the protein-protein interactions are very weak, and interactions between protein and nucleic acid are relatively strong (Johnson et al., 2004; Mukherjee et al., 2006). Thus, interaction with viral RNA is expected to provide most of the support for assembly, and the cooperativity resulting from protein-protein interaction is expected to be weak. This was borne out by an electrophoretic mobility shift assay, where CCMV RNA was titrated with increasing amounts of capsid protein (Fig. 14.3C) (Johnson et al., 2004). Increasing concentrations of capsid protein resulted in a gradual shift of the RNA band. Strong protein-protein interactions were predicted to lead to a highly cooperative assembly, resulting in capsids and bare RNA. However, intermediate concentrations of protein resulted in uniform intermediate migration of the RNA on the gel, and half-capsids were visible in micrographs.

Studies with CCMV also demonstrated the nature of kinetic traps (Johnson et al., 2005). While the complete capsid protein exclusively forms T = 3 particles, a mutant lacking the N-terminal 34 amino acid residues yields a mixed population of T = 1 (1.1 MDa), pseudo-T = 2 (2.0 MDa), and T = 3 capsids that can be observed by the uneven capsid peak (Fig. 14.3B, inset). It was observed that the reaction never reached equilibrium, even after five days of incubation. There was clearly no critical concentration, as seen by the absence of a plateau in free subunit concentration (Fig. 14.3B). It was proposed that the loss of this


Sarah Katen and Adam Zlotnick

N-terminal domain results in a greater freedom of motion of the angles between the capsid subunits and a loss of control of inter-subunit geometry. Interestingly, the pseudo-T = 2 particles are extremely heterogeneous, suggesting a broad and poorly defined energy minimum (Tang et al., 2006).
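To give a feel for how weak the CCMV contact energies quoted above are, the following minimal sketch converts a per-contact free energy into a per-contact dissociation constant through the standard relation ΔG = RT ln Kd. The temperature (298.15 K), the negative sign convention for favorable association, and the function name `contact_kd` are assumptions made for illustration; they are not taken from the original analysis, and the result is not the pseudocritical concentration of the capsid.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)

def contact_kd(delta_g_kcal, temp_k=298.15):
    """Dissociation constant (M) of a single contact.

    delta_g_kcal is the association free energy (negative = favorable).
    Uses Kd = exp(dG / RT); the temperature is an assumed value.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))

# Magnitudes quoted in the text, entered here as favorable (negative) energies.
for ph, dg in [(5.25, -3.1), (5.0, -3.4), (4.75, -3.7)]:
    kd = contact_kd(dg)
    print(f"pH {ph}: dG = {dg} kcal/mol -> Kd per contact of about {kd * 1e3:.1f} mM")
```

Under these assumptions each isolated contact would have a millimolar dissociation constant, which is why the stability of the capsid has to come from the network of contacts rather than from any single one.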

4.2. Hepatitis B virus

The hepatitis B virus (HBV) is the model system on which the bulk of our quantitative thermodynamic analyses have been performed. HBV is an enveloped DNA virus with an alpha-helical capsid protein that forms a T = 4 capsid (Ganem and Prince, 2004). In vitro, a truncated form of the capsid protein consisting of only the assembly domain (residues 1–149) can be assembled into complete capsids of 120 homodimeric subunits. The rates and extents of reactions depend on ionic strength, capsid subunit concentration, and reaction temperature. Amounts of reactants and products for an HBV assembly reaction were usually quantified by SEC (Fig. 14.4A), though light scattering and fluorescence were also used semiquantitatively (Bourne et al., 2008). A time course showed a reaction that was closely approaching equilibrium within 24 h (Fig. 14.4B). Results from assembly and van’t Hoff analyses show that assembly is entropy driven and heavily compensated, consistent with burial of hydrophobic surface (Table 14.1) (Ceres and Zlotnick, 2002). These results demonstrated a surprising disagreement between experimental results and predictions based on structure. Structure-based estimation of association energy depends on buried surface area. The average value published on VIPER is 15 kcal/mol for each subunit of the capsid. A similar approach with different parameterization (Baker and Murphy, 1998) yielded values of about 9 kcal/mol. The calculated association energies suggest extraordinarily stable capsids. However, Table 14.1 shows that the experimentally determined average value for this contact energy is between 3 and 4 kcal/mol, values at which both dimer and capsid concentrations are easily measured. Where does the error in calculated association energy come from? It may be simply due to an incorrect parameterization in energy calculations; as described by Bahadur et al. (2007), virus contacts are not as tightly packed as a typical dimerization contact. 
Also, the experimental values may reflect the energetic cost of conformational change from a hypothetical solution structure to the observed capsid structure (Horton and Lewis, 1992). The inter-subunit contact energies determined experimentally are so weak that, if taken on an individual basis, one would doubt their specificity. These results have led us to suggest that a virus is a dynamic structure, with the capsid stability maintained by its geometry and the sum of its icosahedral network of many weak individual contacts (Zlotnick, 2003). The physiological basis of evolutionary selection in viruses can also have a thermodynamic basis. A common naturally occurring mutant of HBV, F97L, resulted in increased virion production and secretion of immature virus

Table 14.1 Thermodynamics of HBV assembly: Experimental and calculated values (a)

                          [NaCl], M
Energy per contact        0.15          0.3           0.5           0.7           Calculated
ΔG (kcal/mol)             −3.1 ± 0.1    −3.7 ± 0.2    −3.7 ± 0.2    −4.0 ± 0.2    −8.5 ± 0.8
ΔH (kcal/mol)             +2.0 ± 1.0    +4.3 ± 0.4    +6.1 ± 0.8    +6.2 ± 0.2    +9.4 ± 2.6
TΔS (kcal/mol)            +5.1 ± 1.1    +8.0 ± 0.6    +9.8 ± 1.0    +10.1 ± 0.4   +18 ± 3.7
ΔS [cal/(mol deg)]        +17 ± 3.7     +27 ± 2.0     +33 ± 3.3     +34 ± 1.3     +60 ± 12
KD,apparent               14 μM         1.9 μM        1.8 μM        0.77 μM       0.14 pM

(a) Data reported from Ceres and Zlotnick (2002).


particles in vivo. Quantitative analysis showed that this conservative mutation led to greater enthalpy and compensating entropy per contact, resulting in a substantially more negative ΔGcontact at 37 °C than wild-type (Fig. 14.4B). This result indicates a selective advantage for this mutation in chronic infections where viral protein production is attenuated (Ceres et al., 2004).

Thermodynamic analysis can contribute to our understanding of the mechanism of putative antiviral drugs that target the assembly pathway. The heteroaryldihydropyrimidines (HAPs) are a class of compounds that decrease viral titer in cell culture (Deres et al., 2003). SEC showed that HAPs drove assembly but resulted in the formation of aberrant products larger than normal capsids. Transmission electron microscopy showed that reaction products were large polymers dominated by hexagonal sheets of capsid protein. The association energy between subunits is increased in the presence of HAPs by as much as 2 kcal/mol (Bourne et al., 2008). Crystallographic studies of HBV capsids complexed with a HAP showed the small molecule filling a gap in the interdimer contact, which led to a global alteration of quaternary structure (but not tertiary structure) that flattened quasi-sixfold vertices and caused fivefold vertices to protrude (Bourne et al., 2006). The thermodynamic effect of the HAPs is expected (and observed) to affect assembly kinetics as well as the extent of assembly. Combined with their assembly-misdirecting effect, the HAPs may represent a new mechanism for antiviral therapeutics.
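The entries of Table 14.1 can be cross-checked against the relation ΔG = ΔH − TΔS. The sketch below is only a consistency check on the tabulated numbers: ΔG is entered as negative (favorable association), which is an assumption that the arithmetic itself supports, and the temperature implied by each column is recovered as TΔS/ΔS. The names `TABLE_14_1` and `check_column` are hypothetical helpers introduced for this illustration.

```python
# Columns of Table 14.1: (dG, dH, TdS) in kcal/mol and dS in cal/(mol deg).
# dG is entered as negative (favorable); this sign is an assumption checked below.
TABLE_14_1 = {
    "0.15 M NaCl": (-3.1, 2.0, 5.1, 17.0),
    "0.3 M NaCl": (-3.7, 4.3, 8.0, 27.0),
    "0.5 M NaCl": (-3.7, 6.1, 9.8, 33.0),
    "0.7 M NaCl": (-4.0, 6.2, 10.1, 34.0),
    "Calculated": (-8.5, 9.4, 18.0, 60.0),
}

def check_column(dg, dh, tds, ds_cal):
    """Return (dH - TdS, implied temperature in K) for one table column."""
    dg_from_parts = dh - tds                 # kcal/mol
    implied_temp_k = 1000.0 * tds / ds_cal   # (cal/mol) / (cal/(mol deg))
    return dg_from_parts, implied_temp_k

for label, (dg, dh, tds, ds_cal) in TABLE_14_1.items():
    dg_calc, temp_k = check_column(dg, dh, tds, ds_cal)
    print(f"{label}: tabulated dG = {dg:+.1f}, dH - TdS = {dg_calc:+.1f} kcal/mol, "
          f"implied T = {temp_k:.0f} K")
```

Every column reproduces the tabulated ΔG to within rounding, and the implied temperatures cluster near room temperature, which is consistent with assembly being driven by the entropy term while the enthalpy term opposes it.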

4.3. Human papillomavirus

Papillomaviruses form icosahedral arrays of pentameric subunits in a T = 7 lattice, with pentamers occupying both the penta- and hexavalent environments. The C-terminal arms of the subunits interlock with adjacent pentamers and also participate in formation of disulfide crosslinks in the assembled capsid. Because it is not feasible to parse out the individual pairwise interactions, the association energy was reported per pentamer, which closely corresponds to the KD,apparent (Mukherjee et al., 2008). This average association energy per pentamer was between 8 and 10 kcal/mol, a very narrow range, but in agreement with the weakly associated capsid subunit model. Critical to this analysis, reversible assembly was induced under reducing conditions, indicating that formation of the disulfide crosslinks was not a prerequisite to assembly.

4.4. Bacteriophages

Bacteriophage P22 is a tailed phage, but empty icosahedral capsids can be assembled in vitro from coat and scaffold protein. These capsids comprise 420 coat protein molecules that assemble on a scaffold of between 100 and 300 scaffold protein molecules; these are a form of assembly chaperone and are unnecessary for the structure of the final assembled capsid (King et al., 1976).
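The subunit counts quoted for the capsids in this section follow from the quasi-equivalence relations of Caspar and Klug (1962): a lattice with triangulation number T = h² + hk + k² holds 60T protein chains arranged on 10T + 2 capsomer positions. The short sketch below tabulates these counts; the function names are hypothetical, and the all-pentamer count for papillomavirus is the one exception handled separately because pentamers there occupy the hexavalent positions as well.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number for lattice indices h and k."""
    return h * h + h * k + k * k

def capsid_counts(t):
    """Chain, dimer, and capsomer-position counts for a T-number lattice."""
    return {
        "proteins": 60 * t,        # one chain per quasi-equivalent position
        "dimers": 30 * t,          # if the building block is a homodimer
        "capsomers": 10 * t + 2,   # 12 pentavalent + 10(T - 1) hexavalent positions
    }

examples = {"CCMV": (1, 1), "HBV": (2, 0), "P22 or HK97": (2, 1)}
for name, (h, k) in examples.items():
    t = triangulation_number(h, k)
    print(name, "T =", t, capsid_counts(t))

# Papillomavirus: pentamers fill every capsomer position of a T = 7 lattice.
hpv_pentamers = capsid_counts(7)["capsomers"]
print("HPV pentamers:", hpv_pentamers, "chains:", 5 * hpv_pentamers)
```

The counts agree with the text: 90 dimers for CCMV (T = 3), 120 dimers for HBV (T = 4), and 420 coat proteins for P22 (T = 7).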


Prevelige et al. observed a pseudocritical concentration for assembly, but analysis is complicated by the energetic contributions from the scaffold protein, which binds nonstoichiometrically to the coat. The thermodynamic contributions of these scaffold proteins were examined by SEC and gel densitometry, which were used to quantify reactants and products (Parent et al., 2006b). The observed exchange of free and bound coat protein in P22 capsids uniquely demonstrated that the assembly reaction is in equilibrium (Parent et al., 2006a). From these analyses it was determined that both the coat and scaffolding proteins contribute relatively weak association energies, but the network of all contributions results in a globally stable capsid, with each capsid protein subunit contributing 7.2 kcal/mol and each scaffold protein contributing 6.1 kcal/mol. These values are per protein, rather than per contact, because of ambiguity in how to dissect protein-protein interactions. As with HPV, these per-protein energies roughly correspond to the KD,apparent. At lower scaffold-to-capsid protein ratios, a small increase in the amount of scaffold present dramatically increased the number of capsids formed. However, at high ratios, scaffold concentration increases resulted in a markedly lower yield of capsids due to overnucleation (Parent et al., 2006b).

The thermodynamics of bacteriophage HK97 were analyzed not by assembly but through disassembly and denaturation by DSC, allowing for comparison between the two methods. This contrast in methods is particularly informative as the answers are surprisingly similar. In vitro assembled bacteriophage HK97 capsids are T = 7 icosahedra consisting of capsid subunits composed of fused assembly and scaffold domains as well as a viral protease. 
After assembly of a functional prohead from a mix of pentamers and hexamers, the protease cleaves the scaffold domains, and the capsid undergoes a structural change upon DNA packaging, expanding via multiple intermediates into a final mature head, a process that can be mimicked in vitro by acidification (Duda et al., 1995). Maturation is characterized first by the cleavage event, then by a series of reorganizations of the capsid subunits, and finally by the formation of covalent crosslinks (Ross et al., 2006). DSC was used to examine the stability of each of the different prohead states and to determine the thermodynamic contributions of the different assembly proteins. From the peak temperatures of each denaturation isotherm, it is apparent that each subsequent structure in the maturation process has a greater thermal stability. The contribution of the cross-linking step to the stability of the mature head was investigated using a K169Y mutant, which was unable to cross-link. This mutant had the same stability as the wild-type for all steps prior to cross-linking, but significantly lower stability after that maturation step due to the loss of the cross-links. The thermodynamic parameters of the virus structures were determined from the peak areas of the thermograms. The contact energy for Prohead I, the


first assembly product, is around 2 kcal/mol per pairwise interaction, in excellent agreement with the weak association model for virus assembly. Each subsequent structure in the maturation process had a lower free energy with a high barrier between steps; thus, the lowered energies serve to lock the capsid in each subsequent conformation (Ross et al., 2006).

5. Concluding Remarks

The thermodynamic stability of a virus capsid is the underlying means by which the capsid fulfills its roles in the virus; understanding the mechanisms of stability provides insight into mechanisms of capsid function in the virus life cycle, as well as a means for the design of new antiviral drugs. The WIN family of compounds binds poliovirus post-assembly and entropically stabilizes it, preventing genome release and infection (Tsang et al., 2000). More recent efforts focus on antivirals that block viral replication by affecting the stability of the capsid prior to and during assembly. Challenges still remain in completely and accurately describing virus assembly. Current methods tend towards treating all subunits as equal, but quasi-equivalence makes it clear that not all subunits are equal. Moreover, new analyses and techniques must be devised to analyze the energetics of assembly of the multicomponent virus and phage capsids. Nonetheless, we have today a solid framework for the analysis of virus capsid stability.

REFERENCES Ackers, G. K. (1970). Analytical gel chromatography of proteins. Adv. Protein. Chem. 24, 343–446. Bahadur, R. P., Rodier, F., and Janin, J. (2007). A dissection of the protein-protein interfaces in icosahedral virus capsids. J. Mol. Biol. 367, 574–590. Baker, B. M., and Murphy, K. P. (1998). Prediction of binding energetics from structure using empirical parameterization. Methods Enzymol. 295, 294–315. Baker, T. S., and Caspar, D. L. (1984). Computer image modeling of pentamer packing in polyoma virus ‘‘hexamer’’ tubes. Ultramicroscopy 13, 137–151. Baker, T. S., Drak, J., and Bina, M. (1989). The capsid of small papova viruses contains 72 pentameric capsomeres: Direct evidence from cryo-electron-microscopy of simian virus 40. Biophys. J. 55, 243–253. Bancroft, J. B., Hills, G. J., and Markham, R. (1967). A study of the self-assembly process in a small spherical virus. Formation of organized structures from protein subunits in vitro. Virology 31, 354–379. Bothner, B., Dong, X. F., Bibbs, L., Johnson, J. E., and Siuzdak, G. (1998). Evidence of viral capsid dynamics using limited proteolysis and mass spectrometry. J. Biol. Chem. 273, 673–676. Bourne, C., Finn, M. G., and Zlotnick, A. (2006). Global structural changes in hepatitis B capsids induced by the assembly effector HAP1. J. Virol. 80, 11055–11061.


Bourne, C., Lee, S., Venkataiah, B., Lee, A., Korba, B., Finn, M. G., and Zlotnick, A. (2008). Small-molecule effectors of hepatitis B virus capsid assembly give insight into virus life cycle. J. Virol. 82, published electronically. Bruinsma, R. F., Gelbart, W. M., Reguera, D., Rudnick, J., and Zandi, R. (2003). Viral self-assembly as a thermodynamic process. Phys. Rev. Lett. 90, 248101. Caspar, D. L. (1980). Movement and self-control in protein assemblies. Quasi-equivalence revisited. Biophys. J. 32, 103–138. Caspar, D. L. D. (1956). Structure of tomato bushy stunt virus. Nature 177, 476–477. Caspar, D. L. D., and Klug, A. (1962). Physical principles in the construction of regular viruses. Cold Spring Harbor Symp. Quant. Biol. 27, 1–24. Ceres, P., Stray, S. J., and Zlotnick, A. (2004). Hepatitis B virus capsid assembly is enhanced by naturally occurring mutation F97L. J. Virol. 78, 9538–9543. Ceres, P., and Zlotnick, A. (2002). Weak protein-protein interactions are sufficient to drive assembly of hepatitis B virus capsids. Biochemistry 41, 11525–11531. Conway, J. F., Cheng, N., Ross, P. D., Hendrix, R. W., Duda, R. L., and Steven, A. C. (2007). A thermally induced phase transition in a viral capsid transforms the hexamers, leaving the pentamers unchanged. J. Struct. Biol. 158, 224–232. Crick, F. H. C., and Watson, J. D. (1956). The structure of small viruses. Nature 177, 473–475. Da Poian, A. T., Oliveira, A. C., Gaspar, L. P., Silva, J. L., and Weber, G. (1993). Reversible pressure dissociation of R17 bacteriophage. The physical individuality of virus particles. J. Mol. Biol. 231, 999–1008. Deres, K., Schroder, C. H., Paessens, A., Goldmann, S., Hacker, H. J., Weber, O., Kramer, T., Niewohner, U., Pleiss, U., Stoltefuss, J., Graef, E., Koletzki, D., et al. (2003). Inhibition of hepatitis B virus replication by drug-induced depletion of nucleocapsids. Science 299, 893–896. Dong, X. F., Natarajan, P., Tihova, M., Johnson, J. E., and Schneemann, A. (1998). 
Particle polymorphism caused by deletion of a peptide molecular switch in a quasiequivalent icosahedral virus. J. Virol. 72, 6024–6033. Duda, R. L., Hempel, J., Michel, H., Shabanowitz, J., Hunt, D., and Hendrix, R. W. (1995). Structural transitions during bacteriophage HK97 head assembly. J. Mol. Biol. 247, 618–635. Endres, D., and Zlotnick, A. (2002). Model-based analysis of assembly kinetics for virus capsids or other spherical polymers. Biophys. J. 83, 1217–1230. Fane, B. A., and Prevelige, P. E., Jr. (2003). Mechanism of scaffolding-assisted viral assembly. In ‘‘Virus Structure’’ (W. Chiu and E. Johnson, eds.), Vol. 64, pp. 259–299. Academic Press, San Diego. Frieden, C. (1985). Actin and tubulin polymerization: The use of kinetic methods to determine mechanism. Annu. Rev. Biophys. Chem. 14, 189–210. Galisteo, M. L., and King, J. (1993). Conformational transformations in the protein lattice of phage P22 procapsids. Biophys. J. 65, 227–235. Ganem, D., and Prince, A. M. (2004). Hepatitis B virus infection: Natural history and clinical consequences. N. Engl. J. Med. 350, 1118–1129. Ganser, B. K., Li, S., Klishko, V. Y., Finch, J. T., and Sundquist, W. I. (1999). Assembly and analysis of conical models for the HIV-1 core. Science 283, 80–83. Golmohammadi, R., Valegard, K., Fridborg, K., and Liljas, L. (1993). The refined structure of bacteriophage MS2 at 2.8 A resolution. J. Mol. Biol. 234, 620–639. Hagan, M. F., and Chandler, D. (2006). Dynamic pathways for viral capsid assembly. Biophys. J. 91, 42–54. Hilmer, J. K., Zlotnick, A., and Bothner, B. (2008). Conformational equilibria and rates of localized motion within hepatitis B virus capsids. J. Mol. Biol. 375, 581–594.


Horton, N., and Lewis, M. (1992). Calculation of the free energy of association for protein complexes. Protein Sci. 1, 169–181. Jiang, H., and Lin, W. B. (2003). Self-assembly of chiral molecular polygons. J. Am. Chem. Soc. 125, 8084–8085. Johnson, J. M., Tang, J., Nyame, Y., Willits, D., Young, M. J., and Zlotnick, A. (2005). Regulating self-assembly of spherical oligomers. Nano Lett. 5, 765–770. Johnson, J. M., Willits, D., Young, M. J., and Zlotnick, A. (2004). Interaction with capsid protein alters RNA structure and the pathway for in vitro assembly of cowpea chlorotic mottle virus. J. Mol. Biol. 335, 455–464. King, J., Botstein, D., Casjens, S., Earnshaw, W., Harrison, S., and Lenk, E. (1976). Structure and assembly of the capsid of bacteriophage P22. Philos. Trans. R. Soc. Lond. B Biol. Sci. 276, 37–49. Lewis, J. K., Bothner, B., Smith, T. J., and Siuzdak, G. (1998). Antiviral agent blocks breathing of the common cold virus. Proc. Natl. Acad. Sci. USA 95, 6774–6778. Maxwell, K. L., Yee, A. A., Arrowsmith, C. H., Gold, M., and Davidson, A. R. (2002). The solution structure of the bacteriophage lambda head-tail joining protein, gpFII. J. Mol. Biol. 318, 1395–1404. Mukherjee, S., Pfeifer, C. M., Johnson, J. M., Liu, J., and Zlotnick, A. (2006). Redirecting the coat protein of a spherical virus to assemble into tubular nanostructures. J. Am. Chem. Soc. 128, 2538–2539. Mukherjee, S., Thorsteinsson, M. V., Johnston, L. B., DePhillips, P., and Zlotnick, A. (2008). A quantitative description of in vitro assembly of human papillomavirus 16 viruslike particles. J. Mol. Biol. 381, 229–237. Nguyen, H. D., Reddy, V. S., and Brooks, C. L., 3rd. (2007). Deciphering the kinetic mechanism of spontaneous self-assembly of icosahedral capsids. Nano Lett. 7, 338–344. Parent, K. N., Suhanovsky, M. M., and Teschke, C. M. (2006a). Phage P22 procapsids equilibrate with free coat protein subunits. J. Mol. Biol. 365, 513–522. Parent, K. N., Zlotnick, A., and Teschke, C. M. (2006b). 
Quantitative analysis of multicomponent spherical virus assembly: Scaffolding protein contributes to the global stability of phage P22 procapsids. J. Mol. Biol. 359, 1097–1106. Prevelige, P. E., Jr., King, J., and Silva, J. L. (1994). Pressure denaturation of the bacteriophage P22 coat protein and its entropic stabilization in icosahedral shells. Biophys. J. 66, 1631–1641. Rapaport, D. C. (2004). Self-assembly of polyhedral shells: A molecular dynamics study. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 70, 051905. Reddy, V. S., Natarajan, P., Okerberg, B., Li, K., Damodaran, K. V., Morton, R. T., Brooks, C. L., III, and Johnson, J. E. (2001). Virus Particle Explorer (VIPER), a website for virus capsid structures and their computational analyses. J. Virol. 75, 11943–11947. Ross, P. D., Conway, J. F., Cheng, N., Dierkes, L., Firek, B. A., Hendrix, R. W., Steven, A. C., and Duda, R. L. (2006). A free energy cascade with locks drives assembly and maturation of bacteriophage HK97 capsid. J. Mol. Biol. 364, 512–525. Rossmann, M. G., and Johnson, J. E. (1989). Icosahedral RNA virus structure. Annu. Rev. Biochem. 58, 533–573. Silva, J. L., and Weber, G. (1988). Pressure-induced dissociation of brome mosaic virus. J. Mol. Biol. 199, 149–159. Singh, S., and Zlotnick, A. (2003). Observed hysteresis of virus capsid disassembly is implicit in kinetic models of assembly. J. Biol. Chem. 278, 18249–18255. Stray, S. J., Bourne, C. R., Punna, S., Lewis, W. G., Finn, M. G., and Zlotnick, A. (2005). A heteroaryldihydropyrimidine activates and can misdirect hepatitis B virus capsid assembly. Proc. Natl. Acad. Sci. USA 102, 8138–8143. Stray, S. J., Ceres, P., and Zlotnick, A. (2004). Zinc ions trigger conformational change and oligomerization of hepatitis B virus capsid protein. Biochemistry 43, 9989–9998.


Tang, J., Johnson, J. M., Dryden, K. A., Young, M. J., Zlotnick, A., and Johnson, J. E. (2006). The role of subunit hinges and molecular ‘‘switches’’ in the control of viral capsid polymorphism. J. Struct. Biol. 154, 59–67. Tsang, S. K., Danthi, P., Chow, M., and Hogle, J. M. (2000). Stabilization of poliovirus by capsid-binding antiviral drugs is due to entropic effects. J. Mol. Biol. 296, 335–340. Wikoff, W. R., Duda, R. L., Hendrix, R. W., and Johnson, J. E. (1998). Crystallization and preliminary X-ray analysis of the dsDNA bacteriophage HK97 mature empty capsid. Virology 243, 113–118. Wingfield, P. T., Stahl, S. J., Williams, R. W., and Steven, A. C. (1995). Hepatitis core antigen produced in Escherichia coli: Subunit composition, conformational analysis, and in vitro capsid assembly. Biochemistry 34, 4919–4932. Wright, E. R., Schooler, J. B., Ding, H. J., Kieffer, C., Fillmore, C., Sundquist, W. I., and Jensen, G. J. (2007). Electron cryotomography of immature HIV-1 virions reveals the structure of the CA and SP1 Gag shells. EMBO J. 26, 2218–2226. Wynne, S. A., Crowther, R. A., and Leslie, A. G. (1999). The crystal structure of the human hepatitis B virus capsid. Mol. Cell 3, 771–780. Zandi, R., van der Schoot, P., Reguera, D., Kegel, W., and Reiss, H. (2006). Classical nucleation theory of virus capsids. Biophys. J. 90, 1939–1948. Zhang, T., and Schwartz, R. (2006). Simulation study of the contribution of oligomer/ oligomer binding to capsid assembly kinetics. Biophys. J. 90, 57–64. Zlotnick, A. (1994). To build a virus capsid. An equilibrium model of the self assembly of polyhedral protein complexes. J. Mol. Biol. 241, 59–67. Zlotnick, A. (2003). Are weak protein-protein interactions the general rule in capsid assembly? Virology 315, 269–274. Zlotnick, A. (2004). Viruses and the physics of soft condensed matter. Proc. Natl. Acad. Sci. USA 101, 15549–15550. Zlotnick, A. (2005). Theoretical aspects of virus capsid assembly. J. Mol. Recognit. 18, 479–490. 
Zlotnick, A., Aldrich, R., Johnson, J. M., Ceres, P., and Young, M. J. (2000). Mechanism of capsid assembly for an icosahedral plant virus. Virology 277, 450–456. Zlotnick, A., Johnson, J. M., Wingfield, P. W., Stahl, S. J., and Endres, D. (1999). A theoretical model successfully identifies features of hepatitis B virus capsid assembly. Biochemistry 38, 14644–14652. Zlotnick, A., Lee, A., Bourne, C. R., Johnson, J. M., Domanico, P. L., and Stray, S. J. (2007). In vitro screening for molecules that affect virus capsid assembly (and other protein association reactions). Nature Protocols 2, 490–498.

CHAPTER FIFTEEN

Extracting Equilibrium Constants from Kinetically Limited Reacting Systems

John J. Correia* and Walter F. Stafford†

Contents
1. Introduction
2. Methods
3. Simulation and Analysis of Dimerization
4. Kinetically Mediated Dimerization
5. A Stepwise Approach
6. Final Thoughts
Acknowledgments
References

Abstract

It has been known for some time that slow kinetics will distort the shape of a reversible reaction boundary. Here we present a tutorial on direct boundary fitting of sedimentation velocity data for a monomer-dimer system that exhibits kinetic effects. Previous analysis of a monomer-dimer system suggested that rapid reaction behavior will persist until the relaxation time of the system exceeds 100 s (reviewed in Kegeles and Cann, 1978). Utilizing a kinetic integrator feature in Sedanal (Stafford and Sherwood, 2004), we can now fit for the koff values and measure the uncertainty at the 95% confidence interval. For the monomer-dimer system the range of well-determined koff values is limited to 0.005 to 10⁻⁵ s⁻¹, corresponding to relaxation times (at a loading concentration equal to the Kd) of 70 to 33,000 s. For shorter relaxation times the system is fast and only the equilibrium constant K, but not koff, can be uniquely determined. For longer relaxation times the system is irreversibly slow and, assuming the system was at equilibrium before the start of the run, again only the equilibrium constant K, but not koff, can be uniquely determined.
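The correspondence between koff and relaxation time quoted in the abstract can be reproduced with the textbook relaxation expression for a monomer-dimer equilibrium, 1/τ = koff + 4 kon [M]. The sketch below is an illustration under stated assumptions rather than the calculation performed in Sedanal: it takes Kd = [M]²/[D] = koff/kon, expresses the loading concentration in monomer units (c = [M] + 2[D]), and uses Kd = 4 μM from the simulations described later in the chapter. Function names are hypothetical.

```python
import math

def free_monomer(c_total, kd):
    """Free monomer concentration (M) for 2M <-> D.

    Assumes Kd = [M]^2/[D] and c_total = [M] + 2[D] in monomer units,
    i.e. the positive root of 2[M]^2/Kd + [M] - c_total = 0.
    """
    return kd * (-1.0 + math.sqrt(1.0 + 8.0 * c_total / kd)) / 4.0

def relaxation_time(koff, kd, c_total):
    """Relaxation time (s) from 1/tau = koff + 4*kon*[M], with kon = koff/Kd."""
    kon = koff / kd
    return 1.0 / (koff + 4.0 * kon * free_monomer(c_total, kd))

KD = 4e-6  # M, the value used for the simulated system
for koff in (0.005, 1e-5):
    tau = relaxation_time(koff, KD, c_total=KD)
    print(f"koff = {koff:g} 1/s -> tau of about {tau:,.0f} s at a loading concentration of Kd")
```

With the loading concentration equal to Kd, half of the protein is free monomer, so τ = 1/(3 koff) regardless of the value of Kd; this gives roughly 70 s and 33,000 s for the two limits, in line with the values stated above.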

* Department of Biochemistry, University of Mississippi Medical Center, Jackson, Mississippi, USA
† Boston Biomedical Research Institute, Watertown, Massachusetts, USA

Methods in Enzymology, Volume 455, ISSN 0076-6879, DOI: 10.1016/S0076-6879(08)04215-8
© 2009 Elsevier Inc. All rights reserved.


1. Introduction

Sedimentation velocity has been a standard hydrodynamic technique since the inception of the method in the 1920s by Svedberg (Svedberg and Nichols, 1927; reviewed in van Holde, 2004). As with all separation techniques, the resolution and the shape of the profile can be influenced by the presence of molecular interactions, including interactions with small molecules. This was intensively investigated by many early practitioners of analytical ultracentrifugation, including Gilbert, Cann, Cox, and Kegeles (Cann, 1970; Cann and Kegeles, 1974; Gilbert, 1955, 1959, 1960; Gilbert and Jenkins, 1959; Kegeles et al., 1967; Oberhauser et al., 1965) and extensively reviewed by Kegeles, Cox, and Cann in 1978 (Cann, 1978a,b; Cox, 1978; Kegeles, 1978; Kegeles and Cann, 1978). Much of the early work was done assuming the absence of diffusion (Belford and Belford, 1962; Gilbert and Gilbert, 1978; Gilbert and Jenkins, 1959; van Holde, 1962) and with the appropriate approximate solutions to the transport equations. Analysis often involved graphical comparisons of simulation with experimental data. Simulation methods for solutions of the Lamm equation, the transcendental differential equation that describes sedimentation and diffusion in the ultracentrifuge, used numerical methods developed by Weiss and Yphantis (Correia et al., 1976; Dishon et al., 1966, 1967) and were advanced greatly by the finite element method of Claverie (Claverie, 1976; Claverie et al., 1975). Direct comparison of data with simulation by computer fitting techniques was first developed in 1981 by Todd and Haschemeyer, but complete implementation of this approach had to wait for the development of faster computing power. 
Development of better fitting and simulation algorithms continues (Dam et al., 2005; Demeler and Saber, 1998; Philo, 2006; Schuck, 1998; Schuck and Demeler, 1999; Stafford, 1998, 2000; Stafford and Sherwood, 2004), and these approaches have now been fully implemented into user-friendly software platforms (including but not limited to Sedfit, Sedphat, Sedanal, and Ultrascan). This rather brief and selective introduction brings us to a stage where users are now able to directly fit sedimentation velocity data with relative ease, but with the caveat that complexities of many kinds require some thoughtful selection of approaches and fitting models. Here we begin by presenting a brief tutorial on the analysis and direct boundary fitting of sedimentation velocity data focusing on a weak monomer-dimer system. Then we discuss the effects of slow kinetics on the shape of a reversible reaction boundary, and we present direct boundary-fitting methods on sedimentation velocity data for kinetically mediated monomer-dimer systems.


2. Methods

All data simulation and fitting were done with Sedanal. The maximum number of integration steps allowed should be reset to at least 10,000,000 to allow slow kinetic systems to come to equilibrium before the start of the simulated run. Data were simulated in absorbance mode at 50,000 rpm, unless noted otherwise, with 0.005 absorbance units of Gaussian random noise added to each data set. For the dimer cases a 100 kDa monomer, s1 = 5.8 S, s2 = 9 S, with an extinction coefficient of 1.2 ml/mg/cm was used. Data were simulated with a time interval between scans of 100 s. Data were preprocessed to select the meniscus, base, and fit regions and stored as abr files. An even number of scans (typically 20) was chosen for direct boundary fitting so that the gradient in the last few scans was near the base of the fit region. Fitting models were constructed with ModelEditor 1.74. Fitting was done using the Levenberg-Marquardt algorithm, although initial approaches to the minima were often done with the Simplex algorithm of Nelder and Mead (1965). Confidence intervals for s1, s2, K, and koff were estimated using F-statistics at the 95% confidence level (see footnote 1).

3. Simulation and Analysis of Dimerization

A weak but rapidly reversible dimerization reaction is shown in Fig. 15.1, where panel A shows the weight average sedimentation coefficients, Sw, plotted as a function of the logarithm of the plateau concentration (see Correia, 2000, for a detailed discussion of the use of Sw) and panel B shows the shape of the g(s) distributions derived from simulated data. Note that the Sw values for this 81-fold concentration series are sufficiently far from the s1 and s2 values for the monomer and dimer species alone that these end points will require some extrapolation or fitting method to be estimated. The shapes of the g(s) distributions show the same general result, with the boundary maximum position not predicting either the s1 or s2 value. This is a typical situation for associating systems, unless they are strongly cooperative (Cann, 1978), and the determination of one or both end points is a major experimental challenge for the investigator.
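The reason Sw never reaches s1 or s2 over this concentration range can be seen from a simple mass-action calculation. The sketch below is an idealized illustration, not the Sedanal simulation: it assumes a rapidly equilibrating monomer-dimer system with Kd = [M]²/[D], concentrations in monomer units, equal extinction per unit mass for both species, and no hydrodynamic nonideality or radial dilution. The function name and defaults are hypothetical, with s1 = 5.8 S, s2 = 9.0 S, and Kd = 4 μM taken from the text.

```python
import math

def weight_average_s(c_total, kd, s1=5.8, s2=9.0):
    """Weight-average sedimentation coefficient (S) for 2M <-> D.

    c_total and kd are in M (monomer units); Kd = [M]^2/[D] is assumed.
    """
    monomer = kd * (-1.0 + math.sqrt(1.0 + 8.0 * c_total / kd)) / 4.0
    f_monomer = monomer / c_total  # weight fraction present as monomer
    return f_monomer * s1 + (1.0 - f_monomer) * s2

KD = 4e-6  # M
for c_um in (0.4, 2, 4, 8, 12, 20, 28, 40):
    sw = weight_average_s(c_um * 1e-6, KD)
    print(f"{c_um:>5} uM: Sw = {sw:.2f} S")
```

Under these assumptions Sw runs from about 6.3 S at 0.4 μM to about 8.4 S at 40 μM, well inside the 5.8 to 9.0 S limits, so neither end point is observed directly.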

Footnote 1: An F statistic is constructed by stepping or searching along a parameter axis, repeating the fit for all floated parameters, until the fit hits a target rms value expressed as (rms/rms0)², where rms0 is the rms of the best fit of the data. The target F-stat is determined by the degrees of freedom and is typically around 1.013 for this kind of data at the 95% or two standard deviation level. Notice that many of the confidence intervals are asymmetric (Tables 15.1 and 15.2 and Fig. 15.8).
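The stepping procedure described in this footnote can be sketched on a toy problem. The example below is not Sedanal's implementation: it fits a straight line to synthetic data, holds the slope at successive trial values while re-fitting the intercept, and stops when (rms/rms0)² reaches the target. The target of 1.013 is simply reused from the footnote rather than computed from the degrees of freedom of this toy data set, and all names, the step size, and the synthetic data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_sq_residual(xs, ys, slope):
    """Mean squared residual after re-fitting the intercept at a fixed slope."""
    n = len(xs)
    intercept = sum(y - slope * x for x, y in zip(xs, ys)) / n
    return sum((y - slope * x - intercept) ** 2 for x, y in zip(xs, ys)) / n

def profile_limit(xs, ys, best_slope, target, step):
    """Step the slope away from its best value until (rms/rms0)^2 >= target."""
    ms0 = mean_sq_residual(xs, ys, best_slope)
    slope = best_slope
    while mean_sq_residual(xs, ys, slope) / ms0 < target:
        slope += step
    return slope

random.seed(0)  # reproducible synthetic noise
xs = [0.1 * i for i in range(200)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.005) for x in xs]

best_slope, best_intercept = fit_line(xs, ys)
TARGET = 1.013  # value quoted in the footnote, assumed here for illustration
lower = profile_limit(xs, ys, best_slope, TARGET, step=-1e-6)
upper = profile_limit(xs, ys, best_slope, TARGET, step=+1e-6)
print(f"slope = {best_slope:.5f}, interval <{lower:.5f}, {upper:.5f}>")
```

For a model that is linear in its parameters, as here, the two limits are symmetric about the best fit; the asymmetric intervals noted in the footnote arise because the sedimentation models are nonlinear in K and koff.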


[Plot axes: panel A, Sw versus log C (M), with the Kd marked; panel B, g(s) versus S, with inset g(s)/C versus S.]

Figure 15.1 Simulation and analysis of the concentration dependence of weak dimerization with a 100 kDa protein, Kd = 4 μM, ext = 1.2 ml/mg/cm, s1 = 5.8 and s2 = 9.0. Panel A: weight average sedimentation coefficient Sw from simulations performed at 0.4 to 40 μM. Panel B: g(s) analysis performed with DCDT+2 (Philo, 2006) on simulations done at 2, 4, 8, 12, 20, and 28 μM (corresponding to 0.24 to 3.36 OD and thus implying the use of both a 1.2-cm and a 3-mm cell in an XLA). These data correspond to the boxed region in panel A. The horizontal lines in panel A and the vertical dotted lines in panel B correspond to the sedimentation coefficients of the monomer and the dimer. The inset presents normalized traces, g(s)/co.
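The optical densities quoted in the caption follow directly from the molecular weight and extinction coefficient given in the Methods. The sketch below shows the unit conversion; the function name is hypothetical, and the default path length of 1 cm is an assumption chosen because it reproduces the 0.24 to 3.36 OD range stated in the caption.

```python
def absorbance(conc_um, mw_da=100_000.0, ext_ml_per_mg_cm=1.2, path_cm=1.0):
    """Absorbance of a protein solution.

    conc_um: molar concentration in micromolar (monomer units)
    mw_da: monomer molecular weight in g/mol
    ext_ml_per_mg_cm: mass extinction coefficient in ml/(mg cm)
    path_cm: optical path length in cm (1.0 is an assumed reference value)
    """
    mg_per_ml = conc_um * 1e-6 * mw_da  # mol/L * g/mol = g/L = mg/ml
    return mg_per_ml * ext_ml_per_mg_cm * path_cm

for c in (2, 4, 8, 12, 20, 28):
    print(f"{c:>2} uM: {absorbance(c):.2f} OD per cm, "
          f"{absorbance(c, path_cm=0.3):.2f} OD in a 3-mm cell")
```

At 28 μM the absorbance per centimeter exceeds 3 OD, which is why a shorter path length cell is needed to keep the highest concentrations within the useful range of the absorbance optics.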

Quantitative analysis of these data is best done by global direct boundary fitting of all the data to an appropriate model. Our preferred approach is fitting with Sedanal, which fits multiple data sets to concentration time-difference


Extracting Equilibrium Constants

curves to eliminate time-independent systematic errors inherent in the optical systems (Stafford and Sherwood, 2004). An example of this is shown in Fig. 15.2, where the best direct boundary fit is presented for all the data in Fig. 15.1B to a rapid monomer-dimer equilibrium model (koff = 0.1 s⁻¹). The best-fitted K2 value is listed in Table 15.1 along with a 95% confidence interval. As pointed out earlier, the extrapolated values for the monomer and dimer sedimentation coefficients are not known a priori for experimental data, and thus Table 15.1 also presents a fit floating s2 and K2. The values are clearly correlated (a correlation coefficient R of 0.99884 can be estimated from the data in the F-stat log file generated under output files), with a slightly smaller s2 causing a larger K2, and the total confidence interval for K2 is larger than for the fit of K2 alone (0.4 vs. 0.12, in units of 10⁵ M⁻¹). The estimate of s2, 8.934 S, in this case is close to the correct value of 9.0 S. One might thus infer this reflects the fact that

[Figure 15.2 graphic: panels A–F, ΔC vs. radius (cm), 6.0 to 7.0 cm.]

Figure 15.2 A plot of the best Sedanal global fit of the monomer-dimer data (2–28 μM data shown in panels A–F) presented in Fig. 15.1B. The model is a rapidly reversible monomer-dimer equilibrium holding s1 and s2 to the correct values. Data are plotted as Δc, which is the difference between pairs of scans, to remove time-independent systematic noise. Superimposed are the data (black squares) and the best fit (black lines) for each cell. The residuals are also plotted (black lines) and appear as a noisy trace near zero. The best-fitted value for K2 (2.527 × 10⁵ M⁻¹) appears in Table 15.1 with a 95% confidence interval <2.47, 2.59> calculated with an F-stat procedure available in Sedanal. A second fit allowing s2 to float is also presented in Table 15.1. Notice the correlation between a lower s2 and a larger K2 value, as well as the larger confidence interval that reflects the coupling between the s2 and K2 values.
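The reason for plotting and fitting Δc rather than raw scans can be illustrated with synthetic data. In the sketch below, which is purely illustrative (the sigmoidal boundary shape, the sinusoidal baseline, and every name are assumptions), two scans share the same time-independent systematic offset at each radial position, and subtracting one from the other cancels it.

```python
import math


def scan(radii, midpoint, baseline):
    """Hypothetical scan: a sigmoidal boundary plus a fixed systematic
    offset at each radial position."""
    return [1.0 / (1.0 + math.exp(-(r - midpoint) / 0.02)) + b
            for r, b in zip(radii, baseline)]


radii = [6.0 + 0.01 * i for i in range(101)]

# Time-independent systematic error, identical in every scan (assumed form).
baseline = [0.05 * math.sin(40.0 * r) for r in radii]
flat = [0.0] * len(radii)

# The boundary moves from 6.40 to 6.50 cm between the two scans.
early, late = scan(radii, 6.40, baseline), scan(radii, 6.50, baseline)
ideal_early, ideal_late = scan(radii, 6.40, flat), scan(radii, 6.50, flat)

delta_c = [a - b for a, b in zip(early, late)]
ideal_delta_c = [a - b for a, b in zip(ideal_early, ideal_late)]

worst = max(abs(x - y) for x, y in zip(delta_c, ideal_delta_c))
print(f"largest baseline residue left in delta-c: {worst:.1e}")
```

The difference curve computed from the contaminated scans matches the one computed from ideal scans to within floating-point rounding, so the baseline never has to be modeled. Random noise, by contrast, is not removed by the subtraction.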

Table 15.1 Monomer-dimer

Model entries: koff = 0.1 s⁻¹; 2 species plot s values
C's entries: 1-28 μM; 0.4-28 μM; 0.4-28 μM; 0.4-28 μM

s1                   s2                   K2 (M⁻¹)                     rms
5.8                  9.0                  2.527 × 10⁵ <2.47,2.59>      0.00570
5.8                  8.934 <8.91,8.96>    2.962 × 10⁵ <2.77,3.17>      0.00543
4.515 <3.46,5.15>    8.971 <8.94,9.00>    5.684 × 10⁵ <4.21,8.48>      0.00520
4.658 <3.72,5.24>    8.967 <8.93,9.00>    5.356 × 10⁵ <4.02,7.79>      0.00520
5.978 <5.88,6.07>    8.946 <8.93,9.00>    2.50 × 10⁵                   0.00550
5.85                 8.77                 4.558 × 10⁵ <4.38,4.74>      0.00690
5.85                 8.932 <8.91,8.96>    2.871 × 10⁵ <2.68,3.08>      0.00541


multiple data sets from a range of concentrations around Kd (cmax = 7Kd in this case) can be fitted to extract an accurate s2 value when the correct s1 value is used in the fitting. To test this inference, a third fit floating s1, s2, and K2 is also shown in Table 15.1. Both parameter uncertainty and confidence intervals are clearly larger for K2, reflecting a strong correlation between the limits of s in the fit and K2 (R for K2 vs. s1 = 0.9985; R for K2 vs. s2 = 0.9990; R for s2 vs. s1 = 0.9999). In this case, while s2 is again well determined, K2 is too large by a factor of two (which is only 0.48 kcal smaller on a ΔG scale) with a large and skewed confidence interval, and s1 is smaller by 22% with a large and asymmetric confidence interval that does not include the correct value, 5.8 S. Adding a lower concentration point of 0.4 μM to the fit (corresponding to an Abs280 = 0.0576) might help constrain s1, except that the lower signal-to-noise in this case appears to limit its impact (Table 15.1). Thus, the best-fitted parameter values for s2 and K2 are surprisingly insensitive to the absolute value of s1. These results are presented to introduce the method of direct boundary fitting, and they allow us to make three additional points:

1. The difficulty of fitting velocity data for a self-associating system where the model and the end points may be uncertain is the best reason for also using equilibrium analysis, where the molecular weights of monomer and dimer can be calculated and are usually known to good precision, thus making the fitting for K2 more robust (Correia et al., 1995, 2001; Zhao and Beckett, 2008). Some systems may still be more amenable to velocity studies than equilibrium studies (Correia, 2000), and it is worth emphasizing that only velocity studies will provide information about shape and kinetics (Gelinas et al., 2004; Stafford and Sherwood, 2004). That being said, one can still attempt to fit velocity data to extract molar mass.
In fact, for experimental data an additional test of the model is to fit for molecular weight. For a monomer-dimer system one would start by fitting M1 and constraining M2 to 2 M1. For these data the best fit to M1 and K2, constraining s1 and s2, returns values of M1 = 51384 <50518, 52281> and K2 = 3.076 × 10⁵ M⁻¹ <2.98, 3.18>. Thus, this monomer-dimer system is exceedingly well determined.

2. Data must be collected over as wide a concentration range as possible, 10% to 90% degree of association, to obtain reasonable estimates for the end points of the isotherm, although this is clearly not necessarily sufficient to determine both end points. This range is obviously dependent upon Kd, the optics used for data collection, and thus the range of experimental concentrations. We stress this point to discourage the idea that a single concentration or a few concentrations are sufficient to extract de novo thermodynamic and hydrodynamic information about a system. For example, fitting the 4 μM data alone yields K2 = 2.456 × 10⁵ M⁻¹ with error limits three times larger <2.29, 2.64> than the s1, s2 constrained fit in Table 15.1. The central issue is where the monomer-dimer model and the sedimentation coefficient limits would come from in the case of a single experimental data set. It is simply not an appropriate application of the scientific method. To enhance the range of concentrations, the use of different path length cells or different wavelengths, and the use of interference or fluorescence optics, are extremely helpful. These approaches raise issues about determining extinction coefficients that will be addressed in a later section.

3. Extrapolation techniques may also be utilized or required to accurately estimate hydrodynamic end points. These techniques have been discussed previously in a review on weight average techniques (Correia, 2000). Direct fitting of Sw vs. c is a common approach, although the end points are usually constrained and not necessarily well determined (Correia, 2000). The use of a reciprocal plot, 1/sw vs. 1/c, may be useful to extract s2 or sn, with the caveat that the functional form of the extrapolation is unclear, in part because at extremely high macromolecule concentrations nonideality may become an issue and thus require additional fitting parameters (Stafford and Sherwood, 2004). Additional graphical techniques commonly used in sedimentation equilibrium analysis involve plotting of the moments of the distributions (Stafford, 1980; Yphantis and Roark, 1972). Figure 15.3 presents a two-species plot (Roark and Yphantis, 1969; Sophianopoulos and van Holde, 1964) for the data in Fig. 15.1B. A linear plot is typically consistent with the presence of only two species (hence the name), thus improving one's confidence in a real experimental situation about the fitting model proposed. The lower moments, sw vs. 1/sn, appear to do a better job of predicting the correct s1 value, while the higher moments, sz+1 vs. 1/sz, underestimate but approach the correct s2 value for the dimer.
(To our knowledge only DCDT (Stafford, 1992), DCDT+2 (Philo, 2000), and Sedanal (Stafford and Sherwood, 2004) calculate these higher moments, although it would be trivial to program in any spreadsheet. Only Sedanal is programmed to generate two-species plots for equilibrium data analysis.) This approach is useful for estimating the parameter guesses for an unknown fitting problem in combination with simple rules about the expected relationship between s values for a polymer. On the basis of a constant axial ratio model between monomers and oligomers, we expect sn values to be s1*n^(2/3). (In this case 9/5.8 = 1.552, which is slightly less than 2^(2/3) = 1.587.) The ratio of s2/s1 can thus be constrained by entering a relationship into the equation editor of Sedanal found under the fit window; for example, s2 = 1.59*s1 would force the fit to obey the n^(2/3) rule. An exception to this s1*2^(2/3).


[Figure 15.3 graphic: Sw vs. 1/Sn, Sz vs. 1/Sw, and Sz+1 vs. 1/Sz.]

Figure 15.3 An example of a two-species plot to estimate s1 and s2 for the dimerization reaction presented in Figs. 15.1 and 15.2. The data at 2, 4, 8, 12, 20, and 28 μM were analyzed with DCDT+2 to generate n-, w-, z- and z+1-average sedimentation coefficients of the g(s) distribution. The data are then plotted as shown, with the crosses corresponding to the correct monomer and dimer sedimentation coefficients. Linear fits have correlation coefficients of 0.9996, 0.9992, and 0.9992. The lower moments, Sw vs. 1/Sn, appear to do a better job of predicting s1, crossing the line at 5.85 S, while the higher moments, Sz+1 vs. 1/Sz, approach the s2 value for the dimer, crossing the line at 8.77 S. The near approach emphasizes the importance of spanning a wide concentration range during the analysis of binding data. The heavy black line is a plot of s vs. 1/s.
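The construction behind a two-species plot can be sketched as follows. The example assumes the averages are defined through weight fractions w_i, with Sw = Σ w_i s_i and 1/Sn = Σ w_i/s_i. For exactly two species this gives Sw = (s1 + s2) − s1·s2·(1/Sn), so a straight line through the (1/Sn, Sw) points crosses the curve s vs. 1/s at s1 and s2. The input points are ideal, noise-free values built from hypothetical monomer fractions, and the function name is illustrative. It is not the DCDT+2 or Sedanal code.

```python
import math


def two_species_endpoints(inv_sn, sw):
    """Fit sw = a + b*(1/sn) by least squares, then intersect the line
    with the curve s = 1/x. Returns (s1, s2), smaller value first."""
    n = len(inv_sn)
    mx = sum(inv_sn) / n
    my = sum(sw) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(inv_sn, sw))
    sxx = sum((x - mx) ** 2 for x in inv_sn)
    b = sxy / sxx
    a = my - b * mx
    # a + b*x = 1/x  ->  b*x^2 + a*x - 1 = 0, where each root x equals 1/s
    disc = math.sqrt(a * a + 4.0 * b)
    roots = [(-a + disc) / (2.0 * b), (-a - disc) / (2.0 * b)]
    s_low, s_high = sorted(1.0 / x for x in roots)
    return s_low, s_high


S1_TRUE, S2_TRUE = 5.8, 9.0
# Hypothetical monomer weight fractions across a concentration series.
fractions = [0.62, 0.50, 0.40, 0.34, 0.27, 0.23]
inv_sn = [w / S1_TRUE + (1.0 - w) / S2_TRUE for w in fractions]
sw = [w * S1_TRUE + (1.0 - w) * S2_TRUE for w in fractions]

s1_est, s2_est = two_species_endpoints(inv_sn, sw)
print(f"s1 = {s1_est:.3f} S, s2 = {s2_est:.3f} S")
```

With ideal input the intersections recover the end points exactly. With moments taken from real g(s) distributions, the extrapolation distance and the noise in the higher moments determine how close the crossings come, as the 5.85 S and 8.77 S values in the caption illustrate.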

While this graphical approach has been criticized for potentially being too sensitive to noise and baseline artifacts in the g(s) data, especially the higher moments (Correia, 2000; Philo, 2001), it has also been pointed out that one must simulate the actual experiment to know how conditions will affect moment analysis (Philo, 2000). In the simulations discussed by Philo (2000), the samples were mixtures of eight noninteracting components with s values ranging from 6.2 to 30 S. In this dimerization case (Fig. 15.1) the s values for the various samples only vary from 6.5 to 8.5 S, and thus for any one sample the scans come from a narrower region and the moment estimates are much more precise (much less than 1% using the new algorithms in DCDT+2; see Fig. 15.1A). If we use the two-species plot estimates (s1 = 5.85 S, s2 = 8.77 S) in the direct boundary analysis, the fit will clearly be worse because 8.77 S is outside the 95% confidence interval for s2. If we constrain the s1 value to 5.85 S and float s2 and K2, then the fit is reasonable with accurate and precise estimates for both parameters (Table 15.1). An additional solution to this problem of estimating hydrodynamic parameter values for monomers and oligomers involves the use of hydrodynamic bead modeling in combination with structural data (Byron, 2008).


We reemphasize the importance of spanning a wide concentration range, if possible, during the collection and analysis of binding data, in this context to minimize the distance required to extrapolate to the end points. If one imagines data sets where Kd is significantly larger or smaller than 1 μM, well-determined estimates of s1 and s2, respectively, are likely to be obtained, with a major challenge being estimating the unknown s value. In some instances an experimental approach involving mutants that knock out association can be utilized to extract s1 values (Zhao and Beckett, 2008). This is generally a useful strategy employed in studies of the thermodynamics of homo- and heteroassociation (Chen et al., 2008; Correia et al., 2001). In addition, many proteins have irreversible aggregates or inactivated monomers in the mixture, and these may prove useful in establishing one or more end points of the reaction (Correia et al., 1995; Snyder et al., 2004). Unfortunately, unless these components can be removed by gel filtration or dissociated by the addition of reducing agents (TCEP being preferred because of low absorbance and the absence of pH dependence), they also add parameters to the fitting model. In the case of a larger Kd, verifying the hypothesis of a monomer-dimer equilibrium becomes challenging because there may be no experimental estimate possible for a maximum molecular weight or sedimentation coefficient. One additional inverse approach is to constrain K2 with a value determined by sedimentation equilibrium to help narrow down the range of s values. (For example, constraining K2 to 2.5 × 10⁵ M⁻¹ actually improves the accuracy and precision of the lower s value limit (s1 = 5.978 S; s2 = 8.946 S; see Table 15.1), although the rms of the fit is clearly worse.) These considerations are presented and discussed to introduce the modeling and direct boundary-fitting methods that are also useful for analysis of kinetically mediated velocity data.

4. Kinetically Mediated Dimerization

We will now apply these general approaches to the analysis of kinetically mediated velocity data. Simulations of the role of kinetics in a monomer-dimer sedimentation velocity experiment are presented in Fig. 15.4. Dimerization data were simulated at the Kd, 4 μM, and koff was allowed to vary from 0.1 to 0.000001 s⁻¹. At koff = 0.001 s⁻¹ the shape of the boundary becomes more skewed relative to the faster cases, and as the off rate decreases the shapes become profoundly bimodal, resolving into a monomer and a dimer zone. There is no significant difference between the shapes generated between 10⁻⁵ and 10⁻⁶ s⁻¹, and thus we can qualitatively conclude that hydrodynamic data for a kinetically mediated dimer reaction are sensitive to koff values from 10⁻³ to 10⁻⁵ s⁻¹. These features


[Figure 15.4 graphic. Panel A: c(s) vs. S for koff = 0.1, 0.01, 0.001, 0.0001, 0.00001, and 0.000001 s⁻¹. Panel B: g(s) vs. S, curves labeled 1 through 6.]

Figure 15.4 Simulations of the role of kinetics in a monomer-dimer sedimentation velocity experiment. Dimerization data were simulated at the Kd, 4 μM, and koff was allowed to vary from 0.1 to 0.000001 s⁻¹. Panel A presents c(s) distributions derived from all 90 scans and fitting f/fo at a resolution of 0.1 S. Panel B presents g(s) distributions, labeled as 1 through 6, corresponding to fast to slow koff. The curves verify that both approaches can provide similar visual information about the system, especially when one fits for f/fo. (Note the apparent noise in these data vs. the data in Fig. 15.1B is due to different scaling.) At koff = 0.001 s⁻¹ the shape of the boundary becomes skewed, and as the off rate decreases the shapes become profoundly bimodal, resolving into a monomer and a dimer zone. There is no significant difference between the shapes generated between 10⁻⁵ and 10⁻⁶ s⁻¹.

are evident in both g(s) distributions and c(s) distributions (if generated with f/fo fitting and regularization). When comparing distributions at different loading concentrations, it is recommended to plot normalized g(s) distributions (g(s*)/co) to observe either coincidence of the curves, which implies lack of association, or shifting in the distribution, which is consistent with the presence of concentration dependence (Gelinas et al., 2004; Stafford, 2009; Figure 15.1B). Why are velocity data influenced by kinetics? There are two reasons. First, the centrifuge cell is sector shaped to avoid convection, and thus there is radial dilution of the sample, which causes some dissociation of the reactive complexes during the run. Second, sedimentation causes separation and resolution of different hydrodynamic particles, and as they separate the species must reequilibrate to maintain equilibrium.


If the time of reequilibration is fast, then the boundary maintains equilibrium throughout and migrates as a reaction boundary (Cann, 1970). If the time of reequilibration is slower than the rate of sedimentation, then the species become fractionated and the boundary shape is kinetically mediated. Strictly speaking, the ability to resolve boundaries depends on the experimental conditions (Cann and Kegeles, 1974). For example, Fig. 15.5 shows simulations at the Kd for koff = 0.001 s⁻¹ where the runs were done at 40, 50, and 60K rpm. This has been previously expressed in terms of half-time of dissociation vs. the time of the experiment or of sedimentation (Kegeles and Cann, 1978). To be consistent with this previous approach, one can use a kinetic equation derived in Bernasconi (1976) for a monomer-dimer equilibrium:

$$\left(\frac{1}{\tau}\right)^{2} = 8\,k_{1}k_{-1}[c_{o}] + (k_{-1})^{2} \qquad (15.1)$$

where τ is the relaxation time, k1 and k−1 are the forward and reverse rate constants, kon and koff in our nomenclature, and [co] is the initial total concentration of monomer (Fig. 15.6). Alternatively, Sedanal has a kinetic calculator that can simulate relaxation kinetics for various models and

[Figure 15.5 graphic: g(s) vs. S for runs at 40, 50, and 60 K rpm.]

Figure 15.5 The observed shape of a boundary is influenced by the speed of the experiment (Cann and Kegeles, 1974). A g(s) analysis is shown for data simulated at the Kd, 4 μM, with koff = 0.001 s⁻¹ and at 40, 50, and 60 K rpm. The bimodality of the boundary is more evident as the speed of sedimentation exceeds the ability of the system to reequilibrate. Note that data were chosen so that the peak broadening limits were in a comparable range (243–254 kD) so as to maintain similar height to width characteristics of the distributions.


[Figure 15.6 graphic: log(τ, s) vs. [Co] (M) for koff = 0.01, 0.001, 0.0001, and 0.00001 s⁻¹, calculated from (1/τ)² = 8k1k−1[Co] + (k−1)².]

Figure 15.6 Relaxation times for dimer dissociation calculated with an approximate equation derived in Bernasconi (1976). The data are consistent with relaxation times at 4 μM Co varying from 70 s at a koff of 0.005 s⁻¹ to 33,000 s for a koff of 10⁻⁵ s⁻¹. More precise estimates can be derived with a kinetic integrator function in Sedanal or with KINSIM at various monomer-dimer ratios.
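The relaxation times quoted in this caption can be reproduced approximately from Eq. (15.1). The sketch below assumes K2 = k1/k−1 (so kon = K2·koff) with K2 = 2.5 × 10⁵ M⁻¹ and a total monomer concentration equal to Kd. The function name is hypothetical, and the calculation is only the approximate relation, not the kinetic integrator in Sedanal.

```python
import math


def relaxation_time(k_off, k2, c_total):
    """Relaxation time tau (s) from (1/tau)^2 = 8*k1*k_-1*[c0] + (k_-1)^2,
    with k1 = K2 * k_off (assumes K2 = k1/k_-1)."""
    k_on = k2 * k_off
    return 1.0 / math.sqrt(8.0 * k_on * k_off * c_total + k_off ** 2)


K2 = 2.5e5   # M^-1
C0 = 4e-6    # M, loading concentration equal to Kd

for k_off in (5e-3, 1e-3, 1e-4, 1e-5):
    tau = relaxation_time(k_off, K2, C0)
    print(f"koff = {k_off:g} s^-1   tau = {tau:,.0f} s")
```

At a loading concentration equal to Kd the product 8·K2·co is 8, so the expression reduces to τ = 1/(3·koff). That gives about 67 s for koff = 0.005 s⁻¹ and about 33,300 s for koff = 10⁻⁵ s⁻¹, in line with the rounded values of 70 and 33,000 s.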

species concentrations, and thus allows for estimation of relaxation times. This is analogous to what one could simulate with a program like KINSIM. Explicit derivations of relaxation constants in terms of equilibrium and rate constants are outlined in Bernasconi (1976), and examples for sedimentation velocity can be found in earlier literature (Cann and Kegeles, 1974; Kegeles and Cann, 1978).

To explore the ability to extract kinetic parameters from these systems, sedimentation velocity data for koff equal to 10⁻³, 10⁻⁴, and 10⁻⁵ s⁻¹ were simulated for nine loading concentrations from 0.4 to 40 μM and fit by the direct whole boundary-fitting approach. Results for fits with constrained s1 and s2 end points are presented in Table 15.2, and an example of the best fit of the koff = 10⁻⁴ s⁻¹ data (only 2–28 μM data presented) is shown in Fig. 15.7. Relative to the data in Fig. 15.2, notice the bimodal appearance of the curves in panels 7A and 7B and the exaggerated skewed peaks in panels 7C through 7F. The K2 values are all accurate to within 3% to 6%, while the koff values are better determined for the slower cases. Note that for the 0.001 s⁻¹ case in particular the confidence interval is highly asymmetric towards faster off rates (Table 15.2 and Fig. 15.8). This suggests kinetic effects will be evident in the 0.01 to 0.001 s⁻¹ regime. Floating s2 in the fits for all three kinetically mediated cases reveals that s2 is well determined for this span of

Table 15.2 Kinetically mediated monomer-dimer

Model            s1                  s2                  K2 (M⁻¹)                   koff (s⁻¹)                    rms
koff = 0.001     5.8                 9.0                 2.66 × 10⁵ <2.59,2.73>     3.965 × 10⁻³ <2.41,10.99>     0.00548
                 5.8                 8.971 <8.95,9.00>   2.825 × 10⁵ <2.67,2.99>    2.19 × 10⁻³ <1.37,4.83>       0.00543
                 5.236 <4.74,5.61>   8.985 <8.96,9.01>   3.831 × 10⁵ <3.12,4.83>    2.08 × 10⁻³ <1.42,3.66>       0.00534
koff = 0.0001    5.8                 9.0                 2.584 × 10⁵ <2.53,2.64>    1.130 × 10⁻⁴ <1.08,1.19>      0.00527
                 5.8                 8.963 <8.93,8.98>   2.769 × 10⁵ <2.65,2.90>    0.899 × 10⁻⁴ <0.77,1.04>      0.00517
                 5.80 <5.71,5.88>    8.963 <8.94,8.99>   2.769 × 10⁵ <2.62,2.92>    0.900 × 10⁻⁴ <0.76,1.06>      0.00517
koff = 0.00001   5.8                 9.0                 2.595 × 10⁵ <2.53,2.66>    0.950 × 10⁻⁵ <0.82,1.09>      0.00539
                 5.8                 8.951 <8.94,8.97>   2.727 × 10⁵ <2.65,2.88>    0.726 × 10⁻⁸                  0.00513
                 5.8                 8.995 <8.99,9.00>   2.620 × 10⁵ <2.55,2.69>    0.950 × 10⁻⁵                  0.00536
                 5.804 <5.76,5.85>   8.952 <8.94,8.97>   2.723 × 10⁵ <2.65,2.88>    0.552 × 10⁻⁹                  0.00513


[Figure 15.7 graphic: panels A–F, ΔC vs. radius (cm), 6.0 to 7.0 cm.]

Figure 15.7 A plot of the best Sedanal global fit of kinetically mediated monomer-dimer data (2–28 μM data shown in panels A–F) with koff = 0.0001 s⁻¹ (Fig. 15.4). The model is a reversible monomer-dimer equilibrium holding s1 and s2 to the correct values and fitting for K2 and koff. Data are plotted as Δc, which is the difference between pairs of scans, to remove time-independent systematic noise. Superimposed are the data (black squares) and the best fit (lines) for each cell. The residuals are also plotted and appear as a noisy trace near 0. The best-fitted values for K2 (2.584 × 10⁵ M⁻¹) and koff (1.130 × 10⁻⁴ s⁻¹) appear in Table 15.2 with 95% confidence intervals <2.53 × 10⁵, 2.64 × 10⁵> and <1.08 × 10⁻⁴, 1.19 × 10⁻⁴> calculated with an F-stat procedure available in Sedanal. A second fit allowing s2 to float is also presented in Table 15.2. Notice the correlation between a lower s2 and a larger K2 value, as well as the larger confidence interval that reflects the coupling between the s2 and K2 values.

monomer-dimer data. This is anticipated since s2 was well determined in the rapidly reversible case (see Table 15.1). As for the rapidly reversible data, s2 and K2 are coupled, with a lower s2 causing a larger K2 value and larger confidence intervals for K2 (but not necessarily koff) relative to the K2 fit alone for all three ranges of koff. Interestingly, the best-fit koff values for the 10⁻³ and 10⁻⁴ s⁻¹ cases are closer to the correct values, with confidence intervals that include the correct values, when you allow s2 to float. In particular, note the upper error limit of 4.8 × 10⁻³ s⁻¹ for the 0.001 s⁻¹ data, which suggests for this monomer-dimer system that 0.005 s⁻¹ is also the upper limit for quantitatively observing kinetic effects. For the 10⁻⁵ s⁻¹ data this is not the case: floating s2 allows or causes koff to converge on a much smaller value (0.73 × 10⁻⁸ s⁻¹) with an unbounded lower confidence limit.


[Figure 15.8 graphic: F-stat vs. koff (s⁻¹), panels A–C. Legend: K, koff fit; K, koff, s2 fit; K, koff, s1, s2 fit.]

Figure 15.8 F-statistic values are plotted for all the koff fits reported in Table 15.2. Panels A–C report results for koff values equal to (A) 10⁻³, (B) 10⁻⁴, and (C) 10⁻⁵ s⁻¹. In each panel different line-symbols, as shown in the upper panel, represent fits where K2 and koff were floated (filled squares), where K2, koff, and s2 were floated (open squares), and where K2, koff, s1, and s2 were floated (filled downward triangles). The horizontal dotted lines represent the target F-stat values consistent with a 95% confidence interval.

There are relatively small changes in the s2 and K2 values, suggesting 10⁻⁵ s⁻¹ is the lower limit or slightly beyond the lower limit for quantitatively estimating kinetic parameters for this system when you do not know the end points of the reaction. To investigate this further, fits were done constraining s1 to 5.8 S and koff to the best fit value of 0.905 × 10⁻⁵ s⁻¹. The rms of this fit (0.00536) is close to that of the s2, K2, koff fit (0.00513), yet still outside the confidence interval [(0.00536/0.00513)² ≈ 1.09], but more important the s2 and K2 values are not significantly changed. Thus, in this range of koff values (<0.00001 s⁻¹) varying koff has no significant impact on parameter estimates. (It is helpful to do the fitting and F-stat analysis on log(koff) when koff is in this range and thus poorly determined. The upper limit of the confidence interval in this case (0.34 × 10⁻⁵ s⁻¹) is three orders of magnitude greater than the "best" fit value (0.72 × 10⁻⁸ s⁻¹), and thus searching error space in log units improves the range of the region to be potentially searched.)

In attempts to also fit for s1 in these three kinetic regimes, we found that s1 values become better determined as the kinetics slow down, consistent with emergence of a boundary that sediments with the monomer s value. This is evident by inspection in Fig. 15.4. Surprisingly, floating s1 has no significant effect on K2 or koff, either best-fit values or confidence intervals (Table 15.2 and Fig. 15.8). Note in particular the same upper confidence limit is obtained for the 10⁻⁵ s⁻¹ koff F-stat <, 0.40> when fitting for s1, s2, and K2. Thus, for this system we conclude that constraining s1 is a reasonable least squares approach to analyzing the data, and because s2 values are well determined in all these cases (see Tables 15.1 and 15.2), it further suggests that for analysis of real data the traditional approach of fixing the end points of the fit (by extrapolation or constraining s2/s1 ratios) is a reasonable starting point for quantitative analysis. The magnitude of K2 in particular is coupled to the s1 and s2 values but relatively insensitive to reasonably constrained end points (i.e., a factor of 2 in Table 15.2). As we have tried to stress throughout this presentation, these results pertain to both the model and the data sets or the concentrations spanned in the analysis. Success must be measured by a combination of reasonable assumptions and systematic testing of the reliability and impact of each parameter fixed or floated in the fitting. The total uncertainty should reflect this span of parameters or at least should include caveats about how parameters were both estimated and constrained.
For these monomer-dimer systems, which include the range of data sets analyzed, the range of well-determined koff values occurs between 0.005 and 10⁻⁵ s⁻¹, corresponding to relaxation times (at a loading concentration of the Kd) of 70 to 33,000 s. Examples from the literature that have applied this approach and measured koff values in this range include Zhao and Beckett (2.7 × 10⁻⁴ s⁻¹; 2008) and Gelinas et al. (1.41 × 10⁻³ s⁻¹; 2004). This problem has also been looked at by Dam et al. (2005), who concluded a slightly narrower range of accessible koff values, 10⁻³ to 10⁻⁴ s⁻¹, for an A + B ⇌ C system. Our criteria for deciding what koff values are measurable are a bit different. Here we focus on direct boundary fitting and F-stat ranges, while they at times compare rms values and primarily visually compare c(s) and ls-g*(s) traces for different orders of magnitude of koff. Note in Table 15.1 that the rms alone is not necessarily the best criterion of the best model, although the point we are trying to make pertains to experimental data when you are also trying to constrain parameter guesses. Nonetheless, in general we both agree that the particular model, the range of the data, the presence or absence of heterogeneity, and, most important, parameter correlations will influence one's ability to interpret kinetically mediated sedimentation velocity data.

Ultimately the ability to see and quantify kinetic effects in transport data depends on the relaxation time of the system relative to the time of the experiment (Cann and Kegeles, 1974; Kegeles and Cann, 1978; Fig. 15.6). For relaxation times less than 70 s the system is fast and only the equilibrium constant K2 but not koff can be uniquely determined. For relaxation times greater than 33,000 s the system is irreversibly slow and, assuming the system was at initial equilibrium before the start of the run, only the equilibrium constant K but not koff can be uniquely determined (see further examples in Stafford, 2000; Dam et al., 2005).² For this monomer-dimer system koff values occurring between 0.005 and 10⁻⁵ s⁻¹ are quantitatively measurable (Table 15.2). Our analysis has also determined that dimer sedimentation coefficients are much better determined than monomer values. This is probably influenced by the signal-to-noise in the data at higher concentrations, and we anticipate that the use of interference or fluorescence data with more uniform signal-to-noise might improve this situation. Monomer s values only become well determined for these cases in the limit of slow kinetics, coinciding with the emergence of a monomer zone in the reacting boundary. For the same concentration range of the data, as Kd gets larger s1 should become better determined. Finally, it is worth stating that for real data sets, simulating the best-fitted values over the same concentration ranges as the experimental data and then reanalyzing the simulated data serves as a useful test of the reliability of the experimental conclusions and the constraints imposed upon the analysis.

5. A Stepwise Approach

Here we have presented a brief tutorial on the direct boundary fitting of sedimentation velocity data, including the effects of slow kinetics for a monomer-dimer system. We have stressed the role of parameter estimation and cross-correlation effects between parameters. We have related the fitting process back to experimental situations in which one may not

Footnote 2: We can relate this situation back to the analysis of a single 4 μM data set. At a koff of 10⁻⁵ s⁻¹ we have approached a situation that resembles two noninteracting components. Yet one may still attempt to fit the data for a K assuming they are reversibly interacting. In this case we get K = 2.47 × 10⁵ M⁻¹ <2.24 × 10⁵, 2.74 × 10⁵> and koff = 0.912 × 10⁻⁵ s⁻¹, and yet we would have no experimental evidence of reversible association unless we repeated the run at a different concentration. You can also fit this single data set to a two-component model A + B to extract molecular weights (M1 = 48,990 <40403, 59350>; M2 = 107,029 <84357, 138531>), where you have assumed, not proved, two noninteracting components. Fitting multiple data sets would give better confidence intervals but also reveal a concentration-dependent shift in the ratio of dimer to monomer and thus prove a reversible association model.


know the correct model or correct end points for the reaction. An overview of this approach includes the following steps.

1. Collect velocity data over a wide concentration range. One may need to return to experiments if the fitting demonstrates a still wider concentration range is required. We typically stress a range of association from 10% to 90% for binding data, but some cases may clearly require more extreme extents of reaction to nail down parameters in the best fit sense.

2. Initially analyze all data by generating g(s) and/or c(s) distributions. Follow recommended protocols for these analyses. For g(s), use scans late in the run to maximize species resolution and minimize the total number of scans to enhance the peak broadening limits without sacrificing the advantages of averaging over many pairs of scans (Philo, 2006; Stafford, 1992). Try to use a consistent range of scans (i.e., the same range of ω²t), giving a consistent value for the peak broadening limit, referred to as the maximum molar mass in Sedanal. This will maintain a constant height-to-width characteristic in the family of g(s) distributions and thus provide better superposition for monodisperse distributions or a smoother appearance to any concentration-dependent transitions (Figure 15.1B). For c(s), the best results are obtained if you allow the smallest species to pellet and include data from the full range of scans (Schuck, 1998). One advantage of c(s) is the ability to see small amounts of larger aggregates in the system. Another case where c(s) excels appears to be with antibody samples where dissociated chains and higher-order cross-linked aggregates (dimer, trimer, tetramer, and above) are observed in significant amounts along with the main antibody complex (Arakawa et al., 2006). This can be very useful for model building.

3. Plot the family of distributions to generate a hypothesis about the behavior of the system.
For small or subtle changes in the distribution shape, it helps to normalize the distributions by dividing by the concentration (the signal in appropriate units or the area under the distributions; see Figure 15.1B). For noninteracting systems the normalized data (g(s)/c₀) should superimpose (see Figure 1 in Gelinas et al., 2004). One does not generally expect c(s) distributions to overlay perfectly because of the nature of the fitting function, but the same general features are typically evident. For self-associating systems the distributions should shift by mass action to larger species as the loading concentration is increased (see Figure 3 in Gelinas et al., 2004) unless there is strong nonideal behavior, in which case the nonideality must be taken into account in the model and the fitting described subsequently (Stafford, 2009).

4. Plotting Sw vs. concentration helps to establish the range of sedimentation coefficients observed and thus the changes that are occurring in the underlying species distribution. Be warned that g(s) distributions and Sw values can be very misleading about the actual species present in the

John J. Correia and Walter F. Stafford

reacting boundary (Stafford, 2009). At this stage a hypothesis about an appropriate model is required to proceed to direct boundary fitting. Most software packages come with preloaded models of various types (noninteracting and interacting, definite and indefinite association). Sedanal offers ModelEditor, which allows the user to build any model desired, up to 25 reactions, 26 species, and 10 components, in addition to the isodesmic type models. Typically, users begin with their preconceived ideas about the system.

5. For noninteracting systems, global fitting to molecular weight should quickly reveal monodisperse behavior (i.e., a monomer or a dimer). Multiple peaks or overlapping peaks require two-component or three-component noninteracting models. For small amounts of large aggregate, choose scans late in the run to exclude those aggregates from the fitting. (F-stat analysis on these minor components typically reveals very poorly determined s values for small amounts of aggregation.) In the therapeutic proteins industry, analysis of aggregation is an important and central aspect of the study, as aggregates can be the cause of therapeutic instability and undesired side effects (Arakawa, 2006). For interacting systems, we mostly try to exclude or account for aggregates in the model so that we can extract stoichiometric and thermodynamic information about the system.

6. For a simple monomer-dimer system, the boundary should skew to the faster sedimenting species as concentration is raised. It helps to recognize the expected shape of the distribution for different models (see Figs. 15.1B and 15.4), and simulations are useful to establish the expected behavior. As described previously, one expects s2 to be s1 × 2^(2/3). If Sw exceeds this value, then consider higher-order reaction schemes. In a two-species plot this generates upward curvature (Roark and Yphantis, 1969).

7. We have said nothing about calculating extinction coefficients or the buoyancy term, (1 − v̄ρ), for two-component systems or the density increment, (∂ρ/∂c2)µ, for multiple-component systems. In the simulations shown above these values were based on typical data for tubulin, an extinction coefficient of 1.2 ml/mg/cm, and a reasonable (1 − v̄ρ) value for proteins, 0.257. (Sedanal defines the buoyancy term as density increment in the ModelEditor.) These values can be different for each data set, especially for different wavelengths or path lengths or for heterogeneous multicomponent systems (see step 8 subsequently). Most users seem to estimate these values from amino acid compositions using a program like Sednterp. When doing runs on highly charged macromolecules or in the presence of osmolytes, preferential interaction effects or osmotic stress effects can influence these values dramatically (Eisenberg, 2002). For these cases, direct measurement of density, partial specific volume, or density increment with an Anton Paar Density Meter (DMA 5000) is preferable.
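To make the sensitivity to the buoyancy term concrete, here is a small Python sketch. It assumes a two-component system in which the fit effectively determines the buoyant mass M(1 − v̄ρ); the input values v̄ = 0.743 ml/g and ρ = 1.000 g/ml are illustrative choices that reproduce the 0.257 quoted above, not values taken from the simulations, and the function names are hypothetical.

```python
def buoyancy_factor(vbar_ml_per_g, density_g_per_ml):
    """Return the buoyancy term (1 - vbar * rho) for a two-component system."""
    return 1.0 - vbar_ml_per_g * density_g_per_ml

def apparent_mass_scale(vbar_true, vbar_assumed, density_g_per_ml):
    """Factor by which a fitted molar mass is scaled when vbar_assumed is used
    in place of vbar_true, assuming the fit fixes the buoyant mass M*(1 - vbar*rho)."""
    return (buoyancy_factor(vbar_true, density_g_per_ml)
            / buoyancy_factor(vbar_assumed, density_g_per_ml))

if __name__ == "__main__":
    rho = 1.000   # g/ml, illustrative solvent density (assumption)
    vbar = 0.743  # ml/g, illustrative partial specific volume (assumption)
    print(f"(1 - vbar*rho) = {buoyancy_factor(vbar, rho):.3f}")
    # Effect of overestimating vbar by about 1% on the fitted molar mass.
    print(f"mass scale factor = {apparent_mass_scale(vbar, 0.750, rho):.3f}")
```

Because the buoyancy term is a small difference between numbers near one, a modest error in v̄ or ρ is amplified in the fitted molar mass.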

In fact, our experience is that even differences between batches of buffer can change the best-fitted s values obtained in global direct boundary fitting. This can show up as slight systematic differences between the different sets of data. Each experimental situation and system should dictate what is or is not necessary. A typical inverse use of Sedanal for monodisperse systems is to constrain the molar mass to the correct value, when known, and fit for the density increment just to be sure it is in a reasonable range of values. Extinction coefficients, density, and buoyancy values must be known or assumed to start fitting data, so compile information early.

8. Estimation of extinction coefficients from amino acid composition works very well at 280 nm (Laue et al., 1992; Pace et al., 1995). But what do you do when you collect data at 290 nm or 230 nm, or better yet at the minimum typically found near 250 nm? Many users collect a wavelength scan and then determine the absorbance ratio between 280 nm and the wavelength of the unknown extinction. However, this is not typically of sufficient accuracy for direct boundary fitting. A direct approach for a noninteracting system is to acquire and fit data simultaneously from both wavelengths, constraining the 280-nm extinction value and floating the unknown extinction value while linking the fitted concentration in each cell. Do this over multiple samples and take an average value for use in subsequent analysis. For proteins labeled chemically or with a GFP construct, this procedure may have to be repeated for every new preparation due to differences in labeling stoichiometry or GFP-folding efficiency.

9. Start with reasonable parameter estimates and constrain as many parameters as possible to begin direct boundary fitting. This was done earlier by fitting for K2 while constraining s1 and s2, even if they are only guesses. Current versions of Sedanal allow up to 32 data sets for velocity analysis.
It is often sufficient to start with 3 to 6 data sets to speed up the fitting and get an initial verification that the model is appropriate. Starting with 200 points in the grid between the meniscus and the base of the cell also speeds up the fitting. This should be increased later to 400 or 800 points to improve the resolution of parameter estimation. All the fits for Tables 15.1 and 15.2 were done with 800 points in each grid. Note, however, that for some models that generate high gradients near the meniscus and base regions of the cell, the number of points is a critical feature. For example, the isodesmic model often requires 2400 points, with more point density near the meniscus and still more point density in the base region (Sontag et al., 2004). (Grid spacing is under Claverie control in the “Advanced…” window.)

10. Good fits generate random residuals with rms noise levels in the range of 0.003 to 0.008 a.u. for absorbance data, depending upon the condition of the optics, the wavelength, and the path length of the cell. At this stage

adding more data sets or increasing the number of points in the grid will improve the stability of the numerical analysis. If there are not enough points or an inappropriate point density for a fit, a “Check Grid” error message will appear on the fit window screen.

11. Confidence intervals can be generated in three ways (found under error estimation control in the “Advanced…” window): (1) Monte Carlo, which is performed on simulated data, (2) bootstrap with replacement, and (3) F-statistic analysis. For example, the fit of the single 4-µM data set described earlier was repeated with bootstrap and generated a mean of 2.455 × 10⁵ M⁻¹ with a standard deviation of 0.01 × 10⁵. Recall that the F-stat result was 2.456 × 10⁵ M⁻¹ with a confidence interval of <2.29 × 10⁵, 2.64 × 10⁵>, a range of 0.35 × 10⁵. An F-stat analysis forces the fitting into regions of parameter error space that are more sensitive to cross-correlation effects typically not explored by bootstrap or Monte Carlo methods.³ An F-statistic for a particular control file is activated under the error estimation control button but must also be activated for each parameter you want to do statistics on by a shift-left-click with the cursor in that parameter box. F-stat analysis can be very slow, as it can take up to 32 consecutive fits (the default maximum for error searching is 16 fits below and 16 fits above the best-fit value) to explore error space above and below the best-fitted value. For this reason we usually do one parameter F-stat at a time and set up concurrent fits for different parameters on the same computer or on different computers. For example, some of the F-stats in the Table 15.2 fits were performed on s1, s2, K2, and koff, thus requiring four separate searches of error space. Done concurrently, four separate fits can take up to 24 h, even on a fast computer; done consecutively, this might take 2–3 days.
To run different fits on different computers involves copying 9 data sets, a control file (which defines the fit in the control window in Sedanal), and a Modelinfo.txt file. These can all be conveniently transferred between computers by clicking on Package on the Control screen; this creates a zip file containing all of the required files for an analysis. This zip file can be loaded on several different machines to carry out the F-stat analysis. (Poor man’s parallel processing.) We are currently working on parallelizing Sedanal as much as possible to run on multiprocessor PCs.

12. Upon obtaining good fits of the data, one can release constraints on limited numbers of parameters and then investigate their uncertainties.

³ Note that Monte Carlo and bootstrap (with or without replacement) methods have historically been used to estimate error bars by simulating and fitting many different noise-perturbed data sets or by fitting select or limited regions of an experimental data set. In general, F-stat approaches, first applied to sedimentation equilibrium data analysis in the program Nonlin (Johnson et al., 1981), give much larger and more asymmetric estimates of error bars than these other methods. One can think of a Monte Carlo or bootstrap method as exploring the size of the minimum in the error space and not the width at the F-stat limit. Parameter correlation effects account for the major differences between the approaches.
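As an illustration of what "bootstrap with replacement" means in practice, the Python sketch below resamples a toy data set and refits it many times. A straight-line fit stands in for the far more expensive Lamm-equation fit; the model, the noise level, and all names are assumptions made for illustration and do not reflect how Sedanal implements its bootstrap.

```python
import random
import statistics

def fit_slope(points):
    """Least-squares slope of y = a + b*x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def bootstrap_slope(points, n_resamples=500, seed=0):
    """Mean and standard deviation of the slope over bootstrap resamples."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_resamples:
        sample = rng.choices(points, k=len(points))  # draw with replacement
        if len({x for x, _ in sample}) < 2:
            continue  # degenerate resample: slope is undefined
        slopes.append(fit_slope(sample))
    return statistics.mean(slopes), statistics.stdev(slopes)

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: true slope 2.0 plus Gaussian noise (illustrative only).
    data = [(x, 2.0 * x + rng.gauss(0.0, 0.05)) for x in range(20)]
    mean_b, sd_b = bootstrap_slope(data)
    print(f"bootstrap slope = {mean_b:.4f} +/- {sd_b:.4f}")
```

Each resample is refit from scratch, so the spread reflects how the minimum moves with the noise. It does not step a parameter away from its best-fit value and re-minimize the others, which is the part of error space that the F-stat search explores.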

In the monomer-dimer examples above, after fitting K2 or K2 and koff we next floated s2 and then s1 and s2. Because s2 was well determined, it forced us to trust those estimates rather than impose some other constraint on the system. Alternatively, as s1 was much less well determined, one might consider fixing s2 and constraining s1 by an n^(2/3) rule. The rms of the fit is the ultimate determinant of the best fit, although one prefers to avoid parameter values that contradict hydrodynamic principles (e.g., having species sediment faster than physically possible for their size, shape, and partial specific volume). In Sednterp this shows up as s > smax error messages. (In Sednterp, the inquiring user can type in the molecular weight of a protein or proposed complex and then predict the s value and, under Results, look at the axial ratio required to generate this hydrodynamic behavior.)

13. What can go wrong? Nonrandom residuals imply the wrong model at worst or, at best, incorrect parameters for the correct model. A lot of computer work usually has to be done to reject a model. One must first explore parameter values; therefore, changing the guess for both fitted and fixed parameters is a good starting place. Be systematic, using the flow chart described previously for sedimentation coefficients as an example. In the well-behaved cases shown earlier, all fits were relatively reasonable, even for incorrect s1 values. Ideally, convergence of the fit to the correct values improves the goodness of fit, and the confidence intervals verify the assignments. As previously, the uncertainty in these cases requires a realistic evaluation of the confidence intervals. Furthermore, sedimentation velocity may not be the best way to measure kinetics for many systems, although we suggest that, when self-association or macromolecular interactions are involved, sedimentation studies are certainly an extremely useful technique for complementing other kinetic methods (see Eccleston et al., 2008).

14. What if the residuals are nonrandom and nothing seems to improve them? The best-case scenario is that the model is basically correct (e.g., monomer-dimer) but there is heterogeneity in the sample. Maybe there is incompetent monomer (Snyder et al., 2004) or aggregated or disulfide cross-linked dimer present. This can be seen as a trailing or leading deviation in the residuals. This is where ModelEditor is very useful. Make a monomer-dimer model with a second component that is noninteracting (2A ⇌ A2; and A′) and refit the data. If there are deviations in the residuals on the lower s side of the sedimentation boundary, then assume it is a monomer; if the deviations are on the higher s side, assume it is a dimer; and fit for s, initially assuming the concentration of A′ is a global parameter present at the same fractional amounts in all samples. This constraint can be released later. Then float the molecular weight to verify the identity of the contaminant. A more complex case might involve heterogeneity in K2, where different isoforms have

different association constants, or sample aging produces a distribution of affinities. All of these situations show up as a systematic variation in K2 with the affinity decreasing with increasing loading concentration (Yphantis et al., 1978; Xu, 2004).
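The n^(2/3) rule used in steps 6 and 12 is easy to check numerically. The Python sketch below assumes oligomers of the same shape and partial specific volume as the monomer, which is the condition under which the scaling holds; the function names and the 5.0 S monomer value are illustrative assumptions.

```python
def oligomer_s(s_monomer, n):
    """Expected s value of an n-mer under the n^(2/3) scaling rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s_monomer * n ** (2.0 / 3.0)

def exceeds_dimer_limit(s_weight_avg, s_monomer, tolerance=0.0):
    """True if an observed weight-average s is above the expected dimer value,
    which would suggest considering higher-order reaction schemes."""
    return s_weight_avg > oligomer_s(s_monomer, 2) + tolerance

if __name__ == "__main__":
    s1 = 5.0  # S, illustrative monomer sedimentation coefficient (assumption)
    print(f"expected s2 = {oligomer_s(s1, 2):.3f} S")
    print("consider higher-order schemes:", exceeds_dimer_limit(8.5, s1))
```

The same function can supply a constraint on s1 when s2 is well determined, by dividing the fitted s2 by 2^(2/3).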

6. Final Thoughts

This simple tutorial is hopefully a useful tool for outlining how to analyze velocity data collected in the analytical ultracentrifuge by direct boundary fitting. As the system becomes more complex, simple approaches still apply, although each additional component must be studied with the same care and precision. The more components, the more species, the more parameters, the more cross-correlation, and the more difficult it is to arrive at a single simple solution. (This is where 32 data sets may become important.) Constraining parameters becomes more critical. Systems biology approaches can apparently simulate anything, but the uniqueness of the parameter set remains a challenging open question (Alon, 2007). Use the literature, and learn from the past. Our ongoing research projects include Ca²⁺-mediated nonideal dimerization, heteroassociations (1:1 and 2:1), antibody-antigen interactions, phosphorylation-dependent transcription factor associations, ligand-mediated oligomerization, and indefinite associations. All of these systems are being approached with direct boundary-fitting methods. The strategies described here apply.

There is extensive mention in the data analysis literature of the best model or the simplest model that describes the data. In the AUC field this has come to be known as the most parsimonious model, applying the principles of Occam’s razor (Brookes and Demeler, 2008; Brown et al., 2007), which is based on the idea that the simplest model (the model with the fewest assumptions) that fits the data (in the rms sense) is the best model. If more parameters do not improve the fit, then exclude them (shave them away as with a razor), ignore them, or constrain them in some reasonable way. This was the case in the equilibrium and kinetically mediated data sets, where s1 had a surprisingly small impact on the other parameters or on the goodness of fit. For these data, K2 and koff were fairly well determined. K2 was consistently within a factor of two of the correct answer, while koff was typically within the error bars of the uncertainty, with caveats about parameter correlations and not being too close to the edges of the 0.005 to 1.0 × 10⁻⁵ s⁻¹ experimentally measurable regime. May this success be your good fortune in many of the systems you study as well.
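One common way to quantify whether "more parameters improve the fit" is the extra-sum-of-squares F ratio for nested least-squares models. The Python sketch below is offered as an assumption about how such a comparison might be set up; it is not described in this chapter as the procedure used in Sedanal, and the function name and the example numbers are hypothetical.

```python
def f_ratio(rss_simple, p_simple, rss_complex, p_complex, n_points):
    """Extra-sum-of-squares F ratio for two nested least-squares fits.

    rss_* : residual sum of squares of each fit
    p_*   : number of fitted parameters in each model
    n_points : number of data points used in both fits
    """
    if p_complex <= p_simple:
        raise ValueError("the complex model must have more parameters")
    if n_points <= p_complex:
        raise ValueError("not enough data points for the complex model")
    numerator = (rss_simple - rss_complex) / (p_complex - p_simple)
    denominator = rss_complex / (n_points - p_complex)
    return numerator / denominator

if __name__ == "__main__":
    # Hypothetical residual sums of squares for a 2- and a 3-parameter fit.
    ratio = f_ratio(rss_simple=12.0, p_simple=2,
                    rss_complex=10.0, p_complex=3, n_points=103)
    print(f"F ratio = {ratio:.1f}")
```

The ratio is compared with a critical F value for (p_complex − p_simple, n_points − p_complex) degrees of freedom, taken from tables or a statistics package. A ratio below the critical value argues for keeping the simpler model.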

ACKNOWLEDGMENTS

We thank our collaborators and their and our experimental systems that over the years have challenged and taught us how to problem solve with AUC data. We thank Nichola Garbett, David Dignam, and Jim Cole for reading earlier versions of this chapter. The authors regret the need to be selective in referencing and apologize for all the omissions necessitated by the ever-expanding work in this field.

REFERENCES

Alon, U. (2007). “An introduction to systems biology: Design principles of biological circuits.” Chapman & Hall/CRC Mathematical and Computational Biology Series, London.
Arakawa, T., Philo, J. S., Ejima, D., Tsumoto, K., and Arisaka, F. (2007). Aggregation analysis of therapeutic proteins, part 2. Bioprocess International 5, 36–47.
Belford, G. G., and Belford, R. L. (1962). Sedimentation in chemically reacting systems. II. Numerical calculations for dimerization. J. Chem. Phys. 37, 1926–1932.
Bernasconi, C. F. (1976). “Relaxation kinetics,” pp. 14–15. Academic Press, New York.
Brookes, E., and Demeler, B. (2007). Parsimonious regularization using genetic algorithms applied to the analysis of analytical ultracentrifugation experiments. GECCO Proc. ACM. 978-1-59593-697.
Brown, P., Balbo, A., and Schuck, P. (2007). Using prior knowledge in the determination of macromolecular size-distributions by analytical ultracentrifugation. Biomacromol. 8, 2011–2024.
Byron, O. (2008). Hydrodynamic modeling: The solution conformation of macromolecules and their complexes. Methods Cell Biol. 84, 327–373.
Cann, J. R. (1970). “Interacting macromolecules: The theory and practice of their electrophoresis, ultracentrifugation, and chromatography.” Academic Press, New York.
Cann, J. R., and Kegeles, G. (1974). Theory of sedimentation for kinetically controlled dimerization reaction. Biochemistry 13, 1868–1874.
Cann, J. R. (1978a). Measurement of protein interactions mediated by small molecules using sedimentation velocity. Methods Enzymol. 48, 242–248.
Cann, J. R. (1978b). Ligand binding by associating systems. Methods Enzymol. 48, 299–307.
Chen, W., Lam, S. S., Srinath, H., Jiang, Z., Correia, J. J., Schiffer, C. A., Fitzgerald, K. A., Lin, K., and Royer, W. E., Jr. (2008). Interferon regulatory factor activation revealed by the crystal structure of dimeric IRF-5. Nature Struct. & Mol. Biol. 15, 1213–1220.
Claverie, J. M. (1976). Sedimentation of generalized systems of interacting particles. III. Concentration dependent sedimentation and extension to other transport methods. Biopolymers 15, 843–857.
Claverie, J. M., Dreux, H., and Cohen, R. (1975). Sedimentation of generalized systems of interacting particles. I. Solution of systems of complete Lamm equations. Biopolymers 14, 1685–1700.
Correia, J. J. (2000). The analysis of weight average sedimentation data. Methods Enzymol. 321, 81–100.
Correia, J. J., Chacko, B. M., Lam, S. S., and Lin, K. (2001). Sedimentation studies reveal a direct role of phosphorylation in Smad3:Smad4 homo- and hetero-trimerization. Biochemistry 40, 1473–1482.
Correia, J. J., Gilbert, S. P., Moyer, M. L., and Johnson, K. A. (1995). Sedimentation studies on the kinesin head domain constructs K401, K366 and K341. Biochemistry 34, 4898–4907.

Correia, J. J., Johnson, M. L., Weiss, G. H., and Yphantis, D. A. (1976). Numerical study of the Johnson–Ogston effect in two component systems. Biophysical Chem. 5, 255–264.
Cox, D. J. (1978). Calculation of simulated sedimentation velocity profiles for self-associating solutes. Methods Enzymol. 48, 212–242.
Dam, J., Velikovsky, C. A., Mariuzza, R. A., Urbanke, C., and Schuck, P. (2005). Sedimentation velocity analysis of heterogeneous protein-protein interactions: Lamm equation modeling and sedimentation coefficient distributions c(s). Biophys. J. 89, 619–634.
Demeler, B., and Saber, H. (1998). Determination of molecular parameters by fitting sedimentation data to finite-element solutions of the Lamm equation. Biophys. J. 74, 444–454.
Dishon, M., Weiss, G. H., and Yphantis, D. A. (1966). Numerical solutions of the Lamm equation. I. Numerical procedure. Biopolymers 4, 449–456.
Dishon, M., Weiss, G. H., and Yphantis, D. A. (1967). Numerical solutions of the Lamm equation. III. Velocity centrifugation. Biopolymers 5, 697–713.
Eccleston, J. F., Martin, S. R., and Schilstra, M. J. (2008). Rapid kinetic techniques. Methods Cell Biol. 84, 445–477.
Eisenberg, H. (2002). Modern analytical ultracentrifugation in protein science: Look forward, not back. Protein Sci. 11, 2647–2649.
Gelinas, A. D., Toth, J., Bethoney, K. A., Stafford, W. F., and Harrison, C. J. (2004). Mutational analysis of the energetics of the GrpE-DnaK binding interface: Equilibrium association constants by sedimentation velocity analytical ultracentrifugation. J. Mol. Biol. 339, 447–458.
Gilbert, G. A. (1955). General discussion. Discuss. Faraday Soc. 20, 65–77.
Gilbert, G. A. (1959). Sedimentation and electrophoresis of interacting substances. I. Idealized boundary shape for a single substance aggregating reversibly. Proc. Roy. Soc. London Ser. A 250, 377–388.
Gilbert, G. A. (1960). Concentration-dependent sedimentation of aggregating proteins in the ultracentrifuge. Nature 186, 882–883.
Gilbert, G. A., and Jenkins, R. C. Ll. (1959). Sedimentation and electrophoresis of interacting substances. II. Asymptotic boundary shape for two substances interacting reversibly. Proc. Roy. Soc. London Ser. A 253, 420–437.
Gilbert, L. M., and Gilbert, G. A. (1978). Molecular transport of reversibly reacting systems: Asymptotic boundary profiles in sedimentation, electrophoresis, and chromatography. Methods Enzymol. 48, 195–213.
Johnson, M. L., Correia, J. J., Halvorson, H., and Yphantis, D. A. (1981). Analysis of data from the analytical ultracentrifuge by nonlinear least-squares techniques. Biophysical J. 36, 575–588.
Kegeles, G. (1978). Pressure-jump light-scattering observations of macromolecular interaction kinetics. Methods Enzymol. 48, 308–320.
Kegeles, G., and Cann, J. (1978). Kinetically controlled mass transport of associating-dissociating macromolecules. Methods Enzymol. 48, 248–270.
Kegeles, G., Rhodes, L., and Bethune, J. L. (1967). Sedimentation behavior of chemically reacting systems. PNAS 58, 45–51.
Laue, T. M., Shah, B. D., Ridgeway, T. M., and Pelletier, S. L. (1992). Computer-aided interpretation of analytical sedimentation data for proteins. In “Analytical ultracentrifugation in biochemistry and polymer science” (S. E. Harding, et al., eds.), pp. 90–125. Royal Society of Chemistry, Cambridge, UK.
Nelder, J. A., and Mead, R. (1965). A simplex method for function minimization. Comput. J. 7, 308–313.

Oberhauser, D. F., Bethune, J. L., and Kegeles, G. (1965). Countercurrent distributions of chemically reacting systems: IV. Kinetically controlled dimerization in a boundary. Biochemistry 4, 1878–1884.
Pace, C. N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995). How to measure and predict the molar absorption coefficient of a protein. Prot. Sci. 4, 2411–2423.
Philo, J. S. (2000). A method for directly fitting the time derivative of sedimentation velocity data and an alternative algorithm for calculating sedimentation coefficient distribution functions. Anal. Biochem. 279, 151–163.
Philo, J. S. (2006). Improved methods for fitting sedimentation coefficient distributions derived by time-derivative techniques. Anal. Biochem. 354, 238–246.
Roark, D. E., and Yphantis, D. A. (1969). Studies of self-associating systems by equilibrium ultracentrifugation. Ann. N.Y. Acad. Sci. 164, 245–278.
Schuck, P. (1998). Sedimentation analysis of noninteracting and self-associating solutes using numerical solutions to the Lamm equation. Biophys. J. 75, 1503–1512.
Schuck, P., and Demeler, B. (1999). Direct sedimentation analysis of interference optical data in analytical ultracentrifugation. Biophys. J. 76, 2288–2296.
Snyder, D., Lary, J., Chen, Y., Gollnick, P., and Cole, J. L. (2004). Interaction of the trp RNA-binding attenuation protein (TRAP) with anti-TRAP. J. Mol. Biol. 338, 669–662.
Sontag, C. A., Stafford, W. F., and Correia, J. J. (2004). A comparison of weight average and direct boundary fitting of sedimentation velocity data for indefinite polymerizing systems. Biophys. Chem. 108, 215–230.
Sophianopoulos, A. J., and van Holde, K. E. (1964). Physical studies of muramidase (lysozyme). II. pH-dependent dimerization. J. Biol. Chem. 239, 2516–2524.
Stafford, W. F. (1980). Graphical analysis of nonideal monomer, N-mer, isodesmic, and type II indefinite self-associating systems by equilibrium ultracentrifugation. Biophys. J. 29, 149–166.
Stafford, W. F. (1992). Boundary analysis in sedimentation transport experiments: A procedure for obtaining sedimentation coefficient distributions using the time derivative of the concentration profile. Anal. Biochem. 203, 295–301.
Stafford, W. F. (1998). Time difference sedimentation velocity analysis of rapidly reversible interacting systems: Determination of equilibrium constants by non-linear curve fitting procedures. Biophys. J. 74, A301.
Stafford, W. F. (2000). Analysis of reversibly interacting macromolecular systems by time derivative sedimentation velocity. Methods Enzymol. 323, 302–325.
Stafford, W. F. (2009). Protein-protein and ligand-protein interactions studied by analytical ultracentrifugation. In “Protein structure, stability, and interactions” (J. W. Schriver, ed.), vol. 490, pp. 83–113. Humana Press, New York.
Stafford, W. F., and Sherwood, P. J. (2004). Analysis of heterologous interacting systems by sedimentation velocity: Curve fitting algorithms for estimation of sedimentation coefficients, equilibrium and kinetic constants. Biophys. Chem. 108, 231–243.
Svedberg, T., and Nichols, J. B. (1927). The application of the oil turbine type of ultracentrifuge to the study of the stability region of CO-hemoglobin. J. Am. Chem. Soc. 49, 2920–2934.
Tai, M., and Kegeles, G. (1984). A micelle model for the sedimentation behavior of bovine beta-casein. Biophys. Chem. 20, 81–87.
Todd, G. P., and Haschemeyer, R. H. (1981). General solution to the inverse problem of the differential equation of the ultracentrifuge. PNAS 78, 6739–6743.
van Holde, K. E. (1962). Sedimentation in chemically reacting systems. I. The isomerization reaction. J. Chem. Phys. 37, 1922–1926.
van Holde, K. E. (2004). Sedimentation equilibrium and the foundations of protein chemistry. Biophys. Chem. 108, 5–8.

Xu, Y. (2004). Characterization of macromolecular heterogeneity by equilibrium sedimentation techniques. Biophys. Chem. 108, 141–163.
Yphantis, D. A., and Roark, D. E. (1972). Equilibrium centrifugation of nonideal systems. Molecular weight moments for removing the effects of nonideality. Biochemistry 11, 2925–2934.
Yphantis, D. A., Correia, J. J., Johnson, M. L., and Wu, G. M. (1978). Detection of heterogeneity in self associating systems. In “Physical Aspects of Protein Interactions” (N. Catsimpoolas, ed.), pp. 275–303. Elsevier, New York.
Zhao, H., and Beckett, D. (2008). Kinetic partitioning between alternative protein-protein interactions controls a transcriptional switch. J. Mol. Biol. 380, 223–236.

Author Index

A Abel, M. G., 66 Ackerman, A., 217 Ackers, G. K., 46, 47, 141, 193, 195, 196, 200, 205, 206, 207, 209, 210, 211, 406 Adair, B., 217 Adair, G. S., 130, 198, 203, 204, 210 Adamian, L., 229 Afferni, C., 291 Agah, S., 286, 288 Ahn, K. W., 330, 333, 340, 355 Aksel, T., 95 Aldrich, R., 406 Alexander, P., 301 Ali, M. R., 331, 339, 350, 352, 355, 356, 358, 359 Alon, U., 442 Alva, V., 300 Alvarez, L. J., 161 Amstutz, P., 100, 119 Ananthanarayanan, S., 159 Anderson, C. F., 73, 74, 78 Anderson, T. G., 352, 359 Andrec, M., 241 Antao, V. P., 383, 384 Applequist, J., 97 Apweiler, R., 300 Arakawa, T., 437, 438 Arisaka, F., 437, 438 Armen, R. S., 300 Armstrong, R. N., 21 Arnemann, J., 54, 62 Arora, A., 214, 223, 224 Arrowsmith, C. H., 399 Arseniev, A. S., 222 Auton, M., 103 B Babitzke, P., 381 Babu, C. R., 303 Bagheri-Yarmand, R., 243 Bagshaw, C. R., 158 Bahadur, R. P., 400, 410 Bailar, J. C., 78 Bailly, A., 54, 62 Bain, A., 239, 244 Bain, D. L., 43, 44, 45, 46, 47, 48, 51, 52, 53, 54, 56, 59, 60, 61, 63, 64, 66, 67

Baines, I. C., 161 Baker, B. M., 410 Baker, T. S., 398 Balasenthil, S., 243 Balbo, A., 442 Baldwin, R. L., 30, 32, 103 Baldwin, S. A., 224 Baldwin, T. O., 18, 21 Bancroft, J. B., 409 Barbar, E., 236, 237, 238, 241, 242, 243, 244, 246, 247, 248, 249, 250, 251 Barcroft, J., 198, 201, 203 Bare, E., 219, 220, 222, 228 Barford, D., 124 Baritt, P., 304, 306 Barletta, B., 291 Barrick, D., 95, 97, 103, 114, 116, 123 Barshop, B. A., 32 Bateman, A., 300 Baumann, R., 241 Bax, A., 240 Beato, M., 54, 62 Beck, D. A., 300 Beckett, D., 425, 428, 435 Beckwith, J., 218 Bedouelle, H., 21 Beechem, J. M., 262 Belford, G. G., 420 Belford, R. L., 420 Belknap, B., 178 Benevolenskaya, E., 248 Benison, G., 237, 238, 241, 243, 244, 246, 247, 248, 249, 250, 251 Berchtold, M. W., 261 Berg, J. S., 158 Berkholz, D., 241 Berland, C. R., 276 Bernasconi, C. F., 430, 431 Berton, N., 43, 44, 52, 54 Bethoney, K. A., 425, 429, 435, 437 Bethune, J. L., 420 Bevilacqua, P. C., 365, 366, 368, 369, 370, 372, 373, 374, 375, 376, 377, 378, 381, 383, 384, 385, 386, 390 Bevington, P. R., 277, 280 Bezrukov, S. M., 230 Bibbs, L., 396 Bieri, O., 30, 32 Billeter, M., 239, 240

Biltonen, R., 16 Bina, M., 398 Bina-Stein, M., 72 Binz, H. K., 100, 103, 117, 119, 120, 121 Bishop, J. A., 78 Blose, J. M., 377 Bocquel, M. T., 42, 65, 67 Bohr, C., 201 Bolen, D. W., 20, 103 Bona, D., 22 Bonhivers, M., 224 Booth, P., 214, 219, 220 Borer, P. N., 381, 382 Bose, K., 19, 21, 27 Botelho, A. V., 230 Bothner, B., 396 Botstein, D., 412 Boulanger, P., 224 Bourne, C., 406, 410, 412 Bourne, C. R., 400, 408 Bourne, P. E., 300, 315, 318 Bowers, K. E., 72 Bowie, J., 219, 220, 222, 227, 228 Bowie, J. U., 20, 213, 228, 231 Bragg, J., 97, 100, 104 Brand, H., 42 Brandts, J., 16 Brandts, J. F., 146, 148, 149 Brenner, S. E., 300, 309, 312 Brenowitz, M., 45, 47 Brockwell, D. J., 224 Brodie, J., 67 Bromberg, S., 367 Brookes, E., 442 Brooks, C. L. III., 397, 400 Brosig, B., 218, 228 Brown, A. G., 22 Brown, D. A., 330 Brown, F., 154, 158, 159, 170, 171, 174, 179, 180 Brown, M. F., 230 Brown, P., 442 Brown, T. S., 369, 370, 385 Bruinsma, R. F., 406 Bruller, H.-J., 54, 62 Brune, M., 178, 179 Brush, S. G., 96 Bryan, P., 301 Brzeska, H., 161 Buboltz, J. T., 331, 332, 338, 339, 340, 355 Buczek, P., 148, 149 Bugajska-Schretter, A., 291 Bukhman, Y. V., 366, 367, 380, 388 Burgess, N. K., 223 Burkard, M. E., 382, 383 Burley, S. K., 229 Burnside, B., 159 Bustamante, C., 72

Butteroni, C., 291 Bycroft, M., 229 Byron, O., 427 C Caliskan, G., 72, 92 Cammett, T. J., 100, 119 Candia, O. A., 161 Cann, J. R., 419, 420, 421, 430, 431, 436 Cantor, C. R., 380, 381 Cantor, R. S., 230 Cao, W., 159, 168, 172, 174, 180, 182, 186, 187, 188 Capaldi, A. P., 32, 33 Carter, P. J., 368 Casjens, S., 412 Caspar, D. L., 397, 398, 400 Cato, A. C. B., 54, 62 Cavaluzzi, M. J., 382 Cech, T. R., 379, 382, 385 Celio, M. R., 261 Ceres, P., 400, 404, 406, 410, 411, 412 Ceric, G., 300 Cerrone, A. L., 366, 368 Chacko, B. M., 425, 428 Chadalavada, D., 369, 385 Chadi, D. C., 229 Chaffotte, A. F., 30 Chaires, J. B., 19 Chalepakis, G., 54, 62 Chalovich, J. M., 161, 166 Chambat, G., 148 Chamberlain, A., 227 Chambon, P., 43, 44 Chandler, D., 397 Chandonia, J. M., 309, 312 Chang, W., 187 Changeux, J., 251 Chen, G. Q., 219 Chen, L., 315 Chen, M., 238, 243, 246 Chen, W., 430 Chen, Y., 215, 428, 441 Chen, Z., 43, 44 Cheney, R. E., 158 Cheng, K. H., 331, 339, 350, 352, 353, 355, 356, 358, 359, 360, 361 Cheng, N., 406, 413, 414 Cheung, H. C., 154, 158, 159, 170, 171, 174, 179, 180 Chialvo, A. A., 336 Chin, C., 215 Choma, C., 217 Chong, P. L., 330, 353, 360 Chothia, C., 300, 309 Chow, M., 414 Chowdhry, B. Z., 260 Chu, S., 379

Chung, L., 217 Clardy, J., 238, 241, 246 Clark, A. C., 1, 2, 18, 19, 21, 27, 34, 36 Clarke, A. R., 32 Clarke, R. J., 72 Claverie, J. M., 420 Cleland, W. W., 4 Cocco, M. J., 100, 118 Cochrun, L., 242, 243 Coggill, P. C., 300 Cohen, F. E., 2 Cohen, R., 420 Cohn, E. J., 205 Cole, J. L., 428, 441 Coles, M., 300 Colomer-Pallas, A., 224 Coluccio, L. M., 174 Connaghan-Jones, K. D., 43, 45, 46, 47, 48, 51, 52, 53, 54, 56, 59, 60, 63, 64, 66, 67 Connolly, T. N., 384 Conway, J. F., 406, 413, 414 Correia, J. J., 269, 277, 280, 419, 420, 421, 425, 426, 427, 428, 430, 439, 440, 442 Corrie, J. E., 179 Cortajarena, A. L., 103, 109, 115, 116, 117, 118 Cortes, D. M., 230 Courtemanche, N., 97, 103, 114, 116, 123 Cox, C., 382, 383 Cox, D. J., 420 Cox, J. A., 261 Creighton, T. E., 2, 32 Crick, F. H., 397, 398 Criddle, A. H., 166 Cristian, L., 218 Crothers, D. M., 72, 77, 97, 366 Crowther, R. A., 399, 400 Curnow, P., 219, 220 Curran, A., 215 D Daggett, V., 300 Dalma-Weiszhausz, D. D., 45 Dam, J., 420, 435, 436 Damle, V., 97 Damodaran, K. V., 400 D’Andrea, L., 100, 118 Dang, Q., 189 Danthi, P., 414 Dao, T. P., 223 Da Poian, A. T., 407 Davidson, A. R., 22, 32, 399 Davis, J. H., 330, 354 Davis, M. E., 291 Day, R., 300 Deber, C. M., 229 DeGrado, W. F., 217, 218, 228, 229 Deinzer, M. L., 21

De La Cruz, E. M., 97, 157, 158, 159, 160, 161, 163, 164, 165, 168, 170, 171, 172, 174, 175, 176, 178, 179, 180, 181, 182, 186, 187, 188 DeLano, W. L., 98 De Los Rios, M. A., 22 Demaille, J. G., 267 Demeler, B., 420, 437, 442 den Hollander, P., 243 Denk, W., 241 DePhillips, P., 406, 412 Derancourt, J., 267 Deres, K., 412 Desmadril, M., 224 Desrosiers, D. C., 100, 119 Dewey, T. G., 318 Di, F. G., 291 Di Cera, E., 141, 152, 370, 371, 376 Dierkes, L., 413, 414 Dignam, J. D., 19 Dill, K. A., 3, 367 Ding, H. J., 399 Di Nola, A., 229 Dirr, J. M., 21 Dishon, M., 420 Dixon, A., 215 Domanico, P. L., 408 Dong, X. F., 396, 399 Dose, A. C., 159 Dougherty, D. A., 217 Drak, J., 398 Draper, D. E., 71, 72, 73, 76, 77, 78, 92, 366, 367, 369, 380, 381, 388 Dreux, H., 420 Dryden, K. A., 399, 404, 410 Duarte, C. M., 366 Duda, R. L., 399, 406, 413, 414 Dunbrack, R. L., Jr., 301 Duong, M. T., 219, 228 Dupuy, A., 215 Durand, B., 43, 44 Dyson, H., 248, 249 E Earnshaw, W., 412 Eatough, D., 276 Eccleston, J. F., 158, 441 Eckstein, F., 158 Eddy, S. R., 300 Edelstein, S., 251 Edison, A., 238, 242, 246 Edsall, J. T., 200, 201, 203 Edwards, D. P., 53 Eftink, M. R., 3, 6, 32, 276 Eisenberg, E., 161, 166 Eisenberg, H., 438 Ejima, D., 437, 438

Eklund, K. K., 353, 360 El Mezgueldi, M., 158, 159, 175 Elofsson, A., 229 Endres, D., 396, 404 Engelman, D., 215, 216, 217, 218, 219, 222, 227, 228 Englander, S. W., 301 Enoki, S., 12 Eriksson, P., 54, 62 F Faham, S., 219, 220, 222, 228, 231 Fahreus, R., 203 Falzone, C. J., 383 Fan, J., 238, 242, 248, 250 Fane, B. A., 399 Farooq, A., 214, 219, 220 Farrow, N. A., 244 Faunt, L. M., 262 Fee, L., 439 Feeney, B., 27, 34, 36 Feerrar, J. C., 376, 377, 378 Feigenson, G. W., 331, 332, 334, 335, 337, 338, 339, 340, 344, 346, 347, 353, 355, 357, 361 Felitsky, D. J., 74 Ferguson, N., 32, 33 Ferreon, J. C., 301 Fersht, A. R., 229, 368 Fierke, C. A., 72 Fillmore, C., 399 Filmer, D., 209 Finch, J. T., 399 Finn, M. G., 400, 406, 410, 412 Finn, R. D., 300 Firbank, S., 123 Firek, B. A., 413, 414 Fitzgerald, K. A., 430 Flannery, B. P., 306, 310 Fleischmann, W., 300 Fleming, K. G., 217, 219, 223, 228 Flitsch, S., 214, 219, 220 Focke, M., 291 Forman-Kay, J. D., 244 Forrer, P., 100, 119 Forslund, K., 300 Foth, B. J., 158 Franden, M. A., 43, 53, 67 Franzen, J. S., 228 Frederick, K. K., 300 Freire, E., 127, 260, 302, 304 Frere, J. M., 27 Fresco, J. R., 381 Freyer, M. W., 260 Fridborg, K., 399 Frieden, C., 32, 189, 401 Friel, C. T., 22

Fritz, H. J., 218 Fukada, H., 263 G Galisteo, M. L., 406 Gallivan, J. P., 217 Ganem, D., 396, 410 Gannon, F., 42 Ganser, B. K., 399 Garcia-Moreno, E. B., 301, 304 Gaspar, L. P., 407 Geeves, M. A., 158, 166, 172, 174, 182 Gelbart, W. M., 406 Gelinas, A. D., 425, 429, 435, 437 Georgescu, R. E., 30 Gesteland, R. F., 381 Gewirth, D. T., 91 Gilbert, G. A., 420 Gilbert, L. M., 420 Gilbert, S. P., 178, 425, 428 Gilboa-Garber, N., 148 Gill, S. J., 44, 76, 128, 130, 131, 132, 141, 151, 152 Gloss, L. M., 18, 21 Gluick, T. C., 77, 366, 367, 380, 381, 388 Goedecke, M. C., 158 Gold, M., 399 Goldberg, M. E., 30 Goldberg, R. N., 263 Goldman, A., 215 Goldman, Y. E., 168 Goldmann, S., 412 Gollnick, P., 428, 441 Golmohammadi, R., 399 Gon˜i, G., 151 Good, N. E., 384 Goodpasture, E. A., 286 Goody, R. S., 158 Gorenstein, D. G., 301 Gorshkova, I., 148, 149 Gouaux, E., 219 Grabo, M., 217 Graef, E., 412 Granseth, E., 229 Gratkowski, H., 217, 228 Gray, D. M., 381 Gray, T., 439 Greene, L. H., 300 Greene, R. F., 12 Greene, R. F., Jr., 226 Gregor, I., 217 Grilley, D., 71, 72, 73, 76, 92 Grimsley, G. R., 4, 439 Grimsley, J. K., 21 Grishin, N. V., 300 Gronemeyer, H., 42, 43, 44, 65, 67 Gross, B., 54, 62

Gross, H. J., 72 Grosshans, C. A., 382 Groves, M. R., 124 Grutter, M. G., 123 Gu, J., 304, 306 Gue´ron, M., 77 Guillaume, G., 27 Guntert, P., 240 Guo, W., 238, 241, 246 Gurezka, R., 218, 228 Gustafsson, J.-A., 68 Gutfreund, H., 158 Gvozdev, V., 248 H Hach, R., 72 Hacker, H. J., 412 Hackney, D. D., 159, 168, 172, 174, 180, 186, 187, 188 Hagan, M. F., 397 Hager, G. L., 42 Hagerman, P., 366 Haiech, J., 267 Haile, J. M., 336 Hall, A., 251 Hall, J., 251 Halvorson, H. R., 196, 210, 269, 277, 280, 440 Hannemann, D. E., 159, 172, 174, 182, 187, 188 Hapak, R. C., 286 Harder, M. E., 21 Hare, M., 238, 242, 243, 248, 249 Harrison, C. J., 425, 429, 435, 437 Harrison, L. N., 18 Harrison, S., 412 Hartl, D., 248 Haschemeyer, R. H., 420 Hasselbalch, K. A., 201 Hayek, B., 291 Hays, T., 238, 242, 243, 244, 246, 247, 248, 249 Hedberg, L., 22 Heger, A., 300 Heidary, D. K., 34 Heizmann, C. W., 261 Helfgott, C., 90 Hempel, J., 413 Hendrix, R. W., 399, 406, 413, 414 Heneghan, A. F., 43, 44, 45, 46, 47, 48, 51, 52, 53, 54, 56, 59, 60, 61, 63, 64, 66, 67 Henn, A., 159, 165, 168, 172, 174, 178, 180, 186, 187, 188 Henzl, M. T., 259, 286, 288, 291 Henzler-Wildman, K., 300 Herschlag, D., 379, 385 Higuchi, H., 159

Hill, A. V., 141, 152, 198, 201, 203 Hill, T. L., 50 Hills, G. J., 409 Hilmer, J. K., 396 Hilser, V. J., 299, 301, 302, 303, 304, 305, 306, 319, 320 Hirao, I., 377 Hiratsuka, T., 173 Hodges, R. S., 286 Hogle, J. M., 414 Hokanson, D. E., 159, 174, 184 Holetzki, D., 412 Holm, L., 300, 312, 315 Holmes, K. C., 158 Holt, J. M., 141, 193, 195, 196, 205, 209 Holthauzen, L. M., 103 Hon, G., 309, 312 Hong, H., 213, 224, 227, 229, 230, 231 Hong, J., 74 Honig, B., 300 Hornby, J. A., 21 Horng, J. C., 22 Horton, N., 410 Horvath, M. P., 148, 149 Horwitz, K. B., 42, 43, 53, 65, 66, 67 Hotz, H. R., 300 Hovland, A. R., 42, 53, 65 Hristova, K., 217, 218, 227, 228 Hu, A., 159 Hu, D. D., 276 Huang, D., 243 Huang, J., 329, 331, 334, 335, 337, 338, 339, 340, 344, 346, 347, 350, 352, 353, 354, 355, 356, 357, 358, 359, 361 Huang, Y., 238, 242, 246 Hubbard, T., 300, 309 Huber, T., 230 Hubner, M. R., 42 Hu¨fner, G., 200, 201 Hunt, D., 413 Hunter, J. L., 179 Hurd, C., 44 Huysmans, G. H., 224 I Iacovacci, P., 291 Ibel, K., 220 Ichimura, K., 146 Igumenova, T. I., 300 Ikai, A., 2 Imberty, A., 148 Imhoff, D., 238, 242, 243 Inobe, T., 12 Ishido, Y., 377 Ishiwata, S., 159 Ising, E., 96 Izawa, S., 384

J

Jackson, S. E., 34, 100, 118 Jacobsen, B. M., 66 Jaenicke, R., 32 Jaffrey, S., 238, 241, 246 Jahnig, F., 223 James, M. N. G., 261 James, R., 32, 33 Janin, J., 400, 410 Jaszewski, T. M., 219, 228 Jayasinghe, S., 227 Jeffries, T., 166 Jenkins, R. C. L., 420 Jennings, P. A., 34 Jensen, G. J., 399 Jia, Y., 318 Jiang, H., 399 Jiang, Z., 430 Jiao, X., 382, 383 Jimenez, R. H., 227, 229, 231 Joh, N. H., 213, 228, 231 Johnson, A. D., 47, 399 Johnson, J. E., 396, 399, 400, 404, 410 Johnson, J. M., 396, 399, 404, 406, 408, 409, 410 Johnson, K. A., 158, 175, 177, 178, 188, 425, 428 Johnson, M. L., 44, 109, 112, 117, 196, 210, 262, 269, 277, 280, 420, 440, 442 Johnston, L. B., 406, 412 Jones, L., 217 K Kabsch, W., 303, 318 Kagi, U., 261 Kahn, J. D., 366 Kahne, D., 228 Kajander, T., 103, 109, 115, 116, 117, 118 Kajava, A. V., 98, 99 Kallenbach, N. R., 97 Kamagata, K., 12 Kan, H., 238, 242 Kandegedara, A., 384 Kaplan, W., 21 Kapp, G., 301, 304 Karplus, P. A., 246, 248, 250, 251 Katen, S., 395 Kawai, G., 377 Kawasaki, H., 261 Kawashima, S., 146 Kay, L. E., 244, 300 Kazemi, N., 230 Kebbekus, P., 366 Keelara, V., 195, 209 Kegel, W., 403 Kegeles, G., 419, 420, 430, 431, 436, 445 Keller, W., 291 Kelley, J. W., 2

Kendrick-Jones, J., 159 Kenig, M., 100, 103, 117, 119, 120, 121 Kent, C., 251 Kern, D., 300 Khare, D., 301 Khorana, H., 219 Kieffer, C., 399 Kiefhaber, T., 30, 32, 224 Kierzek, E., 377, 378, 383 Kierzek, R., 382, 383 Kihara, H., 146 Kim, S., 227, 228 Kinch, L. N., 300 King, J., 406, 412 King, S., 243 Kingsley, P. B., 331 Kinnunen, P. K., 353, 360 Kirschner, K., 220 Kishore, N., 263 Kisselev, L., 72 Kleanthous, C., 32, 33 Kleinman, B., 238, 242, 243 Kleinschmidt, J. H., 214, 216, 223, 224, 226 Klinger, A. L., 195, 209 Klishko, V. Y., 399 Kloda, A., 230 Kloss, E., 97, 116, 123 Klostermeier, D., 372, 374, 378, 379 Klotz, I. M., 228 Klotz, L. C., 381 Klug, A., 398 Klymenko, O. V., 34 Kobe, B., 99 Koebnik, R., 224 Koehl, P., 309, 312 Kolmar, H., 218 Kolodny, R., 300 Kontaxis, G., 240 Kopan, R., 109 Korba, B., 406, 410, 412 Koretke, K. K., 300 Korn, E. D., 161 Korolev, S., 109 Kos, M., 42 Koshland, D. E., 209 Kostlanova, N., 148 Kouyama, T., 159, 166 Kovacs, M., 159 Kraft, D., 291 Krakauer, H., 72 Kramer, T., 412 Kretsinger, R. H., 261 Kriventseva, E. V., 300 Kroenke, C., 239 Krogh, A., 201 Kumar, R., 67, 238, 243, 246 Kurtz, A. J., 304 Kuwajima, K., 12

L Laage, R., 228 Laakso, J. M., 159 Ladbury, J. E., 260 Lahmann, M., 148 Lai, S. Y., 383, 384 Laing, L. G., 77 Lakowicz, J. R., 3, 28 Lam, S. S., 430 Landolt-Marticorena, C., 229 Langosch, D., 218, 228 Lanzetta, P. A., 161 Larson, J. D., 286 Larson, S. A., 299, 304, 305, 306, 319, 320 Lary, J., 428, 441 Lathrop, R. H., 315 Lau, F., 219, 220, 222 Laue, T. M., 439 Law, P. B., 229 Lear, J. D., 217, 218, 228 Leavitt, S., 260 Lecomte, J. T., 300, 369, 370 Lee, A., 215, 408 Lee, J. C., 304 Lee, S., 406, 410, 412 Lee, Y. H., 286, 288 Leeds, J. A., 218 Lefstin, J. A., 62, 68 Lehnert, U., 215 Leid, M. E., 21 Lejeune, A., 27 Lenk, E., 412 Lennen, R. M., 263 Lenz, W., 96 Lerouge, T., 42, 65, 67 Leroy, J. L., 77 Lesk, A. M., 300 Leslie, A., 239 Leslie, A. G., 399, 400 Letellier, L., 224 Levitt, M., 309, 312 Lewis, E. A., 260 Lewis, J. H., 159, 168, 174, 184 Lewis, J. K., 396 Lewis, M., 410 Lewis, W. G., 400 Li, E., 217 Li, F., 228 Li, J. H., 30 Li, K., 400 Li, M., 238, 242, 243, 248 Li, S., 399 Liang, J., 229, 238, 241, 246 Lifson, S., 97 Liljas, L., 399 Lin, K., 430 Lin, L.-N., 146, 148, 149

Lin, T., 159, 168, 174, 184 Lin, W. B., 399 Liu, J. C. I., 78, 406, 409 Liu, T., 304 Liu, Z., 42 Lo, K., 238, 242, 248 Lo Conte, L., 309, 312 Lohman, T. M., 189 London, E., 219, 330, 352 Loria, J., 239 Lortat-Jacob, H., 148 Lowe, A. R., 100, 118 Lu, Y., 366 Lubman, O. Y., 109 Luck, S. D., 21 Ludwig, B., 217 Lumry, R., 16 Lundahl, P., 220 Luo, J. K., 21 Lupas, A. N., 300 Lustig, A., 217 Luxon, B. A., 301 Lymn, R. W., 158, 173, 174 Lynch, T. J., 161 M Ma, J., 238, 243, 246 MacKenzie, K. R., 219, 228 Main, E. R., 22, 100, 103, 115, 116, 117, 118 Maki, K., 12 Makino, S., 217 Makokha, M., 238, 242, 246, 248 Malinvemi, J., 228 Mallam, A. L., 34 Mallick, S., 300 Malmodin, D., 239, 240 Manceva, S., 168 Manly, B. F. J., 306 Mann, C. J., 6, 226 Manning, N. G., 66 Mari, A., 291 Mariuzza, R. A., 420, 435, 436 Marjoliash, E., 203 Markham, R., 409 Marsden, B. J., 286 Marsh, D., 230 Martell, A. E., 78, 81, 82 Martin, S. R., 441 Martinac, B., 230 Mascher, E., 220 Mason, A. B., 146, 148, 149 Mathews, D. H., 382 Matthews, C. R., 2, 6, 21, 30, 32, 226 Matthews, E., 215 Mattos, C., 34, 36 Maxwell, K. L., 22, 399 May, R. P., 220

Mayne, L. C., 301 McConnell, M. H., 352, 359, 360 McEwan, I. J., 67 McGhee, J. D., 97 McKenney, K. H., 148, 149 McKillop, D. F., 172 McManaman, J. L., 43, 53, 67 Mead, R., 421 Medina, M., 151 Megha, E., 352 Mello, C. C., 103, 123 Melville, M. Y., 42, 53, 65 Merz, T., 123 Metivier, R., 42 Metzger, S. L., 374, 378, 383, 384, 386, 390 Meyer, M. E., 42, 65, 67 Michel, H., 413 Michnick, S. W., 219 Mihashi, K., 159, 166 Mikhailenko, S. V., 159 Milam, S. L., 1, 34, 36 Miles, K., 355, 358, 361 Milgrom, E., 54, 62 Millar, D. P., 372, 374, 378, 379 Miller, E. J., 22 Milligan, R. A., 154, 158, 159, 170, 171, 174, 179, 180 Mills, F. C., 196, 210 Milne, J. S., 301 Min, A., 228, 231 Minor, D. L., Jr., 100, 119 Minor, W., 241 Minton, A. P., 205 Misra, V. K., 72, 73, 78, 92, 367, 369 Mistry, J., 300 Mitchell, E. P., 148 Mittermaier, A., 300 Mittl, P., 123 Miura, K., 377 Miura, M. T., 43, 44, 45, 46, 47, 48, 51, 52, 53, 54, 56, 59, 60, 61, 63, 64, 66, 67 Mochrie, S. G., 100, 103, 109, 115, 116, 117, 118 Moeck, G. S., 224 Mogensen, J. E., 221, 223, 224 Molloy, J. E., 159 Montelione, G., 238, 242, 246 Moody, A. D., 47, 48, 376, 377, 378 Moody, E. M., 369, 370, 372, 373, 374, 375, 376, 377, 378 Moore, J. E., 159 Moore, J. L., 148, 149 Moore, K. J., 189 Moore, P. B., 91 Mooseker, M. S., 187 Morton, R. T., 400 Mosavi, L. K., 100, 119 Moudgil, V. K., 44

Moyer, M. L., 425, 428 Mukherjee, S., 406, 409, 412 Munoz, V., 123 Murminskaya, M., 248 Murphy, K. P., 260, 398, 410 Murthy, V. L., 366 Murzin, A. G., 300, 309 Musina, L., 222 Myers, J. K., 20 N Nagaich, A. K., 42 Naisbitt, S., 248 Nakano, S., 366, 368, 369, 370 Nakatani, H., 12 Nanda, V., 229 Natarajan, P., 399, 400 Nath, U., 30 Natter, S., 291 Nelder, J. A., 421 Nemethy, G., 209 Nguyen, D., 243 Nguyen, H. D., 397 Ni, H., 78 Nicely, N. I., 34, 36 Nichols, J. B., 420 Niederberger, V., 291 Niewohner, U., 412 Nishimura, Y., 377 Niss, M., 96 Nissen, M., 53 Nockolds, C. E., 261 Norwood, S., 242, 243 Notides, A. C., 65 Noy, N., 43, 44 Nozaki, Y., 217 Nurminsky, D., 248 Nyame, Y., 404, 409 Nyarko, A., 238, 242, 243, 244, 246, 247, 248, 249, 250 O Oas, T. G., 301, 304 Oberhauser, D. F., 420 Oguchi, Y., 159 Ohki, T., 159 Okerberg, B., 400 Olivares, A. O., 159, 172, 174, 182, 187, 188 Oliveira, A. C., 407 Olsher, M., 360 O’Malley, B. W., 42, 53, 62, 67 Onate, S. A., 53 O’Neill, J. C., 34 Orban, J., 301 O’Reilly, L., 243 Orekhov, V., 222 Oroguchi, T., 12

Oscarson, S., 148 Ostap, E. M., 154, 157, 158, 159, 160, 165, 168, 170, 171, 174, 175, 176, 179, 180, 181, 184 Otwinowski, Z., 241 Otzen, D. E., 217, 219, 220, 221, 222, 223, 224 P Pace, C. N., 4, 6, 8, 10, 12, 19, 20, 21, 103, 114, 226, 439 Paessens, A., 412 Pain, R. H., 27 Palmer, A., 239 Pan, H., 304 Parent, K. N, 406, 413 Parente, A. D., 385 Park, J., 312 Park, Y. C., 21 Parker, A., 355, 358, 361 Parker, M. J., 32 Pastore, A., 291 Pauling, L., 196 Pauls, T., 261 Pauls, T. L., 261 Pechere, J. F., 267 Pei, J., 314 Pelletier, S. L., 439 Peng, Z. Y., 100, 119 Penot, G., 42 Pepper, D. S., 264 Peregrina, J. R., 151 Perlmann, T., 54, 62 Perozo, E., 230 Pervushin, K. V., 222 Petrey, D., 300 Petsko, G. A., 229 Pettijohn, D. E., 53 Pfeifer, C. M., 406, 409 Phale, P. S., 224 Phale, V. P., 224 Pham, H., 168 Philippsen, A., 224 Phillipson, P., 132 Philo, J. S., 262, 420, 422, 426, 427, 437, 438 Pickett, J. S., 72 Pini, C., 291 Pink, H., 90 Placek, B. J., 18 Platz, M. S., 34 Pleiss, U., 412 Pluckthun, A., 100, 103, 117, 119, 120, 121, 123 Poland, D., 97, 104 Pollard, T. D., 159, 161, 189 Pomethun, M. S., 304 Pop, C., 27 Popot, J., 215, 216, 222, 227 Popov, A. I., 222

Powell, B. C., 158 Prendergast, P., 53 Press, W. H., 306, 310 Prestegard, J., 228, 241 Prevelige, P. E., Jr., 399, 400, 407, 413 Prince, A. M., 396, 410 Privalov, P. L., 3 Proctor, D. J., 383 Puggioni, E. M., 291 Puglisi, J. D., 380, 381, 385 Punna, S., 400 Pursifull, N., 242, 243, 251 Puthalakath, H., 243 Pyle, A. M., 366 Q Qu, X., 19 Quirin-Stricker, C., 42, 65, 67 R Radford, S. E., 32, 33, 224 Radhakrishnan, A., 352, 359, 360 Rapaport, D. C., 397 Rapoport, T. A., 215 Ratliff, R. L., 381 Rauch, C., 54, 62 Rayala, S., 238, 243, 246 Rayment, I., 178, 179, 181 Record, M. T., 74 Record, M. T., Jr., 73, 78 Reddy, L., 217 Reddy, V. S., 397, 400 Reeves, R., 53 Regan, L., 100, 103, 109, 115, 116, 117, 118 Regenass, M., 217 Reguera, D., 403, 406 Reid, G., 42 Reinach, P. S., 161 Reiss, H., 403 Reithmeier, R. A., 229 Remy, I., 219 Renthal, R., 220, 222 Reshetnyak, Y., 215 Reynolds, J., 217 Rhodes, L., 420 Richards, E. G., 381 Richer, J. K., 66 Ridgeway, T. M., 439 Rinehart, D., 223, 224, 227, 229, 231 Rischel, C., 240 Roark, D. E., 426, 438 Robblee, J. P., 159, 172, 174, 182, 187, 188 Robinson, D. K., 277 Roder, H., 21 Rodier, F., 400, 410 Roig, A., 97 Ro¨mer, R., 72

Rorabacher, D. B., 384 Rose, G. D., 103, 366 Rosenbusch, J. P., 217, 224 Rosenfeld, S. S., 154, 158, 159, 170, 171, 172, 174, 175, 179, 180 Ross, P. D., 406, 413, 414 Rossmann, M. G., 399 Rostovtseva, T. K., 230 Roy, M., 34 Royer, C. A., 3, 6, 226 Royer, W. E., Jr., 430 Rudnick, J., 406 Rudolph, R., 32 Ruiz, N., 228 Russ, W., 218, 219, 227 Russell, R. B., 300 S Saber, H., 420 Sabina, J., 382 Safer, D., 160 Sahin, A., 243 Saito, Y., 12 Sakmar, T. P., 230 Sammut, S. J., 300 Sampson, N. S., 330, 333, 340, 355 Sander, C., 303, 312, 315 Sanker, S., 251 Sansom, M. S., 229 SantaLucia, J., Jr., 377, 378, 382, 383 Santoro, M. M., 20 Saroff, H. A., 205 Sartorius, C. A., 42, 53, 65 Satake, K., 146 Sattin, B. D., 379 Sauer, R. T., 20 Schaak, J. E., 381, 383 Schejter, A., 203 Schellman, J., 130, 141, 152 Schellman, J. A., 97 Scheraga, H. A., 97, 104, 123 Schiffer, C. A., 430 Schilstra, M. J., 441 Schimer, T., 224 Schimerlik, M. I., 21 Schimmel, P. R., 380, 381 Schmid, F. X., 4, 27, 32 Schmidt, M. A., 223, 224 Schmitz, S., 159 Schneemann, A., 399 Schneider, D., 218, 219 Scholtz, J. M., 4, 8, 12, 20, 21, 103 Scho¨n, A., 127 Schooler, J. B., 399 Schrader, W. T., 44 Schroder, C. H., 412 Schroeder, S. J., 382, 383

Schuck, P., 262, 420, 435, 436, 437, 442 Schwaller, B., 261 Schwartz, R., 396 Schwarz, F. P., 148, 149 Sehgal, P., 217, 220, 221 Seiberler, S., 291 Sellers, J. R., 159 Senchak, S. E., 369 Senear, D. F., 45, 47 Senes, A., 215, 229 Serebrov, V., 72 Serra, M. J., 383, 390 Serrano, L., 229 Settanni, G., 100, 103, 117, 119, 120, 121 Shabanowitz, J., 413 Shah, B. D., 439 Shastry, M. C. R., 21, 32 Shea, M. A., 47 Shemshedini, L., 43, 44 Sheng, M., 248 Sherwood, P. J., 419, 420, 423, 425, 426 Shevelyov, Y., 248 Shiman, R., 72, 369 Shindyalov, I. N., 300, 315, 318 Shuman, H., 159 Siegfried, N. A., 365, 374, 378, 383, 384, 386, 390 Sigurskjold, B. W., 276 Silhavy, T. J., 228 Silva, J. L., 407 Silverman, S. K., 379 Sinclair, J. F., 18, 21 Singh, R. M., 384 Singh, S., 406, 407, 408 Siuzdak, G., 396 Skafar, D. F., 46 Skolnick, J., 315 Slater, E., 54, 62 Smith, R. M., 78, 81, 82 Smith, T. J., 396 Snyder, D., 428, 441 Snyder, S., 238, 241, 246 Soldati, D., 158 Somerharju, P., 353, 360 Sompornpisut, P., 230 Song, C., 238, 243, 246 Song, Y., 238, 243, 244, 246, 247 Sonnhammer, E. L., 300 Sontag, C. A., 439 Sophianopoulos, A. J., 426 Soto, A. M., 71, 72, 73, 76, 78, 92 Soto, C., 2 Spencer, J., 32 Spitzauer, S., 291 Spudich, J. A., 159 Srinath, H., 430 Srinivasan, R., 366

Stafford, W. F., 419, 420, 423, 425, 426, 429, 435, 436, 437, 438, 439 Stahl, S. J., 396, 406 Stanley, A. M., 219, 223, 228 Stein, A., 72, 77 Steiner, D., 119 Steitz, T. A., 215 Stellwagen, N. C., 366 Steven, A. C., 406, 413, 414 Stevens, J. M., 21 Stevens, S., 251 Stewart, J. M., 103 Stoltefuss, J., 412 Strang, G., 107, 113 Strasser, A., 243 Strauss, U. P., 90 Stray, S. J., 400, 404, 408, 412 Street, T. O., 103, 114 Strynadka, N. C. J., 261 Stumpp, L. M., 100, 119 Suhanovsky, M. M., 413 Sundquist, W. I., 399 Surrey, T., 223 Svedberg, T., 420 Svedburg, T., 203 Svensson, B., 276 Svir, I. B., 34 Sweeney, H. L., 154, 158, 159, 160, 165, 168, 170, 171, 174, 175, 176, 179, 180, 181 Sykes, B. D., 286 Szabo, G., 223, 224, 228, 229, 231 Szadkowski, H., 220 Szewczak, A. A., 91 Szustakowski, J. D., 315 T Tachi’iri, Y., 146 Tai, M., 445 Takahashi, K., 12, 263 Takimoto, G. S., 42, 43, 53, 65, 67 Tamm, L. K., 213, 214, 216, 223, 224, 226, 227, 229, 230, 231 Tan, A., 291 Tanford, C., 2, 217, 230 Tang, J., 399, 404, 409, 410 Tang, N., 158, 159, 168, 175 Tang, Y., 315 Taniguchi, T., 146 Tanner, J. J., 286, 288 Tasayco, M. L., 30 Tate, J., 300 Tatusov, R. L., 300 Taylor, E. W., 158, 171, 172, 173, 174, 175, 177, 188 Taylor, W. R., 300, 315 Tellinghuisen, J., 280 Teschke, C. M., 406, 413

Teukolsky, S. A., 306, 310 Theofan, G., 65 Thomas, D., 217 Thompson, E. B., 67 Thorsteinsson, M. V., 406, 412 Tihova, M., 399 Tinghino, R., 291 Tinoco, I., Jr., 72, 380, 381, 383, 384, 385 Tjandra, N., 240 Tochio, H., 250 Todd, G. P., 420 Tomoyori, K., 12 Toth, J., 425, 429, 435, 437 Travers, K., 379 Trentham, D. R., 158 Treuter, E., 68 Tsai, M.-J., 42, 53, 62, 67 Tsai, S. Y., 42, 53 Tsang, S. K., 414 Tsumoto, K., 437, 438 Tung, L., 42, 53, 65 Turner, D. H., 366, 377, 378, 382, 383, 389, 390 Twardosz, A., 291 U Udgaonkar, J. B., 30 Uemura, S., 159 Ulmschneider, M. B., 229 Urbanke, C., 420, 435, 436 Utiyama, H., 30, 32 V Vadlamudi, R., 243 Vainio, P., 353, 360 Vajdos, F., 439 Valegard, K., 399 Valenta, R., 291 Vallee-Belisle, A., 22 van der Schoot, P., 403 Vangelista, L., 291 van Holde, K. E., 420, 426 Vanhove, M., 27 Veigel, C., 159 Velazquez-Campoy, A., 127, 151 Velikovsky, C. A., 420, 435, 436 Venkataiah, B., 406, 410, 412 Verdino, P., 291 Vertrees, J., 299, 304, 306 Vetterling, W. T., 306, 310 Villers, B. M., 18 Vincenz, C., 3 Virden, R., 27 Virtanen, J. A., 353, 360 Vist, M. R., 330, 354 Volk, D. E., 301 von Heijne, G., 229

von Hippel, P. H., 97 Voth, A., 242, 243 Vournakis, J. N., 97 Vulletich, D. A., 300 W Wachsstock, D. H., 189 Wada, A., 12 Waggoner, A., 3 Wagner, G., 241 Wagner, J. P., 53 Waksman, G., 109 Walker, D. A., 42 Walker, N. S., 309, 312 Wallace, L. A., 2, 21, 30, 32 Walters, J., 1 Walters, R. F., 229 Wand, A. J., 300, 303, 304 Wang, F., 159 Wang, L., 238, 248 Wang, S., 306 Wang, W., 238, 242 Ward, J. H., 319 Warnmark, A., 68 Watanabe, K., 377 Watson, J. D., 397, 398 Watt, S., 159 Webb, M. R., 161, 178, 179 Weber, C. H., 3, 34 Weber, G., 407 Weber, O., 412 Weers, B., 366 Weinrich, M., 230 Weiss, G. H., 420 Wells, A. L., 154, 158, 159, 168, 170, 171, 174, 179, 180 Wen, W., 238, 243, 246 Weng, Z., 315 Westritschnig, K., 291 Wetzel, S. K., 100, 103, 117, 119, 120, 121, 123 Whitaker, M., 154, 158, 159, 170, 171, 174, 179, 180 White, H. D., 178, 179, 181 White, S. A., 91 White, S. H., 227, 229 Whitelegge, J. P., 219, 220, 222, 228, 231 Whitten, S. T., 301, 304, 306 Widom, B., 349 Wikoff, W. R., 399 Wild, J. R., 21 Wildes, D., 22 Wilkinson, A. J., 368 Williams, K. A., 229 Williams, R. W., 406 Willits, D., 404, 409 Wills, N. M., 381 Wilson, C. J., 3

Wilton, C., 300 Wimley, W. C., 217, 229 Wimmertova, M., 148 Winget, G. D., 384 Wingfield, P. T., 406 Wingfield, P. W., 396 Winter, G., 368 Winter, W., 384 Wittung-Stafshede, P., 3 Wolf, D. M., 66 Wolford, R., 42 Wong, J., 42 Woods, V. L., 228, 231 Woodworth, R. C., 146, 148, 149 Woody, R. W., 6 Wrabl, J. O., 299, 304, 305, 306 Wrange, O., 54, 62 Wrenn, R. F., 32 Wright, A. P. H., 68 Wright, E. R., 399 Wright, P., 248, 249 Wu, G. M., 442 Wu, T., 228 Wyman, J., 44, 76, 128, 130, 131, 132, 141, 151, 152, 204, 205 Wynne, S. A., 399, 400 X Xia, T., 382, 383 Xing, J., 154, 158, 159, 170, 171, 174, 179, 180 Xiong, Y., 100, 118 Xu, Y., 301, 384, 442 Y Yajima, R., 369, 370, 385 Yakhnin, H., 381 Yamamoto, K. R., 62, 68 Yamamura, T., 146 Yang, D., 219, 220, 222, 228, 231 Yang, Z., 243 Yarian, C. S., 195, 209 Yee, A. A., 399 Yengo, C. M., 159, 160 Yohannan, S., 219, 220, 222, 228 York, E. J., 103 Yoshizawa, S., 377 You, M., 217, 228 Young, M. J., 399, 404, 406, 409, 410 Yphantis, D. A., 269, 277, 280, 420, 426, 438, 440, 442 Yu, Q., 384 Yun, E., 366 Z Zahler, W. L., 4 Zaidi, F. N., 30

Zandi, R., 403, 406 Zarrine-Afsar, A., 22, 32 Zaug, A. J., 382 Zdobnov, E. M., 300 Zhang, M., 238, 242, 243, 246, 248, 250 Zhang, O. W., 244 Zhang, Q., 250 Zhang, T., 396 Zhang, W., 73

Zhang, Y., 159, 315 Zhao, H., 425, 428, 435 Zhao, W., 379 Zhou, T., 315 Zimm, B., 97, 100, 104 Zlotnick, A., 395, 396, 399, 400, 401, 404, 406, 407, 408, 409, 410, 411, 412, 413 Zuiderweg, E., 251 Zuker, M., 382

Subject Index

A

AIDA, differential scanning calorimetry, 224 Analytical ultracentrifugation progesterone receptor structural homogeneity analysis, 45–46 sedimentation velocity of kinetically limited reacting systems dimerization kinetically mediated dimerization, 428–436 simulation and analysis, 421–428 overview, 420 prospects for study, 442 Sedenal software, 421 stepwise approach for boundary fitting of data, 436–442 transmembrane domain oligomer stability assay, 217 ATPase, see Myosin ATPase

B

Bacteriophage HK97, capsid stability studies, 413 Bacteriophage P22, capsid stability studies, 412–413 Bacteriorhodopsin, sodium dodecyl sulfate unfolding assay, 219–222 Binding polynomial, see Isothermal titration calorimetry

C

Calcium-binding proteins, see Isothermal titration calorimetry Capsid, see Virus capsid stability CD, see Circular dichroism Chemical potential, see Membrane cholesterol Cholesterol oxidase, assay catalytic reaction, 333 chemical potential for cholesterol measurement, 355–359 incubation conditions and data collection, 334 liposome preparation, 331–332 maximum solubility of cholesterol in lipid bilayers, 339–341 Cholesterol–lipid interactions, see Membrane cholesterol Circular dichroism protein folding analysis with equilibrium unfolding data analysis equilibrium constants, 19–20 fitting equilibrium unfolding data, 21 fractions of species, 20–21 data collection, 6 equilibration time establishment, 9–10 experiment setup, 10–12 homodimeric protein models, 12–14, 16–19 instrumentation, 5 interpretation of curves, 12–19 monomeric protein models, 12, 15–16 practical considerations, 3–5 unfolded protein confirmation, 8–9 urea stock solution preparation, 6, 8 solvent denaturation analysis of membrane proteins, 224–226 COD, see Cholesterol oxidase Combinatorial extension algorithm, see Protein fold Condensed complex model, cholesterol–membrane interactions, 352, 359–360 COREX, modeling of native state ensemble of proteins, 302–304, 306, 308 Cowpea chlorotic mottle virus, capsid stability studies, 409–410 Cyclic AMP receptor protein, isothermal titration calorimetry of binding, 148–149

D

Differential scanning calorimetry β-barrel membrane proteins, 223–224 virus capsid stability studies, 406, 413 Direct boundary fitting, see Analytical ultracentrifugation Disulfide cross-linking, transmembrane domain oligomer stability assay, 218 DNA folding, see Nucleic acid folding DsbB, sodium dodecyl sulfate unfolding assay, 219, 222 DSC, see Differential scanning calorimetry Dynein light chain LC8 function, 238 nuclear magnetic resonance allosteric interactions, 251–255 data reduction, 239–241, 253–256 dynein intermediate chain IC74 binding site mapping, 248–249

Dynein light chain LC8 (cont.) folding upon binding, 249–251 exchange rates, 238–239 ligand-induced dimerization, 246–247 ligand-induced folding, 250 monomer-dimer equilibrium coupled to electrostatics association and dissociation rate constant measurements, 243–246 histidine pKa measurements, 242–243 overview, 241–242 prospects for study, 255–256 E Energetic profiles, see Protein fold F FITSIM, myosin ATPase kinetic simulations, 189 Fluorescence resonance energy transfer, transmembrane domain oligomer stability assay, 217–218 Fluorescence spectroscopy myosin ATPase studies actin-activated ATPase assay, 162–166 instrumentation, 160 pyrene-labeled actin fluorescence titration, 168–170 protein folding analysis with equilibrium unfolding data analysis equilibrium constants, 19–20 fitting equilibrium unfolding data, 21 fractions of species, 20–21 equilibration time establishment, 9–10 experiment setup, 10–12 fluorescence emission, 5–7 homodimeric protein models, 12–14, 16–19 instrumentation, 5 interpretation of curves, 12–19 monomeric protein models, 12, 15–16 practical considerations, 3–5 unfolded protein confirmation, 8–9 urea stock solution preparation, 6, 8 RNA–magnesium interaction analysis with 8-hydroxyquinoline–5-sulfonic acid automated titrations, 84–87 controls, 90–92 data analysis, 88–90 ion-binding properties, 78–81 manual titrations, 87 reagents and stock solution preparation, 81–83 sample preparation, 83–84 stopped-flow, see Stopped-flow spectroscopy virus capsid stability, 407–408 FRET, see Fluorescence resonance energy transfer

G GALLEX, transmembrane domain oligomer stability assay, 219 H HBV, see Hepatitis B virus Hemoglobin cooperativity, free energies, and binding constants, 194–197 Hill coefficient Adair constants, 203–204 historical perspective and derivation, 200–203 microscopic cooperativity insensitivity, 211 redefinition by Wyman, 204–205 macroscopic binding equilibria, 197–200 microscopic cooperativity binding cascade, 205–209 binding isotherm insensitivity, 209–210 macroscopic versus microscopic binding constants, 208 Hepatitis B virus, capsid stability studies, 410–412 Hill plot Hill coefficient and hemoglobin Adair constants, 203–204 historical perspective and derivation, 200–203 microscopic cooperativity insensitivity, 211 redefinition by Wyman, 204–205 isothermal titration calorimetry, Hill slope, and binding capacity, 151–154 HPV, see Human papillomavirus HQS, see 8-Hydroxyquinoline–5-sulfonic acid Human papillomavirus, capsid stability studies, 412 8-Hydroxyquinoline–5-sulfonic acid, RNA–magnesium interaction analysis automated titrations, 84–87 controls, 90–92 data analysis, 88–90 ion-binding properties, 78–81 manual titrations, 87 reagents and stock solution preparation, 81–83 sample preparation, 83–84 I Ising models, see Protein folding Isothermal titration calorimetry advantages in ligand binding studies, 128–129 binding polynomial advantages, 150–151 cyclic AMP receptor protein study, 148–149 data analysis cooperative ligand binding in two identical sites, 144–146

independent ligand binding in two identical sites, 142–143 independent ligand binding in two nonidentical sites, 143 overview, 133–135 three ligand-binding sites, 150 two ligand-binding sites, 135–141 heterotropic interactions, 151 Hill slope and binding capacity, 151–154 independent versus cooperative binding, 132–133 lectin binding study, 148 microscopic constants and cooperativity, 131–132 ovotransferrin binding of ferric ions, 146–147 telomere-binding alpha protein binding to DNA, 148–150 theory, 128–131 transferrin binding of ferric ions, 148–149 divalent cation binding studies advantages, 295 buffers, 263 competing chelator binding modeling, 275, 283–284 parameters, 267, 289 competitive metal ion binding modeling, 276–277, 284–285 cooperative binding analysis, 288, 290–291 data set preparation, 268–270 error analysis, 280–281 independent two-site model, 281–283 least-squares minimization, 277–280 metal ion removal from protein solutions chromatography on EDTA-agarose, 265–267 EDTA-agarose preparation, 264–265 model development, 271–274 overview, 261–262 a-parvalbumin mutant studies, 285–289 parameter file preparation, 271 polcalcin Phl p 7 binding studies, 291–294 standardization of metal and chelator solutions, 263–264 titration and data collection, 268 principles, 260–261 ITC, see Isothermal titration calorimetry K KaleidaGraph, nonlinear melting curve fitting for nucleic acids, 389–390 KINSIM myosin ATPase kinetic simulations, 189 protein folding kinetic simulations, 34–36 L Lectin, isothermal titration calorimetry of ligand binding, 148

Least squares, minimization of isothermal titration calorimetry data, 277–280 Linear repeat proteins, see Protein folding Liposome, see Membrane cholesterol M Magnesium-binding proteins, see Isothermal titration calorimetry Magnesium–RNA interactions, see RNA–magnesium interactions Matrix heteropolymer approach consensus ankyrin repeat protein folding analysis with matrix heteropolymer approach, 119–123 overview, 109–111 Matrix homopolymer approach consensus tetratricopeptide protein folding analysis, 115–118 overview, 104–109 Melting curves, see Nucleic acid folding Membrane cholesterol ceramide competition analysis in lipid bilayers, 350–352 chemical potential and cholesterol affinity, 361–362 cholesterol oxidase assay catalytic reaction, 333 chemical potential for cholesterol measurement, 355–359 incubation conditions and data collection, 334 liposome preparation, 331–332 maximum solubility of cholesterol in lipid bilayers, 339–341 conceptual models of interactions and experimental evaluation condensed complex model, 352, 359–360 superlattice model, 353, 360–361 umbrella model of cholesterol multibody interaction, 346–349, 353–354, 361 liposome preparation materials, 331 Monte Carlo simulation of lipid membranes chemical potential for cholesterol calculation, 336–338, 357 intrinsic connection between regular distribution and jump in chemical potential, 349–350 multibody interaction energy parameters and effects, 341–346 near maximum solubility levels, 341 Hamiltonian cholesterol multibody interaction, 336 pairwise interactions, 335 lattice model, 334–335 physiochemical effects, 330 X-ray diffraction measurement data collection, 333 liposome preparation, 332, 338

Membrane cholesterol (cont.) maximum solubility of cholesterol in lipid bilayers, 338–339
Membrane protein stability β-barrel membrane protein stability assays differential scanning calorimetry, 223–224 sodium dodecyl sulfate denaturation assay, 223 solvent denaturation, 224–227 classes of proteins and folding pathways, 215–216 forces in stabilization aromatic interactions, 228–229 elastic lipid bilayer forces, 229–230 electrostatic interactions, 228 hydrogen bonding, 228 van der Waals packing, 227–228 prospects for study, 231–232 sodium dodecyl sulfate unfolding assay for α-helical membrane protein stability, 219–221 transmembrane domain oligomer stability assays analytical ultracentrifugation, 217 disulfide cross-linking, 218 fluorescence resonance energy transfer, 217–218 genetic selection systems GALLEX, 219 POSSYCAT, 218 TOXCAT, 218–219
Monte Carlo simulation, see Membrane cholesterol
Mouse mammary tumor virus, promoter-binding studies of progesterone receptor, 54–62
Myosin ATPase actin purification for assay, 159–160 actomyosin binding affinities dissociation constant, 166 pyrene-labeled actin fluorescence titration, 168–170 sedimentation assays, 166–168 classes and isoforms, 158 fluorescence spectroscopy instrumentation, 160 kinetic simulations, 188–189 myosin preparation, 160 nucleotide stock solutions, 160 reaction cycle, 158–159 steady-state assays actin-activated ATPase assay, 162–166 high-salt ATPase activity, 161–162 transient kinetics ADP release ATP and ADP binding to actomyosin, 185–188 ATP binding to actomyosin-ADP mixture, 182–185 ATP competition study overview, 182 ATP binding and hydrolysis by myosin


quenched-flow, 177–178 tryptophan intrinsic fluorescence, 176–177 ATP binding to actomyosin, 173–175 myosin–actin interactions, 171–172 overview, 170–171 phosphate release, 179–182 quenched-flow instrumentation, 161 stopped-flow spectroscopy instrumentation, 160–161

N
NMR, see Nuclear magnetic resonance
Nuclear magnetic resonance, dynein light chain LC8 studies allosteric interactions, 251–255 data reduction, 239–241, 253–255 dynein intermediate chain IC74 binding site mapping, 248 folding upon binding, 249–251 exchange rates, 238–239 ligand-induced dimerization, 246–247 ligand-induced folding, 250 monomer-dimer equilibrium coupled to electrostatics association and dissociation rate constant measurements, 243–246 histidine pKa measurements, 242–243 overview, 241–241 prospects for study, 255–256
Nucleic acid folding cooperativity definition, 367–369 prospects for study, 390–391 RNA folding secondary structure, 377–378 tertiary structure, 378–379 thermodynamics boxes design, 369–370 implementation, 370–374 interpretation, 374 cubes, 374–376 overview, 366–367 ultraviolet melting studies buffers, 383–384 concentration independence of melting temperature, 382–383 equations, 385–389 extinction coefficient, 381–382 hyperchromicity, 379–380 incubation conditions and data collection, 384–385 nonlinear curve fitting with KaleidaGraph, 389–390 sample preparation, 380 wavelength selection, 381


O
OmpA sodium dodecyl sulfate denaturation assay, 223 solvent denaturation analysis, 224–227
OmpF, differential scanning calorimetry, 224
Ovotransferrin, isothermal titration calorimetry of ferric ion binding, 146–147

P
α-Parvalbumin, isothermal titration calorimetry of divalent metal binding, 285–289
PCA, see Principal components analysis
Polcalcin Phl p 7, isothermal titration calorimetry of divalent metal binding, 291–294
POSSYCAT, transmembrane domain oligomer stability assay, 218
PR, see Progesterone receptor
Principal components analysis, protein fold energetic profile space, 306–308
Progesterone receptor analytical ultracentrifugation analysis of structural homogeneity, 45–46 coactivator recruitment energetics, 62–64 DNA interactions linked assembly reaction dissection with quantitative footprint titration, 46–53 mouse mammary tumor virus promoter studies, 54–62 overview, 42–43 isoforms, 42 prospects for thermodynamic studies, 67–68 purification and structural heterogeneity, 43 RU-486 binding analysis, 44–45 transcriptional activation energetics, 64–67
Protein fold ASTRAL database, 308, 310, 312–314 classification, 300–301, 309–310 energetic profile space combinatorial extension algorithm alignment of energetic profiles, 315–316 clustering of profiles with STEPH algorithm, 318–321, 323–324 structure coordinates, 316–317 variations, 317–318 conservation between homologous proteins, 308–314 conserved substructure discovery in absence of known homology, 321–323 principal components analysis, 306–308 native state ensemble of proteins energetic profiles of proteins derived from thermodynamic modeling, 304–306 modeling with statistical thermodynamics, 301–304
Protein folding, see also Membrane protein stability equilibrium unfolding circular dichroism, 6

data analysis equilibrium constants, 19–20 fitting equilibrium unfolding data, 21 fractions of species, 20–21 equilibration time establishment, 9–10 experiment setup, 10–12 fluorescence emission, 5–7 homodimeric protein models, 12–14, 16–19 instrumentation, 5 interpretation of curves, 12–19 monomeric protein models, 12, 15–16 practical considerations, 3–5 unfolded protein confirmation, 8–9 urea stock solution preparation, 6, 8 intermediates, 2 intrinsic probe techniques, 2–3 kinetic studies with stopped-flow data analysis burst phase, 29–30 exponential curve fitting, 30–32 simulations, 32–36 data collection, 24, 26 differential quenching by acrylamide data collection, 27–29 overview, 26–27 sample preparation, 27 general considerations, 21–23 initial parameterization, 23 sample preparation, 23–25 membrane proteins, see Membrane protein stability repeat-protein folding analysis with nearest-neighbor statistical mechanical models homopolymer partition function formulation and zipper approximation, 100–103 Ising models linear biopolymer applications, 96–97 linear repeat protein features, 97–100 origins, 96 solvability criteria for repeat-protein folding, 111–115 matrix approach consensus ankyrin repeat protein folding analysis with matrix heteropolymer approach, 119–123 consensus tetratricopeptide protein folding analysis with matrix homopolymer approach, 115–119 heteropolymers, 109–111 homopolymers, 104–109 prospects for study, 123

Q
Quenched-flow, myosin ATPase studies ATP binding and hydrolysis by myosin, 177–178 instrumentation, 161 overview, 170–171


R
Repeat-proteins, see Protein folding
RNA folding, see Nucleic acid folding
RNA–magnesium interactions equilibrium dialysis studies, 72–73, 76–77 fluorescence analysis with 8-hydroxyquinoline-5-sulfonic acid automated titrations, 84–87 controls, 90–92 data analysis, 88–90 ion-binding properties, 78–81 manual titrations, 87 reagents and stock solution preparation, 81–83 sample preparation, 83–84 preferential interaction coefficients binding density approach comparison, 76–78 magnesium-binding dye for determination, 74–76 overview, 73–74 tertiary structure stabilization, 72

S
SDS, see Sodium dodecyl sulfate
Sedenal, see Analytical ultracentrifugation
Sedimentation velocity, see Analytical ultracentrifugation
Size-exclusion chromatography, virus capsid assembly studies, 406, 409
Sodium dodecyl sulfate denaturation assay for β-barrel membrane protein stability, 223 unfolding assay for α-helical membrane protein stability, 219–221
STEPH algorithm, clustering of protein fold energetic profiles, 318–321, 323–324
Steroid receptor coactivator-2, energetics of progesterone receptor recruitment, 62–64
Stopped-flow spectroscopy myosin ATPase studies ADP release ATP and ADP binding to actomyosin, 185–188 ATP binding to actomyosin-ADP mixture, 182–185 ATP competition study overview, 182 ATP binding and hydrolysis by myosin with tryptophan intrinsic fluorescence, 176–177 ATP binding to actomyosin, 173–175 instrumentation, 160–161 myosin–actin interactions, 171–172 overview, 170–171 phosphate release, 179–182 protein folding kinetic studies data analysis burst phase, 29–30 exponential curve fitting, 30–32 simulations, 32–36 data collection, 24, 26

differential quenching by acrylamide data collection, 27–29 overview, 26–27 sample preparation, 27 general considerations, 21–23 initial parameterization, 23 sample preparation, 23–25
Superlattice model, cholesterol–membrane interactions, 353, 360–361

T
Telomere-binding alpha protein binding, isothermal titration calorimetry of DNA binding, 148–150
Thermodynamic boxes, see Nucleic acid folding
Thermodynamic cubes, see Nucleic acid folding
TOXCAT, transmembrane domain oligomer stability assay, 218–219
Transferrin, isothermal titration calorimetry of ferric ion binding, 148–149

U
Ultracentrifugation, see Analytical ultracentrifugation
Ultraviolet melting, see Nucleic acid folding
Umbrella model, cholesterol–membrane interactions, 346–349, 353–354, 361

V
Virus capsid stability assembly macromolecular polymerization and classical nucleation theory, 401–403 size-exclusion chromatography, 406, 409 thermodynamic theory, 403–405 bacteriophage HK97, 413 P22, 412–413 cowpea chlorotic mottle virus, 409–410 denaturant titration, 406–408 differential scanning calorimetry, 406, 413 hepatitis B virus, 410–412 human papillomavirus, 412 prospects for study, 414 structural basis capsid geometry, 397–399 interaction specificity and assembly regulation, 399–400 protein structure, 399 subunits, 400

X
X-ray diffraction, cholesterol–membrane interactions data collection, 333 liposome preparation, 332, 338 maximum solubility of cholesterol in lipid bilayers, 338–339
