Functional and Structural Proteomics of Glycoproteins
Raymond J. Owens · Joanne E. Nettleship Editors
Editors
Raymond J. Owens
Oxford Protein Production Facility-UK, University of Oxford, The Research Complex at Harwell, R92 Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Oxfordshire OX11 0FA, UK

Joanne E. Nettleship
Oxford Protein Production Facility-UK, University of Oxford, The Research Complex at Harwell, R92 Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Oxfordshire OX11 0FA, UK
ISBN 978-90-481-9354-7
e-ISBN 978-90-481-9355-4
DOI 10.1007/978-90-481-9355-4
Springer Dordrecht Heidelberg London New York
© Springer Science+Business Media B.V. 2011
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Large-scale sequencing of the human and other mammalian genomes has created an enormous database of protein sequences for functional and structural analyses. It has been predicted that nearly half of all human proteins are glycosylated, indicating the functional importance of glycoproteins in human health and disease. However, the study of glycoproteins presents major challenges. Unlike nucleic acid and amino acid sequences, the glycans attached to proteins are not directly encoded by a template. Rather, they are the result of a complex processing mechanism that acts on proteins destined either for secretion or for retention in the cell membrane. The glycans attached to proteins are no longer regarded as a byproduct of biosynthesis but are functionally significant in their own right. Importantly, these glycans have emerged as biomarkers in the diagnosis of human diseases such as cancers, and they play a significant role in the mechanisms by which pathogenic viruses gain entry into human cells. Manipulation of the glycosylation patterns of therapeutic antibodies has led to improvements in their mechanism of action, which may ultimately translate into increased clinical efficacy. In the last few years, technology developments, in particular advances in high-throughput separation methods and detection techniques, have accelerated the characterization of the glycosylation patterns of cells and tissues. The use of lectin microarrays coupled to highly sensitive fluorescence-based detection systems has enabled the rapid profiling of glycan expression. Structural analysis is central to understanding the function of glycosylated proteins, but because of their heterogeneity, the attached glycans make glycoproteins difficult to crystallize for X-ray crystallography. The recent development of glyco-engineering techniques, coupled to rapid protein production by transient expression in mammalian cells, is facilitating the structural determination of glycoproteins.
Key to exploiting the information generated by functional and structural studies of glycoproteins is the organization of the primary experimental data into public databases and the development of tools to search and analyse glycan structure and composition. In this volume, the state of the art in all these areas is reviewed by experts in the field of glycoproteomics. We are grateful to all the contributors to this book for sharing their experience and knowledge. We also thank Springer Verlag for the opportunity of undertaking this project and for their assistance during the production of the book.
Oxford, UK
Raymond J. Owens Joanne E. Nettleship
Contents
1 Glycoproteomics in Health and Disease
Weston B. Struwe, Eoin F.J. Cosgrave, Jennifer C. Byrne, Radka Saldova, and Pauline M. Rudd
2 Glyco-engineering of Fc Glycans to Enhance the Biological Functions of Therapeutic IgGs
T. Shantha Raju, David M. Knight, and Robert E. Jordan
3 Bioinformatics Databases and Applications Available for Glycobiology and Glycomics
René Ranzinger, Kai Maaß, and Thomas Lütteke
4 Lectin Microarrays: Simple Tools for the Analysis of Complex Glycans
Lakshmi Krishnamoorthy and Lara K. Mahal
5 The Application of High Throughput Mass Spectrometry to the Analysis of Glycoproteins
Sasha Singh, Morten Thaysen Andersen, and Judith Jebanathirajah Steen
6 Solutions to the Glycosylation Problem for Low- and High-Throughput Structural Glycoproteomics
Simon J. Davis and Max Crispin
7 Role of Glycoproteins in Virus–Human Cell Interactions
Thomas A. Bowden and Elizabeth E. Fry
Subject Index
Contributors
Thomas A. Bowden
The Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK

Jennifer C. Byrne
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Eoin F.J. Cosgrave
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Max Crispin
Department of Biochemistry, Oxford Glycobiology Institute, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK

Simon J. Davis
Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK

Elizabeth E. Fry
The Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK

Robert E. Jordan
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

David M. Knight
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

Lakshmi Krishnamoorthy
Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA

Thomas Lütteke
Faculty of Veterinary Medicine, Institute of Biochemistry and Endocrinology, Justus-Liebig University Gießen, Frankfurter Str. 100, 35392 Gießen, Germany

Kai Maaß
Department of Chemistry, Institute of Inorganic and Analytical Chemistry, Justus-Liebig University Gießen, Schubertstrasse 60, Building 16, 35392 Gießen, Germany

Lara K. Mahal
Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA

T. Shantha Raju
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

René Ranzinger
Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, Georgia 30602, USA

Pauline M. Rudd
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Radka Saldova
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Sasha Singh
Proteomics Center at Children’s Hospital Boston, Boston, MA 02115, USA; Departments of Pathology, Harvard Medical School and Children’s Hospital Boston, Boston, MA 02115, USA; F. M. Kirby Neurobiology Center, Children’s Hospital Boston, Boston, MA 02115, USA

Judith Jebanathirajah Steen
Proteomics Center at Children’s Hospital Boston, Boston, MA 02115, USA; F. M. Kirby Neurobiology Center, Children’s Hospital Boston, Boston, MA 02115, USA; Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA

Weston B. Struwe
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Morten Thaysen Andersen
Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
Chapter 1
Glycoproteomics in Health and Disease
Weston B. Struwe, Eoin F.J. Cosgrave, Jennifer C. Byrne, Radka Saldova, and Pauline M. Rudd
Abstract The addition of oligosaccharides to proteins is a significant posttranslational modification that modulates protein structure, function and localization. Glycans are vital for development in all eukaryotes and are profoundly connected to a large number of human diseases, ranging from genetic disorders of glycosylation to autoimmune disorders and cancer. Glycans present a difficult analytical challenge because of the intricate dynamics of their synthesis as well as the complexity of the structures themselves. In addition to their role in development and disease, glycans are of great interest to the biotherapeutics industry, where modification of glycosylation can significantly enhance the therapeutic efficacy and biological activity of a range of glycoprotein products. However, glycosylation on a global scale in humans is yet to be fully appreciated, as researchers are discovering that glycosylation is not only protein-, cell- and tissue-specific, but is also influenced by individual genetics and environmental factors. Functional glycomics and glycoproteomics are emerging as central fields in systems biology and will continue to be a key focus in discerning health and disease.

Keywords Biotherapeutics · Cancer · Glycoproteomics · Glycosylation · Systems biology

W.B. Struwe (corresponding author)
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_1, © Springer Science+Business Media B.V. 2011

Abbreviations
ADCC  antibody-dependent cell-mediated cytotoxicity
AFP  α-fetoprotein
AGP  α1-acid glycoprotein
CCD  cross-reactive carbohydrate determinant
CDG  congenital disorders of glycosylation
CE  capillary electrophoresis
CEA  carcinoembryonic antigen
CHO  Chinese hamster ovary
CID  collision-induced dissociation
DMB  1,2-diamino-4,5-methylene-dioxybenzene
EAATs  excitatory amino acid transporters
EGFR  epidermal growth factor receptor
EPO  erythropoietin
ER  endoplasmic reticulum
ERT  enzyme replacement therapy
ESI  electrospray ionization
FcγR  Fc-gamma receptor
FDA  Food and Drug Administration
Fuc  fucose
FucT  α(1,6)-fucosyltransferase
Fuc-TIII  α(1,3/4)-fucosyltransferase
Gal  galactose
GalNAc  N-acetylgalactosamine
Gal-T  β(1,4)-galactosyltransferase
Glc  glucose
GlcNAc  N-acetylglucosamine
GnT-I  N-acetylglucosaminyltransferase-I
GnT-III  N-acetylglucosaminyltransferase-III
GnT-V  N-acetylglucosaminyltransferase-V
GU  glucose unit
GWAS  genome-wide association study
HA  hemagglutinin
HCC  hepatocellular carcinoma
β-HCG  β-human chorionic gonadotrophin
HILIC  hydrophilic interaction chromatography
HPLC  high performance liquid chromatography
IgG  immunoglobulin G
LacNAc  N-acetyllactosamine
Le^x  Lewis^x
LLO  lipid-linked oligosaccharide
LSD  lysosomal storage disease
MALDI  matrix-assisted laser desorption ionization
Man  mannose
MBL  mannose-binding lectin
MS  mass spectrometry
MS^n  sequential mass spectrometry
Neu5Gc  N-glycolylneuraminic acid
Neu5Ac  N-acetylneuraminic acid
NK  natural killer
NMR  nuclear magnetic resonance
OST  oligosaccharyltransferase
PGC  porous graphitized carbon
PNGase F  peptide-N-glycosidase F
PSA  prostate specific antigen
RA  rheumatoid arthritis
RNase 1  ribonuclease 1
RP-HPLC  reverse phase-HPLC
SLE  systemic lupus erythematosus
sLe^a  sialyl Lewis^a
sLe^y  sialyl Lewis^y
ST3GalIV  β-galactoside α(2,3)-sialyltransferase
ST8Sia II  α(2,8)-sialyltransferase II
ST8Sia IV  α(2,8)-sialyltransferase IV
TGFβR  transforming growth factor-β receptor
TOF  time of flight
WAX  weak anion exchange
Xyl  xylose
Xyl-T  β(1,2)-xylosyltransferase
2-AB  2-aminobenzamide
2D-DIGE  2D difference gel electrophoresis
2-DE  2D gel electrophoresis
Contents
1.1 Introduction
1.1.1 Basic Glycan Structure
1.1.2 Biosynthetic Pathway for N- and O-Linked Glycosylation
1.1.3 Glycan Diversity and Biological Function
1.2 Glycan and Glycoprotein Analytics
1.2.1 Mass Spectrometry
1.2.2 High Performance Liquid Chromatography
1.2.3 2D Gel Electrophoresis
1.3 Glycosylation and Disease
1.3.1 Glycosylation in Cancer Biology
1.3.2 Role of Glycosylation in Autoimmune Diseases
1.3.3 Congenital Disorders of Glycosylation (CDGs)
1.3.4 Lysosomal Storage Diseases (LSDs)
1.4 Glycobiology in the Treatment of Disease
1.4.1 Bioproduction of Glycoprotein Therapeutics
1.4.2 Manipulating Glycosylation for Enhanced Biotherapeutic Function
1.5 Systems Glycobiology, Glycoproteomics and Glycogenomics in Disease Diagnosis and Pathology
References
1.1 Introduction
Carbohydrates, when attached to proteins or lipids, form large complex biomolecules collectively termed glycoconjugates. The carbohydrate moieties of glycoconjugates fall into three main categories: those attached to lipids, and those attached to proteins either through a nitrogen atom (N-linked) or through an oxygen atom (O-linked). The attachment of glycans influences protein structure and function, as well as the localization of cell surface and secreted glycoproteins. Glycans can confer cell-type specificity and are critical components of cell-to-cell signaling [1]. Carbohydrates are also involved in the immune response and in host-pathogen interactions [2]. Moreover, changes in N-glycan biosynthesis have been identified as key components of tumour progression in mice and humans [3]. The early stages of N-glycosylation are conserved from yeast to humans, and their complete loss of function is lethal [4, 5]. Protein modification with carbohydrates is the most common and most complex type of posttranslational modification. A glycoprotein can exist as a number of different glycoforms in which the attached glycan structure(s) vary. The structural complexity and variability of N- and O-glycans can confer a high degree of variability on a protein and can modulate its function and/or structure. Glycans also alter the solubility, half-life and aggregation properties of glycoproteins. It is indisputable that glycans play an essential role in the activity of proteins and a fundamental role in the biological functions of both native and recombinant glycoproteins. Approximately 1% of the genome is dedicated to the glycosylation machinery, and upwards of 70% of all proteins in human cells are modified by glycans [6]. Compared with genomics or proteomics, the study of glycoconjugates is still in its early stages and lacks robust, all-encompassing analytical and bioinformatic tools to unravel its complexity.
Understanding the glycome and glycoproteome is a much greater challenge, confounded by the elaborate mechanisms of glycosylation and by the fact that the glycome is intricately intertwined with the genome and proteome.
1.1.1 Basic Glycan Structure
In contrast to DNA, RNA and proteins, carbohydrates form branched structures; as a result, a relatively small set of monosaccharides (Table 1.1) can give rise to considerably complex structures. The dynamics of glycan synthesis, the complexity of the glycans themselves and the extrinsic properties attributed to glycans are significant factors that make glycobiology one of the less well understood disciplines in systems biology research today. Approximately 17 monosaccharides constitute the building blocks of N- and O-glycans, and it is the linkages and branching of these residues that give rise to the complexity of oligosaccharides. One report calculated the number of possible structural isomers of a hexasaccharide to be greater than 1.05 × 10^12 [7], although in practice far fewer N-glycan isomers occur in nature. Nevertheless, the structural information that glycans can carry is potentially far greater than that carried by proteins or nucleic acids.

Table 1.1 Common monosaccharides by type
Monosaccharide | Type | Abbreviation
---|---|---
2-Keto-3-deoxynononic acid | Sialic acid | Kdn
Fucose | Deoxyhexose | Fuc
Galactose | Hexose | Gal
Galactosamine | Aminohexose | GalN
Galacturonic acid | Uronic acid | GalA
Glucose | Hexose | Glc
Glucosamine | Aminohexose | GlcN
Glucuronic acid | Uronic acid | GlcA
Iduronic acid | Uronic acid | IdoA
Mannose | Hexose | Man
Mannosamine | Aminohexose | ManN
Mannuronic acid | Uronic acid | ManA
N-Acetylgalactosamine | Aminohexose | GalNAc
N-Acetylglucosamine | Aminohexose | GlcNAc
N-Acetylneuraminic acid | Sialic acid | Neu5Ac
N-Glycolylneuraminic acid | Sialic acid | Neu5Gc
Xylose | Pentose | Xyl
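To see why linkage choice alone multiplies the number of possible structures so quickly, the sketch below counts linear (unbranched) oligosaccharides under deliberately simplified assumptions: each residue is drawn from the 17 monosaccharides of Table 1.1, and each glycosidic bond independently takes one of two anomeric configurations (α or β) and one of four assumed linkage positions on the acceptor. Branching, ring size and monosaccharide-specific linkage rules are ignored, so this is not the counting scheme used in [7] and is not expected to reproduce its figure; the function and parameter names are illustrative.

```python
def linear_oligosaccharide_count(n_residues: int,
                                 n_monosaccharides: int = 17,
                                 anomers: int = 2,
                                 linkage_positions: int = 4) -> int:
    """Count unbranched oligosaccharides of length n_residues (simplified model).

    Assumptions (illustrative only): any monosaccharide may occupy any position,
    and each of the n_residues - 1 glycosidic bonds independently takes one of
    `anomers` configurations and one of `linkage_positions` acceptor positions.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    residue_choices = n_monosaccharides ** n_residues
    bond_choices = (anomers * linkage_positions) ** (n_residues - 1)
    return residue_choices * bond_choices


def linear_peptide_count(n_residues: int, n_amino_acids: int = 20) -> int:
    """Peptides have a single backbone linkage type, so only residue identity varies."""
    return n_amino_acids ** n_residues


if __name__ == "__main__":
    print(f"hexasaccharides: {linear_oligosaccharide_count(6):.2e}")
    print(f"hexapeptides:    {linear_peptide_count(6):.2e}")
```

Even this linear-only model gives about 7.9 × 10^11 hexasaccharides, against 6.4 × 10^7 hexapeptides built from 20 amino acids; allowing branch points would raise the glycan count further.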
1.1.2 Biosynthetic Pathway for N- and O-Linked Glycosylation
The complexity of N-glycans (Fig. 1.1) is achieved through a non-template-driven biosynthetic pathway that relies on the transcription and translation of glycosylation enzymes that are precisely localized throughout the cell. There are four distinct stages in eukaryotic cells. (i) Formation of the lipid-linked oligosaccharide (LLO) precursor on the cytosolic face and in the lumen of the endoplasmic reticulum (ER). This process is highly conserved among all eukaryotes, and defects of enzymes in this pathway are the basis for congenital disorders of glycosylation type I (CDG-I) in humans. (ii) The en bloc transfer of the precursor oligosaccharide to the nascent polypeptide in the lumen of the ER, catalyzed by the oligosaccharyltransferase (OST), a hetero-oligomeric complex composed of eight subunits. (iii) A series of quality control steps to ensure correct folding, carried out by the chaperones calnexin and calreticulin together with the glycosidases glucosidase I, glucosidase II and ER α(1,2)-mannosidase. (iv) Glycan processing in the medial- and trans-Golgi, where new sugar residues are added to the trimmed glycan (Fig. 1.2). Glycosylation in the Golgi is responsible for the highly complex structural diversity of N-glycans found in mammals and other higher species. The identification of over 200 glycosylation enzymes has aided in understanding the mechanism of glycosylation [8]. Several factors are important to glycan formation, including the rate of transport of glycoproteins from the ER to the Golgi, the residence time of glycoproteins in the Golgi, sugar nucleotide metabolism, and the localization of glycosyltransferases in the Golgi [9]. The focus of functional glycomics is to understand how glycan diversity and microheterogeneity result from, and contribute to, biology in development and disease. The direct links between glycan structure and gene expression are becoming increasingly important in the context of systems biology, in which all cellular factors, including genomics, transcriptomics, proteomics and metabolomics, are considered together.

Fig. 1.1 Structural examples of N- and O-linked glycans. Glycans can be attached to asparagine or to serine/threonine residues, resulting in the formation of N- and O-linked glycans, respectively. N-linked biosynthesis involves a multitude of glycosyltransferases and glycosidases, together acting to generate glycans of unparalleled complexity. These can broadly be categorized as high-mannose type, hybrid type or complex type; examples of each are presented. O-linked glycosylation is defined by the biosynthesis of eight core structures on which further complexity is normally found; each of the core structures is presented. Individual residues are shown with distinct shapes and shading. Linkage positions are represented by the angle of the line linking adjacent monosaccharides. Anomeric configuration is indicated by a full line for a β-linkage and a dashed line for an α-linkage.

Fig. 1.2 N-linked glycan biosynthesis in the endoplasmic reticulum and Golgi. Initiation of N-linked glycan biosynthesis occurs on the cytosolic side of the ER, where dolichol diphosphate acts as the scaffold for the extension of sugar moieties to the Man5GlcNAc2 glycoform. Following translocation to the luminal side of the ER, further extension is performed, and the Glc3Man9GlcNAc2 structure is transferred from dolichol pyrophosphate to the nascent polypeptide emerging from the ribosome. Calnexin/calreticulin-mediated quality control of protein folding precedes the migration of the nascent glycoprotein to the cis-Golgi, where further glycan processing occurs. Trimming of mannose residues and addition of a GlcNAc residue on the α(1,3)-mannose arm signals migration to the medial-Golgi, where further assembly of complex glycans occurs. This includes the replacement of the α(1,3)- and α(1,6)-mannose residues on the α(1,6) arm with a GlcNAc and the addition of a core α(1,6)-fucose. Transfer to the trans-Golgi results in the addition of galactose and sialic acid residues, completing the process of N-linked glycan biosynthesis. Glycoproteins are subsequently targeted to specific intra- or extracellular locations.
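The processing route summarized in Fig. 1.2 can be followed at the level of monosaccharide composition. The sketch below is a hypothetical, simplified step list for one route, from the Glc3Man9GlcNAc2 precursor to a core-fucosylated, disialylated biantennary complex glycan. It tracks residue counts only (no linkages or branch positions) and ignores alternative outcomes such as high-mannose or hybrid glycans; the step wording, the number of mannoses removed in the Golgi and the helper names are illustrative assumptions, not a complete model of the pathway.

```python
from collections import Counter

# Hypothetical, simplified processing route (composition changes only).
# Hex = Glc/Man/Gal, HexNAc = GlcNAc, dHex = Fuc, NeuAc = sialic acid.
PROCESSING_STEPS = [
    ("glucosidases I and II remove three Glc", {"Hex": -3}),
    ("ER alpha(1,2)-mannosidase removes one Man", {"Hex": -1}),
    ("Golgi mannose trimming to Man5GlcNAc2", {"Hex": -3}),
    ("GnT-I adds GlcNAc to the alpha(1,3)-mannose arm", {"HexNAc": 1}),
    ("two Man removed from the alpha(1,6) arm", {"Hex": -2}),
    ("GlcNAc added to the alpha(1,6) arm", {"HexNAc": 1}),
    ("core alpha(1,6)-fucose added", {"dHex": 1}),
    ("galactose added to each antenna", {"Hex": 2}),
    ("sialic acid added to each antenna", {"NeuAc": 2}),
]

ORDER = ("Hex", "HexNAc", "dHex", "NeuAc")


def composition_string(comp: Counter) -> str:
    """Format a composition, e.g. Counter(Hex=5, HexNAc=2) -> 'Hex5HexNAc2'."""
    return "".join(f"{name}{comp[name]}" for name in ORDER if comp[name] > 0)


def process(start: dict, steps=PROCESSING_STEPS) -> list[tuple[str, str]]:
    """Apply each composition change in turn and record every intermediate."""
    comp = Counter(start)
    history = [("precursor", composition_string(comp))]
    for description, change in steps:
        comp.update(change)  # Counter.update adds counts; negative values subtract
        if any(count < 0 for count in comp.values()):
            raise ValueError(f"step removes more residues than are present: {description}")
        history.append((description, composition_string(comp)))
    return history


if __name__ == "__main__":
    # Glc3Man9GlcNAc2 precursor: 12 hexoses (3 Glc + 9 Man) and 2 GlcNAc.
    for description, composition in process({"Hex": 12, "HexNAc": 2}):
        print(f"{composition:<24} {description}")
```

Working with compositions rather than full structures keeps the sketch short, and it mirrors what a mass spectrometer reports for released glycans; distinguishing isomers with the same composition requires the linkage-level information this model omits.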
1.1.3 Glycan Diversity and Biological Function
Deciphering the association between a particular glycan structure and its function is a central problem for glycobiologists. More than 7000 glycan structures have now been determined, but their significance in cellular function remains to be established [10]. Some suggest that glycomics is at least an order of magnitude more difficult than proteomics [11]. Considering the complex diversity of glycans and how they influence proteins, it is no surprise that Schachter [12] once asked, "will it ever be possible to determine the role that a specific posttranslational modification plays in the function of a specific protein for every protein in the genome?" Nonetheless, notable advances have been made in elucidating the role of N-linked glycans. Glycans are known to influence cell growth and development, tumour growth and metastasis, anticoagulation, immune recognition and response, cell-cell communication and microbial pathogenesis [13]. In addition, defects in the glycan biosynthetic machinery can have fatal or debilitating consequences that manifest in diseases such as autoimmune disorders and lysosomal storage diseases (Section 1.3.4). The goal of functional glycomics is to assign specific glycans to particular proteins and determine their function. Interest in functional glycomics is increasing, and several international collaborative efforts have been established to deduce the biological roles of glycans, including the Consortium for Functional Glycomics, EuroCarbDB and the Japanese Consortium for Glycomics. It is becoming increasingly evident that the complexity of glycomics requires a systems approach that investigates biosynthesis, structural analysis and glycan-protein interactions to delineate glycan structure-function relationships.
The many experimental approaches taken to understand the roles of glycans include inhibition of glycosylation, alteration of processing mechanisms, elimination of glycosylation sites, enzymatic or chemical deglycosylation of complete glycan chains and the study of glycosylation mutants [14]. The consequences of altering individual glycosylation mechanisms are highly unpredictable, and the effects can range from virtually undetectable to lethal. Moreover, altering a glycosylation pathway simultaneously changes the structures and functions of every glycoprotein in the cell that passes through it, which needs to be considered in such experiments. Recently, Chinese hamster ovary (CHO) glycosylation knockouts have revealed high-mass complex N-glycans on the order of m/z ~13,000 that consist of up to 26 N-acetyllactosamine repeats [15]. In many cases, investigating the role or structure of glycans through knockout models is beneficial, but the universal presence of glycoconjugates makes it difficult to understand cellular phenotypes as a function of their gene and protein components.

On the most basic level, glycans alter proteins either intrinsically or extrinsically, with the carbohydrate modulating the function of the underlying peptide. The external location of glycans on proteins can serve as a shield, protecting the protein from proteases or antibodies. Carbohydrates are exceedingly hydrophilic, which alters the conformation and solubility of proteins. Protein folding is driven by the folding energy landscape, beginning with hydrophobic collapse. The free energy of each possible conformation is largely determined by the primary sequence and by the contacts of its nonpolar groups. The addition of carbohydrates during translation in the ER greatly alters the energy landscape of proteins. Depending on the size, glycoforms and extent of site occupancy, a protein will fold until a native structure is formed and the lowest free energy is reached. Individual protein motifs (α-helices and β-turns) fold within microseconds, and quality control measures are in place to verify proper folding before any protein exits the ER [16]. The presence or absence of glycans can affect the primary function or activity of a protein. For example, β-human chorionic gonadotrophin (β-HCG) binds its receptor with similar affinity with and without its glycan component. Glycosylated β-HCG activates adenylate cyclase, leading to increased cAMP production, but deglycosylated β-HCG fails to do so [14, 17]. This illustrates that glycans can regulate the primary function of a glycoprotein without changing its binding properties. Glycans can also influence the longevity of the proteins to which they are attached. In the case of human erythropoietin (EPO), the presence of terminal sialic acid increases the half-life but decreases the activity in vitro [18].
The extent of branching can determine the binding of EPO to its receptor in specific tissues [19]. Glycan sequences can thus tune protein function, although the effect may arise from a change in binding mechanism brought about by changes in glycoprotein structure. Glycans also act as specific ligands for endogenous and exogenous receptors. The role of glycans as ligands for lectins is perhaps the most thoroughly explored functional aspect of oligosaccharides in cellular systems. For example, the glycoprotein hemagglutinin (HA) on the surface of the avian influenza virus mediates binding of the virus to the host cell [20]. Avian HA binds specifically to α(2,3)-sialylated glycans, which are largely absent from the human upper respiratory tract. A switch in HA binding preference from α(2,3)-sialylated to α(2,6)-sialylated glycans, which are present in humans, is thought to enable infection of humans [20]. Functional glycomics currently lacks a high-throughput method for determining the site-specific structure and function of each glycan moiety on a case-by-case basis. Structural characterization of glycans is only one of many important aspects of functional glycomics. The challenges include understanding glycan structure as a function of extracellular signaling, determining the basis for glycan-protein specificity and interactions, and elucidating how glycan diversity is generated by biosynthesis. Furthermore, the biology and biosynthesis of glycosylation at the cellular, let alone the multicellular, level remain unclear. Addressing the fundamental biology of glycosylation is vital in order to link all facets of functional glycoproteomics.
Genomics, proteomics and metabolomics all play a part in glycosylation, so researchers must consider additional factors in determining the role of oligosaccharides in disease. The new era of glycomics will not only set out to answer the structure-function relationship of glycans, but will also seek to determine the extrinsic factors that lead to the variable glycosylation observed in disease and what the implications are for the patient. New analytical trends will aspire to comprehensive, site-specific structural analysis of glycoproteins on a sensitive and high-throughput scale. Ultimately, however, the function of a particular glycan (or glycans) on a specific protein, and its corresponding expression, is the key to understanding the role of glycoproteomics in the clinical setting.
1.2 Glycan and Glycoprotein Analytics
The drive to understand glycoproteomics has fueled the development of new and innovative techniques that aim to determine the sequence of both the protein backbone and the glycan component, in addition to site occupancy. Glycoprotein analysis seeks to determine not only the overall glycan profile of a given glycoprotein, but also the individual glycoforms at each glycosylation site that together contribute to the complete glycan profile. Methods for analyzing glycoproteins and glycans are developing rapidly to address these problems. The majority of glycan analytical techniques employ high performance liquid chromatography (HPLC), nuclear magnetic resonance (NMR), mass spectrometry (MS), capillary electrophoresis (CE), lectin arrays, glycan arrays or 2D gel electrophoresis (2-DE). Mass spectrometry has emerged as the preferred tool for determining the structure of intact and/or digested glycoproteins because of its sensitivity and its ability to measure the masses of large molecules [21–23]. The abundance and variability of glycosylated proteins in biological samples are the most daunting factors in the field of glycoproteomics. For this reason, glycoprotein analysis may incorporate enrichment or purification of specific glycoforms via lectins, immunoprecipitation, and one- or two-dimensional gel electrophoresis prior to MS analysis. In bottom-up methods, glycoproteins are digested with a specific protease (e.g. trypsin) following enrichment. At this stage, glycopeptides can be further enriched by lectins, hydrophilic interaction chromatography (HILIC), size-exclusion chromatography or hydrazide resin [24]. Mass mapping of the glycopeptides by tandem MS (MS/MS) provides information on glycan structure and heterogeneity. Concurrently, MS of the deglycosylated peptide, after release of N-glycans with PNGase F or of O-glycans by reductive β-elimination, aids in determining site occupancy.
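The bottom-up workflow described above starts from a prediction of which peptides the protease will generate. The sketch below assumes the commonly used simplified trypsin rule (cleavage after Lys or Arg, but not before Pro) and flags peptides carrying the N-glycosylation sequon Asn-X-Ser/Thr (X not Pro), the consensus motif at which the OST complex transfers the precursor glycan. A sequon marks only a potential site; whether it is actually occupied is what the site-occupancy experiments described above determine. The function names and the example sequence are hypothetical.

```python
import re

# Simplified trypsin rule: cleave C-terminal to K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")
# N-glycosylation sequon: N, any residue except P, then S or T.
SEQUON = re.compile(r"N[^P][ST]")


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Return in-silico tryptic peptides, allowing up to `missed_cleavages`."""
    # Splitting on a zero-width pattern needs Python 3.7+; drop the empty tail piece.
    fragments = [f for f in TRYPSIN_SITE.split(sequence.upper()) if f]
    peptides = []
    for start in range(len(fragments)):
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start, stop):
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


def candidate_glycopeptides(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Tryptic peptides containing at least one potential N-glycosylation sequon."""
    return [p for p in tryptic_digest(sequence, missed_cleavages) if SEQUON.search(p)]


if __name__ == "__main__":
    protein = "MKNGSRPLKAVNLTKPQNPSR"  # hypothetical sequence
    print(tryptic_digest(protein))
    print(candidate_glycopeptides(protein))
```

With zero missed cleavages, a sequon whose middle residue is Lys or Arg is split across two peptides and is missed; allowing one missed cleavage recovers such cases at the cost of a larger candidate list.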
PNGase F cleaves N-glycans between the reducing-end N-acetylglucosamine (with or without α(1,6)-linked fucose) and the asparagine residue of the peptide. In the process, the asparagine is converted to aspartic acid, which increases the peptide mass by approximately 1 Da (+0.984 Da) in the mass spectrometer. Top-down methods analyze intact glycoproteins, which is valuable for identifying the type and location of a particular glycan composition. However, top-down approaches are not ideal for large, heterogeneous or novel glycoproteins. Moreover, top-down strategies do not supply in-depth glycan information, owing to loss of the glycan component during fragmentation or to the type of glycan fragment detected, which typically results from cleavage between the penultimate and the reducing-end N-acetylglucosamine. This common fragmentation does little more than confirm the presence of the glycan and its overall topology. Generally, additional experiments are required for complete glycan analysis.
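The "approximately 1 Da" PNGase F signature is worth making concrete, because on the m/z axis it shrinks with charge state. The sketch below computes the Asn-to-Asp mass change from monoisotopic atomic masses and shows the resulting m/z shift at charges 1 to 3; the 1500 Da peptide mass and the function names are hypothetical. Note that spontaneous, non-enzymatic deamidation of Asn gives the same +0.984 Da change, so the shift on its own is not proof of prior glycosylation.

```python
# Monoisotopic atomic masses (Da)
H = 1.00782503
N = 14.0030740
O = 15.9949146
PROTON = 1.00727646

# PNGase F leaves Asp where the glycosylated Asn was: the side-chain amide
# (-CONH2) becomes a carboxylic acid (-COOH), i.e. -NH + O.
DEAMIDATION_SHIFT = O - N - H  # about +0.984 Da


def peptide_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral monoisotopic mass."""
    return (neutral_mass + charge * PROTON) / charge


def deglycosylated_mz(unmodified_mass: float, n_sites: int, charge: int) -> float:
    """m/z after PNGase F converts n_sites formerly glycosylated Asn residues to Asp."""
    return peptide_mz(unmodified_mass + n_sites * DEAMIDATION_SHIFT, charge)


mass = 1500.0  # hypothetical neutral monoisotopic mass of the Asn-containing peptide
for z in (1, 2, 3):
    shift = deglycosylated_mz(mass, 1, z) - peptide_mz(mass, z)
    print(f"z={z}: m/z shift {shift:.3f}")
# z=1: m/z shift 0.984
# z=2: m/z shift 0.492
# z=3: m/z shift 0.328
```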
1.2.1 Mass Spectrometry
Current approaches for glycan analysis mainly focus on the released glycans and overlook site occupancy. Conversely, analysis of intact glycoforms does not provide the detailed structural information achievable from purified glycans. A major difficulty in glycoproteomics is the lack of analytical and bioinformatics tools to support researchers in defining the structure of glycoproteins. Mass spectrometry is the tool of choice for glycan analysis because of its sensitivity, reproducibility and relative ease of use. MS-based analyses of glycans employ both matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) ion sources, either used directly or coupled to on-line chromatographic separation systems. Each source has its advantages and disadvantages. MALDI time-of-flight (TOF) instruments offer analysis in the higher mass range with limited sample consumption and can tolerate minor sample contaminants. However, their sensitivity and the extent to which a precursor ion can be isolated and fragmented (i.e. sequential mass spectrometry, or MSⁿ) are limited. MALDI and ESI produce primarily protonated or sodiated ions in positive mode and deprotonated ions in negative mode, the latter especially when sialic acids are present. However, the individual wet-lab workflow can greatly influence the type of ion or adduct present; in addition to sodium, lithium and potassium adducts are also likely. ESI-based instruments can be coupled with a greater range of mass analyzers and can therefore be used in different capacities to access more information, depending on the type of sample being analyzed. Negative mode ESI-QTOF MS/MS has been used with great success in analyzing native glycans from cancer samples [25–28].
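Because the same glycan can appear as protonated, lithiated, sodiated, potassiated or deprotonated ions, it helps to compute the expected m/z for each adduct when annotating a spectrum. The sketch below sums standard monoisotopic residue masses for a composition and applies the adduct and charge; the composition keys (Hex, HexNAc, dHex, NeuAc) and function names are assumptions for illustration, and only free, underivatized reducing glycans are handled.

```python
# Monoisotopic in-chain residue masses (monosaccharide minus H2O), Da
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791, "NeuAc": 291.09542}
WATER = 18.01056
ELECTRON = 0.00054858
# Monoisotopic atomic masses of the cation-forming elements, Da
CATION_ATOM = {"H": 1.00782503, "Li": 7.01600344, "Na": 22.98976928, "K": 38.96370649}


def glycan_mass(composition: dict[str, int]) -> float:
    """Neutral monoisotopic mass of a free, underivatized reducing glycan."""
    return sum(RESIDUE[name] * count for name, count in composition.items()) + WATER


def positive_mz(composition: dict[str, int], cation: str = "Na", charge: int = 1) -> float:
    """m/z of [M + zX]z+ where X is H, Li, Na or K."""
    ion = CATION_ATOM[cation] - ELECTRON
    return (glycan_mass(composition) + charge * ion) / charge


def negative_mz(composition: dict[str, int], charge: int = 1) -> float:
    """m/z of the deprotonated [M - zH]z- ion."""
    proton = CATION_ATOM["H"] - ELECTRON
    return (glycan_mass(composition) - charge * proton) / charge


man5 = {"Hex": 5, "HexNAc": 2}              # Man5GlcNAc2
a2 = {"Hex": 5, "HexNAc": 4, "NeuAc": 2}    # disialylated biantennary glycan
for cation in ("H", "Li", "Na", "K"):
    print(cation, round(positive_mz(man5, cation), 3))
print("A2 [M-2H]2-", round(negative_mz(a2, 2), 3))
# H 1235.441, Li 1241.449, Na 1257.423, K 1273.397, A2 [M-2H]2- 1110.384
```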
The attraction of negative mode analysis is that the majority of fragments generated during collision-induced dissociation (CID) are cross-ring cleavages, which are more informative than the glycosidic B- and Y-type fragments typical of positive mode analyses. Likewise, MSⁿ techniques are unparalleled in the amount of structural information that can be generated [29–31]. MSⁿ techniques have been instrumental in detecting cancer-specific glycans or potential biomarkers of metastasis [32]. Typically, ESI analyses require larger sample volumes, and the samples must be purer than those subjected to MALDI. As a result, many ESI analyses couple the MS instrument to a chromatographic separation, typically porous graphitized carbon (PGC), amide-80 resins, ion exchange or reverse phase C18 or C4. The trade-off of on-line MS approaches lies in the depth of “mining” that can be performed. Specifically, MSⁿ applications require a direct and continuous injection so that a single m/z species can be analyzed over longer periods, which is not possible in LC-MS/MS or LC-MSⁿ methodologies. Generally, a full MS analysis of N- or O-glycans will incorporate both types of ion source. All in all, sample preparation and purity are crucial in any MS approach, and LC separation, either off- or on-line, can be advantageous. Alternatively, solid phase extraction using PGC, C18 or cation exchange resin is also useful to separate carbohydrates from proteins and/or salt contaminants [33]. Glycans can also be derivatized by permethylation or methylation, which are required for MSⁿ and for sialic acid linkage detection, respectively [34, 35]. Permethylation also acts as an ad hoc purification step, and permethylated products ionize readily as sodium adducts, so that sialylated and neutral structures can be analyzed simultaneously.
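Permethylation replaces each exchangeable hydrogen (hydroxyl, N-acetyl NH and, for sialic acid, the carboxyl) with a methyl group, shifting every residue by a fixed number of +14.016 Da increments. The sketch below uses the additive per-residue convention commonly applied to permethylated glycans (our assumption, stated in the comments) to compare native and permethylated [M+Na]+ values; names are hypothetical.

```python
CH2 = 14.01565      # net mass added per methylation (H replaced by CH3)
WATER = 18.01056
NA_ION = 22.98922   # Na+ (sodium atom minus one electron)

# In-chain residue masses (Da) and the number of methyl groups each gains on
# permethylation under the usual additive convention; the two extra methyls
# at the chain termini are added once per glycan below.
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791, "NeuAc": 291.09542}
METHYLS = {"Hex": 3, "HexNAc": 3, "dHex": 2, "NeuAc": 5}


def native_na(composition: dict[str, int]) -> float:
    """[M+Na]+ m/z of the underivatized glycan."""
    mass = sum(RESIDUE[r] * n for r, n in composition.items()) + WATER
    return mass + NA_ION


def permethylated_na(composition: dict[str, int]) -> float:
    """[M+Na]+ m/z of the fully permethylated glycan."""
    mass = sum((RESIDUE[r] + METHYLS[r] * CH2) * n for r, n in composition.items())
    return mass + WATER + 2 * CH2 + NA_ION


glycans = {"Man5GlcNAc2": {"Hex": 5, "HexNAc": 2},
           "A2 (disialylated)": {"Hex": 5, "HexNAc": 4, "NeuAc": 2}}
for name, comp in glycans.items():
    print(f"{name}: native {native_na(comp):.2f}, permethylated {permethylated_na(comp):.2f}")
# Man5GlcNAc2: native 1257.42, permethylated 1579.78
# A2 (disialylated): native 2245.77, permethylated 2792.38
```

Because the sialic acid carboxyl is converted to a methyl ester, the permethylated A2 glycan is no longer acidic and is observed as a sodium adduct alongside neutral glycans.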
1.2.2 High Performance Liquid Chromatography
In addition to mass spectrometry, HPLC is an equally robust analytical tool for glycan analysis. The advantages of HPLC over MS are its increased sensitivity, quantitative capability, reproducibility and ability to provide monosaccharide sequence and linkage information. However, novel glycans routinely require MS to confirm their overall composition or structure. Analysis of the serum glycome relies on the ability to detect low-abundance glycans, because the majority of serum glycoproteins are present at low levels. Although HPLC is not generally used for analysis of whole or digested glycoproteins, its attributes make it well suited to analysis of the released glycan component. Additionally, HPLC-based glycan characterization has emerged as a powerful high-throughput method [36]. Amide-based hydrophilic interaction chromatography (HILIC) of glycans is a well-established technique capable of providing highly reproducible profiles [37–40]. Since 2-aminobenzamide (2-AB) labels glycans with a 1:1 stoichiometry, the resulting chromatogram is quantitative. HPLC is also helpful for analyzing charged and neutral glycans simultaneously, since the analysis does not cleave sialic acid linkages, as can occur in MALDI-MS and ESI-MS. The use of a dextran ladder enables glycan retention times to be standardized into glucose unit (GU) values, which normalizes all glycans analyzed by HILIC and effectively eliminates variability between HPLC systems. HPLC (and MS) analysis can be strengthened by exoglycosidase digestion, because these enzymes cleave only specific monosaccharides, linkages and anomers. HPLC techniques have also grown to include sialic acid quantitation and speciation, facilitated by weak anion exchange HPLC (WAX-HPLC) and 1,2-diamino-4,5-methylenedioxybenzene (DMB) labeling.
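The GU standardization described above maps each glycan's retention time onto the scale defined by the dextran ladder run under the same conditions. The sketch below converts a retention time to GU by linear interpolation between the two flanking ladder peaks; this interpolation is a simplification we have swapped in (published workflows typically fit a polynomial or spline through the ladder), and the ladder retention times are invented for illustration.

```python
from bisect import bisect_right


def gu_value(ladder: list[tuple[float, float]], rt: float) -> float:
    """Convert a HILIC retention time (min) to glucose units (GU).

    ladder: (GU, retention time) pairs for the dextran ladder peaks.
    Uses linear interpolation between the two flanking ladder peaks.
    """
    ladder = sorted(ladder, key=lambda peak: peak[1])
    times = [t for _, t in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the calibrated ladder")
    i = bisect_right(times, rt)
    if i == len(times):  # rt coincides with the last ladder peak
        return float(ladder[-1][0])
    (gu_lo, t_lo), (gu_hi, t_hi) = ladder[i - 1], ladder[i]
    return gu_lo + (gu_hi - gu_lo) * (rt - t_lo) / (t_hi - t_lo)


# Hypothetical ladder run: GU 2-10 and their retention times in minutes.
ladder = [(2, 10.0), (3, 14.0), (4, 18.5), (5, 23.0), (6, 27.0),
          (7, 30.5), (8, 33.5), (9, 36.0), (10, 38.2)]
for peak_rt in (25.0, 31.7):
    print(peak_rt, "min ->", round(gu_value(ladder, peak_rt), 2), "GU")
# 25.0 min -> 5.5 GU
# 31.7 min -> 7.4 GU
```

Since GU values are relative to the ladder rather than to absolute time, the same glycan should give a similar GU on different HPLC systems, which is what makes database matching possible.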
In sialic acid speciation, terminal sialic acid residues are typically released from glycoproteins by acid hydrolysis and the released sialic acids are labeled with DMB. Like 2-AB, DMB couples with sialic acid in a 1:1 stoichiometry, allowing relative quantitation of individual sialic acid species. Labeled sialic acids are separated by reverse phase HPLC (RP-HPLC) on a C18 stationary phase, where sialic acid variants such as Neu5Gc (N-glycolylneuraminic acid), Neu5,7Ac2 and Neu5Gc9Ac are identified and quantified. Separation of glycans by WAX-HPLC helps to identify the number of sialic acid moieties present on the glycans of a given glycoprotein. Used in conjunction with sialidase digestion, this allows glycans to be verified as carrying a known number of terminal sialic acids, and it can also be determined whether glycans carry negative charges not provided by N-acetylneuraminic acid, such as phosphate or sulfate. From an analytical perspective, quantitation of individual sialic acid species provides essential information that cannot be obtained by MS methods alone.
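The 1:1 DMB labeling stoichiometry is what lets RP-HPLC peak areas be read as relative molar amounts. The sketch below turns peak areas into a percentage profile of sialic acid species; it assumes equal detector response per labeled molecule (in practice response is checked against standards), and the peak areas and function name are hypothetical.

```python
def sialic_acid_profile(peak_areas: dict[str, float]) -> dict[str, float]:
    """Relative amount (%) of each DMB-labelled sialic acid species.

    Because DMB labels each sialic acid with 1:1 stoichiometry, peak area is taken
    as proportional to molar amount (equal detector response is assumed).
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("no sialic acid signal")
    return {species: 100.0 * area / total for species, area in peak_areas.items()}


# Hypothetical RP-HPLC peak areas (arbitrary units) from one DMB-labelled sample.
areas = {"Neu5Ac": 820.0, "Neu5Gc": 120.0, "Neu5,7Ac2": 40.0, "Neu5Gc9Ac": 20.0}
for species, pct in sialic_acid_profile(areas).items():
    print(f"{species}: {pct:.1f}%")
# Neu5Ac: 82.0%, Neu5Gc: 12.0%, Neu5,7Ac2: 4.0%, Neu5Gc9Ac: 2.0%
```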
1.2.3 2D Gel Electrophoresis
2D gel electrophoresis (2-DE) is a notable tool for analyzing complex mixtures of glycoproteins from serum, tissue or whole cell lysates. 2-DE separates proteins and glycoproteins by isoelectric point in the first dimension and by molecular mass in the second dimension. The advantage of 2-DE lies in the fact that whole proteomes are analyzed, and the data reflect the presence of isoforms and changes in the glycosylation of specific proteins between samples. 2-DE can also be combined with MS analysis for further protein identification and characterization. Depletion of high-abundance proteins (albumin, IgG, α1-antitrypsin, IgA, transferrin and haptoglobin) is required for analysis of serum samples; failure to do so results in masking of medium- to low-abundance proteins. Reports suggest that relevant disease markers will not be among these six proteins, but will instead be proteins secreted or cleaved from specific tissue sites [41]. Aside from serum proteins, housekeeping proteins are also a consideration for 2-DE, as they may be present at 10⁵ to 10⁶ copies per cell [42]. By comparison, some receptor proteins may be present at fewer than 100 copies per cell, making marker identification unlikely without enrichment. Changes in glycosylation can easily be detected on 2D gels, since the addition of neutral or charged residues alters a glycoprotein's mass and overall charge and therefore changes the position of that protein on the gel. However, the structure or even the composition of the glycan cannot be resolved by 2-DE and requires MS analysis. A modified type of 2-DE analysis has been developed that enables comparative proteomics in a single gel run. This method, termed 2D difference gel electrophoresis (2D-DIGE), labels two samples with two different fluorescent dyes (the CyDyes Cy3 and Cy5), then images and superimposes the gel spots [43].
This method effectively removes the high variability associated with comparing individual 2D gels and allows relative quantification of medium- to low-abundance proteins to identify disease markers [44]. Furthermore, glycoproteins can be fluorescently detected in gels or on electroblots using a glycoprotein stain [45].
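The core DIGE comparison is a per-spot ratio between the two dye channels of the same gel. The sketch below normalizes each channel to its total matched spot volume, computes log2(Cy5/Cy3) and flags spots beyond a fold-change threshold; the spot volumes, the two-fold threshold and the function names are hypothetical, and routine DIGE practice additionally uses a pooled internal standard and statistics across replicate gels, which are omitted here.

```python
import math


def dige_log2_ratios(cy3: dict[str, float], cy5: dict[str, float]) -> dict[str, float]:
    """log2(Cy5/Cy3) for each spot matched in both channels.

    Each channel is first scaled to its total matched spot volume so that
    overall differences in labelling or scanning intensity between dyes cancel out.
    """
    matched = sorted(cy3.keys() & cy5.keys())
    total3 = sum(cy3[s] for s in matched)
    total5 = sum(cy5[s] for s in matched)
    return {s: math.log2((cy5[s] / total5) / (cy3[s] / total3)) for s in matched}


def changed_spots(ratios: dict[str, float], min_abs_log2: float = 1.0) -> list[str]:
    """Spots whose abundance differs by at least 2**min_abs_log2-fold."""
    return [s for s, r in ratios.items() if abs(r) >= min_abs_log2]


# Hypothetical spot volumes: control labelled with Cy3, disease sample with Cy5.
control = {"spot1": 100.0, "spot2": 100.0, "spot3": 200.0}
disease = {"spot1": 200.0, "spot2": 100.0, "spot3": 100.0}
ratios = dige_log2_ratios(control, disease)
print(ratios)                 # {'spot1': 1.0, 'spot2': 0.0, 'spot3': -1.0}
print(changed_spots(ratios))  # ['spot1', 'spot3']
```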
1.3 Glycosylation and Disease
Glycans heavily influence the development of many diseases, such as congenital disorders of glycosylation (CDGs), rheumatoid arthritis (RA), cancer, diabetes, lupus and HIV/AIDS, to name a few. The question of why and how glycans influence disease remains, but much has been discovered about how glycans affect protein structure and function, the localization of glycoproteins and disease phenotypes. Many researchers examine subtle changes in glycan expression as markers for disease, whereas others seek to understand the fundamental cause and effect of aberrant glycosylation. By and large, the role of glycosylation in disease remains an open area of investigation, but there is increasing information regarding the mechanisms of oligosaccharide processing and their role in pathology.
1.3.1 Glycosylation in Cancer Biology
Defects that occur in the life cycle of a cell (such as in growth, survival and apoptosis) can lead to the development of cancer. Progression of cancer throughout the body relies on the ability of cancerous cells to separate from a tumour and disseminate to distal locations in different organs. Tumour metastasis involves a number of mechanisms that aid growth, invasiveness, dissemination, angiogenesis and immune evasion. Central to the metastatic potential of a malignant tumour is its ability to influence cell-cell adhesion, growth capacity and motility, all of which contribute to the establishment of tumours at distal sites throughout the body. Changes in N-linked glycosylation have been implicated in many of the key features of metastatic potential, presumably through the extensive glycosylation of cell surface proteins. For example, growth factor receptors such as the epidermal growth factor receptor (EGFR) and transforming growth factor-β receptor (TGFβR), in addition to adhesion molecules such as E-cadherin and various selectins, are known to play a significant role in the growth and spread of a number of cancers [46, 47]. Alterations to cell surface glycoproteins typically result from changes within the glycan biosynthetic machinery, where modulation of the expression and/or activity of specific glycosyltransferases results in the assembly of altered glycoproteins important in tumour progression. A number of key glycosyltransferases have been implicated in tumour growth and spread; specifically, glycosyltransferases involved in N-acetylglucosamine transfer, as well as sialyltransferases and fucosyltransferases, are recognized as influential in tumour progression.
1.3.1.1 Glycosyltransferases and Cancer Progression
A hallmark of tumour metastatic potential is the capacity for rapid growth and proliferation, which is often coordinated through prolonged exposure of growth factor receptors on the cell surface [46]. It is now appreciated that increased N-glycan branching affects cell proliferation by increasing the retention of growth factor receptors at the cell surface, mediated through galectin-3, which binds branched N-glycans to form lattices that prevent recycling of the receptors [48, 49]. β(1,6)-N-acetylglucosaminyltransferase-V (GnT-V) catalyzes the addition of a GlcNAc moiety in a β(1,6) linkage to the α(1,6)-linked mannose of the core N-glycan structure, producing the GlcNAc₃Man₃GlcNAc₂ structure (Fig. 1.2). Mice deficient in mgat5, the gene encoding GnT-V, are less susceptible to tumour growth and metastasis [50], while low-metastatic prostate tumour cells engineered to overexpress mgat5 show increased invasiveness [51]. Given this role in tumour biology, detection of the products of GnT-V activity using lectins such as the plant lectin leukoagglutinin has been used as a marker of tumour progression in breast and colon cancers, with increased branching of N-glycans often associated with a poor prognosis [52, 53]. Growth receptors carrying glycans with increased GnT-V-dependent branching play an important role in tumour growth and proliferation, owing largely to their extended retention at the surface of tumour cells (Fig. 3.2). Apart from its role in N-linked glycan branching, a secreted form of GnT-V has also been documented and implicated in tumour angiogenesis [54]. Surprisingly, this function is independent of glycosylation, suggesting that this glycosyltransferase has important functions beyond glycan biosynthesis. Additionally, branching of N-glycans by GnT-V provides the scaffold for increased poly-N-acetyllactosamine expression [55]. Poly-N-acetyllactosamines are repeats of the N-acetyllactosamine (LacNAc) disaccharide Galβ1–4GlcNAcβ1–3 and are associated with tumour progression (Fig. 1.3) [56]. The terminal β(1,4)-galactose residues interact with galectins on the endothelium and may influence tumour adhesion or metastasis.
Fig. 1.3 Terminal N-glycan motifs which are associated with cancer metastasis. The expression of sLeᵃ, sLeˣ, sLeʸ antigens, N-acetyllactosamine (LacNAc) and polysialic acid residues are examples of glycan epitopes that are markedly enhanced in cancer cells
In contrast to GnT-V, the activity of GnT-III is understood to negatively regulate tumour metastasis, in part by eliminating a suitable substrate for GnT-V (Fig. 1.4). GnT-III, or β(1,4)-N-acetylglucosaminyltransferase-III, adds a bisecting GlcNAc moiety in a β(1,4) linkage to the β(1,4)-linked mannose of the chitobiose core structure. Interestingly, overexpression of GnT-III prevents the proteolytic degradation of E-cadherin, an important cell surface glycoprotein involved in cell-cell interactions. Increased expression of GnT-III thus results in the extended presence of E-cadherin on cell surfaces, thereby promoting cell-cell adhesion [57]. This is consistent with GnT-III acting in opposition to GnT-V: improved cell-cell interactions prevent the shedding of potentially tumorigenic cells for subsequent dissemination.
Changes in the level and type of fucosylation are widely implicated in cancer metastasis. Fucosylation is regulated by fucosyltransferases, GDP-fucose synthetic enzymes and GDP-fucose transporters present in the Golgi. It is catalyzed by various fucosyltransferases and depends on the availability of GDP-fucose as a substrate for the remodelling of N-glycans. Reports have shown that expression of GDP-galactose transporter mRNA is increased in colorectal cancer [58], and Zipin et al. [59] proposed that increases in GDP-fucose could control the metastatic potential of colorectal cancer. The GDP-fucose transporter was found to play the most pivotal role in fucosylation among hepatocellular carcinoma (HCC) patients [60]. Fucosylation of α-fetoprotein is a well-known tumour marker for HCC and can be used to distinguish HCC from chronic hepatitis and liver cirrhosis [61, 62]. Increased expression of FUT8, which encodes the α(1,6)-fucosyltransferase (FucT) that adds fucose to the reducing-end N-acetylglucosamine residue, is observed in HCC patients. The biological significance of this increase in core fucosylation in HCC is unclear. On the other hand, core fucosylation is necessary for EGFR function, and EGFR is broadly involved in cellular growth, differentiation and migration [63]. Additionally, expression of α(1,6)-fucosyltransferase and of E-cadherin, a calcium-dependent cell surface protein responsible for cell-to-cell adhesion, is increased in colorectal cancer (Fig. 3.2). Fucose on the reducing-end N-acetylglucosamine is believed to modulate the structure and function of E-cadherin, thereby limiting the cells' ability to maintain cell-cell junctions and increasing their metastatic capability [64].
Fig. 1.4 Changes in glycosylation gene expression associated with tumour progression. GnT-V and FucT enzyme activity positively influences cancer metastasis, while GnT-III negatively affects tumour progression by generating bisecting N-glycans that prevent catalysis by GnT-V and FucT. Examples of the influence of glycans on tumour progression include E-cadherin and epidermal growth factor receptor (EGFR)
Sialic acid plays an important role in cancer biology due to its broad involvement in cell-cell interactions, in both a repulsive and an attractive mode. Sialic acid residues are often the final monosaccharides attached to growing glycan structures and as such provide a number of critical features in cell biology, both for better and for worse. Sialic acid itself is a nine-carbon monosaccharide bearing a net negative charge at physiological pH, thereby influencing the electrostatic properties of the associated glycoprotein. In combination with GnT-V activity, increased branching of glycans provides additional sites for sialylation following galactosylation of the terminal GlcNAc residues.
The implications of enhanced sialylation are significant when considering the function of sialic acid-binding Ig-like lectins (siglecs) and selectins, which all show a preference for sialylated glycoproteins [65, 66]. Sialic acids play a part in tumour extravasation when attached to Lewis antigens (Fig. 1.3). Lewis antigens are blood group antigens expressed by epithelial cells that, when carried by glycolipids or lipoproteins, are absorbed by red blood cells and maintained on the cell membrane. Lewis antigens are synthesized by the activity of either an α(1,2) or an α(1,3)/α(1,4) fucosyltransferase, which attach fucose residues to terminal galactose or to subterminal GlcNAc moieties, respectively. A total of four Lewis antigens are known, with derivatives arising from sulphation at various positions on the antigen. The formation of sialyl Lewis antigens has been shown to occur through the activity of β-galactoside α(2,3)-sialyltransferase (ST3Gal IV) and α(1,3/4)-fucosyltransferase (Fuc-TIII) in colonic tissues, although synthesis does not necessarily rely on increased expression of these enzymes [67]. Sialylation of Lewis antigens influences the biological function of the associated glycoprotein, as the sialyl Lewis antigens are known ligands for selectins [68]. Selectins are involved in cell adhesion and assist in the early stages of leukocyte trafficking. Interactions of selectins with their ligands promote leukocyte rolling and tethering on vascular surfaces, which is an important feature of leukocyte recruitment at sites of inflammation or injury. Perhaps most significant is the role of selectins in facilitating the extravasation of leukocytes across the vascular wall [69]. It is now appreciated that malignant cancers have increased expression of the sialyl Lewis antigens [70]. Increased expression of sialyl Lewis-containing glycoproteins by cancer cells enhances the likelihood of interaction with selectins. This has serious implications, as it provides the opportunity for cancer cells to breach the vascular system and advance the malignancy. Aside from the typical α(2,3)- and α(2,6)-linked structures, sialic acid residues can form α(2,8) linkages to terminal sialic acid residues to form linear polysialic acid structures (Fig. 1.3). Two enzymes are responsible for polysialic acid formation: α(2,8)-sialyltransferase II (ST8Sia II) and α(2,8)-sialyltransferase IV (ST8Sia IV). Polysialic acids are found on the neural cell adhesion molecule, and increases in their expression are found in non-small cell lung carcinomas [71]. However, polysialic acid produced by ST8Sia IV is present in both normal and tumour lung tissue, whereas expression of ST8Sia II correlates with tumour development [72].
1.3.1.2 Glycoproteins as Markers for Cancer Detection
Alterations to the detailed structures of N- and O-glycans expressed on proteins are commonplace in tumours. The nature of aberrant glycosylation in cancer is reasonably well understood at the cellular level, but the pathophysiological role of glycoproteomics in cancer development throughout the body remains of key interest. Alterations to glycans differ from tissue to tissue as well as by cancer type, as illustrated by the comparatively small set of cancer-specific glycoprotein biomarkers identified in serum, especially those recognized by the Food and Drug Administration (FDA) (Table 1.2). Much of the focus of glycoproteomics in cancer is on disease markers, because early detection is a prerequisite for successful treatment.
Glycans present in the peripheral circulation as a result of glycoprotein secretion or shedding from tumour cells may potentially provide diagnostic or prognostic markers for cancer. For instance, the serum level of CA19-9, a long-established carbohydrate antigen bearing the sLeᵃ epitope, is the most commonly used diagnostic and prognostic indicator for pancreatic cancer. As well as being used in serum analysis, antibodies against epitopes such as sLeᵃ, Leˣ, sLeˣ and Leʸ are incorporated in the histological evaluation of biopsy specimens from a range of malignancies, including breast, colorectal and non-small cell lung carcinoma [73–75]. Clearly, expression of the sLeˣ glycan plays an important role in the progression of cancer. An important question to be addressed is which serum glycoproteins carry altered levels of the sLeˣ epitope. Increased levels of sLeˣ epitopes in cancer have been found on haptoglobin, α1-acid glycoprotein and α1-antichymotrypsin [76, 77]. These glycoproteins are acute phase proteins, whose concentrations change in response to inflammatory stimuli [77]. They are expressed in liver hepatocytes, which can be stimulated by cytokines from the tumour microenvironment [77]. Increases in both protein concentration and the sLeˣ epitope on these acute phase proteins were also found in chronic inflammatory conditions [77]; these changes could therefore be a result of the chronic inflammation that often accompanies cancer.
Table 1.2 Glycoprotein biomarkers for cancer
Protein/Marker
Cancer type
FDA approved
Alkaline phosphatase α-fetoprotein (AFP) CA 125 (MUC16) CA 15.3 (MUC1) CA 19.9
Colorectal Gastric, liver, testicular Gastric, lung, ovarian Breast Colorectal, gastric, liver, pancreatic Pancreatic, gastric Breast Pancreatic, gastric, colorectal, hepatic carcinomas Gastric Breast, colorectal, gastric, lung, pancreatic, thyroid Choriocarcinoma, testicular Pancreatic, prostate
√ √ √ √
CA 242 CA 27.29 (MUC1) CA 50 CA 72-4 Carcinoembryonic antigen (CEA) Human chorionic gonadotropin Chromogranin A (parathyroid secretory protein 1) Colony stimulating factor 1 receptor Complement factor H protein Epidermal growth factor receptor Follicle-stimulating hormone Hepatocyte growth factor Human chorionic gonadotropin-β Immunoglobulin G (IgG) Inhibin Kallikrein 10 Kallikrein 11 Kallikrein 3 (prostate specific antigen, PSA) Kallikrein 5 Kallikrein 6 Kallikrein 7 Kallikrein 8 Mesothelin Epithelial cell adhesion molecule (EpCAM) OVX1 Prolactin Ribonuclease 1 (RNase 1) Soluble interleukin-2 receptor α (CD25) Thyroglobulin ErbB2, Her2/neu Vascular endothelial growth factor A (VEGF) von Willebrand factor
√
Breast, ovarian Bladder Colorectal Ovarian Bladder, breast, colorectal, gastric, lung Testicular Ovarian Ovarian Breast, prostate Ovarian, prostate Prostate
√
√
√
Ovarian Breast, ovarian Ovarian Ovarian Mesothelioma Breast, colorectal Endometrial carcinoma, ovarian Endometrial carcinoma Pancreatic Breast, leukemia, lung, gastric Thyroid Breast Breast, colorectal Colorectal, squamous cell carcinoma
√
√ √
Haptoglobin is a glycoprotein with haemoglobin-binding capacity and as such prevents iron loss and kidney damage during haemolysis. Human haptoglobin has two α and two β chains; each β chain contains four Asn-linked glycosylation sites, all of which may be occupied [76]. Glycosylation accounts for approximately 19% of the β-chain mass and consists of neutral, monosialylated and disialylated biantennary and disialylated and trisialylated triantennary complex glycans [76]. Increased expression of haptoglobin, as well as increased fucosylation, sialylation and branching of its glycans, has been found in cancer and in inflammatory diseases [76, 78]. α1-acid glycoprotein (AGP) is synthesized mainly by hepatocytes and is heavily glycosylated [79]. It binds and transports several drugs and other compounds of endogenous and exogenous origin. Functionally, AGP has a beneficial role in wound healing and protection against tissue damage, and it modulates immune and inflammatory responses [76]. The serum concentration of AGP rises two- to fivefold during an acute phase response. Protein synthesis and glycosylation are independently regulated by cytokines and glucocorticoids [76]. AGP stimulates cytokine secretion, which contributes to the inflammatory response [76]. Human AGP has five putative glycosylation sites; the glycans can have bi-, tri- or tetraantennary structures, and 12% of the glycans are sialic acids, which are responsible for AGP's negative charge [76]. Increases in sLeˣ on AGP have been found in inflammation and cancer [25, 76]. Biantennary glycans on AGP were found to be increased in acute inflammation and decreased in chronic inflammation [76]. α1-antichymotrypsin is a glycoprotein secreted by the liver with a carbohydrate content of 24% [80]. Its concentration increases more than fourfold in response to inflammatory stimuli and is increased in cancer [76].
α1-antichymotrypsin inhibits chymotrypsin-like proteases, regulates cathepsin G activity, modulates the cellular functions of neutrophils and lymphocytes and inhibits platelet-activating factor synthesis [76]. α1-antichymotrypsin has six potential glycosylation sites, and the identified glycans were disialylated biantennary, trisialylated triantennary and disialylated triantennary structures [76]. sLeˣ on α1-antichymotrypsin is increased in inflammation and cancer [77]. Changes in immunoglobulin G (IgG) glycosylation are also observed in ovarian cancer [76, 77]. IgG in ovarian cancer patients shows a marked difference in glycosylation pattern, specifically in the levels of sialylation, terminal galactose and N-acetylglucosamine residues, and core fucosylation. It is plausible that these diverse glycoforms differ in the efficiency of their interactions with ligands [61, 81–83]. It is known that the levels of galactosylation and sialylation on IgG are decreased in ovarian cancer [25]. An increase in agalactosyl IgG oligosaccharides can result from decreased β(1,4)-galactosyltransferase (Gal-T) activity in plasma cells [84], or from increased production of specific subsets of plasma cells with low expression of galactosyltransferases [85]. Increases in the agalactosyl IgG glycoform have predominantly been associated with tumour progression and metastasis of prostate, gastric and lung cancer [86, 87], as well as with other diseases such as rheumatoid arthritis, tuberculosis, inflammatory bowel disease [84, 88] and vasculitis [89]. Kaneko et al. [83] have shown that sialylation of IgG reduces the cytotoxicity of natural killer (NK) cells, exhibiting an anti-inflammatory effect, and a decrease in sialylation of IgG glycans has also been found in rheumatoid arthritis [90]. Terminal GlcNAc of agalactosyl IgG oligosaccharides on the Fc region of the IgG molecule can be recognized by the serum mannose-binding lectin (MBL), resulting in complement activation [81]. The presence of bisecting GlcNAc and the lack of fucose on IgG increase its cytotoxicity [61, 82]. Changes such as these can benefit cancer cells by altering IgG function and the subsequent immune response. It is generally thought, whether correctly or not, that clinically significant biomarkers will originate from leakage, secretion or shedding of proteins from specific lesions and be present in low abundance in the peripheral circulation [91]. Aberrant glycosylation of certain glycoproteins secreted directly from the tumour site has already been investigated with a view to increasing the overall sensitivity and specificity of cancer detection. A notable example is prostate specific antigen (PSA), currently the only serum marker approved for use in the diagnosis of prostate cancer. However, PSA is not specific to the disease, as benign conditions such as benign prostatic hyperplasia or prostatitis can produce elevated serum PSA [92]. A number of studies have compared the glycosylation of PSA in prostate cancer with that in control or benign specimens and demonstrated that detection of aberrant PSA glycoforms in cancer may improve diagnostic specificity [93–95]. Another example is ribonuclease 1 (RNase 1), a member of the RNase A superfamily and a proposed tumour marker for pancreatic cancer. Like PSA, serum levels of RNase 1 lack specificity for pancreatic cancer and have been associated with age and other conditions [96]. However, RNase 1 purified from pancreatic cancer serum specimens was found to have increased core fucosylation of sialylated biantennary glycans compared with controls [97].
The analysis of alterations in glycosylation on low abundance glycoproteins is likely to enhance the specificity of cancer diagnosis.
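The PSA and RNase 1 examples turn on diagnostic specificity: benign conditions that raise a marker above the decision cutoff produce false positives. The sketch below computes sensitivity and specificity for a single cutoff; the marker values are invented, the function name is hypothetical, and the same calculation would apply whether the measured quantity is total protein or a cancer-associated glycoform.

```python
def sensitivity_specificity(cases: list[float], controls: list[float],
                            cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_pos = sum(v >= cutoff for v in cases)     # cancers correctly flagged
    true_neg = sum(v < cutoff for v in controls)   # controls correctly cleared
    return true_pos / len(cases), true_neg / len(controls)


# Hypothetical marker values (arbitrary units) in cancer and benign-disease sera.
cancer = [5.2, 8.1, 3.9, 12.0]
benign = [2.1, 4.5, 1.0, 3.0, 6.0]
sens, spec = sensitivity_specificity(cancer, benign, cutoff=4.0)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")  # sensitivity 0.75, specificity 0.60
```

Here the two benign samples above the cutoff are what pull specificity down; a glycoform-specific assay is intended to separate exactly such cases.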
1.3.2 Role of Glycosylation in Autoimmune Diseases
Glycosylation is central to various autoimmune and other immune-related diseases, including rheumatoid arthritis (RA), Crohn’s disease, HIV, systemic lupus erythematosus (SLE), IgA nephropathy, autoimmune hemolytic anemia, Graves’ disease, diabetes type I, myasthenia gravis and schizophrenia [98, 99]. Antibodies have variable glycosylation patterns, and alterations or defects in their N-glycans greatly alter their immunogenic properties [100, 101]. The glycosylation change underlying each disease varies, but alterations in IgG glycoforms are responsible for RA, SLE and autoimmune hemolytic anemia. IgGs have one glycosylation site in the Cγ2 domain of each heavy chain (Asn297), which carries a complex biantennary glycan with varying degrees of terminal galactosylation and sialylation. RA is arguably the most studied glycosylation-related autoimmune disease, and its link to IgG glycoforms was established in 1975 [102]. A critical feature of RA is a marked increase in IgG glycoforms that lack terminal galactose residues (termed G0). Both antennae of these N-glycans terminate in N-acetylglucosamine residues, and the levels of G0 correspond to the severity of RA symptoms [103].
W.B. Struwe et al.
The exposed GlcNAc residues are of potential concern because they can be recognized by the serum mannose-binding lectin. MBL can bind terminal GlcNAc, fucose, glucose and mannose residues and, once bound, can activate the mannose-binding lectin pathway of complement, a key event in the inflammatory cascade in RA [81]. People with autoimmune disorders are at higher risk of developing schizophrenia, which has also recently been attributed to defects in glycosylation. Excitatory amino acid transporters (EAATs) are aberrantly N-glycosylated in schizophrenia patients [104]. EAATs are membrane-bound neurotransmitter transporters involved in glutamatergic signalling, taking up glutamate from the neuronal synapse into glial cells. Additionally, decreased plasma α(1,3)-fucosyltransferase activity and increased plasma α(2,6)-sialyltransferase activity have been observed in schizophrenia patients [105, 106].
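The G0 level referred to above is often expressed as the proportion of agalactosyl glycans among the IgG Fc glycans that differ in galactose content (G0, G1 and G2, carrying zero, one or two terminal galactose residues). The sketch below assumes that convention and uses invented chromatographic peak areas; the function name and values are hypothetical, and this is an illustration rather than a validated clinical calculation.

```python
def percent_g0(peak_areas: dict[str, float]) -> float:
    """Return G0 as a percentage of the combined G0 + G1 + G2 peak area.

    peak_areas maps 'G0', 'G1' and 'G2' to integrated peak areas in any
    consistent unit; other glycan classes are deliberately ignored here.
    """
    total = peak_areas["G0"] + peak_areas["G1"] + peak_areas["G2"]
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * peak_areas["G0"] / total


# Hypothetical peak areas, not taken from any cited study.
control_sample = {"G0": 25.0, "G1": 40.0, "G2": 35.0}
ra_sample = {"G0": 45.0, "G1": 38.0, "G2": 17.0}

print(f"control %G0: {percent_g0(control_sample):.1f}")  # control %G0: 25.0
print(f"RA %G0: {percent_g0(ra_sample):.1f}")            # RA %G0: 45.0
```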
1.3.3 Congenital Disorders of Glycosylation (CDGs)
Congenital Disorders of Glycosylation (CDG) are inherited autosomal recessive diseases caused by defects in the biosynthesis of N-glycans. CDGs are categorized as type I or type II depending on where the defect occurs in the biosynthetic pathway. Type I disorders, of which there are 15 types, are caused by defective enzymes in the assembly of the lipid-linked oligosaccharide (LLO) precursor in the ER. Type II disorders are caused by defects in enzymes, transporters or other Golgi components involved in trimming and remodelling the protein-bound oligosaccharide. To date there are 10 known type II defects. Type I disorders are more commonly diagnosed than type II, with approximately 1000 cases documented since 1980 [107]. Mutations in phosphomannomutase 2 (PMM2) cause CDG-Ia, the most common subtype, with nearly 900 cases worldwide. The importance of protein glycosylation is dramatically illustrated by the severity of the disease. CDG patients suffer from psychomotor retardation, low muscle tone, incomplete brain development, visual problems, seizures, stroke-like episodes, coagulation disorders, endocrine abnormalities and overall failure to thrive [108, 109]. The two CDG types show distinct glycosylation patterns. CDG-I is characterized by hypoglycosylation, in which entire glycans are missing and glycosylation-site occupancy is reduced. CDG-II is marked by incomplete glycan structures, notably reduced terminal sialylation or galactosylation. New CDG subtypes are still being discovered, and in some CDG cases the etiology has not been established. In these patients there is strong clinical evidence of CDG (e.g. an abnormal isoelectric focusing pattern of serum transferrin), but the defective gene or enzyme has yet to be identified [110]. In addition to these unidentified types, the clinical presentation of CDG is inconsistent and poorly understood.
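The two glycosylation trends distinguished above are what make isoelectric focusing of serum transferrin informative. Transferrin carries two N-glycosylation sites, each normally occupied by a disialylated biantennary glycan, so its main normal glycoform carries four sialic acids. The sketch below enumerates, under a deliberately simplified and hypothetical model, which sialic-acid totals arise when whole glycans are missing (CDG-I-like) versus when glycans are truncated so that one antenna lacks sialic acid (CDG-II-like). The state names are illustrative assumptions, and real profiles also include triantennary and other minor glycoforms that this toy model ignores.

```python
from itertools import product

# Hypothetical, simplified model: each of transferrin's two N-glycosylation
# sites is in one of three states, mapped to the sialic acids it carries.
SITE_STATES = {
    "complete": 2,   # disialylated biantennary glycan (normal)
    "absent": 0,     # no glycan at the site (reduced occupancy, CDG-I-like)
    "truncated": 1,  # one antenna lacks sialic acid (incomplete, CDG-II-like)
}


def sialoforms(allowed_states: list[str], n_sites: int = 2) -> set[int]:
    """Total sialic-acid counts reachable when every site takes one of allowed_states."""
    return {
        sum(SITE_STATES[state] for state in combo)
        for combo in product(allowed_states, repeat=n_sites)
    }


normal = sialoforms(["complete"])
cdg_i_like = sialoforms(["complete", "absent"])
cdg_ii_like = sialoforms(["complete", "truncated"])

print(sorted(normal))       # [4]
print(sorted(cdg_i_like))   # [0, 2, 4]
print(sorted(cdg_ii_like))  # [2, 3, 4]
```

In this toy model only the truncated (CDG-II-like) case produces a trisialo form, whereas loss of whole glycans yields disialo and asialo forms, mirroring the type I versus type II distinction drawn in the text.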
Within any given type or subtype, the range of symptoms is heterogeneous, and CDG is commonly misdiagnosed as cerebral palsy [111]. CDG is a rare disease, but population studies of the most common form (CDG-Ia) put the carrier frequency of the R141H mutation at 1/60 [112]. At this carrier frequency about 1 in 3600 couples would both be carriers, and a quarter of their children, roughly 1 in 14,400 births, would be expected to be R141H homozygotes; yet homozygotes are entirely absent from the population.
1
Glycoproteomics in Health and Disease
Given the carrier frequency of R141H, R141H/R141H homozygotes would be expected to account for 45–60% of CDG-Ia patients; instead, patients with CDG-Ia who carry R141H are compound heterozygotes. Given that R141H homozygosity is most likely lethal, this genotype is a compelling candidate cause of neonatal deaths [113]. Beyond the infrequency of CDG diagnoses, there is evidence of a clear discrepancy between the expected and observed frequency of the disease in the population [114]. Two possibilities could explain why so few CDG patients are diagnosed: either those affected never reach birth, or CDG illnesses are severely under-diagnosed [115, 116].
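The expected number of R141H homozygotes follows from simple population genetics. Assuming random mating and Hardy–Weinberg equilibrium (assumptions made here for illustration), a carrier frequency of 1/60 means that about (1/60)² = 1/3600 of couples are both carriers, and a quarter of their children would inherit two copies of the allele. The sketch below computes this and cross-checks it against the rare-allele frequency q obtained by solving 2q(1 − q) = 1/60.

```python
import math
from fractions import Fraction

carrier_freq = Fraction(1, 60)  # R141H carrier frequency quoted in the text [112]

# Probability that two unrelated, randomly paired parents are both carriers.
both_carriers = carrier_freq ** 2          # 1/3600
# Each child of two carriers has a 1 in 4 chance of being homozygous.
expected_homozygotes = both_carriers / 4   # 1/14400

# Cross-check with Hardy-Weinberg: solve 2q(1 - q) = carrier frequency for q,
# taking the smaller root as the frequency of the rare allele.
c = float(carrier_freq)
q = (1 - math.sqrt(1 - 2 * c)) / 2
hw_homozygotes = q ** 2

print(f"couples who are both carriers: {both_carriers}")       # 1/3600
print(f"expected homozygotes (carrier x carrier): {expected_homozygotes}")  # 1/14400
print(f"Hardy-Weinberg homozygotes: 1 in {1 / hw_homozygotes:.0f}")
```

Both routes give roughly 1 in 14,000 births; the small gap arises because the carrier-by-carrier estimate ignores matings that involve a homozygous parent. Either way, the figure is far from the complete absence of R141H homozygotes observed in patients.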
1.3.4 Lysosomal Storage Diseases (LSDs)
The lysosome is responsible for the degradation of macromolecules in the cell, including lipids, nucleic acids, proteins and carbohydrates. It contains over 50 acid hydrolases that are specifically targeted to the organelle by the mannose-6-phosphate recognition marker on their N-glycan(s). Mannose-6-phosphate is added in the Golgi by a phosphotransferase that recognizes lysosomal enzymes. The phosphotransferase adds N-acetylglucosamine-1-phosphate to terminal mannose residues, and an N-acetylglucosamine-1-phosphodiesterase subsequently removes the N-acetylglucosamine, leaving a terminal mannose-6-phosphate residue on the N-glycan. Defects in this pathway result in lysosomal storage disorders (LSDs). Most LSDs result from a deficiency in the enzymes or enzyme co-factors responsible for the metabolism of mucopolysaccharides or sphingolipids in the lysosome. The incompletely degraded macromolecules accumulate in the lysosome, which eventually enlarges, leading to tissue and organ dysfunction. There are over 40 different types of LSDs, and they are frequently misdiagnosed because their symptoms overlap with those of other diseases and their etiology is generally poorly understood. The combined frequency of LSDs has been estimated to be as high as 1/7000 births [117]. Clinical presentation ranges from coarse facial features to mental retardation to a vegetative state [118]. Established therapies for LSDs include chemical chaperone therapy, bone marrow transplantation, substrate deprivation therapy and enzyme replacement therapy (ERT). ERT can be used to treat Gaucher’s disease [119], Fabry disease [120], Pompe disease [121] and, more recently, mucopolysaccharidosis I [122], II [123] and VI [124].
However, ERT is very expensive and is ineffective in treating the central nervous system, largely because the infused enzymes do not cross the blood-brain barrier.
1.4 Glycobiology in the Treatment of Disease
The continued growth of glycobiology and glycoproteomics has made significant inroads into our understanding of the role of carbohydrates in health and disease. Establishing the biosynthetic pathways that underpin the formation of both normal and aberrant glycosylation has been fundamental to the development
of the field. More specifically, elucidating the biological function of individual carbohydrate moieties on larger glycans has been essential for developing strategies to target defects in the processing or activity of glycoproteins therapeutically. The biopharmaceutical industry has assimilated this information and engineered next-generation therapies designed for improved biological activity, potency and extended serum half-life. This has led to glycoprotein therapeutics that provide new treatment options and significantly improved outcomes for patients with diseases once considered untreatable. The success of glycoprotein therapeutics is evident from their contribution to annual sales of biologics: in 2008, biologics amassed over US$46 billion in sales [125], with glycoproteins representing a significant percentage of that figure. Monoclonal antibodies alone accounted for over 35% of biologic sales.
1.4.1 Bioproduction of Glycoprotein Therapeutics
The biopharmaceutical industry has recognized the potential of glycoproteins in the treatment of numerous diseases and has embarked on the synthesis of complex glycoproteins for therapeutic use. This has resulted in the production of several categories of proteins, including recombinant monoclonal antibodies, growth factors, blood clotting factors, hormones and cytokines. These therapies have had an enormous impact on the treatment of disease, as reflected in part by the sales recorded for individual therapeutics. Treatments are now available for cancers (breast cancer, non-Hodgkin’s lymphoma), autoimmune diseases (rheumatoid arthritis, Crohn’s disease and lupus erythematosus), lysosomal storage disorders (Pompe’s, Fabry and Gaucher’s diseases) and blood disorders (anemia), all of which have benefited remarkably from improved therapeutic engineering. Essential to the manufacture of complex glycoproteins, however, are cellular expression systems capable of performing N-linked glycosylation. This process is largely restricted to eukaryotes, although certain prokaryotes have also shown some degree of N-linked glycan biosynthesis [126]. Accordingly, bioproduction of glycosylated proteins has focussed on eukaryotic cell lines capable of performing N-linked glycosylation in a manner similar to that observed in humans. Departure from human glycosylation raises concerns about therapeutic efficacy and risk, as the human immune system can recognize and eliminate proteins bearing abnormal glycosylation. A major challenge for the biopharmaceutical industry has therefore been to develop cell lines capable of producing therapies with improved pharmacokinetic behaviour and reduced immunogenicity. Several cell lines have emerged as appropriate expression systems for producing complex glycoproteins with specific glycosylation requirements.
CHO cells have moved to the fore as the industry standard for glycoprotein production, evidenced in part by the preference for CHO in over 20 currently
FDA-approved therapies [127]. A major selling point of CHO is the extensive similarity of its N-linked glycan biosynthetic pathway to that of humans. Several modifications have also been engineered into CHO cells to improve their production capacity and longevity, including the introduction of anti-apoptotic factors [128], the co-expression of chaperones such as Hsp70 and Hsp27 for improved disulphide bond formation [129], and the expression of proteins such as Sly1 [130] and Munc18c [131], which are involved in vesicle trafficking from the ER to the Golgi. While several factors support the selection of CHO as an ideal expression system, inconsistencies have been observed that can potentially influence recombinant glycoprotein potency. CHO cells are known to incorporate Neu5Gc, a form of sialic acid not synthesized by humans and a known immunogenic epitope [132]. The inclusion of Neu5Gc is a potential concern for regulatory agencies, given evidence that Neu5Gc acts as an oncofetal antigen [132, 133]. From an industrial perspective, CHO cells lack GnT-III, the glycosyltransferase responsible for adding a bisecting GlcNAc moiety [134]. In terms of therapeutic activity, the bisecting GlcNAc has been shown to play a significant role in suppressing core α(1,6)-fucosylation of IgG glycans [135, 136]. Recombinant antibodies used to treat various cancers rely on their ability to engage NK cells, which target and eliminate virally infected and tumorigenic cells [137]. Communication between IgG and NK cells is mediated through the Fc receptor FcγRIIIa, the only member of the FcγR family expressed by NK cells [138]. The affinity of IgG for FcγRIIIa is strongly influenced by core fucosylation of the IgG Fc region; removing this monosaccharide results in a 100-fold increase in affinity.
Thus, because CHO cells lack GnT-III, recombinant monoclonal antibodies produced in them are likely to carry core α(1,6)-fucose, with a consequent reduction in antibody-dependent cell-mediated cytotoxicity (ADCC). Nonetheless, CHO cells currently represent the model system for recombinant glycoprotein expression. Murine myeloma cell lines (such as NS0 and Sp2/0) have been used for the expression of a number of glycoprotein therapeutics, including Erbitux®, Remicade®, Synagis® and Zenapax® [127]. While murine expression systems have several advantages, key disadvantages have also been recognized. Principally, murine myeloma cells incorporate the galactose-α(1,3)-galactose disaccharide, a known immunogenic epitope against which 1% of normal human serum IgG is directed [139]. Murine myeloma cell lines also incorporate Neu5Gc, making murine-expressed recombinant glycoproteins a concern not only for the manufacturer but also for regulatory agencies intent on reducing the potential risk of candidate therapeutics. Alternative systems for glycoprotein bioproduction are currently being explored, with efforts largely directed towards insect cells, yeast and transgenic plants. Like murine and CHO cell lines, each of these systems has advantages, including high production yields, the use of chemically defined media and N-linked glycan biosynthetic machinery similar to that observed in humans. Yet despite these attributes, each expression system has disadvantages that make it difficult to select for manufacturing processes (Fig. 1.5).
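Why a 100-fold change in affinity matters can be illustrated with the simple 1:1 binding relation θ = [L]/([L] + K_d), where θ is the fraction of receptors occupied at free ligand concentration [L] and K_d is the dissociation constant. The sketch below uses hypothetical K_d values (1 µM for core-fucosylated IgG and 0.01 µM for its afucosylated counterpart, chosen only to reproduce the 100-fold difference quoted for FcγRIIIa binding) and a hypothetical antibody concentration; it ignores avidity, competition from endogenous serum IgG and other in vivo effects.

```python
def fractional_occupancy(ligand_conc: float, kd: float) -> float:
    """Fraction of receptors bound at equilibrium for simple 1:1 binding.

    Both arguments must be given in the same concentration unit.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc / (ligand_conc + kd)


# Hypothetical dissociation constants in µM; only their 100-fold ratio mirrors the text.
KD_FUCOSYLATED = 1.0
KD_AFUCOSYLATED = KD_FUCOSYLATED / 100

igg_conc = 0.1  # hypothetical antibody concentration in µM

theta_fuc = fractional_occupancy(igg_conc, KD_FUCOSYLATED)
theta_afuc = fractional_occupancy(igg_conc, KD_AFUCOSYLATED)

print(f"core-fucosylated IgG occupancy: {theta_fuc:.3f}")
print(f"afucosylated IgG occupancy:     {theta_afuc:.3f}")
```

At this illustrative concentration the afucosylated antibody occupies about ten times as many receptors. At concentrations well above both K_d values the difference shrinks as both forms approach saturation, so the practical benefit depends on the concentration regime.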
Fig. 1.5 Bioproduction of recombinant glycoproteins exhibits divergence in N-linked biosynthesis. The production of recombinant biotherapeutics relies on eukaryotic expression systems capable of N-linked glycan biosynthesis. CHO, insect, yeast and plant cells are all options for production, but each shows significant differences from human glycosylation. Yeast typically synthesize high-mannose glycans, while insect cells generate largely paucimannosidic structures. Plants tend to incorporate immunogenic epitopes such as β(1,2)-xylose and core α(1,3)-fucose, while CHO cells differ from human cells only in the linkage used for sialic acid capping.
Insect cells have recently become a target for complex recombinant glycoprotein expression because of a number of advantages, including (i) their ability to perform both N- and O-linked glycosylation, (ii) the efficiency of baculovirus-derived expression vectors and (iii) their ability to grow in serum-free media. A key disadvantage of glycoprotein expression in insect cell lines is the incorporation of core α(1,3)-fucose, a cross-reactive carbohydrate determinant (CCD) recognized by IgE in individuals allergic to bee venom [140]. The assembly of complex glycans, specifically the addition of terminal sialic acids, is of paramount interest to the biopharmaceutical industry, because these residues play an important role in determining protein half-life. Insect cells not only lack the ability to galactosylate and sialylate, but also express GlcNAcase, an enzyme that removes terminal β(1,2)-linked GlcNAc moieties from the Man3GlcNAc2 core structure [141]. The activity of this enzyme prevents the processing of complex glycans and instead generates largely paucimannosidic structures. Insect cell lines therefore represent a challenge for the biopharmaceutical industry, principally because of the incorporation of α(1,3)-fucose and the absence of fully complex glycans containing galactose and sialic acid. Like insect cells, yeasts are fast becoming a choice for glycoprotein production, principally because of their culturing advantages, including high cell densities, high protein yields, chemically defined media and extensively characterized N-linked glycan biosynthesis pathways [142]. Yeasts have been used to produce many industrially relevant proteins but have yet to make a noteworthy contribution to protein-based therapeutics, largely because of their characteristic hyper-mannosylation.
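To give a sense of the scale of yeast hyper-mannosylation, including the Man9GlcNAc2 to Man100GlcNAc2 range discussed in this section, the neutral monoisotopic mass of a high-mannose glycan MannGlcNAc2 can be estimated by summing residue masses and adding one water for the free reducing end. The sketch below uses standard monoisotopic residue masses for hexose (162.0528 Da) and N-acetylhexosamine (203.0794 Da); the function name is hypothetical, and the calculation ignores reducing-end labels and derivatization such as permethylation, which would shift the values.

```python
# Standard monoisotopic masses in daltons.
HEXOSE_RESIDUE = 162.05282  # mannose (or any hexose) residue, C6H10O5
HEXNAC_RESIDUE = 203.07937  # N-acetylglucosamine residue, C8H13NO5
WATER = 18.01056            # H2O accounting for the free reducing end
SODIUM_ION = 22.98922       # Na+ (sodium atomic mass minus one electron)


def high_mannose_mass(n_mannose: int, n_glcnac: int = 2) -> float:
    """Neutral monoisotopic mass of an underivatized Man(n)GlcNAc(2) glycan."""
    if n_mannose < 0 or n_glcnac < 0:
        raise ValueError("residue counts must be non-negative")
    return n_mannose * HEXOSE_RESIDUE + n_glcnac * HEXNAC_RESIDUE + WATER


for n in (5, 9, 20, 100):
    neutral = high_mannose_mass(n)
    print(f"Man{n}GlcNAc2: M = {neutral:.2f} Da, [M+Na]+ = {neutral + SODIUM_ION:.2f}")
```

For Man5GlcNAc2 and Man9GlcNAc2 this gives [M+Na]+ values of about 1257.42 and 1905.63, the familiar sodiated masses of these glycans in MALDI spectra, whereas a Man100GlcNAc2 structure would exceed 16 kDa.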
The N-linked biosynthetic machinery of yeast resembles that of humans in terms of processing in the ER; however, the similarities effectively end there. While human cells progress towards complex glycans, yeast diverge towards highly mannosylated structures, reflecting the activity of several mannosyltransferases, including α(1,2)- and α(1,6)-mannosyltransferases. Together these enzymes generate mannosylated structures ranging from Man9GlcNAc2 to upwards of Man100GlcNAc2 [143]. This has obvious implications for protein half-life because of mannose receptors distributed throughout the hematopoietic system [144], which recognize high-mannose structures and target them for elimination. Thus, while yeasts boast numerous attributes as a vehicle for glycoprotein expression, key steps in their N-linked glycan biosynthetic pathway limit their suitability for glycoprotein therapeutic production. Transgenic plants are emerging as an alternative expression system for glycoproteins, notably because they are regarded as a “green” biotechnology and can perform post-translational modifications, including N-linked glycosylation. Recombinant proteins can be directed to the seed endosperm, which facilitates protein purification. Moreover, plants require less investment and lower operating costs, and they avoid contaminants of concern in culture media such as endotoxins, viruses and prions. In terms of glycosylation, two principal factors prevent the selection of transgenic plants as the production system of choice. Firstly, plants encode α(1,3)-fucosyltransferase and β(1,2)-xylosyltransferase (Xyl-T), which are responsible for the transfer of
α(1,3)-fucose and β(1,2)-xylose, respectively. These carbohydrates are known immunogenic epitopes, once again raising concerns about the risks associated with therapeutic proteins expressed in transgenic plants. Secondly, plants encode an α(1,4)-fucosyltransferase that adds α(1,4)-fucose to terminal GlcNAc residues on growing glycans. Together with β(1,3)-linked galactose on the same GlcNAc, this forms the Lewis a trisaccharide antigen [145].
1.4.2 Manipulating Glycosylation for Enhanced Biotherapeutic Function
Notwithstanding the disadvantages of these cell expression systems, researchers have made significant progress in engineering them for improved glycosylation, with the aim of obtaining “humanized” N-linked glycans in bioproduction. Yeast has been the principal target for manipulation, with its N-linked glycan biosynthetic machinery essentially undergoing a complete transformation. This has included the elimination of α(1,2)- and α(1,6)-mannosyltransferases [146]. In their place, several enzymes have been introduced into yeast to enable the synthesis of complex glycans. Enzymes such as α(1,2)-mannosidase [147] and N-acetylglucosaminyltransferase-I (GnT-I) [148] have been successfully cloned into yeast to provide a platform for extending glycans to the complex type. Further extension by the addition of β(1,4)-linked galactose has also been achieved, although this required co-expressing Gal-T with a panel of factors including UDP-galactose epimerase and the UDP-galactose transporter [149]. Sialic acid capping of glycans in yeast has been accomplished by a similar strategy, namely the co-expression of additional factors required for the biosynthesis, transport and transfer of activated sialic acid to the growing glycan [150]. With EPO as a model glycoprotein, modification of the yeast N-linked glycan biosynthetic pathway has yielded EPO with over 90% terminal sialylation, a milestone in “humanizing” glycosylation in yeast. Other cell lines have also shown promise following manipulation of their N-linked glycan biosynthetic machinery. Insect cell lines have been modified to “humanize” glycosylation through the elimination of GlcNAcase and α(1,3)-fucosyltransferase along with the introduction of GnT-I, GalT and α(2,6)-sialyltransferase [151, 152].
This has resulted in the production of fully sialylated complex biantennary structures. Plants are known to incorporate both β(1,2)-xylose and core α(1,3)-fucose in addition to the Lewis a antigen [145, 153]. Elimination of core α(1,3)-fucose has been achieved either by disrupting L-fucose biosynthesis [154] or by expressing GnT-III, which prevents both β(1,2)-xylosylation and α(1,3)-fucosylation through the addition of a bisecting β(1,4)-GlcNAc [155]. Alternatively, disruption of α(1,3)-fucosyltransferase (fucT) and β(1,2)-xylosyltransferase (xylT) has also shown promise in preventing the inclusion of immunogenic carbohydrate epitopes in plants engineered to express recombinant EPO [156].
Continued efforts to modify N-linked glycosylation in popular eukaryotic cell lines will undoubtedly benefit the development of therapeutics with improved biological activity and potency. Perhaps most significant will be the establishment of humanized glycosylation in the cell expression systems that offer the greatest economic opportunity, namely high-yielding, low-risk and low-cost production of highly efficacious products. This stands to benefit not only the biopharmaceutical industry producing the therapy but, ultimately, the patient receiving the treatment. As the discipline of glycoproteomics continues to expand, the biopharmaceutical industry is poised to incorporate new data into the design of novel therapies with optimal efficacy and potency.
1.5 Systems Glycobiology, Glycoproteomics and Glycogenomics in Disease Diagnosis and Pathology
The availability of sensitive quantitative tools for glycan analysis is opening up the possibility of discovering useful biomarkers by analysing total glycomes or disease-associated glycans on individual glycoproteins, either isolated or excised from 2D gels. Disease markers are required for many reasons, for example to achieve an initial diagnosis, to stage a disease process and to determine the response to medication. Altered glycan processing of glycoproteins and glycolipids has been noted in many, if not most, diseases, and combinations of glycan changes are now being tested alongside panels of protein and genomic changes. Many diseases are multi-systemic, and the most effective markers will be discovered through an in-depth understanding of the disease processes. Moreover, to control medication and understand disease progression, it will also become critical to understand how these processes relate to each other within the patient by adopting a systems biology approach. Systems (glyco)biology involves determining the links between one system and another, especially by establishing what happens at the boundaries and what contributes to the “tipping point” at which signalling molecules from one system induce a functional response in another. In glycobiology, this will require in-depth knowledge of the molecules that initiate and operate cell signalling pathways, and of the effects of these pathways on nuclear events such as the transcription of glycan-processing enzymes. Determining how the internal and external environment of the cell, and metabolites, influence protein expression levels and the glycan processing of cell surface and secreted glycoproteins will also be important. Disease-associated glycosylation changes may contribute to pathogenesis and provide potential targets for markers and drugs.
The role of cell surface and secreted glycoproteins in promoting tumour growth and metastasis also requires a systems approach. The challenges of building a relational framework to accommodate data from genomics, proteomics, metabolomics and glycomics cannot be overestimated. Although it is a good place to start, it is not enough to understand how individual reactions take place in vitro, because a living organism is far more complex than a reaction vessel and many additional parameters provide the backdrop for functional interactions. These include
the concentration and geometrical presentation of ligands to enzymes or substrates, their temporal and spatial expression, and their turnover times, which reflect the dynamic situation in vivo, where molecules continually leave and join the interactive space and sometimes compete for multiple receptors or ligands. In a recent study made possible by the development of high-throughput, quantitative HPLC-based glycan analysis, plasma glycans were analyzed in 1008 individuals to evaluate their variability and heritability, as well as the main environmental determinants that affect glycan structures. By combining HPLC analysis of fluorescently labelled glycans with sialidase digestion, the glycans were separated into 33 chromatographic peaks and quantified. A high level of variability was observed, with a median ratio of maximal to minimal values of 6.17, and there were significant age- and gender-specific differences. Heritability estimates for individual glycans varied widely, from very low to very high. Glycome-wide environmental determinants were also detected, with statistically significant effects of variables including diet, smoking and cholesterol levels [157]. Another major breakthrough came from a genome-wide association study (GWAS), in which FUT8 and ESR2 were identified as co-regulators of a biantennary N-linked glycan (GlcNAc2Man3GlcNAc2) on human plasma proteins, showing that genomics and glycomics can be directly linked. This study was carried out in a population of 3,000 people and correlated single nucleotide polymorphisms directly with the levels of specific glycan populations in the plasma glycomes of these individuals [158]. This proof of principle, that glycoanalytical technology is now sufficiently reproducible and high-throughput to enable glycomics and genomics data to be interrogated together, opens the way for glycomics and glycobiology to take their place alongside the other major fields in systems biology.
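The variability statistic quoted for the plasma glycome study can be read as follows: for each chromatographic peak, take the ratio of the largest to the smallest value observed across individuals (always at least 1), then report the median of these ratios over all peaks. The sketch below implements that reading on a tiny invented data set; the peak labels (GP1 to GP3), the values and the function names are hypothetical, and the cited study may have computed the statistic differently.

```python
from statistics import median


def max_min_ratio(values: list[float]) -> float:
    """Ratio of the largest to the smallest abundance of one peak across individuals."""
    if not values or min(values) <= 0:
        raise ValueError("need at least one strictly positive value")
    return max(values) / min(values)


def median_variability(peaks: dict[str, list[float]]) -> float:
    """Median over all peaks of each peak's max/min ratio."""
    return median([max_min_ratio(vals) for vals in peaks.values()])


# Hypothetical relative abundances (% of total area) for three individuals.
peaks = {
    "GP1": [2.0, 4.0, 8.0],    # ratio 4.0
    "GP2": [1.0, 3.0, 6.0],    # ratio 6.0
    "GP3": [5.0, 5.0, 10.0],   # ratio 2.0
}

for name, vals in peaks.items():
    print(f"{name}: max/min = {max_min_ratio(vals):.2f}")
print(f"median ratio: {median_variability(peaks):.2f}")  # median ratio: 4.00
```

Read this way, a median ratio of 6.17 means that, for a typical peak, the individual with the highest level had roughly six times the abundance of the individual with the lowest.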
References
1. Varki A (1993) Biological roles of oligosaccharides: all of the theories are correct. Glycobiology 3:97–130
2. Griffitts JS et al (2005) Glycolipids as receptors for Bacillus thuringiensis crystal toxin. Science 307:922–925
3. Fuster MM, Esko JD (2005) The sweet and sour of cancer: glycans as novel therapeutic targets. Nat Rev Cancer 5:526–542
4. Marek KW, Vijay IK, Marth JD (1999) A recessive deletion in the GlcNAc-1-phosphotransferase gene results in peri-implantation embryonic lethality. Glycobiology 9:1263–1271
5. Matthijs G, Schollen E, Van Schaftingen E, Cassiman JJ, Jaeken J (1998) Lack of homozygotes for the most frequent disease allele in carbohydrate-deficient glycoprotein syndrome type 1A. Am J Hum Genet 62:542–550
6. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8
7. Laine RA (1994) A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05×10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems. Glycobiology 4:759–767
8. Opat AS, van Vliet C, Gleeson PA (2001) Trafficking and localisation of resident Golgi glycosylation enzymes. Biochimie 83:763–773
9. Hossler P, Mulukutla BC, Hu WS (2007) Systems analysis of N-glycan processing in mammalian cells. PLoS ONE 2:e713
10. Srivastava S (2008) Move over proteomics, here comes glycomics. J Proteome Res 7:1799
11. Dove A (2001) The bittersweet promise of glycobiology. Nat Biotechnol 19:913–917
12. Schachter H et al (2002) Functional post-translational proteomics approach to study the role of N-glycans in the development of Caenorhabditis elegans. Biochem Soc Symp (69):1–21
13. Raman R, Raguram S, Venkataraman G, Paulson JC, Sasisekharan R (2005) Glycomics: an integrated systems approach to structure-function relationships of glycans. Nat Methods 2:817–824
14. Varki A (1999) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
15. North SJ et al (2010) Glycomics profiling of Chinese hamster ovary (CHO) cell glycosylation mutants reveals N-glycans of a novel size and complexity. J Biol Chem 285:5759–5775
16. Schroder M, Kaufman RJ (2005) The mammalian unfolded protein response. Annu Rev Biochem 74:739–789
17. Menon KM, Jaffe RB (1973) Chorionic gonadotropin sensitive adenylate cyclase in human term placenta. J Clin Endocrinol Metab 36:1104–1109
18. Erbayraktar S et al (2003) Asialoerythropoietin is a nonerythropoietic cytokine with broad neuroprotective activity in vivo. Proc Natl Acad Sci USA 100:6741–6746
19. Takeuchi M, Kobata A (1991) Structures and functional roles of the sugar chains of human erythropoietins. Glycobiology 1:337–346
20. Chandrasekaran A, Srinivasan A, Raman R, Viswanathan K, Raguram S, Tumpey TM, Sasisekharan V, Sasisekharan R (2008) Glycan topology determines human adaptation of avian H5N1 virus hemagglutinin. Nat Biotechnol 26:107–113
21. Wada Y et al (2010) Comparison of methods for profiling O-glycosylation: HUPO Human Disease Glycomics/Proteome Initiative Multi-Institutional Study of IgA1. Mol Cell Proteomics 9:719–727
22. Tissot B et al (2009) Glycoproteomics: past, present and future. FEBS Lett 583:1728–1735
23. Tackenberg B, Jelcic I, Baerenwaldt A, Oertel WH, Sommer N, Nimmerjahn F, Lunemann JD (2009) Impaired inhibitory Fcgamma receptor IIB expression on B cells in chronic inflammatory demyelinating polyneuropathy. Proc Natl Acad Sci USA 106:4788–4792
24. An HJ, Froehlich JW, Lebrilla CB (2009) Determination of glycosylation sites and site-specific heterogeneity in glycoproteins. Curr Opin Chem Biol 13:421–426
25. Saldova R et al (2007) Ovarian cancer is associated with changes in glycosylation in both acute-phase proteins and IgG. Glycobiology 17:1344–1356
26. Harvey DJ (2005) Fragmentation of negative ions from carbohydrates: part 1. Use of nitrate and other anionic adducts for the production of negative ion electrospray spectra from N-linked carbohydrates. J Am Soc Mass Spectrom 16:622–630
27. Harvey DJ (2005) Fragmentation of negative ions from carbohydrates: part 2. Fragmentation of high-mannose N-linked glycans. J Am Soc Mass Spectrom 16:631–646
28. Harvey DJ (2005) Fragmentation of negative ions from carbohydrates: part 3. Fragmentation of hybrid and complex N-linked glycans. J Am Soc Mass Spectrom 16:647–659
29. Prien JM, Ashline DJ, Lapadula AJ, Zhang H, Reinhold VN (2009) The high mannose glycans from bovine ribonuclease B isomer characterization by ion trap MS. J Am Soc Mass Spectrom 20:539–556
30. Hanneman AJ, Rosa JC, Ashline D, Reinhold VN (2006) Isomer and glycomer complexities of core GlcNAcs in Caenorhabditis elegans. Glycobiology 16:874–890
31. Ashline D, Singh S, Hanneman A, Reinhold V (2005) Congruent strategies for carbohydrate sequencing. 1. Mining structural details by MSn. Anal Chem 77:6250–6262
32. Prien JM, Huysentruyt LC, Ashline DJ, Lapadula AJ, Seyfried TN, Reinhold VN (2008) Differentiating N-linked glycan structural isomers in metastatic and nonmetastatic tumour cells using sequential mass spectrometry. Glycobiology 18:353–366
33. Zaia J (2008) Mass spectrometry and the emerging field of glycomics. Chem Biol 15:881–892
34. Wheeler SF, Domann P, Harvey DJ (2009) Derivatization of sialic acids for stabilization in matrix-assisted laser desorption/ionization mass spectrometry and concomitant differentiation of alpha(2 → 3)- and alpha(2 → 6)-isomers. Rapid Commun Mass Spectrom 23:303–312
35. Ashline DJ, Lapadula AJ, Liu YH, Lin M, Grace M, Pramanik B, Reinhold VN (2007) Carbohydrate structural isomers analyzed by sequential mass spectrometry. Anal Chem 79:3830–3842
36. Royle L et al (2008) HPLC-based analysis of serum N-glycans on a 96-well plate platform with dedicated database software. Anal Biochem 376:1–12
37. Guile GR, Rudd PM, Wing DR, Prime SB, Dwek RA (1996) A rapid high-resolution high-performance liquid chromatographic method for separating glycan mixtures and analyzing oligosaccharide profiles. Anal Biochem 240:210–226
38. Anumula KR (2006) Advances in fluorescence derivatization methods for high-performance liquid chromatographic analysis of glycoprotein carbohydrates. Anal Biochem 350:1–23
39. Anumula KR (2000) High-sensitivity and high-resolution methods for glycoprotein analysis. Anal Biochem 283:17–26
40. Nimmerjahn A (2009) Astrocytes going live: advances and challenges. J Physiol 587:1639–1647
41. Jacobs JM, Adkins JN, Qian WJ, Liu T, Shen Y, Camp DG 2nd, Smith RD (2005) Utilizing human blood plasma for proteomic biomarker discovery. J Proteome Res 4:1073–1085
42. Gorg A, Weiss W, Dunn MJ (2004) Current two-dimensional electrophoresis technology for proteomics. Proteomics 4:3665–3685
43. Unlu M, Morgan ME, Minden JS (1997) Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 18:2071–2077
44. Byrne JC et al (2009) 2D-DIGE as a strategy to identify serum markers for the progression of prostate cancer. J Proteome Res 8:942–957
45. Hart C, Schulenberg B, Steinberg TH, Leung WY, Patton WF (2003) Detection of glycoproteins in polyacrylamide gels and on electroblots using Pro-Q Emerald 488 dye, a fluorescent periodate Schiff-base stain. Electrophoresis 24:588–598
46. Partridge EA et al (2004) Regulation of cytokine receptors by Golgi N-glycan processing and endocytosis. Science 306:120–124
47. Yoshimura M, Ihara Y, Matsuzawa Y, Taniguchi N (1996) Aberrant glycosylation of E-cadherin enhances cell-cell binding to suppress metastasis. J Biol Chem 271:13811–13815
48. Yamashita K, Ohkura T, Tachibana Y, Takasaki S, Kobata A (1984) Comparative study of the oligosaccharides released from baby hamster kidney cells and their polyoma transformant by hydrazinolysis. J Biol Chem 259:10834–10840
49. Pierce M, Arango J (1986) Rous sarcoma virus-transformed baby hamster kidney cells express higher levels of asparagine-linked tri- and tetraantennary glycopeptides containing [GlcNAc-beta (1,6)Man-alpha (1,6)Man] and poly-N-acetyllactosamine sequences than baby hamster kidney cells. J Biol Chem 261:10772–10777
50. Granovsky M, Fata J, Pawling J, Muller WJ, Khokha R, Dennis JW (2000) Suppression of tumour growth and metastasis in Mgat5-deficient mice. Nat Med 6:306–312
51. Tsui KH, Chang PL, Feng TH, Chung LC, Sung HC, Juang HH (2008) Evaluating the function of matriptase and N-acetylglucosaminyltransferase V in prostate cancer metastasis. Anticancer Res 28:1993–1999
52. Fernandes B, Sagman U, Auger M, Demetrio M, Dennis JW (1991) Beta 1-6 branched oligosaccharides as a marker of tumour progression in human breast and colon neoplasia. Cancer Res 51:718–723
53. Seelentag WK, Li WP, Schmitz SF, Metzger U, Aeberhard P, Heitz PU, Roth J (1998) Prognostic value of beta1,6-branched oligosaccharides in human colorectal carcinoma. Cancer Res 58:5559–5564
1
Glycoproteomics in Health and Disease
54. Saito T, Miyoshi E, Sasai K, Nakano N, Eguchi H, Honke K, Taniguchi N (2002) A secreted type of beta 1,6-N-acetylglucosaminyltransferase V (GnT-V) induces tumour angiogenesis without mediation of glycosylation: a novel function of GnT-V distinct from the original glycosyltransferase activity. J Biol Chem 277:17002–17008
55. Saito H et al (1994) cDNA cloning and chromosomal mapping of human N-acetylglucosaminyltransferase V. Biochem Biophys Res Commun 198:318–327
56. Cornil I, Kerbel RS, Dennis JW (1990) Tumour cell surface beta 1-4-linked galactose binds to lectin(s) on microvascular endothelial cells and contributes to organ colonization. J Cell Biol 111:773–781
57. Sato Y et al (2001) Overexpression of N-acetylglucosaminyltransferase III enhances the epidermal growth factor-induced phosphorylation of ERK in HeLaS3 cells by up-regulation of the internalization rate of the receptors. J Biol Chem 276:11956–11962
58. Kumamoto K, Goto Y, Sekikawa K, Takenoshita S, Ishida N, Kawakita M, Kannagi R (2001) Increased expression of UDP-galactose transporter messenger RNA in human colon cancer tissues and its implication in synthesis of Thomsen-Friedenreich antigen and sialyl Lewis A/X determinants. Cancer Res 61:4620–4627
59. Zipin A et al (2004) Tumour-microenvironment interactions: the fucose-generating FX enzyme controls adhesive properties of colorectal cancer cells. Cancer Res 64:6571–6578
60. Moriwaki K, Noda K, Nakagawa T, Asahi M, Yoshihara H, Taniguchi N, Hayashi N, Miyoshi E (2007) A high expression of GDP-fucose transporter in hepatocellular carcinoma is a key factor for increases in fucosylation. Glycobiology 17:1311–1320
61. Shields RL, Lai J, Keck R, O’Connell LY, Hong K, Meng YG, Weikert SH, Presta LG (2002) Lack of fucose on human IgG1 N-linked oligosaccharide improves binding to human Fcgamma RIII and antibody-dependent cellular toxicity. J Biol Chem 277:26733–26740
62. Taketa K et al (1993) A collaborative study for the evaluation of lectin-reactive alpha-fetoproteins in early detection of hepatocellular carcinoma. Cancer Res 53:5419–5423
63. Wang X, Gu J, Ihara H, Miyoshi E, Honke K, Taniguchi N (2006) Core fucosylation regulates epidermal growth factor receptor-mediated intracellular signaling. J Biol Chem 281:2572–2577
64. Osumi D et al (2009) Core fucosylation of E-cadherin enhances cell-cell adhesion in human colon carcinoma WiDr cells. Cancer Sci 100:888–895
65. Crocker PR, Varki A (2001) Siglecs, sialic acids and innate immunity. Trends Immunol 22:337–342
66. Feizi T (2000) Carbohydrate-mediated recognition systems in innate immunity. Immunol Rev 173:79–88
67. Kudo T, Ikehara Y, Togayachi A, Morozumi K, Watanabe M, Nakamura M, Nishihara S, Narimatsu H (1998) Up-regulation of a set of glycosyltransferase genes in human colorectal cancer. Lab Invest 78:797–811
68. Imai Y, Lasky LA, Rosen SD (1992) Further characterization of the interaction between L-selectin and its endothelial ligands. Glycobiology 2:373–381
69. Sperandio M, Gleissner CA, Ley K (2009) Glycosylation in immune cell trafficking. Immunol Rev 230:97–113
70. Kannagi R (2001) Transcriptional regulation of expression of carbohydrate ligands for cell adhesion molecules in the selectin family. Adv Exp Med Biol 491:267–278
71. Miyahara R, Tanaka F, Nakagawa T, Matsuoka K, Isii K, Wada H (2001) Expression of neural cell adhesion molecules (polysialylated form of neural cell adhesion molecule and L1-cell adhesion molecule) on resected small cell lung cancer specimens: in relation to proliferation state. J Surg Oncol 77:49–54
72. Tanaka F et al (2000) Expression of polysialic acid and STX, a human polysialyltransferase, is correlated with tumour progression in non-small cell lung cancer. Cancer Res 60:3072–3080
73. Narita T, Funahashi H, Satoh Y, Watanabe T, Sakamoto J, Takagi H (1993) Association of expression of blood group-related carbohydrate antigens with prognosis in breast cancer. Cancer 71:3044–3053
74. Nakagoe T et al (2000) Expression of Lewis(a), sialyl Lewis(a), Lewis(x) and sialyl Lewis(x) antigens as prognostic factors in patients with colorectal cancer. Can J Gastroenterol 14:753–760
75. Ogawa J-I, Sano A, Inoue H, Koide S (1995) Expression of Lewis-related antigen and prognosis in stage I non-small cell lung cancer. Ann Thorac Surg 59:412–415
76. Saldova R, Wormald MR, Dwek RA, Rudd PM (2008) Glycosylation changes on serum glycoproteins in ovarian cancer may contribute to disease pathogenesis. Dis Markers 25:219–232
77. Arnold JN, Saldova R, Abd Hamid UM, Rudd PM (2008) Evaluation of the serum N-linked glycome for the diagnosis of cancer and chronic inflammation. Proteomics 8:3284–3293
78. Gornik O, Lauc G (2008) Glycosylation of serum proteins in inflammatory diseases. Dis Markers 25:267–278
79. Schmid K, Nimberg RB, Kimura A, Yamaguchi H, Binette JP (1977) The carbohydrate units of human plasma alpha1-acid glycoprotein. Biochim Biophys Acta 492:291–302
80. Laine A, Hayem A (1981) Purification and characterization of alpha 1-antichymotrypsin from human pleural fluid and human serum. Biochim Biophys Acta 668:429–438
81. Malhotra R, Wormald MR, Rudd PM, Fischer PB, Dwek RA, Sim RB (1995) Glycosylation changes of IgG associated with rheumatoid arthritis can activate complement via the mannose-binding protein. Nat Med 1:237–243
82. Umana P, Jean-Mairet J, Moudry R, Amstutz H, Bailey JE (1999) Engineered glycoforms of an antineuroblastoma IgG1 with optimized antibody-dependent cellular cytotoxic activity. Nat Biotechnol 17:176–180
83. Kaneko Y, Nimmerjahn F, Ravetch JV (2006) Anti-inflammatory activity of immunoglobulin G resulting from Fc sialylation. Science 313:670–673
84. Axford JS, Sumar N, Alavi A, Isenberg DA, Young A, Bodman KB, Roitt IM (1992) Changes in normal glycosylation mechanisms in autoimmune rheumatic disease. J Clin Invest 89:1021–1031
85. Omtvedt LA, Royle L, Husby G, Sletten K, Radcliffe CM, Harvey DJ, Dwek RA, Rudd PM (2006) Glycan analysis of monoclonal antibodies secreted in deposition disorders indicates that subsets of plasma cells differentially process IgG glycans. Arthritis Rheum 54:3433–3440
86. Kanoh Y, Mashiko T, Danbara M, Takayama Y, Ohtani S, Egawa S, Baba S, Akahoshi T (2004) Changes in serum IgG oligosaccharide chains with prostate cancer progression. Anticancer Res 24:3135–3139
87. Kanoh Y, Mashiko T, Danbara M, Takayama Y, Ohtani S, Imasaki T, Abe T, Akahoshi T (2004) Analysis of the oligosaccharide chain of human serum immunoglobulin G in patients with localized or metastatic cancer. Oncology 66:365–370
88. Parekh RB et al (1985) Association of rheumatoid arthritis and primary osteoarthritis with changes in the glycosylation pattern of total serum IgG. Nature 316:452–457
89. Holland M et al (2002) Hypogalactosylation of serum IgG in patients with ANCA-associated systemic vasculitis. Clin Exp Immunol 129:183–190
90. Matsumoto A, Shikata K, Takeuchi F, Kojima N, Mizuochi T (2000) Autoantibody activity of IgG rheumatoid factor increases with decreasing levels of galactosylation and sialylation. J Biochem 128:621–628
91. Jacobs JM, Adkins JN, Qian W-J, Liu T, Shen Y, Camp DG, Smith RD (2005) Utilizing human blood plasma for proteomic biomarker discovery. J Proteome Res 4:1073–1085
92. Oesterling JE (1991) Prostate specific antigen: a critical assessment of the most useful tumour marker for adenocarcinoma of the prostate. J Urol 145:907–923
93. Peracaula R, Tabares G, Royle L, Harvey DJ, Dwek RA, Rudd PM, de Llorens R (2003) Altered glycosylation pattern allows the distinction between prostate-specific antigen (PSA) from normal and tumour origins. Glycobiology 13:457–470
94. Ohyama C, Hosono M, Nitta K, Oh-eda M, Yoshikawa K, Habuchi T, Arai Y, Fukuda M (2004) Carbohydrate structure and differential binding of prostate specific antigen to Maackia amurensis lectin between prostate cancer and benign prostate hypertrophy. Glycobiology 14:671–679
95. Meany DL, Zhang Z, Sokoll LJ, Zhang H, Chan DW (2008) Glycoproteomics for prostate cancer detection: changes in serum PSA glycosylation patterns. J Proteome Res 8:613–619
96. Weickmann JL, Olson EM, Glitz DG (1984) Immunological assay of pancreatic ribonuclease in serum as an indicator of pancreatic cancer. Cancer Res 44:1682–1687
97. Barrabes S et al (2007) Glycosylation of serum ribonuclease 1 indicates a major endothelial origin and reveals an increase in core fucosylation in pancreatic cancer. Glycobiology 17:388–400
98. Varma R, Hoshino AY (1980) Serum glycoproteins in schizophrenia. Carbohydr Res 82:343–351
99. Delves PJ (1998) The role of glycosylation in autoimmune disease. Autoimmunity 27:239–253
100. Raju TS (2008) Terminal sugars of Fc glycans influence antibody effector functions of IgGs. Curr Opin Immunol 20:471–478
101. Nezlin R, Ghetie V (2004) Interactions of immunoglobulins outside the antigen-combining site. Adv Immunol 82:155–215
102. Mullinax F, Mullinax GL (1975) Abnormality of IgG structure in rheumatoid arthritis and systemic lupus erythematosus. Arthritis Rheum 18:417–418
103. van Zeben D, Rook GA, Hazes JM, Zwinderman AH, Zhang Y, Ghelani S, Rademacher TW, Breedveld FC (1994) Early agalactosylation of IgG is associated with a more progressive disease course in patients with rheumatoid arthritis: results of a follow-up study. Br J Rheumatol 33:36–43
104. Bauer D, Haroutunian V, Meador-Woodruff JH, McCullumsmith RE (2009) Abnormal glycosylation of EAAT1 and EAAT2 in prefrontal cortex of elderly patients with schizophrenia. Schizophr Res 117:92–98
105. Yazawa S, Tanaka S, Nishimura T, Miyanaga K, Kochibe N (1999) Plasma alpha1,3-fucosyltransferase deficiency in schizophrenia. Exp Clin Immunogenet 16:125–130
106. Maguire TM, Thakore J, Dinan TG, Hopwood S, Breen KC (1997) Plasma sialyltransferase levels in psychiatric disorders as a possible indicator of HPA axis function. Biol Psychiatry 41:1131–1136
107. Haeuptle MA, Hennet T (2009) Congenital disorders of glycosylation: an update on defects affecting the biosynthesis of dolichol-linked oligosaccharides. Hum Mutat 30:1628–1641
108. Freeze HH, Westphal V (2001) Balancing N-linked glycosylation to avoid disease. Biochimie 83:791–799
109. Eklund EA, Freeze HH (2006) The congenital disorders of glycosylation: a multifaceted group of syndromes. NeuroRx 3:254–263
110. Prietsch V et al (2002) A new case of CDG-x with stereotyped dystonic hand movements and optic atrophy. J Inherit Metab Dis 25:126–130
111. Mahant S, Feigenbaum A (2006) A child with an underrecognized form of developmental delay: a congenital disorder of glycosylation. CMAJ 175:1369
112. Kjaergaard S, Schwartz M, Skovby F (2001) Congenital disorder of glycosylation type Ia (CDG-Ia): phenotypic spectrum of the R141H/F119L genotype. Arch Dis Child 85:236–239
113. Schollen E, Kjaergaard S, Martinsson T, Vuillaumier-Barrot S, Dunoe M, Keldermans L, Seta N, Matthijs G (2004) Increased recurrence risk in congenital disorders of glycosylation type Ia (CDG-Ia) due to a transmission ratio distortion. J Med Genet 41:877–880
114. Schollen E, Kjaergaard S, Legius E, Schwartz M, Matthijs G (2000) Lack of Hardy-Weinberg equilibrium for the most prevalent PMM2 mutation in CDG-Ia (congenital disorders of glycosylation type Ia). Eur J Hum Genet 8:367–371
115. Matthijs G et al (2000) Mutations in PMM2 that cause congenital disorders of glycosylation, type Ia (CDG-Ia). Hum Mutat 16:386–394
116. de Lonlay P et al (2001) A broad spectrum of clinical presentations in congenital disorders of glycosylation I: a series of 26 cases. J Med Genet 38:14–19
117. Fletcher JM (2006) Screening for lysosomal storage disorders – a clinical perspective. J Inherit Metab Dis 29:405–408
118. Beck M (2007) New therapeutic options for lysosomal storage disorders: enzyme replacement, small molecules and gene therapy. Hum Genet 121:1–22
119. Elstein D, Zimran A (2009) Review of the safety and efficacy of imiglucerase treatment of Gaucher disease. Biologics 3:407–417
120. Schiffmann R et al (2000) Infusion of alpha-galactosidase A reduces tissue globotriaosylceramide storage in patients with Fabry disease. Proc Natl Acad Sci USA 97:365–370
121. Van den Hout JM et al (2004) Long-term intravenous treatment of Pompe disease with recombinant human alpha-glucosidase from milk. Pediatrics 113:e448–e457
122. Pastores GM (2008) Laronidase (Aldurazyme): enzyme replacement therapy for mucopolysaccharidosis type I. Expert Opin Biol Ther 8:1003–1009
123. Clarke LA (2008) Idursulfase for the treatment of mucopolysaccharidosis II. Expert Opin Pharmacother 9:311–317
124. Hopwood JJ, Bate G, Kirkpatrick P (2006) Galsulfase. Nat Rev Drug Discov 5:101–102
125. Aggarwal S (2009) What’s fueling the biotech engine-2008. Nat Biotechnol 27:987–993
126. Dennis L et al (2005) Functional analysis of the Campylobacter jejuni N-linked protein glycosylation pathway. Mol Microbiol 55:1695–1703
127. Hossler P, Khattak SF, Li ZJ (2009) Optimal and consistent protein glycosylation in mammalian cell culture. Glycobiology 19:936–949
128. Figueroa B Jr, Ailor E, Osborne D, Hardwick JM, Reff M, Betenbaugh MJ (2007) Enhanced cell culture performance using inducible anti-apoptotic genes E1B-19 K and Aven in the production of a monoclonal antibody with Chinese hamster ovary cells. Biotechnol Bioeng 97:877–892
129. Lee YY, Wong KT, Tan J, Toh PC, Mao Y, Brusic V, Yap MG (2009) Overexpression of heat shock proteins (HSPs) in CHO cells for extended culture viability and improved recombinant protein production. J Biotechnol 143:34–43
130. Peng RW, Fussenegger M (2009) Molecular engineering of exocytic vesicle traffic enhances the productivity of Chinese hamster ovary cells. Biotechnol Bioeng 102:1170–1181
131. Peng RW, Guetg C, Tigges M, Fussenegger M (2010) The vesicle-trafficking protein munc18b increases the secretory capacity of mammalian cells. Metab Eng 12:18–25
132. Muchmore EA, Milewski M, Varki A, Diaz S (1989) Biosynthesis of N-glycolylneuraminic acid. The primary site of hydroxylation of N-acetylneuraminic acid is the cytosolic sugar nucleotide pool. J Biol Chem 264:20216–20223
133. Tangvoranuntakul P, Gagneux P, Diaz S, Bardor M, Varki N, Varki A, Muchmore E (2003) Human uptake and incorporation of an immunogenic nonhuman dietary sialic acid. Proc Natl Acad Sci USA 100:12045–12050
134. Bergwerff AA, Stroop CJ, Murray B, Holtorf AP, Pluschke G, Van Oostrum J, Kamerling JP, Vliegenthart JF (1995) Variation in N-linked carbohydrate chains in different batches of two chimeric monoclonal IgG1 antibodies produced by different murine SP2/0 transfectoma cell subclones. Glycoconj J 12:318–330
135. Umana P, Jean-Mairet J, Moudry R, Amstutz H, Bailey JE (1999) Engineered glycoforms of an antineuroblastoma IgG1 with optimized antibody-dependent cellular cytotoxic activity. Nat Biotechnol 17:176–180
136. Ferrara C, Brünker P, Suter T, Moser S, Püntener U, Umaña P (2006) Modulation of therapeutic antibody effector functions by glycosylation engineering: influence of Golgi enzyme localization domain and co-expression of heterologous beta1,4-N-acetylglucosaminyltransferase III and Golgi alpha-mannosidase II. Biotechnol Bioeng 93:851–861
137. Trinchieri G (1989) Biology of natural killer cells. Adv Immunol 47:187–376
138. Nimmerjahn F, Ravetch JV (2008) Fcgamma receptors as regulators of immune responses. Nat Rev Immunol 8:34–47
139. Galili U (1989) Abnormal expression of alpha-galactosyl epitopes in man. A trigger for autoimmune processes? Lancet 2:358–361
140. Bencurova M, Hemmer W, Focke-Tejkl M, Wilson IB, Altmann F (2004) Specificity of IgG and IgE antibodies against plant and insect glycoprotein glycans determined with artificial glycoforms of human transferrin. Glycobiology 14:457–466
141. Geisler C, Aumiller JJ, Jarvis DL (2008) A fused lobes gene encodes the processing beta-N-acetylglucosaminidase in Sf9 cells. J Biol Chem 283:11330–11339
142. Lehle L, Strahl S, Tanner W (2006) Protein glycosylation, conserved from yeast to man: a model organism helps elucidate congenital human diseases. Angew Chem Int Ed Engl 45:6802–6818
143. Wildt S, Gerngross TU (2005) The humanization of N-glycosylation pathways in yeast. Nat Rev Microbiol 3:119–128
144. Varki A (2009) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
145. Fitchette-Laine AC et al (1997) N-glycans harboring the Lewis a epitope are expressed at the surface of plant cells. Plant J 12:1411–1417
146. Chiba Y, Suzuki M, Yoshida S, Yoshida A, Ikenaga H, Takeuchi M, Jigami Y, Ichishima E (1998) Production of human compatible high mannose-type (Man5GlcNAc2) sugar chains in Saccharomyces cerevisiae. J Biol Chem 273:26298–26304
147. Herscovics A (2001) Structure and function of Class I alpha 1,2-mannosidases involved in glycoprotein synthesis and endoplasmic reticulum quality control. Biochimie 83:757–762
148. Choi BK et al (2003) Use of combinatorial genetic libraries to humanize N-linked glycosylation in the yeast Pichia pastoris. Proc Natl Acad Sci USA 100:5022–5027
149. Bobrowicz P et al (2004) Engineering of an artificial glycosylation pathway blocked in core oligosaccharide assembly in the yeast Pichia pastoris: production of complex humanized glycoproteins with terminal galactose. Glycobiology 14:757–766
150. Hamilton SR et al (2006) Humanization of yeast to produce complex terminally sialylated glycoproteins. Science 313:1441–1443
151. Hollister JR, Jarvis DL (2001) Engineering lepidopteran insect cells for sialoglycoprotein production by genetic transformation with mammalian β1,4-galactosyltransferase and α2,6-sialyltransferase genes. Glycobiology 11:1–9
152. Hollister J, Grabenhorst E, Nimtz M, Conradt H, Jarvis DL (2002) Engineering the protein N-glycosylation pathway in insect cells for production of biantennary, complex N-glycans. Biochemistry 41:15093–15104
153. Bardor M, Faveeuw C, Fitchette AC, Gilbert D, Galas L, Trottein F, Faye L, Lerouge P (2003) Immunoreactivity in mammals of two typical plant glyco-epitopes, core alpha(1,3)-fucose and core xylose. Glycobiology 13:427–434
154. Rayon C, Cabanes-Macheteau M, Loutelier-Bourhis C, Salliot-Maire I, Lemoine J, Reiter WD, Lerouge P, Faye L (1999) Characterization of N-glycans from Arabidopsis. Application to a fucose-deficient mutant. Plant Physiol 119:725–734
155. Frey AD, Karg SR, Kallio PT (2009) Expression of rat β(1,4)-N-acetylglucosaminyltransferase III in Nicotiana tabacum remodels the plant-specific N-glycosylation. Plant Biotechnol J 7:33–48
156. Weise A, Altmann F, Rodriguez-Franco M, Sjoberg ER, Bäumer W, Launhardt H, Kietzmann M, Gorr G (2007) High-level expression of secreted complex glycosylated recombinant human erythropoietin in the Physcomitrella Δfuc-t Δxyl-t mutant. Plant Biotechnol J 5:389–401
157. Knezevic A et al (2009) Variability, heritability and environmental determinants of human plasma N-glycome. J Proteome Res 8:694–701
158. Lauc G, Huffman J, Hayward C, Knezevic A, Polasek O, Gornik O, Vitart V, Kolcic I, Biloglav Z, Zgaga L, Hastie N, Wright A, Campbell H, Rudd P, Rudan I (2009) Genome-wide association study identifies FUT8 and ESR2 as co-regulators of a bi-antennary N-linked glycan A2 (GlcNAc2Man3GlcNAc2) in human plasma proteins. Available from Nature Precedings
Chapter 2
Glyco-engineering of Fc Glycans to Enhance the Biological Functions of Therapeutic IgGs
T. Shantha Raju, David M. Knight, and Robert E. Jordan
Abstract Glycans N-linked to the Fc region of IgGs affect binding to various Fc receptors and to the C1q protein, and are therefore important for IgG effector functions, including ADCC and CDC activities. Fc glycans are highly heterogeneous, and the nature of this variation differs between species. Heterogeneity of Fc glycans arises from the presence or absence of different terminal sugars, including sialic acid, galactose and N-acetylglucosamine, as well as of core fucose and bisecting N-acetylglucosamine. To understand the influence of individual terminal sugar residues on serum half-life and antibody effector functions, it is necessary to prepare homogeneous IgG glycoforms. This chapter describes glyco-engineering strategies to prepare IgG molecules containing homogeneous glycan chains in the Fc region, and their significance in assessing IgG functions. The importance of selecting appropriate in vitro and/or in vivo conditions, including enzymes, buffers and cell culture conditions, to produce recombinant IgGs with homogeneous glycoforms is discussed.

Keywords Immunoglobulins · Antibody · Glycosylation · Glycans · Glycoengineering · Recombinant IgG

Abbreviations
ADCC: antibody dependent cellular cytotoxicity
CDC: complement dependent cytotoxicity
CE-LIF: capillary electrophoresis with laser induced fluorescence detection
CHO: Chinese hamster ovary
CMP-Sia: cytidine monophosphate N-acetylneuraminic acid
DHB: 2,5-dihydroxybenzoic acid
EPO: erythropoietin
ESI-MS: electrospray ionization mass spectrometry
Fuc: fucose
Gal: galactose
GlcNAc: N-acetylglucosamine
GnT-III: N-acetylglucosaminyltransferase-III
IEC: ion exchange chromatography
IgG: immunoglobulin G
MALDI-TOF-MS: matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
Man: mannose
PNGase F: peptide N-glycosidase F
rIgG: recombinant immunoglobulin G
RP-HPLC: reversed phase high-performance liquid chromatography
Sia: sialic acid (N-acetylneuraminic acid)
tPA: tissue plasminogen activator
UDP-Gal: uridine diphosphate galactose
UDP-GlcNAc: uridine diphosphate N-acetylglucosamine
α1,3GT: α-1,3-galactosyltransferase
α2,3ST: α-2,3-sialyltransferase
α2,6ST: α-2,6-sialyltransferase
β1,4GT: β-1,4-galactosyltransferase

T.S. Raju (✉)
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA
e-mail: [email protected]

R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_2, © Springer Science+Business Media B.V. 2011
Contents
2.1 Introduction
2.2 Significance of Fc Glycosylation for IgG Effector Functions
2.3 Impact of Terminal Sugars on the Effector Functions of IgGs
2.3.1 N-acetylglucosamine
2.3.2 Galactose
2.3.3 Sialic Acid
2.4 Impact of Core Sugars on the Effector Functions of IgGs
2.4.1 Bisecting N-acetylglucosamine
2.4.2 Core Fucose
2.4.3 High Mannose Glycans
2.5 Impact of Non-human Glycan Epitopes
2.6 Conclusions
References
2.1 Introduction
IgGs are central components of the immune system of vertebrates [1, 2], and recombinant monoclonal IgG antibodies have become important human therapeutics. Serum IgG molecules are highly water-soluble glycoproteins comprising two identical light chains and two identical heavy chains that are covalently linked through inter-chain disulfide bonds [3, 4]. Each light chain consists of two domains, a variable domain (VL) and a constant domain (CL), whereas each heavy chain comprises four domains, one variable domain (VH) and three constant domains (CH1, CH2 and CH3) [1–4]. The complementarity determining regions (CDRs) of IgG molecules are located in the VL and VH domains [1–5]. The primary amino acid sequence and molecular conformation of the CDRs determine the binding specificity of IgGs for their antigens [4–6]. The VL, VH, CL and CH1 domains together constitute the Fab portion of IgGs, whereas the CH2 and CH3 domains constitute the Fc portion [3, 4]. The Fab and Fc portions of IgGs are linked through a highly flexible hinge region, which in turn covalently connects the two heavy chains through inter-chain disulfide bonds [1–5]. Human IgGs are further subdivided into four isotypes, i.e. IgG1, IgG2, IgG3 and IgG4 [3]. These IgG isotypes differ in the number and locations of the intra- and inter-chain disulfide bonds [3–5]. Variations between the IgG isotypes are also evident in the primary amino acid sequence and in the number of disulfide bonds linking the two heavy chains in the hinge region [3–5]. The length and flexibility of the hinge region vary between IgG isotypes, which in turn affects the relative mobility of the Fab and Fc domains [5]. Hence, the physicochemical and biological properties of the IgG isotypes, and their ability to elicit antibody dependent cellular cytotoxicity (ADCC) and complement dependent cytotoxicity (CDC) responses, differ from each other [1–6]. Human IgG1 is therefore the isotype of choice for designing therapeutic antibodies that require active Fc effector functions, for example Rituxan®. The Fab portion of IgGs is relatively resistant to proteases, whereas the Fc and hinge regions have been shown to be susceptible to proteolysis by several physiologically relevant proteases, including bacterial proteases [7, 8].
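The domain bookkeeping in the paragraph above can be made explicit with a small data model. The following is a minimal Python sketch, not a representation used by the authors: the names `Chain`, `build_igg` and `partition` are hypothetical, and the hinge region and disulfide bonds are deliberately left out. It assigns each domain of a monomeric IgG to the Fab or Fc portion exactly as described in the text.

```python
from dataclasses import dataclass

# Domain names as used in the text; the hinge region is not modelled as a domain.
LIGHT_CHAIN_DOMAINS = ("VL", "CL")
HEAVY_CHAIN_DOMAINS = ("VH", "CH1", "CH2", "CH3")
FAB_DOMAINS = {"VL", "CL", "VH", "CH1"}  # together form the Fab portion
FC_DOMAINS = {"CH2", "CH3"}              # together form the Fc portion
CDR_DOMAINS = {"VL", "VH"}               # domains carrying the CDRs


@dataclass(frozen=True)
class Chain:
    kind: str       # "light" or "heavy"
    domains: tuple  # domain names from N- to C-terminus


def build_igg():
    """Hypothetical helper: two identical light chains and two identical heavy chains."""
    light = Chain("light", LIGHT_CHAIN_DOMAINS)
    heavy = Chain("heavy", HEAVY_CHAIN_DOMAINS)
    return [light, light, heavy, heavy]


def partition(chains):
    """Split every domain of the molecule into its Fab or Fc portion."""
    all_domains = [d for chain in chains for d in chain.domains]
    fab = [d for d in all_domains if d in FAB_DOMAINS]
    fc = [d for d in all_domains if d in FC_DOMAINS]
    return fab, fc


chains = build_igg()
fab, fc = partition(chains)
cdr_domains = [d for chain in chains for d in chain.domains if d in CDR_DOMAINS]
print(f"{len(fab)} Fab domains, {len(fc)} Fc domains, {len(cdr_domains)} CDR-bearing domains")
```

Of the twelve domains in the molecule, eight fall in the two Fab arms and four in the Fc. Because the Fc contains two CH2 domains, one per heavy chain, each IgG carries two copies of the CH2 glycosylation site discussed next.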
Whilst the Fab portion of IgGs is important for antibody-antigen interactions, the Fc portion plays a key role in defining the antibody effector functions and pharmacokinetic properties of IgG molecules [9]. The Fc region of IgGs binds to various cellular Fc receptors, including FcγRIIIa and the neonatal Fc receptor (FcRn) [10, 11], and also to the C1q protein, the first component of the complement cascade [12]. Hence, the Fc region of IgGs is important for ADCC and CDC [1–12]. The binding of Fc to FcRn plays an important role in the pharmacokinetic properties of IgG molecules [13–15]. IgGs are N-glycosylated in the CH2 domain of the Fc at Asn297 (EU numbering) [3, 4, 16]. In human IgG, the majority of the Fc glycans are complex biantennary structures with a high degree of heterogeneity due to the presence or absence of different terminal sugar residues [3–6, 16]. In addition, human IgGs may also contain minor amounts of high-mannose and hybrid structures as part of their Fc glycan repertoire [16, 17]. The relative ratio of these glycans can vary considerably between species [18, 19]. Fc glycosylation has been shown to be required for IgGs to bind to various Fc receptors, including the FcγRIIIa receptor [20], and for antibody binding to the C1q protein [3, 4, 12]. The effector functions of IgG antibodies, such as ADCC and CDC, are therefore dependent on Fc glycosylation. Although it is not yet clear whether Fc glycans affect the binding of IgGs to FcRn, certain terminal sugars have been shown to affect the stability of IgGs and hence may affect the pharmacokinetic properties of these molecules [21, 22].
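N-glycans are attached only at asparagines that lie in the N-X-S/T sequon (Asn, then any residue except Pro, then Ser or Thr). This is a general requirement for N-glycosylation, not a feature specific to IgGs. The minimal Python sketch below locates such sites. It assumes the human IgG1 Fc glycopeptide EEQYNSTYR, in which Asn297, Ser298 and Thr299 form the sequon. The function name `sequon_positions` is hypothetical. Keep in mind that a sequon is necessary but not sufficient: not every sequon is actually glycosylated.

```python
import re

# N-glycosylation sequon: Asn-X-Ser/Thr, where X is any residue except Pro.
# A zero-width lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list:
    """Return 0-based indices of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq)]


wild_type = "EEQYNSTYR"      # IgG1 peptide containing Asn297-Ser298-Thr299
double_mutant = "EEQYNGAYR"  # the same peptide carrying Ser298Gly/Thr299Ala

print(sequon_positions(wild_type))      # the Asn equivalent to Asn297 is found
print(sequon_positions(double_mutant))  # no sequon remains, so no N-glycan can be attached
```

The same check explains why the Ser298Gly/Thr299Ala variant mentioned in Sect. 2.2 is aglycosylated: replacing Thr299 removes the hydroxyl-bearing residue at position +2 of the sequon.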
Glycans present in the Fc region of IgG molecules contain a common core consisting of three mannose (Man) residues and two N-acetylglucosamine (GlcNAc) residues, one of which is linked to the Asn297 residue via an amide bond [17]. This core structure is extended into high mannose, hybrid or complex structures by the addition and/or removal of sugar residues such as Man, GlcNAc (in the antennae), galactose (Gal), core fucose (Fuc), bisecting GlcNAc and sialic acid (Sia) [16, 23, 24]. The structural heterogeneity of the high mannose glycans found in IgGs varies with the species [3, 18, 19]. For example, human IgGs contain small amounts of Man5 and related structures, whereas chicken IgGs (also referred to as IgYs) contain substantial amounts of highly heterogeneous high mannose structures. In addition, chicken serum IgGs (IgYs) contain a completely unprocessed Glc3Man9GlcNAc2 structure that is rarely found in serum glycoproteins of vertebrate origin. Along with high mannose structures, chicken IgGs (IgYs) also contain fully processed complex biantennary structures with core fucose and a bisecting GlcNAc residue [18, 19]. Small amounts of sialylated complex biantennary structures with core fucose and bisecting GlcNAc residues are also found in chicken IgGs (IgYs) [18, 19]. Hence, the glycan repertoire of chicken IgGs (IgYs) contains completely processed and completely unprocessed glycans in almost equal proportions, along with minor amounts of partially processed high mannose structures and sialylated complex glycans. The major glycans found in the Fc region of IgGs from vertebrate species other than human and chicken are also mostly complex biantennary structures [16–19]. Highly branched structures such as tri- or tetraantennary glycans are not found in the Fc region of these IgGs [16].
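Each structure named above is defined by its monosaccharide composition, so its mass can be calculated directly. This is what allows such glycans to be recognised by mass spectrometry. The Python sketch below is illustrative and rests on several assumptions. It uses rounded average atomic masses and standard residue formulas (free sugar minus H2O). It treats the glycan as released, with a free reducing end (residues plus one H2O), and it ignores labels, derivatisation and adduct ions. The function names are hypothetical.

```python
# Average element masses (rounded standard atomic weights).
ATOMIC_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}

# Monosaccharide residue formulas as they occur within a glycan chain (free sugar minus H2O).
RESIDUE_FORMULA = {
    "Hex":    {"C": 6, "H": 10, "O": 5},          # Man, Gal, Glc
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},  # GlcNAc
    "dHex":   {"C": 6, "H": 10, "O": 4},          # Fuc
    "NeuAc":  {"C": 11, "H": 17, "N": 1, "O": 8}, # Sia
}


def formula_mass(formula):
    """Average mass of an elemental formula given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def free_glycan_mass(composition):
    """Average mass of a released glycan with a free reducing end (residues + one H2O)."""
    water = formula_mass({"H": 2, "O": 1})
    residues = sum(formula_mass(RESIDUE_FORMULA[r]) * n for r, n in composition.items())
    return residues + water


core = {"Hex": 3, "HexNAc": 2}       # Man3GlcNAc2 common core
man5 = {"Hex": 5, "HexNAc": 2}       # Man5GlcNAc2
glc3man9 = {"Hex": 12, "HexNAc": 2}  # Glc3Man9GlcNAc2 (Glc and Man are both hexoses)

for name, comp in [("Man3GlcNAc2", core), ("Man5GlcNAc2", man5), ("Glc3Man9GlcNAc2", glc3man9)]:
    print(f"{name}: {free_glycan_mass(comp):.2f} Da")
```

Because glucose and mannose are both hexoses, composition alone cannot distinguish Glc from Man residues. Separating such isomers requires fragmentation or chromatography.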
This is in contrast to serum-derived glycoproteins such as tissue plasminogen activator, erythropoietin and fetuin, in which highly branched glycans, high mannose glycans and hybrid glycans are all typically found [25, 26]. The complex glycans found in the Fc region of serum IgGs from various vertebrate species are therefore remarkably simple: only biantennary complex glycans are present. However, this uniformity is offset by extensive micro-heterogeneity resulting from the presence or absence of different terminal sugars, including Sia, Gal and GlcNAc, and of the core Fuc and bisecting GlcNAc epitopes [3]. Terminal sugar residues are the major structural determinants specifying the interaction of glycoproteins with many sugar-binding proteins, including lectins (carbohydrate-binding proteins) [1–6, 27, 28], and they also affect binding to various receptors. Hence, the micro-heterogeneity of Fc glycans has important consequences for the function of IgGs [27, 28]. Minimising or abolishing the heterogeneity of IgG glycans is therefore relevant not only to understanding the role of individual sugar residues but also to enhancing the biological functions of therapeutic IgG antibodies. In this chapter, different glyco-engineering strategies to prepare and characterize homogeneous IgG glycoforms are discussed.
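A short enumeration shows how quickly this micro-heterogeneity multiplies, even though only biantennary complex glycans are present. The Python sketch below uses a deliberately simplified model that is an assumption, not a result from this chapter. Each antenna terminates in GlcNAc, Gal (on GlcNAc) or Sia (on Gal), and core Fuc and bisecting GlcNAc are each present or absent. Sialic acid linkage isomers, NeuGc, and high mannose and hybrid glycans are ignored. Because the two heavy chains are identical, an Fc is treated as carrying an unordered pair of glycans, one per Asn297.

```python
from itertools import combinations_with_replacement, product

# Simplified model (assumption): outermost residue on each antenna of a biantennary glycan.
ARM_STATES = ("GlcNAc", "Gal", "Sia")
PRESENT_OR_ABSENT = (False, True)


def glycan_structures():
    """All (alpha1,3 arm, alpha1,6 arm, core Fuc, bisecting GlcNAc) combinations."""
    return list(product(ARM_STATES, ARM_STATES, PRESENT_OR_ABSENT, PRESENT_OR_ABSENT))


def composition(structure):
    """Collapse a structure to (n Gal, n Sia, core Fuc, bisecting GlcNAc); arm isomers share a mass."""
    arm3, arm6, core_fuc, bisect = structure
    arms = (arm3, arm6)
    n_gal = sum(arm in ("Gal", "Sia") for arm in arms)
    n_sia = sum(arm == "Sia" for arm in arms)
    return (n_gal, n_sia, core_fuc, bisect)


structures = glycan_structures()
compositions = {composition(s) for s in structures}
# One glycan per heavy chain; the two heavy chains are identical, so pairs are unordered.
fc_glycoforms = list(combinations_with_replacement(structures, 2))

print(len(structures), "structures,", len(compositions), "compositions,",
      len(fc_glycoforms), "Fc glycoforms")
```

Under these assumptions there are 36 distinct glycan structures (24 distinct compositions) and 666 possible Fc glycoforms. This illustrates why the functional contribution of a single sugar residue is difficult to isolate without first preparing homogeneous glycoforms.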
2.2 Significance of Fc Glycosylation for IgG Effector Functions
As discussed in the introduction, IgGs are glycosylated at Asn297 in the CH2 domain of the Fc, and this glycosylation is important for binding of IgGs to Fc receptors [3–6]. Glycosylation is also required for antibody binding to C1q protein, the first
2
Glyco-engineering of Fc Glycans to Enhance the Biological Functions
43
component of the complement cascade [12]. The significance of Fc glycosylation can be examined by comparing the biological activities of glycosylated IgGs with those of enzymatically deglycosylated IgGs or of aglycosylated IgGs prepared using molecular biology techniques [4, 5]. Of these approaches, preparing deglycosylated IgGs using PNGase F (peptide-N-glycosidase F) is easier and more convenient than mutagenesis of the Asn297 residue [21, 22]. Unlike the surface-exposed glycans of serum glycoproteins, the glycans in the CH2 domain of the Fc can be readily removed by treatment with PNGase F under native conditions [21]. The progress of enzymatic deglycosylation can be monitored using MALDI-TOF-MS (matrix-assisted laser desorption/ionization time of flight mass spectrometry) (see Fig. 2.1). Alternatively, aglycosylated IgGs can be produced using site-directed mutagenesis or by expressing IgGs in E. coli [29]. Aglycosylated or deglycosylated IgGs can be purified using traditional protein A
Fig. 2.1 Comparison of MALDI-TOF-MS of glycosylated, deglycosylated and aglycosylated IgGs: (a) Fc-glycosylated recombinant IgG produced in CHO cells; (b) Fc-glycosylated IgG treated with PNGase F to remove the glycans; (c) aglycosylated IgG produced in E. coli. Indicated molecular weights are not corrected and are subject to an error of ±0.1% (+1, singly charged molecular ion; +2, doubly charged molecular ion; +3, triply charged molecular ion)
44
T.S. Raju et al.
chromatography, as both glycosylated and deglycosylated (or aglycosylated) IgGs bind to protein A [21, 29]. Aglycosylated or deglycosylated IgGs have been shown to lack ADCC and CDC activities because they do not bind to the FcγRIIIa receptor or to C1q protein [29]. However, the role of the Fc glycan can be partly substituted by mutating specific amino acid residues adjacent to the Asn297 glycosylation site (Ser298Gly/Thr299Ala), producing aglycosylated IgGs with some Fc receptor binding activity [6, 30]. There is no evidence that Fc glycans are involved in binding of IgGs to the neonatal Fc receptor (FcRn), as glycosylated and deglycosylated IgGs bind equally well to this receptor. Additionally, glycosylated and deglycosylated IgGs have been shown to have similar serum half-lives [30]. In summary, Fc glycosylation of IgGs is required for antibody effector functions but does not seem to be important for their pharmacokinetic properties.
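Deglycosylation can also be checked quantitatively from the intact-mass shift. The Python sketch below is an illustration, not the authors' procedure: it estimates the mass lost when PNGase F removes one glycan from each heavy chain, using the facts that an N-linked glycan adds its summed residue masses to the protein and that PNGase F converts the glycosylated Asn to Asp (about +0.984 Da). The 148 kDa IgG mass and the assumption that both sites carry the same G0F glycan are hypothetical; real IgGs carry a mixture of glycoforms, so an observed shift is an average.

```python
# Average residue masses (Da) of monosaccharides within a glycan chain
# (free monosaccharide mass minus one water).
AVG_RESIDUE = {"Hex": 162.141, "HexNAc": 203.194, "dHex": 146.142, "NeuAc": 291.256}
ASN_TO_ASP = 0.984   # PNGase F leaves Asp in place of the glycosylated Asn
PROTON = 1.00728     # mass of a proton, Da


def glycan_residue_mass(composition: dict) -> float:
    """Mass an N-linked glycan adds to the protein (sum of residue masses)."""
    return sum(AVG_RESIDUE[residue] * count for residue, count in composition.items())


def pngase_f_mass_loss(composition: dict, n_sites: int = 2) -> float:
    """Expected drop in intact IgG mass when PNGase F removes one glycan per site."""
    return n_sites * (glycan_residue_mass(composition) - ASN_TO_ASP)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion seen in positive-ion MALDI-TOF-MS."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    g0f = {"Hex": 3, "HexNAc": 4, "dHex": 1}   # assumed glycan at both sites
    glycosylated = 148_000.0                   # hypothetical intact IgG mass, Da
    loss = pngase_f_mass_loss(g0f)
    deglycosylated = glycosylated - loss
    print(f"expected loss: {loss:.1f} Da")
    for z in (1, 2, 3):
        print(z, round(mz(glycosylated, z), 1), round(mz(deglycosylated, z), 1))
    # A +/-0.1% mass uncertainty corresponds to about +/-148 Da at this mass.
    print(f"0.1% of {glycosylated:.0f} Da = {0.001 * glycosylated:.0f} Da")
```

With G0F at both sites the sketch predicts a loss of roughly 2.9 kDa, well above the ±0.1% (about ±150 Da) uncertainty quoted for Fig. 2.1, so complete deglycosylation should be resolvable at the intact-protein level.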
2.3 Impact of Terminal Sugars on the Effector Functions of IgGs
2.3.1 N-acetylglucosamine
The majority of Fc glycans present in therapeutic antibodies such as Rituxan® (a chimeric recombinant antibody against the CD20 antigen on B-cells), Herceptin® (a humanized antibody against the HER2-neu antigen), Avastin® (a humanized antibody against vascular endothelial growth factor) and Remicade® (a chimeric antibody against tumour necrosis factor) contain terminal GlcNAc residues in their antennae [31]. This is probably due to incomplete galactosylation and sialylation during their biosynthesis [25, 26]. The terminal GlcNAc content of natural human IgGs is known to vary with age and gender [32–39], and since terminal GlcNAc residues in the antennae may affect IgG functions, it may be necessary to prepare homogeneous IgG glycoforms containing terminal GlcNAc residues (G0 glycoforms; see Fig. 2.2 for structure). G0 glycoforms can be prepared in vitro by treating IgGs with sialidase, to remove the minor amounts of terminal sialic acid, followed by β-galactosidase, to remove the exposed terminal Gal residues. The most commonly used sialidase is sialidase A from Arthrobacter ureafaciens, which removes both α2,3- and α2,6-linked sialic acid residues and is active over a wide pH range. Only the β-galactosidase from Diplococcus pneumoniae is able to remove terminal Gal residues from the Fc glycans of IgGs under native conditions [21, 22]; none of the commercially available mammalian β-galactosidases is effective. Several cell culture conditions have also been developed to maximize the terminal GlcNAc content of glycoproteins [40, 41], but such conditions do not produce completely homogeneous preparations of IgG G0 glycoforms. In general, glycoproteins containing terminal GlcNAc residues bind to mannose receptors in the liver [23, 42–50]. As a result, such glycoproteins have a shorter serum half-life than their sialylated counterparts [23, 49, 50].
In contrast, the serum half-life of rIgGs such as Rituxan®, Herceptin®, Avastin®, etc., which
Fig. 2.2 Schematic representation of the structure of the major N-glycans found in recombinant IgGs produced in mammalian cells (NANA, N-acetylneuraminic acid or Sia, sialic acid; Gal, galactose; Man, Mannose; GlcNAc, N-acetylglucosamine; Fuc, Fucose)
contain significant amounts of G0 structures in the Fc, is considerably longer, which is atypical for asialylated serum glycoproteins [47, 48]. Thus, terminal GlcNAc residues in the antennae of Fc glycans may not significantly affect the pharmacokinetic properties of IgGs. However, IgGs with terminal GlcNAc residues in the antennae have been shown to bind mannose binding protein (MBP) present in serum and to activate the alternative complement cascade [44–47]. Further, an increase in terminal GlcNAc content, with the corresponding decrease in terminal Gal content, was shown to reduce binding to C1q protein, which in turn reduces CDC activity [12, 17, 48]. In contrast, G0 content does not appear to affect either the antigen binding or the ADCC activity of IgGs.
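Why the sialidase step comes first can be shown with a toy model of the two exoglycosidase reactions. In this Python sketch (an illustration, not a kinetic model), a Gal residue capped by Sia is treated as inaccessible to β-galactosidase, and each enzyme is assumed to go to completion on the native IgG. The names `AntennaState`, `sialidase_a` and `beta_galactosidase` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AntennaState:
    """Hypothetical summary of the antennae of a biantennary glycan."""
    gal: int  # Gal residues on the two antennae (0-2)
    sia: int  # Sia residues; each one caps a Gal (0..gal)

    def label(self) -> str:
        return f"G{self.gal}" + (f"S{self.sia}" if self.sia else "")


def sialidase_a(glycan: AntennaState) -> AntennaState:
    """Sialidase A removes both alpha2,3- and alpha2,6-linked Sia."""
    return replace(glycan, sia=0)


def beta_galactosidase(glycan: AntennaState) -> AntennaState:
    """Only Gal residues not capped by Sia are exposed, so only those are removed."""
    return replace(glycan, gal=glycan.sia)


if __name__ == "__main__":
    start = AntennaState(gal=2, sia=1)                      # a G2S1 glycan
    print(beta_galactosidase(start).label())                # G1S1
    print(beta_galactosidase(sialidase_a(start)).label())   # G0
```

Under these assumptions, β-galactosidase alone leaves sialic acid-capped Gal in place, whereas sialidase treatment followed by β-galactosidase yields the homogeneous G0 glycoform.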
2.3.2 Galactose
As with terminal GlcNAc residues, exposed terminal Gal residues are present in IgGs because of under-sialylation during biosynthesis [3]. Accordingly, IgGs produced in cell culture or purified from serum may contain one or two terminal Gal residues (G1 or G2 glycoforms; see Fig. 2.2 for glycan structures). Terminal Gal residues of Fc glycans can differentially affect IgG effector functions such as
CDC activity. Hence, it may be necessary to prepare homogeneous G2 glycoforms of IgGs to assess the impact of terminal Gal residues on their functions [48]. This can be achieved either by altering cell culture conditions or by in vitro glyco-engineering [48]. Addition of Gal to cell culture media has been shown to increase the terminal Gal content of IgGs [41, 48]. Overexpression of human β1,4-galactosyltransferase (β-GalT) in cells has also been shown to increase the terminal Gal content of rIgGs [48]. However, these methods do not provide homogeneous G2 glycoforms of rIgGs. Homogeneous G2 glycoforms can easily be prepared by treating IgGs in vitro with β-GalT and UDP-Gal under native conditions [25, 48]. Both human and bovine β-GalTs are capable of transferring Gal from UDP-Gal to the terminal GlcNAc residues of rIgGs [25]. The reaction can be monitored by MALDI-TOF-MS or NP-HPLC (normal phase high performance liquid chromatography), as shown in Figs. 2.3 and 2.4 [21, 22]. The re-galactosylated rIgGs can easily be purified using protein A affinity chromatography [22]. Glycoproteins containing terminal Gal residues are expected to bind to the asialoglycoprotein receptor, which is responsible for the reduced serum half-life of asialoglycoproteins [49, 50]. However, there is currently no evidence in the literature that terminal Gal residues in the Fc significantly affect the pharmacokinetic properties of IgGs. Terminal Gal residues also do not appear to affect the binding of antibodies to antigens. In contrast, terminal Gal content in the Fc does affect the CDC activity of some rIgGs [48]. For example, for Rituximab®, a chimeric recombinant antibody against the CD20 antigen on B-cells, an increase in terminal Gal content has been observed to result in a linear increase in CDC activity [48].
This is because an increase in terminal Gal content increases antibody binding to C1q protein [48]. In contrast, terminal Gal residues do not affect the ADCC activity of Herceptin®. This may be because terminal Gal residues do not appear to affect antibody binding to the FcγRIIIa receptor [48]. The effect of terminal Gal on antibody binding to C1q protein and on CDC activity has been established for homogeneous antibody glycoforms containing two terminal Gal residues (G2 glycoform) as well as for those containing no terminal Gal residues (G0 glycoform) [48]. Serum-derived IgGs and rIgGs contain significant amounts of glycans with one terminal Gal residue (G1 glycans), which are a mixture of two G1 isomers, as shown in Fig. 2.5. The relative proportions of these two G1 isomers vary from species to species [18] but remain constant within a species regardless of variations in culture conditions (see Tables 2.1 and 2.2). At present it is not clear whether the presence or absence of either of these G1 isomers affects antibody function.
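For the MALDI-TOF-MS readout, in vitro galactosylation can be followed as a series of steps of one hexose (about +162 Da). The Python sketch below computes monoisotopic m/z values for singly charged sodium adducts of free, unlabeled glycans, the ion type indicated for Fig. 2.3. The residue masses are standard monoisotopic values, the function name `sodiated_mz` is hypothetical, and the core-fucosylated G0F/G1F/G2F series is an assumed example; the uncorrected values in Fig. 2.3 need not match these to the second decimal place.

```python
# Monoisotopic residue masses (Da) of monosaccharides within a glycan chain.
MONO_RESIDUE = {
    "Hex": 162.052823,     # Man, Gal, Glc
    "HexNAc": 203.079373,  # GlcNAc
    "dHex": 146.057909,    # Fuc
    "NeuAc": 291.095417,   # Sia (N-acetylneuraminic acid)
}
WATER = 18.010565          # released glycans have a free reducing end
SODIUM_ION = 22.989221     # Na atom minus one electron


def sodiated_mz(composition: dict) -> float:
    """Monoisotopic m/z of the singly charged [M+Na]+ ion of a free, unlabeled glycan."""
    neutral = sum(MONO_RESIDUE[r] * n for r, n in composition.items()) + WATER
    return neutral + SODIUM_ION


if __name__ == "__main__":
    # Each beta-GalT step adds one Hex (Gal) residue.
    for gal in range(3):
        composition = {"Hex": 3 + gal, "HexNAc": 4, "dHex": 1}
        print(f"G{gal}F  [M+Na]+ = {sodiated_mz(composition):.2f}")
```

Run as a script, this gives about 1485.53, 1647.59 and 1809.64 for G0F, G1F and G2F, so complete conversion to G2 shows up as the disappearance of the two lower peaks.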
2.3.3 Sialic Acid
Serum glycoproteins contain N-glycans that are surface exposed and often terminated with sialic acid residues [51]. Terminal sialylation increases the in vivo half-life of serum and recombinant glycoproteins [25, 49, 50]. This is because
Fig. 2.3 Positive ion mode MALDI-TOF-MS of glycans released from glycoengineered IgGs: (a) Fc glycans released from control IgG prior to glycoengineering; (b) glycans released from IgG treated with sialidase A and β-galactosidase; (c) glycans released from IgG treated with β1,4GT and UDP-Gal; (d) glycans released from IgG treated with β1,4GT, α1,3GT and UDP-Gal. Glycans were released by treating IgGs with PNGase F, purified by ion-exchange chromatography to remove salts and protein, and analyzed by MALDI-TOF-MS in positive ion mode using sDHB (2,5-dihydroxybenzoic acid) as the matrix. Indicated molecular weights are those of the singly charged ions of the corresponding sodiated glycans and are not corrected.
terminal sialylation prevents binding of serum glycoproteins to the asialoglycoprotein receptor in the liver, which recognizes terminal Gal residues [49, 50]. Fc glycans of IgGs, including rIgGs, contain variable amounts of sialylated glycans [18–20]. An increase in sialylation of Fc glycans results in decreased ADCC activity of rIgGs because terminal sialylation decreases the affinity of these antibodies for the FcγRIIIa receptor [52]. Increased sialylation may also decrease binding to cell surface antigens [52]. In contrast, Fc sialylation has
Fig. 2.4 NP-HPLC profiles of N-glycans released from glycoengineered IgGs: (a) control IgG; (b) IgG treated with sialidase A and β-galactosidase; (c) IgG treated with β1,4GT and UDP-Gal; (d) IgG treated with a mixture of β1,4GT, α2,3-ST, CMP-Sia and UDP-Gal (glycans were released by treating IgGs with PNGase F, labeled with anthranilic acid (AA), purified to remove excess labeling reagent and analyzed by normal phase-IEC)
Fig. 2.5 Structures of the two G1 isomers found in serum-derived and recombinant IgGs. The relative ratio of these two G1 isomers varies from species to species but remains constant within a species and does not appear to vary from lot to lot (see Tables 2.1 and 2.2)
been linked to the increased anti-inflammatory activity of some antibodies [53, 54]. Kaneko et al. [53] used a mouse model of rheumatoid arthritis to demonstrate that sialylated human IgG has greater anti-inflammatory activity than desialylated IgG. Hence, sialylation of Fc glycans may have dual effects, similar
Table 2.1 The relative proportions of the two G1 isomers present in IgGs vary with species [18]
Species       G1α1,6 (%)   G1α1,3 (%)
Cat           50.1         49.9
Cow           23.9         76.1
Dog           62.5         37.5
Goat          30.8         69.2
Guinea pig    54.3         45.7
Horse         43.0         57.0
Human         59.9         40.1
Mouse         75.4         24.6
Rabbit        29.5         70.5
Rat           39.0         61.0
Rhesus        59.3         40.7
Sheep         13.6         86.4
The PNGase F-released glycans of serum IgGs were analyzed by capillary electrophoresis with laser-induced fluorescence detection (CE-LIF) [31], and the relative proportions of the two isomers were calculated from their respective peak areas.
Table 2.2 The relative proportions of the two G1 isomers remain constant within a species
rIgG lot   G0/G2   G1α1,6/G1α1,3
#1         9.3     3.0
#2         8.6     3.1
#3         4.9     3.0
#4         4.2     3.2
#5         6.4     3.1
#6         6.6     3.0
#7         8.3     3.1
#8         7.4     3.2
Different lots of rIgGs produced in CHO cells under variable culture conditions were treated with PNGase F, and the released glycans were analyzed by CE-LIF [31]. Although the G0/G2 ratio varies with the culture conditions, the ratio of the two G1 isomers remains constant.
to the sialylation of serum glycoproteins, where terminal sialylation increases serum half-life but is also involved in host-pathogen interactions that allow viruses and bacteria to enter and kill cells [3, 52–54]. Similarly, in the case of IgGs, Fc sialylation increases anti-inflammatory activity but reduces antibody effector functions and decreases antibody binding to certain cell surface antigens [52–54]. Sialylation of rIgGs can be increased by overexpressing sialyltransferases in mammalian cells [3, 41], or by subjecting rIgGs to combined in vitro galactosylation and sialylation in a single step [25]. Alternatively, sialylated IgGs can be enriched by selective purification using ion-exchange or lectin affinity chromatography [52]. For use as controls, asialylated IgGs can be prepared by treatment with sialidase A, an exoglycosidase with broad specificity for both α2,3- and α2,6-linked sialic acid residues (see Section 2.3.1) [3].
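The footnote to Table 2.1 states that the isomer proportions were calculated from peak areas. A minimal Python sketch of that arithmetic, and of converting a Table 2.2-style ratio back into a percentage, is shown below. The function names are hypothetical, and the calculation assumes that the two isomer peaks are fully resolved and give the same fluorescence response per mole.

```python
def relative_proportions(area_16: float, area_13: float) -> tuple:
    """Percentages of the G1alpha1,6 and G1alpha1,3 isomers from CE-LIF peak areas."""
    total = area_16 + area_13
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * area_16 / total, 100.0 * area_13 / total


def percent_from_ratio(ratio: float) -> float:
    """Convert a G1alpha1,6/G1alpha1,3 ratio into percent G1alpha1,6."""
    return 100.0 * ratio / (1.0 + ratio)


if __name__ == "__main__":
    # Ratios in the range reported for the rIgG lots in Table 2.2.
    for ratio in (3.0, 3.1, 3.2):
        print(ratio, round(percent_from_ratio(ratio), 1))
    # Human IgG in Table 2.1: 59.9% vs 40.1%, a ratio of about 1.5.
    print(round(59.9 / 40.1, 2))
```

On this basis, the constant ratio of about 3.0 to 3.2 in Table 2.2 corresponds to roughly 75–76% of the G1 glycans being the G1α1,6 isomer.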
2.4 Impact of Core Sugars on the Effector Functions of IgGs
2.4.1 Bisecting N-acetylglucosamine
Complex N-glycans containing bisecting GlcNAc are synthesized by N-acetylglucosaminyltransferase-III (GnT-III)-mediated transfer of a GlcNAc residue from UDP-GlcNAc to the O-4 position of the core Man residue that is β1,4-linked to the GlcNAc of the chitobiose core (see Fig. 2.6) [55]. The bisecting GlcNAc content of Fc glycans can vary with the species [18, 19]. About 10% of the Fc glycans found in human IgGs contain bisecting GlcNAc residues. Some species do not express active GnT-III and hence lack N-glycans with bisecting GlcNAc residues [18, 52]. For example, glycoproteins produced in normal CHO (Chinese hamster ovary) cells do not contain bisecting GlcNAc epitopes because these cells lack active GnT-III [55, 56]. Serum IgGs from dog, horse and cat do not contain detectable levels of bisecting GlcNAc-containing Fc glycans [18, 19], whereas more than 70% of the Fc glycans of sheep IgGs contain bisecting GlcNAc epitopes [18, 19]. Serum IgGs from other species contain variable amounts of bisecting GlcNAc-containing Fc glycans [18, 19]. IgGs with bisecting GlcNAc-containing Fc glycans can be produced by overexpressing GnT-III [57, 58]. Umana et al. [57] overexpressed GnT-III in CHO cells and produced Rituximab® containing higher levels of bisecting GlcNAc residues. The Rituximab® thus produced showed increased ADCC activity because of its increased binding affinity for the FcγRIIIa receptor [57–59]. Similar improvements in ADCC activity and FcγRIIIa binding affinity have been shown for several rIgGs containing bisecting GlcNAc residues [60, 61]. However, these rIgG samples also contained increased amounts of glycans lacking core Fuc.
Absence of core Fuc has been shown to increase ADCC because afucosylated antibodies have increased affinity for the FcγRIIIa receptor [62, 63]. Hence, it is not yet clear whether the absence of core Fuc or the presence of bisecting GlcNAc is responsible for the increased ADCC activity of these IgG samples [57–63]. Bisecting GlcNAc-containing rIgGs can also be prepared in vitro by treating G0 glycoforms with GnT-III and UDP-GlcNAc [55]. Although GnT-III is not commercially available, it can be expressed as a recombinant enzyme, purified and then used for in vitro enzymatic reactions.
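The in vitro GnT-III reaction, and the difficulty of assigning the added GlcNAc, can be illustrated with composition arithmetic. In the hypothetical Python sketch below, adding a bisecting GlcNAc to G0F shifts the glycan mass by one HexNAc residue (about +203.08 Da). The same composition also describes an agalactosylated, core-fucosylated triantennary glycan, so a mass-based readout alone cannot confirm that the extra GlcNAc is bisecting. The helper names `neutral_mass` and `add_residue` are assumptions.

```python
MONO_RESIDUE = {"Hex": 162.052823, "HexNAc": 203.079373, "dHex": 146.057909}
WATER = 18.010565


def neutral_mass(composition: dict) -> float:
    """Monoisotopic mass of a free, unlabeled glycan."""
    return sum(MONO_RESIDUE[r] * n for r, n in composition.items()) + WATER


def add_residue(composition: dict, residue: str, count: int = 1) -> dict:
    """Composition after a transferase adds `count` residues (no linkage information)."""
    updated = dict(composition)
    updated[residue] = updated.get(residue, 0) + count
    return updated


if __name__ == "__main__":
    g0f = {"Hex": 3, "HexNAc": 4, "dHex": 1}
    g0fb = add_residue(g0f, "HexNAc")   # GnT-III product: bisected G0F
    # An agalactosylated, core-fucosylated triantennary glycan has the
    # same composition, so the two cannot be told apart by mass.
    triantennary_g0f = {"Hex": 3, "HexNAc": 5, "dHex": 1}
    print(round(neutral_mass(g0fb) - neutral_mass(g0f), 4))
    print(neutral_mass(g0fb) == neutral_mass(triantennary_g0f))
```

In practice, a linkage-sensitive method such as NP-HPLC, together with the observation that tri- and tetraantennary glycans are not found in the Fc, is needed to assign the extra GlcNAc.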
Fig. 2.6 Structure of biantennary N-glycan containing bisecting GlcNAc residue and terminated with GlcNAc residues in the antennae
2.4.2 Core Fucose
More than 90% of the Fc glycans of rIgGs produced in CHO cells contain a core fucose (Fuc) residue α1,6-linked to the GlcNAc residue that is in turn linked to Asn297 [16–19]. Core-fucosylated glycans are produced in the trans-Golgi by α1,6-fucosyltransferase-mediated transfer of a Fuc residue from GDP-Fuc [64, 65]. More than 80% of the Fc glycans in human IgGs are fucosylated [18, 66]. Absence of core Fuc residues in Fc glycans has been shown to substantially increase the ADCC activity of IgGs, as non-fucosylated antibodies bind FcγRIIIa receptors with increased affinity [62, 63, 67]. Hence, several groups have described strategies to reduce core fucosylation of rIgGs [67–69]. These include the development of cell lines that either lack expression of α1,6-fucosyltransferase or express the enzyme at reduced levels [67–69], and silencing of the gene encoding the transferase using RNAi [63–69]. Attempts to remove core Fuc residues by treating rIgGs with commercially available fucosidases have not been successful, possibly because the core Fuc residues are inaccessible to these enzymes. Efforts to separate afucosylated IgGs from fucosylated IgGs using lectin affinity chromatography have also been largely unsuccessful. Hence, the cell-based strategies described above are the most useful way to produce afucosylated IgGs. IgGs containing reduced levels of core Fuc show higher ADCC activity and improved binding affinity for Fc receptors [69]. Some rIgGs that are either completely non-fucosylated or contain significantly reduced amounts of core Fuc are currently in human clinical trials as potential therapeutics [68, 69].
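Because the outcome of the strategies above is usually expressed as a reduced level of core Fuc, one simple way to report it is the percentage of total released-glycan signal that lacks Fuc. This Python sketch assumes (i) that every dHex in an Fc glycan is core Fuc, which holds for the Fc glycans described here but not for glycoproteins carrying antennal Fuc, and (ii) that peak areas are proportional to molar amounts. High mannose glycans, which carry no core Fuc, therefore count as afucosylated. The function name and the peak areas in the example are hypothetical.

```python
from typing import List, Tuple

# (composition, peak area) pairs, e.g. from a released-glycan CE-LIF or NP-HPLC run.
Peak = Tuple[dict, float]


def percent_afucosylated(peaks: List[Peak]) -> float:
    """Share of total glycan signal carried by glycans without Fuc (dHex)."""
    total = sum(area for _, area in peaks)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    afucosylated = sum(area for composition, area in peaks
                       if composition.get("dHex", 0) == 0)
    return 100.0 * afucosylated / total


if __name__ == "__main__":
    # Hypothetical areas for illustration only; not data from this chapter.
    peaks = [
        ({"Hex": 3, "HexNAc": 4, "dHex": 1}, 60.0),  # G0F
        ({"Hex": 4, "HexNAc": 4, "dHex": 1}, 30.0),  # G1F
        ({"Hex": 3, "HexNAc": 4}, 6.0),              # G0, afucosylated
        ({"Hex": 5, "HexNAc": 2}, 4.0),              # Man5, no core Fuc
    ]
    print(f"{percent_afucosylated(peaks):.1f}% afucosylated")
```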
2.4.3 High Mannose Glycans
As described earlier, chicken serum IgGs (IgYs) contain a heterogeneous mixture of high mannose and complex biantennary glycans [18, 19]. The high mannose structures in chicken IgGs (IgYs) comprise partially processed Man5GlcNAc2, Man6GlcNAc2, Man7GlcNAc2, Man8GlcNAc2 and Man9GlcNAc2 structures along with a completely unprocessed Glc3Man9GlcNAc2 structure [18, 19]. In contrast, human IgGs contain only a small amount of Man5GlcNAc2, and the other high mannose glycans found in chicken IgGs (IgYs) are either absent or present at very low levels. In rIgGs, high mannose content can vary from cell line to cell line and from batch to batch [16–19]. Glycoproteins containing high mannose structures with terminal Man residues are known to bind to the mannose receptor in the liver, which is responsible for the faster clearance of such glycoproteins from serum [42, 43]. IgGs containing high mannose structures with terminal Man residues have been shown to be cleared from serum faster than IgGs containing complex glycans
[70, 71]. However, Millward et al. [72] recently performed a pharmacokinetic study in mice with two preparations of an IgG, one enriched in complex glycans and the other enriched in high mannose glycans, and found no significant difference in serum half-life between them. Hence, the effect of high mannose Fc glycans on the serum half-life of IgGs may vary from antibody to antibody and may be related to their respective specificities. In addition to their influence on serum half-life, high mannose Fc glycans may also influence antibody effector functions. Zhou et al. [73] showed that high mannose-containing rIgGs exhibit improved ADCC activity. These antibodies bind the FcγRIIIa receptor with increased affinity, similar to antibodies lacking core fucose. High mannose N-glycans are not fucosylated in the core region [65], so it is not yet clear whether the presence of high mannose structures with terminal Man residues or the absence of core Fuc is responsible for the improved effector functions of such rIgGs. The same study showed that IgGs containing high mannose structures had reduced binding to C1q protein and hence decreased CDC activity [73]. Accordingly, high mannose Fc glycans appear to affect the serum half-life of some IgGs, and they often increase the ADCC activity and decrease the CDC activity of IgG molecules.
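The high mannose series listed above differs by one hexose per step, so in a released-glycan MALDI-TOF-MS spectrum it appears as a ladder of peaks about 162 Da apart. The Python sketch below lists the expected monoisotopic m/z of the singly charged sodium adducts, assuming free, unlabeled glycans; the helper `high_mannose_mz` is hypothetical. Because Glc and Man are both hexoses, mass alone cannot show which hexoses are Glc, and positional isomers of Man6–Man8 are likewise indistinguishable by mass.

```python
MONO_RESIDUE = {"Hex": 162.052823, "HexNAc": 203.079373}
WATER = 18.010565        # free reducing end after release
SODIUM_ION = 22.989221   # Na atom minus one electron


def high_mannose_mz(n_man: int, n_glc: int = 0) -> float:
    """[M+Na]+ m/z of free Glc(n_glc)Man(n_man)GlcNAc2; Glc and Man are both Hex."""
    composition = {"Hex": n_man + n_glc, "HexNAc": 2}
    neutral = sum(MONO_RESIDUE[r] * n for r, n in composition.items()) + WATER
    return neutral + SODIUM_ION


if __name__ == "__main__":
    for n_man in range(5, 10):
        print(f"Man{n_man}GlcNAc2  {high_mannose_mz(n_man):.2f}")
    print(f"Glc3Man9GlcNAc2  {high_mannose_mz(9, n_glc=3):.2f}")
```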
2.5 Impact of Non-human Glycan Epitopes
In addition to the terminal sugars discussed above, IgGs from non-human species and rIgGs produced in mouse myeloma cell lines, transgenic animals or plants may contain non-human terminal sugar epitopes such as α1,3-Gal or xylose, and/or a core Fuc residue in the α1,3-position instead of the α1,6-position [3, 74–76]. For example, antibodies produced in transgenic plants may contain xylose and/or α1,3-Fuc residues [3, 76]. These xylose and α1,3-Fuc epitopes have been shown to elicit anti-carbohydrate immune responses [74–76]. Antibodies derived from non-human species or produced in mouse myeloma cell lines may contain α1,3-Gal epitopes, as these species and cell lines express α1,3-galactosyltransferase [3]. Humans have circulating antibodies against α1,3-Gal epitopes. Cetuximab®, a chimeric mouse-human IgG1 monoclonal antibody against the epidermal growth factor receptor produced in a mouse myeloma cell line, contains α1,3-Gal epitopes [77]. Some patients treated with Cetuximab® developed anaphylaxis and had IgE antibodies specific for the α1,3-Gal epitopes present in the glycans of the variable domain of Cetuximab® [77]. These α1,3-Gal-related adverse events with Cetuximab® have been shown to be confined to specific locations in North America [77]. However, it is not yet clear whether α1,3-Gal epitopes present in Fc glycans elicit anaphylaxis, as observed for the variable domain glycans present in Cetuximab®, or another type of immune response similar to that developed by patients treated with Cetuximab®. These non-human glycan epitopes also
contribute to the terminal sugar repertoire of Fc glycans and hence may also affect antibody effector functions and pharmacokinetic properties.
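Screening for the epitopes named above requires linkage information, not just composition: α1,6- and α1,3-linked core Fuc have identical masses. The Python sketch below flags the three epitopes discussed in this section from a hypothetical linkage-level description of a glycan. The linkage notation, the `NON_HUMAN_LINKAGES` table and the specific acceptor assignments (β1,2-xylose on the core Man, α1,3-Fuc on the Asn-linked GlcNAc, α1,3-Gal on a Gal) are assumptions for illustration. Antennal α1,3-Fuc, as in human Lewis-type structures, is deliberately not flagged.

```python
from typing import Iterable, List, Tuple

# A glycosidic linkage written as (donor residue, linkage, acceptor residue).
# "(core)" marks the Asn-linked GlcNAc or the beta-linked core Man.
Linkage = Tuple[str, str, str]

NON_HUMAN_LINKAGES = {
    ("Gal", "a1-3", "Gal"): "alpha1,3-Gal epitope",
    ("Xyl", "b1-2", "Man(core)"): "beta1,2-xylose (plant-type)",
    ("Fuc", "a1-3", "GlcNAc(core)"): "core alpha1,3-Fuc (plant-type)",
}


def flag_non_human(linkages: Iterable[Linkage]) -> List[str]:
    """Return a description for each linkage that matches a non-human epitope."""
    return [NON_HUMAN_LINKAGES[link] for link in linkages if link in NON_HUMAN_LINKAGES]


if __name__ == "__main__":
    # Hypothetical plant-derived glycan carrying both plant-type modifications.
    plant_glycan = [
        ("Fuc", "a1-3", "GlcNAc(core)"),
        ("Xyl", "b1-2", "Man(core)"),
        ("GlcNAc", "b1-2", "Man"),
    ]
    # Features typical of a human Fc glycan.
    human_glycan = [("Fuc", "a1-6", "GlcNAc(core)"), ("Gal", "b1-4", "GlcNAc")]
    print(flag_non_human(plant_glycan))
    print(flag_non_human(human_glycan))
```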
2.6 Conclusions
The glycans present in biological molecules such as glycolipids, proteoglycans and glycoproteins, including IgGs, play an important part in the chemistry and biology of these molecules. In particular, the terminal sugars of the attached glycans are key structural determinants. The presence or absence of various terminal sugars, including sialic acid, Gal, Man and GlcNAc, on the Fc glycans of IgGs increases microheterogeneity. This in turn affects antibody effector functions, serum half-life and stability, and may also modulate antibody binding to some cell surface antigens. Structural studies of intact antibodies and Fc fragments suggest that hydrophobic and hydrophilic interactions between terminal sugar residues and amino acid residues in the CH2 domain of the Fc affect the conformation of IgG molecules [17]. This may explain the effects of terminal sugars on the physicochemical and biological properties of IgGs. The production of IgGs with relatively homogeneous glycoforms using a combination of in vitro and in vivo glyco-engineering methods has enabled the roles of different terminal sugars to be explored. Further, modifications that enhance effector function, for example incorporation of a bisecting GlcNAc into the N-glycan, offer a way of enhancing the therapeutic efficacy of rIgGs that act via ADCC and/or CDC. Another consideration for recombinant IgGs is the presence of non-human terminal sugars, which may contribute to product-related adverse reactions due to, for example, the immunogenicity of the non-human carbohydrate epitope. It is therefore important to consider the expression system used to manufacture potential therapeutic rIgGs and to evaluate glycosylation-related issues that could be significant to regulatory agencies. Accordingly, the nature of IgG glycosylation needs to be carefully monitored during the development of an antibody therapeutic.
In addition, cell engineering methods are available to produce rIgGs carrying preferred glycans with improved functions [78–96].
References
1. Davies DR, Metzger H (1983) Structural basis of antibody function. Annu Rev Immunol 1:87–117
2. Beale D, Feinstein A (1976) Structure and function of the constant regions of immunoglobulins. Q Rev Biophys 9:135–180
3. Raju TS (2003) Glycosylation variations with expression systems and their impact on biological activity of therapeutic immunoglobulins. BioProcess Int 1(4):44–53
4. Wright A, Morrison SL (1997) Effect of glycosylation on antibody function: implications for genetic engineering. Trends Biotechnol 15(1):26–32
5. Jefferis R (1991) Structure-function relationships in human immunoglobulins. Neth J Med 39(3–4):188–198
6. Jefferis R (1993) The glycosylation of antibody molecules: functional significance. Glycoconj J 10(5):358–361
7. Ryan MH, Petrone D, Nemeth JF, Barnathan E, Björck L, Jordan RE (2008) Proteolysis of purified IgGs by human and bacterial enzymes in vitro and the detection of specific proteolytic fragments of endogenous IgG in rheumatoid synovial fluid. Mol Immunol 45(7):1837–1846
8. Brezski RJ, Vafa O, Petrone D, Tam SH, Powers G, Ryan MH, Luongo JL, Oberholtzer A, Knight DM, Jordan RE (2009) Tumor-associated and microbial proteases compromise host IgG effector functions by a single cleavage proximal to the hinge. Proc Natl Acad Sci U S A 106(42):17864–17869
9. Jefferis R (2007) Antibody therapeutics: isotype and glycoform selection. Expert Opin Biol Ther 7(9):1401–1413
10. Nimmerjahn F, Ravetch JV (2008) Fcgamma receptors as regulators of immune responses. Nat Rev Immunol 8(1):34–47
11. Nimmerjahn F, Ravetch JV (2007) Fc-receptors as regulators of immunity. Adv Immunol 96:179–204
12. Duncan AR, Winter G (1988) The binding site for C1q on IgG. Nature 332(6166):738–740
13. Lencer WI, Blumberg RS (2005) A passionate kiss, then run: exocytosis and recycling of IgG by FcRn. Trends Cell Biol 15(1):5–9
14. Lobo ED, Hansen RJ, Balthasar JP (2004) Antibody pharmacokinetics and pharmacodynamics. J Pharm Sci 93(11):2645–2668
15. Roopenian DC, Akilesh S (2007) FcRn: the neonatal Fc receptor comes of age. Nat Rev Immunol 7(9):715–725
16. Mizuochi T, Taniguchi T, Shimizu A, Kobata A (1982) Structural and numerical variations of the carbohydrate moiety of immunoglobulin G. J Immunol 129(5):2016–2020
17. Raju TS (2008) Terminal sugars of Fc glycans influence antibody effector functions of IgGs. Curr Opin Immunol 20:471–478
18. Raju TS, Briggs JB, Borge SM, Jones AJ (2000) Species-specific variation in glycosylation of IgG: evidence for the species-specific sialylation and branch-specific galactosylation and importance for engineering recombinant glycoprotein therapeutics. Glycobiology 10(5):477–486
19. Hamako J, Matsui T, Ozeki Y, Mizuochi T, Titani K (1993) Comparative studies of asparagine-linked sugar chains of immunoglobulin G from eleven mammalian species. Comp Biochem Physiol B 106(4):949–954
20. Mimura Y, Ghirlando R, Sondermann P, Lund J, Jefferis R (2001) The molecular specificity of IgG-Fc interactions with Fc gamma receptors. Adv Exp Med Biol 495:49–53
21. Raju TS, Scallon BJ (2006) Glycosylation in the Fc domain of IgG increases resistance to proteolytic cleavage by papain. Biochem Biophys Res Commun 341(3):797–803
22. Raju TS, Scallon B (2007) Fc glycans terminated with N-acetylglucosamine residues increase antibody resistance to papain. Biotechnol Prog 23(4):964–971
23. Kobata A (2000) A journey to the world of glycobiology. Glycoconj J 17:443–464
24. Kornfeld R, Kornfeld S (1985) Assembly of asparagine-linked oligosaccharides. Annu Rev Biochem 54:631–664
25. Raju TS, Briggs JB, Chamow SM, Winkler ME, Jones AJ (2001) Glycoengineering of therapeutic glycoproteins: in vitro galactosylation and sialylation of glycoproteins with terminal N-acetylglucosamine and galactose residues. Biochemistry 40(30):8868–8876
26. Raju TS, Lerner L, O’Connor JV (1996) Glycopinion: biological significance and methods for the analysis of complex carbohydrates of recombinant glycoproteins. Biotechnol Appl Biochem 24(Pt 3):191–194
27. Arnold JN, Wormald MR, Sim RB, Rudd PM, Dwek RA (2007) The impact of glycosylation on the biological function and structure of human immunoglobulins. Annu Rev Immunol 25:21–50
28. Burton DR, Boyd J, Brampton AD, Easterbrook S, Emanuel EJ, Novotny J, Rademacher TW, van Schravendijk MR, Sternberg MJ, Dwek RA (1980) The C1q receptor site on immunoglobulin G. Nature 288(5789):338–344
29. Simmons LC, Reilly D, Klimowski L, Raju TS, Meng G, Sims P, Hong K, Shields RL, Damico LA, Rancatore P, Yansura DG (2002) Expression of full-length immunoglobulins in Escherichia coli: rapid and efficient production of aglycosylated antibodies. J Immunol Methods 263(1–2):133–147
30. Jefferis R (2009) Aglycosylated antibodies and the methods of making and using them: WO2008030564. Expert Opin Ther Pat 19(1):101–105
31. Raju TS (2000) Electrophoretic methods for the analysis of N-linked oligosaccharides. Anal Biochem 283(2):125–132
32. Yamada E, Tsukamoto Y, Sasaki R, Yagyu K, Takahashi N (1997) Structural changes of immunoglobulin G oligosaccharides with age in healthy human serum. Glycoconj J 14(3):401–405
33. Parekh RB, Roitt IM, Isenberg DA, Dwek RA, Ansell BM, Rademacher TW (1988) Galactosylation of IgG associated oligosaccharides: reduction in patients with adult and juvenile onset rheumatoid arthritis and relation to disease activity. Lancet 1(8592):966–969
34. Alavi A, Axford J (1995) Evaluation of beta 1,4-galactosyltransferase in rheumatoid arthritis and its role in the glycosylation network associated with this disease. Glycoconj J 12:206–210
35. Opdenakker G, Dillen C, Fiten P, Martens E, Van Aelst I, Van den Steen PE, Nelissen I, Starckx S, Descamps FJ, Hu J, Piccard H, Van Damme J, Wormald MR, Rudd PM, Dwek RA (2006) Remnant epitopes, autoimmunity and glycosylation. Biochim Biophys Acta 1760(4):610–615
36. Popko J, Marciniak J, Zalewska A, Maldyk P, Rogalski M, Zwierz K (2006) The activity of exoglycosidases in the synovial membrane and knee fluid of patients with rheumatoid arthritis and juvenile idiopathic arthritis. Scand J Rheumatol 35(3):189–192
37. Rademacher TW, Jones RH, Williams PJ (1995) Significance and molecular basis for IgG glycosylation changes in rheumatoid arthritis. Adv Exp Med Biol 376:193–204
38. Routier FH, Hounsell EF, Rudd PM, Takahashi N, Bond A, Hay FC, Alavi A, Axford JS, Jefferis R (1998) Quantitation of the oligosaccharides of human serum IgG from patients with rheumatoid arthritis: a critical evaluation of different methods. J Immunol Methods 213(2):113–130
39. Tsuchiya N, Endo T, Shiota M, Kochibe N, Ito K, Kobata A (1994) Distribution of glycosylation abnormality among serum IgG subclasses from patients with rheumatoid arthritis. Clin Immunol Immunopathol 70(1):47–50
40. Routier FH, Davies MJ, Bergemann K, Hounsell EF (1997) The glycosylation pattern of humanized IgG1 antibody (D1.3) expressed in CHO cells. Glycoconj J 14(2):201–207
41. Weikert S, Papac D, Briggs J, Cowfer D, Tom S, Gawlitzek M, Lofgren J, Mehta S, Chisholm V, Modi N, Eppler S, Carroll K, Chamow S, Peers D, Berman P, Krummen L (1999) Engineering Chinese hamster ovary cells to maximize sialic acid content of recombinant glycoproteins. Nat Biotechnol 17(11):1116–1121
42. Keck R, Nayak N, Lerner L, Raju S, Ma S, Schreitmueller T, Chamow S, Moorhouse K, Kotts C, Jones A (2008) Characterization of a complex glycoprotein whose variable metabolic clearance in humans is dependent on terminal N-acetylglucosamine content. Biologicals 36(1):49–60
43. Jones AJ, Papac DI, Chin EH, Keck R, Baughman SA, Lin YS, Kneer J, Battersby JE (2007) Selective clearance of glycoforms of a complex glycoprotein pharmaceutical caused by terminal N-acetylglucosamine is similar in humans and cynomolgus monkeys. Glycobiology 17(5):529–540
44. Malhotra R, Wormald MR, Rudd PM, Fischer PB, Dwek RA, Sim RB (1995) Glycosylation changes of IgG associated with rheumatoid arthritis can activate complement via the mannose-binding protein. Nat Med 1(3):237–243
45. Presta LG (2002) Engineering antibodies for therapy. Curr Pharm Biotechnol 3(3):237–256
46. Presta LG (2006) Engineering of therapeutic antibodies to minimize immunogenicity and optimize function. Adv Drug Deliv Rev 58(5–6):640–656
47. Sato R, Matsushita M, Miyata M, Sato Y, Kasukawa R, Fujita T (1997) Substances reactive with mannose-binding protein (MBP) in sera of patients with rheumatoid arthritis. Fukushima J Med Sci 43(2):99–111
48. Hodoniczky J, Zheng YZ, James DC (2005) Control of recombinant monoclonal antibody effector functions by Fc N-glycan remodeling in vitro. Biotechnol Prog 21(6):1644–1652
49. Stockert RJ, Morell AG, Ashwell G (1991) Structural characteristics and regulation of the asialoglycoprotein receptor. Targeted Diagn Ther 4:41–64
50. Ashwell G, Harford J (1982) Carbohydrate-specific receptors of the liver. Annu Rev Biochem 51:531–554
51. Varki A (1996) “Unusual” modifications and variations of vertebrate oligosaccharides: are we missing the flowers for the trees? Glycobiology 6(7):707–710
52. Scallon BJ, Tam SH, McCarthy SG, Cai AN, Raju TS (2007) Higher levels of sialylated Fc glycans in immunoglobulin G molecules can adversely impact functionality. Mol Immunol 44(7):1524–1534
53. Kaneko Y, Nimmerjahn F, Ravetch JV (2006) Anti-inflammatory activity of immunoglobulin G resulting from Fc sialylation. Science 313(5787):670–673
54. Nimmerjahn F, Ravetch JV (2007) The antiinflammatory activity of IgG: the intravenous IgG paradox. J Exp Med 204(1):11–15
55. Campbell C, Stanley P (1984) A dominant mutation to ricin resistance in Chinese hamster ovary cells induces UDP-GlcNAc:glycopeptide beta-4-N-acetylglucosaminyltransferase-III activity. J Biol Chem 259(21):13370–13378
56. Patnaik SK, Stanley P (2006) Lectin-resistant CHO glycosylation mutants. Methods Enzymol 416:159–182
57. Umana P, Jean M, Moudry R, Amstutz H, Bailey JE (1999) Engineered glycoforms of an antineuroblastoma IgG1 with optimized antibody-dependent cellular cytotoxic activity. Nat Biotechnol 17(2):176–180
58. Umana P, Jean M, Bailey JE (1999) Tetracycline-regulated overexpression of glycosyltransferases in Chinese hamster ovary cells. Biotechnol Bioeng 65(5):542–549
59. Davies J, Jiang L, Pan LZ, LaBarre MJ, Anderson D, Reff M (2001) Expression of GnT-III in a recombinant anti-CD20 CHO production cell line: expression of antibodies with altered glycoforms leads to an increase in ADCC through higher affinity for Fc gamma RIII. Biotechnol Bioeng 74(4):288–294
60. Schuster M, Umana P, Ferrara C, Brünker P, Gerdes C, Waxenecker G, Wiederkum S, Schwager C, Loibner H, Himmler G, Mudde GC (2005) Improved effector functions of a therapeutic monoclonal Lewis Y-specific antibody by glycoform engineering. Cancer Res 65(17):7934–7941
61. Ferrara C, Brünker P, Suter T, Moser S, Püntener U, Umaña P (2006) Modulation of therapeutic antibody effector functions by glycosylation engineering: influence of Golgi enzyme localization domain and co-expression of heterologous beta1,4-N-acetylglucosaminyltransferase III and Golgi alpha-mannosidase II. Biotechnol Bioeng 93(5):851–861
62. Shields RL, Lai J, Keck R, O'Connell LY, Hong K, Meng YG, Weikert SH, Presta LG (2002) Lack of fucose on human IgG1 N-linked oligosaccharide improves binding to human Fcgamma RIII and antibody-dependent cellular toxicity. J Biol Chem 277(30):26733–26740
63. Shinkawa T, Nakamura K, Yamane N, Shoji-Hosaka E, Kanda Y, Sakurada M, Uchida K, Anazawa H, Satoh M, Yamasaki M, Hanai N, Shitara K (2003) The absence of fucose but not the presence of galactose or bisecting N-acetylglucosamine of human IgG1 complex-type oligosaccharides shows the critical role of enhancing antibody-dependent cellular cytotoxicity. J Biol Chem 278(5):3466–3473
64. Miyoshi E, Noda K, Yamaguchi Y, Inoue S, Ikeda Y, Wang W, Ko JH, Uozumi N, Li W, Taniguchi N (1999) The alpha1-6-fucosyltransferase gene and its biological significance. Biochim Biophys Acta 1473(1):9–20
65. Schachter H (1986) Biosynthetic controls that determine the branching and microheterogeneity of protein-bound oligosaccharides. Biochem Cell Biol 64(3):163–181
66. Mimura Y, Lund J, Church S, Dong S, Li J, Goodall M, Jefferis R (2001) Butyrate increases production of human chimeric IgG in CHO-K1 cells whilst maintaining function and glycoform profile. J Immunol Methods 247(1–2):205–216
67. Jefferis R (2007) Antibody therapeutics: isotype and glycoform selection. Expert Opin Biol Ther 7(9):1401–1413
68. Imai-Nishiya H, Mori K, Inoue M, Wakitani M, Iida S, Shitara K, Satoh M (2007) Double knockdown of alpha1,6-fucosyltransferase (FUT8) and GDP-mannose 4,6-dehydratase (GMD) in antibody-producing cells: a new strategy for generating fully non-fucosylated therapeutic antibodies with enhanced ADCC. BMC Biotechnol 7:84
69. Scallon B, McCarthy S, Radewonuk J, Cai A, Naso M, Raju TS, Capocasale R (2007) Quantitative in vivo comparisons of the Fc gamma receptor-dependent agonist activities of different fucosylation variants of an immunoglobulin G antibody. Int Immunopharmacol 7(6):761–772
70. Wright A, Morrison SL (1994) Effect of altered CH2-associated carbohydrate structure on the functional properties and in vivo fate of chimeric mouse-human immunoglobulin G1. J Exp Med 180(3):1087–1096
71. Wright A, Sato Y, Okada T, Chang K, Endo T, Morrison S (2000) In vivo trafficking and catabolism of IgG1 antibodies with Fc associated carbohydrates of differing structure. Glycobiology 10(12):1347–1355
72. Millward TA, Heitzmann M, Bill K, Langle U, Schumacher P, Forrer K (2008) Effect of constant and variable domain glycosylation on pharmacokinetics of therapeutic antibodies in mice. Biologicals 36(1):41–47
73. Zhou Q, Shankara S, Roy A, Qiu H, Estes S, McVie-Wylie A, Culm-Merdek K, Park A, Pan C, Edmunds T (2008) Development of a simple and rapid method for producing non-fucosylated oligomannose containing antibodies with increased effector function. Biotechnol Bioeng 99(3):652–665
74. Jin C, Altmann F, Strasser R, Mach L, Schähs M, Kunert R, Rademacher T, Glössl J, Steinkellner H (2008) A plant-derived human monoclonal antibody induces an anticarbohydrate immune response in rabbits. Glycobiology 18(3):235–241
75. Jin C, Hantusch B, Hemmer W, Stadlmann J, Altmann F (2008) Affinity of IgE and IgG against cross-reactive carbohydrate determinants on plant and insect glycoproteins. J Allergy Clin Immunol 121(1):185–190
76. Altmann F (2007) The role of protein glycosylation in allergy. Int Arch Allergy Immunol 142(2):99–115
77. Chung CH, Mirakhur B, Chan E, Le QT, Berlin J, Morse M, Murphy BA, Satinover SM, Hosen J, Mauro D, Slebos RJ, Zhou Q, Gold D, Hatley T, Hicklin DJ, Platts-Mills TA (2008) Cetuximab-induced anaphylaxis and IgE specific for galactose-alpha-1,3-galactose. N Engl J Med 358(11):1109–1117
78. Cox KM, Sterling JD, Regan JT, Gasdaska JR, Frantz KK, Peele CG, Black A, Passmore D, Moldovan-Loomis C, Srinivasan M, Cuison S, Cardarelli PM, Dickey LF (2006) Glycan optimization of a human monoclonal antibody in the aquatic plant Lemna minor. Nat Biotechnol 24(12):1591–1597
79. Lonberg N (2005) Human antibodies from transgenic animals. Nat Biotechnol 23(9):1117–1125
80. Potgieter TI, Cukan M, Drummond JE, Houston-Cummings NR, Jiang Y, Li F, Lynaugh H, Mallem M, McKelvey TW, Mitchell T, Nylen A, Rittenhour A, Stadheim TA, Zha D, d’Anjou M (2009) Production of monoclonal antibodies by glycoengineered Pichia pastoris. J Biotechnol 139(4):318–325
81. Hossler P, Khattak SF, Li ZJ (2009) Optimal and consistent protein glycosylation in mammalian cell culture. Glycobiology 19(9):936–949
82. De Muynck B, Navarre C, Nizet Y, Stadlmann J, Boutry M (2009) Different subcellular localization and glycosylation for a functional antibody expressed in Nicotiana tabacum plants and suspension cells. Transgenic Res 18(3):467–482
83. Morrow KJ Jr (2007) Advances in antibody manufacturing using mammalian cells. Biotechnol Annu Rev 13:95–113
84. Werner RG, Kopp K, Schlueter M (2007) Glycosylation of therapeutic proteins in different production systems. Acta Paediatr Suppl 96(455):17–22
85. Majid FA, Butler M, Al-Rubeai M (2007) Glycosylation of an immunoglobulin produced from a murine hybridoma cell line: the effect of culture mode and the anti-apoptotic gene, bcl-2. Biotechnol Bioeng 97(1):156–169
86. Strohl WR (2009) Optimization of Fc-mediated effector functions of monoclonal antibodies. Curr Opin Biotechnol 20(6):685–691
87. Jefferis R (2009) Recombinant antibody therapeutics: the impact of glycosylation on mechanisms of action. Trends Pharmacol Sci 30(7):356–362
88. Yoo EM, Chintalacharuvu KR, Penichet ML, Morrison SL (2002) Myeloma expression systems. J Immunol Methods 261(1–2):1–20
89. Sazinsky SL, Ott RG, Silver NW, Tidor B, Ravetch JV, Wittrup KD (2008) Aglycosylated immunoglobulin G1 variants productively engage activating Fc receptors. Proc Natl Acad Sci U S A 105(51):20167–20172
90. Satoh M, Iida S, Shitara K (2006) Non-fucosylated therapeutic antibodies as next-generation therapeutic antibodies. Expert Opin Biol Ther 6(11):1161–1173
91. Macher BA, Galili U (2008) The Galalpha1,3Galbeta1,4GlcNAc-R (alpha-Gal) epitope: a carbohydrate of unique evolution and clinical relevance. Biochim Biophys Acta 1780(2):75–88
92. Du J, Yarema KJ (2010) Carbohydrate engineered cells for regenerative medicine. Adv Drug Deliv Rev. Epub ahead of print 28 Jan 2010. PMID: 20117158
93. Solá RJ, Griebenow K (2010) Glycosylation of therapeutic proteins: an effective strategy to optimize efficacy. BioDrugs 24(1):9–21
94. Durocher Y, Butler M (2009) Expression systems for therapeutic glycoprotein production. Curr Opin Biotechnol 20(6):700–707
95. Jacobs PP, Callewaert N (2009) N-glycosylation engineering of biopharmaceutical expression systems. Curr Mol Med 9(7):774–800
96. Du J, Meledeo MA, Wang Z, Khanna HS, Paruchuri VD, Yarema KJ (2009) Metabolic glycoengineering: sialic acid and beyond. Glycobiology 19(12):1382–1401
Chapter 3
Bioinformatics Databases and Applications Available for Glycobiology and Glycomics
René Ranzinger, Kai Maaß, and Thomas Lütteke
Abstract Bioinformatics for glycobiology is still considered to be in its infancy. Nevertheless, various applications and databases are now available to glycoscientists. This article summarizes the problems that glycoinformatics faces and gives an overview of the existing resources, including web portals, databases and tools. Software for structure input and display, for processing of analytical data, and for prediction and analysis of glycosylation sites is described, as are applications related to carbohydrate 3D structures. Special emphasis is put on GlycomeDB, a project that aims to integrate into one resource all freely available carbohydrate structure data already stored in databases, together with the taxonomic annotation of these structures. This allows researchers to locate data in many databases without having to learn the different query types and carbohydrate notations used in the individual resources.
Keywords Bioinformatics · Carbohydrate database · GlycomeDB · Glycan · Glycosylation sites · Automatic annotation · 3D structure · Analytical software · Carbohydrate software tools
Abbreviations
CQS          complex query system
ETL-Process  Extract-Transform-Load-Process
GU           glucose unit
GT           glycosyltransferase
HPLC         high performance liquid chromatography
MS           mass spectrometry
MD           molecular dynamics
NMR          nuclear magnetic resonance
REST         representational state transfer
PDB          protein data bank
SOAP         simple object access protocol
2-AB         2-aminobenzamide
T. Lütteke (✉) Faculty of Veterinary Medicine, Institute of Biochemistry and Endocrinology, Justus-Liebig University Gießen, Frankfurter Str. 100, 35392 Gießen, Germany e-mail: [email protected] Dedicated to Claus-Wilhelm “Willi” von der Lieth, who was a pioneer in the development of glycobioinformatics. R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_3, © Springer Science+Business Media B.V. 2011
Contents
3.1 Introduction
3.2 Glycobioinformatics Web Portals
3.3 Databases
3.3.1 Carbohydrate Structure Databases
3.3.2 Glycoenzyme and Lectin Databases
3.3.3 Other Carbohydrate-Related Databases
3.4 GlycomeDB
3.4.1 Situation
3.4.2 GlycomeDB Project
3.4.3 Content and Quality Control
3.4.4 Web Portal Glycome-DB.org
3.4.5 Web Services
3.4.6 Future Perspectives
3.5 Tools
3.5.1 Structure Input and Display
3.5.2 Tools for Processing Analytical Data
3.5.3 Prediction and Analysis of Glycosylation Sites
3.5.4 Carbohydrate-/Glycoprotein-Related 3D Structure Applications
3.6 Conclusions
References
3.1 Introduction
Complex carbohydrate chains, often referred to as glycans, play numerous important roles in a variety of biological processes [1, 2]. When covalently attached to proteins, they change the properties of these glycoproteins, making them more soluble [3], affecting their charge or stability [4–6] and protecting them from proteolysis [7, 8]. The glycan moieties of glycoproteins can also serve as molecular addresses in protein trafficking [9, 10]. Carbohydrates on the cell surface participate in various cell–cell and cell–matrix interactions, ranging from fertilization and cellular differentiation to pathological events, e.g. pathogen-host interactions, immune responses, inflammation, and diseases such as cancer [11–13]. Despite their importance, the large-scale analysis of the carbohydrate chains in an organism, termed glycomics,
is still lagging behind its sister fields, proteomics and genomics. The main reason for this is that carbohydrates are encoded only indirectly in the genome. Unlike proteins, glycans are not translated from a single template; instead, dozens of enzymes (glycosyltransferases and glycosidases) are involved in their biosynthesis [14]. Depending on which of these enzymes are expressed in the cell that synthesizes a glycoprotein, and on the order in which they act on the growing glycan chains, different glycans, the so-called glycoforms, can be attached to a single glycosylation site [15]. Because this process is not template-driven, there is no method to amplify glycan chains analogous to the polymerase chain reaction (PCR) or to protein expression in bacteria. This means that glycans need to be purified and analyzed in the amounts in which they occur physiologically [16]. Consequently, far less primary experimental and structural data are available than in genomics and proteomics, which partly accounts for the fact that bioinformatics for glycobiology is still in its infancy compared with bioinformatics for genomics and proteomics [17–19]. The main reason, however, is that the building blocks of carbohydrate chains, the monosaccharides, can be connected in several ways, giving rise to branched structures [20, 21]. These cannot be handled by “classical” bioinformatics algorithms, which were developed for linear structures such as DNA strands and proteins. New approaches therefore have to be developed specifically for carbohydrates [17]. Nevertheless, several applications and databases are already available to glycoscientists [18, 19].
These resources can be divided into two categories: those that require an explicit description of the carbohydrate chains, and those that analyze the protein moieties of glycoproteins, of enzymes involved in glycan metabolism, or of lectins. While the former require the development of special algorithms, well-established approaches can be applied to the latter [22]. This chapter summarizes the glycobioinformatics resources that are currently available, with special emphasis on GlycomeDB, a meta-database that provides access to multiple carbohydrate databases through a single search interface. Web portals, databases and tools are summarized in Tables 3.1, 3.2 and 3.4, respectively. Each row of these tables carries a consecutive number, which is used (in parentheses) to reference the resources within the text.
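The difference between linear and branched sequences can be made concrete with a small data structure. The following Python sketch is a minimal illustration, not the data model of any database discussed in this chapter: it stores a glycan as a rooted tree in which each residue records its linkage to its parent, and the class and function names are hypothetical. The trimannosyl N-glycan core already contains a residue with two children, so it has no single linear reading order, and writing it as one string requires an extra bracket convention. The serializer only imitates the style of IUPAC condensed notation and is not a validated implementation of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Residue:
    """A monosaccharide node; `linkage` describes the bond to its parent residue."""
    name: str
    linkage: Optional[str] = None          # e.g. "b1-4"; None for the reducing-end root
    children: List["Residue"] = field(default_factory=list)

    def add(self, name: str, linkage: str) -> "Residue":
        """Attach a new child residue and return it, so chains can be built step by step."""
        child = Residue(name, linkage)
        self.children.append(child)
        return child


def count_residues(node: Residue) -> int:
    """Total number of monosaccharides in the tree."""
    return 1 + sum(count_residues(c) for c in node.children)


def branch_points(node: Residue) -> int:
    """Residues with more than one child, i.e. places where a linear chain model breaks down."""
    here = 1 if len(node.children) > 1 else 0
    return here + sum(branch_points(c) for c in node.children)


def to_condensed(node: Residue) -> str:
    """Write the tree from the non-reducing ends towards the root.

    The first child continues the main chain; every further child is a side
    branch and is wrapped in square brackets (IUPAC-condensed-like style).
    """
    parts = []
    for i, child in enumerate(node.children):
        text = f"{to_condensed(child)}({child.linkage})"
        parts.append(text if i == 0 else f"[{text}]")
    return "".join(parts) + node.name


def trimannosyl_core() -> Residue:
    """Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc, the common N-glycan core."""
    root = Residue("GlcNAc")
    beta_man = root.add("GlcNAc", "b1-4").add("Man", "b1-4")
    beta_man.add("Man", "a1-3")
    beta_man.add("Man", "a1-6")
    return root


core = trimannosyl_core()

if __name__ == "__main__":
    print(count_residues(core), branch_points(core))
    print(to_condensed(core))
```

Run as a script, this prints 5 residues with 1 branch point, followed by the string Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc. Keeping the topology as a tree, rather than as text, is what lets operations such as substructure search work on the actual branching pattern.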
3.2 Glycobioinformatics Web Portals
Most carbohydrate databases and software applications are available online. Many of them can be accessed through web portals (summarized in Table 3.1) that combine various resources in a common layout, each portal with a different focus. Within the Kyoto Encyclopedia of Genes and Genomes (KEGG), which aims to integrate genomic, chemical, and systemic functional information [23], KEGG Glycan (1) focuses on metabolic pathways of carbohydrate biosynthesis and breakdown [24]. The GLYCOSCIENCES.de portal (2) has a special emphasis on carbohydrate 3D structures but also features, for example, tools that help experimentalists interpret mass spectrometry (MS) and/or nuclear magnetic resonance (NMR) data [25]. The web portal of the US Consortium for Functional Glycomics (CFG) (3) is the main access point for experimental data generated by the consortium [26]. RINGS (4) provides various data-mining tools and some programs for converting between carbohydrate structure formats. The data underlying its data-mining applications originate mainly from the KEGG portal, with some GLYCOSCIENCES.de data also included. The EUROCarbDB consortium (5) focuses on primary data from MS, NMR, and high performance liquid chromatography (HPLC) experiments. Another recently released portal is JCGGDB, a collection of databases of the Japan Consortium for Glycobiology and Glycotechnology (6). The databases and tools provided by these portals are described in the respective sections below.
Table 3.1 Glycoinformatics web portals
No. | Name | Description, reference | URL
1 | KEGG Glycan | Carbohydrate subpart of the KEGG portal [23, 24] | www.genome.jp/kegg/glycan
2 | GLYCOSCIENCES.de | Collection of various tools and databases [25] | www.glycosciences.de
3 | CFG | Web portal of the Consortium for Functional Glycomics [26] | www.functionalglycomics.org
4 | RINGS | Data mining tools based on the data of KEGG Glycan and GLYCOSCIENCES.de | rings.t.soka.ac.jp
5 | EUROCarbDB | Web portal of the European Carbohydrate Database Consortium | www.eurocarbdb.org
6 | JCGGDB | Databases of the Japan Consortium for Glycobiology and Glycotechnology | jcggdb.jp/index_en.html
3.3 Databases
3.3.1 Carbohydrate Structure Databases
Table 3.2 Carbohydrate databases
No. | Name | Description, reference | URL
7 | CCSDB/CarbBank | Complex Carbohydrate Structure Database, the first database of carbohydrate structures [27, 28] | www.boc.chem.uu.nl/sugabase/carbbank.html
8 | GLYCOSCIENCES.de database | Carbohydrate structure database of the GLYCOSCIENCES.de portal [25, 29] | www.glycosciences.de/sweetdb/
9 | CFG glycan structures database | Carbohydrate structure database of the CFG portal [26] | www.functionalglycomics.org/glycomics/molecule/jsp/carbohydrate/carbMoleculeHome.jsp
10 | BCSDB | Bacterial Carbohydrate Structure Database [32] | www.glyco.ac.ru/bcsdb3/
11 | KEGG Glycan Database | Carbohydrate structure database of the KEGG Glycan portal [24] | www.genome.jp/dbget-bin/www_bfind?glycan
12 | GlycoconjugateDB:Structures | Database of carbohydrate structures in the PDB [33] | www.glycostructures.jp/
13 | DOUGAL | Glycoprotein 3D-Structure database | www.cryst.bbk.ac.uk/DOUGAL/
14 | GlycoEpitopeDB | Database of glycan recognition motifs | www.glyco.is.ritsumei.ac.jp/epitope/
15 | ECODAB | E. coli O-antigen database [34] | www.casper.organ.su.se/ECODAB/
16 | GlycoSuiteDB | Formerly commercial database, now publicly available [35] | glycosuitedb.expasy.org/glycosuite/glycodb
17 | CAZy | Carbohydrate Active enZymes database [37] | www.cazy.org
18 | KEGG Pathway | Collection of various biosynthesis pathway maps with links to individual compounds, reactions and enzymes [23] | www.genome.jp/kegg/pathway.html#glycan
19 | KEGG Orthology | Functional classification of glycosyltransferases [23, 41] | www.genome.jp/kegg-bin/get_htext?ko01003.keg
20 | CFG GT Database | Glycosyltransferase database of the CFG portal | www.functionalglycomics.org/glycomics/molecule/jsp/glycoEnzyme/geMolecule.jsp
21 | GlycoGeneDB (GGDB) | Human glycogenes database [42] | riodb.ibase.aist.go.jp/rcmg/ggdb/
22 | GPI Biosynthesis Report | Enzymes for biosynthesis of GPI anchors [43] | mendel.imp.ac.at/SEQUENCES/gpi-biosynthesis/
23 | 3D Lectines | Lectin 3D structures in the PDB | www.cermav.cnrs.fr/lectines/
24 | CFG GBP Database | CFG Glycan Binding Proteins Database [26] | www.functionalglycomics.org/glycomics/molecule/jsp/gbpMolecule-home.jsp
25 | KEGG GBP | KEGG Glycan Binding Proteins Database | http://www.genome.jp/kegg-bin/get_htext?ko04091.keg
26 | Lectin Frontier Database | Affinity constants (Ka) of a series of lectins towards a panel of pyridylaminated (PA) glycans | riodb.ibase.aist.go.jp/rcmg/glycodb/LectinSearch
27 | GlyAffinity | Protein-carbohydrate interaction data | www.glycosciences.de/affinity/
28 | O-GlycBase | O-Glycosylation sites in protein sequences [44] | www.cbs.dtu.dk/databases/OGLYCBASE/
29 | GlycoProtein DataBase | Experimentally identified N-glycoproteins | riodb.ibase.aist.go.jp/rcmg/glycodb/Glc_ResultSearch
30 | MonosaccharideDB | Monosaccharide notation database, includes tools for residue name parsing and validation | www.monosaccharidedb.org
31 | DisaccharideDB | Conformational maps of disaccharides | www.cermav.cnrs.fr/cgi-bin/di/di.cgi
32 | GlycoMapsDB | Conformational maps of carbohydrates [49] | www.glycosciences.de/modeling/glycomapsdb/
33 | SugaBase | NMR data of CarbBank entries [50] | www.boc.chem.uu.nl/sugabase/sugabase.html
34 | GlycoBase (Lille) | Carbohydrate NMR spectra | glycobase.univ-lille1.fr/base/
35 | GlycoBase (Dublin) | Database of 2-aminobenzamide (2AB)-labeled released glycans [52] | glycobase.nibrt.ie/
36 | Glycan Mass Spectral Database | Mass spectrometry data | riodb.ibase.aist.go.jp/rcmg/glycodb/Ms_ResultSearch
37 | Glycome-DB | Meta-database covering several of the above databases [53, 54] | www.glycome-db.org
During the past two decades a number of databases related to glycobiology have been developed (see Table 3.2). The Complex Carbohydrate Structure Database (CCSDB) (7) [27, 28], often referred to as CarbBank, was the first attempt at a systematic collection of carbohydrate structures from the literature. Glycan information, however, had to be extracted from publications manually, because branched carbohydrate structures are usually presented as graphics, which are difficult for computers to parse. CarbBank was therefore no longer updated after its funding stopped in the mid-1990s. Nevertheless, its data form the basis of many of the major carbohydrate structure databases that were developed later. For example, the GLYCOSCIENCES.de database (8) [25], formerly SweetDB [29], started by making CarbBank data available online and combining them with computed carbohydrate 3D structures. NMR data extracted from the literature and information on carbohydrate structures in the Protein Data Bank (PDB) [30] were added later. The weekly update of the GLYCOSCIENCES.de database with new PDB entries is largely automated, which keeps these data up to date and independent of funding [31]. The glycan structure database of the CFG (9) [26] also uses some CarbBank data, together with entries from the commercial GlycoMinds glycomics database, as an initial seed, and is extended with carbohydrate structures provided by the CFG’s experimental cores. The CFG is mainly interested in mammalian glycans, and the format used to store the glycan sequences in its database, the LinearCode® (see Section 3.4), has a limited dictionary of monosaccharide names. Therefore, not all of the CarbBank data sets are included in the CFG database. Another database targeted to glycans from a limited taxonomic range is the Bacterial Carbohydrate Structure Database (BCSDB) (10) [32]. This database aims to provide access to all bacterial carbohydrate structures in the literature, many of which are polysaccharides. Its data sets include the bacterial carbohydrate structures from CarbBank and manually assigned structures from the literature published after the maintenance of CarbBank had ended. NMR data are also available in BCSDB. The KEGG Glycan database (11) [24] integrates information on glycan structures with other KEGG data, predominantly with information on biosynthetic or metabolic pathways and their respective enzymes.
In this way the KEGG Glycan database combines glycan structure and glycoenzyme data in a single resource, so that users can easily access records of biochemical reactions involved in the synthesis or breakdown of glycan chains. In addition to these rather general databases, various more specialized glycan structure databases have been developed. The carbohydrate structures in the PDB, for example, also serve as the data basis of GlycoconjugateDB:Structures (12) [33] and of DOUGAL (13). The former aims to provide records both of the carbohydrate structures covalently linked to glycoproteins and of the protein-carbohydrate complexes in the PDB, similar to the GLYCOSCIENCES.de database, while DOUGAL focuses exclusively on glycoproteins. GlycoEpitopeDB (14) collects records of carbohydrate recognition motifs together with the antibodies that recognize these motifs, the glycoproteins or glycolipids on which the motifs have been found, and the respective literature references. Enzymes for biosynthesis or breakdown are also listed for some of the motifs. The ECODAB database (15) [34] summarizes the O-antigen structures used for serotyping Escherichia coli, together with literature references and lists of chemical shifts from NMR experiments. GlycoSuiteDB (16) [35] was originally developed as a commercial database in Australia and is now publicly available on the ExPASy Proteomics Server [36]. An overview of the kinds of information available in some of the databases is presented in Table 3.3. Each database offers its own query options, which are listed in Table 3.4. Some single query types can be combined (e.g., a substructure search and a species search), but not all combinations are possible, and usually each option can be used only once in a single database query.
Table 3.3 Overview of the kind of data stored in various databases. A “+” indicates that the corresponding information is present in a database. The type of data present usually corresponds to the scientific focus of the resource.
Columns (databases): BCSDB, CarbBank, CFG, EUROCarbDB, GlycoBase (Dublin), GlycoBase (Lille), GLYCOSCIENCES.de, KEGG
Rows (information: subcategories): Literature; Biol. source: species, organ, tissue, cell type, disease; Structure: carbohydrate sequence, aglycone; Analysis: method, NMR data, HPLC data, glycan arrays, MS data; Biosynthesis: pathways, enzymes
Table 3.4 Query options that are available in various databases
Columns (databases): BCSDB, CarbBank, CFG, EUROCarbDB, GlycoBase (Dublin), GlycoBase (Lille), GLYCOSCIENCES.de, KEGG
Rows (query options): Substructure, Molecular weight, Composition, Exact structure, Species, Organ, Tissue, Disease, Internal GlycanID, Motif, N-Glycan classification, Literature, NMR data, HPLC data
a Combinations of two substructures are possible for the search query.
b Only search by type of NMR data, not by shift.
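Substructure search, the first query option listed in Table 3.4, is a good example of an operation that needs tree algorithms rather than string algorithms. The Python sketch below is a minimal, hypothetical illustration, not the method used by any of the databases discussed here. It defines a small tree model so that it runs on its own, then tests whether a query pattern can be mapped onto some part of a target glycan with identical residue names and linkages, assigning each pattern branch to a different target branch by backtracking. Real implementations must also cope with incomplete linkage information, repeating units and differing residue notations, all of which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Residue:
    """Monosaccharide node; `linkage` is the bond to the parent (None for a root)."""
    name: str
    linkage: Optional[str] = None
    children: List["Residue"] = field(default_factory=list)

    def add(self, name: str, linkage: str) -> "Residue":
        child = Residue(name, linkage)
        self.children.append(child)
        return child


def _assign(p_kids: List[Residue], t_kids: List[Residue], used: Set[int]) -> bool:
    """Backtracking: give every pattern child its own compatible target child."""
    if not p_kids:
        return True
    first, rest = p_kids[0], p_kids[1:]
    for i, t in enumerate(t_kids):
        if i in used or t.linkage != first.linkage:
            continue
        if matches_at(first, t) and _assign(rest, t_kids, used | {i}):
            return True
    return False


def matches_at(pattern: Residue, target: Residue) -> bool:
    """Does the pattern fit onto the subtree rooted at `target` (root linkage ignored)?"""
    return pattern.name == target.name and _assign(pattern.children, target.children, set())


def contains_substructure(target: Residue, pattern: Residue) -> bool:
    """Try every residue of the target as the anchor for the pattern's root."""
    return matches_at(pattern, target) or any(
        contains_substructure(child, pattern) for child in target.children
    )


# Target: the trimannosyl N-glycan core, Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
core = Residue("GlcNAc")
beta_man = core.add("GlcNAc", "b1-4").add("Man", "b1-4")
beta_man.add("Man", "a1-3")
beta_man.add("Man", "a1-6")

# Pattern 1: a mannose carrying both an a1-3- and an a1-6-linked mannose
branched_man = Residue("Man")
branched_man.add("Man", "a1-3")
branched_man.add("Man", "a1-6")

# Pattern 2: a mannose carrying two a1-3-linked mannoses (absent from the core)
double_a13 = Residue("Man")
double_a13.add("Man", "a1-3")
double_a13.add("Man", "a1-3")

if __name__ == "__main__":
    print(contains_substructure(core, branched_man))  # True
    print(contains_substructure(core, double_a13))    # False
```

Requiring each pattern child to use a distinct target child is what rejects the second pattern: the core has only one a1-3-linked mannose, so a string search for "Man(a1-3)" would succeed where the topological match correctly fails.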
3.3.2 Glycoenzyme and Lectin Databases
Databases that store records of enzymes that synthesize (glycosyltransferases, GTs) or break down (glycosidases) carbohydrates, collectively termed glycoenzymes, or of carbohydrate-binding proteins (lectins), form a link between the classical protein databases and the carbohydrate structure databases. One of the major databases of this kind is CAZy, the Carbohydrate-Active enZymes database (17) [37]. CAZy describes families of structurally related catalytic domains and carbohydrate-binding modules of glycoenzymes. Well-established bioinformatics tools such as BLAST [38] or HMMER [39] are used to classify the proteins in CAZy according to their amino acid sequences. Approximately 1–3% of the proteins encoded by a typical genome can be classified as glycoenzymes by these methods [37, 40]. Glycoenzyme data are also stored in the glycan subsections of the KEGG Pathway (18) [23] and KEGG Orthology (19) [23, 41] databases of the KEGG Glycan portal (1), in the CFG GT database (20) and in GlycoGeneDB (GGDB) (21) [42]. The GPI Biosynthesis Report (22) [43] covers the enzymes for the biosynthesis of GPI anchors. Similar resources are available for lectins. Some enzymes also feature lectin domains outside the catalytic center that bind to carbohydrate epitopes, e.g., to keep the enzyme in position; such domains are also listed in CAZy. Resources dedicated to lectins are the 3D LECTINES database (23), the CFG Glycan Binding Proteins (GBP) database (24) [26], KEGG Glycan Binding Proteins (25), part of the KEGG BRITE [23] functional hierarchies database, the Lectin Frontier DataBase (LfDB) (26), and GlyAffinity (27). Parts of the LfDB interface, however, are available only in Japanese and are therefore of limited use to scientists who do not read Japanese. The proteins stored in O-GlycBase (28) [44] are grouped not by function but by the fact that they are glycoproteins carrying O-glycosidically linked glycan chains.
The entries list the amino acid sequences together with experimentally verified or computationally predicted glycosylation sites. Another resource of glycosylation sites is the GlycoProtein DataBase (GPDB) (29), which, like LfDB, is only partially available in English at the time of writing. The classical, more universal protein and gene databases such as UniProt (www.uniprot.org) [45], SwissProt (www.expasy.org/sprot/) [46], BRENDA (www.brenda-enzymes.org) [47], GenBank (www.ncbi.nlm.nih.gov/Genbank/) [48], and the PDB (www.pdb.org) [30] also contain information on glycoenzymes and lectins, but not in a focused way. Information such as amino acid sequence, taxonomy or nomenclature can be found in these databases, but it is not possible, for example, to browse enzymes grouped by their carbohydrate substrate specificity. The glyco-specific databases often acquire the basic protein and/or gene data from these major classical databases before adding the carbohydrate information. Their entries usually link back to the classical resources, which can therefore easily be reached, whereas databases such as UniProt or GenBank contain very few links to glycoenzyme- or lectin-specific resources.
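Several of these resources annotate glycosylation sites on protein sequences. For N-linked glycans, the most basic sequence-level criterion is the N-X-S/T sequon, in which X is any amino acid except proline. The short Python sketch below, with a hypothetical function name, simply lists the sequon positions in a sequence. A sequon is necessary but not sufficient for N-glycosylation, which is why experimentally identified sites, such as those collected in GPDB, remain essential; O-glycosylation sites like those catalogued in O-GlycBase follow no comparably simple consensus motif.

```python
import re

# N-X-S/T sequon: asparagine, any residue except proline, then serine or threonine.
# The zero-width lookahead lets overlapping sequons (as in "NNST") all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return the 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence (not a real protein): N2 fits, N6 is blocked by P7, N9 and N10 both fit.
    print(find_sequons("MNGSANPTNNST"))  # [2, 9, 10]
```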
3.3.3 Other Carbohydrate-Related Databases
Several further databases concentrate on specific aspects of the carbohydrate field, such as nomenclature, conformation, or primary experimental data. Residue notation is a serious problem in glycobioinformatics. Owing to the diversity of monosaccharide base types and possible modifications, multiple names are possible for most monosaccharides. This complicates the storage and retrieval of data in carbohydrate databases as well as the exchange of data between the individual resources. MonosaccharideDB (30) aims to overcome this problem by providing routines to parse and validate names in various notations and to create unique residue names. DisaccharideDB (31) and GlycoMapsDB (32) [49] cover carbohydrate conformation by providing calculated conformational ϕ/ψ maps of glycosidic linkages. While the former is limited to disaccharide fragments, the latter also contains maps for more complex oligosaccharides, which makes it possible to analyze, for example, the impact of branching on the conformational space that a linkage can cover. Primary experimental data are stored in relatively few databases. Sugabase (33) [50], GlycoBase (Lille) (34) [51], BCSDB (10) and the GLYCOSCIENCES.de database (8) feature NMR spectra, while GlycoBase (Dublin) (35) [52] holds published details of 2-aminobenzamide (2AB)-labeled released glycans together with Glucose Unit (GU) values, monosaccharide composition, and other information. The Glycan Mass Spectral Database (36) of the JCGGDB portal (6) stores information on mass spectrometric analysis of carbohydrates but, like other databases of this portal, is only partially available in English. MS data can also be found in the glycan profiling section of the CFG portal websites (3). This portal also features gene microarray, glycan array and mouse phenotyping data. EUROCarbDB (5) stores primary data of MS, NMR and HPLC experiments.
This database enables users to create a personal account and upload their data to the database.
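To make the idea of a conformational ϕ/ψ map concrete, the following minimal sketch assumes that one already has sampled (ϕ, ψ) torsion pairs for a single glycosidic linkage, for example from a simulation. It bins them on a regular grid and converts bin populations into relative free energies by Boltzmann inversion, ΔG = −RT ln(p/p_max). The bin width, temperature and function name are illustrative assumptions; this is not the protocol used to compute the maps in DisaccharideDB or GlycoMapsDB.

```python
import math
from collections import Counter

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def phi_psi_map(samples, bin_width=30.0, temperature=300.0):
    """Bin (phi, psi) samples in degrees and return relative free energies.

    Returns {(phi_bin, psi_bin): delta_G in kcal/mol}; each key is the lower
    edge of a bin and the most populated bin has delta_G = 0. Empty bins are
    simply absent from the result.
    """
    counts = Counter()
    for phi, psi in samples:
        # Wrap both angles into [-180, 180) before binning
        phi = (phi + 180.0) % 360.0 - 180.0
        psi = (psi + 180.0) % 360.0 - 180.0
        key = (math.floor(phi / bin_width) * bin_width,
               math.floor(psi / bin_width) * bin_width)
        counts[key] += 1
    max_count = max(counts.values())
    rt = R_KCAL * temperature
    # Boltzmann inversion relative to the most populated bin
    return {key: -rt * math.log(n / max_count) for key, n in counts.items()}
```

A finer grid gives a more detailed map but needs many more samples per bin for stable energies, which is one reason why precomputed maps are useful.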
3.4 GlycomeDB
3.4.1 Situation
As described above, several databases for carbohydrate structures and corresponding information are available. Figure 3.1 shows the global distribution of the major carbohydrate structure databases. The map includes the first carbohydrate structure database, CCSDB/CarbBank (7), developed at the University of Georgia, and ongoing database projects [BCSDB (10), CFG database (3), EUROCarbDB (5), GlycoBase (Lille) (34), GlycoBase (Dublin) (35), GLYCOSCIENCES.de (2) and KEGG (1)]. The Japanese initiative for cross-referencing different Japanese databases, JCGGDB (6), is also shown.
Fig. 3.1 Major carbohydrate structure databases around the world
The main problem with these databases, which have been developed over the last two decades, is the absence of standardization in the digital data representation. Figure 3.2 illustrates this problem with an example of carbohydrate sequences: it shows the core-fucosylated N-glycan core in the sequence formats used by different carbohydrate databases. Displayed are the CabosML format (see Fig. 3.2 A) [55] as used in JCGGDB, the LINUCS format (see Fig. 3.2 B) [56] as used by GLYCOSCIENCES.de, the BCSDB sequence format (see Fig. 3.2 C), LinearCode® (see Fig. 3.2 D) as used by CFG, the CarbBank format (see Fig. 3.2 E), the sequence format of GlycoBase (Lille) (see Fig. 3.2 F), the format used in GlycoBase (Dublin) (see Fig. 3.2 G) [52], the KCF format (see Fig. 3.2 H) [57] as used by KEGG, and GlycoCT{condensed} (see Fig. 3.2 I) [58] as used in EUROCarbDB. As the figure shows, each database uses a different approach for storing carbohydrate sequences; even the monosaccharide names may differ from one sequence format to another. The same lack of standardization also exists for other data associated with carbohydrate sequences, such as taxonomic annotations and experimental information. This situation makes it very hard to compare the data content of the individual databases or to cross-reference and compare datasets between different databases. There are, however, exceptions. BCSDB (10), the CFG database (9), GLYCOSCIENCES.de (8) and KEGG (11) reference the corresponding CarbBank entries for their structures. In addition, BCSDB and CFG link to GLYCOSCIENCES.de entries, and GLYCOSCIENCES.de in turn features cross-references to CFG entries. The most extensive interconnection between two databases is a cross-database search between BCSDB and GLYCOSCIENCES.de [59].
Despite these efforts, the databases must still be considered isolated islands with no or only weak connections to each other. A user who wants to collect all information on a carbohydrate structure has to visit each database and use its own query interface to obtain this information.
Fig. 3.2 The fucosylated N-glycan core in the CFG pictogram representation and in different sequence formats. A: CabosML, B: LINUCS, C: BCSDB format, D: LinearCode®, E: CarbBank format, F: GlycoBase (Lille) format, G: GlycoBase (Dublin) format, H: KCF, I: GlycoCT
3.4.2 GlycomeDB Project
The GlycomeDB project was started in 2005 to overcome the isolation of the individual glycan databases and to obtain an overview of all digitally available carbohydrate structures [53]. The aim of this project is to integrate all freely available carbohydrate structure data already stored in databases, together with the taxonomic annotation of these structures, into one resource. During the integration process, the structural and taxonomic data from the different databases are transformed into a single, uniform digital representation. The GlycoCT{condensed} format [58] (see Fig. 3.2 I) is used to encode the carbohydrate sequences, and the taxonomic annotations contained in the databases are mapped to IDs in the NCBI Taxonomy [60]. After transformation into this standard representation, the data are loaded into the new database GlycomeDB, preserving the references to the original data sets of the structures. Because of this Extract-Transform-Load (ETL) process, GlycomeDB can be considered a data warehouse for carbohydrate structures and their taxonomic annotations. The project started with the data from GLYCOSCIENCES.de (8) and CarbBank (7). In cooperation with the database providers, several databases were added to the integration process between 2006 and 2009. These include BCSDB (10), the CFG database (9), EUROCarbDB (5), GlycoBase (Dublin) (35), GlycoBase (Lille) (34), KEGG (11), JCGGDB (6) and the PDB. Table 3.5 summarizes the numbers of structures present in the different databases and the sequence formats used by the resources. The integration process runs weekly, so that new structures in the source databases are added to GlycomeDB.
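The ETL idea can be illustrated with a minimal sketch. It assumes that each source record has already been extracted as a tuple (database, remote ID, sequence, species). The translation table `TO_CANONICAL`, the made-up sequence strings, the remote IDs and the small taxonomy lookup are hypothetical placeholders standing in for GlycomeDB's translators into GlycoCT{condensed} and its NCBI Taxonomy mapping. The point is that records encoding the same structure in different databases collapse onto one entry while their references to the original data sets are kept.

```python
from dataclasses import dataclass, field

# Hypothetical translations into one canonical encoding. The keys use made-up
# sequence strings, not real CarbBank or KEGG syntax.
TO_CANONICAL = {
    ("CarbBank", "a-D-Manp-(1-3)-b-D-Manp"): "Man(a1-3)Man",
    ("KEGG", "Man a1-3 Man"): "Man(a1-3)Man",
}

# Hypothetical excerpt of a species name -> NCBI Taxonomy ID lookup.
NCBI_TAXON = {"Homo sapiens": 9606, "Mus musculus": 10090}

@dataclass
class Entry:
    structure: str
    references: set = field(default_factory=set)  # (database, remote_id) pairs
    taxa: set = field(default_factory=set)        # NCBI Taxonomy IDs

def load(records):
    """records: iterable of (database, remote_id, sequence, species_or_None)."""
    warehouse = {}
    for database, remote_id, sequence, species in records:
        canonical = TO_CANONICAL.get((database, sequence))  # transform step
        if canonical is None:
            continue  # untranslatable record; a real pipeline would log an error
        entry = warehouse.setdefault(canonical, Entry(canonical))
        entry.references.add((database, remote_id))  # keep link to the source
        if species in NCBI_TAXON:
            entry.taxa.add(NCBI_TAXON[species])
    return warehouse
```

Keeping the `references` set on every entry is what later allows a warehouse like this to answer "in which databases does this structure occur?".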
3.4.3 Content and Quality Control
By November 2009, GlycomeDB contained an index of more than 35,000 different GlycoCT structures from all databases. Nearly half of the structures have taxonomic annotations, which originate from more than 1800 different taxon entries. The contents of the database are freely available and have been used for statistical analyses of chemical building blocks in mammalian oligosaccharides [61] and for a comparison of mammalian and bacterial carbohydrates [21]. During data integration, the carbohydrate structures from the different databases are checked for correctness. These checks include testing sequences against the grammar of the sequence format, testing residue names against the monosaccharide and substituent dictionaries integrated in GlycomeDB, and testing the structures against chemical rules. These rules detect impossible molecular modifications, such as a deoxygenation at position 2 of an N-acetylglucosamine, and invalid linkages, such as a 1–6 linkage between two fucose units. If a structure does not pass these tests, the failure and the error messages are also stored in GlycomeDB. Reports generated from the list of errors are sent to the database providers, who use them to delete or correct the structures in their databases, leading to a steady increase in the data quality of the source databases.
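The chemical rules mentioned above can be sketched as a lookup of which ring positions of a monosaccharide carry a free hydroxyl group. This minimal sketch assumes pyranose forms, ignores the anomeric position and 1–1 linkages, and covers only a few residues; the `FREE_OH` table and the function names are hypothetical simplifications, not GlycomeDB's monosaccharide dictionary. It reproduces both examples from the text: fucose (6-deoxy-galactose) has no hydroxyl at position 6, and position 2 of GlcNAc carries the N-acetyl group.

```python
# Ring positions (pyranose form) carrying a free hydroxyl group, i.e. positions
# that can accept a glycosidic linkage or be deoxygenated. Simplified: the
# anomeric position 1 and 1-1 linked disaccharides are ignored.
FREE_OH = {
    "Glc": {2, 3, 4, 6},
    "Gal": {2, 3, 4, 6},
    "Man": {2, 3, 4, 6},
    "GlcNAc": {3, 4, 6},  # position 2 carries the N-acetyl group
    "Fuc": {2, 3, 4},     # 6-deoxy-galactose: no hydroxyl at position 6
}

def check_linkage(acceptor, position):
    """Return an error message, or None if the acceptor position is free."""
    if acceptor not in FREE_OH:
        return f"unknown residue {acceptor!r}"
    if position not in FREE_OH[acceptor]:
        return f"{acceptor} has no free hydroxyl at position {position}"
    return None

def check_deoxy(residue, position):
    """Deoxygenation is only possible where a hydroxyl group is present."""
    if position not in FREE_OH.get(residue, set()):
        return f"cannot deoxygenate {residue} at position {position}"
    return None

def validate(linkages, deoxy_sites=()):
    """Collect error messages for (acceptor, position) and (residue, position) pairs."""
    errors = [check_linkage(r, p) for r, p in linkages]
    errors += [check_deoxy(r, p) for r, p in deoxy_sites]
    return [e for e in errors if e is not None]
```

For example, `validate([("Fuc", 6)])` flags the Fuc(1–6)Fuc linkage, and the collected messages are the kind of list from which error reports for database providers could be generated.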
Table 3.5 Overview of the number of structures present in different databases and of the corresponding sequence formats

| Database | Unique structuresᵃ | With taxons | Data setsᵇ | With taxons | Unique GlycoCT structuresᶜ | Sequence format |
|---|---|---|---|---|---|---|
| BCSDB | 6789 | 6675 | 9067 | 8918 | 6087 | BCSDB format |
| CarbBank | 23,402 | 13,558 | 49,897 | 25,206 | 14,887 | CarbBank format |
| CFG | 8643 | 3062 | 8643 | 3062 | 6285 | LinearCode® |
| EUROCarbDB | 13,458 | 6219 | 16,458 | 6219 | 13,304 | GlycoCT |
| GlycoBase (Lille) | 234 | 233 | 234 | 233 | 195 | Format of GlycoBase (Lille) |
| GlycoBase (Dublin) | 377 | − | 377 | − | 359 | Format of GlycoBase (Dublin) |
| Glycosciences.de | 23,235 | 5384 | 23,235 | 5384 | 15,803 | LINUCS |
| KEGG | 10,969 | − | 10,969 | − | 10,160 | KCF |
| GlycomeDB | 35,205 | 14,783 | 35,205 | 14,783 | 35,205 | GlycoCT |

ᵃ Unique structures according to the database-internal format.
ᵇ Data sets in the database. Numbers correspond to the primary key of the database (typically a structure, but in CarbBank and BCSDB a structure-publication tuple).
ᶜ Structures translated to GlycoCT after release of the aglycone part; erroneous structures are excluded.

Table 3.6 Search queries in web portal Glycome-DB.org

| Query type | Description |
|---|---|
| Search by database ID | Search for a structure based on the ID of this structure in GlycomeDB or another database. It is also possible to search for all structures originating from a certain database. |
| Exact structure search | Search for a particular structure in the database. |
| Substructure search | Search for all structures in the database which contain the given structure as a substructure. |
| Similarity search | Search for all structures which are similar to the given structure. The similarity is calculated based on the co-occurrence of disaccharide patterns [63]. |
| MCS search | Search for the maximum common substructure (MCS) between the entered structure and all structures in the database. |
| Search by species | Search for all structures with a given species annotation. It is also possible to search for all structures that have at least one species annotation. |

3.4.4 Web Portal Glycome-DB.org
Because the original database references were preserved during the integration process, GlycomeDB can be used to find all data sets in the different databases in which a certain carbohydrate structure occurs. The web portal Glycome-DB.org (37) [54] was established to serve as a single search engine for carbohydrate structures in all of the integrated databases. This portal allows carbohydrates to be found using different search queries, which are summarized in Table 3.6. For the structural queries [exact structure search, substructure search, similarity search and maximum common substructure (MCS) search], different input options are available for entering the structure or substructure. The user can draw the carbohydrate structure using the GlycanBuilder applet (38) [62] or the DrawRINGS applet (39). It is also possible to enter a structure in different textual representations (see Fig. 3.2). Each search query shown in Table 3.6 has several additional options that can modify the search. For example, substructure searches can be restricted to the reducing end of a structure, and species searches can be restricted to structures that are exclusively annotated with a certain species. All searches share a common negation option that returns the complementary set, i.e., all structures that do not match the query. The result of a search in the database can be either a single structure or a list of candidate structures. The single-structure page displays all available information on the structure, as partially shown in Fig. 3.3. The page starts with a picture of the structure and input fields for generating pictures in other display styles or file formats. These images are generated using the GlycoWorkbench library [64]. Following the picture section, a list of species annotations is shown. For each species, the name and NCBI Taxonomy ID are given. For each annotation, the list of original database entries supporting it can be displayed by using the web link “Show remote structure evidence” (see Fig. 3.3). Below the species section, the complete list of original database entries is shown.
For each entry, the name of the corresponding database and the ID of the data set in this database are displayed. The ID serves as a link to the web page of the structure in the original database. The next section contains a list of common motifs found in the structure. After the motif list, a section is shown containing the names of all non-carbohydrate residues (such as lipids or amino acids) that were attached to the carbohydrate structure in the original database. As in the species section, the entries in the original databases can be listed for each aglycone name by using the “Show remote structure evidence” link in this section. Following the aglycone section, but not shown in Fig. 3.3, there are fields for downloading the structure in different sequence formats and a section containing the GlycoCT{condensed} encoding of the structure.
Fig. 3.3 Reduced web page with information on structure 1643 in GlycomeDB
If the search query returns multiple structures rather than a single structure, a web page with all candidate structures is shown. The user can either view all the information on a selected structure or refine the search query. This query refinement is performed by the Complex Query System (CQS). The CQS is a unique querying system that allows search results to be refined by one or more additional search criteria. The system supports the operations subset (intersection) and union between two query results and the complement of a single query result.
Fig. 3.4 Illustration of a complex query in Glycome-DB.org. The numbers on the left side show the counts of all structures in the database fitting the search criteria. The numbers on the right side show the counts of structures after each step of the complex query
With this system it is possible to answer rather complex questions simply by using the Glycome-DB.org web portal, as the following example illustrates: find all structures with an O-glycan core 1 at the reducing end and a blood group H epitope that occur in mouse or rat but not in humans. Figure 3.4 shows how this question can be answered using the CQS in Glycome-DB.org. The first part of the query is a search for all structures annotated with mouse (Mus musculus), resulting in a set of 563 structures. A union operation between this set and the result of a search for all structures annotated with Rattus gives 1114 structures. The next step is a subset operation between these results and a search for all non-human structures, which gives 622 structures meeting the taxonomic part of the question. Further subset operations with substructure searches for the O-glycan core 1 motif and the blood group H motif are then applied to this result set. The final result consists of eight structures.
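The CQS operations correspond directly to set algebra on structure IDs. The sketch below replays the mouse/rat example with Python sets. The small ID sets in `results` and the universe `ALL` are hypothetical stand-ins for real Glycome-DB.org search results, so the counts do not reproduce those in Fig. 3.4; "subset" in the text corresponds to set intersection, and the negation option to the complement relative to all structures.

```python
# Hypothetical results of single searches: sets of structure IDs.
ALL = set(range(1, 21))
results = {
    "species:Mus musculus": {1, 2, 3, 4, 5, 6},
    "species:Rattus": {5, 6, 7, 8, 9},
    "species:Homo sapiens": {2, 6, 10, 11},
    "motif:O-glycan core 1 (reducing end)": {1, 3, 6, 7, 9, 12},
    "motif:blood group H": {3, 6, 7, 13},
}

def union(a, b):              # "union set" of two query results
    return a | b

def subset(a, b):             # "subset": structures present in both results
    return a & b

def negate(a, universe=ALL):  # complement of a single query result
    return universe - a

rodent = union(results["species:Mus musculus"], results["species:Rattus"])
non_human = subset(rodent, negate(results["species:Homo sapiens"]))
with_core1 = subset(non_human, results["motif:O-glycan core 1 (reducing end)"])
final = subset(with_core1, results["motif:blood group H"])
print(sorted(final))  # [3, 7]
```

Because every step yields an ordinary set, the order of the subset operations does not change the final result, only the sizes of the intermediate sets.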
3.4.5 Web Services
An important requirement for a modern database such as GlycomeDB is that the data are available not only as web pages but also in machine-readable form. This allows other databases and external programs to access and process the data in the database in an automated manner. To fulfill this requirement, GlycomeDB provides two different web service interfaces for automated access to the data. The first interface is based on representational state transfer (REST); the second is based on the simple object access protocol (SOAP) and the web service framework BioMoby (http://biomoby.org/) [65, 66]. With these two interfaces it is possible to request all data stored in GlycomeDB. Since most of the integrated databases do not provide their own web service interfaces, GlycomeDB also acts as a proxy web service provider for these databases. This is possible because GlycomeDB stores not only the structure index and the taxonomic annotations of the structures but also the references to the structures in the other databases.
3.4.6 Future Perspectives
The next steps in the development of the GlycomeDB project are the integration of additional information from the databases, such as tissue distribution, the relation of the carbohydrate to disease, and bibliographic information. The search options on the web portal will have to be adapted to these new data. In addition, the integration of further databases, such as GlycoSuiteDB, will make GlycomeDB and its search engine more powerful. One perspective for all database providers is to integrate the cross-reference information available in GlycomeDB into their own databases. This will allow users of one database to find related information on a specific carbohydrate in the web pages of other database providers without using GlycomeDB directly.
3.5 Tools
The development of software tools in the field of glycomics is driven mainly by two aspects: the input of carbohydrate structures and the processing of primary analytical data, such as mass spectrometric data (MS spectra), nuclear magnetic resonance data (NMR spectra), and high-performance liquid chromatographic data (HPLC chromatograms). Because most companies that develop such analytical equipment do not focus on the detection and/or analysis of carbohydrates, their software is often unsuited to the requirements of glycan structure analysis. Therefore, many research groups all over the world have started to design their own software, which, regrettably, is often tailored to the specific analytical questions at hand and cannot be used in other laboratories without additional programming. Several initiatives have tried to overcome this situation over the past twenty years and have designed software tools, many of which are now freely available via the internet (e.g., via the web portals described above) and therefore usable by researchers worldwide.
3.5.1 Structure Input and Display
Due to the complexity of carbohydrate structures, with branching and multiple potential linkage positions between two monosaccharides, the encoding of structures and, therefore, the input of such molecules is not straightforward. While protein or DNA sequences can be written and displayed as linear sequences in a text editor, in the glycomics field graphical editors are the key technology for entering or editing a structure. These editors make it easy to start using such programs without any knowledge of the encoding scheme used in the background. Because these editors use cartoon-like representations of glycan structures, their embedded controlled vocabulary minimizes errors during input. Most of them were developed as part of a database project. Two examples of such editors are the DrawRINGS input tool (39), developed at Soka University in Japan, and GlycanBuilder (38) [62], developed by the EUROCarbDB initiative (5). Both tools use a graphical representation based on the CFG notation (see Fig. 3.5) [67] and allow translation into other encodings.
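Behind the cartoon, such an editor has to hold the structure as a tree of residues and linkages that can be translated into a text encoding. The sketch below shows one such translation into an IUPAC-condensed-like string, written from the non-reducing end, with side branches in square brackets. The `Residue` class, the assumption that every donor links through C1 (which does not hold for, e.g., sialic acids), and the rule that the child with the highest acceptor position continues the main chain are illustrative assumptions, not the behavior of DrawRINGS or GlycanBuilder.

```python
from dataclasses import dataclass, field

@dataclass
class Residue:
    name: str          # e.g. "Man", "GlcNAc"
    anomer: str = ""   # "a" or "b" for the linkage to the parent ("" at the reducing end)
    position: int = 0  # acceptor position on the parent residue
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child residue and return it, so that calls can be chained."""
        self.children.append(child)
        return child

def to_linear(res):
    """Serialize a residue tree from the non-reducing end to the reducing end."""
    label = res.name
    if res.anomer:  # every residue except the reducing end has a linkage (donor C1 assumed)
        label += f"({res.anomer}1-{res.position})"
    if not res.children:
        return label
    kids = sorted(res.children, key=lambda c: c.position)
    main, branches = kids[-1], kids[:-1]  # highest acceptor position = main chain
    return to_linear(main) + "".join(f"[{to_linear(b)}]" for b in branches) + label

# Usage: the trimannosyl N-glycan core
core = Residue("GlcNAc")
man = core.add(Residue("GlcNAc", "b", 4)).add(Residue("Man", "b", 4))
man.add(Residue("Man", "a", 3))
man.add(Residue("Man", "a", 6))
print(to_linear(core))  # Man(a1-6)[Man(a1-3)]Man(b1-4)GlcNAc(b1-4)GlcNAc
```

Because the editor works on the tree rather than on a string, other encodings can be produced by swapping only the serializer, which is why a graphical front end can feed several formats.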
Fig. 3.5 Screenshots from DrawRINGS (39) (top) and GlycanBuilder (38) (bottom)
3.5.2 Tools for Processing Analytical Data
The most commonly used technologies for analyzing carbohydrate structures are nuclear magnetic resonance (NMR), high-performance liquid chromatography (HPLC), and mass spectrometry (MS). Over the past twenty years, many software tools have been designed to compensate for the lack of commercial software in these fields.
3.5.2.1 MS
Mass spectrometry probably offers the fastest way to derive useful structural information from a sample, as it offers unrivalled sensitivity and the ability to handle complex mixtures of different glycans. Recently developed MS techniques allow mass spectra of fragmented carbohydrates to be generated in order to determine the structure of the glycan molecule. Apart from the intrinsic limits of mass spectrometry (it can neither discriminate between isobaric monosaccharides, such as glucose, galactose and mannose, nor detect anomeric configurations), the bottleneck to complete structure determination is the lack of robust software tools for analyzing MS data in high-throughput glycomics projects.
Composition Analysis
Calculations of glycan compositions, most often the first step in structure analysis, scale exponentially with the size of the detected molecules. Therefore, the calculation parameters need to be constrained to avoid presenting the user with an enormous number of alternatives. Taxonomic and biosynthetic restrictions are the most useful means of narrowing the results to classes of carbohydrates relevant to the analytical problem. GlycoMod (40) [68] is a web-based tool for compositional analysis. The number of calculated composition proposals can be restricted by selecting the types of monomers present and the type of glycan (N- or O-linked). The calculation is limited by its reliance on a set of pre-calculated compositions, which is used to minimize calculation time. Unlike Glyco-Peakfinder (41) [69], the most recent development, GlycoMod cannot calculate the compositions of fragments produced in MSn spectra. A subsequent database search in GlycoSuiteDB (16) allows users to look for known structures matching the detected composition(s). Glyco-Peakfinder follows the same strategy but performs a complete de novo calculation without any pre-calculated compositions, drawing on an embedded set of monosaccharides, reducing-end modifications, derivatizations and glycoconjugates (peptides and lipids). The results can also be checked for biological relevance by a search in open-access databases.
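The combinatorial nature of composition analysis can be shown with a bounded brute-force search. This minimal sketch assumes underivatized, free reducing glycans observed as [M+Na]+ ions, standard monoisotopic residue masses for Hex, HexNAc, dHex and NeuAc, fixed per-residue upper bounds and an absolute mass tolerance. These constraints stand in for the taxonomic and biosynthetic restrictions discussed above; the function is not GlycoMod's or Glyco-Peakfinder's algorithm.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common monosaccharide building blocks.
RESIDUE_MASS = {"Hex": 162.05282, "HexNAc": 203.07937, "dHex": 146.05791, "NeuAc": 291.09542}
WATER = 18.01056   # added once for the free reducing end
SODIUM = 22.98922  # mass of a Na+ ion, for [M+Na]+ adducts

def compositions(mz, tolerance=0.01, max_per_residue=(9, 6, 3, 4)):
    """Enumerate compositions whose [M+Na]+ mass lies within tolerance of mz.

    max_per_residue gives upper bounds in the order of RESIDUE_MASS.
    """
    names = list(RESIDUE_MASS)
    hits = []
    # Exhaustive search: the number of candidates is the product of
    # (bound + 1) over all residue types, so every extra type or a higher
    # bound multiplies the work.
    for counts in product(*(range(n + 1) for n in max_per_residue)):
        if sum(counts) == 0:
            continue
        mass = sum(c * RESIDUE_MASS[r] for c, r in zip(counts, names)) + WATER + SODIUM
        if abs(mass - mz) <= tolerance:
            hits.append(dict(zip(names, counts)))
    return hits
```

For a signal at m/z 1257.42, the result includes Hex5HexNAc2 ([M+Na]+ ≈ 1257.4226). Tightening the bounds, for instance by excluding NeuAc for a given sample type, shrinks the search space directly.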
Another tool, Cartoonist (42) [70, 71], combines composition analysis with structure models. Depending on user settings and embedded rules, Cartoonist generates all mammalian cell-specific N- or O-glycans and matches their masses with the given mass list, resulting in a fully assigned spectrum that shows one or more structure proposals for each mass signal. Including such biosynthetic pathways dramatically reduces the number of possible results but, at the same time, restricts the solutions to those consistent with the built-in knowledge.
Semi-automatic Sequencing Tools
GlycoFragments (43) [72, 73] computes all theoretical fragments of an entered structure. All fragments are displayed with their calculated masses to assist scientists in the manual annotation process. In contrast, GlycoWorkbench (44) [64] matches fragments with the given peak list automatically. This software suite contains several plug-ins so that it can be used as a single tool throughout the whole structure determination process: GlycanBuilder (see above) is implemented for structure input, and Glyco-Peakfinder for composition analysis. Like Glyco-Peakfinder and GlycanBuilder, the tool contains a complete list of monosaccharides, derivatizations, substituents and reducing-end modifications. Peak lists processed by MS software or raw spectra can be fed directly into the tool, either to start the annotation process immediately or to allow manual peak picking beforehand. The in silico generated fragments are then automatically matched with the compiled peak list, including the annotation of all types of mass spectrometric cleavages. The results can be reported in tabular or graphical form, comparable to the Cartoonist style.
De Novo Sequencing
To overcome the restrictions described for composition analysis and semi-automatic sequencing, several de novo sequencing strategies have been implemented in software tools. A list of tools can be found in Table 3.7.
Three of them are described here as examples. STAT (48) [74] starts from a composition proposal calculated for a given mass and generates all possible structural topologies that fit the selected monosaccharide composition. Subsequently, the structure models are matched against the complete peak list and ranked accordingly. Oscar (49) [75] is restricted to N-glycan structures and incorporates pathway information derived from MSn experiments on permethylated compounds into the generation of candidate structures; the remaining candidates must comply with all given pathway rules. StrOligo (50) [76] starts by analyzing the mass differences between signals in the fragment spectra to derive, finally, the composition of the precursor ion. The tool is restricted to underivatized glycans and can only handle glycosidic cleavages. The next step is again structure generation from the calculated composition with respect to the biosynthetic rules of mammalian N-glycans, which limits the number of structure candidates to combinations of the given terminal groups.
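The step shared by GlycoWorkbench and de novo tools such as STAT, namely comparing in silico fragment masses with an observed peak list and ranking candidate structures by the result, can be sketched as follows. The ppm tolerance, the score (fraction of theoretical fragments found) and the made-up m/z values in the test are illustrative assumptions; the published tools use their own, more elaborate scoring schemes.

```python
from bisect import bisect_left

def matched_fragments(theoretical, peaks, ppm=20.0):
    """Return the theoretical fragment m/z values found in the sorted peak list."""
    hits = []
    for mz in theoretical:
        tol = mz * ppm * 1e-6
        i = bisect_left(peaks, mz - tol)  # first peak not below the tolerance window
        if i < len(peaks) and peaks[i] <= mz + tol:
            hits.append(mz)
    return hits

def rank_candidates(candidates, peaks, ppm=20.0):
    """candidates: {name: [theoretical fragment m/z values]} -> names, best first."""
    peaks = sorted(peaks)
    scored = []
    for name, fragments in candidates.items():
        n = len(matched_fragments(fragments, peaks, ppm))
        scored.append((n / len(fragments), name))
    # Highest fraction of explained fragments first; ties broken by name
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]
```

Sorting the peak list once and using binary search keeps matching fast even when many candidate structures, each with many fragments, have to be scored.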
Table 3.7 Glyco-related tools

| No | Name | Description, Reference | URL |
|---|---|---|---|
| | Structure editors | | |
| 38 | GlycanBuilder | Graphical glycan structure editor (embedded in GlycoWorkbench) [62] | www.ebi.ac.uk/eurocarb/gwb/builder.action |
| 39 | DrawRINGS | Graphical structure editor | rings.t.soka.ac.jp/cgi-bin/drawrings.pl |
| | Tools for glycan structure analysis | | |
| 40 | GlycoMod | Calculation of glycan compositions from molecular ion peaks [68] | www.expasy.org/tools/glycomod/ |
| 41 | Glyco-Peakfinder | Composition analysis of glycoconjugates [69] | www.glyco-peakfinder.org |
| 42 | Cartoonist | Template-based glycan sequencing [70, 71] | – |
| 43 | GlycoFragment | Calculation of masses for glycan fragments [72] | www.glycosciences.de/tools/GlycoFragments |
| 44 | GlycoWorkbench | Structure determination from mass spectrometric data [64] | www.glycoworkbench.org |
| 45 | GlycoSearchMS | Comparison of MS spectra [73] | www.glycosciences.de/sweetdb/start.php?action=form_ms_search/ |
| 46 | GlycanMass | Calculation of glycan masses from composition | www.expasy.org/tools/glycomod/glycanmass.html |
| 47 | PMAA | GC-MS fragmentation of permethylated monosaccharides | www.ccrc.uga.edu/specdb/ms/pmaa/pframe.html |
| 48 | STAT | De novo sequencing of glycans from MS data [74] | – |
| 49 | Oscar | De novo sequencing of glycans from MS data [75] | – |
| 50 | StrOligo | De novo sequencing of glycans from MS data [76] | – |
| 51 | Glych | De novo sequencing of glycans from MS data | – |
| 52 | ProspectND | Processing of glycan NMR data | www.boc.chem.uu.nl/static/local/prospectnd/manual_prospectnd.html |
| 53 | CASPER | 1H and 13C chemical shift estimation [77] | www.casper.organ.su.se/casper/ |
| 54 | GlyNest | NMR chemical shift estimation [92] | www.glycosciences.de/sweetdb/start.php?action=form_shift_estimation |
| 55 | CCPN | NMR data processing software [93] | www.ccpn.ac.uk |
| 56 | autoGU | Matching of GU data from HPLC profiles [52] | glycobase.nibrt.ie/cgi-bin/profile_upload.cgi |
| 57 | GALAXY | Visualization of HPLC 2D maps [94] | www.glycoanalysis.info/galaxy2/ENG/ |
| | Prediction/analysis of glycosylation positions | | |
| 58 | NetNGlyc | Prediction of N-glycosylation sites | www.cbs.dtu.dk/services/NetNGlyc/ |
| 59 | NetOGlyc | Prediction of mucin-type O-glycosylation sites [79] | www.cbs.dtu.dk/services/NetOGlyc/ |
| 60 | NetCGlyc | Prediction of C-mannosylation sites [80] | www.cbs.dtu.dk/services/NetCGlyc/ |
| 61 | YinOYang | O-GlcNAc prediction [81] | www.cbs.dtu.dk/services/YinOYang/ |
| 62 | GlycoPred | Prediction of the location of N-linked and O-linked glycosylation sites from amino acid sequence | comp.chem.nottingham.ac.uk/glyco/ |
| 63 | GECS | Gene expression to chemical structure, prediction of N-glycan chains [63] | www.genome.jp/tools/gecs/ |
| 64 | GlySeq | Analysis of glycoprotein sequences [82] | www.glycosciences.de/glyseq/ |
| | 3D structures | | |
| 65 | Sweet-II | Creation of carbohydrate 3D structure models [84] | www.glycosciences.de/modeling/sweet2/ |
| 66 | GLYCAM Biomolecules Builder | Carbohydrate 3D structure models and in silico glycosylation of proteins | glycam.ccrc.uga.edu/ccrc/biombuilder/biomb_index.jsp |
| 67 | pdb2linucs | Detection of carbohydrate moieties in PDB entries [87] | www.glycosciences.de/tools/pdb2linucs/ |
| 68 | pdb-care | Validation of carbohydrate 3D structures in the PDB [88] | www.glycosciences.de/tools/pdb-care/ |
| 69 | glyProt | In silico glycosylation of proteins [91] | www.glycosciences.de/modeling/glyprot/ |
| 70 | glyVicinity | Analysis of protein-carbohydrate interaction data in the PDB [82] | www.glycosciences.de/tools/glyvicinity/ |
| 71 | glyTorsion | Analysis of carbohydrate torsion angles in the PDB [82] | www.glycosciences.de/tools/glytorsion/ |
| 72 | CARP | Ramachandran plot-like analysis of glycosidic torsions [82] | www.glycosciences.de/tools/carp/ |
3.5.2.2 NMR
In contrast to MS, NMR can derive the full structural information, including the anomeric configurations and the stereochemistry of the monosaccharides. Unfortunately, the requirements for sample amount and purity are much higher than in MS or HPLC; thus, structures cannot be solved from mixtures. Two software tools are mentioned here as examples. ProSpectND (52) is a software suite for fully integrated processing of one- up to five-dimensional NMR data. As with commercial software, a complete structure assignment can be achieved from the given data, here tailored to carbohydrates. The program is independent of the NMR instrument, as it can handle data from various spectrometers. CASPER (53) [77] calculates chemical shifts of carbohydrates based on the monosaccharides and on glycosylation shifts from substitutions. It can therefore be used in two directions: for a known carbohydrate structure, CASPER can simulate 1H and 13C spectra; for an unknown structure, CASPER can determine the glycan sequence from NMR data combined with component and methylation analysis.
3.5.2.3 HPLC
In many laboratories, high-performance liquid chromatography (HPLC) is used only as a purification technique, without exploiting the full analytical information provided by single or consecutive HPLC runs. Acquiring HPLC profiles of intact glycans together with exoglycosidase digestion profiles of the same glycans allows the derived retention times, expressed as GU values, to be matched and, finally, structure candidates to be predicted. AutoGU (56) [52], in combination with GlycoBase (Dublin) (35), can be used to narrow down the number of possible structures by taking into account the digestion fingerprint of the possible glycans. A final structure assignment of the glycan profile can be achieved if the combination of all footprints matches the digestion fingerprints.
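GU values express retention times on a scale defined by a dextran ladder, in which the glucose oligomer with n residues has GU = n. The following minimal sketch assumes piecewise-linear interpolation between adjacent ladder peaks (published workflows often fit a polynomial curve to the ladder instead) and matches the resulting GU against reference values within a fixed tolerance. The function names, ladder times, reference glycan names and tolerance are invented for illustration and are not taken from autoGU or GlycoBase.

```python
from bisect import bisect_right

def gu_from_rt(rt, ladder):
    """Convert a retention time to glucose units (GU).

    ladder: list of (glucose_units, retention_time) pairs for the dextran
    ladder; retention time increases with the number of glucose units.
    """
    ladder = sorted(ladder, key=lambda p: p[1])
    times = [t for _, t in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated ladder range")
    i = min(bisect_right(times, rt), len(times) - 1)  # index of upper neighbour
    (gu0, t0), (gu1, t1) = ladder[i - 1], ladder[i]
    # Linear interpolation between the two neighbouring ladder peaks
    return gu0 + (gu1 - gu0) * (rt - t0) / (t1 - t0)

def match_gu(gu, reference, tolerance=0.2):
    """Return the reference glycans whose GU lies within tolerance of gu."""
    return [name for name, ref_gu in reference.items() if abs(ref_gu - gu) <= tolerance]
```

Repeating the conversion after each exoglycosidase digestion gives a series of GU shifts, and it is the combination of these shifts that narrows the candidate list.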
3.5.3 Prediction and Analysis of Glycosylation Sites
The Asn-Xaa-Ser/Thr sequence motif is a necessary but not a sufficient condition for N-glycosylation, simply because not all potential N-glycosylation sites are occupied [78]. The same principle applies to potential O-glycosylation sites. Various tools are dedicated to estimating the occupancy status of glycosylation sites based on the amino acid composition in their neighborhood. The applications of the CBS Prediction Servers use artificial neural networks to predict posttranslational modifications of proteins. These applications include NetNGlyc (58) for N-glycosylation, NetOGlyc (59) [79] for mucin-type O-glycosylation, NetCGlyc (60) [80] for mammalian C-mannosylation, and YinOYang (61) [81] for O-β-GlcNAc glycosylation. The GlycoPred server (62) predicts both N- and O-glycosylation sites from amino acid sequence.
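Locating potential N-glycosylation sites, as opposed to predicting their occupancy as the servers above do, is a simple pattern search. The sketch below uses the well-established refinement of the motif in which Xaa may be any amino acid except proline; a lookahead in the regular expression makes overlapping sequons (e.g., in NNST) detectable. The function name is our own, and the result lists only potential sites, saying nothing about whether they are occupied.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro; the zero-width lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def potential_n_sites(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

print(potential_n_sites("MNVSANPTQNNST"))  # [2, 10, 11]; NPT is skipped because of the proline
```

The positions returned by such a scan are the candidates whose occupancy the neural-network servers then estimate.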
3
Bioinformatics Databases and Applications
85
The occupancy status is usually given not as a yes/no statement but as a probability. One should always keep in mind that the occupancy of a glycosylation site depends not only on the protein but also on the state of the cell, such as its physiological age, health and tissue of origin [11]. The same applies to the primary structures of the glycans present at occupied glycosylation sites. This is taken into account by the KEGG GECS (63) [63] server, which predicts the N-glycan structures that might be present in a sample from information about the glycosyltransferases expressed in an organism or tissue. Such expression information can be found in the consortium data of the CFG (3). A statistical analysis of the amino acids in the neighborhood of occupied glycosylation sites extracted from the PDB [30] and SwissProt [46] can be performed with the GlySeq (64) [82] application of the GLYCOSCIENCES.de portal.
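The neighborhood statistics that GlySeq provides can be sketched as position-specific residue counts around known occupied sites. In this sketch, the input format (a sequence plus the 1-based site position) and the window size are assumptions. In practice the sites would come from curated sources such as the PDB and SwissProt. This is an illustration of the idea, not the GlySeq code.

```python
from collections import Counter, defaultdict


def neighborhood_counts(sites, window=3):
    """Count amino acids at each offset around glycosylation sites.

    sites:  iterable of (sequence, position) pairs; position is the
            1-based index of the glycosylated residue.
    window: number of residues examined on each side of the site.
    Returns {offset: Counter}; offset 0 is the glycosylated residue.
    Offsets falling outside a sequence are skipped.
    """
    counts = defaultdict(Counter)
    for sequence, position in sites:
        center = position - 1  # convert to a 0-based index
        for offset in range(-window, window + 1):
            i = center + offset
            if 0 <= i < len(sequence):
                counts[offset][sequence[i]] += 1
    return dict(counts)


def relative_frequencies(counter):
    """Normalize the counts of one position to frequencies summing to 1."""
    total = sum(counter.values())
    return {aa: n / total for aa, n in counter.items()}
```

Comparing such frequencies with those at unoccupied sequons is one simple way to look for residues that favor or disfavor occupancy.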
3.5.4 Carbohydrate-/Glycoprotein-Related 3D Structure Applications
Many scientific questions concerning carbohydrates and/or glycoproteins can only be answered properly with 3D structural information. Such data are indispensable, e.g., for molecular dynamics (MD) simulations or docking experiments [83]. Several tools are specifically designed to deal with carbohydrate 3D structures, e.g., for predicting or analyzing glycan conformations. Sweet-II (65) [84] and the GLYCAM Biomolecule Builder (66) can be used to convert carbohydrate sequence data into 3D structure models. The results can be obtained in the PDB file format, and with the GLYCAM Biomolecule Builder the structures can also be saved directly as input files for MD simulations with AMBER [85] or CHARMM [86]. The PDB is another source of 3D structures of glycoproteins and protein-carbohydrate complexes, but the carbohydrates are often difficult to find there. The pdb2linucs software (67) [87] was developed to overcome this problem. It implements an algorithm that detects carbohydrate moieties within PDB files using only atom coordinates and element types. A systematic analysis of carbohydrates in the PDB, however, revealed a rather high error rate in their annotation [87]. The PDB Carbohydrate Residue check (pdb-care) tool (68) [88] can help researchers locate problems within the carbohydrate moieties of a PDB structure. A further problem with glycoproteins in the PDB is that many proteins known to be glycosylated in nature carry no glycan chains in their PDB 3D structures. When carbohydrates are present, they usually represent only a fraction of the naturally occurring glycans [89, 90]. Coordinates of glycoproteins with complete glycan chains, to be used, e.g., as starting structures for MD simulations, can be obtained by in silico glycosylation.
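The idea of locating carbohydrate residues from atom coordinates and element types alone can be illustrated as follows. This is a minimal sketch, not the pdb2linucs algorithm. It infers covalent bonds from an assumed distance cutoff of 1.7 Å between heavy atoms. It then reports six-membered rings made of five carbons and one oxygen, which is the ring composition of a pyranose. A real tool must also handle furanoses, non-carbohydrate rings of the same composition, alternate locations and the linkages between residues.

```python
import math
from itertools import combinations

BOND_CUTOFF = 1.7  # Angstrom; assumed upper limit for C-C / C-O bond lengths


def infer_bonds(atoms, cutoff=BOND_CUTOFF):
    """atoms: list of (element, (x, y, z)). Returns {index: set of bonded indices}."""
    neighbors = {i: set() for i in range(len(atoms))}
    for i, j in combinations(range(len(atoms)), 2):
        if math.dist(atoms[i][1], atoms[j][1]) <= cutoff:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def pyranose_like_rings(atoms, cutoff=BOND_CUTOFF):
    """Return six-membered rings (sorted index tuples) containing 5 C and 1 O."""
    neighbors = infer_bonds(atoms, cutoff)
    rings = set()

    def extend(path):
        last = path[-1]
        if len(path) == 6:
            # A 6-atom path closes a ring if its ends are bonded.
            if path[0] in neighbors[last]:
                rings.add(tuple(sorted(path)))
            return
        for nxt in neighbors[last]:
            if nxt not in path:
                extend(path + [nxt])

    for start in neighbors:
        extend([start])

    return sorted(
        ring for ring in rings
        if sorted(atoms[i][0] for i in ring) == ["C"] * 5 + ["O"]
    )
```

Each ring is found several times (from every start atom and in both directions), so it is stored as a sorted tuple to deduplicate it.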
The glyProt software (69) [91] attaches glycan chains that are generated with Sweet-II (65) to a protein structure, and the GLYCAM Biomolecule Builder (66) also offers an option to link its predicted glycan chains to a given protein. Statistical analyses of carbohydrate data from the PDB can be done with glyVicinity (70) and glyTorsion (71) [82]. The former investigates protein-carbohydrate
86
R. Ranzinger et al.
interactions by evaluating which amino acids are typically found in the spatial vicinity of specific carbohydrate residues, and the latter computes statistics of torsion angles, such as glycosidic or omega torsions. Glycosidic torsions of individual PDB structures can be examined with CARP (72) [82], the Carbohydrate Ramachandran Plot, in a similar way to the classical Ramachandran plots used for proteins.
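Glycosidic torsions are ordinary dihedral angles defined by four consecutive atoms. The basic quantity behind glyTorsion statistics and CARP-style plots can therefore be computed as below. The sketch uses the standard atan2 formulation and the IUPAC sign convention. The choice of the four atoms that define φ and ψ is left to the caller and is an assumption of any given analysis. Options include heavy-atom definitions such as O5–C1–O–C′x and C1–O–C′x–C′x−1, or hydrogen-based alternatives.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180.

    Positive values mean that, viewed along p1 -> p2, bond p0-p1 must be
    rotated clockwise to eclipse bond p2-p3 (IUPAC convention).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using atan2 rather than an arccosine of the normalized dot product recovers the sign of the angle, which a Ramachandran-type plot needs.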
3.6 Conclusions
Although considerable progress has been made in recent years, bioinformatics for glycobiology still lags behind genomics and proteomics. In particular, the connections between individual projects are still rather poor, leaving the various resources as disconnected islands. The GlycomeDB project was needed because there is no global agreement on standards for storing carbohydrate structures and their meta-information, which makes direct data exchange and comparison impossible. A very important aim for the carbohydrate community is therefore the implementation and use of standards for storing this information. This will allow the direct exchange of data between the databases and with external applications, breaking the isolation of the individual databases. A first step in this direction was taken at a National Institutes of Health (NIH) workshop in 2006 with the agreement on GLYDE-II [95] as the global exchange format for carbohydrate structures [96].
References
1. Varki A, Cummings RD, Esko JD, Freeze HH, Stanley P, Bertozzi CR, Hart GW, Etzler ME (2009) Essentials of glycobiology, 2nd edn. Cold Spring Harbor Laboratory Press, Plainview, NY
2. Lowe JB, Marth JD (2003) A genetic approach to mammalian glycan function. Annu Rev Biochem 72:643–691
3. Jones J, Krag SS, Betenbaugh MJ (2005) Controlling N-linked glycan site occupancy. Biochim Biophys Acta 1726:121–137
4. van Zuylen CW, Kamerling JP, Vliegenthart JFG (1997) Glycosylation beyond the Asn78-linked GlcNAc residue has a significant enhancing effect on the stability of the alpha subunit of human chorionic gonadotropin. Biochem Biophys Res Commun 232:117–120
5. Wormald MR, Dwek RA (1999) Glycoproteins: glycan presentation and protein-fold stability. Structure Fold Des 7:R155–R160
6. Shental-Bechor D, Levy Y (2008) Effect of glycosylation on protein folding: a close look at thermodynamic stabilization. Proc Natl Acad Sci USA 105:8256–8261
7. Garner B, Merry AH, Royle L, Harvey DJ, Rudd PM, Thillet J (2001) Structural elucidation of the N- and O-glycans of human apolipoprotein(a): role of O-glycans in conferring protease resistance. J Biol Chem 276:22200–22208
8. Indyk K, Olczak T, Ciuraszkiewicz J, Watorek W, Olczak M (2007) Analysis of individual azurocidin N-glycosylation sites in regard to its secretion by insect cells, susceptibility to proteolysis and antibacterial activity. Acta Biochim Pol 54:567–573
9. Guo Y et al (2004) Structural basis for distinct ligand-binding and targeting properties of the receptors DC-SIGN and DC-SIGNR. Nat Struct Mol Biol 11:591–598
10. Hart GW, Housley MP, Slawson C (2007) Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature 446:1017–1022
11. Ohtsubo K, Marth JD (2006) Glycosylation in cellular mechanisms of health and disease. Cell 126:855–867
12. Mitoma J et al (2007) Critical functions of N-glycans in L-selectin-mediated lymphocyte homing and recruitment. Nat Immunol 8:409–418
13. Lau KS, Dennis JW (2008) N-Glycans in cancer progression. Glycobiology 18:750–760
14. Betenbaugh MJ, Tomiya N, Narang S, Hsu JT, Lee YC (2004) Biosynthesis of human-type N-glycans in heterologous systems. Curr Opin Struct Biol 14:601–606
15. Rudd PM, Dwek RA (1997) Glycosylation: heterogeneity and the 3D structure of proteins. Crit Rev Biochem Mol Biol 32:1–100
16. von der Lieth C-W (2009) Glycobiology, glycomics and (bio)informatics. In von der Lieth CW, Lütteke T, Frank M (eds) Bioinformatics for glycobiology and glycomics. An introduction. John Wiley & Sons, Chichester, UK, pp 3–20
17. von der Lieth C-W, Bohne-Lang A, Lohmann KK, Frank M (2004) Bioinformatics for glycomics: status, methods, requirements and perspectives. Brief Bioinform 5:164–178
18. Lütteke T (2008) Web resources for the glycoscientist. Chembiochem 9:2155–2160
19. von der Lieth CW (2007) Databases and informatics for glycobiology and glycomics. In Kamerling JP (ed) Comprehensive glycoscience. Elsevier, Oxford, pp 329–346
20. Schachter H (2000) The joys of HexNAc. The synthesis and function of N- and O-glycan branches. Glycoconj J 17:465–483
21. Herget S, Toukach P, Ranzinger R, Hull W, Knirel Y, von der Lieth C-W (2008) Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans. BMC Struct Biol 8:35
22. von der Lieth CW, Lütteke T, Frank M (2006) The role of informatics in glycobiology research with special emphasis on automatic interpretation of MS spectra. Biochim Biophys Acta 1760:568–577
23. Kanehisa M et al (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res 36:D480–D484
24. Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M (2006) KEGG as a glycome informatics resource. Glycobiology 16:63R–70R
25. Lütteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth C-W (2006) GLYCOSCIENCES.de: an Internet portal to support glycomics and glycobiology research. Glycobiology 16:71R–81R
26. Raman R, Venkataraman M, Ramakrishnan S, Lang W, Raguram S, Sasisekharan R (2006) Advancing glycomics: implementation strategies at the consortium for functional glycomics. Glycobiology 16:82R–90R
27. Doubet S, Bock K, Smith D, Darvill A, Albersheim P (1989) The complex carbohydrate structure database. Trends Biochem Sci 14:475–477
28. van Kuik JA, Vliegenthart JF (1992) Databases of complex carbohydrates. Trends Biotechnol 10:182–185
29. Loss A, Bunsmann P, Bohne A, Loss A, Schwarzer E, Lang E, von der Lieth C-W (2002) SWEET-DB: an attempt to create annotated data collections for carbohydrates. Nucleic Acids Res 30:405–408
30. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28:235–242
31. Lütteke T, von der Lieth CW (2006) The protein data bank (PDB) as a versatile resource for glycobiology and glycomics. Biocatal Biotransformation 24:147–155
32. Toukach FV, Knirel YA (2005) New database of bacterial carbohydrate structures. In Proceedings of the XVIII International Symposium on Glycoconjugates. Florence, Italy, pp 216–217
33. Nakahara T, Hashimoto R, Nakagawa H, Monde K, Miura N, Nishimura S-I (2008) Glycoconjugate Data Bank: structures – an annotated glycan structure database and N-glycan primary structure verification service. Nucleic Acids Res 36:D368–D371
34. Stenutz R, Weintraub A, Widmalm G (2006) The structures of Escherichia coli O-polysaccharide antigens. FEMS Microbiol Rev 30:382–403
35. Cooper CA, Joshi HJ, Harrison MJ, Wilkins MR, Packer NH (2003) GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. 2003 update. Nucleic Acids Res 31:511–513
36. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31:3784–3788
37. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37:D233–D238
38. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
39. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763
40. Davies GJ, Gloster TM, Henrissat B (2005) Recent structural insights into the expanding world of carbohydrate-active enzymes. Curr Opin Struct Biol 15:637–645
41. Hashimoto K, Tokimatsu T, Kawano S, Yoshizawa AC, Okuda S, Goto S, Kanehisa M (2009) Comprehensive analysis of glycosyltransferases in eukaryotic genomes for structural and functional characterization of glycans. Carbohydr Res 344:881–887
42. Kikuchi N, Narimatsu H (2006) Bioinformatics for comprehensive finding and analysis of glycosyltransferases. Biochim Biophys Acta 1760:578–583
43. Eisenhaber B, Maurer-Stroh S, Novatchkova M, Schneider G, Eisenhaber F (2003) Enzymes and auxiliary factors for GPI lipid anchor biosynthesis and post-translational transfer to proteins. Bioessays 25:367–385
44. Gupta R, Birch H, Rapacki K, Brunak S, Hansen JE (1999) O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res 27:370–372
45. Apweiler R et al (2004) UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32:D115–D119
46. Boeckmann B et al (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365–370
47. Chang A, Scheer M, Grote A, Schomburg I, Schomburg D (2009) BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009. Nucleic Acids Res 37:D588–D592
48. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2008) GenBank. Nucleic Acids Res 36:D25–D30
49. Frank M, Lütteke T, von der Lieth C-W (2007) GlycoMapsDB: a database of the accessible conformational space of glycosidic linkages. Nucleic Acids Res 35:287–290
50. van Kuik JA, Hard K, Vliegenthart JFG (1992) A 1H NMR database computer program for the analysis of the primary structure of complex carbohydrates. Carbohydr Res 235:53–68
51. Maes E, Bonachera F, Strecker G, Guerardel Y (2009) SOACS index: an easy NMR-based query for glycan retrieval. Carbohydr Res 344:322–330
52. Campbell MP, Royle L, Radcliffe CM, Dwek RA, Rudd PM (2008) GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics 24:1214–1216
53. Ranzinger R, Herget S, Wetter T, von der Lieth C-W (2008) GlycomeDB – integration of open-access carbohydrate structure databases. BMC Bioinformatics 9:384
54. Ranzinger R, Frank M, von der Lieth CW, Herget S (2009) Glycome-DB.org: a portal for querying across the digital world of carbohydrate sequences. Glycobiology 19:1563–1567
55. Kikuchi N, Kameyama A, Nakaya S, Ito H, Sato T, Shikanai T, Takahashi Y, Narimatsu H (2005) The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures. Bioinformatics 21:1717–1718
56. Bohne-Lang A, Lang E, Förster T, von der Lieth CW (2001) LINUCS: linear notation for unique description of carbohydrate sequences. Carbohydr Res 336:1–11
57. Aoki KF, Yamaguchi A, Ueda N, Akutsu T, Mamitsuka H, Goto S, Kanehisa M (2004) KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Res 32:W267–W272
58. Herget S, Ranzinger R, Maass K, von der Lieth C-W (2008) GlycoCT – a unifying sequence format for carbohydrates. Carbohydr Res 343:2162–2171
59. Toukach P, Joshi HJ, Ranzinger R, Knirel Y, von der Lieth CW (2007) Sharing of worldwide distributed carbohydrate-related digital resources: online connection of the Bacterial Carbohydrate Structure DataBase and GLYCOSCIENCES.de. Nucleic Acids Res 35:D280–D286
60. Wheeler DL, Chappey C, Lash AE, Leipe DD, Madden TL, Schuler GD, Tatusova TA, Rapp BA (2000) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 28:10–14
61. Werz DB, Ranzinger R, Herget S, Adibekian A, von der Lieth C-W, Seeberger PH (2007) Exploring the structural diversity of mammalian carbohydrates (“glycospace”) by statistical databank analysis. ACS Chem Biol 2:685–691
62. Ceroni A, Dell A, Haslam SM (2007) The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source Code Biol Med 2:3
63. Kawano S, Hashimoto K, Miyama T, Goto S, Kanehisa M (2005) Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics 21:3976–3982
64. Ceroni A, Maass K, Geyer H, Geyer R, Dell A, Haslam SM (2008) GlycoWorkbench: a tool for the computer-assisted annotation of mass spectra of glycans. J Proteome Res 7:1650–1659
65. Wilkinson MD, Links M (2002) BioMOBY: an open source biological web services proposal. Brief Bioinform 3:331–341
66. Wilkinson M (2006) Gbrowse Moby: a Web-based browser for BioMoby Services. Source Code Biol Med 1:4
67. Raman R, Raguram S, Venkataraman G, Paulson JC, Sasisekharan R (2005) Glycomics: an integrated systems approach to structure-function relationships of glycans. Nat Methods 2:817–824
68. Cooper CA, Gasteiger E, Packer NH (2001) GlycoMod – a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics 1:340–349
69. Maass K, Ranzinger R, Geyer H, von der Lieth C-W, Geyer R (2007) “Glyco-peakfinder” – de novo composition analysis of glycoconjugates. Proteomics 7:4435–4444
70. Goldberg D, Sutton-Smith M, Paulson J, Dell A (2005) Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra. Proteomics 5:865–875
71. Goldberg D, Bern M, Li B, Lebrilla CB (2006) Automatic determination of O-glycan structure from fragmentation spectra. J Proteome Res 5:1429–1434
72. Lohmann KK, von der Lieth C-W (2003) GLYCO-FRAGMENT: A web tool to support the interpretation of mass spectra of complex carbohydrates. Proteomics 3:2028–2035
73. Lohmann KK, von der Lieth C-W (2004) GlycoFragment and GlycoSearchMS: web tools to support the interpretation of mass spectra of complex carbohydrates. Nucleic Acids Res 32:W261–W266
74. Gaucher S, Morrow J, Leary J (2000) STAT: A Saccharide Topology Analysis Tool Used in Combination with Tandem Mass Spectrometry. Anal Chem 72:2331–2336
75. Lapadula A, Hatcher P, Hanneman A, Ashline D, Zhang H, Reinhold V (2005) Congruent strategies for carbohydrate sequencing. 3. OSCAR: an algorithm for assigning oligosaccharide topology from MSn data. Anal Chem 77:6271–6279
76. Ethier M, Saba J, Spearman M, Krokhin O, Butler M, Ens W, Standing K, Perreault H (2003) Application of the StrOligo algorithm for the automated structure assignment of complex N-linked glycans from glycoproteins using tandem mass spectrometry. Rapid Commun Mass Spectrom 17:2713–2720
77. Jansson PE, Stenutz R, Widmalm G (2006) Sequence determination of oligosaccharides and regular polysaccharides using NMR spectroscopy and a novel Web-based version of the computer program CASPER. Carbohydr Res 341:1003–1010
78. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8
79. Julenius K, Molgaard A, Gupta R, Brunak S (2005) Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15:153–164
80. Julenius K (2007) NetCGlyc 1.0: prediction of mammalian C-mannosylation sites. Glycobiology 17:868–876
81. Gupta R, Brunak S (2002) Prediction of glycosylation across the human proteome and the correlation to protein function. Pac Symp Biocomput 310–322
82. Lütteke T, Frank M, von der Lieth C-W (2005) Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res 33:D242–D246
83. Frank M (2009) Conformational analysis of carbohydrates – a historical overview. In von der Lieth CW, Lütteke T, Frank M (eds) Bioinformatics for glycobiology and glycomics. An introduction. John Wiley & Sons, Chichester, UK, pp 337–357
84. Bohne A, Lang E, von der Lieth CW (1999) SWEET – WWW-based rapid 3D construction of oligo- and polysaccharides. Bioinformatics 15:767–768
85. Case DA et al (2005) The Amber biomolecular simulation programs. J Comput Chem 26:1668–1688
86. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimisation, and dynamics calculations. J Comput Chem 4:187–217
87. Lütteke T, Frank M, von der Lieth CW (2004) Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr Res 339:1015–1020
88. Lütteke T, von der Lieth C-W (2004) pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics 5:69
89. Petrescu AJ, Petrescu SM, Dwek RA, Wormald MR (1999) A statistical analysis of N- and O-glycan linkage conformations from crystallographic data. Glycobiology 9:343–352
90. Lütteke T (2009) Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr D Biol Crystallogr 65:156–168
91. Bohne-Lang A, von der Lieth CW (2005) GlyProt: in silico glycosylation of proteins. Nucleic Acids Res 33:W214–W219
92. Loss A, Stenutz R, Schwarzer E, von der Lieth C-W (2006) GlyNest and CASPER: two independent approaches to estimate 1H and 13C NMR shifts of glycans available through a common web-interface. Nucleic Acids Res 34:W733–W737
93. Vranken WF et al (2005) The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins 59:687–696
94. Takahashi N, Kato K (2003) GALAXY (Glycoanalysis by the Three Axes of MS and Chromatography): a web application that assists structural analyses of N-glycans. Trends Glycosci Glycotechnol 15:235–251
95. Sahoo SS, Thomas C, Sheth A, Henson C, York WS (2005) GLYDE – an expressive XML standard for the representation of glycan structure. Carbohydr Res 340:2802–2807
96. Packer NH et al (2008) Frontiers in glycomics: bioinformatics and biomarkers in disease. An NIH white paper prepared from discussions by the focus groups at a workshop on the NIH campus, Bethesda MD (September 11–13, 2006). Proteomics 8:8–20
Chapter 4
Lectin Microarrays: Simple Tools for the Analysis of Complex Glycans
Lakshmi Krishnamoorthy and Lara K. Mahal
Abstract The emerging roles for post-translational modifications in the regulation of cellular function have turned the spotlight on glycosylation. Given the prevalence of protein and lipid glycosylation, it has become imperative to create and use new tools to study these critical biopolymers. In particular, there has been an emphasis on the development of high-throughput methodologies to study the structural and functional aspects of glycan-protein interactions. The use of carbohydrate-binding proteins (i.e. lectins) in a microarray format has greatly enhanced our ability to deconvolute the structural aspects of the glycome. This simple technology provides a rapid method for glycomic analysis, opening up the field of glycobiology to more systems-based approaches to function.
Abbreviations
CHO: Chinese hamster ovary
DC-SIGN: dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin
Fuc: fucose
Gal: galactose
GlcNAc: N-acetylglucosamine
HIV: human immunodeficiency virus
LacNAc: N-acetyllactosamine
NHS: N-hydroxysuccinimidyl
Contents
4.1 Introduction 92
4.2 Lectins in Carbohydrate Analysis 92
4.3 Development of Lectin Microarray Technology 93
L.K. Mahal (✉) Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA
[email protected]
R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_4, © Springer Science+Business Media B.V. 2011
91
92
L. Krishnamoorthy and L.K. Mahal
4.4 Analysis of Complex Mixtures with Lectin Microarrays 94
4.4.1 Analysis of Whole Cells 94
4.4.2 Analysis of Isolated Glycoprotein Fractions 95
4.5 Comparative Glycomics of Human Immunodeficiency Virus (HIV) and Microvesicles: An Illustration of the Power of Lectin Microarrays for Glycomic Analysis 98
4.6 Advantages and Limitations 99
4.7 Conclusions 99
References 100
4.1 Introduction
Glycosylation of proteins and lipids is known to influence their localization, cellular interactions and function [1, 2]. Glycans can affect a wide variety of biological processes, ranging from cell-cell communication, development and differentiation to infection by viral and bacterial pathogens [3–6]. In spite of the ubiquitous nature of carbohydrates, the study of this unique class of post-translational modifications has been hindered by their structural complexity. Multiple epimeric monomers and a multitude of possible linkages and branching patterns are among the key analytical problems in unraveling the structure of glycans [3, 7]. Although the theoretical number of distinct glycan epitopes in a biological system is astronomical, a recent review has estimated that the number of glycan determinants is ∼7000, with ∼4000 of these belonging to glycosaminoglycans [8]. Glycan determinants are the unique glycan epitopes sufficient for recognition by a glycan-binding protein. Along with glycan-specific antibodies, glycan-binding proteins (i.e. lectins) have proven to be valuable tools in carbohydrate structural analysis [9–11]. Recently, the power of lectins for glycan analysis has been harnessed for the rapid evaluation of glycosylation through the creation of lectin microarray technology. This review provides an overview of the development and applications of lectin microarrays.
4.2 Lectins in Carbohydrate Analysis
Lectins are carbohydrate-binding proteins found in many organisms, ranging from bacteria to mammals. They can bind terminal or internal residues in an oligosaccharide and provide a rich source of probes for carbohydrate analysis. Lectins recognize a diverse range of sugars. These range from simple epitopes such as terminal N-acetylglucosamine (GlcNAc) to more complex structures such as the blood group H antigen (Fucα1-2Galβ1-4GlcNAc, where Fuc is fucose and Gal is galactose). Although lectins were initially used exclusively in agglutination assays, they are now used extensively in histology, fluorescence microscopy and lectin blots [11–13]. Lectins immobilized on a solid support are widely used for the isolation of glycoprotein sub-populations
4
Lectin Microarrays
93
from complex mixtures, such as serum and tissue extracts, which are then subjected to mass spectrometry or other forms of analysis [14, 15]. Compared with antibody-antigen interactions, monovalent lectin-glycan interactions are considered low affinity (μM versus nM for antibodies) and are consequently harder to detect. However, in a physiologically relevant context, multimeric lectins often interact with multiple glycans, giving rise to a high apparent binding affinity [3]. The high spatial density of lectins immobilized in an array format favors such multivalent interactions and therefore provides an ideal platform for the rapid detection of glycan epitopes. Given that there are ∼100 commercially available lectins, probing a sample simultaneously on a lectin array provides a quick way to survey the glycome.
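The detection problem can be illustrated with the simple single-site equilibrium θ = [G]/([G] + Kd). Here θ is the fraction of immobilized binding sites occupied at free glycan concentration [G]. The sketch below compares an assumed nM-range, antibody-like Kd with an assumed μM-range, lectin-like Kd at the same concentration. It deliberately ignores ligand depletion and the multivalent (avidity) effects that the array format exploits, so it shows only the monovalent baseline.

```python
def fraction_bound(ligand_conc, kd):
    """Fractional occupancy of a single binding site at equilibrium.

    ligand_conc and kd are in the same units (here mol/L).
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (ligand_conc + kd)


glycan = 10e-9  # 10 nM free glycan epitope (illustrative value)
for label, kd in [("antibody-like, Kd = 1 nM", 1e-9),
                  ("lectin-like, Kd = 1 uM", 1e-6)]:
    print(f"{label}: {fraction_bound(glycan, kd):.1%} of sites occupied")
```

At the same 10 nM concentration, about 91% of the antibody-like sites are occupied but only about 1% of the lectin-like sites, which is why monovalent lectin binding produces weak signals unless multivalency raises the apparent affinity.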
4.3 Development of Lectin Microarray Technology
First fabricated in 2005, lectin microarrays have greatly enhanced our ability to gather glycomic information rapidly. In a typical lectin microarray, lectins are immobilized on a solid support and interrogated with fluorescently labeled samples. An early example of this technology came from Angeloni and colleagues. They used a biotin-avidin immobilization strategy to couple a limited panel of biotinylated lectins to a glass slide, and successfully detected the binding of a selection of fluorescently labeled glycoconjugates to this array [16]. Initial work from our laboratory used both aldehyde and epoxide glass slides to covalently immobilize underivatized lectins in a microarray format. A small panel of nine lectins with well-characterized specificities was used to examine the glycans on several fluorescently labeled glycoproteins, including Cy3-labeled ovalbumin and porcine mucin. A conventional fluorescence-based microarray slide scanner was used to read these arrays. The lower limit of detection for the Cy-labeled glycoproteins in these studies was 10 μg/mL. As expected, ovalbumin and porcine mucin exhibited differential binding to the lectin panel. The specificity of the lectin array was demonstrated by inhibition experiments with a small series of carbohydrates. These studies confirmed the validity and the detection capabilities of this emerging technology [17, 18]. Since its initial development, lectin microarray technology in our laboratory has evolved to encompass ∼80 lectins, including commercial lectins, recombinant algal lectins and bacterial adhesins. In related work, Hirabayashi and colleagues employed a 39-lectin panel to examine the glycans on selected glycoproteins. Although their microarray technology was essentially similar to previous work (epoxy-coated glass slides, Cy-labeled glycoproteins), the detection system they used was unique.
Binding of the fluorescent glycoproteins was detected with an evanescent-wave-based scanner. Because the evanescent field excites only fluorophores very close to the glass surface, these scanners can detect lectin-glycan binding without rinsing the slide. Using this system, they could detect 50 ng/mL of Cy-labeled glycoprotein, 200-fold below the 10 μg/mL reported for the initial arrays read on a conventional scanner. Enzymatic deglycosylation of the samples was used as a negative control to validate the specificity of the interactions observed in this array format [19]. Further optimization of this lectin array platform has now lowered the detection limit to
100 pg/mL [20]. Other groups have modified the arrays to optimize glycoprotein analysis. One example is piezoelectric deposition of the sample directly onto the lectin array. This method is potentially useful when sample amounts are very limited, as it eliminates the need to incubate the entire array with the sample of interest [21]. Commercial lectin array kits based on the above technology include the Qproteome GlycoArray™ kit [22] and the LecChip™ lectin microarrays from GP Biosciences.
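The wash-free readout of evanescent-field scanners follows from standard total internal reflection optics. The excitation intensity decays exponentially with distance z from the slide surface, I(z) = I0·exp(−z/d). The penetration depth is d = λ0 / (4π·sqrt(n1²·sin²θ − n2²)), where n1 is the refractive index of the glass, n2 that of the sample and θ the angle of incidence. Fluorophores in the bulk solution are therefore barely excited. The sketch below evaluates this relation. The wavelength, refractive indices and incidence angle are assumed illustrative values, not the parameters of any particular scanner.

```python
import math


def penetration_depth(wavelength_nm, n_glass, n_sample, theta_deg):
    """1/e intensity penetration depth (nm) of an evanescent field
    under total internal reflection at a glass/sample interface."""
    s = (n_glass * math.sin(math.radians(theta_deg))) ** 2 - n_sample ** 2
    if s <= 0:
        raise ValueError("angle below the critical angle: no total internal reflection")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


def relative_intensity(z_nm, depth_nm):
    """Excitation intensity at distance z from the surface, relative to z = 0."""
    return math.exp(-z_nm / depth_nm)


# Assumed values: 532 nm excitation, glass n = 1.52, aqueous sample n = 1.33,
# 70 degree angle of incidence.
d = penetration_depth(532, 1.52, 1.33, 70)
print(f"penetration depth: {d:.0f} nm")
print(f"relative excitation 1 um into the solution: {relative_intensity(1000, d):.1e}")
```

With these assumed values, the depth comes out at roughly 80 nm, so labeled glycoproteins a micrometer away in the unwashed solution receive only a few millionths of the surface excitation.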
4.4 Analysis of Complex Mixtures with Lectin Microarrays
Lectin microarrays bring to glycobiology a high-throughput method for gathering glycomic data, enabling systems-level analysis similar to that which has revolutionized genomics. Rapid comparative analysis of the glycome allows us to examine more closely the plethora of roles that glycans play in mammalian systems. It does so by pinpointing glycan changes that are important and consistent across a statistically meaningful number of samples.
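A comparison of this kind can be sketched per lectin. For each lectin, the sketch computes the log2 ratio of the mean normalized signals of two sample groups and a Welch t statistic across biological replicates. This is one common, simple approach offered as an assumption, not the statistical method used in the studies cited here. Because many lectins are tested at once, p values derived from these statistics would also need multiple-testing correction. The data layout and lectin names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (>= 2 values each)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def compare_groups(group_a, group_b):
    """Compare two groups lectin by lectin.

    group_a, group_b: {lectin_name: [normalized signal per replicate]}
    Returns {lectin_name: (log2 ratio of means, Welch t)} for lectins
    present in both groups. Signals must be positive for the log ratio.
    """
    results = {}
    for lectin in group_a.keys() & group_b.keys():
        a, b = group_a[lectin], group_b[lectin]
        results[lectin] = (math.log2(mean(a) / mean(b)), welch_t(a, b))
    return results
```

Reporting the fold change alongside the test statistic separates lectins whose binding changes strongly from those whose change is merely consistent across replicates.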
4.4.1 Analysis of Whole Cells
Initial efforts to apply lectin microarray technology to more complex sample mixtures, such as cells and cell lysates, focused on whole cells. In early work, Smith and co-workers used a panel of six lectins to examine the binding of whole mammalian cells to a macroarray. Their array was fabricated on a gold substrate overlaid with a self-assembled monolayer of N-hydroxysuccinimidyl (NHS) ester derivatives, to which the lectins were then coupled. Fluorescently labeled whole mammalian cells were incubated with the array, and binding was evaluated with a fluorescence microscope. In spite of the limited number of epitopes that can be detected with such a small panel, it was sufficient to distinguish Caco-2 (human) from BHK-21 (hamster) cells based on their glycan composition [23]. Hirabayashi and co-workers have used a similar technique, combining evanescent-wave detection with a lectin microarray, to examine Chinese hamster ovary (CHO) cell mutants [24]. Additionally, Schneck and colleagues used an expanded 94-lectin panel to examine glycosylation changes in a series of 24 mammalian cell lines using a whole cell approach. With this system they were able to observe glycomic changes in differentiating T lymphocytes [25]. Our laboratory has used whole cell assays to study the glycosylation of E. coli and other pathogenic bacteria, and detected dynamic changes in carbohydrate composition reflecting different growth phases of a pathogenic E. coli strain [26, 27]. This highlights the potential of lectin microarrays for whole cell glycomics. The examination of glycans in the context of the whole cell has several advantages and some serious pitfalls. The whole cell approach ensures that the glycans interacting with the lectin array are on the cell surface. In addition, the presentation
4
Lectin Microarrays
of glycans in the whole cell context is the most relevant for key biological processes in which these glycans must be recognized. These are clear advantages. On the other hand, this format requires cells to be well behaved, i.e. to maintain their cellular integrity and to show limited non-specific adhesion. It is therefore more amenable to non-adherent cells and bacteria, which do not naturally adhere to a substrate and typically have a spherical morphology, and it can be difficult to use with adherent cells. First, adherent cells must be detached intact from their substrate. This is typically accomplished with trypsin, a protease commonly used in cell culture, and indeed all of the studies described above used this method to obtain suspensions of adherent cell lines. However, trypsin has been shown to bind specific carbohydrates and may thus bias the glycan pool observed [28]. Second, some adherent cells, such as neurons, cannot be examined with this method because they are too delicate to withstand the detachment and washing steps these assays require. The need to be gentle during washing is also a technical challenge in its own right, as mitigating non-specific adhesion often requires harsher conditions than intact cells can tolerate [29]. Further disadvantages of this method relate to dynamic range. Obtaining an appropriate dynamic range for mammalian cells requires a relatively large spot size, since mammalian cells average between 10 and 40 μm in diameter, depending on the cell type [30]. This is less of an issue with bacteria, which are on average about 1 μm in size. The work by Schneck and colleagues used an array with a 125 μm spot size, which can accommodate between 10 and 144 cells per spot.
This presented problems in their work, especially for the larger adherent cell lines, and forced them to bin cells as binding or non-binding rather than examining relative levels of binding, as a greater dynamic range would have allowed [25]. In contrast, Hirabayashi and colleagues used a 500 μm spot size, accommodating between ∼160 and 2500 cells. This gave them far better statistical comparisons between binders and allowed them to examine binding levels more closely [24]. However, the larger spot size decreases the number of arrays that can be printed per slide, making the experimental setup more expensive. In addition, the dynamic range changes radically between the binding of a typical blood cell line (10 μm diameter) and a typical fibroblast (30 μm), which limits this approach as a method for comparing cell types, because the level of fluorescence does not directly correlate with the absolute number of cells bound. In general, based on these concerns, whole cell assays are better suited to the gross examination of changes in the glycosylation of non-adherent cell types.
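The spot-size figures above can be checked with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that a spot holds roughly (spot diameter / cell diameter)² cells, i.e. it treats spot and cells as squares and ignores packing gaps; the function name is hypothetical. This crude approximation reproduces the ∼10 lower bound for 125 μm spots and the ∼160 and 2500 figures for 500 μm spots, but gives ∼156 rather than the quoted 144 for 10 μm cells in a 125 μm spot, so all of these numbers should be read as order-of-magnitude estimates.

```python
def cells_per_spot(spot_diameter_um: float, cell_diameter_um: float) -> int:
    """Rough estimate of how many cells fit in one printed spot.

    Assumption (illustrative only): capacity ~ (spot diameter / cell diameter)**2,
    i.e. spot and cells are treated as squares and packing gaps are ignored.
    """
    return round((spot_diameter_um / cell_diameter_um) ** 2)


# Mammalian cells are ~10-40 um across; compare the two spot sizes discussed above.
for spot_um in (125, 500):
    fewest = cells_per_spot(spot_um, 40)  # large cells fill the spot quickly
    most = cells_per_spot(spot_um, 10)    # small cells pack in more
    print(f"{spot_um} um spot: ~{fewest} to ~{most} cells")

# Area scales with diameter squared, so one 30 um fibroblast occupies about as
# much of a spot as nine 10 um blood cells.
print((30 / 10) ** 2)
```

The squared dependence on cell diameter is also why fluorescence from spots bound by cells of very different sizes is hard to compare directly.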
4.4.2 Analysis of Isolated Glycoprotein Fractions
An alternative to intact cell analysis is to isolate either cellular glycoproteins or membranes and study their glycosylation [31, 32]. In recent work, Hirabayashi and colleagues used a clever strategy to extract glycoproteins from
L. Krishnamoorthy and L.K. Mahal
formalin-fixed commercial tissue arrays. The extracted glycoproteins were representative of ∼500 cells, allowing them to study the glycomic changes in colon cancer, compared with normal colon tissue, from minimal material [33]. However, lysate-based analysis of the glycome loses some glycomic information, specifically the glycolipid component. In our work, this component is retained because we isolate and analyze cellular membranes rather than a detergent lysate. By mechanically dissociating cells and disrupting cell membranes by sonication, we create micellae that are representative of the cell surface [31]. In recent unpublished work we have shown that these micellae are most likely vesicular in nature and maintain their bilayer structure. Thus, glycans on both glycoproteins and glycolipids are preserved in a more native context during this procedure. Our initial studies comparing complex samples identified a major flaw in the standard single-color assays that pervade the lectin microarray field: with a single color and a single sample concentration, more subtle differences in lectin binding could not be observed. This is because lectin binding can saturate, and it is difficult, especially for unknown samples, to ensure that all lectin binding falls within the linear range. In addition, none of the single-color experiments reported to date include an internal calibration, and without one the fluorescence signals from disparate samples cannot be truly compared as measures of glycosylation level [34]. In DNA microarrays, this issue is mitigated through the use of housekeeping genes to standardize signal content; this is not possible for glycomic analysis. Therefore, to enable direct comparisons between complex samples, our laboratory has adopted a methodology for comparative analysis from early DNA microarray work, namely competitive binding.
This approach is based on competition between a common biological reference and the sample of interest for lectin binding. The cyanine dyes Cy3 and Cy5 are typically used as spectrally resolved, orthogonal fluorophores. In a typical experiment, two arrays are run with the sample and reference bearing each dye in turn (i.e. a dye-swapped pair). For analysis, the log2 ratio of the average fluorescence intensities from the two arrays is calculated for each lectin. This ratiometric approach allows us to compare samples hybridized on different days and across multiple prints, minimizing issues associated with dye bias and changes in lectin activity. The method was first calibrated using the Lec mutants as a model system and was then extended to the changes in glycosylation that accompany the differentiation of HL-60 cells [31]. In this work we found that the single-color format is useful for establishing general glycopatterns (Fig. 4.1), while the dual-color format is optimal for comparative analysis (Fig. 4.2).
A different mode of analysis, which uses lectins to examine the glycosylation of a specific glycoprotein within a complex mixture, comes from work by Haab and colleagues. In this work, antibody arrays were used to immobilize specific proteins from serum onto a solid support, and the array was then probed with individual lectins to examine glycan changes on the captured proteins. Because the capture antibodies are themselves heavily glycosylated, their glycans had to be chemically derivatized to prevent the lectins from binding to the antibodies. This approach has been used to examine changes in the glycosylation of cancer biomarker proteins such as MUC1 and CEA (carcinoembryonic antigen) [35]. As a corollary to this approach, Hirabayashi and colleagues used their lectin microarray to examine the glycosylation of individual proteins from a complex mixture, using biotinylated antibodies to pull down the proteins of interest. The immuno-isolated complexes were then examined directly on the lectin microarray, and bound glycoproteins were detected with a fluorescently labeled streptavidin conjugate [36]. These examples highlight the ability of lectin microarrays to track dynamic glycan changes on targeted glycoproteins.
[Figure 4.1 artwork: (a) a Cy-labeled sample is bound to the lectin microarray to give a glycopattern; (b) an N-glycan annotated with the binding sites of SNA and PSL (α2,6), GNA (α1,3) and NPA and HHL (α1,6).]
Fig. 4.1 (a) Schematic of a typical single-color lectin microarray experiment. Fluorescently labeled samples are hybridized to the array, and the arrays are rinsed and scanned to obtain a lectin binding profile (glycopattern). (b) Interpretation of the glycopattern. The structure of a typical N-glycan motif found in a mammalian system is shown. The monosaccharides are depicted as follows: mannose (green), galactose (yellow), GlcNAc (blue) and sialic acid (pink). The glycopattern is interpreted by examining the binding specificities of the lectins responsible for the array signature. In this example, we show how binding by lectins specific for α2,6-sialic acid (SNA and PSL), α1,6-mannose (NPA and HHL) and α1,3-mannose (GNA) can be used to elucidate glycan structures.
[Figure 4.2 artwork: Cell A and Cell B are lysed, their membranes isolated and conjugated to Cy3 or Cy5, respectively; the labeled samples are then mixed, bound to the array, rinsed and scanned.]
Fig. 4.2 Ratiometric lectin microarray approach. Cellular membranes are isolated from cells and labeled with fluorescent dyes. Orthogonally labeled biological reference and sample are mixed and hybridized on the lectin microarray. Competitive binding between the two analytes is analyzed to obtain ratiometric data, thus enabling comparative glycomics. Figure is adapted with permission from [31]. Copyright 2007 National Academy of Sciences, U.S.A.
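The dye-swapped ratiometric calculation described above for membrane samples can be sketched in a few lines of Python. The data layout (which dye carries the sample on which array), the class and function names and the toy intensities are assumptions for illustration, not the published analysis pipeline of [31]; intensities are taken to be background-corrected. Following the wording in the text, each analyte's signal is averaged over the two arrays before the log2 ratio is taken.

```python
import math
from dataclasses import dataclass


@dataclass
class SpotSignal:
    """Background-corrected Cy3 and Cy5 intensities for one lectin on one array
    (hypothetical structure)."""
    cy3: float
    cy5: float


def dye_swap_log2_ratio(array1: SpotSignal, array2: SpotSignal) -> float:
    """log2(sample / reference) for one lectin from a dye-swapped pair.

    Assumed layout: on array 1 the sample is Cy5-labeled and the reference
    Cy3-labeled; on array 2 the dyes are swapped. Each analyte's intensity is
    averaged over both arrays, so a constant Cy5/Cy3 brightness bias scales
    numerator and denominator equally and cancels.
    """
    sample = (array1.cy5 + array2.cy3) / 2
    reference = (array1.cy3 + array2.cy5) / 2
    return math.log2(sample / reference)


# Toy numbers: the sample binds this lectin twice as strongly as the reference,
# and Cy5 happens to read 1.5x brighter than Cy3.
arr1 = SpotSignal(cy3=1000.0, cy5=3000.0)  # reference in Cy3, sample in Cy5
arr2 = SpotSignal(cy3=2000.0, cy5=1500.0)  # sample in Cy3, reference in Cy5
print(dye_swap_log2_ratio(arr1, arr2))  # → 1.0
```

A positive value means the lectin's epitope is more abundant in the sample than in the common reference. Averaging the two per-array log2 ratios is a common alternative in dye-swap designs; for a purely multiplicative dye bias it also cancels the bias.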
4.5 Comparative Glycomics of Human Immunodeficiency Virus (HIV) and Microvesicles: An Illustration of the Power of Lectin Microarrays for Glycomic Analysis
The role of glycans in HIV biology has generated a great deal of interest in terms of both prophylaxis and therapeutics. The glycosylation of HIV affects the tropism of entry, for example the enhanced infection of dendritic cells through interactions with the lectin dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) [37]. Analysis of HIV glycosylation had been limited to the heavily glycosylated viral envelope glycoprotein, gp120, ignoring the other glycoproteins at the viral surface that originate from the host membrane. In recent work, we used our lectin microarray approach to examine the glycans on the whole viral particle, thus evaluating both the gp120 glycans and the host-derived components of this enveloped retrovirus [38]. Glycosylation is a hallmark of protein trafficking. Given this, one area of HIV-1 biology on which the glycosylation of the particle may shed light is the mode of exit of HIV-1 from T cells. A controversial hypothesis in this area is that the virus could use a specific cellular mechanism, the budding of small membrane vesicles (called exosomes or microvesicles), to exit from the cell surface [39, 40]. To examine the validity of this hypothesis further, we used lectin microarrays for a systems-based glycomic analysis of HIV-1, microvesicles and parent T-cell membranes. Using both single- and dual-color approaches, we demonstrated that the glycome of HIV-1 closely resembles that of the microvesicles, providing a unique perspective on this debate.
Our analysis of microvesicles and HIV-1 derived from a panel of T-cell lines also revealed subtle cell line-dependent glycan changes in both types of particle, suggesting that glycosylation changes acquired during propagation through different cells could be a mechanism of immune escape. Surprisingly, the conserved glycome of the particles differed from that of the host plasma membrane, implying that these particles emerge from glycan-enriched microdomains. Fluorescence microscopy confirmed the presence of such microdomains, as predicted from the lectin microarray data. Beyond the discovery of these glycan-delineated microdomains as the biogenesis site of microvesicles and HIV-1, closer examination of our data also revealed unexpected insights into both the potential of glycans as targets for HIV-1 therapeutics and lectin specificity. High mannose ligands were enriched
in both microvesicles and HIV-1 in comparison with the plasma membrane. Given that high mannose epitopes are a key target in antiviral therapy, their enrichment on host-derived microvesicles as well as on the virus calls into question the potential utility of this approach [41]. Our lectin microarray analysis also revealed that galectin-1, an immune lectin that was presumed to bind HIV-1 through LacNAc (N-acetyllactosamine) motifs, could potentially recognize and bind high mannose epitopes of HIV-1. Galectin-1 clustered closely with known antiviral lectins rather than with other LacNAc-binding lectins, and its binding was inhibited by mannose to a greater degree than by lactose [38]. This implies that there may be a secondary binding motif in the interaction between HIV-1 and galectin-1. In general, this work highlights the power of the lectin microarray approach to shed light on the biology of glycans in a complex system, a level of systems analysis that had previously been precluded by the difficulties inherent in the available analytical methods.
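Statements such as "galectin-1 clustered closely with known antiviral lectins" refer to clustering lectins by the similarity of their binding profiles across samples. The following is a minimal sketch of one common way to do this, average-linkage hierarchical clustering with 1 − Pearson correlation as the distance, written in plain Python. The choice of metric and linkage, the lectin names and the log2-ratio profiles are all hypothetical; the numbers are toy values, not data from [38].

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering of lectins by binding profile.

    Distance between lectins is 1 - Pearson r; distance between clusters is the
    mean pairwise distance (average linkage). Returns the merges in order as
    (cluster_a, cluster_b, distance) tuples.
    """
    dist = {frozenset(pair): 1 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}

    def cluster_distance(c1, c2):
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Toy log2(sample/reference) ratios for five hypothetical lectins over four samples.
profiles = {
    "query_lectin": [1.2, 0.9, -0.5, 0.1],
    "antiviral_1": [1.0, 1.1, -0.4, 0.0],
    "antiviral_2": [1.3, 0.8, -0.6, 0.2],
    "lacnac_1": [-0.8, 0.2, 1.0, 0.5],
    "lacnac_2": [-0.6, 0.1, 0.9, 0.7],
}
for a, b, d in average_linkage(profiles):
    print(f"merge {a} + {b} at distance {d:.3f}")
```

With these toy profiles the query lectin joins the two "antiviral" lectins long before it links to the LacNAc binders. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method='average', metric='correlation')` would normally replace this hand-rolled loop.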
4.6 Advantages and Limitations
The advantages afforded by the lectin microarray system include the ability to rapidly analyze glycan epitopes in multiple samples simultaneously, the simplicity of both fabrication and methodology, the ability to gather specific epitope and linkage information from overlapping lectin specificities, and the small amount of sample required for a fairly comprehensive analysis. One important drawback of the arrays is that they are constrained by the epitopes recognized by the arrayed lectins. Given that there are over 3000 glycan epitopes in the mammalian glycome, the lectins currently used in the array format are probably not sufficient to examine the full spectrum of glycans present [8]. Efforts to address this issue include the discovery and evolution of new lectins as well as detailed characterization of the glycan binding partners of known lectins [14, 42, 43]. The other major concern with the use of lectin microarrays is that most plant lectins are glycosylated. This could be problematic if the sample of interest contains endogenous lectins, which may then bind the glycans on the arrayed lectins. We have recently created a lectin microarray platform based on non-glycosylated, recombinant bacterial adhesins. Glycans from cancer cells derived from different tissues were easily distinguishable even with a small panel of recombinant lectins, highlighting the utility of this approach [44]. Non-lectin carbohydrate binders such as aptamers [45, 46] and peptide borono lectins (PBLs) [47] could also be used alongside lectins in an array format to further broaden the range of glycans observed.
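The idea of gathering epitope information from overlapping lectin specificities can be illustrated with the lectins from the Fig. 4.1 example (SNA and PSL for α2,6-sialic acid, NPA and HHL for α1,6-mannose, GNA for α1,3-mannose). In the sketch below, an epitope is called present when at least half of the lectins that recognize it bind above a signal threshold. The consensus rule, both cut-offs, the function name and the signal values are hypothetical choices made for illustration; real interpretation draws on far larger panels and on the finer specificity of each lectin.

```python
# Lectin specificities as given in the Fig. 4.1 example; real panels are much larger.
LECTIN_SPECIFICITY = {
    "SNA": "alpha2,6-sialic acid",
    "PSL": "alpha2,6-sialic acid",
    "NPA": "alpha1,6-mannose",
    "HHL": "alpha1,6-mannose",
    "GNA": "alpha1,3-mannose",
}


def infer_epitopes(signals, threshold=0.5, min_fraction=0.5):
    """Call an epitope present when at least `min_fraction` of the lectins that
    recognize it give a normalized signal >= `threshold` (hypothetical cut-offs).

    Returns {epitope: [supporting lectins]} for the epitopes that are called.
    """
    by_epitope = {}
    for lectin, epitope in LECTIN_SPECIFICITY.items():
        by_epitope.setdefault(epitope, []).append(lectin)

    calls = {}
    for epitope, lectins in by_epitope.items():
        binders = [name for name in lectins if signals.get(name, 0.0) >= threshold]
        if len(binders) / len(lectins) >= min_fraction:
            calls[epitope] = binders
    return calls


# Toy normalized signals for one sample (made-up values).
toy_signals = {"SNA": 0.9, "PSL": 0.7, "NPA": 0.1, "HHL": 0.6, "GNA": 0.2}
print(infer_epitopes(toy_signals))
# → {'alpha2,6-sialic acid': ['SNA', 'PSL'], 'alpha1,6-mannose': ['HHL']}
```

Requiring agreement among lectins that share a specificity makes a call less dependent on the quirks of any single lectin, which is the practical benefit of overlapping specificities.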
4.7 Conclusions
Lectin microarrays bring to the field of glycomics the ability to perform systems-level analysis in a simple and approachable manner. This technology has the potential to bring glycomics to the forefront of biology, enabling us to query the role glycosylation plays in a myriad of biological systems. By allowing unbiased analysis,
which is not dependent on preconceptions of which glycans are involved in a process, this technique allows us to discover which motifs are important in a biological context. This enables new discoveries both in basic research, as exemplified by our work on HIV-1 and microvesicles, and in clinical applications, where glycan-based biomarkers are being sought for a variety of medical conditions. Ongoing efforts to improve the resolving capacity of these arrays continue to strengthen their impact within the scientific community.
Acknowledgements
L.K.M. would like to acknowledge the Alfred P. Sloan Foundation for funding.
References
1. Huet G, Gouyer V, Delacour D, Richet C, Zanetta JP, Delannoy P, Degand P (2003) Involvement of glycosylation in the intracellular trafficking of glycoproteins in polarized epithelial cells. Biochimie 85:323–330
2. Helenius A, Aebi M (2001) Intracellular functions of N-linked glycans. Science 291:2364–2369
3. Varki A, Cummings RD, Esko JD, Freeze HH, Stanley P, Bertozzi CR, Hart GW, Etzler ME (2008) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, New York
4. Ohtsubo K, Marth JD (2006) Glycosylation in cellular mechanisms of health and disease. Cell 126:855–867
5. Haltiwanger RS, Lowe JB (2004) Role of glycosylation in development. Annu Rev Biochem 73:491–537
6. Sharon N (2006) Carbohydrates as future anti-adhesion drugs for infectious diseases. Biochim Biophys Acta 1760:527–537
7. Mahal LK (2008) Glycomics: towards bioinformatic approaches to understanding glycosylation. Anticancer Agents Med Chem 8:37–51
8. Cummings RD (2009) The repertoire of glycan determinants in the human glycome. Mol Biosyst 5:1087–1104
9. Rudiger H, Gabius HJ (2001) Plant lectins: occurrence, biochemistry, functions and applications. Glycoconj J 18:589–613
10. Gemeiner P, Mislovicova D, Tkac J, Svitel J, Patoprsty V, Hrabarova E, Kogan G, Kozar T (2009) Lectinomics II. A highway to biomedical/clinical diagnostics. Biotechnol Adv 27:1–15
11. Sharon N (2007) Lectins: carbohydrate-specific reagents and biological recognition molecules. J Biol Chem 282:2753–2764
12. Mislovičová D, Gemeiner P, Kozarova A, Kozar T (2009) Lectinomics I. Relevance of exogenous plant lectins in biomedical diagnostics. Biologia 64:1–19
13. Wu AM, Lisowska E, Duk M, Yang Z (2008) Lectins as tools in glycoconjugate research. Glycoconj J 26:899–913
14. Hirabayashi J (2008) Concept, strategy and realization of lectin-based glycan profiling. J Biochem 144:139–147
15. Geyer H, Geyer R (2006) Strategies for analysis of glycoprotein glycosylation. Biochim Biophys Acta 1764:1853–1869
16. Angeloni S, Ridet JL, Kusy N, Gao H, Crevoisier F, Guinchard S, Kochhar S, Sigrist H, Sprenger N (2005) Glycoprofiling with micro-arrays of glycoconjugates and lectins. Glycobiology 15:31–41
17. Pilobello KT, Krishnamoorthy L, Slawek D, Mahal LK (2005) Development of a lectin microarray for the rapid analysis of protein glycopatterns. Chembiochem 6:985–989
18. Pilobello KT, Mahal LK (2007) Lectin microarrays for glycoprotein analysis. Methods Mol Biol 385:193–203
19. Kuno A, Uchiyama N, Koseki-Kuno S, Ebe Y, Takashima S, Yamada M, Hirabayashi J (2005) Evanescent-field fluorescence-assisted lectin microarray: a new strategy for glycan profiling. Nat Methods 2:851–856
20. Uchiyama N, Kuno A, Tateno H, Kubo Y, Mizuno M, Noguchi M, Hirabayashi J (2008) Optimization of evanescent-field fluorescence-assisted lectin microarray for high-sensitivity detection of monovalent oligosaccharides and glycoproteins. Proteomics 8:3042–3050
21. Nagaraj VJ, Eaton S, Thirstrup D, Wiktor P (2008) Piezoelectric printing and probing of Lectin NanoProbeArrays for glycosylation analysis. Biochem Biophys Res Commun 375:526–530
22. Rosenfeld R, Bangio H, Gerwig GJ, Rosenberg R, Aloni R, Cohen Y, Amor Y, Plaschkes I, Kamerling JP, Maya RB (2007) A lectin array-based methodology for the analysis of protein glycosylation. J Biochem Biophys Methods 70:415–426
23. Zheng T, Peelen D, Smith LM (2005) Lectin arrays for profiling cell surface carbohydrate expression. J Am Chem Soc 127:9982–9983
24. Tateno H, Uchiyama N, Kuno A, Togayachi A, Sato T, Narimatsu H, Hirabayashi J (2007) A novel strategy for mammalian cell surface glycome profiling using lectin microarray. Glycobiology 17:1138–1146
25. Tao SC, Li Y, Zhou J, Qian J, Schnaar RL, Zhang Y, Goldstein IJ, Zhu H, Schneck JP (2008) Lectin microarrays identify cell-specific and functionally significant cell surface glycan markers. Glycobiology 18:761–769
26. Hsu KL, Pilobello KT, Mahal LK (2006) Analyzing the dynamic bacterial glycome with a lectin microarray approach. Nat Chem Biol 2:153–157
27. Hsu KL, Mahal LK (2006) A lectin microarray approach for the rapid analysis of bacterial glycans. Nat Protoc 1:543–549
28. Takekawa H, Ina C, Sato R, Toma K, Ogawa H (2006) Novel carbohydrate-binding activity of pancreatic trypsins to N-linked glycans of glycoproteins. J Biol Chem 281:8528–8538
29. Nimrichter L, Gargir A, Gortler M, Altstock RT, Shtevi A, Weisshaus O, Fire E, Dotan N, Schnaar RL (2004) Intact cell adhesion to glycan microarrays. Glycobiology 14:197–203
30. Adams RB, Voelker WH, Gregg EC (1967) Electrical counting and sizing of mammalian cells in suspension: an experimental evaluation. Phys Med Biol 12:79–92
31. Pilobello KT, Slawek DE, Mahal LK (2007) A ratiometric lectin microarray approach to analysis of the dynamic mammalian glycome. Proc Natl Acad Sci U S A 104:11534–11539
32. Ebe Y, Kuno A, Uchiyama N, Koseki-Kuno S, Yamada M, Sato T, Narimatsu H, Hirabayashi J (2006) Application of lectin microarray to crude samples: differential glycan profiling of lec mutants. J Biochem 139:323–327
33. Matsuda A, Kuno A, Ishida H, Kawamoto T, Shoda J, Hirabayashi J (2008) Development of an all-in-one technology for glycan profiling targeting formalin-embedded tissue sections. Biochem Biophys Res Commun 370:259–263
34. Hamelinck D, Zhou H, Li L, Verweij C, Dillon D, Feng Z, Costa J, Haab BB (2005) Optimized normalization for antibody microarrays and application to serum-protein profiling. Mol Cell Proteomics 4:773–784
35. Chen S, LaRoche T, Hamelinck D, Bergsma D, Brenner D, Simeone D, Brand RE, Haab BB (2007) Multiplexed analysis of glycan variation on native proteins captured by antibody microarrays. Nat Methods 4:437–444
36. Kuno A, Kato Y, Matsuda A, Kaneko MK, Ito H, Amano K, Chiba Y, Narimatsu H, Hirabayashi J (2009) Focused differential glycan analysis with the platform antibody-assisted lectin profiling for glycan-related biomarker verification. Mol Cell Proteomics 8:99–108
37. Wu L, KewalRamani VN (2006) Dendritic-cell interactions with HIV: infection and viral dissemination. Nat Rev Immunol 6:859–868
38. Krishnamoorthy L, Bess JW Jr, Preston AB, Nagashima K, Mahal LK (2009) HIV-1 and microvesicles from T cells share a common glycome, arguing for a common origin. Nat Chem Biol
39. Booth AM, Fang Y, Fallon JK, Yang JM, Hildreth JE, Gould SJ (2006) Exosomes and HIV Gag bud from endosome-like domains of the T cell plasma membrane. J Cell Biol 172:923–935
40. Chan R, Uchil PD, Jin J, Shui G, Ott DE, Mothes W, Wenk MR (2008) Retroviruses human immunodeficiency virus and murine leukemia virus are enriched in phosphoinositides. J Virol 82:11228–11238
41. Scanlan CN, Offer J, Zitzmann N, Dwek RA (2007) Exploiting the defensive sugars of HIV-1 for drug and vaccine design. Nature 446:1038–1045
42. Raman R, Venkataraman M, Ramakrishnan S, Lang W, Raguram S, Sasisekharan R (2006) Advancing glycomics: implementation strategies at the consortium for functional glycomics. Glycobiology 16:82R–90R
43. Yabe R, Suzuki R, Kuno A, Fujimoto Z, Jigami Y, Hirabayashi J (2007) Tailoring a novel sialic acid-binding lectin from a ricin-B chain-like galactose-binding protein by natural evolution-mimicry. J Biochem 141:389–399
44. Hsu KL, Gildersleeve JC, Mahal LK (2008) A simple strategy for the creation of a recombinant lectin microarray. Mol Biosyst 4:654–662
45. Jeong S, Eom T, Kim S, Lee S, Yu J (2001) In vitro selection of the RNA aptamer against the Sialyl Lewis X and its inhibition of the cell adhesion. Biochem Biophys Res Commun 281:237–243
46. Li M, Lin N, Huang Z, Du L, Altier C, Fang H, Wang B (2008) Selecting aptamers for a glycoprotein through the incorporation of the boronic acid moiety. J Am Chem Soc 130:12636–12638
47. Zou Y, Broughton DL, Bicker KL, Thompson PR, Lavigne JJ (2007) Peptide borono lectins (PBLs): a new tool for glycomics and cancer diagnostics. Chembiochem 8:2048–2051
Chapter 5
The Application of High Throughput Mass Spectrometry to the Analysis of Glycoproteins
Sasha Singh, Morten Thaysen Andersen, and Judith Jebanathirajah Steen
Abstract Mass spectrometry has contributed significantly to the advancement of glycoprotein research. In combination with conventional glycan purification strategies, it makes high throughput studies practical, yielding a wealth of detail about the glycoproteome. In this chapter, we provide an overview of commonly employed glycoprotein enrichment strategies, which exploit the unique chemical features of glycans for their purification, and introduce the basic concepts of mass spectrometric methods as they apply to glycan identification and characterization. Landmark studies in enrichment workflows and mass spectrometric analysis of glycans are summarized, together with the major challenges facing glycoproteomic research.
Keywords Glycoproteomics · Mass spectrometry · High throughput · Glycopeptides · Glycan enrichment techniques
Abbreviations
bPPCR	Biotinylated phosphine capture reagent
CID	Collision induced dissociation
ECD	Electron capture dissociation
ESI	Electrospray ionization
ETD	Electron transfer dissociation
FAB	Fast atom bombardment
FTICR	Fourier transform ion cyclotron resonance
Gal	Galactose
Gal6	Galectin LEC6
GalNAc	α-N-acetylgalactosamine
Gal-T	β-1,4 galactosyl transferase
Glc	Glucose
GlcNAc	N-acetylglucosamine
J.J. Steen
F.M. Kirby Center for Neurobiology, Proteomics Center at Children’s Hospital Boston, Boston, MA 02115, USA
e-mail: [email protected]
R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_5, © Springer Science+Business Media B.V. 2011
S. Singh et al.
HA	Hemagglutinin
HCD	Higher-energy collisional dissociation
Hex	Hexose
HILIC	Hydrophilic interaction chromatography
IRMPD	Infrared multiphoton dissociation
IT	Ion trap
LC	Liquid chromatography
MALDI	Matrix-assisted laser desorption/ionization
Man	Mannose
MS	Mass spectrometry
MSn	Multi-stage mass spectrometry
NeuGc	N-glycolylneuraminic acid
PAS	Periodic acid-Schiff coupled affinity chromatography
PNGase F	Peptide-N-glycosidase F
PTM	Post-translational modification
QIT	Quadrupole ion trap
QqQ	Triple quadrupole
Q-TOF	Quadrupole time of flight
SILAC	Stable isotope labelling with amino acids in cell culture
SLAC	Serial lectin affinity chromatography
TAG	Tagging-via-substrate affinity
TOF	Time of flight
Contents
5.1 Introduction
5.1.1 Introduction to Glycoproteomics and MS
5.1.2 Challenges in Glycoproteomics
5.2 Glycopeptide Enrichment Techniques in Glycoproteomics
5.2.1 Glycan-Modifying Enrichment Techniques
5.2.2 Glycan-Non-modifying Enrichment Techniques
5.3 Structural Characterization of Protein Glycosylation Using MS
5.3.1 MS Instrumentation for Glycoproteomics
5.3.2 Structural Characterization of Glycopeptides
5.4 Conclusions
References
5.1 Introduction
5.1.1 Introduction to Glycoproteomics and MS
Glycoproteomics is the systematic analysis of the glycoproteome, including the identification of glycoproteins and their abundance, the sites of glycosylation, and the characterization of the attached glycans. As with other
5
The Application of High Throughput Mass Spectrometry
posttranslational modifications (PTMs) such as phosphorylation and acetylation, mass spectrometry (MS) is the most suitable analytical tool for tackling the challenges associated with the analysis of protein glycosylation. Past and ongoing advances in qualitative and quantitative MS have, in essence, driven the field of glycoproteomics. In turn, the challenges of glycoproteomics have influenced the development of higher-performance mass spectrometers and optimized workflows, as the inherent complexity of this family of PTMs calls for tailored analytical approaches. In contrast to proteins, which are primary gene products, carbohydrate structures are produced by template-less mechanisms as secondary gene products. In a process that changes with the physiological status of the cell, glycosylation enzymes act in concert to generate a variety of related glycan structures; the resulting variants of a glycoprotein are called glycoforms. The glycoforms generate a gradient of biological activity by modulating the function of the carrier proteins or by eliciting independent functions. Known glycan functions include the mediation of protein folding and turnover, cell-cell recognition and interaction, and inter- and intracellular trafficking and signaling [1, 2]. It has been estimated that as much as 50% of the human proteome is glycosylated [3]. Glycosylation occurs mainly on proteins destined for the extracellular environment, such as secreted or cell surface proteins, although intracellular glycosylation (O-GlcNAc) is increasingly being recognized as an abundant and important protein modification [4]. The significant involvement of protein glycosylation in a broad spectrum of biological processes is unquestioned, but the mechanisms by which the carbohydrates act are often not understood in detail. For example, glycoproteins are known to be involved in immune and inflammatory responses, and aberrant glycosylation of these proteins is associated with cancer [5–14].
However, the exact roles of the glycans are generally not known, and it is unclear whether the changes are a cause or a consequence of the pathology. Another motivation for investigating protein glycosylation is that glycoproteins make up approximately 25% of the cancer biomarkers currently approved by the FDA (Food and Drug Administration) [15]. Structural characterization of protein glycosylation by glycoproteomics is a significant step towards the functional characterization of these PTMs, and it has great potential, since at present only a limited number of what is projected to be a vast pool of glycans in nature have been characterized. The research opportunities provided by glycoproteomics are consequently abundant. MS has already established key protocols for the targeted characterization of a small subset of all glycoproteins; the massive challenge, however, is to devise systematic and universal workflows that can deconvolute the diversity and complexity of the glycoproteome.
5.1.2 Challenges in Glycoproteomics
In a single experiment, a mass spectrometer can detect a vast number of peptides representing thousands of proteins, spanning three to four orders of magnitude in relative abundance. Despite this large dynamic range of MS, the detectable
range does not span the true range of relative protein or peptide abundance observed in some body fluids such as plasma, in which protein concentrations span 10–12 orders of magnitude [16]. In addition, the abundance differences for glycosylated proteins can be even larger because of the substoichiometry of the individual glycoforms, which results from substantial heterogeneity of the attached glycan structures as well as under-glycosylation of a given glycosylation site. Fractionation and enrichment are therefore essential, but can be challenging because the hydrophilic nature of glycoconjugates limits the use of traditional proteomics techniques, which are mostly based on hydrophobic separation strategies such as reversed-phase liquid chromatography (LC). A second major challenge for glycoproteomics is the characterization of the glycoprotein itself, which involves multiple levels of analysis: (i) identification of the protein carrier and its absolute or relative abundance in the sample, (ii) identification of the site of glycosylation and the glycan occupancy level, and (iii) detailed structural characterization of the total set of glycans linked to a given site and their relative abundance. Glycopeptides generated from glycoproteins with specific or non-specific proteases are generally the preferred analyte for obtaining such information; however, a single technique usually yields only partial information. Hence, multiple workflows have to be applied to increase the completeness of the characterization, which limits the overall sensitivity and increases the analysis time. Finally, data handling is challenging because the development of bioinformatic tools for glycoproteomics lags behind that for proteomics: spectrum assignment and data interpretation generally have to be performed manually or with a tailor-made programme built to meet the requirements of the individual study.
Improvements are continuously reported on all of these levels, and glycoproteomics is slowly becoming more “user-friendly” as established techniques and workflows are presented. Despite recent advances, however, a level of expertise is still required to analyze protein glycosylation, as the analytical tools are not yet as streamlined as those for regular proteomics. This text highlights the status of current enrichment techniques suitable for glycoproteomics and describes how MS can be used to obtain detailed structural information on protein glycosylation.
5 The Application of High Throughput Mass Spectrometry
5.2 Glycopeptide Enrichment Techniques in Glycoproteomics
The combination of low stoichiometry and poor ionization of glycopeptides relative to non-modified peptides calls for efficient glycopeptide enrichment techniques. Glycopeptides form a very heterogeneous group of analytes, since the amino acid sequence varies dramatically around the glycosylation sites of different glycoproteins. Because of this heterogeneity, it is challenging to design universal glycopeptide enrichment techniques. The feature common to all glycopeptides is the attached glycan moiety, and most enrichment techniques target this structure either directly, by interaction, or through the physicochemical properties associated with the presence of glycans.
The generation of the glycopeptides is the first step that needs consideration. Specific proteases such as trypsin and endoproteinase Asp-N generate glycopeptides of widely different sizes and properties from the glycoproteome, owing to the variation in amino acid sequence around the glycosylation sites. Since these specific proteases generate predictable peptide sequences, they are often preferred. Alternatively, proteases such as proteinase K or pronase, which cleave peptide bonds much less specifically, generate a more homogeneous population of glycopeptides in terms of size and physicochemical properties. However, these proteases produce unpredictable peptide sequences, which in turn places higher demands on the peptide characterization steps.
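The predictability of specific proteases can be illustrated with a small in silico digest. The sketch below is a simplification under stated assumptions: it uses textbook cleavage rules (trypsin cuts C-terminal to K/R except before P; Asp-N cuts N-terminal to D), ignores missed cleavages, and flags peptides that contain an N-X-S/T sequon (X not P) together with their unmodified monoisotopic mass. The helpers `digest` and `sequon_glycopeptides` are hypothetical names for illustration, not part of any published tool. Nonspecific proteases such as proteinase K or pronase cannot be captured by a single rule like this, which is precisely why their products are harder to predict.

```python
import re

# Monoisotopic amino acid residue masses (Da); cysteine is unmodified.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Simplified cleavage rules (assumption: complete digestion, no missed cleavages).
CLEAVAGE = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after K/R, not before P
    "Asp-N": re.compile(r"(?=D)"),             # before D
}

# N-glycosylation sequon N-X-S/T with X != P; zero-width lookahead so that
# overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def digest(protein, enzyme):
    """Return (start index, peptide) pairs for a complete in silico digest."""
    pieces, start = [], 0
    for pep in CLEAVAGE[enzyme].split(protein):
        if pep:
            pieces.append((start, pep))
            start += len(pep)
    return pieces


def sequon_glycopeptides(protein, enzyme):
    """Peptides containing a sequon Asn: (peptide, 1-based Asn positions within
    the peptide, neutral monoisotopic mass of the unglycosylated peptide)."""
    sites = [m.start() for m in SEQUON.finditer(protein)]
    result = []
    for start, pep in digest(protein, enzyme):
        local = [s - start + 1 for s in sites if start <= s < start + len(pep)]
        if local:
            mass = sum(RESIDUE[aa] for aa in pep) + WATER
            result.append((pep, local, round(mass, 4)))
    return result
```

For the toy sequence "AKDNGSRLDE", trypsin yields the glycopeptide candidate DNGSR (sequon Asn at position 2), whereas Asp-N yields DNGSRL for the same site, so the enzyme choice directly sets the size and mass of the glycopeptide to be analyzed.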
5.2.1 Glycan-Modifying Enrichment Techniques
Several chemistry-based approaches have been used for purification of glycosylated proteins and peptides. Unlike lectin-based and other non-modifying enrichment strategies (presented in Section 5.2.2), these methods modify the glycans. Naturally, the chemical derivatization methods must be glycan-specific and must not alter the polypeptide backbone. Most derivatization strategies employ one of two main reactions: the Schiff-base reaction of aldehydes with hydrazine [17–20] or a Staudinger ligation between a phosphine and an azide [21, 22]. These are briefly discussed below. An overview of the glycan-modifying and non-modifying enrichment techniques for glycopeptides is given in Table 5.1.
5.2.1.1 Schiff-Base Reaction (O-GlcNAc Ketone Enrichment)
One example of the Schiff-base reaction is the O-GlcNAc (N-acetylglucosamine) ketone enrichment method [23]. This chemo-enzymatic method uses an engineered β-1,4-galactosyltransferase (GalT) to transfer a ketone-containing substrate onto O-GlcNAcylated proteins (Fig. 5.1a). The ketones are then biotinylated with biotin hydrazine (via a Schiff base), and the proteins are proteolysed and enriched using streptavidin affinity columns. The precise glycan attachment site can then be determined, because the biotinylated tag generates distinctive fingerprint multi-stage MS (MSⁿ) spectra. This approach identified 34 unique O-GlcNAc-linked peptides belonging to 25 proteins from the brain [23]. The workflow has subsequently been extended to provide quantitative O-GlcNAc data by incorporating differential isotopic labeling of the samples [24].
5.2.1.2 Staudinger Ligation
The standard Staudinger reaction between an azide and a phosphine was modified by Saxon and Bertozzi to covalently label cell surface glycans and generate stable adducts [21].
The general strategy is to label cell surface glycans with unnatural azido sialic acid compounds, followed by nucleophilic attack by a biotinylated phosphine (Fig. 5.1b). Both the azide and the phosphine compounds are synthetic and are not metabolized by the cells; as such, the reaction remains targeted and specific to glycan characterization. O-GlcNAc-modified proteins were enriched from cytosolic extracts of a Drosophila melanogaster S2 cell line by metabolic labeling with the cell-permeable, peracetylated azido-GlcNAc [22]. Through endogenous cellular processing, O-glycosylated proteins were modified with the azido-GlcNAc derivative. Proteins precipitated from the cell lysate were subjected to the Staudinger capture reaction using a biotinylated phosphine capture reagent (bPPCR). MS analysis revealed 51 O-GlcNAcylated proteins, only ten of which were previously known. Despite the identification of these novel glycoproteins, the precise glycosylation site could not always be determined. This is because O-GlcNAc is a labile modification: collision induced dissociation (CID) usually results in its loss and therefore leaves no fingerprint or hallmark features for further MSⁿ analyses. Moreover, the effects of the azido-modified sugars on cell metabolism are not known, which calls the biological relevance of this strategy into question. The phosphine groups can also be designed to carry tags other than biotin, such as the human influenza hemagglutinin (HA) epitope or the FLAG peptide [25, 26]. This chemical strategy (designated tagging-via-substrate, TAS) can be adapted to characterize other modified proteins, such as cell surface farnesylated proteins [27], sumoylated proteins [28] and cytosolic O-GlcNAcylated proteins [22].
Table 5.1 Overview of techniques for glycopeptide enrichment. The binding mechanisms and the targeted analytes are listed for the individual techniques
Type of enrichment | Binding/retention mechanism | Enriched analytes | References
Glycan-modifying enrichment techniques
Schiff-base reaction | Covalent coupling | O-GlcNAc peptides | [17–20, 23, 24]
Staudinger ligation | Covalent coupling | O-GlcNAc peptides | [21, 22, 25–28]
PAS | Covalent coupling | Glycopeptides containing cis-diols (e.g. Man, Gal or Glc) | [18, 19, 29–33]
Glycan-non-modifying enrichment techniques
Lectins | Affinity | Subset of glycome/glycoproteome | Table 5.2
HILIC | Hydrophilic, electrostatic | Hydrophilic analytes (e.g. glycopeptides) | [29, 45–54]
Graphitized carbon | Polar, hydrophobic, ionic | Hydrophilic/hydrophobic analytes | [58]
Boronic acid | Covalent (heterocyclic diesters) | Glycopeptides containing cis-diols (e.g. Man, Gal or Glc) | [59–61]
TiO2 | Bi-/multi-dentate | Phosphate- or carboxylic acid-containing glycans | [62]
Strong cation exchange chromatography | Ionic interactions | Discriminates for charged analytes (e.g. sialoglycopeptides) | [63]
Size exclusion chromatography | Size exclusion | Discriminates for high mass analytes (e.g. glycopeptides) | [64]
Fig. 5.1 Glycan-modifying enrichment workflows. (a) O-GlcNAc ketone enrichment: β-1,4-galactosyltransferase (GalT) transfers a ketone-containing substrate onto O-GlcNAc proteins, which are in turn biotinylated. The biotin serves as a tag for both detection and glycan enrichment. (b) Staudinger ligation: O-glycosylated proteins can be metabolically labeled with a synthetic azido-GlcNAc, which in turn can be biotinylated via the Staudinger capture reaction for further manipulation for MS. (c) Periodate-acid Schiff coupled affinity chromatography: the cis-diol moiety of the glycan and its modification reactions are shown
5.2.1.3 Periodate-Acid Schiff Coupled Affinity Chromatography (PAS)
PAS is a chemical derivatization of glycans that exploits the vicinal diol structure characteristic of sugars. It uses a mild periodate oxidation to convert the diols to aldehydes without affecting amino acid residues apart from methionine (Fig. 5.1c). The application of PAS in glycoproteome enrichment strategies was first reported using an iminobiotin hydrazide in the Schiff-base reaction [18]. The derivatized peptides were affinity purified on a streptavidin column and subsequently analyzed by MS. The variety of hydrazide coupling reagents available, including hydrazide itself, makes the method versatile, as exemplified by a study aimed at enriching N-linked glycoproteins from a mixture of plasma membrane proteins and human blood serum proteins [19]. This study incorporated a quantitative MS strategy to demonstrate that the relative abundances of N-linked peptides/proteins could be determined after PAS. Samples were prepared in duplicate workflows, to which known amounts of N-linked glycoprotein standards were added. After proteolysis of the captured glycoproteins and washing to remove non-glycosylated peptides, the captured N-linked glycopeptides from the parallel preparations were labeled at their N-termini with either isotopically light (no deuterium) or heavy (four deuterium atoms) succinic anhydride. Treatment of the bound glycopeptides with the enzyme peptide-N-glycosidase F (PNGase F) hydrolyzes the GlcNAc-asparagine bond, releasing the peptides from the hydrazide column. The hydrolysis converts the asparagine to aspartate, a conversion that appears at the mass spectral level as a mass shift of approximately one dalton (+0.98 Da) for the analyzed peptide [29, 30].
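The mass bookkeeping behind this quantitative PAS workflow is simple enough to sketch. Assuming standard monoisotopic element masses, the code below computes the Asn-to-Asp shift left by PNGase F and the d4/d0 succinyl spacing, then pairs light and heavy peaks in a peak list to give a light/heavy intensity ratio. The function `find_label_pairs` and its tolerance are illustrative assumptions, not the procedure used in [19]. Succinic anhydride also acylates lysine side chains, so `n_labels` is left as a parameter instead of being fixed at a single N-terminal label.

```python
# Monoisotopic element masses (Da).
H = 1.00782503207
D = 2.01410177812
N = 14.0030740048
O = 15.99491461956

# PNGase F converts the formerly glycosylated Asn to Asp (-NH2 replaced by -OH).
ASN_TO_ASP = O - (N + H)   # about +0.984 Da
# Heavy (d4) minus light (d0) succinyl label: four 1H replaced by 2H.
D4_MINUS_D0 = 4 * (D - H)  # about +4.025 Da


def pair_spacing(charge, n_labels=1):
    """Expected m/z gap between the d0- and d4-labelled forms of one peptide.
    n_labels is the number of succinylated amines (assumed: N-terminus only)."""
    return n_labels * D4_MINUS_D0 / charge


def find_label_pairs(peaks, charge, n_labels=1, tol=0.01):
    """Match light/heavy partners in a list of (m/z, intensity) peaks.
    Returns (light m/z, heavy m/z, light/heavy intensity ratio) tuples."""
    gap = pair_spacing(charge, n_labels)
    pairs = []
    for mz_light, i_light in peaks:
        for mz_heavy, i_heavy in peaks:
            if abs((mz_heavy - mz_light) - gap) <= tol:
                pairs.append((mz_light, mz_heavy, i_light / i_heavy))
    return pairs
```

For a doubly charged peptide the light and heavy forms are expected about 2.013 m/z apart; a real implementation would also check isotope envelopes and retention-time co-elution, which this sketch omits.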
Cleavage of the peptide from its associated sugar moiety is necessary because the sugars can be modified in a number of ways by the hydrazide tags, creating a highly complex pool of modified peptides to analyze by MS. Nonetheless, the site of sugar attachment can be deduced, and the result remains highly relevant for global glycoproteomic studies. Subsequent applications of the PAS strategy include quantitative analyses of serum glycoproteins from normal and cancerous mice [31], a screen for serum biomarkers [32] and, most recently, the quantification of N-linked cell surface glycoproteins [33]. In this recent study, surface glycans of intact Jurkat T lymphocytes were labeled with biocytin hydrazide, the initial step of this cell surface capture strategy. Subsequent sample processing, which included standard cell lysis, fractionation, enrichment of the tagged glycans and MS analysis, demonstrated the specificity of the method for glycosylated proteins (95% positives). The study also integrated a standard quantitative MS strategy (stable isotope labeling with amino acids in cell culture, SILAC), which enabled relative quantification of cell surface glycoproteins between two cell types (human Ramos B cells versus human Jurkat T lymphocytes) and between Jurkat T cells with and without stimulation by known activators (anti-CD3 and anti-CD28 antibodies). This type of MS-driven glycan characterization is more robust and feasible than standard immunochemistry (antibody or lectin labeling) of cell surface glycans. Because vicinal diols are broadly distributed in glycoproteins, this enrichment technique is expected to be largely unbiased with respect to the glycoproteome.
5.2.2 Glycan-Non-modifying Enrichment Techniques
5.2.2.1 Lectin Affinity Purification
Lectins are glycan-recognizing proteins, and their natural affinities for glycans are widely used to enrich glycopeptides and glycoproteins in glycoproteomics and as detection tools in histochemical staining. A large number of lectins exist with different affinities for the various classes of glycans, for individual glycan structures, and even for smaller glycan determinants (e.g. one or two monosaccharide residues). Hence, lectins can discriminate between different subpopulations of the glycome/glycoproteome [34, 35]. Lectins can be coupled to various solid supports, such as agarose beads, to produce affinity matrices for enriching glycoconjugates from simple or complex protein/peptide mixtures [36, 37]. Lectins used include Canavalia ensiformis concanavalin A (Con A), Arachis hypogaea agglutinin (PNA), Datura stramonium agglutinin (DSA), wheat germ agglutinin (WGA), Artocarpus heterophyllus jacalin, Helix pomatia agglutinin (HPA), Lycopersicon esculentum lectin (LEL), Aleuria aurantia lectin (AAL), Lens culinaris agglutinin (LCA) and L. culinaris lectin (LCL) [29, 38–43] (additional lectins are listed in Table 5.2). PNA and LEL, for example, are specific for glycans/glycoconjugates possessing GlcNAc residues; HPA for terminal α-N-acetylgalactosamine (GalNAc) residues; and LCA for the N-acetylchitobiose portion of the oligosaccharide [40]. Reducing sample complexity to more specific types of glycan populations has obvious advantages, including a more manageable and predictable population of analytes and the option to study specific glycosylation pathways in diseased versus healthy biological samples. However, these specific lectins introduce a strong bias for subsets of the glycome/glycoproteome, which limits their use for quantitative experiments on the entire glycoproteome. Lectins with broad specificity, such as soybean lectin, galectin LEC6 (Gal6) and Con A, have also been used for more general glycoproteome studies, such as the characterization of periplasmic glycoproteins from Caenorhabditis elegans [44]. Serial lectin affinity chromatography (SLAC) approaches take advantage of both the shared and the unique properties of glycans. Con A, for example, can first be used to enrich for mannose-rich N-glycans, which in turn can be subfractionated into complex-type glycans by use of a Gal6 lectin column [29, 34, 41].
Table 5.2 A list of commonly used lectins and their glycan specificities
Lectin | Specificity
Con A, Canavalia ensiformis concanavalin A | Mannose, Glucose, GlcNAc
AAA, Aleuria aurantia agglutinin | Fucose
WGA, wheat germ agglutinin | β-GlcNAc, Sialic acid
DSA, Datura stramonium agglutinin | GlcNAc
Gal6, Caenorhabditis elegans galectin Lec6 | Complex-type N-glycans
GNA, Galanthus nivalis agglutinin | Mannose
MAA, Maackia amurensis agglutinin | Sialic acid
PNA, Arachis hypogaea agglutinin | β-Gal
SNA, Sambucus nigra agglutinin | β-Gal, Sialic acid
ECA, Erythrina cristagalli agglutinin | α- and β-GalNAc
LCH, Lens culinaris agglutinin | α-Mannose, α-Glucose, α-GlcNAc
SBA, soybean agglutinin | α- and β-GalNAc
DBA, Dolichos biflorus agglutinin | α-GalNAc
Lotus | α-Fucose
UEA-1, Ulex europaeus agglutinin | α-Fucose
5.2.2.2 Hydrophilic Interaction Chromatography (HILIC)
HILIC has increasingly been used to separate and enrich glycopeptides in glycoproteomics. The rationale is that the hydrophilic contribution of the glycan is often sufficient to give glycopeptides a rather distinctive overall hydrophilicity, which can be used to discriminate them from the less hydrophilic non-glycosylated peptides. The hydrophilic nature of glycans results primarily from their large number of hydroxyl groups. A selection of HILIC stationary phases is available [45], and mobile phases containing a high proportion of organic solvent (40–97%, often acetonitrile) in water are generally used. Formate and other weak acids are common mobile phase additives for adjusting pH when using online HILIC-MS detection. Glycopeptides are eluted isocratically or by increasing the water content of the mobile phase.
A variety of HILIC approaches have been used in glycoproteomics: (i) enrichment of glycopeptides from simple [46, 47] or complex peptide mixtures [29] in a solid-phase extraction format; (ii) desalting/clean-up of glycopeptides without loss of quantitative information [48]; and (iii) separation of glycopeptides on analytical [49–51] or capillary/nano [52–54] LC scale with off- or online MS detection.
5.2.2.3 Other Non-modifying Enrichment Techniques
Graphitized carbon has also been used for the separation and purification of glycans and glycopeptides [55–57]. For example, graphitized carbon packed in micro-columns is very useful for purifying very small glycopeptides (3–5 amino acid residues) generated by proteinase K [58]. The disadvantage of graphitized carbon, which is thought to retain analytes through a combination of polar, hydrophobic and ionic interactions, is that it fails to selectively enrich tryptic and other “normal size” glycopeptides from peptide mixtures, because it also retains non-glycosylated peptides. Hence, the technique is most useful for clean-up (desalting) of glycans and glycopeptides from samples in which no interfering molecules are present in the glycopeptide mass range.
Boronic acid affinity is a rarely used technique for the enrichment of glycoconjugates [59–61]. The principle is that boronic acid binds molecules containing cis-diol groups, such as mannose (Man)-, galactose (Gal)- or glucose (Glc)-containing glycans, forming heterocyclic diesters that are stable under alkaline conditions. It is difficult to evaluate the potential of boronic acid as a glycopeptide enrichment tool, since no detailed studies of its overall specificity and performance have been published. Titanium dioxide has been shown to be valuable for the enrichment of negatively charged compounds such as sialylated glycopeptides [62]. The drawback of titanium dioxide in a quantitative glycoproteomic context is its bias towards negatively charged species. However, it has proven very useful for exploring sialylated glycopeptides, provided that the sample is dephosphorylated beforehand. Finally, strong cation exchange [63] and size exclusion chromatography [64] have also been used to discriminate glycopeptides from non-glycosylated peptides on the basis of charge and size, respectively.
5.3 Structural Characterization of Protein Glycosylation Using MS
Following enrichment of the glycosylated proteins or peptides, the next challenge is their detailed characterization, which can be divided into three levels: (i) identification of the glycoprotein and determination of its absolute or relative abundance in the sample, (ii) identification of the glycosylation site and the glycan occupancy level and (iii) detailed structural characterization of the total set of glycans linked to a given site and their relative abundance. Levels (i) and (ii) can be addressed, at least in part, with existing proteomics techniques or slight modifications of them, whereas level (iii) requires more specific glycoproteomics techniques. The individual levels are discussed in Section 5.3.2, following an overview of the MS instrumentation useful for glycoproteomics.
5.3.1 MS Instrumentation for Glycoproteomics
5.3.1.1 Ion Sources and Mass Analyzers
Earlier studies of protein glycosylation mostly employed fast atom bombardment (FAB) ionization [65, 66]. Modern instruments predominantly use electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) [67]. Although both are considered soft ionization techniques and are therefore suitable for biomolecules, ESI is known to be the gentler of the two. For the analysis of relatively labile PTMs such as glycans in particular, the choice of ionization source can affect the results obtained. In addition, the suitability of ESI for coupling to LC separation makes it an attractive ionization technique in glycoproteomics, where automation and high throughput are needed [68]. MALDI, on the other hand, can be adjusted to introduce only little energy to the analyte in the ionization process, thereby reducing in-source and post-source fragmentation of glycans and glycopeptides to a minimum [48, 69]. The robustness of MALDI and its workflows, which complement those of ESI, make it a useful tool for analyzing protein glycosylation. It has been widely used in glycomics, where samples contain a limited number of analytes and require limited separation. However, in glycoproteomics, where samples usually contain a great number of analytes, ESI is often preferred over MALDI because it is better suited to coupling with separation techniques such as LC and to automated workflows. Both ionization techniques can be coupled with a variety of mass spectrometers, such as triple quadrupoles (QqQ), ion traps (IT), Fourier transform ion cyclotron resonance (FTICR) instruments, orbitraps, time-of-flight instruments (TOF and TOF-TOF), and hybrid instruments such as quadrupole ion traps (QIT) and quadrupole time-of-flight instruments (Q-TOF) [70]. These instruments are described in a recent review [71]. It is difficult to generalize about their individual performance in glycoproteomics, as many parameters determine which instrument is best suited for a specific study. However, as for regular proteomics, sensitivity, resolution, mass accuracy and suitability for LC coupling are important parameters to consider.
5.3.1.2 Tandem MS Fragmentation of Glycopeptides
In glycoproteomics, detailed structural information is obtained by MS/MS fragmentation of the ionized glycoconjugates (most often protease-generated glycopeptides). Fragmentation has traditionally been performed with collision induced dissociation (CID), but other fragmentation techniques have been developed, such as infrared multiphoton dissociation (IRMPD), electron capture dissociation (ECD) and electron transfer dissociation (ETD).
These techniques are briefly discussed below with respect to glycoproteomics, and an overview is presented in Fig. 5.2. In CID, analyte fragmentation results from collisions of the analyte with an inert gas such as helium or argon. The collision energy is transferred to the analyte, and some of its chemical bonds break. For non-glycosylated peptides, CID cleaves peptide bonds, producing b and y ions [72]. For glycopeptides, however, CID preferentially cleaves the glycosidic bonds, which are weaker than the bonds of the peptide backbone. Consequently, CID of glycopeptides primarily yields information on the glycan structure rather than on the peptide sequence and the glycosylation site [17, 73]; mainly B and Y ions are generated. CID is divided into high-energy (>100 eV) and low-energy (<100 eV) collisional fragmentation. Generally, TOF-TOF instruments use high-energy collisional fragmentation and Q-TOFs use low-energy CID. Low-energy tandem MS was applied to a glycopeptide carrying a (NeuGc)2GalGalNAc moiety (NeuGc, N-glycolylneuraminic acid); the MS/MS spectrum showed losses of NeuGc, Gal and GalNAc from the precursor ion [74]. Similar fragmentation studies have been performed on N-glycosylated peptides, where the most abundant signals in the MS/MS spectra correspond to fragments of the glycan moiety [75–78]. In addition to glycan fragments, CID also produces oxonium ions in the low m/z region of the MS/MS spectra, which can be of great diagnostic value. For example, oxonium ions were observed for mono- and di-fucosylated O-linked peptides using an ESI-Q-TOF mass spectrometer under low-energy fragmentation conditions [79]. On individual instruments the collision energy can be tuned to obtain optimal fragmentation of a given analyte. This has led to a high-energy variant of CID called higher-energy collisional dissociation (HCD). When HCD is applied to glycopeptides, fewer but more abundant fragments appear in the spectrum (unpublished observation). These fragments are mostly oxonium ions and specific glycan fragments; occasionally, low-abundance fragments of the peptide backbone are present as well. The introduction of ECD in FTICR MS instruments [80, 81] has facilitated the localization of labile PTMs such as O- and N-linked glycans to their specific glycosylation sites [82, 83]. ECD produces odd-electron, radical-driven fragmentation that cleaves the N-Cα bonds of the peptide backbone while preserving the PTM on the peptide. Thus, ECD yields mostly information on the peptide moiety and the site of glycosylation. ECD has been used successfully in combination with IRMPD, which has been shown to cleave glycosidic bonds selectively rather than peptide bonds [82, 84].
Fig. 5.2 Overview of the fragmentation techniques used for glycopeptides in glycoproteomics. CID, HCD and IRMPD represent fragmentation techniques that preferentially cleave the glycosidic bonds, whereas ECD and ETD cleave the peptide backbone. Thus, the former yield information on the glycan identity and the latter on the peptide identity and glycosylation site. Examples of the two groups of fragmentation techniques are given. Abundant fragments are marked with asterisks
[Figure panels: CID, HCD, IRMPD: glycosidic bonds cleaved; mostly B/Y ions plus oxonium ions (Hex m/z 163, HexNAc m/z 204, LacNAc m/z 366); glycan identity (monosaccharide composition and branching). ECD, ETD: N-Cα bonds of the peptide cleaved; mostly c/z ions; peptide identity (sequence, location of glycosylation site and mass of glycan). Example glycopeptide: H-Q-G-N-D-T-S-R]
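The ion series summarized in Fig. 5.2 can be computed directly from residue masses. This sketch calculates singly protonated b, y, c and z• (z+1) ions for the example peptide H-Q-G-N-D-T-S-R; the residue table deliberately covers only the amino acids of that peptide. The optional `mods` argument adds a mass at a given position and assumes that it stays on the backbone fragments, which approximates ECD/ETD behavior; under CID the glycosidic bonds usually break first, so such glycan-retaining b/y ions are typically weak. The single HexNAc (203.079 Da) placed on Asn4 in the usage note is purely illustrative, and `ladders` is a hypothetical helper name.

```python
# Monoisotopic residue masses (Da) for the residues of the example peptide only.
RESIDUE = {
    "H": 137.05891, "Q": 128.05858, "G": 57.02146, "N": 114.04293,
    "D": 115.02694, "T": 101.04768, "S": 87.03203, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
NH3 = 17.026549
Z_DOT_OFFSET = 16.018724  # z-dot (z+1) ion = y ion - NH3 + H


def ladders(seq, mods=None):
    """Singly protonated b, y, c and z-dot fragment m/z values.

    mods maps 1-based residue positions to added masses (e.g. a glycan) that
    are assumed to remain on every fragment containing that residue.
    Index i of each list holds the fragment of length i + 1.
    """
    mods = mods or {}
    masses = [RESIDUE[aa] + mods.get(i + 1, 0.0) for i, aa in enumerate(seq)]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {
        "b": b,
        "y": y,
        "c": [m + NH3 for m in b],               # c = b + NH3
        "z_dot": [m - Z_DOT_OFFSET for m in y],  # z-dot = y - NH3 + H
    }
```

Comparing `ladders("HQGNDTSR")` with `ladders("HQGNDTSR", mods={4: 203.07937})` shows that c1 to c3 are unchanged while c4 and longer c ions shift by the added mass; this jump in the c (and matching z•) series is what localizes the site in an ECD/ETD spectrum.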
Like ECD, ETD has emerged as a fragmentation technique complementary to CID and IRMPD [85, 86]. Peptide fragmentation is achieved through gas-phase electron-transfer reactions, and the glycan moiety is left intact [87]. Generally, ETD efficiency increases with the charge density of the glycopeptide, meaning that analytes with higher charge states fragment more readily. Thus, ETD is usually limited to glycopeptide ions carrying three or, preferably, more charges [88, 89].
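The charge-state constraint on ETD is often expressed as a simple acquisition rule. The following is a hypothetical decision sketch, not a vendor method: it always schedules a CID/HCD spectrum for glycan information, adds ETD only for precursors at 3+ or above, and reports charge density (charges per kDa of neutral mass) to make the dependence explicit. Only the threshold `min_etd_charge=3` comes from the text; the function names and the rest of the logic are assumptions.

```python
PROTON = 1.007276467  # Da


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass of an [M + zH]z+ precursor ion."""
    return (mz - PROTON) * charge


def charge_density(mz, charge):
    """Charges per kDa of neutral precursor mass."""
    return charge / (neutral_mass(mz, charge) / 1000.0)


def plan_fragmentation(charge, min_etd_charge=3):
    """Hypothetical rule: CID/HCD always (glycan information); ETD added only
    when the precursor charge is high enough for efficient electron transfer."""
    plan = ["HCD"]
    if charge >= min_etd_charge:
        plan.append("ETD")  # peptide sequence and glycosylation site
    return plan
```

For instance, a glycopeptide of about 3 kDa observed at 3+ has a charge density of roughly one charge per kDa and would receive both spectra, whereas the same molecule at 2+ would receive only the CID/HCD spectrum.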
5.3.2 Structural Characterization of Glycopeptides
5.3.2.1 Identification of Glycoproteins and Their Glycosylation Sites in Glycoproteomics
Following enrichment (summarized in Fig. 5.3) and deglycosylation, formerly glycosylated peptides derived from glycoproteins can be identified using regular proteomics techniques. N-glycans can be removed efficiently using PNGase F or A, or endo-β-N-acetylglucosaminidases (Endo H, D or F). PNGase F cleaves the GlcNAc-Asn bond, converting the asparagine to aspartate. This conversion results in a 0.98 Da mass increment, a remnant discernible at the mass spectral level [90]. The major pitfall of relying on the asparagine-to-aspartate conversion is that it does not distinguish between deglycosylation-induced conversion and other causes of deamidation, either in vivo or in vitro. Deglycosylation has therefore been performed in heavy-isotope water (H2¹⁸O), as the resulting +2.98 Da mass shift is more easily recognized [91]. The relative abundance of glycopeptides from two samples can also be determined by deglycosylating the two samples in H2¹⁶O and H2¹⁸O, respectively, then mixing them and comparing the relative intensities of their respective MS signals. An alternative to PNGase F/A is Endo H/D/F, which cleaves the glycosidic bond between the two GlcNAc residues of the N-glycan core, leaving a single GlcNAc residue attached to the peptide. The resulting mass increment of 203 Da (349 Da for core-fucosylated N-glycans) on the modified peptide is a unique N-glycosylation site identifier [92].
Fig. 5.3 An overview of strategies for glycoprotein/glycopeptide enrichment. These strategies are often combined for fine-tuned enrichment of a subset of the glycoproteome. The figure emphasizes that a significant amount of the work in MS-dependent glycoproteome analysis lies in the preparation of the glycans themselves
[Figure panels: cell lysate; selective derivatization (metabolic or chemical labeling); streptavidin or hydrazide columns, or Staudinger reaction capture; lectin affinity chromatography; endoglycosidase treatment (PNGase F, Endo H, Endo D); glycan detection/verification (probe against tag or glycan); gel electrophoresis; protein digest (trypsin, chymotrypsin, etc.); HILIC; LC/MS with glycan-specific precursor ion scans]
5.3.2.2 Characterization of Glycan Structures from Glycopeptides
Site-specific characterization of glycans includes the determination of monosaccharide composition, linkage types and branch points. Targeted studies of specific glycoproteins can address all of these levels; for larger-scale studies, however, determination of the monosaccharide composition is usually the aim. For N-glycans in particular, the overall structure can be hypothesized from a given monosaccharide composition, owing to the conserved core structure and the well-defined and restricted glycan synthesis pathway. As described in Section 5.3.1.2, tandem MS fragmentation can be used to obtain information about the glycan structure of a glycopeptide. CID efficiently fragments the glycan of glycopeptides but gives little information about the peptide and the glycosylation site. Occasionally, low-abundance peptide fragments appear in the CID MS/MS spectra, from which the sequence can be deduced.
For N-linked glycosylation, the glycosylation site is usually quite easy to identify, since N-glycans are linked to asparagine residues in the restricted sequon N-X-S/T, where X can be any amino acid residue except proline. For O-linked glycosylation, however, the glycosylation site is often impossible to determine with CID, because no consensus sequence is known for this type of glycosylation and O-linked glycans often occur in regions rich in serine and threonine residues. In these cases, ETD and ECD are beneficial, since they retain the glycan on the peptide backbone and generate peptide information. Thus, both types of fragmentation are often required to obtain the information needed in glycoproteomics. Although several glycopeptide enrichment strategies are available (see Table 5.1 for an overview), samples often still contain non-glycosylated peptides, in particular when starting from complex peptide mixtures, which is usually the case in glycoproteomics. Even with little “contamination” from non-glycosylated peptides, the sample can still be rather complex, since a large fraction of the proteome is glycosylated. Hence, a separation step is often needed in front of the mass spectrometer, making ESI-LC-MS an attractive workflow. If a fairly well-enriched glycopeptide sample is analyzed with an ordinary LC-MS setup (90 min gradient, reversed-phase column, top 5 MS signals selected for CID MS/MS fragmentation), a large amount of information can usually be obtained, although interpretation can be time-consuming and laborious. However, for samples containing relatively low amounts of glycopeptides, there is a risk that these signals will never be selected for fragmentation and will be lost in the “noise” of the non-modified peptides. Hence, specific operation modes, described below, have been designed to optimize the MS analysis.
In the early 1990s, a set of landmark studies established standard MS protocols to screen for glycan-specific reporter ions/fragments, which guide the mass spectrometer towards further fragmentation of the glycosylated peptides from which the reporter ions were generated [93, 94]. In a typical triple quadrupole (QqQ) MS experiment, the first quadrupole scans a mass range, and the transmitted ions pass to the second quadrupole (q) for CID. The fragment ions are then transferred to the third quadrupole, which is set to the m/z values specific for the glycan reporter ions. If reporter ions are detected, the mass spectrometer prioritizes further fragmentation of their precursors, the intact glycopeptides from which they were derived; hence the term precursor ion scanning. Common glycan reporter ions include Hex+ (m/z 163), HexNAc+ (m/z 204), HexHexNAc+ (m/z 366), and m/z 274 and 292, which report the presence of sialic acid [17, 95, 96]. If multiple reporter ions co-elute or are derived from the same precursor mass, confidence in the glycan identification increases [17, 93, 97]. Neutral loss scanning is also used for isolating post-translationally modified peptides. The name reflects the fact that the PTM (or part of it) is lost as an uncharged fragment, so the precursor mass changes while the charge does not. Observed neutral losses are often unique to a given PTM.
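Precursor ion and neutral loss scanning are hardware scan modes on QqQ instruments, but the same logic is often applied in software to data-dependent MS/MS spectra. The sketch below, an illustration rather than an established tool, flags spectra containing at least two of the reporter ions listed above (a simple stand-in for the co-occurrence criterion) and reports neutral losses of Hex, HexNAc or Fuc from the precursor at the observed charge state. The m/z and mass values were computed from elemental formulas for this sketch; the 0.02 tolerance, the `min_reporters` threshold and the function names are assumptions.

```python
# Singly charged oxonium (reporter) ions, monoisotopic m/z.
REPORTER_IONS = {
    "Hex": 163.0601,
    "HexNAc": 204.0866,
    "HexHexNAc": 366.1395,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
}

# Neutral monosaccharide residue masses that can be lost from a glycan (Da).
NEUTRAL_LOSSES = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579}


def _has_peak(peaks, target, tol):
    return any(abs(mz - target) <= tol for mz in peaks)


def reporter_hits(peaks, tol=0.02):
    """Names of reporter ions present in one MS/MS peak list (fragment m/z)."""
    return sorted(name for name, mz in REPORTER_IONS.items()
                  if _has_peak(peaks, mz, tol))


def neutral_loss_hits(precursor_mz, charge, peaks, tol=0.02):
    """Losses for which a fragment appears at precursor_mz - loss/charge,
    i.e. an uncharged sugar was lost and the charge state was retained."""
    return sorted(name for name, loss in NEUTRAL_LOSSES.items()
                  if _has_peak(peaks, precursor_mz - loss / charge, tol))


def flag_glycopeptide_spectra(spectra, min_reporters=2, tol=0.02):
    """spectra: iterable of (precursor m/z, charge, fragment m/z list).
    Returns (precursor m/z, charge, reporter names, neutral-loss names)."""
    flagged = []
    for precursor_mz, charge, peaks in spectra:
        reporters = reporter_hits(peaks, tol)
        if len(reporters) >= min_reporters:
            losses = neutral_loss_hits(precursor_mz, charge, peaks, tol)
            flagged.append((precursor_mz, charge, reporters, losses))
    return flagged
```

Dividing the loss by the charge is what places a 162 Da hexose loss at about m/z 81 for a doubly charged and about m/z 54 for a triply charged precursor.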
For example, in a typical QqQ neutral loss experiment, precursor ions are subjected to CID and the third quadrupole is scanned at a fixed m/z offset from the precursor corresponding to the neutral loss of a HexNAc residue (203 Da, i.e. m/z 203 for singly charged precursors) [43, 98]. A neutral loss corresponding to hexose (162 Da), observed as m/z offsets of 81 and 54 for doubly and triply charged species, respectively, can also be detected when collision energies are optimized on an ESI-QTOF instrument [99]. A neutral loss of 146 Da likewise indicates fucosylated glycans, as observed on an LTQ-FT instrument [43]. In summary, complete glycan characterization is extremely difficult in glycoproteomics. At present, researchers mainly focus on identifying glycosylated proteins, their glycosylation sites and the monosaccharide compositions of the attached glycans. These levels of characterization are largely achievable with current techniques and instrumentation. However, as in proteomics, only the most abundant subset of the glycoproteome is detectable with these methods. Higher sensitivity and dynamic range of the analytical techniques, together with better separation, pre-fractionation and enrichment of the sample, might increase the depth of glycoproteome coverage. In addition, the advancement of glycoproteomics depends on the development of bioinformatic tools for the automatic assignment of glycopeptide MS/MS spectra and of robust search engines similar to those available for conventional proteomics.
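The offsets used in neutral loss scans follow directly from the residue mass of the lost monosaccharide divided by the precursor charge, Δ(m/z) = Δm/z. The short sketch below reproduces the numbers quoted in the text: 203 Da for HexNAc, 162 Da for hexose (offsets of about 81 and 54 for 2+ and 3+ precursors) and 146 Da for fucose. It also computes the m/z of the corresponding singly protonated oxonium ions. The residue masses are standard monoisotopic values added here for illustration, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic residue masses (Da) of common monosaccharide residues (assumed standard values).
RESIDUE_MASS = {
    "Hex": 162.0528,     # e.g. mannose, galactose, glucose
    "HexNAc": 203.0794,  # e.g. GlcNAc, GalNAc
    "dHex": 146.0579,    # fucose
    "NeuAc": 291.0954,   # sialic acid
}


def neutral_loss_offset(residue, charge):
    """m/z shift of a precursor with the given charge after losing one uncharged residue."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return RESIDUE_MASS[residue] / charge


def oxonium_mz(*residues):
    """m/z of a singly protonated oxonium ion composed of the given residues."""
    return sum(RESIDUE_MASS[r] for r in residues) + PROTON_MASS


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"Hex loss at {z}+: m/z offset {neutral_loss_offset('Hex', z):.2f}")
    print(f"HexNAc+ oxonium ion: m/z {oxonium_mz('HexNAc'):.2f}")
```

The printed hexose offsets (about 162, 81 and 54) and the HexNAc+ value (about m/z 204.09) agree with the figures in the text. The same two functions cover any other residue added to the table.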
5
The Application of High Throughput Mass Spectrometry
5.4 Conclusions
Glycoproteomics is conceptually based on the large-scale analysis of glycopeptides or, alternatively, of intact glycoproteins. Detailed structural characterization of glycopeptides generates biologically relevant information, but it is technically challenging compared with the structural analysis of released glycans in glycomics. Some of the challenges are associated with mass analysis, for example, the higher analyte mass and the need to generate (and interpret) fragmentation of both the glycan and the peptide moieties. However, the main challenge lies at the sample preparation level, where the heterogeneity of protein glycosylation dictates selective glycopeptide enrichment. Different characteristics have been used to discriminate glycopeptides from non-glycosylated peptides: mass, hydropathy, structure and charge. Naturally, the most distinctive characteristic is the glycan structure; however, structural pattern recognition inherently introduces a bias towards subsets of the glycoproteome, making it unsuited for quantitative experiments. In contrast, hydrophilicity is a common feature of glycopeptides and is consequently an ideal physicochemical parameter for the selective and non-biased enrichment of glycopeptides. Hence, the use of HILIC is expected to increase in the coming years. The development and application of HILIC are expected to be paralleled by the use of other enrichment techniques, in particular lectins. The bias of lectins can be reduced by using lectins with broad specificity or mixtures of lectins. For targeted glycoprotein analysis, the proteolytic enzymes that will generate glycopeptides of appropriate mass can easily be predicted in silico. It is much harder to design universal strategies for complex protein mixtures because the polypeptide sequences around the glycosylation sites vary among glycoproteins.
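As a rough illustration of the in silico prediction mentioned above, the sketch below performs a simplified tryptic digest (cleavage after Lys or Arg unless followed by Pro, no missed cleavages). It locates N-X-S/T sequons (X ≠ Pro) in the full sequence and reports the sequon-containing peptides whose unmodified backbone mass falls within a chosen window. The mass window, the toy sequence and all function names are hypothetical. The glycan mass is deliberately left out, and real tools also model missed cleavages and other proteases. The zero-width split requires Python 3.7 or later.

```python
import re

# Monoisotopic amino-acid residue masses (Da); standard values assumed for this sketch.
AA_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056

# Zero-width lookahead so that overlapping sequons are all reported.
N_SEQUON = re.compile(r"(?=N[^P][ST])")


def tryptic_peptides(seq):
    """Split after K/R unless the next residue is P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of the unmodified peptide backbone."""
    return sum(AA_MASS[aa] for aa in peptide) + WATER_MASS


def glycopeptide_candidates(seq, min_mass=500.0, max_mass=4000.0):
    """Return (peptide, 1-based Asn sequon positions, backbone mass) for sequon-containing peptides."""
    sites = [m.start() for m in N_SEQUON.finditer(seq)]
    candidates = []
    start = 0
    for pep in tryptic_peptides(seq):
        end = start + len(pep)
        pep_sites = [s + 1 for s in sites if start <= s < end]
        mass = peptide_mass(pep)
        if pep_sites and min_mass <= mass <= max_mass:
            candidates.append((pep, pep_sites, round(mass, 2)))
        start = end
    return candidates


if __name__ == "__main__":
    toy_protein = "GANGTEKYNPSLRWNVTDK"  # hypothetical sequence
    for pep, positions, mass in glycopeptide_candidates(toy_protein):
        print(pep, positions, mass)
```

For the toy sequence, the sketch reports GANGTEK (sequon at Asn3) and WNVTDK (sequon at Asn15). It skips YNPSLR because its N-P-S motif has proline at the X position. Changing the cleavage pattern is enough to compare a different enzyme for a targeted analysis.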
Proteinase K and pronase, which generate short glycopeptides around the sequon irrespective of the polypeptide sequence, are options when aiming to develop universal workflows. Technology improvements are continuously increasing the sensitivity, resolution, accuracy and speed of mass spectrometers. However, the performance of modern mass spectrometers already seems to fulfil the requirements of the majority of the reported studies. Thus, the quality of the data obtained depends more on the quality and condition of the sample applied to the mass spectrometer than on the last few percent of MS performance. It is anticipated that improvements in sample preparation (i.e. enrichment and desalting) and in analyte separation techniques (pre-fractionation and on- or off-line LC-MS) will contribute more to the progress of the field than further improvements of the mass spectrometers. Nevertheless, there are recent examples where MS technology innovations have aided carbohydrate analysis significantly, e.g. ETD for the selective fragmentation of the peptide backbone of glycopeptides to facilitate glycosylation site assignment. There is little doubt that MS will continue to be the main analytical tool for glycoproteomics in the coming years. In conclusion, judging by the technical challenges still associated with the analysis of glycopeptides from relatively simple peptide mixtures, there is still a long
way to go before glycoproteomics is performed routinely in non-expert laboratories. Developments in sample preparation and in general workflows are crucial for moving more rapidly in this direction and for acquiring higher-quality data. Improvements in the available bioinformatics tools are likewise required to meet the increasing demand for large-scale glycoproteomics experiments.
References
1. Roseman S (2001) Reflections on glycobiology. J Biol Chem 276:41527–41542
2. Rudd PM, Wormald MR, Dwek RA (2004) Sugar-mediated ligand-receptor interactions in the immune system. Trends Biotechnol 22:524–530
3. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8
4. Hart GW, Housley MP, Slawson C (2007) Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature 446:1017–1022
5. Hwang HY, Olson SK, Esko JD, Horvitz HR (2003) Caenorhabditis elegans early embryogenesis and vulval morphogenesis require chondroitin biosynthesis. Nature 423:439–443
6. Collins BE, Paulson JC (2004) Cell surface biology mediated by low affinity multivalent protein-glycan interactions. Curr Opin Chem Biol 8:617–625
7. Lin X (2004) Functions of heparan sulfate proteoglycans in cell signaling during development. Development 131:6009–6021
8. Lowe JB, Marth JD (2003) A genetic approach to mammalian glycan function. Annu Rev Biochem 72:643–691
9. Dube DH, Bertozzi CR (2005) Glycans in cancer and inflammation – potential for therapeutics and diagnostics. Nat Rev Drug Discov 4:477–488
10. Inatani M, Irie F, Plump AS, Tessier-Lavigne M, Yamaguchi Y (2003) Mammalian brain morphogenesis and midline axon guidance require heparan sulfate. Science 302:1044–1046
11. Kinjo Y et al (2005) Recognition of bacterial glycosphingolipids by natural killer T cells. Nature 434:520–525
12. Casu B, Guerrini M, Torri G (2004) Structural and conformational aspects of the anticoagulant and anti-thrombotic activity of heparin and dermatan sulfate. Curr Pharm Des 10:939–949
13. Guo Y et al (2004) Structural basis for distinct ligand-binding and targeting properties of the receptors DC-SIGN and DC-SIGNR. Nat Struct Mol Biol 11:591–598
14. Liu D, Shriver Z, Venkataraman G, El Shabrawi Y, Sasisekharan R (2002) Tumor cell surface heparan sulfate as cryptic promoters or inhibitors of tumor growth and metastasis. Proc Natl Acad Sci U S A 99:568–573
15. Ludwig JA, Weinstein JN (2005) Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer 5:845–856
16. Anderson NL, Anderson NG (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 1:845–867
17. Jebanathirajah J, Steen H, Roepstorff P (2003) Using optimized collision energies and high resolution, high accuracy fragment ion selection to improve glycopeptide detection by precursor ion scanning. J Am Soc Mass Spectrom 14:777–784
18. Jebanathirajah J, Stensballe H, Jensen A, Roepstorff P (2002) Modification specific proteomics: integrated strategy for glyco and phosphospecific proteomics. Presented at ASMS 2002, Orlando, FL
19. Zhang H, Li XJ, Martin DB, Aebersold R (2003) Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol 21:660–666
20. Khidekel N, Arndt S, Lamarre-Vincent N, Lippert A, Poulin-Kerstien KG, Ramakrishnan B, Qasba PK, Hsieh-Wilson LC (2003) A chemoenzymatic approach toward the rapid and sensitive detection of O-GlcNAc posttranslational modifications. J Am Chem Soc 125:16162–16163
21. Saxon E, Bertozzi CR (2000) Cell surface engineering by a modified Staudinger reaction. Science 287:2007–2010
22. Sprung R, Nandi A, Chen Y, Kim SC, Barma D, Falck JR, Zhao Y (2005) Tagging-via-substrate strategy for probing O-GlcNAc modified proteins. J Proteome Res 4:950–957
23. Khidekel N, Ficarro SB, Peters EC, Hsieh-Wilson LC (2004) Exploring the O-GlcNAc proteome: direct identification of O-GlcNAc-modified proteins from the brain. Proc Natl Acad Sci U S A 101:13132–13137
24. Khidekel N et al (2007) Probing the dynamics of O-GlcNAc glycosylation in the brain using quantitative proteomics. Nat Chem Biol 3:339–348
25. Vocadlo DJ, Hang HC, Kim EJ, Hanover JA, Bertozzi CR (2003) A chemical approach for identifying O-GlcNAc-modified proteins in cells. Proc Natl Acad Sci U S A 100:9116–9121
26. Prescher JA, Dube DH, Bertozzi CR (2004) Chemical remodelling of cell surfaces in living animals. Nature 430:873–877
27. Kho Y et al (2004) A tagging-via-substrate technology for detection and proteomics of farnesylated proteins. Proc Natl Acad Sci U S A 101:12479–12484
28. Zhao Y, Kwon SW, Anselmo A, Kaur K, White MA (2004) Broad spectrum identification of cellular small ubiquitin-related modifier (SUMO) substrate proteins. J Biol Chem 279:20999–21002
29. Hagglund P, Bunkenborg J, Elortza F, Jensen ON, Roepstorff P (2004) A new strategy for identification of N-glycosylated proteins and unambiguous assignment of their glycosylation sites using HILIC enrichment and partial deglycosylation. J Proteome Res 3:556–566
30. Carr SA, Roberts GD (1986) Carbohydrate mapping by mass spectrometry: a novel method for identifying attachment sites of Asn-linked sugars in glycoproteins. Anal Biochem 157:396–406
31. Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, White FM (2005) Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics 4:1240–1250
32. Pan S, Zhang H, Rush J, Eng J, Zhang N, Patterson D, Comb MJ, Aebersold R (2005) High throughput proteome screening for biomarker detection. Mol Cell Proteomics 4:182–190
33. Wollscheid B, Bausch-Fluck D, Henderson C, O'Brien R, Bibel M, Schiess R, Aebersold R, Watts JD (2009) Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nat Biotechnol 27:378–386
34. Kobata A, Endo T (1992) Immobilized lectin columns: useful tools for the fractionation and structural analysis of oligosaccharides. J Chromatogr 597:111–122
35. Harada H, Kamei M, Tokumoto Y, Yui S, Koyama F, Kochibe N, Endo T, Kobata A (1987) Systematic fractionation of oligosaccharides of human immunoglobulin G by serial affinity chromatography on immobilized lectin columns. Anal Biochem 164:374–381
36. Cummings RD, Kornfeld S (1982) Characterization of the structural determinants required for the high affinity interaction of asparagine-linked oligosaccharides with immobilized Phaseolus vulgaris leukoagglutinating and erythroagglutinating lectins. J Biol Chem 257:11230–11234
37. Hirabayashi J (2004) Lectin-based structural glycomics: glycoproteomics and glycan profiling. Glycoconj J 21:35–40
38. Kaji H et al (2003) Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins. Nat Biotechnol 21:667–672
39. Kaji H, Yamauchi Y, Takahashi N, Isobe T (2006) Mass spectrometric identification of N-linked glycopeptides using lectin-mediated affinity capture and glycosylation site-specific stable isotope tagging. Nat Protoc 1:3019–3027
40. Jung K, Cho W, Regnier FE (2009) Glycoproteomics of plasma based on narrow selectivity lectin affinity chromatography. J Proteome Res 8:643–650
41. Kubota K et al (2008) Analysis of glycopeptides using lectin affinity chromatography with MALDI-TOF mass spectrometry. Anal Chem 80:3693–3698
42. Gallagher JT, Morris A, Dexter TM (1985) Identification of two binding sites for wheat-germ agglutinin on polylactosamine-type oligosaccharides. Biochem J 231:115–122
43. Jia W et al (2009) A strategy for precise and large scale identification of core fucosylated glycoproteins. Mol Cell Proteomics 8:913–923
44. Hirabayashi J, Hayama K, Kaji H, Isobe T, Kasai K (2002) Affinity capturing and gene assignment of soluble glycoproteins produced by the nematode Caenorhabditis elegans. J Biochem 132:103–114
45. Hemstrom P, Irgum K (2006) Hydrophilic interaction chromatography. J Sep Sci 29:1784–1821
46. Thaysen-Andersen M, Thogersen IB, Nielsen HJ, Lademann U, Brunner N, Enghild JJ, Hojrup P (2007) Rapid and individual-specific glycoprofiling of the low abundance N-glycosylated protein tissue inhibitor of metalloproteinases-1. Mol Cell Proteomics 6:638–647
47. Thaysen-Andersen M et al (2008) Investigating the biomarker potential of glycoproteins using comparative glycoprofiling – application to tissue inhibitor of metalloproteinases-1. Biochim Biophys Acta 1784:455–463
48. Thaysen-Andersen M, Mysling S, Hojrup P (2009) Site-specific glycoprofiling of N-linked glycopeptides using MALDI-TOF MS: strong correlation between signal strength and glycoform quantities. Anal Chem 81:3933–3943
49. Zhang J, Wang DI (1998) Quantitative analysis and process monitoring of site-specific glycosylation microheterogeneity in recombinant human interferon-gamma from Chinese hamster ovary cell culture by hydrophilic interaction chromatography. J Chromatogr B Biomed Sci Appl 712:73–82
50. Takegawa Y, Deguchi K, Ito H, Keira T, Nakagawa H, Nishimura SI (2006) Simple separation of isomeric sialylated N-glycopeptides by a zwitterionic type of hydrophilic interaction chromatography. J Sep Sci 29:2533–2540
51. Takegawa Y, Deguchi K, Keira T, Ito H, Nakagawa H, Nishimura S (2006) Separation of isomeric 2-aminopyridine derivatized N-glycans and N-glycopeptides of human serum immunoglobulin G by using a zwitterionic type of hydrophilic-interaction chromatography. J Chromatogr A 1113:177–181
52. Takegawa Y, Ito H, Keira T, Deguchi K, Nakagawa H, Nishimura S (2008) Profiling of N- and O-glycopeptides of erythropoietin by capillary zwitterionic type of hydrophilic interaction chromatography/electrospray ionization mass spectrometry. J Sep Sci 31:1585–1593
53. Wuhrer M, Koeleman CA, Hokke CH, Deelder AM (2005) Protein glycosylation analyzed by normal-phase nano-liquid chromatography – mass spectrometry of glycopeptides. Anal Chem 77:886–894
54. Wuhrer M, de Boer AR, Deelder AM (2009) Structural glycomics using hydrophilic interaction chromatography (HILIC) with mass spectrometry. Mass Spectrom Rev 28:192–206
55. Packer NH, Lawson MA, Jardine DR, Redmond JW (1998) A general approach to desalting oligosaccharides released from glycoproteins. Glycoconj J 15:737–747
56. Itoh S, Kawasaki N, Ohta M, Hyuga M, Hyuga S, Hayakawa T (2002) Simultaneous microanalysis of N-linked oligosaccharides in a glycoprotein using microbore graphitized carbon column liquid chromatography-mass spectrometry. J Chromatogr A 968:89–100
57. Alley WR Jr, Mechref Y, Novotny MV (2009) Use of activated graphitized carbon chips for liquid chromatography/mass spectrometric and tandem mass spectrometric analysis of tryptic glycopeptides. Rapid Commun Mass Spectrom 23:495–505
58. Larsen MR, Hojrup P, Roepstorff P (2005) Characterization of gel-separated glycoproteins using two-step proteolytic digestion combined with sequential microcolumns and mass spectrometry. Mol Cell Proteomics 4:107–119
59. Sparbier K, Koch S, Kessler I, Wenzel T, Kostrzewa M (2005) Selective isolation of glycoproteins and glycopeptides for MALDI-TOF MS detection supported by magnetic particles. J Biomol Tech 16:407–413
60. Sparbier K, Wenzel T, Kostrzewa M (2006) Exploring the binding profiles of ConA, boronic acid and WGA by MALDI-TOF/TOF MS and magnetic particles. J Chromatogr B Analyt Technol Biomed Life Sci 840:29–36
61. Zhang L, Xu Y, Yao H, Xie L, Yao J, Lu H, Yang P (2009) Boronic acid functionalized core-satellite composite nanoparticles for advanced enrichment of glycopeptides and glycoproteins. Chemistry 15:10158–10166
62. Larsen MR, Jensen SS, Jakobsen LA, Heegaard NHH (2007) Exploring the sialiome using titanium dioxide chromatography and mass spectrometry. Mol Cell Proteomics 6:1778–1787
63. Lewandrowski U, Zahedi RP, Moebius J, Walter U, Sickmann A (2007) Enhanced N-glycosylation site analysis of sialoglycopeptides by strong cation exchange prefractionation applied to platelet plasma membranes. Mol Cell Proteomics 6:1933–1941
64. Alvarez-Manilla G, Atwood J 3rd, Guo Y, Warren NL, Orlando R, Pierce M (2006) Tools for glycoproteomic analysis: size exclusion chromatography facilitates identification of tryptic glycopeptides with N-linked glycosylation sites. J Proteome Res 5:701–708
65. Morris HR (1980) Biomolecular structure determination by mass spectrometry. Nature 286:447–452
66. Barber M, Bordoli RS, Sedgwick RD, Tyler AN, Bycroft BW (1981) Fast atom bombardment mass spectrometry of bleomycin A2 and B2 and their metal complexes. Biochem Biophys Res Commun 101:632–638
67. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM (1989) Electrospray ionization for mass spectrometry of large biomolecules. Science 246:64–71
68. Lane CS (2005) Mass spectrometry-based proteomics in the life sciences. Cell Mol Life Sci 62:848–869
69. Papac DI, Wong A, Jones AJ (1996) Analysis of acidic oligosaccharides and glycopeptides by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Anal Chem 68:3215–3223
70. de Hoffmann E, Stroobant V (2007) Mass spectrometry principles and applications. John Wiley & Sons Ltd, New York
71. Schaeffer-Reiss C (2008) A brief summary of the different types of mass spectrometers used in proteomics. Methods Mol Biol 484:3–16
72. Roepstorff P, Fohlman J (1984) Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11:601
73. Medzihradszky KF, Gillece-Castro BL, Settineri CA, Townsend RR, Masiarz FR, Burlingame AL (1990) Structure determination of O-linked glycopeptides by tandem mass spectrometry. Biomed Environ Mass Spectrom 19:777–781
74. Hirayama K, Yuji R, Yamada N, Kato K, Arata Y, Shimada I (1998) Complete and rapid peptide and glycopeptide mapping of mouse monoclonal antibody by LC/MS/MS using ion trap mass spectrometry. Anal Chem 70:2718–2725
75. Bateman KP, White RL, Thibault P (1998) Evaluation of adsorption preconcentration/capillary zone electrophoresis/nanoelectrospray mass spectrometry for peptide and glycoprotein analyses. J Mass Spectrom 33:1109–1123
76. Hui JP, White TC, Thibault P (2002) Identification of glycan structure and glycosylation sites in cellobiohydrolase II and endoglucanases I and II from Trichoderma reesei. Glycobiology 12:837–849
77. Zeng R, Xu Q, Shao XX, Wang KY, Xia QC (1999) Characterization and analysis of a novel glycoprotein from snake venom using liquid chromatography-electrospray mass spectrometry and Edman degradation. Eur J Biochem 266:352–358
78. Zhu X, Borchers C, Bienstock RJ, Tomer KB (2000) Mass spectrometric characterization of the glycosylation pattern of HIV-gp120 expressed in CHO cells. Biochemistry 39:11194–11204
79. Macek B, Hofsteenge J, Peter-Katalinic J (2001) Direct determination of glycosylation sites in O-fucosylated glycopeptides using nano-electrospray quadrupole time-of-flight mass spectrometry. Rapid Commun Mass Spectrom 15:771–777
80. McLafferty FW, Horn DM, Breuker K, Ge Y, Lewis MA, Cerda B, Zubarev RA, Carpenter BK (2001) Electron capture dissociation of gaseous multiply charged ions by Fourier-transform ion cyclotron resonance. J Am Soc Mass Spectrom 12:245–249
81. Zubarev RA, Horn DM, Fridriksson EK, Kelleher NL, Kruger NA, Lewis MA, Carpenter BK, McLafferty FW (2000) Electron capture dissociation for structural characterization of multiply charged protein cations. Anal Chem 72:563–573
82. Hakansson K, Cooper HJ, Emmett MR, Costello CE, Marshall AG, Nilsson CL (2001) Electron capture dissociation and infrared multiphoton dissociation MS/MS of an N-glycosylated tryptic peptide to yield complementary sequence information. Anal Chem 73:4530–4536
83. Mirgorodskaya E, Roepstorff P, Zubarev RA (1999) Localization of O-glycosylation sites in peptides by electron capture dissociation in a Fourier transform mass spectrometer. Anal Chem 71:4431–4436
84. Hakansson K, Emmett MR, Hendrickson CL, Marshall AG (2001) High-sensitivity electron capture dissociation tandem FTICR mass spectrometry of microelectrosprayed peptides. Anal Chem 73:3605–3610
85. Hogan JM, Pitteri SJ, Chrisman PA, McLuckey SA (2005) Complementary structural information from a tryptic N-linked glycopeptide via electron transfer ion/ion reactions and collision-induced dissociation. J Proteome Res 4:628–632
86. Wuhrer M, Catalina MI, Deelder AM, Hokke CH (2007) Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B Analyt Technol Biomed Life Sci 849:115–128
87. Catalina MI, Koeleman CA, Deelder AM, Wuhrer M (2007) Electron transfer dissociation of N-glycopeptides: loss of the entire N-glycosylated asparagine side chain. Rapid Commun Mass Spectrom 21:1053–1061
88. Wiesner J, Premsler T, Sickmann A (2008) Application of electron transfer dissociation (ETD) for the analysis of posttranslational modifications. Proteomics 8:4466–4483
89. Good DM, Wirtala M, McAlister GC, Coon JJ (2007) Performance characteristics of electron transfer dissociation mass spectrometry. Mol Cell Proteomics 6:1942–1951
90. Picariello G, Ferranti P, Mamone G, Roepstorff P, Addeo F (2008) Identification of N-linked glycoproteins in human milk by hydrophilic interaction liquid chromatography and mass spectrometry. Proteomics 8:3833–3847
91. Gonzalez J, Takao T, Hori H, Besada V, Rodriguez R, Padron G, Shimonishi Y (1992) A method for determination of N-glycosylation sites in glycoproteins by collision-induced dissociation analysis in fast atom bombardment mass spectrometry: identification of the positions of carbohydrate-linked asparagine in recombinant alpha-amylase by treatment with peptide-N-glycosidase F in 18O-labeled water. Anal Biochem 205:151–158
92. Hagglund P, Matthiesen R, Elortza F, Hojrup P, Roepstorff P, Jensen ON, Bunkenborg J (2007) An enzymatic deglycosylation scheme enabling identification of core fucosylated N-glycans and O-glycosylation site mapping of human plasma proteins. J Proteome Res 6:3021–3031
93. Carr SA, Huddleston MJ, Bean MF (1993) Selective identification and differentiation of N- and O-linked oligosaccharides in glycoproteins by liquid chromatography-mass spectrometry. Protein Sci 2:183–196
94. Huddleston MJ, Bean MF, Carr SA (1993) Collisional fragmentation of glycopeptides by electrospray ionization LC/MS and LC/MS/MS: methods for selective detection of glycopeptides in protein digests. Anal Chem 65:877–884
95. Annan RS, Carr SA (1997) The essential role of mass spectrometry in characterizing protein structure: mapping posttranslational modifications. J Protein Chem 16:391–402
96. Sullivan B, Addona TA, Carr SA (2004) Selective detection of glycopeptides on ion trap mass spectrometers. Anal Chem 76:3112–3118
97. Adamson JT, Hakansson K (2006) Infrared multiphoton dissociation and electron capture dissociation of high-mannose type glycopeptides. J Proteome Res 5:493–501
98. Jiang H, Desaire H, Butnev VY, Bousfield GR (2004) Glycoprotein profiling by electrospray mass spectrometry. J Am Soc Mass Spectrom 15:750–758
99. Gadgil HS, Bondarenko PV, Treuheit MJ, Ren D (2007) Screening and sequencing of glycated proteins by neutral loss scan LC/MS/MS method. Anal Chem 79:5991–5999
Chapter 6
Solutions to the Glycosylation Problem for Low- and High-Throughput Structural Glycoproteomics
Simon J. Davis and Max Crispin
Abstract N- and O-glycosylation profoundly affect the biological properties of glycoproteins, principally by influencing their structures and cellular trafficking, and by forming the recognition sites of carbohydrate-binding ligands. For crystallographers interested in studying the protein component of glycoproteins, the two most important aspects of glycosylation are (1) that it is often essential for the correct folding of a given protein and for ensuring its solubility, which generally necessitates expression of the molecule in eukaryotic cells, and (2) that there are now procedures for the efficient post-folding removal of N-linked glycans from glycoproteins and for minimizing the effects of O-glycosylation, which will generally benefit crystallogenesis. We provide an overview of how glycans influence glycoprotein folding and then identify the sources of structural heterogeneity at the heart of the ‘glycosylation problem’. We then discuss the options available to structural biologists for circumventing the problems associated with protein N- and O-glycosylation. Our emphasis is on methods for producing glycoproteins with homogeneous and/or removable N-glycosylation in mammalian cells that can be implemented both in very high-yield, stable expression systems and in a high throughput format based on transient expression protocols. We also consider whether deglycosylation reduces protein stability and end by emphasizing the importance of using rigorous stereochemical and biosynthetic data when building glycosylation into partial or complete electron density.
Keywords Glycosylation · Endoglycosidases · Mammalian expression systems · Structural biology · High throughput
S.J. Davis (✉) Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK e-mail: [email protected]
R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_6, © Springer Science+Business Media B.V. 2011
S.J. Davis and M. Crispin
Abbreviations
CHO  Chinese hamster ovary
CNX  calnexin
CRT  calreticulin
Endo  endoglycosidase
ER  endoplasmic reticulum
GalNAc  α-N-acetylgalactosamine
GlcNAc  N-acetylglucosamine
GnT I  β1-2 N-acetylglucosamine transferase I
HEK  human embryonic kidney
IL  interleukin
IgSF  immunoglobulin superfamily
MALDI-TOF MS  matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
NB-DNJ  N-butyldeoxynojirimycin
PDB  Protein Data Bank
PNGase F  peptide-N-glycosidase F
s  soluble
S2  Schneider 2
SG  structural genomics
STP  serine-, threonine- and proline-rich
TCR  T-cell receptor
UGGT  UDP-glucose glycoprotein:glucosyltransferase
Contents
6.1 Introduction
6.2 Glycosylation and Protein Folding
6.3 The “Glycosylation Problem”
6.3.1 O-Glycosylation
6.3.2 N-Glycosylation
6.4 Solving the Glycosylation Problem
6.4.1 Shielding N-Linked Glycans from Lattice Contacts
6.4.2 Depletion of Glycosylation by Sequon Mutation
6.4.3 Deglycosylation with peptide-N-glycosidase
6.4.4 Exoglycosidase Treatment
6.4.5 N-Glycan Remodeling and Endoglycosidase Treatment
6.5 Does Glycan Removal Affect Protein Function?
6.6 Putting the Sugar Back
6.7 Conclusions
References
6
Solutions to the Glycosylation Problem
6.1 Introduction
The surfaces of eukaryotic cells are covered by a diverse array of complex carbohydrates that mediate a range of biological functions. These covalent posttranslational modifications can prevent the formation of the crystals needed for the structural analysis of glycoproteins by X-ray diffraction, because the crystallization of macromolecules is generally hindered by structural and conformational disorder. Even small regions of high mobility in protein loops can prevent the identification of crystallization conditions [1]. In the case of glycoproteins, this heterogeneity derives largely from the carbohydrate surrounding the protein core. For domains of up to ∼100 amino acids, the attached carbohydrate groups, or glycans, can occupy a volume equivalent to that of the protein domain itself. The barrier to crystallization imposed by glycosylation applies to many biologically important surface-expressed proteins: virus envelopes, membrane transport proteins, and cell-signaling receptors, including those involved in the neurological and immunological synapses. Indeed, glycosylation is a posttranslational modification affecting over half of all proteins [2] and can be essential for correct protein folding and stability. The problem of glycosylation is almost certainly among the major reasons why membrane proteins constitute less than 1% of the structures in the Protein Data Bank (PDB) [3]. As quantitative analysis of the performance of structural genomics (SG) pipelines has now confirmed, the key hurdles to obtaining a new protein structure generally occur at two steps: between the expression and the purification of a protein, and between obtaining the purified protein and crystallizing it [4].
Virtually all of the glycoproteins of higher eukaryotes, and particularly those of medical interest, are co-translationally translocated into the lumen of the endoplasmic reticulum (ER), an environment that favours disulphide bond formation where this is required, and are thereafter directed to the secretory pathway or expressed at the cell surface. Heterologous protein production in mammalian expression systems has the advantage that such proteins are able to fold under largely native conditions and can be purified from the extracellular milieu if they can be expressed in soluble forms. Most proteins in this class would otherwise have to be refolded from denatured precipitates produced in bacteria. As we will discuss, a number of methods have become established for the post-folding removal of glycosylation from recombinant glycoproteins expressed stably in mammalian cells [5–7]. With the advent of structural genomics, however, the need arose for parallel methods that could be implemented in a high throughput setting, with the expectation that this might help relieve both of the bottlenecks at the protein expression and crystallization stages of structure determination experienced in SG pipelines. A little-heralded advantage of the SG approach is that, in addition to allowing the rapid analysis of proteins that can be expressed and crystallized readily, the so-called “low-hanging fruit” [8], the opportunities afforded by robotics and miniaturization have made it possible to address very difficult cases using a variety of approaches in parallel. A recent example from our laboratories involves a large, very heavily glycosylated cell-surface protein for which it has been necessary to prepare
S.J. Davis and M. Crispin
>50 different forms of the protein in order to obtain well-diffracting crystals (VT Chang, AR Aricescu, EY Jones and SJD, unpublished data). Taking advantage of these opportunities necessitated the development of approaches based on transient rather than stable protein expression [9]. The development of tools for producing deglycosylatable protein in transient mammalian expression systems coincided with great improvements in the yields, efficiency and scalability of transient expression protocols. Due to the advent of new episomal expression vectors, transfection protocols and multiplex, deep-well tissue culture methods, the scalability of transient expression in mammalian cells is now comparable to that of high throughput bacterial systems [10–13]. We present an overview of the problems posed by the glycosylation of cell surface and secreted proteins for structural analysis, drawing on our own experience of them, and describe the development of methods for circumventing these issues in both a stable expression context and, more recently, in transient expression systems. We start with a discussion of the impact of glycosylation on protein folding.
6.2 Glycosylation and Protein Folding
There are two principal modes of glycosylation in eukaryotic proteins: glycans are attached either via Asn residues or via Ser and Thr residues. These processes are termed N- and O-linked glycosylation, respectively. N-linked glycosylation occurs in the ER and has a major role in early protein folding. O-linked glycans are attached in the Golgi apparatus and therefore have little if any bearing on glycoprotein folding, except insofar as they influence the overall shape and dimensions of a molecule. O-linked glycosylation tends to cluster in regions rich in serine, threonine and proline. In contrast, the sites of potential N-linked glycosylation are well defined by a consensus motif. N-linked glycosylation occurs co-translationally, after the oligosaccharyltransferase complex recognizes a glycosylation sequon. The sequon is defined as Asn-X-Ser/Thr/Cys-X, where X is generally any amino acid except proline [14, 15]. The actual sequence has a bearing on the occupancy of a glycosylation sequon. For example, occupancy of the Asn-X-Cys sequon in the human epidermal growth factor receptor is known to be very low [15]. N-linked glycosylation is closely linked to protein folding and secretion [16]. At one level, glycans have a direct structural role by forming intramolecular glycan-protein interactions. Such stabilising interactions often shield hydrophobic residues on the protein surface. They also have important consequences for the conformational and chemical structure of the glycan at a given site, which can influence the design of crystallization strategies. However, glycosylation also plays a central role in folding per se, insofar as it mediates the interaction of nascent glycoproteins with lectin-type chaperones. The chaperones of eukaryotic cells are generally highly conserved, which allows many, but not all, glycoproteins from higher eukaryotes to be expressed in lower-eukaryotic expression systems.
Exceptions known to the authors are members of the immunoglobulin superfamily, such as CD48 and LFA-3, which
6
Solutions to the Glycosylation Problem
could be expressed at high levels but failed to fold detectably in Pichia pastoris (SJD, unpublished data). The translation of nascent glycoproteins and their translocation into the ER are coupled through the ribosome-Sec61 complex. Cryo-electron microscopy analysis of translating versus non-translating ribosomes suggests that nascent polypeptide chains start to fold on the ribosome [17]. N-linked glycosylation occurs via the co-translational transfer of Glc3Man9GlcNAc2 to the asparagine residue of the sequon. Completion of the folding process can take considerable time. For example, initial folding of the HIV-1 envelope glycoprotein gp160 is achieved within approximately five minutes. The glycoprotein nevertheless remains in the ER for several hours while it undergoes further folding, including disulphide bond isomerisation and folding-dependent leader sequence cleavage [18]. A nascent, incompletely folded protein is able to recruit numerous chaperones. For example, thyroglobulin folds in complex with BiP/Hsp70, which is also associated with Grp94, Grp170, ERp72 and protein disulphide isomerase [19]. The N-linked glycans of nascent glycoproteins recruit the lectin-type chaperones calnexin (CNX) and calreticulin (CRT), which are capable of recruiting additional chaperones such as ERp57. Following glycosylation by the oligosaccharyltransferase, the Glcα1-2Glcα1-3Glcα1-3 cap of the glycans is hydrolysed by α-glucosidases I and II, which act on the α1-2 and α1-3 linked groups, respectively. This leaves a monoglucosylated structure that constitutes the recognition signal for both the membrane-bound CNX and the soluble CRT. CNX is closely associated with the translocon. This is demonstrated by the rapid binding of hemagglutinin to CNX after as few as thirty amino-acid residues have been translocated [20]. The central role of CNX and CRT in glycoprotein folding is highlighted by the increased expression of these chaperones by B cells in preparation for increased antibody expression [21].
Glucosidase II can liberate the glycoprotein from the CNX/CRT complexes, allowing fully folded glycoproteins to exit the ER. However, incompletely folded glycoproteins may be retained for further lectin-mediated folding events. This happens when UDP-glucose:glycoprotein glucosyltransferase (UGGT) reglucosylates them at N-linked glycosylation sites distal to the misfolded site [22]. UGGT determines the folding status of a glycoprotein by sensing hydrophobic regions that are exposed in misfolded glycoproteins but normally buried in the native fold [23]. The ER-associated degradation pathway eliminates irretrievably misfolded glycoproteins. Glycoproteins unable to complete folding carry Glc1Man8GlcNAc2, which is recognized by ER degradation-enhancing α-mannosidase-like protein (EDEM). EDEM then facilitates transport of the glycoprotein out of the ER for degradation. Fully folded glycoproteins cannot be reglucosylated by UGGT and therefore exit the ER and traffic to the Golgi apparatus. In some cells, endomannosidase shunt pathways can rescue glycoproteins trapped in the CNX/CRT cycle. Whilst the majority of mammalian cells contain active endomannosidase, Chinese hamster ovary (CHO) cells appear to belong to a small subset of cells completely lacking this activity. In exceptional circumstances, monoglucosylated glycans can be retained in fully folded glycoproteins when a glycosylation site is substantially protected by the protein [24, 25].
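The glucose-trimming and reglucosylation cycle described above can be summarised as a small state machine. The Python sketch below is a toy model that rests on three assumptions:

- The only state tracked is the number of glucose residues.
- Folding is represented by a hypothetical number of CNX/CRT binding rounds (`rounds_to_fold`).
- The hand-off to ER-associated degradation is reduced to a hypothetical cap on binding rounds (`max_rounds`), not the mannose-trimming signal described in the text.

All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class NascentGlycan:
    glucoses: int = 3                        # Glc3Man9GlcNAc2 as transferred by the OST
    log: list = field(default_factory=list)


def cnx_crt_cycle(glycan, rounds_to_fold, max_rounds=5):
    """Toy CNX/CRT cycle; returns 'Golgi' or 'ERAD'.

    rounds_to_fold: hypothetical number of chaperone-binding rounds needed to
    reach the native fold (None means the protein never folds).
    max_rounds: hypothetical cap standing in for the ERAD decision.
    """
    glycan.glucoses -= 1
    glycan.log.append("glucosidase I: Glc2")        # removes the alpha1-2 glucose
    glycan.glucoses -= 1
    glycan.log.append("glucosidase II: Glc1")       # removes the first alpha1-3 glucose
    rounds = 0
    while True:
        assert glycan.glucoses == 1                  # monoglucosylated = CNX/CRT signal
        glycan.log.append("bound by CNX/CRT")
        rounds += 1
        folded = rounds_to_fold is not None and rounds >= rounds_to_fold
        glycan.glucoses -= 1
        glycan.log.append("glucosidase II: Glc0, released")
        if folded:
            return "Golgi"                           # UGGT ignores native proteins
        if rounds >= max_rounds:
            return "ERAD"                            # treated as irretrievably misfolded
        glycan.glucoses += 1
        glycan.log.append("UGGT: Glc1")              # exposed hydrophobic patches sensed


if __name__ == "__main__":
    print(cnx_crt_cycle(NascentGlycan(), rounds_to_fold=1))
    print(cnx_crt_cycle(NascentGlycan(), rounds_to_fold=None))
```

With `rounds_to_fold=1` the protein takes the direct route to the Golgi. With `rounds_to_fold=None` the model shows repeated UGGT-driven rebinding until the assumed cap is reached.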
6.3 The “Glycosylation Problem”
Chemical and conformational heterogeneity has relatively little impact on solution nuclear magnetic resonance-based experiments. It is, however, anathema to crystallography because it prevents the formation of reproducible lattice contacts and therefore of crystals. Strategies have emerged to promote crystallogenesis by either shielding or reducing surface disorder. Antibody fragments can be used in co-crystallization trials in the hope that they might dominate lattice interactions [26]. Alternatively, the protein can be engineered to eliminate flexible regions. Point mutations can also be introduced to convert highly mobile side-chains, such as that of lysine, to more structurally ordered ones [27]. Similarly, surface entropy can be reduced, and crystallogenesis promoted, by the reductive methylation of lysines [28]. Reductive methylation rescued 7% of a test set of 360 targets that were otherwise resistant to crystallization [29]. The extreme heterogeneity of the native glycans that often coat the surface of glycoproteins constitutes the “glycosylation problem”. The approaches taken to minimize its impact on crystallization have parallels with the entropy reduction techniques applied to lysine. However, carbohydrate generally has a much greater impact on the protein surface than lysine does. Each N-glycan chain can potentially be several thousand Daltons in mass, exhibit tremendous internal and global flexibility, and occlude a large percentage of the total surface area. Furthermore, each glycosylation site can carry several hundred chemically distinct oligosaccharide structures, termed “glycoforms”, especially in glycoproteins from mammalian expression sources. Finally, these glycoforms may differ not only in mass and structure, but also in charge.
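The neutral monoisotopic mass of a free N-glycan can be estimated from its monosaccharide composition, which puts “several thousand Daltons” into numbers. The mass is the sum of the residue masses plus one water. The Python sketch below assumes standard monoisotopic residue masses for:

- hexose (Hex: Man, Glc, Gal)
- N-acetylhexosamine (HexNAc: GlcNAc, GalNAc)
- deoxyhexose (dHex: Fuc)
- pentose (Pent: Xyl)
- N-acetylneuraminic acid (NeuAc)

The function name `glycan_mass` is hypothetical. The values ignore the adduct ions (for example [M+Na]+) actually observed in MALDI-TOF spectra such as that in Fig. 6.1a.

```python
# Monoisotopic residue masses in Da (free monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.05282,     # mannose, glucose, galactose
    "HexNAc": 203.07937,  # GlcNAc, GalNAc
    "dHex": 146.05791,    # fucose
    "Pent": 132.04226,    # xylose (plant glycans)
    "NeuAc": 291.09542,   # sialic acid
}
WATER = 18.01056          # added once for the free, reducing-end glycan


def glycan_mass(**composition):
    """Neutral monoisotopic mass of a free glycan, e.g. glycan_mass(Hex=9, HexNAc=2)."""
    unknown = set(composition) - set(RESIDUE_MASS)
    if unknown:
        raise KeyError(f"no residue mass for {sorted(unknown)}")
    return WATER + sum(RESIDUE_MASS[r] * n for r, n in composition.items())


if __name__ == "__main__":
    examples = {
        "Man3GlcNAc2Fuc (insect paucimannose)": dict(Hex=3, HexNAc=2, dHex=1),
        "Man5GlcNAc2 (oligomannose)": dict(Hex=5, HexNAc=2),
        "Man9GlcNAc2 (oligomannose)": dict(Hex=9, HexNAc=2),
        "core-fucosylated disialylated biantennary": dict(Hex=5, HexNAc=4, dHex=1, NeuAc=2),
        "core-fucosylated tetrasialylated tetraantennary": dict(Hex=7, HexNAc=6, dHex=1, NeuAc=4),
    }
    for name, comp in examples.items():
        print(f"{name}: {glycan_mass(**comp):.2f} Da")
```

The range runs from about 1.06 kDa for the fucosylated insect paucimannose core to about 3.68 kDa for a fully sialylated tetraantennary structure. A single site can therefore carry glycoforms differing by more than 2.5 kDa.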
6.3.1 O-Glycosylation
The crystallographer interested in glycoproteins will generally have to contend at some point with both N- and O-linked glycans. O-linked glycosylation occurs in the Golgi via the sequential addition of monosaccharides to serine or threonine residues. These residues are clustered in serine-, threonine- and proline-rich (STP) or “mucin-like” domains [30]. O-linked glycosylation occurring outside STP domains cannot presently be predicted accurately and is not specifically addressed in this chapter. In higher eukaryotic expression systems, O-glycosylation is initiated by the transfer of α-N-acetylgalactosamine (GalNAc) to Ser or Thr residues. This residue is subsequently elaborated by a wide range of glycosyltransferases, leading to tremendous glycan heterogeneity. Some STP domains serve only to project glycoproteins such as CD8 [31] and CD55 [32] from the cell surface, and these are usually deleted from expression constructs for crystallography. The high proline content of STP domains makes them readily identifiable by disorder prediction algorithms, such as RONN [33], and by the O-link prediction server, NetOGlyc [30]. However, the structure of C-cadherin confirms that O-linked glycosylation can also occur outside classic disordered STP domains, in domains with proline-rich loops [34]. This type
of intradomain O-glycosylation is, however, rare compared to STP O-linked glycosylation [30], and negligible compared to N-linked glycosylation [35]. For the most part, therefore, the “glycosylation problem” is restricted to N-linked glycans. In rare cases, STP domains cannot simply be removed from expression constructs, for example when they are required to obtain usable levels of expression. They then pose significant challenges to crystallization, since they are likely to form extended, somewhat flexible structures [31]. To overcome such problems, Leahy et al. [36] crystallized the globular immunoglobulin domains of CD8αα after post-purification treatment of the STP domain with neuraminidase, core 1 O-glycanase and Staphylococcal V8 protease. Such a strategy is likely to be effective only for the subset of STP domains carrying sialylated core 1 structures (Galβ1−3GalNAc).
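STP domains are recognisable chiefly by their composition. A crude first pass over a construct can therefore flag Ser/Thr/Pro-rich stretches before they are submitted to dedicated tools such as RONN or NetOGlyc. The Python sketch below is only such a heuristic, not either of those predictors. The function name `stp_regions`, the 20-residue window and the 60% Ser/Thr/Pro threshold are arbitrary assumptions chosen for illustration.

```python
def stp_regions(seq, window=20, threshold=0.6):
    """Flag Ser/Thr/Pro-rich stretches with a sliding window.

    Returns merged 0-based, half-open (start, end) intervals covering every
    window whose S+T+P fraction is >= threshold. Window size and threshold
    are illustrative assumptions, not validated parameters.
    """
    seq = seq.upper()
    regions = []
    for i in range(len(seq) - window + 1):
        win = seq[i:i + window]
        frac = sum(win.count(aa) for aa in "STP") / window
        if frac >= threshold:
            if regions and i <= regions[-1][1]:
                # Overlapping or adjacent window: extend the current region.
                regions[-1] = (regions[-1][0], i + window)
            else:
                regions.append((i, i + window))
    return regions


if __name__ == "__main__":
    # Hypothetical construct: a globular segment, a 30-residue STP stretch,
    # then another globular segment.
    construct = "A" * 30 + "STP" * 10 + "A" * 30
    print(stp_regions(construct))
```

Flagged intervals extend somewhat beyond the true STP stretch, because any window that overlaps it sufficiently is counted. Construct boundaries would still need to be set from the sequence itself and from the dedicated predictors.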
6.3.2 N-Glycosylation
An important consideration in tackling the glycosylation problem is that the different eukaryotic expression systems available to crystallographers exhibit different N-glycosylation profiles. These systems include mammalian, insect and yeast expression systems. All eukaryotic expression systems share a highly conserved N-linked glycosylation pathway in the ER and early Golgi apparatus, presumably reflecting the important role of glycosylation in protein folding. Understanding the conserved features of the enzymology of glycosylation, and the key differences between the common expression systems, may guide approaches to remodeling glycosylation to promote crystallogenesis. It can also assist accurate model building and interpretation of electron density in the absence of detailed site-specific chemical characterization of the carbohydrate moiety [37].
6.3.2.1 Chemical Heterogeneity of N-Linked Glycans
Higher eukaryotic expression systems are characterized by the significant chemical diversity of their glycans (Fig. 6.1a). Following folding, oligomannose-type glycans in yeast are extended by mannosyltransferases to form high-mannose N-linked glycans. In insect and mammalian expression systems, the terminal α1−2 mannose residues are instead trimmed by ER α-mannosidase I and Golgi α-mannosidases IA, IB and IC to give Man5GlcNAc2 [38–40]. GlcNAc transferase I (GnT I) then transfers β1−2 N-acetylglucosamine (GlcNAc), forming so-called “hybrid-type” glycans, which are acted upon by numerous Golgi-resident glycan-processing enzymes. In higher eukaryotes, these include β-galactosyltransferase, GnT III, fucosyltransferase VIII, and Golgi α-mannosidase II [41]. Golgi α-mannosidase II forms monoantennary glycans by cleaving the two terminal α-mannose residues of the 6-arm of the trimannosyl core [42]. In higher eukaryotes, these structures are subsequently elaborated to form highly heterogeneous “complex-type” glycans.
The antennae are initially extended with galactose. Such Galβ1−3/4GlcNAc structures can be further elaborated with moieties such as
Fig. 6.1 The glycosylation problem. Glycosylation exhibits chemical and conformational heterogeneity. (a) MALDI-TOF MS of the N-linked glycans from the extracellular region of a mammalian type-1 membrane glycoprotein (19A) transiently expressed in HEK 293T cells [9, 143]. (b) Overlay of different conformations of an N-linked glycan revealing flexibility about the glycosidic linkages [144]. (c) Molecular model of a prion glycoprotein displaying the conformational freedom of the two N-linked glycans (yellow and green) projected away from the underlying protein core (grey). The glycosylphosphatidylinositol anchor is shown in blue. The key to the symbols used in panel a is presented in Fig. 6.2 (panel a was provided courtesy of Prof. David J. Harvey, University of Oxford; panel b is adapted from Frank et al. [144]; panel c is courtesy of Dr Mark R. Wormald, University of Oxford)
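[Fig. 6.2 panels: (a) complex-type glycans from PDB entries 2DTQ, 2HCZ and 1P8J; (b) oligomannose-type glycans from PDB entries 1S4P, 1JUH and 2B8H. Symbol key: xylose, N-acetylglucosamine, galactose, fucose and mannose; the angle between symbols indicates linkage position (2, 3, 4 or 6), with α- and β-linkages distinguished.]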
Fig. 6.2 Ordered glycans in crystal structures. Structures of well-ordered N-linked glycans displayed as sticks (oxygen, red; carbon, green; nitrogen, blue) with the associated 2Fo−Fc electron density map contoured at 1σ and shown as a blue mesh. Glycans are of complex- (a) or oligomannose- (b) type. Cartoon representations of the glycan structures are depicted below each glycan. The key to the nomenclature is presented in the bottom panel and uses the angle between monosaccharide symbols to indicate linkage position [145]. The accession codes for the corresponding PDB entries are indicated; 2DTQ [146]; 2HCZ [147]; 1P8J [148]; 1S4P [96]; 2B8H [149]; 1JUH [152]. Molecular graphics were prepared using PyMOL
fucose, sialic acid, sulphate and GalNAc, some of which make up the classic blood group antigens. Thus, as a glycoprotein progresses through the Golgi apparatus, the competing actions of both lumenal and membrane-associated glycosyltransferases give rise to a vast range of final glycoforms. For example, at least 123 glycan structures were identified at the single N-linked glycosylation site of human erythrocyte CD59 [43]. Insect expression systems based on D. melanogaster and S. frugiperda cells yield much simpler glycosylation. Following the action of Golgi α-mannosidase II, a membrane-bound β-hexosaminidase rapidly cleaves the β1−2 GlcNAc. This results in compact paucimannose-type glycans such as fucosylated Man3GlcNAc2 [44, 45].
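The shared early steps and the insect-specific trimming described above can be summarised as a series of composition changes. The Python sketch below is a deliberate simplification, not a complete pathway model. It tracks monosaccharide counts only and ignores:

- linkages and branching
- competing enzymes
- the later Golgi elaboration of complex-type glycans

The step lists `SHARED_STEPS` and `INSECT_STEPS` and the helper names are hypothetical.

```python
# Glycan compositions are dicts of residue counts; linkages are ignored.
TRANSFERRED = {"Glc": 3, "Man": 9, "GlcNAc": 2}   # Glc3Man9GlcNAc2 from the OST

# Simplified steps shared by mammalian and insect cells.
SHARED_STEPS = [
    ("alpha-glucosidases I and II", {"Glc": -3}),
    ("alpha1-2 mannosidases", {"Man": -4}),        # -> Man5GlcNAc2
    ("GnT I", {"GlcNAc": +1}),                     # -> hybrid-type
    ("Golgi alpha-mannosidase II", {"Man": -2}),   # -> GlcNAcMan3GlcNAc2
]

# Simplified insect-specific steps leading to fucosylated paucimannose.
INSECT_STEPS = [
    ("core fucosyltransferase", {"Fuc": +1}),
    ("membrane-bound beta-hexosaminidase", {"GlcNAc": -1}),
]


def process(glycan, steps):
    """Apply each enzyme's composition change in turn and return the result."""
    comp = dict(glycan)
    for enzyme, delta in steps:
        for residue, change in delta.items():
            comp[residue] = comp.get(residue, 0) + change
            if comp[residue] < 0:
                raise ValueError(f"{enzyme} removes more {residue} than present")
    return {r: n for r, n in comp.items() if n}


def formula(comp):
    """Write a composition as, e.g., Man3GlcNAc2Fuc1."""
    order = ["Glc", "Man", "GlcNAc", "Fuc"]
    return "".join(f"{r}{comp[r]}" for r in order if comp.get(r))


if __name__ == "__main__":
    print(formula(process(TRANSFERRED, SHARED_STEPS[:2])))
    print(formula(process(TRANSFERRED, SHARED_STEPS)))
    print(formula(process(TRANSFERRED, SHARED_STEPS + INSECT_STEPS)))
```

In mammalian cells, the Man3GlcNAc3 intermediate would instead be elaborated further into complex-type glycans. That branching is where the heterogeneity discussed above arises, and this sketch does not attempt to model it.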
6.3.2.2 Conformational Heterogeneity of N-Linked Glycans
Even homogeneous glycoproteins carrying a single glycoform may be recalcitrant to crystallization owing to the extensive conformational disorder of the glycan. The conformational space explored by a glycan is a product of both its internal flexibility (Fig. 6.1b) and the degree of orientation (if any) imposed by intramolecular glycan-protein interactions (Fig. 6.1c). NMR structures of N-linked glycans have revealed that the overall topology of a glycan can be relatively well defined despite a high degree of internal flexibility [46]. Furthermore, some motifs within a glycan can be very well ordered. For example, the structure of sialyl Lewis x, NeuNAcα2−3Galβ1−4[Fucα1−3]GlcNAc, is ordered by hydrophobic packing between the α1−3 linked fucose and the galactose [47]. Small conformational changes about the GlcNAc-Asn bond may nevertheless result in positional changes at the non-reducing termini of glycans, leading to dramatic conformational heterogeneity (Fig. 6.1c). Notwithstanding some exceptions, large glycans are rarely observed in crystal contacts, presumably because of the entropic penalties associated with their immobilisation. The majority of glycans observed in crystal structures are stabilized by existing intramolecular glycan-protein interactions. One example is the crystal structure of Sf9 cell-expressed CD26 (dipeptidylpeptidase IV) in complex with adenosine deaminase. There, the N-linked glycan at Asn229 of CD26 lies along the dimer interface, with a hydrogen bond from the central β-mannose to the backbone of adenosine deaminase [48]. Consequently, Manα1−6Manβ1−4GlcNAcβ1−4GlcNAcβ1−Asn229 is well ordered. Complex-type glycans have been observed in crystal structures of glycoproteins (Fig. 6.2a). However, well-ordered examples such as those shown are rare, and most of them participate in extensive lattice contacts.
Many observable glycans are of the oligomannose type, implying that the same contacts that stabilize the glycan also inhibit its ER and Golgi processing (Fig. 6.2b). Many partial glycans observable in crystal structures involve contacts with the surface of the protein, particularly with the first GlcNAc at the reducing terminus of the glycan [35, 49]. Two further factors compound the “glycosylation problem”. First, the majority of glycoproteins rely on N-linked glycans for correct protein folding at the ER calnexin/calreticulin checkpoint. Second, these glycans have direct effects on protein folding and stabilization [50]. Together, these factors often preclude the bacterial expression of glycoproteins and limit the success of refolding strategies.
6.4 Solving the Glycosylation Problem
6.4.1 Shielding N-Linked Glycans from Lattice Contacts
A key objective of any attempt to improve the crystallizability of a macromolecule is a reduction in surface disorder. By minimizing surface disorder, the entropic penalties associated with crystal contact formation are reduced. This can be achieved either by shielding the disordered region or by reducing its disorder. The generation of complexes with functionally relevant binding partners is particularly
attractive because it yields new biological insights in addition to shielding disordered regions. Greater flexibility is, however, afforded by the use of antibody Fab fragments reactive with the target. Fabs are almost invariably unglycosylated, so they reduce the proportion of glycosylated surface area. Fab-Fab lattice contacts may therefore dominate over glycoprotein-glycoprotein interactions, particularly if the target is small relative to the Fab. For example, Evans et al. [26] used a Fab from a mitogenic antibody to crystallize the immunoglobulin domain of CD28. In the crystals, CD28 homodimerized, and the homodimer was suspended within the lattice by the Fab fragments, which formed the great majority of the contacts. Without these contributions, the crystal lattice would have had to accommodate the ten N-linked glycans attached to the homodimer. A major drawback of the Fab-based approach to crystallization is the requirement to produce the Fab. Moreover, Fab complex formation may not completely eliminate the inherent disorder of the target and may be insufficient for the generation of high-quality crystals.
6.4.2 Depletion of Glycosylation by Sequon Mutation
Point mutations allow chemical heterogeneity to be eliminated at source. For example, heterogeneity in the human EPO receptor expressed in P. pastoris had two sources: glycosylation at N52, and isoformylization of the Asn164-Gly165 peptide bond. Both were eliminated by N52Q and N164Q mutations, allowing structural determination of the EPO-receptor complex at 1.9 Å [51]. Similarly, a series of glycosylation mutants of the ADP-ribosyl cyclase CD157 were screened for functionality and crystallizability. Only the mutant lacking the most C-terminal of its four glycans was functional and could be crystallized [52, 53]. Other examples of proteins whose crystallization was aided by the deletion of glycosylation sites include human butyrylcholinesterase [54], human testis angiotensin-converting enzyme [55], carboxypeptidase from yeast [56], and rat procathepsin B [57]. The screening of glycosylation mutants is labour intensive. In addition, many glycoproteins are likely to require glycosylation at one or more sequons for folding, so it may not be possible to delete all of the glycosylation sites. By definition, variable occupancy at a particular site indicates that glycosylation of this site is not essential for protein folding [58–60]. Mutating a variably occupied glycosylation sequon is therefore unlikely to affect protein folding through loss of the glycan, so such sites can be deleted without unduly affecting expression. It has recently been shown that these sites can be detected by liquid chromatography electrospray ionization mass spectrometry [61]. The method involves treating the glycoprotein with peptide N-glycosidase F (PNGase F), which removes glycans from N-glycosylated proteins and peptides. In the process, PNGase F converts the asparagine into an aspartic acid residue, increasing the mass of the fragment by about 1 Da (0.984 Da).
Combined with proteolytic digestion, this mass conversion allows the identification of glycan-modified oligopeptides [61]. Alternatively, assessment of the degree of conservation of glycosylation sites, combined with occupancy prediction servers (http://www.cbs.dtu.dk/services/NetNGlyc/), can be used to identify
and mutate glycosylation sites likely to be variably occupied. However, such glycosylation mutants may disrupt protein folding through glycan-independent structural effects of the mutation itself. As discussed in Section 6.4.5, procedures are now available for reducing individual N-linked glycans to single GlcNAc residues [5–7, 9]. However, variable site occupancy may still be a problem, because it produces heterogeneity in the number of single GlcNAc residues remaining on the protein. In some cases, therefore, it may be necessary to eliminate even this level of heterogeneity by mutating the variably occupied glycosylation acceptor site.
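Candidate sites for the mutagenesis described in this section can be enumerated directly from the Asn-X-Ser/Thr/Cys-X sequon definition given in Section 6.2. The Python sketch below uses hypothetical helpers; it is not the NetNGlyc server. It does three things:

- finds sequons by regular expression, ignoring occupancy, which depends on sequence context as discussed above;
- substitutes Gln for Asn, as in the N52Q and N164Q mutations of the EPO receptor;
- builds the full panel of single and combined knockouts.

With four sites, as in CD157, the panel contains 15 mutants, one for every non-empty subset of sites.

```python
import re
from itertools import combinations

# Asn-X-Ser/Thr/Cys-X with X != Pro. The lookahead lets adjacent sequons that
# share residues all be reported. A sequon at the very C-terminus, lacking the
# final X, is not matched under this definition.
SEQUON = re.compile(r"(?=N[^P][STC][^P])")


def find_sequons(seq):
    """1-based positions of Asn residues in candidate N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def mutate_n_to_q(seq, position):
    """Return seq with the Asn at 1-based `position` replaced by Gln."""
    if seq[position - 1].upper() != "N":
        raise ValueError(f"residue {position} is {seq[position - 1]}, not Asn")
    return seq[:position - 1] + "Q" + seq[position:]


def knockout_panel(seq):
    """Map each non-empty combination of sequon sites to its N->Q mutant."""
    sites = find_sequons(seq)
    panel = {}
    for k in range(1, len(sites) + 1):
        for combo in combinations(sites, k):
            mutant = seq
            for pos in combo:
                mutant = mutate_n_to_q(mutant, pos)
            panel[combo] = mutant
    return panel


if __name__ == "__main__":
    construct = "AANGSAANPTAANCSPANVTA"    # hypothetical toy sequence
    print(find_sequons(construct))          # N8 (Asn-Pro) and N13 (Pro at +3) excluded
    for sites, mutant in knockout_panel(construct).items():
        print(sites, mutant)
```

In practice, the panel would be pruned using conservation and occupancy evidence before any constructs are made, since the number of combinations doubles with each additional site.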
6.4.3 Deglycosylation with Peptide-N-Glycosidase
PNGase F is commonly used to remove mammalian N-linked glycans completely. PNGase F hydrolyses the secondary amide bond that links N-linked glycans to Asn. It thus differs from the endoglycosidases, which cleave glycosidic linkages. Under denaturing conditions, PNGase F is effective at removing N-linked glycans that do not contain core α1−3 fucose, a modification found in glycans from both plants and insects. However, glycosylation sites are differentially presented on the surface of a mature, folded glycoprotein. Because the enzyme requires extended peptide substrates, a significant number of these sites are inaccessible to PNGase F [62]. For example, bovine amine oxidase was treated extensively with PNGase F and a number of exoglycosidases. Electron density corresponding to the reducing termini of the glycans at all three glycosylation sites was nevertheless observed [63, 64], indicating that PNGase F treatment had been ineffective. This inherent limitation led to the inclusion of 1.5 M urea in an effort to deglycosylate the Ebola virus surface glycoprotein [65]. Furthermore, many N-linked glycosylation sites are surrounded by hydrophobic surface patches. These patches are stabilized by interaction with the hydrophobic face of the core GlcNAc created by its axial C−H groups. In our experience, PNGase F converts glycan-linked Asn to Asp and so introduces a point charge at each glycosylation site. This can often be destabilising and result in aggregated protein (Fig. 6.3b [6]), in contrast to the behaviour of endoglycosidase (Endo) H-treated protein (Fig. 6.3a; see below). Despite its lack of generality, deglycosylation with PNGase F has been successfully employed in the generation of a number of crystal structures. These include interleukin (IL)-5 [66], IL-10:shIL-10R1 and vIL10:shIL-10R1 [67], the human α-thrombin-thrombomodulin complex [68], lactoferrin [69, 70], laccase [71], and the lectin and EGF domains of E-selectin [72].
PNGase F-mediated deglycosylation of transmembrane proteins, such as the human erythrocyte anion-exchanger membrane domain [73] and the water channel, aquaporin-1 [74, 75], has also facilitated their crystallization.
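The PNGase F signature used for occupancy mapping in Section 6.4.2 follows from the chemistry described here. Hydrolysis of the glycan-Asn amide bond leaves Asp. Each formerly glycosylated site therefore adds about +0.984 Da to a peptide (the “1 Da” shift) and roughly one extra negative charge at neutral pH. The Python sketch below infers how many converted sites best explain an observed peptide mass. Its function names are hypothetical, and its 0.01 Da matching tolerance is an assumption. It illustrates the arithmetic only; it is not the published LC-ESI-MS workflow [61].

```python
# Monoisotopic residue masses (Da) of Asn and Asp.
ASN_RESIDUE = 114.04293
ASP_RESIDUE = 115.02694
SHIFT = ASP_RESIDUE - ASN_RESIDUE   # ~0.984 Da per deglycosylated site


def converted_sites(observed, unmodified, max_sites=3, tol=0.01):
    """Number of Asn->Asp conversions that best explains observed - unmodified.

    `unmodified` is the calculated mass of the peptide with every Asn intact.
    Returns None if no count up to max_sites falls within tol (both assumed).
    """
    best = min(range(max_sites + 1),
               key=lambda n: abs(observed - unmodified - n * SHIFT))
    if abs(observed - unmodified - best * SHIFT) <= tol:
        return best
    return None


def net_charge_change(n_sites):
    """Approximate change in net charge at neutral pH: one carboxylate per site."""
    return -n_sites


if __name__ == "__main__":
    print(converted_sites(1000.98402, 1000.0))   # one deglycosylated site
    print(converted_sites(1001.00336, 1000.0))   # 13C isotope peak: no match
```

The +1 isotope peak of an unmodified peptide lies only about 0.019 Da from a single conversion, which is why the assumed tolerance is set below that. Spontaneous deamidation produces the same +0.984 Da shift, so a match is evidence of prior glycosylation rather than proof.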
6.4.4 Exoglycosidase Treatment
It has long been appreciated that enzymatic remodeling of glycoproteins with exoglycosidases can promote crystallization [69]. Exoglycosidases are specific for a
[Fig. 6.3 panels: (a) Endo H-treated and (b) PNGase F-treated sCD58; Sephacryl S-100 elution profiles plotted as absorbance at 280 nm against fraction number, with the elution positions of 43, 30 and 20 kDa markers indicated.]
Fig. 6.3 PNGase F-, but not Endo H-mediated deglycosylation leads to protein aggregation. Human sCD58 expressed in CHO-K1 cells in the presence of NB-DNJ was digested overnight at 37 °C with Endo H at pH 5.2 (a) or with PNGase F at pH 8 (b). Both mixes were concentrated and analyzed by Sephacryl S-100 size-exclusion chromatography. Virtually all of the PNGase F-deglycosylated material formed large aggregates that eluted with the void volume, whereas the Endo H-treated material eluted at the positions expected for variably glycosylated monomers
unique linkage, or a very narrow range of glycosidic linkages, at the non-reducing termini of glycans. Exoglycosidases have the advantage that the exposure of underlying hydrophobic surface areas can be limited. Moreover, exoglycosidases can be used to control the length of the associated glycans. This approach has been used to demonstrate the structural impact of IgG Fc glycosylation, via the generation and crystallization of a panel of IgG Fc domains with sequential truncations of the Asn297 oligosaccharide [76]. Exoglycosidase trimming has facilitated the determination of several glycoprotein structures. The structures of both human chorionic gonadotropin [77] and the atrial natriuretic peptide receptor [78] were determined following neuraminidase treatment. Similarly, human acid β-glucocerebrosidase was crystallized after treatment with neuraminidase, β-galactosidase and N-acetylglucosaminidase to expose the underlying mannose residues. As a general approach to carbohydrate remodelling, however, exoglycosidase treatment has two limitations. Effective uniform truncation requires a large number of specific enzymes, and it requires prior knowledge of the glycan structures [79]. In addition, the availability and expense of such a panel of enzymes are likely to be prohibitive for high-throughput structure determination of glycoproteins. An exception is the use of a single α-mannosidase with glycoproteins from expression systems known to produce mannose-based glycans. Illustrating this approach, Jack bean α-mannosidase was used in the crystallization of human CD1a from Schneider 2 (S2) D. melanogaster cells [80, 81]. Because fungally derived glycoproteins are uniformly α-mannosylated, Jack bean α-mannosidase also readily trims their O-linked glycans [82].
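A toy model can illustrate why exoglycosidases must be matched to the glycan and applied in the right order, as in the neuraminidase, β-galactosidase and N-acetylglucosaminidase treatment of acid β-glucocerebrosidase. In the Python sketch below, each antenna is a list of residues from its core mannose outwards. Each enzyme removes one matching terminal residue per antenna. The model ignores linkage specificity (for example α2-3 versus α2-6 sialic acid) and repeated LacNAc units, and all names are hypothetical.

```python
# Simplified terminal-residue specificity of each exoglycosidase.
TERMINAL_TARGET = {
    "neuraminidase": "NeuAc",
    "beta-galactosidase": "Gal",
    "N-acetylglucosaminidase": "GlcNAc",
}


def trim(antennae, enzyme):
    """One pass of an exoglycosidase: strip a matching non-reducing terminus."""
    target = TERMINAL_TARGET[enzyme]
    return [a[:-1] if a and a[-1] == target else list(a) for a in antennae]


def sequential_trim(antennae, enzymes):
    """Apply the enzymes in the given order, returning new antenna lists."""
    for enzyme in enzymes:
        antennae = trim(antennae, enzyme)
    return antennae


if __name__ == "__main__":
    # A biantennary glycan with one sialylated and one asialo antenna,
    # each written from its core mannose outwards.
    glycan = [["Man", "GlcNAc", "Gal", "NeuAc"], ["Man", "GlcNAc", "Gal"]]
    order = ["neuraminidase", "beta-galactosidase", "N-acetylglucosaminidase"]
    print(sequential_trim(glycan, order))        # both antennae reduced to Man
    print(sequential_trim(glycan, order[1:]))    # sialylated antenna untouched
```

Omitting the neuraminidase leaves the sialylated antenna intact while the other is trimmed, reproducing in miniature the heterogeneity that uniform truncation is meant to remove.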
6.4.5 N-Glycan Remodeling and Endoglycosidase Treatment
By choosing a suitable cell line or by using inhibitors of glycosylation pathways, it is possible to produce glycoproteins that fold normally but whose glycosylation is sensitive to endo-N-acetylglucosaminidases. The key advantage of this approach is that protein folding proceeds normally in the ER in the presence of normal levels of glycosylation. The subsequent, aberrant Golgi-dependent processing then yields oligomannose-type N-linked glycans, which can readily be removed once the protein has been purified. Endo-β-N-acetylglucosaminidases specific for the β1−4 linkage of the di-N-acetylchitobiose core of N-linked glycans have been isolated from a range of microorganisms. Although endo-β-N-acetylglucosaminidases all hydrolyse the same glycan linkage, each is specific for only a subset of N-linked glycans. For example, the three endo-β-N-acetylglucosaminidases from Flavobacterium meningosepticum [83, 84] differ in their substrates. Endo F1 cleaves only oligomannose-type glycans, Endo F2 mono- and biantennary glycans, and Endo F3 fucosylated mono-, bi- and isomer-specific triantennary glycans. Other examples are:
- Endo H from Streptomyces plicatus, which cleaves oligomannose and hybrid-type glycans [85, 86];
- Endo D from Streptococcus pneumoniae, which cleaves paucimannose structures and Man5GlcNAc2 [87];
- Endo S from Streptococcus pyogenes, which cleaves biantennary complex-type glycans [88].
All the endo-β-N-acetylglucosaminidases leave a single GlcNAc residue attached to the protein surface, with or without a core α1−6 linked fucose. This GlcNAc dominates the stabilizing intramolecular glycan-protein interactions, so endoglycosidases are powerful tools for preparing largely unglycosylated but natively folded and active proteins. The development of deglycosylation strategies can be guided by knowledge of the substrate specificities of the endo-β-N-acetylglucosaminidases.
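The substrate specificities listed above can be encoded as a lookup table and used to suggest which enzymes might cover the glycan classes known to be present on a target. The Python sketch below paraphrases the specificities given in this section using hypothetical class labels (`Man5`, `Man6-9`, and so on). It picks enzymes greedily and is only as good as that simplified table, so real digests would still need to be verified experimentally.

```python
# Substrate classes paraphrased from Section 6.4.5 (hypothetical labels).
SPECIFICITY = {
    "Endo H": {"Man5", "Man6-9", "hybrid"},
    "Endo F1": {"Man5", "Man6-9"},
    "Endo F2": {"monoantennary complex", "biantennary complex"},
    "Endo F3": {"fucosylated monoantennary", "fucosylated biantennary",
                "fucosylated triantennary"},
    "Endo D": {"paucimannose", "Man5"},
    "Endo S": {"biantennary complex"},
}


def choose_endoglycosidases(glycan_classes):
    """Greedy cover: repeatedly pick the enzyme cleaving most remaining classes."""
    remaining = set(glycan_classes)
    chosen = []
    while remaining:
        best = max(SPECIFICITY, key=lambda e: len(SPECIFICITY[e] & remaining))
        covered = SPECIFICITY[best] & remaining
        if not covered:
            raise ValueError(f"no listed enzyme cleaves {sorted(remaining)}")
        chosen.append(best)
        remaining -= covered
    return chosen


if __name__ == "__main__":
    # Insect-cell glycoprotein carrying oligomannose and paucimannose glycans.
    print(choose_endoglycosidases({"Man6-9", "Man5", "paucimannose"}))
```

For the insect-cell case shown, the sketch returns Endo H plus Endo D, the same combination used for gp120 as described below. Ties are broken by table order, which is itself an arbitrary choice.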
Because of these differing substrate specificities, and the multitude of glycan structures that can be present on a glycoprotein, endoglycosidases were initially used where glycosylation was naturally restricted, as in insect and fungal expression systems. However, endoglycosidases have also been used in mammalian expression systems in which glycan diversity is artificially restricted by inhibitors or by glycosylation processing mutants.
6.4.5.1 Insect Cell-Derived Glycoproteins
Insect expression systems, such as those based on D. melanogaster and S. frugiperda cells, are characterized by oligo- and paucimannose-type glycosylation. Expressing target glycoproteins in such systems has the advantage that glycosylation is dominated by the core α1−6-fucosylated paucimannose glycan Man3GlcNAc2. This structure corresponds to the trimannosyl core of glycans from mammalian expression systems and has a well defined, relatively stable conformation [35]. The compactness and stability of the Man3GlcNAc2 glycan have allowed the crystallization of numerous proteins (see e.g. Garcia et al. [89]). The presence of trace amounts of
larger oligomannose-type glycans can nevertheless hinder crystallization. It is possible, however, to remove these contaminants, and the dominant Man3GlcNAc2 glycan, using endoglycosidases. In the case of the HIV-1 envelope glycoprotein gp120, Kwong et al. used specific endoglycosidases to remove the oligomannose series (Endo H) and the paucimannose Man3GlcNAc2 structures (Endo D [90]). The efficacy of the reaction was evaluated by structural analysis of the PNGase F-released glycans. Glycoproteins expressed in D. melanogaster and S. frugiperda cells carry entirely α-mannose-based glycans. These glycans can therefore also be trimmed from the non-reducing termini with, for example, Jack bean α-mannosidase, as used in the crystallization of human CD1a [80, 91].
6.4.5.2 Fungally-Expressed Glycoproteins
Fungal expression systems can achieve high expression levels. They have therefore been a popular choice for targets that require glycosylation for folding and so cannot be obtained from bacterial expression systems. Many, but not necessarily all, mammalian glycoproteins can be expressed in these systems. Fungal expression systems give rise to glycan structures that are readily cleaved by endoglycosidase digestion; indeed, their N-linked glycans are entirely sensitive to Endo H and Endo F1. This Endo H/F1 sensitivity has been exploited in the structural determination of both native and heterologous glycoproteins from fungal sources. For example, endoglycosidase treatment was used in the crystallization of phosphomonoesterase from Aspergillus ficuum [92] and of the dye-decolorizing peroxidase produced by A. oryzae [93]. Among other proteins, Endo H digestion facilitated the crystallization of:
- peroxidase from the inky-cap mushroom Coprinus cinereus [94];
- endopolygalacturonase from the pathogenic fungus Stereum purpureum [95];
- quercetin 2,3-dioxygenase from A. japonicus [96].
The quercetin 2,3-dioxygenase structure revealed an important advantage of using Endo H.
The homodimer contained ten N-linked glycans, all of which were cleaved by Endo H except for two symmetry-related glycans that formed extensive stabilising interactions at the dimer interface. Thus, endoglycosidase action does not remove glycans that are tightly associated with the protein and that maintain its stability and solubility. A number of crystal structures have been obtained using yeast expression systems such as Pichia pastoris in combination with deglycosylation by Endo H or Endo F1, including human complement receptor type 2 (CD21) [97], A. fumigatus phytase [98], human pancreatic α-amylase [99], human tissue kallikrein [100] and human neutral endopeptidase (neprilysin) complexed with phosphoramidon [101]. Crystals of the deglycosylated endopeptidase diffracted to 2.1 Å, whereas crystals of the glycosylated enzyme diffracted only to 7.5 Å resolution [102].
6.4.5.3 Mammalian Expression Systems
Expression in Lec3.2.8.1 Cells: Rat sCD2
The problems associated with crystallizing glycoproteins became apparent to the authors upon observing that, whereas the N-terminal two domains of human CD4,
S.J. Davis and M. Crispin
which is unglycosylated, crystallizes readily [103, 104], the equivalent region of rat CD4, containing a single N-linked glycan, did not (SJD et al. unpublished observations). Proteins with simplified, smaller N-linked glycans were known to be expressible using baculovirus expression systems and insect cells, but in our experience the expression levels obtainable with such systems were relatively poor compared with those achievable with mammalian expression systems [105]. CHO cells, on the other hand, give very high levels of expression but with fully processed (i.e. complex) N-glycosylation [106]. The generation by Pamela Stanley and co-workers of a panel of mutant CHO cell lines with defective glycosylation offered the possibility of combining high levels of protein expression with simplified N-linked glycans [107]. One cell line, Lec3.2.8.1, was particularly promising as it produced oligomannose-type Man5GlcNAc2 N-linked glycans known to be sensitive to Endo H. These cells harboured mutations resulting in an undefined deficiency in sialic acid metabolism, defective CMP-sialic acid and UDP-galactose translocation, and disrupted GnT I [108]. The sensitivity of the N-linked glycans produced by these cells under conditions that preserved the folding of the proteins to which they were attached (i.e. in the absence of denaturants) was unknown. This was first tested in structural studies of soluble (s) forms of rat CD2, which has four N-glycosylation sequons, and CD4, which has two. The Lec3.2.8.1 cell line was transfected with calcium phosphate/DNA mixtures using the very high-yield glutamine synthetase-based stable gene expression system [109]. Although this was initially very inefficient, transfection was improved using cationic liposomes. The first few Lec3.2.8.1 clones expressing rat sCD2 and sCD4 that were obtained did, however, secrete the recombinant glycoproteins at very high levels (40−80 mg/l).
The expressed proteins were purified by affinity chromatography and gel filtration, and their Endo H sensitivity was compared with that of the analogous forms expressed in wild-type (CHO-K1) cells. Under non-denaturing conditions, Endo H only partially deglycosylated CHO-K1-derived sCD4 and had no effect on CHO-K1-derived sCD2 over a 16 h incubation. In contrast, deglycosylation of the Lec3.2.8.1-derived material went to virtual completion within 90 minutes. The reduction in the heterogeneity of sCD2 following deglycosylation was marked (Fig. 6.4a), although somewhat exaggerated by the variable sequon occupancy characterizing rat sCD2 [61]. The molecular weight of the deglycosylated sCD4 was virtually indistinguishable from that of an unglycosylated variant, allowing for the presence of 1−4 GlcNAc residues left by Endo H [105]. Glycosylation analysis showed that the majority of the N-linked glycans from the CHO-K1-derived products were acidic, whereas more than 90% of those from the Lec3.2.8.1-derived sCD2 and sCD4 were neutral. Gel filtration showed that the N-linked glycans from sCD2 and sCD4 expressed in CHO-K1 cells were large and heterogeneous, and combined β-galactosidase and hexosaminidase digestion confirmed that all of the N-linked glycans were of the complex type. The N-linked glycans from sCD4 and sCD2 expressed in Lec3.2.8.1 cells, in contrast, were smaller and less heterogeneous than those on the CHO-K1-derived glycoproteins. Digestion with α-mannosidase confirmed that they were of the oligomannose type.
6
Solutions to the Glycosylation Problem
[Fig. 6.4 panels: (a) SDS-PAGE lanes CHO-K1, Lec3.2.8.1 and Lec3.2.8.1 + Endo H, with 43, 30 and 20 kD markers; (b) crystals of deglycosylated sCD2]
Fig. 6.4 The deglycosylation and crystallization of rat sCD2. (a) sCD2 expressed in Lec3.2.8.1 cells was treated with Endo H. The starting material (lane 2) and deglycosylated sCD2 (lane 3) were analysed on an SDS-PAGE gel stained with Coomassie Blue. Comparison was made with sCD2 expressed in CHO-K1 cells (lane 1). The four bands comprising the non-Endo H-treated CHO-K1- and Lec3.2.8.1-derived forms arise as a result of variable sequon occupancy [61]. (b) Crystals appearing in the first well of a hanging-drop crystallization trial of the deglycosylated sCD2 are shown
Lec3.2.8.1 cell-derived, Endo H-digested sCD4 failed to give better crystals than those previously obtained. However, owing to the Endo H sensitivity of Lec3.2.8.1-derived sCD2, 3 mg of deglycosylated sCD2 could be prepared from 10 mg of total protein using 700 mU of Endo H, and this material crystallized very readily in the presence of polyethylene glycol (Fig. 6.4b), whereas the fully glycosylated Lec3.2.8.1 product formed only amorphous precipitates. The structure of sCD2, solved to 2.8 Å resolution, offered the first complete view of the extracellular region of a cell adhesion molecule [110]. Expression in Lec3.2.8.1 cells has since been used in structural analyses of the T-cell receptor (TCR) in complex with an anti-TCR Fab [111], of murine CD8αα, and of CD8αα in complex with both the class I MHC molecule H-2Kb [112] and the non-classical MHC-I molecule TL [113]. An important, more recent development is the finding that, although Endo H is much less efficient at pH 7.4 or even pH 8 than in buffers at pH 5.2, enough residual activity remains at these pH values to attempt deglycosylation under non-acidic conditions if the target protein is unstable under acidic conditions. The use of glutathione S-transferase-linked forms of, e.g., Endo F1 allows their subsequent removal from the digestion mix. It is worth noting that the truncated sugars present on Lec3.2.8.1-derived material also occasionally seem to be conducive to crystallization. For example, non-Endo H-digested, Lec3.2.8.1-derived ICAM-1 [114] and ICAM-2 [115], CEACAM1a [42], Semaphorin 4D [116] and a TCR/MHC class II complex [117] also produced crystals, presumably owing to the relative uniformity and compactness of Lec3.2.8.1 N-linked glycans. In the case of Semaphorin 4D, the glycosylated, but not the Endo H-deglycosylated, form of Lec3.2.8.1-derived protein formed crystals. The glycosylated and deglycosylated forms of any given protein with compact N-linked glycans may therefore be worth considering as separate crystallization targets.
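The rat sCD2 figures quoted above correspond to an enzyme-to-substrate ratio of 70 mU/mg (0.07 units/mg, within the 0.012−0.3 units/mg range used for human sCD2 in Fig. 6.5) and a 30% recovery of deglycosylated protein. The short Python sketch below only makes that arithmetic explicit. The function names are hypothetical, and scaling the enzyme linearly with protein mass for other batch sizes is a simplifying assumption, since the extent of digestion also depends on incubation time, pH and the protein itself.

```python
"""Endo H dosing arithmetic for the rat sCD2 preparation described in the text.

The 700 mU / 10 mg / 3 mg figures come from the text; linear scaling of the
enzyme with protein mass for other batches is an assumption of this sketch.
"""


def enzyme_ratio(enzyme_mU: float, protein_mg: float) -> float:
    """Enzyme-to-substrate ratio in mU of Endo H per mg of glycoprotein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return enzyme_mU / protein_mg


def enzyme_for_batch(protein_mg: float, ratio_mU_per_mg: float) -> float:
    """Endo H (mU) needed to keep the same ratio for a batch of another size."""
    return protein_mg * ratio_mU_per_mg


def recovery(product_mg: float, input_mg: float) -> float:
    """Fraction of the input protein recovered as deglycosylated product."""
    return product_mg / input_mg


ratio = enzyme_ratio(700, 10)  # 70 mU/mg, i.e. 0.07 units/mg
print(f"Endo H ratio: {ratio:.0f} mU/mg ({ratio / 1000:.2f} units/mg)")
print(f"recovery: {recovery(3, 10):.0%}")
# A hypothetical 25 mg batch digested at the same ratio:
print(f"Endo H for 25 mg: {enzyme_for_batch(25, ratio):.0f} mU")
```

Expressing the dose as a ratio rather than an absolute amount is what allows comparison with the per-milligram doses given in the Fig. 6.5 legend.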
Inhibition of Processing with N-Butyldeoxynojirimycin: Human sCD2
Methods were sought for avoiding difficult transfections of Lec3.2.8.1 cells and for generating Endo H-sensitive protein from pre-existing cell lines. The α-glucosidase I inhibitor N-butyldeoxynojirimycin (NB-DNJ) was known to prevent the maturation of N-linked oligosaccharides on HIV-1 gp120 expressed in CHO cells [118], to the extent that they remained in Endo H-sensitive oligomannose forms. Quantitative analysis of a human sCD2-expressing cell line [6] cultured in the absence or presence of 0.5, 1, 1.5 or 2 mM NB-DNJ indicated that the inhibitor reduced expression 3−4-fold. On the other hand, sensitivity to Endo H increased with increasing NB-DNJ concentration (Fig. 6.5a). At the highest NB-DNJ concentration, less than ∼15% of the hsCD2 was Endo H-resistant, reflecting the absence of the endomannosidase shunt pathway in these cells. The resistant glycoforms could nevertheless be readily removed using a mixture of lentil lectin-, concanavalin A- and phytohaemagglutinin-coupled agarose beads. The Endo H-treated hsCD2 readily formed crystals that diffracted to 2.5 Å resolution (Fig. 6.5b). Subsequent work indicated that, for some proteins expressed in Lec3.2.8.1 cells or in CHO-K1 cells in the presence of NB-DNJ, as little as 10−30% of the total secreted protein was Endo H-sensitive. While this represented a substantial increase in Endo H sensitivity over that of, for example, rat sCD2 expressed in wild-type CHO-K1 cells (<1%), neither approach gave complete inhibition of oligosaccharide processing beyond oligomannose forms. Since the enzymatic target of NB-DNJ and the processing disruptions in Lec3.2.8.1 cells were complementary, we suspected that the combined effects of the Lec3.2.8.1 phenotype and NB-DNJ on the Endo H sensitivity of the total oligosaccharide pool would be essentially additive, as proved to be the case (Fig. 6.6a).
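The additivity argument can be illustrated with a toy calculation. The Python sketch below is not the authors' analysis: it assumes that the NB-DNJ block and the Lec3.2.8.1 defects act independently on the glycan pool, and (for the second function) that each glycosylation site on a molecule is Endo H-sensitive independently of the others. The inputs are illustrative only: ∼15% resistance with NB-DNJ alone (the figure quoted for human sCD2 in CHO-K1 cells) and rescue of about two-thirds of the remaining resistant glycans by the Lec3.2.8.1 phenotype. The function names are hypothetical.

```python
"""Toy model of the combined effect of NB-DNJ and the Lec3.2.8.1 phenotype.

Assumptions (of this sketch, not of the chapter): the two processing blocks
act independently on the glycan pool, and the glycans at different sites of
one molecule are Endo H-sensitive independently of one another.
"""


def combined_resistant(resistant_with_inhibitor: float, rescued_by_mutant: float) -> float:
    """Fraction of glycans still Endo H-resistant when both treatments are combined.

    resistant_with_inhibitor: fraction resistant with NB-DNJ alone.
    rescued_by_mutant: fraction of those residual glycans made sensitive by the
        Lec3.2.8.1 processing defects.
    """
    return resistant_with_inhibitor * (1.0 - rescued_by_mutant)


def fully_cleavable(site_sensitivity: float, n_sites: int) -> float:
    """Fraction of molecules in which every N-glycan is Endo H-sensitive."""
    return site_sensitivity ** n_sites


# Illustrative inputs only.
residual = combined_resistant(0.15, 2 / 3)
print(f"resistant glycans, combined: {residual:.1%}")
for n in (1, 2, 4):  # rat CD2, for example, has four sequons
    print(f"{n} site(s): {fully_cleavable(1 - residual, n):.1%} of molecules fully cleavable")
```

Under these assumptions the fraction of fully cleavable molecules falls with every additional site, which is one way to see why a high glycan-level sensitivity matters most for multiply glycosylated proteins.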
Approximately two-thirds of the Endo H-resistant glycans remaining after NB-DNJ treatment were rendered Endo H-sensitive by the defects in the Lec3.2.8.1 processing pathways, so that the majority of glycans within the total pool of secreted protein were Endo H-sensitive when the treatments were combined. These effects were consistent with the vectorial nature of
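[Fig. 6.5 panels: (a) SDS-PAGE of human sCD2 expressed with 0, 0.5, 1, 1.5 or 2 mM NB-DNJ, with and without Endo H, with 43, 30 and 20 kD markers; (b) crystals of deglycosylated human sCD2]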
Fig. 6.5 NB-DNJ renders the N-linked oligosaccharides of human sCD2 sensitive to Endo H. (a) Human sCD2 was expressed in the presence of 0, 0.5, 1.0, 1.5 or 2 mM NB-DNJ, purified and then digested with increasing amounts of Endo H (0.012, 0.06 or 0.3 I.U.B. units/mg). The digests were analyzed on SDS-PAGE gels stained with Coomassie Blue. Maximum Endo H sensitivity was observed following expression of human sCD2 in the presence of 1.5 mM NB-DNJ. (b) The deglycosylated human sCD2 formed crystals that diffracted to 2.5 Å resolution [150]
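[Fig. 6.6 panels: (a) SDS-PAGE of rsCD2-C, rsCD2-L, sCD58-N, hsCD2-LN and sCD80-LN, each untreated (−) or digested with Endo H for 1 h and 3 h, with 43, 30 and 20 kD markers; (b) crystals of the deglycosylated CD58/CD2 chimera]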
Fig. 6.6 Expression of glycoproteins in Lec3.2.8.1 cells in the presence of NB-DNJ enhances Endo H sensitivity. (a) Protein was digested with Endo H for the indicated times prior to SDS-PAGE analysis. The triangle indicates the position of migration of Endo H. The proteins analyzed are: rsCD2-C, rat sCD2 expressed in wild-type CHO-K1 cells; rsCD2-L, rat sCD2 expressed in Lec3.2.8.1 cells; sCD58-N, sCD58 (LFA-3) expressed in wild-type CHO-K1 cells in the presence of 1.5 mM NB-DNJ; hsCD2-LN, human sCD2 expressed in Lec3.2.8.1 cells in the presence of 0.5 mM NB-DNJ; and sCD80-LN, sCD80 expressed in Lec3.2.8.1 cells in the presence of 0.5 mM NB-DNJ. (b) The deglycosylated form of a chimeric protein consisting of the CD2-binding domain of human CD58 and domain 2 of rat CD2 formed crystals that diffracted to 1.8 Å resolution [151]
oligosaccharide processing and the physical separation of the early and late stages of N-glycan processing in the ER and Golgi apparatus, respectively. The large effects of the Lec3.2.8.1 phenotype and NB-DNJ dominated over protein- and site-specific effects, since the oligosaccharide profiles of, e.g., human sCD2 and sCD80 expressed under these conditions were essentially identical. This implied that the glycan remodeling approach would be successful for most glycoproteins. The combined approach proved very useful for the crystallization of a non-native chimeric form of CD58 (Fig. 6.6b), which apparently folded inefficiently since it was secreted at ∼10% of the level of native sCD58. Obtaining enough protein for crystallization trials depended critically on its very high Endo H sensitivity.
Extension of the Inhibitor Approach to High Throughput Settings: s19A
It has been argued that SG methodologies should be broadened to better accommodate targets of higher technical difficulty and greater scientific “impact” [119, 120]. This includes proteins unsuited to the dominant technology implemented in existing SG pipelines, e.g. glycoproteins not easily expressed in bacterial systems. Stable mammalian cell-based protein expression cannot readily be implemented in a high-throughput setting owing to variation in the scale of the tissue culture procedures required, i.e. from 200 μl cultures through to multiple one-litre Nunc™ “Cell Factories”. The range in scale arises because individual clones vary considerably in expression, necessitating their individual selection. However, it was noted that the efficiency and scalability of transient mammalian expression are now comparable to those of high-throughput bacterial systems, offering scope for mammalian culture to be adapted to high-throughput settings [9]. Yields of glycoproteins rapidly and transiently expressed in human embryonic kidney (HEK) 293T cells frequently exceed 5 mg/l [119].
Therefore, these cells were chosen for establishing methods for producing Endo H-sensitive glycoproteins by transient transfection.
Following the examples set by expression in Lec3.2.8.1 cells and in the presence of NB-DNJ, we compared a glycan-processing mutant of HEK 293 cells with 293T cells cultured in the presence of two processing inhibitors. The effects of the inhibitors on glycan processing in 293T cells, revealed by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), are shown in Fig. 6.7. HEK 293S cells deficient in GnT I, which catalyzes the formation of hybrid-type glycans via transfer of β1-2-linked GlcNAc to the 3-antenna of the oligomannose
Fig. 6.7 Manipulation of the N-linked glycosylation pathway in a high-throughput setting. (a) Schematic representation of the N-linked glycosylation pathway. The initial α-glucosidase reactions have been omitted (α-MI, α1-2-mannosidases; GnT I, GlcNAc transferase I; α-MII, Golgi α-mannosidase II). The stages inhibited by the mannosidase inhibitors kifunensine (Kif) and swainsonine (Swa) are indicated. MALDI-TOF MS spectra of the N-linked glycans from the extracellular region of a mammalian type-1 membrane glycoprotein (s19A) transiently expressed in HEK 293T cells in the presence of kifunensine (b) and swainsonine (c) [143] are shown. The hydrolysis of the Man9GlcNAc2 glycan by Endo H is also shown in panel b. The insets show crystals of EphA4 and the receptor-binding domain of ephrin A2 [125] (b) and of s19A [9] (c), prepared using kifunensine and swainsonine, respectively. The key to the symbols used is presented in the legend of Fig. 6.2 (mass spectrometry data were provided courtesy of Prof. David J. Harvey, University of Oxford)
substrate, Man5GlcNAc2, secreted protein bearing only Man5GlcNAc2-type N-linked glycans [121]. In our hands, the extreme Endo H sensitivity of material derived from GnT I-deficient HEK 293S cells [9] contrasted with that of glycoproteins from GnT I-deficient (i.e. Lec1) CHO cells (<50% sensitivity; SJD et al. unpublished data), or even Lec3.2.8.1 cells (50−70% sensitivity [5]). HEK 293S cells may therefore lack an α-mannosidase activity present in CHO cells. The deglycosylated extracellular region of a human tyrosine phosphatase, sRPTPμ, comprising six fibronectin type III domains and expressed in these cells, gave crystals that diffracted to 3 Å resolution, whereas crystals of the fully glycosylated form of the protein diffracted only to >8 Å (AR Aricescu, personal communication). A drawback of the GnT I-deficient HEK 293S cells, however, was that expression was only 10−50% of the level obtainable in HEK 293T cells. N-linked glycan processing inhibitors were therefore trialed in 293T cells, taking advantage of the greater expression capability of these cells [9]. The alkaloids kifunensine and swainsonine are potent inhibitors of α-mannosidases I and II, respectively (Fig. 6.7a). Unexpectedly, in the presence of these inhibitors, 293T cells produced 10−30% more protein than untreated 293T cells [9], possibly owing to inhibition of ER-associated degradation [122]. N-linked glycans released from protein expressed in the presence of kifunensine and swainsonine were of the oligomannose and hybrid types, respectively (Fig. 6.7b, c). Both forms of a test protein (s19A) were essentially glycan-free after digestion with Endo H at pH 5.2 [9]. Kifunensine had two advantages over swainsonine: it prevents the formation of core-fucosylated hybrid structures, which are somewhat Endo H-resistant, and it is active at 2−4-fold lower concentrations, making it more cost-effective.
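The logic of Fig. 6.7a can be summarized as a short, deliberately simplified model: processing proceeds through α-mannosidase I, GnT I and α-mannosidase II, and the glycan that accumulates is the substrate of the first blocked step. The Python sketch below encodes only that idea. The step and treatment labels are hypothetical identifiers, the α-glucosidase steps are omitted as in the figure, and the partial Endo H resistance of core-fucosylated hybrids is ignored.

```python
from typing import Iterable

# Simplified route from Fig. 6.7a (initial α-glucosidase steps omitted).
# Each entry: (processing step, glycan that accumulates if that step is blocked).
PATHWAY = [
    ("alpha-MI", "Man9GlcNAc2 (oligomannose)"),   # α1-2-mannosidases
    ("GnT I", "Man5GlcNAc2 (oligomannose)"),      # GlcNAc transferase I
    ("alpha-MII", "GlcNAcMan5GlcNAc2 (hybrid)"),  # Golgi α-mannosidase II
]
COMPLEX = "complex-type"

# Step blocked by each treatment (hypothetical labels, for illustration).
BLOCKS = {
    "kifunensine": "alpha-MI",
    "GnT I-deficient 293S": "GnT I",
    "swainsonine": "alpha-MII",
}


def predicted_glycan(treatments: Iterable[str]) -> str:
    """Glycan expected to accumulate: processing stops at the first blocked step."""
    blocked = {BLOCKS[t] for t in treatments}
    for step, accumulated in PATHWAY:
        if step in blocked:
            return accumulated
    return COMPLEX  # nothing blocked: processing continues to complex glycans


def endo_h_sensitive(glycan: str) -> bool:
    """Endo H cleaves oligomannose and hybrid glycans but not complex ones.

    Core-fucosylated hybrids, described in the text as partly resistant,
    are not modelled here.
    """
    return glycan != COMPLEX


for treatment in (["kifunensine"], ["swainsonine"], ["GnT I-deficient 293S"],
                  ["kifunensine", "swainsonine"], []):
    glycan = predicted_glycan(treatment)
    print(treatment or ["none"], "->", glycan, "| Endo H-sensitive:", endo_h_sensitive(glycan))
```

Because the loop returns at the earliest blocked step, combining treatments yields the glycan of the most upstream block, which mirrors the vectorial processing argument made for NB-DNJ and the Lec3.2.8.1 phenotype.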
In addition to sRPTPμ [123], the HEK 293T/kifunensine/Endo H method has provided the structures of a range of targets with distinct protein folds. These include the Nipah virus envelope attachment glycoprotein [124], henipaviral attachment glycoproteins complexed with ephrin [124], ephrin/ephrin receptor complexes [125] and human hedgehog-interacting protein [126]. An additional advance is that kifunensine is effective in CHO cells (C Yu, MC and SJD, unpublished data).
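Peaks in MALDI-TOF MS profiles such as those in Fig. 6.7b, c are assigned by matching observed m/z values to glycan compositions. The Python sketch below computes expected monoisotopic m/z values, assuming singly charged sodium adducts, [M+Na]+, of free reducing glycans; this ion type is common for neutral N-glycans in positive-ion MALDI but is an assumption here, not something the chapter states. Residue masses are standard monoisotopic values, and the glycan list is illustrative. Because Endo H cleaves between the two core GlcNAc residues, the released product of Man9GlcNAc2 carries one fewer HexNAc.

```python
# Monoisotopic residue masses (Da) of the monosaccharides in these glycans.
RESIDUE_MASS = {
    "Hex": 162.05282,     # mannose (and other hexoses)
    "HexNAc": 203.07937,  # GlcNAc
    "dHex": 146.05791,    # fucose
}
WATER = 18.01056          # added once for a free reducing glycan
SODIUM_ION = 22.98922     # Na+ (sodium atom minus one electron)


def mz_sodiated(hex_: int, hexnac: int, dhex: int = 0) -> float:
    """m/z of the singly charged [M+Na]+ ion of a free, reducing glycan."""
    neutral = (hex_ * RESIDUE_MASS["Hex"] + hexnac * RESIDUE_MASS["HexNAc"]
               + dhex * RESIDUE_MASS["dHex"] + WATER)
    return neutral + SODIUM_ION


# Compositions as (Hex, HexNAc, dHex); labels are illustrative.
GLYCANS = {
    "Man9GlcNAc2 (kifunensine)": (9, 2, 0),
    "Man9GlcNAc1 (Endo H product)": (9, 1, 0),
    "Man5GlcNAc2 (GnT I-deficient 293S)": (5, 2, 0),
    "Man3GlcNAc2Fuc (insect cells)": (3, 2, 1),
}

for name, composition in GLYCANS.items():
    print(f"{name:36s} {mz_sodiated(*composition):9.2f}")
```

Real spectra also contain other adducts, isotope peaks and partially processed species, so a match on mass alone is a starting point for assignment rather than proof of structure.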
6.5 Does Glycan Removal Affect Protein Function?
An important consideration is whether or not the Endo H-treated proteins reflect the structural properties and behaviour of their native, glycosylated precursors. While many Ig-like molecules are N-glycosylated, the extent of glycosylation varies considerably and glycosylation sites are generally not conserved, even between homologues from different species. The ligand-binding function of rat CD2, which has four N-glycosylation sites, is not glycosylation-dependent [127, 128]. It was suggested, however, that other IgSF domains, such as the ligand-binding domain of human CD2, which has a single N-glycosylation site, “maintain their stable conformation as a result of dynamic interactions between the polypeptide and its attached glycan” [129]. This proposal, which was based on mutagenesis of the glycosylation site of CD2 and of residues surrounding it [129, 130], was surprising given that
the evolutionary success of the IgSF, and the high degree of conservation of the IgSF fold, are each thought to reflect the ability of IgSF protein domains to form stable platforms for the presentation of receptor–ligand recognition motifs [131, 132]. Moreover, untreated, Endo H-treated and PNGase F-treated hsCD2 were each shown to bind sCD58 with similar affinities [6]. The binding of three conformationally sensitive antibodies to all three forms of the protein further confirmed that the folding of CD2 is glycosylation-independent [6]. These observations did not rule out the possibility that the domain 1 glycan has some other role in CD2 expression, as implied by the mutational data of Wyss et al. and others. The problem with the mutational approach, however, is that it does not necessarily distinguish between the potential effects of glycosylation on (i) protein folding, (ii) post-folding conformational stability, or (iii) protein trafficking to or beyond the cell surface. The conformational stability of deglycosylated two-domain CD2 and the unimpaired trafficking of unglycosylated mutant forms of CD2 to the cell surface suggested that the domain 1 glycan influences only the initial folding of CD2. A key advantage of the glycan remodeling approach is that folding proceeds under normal conditions of glycosylation in the ER. Nevertheless, wherever possible, the structural and functional properties of the glycosylated and deglycosylated forms of the protein to be crystallized ought to be compared. In our experience, deglycosylated proteins are either highly soluble and functionally indistinguishable from the wild-type protein, or completely insoluble (as in the case of sgp120 and sCD28; SJD et al. unpublished observations).
6.6 Putting the Sugar Back
Although deglycosylation enhances the chances of a glycoprotein crystallizing, crystallization of the fully glycosylated target can also be attempted following homogenization of the glycans. This strategy is usually adopted for targets that aggregate upon deglycosylation. In these cases, the N-linked glycans often form stabilizing interactions with the protein surface that extend beyond the reducing-terminal GlcNAc. Expression of such targets with homogeneous glycans can yield electron density for protein-proximal sugar residues. A feature of the glycan remodeling approach is that each method traps the glycans as particular oligomannose structures. For example, kifunensine leaves Man9GlcNAc2 glycans, GnT I-deficient HEK 293S or CHO Lec3.2.8.1 cells leave Man5GlcNAc2 glycans, and D. melanogaster SC2 and baculovirus/Sf9 cells yield fucosylated Man3GlcNAc2 structures. This has the advantage that electron density corresponding to the N-linked glycans can be readily interpreted using knowledge of the shared framework of oligomannose glycans. Because all the oligomannose structures from these systems share a biosynthetic origin, i.e. Glc3Man9GlcNAc2, the electron density can be interpreted even in the absence of detailed carbohydrate analysis of the target glycoprotein. The methods outlined above for the homogenization of glycosylation can be applied to systems where the oligomannose glycoform of the target is biologically
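[Fig. 6.8 panels: (a) electron density of the N-linked glycan; (b) schematic of the oligomannose framework with residue labels 1, 2, 3, 4, 4′, A, B, C, D1, D2 and D3]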
Fig. 6.8 A blueprint for density interpretation. Electron density corresponding to N-linked glycans (a) can be readily interpreted if oligomannose-based expression systems have been used, owing to their conserved structure (b). The N-linked glycan is from the immature oligomannose glycoform of IgG Fc [136]. The glycan is displayed as sticks (oxygen, red; carbon, green; nitrogen, blue) with the associated 2Fo−Fc electron density map contoured at 1σ and shown as a blue mesh. The key to the symbols used is presented in Fig. 6.2. Molecular graphics were prepared using PyMOL
important. An example is the oligomannose glycoform of IgG1 Fc, which is reported to have enhanced antibody-dependent cellular cytotoxicity compared with Fc bearing fucosylated complex-type glycans [133–135]. Crystallographic analysis revealed extensive electron density corresponding to carbohydrate in the interdomain space between the opposing chains of the Cγ2-Cγ3 domains [136]. The structure of the most ordered glycan is shown in Fig. 6.8. Analysis of the glycosidic torsion angles using CARP plots implemented in the GLYCOSCIENCES.de web portal [137–141] revealed that all observable linkages were in the favoured region, emphasizing that oligosaccharides have internal structure. This is also evident in other structures in which N-linked glycans form either intramolecular interactions or intermolecular lattice contacts. With the advent of methods for the chemical homogenization of N-linked glycans, more structures of fully glycosylated glycoproteins are likely to be solved, particularly in cases where deglycosylation induces aggregation because the glycans mediate extensive contacts with the protein surface. The crystallographic analysis of glycoproteins has led to an increased need for readily accessible model-building and validation tools of similar sophistication to those available for protein model building. Analysis of the quality of PDB models containing oligosaccharides has revealed that perhaps as many as one-third of entries contain significant errors in carbohydrate stereochemistry, nomenclature or even consistency with the electron density maps [37, 142]. Common errors include significant deviations from the known chemistry of carbohydrates, such as high-energy boat conformations and distortions around the α-carbon, presumably due to the use of inadequate parameter files. PDBCare
identifies stereochemical errors and is particularly useful in the case of high-energy torsion angles [141]. Many models of glycoproteins contain monosaccharide residues and linkages incompatible with the known biosynthetic routes of glycans. Knowledge of the conserved framework for the oligomannose N-linked glycans generated using the methods described herein may help to minimize such errors.
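As a minimal sketch of how the conserved oligomannose framework could be used to catch biosynthetically impossible linkages, the Python below encodes Man9GlcNAc2 as a parent/linkage table and checks a list of modelled (child, parent, linkage) triples against it. It is not PDBCare or any existing validation tool. The residue labels follow those visible in Fig. 6.8b and are assumed here to be the conventional numbering; the root GlcNAc (residue 1) is taken to be present and Asn-linked. A real check would also have to cover anomeric configuration, ring conformation and agreement with the density.

```python
from typing import Iterable, List, Tuple

# Oligomannose framework of Man9GlcNAc2 as child -> (parent, linkage), using
# the residue labels of Fig. 6.8b (assumed to be the conventional numbering).
# Residue "1" is the Asn-linked GlcNAc and has no parent.
FRAMEWORK = {
    "2": ("1", "b1-4"),    # second core GlcNAc
    "3": ("2", "b1-4"),    # core beta-mannose
    "4": ("3", "a1-3"),    # alpha1-3 arm
    "4'": ("3", "a1-6"),   # alpha1-6 arm
    "A": ("4'", "a1-3"),
    "B": ("4'", "a1-6"),
    "C": ("4", "a1-2"),
    "D1": ("C", "a1-2"),
    "D2": ("A", "a1-2"),
    "D3": ("B", "a1-2"),
}
ROOT = "1"

Linkage = Tuple[str, str, str]  # (child, parent, linkage)


def check_model(linkages: Iterable[Linkage]) -> List[str]:
    """Return descriptions of modelled linkages inconsistent with the framework."""
    problems = []
    modelled = set()
    for child, parent, link in linkages:
        expected = FRAMEWORK.get(child)
        if expected is None:
            problems.append(f"{child}: not a residue of the oligomannose framework")
        elif expected != (parent, link):
            problems.append(
                f"{child}: modelled as {link} to {parent}, "
                f"expected {expected[1]} to {expected[0]}"
            )
        modelled.add(child)
    # Every modelled residue should connect to the root through modelled residues.
    for child in sorted(modelled):
        if child in FRAMEWORK:
            parent = FRAMEWORK[child][0]
            if parent != ROOT and parent not in modelled:
                problems.append(f"{child}: parent residue {parent} is missing")
    return problems


# Man5GlcNAc2 (the GnT I-deficient / Lec3.2.8.1 product) is residues 1-4, 4', A and B.
man5 = [("2", "1", "b1-4"), ("3", "2", "b1-4"), ("4", "3", "a1-3"),
        ("4'", "3", "a1-6"), ("A", "4'", "a1-3"), ("B", "4'", "a1-6")]
print(check_model(man5))  # []
print(check_model([("2", "1", "b1-4"), ("3", "2", "a1-4")]))
```

The same table covers every trimmed product of the framework (Man9 down to Man3GlcNAc2), which is the practical sense in which a shared biosynthetic origin simplifies model building.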
6.7 Conclusions
The development of high-yield mammalian expression systems capable of generating glycoproteins with homogeneous and cleavable mannose-based glycosylation means that, for most proteins, glycosylation is no longer a barrier to crystallization. A decision tree for negotiating the options available for preparing glycoproteins for crystallization is given in Fig. 6.9. The implementation of these methods in high-throughput settings, along with the advent of miniaturization and robotic screening of crystallization conditions, should help relieve important bottlenecks in SG pipelines targeting eukaryotic proteins. The use of mannose-based expression systems also provides a useful template for crystallographic model building that,
[Fig. 6.9 flow diagram. Nodes: TARGET; bacterial expression (success, failure or poor); Sf9 expression (good); 293T expression; variable occupancy? (yes: LC/MS/MS analysis; no); expression level (low–moderate or high); express in 293T with kifunensine; express in GnT1−/− 293S; Endo H + D deglycosylation; Endo H simplification; Endo H; PURIFICATION AND CRYSTALLIZATION]
Fig. 6.9 Flow diagram outlining strategy decisions for the high-throughput preparation of glycoproteins for crystallization. The three major pathways for protein expression, i.e. bacterial, insect and mammalian (HEK 293T) cell expression, are coloured pink, yellow and blue, respectively. In the first instance, we propose that the potential effects of variable sequon occupancy should not be considered.
together with monitoring of the stereochemistry, should improve the quality of deposited structures [37, 138]. These solutions to the “glycosylation problem” ought to bolster efforts to understand the important roles that glycoproteins play in eukaryotic and viral biology.
Acknowledgements
The authors wish to thank Veronica Chang, Radu Aricescu, Ray Owens, Jo Nettleship, Yvonne Jones, David Stuart, Neil Barclay, Chris Scanlan, Pauline Rudd, David Harvey, Mark Wormald, Tom Bowden and Raymond Dwek for many helpful discussions. We are particularly grateful to Ed Evans for his intellectual input and for his preparation of Fig. 6.9. This work was funded by the Wellcome Trust, the United Kingdom Medical Research Council and the Oxford Glycobiology Institute Endowment.
References
1. Pantazatos D, Kim JS, Klock HE, Stevens RC, Wilson IA, Lesley SA, Woods VL Jr (2004) Rapid refinement of crystallographic protein construct definition employing enhanced hydrogen/deuterium exchange MS. Proc Natl Acad Sci USA 101:751–756
2. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8
3. Doerr A (2009) Membrane protein structures. Nat Methods 6: doi:10.1038/nmeth.f.240
4. Terwilliger TC, Stuart D, Yokoyama S (2009) Lessons from structural genomics. Annu Rev Biophys 38:371–383
5. Butters TD, Sparks LM, Harlos K, Ikemizu S, Stuart DI, Jones EY, Davis SJ (1999) Effects of N-butyldeoxynojirimycin and the Lec3.2.8.1 mutant phenotype on N-glycan processing in Chinese hamster ovary cells: application to glycoprotein crystallization. Protein Sci 8:1696–1701
6. Davis SJ et al (1995) Ligand binding by the immunoglobulin superfamily recognition molecule CD2 is glycosylation-independent. J Biol Chem 270:369–375
7. Davis SJ, Puklavec MJ, Ashford DA, Harlos K, Jones EY, Stuart DI, Williams AF (1993) Expression of soluble recombinant glycoproteins with predefined glycosylation: application to the crystallization of the T-cell glycoprotein CD2. Protein Eng 6:229–232
8. Pusey ML, Liu ZJ, Tempel W, Praissman J, Lin D, Wang BC, Gavira JA, Ng JD (2005) Life in the fast lane for protein crystallization and X-ray crystallography. Prog Biophys Mol Biol 88:359–386
9. Chang VT et al (2007) Glycoprotein structural genomics: solving the glycosylation problem. Structure 15:267–273
10. Aricescu AR, Lu W, Jones EY (2006) A time and cost efficient system for high level protein production in mammalian cells. Acta Crystallogr D Biol Crystallogr 10:1243–1250
11. Davies A, Greene A, Lullau E, Abbott WM (2005) Optimisation and evaluation of a high-throughput mammalian protein expression system. Protein Expr Purif 42:111–121
12. Durocher Y, Perret S, Kamen A (2002) High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res 30:E9
13. Geisse S, Henke M (2005) Large-scale transient transfection of mammalian cells: a newly emerging attractive option for recombinant protein production. J Struct Funct Genomics 6:165–170
14. Bause E (1983) Structural requirements of N-glycosylation of proteins. Studies with proline peptides as conformational probes. Biochem J 209:331–336
15. Sato C, Kim JH, Abe Y, Saito K, Yokoyama S, Kohda D (2000) Characterization of the N-oligosaccharides attached to the atypical Asn-X-Cys sequence of recombinant human epidermal growth factor receptor. J Biochem (Tokyo) 127:65–72
16. Parodi AJ (2000) Protein glucosylation and its role in protein folding. Annu Rev Biochem 69:69–93
17. Gilbert RJ, Fucini P, Connell S, Fuller SD, Nierhaus KH, Robinson CV, Dobson CM, Stuart DI (2004) Three-dimensional structures of translating ribosomes by Cryo-EM. Mol Cell 14:57–66
18. Land A, Zonneveld D, Braakman I (2003) Folding of HIV-1 envelope glycoprotein involves extensive isomerization of disulfide bonds and conformation-dependent leader peptide cleavage. FASEB J 17:1058–1067
19. Kuznetsov G, Chen LB, Nigam SK (1997) Multiple molecular chaperones complex with misfolded large oligomeric glycoproteins in the endoplasmic reticulum. J Biol Chem 272:3057–3063
20. Daniels R, Kurowski B, Johnson AE, Hebert DN (2003) N-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. Mol Cell 11:79–90
21. van Anken E, Romijn EP, Maggioni C, Mezghrani A, Sitia R, Braakman I, Heck AJ (2003) Sequential waves of functionally related proteins are expressed when B cells prepare for antibody secretion. Immunity 18:243–253
22. Taylor SC, Ferguson AD, Bergeron JJ, Thomas DY (2004) The ER protein folding sensor UDP-glucose glycoprotein-glucosyltransferase modifies substrates distant to local changes in glycoprotein conformation. Nat Struct Mol Biol 11:128–134
23. Caramelo JJ, Castro OA, Alonso LG, De Prat-Gay G, Parodi AJ (2003) UDP-Glc:glycoprotein glucosyltransferase recognizes structured and solvent accessible hydrophobic patches in molten globule-like folding intermediates. Proc Natl Acad Sci USA 100:86–91
24. Crispin MDM, Ritchie GE, Critchley AJ, Morgan BP, Wilson IA, Dwek RA, Sim RB, Rudd PM (2004) Monoglucosylated glycans in the secreted human complement component C3: implications for protein biosynthesis and structure. FEBS Lett 566:270–274
25. Ryu KS et al (2009) The presence of monoglucosylated N196-glycan is important for the structural stability of storage protein, arylphorin. Biochem J 421:87–96
26. Evans EJ et al (2005) Crystal structure of a soluble CD28-Fab complex. Nat Immunol 6:271–279
27. Derewenda ZS, Vekilov PG (2006) Entropy and surface engineering in protein crystallization. Acta Crystallogr D Biol Crystallogr 62:116–124
28. Walter TS et al (2006) Lysine methylation as a routine rescue strategy for protein crystallization. Structure 14:1617–1622
29. Kim Y et al (2008) Large-scale evaluation of protein reductive methylation for improving protein crystallization. Nat Methods 5:853–854
30. Julenius K, Molgaard A, Gupta R, Brunak S (2005) Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites. Glycobiology 15:153–164
31. Merry AH et al (2003) O-glycan sialylation and the structure of the stalk-like region of the T cell co-receptor CD8. J Biol Chem 278:27119–27128
32. Lukacik P et al (2004) Complement regulation at the molecular level: the structure of decay-accelerating factor. Proc Natl Acad Sci USA 101:1279–1284
33. Yang ZR, Thomson R, McNeil P, Esnouf RM (2005) RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 21:3369–3376
34. Boggon TJ, Murray J, Chappuis-Flament S, Wong E, Gumbiner BM, Shapiro L (2002) C-cadherin ectodomain structure and implications for cell adhesion mechanisms. Science 296:1308–1313
35. Petrescu AJ, Milac AL, Petrescu SM, Dwek RA, Wormald MR (2004) Statistical analysis of the protein environment of N-glycosylation sites: implications for occupancy, structure, and folding. Glycobiology 14:103–114
36. Leahy DJ, Axel R, Hendrickson WA (1992) Crystal structure of a soluble form of the human T cell coreceptor CD8 at 2.6 Å resolution. Cell 68:1145–1162
6
Solutions to the Glycosylation Problem
153
37. Crispin M, Stuart DI, Jones EY (2007) Building meaningful models of glycoproteins. Nat Struct Mol Biol 14:1
38. Herscovics A (2001) Structure and function of Class I α1,2-mannosidases involved in glycoprotein synthesis and endoplasmic reticulum quality control. Biochimie 83:757–762
39. Lal A, Pang P, Kalelkar S, Romero PA, Herscovics A, Moremen KW (1998) Substrate specificities of recombinant murine Golgi α1,2-mannosidases IA and IB and comparison with endoplasmic reticulum and Golgi processing α1,2-mannosidases. Glycobiology 8:981–995
40. Tremblay LO, Herscovics A (2000) Characterization of a cDNA encoding a novel human Golgi α1,2-mannosidase (IC) involved in N-glycan biosynthesis. J Biol Chem 275:31655–31660
41. Schachter H (1991) The ‘yellow brick road’ to branched complex N-glycans. Glycobiology 1:453–461
42. Tan K et al (2002) Crystal structure of murine sCEACAM1a: a coronavirus receptor in the CEA family. EMBO J 21:2076–2086
43. Rudd PM, Morgan BP, Wormald MR, Harvey DJ, van den Berg CW, Davis SJ, Ferguson MA, Dwek RA (1997) The glycosylation of the complement regulatory protein, human erythrocyte CD59. J Biol Chem 272:7229–7244
44. Leonard R, Rendic D, Rabouille C, Wilson IB, Preat T, Altmann F (2006) The Drosophila fused lobes gene encodes an N-acetylglucosaminidase involved in N-glycan processing. J Biol Chem 281:4867–4875
45. Tomiya N et al (2006) Purification, characterization, and cloning of a Spodoptera frugiperda Sf9 β-N-acetylhexosaminidase that hydrolyzes terminal N-acetylglucosamine on the N-glycan core. J Biol Chem 281:19545–19560
46. Woods RJ, Pathiaseril A, Wormald MR, Edge CJ, Dwek RA (1998) The high degree of internal flexibility observed for an oligomannose oligosaccharide does not alter the overall topology of the molecule. Eur J Biochem 258:372–386
47. Wormald MR, Edge CJ (1993) The systematic use of negative nuclear Overhauser constraints in the determination of oligosaccharide conformations: application to sialyl-Lewis X. Carbohydr Res 246:337–344
48. Weihofen WA, Liu J, Reutter W, Saenger W, Fan H (2004) Crystal structure of CD26/dipeptidyl-peptidase IV in complex with adenosine deaminase reveals a highly amphiphilic interface. J Biol Chem 279:43330–43335
49. Wormald MR, Petrescu AJ, Pao YL, Glithero A, Elliott T, Dwek RA (2002) Conformational studies of oligosaccharides and glycopeptides: complementarity of NMR, X-ray crystallography, and molecular modelling. Chem Rev 102:371–386
50. Wormald MR, Dwek RA (1999) Glycoproteins: glycan presentation and protein-fold stability. Structure 7:R155–R160
51. Syed RS et al (1998) Efficiency of signalling through cytokine receptors depends critically on receptor orientation. Nature 395:511–516
52. Yamamoto-Katayama S, Ariyoshi M, Ishihara K, Hirano T, Jingami H, Morikawa K (2002) Crystallographic studies on human BST-1/CD157 with ADP-ribosyl cyclase and NAD glycohydrolase activities. J Mol Biol 316:711–723
53. Yamamoto-Katayama S et al (2001) Site-directed removal of N-glycosylation sites in BST1/CD157: effects on molecular and functional heterogeneity. Biochem J 357:385–392
54. Nachon F, Nicolet Y, Viguie N, Masson P, Fontecilla-Camps JC, Lockridge O (2002) Engineering of a monomeric and low-glycosylated form of human butyrylcholinesterase: expression, purification, characterization and crystallization. Eur J Biochem 269:630–637
55. Gordon K, Redelinghuys P, Schwager SL, Ehlers MR, Papageorgiou AC, Natesh R, Acharya KR, Sturrock ED (2003) Deglycosylation, processing and crystallization of human testis angiotensin-converting enzyme. Biochem J 371:437–442
56. Wilson KP, Liao DI, Bullock T, Remington SJ, Breddam K (1990) Crystallization of serine carboxypeptidases. J Mol Biol 211:301–303
154
S.J. Davis and M. Crispin
57. Sivaraman J, Coloumbe R, Magny MC, Mason P, Mort JS, Cygler M (1996) Crystallization of rat procathepsin B. Acta Crystallogr D Biol Crystallogr 52:874–875
58. Petrescu AJ, Wormald MR, Dwek RA (2006) Structural aspects of glycomes with a focus on N-glycosylation and glycoprotein folding. Curr Opin Struct Biol 16:600–607
59. Branza-Nichita N, Negroiu G, Petrescu AJ, Garman EF, Platt FM, Wormald MR, Dwek RA, Petrescu SM (2000) Mutations at critical N-glycosylation sites reduce tyrosinase activity by altering folding and quality control. J Biol Chem 275:8169–8175
60. Young RJ et al (1995) Secretion of recombinant human IgE-Fc by mammalian cells and biological activity of glycosylation site mutants. Protein Eng 8:193–199
61. Nettleship JE, Aplin R, Radu Aricescu A, Evans EJ, Davis SJ, Crispin M, Owens RJ (2006) Analysis of variable N-glycosylation site occupancy in glycoproteins by liquid chromatography electrospray ionization mass spectrometry. Anal Biochem 361:149–151
62. Joshi S, Katiyar S, Lennarz WJ (2005) Misfolding of glycoproteins is a prerequisite for peptide:N-glycanase mediated deglycosylation. FEBS Lett 579:823–826
63. Calderone V, Di Paolo ML, Trabucco M, Biadene M, Battistutta R, Rigo A, Zanotti G (2003) Crystallization and preliminary X-ray data of amine oxidase from bovine serum. Acta Crystallogr D Biol Crystallogr 59:727–729
64. Lunelli M, Di Paolo ML, Biadene M, Calderone V, Battistutta R, Scarpa M, Rigo A, Zanotti G (2005) Crystal structure of amine oxidase from bovine serum. J Mol Biol 346:991–1004
65. Lee JE, Fusco ML, Hessell AJ, Oswald WB, Burton DR, Saphire EO (2008) Structure of the Ebola virus glycoprotein bound to an antibody from a human survivor. Nature 454:177–182
66. Johanson K et al (1995) Binding interactions of human interleukin 5 with its receptor α subunit. Large scale production, structural, and functional studies of Drosophila-expressed recombinant proteins. J Biol Chem 270:9459–9471
67. Hoover DM, Schalk-Hihi C, Chou CC, Menon S, Wlodawer A, Zdanov A (1999) Purification of receptor complexes of interleukin-10 stoichiometry and the importance of deglycosylation in their crystallization. Eur J Biochem 262:134–141
68. Fuentes-Prior P et al (2000) Structural basis for the anticoagulant activity of the thrombin-thrombomodulin complex. Nature 404:518–525
69. Baker HM, Day CL, Norris GE, Baker EN (1994) Enzymatic deglycosylation as a tool for crystallization of mammalian binding proteins. Acta Crystallogr D Biol Crystallogr 50:380–384
70. Day CL, Norris GE, Anderson BF, Tweedie JW, Baker EN (1992) Preliminary crystallographic studies of the amino terminal half of human lactoferrin in its iron-saturated and iron-free forms. J Mol Biol 228:973–974
71. Ducros V et al (1997) Crystallization and preliminary X-ray analysis of the laccase from Coprinus cinereus. Acta Crystallogr D Biol Crystallogr 53:605–607
72. Graves BJ et al (1994) Insight into E-selectin/ligand interaction from the crystal structure and mutagenesis of the lec/EGF domains. Nature 367:532–538
73. Lemieux MJ, Reithmeier RA, Wang DN (2002) Importance of detergent and phospholipid in the crystallization of the human erythrocyte anion-exchanger membrane domain. J Struct Biol 137:322–332
74. Cheng A, van Hoek AN, Yeager M, Verkman AS, Mitra AK (1997) Three-dimensional organization of a human water channel. Nature 387:627–630
75. Sui H, Walian PJ, Tang G, Oh A, Jap BK (2000) Crystallization and preliminary X-ray crystallographic analysis of water channel AQP1. Acta Crystallogr D Biol Crystallogr 56:1198–1200
76. Krapp S, Mimura Y, Jefferis R, Huber R, Sondermann P (2003) Structural analysis of human IgG-Fc glycoforms reveals a correlation between glycosylation and structural integrity. J Mol Biol 325:979–989
77. Tegoni M, Spinelli S, Verhoeyen M, Davis P, Cambillau C (1999) Crystal structure of a ternary complex between human chorionic gonadotropin (hCG) and two Fv fragments specific for the α and β-subunits. J Mol Biol 289:1375–1385
78. Ogawa H, Qiu Y, Ogata CM, Misono KS (2004) Crystal structure of hormone-bound atrial natriuretic peptide receptor extracellular domain: rotation mechanism for transmembrane signal transduction. J Biol Chem 279:28625–28631
79. Tsiftsoglou SA et al (2006) Human complement factor I glycosylation: structural and functional characterisation of the N-linked oligosaccharides. Biochim Biophys Acta 1764:1757–1766
80. Zajonc DM, Elsliger MA, Teyton L, Wilson IA (2003) Crystal structure of CD1a in complex with a sulfatide self antigen at a resolution of 2.15 Å. Nat Immunol 4:808–815
81. Zajonc DM, Maricic I, Wu D, Halder R, Roy K, Wong CH, Kumar V, Wilson IA (2005) Structural basis for CD1d presentation of a sulfatide derived from myelin and its implications for autoimmunity. J Exp Med 202:1517–1526
82. O’Leary JM, Radcliffe CM, Willis AC, Dwek RA, Rudd PM, Downing AK (2004) Identification and removal of O-linked and non-covalently linked sugars from recombinant protein produced using Pichia pastoris. Protein Expr Purif 38:217–227
83. Tarentino AL, Quinones G, Changchien LM, Plummer TH, Jr. (1993) Multiple endoglycosidase F activities expressed by Flavobacterium meningosepticum endoglycosidases F2 and F3. Molecular cloning, primary sequence, and enzyme expression. J Biol Chem 268:9702–9708
84. Tarentino AL, Quinones G, Schrader WP, Changchien LM, Plummer TH, Jr. (1992) Multiple endoglycosidase (Endo) F activities expressed by Flavobacterium meningosepticum. Endo F1: molecular cloning, primary sequence, and structural relationship to Endo H. J Biol Chem 267:3868–3872
85. Robbins PW, Wirth DF, Hering C (1981) Expression of the Streptomyces enzyme endoglycosidase H in Escherichia coli. J Biol Chem 256:10640–10644
86. Robbins PW et al (1984) Primary structure of the Streptomyces enzyme endo-β-N-acetylglucosaminidase H. J Biol Chem 259:7577–7583
87. Muramatsu H, Tachikui H, Ushida H, Song X, Qiu Y, Yamamoto S, Muramatsu T (2001) Molecular cloning and expression of endo-β-N-acetylglucosaminidase D, which acts on the core structure of complex type asparagine-linked oligosaccharides. J Biochem (Tokyo) 129:923–928
88. Collin M, Olsen A (2001) EndoS, a novel secreted protein from Streptococcus pyogenes with endoglycosidase activity on human IgG. EMBO J 20:3046–3055
89. Garcia KC, Degano M, Stanfield RL, Brunmark A, Jackson MR, Peterson PA, Teyton L, Wilson IA (1996) An αβ T cell receptor structure at 2.5 Å and its orientation in the TCR-MHC complex. Science 274:209–219
90. Kwong PD et al (1999) Probability analysis of variational crystallization and its application to gp120, the exterior envelope glycoprotein of type 1 human immunodeficiency virus (HIV-1). J Biol Chem 274:4115–4123
91. Zajonc DM et al (2005) Molecular mechanism of lipopeptide presentation by CD1a. Immunity 22:209–219
92. Kostrewa D, Gruninger-Leitch F, D’Arcy A, Broger C, Mitchell D, van Loon AP (1997) Crystal structure of phytase from Aspergillus ficuum at 2.5 Å resolution. Nat Struct Biol 4:185–190
93. Sato T et al (2004) A unique dye-decolorizing peroxidase, DyP, from Thanatephorus cucumeris Dec 1: heterologous expression, crystallization and preliminary X-ray analysis. Acta Crystallogr D Biol Crystallogr 60:149–152
94. Houborg K, Harris P, Poulsen JC, Schneider P, Svendsen A, Larsen S (2003) The structure of a mutant enzyme of Coprinus cinereus peroxidase provides an understanding of its increased thermostability. Acta Crystallogr D Biol Crystallogr 59:997–1003
95. Shimizu T, Nakatsu T, Miyairi K, Okuno T, Kato H (2001) Crystallization and preliminary X-ray study of endopolygalacturonase from the pathogenic fungus Stereum purpureum. Acta Crystallogr D Biol Crystallogr 57:1171–1173
96. Fusetti F et al (2002) Crystal structure of the copper-containing quercetin 2,3-dioxygenase from Aspergillus japonicus. Structure 10:259–268
97. Prota AE, Sage DR, Stehle T, Fingeroth JD (2002) The crystal structure of human CD21: Implications for Epstein-Barr virus and C3d binding. Proc Natl Acad Sci USA 99:10641–10646
98. Xiang T, Liu Q, Deacon AM, Koshy M, Kriksunov IA, Lei XG, Hao Q, Thiel DJ (2004) Crystal structure of a heat-resilient phytase from Aspergillus fumigatus, carrying a phosphorylated histidine. J Mol Biol 339:437–445
99. Rydberg EH et al (1999) Cloning, mutagenesis, and structural analysis of human pancreatic α-amylase expressed in Pichia pastoris. Protein Sci 8:635–643
100. Katz BA, Liu B, Barnes M, Springman EB (1998) Crystal structure of recombinant human tissue kallikrein at 2.0 Å resolution. Protein Sci 7:875–885
101. Oefner C, D’Arcy A, Hennig M, Winkler FK, Dale GE (2000) Structure of human neutral endopeptidase (Neprilysin) complexed with phosphoramidon. J Mol Biol 296:341–349
102. Dale GE, D’Arcy B, Yuvaniyama C, Wipf B, Oefner C, D’Arcy A (2000) Purification and crystallization of the extracellular domain of human neutral endopeptidase (neprilysin) expressed in Pichia pastoris. Acta Crystallogr D Biol Crystallogr 56:894–897
103. Ryu SE et al (1990) Crystal structure of an HIV-binding recombinant fragment of human CD4. Nature 348:419–426
104. Wang JH et al (1990) Atomic structure of a fragment of human CD4 containing two immunoglobulin-like domains. Nature 348:411–418
105. Davis SJ, Brady RL, Barclay AN, Harlos K, Dodson GG, Williams AF (1990) Crystallization of a soluble form of the rat T-cell surface glycoprotein CD4 complexed with Fab from the W3/25 monoclonal antibody. J Mol Biol 213:7–10
106. Ashford DA et al (1993) Site-specific glycosylation of recombinant rat and human soluble CD4 variants expressed in Chinese hamster ovary cells. J Biol Chem 268:3260–3267
107. Stanley P (1984) Glycosylation mutants of animal cells. Annu Rev Genet 18:525–552
108. Stanley P (1981) Selection of specific wheat germ agglutinin-resistant (WgaR) phenotypes from Chinese hamster ovary cell populations containing numerous lecR genotypes. Mol Cell Biol 1:687–696
109. Cockett MI, Bebbington CR, Yarranton GT (1990) High level expression of tissue inhibitor of metalloproteinases in Chinese hamster ovary cells using glutamine synthetase gene amplification. Biotechnology (N Y) 8:662–667
110. Jones EY, Davis SJ, Williams AF, Harlos K, Stuart DI (1992) Crystal structure at 2.8 Å resolution of a soluble form of the cell adhesion molecule CD2. Nature 360:232–239
111. Liu J et al (1996) Crystallization of a deglycosylated T cell receptor (TCR) complexed with an anti-TCR Fab fragment. J Biol Chem 271:33639–33646
112. Kern PS et al (1998) Structural basis of CD8 coreceptor function revealed by crystallographic analysis of a murine CD8αα ectodomain fragment in complex with H-2Kb. Immunity 9:519–530
113. Liu Y et al (2003) The crystal structure of a TL/CD8αα complex at 2.1 Å resolution: implications for modulation of T cell activation and memory. Immunity 18:205–215
114. Casasnovas JM, Stehle T, Liu JH, Wang JH, Springer TA (1998) A dimeric crystal structure for the N-terminal two domains of intercellular adhesion molecule-1. Proc Natl Acad Sci USA 95:4134–4139
115. Casasnovas JM, Springer TA, Liu JH, Harrison SC, Wang JH (1997) Crystal structure of ICAM-2 reveals a distinctive integrin recognition surface. Nature 387:312–315
116. Love CA, Harlos K, Mavaddat N, Davis SJ, Stuart DI, Jones EY, Esnouf RM (2003) The ligand-binding face of the semaphorins revealed by the high-resolution crystal structure of SEMA4D. Nat Struct Biol 10:843–848
117. Reinherz EL et al (1999) The crystal structure of a T cell receptor in complex with peptide and MHC class II. Science 286:1913–1921
118. Karlsson GB, Butters TD, Dwek RA, Platt FM (1993) Effects of the imino sugar N-butyldeoxynojirimycin on the N-glycosylation of recombinant gp120. J Biol Chem 268:570–576
119. Aricescu AR et al (2006) Eukaryotic Expression: Developments for Structural Proteomics. Acta Crystallogr D Biol Crystallogr 62:1114–1124
120. Chandonia JM, Brenner SE (2006) The impact of structural genomics: expectations and outcomes. Science 311:347–351
121. Reeves PJ, Callewaert N, Contreras R, Khorana HG (2002) Structure and function in rhodopsin: high-level expression of rhodopsin with restricted and homogeneous N-glycosylation by a tetracycline-inducible N-acetylglucosaminyltransferase I-negative HEK293S stable mammalian cell line. Proc Natl Acad Sci USA 99:13419–13424
122. Tokunaga F, Brostrom C, Koide T, Arvan P (2000) Endoplasmic reticulum (ER)-associated degradation of misfolded N-linked glycoproteins is suppressed upon inhibition of ER mannosidase I. J Biol Chem 275:40757–40764
123. Aricescu AR, Siebold C, Choudhuri K, Chang VT, Lu W, Davis SJ, van der Merwe PA, Jones EY (2007) Structure of a tyrosine phosphatase adhesive interaction reveals a spacer-clamp mechanism. Science 317:1217–1220
124. Bowden TA, Crispin M, Harvey DJ, Aricescu AR, Grimes JM, Jones EY, Stuart DI (2008) Crystal structure and carbohydrate analysis of Nipah virus attachment glycoprotein: a template for antiviral and vaccine design. J Virol 82:11628–11636
125. Bowden TA, Aricescu AR, Nettleship JE, Siebold C, Rahman-Huq N, Owens RJ, Stuart DI, Jones EY (2009) Structural plasticity of eph receptor A4 facilitates cross-class ephrin signaling. Structure 17:1386–1397
126. Bishop B, Aricescu AR, Harlos K, O’Callaghan CA, Jones EY, Siebold C (2009) Structural insights into hedgehog ligand sequestration by the human hedgehog-interacting protein HHIP. Nat Struct Mol Biol 16:698–703
127. van der Merwe PA, McPherson DC, Brown MH, Barclay AN, Cyster JG, Williams AF, Davis SJ (1993) The NH2-terminal domain of rat CD2 binds rat CD48 with a low affinity and binding does not require glycosylation of CD2. Eur J Immunol 23:1373–1377
128. van der Merwe PA, Brown MH, Davis SJ, Barclay AN (1993) Affinity and kinetic analysis of the interaction of the cell adhesion molecules rat CD2 and CD48. EMBO J 12:4945–4954
129. Wyss DF et al (1995) Conformation and function of the N-linked glycan in the adhesion domain of human CD2. Science 269:1273–1278
130. Recny MA et al (1992) N-glycosylation is required for human CD2 immunoadhesion functions. J Biol Chem 267:22428–22434
131. Williams AF, Barclay AN (1988) The immunoglobulin superfamily–domains for cell surface recognition. Annu Rev Immunol 6:381–405
132. Williams AF (1987) A year in the life of the immunoglobulin superfamily. Immunol Today 8:298–303
133. Niwa R et al (2004) Enhancement of the antibody-dependent cellular cytotoxicity of low-fucose IgG1 is independent of FcγRIIIa functional polymorphism. Clin Cancer Res 10:6248–6255
134. Niwa R, Sakurada M, Kobayashi Y, Uehara A, Matsushima K, Ueda R, Nakamura K, Shitara K (2005) Enhanced natural killer cell binding and activation by low-fucose IgG1 antibody results in potent antibody-dependent cellular cytotoxicity induction at lower antigen density. Clin Cancer Res 11:2327–2336
135. Niwa R et al (2004) Defucosylated chimeric anti-CC chemokine receptor 4 IgG1 with enhanced antibody-dependent cellular cytotoxicity shows potent therapeutic activity to T-cell leukemia and lymphoma. Cancer Res 64:2127–2133
136. Crispin M, Bowden TA, Coles CH, Harlos K, Aricescu AR, Harvey DJ, Stuart DI, Jones EY (2009) Carbohydrate and domain architecture of an immature antibody glycoform exhibiting enhanced effector functions. J Mol Biol 387:1061–1066
137. Frank M, Lutteke T, von der Lieth CW (2007) GlycoMapsDB: a database of the accessible conformational space of glycosidic linkages. Nucleic Acids Res 35:287–290
138. Lutteke T (2009) Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr D Biol Crystallogr 65:156–168
139. Lutteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth CW (2006) GLYCOSCIENCES.de: an Internet portal to support glycomics and glycobiology research. Glycobiology 16:71R–81R
140. Lutteke T, Frank M, von der Lieth CW (2005) Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res 33:D242–D246
141. Lutteke T, von der Lieth CW (2004) pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics 5:69
142. Lutteke T, Frank M, von der Lieth CW (2004) Data mining the protein data bank: automatic detection and assignment of carbohydrate structures. Carbohydr Res 339:1015–1020
143. Crispin M et al (2009) A human embryonic kidney 293T cell line mutated at the Golgi α-mannosidase II locus. J Biol Chem 284:21684–21695
144. Frank M, Bohne-Lang A, Wetter T, Lieth CW (2002) Rapid generation of a representative ensemble of N-glycan conformations. In Silico Biol 2:427–439
145. Harvey DJ, Merry AH, Royle L, Campbell MP, Dwek RA, Rudd PM (2009) Proposal for a standard system for drawing structural diagrams of N- and O-linked carbohydrates and related compounds. Proteomics 9:3796–3801
146. Matsumiya S et al (2007) Structural comparison of fucosylated and nonfucosylated Fc fragments of human immunoglobulin G1. J Mol Biol 368:767–779
147. Yennawar NH, Li LC, Dudzinski DM, Tabuchi A, Cosgrove DJ (2006) Crystal structure and activities of EXPB1 (Zea m 1), a β-expansin and group-1 pollen allergen from maize. Proc Natl Acad Sci USA 103:14664–14671
148. Henrich S, Cameron A, Bourenkov GP, Kiefersauer R, Huber R, Lindberg I, Bode W, Than ME (2003) The crystal structure of the proprotein processing proteinase furin explains its stringent specificity. Nat Struct Biol 10:520–526
149. Smith BJ et al (2006) Structure of a calcium-deficient form of influenza virus neuraminidase: implications for substrate binding. Acta Crystallogr D Biol Crystallogr 62:947–952
150. Bodian DL, Jones EY, Harlos K, Stuart DI, Davis SJ (1994) Crystal structure of the extracellular region of the human cell adhesion molecule CD2 at 2.5 Å resolution. Structure 2:755–766
151. Ikemizu S, Sparks LM, van der Merwe PA, Harlos K, Stuart DI, Jones EY, Davis SJ (1999) Crystal structure of the CD2-binding domain of CD58 (lymphocyte function-associated antigen 3) at 1.8-Å resolution. Proc Natl Acad Sci USA 96:4289–4294
152. Fusetti F et al (2002) Crystal structure of the copper-containing quercetin 2,3-dioxygenase from Aspergillus japonicus. Structure 10:259–268
Chapter 7
Role of Glycoproteins in Virus–Human Cell Interactions
Thomas A. Bowden and Elizabeth E. Fry
Abstract Glycosylation of viral proteins is clearly advantageous to virus survival, with roles in cell entry, proteolytic processing, trafficking and immune evasion. For enveloped RNA viruses, including many important human pathogens, entry into host cells tends to be mediated by viral glycoproteins. Structural studies of glycoproteins from different viral families have gradually elucidated the mechanisms by which this occurs. We illustrate this with examples from recent studies and show that clear differences exist between viruses that use separate glycoproteins for attachment and fusion and those that use a single glycoprotein for both functions. However, in all cases a similar end-point is reached. Understanding the biology of infection and of host responses should lead to the development of enhanced therapeutics.
Keywords Structural virology · Viral glycoprotein · Protein–protein interactions · Membrane fusion · Immune evasion
Abbreviations
C: core protein
CD46: cluster of differentiation 46
CFPV: canine feline parvovirus
DC-SIGN: Dendritic Cell-Specific Intercellular adhesion molecule-3-Grabbing Non-integrin
DENV: Dengue virus
α-DG: α-dystroglycan
E: envelope glycoprotein
EBOV: Ebola virus
ER: endoplasmic reticulum
G: attachment glycoprotein
E.E. Fry
The Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK
e-mail: [email protected]
R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_7, © Springer Science+Business Media B.V. 2011
159
160
T.A. Bowden and E.E. Fry
GPC: viral glycoprotein precursor
GTOV: Guanarito virus
H: hemagglutinin
HIV-1: human immunodeficiency virus type 1
hmGL: Human macrophage C-type lectin specific for galactose and N-acetylgalactosamine
HN: hemagglutinin-neuraminidase
HPIV: human parainfluenza virus
HR: heptad repeat motif
JEV: Japanese encephalitis virus
JUNV: Junin virus
LASV: Lassa virus
LCMV: Lymphocytic Choriomeningitis Virus
MACV: Machupo virus
MARV: Marburg virus
MMTV: mouse mammary tumour virus
MV-H: measles virus H
NA: neuraminidase
NDV: Newcastle disease virus
PDB: protein data bank
PIV: parainfluenza virus
prM: precursor membrane protein
SABV: Sabia virus
SLAM: signalling lymphocyte activation molecule
SSP: stable signal peptide
SV5: simian parainfluenza virus type 5
TBEV: tick-borne encephalitis virus
TfR1: human transferrin receptor 1
WNV: West Nile virus
Contents
7.1 Introduction . . . 161
7.2 Paramyxoviruses . . . 164
7.2.1 Paramyxovirus Attachment – Receptor Switching on a Common Scaffold . . . 165
7.2.2 Paramyxovirus Fusion – An Example of Type I Fusion Machines . . . 166
7.3 New World Arenavirus Glycoprotein Structure – A New Protein Fold . . . 167
7.4 Flaviviruses Combine Attachment and Fusion Functions in the E Protein . . . 169
7.5 Filoviruses – A Deadly Receptor-Binding Chalice . . . 170
7.6 Influenza Virus – Glycosylation Modulates Pathogenicity . . . 173
7.7 HIV – A Glycan Shield . . . 174
7.8 Conclusions . . . 174
References . . . 175
7
Role of Glycoproteins in Virus-Human Cell Interactions
161
7.1 Introduction
Throughout history, viruses have had a considerable impact upon both human health and the economy. Whilst the threat associated with some viruses has been reduced by the development of vaccines (e.g. smallpox, currently the only virus to have been eradicated across the globe [1]) and small-molecule drugs (e.g. Oseltamivir for the treatment of influenza virus infection [2]), there are still no established methods of preventing or treating most of the diseases caused by viruses. Our ability to protect against viral infection is enhanced by an understanding of the virus lifecycle, and in particular of the interaction of viruses with our own cells. For the many pathogenic viruses that possess a lipid envelope, these interactions tend to be mediated by viral glycoproteins, making these proteins prime targets for small-molecule antiviral drug design.
Post-translational modifications endow proteins with additional structural and functional diversity, and viruses have co-opted these cellular processes to adapt the proteins present on their surface (and hence presented to the host immune system). This affects both the way in which viruses infect cells and the host response. Given the tremendous diversity of viruses and the likelihood that the major viral lineages are extremely ancient [3], it is not surprising that different viruses use different strategies for processing the surface proteins involved in host interactions. For instance, the poxviruses use an array of proteins for membrane fusion, and none of the core proteins of this complex are glycosylated, whereas these highly complex viruses do produce a number of glycoproteins that modulate host responses to the virus [4]. Nevertheless, for most membrane-containing human viruses, the key viral glycoproteins involved in host cell interactions are those mediating cell attachment and membrane fusion.
These glycoproteins are most commonly modified by N-linked glycosylation, whereby a high-mannose oligosaccharide is transferred to the amide nitrogen of asparagine residues within the conserved motif Asn-X-Ser/Thr (where X is any amino acid except proline) and is subsequently processed in the endoplasmic reticulum (ER) and Golgi apparatus. Such glycosylation is usually crucial to the proper folding and trafficking of the molecule during infection [5]. Because viruses can evolve extremely rapidly, glycosylation sites are frequently gained and lost even between closely related viruses. Such changes can alter protein folding and conformation, and thus immune and receptor recognition.
The attachment of enveloped viruses to the host cell is usually dominated by interactions between glycoproteins present on the virus envelope and receptor molecules displayed on the cell surface. The physical properties of these cell-surface receptor molecules vary greatly, since they can be glycoproteins, carbohydrates or lipids. This wide range reflects the opportunistic capacity of viruses to change rapidly and adapt to the dissimilar cell-surface receptors available in different hosts, and is a root cause of viral zoonosis. Identification of the cell-surface receptor(s) used by a virus for attachment is crucial, as it often leads to a better understanding of cellular tropism and of the potential for the virus to be transmitted between species. However, there is currently no established rationale for inferring which cell-surface receptors are exploited by a specific virus, so the identity of a functional cell-surface receptor must usually be deduced experimentally. Technologies such as the glycan screening array developed by the Consortium for Functional Glycomics have been of great value for rapidly determining the specificity of viral glycoproteins for carbohydrate receptors [6]. Methods for identifying glycoprotein cell-surface receptors, on the other hand, are often more challenging and usually rely on pull-down assays to capture viral glycoprotein–receptor complexes, followed by mass spectrometric analysis to identify candidate receptors. For many viruses the situation is complicated by the existence of more than one receptor: viruses often attach to, or browse, potential host cells using high-abundance, low-affinity receptors but may require a higher-affinity receptor interaction to achieve cell entry. Nevertheless, the discovery of the functional receptor(s) used by a virus generally provides many clues as to the species and cell types that are permissive to viral infection. This is important for understanding the life cycle and zoonotic properties of a virus, information that is valuable for predicting and protecting against the emergence and spread of new pathogenic diseases.
The interaction between the viral attachment glycoprotein and the cell-surface receptor must be strong enough to retain the virus at the cell surface for long enough to allow subsequent fusion events (Fig. 7.1). In the case of glycoprotein–glycoprotein interactions, this interaction is usually strong (nM–μM affinity) and highly specific. The binding of a viral attachment glycoprotein to a cell-surface carbohydrate, on the other hand, is usually much weaker (mM affinity) and relies on the compensatory avidity effect of multiple receptor-pair interactions. Following attachment, merger of the virus envelope with the host cell membrane is facilitated by the viral fusion machinery. Some viruses rely on a single glycoprotein to bring about both attachment and fusion (e.g. the E protein of tick-borne encephalitis virus), whilst others use separate glycoproteins for each task. Whilst viral attachment glycoproteins display a broad variety of architectures, X-ray crystallographic analyses have revealed that the fusion glycoproteins visualized to date, despite their vast sequence diversity, fall into one of three structural classes: I, II or III.
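The Asn-X-Ser/Thr motif described above lends itself to a simple sequence scan. The Python sketch below is a minimal illustration, assuming a plain one-letter amino-acid string with no gaps or whitespace; the function name `find_sequons` and the demo sequence are hypothetical and are not taken from any study cited here. It reports potential sites only: a sequon is necessary but not sufficient for N-linked glycosylation, and the occupancy of individual sites can vary.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead (?=...)
# makes each match zero-width, so overlapping sequons (e.g. "NNST") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon.

    Assumes a one-letter amino-acid string without gaps. A hit marks only a
    *potential* N-glycosylation site; actual occupancy is not predicted.
    """
    seq = sequence.upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real viral glycoprotein.
    demo = "MNGSANPTQNVT"
    for pos in find_sequons(demo):
        print(pos, demo[pos - 1 : pos + 2])
```

In the toy sequence, the Asn at position 6 is skipped because it is followed by proline, mirroring the exclusion stated in the text.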
Fig. 7.1 Cartoon schematic of the attachment and fusion of enveloped viruses with their host cells. The virus attachment glycoprotein is shown in orange, the cellular receptor in blue and the viral fusion protein in yellow with a green fusion loop/peptide.
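The contrast drawn above between nM–μM glycoprotein–glycoprotein binding and mM carbohydrate binding, and the rescue of weak contacts by avidity, can be made concrete with two textbook relationships. The Python sketch below assumes simple 1:1 equilibrium binding and an idealized effective-concentration model of multivalency; the function names and all numerical values (1 μM free receptor, 10 mM effective concentration) are illustrative assumptions, not measurements from the studies cited in this chapter.

```python
def fraction_bound(free_receptor_m: float, kd_m: float) -> float:
    """Equilibrium occupancy of a single 1:1 site: theta = [R] / ([R] + Kd).

    Assumes the free receptor concentration [R] (in molar) is known,
    e.g. because receptor is in excess over virus.
    """
    return free_receptor_m / (free_receptor_m + kd_m)


def apparent_kd(kd_single_m: float, effective_conc_m: float, n_contacts: int) -> float:
    """Idealized multivalent (avidity) estimate of an apparent Kd.

    After the first contact forms, each further contact is assumed to bind
    at a fixed effective local concentration C_eff, giving
    Kd_app = Kd * (Kd / C_eff) ** (n - 1).
    Statistical (degeneracy) factors, strain and cooperativity are ignored.
    """
    if n_contacts < 1:
        raise ValueError("n_contacts must be at least 1")
    return kd_single_m * (kd_single_m / effective_conc_m) ** (n_contacts - 1)


if __name__ == "__main__":
    receptor = 1e-6  # illustrative 1 uM free receptor
    print("10 nM protein contact:", round(fraction_bound(receptor, 1e-8), 4))
    print("1 mM glycan contact:", round(fraction_bound(receptor, 1e-3), 4))
    # Illustrative: 1 mM per glycan contact, 10 mM effective concentration.
    for n in (1, 3, 5):
        print(n, "contacts -> apparent Kd (M):", apparent_kd(1e-3, 1e-2, n))
```

With these illustrative numbers, a single 1 mM contact occupies less than 0.1% of sites at 1 μM receptor, whereas five such contacts give an apparent Kd of 100 nM, within the range quoted for glycoprotein–glycoprotein interactions. Real multivalent attachment also depends on receptor density, geometry and flexibility, which this model ignores.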
Fig. 7.2 Cartoon representation of the three known classes of fusion glycoproteins in their pre- and post-fusion states. (a) Class I: the fusion protein of simian parainfluenza virus type 5 (SV5-F) in its pre-fusion state (top; PDB identification number 2B9B) and that of parainfluenza virus 3 (PIV3) in its post-fusion state (bottom; PDB identification number 1ZTM). (b) Class II: Dengue virus E protein in its pre- (top; PDB identification number 1OAN) and post-fusion (bottom; PDB identification number 3G7T) states. (c) Class III: vesicular stomatitis virus glycoprotein in its pre- (top; PDB identification number 2J6J) and post-fusion (bottom; PDB identification number 2CMZ) states. For each structure, a monomeric subunit is shown, coloured according to domain boundaries. Individual domains (DI–DV), heptad-repeat units (HR) and fusion loops/peptides (FL/FP) are labelled according to established nomenclature. Arrows highlight the domain rearrangements that occur between the pre- and post-fusion states.
164
T.A. Bowden and E.E. Fry
Class-I fusion proteins have either N-terminal or internal fusion peptides and are primarily α-helical in structure (Fig. 7.2a); class-II fusion proteins have an internal fusion peptide and are composed primarily of β-sheets (Fig. 7.2b); and class-III fusion proteins appear to be a composite of classes I and II, as they have an internal fusion peptide and a mixed composition of α-helices and β-sheets (Fig. 7.2c). Indeed, it has been proposed that all three classes are in fact related, with the similarities often obscured by the extreme conformational gymnastics that the molecules undergo as part of their function and by their construction as complex composites of small folded domains [7]. High-resolution crystal structures of representatives from all three classes have been determined and have revealed the extensive conformational changes that these proteins undergo upon interacting with their host: a process built around the insertion of the fusion peptide(s) into the host cell membrane, followed by a structural collapse or contraction that draws the virus and host cell membranes together and initiates fusion. The transformation from the “pre-fusion” to the “post-fusion” form is thermodynamically favorable; it appears to be irreversible in class-I and class-II fusion glycoproteins but is sometimes reversible in class-III fusion glycoproteins (possibly providing a fail-safe mechanism that protects against premature activation due to exposure to low-pH environments during virion assembly and budding). Entry of enveloped viruses into host cells takes place by one of two mechanisms: (i) receptor-mediated endocytosis, where the fusion protein is activated by a change in pH, or (ii) direct fusion of the virus envelope with the host plasma membrane, a pH-independent process in which the conformational changes in the fusion glycoprotein are triggered indirectly by the viral attachment glycoprotein upon cell-surface receptor binding. The structure and higher-order assembly of attachment and fusion glycoproteins on the virus surface have been revealed by an array of structural and functional approaches, most notably cryo-electron microscopy and X-ray crystallography. Because this chapter is concerned with descriptions at the atomic level, we focus primarily on structures from X-ray crystallographic studies.
We illustrate the structural diversity of viral glycoprotein architectures through case studies of viruses from the Paramyxovirus, Arenavirus, Flavivirus, and Filovirus families, which include some of the most pathogenic human viruses, whilst the impact of glycosylation on immune evasion is discussed with reference to Influenza virus and HIV. These are all enveloped RNA viruses; cell attachment and entry by DNA viruses and non-enveloped RNA viruses are less well understood.
7.2 Paramyxoviruses
The Paramyxoviridae family consists of single-stranded, negative-sense RNA viruses and includes a number of disease-causing pathogens. The Paramyxoviridae are divided into two subfamilies, Paramyxovirinae and Pneumovirinae [8]. The Paramyxovirinae subfamily is divided into five genera (Rubulavirus, Avulavirus, Respirovirus, Henipavirus, and Morbillivirus), and the Pneumovirinae subfamily into two genera (Pneumovirus and Metapneumovirus). Paramyxoviruses penetrate host cells by pH-independent fusion of the viral envelope with the host cell membrane [9]. Two outer membrane glycoproteins, which extend 8–12 nm from the viral envelope, are required for efficient entry: an attachment protein and a fusion glycoprotein. These associate to form a fusion and attachment assembly [10].
7.2.1 Paramyxovirus Attachment – Receptor Switching on a Common Scaffold
Depending upon the Paramyxovirus, the glycoprotein used for attachment to the host cell has different properties and is termed a hemagglutinin-neuraminidase (HN), a hemagglutinin (H), or an attachment glycoprotein (G). HN glycoproteins, such as those present on parainfluenza viruses (PIV) and Newcastle disease virus (NDV), can both aggregate (hemagglutinate) red blood cells and cleave the glycosidic linkages of sialic (N-acetylneuraminic) acid. H glycoproteins, on the other hand, have hemagglutination activity but lack neuraminidase activity; they are present on morbilliviruses such as measles, canine distemper, and rinderpest viruses, and have been shown to use signalling lymphocyte activation molecule (SLAM, CD150) [11, 12] or cluster of differentiation 46 (CD46) [13, 14] as cellular receptors for viral attachment. Finally, G glycoproteins lack both hemagglutination and neuraminidase activity and simply attach to a cell-surface protein, leading to activation of the fusion glycoprotein. Prominent members of this group include the Henipaviruses, which use the ephrinB2 and ephrinB3 cell-surface glycoproteins as high-affinity functional receptors [15–17]. All three classes of Paramyxovirus attachment glycoprotein are type-II transmembrane proteins consisting of an N-terminal cytoplasmic tail (∼50 amino acids), a transmembrane domain (∼20 amino acids), an α-helical stalk region responsible for glycoprotein dimerisation and tetramerisation (∼100 amino acids) [18, 19], and a C-terminal globular receptor-binding domain with a six-bladed β-propeller fold (∼400 amino acids). Because Paramyxoviruses fuse by a pH-independent mechanism, structural changes in the attachment glycoprotein upon receptor recognition are expected to trigger the conformational changes in the fusion glycoprotein that are necessary to initiate fusion.
X-ray crystallographic investigations of these three classes of attachment glycoprotein, alone and in complex with their respective cell-surface receptors, have defined the specificity that underlies viral receptor–cell receptor interactions. Despite the shared architecture of the C-terminal six-bladed β-propeller receptor-binding domain, the three classes of Paramyxovirus glycoprotein have evolved markedly different modes of receptor binding. Whilst glycan-binding Paramyxoviruses such as PIV and NDV rely on a conserved binding pocket of eight charged residues deep in the centre of the β-propeller to bind sialic acid (Fig. 7.3a) [20–23], H and G attachment glycoproteins bind to their respective protein receptors quite differently. The co-crystal structure of measles virus H (MV-H) bound to its cellular receptor CD46 reveals a comparatively large (∼2000 Å²) binding site located on the “side” of the β-propeller (Fig. 7.3b) [24]. The co-crystal structure of the Nipah virus G protein bound to its cellular receptor, ephrinB2, shows an even more extensive (∼2700 Å²) interface located on a different face of the molecule, directly above the deep PIV and NDV sialic acid binding pocket (Fig. 7.3c) [19, 25]. Detailed comparisons of the proteins suggest that the switch from sugar binding to protein binding occurred independently in measles virus and in the Hendra/Nipah viruses [19].
Fig. 7.3 Contrasting modes of cell-surface receptor recognition by hemagglutinin-neuraminidase (HN), hemagglutinin (H) and glycoprotein (G) Paramyxovirus attachment glycoproteins. (a) hPIV3 (yellow) in complex with sialic acid (grey sticks; PDB identification number 1V3C). (b) MV-H (blue) in complex with CD46 (brown; PDB identification number 3INB). (c) NiV-G (green) in complex with ephrinB2 cell surface receptor (orange; PDB identification number 2VSM)
7.2.2 Paramyxovirus Fusion – An Example of Type I Fusion Machines
Along with a range of other proteins, including influenza virus hemagglutinin, Ebola virus GP2 [26], HIV-1 gp41 [27], and Arenavirus GP2 [28, 29], paramyxovirus fusion glycoproteins are designated as class-I fusion proteins [9]. Paramyxovirus fusion proteins are generally synthesized as a single precursor protein, F0, which is cleaved post-translationally by a specific protease into F1 and F2 subunits that remain covalently linked by a disulphide bond; these F1–F2 units associate to form trimers. F1 consists of an N-terminal hydrophobic fusion peptide adjacent to a heptad-repeat motif (HR-A), which is separated by approximately 250 amino acids, comprising domains I–III, from a C-terminal heptad repeat (HR-B) and the transmembrane domain. F2 is ∼100 amino acids in size and probably plays a role in stabilizing the metastable pre-fusion state. Structural analyses of the pre-fusion form of simian parainfluenza type-5 (SV5) F and the post-fusion form of human parainfluenza type-3 (PIV3) F revealed that F undergoes remarkable conformational rearrangements during the fusion process (Fig. 7.2a). In its pre-fusion form, SV5-F forms a compact, globular structure composed of HR-A and DI–DIII, with the N-terminal hydrophobic fusion peptide buried between trimeric subunits. In its post-fusion state, PIV3-F rearranges into a rod-like structure in which the two heptad-repeat motifs associate to form a 6-helix bundle connected to DI–DIII by a long, 107 Å trimeric helix. In this state, the fusion peptide is solvent exposed and is expected to extend into the cell membrane, thus catalyzing the fusion process. Overall, the changes result in a major rearrangement of all of the protein domains and move the C-terminal HR-B region by 196 Å between the two conformations [30]. In broad terms, this change is characteristic of the conformational changes observed for all class-I fusion machines [31].
7.3 New World Arenavirus Glycoprotein Structure – A New Protein Fold
Arenaviruses comprise a family of primarily rodent-borne viruses and constitute a growing threat to global health [32]. They are divided into the Tacaribe serocomplex (commonly referred to as New World Arenaviruses) and the Lassa-lymphocytic choriomeningitis serocomplex (Old World Arenaviruses). New World Arenaviruses have been detected throughout the Americas and comprise three clades: A, B, and C. Except for Tacaribe virus, which uses fruit-eating bats [33], Arenaviruses almost exclusively use rodents as a natural reservoir [34]. With the exception of lymphocytic choriomeningitis virus (LCMV), which has a worldwide distribution, Old World Arenaviruses have been isolated only in western Africa. There are currently five known species of Old World Arenaviruses and seventeen of New World Arenaviruses [34] (a further six recently discovered Arenavirus species are putatively designated as New World Arenaviruses [35, 36]); of these viruses, approximately one half are currently considered human pathogens. The single-stranded, ambisense RNA genome of Arenaviruses encodes four genes on two segments, S and L. The L (large) segment encodes an RNA-dependent polymerase and a zinc-finger matrix protein that is important for virus budding [37]. The S (small) segment encodes the nucleoprotein and the glycoprotein precursor, GPC, which is required for virus attachment and fusion [38]. The GPC is cleaved by the cellular proprotein convertase site 1 protease [39] to yield a complex composed of a 58-amino-acid stable signal sequence peptide (SSP), essential for virus maturation; a GP1 subunit involved in receptor attachment (∼200 amino acids); and a transmembrane GP2 subunit (∼250 amino acids), which is classified as a class-I fusion protein [28, 29].
Although the nature of the SSP association with the mature GP complex is unknown, it is proposed that the mature GP complex is a heterohexamer composed of three GP1 and three GP2 subunits [40]. In contrast to GP2, which is highly conserved between Arenaviruses (greater than 80% sequence identity), GP1 is poorly conserved. Nevertheless, recent studies have shown that α-dystroglycan (α-DG) is a functional receptor for the Old World Arenaviruses (e.g. for the GP1 of Lassa virus (LASV) and LCMV) [41], and that human transferrin receptor 1 (TfR1) is a functional receptor for the New World hemorrhagic fever Arenaviruses (e.g. for the GP1 of Machupo virus (MACV), Junin virus (JUNV), Guanarito virus (GTOV), and Sabia virus (SABV)) [36]. This receptor usage correlates with the broad cellular tropism of both virus serocomplexes, as both α-DG and TfR1 are expressed in a wide array of tissues [42–44].
Fig. 7.4 α-dystroglycan (α-DG) and transferrin receptor (TfR1) are cell-surface receptors for Old and New World Arenaviruses, respectively. (a) Crystal structure of the N-terminal Ig-like and RNA-binding protein-like domains of α-DG (PDB identification number 1U2C). (b) Cartoon diagram of TfR1 (PDB accession number 1CX8 [51]) with the protease-like, helical, and apical domains labelled
The α-DG cell-surface receptor consists of four domains. The crystal structure of the two most N-terminal domains shows them to be an Ig-like domain (residues 60–158) and an RNA-binding protein-like fold (residues 180–303), respectively. The third domain is a highly glycosylated mucin-like domain (residues 315–485), and the C-terminal domain (residues 486–653) binds to β-dystroglycan [42]. The central RNA-binding protein-like and mucin-like domains are essential for GP1 attachment by Arenaviruses (Fig. 7.4a) [45]. Site-directed mutagenesis of the apical domain (residues 184–384) of the human TfR1 ectodomain has identified residues Asn204, Tyr211, and Asn348 as critical for GP1 binding by the New World hemorrhagic fever Arenaviruses [46]. This region overlaps with the attachment sites identified for mouse mammary tumour virus (MMTV) [47] and for canine and feline parvoviruses (CFPVs) [48], and is distal from the apo-transferrin binding region in the protease-like domain (Fig. 7.4b) [49]. The X-ray crystal structure of the TfR1-binding fragment of MACV-GP1 has recently been solved and represents the first structure of an Arenavirus GP1 (Fig. 7.5) [50]. MACV-GP1 adopts a protein fold that has not been observed previously, suggesting that the Arenaviruses use a novel mechanism of viral attachment. These results emphasize the structural diversity of viral glycoproteins and the different means by which viruses achieve analogous virus–human interactions. The importance of understanding the structure of arenaviral proteins is further underscored by recent work showing that non-pathogenic New World Arenaviruses use rodent TfR1 orthologues as functional receptors and that specificity for human TfR1 can be conferred by site-directed mutagenesis [51]. These studies are an important step towards defining New World Arenaviral tropism and provide a rationale for the emergence of new, pathogenic Arenaviruses. Future structural work in this field will benefit from the structural determination of arenaviral GP1–receptor complexes and from additional structural studies of the Arenavirus GP2 fusion protein, alone and in complex with GP1.
Fig. 7.5 Structure of Machupo virus glycoprotein 1 (MACV-GP1). Cartoon diagram of MACV-GP1 coloured according to secondary structure, with α-helices coloured red, β-strands coloured yellow, and loops coloured blue. Sites of N-linked glycosylation are shown as sticks with carbon atoms coloured white, oxygen atoms coloured red, and nitrogen atoms coloured blue
7.4 Flaviviruses Combine Attachment and Fusion Functions in the E Protein
The genus Flavivirus belongs to the Flaviviridae family and constitutes a group of over 70 arthropod-borne viruses, over 40 of which use mosquitoes as a natural host, facilitating the rapid spread of these viruses. This is exemplified by the emergence of Dengue virus (DENV), the causative agent of Dengue haemorrhagic fever [52], as a global pathogen (two fifths of the world’s population are at risk). Several other medically important viruses belong to this genus, making the flaviviruses high-priority targets for the pharmaceutical industry. Flaviviruses are single-stranded, positive-sense RNA viruses whose genome contains a single open reading frame; following translation, the resulting polyprotein is proteolytically processed into ten proteins [53]. The three most N-terminal of these are the core protein (C, ∼120 amino acids), which packages the RNA genome, the precursor membrane protein (prM, ∼170 amino acids), and the envelope glycoprotein (E, ∼500 amino acids). After synthesis, E and prM associate as a heterodimer [54], which assembles on the virion envelope and undergoes pH-dependent, reversible conformational rearrangements until it is primed at acidic pH by furin cleavage of prM to form the mature, fusion-competent E protein [55]. Flavivirus entry is mediated entirely by the E glycoprotein. Candidate Flavivirus receptors include glycosaminoglycans for Japanese encephalitis virus (JEV) and West Nile virus (WNV) [56], αVβ3 integrin for WNV [57, 58], Dendritic Cell-Specific Intercellular adhesion molecule-3-Grabbing Non-integrin (DC-SIGN) for WNV and DENV [59, 60], and the mannose receptor for DENV [61]. A recent cryo-electron microscopic reconstruction of DENV in complex with DC-SIGN revealed specific binding to the carbohydrate attached to Asn67 of the E glycoprotein [62]. The Flavivirus E glycoprotein consists of a large ectodomain (∼450 amino acids) attached to the virus envelope by a C-terminal transmembrane domain (∼50 amino acids). Crystal structures of the ectodomains of E from WNV, TBEV, and DENV, solved in both the immature prM-bound and the mature pre-fusion states, revealed long, finger-like molecules composed predominantly of β-sheets [63–68]. This contrasts with the primarily α-helical structures observed in trimeric class-I fusion proteins and led to the E glycoprotein being assigned to a second class of fusion proteins (class-II) [69]. Like the class-II fusion glycoprotein of the alphavirus Semliki Forest virus [70], the E glycoprotein consists of three domains: DI, DII, and DIII. DI is located at the centre of the molecule, DII contains the highly conserved fusion loop responsible for insertion into the host cell membrane, and the C-terminal DIII domain has an Ig-like fold [63, 66, 67]. In the immature form, the hydrophobic fusion loop is buried between domains DI and DIII and further protected by the associated prM glycoprotein to avoid premature fusion (Fig. 7.6a) [64]. Upon maturation cleavage of prM, the E glycoprotein forms antiparallel homodimers (Fig. 7.6b) that lie flat on the icosahedral virus envelope [71]. Flaviviruses enter host cells through clathrin-mediated endocytosis [72], and the low pH of the endosome activates E, switching its oligomeric state from dimeric to trimeric via a monomeric intermediate [73].
These irreversible rearrangements rotate the molecule so that it projects away from the virion surface [74–76], whilst further conformational changes move and expose the hydrophobic fusion loops at the tip of the trimer and fold the membrane-anchored C-terminal DIII domain back against domains DI and DII (Fig. 7.6c). The fusion loops presumably insert into the host cell membrane, and the fold-back of the envelope-anchored DIII domain then pulls the virus towards the host cell. This mechanism appears to be broadly similar in the alphaviruses [70, 77]. The structures of the E glycoprotein resulting from these studies provide a number of templates for structure-based antiviral design. Indeed, the development of small molecules that inhibit E glycoprotein maturation may prove to be an effective means of targeting this group of biomedically important viruses.
Fig. 7.6 Conformational states of the Flavivirus E glycoprotein. (a) Immature form of the Dengue E glycoprotein, serotype 2, in complex with prM (PDB identification number 3C5X). (b) Dengue E glycoprotein, serotype 2, in its mature, pre-fusion form (PDB identification number 1OAN). (c) Dengue E glycoprotein, serotype 2, in its post-fusion state (PDB identification number 1OK8). Structures are coloured according to domain, with DI yellow, DII orange, DIII red and prM cyan
7.5 Filoviruses – A Deadly Receptor-Binding Chalice
The Filoviridae family consists of single-stranded, negative-sense RNA viruses that have been isolated in central and western Africa and the Philippines, where they cause severe hemorrhagic fever [78]. The Filoviridae family is composed of two genera: Ebola virus (EBOV) and Marburg virus (MARV). The Filovirus genome encodes eight genes in seven open reading frames, the fourth of which encodes the glycoprotein precursor, GPC, which is responsible for viral entry, and a soluble glycoprotein, sGP, implicated in immune evasion [79, 80]. Following translation, the GPC precursor is processed by furin into two subunits, GP1 and GP2 [81], which associate on the surface of the virus envelope. GP1 is responsible for cell-receptor binding and GP2 for fusion of the viral envelope with the cell membrane. Suspected cell-surface receptors include the lectins DC-SIGN [82] and hMGL [83], α- [84] and β- [85] integrins, folate receptor-α [86], and the Tyro3 [87] family of receptors. Following attachment, Filoviruses enter their host cells via clathrin-mediated endocytosis. In this pH-dependent mechanism, GP2 is activated as the virus is trafficked into late endosomal compartments, where cathepsins B and L are thought to cleave the mucin-like domain from GP1, stimulating conformational changes that activate GP2 [88–90]. Recently, the crystal structure of a truncated GP1–GP2 glycoprotein complex from EBOV was solved in its putative pre-fusion form bound to a neutralising antibody [91]. The attachment and fusion glycoproteins form a heterohexamer from three covalently associated GP1–GP2 heterodimers (Fig. 7.7a) [91].
Fig. 7.7 Structure of Ebola virus glycoprotein (EBOV-GP). (a) Cartoon diagram of EBOV-GP in complex with a neutralising antibody, KZ52 (PDB identification number 3CSY). GP1 is coloured green, GP2 is coloured black, and KZ52 is shown as a white surface. The neutralising antibody KZ52 was observed to bind to an epitope consisting of both GP1 and GP2 subunits. (b) Panel (a) rotated by 90 degrees. Grey bars show the respective positions of the virus envelope and host cell membrane. (c) The EBOV GP1–GP2 heterodimer (coloured as in panel a). The locations of the GP2 fusion peptide, predicted receptor-binding site (RBS) [91], and mucin-like domain (MLD) are shown. The other two EBOV heterodimers are shown as transparent cartoons. (d) The pre-fusion (from the GP1–GP2 complex) and post-fusion states of EBOV-GP2. Both cartoons are coloured as a rainbow, with the N-terminus in blue and the C-terminus in red. Heptad-repeat regions (HR) are labelled. Note that no electron density was visible for HR2 in the pre-fusion form of EBOV-GP2, and that the construct used to study the post-fusion form of EBOV-GP2 (PDB identification number 2EBO) lacked the hydrophobic fusion peptide
The architecture of the GP1–GP2 complex was described as a GP1 “chalice” that extends towards the target cell, sitting in a “cradle” supported by a stem formed from GP2 (Fig. 7.7b). The GP1 subunit consists of three components: an N-terminal region that participates in GP1–GP2 contacts, a second region that extends towards the target host cell, and a third region, rich in N-linked glycosylation, that sits at the top of the GP1 “chalice”. Because the fourth component, the mucin-like domain, was removed from this construct to promote crystallogenesis, the large O-linked carbohydrates present in this domain had to be modelled (Fig. 7.7c). The location of this highly variable region in the context of the overall GP suggests that it may facilitate immune evasion by shielding epitopes on the GP1 surface [91]. In contrast, the residues implicated in cell attachment [91] co-localise to the inner rim of the GP1 chalice. Unlike most class-I fusion glycoproteins, EBOV-GP2 has an internal fusion peptide, located approximately 30 amino acid residues from the N-terminus and packed against the adjacent heterodimer [92]. Further downstream, EBOV-GP2 contains two heptad-repeat regions, HR1 and HR2, which help support the GP1 chalice. It is assumed that these switch to associate in an antiparallel fashion as part of the conversion to the post-fusion state (as is seen in other class-I fusion machines) [93, 94], forming a bridge to the target cell and initiating membrane fusion (Fig. 7.7d). Many questions remain concerning Ebola virus cell entry, including the identity of the receptor(s) used by EBOV for entry, the structural changes in EBOV-GP that occur on receptor binding, and the architectural changes that result from cleavage of the mucin-like domain from EBOV-GP1 by endosomal cathepsins. Answering these questions may suggest how these glycoproteins can be targeted therapeutically.
7.6 Influenza Virus – Glycosylation Modulates Pathogenicity
The Orthomyxoviridae family of segmented, negative-sense RNA viruses comprises five genera: Influenza viruses A, B and C, Isavirus and Thogotovirus. Influenza A and B viruses are pleomorphic particles, roughly spherical (80–120 nm), with an outer lipid envelope bearing projecting glycoprotein spikes of haemagglutinin (HA; a 135 Å trimer) and neuraminidase (NA; a 60 Å tetramer). HA, the major surface glycoprotein, binds sialylated receptors and mediates entry through a class-I fusion mechanism. The degree of N-glycosylation of HA varies, with 5–11 sites, mostly in the region of the globular head. In contrast, NA destroys the receptor, facilitating virus spread, and is the target of both zanamivir and oseltamivir [2]. The virus enters the host cell by endocytosis, and within the acidic endosome HA mediates fusion of the viral and endosomal membranes [95]. HA is formed from a single-chain precursor, which is activated by host protease cleavage. The tissue tropism and virulence of the virus are determined, in some measure, by the susceptibility of HA to specific cellular proteases. Carbohydrate close to the cleavage site can block protease access, but glycosylation also provides a mechanism for modulating binding affinity and receptor specificity. The gradual accumulation of glycans around the globular head can act as a “glycan shield”, preventing access and recognition by antibodies, and may account for the antigenic drift observed in certain strains; in such cases some decrease in receptor binding has also been observed. Glycan modulation of receptor binding also affects the level of NA activity required to promote particle release: more extensive glycosylation leads to weaker receptor interaction, so less NA activity is required to facilitate release. It appears that in strains of influenza that have been circulating in humans for a long time, the level of glycosylation has increased and the disease has become attenuated [96]. Such glycan shield effects are likely to occur in other viruses (e.g. HIV [97], as discussed below) and to underlie the much accelerated switching of glycosylation sites in viral glycoproteins noted above.
7.7 HIV – A Glycan Shield
HIV is a member of the Retroviridae family, possessing a positive-sense, single-stranded RNA genome that is reverse transcribed by a viral polymerase. The virions are pleomorphic, roughly spherical and 80–100 nm in diameter. They consist of an outer envelope, a nucleocapsid and a nucleoid. Glycoprotein spikes 8 nm long cover the surface evenly [98]. The envelope glycoprotein of HIV-1 employs formidable immune-evasion mechanisms. It is very heavily glycosylated, with between 18 and 33 N-linked glycosylation sites. Because these glycans are produced by host-cell pathways, the immune system sees them as “self”, so they shield underlying protein epitopes from antibody recognition. Mutations leading to the addition of N-linked glycans can thus provide protection from host neutralizing antibodies without interfering with receptor binding, although multiple mutations may be required [97, 99]. The virus also evades humoral recognition by presenting exposed hypervariable loops that readily elicit an antibody response, together with conformational masking and the inaccessibility of conserved sites. The recent identification of a few broadly neutralizing antibodies is providing clues as to how HIV-1 drives immune adaptation [100]; for example, the unusual “domain-swapped” architecture of one such antibody enables it to recognize self sugars that form dense clusters on the viral surface, an arrangement not present on the host cell [101]. Strategies for producing a long-awaited vaccine may be informed by the analysis of these broadly neutralizing antibodies and by using immune focussing to achieve precise target recognition. Many strategies are being explored to elicit antibodies that target the receptor-binding site on the membrane-distal region of the envelope glycoprotein. It may be possible to introduce mutations that fix gp120 in the conformation recognized by the CD4 receptor or by broadly neutralizing antibodies.
To help focus the immune response, one can remove immunodominant regions, thus paring the envelope to critically conserved regions of the core or outer domain, or one can mask immunodominant regions with carbohydrate to make them immunologically silent.
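Whether a mutation can add an N-linked glycan depends on the local sequence: N-linked glycans are attached to asparagine within the sequon N-X-S/T, where X is any amino acid except proline. A sequon is necessary but not sufficient, since not every sequon is glycosylated in practice. The minimal Python sketch below scans a sequence for sequons and reports those created by a substitution; the helper names and the short example sequences are hypothetical and are not drawn from any real HIV-1 or influenza protein.

```python
import re

# N-X-S/T sequon, where X is any residue except proline. The zero-width
# lookahead lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def sequons_gained(wild_type: str, mutant: str) -> list[int]:
    """Sequon positions present in the mutant but absent from the wild type.

    Assumes the two sequences are aligned position-for-position (substitutions only).
    """
    return sorted(set(find_sequons(mutant)) - set(find_sequons(wild_type)))


if __name__ == "__main__":
    wt = "MNKTANPSAANGA"   # hypothetical fragment, not a real viral sequence
    mut = "MNKTANPSAANGT"  # hypothetical A13T substitution
    print(find_sequons(wt))          # [2]  (N6 is followed by Pro, so not a sequon)
    print(find_sequons(mut))         # [2, 11]
    print(sequons_gained(wt, mut))   # [11]
```

Counting sequons in this way gives only an upper bound on the number of glycans actually present; site occupancy is normally confirmed experimentally, for example by mass spectrometry.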
7.8 Conclusions
To initiate infection, enveloped viruses must attach to and fuse with their host cell. This is accomplished either by direct fusion with the host cell membrane or by endocytic uptake. Macromolecular crystallography has proven to be an essential tool for studying the glycoproteins that mediate these entry events and has led to an enhanced understanding of their role in virus–human cell interactions. This high-resolution structural information has not only given us a better understanding of the molecular architecture of viral glycoproteins on the envelope surface but also, through the study of viral receptor–cell receptor complex structures (e.g. measles virus hemagglutinin in complex with its receptor CD46 [102]), has allowed us to identify amino acid residues responsible for viral receptor specificity. Such information informs the design of therapeutics to inhibit virus attachment and fusion. Crystallographic studies have shown that viral glycoproteins achieve common functions through highly varied structures. This structural diversity is particularly evident in viral attachment glycoproteins which, as shown here, exhibit a wide range of protein folds. This likely reflects the ability of viruses to evolve rapidly and adapt to the receptors displayed on different host cells, and underscores the potential for the emergence of new and dangerous zoonotic viruses. Viral fusion glycoproteins, on the other hand, are not necessarily required to adapt, and are thus more unified in their structure and in the mechanism by which they fuse the viral envelope with the host cell membrane. Even here the structural commonalities observed are often unpredictable from sequence alone, so the structures provide a route to understanding evolutionary links between otherwise dissimilar viruses.
Acknowledgements
We gratefully acknowledge D.I. Stuart, E.Y. Jones, J. Grimes, A.R. Aricescu, and M. Crispin at the Division of Structural Biology, Oxford, for their support and many helpful discussions. T.A.B. is funded as a Sir Henry Wellcome Postdoctoral Fellow by the Wellcome Trust. E.E.F. is funded by the Medical Research Council, UK.
References
1. Taub DD et al (2008) Immunity from smallpox vaccine persists for decades: a longitudinal study. Am J Med 121:1058–1064
2. Jefferson T, Jones M, Doshi P, Del Mar C (2009) Neuraminidase inhibitors for preventing and treating influenza in healthy adults: systematic review and meta-analysis. BMJ 339:b5106
3. Bamford DH, Grimes JM, Stuart DI (2005) What does structure tell us about virus evolution? Curr Opin Struct Biol 15:655–663
4. Moss B (2006) Poxvirus entry and membrane fusion. Virology 344:48–54
5. Daniels R, Kurowski B, Johnson AE, Hebert DN (2003) N-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. Mol Cell 11:79–90
6. Amonsen M, Smith DF, Cummings RD, Air GM (2007) Human parainfluenza viruses hPIV1 and hPIV3 bind oligosaccharides with alpha2-3-linked sialic acids that are distinct from those bound by H5 avian influenza virus hemagglutinin. J Virol 81:8341–8345
7. Kadlec J, Loureiro S, Abrescia NG, Stuart DI, Jones IM (2008) The postfusion structure of baculovirus gp64 supports a unified view of viral fusion machines. Nat Struct Mol Biol 15:1024–1030
8. Monroe SS, Carter MJ, Hermann JE, Kurtz JB, Matsui SM (2005) Virus taxonomy. Elsevier Academic Press, London
9. Eaton BT, Broder CC, Middleton D, Wang LF (2006) Hendra and Nipah viruses: different and dangerous. Nat Rev Microbiol 4:23–35
10. Gleeson PA, Feeney J, Hughes RC (1985) Structures of N-glycans of a ricin-resistant mutant of baby hamster kidney cells. Synthesis of high-mannose and hybrid N-glycans. Biochemistry 24:493–503
T.A. Bowden and E.E. Fry
11. Tatsuo H, Ono N, Tanaka K, Yanagi Y (2000) SLAM (CDw150) is a cellular receptor for measles virus. Nature 406:893–897
12. Tatsuo H, Ono N, Yanagi Y (2001) Morbilliviruses use signaling lymphocyte activation molecules (CD150) as cellular receptors. J Virol 75:5842–5850
13. Dorig RE, Marcil A, Chopra A, Richardson CD (1993) The human CD46 molecule is a receptor for measles virus (Edmonston strain). Cell 75:295–305
14. Naniche D, Varior-Krishnan G, Cervoni F, Wild TF, Rossi B, Rabourdin-Combe C, Gerlier D (1993) Human membrane cofactor protein (CD46) acts as a cellular receptor for measles virus. J Virol 67:6025–6032
15. Bonaparte MI et al (2005) Ephrin-B2 ligand is a functional receptor for Hendra virus and Nipah virus. Proc Natl Acad Sci USA 102:10652–10657
16. Negrete OA, Levroney EL, Aguilar HC, Bertolotti-Ciarlet A, Nazarian R, Tajyar S, Lee B (2005) EphrinB2 is the entry receptor for Nipah virus, an emergent deadly paramyxovirus. Nature 436:401–405
17. Negrete OA et al (2006) Two key residues in EphrinB3 are critical for its use as an alternative receptor for Nipah virus. PLoS Pathog 2:e7
18. Bossart KN et al (2005) Receptor binding, fusion inhibition, and induction of cross-reactive neutralizing antibodies by a soluble G glycoprotein of Hendra virus. J Virol 79:6690–6702
19. Bowden TA, Aricescu AR, Gilbert RJ, Grimes JM, Jones EY, Stuart DI (2008) Structural basis of Nipah and Hendra virus attachment to their cell-surface receptor ephrin-B2. Nat Struct Mol Biol 15:567–572
20. Crennell S, Takimoto T, Portner A, Taylor G (2000) Crystal structure of the multifunctional paramyxovirus hemagglutinin-neuraminidase. Nat Struct Biol 7:1068–1074
21. Lawrence MC, Borg NA, Streltsov VA, Pilling PA, Epa VC, Varghese JN, McKimm-Breschkin JL, Colman PM (2004) Structure of the haemagglutinin-neuraminidase from human parainfluenza virus type III. J Mol Biol 335:1343–1357
22. Yuan P, Thompson TB, Wurzburg BA, Paterson RG, Lamb RA, Jardetzky TS (2005) Structural studies of the parainfluenza virus 5 hemagglutinin-neuraminidase tetramer in complex with its receptor, sialyllactose. Structure 13:803–815
23. Zaitsev V, von Itzstein M, Groves D, Kiefel M, Takimoto T, Portner A, Taylor G (2004) Second sialic acid binding site in Newcastle disease virus hemagglutinin-neuraminidase: implications for fusion. J Virol 78:3733–3741
24. Santiago C, Celma ML, Stehle T, Casasnovas JM (2010) Structure of the measles virus hemagglutinin bound to the CD46 receptor. Nat Struct Mol Biol 17:124–129
25. Bowden TA, Crispin M, Harvey DJ, Aricescu AR, Grimes JM, Jones EY, Stuart DI (2008) Crystal structure and carbohydrate analysis of Nipah virus attachment glycoprotein: a template for antiviral and vaccine design. J Virol 82:11628–11636
26. Lee JE, Fusco ML, Hessell AJ, Oswald WB, Burton DR, Saphire EO (2008) Structure of the Ebola virus glycoprotein bound to an antibody from a human survivor. Nature 454:177–182
27. Chan DC, Fass D, Berger JM, Kim PS (1997) Core structure of gp41 from the HIV envelope glycoprotein. Cell 89:263–273
28. Eschli B, Quirin K, Wepf A, Weber J, Zinkernagel R, Hengartner H (2006) Identification of an N-terminal trimeric coiled-coil core within arenavirus glycoprotein 2 permits assignment to class I viral fusion proteins. J Virol 80:5897–5907
29. Rojek JM, Kunz S (2008) Cell entry by human pathogenic arenaviruses. Cell Microbiol 10:828–835
30. Lamb RA, Jardetzky TS (2007) Structural basis of viral invasion: lessons from paramyxovirus F. Curr Opin Struct Biol 17:427–436
31. Colman PM, Lawrence MC (2003) The structural biology of type I viral membrane fusion. Nat Rev Mol Cell Biol 4:309–319
32. Geisbert TW, Jahrling PB (2004) Exotic emerging viral diseases: progress and challenges. Nat Med 10:S110–S121
7
Role of Glycoproteins in Virus-Human Cell Interactions
33. Childs JE, Peters CJ (1993) Ecology and epidemiology of arenaviruses and their hosts. Plenum Press, New York
34. Charrel RN, de Lamballerie X, Emonet S (2008) Phylogeny of the genus Arenavirus. Curr Opin Microbiol 11:362–368
35. Cajimat MN, Milazzo ML, Bradley RD, Fulhorst CF (2007) Catarina virus, an arenaviral species principally associated with Neotoma micropus (southern plains woodrat) in Texas. Am J Trop Med Hyg 77:732–736
36. Palacios G et al (2008) A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med 358:991–998
37. Perez M, Craven RC, de la Torre JC (2003) The small RING finger protein Z drives arenavirus budding: implications for antiviral strategies. Proc Natl Acad Sci USA 100:12978–12983
38. Buchmeier MJ, de la Torre JC, Peters CJ (2007) Arenaviridae: the viruses and their replication. Lippincott-Raven, Philadelphia, PA
39. Rojek JM, Lee AM, Nguyen N, Spiropoulou CF, Kunz S (2008) Site 1 protease is required for proteolytic processing of the glycoproteins of the South American hemorrhagic fever viruses Junin, Machupo, and Guanarito. J Virol 82:6045–6051
40. Rojek JM, Sanchez AB, Nguyen NT, de la Torre JC, Kunz S (2008) Different mechanisms of cell entry by human-pathogenic Old World and New World arenaviruses. J Virol 82:7677–7687
41. Cao W et al (1998) Identification of alpha-dystroglycan as a receptor for lymphocytic choriomeningitis virus and Lassa fever virus. Science 282:2079–2081
42. Bozzi M, Morlacchi S, Bigotti MG, Sciandra F, Brancaccio A (2009) Functional diversity of dystroglycan. Matrix Biol 28:179–187
43. Flanagan ML, Oldenburg J, Reignier T, Holt N, Hamilton GA, Martin VK, Cannon PM (2008) New World clade B arenaviruses can use transferrin receptor 1 (TfR1)-dependent and -independent entry pathways, and glycoproteins from human pathogenic strains are associated with the use of TfR1. J Virol 82:938–948
44. Radoshitzky SR et al (2007) Transferrin receptor 1 is a cellular receptor for New World haemorrhagic fever arenaviruses. Nature 446:92–96
45. Kunz S, Sevilla N, McGavern DB, Campbell KP, Oldstone MB (2001) Molecular analysis of the interaction of LCMV with its cellular receptor α-dystroglycan. J Cell Biol 155:301–310
46. Radoshitzky SR et al (2008) Receptor determinants of zoonotic transmission of New World hemorrhagic fever arenaviruses. Proc Natl Acad Sci USA 105:2664–2669
47. Wang E, Albritton L, Ross SR (2006) Identification of the segments of the mouse transferrin receptor 1 required for mouse mammary tumor virus infection. J Biol Chem 281:10243–10249
48. Palermo LM, Hueffer K, Parrish CR (2003) Residues in the apical domain of the feline and canine transferrin receptors control host-specific binding and cell infection of canine and feline parvoviruses. J Virol 77:8915–8923
49. Cheng Y, Zak O, Aisen P, Harrison SC, Walz T (2004) Structure of the human transferrin receptor-transferrin complex. Cell 116:565–576
50. Bowden TA, Crispin M, Graham SC, Harvey DJ, Grimes JM, Jones EY, Stuart DI (2009) Unusual molecular architecture of the Machupo virus attachment glycoprotein. J Virol 83:8259–8265
51. Abraham J et al (2009) Host-species transferrin receptor 1 orthologs are cellular receptors for nonpathogenic New World clade B arenaviruses. PLoS Pathog 5:e1000358
52. Mackenzie JS, Gubler DJ, Petersen LR (2004) Emerging flaviviruses: the spread and resurgence of Japanese encephalitis, West Nile and dengue viruses. Nat Med 10:S98–S109
53. Thiel H-J, Collett MS, Gould EA, Heinz FX, Houghton M, Meyers G, Purcell RH, Rice CM (2005) Flaviviridae. In: Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (eds) Virus taxonomy. Elsevier Academic Press, London
54. Li L, Lok SM, Yu IM, Zhang Y, Kuhn RJ, Chen J, Rossmann MG (2008) The flavivirus precursor membrane-envelope protein complex: structure and maturation. Science 319:1830–1834
55. Elshuber S, Allison SL, Heinz FX, Mandl CW (2003) Cleavage of protein prM is necessary for infection of BHK-21 cells by tick-borne encephalitis virus. J Gen Virol 84:183–191
56. Lee E, Hall RA, Lobigs M (2004) Common E protein determinants for attenuation of glycosaminoglycan-binding variants of Japanese encephalitis and West Nile viruses. J Virol 78:8271–8280
57. Chu JJ, Ng ML (2004) Interaction of West Nile virus with alpha v beta 3 integrin mediates virus entry into cells. J Biol Chem 279:54533–54541
58. Lee JW, Chu JJ, Ng ML (2006) Quantifying the specific binding between West Nile virus envelope domain III protein and the cellular receptor alphaVbeta3 integrin. J Biol Chem 281:1352–1360
59. Tassaneetrithep B et al (2003) DC-SIGN (CD209) mediates dengue virus infection of human dendritic cells. J Exp Med 197:823–829
60. Davis CW, Nguyen HY, Hanna SL, Sanchez MD, Doms RW, Pierson TC (2006) West Nile virus discriminates between DC-SIGN and DC-SIGNR for cellular attachment and infection. J Virol 80:1290–1301
61. Miller JL, de Wet BJ, Martinez-Pomares L, Radcliffe CM, Dwek RA, Rudd PM, Gordon S (2008) The mannose receptor mediates dengue virus infection of macrophages. PLoS Pathog 4:e17
62. Pokidysheva E et al (2006) Cryo-EM reconstruction of dengue virus in complex with the carbohydrate recognition domain of DC-SIGN. Cell 124:485–493
63. Kanai R et al (2006) Crystal structure of West Nile virus envelope glycoprotein reveals viral surface epitopes. J Virol 80:11000–11008
64. Li L, Lok SM, Yu IM, Zhang Y, Kuhn RJ, Chen J, Rossmann MG (2008) The flavivirus precursor membrane-envelope protein complex: structure and maturation. Science 319:1830–1834
65. Modis Y, Ogata S, Clements D, Harrison SC (2003) A ligand-binding pocket in the dengue virus envelope glycoprotein. Proc Natl Acad Sci USA 100:6986–6991
66. Nybakken GE, Nelson CA, Chen BR, Diamond MS, Fremont DH (2006) Crystal structure of the West Nile virus envelope glycoprotein. J Virol 80:11467–11474
67. Rey FA, Heinz FX, Mandl C, Kunz C, Harrison SC (1995) The envelope glycoprotein from tick-borne encephalitis virus at 2 Å resolution. Nature 375:291–298
68. Zhang Y, Zhang W, Ogata S, Clements D, Strauss JH, Baker TS, Kuhn RJ, Rossmann MG (2004) Conformational changes of the flavivirus E glycoprotein. Structure 12:1607–1618
69. Kielian M (2006) Class II virus membrane fusion proteins. Virology 344:38–47
70. Lescar J, Roussel A, Wien MW, Navaza J, Fuller SD, Wengler G, Rey FA (2001) The fusion glycoprotein shell of Semliki Forest virus: an icosahedral assembly primed for fusogenic activation at endosomal pH. Cell 105:137–148
71. Kuhn RJ et al (2002) Structure of dengue virus: implications for flavivirus organization, maturation, and fusion. Cell 108:717–725
72. Mukhopadhyay S, Kuhn RJ, Rossmann MG (2005) A structural perspective of the flavivirus life cycle. Nat Rev Microbiol 3:13–22
73. Allison SL, Schalich J, Stiasny K, Mandl CW, Kunz C, Heinz FX (1995) Oligomeric rearrangement of tick-borne encephalitis virus envelope proteins induced by an acidic pH. J Virol 69:695–700
74. Modis Y, Ogata S, Clements D, Harrison SC (2004) Structure of the dengue virus envelope protein after membrane fusion. Nature 427:313–319
75. Stiasny K, Allison SL, Schalich J, Heinz FX (2002) Membrane interactions of the tick-borne encephalitis virus fusion protein E at low pH. J Virol 76:3784–3790
76. Bressanelli S, Stiasny K, Allison SL, Stura EA, Duquerroy S, Lescar J, Heinz FX, Rey FA (2004) Structure of a flavivirus envelope glycoprotein in its low-pH-induced membrane fusion conformation. EMBO J 23:728–738
77. Gibbons DL, Vaney MC, Roussel A, Vigouroux A, Reilly B, Lepault J, Kielian M, Rey FA (2004) Conformational change and protein-protein interactions of the fusion protein of Semliki Forest virus. Nature 427:320–325
78. Ascenzi P, Bocedi A, Heptonstall J, Capobianchi MR, Di Caro A, Mastrangelo E, Bolognesi M, Ippolito G (2008) Ebolavirus and Marburgvirus: insight the Filoviridae family. Mol Aspects Med 29:151–185
79. Dolnik O et al (2004) Ectodomain shedding of the glycoprotein GP of Ebola virus. EMBO J 23:2175–2184
80. Falzarano D, Krokhin O, Wahl-Jensen V, Seebach J, Wolf K, Schnittler HJ, Feldmann H (2006) Structure-function analysis of the soluble glycoprotein, sGP, of Ebola virus. Chembiochem 7:1605–1611
81. Volchkov VE, Feldmann H, Volchkova VA, Klenk HD (1998) Processing of the Ebola virus glycoprotein by the proprotein convertase furin. Proc Natl Acad Sci USA 95:5762–5767
82. Alvarez CP, Lasala F, Carrillo J, Muniz O, Corbi AL, Delgado R (2002) C-type lectins DC-SIGN and L-SIGN mediate cellular entry by Ebola virus in cis and in trans. J Virol 76:6841–6844
83. Takada A et al (2004) Human macrophage C-type lectin specific for galactose and N-acetylgalactosamine promotes filovirus entry. J Virol 78:2943–2947
84. Schornberg KL, Shoemaker CJ, Dube D, Abshire MY, Delos SE, Bouton AH, White JM (2009) Alpha5beta1-integrin controls ebolavirus entry by regulating endosomal cathepsins. Proc Natl Acad Sci USA 106:8003–8008
85. Takada A, Watanabe S, Ito H, Okazaki K, Kida H, Kawaoka Y (2000) Downregulation of beta1 integrins by Ebola virus glycoprotein: implication for virus entry. Virology 278:20–26
86. Chan SY, Empig CJ, Welte FJ, Speck RF, Schmaljohn A, Kreisberg JF, Goldsmith MA (2001) Folate receptor-alpha is a cofactor for cellular entry by Marburg and Ebola viruses. Cell 106:117–126
87. Dolnik O, Kolesnikova L, Becker S (2008) Filoviruses: interactions with the host cell. Cell Mol Life Sci 65:756–776
88. Chandran K, Sullivan NJ, Felbor U, Whelan SP, Cunningham JM (2005) Endosomal proteolysis of the Ebola virus glycoprotein is necessary for infection. Science 308:1643–1645
89. Kaletsky RL, Simmons G, Bates P (2007) Proteolysis of the Ebola virus glycoproteins enhances virus binding and infectivity. J Virol 81:13378–13384
90. Schornberg K, Matsuyama S, Kabsch K, Delos S, Bouton A, White J (2006) Role of endosomal cathepsins in entry mediated by the Ebola virus glycoprotein. J Virol 80:4174–4178
91. Lee JE, Fusco ML, Hessell AJ, Oswald WB, Burton DR, Saphire EO (2008) Structure of the Ebola virus glycoprotein bound to an antibody from a human survivor. Nature 454:177–182
92. Ito H, Watanabe S, Sanchez A, Whitt MA, Kawaoka Y (1999) Mutational analysis of the putative fusion domain of Ebola virus glycoprotein. J Virol 73:8907–8912
93. Malashkevich VN, Schneider BJ, McNally ML, Milhollen MA, Pang JX, Kim PS (1999) Core structure of the envelope glycoprotein GP2 from Ebola virus at 1.9-Å resolution. Proc Natl Acad Sci USA 96:2662–2667
94. Weissenhorn W, Carfi A, Lee KH, Skehel JJ, Wiley DC (1998) Crystal structure of the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain. Mol Cell 2:605–616
95. Skehel JJ, Wiley DC (2000) A comprehensive review of the known properties of the influenza virus haemagglutinin and the structural basis of these properties. Annu Rev Biochem 69:531–569
96. Tsuchiya E, Sugawara K, Hongo S, Matsuzaki Y, Muraki Y, Li Z-N, Nakamura K (2002) Effect of addition of new oligosaccharide chains to the globular head of influenza A/H2N2 virus haemagglutinin on the intracellular transport and biological activities of the molecule. J Gen Virol 83:1137–1146
97. Wei X, Decker JM, Wang S, Huxiong H, Kappes JC, Wu X, Salazar-Gonzalez JF, Salazar MG (2003) Antibody neutralization and escape by HIV-1. Nature 422:307–312
98. Fauquet CM, Mayo MA, Maniloff J, Desselberger U, Ball LA (2005) Virus taxonomy: VIIIth report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press, London
99. Sagar M, Wu X, Lee S, Overbaugh J (2006) HIV-1 V1–V2 envelope loop sequences expand and add glycosylation sites over the course of infection and these modifications affect antibody neutralization sensitivity. J Virol 80:9586–9598
100. Kwong PD, Wilson IA (2009) HIV-1 and influenza antibodies: seeing antigens in new ways. Nat Immunol 10:573–578
101. Calarese DA et al (2005) Dissection of the carbohydrate specificity of the broadly neutralizing anti-HIV-1 antibody 2G12. Proc Natl Acad Sci USA 102:13372–13377
102. Santiago C, Celma ML, Stehle T, Casasnovas JM (2009) Structure of the measles virus hemagglutinin bound to the CD46 receptor. Nat Struct Mol Biol 17:124–129
Subject Index
A
Analytical software, 77–78
Antibody, 9, 18, 21, 24–25, 40–44, 46–53, 65, 92–93, 96, 98, 110–111, 131–132, 137, 148, 171–174
Automatic annotation, 70–72, 74, 76–77, 80
B
Bioinformatics, 11, 59–86
Biotherapeutics, 26
C
Cancer, 11, 14–21, 24–25, 60, 96, 98–99, 105, 110
Carbohydrate analysis, 92–93, 119, 148
Carbohydrate database, 61–63, 69–70
Carbohydrate software tools, 77–78, 80, 84
D
3D structure, 61, 63, 65, 82–83, 85–86
E
Endoglycosidases, 116, 138, 140–141
G
Glycan enrichment techniques, 107–113
Glycans, 4–5, 7–22, 24, 26–30, 39–53, 60–61, 64–65, 69, 78, 80–81, 85, 91–100, 104–113, 115–117, 129–150, 174
Glycoengineering, 46–47
GlycomeDB, 61, 69–77, 86
Glycomics, 8–10, 29–30, 59–86, 94, 97–99, 114, 119, 162
Glycopeptides, 5, 10, 106–108, 110–119
Glycoproteomics, 1–30, 104–120, 127–151
Glycosylation, 4–5, 7–10, 13–24, 26–29, 41–44, 53, 61, 64, 68, 82–85, 92–96, 98–99, 104–108, 112–119, 127–151, 161, 164, 169, 172–174
Glycosylation sites, 8, 20–21, 44, 61, 64, 68, 82, 84–85, 106–108, 113–119, 131–132, 135, 137–138, 147, 161, 174
H
High throughput, 9, 10, 12, 30, 78, 94, 103–120, 127–151
I
Immune evasion, 164, 171, 173–174
Immunoglobulins, 19–20, 133, 137
L
Lectin, 10, 21–22, 49, 51, 63–64, 68–69, 91–100, 107, 111–112, 116, 130–131, 138, 144
M
Mammalian expression systems, 129–130, 140–142, 150
Mass spectrometry, 10–12, 43, 61, 64, 78, 103–120, 137, 146
Membrane fusion, 161, 173
Microarray, 69, 91–100
P
Protein–protein interactions, 161–162
R
Recombinant IgG, 43, 45, 48, 53
S
Structural biology, 127–151
Structural virology, 159–175
Systems biology, 4, 8, 29–30
V
Viral glycoprotein, 161–162, 164, 168, 174–175
R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4, © Springer Science+Business Media B.V. 2011