The Complement FactsBook


Author: Bernard J. Morley | Mark J. Walport



Other books in the FactsBook Series:
Robin Callard and Andy Gearing, The Cytokine FactsBook
Steve Watson and Steve Arkinstall, The G-Protein Linked Receptor FactsBook
Rod Pigott and Christine Power, The Adhesion Molecule FactsBook
Shirley Ayad, Ray Boot-Handford, Martin J. Humphries, Karl E. Kadler and C. Adrian Shuttleworth, The Extracellular Matrix FactsBook, 2nd edn
Grahame Hardie and Steven Hanks, The Protein Kinase FactsBook
The Protein Kinase FactsBook CD-Rom
Edward C. Conley, The Ion Channel FactsBook I: Extracellular Ligand-Gated Channels
Edward C. Conley, The Ion Channel FactsBook II: Intracellular Ligand-Gated Channels
Edward C. Conley and William J. Brammar, The Ion Channel FactsBook IV: Voltage-gated Channels
Kris Vaddi, Margaret Keller and Robert Newton, The Chemokine FactsBook
Marion E. Reid and Christine Lomas-Francis, The Blood Group Antigen FactsBook
A. Neil Barclay, Marion H. Brown, S.K. Alex Law, Andrew J. McKnight, Michael G. Tomlinson and P. Anton van der Merwe, The Leucocyte Antigen FactsBook, 2nd edn
Robin Hesketh, The Oncogene and Tumour Suppressor Gene FactsBook, 2nd edn
Jeffrey K. Griffith and Clare E. Sansom, The Transporter FactsBook
Tak W. Mak, Josef Penninger, John Roder, Janet Rossant and Mary Saunders, The Gene Knockout FactsBook
Steven G.E. Marsh, Peter Parham and Linda D. Barber, The HLA FactsBook

THE COMPLEMENT FactsBook
Bernard J. Morley and Mark J. Walport
Imperial College School of Medicine, Hammersmith Campus, London, UK

ACADEMIC PRESS A Harcourt Science and Technology Company

San Diego San Francisco New York Boston London Sydney Tokyo

This book is printed on acid-free paper.
Copyright © 2000 by ACADEMIC PRESS
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, A division of Harcourt Science and Technology Company, 24-28 Oval Road, London NW1 7DX, UK. http://www.hbuk.co.uk/ap/
Academic Press, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA. http://www.apnet.com
ISBN 0-12-733360-6
Library of Congress Catalog Card Number: 99-65744
A catalogue record for this book is available from the British Library

Typeset by Mackreth Media Services, Hemel Hempstead, UK Printed in Great Britain by Redwood Books, Trowbridge, Wiltshire 00 01 02 03 04 RB 9 8 7 6 5 4 3 2 1

Contents

Abbreviations
Preface

Section I THE INTRODUCTORY CHAPTERS
Chapter 1 Introduction (Bernard J. Morley and Mark J. Walport)
Chapter 2 The Complement System (Bernard J. Morley and Mark J. Walport)

Section II THE COMPLEMENT PROTEINS
Part 1 C1q and the Collectins
C1q (Franz Petry and Michael Loos)
Mannose-binding lectin (Peter Lawson and K.B.M. Reid)
Bovine conglutinin (Peter Lawson and K.B.M. Reid)
SP-A (Robert B. Sim)
SP-D (Robert B. Sim)
Part 2 Serine Proteases
C1r (Nicole Thielens and Gerard J. Arlaud)
C1s (Nicole Thielens and Gerard J. Arlaud)
MASP-1 (Teizo Fujita, Yuichi Endo and Misao Matsushita)
MASP-2 (Steen V. Petersen and Jens C. Jensenius)

Factor D (Jurg Schifferli and Sylvie Miot)
C2 (Yuanyuan Xu and John E. Volanakis)
Factor B (Antonella Circolo and Harvey R. Colten)
Factor I (Bernard J. Morley)
Part 3 C3 Family
C3 (Marina Botto)
C4 (David E. Isenman)
C5 (Rick A. Wetsel)
Part 4 Terminal Pathway Components
C6 (Michael Hobart)
C7 (Michael Hobart)

C8 (Francesco Tedesco, Mnason E. Plumb and James M. Sodetz)
C9 (B. Paul Morgan)

Part 5 Regulators of Complement Activation (RCA)
CR1 (Lloyd B. Klickstein and Joann M. Moulds)
CR2 (Joel M. Guthridge and V. Michael Holers)
Decay-accelerating factor (L. Kuttner-Kondo, W.G. Brodbeck and M.E. Medof)
Membrane cofactor protein (M. Kathryn Liszewski and John P. Atkinson)
C4b-binding protein (Santiago Rodriguez de Cordoba, Olga Criado Garcia and Pilar Sanchez-Corral)
Factor H (Richard G. DiScipio)

Part 6 Cell Surface Receptors
C1qRp (Andrea J. Tenner)
C3a receptor (Robert S. Ames)
C5a receptor (Andreas Klos and Wilfried Bautsch)
CR3 (Yu Xia and Gordon D. Ross)
CR4 (Alex Law)

Part 7 Miscellaneous Complement Components
C1 inhibitor (Rana Zahedi and Alvin E. Davis III)
Apolipoprotein J (clusterin) (Mark E. Rosenberg)
Properdin (Timothy Farries)
CD59 (B. Paul Morgan)
Index

Abbreviations
C1INH      C1 inhibitor
C4BP       C4b-binding protein
CRD        carbohydrate-recognition domain
CRP        C-reactive protein
DAF        decay-accelerating factor
EBV        Epstein-Barr virus
EGF        epidermal growth factor
FGF        fibroblast growth factor
fMLP       formyl-methionyl-leucyl-phenylalanine
GPI        glycosylphosphatidylinositol
HIV        human immunodeficiency virus
IFNγ       interferon γ
Ig         immunoglobulin
IL-1       interleukin 1
LAD        leukocyte adhesion deficiency
LPS        lipopolysaccharide
MAC        membrane attack complex
MBL        mannose-binding lectin
MCP        membrane cofactor protein
MHC        major histocompatibility complex
MIDAS      metal ion-dependent adhesion site
Mr (K)     relative molecular mass
NK         natural killer
PDGF       platelet-derived growth factor
PMA        phorbol myristate acetate
PMN        polymorphonuclear leukocyte
PTK        protein tyrosine kinase
RaRF       Ra-reactive factor
RFLP       restriction fragment length polymorphism
SAP        serum amyloid protein
SDS-PAGE   polyacrylamide gel electrophoresis in sodium dodecyl sulfate
SLE        systemic lupus erythematosus
TGFβ       transforming growth factor β
TNFα       tumour necrosis factor α
VNTR       variable number tandem repeat
VWF        von Willebrand factor

Preface
The authors wish to thank all those who contributed entries for this volume and for their comments and suggestions. In addition, we are indebted to a number of contributors for additional information they provided: Dr Robert Sim for Figure 2 in Chapter 2, Dr David Isenman for the C3 and C4 catabolism diagrams and Dr Robert Ames for the C3a and C5a receptor diagrams. We would also like to thank Dr James Sodetz for advice on nomenclature, and Dr Alex Law for providing much of the information used in the CR3 chapter on deficiency and polymorphism, including unpublished data. We would like to thank Dr Robert Sim for critical reading of the introduction and Jane Rose for prolific proofreading. Finally, we would like to thank Dr Lilian Leung for her encouragement in the final stages of the preparation of this book. The field of complement is changing rapidly with the constant addition of new data. In light of this, we would be grateful if readers could point out any errors, omissions or indeed new information, which could then be incorporated into future editions of this book. Please send these to the Editor, The Complement FactsBook, Academic Press, 24-28 Oval Road, London NW1 7DX, UK.

Bernard J. Morley

Mark J. Walport

Section I

THE INTRODUCTORY CHAPTERS


1 Introduction

AIMS AND SCOPE OF THE BOOK
The aim of this book is to present concise biochemical information about the proteins of the complement system. A novel aspect of this book compared with others in the FactsBook series is the inclusion of cDNA structure and intron-exon boundary details. These details enable the design of primers for DNA amplification by the polymerase chain reaction, facilitating both functional mutation studies and the design of probes for expression work. The focus of the book is on the human system, though accession numbers have been included for other species. In the case of conglutinin, for which no human homologue has been identified, the bovine molecule has been described. The complement proteins are largely built up from protein modules, so they divide readily into families of structurally related molecules. This is the basis for the separate parts of Section II. A few proteins escape such simple classification (C1 inhibitor, apolipoprotein J (clusterin), properdin and CD59), and these have been grouped together in a separate part.
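As an illustration of how a cDNA sequence and its exon boundaries might feed into primer design, the Python sketch below takes a region of a cDNA and returns a forward primer and a reverse primer. The reverse primer is the reverse complement of the 3' end of the region. The sketch also gives a rough melting temperature from the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The helper names and the default 20-nucleotide primer length are hypothetical choices made for illustration. The Wallace rule is only a crude estimate for short oligonucleotides.

```python
# Hypothetical primer-design helpers; not taken from the book.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(cdna: str, start: int, end: int, length: int = 20) -> tuple[str, str]:
    """Forward and reverse primers flanking cdna[start:end] (0-based, end exclusive).

    The forward primer is the first `length` bases of the region; the reverse
    primer is the reverse complement of its last `length` bases.
    """
    region = cdna[start:end]
    if len(region) < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    return region[:length], reverse_complement(region[-length:])
```

Choosing `start` and `end` in different exons, using the underlined exon starts in the cDNA sequence, gives a product that distinguishes cDNA from genomic template. This is a common reason for wanting exon boundaries, although primer specificity still has to be checked separately.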

ORGANIZATION OF THE DATA
Entries are classified into the following sections, each of which is briefly described.

Other names
Entries are identified by the accepted nomenclature for the complement system[1,2]. More recently characterized components are entered under their most commonly used name. Historically, many of the complement proteins have been known by alternative names, or were identified as members of other protein families, so different researchers may know them by different designations. All of these alternatives have been included.

Physicochemical properties
This section includes data on the following:
- the number of amino acids in the mature protein and leader peptide (if present);
- the pI;
- the molecular weight, both observed under reducing and non-reducing conditions and predicted from amino acid composition;
- the number and location of putative N-glycosylation sites and, if known, whether the sites are occupied;
- the number and location of interchain disulfide bonds.
Intrachain disulfide bonds are not listed, nor are O-linked glycosylation sites, though the latter are mentioned in the structure section.
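A molecular weight predicted from amino acid composition is, in essence, the sum of the residue masses plus one water for the free termini. The sketch below, assuming standard average residue masses for the 20 common amino acids, computes this for a single unmodified chain. Carbohydrate and other post-translational modifications are not included, which is one reason observed and predicted values can differ. The function names are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # added once for the free N- and C-termini


def predicted_mass(sequence: str) -> float:
    """Predicted average mass (Da) of one unmodified polypeptide chain."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


def mature_mass(precursor: str, leader_length: int) -> float:
    """Predicted mass after removing an N-terminal leader peptide of the given length."""
    return predicted_mass(precursor[leader_length:])
```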

Structure
Details of the three-dimensional structure, where known, are included in this section together with any other significant features.

Function
The mechanism of activation of the molecule is detailed in this section, together with a brief description of its role in the complement pathway. Other functional activity outside the complement pathway is also mentioned. The modular structure of each protein is illustrated and the functional importance of each module noted. A key for the common protein modules is provided in Table 1, together with their full names and the abbreviations[3] used throughout the text. Modules which are present in only a single protein in this book are indicated by a white box, and the nature of that module is indicated in the protein modules section of the particular entry. For non-modular proteins such as the C3a and C5a receptors, a diagram has been included only if this helps to illustrate important structural features. In the case of C3 and C4, a diagram has been included to show the degradation pathways of these proteins, since this is pertinent to their function.

Table 1. Key to the schematic diagrams. All diagrams show modules to scale, with the key illustrating average sizes.
PROTEIN MODULE (ABBREVIATION)
Complement control protein repeat (CCP)
Serine protease domain
Factor I/membrane attack complex C6/7 module (FIMAC)
Epidermal growth factor-like repeat (EGF)
Calcium-binding epidermal growth factor-like repeat (Ca2+ EGF)
Von Willebrand factor type A (VWFA)
Thrombospondin type 1 repeat (TSP1)
Low density lipoprotein receptor class A repeat (LDLRA)
CUB domain, first identified in C1r/C1s, uEGF and bone morphogenetic protein (CUB)
Membrane attack complex proteins/perforin-like segment (MACPF)
Collagen-like domain
Carbohydrate-recognition domain (CRD)
Alpha-helical coiled-coil "neck" region
Serine, threonine, proline-rich mucin-like domain (STP)
Cytoplasmic domain
Transmembrane domain (a separate symbol is used for C3aR and C5aR)
Glycosylphosphatidylinositol anchor (GPI anchor)
Other domains (see individual sections)
Scale: 200 amino acids

Tissue distribution
For the secreted proteins, the typical serum concentration is provided and other biological fluids known to contain the protein are indicated. The primary site of synthesis is given, together with secondary sites. These are not meant to be exhaustive lists of cells expressing a given protein. In many cases (C3, for example), a large number of cell types have been assayed for expression. However, the absence of a cell or tissue from the list should not be taken as evidence that the cell type does not express the protein. For cell surface proteins, cell types which have been clearly demonstrated to express the molecule are listed.

Regulation of expression
Stimuli which alter protein expression are described. Mechanisms, if known, are detailed.

Protein sequence
The sequence is shown in the single-letter amino acid code. Numbering starts with the initiator methionine residue. The leader sequence is underlined, as are cleavage sites between chains and any special features of specific molecules, for example the residues which form the thioester bond in C3/C4 and the transmembrane domains of the C3a and C5a receptors. Putative and known N-linked glycosylation sites are indicated by N. Sites known not to be occupied are not indicated.
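Putative N-linked glycosylation sites are conventionally recognized from the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below lists candidate sites using the numbering convention described above, with the initiator methionine as residue 1. It cannot tell whether a site is actually occupied, which requires experimental evidence, and it ignores rarer non-canonical motifs. The function name is hypothetical.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width lookahead
# lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def putative_n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions (initiator Met = 1) of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```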

Protein modules
For the protein modules listed in Table 1, the leader sequence and some important binding regions, the amino acid boundaries and the exons are indicated. For C3 and C4, the thioester domain is indicated, while for the serine proteases the positions of the catalytic triad residues of the active site (H-D-S) are listed.

Chromosomal location
The chromosomal location of the gene in both human and mouse, where known, is given. Closely linked genes are also indicated.

cDNA sequence
The cDNA sequence is given. Where known, the sequence starts with the 5' end of the message; otherwise, the most 5' sequence available is given. All possible exons are included in the sequence, and where alternative splicing removes an exon from the mature message, this is noted. The initiation codon, termination codon and putative polyadenylation signals are all indicated. In addition, exon-intron boundaries are shown by underlining the first five nucleotides of each exon. No intronic sequences are included. Where there are discrepancies in published sequences, these are indicated.
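The annotated features can be located in a cDNA string with a few lines of code. The sketch below makes two simplifying assumptions:
- The first ATG is taken as the initiation codon. In a real message this is not guaranteed; the book's annotations come from the published literature.
- Only the canonical AATAAA polyadenylation signal and its common ATTAAA variant are searched for.
Positions are 1-based, matching a cDNA numbered from its 5' end. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def first_orf(cdna: str):
    """1-based (start, stop) of the first ATG and its first in-frame stop codon.

    `start` is the position of the A of ATG; `stop` is the position of the
    first base of the stop codon. Returns None if either is missing.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):  # step through codons in frame
        if seq[i:i + 3] in STOP_CODONS:
            return start + 1, i + 1
    return None


def polyadenylation_signals(cdna: str, after: int = 0) -> list[int]:
    """1-based positions of putative polyadenylation signals found at or after
    the 0-based offset `after` (e.g. the end of the coding sequence)."""
    seq = cdna.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        i = seq.find(signal, after)
        while i != -1:
            hits.append(i + 1)
            i = seq.find(signal, i + 1)
    return sorted(hits)
```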

Genomic structure
Where the structure of the human gene is known (with the exception of conglutinin, for which the bovine gene structure is given), it is drawn to scale. The gene is represented by a single horizontal line, while the exons are indicated by vertical bars, also to scale. Only the first and last exons are numbered, together with a central exon for the larger genes.

Accession numbers
Only the GenBank/EMBL accession numbers are included. These are listed as cDNA or genomic depending on the sequences they contain.

Deficiency
The mode of inheritance of deficiency in humans is stated together with the functional effects of deficiency and any clinical correlates. The molecular basis is stated, for example in factor I:
A1282 to T, H418 to L; three chromosomes/patients/families
where:
A is the normal nucleotide
1282 is the position in the presented cDNA sequence
T is the mutant nucleotide
H is the normal amino acid
418 is the position in the presented protein sequence
L is the mutant, non- or aberrantly functional amino acid
and "three chromosomes/patients/families" represents the number of times this mutation has been described.
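The notation above is regular enough to parse mechanically. Because protein numbering starts at the initiator methionine and cDNA numbering starts at the 5' end of the presented sequence, a cDNA position maps to a residue number once the position of the initiation codon is known. That position differs from entry to entry, so it is a parameter here rather than a fixed value. The sketch below does both: it parses an entry and maps a cDNA position to its codon. The dataclass and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# One change in the notation used above, e.g. "A1282 to T" or "H418 to L".
CHANGE = re.compile(r"([A-Z])(\d+)\s+to\s+([A-Z])")


@dataclass(frozen=True)
class Change:
    normal: str    # normal nucleotide or amino acid
    position: int  # position in the presented cDNA or protein sequence
    mutant: str    # mutant nucleotide or amino acid


def parse_deficiency(entry: str) -> tuple[list[Change], str]:
    """Split e.g. 'A1282 to T, H418 to L; three chromosomes' into the listed
    changes and the free-text count that follows the semicolon."""
    changes_text, _, count = entry.partition(";")
    changes = [Change(n, int(p), m) for n, p, m in CHANGE.findall(changes_text)]
    return changes, count.strip()


def codon_number(cdna_position: int, start_codon_position: int) -> int:
    """Residue number (initiator Met = 1) of the codon containing a cDNA position.

    Both arguments are 1-based; `start_codon_position` is the cDNA position of
    the A of the ATG initiation codon, taken from the relevant entry.
    """
    offset = cdna_position - start_codon_position
    if offset < 0:
        raise ValueError("position lies 5' of the initiation codon")
    return offset // 3 + 1
```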

Polymorphic variants
Polymorphic variants are listed at the protein level, at the level of restriction fragment length polymorphisms (RFLPs), or where the molecular basis is fully described. Alleles are named A/B, where A is the nucleotide/amino acid to the left of the numbering.

References
A fully comprehensive list of references is not compatible with the format. However, each entry includes the major references, with key references highlighted in bold. These represent either important work in the field or key reviews which will link to further references.

References
1 World Health Organization (1968) Bull. WHO 39, 935-938.
2 IUIS-WHO Nomenclature Committee (1981) J. Immunol. 127, 1261-1262.
3 Bork, P. and Bairoch, A. (1995) Trends Biochem. Sci. 20, Suppl. March C03.

2 The Complement System

HISTORICAL PERSPECTIVE
In the late nineteenth century, much scientific interest was focused on the mechanisms involved in protecting the body from attack by microorganisms. Two apparently contradictory theories of bacteriolysis emerged during this time. The first, the "cellular theory", stemmed from the work of Elie Metchnikoff, who demonstrated the existence of blood cells which could ingest invading bacteria. The second, the "humoral theory" of bacteriolysis, was based on work from Fodor, Nuttall and Buchner, who identified a heat-labile component of fresh, cell-free serum which was capable of bacteriolysis[1]. Buchner termed this activity "alexin", from the Greek for "to ward off". In 1894, Pfeiffer observed that cholera vibrios injected into the peritoneum of immune guinea pigs were lysed[2]. Towards the end of the nineteenth century, Bordet, working at the Pasteur Institute, extended this work by demonstrating that serum from immune animals lost its lytic activity after heating but that activity could be fully restored by the addition of non-immune serum. Bordet surmised that two factors were involved, one of which was heat-labile while the other was a stable substance present in immune serum[3]. The former he assumed was alexin, while the latter he termed the "sensitizer". Meanwhile, Ehrlich and Morgenroth, examining erythrocyte haemolysis by immune serum, confirmed the idea that two "principles" were required for lysis. The first principle, which was present in a thermostable form in immune serum, they termed "amboreceptors" or "immune bodies". The second, a heat-labile substance present in the "body juices", they called "complement", because it "complemented" the activity of the amboreceptors.
However, it was Bordet and Gengou who described the first complement fixation test, thereby establishing the quantitative role played by complement in cell lysis and dispelling the idea, implied by Ehrlich's name, that it was merely an accessory factor. For this reason, Bordet is generally credited with the discovery of the complement system. In the absence of robust biochemical techniques, elucidation of the proteinaceous nature of complement and of its multiple components proceeded fairly slowly over the next 40 years. However, by the late 1920s, owing to the work of Ferrata initially and of Coca and Gordon subsequently, four individual components were recognized. By 1941, Pillemer and co-workers had confirmed the proteinaceous nature of complement[4]. During the 1960s, Nelson characterized at least six components from guinea pig serum that were necessary for haemolytic activity[5], while Müller-Eberhard and colleagues focused on the purification and characterization of each of these components[6]. Also in the 1960s, Ueno and later Mayer used a reconstitution assay, adding partially purified components to antibody-sensitized sheep red blood cells, to unravel the reaction sequence of the classical pathway. The identification of the alternative pathway was another complex challenge and involved many of the same investigators. In 1953, Pillemer described the depletion of C3 from serum by zymosan in the absence of any effect on C1, C2 and C4 levels. He also identified properdin as an activating factor in what he termed the properdin pathway[7]. Nelson offered an alternative explanation for these data in 1958[8]: he proposed that the properdin system was actually the classical pathway, activated via antibodies to zymosan. In 1971, Müller-Eberhard purified C3 proactivator and proposed the C3 activator system as an alternative method of complement activation[9], thus supporting Pillemer's original hypothesis.


MODULAR STRUCTURE OF COMPONENTS
The cloning and sequencing of the complement components over the last 20 years has augmented the extensive protein sequence data already in existence and enabled protein structures to be identified. This has revealed the modular nature of the complement proteins and allowed their classification into five functional groups based on common structural motifs.

C1q and the collectins (Figure 1)

Figure 1. Modular structure of C1q and the collectins (SP-D, SP-A, C1q chains, conglutinin and MBL). See Table 1 for key. Additional domains are the globular region of C1q and, for conglutinin and MBL, the N-terminal cysteine-rich region.

The structure of C1q is unusual and was originally described[10], supported by electron micrographs, as resembling a "bunch of tulips". C1q has 18 polypeptide chains formed into collagen-like triple helices with globular heads, through which C1q interacts with immunoglobulin. The serum lectins MBL and conglutinin, together with the lung surfactant proteins SP-A and SP-D, share a marked similarity to C1q. They also have 12-18 polypeptide chains organized into a collagen-like domain with a globular C-terminus. However, in contrast to C1q, the globular domains of the collectins bind a range of sugar moieties (Figure 2).

Serine proteases (Figure 3)
The enzymes of the complement system are serine proteases of the chymotrypsin family and trypsin subfamily. The distinguishing feature of this group is the serine protease domain containing the catalytic triad of histidine, aspartic acid and serine[11]. The remaining domains of the enzymes, such as the CCPs of C2 and factor B, are probably involved in binding and substrate specificity.

Figure 2. Chain association in the collectins: the proposed disulfide pattern, oligomerization and overall morphology of SP-A (panels: dimer, hexamer). The disulfide pattern is based on the presence of both the a2 and a3 chains in each SP-A molecule, which allows the optimum pairing of all the cysteine residues involved (based on reference 27).

Figure 3. Modular structure of the serine proteases (FD, C1r, C1s, MASP-1, MASP-2, FB, C2 and FI). See Table 1 for key. The additional domain is the scavenger receptor cysteine-rich, or CD5, domain of FI.

Figure 4. The C3 family: C3, C4 and C5, showing the α and β chains (and, in C4, the γ chain), the disulphide bonds, the thioester site and the positions of the C3a/C4a/C5a and C3d/C4d fragments (scale bar: 400 aa).

C3 family (Figure 4)
The C3 family of proteins, C3, C4 and C5, together with the non-complement protein α2-macroglobulin, are thought to have evolved from a common ancestral gene. An approximate pairwise identity of 25% has been identified between the molecules, and a 75% identity between the same protein from different species[12]. They are synthesized as single chains, but post-translational processing involving proteolytic modification, glycosylation and sulfation results in the disulfide-linked multichain structure. The most important feature of these molecules, though it is lacking in C5, is the internal thioester[13]. This is formed in the α chain between the cysteine and glutamine residues of the tetrapeptide cysteine-glycine-glutamic acid-glutamine, and it allows the activated C3b* and C4b* molecules to bind covalently to other molecules (Figure 5).
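Two quantities in this paragraph lend themselves to a short calculation: the position of the thioester tetrapeptide in a sequence, and the percent identity between two proteins. The sketch below assumes the sequences are already aligned, with "-" marking gaps. Percent identity is taken as the number of identical columns divided by the number of columns without a gap in either sequence. This is only one of several definitions in use, and the figure also depends on the alignment method, so values are not directly comparable with the 25% and 75% quoted above. The function names are hypothetical.

```python
MOTIF = "CGEQ"  # cysteine-glycine-glutamic acid-glutamine


def thioester_motifs(protein: str) -> list[int]:
    """1-based positions of the Cys that starts each CGEQ tetrapeptide."""
    seq = protein.upper()
    hits, i = [], seq.find(MOTIF)
    while i != -1:
        hits.append(i + 1)
        i = seq.find(MOTIF, i + 1)
    return hits


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```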

Figure 5. Formation of the highly reactive thioester bond in C3 and C4, shown for the cysteine, glycine, glutamic acid and glutamine residues before and after thioester formation. The thioester bond is in bold.

Terminal pathway (Figure 6)
The terminal components, C6 to C9, are composite molecules built on a framework of four distinct modules: a thrombospondin type 1 repeat, a low-density lipoprotein receptor class A repeat, a perforin-like segment and an epidermal growth factor-like repeat. The most crucial of these is the perforin-like, or lytic, module. Perforin is released from intracellular granules by cytotoxic T cells. It requires only calcium ions to generate cylindrical structures that penetrate cell membranes and bear a marked similarity to the membrane attack complex (MAC)[14]. Although C9 in vivo requires C6-8 to perform the same function, under certain conditions it can polymerize to form similar cylindrical structures. The function of the remaining modules in these proteins remains unclear.

Figure 6. Modular structure of terminal pathway components (C6, C7, C8α, C8β and C9). See Table 1 for key to modules.

Regulators of complement activation (RCA) (Figure 7)
Proteins in this group share varying numbers of a repeating motif termed the complement control protein (CCP) repeat, also known as the short consensus repeat (SCR), which was first identified in factor B[15]. These globular units contain approximately 60 amino acids and are built around four cysteine residues that form two disulfide bonds, the first cysteine linking to the third and the second to the fourth. A framework of residues is conserved around these cysteines. The number of CCPs varies from three in C2 and factor B to 30 in CR1, and they are thought to play a role in binding to C3, C4 and their breakdown products. The genes encoding these regulatory molecules are closely linked on chromosome 1q32 in humans and are probably derived from a single ancestral gene.
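The disulfide pattern of a CCP module follows directly from the order of its four cysteines: the first pairs with the third and the second with the fourth. The sketch below applies that rule to one module sequence. It assumes the module boundaries are already known, for example from the protein modules section of an entry, and that the module contains exactly four cysteines. The `first_residue` parameter converts the result to full-protein numbering. The names are hypothetical and the test sequence is a toy, not a real CCP.

```python
def ccp_disulfides(module: str, first_residue: int = 1) -> list[tuple[int, int]]:
    """Predicted disulfide pairs (Cys1-Cys3, Cys2-Cys4) for one CCP module.

    `first_residue` is the 1-based position of the module's first residue in
    the full protein, so the returned positions use protein numbering.
    """
    cysteines = [first_residue + i for i, aa in enumerate(module.upper()) if aa == "C"]
    if len(cysteines) != 4:
        raise ValueError(f"expected 4 cysteines in a CCP module, found {len(cysteines)}")
    return [(cysteines[0], cysteines[2]), (cysteines[1], cysteines[3])]
```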

Figure 7. Regulators of complement activation (CR1, CR2, DAF, MCP, C4BPα, C4BPβ and FH). See Table 1 for key. The additional domain is the C-terminal oligomerization domain of C4BP. CR1 is represented as four long homologous repeats (bracketed) of seven CCP domains each, while CR2 has four repeats containing four CCP modules each. In CR2 the alternatively spliced eleventh CCP domain is marked (±).

PATHWAYS
The complement system is composed of four pathways (reviewed in ref. 16). Three of these (the activation pathways) generate C3 and C5 convertases, which feed into the fourth, terminal or lytic, pathway (Figure 8). Different activators initiate each of the activation pathways. The classical pathway, which links innate and adaptive immunity, is activated by the binding to C1q of antibodies complexed with antigens. It is also activated in an antibody-independent manner by the binding of C-reactive protein complexed with ligand, and by many pathogens, including gram-negative bacteria. The alternative pathway is antibody-independent. It is activated by the direct covalent binding of C3 to pathogens and "altered self". The lectin pathway, though structurally very similar to the classical pathway and utilizing many of the same components, is activated by the binding to MBL of the sugar mannose, present on the surface of many pathogens. Each pathway is highly regulated to avoid inappropriate activation and consumption of the component molecules. This is particularly true for the alternative pathway, where the C3 convertase incorporates activated C3 into the enzyme complex, resulting in an amplification loop. Without precise control, alternative pathway components would be rapidly used up. Control is provided by both serum and cell surface molecules.

Figure 8. Overview of the activation of the complement system. Panels: classical activation pathway (C1q, C1r2-C1s2, C4 and C2, forming the C3 convertase C4b2a and the C5 convertase C4b2a3b); mannose-binding lectin activation pathway (MBL, MASP-1,2); alternative activation pathway (C3 + H2O, C3(H2O), factor D, C3(H2O)B, C3(H2O)Bb and C3bBb); and the terminal pathway, forming the membrane attack complex (C5b678:poly9). Open arrows represent activation via changes in conformation, while wavy arrows represent enzymatic cleavage steps. Overlined components (e.g. C1s) are activated enzymes, derived from zymogen precursors.

Classical pathway
The C1 complex, which initiates activation of the classical pathway, is a multimolecular complex containing one molecule of C1q with two molecules each of C1r and C1s. The binding of the globular domains of C1q to the Fc regions of immunoglobulins, specifically IgG and IgM, initiates a series of conformational and enzymatic changes within the complex[17]. Initiation is also brought about by a variety of pathogens in the absence of specific antibodies. First, C1 inhibitor (C1INH) is displaced from the complex, allowing autoactivation of the proenzyme C1r. This in turn cleaves and activates the second C1r molecule. The two molecules of C1s are then rapidly cleaved by C1r to form the active C1 serine protease. At this point, C1INH acts as a second-stage inhibitor, binding to the C1r and C1s active sites and causing their inactivation and dissociation from the complex. Activated C1s cleaves C4, producing the anaphylatoxin C4a and C4b. Exposed within the C4b fragment is a highly labile internal thioester bond (C4b*). Most of this C4b* is hydrolysed in the fluid phase. However, in the presence of an activating surface such as a microorganism or an immune complex, C4b* reacts with hydroxyl or amino groups via its reactive acyl group, resulting in deposition in clusters close to the activating C1 complex. Bound C4b acts as an acceptor for C2, which is then cleaved by C1s. The small C2b fragment is released, while C2a, which contains a serine protease domain, remains bound to C4b and forms the classical pathway C3 convertase, C4b2a. The convertase then cleaves C3 at a single point in the α chain, releasing the anaphylatoxin C3a and the highly labile C3b*. As with C4b*, cleavage of C3 to C3b* exposes the reactive thioester, which is now available to bind covalently to the activating surface. Binding of C3b in the vicinity of the C3 convertase results in the formation of the classical pathway C5 convertase, C4b2a3b.
The exposed thioester, as with C4b*, is extremely labile, and consequently only C3b in the immediate vicinity of the convertase will escape hydrolysis and inactivation. Considerable amplification occurs during activation, with approximately 240 C3b molecules deposited on the activating cell surface for each C1q molecule[18]. This amplification would rapidly deplete complement if it were not carefully regulated by a number of cell surface and serum proteins. The role of C1INH has already been discussed. The remaining control proteins function either to inhibit formation of the convertases or to promote their dissociation and catabolism. Factor I is a member of the serine protease family and inactivates both C3b and C4b in the presence of a cofactor: either CR1 or MCP (both membrane-bound) or, for C4b, C4BP (fluid phase). Factor H, which is a fluid-phase cofactor for C3b degradation by factor I in the alternative pathway, may also function in the classical pathway. C4BP, along with CR1 and DAF (another membrane-bound protein), accelerates dissociation of the C3 and C5 convertases[19].
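The order of proteolytic steps described above can be written down as a small data structure. This is useful, for instance, when checking which enzyme acts on which component. The sketch below is a simplified, hypothetical encoding of the text: it records only the cleavage steps of the classical pathway up to the C3 convertase, and it omits all regulators (C1INH, factor I and its cofactors, C4BP, DAF, CR1). The names `Cleavage`, `CLASSICAL_PATHWAY`, `describe` and `enzymes_acting_on` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cleavage:
    enzyme: str
    substrate: str
    products: tuple[str, ...]
    note: str = ""


# Cleavage steps of the classical pathway in the order given in the text.
CLASSICAL_PATHWAY = (
    Cleavage("C1r", "C1r (proenzyme)", ("activated C1r",),
             "autoactivation once C1q has bound antibody"),
    Cleavage("C1r", "C1r (second molecule)", ("activated C1r",)),
    Cleavage("C1r", "C1s (x2)", ("activated C1s",)),
    Cleavage("C1s", "C4", ("C4a", "C4b*"),
             "C4b* binds covalently to the activating surface"),
    Cleavage("C1s", "C2 (bound to C4b)", ("C2b", "C2a"),
             "C2a stays on C4b, forming the C3 convertase C4b2a"),
    Cleavage("C4b2a", "C3", ("C3a", "C3b*"),
             "C3b bound near C4b2a forms the C5 convertase C4b2a3b"),
)


def describe(steps) -> list[str]:
    """One line per step: 'enzyme cleaves substrate -> products'."""
    return [f"{s.enzyme} cleaves {s.substrate} -> {' + '.join(s.products)}"
            for s in steps]


def enzymes_acting_on(steps, component: str) -> list[str]:
    """Enzymes whose substrate is the named component (first word of the substrate)."""
    return [s.enzyme for s in steps if s.substrate.split()[0] == component]
```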

Mannose-binding lectin pathway
The lectin pathway is highly analogous to the classical pathway and shares C2, C3 and C4 with it. However, initiation of the pathway is antibody-independent, the difference lying in the initiation complex. In the lectin pathway, the C1 complex is replaced by a homologous complex containing MBL and MASP-1 and -2. MBL is activated by binding to mannose-containing proteins or carbohydrates on bacterial or viral surfaces[20]. The MASP-1 and -2 enzymes function in a manner analogous to C1r and C1s, and the result is cleavage of C4. From this point, the lectin pathway follows the classical pathway.

Alternative pathway This pathway is also antibody-independent and provides an initial line of innate immune defence. It is activated by a host of diverse substances, including zymosan, lipopolysaccharide from gram-negative bacteria, cell wall teichoic acid from gram-positive bacteria, whole microorganisms, lymphoblastoid cells and certain mammalian cell surfaces such as rabbit erythrocytes [21]. The exact processes involved in alternative pathway activation are not fully defined; however, the most likely explanation remains the "tickover" hypothesis [22]. This is based on the continuous low-grade hydrolysis of circulating serum C3 to produce a molecule, C3(H2O), which is structurally and functionally similar to C3b* (Figure 9). C3(H2O) can then bind factor B. Factor B bound to C3(H2O) is a substrate for factor D, which, unlike the other complement serine proteases that are present in serum as zymogens, circulates in its active form. The C3(H2O)Bb complex so formed is an unstable fluid-phase C3 convertase that can cleave C3 to yield C3a and the labile C3b*. If an activating surface is present, the C3b* binds to it and initiates the positive feedback loop characteristic of the alternative pathway; in the absence of an activating surface, C3b* is rapidly hydrolysed and catabolized. C3b bound to an activating surface binds factor B, and the resulting C3bB complex is cleaved by factor D to form the C3 convertase C3bBb, whose stability is enhanced by the binding of properdin. Bb is the serine protease element of this convertase, playing a role analogous to that of the homologous C2a fragment in the classical C3 convertase, C4b2a. Because C3b is a component of its own convertase, this activation loop generates considerable C3b deposition in the vicinity of the initiating convertase molecule. Addition of a second C3b molecule to the C3 convertase forms the alternative pathway C5 convertase, C3bBb3b.

The fate of C3b is critical to the regulation of the amplification loop. Persistence of C3b allows further binding of factor B, leading to formation of more C3bBb convertase and hence amplified C3 cleavage. Catabolism of C3b prevents convertase formation and inhibits amplification of C3 cleavage. The key enzyme that catabolizes C3b is factor I, which requires one of a number of cofactors to bind to C3b. In plasma, C3b is rapidly catabolized by factor I with factor H as cofactor. On cell surfaces, MCP and CR1 are the major intrinsic membrane cofactors for factor I, and factor H from plasma may also bind as an extrinsic cofactor.

[Figure 9 schematic: continuous low-grade hydrolysis converts C3 to C3(H2O), which binds factor B; cleavage by factor D gives C3(H2O)Bb + Ba, which cleaves C3 to C3b* + C3a (carboxypeptidase N also shown). C3b binds factor B to form C3bB, which is cleaved to C3bBb + Ba and stabilized by properdin (C3bBbP). DAF and CR1 dissociate the convertase; factor I (FI) plus a cofactor (FH, MCP or CR1, depending on the local environment) catabolizes C3b.]

Figure 9. The formation and control of the alternative pathway amplification loop. Dashed arrows represent inhibition or dissociation, and wavy arrows represent enzymatic cleavage steps. Overlined components (e.g. C3bBb) are activated enzymes derived from zymogen precursors, and C3b* represents C3b with a reactive thioester.
There is competition between factor B and these cofactors of factor I for binding to C3b. If factor B binding is favoured over cofactor binding, C3 convertase formation results, leading to amplified C3 cleavage. If binding of factor H, CR1 or MCP is favoured, C3b is catabolized by factor I, which downregulates complement activation. Autologous cell surfaces are highly protected from C3 convertase formation by several mechanisms. The first is the presence of intrinsic cofactors for factor I, such as MCP and CR1. The second is their sialic acid content, which favours binding of factor H to C3b over binding of factor B. The third is the presence of other cell membrane molecules, such as DAF, which inhibit C3 convertase assembly and promote convertase dissociation. In contrast, bacteria and other pathogens lack the host cofactors for C3b cleavage by factor I, though some have evolved similar molecules. The carbohydrate composition of many pathogens, which is poor in sialic acid, also favours the binding of factor B rather than factor H, promoting convertase assembly and C3 activation [21].
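The competition between factor B and the factor I cofactors can be caricatured as a branching process. The sketch below is a toy model, not a kinetic model from this book: it assumes that each surface-bound C3b independently either binds factor B (and goes on to form a C3bBb convertase) or binds a cofactor and is catabolized, with probabilities set by hypothetical relative binding weights, and that each convertase deposits a fixed, hypothetical number of new C3b molecules. The function names are illustrative only.

```python
from typing import List


def p_factor_b(b_weight: float, cofactor_weight: float) -> float:
    """Probability that a surface-bound C3b binds factor B rather than a
    factor I cofactor (factor H, MCP or CR1), from hypothetical relative weights."""
    return b_weight / (b_weight + cofactor_weight)


def c3b_by_cycle(p_b: float, c3b_per_convertase: float, cycles: int,
                 start: float = 1.0) -> List[float]:
    """Expected C3b deposited in each cycle of the loop.

    A C3b that binds factor B is assumed to become a C3bBb convertase that
    deposits `c3b_per_convertase` new C3b; any other C3b is catabolized by factor I.
    """
    counts = [start]
    for _ in range(cycles):
        counts.append(counts[-1] * p_b * c3b_per_convertase)
    return counts


def amplifies(p_b: float, c3b_per_convertase: float) -> bool:
    """True when each C3b gives rise, on average, to more than one new C3b."""
    return p_b * c3b_per_convertase > 1.0


# Hypothetical numbers, for illustration only.
activator = p_factor_b(1.0, 1.0)   # no preference between factor B and cofactors
host = p_factor_b(1.0, 19.0)       # cofactor binding strongly favoured
print(c3b_by_cycle(activator, 10, 3))                  # [1.0, 5.0, 25.0, 125.0]
print(amplifies(activator, 10), amplifies(host, 10))   # True False
```

With equal weights and 10 new C3b per convertase, deposition grows five-fold per cycle; weighting the cofactors 19:1 in their favour (loosely mimicking a sialic acid-rich host surface) makes the loop decay. The point of the sketch is the threshold, not the particular numbers.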

Terminal pathway The three distinct activation pathways of complement converge on the formation of a C5 convertase, and cleavage of C5 by these convertases initiates the lytic, or terminal, pathway. This first step releases the powerful anaphylatoxin C5a and leaves the metastable C5b*. Unlike the closely related C3 and C4 molecules, C5 does not contain an internal thioester bond. Nevertheless, binding of C5b to the surface-bound C3b in the convertase causes transient exposure of a hydrophobic acceptor site for C6 [23]. In contrast to the activation pathways, which require enzymatic cleavage, the terminal pathway relies on conformational changes induced by binding. Binding of C6 allows C7 to bind, which alters the conformation of the complex in two ways: C5b67 is released from the C5 convertase, and a hydrophobic region of C7 is exposed that allows penetration into the lipid bilayer of the cell membrane. C8 binds to the C5b67 complex via its β subunit, causing a conformational change and insertion of the C8 α chain into the membrane. A variable number of C9 molecules then associate with the C5b678 complex, forming the mature membrane attack complex (MAC) [24]. During complement activation, some of the forming C5b67 complexes may deposit on neighbouring cells, which could result in MAC formation and lysis, so-called "bystander lysis". This is prevented by clusterin in the fluid phase and by S protein, which binds to C5b67 close to its membrane attachment site and so prevents membrane insertion. Autologous cell surfaces are also protected by the important regulatory protein CD59, which inhibits incorporation of C8 and C9 into the MAC.

Formation of the MAC in a cell or bacterial membrane has a number of possible consequences. The first is lysis, which results from membrane disruption caused by MAC-induced ion-permeable channels [25] and/or leaky patches [26]. The second is triggering of a variety of cellular metabolic pathways, whose activation may result in the synthesis and release of inflammatory mediators such as leukotrienes and prostaglandins. However, insertion of the MAC into cell membranes does not always cause injury: cells may shed fragments containing clustered MACs as a defence against self-injury by autologous complement activation.
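The ordered, binding-driven assembly of the terminal pathway, and the points at which the regulators described above act, can be summarized as a short sequential check. This is a minimal sketch under simplifying assumptions: regulators are modelled as all-or-none blocks, the function name `assemble_mac` and its outcome strings are hypothetical, and only the regulator effects stated in the text are encoded.

```python
from typing import AbstractSet, List, Tuple

# Per the text, these bind C5b67 and prevent its insertion into the membrane.
C5B67_SCAVENGERS = frozenset({"S protein", "clusterin"})


def assemble_mac(n_c9: int,
                 regulators: AbstractSet[str] = frozenset()) -> Tuple[List[str], str]:
    """Walk the non-enzymatic assembly C5b -> C6 -> C7 -> C8 -> C9(n).

    Each step depends on the previous one: C6 binding allows C7 to bind,
    C7 exposes the hydrophobic region that enters the bilayer, C8 inserts
    its alpha chain, and a variable number of C9 molecules then associate.
    Regulators are treated as all-or-none blocks (a simplification).
    """
    if n_c9 < 0:
        raise ValueError("n_c9 must be non-negative")
    complex_ = ["C5b", "C6", "C7"]
    if C5B67_SCAVENGERS & set(regulators):
        return complex_, "C5b67 scavenged before membrane insertion"
    if "CD59" in regulators:
        # As stated in the text, CD59 inhibits incorporation of C8 and C9.
        return complex_, "C8/C9 incorporation inhibited by CD59"
    complex_.append("C8")
    complex_.extend(["C9"] * n_c9)
    return complex_, f"MAC formed with {n_c9} C9 molecules"


print(assemble_mac(3))
print(assemble_mac(3, {"CD59"}))
```

Encoding the steps as a fixed order makes the key property of the terminal pathway explicit: no step is enzymatic, so blocking any one binding event halts everything downstream.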

FUNCTION Detailed consideration of the physiology and pathology of the complement system is outside the scope of this book; the reader is referred to several recent reviews and books describing the complement system [16, 27-29]. Figure 10 summarizes the different physiological activities of complement. Complement is a major component of the innate immune system, and the two main pathways by which complement is activated as an innate host defence mechanism are the mannose-binding lectin pathway and the alternative pathway. However, the classical pathway is also activated, in the absence of specific antibody, by gram-negative bacteria and a number of viruses. The links between the adaptive immune system and complement are also extremely important and may be divided into two categories. The first is the activity of the complement system as one of the major effector pathways of host defence following engagement of the adaptive immune system. The binding of antigen by the majority of antibody classes causes activation of the classical pathway of complement. This leads to deposition of the complement opsonins C4b and C3b on the immune complexes, release of the anaphylatoxic peptides C3a and C5a, and formation of the membrane attack complex. This pathway enables and enhances the killing of bacteria that have bound specific antibody. The second link between adaptive immunity and complement is the role of complement in enhancing and amplifying adaptive immune responses [30-32]. Examples in this category include the reduction of B cell thresholds for activation by ligation of C3 receptors on B lymphocytes; the role of C3 receptors on follicular dendritic cells in germinal centres in maintaining immunological memory; and the role of immune complexes triggering complement in the induction phase of contact hypersensitivity reactions [33].

[Figure 10 schematic: three initiators (antigen/antibody immune complexes; microbes with terminal mannose groups, via MBL-MASP1-MASP2; bacteria, fungi, viruses or tumour cells, via C3(H2O)) converge on C3 and C5, then C6, C7, C8 and poly-C9. Effector functions shown: anaphylatoxins, opsonization, immune complex modification and clearance, lymphocyte activation, clearance of apoptotic cells, lysis and membrane perturbation.]

Figure 10. The effector mechanisms of the complement system. Dashed arrows indicate the functions of pathway components.

Complement and disease Complement plays an important role in the pathogenesis of many diseases. Complement deficiency, though extremely rare, is illuminating because it reveals the unique roles of the complement system in the context of the living organism [34]. Much more common are diseases in which complement activation plays an important role in tissue injury and also in the resolution of inflammation.

Complement deficiency Much has been learnt about the physiological activities of the complement system by studying patients with spontaneous inherited deficiencies of individual complement proteins [34]. The molecular basis of each deficiency is described in the relevant entry. Three categories of disease are associated with inherited deficiencies of complement proteins: infectious diseases, systemic lupus erythematosus (SLE)-like diseases and C1 inhibitor deficiency-linked diseases. Infectious diseases illustrate the role of complement in host defence against infection, and three types of infection may be seen. (1) Pyogenic infections associated with C3 deficiency, which can be due to mutations either in the C3 gene or in the genes for the control proteins factor I or factor H, whose deficiency causes unregulated C3 consumption. These demonstrate the role of complement as an opsonin. (2) Pyogenic infections associated with partial MBL deficiency [35], which occur predominantly early in life, in the period between loss of maternally transferred immunity and development of a mature antibody system. These illustrate the activity of the MBL pathway in vivo. (3) Neisserial infections in patients lacking MAC proteins or proteins of the activation pathways that precede the MAC. This association demonstrates the role of complement-mediated lysis in host defence against neisserial infection [36]. SLE-like diseases, the second category, are linked to deficiencies of the classical pathway proteins C1q, C1r, C1s, C4 and C2. The precise mechanism of this association is not established, but recent evidence suggests that it may reflect the role of complement in the normal resolution of inflammation, especially in the scavenging and clearance of immune complexes and apoptotic cells [37].
In diseases associated with C1 inhibitor deficiency [38], heterozygous deficiency of this regulatory protein leads to loss of control of the activation of C1r, C1s, plasmin, factor XIIa and kallikrein. The resulting attacks of angioedema are thought to be mediated by kinins, including bradykinin and a kinin derived from C2. Beyond these rare human deficiencies, a number of spontaneous complement deficiencies have been characterized in other species, including pigs, rabbits, guinea-pigs, rats and mice. In addition, gene targeting in mice has produced a range of animals with single and multiple complement deficiencies, and investigation of these models is likely to add significantly to our knowledge of the physiological activities of the complement system.

Complement and tissue injury The role of complement in the pathogenesis of disease may be considered in three categories: systemic disease, local tissue injury and the resolution of inflammatory injury. An important example of complement causing systemic injury is bacterial sepsis. Large-scale activation of the complement cascade leads to substantial anaphylatoxin production, which in turn activates neutrophils and other leukocytes, all of which contribute to severe morbidity. Extracorporeal circulation of blood over artificial surfaces, in haemodialysis and heart-lung bypass machines, also causes large-scale complement activation; consequently, great effort has gone into developing artificial membranes and surfaces that minimize complement activation. Local complement activation also augments tissue injury in many diseases. Infarcted tissue activates complement, for example following myocardial infarction, and the potential clinical importance of this has been shown in experimental models in which inhibitors of complement activation reduced infarct size [39]. In autoimmunity, complement is activated and deposited at sites of tissue injury in many different diseases, such as rheumatoid arthritis, systemic vasculitis and myasthenia gravis. Evidence that complement contributes to tissue injury has come from experimental models of membranous nephritis, in which deficiency of the membrane attack complex reduces the degree of proteinuria. On the other hand, evidence is beginning to emerge that, in some circumstances, complement may protect against tissue injury, which would fit with its role in promoting the clearance of immune complexes and, possibly, apoptotic cells [37]. There is also an emerging body of evidence that Fc receptors, rather than complement, may play the dominant role in transducing proinflammatory signals [40], which supports a protective role for complement against inflammatory disease.

References
1 Buchner, H. (1889) Zbl. Bakt. (Naturwiss.) 5, 817; 6, 1.
2 Pfeiffer, R. and Issaeff, R. (1894) Z. Hyg. Infektionskr. 17, 355.
3 Bordet, J. (1909) In Studies in Immunity. J. Wiley & Sons, New York.
4 Pillemer, L. et al. (1941) J. Exp. Med. 74, 927-934.
5 Nelson, R.A. et al. (1966) Immunochemistry 3, 111.
6 Müller-Eberhard, H.J. (1975) Annu. Rev. Biochem. 44, 697-724.
7 Pillemer, L. et al. (1954) Science 120, 279-285.
8 Nelson, R.A. (1958) J. Exp. Med. 108, 515-535.
9 Götze, O. and Müller-Eberhard, H.J. (1971) J. Exp. Med. 134, 90s-108s.
10 Reid, K.B.M. and Porter, R.R. (1976) Biochem. J. 155, 19-23.
11 Perona, J.J. and Craik, C.S. (1995) Protein Sci. 4, 337-360.
12 Campbell, R.D. et al. (1988) Annu. Rev. Immunol. 6, 161-195.
13 Tack, B.F. et al. (1980) Proc. Natl Acad. Sci. USA 77, 5764-5768.
14 Liu, C.C. et al. (1995) Immunol. Today 16, 194-201.
15 Morley, B.J. and Campbell, R.D. (1984) EMBO J. 3, 153-157.
16 Volanakis, J.E. and Frank, M.M. eds (1998) The Human Complement System in Health and Disease. Marcel Dekker, New York.
17 Sim, R.B. and Reid, K.B.M. (1991) Immunol. Today 12, 307-311.
18 Ollert, M.W. et al. (1994) J. Immunol. 153, 2213-2221.
19 Liszewski, M.K. and Atkinson, J.P. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 149-165.
20 Epstein, J. et al. (1996) Curr. Opin. Immunol. 8, 29-35.
21 Pangburn, M.K. and Müller-Eberhard, H.J. (1984) Semin. Immunopathol. 7, 163-192.
22 Nicol, A.E. and Lachmann, P.J. (1973) 24, 259-275.
23 Müller-Eberhard, H.J. (1986) Annu. Rev. Immunol. 4, 503-528.
24 Mold, C. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 309-325.
25 Bhakdi, S. and Tranum-Jensen, J. (1991) Immunol. Today 12, 318-320.
26 Esser, A.F. (1991) Immunol. Today 12, 316-318.
27 Lu, J. and Sim, R.B. (1994) In New Aspects of Complement Structure and Function (Erdei, A. ed.). R.G. Landes Co., Austin, TX, pp. 85-106.
28 Davies, K.A. et al. (1994) Springer Semin. Immunopathol. 15, 397-416.
29 Janeway, C.A., Travers, P., Walport, M. and Capra, J.D. eds (1999) Immunobiology: The Immune System in Health and Disease, 4th edn. Current Biology and Garland Publishing, New York.
30 Fearon, D.T. and Carter, R.H. (1995) Annu. Rev. Immunol. 13, 127-149.
31 Fearon, D.T. and Locksley, R.M. (1996) Science 272, 50-53.
32 Carroll, M.C. (1998) Annu. Rev. Immunol. 16, 545-568.
33 Tsuji, R.F. (1997) J. Exp. Med. 186, 1015-1026.
34 Colten, H.R. and Rosen, F.S. (1992) Annu. Rev. Immunol. 10, 809-834.
35 Turner, M.W. (1996) Immunol. Today 17, 532-540.
36 Figueroa, J. et al. (1993) Immunol. Res. 12, 295-311.
37 Botto, M. et al. (1998) Nature Genet. 19, 56-59.
38 Agostoni, A. and Cicardi, M. (1992) Medicine (Baltimore) 71, 206-215.
39 Kalli, K.R. et al. (1994) Springer Semin. Immunopathol. 15, 417-431.
40 Ravetch, J.V. and Clynes, R.A. (1998) Annu. Rev. Immunol. 16, 421-432.

Section II

THE COMPLEMENT PROTEINS


Part 1 C1q and the Collectins

C1q

Franz Petry and Michael Loos, University of Mainz, Mainz, Germany

Physicochemical properties C1q is made up of three polypeptide chains, A, B and C, which are synthesized as precursors with leader sequences of 22, 25 and 28 amino acids, respectively.

pI 9.3
Mr (K) 459.32

         Mr (K)  N-linked glycosylation sites  Interchain disulfide bonds
A chain  27.5    1 (146)                       26
B chain  25.2    -                             29
C chain  23.8    -                             32 (to 32 in second C chain)

Structure C1q has a characteristic "bunch of tulips" appearance under the electron microscope, with six globular heads connected by six collagen-like stalks that join to form a central fibril-like stem. Glucosylgalactosyl disaccharide units are linked to certain hydroxylysine residues in the collagen-like regions of all three chains.

Function C1q has a critical function in host defence and in the clearance of immune complexes. Antibody-independent activation of the classical pathway via C1q has been demonstrated for certain viruses, gram-negative bacteria, DNA and polyanions. The binding sites for various molecules, including endotoxin, CRP, serum amyloid A, DNA and heparin, have been mapped to the A chain of C1q. C1q binds to IgG- and IgM-containing immune complexes via its globular heads; other Ig classes do not activate the classical pathway via C1q. Binding of C1q to immune complexes leads to activation of the C1 subcomponent C1r and then C1s. Two molecules each of C1r and C1s assemble with C1q at the collagen-like region to form the macromolecular C1 complex, and activation of C1 initiates the classical pathway of complement. After activation, the C1 complex is dissociated by the C1 esterase inhibitor (C1-INH). C1q-bound immune complexes bind to C1q receptors on various cell types.

Tissue distribution^'^'^ Primary site of synthesis: macrophages. Secondary sites: follicular dendritic cells, interdigitating cells and further cells of the monocyte-macrophage lineage.


Regulation of expression Unknown.

Protein sequences^^
A chain
MEGPRGWLVL CVLAISLASM VTEDLCRAPD GKKGEAGRPG RRGRPGLKGE 50
QGEPGAPGIR TGIQGLKGDQ GEPGPSGNPG KVGYPGPSGP LGARGIPGIK 100
GTKGSPGNIK DQPRPAFSAI RRNPPMGGNV VIFDTVITNQ EEPYQNHSGR 150
FVCTVPGYYY FTFQVLSQWE ICLSIVSSSR GQVRRSLGFC DTTNKGLFQV 200
VSGGMVLQLQ QGDQVWVEKD PKKGHIYQGS EADSVFSGFL IFPSA 245

B chain
MKIPWGSIPV LMLLLLLGLI DISQAQLSCT GPPAIPGIPG IPGTPGPDGQ 50
PGTPGIKGEK GLPGLAGDHG EFGEKGDPGI PGNPGKVGPK GPMGPKGGPG 100
APGAPGPKGE SGDYKATQKI AFSATRTINV PLRRDQTIRF DHVITNMNNN 150
YEPRSGKFTC KVPGLYYFTY HASSRGNLCV NLMRGRERAQ KVVTFCDYAY 200
NTFQVTTGGM VLKLEQGENV FLQATDKNSL LGMEGANSIF SGFLLFPDME 250
A 251

C chain
MDVGPSSLPH LGLKLLLLLL LLALRGQANT GCYGIPGMPG LPGAPGKDGY 50
DGLPGPKGEP GIPAIPGIRG PKGQKGEPGL PGHPGKNGPM GPPGMPGVPG 100
PMGIPGEPGE EGRYKQKFQS VFTVTRQTHQ PPAPNSLIRF NAVLTNPQGD 150
YDTSTGKFTC KVPGLYYFVY HASHTANLCV LLYRSGVKVV TFCGHTSKTN 200
QVNSGGVLLR LQVGEEVWLA VNDYYDMVGI QGSDSVFSGF LLFPD 245

The leader sequences comprise residues 1-22 (A chain), 1-25 (B chain) and 1-28 (C chain); the N-linked glycosylation site in the A chain is N146.

Protein modules
A chain  1-22     Leader peptide         exon 1
         31-109   Collagen-like region   exon 1/2
         110-245  Globular region        exon 2
B chain  1-25     Leader peptide         exon 1
         31-114   Collagen-like region   exon 1/2
         115-251  Globular region        exon 2
C chain  1-28     Leader peptide         exon 1
         31-114   Collagen-like region   exon 1/2
         115-245  Globular region        exon 2

Chromosomal location Human [8]: 1p34-1p36.3. Mouse [9]: chromosome 4, 66.1 cM.

cDNA sequences ClqA (manual entry from ref. 7) TGTGATGTCC CGGGGATGGC TTGTGCCGAG CCAGGCCTCA GGCCTTAAAG CCAGGGCCCA AGCCCAGGAA CCAATGGGGG CAGAACCACT GTGCTGTCCC CGCTCCCTGG ATGGTGCTTC CACATTTACC GCCTGAGCCA TAAAAAGGGG GGCTGCCCGT

AACCTGCCCA TGGTGCTCTG CACCAGACGG AGGGGGAGCA GAGACCAGGG GCGGCCCCCT ACATCAAGGA GCAACGTGGT CCGGCCGATT AGTGGGAAAT GCTTCTGTGA AGCTGCAGCA AGGGCTCTGA GGGAAGGACC GCGCTATTGC GACACATGCT

GGCCCTCCCG TGTGCTGGCC GAAGAAAGGG AGGGGAGCCC GGAACCTGGG CGGGGCCCGT CCAGCCGAGG CATCTTCGAC CGTCTGCACT CTGCCTGTCC CACCACCAAC GGGTGACCAG GGCCGACAGC CCCTCCCCCA TTCAGCTGCT CTAAGAAGCT

TGTCTCCACA ATATCGCTGG GAGGCAGGAA GGGGCCCCTG CCCTCTGGAA GGCATCCCGG CCAGCCTTCT ACGGTCATCA GTACCCGGCT ATCGTCTCCT AAGGGGCTCT GTCTGGGTTG GTCTTCAGCG CCCACCTCTC GAAGGAGGGC CGTTTCTTAG

GAGGCATCAT CCTCTATGGT GACCTGGCAG GCATCCGGAC ACCCCGGCAA GAATTAAAGG CCGCCATTCG CCAACCAGGA ACTACTACTT CCTCAAGGGG TCCAGGTGGT AAAAAGACCC GCTTCCTCAT TGGCTTCCAT ATGCTCTGAG ACCTCTTCCT

GGAGGGTCCC GACCGAGGAC ACGGGGGCGG AGGCATCCAA GGTGGGCTAC CACCAAGGGC GCGGAACCCC AGAACCGTAC CACCTTCCAG CCAGGTCCGA GTCAGGGGGC CAAAAAGGGT CTTCCCATCT GCTCCGCCTG AGCCCAGACT GGAATAAA

CTGATGTTGC GGGCCCCCAG CCTGGGACCC GAGTTCGGAG GGCCCCATGG TCGGGAGACT CCCCTGCGCC TATGAGCCCC CACGCCAGCT AAGGTGGTCA GTCCTCAAGC CTGGGCATGG GCCTGACCTG ACCCCCAACA AGTGAATGAG CCCAGCACTG

TCCTGCTCCT CCATCCCTGG CAGGGATAAA AGAAGGGAGA GCCCTAAAGG ACAAGGCCAC GGGACCAGAC GCAGTGGCAA CTCGAGGGAA GCTTCTGTGA TGGAGCAGGG AGGGTGCCAA TGGGCTGCTT CCACCCCTTG TAAATAAACT GCACACCAGA

GGGCCTAATC 60 CATCCCGGGT 120 AGGAGAGAAA 180 CCCAGGGATT 240 TGGCCCAGGG 300 CCAGAAAATC 360 CATCCGCTTC 420 GTTCACCTGC 480 CCTGTGCGTG 540 CTATGCCTAC 600 GGAGAACGTC 660 CAGCATCTTT 720 CACATCCACC 780 CCCAGCCAAT 840 CTTCAAGGCC 900 AGTGCCATGC 1020

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900

C1qB (manual entry from ref. 6) ATGAAGATCC GATATCTCCC ATCCCTGGGA GGGCTTCCAG CCTGGGAATC GCCCCTGGAG GCCTTCTCTG GACCACGTGA AAGGTGCCCG AACCTCATGC AACACCTTCC TTCCTGCAGG TCCGGGTTCC CCGGCTCCCC GGACACAGTA AAGGAACAGT TCAGAAATGT

CATGGGGCAG AGGCCCAGCT CACCTGGCCC GGCTGGCTGG CAGGAAAAGT CCCCAGGCCC CCACAAGAAC TCACCAACAT GTCTCTACTA GTGGCCGGGA AGGTCACCAC CCACCGACAA TGCTCTTTCC CTGCCAGCAA GGGCTTGGTG GGTCTAATTC TGGTTACATG

CATCCCAGTA CAGCTGCACC CGATGGCCAA AGACCATGGT CGGCCCCAAG CAAAGGTGAA CATCAACGTC GAACAACAAT CTTCACCTAC GCGTGCACAG CGGTGGCATG GAACTCACTA AGATATGGAG CGCTCACTCT AATGCTGCTG AACTCTGTGT AATGAA

C1qC (manual entry from ref. 7) TCTCTCCCTC GGGCTGAAGC TGCTACGGGA GGACTGCCGG AAAGGGCAGA CCCCCTGGGA GGCAGATACA CCTGCACCCA GACACGAGCA GCGTCGCATA TTCTGTGGCC CAGGTGGGCG GGCTCTGACA TCGAGACCCA CTGCATCCTT CACCCCCTCC GGTTCCTGGG AGTATTGGAA

CCAGTTCCTT TGCTGCTGCT TCCCAGGGAT GGCCCAAGGG AGGGAGAACC TGCCAGGGGT AGCAGAAATT ACAGCCTGAT CTGGCAAGTT CAGCCAACCT ACACGTCCAA AGGAGGTGTG GCGTCTTCTC CGGGCCTTCC GCCTAGACCA CCATGGGTTC ACACTTAACC GGGGTGGGGA

CTCCGGGATG TCTGCTGCTG GCCCGGCCTG GGAGCCAGGA CGGCTTACCC GCCCGGCCCC CCAGTCAGTG CAGATTCAAC CACCTGCAAA GTGCGTGCTG AACCAATCAG GCTGGCTGTC CGGCTTCCTG ACCTCCTCAG TTCTCCCCTC TCTCCTTCCT AATGCCTTCT GATATATAAA

GACGTGGGGC CTCGCCCTCA CCTGGGGCAC ATCCCAGCCA GGCCATCCTG ATGGGCATCC TTCACGGTCA GCGGTACTTA GTCCCCGGCC CTGTACCGCA GTCAACTCGG AATGACTACT CTCTTCCCCG CTTCTGCTAG CAGGGAGCCC CTGAACTTCT GGTACTGCCA TAAA

CCAGCTCCCT GGGGCCAAGC CAGGGAAGGA TTCCCGGGAT GGAAAAATGG CTGGAGAGCC CTCGGCAGAC CCAACCCGCA TCTACTACTT GCGGCGTCAA GCGGTGTGCT ACGACATGGT ACTAGGGCGG GACCCACCTT ACCCTGACCC TTAGGAGTCA TTCTTTTTTT

GCCCCACCTT 60 CAACACAGGC 120 TGGGTACGAC 180 CCGAGGACCC 240 CCCCATGGGA 300 AGGTGAGGAG 360 CCACCAGCCC 420 GGGAGATTAT 480 TGTCTACCAC 540 AGTGGTCACC 600 GCTGAGGTTG 660 GGGCATCCAG 720 GCAGATGCGC 780 ACTGGCCAGT 840 ACCCCCACTG 900 CTGCTTGTGT 960 TTTTTTTTCA 1020

The first five nucleotides of the second exon in each cDNA are underlined to indicate the intron-exon boundaries. The putative polyadenylation signals (AATAAA in the A and C chains; AATGAA in the B chain), the initiation codons (ATG) and the termination codons (TGA in the A and B chains; TAG in the C chain) are indicated.
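Locating the features named above (the initiation ATG, an in-frame termination codon and a putative polyadenylation signal) is a simple string scan. The sketch below assumes a clean, contiguous cDNA string; the sequences above would first need to be reassembled in reading order. It takes the first ATG as the start codon, which is a simplification, and the function names are illustrative, not from any particular library.

```python
import re
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(cdna: str) -> Optional[Tuple[int, int]]:
    """0-based (start, end) of the reading frame that begins at the first ATG
    and ends after the first in-frame stop codon; None if either is missing.

    Taking the first ATG is a simplification: real start-codon choice
    depends on sequence context.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


def polya_signals(cdna: str,
                  motifs: Tuple[str, ...] = ("AATAAA", "AATGAA")) -> List[Tuple[str, int]]:
    """0-based positions of putative polyadenylation signals, overlaps included."""
    seq = cdna.upper()
    hits: List[Tuple[str, int]] = []
    for motif in motifs:
        # A zero-width lookahead lets overlapping occurrences be reported.
        hits.extend((motif, m.start()) for m in re.finditer(f"(?={motif})", seq))
    return sorted(hits, key=lambda hit: hit[1])


toy = "CC" "ATG" "AAA" "TTT" "TGA" "CC" "AATAAA" "GG"  # hypothetical test sequence
print(first_orf(toy))      # (2, 14)
print(polya_signals(toy))  # [('AATAAA', 16)]
```

The motif tuple defaults to the two signals named above, so the B chain's non-canonical AATGAA is found alongside the canonical AATAAA.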

Genomic structure^'^^^ The C1q gene cluster is located on a 24 kb stretch of DNA, and the genes are aligned in the order A-C-B with the same 5'-3' orientation. Each gene consists of two exons separated by one intron. Intergenic distances are indicated in the graphical representation below. The mouse C1q gene cluster shows the same organization, order and orientation, with slight differences in intron sizes and intergenic distances.

[Figure: map of the C1q gene cluster showing the A, C and B genes, each with exons 1 and 2; scale bar 1 kb.]

Accession numbers Human^ Mouse^'^ ^^ BALB/c 129 BALB/c BALB/c Rat5

B chain A A B C B

chain chain chain chain chain

cDNA X03084 X58861 M22531 X66295 X71127

Genomic K03430 X92958 X92959 X92960

Deficiency Human Autosomal recessive. Lack of C1q leads to loss of activation of the classical pathway. Patients suffer from severe recurrent viral and bacterial infections and from immune complex deposition leading to glomerulonephritis and systemic lupus erythematosus (SLE)-like symptoms. Mutations identified (reviewed in refs. 12 and 13):
A gene  C670 to T; Q208 to stop                six families; Turkey, Slovak Republic
B gene  G119 to A; G40 to D                    one family; Morocco
        C523 to T; R175 to stop                one patient; Mexico
C gene  G127 to A; G43 to R                    three families; Germany, India, Saudi Arabia
        C232 to T; R69 to stop                 one family; Yugoslavia
        del C239; frameshift, S136 to stop     one patient; England

Mouse A C1q knockout mouse has been generated by targeted gene disruption [14].

Polymorphic variants None described.

References
1 Heinz, H.P. (1989) Behring Inst. Mitt. 84, 20-31.
2 Reid, K.B.M. (1989) Behring Inst. Mitt. 84, 8-19.
3 Antes, U. et al. (1984) J. Immunol. Meth. 74, 299-306.
4 Petry, F. et al. (1991) J. Immunol. 147, 3988-3993.
5 Schwaeble, W. et al. (1995) J. Immunol. 155, 4971-4978.
6 Reid, K.B.M. (1985) Biochem. J. 231, 729-735.
7 Sellar, G.C. et al. (1991) Biochem. J. 274, 481-490.
8 Sellar, G.C. et al. (1992) Immunogenetics 35, 214-216.
9 Petry, F. et al. (1996) Immunogenetics 43, 370-376.
10 Petry, F. et al. (1989) FEBS Lett. 258, 89-93.
11 Petry, F. et al. (1992) Eur. J. Biochem. 209, 129-134.
12 Walport, M.J. et al. (1998) Immunobiology 199, 265-285.
13 Petry, F. (1998) Immunobiology 199, 286-294.
14 Botto, M. et al. (1998) Nature Genet. 19, 56-59.

Mannose-binding lectin

MBL

Peter Lawson and K.B.M. Reid, Department of Biochemistry, University of Oxford, Oxford, UK

Other names Mannose-binding protein, mannan-binding lectin, mannan-binding protein, core-specific lectin. Ra-reactive factor (RaRF) corresponds to MBL together with its associated serine proteases (MASP-1 and MASP-2). In mice, the RaRF subcomponents P28a and P28b correspond to MBL-C and MBL-A, respectively.

Physicochemical properties Mannose-binding lectin is a 248 amino acid molecule, including a 20 amino acid leader sequence.
Mature protein:
pI (predicted)               5.3
Mr (K)                       predicted 24; observed 32 (reduced), ~600 (non-denatured)
N-linked glycosylation sites none
Interchain disulfide bonds   25, 32, 38 (arrangement unknown)

Structure MBL forms trimers of a single 32 kDa polypeptide chain. The polypeptide chain is composed of an N-terminal cysteine-rich region, a collagen-like region, an α-helical neck region and a carbohydrate-recognition domain. Formation of the triple-helical collagen-like region may be initiated by the neck region. The trimers associate to form higher-order oligomers of 2-6 trimers; as judged by electron microscopy, the hexameric form resembles the "bunch of tulips" structure described for C1q. An interruption in the collagen-like sequence results in a kink in the collagen stalks^. The crystal structures of the carbohydrate-recognition domain of MBL, and of a trimeric form, have been determined^-^.

Function MBL binds with highest affinity to the simple carbohydrates N-acetylglucosamine and mannose. The functions of MBL are mediated by binding to carbohydrate on the surface of pathogens, including viruses, yeast, fungi and bacteria^. The effector functions are mediated through C1q receptors (the most recently described by Nepomuceno and colleagues^) or through activation of complement^ via the associated serine proteases MASP-1* and MASP-2^. MBL is an important component of innate immune defence and aids the development of the adaptive immune system^'^. MBL has also been shown to act as a direct opsonin, leading to attachment, uptake and killing of Salmonella montevideo by phagocytes^^. MBL can inhibit influenza A virus haemagglutination^^^; the binding of MBL to different viral strains is carbohydrate-dependent and leads to complement activation^-^. MBL also inhibits HIV infection of CD4+ lymphocytes^^, acts as an acute-phase protein^^, and low levels of MBL have been associated with a defect in opsonization and an increased risk of infection during childhood^'^^.

Tissue distribution Serum protein: average serum concentration 1 µg/ml in Caucasian populations. MBL concentrations vary widely between individuals (0-5 µg/ml) owing to MBL structural and promoter-region alleles in the population (see below). Primary site of synthesis: liver^^. Secondary site: kidney^^.

Regulation of expression MBL is an acute-phase protein: a modest 1.5- to 3-fold increase in serum MBL concentration is seen after malaria infection and trauma (surgery)^*. MBL expression in vitro is enhanced by IL-6, dexamethasone and heat shock but inhibited by IL-1β^. The promoter region contains consensus sequences for three glucocorticoid response elements and a heat shock element, consistent with MBL being an acute-phase protein^^.

Protein sequence^^^^^
MSLFPSLPLL LLSMVAASYS ETVTCEDAQK TCPAVIACSS PGINGFPGKD 50
GRDGTKGEKG EPGQGLRGLQ GPPGKLGPPG NPGPSGSPGP KGQKGDPGKS 100
PDGDSSLAAS ERKALQTEMA RIKKWLTFSL GKQVGNKFFL TNGEIMTFEK 150
VKALCVKFQA SVATPRNAAE NGAIQNLIKE EAFLGITDEK TEGQFVDLTG 200
NRLTYTNWNE GEPNNAGSDE DCVLLLKNGQ WNDVPCSTSH LAVCEFPI 248

The leader sequence comprises residues 1-20; there are no potential N-linked glycosylation sites. The collagen-like region is interrupted: seven G-X-Y repeats are followed by a G-Q-G interruption and then by 12 collagen triplets. Some of the prolines in the Y position of the collagen-like region are hydroxylated; P82 is hydroxylated in MBL purified from the liver but not in MBL purified from serum^^.
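The collagen-like interruption can be located by listing glycine positions and flagging consecutive glycines that are not three residues apart. A minimal sketch, assuming residues 42-100 as read from the precursor sequence above; the function name is illustrative. In an uninterrupted G-X-Y stretch every spacing is 3, so the G-Q-G at residues 63-65 shows up as a single spacing of 2. The seven G-X-Y repeats occupy residues 42-62, and the 12 triplets after the interruption start at G65, so that glycine is shared between the G-Q-G and the first following triplet.

```python
from typing import List, Tuple


def glycine_spacing_breaks(seq: str, first_residue: int = 1) -> List[Tuple[int, int]]:
    """Return (gly_a, gly_b) residue numbers for consecutive glycines whose
    spacing differs from the 3 expected in an uninterrupted G-X-Y repeat.

    Heuristic: a glycine occupying an X or Y position would also be counted.
    """
    gly = [first_residue + i for i, aa in enumerate(seq) if aa == "G"]
    return [(a, b) for a, b in zip(gly, gly[1:]) if b - a != 3]


# MBL residues 42-100 (collagen-like domain); the first block is residues 42-50,
# the rest follow the 10-residue blocks of the sequence above.
MBL_COLLAGEN_42_100 = (
    "GINGFPGKD" "GRDGTKGEKG" "EPGQGLRGLQ" "GPPGKLGPPG" "NPGPSGSPGP" "KGQKGDPGKS"
)

print(glycine_spacing_breaks(MBL_COLLAGEN_42_100, first_residue=42))  # [(63, 65)]
```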

Protein modules
1-20     Leader peptide                            exon 1
21-41    N-terminal cysteine-rich region           exon 1
42-100   Collagen-like domain                      exon 1/2
101-124  Alpha-helical coiled-coil "neck" region   exon 3
125-248  Carbohydrate-recognition domain (CRD)     exon 4

Residues important for calcium-dependent recognition of carbohydrate are E212, N214, E220, N232 and D233 [2].
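The module boundaries and key residues above use 1-based, inclusive numbering on the precursor (leader included), so they map onto a Python string with an offset of one. A minimal sketch, assuming the 248-residue precursor as read from the protein sequence above; the helper names `segment`, `residue` and `mismatches` are illustrative. It can be used to cross-check the numbering stated in this entry: the interchain-disulfide cysteines (25, 32, 38), the hydroxylated P82 and the calcium-binding residues.

```python
from typing import Dict, List, Tuple

# MBL precursor (1-248, leader included), in the 10-residue blocks of the sequence above.
MBL_PRECURSOR = (
    "MSLFPSLPLL" "LLSMVAASYS" "ETVTCEDAQK" "TCPAVIACSS" "PGINGFPGKD"
    "GRDGTKGEKG" "EPGQGLRGLQ" "GPPGKLGPPG" "NPGPSGSPGP" "KGQKGDPGKS"
    "PDGDSSLAAS" "ERKALQTEMA" "RIKKWLTFSL" "GKQVGNKFFL" "TNGEIMTFEK"
    "VKALCVKFQA" "SVATPRNAAE" "NGAIQNLIKE" "EAFLGITDEK" "TEGQFVDLTG"
    "NRLTYTNWNE" "GEPNNAGSDE" "DCVLLLKNGQ" "WNDVPCSTSH" "LAVCEFPI"
)

# Module boundaries from the table above (1-based, inclusive).
MODULES: Dict[str, Tuple[int, int]] = {
    "leader peptide": (1, 20),
    "N-terminal cysteine-rich region": (21, 41),
    "collagen-like domain": (42, 100),
    "neck region": (101, 124),
    "carbohydrate-recognition domain": (125, 248),
}


def segment(seq: str, start: int, end: int) -> str:
    """Residues start..end, 1-based and inclusive."""
    return seq[start - 1:end]


def residue(seq: str, position: int) -> str:
    """Single residue at a 1-based position."""
    return seq[position - 1]


def mismatches(seq: str, labels: List[str]) -> List[str]:
    """Labels such as 'E212' whose residue letter disagrees with seq."""
    return [lab for lab in labels if residue(seq, int(lab[1:])) != lab[0]]


for name, (start, end) in MODULES.items():
    print(f"{name:32s} {start:>3}-{end:<3} {segment(MBL_PRECURSOR, start, end)[:10]}...")
print(mismatches(MBL_PRECURSOR,
                 ["C25", "C32", "C38", "P82", "E212", "N214", "E220", "N232", "D233"]))  # []
```

An empty mismatch list means the residue numbers quoted in this entry are consistent with the sequence as read.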


Chromosomal location Human^^ Human MBL, the orthologue of rhesus monkey MBL-C, is on chromosome 10 at 10q11.2-q21, 74-75 cM.
Centromere ... JNK-46 ... MBL ... ANK-3 ... Telomere
A mannose-binding lectin mRNA that encodes a truncated protein has been identified in a liver cDNA library; this expressed pseudogene, the orthologue of rhesus monkey MBL-A, is localized on chromosome 10 at 10q22.2-q22.3 [23].

Mouse^^ Mbl1 (mannose-binding lectin A, serum): chromosome 14, 15 cM. Mbl2 (mannose-binding lectin C, liver): chromosome 19, 25 cM.

cDNA sequence^^ 20 GCTCGGTAAA GTGAGGACCA TCTTACTCAG TGTAGCTCTC GAAAAGGGGG CCTCCAGGAA GGAAAAAGTC GAAATGGCAC TTCTTCCTGA TTCCAGGCCT ATCAAGGAGG CTGACAGGAA TCTGATGAAG ACCTCCCATC TTGTCTTTTT TTCCTCATAT CCAACAAAGC ATATAATATT TAGTTTAATT CGGATTTATT GGGTAGAGGG TCAGGTATTA GAGATATTAA

TATGTGTTCA TGTCCCTGTT AAACTGTGAC CAGGCATCAA AACCAGGCCA ATCCAGGGCC CGGATGGTGA GTATCAAAAA CCAATGGTGA CTGTGGCCAC AAGCCTTCCT ATAGACTGAC ATTGTGTATT TGGCCGTCTG ACTGCAACCC CCAGCATTGT AATAATAGTA TTTAATATAT AATCTGTAAT TTTCCATTTA CTCCCCTAAT AGAAAATCTA ACCATGTA

TTAACTGAGA TCCATCACTC CTGTGAGGAT CGGCTTCCCA AGGGCTCAGA TTCTGGGTCA TAGTAGCCTG GTGGCTGACC AATAATGACC CCCCAGGAAT GGGCATCACT CTACACAAAC GCTACTGAAA TGAGTTCCCT ACAGGCCCAC TCCTTTTGTG GTAGTAGTAG ACTATGAGGC GCTTTCGATA CAACAAACAC GACATCACCA TTTTTGTAAC

TTAACCTTCC CCTCTCCTTC GCCCAAAAGA GGCAAAGATG GGCTTACAGG CCAGGACCAA GCTGCCTCAG TTCTCTCTGG TTTGAAAAAG GCTGCAGAGA GATGAGAAGA TGGAACGAGG AATGGCCAGT ATCTGAAGGG AGTATGCTTG GGCAATCACT TTAGCAGCAG CCTATCTTTT GTGTTAACTT CTGTGCTCTG CAGTTTAATA TTTCTCTATG

CTGAGTTTTC TCCTGAGTAT CCTGCCCTGC GGCGTGATGG GCCCCCCTGG AGGGCCAAAA AAAGAAAAGC GCAAACAAGT TGAAGGCCTT ATGGAGCCAT CAGAAGGGCA GTGAACCCAA GGAATGACGT TCATATCACT AAAAGATAAA AAAAATGATC CAGTAGTAGT GCATCCTACA GCTGCAGTAT TTGAGCCTTC CCACAGCTTT AACTCTGTTT

TCACACCAAG GGTGGCAGCG AGTGATTGCC CACCAAGGGA AAAGTTGGGG AGGAGACCCT TCTGCAAACA TGGGAACAAG GTGTGTCAAG TCAGAATCTC GTTTGTGGAT CAATGCTGGT CCCCTGCTCC CAGGCCCTCC TTATATCAAT ACTAACAGCA CATGCTAATT ATTAATTATC GAAAATAAGA CTTTCTGTTT TTACCAAGTT TCTTTCTAAT

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320

The methionine initiation codon (ATG) and the termination codon (TGA) are indicated; no potential polyadenylation signals (AATAAA) are seen. The first five nucleotides of each exon are underlined to indicate intron-exon boundaries.
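The features named in this note can be located with a simple scan: an ATG start, an in-frame stop codon and, when present, the AATAAA polyadenylation signal. This is a minimal Python sketch under stated assumptions. The function names are hypothetical. The coding region is taken to start at the first ATG that has an in-frame stop downstream, although real start-site choice also depends on sequence context. Coordinates are 1-based to match the numbering used in this book, and the spaces used to print sequences in blocks of ten are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def clean(seq: str) -> str:
    """Remove the spaces used to print the sequence in blocks of ten."""
    return "".join(seq.split()).upper()


def first_orf(cdna: str):
    """Return 1-based (start, end) of the first ATG with an in-frame stop.

    'end' is the last base of the stop codon. Picking the first such ATG is a
    simplification: it ignores start-site context and upstream ORFs.
    """
    seq = clean(cdna)
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start + 1, i + 3
        start = seq.find("ATG", start + 1)  # no in-frame stop: try the next ATG
    return None


def polya_signals(cdna: str):
    """Return the 1-based start of every AATAAA hexamer."""
    seq = clean(cdna)
    return [i + 1 for i in range(len(seq) - 5) if seq[i:i + 6] == "AATAAA"]


print(first_orf("GGATGAAACC CTGAGG"))  # (3, 14)
print(polya_signals("TTAATAAAGG"))     # [3]
```

For this MBL cDNA, `polya_signals` would be expected to return an empty list, consistent with the note above.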

Genomic structure^^^^ The gene spans 3.5 kb and comprises four exons.


Accession numbers Human Mouse BALB/c CBA/J BALB/c CBA/J Rat

Bovine Rhesus monkey Rabbit Chicken

MBL-C Pseudogene MBL-A (Mbl1) MBL-C (Mbl2) MBL-C MBL-A Pseudogene MBL-C MBL-A MBL-C MBL-C MBL-C

cDNA X15422^5

Genomic X15954-X1595720.22 AFO1938223

D l 144125 S4229226 D1144025 S4229426 M1410327 X0502328 See ref. 27

U09006-U090112^ U09012-U0901724

M14104-M1410529 M1410629

D73408^o L439123^ L4391P^ D84293^2 AF022226

Deficiency
C223 to T; R52 to C
G230 to A; G54 to D
G239 to A; G57 to E
Homozygotes for these variants have undetectable or trace amounts of MBL in their serum (see below for heterozygotes).
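Each deficiency variant is written as a nucleotide change followed by the amino acid change it causes. The Python sketch below uses the standard genetic code to list which codons could give each reported substitution through a single base change. It also maps cDNA coordinates to codons under one assumption that is not stated in the source: the A of the initiation ATG is taken to be nucleotide 70, because that value places all three reported positions (223, 230, 239) in codons 52, 54 and 57. Function and constant names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code: codons in TTT, TTC, TTA, TTG, TCT, ... GGG order; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Assumption (not stated in the source): the A of the initiation ATG is cDNA base 70.
ATG_POSITION = 70


def locate(nt: int, atg: int = ATG_POSITION):
    """Map a cDNA base number to (codon number, position in codon), both 1-based."""
    offset = nt - atg
    return offset // 3 + 1, offset % 3 + 1


def explaining_codons(ref_aa: str, alt_aa: str, old_base: str, new_base: str):
    """List (codon, position) pairs where one old_base -> new_base change turns ref_aa into alt_aa."""
    hits = []
    for codon, aa in sorted(CODE.items()):
        if aa != ref_aa:
            continue
        for p, base in enumerate(codon):
            if base == old_base:
                mutant = codon[:p] + new_base + codon[p + 1:]
                if CODE[mutant] == alt_aa:
                    hits.append((codon, p + 1))
    return hits


for nt, ref, alt, old, new in [(223, "R", "C", "C", "T"),
                               (230, "G", "D", "G", "A"),
                               (239, "G", "E", "G", "A")]:
    print(nt, locate(nt), explaining_codons(ref, alt, old, new))
```

For the three variants this gives CGC or CGT for R52, with the change at the first codon position. It gives GGC or GGT for G54 and GGA or GGG for G57, with the change at the second position in both cases. These positions agree with what `locate` returns for bases 223, 230 and 239.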

Polymorphic variants Three variants in exon 1 have been linked to increased susceptibility to recurrent infections in infants^^'^^ and adults^^'^*^. Variant MBL alleles predispose individuals to HIV infection and to progression to AIDS^^'^^. Individuals with mutant MBL alleles have low-molecular-weight forms of MBL (100-200 kDa) in place of the 600 kDa wild-type form. Variants identified^'^^'^^^:
C223 to T; R52C. Gene frequency of 0.05 in African and European populations. Heterozygotes have a 50% reduction in the level of serum MBL. Linked to a promoter haplotype (HY) that is associated with high MBL concentrations.
G230 to A; G54D. Gene frequencies of 0.03 in African populations and 0.13 in European and Eskimo populations. Serum MBL level 12.5% of normal. Linked to a promoter haplotype (LY) that is associated with median MBL concentrations. This variant interrupts the collagen-like sequence and prevents complement activation.
G239 to A; G57E. Gene frequencies of 0.23 in African and 0.02 in European populations. Serum MBL level 12.5% of normal. Linked to a promoter haplotype (LY) that is associated with median MBL concentrations. This variant interrupts the collagen-like sequence; its effect on complement activation has not been determined.
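Gene frequencies like those above translate into expected genotype frequencies only under an assumption the source does not make: Hardy-Weinberg equilibrium. The minimal Python sketch below shows that arithmetic for one variant at a time, treating all other alleles together as "normal". The function name is hypothetical. Under these assumptions, a gene frequency q gives q squared variant homozygotes and 2q(1 - q) heterozygotes.

```python
def hardy_weinberg(q: float) -> dict:
    """Expected genotype frequencies at a two-allele locus with variant gene frequency q.

    Assumes Hardy-Weinberg equilibrium (random mating, no selection or migration);
    the source makes no such assumption.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("gene frequency must lie between 0 and 1")
    p = 1.0 - q
    return {
        "normal homozygote": p * p,
        "heterozygote": 2 * p * q,
        "variant homozygote": q * q,
    }


g54d_european = hardy_weinberg(0.13)
for genotype, freq in g54d_european.items():
    print(f"{genotype}: {freq:.4f}")
```

For G54D in European populations (q = 0.13), this gives about 22.6% heterozygotes and 1.7% homozygotes. These are illustrative expectations, not observed figures from the cited studies.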

References
1 Thiel, S. and Reid, K.B. (1989) FEBS Lett. 250, 78-84.
2 Weis, W.I. et al. (1992) Nature 360, 127-134.
3 Weis, W.I. and Drickamer, K. (1994) Structure 2, 1227-1240.
4 Sheriff, S. et al. (1994) Nature Struct. Biol. 1, 789-794.
5 Turner, M.W. (1996) Immunol. Today 17, 532-540.
6 Nepomuceno, R.R. et al. (1997) Immunity 6, 119-129.
7 Ikeda, K. et al. (1987) J. Biol. Chem. 262, 7451-7454.
8 Matsushita, M. and Fujita, T. (1992) J. Exp. Med. 176, 1497-1502.
9 Thiel, S. et al. (1997) Nature 386, 506-510.
10 Carroll, M.C. and Prodeus, A.P. (1998) Curr. Opin. Immunol. 10, 36-40.
11 Kuhlman, M. et al. (1989) J. Exp. Med. 169, 1733-1745.
12 Anders, E.M. et al. (1990) Proc. Natl Acad. Sci. USA 87, 4485-4489.
13 Anders, E.M. et al. (1994) J. Gen. Virol. 75, 615-622.
14 Ezekowitz, R.A. et al. (1989) J. Exp. Med. 169, 185-196.
15 Ezekowitz, R.A. et al. (1988) J. Exp. Med. 167, 1034-1046.
16 Super, M. et al. (1989) Lancet 2, 1236-1239.
17 Morio, H. et al. (1997) Eur. J. Biochem. 243, 770-774.
18 Thiel, S. et al. (1992) Clin. Exp. Immunol. 90, 31-35.
19 Arai, T. et al. (1993) Q. J. Med. 86, 575-582.
20 Taylor, M.E. et al. (1989) Biochem. J. 262, 763-771.
21 Kurata, H. et al. (1994) J. Biochem. Tokyo 115, 1148-1154.
22 Sastry, K. et al. (1989) J. Exp. Med. 170, 1175-1189.
23 Guo, N. et al. (1998) Mamm. Genome 9, 246-249.
24 Sastry, R. et al. (1995) Mamm. Genome 6, 103-110.
25 Kuge, S. et al. (1992) Biochemistry 31, 6943-6950.
26 Sastry, K. et al. (1991) J. Immunol. 147, 692-697.
27 Drickamer, K. et al. (1986) J. Biol. Chem. 261, 6878-6887.
28 Oka, S. et al. (1987) J. Biochem. Tokyo 101, 135-144.
29 Drickamer, K. and McCreary, V. (1987) J. Biol. Chem. 262, 2582-2589.
30 Kawai, T. et al. (1997) Gene 186, 161-165.
31 Mogues, T. et al. (1996) Glycobiology 6, 543-550.
32 Kawai, T. et al. (1998) Glycobiology 8, 237-244.
33 Sumiya, M. et al. (1991) Lancet 337, 1569-1570.
34 Summerfield, J.A. et al. (1997) BMJ 314, 1229-1232.
35 Summerfield, J.A. et al. (1995) Lancet 345, 886-889.
36 Garred, P. et al. (1995) Lancet 346, 941-943.
37 Pastinen, T. et al. (1998) AIDS Res. Hum. Retroviruses 14, 695-698.
38 Garred, P. et al. (1997) Lancet 349, 236-240.
39 Madsen, H.O. et al. (1995) J. Immunol. 155, 3013-3020.
40 Madsen, H.O. et al. (1994) Immunogenetics 40, 37-44.

Bovine conglutinin Peter Lawson and K.B.M. Reid, Department of Biochemistry, University of Oxford, Oxford, UK Note: Conglutinin is believed to be restricted to the Bovidae. Despite extensive searching at the protein, cDNA and genomic levels, neither the gene nor its product appears to be present in humans or mice.

Physicochemical properties Bovine conglutinin is a 371 amino acid polypeptide chain, including a 20 amino acid leader sequence.
pI            predicted 5.9; observed four isoforms, 5.3-6.1
Mr (K)        predicted 35.7 (monomer); observed 43 (reduced), ~1000 (non-denatured)
N-linked glycosylation sites (predicted)    1 (337)
Interchain disulfide bonds    20, 25, 58 (arrangement unknown)
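A predicted Mr such as the 35.7 K quoted above is obtained by summing residue masses over the polypeptide. The Python sketch below shows that kind of calculation using approximate average residue masses. It is an illustration, not necessarily how the quoted figure was derived, and the function name is hypothetical. The calculation ignores hydroxylation, glycosylation and disulfide bonds, so it should not be expected to match observed values.

```python
# Approximate average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass_kda(seq: str) -> float:
    """Average mass of an unmodified polypeptide in kDa (the book's Mr (K))."""
    residues = "".join(seq.split()).upper()
    unknown = set(residues) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[aa] for aa in residues) + WATER) / 1000.0


print(round(average_mass_kda("GA"), 4))  # 0.1461 (the dipeptide Gly-Ala)
```

The same function can be applied to the leader-free sequence (residues 21-371) or to the full precursor. Which of these the quoted 35.7 refers to is not stated here.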

Structure Conglutinin, like all members of the collectin family, is built from three polypeptide chains that associate to form a trimeric subunit. Each polypeptide chain consists of a collagen-like region with a C-terminal Ca2+-dependent carbohydrate-recognition domain (C-type lectin). In conglutinin, four trimeric subunits (12 chains) are linked by disulfide bonds at the N-terminus to form a cruciform structure. Electron microscopy confirms this quaternary arrangement, with each collagen rod in the cruciform structure having a length of 38 nm^. The carbohydrate-recognition domain is believed to adopt a structure analogous to the X-ray crystal structure of the C-type lectin domain of mannose-binding lectin.

Function The functions of conglutinin are mediated through its affinity for carbohydrate; amongst simple sugars it shows the greatest specificity for N-acetylglucosamine. Conglutinin interacts with the complement system by binding to a carbohydrate chain of C3 that is accessible only when C3 is present as iC3b, a breakdown product of activated C3^. This results in agglutination of complement-fixed surfaces, known as conglutination. Conglutinin also interacts with various microorganisms. It inhibits haemagglutination by influenza A virus and acts as an opsonin for this virus'^'^. Conglutinin displays antibacterial activity against Salmonella typhimurium and E. coli^. It also binds to gp160, the coat protein of HIV, inhibiting its attachment to CD4^. Conglutinin enhances the complement-dependent respiratory burst of phagocytes induced by E. coli^. Low levels of conglutinin are associated with increased infections in cattle^. Conversely, conglutinin can enhance herpes simplex virus type 2 infection in mice^^.


Tissue distribution Serum protein: approx. 40-70 μg/ml^^. Primary site of synthesis: liver^^. Secondary sites: not known. However, conglutinin has been localized immunohistochemically to the follicular dendritic cells of the spleen, tonsils and lymph nodes, and to endothelial cells of blood vessels, especially in the glomeruli of the kidney^^. This probably does not reflect de novo synthesis.

Regulation of expression The regulation of conglutinin expression has not been well characterized. However, it appears to be developmentally regulated, since serum levels increase with age^^. Conglutinin levels are affected by infection: serum levels drop in cows with mastitis and in experimentally induced bovine leukaemia virus infection^^. The promoter region of the gene has been sequenced and contains binding sites for the transcription factors PEA3, NF-IL6, H-APF-1, SP-1, AP-1 and LF-A1, a liver-specific transcription factor^^.

Protein sequence (bovine)^^
MLLLPLSVLL LLTQPWRSLG AEMTTFSQKI LANACTLVMC SPLESGLPGH   50
DGQDGRECPH GEKGDPGSPG PAGRAGRPGW VGPIGPKGDN GFVGEPGPKG  100
DTGPRGPPGM PGPAGREGPS GKQGSMGPPG TPGPKGETGP KGGVGAPGIQ  150
GFPGPSGLKG EKGAPGETGA PGRAGVTGPS GAIGPQGPSG ARGPPGLKGD  200
RGDPGETGAK GESGLAEVNA LKQRVTILDG HLRRFQNAFS QYKKAVLFPD  250
GQAVGEKIFK TAGAVKSYSD AEQLCREAKG QLASPRSSAE NEAVTQMVRA  300
QEKNAYLSMN DISTEGRFTY PTGEILVYSN WADGEPNNSD EGQPENCVEI  350
FPDGKWNDVP CSKQLLVICE F

This protein sequence is deduced from a cDNA sequence^^. It differs at four positions (underlined single amino acids) from two other cDNA sequences^^'^^ and from one sequence derived by amino acid sequencing^* (see Polymorphic variants section). The leader sequence is underlined. The potential N-linked glycosylation site is indicated (N); however, it involves N337, one of the residues considered to be involved in carbohydrate recognition. Carbohydrate analysis reveals O-linked glycans of the GalNAc and alpha(2-3)-linked sialic acid type; no N-linked glycans were demonstrated^^. Hydroxylated lysines and prolines are marked in bold; probably all eight hydroxylysines are glycosylated with a disaccharide^*.
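Potential N-linked sites such as N337 are usually identified with the sequon rule N-X-S/T, where X is not proline. The Python sketch below reports sequon positions in the numbering of the sequence it is given. The function name is hypothetical, and the rule deliberately ignores subtler context effects. Applied to residues 321-350 of the conglutinin sequence above, it returns N337. The same check recovers the sites listed for SP-D (N90) and SP-A (N207) later in this chapter.

```python
import re

# N-X-S/T with X not proline; the lookahead lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequons(seq: str, first_residue: int = 1):
    """Return the numbers of Asn residues that start an N-X-S/T sequon."""
    residues = "".join(seq.split()).upper()
    return [m.start() + first_residue for m in SEQUON.finditer(residues)]


print(sequons("PTGEILVYSN WADGEPNNSD EGQPENCVEI", first_residue=321))  # [337]
```

A sequon marks only a potential site. As noted above, carbohydrate analysis found no N-linked glycans on conglutinin, even though N337 fits the rule.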

Protein modules
1-20       Leader peptide                   exon 3
21-45      N-terminal cysteine region       exon 3
46-216     Collagen-like sequence           exons 3-7
217-244    Neck region                      exon 8
245-371    CRD                              exon 9

Residues believed to be important for calcium-dependent recognition of carbohydrate are E335, N337, E345, N357 and D358.


Chromosomal location Bovine^^^: chromosome 28q18, domestic cow (Bos taurus).
Centromere ... IDVGA43 ... CGN1 ... IDVGA8 (D28S10) ... Telomere
River buffalo^^: chromosome 4p16.

cDNA sequence (bovine)^^';,i7 ATTTGTGTGG CTCAACTTGC AGCAAGTGAG TGCTGCTCCT AGAAAATACT CTGGTCATGA CACCAGGACC GAGACAATGG CAGGTATGCC CTCCAGGCAC GCATACAGGG CTGGAGCCCC CTTCAGGTGC GAGCAAAGGG TAGATGGACA TCCCTGATGG ATTCAGATGC CAGCCGAGAA GCATGAATGA ATTCCAACTG TGGAAATCTT TCTGCGAGTT CTGCCAACAT GGGATGAGGC TGGCTTC

AGGTTCTGAA TTTCCTGTGT AGGAAACAAG GCTCACACAG GGCCAATGCC TGGACAAGAT TGCAGGACGA CTTTGTTGGA TGGACCAGCT ACCAGGCCCC CTTCCCAGGC TGGACGTGCT CAGGGGCCCC GGAGAGTGGG TCTGCGACGC CCAGGCTGTC AGAGCAGCTC CGAGGCCGTG CATCTCCACG GGCCGATGGG TCCTGATGGC TTGAGCTCTC CCCAATAAAA CACCAGGCAG

ACCCTAGAGG CCAATAGCAC CCAGCATTGT CCCTGGAGAT TGTACCCTGG GGGAGAGAAT GCAGGAAGGC GAACCTGGAC GGAAGAGAAG AAAGGAGAGA CCCTCGGGTC GGGGTGACAG CCAGGACTGA CTTGCAGAGG TTCCAAAATG GGGGAGAAGA TGCAGAGAGG ACACAGATGG GAGGGGAGGT GAGCCCAACA AAGTGGAATG CCACCCACCC AGGTGACCCT GCCTCCTATG

ACGCAACTGT TGCAGACTCC AAGAGGACAT CCCTGGGAGC TTATGTGTAG GTCCCCATGG CTGGATGGGT CAAAGGGAGA GCCCCTCAGG CTGGGCCCAA TCAAAGGAGA GGCCTTCTGG AGGGGGACAG TCAATGCTCT CCTTCAGTCA TCTTCAAGAC CTAAGGGACA TCAGAGCCCA TCACTTACCC ACAGTGATGA ACGTACCCTG CAGGGAAGGG CTGCTGCTCA GAACTCCTCC

GAGTGTCTCT AGTACTAGCC GCTTCTCCTC AGAAATGACA CCCCCTGGAG AGAGAAGGGG TGGCCCTATT CACTGGGCCA GAAGCAGGGG AGGAGGAGTG GAAAGGTGCC AGCCATAGGT AGGTGATCCT GAAGCAGGGG GTATAAGAAA AGCAGGTGCT GCTGGCCTCC GGAAAAGAAT CACTGGGGAA GGGACAACCA CAGTAAGCAA GCAGTGCCCA GGGCTTCCCC CTCAGAATAA

GTCCTAGCCT TGTCCAGAGC CCTCTCTCCG ACCTTTTCTC AGTGGCTTGC GATCCAGGTT GGGCCGAAAG CGTGGGCCTC AGCATGGGAC GGTGCCCCAG CCCGGGGAGA CCACAGGGCC GGAGAAACAG GTGACAATCT GCGGTGCTCT GTAAAGTCAT CCACGCTCTT GCTTACCTGA ATACTGGTCT GAGAACTGTG CTCCTTGTGA GAGCTGTGAG ACTGAGCCAC AGTTTGAAAC

60 12 0 18 0 240 3 00 3 60 42 0 480 540 600 6 60 720 7 80 840 900 9 60 102 0 108 0 1140 12 00 12 60 132 0 13 80 1440

The cDNA shown is a hybrid of the cDNA deduced from a genomic sequence^^ (nucleotides 1-1337) and a partial cDNA^^ (nucleotides 1338-1447). The initiation codon (ATG), the termination codon (TGA) and the polyadenylation signal (AATAAA) are indicated. The first five nucleotides of each exon are underlined to indicate the intron/exon boundaries.

Genomic structure (bovine)^^^^^ The gene spans approximately 11 kb and comprises nine exons. The collagen region is encoded by five exons; collagen-encoding exons 4-7 probably arose from a single exon by gene duplication (compare the genomic structure of mannose-binding lectin).


Accession numbers

cDNA 014085^^ L1887P7 X71774^2

Bovine

Genomic D25294-D25302^5 U06852-U0686022

Deficiency None identified.

Polymorphic variants Potential polymorphic variants identified from the three cDNA sequences, two genomic sequences and one protein sequence are shown here. Position 1 4-5 14 17 18 380 521 557 676 704 749 786-788 811 833 836 845 854 857 860 890 973 1154 1310 1315

Genomic^^ starts at 19

A G C G(R) C G AAG (K) T(V) G A T G A C A A(E) C A T

Genomic^^ A T T G A G C G(R) C G AAG (K) T(V) G A T G A C A A(E) C A T

cDNA^2

cDNA^7

cDNA^6

starts at 12 C starts at 60 C insertionT G T C A G T A A A G G C G C A(H) A(H) G(R) C C A A G G AAG (K) AAG (K) AAG (K) C(A) T(V) T(V) G G A A A C T C T G G A A A C C G C A A G T(V) A(E) A(E) T C C T A A T C T

Protein^* starts at 219

-

R

S V

E

-

Updated and modified from ref. 22. Numbering is based on the cDNA shown above. The amino acids shown on the far right are the differences between the full-length protein sequence obtained by amino acid sequencing^* and all the cDNA sequences. Base changes shown in the table are mostly silent, except where the translated amino acids are shown in parentheses. The protein is translated from base 219 and finishes at base 1271. Dashes indicate where sequence is not available for comparison.
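The coding span quoted here can be checked with simple codon arithmetic. In the minimal Python sketch below (hypothetical function name), bases 219-1271 inclusive are 1053 nucleotides, which is exactly 351 codons. That equals 371 - 20. One reading consistent with the protein modules listed earlier is therefore that the span covers the mature chain (residues 21-371) without the leader or the stop codon. This interpretation is an inference, not something the source states.

```python
def codons_in_span(first_base: int, last_base: int) -> int:
    """Number of codons in an inclusive, 1-based coding span."""
    length = last_base - first_base + 1
    if length <= 0 or length % 3:
        raise ValueError(f"a span of {length} bases is not a whole number of codons")
    return length // 3


codons = codons_in_span(219, 1271)
print(codons, codons == 371 - 20)  # 351 True
```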

References
1 Holmskov, U. et al. (1995) Biochem. J. 305, 889-896.
2 Strang, C.J. et al. (1986) Biochem. J. 234, 381-389.
3 Laursen, S.B. et al. (1994) Immunology 81, 648-654.
4 Hartshorn, K.L. et al. (1993) J. Immunol. 151, 6265-6273.
5 Hartley, C.A. et al. (1992) J. Virol. 66, 4358-4363.
6 Friis Christiansen, P. et al. (1990) Scand. J. Immunol. 31, 453-460.
7 Andersen, O. et al. (1991) Scand. J. Immunol. 33, 81-88.
8 Friis, P. et al. (1991) Immunology 74, 680-684.
9 Holmskov, U. et al. (1998) Immunology 93, 431-436.
10 Fischer, P.B. et al. (1994) Scand. J. Immunol. 39, 439-445.
11 Akiyama, K. et al. (1992) Am. J. Vet. Res. 53, 2102-2104.
12 Lu, J. et al. (1993) Biochem. J. 292, 157-162.
13 Holmskov, U. et al. (1992) Immunology 1^, 169-173.
14 Akiyama, K. et al. (1992) J. Vet. Med. Sci. 54, 977-981.
15 Kawasaki, N. et al. (1994) Biochem. Biophys. Res. Commun. 198, 597-604.
16 Suzuki, Y. et al. (1993) Biochem. Biophys. Res. Commun. 191, 335-342.
17 Liou, L.S. et al. (1994) Gene 141, 277-281.
18 Lee, Y.M. et al. (1991) J. Biol. Chem. 266, 2715-2723.
19 Andersen, O. et al. (1992) J. Struct. Biol. 109, 201-207.
20 Gallagher, D.S., Jr. et al. (1993) Mamm. Genome 4, 716-719.
21 Iannuzzi, L. et al. (1994) Hereditas 120, 283-286.
22 Liou, L.S. et al. (1994) J. Immunol. 153, 173-180.

SP-A Robert B. Sim, MRC Immunochemistry Unit, Oxford, UK


Other names Surfactant protein A, pulmonary surfactant (glyco)protein A, PSP-A.

Physicochemical properties SP-A is a member of the collectin family, whose members are typically oligomers of short (Mr (K) 25-45) polypeptide chains, each containing collagenous sequence at the N-terminus and a C-type lectin domain at the C-terminus^. SP-A is made up of up to 18 polypeptide chains of two types, named α2 and α3, thought to be present in the ratio 1:2^. The two polypeptide types are ~98% identical but are products of separate genes. Each polypeptide is 248 amino acids long, including a 20 amino acid leader sequence. The position of cleavage of the leader sequence varies in rat SP-A^, and this may also be so in human SP-A, so that some mature polypeptides may contain C20.
Mature protein:
pI            predicted ~4.4-4.8; observed ~4.5-5.0 (deglycosylated)
Mr (K)        predicted 24.2 (without glycosylation); observed 30-33 (secreted form)
N-linked glycosylation sites (occupied)    1 (207)
Interchain disulfide bonds    α2 chain: 26, 68 (C20 potential); α3 chain: 26, 68, 85 (C20 potential). Disulfide bridging pattern not established.

Structure SP-A has a quaternary structure similar to that of C1q. Three polypeptide chains (thought to be, optimally, two α3 and one α2) fold to form a heterotrimer. Trimerization involves formation of a coiled-coil at the neck regions and a collagen triple helix^. Interchain disulfide bridges are likely to form between C26, C68, C85 and possibly C20^. The three-chain subunits then associate, covalently or non-covalently, to form a structure with up to six subunits and a total of 18 lectin domains, very similar in shape to C1q. The six-subunit form has Mr (K) approx. 540, Stokes' radius 9.S nm, s20,w ~14^. Analysis of SP-A in lung surfactant fluid and synovial fluid indicates that 40% or more of the SP-A present is not fully assembled into the six-subunit form but exists as smaller oligomers^. There is O-linkage of Gal-Glc disaccharides in the collagen-like region, but the number is unknown.
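The relation between subunit number, lectin-domain count and overall mass is simple multiplication. Making it explicit shows where approximate figures such as Mr (K) 540 come from. The Python sketch below is illustrative, and the class name is hypothetical. It assumes one lectin domain per chain and ignores everything except chain mass, so it describes an idealized assembly rather than measured behaviour.

```python
from dataclasses import dataclass


@dataclass
class CollectinAssembly:
    """Idealized collectin oligomer built from three-chain subunits (illustrative only)."""
    subunits: int          # number of trimeric subunits
    chain_mass_kda: float  # Mr (K) of one polypeptide chain

    @property
    def chains(self) -> int:
        return 3 * self.subunits

    @property
    def lectin_domains(self) -> int:
        return self.chains  # one C-type lectin domain per chain

    @property
    def mass_kda(self) -> float:
        return self.chains * self.chain_mass_kda


sp_a = CollectinAssembly(subunits=6, chain_mass_kda=30.0)
print(sp_a.lectin_domains, sp_a.mass_kda)  # 18 540.0
```

Six SP-A subunits of 30 K chains give 18 lectin domains and about 540 K, in line with the figure above. Four SP-D subunits of 43 K chains give 12 domains and 516 K, close to the 510-520 K reported in the SP-D section. A 32-subunit SP-D assembly carries 96 lectin domains.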

Function SP-A is the major protein of lung surfactant, where it is found associated with the surfactant lipids, with only ~10% of SP-A in solution in the aqueous phase. It is thought to have two major functions, namely innate immune defence and involvement in surfactant metabolism^'*. It is considered to have a major role in surfactant recycling (secretion and uptake by alveolar type II cells) and a structural role in the organization of tubular myelin. SP-A knockout mice, however, do not appear to have gross abnormalities in surfactant metabolism^, but they have been shown to be more susceptible to some bacterial infections, consistent with an important role for SP-A in innate immunity. SP-A binds to a wide range of bacteria, viruses, fungi and particulate materials such as pollen grains by interaction of the lectin domains with surface saccharides (mainly mannose, glucosamine and fucose residues)^'*'^^. The smaller oligomeric forms of SP-A bind much more poorly than the fully assembled six-subunit form^'". SP-A acts as an opsonin and, to some extent, as an agglutinin. SP-A has no known enzymic activity and is not known to interact with any complement system proteins, although it interacts (in vitro) with the same receptor(s) as C1q. SP-A has specific interactions with a number of cell types, including macrophages, neutrophils and specialized epithelia. The receptors involved are controversial, and several candidate SP-A receptors have been reported. These include: (a) the collectin receptor (cC1qR, cell-surface calreticulin)^^'^^, which has been shown to mediate uptake of SP-A-coated Staphylococcus aureus by monocytes^^; (b) C1qRp^^, which is implicated in phagocytic uptake; (c) SP-A-binding protein (Mr (K) 210), involved in uptake of mycobacteria and BCG by rat macrophages and human monocytes^^'^^; and (d) the SP-AR (Mr (K) 30)^«. It has also been shown that cell-surface oligosaccharides may bind SP-A via its lectin domains^^, or that phospholipids may bridge between SP-A and a cell-surface structure^^ such as CD14, the Mr (K) 55 LPS receptor^^.

Tissue distribution SP-A is found in lung surfactant, where it makes up to 5% of the total weight of the surfactant lipid/protein assembly. It is also present in amniotic fluid^^ (~5 mg/litre at term) and synovial fluid. SP-A is undetectable in normal blood plasma but may be found in plasma in certain lung diseases, e.g. ARDS^^. Primary site of synthesis: alveolar type II cells. Secondary sites: Clara cells, serous glands of the proximal trachea, lachrymal and salivary epithelia, pulmonary parenchyma, mesothelium and synoviocytes*.

Regulation of expression Dexamethasone, IL-1α and raised oxygen concentration all increase SP-A mRNA and protein synthesis.

Protein sequence
MWLCPLALNL ILMAASGAAC EVKDVCVGSP GIPGTPGSHG LPGRDGRDGL   50
KGDPGPPGPM GPPGEMPCPP GNDGLPGAPG IPGECGEKGE PGERGPPGLP  100
AHLDEELQAT LHDFRHQILQ TRGALSLQGS IMTVGEKVFS SNGQSITFDA  150
IQEACARAGG RIAVPRNPEE NEAIASFVKK YNTYAYVGLT EGPSPGDFRY  200
SDGTPVNYTN WYRGEPAGRG KEQCVEMYTD GQWNDRNCLY SRLTICEF

This is the sequence of the α3 allele 6A. The leader sequence is underlined and the N-linked glycosylation site is indicated (N). Six variants of the α3 sequence and four α2 variants have been reported^^. α3 sequences are characterized by the residues M, D, I and C at positions 66, 73, 81 and 85, respectively, whereas α2 sequences have T, N, V and R at these positions.
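The four signature positions make it easy to tell the two chain types apart. The minimal Python sketch below, with hypothetical names, checks residues 66, 73, 81 and 85 (precursor numbering, leader included) against the two patterns given above. The example constant holds residues 1-100 of the α3 sequence in this entry, transcribed from the printed sequence.

```python
# Signature residues from the text: position -> amino acid (1-based, leader included).
SIGNATURES = {
    "alpha3": {66: "M", 73: "D", 81: "I", 85: "C"},
    "alpha2": {66: "T", 73: "N", 81: "V", 85: "R"},
}

# Residues 1-100 of the alpha3 (allele 6A) sequence shown in this entry.
SP_A_ALPHA3_1_100 = (
    "MWLCPLALNL ILMAASGAAC EVKDVCVGSP GIPGTPGSHG LPGRDGRDGL "
    "KGDPGPPGPM GPPGEMPCPP GNDGLPGAPG IPGECGEKGE PGERGPPGLP"
)


def classify_sp_a(seq: str) -> str:
    """Return 'alpha3', 'alpha2' or 'unclassified' from the signature residues."""
    residues = "".join(seq.split()).upper()
    if len(residues) < 85:
        raise ValueError("need at least residues 1-85 to classify")
    for chain, signature in SIGNATURES.items():
        if all(residues[pos - 1] == aa for pos, aa in signature.items()):
            return chain
    return "unclassified"


print(classify_sp_a(SP_A_ALPHA3_1_100))  # alpha3
```

The α2 pattern has R rather than C at position 85, which is consistent with the α2 chain lacking the interchain disulfide at C85 listed under Physicochemical properties.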

Protein modules^^-^^
1-20       Leader sequence (may vary)                   exon 4
28-100     Collagen-like domain                         exons 4-6
101-152    Alpha-helical coiled-coil "neck" region      exons 6/7
153-248    CRD                                          exon 7

Chromosomal location Human^25: 10q21-24.
Centromere ... SFTPD ... SFTPA2 ... Pseudogene ... SFTPA1 ... Telomere^^
Gene names for SP-A are SFTPA1 and SFTPA2, or SFTP1 or PSAP; gene names for SP-D are SFTPD, SFTP4 or PSPD. This region of the chromosome also contains the gene for MBL. Mouse: chromosome 14, 14 cM.

cDNA sequence^^ CGGAGACCCA AGAAAGAGCA CTTGATGGCA TATCCCCGGC AGGAGACCCT AAATGATGGG TGGCGAGAGG CCACGACTTT AATGACGGTA TCAGGAGGCA TGAGGCCATT GGGTCCCAGC GTACCGAGGG GCAGTGGAAT ATTTAGGCCA CTTGGTCTGT

AGCAGCTGGA GCGACTGGAC GCCTCTGGTG ACTCCTGGAT GGCCCTCCAG CTGCCTGGAG GGCCCTCCAG AGACATCAAA GGAGAGAAGG TGTGCCAGAG GCAAGCTTCG CCTGGAGACT GAGCCCGCAG GACAGGAACT TGGGACAGGG GAGATGCTAG

GGCTCTGTGT CCAGAGCCAT CTGCGTGCGA CCCACGGCCT GCCCCATGGG CCCCTGGTAT GGCTTCCAGC TCCTGCAGAC TCTTCTCCAG CAGGCGGCCG TGAAGAAGTA TCCGCTACTC GTCGGGGAAA GCCTGTACTC AGGACGCTCT AACTCCCTTT

GTGGGTCGCT GTGGCTGTGC AGTGAAGGAC GCCAGGCAGG TCCGCCTGGA CCCTGGAGAG TCATCTAGAT AAGGGGAGCC CAATGGGCAG CATTGCTGTC CAACACATAT AGACGGGACC AGAGCAGTGT CCGACTGACC CTGGCCTTCG CAACA

GAGTTTCTTG CCTCTGGCCC GTTTGTGTTG GACGGGAGAG GAAATGCCAT TGTGGAGAGA GAGGAGCTCC CTCAGTCTGC TCCATCACTT CCAAGGAATC GCCTATGTAG CCTGTAAACT GTGGAGATGT ATCTGTGAGT GCCTCCATCC

GAGCCTGAAA TCAACCTCAT GAAGCCCTGG ATGGTCTCAA GTCCTCCTGG AGGGGGAGCC AAGCCACACT AGGGCTCCAT TTGATGCCAT CAGAGGAAAA GCCTGACTGA ACACCAACTG ACACAGATGG TCTGAGAGGC TGAGGCTCCA

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900

This is the sequence of an α3 allele. The initiation codon (ATG) and termination codon (TGA) are indicated. The first five nucleotides of exons 4-7 are underlined to indicate intron-exon boundaries. There is considerable variation in the length and sequence composition of both the 5' and 3' untranslated regions.

Genomic structure The genomic organization of each SP-A gene is similar. The SP-A genes consist of seven exons^^'^^.

Accession numbers
Human (α3)    M13686
Mouse         S48768
Rat           M33201
Dog           M11769
Guinea-pig    U40869
Pig           L41350
Rabbit        J03542

Deficiency No clearly defined genetic deficiency of SP-A has been reported, nor have any allelic variants been reported to have grossly altered function. Large variations in the degree of polymerization of SP-A have been observed between individuals. Because lower polymers have impaired carbohydrate-binding function, there may be a functional deficiency of SP-A even when the total SP-A concentration appears normal^'^^. It is not currently known what factors control the degree of polymerization. SP-A knockout mice do not have grossly altered surfactant function^. Lowered concentrations of SP-A in the lung have been reported in association with asthma^^.

Polymorphic variants Both SP-A genes exhibit variability: splicing variability in the 5' untranslated region, and sequence variability within the coding and 3' untranslated regions. All four untranslated regions (the 5' and 3' regions of each gene) can vary in length. This genetic variability affects the level of SP-A mRNA and the ratio of SFTPA1:SFTPA2 mRNA. Variants in the coding sequence of the SFTPA1 gene (protein α3) are named 6A (the sequence shown above) and 6Ao-6A/'2^-27^29. Variants in the coding sequence of the SFTPA2 gene (protein α2) are named 1A and 1Ag-1Aj. Specific SP-A alleles are thought to be associated with an increased risk of respiratory distress syndrome^^.

References
1 Malhotra, R. et al. (1994) Clin. Exp. Immunol. 97, suppl. 2, 4-9.
2 Voss, T. et al. (1991) Am. J. Respir. Cell Mol. Biol. 4, 88-94.
3 Elhalwagi, B.M. et al. (1997) Biochemistry 36, 7018-7025.
4 Hoppe, H.J. and Reid, K.B.M. (1994) Protein Sci. 3, 1143-1158.
5 Lu, J. and Sim, R.B. (1994) In New Aspects of Complement Structure and Function (Erdei, A. ed.), R.G. Landes Co., Austin, TX, pp. 85-106.
6 Hickling, T.P. et al. (1998) Mol. Med. 4, 265-276.
7 Weaver, T.E. and Whitsett, J.A. (1991) Biochem. J. 273, 249-264.
8 McCormack, F.X. and Whitsett, J.A. (1996) In Collectins and Innate Immunity (Sastry, K.N. et al. eds), R.G. Landes Co., Austin, TX, pp. 9-50.
9 Korfhagen, T.R. et al. (1996) Proc. Natl Acad. Sci. USA 93, 9594-9599.
10 Holmskov, U. et al. (1994) Immunol. Today 15, 67-74.
11 Stuart, G.R. et al. (1996) Exp. Lung Res. 22, 467-487.
12 Malhotra, R. et al. (1992) Eur. J. Immunol. 22, 1437-1445.
13 Sim, R.B. et al. (1998) Immunobiology 199, 208-224.
14 Geertsma, M.F. et al. (1994) Am. J. Physiol. 267, L578-L584.
15 Nepomuceno, R.R. et al. (1997) Immunity 6, 119-129.
16 Chroneos, Z.C. et al. (1996) J. Biol. Chem. 271, 16375-16383.
17 Weikert, L.F. et al. (1997) Am. J. Physiol. 272, L989-L995.
18 Strayer, D.S. et al. (1993) J. Biol. Chem. 268, 18679-18684.
19 McCormack, F.X. et al. (1994) J. Biol. Chem. 269, 29801-29807.
20 McIntosh, J.C. et al. (1996) Am. J. Respir. Cell Mol. Biol. 15, 509-519.
21 Sano, H. et al. (1998) Abstr. 5th Marburg Surfactant Symposium on Surfactant and Alveolar Biology.
22 Miyamura, K. et al. (1994) Biochim. Biophys. Acta 1210, 303-307.
23 Doyle, I.R. et al. (1995) Am. J. Respir. Crit. Care Med. 152, 307-317.
24 Karinch, A.M. et al. (1997) Biochem. J. 321, 39-47.
25 Hoover, R.R. and Floros, J. (1998) Am. J. Respir. Cell Mol. Biol. 18, 353-362.
26 White, R.T. et al. (1985) Nature 317, 361-363.
27 Karinch, A.M. and Floros, J. (1995) Am. J. Respir. Cell Mol. Biol. 12, 77-88.
28 Floros, J. et al. (1985) J. Biol. Chem. 260, 495-500.
29 Floros, J. et al. (1986) J. Biol. Chem. 261, 9029-9033.
30 van de Graaf, E.A. et al. (1992) J. Lab. Clin. Med. 120, 252-263.
31 Kala, P. et al. (1998) Pediatr. Res. 43, 169-177.

SP-D Robert B. Sim, MRC Immunochemistry Unit, Oxford, UK

Other names Surfactant protein D, pulmonary surfactant (glyco)protein D, PSP-D, SFTP4, CP4.

Physicochemical properties SP-D is a member of the collectin family, whose members are typically oligomers of short (Mr (K) 25-45) polypeptide chains, each containing collagenous sequence at the N-terminus and a C-type lectin domain at the C-terminus^. The most commonly observed form of SP-D is made up of 12 identical polypeptide chains. Each polypeptide is 375 amino acids long, including a 20 amino acid leader sequence. The arrangement of domains or modules in the polypeptide is very similar to that in SP-A, but the collagenous region is much longer (177 residues instead of 73)^'^.
Mature protein:
pI            predicted 7.0 (theoretical, deglycosylated)
Mr (K)        predicted 35.5; predicted 43.6 (secreted form); observed 40-44
In SDS-PAGE the unreduced protein behaves as an Mr (K) 130-150 species, consistent with a disulfide-linked trimer^.
N-linked glycosylation sites (occupied)    1 (90)
Interchain disulfide bonds    35, 40. Disulfide bridging pattern not definitively established.

Structure SP-D has a quaternary structure similar to that of bovine conglutinin, and its assembly is similar to that of C1q or SP-A'*. Three polypeptide chains fold to form a homotrimer. Trimerization involves formation of a coiled-coil at the neck regions and a collagen triple helix^. Interchain disulfide bridges are likely to form between C35 and C40. The three-chain subunits then associate non-covalently at their N-terminal ends to form a very extended cross-shaped structure, commonly with four subunits and a total of 12 lectin domains. The four-subunit form has Mr (K) approx. 510-520 and a span of 92 nm. Electron microscopy and hydrodynamic studies of SP-D (both native and recombinant) show that a proportion of SP-D occurs as smaller oligomers with one, two or three subunits, while some occurs as very large assemblies (fuzzy balls) with up to 32 subunits and 96 lectin domains'*'^. O-linkage of Gal-Glc disaccharides in the collagen region is likely, but the number is unknown.

Function SP-D is a major protein of lung surfactant, where, unlike SP-A, it is found mainly (50-90%) in the aqueous phase^. Although it does interact with certain phospholipids^, it was not thought to have a major role in surfactant metabolism. However, recent work on SP-D knockout mice suggests a role in surfactant homeostasis: the mice showed accumulation of surfactant lipid, SP-A and SP-B in the alveolar space, as well as abnormalities in alveolar type II cells and alveolar macrophages*. SP-D appears to have an important role in innate immunity^'^. SP-D binds to a wide range of bacteria, viruses, fungi and particulate materials by interaction of the lectin domains with surface saccharides (mainly glucose, glucose oligomers, mannose, and inositol in phosphatidylinositol). SP-D agglutinates such targets, stimulates the neutrophil respiratory burst and is reported to have chemotactic activity for monocytes and neutrophils. SP-D has no known enzymic activity and is not known to interact with any complement system proteins. Unlike SP-A, it does not interact (in vitro) with the same receptor(s) as C1q^^. SP-D has specific interactions with a number of cell types, including alveolar macrophages, neutrophils and monocytes, but it does not interact with the candidate receptors discussed for SP-A. It appears to bind to certain cell types via a calcium ion-dependent interaction of its own lectin domains with cell-surface carbohydrate^'^. gp340, a member of the scavenger receptor family, is expressed on alveolar macrophages and is a candidate receptor for SP-D; SP-D lectin domains interact with gp340 in a calcium ion-dependent protein-protein interaction^^.

Tissue distribution SP-D is found in lung surfactant at a concentration several-fold lower than that of SP-A. It is detectable at very low levels (<100 ng/ml) in normal blood plasma, and at higher levels in plasma from patients with some forms of interstitial lung disease^'^. It is present in amniotic fluid^^, where it may originate from the fetal lung; the concentration rises to ~2 mg/litre at term. Primary sites of synthesis: alveolar type II cells and non-ciliated bronchiolar epithelial cells. Secondary sites: also found associated with lachrymal and salivary epithelia, and made by mucous cells in the stomach.

Regulation of expression Regulation of expression of SP-D has not been investigated extensively. Glucocorticoids increase expression in rat and human fetal lung tissue^.

Protein sequence^-'^
MLLFLLSALV LLTQPLGYLE AEMKTYSHRT MPSACTLVMC SSVESGLPGR   50
DGRDGREGPR GEKGDPGLPG AAGQAGMPGQ AGPVGPKGDN GSVGEPGPKG  100
DTGPSGPPGP PGVPGPAGRE GALGKQGNIG PQGKPGPKGE AGPKGEVGAP  150
GMQGSAGARG LAGPKGERGV PGERGVPGNT GAAGSAGAMG PQGSPGARGP  200
PGLKGDKGIP GDKGAKGESG LPDVASLRQQ VEALQGQVQH LQAAFSQYKK  250
VELFPNGQSV GEKIFKTAGF VKPFTEAQLL CTQAGGQLAS PRSAAENAAL  300
QQLVVAKNEA AFLSMTDSKT EGKFTYPTGE SLVYSNWAPG EPNDDGGSED  350
CVEIFTNGKW NDRACGEKRL VVCEF

The leader sequence is underlined and the N-linked glycosylation site is indicated (N).

Protein modules^^^'^^
1-20       Leader sequence                              exon 1
46-222     Collagen-like domain                         exons 1-5
223-261    Alpha-helical coiled-coil "neck" region      exon Gjl
262-375    CRD                                          exon 7

Chromosomal location^^^ Human: 10q22.2-23.1. Near the genes for SP-A and MBL. See SP-A section. Mouse: chromosome 14, 14 cM.

cDNA sequence^ CGGAATTAGG TACTTGTGCA GAGGAGGTCT TTCCTCCTCT AAGACCTACT GAGAGTGGCC GGGGACCCAG GTTGGGCCCA CCAAGTGGAC GGGAAGCAGG AAAGGAGAAG CCTAAGGGAG GGGTCTGCTG AAGGGGGACA GTTGCTTCTC GCTTTCTCTC ATTTTCAAGA GCTGGTGGAC GTCGTAGCTA TTCACCTACC GATGATGGCG GCTTGTGGAG AGTGCTTGGC CTAATAAAAA

GAGATAGTTG ATGATGGTAA GCGGCTTGGA CTGCACTGGT CCCACAGAAC TGCCTGGTCG GTTTGCCAGG AAGGGGACAA CTCCAGGACC GGAACATAGG TAGGTGCCCC AGCGAGGTGT GAGCCATGGG AAGGCATTCC TGAGGCAGCA AGTATAAGAA CAGCAGGCTT AGTTGGCCTC AGAACGAGGC CCACAGGAGA GGTCAGAGGA AAAAGCGTCT CCAGGAGTTT GGTGACCATC

GTATTAGGAT AAGGGTAGCT GCTCCTGGGG CCTACTCACA AACGCCCAGT CGATGGACGG AGCTGCAGGG TGGCTCTGTT TCCCGGTGTG ACCTCAGGGC AGGCATGCAG CCCTGGTGAG TCCCCAGGGA TGGAGACAAA GGTTGAGGCC AGTTGAGCTC TGTAAAACCA TCCACGCTCT TGCTTTCCTG GTCCCTGGTC CTGTGTGGAG TGTGGTCTGC GGCCAGAAGT AAAAAAAAAA

TAGGATTGTT TACTGGTTGT CCTAACAAAA CAGCCCCTGG GCTTGCACCC GATGGGAGAG CAAGCAGGGA GGAGAACCTG CCTGGTCCAG AAGCCAGGCC GGCTCGGCAG CGTGGAGTCC AGTCCAGGTG GGAGCAAAGG TTACAGGGAC TTCCCAAATG TTTACGGAGG GCCGCTGAGA AGCATGACTG TATTCCAACT ATCTTCACCA GAGTTCTGAG CAAGGCTTAG

GTGAAGTATA CCTCCGATTC AGAAACCTGC GCTACCTGGA TGGTCATGTG AGGGCCCTCG TGCCTGGACA GACCAAAGGG CTGGAAGAGA CAAAAGGAGA GGGCAAGAGG CTGGAAACGC CCAGGGGACC GAGAAAGTGG AAGTACAGCA GCCAAAGTGT CACAGCTGCT ATGCCGCCTT ATTCCAAGAC GGGCCCCAGG ATGGCAAGTG CCAACTGGGG ACCCTCATGC

GTACGGATGC AGGTTAGAAT CATGCTGCTC AGCAGAAATG TAGCTCAGTG GGGCGAGAAG AGCTGGCCCA AGACACTGGG AGGTCCCCTG AGCTGGGCCC CCTCGCAGGC AGGGGCAGCA CCCGGGATTG GCTTCCAGAT CCTCCAGGCT CGGGGAGAAG GTGCACACAG GCAACAGCTG AGAGGGCAAG GGAGCCCAAC GAATGACAGG TGGGTGGGGC TGCCAATATC


The initiation codon (ATG), termination codon (TGA) and polyadenylation signal (AATAAA) are underlined, as are the first five nucleotides of each exon to indicate exon/intron boundaries. There is at least one further exon upstream of exon 1, encoding 5' UTR sequence^.
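The initiation and termination codons above are marked by hand. A minimal Python sketch of the same bookkeeping (the helper name `first_orf` is hypothetical, not the authors' method): find the first ATG that is followed by an in-frame stop codon and report its 1-based span. The short test sequence is invented for illustration.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(cdna: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based (start, end) of the first ATG-initiated open reading frame.

    `end` is the last nucleotide of the in-frame stop codon. Whitespace (as in
    sequences printed in blocks of ten) is removed before positions are counted.
    Returns None if no ATG is followed by an in-frame stop codon.
    """
    seq = "".join(cdna.split()).upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start + 1, i + 3
        start = seq.find("ATG", start + 1)
    return None


print(first_orf("GGATG AAATT TTAGCC"))  # (3, 14): ATG starts at 3, TAG ends at 14
```

A real annotation would also weigh the sequence context around each ATG; this sketch simply takes the first candidate that yields a complete frame.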

Genomic structure^^^ The SP-D gene consists of at least seven exons.

[Gene map: scale bar 2 kb]

Accession numbers
Human: X65018
Mouse: L40156
Rat: M81231
Bovine: X75911

Deficiency No clearly defined genetic deficiency of SP-D has been reported, nor have any allelic variants been reported to have grossly altered function. There may be variation in the degree of polymerization of SP-D between individuals, but this has not been extensively investigated. Lower polymers have impaired agglutination activity^. As for SP-A, it is not currently known what factors control the degree of polymerization. SP-D knockout mice have altered surfactant homeostasis^. Lower concentrations of SP-D in the lung have been reported in association with asthma^.

Polymorphic variants There are no reports of genetic variation in SP-D. There are minor discrepancies between published DNA sequences^'^.

References
1 Malhotra, R. et al. (1994) Clin. Exp. Immunol. 97, suppl. 2, 4-9.
2 Crouch, E. et al. (1993) J. Biol. Chem. 268, 2976-2983.
3 Lu, J. et al. (1992) Biochem. J. 284, 795-802.
4 Holmskov, U. et al. (1995) Biochem. J. 305, 889-896.
5 Hoppe, H.J. and Reid, K.B.M. (1994) Protein Sci. 3, 1143-1158.
6 Crouch, E. and Hartshorn, K. (1996) In Collectins and Innate Immunity (Ezekowitz, R.A.B. et al. eds), R.G. Landes, Austin, Texas, pp. 640-645.
7 Taneva, S. et al. (1997) Biochemistry 36, 8173-8179.
8 Botas, C. et al. (1998) Proc. Natl Acad. Sci. USA 95, 11869-11874.
9 Reid, K.B.M. (1998) Biochim. Biophys. Acta 1408, 290-295.
10 Miyamura, K. et al. (1994) Biochem. J. 300, 237-242.
11 Holmskov, U. et al. (1997) J. Biol. Chem. 272, 13743-13749.
12 Miyamura, K. et al. (1994) Biochim. Biophys. Acta 1210, 303-307.
13 Rust, K. et al. (1991) Arch. Biochem. Biophys. 290, 116-126.
14 Kishore, U. and Reid, K.B.M. (1998) In Human Protein Data (Haeberli, A. ed.) Wiley-VCH Verlag, Weinheim.

Part 2 Serine Proteases

C1r

EC 3.4.21.41

Nicole Thielens and Gerard J. Arlaud, Jean-Pierre Ebel Institute of Structural Biology, Grenoble, France

Physicochemical properties

Human C1r is a non-covalent homodimer of Mr (K) 173. Each monomer is synthesized as a single-chain proenzyme molecule of 705 amino acids including a 17 amino acid leader sequence. Activation occurs through cleavage of a single bond (R463-I464), yielding two disulfide-linked chains, A and B.

Mature protein: pI 4.9; Mr (K) 173.0
A chain: amino acids 18-463; Mr (K) predicted 55.3; N-linked glycosylation sites 2 (125, 221)
B chain: amino acids 464-705; Mr (K) predicted 31.2; N-linked glycosylation sites 2 (514, 581)
Interchain disulfide bonds: 1 (A-B), 451-577
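Because every position in these entries is numbered from the initiating methionine, the chain boundaries follow from three numbers: the precursor length, the leader length and the P1 arginine of the scissile bond. A minimal Python sketch of that arithmetic (the class `Precursor` and its methods are hypothetical names), checked against the C1r values above:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Precursor:
    length: int       # residues in the primary translation product
    leader: int       # residues in the N-terminal leader sequence
    cleavage_p1: int  # precursor position of the Arg on the N-terminal side of the scissile bond

    def mature_length(self) -> int:
        return self.length - self.leader

    def a_chain(self) -> Tuple[int, int]:
        # A chain: first residue after the leader, up to and including the P1 Arg.
        return (self.leader + 1, self.cleavage_p1)

    def b_chain(self) -> Tuple[int, int]:
        # B chain: from the P1' residue (Ile) to the C-terminus.
        return (self.cleavage_p1 + 1, self.length)


def span_length(span: Tuple[int, int]) -> int:
    start, end = span
    return end - start + 1


c1r = Precursor(length=705, leader=17, cleavage_p1=463)
print(c1r.mature_length())                        # 688
print(c1r.a_chain(), span_length(c1r.a_chain()))  # (18, 463) 446
print(c1r.b_chain(), span_length(c1r.b_chain()))  # (464, 705) 242
```

The same three numbers reproduce the chain boundaries listed for C1s (688, 15, 437) and MASP-2 (686, 15, 444), which makes the sketch a quick check on transcription errors in such tables.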

Structure Each monomer comprises an interaction region [α] derived from the N-terminal half of the A chain, and a catalytic region [γ-B] comprising two CCP modules and the serine protease B domain^'^. Assembly of the C1r-C1r dimer occurs through the catalytic regions^. The interaction regions are located at each end of the dimer and mediate calcium-dependent association with C1s within the C1s-C1r-C1r-C1s tetramer^. A three-dimensional homology model of the activated form of the C1r catalytic regions is available^.

Function C1r is a serine protease with very narrow trypsin-like specificity that is responsible for activation of the C1 complex, a two-step process involving (1) C1r autoactivation and (2) cleavage of C1s by activated C1r^^^. Both reactions occur through cleavage of a single Arg-Ile bond that generates two-chain active proteases.
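The order of the two steps can be made explicit with a toy directed graph in which each edge means "once active, cleaves". This is a qualitative sketch only: the graph and the function name `cleavage_order` are hypothetical, no kinetics or regulation are modelled, and the C1s edges to C4 and C2 come from the C1s entry.

```python
from collections import deque
from typing import Dict, List

# Edge X -> Y: once X is active, it cleaves Y.
ACTS_ON: Dict[str, List[str]] = {
    "C1r": ["C1r", "C1s"],  # step 1: autoactivation; step 2: cleavage of C1s
    "C1s": ["C4", "C2"],    # activated C1s cleaves C4 and C2
}


def cleavage_order(trigger: str) -> List[str]:
    """Breadth-first order in which components are cleaved, starting from `trigger`."""
    order = [trigger]
    seen = {trigger}
    queue = deque([trigger])
    while queue:
        enzyme = queue.popleft()
        for target in ACTS_ON.get(enzyme, []):
            if target not in seen:  # the C1r self-edge is skipped here
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(cleavage_order("C1r"))  # ['C1r', 'C1s', 'C4', 'C2']
```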

Tissue distribution Serum protein: 34 µg/ml in plasma. Primary site of synthesis: hepatocytes. Secondary sites: monocytes, epithelial and endothelial cells^, cells of the central nervous system^8.

Regulation of expression Unknown, but C1r and C1s are coordinately expressed.

Protein sequence 9-12 MWLLYLLVPA PTGYRVKLVF GKKEFMSQGN KLGEEDPQPQ ASGYISSLEY PYDQLQIYAN RYTTEIIKCP LHSFTAVCQD RIQYYCHEPY CGKPVNPVEQ AHTLYPKEHE SYNFEGDIAL EKIAHDLRFV QGDSGGVFAV MEEED

LFCRAGGSIP QQFDLEPSEG KMLLTFHTDF CQHLCHNYVG PRSYPPDLRC GKNIGEFCGK QPKTLDEFTI DGTWHRAMPR YKMQTRAGSR RQRIIGGQKA AQSNASLDVF LELENSVTLG RLPVANPQAC RDPNTDRWVA

IPQKLFGEVT CFYDYVKISA SNEENGTIMF GYFCSCRPGY NYSIRVERGL QRPPDLDTSS IQNLQPQYQF CKIKDCGQPR ESEQGVYTCT KMGNFPWQVF LGHTNVEELM PNLLPICLPD ENWLRGKNRM TGIVSWGIGC

SPLFPKPYPN DKKSLGRFCG YKGFLAYYQA ELQEDRHSCQ TLHLKFLEPF NAVDLLFFTD RDYFIATCKQ NLPNGDFRYT AQGIWKNEQK TNIHGRGGGA KLGNHPIRRV NDTFYDLGLM DVFSQNMFCA SRGYGFYTKV

NFETTTVITV QLGSPLGNPP VDLDECASRS AECSSELYTE DIDDHQQVHC ESGDSRGWKL GYQLIEGNQV TTMGVNTYKA GEKIPRCLPV LLGDRWILTA SVHPDYRQDE GYVSGFGVME GHPSLKQDAC LNYVDWIKKE


The leader sequence and the cleavage site (RI) between the A (N-terminal) and B (C-terminal) chains are underlined. The N-linked glycosylation sites (all occupied) are indicated (N). N167 undergoes full post-translational hydroxylation.

Protein modules
1-17 Leader sequence
18-139 CUB
140-192 EGF (Ca2+-binding)
193-304 CUB
305-372 CCP
375-447 CCP
448-463 Connecting segment
464-705 Serine protease domain

Catalytic triad: H502, D557, S654.
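The module table doubles as a lookup structure. A minimal Python sketch (the names `C1R_MODULES` and `module_at` are hypothetical) that assigns a residue to its module with a binary search over the start positions. It confirms that all three catalytic-triad residues fall in the serine protease domain, and it reports residues 373-374, which the table leaves between the two CCP modules, as unassigned.

```python
from bisect import bisect_right
from typing import List, Tuple

# (start, end, module) in precursor numbering, as listed for C1r.
C1R_MODULES: List[Tuple[int, int, str]] = [
    (1, 17, "Leader sequence"),
    (18, 139, "CUB"),
    (140, 192, "EGF (Ca2+-binding)"),
    (193, 304, "CUB"),
    (305, 372, "CCP"),
    (375, 447, "CCP"),
    (448, 463, "Connecting segment"),
    (464, 705, "Serine protease domain"),
]
STARTS = [start for start, _, _ in C1R_MODULES]


def module_at(residue: int) -> str:
    """Name of the module containing `residue`, or 'unassigned'."""
    i = bisect_right(STARTS, residue) - 1  # last module starting at or before residue
    if i >= 0:
        _, end, name = C1R_MODULES[i]
        if residue <= end:
            return name
    return "unassigned"


for residue in (502, 557, 654):  # catalytic triad H502, D557, S654
    print(residue, module_at(residue))
print(373, module_at(373))  # lies between the two CCP modules
```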

Chromosomal location Human^13: 12p13. The C1r and C1s genes lie in a tail-to-tail orientation, with a distance of about 9.5 kb between their 3' ends^^.

cDNA sequence^^ TGCACGAAGA TTGTACCTCC AAGTTATTTG ACAACCACTG GACCTGGAGC AGCCTGGGGA GAATTTATGT GAGAATGGGA GATGAATGTG CTGTGTCACA GAAGACAGGC TACATCTCCA ATCCGGGTGG GACCACCAGC ATTGGCGAGT GATCTGCTGT ACCGAGATCA CTGCAGCCTC CTCATAGAGG TGGCATCGTG AATGGTGACT TACTACTGCC CAAGGGGTGT ATTCCTCGGT ATCATCGGAG CACGGGCGCG CTGTATCCCA ACAAATGTGG CCGGACTACC GAAAATAGTG TTCTACGACC GCTCATGACC CTCCGGGGAA TCTCTAAAGC AACACTGATC TATGGCTTCT GAGGACTGAG CAAAAAACAA AGAAAGACCG AACCAAAGGG

CGCTGTCGGG TGGTGCCGGC GGGAGGTGAC TGATCACAGT CTTCTGAAGG GGTTCTGTGG CCCAAGGGAA CCATCATGTT CTTCCCGGAG ACTACGTTGG ATTCCTGCCA GCCTGGAGTA AGCGGGGCCT AAGTACACTG TCTGTGGGAA TCTTCACAGA TCAAGTGCCC AGTACCAGTT GGAACCAGGT CCATGCCCAG TCCGTTACAC ATGAGCCATA ACACCTGCAC GCTTGCCAGT GGCAAAAAGC GGGGCGGGGC AGGAACACGA AAGAGCTCAT GTCAGGATGA TCACCCTGGG TGGGCTTGAT TCAGGTTTGT AGAATAGGAT AGGACGCCTG GCTGGGTGGC ACACCAAAGT CCCAGAATTC CTGACCAGTT TGTGTGAAAT CCCCTTTCTT

AGAGCCCAGG CCTGTTCTGC TTCCCCTCTG CCCCACGGGA CTGCTTCTAT GCAACTGGGT CAAGATGCTG CTACAAGGGC CAAATTAGGG AGGCTACTTC GGCTGAGTGC CCCTCGGTCC CACCCTGCAC CCCCTATGAC GCAAAGGCCC TGAGTCGGGG CCAGCCCAAG CCGTGACTAC GCTGCATTCC ATGCAAGATC CACCACAATG TTACAAGATG AGCACAGGGC GTGTGGGAAG CAAGATGGGC CCTGCTGGGC AGCGCAAAGC GAAGCTAGGA GTCCTACAAT TCCCAACCTC GGGCTATGTC CCGTCTGCCC GGATGTGTTC CCAGGGGGAT CACGGGCATC GCTCAACTAC ACTAGGTTCG GTTGATAACC TCTCTTTCCT TCTTCTGAGG

ATTCAACACG AGGGCAGGAG TTCCCCAAGC TACAGGGTGA GATTATGTCA TCTCCACTGG CTGACCTTCC TTCCTGGCCT GAGGAGGATC TGTTCCTGCC AGCAGCGAGC TACCCCCCTG CTCAAGTTCC CAGCTACAGA CCCGACCTCG GACAGCCGGG ACCCTAGACG TTCATTGCTA TTCACAGCTG AAGGACTGTG GGAGTGAACA CAGACCAGAG ATTTGGAAGA CCCGTGAACC AACTTCCCCT GACCGCTGGA AACGCCTCTT AATCACCCCA TTTGAGGGGG CTCCCCATCT AGTGGCTTCG GTAGCTAATC TCTCAAAACA AGTGGGGGCG GTGTCCTGGG GTGGACTGGA AATCCAGAGA ACTAAGAGTC GTAGTCCCAT ATTGCAGAGG

GGCCTTGAGA GCTCCATTCC CTTACCCCAA AGCTCGTCTT AGATCTCTGC GCAACCCCCC ACACAGACTT ACTACCAAGC CCCAGCCCCA GTCCAGGCTA TGTACACGGA ACCTGCGCTG TGGAGCCTTT TCTATGCCAA ACACCAGCAG GCTGGAAGCT AGTTCACCAT CCTGCAAGCA TCTGCCAGGA GGCAGCCCCG CCTACAAGGC CTGGCAGCAG ATGAACAGAA CCGTGGAACA GGCAGGTGTT TCCTCACAGC TGGATGTGTT TCCGCAGGGT ACATCGCCCT GCCTCCCTGA GGGTCATGGA CACAGGCCTG TGTTCTGTGC TTTTTGCAGT GCATCGGGTG TCAAGAAAGA GCAGTGTGGA TCTATTAAAA TGATGTACTT ATATAG

AATGTGGCTC CATCCCTCAG CAACTTTGAA CCAGCAGTTT TGATAAGAAA GGGAAAGAAG CTCCAACGAG TGTGGACCTT GTGCCAGCAC TGAGCTTCAG GGCATCAGGC CAACTACAGC TGATATTGAT CGGGAAGAAC CAATGCTGTG GCGCTACACC CATCCAGAAC AGGCTACCAG TGATGGCACG AAACCTGCCT CCGTATCCAG GGAGTCTGAG GGGAGAGAAG GAGGCAGCGC CACCAACATC TGCCCACACC CCTGGGCCAC CAGCGTCCAC GCTGGAGCTG CAACGATACC GGAGAAGATT TGAGAACTGG TGGACACCCA AAGGGACCCG CAGCAGGGGC GATGGAGGAG AAAAAAAAAA TTACTGATGC TACCTGAAAC


Only the position of the final intron-exon boundary is known; the first five nucleotides of the final exon are underlined (1400-1404). The methionine initiation codon (ATG), the termination codon (TGA) and the probable polyadenylation signal (ATTAAA) are indicated.
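The SP-D and C1r entries use different polyadenylation hexamers: the canonical AATAAA and the variant ATTAAA. A minimal Python sketch of scanning a cDNA for both (the helper `polya_signals` is hypothetical and the test sequence is invented). A hexamer match on its own does not show that the site is actually used.

```python
import re
from typing import List, Tuple

# Canonical poly(A) signal and its most common variant.
POLYA_HEXAMERS = ("AATAAA", "ATTAAA")


def polya_signals(cdna: str) -> List[Tuple[int, str]]:
    """1-based start positions of every poly(A) hexamer, overlapping hits included."""
    seq = "".join(cdna.split()).upper()
    hits = []
    for hexamer in POLYA_HEXAMERS:
        # A zero-width lookahead lets consecutive matches overlap.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start() + 1, hexamer))
    return sorted(hits)


print(polya_signals("CCATTAAAGG AATAAATT"))  # [(3, 'ATTAAA'), (11, 'AATAAA')]
```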

Genomic structure The gene structure is unknown but the serine protease domain is encoded by a single exon^^.

Accession numbers
Human^9-12: X04701, M14058

Deficiency C1r deficiency is rare and usually combined with C1s deficiency. It is associated with increased susceptibility to pyogenic infection and to immune complex diseases such as systemic lupus erythematosus^^.

Polymorphic variants T506C; L152S^^-^2. Two common variants and one rare variant have been identified by isoelectric focusing of plasma samples^'''^^.

References
1 Villiers, C.L. et al. (1985) Proc. Natl Acad. Sci. USA 82, 4477-4481.
2 Busby, T.F. and Ingham, K.C. (1987) Biochemistry 26, 5564-5571.
3 Lacroix, M. et al. (1997) Biochemistry 36, 6270-6282.
4 Thielens, N.M. et al. (1990) J. Biol. Chem. 265, 14469-14475.
5 Cooper, N.R. (1985) Adv. Immunol. 37, 151-216.
6 Arlaud, G.J. and Thielens, N.M. (1993) Methods Enzymol. 223, 61-82.
7 Strunk, R.C. and Colten, H.R. (1993) In Complement in Health and Disease, 2nd edition (Whaley, K., Loos, M. and Weiler, J.M. eds), Kluwer Academic Publications, Dordrecht, pp. 127-158.
8 Morgan, B.P. and Gasque, P. (1996) Immunol. Today 17, 461-466.
9 Arlaud, G.J. and Gagnon, J. (1983) Biochemistry 22, 1758-1764.
10 Arlaud, G.J. et al. (1987) Biochem. J. 241, 711-720.
11 Journet, A. and Tosi, M. (1986) Biochem. J. 240, 783-787.
12 Leytus, S.P. et al. (1986) Biochemistry 25, 4855-4863.
13 Van Cong, N. et al. (1988) Hum. Genet. 78, 363-368.
14 Kusumoto, H. et al. (1988) Proc. Natl Acad. Sci. USA 85, 7307-7311.
15 Endo, Y. et al. (1996) Int. Immunol. 8, 1355-1358.
16 Morgan, B.P. and Walport, M.J. (1991) Immunol. Today 12, 301-306.
17 Kamboh, M.I. and Ferrell, R.E. (1986) Am. J. Hum. Genet. 39, 826-831.
18 Kamboh, M.I. et al. (1988) Hum. Genet. 81, 93-94.

C1s

EC 3.4.21.42 Nicole Thielens and Gerard J. Arlaud, Jean-Pierre Ebel Institute of Structural Biology, Grenoble, France

Physicochemical properties Human C1s is a glycoprotein of Mr (K) 79.8. It is synthesized as a single-chain proenzyme molecule of 688 amino acids including a 15 amino acid leader sequence. Activation occurs through cleavage of a single bond (R437-I438), yielding two disulfide-linked chains, A and B.

Mature protein: pI 4.5; Mr (K) 79.8
A chain: amino acids 16-437; Mr (K) observed 52.2; N-linked glycosylation sites 2 (174, 406)
B chain: amino acids 438-688; Mr (K) observed 27.7
Interchain disulfide bonds: 1 (A-B), 425-549

Structure C1s comprises an interaction region [α] corresponding to the N-terminal half of the A chain, and a catalytic region [γ-B] comprising two CCP modules and the serine protease B domain^-^. A three-dimensional homology model of the activated form of the C1s catalytic region is available^.

Function C1s is a highly specific serine protease that mediates the proteolytic activity of the C1 complex towards its substrates C4 and C2. It cleaves a single Arg-Ala bond in C4 to yield C4a and C4b, and a single Arg-Lys bond in C2 to yield C2a and C2b^8.

Tissue distribution Serum protein: 31 µg/ml in plasma. Primary site of synthesis: hepatocytes. Secondary sites: monocytes, epithelial and endothelial cells^, cells of the central nervous system^*^.
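Converting mass concentrations to molar terms makes the C1r and C1s values directly comparable. Because 1 µg/ml is 1 mg/l and 1 kDa is 1000 g/mol, a mass concentration in µg/ml divided by Mr in kDa gives micromolar directly. A minimal sketch (the function name `micromolar` is hypothetical) using the figures in these entries: 34 µg/ml with a C1r monomer Mr of 173.0/2 = 86.5 K, and 31 µg/ml with Mr 79.8 K for C1s.

```python
def micromolar(ug_per_ml: float, mr_kda: float) -> float:
    """Mass concentration (ug/ml) -> molar concentration (umol/l).

    1 ug/ml = 1e-3 g/l and 1 kDa = 1e3 g/mol, so the quotient carries a factor
    of 1e-6 mol/l, i.e. it is already expressed in umol/l.
    """
    return ug_per_ml / mr_kda


c1r_monomer = micromolar(34, 173.0 / 2)  # C1r, counted per polypeptide chain
c1s = micromolar(31, 79.8)
print(f"C1r {c1r_monomer:.2f} uM, C1s {c1s:.2f} uM")  # C1r 0.39 uM, C1s 0.39 uM
```

Both chains come out at about 0.39 µM, which is at least consistent with the 1:1 C1r:C1s ratio of the C1s-C1r-C1r-C1s tetramer described in the C1r entry.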

Regulation of expression Unknown, but C1r and C1s are coordinately expressed.

Protein sequence^^-^^ MWCIVLFSLL HLYFTHLDIE VPYNKLQVIF IGGYFCSCPP RCEYQIRLEK CGHGFPGPLN NSVWEPAKAK KLKCQPVDCG YHCAGNGSWV WQVFFDNPWA KMLTPEHVFI PGTSSDYNLM EKPTADAEAY GLVSWGPQCG

AWVYAEPTMY LSENCAYDSV KSDFSNEERF EYFLHDDMKN GFQWVTLRR lETKSNALDI YVFRDWQIT IPESIENGKV NEVLGPELPK GGALINEYWV HPGWKLLEVP DGDLGLISGW VFTPNMICAG TYGLYTRVKN

GEILSPNYPQ QIISGDTEEG TGFAAYYVAT CGVNCSGDVF EDFDVEAADS IFQTDLTGQK CLDGFEWEG EDPESTLFGS CVPVCGVPRE LTAAHWEGN EGRTNFDNDI GRTEKRDRAV GEKGMDSCKG YVDWIMKTMQ

AYPSEVEKSW RLCGQRSSNN DINECTDFVD TALIGEIASP AGNCLDSLVF KGWKLRYHGD RVGATSFYST VIRYTCEEPY PFEEKQRIIG REPTMYVGST ALVRLKDPVK RLKAARLPVA DSGGAFAVQD ENSTPRED

DIEVPEGYGI PHSPIVEEFQ VPCSHFCNNF NYPKPYPENS VAGDRQFGPY PMPCPKEDTP CQSNGKWSNS YYMENGGGGE GSDADIKNFP SVQTSRLAKS MGPTVSPICL PLRKCKEVKV PNDKTKFYAA


The leader sequence and the cleavage site (RI) between the A (N-terminal) and B (C-terminal) chains are underlined. N-linked glycosylation sites (all occupied) are indicated (N). N149 undergoes partial post-translational hydroxylation^^.

Protein modules
1-15 Leader sequence (exons 2/3)
16-128 CUB (exons 3/4)
129-174 EGF (Ca2+-binding) (exon 5)
175-289 CUB (exons 6/7)
290-355 CCP (exons 8/9)
358-422 CCP (exons 10/11)
423-437 Connecting segment (exon 12)
438-688 Serine protease domain (exon 12)

Catalytic triad: H475, D529, S632.
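Module boundaries copied between sources are easy to mistype, and a quick consistency check catches gaps and overlaps. A minimal Python sketch (the helper `boundary_problems` is hypothetical). Applied to the C1s boundaries above, it reports only residues 356-357, which the table leaves between the two CCP modules; applied to the MASP-2 boundaries, it reports nothing.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def boundary_problems(spans: List[Span]) -> List[Tuple[str, int, int]]:
    """Report gaps and overlaps between consecutive (start, end) spans."""
    problems = []
    ordered = sorted(spans)
    for (_, end1), (start2, end2) in zip(ordered, ordered[1:]):
        if start2 > end1 + 1:
            problems.append(("gap", end1 + 1, start2 - 1))
        elif start2 <= end1:
            problems.append(("overlap", start2, min(end1, end2)))
    return problems


C1S_MODULES = [(1, 15), (16, 128), (129, 174), (175, 289),
               (290, 355), (358, 422), (423, 437), (438, 688)]
MASP2_MODULES = [(1, 15), (16, 135), (136, 182), (183, 292),
                 (293, 362), (363, 431), (432, 444), (445, 686)]

print(boundary_problems(C1S_MODULES))    # [('gap', 356, 357)]
print(boundary_problems(MASP2_MODULES))  # []
```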

Chromosomal location Human^15: 12p13. The C1r and C1s genes lie in a tail-to-tail orientation, with a distance of about 9.5 kb between their 3' ends^^.

cDNA sequence^^ GGGCCGGAGT GGGGTTGCCC CTTCTGAAGG CAGGGTGGAC TTATGCTGAG CAGTGAGGTA CTTCACCCAT CTCAGGAGAC TCCAATTGTG CTTTTCCAAT TGAATGCACA TTACTTCTGC TAATTGCAGT CAAACCATAT AGTGGTGGTG CTGCCTTGAC TGGATTCCCT AACTGATCTA CTGCCCTAAG TAGAGATGTG TGCAACATCT ATGTCAACCT AGAGAGCACT GGAAAATGGA GCTGGGCCCG AGAAAAACAG CTTCTTTGAC TGCTCATGTT GACCTCACGG ATGGAAGCTG GCGGCTGAAA CTCTTCCGAC AGAGAAGAGA AAAATGCAAA TCCTAACATG TGGGGCCTTT GTCCTGGGGG CTGGATAATG CCCACCAGCC CATTATTTCA TGAGACGCCT TCTGACTCCT ACTCCTTTCT CAAAATTCCA TATAAGC

TCCTGCAGAG AGCATGCCCA CTCCAAAGTC AAATCGCCAG CCTACCATGT GAGAAATCTT CTGGACATTG ACTGAAGAAG GAAGAGTTCC GAAGAGCGTT GATTTTGTAG TCCTGCCCCC GGGGATGTAT CCAGAGAACT ACCTTGCGGA AGTTTAGTTT GGGCCTCTAA ACAGGGCAAA GAAGACACTC GTGCAGATAA TTCTATTCGA GTGGACTGTG TTGTTTGGTT GGAGGTGGGG GAGCTGCCGA AGGATAATTG AACCCATGGG GTGGAGGGAA CTGGCAAAAT CTGGAAGTCC GACCCAGTGA TACAACCTCA GATCGTGCTG GAAGTGAAAG ATCTGTGCTG GCTGTACAGG CCCCAGTGTG AAGACTATGC TCTCCAAGGG TCATGACTGA TGCTAGAGGT TGGGGTCCTT TGCACTATTC TTTACTTGAT

GGAGCGTCAA CTGGCAGGAG CGGAGTGCAG AGATGTGGTG ATGGGGAGAT GGGACATAGA AGCTGTCAGA GGAGGCTCTG AAGTCCCATA TTACGGGGTT ATGTCCCTTG CGGAATATTT TCACTGCACT CAAGGTGTGA GAGAAGATTT TTGTTGCAGG ATATTGAAAC AAAAGGGCTG CCAATTCTGT CCTGTCTGGA CTTGTCAAAG GCATTCCTGA CTGTCATCCG AGTATCACTG AATGTGTTCC GAGGATCCGA CTGGTGGAGC ACAGGGAGCC CCAAGATGCT CAGAAGGACG AAATGGGACC TGGATGGGGA TTCGCCTCAA TGGAGAAACC GAGGAGAGAA ATCCCAATGA GGACCTATGG AGGAAAATAG TGGTGACCAA AAGAAGACAC AGAGTTTGAT TCCCCGGAGT CACAGGGATA CATTCTCAGT

GGCCCTGTGC AGAGGGAACT AAAGCCAGGA CATTGTCCTG CCTGTCCCCT AGTTCCTGAA GAACTGTGCG TGGACAGAGG CAACAAACTC TGCTGCATAC TAGCCACTTC CCTCCATGAT GATTGGGGAG ATACCAGATC TGATGTGGAA AGATCGGCAA CAAGAGTAAT GAAACTTCGC TTGGGAGCCT TGGGTTTGAA CAATGGAAAG ATCCATTGAG CTACACTTGT TGCTGGTAAC AGTCTGTGGA TGCAGATATT GCTCATTAAT AACAATGTAT CACTCCTGAG AACCAATTTT CACCGTCTCT CCTGGGACTG GGCGGCAAGG CACAGCAGAT GGGCATGGAT CAAGACCAAA GCTCTACACA CACCCCCCGT TGCATTACCT GAGCGAATGA CATAGAATTG ACCTATTGTA CCTTAATTCT ATCCACTGTC

TGCTGTCCCT GACCCACTTG CCAAGAGACA TTTTCACTTT AACTATCCTC GGGTATGGGA TATGACTCAG AGCAGTAACA CAGGTGATCT TATGTTGCCA TGCAACAATT GACATGAAGA ATTGCAAGTC CGGTTGGAGA GCAGCTGACT TTTGGTCCTT GCTCTTGATA TATCATGGAG GCGAAGGCAA GTTGTGGAGG TGGAGTAATT AATGGTAAAG GAGGAGCCAT GGGAGCTGGG GTCCCCAGAG AAAAACTTCC GAGTACTGGG GTTGGGTCCA CATGTGTTTA GATAATGACA CCCATCTGCC ATCTCAGGCT TTACCTGTAG GCAGAGGCCT AGCTGTAAAG TTCTACGCAG CGGGTAAAGA GAGGACTAAT TCTGTTCCTT TTTAAATAGA TGCTGGTCAT GATAACACTA TTGTTTCCTC TATGTACAAT

GGGGGCCAGA CTCCTACCAG GGCAGCTCAC TGGCATGGGT AGGCATATCC TTCACCTCTA TGCAGATAAT ATCCCCACTC TTAAGTCAGA CAGACATAAA TCATTGGTGG ATTGCGGAGT CCAATTATCC AAGGGTTCCA CAGCGGGAAA ACTGTGGTCA TCATCTTCCA ATCCAATGCC AATATGTCTT GACGTGTTGG CCAAACTGAA TTGAAGACCC ATTACTACAT TGAATGAGGT AACCCTTTGA CCTGGCAAGT TGCTGACGGC CCTCAGTGCA TTCATCCGGG TTGCACTGGT TACCAGGCAC GGGGCCGAAC CTCCTTTAAG ATGTTTTCAC GGGACAGTGG CTGGCCTGGT ACTATGTTGA CCAGATACAT ATGATATTCT ACTTGATTGT ACATTTGTGG TGGGTGGGGC TTTACCTGTT AAAGGATGTT


The first five nucleotides of each exon are underlined to indicate intron-exon boundaries. Exon 2 is only five nucleotides long (203-207) and contains the initiation codon. To avoid confusion, this exon is not marked with additional underlining but is indicated by the double-underlined methionine initiation codon (ATG). The termination codon (TAA) and the polyadenylation signal (AATAAA) are indicated.

Genomic structure The C1s gene spans about 10.5 kb and contains 12 exons. The whole serine protease domain and the preceding short connecting segment are encoded by a single exon^*^'^^.

[Gene map: scale bar 1 kb]

Accession numbers Human^^-^^ Hamster^*

X06596 M18767 J04080 X16160

Deficiency C1s deficiency is rare, usually partial and usually combined with C1r deficiency. It is associated with increased susceptibility to pyogenic infection and to immune complex diseases such as systemic lupus erythematosus^^.

Polymorphic variants No polymorphism has been identified at the DNA or protein level. N406 in human C1s is linked to either biantennary or triantennary complex-type oligosaccharides, giving rise to three major types of C1s molecules of Mr (K) 79.3, 79.9 and 80. P*^. One common and two uncommon variants have been identified by isoelectric focusing of plasma samples^^.

References
1 Villiers, C.L. et al. (1985) Proc. Natl Acad. Sci. USA 82, 4477-4481.
2 Busby, T.F. and Ingham, K.C. (1988) Biochemistry 27, 6127-6135.
3 Medved, L.V. et al. (1989) Biochemistry 28, 5408-5414.
4 Thielens, N.M. et al. (1990) J. Biol. Chem. 265, 14469-14475.
5 Rossi, V. et al. (1995) Biochemistry 34, 7311-7321.
6 Cooper, N.R. (1985) Adv. Immunol. 37, 151-216.
7 Arlaud, G.J. and Thielens, N.M. (1993) Methods Enzymol. 223, 61-82.
8 Arlaud, G.J. et al. (1998) Adv. Immunol. 69, 249-307.
9 Strunk, R.C. and Colten, H.R. (1993) In Complement in Health and Disease, 2nd edition (Whaley, K., Loos, M. and Weiler, J.M. eds), Kluwer Academic Publications, Dordrecht, pp. 127-158.
10 Morgan, B.P. and Gasque, P. (1996) Immunol. Today 17, 461-466.
11 Mackinnon, C.M. et al. (1987) Eur. J. Biochem. 169, 547-553.
12 Tosi, M. et al. (1987) Biochemistry 26, 8516-8524.
13 Kusumoto, H. et al. (1988) Proc. Natl Acad. Sci. USA 85, 7307-7311.
14 Thielens, N.M. et al. (1990) Biochemistry 29, 3570-3578.
15 Van Cong, N. et al. (1988) Hum. Genet. 78, 363-368.
16 Tosi, M. et al. (1989) J. Mol. Biol. 208, 709-714.
17 Endo, Y. et al. (1998) J. Immunol. 161, 4924-4930.
18 Kinoshita, H. et al. (1989) FEBS Lett. 250, 411-415.
19 Morgan, B.P. and Walport, M.J. (1991) Immunol. Today 12, 301-306.
20 Petillot, Y. et al. (1995) FEBS Lett. 358, 323-328.
21 Kamboh, M.I. and Ferrell, R.E. (1987) J. Immunogenet. 14, 231-238.

MASP-1

Teizo Fujita, Yuichi Endo and Misao Matsushita, Department of Biochemistry, Fukushima Medical University, 1-Hikariga-oka, Fukushima 960-1295, Japan

Other names Mannose (mannan)-binding lectin (MBL)-associated serine protease 1, P100.

Physicochemical properties MASP-1 is synthesized as a single-chain polypeptide of 699 amino acids including a 19 amino acid leader sequence. In the unactivated proenzyme form, MASP-1 is a single polypeptide of Mr (K) 93, while in the active form it consists of two polypeptides (heavy and light chains) linked by a disulfide bond.

Mature protein: pI (predicted) 5.18; Mr (K) 93
Heavy chain: amino acids 20-448; Mr (K) predicted 49, observed 66; potential N-linked glycosylation sites 4 (49, 178, 385, 407)
Light chain: amino acids 449-699; Mr (K) predicted 28, observed 31; potential N-linked glycosylation sites 0
Interchain disulfide bond: 1 (heavy-light), 436-572

Structure None known.

Function Proenzyme MASP-1 forms a complex with MASP-2 and MBL and circulates in blood^1. Upon binding of the MBL-MASP complex to appropriate carbohydrates via MBL, MASP-1 is cleaved between R448 and I449 and converted to the active serine protease form, consisting of two polypeptides (heavy and light chains)^2. MASP-2 is also converted to the active form, and the MBL-MASP complex then activates C4 and C2, resulting in formation of the C3 convertase, C4b2a. Although the precise functional roles of MASP-1 in the complex remain to be elucidated, it is able to cleave C2 in the fluid phase. MASP-1 can also cleave C3 to give C3a and C3b, the latter of which participates in the formation of the C3 convertase C3bBb^3.

Tissue distribution Serum protein: Gjng/ml in serum^ (MASP-1 exists as MBL-bound and MBL-unbound forms in serum). Primary site of synthesis: liver^.

Regulation of expression MASP-1 expression is upregulated by IL-6 and by dexamethasone in rat hepatocytes^.

Protein sequence^'^ MRWLLLYYAL TVPDGFRIKL TPGQEWLSP LSCDHYCHNY DFPNPYPKSS VGPKVLGPFC CPELQPPVHG KDGTWSNKIP YYKMLNNNTG NGRPAQKGTT PTLRDSDLLS NDVALVELLE LMEIEIPIVD VTLNRERGQW

CFSLSKASAH YFMHFNLESS GSFMSITFRS IGGYYCSCRF ECLYTIELEE GEKAPEPIST KIEPSQAKYF TCKIVDCRAP lYTCSAQGVW PWIAMLSHLN PSDFKIILGK SPVLNAFVMP HSTCQKAYAP YLVGTVSWGD

TVELNNMFGQ YLCEYDYVKV DFSNEERFTG GYILHTDNRT GFMVNLQFED QSHSVLILFH FKDQVLVSCD GELEHGLITF MNKVLGRSLP GQPFCGGSLL HWRLRSDENE ICLPEGPQQE LKKKVTRDMI DCGKKDRYGV

IQSPGYPDSY ETEDQVLATF FDAHYMAVDV CRVECSDNLF IFDIQDHPEV SDNSAENRGW TGYKVLKDNV STRNNLTTYK TCLPVCGLPK GSSWIVTAAH QHLGVKHTTL GAMVIVSGWG CAGEKEGGKD YSYIHHNKDW

PSDSEVTWNI CGRETTDTEQ DECKEREDEE TQRTGVITSP PCPYDYIKIK RLSYRAAGNE EMDTFQIECL SEIKYSCQEP FSRKLMARIF CLHQSLDPGD HPQYDPNTFE KQFLQRFPET ACAGDSGGPM IQRVTGVRN


The leader sequence and the cleavage site (RI) that generates the heavy and light chains are underlined. Potential N-linked glycosylation sites are indicated (N).
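"Potential" N-linked sites are conventionally those that match the Asn-X-Ser/Thr sequon, where X is any residue except proline. A minimal Python sketch of that scan (the helper `sequon_positions` is hypothetical and the test peptide is invented). A sequon is necessary but not sufficient for glycosylation, which is why such sites are labelled potential.

```python
import re
from typing import List

# Asn, then any residue except Pro, then Ser or Thr. The lookahead is zero-width,
# so overlapping sequons (e.g. N-N-S-T) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    seq = "".join(protein.split()).upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


print(sequon_positions("MNGSA NPTQN LT"))  # [2, 10]; N6 is skipped because X is Pro
```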

Protein modules
1-19 Leader sequence (exons 1/2)
20-139 CUB (exons 2/3)
140-183 EGF (Ca2+-binding) (exon 4)
184-297 CUB (exons 5/6)
298-363 CCP (exons 7/8)
364-434 CCP (exons 9/10)
435-699 Serine protease domain (exons 11-16)

Catalytic triad: H490, D552, S646.

Chromosomal location Human^: 3q27-28.


cDNA sequence5,7 ATTCCGGCAC CGGGCAGCCG TGCTTCTCCC ATCCAGTCGC ACTGTCCCAG TACCTTTGTG TGTGGCAGGG GGCTCCTTCA TTTGATGCCC CTGTCCTGTG GGCTACATCC ACTCAAAGGA GAATGCCTGT ATATTTGACA GTTGGTCCAA CAGAGCCACA AGGCTCTCAT AAAATCGAGC ACAGGCTACA AAGGATGGGA GGAGAGCTGG TCTGAGATCA ATATATACCT ACCTGCCTTC AATGGACGCC GGGCAGCCCT TGCCTCCACC CCTTCTGAGT CAGCATCTCG AATGACGTGG ATCTGTCTGC AAGCAGTTCT CACAGCACCT TGTGCTGGGG GTGACCCTGA GACTGTGGGA ATCCAGAGGG TGTGGGCAGT CCTCCCATCC CCCTTTCACT GGCTCTATCA GTTTCTTCAT AGATTAAGTA ATGCACCTTA GCCTGGGTCT TGCCCTTAAA GCTCATCCCT

AGGGACACAA GGAGCACCCA TGTCAAAGGC CTGGTTATCC ATGGGTTTCG AATATGACTA AGACCACAGA TGTCCATCAC ACTACATGGC ACCACTACTG TCCACACAGA CTGGGGTGAT ATACCATCGA TTCAGGACCA AAGTTTTGGG GTGTCCTGAT ACAGGGCTGC CCTCCCAAGC AAGTGCTGAA CGTGGAGTAA AACACGGGCT AATACTCCTG GTTCTGCCCA CAGTGTGTGG CAGCCCAGAA TCTGCGGAGG AGTCACTCGA TCAAAATCAT GCGTCAAACA CTCTGGTGGA CTGAGGGACC TGCAAAGGTT GCCAGAAGGC AGAAGGAAGG ATAGAGAAAG AGAAGGACCG TCACCGGAGT CAGTAGCAGA CCCCTCCTTC CTCTTTAAAG CCTTACTAGT AAGATGGAAA ATAGATGCAT GCAGAAGGTC TAGCATTGAT TGCTGTATGC AAAAAAGTAA

ACAAGCTCAC AGGCAGGAAA TTCAGCCCAC AGACTCCTAT GATCAAGCTT TGTGAAGGTA CACAGAGCAG TTTCCGGTCA TGTGGATGTG CCACAACTAC CAACAGGACC CACCAGCCCT GCTGGAGGAG TCCTGAGGTG GCCTTTCTGT CCTGTTCCAT AGGAAATGAG CAAGTATTTC GGATAATGTG CAAGATTCCC GATCACCTTC TCAGGAGCCC AGGAGTCTGG GCTCCCCAAG AGGCACCACT CTCCCTTCTA TCCGGGAGAT CCTGGGCAAG CACCACTCTC GCTGTTGGAG GCAGGAGGAA CCCAGAGACC TTATGCCCCG GGGAAAGGAC AGGCCAGTGG CTACGGAGTA GAGGAACTGA GGACGATCCT GGCCTATCCA AGATGGAGCA TTGGAGTGCT TGCTATACCT AGCACTTAAC GATGTGTCTA CAGTGACACA TTTTTTGCCA AACAGACAAA

CCAACAAAGC ATGAGGTGGC ACCGTGGAGC CCCAGTGATT TACTTCATGC GAAACTGAGG ACTCCCGGCC GATTTCTCCA GACGAGTGCA ATTGGCGGCT TGCCGAGTGG GACTTCCCAA GGTTTCATGG CCCTGCCCCT GGAGAGAAAG AGTGACAACT TGCCCAGAGC TTCAAAGACC GAGATGGACA ACCTGTAAAA TCTACAAGGA TATTACAAGA ATGAATAAAG TTCTCCCGGA CCCTGGATTG GGCTCCAGCT CCGACCCTAC CATTGGAGGC CACCCCCAGT AGGCCAGTGG GGAGCCATGG CTGATGGAGA CTGAAGAAGA GCCTGTGCGG TACCTGGTGG TACTCTTACA ATTTGGCTCC CCGATGAAAG TTACTGGGCA AGAGAGTGGT GGGCAGGTGA TACCTACCTC AGAGTGCATA CCAGGCAGCG CCTCTCCCCT CCGTGCAACT AAAAAAAAAA

CAAGCTGGGA TGCTTCTCTA TAAACAATAT CAGAGGTGAC ACTTCAACTT ACCAGGTGCT AGGAGGTGGT ATGAGGAGCG AGGAGAGGGA ACTACTGCTC AGTGCAGTGA ACCCTTACCC TCAACCTGCA ATGACTACAT CCCCAGAACC CGGCAGAGAA TACAGCCTCC AAGTGCTCGT CATTCCAGAT TTGTAGACTG ACAAGCTCAC TGCTCAACAA TATTGGGGAG AGCTGATGGC CCATGCTGTC GGATCGTGAC GTGATTCAGA TCCGGTCAGA ATGATCCCAA TGAATGCCTT TCATCGTCAG TTGAAATCCC AAGTGACCAG GTGACTCTGG GCACTGTGTC TCCACCACAA TCAGCCCCAG CAGCCATTTC ATAGAGCAGG CAGAACACAG CTTCATCTCT GTAAAAGTCT GCATACACTG AAGCTCTCTT CAACCTTGAC TGCCCAACAT AAAAAAAAA

GGACCAAGGC TTATGCTCTG GTTTGGCCAG TTGGAATATC GGAATCCTCC GGCAACCTTC CCTCTCCCCT TTTCACAGGC GGACGAGGAG CTGCCGCTTC CAACCTCTTC CAAGAGCTCT GTTTGAGGAC CAAGATCAAA CATCAGCACC CCGGGGCTGG TGTCCATGGG CAGCTGTGAC TGAGTGTCTG TAGAGCCCCA CACATACAAG TAACACAGGT AAGCCTACCC CAGGATCTTC ACACCTGAAT CGCCGCACAC CTTGCTCAGC TGAAAATGAA CACATTCGAG CGTGATGCCC CGGCTGGGGG GATTGTTGAC GGACATGATC AGGCCCCATG CTGGGGTGAT CAAGGACTGG CACCACCAGC TCCTTTCCTT TATCTTCACC GCCGAATCCA TCGAACTTCA GATGAGGAAA TTTTCAATAA ACAAACCCCT CATCTCCATC CAATCTTCAC


The first five nucleotides of each exon are underlined to indicate intron-exon boundaries. The methionine initiation codon (ATG) and the termination codon (TGA) are indicated.

Genomic structure^^^ The gene spans more than 50 kb and contains at least 16 exons.

[Gene map: 16 exons; scale bar 2 kb]

Accession numbers Human ^.7,8,9 Mouse^^ Rat (partial)^ Xenopus^


cDNA D28593 D17525 D16492 AF004661 D83276

Genomic D61690-5 AB10813-22

Deficiency No known deficiencies.

Polymorphic variants No known polymorphisms.

References
1 Matsushita, M. et al. (1998) Immunobiology 199, 340-347.
2 Matsushita, M. and Fujita, T. (1992) J. Exp. Med. 176, 1497-1502.
3 Matsushita, M. and Fujita, T. (1995) Immunobiology 194, 443-448.
4 Terai, I. et al. (1997) Clin. Exp. Immunol. 110, 317-323.
5 Sato, T. et al. (1994) Int. Immunol. 6, 665-669.
6 Knittel, T. et al. (1997) Lab. Invest. 77, 221-230.
7 Takada, F. et al. (1993) Biochem. Biophys. Res. Commun. 196, 1003-1009.
8 Endo, Y. et al. (1996) Int. Immunol. 8, 1355-1358.
9 Endo, Y. et al. (1998) J. Immunol. 161, 4924-4930.
10 Takayama, Y. et al. (1994) J. Immunol. 152, 2308-2316.

MASP-2

Steen V. Petersen and Jens C. Jensenius, Department of Medical Microbiology and Immunology, University of Aarhus, Aarhus, Denmark

Other names Mannose (mannan)-binding lectin (MBL)-associated serine protease 2; serine protease component of the mouse RaRF complex^

Physicochemical properties MASP-2 is synthesized as a single-chain molecule of 686 amino acids, including a leader sequence of 15 amino acids, yielding a mature protein of 671 residues of Mr (K) 74.2. The sequence contains a peptide bond that is likely to be cleaved upon activation (predicted by analogy with C1r and C1s). The resulting A chain (N-terminal 429 residues) and B chain (C-terminal 242 residues) are predicted to be held together by a disulfide bond.

Mature protein: pI 5.43 (calculated); Mr (K) predicted 74.2, observed 76.0
A chain: amino acids 16-444; Mr (K) predicted 47.7, observed 52
B chain: amino acids 445-686; Mr (K) predicted 26.5, observed 31
N-linked glycosylation sites: none
Interchain disulfide bonds: 1 (A-B), 434-552
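Residue numbers in these entries include the leader sequence, whereas some papers number from the first residue of the mature protein. A minimal Python sketch of the conversion (the helper `to_mature` is hypothetical), assuming the leader lengths given in each entry. Under this conversion, the MASP-2 bond between residues 444 and 445 becomes 429-430 in mature numbering, matching the 429-residue A chain, and the C1r activation bond R463-I464 becomes R446-I447.

```python
def to_mature(precursor_pos: int, leader_len: int) -> int:
    """Convert precursor numbering (initiating Met = 1) to mature-protein numbering."""
    mature = precursor_pos - leader_len
    if mature < 1:
        raise ValueError(f"residue {precursor_pos} lies within the leader sequence")
    return mature


print(to_mature(444, 15), to_mature(445, 15))  # MASP-2: 429 430
print(to_mature(463, 17), to_mature(464, 17))  # C1r: 446 447
```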

Structure The structure of MASP-2 has not been determined.

Function MASP-2 is a trypsin-like serine protease^ and presumably plays an important role in the initiation of the MBL complement activation pathway. It is activated upon binding of the MBL-MASP complex to oligosaccharides (unpublished observation) and subsequently cleaves C4, generating C4a and C4b. No direct evidence that MASP-2 cleaves C2 has yet been published, but this seems likely, since activated MASP-2 blotted onto a PVDF (polyvinylidene difluoride) membrane is capable of generating a C3 convertase upon incubation with serum (unpublished observation). While MASP-1 is reportedly capable of directly cleaving C3, MASP-2 is considered to be the C4b2a-generating protease of the lectin pathway. The function of the N-terminal part has not been assessed; it might be speculated that this part of the molecule participates in intermolecular interactions.

Tissue distribution Serum protein: concentration unknown. Primary site of synthesis: likely to be the liver (unpublished observation). Secondary sites: MASP-2 RNA was not identified in tissues from heart, brain, placenta, lung, kidney and pancreas by northern blotting (unpublished observation).

Regulation of expression This has not been investigated.


Protein sequence^ MRLLTLLGLL APPGYRLRLY PGKDTFYSLG TCDHHCHNHL YPRPYPKLSS DREEHGPFCG PYPMAPPNGH DGSWDRPMPA YTMKVNDGKY AKPGDFPWQV TLKRLSPHYT ICLPRKEAES AYEKPPYPRG GIVSWGSMNC

CGSVATPLGP FTHFDLELSH SSLDITFRSD GGFYCSCRAG CTYSISLEEG KTLPHRIETK VSPVQAKYIL CSIVDCGPPD VCEADGFWTS LILGGTTAAG QAWSEAVFIH FMRTDDIGTA SVTANMLCAG GEAGQYGVYT

KWPEPVFGRL LCEYDFVKLS YSNEKPFTGF YVLHRNKRTC FSVILDFVES SNTVTITFVT KDSFSIFCET DLPSGRVEYI SKGEKSLPVC ALLYDNWVLT EGYTHDAGFD SGWGLTQRGF LESGGKDSCR KVINYIPWIE

ASPGFPGEYA SGAKVLATLC EAFYAAEDID SALCSGQVFT FDVETHPETL DESGDHTGWK GYELLQGHLP TGPGVTTYKA EPVCGLSART AAHAVYEQKH NDIALIKLNN LARNLMYVDI GDSGGALVFL NIISDF

NDQERRWTLT GQESTDTERA ECQVAPGEAP QRSGELSSPE CPYDFLKIQT IHYTSTAQPC LKSFTAVCQK VIQYSCEETF TGGRIYGGQK DASALDIRMG KWINSNITP PIVDHQKCTA DSETERWFVG


The leader sequence and the activation peptide bond (RI) are underlined.

Protein modules^
1-15 Leader peptide
16-135 CUB
136-182 EGF (Ca2+-binding)
183-292 CUB
293-362 CCP
363-431 CCP
432-444 Linker peptide
445-686 Serine protease domain

Catalytic triad: H483, D532, S633.

Chromosomal location Human³: chromosome 1.

cDNA sequence^ CTCGTGCAAT CTTCTGTGTG CGCCTGGCAT CTGACTGCAC TCCCACCTCT CTGTGCGGGC CTGGGCTCCA GGGTTCGAGG GCGCCCACCT GCAGGCTACG TTCACCCAGA TCCAGTTGCA GAGTCCTTCG CAAACAGACA ACAAAAAGCA TGGAAGATCC GGCCACGTTT GAGACTGGCT CAGAAAGATG CCTGATGATC AAAGCTGTGA AAATATGTGT GTCTGTGAGC CAAAAGGCAA GCAGGTGCAC AAACATGATG TATACACAAG TTTGACAATG ACGCCTATTT ACTGCATCTG GACATACCGA AGGGGAAGTG TGCAGAGGTG GTGGGAGGAA TACACAAAAG GCGTGTCTGC CGTGGCTCGA AACAGACTCC CCATTGACTC ATGTCACTGC GTTGGCATTT AAAAAAAAAA

TCGGCACGAG GCTCGGTGGC CCCCCGGCTT CCCCCGGCTA GCGAGTACGA AGGAGAGCAC GCCTGGACAT CCTTCTATGC GCGACCACCA TCCTGCACCG GGTCTGGGGA CTTACAGCAT ATGTGGAGAC GAGAAGAACA ACACGGTGAC ACTACACGAG CACCTGTGCA ATGAGCTTCT GATCTTGGGA TACCCAGTGG TTCAGTACAG GTGAGGCTGA CTGTTTGTGG AACCTGGTGA TTTTATATGA CATCCGCCCT CCTGGTCTGA ACATAGCACT GTCTGCCAAG GATGGGGATT TTGTTGACCA TAACTGCTAA ACAGCGGAGG TAGTGTCCTG TTATTAACTA AGTCAAGGAT GAAGCATTCA AGGTGAGGCT AAGGGGACAT TCAAATTACA CTGTAAACTG AAAAA

GCTGGACGGG CACCCCCTTG TCCAGGGGAG CCGCCTGCGC CTTCGTCAAG AGACACGGAG TACCTTCCGC AGCCGAGGAC CTGCCACAAC TAACAAGCGC GCTCAGCAGC CAGCCTGGAG ACACCCTGAA TGGCCCATTC CATCACCTTT CACAGCGCAG AGCCAAATAC GCAAGGTCAC CCGGCCAATG CCGAGTGGAG CTGTGAAGAG TGGATTCTGG ACTATCAGCC TTTTCCTTGG CAACTGGGTC GGACATTCGA AGCTGTTTTT GATTAAATTG AAAAGAAGCT AACCCAAAGG TCAAAAATGT CATGCTTTGT GGCACTGGTG GGGTTCCATG TATTCCCTGG TCTTCATTTT TCATTACTGT GCTGTCATTT AAACCACGAG TTTCATTACC CCTGTCCATG

CACACCATGA GGCCCGAAGT TATGCCAATG CTCTACTTCA CTGAGCTCGG CGGGCCCCTG TCCGACTACT ATTGACGAGT CACCTGGGCG ACCTGCTCAG CCTGAATACC GAGGGGTTCA ACCCTGTGTC TGTGGGAAGA GTCACAGATG CCTTGCCCTT ATCCTGAAAG TTGCCCCTGA CCCGCGTGCA TACATCACAG ACCTTCTACA ACGAGCTCCA CGCACAACAG CAAGTCCTGA CTAACAGCTG ATGGGCACCC ATACATGAAG AATAACAAAG GAATCCTTTA GGTTTTCTTG ACTGCTGCAT GCTGGCTTAG TTTCTAGATA AATTGTGGGG ATCGAGAACA TAGAAATGCC GGACATGGCA CTCCACTTGC AGTGACAGTC TTAAAAAGCC CTCTTTGTTT

GGCTGCTGAC GGCCTGAACC ACCAGGAGCG CCCACTTCGA GGGCCAAGGT GCAAGGACAC CCAACGAGAA GCCAGGTGGC GTTTCTACTG CCCTGTGCTC CACGGCCGTA GTGTCATTCT CCTACGACTT CATTGCCCCA AATCAGGAGA ATCCGATGGC ACAGCTTCTC AATCCTTTAC GCATTGTTGA GTCCTGGAGT CAATGAAAGT AAGGAGAAAA GAGGGCGTAT TATTAGGTGG CTCATGCCGT TGAAAAGACT GTTATACTCA TTGTAATCAA TGAGGACAGA CTAGAAATCT ATGAAAAGCC AAAGTGGGGG GTGAAACAGA AAGCAGGTCA TAATTAGTGA TGTGAAGACC GTTGTTGCTC CAGTTTAATT ATCTTTGCCC AGTCTCTTTT TTAAACTTGT

CCTCCTGGGC TGTGTTCGGG GCGCTGGACC CCTGGAGCTC GCTGGCCACG TTTCTACTCG GCCGTTCACG CCCGGGAGAG CTCCTGCCGC CGGCCAGGTC TCCCAAACTC GGACTTTGTG TCTCAAGATT CAGGATTGAA CCACACAGGC GCCACCTAAT CATCTTTTGC TGCAGTTTGT CTGTGGCCCT GACCACCTAC GAATGATGGT ATCACTCCCA ATATGGAGGG AACCACAGCA CTATGAGCAA ATCACCTCAT TGATGCTGGC TAGCAACATC TGACATTGGA AATGTATGTC ACCCTATCCA CAAGGACAGC GAGGTGGTTT GTATGGAGTC TTTTTAACTT TTGGCAGCGA CACCCAAAAA CCAGCCTTAC ACCCAGTGTA CATACTGGCT TCTTATTGAA

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460

The methionine initiation codon (ATG), the termination codon (TAA) and the polyadenylation signal (CATAAA) are double underlined.

Accession numbers Human²: Y09926

Deficiency No deficiency has been identified.


Polymorphic variants No polymorphic variants have been identified.

References
1 Ji, Y.H. et al. (1988) J. Immunol. 141, 4271-4275.
2 Thiel, S. et al. (1997) Nature 386, 506-510.
3 Stover, C.M. et al. (1999) J. Immunol. 162, 3481-3490.

Factor D

C3 convertase activator, adipsin, EC 3.4.21.46

Jurg Schifferli and Sylvie Miot, Department of Immunonephrology, Kantonsspital Basel, Basel, Switzerland

Physicochemical properties Factor D is synthesized as a single-chain precursor of 246 amino acids, including a 13-amino-acid partial signal peptide and a five-amino-acid potential activation peptide. Mature protein: pI 7.4; Mr (K) predicted 24.4, observed 25; N-linked glycosylation sites⁴: none. In rodents, adipsin is heavily glycosylated.

Structure Factor D is the first complement serine protease whose three-dimensional structure has been determined. The structure was solved at 0.2 nm resolution by a combination of multiple isomorphous replacement and molecular replacement methods⁵. It shows the characteristic chymotrypsin fold but differs from other serine proteases in the conformation of the catalytic triad residues and of the substrate-binding region. Factor D activity is thought to be regulated by reversible conformational changes that govern the expression and control of its catalytic activity. These changes are believed to be induced by its single natural substrate, C3bB, and to bring the catalytic triad, the specificity pocket and the non-specific substrate-binding site, all of which have atypical conformations, into alignment. Mutational studies have identified the structural elements responsible for these features.

Function Factor D is unique among serine proteases in that it requires neither enzymatic cleavage for expression of proteolytic activity nor inhibition by a serpin for its control. Factor D is a protein of the alternative complement pathway. It is a highly specific serine protease that cleaves factor B bound to C3b, generating C3bBb, the alternative pathway C3 convertase. It cleaves the R259-K260 bond of factor B only when factor B is complexed with activated C3. Factor D is the rate-limiting enzyme of the alternative pathway. It may also have a role in adipose tissue distinct from its role as a complement protein.

Tissue distribution Serum protein: 2 µg/ml in plasma. Primary site of synthesis: adipose tissue. Secondary sites: monocyte/macrophage lineage.


Regulation of expression In an astroglioma cell line¹¹, IFN-γ has no effect on the level of factor D mRNA. In mice, adipsin is deficient in several genetic and acquired models of obesity¹²,¹³; insulin reduces adipsin expression in vitro and is negatively correlated with adipsin expression in vivo¹⁴.

Protein sequence
AVLVLLGAAA CAARPRGRIL GGREAEAHAR PYMASVQLNG AHLCGGVLVA 50
EQWVLSAAHC LEDAADGKVQ VLLGAHSLSQ PEPSKRLYDV LRAVPHPDSQ 100
PDTIDHDLLL LQLSEKATLG PAVRPLPWQR VDRDVAPGTL CDVAGWGIVN 150
HAGRRPDSLQ HVLLPVLDRA TCNRRTHHDG AITERLMCAE SNRRDSCKGD 200
SGGPLVCGGV LEGVVTSGSR VCGNRKKPGI YTRVASYAAW IDSVLA

The leader sequence comprises residues 1-13 and the potential activation peptide (RPRGR) residues 14-18. Conflicting sequence data (above sequence stated first): I19M; HISW; M33V; H42E¹⁶,¹⁷; G45A; Q52R; S56T; D66G; HSLS76-79 → THLP; HS76-77 → ST; deletion LY87-88; D89E; Q129G; TCNRRTHHDGAITE171-184 → KCRLYDVL; S236T; S243H; deletion S243.

Protein modules
1-13 Leader sequence
14-18 Potential activation peptide
19-246 Serine protease domain
Catalytic triad: H59, D105, S201.
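The precursor arithmetic in this entry (246 residues = 13-residue partial leader + 5-residue activation peptide + 228-residue serine protease domain) and the catalytic-triad positions can be checked directly against the sequence. The sketch below reads the residues of the sequence entry as 50-residue rows of 10-residue blocks (an assumption about the original page layout) and uses 1-based numbering as in the text; `residue` is a hypothetical helper.

```python
# Factor D precursor (partial leader included), transcribed from the
# "Protein sequence" entry as 50-residue rows; spaces are removed below.
FACTOR_D_ROWS = [
    "AVLVLLGAAA CAARPRGRIL GGREAEAHAR PYMASVQLNG AHLCGGVLVA",
    "EQWVLSAAHC LEDAADGKVQ VLLGAHSLSQ PEPSKRLYDV LRAVPHPDSQ",
    "PDTIDHDLLL LQLSEKATLG PAVRPLPWQR VDRDVAPGTL CDVAGWGIVN",
    "HAGRRPDSLQ HVLLPVLDRA TCNRRTHHDG AITERLMCAE SNRRDSCKGD",
    "SGGPLVCGGV LEGVVTSGSR VCGNRKKPGI YTRVASYAAW IDSVLA",
]
FACTOR_D = "".join(FACTOR_D_ROWS).replace(" ", "")

LEADER_LENGTH = 13      # partial signal peptide, residues 1-13
ACTIVATION_LENGTH = 5   # potential activation peptide, residues 14-18


def residue(seq: str, position: int) -> str:
    """Return the residue at a 1-based position, as numbered in the text."""
    return seq[position - 1]


mature = FACTOR_D[LEADER_LENGTH + ACTIVATION_LENGTH:]
print(len(FACTOR_D), len(mature))                                   # precursor, mature lengths
print(FACTOR_D[LEADER_LENGTH:LEADER_LENGTH + ACTIVATION_LENGTH])    # activation peptide
print([residue(FACTOR_D, p) for p in (59, 105, 201)])               # catalytic triad
```

Run as written, this should report a 246-residue precursor, a 228-residue mature chain beginning ILGG, the activation peptide RPRGR and the triad residues H, D and S.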

Chromosomal location The exact location and structure of the human factor D gene have not been reported. It may lie on the X chromosome.

cDNA sequence GCAGTTCTGG GGCGGCAGAG GCGCACCTGT CTGGAGGACG CCGGAGCCCT CCCGACACCA CCTGCTGTGC TGCGACGTGG CACGTGCTCT GCCATCACCG TCCGGGGGCC GTTTGCGGCA ATCGACAGCG AGTCCCGAGC ATATGCAGAA ATGGGCCACG GGGCATGGAG TTGAACGCAG

TCCTCCTAGG AGGCCGAGGC GCGCAGGCGT CGGCCGACGG CCAAGCGCCT TCGACCACGA GCCCCCTGCC CCGGCTGGGG TGCCAGTGCT AGCGCTTGAT CGCTGGTGTG ACCGCAAGAA TCCTGGCCTA AATGAAGTCA GGGGAGGCCG TAGCGCGACT GTGGGTGCTT GAGGCTGAGG

AGCGGCCGCC GCACGCGCGG CCTGGTGGCG GAAGGTGCAG GTACGACGTG CCTCCTGCTG CTGGCAGCGC CATAGTCAAC GGACCGCGCC GTGCGCGGAG CGGGGGCGTG GCCCGGGATC GGGTGCCGGG TCCACTCCTG AGGTGGGAGG CCATCTCTAC GTAGTTCCAG CTGCAGTGAG

TGCGCGGCGC CCCTACATGG GAGCGGTGGG GTTCTCCTGG CTCCGCGCAG CTACAGCTGT GTGGACCGCG CACGCGGGCC ACCTGCAACC AGCAATCGCC CTCGAGGGCG TACACCCGCG GCCTGAAGGT CATCTGGTTG ATCATTGGAT AAATAAATAA CTACTCAGGA TTGTGATTGC

GGCCCCGTGG TCGGATGCTG 60 CGTCGGTGCA GCTGAACGGC 120 TGCTGAGCGC GGCGCACTGC 180 GCGCGCACTC CCTGTCGCAG 240 TGCCCCACCC GGACAGCCAG 300 CGGAGAAGGC CACACTGGGC 360 ACGTGGCACC GGGAACTCTC 420 GCCGCCCGGA CAGCCTGCAG 480 GGCGCACGCA CCACGACGGC 540 GGGACAGCTG CAAGGGTGAC 600 TGGTCACCTC GGGCTCGCGC 660 TGGCGAGCTA TGCGGCCTGG 720 CAGGGTCACC CAAGCAACAA 780 GTCTTTATTG AGCACCTACT 840 CTCAGGAGTT GGAGATCAGC 900 AAATTAGCTG GGCAATTGGC 960 GGCTGAGGTG GGAGGATGAC 1020 ACCACTGCCCT


The methionine initiation codon (ATG), the termination codon (TAG) and the probable polyadenylation signal (AATAAA) are indicated.


Genomic structure The human gene structure is unknown.

Accession numbers Human⁵,⁶,¹⁰,¹⁵⁻¹⁹; Mouse²⁰⁻²²; Rat²³,²⁴; Pig²⁵,²⁶
M84526 G178626; M11768 G202167; X04673 G581866; M13386 G387105; S73894 G693722; U29948 G915533; Z49058 G773265

Deficiency²⁷ An X-linked mode of inheritance has been suggested. One patient with complete factor D deficiency has been reported. The patient suffered recurrent Neisseria meningitidis and Neisseria gonorrhoeae infections. No alternative pathway activity was detected (the patient's serum showed a reduced capacity to promote phagocytosis of Escherichia coli), and haemolytic complement activity was decreased.

Polymorphic variants Factor D is unusual among serum proteins in having a single-banded pattern on isoelectric focusing. One variant was found in three people of West African parentage among 120 tested²⁸. Presence of the variant band was associated with weakening of the normal band; the two bands were of equal strength²⁹, indicating that the variants are inherited in the usual autosomal codominant way.

References
1 Lesavre, P.H. et al. (1979) J. Immunol. 123, 529-534.
2 Götze, O. et al. (1976) Adv. Immunol. 24, 1-35.
3 Volanakis, J.E. et al. (1998) In The Human Complement System in Health and Disease, chapter 4 (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 48-82.
4 Tomana, M. et al. (1985) Mol. Immunol. 22, 107-111.
5 Narayana, S.V.L. et al. (1994) J. Mol. Biol. 235, 695-708.
6 Davis, A.E. (1980) Proc. Natl Acad. Sci. USA 77, 4938-4942.
7 Volanakis, J.E. et al. (1996) Protein Sci. 5, 553-564.
8 Kim, S. et al. (1995) Biochemistry 33, 14393-14399.
9 Kim, S. et al. (1995) J. Immunol. 154, 6073-6079.
10 White, R.T. et al. (1992) J. Biol. Chem. 267, 9210-9213.
11 Barnum, S.R. et al. (1992) Biochem. J. 287, 595-601.
12 Rosen, B.S. et al. (1989) Science 244, 1483-1487.
13 Flier, J.S. et al. (1987) Science 237, 405-408.
14 Miner, J.L. et al. (1993) Physiol. Behav. 54, 207-212.
15 Volanakis, J.E. et al. (1980) Proc. Natl Acad. Sci. USA 77, 1116-1119.
16 Johnson, D.M.A. et al. (1984) FEBS Lett. 166, 347-351.
17 Johnson, D.M.A. et al. (1980) Biochem. J. 187, 863-874.
18 Niemann, M.A. et al. (1984) Biochemistry 23, 2482-2486.
19 Kim, S. et al. (1995) J. Biol. Chem. 270, 24399-24405.
20 Phillips, M. et al. (1986) J. Biol. Chem. 261, 10821-10827.
21 Min, H.Y. et al. (1986) Nucleic Acids Res. 14, 8879-8892.
22 Cook, K.S. et al. (1985) Proc. Natl Acad. Sci. USA 82, 6480-6484.
23 Zhu, L. et al. (1994) J. Clin. Invest. 94, 1163-1171.
24 Baker, B.C. et al. (1991) Biochem. J. 279, 115-119.
25 Miner, J.L. et al. (1995) Submitted to EMBL/GenBank.
26 Nicolas, N. (1995) Submitted to EMBL/GenBank.
27 Hiemstra, P.S. et al. (1989) J. Clin. Invest. 84, 1957-1961.
28 Hobart, M.J. et al. (1976) Transplant. Rev. 32, 26-42.
29 Martin, A. et al. (1976) Immunochemistry 13, 317-324.

C2

EC 3.4.21.43

Yuanyuan Xu, University of Alabama at Birmingham, AL, USA and John E. Volanakis, BSCR "A. Fleming", Vari, Greece

Physicochemical properties C2 is a single-chain glycoprotein synthesized as a precursor of 752 amino acid residues, including a 20-residue leader peptide. The mature protein contains 15.9% carbohydrate¹. Mature protein: pI ~6.0-6.3 (depending on isoform); Mr (K) predicted 81, observed 100; N-linked glycosylation sites: 8 (29, 112, 290, 333, 467, 471, 621, 651), probably all occupied.

Structure C2 is structurally similar to factor B, sharing 39% amino acid identity with it and a similar trilobed structure demonstrated by electron microscopy⁴. The N-terminal lobe of C2 is formed by the C2b region, which is cleaved off during activation by C1s. The middle lobe is formed by the von Willebrand factor type A module and the C-terminal lobe by the serine protease domain.

Function C2 provides the catalytic subunit for the C3 and C5 convertases of the classical and lectin pathways of complement activation. Formation of the C3 convertase requires the Mg²⁺-dependent binding of C2 to activator-attached C4b and the subsequent cleavage of C2 by C1s or MASP. The C2a fragment remains bound to C4b and expresses proteolytic activity against C3 only in the context of the C4b2a complex. Attachment of a C3b fragment to the C3 convertase generates the trimolecular complex C4b2a3b, which is the C5 convertase. Both enzymes share the same active centre, provided by the serine protease domain of C2a. The proteolytic activity of the convertases is restricted to single peptide bonds on C3 and C5, respectively.

Tissue distribution Serum protein: 11-35 µg/ml in serum. Primary site of synthesis: liver. Secondary sites: monocytes/macrophages, fibroblasts, type II alveolar epithelial cells and astroglioma cells¹¹.

Regulation of expression C2 synthesis by macrophages, fibroblasts, alveolar type II epithelial cells, astroglioma cells and hepatoma cells is induced mainly in response to IFN-γ¹⁰,¹²,¹³. In most cells, IFN-γ increases the transcription rate of the gene, so that C2 mRNA levels rise even though the mRNA half-life may be shortened. Steroids upregulate C2 synthesis by human monocytes by stabilizing C2 mRNA.

Protein sequence
MGPLMVLFCL LFLYPGLADS APSCPQNVNI SGGTFTLSHG WAPGSLLTYS 50
CPQGLYPSPA SRLCKSSGQW QTPGATRSLS KAVCKPVRCP APVSFENGIY 100
TPRLGSYPVG GNVSFECEDG FILRGSPVRQ CRPNGMWDGE TAVCDNGAGH 150
CPNPGISLGA VRTGFRFGHG DKVRYRCSSN LVLTGSSERE CQGNGVWSGT 200
EPICRQPYSY DFPEDVAPAL GTSFSHMLGA TNPTQKTKES LGRKIQIQRS 250
GHLNLYLLLD CSQSVSENDF LIFKESASLM VDRIFSFEIN VSVAIITFAS 300
EPKVLMSVLN DNSRDMTEVI SSLENANYKD HENGTGTNTY AALNSVYLMM 350
NNQMRLLGME TMAWQEIRHA IILLTDGKSN MGGSPKTAVD HIREILNINQ 400
KRNDYLDIYA IGVGKLDVDW RELNELGSKK DGERHAFILQ DTKALHQVFE 450
HMLDVSKLTD TICGVGNMSA NASDQERTPW HVTIKPKSQE TCRGALISDQ 500
WVLTAAHCFR DGNDHSLWRV NVGDPKSQWG KEFLIEKAVI SPGFDVFAKK 550
NQGILEFYGD DIALLKLAQK VKMSTHARPI CLPCTMEANL ALRRPQGSTC 600
RDHENELLNK QSVPAHFVAL NGSKLNINLK MGVEWTSCAE VVSQEKTMFP 650
NLTDVREVVT DQFLCSGTQE DESPCKGESG GAVFLERRFR FFQVGLVSWG 700
LYNPCLGSAD KNSRKRAPRS KVPPPRDFHI NLFRMQPWLR QHLGDVLNFL 750
PL

The leader sequence comprises residues 1-20 and the cleavage site between C2b and C2a is the R243-K244 bond. Potential N-linked glycosylation sites (probably all occupied) are N29, N112, N290, N333, N467, N471, N621 and N651.

Protein modules
1-20 Leader peptide
21-85 CCP1
86-147 CCP2
148-206 CCP3
207-236 Linker
237-452 VWFA
453-466 Linker
467-752 Serine protease domain
Catalytic triad: H507, D561, S679.
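Each of the eight N-linked glycosylation sites listed under Physicochemical properties should sit in an N-X-S/T sequon (X not proline), the usual acceptor motif for N-linked glycosylation, and the catalytic-triad residues listed above should be H, D and S. The sketch below checks both against the C2 sequence, read as 50-residue rows from the sequence entry. It is a simple motif scan under that layout assumption, not a prediction of site occupancy, and `find_sequons` is a hypothetical helper.

```python
import re

# C2 precursor (752 residues, leader included), transcribed as 50-residue rows.
C2_ROWS = [
    "MGPLMVLFCL LFLYPGLADS APSCPQNVNI SGGTFTLSHG WAPGSLLTYS",
    "CPQGLYPSPA SRLCKSSGQW QTPGATRSLS KAVCKPVRCP APVSFENGIY",
    "TPRLGSYPVG GNVSFECEDG FILRGSPVRQ CRPNGMWDGE TAVCDNGAGH",
    "CPNPGISLGA VRTGFRFGHG DKVRYRCSSN LVLTGSSERE CQGNGVWSGT",
    "EPICRQPYSY DFPEDVAPAL GTSFSHMLGA TNPTQKTKES LGRKIQIQRS",
    "GHLNLYLLLD CSQSVSENDF LIFKESASLM VDRIFSFEIN VSVAIITFAS",
    "EPKVLMSVLN DNSRDMTEVI SSLENANYKD HENGTGTNTY AALNSVYLMM",
    "NNQMRLLGME TMAWQEIRHA IILLTDGKSN MGGSPKTAVD HIREILNINQ",
    "KRNDYLDIYA IGVGKLDVDW RELNELGSKK DGERHAFILQ DTKALHQVFE",
    "HMLDVSKLTD TICGVGNMSA NASDQERTPW HVTIKPKSQE TCRGALISDQ",
    "WVLTAAHCFR DGNDHSLWRV NVGDPKSQWG KEFLIEKAVI SPGFDVFAKK",
    "NQGILEFYGD DIALLKLAQK VKMSTHARPI CLPCTMEANL ALRRPQGSTC",
    "RDHENELLNK QSVPAHFVAL NGSKLNINLK MGVEWTSCAE VVSQEKTMFP",
    "NLTDVREVVT DQFLCSGTQE DESPCKGESG GAVFLERRFR FFQVGLVSWG",
    "LYNPCLGSAD KNSRKRAPRS KVPPPRDFHI NLFRMQPWLR QHLGDVLNFL",
    "PL",
]
C2 = "".join(C2_ROWS).replace(" ", "")

LISTED_SITES = [29, 112, 290, 333, 467, 471, 621, 651]


def find_sequons(seq: str) -> list:
    """1-based positions of N in N-X-S/T motifs with X != P (overlaps allowed)."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


sequons = find_sequons(C2)
print(len(C2))
print(all(site in sequons for site in LISTED_SITES))  # every listed site is a sequon
print([C2[p - 1] for p in (507, 561, 679)])           # catalytic triad residues
print(C2[242:244])                                    # residues 243-244, C2b/C2a bond
```

The scan may also report sequons that are not listed as glycosylated; the entry only claims the eight positions above.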


Chromosomal location Human: 6p21.3 (MHC class III region); gene order telomere, HSP70, C2, Bf, centromere. Mouse: chromosome 17; gene order telomere, HSP70, C2, Bf, centromere.

cDNA sequence3,15 GGGGGCAGTA GGGAGACAGG CTAACAGAAG AGCTGCTGAC CTACCTCTCG CTGTTCCTGT TCGGGTGGCA TGCCCCCAGG CAGACCCCAG GCCCCTGTCT GGCAATGTGA TGTCGCCCCA TGCCCCAACC GACAAGGTCC TGCCAGGGCA GACTTCCCTG ACCAATCCCA GGTCATCTGA CTCATCTTCA GTGAGCGTTG GACAACTCCC CATGAAAATG AACAACCAAA ATCATCCTTC CATATCAGAG ATCGGGGTGG GATGGTGAGA CATATGCTGG AACGCCTCTG ACCTGCCGGG GATGGCAACG AAAGAATTGC AACCAGGGAA GTAAAGATGT GCTCTGCGGA CAGAGTGTTC ATGGGAGTGG AACTTGACAG GATGAGAGTC TTTTTTCAGG AAAAACTCCC AATCTCTTCC CCCCTCTAGC CTTCTACCTC AATCCGGGTC TACCAGCAGG CCCTGGTTGA TAATAAAAAT

CACAAAGCCT GCAAAGGTTT ACCATCCCCC CCAGCTCTAG CCGCCCCTAG ACCCAGGTCT CCTTCACCCT GCCTGTACCC GAGCCACCCG CCTTTGAGAA GCTTCGAGTG ACGGCATGTG CAGGCATTTC GCTATCGCTG ACGGGGTCTG AGGACGTGGC CCCAGAAGAC ACCTCTACCT AGGAGAGCGC CCATTATCAC GGGATATGAC GAACTGGGAC TGCGACTCCT TGACAGATGG AGATCCTGAA GCAAGCTGGA GGCATGCCTT ATGTCTCCAA ACCAGGAGAG GGGCCCTCAT ACCACTCCCT TTATTGAGAA TCCTGGAGTT CCACCCATGC GACCTCAAGG CTGCTCATTT AGTGGACAAG ATGTCAGGGA CCTGCAAGGG TGGGTCTGGT GCAAAAGGGC GCATGCAGCC CATGGCCACT TGAATGGCCA TCTAGGATGC ACTGCCTCGC CTTGACTCAT CAATGGTTTC

GTGGGGGAGA CACCCTTCAG TTGCCACTCC TTTTCGGGAA GGAGGACACC GGCAGACTCG CAGCCATGGC ATCCCCAGCA GTCTCTGTCT TGGCATTTAT TGAGGATGGC GGATGGAGAA ACTGGGCGCA CTCCTCGAAT GAGTGGAACG CCCTGCCCTG AAAGGAAAGC GCTCCTGGAC CTCCCTCATG CTTTGCCTCA TGAGGTGATC TAACACCTAT CGGCATGGAA AAAGTCCAAT CATCAACCAG TGTGGACTGG CATTCTGCAG GCTCACAGAC GACACCCTGG CTCCGACCAA GTGGAGGGTC GGCGGTGATC CTATGGTGAT CAGGCCCATC CAGCACCTGT TGTCGCCTTG CTGTGCCGAG GGTGGTGACA AGAATCTGGG GAGCTGGGGT CCCTCGTAGC CTGGCTGAGG GAGCCCTCTG CCCTTAGACC CAGAGGCAGC TGCCCCACCT GCTTGTTTCA GAG

TCTATTGACC TTCAGTCCCC CTGGTTTTTC GTCAGATGAC ATGGGCCCAC GCTCCCTCCT TGGGCTCCTG TCACGGCTGT AAGGCGGTCT ACCCCACGGC TTCATATTGC ACAGCTGTGT GTGCGGACAG CTTGTGCTCA GAGCCCATCT GGCACTTCCT CTGGGCCGTA TGTTCGCAGA GTGGACAGGA GAGCCCAAAG AGCAGCCTGG GCGGCCTTAA ACGATGGCCT ATGGGTGGCT AAGAGGAATG AGAGAACTGA GACACAAAGG ACCATCTGCG CATGTCACTA TGGGTCCTGA AATGTGGGAG TCCCCAGGGT GACATAGCTC TGCCTTCCCT AGGGACCATG AATGGGAGCA GTTGTCTCCC GACCAGTTCC GGAGCAGTTT CTTTACAACC AAGGTCCCGC CAGCACCTGG CTGCCCTGCC CTGTGATCCA GCACACAAGC CCCGCTCCTT CTTTCACATG

CTATAGATAT AATCCCTGCT TTCTCTGGCA CTTTTCCCTC TGATGGTTCT GCCCTCAGAA GGAGCCTTCT GCAAGAGCAG GCAAACCTGT TGGGGTCCTA GGGGCTCGCC GTGATAATGG GCTTCCGCTT CGGGGTCTTC GCCGCCAACC TCTCCCACAT AAATCCAAAT GTGTGTCGGA TCTTCAGCTT TCCTCATGTC AAAATGCCAA ACAGTGTCTA GGCAGGAAAT CTCCCAAGAC ACTATCTGGA ATGAGCTAGG CTCTGCACCA GGGTGGGGAA TTAAGCCCAA CAGCAGCTCA ACCCCAAATC TTGATGTCTT TGCTGAAGCT GCACGATGGA AGAATGAACT AACTGAACAT AAGAAAAAAC TATGCAGTGG TCCTTGAGCG CCTGCCTTGG CGCCACGAGA GGGATGTCCT AGAATCTGGG TCCTCTCTCC TGGGAAATCC GGCCTGTCCC GAATTTCCCA

ATTAGCATCA TATTATTTCC GCAATGAAGC CCGCGGCTCT TTTTTGCCTG CGTGAATATC CACCTACTCC CGGACAGTGG GCGCTGTCCA TCCCGTGGGT TGTGCGTCAG GGCTGGCCAC TGGTCATGGG GGAGCGGGAG CTACTCTTAT GCTTGGGGCC CCAGCGCTCT AAATGACTTT TGAGATCAAT TGTCCTGAAC CTATAAAGAT TCTCATGATG CCGACATGCC AGCTGTTGAC CATCTATGCC GTCCAAGAAG GGTCTTTGAA CATGTCAGCA GAGCCAAGAG TTGCTTCCGC CCAGTGGGGC TGCCAAAAAG GGCCCAGAAA GGCCAATCTG GCTGAACAAA TAACCTTAAG CATGTTCCCC GACCCAGGAG GAGATTCAGG CTCTGCTGAC CTTTCACATG GAATTTTTTA GCCCCTCCAT TAGCTGAGTA TCAGGGCTCC CAGATTCCTT GTTATGAAAT

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820

The first five nucleotides in each exon are underlined to indicate intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAG) and the polyadenylation signal (AATAAA) are indicated.

Genomic structure The gene spans 18 kb of DNA and contains 18 exons. Its poly(A) signal is located only 421 bp upstream of the cap site of the factor B gene. The largest intron (4.5 kb) follows exon 3 and contains a human-specific SINE-type retroposon derived from the human endogenous retrovirus HERV-K10. This retroposon, termed SINE-R.C2, is associated with a variable number of tandem repeats (VNTR) locus, which gives rise to a multiallelic RFLP.

Accession numbers Human, Mouse
cDNA: X04481, K1236, M26301, M57891, J05661
Genomic: L09706, L09707, L09708, M60563-M60579, J05661

Deficiency Autosomal recessive. C2 deficiency is the most common genetic deficiency of a complement component among individuals of European descent, with an estimated null-allele frequency of about 1%. Deficient individuals have an increased incidence of SLE and SLE-like syndromes, glomerulonephritis, vasculitis and pyogenic infections. Two molecular mechanisms can cause C2 deficiency: in type I deficiency, no protein is detected either in the blood or intracellularly; in type II deficiency, a mutant C2 polypeptide is retained intracellularly.
Mutations identified:
Type I: a 28 bp deletion removes 9 bp from the 3' end of exon 6 and 19 bp from the 5' end of the adjoining intron. This causes skipping of exon 6 during RNA splicing, resulting in a shift of the reading frame and a premature termination codon²⁶. The deletion is in linkage disequilibrium with the HLA haplotype HLA-A25, B18, DRw2²⁷. Many cases have been described.
Type II²⁸,²⁹:
G662 to A (C131 to Y); one patient
C896 to T (S209 to F); one patient
G1660 to A (G464 to R); one patient
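The null-allele frequency quoted above can be turned into expected genotype frequencies. The sketch below assumes Hardy-Weinberg equilibrium, which the entry does not state, and treats all C2 null alleles as a single class; `hardy_weinberg` is a hypothetical helper written for this illustration.

```python
def hardy_weinberg(null_allele_freq: float) -> dict:
    """Expected genotype frequencies for a biallelic locus at Hardy-Weinberg equilibrium."""
    q = null_allele_freq
    p = 1.0 - q
    return {
        "normal homozygotes": p * p,
        "heterozygous carriers": 2 * p * q,
        "deficient homozygotes": q * q,
    }


freqs = hardy_weinberg(0.01)  # "about 1%" null-allele frequency
for genotype, f in freqs.items():
    print(f"{genotype}: {f:.4f}")
print("deficient homozygotes: about 1 in", round(1 / freqs["deficient homozygotes"]))
```

Under these assumptions, heterozygous carriage (about 2%) is roughly 200 times more common than homozygous deficiency (on the order of 1 in 10,000).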

Polymorphic variants A common allotype (C) and two relatively rare allotypes (A, B) have been identified by isoelectric focusing. In addition, a number of acidic (A) and basic (B) variants have been described using polyacrylamide gel isoelectric focusing followed by western blotting and immunofixation³⁰. At the gene level, one multiallelic and several dimorphic RFLPs have been described³¹.

References
1 Tomana, M. et al. (1985) Mol. Immunol. 22, 107-111.
2 Alper, C.A. (1976) J. Exp. Med. 144, 1111-1115.
3 Bentley, D.R. (1986) Biochem. J. 239, 339-345.
4 Smith, C.A. et al. (1984) J. Exp. Med. 159, 324-329.
5 Oglesby, T.J. et al. (1988) J. Immunol. Methods 110, 55-62.
6 Perlmutter, D.H. et al. (1984) J. Biol. Chem. 259, 10380-10385.
7 Whaley, K. (1980) J. Exp. Med. 151, 501-516.
8 Cole, F.S. et al. (1983) Clin. Immunol. Immunopathol. 27, 153-159.
9 Katz, Y. et al. (1988) J. Exp. Med. 167, 1-14.
10 Strunk, R.C. et al. (1988) J. Clin. Invest. 81, 1419-1426.
11 Barnum, S.R. et al. (1992) Biochem. J. 287, 595-601.
12 Strunk, R.C. et al. (1985) J. Biol. Chem. 260, 15280-15285.
13 Lappin, D.F. et al. (1992) Biochem. J. 281, 437-442.
14 Lappin, D.F. et al. (1991) Biochem. J. 280, 117-123.
15 Horiuchi, T. et al. (1989) J. Immunol. 142, 2105-2111.
16 Carroll, M.C. et al. (1984) Nature 307, 237-241.
17 Carroll, M.C. et al. (1987) Proc. Natl Acad. Sci. USA 84, 8535-8539.
18 Steinmetz, M. et al. (1986) Cell 44, 895-904.
19 Ishii, Y. et al. (1993) J. Immunol. 151, 170-174.
20 Wu, L.-C. et al. (1987) Cell 48, 331-342.
21 Zhu, Z.-B. et al. (1994) Hum. Genet. 93, 545-551.
22 Zhu, Z.-B. et al. (1992) J. Exp. Med. 175, 1783-1787.
23 Ishikawa, N. et al. (1990) J. Biol. Chem. 265, 19040-19046.
24 Glass, D. et al. (1976) J. Clin. Invest. 58, 853-861.
25 Johnson, C.A. et al. (1992) New Engl. J. Med. 326, 871-874.
26 Johnson, C.A. et al. (1992) J. Biol. Chem. 267, 9347-9353.
27 Awdeh, Z.L. et al. (1981) J. Clin. Invest. 67, 581-583.
28 Wetsel, R.A. (1996) J. Biol. Chem. 271, 5824-5831.
29 Zhu, Z.-B. et al. (1998) J. Immunol. 161, 578-584.
30 Hauptmann, G. et al. (1990) Complement Inflamm. 7, 252-254.
31 Cross, S.J. et al. (1985) Immunogenetics 21, 39-48.

Factor B

Antonella Circolo, University of Alabama Medical School, Birmingham, AL, USA and Harvey R. Colten, Northwestern University School of Medicine, Chicago, IL, USA


Other names Complement component C3/C5 convertase (alternative), properdin factor B, C3 proactivator, C3 convertase, C5 convertase, EC 3.4.21.47.

Physicochemical properties Factor B is synthesized as a single-chain molecule of 764 amino acids, including a leader peptide of 25 amino acids. Mature protein: pI ~5.6-6.1; Mr (K) predicted 83, observed 90; N-linked glycosylation sites (all occupied): 4 (122, 142, 285, 378).

Structure⁴ Electron microscopy of the native molecule reveals a trilobed structure. In the assembled bimolecular C3 convertase, Bb appears as a bilobed molecule with only one lobe bound to C3b, indicating that the N-terminal lobe is lost after cleavage by factor D. This structure is similar to that observed for C2 and the C4b-C2a complex. Amino acid sequence alignment shows that the Bf serine protease domain is highly homologous to the serine protease domains of chymotrypsinogen, trypsin, human factor D, C1r, C1s, factor I and the MASPs, and to those of murine Bf and C2.

Function Bf provides the catalytic subunit of the C3 and C5 convertases of the alternative complement pathway. Assembly of the C3 convertase (C3bBb) requires binding of Bf to C3b (or C3(H₂O)) and factor D-mediated cleavage of the bound Bf, which releases Ba (Mr ~33K). The C3 convertase, which is stabilized by the binding of properdin, provides a positive amplification loop for both the classical and alternative complement pathways. After cleavage of C3, the C5 convertase ((C3b)2Bb) is formed. The initial binding of Bf to C3b is mediated by two binding sites: one on Ba (mapped to the second and third CCP modules)⁵, and the other, Mg²⁺- or Ni²⁺-dependent, on the VWFA domain of Bb, which resembles the metal-ion-binding domain of C2⁶. In addition to their role in complement activation, Bf proteolytic fragments participate in other immunological functions (macrophage spreading, monocyte-mediated cytotoxicity and B lymphocyte proliferation). The C3 site cleaved by the active C3bBb is L744-G-L-A-R-S-N-L-D752.

Tissue distribution Plasma concentration: 180 µg/ml (range 74-286 µg/ml)⁷. Serum concentration is significantly higher in BF*F homozygotes, lower in BF*S homozygotes and intermediate in heterozygotes⁸. Primary site of synthesis: liver. Secondary sites: mononuclear phagocytes, fibroblasts, endothelial cells, alveolar type II epithelial cells⁹.

Regulation of expression Bf is an acute-phase protein; serum Bf levels increase during inflammation. Bf expression is increased in a time- and concentration-dependent fashion by LPS and cytokines (particularly IL-1α, IL-1β and TNF-α and, to a lesser extent, IL-6, IFN-γ and IFN-β). Polypeptide growth factors (PDGF, EGF and FGF)¹⁰ and IL-4¹¹ counter-regulate the stimulation induced by cytokines (IL-1, IL-6 and TNF-α). The effects of the cytokines and growth factors are exerted primarily at pretranslational levels and are gene- and cell-specific. Cis-acting elements for IL-1 and IFN-γ responsiveness and for constitutive expression have been localized in the 5'-flanking region of the Bf gene; constitutive expression is controlled by the nuclear factor HNF-4.

Protein sequence
MGSNLSPQLC LMPFILGLLS GGVTTTPWSL AQPQGSCSLE GVEIKGGSFR 50
LLQEGQALEY VCPSGFYPYP VQTRTCRSTG SWSTLKTQDQ KTVRKAECRA 100
IHCPRPHDFE NGEYWPRSPY YNVSDEISFH CYDGYTLRGS ANRTCQVNGR 150
WSGQTAICDN GAGYCSNPGI PIGTRKVGSQ YRLEDSVTYH CSRGLTLRGS 200
QRRTCQEGGS WSGTEPSCQD SFMYDTPQEV AEAFLSSLTE TIEGVDAEDG 250
HGPGEQQKRK IVLDPSGSMN IYLVLDGSDS IGASNFTGAK KCLVNLIEKV 300
ASYGVKPRYG LVTYATYPKI WVKVSEADSS NADWVTKQLN EINYEDHKLK 350
SGTNTKKALQ AVYSMMSWPD DVPPEGWNRT RHVIILMTDG LHNMGGDPIT 400
VIDEIRDLLY IGKDRKNPRE DYLDVYVFGV GPLVNQVNIN ALASKKDNEQ 450
HVFKVKDMEN LEDVFYQMID ESQSLSLCGM VWEHRKGTDY HKQPWQAKIS 500
VIRPSKGHES CMGAVVSEYF VLTAAHCFTV DDKEHSIKVS VGGEKRDLEI 550
EVVLFHPNYN INGKKEAGIP EFYDYDVALI KLKNKLKYGQ TIRPICLPCT 600
EGTTRALRLP PTTTCQQQKE ELLPAQDIKA LFVSEEEKKL TRKEVYIKNG 650
DKKGSCERDA QYAPGYDKVK DISEVVTPRF LCTGGVSPYA DPNTCRGDSG 700
GPLIVHKRSR FIQVGVISWG VVDVCKNQKR QKQVPAHARD FHINLFQVLP 750
WLKEKLQDED LGFL

Amino acid sequence of allelic variant BF*F, characterized by Q at position 32 (BF*S: R32). The leader peptide comprises residues 1-25 and the factor D cleavage site is the R259-K260 bond. The four N-linked glycosylation sites are N122, N142, N285 and N378.

Protein modules
1-25 Leader peptide
34-100 CCP
101-161 CCP
163-220 CCP
221-252 Connecting segment
253-468 VWFA
469-481 Connecting segment
482-764 Serine protease domain
Catalytic triad: H526, D576, S699.
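The factor D cleavage site (the R259-K260 bond, given in the factor D entry) and the module boundaries above fix the extent of the two activation fragments in precursor numbering. The sketch below does that arithmetic, assuming Ba begins at the first residue after the 25-residue leader; all names are hypothetical. It checks that the three CCP modules fall in Ba, consistent with the Ba binding site mapped to the second and third CCP modules, and that the serine protease domain and catalytic triad fall in Bb.

```python
# Factor B precursor numbering (leader included), taken from the entries above.
PRECURSOR_LENGTH = 764
LEADER = (1, 25)
SCISSILE_R = 259          # factor D cleaves the R259-K260 bond

CCP_MODULES = [(34, 100), (101, 161), (163, 220)]
SERINE_PROTEASE = (482, 764)
CATALYTIC_TRIAD = (526, 576, 699)

ba = (LEADER[1] + 1, SCISSILE_R)          # released fragment
bb = (SCISSILE_R + 1, PRECURSOR_LENGTH)   # fragment retained in C3bBb


def length(span):
    """Number of residues in an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def within(span, outer):
    """True if an inclusive span lies entirely inside another."""
    return outer[0] <= span[0] and span[1] <= outer[1]


print("Ba", ba, length(ba), "residues")
print("Bb", bb, length(bb), "residues")
print(all(within(ccp, ba) for ccp in CCP_MODULES))
print(within(SERINE_PROTEASE, bb) and all(bb[0] <= r <= bb[1] for r in CATALYTIC_TRIAD))
```

On these numbers Ba spans residues 26-259 (234 residues) and Bb residues 260-764 (505 residues).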


Chromosomal location Human: 6p21.1-6p21.3, within the MHC class III region, immediately 3' to the C2 gene (the Bf mRNA start site lies 421 bp downstream of the C2 polyadenylation site). Mouse: chromosome 17, within H-2.

cDNA sequence^^'^^-2^ AGGGGAAGGG AGACAAGCAA TCCAACGCCA CTCTTGTCTG TCTCTGGAGG CTGGAGTACG TCTACGGGGT TGCAGAGCAA TCTCCCTACT CGGGGCTCTG TGTGACAACG GGCAGCCAGT CGTGGCTCCC TGCCAAGACT CTGACAGAGA AAGCGGAAGA TCAGACAGCA GAGAAGGTGG CCCAAAATTT CAGCTCAATG GCCCTCCAGG AACCGCACCC CCAATTACTG CCAAGGGAGG AACATCAATG ATGGAAAACC TGTGGCATGG AAGATCTCAG GAGTACTTTG AAGGTCAGCG AACTACAACA

AATGTGACCA AGCAAGCCAG TGGGGAGCAA GAGGTGTGAC GGGTAGAGAT TGTGTCCTTC CCTGGAGCAC TCCACTGTCC ACAATGTGAG CCAATCGCAC GAGCGGGGTA ACCGCCTTGA AGCGGCGAAC CCTTCATGTA CCATAGAAGG TCGTCCTGGA TTGGGGCCAG CAAGTTATGG GGGTCAAAGT AAATCAATTA CAGTGTACAG GCCATGTCAT TCATTGATGA ATTATCTGGA CTTTGGCTTC TGGAAGATGT TTTGGGAACA TCATTCGCCC TGCTGACAGC TAGGAGGGGA TTAATGGGAA

GGTCTAGGTC GACACACCAT TCTCAGCCCC CACCACTCCA CAAAGGCGGC TGGCTTCTAC CCTGAAGACT AAGACCACAC TGATGAGATC CTGCCAAGTG CTGCTCCAAC AGACAGCGTC GTGTCAGGAA CGACACCCCT AGTCGATGCT CCCTTCAGGC CAACTTCACA TGTGAAGCCA GTCTGAAGCA TGAAGACCAC CATGATGAGC CATCCTCATG GATCCGGGAC TGTCTATGTG CAAGAAAGAC TTTCTACCAA CAGGAAGGGT TTCAAAGGGA AGCACATTGT GAAGCGGGAC AAAAGAAGCA

TGGAGTTTCA CCTGCCCCAG CAACTCTGCC TGGTCTTTGG TCCTTCCGAC CCGTACCCTG CAAGACCAAA GACTTCGAGA TCTTTCCACT AATGGCCGGT CCGGGCATCC ACCTACCACT GGTGGCTCTT CAAGAGGTGG GAGGATGGGC TCCATGAACA GGAGCCAAAA AGATATGGTC GACAGCAGTA AAGTTGAAGT TGGCCAGATG ACTGATGGAT TTGCTATACA TTTGGGGTCG AATGAGCAAC ATGATCGATG ACCGATTACC CACGAGAGCT TTCACTGTGG CTGGAGATAG GGAATTCCTG

GCTTGGACAC GCCCAGCTTC TGATGCCCTT CCCAGCCCCA TTCTCCAAGA TGCAGACACG AGACTGTCAG ACGGGGAATA GCTATGACGG GGAGTGGGCA CCATTGGCAC GCAGCCGGGG GGAGCGGGAC CCGAAGCTTT ACGGCCCAGG TCTACCTGGT AGTGTCTAGT TAGTGACATA ATGCAGACTG CAGGGACTAA ACGTCCCTCC TGCACAACAT TTGGCAAGGA GGCCTTTGGT ATGTGTTCAA AAAGCCAGTC ACAAGCAACC GTATGGGGGC ATGACAAGGA AAGTAGTCCT AATTTTATGA

TGAGCCAAGC TCTCCTGCCT TATCTTGGGC GGGATCCTGC GGGCCAGGCA TACCTGCAGA GAAGGCAGAG CTGGCCCCGG TTACACTCTC GACAGCGATC AAGGAAGGTG GCTTACCCTG GGAGCCTTCC CCTGTCTTCC GGAACAACAG GCTAGATGGA CAACTTAATT TGCCACATAC GGTCACGAAG CACCAAGAAG TGAAGGCTGG GGGCGGGGAC TCGCAAAAAC GAACCAAGTG AGTCAAGGAT TCTGAGTCTC ATGGCAGGCC TGTGGTGTCT ACACTCAATC ATTTCACCCC CTATGACGTT

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860


cDNA sequence (continued)

GCCCTGATCA CCCTGCACCG CAAAAGGAAG AAAAAGCTGA AGAGATGCTC CCTCGGTTCC GATTCTGGCG AGCTGGGGAG GCCCGAGACT GATGAGGATT AAAACAGCTG

TAAGCTGAAA TCGAGCTTTG TGCACAGGAT GGTCTACATC AGGCTATGAC AGGAGTGAGT AGTTCACAAG CTGCAAAAAC CCTCTTTCAA ATAAGGGGTT

AGCTCAAGAA AGGGAACAAC AGCTGCTCCC CTCGGAAGGA AATATGCCCC TTTGTACTGG GCCCCTTGAT TAGTGGATGT TTCACATCAA TGGGTTTTCT CGACAAC

TATGGCCAGA AGGCTTCCTC ATCAAAGCTC AAGAATGGGG AAAGTCAAGG CCCTATGCTG AGAAGTCGTT CAGAAGCGGC GTGCTGCCCT TCCTGCTGGA

CTATCAGGCC CAACTACCAC TGTTTGTGTC ATAAGAAAGG ACATCTCAGA ACCCCAATAC TCATTCAAGT AAAAGCAGGT GGCTGAAGGA CAGGGGCGTG

CATTTGTCTC TTGCCAGCAA TGAGGAGGAG CAGCTGTGAG GGTGGTCACC TTGCAGAGGT TGGTGTAATC ACCTGCTCAC GAAACTCCAA GGATTGAATT

1920 1980 2040 2100 2160 2220 2280 2340 2400 2460

Complete cDNA sequence of the factor BF*F allele. The first five nucleotides in each exon are underlined to indicate intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAA) and the polyadenylation signal (ATTAAA) are indicated.

Genomic structure The gene consists of 18 exons spanning 6 kb of genomic DNA and has an exon-intron organization similar to that of murine Bf and of human and murine C2. In the mouse, tissue-specific use of an upstream initiation site leads to expression of an additional transcript that is 302 bp longer. Similar transcripts have not been identified in humans.

Accession numbers Human, Mouse
cDNA: L15702 (BF*F), X72875 (BF*S), M57890, J05660
Genomic: AF019413

Deficiency With the exception of a preliminary report²², no "natural" Bf deficiencies have been described in humans or animals. Recently, Bf-deficient mice have been generated by targeted disruption of the Bf gene. Under pathogen-free conditions, Bf null mice exhibit no overt phenotype²³,²⁴.

Polymorphic variants Four allelic variants, BF*S (slow), BF*F (fast), BF*S0.7 and BF*F1 (allele frequencies in the Caucasian population 0.731, 0.239, 0.025 and 0.005, respectively), were first identified by differences in electrophoretic mobility in agarose at pH 8.6²⁵. Each allelic variant consists of five electrophoretic isoforms resulting from variable sialic acid content²⁶. Additional studies have identified approximately 11 allelic variants. No significant differences in haemolytic activity were found among Bf allelic variants. The BF*F/BF*S difference corresponds to the nucleotide change A224G, giving Q32R.
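Because serum factor B concentration differs between BF*S homozygotes, BF*F homozygotes and heterozygotes, it can be useful to estimate how common each genotype is. The sketch below derives expected genotype frequencies from the Caucasian allele frequencies quoted above, assuming Hardy-Weinberg equilibrium (not stated in the entry); `genotype_frequencies` is a hypothetical helper.

```python
from itertools import combinations_with_replacement

# Allele frequencies in a Caucasian population, as quoted in the entry above.
BF_ALLELES = {"BF*S": 0.731, "BF*F": 0.239, "BF*S0.7": 0.025, "BF*F1": 0.005}


def genotype_frequencies(alleles: dict) -> dict:
    """Expected genotype frequencies under an assumed Hardy-Weinberg equilibrium."""
    genotypes = {}
    for a, b in combinations_with_replacement(alleles, 2):
        f = alleles[a] * alleles[b]
        genotypes[(a, b)] = f if a == b else 2 * f  # heterozygotes arise two ways
    return genotypes


genotypes = genotype_frequencies(BF_ALLELES)
heterozygosity = 1 - sum(f * f for f in BF_ALLELES.values())
print(f"allele total: {sum(BF_ALLELES.values()):.3f}")
print(f"S/S: {genotypes[('BF*S', 'BF*S')]:.3f}")
print(f"F/F: {genotypes[('BF*F', 'BF*F')]:.3f}")
print(f"S/F: {genotypes[('BF*S', 'BF*F')]:.3f}")
print(f"expected heterozygosity: {heterozygosity:.3f}")
```

On these assumptions, roughly half of Caucasians would be BF*S homozygotes, about a third BF*S/BF*F heterozygotes and about 6% BF*F homozygotes.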

References
1 Lesavre, P.H. et al. (1979) J. Immunol. 123, 529-534.
2 Niemann, M.A. et al. (1980) Biochemistry 19, 1576-1583.
3 Curman, B. et al. (1977) Biochemistry 16, 5368-5375.
4 Smith, C.A. et al. (1984) J. Exp. Med. 159, 324-329.
5 Hourcade, D.E. et al. (1995) J. Biol. Chem. 270, 19716-19722.
6 Tuckwell, D.S. et al. (1997) Biochemistry 36, 6605-6613.
7 Oglesby, T.J. et al. (1988) J. Immunol. Meth. 110, 55-62.
8 Mortensen, J.P. and Lamm, L.U. (1981) Immunology 42, 505-511.
9 Colten, H.R. and Strunk, R.C. (1993) In Complement in Health and Disease (Whaley, K., Loos, M. and Weiler, J.M. eds). Academic Publisher, Dordrecht, pp. 127-158.
10 Circolo, A. et al. (1990) J. Biol. Chem. 265, 5066-5071.
11 Katz, Y. and Strunk, R.C. (1990) J. Immunol. 144, 4675-4680.
12 Colten, H.R. and Garnier, G. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 217-240.
13 Mole, J.E. et al. (1984) J. Biol. Chem. 259, 3407-3412.
14 Volanakis, J.E. and Arlaud, G.J. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 49-81.
15 Carroll, M.C. et al. (1984) Nature 307, 237-241.
16 Dunham, I. et al. (1987) Proc. Natl Acad. Sci. USA 84, 7237-7241.
17 Chaplin, D.D. et al. (1983) Proc. Natl Acad. Sci. USA 80, 6947-6951.
18 Wu, L.C. et al. (1987) Cell 48, 331-342.
19 Campbell, R.D. et al. (1984) Phil. Trans. R. Soc. Lond. B306, 367-378.
20 Woods, D.E. et al. (1982) Proc. Natl Acad. Sci. USA 79, 5661-5665.
21 Ishikawa, N. et al. (1990) J. Biol. Chem. 265, 19040-19046.
22 Densen, P. et al. (1996) Mol. Immunol. 33, 68 (Abstract).
23 Matsumoto, M. et al. (1997) Proc. Natl Acad. Sci. USA 94, 8720-8725.
24 Taylor, P.R. et al. (1998) J. Biol. Chem. 273, 1699-1704.
25 Alper, C.A. et al. (1972) J. Exp. Med. 135, 66-80.
26 Garnier, G. et al. (1988) Complement 5, 77-88.

Factor I

Other names C3b inactivator, C3b/C4b inactivator, EC 3.4.21.45

Bernard J. Morley, Imperial College School of Medicine, Hammersmith Campus, London, UK

Physicochemical properties Factor I is synthesized as a prepro single-chain molecule of 583 amino acids, including an 18 amino acid leader sequence and a four amino acid cleavage site.

Mature protein: pI ~4.5-6 (depending on isoform); ~7.5 (after neuraminidase treatment)

                                               Heavy chain          Light chain
Amino acids                                    19-335               340-583
Mr (K) predicted                               35.4                 27.6
Mr (K) observed                                50                   38
N-linked glycosylation sites (all occupied)    3 (70, 103, 177)     3 (474, 494, 536)
Interchain disulfide bonds (1 heavy-light)     ? (33, 284, 327)     453

Structure Factor I has a bilobed structure, each lobe being formed by one chain, with a size shown by electron microscopy to be 13 ± 1.4 nm; this was confirmed by neutron scattering. Neutron and X-ray scattering show that the serine protease domain closely resembles that of β-trypsin.

Function Factor I is a control protein for the activation of the complement pathways. It is a highly specific serine protease that cleaves the α' chains of C3b and C4b in the presence of cofactors (factor H for C3b, C4BP for C4b, and CR1 and MCP for both), thereby inactivating these proteins and preventing the formation of the C3/C5 convertases. In C3b it cleaves at two positions in the α' chain to yield iC3b and C3f, while in C4b it cleaves on either side of the thioester to yield C4c and C4d.
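The cofactor requirements described above can be captured as a small lookup table. This is a minimal sketch for illustration; `COFACTORS`, `PRODUCTS`, `factor_i_can_cleave` and `factor_i_products` are hypothetical names, and the table covers only the cofactors and products named in this entry (further processing of iC3b, for example, is not modelled).

```python
# Cofactors named in this entry for factor I-mediated cleavage of each substrate.
COFACTORS = {
    "C3b": {"factor H", "CR1", "MCP"},
    "C4b": {"C4BP", "CR1", "MCP"},
}

# Cleavage products named in this entry for each substrate.
PRODUCTS = {
    "C3b": ("iC3b", "C3f"),
    "C4b": ("C4c", "C4d"),
}


def factor_i_can_cleave(substrate, cofactor):
    """True if the entry lists `cofactor` as supporting factor I cleavage of `substrate`."""
    return cofactor in COFACTORS.get(substrate, set())


def factor_i_products(substrate, cofactor):
    """Return the cleavage products, or None when the cofactor is not one listed for that substrate."""
    if not factor_i_can_cleave(substrate, cofactor):
        return None
    return PRODUCTS[substrate]
```

Keeping substrates and cofactors as plain data makes the asymmetry visible at a glance: factor H serves only C3b, C4BP only C4b, while CR1 and MCP appear in both sets.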

Tissue distribution Serum protein: 35 µg/ml. Primary site of synthesis: liver⁴. Secondary sites: monocytes, fibroblasts, endothelial cells, myoblasts, glial cells⁵⁻⁸.

Regulation of expression Factor I is an acute-phase protein and is upregulated by LPS and by IFNγ in endothelial cells, fibroblasts and hepatocytes. In myoblasts, IFNγ has no effect⁸.

Protein sequence 9,10 MKLLHVFLLF WQRCIEGTCV LNNGTCTAEG NVACLDLGFQ RRTMGYQDFA SDELCCKACQ QEETEILTAD LPWQVAIKDA HPDLKRIVIE PACVPWSPYL NRFYEKEMEC GKPEFPGFYT

LCFHLRFCKV CKLPYQCPKN KFSVSLKHGN QGADTQRRFK DWCYTQKAD GKGFHCKSGV MDAERRRIKS SGITCGGIYI YVDRIIFHEN FQPNDTCIVS AGTYDGSIDA KVANYFDWIS

TYTSQEDLVE GTAVCATNRR TDSEGIVEVK LSDLSINSTE SPMDDFFQCV CIPSQYQCNG LLPKLSCGVK GGCWILTAAH YNAGTYQNDI GWGREKDNER CKGDSGGPLV YHVGRPFISQ

KKCLAKKYTH SFPTYCQQKS LVDQDKTMFI CLHVHCRGLE NGKYISQMKA EVDCITGEDE NRMHIRRKRI CLRASKTHRY ALIEMKKDGN VFSLQWGEVK CMDANNVTYV YNV

LSCDKVFCQP LECLHPGTKF CKSSWSMREA TSLAECTFTK CDGINDCGDQ VGCAGFASVA VGGKRAQLGD QIWTTWDWI KKDCELPRSI LISNCSKFYG WGWSWGENC

50 100 150 200 250 300 350 400 450 500 550 583

The leader sequence and the cleavage site (RRKR) between the heavy (N-terminal) and light (C-terminal) chains, which are lost from the mature protein, are underlined. N-linked glycosylation sites (all occupied) are indicated (N).

Protein modules
1-18        Leader peptide
23-89       FIMAC
95-200      CD5 domain
202-239     LDLRA
241-275     LDLRA
300-583     Serine protease domain
Catalytic triad: H362, D411, S507.

[Figure: exon organization of the protein modules (exon 1, exon 2, exons 3/4, exon 5, exon 6, exons 9-13)]
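The module and chain boundaries listed above can be checked programmatically, for example to confirm that all three catalytic-triad residues lie in the serine protease domain and in the light chain. This is a minimal sketch using the intervals exactly as given in this entry; `locate` and `length` are hypothetical helper names.

```python
# Intervals (1-based, inclusive) as listed in this entry; numbering includes the leader peptide.
FACTOR_I_MODULES = [
    ("Leader peptide", 1, 18),
    ("FIMAC", 23, 89),
    ("CD5 domain", 95, 200),
    ("LDLRA", 202, 239),
    ("LDLRA", 241, 275),
    ("Serine protease domain", 300, 583),
]
FACTOR_I_CHAINS = [("Heavy chain", 19, 335), ("Light chain", 340, 583)]
CATALYTIC_TRIAD = {"H": 362, "D": 411, "S": 507}


def locate(residue, intervals):
    """Names of every interval containing `residue`; an empty list means a linker or gap."""
    return [name for name, start, end in intervals if start <= residue <= end]


def length(start, end):
    """Number of residues in an inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    for aa, pos in CATALYTIC_TRIAD.items():
        print(aa, pos, locate(pos, FACTOR_I_MODULES), locate(pos, FACTOR_I_CHAINS))
    # Leader (18) + heavy chain + four-residue cleavage site + light chain.
    print(18 + length(19, 335) + 4 + length(340, 583))
```

The final sum reproduces the 583-residue precursor: an 18-residue leader, a 317-residue heavy chain, the four-residue cleavage site (RRKR) and a 244-residue light chain.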

Chromosomal location
Human¹⁰,¹¹: 4q25. Telomere ... IL-2 ... IF ... EGF ... Centromere
Mouse¹²: chromosome 3, 66.6 cM. Telomere ... Adh3 ... Cfi ... Egf ... Centromere

cDNA sequence 9,10,13 GAGAGACAAA TGTGCTTCCA AAAAGTGCTT GGCAGAGATG GCACTGCAGT TGGAATGTCT AGTTTAGTGT TTGTGGACCA ACGTGGCCTG TGTCTGATCT CCAGTTTGGC ATGTGGTTTG

GACCCCGAAC CTTAAGGTTT AGCAAAAAAA CATTGAGGGC GTGTGCAACT TCATCCAGGG TTCCTTGAAG AGATAAGACA CCTTGACCTT CTCTATAAAT TGAATGTACT TTATACACAG

ACCTCCAACA TGCAAGGTCA TATACTCACC ACCTGTGTTT AACAGGAGAA ACAAAGTTTT CATGGAAATA ATGTTCATAT GGGTTTCAAC TCCACTGAAT TTTACTAAGA AAAGCAGATT

TGAAGCTTCT CTTATACATC TCTCCTGCGA GTAAACTACC GCTTCCCAAC TAAATAACGG CAGATTCAGA GCAAAAGCAG AAGGTGCTGA GTCTACATGT GAAGAACTAT CTCCAATGGA

TCATGTTTTC TCAAGAGGAT TAAAGTCTTC GTATCAGTGC ATACTGTCAA AACATGCACA GGGAATAGTT CTGGAGCATG TACTCAAAGA GCATTGCCGA GGGTTACCAG TGACTTCTTT

CTGTTATTTC CTGGTGGAGA TGCCAGCCAT CCAAAGAATG CAAAAGAGTT GCCGAAGGAA GAAGTAAAAC AGGGAAGCCA AGGTTTAAGT GGATTAGAGA GATTTCGCTG CAGTGTGTGA

60 120 180 240 300 360 420 480 540 600 660 720


cDNA sequence continued

ATGGGAAATA CATTTCTCAG ATGAAAGCCT GTGATGGTAT GTGATGAACT GTGTTGTAAA GCATGCCAAG GCAAAGGCTT GCATTCCAAG CCAGTATCAA TGCAATGGTG AGGTGGACTG AGGCTTTGCA TCTGTGGCTC AAGAAGAAAC TGGATGCAGA AAGAAGACGG ATAAAATCAT TATTACCTAA ACAGAATGCA CATTCGAAGG AAACGAATTG TGGGAGGAAA TCCCATGGCA GGTGGCAATT AAGGATGCCA GTGGAATCAC GTGGCTGTTG GATTCTGACT GCTGCACATT GTCTCAGAGC AAATATGGAC AACAGTAGTA GACTGGATAC ACCCCGACCT ACGTGGATAG AATTATTTTC CATGAAAACT ACAATGCAGG CTTTGATTGA AATGAAAAAA GACGGAAACA AAAAAGATTG CTGCCTGTGT CCCCTGGTCT CCTTACCTAT TCCAACCTAA GCTGGGGACG AGAAAAAGAT AACGAAAGAG TCTTTTCACT TAATAAGCAA CTGCTCTAAG TTTTACGGAA ATCGTTTCTA CAGGTACATA TGATGGTTCC ATCGATGCCT GTAAAGGGGA GTATGGATGC CAACAATGTG ACTTATGTCT GGGGTGTTGT GAAAACCAGA GTTCCCAGGT TTTTACACCA AAGTGGCCAA ACCATGTAGG AAGGCCTTTT ATTTCTCAGT ACAATGTATA ATTCTATTCT TTTTCTCTCA AGAGTTCCAT TTAATGGAAA TTCTCTAGGG GGGAAAAATG AAGCAAATCT CATTGGATAT TTTATGCCAT ATTGGAATTT TGTTGTATAA TTCTCAAATA

CAATGATTGT CCATTGCAAA CATTACAGGG AGAAATTTTG ACTATCTTGT GCGAGCACAA CTGTGGGGGA CAGTAAAACT TAAACGTATA CACTTACCAA TGAGCTGCCT TGATACATGC TCAGTGGGGT TGAAAAAGAA CTCTGGAGGC GAGTTGGGGG TTATTTTGAC AAATTGTGAT TAAAACGGTA TTTTAAAGGT AATATTTTGG

GGAGACCAAA TCGGGTGTTT GAAGATGAAG ACTGCTGACA GGAGTTAAAA CTGGGAGACC ATTTATATTG CATCGTTACC GTAATTGAAT AATGACATCG CGTTCCATCC ATCGTTTCTG GAAGTTAAAC ATGGAATGTG CCCTTAGTCT GAAAACTGTG TGGATTAGCT CTCTCTCTTC TAATTAATAA CTCCACAGAG TGAAGCAT

780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920

The first five nucleotides in each exon are underlined to indicate intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAA) and the probable polyadenylation signal (AATAAA) are indicated.
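The caption above identifies three landmarks in each cDNA: the initiation codon, the termination codon and the polyadenylation signal. A naive way to locate them in a sequence string is sketched below, assuming the first ATG is the initiation codon and taking the first in-frame stop codon after it; `annotate_cdna` is a hypothetical helper, not a published tool. The signal motif is a parameter because the entries in this book report different signals (AATAAA here, ATTAAA for factor B and GATAAA for C3).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def annotate_cdna(seq, polya_signal="AATAAA"):
    """Locate the first ATG, the first in-frame stop after it, and the first
    polyadenylation-signal match downstream of that stop.

    Positions are 1-based, matching the numbering of the printed sequences.
    Naive by design: real start-codon choice also depends on sequence context,
    which is not modelled here. Returns None if the sequence has no ATG.
    """
    seq = seq.upper().replace(" ", "").replace("\n", "")
    start = seq.find("ATG")
    if start == -1:
        return None
    stop = None
    # Step through codons in the reading frame defined by the ATG.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            stop = i
            break
    polya = seq.find(polya_signal, stop + 3) if stop is not None else -1
    return {
        "start": start + 1,
        "stop": stop + 1 if stop is not None else None,
        "stop_codon": seq[stop:stop + 3] if stop is not None else None,
        "polya": polya + 1 if polya != -1 else None,
    }
```

For example, `annotate_cdna("GGATGAAACCCTAAGGGAATAAACC")` places the ATG at position 3, the TAA stop at position 12 and the AATAAA signal at position 18.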

Genomic structure¹³ The gene spans 63 kb and comprises 13 exons. The first intron is 36 kb, while the others range from 0.19 to 5.9 kb in size.

[Figure: exon-intron map of the factor I gene, exons 1 to 13; scale bar 4 kb]

Accession numbers
            Human              Mouse (BALB/c)¹²    Xenopus¹⁴
cDNA        Y00318, J02770     U47810              X59958
Genomic     X78594

Deficiency Autosomal recessive. Uncontrolled activation of the alternative pathway amplification loop results in consumptive loss of C3. Patients show an increased incidence of recurrent pyogenic infections, glomerulonephritis and SLE-like illness. Mutations identified:
A1282 to T; H418 to L; three chromosomes
G801 to A; splicing defect, loss of exon 5; one chromosome

Polymorphic variants G833A; silent. Two common (A and B)¹⁶ and four rare (A1, A2, B2 and C)¹⁷,¹⁸ variants have been identified by isoelectric focusing of neuraminidase-treated plasma followed by western blotting.

References
1 Sim, R.B. et al. (1993) Int. Rev. Immunol. 10, 65-86.
2 Perkins, S.J. et al. (1993) Biochem. J. 295, 101-108.
3 DiScipio, R.G. (1992) J. Immunol. 149, 2592-2599.
4 Morris, K.M. et al. (1982) J. Clin. Invest. 70, 906-913.
5 Whaley, K. (1980) J. Exp. Med. 151, 501-516.
6 Vyse, T.J. et al. (1996) J. Clin. Invest. 97, 925-933.
7 Dauchel, H. et al. (1990) Eur. J. Immunol. 20, 1669-1675.
8 Legoedec, J. et al. (1995) Eur. J. Immunol. 25, 3460-3466.
9 Catterall, C.F. et al. (1987) Biochem. J. 242, 849-856.
10 Goldberger, G. et al. (1987) J. Biol. Chem. 262, 10065-10071.
11 Shiang, R. et al. (1989) Genomics 4, 82-86.
12 Minta, J.O. et al. (1996) Mol. Immunol. 33, 101-112.
13 Vyse, T.J. et al. (1994) Genomics 24, 90-98.
14 Kunnath-Muglia, L.M. et al. (1993) Mol. Immunol. 30, 1249-1256.
15 Vyse, T.J. et al. (1994) Q. J. Med. 87, 385-401.
16 Nakamura, S. and Abe, K. (1985) Hum. Genet. 71, 45-48.
17 Nakamura, S. et al. (1991) Hum. Hered. 41, 403-408.
18 Umetsu, K. et al. (1994) Hum. Biol. 66, 339-348.

C3

Marina Botto, Imperial College School of Medicine, Hammersmith Campus, London, UK

Physicochemical properties C3 is synthesized as a prepro single-chain molecule of 1663 amino acids, including a 22 amino acid leader peptide and a four amino acid cleavage site.

Mature protein: pI ~5.9; Mr (K) predicted ~190

                                       β chain             α chain
Amino acids                            23-667 (645 aa)     672-1663 (992 aa)
Mr (K) predicted                       ~75                 ~115
N-linked glycosylation sites           1 (85)              2 (939, 1617 potential)
Interchain disulfide bonds (1 β-α)     559                 816

Structure C3 has a two-domain shape, consisting of a flat ellipsoid about 18 nm long, 2 nm thick and 8-10 nm wide, with a smaller flat domain of 2 × 4 × 9 nm; this was confirmed by solution scattering analysis³. The larger domain represents the C3c region and the smaller the C3d region. On proteolytic activation and removal of C3a there is a large conformational change, with the two domains moving closer together. Recently, X-ray crystallographic analysis has shown C3d to fold into a highly helical structure known as an α-α barrel, with a convex surface at one end that contains the thioester-dependent covalent attachment site (see under Function below) and a concave depression at the opposite end of the barrel⁴.

Function The C3 precursor is first processed by removal of four Arg residues, forming two chains, β and α, linked by a disulfide bond. C3 convertase activates C3 by cleaving the α chain, releasing the C3a anaphylatoxin and generating C3b (β chain and α' chain). Via its reactive thioester, C3b can bind covalently to cell-surface carbohydrates or immune aggregates, and it has opsonic and immunoregulatory functions. C3b is rapidly cleaved at two positions by factor I and a cofactor to form iC3b (inactivated C3b) and C3f, which is released. iC3b is slowly cleaved (possibly by factor I) to form C3c and C3dg. Other proteases produce further fragments such as C3d or C3g. C3a anaphylatoxin is a mediator of local inflammatory processes: it induces contraction of smooth muscle, increases vascular permeability and causes histamine release from mast cells and basophilic leukocytes.

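[Figure: C3 degradation pathway. C3 convertase cleaves the α chain of C3 (β chain and α chain, with the thioester site in the α chain) to release C3a and form C3b (β chain and α' chain). Factor I with a cofactor (H, MCP, CR1) cleaves the α' chain of C3b, releasing C3f and forming iC3b (β chain plus α'-67 and α-40 fragments). Factor I with CR1 as cofactor then cleaves iC3b to release C3dg, leaving C3c.]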
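The sequence of cleavages described under Function and in the degradation pathway can be represented as a chain of substrate-to-product steps. In this sketch `C3_STEPS`, `degradation_route` and `end_products` are hypothetical names; the agent for the iC3b step follows the pathway figure (factor I with CR1), whereas the text describes that step as only possibly mediated by factor I, and the C3d and C3g fragments produced by other proteases are left out.

```python
# substrate -> (cleaving agent, products), following the C3 degradation pathway in this entry.
C3_STEPS = {
    "C3": ("C3 convertase", ("C3a", "C3b")),
    "C3b": ("factor I + cofactor (factor H, MCP or CR1)", ("iC3b", "C3f")),
    "iC3b": ("factor I + CR1", ("C3c", "C3dg")),
}


def degradation_route(fragment, steps=C3_STEPS):
    """Ordered (substrate, agent, products) steps starting from `fragment`."""
    route = []
    current = fragment
    while current in steps:
        agent, products = steps[current]
        route.append((current, agent, products))
        # Continue with the product that is itself a substrate of a later step, if any.
        following = [p for p in products if p in steps]
        if not following:
            break
        current = following[0]
    return route


def end_products(fragment, steps=C3_STEPS):
    """Fragments released along the route that are not cleaved further in `steps`."""
    released = []
    for _substrate, _agent, products in degradation_route(fragment, steps):
        released.extend(p for p in products if p not in steps)
    return released
```

Starting from C3, the route passes through C3b and iC3b and releases C3a, C3f, C3c and C3dg, in that order; starting from a terminal fragment such as C3c gives an empty route.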

Tissue distribution Serum protein: 1-1.5 mg/ml⁵. Primary site of synthesis: liver⁶. Secondary sites: monocytes/macrophages⁷, neutrophils⁸, fibroblasts⁹, capillary and umbilical vein endothelial cells¹⁰,¹¹, myoblasts¹², synovium¹³, T cells¹⁴, endometrium¹⁵, astrocytes¹⁶, glomerular mesangial cells¹⁷, proximal tubular epithelium¹⁸, glomerular epithelium¹⁹, pulmonary alveolar type II epithelium²⁰, adipocytes²¹, osteoblasts²², intestinal epithelial cells²³.

Regulation of expression C3 is the most prominent of the acute-phase complement components. Its synthesis is tissue-specific and is modulated by a variety of stimulatory agents. It is upregulated in most cell types by IL-1, IL-6, TNFα and LPS, the exception being neutrophils, in which IL-1 downregulates C3. IFNγ also generally upregulates expression, except in neutrophils, pneumocytes and intestinal epithelial cells. In the endometrium, oestrogens lower C3 expression, whereas progesterone increases synthesis. Expression is controlled both at the level of transcription²⁴ and by mRNA stability²⁵.


Protein sequence^ MGPTSGPSLL GDVPVTVTVH GRNKFVTVQA TVNHKLLPVG MGQWKIRAYY LEVTITARFL VLSRKVLLDG SPYQIHFTKT TQGDGVAKLS NSNNYLHLSV LKAGRQVREP VWVDVKDSCV GVFVLNKKNK GQQTAQRAEL NPMRFSCQRR LDEDIIAEEN TWEILAVSMS LYNYRQNQEL VPLKTGLQEV RLGREGVQKE KHLIVTPSG^ KGYTQQLAFR CGAVKWLILE LQEAKDICEE RLKGPLLNKF PWRWLNEQR PSRSSKITHR KAKDQLTCNK SILDISMMTG VSHSEDDCLA KLNKLCRDEL KVQLSNDFDE HYLMWGLSSD AFTESMWFG

LLLLTHLPLA DFPGKKLVLS TFGTQWEKV RTVMVNIENP ENSPQQVFST YGKKVEGTAF VQNLRAEDLV PKYFKPGMPF INTHPSQKPL LRTELRPGET GQDLWLPLS GSLWKSGQS LTQSKIWDW QCPQPAARRR TRFISLGEAC IVSRSEFPES DKKGICVADP KVRVELLHNP EVKAAVYHHF DIPPADLSDQ GEQNMIGMTP QPSSAFAAFV KQKPDGVFQE QVNSLPGSIT LTTAKDKNRW YYGGGYGSTQ IHWESASLLR FDLKVTIKPA FAPDTDDLKQ FKVHQYFNVE CRCAEENCFI YIMAIEQTIK FWGEKPNLSY CPN

LGSPMYSIIT SEKTVLTPAT VLVSLQSGYL EGIPVKQDSL EFEVKEYVLP VIFGIQDGEQ GKSLYVSATV DLMVFVTNPD SITVRTKKQE LNVNFLLRMD ITTDFIPSFR EDRQPVPGQQ EKADIGCTPG RSVQLTEKRM KKVFLDCCNY WLWNVEDLKE FEVTVMQDFF AFCSLATTKR ISDGVRKSLK VPDTESETRI TVIAVHYLDE KRAPSTWLTA DAPVIHQEMI KAGDFLEANY EDPGKQLYNV ATFMVFQALA SEETKENEGF PETEKRPQDA LANGVDRYIS LIQPGAVKVY QKSDDKVTLE SGSDEVQVGQ IIGKDTWVEH

PNILRLESEE NHMGNVTFTI FIQTDKTIYT SSQNQLGVLP SFEVIVEPTE RISLPESLKR ILHSGSDMVQ GSPAYRVPVA LSEAEQATRT RAHEAKIRYY LVAYYTLIGA MTLKIEGDHG SGKDYAGVFS DKVGKYPKEL ITELRRQHAR PPKNGISTKL IDLRLPYSW RHQQTVTIPP WPEGIRMNK LLQGTPVAQM TEQWEKFGLE YWKVFSLAV GGLRNNNEKD MNLQRSYTVA EATSYALLAL QYQKDAPDHQ TVTAEGKGQG KNTMILEICT KYELDKAFSD AYYNLEESCT ERLDKACEPG QRTFISPIKC WPEEDECQDE

TMVLEAHDAQ PANREFKSEK PGSTVLYRIF LSWDIPELVN KFYYIYNEKG IPIEDGSGEV AERSGIPIVT VQGEDTVQSL MQALPYSTVG TYLIMNKGRL SGQREWADS ARWLVAVDK DAGLTFTSSS RKCCEDGMRE ASHLGLARSN MNIFLKDSIT RNEQVEIRAV KSSLSVPYVI TVAVRTLDPE TEDAVDAERL KRQGALELIK NLIAIDSQVL MALTAFVLIS lAGYALAQMG LQLKDFDFVP ELNLDVSLQL TLSWTMYHA RYRGDQDATM RNTLIIYLDK RFYHPEKEDG VDYVYKTRLV REALYKTRLV ENQKQCQDLG

50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800 850 900 950 1000 1050 1100 1150 1200 1250 1300 1350 1400 1450 1500 1550 1600 1650

The leader sequence and the cleavage site (RRRR) between the β and α chains are underlined. C1010 and Q1013, whose side-chains form the thioester bond, are double underlined, and N-linked glycosylation sites are indicated (N).

Protein modules
1-22          Leader peptide
23-667        β chain
672-1663      α chain
672-748       C3a anaphylatoxin
955-1303      C3dg fragment
955-1001      C3g fragment
1002-1303     C3d fragment
1303-1321     C3f fragment
1010-1013     Thioester domain (C-Q)
1424-1456     Properdin-binding site
1209-1271     Factor H-binding site²⁶
749-790       Factor H, factor B and CRPs binding sites
1227-1236     CR2-binding site²⁹
1383-1403     CR3-binding site³⁰

Chromosomal location
Human³¹: 19p13.3-p13.2
Mouse: chromosome 17, 34.3 cM
Rat: chromosome 9

cDNA sequence^ CTCCTCCCCA ATGGGACCCA CTGGGGAGTC ACCATGGTGC GACTTCCCAG AACCACATGG GGGCGCAACA GTGCTGGTCA CCTGGCTCCA CGGACGGTCA TCTTCTCAGA ATGGGCCAGT GAGTTTGAGG AAATTCTACT TACGGGAAGA AGGATTTCCC GTGCTGAGCC GGGAAGTCTT GCAGAGCGCA CCCAAGTACT GGCTCTCCAG ACCCAGGGAG AGCATCACGG ATGCAGGCTC CTACGTACAG CGCGCCCACG TTGAAGGCGG ATCACCACCG AGCGGCCAGA GGCTCGCTGG ATGACCCTGA GGCGTGTTCG GAGAAGGCAG GACGCAGGGC CAGTGCCCGC GACAAAGTCG

TCCTCTCCCT CCTCAGGTCC CCATGTACTC TGGAGGCCCA GCAAAAAACT GCAACGTCAC AGTTCGTGAC GCCTGCAGAG CAGTTCTCTA TGGTCAACAT ACCAGCTTGG GGAAGATCCG TGAAGGAGTA ACATCTATAA AAGTGGAGGG TGCCTGAATC GGAAGGTACT TGTACGTGTC GCGGGATCCC TCAAACCAGG CCTACCGAGT ATGGCGTGGC TGCGCACGAA TGCCCTACAG AGCTCAGACC AGGCCAAGAT GACGCCAGGT ACTTCATCCC GGGAGGTGGT TGGTAAAAAG AGATAGAGGG TGCTGAATAA ACATCGGCTG TGACCTTCAC AGCCAGCCGC GCAAGTACCC

CTGTCCCTCT CAGCCTGCTG TATCATCACC CGACGCGCAA AGTGCTGTCC CTTCACGATC CGTGCAGGCC CGGGTACCTC TCGGATCTTC TGAGAACCCG CGTCTTGCCC AGCCTACTAT CGTGCTGCCC CGAGAAGGGC AACTGCCTTT CCTCAAGCGC GCTGGACGGG TGCCACCGTC CATCGTGACC AATGCCCTTT CCCCGTGGCA CAAACTCAGC GAAGCAGGAG CACCGTGGGC CGGGGAGACC CCGCTACTAC GCGAGAGCCC TTCCTTCCGC GGCCGACTCC CGGCCAGTCA TGACCACGGG GAAGAACAAA CACCCCGGGC GAGCAGCAGT CCGCCGACGC CAAGGAGCTG

GTCCCTCTGA CTCCTGCTAC CCCAACATCT GGGGATGTTC AGTGAGAAGA CCAGCCAACA ACCTTCGGGA TTCATCCAGA ACCGTCAACC GAAGGCATCC TTGTCTTGGG GAAAACTCAC AGTTTCGAGG CTGGAGGTCA GTCATCTTCG ATTCCGATTG GTGCAGAACC ATCTTGCACT TCTCCCTACC GACCTCATGG GTCCAGGGCG ATCAACACAC CTCTCGGAGG AACTCCAACA CTCAACGTCA ACCTACCTGA GGCCAGGACC CTGGTGGCGT GTGTGGGTGG GAAGACCGGC GCCCGGGTGG CTGACGCAGA AGTGGGAAGG GGCCAGCAGA CGTTCCGTGC CGCAAGTGCT

CCCTGCACTG TAACCCACCT TGCGGCTGGA CAGTCACTGT CTGTGCTGAC GGGAGTTCAA CCCAAGTGGT CAGACAAGAC ACAAGCTGCT CGGTCAAGCA ACATTCCGGA CACAGCAGGT TCATAGTGGA CCATCACCGC GGATCCAGGA AGGATGGCTC TCCGAGCAGA CAGGCAGTGA AGATCCACTT TGTTCGTGAC AGGACACTGT ACCCCAGCCA CAGAGCAGGC ATTACCTGCA ACTTCCTCCT TCATGAACAA TGGTGGTGCT ACTACACGCT ACGTCAAGGA AGCCTGTACC TACTGGTGGC GTAAGATCTG ATTACGCCGG CCGCCCAGAG AGCTCACGGA GCGAGGACGG

TCCCAGCACC CCCCCTGGCT GAGCGAGGAG TACTGTCCAC CCCTGCCACC GTCAGAAAAG GGAGAAGGTG CATCTACACC ACCCGTGGGC GGACTCCTTG ACTCGTCAAC CTTCTCCACT GCCTACAGAG CAGGTTCCTC TGGCGAACAG GGGGGAGGTT AGACCTGGTG CATGGTGCAG CACCAAGACA GAACCCTGAT GCAGTCTCTA GAAGCCCTTG TACCAGGACC TCTCTCAGTG GCGAATGGAC GGGCAGGCTG GCCCCTGTCC GATCGGTGCC CTCCTGCGTG TGGGCAGCAG CGTGGACAAG GGACGTGGTG TGTCTTCTCC GGCAGAACTT GAAGCGAATG CATGCGGGAG

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160


cDNA sequence AACCCCATGA AAGAAGGTCT GCCAGCCACC ATCGTTTCCC CCACCGAAAA ACGTGGGAGA TTCGAGGTCA CGAAACGAGC AAGGTGAGGG CGTCACCAGC GTGCCGCTAA ATCAGTGACG ACTGTGGCTG GACATCCCAC CTCCTGCAAG AAGCACCTCA ACGGTCATCG AAGCGGCAGG CAACCCAGCT TACGTGGTCA TGCGGGGCTG GATGCGCCCG ATGGCCCTCA CAGGTCAACA ATGAACCTAC AGGCTGAAGG GAGGACCCTG CTGCAGCTAA TACTACGGTG CAATACCAAA CCCAGCCGCA TCAGAAGAGA ACCTTGTCGG TTCGAGGTCA AAGAACACTA TCTATATTGG CTGGCCAATG AGGAACACCC TTCAAAGTTC GCCTATTACA AAGCTGAACA CAAAAGTCGG GTGGACTATG TACATCATGG CAGCGCACGT CACTACCTCA ATCATCGGGA GAGAACCAGA TGCCCCAACT

GGTTCTCGTG TCCTGGACTG TGGGCCTGGC GAAGTGAGTT ATGGAATCTC TTCTGGCTGT CAGTAATGCA AGGTGGAAAT TGGAACTACT AGACCGTAAC AGACCGGCCT GTGTCAGGAA TTCGCACCCT CTGCAGACCT GGACCCCAGT TTGTGACCCC CTGTGCATTA GGGCCTTGGA CTGCCTTTGC AGGTCTTCTC TTAAATGGCT TGATACACCA CGGCCTTTGT GCCTGCCAGG AGAGATCCTA GGCCTCTTCT GTAAGCAGCT AAGACTTTGA GTGGCTATGG AGGACGCCCC GCTCCAAGAT CCAAGGAAAA TGGTGACAAT AGGTCACCAT TGATCCTTGA ACATATCCAT GTGTTGACAG TCATCATCTA ACCAATACTT ACCTGGAGGA AGCTCTGCCG ATGACAAGGT TGTACAAGAC CCATTGAGCA TCATCAGCCC TGTGGGGTCT AGGACACTTG AACAATGCCA GACCACACCC

continued CCAGCGCCGG CTGCAACTAC CAGGAGTAAC CCCAGAGAGC TACGAAGCTC CAGCATGTCG GGACTTCTTC CCGAGCCGTT CCACAATCCA CATCCCCCCC GCAGGAAGTG GTCCCTGAAG GGATCCAGAA CAGTGACCAA GGCCCAGATG CTCGGGCTGC CCTGGATGAA GCTCATCAAG GGCCTTCGTG TCTGGCTGTC GATCCTGGAG AGAAATGATT TCTCATCTCG CAGCATCACT CACTGTGGCC TAACAAATTT CTACAACGTG CTTTGTGCCT CTCTACCCAG TGACCACCAG CACCCACCGT TGAGGGTTTC GTACCATGCT AAAACCAGCA GATCTGTACC GATGACTGGC ATACATCTCC CCTGGACAAG TAATGTAGAG AAGCTGTACC TGATGAACTG CACCCTGGAA CCGACTGGTC GACCATCAAG CATCAAGTGC CTCCTCCGAT GGTGGAGCAC GGACCTCGGC CCATTCCCCC

ACCCGTTTCA ATCACAGAGC CTGGATGAGG TGGCTGTGGA ATGAATATAT GACAAGAAAG ATCGACCTGC CTCTACAATT GCCTTCTGCA AAGTCCTCGT GAAGTCAAGG GTCGTGCCGG CGCCTGGGCC GTCCCGGACA ACAGAGGATG GGGGAACAGA ACGGAGCAGT AAGGGGTACA AAACGGGCAC AACCTCATCG AAGCAGAAGC GGTGGATTAC CTGCAGGAGG AAAGCAGGAG ATTGCTGGCT CTGACCACAG GAGGCCACAT CCCGTCGTGC GCCACCTTCA GAACTGAACC ATCCACTGGG ACAGTCACAG AAGGCCAAAG CCGGAAACAG AGGTACCGGG TTTGCTCCAG AAGTATGAGC GTCTCACACT CTTATCCAGC CGGTTCTACC TGCCGCTGTG GAACGGCTGG AAGGTTCAGC TCAGGCTCGG AGAGAAGCCC TTCTGGGGAG TGGCCTGAGG GCCTTCACCG ACTCCAGATA

TCTCCCTGGG TGCGGCGGCA ACATCATTGC ACGTTGAGGA TTTTGAAAGA GGATCTGTGT GGCTACCCTA ACCGGCAGAA GCCTGGCCAC TGTCCGTTCC CTGCCGTCTA AAGGAATCAG GTGAAGGAGT CCGAGTCTGA CCGTCGACGC ACATGATCGG GGGAGAAGTT CCCAGCAGCT CCAGCACCTG CCATCGACTC CCGACGGGGT GGAACAACAA CTAAAGATAT ACTTCCTTGA ATGCTCTGGC CCAAAGATAA CCTATGCCCT GTTGGCTCAA TGGTGTTCCA TTGATGTGTC AATCTGCCAG CTGAAGGAAA ATCAACTCAC AAAAGAGGCC GAGACCAGGA ACACAGATGA TGGACAAAGC CTGAGGATGA CTGGAGCAGT ATCCGGAAAA CTGAGGAGAA ACAAGGCCTG TGTCCAATGA ATGAGGTGCA TGAAGCTGGA AGAAGCCCAA AGGACGAATG AGAGCATGGT AAGCTTCAGT

CGAGGCGTGC GCACGCGCGG AGAAGAGAAC CTTGAAAGAG CTCCATCACC GGCAGACCCC CTCTGTTGTT CCAAGAGCTC CACCAAGAGG ATATGTCATC CCATCATTTC AATGAACAAA GCAGAAAGAG GACCAGAATT GGAACGGCTG CATGACGCCC CGGCCTAGAG GGCCTTCAGA GCTGACCGCC CCAAGTCCTC CTTCCAGGAG CGAGAAAGAC TTGCGAGGAG AGCCAACTAC CCAGATGGGC GAACGGCTGG CTTGGCCCTA TGAACAGAGA AGCCTTGGCT CCTCCAACTG CCTCCTGCGA AGGCCAAGGC CTGTAATAAA TCAGGATGCC TGCCACTATG CCTGAAGCAG CTTCTCCGAT CTGTCTAGCT CAAGGTCTAC GGAGGATGGA TTGCTTCATA TGAGCCAGGA CTTTGACGAG GGTTGGACAG GGAGAAGAAA CCTCAGCTAC CCAAGACGAA TGTCTTTGGG TATATCTC

2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900 3960 4020 4080 4140 4200 4260 4320 4380 4440 4500 4560 4620 4680 4740 4800 4860 4920 4980 5040

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TGA) and the polyadenylation signal (GATAAA) are indicated.

Genomic structure The gene spans 41 kb and comprises 41 exons.

[Figure: exon-intron map of the C3 gene]

Accession numbers Human: K02765 Mouse: K02782

Deficiency Autosomal recessive. Patients show an increased susceptibility to pyogenic infections, membranoproliferative glomerulonephritis and SLE-like illness. Mutations identified:
G to A, intron 18; splicing defect, loss of 61 bp of exon 18; one patient³⁴
G to T, intron 10; splicing defect, partial deletion of exon 10; one patient³⁵
G1705 to A; D549 to N; one patient³⁶
Deletion of exons 22 and 23; one patient³⁷

Polymorphic variants Two common variants (C3S and C3F)³⁸ and more than 20 rare variants have been identified on the basis of their relative electrophoretic mobilities.
C364G; R102G; C3S/C3F³⁹
A4915C; S992R; C3S/C3*S025⁴⁰
C1001T; P314L; -ve/+ve for monoclonal antibody HAV4-1³⁹
C4956T; silent⁴⁰

References
1 de Bruijn, M.H.L. et al. (1985) Proc. Natl Acad. Sci. USA 82, 708-712.
2 Tack, B.F. et al. (1976) Biochemistry 15, 4513-4521.
3 Perkins, S.J. et al. (1986) Eur. J. Biochem. 157, 155-168.
4 Nagar, B. et al. (1998) Science 280, 1277-1281.
5 Kohler, P.F. et al. (1967) J. Immunol. 99, 1211-1216.
6 Alper, C.A. et al. (1969) Science 163, 286-288.
7 Einstein, L.P. et al. (1977) J. Clin. Invest. 60, 963-969.
8 Botto, M. et al. (1992) J. Immunol. 149, 1348-1355.
9 Katz, Y. et al. (1989) Eur. J. Immunol. 19, 983-988.
10 Ueki, A. et al. (1987) Immunology 61, 11-14.
11 Warren, H.B. et al. (1987) Am. J. Pathol. 129, 9-13.
12 Legoedec, J. et al. (1995) Eur. J. Immunol. 25, 3460-3466.
13 Ruddy, S. et al. (1974) N. Engl. J. Med. 290, 1284-1288.
14 Pantazis, P. et al. (1990) Mol. Immunol. 27, 283-289.
15 Sundstrom, S.A. et al. (1989) J. Biol. Chem. 264, 16941-16947.
16 Barnum, S.R. et al. (1992) J. Neuroimmunol. 38, 275-282.
17 Sacks, S.H. et al. (1993) Clin. Exp. Immunol. 93, 411-417.
18 Brooimans, R.A. et al. (1991) J. Clin. Invest. 88, 379-384.
19 Sacks, S.H. et al. (1993) Immunology 79, 348-354.
20 Strunk, R. et al. (1988) J. Clin. Invest. 81, 1419-1426.
21 Choy, L.N. et al. (1992) J. Biol. Chem. 267, 12736-12741.
22 Hong, M.H. et al. (1991) Endocrinology 129, 2774-2779.
23 Molmenti, E.P. et al. (1993) J. Biol. Chem. 268, 14116-14124.
24 Wilson, D.R. et al. (1990) Mol. Cell. Biol. 10, 6181-6191.
25 Mitchell, T.J. (1996) J. Immunol. 156, 4429-4434.
26 Lambris, J.D. et al. (1988) J. Biol. Chem. 263, 12147-12150.
27 Lambris, J.D. et al. (1996) J. Immunol. 156, 4821-4832.
28 Becherer, J.D. et al. (1988) J. Biol. Chem. 263, 14586-14591.
29 Lambris, J.D. et al. (1985) Proc. Natl Acad. Sci. USA 82, 4235-4239.
30 Wright, S.D. et al. (1987) Proc. Natl Acad. Sci. USA 84, 1965-1968.
31 Whitehead, A.S. et al. (1982) Proc. Natl Acad. Sci. USA 79, 5021-5025.
32 Fong, K.Y. et al. (1990) Genomics 7, 579-586.
33 Vik, D.P. et al. (1991) Biochemistry 30, 1080-1085.
34 Botto, M. et al. (1990) J. Clin. Invest. 86, 1158-1163.
35 Huang, J.L. et al. (1994) Clin. Immunol. Immunopathol. 73, 267-273.
36 Singer, L. et al. (1994) J. Biol. Chem. 269, 28494-28499.
37 Botto, M. et al. (1992) Proc. Natl Acad. Sci. USA 89, 4957-4961.
38 Alper, C.A. et al. (1968) J. Clin. Invest. 47, 2181-2192.
39 Botto, M. et al. (1990) J. Exp. Med. 172, 1011-1017.
40 Hohler, T. et al. (1995) Hum. Genet. 96, 539-541.

C4

David E. Isenman, Department of Biochemistry, University of Toronto, Toronto, Canada

Other names None in human; in mouse formerly also known as Ss protein.

Physicochemical properties Human C4 is synthesized as a prepro single chain of 1744 amino acids, including a 19 amino acid leader sequence and two tetrabasic cleavage sites for a furin-like enzyme. The three chains of mature C4 are linked by disulfide bonds, and in the precursor molecule the chains are ordered β-α-γ. In primates, sheep and cattle there are two isotypes of C4, C4A and C4B, the A/B designations reflecting their more acidic or basic electrophoretic mobilities at alkaline pH.

Mature protein: pI ~7.9 (calculated for C4A; the value is allotype dependent)

                                               β chain     α chain                                   γ chain
Amino acids                                    20-675      680-1449 (secreted), 680-1427 (plasma)    1454-1744
Mr (K) predicted                               71.8        84.6                                      33.1
Mr (K) observed                                75          93                                        33
N-linked glycosylation sites (all occupied)    1 (226)     3 (862, 1328, 1391)
Interchain disulfide bonds                     β567        α820, α876, α1394                         γ(1588, 1590, 1595)?, γ1566

Interchain disulfide bonds: 1 β-α, 2 α-γ (pairings inferred from C3 data).
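The chain boundaries can be checked against the 1744-residue precursor length. In this sketch the two four-residue linker segments (676-679 and 1450-1453) are an assumption, inferred from the gaps between the listed chains and labelled with the RKKR and RRRR cleavage sites given in the sequence caption for this entry; `C4_SEGMENTS`, `segment_lengths` and `tiles_exactly` are hypothetical names.

```python
# Human C4 precursor segments (1-based, inclusive). Chain ranges are from this entry;
# the two linker ranges are inferred from the gaps between chains (an assumption).
C4_PRECURSOR_LENGTH = 1744
C4_SEGMENTS = [
    ("leader", 1, 19),
    ("beta", 20, 675),
    ("linker RKKR", 676, 679),
    ("alpha (secreted)", 680, 1449),
    ("linker RRRR", 1450, 1453),
    ("gamma", 1454, 1744),
]
PLASMA_ALPHA = (680, 1427)  # alpha chain after removal of the C-terminal peptide


def segment_lengths(segments):
    """Residue count of each named segment."""
    return {name: end - start + 1 for name, start, end in segments}


def tiles_exactly(segments, total):
    """True if the segments cover residues 1..total in order with no gaps or overlaps."""
    expected_start = 1
    for _name, start, end in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total + 1


if __name__ == "__main__":
    lengths = segment_lengths(C4_SEGMENTS)
    print(lengths)
    print("tiles precursor:", tiles_exactly(C4_SEGMENTS, C4_PRECURSOR_LENGTH))
    plasma_alpha = PLASMA_ALPHA[1] - PLASMA_ALPHA[0] + 1
    print("peptide removed from alpha:", lengths["alpha (secreted)"] - plasma_alpha)
```

The secreted α chain (680-1449, 770 residues) is 22 residues longer than the plasma form (680-1427, 748 residues), matching the 22-residue C-terminal peptide removed after secretion, as described under Function.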

Structure A high-resolution structure of C4 is not available. Electron microscopy as well as X-ray and neutron scattering studies suggest that C4 is a multidomain molecule. Its shape is that of an oblate ellipsoid of approximate dimensions 2 × 8 × 18 nm. The scattering data further suggest that C4c and C4d form independent domains. Recently, X-ray crystallographic analysis has shown C3d to fold into a highly helical structure known as an α-α barrel, with a convex surface at one end that contains the thioester-dependent covalent attachment site (see under Function below) and a concave depression at the opposite end of the barrel. Given the ~40% identity between C4d and C3d, and the fact that the models derived from the X-ray and neutron scattering studies of C3 are similar to those for C4, it seems highly likely that the C4d domain adopts a similar structure.

Function Following secretion into the plasma, an elastase-like metalloprotease removes a 22-residue peptide from the C-terminus of the α chain. Other post-translational modifications include formation of the intramolecular thioester bond, glycosylation and tyrosine sulfation. The chain structure of the mature plasma form of C4 and its degradation products are depicted below. C4 is proteolytically activated through both the classical pathway, by C1s, and the lectin pathway, by MASP-2. The resulting C4b fragment is a modulatory subunit of both the classical pathway C3 convertase (C4b2a) and the classical pathway C5 convertase (C3b4b2a). C2a can cleave C3 only when bound to C4b; similarly, C5 is cleaved efficiently by C4b-associated C2a only when C5 is bound to the C4b3b heterodimer. Thus, in terms of propagating the pathway, C4b has stable binding sites for C2, C5 and nascently activated C3b. It also has binding sites for the CCP domain-containing complement regulatory molecules C4BP, CR1, MCP and DAF, and for the regulatory serine protease factor I. The C4a activation peptide has weak anaphylatoxin activity (depending on the experimental system, ~100-1000-fold less potent than C3a); although it was originally thought to bind to the same receptor as C3a, more recent work suggests that C4a binds to a distinct receptor.

The thioester formed between the side chains of Cys1010 and Gln1013 within the C4d region of the α chain mediates covalent attachment to the target surface bearing activated forms of C1s or MASP. The human C4A and C4B isotypes differ not only in their preference for acceptor nucleophiles (amino groups for C4A, hydroxyl groups for C4B) but also in the mechanism by which transacylation is accomplished. In nascently activated C4b of the A isotype, amino-group nucleophiles are believed to attack the thioester carbonyl directly. In C4B, however, a histidine at residue 1125 replaces the non-nucleophilic aspartic acid of C4A, so activation first leads to the formation of an intramolecular acyl-imidazole intermediate by transacylation of the thioester carbonyl to the ring nitrogen of H1125. The released thiolate anion can then act as a general base to catalyse the attack of a hydroxyl-bearing nucleophile on the highly reactive acyl-imidazole intermediate. Species that have only one functional isotype of C4 have the equivalent of the C4B isotype and can thus undergo the catalysed form of the transacylation reaction. Target-bound C4b is a ligand of CR1, also known as the immune adherence receptor; there is evidence that C4b of the A isotype is a somewhat better ligand for this receptor than C4b of the B isotype. All functional interaction sites of C4b are lost when a C4b-cofactor complex (i.e. C4b-C4BP, C4b-MCP or C4b-CR1) is cleaved by factor I C-terminal to residues 956 and 1336 to yield fragments C4c and C4d.

[Figure: C4 degradation pathway. C1s or MASP-2 cleaves the α chain of C4 (β, α and γ chains, with the thioester site in the α chain) to release C4a and form C4b. Factor I with a cofactor (C4BP, CR1, MCP) then cleaves C4b to release C4d, leaving C4c (β and γ chains and two α-chain fragments, α3 and α4).]

Tissue distribution Serum protein: ~0.6 mg/ml. Primary site of synthesis: liver. Secondary sites: monocytes, macrophages, mammary gland, lung, spleen, kidney, brain, testis and intestinal epithelial cells.

Regulation of expression C4 is an acute-phase protein that is upregulated by IFNγ in hepatocytes, monocytes and intestinal epithelial cells. IFNγ increases the stability of C4 mRNA²². Upregulation of C4 synthesis in intestinal epithelial cells is also mediated by IL-6. No other cytokines have been reported to affect C4 expression. LPS downregulates C4 synthesis and counteracts the upregulation produced by IFNγ.


Protein sequence23 MRLLWGLIWA WKGSVFLRN QLLRGPEVQL NPGQRVRYRV DFVIPDISEP YILTVPGHLD SQTKLVNGQS EMEEAELTSW IPVKVSATVS GSPHPAIARL TFSHYYYMIL DHPVANSLRV LGALDTALYA GLAFSDGDQW RCCQDGVTRL QAGLQRALEI DSLTTWEIHG LRPVLYNYLD WPTAAAAVS LDHRGRTLEI ASLLRLPRGC KGYMRIQQFR QETSNWLLSQ HGLAVFQDEG TLTKAPVDLL SDPMPQAPAL STQDTVIALD IRGLEEELQF VTVKGHVEYT RRREAPKWE LEKLTSLSDR PASATLYDYY ALERGLQDED LHFTKDVKAA YLLDSNSWIE

SSFFTLSLQK PSRNNVPCSP VAHSPWLKDS FALDQKMRPS GTWKISARFS EMQLDIQARY HISLSKAEFQ YFVSSPFSLD SPGSVPEVQD TVAAPPSGGP SRGQIVFMNR DVQAGACEGK AGSKSHKPLN TLSRKRLSCP PMMRSCEQRA LQEEDLIDED LSLSKTKGLC KNLTVSVHVS LKWARGSFE PGNSDPNMIP GEQTMIYLAP KADGSYAAWL QQADGSFQDP AEPLKQRVEA GVAHNNLMAM WIETTAYALL ALSAYWIASH SLGSKINVKV MEANEDYEDY EQESRVHYTV YVSHFETEGP NPERRCSVFY GYRMKFACYY ANQMRNFLVR EMPSERLCRS

PRLLLFSPSV KVDFTLSSER LSRTTNIQGI TDTITVMVEN DGLESNSSTQ lYGKPVQGVA DALEKLNMGI LSKTKRHLVP IQQNTDGSGQ GFLSIERPDS EPKRTLTSVS LELSVDGAKQ MGKVFEAMNS KEKTTRKKRN ARVQQLDCRE DIPVRSFFPE VATPVQLRVF PVEGLCLAGG FPVGDAVSKV DGDFNSYVRV TLAASRYLDK SRDSSTWLTA CPVLDRSMQG SISKANSFLG AQETGDNLYW HLLLHEGKAE TTEERGLNVT GGNSKGTLKV EYDELPAKDD CIWRNGKVGL HVLLYFDSVP GAPSKSRLLA PRVEYGFQVK ASCRLRLEPG TRQRAACAQL

VHLGVPLSVG DFALLSLQVP NLLFSSRRGH SHGLRVRKKE FEVKKYVLPN YVRFGLLDED TDLQGLRLYV GAPFLLQALV VSIPIIIPQT RPPRVGDTLN VFVDHHLAPS YRNGESVKLH YDLGCGPGGG VNFQKAINEK PFLSCCQFAE NWLWRVETVD REFHLHLRLP GGLAQQVLVP LQIEKEGAIH TASDPLDTLG TEQWSTLPPE FVLKVLSLAQ GLVGNDETVA EKASAGLLGA GSVTGSQSNA MADQAAAWLT LSSTGRNGFK LRTYNVLDMK PDAPLQPVTP SGMAIADVTL TSRECVGFEA TLCSAEVCQC VLREDSRAAF KEYLIMGLDG NDFLQEYGTQ

VQLQDVPRGQ LKDAKSCGLH LFLQTDQPIY VYMPSSIFQD FEVKITPGKP GKKTFFRGLE AAAIIEYPGG REMSGSPASG ISELQLSVSA LNLRAVGSGA FYFVAFYYHG LETDSLALVA DSALQVFQAA LGQYASPTAK SLRKKSRDKG RFQILTLWLP MSVRRFEQLE AGSARPVAFS REELVYELNP SEGALSPGGV TKDHAVDLIQ EQVGGSPEKL LTAFVTIALH HAAAITAYAL VSPTPAPRNP RQGSFQGGFR SHALQLNNRQ NTTCQDLQIE LQLFEGRRNR LSGFHALRAD VQEVPVGLVQ AEGKCPRQRR RLFETKITQV ATYDLEGHPQ GCQV

The leader sequence and the cleavage sites (RKKR and RRRR) between the β-α and α-γ chains, respectively, are underlined. N-linked glycosylation sites (all utilized) are indicated (N), and residues C1010 and Q1013, whose side-chains form the thioester bond, are double underlined.

Protein modules
1-19 Leader peptide
20-675 β chain
680-1449 α chain
1454-1744 γ chain
680-756 C4a anaphylatoxin
757-956 α3 fragment of C4c
957-1336 C4d fragment
1337-1427 α4 fragment of C4c
1010-1013 CG/AEQ thioester motif common to C3, C4 and α2-macroglobulin of all species
1120-1125 Human isotype-specific sequence: C4A = PCPVLD, C4B = LSPVIH
474-488 Putative location within the β chain of a site of interaction with C5²⁴
757-845 Putative location within the α3 fragment of a site of interaction with C4BP²⁵
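As a quick consistency check on the module ranges listed above, the sketch below computes the length of each chain and fragment and the residues left between adjacent chains. The dictionary is simply a transcription of the listed ranges, and the helper names are illustrative. The two 4-residue gaps it reports correspond to the tetrabasic RKKR and RRRR cleavage sites described in the sequence legend.

```python
# Residue ranges (1-based, inclusive) for human pro-C4, transcribed from the list above.
C4_MODULES = {
    "leader peptide": (1, 19),
    "beta chain": (20, 675),
    "alpha chain": (680, 1449),
    "gamma chain": (1454, 1744),
    "C4a anaphylatoxin": (680, 756),
    "alpha3 fragment of C4c": (757, 956),
    "C4d fragment": (957, 1336),
    "alpha4 fragment of C4c": (1337, 1427),
}


def n_residues(span: tuple[int, int]) -> int:
    """Number of residues in an inclusive 1-based range."""
    start, end = span
    return end - start + 1


def gap_between(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """Residues lying between two ranges (here, a linker excised during processing)."""
    return (left[1] + 1, right[0] - 1)


beta_alpha = gap_between(C4_MODULES["beta chain"], C4_MODULES["alpha chain"])
alpha_gamma = gap_between(C4_MODULES["alpha chain"], C4_MODULES["gamma chain"])

for name, span in C4_MODULES.items():
    print(f"{name:24s} {span[0]:>5}-{span[1]:<5} {n_residues(span):>4} aa")
print("beta/alpha linker (RKKR):", beta_alpha, n_residues(beta_alpha), "aa")
print("alpha/gamma linker (RRRR):", alpha_gamma, n_residues(alpha_gamma), "aa")
```

For example, the β chain comes out at 656 residues and C4a at 77; both linkers span 4 residues.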

The isotype-specific residues, together with four other residues that generally, although not absolutely, segregate as a set with either C4A or C4B, produce the serological antigenic determinants of human C4 known as Chido (Ch1, Ch2, Ch3, Ch4, Ch5, Ch6) and Rodgers (Rg1, Rg2). Shown below are the residues that have been deduced to form the various Ch and Rg antigenic determinants²⁶. Rg1, Rg2, Ch1, Ch4, Ch5 and Ch6 are linear sequence determinants, whereas Ch2 and Ch3 are conformational determinants.

[Figure: polymorphic residues of C4A (isotypic sequence PCPVLD) and C4B (isotypic sequence LSPVIH), grouped into C4d polymorphic clusters I-IV, showing the residues that form the Rodgers (Rg) and Chido (Ch) determinants.]
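The isotype-specific residues at positions 1120-1125 are enough to distinguish C4A from C4B in a given pro-C4 sequence. The following is a minimal sketch. The function name is hypothetical, it assumes a one-letter sequence numbered from the initiator methionine as in the protein sequence listed above, and it deliberately ignores the four additional co-segregating residues that contribute to the Ch/Rg determinants.

```python
# Isotype-specific sequences at residues 1120-1125 of human pro-C4 (from the text).
ISOTYPE_MOTIFS = {"PCPVLD": "C4A", "LSPVIH": "C4B"}


def classify_c4_isotype(precursor: str) -> str:
    """Return 'C4A', 'C4B' or 'unknown' from residues 1120-1125 of a pro-C4 sequence.

    Hypothetical helper: `precursor` is a one-letter sequence whose first
    character is residue 1 (the initiator Met).
    """
    window = precursor[1119:1125]  # 0-based slice covering residues 1120-1125
    return ISOTYPE_MOTIFS.get(window.upper(), "unknown")


# Toy input: placeholder residues with the C4A motif placed at 1120-1125.
toy = "X" * 1119 + "PCPVLD" + "X" * 619
print(classify_c4_isotype(toy))  # C4A
```

A sequence that is too short, or that carries neither motif at that position, returns "unknown" rather than guessing.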

Chromosomal location²⁷ Human: 6p21.3, within the class III region of the MHC. Telomere ... HLA-B ... C2 ... Bf ... C4A ... CYP21A ... C4B ... CYP21B ... HLA-DR. Mouse: chromosome 17, locus position 18.8, within the class III region of the MHC. Telomere ... H2-D ... C2 ... Bf ... Slp ... CYP21A ... C4 ... CYP21B ... H2-Ea. Slp is 94% identical to mouse C4, but because it is not readily cleaved by C1s, its function in complement is questionable.

cDNA sequence23,28 AGAAGGTAGC CTCTGGGCAT CCTTCTGTGG CGAGGACAGG TGCTCCCCAA CAGGTGCCCT GTCCAGCTGG CAGGGTATCA CCCATTTACA CGCCCGAGCA AAGAAGGAGG TCAGAGCCAG AGCACCCAGT GGAAAGCCCT GCCAGGTACA GATGAGGATG GGACAGAGCC ATGGGCATTA CCAGGTGGGG TCCTTGGATC GCCTTGGTCC ACGGTGTCTT AGCGGCCAAG GTATCTGCAG GGAGGCCCCG ACTCTGAACC ATGATCCTAT TCGGTCTCGG TACCATGGAG GAGGGCAAGC AAGCTCCACT CTGTATGCTG ATGAACAGCT CAGGCAGCGG AGCTGTCCCA AATGAGAAAT ACACGTCTGC TGCCGGGAGC GACAAGGGCC GATGAGGATG ACAGTGGACC ATCCATGGCC CGGGTGTTCC CAGCTGGAGC CACGTGTCCC CTGGTGCCTG GCTGTGTCTC TCCAAGGTTC CTCAACCCCT ATGATCCCTG ACTTTAGGCT CGAGGCTGTG CTGGACAAGA CTGATCCAGA GCTTGGTTGT

AGACAGACAG CCAGCTTCTT TTCATCTGGG TAGTGAAAGG AGGTGGACTT TGAAAGATGC TGGCCCATTC ACCTGCTCTT ACCCTGGCCA CTGACACCAT TGTACATGCC GGACCTGGAA TTGAGGTGAA ACATCCTGAC TCTATGGGAA GTAAGAAGAC ACATTTCCCT CTGACCTCCA AGATGGAGGA TTAGCAAGAC GTGAGATGTC CTCCTGGGTC TCAGCATTCC GCTCCCCACA GGTTTCTGTC TGAACTTGCG CCCGAGGGCA TGTTTGTGGA AGCACCCAGT TGGAGCTCAG TAGAAAGCGA CAGGCAGCAA ATGACCTCGG GCCTGGCCTT AGGAGAAGAC TGGGTCAGTA CCATGATGCG CCTTCCTGTC AGGCGGGCCT ACATTCCCGT GCTTTCAAAT TGAGCCTGTC GCGAGTTCCA TGCGGCCTGT CAGTGGAGGG GGGGCTGTGC TGAAGGTGGT TGCAGATTGA TGGACCACCG ATGGGGACTT GTGAGGGGGC GGGAGCAAAC CAGAGCAGTG AAGGCTACAT CACGGGACAG

ACGGATCTAA CACCTTATCT GGTCCCCCTA ATCAGTGTTC CACCCTTAGC GAAGAGCTGT GCCATGGCTA CTCCTCTCGC GCGGGTTCGG CACAGTCATG CTCGTCCATC GATCTCAGCC GAAATATGTC GGTGCCAGGC GCCAGTGCAG TTTCTTTCGG CTCAAAGGCA GGGGCTGCGC GGCAGAGCTC CAAGCGACAC AGGCTCCCCA TGTTCCTGAA AATAATTATC TCCAGCGATA TATTGAGCGG AGCCGTGGGC GATCGTGTTC CCATCACCTG GGCCAACTCC CGTGGACGGT CTCCCTAGCC GTCCCACAAG CTGTGGTCCT TTCTGATGGA AACCCGGAAA TGCTTCCCCG TTCCTGCGAG CTGCTGCCAA CCAACGAGCC GCGCAGCTTC ATTGACACTG CAAAACCAAA CCTGCACCTC CCTCTATAAC GCTGTGCCTG CCGGCCTGTT GGCTCGAGGG GAAGGAAGGG AGGCCGGACC TAACAGCTAC CTTGTCACCA CATGATCTAC GAGCACACTG GCGGATCCAG CAGCACCTGG

CCTCTCTTGG CTGCAGAAGC TCGGTGGGGG CTGAGAAACC TCAGAAAGAG GGCCTCCATC AAGGACTCTC CGGGGGCACC TACCGGGTCT GTGGAGAACT TTCCAGGATG CGATTCTCAG CTTCCCAACT CATCTTGATG GGGGTGGCAT GGGCTGGAGA GAGTTCCAGG CTCTACGTTG ACATCCTGGT CTTGTGCCTG GCTTCTGGCA GCCCAGGACA CCTCAGACCA GCCAGGCTCA CCGGATTCTC AGTGGGGCCA ATGAATCGAG GCACCCTCCT CTGCGAGTGG GCCAAGCAGT CTGGTGGCGC CCCCTCAAGA GGGGGTGGGG GACCAGTGGA AAGAGAAACG ACAGCCAAGC CAGCGGGCAG TTTGCTGAGA CTGGAGATCC TTCCCAGAGA TGGCTCCCCG GGCCTATGTG CGCCTGCCCA TACCTGGATA GCTGGGGGCG GCCTTCTCTG TCCTTCGAAT GCCATCCATA TTGGAAATAC GTCAGGGTTA GGAGGCGTGG TTGGCTCCGA CCTCCCGAGA CAGTTTCGGA CTCACAGCCT

ATCCTCCAGC CCAGGTTGCT TGCAGCTCCA CATCTCGTAA ACTTCGCACT AACTCCTCAG TGTCCAGAAC TCTTTTTGCA TTGCTCTGGA CTCACGGCCT ACTTTGTGAT ATGGCCTGGA TTGAGGTGAA AAATGCAGTT ATGTGCGCTT GTCAGAGCAA ACGCCCTGGA CTGCAGCCAT ATTTTGTGTC GGGCCCCCTT TTCCTGTCAA TTCAGCAAAA TCTCAGAGCT CTGTGGCAGC GACCTCCTCG CCTTTTCTCA AGCCCAAGAG TCTACTTTGT ATGTCCAGGC ACCGGAACGG TGGGAGCCTT TGGGCAAGGT ACAGTGCCCT CCTTATCCAG TGAACTTCCA GCTGCTGCCA CCCGCGTGCA GTCTGCGCAA TGCAGGAGGA ACTGGCTCTG ACTCTCTGAC TGGCCACCCC TGTCTGTCCG AAAACCTGAC GAGGGCTGGC TGGTGCCCAC TCCCTGTGGG GAGAGGAGCT CTGGCAACTC CAGCCTCAGA CCTCCCTCTT CACTGGCTGC CCAAGGACCA AGGCGGATGG TTGTGTTGAA

CATGAGGCTG CTTGTTCTCT GGATGTGCCC TAATGTCCCC CCTCAGTCTC AGGCCCTGAG GACAAACATC GACGGACCAG TCAGAAGATG GGGGGTGGGG CCCAGACATC ATCCAACAGC GATCACCCCT AGACATCCAG TGGGCTCCTA GCTGGTGAAT GAAGCTGAAT CATTGAGTCT ATCTCCCTTC CCTGCTGCAG AGTTTCTGCC CACAGACGGG GCAGCTGTCA CCCACCTTCA TGTTGGGGAC TTACTACTAC GACCCTGACC GGCCTTCTAC TGGGGCCTGC GGAGTCCGTG GGACACAGCT CTTTGAAGCT TCAGGTGTTC AAAGAGACTA AAAGGCGATT GGATGGGGTG GCAGCCGGAC GAAGAGCAGG GGACCTGATT GAGAGTGGAA CACGTGGGAG AGTCCAGCTC CCGCTTTGAG TGTGAGCGTC CCAGCAGGTG GGCAGCCGCC AGATGCGGTG GGTCTATGAA TGATCCCAAT TCCATTGGAC GAGGCTTCCT TTCCCGGTAC CGCCGTGGAT TTCCTATGCG GGTCCTGAGT

TTGGCCCAGG CTGTCCCAGC ATGCAGGGGG GCCCTTCATC GTGGAAGCCT CTGGGTGCCC GACCTGCTCG CTGTACTGGG CGCAACCCAT GCCCTGCTGC TGGCTCACCC GCCCTGGATG AATGTGACTC AACCGCCAGA GTGAAGGTGG GACATGAAGA GAGTACACGA AAGGATGACC AGGAACCGCC TACACCGTGT GTCACCCTCC TCTGACCGTT TCGGTCCCCA CTGGTGCAGC GTGTTTTACG TGCCAGTGTG GACGAGGATG CAGGTTAAGG ACCCAAGTCC CTGGTTCGAG CTGGATGGGG TGGATCGAGG GCCCAGCTCA CCCTCCCACC GTGTCCGCTT GTGTCAGTGT

AGCAGGTAGG AGCAGGCTGA GTTTGGTGGG ATGGGCTGGC CCATCTCAAA ACGCAGCTGC GTGTTGCCCA GCTCAGTCAC CCGACCCCAT ACCTCCTGCT GTCAGGGCAG CCCTGTCTGC TCAGCTCCAC TTCGCGGCCT GAGGAAACAG ACACGACCTG TGGAAGCAAA CAGATGCCCC GCAGGAGGGA GCATCTGGCG TGAGTGGATT ACGTGAGTCA CCTCCCGGGA CGGCCAGCGC GGGCACCAAG CTGAGGGGAA GCTACAGGAT TTCTCCGAGA TGCACTTCAC CCTCCTGCCG CCACCTATGA AGATGCCCTC ACGACTTCCT TCCGCTGGGA TCATGAACAC TGGC

AGGCTCGCCT CGGCTCGTTC CAATGATGAG CGTCTTCCAG GGCAAACTCA CATCACGGCC CAACAACCTC TGGTTCTCAG GCCCCAGGCC TCACGAGGGC CTTCCAAGGG CTACTGGATT AGGCCGGAAT GGAGGAGGAG CAAAGGAACC CCAGGACCTA CGAGGACTAT TCTGCAGCCC GGCGCCCAAG GAACGGCAAG CCACGCCCTG CTTTGAGACC GTGCGTGGGC AACCCTGTAC TAAGAGCAGA GTGCCCTCGC GAAGTTTGCC AGACAGCAGA CAAGGATGTC CCTTCGCTTG CCTCGAGGGA TGAACGCCTG CCAGGAGTAT GGAACCTGAA AGCCTGGGAC

GAGAAACTGC CAGGACCCCT ACTGTGGCAC GATGAGGGTG TTTTTGGGGG TATGCCCTGT ATGGCAATGG AGCAATGCCG CCAGCCCTGT AAAGCAGAGA GGATTCCGCA GCCTCCCACA GGGTTCAAGT CTGCAGTTTT CTGAAGGTCC CAGATAGAAG GAGGACTATG GTGACACCCC GTGGTGGAGG GTGGGGCTGT CGTGCTGACC GAGGGGCCCC TTTGAGGCTG GACTACTACA CTCTTGGCCA CAGCGTCGCG TGCTACTACC GCTGCTTTCC AAGGCCGCTG GAACCTGGGA CACCCCCAGT TGCCGGAGCA GGCACTCAGG CCTGGGAACC CAGGGCATAT

AGGAGACATC GTCCAGTGTT TCACAGCCTT CAGAGCCATT AGAAAGCAAG CACTGACCAA CCCAGGAGAC TGTCGCCCAC GGATTGAAAC TGGCAGACCA GTACCCAAGA CCACTGAGGA CCCACGCGCT CCTTGGGCAG TTCGTACCTA TGACAGTCAA AGTACGATGA TGCAGCTGTT AGCAGGAGTC CTGGCATGGC TGGAGAAGCT ACGTCCTGCT TGCAGGAAGT ACCCCGAGCG CCTTGTGTTC CCCTGGAGCG CCCGTGTGGA GCCTCTTTGA CTAATCAGAT AAGAATATTT ACCTGCTGGA CCCGCCAGCG GGTGCCAGGT ATGAAGCTGG TAAAGGCTTT

TAACTGGCTT AGACAGGAGC TGTGACCATC GAAGCAGAGA TGCTGGGCTC GGCGCCTGTG TGGAGATAAC CCCGGCTCCT CACAGCCTAC GGCTTCGGCC CACGGTGATT GAGGGGTCTC GCAGCTGAAC CAAGATCAAT CAATGTCCTG AGGCCACGTC GCTTCCAGCC TGAGGGTCGG CAGGGTGCAC CATCGCGGAC GACCTCCCTC GTATTTTGAC GCCGGTGGGG CAGATGTTCT TGCTGAAGTC GGGTCTGCAG GTACGGCTTC GACCAAGATC GCGCAACTTC GATCATGGGT CTCGAATAGC GGCAGCCTGT GTGAGGGCTG AAGCACTGCT TGGCAGCAAA

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TGA) and the polyadenylation signal (ATTAAA) are indicated.

Genomic structure Human C4A and C4B are normally tandem loci separated by ~10 kb. Whereas C4A is ~21 kb in length, C4B can be either 14.6 kb or 21 kb long. The size difference is due solely to a larger intron 9 in the long C4B gene, which in turn results from the insertion of a ~6.4 kb retroposon element within this intron³¹. Both human genes, as well as the mouse C4 gene, consist of 41 exons.

[Figure: exon-intron map of the C4 gene (exons 1-41), indicating the ~6.4 kb intron insertion present in C4A and long C4B genes.]

Accession numbers
Human C4A²³,²⁹: cDNA K02403*; genomic M59815, M59816
Human C4B³⁰: genomic AF019413, U24578
Mouse C4 (B10.WR)³²,³³: cDNA M11729; genomic J05095
Xenopus C4³⁴: cDNA D78003
Chicken C4: genomic AL023516

* K02403 provides the cDNA for human precursor C4A as originally published. This sequence is not fully correct: it lacks part of the leader peptide and 9 nucleotides (GACTATGAG) encoding a repeated DYE peptide sequence near the C-terminus of the α chain, a sequence that has been found in other C4A and C4B clones and at the protein level. Both segments have been corrected in the human C4A cDNA presented above. The protein sequence presented above, as well as the protein sequence links available from the C4A gene accession numbers, have been corrected for these problems. The protein sequence link for human C4B from accession number AF019413 is full length, whereas that from U24578 is not and contains sequence ambiguities.
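The 9-nucleotide segment restored above can be translated with the standard genetic code to confirm that it encodes the DYE tripeptide. A minimal sketch in plain Python; the table is built in the usual TCAG order, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA segment (length a multiple of 3)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("GACTATGAG"))  # DYE
```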

Deficiency Given the tandem arrangement of the C4 loci, most individuals express two C4A and two C4B alleles. The population frequencies of non-expression of 1, 2 or 3 of these alleles are about 35%, 8-10% and 1%, respectively, regardless of racial background. Non-expression of all four alleles is extremely rare, but most individuals who are totally C4-deficient have systemic or discoid lupus erythematosus, immune complex disease of the kidney and increased susceptibility to infections. Even partial deficiencies, especially at the C4A locus, predispose individuals to immune complex disease. Null alleles for C4 have three origins: (1) about 40% are due to deletion of the entire gene; (2) some C4A null alleles are due to the presence of pseudogenes at this locus; and (3) non-deleted C4B nulls carry C4A isotype-specific sequences and thus probably represent gene conversions.
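Counting non-expressed alleles, as in the population frequencies above, can be sketched as follows. For illustration only, this assumes that each haplotype carries exactly one C4A and one C4B gene and that null alleles are written with the "Q0" suffix commonly used for C4 nulls (e.g. C4AQ0). Individuals with other C4 gene copy numbers are not handled, and the function name is hypothetical.

```python
from collections import Counter


def count_null_alleles(haplotypes: list[tuple[str, str]]) -> Counter:
    """Count null C4A and C4B alleles across haplotypes.

    Each haplotype is a (C4A allele, C4B allele) pair; an allele name ending
    in 'Q0' is treated as non-expressed (an encoding assumption).
    """
    nulls = Counter()
    for c4a, c4b in haplotypes:
        if c4a.endswith("Q0"):
            nulls["C4A"] += 1
        if c4b.endswith("Q0"):
            nulls["C4B"] += 1
    return nulls


genotype = [("C4A3", "C4B1"), ("C4AQ0", "C4B1")]
n = count_null_alleles(genotype)
print(n["C4A"], n["C4B"], sum(n.values()))  # 1 0 1
```

This individual would fall into the "one non-expressed allele" class, the commonest partial deficiency.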

Polymorphic variants In addition to the non-expressed alleles at both C4 loci, humans have at least 13 allelic polymorphisms of C4A and 16 of C4B³⁷. Within an isotype, allotypes are numbered in increasing order of their anodal migration at alkaline pH. The two most common allotypes in all ethnic groups are C4A3 and C4B1. The isotype-specific polymorphic residues have been identified within the C4d region, as have some allotype-specific residues in all three chains. In most cases allotypic amino acid interchanges are without functional consequence. An exception is C4A6, in which β chain residue R477 is replaced by W (as a result of a C to T transition in the first nucleotide of the codon), so that the molecule loses the ability to bind C5³⁸,³⁹.
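The statement that R477W results from a C to T transition at the first codon position can be checked against the standard genetic code. Of the four arginine codons that begin with C, only CGG becomes a tryptophan codon (TGG); the others give cysteine or a stop codon. The sketch below performs that check; the helper name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def c_to_t_first_position(codon: str) -> str:
    """Apply a C->T transition at the first base of a codon."""
    if codon[0] != "C":
        raise ValueError("first base is not C")
    return "T" + codon[1:]


# Arginine codons starting with C, and what each becomes after the transition.
arg_codons = [c for c, aa in CODON_TABLE.items() if aa == "R" and c.startswith("C")]
outcomes = {c: CODON_TABLE[c_to_t_first_position(c)] for c in arg_codons}
print(outcomes)  # {'CGT': 'C', 'CGC': 'C', 'CGA': '*', 'CGG': 'W'}
```

The reported substitution is therefore consistent with a CGG codon at β chain position 477 in the non-C4A6 allele.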

References
1 Dodds, A.W. and Law, S.K.A. (1990) Biochem. J. 265, 495-502.
2 Chan, A.C. and Atkinson, J.P. (1985) J. Immunol. 134, 1790-1798.
3 Janatova, J. (1986) Biochem. J. 233, 819-825.
4 Seya, T. et al. (1986) J. Immunol. 136, 4152-4156.
5 Dolmer, K. and Sottrup-Jensen, L. (1993) FEBS Lett. 315, 85-90.
6 Smith, C.A. et al. (1984) J. Exp. Med. 159, 324-329.
7 Perkins, S.J. et al. (1990) Biochemistry 29, 1167-1175.
8 Nagar, B. et al. (1998) Science 280, 1277-1281.
9 Perkins, S.J. et al. (1990) Biochemistry 29, 1175-1180.
10 Hortin, G. et al. (1986) J. Biol. Chem. 261, 9065-9069.
11 Takata, Y. et al. (1987) J. Exp. Med. 165, 1494-1507.
12 Kim, U.K. et al. (1992) J. Biol. Chem. 267, 4171-4176.
13 Hugli, T.E. et al. (1983) Mol. Immunol. 20, 637-645.
14 Murakami, Y. et al. (1993) Immunol. Lett. 36, 301-304.
15 Tornetta, M.A. et al. (1997) J. Immunol. 158, 5277-5282.
16 Law, S.K.A. and Dodds, A.W. (1997) Protein Sci. 6, 263-274.
17 Reilly, B.D. and Mold, C. (1997) Clin. Exp. Immunol. 110, 310-316.
18 Cox, B.J. and Robins, D.M. (1988) Nucleic Acids Res. 16, 6857-6870.
19 Whaley, K. (1980) J. Exp. Med. 151, 501-516.
20 Andoh, A. et al. (1993) J. Immunol. 151, 4239-4247.
21 Kulics, J. et al. (1990) J. Clin. Invest. 85, 943-949.
22 Mitchell, T.J. (1996) J. Immunol. 156, 4429-4434.
23 Belt, K.T. et al. (1984) Cell 36, 907-914.
24 Ebanks, R.O. et al. (1995) J. Immunol. 154, 2808-2820.
25 Hessing, M. et al. (1990) FEBS Lett. 271, 131-136.
26 Yu, C.Y. et al. (1988) Immunogenetics 27, 399-405.
27 Campbell, R.D. et al. (1988) Annu. Rev. Immunol. 6, 161-195.
28 Belt, K.T. et al. (1985) Immunogenetics 21, 173-180.
29 Yu, C.Y. (1991) J. Immunol. 146, 1057-1066.
30 Ulgiati, D. et al. (1996) Immunogenetics 43, 250-252.
31 Dangel, A.W. et al. (1994) Immunogenetics 40, 425-436.
32 Ogata, R.T. et al. (1989) J. Biol. Chem. 264, 16565-16572.
33 Sepich, D.S. et al. (1985) Proc. Natl Acad. Sci. USA 82, 5895-5899.
34 Mo, R. et al. (1996) Immunogenetics 43, 360-369.
35 Hauptmann, G. et al. (1988) Immunodeficiency Rev. 1, 3-22.
36 Braun, L. et al. (1990) J. Exp. Med. 171, 129-140.
37 Mauff, G. et al. (1990) Complement Inflamm. 7, 261-268.
38 Anderson, M.J. et al. (1992) J. Immunol. 148, 2795-2802.
39 Ebanks, R.O. et al. (1992) J. Immunol. 148, 2803-2811.

Rick A. Wetsel, University of Texas-Houston Institute of Molecular Medicine, Houston, TX, USA

Physicochemical properties C5 is synthesized as an intracellular single-chain precursor, pro-C5, of 1676 amino acids, including an 18-amino-acid leader peptide and an arginine-rich linker sequence (RPRR) located between the N-terminal β chain and the C-terminal α chain. The signal and linker peptides are removed from the promolecule during secretion, yielding the mature two-chain molecule, which is held together by disulfide bond(s) and non-covalent forces.
Mature protein: pI 4.7-5.5; 6.0 (calculated)
β chain: residues 19-673; Mr (K) 73.3 predicted, 75.0 observed
α chain: residues 678-1676; Mr (K) 112.5 predicted, 115.0 observed; 1 N-linked glycosylation site (741)
Interchain disulfide bonds: 1, β-α (putative; residues 567 and 810)

Structure Electron microscopy, neutron and X-ray scattering, and physicochemical studies indicate that C5 is an asymmetrical two-domain molecule with a multilobal, irregular ultrastructure of dimensions 10.4 × 14 × 16.8 nm. NMR modelling of the C5a peptide suggests a drumstick shape with high helical content up to residue 70; the C-terminal segment containing the activation domain appears as a dynamic random coil.

Function During complement activation, both C5 convertases cleave C5 at a specific site within the α chain, R751-L752, into the C5a and C5b fragments. After activation, the C5a peptide (Mr (K) 11.2; 74 amino acids), derived from the N-terminus of the α chain (residues 678-751), is released from the parent molecule. C5a is a potent anaphylatoxin, causing smooth muscle contraction, increased vascular permeability, basophil and mast cell degranulation, and lysosomal enzyme release. In addition, C5a stimulates the directed migration of neutrophils, eosinophils, basophils and monocytes. Recent studies indicate that C5a also modulates the hepatic acute-phase gene response and augments the overall immune response by increasing the synthesis of TNF-α, IL-1β, IL-6 and IL-8 by certain cell types. The plasma enzyme carboxypeptidase N (EC 3.4.12.7) removes the C-terminal arginine from C5a, forming the C5a desArg derivative. On a molar basis, human C5a desArg expresses only 1% of the anaphylatoxic activity and 1% of the PMN chemotactic activity of undigested C5a. The C5b fragment initiates the assembly of the membrane attack complex (MAC), which mediates cytolysis of viral and bacterial pathogens.

[Figure: C5 chain structure, showing the β chain and the α chain with the N-terminal C5a region; scale bar 400 aa.]

Tissue distribution Serum protein: 75 µg/ml in plasma¹². Primary site of synthesis: liver (hepatocytes). Secondary sites: lung, spleen, fetal intestine, monocytes, macrophages and type II alveolar cells.
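Serum levels given as mass concentrations are often needed in molar terms. Because 1 µg/ml equals 1 mg/l, the molar concentration in µM is simply the mass concentration in µg/ml divided by Mr in kDa. The sketch below applies this to C5, taking Mr as the sum of the observed β (75.0 K) and α (115.0 K) chain values from the physicochemical properties above. The 75 µg/ml input is illustrative, and the function name is hypothetical.

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mr_kda: float) -> float:
    """Convert a mass concentration (ug/ml) to a molar concentration (uM).

    1 ug/ml = 1 mg/l, and (mg/l) / (g/mol) gives mmol/l, so
    uM = 1000 * (mg/l) / (g/mol) = (ug/ml) / (Mr in kDa).
    """
    return conc_ug_per_ml / mr_kda


# Mature C5: observed Mr of the beta chain plus the alpha chain.
C5_MR_KDA = 75.0 + 115.0
print(round(ug_per_ml_to_micromolar(75, C5_MR_KDA), 2))  # 0.39
```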

Regulation of expression The regulated expression of C5 has not been explored in depth. In the studies performed, C5 expression was not markedly regulated by inflammatory stimuli. Cytokines such as IL-6 and IL-1β, which regulate the hepatic expression of numerous acute-phase proteins, do not affect C5 expression.

Protein sequence3,13 MGLLGILCFL DATISIKSYP YVYLEWSKH DLKPAKRETV IKAKYKEDFS KARYFYNKW FDSETAVKEL PYKLNLVATP TSDLDPSKSV GYRAIAYSSL YLILSKGKII TAELVSDSVW ALAAVDSAVY LTFLTNANAD DGACVNNDET RLHMKTLLPV GIGISNTGIC

IFLGKTWGQE DKKFSYSSGH FSKSKRMPIT LTFIDPEGSE TTGTAYFEVK TEADVYITFG SYYSLEDLNN LFLKPGIPYP TRVDDGVASF SQSYLYIDWT HFGTREKFSD LNIEEKCGNQ GVQRGAKKPL DSQENDEPCK CEQRAARISL SKPEIRSYFP VADTVKAKVF

QTYVISAPKI VHLSSENKFQ YDNGFLFIHT VDMVEEIDHI EYVLPHFSVS IREDLKDDQK KYLYIAVTVI IKVQVKDSLD VLNLPSGVTV DNHKALLVGE ASYQSINIPV LQVHLSPDAD ERVFQFLEKS EILRPRRTLQ GPRCIKAFTE ESWLWEVHLV KDVFLEMNIP

FRVGASENIV NSAILTIQPK DKPVYTPDQS GIISFPDFKI IEPEYNFIGY EMMQTAMQNT ESTGGFSEEA QLVGGVPVIL LEFNVKTDAP HLNIIVTPKS TQNMVPSSRL AYSPGQTVSL DLGCGAGGGL KKIEEIAAKY CCWASQLRA PRRKQLQFAL YSWRGEQIQ

IQVYGYTEAF QLPGGQNPVS VKVRVYSLND PSNPRYGMWT KNFKNFEITI MLINGIAQVT EIPGIKYVLS NAQTIDVNQE DLPEENQARE PYIDKITHYN LVYYIVTGEQ NMATGMDSWV NNANVFHLAG KHSWKKCCY NISHKDMQLG PDSLTTWEIQ LKGTVYNYRT

SGMQFCVKMS LPLEIGLHNI GTISRRKEFP THLPKGSAEA EGMLSIMSYR CNSLLWLVEN GIRKAFDIGP THPQFRSIVS TAYALLTSLN SLLVKQLRLS GFGSGLATVH RIVAGASYKP TDYQIKDGHV PDKQCTMFYS TAGKPEIAYA IKKVTCTNAE DTTCSSCQAF

AVEGICTSES NFSLETWFGK YRIPLDLVPK ELMSWPVFY NADYSYSVWK YQLDNGSFKE LVKIDTALIK ALKREALVKG LKDINYVNPV MDIDVSYKHK VTTWHKTST SREESSSGSS ILQLNSIPSS TSNIKIQKVC YKVSITSITV LVKGRQYLIM LANLDEFAED

PVIDHQGTKS EILVKTLRW TEIKRILSVK VFHYLETGNH GGSASTWLTA NSQYQPIKLQ ADNFLLENTL NPPIYRFWKD IKWLSEEQRY GALHNYKMTD SEEVCSFYLK HAVMDISLPT DFLCVRFRIF EGAACKCVEA ENVFVKYKAT GKEALQIKYN IFLNGC

SKCVRQKVEG PEGVKRESYS GLLVGEILSA WNIFHSDPLI FALRVLGQVN GTLPVEAREN PAQSTFTLAI NLQHKDSSVP GGGFYSTQDT KNFLGRPVEV IDTQDIEASH GISANEEDLK ELFEVGFLSP DCGQMQEELD LLDIYKTGEA FSFRYIYPLD

SSSHLVTFTV GVTLDPRGIY VLSQEGINIL EKQKLKKKLK KYVEQNQNSI SLYLTAFTVI SAYALSLGDK NTGTARMVET INAIEGLTEY LLNDDLIVST YRGYGNSDYK ALVEGVDQLF ATFTVYEYHR LTISAETRKQ VAEKDSEITF SLTWIEYWPR

The signal peptide sequence and the cleavage site (RPRR) between the β and α chains are underlined. The known N-linked glycosylation site in the C5a peptide is indicated (N). The linker peptide is lost from the mature protein.

Protein modules
1-18 Leader peptide
19-673 β chain
678-1676 α chain
678-751 C5a peptide
752-1676 C5 α' chain
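The module ranges above can be checked for internal consistency in a few lines. The sketch below (plain Python; the dictionary transcribes the listed ranges) confirms that cleavage at R751-L752 partitions the α chain exactly into C5a and the α' chain. It also computes the excised RPRR linker and shows that removing the C-terminal arginine leaves a 73-residue C5a desArg.

```python
# Residue ranges (1-based, inclusive) for human pro-C5, transcribed from the list above.
C5_MODULES = {
    "leader peptide": (1, 18),
    "beta chain": (19, 673),
    "alpha chain": (678, 1676),
    "C5a peptide": (678, 751),
    "C5 alpha' chain": (752, 1676),
}


def n_residues(span: tuple[int, int]) -> int:
    """Number of residues in an inclusive 1-based range."""
    start, end = span
    return end - start + 1


beta, alpha = C5_MODULES["beta chain"], C5_MODULES["alpha chain"]
c5a, alpha_prime = C5_MODULES["C5a peptide"], C5_MODULES["C5 alpha' chain"]
linker = (beta[1] + 1, alpha[0] - 1)  # residues removed between the beta and alpha chains

# Cleavage after R751 splits the alpha chain into C5a (N-terminal) and alpha' (C-terminal).
assert c5a[0] == alpha[0] and alpha_prime[1] == alpha[1]
assert c5a[1] + 1 == alpha_prime[0]
assert n_residues(c5a) + n_residues(alpha_prime) == n_residues(alpha)

for name, span in C5_MODULES.items():
    print(f"{name:16s} {span[0]:>4}-{span[1]:<4} {n_residues(span):>4} aa")
print("linker (RPRR):", linker, n_residues(linker), "aa")
print("C5a desArg:", n_residues(c5a) - 1, "aa")
```

The C5a length of 74 residues computed this way agrees with the figure given under Function.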

Chromosomal location Human¹⁴: 9q33. Telomere ... C8G ... ABL ... C5 ... PAPPA ... Centromere. Mouse¹⁵: chromosome 2. Telomere ... C8g ... Abl ... Hc ... Pappa ... Centromere.

cDNA sequence3,13 CTACCTCCAA TGGGGACAGG GAAAATATTG AAAAGTTATC AATAAATTCC AACCCAGTTT ATGCCAATAA CCAGACCAGT AGAGAAACTG ATTGATCATA

CCATGGGCCT AGCAAACATA TGATTCAAGT CTGATAAAAA AAAACTCTGC CTTATGTGTA CCTATGACAA CAGTAAAAGT TCTTAACCTT TTGGAATTAT

TTTGGGAATA TGTCATTTCA TTATGGATAC ATTTAGTTAC AATCTTAACA TTTGGAAGTT TGGATTTCTC TAGAGTTTAT CATAGATCCT CTCTTTTCCT

CTTTGTTTTT GCACCAAAAA ACTGAAGCAT TCCTCAGGCC ATACAACCAA GTATCAAAGC TTCATTCATA TCGTTGAATG GAAGGATCAG GACTTCAAGA

TAATCTTCCT TATTCCGTGT TTGATGCAAC ATGTTCATTT AACAATTGCC ATTTTTCAAA CAGACAAACC ACGACTTGAA AAGTTGACAT TTCCGTCTAA

GGGGAAAACC TGGAGCATCT AATCTCTATT ATCCTCAGAG TGGAGGACAA ATCAAAAAGA TGTTTATACT GCCAGCCAAA GGTAGAAGAA TCCTAGATAT

GGTATGTGGA TTTGAAGTTA TTCATTGGTT AATAAAGTAG GATGATCAAA GCTCAAGTCA GATTTAAACA TCTGAAGAGG GTTGCTACTC GATTCGCTTG GTAAACCAAG GTAGCTTCCT ACTGATGCTC TACTCATCTC CTAGTGGGAG ACTCACTATA AAATTTTCAG TCATCCCGAC GATTCAGTCT CCTGATGCAG GATTCCTGGG AAAAAGCCCT GGTGGTGGCC GCAAATGCAG AGAACGCTGC AAATGTTGTT CGGATTAGTT CAGCTCCGTG CTGTTACCAG GTTCATCTTG TGGGAAATTC GCAAAGGTGT GAACAGATCC GTTAAAATGT GGCACAAAGT ACATTCACTG TGGTTTGGAA GAAAGCTATT AAGGAGTTCC TTGAGTGTAA ATCAATATCC CCAGTATTCT GACCCATTAA ATGTCCTACA TGGTTAACAG CAAAATTCAA TCTTTCAAGG GCCCGAGAGA GATATATGCC GAAAATACAC CTGGGAGATA TTGGTTAAAG AGCTCTGTAC ACCAGTCTGA

CGATCAAGGC AAGAATATGT ACAAGAACTT TCACTGAGGC AAGAAATGAT CATTTGATTC ACAAGTACCT CAGAAATACC CTCTTTTCCT ACCAGTTGGT AGACATCTGA TTGTGCTTAA CAGATCTTCC TCAGCCAAAG AACATCTGAA ATTACTTGAT ATGCATCTTA TTCTGGTCTA GGTTAAATAT ATGCATATTC TGGCATTAGC TGGAAAGAGT TCAACAATGC ATGACTCCCA AAAAGAAGAT ACGATGGAGC TAGGGCCAAG CTAATATCTC TAAGCAAGCC TTCCCAGAAG AAGGCATTGG TCAAAGATGT AATTGAAAGG CTGCTGTGGA CCTCCAAATG TGCTTCCTCT AAGAAATCTT CTGGTGTTAC CATACAGGAT AAGGACTGCT TAACCCACCT ATGTTTTTCA TTGAAAAGCA GAAATGCTGA CTTTTGCTTT TTTGTAATTC AAAATTCACA ACAGCTTATA CCCTGGTGAA TGCCAGCCCA AAACTCACCC GTAATCCACC CTAACACTGG ACTTGAAAGA

TAAATATAAA CTTGCCACAT TAAGAATTTT TGACGTTTAT GCAAACAGCA TGAAACAGCA TTATATTGCT TGGCATCAAA GAAGCCTGGG AGGAGGAGTC CTTGGATCCA TCTCCCATCT AGAAGAAAAT TTACCTTTAT TATTATTGTT TTTATCCAAG TCAAAGTATA TTATATCGTC TGAAGAAAAA TCCAGGCCAA AGCAGTGGAC ATTTCAATTC CAATGTGTTC AGAAAATGAT AGAAGAAATA CTGCGTTAAT ATGCATCAAA TCATAAAGAC AGAAATTCGG AAAACAGTTG CATTTCAAAC CTTCCTGGAA AACTGTTTAC GGGAATCTGC TGTGCGCCAG GGAAATTGGC AGTAAAAACA TTTGGATCCT ACCCTTAGAT TGTAGGTGAG CCCCAAAGGG CTAGCTGGAA GAAACTGAAG CTACTCTTAC AAGAGTACTT TTTATTGTGG GTATCAACCA TCTTACAGCC AATCGACACA GAGCACCTTT ACAGTTTCGT CATTTATCGT TACGGCACGT TATAAATTAT

GAGGACTTTT TTTTCTGTCT GAAATTACTA ATCACATTTG ATGCAAAACA GTCAAAGAAC GTAACAGTCA TATGTCCTCT ATTCCATATC CCAGTAATAC AGCAAAAGTG GGAGTGACGG CAGGCCAGGG ATTGATTGGA ACCCCCAAAA GGCAAAATTA AACATTCCAG ACAGGAGAAC TGTGGCAACC ACTGTGTCTC AGTGCTGTGT TTAGAGAAGA CACCTAGCTG GAACCTTGTA GCTGCTAAAT AATGATGAAA GCTTTCACTG ATGCAATTGG AGTTATTTTC CAGTTTGCCC

CAACAACTGG CAATCGAGCC TAAAAGCAAG GAATAAGAGA CAATGTTGAT TGTCATACTA TAGAGTCTAC CTCCCTACAA CCATCAAGGT TGAATGCACA TAACACGTGT TGCTGGAGTT AAGGTTACCG CTGATAACCA GCCCATATAT TCCATTTTGG TAACACAGAA AGACAGCAGA AGCTCCAGGT TTAATATGGC ATGGAGTCCA GTGATCTGGG GACTTACCTT AAGAAATTCT ATAAACATTC CCTGTGAGCA AATGTTGTGT GAAGGCTACA CAGAAAGCTG TACCTGATTC ACTGGTATAT GTGTTGCTGA ATGAATATAC CATATTCTGT AACTATAGGA CTTCTGGGAT ACTTCGGAAA GCCCAGTCAT AAAGTAGAGG GCTCCTCCAG CTTCACAACA TCAATTTTTC TTACGAGTGG TGCCAGAAGG AGGGGTATTT ATGGTACCAT TTGGTCCCCA AAACAGAAAT ATCTTGTCTG CAGTTCTAAG AGTGCAGAGG GGGAGCTGAT ACAGGAAATC ATTGGAACAT AAAAAATTAA AAGAAGGGAT AGTGTGTGGA AGGGTGGAAG GGACAAGTAA ATAAATACGT CTAGTTGAGA ATTATCAATT ATAAAATTAC AGGGTACCTT TTTACTGTGA TTGGAATTAG GCTCTAATTA AAGCTGACAA ACATTGGCCA TTTCTGCGTA TCAATTGTTT CAGCTTTGAA TTTTGGAAAG ACAATCTTCA ATGGTAGAAA CAACTGCCTA GTTAACCCAG TCATCAAATG

AACCGCATAT AGAATATAAT ATATTTTTAT AGACTTAAAA AAATGGAATT CAGTTTAGAA AGGTGGATTT ACTGAATTTG GCAGGTTAAA AACAATTGAT TGATGATGGA TAATGTCAAA AGCAATAGCA TAAGGCTTTG TGACAAAATA CACGAGGGAG CATGGTTCCT ATTAGTGTCT TCATCTGTCT AACTGGAATG AAGAGGAGCC CTGTGGGGCA CCTCACTAAT CAGGCCAAGA AGTAGTGAAG GCGAGCTGCA CGTCGCAAGC CATGAAGACC GTTGTGGGAA TCTAACCACC TACTGTCAAG TGTACGAGGA GCAGTTCTGT TGATCATCAG TCACTTGGTG ACTGGAGACT TGTCAAAAGG TAGCAGACGA CAAAAGGATT TCAGGAAGGC GAGCGTTGTC TTTTCATTCT GTTGAGCATT TGCTAGCACT AGAGCAGAAC AGATAATGGA GCCTGTTGAA AAAGGCTTTC CTTTCTGCTT TGCTCTTTCC GAGAGAAGCT GCATAAAGAC TGCTTTACTC GCTATCAGAA

GAGCAGAGGT CTGACGGAAT TACAAGCATA CCAGTAGAGG GCTACAGTAC TTTTATTTGA TCTGATTACA TCTGGATCCT GAAGACTTAA GATGGACATG TTCCGGATAT GAATACCACA CAGAAAGTCT GAAGAATTGG ATTGCATATG TACAAGGCAA GAGATTACCT TACTTAATTA TACCCTTTAG TGTCAAGCAT TAAAATTCCT CGTTTTTTTG TTTACTTAGA CTGAAATAAC CGGAAACAAT GAAAGAACAG ACCAAGGAAC

ATGGAGGTGG ATTCACTCCT AAGGTGCCTT TGCTTCTCAA ATGTAACAAC AAATCGATAC AACGCATAGT CTCATGCGGT AAGCCCTTGT TTATTCTGCA TTGAACTCTT GACCAGATAA GTGAAGGAGC ATCTGACAAT CTTATAAAGT CCCTTCTGGA TCATTAAAAA TGGGTAAAGA ATTCCTTGAC TTTTAGCTAA GAAGTTCAGC TTTTCTTCTT ATTAGTGGCA ATGGCCTTGG AAATTGGAAC TCCATTGAAA AGGAAACTGA

CTTTTATTCA GGTTAAACAA ACATAATTAT TGATGACCTC TGTAGTTCAC TCAGGATATT AGCATGTGCC GATGGACATG GGAAGGGGTG ACTGAATTCG TGAAGTTGGG ACAGTGTACC CGCGTGCAAG CTCTGCAGAG TAGCATCACA TATCTACAAA GGTAACCTGT AGCCCTCCAG CTGGATTGAA TTTAGATGAA TGCATACAGT TTTTTAAACA CTTGCTTTTA AGGGCATGAA ACCTCCTCAA GGGAGTATTA TCATTAAAGC

ACCCAGGACA CTCCGCTTGA AAAATGACAG ATTGTCAGTA AAAACCAGTA GAAGCATCCC AGCTACAAGC TCCTTGCCTA GATCAACTAT ATTCCCTCCA TTTCTCAGTC ATGTTTTATA TGTGTAGAAG ACAAGAAAAC TCCATCACTG ACTGGGGAAG ACTAACGCTG ATAAAATACA TACTGGCCTA TTTGCCGAAG TTGCACTTAT TTCATAGCTG TTAGAGAATG GACAGATACT ACCTACCACT CAAAAACATG CTGAGTTTGC

CCATCAATGC GTATGGACAT ACAAGAATTT CAGGATTTGG CCTCTGAGGA ACTACAGAGG CCAGCAGGGA CTGGAATCAG TCACTGATTA GTGATTTCCT CTGCCACTTT GCACTTCCAA CTGATTGTGG AAACAGCATG TAGAAAATGT CTGTTGCTGA AGCTGGTAAA ATTTCAGTTT GAGACACAAC ATATCTTTTT GGACTCCTGT GTCTTATTTG ATTTCAAATG CCTCCAAGGT CAGGAATGTT GCCTTTGCTT TTTC

CATTGAGGGC CGATGTTTCT CCTTGGGAGG CAGTGGCTTG AGTTTGCAGC CTACGGAAAC AGAATCATCA TGCAAATGAA CCAAATCAAA TTGTGTACGA CACAGTTTAC TATCAAAATT GCAAATGCAG TAAACCAGAG TTTTGTCAAG GAAAGACTCT AGGAAGACAG CAGGTACATC ATGTTCATCG AAATGGATGC TGTTGAAGTT TAAAGCTCAC CTGTAACTTT TATTGGACAC TGCTGGGGCC GAAAGAAAAT

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAA) and the polyadenylation signal (ATTAAA) are indicated.

Genomic structure¹⁶,¹⁷ The human C5 structural gene is 79 kb in length and is highly interrupted, with 41 intron-exon boundaries. Although C5 is much larger than the C3 (41 kb) and C4 (20 kb) genes, the intron-exon organization of all three family members is very similar. Two alternatively processed, polyadenylated human C5 transcripts have been cloned and sequenced; however, no in vivo function has been attributed to them.

[Figure: exon-intron map of the human C5 gene (exons 1-41; scale bar 5 kb).]

Accession numbers
Human: cDNA M57729; genomic M72430
Mouse: cDNA J05234; genomic M64852

Deficiency Autosomal recessive. Several C5-deficient families of different ethnic backgrounds and from different geographic regions have been reported. Sera from homozygous deficient individuals lack bactericidal activity and have a severely impaired ability to induce chemotaxis. Consequently, all C5-deficient individuals display a propensity for severe recurrent infections, particularly neisserial infections, including meningitis and extragenital gonorrhoea. In addition, one individual with complete C5 protein deficiency has SLE, but it is not clear whether this is directly related to her C5 deficiency.
Human mutations identified¹⁸ (African-American families): two nonsense mutations, C67→T (Q19→stop) and C4438→T (R1476→stop).
Mouse mutation identified¹³: 2 bp deletion (TA768-769) in exon 7.

Polymorphic variants Polymorphisms of C5 have been examined only cursorily. A protein polymorphism of C5 has been reported in populations of the South Pacific region¹⁹, and a variant has been identified from cDNA sequence studies: T1565C (F518S).
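The nucleotide and amino acid coordinates given for the C5 mutations and the cDNA variant agree with one another if the A of the initiation ATG is nucleotide 13 of the numbered cDNA. This is how the start of the cDNA listing reads when its first row is taken across the columns (CTACCTCCAA CCATGGGCCT ...). The sketch below treats that offset as an assumption and converts a cDNA position into a codon number and a position within the codon. The helper name is hypothetical.

```python
# Assumption: nucleotide 13 of the numbered cDNA is the A of the initiation ATG.
ATG_POSITION = 13


def locate(nt_position: int, atg_position: int = ATG_POSITION) -> tuple[int, int]:
    """Return (codon number, position within codon 1-3) for a cDNA nucleotide."""
    offset = nt_position - atg_position
    if offset < 0:
        raise ValueError("position lies 5' of the initiation codon")
    return offset // 3 + 1, offset % 3 + 1


for nt, reported in [(67, "Q19"), (4438, "R1476"), (1565, "F518")]:
    codon, pos = locate(nt)
    print(f"nt {nt}: codon {codon}, position {pos} (reported {reported})")
```

Both nonsense mutations fall at the first codon position, where C→T converts a glutamine codon (CAA/CAG) or the arginine codon CGA into a stop codon. T1565C falls at the second position, where it converts TTT/TTC (F) into TCT/TCC (S).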

References
1 Ooi, Y.M. and Colten, H.R. (1979) J. Immunol. 123, 2494-2498.
2 DiScipio, R.G. et al. (1983) J. Biol. Chem. 258, 10629-10636.
3 Haviland, D.L. et al. (1991) J. Immunol. 146, 362-368.
4 Tack, B.F. et al. (1979) Biochemistry 18, 1490-1497.
5 Fernandez, H.N. and Hugli, T.E. (1978) J. Biol. Chem. 253, 6955-6962.
6 Dolmer, K. and Sottrup-Jensen, L. (1993) FEBS Lett. 315, 85-90.
7 Perkins, S.J. et al. (1990) Biochemistry 29, 1175-1180.
8 Gerard, C. and Gerard, N.P. (1994) Annu. Rev. Immunol. 12, 775-808.
9 Greer, J. (1985) Science 228, 1055-1060.
10 Zuiderweg, R.P. et al. (1988) Proteins Struct. Funct. Genet. 3, 139-145.
11 Lambris, J.D. et al. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.F. eds). Marcel Dekker, New York, pp. 83-118.
12 Kohler, P.F. and Müller-Eberhard, H.J. (1967) J. Immunol. 99, 1211-1216.
13 Wetsel, R.A. et al. (1990) J. Biol. Chem. 265, 2435-2440.
14 Wetsel, R.A. et al. (1988) Biochemistry 27, 1226-1232.
15 D'Eustachio, P. et al. (1986) J. Immunol. 137, 3990-3995.
16 Haviland, D.L. et al. (1991) J. Biol. Chem. 266, 11818-11825.
17 Carney, D.F. et al. (1991) J. Biol. Chem. 266, 18786-18791.
18 Wang, X. et al. (1995) J. Immunol. 154, 5464-5471.
19 Hobart, M. et al. (1981) Ann. Hum. Genet. 45, 1-4.

Part 4 Terminal Pathway Components

Michael Hobart, Department of Biological Sciences, De Montfort University, Leicester, UK

Physicochemical properties C6 is synthesized as a single-chain precursor of 934 amino acids, including a leader peptide of 21 amino acids.
pI ~6-6.5 (multiple bands and polymorphisms); ~6.5-7 (after neuraminidase treatment)
Mr (K) ~100
N-linked glycosylation sites 2 (324, 855)

Structure Not known.

Function C6 connects the components of the terminal complement complex to C5b, the last cleavage product of the complement enzyme cascades. The C5b6 complex avidly binds C7, but in its absence the bimolecular complex has a prolonged half-life. On binding of C5b6 to C7, a site is revealed through which the complex binds to local substrates. These would normally be the antigen on which the complement system was activated, including the membranes of cells or bacteria, but also carbohydrates (e.g. agarose). This binding site is assumed to reside in C7. The bound complex then recruits C8 and subsequently C9 to form the terminal complement complex, which damages the osmotic integrity of cells.

Tissue distribution Serum protein: 45 µg/ml. Primary site of synthesis: liver. Secondary site: macrophages.

Regulation of expression C6 is an acute-phase reactant. A CCAAT enhancer-binding site is essential for its expression.

Protein sequence2,3 MARRSVLYFI WDKYYQENF SKVRSVLRPS CIARKLECNG AGEPRGEVLD KTDFYKDLTS QAIQASHKKD

LLNALINKGQ CEQICSKQET QFGGQPCTEP ENDCGDNSDE NSFTGGICKT LGHNENQQGS SSFIRIHKVM

ACFCDHYAWT RECNWQRCPI LVAFQPCIPS RDCGRTKAVC VKSSRTSNPY FSSQGGSSFS KVLNFTTKAK

QWTSCSKTCN NCLLGDFGPW KLCKIEEADC TRKYNPIPSV RVPANLENVG VPIFYSSKRS DLHLSDVFLK

SGTQSRHRQI SDCDPCIEKQ KNKFRCDSGR QLMGNGFHFL FEVQTAEDDL ENINHNSAFK ALNHLPLEYN

SALYSRIFDD RIETKKRVLF YGAALAWEKG VTKRNNLRKA NCEKQSPDYK CEGEKRQEED ENGFIRNEKQ RTECIKPWQ WTPPISNSLT CVFDTDSNDY RLSSNSTKKE GSSTSEKTLN

FGTHYFTSGS AKKTKVEHRC SSGLEEKTFS LQEYAAKFDP SNAVDGQWGC CTFSIMENNG LYLVGEDVEI EVLTITPFQR CEKDTLTKLK FTSPACKFLA SCGYDTCYDW ICEVGTIRCA

LGGVYDLLYQ TTNKLSEKHE EWLESVKENP CQCAPCPNNG WSSWSTCDAT QPCINDDEEM SCLTGFETVG LYRIGESIEL GHCQLGQKQS EKCLNNQQLH EKCSASTSKC NRKMEILHPG

FSSEELKNSG GSFIQGAEKS AVIDFELAPI RPTLSGTECL YKRSRTRECN KEVDLPEIEA YQYFRCLPDG TCPKGFWAG GSECICMSPE FLHIGSCQDG VCLLPPQCFK KCLA

LTEEEAKHCV ISLIRGGRSE VDLVRNIPCA CVCQSGTYGE NPAPQRGGKR DSGCPQPVPP TWRQGDVECQ PSRYTCQGNS EDCSHHSEDL RQLEWGLERT GGNQLYCVKM

The leader sequence is underlined and the N-linked glycosylation sites are indicated (N).
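N-linked glycosylation sites of this kind are recognized by the sequon Asn-X-Ser/Thr, where X is any residue except Pro. The Python sketch below (the names SEQUON and find_sequons are illustrative) lists candidate sites in a one-letter sequence. A sequon is necessary but not sufficient for glycosylation, which is why the C8 entry distinguishes possible from probable sites. Positions are 1-based on whatever string is supplied, so the leader must be included to match precursor numbering, and the residue blocks printed here must first be assembled in reading order.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") all be found.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    # Remove the spaces and line breaks used to print residues in blocks of ten.
    seq = "".join(protein.split()).upper()
    return [match.start() + 1 for match in SEQUON.finditer(seq)]

print(find_sequons("MKNLSANPTQ NGT"))  # toy peptide, not a complement sequence -> [3, 11]
```

In the toy peptide, the Asn at position 7 is skipped because it is followed by Pro.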

Protein modules
1-21      Leader peptide   exon 1
22-80     TSP1             exon 1/2
81-137    TSP1             exon 2/3
138-175   LDLRA            exon 3/4
176-516   MACPF            exon 4-10
517-553   EGF              exon 10
563-611   TSP1             exon 11
643-700   CCP              exon 12/13
703-763   CCP              exon 14
766-840   FIMAC            exon 15/16
858-934   FIMAC            exon 16/17
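Because the module boundaries form a sorted list of residue intervals, a binary search can place any residue in its module, for example the two glycosylation sites (324 and 855). The Python sketch below is illustrative (C6_MODULES and module_at are hypothetical names) and assumes that the site numbers and module boundaries use the same precursor numbering. Under that assumption, site 324 lies in the MACPF module and site 855 in the linker between the two FIMAC modules.

```python
from bisect import bisect_right
from typing import Optional

# (first residue, last residue, module) for C6, transcribed from the table above.
C6_MODULES = [
    (1, 21, "Leader peptide"),
    (22, 80, "TSP1"),
    (81, 137, "TSP1"),
    (138, 175, "LDLRA"),
    (176, 516, "MACPF"),
    (517, 553, "EGF"),
    (563, 611, "TSP1"),
    (643, 700, "CCP"),
    (703, 763, "CCP"),
    (766, 840, "FIMAC"),
    (858, 934, "FIMAC"),
]
_STARTS = [start for start, _, _ in C6_MODULES]  # sorted, so bisect applies

def module_at(residue: int) -> Optional[str]:
    """Return the module containing `residue`, or None if it falls in a linker."""
    i = bisect_right(_STARTS, residue) - 1  # last module starting at or before residue
    if i >= 0:
        start, end, name = C6_MODULES[i]
        if start <= residue <= end:
            return name
    return None

for site in (324, 855):  # the two N-linked glycosylation sites listed for C6
    print(site, module_at(site))
```

Keeping the intervals in one list means the same lookup works for C7 or C8 by swapping in their boundaries.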

Chromosomal location Human^9-11: 5p12-14; gene order 5' C6 3'-3' C7 5'^12,13. Orientation on the chromosome unknown; near C9. Mouse^14: chromosome 15. Two loci in some strains.

cDNA sequence^^ AGCTTAGGTC GCCAGACGCT TGCTTCTGTG GGAACCCAGA GAACAGATTT TGCCTCCTGG AAAGTTAGAT GTAGCCTTTC AATAAATTTC AATGACTGTG CGGAAGTATA

CGAGGACACC CTGTCTTGTA ATCACTATGC GCAGACACAG GCAGCAAGCA GAGATTTTGG CTGTCTTGCG AACCATGCAT GCTGTGACAG GAGACAATTC ATCCCATCCC

ACAAACTCTG CTTCATCCTG ATGGACTCAG ACAAATAGTA GGAGACTAGA ACCATGGTCA TCCCAGTCAG TCCATCTAAG TGGCCGCTGC AGATGAAAGG TAGTGTACAG

CTTAAAGGGC CTGAATGCTC TGGACCAGCT GTAGATAAGT GAATGTAACT GACTGTGACC TTTGGGGGAC CTCTGCAAAA ATTGCCAGAA GACTGTGGGA TTGATGGGCA

CTGGAGGCTC TGATCAACAA GCTCAAAAAC ACTACCAGGA GGCAAAGATG CTTGTATTGA AGCCATGCAC TTGAAGAGGC AGTTAGAATG GGACAAAGGC ATGGGTTTCA

TCAAGGCATG GGGCCAAGCC TTGCAATTCT AAACTTTTGT CCCCATCAAC AAAACAGTCT TGAGCCTCTG TGACTGCAAG CAATGGAGAA AGTATGCACA TTTTCTGGCA

GGAGAGCCCA AAAAGCAGTA GAGGTACAAA GGACACAATG CCAATTTTTT GCCATTCAAG GTCTTAAACT CTTAACCATC GGGACTCATT AGCAGTGAGG ATTGAAACAA ACCAACAAGC TCCCTGATTC TCTGGTCTGG GTGATTGACT ACAAAACGGA CAGTGTGCTC GTGTGTCAGA AATGCAGTAG AAGAGATCGA GAGGGGGAGA CCATGTATCA TCCGGGTGTC TACTTGGTTG CAGTACTTCA ACGGAGTGCA TATAGAATTG TCAAGGTACA GAAAAAGATA TCTGAATGCA GTGTTTGACA AAATGTTTAA CAGTTAGAAT TGTGGCTATG TGCCTATTGC TCATCAACAA AGGAAGATGG CAGCACAATG ACAAACAGCA ATCATTATTC GCCCTGTAGC CATTGTACAA TCACAATTAG CTCCAGAACC ATTTCTGAGT AATAAAACAG CAAAAAATGA

GAGGAGAAGT GGACAAGTAA CTGCAGAAGA AAAATCAACA ATTCCTCAAA CCTCTCACAA TCACAACGAA TGCCTCTAGA ACTTCACCTC AACTAAAGAA AGAAACGCGT TGTCAGAGAA GAGGTGGAAG AGGAGAAGAC TTGAGCTTGC ACAACCTCAG CATGCCCTAA GTGGCACCTA ACGGACAGTG GAACCCGAGA AGCGACAAGA ATGATGATGA CTCAGCCAGT GAGAAGATGT GATGCTTACC TCAAGCCAGT GTGAATCCAT CATGCCAGGG CTCTAACAAA TTTGTATGTC CAGACTCCAA ATAATCAGCA GGGGTCTTGA ACACCTGCTA CCCCACAGTG GTGAGAAAAC AAATACTGCA AACAGATTTA GACTGGCATG TCCCCTGACT ATACCCCTAG AAATAATGTG AAATAAGAAT TCTGAAACAC GTGTATACAG ACAAAAGCCT AATAAACAGG

CCTTGATAAC TCCATACCGT TGACTTGAAA AGGCTCATTC GAGAAGTGAA AAAGGATTCT AGCTAAAGAT ATACAACTCT TGGCTCCCTG CTCAGGTTTA TTTATTTGCT ACATGAAGGT GAGTGAATAT ATTTTCTGAG CCCCATCGTG GAAAGCTTTG TAATGGCCGA TGGTGAGAAC GGGTTGTTGG ATGCAATAAT GGAAGACTGC AGAAATGAAA TCCTCCAGAA TGAAATTTCA AGACGGGACC TGTGCAGGAA TGAGCTAACT GAATTCCTGG ATTAAAAGGC TCCAGAAGAA CGATTACTTT ACTCCATTTT AAGGACAAGA TGACTGGGAA CTTCAAGGGT ATTGAACATC TCCTGGAAAG CCATCCCGAA CTCAAAGTTA CTCCTGTTTG GTACCAACTT ACTTCTGAGG AAAACCCATA ATTCTTGAAG GATGTCAAGT TTGCCTTCAT TAAAATATGT

TCTTTCACTG GTTCCGGCCA ACAGATTTCT TCAAGTCAGG AATATCAACC AGTTTTATTA CTGCACCTTT GCTTTGTACA GGAGGCGTGT ACCGAGGAAG AAGAAAACAA TCATTTATAC GGAGCAGCTT TGGTTAGAAT GACTTGGTAA CAAGAGTATG CCCACCCTCT TGTGAGAAAC TCTTCCTGGA CCTGCCCCCC ACATTTTCAA GAGGTCGATC AATGGATTTA TGCCTTACTG TGGAGACAAG GTCCTGACAA TGCCCCAAAG ACACCACCCA CATTGTCAGC GACTGTAGCC ACTTCACCCG CTACATATTG CTTTCATCCA AAATGTTCAG GGAAACCAAC TGTGAAGTGG TGTTTGGCCT GAACCAACTC CTGACAAAAA GGCATGTCTT CCACAGCAGT CCCTTATGTA ATTTTCTTCA CCCAGCTTTC ACTGACCAAA GAAGCATACA AGC

GAGGAATATG ATCTGGAAAA ACAAGGATTT GGGGGAGCTC ATAATTCTGC GGATCCATAA CTGATGTCTT GCCGAATATT ATGACCTTCT AAGCCAAACA AAGTGGAACA AGGGAGCAGA TGGCATGGGA CAGTGAAGGA GAAACATCCC CAGCCAAGTT CAGGGACTGA AGTCTCCAGA GTACCTGTGA AACGAGGAGG TCATGGAAAA TTCCTGAGAT TCCGGAATGA GCTTTGAAAC GGGATGTGGA TTACACCATT GCTTTGTTGT TTTCAAACTC TGGGACAGAA ATCATTCAGA CTTGTAAGTT GTTCCTGCCA ACAGCACAAA CCTCCACTTC TCTACTGTGT GAACTATAAG AGCACAATTA CTACAAATGA TTATTTTCTG ATTCAGTTCC CTCGTAAATT GCCTGTGACA ATGAGTTAAT ATATCTTCAT GTCCTGAGAA TTCATTCAGG

TAAAACTGTC TGTCGGCTTT AACTTCTCTT TTTCAGTGTA CTTCAAACAA AGTGATGAAA TTTGAAAGCA CGATGACTTT CTATCAGTTT CTGTGTCAGG TAGGTGCACC GAAATCCATA GAAAGGGAGC AAATCCTGCT CTGTGCAGTG CGATCCTTGC ATGTCTGTGT TTATAAATCC TGCTACTTAT GAAACGCTGT CAATGGACAA AGAAGCAGAT AAAGCAACTA TGTTGGATAC ATGCCAACGG TCAGAGATTG TGCTGGGCCA TCTCACCTGT ACAATCAGGA AGATCTCTGT TTTGGCTGAG AGACGGCCGC GAAAGAATCC CAAATGTGTC CAAAATGGGA ATGTGCAAAC CTGCTAGGCC GAATTCTTGC TTAGTTTGAG AGCTCATGAC CTCCTGTTCA TTAAGCATTC AAACAGAAAT TCAACAAATA CTCGGCAGAT GGTAGACACA

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAG) and the polyadenylation signal (AATAAA) are indicated.
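The three landmarks named here (initiation codon, termination codon, polyadenylation signal) can be located in a cDNA string with a simple scan. The Python sketch below is a simplification, not the method used to annotate these sequences: it takes the first ATG as the initiator (5' untranslated regions can contain upstream ATGs, and the true start is confirmed against the protein sequence), reads in frame to the first TAA, TAG or TGA, then searches downstream for AATAAA. The function name annotate_cdna is hypothetical, and the sequence must be assembled in reading order before use.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def annotate_cdna(cdna: str) -> dict:
    """Find the first ATG, the first in-frame stop codon after it, and the first
    AATAAA downstream of that stop. Returns 0-based indices (None if absent)."""
    seq = "".join(cdna.split()).upper()  # drop the spacing used for display
    start = seq.find("ATG")
    if start == -1:
        return {"start": None, "stop": None, "polya": None}
    stop = None
    for i in range(start, len(seq) - 2, 3):  # walk codon by codon in frame
        if seq[i:i + 3] in STOP_CODONS:
            stop = i
            break
    polya = seq.find("AATAAA", stop + 3) if stop is not None else -1
    return {"start": start, "stop": stop, "polya": None if polya == -1 else polya}

# Toy sequence, not from C6: ATG at 2, in-frame TAG at 8, AATAAA at 13.
print(annotate_cdna("CCATGAAATAGCC AATAAAGG"))  # {'start': 2, 'stop': 8, 'polya': 13}
```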

Genomic structure The gene spans approximately 80 kb and is encoded in 18 exons (numbered from exon 0 to align exon numbering with C9, the first member of the gene family investigated). The introns vary from 450 bp to 12 kb. Note that the majority of the exons are asymmetric with respect to boundary phase type and that there is a strong counter-correspondence of protein modules (domains) and exons.
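"Boundary phase type" refers to intron phase: a phase 0 intron falls between codons, phase 1 after the first base of a codon and phase 2 after the second. An exon is symmetric when both flanking introns share the same phase, so its coding length is a multiple of three and it can be duplicated or shuffled without shifting the reading frame; asymmetric exons cannot. The Python sketch below encodes that rule. The function names are illustrative, and no actual C6 phase data are assumed.

```python
VALID_PHASES = (0, 1, 2)

def exon_symmetry(phase_5: int, phase_3: int) -> str:
    """Classify an internal exon from the phases of its flanking introns.

    phase_5: phase of the intron before the exon (0, 1 or 2)
    phase_3: phase of the intron after the exon (0, 1 or 2)
    """
    if phase_5 not in VALID_PHASES or phase_3 not in VALID_PHASES:
        raise ValueError("intron phase must be 0, 1 or 2")
    return "symmetric" if phase_5 == phase_3 else "asymmetric"

def coding_length_mod3(phase_5: int, phase_3: int) -> int:
    """Residue of the exon's coding length modulo 3.

    The exon supplies (3 - phase_5) % 3 bases to finish the codon split by the
    upstream intron, then whole codons, then phase_3 bases of the next split codon.
    """
    return (phase_3 - phase_5) % 3

print(exon_symmetry(1, 1), coding_length_mod3(1, 1))  # symmetric 0
print(exon_symmetry(0, 2), coding_length_mod3(0, 2))  # asymmetric 2
```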

Accession numbers Human:

cDNA HSC6A

Genomic HSC6X1-HSC6X17

Deficiency
Complete: Autosomal recessive. Loss of lytic function of complement. Patients show increased susceptibility to recurrent meningococcal meningitis and systemic N. gonorrhoeae infection, though many, including siblings sharing the deficiency, may be healthy. Mutations identified:
291 delC: one Japanese patient
879 delG: 30 Cape Coloured South African chromosomes; three Dutch chromosomes
1195 delC: seven African chromosomes
1396 delG: four African chromosomes
Subtotal^23,24: Autosomal recessive. Very substantial but incomplete loss of lytic function of complement (approximately 2% of normal haemolytic activity and protein). Patients show no clinical effects. Rare (6 patients), but often associated with subtotal C7 deficiency. Mutation identified: splice defect at the 5' donor site of intron 15 (GT to GC); 8 patients

Polymorphic variants Two common alleles, A (0.65) and B (0.35), and at least 10 rare phenotypic variants have been identified by isoelectric focusing with detection by functional overlay, immunoblotting or immunofixation; they are found at approximately these frequencies in all human populations. In addition, there are two known RFLPs and an unusual combined insertion/deletion polymorphism in intron 10. The DNA polymorphisms are tabulated below, including the DNA origin of the A/B phenotype.

Locus 1^27,28: A413C; E119A; A/B; frequency 0.35
Locus 2^29-31: CCGG/CTGG in intron 3; frequency 0.35
Locus 3^32: combined insertion/deletion in intron 10; frequency 0.05
Locus 4: TaqI polymorphism in intron 16?; frequency 0.4
Markers at loci 1 and 2 show strong allelic association in North Europeans, but not in other populations. Other markers do not show associations.
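If, as an additional assumption not made in the source, the common C6 alleles are in Hardy-Weinberg equilibrium, the A and B frequencies of 0.65 and 0.35 predict phenotype frequencies of roughly 42% AA, 45.5% AB and 12% BB, ignoring the rare variants. A minimal Python sketch of the calculation, with an illustrative function name:

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies at a two-allele locus under Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}

# C6 A allele frequency from the text; rare variants are ignored here.
for genotype, freq in hardy_weinberg(0.65).items():
    print(f"C6 {genotype}: {freq:.4f}")
```

The same function applies to other two-allele markers in these entries, for example the C7 M/N polymorphism (M = 0.77 in white populations).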

References
1 DiScipio, R.G. et al. (1982) Mol. Immunol. 19, 1425-1431.
2 DiScipio, R.G. and Hugli, T.E. (1989) J. Biol. Chem. 264, 16197-16206.
3 Haefliger, J.-A. et al. (1989) J. Biol. Chem. 264, 18041-18051.
4 Hobart, M.J. et al. (1975) In Protides of the Biological Fluids, 22nd Colloquium (Peeters, H. ed.), Pergamon Press, Oxford, pp. 575-580.
5 Würzner, R. et al. (1991) Clin. Exp. Immunol. 83, 430-437.
6 Hobart, M.J. et al. (1977) J. Exp. Med. 146, 629.
7 Thompson, R.A. and Lachmann, P.J. (1970) J. Exp. Med. 131, 629-641.
8 Gonzalez, S. and Lopez-Larrea, C. (1996) J. Immunol. 157, 2282-2290.
9 Abbott, C. et al. (1989) Genomics 4, 606-609.
10 Jeremiah, S.J. et al. (1990) Ann. Hum. Genet. 54, 141-147.
11 Rogde, S. et al. (1991) J. Med. Genet. 28, 587-590.
12 Hobart, M.J. et al. (1993) Hum. Mol. Genet. 2, 1035-1036.
13 Setien, F. et al. (1993) Immunogenetics 38, 341-344.
14 Hayakawa, J. et al. (1985) Immunogenetics 22, 637-642.
15 Hobart, M.J. et al. (1993) Biochemistry 32, 6198-6205.
16 Leddy, J.P. et al. (1974) J. Clin. Invest. 53, 544-553.
17 Würzner, R. et al. (1992) Immunodeficiency Rev. 3, 123-147.
18 Ross, S.C. and Densen, P. (1984) Medicine 63, 243-273.
19 Orren, A. et al. (1987) Immunology 62, 249-253.
20 Nishizaka, H. et al. (1996) J. Immunol. 156, 2309-2315.
21 Hobart, M.J. et al. (1998) Hum. Genet. 103, 506-512.
22 Zhu, Z.B. et al. (1998) Clin. Exp. Immunol. 111, 91-96.
23 Lachmann, P.J. et al. (1978) Clin. Exp. Immunol. 33, 193-203.
24 Morgan, B.P. et al. (1989) Clin. Exp. Immunol. 75, 396-401.
25 Fernie, B.A. et al. (1996) J. Immunol. 157, 3648-3657.
26 Würzner, R. et al. (1999) Exp. Clin. Immunogenet. 15, 268-285.
27 Fernie, B.A. et al. (1993) Hum. Mol. Genet. 2, 591-592.
28 Dewald, G. et al. (1993) Biochem. Biophys. Res. Commun. 194, 458-464.
29 Goto, E. et al. (1991) Nucleic Acids Res. 19, 194.
30 Potter, P.C. et al. (1993) Exp. Clin. Immunogenet. 10, 38-44.
31 Fernie, B.A. et al. (1996) Exp. Clin. Immunogenet. 13, 92-103.
32 Fernie, B.A. and Hobart, M.J. (1997) Hum. Genet. 100, 104-108.
33 Goto, E. et al. (1991) Immunogenetics 33, 184-187.
34 Fernie, B.A. et al. (1995) Ann. Hum. Genet. 59, 163-181.

Michael Hobart, Department of Biological Sciences, De Montfort University, Leicester, UK

Physicochemical properties
C7 is synthesized as a single-chain precursor of 843 amino acids, including a leader peptide of 22 amino acids.
pI ~6-6.5 (multiple bands and polymorphisms); ~6.3-7 (after neuraminidase treatment)
Mr (K) ~95
N-linked glycosylation site 1 (754)

Structure Unknown.

Function The C5b6 complex avidly binds C7, and a site is then revealed by which the complex binds to local substrates. These would normally be the antigen on which the complement system was activated, including the membranes of cells or bacteria, but also carbohydrates (e.g. agarose). It is assumed that the binding site resides in C7. The bound complex attracts C8 and subsequently C9 to form the terminal complement complex, which damages the osmotic integrity of cells.

Tissue distribution Serum protein: 90 µg/ml. Primary site of synthesis: granulocytes.

Regulation of expression C7 is only a weak acute-phase reactant. This leads to a molar deficit compared with C5 and C6 during acute inflammation^.
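A molar deficit is a comparison in moles, whereas serum levels are quoted by mass. Because 1 µg/ml equals 1 mg/l and 1 kDa is 1000 g/mol, the micromolar concentration is simply the mass concentration in µg/ml divided by Mr in kDa. The Python sketch below applies this to the serum levels and approximate Mr values given for C6 and C7 in these entries, reading the serum levels as µg/ml (an assumption), and gives baseline values of about 0.45 µM for C6 and 0.95 µM for C7. The function name is illustrative.

```python
def ug_per_ml_to_uM(conc_ug_per_ml: float, mr_kda: float) -> float:
    """Convert a mass concentration in µg/ml to micromolar.

    1 µg/ml = 1 mg/l and 1 kDa = 1000 g/mol, so the factors of 1000 cancel:
    µM = (µg/ml) / (Mr in kDa).
    """
    if mr_kda <= 0:
        raise ValueError("Mr must be positive")
    return conc_ug_per_ml / mr_kda

# Serum level (assumed µg/ml) and approximate Mr (K) as quoted in these entries.
SERUM = {"C6": (45.0, 100.0), "C7": (90.0, 95.0)}

for name, (conc, mr) in SERUM.items():
    print(f"{name}: {ug_per_ml_to_uM(conc, mr):.2f} µM")
```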

Protein sequence^ MKVISLFILV SVAVYGQYGG LVCNGDSDCD FRNRVINTKS YNSTWSYVKH LWENTVEVA THYLQSGSLG HGCKELENAL RRYSAWAESV

GFIGEFQSFS QPCVGNAFET EDSADEDRCE FGGQCRKVFS TSTEHTSSSR QFINNNPEFL GEYRVLFYVD KAASGTQNNV TNLPQVIKQK

SASSPVNCQW QSCEPTRGCP DSERRPSCDI GDGKDFYRLS KRSFFRSSSS QLAEPFWKEL SEKLKQNDFN LRGEPFIRGG LTPLYELVKE

DFYAPWSECN TEEGCGERFR DKPPPNIELT GNVLSYTFQV SSRSYTSHTN SHLPSLYDYS SVEEKKCKSS GAGFISGLSY VPCASVKKLY

GCTKTQTRRR CFSGQCISKS GNGYNELTGQ KINNDFNYEF EIHKGKSYQL AYRRLIDQYG GWHFWKFSS LELDNPAGNK LKWALEEYLD

EFDPCHCRPC GGWSCWSSWS EHLRLLEPHC EGYSLIGNPV VGEKVTVSCS KCQRWEKLQN QGRNYTLTGR GFSICVEVNG

QNGGLATVEG PCVQGKKTRS FPLSLVPTEF ARCGEDLRWL GGMSLEGPSA SRCVCKMPYE DSCTLPASAE KEQTMSECEA

THCLCHCKPY RECNNPPPSG CPSPPALKDG VGEMHCQKIA FLCGSSLKWS CGPSLDVCAQ KACGACPLWG GALRCRGQSI

TFGAACEQGV GGRSCVGETT FVQDEGPMFP CVLPVLMDGI PEMKNARCVQ DERSKRILPL KCDAESSKCV SVTSIRPCAA

LVGNQAGGVD ESTQCEDEEL VGKNWYTCN QSHPQKPFYT KENPLTQAVP TVCKMHVLHC CREASECEEE ETQ

The leader sequence is underlined and the N-linked glycosylation site is indicated (N).

Protein modules
1-22      Leader peptide   exon 0/1
24-83     TSP1             exon 2/3
84-121    LDLRA            exon 3/4
122-450   MACPF            exon 4-10
451-487   EGF              exon 10
497-545   TSP1             exon 11
570-627   CCP              exon 12/13
630-689   CCP              exon 14
694-770   FIMAC            exon 15/16
771-843   FIMAC            exon 16/17

Chromosomal location Human^8-10: 5p12-14; gene order 5' C6 3'-3' C7 5'^11,12. Orientation on the chromosome unknown; near C9. Mouse^13,14: chromosome 15.

cDNA sequence^ ATGAAGGTGA AGTGCCTCCT GGCTGTACCA CAGCCTTGTG ACAGAGGAGG TTGGTTTGCA GAGTCAGAAA GGAAATGGTT TTTGGTGGTC GGAAATGTCC TACAATAGTA AAGCGCTCCT GAAATCCATA CAGTTCATTA TCCCACCTCC ACACATTATC

TAAGCTTATT CTCCAGTCAA AGACTCAGAC TTGGAAATGC GATGTGGAGA ATGGGGATTC GGAGACCTTC ACAATGAACT AATGTAGAAA TGTCCTATAC CTTGGTCTTA TTTTTAGATC AAGGAAAGAG ATAACAATCC CCTCTCTGTA TGCAATCTGG

CATTTTGGTG CTGCCAGTGG TCGCAGGCGG TTTTGAAACA GCGTTTCAGG TGACTGTGAT CTGTGATATC CACTGGCCAG GGTGTTTAGT ATTCCAGGTG TGTAAAACAT TTCATCATCT TTACCAACTG AGAATTTTTA TGACTACAGT GTCGTTAGGA

GGATTTATAG GACTTCTATG TCAGTTGCTG CAGTCCTGTG TGCTTTTCAG GAAGACAGTG GATAAACCTC TTTAGGAACA GGGGATGGAA AAAATAAATA ACGTCGACAG TCTTCACGCA CTGGTTGTTG CAACTTGCTG GCCTACCGAA GGAGAATACA

GAGAGTTCCA CCCCTTGGTC TGTATGGGCA AACCTACAAG GTCAGTGCAT CTGATGAAGA CTCCTAACAT GAGTCATCAA AAGATTTCTA ATGATTTTAA AACACACATC GTTATACTTC AGAACACTGT AGCCATTCTG GATTAATCGA GAGTTCTATT

AAGTTTTTCA AGAATGCAAT GTATGGAGGC AGGATGTCCA CAGCAAATCA CAGATGTGAG AGAACTTACT TACCAAAAGT CAGGCTGAGT TTATGAATTT ATCTAGTCGG ACATACCAAT TGAAGTGGCT GAAGGAGCTT CCAGTACGGG TTATGTGGAC

TCAGAAAAAT TAAAACAAAA TGATTTTAAT TCAGTCGAAG AAAAGAAATG TAAATCCTCA

GGTTGGCATT

AAAGCTGCTT CAGGAACCCA GAACAATGTA TTGCGAGGAG AACCGTTCAT CAGAGGGGGA

GGTGCAGGCT

AGGCGATATT CTGACACCTT CTGAAATGGG CAAAATGGTG ACATTTGGTG GGAGGTTGGA CGTGAATGCA GAAAGCACAC TTTCCTTTGT TTTGTTCAAG GAAGGATACT GTTGGGGAAA CAGAGTCACC GGTGGCATGT CCTGAGATGA AAATGTCAGC TGTGGACCTT ACAGTTTGCA GACAGCTGTA AAATGTGATG GGGTTTAGCA GGCGCTCTGA GAAACCCAGT GGCTGAGTGA CTACCTTCCT TGAATCGAAT GGCTGGCTGC TGTAAAATTA TTAAGTATGA AGAAAGAACG GAGGACTTGA TTCATCCTAT AAGACAGAGT CAACCTCCGC TACAGGTGCC CATGTTGGCC CAAATGCTGG GAGATTGAAC TGTCTGAGAA GACTGGAAGT ATGAGTGACT CAGGGAGTAC TTTCATCAAA AAATAAGCTA CAAAACTTCA

TTGTCGTTAA ATTTTCAAGT CATGGATGCA AGGAACTGGA AAACGCTTTA TCATATCTGG CTGCCTGGGC TATATGAGCT CTCTTGAAGA GTTTGGCTAC CGGCGTGTGA GTTGCTGGTC ATAACCCACC AATGCGAAGA CTTTGGTTCC ATGAAGGTCC CTCTTATTGG TGCATTGTCA CCCAAAAACC CCTTAGAAGG AGAATGCCCG GCTGGGAGAA CCTTGGATGT AGATGCATGT CTCTGCCTGC CTGAGAGCAG TTTGTGTGGA GATGCAGAGG AGGCTCCTGG AAACATCTGC CCTCAACTCC TACTCTTTTG GTGTTCTTGA GAGGATTGCA TTGACACAGC TGATCTCCTG ACAAAAATGA AGAGTCAACC CTCACTTTGT CTCCTGGGTT CGCCACCACG ATGCTCGTCT GATTACAGAC CACTAAAATG CAAATACAAT TTGCCCCTGT GTGCATTTGA ACAGGTAGAT ACAAAACAGC TATTATACAA GATGACAAGG

CCTTAGTTAC AGAATCTGTG GGTAAAGGAA GTATCTGGAT TGTTGAGGGG GCAAGGAGTC CTCTTGGAGC TCCCAGTGGG TGAGGAGCTG AACAGAATTC AATGTTTCCT AAACCCAGTG GAAAATTGCC TTTCTACACA TCCTTCAGCA CTGTGTACAA ACTGCAGAAT ATGTGCTCAA TCTCCACTGT CTCAGCTGAG CAAATGTGTC AGTGAACGGC GCAGAGCATC AGGCCATGGT ACAACTGGGC CAGCCATCTG CCTCCTTTTT AATAGGTGTT CTAGAGAAAC CCATGGGCCA ATTAGAGAGG CAGATACAAA ACCACAGATA TGCCCAGGCT GAAGCGATTC CCCAGCTAAT CCAACTCCTG ATGAACCACC TTAGAGGAGA TCAGTCTTCT

GATAGTTTTC TAGTTTGAAG AGCTGTGGGA ATACTATCTC ATTAGAACAC

CTAGAGCTGG ACTAATCTTC GTACCTTGTG GAATTTGACC ACCCATTGTC CTCGTAGGGA CCCTGTGTCC GGTGGGAGAT GAGGACTTGA TGTCCATCAC GTGGGGAAAA GCCAGATGTG TGTGTTCTAC GTTGGTGAGA TTTCTCTGTG AAAGAAAATC TCAAGATGTG GATGAGAGAA CAGGGTAGAA AAAGCTTGTG TGCCGAGAAG AAGGAGCAGA TCTGTCACCA CAGCTTGCTT ACTGGACAGC TATAAACACA AATGTCAGTA ACCTTCTCTG TTGAATGCTC GAACACACTC GTGGTTTTCC CTATTTCTAT GGAATTCCTT GGAGCGCAGT TTGTGCCTCA TTTTGCATTT ACCTCAGGTA ACGCCTGGCT ATTCATTATG CTTTGGGGTT TTTGAAAGAA CCTATTCTGT CATTGACCTT GGAGAAATGA TGTATTGTTC TCATTAAGAT

ACAATCCTGC CTCAAGTCAT CCTCTGTGAA CCTGTCATTG TGTGCCATTG ATCAAGCAGG AAGGGAAGAA CCTGCGTTGG GGTTGCTTGA CTCCTGCCTT ATGTAGTGTA GAGAAGATTT CTGTACTGAT AGGTGACTGT GCTCCAGCCT CGTTAACACA TTTGTAAAAT GCAAAAGGAT ATTACACCCT GTGCCTGCCC CATCGGAGTG CGATGTCTGA GCATAAGGCC GGAATCCAGC TTTTCCTTCT ATCCTTTGTT AGGATATGAG GGCCTTGGTT CATTCAGGCC TACAAAATGA TCAATGGAAC CCTGAGTAGT ATTCTTTTTT GGGGTGATCT GCTTCCCAAG TTAGTAGAGA ATCCGTCTGC GGAATACTTA CTGTGGTCAC TTAGTATGTG CATCAGTTCA GGATACAGTC TTATTTATTC GAGGGCTTAA TGACCCTGGT GCTATTCTTC

TGGAAACAAA AAAACAAAAG AAAACTATAC CCGGCCTTGT CAAACCGTAC AGGGGTTGAT AACAAGAAGC AGAAACGACA ACCACATTGC GAAAGATGGA CACTTGCAAT ACGGTGGCTT GGATGGCATA TTCCTGTTCA TAAGTGGAGT GGCAGTGCCT GCCCTACGAA ACTGCCTCTG TACTGGTAGG ACTGTGGGGA CGAGGAAGAA GTGTGAGGCG TTGTGCTGCG AGGCAGCTGG TCTCCAGTGT CTCCCAAATC CCTTTGCACA TTTTAAAATC TATCATTTTA CTAGGATAAC CAAATATAAA AATCTCACAC TAATTTTTTT CATCTCCCTG CAGCTGGGAT TGGGTTTCAC CTTGGCCTCC CTCTTGTCGG AGGGGTGTCT TCAAACATAG TGGCTGAGGC CCAGAGTTTT CTTATTTCTC ATGAAATTTA AAATATATTT

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAG) and the polyadenylation signal (AATAAA) are indicated.

Genomic structure The gene spans approximately 80 kb and is encoded in 18 exons (numbered from exon 0 to align exon numbering with C9, the first member of the gene family investigated). The exact position of exon 0 is not known. The introns are fairly evenly spaced and vary from 1 kb to 9 kb. Note that the majority of the exons are asymmetric with respect to boundary phase type and that there is a strong counter-correspondence of protein modules (domains) and exons.

Accession numbers Human:

cDNA HSC7A

Genomic HSC7EX1-HSC7EX17

Deficiency
Complete: Autosomal recessive. Loss of lytic function of complement. Patients show increased susceptibility to recurrent meningococcal meningitis and systemic N. gonorrhoeae infection, though many, including siblings sharing the deficiency, may be healthy. Mutations identified:
G to A at the AG of the 3' splice acceptor site, intron 1: 1 chromosome
Deletion of exons 6 and 7: 5 chromosomes, Irish
G to A at the GT of the 5' splice donor site, intron 7: 1 chromosome
G1135 to C; G379 to R^17: 8 chromosomes, Moroccan Jewish
1929 delC^18: 1 chromosome
G2044 to C; E682 to Q^18: 1 chromosome
G2060 to A; R687 to W: 1 chromosome
2137 del TG^19: 1 chromosome
T2250 to A; C750 stop: 1 chromosome
2350 del G^18: 1 chromosome
T to C at the GT of the 5' splice donor site of intron 16: 1 chromosome

Subtotal: Autosomal recessive. Very substantial but incomplete loss of lytic function of complement (approximately 2% of normal haemolytic activity and protein). Patients may show no clinical effects. Rare (6 chromosomes), but often associated with subtotal C6 deficiency. However, note that partial C7 deficiency in the presence of normal, or reduced but significant, C6 synthesis can lead to a complete C7-deficient phenotype^22,23. When the complement system is activated, an excess of C5b6 may be formed and circulate in vivo. This C5b6 will scavenge any newly synthesized C7, leading to local binding or inactivation of the complex. Mutation identified: C1561 to A; R521 to S; 6 chromosomes

Partial: Autosomal recessive. Substantial loss of lytic function (approximately 10-15% activity). No clinical effect unless compound heterozygous with a more serious C7 deficiency. Mutation identified: G659 to A; R220 to Q^18; 1 chromosome
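The coding changes listed under Deficiency are consistent with cDNA numbering that starts at the A of the ATG initiation codon (a convention inferred here, not stated in the source). Under it, nucleotide n lies in codon (n - 1) div 3 + 1: G1135 falls at the first base of codon 379 (G379 to R), C1561 at the first base of codon 521 (R521 to S), G659 at the second base of codon 220 (R220 to Q) and T2250 at the third base of codon 750 (C750 stop, consistent with TGT becoming TGA). A short Python sketch of the conversion, with a hypothetical function name:

```python
def cdna_to_codon(position: int) -> tuple[int, int]:
    """Map a 1-based cDNA position to (codon number, base within codon).

    Assumes position 1 is the A of the ATG initiation codon, so codon 1 is the
    initiator Met and codon numbers include the leader peptide.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1

for nt in (1135, 1561, 659, 2250):
    print(nt, cdna_to_codon(nt))
```

Because codon numbers under this convention count the leader, the resulting residue numbers refer to the precursor, as the module boundaries do.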

Polymorphic variants Phenotypic polymorphism is detected by (a) isoelectric focusing with functional overlay or immunoblotting (this polymorphism is much more common among Oriental populations than among white or African ones) and (b) comparison of the binding of monoclonal and polyclonal antibodies. Two common alleles, M (0.77) and N (0.23), are found in white populations, but the N allele is apparently absent in Africans.

DNA variants A large number of polymorphic sites in the C7 gene have been identified as RFLPs or in the course of sequencing to determine the molecular bases of deficiencies:
A/T, intron 15, position -37
A1792T; T576S; frequency 0.12
A1769C; T587P; M/N; frequency 0.77 (M)
A/C, intron 12, position -27; frequency 0.17
A/C, intron 11, position +18; associated with C7 deficiency in Japanese
G1162C; S389T; frequency 0.37
ins T, intron 8, position +42
T/C, intron 7, position -9
T/A, intron 1, position +55
A/C, intron 1, position +29
Two TaqI polymorphisms, in introns 13 and 15

References
1 DiScipio, R.G. et al. (1988) J. Biol. Chem. 263, 549-560.
2 Hobart, M.J. et al. (1978) J. Immunogenet. 5, 157-163.
3 Tokunaga, K. et al. (1986) Am. J. Hum. Genet. 39, 414-419.
4 Lachmann, P.J. and Thompson, R.A. (1970) J. Exp. Med. 131, 643-657.
5 Würzner, R. et al. (1991) Clin. Exp. Immunol. 83, 430-437.
6 Hogasen, A.K.M. et al. (1995) J. Immunol. 154, 4734-4740.
7 Hobart, M.J. et al. (1978) J. Immunogenet. 5, 157-163.
8 Abbott, C. et al. (1989) Genomics 4, 606-609.
9 Jeremiah, S.J. et al. (1990) Ann. Hum. Genet. 54, 141-147.
10 Rogde, S. et al. (1991) J. Med. Genet. 28, 587-590.
11 Hobart, M.J. et al. (1993) Hum. Mol. Genet. 2, 1035-1036.
12 Setien, F. et al. (1993) Immunogenetics 38, 341-344.
13 Orren, A. et al. (1985) Immunogenetics 21, 591-599.
14 Hayakawa, J. et al. (1985) Immunogenetics 22, 637-642.
15 Hobart, M.J. et al. (1995) J. Immunol. 154, 5188-5194.
16 Würzner, R. et al. (1992) Immunodeficiency Rev. 3, 123-147.
17 Fernie, B.A. et al. (1997) J. Immunol. 159, 1019-1026.
18 Fernie, B.A. and Hobart, M.J. (1998) Hum. Genet. (in press).
19 Nishizaka, H. et al. (1996) J. Immunol. 157, 4239-4243.
20 Lachmann, P.J. et al. (1978) Clin. Exp. Immunol. 33, 193-203.
21 Morgan, B.P. et al. (1989) Clin. Exp. Immunol. 75, 396-401.
22 Würzner, R. et al. (1996) Immunology 88, 407-411.
23 Fernie, B.A. et al. (1996) J. Immunol. 157, 3648-3657.
24 Würzner, R. et al. (1990) Complement Inflamm. 7, 290-297.
25 Goto, E. et al. (1990) Hum. Genet. 85, 251-252.
26 Fernie, B.A. et al. (1995) Ann. Hum. Genet. 59, 163-181.
27 Würzner, R. et al. (1995) J. Immunol. 154, 4813-4819.
28 Dewald, G. et al. (1994) Hum. Hered. 44, 301-304.
29 Fernie, B.A. et al. (1996) Exp. Clin. Immunogenet. 13, 92-103.

Francesco Tedesco, Department of Physiology and Pathology, University of Trieste, Trieste, Italy Mnason E. Plumb and James M. Sodetz, Department of Chemistry and Biochemistry and School of Medicine, University of South Carolina, Columbia, SC, USA

Physicochemical properties C8 exists in the circulation as an oligomeric protein of Mr (K) 151. It is composed of three non-identical subunits (α, β, γ) arranged as a disulfide-linked α-γ dimer and a non-covalently associated β chain. Each subunit is encoded by a separate gene. At high ionic strength, C8α-γ and C8β can be dissociated and purified in stable form.
Mature protein: pI ~6.20-7.50; Mr (K) 151.0

Chain     Leader  Mature  Mr (K)     Mr (K)    N-linked glycosylation sites     Interchain disulfide bond
          (aa)    (aa)    predicted  observed  possible            probable     (1, α-γ), residue
α chain   30      554     61.7       64        2 (43, 437)         1            194
β chain   54      537     60.9       64        3 (101, 243, 553)   1            -
γ chain   20      182     20.3       22        0                   -            60
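The chain data can be cross-checked with a few lines of arithmetic. The Python sketch below (the dictionary layout and function names are illustrative) adds leader and mature lengths, giving precursors of 584 (α), 591 (β) and 202 (γ) residues; the α and β values match the last residues given for C8α and C8β under Protein modules. It also sums the chain masses: predicted polypeptide masses total about 143K and observed masses 150K, close to the 151K quoted for native C8. The gap plausibly reflects carbohydrate on α and β as well as the imprecision of gel-based estimates, since γ, which has no N-linked sites, also runs above its predicted mass.

```python
# Per-chain values from the C8 physicochemical table:
# (leader residues, mature residues, predicted Mr (K), observed Mr (K))
C8_CHAINS = {
    "alpha": (30, 554, 61.7, 64),
    "beta": (54, 537, 60.9, 64),
    "gamma": (20, 182, 20.3, 22),
}

def precursor_lengths(chains):
    """Leader + mature length for each chain (the full primary translation product)."""
    return {name: leader + mature for name, (leader, mature, _, _) in chains.items()}

def summed_mr(chains):
    """Sum predicted and observed Mr (K) over the three chains."""
    predicted = sum(chain[2] for chain in chains.values())
    observed = sum(chain[3] for chain in chains.values())
    return predicted, observed

print(precursor_lengths(C8_CHAINS))
print(summed_mr(C8_CHAINS))
```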

Structure C8α and C8β are members of the MAC protein family, which is composed of complement proteins C6, C7, C8α, C8β and C9. All members have highly conserved amino acid sequences and a common structural organization. Each contains cysteine-rich modules that are found in a number of functionally unrelated proteins. Similarity in genomic organization (exon lengths, boundaries and phases) suggests that all arose from a common ancestral gene. C8γ is unrelated and belongs to the lipocalin family of homologous proteins, which are widely distributed and share the ability to bind small, lipophilic ligands, e.g. retinol, pheromones and odorants.

Function C8 is one of five terminal complement components (C5b, C6, C7, C8, C9) that assemble in a sequential manner on target cell membranes to form C5b-9, the cytolytic membrane attack complex (MAC) of complement. Specific functions have been identified for human C8α and C8β. Within C8, C8α physically interacts with C8β and C8γ. During MAC assembly, C8α interacts directly with the target membrane, provides a binding site for C9 and interacts with CD59, the membrane-associated inhibitor of the MAC. C8β interacts directly with C8α and also contains a binding site that mediates incorporation of C8 into the MAC. A function for C8γ has not been identified. It is not essential for the biosynthesis and secretion of C8α, the interaction between C8α and C8β, or the haemolytic activity of C8.

Tissue distribution Serum protein: ~55-80 µg/ml. Primary site of synthesis: liver (hepatocytes). Secondary sites: monocytes, macrophages, fibroblasts, astrocytes, endothelial cells.

Regulation of expression C8 is an acute-phase protein. Expression by HepG2 cells is up-regulated by IL-6, IL-1 and IFNγ and, to a lesser extent, by TNFα. Expression of C8α-γ and C8β is independently regulated by the first three cytokines. In astrocytes, C8 expression is up-regulated by IFNγ. In hepatocytes, C8α-γ and C8β are believed to associate intracellularly to form C8. However, C8α-γ is synthesized at a faster rate than C8β and is secreted in excess. Studies of human C8 deficiencies and of expression of recombinant C8 subunits indicate that C8α-γ and C8β can be synthesized and secreted independently.

Protein sequences MFAWFFILS FPCQDKKYRH QCKETGRCLK KAALGYNILT DDEKYFRKPY TIGIGPAGSP FKMRKDDIML YEYILVIDKA KFGGGKTERA KYNPWIDFE PCFNNGVPIL CRAGIQERRR

LMTCQPGVTA RSLLQPNKFG RHLVCNGDQD QEDAQSVYDA NFLKYHFEAL LLVGVGVSHS DEGMLQSLME KMESLGITSR RKAMAVEDII MQPIHEVLRH EGTSCRCQCR ECDNPAPQNG

QEKVNQRVRR GTICSGDIWD CLDGSDEDDC SYYGGQCETV ADTGISSEFY QDTSFLNELN LPDQYNYGMY DITTCFGGSL SRVRGGSSGW TSLGPLEAKR LGSLGAACEQ GASCPGRKVQ

AATPAAVTCQ QASCSSSTTC EDVRAIDEDC YNGEWRELRY DNANDLLSKV KYNEKKFIFT AKFINDYGTH GIQYEDKINV SGGLAQNRST QNLRRALDQY TQTEGAKADG TQAC

LSNWSEWTDC VRQAQCGQDF SQYEPIPGSQ DSTCERLYYG KKDKSDSFGV RIFTKVQTAH YITSGSMGGI GGGLSGDHCK ITYRSWGRSL LMEFNACRCG SWSCWSSWSV

MKNSRTWAWR RQMRSVDVTL NFSDKEVEDC DEANCRRIYK SPHYILNTRF KSGFSFGFKI VAHYKLKPRS GGIYEYTLVM GKCRGILNEI EWGDAVQYNP SSCHCAPCQG WSNWSSCSGR

APVELFLLCA MPIDCELSSW VTNRPCGSQV KCQHEMDQYW RKPYNVESYT PGIFELGISS LMLHYEFLQR NKEAMERGDY KDRNKRDTMV AIIKVKVEPL NGVPVLKGSR RKTRQRQCNN

ALGCLSLPGS SSWTTCDPCQ RCEGFVCAQT GIGSLASGIN PQTQGKYEFI QSDRGKHYIR VKRLPLEYSY TLNNVHACAK EDLWLVRGG YELVTATDFA CDCICPVGSQ PPPQNGGSPC

RGERPHSFGS KKRYRYAYLL GRCVNRRLLC LFTNSFEGPV LKEYESYSDF RTKRFSHTKS GEYRDLFRDF NDFKIGGAIE ASEHITTLAY YSSTVRQNMK GLACEVSYRK SGPASETLDC

NAVNKSFAKS QPSQFHGEPC NGDNDCGDQS LDHRYYAGGC ERNVTEKMAS VFLHARSDLE GTHYITEAVL EVYVSLGVSV QELPTADLMQ QALEEFQKEV NTPIDGKWNC S

TLLLAAGSLG RFLQEQGHRA RFLLQARGAR VLSGFEQRVQ

QKPQRPRRPA EATTLHVAPQ GAVHWVAET EAHLTEDQIF

SPISTIQPKA GTAMAVSTFR DYQSFAVLYL YFPKYGFCEA

NFDAQQFAGT KLDGICWQVR ERAGQLSVKL ADQFHVLDEV

C8γ MLPPGTATLL WLLVAVGSAC QLYGDTGVLG YARSLPVSDS RR

Leader sequences are underlined and possible N-linked glycosylation sites are indicated (N).

Protein modules
C8α
1-30      Leader peptide   exon 1/2
38-93     TSP1             exon 2/3
94-132    LDLRA            exon 3/4
133-492   MACPF            exon 4-10
493-529   EGF              exon 10
536-584   TSP1             exon 11
C8β
1-54      Leader peptide   exon 1/2
64-119    TSP1             exon 2/3
120-157   LDLRA            exon 3/4
158-498   MACPF            exon 4-10
499-535   EGF              exon 10/11
542-591   TSP1             exon 12
C8γ
1-20      Leader peptide   exon 1

Chromosomal location
        Human     Mouse^24,25
C8α     1p32
C8β     1p32      4
C8γ     9q34.3    2

cDNA sequences C8α

TTTTTTTTTT GACATCTTTT GGTGGCTTCT AGCCTGGGGT CAGCAGTTAC ACAAAAAGTA GTGGTGACAT AGTGTGGACA GTAATGGAGA CCATTGACGA GGTACAATAT GCCAGTGTGA AACGTCTCTA ACCACTTTGA ACCTTCTTTC GCCCAGCCGG TCTTGAACGA AGGTGCAGAC TGCAGTCATT ATGACTATGG TGGTGATTGA GTTTTGGAGG CAGGAGACCA CTGTGGAAGA CACAGAACAG TTGTTATCGA CTCTGGAGGC ATGCCTGCCG GCAGGTGCCA GAGCCAAAGC TCCAGGAAAG CAGGGCGGAA GCTGTGGATG ATGCAAATCA GCTGTCTACA ATGTTGACTG AAAGTAAAAT AGAGACAAAA AATTCTCTTA GTAAGCAGAC

CATCCTACTT ACTCCAATTT GGCTGAGATG AACTGCACAG CTGCCAGCTG CCGACACCGG CTGGGATCAA GGATTTCCAG CCAGGACTGC AGACTGCAGC CCTGACCCAG GACGGTATAC CTATGGAGAT AGCCCTGGCA CAAAGTTAAA CAGCCCTTTA ATTAAACAAG TGCACATTTT AATGGAGCTT CACCCATTAC CAAAGCAAAA CTCCTTGGGC TTGTAAAAAA CATTATTTCT GAGCACCATT TTTTGAGATG CAAGCGCCAG ATGTGGGCCT GTGCCGCCTG AGATGGGAGC GAGAAGAGAG AGTACAGACG TCGACCCCTG GCACACTTTT GCCAACTATT TTAACTAGAA AGAAAACTGA CAAGCAGACA GTTCTTTGAT ACTCTGAAAC

TGTTTTATTG CCTGAATAGA TTTGCTGTTG GAGAAGGTGA AGCAACTGGT AGCCTCTTGC GCCAGCTGCT TGTAAGGAGA CTTGATGGCT CAGTATGAAC GAAGATGCTC AATGGGGAAT GATGAGAAAT GATACTGGAA AAAGACAAGT TTGGTGGGTG TATAATGAGA AAGATGAGGA CCAGATCAGT ATCACATCTG ATGGAATCCC ATTCAATATG TTTGGAGGTG CGGGTGCGAG ACATACCGTT CAGCCTATCC AACCTGCGCC TGCTTCAACA GGTAGCTTGG TGGAGTTGCT TGTGACAATC CAGGCTTGCT CACTGACTAT TTCTTTGTTC CTATTAGTTA GCTCTGTCCT GAAAACTCAA CCTGAAACAA AACAATTTGT AATGAGAAAA

GGCGTTGATT TAGCTTTATT TTTTCTTCAT ACCAGAGAGT CAGAGTGGAC AGCCAAACAA CCAGTTCTAC CAGGTCGCTG CTGATGAGGA CAATTCCAGG AGAGTGTGTA GGAGGGAGCT ACTTTCGGAA TCTCCTCAGA CTGACTCATT TAGGTGTATC AGAAATTCAT AGGATGACAT ACAATTATGG GATCCATGGG TTGGTATTAC AAGACAAAAT GCAAAACTGA GTGGCAGTTC CCTGGGGGAG ACGAGGTGCT GCGCCTTGGA ATGGGGTGCC GTGCTGCCTG GGAGCTCCTG CAGCACCTCA GAGGGCCTCT TGGATAAAGA TGCCAGCTTC GAAAACTCAA ACTTACAGCA TCCATGACCA TCAACGCCCA TCACTCATAG ATACTAAAAA

GTTACAGGTC CCTTCAAGGT CTTGTCTTTG AAGACGGGCA AGATTGCTTT GTTTGGGGGA AACTTGTGTA CCTGAAACGC CGACTGTGAA ATCACAGAAG CGATGCCAGT TCGATATGAC ACCCTACAAC GTTTTATGAT TGGAGTGACC CCACTCACAA TTTCACAAGA TATGCTGGAT CATGTATGCC TGGCATTTAT CAGCAGAGAT AAATGTTGGT AAGGGCCAGG TGGCTGGAGC GTCATTAAAG GCGGCACACA CCAGTATCTG CATCCTCGAG TGAGCAAACA GTCTGTATGC GAATGGAGGG GGACACAGGC CTTCTTTCAA CAGGCCTAAG TCATTTTATT CTTTGGATCA GGGAGAACTT ATAAAACAAA AAACATTATT TTGACTTGAG

CCAGCCTGTA AATATAGTGC ATGACTTGTC GCTACACCCG CCGTGCCAGG ACCATCTGCA AGGCAAGCAC CACCTTGTGT GATGTCAGGG GCAGCCTTGG TATTATGGGG TCCACCTGTG TTTCTGAAGT AATGCAAATG ATCGGCATAG GACACTTCAT ATCTTCACAA GAAGGAATGC AAGTTCATCA GAATATATCC ATCACGACAT GGAGGTTTAT AAGGCCATGG GGTGGCTTGG TATAATCCTG AGCCTGGGGC ATGGAATTCA GGCACCAGCT CAGACAGAAG AGAGCAGGCA GCCTCGTGTC TGGACCAGAT CTAAGAGAAG ACTAGGTTTT CAGCAACTGG TCAAAAAAAT ACAGGATGTT GTAGGATGAA AATTGGTAGG TTATTTC

TCCTGTCACA TATTTCTTCT CACATTCCTT GTGTGGATGT CATGTGACCC TCCATGGGGA CATGCGGAAG ACCGCAGACT GTAGAAGGAT TGGCCAGTGG ATTATGCAGG

TTGGGAAATG CTGTGCTGCC TGGGTCAAAT TACCCTGATG CTGTCAGAAG ACCGTGCAAC TCAAGTGCGA TCTTTGCAAT TTATAAAAAA GATAAATTTG TGGATGCTCC

AAGAATTCCA CTGGGCTGTC GCAGTCAACA CCCATTGATT AAAAGGTACA TTCTCTGACA TGTGAAGGCT GGGGACAATG TGTCAGCATG TTCACAAACA CCGCATTACA

GGACATGGGC TCAGTTTGCC AGAGCTTTGC GTGAGCTGTC GGTATGCCTA AGGAAGTCGA TTGTGTGTGC ACTGTGGAGA AAATGGACCA GTTTTGAGGG TCCTGAACAC

TTGGAGGGCG TGGCTCCAGA TAAGAGCAGA TAGTTGGTCC CTTGCTCCAG AGACTGTGTT ACAGACAGGA CCAGTCAGAT ATACTGGGGA CCCAGTTCTT GAGGTTTAGG

C8β^20 CTGTGGCATC CCGGTGGAGC GGTGAAAGGC CAGATGCGGA TCTTGGACCA CCCTCTCAGT ACCAACAGAC AGGTGTGTAA GAAGCAAACT ATTGGCAGTC GATCACAGGT

AAGCCCTACA AAAGAGTATG TCTGGTTTCA AGTGATCGAG TTTCTGCATG ATGCTCCATT GAATACAGAG GGCATTTATG CTTAACAACG GTCTACGTCA GACAGAAACA AGTGAGCACA TGGGGAGACG GAACTAGTGA GCACTGGAGG GGAGTCCCTG CTAGCCTGTG TCAAATTGGT CCTCCTCAAA TAGCAGATGA GCCAGCTCAG ATGCAAGCTG CTGAGTTAAA

ATGTGGAAAG AATCATACTC GTTTTGGTTT GCAAACACTA CACGCTCTGA ACGAGTTCCT ATCTCTTCCG AATACACCCT TCCATGCCTG GTCTGGGTGT AGAGGGACAC TCACCACCCT CTGTGCAGTA CAGCCACAGA AGTTCCAGAA TCCTGAAAGG AGGTCTCCTA CTTCATGCTC ATGGGGGTAG TACAGCAGTG CCCTACACCA TTTAAAATAA GGCTT

CTACACGCCA AGATTTTGAA TAAAATACCT TATTAGGAGA CCTTGAAGTA TCAGAGAGTT TGATTTTGGG CGTTATGAAC TGCCAAAAAT GTCTGTAGGC CATGGTGGAG GGCATACCAG CAACCCAGCC TTTTGCCTAT GGAAGTTAGT ATCACGCTGT TCGGAAGAAT TGGAAGACGT CCCCTGTTCA GGCTACATAC GTTTCCACCT AGATGTTACC

CAGACCCAAG CGCAATGTCA GGAATATTTG ACCAAACGAT GCACATTACA AAGCGGCTGC ACCCACTACA AAAGAGGCCA GATTTTAAAA AAATGCAGAG GACTTGGTGG GAGCTGCCGA ATCATCAAAG TCCAGCACAG TCCTGCCACT GACTGCATCT ACCCCCATTG AAGACAAGAC GGCCCTGCTT AATGAGAGCC GGAGTTCATG TTGTAAAATG

GCAAATACGA CAGAGAAAAT AACTTGGCAT TCTCTCATAC AGCTGAAACC CCCTGGAGTA TCACAGAGGC TGGAGAGAGG TTGGTGGTGC GTATTCTGAA TCCTGGTACG CGGCGGACCT TTAAGGTGGA TGAGGCAGAA GTGCTCCCTG GTCCTGTTGG ATGGGAAGTG AAAGGCAGTG CAGAAACACT CTGAGCCCTC CAAGGGCAAA CAAGTTGATT

ATTCATATTA GGCAAGCAAG CAGTAGTCAA TAAAAGCGTA CAGAAGCCTC CAGCTACGGG TGTGCTTGGG AGATTATACT CATTGAAGAG TGAAATAAAA AGGAGGGGCA GATGCAGGAG GCCTCTGTAT CATGAAGCAG CCAAGGAAAT ATCCCAAGGC GAATTGCTGG TAACAATCCA TGACTGCTCC AAGAACTCAC AGGCAGTGCC TAAATAAATA


GGTGGTGCTA CCTGGGACTG CAGAGGCCAC GCTCAGCAGT CAGGAGCAGG ATGGCTGTCA GGAGACACAG CACGTGGTTG GGGCAGCTGT GGGTTTGAGC AAGTACGGCT GGCCGGCACA GGTGCCCGCT AGGGGATCTT TGTCTCC

CCCTTGGCCT CGACCCTCTT GCCGGCCCGC TTGCAGGGAC GCCACCGGGC GTACCTTCCG GGGTCCTCGG TCGCTGAGAC CAGTGAAGCT AGCGGGTCCA TCTGCGAGGC CAGCTCCAGT GCCTGTCCTC CTGCCGGCTG

CCCAGAGTCC GACTCTGCTC ATCCCCCATC CTGGCTCCTT CGAGGCCACC AAAGCTGGAT CCGCTTCCTG CGACTACCAG CTACGCCCGC GGAGGCCCAC TGCAGACCAG GCTGAGAAGT CGTGAAACCA CCCCAGAGGA

TGCCACCCTG CTGGCAGCTG AGCACCATCC GTGGCTGTGG ACACTGCATG GGGATCTGCT CTTCAAGCCC AGTTTCGCTG TCGCTCCCTG CTGACTGAGG TTCCACGTCC CAGTGCCCCG GCCTCAGATC CAGTGGGTGG

CTGCCGCCAC GCTCGCTGGG AGCCCAAGGC GCTCCGCTTG TGGCTCCCCA GGCAGGTGCG GAGGCGCCCG TCCTGTACCT TGAGCGACTC ACCAGATCTT TGGACGAAGT AGAGACGAGC AGGGCCCTGC AGTGGTACCT


C8γ (refs 21, 26) CTGGGACTTT CATGCTGCCC CCAGAAGCCT CAATTTTGAT CCGTTTCCTG GGGCACAGCC CCAGCTCTAT AGGGGCTGTG GGAGCGGGCG GGTCCTGAGT CTACTTCCCC GAGGAGGTGA CCACCAGTGG CACCCAGGGC ACTTATTAAA

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codons (TGA, α and γ chains; TAG, β chain) and the polyadenylation signals (AATAAA, α and β chains; ATTAAA, γ chain) are indicated.
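To locate features such as the ATG initiation codon, the termination codons or the AATAAA/ATTAAA polyadenylation signals in a sequence like those above, a simple motif scan is enough. The following is a minimal Python sketch, not part of the original entry: the function name `find_motif` is hypothetical, positions are reported 1-based to match the numbering convention of these sequence figures, and the spaces that group bases in blocks of ten are ignored. A match shows only where a motif occurs; whether it is the functional initiation codon or signal has to be established separately. Note also that the sequence rows above were extracted out of reading order, so positions computed on them directly would not match the published numbering.

```python
import re


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 1-based start positions of every occurrence of motif in seq.

    Spaces (used to group bases in blocks of ten) are ignored, and
    overlapping occurrences are all reported.
    """
    clean = seq.replace(" ", "").upper()
    # A zero-width lookahead allows overlapping matches such as AA in AAAA.
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [m.start() + 1 for m in pattern.finditer(clean)]


if __name__ == "__main__":
    # Last twenty bases of the C8-gamma sequence, ending in the ATTAAA signal.
    print(find_motif("CACCCAGGGC ACTTATTAAA", "ATTAAA"))  # [15]
```

Applied to the last twenty bases of the C8γ sequence, the scan returns position 15, the start of the ATTAAA signal.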

Genomic structure (subunit/gene name)

C8α/C8A: The gene spans ~70 kb and contains 11 exons^.

[Figure: exon map of the C8A gene; scale bar 10 kb]

C8β/C8B: The gene spans ~40 kb and contains 12 exons^^.

[Figure: exon map of the C8B gene; scale bar 10 kb]

C8γ/C8G: The gene spans ~1.8 kb and contains 7 exons^.

[Figure: exon map of the C8G gene, exons 1-7]

Accession numbers

        Human                                  Rabbit (ref. 29)
C8α     M16974 (ref. 19)                       L26981
C8β     M16973 (ref. 20), X04393 (ref. 28)     L26980
C8γ     M17263 (ref. 21), X06465 (ref. 26)     L26979

Deficiency Autosomal recessive. Characterized by increased susceptibility to meningococcal infection. Classified as either C8α-γ (rare) or C8β (common) deficiency^^.
C8α-γ deficiency: Low levels of dysfunctional C8α-γ (~1%) and functional C8β (~3%)^o. Mutations identified in C8α:
G to A, intron 6, position -12 (5 chromosomes)^^
G to A, intron 6, position -15 (1 chromosome)^^
Both lead to an insertion, a frameshift and a premature stop codon.
G to T, intron 2 (3 chromosomes), produces a mutation at the exon 2/intron 2 boundary^-^
C1407 to T; R424 to stop^^
No defects have been identified thus far in C8γ (six unrelated patients)'^^-^^.
C8β deficiency: Variable levels of functional C8α-γ but no detectable C8β^'^.
Common mutation: C1309 to T; R428 to stop^^^^^
Other mutations (rare)-^^:
C298 to T; Q91 to stop
C388 to T; R121 to stop
C847 to T; R274 to stop
363 del C
632 del C
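Each C to T nonsense mutation listed above converts an arginine (R) or glutamine (Q) codon into a stop codon. A minimal Python sketch shows why only these residues are candidates. It assumes the standard genetic code and considers only single C to T changes on the coding strand. The function name is hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_to_t_stop_gains() -> dict[str, str]:
    """Map each sense codon to the stop codon produced by a single C->T change."""
    gains = {}
    for bases in product("ACGT", repeat=3):
        codon = "".join(bases)
        if codon in STOP_CODONS:
            continue  # already a stop codon, so not a nonsense mutation
        for i, base in enumerate(codon):
            if base == "C":
                mutant = codon[:i] + "T" + codon[i + 1:]
                if mutant in STOP_CODONS:
                    gains[codon] = mutant
    return gains


if __name__ == "__main__":
    for codon, stop in sorted(c_to_t_stop_gains().items()):
        print(codon, "->", stop)
```

Only three codons qualify: CAA -> TAA and CAG -> TAG (both glutamine), and CGA -> TGA (arginine). Under these assumptions, each R-to-stop change of this kind therefore implies a CGA codon at that position.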

Polymorphic variants
C8α: Protein polymorphisms in C8α-γ reside in C8α and are controlled by the C8A locus. Two common (A and B) and several rare variants have been identified by electrophoretic analysis of C8α-γ^'^'^. C414A; Q93K: A/B^*. A BamHI RFLP has been described for C8A^^.
C8β: Protein polymorphisms are controlled by the C8B locus. Two common (A and B) and two rare variants have been identified by electrophoretic analyses^^. A376G; R117G: A/B^«. BamHI and TaqI RFLPs have been described for C8B^^'^^.
C8γ: No protein polymorphisms reported. T193G; silent, associated RFLP^^. Two RFLPs associated with sites located in intron 1.

References
1 Sodetz, J.M. (1989) Curr. Top. Microbiol. Immunol. 140, 19-31.
2 Plumb, M.E. and Sodetz, J.M. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 119-148.
3 Michelotti, G.A. et al. (1995) Hum. Genet. 95, 513-518.
4 Hobart, M.J. et al. (1995) J. Immunol. 154, 5188-5194.
5 Kaufman, K.M. and Sodetz, J.M. (1994) Biochemistry 33, 5162-5166.
6 Lockert, D.H. et al. (1995) J. Biol. Chem. 270, 19723-19728.
7 Schreck, S.F. et al. (1998) J. Immunol. 161, 311-318.
8 Hetland, G. et al. (1986) Scand. J. Immunol. 24, 421-428.
9 Pettersen, H.B. et al. (1987) Scand. J. Immunol. 25, 567-570.
10 Garred, P. et al. (1990) Scand. J. Immunol. 32, 555-560.
11 Gasque, P. et al. (1995) J. Immunol. 154, 4726-4733.
12 Johnson, E. and Hetland, G. (1991) Scand. J. Immunol. 33, 667-671.
13 Perissutti, S. and Tedesco, F. (1994) Int. J. Clin. Lab. Res. 24, 45-48.
14 Spath, G.F. et al. (1995) Exp. Clin. Immunogenet. 12, 53-60.
15 Scheurer, B. et al. (1997) Immunopharmacology 38, 167-175.
16 Ng, S.C. and Sodetz, J.M. (1987) J. Immunol. 139, 3021-3027.
17 Tedesco, F. et al. (1983) J. Clin. Invest. 71, 183-191.
18 Letson, C.S. et al. (1996) Mol. Immunol. 33, 1295-1300.
19 Rao, A.G. et al. (1987) Biochemistry 26, 3556-3564.
20 Howard, O.M.Z. et al. (1987) Biochemistry 26, 3565-3570.
21 Ng, S.C. et al. (1987) Biochemistry 26, 5229-5233.
22 Theriault, A. et al. (1992) Hum. Genet. 88, 703-704.
23 Dewald, G. et al. (1996) Ann. Hum. Genet. 60, 281-291.
24 Tanaka, S. et al. (1991) Immunogenetics 33, 18-23.
25 Chan, P. et al. (1994) Genomics 23, 145-150.
26 Haefliger, J. et al. (1987) Biochem. Biophys. Res. Commun. 149, 750-754.
27 Kaufmann, T. et al. (1993) Hum. Genet. 92, 69-75.
28 Haefliger, J. et al. (1987) Biochemistry 26, 3551-3556.
29 White, R.V. et al. (1994) J. Immunol. 152, 2501-2508.
30 Tedesco, F. et al. (1990) J. Clin. Invest. 86, 884-888.
31 Densen, P. et al. (1996) Mol. Immunol. 33 (suppl.), 68.
32 Zhang, L. et al. (1997) Exp. Clin. Immunol. 14, 67.
33 Kojima, T. et al. (1998) J. Immunol. 161, 3762-3766.
34 Kaufmann, T. et al. (1993) J. Immunol. 150, 4943-4947.
35 Saucedo, L. et al. (1995) J. Immunol. 155, 5022-5028.
36 Rogde, S. et al. (1985) Hum. Genet. 70, 211-216.
37 Rittner, C. et al. (1993) Hum. Genet. 92, 413-416.
38 Zhang, L. et al. (1995) Hum. Genet. 96, 281-284.
39 Rogde, S. et al. (1991) Nucleic Acids Res. 19, 3762.
40 Dewald, G. et al. (1994) FEBS Lett. 340, 211-215.
41 Rogde, S. et al. (1989) Nucleic Acids Res. 16, 6760.
42 Hermann, D. et al. (1989) Immunogenetics 30, 291-295.

C9

B. Paul Morgan, Department of Medical Biochemistry, University of Wales, Cardiff, UK

Physicochemical properties^^^ C9 is a single-chain α-globulin of 558 amino acids, including a leader peptide of 20 amino acids.
pI: 4.7
Mr (K): predicted 60.7; observed 71
N-linked glycosylation sites: 2 (276, 414)

Structure C9 is a globular ellipsoid with dimensions of 7.7 × 7 × 5.2 nm by electron microscopy, containing 24% α-helix and 32% β-sheet. Twenty-four cysteine residues form 12 intradomain disulfide bonds^. A thrombin cleavage site between residues 264 and 265 generates the C9a (34 kDa) and C9b (37 kDa) fragments. C9a is highly acidic, whereas C9b contains numerous hydrophobic residues'*. There is a calcium-binding site in C9a^. C9 has a marked propensity to polymerize into tubular structures containing between 12 and 18 C9 monomers. These closely resemble the membrane-inserted membrane attack complex (MAC) and have an inner diameter of 10 nm, an outer diameter of 21 nm and a length of 15 nm^.

Function Multiple copies of C9 (up to 12) bind the C5b8 complex on membranes and assemble the barrel-shaped pore which causes cell lysis. Polymerization of C9 in vitro generates similar structures containing 12-18 C9 monomers. Major conformational changes occur in C9 during MAC formation and polymerization, notably an unfolding which increases length from 7.7 nm to 15 nm.

Tissue distribution Serum protein: 60 µg/ml in plasma. Primary site of synthesis: liver. Secondary sites: monocytes, fibroblasts, glial cells.


Regulation of expression C9 is an acute-phase protein, upregulated by IFN-γ in hepatocytes.

Protein sequence^^^ SACRSFAVAI SQCDPCLRQM GNDFQCSTGR LARTAGYGIN LIYETKGEKN EQCCEETASS HLGRFVMRNR LGGLYELIYV KDDCVKRGEG VTDFVNWASS INEFSVRKCH ALEFPNEK

CILEINILTA FRSRSIEVFG CIKMRLRCNG ILGMDPLSTP FRTEHYEEQI ISLHGKGSFR DWLTTTFVD LDKASMKRKG RAVNITSENL INDAPVLISQ TCQNGGTVIL

QYTTSYDPEL QFNGKRCTDA DNDCGDFSDE FDNEFYNGLC EAFKSIIQEK FSYSKNETYQ DIKALPTTYE VELKDIKRCL IDDWSLIRG KLSPIYNLVP MDGKCLCACP

TESSGSASHI VGDRRQCVPT DDCESEPRPP NRDRDGNTLT TSNFNAAISL LFLSYSSKKE KGEYFAFLET GYHLDVSLAF GTRKYAFELK VKMKNAHLKK FKFEGIACEI

DCRMSPWSEW EPCEDAEDDC CRDRWEESE YYRRPWNVAS KFTPTETNKA KMFLHVKGEI YGTHYSSSGS SEISVGAEFN EKLLRGTVID QNLERAIEDY SKQKISEGLP


The leader sequence is underlined and the two N-linked glycosylation sites are indicated (N).
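N-linked glycosylation sites like the two marked here are Asn residues within the canonical N-X-S/T acceptor sequon, where X is any residue except proline. A minimal Python sketch for finding candidate sites follows. The function name is hypothetical and positions are 1-based. A sequon is necessary but not sufficient for glycosylation, so occupancy still has to be shown experimentally. Because the sequence rows above were extracted out of reading order, positions should only be computed on a correctly ordered sequence.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead
# consumes only the Asn, so sequons in runs such as NNST are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != P)."""
    clean = protein.replace(" ", "").upper()
    return [m.start() + 1 for m in SEQUON.finditer(clean)]


if __name__ == "__main__":
    print(find_sequons("RAVNITSENL"))  # [4]
```

For the C9 fragment RAVNITSENL, the Asn in NIT qualifies. The final Asn does not, because it has no Ser or Thr two residues downstream.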

Protein modules^
1-20       Leader sequence    exon 1
41-97      TSP1               exon 2/3
98-135     LDLRA              exon 3/4
136-504    MACPF              exon 4-10
505-539    EGF                exon 10

Chromosomal location^^ Human: 5p13, linked with the genes for C6 and C7 but separated from them by at least 2.5 Mb.

cDNA sequence^'^'^^-'^ CAGCATGTCA CACAGCACAG ACACATAGAC ACAAATGTTT CGACGCTGTG TGACTGCGGA TAATGGTGAC TCCCCCCTGC GATCAACATT ACTCTGTAAC GGCTTCTTTG ACAAATTGAA ATCTCTAAAA CTCCTCAATT TTACCAACTA AGAAATTCAT GGATGATATA AACCTATGGA

GCCTGCCGGA TACACGACCA CGCAGAATGA CGTTCAAGAA GGAGACAGAC AATGACTTTC AATGACTGCG AGAGACAGAG TTAGGGATGG CGGGATCGGG ATCTATGAAA GCATTTAAAA TTTACACCCA TCTTTACATG TTTTTGTCAT CTGGGAAGAT AAAGCTTTGC ACTCACTACA

GCTTTGCAGT GTTATGACCC GCCCCTGGAG GCATTGAGGT GACAATGTGT AATGCAGTAC GAGACTTTTC TGGTAGAAGA ATCCCCTAAG ATGGAAACAC CCAAAGGCGA GTATCATCCA CTGAAACAAA GCAAGGGTAG ATTCTTCAAA TTGTAATGAG CAACTACCTA GTAGCTCTGG

TGCAATCTGC AGAGCTAACA TGAATGGTCA CTTTGGACAA GCCCACAGAG AGGCAGATGC AGATGAGGAT GTCTGAGCTG CACACCTTTT TCTGACATAC GAAAAATTTC AGAGAAGACA TAAAGCTGAA TTTTCGGTTT GAAGGAAAAA AAATCGCGAT TGAAAAGGGA GTCTCTAGGA

ATTTTAGAAA GAAAGCAGTG CAATGCGATC TTTAATGGGA CCCTGTGAGG ATAAAGATGC GATTGTGAAA GCACGAACAG GACAATGAGT TACCGAAGAC AGAACCGAAC TCAAATTTTA CAATGTTGTG TCATATTCCA ATGTTTCTGC GTGCTCACAA GAATATTTTG GGACTCTATG

TAAACATCCT 60 GCTCTGCATC 120 CTTGTCTCAG 180 AAAGATGCAC 240 ATGCTGAGGA 300 GACTTCGGTG 360 GTGAGCCCCG 420 CAGGCTATGG 480 TCTACAATGG 540 CTTGGAACGT 600 ATTACGAAGA 660 ATGCAGCTAT 720 AGGAAACAGC 780 AAAATGAAAC 840 ATGTGAAAGG 900 CAACTTTTGT 960 CCTTTTTGGA 1020 AACTAATATA 1080

cDNA sequence TGTTTTGGAT CCTTGGGTAT TAATAAAGAT CCTCATAGAT GAAAGAAAAG TTCCATAAAT TCCAGTGAAA CTATATCAAT TCTAATGGAT AATCAGTAAA GAGCTGTTGG CCCCTGAAGA ACCTCTGAAG TCCTGTGATG CTGCTTTATC ACAGAATGTT ATTCTGCGAG TATGTTTAGT CAGGCACGTG CCAAGGGGGA GGGAGATAAA TAGATAGTGT TTCCTTCTAT TACCTAGC

continued

AAAGCTTCCA CATCTGGATG GATTGTGTAA GATGTTGTTT CTTCTCCGAG GATGCTCCTG ATGAAAAATG GAATTTAGTG GGAAAGTGTT CAAAAAATTT GTTCTCTGAG TAATCTTAGC TCTCTTCTCT TTTCCATTTT CCACGGAAAA GGTTTAAAAA GTCCATGACG TTGGTTATTT CACAGGAGTT AAACATATTA GAGGAGTTGG TCAGTAGGAG TTGATCATAT

TGAAGCGGAA TATCTCTGGC AGAGGGGAGA CACTCATAAG GAACCGTGAT TTCTCATTAG CACACCTAAA TAAGAAAATG TGTGTGCCTG CTGAAGGATT CTGCAGTGGA TGCCAAGTAA CTTAGGTCTA TTGTTCCCTA AGCCAATCTC AGTTCAAAGT AGAGGTCTGT GCTTAGGTGT GCTTTTACTA TATTTGTAAC TTAATGGGTA AATAGAATGA TTTACAAGAA

AGGTGTTGAA TTTCTCTGAA GGGTAGAGCT AGGTGGAACC TGATGTGACT TCAAAAACTG GAAAGAAAAG CCACACATGC CCCATTCAAA GCCAGCCCTA AGAAGAAAAC ATAGCAACAT TAATTTTTTT ATGAGAAGTC TTCTAAAAAA AATTTTCAAA AGCATGCAAT GCATACATTC GTCTTAGCTC CAAAAACTAC CAAAAATCCA ACATAAAGTA AAAACATCAA

CTAAAAGACA ATCTCTGTTG GTAAACATCC AGAAAATATG GACTTTGTCA TCTCCTATAT TTGGAAAGAG CAAAATGGAG TTTGAGGGAA GAGTTCCCCA ACTAGTACCT GCTTCATGAA TTAATTTTTC AACAGTGAAA AAAACAAAAT CGGCTTTGTA TTAAGTGTTA ATTCAGCAAA TACGATTTAA TAGTTTACCA GTTAGATGAA TTAGTTTAAA TTTTATATAG

TAAAGAGATG GAGCTGAATT CCAGTGAAAA CATTTGAACT ACTGGGCCTC ATAATCTGGT CCATTGAAGA GTACAGTGAT TTGCCTGTGA ATGAAAAATA TCAGATCCTA AATCCTACCA TTCCTTAAAC TACGCGAGAA TAAATTAAAA TGGTTAACAT TTTAGATTGT TGCTGAGCAC ATCCATGTGT GAGGACTGAA AGGAATAATA TTATGTGAAA TCCAACTTAA


The sequence begins just upstream of the ATG initiation codon; the sequence 5' to this in exon 1 is undefined. The first five nucleotides in each exon are underlined to indicate intron-exon boundaries, beginning with the exon 2/3 boundary. The methionine initiation codon (ATG) and the termination codon (TAA) are indicated.

Genomic structure^^'^^ The gene spans approximately 100 kb and contains 11 exons. Exon size varies between 100 and 250 bp, except for exon 11, which is of undetermined length but more than 800 bp. Introns vary in length from 260 bp (intron 3) to >20 kb (e.g. intron 1). Note the unusual distribution of exons. Two clusters of closely spaced exons (2, 3 and 4; 7, 8 and 9) are present, and the other exons are separated by very large introns. The approximate sizes of these large introns are indicated on the figure.

[Figure: exon map of the C9 gene (approx. 100 kb); large introns are labelled >20 kb, >10 kb, >5 kb, >5 kb, >20 kb and >15 kb]

Accession numbers Human Mouse Rat Rabbit Rainbow trout Puffer fish (Fugu)

X02176 K02766 Y08545 X05475 U52948 U20055 X05474 U87241

Deficiency^^'^^ Numerous cases have been reported. Prevalence is 1:1000 in the Japanese population; deficiency is probably much less common in white and other populations. It causes increased susceptibility to infection with Neisseria, most commonly manifest as meningococcal meningitis.
C350 to T; R115 to stop; predominant in the Japanese population
C166 to A; C53 to stop; white populations
C464 to T; R153 to stop^^; white populations
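The stated prevalence of 1:1000 can be converted into allele and carrier frequencies by a worked calculation. This requires two assumptions that the entry does not state: C9 deficiency behaves as a single autosomal recessive trait, and the population is in Hardy-Weinberg equilibrium. The following is a minimal Python sketch under those assumptions. The function name is hypothetical.

```python
import math


def recessive_allele_stats(prevalence: float) -> tuple[float, float]:
    """Return (q, 2pq) for a recessive trait observed at the given prevalence.

    Assumes Hardy-Weinberg equilibrium, so the frequency of affected
    homozygotes, q**2, equals the prevalence.
    """
    q = math.sqrt(prevalence)  # frequency of deficiency alleles
    p = 1.0 - q                # frequency of normal alleles
    return q, 2.0 * p * q      # 2pq = heterozygous carrier frequency


if __name__ == "__main__":
    q, carriers = recessive_allele_stats(1 / 1000)
    print(f"q = {q:.4f}, carriers = {carriers:.4f}")  # q = 0.0316, carriers = 0.0612
```

Under these assumptions about 6% of the population, roughly 1 in 16, would be heterozygous carriers. The calculation is simplified because it treats the different deficiency alleles as one class and assumes random mating.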


Polymorphic variants^^


An RFLP with the enzyme TaqI has been identified, with two alleles (frequencies in a Spanish population): A1 (0.74) = 6.5 kb and A2 (0.26) = 8.0 and 6.0 kb.

References
1 DiScipio, R.G. and Hugli, T.E. (1985) J. Biol. Chem. 260, 14802-14809.
2 DiScipio, R.G. (1993) Mol. Immunol. 30, 1097-1106.
3 Lengwiller, J.S. and Rickli, E.E. (1996) FEBS Lett. 380, 8-12.
4 Biesecker, G. et al. (1982) J. Biol. Chem. 257, 2584-2590.
5 Thielens, N.M. et al. (1988) J. Biol. Chem. 263, 6665-6670.
6 Dankert, J.R. et al. (1985) Biochemistry 24, 2754-2762.
7 DiScipio, R.G. et al. (1984) Proc. Natl Acad. Sci. USA 81, 7298-7302.
8 Stanley, K.K. et al. (1985) EMBO J. 4, 375-382.
9 Marrazziti, D. et al. (1988) Biochemistry 27, 6529-6534.
10 Rogne, S. et al. (1991) J. Med. Genet. 28, 587-590.
11 Hobart, M.J. et al. (1995) J. Immunol. 154, 5188-5194.
12 Witzel-Schlomp, K. et al. (1997) J. Immunol. 158, 5043-5049.
13 Horiuchi, T. et al. (1998) J. Immunol. 160, 1509-1513.
14 Goto, E. et al. (1990) Nucleic Acids Res. 18, 5581.

Parts Regulators of Complement Activation (RCA)

CR1

Lloyd B. Klickstein, Brigham and Women's Hospital, Boston, MA, USA; Joann M. Moulds, University of Texas Medical School, Houston, TX, USA

Other names Complement receptor type 1, C3b/C4b receptor, CD35, immune adherence receptor.

Physicochemical properties CR1 is a type 1 integral membrane glycoprotein of 2044 amino acids^-^. Its leader sequence comprises either 41 or 46 amino acids, because there are two possible translation initiation sites. The C-terminal transmembrane region contains 25 amino acids, and there are 43 residues in the C-terminal cytoplasmic domain. The N-terminal residue is blocked^'^^, which is compatible with pyrrolidone amide cyclization or N-terminal alkylation of Gln47. At least four major structural allotypes have been described in humans^^. The most common form is CR1*1 (F or A), and all further descriptions focus on that human form except where specifically noted. The other forms are CR1*2 (B or S), CR1*3 (C or F') and CR1*4 (D).
pI: 7.1
Mr (K): 205-250 (depending on cell source and electrophoresis system^-^^)

Allotype                        CR1*1      CR1*2      CR1*3      CR1*4
Approx. Mr (K), reduced         220-250    250-280    190-220    >280
Approx. Mr (K), unreduced       190-210    220-250    160-190    >250

CR1 from polymorphonuclear leukocytes migrates with an Mr about 5 K larger than that of CR1 from erythrocytes, owing to altered N-linked glycosylation^^.
N-linked glycosylation sites: 25 (61, 161, 257, 320, 415, 452, 514, 583, 707, 770, 865, 902, 964, 1033, 1157, 1220, 1315, 1486, 1509, 1539, 1545, 1610, 1673, 1768, 1913)
N-linked glycosylation contributes approximately 20-25 K to the molecular weight of CR1*1^'^^-^^. Protein sequence data from erythrocyte CR1 support occupancy of the sites at 514 and 964. The sites at 320 and 770 are unoccupied. Occupancy of the other sites is unknown. There is no detectable O-linked glycosylation^^.

Structure CR1 has an extracellular region composed of a linear array of 30 CCP units of 59-75 residues each^-^. There are 120 cysteine residues. Based on structural homology to β2-glycoprotein I^, all of them are believed to be involved in disulfide bonds. An extended linear structure has been confirmed by electron microscopy^^. The N-terminal 28 CCPs are further organized as four tandem long homologous repeats (LHRs) of seven CCP units each^'^. The predicted transmembrane region was confirmed by deletion mutagenesis, which resulted in a soluble form of the protein^^'^^.

Function CR1 has long been recognized as the receptor for the C3b and C4b fragments, and more recently as a receptor for C1q^^. CR1 also binds iC3b, but relatively poorly^*. Human erythrocyte CR1 mediates binding of complement-opsonized immune complexes or microorganisms to the cell^. These bound complexes or particles are then carried through the bloodstream to the spleen or liver, where they are removed^'^^^. CR1 on neutrophils and monocytes can mediate phagocytosis if the cells are primed or activated^^-^^. CR1 on B cells and dendritic cells participates in localization of antigen for presentation to T cells^^-^^. CR1 on all cell types is a cofactor for factor I-mediated cleavage of C3b to iC3b and C3f, and for further cleavage of iC3b to C3c and C3d,g. CR1 is likewise a cofactor for factor I-mediated cleavage of C4b to C4c and C4d. CR1 also accelerates the otherwise spontaneous decay of the classical pathway C3 and C5 convertases (C4b2a and C4b2a3b). It has the same effect on the corresponding alternative pathway convertases (C3bBb and C3bBbC3b)^'2,33. These activities may be either intrinsic or extrinsic, that is, acting on convertases located either on the same surface as the CR1 or on a different one^^-^^.

Tissue distribution As a type 1 transmembrane protein, CR1 is found on all erythrocytes, B cells, polymorphonuclear leukocytes, monocytes, follicular dendritic cells and glomerular podocytes. It is also found on a subset of T cells^'-^. CR1 is absent from NK cells^*. A soluble form is found in serum at a reported concentration of 30-60 ng/ml^^''^^. However, this figure is an overestimate, because the monoclonal antibodies used recognize repeated epitopes in CR1^^.

Regulation of expression CR1 is constitutively expressed on the cells listed above. It is slowly lost from the surface of erythrocytes over the normal life span of the cells. This loss is greatly accelerated in patients with immune complex diseases such as systemic lupus erythematosus'*^-^^. It is an acquired phenomenon, not a hereditary predisposition to illness^^. Ninety per cent of neutrophil CR1 is intracellular^'^'^*, located in secretory vesicles distinct from azurophilic or specific granules^^. Upon neutrophil activation with chemotactic peptides or other stimuli, this intracellular CR1 is mobilized to the cell surface^^'^^.

Protein sequence3A50 MCLGRMGASS PEWLPFARPT KDRCRRKSCR IISGDTVIWD NPGSGGRKVF GILVSDNRSL VCQPPPDVLH WSPAAPTCEV SASYCVLAGM VNYTCDPHPD PDHFLFAKLK KDVCKRKSCK ILSGNAAHWS NPGSGGRKVF GILVSDNRSL VCQPPPDVLH WSPAAPTCEV SASYCVLAGM VNYTCDPHPD PDHFLFAKLK KDVCKRKSCK ILSGNTAHWS NLGSRGRKVF GILVSDNRSL VCQPPPEILH WSPEAPRCAV SVSHCVLVGM ISYTCDPHPD CKTPEQFPFA SSVEDNCRRK TTCLVSGNNV YQCHTGPDGE VENAIRVPGN CSRVCQPPPE QGDWSPEAPR KGRSASHCVL GKEISYACDT AACPHPPKIQ IWSQLDHYCK GSPWSQCQAD LKHRKGNNAH

PRSPEPVGPP NLTDEFEFPI NPPDPVNGMV NETPICDRIP ELVGEPSIYC FSLNEWEFR AERTQRDKDN KSCDDFMGQL ESLWNSSVPV RGTSFDLIGE TQTNASDFPI TPPDPVNGMV TKPPICQRIP ELVGEPSIYC FSLNEWEFR AERTQRDKDN KSCDDFMGQL ESLWNSSVPV RGTSFDLIGE TQTNASDFPI TPPDPVNGMV TKPPICQRIP ELVGEPSIYC FSLNEWEFR GEHTPSHQDN KSCDDFLGQL RSLWNNSVPV RGMTFNLIGE SPTIPINDFE SCGPPPEPFN TWDKKAPICE QLFELVGERS RSFFSLTEII ILHGEHTLSH CTVKSCDDFL AGMKALWNSS HPDRGMTFNL NGHYIGGHVS EVNCSFPLFM DRWDPPLAKC ENPKEVAIHL

APGLPFCCGG GTYLNYECRP HVIKGIQFGS CGLPPTITNG TSNDDQVGIW CQPGFVMKGP FSPGQEVFYS LNGRVLFPVN CEQIFCPSPP STIRCTSDPQ GTSLKYECRP HVITDIQVGS CGLPPTIANG TSNDDQVGIW CQPGFVMKGP FSPGQEVFYS LNGRVLFPVN CEQIFCPSPP STIRCTSDPQ GTSLKYECRP HVITDIQVGS CGLPPTIANG TSNDDQVGIW CQPGFVMKGP FSPGQEVFYS PHGRVLFPLN CEHIFCPNPP STIRCTSDPH FPVGTSLNYE GMVHINTDTQ IISCEPPPTI lYCTSKDDQV RFRCQPGFVM QDNFSPGQEV GQLPHGRVLL VPVCEQIFCP IGESSIRCTS LYLPGMTISY NGISKELEMK TSRAHDALIV HSQGGSSVHP

SLLAWVLLA GYSGRPFSII QIKYSCTKGY DFISTNRENF SGPAPQCIIP RRVKCQALNK CEPGYDLRGA LQLGAKVDFV VIPNGRHTGK GNGVWSSPAP EYYGRPFSIT RINYSCTTGH DFISTNRENF SGPAPQCIIP RRVKCQALNK CEPGYDLRGA LQLGAKVDFV VIPNGRHTGK GNGVWSSPAP EYYGRPFSIT RINYSCTTGH DFISTNRENF SGPAPQCIIP RRVKCQALNK CEPGYDLRGA LQLGAKVSFV AILNGRHTGT GNGVWSSPAP CRPGYFGKMF FGSTVNYSCN SNGDFYSNNR GVWSSPPPRC VGSHTVQCQT FYSCEPSYDL PLNLQLGAKV NPPAILNGRH DPQGNGVWSS TCDPGYLLVG KVYHYGDYVT GTLSGTIFFI RTLQTNEENS

LPVAWGOCNA CLKNSVWTGA RLIGSSSATC HYGSWTYRC NKCTPPNVEN WEPELPSCSR ASMRCTPQGD CDEGFQLKGS PLEVFPFGKA RCGILGHCQA CLDNLVWSSP RLIGHSSAEC HYGSWTYRC NKCTPPNVEN WEPELPSCSR ASMRCTPQGD CDEGFQLKGS PLEVFPFGKA RCGILGHCQA CLDNLVWSSP RLIGHSSAEC HYGSWTYRC NKCTPPNVEN WEPELPSCSR ASLHCTPQGD CDEGFRLKGS PSGDIPYGKE RCELSVRAGH SISCLENLVW EGFRLIGSPS TSFHNGTWT ISTNKCTAPE NGRWGPKLPH RGAASLHCTP SFVCDEGFRL TGTPFGDIPY PAPRCELSVP KGFIFCTDQG LKCEDGYTLE LLIIFLSWII RVLP


The leader sequence is underlined and the potential N-linked glycosylation sites are indicated (N).

Protein modules^'^^^^
Residues      Module                      Exon
1 or 6-46     Leader peptide              exon 1
47-106        CCP1, begin LHR-A           exon 2
107-168       CCP2                        exon 3/4
169-238       CCP3                        exon 5
239-300       CCP4                        exon 5
301-360       CCP5                        exon 6
361-423       CCP6                        exon 7/8
424-496       CCP7, end LHR-A             exon 9
497-556       CCP8, begin LHR-B           exon 10
557-618       CCP9                        exon 11/12
619-688       CCP10                       exon 13
689-750       CCP11                       exon 13
751-810       CCP12                       exon 14
811-873       CCP13                       exon 15/16
874-946       CCP14, end LHR-B            exon 17
947-1006      CCP15, begin LHR-C          exon 18
1007-1068     CCP16                       exon 19/20
1069-1138     CCP17                       exon 21
1139-1200     CCP18                       exon 21
1201-1260     CCP19                       exon 22
1261-1323     CCP20                       exon 23/24
1324-1399     CCP21, end LHR-C            exon 25
1400-1459     CCP22, begin LHR-D          exon 26
1460-1521     CCP23                       exon 27/28
1522-1591     CCP24                       exon 29
1592-1653     CCP25                       exon 29
1654-1713     CCP26                       exon 30
1714-1776     CCP27                       exon 31/32
1777-1851     CCP28, end LHR-D            exon 33
1852-1911     CCP29                       exon 34
1912-1972     CCP30                       exon 35
1977-2001     Transmembrane region        exon 36/37
2002-2044     Cytoplasmic region          exon 38
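The module boundaries in this table can be turned into a simple interval lookup, for example to find which CCP carries a given glycosylation site. The following minimal Python sketch uses `bisect` over the module start positions. The names are hypothetical, and the leader is taken as residues 1-46, the longer of the two possible leaders. Residues 1973-1976, which fall between CCP30 and the transmembrane region in the table, are reported as unassigned.

```python
from bisect import bisect_right
from typing import Optional

# (start, end, module) for CR1*1, using the residue numbers in the table.
MODULES = [
    (1, 46, "Leader peptide"),
    (47, 106, "CCP1"), (107, 168, "CCP2"), (169, 238, "CCP3"),
    (239, 300, "CCP4"), (301, 360, "CCP5"), (361, 423, "CCP6"),
    (424, 496, "CCP7"), (497, 556, "CCP8"), (557, 618, "CCP9"),
    (619, 688, "CCP10"), (689, 750, "CCP11"), (751, 810, "CCP12"),
    (811, 873, "CCP13"), (874, 946, "CCP14"), (947, 1006, "CCP15"),
    (1007, 1068, "CCP16"), (1069, 1138, "CCP17"), (1139, 1200, "CCP18"),
    (1201, 1260, "CCP19"), (1261, 1323, "CCP20"), (1324, 1399, "CCP21"),
    (1400, 1459, "CCP22"), (1460, 1521, "CCP23"), (1522, 1591, "CCP24"),
    (1592, 1653, "CCP25"), (1654, 1713, "CCP26"), (1714, 1776, "CCP27"),
    (1777, 1851, "CCP28"), (1852, 1911, "CCP29"), (1912, 1972, "CCP30"),
    (1977, 2001, "Transmembrane region"), (2002, 2044, "Cytoplasmic region"),
]
STARTS = [start for start, _, _ in MODULES]


def module_at(residue: int) -> Optional[str]:
    """Return the module containing residue, or None if no module covers it."""
    i = bisect_right(STARTS, residue) - 1  # last module starting at or before residue
    if i >= 0 and residue <= MODULES[i][1]:
        return MODULES[i][2]
    return None


if __name__ == "__main__":
    # The two glycosylation sites reported as occupied in erythrocyte CR1.
    for site in (514, 964):
        print(site, module_at(site))  # 514 CCP8, then 964 CCP15
```

With this lookup, the two occupied sites (514 and 964) fall in CCP8 and CCP15. These are the first CCPs of LHR-B and LHR-C respectively. Because the modules are sorted and do not overlap, a binary search over the start positions is sufficient.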

The ligand-binding sites are^'^^'^^-^^:
47-300        CCPs 1-4      C4b-binding site (lower affinity for C3b)
497-750       CCPs 8-11     C3b-binding site (lower affinity for C4b)
947-1200      CCPs 15-18    C3b-binding site (lower affinity for C4b)
1400-1851     CCPs 22-28    A C1q-binding site

Chromosomal location Human^56,57: 1q32.
Telomere ... MCP ... CR1 ... CR2 ... DAF ... C4bp ... Centromere
Factor H (Cfh) maps to 1q32 but has not been physically linked with other members of the RCA.
Mouse^*'^^: chromosome 1, 40 cM.
Telomere ... Crry ... CR1/CR2 ... Cfh ... C4bp ... Centromere

cDNA sequence3A50,60 TTTTGTCCCG AGTCCTATTT TGTAGATGTG CGCCGGCGCC TTGCGCTGCC CTACCAACCT GCCCTGGTTA GTGCTAAGGA TGGTGCATGT GATACCGACT GGGATAATGA ATGGAGATTT GCTGCAATCC ACTGCACCAG TACCTAACAA GCTTATTTTC GACCCCGCCG CCAGGGTATG ACAACTTTTC GGGCTGCGTC AAGTGAAATC TAAATCTCCA GCAGCTCTGC CAGTGTGTGA GAAAACCTCT CAGACAGAGG CTCAAGGGAA AAGCCCCAGA CCATTGGGAC TCACATGTCT GTAAAACTCC GATCCAGAAT AATGTATCCT TTCCTTGTGG ATTTTCACTA TGTTTGAGCT TCTGGAGCGG AAAATGGAAT TTAGGTGTCA ACAAATGGGA TGCATGCTGA ACAGCTGTGA GAGACTGGAG AACTTCTTAA TTGTTTGTGA GAATGGAAAG CTCCAGTTAT AAGCAGTAAA GAGAGAGCAC CCCCTCGCTG TGAAAACCCA GTCCTGAGTA GTCCCAAAGA TGGTGCATGT

GAACCCCGCA GCCCTCCCCA TCGCTGAGCT TTTCCTCTTA CTTGGGGAGA ATGGGGGCCT CGGTCTCCCC TTCTGCTGCG GGTGGCCTGG GGTCAATGCA AACTGATGAG TTTGAGTTTC TTCCGGAAGA CCGTTTTCTA CAGGTGCAGA CGTAAATCAT GATCAAAGGC ATCCAGTTCG CATTGGTTCC TCGTCTGCCA AACACCTATT TGTGACAGAA CATTAGCACC AACAGAGAGA TGGAAGCGGA GGGAGAAAGG CAATGACGAT CAAGTGGGCA ATGCACGCCT CCAAATGTGG CTTAAATGAA GTTGTGGAGT TGTGAAGTGC CAGGCCCTGA TCAGCCACCT CCAGATGTCC ACCTGGGCAG GAAGTGTTCT TATGCGCTGC ACACCCCAGG CTGTGATGAC TTCATGGGCC GCTTGGAGCA AAAGTGGATT TAGTTACTGT GTCTTGGCTG ACAAATCTTT TGTCCAAGTC GGAAGTCTTT CCCTTTGGAA GACGAGCTTC GACCTCATTG TGGGGTTTGG AGCAGCCCTG TCATTTTCTG TTTGCCAAGT ATCTTTAAAG TACGAATGCC AGATAACCTG GTCTGGTCAA TCCAGATCCA GTGAATGGCA CAACTATTCT TGTACTACAG CTCGGGCAAT GCTGCCCATT GCTACCCCCC ACCATCGCCA TGGATCAGTG GTGACCTACC TGTGGGTGAG CCCTCCATAT CCCGGCCCCT CAGTGCATTA ATTGGTATCT GACAACAGAA GCCTGGCTTT GTCATGAAAG GCCGGAGCTA CCAAGCTGCT GCGTACCCAA AGGGACAAGG GCCCGGCTAT GACCTCAGAG CCCTGCAGCC CCCACATGTG TGGCCGTGTG CTATTTCCAG TGAAGGATTT _CAATTAAAAG CCTTTGGAAT AGCAGTGTTC TCCTAATGGG AGACACACAG TTACACATGC GACCCCCACC CATCCGCTGC ACAAGTGACC TGGAATTCTG GGTCACTGTC AACCAATGCA TCTGACTTTC CTACGGGAGG CCATTCTCTA TGTCTGTAAA CGTAAATCAT GATCACAGAC ATCCAGGTTG

CACTCTGGGC TTTCAGTTTT CTTCTCCAAG GAGGATCCCT ATGCCCCAGA CCATTGGGAC TCATCTGCCT GTCGTAATCC GATCCCAAAT CATGCATCAT TTCCTTGTGG ATTTTCACTA TGTTTGAGCT TCTGGAGCGG AAAATGGAAT TTAGGTGTCA ACAAATGGGA TGCATGCTGA ACAGCTGTGA GAGACTGGAG AACTTCTTAA TTGTTTGTGA GAATGGAAAG CTCCAGTTAT AAGCAGTAAA GAGAGAGCAC CCCCTCGCTG TGAAAACCCA GTCCTGAGTA GTCCCAAAGA TGGTGCATGT GGCACCGACT GGAGCACGAA ATGGAGATTT GCTGCAATCC ACTGCACCAG TACCTAACAA GCTTATTTTC GACCCCGCCG CCAGGGTATG ACAACTTTTC GGGCTGCGTC AAGTGAAATC TAAATCTCCA GCAGCTCTGC CAGTGTGTGA GAAAACCTCT CAGACAGAGG CTCAAGGGAA AAGCCCCAGA CCATTGGGAC TCACATGTCT GTAAAACTCC GATCCAGAAT

GCGGAGCACA CTTCGAGATC AAGCCCGGAG GCTGGCGGTT ATGGCTTCCA ATATCTGAAC AAAAAACTCA TCCAGATCCT TAAATATTCT CTCAGGTGAT GCTACCCCCC TGGATCAGTG TGTGGGTGAG CCCCGCCCCT ATTGGTATCT GCCTGGCTTT GCCGGAGCTA GCGTACCCAA GCCCGGCTAC CCCTGCAGCC TGGCCGTGTG TGAAGGATTT CCTTTGGAAT TCCTAATGGG TTACACATGC CATCCGCTGC TGGAATTCTG AACCAATGCA CTACGGGAGG TGTCTGTAAA GATCACAGAC CATTGGTCAC GCCGCCAATT CATTAGCACC TGGAAGCGGA CAATGACGAT ATGCACGCCT CTTAAATGAA TGTGAAGTGC TCAGCCACCT ACCCGGGCAG TATGCGCTGC CTGTGATGAC GCTTGGAGCA TAGTTATTGT ACAAATCTTT GGAAGTCTTT GACGAGCTTC TGGGGTTTGG TCATTTTCTG ATCTTTAAAG AGATAACCTG TCCAGATCCA CAACTATTCT

ATGATTGGTC AAATCTGGTT CCTGTCGGGC GTGGTGCTGC TTTGCCAGGC TATGAATGCC GTCTGGACTG GTGAATGGCA TGTACTAAAG ACTGTCATTT ACCATCACCA GTGACCTACC CCCTCCATAT CAGTGCATTA GACAACAGAA GTCATGAAAG CCAAGCTGCT AGGGACAAGG GACCTCAGAG CCCACATGTG CTATTTCCAG CAATTAAAAG AGCAGTGTTC AGACACACAG GACCCCCACC ACAAGTGACC GGTCACTGTC TCTGACTTTC CCATTCTCTA CGTAAATCAT ATCCAGGTTG TCATCTGCTG TGTCAACGAA AACAGAGAGA GGGAGAAAGG CAAGTGGGCA CCAAATGTGG GTTGTGGAGT CAGGCCCTGA CCAGATGTCC GAAGTGTTCT ACACCCCAGG TTCATGGGCC AAAGTGGATT GTCTTGGCTG TGTCCAAGTC CCCTTTGGAA GACCTCATTG AGCAGCCCTG TTTGCCAAGT TACGAATGCC GTCTGGTCAA GTGAATGGCA TGTACTACAG


cDNA sequence

continued

GGCACCGACT CATTGGTCAC GGAGCACGAA GCCGCCAATT ATGGAGATTT CATTAGCACC GCTGCAATCT TGGAAGCAGA ACTGCACCAG CAATGACGAT TACCTAACAA ATGCACGCCT GCTTATTTTC CTTAAATGAA GACCCCGCCG TGTGAAGTGC CCAGGGTGTG ^CAGCCGCCT ACAACTTTTC ACCTGGGCAG GGGCTGCGTC TCTGCACTGC CAGTGAAATC CTGTGATGAC TTAATCTCCA GCTTGGGGCA GCAGTTCCGT TAGTCATTGT CTGTGTGTGA ACATATCTTT GAACTCCCTC TGGAGATATT CAGACAGAGG GATGACCTTC CTCATGGGAA TGGGGTTTGG GTCACTGTAA AACCCCAGAG TTGAGTTTCC AGTCGGGACA TGTTCTCTAT CTCCTGCCTA GAAAATCATG TGGACCTCCA CACAGTTTGG ATCAACAGTT CATCTACTAC TTGTCTCGTC GTGAGATCAT ATCTTGTGAG ATAGAACATC TTTTCACAAT GAGAACAGCT GTTTGAGCTT AAGTTGGTGT TTGGAGCAGC CAGAAGTTGA AAATGCAATT TCATCAGATT TAGATGTCAG AGACCAATGG CAGATGGGGG CAGAAATCCT GCATGGTGAG AAGTGTTCTA CAGCTGTGAG CGCCCCAGGG AGACTGGAGC TCCTGGGCCA ACTCCCTCAT AGGTGTCCTT TGTTTGCGAT TCTTGGCTGG AATGAAAGCC GTCCAAATCC TCCAGCTATC CCTATGGAAA AGAAATATCT ACCTCATTGG GGAGAGCTCC GCAGCCCTGC CCCTCGCTGT TCCAAAACGG GCATTACATT GCTACACTTG TGACCCCGGC AGGGAATCTG GAGCCAATTG TTATGAATGG AATCTCGAAG TGACTTTGAA GTGTGAAGAT CGGATGACAG ATGGGACCCT TAGTTGGCAC TTTATCTGGT TAATTCTAAA GCACAGAAAA ATTTACATTC TCAAGGAGGC ATAGCAGGGT CCTTCCTTGA TGGTGGGAAA GGAGCCAATT AAGTGACTTC ACAGAGACGC TAGCAAAGCT CCTGCCTCTT

TCATCTGCTG AATGTATCCT TGTCAACGAA TTCCTTGTGG AACAGAGAGA ATTTTCACTA GGGAGAAAGG TGTTTGAGCT CAAGTGGGCA TCTGGAGCGG CCAAATGTGG AAAATGGAAT GTTGTGGAGT TTAGGTGTCA CAGGCCCTGA ACAAATGGGA CCAGAAATCC TGCATGGTGA GAAGTGTTCT ACAGCTGTGA ACACCCCAGG GAGACTGGAG TTCTTGGGTC AACTCCCTCA AAGGTGTCCT TTGTCTGTGA GTCTTGGTTG GAATGAGAAG TGTCCAAATC CTCCAGCTAT CCCTATGGAA AAGAAATATC AACCTCATTG GGGAGAGCAC AGCAGCCCTG CCCCTCGCTG CAGTTTCCAT TTGCCAGTCC TCTTTGAATT ATGAATGCCG GAAAACTTGG TCTGGTCAAG CCAGAACCCT TCAATGGAAT AATTATTCTT GTAATGAAGG TCAGGCAATA ATGTCACATG CCACCTCCAA CCATATCCAA GGAACGGTGG TAACTTACCA GTGGGAGAAC GGTCAATATA CCTCCCCCTC GGTGTATTTC AGAGTACCAG GAAACAGGAG CCCGGGTTTG TCATGGTAGG CCCAAGCTGC CACACTGCTC CATACCCTAA GCCATCAGGA CCCAGCTATG ACCTCAGAGG CCTGAAGCCC CTAGATGTAC GGCCGTGTGC TACTTCCACT GAAGGGTTCC _GATTAAAAGG CTTTGGAATA GCAGTGTTCC CTTAATGGGA GACACACAGG TACGCATGCG ACACCCACCC ATCCGCTGCA CAAGTGACCC GAACTTTCTG TTCCTGCTGC GGAGGACACG TATCTCTATA TACCTGTTAG TGGGAAAGGG GATCATTATT GCAAAGAAGT GAGTTAGAAA TGAAAAAAGT GGGTATACTC TGGAAGGCAG CCTCTGGCCA AATGTACCTC ACGATCTTCT TTATTTTACT GGCAATAATG CACATGAAAA AGCAGCGTTC ATCCCCGAAC CAAAGTACTA TACAGCTGAA GATTTCAACA GAATCAGATC AGACATGTGC ACTTGAAGAT TGTGTGCGTC ACTGTGAAAC

CTCAGGCAAT GCTACCCCCA TGGATCAGTG TGTGGGTGAG CCCCGCCCCT ATTGGTATCT GCCTGGCTTT GCCAGAGTTA GCATACCCCA GCCTGGCTAT CCCTGAAGCC TGGCCGTGTG TGAAGGGTTT CCTTTGGAAT CCTTAATGGG TTACACATGT CATCCGCTGC TGAACTTTCT TACGATCCCA TCCTGGGTAT TGTTGAAGAC GGTGCATATA GTTTCGACTC GGATAAGAAG TGGAGACTTC GTGCCACACT TTGCACCAGC TACTAATAAA TTTCTTTTCC GTCCCACACT CAGGGTGTGT CAACTTTTCA GGCTGCGTCT AGTGAAATCC TAATCTCCAG CAGGTCTGCT AGTGTGTGAA AACTCCCTTT AGACAGAGGG TCAAGGGAAT CTGCCCACAT TCTTCCTGGG CTTCATTTTC AAATTGTAGC ATATCACTAT TCCCTGGAGC TCGTGCACAT CATCATTTTC CCCTAAAGAA TCTGCAAACA GAACATCTCG TGAGCTTCAT GCTGCCCCTT CCCCACCCTT

ACTGCCCATT ACCATCGCCA GTGACCTACC CCCTCCATAT CAGTGCATTA GACAACAGAA GTCATGAAAG CCAAGCTGCT AGCCATCAGG GACCTCAGAG CCGAGATGTG CTATTTCCAC CGCTTAAAGG AACAGTGTTC AGACACACAG GACCCCCACC ACAAGTGACC GTTCGTGCTG ATTAATGACT TTTGGGAAAA AACTGTAGAC AACACAGATA ATTGGTTCCC GCACCTATTT TACAGCAACA GGACCAGATG AAAGATGATC TGCACAGCTC CTCACTGAGA GTGCAGTGCC CAGCCGCCTC CCTGGGCAGG CTGCACTGCA TGTGATGACT CTTGGGGCAA AGTCATTGTG CAAATCTTTT GGAGATATTC ATGACCTTCA GGGGTTTGGA CCACCCAAGA ATGACAATCA TGTACAGACC TTCCCACTGT GGAGATTATG CAGTGCCAGG GATGCTCTCA CTCTCTTGGA GTGGCTATCC AATGAAGAAA AATACAATTT AAAGTCTTTG CCCTGGTACC CTGCCTCGTG


cDNA sequence CTAAACGCAC TGGATTACTT TCTTTTTTAA AAAGTTATGA TTTGATTCAT CCCCCTTAGT GAGTGAAATA AAAGGCATGA ATAAGATTTC AGTGCAGTGG CTGCCTCAGC CACGCCCGGC ATGGTCTGGA ACAGGCATGA ACTTTGTGCT ATTATAAAAG AACACAACTT

ACAGTATCTA AAAGGAATAA AATATTTGTA AAAATAAGTC TTTCTGCCTA TTGTTTCCTT TATGCTATAT AATGATCATG GATATCTTCT CGTAATCTCG CTCCTGAGTA TAATTTTTTT TCTCCTGACC GCCACCGCGC GTGTTCTATA TACTAGCTTA TTAAAAAATG

continued GTCAGGGGAA GGTGTTGCCT ATATGGAATG ACTTATAATT TCTTCTTTCA TTATTTTATA CAGTTTTTAC GGAAGAGTGG TTTTTTTTGA GCTCACTGCA GTTGGGACTA GTATTTTTAG TCGTGATCCA CTGGCCGCTT TAAAAAACAT CTTTTGTATG TATCAAAAAT

AAGACTGCAT GGAATTTCTG GGCTCAGTAA ATGCTACCTA CATATGTGTT GAGCAGAACC TTTCTCTAGG TTAAGACTAC GATGGAGTCT ACGTCCGCCT CCAGTAGATG TAGAGACGGG CCCGCCTCGG TCGATATTTT AATAAAAATT GATTCAGAAT AATAAACGTG

TTAGGAGATA GTTTGTAAGG GAAGAGCTTG CTGATAACCA TTTTTACATA CTAGTCTTTT GAGAAAAATT TGAAGAGAAA GGCTCTGTCT CCTGGGTTGA GGACTACAGG GTTTCACCAT CCTCCCAAAG CTAAACTTTA GAAATGAAAG ATACTAAATT TTCTGATATT

GAAAATAGTT TGGTCACTGT GAAAATGCAG CTCCTAATAT CGTACTTTTC AAACAGTTTA AATTTACTAG TATTTGGAAA CCCAGGCTGG CACCATTTTC CACCTGCCAA GTTAGCCAGG TGCTGCGATT ATTCAAAAGC AATAATTGTT AACTTTTTAA TTT


The first five nucleotides in each exon are underlined. There are two transcriptional start sites, T1 and A30; S1 nuclease analysis shows that A30 is the predominant site^^. The two possible methionine initiation codons (ATG), the termination codon (TGA) and the known polyadenylation site (AATAAA) are indicated. In this figure, nucleotides 116-7061 are a compilation from references 3 and 4, determined from cDNA clones. Nucleotides 1-115 and 7062-7493 were determined from genomic clones^^.
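Locating features such as the AATAAA polyadenylation signal or ATG initiation codons in a sequence printed in 10-nt blocks needs only an overlapping substring search. The following is a minimal Python sketch; `find_motif` is a hypothetical helper, and because the printed page layout may not preserve the true order of blocks, the fragment below serves only as test input.

```python
def find_motif(seq: str, motif: str) -> list[int]:
    """Return the 1-based start position of every match of `motif` in `seq`,
    including overlapping matches. Whitespace (such as the spaces separating
    the 10-nt blocks in the printed sequences) is ignored."""
    clean = "".join(seq.split()).upper()
    target = motif.upper()
    positions = []
    start = clean.find(target)
    while start != -1:
        positions.append(start + 1)            # 0-based index -> 1-based position
        start = clean.find(target, start + 1)  # advance by one so overlaps are found
    return positions


# Four consecutive blocks copied as printed from the CR1 sequence; positions are
# relative to this fragment, not to the full cDNA numbering.
fragment = "GTGCCACACT TTGCACCAGC TACTAATAAA TTTCTTTTCC"
print(find_motif(fragment, "AATAAA"))  # [25]
```

The search steps forward one base after each hit instead of jumping past the whole motif, so overlapping signals such as AATAAATAAA are reported at both positions.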

Genomic structure^^ The gene for the CR1*1 allotype of CR1 spans approximately 133 kb and comprises 39 exons, as illustrated below.

[Figure: exon map of the CR1*1 gene, exons 1-39 (exons 1, 20 and 39 labelled; LHR-S marked).]

The difference between the major allotypes is accounted for by deletion or duplication of a large segment of genomic DNA encoding one long homologous repeat (LHR) of peptide sequence. The gene encoding the CR1*2 allotype spans approximately 150-160 kb and comprises 47 exons, with the additional 8 exons inserted at approximately the location indicated. The gene encoding the CR1*3 allotype contains a deletion somewhere within the LHR-B to LHR-C regions; however, the location has not been determined precisely^*^.

Accession numbers (EMBL/GenBank) Human CR1^3,4,50,60; Chimpanzee CR1^61,64; Baboon CR1^65; Mouse CR1/CR2^62,63-68; Mouse Crry^69; Rat Crry^70

cDNA Y00816 L24920-L24922 L39791 M61132 M36470 M29281 M35684 J04153 M33527 U17123-U17128 X98171 M23529 M34164-M34173 L36532 D42115

Genomic L17390-L17430

Deficiency No humans totally lacking CR1 have been identified. The Knops, McCoy, Swain-Langley and York blood group antigens are located on CR1, and some individuals with antibodies to these antigens have very low levels of erythrocyte CR1^. Acquired low levels of erythrocyte CR1 are seen in patients with systemic lupus erythematosus^^-^^. These patients have abnormal clearance of immune complexes. Knockout mice lacking CR1/CR2 have been prepared, and these animals exhibit profound defects in T cell and B cell function^^'^^.

Polymorphic variants The structural allotypes below result from large insertions or deletions in the CR1 gene and may be detected by the Mr difference on SDS-PAGE^-^, by Northern blot analysis of mRNA or by Southern blot analysis of genomic DNA. The structural allotype may affect the affinity of CR1 for C3b dimers^^. The quantitative allotype, H or L, regulates the level of CR1 expression on erythrocytes: erythrocytes from individuals homozygous for the H allotype bear 4-10-fold more cell-surface CR1 than those from individuals homozygous for the L allotype^^.

Polymorphism frequencies^

Structural alleles: CR1*1, CR1*2, CR1*3, CR1*4
White population: 0.86-0.93, 0.07-0.26, 0-0.02, <0.01
African Americans: 0.82-0.84, 0.11-0.12, 0.04-0.06, <0.01
Mexican Americans: 0.89, 0.11, 0, <0.01
Chinese/Taiwanese: 0.96, 0.03, 0.01, 0

Quantitative alleles: H, L
White population: 0.75-0.78, 0.25-0.22
African Americans: 0.85, 0.15
Mexican Americans: 0.80, 0.20
Chinese/Taiwanese: 0.71, 0.28

Knops phenotypes: Kn(a+), McC(a+), McC(b+), Sl(a+), Yk(a+)
White population: 0.98, 0.98, 0.01, 0.99, 0.98
African Americans: 0.99, 0.94, 0.51, 0.65, 0.92
Mexican Americans: unknown, unknown, unknown, unknown, unknown
Chinese/Taiwanese: 0.99, 1.00, 0.02, 0.97, unknown
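As a rough guide to how common H/H and L/L homozygotes are, the quantitative-allele frequencies can be converted into expected genotype frequencies. This sketch assumes Hardy-Weinberg equilibrium, an assumption the source does not make, so observed genotype counts may differ. Where a row's H and L values do not sum exactly to 1 (Chinese/Taiwanese, 0.71 and 0.28), the sketch simply takes q = 1 - p; `hwe_genotypes` is a hypothetical helper name.

```python
def hwe_genotypes(p: float) -> tuple[float, float, float]:
    """Expected (H/H, H/L, L/L) genotype frequencies at a two-allele locus,
    assuming Hardy-Weinberg equilibrium and H-allele frequency p (q = 1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


# H-allele frequencies taken from the quantitative-allele rows (single-value rows only).
h_allele = {"African Americans": 0.85, "Mexican Americans": 0.80, "Chinese/Taiwanese": 0.71}
for population, p in h_allele.items():
    hh, hl, ll = hwe_genotypes(p)
    print(f"{population}: H/H {hh:.3f}, H/L {hl:.3f}, L/L {ll:.3f}")
```

For African Americans (p = 0.85) this gives about 72% H/H and 2% L/L, which indicates how rare the low-expressing homozygote would be under the equilibrium assumption.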

References
1 Fearon, D.T. (1979) Proc. Natl Acad. Sci. USA 76, 5867-5871.
2 Fearon, D.T. (1980) J. Exp. Med. 152, 20-30.
3 Klickstein, L.B. et al. (1987) J. Exp. Med. 165, 1095-1112.
4 Klickstein, L.B. et al. (1988) J. Exp. Med. 168, 1699-1717.
5 Holers, V.M. et al. (1986) Complement 3, 63-78.
6 Dykman, T.R. et al. (1983) Proc. Natl Acad. Sci. USA 80, 1698-1702.
7 Wong, W.W. et al. (1983) J. Clin. Invest. 72, 685-693.
8 Dykman, T.R. et al. (1984) J. Exp. Med. 159, 691-703.
9 Dykman, T.R. et al. (1985) J. Immunol. 134, 1787-1789.
10 Wong, W.W. et al. (1985) J. Immunol. Methods 82, 303-313.
11 Lublin, D.M. et al. (1986) J. Biol. Chem. 261, 5736-5744.
12 Atkinson, J.M. and Jones, E.A. (1984) J. Clin. Invest. 74, 1649-1657.
13 Sim, R.B. (1985) Biochem. J. 232, 883-889.
14 Lozier, J. et al. (1984) Proc. Natl Acad. Sci. USA 81, 3640-3644.
15 Weisman, H.F. et al. (1990) Science 249, 146-151.
16 Wong, W.W. and Farrell, S.A. (1991) 146, 656-662.
17 Klickstein, L.B. et al. (1997) Immunity 7, 345-355.
18 Kalli, K.R. et al. (1991) J. Immunol. 147, 590-594.
19 Nelson, R.A. (1953) Science 118, 733-737.
20 Benacerraf, B. et al. (1959) J. Immunol. 82, 131-137.
21 Arend, W.P. and Mannik, M. (1971) J. Immunol. 107, 63-75.
22 Cornacoff, J.B. et al. (1983) J. Clin. Invest. 71, 236-247.
23 Schifferli, J.A. et al. (1988) J. Immunol. 140, 899-904.
24 Kimberly, R.P. et al. (1989) J. Clin. Invest. 84, 962-970.
25 Fearon, D.T. et al. (1981) J. Exp. Med. 153, 1615-1628.
26 Newman, S.L. et al. (1980) J. Immunol. 125, 2236-2244.
27 Wright, S.D. et al. (1983) J. Exp. Med. 158, 1338-1343.
28 Klaus, G.G.B. et al. (1980) Immunol. Rev. 53, 3-28.
29 Croix, D.A. et al. (1996) J. Exp. Med. 183, 1857-1864.
30 Carroll, M.C. (1998) Annu. Rev. Immunol. 16, 545-568.
31 Fang, Y. et al. (1998) J. Immunol. 160, 5273-5279.
32 Molina, H. et al. (1996) Proc. Natl Acad. Sci. USA 93, 3357-3361.
33 Iida, K. and Nussenzweig, V. (1981) J. Exp. Med. 153, 1138-1150.
34 Medof, M. and Nussenzweig, V. (1984) J. Exp. Med. 159, 1669-1685.
35 Kinoshita, T. et al. (1986) J. Exp. Med. 164, 1377-1388.
36 Fischer, E. et al. (1986) J. Immunol. 136, 1373-1377.
37 Wilson, J.G. et al. (1983) J. Immunol. 131, 684-689.
38 Tedder, T.F. et al. (1983) J. Immunol. 130, 1668-1673.
39 Yoon, S.H. and Fearon, D.T. (1985) J. Immunol. 134, 3332-3338.
40 Pascual, M. et al. (1993) J. Immunol. 151, 1702-1711.
41 Nickells, M. et al. (1998) Clin. Exp. Immunol. 112, 27-33.
42 Miyakawa, Y. et al. (1981) Lancet 2, 493-497.
43 Wilson, J.G. et al. (1982) N. Engl. J. Med. 307, 981-986.
44 Iida, K. et al. (1982) J. Exp. Med. 155, 1427-1438.
45 Walport, M.J. et al. (1985) Clin. Exp. Immunol. 59, 547-554.
46 Fearon, D.T. and Collins, L.A. (1983) J. Immunol. 130, 370-375.
47 Berger, M. et al. (1984) J. Clin. Invest. 74, 1566-1571.
48 O'Shea, J.J. et al. (1985) J. Immunol. 134, 2580-2587.
49 Berger, M. et al. (1991) Proc. Natl Acad. Sci. USA 88, 3019-3023.
50 Hourcade, D. et al. (1988) J. Exp. Med. 168, 1255-1270.
51 Krych, M. et al. (1991) Proc. Natl Acad. Sci. USA 88, 4353-4357.
52 Kalli, K.R. et al. (1991) J. Exp. Med. 174, 1451-1460.
53 Makrides, S.C. et al. (1992) J. Biol. Chem. 267, 24754-24761.
54 Krych, M. et al. (1994) J. Biol. Chem. 269, 13273-13278.
55 Reilly, B.D. et al. (1994) J. Biol. Chem. 269, 7696-7701.
56 Rey-Campos, J. et al. (1988) J. Exp. Med. 167, 664-669.
57 Carroll, M.C. et al. (1988) J. Exp. Med. 167, 1271-1280.
58 Kingsmore, S.F. et al. (1989) J. Exp. Med. 169, 1479-1484.
59 Kurtz, C.B. et al. (1989) J. Immunol. 143, 2058-2067.
60 Vik, D.P. and Wong, W.W. (1993) J. Immunol. 151, 6214-6224.
61 Nickells, M.W. et al. (1995) J. Immunol. 154, 2829-2837.
62 Molina, H. et al. (1990) J. Immunol. 145, 2974-2983.
63 Kinoshita, T. et al. (1988) J. Immunol. 140, 3066-3072.
64 Birmingham, D.J. et al. (1994) J. Immunol. 153, 691-700.
65 Clemenza, L. et al. (1997) Mol. Immunol. 34, 297-304.
66 Molina, H. et al. (1990) J. Immunol. 145, 2974-2983.
67 Kurtz, C.B. et al. (1990) J. Immunol. 144, 3581-3591.
68 Fingeroth, J.D. (1990) J. Immunol. 144, 3458-3467.
69 Paul, M.S. et al. (1989) J. Immunol. 142, 582-589.
70 Quigg, R.J. et al. (1995) Immunogenet. 42, 362-367.
71 Moulds, J.M. et al. (1991) J. Exp. Med. 173, 1159-1163.
72 Ahearn, J.M. et al. (1996) Immunity 4, 251-262.
73 Wilson, J.G. et al. (1986) J. Exp. Med. 164, 50-59.
74 Van Dyne, S. et al. (1987) Clin. Exp. Immunol. 68, 570-579.
75 Wong, W.W. et al. (1986) J. Exp. Med. 164, 1531-1546.
76 ATCC Product #57330 (E. coli) or #57331 (plasmid DNA).
77 Wong, W.W. et al. (1985) Proc. Natl Acad. Sci. USA 82, 7711-7715.
78 Moldenhauer, F. (1987) Arthritis Rheum. 30, 961-966.
79 Cohen, J.H.M. et al. (1989) Arthritis Rheum. 32, 393-398.
80 Tebib, J.G. et al. (1989) Arthritis Rheum. 32, 1465-1469.
81 Moulds, J.M. et al. (1996) Clin. Exp. Immunol. 105, 302-305.
82 Moulds, J.M. et al. (1998) Exp. Clin. Immunogenet. 15, 291-294.

CR2 Joel M. Guthridge and V. Michael Holers, School of Medicine, University of Colorado Health Sciences Center, Denver, CO, USA

Other names Complement receptor type 2, CD21, complement component C3d/Epstein-Barr virus receptor 2, C3dR.

Physicochemical properties CR2 is synthesized as a precursor molecule of 1092 amino acids (16 CCP form)^ or 1033 amino acids (15 CCP form)^-^, each containing a 20 amino acid signal peptide, a 22-24 amino acid transmembrane region and a 34 amino acid intracellular domain.
pI (predicted): 1,16 (15 CCP form); 7.64 (16 CCP form)
Amino acids: 21-609, 718-1092 (15 CCP form); 21-1092 (16 CCP form)
Mr (K), predicted: 111.0 (15 CCP form); 117.3 (16 CCP form)
Mr (K), observed: 146.0 (15 CCP form); 148.0 (16 CCP form)
N-linked glycosylation sites: 11 in the 15 CCP form (121, 127, 294, 372, 492, 623, 741, 859, 882, 920, 970); 13 in the 16 CCP form (as the 15 CCP form, plus 699 and 709)

Structure CR2 is a glycosylated^ type I transmembrane protein whose extracellular region consists of a series of 15 or 16 CCP domains. The CCPs of CR2 probably form an extended, rod-like structure protruding from the cell membrane^.

Function CR2 is the receptor for complement components C3d and iC3b^'^ as well as for the Epstein-Barr virus (EBV) glycoprotein gp350/220*'^. CR2 also binds to the low-affinity IgE receptor, CD23, in both a glycosylation-dependent and a glycosylation-independent fashion^^'^. It has also been reported to be a receptor for interferon-α^^, although this has not been confirmed. CR2 plays a key role in the enhancement of humoral immune responses^^'^^ by linking the binding of specific ligands to signal transduction events mediated by other members of the CD21/CD19 complex on B lymphocytes^^'^^. Expression of CR2 on follicular dendritic cells is also required for robust humoral immune responses, probably because it mediates antigen trapping or cell-cell interactions in the germinal centres of secondary lymphoid organs^^. Ligation of the receptor on basophils by CD23 leads to histamine release^*.

Tissue distribution Cell-surface expression on mature B lymphocytes^^, follicular dendritic cells^^, thymocytes'^, a subpopulation of CD4+ and CD8+ T lymphocytes'^''^, basophils^*, keratinocytes'^, astrocytes'^ and epithelial cells'^. A soluble form has been described'^.

Regulation of expression CR2 expression follows a cell-type-specific and developmental-stage-specific pattern. On B cells, the receptor is expressed on immature, mature circulating, follicular and marginal zone B cells^^^'^. Marginal zone B cells express the highest levels, memory B cells express CR2, and plasma cells lack expression. Polyclonal B cell activators decrease expression'^'^^; infection with EBV increases B cell expression through the transcriptional activating effects of EBNA-2^^; and infection of T cells with HTLV-1 leads to transformed cells expressing high receptor levels^'.

Protein sequence MGAAGLLGVF LALVAPGVLG ISCGSPPPIL NGRISYYSTP IAVGTVIRYS

50

CSGTFRLIGE KSLLCITKDK VDGTWDKPAP KCEYFNKYSS CPEPIVPGGY

100

KIRGSTPYRH GDSVTFACKT NFSMNGNKSV WCQANNMWGP TRLPTCVSVF

150

PLECPALPMI HNGHHTSENV GSIAPGLSVT YSCESGYLLV GEKIINCLSS

200

GKWSAVPPTC EEARCKSLGR FPNGKVKEPP ILRVGVTANF FCDEGYRLQG

250

PPSSRCVIAG QGVAWTKMPV CEEIFCPSPP PILNGRHIGN SLANVSYGSI

300

VTYTCDPDPE EGVNFILIGE STLRCTVDSQ KTGTWSGPAP RCELSTSAVQ

350

CPHPQILRGR MVSGQKDRYT YNDTVIFACM FGFTLKGSKQ IRCNAQGTWE

400

PSAPVCEKEC QAPPNILNGQ KEDRHMVRFD PGTSIKYSCN PGYVLVGEES

450

IQCTSEGVWT PPVPQCKVAA CEATGRQLLT KPQHQFVRPD VNSSCGEGYK

500

LSGSVYQECQ GTIPWFMEIR LCKEITCPPP PVIYNGAHTG SSLEDFPYGT

550

TVTYTCNPGP ERGVEFSLIG ESTIRCTSND QERGTWSGPA PLCKLSLLAV

600

QCSHVHIANG YKISGKEAPY FYNDTVTFKC YSGFTLKGSS QIRCKADNTW

650

DPEIPVCEKG CQPPPGLHHG HHTGGNTVFF VSGMTVDYTC DPGYLLVGNK

700

SIHCMPSGNW SPSAPRCEET CQHVRQSLQE LPAGSRVELV NTSCQDGYQL

750

TGHAYQMCQD AENGIWFKKI PLCKVIHCHP PPVIVNGKHT GMMAENFLYG

800

NEVSYECDQG FYLLGEKKLQ CRSDSKGHGS WSGPSPQCLR SPPVTRCPNP

850

EVKHGYKLNK THSAYSHNDI VYVDCNPGFI MNGSRVIRCH TDNTWVPGVP

900

TCMKKAFIGC PPPPKTPNGN HTGGNIARFS PGMSILYSCD QGYLLVGEAL

950

LLCTHEGTWS QPAPHCKEVN CSSPADMDGI QKGLEPRKMY QYGAVVTLEC

1000

EDGYMLEGSP QSQCQSDHQW NPPLAVCRSR SLAPVLCGIA AGLILLTFLI

1050

VITLYVISKH RERNYYTDTS QKEAFHLEAR EVYSVDPYNP AS

The leader sequence is underlined and N-linked glycosylation sites are indicated (N).

Protein modules
1-20 Leader peptide, exon 1
23-82 CCP1, exon 2
91-146 CCP2, exon 2
154-210 CCP3, exon 3
215-271 CCP4, exons 4/5
276-342 CCP5, exon 6
351-406 CCP6, exon 6
410-466 CCP7, exon 7
471-522 CCP8, exons 8/9
527-593 CCP9, exon 10
602-657 CCP10, exon 10
661-717 CCP11*, exon 11
721-773 CCP12, exons 12/13
778-838 CCP13, exon 14
847-902 CCP14, exon 14
910-966 CCP15, exon 15
971-1027 CCP16, exon 16
1031-1059 Transmembrane domain, exons 17/18
1064-1092 Intracellular domain, exon 19
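The residue ranges in the protein-module list can be turned into domain lengths with inclusive-range arithmetic (end - start + 1), which makes the typical size of a CR2 CCP easy to see. The sketch below copies the ranges as printed; the names `CR2_MODULES` and `module_length` are illustrative only.

```python
# Inclusive residue ranges copied from the CR2 protein-module list.
CR2_MODULES = [
    ("Leader peptide", 1, 20),
    ("CCP1", 23, 82), ("CCP2", 91, 146), ("CCP3", 154, 210), ("CCP4", 215, 271),
    ("CCP5", 276, 342), ("CCP6", 351, 406), ("CCP7", 410, 466), ("CCP8", 471, 522),
    ("CCP9", 527, 593), ("CCP10", 602, 657), ("CCP11", 661, 717), ("CCP12", 721, 773),
    ("CCP13", 778, 838), ("CCP14", 847, 902), ("CCP15", 910, 966), ("CCP16", 971, 1027),
    ("Transmembrane domain", 1031, 1059),
    ("Intracellular domain", 1064, 1092),
]


def module_length(start: int, end: int) -> int:
    """Number of residues in the inclusive range start..end."""
    return end - start + 1


lengths = {name: module_length(start, end) for name, start, end in CR2_MODULES}
ccp = [n for name, n in lengths.items() if name.startswith("CCP")]
print(f"{len(ccp)} CCPs, {min(ccp)}-{max(ccp)} residues each")
```

With these ranges the sixteen CCPs span 52-67 residues, and the short gaps between consecutive ranges are the linker residues not assigned to any module.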

*CCP11 is unique to the 16 CCP isoform and is absent from the 15 CCP isoform. C3d and EBV gp350/220 bind to CCPs 1 and 2^33,34. CD23 binds to a glycosylation-dependent epitope in CCPs 5-8, as well as to a glycosylation-independent epitope in CCPs 1 and 2^^.

Chromosomal location Human: chromosome 1, band q32. Telomere ... MCP ... CR1 ... CR2 ... DAF ... Centromere^^. Mouse: chromosome 1, 106.6 cM. Telomere ... Cr2 ... Cfh ... Bcl2 ... Centromere^^

cDNA sequence GCCCTCCCAG GGCCGTGGCA GTCCTCGGGA TCTACCCCCA ATTGGAGAAA CCTGCTCCTA GGAGGATACA TGTAAAACCA TGGGGGCCGA CCTATGATCC TCTGTGACTT TTGTCTTCGG CTAGGACGAT GCAAACTTTT ATTGCTGGAC TCACCTCCCC GGAAGCATAG ATTGGAGAGA

AGCTGCCGGA TGGGCGCCGC TTTCTTGTGG TTGCTGTTGG AAAGTCTATT AATGTGAATA AAATTAGAGG ACTTCTCCAT CACGACTACC AC7y\TGGACA ACAGCTGTGA GAAAATGGAG TTCCCAATGG TCTGTGATGA AGGGAGTTGC CTATTCTCAA TCACTTACAC GCACTCTCCG

CGCTCGCGGG GGGCCTGCTC CTCTCCTCCG TACCGTGATA ATGCATAACT TTTCAATAAA CTCTACACCC GAACGGAAAC AACCTGTGTA TCACACAAGT ATCTGGTTAC TGCTGTCCCC GAAGGTAAAG AGGGTATCGA TTGGACCAAA TGGAAGACAT TTGTGACCCG TTGTACAGTT

TCTCGGAACG GGGGTTTTCT CCTATCCTAA AGGTACAGTT AAAGACAAAG TATTCTTCTT TACAGACATG AAGTCTGTTT AGTGTTTTCC GAGAATGTTG TTGCTTGTTG CCCACATGTG GAGCCTCCAA CTGCAAGGCC ATGCCAGTAT ATAGGCAACT GACCCAGAGG GATAGTCAGA

CATCCCGCCG TGGCTCTCGT ATGGCCGGAT GTTCAGGTAC TGGATGGAAC GCCCTGAGCC GTGATTCTGT GGTGTCAAGC CTCTCGAGTG GCTCCATTGC GAGAAAAGAT AAGAGGCACG TTCTCCGGGT CACCTTCTAG GTGAAGAAAT CACTAGCAAA AAGGAGTGAA AGACTGGGAC

60 CGGGGGCTTC CGCACCGGGG 120 TAGTTATTAT 180 CTTCCGCCTC 240 CTGGGATAAA 300 CATAGTACCA 360 GACATTTGCC 420 AAATAATATG 480 TCCAGCACTT 540 TCCAGGATTG 600 CATTAACTGT 660 CTGTAAATCT 720 TGGTGTAACT 780 TCGGTGTGTA 840 TTTTTGCCCA 900 TGTCTCATAT 960 CTTCATCCTT 1020 CTGGAGTGGC 1080


CCTGCCCCAC AGAGGCCGAA GCTTGCATGT ACATGGGAGC AATGGGCAAA AGCTGTAACC GTGTGGACAC CTCTTGACAA GGGTACAAGT GAGATTCGTC CACACCGGGA CCTGGGCCAG AGCAATGATC CTTGCTGTCC GCCCCATATT GGCAGTAGTC GAAAAAGGCT GTCTTCTTTG GGAAACAAAT GAAGAAACAT GAGCTAGTTA TGTCAAGATG TGTCACCCTC CTATATGGAA AAATTGCAGT TGCTTACGAT CTCAATAAAA GGCTTCATCA GGTGTGCCAA AACGGGAACC AGCTGTGACC ACCTGGAGCC GATGGAATCC CTGGAGTGTG CACCAATGGA GGTATTGCTG TCAAAACACA GAAGCACGAG CTGGTGTGTG ATATCAGCAA TTCCTATATG CACTGCCATA GCCATGGCTA AAAAAGAGCT ACTATCTGCT GTTATGATGG CTGGTGGTGT ATATACTTTG ATGGGAATCA TAGCAGTTTT GTCATTCAAT

TTCTACTTCT GCAGAAAGAT CTTGAAGGGC AGTCTGTGAA ACACATGGTC GCTGGTGGGA CCAATGCAAA CCAATTTGTT TGTTTATCAG AATCACCTGC AGATTTTCCA GGAATTCAGC CACCTGGAGT TGTCCATATT CACTGTGACA CAAAGCTGAT TCCTGGGCTC GACTGTAGAC TATGCCTTCA GAGACAGAGT CCAAGATGGG AATTTGGTTC TGTCAATGGG TTATGAATGT TTCTAAAGGA GACTCGCTGC ATATTCCCAC TCGCGTGATT AAAAGCCTTC AAACATAGCT GCTGGTGGGA TCATTGTAAA GGAACCAAGG TATGCTGGAA GGCGGTTTGC ACTTCTTACC TTATTATACA TGTTGATCCA TGGAATTCAG ATGGCCTCAA CAAGAAGAAC GGACTTTCTG CATGGCTCTA CTAGACCCAT ATGTGTTTTT ATTAGGAAGT ATTTTTACCC ATCACTCAGT CTAGAGATTT CTAGTTGCTT TGTAAAGAAA

GCTGTGAACT TGGTATCTGG TTGGCTTCAC CATCTGCACC AGGAAGATAG CTGGCTATGT CCCCTGTACC AACCCCAGCA TAAGTGGGAG TTTGTAAAGA GTTCCTTAGA AAAGAGGAGT AAGAAAGAGG AGTGCTCACA TCTACAATGA AGATTCGTTG GCCAGCCACC TCTCTGGGAT CCATTCACTG GCCAGCATGT ATACGTCCTG CTGAAAATGG CACCAGTGAT ATGAAGTCTC GCAGAAGTGA CTCCTCCTGT CACATTCTGC TGAATGGTAG CTTGTATGAA ATACTGGTGG AAGGCTACCT AACCTGCCCC AGAAAGGGCT AAGATGGGTA ACCCTCCCCT CAGGTTTGAT GAGAACGCAA AAGTATATTC CCTCATTGCT GTCTCTTTAT CACTTATTCT TACTCTTCAA TAAACAATTA GTCCTGGTAT TTTGGTTATA CCTTACCTTT TAATCATTTC CCTTTCCATA AGATTTAATC GTGATAAGTT AAATTGTAAT

GCGGTTCAGT CGATATACCT AGCAAGCAAA AAGGAATGCC CGCTTTGACC GAAGAATCCA GTGGCAGCGT AGACCAGATG GAGTGTCAAG CCACCACCCC TATGGAACCA CTCATTGGAG GGCCCTGCTC GCAAATGGAT TTCAAGTGTT AACACCTGGG CACCATGGTC TACACTTGTG GGAAATTGGA CTTCAAGAAC TACCAGTTGA AAAAAGATTC AAGCACACAG GACCAAGGAT CATGGATCTT CCTAATCCAG AATGACATAG AGGTGTCATA ATAGGGTGTC CGATTTTCTC GAGGCACTCC GAGGTAAACT AAAATGTATC GGCAGTCCCC AGATCCCGTT TTCTTGATTG GATACAAGCC TACAACCCAG CGGAATATTG GATCAATGAA ATCTTTATGG AAGCCTCACT AAAAGTTTTG CTTCTTTTTG AATTATCTAA ATGGTTTTAT TTACTGTTTG GATTGCAATT GGTGTACAAT GTAAAATTTC A

GTCCACATCC ATAACGACAC TCCGATGCAA AGGCCCCTCC CTGGAACATC TACAGTGTAC GTGAAGCTAC TCAACTCTTC GCACAATTCC CTGTTATCTA CGGTCACTTA AGAGCACCAT CCCTATGTAA ACAAGATATC ATAGTGGATT ATCCTGAAAT ATCATACAGG ACCCTGGCTA GTCCTTCTGC TTCCAGCTGG CTGGACATGC CACTTTGTAA GGATGATGGC TCTATCTCCT GGAGCGGGCC AAGTCAAACA TGTATGTTGA CTGATAACAC CACCTCCGCC CTGGAATGTC TTCTTTGCAC GTAGCTCACC AGTATGGAGC AGAGCCAGTG CACTTGCTCC TCATTACCTT AGAAAGAAGC CCAGCTGATC ATTAGAAAGA ATGATGTCAT TAAAGATGGG TATGAGATGC CCCTTTTTAA AAATCAGCAT AGTATGAAGC TTTGATAGTA AGTTTCTCTC TGCACAAGTT TCAGGCTTTG ACTTAATAAT

CCAGATCCTA TGTGATATTT TGCCCAAGGC TAACATCCTC TATAAAATAT CTCTGAGGGG AGGAAGGCAA TTGTGGTGAA TTGGTTTATG CAATGGGGCA CACATGTAAC CCGTTGTACA ACTTTCCCTC TGGCAAGGAA TACTTTGAAG ACCAGTTTGT TGGAAATACG TTTGCTTGTG CCCACGGTGT TTCACGTGTG TTATCAGATG AGTTATTCAC AGAAAACTTT GGGAGAGAAA TTCCCCACAG TGGGTACAAG CTGCAATCCT ATGGGTGCCA TAAGACCCCT AATCCTGTAC ACATGAGGGA AGCAGATATG TGTTGTAACT CCAATCGGAT TGTCCTTTGT ATACGTGATA TTTTCATTTA AGAAGACAAA AACTGCTCTA AAGCGATCAC AGCCCAGTTT CTGAAGCCAG GGAAGGCACT ACTCAATGTT ATTTTCTGGG GCTTCCTCCT ACATTACTGT TTTTTAAATT GATGTTTCTT GTGTACATTA


The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TGA) and the probable polyadenylation signal (AATAAA) are indicated. Note: this sequence corresponds to GenBank accession numbers M26004 and J04463 for CCP1-10 and CCP12-16. The nucleotide sequence of CCP11 corresponds to the cDNA sequence in GenBank accession number J03565.

Genomic structure The gene spans approximately 30 kb and comprises 19 exons, as illustrated below^. The first intron is approximately 12-13 kb. Alternative splicing of exon 11 generates the 15 and 16 CCP isoforms of CR2.

[Figure: exon-intron map of the CR2 gene; scale bar, 2 kb.]

Accession numbers Human: M26004, J04463, J03565. Mouse: M35684, J04153, M33527.

Deficiency There is no well-defined genetic deficiency of complement receptor type 2. Patients with the autoimmune disease systemic lupus erythematosus express approximately 50-60% of normal CR2 levels on circulating B cells^^. Patients with HIV-1 infection show decreased B cell CR2 levels^^.

Polymorphic variants Several sequence polymorphisms of human CR2 have been identified, although their frequencies and population genetics have not been studied. Three forms were identified, as outlined below. (Sequence numbering and CCP designations have been modified from ref. 39 to correspond to the cDNA sequence and notation used here.) Each entry gives the region, the cDNA nucleotide position(s) and the codon with its encoded amino acid.

Form I: CCP7, 1438-1440, GGG missing; CCP10, 1987, AGT (S); CCP10, 2005, GCT (A); CCP11, 2047-2224, null; CCP13-14, 2393-2785, same as Form II; cytoplasmic, GCA (A).

Form II: CCP7, 1438-1440, unknown; CCP10, 1987, AAT (N); CCP10, 2005, GCT (A); CCP11, 2047-2224, +; CCP11, 2180, CAT (H); CCP13-14, 2393-2785, same as Form I; cytoplasmic, GAA (E).

Form III: CCP7, 1438-1440, GGG (G); CCP10, 1987, AGT (S); CCP10, 2005, CGT (R); CCP11, 2047-2224, +; CCP11, 2180, CGT (R); CCP13-14, 2393-2785, several differences; cytoplasmic, GAA (E).
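The codons in the CR2 variant table can be cross-checked against their one-letter amino acids with the standard genetic code. This is a minimal sketch; `CODON_TABLE` and `translate_codon` are hypothetical helper names, and the table is built from the conventional TCAG ordering of the standard code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid encoded by a DNA codon."""
    return CODON_TABLE[codon.upper()]


# Codon/amino acid pairs as printed in the variant table.
printed = {"GGG": "G", "AGT": "S", "GCT": "A", "AAT": "N",
           "CGT": "R", "CAT": "H", "GAA": "E", "GCA": "A"}
for codon, expected in printed.items():
    print(codon, translate_codon(codon), "ok" if translate_codon(codon) == expected else "MISMATCH")
```

Every printed pair agrees with the standard code. The cytoplasmic difference between Form I (GCA, Ala) and Forms II/III (GAA, Glu) is a single C/A change at the second codon position.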

In a small population study, intronic TaqI and HaeIII RFLPs were identified and mapped^. Approximate allele frequencies in white populations are:

TaqI: 2.55 kb (0.7); 2.10 kb (0.3)

HaeIII: 1.45 kb (0.93); 1.55 kb (0.05); 1.75 kb (0.02)

References
1 Moore, M.D. et al. (1987) Proc. Natl Acad. Sci. USA 84, 9194-9198.
2 Fujisaku, A. et al. (1989) J. Biol. Chem. 264, 2118-2125.
3 Weis, J.J. et al. (1988) J. Exp. Med. 167, 1047-1066.
4 Weis, J.J. and Fearon, D.T. (1985) J. Biol. Chem. 260, 13824-13830.
5 Moore, M.D. et al. (1989) J. Biol. Chem. 264, 20576-20582.
6 Iida, K. et al. (1983) J. Exp. Med. 158, 1021-1033.
7 Weis, J.J. et al. (1984) Proc. Natl Acad. Sci. USA 81, 881-885.
8 Fingeroth, J.D. et al. (1984) Proc. Natl Acad. Sci. USA 81, 4510-4514.
9 Nemerow, G.R. et al. (1985) J. Virol. 55, 347-351.
10 Aubry, J.P. et al. (1992) Nature 358, 505-507.
11 Aubry, J.P. et al. (1994) J. Immunol. 152, 5806-5813.
12 Delcayre, A.X. et al. (1991) EMBO J. 10, 919-926.
13 Molina, H. et al. (1996) Proc. Natl Acad. Sci. USA 93, 3357-3361.
14 Ahearn, J.M. et al. (1996) Immunity 4, 251-262.
15 Fearon, D.T. and Carter, R.H. (1995) Annu. Rev. Immunol. 13, 127-149.
16 Tedder, T.F. et al. (1994) Immunol. Today 15, 437-442.
17 Fang, Y. et al. (1998) J. Immunol. 160, 5273-5279.
18 Bacon, K. et al. (1993) Eur. J. Immunol. 23, 2721-2724.
19 Tedder, T.F. et al. (1984) J. Immunol. 133, 678-683.
20 Reynes, M. et al. (1985) J. Immunol. 135, 2687-2694.
21 Watry, D. et al. (1991) J. Exp. Med. 173, 971-980.
22 Fischer, E. et al. (1991) J. Immunol. 146, 865-869.
23 Levy, E. et al. (1992) Clin. Exp. Immunol. 90, 235-244.
24 Hunyadi, J. et al. (1991) Dermatologica 183, 184-186.
25 Gasque, P. et al. (1996) J. Immunol. 156, 2247-2255.
26 Levine, J. et al. (1990) Reg. Immunol. 3, 164-170.
27 Huemer, H.P. et al. (1995) Clin. Exp. Immunol. 93, 195-199.
28 Timens, W. et al. (1989) Eur. J. Immunol. 19, 2163-2166.
29 Boyd, A.W. et al. (1985) J. Immunol. 134, 1516-1523.
30 Stashenko, P. et al. (1981) Proc. Natl Acad. Sci. USA 78, 3848-3852.
31 Cordier-Bussat, M. et al. (1993) Int. J. Cancer 53, 153-160.
32 McNearney, T.A. et al. (1993) Eur. J. Immunol. 23, 1266-1270.
33 Carel, J.-C. et al. (1990) J. Biol. Chem. 265, 12293-12299.
34 Lowell, C.A. et al. (1989) J. Exp. Med. 170, 1931-1946.
35 Hourcade, D. et al. (1992) Genomics 12, 289-300.
36 Kingsmore, S.F. et al. (1989) J. Exp. Med. 169, 1479-1484.
37 Wilson, J.G. et al. (1986) Arthritis Rheum. 29, 739-747.
38 Scott, M.E. et al. (1993) AIDS 7, 37-41.
39 Toothaker, L.E. et al. (1989) J. Immunol. 142, 3668-3675.

Decay-accelerating factor

CD55, DAF

L. Kuttner-Kondo, W.G. Brodbeck, and M.E. Medof, Institute of Pathology, Case Western Reserve University, Cleveland, OH, USA

Physicochemical properties Decay-accelerating factor is synthesized as a single-chain prepropeptide of 381 amino acids, including a 34 amino acid leader sequence and a C-terminal signal sequence of 28 amino acids, which is cleaved and replaced with a glycosylphosphatidylinositol (GPI) anchor^'^.
Mr (K)^'^^: 70 (non-reduced), 75 (reduced)
N-linked glycosylation sites: 1 (95)^'^'^
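Subtracting the two cleaved segments from the prepropeptide gives the length of the polypeptide that remains attached to the GPI anchor. This worked check assumes the leader and the C-terminal signal are the only segments removed during processing, an assumption not spelled out in the text:

```latex
\underbrace{381}_{\text{prepropeptide}} - \underbrace{34}_{\text{leader}} - \underbrace{28}_{\text{C-terminal signal}} = 319 \ \text{residues (positions 35--353)}
```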

Structure DAF is a glycoprotein composed of four CCP domains, a heavily glycosylated STP (membrane proximal) domain rich in serines, threonines and prolines, and a GPI anchor^'^'^.

Function DAF functions intrinsically in cell membranes to protect host cells from autologous complement attack^. It accelerates the decay of the classical and alternative C3 and C5 convertases^'"^'^'^. The purified protein, when added to cells, incorporates into their membranes and is functional^.

Tissue distribution DAF is present on all blood elements^'^^ and on most other cell types. It is expressed at high levels on cells that line extravascular compartments^^. Soluble variants of DAF are found in body fluids^^'^^.

Regulation of expression^^-^^ Expression varies from 2500 molecules/cell on erythrocytes to >100 000 molecules/cell on endothelial and epithelial cells. DAF expression is induced by phorbol ester (PMA), and the promoter contains a cAMP response element.


Protein sequence^^ MTVARPSVPA ALPLLGELPR LLLLVLLCLP AVWGDCGLPP DVPNAQPALE

50

GRTSFPEDTV ITYKCEESFV KIPGEKDSVI CLKGSQWSDI EEFCNRSCEV

100

PTRLNSASLK QPYITQNYFP VGTVVEYECR PGYRREPSLS PKLTCLQNLK

150

WSTAVEFCKK KSCPNPGEIR NGQIDVPGGI LFGATISFSC NTGYKLFGST

200

SSFCLISGSS VQWSDPLPEC REIYCPAPPQ IDNGIIQGER DHYGYRQSVT

250

YACNKGFTMI GEHSIYCTVN NDEGEWSGPP PECRGKSLTS KVPPTVQKPT

300

TVNVPTTEVS PTSQKTTTKT TTPNAQATRS TPVSRTTKHF HETTPNKGSG

350

TTSGTTRLLS GHTCFTLTGL LGTLVTMGLL T

The leader sequence is underlined, N-linked glycosylation site is indicated (N), and the cleavage site for GPI anchor attachment is double underlined. Amino acid differences between the two publications: 80 I/T, 85 S/M.


Protein modules
1-34 Leader sequence^'^, exon 1
35-95 CCP, exon 2
97-159 CCP, exon 3
162-221 CCP, exons 4/5
224-284 CCP, exon 6
287-356 STP, exons 7-9

Chromosomal location Human^^: 1q32. Mouse^*: chromosome 1.

cDNA sequence^ 2A3 ACTGCAACTC CCTTGTTCTA CCTCCTCGGG GGGTGACTGT AAGTTTTCCC TGGCGAGAAG CTGCAATCGT TATCACTCAG CAGAAGAGAA AGCAGTCGAA GATTGATGTA GTACAAATTA GAGTGACCCG TGGAATAATT TAATAAAGGA AGGAGAGTGG ACCAACAGTT TCAGAAAACC TTCCAGGACA AGGTACTACC GCTAGTAACC GTATACAGAC TGTGCTCTTC CAAGGAGAAA AGAACAACTT TTGTTCGTAT GATCTGTAAT TCAAAAGCA^ ACCACATTAT AATATTTTAA TATAGAATGA AAAGGTGTCT TAAGAAAAGA ATTCTTTTGT AAAACAAGAA AATGATCCCA

GCTCCGGCCG ACCCGGCGCG GAGCTGCCCC GGCCTTCCCC GAGGATACTG GACTCAGTGA AGCTGCGAGG AATTATTTTC CCTTCTCTAT TTTTGTAAAA CCAGGTGGCA TTTGGCTCGA TTGCCAGAGT CAAGGGGAAC TTCACCATGA AGTGGCCCAC CAGAAACCTA ACCACAAAAA ACCAAGCATT CGTCTTCTAT ATGGGCTTGC TGTTCCTAGT ATTTAGGATG AAAGGCAGTC GCAGAATTGA TTAGAATGGG GTTATTTCCA ATAAAAACCC AAAGTAATCT AGGTAAAACA AAGACTGAAT TCTTTGACTT TTATATATTA AATATTTATT AAGTTGAAGA TTTTTTGGT

CTGGGCGTAG CCATGACCGT GGCTGCTGCT CAGATGTACC TAATAACGTA TCTGCCTTAA TGCCAACAAG CAGTCGGTAC CACCAAAACT AGAAATCATG TATTATTTGG CTTCTAGTTT GCAGAGAAAT GTGACCATTA TTGGAGAGCA CACCTGAATG CCACAGTAAA CCACCACACC TTCATGAAAC CTGGGCACAC TGACTTAGCC TTCTTAGACT CTTTCATTGT CTGGAATCAC GAGTGATTCC ATCACGAGGA CTTATAAAGG AATTCAGTCT TTGGCTGTAA TGCTGGTGAA CTTCCTTTGT AATGTCTTTA TTTCTGAATC TATATTTATT AGATATGTGA

CTGCGACTCG CGCGCGGCCG GCTGGTGCTG TAATGCCCAG CAAATGTGAA GGGCAGTCAA GCTAAATTCT TGTTGTGGAA AACTTGCCTT CCCTAATCCG TGCAACCATC TTGTCTTATT TTATTGTCCA TGGATATAGA CTCTATTTAT CAGAGGAAAA TGTTCCAACT AAATGCTCAA AACCCCAAAT GTGTTTCACG AAAGAAGAGT TATCTGCATA CTTTAAGATG ATTCTTAGCA TTTCCTAAAA AAAGAGAAGG AAATAAAAAA CTTCTAAGCA GGCATTTTCA CCAGGGGTGT TGCACAAATA AAAGTATCCA GAGATGTCCA TATGACAGTG AGAAAAATGT

GCGGAGTCCC AGCGTGCCCG TTGTGCCTGC CCAGCTTTGG GAAAGCTTTG TGGTCAGATA GCATCCCTCA TATGAGTGCC CAGAATTTAA GGAGAAATAC TCCTTCTCAT TCAGGCAGCT GCACCACCAC CAGTCTGTAA TGTACTGTGA TCTCTAACTT ACAGAAGTCT GCAACACGGA AAAGGAAGTG TTGACAGGTT TAAGAAGAAA TTGGATAAAA TGTTAGGAAT CACCTACACC GTGTAAGAAA AAAGTGATTT TGAAAAACAT AAATTGCTAA TCTTTCCTTC TGATGGTGAT GAGTTTGGAA GAGATACTAC TAGTCAAATT AACATTCTGA ATTTTTCCTA

GGCGGCGCGT CGGCGCTGCC CGGCCGTGTG AAGGCCGTAC TGAAAATTCC TTGAAGAGTT AACAGCCTTA GTCCAGGTTA AATGGTCCAC GAAATGGTCA GTAACACAGG CTGTCCAGTG AAATTGACAA CGTATGCATG ATAATGATGA CCAAGGTCCC CACCAACTTC GTACACCTGT GAACCACTTC TGCTTGGGAC ATACACACAA TAAATGCAAT GTCAACAGAG TCTTGAAAAT GCATAGAGAT TTTTCCACAA TATTTGGATA AGAGAGATGA GGGTTGGCAA AAGGGAGGAA AAAGCCTGTG AATATTAACA TGTAAATCTT TTTTACATGT AATAGAAATA


Position 1 is the transcriptional start site. The first five nucleotides in each exon are underlined. Exon 10 (not illustrated in the cDNA sequence but depicted below), an Alu family sequence, has not been reported in DAF mRNA and is not used in surface DAF protein^. The initiation codon (ATG), the termination codon (TAG), and the four polyadenylation signals (AATAAA) are indicated. Nucleotide differences between the published sequences: 321 T/C, 336 G/T, 337 T/G.
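Each DAF polymorphic variant (listed under Deficiency) pairs a cDNA change, numbered from the transcriptional start site as here, with a change in the precursor protein, numbered from the initiating methionine as in the protein sequence. The sketch below converts a cDNA position into a residue number and a position within the codon. It assumes that the A of the initiation codon is cDNA nucleotide 83; that offset is our inference, not a value stated in the text, but it reproduces all seven nucleotide/residue pairs in the variant list. `cdna_to_residue` is a hypothetical helper name.

```python
# Assumption: cDNA nucleotide 83 is the A of the ATG initiation codon (inferred, not stated).
ATG_POSITION = 83


def cdna_to_residue(nt_pos: int, atg: int = ATG_POSITION) -> tuple[int, int]:
    """Map a cDNA nucleotide position to (precursor residue number, codon position 1-3)."""
    offset = nt_pos - atg
    if offset < 0:
        raise ValueError("position lies upstream of the initiation codon")
    return offset // 3 + 1, offset % 3 + 1


# (nucleotide change, amino acid change) pairs from the DAF polymorphic-variant list.
VARIANTS = [("G237T", "R52L"), ("G237C", "R52P"), ("T327G", "L82R"), ("C678T", "S199L"),
            ("G761C", "A227P"), ("T321A", "I80N"), ("C831T", "T250M")]

for nt_change, aa_change in VARIANTS:
    residue, codon_pos = cdna_to_residue(int(nt_change[1:-1]))
    assert residue == int(aa_change[1:-1]), (nt_change, aa_change)  # offset is consistent
    print(f"{nt_change} -> residue {residue}, codon position {codon_pos} ({aa_change})")
```

Under this offset, six of the variants fall at the second codon position and G761C (A227P) at the first, each a position at which the listed amino acid change is possible.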


Exon 10^ 1164 GTTCTCGTCC TGTCACCCAG GCTGGTATGC GGTGGTGTGA TCGTAGCTCA CTGCAGTCTC GAACTCCTGG GTTCAAGCGA TCCTTCCACT TCAGCCTCCC AAGTAGCTGG TACTACAG

Genomic structure^^ The gene spans approximately 40 kb and comprises 11 exons. The introns vary in size from 0.5 to 19.8 kb (last intron). Exon 10 has not been reported in DAF mRNA.

[Figure: exon-intron map of the DAF gene; scale bar, 2 kb.]

Accession numbers (EMBL/GenBank) Human; Mouse^18,19; Orang-utan^20; Guinea-pig^21; Rat^22

M31516 M15799 M64653 S72858 L41365-6 D63679 S67775 D49416-D49421 AF039583-4

M64356 (promoter)

Deficiency Paroxysmal nocturnal haemoglobinuria results from the absence of DAF (as well as CD59 and all other GPI-anchored proteins) on peripheral blood elements^^'23,24. The failure to express these proteins is due to a defect in the first step of GPI anchor assembly (GlcNAc-PI synthesis)^^ in a bone marrow stem cell, arising from a mutation of the PIG-A gene^^. Deficient expression of DAF gives rise to heightened uptake of C3b2^. The Cromer blood group antigen system resides on the DAF molecule^'^; the Inab phenotype represents the absence of DAF^^.

Polymorphic variants^^-^^ (cDNA change; amino acid change)
G237T; R52L
G237C; R52P
T327G; L82R
C678T; S199L
G761C; A227P
T321A; I80N
C831T; T250M

References
1 Caras, I.W. et al. (1987) Nature 325, 545-549.
2 Medof, M.E. et al. (1987) Proc. Natl Acad. Sci. USA 84, 2007-2011.
3 Nicholson-Weller, A. et al. (1982) J. Immunol. 129, 184-189.
4 Medof, M.E. et al. (1984) J. Exp. Med. 160, 1558-1578.
5 Lublin, D.M. et al. (1986) J. Immunol. 137, 1629-1635.
6 Medof, M.E. et al. (1986) Biochemistry 25, 6740-6747.
7 Pangburn, M.K. (1986) J. Immunol. 136, 2216-2221.
8 Fujita, T. et al. (1987) J. Exp. Med. 166, 1221-1228.
9 Kinoshita, T. et al. (1985) J. Exp. Med. 162, 75-92.
10 Nicholson-Weller, A. et al. (1985) Blood 65, 1237-1244.
11 Medof, M.E. et al. (1987) J. Exp. Med. 165, 848-864.
12 Lass, J.H. et al. (1990) Invest. Ophthalmol. Vis. Sci. 31, 1136-1148.
13 Ewulonu, U.K. et al. (1991) Proc. Natl Acad. Sci. USA 88, 4675-4679.
14 Thomas, D.J. and Lublin, D.M. (1993) J. Immunol. 150, 151-160.
15 Bryant, R.W. et al. (1990) J. Immunol. 144, 593-598.
16 Post, T.W. et al. (1990) J. Immunol. 144, 740-744.
17 Lublin, D.M. et al. (1987) J. Exp. Med. 165, 1731-1736.
18 Spicer, A.P. et al. (1995) J. Immunol. 155, 3079-3091.
19 Fukuoka, Y. et al. (1996) Int. Immunol. 8, 379-385.
20 Nickells, M.W. et al. (1994) J. Immunol. 152, 676-685.
21 Nonaka, M. et al. (1995) J. Immunol. 155, 3037-3048.
22 Hinchliffe, S.J. et al. (1998) J. Immunol. 161, 5695-5703.
23 Pangburn, M.K. et al. (1983) Proc. Natl Acad. Sci. USA 80, 5430-5434.
24 Nicholson-Weller, A. et al. (1983) Proc. Natl Acad. Sci. USA 80, 5066-5070.
25 Armstrong, C. et al. (1992) J. Biol. Chem. 267, 25347-25351.
26 Takahashi, M. et al. (1993) J. Exp. Med. 177, 517-521.
27 Hidaka, M. et al. (1993) Biochim. Biophys. Acta 191, 571-579.
28 Takeda, J. et al. (1993) Cell 73, 703-711.
29 Medof, M.E. et al. (1985) Proc. Natl Acad. Sci. USA 82, 2980-2984.
30 Telen, M.J. et al. (1988) J. Exp. Med. 167, 1993-1998.
31 Parsons, S.F. et al. (1988) Proceedings of the 20th Congress of the International Society of Blood Transfusion, London, UK, p. 116 (abstr.).
32 Stafford, H.A. et al. (1988) Proc. Natl Acad. Sci. USA 85, 880-884.
33 Telen, M.J. and Green, A.M. (1989) Blood 74, 437-441.
34 Lublin, D.M. et al. (1991) J. Clin. Invest. 87, 1945-1952.
35 Telen, M.J. et al. (1994) Blood 84, 3205-3211.
36 Lublin, D.M. et al. (1997) Transfusion 37, 102S (abstr.).

Membrane cofactor protein M. Kathryn Liszewski and John P. Atkinson, Department of Medicine, Washington University School of Medicine, St Louis, MO, USA

Other names MCP, CD46, gp45-70, measles virus receptor.

Physicochemical properties MCP is a type I membrane glycoprotein expressed as four common isoforms (C1, C2, BC1 and BC2) that arise by alternative splicing. Each possesses a 34 amino acid signal peptide followed by 328-350 amino acids (C1, 328 aa; C2, 335 aa; BC1, 343 aa; BC2, 350 aa).
pI 3.9-5.8 (higher molecular weight species carry more O-linked sugars, which correlates with their more acidic pI)
Mr (K) predicted ~39; observed 51-58 (C isoforms), 59-68 (BC isoforms)
N-linked glycosylation sites 3 (83, 114, 273) (all occupied)

Structure The N-terminal portion of all isoforms consists of four contiguous CCP repeats. Following the four CCPs is an alternatively spliced segment enriched in serines, threonines and prolines (STP domain) that is O-glycosylated at positions 286-314 (BC isoforms) and 286-299 (C isoforms). The gene contains three STP exons, termed A, B and C. Two of the regularly expressed isoforms contain the B+C exons (29 amino acids) and two contain only the C exon (14 amino acids); STP exon A is rarely used. Flanking the STP domain and common to all isoforms is a sequence of 12 amino acids of unknown function, followed by the transmembrane domain and an intracytoplasmic anchor. Alternative splicing produces two distinct cytoplasmic tails of 16 or 23 amino acids, termed Cyt-1 and Cyt-2. The four commonly occurring isoforms are designated MCP-BC1, MCP-BC2, MCP-C1 and MCP-C2 (specifying the alternatively spliced STP and tail regions). Other, rarer isoforms of unclear significance have been described. On SDS-PAGE the upper band consists of BC1 and BC2 and the lower band of C1 and C2. Population studies indicate that 65% of individuals express predominantly the upper band, 29% express upper and lower bands in approximately equal amounts, and 6% express predominantly the lower band (C isoforms).
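As a cross-check on the isoform sizes quoted under Physicochemical properties, the sketch below adds up segment lengths taken from the Protein modules table (precursor numbering, leader excluded). It assumes that all isoforms share the four CCPs, the undefined segment, the transmembrane domain and the anchor, and differ only in their STP exons and cytoplasmic tail; the function and variable names are illustrative.

```python
from itertools import product


def span(start: int, end: int) -> int:
    """Number of residues in an inclusive precursor-numbering range."""
    return end - start + 1


# Constant segments, using the ranges in the MCP Protein modules table.
CCP_1_TO_4 = span(35, 285)     # four CCP repeats
UNDEFINED = span(315, 327)     # segment of unknown function
TRANSMEMBRANE = span(328, 351)
ANCHOR = span(352, 361)        # intracytoplasmic anchor

# Alternatively spliced segments, as listed in the same table.
STP_EXON = {"B": "VSTSSTTKSPASSAS", "C": "GPRPTYKPPVSNYP"}
TAIL = {"1": "TYLTDETHREVKFTSL", "2": "KADGGAEYATYQTKSTTPAEQRG"}


def mature_length(stp_exons: str, tail: str) -> int:
    """Mature length (signal peptide removed) of one STP/tail combination."""
    stp = sum(len(STP_EXON[exon]) for exon in stp_exons)
    return CCP_1_TO_4 + stp + UNDEFINED + TRANSMEMBRANE + ANCHOR + len(TAIL[tail])


isoforms = {
    f"MCP-{stp}{tail}": mature_length(stp, tail)
    for stp, tail in product(("BC", "C"), ("1", "2"))
}
print(isoforms)
```

With these ranges the sketch gives 343, 350, 328 and 335 residues for BC1, BC2, C1 and C2, and adding the 34-residue leader to BC2 gives the 384-residue precursor.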

Function MCP is a ubiquitously expressed complement regulatory protein. It acts as a cofactor for the factor I-mediated cleavage of C3b and C4b deposited on self tissue. This regulation is intrinsic: MCP protects the cell on which it is anchored, not neighbouring cells. MCP is expressed on placental trophoblast and on the inner acrosomal membrane of human spermatozoa. Its role at these sites is likely to be protection against complement activation, but other possibilities have been suggested. Crosslinking of MCP downregulates IL-12 production, a finding of potential significance for the immunosuppressive sequelae of measles virus infection. MCP is a receptor for several pathogens, including measles virus, group A streptococci (Streptococcus pyogenes) and pathogenic Neisseria. Therapeutic uses of MCP include the production of transgenic animals expressing MCP, to prevent the hyperacute graft rejection that accompanies xenotransplantation, and a recombinant soluble form in which MCP is linked to a second complement regulatory protein (decay-accelerating factor) for use as an inhibitor of complement activation.


Tissue distribution Most cells express all four isoforms of MCP. Human erythrocytes lack MCP. Tissue-specific isoform expression has been found in kidney and salivary gland (BC isoforms) and in brain and fetal heart (C isoforms).

Regulation of expression MCP levels are increased in certain haematological malignancies, on most solid tumour cell lines and following SV40 transformation (reviewed in ref. 22). MCP expression is upregulated in glomerular capillary walls and mesangial regions of diseased kidney tissue, and in astrocytes following cytomegalovirus infection. IFNγ and phorbol ester (PMA) enhance expression in an oligodendrocyte cell line.

Protein sequence (BC2)
MEPPGRRECP FPSWRFPGLL LAAMVLLLYS FSDACEEPPT FEAMELIGKP 50
KPYYEIGERV DYKCKKGYFY IPPLATHTIC DRNHTWLPVS DDACYRETCP 100
YIRDPLNGQA VPANGTYEFG YQMHFICNEG YYLIGEEILY CELKGSVAIW 150
SGKPPICEKV LCTPPPKIKN GKHTFSEVEV FEYLDAVTYS CDPAPGPDPF 200
SLIGESTIYC GDNSVWSRAA PECKVVKCRF PVVENGKQIS GFGKKFYYKA 250
TVMFECDKGF YLDGSDTIVC DSNSTWDPPV PKCLKVSTSS TTKSPASSAS 300
GPRPTYKPPV SNYPGYPKPE EGILDSLDVW VIAVIVIAIV VGVAVICVVP 350
YRYLQRRKKK GKADGGAEYA TYQTKSTTPA EQRG

The leader sequence is underlined, N-linked glycosylation sites (all occupied) are indicated (N) and alternatively spliced segments are double underlined (see Structure).
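Potential N-linked sites follow the usual N-X-S/T consensus, where X is any residue except proline. A minimal scanner is sketched below. The sequence is the 384-residue precursor given under Protein sequence (it carries cytoplasmic tail two, so it is the MCP-BC2 form). The scanner only finds candidate sequons and cannot say whether a site is occupied. The helper names are hypothetical.

```python
import re

# MCP-BC2 precursor, 50 residues per row; whitespace is stripped below.
MCP_BC2 = "".join("""
MEPPGRRECP FPSWRFPGLL LAAMVLLLYS FSDACEEPPT FEAMELIGKP
KPYYEIGERV DYKCKKGYFY IPPLATHTIC DRNHTWLPVS DDACYRETCP
YIRDPLNGQA VPANGTYEFG YQMHFICNEG YYLIGEEILY CELKGSVAIW
SGKPPICEKV LCTPPPKIKN GKHTFSEVEV FEYLDAVTYS CDPAPGPDPF
SLIGESTIYC GDNSVWSRAA PECKVVKCRF PVVENGKQIS GFGKKFYYKA
TVMFECDKGF YLDGSDTIVC DSNSTWDPPV PKCLKVSTSS TTKSPASSAS
GPRPTYKPPV SNYPGYPKPE EGILDSLDVW VIAVIVIAIV VGVAVICVVP
YRYLQRRKKK GKADGGAEYA TYQTKSTTPA EQRG
""".split())

# N, then any residue except P, then S or T. The zero-width lookahead
# lets overlapping sequons (for example NNST) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


print(len(MCP_BC2), sequon_positions(MCP_BC2))
```

Applied to this sequence, the scanner returns positions 83, 114 and 273, the same three sites listed under Physicochemical properties.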


Protein modules
1-34 Leader peptide (exon 1)
35-95 CCP (exon 2)
96-158 CCP (exons 3/4)
159-224 CCP (exon 5)
225-285 CCP (exon 6)
286-314 STP (exons 7-9)
  B domain: VSTSSTTKSPASSAS (exon 8)
  C domain: GPRPTYKPPVSNYP (exon 9)
315-327 Undefined segment (exon 10)
328-351 Transmembrane domain (exons 11/12)
352-361 Intracytoplasmic anchor (exon 12)
362-377 Cytoplasmic tail one: TYLTDETHREVKFTSL (exon 13)
362-384 Cytoplasmic tail two: KADGGAEYATYQTKSTTPAEQRG (exon 14)

Chromosomal location Human: 1q32. MCP lies with four other closely related genes on a 900 kb fragment within the RCA locus at 1q32. An MCP-like genomic element contains sequences that are 93% identical at the nucleotide level to the 5' terminus of MCP (i.e. the signal peptide and CCP1-3). This element lies within 60 kb of MCP. Whether this partial duplication produces a protein is unknown.

cDNA sequence (BC2) TCTGCTTTCC CCGCGAGTGT GCTGCTGTAC CATTGGTAAA AGGATACTTC GCTACCTGTC AAATGGCCAA TTGTAATGAG AGTAGCAATT AAAAATAAAA AGTAACTTAT CACGATTTAT CAAATGTCGA TTACTACAAA CACAATTGTC GTCGACTTCT CAAGCCTCCA TTTGGATGTT TTGTGTTGTC AGCTGAATAT AGATTCCACA TTATTCTGTA

TCCGGAGAAA CCCTTTCCTT TCCTTCTCCG CCAAAACCCT TATATACCTC TCAGATGACG GCAGTCCCTG GGTTATTACT TGGAGCGGTA AATGGAAAAC AGTTGTGATC TGTGGTGACA TTTCCAGTAG GCAACAGTTA TGTGACAGTA TCCACTACAA GTCTCAAATT TGGGTCATTG CCGTACAGAT GCCACTTACC ACCTGGTTTG GTTTCACTCT

TAACAGCGTC CCTGGCGCTT ATGCCTGTGA ACTATGAGAT CTCTTGCCAC CCTGTTATAG CAAATGGGAC TAATTGGTGA AGCCCCCAAT ACACCTTTAG CTGCACCTGG ATTCAGTGTG TCGAAAATGG TGTTTGAATG ACAGTACTTG AATCTCCAGC ATCCAGGATA CTGTGATTGT ATCTTCAAAG AGACTAAATC CCAGTTCATC CATGAGTGCA

TTCCGCGCCG TCCTGGGTTG GGAGCCACCA TGGTGAACGA CCATACTATT AGAAACATGT TTACGAGTTT AGAAATTCTA ATGTGAAAAG TGAAGTAGAA ACCAGATCCA GAGTCGTGCT AAAACAGATA CGATAAGGGT GGATCCCCCA GTCCAGTGCC TCCTAAACCT TATTGCCATA GAGGAAGAAG AACCACTCCA TTTTGACTCT ACTGTGGCTT

CGCATGGAGC CTTCTGGCGG ACATTTGAAG GTAGATTATA TGTGATCGGA CCATATATAC GGTTATCAGA TATTGTGAAC GTTTTGTGTA GTATTTGAGT TTTTCACTTA GCTCCAGAGT TCAGGATTTG TTTTACCTCG GTTCCAAAGT TCAGGTCCTA GAGGAAGGAA GTTGTTGGAG AAAGGGAAAG GCAGAGCAGA ATTAAAATCT AGCTAATATT

CTCCCGGCCG CCATGGTGTT CTATGGAGCT AGTGTAAAAA ATCATACATG GGGATCCTTT TGCACTTTAT TTAAAGGATC CACCACCTCC ATCTTGATGC TTGGAGAGAG GTAAAGTGGT GAAAAAAATT ATGGCAGCGA GTCTTAAAGT GGCCTACTTA TACTTGACAG TTGCAGTAAT CAGATGGTGG GAGGCTGAAT TCAATAGTTG GCAATGTGGC

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320


cDNA sequence (BC2) TTGAATGTAG AGATTGCCTG CTGGTTGTAT TAGTTCACAA

GTAGCATCCT CTTTCCCTTA TAAAGCAGGG TGAAATTATA

continued

TTGATGCTTC TTTGAAACTT GTATGAATTT GGGTATGAAC 1380 AATAACACTT AGATTTATTG GACCAGTCAG CACAGCATGC 1440 ATATGCTGTA TTTTATAAAA TTGGCAAAAT TAGAGAAATA 1500 TTTTCTTTGT

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TGA) and the probable polyadenylation signals (AATATA or AATGAA) are indicated.
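The annotated features (initiation codon, termination codon, polyadenylation signals) can be located by plain string searches. The following is a minimal sketch. It assumes the coding region starts at the first ATG, which is a simplification because start-site selection in real transcripts also depends on sequence context. It uses short hypothetical strings rather than the MCP cDNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(cdna: str):
    """1-based (start, end) from the first ATG through the first in-frame
    stop codon (stop included); None if there is no ATG or no in-frame stop."""
    start = cdna.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return start + 1, i + 3
    return None


def motif_positions(cdna: str, motifs) -> list:
    """Sorted 1-based (position, motif) pairs; overlapping hits are kept."""
    hits = []
    for motif in motifs:
        i = cdna.find(motif)
        while i != -1:
            hits.append((i + 1, motif))
            i = cdna.find(motif, i + 1)
    return sorted(hits)


print(first_orf("GGATGAAACCCTGATT"))
print(motif_positions("CCAATATAGGAATGAACC", ("AATATA", "AATGAA")))
```

In the toy ORF example, the reading frame opened by ATG at position 3 closes with TGA ending at position 14.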

Genomic structure The gene spans a minimum of 43 kb and comprises 14 exons. There are two sites of alternative splicing. Exons 7, 8 and 9 encode the STP domains, which are commonly expressed as isoforms with B+C (exons 8 + 9) or with C (exon 9) alone. Exons 13 and 14 encode the cytoplasmic tails, CYT-1 and CYT-2. Because exon 13 contains an in-frame stop codon, transcripts that include exon 13 encode CYT-1, and in these transcripts exon 14 becomes part of the 3' untranslated region. Transcripts that skip exon 13 encode CYT-2 from exon 14.
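The tail-selection logic can be modelled with exon lists. In this sketch, the exon groupings come from the Protein modules table, and exon 7 (STP A) is left out because it is rarely used. `STOP_EXONS` encodes the simplifying assumption that translation ends in whichever of exons 13 or 14 is reached first. All names are illustrative.

```python
# Exon groups from the MCP Protein modules table (exon 7, STP A, omitted).
CONSTANT_5P = [1, 2, 3, 4, 5, 6]         # leader and the four CCPs
CONSTANT_3P = [10, 11, 12]               # undefined segment, TM domain, anchor
STP_EXONS = {"BC": [8, 9], "C": [9]}
TAIL_EXONS = {"1": [13, 14], "2": [14]}  # exon 14 is present in both
STOP_EXONS = {13, 14}                    # assumption: exons carrying an in-frame stop


def transcript(stp: str, tail: str) -> list[int]:
    """Ordered exon list for one STP/tail combination."""
    return CONSTANT_5P + STP_EXONS[stp] + CONSTANT_3P + TAIL_EXONS[tail]


def split_at_stop(exons: list[int]) -> tuple[list[int], list[int]]:
    """Split into exons up to the first stop-bearing exon and the 3' UTR exons."""
    for k, exon in enumerate(exons):
        if exon in STOP_EXONS:
            return exons[:k + 1], exons[k + 1:]
    return exons, []


for stp in ("BC", "C"):
    for tail in ("1", "2"):
        coding, utr = split_at_stop(transcript(stp, tail))
        print(f"MCP-{stp}{tail}: last coding exon {coding[-1]}, 3' UTR exons {utr}")
```

For the CYT-1 forms, the last coding exon is 13 and exon 14 is left untranslated. For the CYT-2 forms, exon 14 is the last coding exon.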


Accession numbers
Human MCP-BC2 Y00651; MCP-BC1 X59405; MCP-C1 X59406; MCP-C2 X59407; MCP-ABC2 X59409; MCP-ABC1 X59410
Owl monkey U87914
Baboon U87915
Goeldii marmoset U87916
Common marmoset U87917
Tamarin U87918
Squirrel monkey U87919
African green monkey U87920
Cynomolgus monkey U87921
Rhesus monkey U87922
White-faced saki U87923
Guinea-pig D84130-3
Pig D70897
Mouse AB001566


Deficiency None known.

Polymorphic variants A HindIII RFLP has been found that correlates with the phenotypic polymorphism of MCP. This size polymorphism results from variable splicing of exon 8. PvuII and BglII RFLPs have also been described.

References 1 Liszewski, M.K. et al. (1996) Adv. Immunol. 61, 201-283. 2 Lublin, D.M. et al. (1988) J. Exp. Med. 168, 181-194. 3 Post, T.W. et al. (1991) J. Exp. Med. 174, 93-102. 4 Russell, S.M. et al. (1992) Eur. J. Immunol. 22, 1513-1518. 5 Ballard, L.L. et al. (1988) J. Immunol. 141, 3923-3929. 6 Ballard, L. et al. (1987) J. Immunol. 138, 3850-3855. 7 Seya, T. and Atkinson, J.P. (1989) Biochem. J. 264, 581-588. 8 Oglesby, T.J. et al. (1992) J. Exp. Med. 175, 1547-1551. 9 Cervoni, F. et al. (1992) J. Immunol. 148, 1431-1437. 10 Anderson, D.J. et al. (1989) Biol. Reprod. 41, 285-293. 11 Anderson, D.J. et al. (1993) Proc. Natl Acad. Sci. USA 90, 10051-10055. 12 Karp, C.L. et al. (1996) Science 273, 228-231. 13 Naniche, D. et al. (1993) J. Virol. 67, 6025-6032. 14 Dorig, R.E. et al. (1993) Cell 75, 295-305. 15 Manchester, M. et al. (1994) Proc. Natl Acad. Sci. USA 91, 2161-2165. 16 Okada, N. et al. (1995) Proc. Natl Acad. Sci. USA 92, 2489-2493. 17 Kallstrom, H. et al. (1997) Mol. Microbiol. 25, 639-647. 18 Cozzi, E. and White, D.J.G. (1995) Nature Med. 1, 964-966. 19 Higgins, P.J. et al. (1997) J. Immunol. 158, 2872-2881. 20 Johnstone, R.W. et al. (1993) Mol. Immunol. 30, 1231-1241. 21 Gorelick, A. et al. (1995) Lupus 4, 293-296. 22 Liszewski, M.K. et al. (1998) In (Rother, K., Till, G.O. and Hansch, G.M. eds). 2nd ed. Berlin, Springer-Verlag, pp. 146-162. 23 Gasque, P. and Morgan, B.P. (1996) Immunology 89, 338-347. 24 Bora, N.S. et al. (1989) J. Exp. Med. 169, 597-602. 25 Hourcade, D. et al. (1992) Genomics 12, 289-300. 26 Hsu, E.G. et al. (1997) J. Virol. 71, 6144-6154. 27 Hosokawa, M. et al. (1996) J. Immunol. 157, 4946-4952. 28 van den Berg, C.W. et al. (1997) J. Immunol. 158, 1703-1709. 29 Toyomura, K. et al. (1997) Int. Immunol. 9, 869-876. 30 Tsujimura, A. et al. (1998) Biochem. J. 330, 163-168. 31 Bora, N.S. et al. (1991) J. Immunol. 146, 2821-2825. 32 Wilton, A.N. et al. (1992) Immunogenetics 36, 79-85.

C4b-binding protein

Santiago Rodriguez de Cordoba, Olga Criado Garcia and Pilar Sanchez-Corral, Department of Immunology, CIB/CSIC, Madrid, Spain

Other names Proline-rich protein (PRP), Ss(C4)-binding protein, C4-bp, C4b-bp, C4BP.

Physicochemical properties Human C4BP is a heterogeneous oligomeric protein present in plasma as three isoforms that differ in subunit composition. The major isoform, α7/β1 (Mr (K) 540-570), is a complex of seven identical α chains (C4BPα) and one β chain (C4BPβ). The other plasma isoforms are α7/β0 and α6/β1. The proportions in which the three isoforms are synthesized are determined genetically but can be modified by factors that differentially affect expression of the C4BPA and C4BPB genes. The C4BPα and C4BPβ chains are disulfide-linked through their C-terminal regions.

pI (after neuraminidase treatment) α chain 6.60, 6.65 or 6.75, depending on allele; β chain not known
Amino acids α chain: leader sequence 48, mature 549; β chain: leader sequence 17, mature 235
Mr (K) α chain: predicted 61.5, observed 70; β chain: predicted 26.4, observed 45
Potential N-linked glycosylation sites α chain 3 (221, 506, 528); β chain 5 (64, 71, 98, 117, 15…)
Interchain disulfide bonds Located in the oligomerization domain; precise number and positions are not known.

Structure The C4b-binding protein molecule has a spider-like structure when observed by electron microscopy. Synchrotron X-ray scattering and hydrodynamic analyses suggest that human C4BP has a more compact structure in solution. Both the C4BPα and C4BPβ chains are members of the RCA family.

Function C4BP is a regulator of complement activation. It binds to C4b, accelerates the decay of the classical pathway C3/C5 convertase, and acts as a cofactor in the factor I-mediated inactivation of C4b. Each C4BPα chain has a binding site for C4b, spanning CCP-1 to CCP-3. The C4BPα chain also carries a binding site for serum amyloid P component (SAP). The C4BPβ chain binds and inactivates the anticoagulant protein S, suggesting that C4BPβ plays a role in controlling the coagulation system. However, it is unclear how this regulatory mechanism would operate. Similarly, the functional significance of the C4BPα/C4BPβ association remains uncertain.
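Because each α chain carries one C4b-binding site, the three plasma isoforms differ in valency. The sketch below tabulates this. Treating protein S binding as one site per β chain is an assumption based on the statement that the β chain binds protein S. The class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class C4BPIsoform:
    alpha: int  # number of C4BPα chains
    beta: int   # number of C4BPβ chains (0 or 1 in plasma)

    @property
    def name(self) -> str:
        return f"α{self.alpha}/β{self.beta}"

    @property
    def c4b_sites(self) -> int:
        # One C4b-binding site per α chain (CCP-1 to CCP-3).
        return self.alpha

    @property
    def protein_s_sites(self) -> int:
        # Assumption: one protein S binding site per β chain.
        return self.beta


PLASMA_ISOFORMS = [C4BPIsoform(7, 1), C4BPIsoform(7, 0), C4BPIsoform(6, 1)]

for iso in PLASMA_ISOFORMS:
    print(iso.name, "C4b sites:", iso.c4b_sites, "protein S sites:", iso.protein_s_sites)
```

The α7/β0 isoform keeps full C4b valency but, under this assumption, has no protein S binding capacity.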

Tissue distribution Serum protein: 150-300 µg/ml. Primary site of synthesis: liver (hepatocytes).

Regulation of expression The characteristics of the C4BPA and C4BPB promoters explain the liver-specific expression of the C4BP polypeptides. The C4BPA promoter contains HNF1- and HNF3-binding sites, and the C4BPB promoter contains binding sites for the HNF3 and NFI/CTF transcription factors. C4BP is an acute-phase protein. However, acute-phase mediators such as IL-6, IL-1β, TNFα and IFNγ differentially regulate the C4BPA and C4BPB genes in Hep3B cells and have a dramatic effect on the proportions of the C4BP isoforms secreted by these cells. Sequence analyses of the C4BPA and C4BPB promoters have revealed potential target elements for different classes of cytokine response factors.

Protein sequences
C4BPα
MHPPKTPSGA LHRKRKMAAW PFSRLWKVSD PILFQMTLIA ALLPAVLGNC 50
GPPPTLSFAA PMDITLTETR FKTGTTLKYT CLPGYVRSHS TQTLTCNSDG 100
EWVYNTFCIY KRCRHPGELR NGQVEIKTDL SFGSQIEFSC SEGFFLIGST 150
TSRCEVQDRG VGWSHPLPQC EIVKCKPPPD IRNGRHSGEE NFYAYGFSVT 200
YSCDPRFSLL GHASISCTVE NETIGVWRPS PPTCEKITCR KPDVSHGEMV 250
SGFGPIYNYK DTIVFKCQKG FVLRGSSVIH CDADSKWNPS PPACEPNSCI 300
NLPDIPHASW ETYPRPTKED VYVVGTVLRY RCHPGYKPTT DEPTTVICQK 350
NLRWTPYQGC EALCCPEPKL NNGEITQHRK SRPANHCVYF YGDEISFSCH 400
ETSRFSAICQ GDGTWSPRTP SCGDICNFPP KIAHGHYKQS SSYSFFKEEI 450
IYECDKGYIL VGQAKLSCSY SHWSAPAPQC KALCRKPELV NGRLSVDKDQ 500
YVEPENVTIQ CDSGYGVVGP QSITCSGNRT WYPEVPKCEW ETPEGCEQVL 550
TGKRLMQCLP NPEDVKMALE VYKLSLEIEQ LELQRDSARQ STLDKEL

C4BPβ
MFFWCACCLM VAWRVSASDA EHCPELPPVD NSIFVAKEVE GQILGTYVCI 50
KGYHLVGKKT LFCNASKEWD NTTTECRLGH CPDPVLVNGE FSSSGPVNVS 100
DKITFMCNDH YILKGSNRSQ CLEDHTWAPP FPICKSRDCD PPGNPVHGYF 150
EGNNFTLGST ISYYCEDRYY LVGVQEQQCV DGEWSSALPV CKLIQEAPKP 200
ECEKALLAFQ ESKNLCEAME NFMQQLKESG MTMEELKYSL ELKKAELKAK 250
LL

The leader sequence is underlined and potential N-linked glycosylation sites are indicated (N).


Protein modules
C4BPα
1-48 Leader sequence (exon 2)
49-109 CCP (exon 3)
110-171 CCP (exons 4/5)
172-235 CCP (exon 6)
236-296 CCP (exon 7)
297-361 CCP (exon 8)
362-424 CCP (exon 9)
425-481 CCP (exon 10)
482-540 CCP (exon 11)
541-597 C-terminal oligomerization domain (exon 12)

C4BPβ
1-17 Leader sequence (exon 3)
18-77 CCP (exon 4)
78-135 CCP (exon 5)
136-192 CCP (exons 6/7)
193-252 C-terminal oligomerization domain (exons 7/8)

Chromosomal location The C4BPA and C4BPB genes are closely linked within the RCA gene cluster. Human: 1q32. Mouse: chromosome 1, 67.6 cM. Rat: 13q24-q25.

[Figure: map of the RCA gene cluster between the centromere (Cen) and the telomere (Tel); labelled loci include FHR2, HF1, PFKFB2, C4BPA, C4BPAL1, CR2, MCPL1, MCP, F13B, Hs.8688, C4BPB, SRP72, C4BPAL2, DAF, CR1 and CR1L1.]

cDNA sequences C4BPA AACCGTCCTT ACCAGTCAAC GATCAAGGCA CTTCCTCAAC TCCATCTGGG GAAAGTCTCT TCTTGGCAAT GACTGAGACA CAGATCCCAT CTTCTGTATC TAAGACAGAT AATTGGCTCA TCTCCCACAA

GACCAGCCAA TTCAGGGTAT GTTTTCTTCT TACCAAAGAA GCTCTTCATA GATCCAATTC TGTGGTCCTC CGCTTCAAAA TCAACTCAGA TACAAACGAT TTATCTTTTG ACCACTAGTC TGTGAAATTG

CCACATGGCT GAAATTCAGG GACTCTTTGG TGGAGCAATT

60

TATGATAAAC TCTGATCTGG GGAGGAACCA GGACTAGATA

120

TTGAGAAACT ATCCCAGATA TCATCATAGA GTGTTCTGCT

180

AAACATCAGC GAAGCAGCAG GCCATGCACC CCGCAAAAAC

240

GAAAAAGGAA AATGGCAGCC TGGCCCTTCT CCAGGGTGTG

300

TCTTCCAAAT GACCTTGATC GCTGCTCTGT

TGCGTGGTGT

360

CACCCACTTT ATCATTTGCT GCCCCGATGG ATATTACGTT

420

CTGGAACTAC TCTGAAATAC ACCTGCCTCC

GTGGCTAGGT

480

CGCTTACCTG TAATTCTGAT GGCGAATGGG TGTATAACAC

540

GCAGACACCC AGGAGAGTTA CGTAATGGGC AAGTAGAGAT

600

GATCACAAAT AGAATTCAGC TGTTCAGAAG GATTTTTCTT

660

GTTGTGAAGT CCAAGATAGA GGAGTTGGCT GGAGTCATCC

720

TCAAGTGTAA GCCTCCTCCA GACATCAGGA ATGGAAGGCA

780


cDNA sequence CAGCGGTGAA CTTCTCACTC TTGGAGACCA TGGGGAAATG GTGCCAAAAA ATGGAATCCT ACATGCTTCC TGTGTTAAGG GATTTGTCAG TGAACCAAAG CTGTGTTTAT AGCTATATGC CAATTTTCCT CAAAGAAGAG CTCCTGCAGT ACCAGAATTA TGTCACCATC TGGGAACAGA TGAACAAGTG AATGGCCCTG CAGCGCAAGA AGGTGTCTTG CAATTTGGCA GTGCTTTGAG CACACAAAGC

continued

GAAAATTTCT TTGGGCCATG AGCCCTCCTA GTCTCTGGAT GGTTTTGTTC TCTCCTCCTG TGGGAAACAT TACCGCTGTC AAAAATTTGA CTAAATAATG TTCTATGGAG CAAGGAGATG CCTAAAATTG ATTATATATG TATTCACACT GTGAATGGAA CAATGTGATT ACCTGGTACC CTCACAGGCA GAGGTATATA CAATCCACTT CTGGCTTGCC GTGATATTCA ATTGTGAAAT ACAAATTTTT

ACGCATACGG CCTCCATTTC CCTGTGAAAA TTGGACCCAT TCAGAGGCAG CTTGTGAGCC ATCCTAGGCC ATCCTGGCTA GATGGACCCC GTGAAATCAC ATGAGATTTC GCACGTGGAG CCCATGGGCA AATGTGATAA GGTCAGCTCC GGTTGTCTGT CTGGCTATGG CAGAGGTGCC AAAGACTCAT AGCTGTCTCT TGGATAAAGA TCTTGCAATT TCATAATAAA TATTAATCAT TTTCGATTAA

CTTTTCTGTC TTGCACTGTG AATCACCTGT CTATAATTAC CAGTGTAATT CAATAGTTGT GACAAAAGAG CAAACCCACT ATACCAAGGA TCAACACAGG ATTTTCATGT TCCCCGAACA TTATAAACAA AGGCTACATT AGCCCCTCAA GGATAAGGAT TGTGGTTGGT CAAGTGTGAG GCAGTGTCTC GGAAATTGAA ACTATAATTT CAATACAGAT TATCTAGAAA CCTCTGTGTG AAATGTATGT

ACCTACAGCT GAGAATGAAA CGCAAGCCAG AAAGACACTA CATTGTGATG ATTAATTTAC GATGTGTATG ACAGATGAGC TGTGAGGCGT AAAAGTCGTC CATGAGACCA CCATCATGTG TCTAGTTCAT CTGGTCGGAC TGTAAAGCTC CAGTATGTTG CCCCAAAGTA TGGGAGACCC CCAAACCCAG CAACTGGAAC TTCTCAAAAG CAGTTTAGCA TGATAATTTG CTCATGTTTT AT

GTGACCCCCG CAATAGGCGT ATGTTTCACA TTGTGTTTAA CTGATAGCAA CAGACATTCC TTGTTGGGAC CTACGACTGT TATGTTGCCC CTGCCAATCA GTAGGTTTTC GAGACATTTG ACAGCTTTTT AGGCGAAACT TGTGTCGGAA AGCCTGAAAA TCACTTGCTC CCGAAGGCTG AGGATGTGAA TACAGAGAGA AAGGAGGAAA AATCTACTGT CTAAAGTTTA TGCTTTTCAA

TCACATACAT TGACAATTAC CCTTAGAGAT AAATAAAAAA CTGGGAAGCC TGGGGAGAGG TGGCGAGTTT ATATTTGTCG TACCACCTGG ACTACTGAGT TCTTCAGGGC CTCAAGGGCA ATCTGCAAAA AATAACTTCA GGCGTGCAGG TTGATCCAGG AAGAACCTCT ATGGAGGAGC TAACACTACA TCTGAAA

TGAGACCAAA GCTGTTGCTT ACATAAAAGA GCAGGCCTTT CTAACTCTGG ACTTTGATCA CTGCTTCAGA CAAAGGAGGT TAGGAAAGAA GCCGCTTGGG CTGTGAATGT GCAATCGGAG GTAGGGACTG CCTTAGGATC AGCAGCAATG AAGCTCCCAA GCGAAGCCAT TAAAATATTC GCTGAGCAGA

AAGACCAAGT CTGAGTGAGA GACAAGCAAT GGAGCTCTCA AGGGACAGAG CCAGATGTTT TGCAGAGCAC GGAAGGACAG GACCCTTTTT CCACTGTCCT AAGTGACAAA CCAGTGTCTA TGACCCTCCT CACCATTAGT CGTTGATGGG ACCAGAGTGT GGAGAACTTT TCTGGAGCTG TGTAATAGAA

ACCTATAAGA AGTTACAGGC TTCCAAAACA GCTTTGGAGT ACAGGTGTCT TTTTGGTGTG TGTCCAGAGC ATTCTGGGGA TGCAATGCCT GATCCTGTGC ATCACGTTTA GAGGACCACA GGGAATCCAG TATTACTGTG GAGTGGAGCA GAGAAGGCAC ATGCAACAAT AAGAAAGCTG ATAAACCTAT

GGACCAACCC 60 CCAAGAAAGG 120 AAAAGCAAAG 180 CAGTTAAGAC 240 GAGCTGGGTG 300 CGTGCTGTCT 360 TTCCTCCAGT 420 CTTACGTTTG 480 CTAAGGAGTG 540 TGGTGAATGG 600 TGTGCAATGA 660 CCTGGGCACC 720 TTCATGGCTA 780 AAGACAGGTA 840 GTGCACTTCC 900 TTCTTGCCTT 960 TAAAGGAAAG 1020 AGTTGAAGGC 1080 GAATAAATTT 1140

TTTGTTGGGG AGAAACCAGG GTATTTTTGC AGATTTTTTC GAGGTGGTTA

AAGGATAAGC AAATTTCTGA AGTTCCTCCC TTTTCATTTT GGTTGGTCTT

CTGCGTTTCA AACTGAGTTT TATACCAGCT CTCCAGGGAG AAGCAGTGTT

AAAAGACTGG TAAACCTGGC TCCTTACAGC GGTAAATCAA AGAAGATCTA

TCAACTGGAT AACTCTTTTA CGTTCTGATT TTAACCTCTT TTTTTTTTCA

840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220

C4BPB A12 ATTCTGTCTT AGACGGGCTG GTAATGACAG GCAAAAAGAA CAGTTCCTTG AATTCCAGCC TATGGTTGCG GGACAATAGC TATCAAGGGC GGATAACACC AGAGTTCAGT CCACTACATC TCCCTTTCCC TTTTGAAGGA CTACTTAGTG AGTCTGCAAG TCAGGAGAGT TGGCATGACA AAAATTGTTG TCTTCTTGGT

A19 TTGGCCCATG TTGCATTCCC CCATCCCTTT TCCTGTCGTA CTCTAGAGAG

60 120 180 240 300


cDNA sequence continued


AACCAGGTGT CTGAGCTGGG TTTTTTGGTG TGCGTGCTGT ACTGTCCAGA GCTTCCTCCA AGATTCTGGG GACTTACGTT TTTGCAATGC CTCTAAGGAG CTGATCCTGT GCTGGTGAAT AAATCACGTT TATGTGCAAT TAGAGGACCA CACCTGGGCA CTGGGAATCC AGTTCATGGC GTTATTACTG TGAAGACAGG GGGAGTGGAG CAGTGCACTT GTGAGAAGGC ACTTCTTGCC TTATGCAACA ATTAAAGGAA TGAAGAAAGC TGAGTTGAAG AAATAAACCT ATGAATAAAT


TGAATTCCAG CTTATGGTTG GTGGACAATA TGTATCAAGG TGGGATAACA GGAGAGTTCA GACCACTACA CCTCCCTTTC TATTTTGAAG TACTACTTAG CCAGTCTGCA TTTCAGGAGA AGTGGCATGA GCAAAATTGT TTTCTTCTTG

CCTGGGGAGA CGTGGCGAGT GCATATTTGT GCTACCACCT CCACTACTGA GTTCTTCAGG TCCTCAAGGG CCATCTGCAA GAAATAACTT TGGGCGTGCA AGTTGATCCA GTAAGAACCT CAATGGAGGA TGTAACACTA GTTCTGAAA

GGACTTTGAT TTCTGCTTCA CGCAAAGGAG GGTAGGAAAG GTGCCGCTTG GCCTGTGAAT CAGCAATCGG AAGTAGGGAC CACCTTAGGA GGAGCAGCAA GGAAGCTCCC CTGCGAAGCC GCTAAAATAT CAGCTGAGCA

CACCAGATGT 360 GATGCAGAGC 420 GTGGAAGGAC 480 AAGACCCTTT 540 GGCCACTGTC 600 GTAAGTGACA 660 AGCCAGTGTC 720 TGTGACCCTC 780 TCCACCATTA 840 TGCGTTGATG 900 AAACCAGAGT 960 ATGGAGAACT 1020 TCTCTGGAGC 1080 GATGTAATAG 1140

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codons (TAA) and the putative polyadenylation signals (AATAAA) are indicated. Multiple transcription start sites are indicated in bold and underlined.

Genomic structure The human C4BPA and C4BPB genes are arranged in tandem, with the 5' end of the C4BPA gene located 4172 bp downstream of the 3' end of the C4BPB gene. The C4BPA gene spans over 40 kb of DNA and is composed of 12 exons ranging from 186 to 425 bp, and its introns range from 167 bp to approximately 9 kb. The C4BPB gene spans more than 10 kb of DNA and is composed of 8 exons. In human liver, the C4BPB gene is transcribed from two different promoters. These produce two transcripts of similar size, denoted A12 and A19, that differ in their 5' untranslated sequences.


Accession numbers Human Bovine24 Rabbit25 Rat26

Mouse Guinea-pig AM67^^ Pig ApoR^o

C4BPA M31452 L05546-54 [C4BPAL1^^) X81360-62 (C45PAL223) Z31693 Z35490 Z50051 Ml 71222^ U75654 L06820 J50773

C4BPB LI1244-46 M29964 Z31694 Z50052 Z2194428


Deficiency C4BP deficiency would be expected to favour C3 consumption through uncontrolled activation of the classical pathway. Only one case of primary C4BP deficiency has been reported; the patient had atypical Behçet's disease complicated by angioedema.

Polymorphic variants C4BPα: isoelectric focusing after neuraminidase treatment reveals three variants, C4-bp 1 (pI 6.65), C4-bp 2 (pI 6.60) and C4-bp 3 (pI 6.75), with allele frequencies C4BP*1 0.986, C4BP*2 0.010 and C4BP*3 0.004. T1292C; Y357H. Associated genetic marker: D1S3704 (CA repeat). C4BPβ: C818T (A19 numbering), with allele frequencies T 0.16 and C 0.84; G to A in intron 4 at position +3, with allele frequencies G 0.47 and A 0.53. Associated genetic marker: C4BPB (CA repeat).
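The allele frequencies above can be converted into expected genotype frequencies only if one assumes Hardy-Weinberg equilibrium (random mating, no selection or migration). The source does not test that assumption. The following is a minimal sketch that works for any number of alleles. The function name is hypothetical.

```python
import math
from itertools import combinations_with_replacement


def hardy_weinberg(allele_freqs: dict[str, float]) -> dict[tuple[str, str], float]:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium:
    p**2 for each homozygote and 2*p*q for each heterozygote."""
    if not math.isclose(sum(allele_freqs.values()), 1.0, abs_tol=1e-6):
        raise ValueError("allele frequencies must sum to 1")
    genotypes = {}
    for a, b in combinations_with_replacement(allele_freqs, 2):
        p, q = allele_freqs[a], allele_freqs[b]
        genotypes[(a, b)] = p * p if a == b else 2 * p * q
    return genotypes


# C4BPβ C818T and the three C4BPα isoelectric-focusing alleles.
c818t = hardy_weinberg({"C": 0.84, "T": 0.16})
c4bpa = hardy_weinberg({"C4BP*1": 0.986, "C4BP*2": 0.010, "C4BP*3": 0.004})
print({g: round(f, 4) for g, f in c818t.items()})
```

For C818T, this gives expected CC, CT and TT frequencies of 0.7056, 0.2688 and 0.0256.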

References 1 Hillarp, A. et al. (1989) FEBS Lett. 259, 53-56. 2 Perkins, J. et al. (1986) Biochem. J. 233, 799-807. 3 Barlow, P.N. et al. (1993) J. Mol. Biol. 232, 268-284. 4 Scharfstein, J. et al. (1978) J. Exp. Med. 148, 207-222. 5 Dahlback, B. et al. (1983) Proc. Natl Acad. Sci. USA 80, 3461-3465. 6 Sanchez-Corral, P. et al. (1995) J. Immunol. 155, 4030-4036. 7 Criado Garcia, O. et al. (1995) J. Immunol. 155, 4037-4043. 8 Fujita, T. et al. (1978) J. Exp. Med. 148, 1044-1051. 9 Accardo, P. et al. (1996) J. Immunol. 157, 4935-4939. 10 Garcia de Frutos, P. et al. (1995) J. Biol. Chem. 270, 26950-26955. 11 Hardig, Y. et al. (1993) J. Biol. Chem. 268, 3033-3036. 12 Arenzana, N. et al. (1995) Biochem. J. 308, 613-621. 13 Arenzana, N. et al. (1996) J. Immunol. 156, 168-175. 14 Chung, L.P. et al. (1985) Biochem. J. 230, 133-141. 15 Hillarp, A. et al. (1990) Proc. Natl Acad. Sci. USA 87, 1183-1187. 16 Rodriguez de Cordoba, S. et al. (1991) J. Exp. Med. 173, 1073-1082. 17 Hillarp, A. et al. (1993) J. Biol. Chem. 268, 15017-15023. 18 Pardo-Manuel, F. et al. (1990) Proc. Natl Acad. Sci. USA 87, 4529-4532. 19 Seldin, M.F. et al. (1988) J. Exp. Med. 167, 688-693. 20 Andersson, A. et al. (1990) Somat. Cell. Mol. Genet. 16, 493-500. 21 Aso, T. et al. (1991) Biochem. Biophys. Res. Commun. 174, 222-227. 22 Sanchez-Corral, P. et al. (1993) Genomics 17, 185-193. 23 Pardo-Manuel de Villena, F. et al. (1995) Immunogenetics 41, 139-143. 24 Hillarp, A. et al. (1994) J. Immunol. 153, 4190-4199. 25 Garcia de Frutos, P. et al. (1995) Biochim. Biophys. Acta 1261, 285-289. 26 Hillarp, A. et al. (1997) J. Immunol. 158, 1315-1323. 27 Kristensen, T. et al. (1987) Biochemistry 26, 4668-4674.


Rodriguez de Cordoba, S. et al. (1994) Genomics 21, 501-509. Foster, J.A. et al. (1997) J. Biol. Chem. 272, 12714-12722. Cooper, S.T. et al. (1992) Biochemistry 31, 12328-12336. Trapp, R.G. et al. (1987) J. Rheumatol. 14, 135-138. Rodriguez de Cordoba, S. et al. (1987) Immunogenetics 25, 267-268. Morboeuf, O. et al. (1998) Br. J. Haematol. 101, 10-15.

Factor H

Other names β1H, FH

Richard G. DiScipio, La Jolla Institute for Experimental Medicine, La Jolla, CA, USA

Physicochemical properties Factor H is synthesized as a single-chain molecule of 1231 amino acids, including an 18 amino acid leader sequence.
Mature protein
pI (predicted) 5.7-6.2 (observed) 6.5-6.75 (after neuraminidase treatment)
Mr (K) 155
N-linked glycosylation sites potential 9 (217, 529, 718, 802, 822, 882, 911, 1029, 1095); known to be occupied 5 (529, 802, 822, 882, 911); known to be unoccupied 1 (217)

Structure The tandem array of CCP modules gives rise to an elongated, flexible molecule that is bent at least once. The contour length is 49.5 nm and the cross-sectional thickness is 3.4 nm. The structures of the 5th and 16th CCP modules have been solved by NMR. A CCP module is an ellipsoid with a maximal length of 38 Å, consisting of five β strands linked by two overlapping disulfide bonds. The structure of a pair of CCP modules (15th and 16th) has also been solved. The data show that a wide range of twist angles between these modules is possible, whereas the range of possible tilt angles is much more limited.

Function Factor H controls the activity of the alternative pathway C3/C5 convertase in three ways: it competes with factor B for C3b binding, it displaces the Bb subunit from the convertase, and it serves as a cofactor for factor I in the cleavage of C3b to iC3b. Factor H and thrombin-treated factor H have been reported to be chemotactic for monocytes. Factor H also serves as an adhesion protein for neutrophils and as a secretagogue of IL-1β from monocytes. A truncated form of factor H, consisting of the first 7 of the 20 modules, supports adhesion of epithelial and fibroblast cell lines by displaying the tripeptide sequence RGD found in the fourth module.

Tissue distribution Serum protein: ~550 µg/ml. Primary site of synthesis: liver. Secondary sites: monocytes, endothelial cells, fibroblasts and myoblasts.

Regulation of expression Synthesis of factor H by fibroblasts, monocytes, endothelial cells and cultured liver cells is augmented by IFNγ.

Protein sequence
MRLLAKIICL MLWAICVAED CNELPPRRNT EILTGSWSDQ TYPEGTQAIY 50
KCRPGYRSLG NVIMVCRKGE WVALNPLRKC QKRPCGHPGD TPFGTFTLTG 100
GNVFEYGVKA VYTCNEGYQL LGEINYRECD TDGWTNDIPI CEVVKCLPVT 150
APENGKIVSS AMEPDREYHF GQAVRFVCNS GYKIEGDEEM HCSDDGFWSK 200
EKPKCVEISC KSPDVINGSP ISQKIIYKEN ERFQYKCNMG YEYSERGDAV 250
CTESGWRPLP SCEEKSCDNP YIPNGDYSPL RIKHRTGDEI TYQCRNGFYP 300
ATRGNTAKCT STGWIPAPRC TLKPCDYPDI KHGGLYHENM RRPYFPVAVG 350
KYYSYYCDEH FETPSGSYWD HIHCTQDGWS PAVPCLRKCY FPYLENGYNQ 400
NHGRKFVQGK SIDVACHPGY ALPKAQTTVT CMENGWSPTP RCIRVKTCSK 450
SSIDIENGFI SESQYTYALK EKAKYQCKLG YVTADGETSG SIRCGKDGWS 500
AQPTCIKSCD IPVFMNARTK NDFTWFKLND TLDYECHDGY ESNTGSTTGS 550
IVCGYNGWSD LPICYERECE LPKIDVHLVP DRKKDQYKVG EVLKFSCKPG 600
FTIVGPNSVQ CYHFGLSPDL PICKEQVQSC GPPPELLNGN VKEKTKEEYG 650
HSEVVEYYCN PRFLMKGPNK IQCVDGEWTT LPVCIVEEST CGDIPELEHG 700
WAQLSSPPYY YGDSVEFNCS ESFTMIGHRS ITCIHGVWTQ LPQCVAIDKL 750
KKCKSSNLII LEEHLKNKKE FDHNSNIRYR CRGKEGWIHT VCINGRWDPE 800
VNCSMAQIQL CPPPPQIPNS HNMTTTLNYR DGEKVSVLCQ ENYLIQEGEE 850
ITCKDGRWQS IPLCVEKIPC SQPPQIEHGT INSSRSSQES YAHGTKLSYT 900
CEGGFRISEE NETTCYMGKW SSPPQCEGLP CKSPPEISHG VVAHMSDSYQ 950
YGEEVTYKCF EGFGIDGPAI AKCLGEKWSH PPSCIKTDCL SLPSFENAIP 1000
MGEKKDVYKA GEQVTYTCAT YYKMDGASNV TCINSRWTGR PTCRDTSCVN 1050
PPTVQNAYIV SRQMSKYPSG ERVRYQCRSP YEMFGDEEVM CLNGNWTEPP 1100
QCKDSTGKCG PPPPIDNGDI TSFPLSVYAP ASSVEYQCQN LYQLEGNKRI 1150
TCRNGQWSEP PKCLHPCVIS REIMENYNIA LRWTAKQKLY SRTGESVEFV 1200
CKRGYRLSSR SHTLRTTCWD GKLEYPTCAK R

The signal sequence is underlined and the N-linked glycosylation sites are indicated (N). N217, although a potential N-linked glycosylation site, is known not to be glycosylated and is therefore not indicated.

Protein modules (leader peptide numbered from the initiating methionine; CCP modules numbered from the N-terminus of the mature protein)
1-18 Leader peptide
1-62 CCP1
63-123 CCP2
124-187 CCP3
188-244 CCP4
245-302 CCP5
303-367 CCP6
368-425 CCP7
426-487 CCP8
488-547 CCP9
548-606 CCP10
607-668 CCP11
669-731 CCP12
732-790 CCP13
791-847 CCP14
848-908 CCP15
909-967 CCP16
968-1026 CCP17
1027-1085 CCP18
1086-1146 CCP19
1147-1213 CCP20

CCP1, 2 and 3 are required for factor I cofactor activity and for decay acceleration of the C3bBb convertase. CCP4 augments factor I cofactor activity and is required for decay acceleration. CCP7 is involved in binding heparin and streptococcal M protein, and CCP20 plays a role in heparin binding. Three C3b-binding sites are formed by the CCP clusters 1-4, 6-10 and 16-20.
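The module boundaries in the Protein modules table should cover the mature chain without gaps or overlaps, and the C3b-binding clusters can then be mapped to residue spans. The following is a small consistency check. It assumes mature-protein numbering for the CCPs: the 18-residue leader plus 1213 mature residues gives the 1231-residue precursor. The helper names are illustrative.

```python
# Factor H CCP boundaries (mature numbering) from the Protein modules table;
# CCP n is BOUNDS[n - 1].
BOUNDS = [
    (1, 62), (63, 123), (124, 187), (188, 244), (245, 302),
    (303, 367), (368, 425), (426, 487), (488, 547), (548, 606),
    (607, 668), (669, 731), (732, 790), (791, 847), (848, 908),
    (909, 967), (968, 1026), (1027, 1085), (1086, 1146), (1147, 1213),
]


def check_tiling(bounds: list[tuple[int, int]]) -> int:
    """Raise ValueError on a gap, overlap or empty module; return the last residue."""
    expected_start = 1
    for n, (start, end) in enumerate(bounds, start=1):
        if start != expected_start or end < start:
            raise ValueError(
                f"CCP{n} spans {start}-{end}; expected it to start at {expected_start}"
            )
        expected_start = end + 1
    return expected_start - 1


def cluster_span(first: int, last: int, bounds=BOUNDS) -> tuple[int, int]:
    """Residues covered by CCP modules first..last (1-based, inclusive)."""
    return bounds[first - 1][0], bounds[last - 1][1]


lengths = [end - start + 1 for start, end in BOUNDS]
mature_length = check_tiling(BOUNDS)
c3b_clusters = {(a, b): cluster_span(a, b) for a, b in [(1, 4), (6, 10), (16, 20)]}
print(mature_length, min(lengths), max(lengths), c3b_clusters)
```

With these boundaries the check returns 1213. Module lengths range from 57 to 67 residues, and the clusters 1-4, 6-10 and 16-20 correspond to residues 1-244, 303-606 and 909-1213 of the mature protein.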

Chromosomal location Human: chromosome 1q32. The gene for factor H lies within the RCA gene cluster: telomere ... MCP, CR1, CR2, DAF, C4BP, FH ... centromere. Mouse: chromosome 1, 74.1 cM.

cDNA sequence AATTCTTGGA AAAGATCCAA TGTAGCAGAA CTGGTCTGAC TAGATCTCTT ATTAAGGAAA TACCCTTACA GGGGTATCAA TGATATTCCT AATTGTCAGT TGTATGTAAC TTTTTGGAGT AAATGGATCT ATGTAACATG GCGTCCGTTG CTACTCACCT TGGTTTTTAT TGCTCCGAGA TCATGAGAAT CTGTGATGAA AGATGGATGG TGGATATAAT CCATCCTGGC GTCTCCTACT GAATGGGTTT ATGCAAACTA

AGAGGAGAAC AAAATGAGAC GATTGCAATG CAAACATATC GGAAATGTAA TGTCAGAAAA GGAGGAAATG TTGCTAGGTG ATATGTGAAG AGTGCAATGG TCAGGCTACA AAAGAGAAAC CCTATATCTC GGTTATGAAT CCTTCATGTG TTAAGGATTA CCTGCAACCC TGTACCTTGA ATGCGTAGAC CATTTTGAGA TCGCCAGCAG CAAAATCATG TACGCTCTTC CCCAGATGCA ATTTCTGAAT GGATATGTAA

TGGACGTTGT TTCTAGCAAA AACTTCCTCC CAGAAGGCAC TAATGGTATG GGCCCTGTGG TGTTTGAATA AGATTAATTA TTGTGAAGTG AACCAGATCG AGATTGAAGG CAAAGTGTGT AGAAGATTAT ACAGTGAAAG AAGAAAAATC AACACAGAAC GGGGAAATAC AACCTTGTGA CATACTTTCC CTCCGTCAGG TACCATGCCT GAAGAAAGTT CAAAAGCGCA TCCGTGTCAA CTCAGTATAC CAGCAGATGG

GAACAGAGTT GATTATTTGC AAGAAGAAAT CCAGGCTATC CAGGAAGGGA ACATCCTGGA TGGTGTAAAA CCGTGAATGT TTTACCAGTG GGAATACCAT AGATGAAGAA GGAAATTTCA TTATAAGGAG AGGAGATGCT ATGTGATAAT TGGAGATGAA AGCCAAATGC TTATCCAGAC AGTAGCTGTA AAGTTACTGG CAGAAAATGT TGTACAGGGT GACCACAGTT AACATGTTCC ATATGCCTTA TGAAACATCA

AGCTGGTAAA CTTATGTTAT ACAGAAATTC TATAAATGCC GAATGGGTTG GATACTCCTT GCTGTGTATA GACACAGATG ACAGCACCAG TTTGGACAAG ATGCATTGTT TGCAAATCCC AATGAACGAT GTATGCACTG CCTTATATTC ATCACGTACC ACAAGTACTG ATTAAACATG GGAAAATATT GATCACATTC TATTTTCCTT AAATCTATAG ACATGTATGG AAATCAAGTA AAAGAAAAAG GGATCAATTA

TGTCCTCTTA GGGCTATTTG TGACAGGTTC GCCCTGGATA CTCTTAATCC TTGGTACTTT CATGTAATGA GATGGACCAA AGAATGGAAA CAGTACGGTT CAGACGATGG CAGATGTTAT TTCAATATAA AATCTGGATG CAAATGGTGA AGTGTAGAAA GCTGGATACC GAGGTCTATA ACTCCTATTA ATTGCACACA ATTTGGAAAA ACGTTGCCTG AGAATGGCTG TAGATATTGA CGAAATATCA GATGTGGGAA

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560


cDNA sequence AGATGGATGG TGCCAGAACT CCATGATGGT TGGTTGGTCT ACACTTAGTT CTGCAAACCA GTCTCCTGAC CCTCAATGGG ATATTATTGC AGAGTGGACA ACTTGAACAT ATTCAATTGC AGTATGGACC AAATTTAATT CATAAGGTAC ATGGGATCCA GATTCCCAAT TGTTCTTTGC AAGATGGCAG AGAACACGGA ATTGAGTTAT CATGGGAAAA GATTTCTCAT GTACAAATGT AAAATGGTCT AAATGCCATA CACTTGTGCA ATGGACAGGA TGCTTATATA ATGTAGGAGC GACGGAACCA CAATGGGGAC CCAATGCCAG ATGGTCAGAA TTATAACATA AGTTGAATTT AACATGTTGG AAGTGCACAC TATTGTTTTA TATAAGCTGA

TCAGCTCAAC AAAAATGACT TATGAAAGCA GATTTACCCA CCTGATCGCA GGATTTACAA CTCCCAATAT AATGTTAAGG AATCCTAGAT ACTTTACCAG GGCTGGGCCC TCAGAATCAT CAACTTCCCC ATACTTGAGG AGATGTAGAG GAAGTGAACT TCTCACAATA CAAGAAAATT TCAATACCAC ACCATTAATT ACTTGTGAGG TGGAGTTCTC GGTGTTGTAG TTTGAAGGTT CACCCTCCAT CCCATGGGAG ACATATTACA AGGCCAACAT GTGTCGAGAC CCTTATGAAA CCTCAATGCA ATTACTTCAT AACTTGTATC CCACCAAAAT GCATTAAGGT GTGTGTAAAC GATGGGAAAC CTTTATTCAG CTCCTTTTTA GACCGGTGGC

continued CCACGTGCAT TCACATGGTT ATACTGGAAG TATGTTATGA AGAAAGACCA TAGTTGGACC GTAAAGAGCA AAAAAACGAA TTCTAATGAA TGTGTATTGT AGCTTTCTTC TTACAATGAT AGTGTGTGGC AACATTTAAA GAAAAGAAGG GCTCAATGGC TGACAACCAC ATCTAATTCA TCTGTGTTGA CATCCAGGTC GTGGTTTCAG CACCTCAGTG CTCACATGTC TTGGAATTGA CATGCATAAA AGAAGAAGGA AAATGGATGG GCAGAGACAC AGATGAGTAA TGTTTGGGGA AAGATTCTAC TCCCGTTGTC AACTTGAGGG GCTTACATCC GGACAGCCAA GGGGATATCG TGGAGTATCC AACTTTAGTA TTCATACGTA TCTCTT

TAAATCTTGT TAAGCTGAAT CACCACTGGT AAGAGAATGC GTATAAAGTT TAATTCCGTT AGTACAATCA AGAAGAATAT GGGACCTAAT GGAGGAGAGT CCCTCCTTAT TGGACACAGA AATAGATAAA AAACAAGAAG ATGGATACAC ACAAATACAA ACTGAATTAT GGAAGGAGAA AAAAATTCCA TTCACAAGAA GATATCTGAA TGAAGGCCTT AGACAGTTAT TGGGCCTGCA AACAGATTGT TGTGTATAAG AGCCAGTAAT CTCCTGTGTG ATATCCATCT TGAAGAAGTG AGGAAAATGT AGTATATGCT TAACAAGCGA GTGTGTAATA ACAGAAGCTT TCTTTCATCA AACTTGTGCA TTAAATCAGT AAATTTTGGA

GATATCCCAG GACACATTGG TCCATAGTGT GAACTTCCTA GGAGAGGTGT CAGTGCTACC TGTGGTCCAC GGACACAGTG AAAATTCAAT ACCTGTGGAG TACTATGGAG TCAATTACGT CTTAAGAAGT GAATTCGATC ACAGTCTGCA TTATGCCCAC CGGGATGGAG GAAATTACAT TGTTCACAAC AGTTATGCAC GAAAATGAAA CCTTGTAAAT CAGTATGGAG ATTGCAAAAT CTCAGTTTAC GCGGGTGAGC GTAACATGCA AATCCGCCCA GGTGAGAGAG ATGTGTTTAA GGGCCCCCTC CCAGCTTCAT ATAACATGTA TCCCGAGAAA TATTCGAGAA CGTTCTCACA AAAAGATAGA TCTCAATTTC TTAATTTGTG

TATTTATGAA ACTATGAATG GTGGTTACAA AAATAGATGT TGAAATTCTC ACTTTGGATT CTCCTGAACT AAGTGGTGGA GTGTTGATGG ATATACCTGA ATTCAGTGGA GTATTCATGG GCAAATCATC ATAATTCTAA TAAATGGAAG CTCCACCTCA AAAAAGTATC GCAAAGATGG CACCTCAGAT ATGGGACTAA CAACATGCTA CTCCACCTGA AAGAAGTTAC GCTTAGGAGA CTAGCTTTGA AAGTGACTTA TTAATAGCAG CAGTACAAAA TACGTTATCA ATGGAAACTG CACCTATTGA CAGTTGAGTA GAAATGGACA TTATGGAAAA CAGGTGAATC CATTGCGAAC ATCAATCATA ATTTTTTATG AAAATGTAAT

1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420 3480 3540 3600 3660 3720 3780 3840 3900

The initiation methionine (ATG) and termination codon (TAG) are indicated.
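The annotated ATG and TAG delimit the open reading frame. The minimal Python sketch below illustrates that relationship. It assumes the initiation codon is simply the first ATG in the sequence (in a real cDNA the assigned start codon need not be the first ATG) and then reads codons in that frame until the first stop codon. `find_orf` is a hypothetical helper, and this is an illustration, not the procedure used to annotate this entry.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(cdna: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (first, last) positions of the ORF that starts at the first ATG.

    `first` is the A of the ATG and `last` is the final base of the first
    in-frame stop codon. Returns None if there is no ATG, or no in-frame
    stop codon after it.
    """
    seq = "".join(cdna.split()).upper()  # drop the spaces between 10-nt blocks
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step through complete codons in the ATG's reading frame.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start + 1, i + 3
    return None


print(find_orf("CC ATG GCA TGG TAG AA"))  # (3, 14): ATG at 3-5, TAG at 12-14
```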

Genomic structure^^ The structure of the human gene is unknown. The murine gene spans 120 kb and comprises 22 exons. The first exon encodes the 5' untranslated region and the leader peptide. Of the 20 CCP modules of factor H, 19 are each encoded by a single exon; only CCP module 2 is encoded by two exons. Exon sizes range from 77 to 210 bp, whereas introns vary much more widely in size, from 86 bp to 26 kb.
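The 22 exons reported for the murine gene can be accounted for from the module organization given in this paragraph. Exon 1 carries the 5' untranslated region and the leader peptide, 19 CCP modules each occupy a single exon, and CCP module 2 is split over two exons:

```latex
\underbrace{1}_{\text{5' UTR + leader}}
+ \underbrace{19 \times 1}_{\text{single-exon CCPs}}
+ \underbrace{1 \times 2}_{\text{CCP module 2}}
= 22\ \text{exons}
```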

Accession numbers Human^ Mouse^

Y00716 M12660

Deficiency Autosomal recessive. Uncontrolled alternative pathway activation of C3 results in reduced C3 levels. Deficiency has been associated with meningococcal disease, glomerulonephritis, chronic hypocomplementemic renal disease and systemic lupus erythematosus^^-^^.

Polymorphic variants Five variants have been described in white populations, only two of which are common: FH*1 (gene frequency 0.6-0.69) and FH*2 (0.30-0.4)^'*'^^. Among Japanese donors there are two common and two rare alleles, along with a null allele^^. Two forms, φ1 and φ2, differ in their affinity for phenyl-Sepharose, φ1 being more hydrophilic and φ2 more hydrophobic. The molecular difference between these two forms is unknown; they may or may not be genetic variants^^.

References
1 Ripoche, J. et al. (1988) Biochem. J. 249, 593-602.
2 Kristensen, T. and Tack, B.F. (1986) Proc. Natl Acad. Sci. USA 83, 3963-3967.
3 Sim, R.B. and DiScipio, R.G. (1982) Biochem. J. 205, 285-293.
4 DiScipio, R.G. (1992) J. Immunol. 149, 2592-2599.
5 Norman, D.G. et al. (1991) J. Mol. Biol. 219, 717-725.
6 Barlow, P.N. et al. (1992) Biochemistry 31, 3626-3634.
7 Barlow, P.N. et al. (1993) J. Mol. Biol. 232, 268-284.
8 Whaley, K. and Ruddy, S. (1976) J. Exp. Med. 144, 1147-1163.
9 Kazatchkine, M.D. et al. (1979) J. Immunol. 122, 75-81.
10 Weiler, J.M. et al. (1976) Proc. Natl Acad. Sci. USA 73, 3268-3272.
11 Nabil, K. et al. (1997) Biochem. J. 326, 377-381.
12 Ohtsuka, H. et al. (1993) Immunology 80, 140-145.
13 Iferroudjene, D. et al. (1991) Eur. J. Immunol. 21, 967-972.
14 Hellwage, J. et al. (1997) Biochem. J. 326, 321-327.
15 Vik, D.P. et al. (1990) J. Biol. Chem. 265, 3193-3201.
16 Lappin, D.F. et al. (1992) Biochem. J. 281, 437-442.
17 Legoedec, J. et al. (1995) Eur. J. Immunol. 25, 3460-3466.
18 Guc, D. et al. (1993) Rheum. Int. 13, 139-146.
19 Vik, D.P. (1996) Scand. J. Immunol. 44, 215-222.
20 Brooimans, R.A. et al. (1989) J. Immunol. 142, 2024-2030.
21 Gordon, D.L. et al. (1995) J. Immunol. 155, 348-356.
22 Kuhn, S. and Zipfel, P.F. (1996) Eur. J. Immunol. 26, 2383-2387.
23 Blackmore, T.K. et al. (1998) Infect. Immun. 66, 1427-1431.
24 Blackmore, T.K. et al. (1996) J. Immunol. 157, 5422-5427.
25 Kotarsky, H. et al. (1998) J. Immunol. 160, 3349-3354.


26 Blackmore, T.K. et al. (1998) J. Immunol. 160, 3342-3348.
27 Sharma, J.K. and Pangburn, M.K. (1996) Proc. Natl Acad. Sci. USA 93, 10996-11001.
28 Rodriguez de Cordoba, S. and Rubinstein, P. (1987) Immunogenetics 25, 267-268.
29 Rodriguez de Cordoba, S. and Rubinstein, P. (1984) J. Immunol. 132, 1906-1908.
30 Vik, D.P. et al. (1988) J. Biol. Chem. 263, 16720-16724.
31 Fijen, C.A. et al. (1996) Clin. Exp. Immunol. 105, 511-516.
32 Nielsen, H.E. et al. (1989) Scand. J. Immunol. 30, 711-718.
33 Ault, B.H. et al. (1997) J. Biol. Chem. 272, 25168-25175.
34 Zhou, M. and Larsen, B. (1990) Hum. Hered. 40, 55-57.
35 Day, A.J. et al. (1988) Immunogenetics 27, 211-214.
36 Nakamura, S. et al. (1990) Hum. Hered. 40, 121-126.
37 Ripoche, J. et al. (1984) Biochem. J. 221, 89-96.


Part 6 Cell Surface Receptors

C1qRp

Andrea J. Tenner, Department of Molecular Biology and Biochemistry, University of California, Irvine, CA, USA


Other names C1q receptor that enhances phagocytosis, human C1q/MBL/SPA receptor.

Physicochemical properties^ C1qRp is synthesized as a single-chain molecule; in humans it is a 652-amino-acid precursor that includes a 21-amino-acid leader sequence.
pI (predicted) 5.24
Mr (K) predicted 66.5; observed 126 (reduced), 100 (unreduced)
N-linked glycosylation sites^ Potential 1 (325)


Structure Not determined.

Function Multivalent interaction of this receptor with its known ligands (C1q, MBL and SP-A) enhances phagocytosis of suboptimally opsonized particles and/or cellular debris^^.

Tissue distribution Myeloid cells, endothelial cells, platelets^'^'^.


Regulation of expression Unknown.

Protein sequence (human)^
MATSMGLLLL LLLLLTQPGA GTGADTEAVV CVGTACYTAH SGKLSAAEAQ 50
NHCNQNGGNL ATVKSKEEAQ HVQRVLAQLL RREAALTARM SKFWIGLQRE 100
KGKCLDPSLP LKGFSWVGGG EDTPYSNWHK ELRNSCISKR CVSLLLDLSQ 150
PLLPNRLPKW SEGPCGSPGS PGSNIEGFVC KFSFKGMCRP LALGGPGQVT 200
YTTPFQTTSS SLEAVPFASA ANVACGEGDK DETQSHYFLC KEKAPDVFDW 250
GSSGPLCVSP KYGCNFNNGG CHQDCFEGGD GSFLCGCRPG FRLLDDLVTC 300
ASRNPCSSSP CRGGATCVLG PHGKNYTCRC PQGYQLDSSQ LDCVDVDECQ 350
DSPCAQECVN TPGGFRCECW VGYEPGGPGE GACQDVDECA LGRSPCAQGC 400
TNTDGSFHCS CEEGYVLAGE DGTQCQDVDE CVGPGGPLCD SLCFNTQGSF 450
HCGCLPGWVL APNGVSCTMG PVSLGPPSGP PDEEDKGEKE GSTVPRAATA 500
SPTRGPEGTP KATPTTSRPS LSSDAPITSA PLKMLAPSGS SGVWREPSIH 550
HATAASGPQE PAGGDSSVAT QNNDGTDGQK LLLFYILGTV VAILLLLALA 600
LGLLVYRKRR AKREEKKEKK PQNAADSYSW VPERAESRAM ENQYSPTPGT 650
DC 652

The leader sequence is underlined and the potential N-linked glycosylation site is indicated (N).

Protein modules^
1-21      Leader peptide
29-184    CRD
260-301   EGF
302-344   EGF
345-384   EGF-Ca2+
385-426   EGF-Ca2+
427-468   EGF-Ca2+
469-578   STP
581-605   Transmembrane domain
606-652   Intracellular domain
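The module table lends itself to a residue-to-module lookup. The minimal Python sketch below uses the boundaries transcribed from the table; `MODULES` and `module_at` are hypothetical names. It places the single potential N-glycosylation site, N325, in the second EGF module (302-344). For residues the table leaves unassigned (22-28, 185-259 and 579-580) it returns None.

```python
from bisect import bisect_right
from typing import Optional

# (first, last, module) for C1qRp, 1-based inclusive residue numbers,
# transcribed from the "Protein modules" table. Ranges are sorted and disjoint.
MODULES = [
    (1, 21, "Leader peptide"),
    (29, 184, "CRD"),
    (260, 301, "EGF"),
    (302, 344, "EGF"),
    (345, 384, "EGF-Ca2+"),
    (385, 426, "EGF-Ca2+"),
    (427, 468, "EGF-Ca2+"),
    (469, 578, "STP"),
    (581, 605, "Transmembrane domain"),
    (606, 652, "Intracellular domain"),
]
_STARTS = [first for first, _, _ in MODULES]


def module_at(residue: int) -> Optional[str]:
    """Return the module containing `residue`, or None if it is unassigned."""
    # Index of the last module whose first residue is <= `residue`.
    i = bisect_right(_STARTS, residue) - 1
    if i >= 0 and residue <= MODULES[i][1]:
        return MODULES[i][2]
    return None


print(module_at(325))  # EGF (the potential N-glycosylation site)
print(module_at(590))  # Transmembrane domain
print(module_at(200))  # None (between the CRD and the first EGF module)
```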

Chromosomal location Unknown.

cDNA sequence^ AAAGCCCTCA CCCCTTGGGG TCCCGCAGAG GCTGCTGCTC CGTGGGGACC CCACTGCAAC CGTCCAGCGA CAAGTTCTGG GAAGGGCTTC GCTCCGGAAC GCTCCTTCCC CGGAAGTAAC GGCCCTGGGG CTTGGAGGCT CGAGACTCAG CAGCTCGGGC CCACCAGGAC CCGGCTGCTG TCGTGGGGGG CCAAGGGTAC CTCCCCCTGT TGGCTATGAG

GCCTTTGTGT CCCAGCTGGG GGCCACACAG CTGACCCAGC GCCTGCTACA CAGAACGGGG GTACTGGCCC ATTGGGCTCC AGCTGGGTGG TCGTGCATCT AACCGCCTGC ATTGAGGGCT GGCCCAGGTC GTGCCCTTTG AGTCATTATT CCCCTCTGTG TGCTTTGAAG GATGACCTGG GCCACGTGCG CAGCTGGACT GCCCAGGAGT CCGGGCGGTC

CCTTCTCTGC AGCCGAGATA AGACCGGGAT CCGGGGCGGG CGGCCCACTC GCAACCTGGC AGCTCCTGAG AGCGAGAGAA GCGGGGGGGA CCAAGCGCTG CCAAGTGGTC TCGTGTGCAA AGGTGACCTA CCTCTGCGGC TCGTGTGCAA TCAGCCCCAA GGGGGGATGG TGACCTGTGC TCCTGGGACC CGAGTCAGCT GTGTCAACAC CTGGAGAGGG

GCCGGAGTGG GAAGCTCCTG GGCCACCTCC GACGGGAGCT GGGCAAGCTG CACTGTGAAG GCGGGAGGCA GGGCAAGTGC GGACACGCCT TGTGTCTCTG TGAGGGCCCC GTTCAGCTTC CACCACCCCC CAATGTAGCC GGAGAAGGCC GTATGGCTGC CTCCTTCCTC CTCTCGAAAC CCATGGGAAA GGACTGTGTG CCCTGGGGGC GGCCTGTCAG

CTGCAGCTCA TCGCGGCTGG ATGGGCCTGC GACACGGAGG AGCGCTGCCG AGCAAGGAGG GCCCTGACGG CTGGACCCTA TACTCTAACT CTGCTGGACC TGTGGGAGCC AAAGGCATGT TTCCAGACCA TGTGGGGAAG CCCGATGTGT AACTTCAACA TGCGGCTGCC CCTTGCAGGT AACTACACGT GACGTGGATG TTCCGCTGCG GATGTGGATG

CCCCTCAGCT GCTTCTCGCC TGCTGCTGCT CGGTGGTCTG AGGGCCAGAA AGGCCCAGCA GGAGGATGAG GTCTGCCGCT GGCACAAGGA TGTCCCAGCC CAGGCTCCCC GCCGGCCTCT CCAGTTCCTC GTGACAAGGA TCGACTGGGG ATGGGGGCTG GACCAGGATT CCAGCCCATG GCCGCTGCCC AATGCCAGGA AATGGTGGGT AGTGTGCTCT

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320


cDNA sequence GGGTCGCTCG TGAGGAGGGC TGTGGGCCCG CTGTGGCTGC TGTGTCTCTG GAGCACCGTG GGCTACACCC ACTCAAGATG CGCCACAGCT AAACAACGAT GGCCATCCTA GAAGAGGGAG TCCAGAGCGA CTGCTGAAAG TGAACTCCCC CAAACAATTG TGTTTGATGT TCTATAATGA GGGTGTGAGG ATCTAAGAGG CCTAGGATGA TCAAAGGGAA AGCACAAGTC TAACCTCTTA CTTGGGTTTA CAGGTGTTTG CACAGATACT TGTGATCAAC CCTCAGACAC CGAGCTCAGA TGAACGGGAG TCATAGTCCA CACACCAAGT TTCCTTAAAA TTTTTACAGC TTGCAAATAT

continued

CCTTGCGCCC TACGTCCTGG GGGGGCCCCC CTGCCAGGCT GGACCACCAT CCCCGCGCTG ACCACAAGTA CTGGCCCCCA GCCTCTGGCC GGCACTGACG CTCCTGCTGG GAGAAGAAGG GCTGAGAGCA TGAGGTGGCC ATTCCAAAGG TAAGTCTCCT TCCTGAAGTG TTGTTACTCC AGGCTGGGGC AAAAGGTGAG AAACTAAATC CATGTTCGGA TTGCTAAATG GGTGGCAAGG TTTGCAAAGG TGAAGTCACA TGAATTAATT ACTAACAAGG CCTGCCTGTG CAGAGGAAGC ATGATGCACT CAGTTGATGC AGGGAGCTAG TTGGGGGTAA AAAAACTGCT TTCTCCCTAT

AGGGCTGCAC CCGGGGAGGA TCTGCGACAG GGGTGCTGGC CTGGGCCCCC CAACAGCCAG GACCTTCGCT GTGGGTCCTC CCCAGGAGCC GGCAAAAGCT CCCTGGCTCT AGAAGAAGCC GGGCCATGGA CTAGAGACAC GGCACCCACA CCTTAAAGGC GAAGCTGTGT CCCTCCCTTT TAAGGGGCTC TTGCTCATGC AATTAATTAT CTGGAAACAT TGATACTGTT AGGCAGGAAG AAGCTTGAAA TAATCTACGG CATCCAAATG AAACAAATTC GCCCCGCCTC CCTGCAGAAA GTGTTTTGAA AGCATCCTGA TCAGGCAGTT GGAGGGAAGG CAAAGCCATT GATAATGCAG

CAACACAGAT CGGGACTCAG CTTGTGCTTC CCCAAATGGG CGATGAGGAG TCCCACAAGG GTCATCTGAC AGGCGTCTGG TGCAGGTGGG GCTTTTATTC GGGGCTACTG CCAGAATGCG GAACCAGTAC TAGAGTCACC TTTTTTTGAA CCCTTGGAAC GTTGGCGTGC TCAAATTCCA CCCTGAATAT TGATTAGGAT TCAATTAGGT TTCTTTACAT GACATCCTCC TGCCTCTTTA AATATGAGAA GGCTAGGGCG TACTGAGGTT AAGGACAACC CACTTCATCC GTTCCATCAG AGTTGTCATT GATTTTAAAT TGCTTAAGGA AAGAGGGAAA TAAATTATAT TCGATAGTGT

GGCTCATTTC TGCCAGGACG AACACACAAG GTCTCTTGCA GACAAAGGAG GGCCCCGAGG GCCCCCATCA AGGGAGCCCA GACTCCTCCG TACATCCTAG GTCTATCGCA GCAGACAGTT AGTCCGACAC AGCCACCATC AGACTGGACT ATGCAGGTAT CACGGTGGGG ATGTGACCAA CTTCTCTGCT TGAAATGATT AAGAAGATCT TTGCATTCCT AGAATGGCCA GTTCTTACAT AAGTTGCTTG AGAGAGGCCA ACCACACACT TGTCTTTGAG TGCCCGGAAT GCTGTTTCCT TTAAAGCATT CCTGAAGTGT ACTTTTGTTC GAGATGACTA CCTCATTTTA

ACTGCTCCTG TGGATGAGTG GGTCCTTCCA CCATGGGGCC AGAAAGAAGG GCACCCCCAA CATCTGCCCC GCATCCATCA TGGCCACACA GCACCGTGGT AGCGGAGAGC ACTCCTGGGT CTGGGACAGA CTCAGAGCTT GGAATCTTAG TTTCTACGGG ATTTCGTGAC TTCCGGATCA CACTTCCACC TGTTTCTCTT GGTTTTTTGG CCATTTCGCC GAAGTGCAAT TTCTAATAGC AAGTGCATTA GGGATTTGTT TGACTACGGA CCAGGGCAGG GCCAGTGCTC AAAGGATGTG TTAGCACAGT GGGTGGCGCA TCTGTCTCTT ACTAAAATCA AAAGTTACAT

1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3240 3300 3360 3420

The probable methionine initiation codon (ATG) and the termination codon (TGA) are indicated.

Genomic structure The mouse gene has two exons separated by one intron*. PCR sizing of human genomic DNA indicates that the human gene has the same structure (unpublished observations).

Accession numbers Human^ Mouse

cDNA U94333 AF081789

Genomic AF074856


Deficiency None known.

Polymorphic variants None known.

References
1 Nepomuceno, R.R. et al. (1999) J. Immunol. 162, 3583-3589.
2 Nepomuceno, R.R. et al. (1997) Immunity 6, 119-129.
3 Guan, E. et al. (1991) J. Biol. Chem. 266, 20345-20355.
4 Guan, E. et al. (1994) J. Immunol. 152, 4005-4016.
5 Tenner, A.J. et al. (1995) Immunity 3, 485-493.
6 Nepomuceno, R.R. and Tenner, A.J. (1998) J. Immunol. 160, 1929-1935.
7 Lozada, C. et al. (1995) Proc. Natl Acad. Sci. USA 92, 8378-8382.
8 Kim, T.S. et al. (1999) In preparation.

C3a receptor


Robert S. Ames, Department of Molecular Biology, SmithKline Beecham Pharmaceuticals, King of Prussia, PA, USA

Other names C3a anaphylatoxin receptor, C3aR, AZ3B^.

Physicochemical properties C3a receptor is a G protein-coupled receptor of 482 amino acids, characterized by an unusually large second extracellular domain.
Mr (K) predicted 53.9
N-linked glycosylation sites 2 (9, 194)

Structure Transmembrane, G protein-coupled receptor protein with seven transmembrane domains.

Function C3a receptor is the cell surface receptor for the anaphylatoxin C3a, the 77-amino-acid N-terminal cleavage product of the α chain of C3, but not for C3a-desArg^2. C3a and C4a have been reported to act through the same receptor^; however, this appears to be a unique property of the guinea-pig C3aR, as C4a is not an agonist of the human or mouse C3aR'^'^.

Tissue distribution Transcript for the C3aR is widely distributed in peripheral tissues and the central nervous system^'*^'^. Using antibodies reactive with the second extracellular domain of the C3aR, expression has been demonstrated on neutrophils, monocytes, eosinophils, astrocytes, neurons and glial cells*-^^.

Regulation of expression Unknown.


Protein sequence^'^^^
MASFSAETNS TDLLSQPWNE PPVILSMVIL SLTFLLGLPG NGLVLWVAGL 50 TMR1
KMQRTVNTIW FLHLTLADLL CCLSLPFSLA HLALQGQWPY GRFLCKLIPS 100 TMR2
IIVLNMFASV FLLTAISLDR CLVVFKPIWC QNHRNVGMAC SICGCIWVVA 150 TMR3 TMR4
CVMCIPVFVY REIFTTDNHN RCGYKFGLSS SLDYPDFYGD PLENRSLENI 200
VQPPGEMNDR LDPSSFQTND HPWTVPTVFQ PQTFQRPSAD SLPRGSARLT 250
SQNLYSNVFK PADVVSPKIP SGFPIEDHET SPLDNSDAFL STHLKLFPSA 300
SSNSFYESEL PQGFQDYYNL GQFTDDDQVP TPLVAITITR LVVGFLLPSV 350 TMR5
IMIACYSFIV FRMQRGRFAK SQSKTFRVAV VVVAVFLVCW TPYHIFGVLS 400 TMR6
LLTDPETPLG KTLMSWDHVC IALASANSCF NPFLYALLGK DFRKKARQSI 450 TMR7
QGILEAAFSE ELTRSTHCPS NNVISERNST TV 482

The seven transmembrane domains (TMR1-7) are underlined and N-linked glycosylation sites are indicated (N).
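Potential N-linked sites such as residues 9 and 194 are conventionally recognized by the N-X-S/T sequon (X any residue except proline). The minimal Python sketch below scans fragments copied from the sequence rows, with their first residue numbers. `sequon_positions` is a hypothetical helper, and a motif that straddles the end of a fragment is missed. The scan also reports N478 in the C-terminal tail. This shows that a sequon is necessary but not sufficient for glycosylation: in a seven-transmembrane receptor with an extracellular N-terminus, the C-terminal tail lies in the cytoplasm, and only sites 9 and 194 are listed above.

```python
import re
from typing import List

# N-X-S/T, where X is any residue except proline. Only the Asn is consumed;
# the lookahead checks the next two residues, so overlapping candidates
# (e.g. "NNST") are each tested.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(fragment: str, first_residue: int = 1) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T sequon.

    `first_residue` is the sequence number of the fragment's first residue,
    so a fragment cut from the middle of a protein reports true positions.
    Whitespace (e.g. between 10-residue blocks) is ignored.
    """
    seq = "".join(fragment.split()).upper()
    return [m.start() + first_residue for m in SEQUON.finditer(seq)]


print(sequon_positions("MASFSAETNS TDLLSQPWNE"))              # [9]    residues 1-20
print(sequon_positions("PLENRSLENI", first_residue=191))     # [194]  residues 191-200
print(sequon_positions("NNVISERNST TV", first_residue=471))  # [478]  residues 471-482
```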


Chromosomal location Human^12: 12p13.

cDNA sequence^'^'^ CACGAGGAGA ACTGTGGCTA CAGCTACTGT CAACTGACCT TCAGCCTTAC TGAAGATGCA TCTGCTGCCT ACGGCAGGTT TCTTCCTGCT GTCAGAATCA CTTGTGTGAT ATAGATGTGG ATCCACTAGA GGTTAGATCC AACCTCAAAC CAAGTCAAAA CCAGTGGGTT TCTCTACTCA TACCACAAGG CAACACCCCT TTATCATGAT AGTCTCAGAG GGACTCCATA GGAAAACTCT TTAATCCCTT TTCAGGGAAT

ACAGAAGAAG AGTGTGGGGA CTCAGTTTTT ACTCTCACAG TTTTTTACTG GCGGACAGTG CTCCTTGCCC CCTATGCAAG TACTGCCATT TCGCAATGTA GTGCATTCCT CTACAAATTT AAACAGGTCT TTCCTCTTTC ATTTCAAAGA TCTGTATTCT TCCTATTGAA TTTAAAGCTG TTTCCAGGAT CGTGGCAATA AGCCTGTTAC CAAAACCTTT CCACATTTTT GATGTCCTGG CCTTTATGCC TCTGGAGGCA

AGAAAGCTCA CCAGACAGGA TGAAGTTTAG CCATGGAATG GGATTGCCAG AACACAATTT TTCTCGCTGG CTCATCCCCT AGCCTGGATC GGGATGGCCT GTGTTCGTGT GGTCTCTCCA CTTGAAAACA CAAACAAATG CCTTCTGCAG AATGTATTTA GATCACGAAA TTCCCTAGCG TATTACAATT ACGATCACTA AGCTTCATTG CGAGTGGCCG GGAGTCCTGT GATCATGTAT CTCTTGGGGA GCCTTCAGTG

GCAAATTTTC CTCGTGGAGA CAATGGCGTC AGCCCCCAGT GCAATGGGCT GGTTCCTCCA CTCACTTGGC CCATCATTGT GCTGTCTTGT GCTCTATCTG ACCGGGAAAT GCTCATTAGA TTGTTCAGCC ATCATCCTTG ATTCACTCCC AACCTGCTGA CCAGCCCACT CTTCTAGCAA TAGGCCAATT GGCTAGTGGT TCTTCCGAAT TGGTGGTGGT CATTGCTTAC GCATTGCTCT AAGATTTTAG AGGAGCTCAC

TTGCCATACT CATCCAGGTG TTTCTCTGCT AATTCTCTCC GGTGCTGTGG CCTCACCTTG TCTCCAGGGA CCTCAACATG GGTATTCAAG TGGATGTATC CTTCACTACA TTATCCAGAC GCCTGGAGAA GACAGTCCCC TAGGGGTTCT TGTGGTCTCA GGATAACTCT TTCCTTCTAC CACAGATGAC GGGTTTCCTG GCAAAGGGGC GGCTGTCTTT TGACCCAGAA AGCATCTGCC GAAGAAAGCA ACGTTCCACC

TCATGACTTC CTGAAGCCTT GAGACCAATT ATGGTCATTC GTGGCTGGCC GCGGACCTCC CAGTGGCCCT TTTGCCAGTG CCAATCTGGT TGGGTGGTGG GACAACCATA TTTTATGGAG ATGAATGATA ACTGTCTTCC GCTAGGTTAA CCTAAAATCC GATGCTTTTC GAGTCTGAGC GATCAAGTGC CTGCCCTCTG CGCTTCGCCA CTTGTCTGCT ACTCCCTTGG AATAGTTGCT AGGCAGTCCA CACTGTCCCT

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560


cDNA sequence CAAACAATGT AGCAGGGGCT AGCAGCGGAC CTATTGACAT AGACTTGCTG CGTTTCTGAT AACTAAGCTA GATATTTCCA

continued

CATTTCAGAA CTTAGGCAAT TTCAAAAACT CAGCATCACC AATCGGAATC TAATGCTAAA TGTGAAATAA TCATTAAATT

AGAAATAGTA CACATAGTGA GTCAAAGAAT TAGAAACTTG TCTGGGGGTT TGTAAGAATC GAGAAGCTAC TTTCCTTAGC

CAACTGTGTG AAGTTTATAA CAATCCAGCG TTAGAAATGC GGGACCCAGC ATTGTAAACA TTTGTTTTTA ATTGTCTAAG

AAAATGTGGA GAGGATGAAG GTTCTCAAAC AAATTCTCAA AAGGGCACTT TTAGTTCTAT AATGATGTTG TCAAAAAAAA

GCAGCCAACA 1620 TGATATGGTG 1680 GGTACACAGA 1740 GCCGCATCCC 1800 AACAAACCCC 1860 TTCTATCCCA 1920 AATATTTGTC 1980 AAAAAAAAAA 2040

The methionine initiation codon (ATG) and the termination codon (TGA) are indicated, and the first five nucleotides of exon 2 are underlined.

Genomic structure The C3aR gene comprises two exons separated by a single ~6.0 kb intron located 11 bp upstream of the ATG initiation codon^^.

Accession numbers Human^'^'7 Mouse^^'^^ Rat^5

Guinea-pig^

U62027 Z73157 U28488 U97357 U77460 U77461 U86379 AJ006402

Deficiency None known.

Polymorphic variants None known.

References
1 Roglic, A. et al. (1996) Biochim. Biophys. Acta 1305, 39-43.
2 Ember, J.A. et al. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 241-284.
3 Gorski, J.P. et al. (1979) Proc. Natl Acad. Sci. USA 76, 5299-5302.
4 Ames, R.S. et al. (1997) Immunopharmacology 38, 87-92.
5 Lienenklaus, S. et al. (1998) J. Immunol. 161, 2089-2093.


6 Ames, R.S. et al. (1996) J. Biol. Chem. 271, 20231-20234.
7 Crass, T. et al. (1996) Eur. J. Immunol. 26, 1944-1950.
8 Martin, U. et al. (1997) J. Exp. Med. 186, 199-207.
9 Hawlisch, H. et al. (1998) J. Immunol. 160, 2947-2958.
10 Gasque, P. et al. (1998) J. Immunol. 160, 3543-3554.
11 Davoust, N. et al. (1998) Glia 26, 201-211.
12 Paral, D. et al. (1998) Eur. J. Immunol. 28, 2417-2423.
13 Tornetta, M.A. et al. (1997) J. Immunol. 158, 5277-5282.
14 Hsu, M.H. et al. (1997) Immunogenetics 47, 64-72.
15 Fukuoka, Y. (1998) Biochem. Biophys. Res. Commun. 242, 663-668.

C5a receptor

CD88

Andreas Klos and Wilfried Bautsch, Medizinische Hochschule Hannover, Hannover, Germany

Physicochemical properties C5a receptor is synthesized as a single-chain molecule of 350 amino acids.
Mr (K) predicted 39.3^1,2; observed 43-48^ (HL-60 cells), 50-55^ (eosinophils)
N-linked glycosylation sites (occupied) 1 (5)^
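A predicted Mr such as the 39.3 K quoted here is calculated from the amino acid sequence alone, so it leaves out the carbohydrate on the occupied glycosylation site at N5. The minimal Python sketch below shows that calculation. It assumes standard average residue masses (amino-acid mass minus one water) and adds one water for the chain termini. `predicted_mr` is a hypothetical helper. The published figures may have been computed with a slightly different mass table, so small discrepancies are to be expected. Dividing the result for a full sequence by 1000 gives Mr in K.

```python
# Average residue masses in daltons (amino-acid mass minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def predicted_mr(sequence: str) -> float:
    """Sequence-only Mr in daltons: sum of residue masses plus one water.

    Ignores glycosylation and other post-translational modifications,
    and makes no correction for disulfide bonds.
    """
    residues = "".join(sequence.split()).upper()
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[r] for r in residues) + WATER
```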

Structure Integral membrane G protein-coupled receptor with seven transmembrane α helices and an extracellular N-terminus. Probable intramolecular cystine bridge between C109 (TMR3) and C188 (extracellular loop 2).

Function Cellular receptor for the complement-derived anaphylatoxins C5a and C5a-desArg74. It activates intracellular G proteins such as Giα2,3 (in vitro also Gα16^^), mediating chemotaxis, superoxide (O2−) generation, granule release (histamine, interleukins, leukotrienes and enzymes) and upregulation of adhesion molecules (for a review see ref. 10).

Tissue distribution Expressed on myeloid-derived cells and cell lines (granulocytes, monocytes and the monocyte-derived cell lines U-937, HL-60 and THP-1, mast cells, dendritic cells)^'^^-^^ and on non-myeloid cells (vascular smooth muscle, endothelia, epithelia, glial cells)^^'^^. Caution: monoclonal antibodies show immunological cross-reactivity with keratinocytes^^.

Regulation of expression Upregulated in vitro in the U-937 and HL-60 cell lines by dibutyryl-cAMP, dimethyl sulfoxide, 1,25-dihydroxyvitamin D in combination with prostaglandin E2, phorbol esters and IFN-γ^^'^'^; upregulated in vivo in inflamed brain tissue^*.


Protein sequence (human)^'^
MNSFNYTTPD YGHYDDKDTL DLNTPVDKTS NTLRVPDILA LVIFAVVFLV 50 TMR1
GVLGNALVVW VTAFEAKRTI NAIWFLNLAV ADFLSCLALP ILFTSIVQHH 100 TMR2
HWPFGGAACS ILPSLILLNM YASILLLATI SADRFLLVFK PIWCQNFRGA 150 TMR3
GLAWIACAVA WGLALLLTIP SFLYRVVREE YFPPKVLCGV DYSHDKRRER 200 TMR4
AVAIVRLVLG FLWPLLTLTI CYTFILLRTW SRRATRSTKT LKVVVAVVAS 250 TMR5
FFIFWLPYQV TGIMMSFLEP SSPTFLLLNK LDSLCVSFAY INCCINPIIY 300 TMR6 TMR7
VVAGQGFQGR LRKSLPSLLR NVLTEESVVR ESKSFTRSTV DTMAQKTQAV 350

The N-linked glycosylation site (occupied^) is indicated (N) and the seven putative transmembrane regions (TMR1-7) are underlined. The ligand-binding sites are at the N-terminus (residues 21-30)^21-24, E199^25 and R206^26. The serine phosphorylation sites^^-^^ are marked (*) and italicized.

Chromosomal location Human: 19q13.3-13.4^«.

cDNA sequence (human)^ AGGGACCTTC ACCCCTGATT AAAACTTCTA TTCCTGGTGG CGGACCATCA GCGCTGCCCA GCCTGCAGCA GCCACCATCA CGAGGGGCCG ACCATACCCT TGTGGCGTGG GTCCTGGGCT CGGACGTGGA GTGGCCAGTT CTGGAGCCAT TTTGCCTACA CAGGGCCGAC GTGGTTAGGG CAGGCAGTGT CCATTCTCCC CTCTCCTCCA TCATCCTTCC CCCCCCCCCA ATCTGGGATA GAAAGATTCT GAATCTCAAA

GATCCTCGGG ATGGGCACTA ACACGCTGCG GAGTGCTGGG ATGCCATCTG TCTTGTTCAC TCCTGCCCTC GCGCCGACCG GCTTGGCCTG CCTTCCTGTA ACTACAGCCA TCCTGTGGCC GCCGCAGGGC TCTTTATCTT CGTCACCCAC TCAACTGCTG TGCGGAAATC AGAGCAAGTC AGGCGACAGC TCTTGTTTTC TGTTGCCTGT TCATTTGCAA CACACCATCT TTTCCATATG CGCTTAAAAA AGTTCTTTGG

GAGCCCAGGA TGATGACAAG TGTTCCAGAC CAATGCCCTG GTTCCTCAAC GTCCATTGTA CCTCATCCTG CTTTCTGCTG GATCGCCTGT CCGGGTGGTC CGACAAACGG TCTACTCACG CACGCGGTCC CTGGTTGCCC CTTCCTGCTG CATCAACCCC CCTCCCCAGC ATTCACGCGC CTCATGGGCC ACTTCACTTT CTTTCCCAGA GGTGAACACT TTCCATCCCA GCAATAGGTG AATGTATTTA GACAAAACAG

GACCAGAACA GATACCCTGG ATCCTGGCCT GTGGTCTGGG TTGGCGGTAG CAGCATCACC CTCAACATGT GTGTTTAAAC GCCGTGGCTT CGGGAGGAGT CGGGAGCGAG CTCACGATTT ACCAAGACAC TACCAGGTGA CTGAATAAGC ATCATCTACG CTCCTCCGGA TCCACAGTGG ACTGTGGCCC TCGTGGGATG CTTGTCCCTC TCCTTCTAGG GGCTTTTGAA TGAACAGGGA TTTTATGGCA AAGTCCATGG

TGAACTCCTT ACCTCAACAC TGGTCATCTT TGACGGCATT CCGACTTCCT ACTGGCCCTT ACGCCAGCAT CCATCTGGTG GGGGTTTAGC ACTTTCCACC CCGTGGCCAT GTTACACTTT TCAAGGTGGT CGGGGATAAT TGGACTCCCT TGGTGGCCGG ACGTGTTGAC ACACTATGGC GATGTCCCCT GTGTTACCTT CTTTTCCAGC GAGCACCCTC AAACAAACAG ACTCAGAATA AGTTGGAAAA AGTTATCTAA

CAATTATACC CCCTGTGGAT TGCAGTCGTC CGAGGCCAAG CTCCTGCCTG TGGCGGGGCC CCTGCTCCTG CCAGAACTTC CCTGCTGCTG AAAGGTGTTG CGTCCGGCTG CATCCTGCTC GGTGGCAGTG GATGTCCTTC GTGTGTCTCC CCAGGGCTTC TGAAGAGTCC CCAGAAGACC TCCTTCCCGG AGCTAACTAA GGGACTCTTC CCACCCCCCA AAACCCGTGT CAGACT^GTA TATGTAACTG GCTCTTGTAA

60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560


cDNA sequence GTGAGTTAAT AACTTTGGGA CAGCATGGTG GTGCCTGTAA TGGAGGTTGT CTCTGTCTCA TTTGTACTTT TGTAAGTAAT ATCTTGCAAA ACAGGACATT CCCAGCCGTG CATTTCAAGA AAAAAGTATA GAG

continued

TTAAAAAAGA GGCTAAGGTG AAACCCCGTC TCCCAGCTAC GGTGAGCCAT AAAGCAAAGC GTTTTTAAAT GATACAGAGG ACTACAATGT CTCATCACCA TCCCTAACCC ATGTTATTCA CATGACTTTA

AAATTAGGCT GGTGGATCAC TGTACTAAAA TTGGGAGGCT GATCGCACCA AAAAACAAAA TATGCTTTCT GATCTTGTGT AGTCTCATAA CAGGGATCCC CTGGCAACCA ATGGAATCAT ATGAGGAAAA

GAGAGCAGTG CTGAGGTCAA ATACAAAAAA GAGGTGGGAG CTGCACTCTA ACAAAAACAC ATTTTGAGAT ACCCTTCACC CCAGGATATT CAGGATGCCC GGAATCCACT ATAGTATGTA TAAAAATGAA

GCTCACGCCT GAGTTCCAGA TTAACTGGGC AATTGCTCGA GCCTGGGTGA CTAAAAAACC CATTGCAAAC CAGCCTCCCC GACATTGATA ACTTCCCTCC CTCCATTTCT ACCTGTTTTG TATTGAAAAA

GTAATCCCAG CCAGGCTGGC ATGGTAGTGG ACCTTGGAGG CCGAGGGAGG TGCAGTTTTG TCAACACAAT CAATGGCAAC CAGTGAAGAT ACCCCCACAC ATAATGTTGT AGCTTAAAAA AAAAACTTTA

1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340

The initiation codon (ATG), termination codon (TAG) and polyadenylation signal (AATAAA) are indicated. The first five nucleotides of each exon are underlined (exon 2 starts at nucleotide 43, immediately after the initiation codon). Two nucleotides in the 5' UTR of the cDNA sequence differ from the reported sequence of exon 1: AG at positions -23/-24 in the cDNA sequence versus TC in the genomic sequence^^.

Genomic structure Two exons separated by a ~9 kb intron located between codons 1 and 2^«.
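Two statements describe the same exon boundary in different terms: exon 2 begins at nucleotide 43, immediately after the initiation codon, and the intron lies between codons 1 and 2. The minimal Python sketch below converts between codon numbers and cDNA positions. It assumes that nucleotide 43 is counted from the first base of the printed cDNA; the -23/-24 positions quoted in the same note appear to use a different, ATG-relative numbering. `codon_span` is a hypothetical helper.

```python
from typing import Tuple


def codon_span(codon: int, atg_first_nt: int) -> Tuple[int, int]:
    """1-based (first, last) cDNA positions of a codon, counting the ATG as codon 1."""
    if codon < 1:
        raise ValueError("codons are numbered from 1")
    first = atg_first_nt + 3 * (codon - 1)
    return first, first + 2


# Exon 2 is said to start at nucleotide 43, immediately after the ATG,
# so in that numbering the ATG occupies nucleotides 40-42.
ATG_FIRST_NT = 43 - 3

print(codon_span(1, ATG_FIRST_NT))  # (40, 42): initiation codon, still in exon 1
print(codon_span(2, ATG_FIRST_NT))  # (43, 45): first codon of exon 2
```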


Accession numbers
Human^1,2 X57250, M62505
Rat^31,32 Y09613, AB003042
Mouse^33 S46665
Guinea-pig^34 U86103
Canine^35 X65860
Chimpanzee^36 X97730
Orang-utan^36 X97732
Rhesus monkey^36 X97731
Lowland gorilla^36 X97733
The bovine sequence is also published^^.

Deficiency Knockout mice show defects in mucosal defence and an altered Arthus reaction^37-39.



Polymorphic variants None known.

References
1 Boulay, F. et al. (1991) Biochemistry 30, 2993-2999.
2 Gerard, N.P. and Gerard, C. (1991) Nature 349, 614-617.
3 Tardif, M. et al. (1993) J. Immunol. 150, 3534-3545.
4 Gerard, N.P. et al. (1989) J. Biol. Chem. 264, 1760-1766.
5 Pease, J.E. and Barker, M.D. (1993) Biochem. Mol. Biol. Int. 31, 719-726.
6 Rollins, T.E. et al. (1991) Proc. Natl Acad. Sci. USA 88, 971-975.
7 Offermans, S. et al. (1990) FEBS Lett. 260, 14-18.
8 Amatruda, T.T. et al. (1993) J. Biol. Chem. 268, 10139-10144.
9 Buhl, A.M. et al. (1993) FEBS Lett. 323, 132-134.
10 Ember, J.A. et al. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 241-284.
11 Chenoweth, D.E. and Hugli, T.E. (1978) Proc. Natl Acad. Sci. USA 75, 3943-3947.
12 Van-Epps, D.E. and Chenoweth, D.E. (1984) J. Immunol. 132, 2862-2867.
13 Dahinden, C.A. et al. (1991) Int. Arch. Allergy Appl. Immunol. 94, 161-164.
14 Hartmann, K. et al. (1997) Blood 89, 2863-2870.
15 Burg, M. et al. (1996) J. Immunol. 157, 5574-5581.
16 Sozzani, S. et al. (1995) J. Immunol. 155, 3292-3295.
17 Haviland, D.L. et al. (1995) J. Immunol. 154, 1861-1869.
18 Gasque, P. et al. (1997) Am. J. Pathol. 150, 31-41.
19 Werfel, T. et al. (1996) J. Immunol. 157, 1729-1735.
20 Rubin, J. et al. (1988) Endocrinology 123, 2424-2431.
21 Oppermann, M. et al. (1993) J. Immunol. 151, 3785-3794.
22 Mery, L. and Boulay, F. (1994) J. Biol. Chem. 269, 3457-3463.
23 Siciliano, S.J. et al. (1994) Proc. Natl Acad. Sci. USA 91, 1214-1218.
24 Chen, Z. et al. (1998) J. Biol. Chem. 273, 10411-10419.
25 Monk, P.N. et al. (1995) J. Biol. Chem. 270, 16625-16629.
26 DeMartino, J.A. et al. (1995) J. Biol. Chem. 270, 15966-15969.
27 Giannini, E. et al. (1995) J. Biol. Chem. 270, 19166-19172.

28 Giannini, E. and Boulay, F. (1995) J. Immunol. 154, 4055-4064.
29 Bock, D. et al. (1997) Eur. J. Immunol. 27, 1522-1529.
30 Gerard, N.P. et al. (1993) Biochemistry 32, 1243-1250.
31 Rothermel, E. et al. (1997) Mol. Immunol. 34, 877-886.
32 Akatsu, H. et al. (1997) Microbiol. Immunol. 41, 575-580.
33 Gerard, C. et al. (1992) J. Immunol. 149, 2600-2606.
34 Fukuoka, Y. et al. (1998) Int. Immunol. 10, 275-283.
35 Perret, J.J. et al. (1992) Biochem. J. 288, 911-917.
36 Alvarez, V. et al. (1996) Immunogenetics 44, 446-452.
37 Hopken, U.E. et al. (1996) Nature 383, 86-89.
38 Bozic, C.R. et al. (1996) Science 273, 1722-1725.
39 Hopken, U.E. et al. (1997) J. Exp. Med. 186, 749-756.

CR3

Yu Xia and Gordon D. Ross, Department of Pathology, University of Louisville, Louisville, KY, USA

Other names Complement receptor type 3, Mac-1, Mo1, OKM-1, C3bi receptor, CD11b/CD18, αMβ2-integrin, LeuCAM, leukocyte integrin.

Physicochemical properties CR3 consists of two non-covalently associated subunits encoded by two genes. Both the α and β subunits are type I membrane glycoproteins, with leader sequences of 16 and 22 amino acids respectively.

Subunit             Amino acids   Mr (K) predicted   Mr (K) observed   N-linked glycosylation sites
α subunit (CD11b)   17-1153       125.6              160               19 (86, 240, 391, 469, 693, 697, 735, 802, 881, 901, 912, 941, 947, 919, 994, 1022, 1045, 1051, 1076)
β subunit (CD18)    23-769        82.6               95                6 (50, 116, 212, 254, 501, 642)

Structure CR3 is one of four members of the β2-integrin family, in which a common β subunit (β2-integrin or CD18) is linked non-covalently to one of four α subunits to form a membrane glycoprotein heterodimer. There are no disulfide links between the α and β subunits. The I-domain of CD11b adopts a classic Rossmann α/β fold, with seven hydrophilic α helices surrounding a central hydrophobic β sheet of five parallel strands and one short antiparallel strand. It contains an unusual Mg2+/Mn2+ coordination site on its surface at the top of the β sheet. This metal-binding site, named the metal ion-dependent adhesion site (MIDAS), plays a direct role in protein ligand binding. Some studies have suggested that an I-domain-like structure in CD18 contributes part of the MIDAS and increases its affinity for protein ligands^'^. The C-terminal domain of CD11b contains an unusual lectin site. This site is responsible both for cytotoxic recognition of exogenous polysaccharides on microbial pathogen cell walls (e.g. β-glucan, β-oligomannan, N-acetyl-D-glucosamine)^ and for CR3 complex formation with various endogenous membrane glycoproteins (e.g. CD14, CD16, CD59, CD87)^'^. Molecular mapping studies have localized the lectin site only broadly, to a region of CD11b C-terminal to the I-domain and the divalent cation-binding repeats. The lectin site is unusual because this region of CD11b contains no C-type lectin consensus sequences, and its lectin activity does not require divalent cations^'^.

Function CR3 has several seemingly unrelated functions that can be divided into intercellular adhesion-related events and cytotoxic receptor functions. CR3 and LFA-1 are the major adhesion molecules used by phagocytes for directed migration through the vascular endothelium into sites of inflammation*'^. As a cytotoxic receptor, phagocyte CR3 stimulates phagocytosis, a respiratory burst and degranulation when it binds to iC3b-opsonized bacteria or yeast. However, when CR3 binds instead to iC3b-opsonized erythrocytes or tumour cells, no response is stimulated and only adhesion occurs. Such iC3b-mediated adhesion occurs with native CR3 on resting cells through a low-affinity MIDAS within the I-domain of CD11b. Stimulation of CR3 for ingestion, cytotoxic degranulation or cytokine secretion requires additional ligation of its lectin site by polysaccharides present on microbial surfaces. Natural killer (NK) cell CR3 functions similarly to phagocyte CR3 in requiring dual ligation of the lectin site and the I-domain to trigger extracellular cytotoxicity^. NK cells use CR3 for recognition and cytotoxicity of yeast hyphae^^. The lectin site of CR3 also promotes the formation of transmembrane signalling complexes between CR3 and endogenous membrane glycoproteins that are attached only by glycosylphosphatidylinositol anchors and thus have no other mechanism for signalling^. This allows CD14 and CD16 to prime and trigger neutrophil functions in a manner similar to exogenous microbial polysaccharides. In addition, the lectin site-dependent complex between CR3 and CD87 has been shown to be essential for development of the high-affinity MIDAS and for neutrophil adhesion to endothelial cell ICAM-1^. This suggests that signalling through the lectin site may be essential for the full range of adhesion and cytotoxic functions mediated by CR3.
Protein tyrosine kinases (PTKs) play an important role in CR3 priming and activation for cytotoxicity, phagocytosis, respiratory burst and firm adhesion. Four different PTKs (fyn, lyn, hck and fgr) have been implicated in CR3 signalling^12-14. These kinases have been found in association with CR3 in large complexes that also included LFA-1 and the urokinase plasminogen activator receptor (uPA-R or CD87)^^. Several tyrosine-phosphorylated targets have been identified, including paxillin^^, Vav and Vavp21(ras)^^. CR3 signalling also involves activation of phospholipase A2^^'^^ and phosphatidylinositol 3-kinase^2^.

[Figure: domain organization of CR3. α subunit (CD11b): I-domain and C-terminal domain. β subunit (CD18): I-domain-like region and cysteine-rich repeat region (repeats 1-4).]

Tissue distribution CR3 is expressed on all myeloid lineage cells, but the amount of CR3 is diminished as monocytes mature into macrophages or dendritic cells, and may be undetectable on terminally differentiated cells. Among lymphoid cells, CR3 is expressed on the majority of NK cells but is restricted on B cells to the CD5+ subset. CR3 is undetectable on the majority of resting T cells.

Regulation of expression In monocytes, neutrophils, eosinophils and NK cells, most CR3 is stored in cytoplasmic granules²¹,²², and various cellular activation events cause a rapid mobilization of the granule pool of CR3 to the membrane surface, giving a 3-10-fold increase in surface CR3. There are two major transcriptional start sites in the CD11b gene, located 90 bp and 54 bp upstream of the translation initiation (methionine) codon. The CD18 gene has multiple transcriptional start sites spread over a region of ~45 nucleotides, with one or two major initiation sites accounting for >50% of the transcripts. Transcription of CD11b and CD18 is coordinately regulated and is hormonally induced by retinoic acid.

Protein sequences α subunit MALRVLLLTA PQEIVAANQR QLLACGPTVH IAFLIDGSGS TFKEFQNNPN KILWITDGE NTIASKPPRD SQEGFSAAIT DAYLGYAAAI TQIGAYFGAS RARWQCDAVL GAVYLFHGTS DLTVGAQGHV EVRVCLHVQK RQTQVLGLTQ PVLAEDAQRL FNVTVTVRND ASSTEVSGAL NVTSENNMPR VMQHQYQVSN TKERLPSHSD NLSFDWYIKT FEVPNPLPLI EPQ

LTLCHGFNLD GSLYQCDYST QTCSENTYVK IIPHDFRRMK PRSLVKPITQ KFGDPLGYED HVFQVNNFEA SNGPLLSTVG ILRNRVQSLV LCSVDVDSNG YGEQGQPWGR GSGISPSHSQ LLLRSQPVLR STRDRLREGQ TCETLKLQLP FTALFPFEKN GEDSYRTQVT KSTSCSINHP TNKTEFQLEL LGQRSLPISL FLAELRKAPV SHNHLLIVST VGSSVGGLLL

TENAMTFQEN GSCEPIRLQV GLCFLFGSNL EFVSTVMEQL LLGRTHTATG VIPEADREGV LKTIQNQLRE SYDWAGGVFL LGAPRYQHIG STDLVLIGAP FGAALTVLGD RIAGSKLSPR VKAIMEFNPR IQSWTYDLA NCIEDPVSPI CGNDNICQDD FFFPLDLSYR IFPENSEVTF PVKYAVYMW VFLVPVRLNQ VNCSIAVCQR AEILFNDSVF LALITAALYK

ARGFGQSWQ PVEAVNMSLG RQQPQKFPEA KKSKTLFSLM IRKWRELFN IRYVIGVGDA KIFAIEGTQT YTSKEKSTFI LVAMFRQNTG HYYEQTRGGQ VNGDKLTDVA LQYFGQSLSG EVARNVFECN LDSGRPHSRA VLRLNFSLVG LSITFSFMSL KVSTLQNQRS NITFDVDSKA TSHGVSTKYL TVIWDRPQVT IQCDIPFFGI TLLPGQGAFV LGFFKRQYKD

LQGSRVWGA LSLAATTSPP LRGCPQEDSD QYSEEFRIHF ITNGARKNAF FRSEKSRQEL GSSSSFEHEM NMTRVDSDMN MWESNANVKG VSVCPLPRGQ IGAPGEEDNR GQDLTMDGLV DQWKGKEAG VFNETKNSTR TPLSAFGNLR DCLWGGPRE QRSWRLACES SLGNKLLLKA NFTASENTSR FSENLSSTCH QEEFNATLKG RSQTETKVEP MMSEGGPPGA

β subunit

MLGLRPPLLA LVGLLSLGCV LSQECTKFKV SSCRECIESG PGCTWCQKLN

FTGPGDPDSI RCDTRPQLLM RGCAADDIMD PTSLAETQED HNGGQKQLSP

QKVTLYLRPG GDLLRALNEI PFAFRHVLKL GWRNVTRLLV YPSVGQLAHK NWHLIKNAY DGVQINVPIT DQSRDRSLCH NNSIICSGLG GPGRGLCFCG CECHSGYQLP QLSNNPVKGR IAAIVGGTVA NPLFKSATTT

QAAAFNTVTFR TESGRIGFGS TNNSNQFQTE FATDDGFHFA LAENNIQPIF NKLSSRVFLD FQVKVTATEC GKGFLECGIC DCVCGQCLCH KCRCHPGFEG LCQECPGCPS TCKERDSEGC GIVLIGILLL VMNPKFAES

RAKGYPIDLY FVDKTVLPFV VGKQLISGNL GDGKLGAILT AVTSRMVKTY HNALPDTLKV IQEQSFVIRA RCDTGYIGKN TSDVPGKLIY SACQCERTTE PCGKYISCAE WVAYTLEQQD VIWKALIHLS

YLMDLSYSML NTHPDKLRNP DAPEGGLDAM PNDGRCHLED EKLTEIIPKS TYDSFCSNGV LGFTDIVTVQ CECQTQGRSS GQYCECDTIN GCLNPRRVEC CLKFEKGPFG GMDRYLIYVD DLREYRRFEK

DDLRNVKKLG CPNKEKECQP MQVAACPEEI NLYKRSNEFD AVGELSEDSS THRNQPRGDC VLPQCECRCR QELEGSCRKD CERYNGQVCG SGRGRCRCNV KNCSAACPGL ESRECVAGPN EKLKSQWNND

The leader sequences are underlined and the putative N-linked glycosylation sites are indicated (N).

Protein modules
α subunit
1-16        Leader peptide                    exon 1/2
167-353     I-domain                          exon 6-9
453-614     Divalent cation-binding region    exon 13-15
1109-1134   Transmembrane domain              exon 29/30
1135-1153   Cytoplasmic domain                exon 30

β subunit²⁴,²⁶
1-22        Leader peptide                    exon 2/3
23-700      Extracellular domain              exon 3-15
701-723     Transmembrane domain              exon 15
724-769     Cytoplasmic domain                exon 15/16

The region from C445 to C631 is cysteine rich, with four tandem repeats of an eight-cysteine motif.
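As a quick consistency check on the protein-module table, the hypothetical helpers below compute the length of each inclusive residue range and test whether the ranges tile a chain without gaps or overlaps. This is a sketch that uses only the numbers in the table; the CD11b ranges are not expected to tile the chain, because only selected modules are listed for that subunit.

```python
# Inclusive residue ranges transcribed from the protein-module table
CD11B_MODULES = {
    "leader peptide": (1, 16),
    "I-domain": (167, 353),
    "divalent cation-binding region": (453, 614),
    "transmembrane domain": (1109, 1134),
    "cytoplasmic domain": (1135, 1153),
}
CD18_MODULES = {
    "leader peptide": (1, 22),
    "extracellular domain": (23, 700),
    "transmembrane domain": (701, 723),
    "cytoplasmic domain": (724, 769),
}


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


def tiles_chain(modules: dict[str, tuple[int, int]]) -> bool:
    """True if the ranges, sorted by start, abut with no gaps or overlaps."""
    ranges = sorted(modules.values())
    return all(a_end + 1 == b_start
               for (_, a_end), (b_start, _) in zip(ranges, ranges[1:]))


for label, modules in (("CD11b", CD11B_MODULES), ("CD18", CD18_MODULES)):
    lengths = {name: span_length(*span) for name, span in modules.items()}
    print(label, lengths, "tiles chain:", tiles_chain(modules))

print("CD18 cysteine-rich region:", span_length(445, 631), "residues")
```

For CD18 the four modules add up to the full 769-residue precursor (22 + 678 + 23 + 46), so the β subunit table accounts for every residue.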

Chromosomal location α subunit: 16p11-p13.1 (CD11a, CD11b and CD11c are clustered between bands p11 and p13.1 on chromosome 16; CD11d is arranged in tandem with CD11c, separated from it by more than 11.5 kb). β subunit: human²⁹,³⁰ 21q22.3 (gene order: telomere ... PFKL ... CD18 ... CRYA1 ... centromere); mouse, chromosome 10.

cDNA sequences a subunit^^ GAATTCCGTG CTCCTTCCAG TTCAACTTGG AGCGTGGTCC GCCAACCAAA CGCCTGCAGG ACCAGCCCCC ACGTATGTGA TTCCCAGAGG GGCTCTGGTA ATGGAGCAAT CGGATTCACT CCAATAACGC GAGCTGTTTA ACGGATGGAG AGAGAGGGAG CGCCAAGAGC AACTTTGAGG GGTACTCAGA GCTGCCATCA GGAGTCTTTC TCAGACATGA CAAAGCCTGG CAGAACACTG TTCGGGGGCT ATCGGGGCCC CCCAGGGGGC CCCTGGGGCC ACGGACGTGG CACGGAACCT CTCTCTCCCA GATGGACTGG CCAGTACTGA TTTGAGTGTA CATGTCCAGA TATGACCTGG AACAGCACAC CTACAGTTGC TCTCTGGTGG GCTCAGAGAC TGCCAGGATG GGGCCCCGGG ACACAGGTCA AACCAGCGCT TCTGGGGCCT GAGGTCACCT CTCCTCAAGG CAACTGGAGC ACTAAATATC CAGGTCAGCA CGGCTGAACC AGTACGTGCC AAGGCCCCCG

GTTCCTCAGT CCATGGCTCT ACACTGAAAA AGCTTCAGGG GGGGCAGCCT TCCCCGTGGA CTCAGCTGCT AAGGGCTCTG CCCTCCGAGG GCATCATCCC TAAAAAAGTC TTACCTTCAA AGCTGCTTGG ACATCACCAA AAAAGTTTGG TCATTCGCTA TTAATACCAT CTCTGAAGAC CAGGAAGTAG CCTCTAATGG TATATACATC ATGATGCTTA TTCTGGGGGC GCATGTGGGA CCCTCTGCTC CCCATTACTA AGAGGGCTCG GCTTTGGGGC CCATTGGGGC CAGGATCTGG GGCTCCAGTA TAGACCTGAC GAGTCAAGGC ATGATCAGGT AGAGCACACG CTCTGGACTC GCAGACAGAC CGAATTGCAT GAACGCCATT TCTTCACAGC ACCTCAGCAT AGTTCAACGT CCTTCTTCTT CACAGCGATC TGAAGAGCAC TTAATATCAC CCAATGTGAC TGCCGGTGAA TCAACTTCAC ACCTGGGGCA AGACTGTCAT ACACCAAGGA TGGTGAACTG

GGTGCCTGCA CAGAGTCCTT CGCAATGACC ATCCAGGGTG CTACCAGTGC GGCCGTGAAC GGCCTGTGGT CTTCCTGTTT GTGTCCTCAA ACATGACTTT CAAAACCTTG AGAGTTCCAG GCGGACACAC CGGAGCCCGA CGATCCCTTG CGTCATTGGG CGCATCCAAG CATTCAGAAC CAGCTCCTTT CCCCTTGCTG AAAGGAGAAA CTTGGGTTAT ACCTCGATAT GTCCAACGCT CGTGGACGTG CGAGCAGACC GTGGCAGTGT AGCCCTAACA CCCAGGAGAG CATCAGCCCC TTTTGGTCAG TGTAGGAGCC AATCATGGAG GGTGAAAGGC GGATCGGCTA CGGCCGCCCA ACAGGTCTTG CGAGGACCCA GTCTGCTTTC CTTGTTTCCC CACCTTCAGT GACAGTGACT CCCGCTTGAC CTGGCGCCTG CAGCTGCAGC GTTTGATGTA CAGTGAGAAC ATATGCTGTC GGCCTCAGAG GAGGAGCCTC ATGGGACCGC GCGCTTGCCC CTCCATCGCT

ACCCCTGGTT CTGTTAACAG TTCCAAGAGA GTGGTTGGAG GACTACAGCA ATGTCCCTGG CCCACCGTGC GGATCCAACC GAGGATAGTG CGGCGGATGA TTCTCTTTGA AACAACCCTA ACGGCCACGG AAGAATGCCT GGATATGAGG GTGGGAGATG CCGCCTCGTG CAGCTTCGGG GAGCATGAGA AGCACTGTGG AGCACCTTCA GCTGCCGCCA CAGCACATCG AATGTCAAGG GACAGCAACG CGAGGGGGCC GATGCTGTTC GTGCTGGGGG GAGGACAACC TCCCATAGCC TCACTGAGTG CAGGGGCACG TTCAATCCCA AAGGAAGCCG AGAGAAGGAC CATTCCCGCG GGGCTGACCC GTGAGCCCCA GGGAACCTCC TTTGAGAAGA TTCATGAGCC GTGAGAAATG CTGTCCTACC GCCTGTGAGT ATAAACCACC GACTCTAAGG AACATGCCCA TACATGGTGG AATACCAGTC CCCATCAGCC CCCCAGGTCA TCTCACTCCG GTCTGCCAGA

CACCTCCTTC CCTTGACCTT ACGCAAGGGG CCCCCCAGGA CAGGCTCATG GCCTGTCCCT ACCAGACTTG TACGGCAGCA ACATTGCCTT AGGAGTTTGT TGCAGTACTC ACCCAAGATC GCATCCGCAA TTAAGATCCT ATGTCATCCC CCTTCCGCAG ATCACGTGTT AGAAGATCTT TGTCTCAGGA GGAGCTATGA TCAACATGAC TCATCTTACG GCCTGGTAGC GCACCCAGAT GCAGCACCGA AGGTGTCCGT TCTACGGGGA ACGTAAATGG GGGGTGCTGT AGCGGATAGC GGGGCCAGGA TGCTGCTGCT GGGAAGTGGC GAGAGGTCAG AGATCCAGAG CCGTCTTCAA AGACTTGTGA TTGTGCTGCG GGCCAGTGCT ATTGTGGCAA TGGACTGCCT ATGGTGAGGA GGAAGGTGTC CTGCCTCCTC CCATCTTCCC CTTCCCTTGG GAACCAACAA TCACCAGCCA GGGTCATGCA TGGTGTTCTT CCTTCTCCGA ACTTTCTGGC GAATCCAGTG

CAGGTTCTGG ATGTCATGGG CTTCGGGCAG GATAGTGGCT CGAGCCCATC GGCAGCCACC CAGTGAGAAC GCCCCAGAAG CTTGATTGAT CTCAACTGTG TGAAGAATTC ACTGGTGAAG AGTGGTACGA AGTTGTCATC TGAGGCAGAC TGAGAAATCC CCAGGTGAAT TGCGATCGAG AGGCTTCAGC CTGGGCTGGT CAGAGTGGAT GAACCGGGTG GATGTTCAGG CGGCGCCTAC CCTGGTCCTC GTGCCCCTTG GCAGGGCCAA GGACAAGCTG TTACCTGTTT AGGCTCCAAG CCTCACAATG CAGGTCCCAG AAGGAATGTA AGTCTGCCTC TGTTGTGACT TGAGACAAAG GACCCTGAAA CCTGAACTTC GGCGGAGGAT TGACAACATC CGTGGTGGGT CTCCTACAGG CACACTCCAG CACCGAAGTG GGAAAACTCA AAACAAACTG AACCGAATTC TGGGGTCTCC GCATCAATAT GGTGCCCGTC GAACCTCTCG TGAGCTTCGG TGACATCCCG


cDNA sequences TTCTTTGGCA TACATCAAGA GATTCCGTGT AAAGTGGAGC GGACTGCTGC CAATACAAGG CTTCCCGACA CCAGGCTGCT TTTGTGTGTG AGTGTGTGCA CAAGTATGTG TGTGCGAGTG TCTCTGGCGT GCTCCCTTGT CCTGTGGGTG CCCGTCGCCT AGAAAAGCCG TGCCCACTGA TAATTTTTTG AGTGAAAAGT TTTTGAGGTT ACCACACACA CTGTATCTTG TACTTTTTCA ATTTAACCAG AATAAATCAA

TCCAGGAAGA CCTCGCATAA TCACCCTGCT CGTTCGAGGT TCCTGGCCCT ACATGATGAG GAGCTGCCTC GGACACGTCG TGCAAGTGTG CGTGTGCGTG AGTGTGTGCA TGTGCATGTG GTGGGTAGGT GCGTGGGTAA AAGAGAGAGG GCGAGCCTGC TGGGTGGAAC GGAATCATGA GATGGATAAG CTCCCTTTCC TCCTTCAGAC TACACACACA CTTTTTTTCA TTCTTTTATA TCTTCTTTTG ATATATGTCA

continued ATTCAATGCT CCACCTCCTG GCCGGGACAG CCCCAACCCC CATCACCGCC TGAAGGGGGT TCGGTGGCCA GACAGCGAAG TATGTGCGTG TGCGTGCATG GTGTGTGTGC TGTGCTCAGG GACGGCAGCG GCCGCTGCTG GAAACACAGC GGCCTGCTGG CAGGAGCCTC AGCTTCCTTT CCTGTCTATG AGATATTCAA AGATTCCAGG CAAGCTTTTT CCAATATTTC CCGCTGCATA ATATACTATT AAAAAAAAAA

ACCCTCAAAG ATCGTGAGCA GGGGCGTTTG CTGCCGCTCA GCGCTGTACA CCCCCGGGGG GCAGGACTCT TATCCCCGAC TGTGCGAGTG TGCACTCGCA GTGTGTCCAT GGCTGTGGCT TAGCCTCTCC GGTTTTCCTC AGCATCTCTC AGCCTGCGCA CTCCACACCA CTGGATTCAT GTACAAAAAT GTCACCTCCT CGATGTGCAA TACACAAATG TCAGACATCG GTATTCCATT TTCATCTCTT AAAA

GCAACCTCTC CAGCTGAGAT TGAGGTCCCA TCGTGGGCAG AGCTCGGCTT CCGAACCCCA GCCCAGACCA AGGACGGGCT TGTGCAAGTG CGCCCATGTG GTGTGTGCAG CACGTGTGTG GGCAGAAGGG CGGGAGAGGG CACTGAAAGA GCTTGGATGG GCGCTGATGC TTATTATTTC CACAAGGCAT TAAAGGTAGT GTGTATGCAC GTAGCATACT GTTCATATTA GTGTGAGTGT GTTATTGCAT

GTTTGACTGG CTTGTTTAAC GACGGAGACC CTCTGTCGGG CTTCAAGCGG GTAGCGGCTC CACGTAGCCC TGGGCTTCCA TCTGTGTGCA TGAGTGTGTG TGTGTGCATG ACTCAGAGTG AACTGCCTGG GACGGTCAAT AGTGGGACTT ATACTCCATG CCAATAAAGA AATGTGACTT TCAAGTGTAC CAAGATTGTG GTGTGCACAC TTATATTGGT AGACATAAAT ACCATAATGT CTGCTGAGTT


p subunit24'^2 ACAGGAAGTG GGGCAGACTG ACCGAGGGAC CGGGTGCGTC CGAGTCGGGG TGACTCCATT CATCATGGAC GCTGTCCCCA GACCTTCCGG CTCCATGCTT CAACGAGATC GCCGTTCGTG GTGCCAGCCC TCAGACCGAG GGACGCCATG GCTGCTGGTG CATCCTGACC CGAATTCGAC GCCCATCTTC CCCCAAGTCA GAATGCTTAC CCTGAAAGTC AGGTGACTGT CACAGAGTGC GACCGTGCAG CCTCTGCCAT

TCAGGACTTT GTAGCAAAGC ATGCTGGGCC CTCTCTCAGG CCCGGCTGCA CGCTGCGACA CCCACAAGCC CAAAAAGTGA CGGGCCAAGG GATGACCTCA ACCGAGTCCG AACACGCACC CCGTTTGCCT GTCGGGAAGC ATGCAGGTCG TTTGCCACTG CCCAACGACG TACCCATCGG GCGGTGACCA GCCGTGGGGG AATAAACTCT ACCTACGACT GATGGCGTGC ATCCAGGAGC GTCCTTCCCC GGCAAGGGCT

ACGACCCGCG CCTCCAGCTG AGGTTTCTAG ACGTGACCCA

CCCCACGCCC AGCCAGGAGC ACCGCCGAGG ACTCCAGCAC

TGCGCCCCCC ACTGCTCGCC CTGGTGGGGC TGCTCTCCCT

ATGACGGCTT CCATTTCGCG GGCGACGGAA AGCTGGGCGC

GCCGCTGTCA CCTGGAGGAC AACTTGTACA AGAGGAGCAA

TGGGCCAGCT GGCGCACAAG CTGGCTGAAA ACAACATCCA

GTAGGATGGT GAAGACCTAC GAGAAACTCA CCGAGATCAT

AGCTGTCTGA GGACTCCAGC AATGTGGTCC ATCTCATTAA

CCTCCAGGGT CTTCCTGGAT CACAACGCCC TCCCCGACAC

CCTTCTGCAG CAATGGAGTG ACGCACAGGA ACCAGCCCAG

AGATCAATGT CCCGATCACC TTCCAGGTGA AGGTCACGGC

AGTCGTTTGT CATCCGGGCG CTGGGCTTCA CGGACATAGT

AGTGTGAGTG CCGGTGCCGG GACCAGAGCA GAGACCGCAG

TCTTGGAGTG CGGCATCTGC AGGTGTGACA CTGGCTACAT

AGTGCACGAA GTTCAAGGTC AGCAGCTGCC GGGAATGCAT CCTGGTGCCA GAAGCTGAAC TTCACAGGGC CGGGGGATCC CCCGGCCACA GCTGCTCATG AGGGGCTGTG CGGCTGACGA TCGCTGAAAC CCAGGAAGAC CACAATGGGG GCCAGAAGCA CGCTTTACCT GCGACCAGGC CAGGCAGCAG CGTTCAACGT GCTACCCCAT CGACCTGTAC TATCTGATGG ACCTCTCCTA GGAATGTCAA GAAGCTAGGT GGCGACCTGC TCCGGGCCCT GCCGCATTGG CTTCGGGTCC TTCGTGGACA AGACCGTGCT CTGATAAGCT GCGAAACCCA TGCCCCAACA AGGAGAAAGA TCAGGCACGT GCTGAAGCTG ACCAACAACT CCAACCAGTT AGCTGATTTC CGGAAACCTG GATGCACCCG AGGGTGGGCT CCGCCTGCCC GGAGGAAATC GGCTGGCGCA ACGTCACGCG


TGGGAAAAAC CCGGAAGGAC CCTGTGCCAC CACCATCAAC CTTCTGCGGG GACCACTGAG CTGCAACGTA CTGCCCCTCA CCCCTTTGGG GAAGGGCAGG GCAGCAGGAC AGGCCCCAAC TCTCCTGCTG CTTTGAGAAG CACCACGACG GCCGTCAGGA TTGAGGATGT TGGCCGGCCG TCTTTGCATG CTGTGCAAGT TGTCAGGGTA AAAAATAAAA

AGACACAGGG TCATCTGCTC TCCCCGGCAA ACAACGGCCA GCCACCCGGG ACCCGCGGCG ATTCAGGCTA AGTACATCTC GCGCGGCGTG AGAGGGACTC GCTACCTCAT TCGTCGGGGG AGGCTCTGAT AGTCCCAGTG CCAAGTTTGC CTGCCCCATC CCAGAAATCC GGGGCTCGTC GAGGGAGGGC GTCTGATTAA TCCCATTAAT GGCTGTCCAT

TGTGAGTGCC AACAACTCCA ACCAGCGACG TGTGAGCGCT AAGTGCCGCT GGCTGCCTGA TGCGAGTGCC CCCTGTGGCA AAGAACTGCA ACCTGCAAGG GGGATGGACC ATCGCCGCCA GTCATCTGGA GAGAAGCTCA GTCATGAACC CCCACCATGT CACCAATTAA GGTGCTTCTG GAGACTTGAG CAGGACATCA TAAAATGACA CTTCAATACA

CCGGAGCAGC AGGGCTGGGG GCTGATATAC GGTCTGCGGC CTTTGAGGGC TGTTGAGTGT CCAGCTGCCT CTGCGCCGAG TCCGGGCCTG AGAGGGCTGC CTATGTGGAT CACCGTGGCA CCACCTGAGC GAACAATGAT TGAGAGTTAG ACGCGGCCGA AGTTATTTTC GGGGGGACAG TTGAGGTTGG AGGTGGTGCC TATATTGTTA GGAAAAAAAA

CAGGAGCTGG GACTGTGTCT GGGCAGTACT GGCCCGGGGA TCAGCGTGCC AGTGGTCGTG CTGTGCCAGG TGCCTGAAGT CAGCTGTCGA TGGGTGGCCT GAGAGCCGAG GGCATCGTGC GACCTCCGGG AATCCCCTTT GAGCACTTGG GACATGGCTT CGCCCTCAAA CTCCACTCTG TGAGGTTAGG AATTTATTTA ATCAATCACG AAAAAAAAAA

AAGGAAGCTG GCGGGCAGTG GCGAGTGTGA GGGGGCTCTG AGTGCGAGAG GCCGGTGCCG AGTGCCCCGG TCGAAAAGGG ACAACCCCGT ACACGCTGGA AGTGTGTGGC TGATCGGCAT AGTACAGGCG TCAAGAGCGC TGAAGACAAG GCCACAGCTC ATGACAGCCA ACTGGCACAG TGCGTGTTTC CATTTAAACT TGTATAGAAA


The methionine initiation codons (ATG), the termination codons (TAG) and the probable polyadenylation signals (AATAAA) are indicated. The first five nucleotides of each exon are underlined to indicate intron-exon boundaries. The β subunit has multiple transcription initiation sites as well as multiple polyadenylation sites. The sequence shown is derived from a cDNA with the longest 5' untranslated region and the most frequently used polyadenylation site.

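Genomic structure α subunit²⁵: The gene spans 55 kb and comprises 30 exons.
[Figure: exon-intron map of the CD11b gene (exons 1-30; scale bar 2 kb).]
β subunit²⁶: The gene spans approximately 40 kb and is organized into 16 exons.
[Figure: exon-intron map of the CD18 gene (exons 1-16; scale bar 2 kb).]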

Accession numbers
Human: α subunit²³,³³ J03925, X07421, M18044; β subunit
Mouse: α subunit; β subunit
Pig (unpublished): α subunit; β subunit
Bovine: β subunit
Chicken: β subunit
Accession numbers for the human β subunit and for the mouse, pig, bovine and chicken entries, in the order given in the original table: M15395, M95293, M38701, X54481, X07640, X14951, M31039, U40072, U13941, M81233, Y12672, X71786.

Deficiency⁴¹ There has been one report of a patient with an α subunit defect associated with a marked deficiency in iC3b-dependent rosette formation, a selective loss of a major epitope of the I-domain of CD11b, and systemic lupus erythematosus⁴². Defects of the β subunit give rise to the autosomal recessive disorder leukocyte adhesion deficiency (LAD). Patients with LAD fail to express all four β2 integrin complexes, namely LFA-1, CR3, CR4 and αDβ2, on their leukocytes, which show abnormalities in a wide variety of iC3b-receptor and adhesion-dependent functions. The severity of LAD varies with the nature of the defect. Patients with the severe form (complete absence of CD18 expression) die in early infancy, whereas those with a mild defect (~10% of normal CD18 expression) have been known to survive to adulthood. A similar deficiency has also been found in Holstein cattle⁴³,⁴⁴ and Irish Setter dogs⁴⁵.

Human mutations identified in the β subunit (number of patients in parentheses):
T132 to A; M1 to K (one patient)
Deletion 249-258 resulting in a frameshift (one patient)
G512 to A; D128 to N (one patient)
T576 to C; L149 to P (one patient)
G635 to A; G169 to R (two unrelated patients)
C663 to T; P178 to L (two unrelated patients)
A717 to T; K196 to T (one patient)
Intronic mutation, 4 extra amino acids (PSSQ) at the exon 6/7 junction, linked with mutation R586W (two unrelated patients)
G980 to A; G284 to S (five unrelated patients)
Intronic mutation, multiple splicing variants at the exon 7/8 junction (one patient)
Intronic mutation, exon 9 deletion (one patient)
A1182 to G; N351 to S, spontaneous mutation (one patient)
Deletion 1385-1386 resulting in a frameshift (one patient)
C1732 to A; C534 to stop (one patient)
Intronic mutation, exon 13 deletion (one patient)
C1886 to T; R586 to W, linked to a splice mutation at the exon 6/7 junction; probably does not result in LAD (two unrelated patients)
C1907 to T; R593 to C (three unrelated patients)
Deletion 2220 resulting in a frameshift (two unrelated patients)

Polymorphic variants α subunit: CAG (1570-1572), Q500: expression depends on different splicing events; T2969C, L966P.

β subunit (only silent variants found)⁴⁶: A108C (5' untranslated region); G154T (L8); G292A (P54); A949G (G273); C1231A (V367); C1453T (V441); C1672T (C514). Two RFLP variants have been found within intron 1: a VNTR of 29-nucleotide inexact repeats, with alleles of 3, 5, 6 and 11 repeats detected, and an insertion/deletion of ~1.2 kb 3' to the VNTR polymorphism⁴⁷. CD11b⁴⁸,⁴⁹ and CD18⁵⁰,⁵¹ knockout mice have been reported by several laboratories.
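The nucleotide and residue numbers in the mutation and polymorphism lists are related by simple codon arithmetic. The sketch below assumes, as an inference from the listed pairs rather than a statement in the entry, that the A of the initiator ATG lies at cDNA position 131 for CD18 (T132A gives M1K) and at position 73 for CD11b (CAG 1570-1572 encodes Q500). `codon_of` is a hypothetical helper name.

```python
def codon_of(nt_pos: int, atg_start: int) -> tuple[int, int]:
    """Map a cDNA nucleotide position to (codon number, base 1-3 within the codon).

    atg_start is the cDNA position of the A of the initiator ATG (codon 1).
    """
    offset = nt_pos - atg_start
    if offset < 0:
        raise ValueError(f"position {nt_pos} lies in the 5' untranslated region")
    return offset // 3 + 1, offset % 3 + 1


# Assumed offsets, inferred from the listed nucleotide/residue pairs
CD18_ATG = 131   # T132A -> M1K
CD11B_ATG = 73   # CAG 1570-1572 -> Q500

# CD18 coding mutations from the deficiency list: nucleotide -> listed residue
CD18_MUTATIONS = {132: 1, 512: 128, 576: 149, 635: 169, 663: 178, 717: 196,
                  980: 284, 1182: 351, 1732: 534, 1886: 586, 1907: 593}

for nt, residue in CD18_MUTATIONS.items():
    codon, base = codon_of(nt, CD18_ATG)
    status = "matches" if codon == residue else "MISMATCH"
    print(f"CD18 nt {nt}: codon {codon}, base {base} ({status} residue {residue})")

print("CD11b T2969C ->", codon_of(2969, CD11B_ATG))  # listed as L966P
```

With these offsets every listed coding mutation falls in the codon of its listed residue, and the CD11b T2969C change lands at base 2 of codon 966. Applied to the silent CD18 variants, each substitution falls at the third base of its codon (for example, G154T gives codon 8, base 3), while A108 lies upstream of the ATG, in the 5' untranslated region, as the list states.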

References
1 Lee, J.-O. et al. (1995) Cell 80, 631-638.
2 Goodman, T.G. and Bajt, M.L. (1996) J. Biol. Chem. 271, 23729-23736.
3 Vetvicka, V. et al. (1996) J. Clin. Invest. 98, 50-61.
4 Petty, H.R. and Todd III, R.F. (1996) Immunol. Today 17, 209-212.
5 Todd III, R.F. and Petty, H.R. (1997) J. Lab. Clin. Med. 129, 492-498.
6 Thornton, B.P. et al. (1996) J. Immunol. 156, 1235-1246.
7 Xia, Y. and Ross, G.D. (1999) J. Immunol. 162, 7285-7293.
8 Hogg, N. (1992) Immunol. Today 13, 113-115.
9 Smith, C.W. (1993) Semin. Hematol. 30 suppl. 4, 45-55.
10 Forsyth, C.B. and Mathews, H.L. (1996) Cell. Immunol. 170, 91-100.
11 May, A.E. et al. (1998) J. Exp. Med. 188, 1029-1037.
12 Berton, G. et al. (1994) J. Cell Biol. 126, 1111-1121.
13 Yan, S.R. et al. (1995) Circ. Shock 45, 297-311.
14 Zaffran, Y. et al. (1995) J. Immunol. 154, 3488-3497.
15 Bohuslav, J. et al. (1995) J. Exp. Med. 181, 1381-1390.
16 Fuortes, M. et al. (1994) J. Cell Biol. 127, 1477-1483.
17 Zheng, L. et al. (1996) Proc. Natl Acad. Sci. USA 93, 8431-8436.
18 Goldman, R. et al. (1994) Biochim. Biophys. Acta Mol. Cell Res. 1222, 265-276.
19 Kazan, I. et al. (1997) Biochem. J. 326, 867-876.
20 Jones, S.L. et al. (1998) J. Biol. Chem. 273, 10556-10566.
21 Lacal, P. et al. (1988) Biochem. Biophys. Res. Commun. 154, 641-647.
22 Singer, I.I. et al. (1989) J. Cell Biol. 109, 3169-3182.
23 Corbi, A.L. et al. (1988) J. Biol. Chem. 263, 12403-12411.
24 Kishimoto, T.K. et al. (1987) Cell 48, 681-690.
25 Fleming, J.C. et al. (1993) J. Immunol. 150, 480-490.
26 Weitzman, J.B. et al. (1991) FEBS Lett. 297, 97-103.
27 Wong, D.A. et al. (1996) Gene 171, 291-294.
28 Shelley, C.S. et al. (1998) Genomics 49, 334-336.
29 Corbi, A.L. et al. (1988) J. Exp. Med. 167, 1597-1607.
30 Gardiner, K. et al. (1988) Somat. Cell Mol. Genet. 14, 623-637.
31 Petersen, M.B. et al. (1991) Genomics 9, 407-419.
32 Lopez-Cabrera, M. et al. (1993) J. Biol. Chem. 268, 1187-1193.
33 Arnaout, M.A. et al. (1988) J. Cell Biol. 106, 2153-2158.
34 Agura, E.D. et al. (1992) Blood 79, 602-609.
35 Pytela, R. (1988) EMBO J. 7, 1371-1378.
36 Wilson, R.W. et al. (1989) Nucleic Acids Res. 17, 5397.
37 Zeger, D.L. et al. (1990) Immunogenetics 31, 191-197.
38 Kofler, R. (1991) Immunogenetics 33, 11-11.
39 Shuster, D.E. et al. (1992) Gene 114, 267-271.
40 Bilsland, C.A.G. and Springer, T.A. (1994) J. Leukocyte Biol. 55, 501-506.
41 Anderson, D.C. (1994) In The Metabolic Basis of Inherited Diseases, 7th edn (Scriver, C.R., Beaudet, A.L., Sly, W.S. and Valle, D. eds). Chapter 132, McGraw-Hill, New York, pp. 3955-3994.
42 Witte, T. et al. (1993) J. Clin. Invest. 92, 1181-1187.
43 Shuster, D.E. et al. (1992) Proc. Natl Acad. Sci. USA 89, 9225-9229.
44 Kehrli, M.E., Jr. et al. (1992) Am. J. Pathol. 140, 1489-1492.
45 Giger, U. et al. (1987) Blood 69, 1622-1630.
46 Law, S.K.A., unpublished.
47 Law, S.K.A. and Taylor, G.M. (1991) Immunogenetics 34, 341-345.
48 Coxon, A. et al. (1996) Immunity 5, 653-666.
49 Lu, H.F. et al. (1997) J. Clin. Invest. 99, 1340-1350.
50 Wilson, R.W. et al. (1993) J. Immunol. 151, 1571-1578.
51 Mizgerd, J.P. et al. (1997) J. Exp. Med. 186, 1357-1364.

CR4 Alex Law, Department of Biochemistry, University of Oxford, Oxford, UK

Other names Complement receptor type 4, p150,95 antigen, CD11c/CD18 antigen, integrin αXβ2.

Physicochemical properties CR4 is made up of two non-covalently associated subunits encoded by two genes. Both the α and β subunits are type I membrane glycoproteins, with leader sequences of 19 and 22 amino acids respectively. The β subunit, CD18, is discussed under the CR3 entry, so only brief details are provided here.

                     Amino acids   Mr (K) predicted   Mr (K) observed   N-linked glycosylation sites
α subunit (CD11c)    20-1163       125.9              150               10 (61, 385, 392, 697, 735, 899, 904, 939, 993, 1050)
β subunit (CD18)     23-769        82.6               95                6 (50, 116, 212, 254, 501, 642)
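The predicted Mr values in the table can be sanity-checked against the residue ranges. The sketch below (hypothetical helper `summarize`) divides each predicted Mr by the mature chain length; the result, about 110 Da per residue, matches the usual rule-of-thumb average residue mass, which suggests that the predicted values refer to the unglycosylated mature polypeptides. The difference between observed and predicted Mr is reported without interpretation; attributing it to glycosylation would be an assumption.

```python
def summarize(first: int, last: int, predicted_kda: float, observed_kda: float,
              sites: list[int]) -> tuple[int, float, float, int]:
    """Return (residues, mean Da per residue, observed minus predicted kDa, number of N-linked sites)."""
    residues = last - first + 1
    return residues, predicted_kda * 1000 / residues, observed_kda - predicted_kda, len(sites)


# Values transcribed from the CR4 table
CR4_SUBUNITS = {
    "alpha (CD11c)": (20, 1163, 125.9, 150.0,
                      [61, 385, 392, 697, 735, 899, 904, 939, 993, 1050]),
    "beta (CD18)": (23, 769, 82.6, 95.0, [50, 116, 212, 254, 501, 642]),
}

for name, values in CR4_SUBUNITS.items():
    residues, da_per_residue, excess_kda, n_sites = summarize(*values)
    print(f"{name}: {residues} residues, ~{da_per_residue:.0f} Da/residue, "
          f"observed exceeds predicted by {excess_kda:.1f} kDa, {n_sites} N-linked sites")
```

The mature chains come out at 1144 (α) and 747 (β) residues, and the site counts confirm that the table lists 10 and 6 positions respectively.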

Structure The higher-order structure of CR4 is presumed to be similar to that of other integrins, on the basis of electron microscopy of the platelet integrin αIIbβ3¹ and the fibronectin receptor α5β1². The two subunits interact through their N-terminal regions to form a globular head, whereas the C-terminal regions appear to insert into the membrane independently. However, there is evidence that the two cytoplasmic domains interact with each other as a regulatory mechanism for integrin-mediated adhesion³.

Function CR4 is so called because it binds the complement fragment iC3b⁴. However, it has been shown to bind many other ligands, including fibrinogen, ICAM-1, lipopolysaccharide and denatured peptides⁵⁻¹⁰. It also mediates the adhesion of neutrophils and monocytes to endothelium, although the endothelial ligands have yet to be identified¹¹. CR4 has also been reported to play an important role in T cell killing¹² and T cell aggregation¹³.

[Figure: domain organization of CR4. α subunit (CD11c): I-domain, I-domain-like region and C-terminal domain. β subunit (CD18): cysteine-rich repeat region (repeats 1-4).]

Tissue distribution CR4 is expressed mainly on myeloid cells, and at high levels on tissue macrophages¹⁴. It is also found on dendritic cells¹⁵, activated B cells⁷ and some lymphoid cell lines, and on the leukaemic cells of hairy cell leukaemia¹⁶.

Regulation of expression Neutrophils and blood monocytes have been estimated to express 6000 and 8000 molecules of CR4 per cell, respectively. The level of CR4 on monocytes can be upregulated 4-fold by stimulation with fMLP, and 8-fold by culture in plastic plates for 7 days. An 8-fold increase of CR4 on neutrophils upon fMLP stimulation has also been reported.
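To put the fold changes in absolute terms, the sketch below multiplies the quoted baseline copy numbers by the reported fold increases. It assumes that each fold change applies to these particular baselines; because the estimates may come from different studies and conditions, the results are order-of-magnitude figures only. `molecules_after` is a hypothetical helper.

```python
# Baseline estimates and fold increases quoted in the regulation paragraph
BASELINE_PER_CELL = {"neutrophil": 6000, "monocyte": 8000}

REPORTED_INCREASES = [
    # (cell type, stimulus, fold increase)
    ("monocyte", "fMLP", 4),
    ("monocyte", "7 days in plastic plates", 8),
    ("neutrophil", "fMLP", 8),
]


def molecules_after(cell_type: str, fold: int) -> int:
    """Rough CR4 molecules per cell after a fold increase over the quoted baseline."""
    return BASELINE_PER_CELL[cell_type] * fold


for cell_type, stimulus, fold in REPORTED_INCREASES:
    print(f"{cell_type}, {stimulus}: ~{molecules_after(cell_type, fold):,} molecules/cell")
```

On these assumptions, fMLP-stimulated monocytes would carry roughly 32,000 molecules per cell, cultured monocytes roughly 64,000, and fMLP-stimulated neutrophils roughly 48,000.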

Protein sequences a subunit^^ MTRTRAALLL VGAPQKITAA SPSQLLACGP DIVFLIDGSG FTFEEFRRTS TKILIVITDG LNDIASKPSQ MAQEGFSAVF RDSYLGYSTE GTQIGSYFGA WRRWWCDAVL GAVYLFHGVL DLAVGARGQV QSNICLYIDK SRVRVLGLKA PMLAALAQRY LNAEVMVWND APVGSQGTWS SSENNTPRTS AMHRYQVNNL EKIAPPASDF LSFGWVRQIL KVHNPTPLIV ENGTQTPSPP

FTALATSLGF NQTGGLYQCG TVHHECGRNM SISSRNFATM NPLSLLASVH KKEGDSLDYK EHIFKVEDFD TPDGPVLGAV LALWKGVQSL SLCSVDVDTD YGEQGHPWGR GPSISPSHSQ LLLRTRPVLW RSKNLLGSRD HCENFNLLLP FTASLPFEKN GEDSYGTTIT TSCRINHLIF KTTFQLELPV GQRDLPVSIN LAHIQKNPVL QKKVSWSVA GSSIGGLLLL SEK

NLDTEELTAF YSTGACEPIG YLTGLCFLLG MNFVRAVISQ QLQGFTYTAT DVIPMADAAG ALKDIQNQLK GSFTWSGGAF VLGAPRYQHT GSTDLVLIGA FGAALTVLGD RIAGSQLSSR VGVSMQFIPA LQSSVTLDLA SCVEDSVTPI CGADHICQDN FSHPAGLSYR RGGAQITFLA KYAVYTWSS FWVPVELNQE DCSIAGCLRF EITFDTSVYS ALITAVLYKV

RVDSAGFGDS LQVPPEAVNM PTQLTQRLPV FQRPSTQFSL AIQNWHRLF IIRYAIGVGL EKIFAIEGTE LYPPNMSPTF GKAVIFTQVS PHYYEQTRGG VNGDKLTDW LQYFGQALSG EIPRSAFECR LDPGRLSPRA TLRLNFTLVG LGISFSFPGL YVAEGQKQGQ TFDVSPKAVL HEQFTKYLNF AVWMDVEVSH RCDVPSFSVQ QLPGQEAFMR GFFKRQYKEM

WQYANSWW SLGLSLASTT SRQECPRQEQ MQFSNKFQTH HASYGARRDA AFQNRNSWKE TTSSSSFELE INMSQENVDM RQWRMKAEVT QVSVCPLPRG IGAPGEEENR GQDLTQDGLV EQWSEQTLV TFQETKNRSL KPLLAFRNLR KSLLVGSNLE LRSLHLTCDS GDRLLLTAITV SESEEKESHV PQNPSLRCSS EELDFTLKGN AQTTTVLEKY MEEANGQIAP


The leader sequences are underlined and putative N-linked glycosylation sites are indicated (N). β subunit¹⁹,²⁰: see CR3 entry.

Protein modules
α subunit
1-19        Leader peptide              exon 1/2
20-617      β-propeller and I-domain    exon 2-15
194-337     I-domain                    exon 6-9
1106-1128   Transmembrane segment       exon 29
1129-1163   Cytoplasmic domain          exon 29/30

β subunit   See CR3 entry.

Chromosomal location α subunit: human²¹ 16p11.2. The gene is located in a cluster that also contains the genes for the αL (CD11a) and αM (CD11b) integrin subunits. The three gene products combine with the same β subunit to form the LFA-1, CR3 and CR4 antigens.

β subunit²²,²³: see CR3 entry.

cDNA sequences a subunit^'^-^^ AGTACCTTGG TCTAGTCATG AGGTTTCAAC AGACAGCGTG AGCTGCCAAC CATCGGCCTG TACCACCAGC GAACATGTAC CCCGGTGTCC CTCAGGCAGC AAGCCAGTTC AACACACTTC TGTTCACCAG ATTGTTCCAT TGATGGGAAG AGCAGGCATC GAAAGAATTA CTTTGATGCT TACGGAGACC TGTGTTCACA TGCCTTCCTG GGACATGAGG GAGCCTGGTC GGTGTCCAGG CGGGGCCTCC CGGGGCCCCC CAGGGGGTGG GGGTCGCTTT CGTGGTCATC AGTCTTGGGA CTCCAGGCTG

TCCAGCTCTT ACCAGGACCA TTGGACACAG GTCCAGTATG CAAACGGGTG CAGGTGCCCC CCTTCCCAGC CTCACCGGAC AGGCAGGAGT ATCTCCTCCC CAGAGACCCA ACTTTCGAGG CTGCAAGGGT GCCTCATATG AAAGAAGGCG ATCCGCTATG AATGACATTG CTGAAAGATA ACAAGCAGTA CCTGATGGCC TACCCCCCAA GACTCTTACC CTGGGGGCCC CAATGGAGGA CTCTGCTCCG CATTACTACG AGAAGGTGGT GGGGCGGCTC GGGGCCCCAG CCCAGCATCA CAGTATTTTG

CCTGCAACGG GGGCAGCACT AGGAGCTGAC CCAACTCCTG GCCTCTACCA CGGAGGCCGT TGCTGGCCTG TCTGCTTCCT GCCCAAGACA GCAACTTTGC GCACCCAGTT AATTCAGGCG TTACATACAC GGGCCCGTAG ACAGCCTGGA CAATTGGGGT CATCGAAGCC TTCAAAACCA GCTCCTTCGA CCGTTCTGGG ATATGAGCCC TGGGTTACTC CCCGCTACCA TGAAGGCCGA TGGACGTAGA AGCAGACCCG GGTGTGATGC TGACAGTGCT GAGAGGAGGA GCCCCTCCCA GGCAGGCACT

CCCAGGAGCT CCTCCTGTTC AGCCTTCCGT GGTGGTGGTT GTGTGGCTAC GAACATGTCC CGGCCCCACC CCTGGGCCCC GGAGCAGGAC CACGATGATG TTCCCTGATG CACGTCAAAC GGCCACCGCC GGATGCCACC TTATAAGGAT TGGATTAGCT CTCCCAGGAA ACTGAAGGAG ATTGGAGATG GGCTGTGGGG TACCTTCATC CACCGAGCTG GCACACCGGG AGTCACGGGG CACCGACGGC AGGGGGCCAG TGTTCTCTAC GGGGGATGTG GAACCGGGGT CAGCCAGCGG GAGCGGGGGT

CAGAGCTCCA ACAGCCTTAG GTGGACAGCG GGAGCCCCCC AGCACTGGTG CTGGGCCTGT GTGCACCACG ACCCAGCTCA ATTGTGTTCC AACTTCGTGA CAGTTCTCCA CCCCTCAGCC ATCCAAAATG AAAATTCTCA GTCATCCCCA TTTCAAAACA CACATATTTA AAGATCTTTG GCACAGGAGG AGCTTCACCT AACATGTCTC GCCCTCTGGA AAGGCTGTCA ACTCAGATCG AGCACCGACC GTGTCTGTGT GGGGAGCAGG AATGGGGACA GCTGTCTACC ATCGCGGGCT CAAGACCTCA

CATCTGACCT CAACTTCTCT CTGGGTTTGG AAAAGATAAC CCTGTGAGCC CCCTGGCGTC AGTGCGGGAG CCCAGAGGCT TGATCGATGG GAGCTGTGAT ACAAATTCCA TGTTGGCTTC TCGTGCACCG TTGTCATCAC TGGCTGATGC GAAATTCTTG AAGTGGAGGA CCATTGAGGG GCTTCAGCGC GGTCTGGAGG AGGAGAATGT AAGGGGTGCA TCTTCACCCA GCTCCTACTT TGGTCCTCAT GTCCCTTGCC GCCACCCCTG AGCTGACAGA TGTTTCACGG CCCAGCTCTC CCCAGGATGG



ACTGGTGGAC GCTCTGGGTG

GTGTCGGGAG TGACAAACGT

CCTGGCCCTC

GAGTCTGAGC GCTCCCGAGC GGTGGGCAAG GAGATACTTC GGACAATCTC CCTGGAGCTG CATCACCTTC AGGGCAGCTG CTGGAGCACC CTTGGCTACC CAATGTGAGC CCCGGTGAAG CAACTTCTCA TAACCTGGGA CCAGGAGGCT CTCCTCAGAG CGTGCTGGAC CGTCCAGGAG GATATTGCAG GTACTCCCAG GAAGTACAAG GCTGCTGGCA GGAAATGATG CCCGCCCAGT CTTACTTACC GGGATGGGCC TTGTCAAGGT CTTGACTTGC TCTCCCATGA


CTCGGCTCGC CGTAGCTGGG CTGAATATGC TCTGAACTCT CGCACAGCAT TCAGCATCAG GGAGTGAGAT TGCTGGCCTG CTCACCAGCC TAGTACCTGA TGTGTTAATA AATCTGAAAG


CTGGCTGTGG GGGGTGAGCA CAGGTGGTCT TCTAAGAACC GACCCTGGCC CGAGTCCGAG TGCGTGGAGG CCCCTCCTTG ACGGCCTCCC GGCATCTCCT AACGCAGAAG TCCCACCCCG CGTTCCCTGC AGCTGCAGAA TTTGACGTCT AGTGAGAACA TATGCTGTCT GAGTCTGAGG CAGAGGGACC GTGTGGATGG AAAATCGCAC TGCTCCATTG GAGCTGGATT AAGAAGGTGT CTTCCAGGAC GTCCACAACC CTCATCACAG GAGGAGGCAA GAGAAATGAT CTCACCTGTC TGCTTCCTGT TCCAACTGGA AATTTCTACC GGCACGAATG TTGAGACGGA TGCAACCTCC ACTACAGGCA TGCTCATCCC CCAGCTTCGC GAGAGCCTCT CTCAGGGCTT GCCTGCATGC GAAGGGAGGA CCAGGGGCAG AAAAATGCCA CACATTAAAA GTCTAAAAAT

GGGCCCGGGG TGCAGTTCAT CTGAGCAGAC TGCTTGGGAG GCCTGAGTCC TCCTCGGGCT ACTCTGTGAC CCTTCAGAAA TACCCTTTGA TCAGCTTCCC TGATGGTGTG CAGGACTGTC ACCTGACATG TCAACCACCT CCCCCAAGGC ACACTCCCAG ACACTGTGGT AGAAGGAAAG TGCCTGTCAG ATGTGGAGGT CCCCAGCATC CTGGCTGCCT TCACCCTGAA CGGTCGTGAG AGGAGGCATT CCACCCCCCT CGGTACTGTA ATGGACAAAT CCCTCTTTGC AGGCTGACGG GTTTGGGAGA AACCCTTAGG TAGAAATACA ATCTTTCTTT GTCTCGCTCT GCCTCCCGGG CACGCCACCT CACCTGTCTT GTGAGAAGTC GTGCCCCCAT CATCGTGGGG TGGGTTCTGC GCGCCCTCTA AAGAGACCCA AGCACTAGAT CATCGCACAA AAAAAAGCCT

CCAGGTGCTC ACCTGCCGAG CCTGGTACAG CCGTGACCTC CCGTGCCACC GAAGGCACAC CCCCATTACC CCTGCGGCCT GAAGAACTGT AGGCTTGAAG GAATGACGGG CTACCGCTAC TGACAGCGCC CATCTTCCGT TGTCCTGGGA GACCAGCAAG TAGCAGCCAC CCATGTGGCC CATCAACTTC CTCCCACCCC TGACTTCCTG GCGGTTCCGC GGGCAACCTC TGTGGCTGAA TATGAGAGCT CATCGTAGGC CAAAGTTGGC TGCCCCAGAA CTTGGACTTC GGAGGAACCA AAACGTCTTG ACAGGGTCCC TGGACAATAC CCTTTCCTTT GTCACCCAGG TTCAAGTAAT CGCCCGGCCC CAACAGCTCC CCCTTCCATC CACCCTCGTT CTCTCAGTTC ACAGCTGGCC GGGAGGGACA ACCACTTCCT TATTTTTTTA AAACGATGCA TCTGTGGAAA

CTGCTCAGGA ATCCCCAGGT TCCAACATCT CAAAGCTCTG TTCCAGGAAA TGTGAAAACT TTGCGTCTGA ATGCTGGCCG GGAGCCGACC TCCCTGCTGG GAAGACTCCT GTGGCAGAGG CCAGTTGGGA GGCGGCGCCC GACCGGCTGC ACCACCTTCC GAACAATTCA ATGCACAGAT TGGGTGCCTG CAGAACCCAT GCGCACATTC TGTGACGTCC AGCTTTGGCT ATTACGTTCG CAGACGACAA AGCTCCATTG TTCTTCAAGC AACGGGACAC TTCTCCCGCG CTGCACCACC CTTGGGAAGG TGCTGTGTTC CCCCAGGCCT TTTTTTTTTT CTGGAGTGCA TCTGCTGTCT GATCTTTCTA CCATTACCCT CCAGAGGGTG TCCAGTGAAT CGATTCCCCA TCCCGCGGTT TGGCCCCGGT ATTTTTTGAG AAAAGGGTAC TCTACCGCTC AAAAAAAAAA

CCAGACCTGT CTGCGTTTGA GCCTTTACAT TGACCTTGGA CAAAGAACCG TCAACCTGCT ACTTCACGCT CACTGGCTCA ATATCTGCCA TGGGGAGTAA ACGGAACCAC GCCAGAAACA GCCAGGGCAC AGATCACCTT TTCTGACAGC AGCTGGAGCT CCAAATACCT ACCAGGTCAA TGGAGCTGAA CCCTTCGGTG AGAAGAATCC CCTCCTTCAG GGGTCCGCCA ACACATCCGT CGGTGCTGGA GGGGTCTGTT GTCAGTACAA AGACCCCCAG ATTTTCCCCA GAGAGAGGCT GGCCTTTGTC CCCAAAAGGA CAGTCTCCCT rnp(rnrnrnrnpirnrnrn

ATGGCGTGAT CAGCCTCCTG AAATACAGTT CAGGACAATG GGCTTCAGGG TAGTGTCATG GGCTGAATTG GGGTCAACAT GCGGCTGCAG GCTATGAATA TTTAAATGTT CTTGGGAAAT


The methionine initiation codon (ATG), the termination codon and the probable polyadenylation signal (AATAAA) are indicated. The first five nucleotides of each exon are underlined. The major site of transcription initiation of the α subunit is as shown. However, it is not clear whether there are further introns after the C at position 4064.
β subunit¹⁹,²⁰,²⁷: see CR3 entry.

Genomic structure α subunit: The gene spans ~25 kb and comprises at least 30 exons. It is not clear whether there are any further introns in the 3' untranslated region.
[Figure: exon-intron map of the CD11c gene (exons 1-30).]
β subunit: see CR3 entry.

Accession numbers Human
α subunit: M81695, Y00093
β subunit¹⁹,²⁰,²⁸⁻³²: see CR3 entry

Deficiency³³ There is no known defect of the α subunit. For defects of the β subunit, see the CR3 entry.

Polymorphic variants No amino acid-altering polymorphic variants are known for either the α or the β subunit; only silent variants have been found in the β subunit (see the CR3 entry).

References
1 Carrell, N.A. et al. (1985) J. Biol. Chem. 260, 1743-1749.
2 Nermut, M.V. et al. (1988) EMBO J. 7, 4093-4099.
3 Hughes, P.E. et al. (1996) J. Biol. Chem. 271, 6571-6574.
4 Myones, B.L. et al. (1988) J. Clin. Invest. 82, 640-651.
5 de Fougerolles, A.R. et al. (1995) Eur. J. Immunol. 25, 1008-1012.
6 Loike, J.D. et al. (1991) Proc. Natl Acad. Sci. USA 88, 1044-1048.
7 Postigo, A.A. et al. (1991) J. Exp. Med. 174, 1313-1322.
8 Wright, S.D. and Jong, M.T.C. (1986) J. Exp. Med. 164, 1876-1888.
9 Ingalls, R.R. and Golenbock, D.T. (1995) J. Exp. Med. 181, 1473-1479.
10 Davis, G.E. (1992) Exp. Cell Res. 200, 242-252.
11 Stacker, S.A. and Springer, T.A. (1991) J. Immunol. 146, 648-655.
12 Keizer, C.D. et al. (1987) J. Immunol. 138, 3130-3136.
13 Blackford, J. et al. (1996) Eur. J. Immunol. 26, 525-531.
14 Hogg, N. et al. (1986) Eur. J. Immunol. 16, 240-248.
15 Freudenthal, P.S. and Steinman, R.M. (1990) Proc. Natl Acad. Sci. USA 87, 7698-7702.
16 Hanson, C.A. et al. (1990) Blood 76, 2360-2367.
17 Miller, L.J. et al. (1987) J. Clin. Invest. 80, 535-544.
18 Corbi, A.L. et al. (1987) EMBO J. 6, 4023-4028.
19 Law, S.K.A. et al. (1987) EMBO J. 6, 915-919.
20 Kishimoto, T.K. et al. (1987) Cell 48, 681-690.
21 Callen, D.F. et al. (1991) Cytogenet. Cell Genet. 58, 1998.
22 Petersen, M.B. et al. (1991) Genomics 9, 407-419.
23 MacDonald, G.P. et al. (1988) Am. J. Hum. Genet. 43, A151.
24 Corbi, A.L. et al. (1990) J. Biol. Chem. 265, 2782-2788.
25 Corbi, A.L. et al. (1990) J. Biol. Chem. 265, 12750-12751.
26 Lopez-Cabrera, M. et al. (1993) J. Biol. Chem. 268, 1187-1193.
27 Weitzman, J.B. et al. (1991) FEBS Lett. 294, 97-103.
28 Zeger, D.L. et al. (1990) Immunogenetics 31, 191-197.
29 Wilson, R.W. et al. (1989) Nucleic Acids Res. 17, 5397.
30 Shuster, D.E. et al. (1992) Proc. Natl Acad. Sci. USA 89, 9225-9229.
31 Lee, J.K. et al. (1996) Xenotransplantation 3, 222-230.
32 Bilsland, C.A. and Springer, T.A. (1994) J. Leukocyte Biol. 55, 501-506.
33 Anderson, D.C. et al. (1994) In The Metabolic Basis of Inherited Diseases, 7th edn (Scriver, C.R., Beaudet, A.L., Sly, W.S. and Valle, D. eds). Chapter 132, McGraw-Hill, New York, pp. 3955-3994.


Part/ Miscellaneous Complement Components

C1 inhibitor

C1 esterase inhibitor, C1 inactivator

Ranol Zahedi and Alvin E. Davis III, The Center for Blood Research, Boston, MA, USA

Physicochemical properties C1 inhibitor is synthesized as a single-chain molecule of 500 amino acids, including a 22 amino acid signal peptide.
pI 2.7-2.8
Mr (K) predicted 52.9, 71.1 (glycosylated); observed 104
N-linked glycosylation sites 6 (25, 69, 81, 238, 253, 352)

Structure C1 inhibitor is a two-domain molecule consisting of an elongated (15 nm), heavily glycosylated, mucin-like N-terminal domain and a globular 7 × 3 × 3 nm serpin domain, as determined by electron microscopy and neutron scattering. There are seven O-linked glycosylation sites (S64; T48, 71, 83, 88, 92, 96).

Function C1 inhibitor regulates activation of the classical complement pathway and of the contact activation system of kinin generation and coagulation. It is a serine proteinase inhibitor (serpin) that, like other inhibitory members of the family, forms a tight, probably covalent, bimolecular complex with its target proteinase; it therefore functions as a suicide substrate. Its primary biologically relevant target proteinases are C1r, C1s, coagulation factors XIa and XIIa, and plasma kallikrein.

Tissue distribution Serum protein: 150 µg/ml. Primary site of synthesis: liver. Secondary sites: monocytes, fibroblasts, endothelial cells, megakaryocytes, microglial cells, neurons, placenta.
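Because each C1 inhibitor molecule is consumed in a bimolecular complex with one target proteinase (see Function), it is the molar rather than the mass concentration that sets the inhibitory capacity of plasma. A minimal conversion sketch, assuming the observed Mr of 104 K from Physicochemical properties is the relevant mass; the function name is illustrative:

```python
def ug_per_ml_to_micromolar(conc_ug_per_ml: float, mr_kda: float) -> float:
    """Convert a mass concentration in µg/ml to µmol/l for a protein of Mr mr_kda (kDa)."""
    grams_per_litre = conc_ug_per_ml * 1e-3         # 1 µg/ml = 1 mg/l = 1e-3 g/l
    grams_per_mole = mr_kda * 1e3                   # 1 kDa = 1000 g/mol
    return grams_per_litre / grams_per_mole * 1e6   # mol/l -> µmol/l


# Serum C1 inhibitor: 150 µg/ml; assumption: observed Mr of 104 K is used
print(round(ug_per_ml_to_micromolar(150, 104), 2))  # 1.44
```

Using the predicted polypeptide Mr (52.9 K) instead would roughly double the estimate, so the choice of Mr matters when comparing with proteinase concentrations.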

Regulation of expression C1 inhibitor is an acute-phase reactant. Its synthesis is induced in monocytes, hepatocytes and fibroblasts by IFN-γ, IL-6 and TNFα22-26.


Protein sequence3,27,28
MASRLTLLTL LLLLLAGDRA SSNPNATSSS SQDPESLQDR GEGKVATTVI 50
SKMLFVEPIL EVSSLPTTNS TTNSATKITA NTTDEPTTQP TTEPTTQPTI 100
QPTQPTTQLP TDSPTQPTTG SFCPGPVTLC SDLESHSTEA VLGDALVDFS 150
LKLYHAFSAM KKVETNMAFS PFSIASLLTQ VLLGAGENTK TNLESILSYP 200
KDFTCVHQAL KGFTTKGVTS VSQIFHSPDL AIRDTFVNAS RTLYSSSPRV 250
LSNNSDANLE LINTWVAKNT NNKISRLLDS LPSDTRLVLL NAIYLSAKWK 300
TTFDPKKTRM EPFHFKNSVI KVPMMNSKKY PVAHFIDQTL KAKVGQLQLS 350
HNLSLVILVP QNLKHRLEDM EQALSPSVFK AIMEKLEMSK FQPTLLTLPR 400
IKVTTSQDML SIMEKLEFFD FSYDLNLCGL TEDPDLQVSA MQHQTVLELT 450
ETGVEAAAAS AISVARTLLV FEVQQPFLFV LWDQQHKFPV FMGRVYDPRA 500

The leader peptide is underlined, N- and O-linked glycosylation sites are indicated by N and S or T, respectively, and the reactive centre R466 is indicated by double underlining.
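The six N-linked sites listed under Physicochemical properties (25, 69, 81, 238, 253 and 352) can be cross-checked against this sequence. The sketch below scans for the N-X-S/T sequon (X any residue except proline), the usual consensus for N-linked glycosylation; it finds potential sites only and says nothing about occupancy. The sequence string is our transcription of the sequence above, and the helper names are illustrative.

```python
import re

# C1 inhibitor precursor (500 aa, including the 22-residue signal peptide);
# transcribed from the protein sequence above, 50 residues per string.
C1INH = (
    "MASRLTLLTLLLLLLAGDRASSNPNATSSSSQDPESLQDRGEGKVATTVI"
    "SKMLFVEPILEVSSLPTTNSTTNSATKITANTTDEPTTQPTTEPTTQPTI"
    "QPTQPTTQLPTDSPTQPTTGSFCPGPVTLCSDLESHSTEAVLGDALVDFS"
    "LKLYHAFSAMKKVETNMAFSPFSIASLLTQVLLGAGENTKTNLESILSYP"
    "KDFTCVHQALKGFTTKGVTSVSQIFHSPDLAIRDTFVNASRTLYSSSPRV"
    "LSNNSDANLELINTWVAKNTNNKISRLLDSLPSDTRLVLLNAIYLSAKWK"
    "TTFDPKKTRMEPFHFKNSVIKVPMMNSKKYPVAHFIDQTLKAKVGQLQLS"
    "HNLSLVILVPQNLKHRLEDMEQALSPSVFKAIMEKLEMSKFQPTLLTLPR"
    "IKVTTSQDMLSIMEKLEFFDFSYDLNLCGLTEDPDLQVSAMQHQTVLELT"
    "ETGVEAAAASAISVARTLLVFEVQQPFLFVLWDQQHKFPVFMGRVYDPRA"
)

# N, then any residue except P, then S or T. The zero-width lookahead lets
# overlapping candidates (e.g. "NNS") each be tested.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


print(len(C1INH))               # 500
print(sequon_positions(C1INH))  # [25, 69, 81, 238, 253, 352]
print(C1INH[466 - 1])           # R (reactive centre R466)
```

Note that N23 is skipped because it is followed by proline (NPN), which is why a naive "N followed by S/T two residues later" search would over-count.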

Protein modules
Leader peptide 1-22 exon 2
Probable STP 22-120 exon 2/3
Serpin domain 121-500 exon 3-8
Reactive centre loop 453-470

Chromosomal location Human: 11q11-q13.1.

cDNA sequence3,27,28
AGTCTGCACT GGAGCTGCCT GGTGACCAGA AGTTTGGAGT CCGCTGACGT CGCCGCCCAG 60
ATGGCCTCCA GGCTGACCCT GCTGACCCTC CTGCTGCTGC TGCTGGCTGG GGATAGAGCC 120
TCCTCAAATC CAAATGCTAC CAGCTCCAGC TCCCAGGATC CAGAGAGTTT GCAAGACAGA 180
GGCGAAGGGA AGGTCGCAAC AACAGTTATC TCCAAGATGC TATTCGTTGA ACCCATCCTG 240
GAGGTTTCCA GCTTGCCGAC AACCAACTCA ACAACCAATT CAGCCACCAA AATAACAGCT 300
AATACCACTG ATGAACCCAC CACACAACCC ACCACAGAGC CCACCACCCA ACCCACCATC 360
CAACCCACCC AACCAACTAC CCAGCTCCCA ACAGATTCTC CTACCCAGCC CACTACTGGG 420
TCCTTCTGCC CAGGACCTGT TACTCTCTGC TCTGACTTGG AGAGTCATTC AACAGAGGCC 480
GTGTTGGGGG ATGCTTTGGT AGATTTCTCC CTGAAGCTCT ACCACGCCTT CTCAGCAATG 540
AAGAAGGTGG AGACCAACAT GGCCTTTTCC CCATTCAGCA TCGCCAGCCT CCTTACCCAG 600
GTCCTGCTCG GGGCTGGGCA GAACACCAAA ACAAACCTGG AGAGCATCCT CTCTTACCCC 660
AAGGACTTCA CCTGTGTCCA CCAGGCCCTG AAGGGCTTCA CGACCAAAGG TGTCACCTCA 720
GTCTCTCAGA TCTTCCACAG CCCAGACCTG GCCATAAGGG ACACCTTTGT GAATGCCTCT 780
CGGACCCTGT ACAGCAGCAG CCCCAGAGTC CTAAGCAACA ACAGTGACGC CAACTTGGAG 840
CTCATCAACA CCTGGGTGGC CAAGAACACC AACAACAAGA TCAGCCGGCT GCTAGACAGT 900
CTGCCCTCCG ATACCCGCCT TGTCCTCCTC AATGCTATCT ACCTGAGTGC CAAGTGGAAG 960
ACAACATTTG ATCCCAAGAA AACCAGAATG GAACCCTTTC ACTTCAAAAA CTCAGTTATA 1020
AAAGTGCCCA TGATGAATAG CAAGAAGTAC CCTGTGGCCC ATTTCATTGA CCAAACTTTG 1080
AAAGCCAAGG TGGGGCAGCT GCAGCTCTCC CACAATCTGA GTTTGGTGAT CCTGGTACCC 1140
CAGAACCTGA AACATCGTCT TGAAGACATG GAACAGGCTC TCAGCCCTTC TGTTTTCAAG 1200
GCCATCATGG AGAAACTGGA GATGTCCAAG TTCCAGCCCA CTCTCCTAAC ACTACCCCGC 1260
ATCAAAGTGA CGACCAGCCA GGATATGCTC TCAATCATGG AGAAATTGGA ATTCTTCGAT 1320


cDNA sequence continued
TTTTCTTATG ACCTTAACCT GTGTGGGCTG ACAGAGGACC CAGATCTTCA GGTTTCTGCG 1380
ATGCAGCACC AGACAGTGCT GGAACTGACA GAGACTGGGG TGGAGGCGGC TGCAGCCTCC 1440
GCCATCTCTG TGGCCCGCAC CCTGCTGGTC TTTGAAGTGC AGCAGCCCTT CCTCTTCGTG 1500
CTCTGGGACC AGCAGCACAA GTTCCCTGTC TTCATGGGGC GAGTATATGA CCCCAGGGCC 1560
TGAGACCTGC AGGATCAGGT TAGGGCGAGC GCTACCTCTC CAGCCTCAGC TCTCAGTTGC 1620
AGCCCTGCTG CTGCCTGCCT GGACTTGCCC CTGCCACCTC CTGCCTCAGG TGTCCGCTAT 1680
CCACCAAAAG GGCTCCTGAG GGTCTGGGCA AGGGACCTGC TTCTATTAGC CCTTCTCCAT 1740
GGCCCTGCCA TGCTCTCCAA ACCACTTTTT GCAGCTTTCT CTAGTTCAAG TTCACCAGAC 1800
TCTATAAATA AAACCTGACA GAGCAT

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TGA) and the probable polyadenylation signal (AATAAA) are indicated.
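As a consistency check between the cDNA and protein sequences, the sketch below translates two excerpts of the open reading frame with the standard genetic code: the first 60 nucleotides from the ATG initiation codon, and the last 33 nucleotides up to and including the TGA termination codon. Both excerpts are our transcription from the cDNA sequence above; the table and function names are illustrative.

```python
# Standard genetic code. Codons are enumerated with bases in TCAG order, so
# AMINO[16*i + 4*j + k] is the amino acid for BASES[i] + BASES[j] + BASES[k].
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate the complete codons of an in-frame sequence; '*' marks a stop."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


# Excerpts transcribed from the cDNA sequence (assumed in frame from the ATG)
start = (
    "ATGGCCTCCAGGCTGACCCTGCTGACCCTC"
    "CTGCTGCTGCTGCTGGCTGGGGATAGAGCC"
)
end = "TTCATGGGGCGAGTATATGACCCCAGGGCCTGA"

print(translate(start))  # MASRLTLLTLLLLLLAGDRA (residues 1-20)
print(translate(end))    # FMGRVYDPRA* (residues 491-500, then the stop codon)
```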

Genomic structure The gene spans 17.5 kb and is encoded by 8 exons, as illustrated. The intron sizes range from 0.2 kb (intron 5) to 5.2 kb (intron 6).

[Figure: exon map of the C1 inhibitor gene; scale bar 1 kb]

Accession numbers (EMBL/GenBank)
Human M13656, X54486
Mouse Y10386

Deficiency Autosomal dominant (hereditary angioedema). Deficiency leads to uncontrolled activation of the complement system, resulting in consumption of C2 and C4, and of the contact system, resulting in release of bradykinin from high molecular weight kininogen. Patients have recurrent episodes of localized oedema that may affect the skin, gastrointestinal tract or larynx. Patients may also develop an SLE-like disorder secondary to the acquired C2 and C4 deficiencies. More than 50 different mutations leading to deficiency or dysfunction have been described.

Polymorphic variants G1473A; V480M3. The DNA encoding the V variant creates an HgiAI recognition sequence (GTGCTC).
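One practical consequence is that the two alleles can in principle be told apart by HgiAI digestion, since only the V allele carries the site. The sketch below checks a 20-nucleotide stretch spanning codons 478-483 for the HgiAI recognition sequence, which is usually written GWGCWC (W = A or T) and so includes GTGCTC. The V-allele stretch is transcribed from the cDNA sequence above; the M-allele stretch is constructed here by applying the single G to A change that turns GTG (Val) into ATG (Met). Names are illustrative.

```python
import re

# HgiAI recognition sequence GWGCWC, W = A or T (assumed from the usual
# restriction-enzyme notation; GTGCTC is one instance of it).
HGIAI = re.compile(r"G[AT]GC[AT]C")


def hgiai_sites(seq: str) -> list[int]:
    """Return 1-based start positions of HgiAI recognition sequences in seq."""
    return [m.start() + 1 for m in HGIAI.finditer(seq)]


# 20 nt spanning codons 478-483; codon 480 is GTG (Val) in the V allele
v_allele = "CCTCTTCGTGCTCTGGGACC"
# Constructed M allele: G>A at the first base of codon 480 (GTG -> ATG)
m_allele = "CCTCTTCATGCTCTGGGACC"

print(hgiai_sites(v_allele))  # [8]
print(hgiai_sites(m_allele))  # []
```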


References
1 Haupt, H. et al. (1970) Eur. J. Biochem. 17, 254-261.
2 Perkins, S.J. et al. (1990) J. Mol. Biol. 214, 751-763.
3 Bock, S.C. et al. (1986) Biochemistry 25, 4292-4301.
4 Odermatt, E. et al. (1981) FEBS Lett. 131, 283-285.
5 Ratnoff, O. and Lepow, I. (1957) J. Exp. Med. 106, 327-343.
6 Schreiber, A.D. et al. (1973) J. Clin. Invest. 52, 1402-1409.
7 Ratnoff, O. et al. (1969) J. Exp. Med. 129, 315-331.
8 Wuillemin, W.A. et al. (1995) Blood 85, 1517-1526.
9 Sim, R.B. et al. (1979) FEBS Lett. 97, 111-115.
10 Schapira, M. et al. (1982) J. Clin. Invest. 69, 462-468.
11 d'Agostini, A. et al. (1984) J. Clin. Invest. 73, 1542-1549.
12 Pixley, R.A. et al. (1985) J. Biol. Chem. 260, 1723-1729.
13 Johnson, A.M. et al. (1971) Science 173, 553-554.
14 Bensa, J.C. et al. (1983) Biochem. J. 216, 385-392.
15 Yeung, L.A. et al. (1985) Biochem. J. 226, 199-205.
16 Katz, Y. and Strunk, R. (1989) J. Immunol. 142, 2041-2045.
17 Schmaier, A.H. et al. (1989) J. Biol. Chem. 264, 18173-18179.
18 Endresen, G.K.M. (1980) Thromb. Res. 19, 157-163.
19 Schmaier, A.H. et al. (1985) J. Clin. Invest. 75, 242-250.
20 Walker, D.G. et al. (1995) Brain Res. 675, 75-82.
21 Gitlin, D. and Biasucci, A. (1969) J. Clin. Invest. 48, 1433-1446.
22 Lotz, M. and Zuraw, B.L. (1987) J. Immunol. 139, 3382-3387.
23 Hamilton, A.O. et al. (1987) Biochem. J. 242, 809-815.
24 Lappin, D. et al. (1990) Biochem. J. 268, 387-392.
25 Schmidt, B. et al. (1991) Immunology 74, 677-679.
26 Zuraw, B. and Lotz, M. (1990) J. Biol. Chem. 265, 12664-12670.
27 Davis III, A.E. et al. (1986) Proc. Natl Acad. Sci. USA 83, 3161-3165.
28 Tosi, M. et al. (1986) Gene 42, 265-272.
29 Carter, P. et al. (1991) Eur. J. Biochem. 197, 301-308.
30 Donaldson, V.H. and Evans, R.R. (1963) Am. J. Med. 35, 37-44.
31 Davis III, A.E. (1998) In The Human Complement System in Health and Disease (Volanakis, J.E. and Frank, M.M. eds). Marcel Dekker, New York, pp. 455-480.

Apolipoprotein J (clusterin) Mark E. Rosenberg, Division of Renal Diseases and Hypertension, University of Minnesota, Minneapolis, MN, USA

Other names Serum protein 40,40, complement lysis inhibitor (CLI), testosterone repressed prostate message 2 (TRPM-2), sulfated glycoprotein 2 (SGP-2), dimeric acidic glycoprotein (DAG), glycoprotein 80, NA1/NA2, glycoprotein III.

Physicochemical properties Apolipoprotein J is synthesized as a single polypeptide chain of 449 amino acids, including a 22 amino acid signal peptide. It is cleaved internally between R227 and S228 to yield two chains (α and β). It tends to aggregate into high molecular weight multimers.
Mature protein:
pI 4.9-5.4
Mr (K) predicted 46, 60 (glycosylated); observed 80
Amino acids α chain 23-227; β chain 228-449
N-linked glycosylation sites α chain: 3 (86, 103, 145); β chain: 4 (291, 317, 354, 374)
Interchain disulfide bonds α-β (highly conserved across species): 102-313, 113-305, 116-302, 121-295, 129-285

Structure Apolipoprotein J is a heterodimer whose chains are linked by a unique motif of five disulfide bonds. Carbohydrate accounts for 30% of the molecular mass. The N-linked carbohydrate is the site of sulfation, and glycosylation is heterogeneous at the different sites. Two clusterin signature motifs have been defined, each containing three cysteines involved in disulfide bonds: CKPCLKxTC (clusterin 1) and CL(R,K)M(R,K)x(E,Q)C(E,D)KC (clusterin 2). There are two α-helical regions that show some structural similarity to other α-helical coiled-coil structures, such as the heavy chain of myosin. The amino acid sequence is highly conserved between species.

Function The exact role of apolipoprotein J (clusterin) is unclear; however, it has been proposed to play a role in a number of pathways. In the regulation of complement activity it binds to several sites in the membrane attack complex and prevents its insertion into the cell membrane. It also prevents C9 polymerization. It circulates as a high density lipoprotein and may play a role in lipid transport and recycling. It may also function in cell adhesion, cell aggregation, and cell-cell and cell-substratum interactions. Other proposed functions include membrane protection, extracellular scavenging, apoptosis, endocrine secretion, reproduction, cell differentiation and morphogenesis, and cytoprotection. Apolipoprotein J (clusterin) interacts with: the endocytic receptor glycoprotein 330 (gp330), and cell lines expressing gp330 can endocytose and degrade clusterin; type I and type II TGF-β receptors; terminal complement components C7, C8 and C9; immunoglobulins (Fab and Fc fragments); apolipoprotein A-I; paraoxonase; and the soluble form of the amyloid β peptide. It also interacts with the surface of Staphylococcus aureus and with the streptococcal inhibitor of complement-mediated lysis (SIC) protein secreted by Streptococcus pyogenes.

Consensus regarding terminology is that clusterin will be used either as the name of this protein or in brackets when other names are used.

[Figure: domain diagram showing the heparin-binding domains and amphipathic helices]

Tissue distribution Physiologic fluids: plasma (100-300 µg/ml), semen (10-fold higher concentration than in serum), urine, cerebrospinal fluid and breast milk. Blood cells: megakaryocytes and platelets contain clusterin, primarily in α-granules (2.5 µg/10⁹ platelets)21,22. Primary sites of synthesis: liver, testes, epididymis, brain. Secondary sites: many organs, including stomach, brain, heart, lung, kidney and pancreas; not all cells in a given organ express clusterin (e.g. atria but not ventricles of the heart); clusterin is secreted across the apical surface of many cell types.

Regulation of expression Expression is induced following toxic or ischaemic tissue injury to the brain, kidney, heart and liver, and also after apoptosis in some tissues (e.g. androgen regulation in the male reproductive tract). Serum levels are increased in models of atherosclerosis, and clusterin is present in atherosclerotic plaques. It is specifically induced in several disease states, including plaques in Alzheimer's disease, retinitis pigmentosa and cystic renal diseases. Endotoxin and cytokines increase liver expression. Clusterin is variably regulated by TGF-β1, depending on the cell type.


Protein sequence27
MMKTLLLFVG LLLTWESGQV LGDQTVSDNE LQEMSNQGSK YVNKEIQNAV 50
NGVKQIKTLI EKTNEERKTL LSNLEEAKKK KEDALNETRE SETKLKELPG 100
VCNETMMALW EECKPCLKQT CMKFYARVCR SGSGLVGRQL EEFLNQSSPF 150
YFWMNGDRID SLLENDRQQT HMLDVMQDHF SRASSIIDEL FQDRFFTREP 200
QDTYHYLPFS LPHRRPHFFF PKSRIVRSLM PFSPYEPLNF HAMFQPFLEM 250
IHEAQQAMDI HFHSPAFQHP PTEFIREGDD DRTVCREIRH NSTGCLRMKD 300
QCDKCREILS VDCSTNNPSQ AKLRRELDES LQVAERLTRK YNELLKSYQW 350
KMLNTSSLLE QLNEQFNWVS RLANLTQGED QYYLRVTTVA SHTSDSDVPS 400
GVTEVVVKLF DSDPITVTVP VEVSRKNPKF METVAEKALQ EYRKKHREE

The signal sequence and the cleavage site (RS) between the two chains of clusterin are underlined. The N-linked glycosylation sites are indicated (N).
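Several numbers in this entry can be checked against each other with the sequence: the α/β cleavage between R227 and S228, the ten cysteines of the five interchain disulfide bonds, and the two clusterin signature motifs given under Structure. The sketch below does this with slicing and regular expressions. The sequence string is our transcription of the sequence above, reading the garbled block GVTEVWKLF as GVTEVVVKLF (which gives the stated 449 residues); the names are illustrative.

```python
import re

# Clusterin precursor (449 aa), transcribed from the protein sequence above.
# Assumption: the extracted block "GVTEVWKLF" is read as GVTEVVVKLF.
CLU = (
    "MMKTLLLFVGLLLTWESGQVLGDQTVSDNELQEMSNQGSKYVNKEIQNAV"
    "NGVKQIKTLIEKTNEERKTLLSNLEEAKKKKEDALNETRESETKLKELPG"
    "VCNETMMALWEECKPCLKQTCMKFYARVCRSGSGLVGRQLEEFLNQSSPF"
    "YFWMNGDRIDSLLENDRQQTHMLDVMQDHFSRASSIIDELFQDRFFTREP"
    "QDTYHYLPFSLPHRRPHFFFPKSRIVRSLMPFSPYEPLNFHAMFQPFLEM"
    "IHEAQQAMDIHFHSPAFQHPPTEFIREGDDDRTVCREIRHNSTGCLRMKD"
    "QCDKCREILSVDCSTNNPSQAKLRRELDESLQVAERLTRKYNELLKSYQW"
    "KMLNTSSLLEQLNEQFNWVSRLANLTQGEDQYYLRVTTVASHTSDSDVPS"
    "GVTEVVVKLFDSDPITVTVPVEVSRKNPKFMETVAEKALQEYRKKHREE"
)

SIGNAL_END = 22   # signal peptide = residues 1-22
ALPHA_END = 227   # alpha chain ends at R227; beta chain starts at S228

alpha = CLU[SIGNAL_END:ALPHA_END]  # residues 23-227
beta = CLU[ALPHA_END:]             # residues 228-449

# Interchain disulfide bonds (alpha Cys, beta Cys) from Physicochemical properties
DISULFIDES = [(102, 313), (113, 305), (116, 302), (121, 295), (129, 285)]

# Signature motifs from Structure: 'x' becomes '.', '(R,K)' becomes '[RK]'
MOTIFS = {
    "clusterin 1": r"CKPCLK.TC",
    "clusterin 2": r"CL[RK]M[RK].[EQ]C[ED]KC",
}


def motif_starts(seq: str, pattern: str) -> list[int]:
    """1-based start positions of non-overlapping matches of pattern in seq."""
    return [m.start() + 1 for m in re.finditer(pattern, seq)]


print(len(CLU), CLU[ALPHA_END - 1:ALPHA_END + 1], len(alpha), len(beta))  # 449 RS 205 222
print(all(CLU[a - 1] == "C" and CLU[b - 1] == "C" for a, b in DISULFIDES))  # True
for name, pattern in MOTIFS.items():
    print(name, motif_starts(CLU, pattern))  # clusterin 1 [113], then clusterin 2 [295]
```

The motif hits line up with the disulfide table: C113, C116 and C121 of motif 1 pair with C305, C302 and C295 of motif 2.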

Protein modules
Heparin-binding domain 67-82 exon 3
Heparin-binding domain 212-218 exon 5
Heparin-binding domain 424-430 exon 8
Heparin-binding domain 442-449 exon 8/9
Amphipathic helix 176-189 exon 5
Amphipathic helix 243-259 exon 5
Amphipathic helix 423-440 exon 8
Cysteine domain 104-131 exon 4
Cysteine domain 285-313 exon 6/7

Chromosomal location Human27,28: 8p21 (43% of the distance from the centromere to the telomere). Gene order: telomere ... NFL ... CLU ... LHRH ... centromere. Mouse: chromosome 14.

cDNA sequence^^ CGCGGACAGG GACTCTGCTG CCAGACGGTC TAAGGAAATT AAACGAAGAG TGCCCTAAAT TGAGACCATG GTTCTACGCA CCTGAACCAG GGAGAACGAC GTCCAGCATC CTACCACTAC CCGCATCGTC GTTCCAGCCC CAGCCCGGCC TGTGTGCCGG CAAGTGCCGG GCGGCGGGAG GCTGCTAAAG

GTGCCGCTGA CTGTTTGTGG TCAGACAATG CAAAATGCTG CGCAAGACAC GAGACCAGGG ATGGCCCTCT CGCGTCTGCA AGCTCGCCCT CGGCAGCAGA ATAGACGAGC CTGCCCTTCA CGCAGCTTGA TTCCTTGAGA TTCCAGCACC GAGATCCGCC GAGATCTTGT CTCGACGAAT TCCTACCAGT

CCGAGGCGTG GGCTGCTGCT AGCTCCAGGA TCAACGGGGT TGCTCAGCAA AATCAGAGAC GGGAAGAGTG GAAGTGGCTC TCTACTTCTG CGCACATGCT TCTTCCAGGA GCCTGCCCCA TGCCCTTCTC TGATACACGA CGCCAACAGA ACAACTCCAC CTGTGGACTG CCCTCCAGGT GGAAGATGCT

CAAAGACTCC GACCTGGGAG AATGTCCAAT GAAACAGATA CCTAGAAGAA AAAGCTGAAG TAAGCCCTGC AGGCCTGGTT GATGAATGGT GGATGTCATG CAGGTTCTTC CCGGAGGCCT TCCGTACGAG GGCTCAGCAG ATTCATACGA GGGCTGCCTG TTCCACCAAC CGCTGAGAGG CAACACCTCC

AGAATTGGAG AGTGGGCAGG CAGGGAAGTA AAGACTCTCA GCCAAGAAGA GAGCTCCCAG CTGAAACAGA GGCCGCCAGC GACCGCATCG CAGGACCACT ACCCGGGAGC CACTTCTTCT CCCCTGAACT GCCATGGACA GAAGGCGACG CGGATGAAGG AACCCCTCCC TTGACCAGGA TCCTTGCTGG

GCATGATGAA 60 TCCTGGGGGA 120 AGTACGTCAA 180 240 TAGAAAAAAC AGAAAGAGGA 300 GAGTGTGCAA 360 420 CCTGCATGAA TTGAGGAGTT 480 ACTCCCTGCT 540 600 TCAGCCGCGC 660 CCCAGGATAC 720 TTCCCAAGTC TCCACGCCAT 780 840 TCCACTTCCA ATGACCGGAC 900 ACCAGTGTGA 960 AGGCTAAGCT 1020 AATACAACGA 1080 AGCAGCTGAA 1140


cDNA sequence CGAGCAGTTT TCTGCGGGTC TGAGGTGGTC CTCCAGGAAG CAAAAAGCAC GAGTCCAGCT GTAACCAGGC TGCACTCTAA TAATTCAATA

continued

AACTGGGTGT ACCACGGTGG GTGAAGCTCT AACCCTAAAT CGGGAGGAGT CCCCCCAAGA CCCAGCCTCC CACTCGACTC AAACTGTCTT

CCCGGCTGGC CTTCCCACAC TTGACTCTGA TTATGGAGAC GAGATGTGGA TGAGCTGCAG AGGCCCCCAA TGCTGCTCAT GTGAGCTG

AAACCTCACG TTCTGACTCG TCCCATCACT CGTGGCGGAG TGTTGCTTTT CCCCCCAGAG CTCCGCCCAG GGGAAGAACA

CAAGGCGAAG GACGTTCCTT GTGACGGTCC AAAGCGCTGC GCACCTTACG AGAGCTCTGC CCTCTCCCCG GAATTGCTCC

ACCAGTACTA 1200 CCGGTGTCAC 1260 CTGTAGAAGT 1320 AGGAATACCG 1380 GGGGCATCTT 1440 ACGTCACCAA 1500 CTCTGGATCC 1560 TGCATGCAAC 1620

The first five nucleotides in each exon are underlined. The methionine initiation codon (ATG), the termination codon (TGA) and the polyadenylation signal (AATAAA) are indicated.

Genomic structure The gene spans 16.58 kb and is organized into 9 exons and 8 introns, as illustrated below. The exons range in size from 47 bp (exon I) to 412 bp (exon V). The introns range in size from 207 bp (intron 8) to 4377 bp (intron 6). The α-β cleavage site is in exon 5.

[Figure: exon-intron map of the clusterin gene; scale bar 1 kb]

Accession numbers Human Rat Mouse Dog Cow Pig Horse Quail (C. coturnix)

cDNA M64722 J0298 M64723 S70244 X84792 M38757 J05391 M84639 L46797

Genomic M63376-M63379 M64733 U02391 L08235

X80760

Deficiency No deficiency states defined.

Polymorphic variants There are three polymorphisms of serum clusterin, identified by isoelectric focusing, which do not affect lipid traits. Seven DNA sequence polymorphisms have been described, two of which alter the amino acid sequence.


A1001 to C (N317 to H), deleting a glycosylation signal; G1034 to A (D328 to N), adding a glycosylation signal. There is marked racial variability in allele frequency, and there is no association between these polymorphisms and Alzheimer's disease.

References
1 Fritz, I.B. (1992) Clin. Exp. Immunol. 88, 375.
2 DeSilva, H.V. et al. (1990) J. Biol. Chem. 265, 13240-13247.
3 Burkey, B.F. et al. (1991) J. Lipid Res. 32, 1039-1048.
4 DeSilva, H.V. et al. (1990) Biochemistry 29, 5380-5389.
5 Tsuruta, J.K. et al. (1990) Biochem. J. 268, 571-578.
6 Kirszbaum, L. et al. (1989) EMBO J. 8, 711-718.
7 Jordan-Starck, T.C. et al. (1992) Curr. Op. Lipidol. 3, 75-85.
8 Silkensen, J.R. et al. (1994) Biochem. Cell Biol. 72, 483-488.
9 Rosenberg, M.E. et al. (1995) Int. J. Biochem. Cell Biol. 27, 633-645.
10 Jenne, D.E. et al. (1992) TIBS 17, 154-159.
11 Fritz, I.B. et al. (1993) Trends Endocrinol. Metab. 4, 41-45.
12 Kounnas, M.Z. et al. (1995) J. Biol. Chem. 270, 13070-13075.
13 Reddy, K.B. et al. (1996) Biochemistry 35, 309-314.
14 Wilson, M.R. et al. (1992) Biochim. Biophys. Acta 1159, 319-326.
15 Jenne, D.E. et al. (1991) J. Biol. Chem. 266, 11030-11036.
16 Blatter, M.-C. et al. (1993) Eur. J. Biochem. 211, 871-879.
17 Ghiso, J. et al. (1993) Biochem. J. 293, 27-30.
18 Akesson, P. et al. (1996) J. Biol. Chem. 271, 1081-1088.
19 Partridge, S.R. et al. (1996) Infect. Immun. 64, 4324-4329.
20 Hogasen, K. et al. (1993) J. Immunol. Methods 160, 107-115.
21 Witte, D.P. et al. (1993) Am. J. Pathol. 143, 763-773.
22 Tschopp, J. et al. (1993) Blood 82, 118-125.
23 Aronow, B.J. et al. (1993) Proc. Natl Acad. Sci. USA 90, 725-729.
24 Jordan-Starck, T.C. et al. (1994) J. Lipid Res. 35, 194-210.
25 Hardardottir, I. et al. (1994) J. Clin. Invest. 94, 1304-1309.
26 Jin, G. et al. (1997) J. Biol. Chem. 272, 26620-26626.
27 Wong, P. et al. (1994) Eur. J. Biochem. 221, 917-925.
28 Purrello, M. et al. (1991) Genomics 10, 151-156.
29 Kamboh, M.I. et al. (1991) Am. J. Hum. Genet. 49, 1167-1173.
30 Tycko, B. et al. (1996) Hum. Genet. 98, 430-436.

Properdin

Native properdin^

Timothy Farries, Imutran Ltd (A Novartis Pharma AG Company), Cambridge, UK

Physicochemical properties Properdin is synthesized as a single-chain molecule of 469 amino acids, including a 27 amino acid leader sequence.
pI >9.5; 8.33 (theoretical)
Mr (K) predicted 48.5 (unglycosylated), 53.2 (glycosylated); observed 55
N-linked glycosylation site (occupied) 1 (428)

Structure Properdin is a polydisperse set of cyclic polymers (dimers, trimers and tetramers in the ratio 26:54:20, plus higher oligomers) formed by head-to-tail association of monomers. Each monomer is 26 × 2.5 nm by electron microscopy, confirmed by neutron and X-ray scattering.

Function


Properdin binds to C3b, with higher polyvalent avidity for clustered surface-bound C3b (larger polymers being more active) and higher affinity for C3b.Bb > C3b.B > C3b complexes. As a result it (1) inhibits cleavage of C3b by factor I, (2) increases the affinity of C3b for factor B and, most significantly, (3) increases the stability of C3bBb, inhibiting its dissociation into C3b + Bb. Consequently properdin promotes positive amplification of C3b deposition onto an activating surface.

Tissue distribution Serum protein: 4.3-5.7 µg/ml in plasma. Sites of synthesis: monocytes, T cells and granulocytes.

Regulation of expression Properdin synthesis by monocytic cell lines is upregulated by phorbol esters, bacterial LPS, IL-1β and TNFα. TNFα, C5a, IL-8 and fMLP also stimulate release of properdin stored in neutrophil granules.


Protein sequence 13,14
MITEGAQAPR LLLPPLLLLL TLPATGSDPV LCFTQYEESS GKCKGLLGGG 50
VSVEDCCLNT AFAYQKRSGG LCQPCRSPRW SLWSTWAPCS VTCSEGSQLR 100
YRRCVGWNGQ CSGKVAPGTL EWQLQACEDQ QCCPEMGGWS GWGPWEPCSV 150
TCSKGTRTRR RACNHPAPKC GGHCPGQAQE SEACDTQQVC PTHGAWATWG 200
PWTPCSASCH GGPHEPKETR SRKCSAPEPS QKPPGKPCPG LAYEQRRCTG 250
LPPCPVAGGW GPWGPVSPCP VTCGLGQTME QRTCNHPVPQ HGGPFCAGDA 300
TRTHICNTAV PCPVDGEWDS WGEWSPCIRR NMKSISCQEI PGQQSRGRTC 350
RGRKFDGHRC AGQQQDIRHC YSIQHCPLKG SWSEWSTWGL CMPPCGPNPT 400
RARQRLCTPL LPKYPPTVSM VEGQGEKNVT FWGRPLPRCE ELQGQKLVVE 450
EKRPCLHVPA CKDPEEEEL

The leader sequence is underlined, and the single N-linked glycosylation site (occupied) is indicated (N).

Protein modules
Leader peptide 1-27 exon 2
TSP1 (1) 77-134 exon 4
TSP1 (2) 135-191 exon 5
TSP1 (3) 192-255 exon 6
TSP1 (4) 256-313 exon 7
TSP1 (5) 314-377 exon 8
TSP1 (6) 378-437 exon 9/10
Polymerization and the ability to stabilize C3bBb are impaired by deletion of TSP1 (4), (5) or (6), but not by deletion of TSP1 (3).


Chromosomal location Human: short arm of the X chromosome, Xp11.3-Xp11.23.

cDNA sequence 13,14,17 GAGCCTATCA TCAACATGAT TGCTCACCCT CCTCCGGCAA ACACTGCCTT GATGGTCCCT TGCGGTACCG CCCTGGAGTG GGTCTGGCTG GCAGGCGAGC AGGAATCAGA GGGGCCCCTG CACGAAGCCG CGGGGCTAGC GCTGGGGGCC TGGAACAACG ATGCCACCCG

ACCCAGATAA CACAGAGGGA GCCAGCCACA GTGCAAGGGC TGCCTACCAG GTGGTCCACA GCGCTGTGTG GCAGCTCCAG GGGGCCCTGG CTGTAATCAC GGCCTGTGAC GACCCCCTGC CAAGTGTTCT CTAGGAGOAG TTGGGGCCCT GACGTGCAAT GACCCACATC

AGCGGGACCT GCGCAGGCCC GGCTCAGACC CTCCTGGGGG AAACGTAGTG TGGGCCCCCT GGCTGGAATG GCCTGTGAGG GAGCCTTGCT CCTGCTCCCA ACCCAGCAGG TCAGCCTCCT GCACCTGAGC CGGAGGTGCA GTGAGCCCCT CACCCTGTGC TGCAACACAG

CCTCTCTGGT CTCGATTGTT CCGTGCTCTG GTGGTGTCAG GTGGGCTCTG GTTCGGTGAC GGCAGTGCTC ACCAGCAGTG CTGTCACCTG AGTGTGGGGG TCTGCCCCAC GCCACGGTGG CCTCCCAGAA CCGGCCTGCC GGCCTGTGAC CCCAGCATGG CTGTGCCCTG

AGAGGTGCAG GCTGCCGCCG CTTCACCCAG CGTGGAAGAC TCAGCCTTGC GTGCTCTGAG TGGAAAGGTG CTGTCCTGAG CTCCAAAGGG CCACTGCCCA ACACGGGGCC ACCCCACGAA ACCTCCTGGG ACCCTGCCCA CTGTGCCCTG GGGCCCCTTC CCCTGTGGAT

GGGGCAGTAC 60 CTGCTCCTGC 120 TATGAAGAAT 180 TGCTGTCTCA 240 AGGTCCCCAC 300 360 GGCTCCCAGC GCACCTGGGA 420 ATGGGCGGCT 480 540 ACCCGGACCC GGACAGGCAC 600 TGGGCCACCT 660 CCTAAGGAGA 720 780 AAGCCCTGCC GTGGCTGGGG 840 GGCCAGACCA 900 TGTGCTGGCG 960 GGGGAGTGGG 1020



cDNA sequence continued


ACTCGTGGGG AAATCCCGGG GATGTGCCGG AAGGATCATG CTACCCGTGC CCATGGTCGA GTGAGGAGCT CTGCTTGCAA TGACCTTCCA

GGAGTGGAGC CCAGCAGTCA GCAACAGCAG GTCAGAGTGG CCGCCAGCGC AGGTCAGGGC ACAAGGGCAG AGACCCTGAG AACCTCAATA

CCCTGTATCC GACGGAACAT CGCGGGAGGA CCTGCAGGGG GATATCCGGC ACTGCTACAG AGTACCTGGG GGCTGTGCAT CTCTGCACAC CCTTGCTCCC GAGAAGAACG TGACCTTCTG AAGCTGGTGG TGGAGGAGAA GAAGAGGAACT CTAACACTT AACTAGCCTCT TCGAAAAAA

GAAGTCCATC CCGCAAGTTT CATCCAGCAC GCCCCCCTGT CAAGTACCCG GGGGAGACCG ACGACCATGT CTCTCCTCCA AAAAAAAAAA

AGCTGTCAAG GACGGACATC TGCCCCTTGA GGACCTAATC CCCACCGTTT CTGCCACGGT CTACACGTGC CTCTGAGCCC AAA

1080 1140 1200 1260 1320 1380 1440 1500

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAA) and the probable polyadenylation signal (AATAAA) are indicated.

Genomic structure18 The gene spans 6 kb and is encoded by 10 exons, as illustrated below. The introns vary in size from 0.1 to 1.6 kb.

[Figure: exon map of the properdin gene, exons 1 to 10; scale bar 1 kb]

Accession numbers Human Mouse Guinea-pig20

X57748 M83652 S49355 X12905 S81116

Deficiency X-linked. Defective alternative pathway function, resulting in highly impaired bactericidal activity. Patients are highly susceptible to fulminant meningococcal infections. Mutations identified:
Type I (complete deficiency): C546 to T; R161 to stop
Type II (partial deficiency): C363 to T; R100 to W
Type III (dysfunctional protein): T1305 to G; Y414 to D

Polymorphic variants None known.
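The reported nucleotide and amino-acid changes together constrain which codon was affected. The sketch below enumerates, under the standard genetic code, every codon and codon position consistent with each reported pair of changes. The resulting codons are deductions from the genetic code, not data from the source, and the cDNA numbering (C546, C363, T1305) is not used; the names are illustrative.

```python
# Standard genetic code; AMINO[16*i + 4*j + k] encodes BASES[i] + BASES[j] + BASES[k].
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def consistent_codons(old_aa, old_base, new_base, new_aa):
    """(codon, 1-based position) pairs where old_base -> new_base turns old_aa into new_aa."""
    hits = []
    for codon, aa in CODE.items():
        if aa != old_aa:
            continue
        for pos, base in enumerate(codon):
            if base != old_base:
                continue
            mutant = codon[:pos] + new_base + codon[pos + 1:]
            if CODE[mutant] == new_aa:
                hits.append((codon, pos + 1))
    return hits


print(consistent_codons("R", "C", "T", "*"))  # R161 to stop: [('CGA', 1)]
print(consistent_codons("R", "C", "T", "W"))  # R100 to W: [('CGG', 1)]
print(consistent_codons("Y", "T", "G", "D"))  # Y414 to D: [('TAT', 1), ('TAC', 1)]
```

So, under these assumptions, the nonsense change can only be CGA to TGA and the R100W change only CGG to TGG, both at the first codon position; for Y414D either tyrosine codon fits.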


References
1 Farries, T.C. et al. (1987) Biochem. J. 243, 507-517.
2 Fearon, D.T. and Austen, K.F. (1975) J. Exp. Med. 142, 856-863.
3 Nolan, K.F. and Reid, K.B.M. (1990) Biochem. Soc. Trans. 18, 1161-1162.
4 Smith, C.A. et al. (1984) J. Biol. Chem. 259, 4582-4588.
5 Pangburn, M.K. (1989) J. Immunol. 142, 202-207.
6 Smith, K.F. et al. (1991) Biochemistry 30, 8000-8008.
7 Farries, T.C. et al. (1988) Biochem. J. 252, 47-54.
8 Nolan, K.F. and Reid, K.B.M. (1993) Methods Enzymol. 223, 35-46.
9 Whaley, K. (1980) J. Exp. Med. 151, 501-516.
10 Schwaeble, W. et al. (1993) J. Immunol. 151, 2521-2528.
11 Wirthmueller, U. et al. (1997) J. Immunol. 158, 4444-4451.
12 Schwaeble, W. et al. (1994) Eur. J. Biochem. 219, 759-764.
13 Nolan, K.F. et al. (1992) Biochem. J. 287, 291-297.
14 Nolan, K.F. et al. (1991) Eur. J. Immunol. 21, 771-776.
15 Higgins, J.M. et al. (1995) J. Immunol. 155, 5777-5785.
16 Goundis, D. et al. (1989) Genomics 5, 56-60.
17 Maves, K.K. and Weiler, J.M. (1992) J. Lab. Clin. Med. 120, 761-766.
18 Fredrikson, G.N. et al. (1996) J. Immunol. 157, 3666-3671.
19 Goundis, D. and Reid, K.B.M. (1988) Nature 335, 82-85.
20 Maves, K.K. et al. (1995) Immunology 86, 475-479.
21 Westberg, J. et al. (1995) Genomics 29, 1-8.

CD59

B. Paul Morgan, Department of Medical Biochemistry, University of Wales, Cardiff, UK

Other names P-18, membrane inhibitor of reactive lysis (MIRL), homologous restriction factor 20 (HRF-20), membrane attack complex inhibitory factor (MACIF), protectin.

Physicochemical properties CD59 is synthesized as a 128 amino acid precursor, including a 25 amino acid leader peptide at the N-terminus and a 26 amino acid signal for glycosylphosphatidylinositol (GPI) anchor addition at the C-terminus. The mature protein consists of 77 amino acids, the GPI anchor attachment site being at N102.
Mr (K) predicted 11.5; observed 18-23
N-linked glycosylation site 1 (43)

Structure CD59 is a compact, disc-shaped structure with four loops, created by intramolecular disulfide bonds, projecting from the disc. A large N-linked carbohydrate group, placed laterally, contains variable structures in the Mr range 4000-6000. A flexible seven amino acid stalk extends from C94 to N102, the site of GPI anchor addition. The C8/C9-binding site has been putatively localized to a hydrophobic groove on the concave upper face of the disc, centred around W65. CD59 shows sequence and structural similarities with the murine Ly-6 antigens and a number of other molecules now grouped in the 'Ly-6 multigene family'.

Function CD59 binds C8 in the forming membrane attack complex (MAC) and blocks the recruitment of multiple C9 molecules necessary for assembly of the MAC pore. It may also bind C9 in partially assembled MACs. It has a proposed role in signalling cell activation upon assembly of the MAC. Suggested roles as a ligand for CD2 and as an inhibitor of perforin have not been supported.

Tissue distribution Widely expressed; present on all circulating cells, vascular endothelium, epithelia and in most tissues. Weakly expressed in the central nervous system. Fluid-phase forms are found in urine and some other biological fluids.

Regulation of expression Little studied. Expression in vitro is enhanced by incubation of cells with phorbol esters.

Protein sequence10
MGIQGGSVLF GLLLVLAVFC HSGHSLQCYN CPNPTADCKT AVNCSSDFDA 50
CLITKAGLQV YNKCWKFEHC NFNDVTTRLR ENELTYYCCK KDLCNFNEQL 100
ENGGTSLSEK TVLLLVTPFL AAAWSLHP

The N-terminal leader peptide is underlined and the single N-linked glycosylation site is indicated (N). The site of GPI anchor addition is indicated by N. This sequence does not include the translated product of exon 2.
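This entry numbers residues from the start of the 128-residue precursor (N43, W65, C94, N102). The sketch below removes the 25-residue leader and the 26-residue GPI-anchor signal to obtain the 77-residue mature protein, and converts precursor positions to mature-protein numbering. The sequence is our transcription of the sequence above, reading the garbled block E^GGTSLSEK as ENGGTSLSEK so that the GPI-anchor site N102 falls where the text places it; the names are illustrative.

```python
# CD59 precursor (128 aa), transcribed from the protein sequence above.
# Assumption: the extracted block "E^GGTSLSEK" is read as ENGGTSLSEK.
CD59 = (
    "MGIQGGSVLFGLLLVLAVFCHSGHSLQCYNCPNPTADCKTAVNCSSDFDA"
    "CLITKAGLQVYNKCWKFEHCNFNDVTTRLRENELTYYCCKKDLCNFNEQL"
    "ENGGTSLSEKTVLLLVTPFLAAAWSLHP"
)
LEADER = 25      # N-terminal leader peptide, residues 1-25
GPI_SIGNAL = 26  # C-terminal GPI-anchor addition signal, residues 103-128

mature = CD59[LEADER:len(CD59) - GPI_SIGNAL]  # residues 26-102


def to_mature(precursor_pos: int) -> int:
    """Convert a 1-based precursor position to mature-protein numbering."""
    return precursor_pos - LEADER


print(len(CD59), len(mature), mature[0], mature[-1])  # 128 77 L N
for pos in (43, 65, 102):
    print(pos, CD59[pos - 1], to_mature(pos))  # 43 N 18, then 65 W 40, then 102 N 77
```

In mature numbering the glycosylated asparagine is therefore N18, the residue at the centre of the putative C8/C9-binding groove is W40, and the GPI-anchored residue N77 is the C-terminus of the mature protein.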

Chromosomal location Human: 11p14-p13. Mouse: chromosome 2, E2-E4.

cDNA sequence1,10,13 CGCAGAAGCG GGTGTAGGAG GGAGGGTCTG AGCCTGCAGT TCATCTGATT TGGAAGTTTG ACGTACTACT ACATCCTTAT AGCCTTCATC TCCGCTTTCT GAAAGAATAA GACCAGTCCT GTGACTTGAA ACAGCTTGAG GTCAGTTAGC CTCACATGGA TGTTCCATAT TCTGGCAGGG AGGTACAAGT TATCTTCCAC

GCTCGAGGCT TTGAGACCTA TCCTGTTCGG GCTACAACTG TTGATGCGTG AGCATTGCAA GCTGCAAGAA CAGAGAAAAC CCTAAGTCAA CTTGCTGCCA AATTAGCTTG GCCCGCAGGG CTAGATTGCA TGGGTTCTCT ATCATTAGTA ACGCTTTCAT GTGGGTGTCA AAGTGGGGAA GGCTGAAAAT TGGAAAAGTG

GGAAGAGGAT CTTCACAGTA GCTGCTGCTC TCCTAACCCA TCTCATTACC TTTCAACGAC GGACCTGTGT AGTTCTTCTG CACCAGGAGA CATTCTAAAG AGCAACCTGG AAGCCCCACT TGCTTCCTCC GCAGCCCTCA CATCTTTGGA AAACTTCAGG GTCAGGGACA GTGTTCCAGA CGAGTTTTTC TAATAGCATA

CCTGGGCGCC GTTCTGTGGA GTCCTGGCTG ACTGCTGACT AAAGCTGGGT GTCACAACCC AACTTTAACG CTGGTGACTC GCTTCTCCCA GCTTGATATT CTAAGATAGA TGAAGGAAGA TTTGCTCTTG GATTATTTTT GGGTGGGGCA GATCCCGTGT ACAAGATCCT TTCCAGATAG CTCTGTCTTT CATCAATGGT

GCCAGTCTTT CAATCACAAT TCTTCTGCCA GCAAAACAGC TACAAGTGTA GCTTGAGGGA AACAGCTTGA CATTTCTGGC AACTCCCCGT TTCCAAATGG GGGGTCTGGG AGTCTAAGAG GGAAGACCAG CCTCTGGCTC GGAGTATATG TGCCATGGAG TAATGCAGAG CAGGGCATGA AAATTTTATA GTGTT

AGCACCAGTT 60 GGGAATCCAA 12 0 TTCAGGTCAT 180 CGTCAATTGT 2 40 TAACAAGTGT 3 00 AAATGAGCTA 3 60 AAATGGTGGG 42 0 AGCAGCCTGG 480 TCCTGCGTAG 540 ATCCTGTTGG 600 AGACTTTGAA 660 TGAAGTAGGT 72 0 CTTTGCAGTG 7 80 CTTGGATGTA 84 0 AGCATCCTCT 900 GCATGCCAAA 9 60 CTAGAGGACT 102 0 AAACTTAGAG 108 0 TGGGCTTTGT 112 0

The first five nucleotides in each exon are underlined to indicate the intron-exon boundaries. The methionine initiation codon (ATG), the termination codon (TAA) and the first polyadenylation signal (AATAAA) are indicated. Exon 2 is alternatively spliced, and only a minority of total mRNA (10-20%) contains this exon. At least four species of mRNA have been identified, with mobilities of 0.6 kb, 1.2 kb, 1.9 kb and 2.2 kb, differing only in the degree of polyadenylation.

Genomic structure The gene spans approximately 26 kb and contains 5 exons, including the alternatively spliced exon 2.

[Figure: exon map of the CD59 gene, exons 1 to 5; scale bar 2 kb]

Accession numbers (EMBL/GenBank) Human Primate Rat Mouse Pig Rabbit

X15861 X16447 L22860 U48255 U60473 AF020302 AF040387

Deficiency A single case of complete deficiency has been reported, presenting with nocturnal haemoglobinuria and multiple thrombotic episodes. The defect is caused by deletion of C231 in the coding region, leading to premature termination; on the same allele, G469 is also deleted. CD59 (together with other GPI-anchored proteins) is deficient on a clone of circulating cells in paroxysmal nocturnal haemoglobinuria.


Polymorphic variants None reported in coding region.

References
1 Davies, A. et al. (1989) J. Exp. Med. 170, 637-654.
2 Rudd, P.M. et al. (1997) J. Biol. Chem. 272, 7229-7244.
3 Fletcher, C.M. et al. (1994) Structure 2, 185-199.
4 Kieffer, B. et al. (1994) Biochemistry 33, 4471-4482.
5 Meri, S. et al. (1990) Immunology 71, 1-9.
6 Rollins, S.A. and Sims, P.J. (1990) J. Immunol. 144, 3478-3483.
7 Morgan, B.P. et al. (1993) Eur. J. Immunol. 23, 2841-2850.
8 Nose, M. et al. (1990) Immunology 70, 145-149.
9 Meri, S. et al. (1991) Lab. Invest. 65, 532-537.
10 Okada, H. et al. (1989) Biochem. Biophys. Res. Commun. 162, 1553-1559.
11 Bickmore, W. et al. (1993) Genomics 17, 129-135.
12 Powell, M.B. et al. (1997) J. Immunol. 158, 1692-1702.
13 Holguin, M.H. et al. (1996) J. Immunol. 157, 1659-1668.
14 Tone, M. et al. (1992) J. Mol. Biol. 227, 971-976.
15 Petranka, J.G. et al. (1992) Proc. Natl Acad. Sci. USA 89, 7876-7879.
16 Motoyama, N. et al. (1992) Eur. J. Immunol. 22, 2669-2673.
17 Yamashina, M. et al. (1990) N. Engl. J. Med. 323, 1184-1189.


Index Underlined type refers to complement main entries. Acute-phase proteins, 32 Adaptive immune response amplification, 18-19 Adipsin, see Factor D Adult respiratory distress syndrome (ARDS), 42, 44 AIDS, 34 'Alexin', 7 Alternative pathway, 16 Alzheimer's disease, 214 'Amboreceptors', 7 Amyloid A, 26 Anaphylatoxins, 18, 20, 21 see also C3aR Angioedema, 20, 166, 208 Antibodies, and the complement system, 18-19 Apolipoprotein J (clusterin), 210-214 Apoptosis, 20-21 Artificial membranes, 21 Asthma, 49 AZ3B, see C3aR B, see Factor B Bacterial sepsis, 20-21 see also specific bacteria by name Bacteriolysis, 7 B cells, 19, 150 Behçet's disease, 166 Beta (β) 1H, see Factor H Bordet, J., 7 Bovine conglutinin, see Conglutinin see also Collectins Bradykinin, 20, 208 'Bunch of tulips' structure, for C1q, 9, 26; for MBL, 31 Bystander lysis, 18 C1 complex (C1q-C1r-C1s), 15 C1 inactivator, see C1 inhibitor

C1 inhibitor (C1INH), 15, 206-209 C1 inhibitor deficiency-linked disease, 20 C1q, 9, 15, 26-30 see also Collectins C1qA, 28 C1qB, 28 C1qC, 29 C1qRp, 176-179 see also Cell surface receptors C1r, 15, 52-55 see also Serine proteases C1s, 15, 56-60 see also Serine proteases C2, 73-77 see also Serine proteases C2a, 15, 17 C2b, 15 C3, 17, 88-94 thioester bond in, 11 see also C3 family C3a, 15 C3aR, 180-183 see also Cell surface receptors C3b, 15, 17 C3b*, 15, 17 C3bB, 17 C3bBb, 17 C3b/C4b receptor, see CR1 C3bi-receptor, see CR3 C3 convertase activator, see Factor D C3 deficiency, 20 C3 family, 10-11 C3(H2O), 16-17 C3(H2O)Bb complex, 17 C3 proactivator, see Factor B C3dR, see CR2 C3 family, 88-109 C3, 88-94 C4, 95-103

C5, 104-109 C4, 15, 95-103 thioester bond in, 11 see also C3 family

C4a, 15 C4b, 15 C4b*, 15 C4b-binding protein, see C4BP C4b-bp, see C4BP C4BP, 15, 161-167 see also Regulators of complement activation (RCA) C4-bp, see C4BP C5, 104-109 see also C3 family C5a, 17 C5aR, 184-187 see also Cell surface receptors C5b, 17 C5b*, 17 C5b67, 18 C6, 112-116 see also Terminal pathway components C7, 117-122 see also Terminal pathway components C8, 123-130 see also Terminal pathway components C9, 131-134 see also Terminal pathway components CD4+ lymphocytes, 32 CD11b/CD18, see CR3 CD11c/CD18, see CR4 CD18, see CR3 and also CR4 CD21, see CR2 CD35, see CR1 CD46, see MCP CD55, see DAF CD59, 18 CD87, see Urokinase plasminogen activator receptor CD88, see C5aR Cell lysis, 18 Cell surface receptors, 176-203 C1qRp, 176-179 C3aR, 180-183 C5aR, 184-187 CR3, 188-197 CR4, 198-203 'Cellular theory', 7

Chain association, in collectins, 8 Chemotaxis, 109 Chromosome 1q32, 12 Chymotrypsin family, 9 Classical pathway, 15 Clusterin, see Apolipoprotein J (clusterin) Collectins, 9, 26-50 C1q, 26-30 chain association, 8 conglutinin, 36-40 MBL, 31-35 SP-A, 41-45 SP-D, 46-50 Complement activation regulators, see Regulators of complement activation (RCA) Complement component C3d/Epstein-Barr virus receptor 2, see CR2 Complement control protein modules (CCP), 12 Complement deficiency, 70 Complement lysis inhibitor (CLI), see Apolipoprotein J (clusterin) Complement pathways, 13-18 alternative, 16 classical, 15 lectin, 15 terminal, 17-18 Complement receptor type 1, see CR1 Complement receptor type 2, see CR2 Complement receptor type 3, see CR3 Complement receptor type 4, see CR4 Complement system, 7-22 activation of (summary), 14 and disease, 19-20 function of, 19-20 history of, 7 molecular structure of components, 9 and tissue injury, 20-21 Conglutinin, 9, 36-40 see also Collectins Core-specific lectin, see MBL CP4, see SP-D CR1, 136-145 see also Regulators of complement activation (RCA) C1r, 52-55

CR2, 146-151 see also Regulators of complement activation (RCA) CR3, 188-197 see also Cell surface receptors CR4, 198-203 see also Cell surface receptors C-reactive protein, 13 Cromer antigens, 154 D, see Factor D DAF, 15, 17, 152-155 see also Regulators of complement activation (RCA) DAG, see Apolipoprotein J (clusterin) Decay-accelerating factor, see DAF Dendritic cells, 19 Dialysis, 21 Dimeric acidic glycoprotein (DAG), see Apolipoprotein J (clusterin) Discoid lupus erythematosus, 102 see also Systemic lupus erythematosus Disease, and the complement system, 19-20 Disulfide pattern, of SP-A, 8 DNA, 26 EC 3.4.21.41, see C1r EC 3.4.21.42, see C1s EC 3.4.21.43, see C2 EC 3.4.21.45, see Factor I EC 3.4.21.46, see Factor D EC 3.4.21.47, see Factor B Epstein-Barr virus receptor 2, see CR2 Escherichia coli, 36, 71 Epidermal growth factor, 12 Esterase, C1 inhibitor, see C1 inhibitor Factor XIIa, 20 Factor B, 12, 15, 17, 78-82 see also Serine proteases Factor D (adipsin), 17, 69-72 see also Serine proteases Factor H, 15, 17, 20, 168-173 see also Regulators of complement activation (RCA)

Factor I, 15, 17, 20, 83-86 see also Serine proteases Fc receptors, 21 FFl, see Factor H Follicular dendritic cells, 19 Fulminant meningococcal infection, 217 see also Meningitis Germinal centres, 19 Glomerulonephritis, see under Nephritis Glycoprotein 2, sulfated, see Apolipoprotein J (clusterin) Glycoprotein III, see Apolipoprotein J (clusterin) Glycoprotein 45-70, see MCP Glycoprotein 80, see Apolipoprotein J (clusterin) Gonorrhoea, 109 see also under Neisseria gp45-70, see MCP Haemolysis, 7 Haemodialysis, 21 Heart disease, 21 Heart-lung bypass, 21 Heparin, 26 Herpes simplex-2, 36 History, of complement research, 7 HIV, 32, 34, 150 Homologous restriction factor 20 (HRF20), see CD59 Host defences, 18 HRF-20, see CD59 Human C1q/MBL/SP-A receptor, see C1qRp Humoral theory, 7 Hypersensitivity, 19 Hypocomplementaemic renal disease, 172 I, see Factor I IgG, 15, 26 IgM, 15, 26 Immune adherence receptor, see CR1 'Immune bodies', 7

Immunoglobulins, 15, 26 Immune system, and complement, 18-19 Infectious disease, 20 Neisseria, 20 see also under Neisseria pyogenic, with C3 deficiency, 20 pyogenic, with partial MBL deficiency, 20 Inflammatory injury, 20-21 Influenza A virus, 31, 36 Insulin, 70 αMβ2-Integrin, see CR3 αXβ2-Integrin, see CR4 Integrins, see CR3 and also CR4 Interferon-γ, 70, 206 Interleukin 6, 206 Kallikrein, 20 Kidney, immune complex disease, 102 see also under Nephritis Kininogen, 208 Kinins, 20 Knops antigen, 143 Lectin, see MBL (mannose-binding lectin) Lectin pathway, 15 LeuCAM, see CR3 Leukocyte adhesion deficiency (LAD), 195 Leukocyte integrin, see CR3 Leukotrienes, 18 LFA-1, 189 Local tissue injury, 20-21 Low-density lipoprotein, in terminal components, 12 Lytic pathway, see Terminal pathway Mac-1, see CR3 MACIF, see CD59 α2-Macroglobulin, 11 Magnesium ions, 188 Manganese ions, 188 Mannan-binding lectin (MBL)-associated serine protease 1, see MASP-1

Mannan-binding lectin (MBL)-associated serine protease 2, see MASP-2 Mannan-binding protein (lectin), see MBL Mannose-binding lectin, see MBL MASP-1, 15-16, 31, 61-64 see also Serine proteases MASP-2, 15-16, 31, 65-68 see also Serine proteases MBL, 9, 15-16, 31-35 see also Collectins MBL deficiency, 20 McCoy antigen, 143 MCP, 17, 156-160 see also Regulators of complement activation (RCA) Measles, 157 Measles virus receptor, see MCP Membrane attack complex (MAC), 12, 18, 20 Membrane attack complex inhibitory factor (MACIF), see CD59 Membrane cofactor protein, see MCP Membrane inhibitor of reactive lysis (MIRL), see CD59 Membranes, artificial, 21 Membranoproliferative glomerulonephritis, see under Nephritis Meningitis, 109, 115, 120, 128, 134, 172, 217 Meningococcal meningitis, see under Meningitis Metal ion-dependent adhesion site (MIDAS), 188 Metchnikoff, Elie, 7 MIRL, see CD59 Myasthenia gravis, 21 Myocardial infarction, 21

NA1/NA2, see Apolipoprotein J (clusterin) Native properdin, see Properdin Neisseria gonorrhoeae, 71, 115, 120, 134 Neisseria meningitidis, 71 Neisseria spp., 20, 157 Nephritis, 21, 30, 76, 93, 172

Nocturnal haemoglobinuria, 221 see also Paroxysmal nocturnal haemoglobinuria OKM-1, see CR3 Opsonins, 18 P-18, see CD59 P100, see MASP-1 p150,95 antigen, see CR4 Paraoxonase, 211 Paxillin, 189 Perforin, 12 Phagocytosis, enhancement, see C1qRp Plasmin, 20 Proline-rich protein (PRP), see C4BP Properdin, 7, 215-218 Properdin factor B, see Factor B Prostaglandins, 18 Protectin, see CD59 Protein tyrosine kinases (PTKs), 189 PSAP, see SP-A PSPD, see SP-D Pulmonary surfactant (glyco)protein A, see SP-A Pulmonary surfactant (glyco)protein D, see SP-D Proteases, see Serine proteases Proteinuria, 21 Pyogenic infections, 20, 55, 59, 76, 93 RaRF (Ra-reactive factor), see MBL Regulators of complement activation (RCA), 12-13, 136-173 C4BP, 161-167 CR1, 136-145 CR2, 146-151 DAF, 152-155 Factor H, 168-173 MCP, 156-160 Retinoic acid, 190 Rheumatoid arthritis, 21 Salmonella montevideo, 31 Salmonella typhimurium, 36 'Sensitizer', 7 Sepsis, 20-21

Serine proteases, 9-10 C1r, 52-55 C1s, 56-60 C2, 73-77 Factor B, 78-82 Factor D, 69-72 Factor I, 83-86 MASP-1, 61-64 MASP-2, 65-68 Serum protein 40,40, see Apolipoprotein J (clusterin) SFTPA1, see SP-A SFTPA2, see SP-A SFTPD, see SP-D SGP-2, see Apolipoprotein J (clusterin) Short consensus repeat (SCR), see CCP Sialic acid, 17 SP-A, 8, 41-45 see also Collectins SP-D, 46-50 see also Collectins 11S protein, see C1q and also Collectins Spontaneous inherited complement deficiency, 20 Ss(C4)-binding protein, see C4BP Ss protein (mouse), see C4 Staphylococcus aureus, 42, 211 Streptococcus pyogenes, 157, 211 Sulphated glycoprotein 2, see Apolipoprotein J (clusterin) Surfactant protein A, see SP-A Surfactant protein D, see SP-D Swain-Langley antigen, 143 Systemic diseases, 20-21 Systemic lupus erythematosus (SLE), 20, 30, 55, 59, 76, 93, 102, 143, 150, 172, 195, 208 Systemic vasculitis, 21

Terminal pathway, 16 Terminal pathway components (C6, C7, C8, C9), 12 EGF-like repeat, 12 LDL receptor class A repeat, 12 perforin-like segment in, 12 thrombospondin type 1 repeat in, 12

Testosterone-repressed prostate message 2, see Apolipoprotein J (clusterin) Thioester bonding, 11, 15 Thrombosis, 221 Thrombospondin, 12 'Tickover' hypothesis, 16 Tissue injury, 20-21 Transforming growth factor-β receptors, 211 TRPM-2, see Apolipoprotein J (clusterin) Trypsin subfamily, 9 Tumour necrosis factor-α, 206 Tyrosine kinase, 189

Urokinase plasminogen activator receptor, 189 Vasculitis, 76 Vav, 189 Vav-p21 (ras), 189 York antigen, 143 Zymogen, 16 Zymosan, 7