Signaling and Switching for Packet Telephony
For a listing of recent titles in the Artech House Telecommunications Library, turn to the back of this book.
Signaling and Switching for Packet Telephony Matthew Stafford
Artech House, Inc. Boston • London www.artechhouse.com
Library of Congress Cataloguing-in-Publication Data
Stafford, Matthew.
Signaling and switching for packet telephony / Matthew Stafford.
p. cm.—(Artech House telecommunications library)
Includes bibliographical references and index.
ISBN 1-58053-736-7 (alk. paper)
1. Internet telephony. 2. Packet switching (Data transmission) I. Title. II. Series.
TK5105.8865.S73 2004
621.382'16—dc22
2004053829 CIP
British Library Cataloguing in Publication Data
Stafford, Matthew
Signaling and switching for packet telephony.—(Artech House telecommunications library)
1. Internet telephony 2. Packet switching (Data transmission)
I. Title
621.3'85
ISBN 1-58053-736-7

Cover design by Yekaterina Ratner
© 2004 Matthew Stafford and Cingular Wireless. All rights reserved.
Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher. All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark. Library of Congress Cataloguing-in-Publication Number: 2004053829 International Standard Book Number: 1-58053-736-7 10 9 8 7 6 5 4 3 2 1
Contents

Acknowledgments

CHAPTER 1  Introduction
1.1  In the Beginning, There was Voice
1.2  Motivation: What Is the Case for Packet Telephony?
    1.2.1  One Network Versus Two
    1.2.2  Services
1.3  Switch Design
    1.3.1  Separating Bearer and Control Planes
1.4  Motive and Opportunity for Carriers
1.5  What Are We Waiting For?
1.6  Motivation for this Book

PART I  Switching Architectures for Packet Telephony: An Expository Description

CHAPTER 2  Essentials of Next Generation Switching
2.1  Another Look at the Backhaul Example
2.2  Ability to Enter New Markets
2.3  Switch Components and Terminology
    2.3.1  Where Does One Switch Component End and Another Component Begin?
2.4  A Useful Abstraction
2.5  Defining the Fabric
    2.5.1  Do Control Messages Between Media Gateways and Their Controller Pass Through the Switch Fabric?
    2.5.2  What Is a Packet?

CHAPTER 3  Motivation for Packet Telephony Revisited
3.1  Separation of Bearer and Control
    3.1.1  Open Interfaces
    3.1.2  Introducing and Maintaining Services
    3.1.3  New Bearer Types
3.2  Packet Fabrics
    3.2.1  Exploiting Routing Intelligence of Packet Networks
    3.2.2  Exploiting Low Bit-Rate Voice Codecs

CHAPTER 4  Signaling and Services
4.1  The Control Plane
4.2  What Is a Service?
    4.2.1  Vertical Services
    4.2.2  Services that Offer Alternative Billing Schemes
    4.2.3  Short Message Service
4.3  Where Do Services "Live," and What Do They Entail?
    4.3.1  Can You Say "Database?"
4.4  Limitations of Circuit-Switched Networks

PART II  Components of Packet Telephony: Technical Descriptions

CHAPTER 5  Introduction to Part II
5.1  Selected Telco Terminology

CHAPTER 6  Protocols
6.1  What Is a Protocol Stack?
    6.1.1  Comparison with Last In, First Out Data Structures
6.2  Generic Layer Descriptions
    6.2.1  Data Link Layer
    6.2.2  Network Layer
    6.2.3  Transport Layer
    6.2.4  A Note on Terminology: Packets and Frames
    6.2.5  General Comments
6.3  Internet Protocol and Transmission Control Protocol
    6.3.1  What Is an Internet Protocol Router?
    6.3.2  A Brief Look at TCP
    6.3.3  TCP/IP Networking Illustration
    6.3.4  Alternatives to TCP at Level 4: UDP and SCTP
6.4  What Is a Finite State Machine?
    6.4.1  States
    6.4.2  State Transitions
    6.4.3  Additional Comments
6.5  Signaling System 7 in Brief
    6.5.1  MTP2
    6.5.2  MTP3
    6.5.3  SCCP
    6.5.4  TCAP
    6.5.5  MAP
    6.5.6  ISUP
6.6  Summary
References

CHAPTER 7  A Closer Look at Internet Protocol
7.1  The IPv4 Header
    7.1.1  Fragmentation and Path MTU Discovery
7.2  The IPv6 Header
    7.2.1  IPv6 Extension Headers
7.3  Addressing and Address Resolution
    7.3.1  Conserving IPv4 Address Space
    7.3.2  The IPv6 Address Space
    7.3.3  Uniform Resource Identifiers and Domain Name System
7.4  Security and AAA
    7.4.1  Security
    7.4.2  Authentication, Authorization, and Accounting
7.5  Routing
    7.5.1  Network Optimization
    7.5.2  Internet Routing Protocols
    7.5.3  A Link State Protocol: OSPF
    7.5.4  Distance Vector Protocols: RIP and BGP
    7.5.5  Routing Protocol Convergence
    7.5.6  Scalability
    7.5.7  Trade-offs
7.6  Reachability Information
7.7  Quality of Service and Statistical Multiplexing
    7.7.1  What Is Statistical Multiplexing?
    7.7.2  Differentiated Services
    7.7.3  Multiprotocol Label Switching
    7.7.4  "DiffServ at the Edge, MPLS in the Core"
    7.7.5  Multiservice Networks
7.8  Layer 4 Protocols: Suitability to Task
    7.8.1  UDP
    7.8.2  Carrying SS7 Traffic over an IP Network: SCTP
    7.8.3  Comparing and Contrasting TCP with UDP and SCTP
7.9  Mobile IP
7.10  Summary
    7.10.1  Further Reading
References

CHAPTER 8  A Closer Look at SS7
8.1  SS7 Architecture and Link Types
8.2  SS7 Routing and Addressing
8.3  Review of the SS7 Protocol Stack
8.4  Message Transfer Part
    8.4.1  MTP2
    8.4.2  MTP3
8.5  SCCP
    8.5.1  General Description and Communication with MTP3
    8.5.2  Getting There Is Half the Fun: Global Title Translation
8.6  TCAP
    8.6.1  Number Portability
8.7  MAP
8.8  Summing Up
    8.8.1  Additional Weaknesses of SS7
    8.8.2  Strengths of SS7
References

CHAPTER 9  The Bearer Plane
9.1  Voice Encoding
    9.1.1  G.711
    9.1.2  Why Digital?
    9.1.3  Other Voice-Encoding Schemes
9.2  Bearer Interworking
    9.2.1  Transcoding
    9.2.2  Encapsulation of Digitized Sound
    9.2.3  Packetization Delay and Playout Buffers
9.3  Voice over IP
    9.3.1  Real-Time Services in IP Networks: RTP over UDP
References

CHAPTER 10  Media Gateway Control and Other Softswitch Topics
10.1  Requirements
    10.1.1  ID Bindings
10.2  SDP in Brief
10.3  Megaco/H.248
    10.3.1  Introducing the Megaco Connection Model
    10.3.2  Terminations
    10.3.3  Contexts
    10.3.4  Megaco Commands
    10.3.5  Example Call Flow
    10.3.6  Usage of the Move Command
    10.3.7  Descriptors
    10.3.8  Sample Megaco Messages
    10.3.9  Three-Way Calling Example
    10.3.10  Megaco Miscellanea
10.4  MGCP
    10.4.1  Example Call Flow
    10.4.2  Brief Comparison with Megaco
    10.4.3  Other MGCP Verbs
    10.4.4  Transactions and Provisional Responses
    10.4.5  MGCP Packages
10.5  Interworking with Circuit-Switched Networks
    10.5.1  Latency Trade-offs
10.6  Inhabiting the Bearer, Service, and Control Planes
10.7  Signaling Between Two Softswitches
    10.7.1  BICC
References

CHAPTER 11  Session Control
11.1  "Generic" Session Control
    11.1.1  Comparison with ISUP Call Flow
    11.1.2  Modularity in Protocol Design
11.2  The H.323 Protocol Suite
    11.2.1  Heritage of H.323: ISDN
    11.2.2  H.323 Call Control and Media Control Signaling
    11.2.3  Talking to the Gatekeeper: RAS Signaling
    11.2.4  Evolution of H.323
11.3  SIP Basics
    11.3.1  SIP Requests and Responses
11.4  SIP Functional Entities
    11.4.1  Proxy Servers and Redirect Servers
    11.4.2  Back-to-Back User Agents
    11.4.3  Registrars
References

CHAPTER 12  More on SIP and SDP
12.1  A Detailed SDP Example
    12.1.1  Additional Line Types
12.2  A Detailed SIP Example
    12.2.1  Registration Procedures
    12.2.2  Making a Call
    12.2.3  The Offer/Answer Model
12.3  Forking of SIP Requests
12.4  SIP for Interswitch Signaling
    12.4.1  Comparison with BICC
12.5  Additional SIP Methods
    12.5.1  UPDATEs and re-INVITEs
12.6  Resource Reservation and SIP
    12.6.1  QoS Attributes in SDP
    12.6.2  More on Parameter Negotiation
12.7  SIP-T and Beyond
12.8  Authentication and Security
12.9  Further Reading
References

CHAPTER 13  Implementing Services
13.1  Introduction
13.2  SS7 Service Architectures: Intelligent Network
    13.2.1  The Global Functional Plane
    13.2.2  The Distributed Functional Plane
    13.2.3  IN Capability Sets
    13.2.4  Limitations and Trade-offs of IN
13.3  CAMEL and WIN
13.4  Parlay/OSA
13.5  JAIN
13.6  SIP and Services
    13.6.1  SIP and Intelligent Networks: PINT and SPIRITS
13.7  SIP in Wireless Networks
    13.7.1  Push To Talk over Cellular
    13.7.2  SIP Header Compression
    13.7.3  IP Multimedia Subsystem
13.8  Short Message Service
    13.8.1  SMS in Support of Other Applications
13.9  Further Reading
References

CHAPTER 14  Properties of Circuit-Switched Networks
14.1  Telco Routing and Traffic Engineering
    14.1.1  Truitt's Model
    14.1.2  Dynamic Nonhierarchical Routing, Metastable States, and Trunk Reservation
    14.1.3  Optional Section: Traffic Intensity and the Erlang B Formula
14.2  Comparison with IP Routing and Dimensioning
14.3  Security
14.4  Quality of Service
14.5  Scalability
14.6  Survivability and Reliability
14.7  Billing Functionality
14.8  Emergency Service and other Government Mandates
References

CHAPTER 15  Evolving Toward Carrier-Grade Packet Voice: Recent and Ongoing Developments
15.1  QoS and Traffic Engineering in IP Networks
    15.1.1  Class-Based Queuing
    15.1.2  DiffServ and IntServ Revisited
    15.1.3  Verifying and Enforcing Traffic Contracts
    15.1.4  ITU-T and 3GPP QoS Standards
15.2  Service-Level Agreements and Policy Control
15.3  SDP and SDPng
15.4  Sigtran Adaptation Layers
15.5  Middlebox Traversal
15.6  Comments and Further Reading
    15.6.1  More on IP QoS
    15.6.2  IPv6 and ROHC
    15.6.3  Routing for Voice over IP Protocols: iptel Working Group
    15.6.4  ENUM
    15.6.5  Service Architectures
References

CHAPTER 16  Conclusion

APPENDIX A  Data Link Layer Protocols
A.1  HDLC
A.2  Frame Relay
    A.2.1  The Frame Relay Header
    A.2.2  Label Switching and Virtual Circuits
A.3  Asynchronous Transfer Mode
    A.3.1  The ATM Header
    A.3.2  ATM Approach to Quality of Service and Statistical Multiplexing
    A.3.3  The ATM Control Plane
    A.3.4  ATM Adaptation Layers and Options for Voice over ATM
    A.3.5  Virtual Paths
    A.3.6  MPLS over ATM: VC Merge Capability
    A.3.7  Why Not Voice over ATM?
A.4  Ethernet
    A.4.1  History of Ethernet
    A.4.2  Ethernet Frame Structure
    A.4.3  CSMA/CD and Its Scalability Limitations
    A.4.4  Hubs, Bridges, and Switches
    A.4.5  Further Reading
References

About the Author

Index
Acknowledgments

During the development of this book, I had many helpful conversations with colleagues. I would like to thank the following people in particular: Haifeng Bi, Mike Boeckman, Jasminka Dizdarevic, Steve Frew, Cathleen Harrington, David Howarth, Rich Kobylinski, Si-Jian Lin, Rias Muhamed, Jessie Lee, Steve Partridge, Simon Richardson, Sam Sambasivan, Richard Tam, Randy Wohlert, and Mark Wuthnow.

The preparation of this book devoured many evenings and weekends. I would like to thank my wife, Miriam, for her patience.
CHAPTER 1
Introduction

1.1 In the Beginning, There was Voice

Voice telephony was arguably the first telecommunications service to achieve truly widespread deployment. Data services came into their own much later. To be sure, data telecommunication (in the form of smoke signals and other visual semaphore schemes, for example) has been around for a very long time. Note also that the telegraph preceded the telephone, and that innumerable messaging schemes of all sorts have been devised and used for military purposes. Morse code telegraphy is an interesting example of a data service: this scheme for electrical transmission of text messages was commercially viable before telephony became available. In scale of deployment, however, telegraph service never came close to the level subsequently reached by telephony.

The average person in a developed country has direct access to telephones at home and at work. Moreover, this has been the case for many years; nowadays, wireless telephones add mobility to the equation. Data telecommunication became common in the consumer market only within the last 10 to 20 years. Uptake in the academic and business communities came somewhat earlier (but still much later than voice). Thus voice networks were already ubiquitous when data networking technologies reached mass-market scale, and it was perfectly natural to ask whether one could also use these networks to transmit large volumes of data between distant sites. In a nutshell, the question was: "We have extensive voice networks; are they useful for carrying data as well?"
The answer to this question was definitely “Yes!” The biggest reason for this resounding affirmative was that, starting in the 1960s, the voice network in the United States evolved from analog to digital. In other words, this network was already transporting bits. For years, data networks were islands in a voice-based world. Data networking technologies designed for so-called local area networks (LANs) suffered from severe distance limitations. Long-haul data transmission in the consumer, academic and business markets was achieved using networks initially designed for voice. In fact, telephone carriers’ networks provided the only viable means of interconnecting distant data networks. In many instances, data traffic is still transported in this fashion; fax and dial-up modem transmissions are familiar examples. Widespread access to dedicated wide area data networks (such as the Internet) is a relatively recent phenomenon.
Dedicated data networks are increasingly common and far-reaching, however. This raises the following question: "We have extensive data networks; are they useful for carrying voice traffic?" (Although voice telephony is our primary focus in this book, note that the question remains equally pertinent for other real-time services such as videoconferencing.) Today, telephone networks and data networks are very different beasts, designed according to different philosophies. Here is a crucial distinction to keep in mind:

•	In traditional telephone networks, transmission capacity is meted out in a continuous fashion. That is, a fixed amount of capacity is allocated to each call and deallocated when the call ends. Meanwhile, this transmission capacity cannot be shared with any other call (even when the parties on the first call are silent). We say that traditional telephone networks are circuit-switched; here the term circuit refers to the end-to-end transmission capacity that is reserved throughout the life of a call.

•	In data networks, transmission capacity is allocated in a discrete fashion. Suppose two pairs of users are conducting ongoing sessions that use a shared transmission link. When transmission of a packet for one session is completed, the other session can transmit a packet, although neither session has ended. Here a packet is a chunk of digital data (i.e., a sequence of bits); we say that data networks are packet-switched.
Note that these characterizations are somewhat oversimplified in the interest of brevity.
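The distinction drawn in the two bullets can be caricatured in a few lines of Python. The four-channel link, the round-robin packet discipline, and the session names are illustrative assumptions of ours, not details taken from the text:

```python
# Toy contrast between circuit-switched and packet-switched capacity allocation.
# Link size and scheduling discipline are hypothetical, for illustration only.

LINK_CHANNELS = 4  # a link subdivided into 4 voice channels (assumed)

def circuit_admit(active_calls):
    """A circuit switch reserves one channel per call for the call's whole
    lifetime; once all channels are taken, a new call is blocked even if
    every current speaker is silent."""
    return active_calls < LINK_CHANNELS

def packet_interleave(sessions):
    """A packet link hands out capacity one packet at a time: when one
    session's packet finishes, another session may transmit, although
    neither session has ended (round-robin scheduling assumed)."""
    order = []
    queues = {name: list(pkts) for name, pkts in sessions.items()}
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                order.append(f"{name}:{queue.pop(0)}")
    return order

# Circuit side: a 5th simultaneous call would be blocked.
assert circuit_admit(3) and not circuit_admit(4)

# Packet side: two ongoing sessions share one link packet by packet.
print(packet_interleave({"A": ["p1", "p2"], "B": ["p1", "p2"]}))
# -> ['A:p1', 'B:p1', 'A:p2', 'B:p2']
```

The point of the sketch is only the contrast: the circuit model allocates capacity for a call's lifetime, while the packet model reallocates it packet by packet.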
1.2 Motivation: What Is the Case for Packet Telephony?

1.2.1 One Network Versus Two
To understand why there is a great deal of interest in packet telephony, one need look no further than the corporate environment. A typical office site has two completely separate in-building networks: a circuit-switched network for telephones and a packet-switched network for computers. The latter is called a local area network. Each of these networks must be provisioned and maintained. Could we combine telephone and data traffic on one of the two networks (thereby allowing us to reduce cost by eliminating the other network)? Of the two in-building networks, the computer network undoubtedly has much greater bandwidth than the phone network. Thus, if we try to interconnect our computers via the phone lines (abandoning the local area network), there is essentially no hope of a satisfactory result. Therefore, if we want to dispense with one of the two networks, eliminating the in-building phone network is the only reasonable choice. And this option entails placing voice traffic over a packet-based medium. The situation is substantially different in the consumer market. A local telephone company that already has a revenue-producing “legacy” voice network has little motivation to invest in voice over packet technologies. However, cable
companies might become interested in doing just that, so as to compete with local telephone companies. In recent years, cable companies have upgraded their networks so they can provide broadband Internet access. In the process, cable has become a bidirectional packet-based medium, suitable for packet voice traffic. We note that, as of this writing, cable telephony has not taken off (at least in the United States).

1.2.2 Services
Packet telephony solutions must provide some economic benefit (that is, increased revenue and/or reduced cost); otherwise, they will not be widely deployed. We have already begun to address reduced cost with our previous example. We will return to this topic later, exploring the service provider's point of view. Now we look at revenue generation.

To increase revenue, a telephone service provider must add new customers (and this is becoming increasingly difficult) or must come up with new features that customers will want to buy. This brings us to the development of enhanced services.

As an example, we look at "find me/follow me" services. The basic idea of find me/follow me is a simple one: to offer flexible configuration options for call-forwarding behavior on a per-user basis. Suppose, for example, a traveler wants certain callers (identified by calling party numbers, say) to be able to reach him/her via automatic forwarding from his/her landline phone to a wireless phone. All other calls will be forwarded to voice mail. Such features are offered by today's Private Branch Exchanges, but are not generally available to consumers.

Moreover, it is easy to envision useful "add-on" functionalities that are difficult to implement in today's networks. For example, it would be nice if one could configure out-of-office settings for voice mail and e-mail from the same menu, perhaps employing a speech-to-text processor to convert the voice mail greeting to a text message that is automatically sent in reply to incoming e-mail messages. Other desirable options include the ability to reconfigure (wireline) forwarding options from a wireless phone. This is difficult partly because wireless and wireline networks grew up separately; so-called "Intelligent Network" features in the two realms are based on different signaling protocols. (We flesh out this topic in Chapter 13.)
Moreover, with the user interface limitations of today's phones, subscribers may find such features difficult to use. Admittedly, packetized voice does not automatically bring about "convergence" between wireless and wireline networks. Well-designed service control schemes that can cross network boundaries, however, may facilitate convergence. Therefore such schemes promise to be an important part of the overall evolution toward packet telephony.
1.3 Switch Design

Packet voice equipment is available and in use today, so packet-based telephony is certainly feasible. If people only wanted to call others located in the same building as themselves, the case for packet telephony in the corporate environment would be overwhelmingly positive.
In reality, one of the best features of existing telephone networks is that it is possible to call almost anyone. To maintain this universality, interworking with the outside world is a must. This boils down to interworking with circuit-switched networks, since the vast majority of telephones today are connected to circuit switches. In particular, circuit switching is utterly predominant in public telephone networks; packet voice is only just beginning to make inroads in this market.

How should packet voice switches be designed? This is one of the main topics of discussion in this book. Specifically, we will talk extensively about the telco environment and design principles that are expedient for operating in this environment. In this context, we will draw comparisons with legacy voice switches. We note that the design principles discussed here are equally applicable to cable operators, if and when they decide that they want to become large-scale telephone service providers.

1.3.1 Separating Bearer and Control Planes
The separation of bearer and control planes is a fundamental concept in next generation switch design. The bearer plane is the part of the network that carries end-user traffic (e.g., voice samples, in the case of telephony). As the name suggests, the control plane is the part of the network that carries call-control signaling. In circuit switches, the bearer and control planes are not clearly separated. In a nutshell, the reason for separating the bearer and control planes is the promise of increased flexibility. This flexibility can take several forms, notably:

•	Distributed architecture. A rich set of options for placement of switch components: the elements of a switch can be geographically dispersed.

•	A rich set of options [based on packet technologies such as Internet Protocol (IP), Asynchronous Transfer Mode (ATM), and Ethernet, as well as sophisticated vocoders] for representing, encapsulating, routing, and transporting voice traffic.

•	The ability to base the creation and implementation of new services on standardized open interfaces. This is an important step toward the "holy grail" of services that combine voice, video, and data in useful ways.

•	Flexibility in choosing suppliers. That is, different components can potentially be purchased from different equipment vendors.
We will return to these topics in Section 1.4 (where we argue that these advantages make a compelling case for packet telephony) and elsewhere. At this point, the reader can begin to see that packet telephony is much more than replacing circuit-switched bearer channels with packet-switched alternatives. It is possible to build switches that internally employ packet bearers, but for all intents and purposes act exactly like circuit switches. There is a place for such technology. However, we will see that next generation switching concepts hold the potential for much more.
1.4 Motive and Opportunity for Carriers

Why would a telephone service provider want to invest in voice over packet technology? We saw one reason in Section 1.2: to enable enhanced services (and thereby realize new revenue streams). In the following example, the goal is to reduce cost.

We said earlier that separation of bearer and control allows for the components of a switch to be geographically dispersed. To see why this is important, we direct the reader's attention to Figure 1.1. (The fabric of a switch is the conduit through which voice samples flow from the calling party to the called party and vice versa. Each area in the figure might represent a local switch, along with all of the customers that are homed to that switch, or a private branch exchange, etc.) Note that areas 1 and 2 are not directly connected. Therefore, when a customer in area 1 calls a customer in area 2, the bearer path must include the nearest switch that connects to both areas (switch A in this example). If areas 1 and 2 are much closer to one another than they are to switch A, then we are faced with so-called "backhaul" costs. That is, the voice-encoding bit stream must travel the long way around for the duration of the call. Suppose that the volume of traffic between areas 1 and 2 is not sufficient to justify either of the following alternatives:

•	Reserving dedicated transmission capacity between areas 1 and 2;

•	Installing an additional switch closer to areas 1 and 2.
Suppose, however, that the volume of traffic is large enough that it grieves us to pay for backhaul transmission capacity. If, by shifting to a distributed design, we could dramatically reduce the cost of adding switching capacity at a nearby location, then we would have a viable alternative.

In Figure 1.2, we illustrate the notion that the fabric under the command of a given controller can consist of geographically dispersed nodes. If these fabric nodes are very inexpensive (relative to the cost of a "legacy" voice switch), and if one controller can direct the operation of many fabric nodes, then the cost of introducing switching capacity to new locations can indeed be reduced a great deal. Note that a new type of traffic appears in Figure 1.2: control messages between the controller and the fabric component at location B. (The controller also sends commands to the colocated fabric component at location A, but these messages do not require interlocation transmission facilities.)
Figure 1.1  Bearer path must traverse closest switch.

Figure 1.2  Distributed fabric.
We note that the example presented here is oversimplified in a number of ways. In particular, we have glossed over the following matters:

•	How are fabric components interconnected?

•	What kinds of instructions does the controller need to send to the fabric components?
We will return to these considerations in the main body of the book. Meanwhile, we think that the example of this section offers a useful way (though not the only way) to think about next generation switching architectures: one controller commands numerous fabric elements. The controller possesses sophisticated call control functionality (and the resident software is likely to be expensive), whereas the fabric elements are not very smart (but are relatively inexpensive). Although we hold the notion of a distributed fabric as an idea whose time has come, distributed call processing is very difficult—thus the centralized controller.
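The economics of the backhaul example can be made concrete with some back-of-the-envelope arithmetic. Every figure below (costs, traffic volume) is an assumption invented for illustration; the book supplies no such numbers:

```python
# Hypothetical break-even comparison for the backhaul example.
# All dollar figures and the traffic volume are invented for illustration.

backhaul_cost_per_erlang = 40.0  # monthly transmission cost per erlang, the long way around
traffic_erlangs = 120            # assumed traffic between areas 1 and 2
remote_node_monthly = 2500.0     # amortized monthly cost of an inexpensive fabric node
control_link_monthly = 300.0     # controller-to-fabric-node signaling transport

monthly_backhaul = backhaul_cost_per_erlang * traffic_erlangs     # 4800.0
monthly_distributed = remote_node_monthly + control_link_monthly  # 2800.0

print(f"backhaul: ${monthly_backhaul:.0f}/mo vs distributed: ${monthly_distributed:.0f}/mo")

# With these assumed numbers, the cheap remote fabric node wins; the conclusion
# obviously flips if fabric nodes are expensive or the traffic volume is small.
assert monthly_distributed < monthly_backhaul
```

The comparison captures the argument of this section: a distributed design pays off exactly when the recurring backhaul cost exceeds the cost of dropping an inexpensive fabric node near the traffic.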
1.5 What Are We Waiting For?

If it offers such wonderful advantages, why not enter the brave new world of packet telephony right away? In our mind, the reason is that existing voice technology evolved over a long period of time. As a result of many years of growth and refinement, today's circuit-switched networks:

•	Are optimized for voice. Today's telephone networks are designed to deliver voice samples from origin to destination switch at very regular intervals, and to do so quickly (i.e., utterances are recreated at the listener's end very soon after they leave the speaker's mouth). Moreover, today's voice networks can set up calls quickly—when placing a call, one does not have to wait very long for the called party's phone to ring, or to begin a conversation once the called party answers. The capabilities listed in this bullet are often associated with the blanket term quality of service (QoS).

•	Possess a labyrinth of functionality that is difficult to duplicate in a short space of time. This ranges from schemes that keep call-control processing capabilities from "overheating" when congestion occurs to complex billing systems. The list, which also includes integration of a variety of applications with Touch-Tone signaling, literally goes on and on.

•	Are successfully deployed on a very large scale. Most things are more difficult to achieve on a large scale than on a small scale—think of coordinating schedules of a large number of people or keeping a major airport running smoothly. Telephony is no exception; over the years, telcos and their network infrastructure manufacturers have turned scalability into an art.

•	Are extremely reliable.

•	Represent an enormous investment of capital.
For all of the reasons listed, equipment in existing voice networks will remain in use for a long, long time. Thus packet voice switches will have to interwork gracefully with legacy equipment. We will see that this requirement is far from trivial—it is one of the main hurdles that must be crossed before packet telephony can gain a foothold in service providers' networks. This hurdle is economic as well as technical.

In fact, collecting voice samples and stuffing them into packets is the easy part. This is not to belittle the development that went into making this possible. Rather, it is meant to indicate that digital signal processing techniques have advanced to the point where this "encoding and encapsulating" step is well understood and, moreover, can be accomplished in a cost-effective way.

Traditionally, data networking has not emphasized quality of service. In recent years, much attention has been devoted to quality of service in packet networks. It is certainly possible to achieve good voice quality and low latency in the packet domain. But the telecommunications industry as a whole is still struggling to find the formula for realizing this goal on a large scale and at a palatable cost. To be fair, packet voice deployments are beginning to happen, and we believe that they will eventually happen on a truly large scale. However, widespread deployment will take time.
1.6 Motivation for this Book

Off and on since 1998, we have worked on projects investigating packet voice technology. We often wished that we could find an expository introduction to the topic (especially when we were new to the subject). Our first purpose is to set forth such an exposition, focusing on architectural design of packet voice switches. We take on this task in Part I, taking care to introduce as little technical terminology as possible (and trying especially hard to avoid acronyms). We hope that this portion of the book is accessible to readers who do not have engineering backgrounds, as well as those who do.

In Part I of this book, we will see that the new paradigm draws on many areas that used to be disparate—at one time, it would have been very surprising to hear mention of data-network protocol stacks in the "same breath" as new voice-encoding techniques. This is no longer such an unusual juxtaposition, as these developments find common cause in next generation switching products.
Our second purpose, which is served in Part II, is to provide information on some of the disparate technical areas that are such newly-acquainted bedfellows. This book is not encyclopedic and is far from being the last word on the technical topics that we cover (many of which merit entire books on their own). Our aim in introducing these topics is to flesh out the view of packet voice switches that we develop in the early part of the book. To this end we highlight essential features of each technical topic, and provide pointers to other sources for in-depth coverage.

The technically-oriented portion of this book will likely be of greatest interest to readers who have engineering backgrounds. Our intention is that the "prerequisites" for reading this book are nonspecific, however. Many people have good knowledge of telephony but not of data networking, or extensive knowledge of data networking but not of telephony. Others may have had limited exposure to both topics, but find that they are interested in the subject of this book because they are starting to hear about Voice over Internet Protocol. We hope that technically inclined people of all stripes will find useful information here.

Last but not least, we hope to give the reader some insight regarding the difficulty of migrating from circuit-switched telephony to packet telephony. We emphasize that this difficulty is economic at least as much as it is technical; its scale has often been underestimated in the past. Throughout the book, our mindset is tilted toward large-scale deployments and interworking with existing public telephone networks. When we were first exposed to the arguments of Sections 1.2 and 1.4, we were "rarin' to go." That was several years ago, and carriers have been very slow to adopt the new paradigm in the intervening time.
Although packet telephony offers many compelling advantages (and surely they would win the day if we were building networks from scratch), there are also many reasons why large telephone service providers' interest in packet telephony has been tepid. We touched on some of the barriers to migration in Section 1.5, and will elaborate on these barriers as the discussion progresses. We believe that, if we can impart a sense of the sheer enormity of the undertaking, then readers can more accurately envision the new paradigm's road to economic viability.
PART I
Switching Architectures for Packet Telephony: An Expository Description
CHAPTER 2
Essentials of Next Generation Switching

This chapter is organized as follows. After introducing a minimum of terminology, we begin with a reexamination of the backhaul example from Chapter 1. This is followed by a variation on the backhaul example. At that point, it becomes expedient to define a number of industry-standard terms. Then we recast the backhaul example in the new nomenclature.

Distributed Architecture
In this book, the term distributed architecture refers to the idea that the physical components of a switch may be geographically dispersed, yet they function in a coordinated way: to the outside world, they are seen collectively as a single logical entity.

Switches and Switching Fabrics
We normally think of a transmission link (or simply a link) as something that is subdivided into channels. Our emphasis in the current discussion is on telephony, so a channel can be defined as transmission capacity for one voice call. (In multiservice networks, one must think in more general terms, although similar concepts can be applied.)

For our purposes, a switch is a device with the capacity to dynamically direct traffic from any input link to any output link on a per-channel basis. We can rephrase this as follows: a switch is a device with the ability to dynamically "cross connect" any input channel with any output channel. The fabric is the conduit through which traffic flows between input channel and output channel. (Although voice channels are bidirectional, we can make sense of "input" and "output" by agreeing that the input channel is on the same side of the switching fabric as the call's originating party and the output channel is on the destination side.)

Throughout our discussion, we distinguish between bearer traffic and call-control signaling traffic. Recall from the introduction that bearer traffic is the traffic generated by end users (we think primarily of voice samples in the case of telephony; for completeness, we also include fax transmissions and other voiceband data applications).
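The cross-connect definition above can be sketched in a few lines of code. This is an illustrative model only (the class and method names are ours, not industry terminology): a switch is anything that can dynamically map any input channel onto any output channel on a per-call basis.

```python
# Illustrative model (our own naming, not industry terminology): a switch is
# a device that can dynamically cross-connect any input channel with any
# output channel. Channels are identified here as (link, channel-number) pairs.

class Switch:
    def __init__(self):
        self.cross_connects = {}  # input channel -> output channel

    def connect(self, in_channel, out_channel):
        """Set up a bearer path through the fabric for one call."""
        if in_channel in self.cross_connects:
            raise ValueError(f"channel {in_channel} is already in use")
        self.cross_connects[in_channel] = out_channel

    def disconnect(self, in_channel):
        """Tear down the cross-connection when the call ends."""
        del self.cross_connects[in_channel]

switch = Switch()
switch.connect(("link1", 3), ("link2", 17))  # one voice call
switch.disconnect(("link1", 3))              # call ends
```

The essential property is per-channel dynamism: cross-connections come and go call by call, in contrast to the static grooming performed by the multiplexer in the backhaul example.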
2.1 Another Look at the Backhaul Example

Recall that, in the backhaul example, the volume of traffic between areas 1 and 2 is not sufficient to justify dedicated area 1 ↔ area 2 transmission capacity. So the bearer path for each area 1 ↔ area 2 call must traverse a switch that connects to both areas. In our example, the closest switch (switch A, that is) is far away, relative to the distance between areas 1 and 2. In Figure 2.1, we have redrawn the original diagram (see Figure 1.1) to include transmission facilities. The encoded voice bit stream must travel to switch A and back throughout the duration of the call. This is true even if (as shown in the figure) the area 1 ↔ switch A and switch A ↔ area 2 portions of the bearer path use the exact same transmission facilities for part of the way. The point is that, in this example, the multiplexing equipment that grooms both of these segments onto the same transmission facility (labeled "Mux" in Figure 2.1) does not have switching capability or intelligence.

Let us suppose that areas 1 and 2 are not the only users of switch A's capabilities, but that many more "areas" residing on the opposite side of switch A also connect to it. (These additional areas are not shown so as to keep the diagrams simple.) Thus the transmission capacity that is reserved for traffic between area 1 and switch A may be well utilized even if calls between area 1 and area 2 are rare (because it carries traffic headed for many other final destinations in addition to area 2). The same holds for reserved capacity connecting area 2 and switch A.

The configuration illustrated in Figure 2.1 is not unreasonable at all: if the volume of traffic between areas 1 and 2 is small, this configuration will be less expensive than the alternatives. Two of these alternatives are:

• Reserve dedicated transmission capacity between areas 1 and 2. The problem with this alternative is that the transmission facilities set aside for this purpose will tend to be poorly utilized if traffic volume between areas 1 and 2 is low.
• Install an additional switch at location B. The problem with this alternative is that voice switches tend to be very expensive (much more expensive than multiplexers, for example).
We can summarize the statement that backhaul to A is the least-expensive option (for low volumes of area 1 ↔ area 2 traffic) via the "inequalities"

cost(xmission to switch A) < cost(dedicated area 1 ↔ area 2 xmission capacity)

and

cost(xmission to switch A) < incrementalCost(additional switch at location B).

Figure 2.1 Backhaul example showing transmission facilities and multiplexing equipment.
If the volume of traffic between areas 1 and 2 increases steadily, then we will eventually reach a "crossover point" beyond which one or both of our inequalities are reversed. One interpretation of the distributed approach is that it attempts to lower this crossover point by decreasing the right-hand side of the second inequality above.

In Figure 2.2, we move to a geographically-dispersed configuration in which the fabric components at locations A and B are both under the command of the controller at location A. We repeat the following observation from the introduction: if the fabric nodes are very inexpensive (relative to the cost of a circuit switch), and if one controller can direct the operation of many fabric nodes, then the cost of introducing switching capacity to new locations can be dramatically reduced. Figure 2.2 shows that control traffic must flow (over inter-location transmission facilities) between the controller at location A and the fabric component at location B. Note that, in contrast to bearer traffic, control traffic does not flow continuously throughout the life of the call.

Before moving on, let us make a few observations about Figure 2.2. In all cases, when a user in area 1 wants to call someone in area 2 (or vice versa), a connection request must be dispatched to the controller at A. Then the controller at A must inquire about the availability of the destination user, and so on (for simplicity, this call-control signaling traffic is not shown in the figure). So even though the configuration of Figure 2.2 eliminates bearer backhaul between location B and location A, signaling traffic must still be transported to a controller. Certainly, we still gain something, because signaling traffic is much less voluminous than bearer traffic.
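The crossover point lends itself to a toy calculation. All cost figures below are invented for illustration (real carrier cost studies differ widely); the point is simply that lowering the fixed cost of the "additional switch at location B" option lowers the traffic volume at which that option wins.

```python
# Toy crossover-point calculation for the backhaul example. All cost figures
# are invented for illustration; real carrier cost studies differ widely.

def monthly_backhaul_cost(calls, cost_per_call_mile=0.5, miles_to_switch_a=100):
    # Backhaul: every area 1 <-> area 2 call travels to switch A and back.
    return calls * cost_per_call_mile * 2 * miles_to_switch_a

def monthly_local_switch_cost(calls, amortized_switch_cost=50_000.0):
    # Additional switch at location B: a large fixed cost dominates.
    return amortized_switch_cost

# Find the traffic volume beyond which the second inequality reverses.
volume = 0
while monthly_backhaul_cost(volume) < monthly_local_switch_cost(volume):
    volume += 1
print("crossover at", volume, "calls per month")
```

With these hypothetical numbers the crossover falls at 500 calls per month; halving the amortized cost of the location-B equipment (as inexpensive fabric components promise to do) halves the crossover volume.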
Figure 2.2 Distributed fabric revisited.

2.2 Ability to Enter New Markets

Here we imagine that Figure 2.2 arises via a different line of reasoning than that of the previous section. More specifically, suppose that a service provider initially does not offer service to customers in areas 1 and 2. Our service provider wants to enter the new market represented by these areas, but in the past, there has been a chicken-and-egg problem. That is, the provider could not afford to install switching equipment near these locations without the existence of a revenue-producing customer base, but could not build a customer base without the presence of switching equipment. (Here we are assuming that the distance between locations A and B is so large that backhaul is economically impractical.) Distributed switching could lower the barriers to entering the market represented by areas 1 and 2. As in Section 2.1, this is predicated on the assumption that fabric components are inexpensive.
2.3 Switch Components and Terminology

Up to this point, we have made a concerted effort to introduce as little terminology as possible. Now that the reader has had a chance to absorb some of the main concepts of next generation switching, we introduce a number of terms that will facilitate further discussion. Many of the concepts we have covered (especially separation of bearer and control) are often associated with the term "softswitch." Therefore we find it convenient to make liberal use of softswitch terminology. In this book, we use the terms softswitch and next generation switch interchangeably. In our view, there are four essential softswitch functional components:

1. Media gateway controller. This is the brains of the operation; it directs traffic (but bearer traffic does not actually pass through the media gateway controller—that is the role of the media gateway).
2. Media gateway. Bearer traffic enters/exits the switch fabric via this device. This device often (but not necessarily always) performs conversion between formats (e.g., between circuit-switching and packet-switching formats, or between different voice-encoding schemes, or both).
3. Signaling gateway. Call-control signaling traffic (between the switch and the outside world) enters/exits the switch via this device.
4. Intergateway switching fabric. Bearer traffic travels from ingress media gateway to egress media gateway via this fabric.

The first thing to notice about this breakdown is that bearer and control are, in fact, separate. Note also that this taxonomy divides the switch into functional components; the subdivision into physical components need not be the same.

For the time being, we are avoiding detailed discussion of call-control signaling. Thus we will not clarify the motivation for listing the signaling gateway as a separate component until a subsequent chapter. Meanwhile, the signaling gateway will appear alongside the media gateway controller in our diagrams.
For now, the reader may want to make the simplifying assumption that these two components are colocated.

2.3.1 Where Does One Switch Component End and Another Component Begin?
We caution the reader that subtleties are often encountered when one attempts to “draw the boxes” (i.e., map the functions given in the conceptual framework
previously mentioned to a group of network elements). In particular, the following questions are difficult to answer cleanly:

• Exactly how should we define the fabric of a next generation switch?
• When a media gateway and its controller exchange control messages, do these messages travel through the switch fabric?
2.4 A Useful Abstraction

We will approach these questions by drawing analogies with traditional circuit switches. Figure 2.3 is a simplified representation of a circuit switch. The ports shown in the figure are receptors where transmission links can be attached. The fabric is marked with a large "X" shape to remind the reader that it has the capacity to dynamically connect any pair of channels. These cross-fabric connections are set up and torn down at the behest of the controller.

Figure 2.3 Schematic representation of a circuit switch.

In a sense, we can view a media gateway as a (set of) port(s) on a distributed switch. This is a useful abstraction, but one must take care to realize that the analogy only holds up to a point: media gateways may (and often do) incorporate some degree of switching-fabric functionality. On the other hand, ports (or multiport line cards) on circuit switches tend to have minimal functionality; in particular, line cards are often "fabricless." Remote line frames, which often incorporate small fabrics, constitute a notable exception.

There is a good reason why media gateways tend to incorporate switching fabrics: we need look no further than the backhaul example. To emphasize this point, we redraw the picture one more time in Figure 2.4 (see Figure 2.2). The shaded portion of each media gateway is part of the switch fabric; the figure is drawn so as to remind the reader that the bearer path must always traverse the switching fabric.

Figure 2.4 Backhaul example with components relabelled.

In addition to housing a fabric component and ports that connect to the outside world, each media gateway also serves as a bearer interworking function. Recall that media gateways serve as ingress/egress points between two realms: that is, the realm that is external to the switch and the realm that is internal to the switch. In many, if not most, cases, bearer traffic is represented and handled differently within the two realms. The interworking function often involves translation between packet and nonpacket formats (although it can involve translation between two different packet formats instead). Another common aspect of the interworking function is translation between voice-encoding schemes. Note that we regard the media gateway ports facing the external realm as part of the interworking function.
2.5 Defining the Fabric

All of the media gateways in a softswitch must be interconnected. That is, it must be possible to set up a bearer path connecting any pair of media gateways. (Recall our definition of a switch: a device with the ability to dynamically connect any input channel to any output channel.) When its media gateways reside in multiple locations (as is usually the case), the fabric of a softswitch has to be a distributed entity. We define the fabric of a softswitch (or simply distributed fabric) to be the fabric components of its media gateways together with the interconnections between the media gateways. According to the definitions in Section 2.3, the interconnections between the media gateways are lumped together under the term intergateway switching fabric.

The topology of the intergateway switching fabric can be simple or complex. As an example of the former, we show only two media gateways in Figure 2.4, so the intergateway switching fabric could simply be a single point-to-point connection between the two gateways. If the number of media gateways is large, a full mesh of point-to-point links between gateways becomes expensive, and it may be expedient to use one or more packet switches to interconnect the gateways more efficiently. In Figure 2.5, we give an example; this figure is more representative in that it reflects the possibility that packet switches may be part of the intergateway switching fabric.

There is a "switch within a switch" aspect at this point in the discussion that could become confusing. We will refer to the "big" switch as a softswitch or next generation switch and its fabric as a distributed fabric. This will serve to distinguish the softswitch from any packet switch that serves as a component of the distributed fabric.
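The defining requirement of the distributed fabric (a bearer path must exist between any pair of media gateways) is, in graph terms, just connectivity, and can be checked mechanically. The topology below is hypothetical (node names are ours): two packet switches interconnecting four media gateways, in the spirit of Figure 2.5.

```python
# Sketch with a hypothetical topology: the intergateway switching fabric
# viewed as a graph whose nodes are media gateways (mg*) and packet
# switches (ps*). The softswitch requirement is that every media gateway
# can reach every other media gateway.
from collections import deque

links = {
    "mg1": {"ps1"}, "mg2": {"ps1"}, "mg3": {"ps2"}, "mg4": {"ps2"},
    "ps1": {"mg1", "mg2", "ps2"}, "ps2": {"mg3", "mg4", "ps1"},
}

def reachable(start):
    """Breadth-first search: the set of nodes reachable from 'start'."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

gateways = [n for n in links if n.startswith("mg")]
assert all(set(gateways) <= reachable(g) for g in gateways)
print("every media gateway pair can be interconnected")
```

Note that the check says nothing about how the gateways are interconnected; as discussed below, external devices are equally indifferent to the fabric's internal topology.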
Figure 2.5 Schematic representation of a next generation switch.
In any case, devices in the softswitch-external realm (e.g., circuit switches or other softswitches) are unlikely to know or care about the topology of the intergateway switching fabric. Let us appeal once again to our analogy between softswitches and circuit switches. In the case of a circuit switch, an external device might know that a given input channel (on port A, say) is connected to a particular output channel (on port B). But it knows nothing about how port A and port B are connected to the switch fabric, whether the switch fabric resides on one or many circuit boards, and so on. Conversely, packet switches in the intergateway switching fabric may not even know that they are part of a larger entity: all of the interworking with external telephone network entities is handled by other components of the encompassing next generation switch.

Packet switches in the intergateway switching fabric may come from different manufacturers than other parts of the softswitch (e.g., the media gateways and their controllers). These packet switches may carry data traffic that the media gateway controller does not know about, and that does not traverse any media gateway. For simplicity, however, the packet switches in Figure 2.5 are depicted as fully contained within the softswitch's distributed fabric.

The terms media gateway, media gateway controller, and signaling gateway are all part of the industry-standard parlance. The term intergateway switching fabric is one we have not seen before (but we need to refer to this component by some name).

2.5.1 Do Control Messages Between Media Gateways and Their Controller Pass Through the Switch Fabric?
The answer to this question is a qualified “yes, provided that we are not too doctrinaire about separation of bearer and control.” Each media gateway must talk to its controller. This communication could take place via a point-to-point link. However, we have already seen that packet switches can provide a viable alternative to the assortment of point-to-point links that would otherwise be necessary when there are many media gateways that share the same controller. It is reasonable to
interconnect media gateways and controllers via the same packet switches used for bearer traffic. So, in a sense, control messages do pass through the fabric of our next generation switch. However, there is an important distinction to keep in mind, which we now explain. The controller instructs the media gateway regarding allocation and deallocation of bearer channels across the distributed fabric as calls come and go. We can think of this dialog as taking place on a quasi-continuous, permanent basis. The permanence of the control association between media gateways and their controllers contrasts with the dynamic nature of bearer channels. Lastly, note that we maintain a logical separation of bearer and control even when bearer and control traffic share capacity on the same packet switches. With the preceding comments in mind, we have redrawn the schematic switch representation to include control traffic in Figure 2.6 (see Figure 2.5).

2.5.2 What Is a Packet?
A packet is a chunk of digital data with a header. The following types of information are often included in packet headers:

• Field(s) indicating where the packet is supposed to go and/or who is supposed to receive the packet.
• Field(s) indicating what sort of treatment should be afforded to the packet (e.g., an indicator of whether this is a high-priority or low-priority packet).
• Field(s) indicating how the payload of the packet should be interpreted by higher-layer protocols. This is a fancy way of saying that the packet must be marked so that the end recipient can tell what it contains (e.g., a voice sample, a video streaming sample, a control message) and process the contents accordingly.
• A field indicating the size of the packet.
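As an illustration, a header carrying the four kinds of information above might be packed and parsed as follows. The field widths and layout here are our own invention for the sketch, not any standardized packet format.

```python
# Illustrative packet layout (our own field sizes, not any real protocol):
# destination, priority, payload type, and length, followed by the payload.
import struct

# Network byte order: 4-byte destination, 1-byte priority,
# 1-byte payload type, 2-byte payload length.
HEADER = struct.Struct("!IBBH")

def build_packet(dest, priority, payload_type, payload):
    return HEADER.pack(dest, priority, payload_type, len(payload)) + payload

def parse_packet(data):
    dest, priority, ptype, length = HEADER.unpack_from(data)
    return dest, priority, ptype, data[HEADER.size:HEADER.size + length]

pkt = build_packet(dest=0x0A000001, priority=5, payload_type=1,
                   payload=b"voice sample")
assert parse_packet(pkt) == (0x0A000001, 5, 1, b"voice sample")
```

The first field is the essential one: as noted below, it is precisely what a packet switch consults when deciding where to forward the packet.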
Figure 2.6 Next generation switch with signaling paths.
Although the question “what is a nonpacket?” sounds a bit silly, we include a word here about the distinction between circuit switches and packet switches. Circuit switches determine where to send any given byte of data according to when that byte of data arrives. Packet switches determine where to send a given packet of data according to information in the packet header. (Thus the first item in the previous list of packet header information types is essential.)
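The contrast can be made concrete with a sketch; the port names, timeslot map, and routing table below are hypothetical.

```python
# Contrast sketch (hypothetical tables): a circuit switch forwards a byte
# according to WHEN it arrives (its timeslot); a packet switch forwards a
# packet according to information in the packet header.

# Circuit switch: timeslot number -> output port, fixed for a call's duration.
timeslot_map = {0: "portB", 1: "portC", 2: "portB"}

def circuit_forward(timeslot, data):
    # The data content is irrelevant; only the arrival time matters.
    return timeslot_map[timeslot]

# Packet switch: destination field in the header -> output port.
routing_table = {"10.0.0.1": "portB", "10.0.0.2": "portC"}

def packet_forward(packet):
    # The arrival time is irrelevant; only the header matters.
    return routing_table[packet["dest"]]

assert circuit_forward(1, b"\x7f") == "portC"
assert packet_forward({"dest": "10.0.0.2", "payload": b"voice"}) == "portC"
```
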
CHAPTER 3
Motivation for Packet Telephony Revisited

Owing to its flexibility, the potential benefits of next generation switching technology are wide-ranging. It is a challenge to capture all of the benefits within a coherent and well-organized framework. On the other hand, the main ideas behind the next generation switching philosophy are not very complicated, provided that they are approached in the right way. Our approach in this chapter revolves around the following question: "What can a service provider hope to gain by shifting to the new paradigm?"

Next generation switching architectures are heavily influenced by the softswitch philosophy. The main features of the softswitch paradigm are:

1. Functional separation of bearer and control;
2. Distributed architecture;
3. Packet fabrics for bearer traffic.

For each of these features, practical realizability is heavily dependent on the existence of open standards. Without open standards, next generation switches would end up being geographically-dispersed "black boxes" and would be very difficult to troubleshoot (for anyone but the equipment manufacturer, that is).

In Table 3.1, we list advantages offered by features 1–3 above. Of course, these advantages must translate into economic benefits somewhere along the line: in the absence of a significant economic incentive, most service providers will stick with familiar technology. Although we do not attempt to quantify the potential economic benefits of next generation switching in this book (and, indeed, particulars of cost studies will vary from one carrier's network to the next), we ask the reader to bear these considerations in mind.

In the following discussion, the term "distributed architecture" refers to the ability to geographically separate the components of a switch. We have already expounded on the benefits of distributed architecture in the previous chapter. In this chapter, we therefore devote most of our attention to the other topics in Table 3.1.
3.1 Separation of Bearer and Control

Clearly, separation of bearer and control is a prerequisite for the distributed architecture discussed in Chapter 2. (Recall the "paradigm" example of Figure 2.5: a single controller controls several geographically dispersed media gateways.) That is, functional separation of bearer and control is necessary if we are to achieve the physical separation of bearer and control that was crucial to the motivation we presented in Chapter 2. We now argue that functional separation itself promises additional benefits.

Table 3.1 Next Generation Switches: Main Features and Associated Benefits

Distributed architecture: Transmission bandwidth efficiency (as illustrated by the reduced backhaul example); ability to enter new markets with reduced initial investments.
Functional separation of bearer and control: Ability to evolve and maintain bearer and control elements separately; ability to interwork bearer and control elements from different suppliers. This feature is a prerequisite for distributed architecture.
Packet fabrics for bearer traffic: Ability to exploit routing intelligence of packet networks; ability to exploit low-bit-rate encoding schemes to reduce transmission bandwidth requirements.

Referring to legacy voice switches, we said in the introduction that "bearer and control planes are not separated within the switches themselves." We may have lied a little bit: like computers, voice switches are modular, and it would be quite natural to build separate bearer and control modules. The point we are trying to make, however, is that these two modules have essentially always been packaged together. We are propounding a philosophy that says we want to pry them apart:

• It should be possible to place the control and bearer modules in different geographic locations. (Here we are plugging the concept of distributed architecture again.)
• The dialog between the control module and the bearer module should be conducted in a message format that is defined by a publicly-available standard. Or, rephrasing this point in the terminology of Section 2.3: the media gateway and its controller should interwork via an open interface.
3.1.1 Open Interfaces

Many of the interfaces on today's circuit switches are open (i.e., based on published standards). Here we are referring to interfaces between switch and customer premises (in the signaling and bearer planes) as well as interfaces between two switches (also in both the signaling and bearer planes). This is important to service providers, because it makes multivendor interworking possible. That is, equipment produced by different manufacturers can be interconnected and can cooperate in completing calls. So it is natural to ask whether an open controller-to-fabric interface would yield similar benefits. (Today's circuit switches do not feature open interfaces here; this is shown schematically in Figure 3.1.)

Figure 3.1 Interfaces on a legacy voice switch.

Let us develop this idea a little further. With open interfaces paving the way for multivendor interworking, a service provider could purchase controller and fabric from different suppliers. Most of a switching fabric's functionality is implemented in hardware. Controllers, on the other hand, are full of software processes that exchange messages with other call-control devices and keep track of each call's state. So it would be sensible for software companies to make controllers and hardware companies to make fabrics. (The controller software might reside on a computer, such as a high-availability workstation, that is resold by the software company.)

Even if a service provider chooses to buy media gateways and controllers from the same manufacturer, open interfaces are desirable for more than simply philosophical reasons. First of all, the availability of a standard may make it more palatable for an equipment vendor to port its controller functionality to a commercially available workstation, rather than continuing to manufacture dedicated controller hardware. Moreover, the process of developing a communication protocol requires clear role definitions for the elements that will conduct dialogs using that protocol; this imposes a certain discipline in the developers' collective thought process that helps to produce cleanly-executed designs. In the current context, a standard needs to specify exactly what kinds of things a media gateway controller can ask a media gateway to do, and to abstract these requests into a generic set of messages, parameters, and so on. This goes hand in hand with an effort to specify exactly what sorts of things a media gateway should be able to do, both now and in the foreseeable future. The task we are describing is not a simple one, but if it is done well, the resulting flexibility in creating new services can be enormously profitable. We discuss the two leading standards for media gateway control, MGCP and Megaco/H.248, at length in Chapter 10.
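To give a flavor of what such an interface might look like, here is a sketch of a gateway exposing a small, generic set of connection-handling primitives. It is loosely inspired by the create/modify/delete connection verbs of MGCP; the class, method names, parameters, and endpoint labels are our own invention, not the standardized protocol syntax.

```python
# Sketch of an open controller-to-gateway interface, loosely inspired by the
# connection-handling verbs of MGCP/Megaco. Names and message structure are
# invented for illustration; see RFC 3435 and H.248.1 for the real protocols.

class MediaGateway:
    def __init__(self, name):
        self.name = name
        self.connections = {}

    def handle(self, verb, conn_id, **params):
        # The gateway exposes a small, generic set of primitives; richer
        # services are composed by the controller, not hard-coded here.
        if verb == "create_connection":
            self.connections[conn_id] = dict(params)
        elif verb == "modify_connection":
            self.connections[conn_id].update(params)
        elif verb == "delete_connection":
            del self.connections[conn_id]
        else:
            raise ValueError(f"unknown primitive: {verb}")

gw = MediaGateway("mg-b")
gw.handle("create_connection", "c1", endpoint="trunk/1", codec="G.711")
gw.handle("modify_connection", "c1", codec="G.729")  # e.g., renegotiated codec
gw.handle("delete_connection", "c1")                 # call ends
```

The design point is that the primitive set stays fixed while the controller composes them in new ways; this is exactly the "functional localization" discussed in the next section.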
3.1.2 Introducing and Maintaining Services

Suppose a service provider wants to change the behavior of a widely-dispersed collection of switches, and to do so as efficiently as possible. This is exactly what a provider must do to implement a new revenue-generating service: in almost every case, enhanced features in the switching network will be required to support the new service. Therefore, to make sure that a new service quickly becomes profitable, it is important to control the cost of introducing the necessary switching functionality. To this end, it would be extremely helpful to "have all of the knobs in one
place"—that is, to have the power of enhancing the capabilities of an entire collection of switch components by implementing a localized change.

Let us first look at the way new functionality is introduced into circuit-switched networks. This has long been a major difficulty for the telecommunications industry. To be profitable, a service normally has to be supported by a large number of switches, and the functionality needs to be consistent from one switch to another. Only then can the service be marketed to a large population of customers. In what follows, we assume that the enhanced features previously mentioned are administered via upgrades to switch-resident software.

It is very difficult, if not impossible, to perform feature upgrades on many switches at once, so upgraded and non-upgraded switches inevitably coexist for a time. Whenever an upgraded switch interacts with another switch that has not been upgraded, the potential for incompatibilities exists. Moreover, if a bug is discovered after a new feature is installed on a large number of switches, it may be necessary to "roll back" each of these switches to an earlier software version. This can be a disastrous outcome.

Because of the difficulties and potential risk associated with such upgrades, service providers have historically been slow to introduce new services. New features have typically been tested thoroughly at a variety of stages on the way to full-scale deployment. This has meant in turn that services cannot be profitable unless they have long useful lifetimes (and/or high adoption rates). If it were simpler (and less fraught with risk) to deploy new switch features, then:

• In the effort to create new services that customers want, it would be possible to experiment more freely.
• A given service would not require such a long useful lifetime to be worthwhile for the carrier.
The next generation approach is promising in this regard: a media gateway's behavior is largely determined by its controller, and we have seen that a single media gateway controller can preside over many media gateways. It may seem that we are expounding on another benefit of distributed architecture (and that is indeed part of the story here). However, there is more to it than that, as we now explain.

The previous paragraph leads naturally to the suggestion that it should be possible to implement a new revenue-generating service via a localized change (that is, a software upgrade at a central controller). Moreover, this should be possible even when the new service dictates that a whole collection of media gateways and fabric components act somewhat differently than they did before. This sort of argument is plausible, for instance, if our goal is to implement a new type of call-forwarding behavior. The "localized change" in this discussion is really localized in two ways:

• Functional Localization. After the change, the media gateway controller speaks to the media gateways in terms of the same primitives that were available before the change. The media gateways operate based on the same set of capabilities that they possessed all along.
• Geographical Localization. The software upgrade is only performed in one physical place.
Although the importance of the former item, Functional Localization, is less obvious than that of the latter (especially considering the fact that Chapter 2 focused on the latter), it is just as crucial to a full appreciation of the next generation switching philosophy. Before moving on to the next topic, we note a caveat. The following must be true if our argument is to hold water, so to speak: •
Assumption: Each media gateway possesses adequate functionality to support a wide variety of potential services.
Conversely, if each new feature requires us to upgrade media gateways as well as controller software, we are right back where we started. Thus considerable foresight must be applied in component design if the promise of the new switching paradigm is to be fully realized.
3.1.3 New Bearer Types
In the last section, the gist of the argument was that we should be able to implement enhanced call-control features in a next generation switch without touching the media gateways. By the same token we should be able to implement bearer plane enhancements without making major changes to media gateway controllers. Here we must avoid sweeping statements about the realizability of this goal, however. When a requirement comes along to support a new type of bearer traffic, it makes a difference whether it is a new media type, a new bearer protocol, or a new voice encoding scheme. Suppose, for example, that our softswitch needs to support a new voice encoding scheme, or codec. This means that bit streams representing voice signals must be interpreted in a different way at affected interfaces. Let us try to make the argument that the media gateway controller requires little or no reconfiguration in this case, and see how many assumptions we have to make. When a new codec is introduced, the switching equipment will require transcoding capacity (that is, the capacity to convert between the new codec and other codecs that are already in use). If a switch employs a single codec for all internal traffic, then the affected bearer interfaces would be configured to transcode from the new codec to this common codec (note that this configuration change would be localized to one or more media gateways). In this case, the amount of capacity per call (within the switch fabric, that is) would remain the same as always. Wherever links employing the new codec are connected to the switch, the media gateway controller presumably must calculate capacity utilization differently than with other codecs. But the controller would not otherwise have to alter the algorithm it uses when determining whether to admit each incoming call.
Although the codec employed varies from one interface to another in our example, we have assumed in the foregoing discussion that any two calls arriving at the same interface use the same codec. What about call-control messages? They should be essentially the same regardless of whether the new codec is in use in any “leg” of
the bearer channel: each of the switch's bearer interfaces to the outside world is statically configured with a codec. If we make a number of assumptions, as we have done here, we can argue that a new bearer type (in this case a different codec) can be accommodated with limited impact on the media gateway controller. If we remove some of these assumptions, however, the argument is harder and harder to make. For instance, if codec selection can vary from call to call on the same interface, then added intelligence is required: we have to make decisions about transcoding on a call-by-call basis; determining whether to admit calls becomes more complicated because resource utilization calculations must now be more sophisticated; and so on. In later chapters, we will see examples in which the capacity to make transcoding decisions on a call-by-call basis would indeed be desirable. For the time being, we will simply remark that we have encountered a fundamental trade-off: the intelligence necessary to achieve efficient resource utilization versus the cost of implementing that intelligence.
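The fixed-codec admission logic described above can be made concrete with a small sketch. Everything here is illustrative: the codec names, bit rates, and capacities are invented, and a real controller would track far more state than this.

```python
# Illustrative admission check for a switch whose bearer interfaces may
# carry different (statically configured) codecs. Codec names and rates
# are stand-ins, not drawn from any particular product.
CODEC_RATE_KBITS = {"g711": 64, "g729": 8, "new_codec": 13}

class Interface:
    def __init__(self, codec, capacity_kbits):
        self.codec = codec                    # codec statically configured here
        self.capacity_kbits = capacity_kbits  # remaining bearer capacity

def admit_call(ingress, egress):
    """Admit a call if both legs have capacity at their own codec's rate;
    also report whether the fabric must transcode between the two legs."""
    rate_in = CODEC_RATE_KBITS[ingress.codec]
    rate_out = CODEC_RATE_KBITS[egress.codec]
    if ingress.capacity_kbits < rate_in or egress.capacity_kbits < rate_out:
        return False, False                   # blocked: no transcoding question arises
    ingress.capacity_kbits -= rate_in
    egress.capacity_kbits -= rate_out
    return True, ingress.codec != egress.codec
```

Note that the controller's only codec-specific knowledge is the per-interface rate table; the admission algorithm itself is unchanged when a new codec is added, which is exactly the point made in the text.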
3.2
Packet Fabrics
Here we briefly consider the items that were listed under “Packet fabrics for bearer traffic” in Table 3.1. To do justice to these topics requires some additional background and terminology that we will introduce in Chapters 7 and 9.
3.2.1 Exploiting Routing Intelligence of Packet Networks
Routing in circuit-switched networks is a mixed bag. In local telco networks (at least in the United States), routing is fixed. That is, the bearer path for any given call is chosen from a preprovisioned list of paths connecting the originating and terminating switches. When a call request comes in, the network checks these paths in a fixed order, selecting the first one that has available capacity (or responding with a “network busy” signal if it exhausts the list without success). Moreover, the list of paths for any given origin-destination pair of switches is usually short (in local networks, two- and three-item lists are common). From this discussion it is clear that calls may sometimes be blocked when there is available capacity in the network, simply because that capacity did not lie along preprovisioned paths for the calls in question. Administrative intervention is required for networks with fixed routing to adapt to significant changes to traffic patterns. The big long-distance carriers employ dynamic routing (in proprietary implementations); their routing schemes are altogether more sophisticated than their landline counterparts. We give a brief overview of telco routing (and references to in-depth expository material) in Chapter 14. Dynamic routing is de rigueur in data networks. Via routing protocol message exchanges, the elements of a network discover the network’s topology. Suppose for the sake of discussion that the network elements are Internet Protocol routers (there are numerous other possibilities). Then we can rephrase this by saying that each router becomes aware of the other routers in the network and of the manner in which those routers are interconnected. The idea here is this: if there is unused
capacity in a distributed fabric that could be used to satisfy an incoming call request, the routing protocols should be able to find it. Today’s data network routing protocols realize this ideal to a limited degree. One major difficulty is load balancing among a multiplicity of desirable routes to a given destination. Moreover, dynamism can bring on instability. The aforementioned dynamic routing schemes implemented by long-distance telcos incorporate features specifically designed to maintain stability. (The routing algorithms make sure to avoid network states in which an inordinate proportion of traffic is carried on alternate routes; the reader can consult the description of trunk reservation in Section 14.1.2.) We do not believe that data network routing schemes are as mature in this regard. (We discuss data network routing in Chapter 7 and, to a more limited degree, in Appendix A.) In time, packet-switched routing schemes may overcome these limitations, while being inexpensive to administer. To underscore the differences between telco routing and data network routing, let us discuss remote line frames for circuit switches. Roughly speaking, one can think of these devices as small satellite switches. Remote line frames are commonly used to serve business customers with sizable campuses. In residential markets, remote line frames are also cost effective for serving rural communities that surround a sizable metropolitan area. In the second example, one can envision a full-fledged switch in the metropolitan “hub” area exercising control over “miniswitches” in smaller outlying communities. Remote line frames incorporate small switching fabrics; the motivation for including this functionality can be found in the backhaul example. That is, the rationale is exactly the same as for incorporating fabric components in media gateways. (Here the reader can refer to the discussion in Section 2.4, culminating in Figure 2.4.) In Figure 3.2, we depict two bearer paths. 
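Returning to the fixed-routing discipline described earlier (check a short preprovisioned path list in order; take the first path with a free circuit, otherwise return “network busy”), a minimal sketch with invented path names and circuit counts:

```python
# Fixed (preprovisioned) routing as used in local circuit-switched
# networks: try the listed paths in order, take the first with spare
# capacity, otherwise signal "network busy". Switch names, path names,
# and circuit counts are all invented for this sketch.
ROUTE_LISTS = {
    ("switch_A", "switch_B"): ["direct_trunk", "via_tandem_1"],
}
FREE_CIRCUITS = {"direct_trunk": 0, "via_tandem_1": 3}

def select_path(origin, destination):
    """Return the first preprovisioned path with a free circuit,
    or None (the caller hears "network busy")."""
    for path in ROUTE_LISTS.get((origin, destination), []):
        if FREE_CIRCUITS[path] > 0:
            FREE_CIRCUITS[path] -= 1
            return path
    return None
```

The sketch exhibits the blocking behavior noted in the text: capacity elsewhere in the network is invisible, because only the provisioned list is ever consulted.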
The significance of the shaded box is that it encompasses all components of the switching fabric. The first bearer path connects a user in area 1 to a user in area 2. Due to the presence of a fabric component within remote line frame A, the path does not have to traverse the central fabric component. The second bearer path in Figure 3.2 connects areas 1 and 3. The way we have drawn our example, this path must go through the central fabric component, even if the two remote line frames are much closer to one another than either is to the
Figure 3.2 Legacy switch with remote line frames.
controller/fabric complex shown in the upper right hand corner of the diagram. While it is not out of the question to offer direct connectivity between line frames A and B, circuit switches are not usually configured in this way. Such a direct interconnection would have to be “nailed up” and manually configured. Typically, the only sort of bearer connectivity that is supported is that of a hub-and-spoke configuration as exemplified in Figure 3.3. To keep the diagrams simple, we have omitted signaling traffic from Figures 3.2 and 3.3. For completeness, however, we note that signaling traffic between a remote line frame and its controller is carried on a point-to-point link. Thus, switch-internal signaling traffic hews to the same hub-and-spoke topology as bearer traffic, except that the controller is now the hub component. Let us contrast the limitation illustrated in Figures 3.2 and 3.3 with the flexibility of a next generation switch. In the latter, the fabric components can be interconnected in any fashion, so long as every media gateway is reachable from every other media gateway. Routing protocols in the packet domain will automatically discover the switch fabric’s topology. Suppose we are serving the same customer base as in the previous example (see Figure 3.2), but we choose to deploy a next generation switch instead of a legacy switch. So media gateways A and B replace line frames A and B; we also replace the central fabric component with media gateway C. For purposes of illustration, we assume that the distributed fabric also incorporates two packet switches. This layout is shown in Figure 3.4. The solid lines depict the topology of the switch fabric. The dotted line traces a bearer path for a call connecting areas 1 and 3. When the call is attempted, the media gateway controller tells media gateway A to set up a bearer channel to media gateway B. In turn, media gateway A signals to packet switch I that it wishes to make such a connection. 
Packet switch I “sees” the gray-shaded portion of Figure 3.4. That is, routing protocol software running on packet switch I has built (and continually maintains) a model of the switch fabric’s topology. From this model, packet switch I knows (without being told by the media gateway controller) that it can reach media gateway B without involving packet switch II.
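The topology model attributed to packet switch I can be caricatured as a breadth-first search over a link-state adjacency map. The node names follow Figure 3.4, but the adjacency itself is our illustrative reading of the figure, not a protocol trace:

```python
from collections import deque

# A link-state view of the fabric of Figure 3.4, as packet switch I
# might hold it after routing-protocol convergence (illustrative).
TOPOLOGY = {
    "mg_A": ["ps_I"],
    "mg_B": ["ps_I"],
    "mg_C": ["ps_II"],
    "ps_I": ["mg_A", "mg_B", "ps_II"],
    "ps_II": ["ps_I", "mg_C"],
}

def shortest_path(src, dst):
    """Breadth-first search: a fewest-hop path from src to dst, or None."""
    frontier = deque([[src]])
    seen = {src}
    while frontier:
        path = frontier.popleft()
        if path[-1] == dst:
            return path
        for nxt in TOPOLOGY[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None
```

Because the model is built and maintained locally, packet switch I can conclude, without instruction from the media gateway controller, that media gateway B is reachable without involving packet switch II.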
Figure 3.3 Bearer connectivity for a legacy switch with remote line frames.
Figure 3.4 Bearer connectivity for a next generation switch.
Before moving on, we need to draw an important comparison between the remote line frame example (Figure 3.2) and its softswitch incarnation (Figure 3.4). Although media gateway C replaced the central fabric component, their roles are not analogous. In the remote line frame example, every call processed by the switch falls into one of two categories: •
The calling and called parties are served by the same remote line frame.
•
The calling and called parties are not served by the same remote line frame.
In the first case, the situation is clearly analogous to that of a next generation switch, with “media gateway” replacing “remote line frame.” So we have not troubled to redraw the bearer path connecting areas 1 and 2 in Figure 3.4. In the second case, the call’s bearer path always traverses the central fabric component. Thus this component has a very special role. There is no comparable component in the next generation incarnation (unless we decide to implement a hub-and-spoke topology). As with the previous figures in this section, no signaling traffic appears in Figure 3.4. Again, this is purely to simplify the diagrams.
3.2.2 Exploiting Low Bit-Rate Voice Codecs
In legacy voice networks, the basic unit of transmission capacity is 64 kilobits per second (kbit/s). Capacity is allocated in multiples of this basic unit. The reason for adopting this quantum as the fundamental building block is that the predominant voice encoding scheme operates at 64 kbit/s. Over the years, a variety of voice-encoding schemes have come into existence. Codec bit rates vary depending on level of sophistication and requirements of intended usage. But essentially all of the newer codecs, including those employed in wireless networks over the so-called “air interface,” operate at lower bit rates than the standard codec. Since landline circuit-switched networks cannot allocate less than 64 kbit/s, however, voice-encoding bit streams are usually converted to the standard codec for transmission through the network. This is true even if both endpoints of the call use the same low bit-rate codec.
Clearly, this arrangement is not ideal; it is an inefficient use of transmission bandwidth. Moreover, converting to and from the standard 64 kbit/s codec entails degradation of voice quality. Since packet networks are not hard wired to function in terms of fixed capacity increments, they offer a means of addressing this shortcoming.
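The inefficiency is easy to quantify. A back-of-the-envelope comparison, using an E1-sized 2,048-kbit/s pipe and an illustrative 8-kbit/s low bit-rate codec, and deliberately ignoring packetization and header overhead (which matters in practice):

```python
# Calls that fit in a 2,048-kbit/s bearer pipe when capacity is
# allocated per 64-kbit/s circuit versus at a codec's native rate.
# The 8-kbit/s figure stands in for a modern low bit-rate codec;
# packet header overhead is deliberately ignored in this sketch.
PIPE_KBITS = 2048

def calls_supported(alloc_kbits_per_call):
    return PIPE_KBITS // alloc_kbits_per_call

circuit_calls = calls_supported(64)  # circuit network always allocates 64 kbit/s
packet_calls = calls_supported(8)    # packet fabric can allocate the native rate
```

Under these (idealized) assumptions the packet fabric carries eight times as many calls on the same pipe, and it does so without the transcoding steps that degrade voice quality.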
CHAPTER 4
Signaling and Services
In this chapter, we describe the structure of the control plane. Then we take a cursory look at a variety of services and talk about the requirements they place on the serving network. We briefly discuss limitations of today’s telco networks that make it difficult to create new services.
4.1
The Control Plane
In Chapter 1, we introduced the concept of a circuit-switched network and said that traditional telephone networks are of this type. In the interest of precision, we now rephrase that statement as follows: in a typical telco network, a circuit-switched bearer plane is controlled by a packet-switched control plane. In this brief chapter, we expand on this concept by describing the structure of the control plane. In today’s telco networks, call-control signaling does not use bearer channels but is instead carried on separate, dedicated channels. (There is a caveat: it would be more correct to say that interswitch call-control signaling uses separate, dedicated channels.) Moreover, switches do not exchange call-control messages directly. This is true almost universally in the United States, but is somewhat less so in Europe. Here we are talking about signaling between “core” telco switches; our statement is not applicable to Integrated Services Digital Network (ISDN) deployments. Instead, these messages pass through intermediaries known as signaling transfer points (STPs), which are usually drawn as boxes with diagonal slashes (see Figure 4.1). This is the case even when the switches involved are adjacent. So the bearer and control planes are implemented as separate networks and the latter is a packet-switched network. Among other things, STPs are responsible for the routing of call-control messages. Recall that packets have header fields indicating where the payloads are supposed to go. STPs base their routing decisions on the contents of these destination header fields. One voice switch may be directly connected to many other voice switches. The number of links to STPs is likely to be much smaller than the number of links to other switches, however; control messages for calls to many destinations can be multiplexed on the same signaling link. The switch controller only has to listen for call-control messages on links to STPs.
To summarize, there are three main concepts in this section:
1. Signaling interchanges are conducted on dedicated signaling channels.
2. These signaling channels are packet-based.
Figure 4.1 Control plane.
3. Rather than running directly from switch to switch, signaling channels are connected to STPs.
STPs do not come in contact with bearer traffic. So, unlike voice switches (which straddle the bearer and control planes), STPs reside in the control plane. We can think of STPs as special-purpose packet switches. Note that the items in this list represent distinct concepts. In particular, technologies such as ISDN (which is briefly described in Section 11.2.1) embody the first design principle but not the third.
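The STP’s routing role can be caricatured as a table lookup on each message’s destination field. The point codes and link names below are invented for the sketch:

```python
# A toy STP: forward each signaling message on the link associated with
# its destination "point code". Codes and link names are invented; a
# real STP also handles failover, load sharing, and screening.
STP_ROUTES = {
    "pc_switch_A": "link_1",
    "pc_switch_B": "link_1",
    "pc_switch_C": "link_2",
}

def forward(message):
    """Pick the outgoing link from the message's destination field.
    Many destinations can share one link: signaling is multiplexed."""
    return STP_ROUTES[message["dest"]]
```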
4.2
What Is a Service?
In Section 1.2, we introduced the idea of “find me/follow me” services. Rather than give a formal definition of the term service, we prefer to offer the following examples as a sort of operational definition:
1. Telephony itself: the basic ability to complete calls;
2. So-called “vertical” services, such as:
• Voice mail;
• Caller ID/calling name ID;
• (Unconditional) call forwarding;
• Call waiting.
3. Services that offer alternative billing schemes, such as:
• Toll-free service;
• Prepaid service;
• Calling card service.
4. Short message service.
4.2.1 Vertical Services
Vertical services are so named because of their place in the telco business model: they allowed telcos to “stack” additional revenue streams on existing customers. Since these customers were already being billed for basic service, the marginal cost of
collecting the additional revenue was very modest. So, although initial rollout of these services was expensive and time-consuming, they proved enormously profitable in the long run. Profitability of vertical services depended on mass-market acceptance and extended revenue-producing lifetimes. One only needs to look at the wireless industry to see that times are changing (hence our use of the past tense in the previous paragraph). Services such as voice mail, caller ID, and call forwarding are de rigueur in the wireless telephone industry; they do not represent separate revenue streams to wireless carriers. One obvious reason is that the wireless industry is much more competitive than the landline local exchange industry (at least in the United States). In the case of caller ID, another reason comes to mind: wireless handsets can display text and are more intelligent than traditional landline telephones. For a host of reasons (including increasing miniaturization), wireless handsets are replaced much more frequently than landline telephones. The handsets tend to get smarter at each iteration of the upgrade cycle. We have listed call forwarding as a vertical service, using the epithet unconditional to refer to a service that, at any given time, is either enabled or disabled (in contrast with a service that forwards calls only if the original called number does not answer, or even something as sophisticated as a follow-me service). Note that call forwarding, even in its most basic form, is different than the other vertical services listed in an essential way. Namely, it involves a higher level of user configurability: the user needs to be able to dynamically turn it on and off. When turning the service on, the user must also specify the “forward-to” number.
4.2.2 Services that Offer Alternative Billing Schemes
In the case of toll-free service, the called party (rather than the calling party) is billed. Some of the services listed under this heading may be more sophisticated than others. For instance, calling card service typically offers the ability to get a dial tone and make another call by pressing and holding the “#” key. Prepaid service may offer a way for users to check their account balances.
4.2.3 Short Message Service
Short message service is the vehicle by which wireless phones send and receive text messages. Unlike the other services listed above, short message service does not involve voice telephony.
4.3
Where Do Services “Live,” and What Do They Entail?
One can visualize the control plane as a layer that enables services to use the bearer plane. This is often schematically represented by placing the control plane on top of the bearer plane. This is a useful, if imperfect, viewpoint (for example, short message service blurs the separation between control and bearer planes, as we will describe in Chapter 13). As the name suggests, basic call-control signaling inhabits the control plane.
So we have the control plane stacked atop the bearer plane. In turn, the service plane is logically stacked atop the control plane. Again, the notion is imprecise: even though we list basic telephony as a service, it “lives” in the control and bearer planes. Moreover, other services such as call forwarding and caller ID are switch-based in the sense that they do not require per-call interaction with external service plane entities to function properly.
4.3.1 Can You Say “Database?”
The resources required for implementation vary from one service to the next. For example, voice mail requires a platform capable of recording, storing, retrieving, and replaying messages. A common denominator, however, is that services require access to subscriber data. So databases play a central role in telco service infrastructures. Depending on the service in question, the necessary subscriber data might be very simple. This is the case, for example, with caller ID service: the called party’s serving switch has the calling party number (since it is contained in the call setup message) and simply needs to know whether to display it. This switch is the only device that must have this information when the incoming call arrives. Even such a rudimentary capability means that a Boolean value (e.g., yes/no or enabled/disabled) must be associated with the called party’s phone number and stored in a table or database. Caller ID subscription information is periodically downloaded from “master” provisioning databases to switches. Processing for incoming calls is based on locally stored subscription information. On the surface, calling name ID is similar to caller ID. But the former is more difficult to implement than the latter: unlike the calling party number, the calling party name is not contained in the call setup message. Therefore it must be fetched dynamically from the appropriate database on a per-call basis; that database resides in the service plane. (It would not be at all practical to store calling name ID information for all potential callers on each switch.) In Figure 4.2, we update our schematic network representation to include the service plane. We have populated the service plane with two new devices for the support of an unspecified service. Next, we briefly describe these devices. As we saw in Section 4.1, voice switches rely on signaling transfer points as intermediaries for call control signaling. 
When a signaling transfer point needs to access a database, it relies on another intermediary called a service control point. We will draw databases as cylinders, since this shape is suggestive of a computer disk drive (and thus of the “data store” concept). This is in keeping with common practice. (As an aside, service logic often resides on service control points; we elaborate in Chapter 13.) We have discussed calling name ID; toll-free service is another pertinent example. The following description will be in terms of the North American implementation. The caller’s serving switch is statically provisioned to know that prefixes such as 800, 888, and so on are reserved for toll-free service (unlike area codes, triplets such as 800 and 888 have no geographic significance; we say that such numbers are not routable). So any time it sees a called number with one of these prefixes, the switch knows that it must query the toll-free database, which resides in the service
Figure 4.2 Service plane.
plane. This query sets in motion a series of events that eventually yields a routable number. One important point is that the procedure is transparent to the calling party. In particular, the calling party never sees the routable number. We omit the details.
One point of clarification may be in order here. In the case of landline caller ID service, a given switch needs subscription information for a restricted population (namely, those customers who use it as their “home” switch). In contrast, the toll-free database may be quite large, and a switch has no way of knowing in advance which records it will need. It would not be practical to replicate the toll-free database in local storage at every switch that needs to query this database. Instead, a database query will be launched for each toll-free call. One can make exactly the same argument for calling name ID. Note the following difference, however: the calling party’s serving switch generates the toll-free query; the called party’s serving switch generates the calling name ID query.
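The contrast between locally stored subscription data (caller ID) and a per-call query to a service-plane database (toll-free translation) can be sketched as follows. All numbers and table contents are invented for the illustration:

```python
# Locally cached subscription flags vs. per-call service-plane queries.
# All numbers and records below are invented.
LOCAL_CALLER_ID_FLAGS = {"555-0100": True, "555-0101": False}  # downloaded to switch

TOLL_FREE_DB = {"800-555-0199": "512-555-0142"}  # lives in the service plane

def display_caller_number(called):
    """Caller ID: a purely local lookup on the called party's record."""
    return LOCAL_CALLER_ID_FLAGS.get(called, False)

def route_toll_free(dialed):
    """Non-routable 8xx number -> query the service plane for a routable
    number. The calling party never sees the translated number."""
    if dialed.split("-")[0] in {"800", "888", "877"}:
        return TOLL_FREE_DB[dialed]  # per-call query, not local storage
    return dialed
```

The essential difference is where the data lives: the caller ID flag is provisioned down to the switch in advance, while the toll-free record is fetched at call time.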
4.4
Limitations of Circuit-Switched Networks
What are the limitations of circuit-switched networks when it comes to providing services? Can packet telephony ease these limitations? We highlight the following difficulties:
1. Circuit switches can ask the service plane for assistance. For example, we have discussed the capability to obtain subscriber information as needed by launching database queries. But the switches themselves have to know what information to request and when to request it; when new services come along, this is not trivial. Many services are, at least partly, resident in the switches themselves. Moreover, when a customer subscribes to multiple services, those services may interact in unpredictable ways. Thus, provisioning and maintenance of services has traditionally been anything but streamlined.
2. As we will see in Chapter 8, the control plane for today’s circuit-switched networks is idiosyncratic. In particular, the routing scheme has some unfortunate limitations.
3. Landline telephones for the residential market possess minimal intelligence and signaling capabilities. Moreover, it is difficult to control parameter settings for sophisticated services with a 12-button keypad.
These problems are now widely recognized as such in the telecommunications industry, and there are ongoing efforts to address them. The notion of a service creation environment has been proposed in response to the first problem. In a nutshell, the idea here is to offer a “scripting” capability that makes it easy to develop services. Scripts would invoke basic call-control functions that are already implemented in the network (and have already been debugged). The service creation environment concept also includes ways for switches to autodiscover the services offered. These precepts predate widespread interest in packet telephony, but so far implementations have offered very limited capabilities. As it develops, packet telephony may provide the means to make better use of these ideas.
Modern data networking technology can also be used to evolve toward a more flexible control plane infrastructure. However, Internet protocol routers and other data network equipment historically have not been engineered to the same level of reliability as have signaling transfer points. For this and other reasons, this evolution will occur over a long period of time.
Lastly, we briefly discuss the lack of intelligence in the telephone itself. In the landline residential market, this may be very slow to change. But wireless handsets (a.k.a. cellphones) are already much more sophisticated than residential landline telephones; moreover, they continue to evolve. We are entering an era where wireless handsets are complex enough that they require operating systems. Thus, they are, in a very real sense, small computers.
PART II
Components of Packet Telephony: Technical Descriptions
CHAPTER 5
Introduction to Part II
So far, we have tried to introduce an absolute minimum of terminology and avoid acronyms altogether. We are reaching a point of diminishing returns with this approach, however. Now we want to take a look at the technical “nuts and bolts” of packet telephony. In large part, this means we will examine the protocols that devices use to talk to one another. These are the lingua franca of our chosen topic, and they will pave the way for detailed discussion of some interesting examples. Our examples will be chosen to: •
Illustrate themes in the migration to packet telephony;
•
Give the reader a sense of how a collection of protocols works together to produce an overall solution.
In Part I, we hope that we have armed the reader with a conceptual framework that helps assemble the forthcoming technical information into a coherent whole. As we prepare to plunge into technical details, there are several terms and concepts from Part I to keep in mind. We highlight the following: •
Quality of service. Circuit-switched networks are engineered to provide high voice quality. Packet-switched networks, which have traditionally been designed to meet different requirements, are less adept at providing quality of service suitable for real-time applications (such as bidirectional voice and video). The traditional data networking paradigm must be adapted to support the same quality of service as that delivered by legacy voice networks.
•
Packet-switched control plane. When we say that legacy telephone networks are circuit-switched, we are really referring to the bearer plane. The control plane is already a packet network.
•
Routing. The data networking community has standardized robust dynamic routing schemes and brought them into widespread use. Today’s data network routing protocols offer immediate promise for alleviating difficulties with traditional telco control plane protocols. Data network routing schemes also show promise in the bearer plane. But realistically, delivering on that promise will take more time. Potential benefits include graceful adaptation to changing traffic conditions with minimal administrative oversight.
•
Distributed switching. Voice switches will evolve from centralized devices to distributed devices, and functional components will be clearly separated. Recall that we have identified the following functional components:
• Bearer traffic enters and exits the distributed switch fabric via media gateways. The actions of media gateways are directed by media gateway controllers. The former “live” in the bearer plane, whereas the latter inhabit the control plane.
• Call control signaling traffic enters and exits a distributed switch via a signaling gateway.
• The intergateway switching fabric interconnects media gateways belonging to a distributed switch.
How can we evolve a circuit-switched infrastructure toward packet telephony? We will use the wireless industry as a source of examples that shed light on this topic. Wireless networks are more sophisticated than wireline networks, since wireless handsets must communicate with towers. The radio technology that is employed for this purpose is certainly complex. But wireless networks are complicated in another way: the towers are interconnected by sophisticated “wired” networks. It is the second type of complexity that interests us in this book. In a nutshell, mobile carriers’ networks have to be smart enough to keep track of mobile subscribers, so their control planes have to manage information in a dynamic fashion that would be alien to their landline counterparts. When a customer powers up his or her handset (or moves to a new location), the switch that serves that customer’s current location must: •
Fetch the appropriate subscriber records from a centralized database.
•
Make a note of the current location in the centralized database.
The latter is important because, in order to complete an incoming call, the network must be able to find the subscriber in question. We elaborate in Section 8.7.
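The two registration steps can be sketched as interactions with a centralized subscriber database. This is only a stand-in for the machinery described in Section 8.7, and all record and field names are invented:

```python
# Registration at power-up (or relocation): the serving switch fetches
# the subscriber's records and notes its own identity as the current
# location. A stand-in for the home-database machinery of Section 8.7;
# all identifiers and fields are invented.
CENTRAL_DB = {
    "subscriber_42": {"services": ["voice_mail", "caller_id"], "location": None},
}

def register(subscriber, serving_switch):
    record = CENTRAL_DB[subscriber]       # 1. fetch the subscriber records
    record["location"] = serving_switch   # 2. note the current location
    return record["services"]             # local copy for the serving switch

def locate_for_incoming_call(subscriber):
    """An incoming call can only be completed if the network can find
    the subscriber's current serving switch."""
    return CENTRAL_DB[subscriber]["location"]
```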
5.1 Selected Telco Terminology

At this point, it is expedient to introduce the following telco vocabulary:

• A line is a connection between a voice switch and a telephone (or other customer premises equipment).
• A trunk is a voice channel that connects two switches.
• Public-switched telephone network (PSTN). This is a generic term for a telco network. We will use this term to mean circuit-switched network. (This is a reasonably accurate usage: although packet voice is seeping into traditional telco networks, the vast majority of telco infrastructure is still circuit-switched.)
• A Class 5 switch directly serves subscribers. Said differently, a Class 5 switch is a switch that terminates lines. “Class 5” is a reference to the PSTN switching hierarchy in the United States. Detailed discussion of the hierarchy is beyond our scope. Suffice it to say that switches in the other layers of the hierarchy connect only to other switches, not to end users.
• End office switch is a synonym for “Class 5 switch.”
• Time division multiplexing (TDM). A scheme in which bits associated with different channels are distinguished according to when they arrive. This is essentially the multiplexing approach in landline circuit-switched networks.
• Private branch exchange (PBX). Switching equipment that is common in office environments, college campuses, and the like. PBXs usually offer abbreviated dialing plans for internal calls and other convenience features.
• Centrex service is typically hosted at the Class 5 switch; its features are similar to those of a PBX. Telcos offer centrex service as a way to compete with PBX providers.
• Wireless carriers’ base stations are typically interconnected by landline networks known as public land mobile networks (PLMNs). We note here that connectivity may be achieved by other means, such as microwave links, in special circumstances.
CHAPTER 6
Protocols

In this chapter we introduce the protocol stack concept. We define the following reference terminology: physical layer, data link layer, network layer, transport layer, and application layer. It is easier to understand the main ideas with a few examples in mind. To this end, we describe Transmission Control Protocol/Internet Protocol (TCP/IP), which is an important foundation of data networking. Then we look at an example that is more directly pertinent to telephony: Signaling System 7 (SS7), which is predominant in the control and service planes of today’s telephone networks. In the process, we briefly discuss finite state machines. We will explore voice over IP in Section 9.3.
6.1 What Is a Protocol Stack?

A protocol stack is a layered collection of software. Many things have to happen in order for one user or application to exchange data with another, especially if the exchange occurs across a telecommunication network. The many tasks that must be performed are grouped into modules. This is in keeping with good programming practice, which uses modularity to make large tasks manageable and to pave the way for code reuse. To a degree, implementation details within different layers are independent.

Since transmission across a network must involve hardware at some point, we have oversimplified a bit by mentioning only software in the previous paragraph. Moreover, processes that could be implemented in software may be implemented in silicon in the interest of performance. Imagining several layers of software (some of it taking the form of application-specific integrated circuits) running atop switching and transmission equipment gives a more accurate picture.

Referring to a collection of protocol modules as a stack is, to our way of thinking, primarily a visualization aid. In the vertical direction, lower layers provide functionality to higher layers running on the same device. In the horizontal direction, entities at the same layer (on different devices) talk to each other across a network. Physical transmission capacity always resides at the bottom of the protocol stack. All higher layers ultimately rely on the physical layer to conduct dialogs with their counterparts on remote devices. When an application submits a packet of data for transmission to a remote entity, that packet must descend the protocol stack to the physical layer. Along the way:
• Successive encapsulation occurs. That is, each protocol layer adds its own header information.
• Fragmentation may occur. Fragmentation is necessary whenever a protocol layer receives a chunk of data that is too big to handle “all in one shot.”

On the receiving end, successive decapsulation takes place: each protocol entity strips off the header that was added by its same-layer counterpart, processes that header, and passes the payload to the layer above it. If fragmentation took place at the originating device, reassembly is performed at the destination.

Segregating functions into protocol layers makes life easier for humans. One can concentrate on the details of one protocol layer and think of the other layers as “black boxes,” keeping in mind only a very rough description of their roles.

6.1.1 Comparison with Last In, First Out Data Structures

In computer science, the notion of a last in, first out data structure is prominent. Such a data structure is usually called a stack, but this is clearly not the same as a protocol stack. However, we note the following similarity: when a packet climbs the protocol stack at its destination, the encapsulating headers are removed in last in, first out order.
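The last in, first out behavior can be sketched in a few lines of code. In the following toy Python fragment, the layer names and header formats are invented purely for illustration: each layer prepends its header on the way down the stack, and headers come off in reverse order on the way up, just like pushes and pops on a classic stack.

```python
def encapsulate(payload: bytes, layers: list[str]) -> bytes:
    """Descend the stack: each layer prepends its own (toy) header."""
    packet = payload
    for layer in layers:                      # e.g., transport first, then network, ...
        header = f"[{layer}-hdr]".encode()
        packet = header + packet              # "push": header goes on the front
    return packet

def decapsulate(packet: bytes, layers: list[str]) -> bytes:
    """Climb the stack: strip headers in last in, first out order."""
    for layer in reversed(layers):            # last header added is first removed
        header = f"[{layer}-hdr]".encode()
        assert packet.startswith(header), f"expected {layer} header"
        packet = packet[len(header):]         # "pop"
    return packet

stack = ["transport", "network", "data-link"]  # top of stack listed first
wire = encapsulate(b"hello", stack)
# wire is b"[data-link-hdr][network-hdr][transport-hdr]hello"
assert decapsulate(wire, stack) == b"hello"
```

Note that the outermost header on the wire belongs to the lowest layer, which is exactly why decapsulation proceeds in last in, first out order.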
6.2 Generic Layer Descriptions

Each layer in a protocol stack builds on the capabilities of the layer below in order to provide services to the layer above. How should functionality be subdivided among the layers? There is a de facto standard approach. In Table 6.1, we list industry-standard layer names and summarize the functions assigned to each layer. Table 6.1 is not exactly a protocol stack; instead it is a reference model that suggests how a protocol stack ought to be organized. This model is really just a guideline; technologies and protocols never seem to align cleanly with the boundaries between the layers. However, the model can provide useful insight on how individual technologies fit into the bigger picture. On occasion we will refer to the data link layer as layer 2 and the network layer as layer 3; although it dates back to the 1980s, this nomenclature has remained in

Table 6.1 Protocol Layer Descriptions

Layer Name    Description
Application   Defines processes that allow applications to use network services.
Transport     Ensures reliable communication across a network. The transport layer verifies the integrity of the data it receives.
Network       Adds routing and addressing functionality for end-to-end communication.
Data link     Responsible for reliable point-to-point communication between devices. Packages data in structured frames that are submitted to the physical layer for transmission.
Physical      Responsible for transporting bits through a physical medium.
the vernacular. The layer numbers (as well as some of the basic ideas) are taken from the Open Systems Interconnection (OSI) reference model, which was developed by the International Organization for Standardization (ISO) in 1982. That reference model defined a seven-layer stack, with the application layer at the top. There were two additional layers (not shown in Table 6.1) between the transport and application layers. Implementation of those two layers was problematic from the start, and the OSI reference model was pretty much abandoned. Some of the terminology stuck around, however. In particular, one may still encounter references to “layer 7” as a synonym for the application layer, although this terminology is probably best avoided. At any rate, the layers of primary interest in this book are the data link, network, and transport layers; we discuss these layers next. We will not expand on Table 6.1’s telegraphic descriptions of the other layers.

6.2.1 Data Link Layer
The data link layer provides reliable point-to-point communication between devices. Two things are particularly important here:

• The physical layer does not worry about any errors in transmission that may happen: it does not notice bit errors, let alone try to correct them.
• The data link layer does not have overall knowledge of the network. The two devices mentioned previously need to be directly connected from the data link layer’s point of view (although there might be other physical-layer devices between them).
The data link layer packages data in structured frames that are submitted to the physical layer for transmission. This layer performs error checking; each frame header includes data (such as a cyclic redundancy check field) to support error checking functionality. In order to recognize situations in which transmitted data has been lost altogether, sequence numbers may also be present in frame headers. In this case, the destination device sends acknowledgments to the originating device; these acknowledgments indicate which frames have been successfully received. The data link layer may also be responsible for flow control (in response to congestion) and for error recovery (e.g., resetting a link) in response to errors at the physical layer.

Examples
Common protocols at the data link layer include the following. The reader can find more information in Appendix A.

• High-level data link control (HDLC) was standardized by the ISO. Its basic approach to framing has been borrowed by many protocols, often with adaptations to suit specific needs.
• Point-to-point protocol (PPP), which is heavily used in dial-up networking, is often deployed with a framing structure similar to that of HDLC.
• Ethernet is the most common LAN technology and is evolving beyond its traditional roots.
• Frame Relay and ATM are widely used to transport data traffic across so-called wide area networks.

6.2.2 Network Layer
Recall that the data link layer does not “see” the topology of the network. The network layer is responsible for routing packets: by looking at the destination address of each packet, it determines which outgoing link to use. In this context, each destination address must have network-wide significance (i.e., it must uniquely identify a destination device). In some cases, the network layer has some ability to manage quality of service (e.g., priority fields in network layer headers can be used to request preferential treatment).

Examples

At the network layer, most of our attention will be devoted to Internet Protocol. We also give some coverage to layer 3 functionality in SS7 stacks. For completeness, we note that other network layer protocols exist (such as Novell’s IPX).

6.2.3 Transport Layer
The transport layer is responsible for ensuring reliable communication across the network. When there are problems with dropped and duplicated packets, it detects and corrects them. This layer performs fragmentation and reassembly. The transport layer is also responsible for flow control; it uses buffering and/or windowing as flow control tools.

Examples

Examples include TCP, Stream Control Transmission Protocol (SCTP), and (nominally) User Datagram Protocol (UDP). Each of these is covered later in this chapter.

6.2.4 A Note on Terminology: Packets and Frames
Recall our definition of a packet: it is a chunk of digital data with a header. Among other things, the header contains fields indicating where the data is supposed to go. Many other terms in common use refer to the same basic concept. When discussing data link layer technologies, chunks of data are often called frames (e.g., in the case of Ethernet or Frame Relay). One distinction is that frames often have trailers in addition to headers; packets do not. No term seems to be universally applicable. In ATM, for instance, all chunks of data have the same length; people wanted to use a different term to emphasize this difference with other data link layer technologies, and they settled on the term cell. Moreover, ATM cells do not have trailers.
When referring to layers above the data link layer, we will predominantly use the word “packet.” When referring to the data link layer, we will predominantly use the word “frame.” In the interest of simplicity, other terms will be used sparingly.
6.2.5 General Comments
The physical layer provides the fundamental ability to pump bits through a physical medium. One can view Table 6.1 as a roster of additional functions that are necessary to harness that basic capability. In some cases, capabilities essentially must appear in the roster exactly where the model places them. As an example, it is compulsory that a framing structure be defined between the physical and network layers, because this is the means by which chunks of data arriving on the physical layer are delimited. In other cases, some capabilities might be arranged differently in different protocol stacks.

Note, for example, that the word “reliable” appears at layers 2 and 4. Clearly, there is a major difference in context: layer 2 provides reliability on individual links, whereas layer 4 is responsible for end-to-end reliability. Bit errors do happen at the physical layer; moreover the physical layer cannot detect these errors. The word “reliable” suggests error detection and correction. Now the frequency (or rate) of bit errors varies depending on the physical layer—it tends to be higher in wireless transmissions than on fiber-optic cables, for instance. When the physical layer has a high bit error rate, it is common to implement error correction at the data link layer as well as the transport layer. When the physical medium has a low error rate, error correction is commonly left to the transport layer.

Note, however, that essentially all data link layer technologies perform error detection. Rather than implementing sophisticated error correction schemes or requesting retransmissions themselves, they may simply discard errored frames, leaving it to the transport layer to request retransmission upon discovering that there is missing data.
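The discard-on-error behavior just described can be sketched in a few lines. The following toy Python fragment (the frame format and function names are invented; real data link layers define their own CRC polynomials and frame layouts) appends a CRC-32 trailer, computed with the standard library’s zlib.crc32, and silently drops any frame whose check fails:

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte trailer."""
    crc = zlib.crc32(payload).to_bytes(4, "big")
    return payload + crc

def check(received: bytes):
    """Return the payload if the CRC matches; otherwise discard (None)."""
    payload, crc = received[:-4], received[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        return None                       # errored frame: simply drop it
    return payload

wire = frame(b"hello")
assert check(wire) == b"hello"            # clean frame passes
corrupted = bytes([wire[0] ^ 0x01]) + wire[1:]   # flip one bit "in transit"
assert check(corrupted) is None           # detected and discarded
```

Recovery of the lost data is then left to a higher layer, exactly as the text describes.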
In the case of real-time services (such as voice and video), end-to-end retransmission is not a palatable option: by the time a retransmitted packet makes it across the network, the real-time session has already progressed “downstream” and has no use for this “stale” packet. We will see that the bearer protocol stacks for packet voice are therefore different from those employed for traditional data networking.

When applications on two host computers communicate, the protocol stacks they rely on implement essentially all of the functions cataloged in Table 6.1 (although there is some variation in how these functions are organized into layers). It is important to understand that intermediate devices implement only a subset of the full functionality.

We will describe numerous protocols in this book, starting with IP and TCP in the next section. In each case, we will indicate (or at least approximate) where the protocol in question fits in the reference model, and we will talk about the fields in its packet or frame headers.
6.3 Internet Protocol and Transmission Control Protocol

IP operates at the network layer. As with other protocols, IP has gone through a series of versions as it has evolved. Version 4 (which we will abbreviate as IPv4) is predominant in today’s networks. The anointed successor, IPv6, has yet to “establish a beachhead.” The IP header includes the following fields: IP version number, source address, destination address, and length. There are significant differences between the IPv4 and IPv6 headers, each of which contains a number of other fields not mentioned here. We defer detailed discussion of the IPv4 and IPv6 headers until Chapter 7.

IP has to ride over something at the data link layer, as IP itself has no provision for “reaching down” to this layer. IP can be carried by many data link layer technologies (including the examples listed in Section 6.2.1). This independence of the data link layer is one of IP’s great strengths.

Whenever one makes a call over a circuit-switched network, the bearer channel is bidirectional, and bearer traffic in both directions follows the same route. Before moving on, we note that IP is not intrinsically bidirectional. Common applications such as e-mail and instant messaging essentially adhere to a unidirectional paradigm. For applications such as full duplex voice, there is no guarantee that the bearer paths in the two directions traverse the same nodes.
6.3.1 What Is an Internet Protocol Router?

Our discussion will include many references to IP routers. For our purposes, an IP router is a switching device that examines the IP header of each incoming packet and uses the contents of that header to make its forwarding decision (i.e., to select the outgoing interface for the packet).
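To make the forwarding decision concrete, here is a small Python sketch of longest-prefix matching against a routing table, using the standard library’s ipaddress module. The prefixes and interface names are invented, and a real router uses specialized data structures (and often hardware) rather than the linear scan shown here:

```python
import ipaddress

# A toy routing table: (destination prefix, outgoing interface).
TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),   # more specific prefix
    (ipaddress.ip_network("0.0.0.0/0"), "eth2"),     # default route
]

def forward(dst: str) -> str:
    """Select the outgoing interface for a destination address."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in TABLE if addr in net]
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

assert forward("10.1.2.3") == "eth1"    # both 10/8 and 10.1/16 match; /16 wins
assert forward("10.9.9.9") == "eth0"
assert forward("192.0.2.1") == "eth2"   # only the default route matches
```

The longest-match rule is what lets specific routes coexist with broader aggregates and a default route.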
6.3.2 A Brief Look at TCP
TCP has been a huge factor in the data networking industry’s growth. This is true to such a degree that TCP/IP is often thought of as a “package deal.” TCP nominally operates at the transport layer, so we could think of TCP/IP as “TCP running over IP.” Many applications in turn run over TCP, including the following:

• Web browsing using HyperText Transfer Protocol (HTTP);
• Telnet;
• File Transfer Protocol;
• E-mail (using Simple Mail Transfer Protocol);
• Applications requiring cross-network database access (using Lightweight Directory Access Protocol).

We briefly discuss the functionality offered by TCP; we will refer to the TCP header as illustrated in Figure 6.1. (We describe the significance of the shaded fields in the paragraphs below; we do not discuss the other fields.) The main responsibility of the transport layer (see Table 6.1) is to ensure reliable communication across a network. The checksum field in the TCP header is used to detect corrupted packets
(see Figure 6.1). Each TCP header also contains a sequence number field. Sequence numbers are reckoned in bytes of payload. The TCP protocol entity at the receiving end of a connection acknowledges receipt of packets back to the sender. This is the purpose of the acknowledgment number field; the receiver fills in the value of the next sequence number it is expecting from the sender, and sets the ACK bit to 1 to indicate that the contents of the acknowledgment number field are meaningful. The implication is that every byte with a smaller sequence number than the acknowledgment number has been successfully received. (Let us think of the main direction of data flow as “downstream.” Then we can say that acknowledgments travel upstream.) If the sender does not receive acknowledgments in a timely fashion, it retransmits the packet(s) in question.

Figure 6.1 TCP header. (The fields are: source port, destination port, sequence number, acknowledgment number, data offset, reserved, the URG/ACK/PSH/RST/SYN/FIN flag bits, window, checksum, urgent pointer, options, and padding.)

So far, we have described a rudimentary mechanism for ensuring reliable communication. To a significant degree, TCP owes its success to additional features. TCP provides the following functionality (so protocols running over it do not have to implement any of this functionality):

• Flow control. TCP tries to make full use of the network capacity that is available to it. If it waited for acknowledgment of each packet sent before sending another, this would make for very slow going. Instead, TCP sends a certain number of bytes without waiting for acknowledgment. (The number of bytes is governed by the window field in previous acknowledgments.) If no congestion is detected (i.e., all packets seem to be getting through) and the destination host is able to keep pace, TCP increases the window. This has the effect of sending data at a higher rate. If a session has a substantial traffic volume over a long duration, TCP will eventually saturate the available network capacity, and packets will begin to be dropped. TCP will then realize that it has gone too far and will back off (i.e., move to a smaller window size). Then the cycle starts again—the window size slowly inches up, and so on.

• Sequencing and eliminating duplication. Packets may not be received in the same order that they were transmitted (IP networks make no guarantees about preserving order). Moreover, a TCP protocol entity may unnecessarily send duplicate packets because it did not wait long enough for acknowledgments (or because acknowledgments got lost in transit). TCP uses the sequence numbers to correct for these anomalies, so the higher layer protocols do not have to worry about them.

• Multiplexing. A single TCP protocol entity may have many applications running over it. The source port and destination port numbers are used to make sure that data gets to and from the right applications.

• Segmentation and reassembly. Applications running over TCP can submit large amounts of data to the TCP layer (e.g., large files in the case of file transfer protocol) and assume that their data is streamed across the network. TCP decides how the data should be segmented into chunks at the sending end and reassembles the data at the receiving end.
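The cumulative acknowledgment rule can be sketched in code. The toy Python receiver below (all names invented, with byte strings standing in for real segments) acknowledges the next byte it expects, buffers data that arrives early, and ignores duplicates—exactly the behaviors described above, minus everything else TCP does:

```python
class ToyReceiver:
    """Toy model of TCP-style cumulative acknowledgments."""

    def __init__(self):
        self.expected = 0          # next sequence number (byte offset) we want
        self.out_of_order = {}     # seq -> payload, buffered for reordering

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Accept one segment; return the cumulative ACK number."""
        if seq == self.expected:
            self.expected += len(payload)
            # Drain any buffered segments that are now in order.
            while self.expected in self.out_of_order:
                data = self.out_of_order.pop(self.expected)
                self.expected += len(data)
        elif seq > self.expected:
            self.out_of_order[seq] = payload   # arrived early; hold it
        # seq < expected: a duplicate; ignore it and just re-ACK.
        return self.expected

rx = ToyReceiver()
assert rx.on_segment(0, b"abcd") == 4    # in order: ACK 4
assert rx.on_segment(8, b"ijkl") == 4    # gap at byte 4: still ACK 4
assert rx.on_segment(4, b"efgh") == 12   # gap filled: ACK jumps to 12
```

Note how the final acknowledgment covers the buffered segment as well: a single ACK number implicitly confirms every earlier byte.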
When the sending host has no more data to transmit, it indicates this by setting the FIN bit in the TCP header to 1. The data offset header field indicates the length of the TCP header (in units of 32-bit words). This is necessary because of the variable length of any options fields that may be present. We have now described the use of each of the shaded fields in Figure 6.1; the other fields are beyond our scope. One curious fact to note is that the TCP header does not contain a length field. When TCP passes a packet to the IP layer, it informs the IP entity as to the length of that packet.

TCP’s flow control scheme has been tuned very carefully over the years. For example, the protocol entity on the sending host maintains an empirical estimate of the round-trip time by noticing how long it takes to receive acknowledgments for its outgoing packets. (More precisely, it maintains a moving average.) This is used to decide how long to wait before retransmitting unacknowledged packets. For file transfers, Web downloads, and the like, TCP works amazingly well. On the other hand, TCP is not very well suited to real-time voice and video applications.

This is only the briefest of introductions to TCP. There is much more to know than is contained in the defining document [1], especially when it comes to tuning TCP performance. Further reading is available in abundance, including the well-known three-volume series [2–4] and Wilder’s book [5].

Placing Intelligence at the Edge of the Network
People who are steeped in data networking often say that, for Internet Protocol networks, the intelligence is at the edge. The sophistication of TCP is a big reason why they say this. When TCP/IP hosts communicate with one another across the network, the packets they send may traverse many intermediate devices, and many, if not most, of these intermediate devices are oblivious to what is going on at the TCP layer. TCP knows nothing about the topology of the network, but it is smart enough to adjust to conditions based on the acknowledgments it does (or, in the event of congestion, does not) receive.

It is not quite true that all of the intelligence resides in the TCP layer at the endpoints. For one thing, IP networks have routing intelligence. Also, many Internet Protocol routing devices purposely discard packets as they approach congestion, so that hosts participating in TCP sessions that traverse those
routing devices will see fit to reduce their window sizes. Still and all, the development of TCP represents a conscious effort to realize useful intelligence in end-user devices. There is a big difference in the intelligence of a device running a TCP/IP stack and that of a typical consumer landline telephone set.

6.3.3 TCP/IP Networking Illustration
In this section, we describe a simple Web surfing scenario. We do so to cement the foregoing discussion of TCP/IP in the reader’s mind. Protocol stacks at the end systems and at intermediate network elements are represented schematically in Figure 6.2. Above the TCP layer on the end systems, the Web browser and server software are using HTTP. When the server software fetches information in response to a download request, it hands the data to the HTTP entity, which in turn passes it to TCP for transport across the network to the client PC. TCP segments the data into palatable chunks and reassembles these chunks after making sure that they reach their destination intact. The HTTP entities on the endpoints never see this segmentation, nor do they see what goes on at the IP, Ethernet, and physical layers.

The intermediate network elements are an Internet Protocol router (this is right in the middle of the diagram) and two Ethernet LAN switches (flanking the IP router). We have not labeled these network elements as such in the diagram; this keeps the diagram simple and reinforces the point that TCP does not need to know what the intervening network looks like. The lower layers on the client PC and Web server are not aware of what is happening at the TCP and HTTP layers; HTTP and TCP headers are just payload to them. The same holds for the intermediate devices; they can figure out all they need to know by restricting their attention to the lower layer headers and trailers. The figure reflects this at the TCP layer by showing TCP packets “sailing over the heads” of the intermediate devices. (By the same token, the Ethernet LAN switches do not concern themselves with the IP packet headers.)
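The division of labor in this scenario can be glimpsed with Python’s standard socket module. In the minimal sketch below (not a real HTTP exchange; the toy server and its uppercase “response” are invented), the applications simply write and read bytes over a loopback TCP connection, while segmentation, acknowledgment, and retransmission all happen out of sight in the lower layers:

```python
import socket
import threading

def serve_once(server: socket.socket):
    """Toy 'server software': echo one request back in uppercase."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())

server = socket.create_server(("127.0.0.1", 0))   # OS picks a free port
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# The "client" hands application bytes to TCP and trusts it to deliver
# them intact and in order; it never sees IP or Ethernet headers.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"get /index.html")
    reply = client.recv(1024)
server.close()
# reply == b"GET /INDEX.HTML"
```

Neither endpoint knows (or cares) how many segments, routers, or LAN switches sat between them.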
Figure 6.2 Simple TCP/IP networking schematic. (The client PC runs a Web browser atop an HTTP/TCP/IP/Ethernet/physical stack, and the Web server runs server software atop the same stack. TCP packets, also known as “segments,” flow end to end; IP packets are handled by the intermediate router, and Ethernet frames by the flanking LAN switches.)
Figure 6.3 Packet flow through an Internet Protocol router. (The Ethernet header and trailer are stripped as the packet rises to the IP layer and fresh ones are added on the egress interface; the IP header and payload are carried through.)
Figure 6.3 describes the flow of packets through an IP router. We have placed packet headers on the left merely because English is read from left to right. When frames are submitted to the physical layer by the data link layer, headers are transmitted first. We have chosen to show traffic flowing from right to left in order to reinforce the notion that headers lead the way. As a frame comes in from the left and rises through the Ethernet layer to the IP layer, the Ethernet header and trailer are removed. The IP layer selects an outgoing interface—this is a fancy way of saying that the IP layer determines which of the transmission links connected to the router will be the egress link for this packet. It hands the packet to the Ethernet layer on that interface, which adds the Ethernet header and trailer before handing off to the physical layer. In the course of its processing, the IP layer alters the IP header slightly (although this is not reflected in the picture—the IP header appears as the same cross-hatched rectangle in each “snapshot” of the packet).

Headers at the data link layer may also be changed. This is clearly necessary in any case where a packet enters a router via one data link layer technology and leaves via another technology. Suppose, for example, that we alter the scenario in Figure 6.2 to include two intermediate IP routers with a Frame Relay connection between them. Then there would be traffic entering each router on an Ethernet link and leaving on a Frame Relay link (and vice versa). We diagram the left-hand router (i.e., the IP router closer to the client PC) in Figure 6.4. This figure shows a packet flowing toward the client PC; note that we have used a different hatching pattern for the Frame Relay header/trailer than for the Ethernet header/trailer. (Of course, packets flow in the other direction as well, although we do not show any of these.)

6.3.4 Alternatives to TCP at Level 4: UDP and SCTP
UDP is much simpler than TCP. Whereas TCP performs every function ascribed to the transport layer and then some, UDP does nothing to ensure reliable communication (let alone implement flow control). UDP does offer multiplexing functionality to the higher layers. It manages this task using source and destination port numbers just as TCP does. Another similarity with TCP is that UDP uses a checksum to detect corrupted data. Unlike UDP, SCTP does ensure reliable communication. It is for applications that require reliability from the transport layer, but for which TCP’s flow control
Figure 6.4 Flow through an IP router with Ethernet and Frame Relay interfaces. (The packet shown is flowing toward the client PC: it arrives with Frame Relay framing and departs with Ethernet framing.)
mechanism is not suitable. As with the other layer 4 protocols mentioned here, SCTP employs source and destination port numbers. We will discuss UDP and SCTP at greater length in Section 7.8.
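The contrast with TCP is visible even at the programming interface: with UDP there is no connection to establish, and each datagram stands alone. A minimal loopback sketch in Python (ports are chosen by the operating system; the message text is arbitrary):

```python
import socket

# UDP offers ports (multiplexing) and a checksum, but no acknowledgments,
# no ordering, and no flow control: datagrams are simply sent.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # OS assigns a free port
port = receiver.getsockname()[1]

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"no handshake needed", ("127.0.0.1", port))

data, addr = receiver.recvfrom(1024)     # one datagram, delivered whole
sender.close()
receiver.close()
# data == b"no handshake needed"
```

On a loopback interface this datagram arrives reliably; across a real network, any loss or reordering would be the application’s problem—which is precisely why protocols needing reliability reach for TCP or SCTP instead.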
6.4 What Is a Finite State Machine?

The best way to introduce the notion of a finite state machine (FSM) is probably via an example, as formal definitions are often hard to penetrate. For our purposes, a finite state machine is an abstraction used to indicate how a protocol entity behaves. An FSM will typically be represented by a block diagram. The blocks and interconnecting arrows in the diagram are called states and state transitions, respectively. These items show how a protocol entity is organized. Loosely speaking, one can think of the states as software modules. A state transition corresponds to an event that transfers control from one module to another.

Here is our example. When a telco customer tries to originate a phone call, the serving switch (which we will call the originating switch) keeps track of the state of that call. It does so by instantiating (i.e., creating) and maintaining an FSM for that call. This FSM often goes by the name call state model. We present a simplified call state model in Figure 6.5. This state machine “lives” in the originating switch (the destination switch will maintain a different state machine for the same call). The basic idea is simple enough: to make it from the idle
Figure 6.5 Simplified version of call state model (originating). (States: idle, collecting information, analyzing information, routing and alerting, active.)
state to the active state (i.e., the state in which a bearer channel has been set up and the customers are conversing), and back to the idle state when the call is completed.

6.4.1 States
The states in Figure 6.5, whose names are fairly self-explanatory, can be interpreted as follows. In the idle state, the call does not yet exist. Information (such as dialed digits) is expected by the switch when it is in the collecting information state. In the analyzing information state, the originating switch has the dialed digits and is conducting various checks (e.g., whether the customer has dialed a toll-free number). In the routing and alerting state, the switch has requested that a call be set up and is waiting for the call to be routed, the called party’s phone to ring, and so on. We have already described the active state.

The five states we have shown are sufficient for our purposes. Let us note, however, that the state machine implemented inside a digital switch is much more granular. In particular, the analyzing information and routing and alerting states shown in the figure are both aggregates of numerous states in the switch’s internal call state model.

6.4.2 State Transitions
Figure 6.5 is simplified in another major way. Even if it is only a high-level model, any self-respecting state machine should include some description of how the state transitions occur: what circumstances or events trigger these transitions? We have not said anything thus far about the state transitions except to include a sequence of arrows stringing together the states. Moreover, the arrows present in the diagram only depict the sequence for a successful call. The call state machine also needs to take a variety of failure scenarios into account. Thus, in Figure 6.6, we added transitions to the idle state from all other states to account for the fact that the caller may choose to hang up at any time (before or after the call setup is completed), thereby abandoning the call. We also labeled the transitions. Until the state machine reaches the analyzing information state, the switch is interacting only with the caller’s telephone; the rest of the network is not yet aware of this call. Let us assume for the sake of discussion that the calling and called parties are served by different switches. Once it has analyzed the information (e.g., dialed digits), the switch notifies the network of the new call request; the state machine for this call passes into the routing and alerting state. While in this state, the originating switch is in a somewhat passive role: it is waiting for its signaling transfer point to contact the destination switch and relay confirmation from the destination switch that it can handle the call. We emphasize that Figure 6.6 is still an incomplete representation of a typical voice switch’s call state machine. In fact, the state machine in the switch is very complicated, because the number of scenarios is large and the switch must be prepared for each possibility. 
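A state machine of this kind is naturally represented in code as a transition table. The Python sketch below mirrors the states and labeled transitions discussed in this section, with event names that are our own shorthand; any on-hook event returns the machine to idle, modeling call abandonment:

```python
# (state, event) -> next state; the "happy path" of a successful call.
TRANSITIONS = {
    ("idle", "off_hook"): "collecting_information",
    ("collecting_information", "digits_collected"): "analyzing_information",
    ("analyzing_information", "setup_request_dispatched"): "routing_and_alerting",
    ("routing_and_alerting", "setup_confirmed"): "active",
}

def next_state(state: str, event: str) -> str:
    if event == "on_hook":                 # caller may hang up at any time
        return "idle"
    return TRANSITIONS.get((state, event), state)  # ignore unexpected events

state = "idle"
for event in ("off_hook", "digits_collected",
              "setup_request_dispatched", "setup_confirmed"):
    state = next_state(state, event)
# state == "active"; a subsequent "on_hook" returns it to "idle"
```

A production call state model would of course enumerate far more states and events, along with timers and failure handling, but the table-driven shape stays the same.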
Not only does the model presented in this section lack a full complement of states; we also comment here that, in a more faithful representation, the logic associated with the state transitions might also be complex. We mention one particular type of transition trigger, a so-called timeout that is common in
Figure 6.6 A more realistic call state model with labeled transitions. (States: idle, collecting information, analyzing information, routing and alerting, and active. Labeled transitions: detect off-hook; done collecting 7 or 10 digits; dispatch call setup request; receive call setup confirmation; and detect on-hook, which returns the machine to idle from any other state.)
protocol entities but is omitted from the high-level description in this section. When a protocol entity sends a message, it generally will not wait indefinitely for a response. Instead, a timer is set when a message is sent. If the timer expires before a response is received, the sender assumes that the message was lost and takes appropriate action (e.g., resending the message).
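The timer mechanism just described can be sketched as follows; the polling loop, timeout value, and retry limit are illustrative choices, not anything prescribed by a particular protocol.

```python
import time

def send_with_timeout(send, poll_response, timeout=1.0, max_retries=3):
    """Sketch of a protocol entity's retransmission timer: send a message,
    arm a timer, and resend if no response arrives before it expires.
    poll_response is a non-blocking check returning None if nothing arrived."""
    for attempt in range(max_retries):
        send()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            response = poll_response()
            if response is not None:
                return response
        # Timer expired: assume the message was lost and resend.
    raise TimeoutError("no response after %d attempts" % max_retries)
```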
6.4.3 Additional Comments
State machines residing in different network elements often need to be synchronized. Recall that the state machine represented by Figure 6.6 has a partner of sorts: that is, there is a state machine associated with our telephone call in the destination switch. For instance, if the originating state machine is in the active state, we would certainly hope that the destination state machine is also in its active state. Many protocols have mechanisms for ensuring that peer entities’ state machines are properly synchronized. That is, the protocol entities continually check for evidence that they are “on the same page,” and have defined procedures for re-establishing proper communication when they determine that something has gone wrong. The labels “dispatch call setup request” and “receive call setup confirmation” in Figure 6.6 refer to a call-control signaling exchange between SS7 protocol entities; we introduce SS7 in Section 6.5.
6.5 Signaling System 7 in Brief

SS7 is the vehicle for call-control signaling in today’s telephone networks and is also an essential ingredient in many telco network services. We will discuss the SS7 layers starting at the bottom and working our way up the protocol stack. The reader may find it instructive to peek ahead to Section 6.5.6, which is relatively self-contained, before trying to attack the bigger picture. This is where we touch on ISDN User Part (ISUP), which is the protocol for basic call control.
SS7 did not spring, fully formed, from the head of Zeus. Even though we associate it with “legacy” networks, SS7 is itself the outgrowth of a long evolution. The previous step in that evolution was Common Channel Signaling 6 (CCS6). Like SS7, CCS6 was packet-based. Recall that it is possible to perform call-control signaling using bearer channels; another choice is to dedicate certain channels entirely to signaling. In the latter case, one signaling channel would be able to serve the needs of many bearer channels (which would have the same signaling channel in common—hence the name “common channel signaling”). CCS6 codified the common channel approach, but it had the following limitations:

• It used fixed-length packets (and therefore had limited extensibility).
• It lacked an independent transport layer.

SS7 overcomes both of these limitations. The thing to note about SS7 is that it is vast. This is true in terms of the variety of signaling messages that ride atop SS7 stacks as well as the sheer scale of the SS7 infrastructure that is now deployed worldwide. Moreover, the SS7 footprint is growing in both of these “dimensions” as applications like number portability are rolled out, wireless networks evolve toward ever-increasing functionality, and so on.

At the higher layers, many of the SS7 message types are grouped into so-called application parts. To give the reader a sense of the vastness of SS7, we hold up the Mobile Application Part (MAP [6]) specification as an example: this specification is about a thousand pages long, and it is still changing. MAP is the protocol for so-called mobility management functions in wireless networks adhering to the GSM specifications; we discuss this protocol later in this chapter. Here we note that GSM originally stood for Groupe Spécial Mobile. Nowadays, this acronym is more commonly expanded as Global System for Mobile Communications.

In an SS7 stack, Message Transfer Part (MTP) is responsible for levels 1, 2, and 3. These levels are roughly aligned with layers 1 through 3 (as described in Section 6.2), although we will see that MTP level 3 falls short of full-fledged network layer functionality. MTP level 1, the SS7 physical layer, operates independently of the other layers; we do not discuss it any further.
6.5.1 MTP2
Message Transfer Part Level 2 (MTP2) provides basic error detection and correction at the data link layer (as the name suggests, MTP2 operates at this layer). Recall that an essential function of the data link layer (and a prerequisite for any sort of error detection and correction capability) is to package data in structured frames that are submitted to the physical layer. MTP2 maintains sequence numbers, but these only have local significance. That is, sequence numbers used on a given link have nothing to do with sequence numbers on any other link. Note that this is different than TCP, in which the end systems agree on the semantics of the sequence numbers they employ. The fact that TCP can take an end-to-end view (and MTP2 cannot) is due to their relative positions in
protocol stacks: recall that TCP operates at the transport layer, which is above the network layer.

6.5.2 MTP3
Message Transfer Part Level 3 (MTP3) is responsible for network management functions; it is also responsible for the following message handling functions:

• Message discrimination. This function determines whether the current node is the destination for this message. If so, the message is passed to the message distribution function, described below. If not, the message is passed to the message routing function, also described below.
• Message routing. This function selects the outgoing link. MTP3 has point-to-point routing capabilities (but we note here that its routing intelligence is limited).
• Message distribution. This function, which is invoked when the current node is the destination for this message, passes the message to the correct higher-layer protocol entity.

The use of the term “destination” can be confusing in the context of the message discrimination function already described. When an incoming message is passed by the message distribution function to a higher-layer protocol, that protocol may furnish additional information and send the request back down to MTP3 for routing to another node. Signaling Connection Control Part (SCCP), which commonly performs this function, is our next topic of discussion.
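A rough sketch of the three message handling functions, with an invented routing table and user-part registry standing in for real MTP3 data structures:

```python
# Sketch of MTP3 message handling. Point codes, the routing table, and
# the user-part registry are invented for illustration.

ROUTING_TABLE = {"B": "link_1", "C": "link_2"}            # destination -> outgoing link
USER_PARTS = {"SCCP": lambda m: "sccp got " + m["body"]}  # service -> handler

def discriminate(own_point_code, msg):
    """Message discrimination: decide whether to distribute or route."""
    if msg["destination"] == own_point_code:
        return ("deliver", distribute(msg))
    return ("forward", route(msg))

def distribute(msg):
    """Message distribution: pass to the correct higher-layer entity."""
    return USER_PARTS[msg["service"]](msg)

def route(msg):
    """Message routing: select the outgoing link for the destination."""
    return ROUTING_TABLE[msg["destination"]]
```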
6.5.3 SCCP
SS7 is not used exclusively for basic call control. To an increasing degree, SS7 is used to carry database queries and responses to and from telco network elements. Many services require database access (see Section 4.3), with toll-free service being the classic example. Message Transfer Part alone is not sufficient to support this capability. SCCP provides the end-to-end routing capability needed to reach a database; this protocol runs directly over MTP3. Note that database queries are not actually formulated in SCCP—SCCP just makes sure that those queries reach their destinations. We discuss routing in SS7 networks in Chapter 8.

6.5.4 TCAP
Whenever an SS7 node needs to perform a database query, Transaction Capabilities Application Part (TCAP) comes into play. The query and the response are formulated as TCAP messages. As indicated in the previous section, MTP3 routing does not support database access, so TCAP requires the services of the SCCP layer.

6.5.5 MAP
Wireless networks maintain subscriber profiles, as well as dynamic information on each active user’s location, in large databases called home location registers.
Incoming calls trigger queries to these databases, as the location information contained in the registers is required to successfully route calls to mobile users. Signaling exchanges are also required whenever a mobile user moves to a different portion of the serving network. The blanket term mobility management is often used to refer to the transaction types described in this paragraph. For wireless networks that are built to the GSM standards, the language of mobility management is Mobile Application Part (MAP). MAP runs over TCAP.

GSM is not the only wireless technology. For example, two other wireless schemes are widely deployed in North America: time division multiple access (TDMA) and code division multiple access (CDMA). Both of these technologies utilize the ANSI-41 standard for mobility management. The functionality is very similar to that of MAP, but the details are different.

6.5.6 ISUP
ISDN User Part (ISUP) is used for basic call-control signaling. Here ISDN stands for Integrated Services Digital Network. The ISUP specification of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) is contained in its Q.76x series of recommendations; Q.761 [7], which contains the functional description, is the starting point. The U.S. version [8] is published by the American National Standards Institute (ANSI); although there are differences in the details, the ANSI and ITU-T versions are conceptually the same.

Let us outline the ISUP messaging exchange for a simple call scenario. In this example, the originating switch (i.e., the calling party’s serving switch) has a direct bearer connection to the destination switch. After it collects the dialed digits, the originating switch selects a bearer channel to allocate to this call and sends an Initial Address Message (IAM) towards the destination switch. The IAM specifies the identity of the selected bearer channel. When it receives the IAM, the destination switch sends an Address Complete Message (ACM) towards the originating switch and rings the called party’s phone. When the called party answers the call, the destination switch sends another message to the originating switch: namely, an Answer Message (ANM). For a toll call, this is the signal that the originating switch should commence billing. Suppose for the sake of discussion that the calling party hangs up first. Then the originating switch sends a Release message (REL) to the destination switch, which responds with a Release Complete message (RLC).

In Figure 6.7, we have redrawn the call state model (see Figure 6.6) to reflect that the ISUP signaling exchange begins with an Initial Address Message, and that the active state is reached once the Answer message arrives. To keep the diagram uncluttered, we have used the abbreviations IAM and ANM, respectively.
Since the model is not very granular, the diagram does not offer suitable places for the other messages described above. For the most part, we will not include finite state machine diagrams in our protocol descriptions. Matters would quickly get too complicated. Recall, for example, that there is a counterpart to the FSM of Figure 6.7 in the destination switch. Suppose we drew versions of the originating and terminating ISUP state machines that were sufficiently granular to show all state transitions related to messages in the
Figure 6.7 Revised call state model showing ISUP messages. (As in Figure 6.6, but with the transition into routing and alerting labeled "Send IAM" and the transition into active labeled "Receive ANM.")
FSMs’ signaling dialog. If we then tried to interpose a schematic of the signaling flow itself, we would end up with a diagram that confused much more than it enlightened. Although complex FSMs are cumbersome to depict, it is important to keep the FSM concept in mind. We will often display sample signaling flows in so-called “ping-pong” diagrams. In our first example, Figure 6.8, we render the ISUP signaling flow described above. In this diagram, the vertical “axis” no longer runs up and down the protocol layers. Instead, it represents time, which elapses as we proceed downward. Before moving on, we note that ISUP runs directly on top of MTP3. Although the standards allow for SCCP to be interposed between ISUP and MTP3, it is not necessary to do so. To the best of our knowledge, no carrier has implemented ISUP over SCCP.
Figure 6.8 ISUP call flow diagram. (Between the originating and destination switches, with time elapsing downward: IAM from originating to destination; ACM and then ANM back from destination to originating; ...telephone conversation...; REL from originating to destination; RLC back from destination to originating.)
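The exchange in Figure 6.8 can also be rendered as a scripted dialog; only the message names (IAM, ACM, ANM, REL, RLC) come from ISUP, and everything else here is invented for illustration.

```python
# Sketch of the ISUP exchange of Figure 6.8 as an ordered list of
# (sender, receiver, message) tuples. Time runs down the list, as it
# runs down a ping-pong diagram.

def isup_call():
    flow = []
    flow.append(("originating", "destination", "IAM"))   # channel selected, digits sent
    flow.append(("destination", "originating", "ACM"))   # address complete; ringing
    flow.append(("destination", "originating", "ANM"))   # called party answered
    # ...telephone conversation...
    flow.append(("originating", "destination", "REL"))   # calling party hangs up first
    flow.append(("destination", "originating", "RLC"))   # release complete
    return flow
```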
6.6 Summary

Although the ability to transmit bits across physical media is crucial to telecommunication, this capacity is only a part of the functionality that is present in any telecommunication network. One might be tempted to think of the remaining required functionality as being all of a piece. The complexity of the task is such that it needs to be subdivided, however, and it is natural to build functionality in a layered fashion. A protocol stack is a means of realizing all of the capabilities necessary for end-to-end communication. Protocol stacks are modular, and the main modules are usually called layers. As we move up a protocol stack from the physical layer, each subsequent layer builds on the functionality of previous layer(s). When network elements talk to each other, they do so at a variety of layers simultaneously.

Let us elaborate on the last point. In a protocol stack running on a given network element, each layer is represented by a protocol entity. Protocol entities operating at the same layer, but on different network elements, conduct dialogs by exchanging packets. Each layer has its own packet format that embodies the semantics of such a dialog. When a chunk of information is submitted by an upper-layer protocol for transmission, packet headers are prepended by each layer as the data “descends” towards the physical layer. By the same token, headers are peeled off of incoming packets (in last in, first out fashion) as those packets “ascend” the protocol stack.

How does one come to understand a given protocol? We know of no protocol that functions all by itself. Therefore, as a starting point, one can describe how that protocol fits into a protocol stack. For such discussions, it is useful to have a point of reference. In this chapter, we described the functionality associated with the physical, data link, network, and transport layers. We looked at examples of protocols that are widely deployed today: namely, TCP/IP and various SS7 protocols.
We saw some of the major differences between the TCP/IP and SS7 protocol suites by relating both to the framework of Table 6.1. To understand the functionality of a given protocol, it also helps to examine the packet headers used by that protocol. In Section 6.3.2, we discussed the TCP header. In the process of describing the semantics of several header fields, we outlined TCP’s main functionality.

For a more detailed understanding, one can look at protocol state machines. We introduced the notion of an FSM and gave a simplified example of a circuit-switched call model. State machine descriptions are useful for capturing the way protocol entities behave. State transitions in protocol state machines are often associated with receipt or transmission of protocol messages. Such an association reflects expectations: at each point in a signaling dialog, a given protocol entity expects some types of messages and not others. In an FSM representation, points in the signaling dialog correspond to states. Receipt of an unexpected message type might trigger a transition to an error-processing state, whereas receipt of a message type that is appropriate at this point in the signaling dialog leads to the next state in the “normal” progression. Protocol state machines tend to be complex. One reason for this is that they must provide for graceful handling of a wide variety of error conditions.
Signaling flow diagrams provide a third way to understand signaling protocol functionality. These so-called “ping-pong” diagrams are quite useful; we will encounter a number of them as we proceed.
References

[1] Postel, J., RFC 793, Transmission Control Protocol, IETF, September 1981.
[2] Stevens, W. R., TCP/IP Illustrated, Volume 1: The Protocols, Reading, MA: Addison-Wesley, 1994.
[3] Wright, G. R., et al., TCP/IP Illustrated, Volume 2: The Implementation, Reading, MA: Addison-Wesley, 1995.
[4] Stevens, W. R., TCP/IP Illustrated, Volume 3: HTTP, NNTP, and the Unix Domain Protocols, Reading, MA: Addison-Wesley, 1996.
[5] Wilder, F., A Guide to the TCP/IP Protocol Suite, Norwood, MA: Artech House, 1998.
[6] TS 29.002, Mobile Application Part, 3GPP.
[7] Recommendation Q.761, Signaling System No. 7—ISDN User Part—Functional Description, ITU-T, December 1999.
[8] T1.113, Signaling System No. 7 (SS7)—Integrated Services Digital Network (ISDN) User Part, ANSI, 2000.
CHAPTER 7
A Closer Look at Internet Protocol

In this chapter, we will look at IP itself, as well as a number of related topics in IP-based networking. Since we will not be able to do full justice to these topics, we give numerous references for further reading. Thus portions of this chapter read like an annotated bibliography.

For those unfamiliar with the Internet Engineering Task Force (IETF), it helps to know the following. Internet specifications begin life in the IETF as Internet drafts. Internet drafts that pass muster become requests for comments (RFCs). There are a number of categories of RFCs (including informational, best current practice, proposed standard, draft standard, and standard). All RFCs are permanently available at www.ietf.org and www.rfc-editor.org, regardless of category. By searching the latter URL, one can learn which RFC(s), if any, have obsoleted or updated a given RFC. IETF is organized into working groups; www.ietf.org also serves as a point of access to working groups’ charters and the documents they produce.

IPv4, which is currently predominant, appeared as an IETF RFC in 1981 [1]. IPv4 has a number of shortcomings; IPv6 ([2] in its first incarnation, later supplanted by [3]) was designed to overcome these deficiencies. The IPv4 embedded base is huge, so migration to IPv6 will take a long time. To facilitate a basic understanding of IPv4 and IPv6 (and to get a glimpse of the differences between the two), we will examine the packet headers for both protocols.

Why Migrate to IPv6?
If the migration to IPv6 promises to be difficult, why undertake it at all? The main driver is the size of the IPv4 address space: in a world where all sorts of devices (e.g., mobile phones, vending machines, appliances) will “speak” Internet Protocol, the IPv4 address space will eventually be exhausted. When this exhaustion will occur is a matter of some debate.

What Happened to the Other Versions?
IPv4 was the first widely deployed version of Internet Protocol. There were two precursor documents to the IPv4 RFC previously cited ([4] and [5], no longer active). But there was no IP version 1, 2, or 3 per se. (The version number indicates that some iteration took place before the Internet community settled on IPv4, however). Similarly, there is no IPv5 RFC. However, the Internet Stream Protocol (ST-II) protocol specification [6, 7] stipulated that the “version” field in the IP packet header should be set to 5.
7.1 The IPv4 Header

The IPv4 header format is displayed in Figure 7.1. In the following header field descriptions, field values (when specified) are given in hexadecimal; this is indicated by the prefix 0x.

The value of the 4-bit Version field is, of course, 0x4. The 4-bit Header Length field is reckoned in 32-bit words. With no options, the value of this field is 0x5 (or, in other words, a header with no optional fields is 20 bytes long). Since the maximum expressible value in a 4-bit field is 0xF, we see that the maximum combined length of all optional fields is 10 words (which equals 40 bytes). We will discuss the Differentiated Services/type of service (DiffServ/TOS) field when we look at quality of service in Section 7.7.2.

Unlike the Header Length field, the 16-bit Packet Length field is reckoned in bytes. The byte count includes the packet header. The maximum value is 0xFFFF (which equals 65,535 decimal). Thus, when a datagram (that is, a chunk of data to be transmitted) is larger than 64K, it must be fragmented into multiple IP packets. The IPv4 header has several fields that relate to fragmentation, starting with the 16-bit Identification field. This is set by the original sender of data and is copied into each fragment during the fragmentation process. The Reserved (RES) bit must be 0. If the Do Not Fragment (DF) bit is set to 1, then this packet may not be fragmented. The More Fragments (MF) bit is set to 0 if and only if this packet is the last fragment of the datagram. The 13-bit Fragment Offset is reckoned in 8-byte units (unlike the Header Length and Packet Length fields). It specifies the location of the fragment in the reassembled datagram (i.e., the distance from the beginning of the datagram to the beginning of the fragment).

In IP networks, transitory routing loops can arise. To ensure that packets do not go around in circles for an extended period of time, the 8-bit Time To Live (TTL) field is decremented by 1 each time the packet traverses a router.
Figure 7.1 The IPv4 header. (Fields, in order: Version, Header Length, DiffServ/TOS, Packet Length; Identification, RES/DF/MF flags, Fragment Offset; TTL, Protocol, Header Checksum; Source Address; Destination Address; Options. The shaded fields are those omitted from the IPv6 header.)

If TTL reaches 0,
the packet is discarded. So TTL, which is usually set to 0xFF by the packet’s creator, is the maximum number of routing hops.

The 8-bit Protocol field specifies how the payload of the packet should be interpreted (i.e., which protocol it adheres to); examples include TCP (value 0x6). The 16-bit Header Checksum is used to verify the integrity of the packet header. The Source and Destination Address fields are each 32 bits long. As noted earlier, the total header length depends on the options selected; we omit details.
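As a sanity check on the field layout just described, one can pack and parse a minimal (option-free) 20-byte header; the field values used below are arbitrary.

```python
import struct

def parse_ipv4_header(data):
    """Parse a 20-byte, option-free IPv4 header per the layout of Figure 7.1."""
    (ver_ihl, tos, packet_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBHII", data[:20])
    return {
        "version": ver_ihl >> 4,                 # high nibble of first byte
        "header_length_words": ver_ihl & 0x0F,   # low nibble, in 32-bit words
        "packet_length": packet_length,          # bytes, header included
        "df": (flags_frag >> 14) & 1,            # bit 14 of the 16-bit field
        "mf": (flags_frag >> 13) & 1,            # bit 13
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": protocol,
    }
```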
7.1.1 Fragmentation and Path MTU Discovery
The maximum transmission unit (MTU) of a path is the size of the largest (unfragmented) packet that can be transported across that path. Each link along the path has an MTU imposed on it at the data link layer; the path MTU is the minimum of the constituent link MTUs.

To determine the path MTU to a given destination [8], an IP node sends a packet whose size is the MTU of the egress link for that destination, setting the DF bit to 1. (Since the path MTU cannot exceed the MTU of the egress link, the node takes the latter as its “estimated MTU.”) If some router along the path cannot forward the packet because its length exceeds the MTU of the outgoing link, that router discards the packet and sends an Internet Control Message Protocol (ICMP [9]) error message back to the originating node. The MTU of the outgoing link is included in the ICMP message. Upon receipt of this message, the originating node therefore has a new estimated MTU; it repeats the process until it stops receiving ICMP error messages.

Note that, if the DF bit is not set to 1, intermediate routers will simply fragment the packet as necessary and send it on its way. In the process, the correct fragment offset must be computed and the MF bit must be set to the appropriate value in each fragment. Lastly, we note that not all IPv4 endpoints implement path MTU discovery.
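The iterative procedure can be sketched as follows, with a list of per-link MTUs standing in for the path and a simple scan standing in for the DF-marked probe and the resulting ICMP error.

```python
def discover_path_mtu(link_mtus, egress_mtu):
    """Sketch of path MTU discovery. link_mtus lists the MTU of each link
    along the path, in order; egress_mtu is the first estimate."""
    estimate = egress_mtu           # path MTU cannot exceed the egress link MTU
    while True:
        for mtu in link_mtus:       # a DF-marked probe traverses links in order
            if mtu < estimate:
                estimate = mtu      # ICMP error carries this link's MTU
                break               # probe was discarded; try again
        else:
            return estimate         # no ICMP error: estimate is the path MTU
```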
7.2 The IPv6 Header

The IPv6 header, which has a fixed length of 40 bytes, is displayed in Figure 7.2. The first thing to point out is that the following IPv4 header fields are omitted altogether from the IPv6 header: Header Length; Identification; the RES, DF, and MF bits; Fragment Offset; and Header Checksum. These are the shaded fields in Figure 7.1.

The 4-bit Version and the Source and Destination Address fields (which are 128 bits each) correspond to the IPv4 header fields of the same names. The 8-bit Traffic Class field is analogous to IPv4’s DiffServ/TOS field; see Section 7.7.2. The 20-bit Flow Label field, which has no counterpart in the IPv4 header, is used to identify individual traffic streams or aggregates. The use of the Flow Label still seems ill-defined, although a recent RFC [10] gives a basic description. IPv6’s 16-bit Payload Length field is reckoned in bytes. Payloads larger than the nominal limit of 65,535 bytes can be accommodated by setting this field to 0 and appending a “jumbogram” extension header. The 8-bit Next Header field replaces IPv4’s Protocol field and expands its role: the next header can be a higher-layer protocol header
Figure 7.2 The IPv6 header. (Fields, in order: Version, Traffic Class, Flow Label; Payload Length, Next Header, Hop Limit; Source Address; Destination Address.)
(e.g., TCP or UDP) or an IPv6 extension header. The 8-bit Hop Limit field serves the same function as IPv4’s TTL field (and is more aptly named).

Recall that several IPv4 header fields lack counterparts in the IPv6 header. The absence of one of those fields, the Header Checksum, indicates that error detection is left to other layers. Of these omitted fields, four (Identification, DF, MF, and Fragment Offset) are used to manage fragmentation. For reasons of efficiency, IPv6 does not support packet fragmentation by routers, so clearly there is no need for a DF header bit. Except for this difference, path MTU discovery in IPv6 [11] using ICMPv6 [12] proceeds as described in Section 7.1.1. All IPv6 interfaces must support MTUs of at least 1,280 bytes (although the standards say that IPv6 nodes should perform path MTU discovery to take advantage of larger MTUs whenever possible).
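Again as a sanity check on the layout, a sketch that packs and parses the fixed 40-byte header; the field values used here are arbitrary.

```python
import struct

def parse_ipv6_header(data):
    """Parse the fixed 40-byte IPv6 header per the layout of Figure 7.2."""
    # First 32-bit word: 4-bit version, 8-bit traffic class, 20-bit flow label.
    first_word, payload_length, next_header, hop_limit = struct.unpack("!IHBB", data[:8])
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,   # bytes, header NOT included
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": data[8:24],               # 128-bit addresses
        "destination": data[24:40],
    }
```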
7.2.1 IPv6 Extension Headers
Although IPv6 routers do not fragment packets, IPv6 endpoints can fragment packets. TCP and SCTP support fragmentation and reassembly, so in many cases fragmentation at the IP layer is unnecessary. Note, however, that UDP does not have any such facility. Whenever an IP source node fragments an IPv6 packet, it needs to tell the destination node how to reassemble that packet. For this purpose, Identification, MF, and Fragment Offset fields accompany each fragment, just as they do in IPv4. However, these fields are relegated to an IPv6 extension header, which can be safely ignored by intermediate nodes. In the interest of performance, the IPv6 header is streamlined so that processing at routers is held to a minimum. The fragment header is not the only IPv6 extension header; numerous others are defined. Examples include routing, authentication, and two kinds of security headers. Like the IPv6 header itself, each extension header has a Next Header field. So extension headers can be “daisy-chained” between the IPv6 header and the transport layer header.
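Daisy-chaining via Next Header fields can be sketched as a simple chain walk; the header codes below follow the IANA protocol-number convention (0 for hop-by-hop options, 43 for routing, 44 for fragment, 6 for TCP), but the in-memory representation is invented for illustration.

```python
# Sketch of walking an IPv6 extension-header chain via Next Header fields.

UPPER_LAYER = {6: "TCP", 17: "UDP"}   # transport protocols terminate the chain

def walk_headers(first_next_header, headers):
    """first_next_header is the IPv6 header's Next Header value; headers maps
    each extension header's code to that header's own Next Header value."""
    chain = []
    code = first_next_header
    while code not in UPPER_LAYER:
        chain.append(code)            # an extension header; keep following
        code = headers[code]
    chain.append(UPPER_LAYER[code])   # reached the transport-layer header
    return chain
```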
In some cases, additional processing is required at intermediate nodes; for these, a hop-by-hop options extension header is used. When present, this must be the first extension header after the IPv6 header itself. When an intermediate node receives an IPv6 packet, it can therefore look at the Next Header field to find out whether a hop-by-hop extension header is present. If not, then all extension headers are ignored. This contrasts with IPv4, in which routers must examine all of the options that are present in IPv4 headers. When a packet contains a routing extension header (which is used to explicitly specify nodes to visit en route), the destination address in the IPv6 header may not reflect the packet’s ultimate destination.
7.3 Addressing and Address Resolution

7.3.1 Conserving IPv4 Address Space
In the early 1990s, blocks of IPv4 addresses were being consumed at an alarming rate. When an organization requested a block of addresses from the central authority, it was normally assigned either a so-called “class C” block (254 host addresses) or a “class B” block (65,534 host addresses). (For completeness, we note that address classes A, B, and C are defined, alongside IPv4 itself, in RFC 791 [1].) Many an organization that was too large for a class C block received a class B block, even if it really only needed a few thousand IP addresses. To stave off exhaustion of the IP address space, classless inter-domain routing (CIDR [13, 14]) was developed.

People also came to realize that, for hosts to communicate within private IP networks, globally unique IP addresses were unnecessary. IETF RFC 1918 formally set aside portions of the IPv4 address space for private use [15]. Private addresses can be (re)used internally by any number of organizations.

Network Address Translation
Suppose we assign a private IP address to a host. How can we connect that host to the public Internet? This is routinely accomplished using network address translation (NAT [16]). This is really an umbrella term, since there is more than one kind of NAT. We do not give details; however, the general idea is that an intermediate node (which is also called a NAT) acts as a gateway to the public Internet. Whenever it receives a packet from a private host (that is, a host that has been assigned a private IP address), the NAT replaces the source address with a globally unique address. At the transport layer, the source port number may also be altered. For incoming packets destined to the private host, the NAT must perform the reverse translation on the destination address. Note that NATs are stateful: they must keep track of bindings between address/port number pairs on their private and public interfaces.

How does this conserve IP addresses? The bindings are established temporarily on a per-session basis. Moreover, many internal hosts can be represented by the same IP address when they talk to the outside world. That is, many private IP addresses can be bound to the same external IP address so long as different port numbers are used.
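The binding behavior just described can be sketched as follows; the public address, port range, and absence of binding timeouts are simplifications, and real NATs track considerably more state.

```python
import itertools

class Nat:
    """Sketch of a NAT binding table: many private address/port pairs share
    one public address, distinguished by external port number."""

    def __init__(self, public_address):
        self.public_address = public_address
        self._ports = itertools.count(40000)   # external ports to hand out
        self.out = {}    # (private_addr, private_port) -> public_port
        self.back = {}   # public_port -> (private_addr, private_port)

    def translate_outgoing(self, private_addr, private_port):
        key = (private_addr, private_port)
        if key not in self.out:                # create a binding on first use
            public_port = next(self._ports)
            self.out[key] = public_port
            self.back[public_port] = key
        return (self.public_address, self.out[key])

    def translate_incoming(self, public_port):
        # Reverse translation for packets destined to a private host.
        return self.back[public_port]
```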
So NAT promotes efficient use of globally unique IP addresses. It also enhances privacy, since the bindings already described are transitory. But NAT also creates problems, notably in the realm of scalability. Moreover, IP addresses are used in many higher-layer protocols. These encapsulated IP addresses are not altered by NATs per se. Instead, the necessary translations are performed by so-called application level gateways (ALGs). ALGs know where to find IP addresses within higher-layer protocol messages. ALGs often reside on NAT devices but are troublesome for a number of reasons. Among them is the following: when a new application-layer protocol comes along, ALG functionality has to be upgraded to support it. In addition, NAT traversal is a problem for end-to-end security associations.

Dynamic Host Configuration Protocol
Dynamic Host Configuration Protocol (DHCP) is another scheme for conserving IP addresses through reuse. Servers (e.g., dial-up servers) at Internet service providers typically have pools of IP addresses at their disposal. As part of the sign-on process, such a server will temporarily assign an IP address to the user in question. When the user logs off, his/her IP address is relinquished and returned to the “available” pool. DHCP has the added benefit of making networks easier to administer: host IP addresses are assigned on the fly, without intervention by network management personnel.
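A minimal sketch of the lease-and-release cycle; the addresses and client identifiers are arbitrary, and real DHCP adds lease lifetimes, renewal, and much else.

```python
class AddressPool:
    """Sketch of a DHCP-style pool: addresses are leased on sign-on and
    returned to the available pool on log-off, so they can be reused."""

    def __init__(self, addresses):
        self.available = list(addresses)
        self.leases = {}                  # client id -> leased address

    def lease(self, client):
        address = self.available.pop(0)   # assigned on the fly
        self.leases[client] = address
        return address

    def release(self, client):
        # Relinquished address goes back to the "available" pool.
        self.available.append(self.leases.pop(client))
```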
7.3.2 The IPv6 Address Space
The IPv6 address space is subdivided into a number of address categories, as specified in RFC 3513 [17]. (We note here that a current Internet draft sets forth a revised version of the architecture described in RFC 3513; the reader can keep abreast of the latest developments by visiting the IPv6 working group’s home page at www.ietf.org.) Address types include unicast, anycast, and multicast. We will not cover anycast or multicast addressing in any detail. However, we offer the following examples. To find a nearby subnet router, an IPv6 host can use a well-known anycast address. Predefined well-known multicast addresses include “all routers.” Note that IPv6 addresses have well-defined scopes, so “all routers” does not mean “all routers in the entire world.” In this regard, we also note that IPv4 defines broadcast addresses, but IPv6 does not.

The unicast address space is further subdivided into globally unique addresses, link-local addresses, and other categories that we do not enumerate. Usage of globally unique addresses is defined in [18]. The lower-order 64 bits are typically a globally unique interface ID whose format is defined by the Institute of Electrical and Electronics Engineers (IEEE). The higher-order 64 bits are ‘001’ followed by a 45-bit global routing prefix and a 16-bit subnet ID. In the so-called stateless autoconfiguration process [19], an IP node supplies the lower-order 64 bits of its IP address and obtains the higher-order 64 bits from the network. There is precedent for this type of approach. For example, the ATM Forum’s Integrated Local Management Interface (ILMI) specification [20] defines a similar scheme for configuring ATM nodes.

In the interest of privacy, there is a variant of the aforementioned autoconfiguration scheme in which the lower-order 64 bits are randomly generated. This variant
requires an additional capability to make sure that no two systems in the same domain are assigned the same lower-order 64 bits. In closing, we note that the IETF initially defined IPv6 site-local addresses. Site-local addressing was supposed to be IPv6’s version of IPv4 private addressing but has recently been deprecated. At the time of this writing, a consensus alternative to site-local addressing had not yet been defined.
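To make the bit layout of a Globally Unique unicast address concrete, the following sketch packs the fields described above into a single 128-bit value. The field values are invented for illustration and do not correspond to an allocated prefix:

```python
import ipaddress

def global_unicast(routing_prefix, subnet_id, interface_id):
    """Pack the fields of a Globally Unique IPv6 unicast address:
    the bits '001', a 45-bit global routing prefix, a 16-bit subnet
    ID, and a 64-bit Interface ID."""
    assert routing_prefix < 2**45 and subnet_id < 2**16 and interface_id < 2**64
    high64 = (0b001 << 61) | (routing_prefix << 16) | subnet_id
    return ipaddress.IPv6Address((high64 << 64) | interface_id)

# Invented field values (not an allocated prefix); the Interface ID
# resembles an IEEE EUI-64 identifier derived from a MAC address.
addr = global_unicast(routing_prefix=0x00001, subnet_id=0x0001,
                      interface_id=0x021B63FFFE123456)
```

In the privacy variant mentioned above, the final 64-bit argument would simply be drawn at random rather than derived from hardware.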
7.3.3 Uniform Resource Identifiers and Domain Name System
When users access resources in IP networks, there is usually a level of indirection involved. For example, although one can enter a “raw” IP address in a Web browser, it is far more common to enter a uniform resource identifier (URI) instead. URI syntax is defined in IETF RFC 2396 [21], which says that a URI is “a compact string of characters for identifying an abstract or physical resource.” Regarding the distinction between URI and the better-known term Uniform Resource Locator (URL), RFC 2396 says that a URI can be a locator, a name, or both. Moreover, “The term ‘uniform resource locator’ (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism...” [21]. We will use the terms URL and URI interchangeably.

To access a host across an IP network, a client must first obtain the host’s IP address. (This is normally transparent to the end user.) The domain name system (DNS [22, 23]) resolves host names (such as www.cingular.com, the host portion of a URI) to IP addresses. To use DNS, it is necessary to have the IP address of a DNS server; DNS server addresses are often entered manually when computers are configured for network access.

In summary, DNS implements bindings between names and IP addresses. URIs have the obvious mnemonic benefit (i.e., http://www.cingular.com is easier to remember than an IP address). A less obvious benefit is this: any changes to the IP address of Cingular Wireless’ Web server are transparent to Web users. As long as the appropriate DNS binding is kept up to date, users can reach the Web site.
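The binding indirection can be illustrated with a toy lookup table. This is a drastic simplification (real DNS distributes bindings across a hierarchy of name servers and caches, and the addresses below are illustrative), but it shows why updating the binding is transparent to users:

```python
def resolve(name, dns_table):
    """Toy name lookup: map a host name (the host portion of a URI)
    to an IP address via a table of bindings."""
    return dns_table[name]

# Hypothetical binding; real DNS distributes such bindings across a
# hierarchy of name servers and caches.
dns = {"www.cingular.com": "192.0.2.80"}
before = resolve("www.cingular.com", dns)

# The Web server moves to a new address; only the binding changes,
# not the URI that users enter.
dns["www.cingular.com"] = "198.51.100.80"
after = resolve("www.cingular.com", dns)
```

The second lookup succeeds without any change on the client’s part, which is exactly the transparency benefit described above.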
7.4 Security and AAA

7.4.1 Security
Security is a multifaceted subject. Within the IETF, there is no single security working group that has the last word on this subject. Rather, there is a security area in which there are some 21 active working groups. Moreover, every new RFC incorporates a “security considerations” section. We mention only two of the groups in the security area:

• The Transport Layer Security (TLS) working group has produced RFC 2246 [24], which defines the TLS protocol. TLS is widely deployed. For example, URIs that begin with “https:” refer to resources that run HTTP over TLS. (Often, such a URI will appear in a browser window when a secure transaction, such as entry of credit card data, takes place.)
• The IP Security Protocol (IPSec) working group has issued many RFCs, including a batch of 12 RFCs (numbered 2401–2412) in the fall of 1998. From that batch, we single out two overview documents [25, 26] along with the authentication header [27], the encapsulating security payload [28], and two specifications dealing with encryption keys [29, 30].

7.4.2 Authentication, Authorization, and Accounting
How do users gain access to services in IP networks? How do service providers collect information necessary to bill for services and otherwise monitor the use of their networks? These issues are usually lumped together under the heading “authentication, authorization, and accounting” (AAA). The most widely deployed AAA protocol is Remote Authentication Dial-In User Service (RADIUS). As the name suggests, this protocol was developed to fill a void in dial-up networking. This usage is well documented in RFC 2865 [31], the baseline RADIUS specification, and numerous other RFCs generated by the RADIUS working group (which is now concluded). RADIUS deployment has gone far beyond its initial milieu (it is the de facto AAA protocol for IP networks), but documentation of “extended” use cases is uneven. In many cases, a host that wants to gain access to an IP network must contact a DHCP server to obtain an IP address and must also successfully complete AAA procedures. Thus DHCP and RADIUS protocol entities often reside on the same server.

RADIUS does what it was designed to do, but it is now deployed in settings that expose its limitations. To overcome these limitations, IETF’s AAA working group has crafted a successor protocol called Diameter [32], which recently reached RFC status after many delays. Diameter has been a contentious issue in the IETF, as many people thought it would be better to develop an enhanced version of RADIUS rather than a new protocol. Diameter requires the use of a security scheme (such as TLS or IPSec), whereas RADIUS does not. RADIUS normally runs over UDP and does not specify congestion control functionality, whereas Diameter runs over SCTP or TCP (and can therefore benefit from the transport layer protocol’s congestion control capabilities). Moreover, Diameter makes explicit provisions for failover scenarios, whereas RADIUS does not.
Diameter specifies error message formats (whereas RADIUS does not) and specifies proxy behavior more completely than its predecessor. Moreover, Diameter defines three other kinds of “agents” (in addition to proxy agents): relay agents, redirection agents, and translation agents. Diameter maintains more state information than RADIUS; in particular, Diameter has notions of session state and transaction state. Diameter’s additional functionality does not come for free—it generates much more overhead than is necessary to run RADIUS. This is a major reason that industry support for Diameter has not been unanimous. At this point, RADIUS is still the dominant AAA protocol and it is not clear how soon Diameter will be widely deployed.
7.5 Routing

Many routing protocols incorporate optimization algorithms. Before discussing Internet routing protocols, it is helpful to look briefly at network optimization in general. In a nutshell, the point we want to make is this: ideally, one would like to determine routes by solving a multicommodity network flow problem. (We describe this class of problems in the next section.) But the difficulties of doing so in distributed fashion are substantial. In large networks, the difficulties are in fact prohibitive. So data network routing protocols are, of necessity, based on simpler optimization problems. Significant limitations result; we elaborate now.

7.5.1 Network Optimization
Network optimization refers to the family of optimization problems that can be posed in terms of graphs. A graph consists of nodes and interconnecting arcs. In our context, network elements such as voice switches and IP routers are the nodes and transmission links are the arcs. We will use the terms arc and link interchangeably. Network optimization is not limited to telecommunications. Transportation authorities and shipping companies apply optimization techniques to networks of roads, electric companies apply such techniques to power grids, and so on. As a result, the literature in this general subject area is vast.

In this section, we introduce three well-known categories of network optimization problems: shortest path, minimum cost spanning tree, and multicommodity network flow. In each of these categories, a cost of traversal is assigned to each link. The shortest path problem is the easiest to describe. In it, origin and destination nodes are given, and the objective is to find a minimum cost path from origin to destination. The cost of a path is the sum of the costs of its constituent arcs. A tree is a graph with the following property: for any pair of nodes, there is one and only one interconnecting path. The cost of a tree is the sum of the costs of its arcs. For a general graph, a spanning tree is simply a tree that contains all of the nodes in the original graph (and whose arcs are part of the original graph). With this terminology in hand, the objective of the minimum cost spanning tree problem is self-explanatory.

Shortest path algorithms usually solve simultaneously for shortest paths from a given originating node to all possible destinations. This can be done efficiently, as was first demonstrated by Dijkstra [33]. The union of the solution paths is a spanning tree. (Note, however, that it is generally not a minimum cost spanning tree.)

Network flow problems have additional structure. As a result, they are able to capture the relationship between demand and the resources necessary to satisfy that demand. Each link has a specified capacity. We are also given demand information: a number of units of some commodity that we must transport from origin to destination. This quantity is given separately for each origin-destination pair; commodities associated with different origin-destination pairs are not fungible (hence the term “multicommodity”). The objective is to satisfy demand at minimal cost. Note that the cost of traversing a link is a function of the total commodity flowing through that link (e.g., in the linear case it is proportional).
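For readers who want to see the shortest path computation in concrete form, here is a compact rendition of Dijkstra’s algorithm, run against a small topology whose node names and arc costs are invented for the example:

```python
import heapq

def dijkstra(graph, origin):
    """Single-source shortest paths (Dijkstra's algorithm). `graph`
    maps each node to a dict of {neighbor: arc cost}; returns the
    minimum path cost from `origin` to every reachable node."""
    dist = {origin: 0}
    heap = [(0, origin)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                      # stale heap entry; skip it
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

# Illustrative four-node topology; arc costs might be set inversely
# to link capacity, so the lower-capacity link costs 2.
net = {
    "Chicago": {"Atlanta": 1, "Philadelphia": 1},
    "Philadelphia": {"Chicago": 1, "Baltimore": 2},
    "Atlanta": {"Chicago": 1, "Baltimore": 1},
    "Baltimore": {"Philadelphia": 2, "Atlanta": 1},
}
dist = dijkstra(net, "Chicago")
```

The returned table gives the minimum cost from the origin to every reachable node; recording each node’s predecessor along the way would yield the spanning tree mentioned above.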
Objectives
The usefulness of an optimization model depends on how well its objective function reflects what we are trying to do. Objectives vary from problem to problem. A trucking firm might seek to minimize total mileage traveled, viewing this as a proxy for fuel consumption. In this case, the arc costs would be intercity mileages. A designer of enterprise data networks might seek to minimize some metric of congestion or delay. A telco might seek to minimize blocking probability, viewing this as roughly equivalent to maximizing the total number of calls carried or minutes of use. In this case, the objective would be a nonlinear function of the offered load.

Limitations of Optimization Models
Shortest path and minimum cost spanning tree problems are always feasible, so long as the underlying graph is connected (i.e., for any pair of nodes, an interconnecting path exists). The first thing to notice about network flow problems is that they are not always feasible: the demand may outstrip the arc capacities. One can also model congestion in the network flow context (although we must employ nonlinear arc costs to do so). So network flow formulations capture an essential feature that is not reflected at all in shortest path or spanning tree formulations.

In a real-world network, the state of the system evolves dynamically over time. For an airline, weather-induced delays at one airport disrupt the whole system. For a telecommunications carrier, traffic load is constantly changing. Traditional optimization models do not capture the dynamics of time-evolving systems very well. There are various approaches to ameliorating this basic difficulty; each approach has a different set of strengths and weaknesses.
7.5.2 Internet Routing Protocols
Let us now look at routing in IP networks, with a mind toward understanding the choices that have been made, the reasons for those choices, and the trade-offs that result. Rather than covering the protocols in detail, our intent is to give just enough information to support this goal. Simply stated, the philosophy behind Internet routing protocols is to adapt dynamically to changing network conditions while making as few assumptions as possible about traffic patterns. The crux of the problem is this: each IP router must make forwarding decisions for the packets it receives on the basis of very limited information regarding the state of the network as a whole.

Multicommodity network flow formulations are not tractable in large networks. Network flow algorithms do not lend themselves to distributed implementations. Moreover, it is not practical to solve a network flow problem in a centralized location and distribute the pertinent results to a large population of IP routers. In particular, updating numerous routing tables in accordance with changing network conditions becomes very problematic.
7.5.3 A Link State Protocol: OSPF
Most IP routing protocols are based on shortest path formulations. Open Shortest Path First (OSPF, [34]) is widely deployed and probably the best-known example.
OSPF messages carry so-called link state advertisements (LSAs). LSAs are originated by routers adjacent to the links in question and are propagated about an OSPF domain. Of the many types of information that can be carried in an LSA, we mention the following three:

• Adjacency information that allows receiving routers to determine the topologies of the networks they inhabit;
• Information that allows routers to determine whether an incoming LSA is newer than the current entry in its link state database;
• Link cost information. Link costs are normally set inversely proportional to link capacities.

This description is oversimplified but should serve to get the general idea across. Each OSPF router derives a view of the network topology from the LSAs it receives. It then runs a Dijkstra algorithm against this topology when it builds its routing table. So each OSPF router makes forwarding decisions based on its own routing calculations, and we can therefore say that OSPF routing is distributed. Ironically, OSPF’s shortest path calculation is not distributed: each router computes shortest paths against its own copy of the topology.
7.5.4 Distance Vector Protocols: RIP and BGP
Shortest path calculations can be done in distributed fashion. Routing Information Protocol (RIP [35–38]) takes this approach. Each RIP router builds a distance vector; that is, a roster of all other routers in the domain and the shortest-path distances to each of those routers. Distance vectors are exchanged among neighbors and thus propagate throughout the network. For a time, inaccurate distance vectors are circulating—routers initially only know of their immediate neighbors, for instance, and “shortest seen so far” distances are not truly optimal in general. Each router eventually reaches optimality, however. Distance vector protocol entities do not model network topology: for each destination, each router knows which of its neighbors is the optimal next hop, but nothing more. In large networks, RIP suffers from two major flaws: routing tables do not converge very quickly (see Section 7.5.5 for discussion), and routing information exchanges consume an immodest amount of transmission bandwidth. RIP predates OSPF; the latter was developed to overcome the shortcomings of RIP.

RIP and OSPF are known as interior gateway protocols. Exterior gateway protocols provide a means of distributing routing and reachability information among OSPF domains (as well as domains that run RIP or other IP routing protocols). Border Gateway Protocol (BGP [39]), a distance vector protocol, is most commonly employed for this task. Given the fact that operators do not wish to fully divulge their network topologies, it makes sense that BGP is not a link state protocol.
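The distance vector computation can be sketched as a synchronous iteration. This is a simplification (real RIP routers exchange messages asynchronously and apply timers and split horizon, none of which is modeled here), but it shows both the “shortest seen so far” behavior and the gradual convergence:

```python
def distance_vector_rounds(links, rounds):
    """Synchronous distance vector iteration, the computation at the
    heart of RIP stripped of timers and message formats. `links`
    maps each router to {neighbor: link cost}. In each round, every
    router recomputes its vector from the vectors its neighbors
    advertised in the previous round."""
    INF = float("inf")
    nodes = list(links)
    # Initially each router knows only about itself.
    vectors = {r: {d: (0 if d == r else INF) for d in nodes} for r in nodes}
    for _ in range(rounds):
        new = {}
        for r in nodes:
            new[r] = {}
            for d in nodes:
                via_neighbors = min(
                    (cost + vectors[nbr][d] for nbr, cost in links[r].items()),
                    default=INF)
                # "Shortest seen so far"; not yet optimal in general.
                new[r][d] = 0 if d == r else min(vectors[r][d], via_neighbors)
        vectors = new
    return vectors

# A line of four routers, A - B - C - D, each link of cost 1.
line = {"A": {"B": 1}, "B": {"A": 1, "C": 1},
        "C": {"B": 1, "D": 1}, "D": {"C": 1}}
after_one = distance_vector_rounds(line, 1)    # A knows nothing of C yet
after_three = distance_vector_rounds(line, 3)  # converged: A reaches D in 3
```

After one round, A still has no route to C at all; only after three rounds have the vectors converged, which hints at why convergence is slow in large networks.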
7.5.5 Routing Protocol Convergence
Each routing protocol must specify exactly what information is exchanged among routers and how that information is propagated. This aspect of protocol design is just as important as the associated shortest path algorithm itself.
Dynamic routing requires that routing information be updated (this is done in response to detected outages and also in the form of periodic refreshes). It is important to realize that full propagation of routing information does not happen instantaneously, especially in a large network. Thus, when a link outage occurs, some routers will be acting on outdated topology information longer than others. This results in transient routing loops.

A simple example is depicted in Figure 7.3. Suppose that the three links drawn with heavy lines have the same transmission capacity, and that the Philadelphia-Baltimore link has smaller capacity than these three. Then, under normal circumstances, the router in Chicago (or simply “Chicago” in what follows) will route Baltimore-bound traffic via Atlanta. In the figure, the Atlanta-Baltimore link has just gone down. Atlanta, which becomes aware of the outage before Chicago does, bounces Baltimore-bound packets back to Chicago, which in turn sends them to Atlanta, and so on.

The routing loop in our example will go away quickly. Let us suppose that the routers shown inhabit a single OSPF domain. Then Atlanta will tell its remaining neighbors of the outage by sending a link state advertisement (as will Baltimore). As soon as it becomes aware of the link outage, Chicago:

• Recomputes its routing table by running a Dijkstra algorithm against the updated topology;
• Forwards the advertisement to its neighbors.

When routers in a network are all acting on correct routing information, we say that their routing tables have converged. Note that:

• While they exist, routing loops can cause severe congestion on their component links.
• In large networks, routing information takes time to propagate. So routing loops can persist for long enough to cause trouble.
Figure 7.3 Simple routing loop example.
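The transient loop of Figure 7.3 is easy to reproduce with toy forwarding tables. In the first configuration below, Atlanta has reacted to the outage but Chicago has not, so packets ping-pong until their TTL expires; in the second, Chicago’s table has converged:

```python
def forward(tables, start, dest, ttl=8):
    """Follow next-hop forwarding tables from `start` toward `dest`,
    decrementing a TTL as real routers do. Returns the path taken."""
    path, node = [start], start
    while node != dest and ttl > 0:
        node = tables[node][dest]
        path.append(node)
        ttl -= 1
    return path

# Atlanta has reacted to the Atlanta-Baltimore outage; Chicago has not.
stale = {
    "Chicago": {"Baltimore": "Atlanta"},   # outdated: still prefers Atlanta
    "Atlanta": {"Baltimore": "Chicago"},   # updated: best route is back out
}
looping = forward(stale, "Chicago", "Baltimore")   # bounces until TTL expires

# After convergence, Chicago routes Baltimore-bound traffic via Philadelphia.
converged = {
    "Chicago": {"Baltimore": "Philadelphia"},
    "Philadelphia": {"Baltimore": "Baltimore"},
}
delivered = forward(converged, "Chicago", "Baltimore")
```

The TTL field in real IP headers serves exactly this purpose: it bounds the damage a loop can do to any single packet, though the looping traffic still congests the affected links.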
Why is OSPF generally preferable to RIP? When a link outage occurs, it is more efficient all the way around to say so explicitly. RIP does not have a good way to do this (although RIP does at least offer a way to mark a route as invalid).
7.5.6 Scalability
So far, we have assumed that each IP router knows about every other router in the network. But this can only be true in a limited way for reasons of scalability (when the number of nodes in a network is very large, every node would otherwise have to maintain a large routing table) and security (operators do not want to divulge their network topologies to hackers, or even to each other). Because of its scalability limitations, RIP is only deployed in private networks of modest size. OSPF supports the subdivision of networks into areas. Each router has full topology information about its own area but only summary information about other areas in the same network. In an OSPF domain, one area is designated as the backbone area; it is responsible for distributing routing information between areas. Even with the introduction of areas, OSPF has scalability limits. Security and scalability are the main reasons for the existence of exterior gateway protocols such as BGP.
7.5.7 Trade-offs

Load Balancing and Routing Hot Spots
Often there are equal-cost paths to the same destination. Let us return to the example of Figure 7.3, but now assume that the four links connecting Chicago, Atlanta, Baltimore, and Philadelphia all have the same capacity. We assume that links of the same capacity have the same cost of traversal. Since this assumption typically holds true, minimum cost routing often devolves to minimum hop-count routing. (The hop count should be roughly proportional to the amount of processing required to transmit packets across a given path, and so it is a very reasonable measure of cost.)

Since the two paths connecting Chicago and Baltimore have equal costs, how do we choose between them? Ideally, we would like to balance traffic between these two paths, but this is not as simple a matter as it might initially seem. One possible approach is to subdivide the block of “Baltimore” addresses, sending packets destined for one sub-block via Philadelphia and the other sub-block via Atlanta. However, such a scheme would not adapt to variations in sub-block traffic volumes. Note also that traffic bound for various destinations other than Baltimore traverses one or more of the links appearing in Figure 7.3, so concentrating purely on traffic to Baltimore may not yield a satisfactory result for those links.

IP networks tend to have bottleneck links (or routing hot spots). Even if there is diversity of equal-cost paths between traffic hubs, routing protocols do not tend to distribute traffic evenly among these paths. To be fair, load balancing schemes are used to advantage in IP networks. But we generally think of these schemes as blunt tools; they are insufficient for eliminating hot spots altogether.
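One widely used refinement of the address-splitting idea, sketched here for concreteness (it is not described in the text above, and the flow tuple is hypothetical), is per-flow hashing: each flow is deterministically hashed onto one of the equal-cost next hops, so the packets of a flow stay in order while distinct flows spread across the alternatives:

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow (e.g., source address, destination address, and
    ports) onto one of several equal-cost next hops. A deterministic
    hash keeps every packet of a flow on the same path."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[digest[0] % len(next_hops)]

# Two equal-cost next hops from Chicago toward Baltimore.
paths = ["Philadelphia", "Atlanta"]
flow = ("10.0.0.1", "10.1.0.9", 5060, 40000)   # hypothetical flow tuple
# Every packet of this flow maps to the same next hop:
choices = {pick_next_hop(flow, paths) for _ in range(100)}
```

Hashing balances only statistically; a few large flows can still land on the same path, which is one reason such schemes remain blunt tools.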
Affecting Routing by Adjusting Link Costs
If we adjust link costs, routing will be affected. For example, we might decide that the cost of traversal for a congested link should be high. There are two difficulties with effectively implementing this simple idea. The first is this: how would a router learn that a remote link (let us call it link L for definiteness) is congested? The endpoint(s) of link L would have to inform the rest of the network. Since congestion is a time-varying phenomenon, and it takes time for link state information to propagate through a routing domain, link L may no longer be overloaded by the time all routing tables have converged. There is also the question of frequency: how often should link status information be circulated? If updates are infrequent, routers are acting on outdated information. But frequent updates consume significant bandwidth that could otherwise be used for bearer traffic.

Stability is the second problem: in avoiding link L, routers may cause other link(s) to congest. When link updates describing the new network state propagate, the pendulum may swing back to the state in which link L was overloaded (in which case the cycle begins again).
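The stability problem can be caricatured in a few lines. Assume two parallel links, congestion-based costs, and routing updates that take effect once per interval (all numbers invented for illustration):

```python
def simulate_cost_flapping(rounds, demand=12.0, capacity=10.0):
    """Caricature of congestion-based link costs: two parallel links,
    and in each update interval every router converges on whichever
    link currently advertises the lower cost. The chosen link then
    congests and raises its advertised cost, so traffic swings back.
    Returns the link chosen in each interval."""
    cost = {"L": 1.0, "M": 1.0}
    history = []
    for _ in range(rounds):
        choice = min(cost, key=cost.get)   # all traffic takes the cheap link
        history.append(choice)
        for link in cost:
            congested = (link == choice and demand > capacity)
            cost[link] = 10.0 if congested else 1.0
    return history

hist = simulate_cost_flapping(6)   # traffic swings back and forth forever
```

Traffic oscillates between the links indefinitely; damping such oscillations is a classic difficulty of congestion-sensitive routing metrics.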
7.6 Reachability Information

In the example of the previous two sections, all of the nodes are IP routers. However, end users want to access hosts (such as Web servers and e-mail servers); routers are just transit points along the way. To facilitate communication between hosts, routers advertise reachability to blocks of IP addresses. We will not cover this aspect of routing protocols in any detail. But to give the reader a rough idea of what we mean, let us pretend for a moment that routing in legacy telephone networks was entirely analogous to routing in IP networks (of course, this is not the reality). Then one or more switches in Chicago would advertise reachability to the 312 area code, switch(es) in Atlanta would advertise reachability to the 404 area code, and so on. Such an advertisement would, in the first case, essentially boil down to saying: “if you want to call anyone in the 312 area code, route the call to me and I can handle it from there.” Reachability advertisements would be circulated throughout the network so that calls could be properly routed regardless of their origination points.
7.7 Quality of Service and Statistical Multiplexing

We believe that quality of service (QoS) in IP networks faces two main implementation hurdles:

• Statistical multiplexing will “take a hit.” We explain this statement in the next section.
• Increased control-plane complexity will be necessary.
In addition to the technical aspects of these problems, there has been some reluctance from a philosophical point of view. In the IETF, there has traditionally been a distaste for complex control-plane signaling. To emphasize this point, we
could use the word “heavyweight”: one might imagine a network that is bogged down by a ponderous control “superstructure.” Can a robust, scalable QoS implementation be developed with a lightweight approach to control? This question is so general as to be almost rhetorical, but it may be worthwhile to keep in mind as we embark on our discussion of IP QoS.

Let us imagine a conversation between a data-networking expert and a circuit-switching expert. Both have read an article that hypes convergence between the data networking and voice domains; they are discussing the article. The circuit-switching expert might very well say: “Whatever you do, don’t mess up my quality of service.” The data-networking expert’s retort might be: “It’s fine with me if you want to carry voice on my network, as long as you don’t mess up my statistical multiplexing.”

In our current context, QoS will mean guarantees on bit rate, latency, and jitter. Voice is a delay-sensitive application, so the importance of latency is easy to understand. Not only do digitized voice samples need to make it across the network quickly, they also need to arrive at the decoder at very regular intervals (low jitter). Bit rate guarantees are necessary to make sure that voice samples are delivered in the first place (rather than dropped at a congestion point somewhere along the way).

Historically, developments in the data networking domain have not been much concerned with QoS. This makes sense, as traditional data networking applications such as e-mail and Web access are not particularly delay-sensitive. The “classical” data networking protocol suite (particularly TCP/IP) is, however, designed to exploit statistical multiplexing to the fullest. We say that traditional data networking employs a best-effort service model. We have seen that TCP tries to make the best use of available transmission capacity; if packets get lost along the way, it resends them and adjusts its flow parameters accordingly.
Thus we could say that reliability (in the form of TCP retransmissions) is implemented at layer 4. The IP QoS framework that we discuss later in this section represents a real sea change for the Internet community—we will see that QoS is implemented at layer 3 and below.
7.7.1 What Is Statistical Multiplexing?
Data traffic tends to be bursty. Suppose several computer users are sharing the same link to the Internet; all are periodically generating bursts of traffic as they download web content. The main idea of statistical multiplexing is this: the users’ traffic bursts are likely to happen at different times and are therefore unlikely to interfere with one another. Of course, if the number of active users sharing a limited amount of bandwidth is extremely large, congestion will result. But the number of users that can be supported without chronic congestion is surprisingly large. The reason is that bursts are interspersed with periods of inactivity; during periods of inactivity, users consume essentially no bandwidth. Telephone networks allocate a constant bit rate to each call (throughout the life of the call); whenever either participant wants to speak, the bandwidth is there. By now it is probably clear that QoS and statistical multiplexing are competing objectives. To deliver QoS, one must relinquish a certain amount of statistical
multiplexing. That is, suppose we want to implement packet telephony on a large scale and support carrier-grade voice quality in the bargain. Then it is not possible to achieve the same degree of statistical multiplexing that is now de rigueur in best-effort data networking.
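A small Monte Carlo experiment makes the point about bursty sources quantitative (the activity probability, capacities, and user counts are invented for illustration):

```python
import random

def overload_fraction(users, capacity, p_active=0.1, trials=20000, seed=7):
    """Monte Carlo sketch of statistical multiplexing: `users` bursty
    sources share a link of `capacity` bandwidth units; each source
    is bursting (consuming one unit) with probability p_active at
    any sampled instant. Returns the fraction of sampled instants
    at which offered traffic exceeds capacity."""
    rng = random.Random(seed)
    overloads = 0
    for _ in range(trials):
        active = sum(rng.random() < p_active for _ in range(users))
        if active > capacity:
            overloads += 1
    return overloads / trials

# Peak-rate allocation would cap a 10-unit link at 10 users; with
# bursty sources, 50 users share the same link with rare congestion.
p = overload_fraction(users=50, capacity=10)
```

For these parameters, congestion occurs at only a small fraction of the sampled instants, even though five times as many users share the link as peak-rate allocation would permit. Guaranteeing a constant bit rate to each of the 50 users, telephone-style, would require five times the capacity; this is the multiplexing that QoS guarantees give up.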
7.7.2 Differentiated Services
Preliminary Note on Terminology

The usage of the Type of Service octet in the IPv4 header has evolved over time, and the terminology has changed. Along the way, RFC 2474 [40] defined the meaning of this header field, essentially renaming it the DiffServ (or DS) field. RFC 2474 attached the same nomenclature to IPv6’s Traffic Class header field. DiffServ only uses the six most significant bits of the octet in question, however; RFC 3260 [41] sets the record straight by formally defining these six bits as the DS field. (Meanwhile, RFC 3168 [42] assigned the remaining two bits to Explicit Congestion Notification.)
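The resulting bit layout is simple to express in code. The field boundaries below follow RFC 2474/3260/3168 as just described; the example octet carries the DSCP value 46, which is the value recommended for Expedited Forwarding:

```python
def split_tos_octet(octet):
    """Split the old Type of Service / Traffic Class octet: the six
    most significant bits are the DSCP (RFC 2474 / RFC 3260), and
    the two least significant bits are Explicit Congestion
    Notification (RFC 3168)."""
    return (octet >> 2) & 0x3F, octet & 0x03

# 0b101110 (decimal 46) is the DSCP recommended for Expedited
# Forwarding; here it rides in an octet with the ECN bits clear.
dscp, ecn = split_tos_octet(0b10111000)
```

Routers that predate DiffServ and still interpret the whole octet as Type of Service will see different semantics in the same bits, which is precisely why the renaming RFCs were needed.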
The DiffServ Architecture

Most discussions of QoS in IP networks mention DiffServ somewhere along the way. In the DiffServ approach, complex packet classification and traffic conditioning functions (such as shaping and policing) only need to be implemented at network boundary nodes. In contrast, each intermediate node along a given traffic stream’s path is only required to perform comparatively simple functions in handling that traffic. This approach is taken so that scalability is not compromised; it is also consistent with the general preference, described at the beginning of our QoS discussion, for lightweight control. DiffServ RFCs include [40, 43, 44]; for a complete rundown, one can consult IETF’s DiffServ working group (which is now concluded).

The “comparatively simple functions at intermediate nodes” mentioned previously are called per-hop behaviors (PHBs). A PHB defines a means of allocating buffer and bandwidth resources at each node among the traffic streams that compete for these resources. PHBs are selected using the DiffServ Code Point (DSCP). A DSCP is six bits long and is encoded in the aforementioned DS field in the IP header. The DiffServ working group issued standards-track RFCs defining two classes of PHBs:

• Expedited Forwarding (EF) provides the ability to configure a bit rate (R, say) and guarantee that high-priority packets (i.e., those with DSCPs indicating that they should receive EF treatment) are served at an aggregate rate of at least R. Details appear in [45–47].
• Assured Forwarding (AF) defines four classes of packets, providing a means of dividing buffer and bandwidth resources among those four classes. The relative importance of packets within the same class can be distinguished by means of the so-called “drop precedence” value. See [48] for details.
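As an example of the traffic conditioning performed at boundary nodes, here is a minimal token bucket policer. The rate and bucket depth are invented for illustration; a real DiffServ meter (e.g., the single rate three color marker of RFC 2697) tracks multiple buckets and re-marks packets rather than simply dropping them:

```python
def token_bucket(arrivals, rate, depth):
    """Minimal token bucket policer. `arrivals` is a list of
    (time, packet_size) pairs in nondecreasing time order. Tokens
    accumulate at `rate` per unit time, capped at `depth`; a packet
    conforms if enough tokens are on hand when it arrives. Returns
    one conform/nonconform verdict per packet."""
    tokens, last = depth, 0.0
    verdicts = []
    for t, size in arrivals:
        tokens = min(depth, tokens + (t - last) * rate)
        last = t
        if size <= tokens:
            tokens -= size
            verdicts.append(True)    # conforming: forward as marked
        else:
            verdicts.append(False)   # nonconforming: drop or re-mark
    return verdicts

# A burst of three 500-byte packets at t=0 against a 1,000 byte/s
# rate and a 1,000-byte bucket; a fourth packet arrives at t=1.
v = token_bucket([(0.0, 500), (0.0, 500), (0.0, 500), (1.0, 500)],
                 rate=1000, depth=1000)
```

The third packet exceeds the configured burst and is caught at the boundary; interior nodes never see traffic beyond the contracted profile, which is what keeps their per-hop behaviors simple.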
Although it may not be obvious from the names, Expedited Forwarding is much more stringent than Assured Forwarding. The former is oriented towards “hard” QoS guarantees, whereas the latter is not. Of the two, Assured Forwarding is much more widely used.
The DiffServ framework allows for other classes of PHBs to be defined in the future, and anticipates that service providers may want to tailor the DSCP-to-PHB mapping to fit a wide variety of traffic requirements. This aspect of DiffServ is still evolving.

Before DiffServ, There Was IntServ
The so-called Integrated Services (IntServ) model emerged from early work on IP QoS. In the IntServ approach, resources (e.g., buffer and bandwidth) can be explicitly allocated to specific packet streams, or flows. Indeed, one of the stated assumptions in the architectural overview document [49] is that such explicit resource management is necessary to meet the requirements of real-time applications. The Resource ReSerVation Protocol (RSVP) specification [50] was later proposed as the resource management mechanism [51]. The mentality of the latter RFC is, at least in part, that individual applications can signal their requirements. To many in the Internet community, this smacked of a heavyweight control plane; as a result, IntServ as specified in the RFCs above has received a lukewarm reception.
7.7.3 Multiprotocol Label Switching
Let us first discuss the original motivation for multiprotocol label switching (MPLS). “Wide area” connections between IP routers usually traverse layer 2 switches and are typically static. Such static connections are manually provisioned using management software that has no awareness of layer 3. (Recall that IP routers base their forwarding decisions on the contents of IP headers; we say that they operate at layer 3. Note that it would be very expensive to deploy a router in place of each intermediate layer 2 switch. Moreover, latency would increase.) This raises the question: is it possible to automate the process of interconnecting routers? A wide area connection is necessary whenever the routers to be linked are too distant to reside on the same local area network (so the “layer 2 interconnect” issue has come up repeatedly). The question of how to approach the aforementioned automation, especially for ATM at layer 2, was a topic of intense debate in the mid-1990s.

The MPLS philosophy is that routing for layer 2 interconnections should be informed by the intelligence already present in IP routing protocols such as OSPF and BGP. MPLS eventually established itself as the frontrunner among competing “IP over ATM” schemes. It took time for the MPLS specifications to stabilize; early in 2001, the IETF formally released a batch of MPLS RFCs. We think of [52–54] as the “foundational” RFCs; RFCs [55–59], which were released at the same time, nail down details (particularly for ATM and Frame Relay). ATM and Frame Relay are today’s predominant layer 2 technologies in wide area networks; they are briefly described in Appendixes A.3 and A.2, respectively. Ethernet is making inroads into this market, but Ethernet-MPLS interworking is not standardized as of this writing; see Section A.4 in the appendix.

In summary, MPLS sets up paths at layer 2; these are called label switched paths (LSPs). On the one hand, LSP routes are determined by Internet routing protocols.
On the other hand, IP packet headers are examined only at LSP ingress and egress
nodes. (That is, a node in the “interior” of an LSP can forward the associated traffic stream without going to the effort of examining IP headers.)

MPLS and IP QoS
What does all of this have to do with QoS in IP networks? First, resources (e.g., bandwidth) and even routes can be explicitly assigned to LSPs. Two proposed schemes incorporate such functionality into the MPLS framework. One approach [60, 61] is based on RSVP (see the IntServ discussion in Section 7.7.2); the second approach [62–64] is quite different. Second, incoming packet streams can be mapped to LSPs based on their QoS requirements. As an illustration, different traffic classes headed to the same destination might be assigned to different LSPs. As one might expect, work in this direction seeks to harmonize MPLS with DiffServ [65]. The topics mentioned in this section are quite immature: DiffServ and MPLS are both relatively young at the time of this writing, so it is natural to expect that their “confluence” is quite early in its maturation cycle.

7.7.4  “DiffServ at the Edge, MPLS in the Core”
To the degree that there is a consensus approach to IP QoS, it can be summed up by the phrase “DiffServ at the edge, MPLS in the core.” Roughly speaking, we can envision this in the following way:

• DiffServ marking is performed on packets entering a given IP domain. This means that the DiffServ/Traffic Class field in each IP packet header is set to a value that reflects the application’s QoS requirements.

• On relatively low-capacity links near the edges of the domain, IP routers implement appropriate per hop behaviors for each class of traffic.

• LSPs will traverse high-capacity links in the core of the network. Such an LSP will aggregate traffic from many sessions having similar QoS requirements. More specifically, the traffic is aggregated at LSP ingress, transported through the core, and deaggregated at LSP egress.

We will return to this topic when we discuss traffic engineering in Section 15.1.

7.7.5  Multiservice Networks
For a number of years, the IP networking faithful (and ATM boosters before them) have touted the promise of voice, video, and data over the same network. It is a worthy goal, and the idea of offering a rich variety of revenue-generating services over a single backbone is a compelling one. In the short and medium terms, however, we believe it will be very hard to live up to the hype. Let us qualify this statement. In some settings (e.g., corporate campuses), we expect to see steady progress toward multiservice ideals. However, we also believe that the “carrier-grade voice” sphere (that is, the realm now inhabited by telcos) will
continue to exist. In telco backbone networks, it will be years before packet voice reaches a scale comparable to that of circuit-switched voice. So the first point is that telco backbone networks will evolve slowly toward packet voice. Our second point is that, in the early phases of this evolution, backbone network elements will be dedicated to packet voice. In our mind, the fundamental reason for this is that packet voice will be compared with circuit-switched voice; if packet voice is implemented in a way that is markedly inferior to its older sibling, many people will stick with the latter. We believe it is only a slight oversimplification to say that:

• Network elements and management systems that are able to deliver carrier-grade voice will be very expensive.

• For a long time, it will not make sense to “throw this kind of money” at traditional data networking applications.

There is another key reason that the promise of full-blown multiservice networking is a long way off: security. That is, a true multiservice network is a very different security environment than a traditional telco network.
7.8  Layer 4 Protocols: Suitability to Task

What sits on top of IP? We have already looked at TCP. But TCP takes a back seat in the realm of IP telephony. To lay the foundation for our discussion of the bearer and control planes in subsequent chapters, we now introduce layer 4 protocols that play an important role there.

7.8.1  UDP
If TCP is available, why would anyone want to use UDP? As remarkable as its success has been, TCP is not suited to every task. TCP retransmits packets when it concludes that they did not reach their destinations. For a real-time application, there is no point in retransmitting a stale packet; it is just a waste of transmission bandwidth. Similarly, TCP’s flow control and reordering mechanisms are also of limited usefulness for real-time applications, which want to deliver packets periodically to destination hosts (rather than achieve bulk data transfers as quickly as possible).

As shown in Figure 7.4, the UDP header is very simple. As is the case with TCP, applications may be multiplexed atop a single UDP protocol entity (think, for example, of voice and video streams emanating from the same host). The Source and Destination Port numbers in the UDP header are used to distinguish the applications. The Length header field is self-explanatory and the Checksum header field is used to detect corrupted packets. There is nothing more to UDP—the defining RFC [66] is only three pages long. Recall that QoS is managed at layer 3 and below for real-time services; UDP is suitable for these services because it adds as little overhead as possible. Note that UDP also has some uses in traditional data networking; for example, it is often used for domain name system queries. In such cases the DNS client (rather than TCP) is responsible for repeating queries that go unanswered.
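Because the header is just four fixed 16-bit fields, it can be built and parsed in a few lines. The sketch below packs and unpacks the layout of Figure 7.4; the function names and the sample port/payload values are our own illustrations, not part of RFC 768:

```python
import struct

def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    """Pack the four 16-bit UDP header fields of Figure 7.4.

    Length covers the 8-byte header plus the payload. A checksum of 0
    means "not computed," which IPv4 permits.
    """
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, checksum)

def parse_udp_header(data):
    """Unpack the first 8 bytes of a UDP datagram into its header fields."""
    src, dst, length, csum = struct.unpack("!HHHH", data[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": csum}

# Hypothetical voice packet: 160 bytes of payload between two media ports.
hdr = build_udp_header(5004, 5004, payload_len=160)
print(parse_udp_header(hdr))
```

Note that network byte order (the `!` in the format string) applies to all of the header fields, as it does throughout the IP protocol suite.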
Figure 7.4  UDP header:

    Source port (16 bits)    | Destination port (16 bits)
    Length (16 bits)         | Checksum (16 bits)

7.8.2  Carrying SS7 Traffic over an IP Network: SCTP
We have seen that SS7 is a packet technology; as IP networks proliferate, it is quite natural to think of carrying SS7 messages over IP. This is especially true given that SS7 traffic volumes are still growing steadily—it is difficult to accommodate this growth with traditional 56 kbit/s SS7 links. However, SS7 places unique requirements on the underlying IP network. Neither TCP nor UDP is ideal for satisfying these requirements, so the IETF’s Signaling Transport (sigtran) working group came up with SCTP [67] to fill the void. General information about SCTP can be found in [68, 69].

All SCTP packets begin with a common header; see Figure 7.5. The Source and Destination Ports play the same role as for TCP and UDP, as does the Checksum field. Note, however, that the latter is twice as long as that of TCP and UDP. Moreover, the checksum computation specified in [67] was later replaced by a more robust scheme; see [70]. The receiver of an SCTP packet uses the Verification Tag to verify the identity of the sender.

Like TCP (but unlike UDP), SCTP is a reliable transport protocol. It can provide sequenced delivery of messages within multiple streams and can bundle multiple messages into a single packet. One of the most important features of SCTP is its support of multihoming: SS7 networks are traditionally held to a very high standard of reliability, and thus fault tolerance is a must. We will talk about multihoming in Chapter 8. Also included in the design of SCTP are congestion avoidance behavior (similar to that of TCP) and measures to resist flooding and “spoofing” attacks. All of this is achieved by means of SCTP associations. (Readers familiar with TCP sockets can think of an SCTP association as a sort of generalized TCP socket.) The SCTP packet header is followed by a variable number of chunks, each with its own header.
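As a small illustration of the layout in Figure 7.5, the 12-byte common header can be unpacked in one line of Python. The function name and the sample values below are ours; the checksum shown is a placeholder, not a value computed per [70]:

```python
import struct

def parse_sctp_common_header(data):
    """Unpack the 12-byte SCTP common header of Figure 7.5.

    Ports are 16 bits (as in TCP and UDP); the verification tag and the
    checksum are each 32 bits. Note that the checksum is twice the width
    of its TCP/UDP counterpart.
    """
    src, dst, vtag, csum = struct.unpack("!HHII", data[:12])
    return {"src_port": src, "dst_port": dst,
            "verification_tag": vtag, "checksum": csum}

# Illustrative packet: arbitrary ports, an arbitrary tag, checksum left at 0.
raw = struct.pack("!HHII", 2905, 2905, 0xDEADBEEF, 0)
print(parse_sctp_common_header(raw))
```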
(Note the similarity of this concept to the design of IPv6, in which a simple header can be followed by a variety of extension headers, depending on the situation.) There are several types of chunks, most of which are used to set up, tear down, maintain, and control associations between SCTP entities. Each chunk type has its own header format.

Figure 7.5  SCTP common header:

    Source port (16 bits)    | Destination port (16 bits)
    Verification tag (32 bits)
    Checksum (32 bits)

The SCTP data chunk format is used to encapsulate the user’s data for transport through the underlying IP network; this is the only format that we will examine directly. The data chunk header format appears in Figure 7.6. The value 0 in the Type field indicates that this is a data chunk. If the U (unordered) bit is set, then the receiving SCTP entity must pass the packet to the upper layer without any attempt at reordering. (Thus, unlike TCP, the SCTP user can selectively choose not to restore the original transmission order of incoming packets.) The B and E bits, if set, indicate the beginning and ending fragments of a user message, respectively. (In the case of an unfragmented message, both bits are set.) The Length field is self-explanatory. The remaining header fields are interpreted as follows:
• Transmission sequence number (TSN): SCTP assigns a TSN to each piece of data that it transmits (whether it be an entire message or a fragment thereof), independently of the stream sequence number. All TSNs are acknowledged by the receiving end; if the transmitter does not receive an ACK, it eventually retransmits. For the fragments of a segmented user message, TSNs must be in strict sequence.

• Stream identifier: See description of payload.

• Stream sequence number: This must be the same for each fragment of a segmented user message.

• Payload protocol: This represents the higher-layer application; it is not used by SCTP itself.

The payload that follows the data chunk header is (all or part of) message number n within stream S, where n is the stream sequence number in the header and S is the stream ID. By keeping track of multiple streams, SCTP implements another layer of multiplexing (above the source and destination port numbers, that is). If one stream is blocked because a packet arrived out of order (i.e., an earlier packet in the stream has not yet arrived) and its U bit is not set (so it is an ordered stream), packets from other streams can still be proffered to the higher-layer protocol.

7.8.3  Comparing and Contrasting TCP with UDP and SCTP
Figure 7.6  SCTP data chunk header:

    Type = 0 (8 bits) | Reserved | U | B | E | Length (16 bits)
    Transmission sequence number (32 bits)
    Stream ID (16 bits)    | Stream sequence number (16 bits)
    Payload protocol ID (32 bits)

UDP, TCP, and SCTP all multiplex using port numbers; they all employ checksums to recognize corrupted packets. This is all UDP does. TCP and SCTP offer reliable data transport by retransmitting lost packets, and they implement similar congestion avoidance schemes. TCP maintains strict ordering (i.e., it always delivers packets to the higher layer in the order in which they were transmitted). TCP is aggressive in the following sense: it increases its flow rate until it detects packet losses. SCTP’s flow control scheme is a superset of TCP’s flow control scheme. SCTP can be configured so that it does not seek to saturate the available transmission capacity a la TCP. Moreover, SCTP multiplexes streams; strict ordering can be enabled or disabled on a stream-by-stream basis.
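The per-stream delivery behavior described above can be sketched as follows. This is a toy model with invented names, not an implementation of RFC 2960; it shows how a gap in one ordered stream delays only that stream, while unordered chunks and other streams are delivered immediately:

```python
from collections import defaultdict

class StreamDemuxer:
    """Toy model of SCTP-style per-stream delivery to the upper layer."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # expected stream sequence number
        self.pending = defaultdict(dict)   # stream id -> {ssn: payload}

    def receive(self, stream_id, ssn, payload, unordered=False):
        """Return the payloads that become deliverable to the upper layer."""
        if unordered:                      # U bit set: deliver immediately
            return [payload]
        self.pending[stream_id][ssn] = payload
        delivered = []
        # Deliver in order for this stream only; other streams are unaffected.
        while self.next_ssn[stream_id] in self.pending[stream_id]:
            delivered.append(self.pending[stream_id].pop(self.next_ssn[stream_id]))
            self.next_ssn[stream_id] += 1
        return delivered

d = StreamDemuxer()
print(d.receive(0, 1, "b"))  # stream 0 blocked: SSN 0 has not arrived -> []
print(d.receive(1, 0, "x"))  # stream 1 is unaffected -> ['x']
print(d.receive(0, 0, "a"))  # gap filled -> ['a', 'b']
```

Contrast this with TCP, where a single sequence-number gap blocks delivery of everything behind it.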
7.9  Mobile IP

With the advent of third-generation wireless networks and wireless LANs, IP hosts can no longer be expected to stay in the same place. Mobile IP [71] provides a means for mobile nodes to dynamically change their points of attachment to IP networks. Mobile nodes have two kinds of IP addresses: home addresses and care-of addresses. Care-of addresses are registered with so-called home agents. Packets sent to a host’s home address are tunneled (by the host’s home agent) to the appropriate care-of address.
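A toy model of home-agent forwarding may make this concrete. All names and the dictionary-based packet representation below are invented for illustration; real Mobile IP uses registration messages and IP-in-IP encapsulation as specified in [71]:

```python
# Hypothetical sketch of a home agent's binding table and tunneling step.
bindings = {}  # home address -> currently registered care-of address

def register(home_addr, care_of_addr):
    """Record a mobile node's current point of attachment."""
    bindings[home_addr] = care_of_addr

def forward(packet):
    """Packets addressed to a registered home address are tunneled
    (encapsulated and readdressed) to the care-of address; all other
    packets pass through unchanged."""
    care_of = bindings.get(packet["dst"])
    if care_of is None:
        return packet
    return {"dst": care_of, "inner": packet}  # encapsulated packet

register("10.0.0.7", "192.0.2.99")  # mobile node roams to a visited network
print(forward({"dst": "10.0.0.7", "data": "hello"}))
```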
7.10  Summary

For the first several sections of this chapter, we talked about IP networking in general. Starting with the section on IP QoS and statistical multiplexing, we discussed changes that are coming about in IP networking to accommodate full duplex voice and other real-time services. We highlight the following points from that discussion:

1. QoS and statistical multiplexing are conflicting goals. Traditional IP networking seeks to maximize the latter without giving much thought to the former. QoS will “cost” something in terms of reduced statistical multiplexing.

2. Reliable transport of packets is traditionally the responsibility of layer 4 (embodied in TCP). Moreover, TCP’s flow control mechanism attempts to adjust to the available transmission capacity. But TCP’s capabilities are limited by the fact that it cannot see or directly harness the resources that reside below the IP layer. As a result, TCP is not the right tool to provide adequate QoS for real-time services.

3. Therefore, TCP is not a central protocol in packet telephony. Instead, UDP (for bearer traffic) and SCTP (for signaling traffic) are the crucial layer 4 protocols.

4. IP QoS initiatives such as DiffServ and MPLS attempt to harness underlying resources (that is, transmission bandwidth, buffering, and switching resources) that reside below layer 3.

It is important to note that items 1 and 4 are expensive. It is tempting to measure the costs associated with “mainline” data networking and compare them with the
costs associated with legacy telephone networks. Such a comparison is ultimately misleading, however, for the simple reason that packet telephony will cost more than traditional data networking. Telephone equipment manufacturing has traditionally been a specialized, high-margin business. One way or another, that will probably change. The point we are trying to make, however, is that early generations of carrier-grade packet telephony “gear” will also be specialized and command premium prices.

7.10.1  Further Reading
Our coverage of IPv6 has been minimal. The list of IETF standards that had to be adapted to work with IPv6 is far too long to present here. Throughout the remainder of this book, the reader should assume that “IP” means “IPv4” unless explicitly noted otherwise. Regarding our discussion of (and bibliographic references to) protocols related to IP, the reader should not assume compatibility with IPv6. There is more discussion of IPv6 in Section 15.6.2. For in-depth coverage, one needs to follow the pointers given there and/or consult an expository reference such as Hagen’s book [72]. We also found several informative tutorials via simple-minded Web searches.

We mentioned ICMP [9, 12] in the context of path MTU discovery. ICMP is used for numerous other purposes, although we do not cover them in this book.

DNS is another subject area that “got cheated.” Many subsequent RFCs have updated RFCs 1034 [22] and 1035 [23] (the two documents we referenced in Section 7.3.3). Perhaps the easiest way to obtain the details is to go to www.rfc-editor.org, follow the “RFC search” link, and search on “dns.” Some of the RFCs listed there came out of the DNS Extensions (dnsext) working group, which is still active at the time of this writing.

For comprehensive coverage of topics in network optimization, we recommend the fine book by Ahuja, Magnanti, and Orlin [73]. This is not a telecommunications book, however; the book by Bertsekas [74] is more directly pertinent to telecommunications and also adds a control theory flavor. We have noted that simple-minded distance vector protocols suffer from severe scalability limitations; various schemes for enhancing scalability, including that specified in [75], are widely deployed. Halabi and McPherson’s book [76] discusses BGP in considerable detail; the authors also give background on CIDR and other topics. John Moy, the main author of the OSPF specification, has written a book on the subject [77]. In their data networking book, Bertsekas and Gallager [78] present a useful discussion of traffic modeling.
References

[1] Postel, J., RFC 791, Internet Protocol, IETF, September 1981. Part of IETF STD 5.
[2] Deering, S., and R. Hinden, RFC 1883, Internet Protocol, Version 6 (IPv6) Specification, IETF, December 1995.
[3] Deering, S., and R. Hinden, RFC 2460, Internet Protocol, Version 6 (IPv6) Specification, IETF, December 1998.
[4] Postel, J., Internetwork Protocol Specification—Version 4, IEN-41, June 1978.
[5] Postel, J., DOD Standard Internet Protocol, IEN-123, December 1979.
[6] Topolcic, C., RFC 1190, Experimental Internet Stream Protocol, Version 2, IETF, October 1990.
[7] Delgrossi, L., and L. Berger, RFC 1819, Internet Stream Protocol Version 2 (ST2) Protocol Specification—Version ST2+, IETF, August 1995.
[8] Mogul, J., and S. Deering, RFC 1191, Path MTU Discovery, IETF, November 1990.
[9] Postel, J., RFC 792, Internet Control Message Protocol, IETF, September 1981. Part of IETF STD 5.
[10] Rajahalme, J., et al., RFC 3697, IPv6 Flow Label Specification, IETF, March 2004.
[11] McCann, J., S. Deering, and J. Mogul, RFC 1981, Path MTU Discovery for IP Version 6, IETF, August 1996.
[12] Conta, A., and S. Deering, RFC 2463, Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification, IETF, December 1998.
[13] Rekhter, Y., and T. Li, RFC 1518, An Architecture for IP Address Allocation with CIDR, IETF, September 1993.
[14] Fuller, V., et al., RFC 1519, Classless Inter-Domain Routing (CIDR): An Address Assignment and Aggregation Strategy, IETF, September 1993.
[15] Rekhter, Y., et al., RFC 1918, Address Allocation for Private Internets, IETF, February 1996.
[16] Srisuresh, P., and K. Egevang, RFC 3022, Traditional IP Network Address Translator (Traditional NAT), IETF, January 2001.
[17] Hinden, R., and S. Deering, RFC 3513, IP Version 6 Addressing Architecture, IETF, April 2003.
[18] Hinden, R., S. Deering, and E. Nordmark, RFC 3587, IPv6 Global Unicast Address Format, IETF, August 2003.
[19] Thomson, S., and T. Narten, RFC 2462, IPv6 Stateless Address Autoconfiguration, IETF, December 1998.
[20] ATM Forum Technical Committee, af-ilmi-0065.000, Integrated Local Management Interface (ILMI) Specification Version 4.0, ATM Forum, September 1996.
[21] Berners-Lee, T., R. Fielding, and L. Masinter, RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, IETF, August 1998.
[22] Mockapetris, P. V., RFC 1034, Domain Names—Concepts and Facilities, IETF, November 1987. Part of IETF STD 13.
[23] Mockapetris, P. V., RFC 1035, Domain Names—Implementation and Specification, IETF, November 1987. Part of IETF STD 13.
[24] Dierks, T., and C. Allen, RFC 2246, The TLS Protocol Version 1.0, IETF, January 1999.
[25] Kent, S., and R. Atkinson, RFC 2401, Security Architecture for the Internet Protocol, IETF, November 1998.
[26] Thayer, R., N. Doraswamy, and R. Glenn, RFC 2411, IP Security Document Roadmap, IETF, November 1998.
[27] Kent, S., and R. Atkinson, RFC 2402, IP Authentication Header, IETF, November 1998.
[28] Kent, S., and R. Atkinson, RFC 2406, IP Encapsulating Security Payload (ESP), IETF, November 1998.
[29] Maughan, D., et al., RFC 2408, Internet Security Association and Key Management Protocol (ISAKMP), IETF, November 1998.
[30] Harkins, D., and D. Carrel, RFC 2409, The Internet Key Exchange (IKE), IETF, November 1998.
[31] Rigney, C., RFC 2865, Remote Authentication Dial-In User Service (RADIUS), IETF, June 2000.
[32] Calhoun, P., et al., RFC 3588, Diameter Base Protocol, IETF, September 2003.
[33] Dijkstra, E. W., “A Note on Two Problems in Connexion with Graphs,” Numerische Mathematik, Vol. 1, 1959, pp. 269–271.
[34] Moy, J., RFC 2178, OSPF Version 2, IETF, July 1997.
[35] Hedrick, C., RFC 1058, Routing Information Protocol, IETF, June 1988.
[36] Malkin, G., RFC 1388, RIP Version 2—Carrying Additional Information, IETF, January 1993.
[37] Malkin, G., RFC 1387, RIP Version 2 Protocol Analysis, IETF, January 1993.
[38] Malkin, G., and F. Baker, RFC 1389, RIP Version 2 MIB Extension, IETF, January 1993.
[39] Rekhter, Y., and T. Li, RFC 1771, A Border Gateway Protocol 4 (BGP-4), IETF, March 1995.
[40] Nichols, K., et al., RFC 2474, Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, IETF, December 1998.
[41] Grossman, D., RFC 3260, New Terminology and Clarifications for DiffServ, IETF, April 2002.
[42] Ramakrishnan, K., S. Floyd, and D. Black, RFC 3168, The Addition of Explicit Congestion Notification (ECN) to IP, IETF, September 2001.
[43] Blake, S., et al., RFC 2475, An Architecture for Differentiated Services, IETF, December 1998.
[44] Black, D., et al., RFC 3140, Per Hop Behavior Identification Codes, IETF, June 2001.
[45] Davie, B., et al., RFC 3246, An Expedited Forwarding PHB, IETF, March 2002.
[46] Charny, A., et al., RFC 3247, Supplemental Information for the New Definition of the EF PHB, IETF, March 2002.
[47] Armitage, G., et al., RFC 3248, A Delay Bound Alternative Revision of RFC 2598, IETF, March 2002.
[48] Heinanen, J., et al., RFC 2597, Assured Forwarding PHB Group, IETF, June 1999.
[49] Braden, R., D. Clark, and S. Shenker, RFC 1633, Integrated Services in the Internet Architecture: An Overview, IETF, June 1994.
[50] Braden, R., et al., RFC 2205, Resource ReSerVation Protocol (RSVP)—Version 1 Functional Specification, IETF, September 1997.
[51] Wroclawski, J., RFC 2210, The Use of RSVP with IETF Integrated Services, IETF, September 1997.
[52] Rosen, E., A. Viswanathan, and R. Callon, RFC 3031, Multiprotocol Label Switching Architecture, IETF, January 2001.
[53] Andersson, L., et al., RFC 3036, LDP Specification, IETF, January 2001.
[54] Thomas, B., and E. Gray, RFC 3037, LDP Applicability, IETF, January 2001.
[55] Rosen, E., et al., RFC 3032, MPLS Label Stack Encoding, IETF, January 2001.
[56] Suzuki, M., RFC 3033, The Assignment of the Information Field and Protocol Identifier in the Q.2941 Generic Identifier and Q.2957 User-to-User Signaling for the Internet Protocol, IETF, January 2001.
[57] Conta, A., P. Doolan, and A. Malis, RFC 3034, Use of Label Switching on Frame Relay Networks Specification, IETF, January 2001.
[58] Davie, B., et al., RFC 3035, MPLS Using LDP and ATM VC Switching, IETF, January 2001.
[59] Nagami, K., et al., RFC 3038, VCID Notification over ATM Link for LDP, IETF, January 2001.
[60] Awduche, D., et al., RFC 3209, RSVP-TE: Extensions to RSVP for LSP Tunnels, IETF, December 2001.
[61] Awduche, D., A. Hannan, and X. Xiao, RFC 3210, Applicability Statement for Extensions to RSVP for LSP Tunnels, IETF, December 2001.
[62] Jamoussi, B., et al., RFC 3212, Constraint-Based LSP Setup Using LDP, IETF, January 2002.
[63] Ash, J., et al., RFC 3213, Applicability Statement for CR-LDP, IETF, January 2002.
[64] Ash, J., et al., RFC 3214, LSP Modification Using CR-LDP, IETF, January 2002.
[65] Le Faucheur, F., et al., RFC 3270, Multiprotocol Label Switching (MPLS) Support of Differentiated Services, IETF, May 2002.
[66] Postel, J., RFC 768, User Datagram Protocol, IETF, August 1980.
[67] Stewart, R., et al., RFC 2960, Stream Control Transmission Protocol, IETF, October 2000.
[68] Coene, L., RFC 3257, Stream Control Transmission Protocol Applicability Statement, IETF, April 2002.
[69] Ong, L., and J. Yoakum, RFC 3286, An Introduction to the Stream Control Transmission Protocol (SCTP), IETF, May 2002.
[70] Stone, J., R. Stewart, and D. Otis, RFC 3309, Stream Control Transmission Protocol (SCTP) Checksum Change, IETF, September 2002.
[71] Perkins, C., RFC 3344, IP Mobility Support for IPv4, IETF, August 2002.
[72] Hagen, S., IPv6 Essentials, Sebastopol, CA: O’Reilly, July 2002.
[73] Ahuja, R. K., T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Englewood Cliffs, NJ: Prentice Hall, 1993.
[74] Bertsekas, D. P., Network Optimization: Continuous and Discrete Models (Optimization, Computation, and Control), Belmont, MA: Athena Scientific, 1998.
[75] Bates, T., R. Chandra, and E. Chen, RFC 2796, BGP Route Reflection—An Alternative to Full Mesh IBGP, IETF, April 2000.
[76] Halabi, B., and D. McPherson, Internet Routing Architectures, 2nd ed., Indianapolis, IN: New Riders Publishing (Cisco Press), August 2000.
[77] Moy, J. T., OSPF: Anatomy of an Internet Routing Protocol, Reading, MA: Addison-Wesley, 1998.
[78] Bertsekas, D. P., and R. Gallager, Data Networks, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1992.
CHAPTER 8
A Closer Look at SS7

In some ways, SS7 is inelegant—for example, its routing scheme has been surpassed by today’s IP routing protocols. This raises the following question: Why is the “footprint” of this technology still expanding? The answer is that SS7 is robust in some very important ways:

• SS7 networks are built to an extremely high degree of reliability. There is a high degree of redundancy built into the SS7 “architecture,” and the SS7 protocol stack fully exploits this redundancy.

• The SS7 protocol stack monitors the health of SS7 links. There are well-established tools and procedures for troubleshooting SS7 networks; these are based in part on SS7’s built-in capabilities for health assessment.

• SS7 supports much more than just basic call-control functionality. TCAP is especially important, because it allows SS7 entities to generate database queries (e.g., to look up subscriber data).

The number and cumulative volume of the ITU-T SS7 standards documents is truly enormous. The ITU-T document [1] provides an overview and outlines the other documents in the series. The U.S. versions of the SS7 standards are published by ANSI. Details differ between the ITU-T and ANSI versions, but the concepts are the same. The standards documents make for difficult reading. They are probably not the best starting point for readers who are new to the subject. We found good Web tutorials on SS7 at http://www.pt.com and http://www.iec.org/online, simply by searching against the text “SS7 tutorial.” If these pointers become stale, the reader may still be able to find resources by repeating the same search. For a detailed treatment of SS7, one can consult books by Manterfield [2], van Bosse [3], and Russell [4].
8.1  SS7 Architecture and Link Types

In this section, we describe how redundancy is built into the SS7 architecture. There are three types of SS7 network elements:

• Service control points (SCPs) house service logic (one can think of this as the “flowchart” defining the workings of a given service) and/or supporting data. A subscriber database (such as a home location register in the case of a wireless network) is a good example.
• Voice switches are the “clients” of the SCPs. That is, they look to the SCPs for data or instructions necessary to provide a given service. Note that voice switches are called service switching points in SS7, although we will have little use for this terminology.

• Signaling transfer points (STPs) are packet switches that act as intermediaries between pairs of voice switches or between voice switches and SCPs.
SS7 links come in different types, which are distinguished by their placement and roles in the SS7 architecture. Voice switches and SCPs connect to STPs via A, or “access,” links. Since one SCP typically serves many switches, SCPs are often deployed in redundant pairs. This is shown in Figure 8.1.

Multiple types of links are used to interconnect STPs: B (“bridge”), C (“cross”), and D (“diagonal”). STPs are deployed in redundant pairs called mated pairs; the two STPs in a mated pair perform identical functions. Unlike SCPs, the members of a mated STP pair are interconnected; such a connection is called a C link. In Figure 8.1, note that the voice switch is “dual-homed”: it is connected to STP1 and STP2, which form a mated pair. If connectivity to one of the two STPs fails, the switch is not isolated from the rest of the SS7 network.

The C link connecting STP1 and STP2 is used only in the case of failure. Here is an example: suppose that our voice switch has launched a database query toward the SCP, whose response comes to STP2. Suppose also that connectivity between STP2 and the voice switch has failed, but neither of these network elements is down. (This could be because a line card at one endpoint of the link has failed, or because of a failure on some intervening transport network element.) Then STP2 will forward the response to STP1 in the hope that STP1 still has connectivity to the voice switch.

Note the label on the link between STP3 and STP4 (which also form a mated pair). B and D links are used to interconnect mated pairs of STPs. For our purposes, there is no distinction between the two; we will call them B/D links. As illustrated in the figure, there are four ways to connect a member of the STP1-STP2 mated pair to a member of the STP3-STP4 pair. Thus B/D links come in sets of four.
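The load-sharing and failover behavior just described can be sketched in a few lines. This is an illustrative model only (the link names and availability flags are our own shorthand), not how an STP is actually implemented:

```python
import itertools

# Normal operation: traffic between the mated pairs is shared across
# all four B/D links (modeled here as simple round-robin).
bd_links = ["STP1-STP3", "STP1-STP4", "STP2-STP3", "STP2-STP4"]
rr = itertools.cycle(bd_links)

def pick_bd_link():
    return next(rr)

def deliver_to_switch(stp, switch_links):
    """An STP tries its direct A link to the switch first; if that link
    is down, it forwards over the C link to its mate, in the hope that
    the mate still has connectivity (the failure scenario above).
    switch_links maps STP name -> True if its A link to the switch is up."""
    if switch_links.get(stp, False):
        return f"{stp} -> switch (A link)"
    mate = {"STP1": "STP2", "STP2": "STP1"}[stp]
    if switch_links.get(mate, False):
        return f"{stp} -> {mate} (C link) -> switch (A link)"
    return "switch isolated"

# STP2's A link has failed; the response detours over the C link.
print(deliver_to_switch("STP2", {"STP1": True, "STP2": False}))
```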
Unlike C links, B/D links are used during normal operation—traffic between the STP1-STP2 and STP3-STP4 mated pairs will be shared among the four links so that the load is balanced.

Figure 8.1  SS7 link types. (The figure shows a dual-homed voice switch connected by A links to the mated pair STP1-STP2, which connects via a set of four B/D links to the mated pair STP3-STP4; a C link joins the members of each mated pair, and A links connect the STPs to the redundant SCPs SCP1 and SCP2.)

There are two additional SS7 link types (these are not shown in Figure 8.1). F (“fully associated” or “facility”) links connect voice switches directly to SCPs. In the United States, STPs are almost universally deployed and F links are uncommon. An E (“extended”) link connects a voice switch to an alternate STP (that is, an STP in an alternate mated pair). E links are also uncommon.

Further Description of Service Control Points
To motivate the subsystem number (SSN) concept presented in Section 8.2, we need to offer a truer representation of an SCP than what appears in Figure 8.1. There, due to space limitations, each SCP is shown as a single database. However, it often makes sense to house multiple services behind the same SS7 interface. Among other things, it may be advantageous to multiplex the traffic pertaining to those services on common linkset(s). Figure 8.2 positions the SCP as an SS7 “front end” for two database applications (the latter are suggested by the cylinders at the far right). As we have drawn the picture, the databases are external to the SCP itself (this was an arbitrary choice); the connecting links are not SS7 links (they are dotted to contrast with the A links at the far left). Note that the SS7 network does not know or care whether the databases are part of the SCP. Note also that we do not intend to suggest in Figure 8.2 that the upper SS7 interface is associated with Service #1 or that the lower SS7 interface is associated with Service #2. The two SS7 interfaces are present for redundancy and load sharing; each A link will carry traffic for both services.
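To preview how such a front end might separate the two services’ traffic, here is a hypothetical dispatch sketch. The subsystem numbers, handler names, and query format are all invented for illustration; subsystem numbers themselves are defined in Section 8.2:

```python
# Hypothetical sketch: an SCP front end dispatching incoming queries to
# the databases behind it. The key point from Figure 8.2 is that both
# A links carry traffic for both services; the subsystem number (SSN)
# carried in each message, not the arrival link, selects the service.
def service_1(query):
    return f"service 1 answers {query!r}"

def service_2(query):
    return f"service 2 answers {query!r}"

dispatch_by_ssn = {11: service_1, 12: service_2}  # SSN values are invented

def handle_query(ssn, query):
    handler = dispatch_by_ssn.get(ssn)
    if handler is None:
        return "unknown subsystem"
    return handler(query)

print(handle_query(11, "subscriber 5551234"))
```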
8.2  SS7 Routing and Addressing

We need some routing and addressing terminology to facilitate our forthcoming discussion of SS7 protocol layers. The following four terms will suffice for our purposes:

• Linkset. In Figure 8.1, we suppressed one important detail: each of the lines can represent a linkset rather than a single link. For our purposes, the defining characteristic of a linkset is this: all members of a linkset terminate at the same pair of nodes and serve the same function. Legacy SS7 links have limited transmission capacity (typically 56 kbit/s); this is the reason that the linkset concept was devised and implemented.
Figure 8.2  Representation of a service control point. (Redundant A links from the SS7 network terminate on two SS7 interfaces at the SCP, which acts as a front end for the Service #1 and Service #2 databases.)
• Routeset. Linksets only connect adjacent SS7 nodes. Unlike links, routes can contain intermediate nodes. More formally, a route is a collection of linksets that, when “concatenated,” form a path between two SS7 nodes that are not directly connected. As an example, consider the following collection from Figure 8.1: the linkset connecting the voice switch to STP1, the linkset connecting STP1 to STP3, and the linkset connecting STP3 to SCP1. A routeset is a collection of routes that share the same originating and terminating nodes. The following collection is an example routeset:
  • The route already described;
  • The route voice switch-STP1-STP4-SCP1;
  • The route voice switch-STP2-STP3-SCP1;
  • The route voice switch-STP2-STP4-SCP1.
Routesets typically incorporate route diversity (i.e., multiple paths between their originating and terminating nodes, as illustrated by the example). This exploits the redundancy present in SS7 networks to achieve fault tolerance.
• Point code. SS7 node addresses are called point codes. The details differ from country to country: the United States and China both use 24-bit point codes, but the formats are different; Europe uses 14-bit point codes.
• Subsystem number. Recall that a single SCP may provide more than one service. SS7 distinguishes between services by assigning subsystem numbers to them. One can think of a service as a piece of application software running on an SCP; then the application is known to the SS7 protocol stack on that SCP by its subsystem number. Approaching the concept in this way, we see that subsystem numbers in SS7 networks play a role analogous to that of port numbers in IP networks. Recall from our discussion of IP networking that source and destination port numbers appear in the layer 4 header (i.e., the TCP, UDP, or SCTP header).
8.3 Review of the SS7 Protocol Stack

Most of the protocols mentioned in this section were also discussed, albeit briefly, in Chapter 6. Using Figure 8.3, we remind the reader of these protocols and show schematically how they fit together. By drawing an imaginary vertical line through Figure 8.3, the reader can glimpse the stack for a given SS7 protocol. In legacy SS7 deployments, the three layers of MTP are always present.

Let us look at the example of ISUP. Depending on where we position the vertical line, it may or may not pass through the box labeled "SCCP." This reflects the fact that ISUP can run over SCCP, or it can run directly over MTP3. ISUP over SCCP is extremely rare, so the pertinent portion of the ISUP box in the diagram is purposely very slim. ISUP is the protocol for basic call-control signaling in most networks today; this protocol is fairly straightforward and has changed little over the years.
Figure 8.3 "Traditional" SS7 stack (MAP, IS41, and INAP over TCAP; TCAP over SCCP; ISUP over SCCP or directly over MTP3; below them, MTP levels 3, 2, and 1).
We do not describe ISUP any further in this chapter; instead, we refer the reader to the overview that appears in Section 6.5.6. Numerous protocols run over TCAP, which in turn requires SCCP. The past 15 or 20 years have seen continual change in this area: new protocols have emerged and existing protocols have evolved enhanced capabilities. MAP and ANSI-41 handle mobility management for GSM and CDMA/TDMA wireless networks, respectively. Intelligent network application part (INAP) is used to support a variety of intelligent network services, such as prepaid long distance. There are many other application parts. Although we do not discuss those protocols here, their existence is proof of SS7's great flexibility.

Packet Formats
We will present less detail about packet headers in this chapter than in Chapter 7. There are several reasons for this. Field lengths and formats differ from place to place (as we have seen with point codes). There are also some regional differences in packet formats. Finally, there are numerous options at some protocol layers, and the interpretation of protocol fields depends on the option settings. The interested reader will find that SS7 is well documented in the print medium and, to a lesser extent, on the World Wide Web.

SS7 Network Management Traffic
We have seen that redundancy is a built-in feature of the SS7 infrastructure. This is important for SS7’s reliability and overall robustness, but is not by itself sufficient. SS7 has complex network management features essential to its robustness. Although we will not discuss them further, network management messages are exchanged at various levels in the SS7 protocol stack; network management traffic is an important part of the mix in today’s SS7 networks.
8.4 Message Transfer Part

Recall that MTP1 is synonymous with the physical layer. Details of the physical layer are beyond the scope of this book.
8.4.1 MTP2

MTP2 runs directly over the physical layer. MTP2 protocol entities at either end of a link continually assess the health of that link, reporting this information to the MTP3 layer. MTP2 is responsible for retransmission of packets that are lost or garbled. In support of these responsibilities, the MTP2 frame includes sequencing information and a cyclic redundancy check field used to detect corrupted frames.

MTP2 frames, called signal units in SS7 jargon, come in three varieties, which we now describe. MTP2 entities that share an SS7 link talk to each other constantly, transmitting fill-in signal units (FISUs) if they do not have anything in particular to say. A FISU can also be used by an MTP2 entity that wants to acknowledge receipt of a signal unit from the far end but does not have any additional content to convey. Link status signal units are used for the health assessment functions of MTP2. All payloads received from the MTP3 layer are encapsulated in message signal units.

8.4.2 MTP3
MTP3 receives information from MTP2 about link health and is responsible for acting on this information. MTP3 incorporates congestion avoidance procedures. Should a link exhibit severe problems, MTP3 can mark that link as "out of service" and initiate a link reset (that is, it can restart the state machines at either end of the link and allow them to "sync up").

Once MTP2 has processed an incoming message signal unit (MSU), it passes the service information octet (SIO) and signaling information field (SIF) up to MTP3. The SIO tells MTP3 how the SIF should be interpreted: it identifies the higher-layer protocol entity for this message (e.g., ISUP or SCCP) and includes priority information.

MTP3 also has routing capabilities; it is responsible for selecting the outgoing link for each outgoing packet. Although its format is variable, the SIF always begins with a routing label. The routing label contains the destination point code (DPC), the originating point code (OPC), and the signaling link selection (SLS); see Figure 8.4. Note that this drawing is not to scale, in the sense that the SIO, DPC, OPC, and SLS fields do not have the same length. In some (but not all) variants, the routing label contains "filler" bits in addition to the fields shown.

Whenever a message is ready to be sent to another node, MTP3 makes the outgoing link selection using the DPC and SLS as follows. The DPC is used as a key for a routing table lookup; this lookup determines the outgoing linkset. The SLS is then used to determine which link in the linkset will be used. MTP3 balances the load among all links in a linkset. One might imagine that MTP3 would simply cycle the SLS through all possible values, thus distributing packets among the links in a round-robin fashion. This gives the general idea, but is a bit oversimplified: packets vary in length, and some messages must be segmented across multiple packets because of their size. In the case of a multipacket message, all of the fragments must be sent on the same link to make certain they do not arrive out of order.

The routing capabilities of MTP3 are limited; in many routing scenarios, MTP3 requires the assistance of the SCCP layer.
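The two-stage selection just described can be sketched in Python. The point codes, table contents, and modulo rule below are illustrative only; real MTP3 implementations (and the number of SLS bits) vary from variant to variant.

```python
# Hypothetical sketch of MTP3's two-stage outgoing link selection:
# stage 1 uses the DPC to find the linkset, stage 2 uses the SLS to pick
# a link within it. All table contents are invented for illustration.

ROUTING_TABLE = {
    0x00A1: "linkset_to_STP1",   # DPC -> outgoing linkset
    0x00B2: "linkset_to_STP2",
}

LINKSETS = {
    "linkset_to_STP1": ["link0", "link1", "link2", "link3"],
    "linkset_to_STP2": ["link0", "link1"],
}

def select_link(dpc, sls):
    """Stage 1: DPC keys the routing-table lookup; stage 2: SLS picks the link."""
    linkset_name = ROUTING_TABLE[dpc]
    linkset = LINKSETS[linkset_name]
    return linkset_name, linkset[sls % len(linkset)]
```

Because every fragment of a multipacket message carries the same SLS, repeated calls to `select_link` with that SLS always return the same link, which is what keeps the fragments in order.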
Figure 8.4 MTP3 fields (the service information octet (SIO), followed by the service information field (SIF); the SIF begins with a routing label containing the DPC, OPC, and SLS, and the length and interpretation of the remainder depend on the contents of the SIO).

8.5 SCCP

SCCP provides the routing capabilities that MTP3 lacks. Network layer functionality is shared by MTP3 and SCCP.
8.5.1 General Description and Communication with MTP3

SCCP provides four classes of service. Two are connection-oriented, in which a session must be initiated before data transfer takes place; the two connection-oriented services are distinguished by whether in-sequence data delivery is assured. The other two services are connectionless and are again distinguished by whether in-sequence delivery is assured. The connectionless services are far more common than their connection-oriented counterparts, although SCCP's name might lead one to guess the opposite.

When SCCP assures in-sequence delivery, it does so in a rather crude way: it tells MTP3 not to change the SLS. This assurance is necessary whenever a message is long enough to require fragmentation at the MTP3 layer. If fragments arrive out of order, the receiving end will not be able to reassemble them correctly.

We saw in Section 8.2 that SSNs are used to distinguish applications running on an SCP. For each packet that it receives from MTP3, SCCP uses the SSN field (located in the SCCP header) to determine which higher-layer application will receive the payload. This handoff occurs at the end node (e.g., the SCP that is the ultimate destination for a service request) and is conceptually simple. One could argue that this is a layer 4 functionality (see the comparison with TCP port numbers in Section 8.2), but the dividing line between the layers is not so clear here, as we will explain.

Using an SSN to select a higher-layer application at a destination SS7 node is the easy part. Reaching this ultimate destination is often more complex, and SCCP plays a crucial role here. The relevant SCCP functionality is called global title translation (GTT).
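The "crude" in-sequence mechanism can be sketched as follows. The fragment size and function signatures are invented for illustration; the key point, holding the SLS constant for every fragment of one message, follows the description above.

```python
# Illustrative sketch of SCCP's in-sequence assurance: every fragment of a
# long message is handed to MTP3 with the same SLS, so MTP3 maps them all
# to the same link. MAX_FRAGMENT and the interfaces are invented.

MAX_FRAGMENT = 16  # hypothetical per-packet payload limit

def sccp_send(payload, sls, mtp3_send):
    """Fragment if needed; reuse one SLS so fragments cannot be reordered
    by traveling over different links. Returns the fragment count."""
    fragments = [payload[i:i + MAX_FRAGMENT]
                 for i in range(0, len(payload), MAX_FRAGMENT)] or [payload]
    for fragment in fragments:
        mtp3_send(fragment, sls)  # SLS held constant for the whole message
    return len(fragments)
```

Note that this "assurance" is indirect: SCCP does not sequence anything itself; it merely constrains MTP3's link selection.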
8.5.2 Getting There Is Half the Fun: Global Title Translation
In many cases, the sender of an SS7 message does not know the point code of that message's final destination. Toll-free numbers, for instance, are not routable. When a toll-free number is dialed, the originating switch must therefore obtain routing information from the toll-free database (which is an SCCP subsystem) before the call setup can proceed. Moreover, the originating switch usually does not know the location (that is, point code and SSN) of the toll-free database. As long as the switch can pass the request to another SS7 node (using MTP3 routing) whose SCCP layer knows where to go next, however, the query can make progress toward the toll-free database.

How does the query make progress toward the database? The SCCP layer at the receiving node fetches a new DPC from a routing table, using the dialed digits as an "index" into that routing table. The SCCP entity then repopulates the DPC for the next phase of the query's journey and forwards again over MTP3. This process is called GTT.

In Figure 8.5, the GTT procedure is performed by the node in the center of the diagram. The nodes flanking the GTT node route the service request at the MTP3 layer only; usually these would be STPs. (GTT nodes are not always separate; some STPs implement GTT in addition to their basic functionality.) It is worthwhile to note the similarity between Figure 8.5 and Figure 6.2.

GTT may be invoked multiple times along an SS7 message's end-to-end path. The overall process looks like this: the dialing customer's serving switch must know the point code of the device (an STP, say) that handles toll-free translations for it. The serving switch inserts this point code in the DPC field of the MTP3 header and forwards the message, over MTP3, to that device. The message is formatted so as to indicate that the switch is requesting GTT. The receiving STP invokes GTT to fill in a new DPC and forwards again over MTP3. Suppose this STP does not know the final destination of the query.
Then it indicates to its “SCCP next hop” that it, too, is requesting GTT. Ultimately, the query reaches the SCP that hosts the toll-free database application; recall that the SSN identifies the correct application.
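The hop-by-hop progression can be sketched as a small loop. The per-node tables, node names, and the prefix-based lookup below are invented; a real GTT table is keyed on much richer address information.

```python
# Hypothetical sketch of a query progressing hop by hop via GTT. Each node
# either knows the final (point code, SSN) for the dialed digits or knows
# only the next node to forward to. All tables and names are invented.

# Per-node GTT tables: dialed-digit prefix -> (next DPC, SSN or None).
GTT_TABLES = {
    "stp_local": {"800": ("stp_regional", None)},     # forward again
    "stp_regional": {"800": ("tollfree_scp", 254)},   # final destination
}

def route_query(digits, first_hop, max_hops=8):
    """Follow GTT translations until a node yields a final (DPC, SSN)."""
    node = first_hop
    for _ in range(max_hops):
        dpc, ssn = GTT_TABLES[node][digits[:3]]
        if ssn is not None:
            return dpc, ssn          # SSN known: the query has arrived
        node = dpc                   # otherwise forward over MTP3 and repeat
    raise RuntimeError("GTT loop or route too long")
```

Each iteration of the loop corresponds to one GTT invocation: the DPC is repopulated and the message is handed back to MTP3 for the next leg.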
Figure 8.5 SS7 routing example (the originating and terminating nodes run full MAP/TCAP/SCCP/MTP stacks; the flanking nodes relay at the MTP3 layer only; the center node performs GTT, so the message carries DPC1 at the MTP3 layer before translation and DPC2 afterward).
This is starting to sound complicated, and therefore raises the following question: "Why bother with multiple GTT invocations along the path of a single message?" One reason is that this is almost certainly more cost-effective than populating the DPC and SSN of the toll-free database into the routing table of every STP. One would face a management nightmare if this DPC needed to be changed for some reason. Moreover, when one incorporates many services into the picture, routing tables would become large and unwieldy.

Clarifying the Bigger Picture
We can enumerate the following steps in processing a toll-free call (note that this description pertains to the U.S. implementation):

1. The calling party's serving switch looks at the dialed digits and realizes that they constitute a special, nonroutable number. Thus it cannot initiate an ISUP call flow right away.
2. The switch sends a query to the toll-free database. (As a "plug" for the material in Section 8.6, we note that the query is encapsulated in a TCAP message.) [1]
3. Usually, the toll-free database responds with a carrier code.
4. Using ISUP signaling, the local switch sets up a trunk to the access switch for the correct interexchange carrier (as indicated by the carrier code obtained in step 3).
5. The interexchange carrier's access switch queries the toll-free database and obtains a routable number.
6. The interexchange carrier's switch completes the call, using the newly obtained routable number as the called party number.

It is important to understand that GTT takes place in the course of steps 2 and 5, not in steps 4 or 6. Indeed, we have noted that ISUP runs directly over MTP3, so we cannot directly take advantage of SCCP's global title functionality within ISUP call-control signaling. The philosophy is that we do not want to reserve any trunks until we know where we are going. (Recall that trunk is just a fancy name for a voice bearer channel connecting two switches; the term usually implies that the channel does not traverse any intermediate switches.) The process just outlined is transparent to the calling party, who never sees the routable number.

Toll-free service predates the widespread deployment of GTT. In earlier incarnations, not every switch had the capability to perform database queries. In such cases, the caller's serving switch did reserve a trunk to a switch that could query the toll-free database. After obtaining a routing number, the second switch would continue the call-control signaling flow.
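The six numbered steps can be compressed into a sketch. The helper function stands in for the TCAP queries and ISUP signaling, and the carrier code and routable number are entirely invented; only the ordering of operations reflects the text.

```python
# Hypothetical compression of the U.S. toll-free call-setup steps. The
# database responses and message strings are invented for illustration.

def tollfree_db(dialed, querying_carrier):
    # Steps 3 and 5: the LEC's query yields a carrier code; the IXC's
    # query yields a routable number (both values invented).
    responses = {"lec": "IXC-42", "ixc": "+1-214-555-0100"}
    return responses[querying_carrier]

def setup_tollfree_call(dialed):
    assert dialed.startswith("800")                      # step 1: nonroutable
    carrier_code = tollfree_db(dialed, "lec")            # step 2: TCAP query
    trunk = f"trunk to access switch of {carrier_code}"  # step 4: ISUP to IXC
    routable = tollfree_db(dialed, "ixc")                # step 5: IXC queries
    return trunk, f"call completed to {routable}"        # step 6
```

Note that the two database queries (steps 2 and 5) are the only places where GTT comes into play; the trunk setup in steps 4 and 6 is plain ISUP.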
Note that the trunk from the first switch to the second switch remained in the bearer path throughout the life of the call. We still see the vestiges of this arrangement in the fact that the local exchange carrier trunks to the interexchange carrier's access switch. Usually the distance between the two carriers' switches is small, so the implied loss of efficiency is minimal.

[1] If the dialed number is handled by the local exchange carrier, then the query yields a routable number directly. (Local exchange carriers in the United States are allowed to offer intra-LATA toll-free service.)

More on GTT
We have only scratched the surface of SCCP and global title translation. We close this section with a list of comments about additional aspects of the SCCP layer; details on these topics are beyond the scope of this book.

• In Section 8.4, we mentioned routing tables that are consulted by the MTP3 layer. As noted in this section, routing tables are also present at the SCCP layer (to support GTT).

• The name of the DPC field in the MTP3 header is potentially misleading. This is indeed the final destination from MTP3's point of view. However, as we have seen, SCCP/GTT may insert a new value in the DPC field and hand the message in question back to MTP3 for transport to another SS7 node.

• The address indicator field in the SCCP header indicates whether GTT is required. If so, this field also controls the global title options, which are numerous. Many of these options go hand in hand with the type of address information supplied with the GTT request; the address indicator tells SCCP what type of information to expect in the address header field.

• One of the global title options is to consider the SSN in the routing decision. Here is one use of this feature: suppose we have a service that relies on a database, and for redundancy purposes there are two copies of the database residing at different point codes. SCCP can mark the associated SSN as "subsystem prohibited" for one of the two point codes. This feature can be used to route queries to the backup copy of the database (e.g., when the primary copy is taken down for scheduled maintenance).

• Recall our statement that SSNs are similar to TCP (or UDP or SCTP) port numbers in IP networks. However, port numbers are not normally taken into account in IP routing; the separation between layers 3 and 4 is clearer there than in SS7 networks.

• GTT happens at the SCCP layer. Recall our statement that SCCP uses the SSN field to make sure it hands off to the correct higher-layer protocol entity. In fact, the appropriate subsystem is invoked when a database query or other service invocation request reaches its final destination (so our statement was accurate, as far as it went). However, when GTT takes place at intermediate node(s), there is no handoff to a higher-layer protocol entity (because we have not yet reached the SCP that provides the desired service).
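The "subsystem prohibited" mechanism might be modeled as follows. The point codes, the SSN value, and the status strings are invented; the failover behavior is the one described above.

```python
# Illustrative sketch of "subsystem prohibited": a replicated database lives
# at two point codes, and translation steers queries to a copy whose
# subsystem is still allowed. All names and values are invented.

REPLICAS = [("scp_primary", 200), ("scp_backup", 200)]  # (point code, SSN)
SUBSYSTEM_STATUS = {("scp_primary", 200): "allowed",
                    ("scp_backup", 200): "allowed"}

def mark_prohibited(dpc, ssn):
    """E.g., the primary copy is taken down for scheduled maintenance."""
    SUBSYSTEM_STATUS[(dpc, ssn)] = "prohibited"

def translate(replicas=REPLICAS):
    """Return the first replica whose subsystem is not prohibited."""
    for dpc, ssn in replicas:
        if SUBSYSTEM_STATUS[(dpc, ssn)] != "prohibited":
            return dpc, ssn
    raise RuntimeError("all replicas prohibited")
```

Marking the primary's subsystem prohibited causes subsequent translations to yield the backup's point code, with no change visible to the querying switch.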
8.6 TCAP

TCAP is all about invoking operations on remote nodes and reporting the results of those operations to the invoking entities. TCAP comes into play when an application on one node (a user in SS7-speak) asks a peer application on another node to
do something. Usually this "something" is a database lookup (think, for example, of local number portability or of verifying a calling-card personal identification number), but there are other uses. For example, consider a ringback service in which a caller who receives a busy signal can ask to be connected to the called party whenever the latter becomes available. The caller's serving switch asks to be informed when this event occurs; the request is encapsulated in a TCAP message to the called party's serving switch.

TCAP transactions. Each TCAP message contains a transaction portion and a component portion. Each TCAP message contains an indication whether it is a unidirectional transfer of information (in which no reply is expected), an initiation of a dialog, a continuation of an ongoing dialog, a final response that ends a dialog, or an abort. This indication appears in the transaction portion of the TCAP message, along with originating and responding transaction ID fields. TCAP uses these transaction IDs to match each transaction with the correct applications at the endpoints. To summarize, originating and responding applications talk to each other within TCAP transactions. A TCAP transaction can be an ongoing dialog or a "one-shot deal."

TCAP components. Many operations can be conducted under the aegis of a single TCAP transaction. If two applications talk to one another frequently, many requested operations may be in process simultaneously, in various stages of completion. Individual invocations and responses are called components; multiple components can be bundled inside a single TCAP message, provided that they all share the same originating and responding applications. As a crude example, a node that is generating many queries to a single database may "batch" multiple queries per TCAP message; each query is regarded by TCAP as an invoke component. The database application may similarly batch responses; each "normal" response is a return response component. For handling of abnormal conditions, there are also return error and reject components. Note that when multiple invoke components are bundled into a single TCAP message, the responses do not have to be bundled in the same way.

To TCAP, application-specific data (such as the MAP queries discussed in Section 8.7) appear as one or more parameters. TCAP does not try to parse or otherwise examine application-specific data. We have seen that TCAP uses the transaction portion to match each message to the correct application. TCAP keeps track of individual operations within a transaction at the component layer and can tell certain things (e.g., whether this is an invoke or some sort of response; whether error handling is necessary) without any knowledge of the application-specific data. This information may assist in making sure that each component is handed to the right module within the receiving application (e.g., a MAP protocol entity). This suggests a tight coupling between TCAP and the higher-layer application. Such a coupling may blur the boundary between the two layers, but might be expedient for implementation efficiency.
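A toy rendering of this two-part structure, with several invokes batched into one message, may make the terminology concrete. The field names are illustrative Python, not the ASN.1 encoding that real TCAP uses.

```python
# Toy model of a TCAP message: a transaction portion (message type plus
# transaction IDs) and a component portion that can bundle several invokes
# or responses. Field names and values are invented for illustration.

def tcap_message(msg_type, orig_tid, components, resp_tid=None):
    assert msg_type in {"unidirectional", "begin", "continue", "end", "abort"}
    return {
        "transaction": {"type": msg_type,
                        "originating_tid": orig_tid,
                        "responding_tid": resp_tid},
        "components": components,   # e.g., several invokes batched together
    }

# Batching three database queries as invoke components in one message:
queries = [{"kind": "invoke", "invoke_id": i, "parameter": q}
           for i, q in enumerate(["q1", "q2", "q3"])]
msg = tcap_message("begin", orig_tid=0x1234, components=queries)
```

The transaction IDs match the message to the right application pair; the per-component `invoke_id` lets the responder answer each query individually, in any bundling it chooses.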
8.6.1 Number Portability
We have seen that toll-free numbers are not routable, and that the dealiasing process is realized via TCAP queries and responses (much as DNS is used to resolve host names to
IP addresses—see Section 7.3.3). TCAP queries are routed to the appropriate hosts using GTT.

Toll-free numbers are not the only nonroutable telephone numbers. Number portability uses the same technology and does so in a similar way. Number portability allows a customer to subscribe to a new carrier without changing his/her telephone number. When a number is "ported" to a new carrier, it becomes an alias for a routable number assigned by the new carrier. The binding between the ported number and its alias is stored in a database. When someone dials the ported number, a TCAP query is launched toward that database; routing of the query (and the subsequent response) relies on GTT. This sets off a sequence of events that inserts the routable number at an appropriate point in an ISUP call flow. The whole process is transparent to the calling and called parties, who never see the routable number.

Wireless phone numbers are also aliases rather than true routable numbers. In Section 8.7, we look in detail at the dealiasing process for mobile terminated calls.
8.7 MAP

Wireless networks make heavy use of TCAP, particularly for keeping up with subscriber locations; this functionality is called mobility management. Some readers may not be especially interested in mobility management (or may have reached the saturation point for details about SS7). Readers who wish to skip this section can simply keep the following points in mind:
• SS7's footprint is still growing. Every mobile terminated call and short message relies on mobility management (we briefly describe Short Message Service in Section 13.8), so wireless networks are extremely heavy users of SS7. (Number portability is another reason for continued worldwide increases in SS7 traffic volume; number portability implementations typically ride on top of TCAP.)

• SS7 will continue to be important for years to come. As noted, the sheer scale of SS7 deployments is one factor. Moreover, any would-be replacement for SS7 [e.g., Session Initiation Protocol (SIP)] would have to offer a great deal of functionality, and early implementations of a successor protocol will not be as robust as SS7 is today. Here we note that SIP is covered at length in Chapters 11 and 12.

• Because of the aforementioned factors, we believe that sigtran will see widespread deployment. We briefly mentioned IETF's Signaling Transport (sigtran) working group in connection with SCTP (see Section 7.8.2); further discussion appears in Section 15.4.
Note that Section 8.8 refers to the material in the current section. The remainder of this book is largely independent of the material contained in the current section. In wireless networks, voice switches are called mobile switching centers (MSCs). MSCs are similar to landline switches and are key components of the interconnecting infrastructure—that is, the infrastructure that:
• Connects radio towers within an operator's network;

• Connects the operator's network to those of other operators (both landline and wireless).
When a call comes into a wireless operator's network (this is known as a mobile terminated call), the network has to know which MSC is currently serving the called subscriber. This information is maintained in a subscriber database called a home location register (HLR). MAP is the vehicle for maintaining such information in GSM wireless networks. In this section, we give two examples of GSM MAP signaling flows. Non-GSM wireless networks have entirely analogous functionality, although the details differ.

When a handset powers on, it must register with the network. The registration process, which we now describe, is illustrated in Figure 8.6. The current serving MSC tells the HLR that it "sees" the handset (i.e., that the handset is talking to one of the radio towers that said MSC subtends). This takes the form of a MAP Update Location Request. At this point, the MSC knows little or nothing about the subscriber but has gleaned unique identifiers from its dialog with the handset and populated the update location request therewith. Upon receipt of this request, the HLR looks up the subscriber's data (using the IDs previously mentioned) and returns the result via a MAP Insert Subscriber Data message. Once the MSC acknowledges receipt of the subscriber data, the HLR acknowledges that the Update Location request has been successfully completed. By this time, the HLR has updated the subscriber's record with the identity of the serving MSC.

The Insert Subscriber Data message tells the MSC which services the customer has subscribed to, among other things. The MSC needs to store this information somewhere; it does so in another database called a visiting location register (VLR). (In the abstract, MSC and VLR are functionally separate. However, the two functions are usually integrated into a single network element; we assume that this is the case in our diagrams.)

Figure 8.6 GSM location updating procedure (MAP messages between the serving MSC/VLR, SSN = 7, and the HLR, SSN = 8, possibly via a GTT node: Update location; Insert subscriber data; Insert subscriber data ACK; Update location ACK).

At the TCAP and MAP layers, only the MSC/VLR and HLR are involved in the location updating procedure we have just described. Additional SS7 nodes (such as STPs) are involved at the MTP and SCCP layers, however. In SCCP terms, the VLR and HLR are subsystems; the SSN is always 7 for VLR and 8 for HLR. Thus the
SCCP headers for the Update Location and Insert Subscriber Data ACK messages will have their destination SSNs set to 8. The destination SSNs for the other two messages in Figure 8.6 will, by the same token, be 7. GTT may be required somewhere along the way; this is reflected in the figure by the presence of a "GTT node" (perhaps this is an STP). If this is a roaming scenario (i.e., the subscriber is not attached to the network of his/her chosen carrier), then the serving MSC and HLR are in fact in different carriers' networks. In this case, the serving MSC almost certainly does not know the point code of the subscriber's HLR but simply knows the point code of a GTT node that can correctly forward its MAP messages.

In Figure 8.6, we have omitted certain details in the interest of simplicity. Let us note here that the HLR validates the subscriber before sending the Insert Subscriber Data message. Moreover, when a subscriber moves to a new MSC/VLR, there is an additional step: the HLR must inform the previous MSC/VLR that the subscriber has left its area. The subscriber data will be purged from the previous MSC/VLR as a result. (If the subscriber is new or has not connected to the network in a long time, no MSC/VLR will have a copy of the subscriber's data; essentially, there is no previous MSC/VLR in this case.)

At this point, our subscriber has registered with the network. Now suppose a call is placed to this subscriber. We now describe the signaling flow that appears in Figure 8.7. In the figure, the nodes labeled "Gateway MSC" and "Serving MSC/VLR" inhabit the same carrier's network. Assume for the moment that the gateway MSC (GMSC) is the originating switch. In keeping with this assumption, let us pretend that the grayed-out ISUP IAM message at the far left is not present. The serving MSC/VLR maintains a bank of routable numbers (numbers that are routable to itself, that is).
In short, this is what happens: the serving MSC/VLR selects a number from the aforementioned bank and temporarily binds that number to the called subscriber. The serving MSC/VLR informs the gateway MSC of the routable number it has selected for this call. The gateway MSC then initiates an ISUP call flow using the number temporarily assigned by the serving MSC/VLR. Note that this process is transparent to the calling and called parties: they never see the phone number that is temporarily assigned by the MSC. The complicating factor in all of this is that the HLR must act as intermediary between the GMSC and the serving MSC/VLR. (As we have seen, the HLR knows which MSC is currently serving the subscriber.) When the call request comes in, the GMSC interrogates the HLR regarding the subscriber’s whereabouts in the form of
Figure 8.7 Signaling for GSM mobile terminated call (an optional incoming ISUP IAM arrives at the gateway MSC, which sends Send routing info to the HLR; the HLR exchanges Provide roaming number and Provide roaming number ACK with the serving MSC/VLR, then returns Send routing info ACK; the gateway MSC sends an ISUP IAM to the serving MSC/VLR, and the ISUP call flow continues).
a MAP Send Routing Info message. Via an interchange with the serving MSC/VLR (MAP Provide Roaming Number and Provide Roaming Number ACK messages), the HLR obtains the aforementioned routable number, which it then forwards to the GMSC in its MAP Send Routing Info ACK response. Note that the serving MSC/VLR has a limited supply of roaming numbers and therefore does not bind a roaming number to the called subscriber until it receives the Provide Roaming Number request from the HLR. The GMSC can now send an ISUP IAM with the roaming number as the called party address. The ISUP call flow continues, just as we saw in Chapter 6 (see Figure 6.8). Note that the IAM and subsequent ISUP messages do not pass through the HLR. Global title translation may take place in the routing of MAP messages. However, we have omitted this detail from the representation in Figure 8.7. In our description, we essentially assumed that the calling and called parties are subscribers for the same wireless carrier and are attached to that carrier’s network. Moreover, we assumed that the calling party’s serving switch is capable of interrogating the HLR. (This capability, which is called gateway functionality, is not universal among GSM MSCs. It is common, however, so the scenario we just described is a reasonable one.) In this scenario, the GMSC collects dialed digits from the caller’s handset, which leads to the Send Routing Info message shown in the figure, and so on. Note that all of the signaling depicted in Figure 8.6, along with the MAP dialog that appears in Figure 8.7, must take place before the GMSC and serving MSC/VLR can interchange ISUP call-control messages. Now suppose that the calling and called parties are attached to different networks. Then the calling party’s network sends an ISUP IAM to a designated switch in the called party’s network (hence the name gateway MSC). 
This is the grayed-out IAM that appears at the far left of the figure; receipt of this ISUP message causes the gateway MSC to launch its HLR query. In this case, an ISUP call flow is initiated by a switch in another carrier’s network; when it reaches the called party’s network, the ISUP entity must wait while the MAP entity obtains a roaming number. In Figures 8.6 and 8.7, we do not show any signaling between the handset and the MSC. Such signaling does take place but note that it is not SS7 signaling. The SS7 network does not extend to the telephone (regardless of whether it is a wireless handset or a landline telephone).
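The serving MSC/VLR's roaming-number handling might be sketched as follows. The class, the pool contents, and the use of an IMSI as the subscriber key are our own illustrative choices; the deferred, temporary binding from a limited pool is the behavior described above.

```python
# Illustrative sketch of a serving MSC/VLR's roaming-number pool: a number
# is drawn only when Provide Roaming Number arrives, bound temporarily to
# the called subscriber, and later released. Numbers are invented.

class ServingMSC:
    def __init__(self, pool):
        self.pool = list(pool)      # limited bank of routable numbers
        self.bindings = {}          # roaming number -> subscriber identity

    def provide_roaming_number(self, subscriber):
        """Handle the HLR's Provide Roaming Number request."""
        if not self.pool:
            raise RuntimeError("roaming number pool exhausted")
        number = self.pool.pop()
        self.bindings[number] = subscriber  # temporary binding for this call
        return number                       # returned to the HLR, then the GMSC

    def release(self, number):
        """Free the number once the call setup completes or fails."""
        self.bindings.pop(number, None)
        self.pool.append(number)
```

The pool's limited size is exactly why the binding is deferred until the Provide Roaming Number request arrives, and why it must be released promptly afterward.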
8.8
Summing Up

SS7 involves much more than basic call control. Much of this chapter is concerned with the following question: What happens when a switch wants to complete a call but does not have a routable number? The glib answer is that the switch must query a database. This raises a second question: How do we find the appropriate database? The answer to the second question turned out to be long-winded; along the way, we found that SS7 routing functionality suffers from a lack of uniformity. For many services (including toll-free service), GTT is the answer. GTT, which is the responsibility of the SCCP layer, is therefore a crucial part of SS7’s routing capability. It is ungainly that this routing capability is split between the MTP3 and SCCP layers. For example, we pointed out that routing tables must be maintained at both layers. Moreover, SS7 routing tables must be manually provisioned.
In a sense, MTP3 and SCCP are not the whole story of SS7 routing. More specifically, MTP3 and SCCP are not enough when it comes to mobile terminated calls. In mobile networks, the game of “find the database” often boils down to “find the HLR.” We presented two examples that involved finding the HLR. For the first sample signaling flow (the update location example of Figure 8.6), we said that MTP3, assisted by GTT at the SCCP layer, was adequate. But in the second flow (the mobile terminated call of Figure 8.7), it was necessary for the GMSC to query the HLR. This raises the following question: What is the essential difference between the two flows? The obvious difference is that the first flow is composed entirely of MAP messages, whereas the second call flow also includes ISUP call control messages. For MAP, the protocol stack includes SCCP. ISUP runs directly over MTP3. Thus GTT is possible in the first flow, but the GMSC cannot use GTT to “directly” forward the incoming ISUP message in the second flow. From this point of view, we can think of the ensuing MAP signaling flow as a sort of glorified global title translation. Continuing with this train of thought, one might wonder why the calling party’s serving switch does not query the HLR to obtain a routable number before launching an ISUP IAM toward the called party’s network. This is because switches in external networks may not know how to query an HLR. A secondary reason is that a carrier may not want to expose its roaming numbers to the scrutiny of outside parties. In both cases, it is a matter of maintaining transparency—switches in external networks do not have to know whether they are calling wireless subscribers. In wireline networks, switches do not even know that HLRs exist.
For a mobile-to-mobile call, a similar problem exists when the calling and called parties employ different technologies (e.g., a CDMA subscriber cannot query a GSM HLR without some sort of translation because the syntaxes are different). Another sort of translation goes on, although we have not emphasized it up to this point. In Figure 8.7, an ISUP IAM enters the GMSC with one called party number (the called party’s “permanent phone number”) and is forwarded by the GMSC with a different called party number (namely, the roaming number supplied by the called party’s serving MSC/VLR). This translation must be reversed by the GMSC when it receives an ISUP ACM and subsequently an ISUP ANM from the serving MSC/VLR. This translation is not a GTT, because ISUP runs directly over MTP3. (Recall that the roaming number is not exposed to external entities. The reader may want to refer back to the ISUP call flow in Figure 6.8, since the ISUP portion is truncated in the diagram for our mobile terminated scenario.) We note here that network address translation (described in Section 7.3.1) bears a strong resemblance to the type of translation discussed in this paragraph. 8.8.1
Additional Weaknesses of SS7
SS7’s approach to routing is not ideal—we have “beaten that horse to death” over the last several paragraphs. In addition, SS7 was devised for use with low-speed links (and links with potentially high bit error rates to boot). Much of the complexity of MTP2 and MTP3 exists because SS7 needed to operate reliably in the presence of these limitations. But that complexity may not be warranted in deployments that enjoy high-speed transmission and low bit error rates.
8.8.2
Strengths of SS7
One SS7 link can carry the signaling traffic for many voice channels. This was one of the initial motivations for out-of-band signaling, but it also meant that SS7 link outages would have major detrimental effects. Thus the SS7 protocol stack provides for redundancy. Using the protocol stack’s built-in redundancy features, SS7 networks are typically engineered to an extremely high degree of reliability. SS7 is deployed in a physically secure way. That is, the SS7 network extends only to voice switches, STPs, and SCPs; all of these network elements are physically under lock and key. (How can you hack a network that you cannot touch?) Moreover, SS7 is a stable and mature technology (“it works”). Lastly, SS7 is extremely flexible.
References

[1] Recommendation Q.700, Introduction to CCITT Signalling System No. 7, ITU-T, March 1993.
[2] Manterfield, R., Telecommunications Signalling, Revised ed., IEE Publications, February 1999.
[3] van Bosse, J. G., Signaling in Telecommunication Networks, New York, London: John Wiley and Sons, January 1997.
[4] Russell, T., Signaling System No. 7, 4th ed., New York: McGraw-Hill, June 2002.
CHAPTER 9
The Bearer Plane

This chapter features a brief discussion of voice-encoding schemes. A variety of encoding techniques are available now that did not exist when the design principles for circuit-switched networks were formulated. As a result, it is not easy to incorporate voice-encoding innovations into today’s telephone networks. This is one of the motivations for migrating to packet telephony. Having set the stage, we begin our discussion of Voice over IP in earnest. That discussion will continue in subsequent chapters.
9.1
Voice Encoding

Today’s voice network is digital. That is, voice signals are not transmitted between switches as continuously varying waveforms. Instead, for each active call, the transmitting switch periodically sends a string of 0’s and 1’s. (We confess that this is not, strictly speaking, quite accurate: the 0’s and 1’s themselves are encoded for transmission across the physical medium as waveforms. But voice signals are represented by strings of 0’s and 1’s; switches do not attempt to transmit the original voice waveforms directly.) A scheme for converting analog waveforms to digital format (and for converting back to analog at the receiving end of the resulting digital transmission) is often called a codec (an elision of the words “coder” and “decoder”). The digitizing process necessarily results in some loss of information—the recreated signal at the receiving end is not exactly the same as the input signal at the transmitting end.

9.1.1
G.711
Let us look at an example. For the G.711 codec [1], the sampling rate is 8,000 Hz. This means that the encoding device “polls” the analog signal 8,000 times per second and produces a chunk of digital information at each polling epoch. (G.711, an example of a pulse code modulation scheme, is by far the most common voice-encoding method. While certainly not the first codec to be developed, G.711 was the first to see widespread use outside of military applications.) To understand why a sampling rate of 8,000 Hz was chosen, one needs to know that most of the energy in conversational voice signals falls in the frequency band below 4,000 cycles per second, or Hz. So, for purposes of understanding what the person on the other end of the phone line is saying, loss of information at frequencies above 4,000 Hz is tolerable; we say that voice is (essentially) band-limited. The
Nyquist theorem says that all of the information in a band-limited signal can be recovered from samples in discrete time, as long as the sampling rate is at least twice the maximum frequency found in the original (continuously varying) signal. Thus in the case of conversational voice, a sampling rate of 2*4,000 = 8,000 Hz is sufficient. The digitizing process entails loss of information in another way. Each sample involves measurement and quantization of a voltage. Returning to the case of G.711, each sample is represented by an 8-bit field, meaning that there are only 2^8 = 256 possible values; each of these values has a range of voltage measurements assigned to it. This assignment of a range of measurements to a single value is called quantization. (Note that 8 bits/sample times 8,000 samples/sec yields 64 kbit/s, the bit rate first mentioned in Section 3.2.2.)

9.1.2
Why Digital?
Given the fact that information is lost in the digitizing process, why bother to digitize voice signals in the first place? One answer is that error detection and correction are possible in the digital realm, whereas they are nearly impossible for analog signals. For error detection and correction, some level of redundancy (in the form of extra bits) is added to the payload. The particulars vary from one error-handling scheme to another; cyclic redundancy check is a widespread approach for detecting corrupted speech frames. Mobile wireless communications typically employ forward error correction (e.g., convolutional or block codes) in conjunction with interleaving to overcome signal fades. Digital signals can also be regenerated. Transmission through physical media is imperfect; in particular, the waveforms representing the information being transmitted attenuate over distance. Regeneration is the process of taking an input signal that has begun to degrade (but is still intelligible) and producing a robust copy of the signal. A digital signal is just a bit stream and is therefore intelligible as long as the receiving device can distinguish 0’s from 1’s. A regenerator that receives a degraded but still intelligible signal can transmit a “clean” copy of the same bit stream. Another advantage of digital signals is that they can be encrypted. For example, signals from wireless phones are ciphered to prevent eavesdropping. 9.1.3
Other Voice-Encoding Schemes
In his survey article [2], Cox enumerates the following key attributes of voice-encoding schemes: bit rate, delay, implementation complexity, and quality. G.711 provides good voice quality (at least when transmitted over media with low bit error rates) and is not very complicated. But in this era of cheap and plentiful processing power and memory, G.711 has a higher bit rate than it needs to; therefore it is not ideal for circumstances in which transmission bandwidth is at a premium. Because of its high voice quality and widespread use, G.711 is a reference point against which other codecs are compared. G.711 takes advantage of the fact that voice is band-limited (recall our earlier comments about sampling rate). But this is the only aspect of conversational voice that G.711 takes into account; for developers of fax machines and modems, this
turned out to be an advantage. G.711 performs a straightforward discretization of the input waveform. Codecs that operate in this fashion are known as waveform coding schemes. Can we lower the bit rate and still remain in the relatively simple realm of waveform coders? To a degree, it is possible to do so while retaining good voice quality. The G.721 scheme [3] encodes differences between successive samples (rather than encoding the samples themselves, as in G.711) and employs some quantizing tricks to reduce the bit rate to 32 kbit/s. Recommendation G.726 [4] specifies an enhanced version of G.721’s 32 kbit/s scheme and also introduces 16 kbit/s, 24 kbit/s and 40 kbit/s variants. Waveform codecs do not perform well at low bit rates; 16 kbps is probably “pushing the envelope.” Source coding schemes try to exploit the characteristics of the human voice (via modeling of the human vocal tract) in search of increased efficiency. To emphasize this point, such schemes are often called vocoders (for “voice encoders”). Some vocoders also exploit certain limitations of the human auditory system (for example, phase distortion is not easily detected by the human ear). Output from early vocoders sounds artificial (although they can intelligibly reproduce speech at very low bit rates). Hybrid coding schemes, which employ a degree of waveform matching in the context of speech production models, were introduced to address this shortcoming. There are many vocoders; due to space limitations, we will only look at a few. (We will use the word “vocoders” as a blanket term encompassing source and hybrid schemes. Note, however, that there seems to be some variation in usage.) Vocoders share the following feature: encoder and decoder both “think” in terms of the same mathematical model; the encoder sends parameter values associated with the model to the decoder. Many of the vocoders in common use today employ code excited linear predictive (CELP) schemes. 
Now we briefly explain the meanings of the terms that make up this acronym. Such a vocoder uses a linear predictive coding filter to model the vocal tract; the vocal tract is made up of the tongue, the teeth, the oral cavity itself, and so on. The term linear filter has a rigorous mathematical machinery associated with it. For our purposes, the filter is simply the thing that converts input (puffs of air traveling up from the lungs through the larynx) to output (the speech utterance itself). We say that the input signal excites the linear filter. The mathematical details are beyond our scope; suffice to say that filtering is a big part of what digital signal processing hardware does. Although much of the requisite mathematics has been around for a long time, the digital signal processing “muscle” to implement sophisticated codecs has not. For each sound, the encoder needs to observe (and communicate to the decoder) whether the input to the vocal tract is voiced (i.e., the vocal cords are actively engaged) or unvoiced. As an example, the initial consonant is voiced in the word “vocal” and is unvoiced in the word “focal.” Actually, source coders distinguish between voiced and unvoiced sounds but do not further characterize the filter input. Researchers realized that this was not enough to reproduce speech in a natural-sounding manner and set about defining a catalog of input signals called a codebook. Each codebook entry is called a code vector because it specifies multiple characteristics of the input signal. Since encoder and decoder must each maintain a copy of the codebook, we see that a CELP vocoder has nontrivial memory requirements.
Like waveform encoders, CELP encoders typically sample speech at 8,000 Hz (or, in other words, sampling is performed at intervals of 125 µsec). A CELP encoder will then group the PCM samples into blocks of fixed length. For each block, the encoder determines which code vector (among all of the codebook entries) most closely reproduces the input waveform. The encoder transmits the index of this code vector, along with a set of linear filter parameters and a gain factor, to the far end. The far end produces an output signal in accordance with these parameters. G.728
The key design goal for G.728 [5], an early CELP vocoder, was to minimize delay while achieving a moderately low bit rate (16 kbit/s). G.728 groups samples into blocks of five. Therefore it only takes 625 µsec to accumulate a block. Moreover, there is no look-ahead. The codebook has 2^10 = 1,024 entries, so a 10-bit field is adequate to uniquely specify a code vector. Although this codec produces good voice quality, it is expensive to implement because its processing requirements are demanding.

G.723.1 and G.729
In the mid-to-late 1990s, numerous codecs were standardized. The G.723.1 vocoder [6] can operate at two rates (5.3 kbit/s or 6.3 kbit/s); the rate can be changed during a session. Samples are grouped into blocks of 240, and a look-ahead of 7.5 msec is used. So G.723.1 has an inherent delay of 240 * 125 µsec + 7.5 msec = 37.5 msec. The delay inherent in an encoding scheme is known as algorithmic delay. The G.729 vocoder [7] operates at 8 kbit/s. Samples are grouped into blocks of 80 and there is a 5 msec look-ahead, resulting in an algorithmic delay of 15 msec. The complexity of this codec is a major shortcoming. This led to the specification of G.729 Annex A [8], which introduces some simplifications. The trade-off is that there is a slight reduction in voice quality. Further information on these codecs can be found in [9, 10]. G.723.1 and G.729 have silence suppression features (specified in [11, 12], respectively; see also [13] for information on the latter). Later annexes to the G.729 standard introduced vocoders with different bit rates than the original 8 kbit/s. The GSM Adaptive Multirate Family of Vocoders
The adaptive multirate (AMR) codec (see [14] and references therein) is an intriguing example. For each GSM voice call, a fixed-rate voice channel is allocated; this is effectively an 11.4 kbit/s channel (the so-called half rate case) or a 22.8 kbit/s channel (the full rate case). Transmission between handset and radio tower chronically suffers from high bit error rates; GSM’s error correction scheme compensates for this. If radio conditions are particularly bad, however, GSM’s default level of error correction is not sufficient to keep the voice signal from “breaking up.” This is the motivation for AMR.
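The half rate arithmetic behind this motivation can be sketched as follows. The channel rate is fixed, so a lower speech rate leaves more room for error protection. The 5.9 kbit/s figure is one of the intermediate AMR modes (see [14]); fixed per-channel overhead is deliberately ignored here:

```python
# Half-rate AMR channel budget: whatever the speech coder does not use
# of the 11.4 kbit/s channel is available for error protection.
# Fixed per-channel overhead is not modeled in this sketch.

CHANNEL_KBITS = 11.4

for codec_rate in (7.4, 5.9, 4.75):
    protection = CHANNEL_KBITS - codec_rate
    print(f"{codec_rate} kbit/s speech -> {protection:.2f} kbit/s for FEC + overhead")
```

The trade is voice quality against robustness: the 4.75 kbit/s mode sounds less natural but leaves the most room for error correction.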
AMR can operate at a variety of bit rates ranging from 4.75 kbit/s to 7.95 kbit/s in the half rate case or 4.75 to 12.2 kbit/s in the full rate case. The AMR source can change its bit rate at 20-msec intervals. Let us concentrate on the half rate case. When the codec is operating at 7.4 kbit/s, the remainder of the 11.4 kbit/s “channel” is consumed by necessary overhead. When the codec is operating at 4.75 kbit/s, what happens to the extra channel bandwidth? It is used for additional error correction. The idea is this: in poor radio conditions, it is better to use a low bit rate scheme; although its output is less natural-sounding, it is still intelligible. By freeing some of the channel capacity for error correction, we hope that the encoded voice signals will consistently make it to the far end intact.

Further Investigation
In this section, we barely touched on a vast subject area. The reader can find much more information in speech processing texts (e.g., [15]) and/or filter theory texts (e.g., [16]). We have not talked about evaluation of speech quality—how does one quantify the performance of a vocoder or the perceptual effects of latency? This is another large subject; Hardy’s recent book [17] gives broad coverage.
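Before leaving this section, the codecs discussed above can be recapped in a few lines. The bit rates and look-ahead figures are the ones quoted in the text (for G.723.1 we show only the 6.3 kbit/s rate); G.711 is treated as encoding sample by sample, so its "block" is a single sample:

```python
# Recap of codecs from this section: bit rate and algorithmic delay
# (block accumulation time + look-ahead), at 8,000 Hz sampling.

SAMPLE_MS = 0.125  # 125 µs per sample

codecs = {
    # name: (bit rate kbit/s, samples per block, look-ahead ms)
    "G.711":   (64.0,   1, 0.0),
    "G.728":   (16.0,   5, 0.0),
    "G.729":   (8.0,   80, 5.0),
    "G.723.1": (6.3,  240, 7.5),
}

for name, (kbps, block, lookahead_ms) in codecs.items():
    delay = block * SAMPLE_MS + lookahead_ms
    print(f"{name}: {kbps} kbit/s, {delay:g} ms algorithmic delay")
```

The pattern is visible at a glance: lower bit rates have generally been bought with longer block lengths (and hence more algorithmic delay) and more processing.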
9.2
Bearer Interworking

9.2.1
Transcoding
Whenever participants on a call use different codecs, transcoding (that is, translation between codec formats) must be performed somewhere along the bearer path. For example, when a landline telco subscriber is connected to a wireless customer, transcoding must occur between the landline codec (usually G.711) and the wireless-specific codec. Even for mobile-to-mobile calls involving more than one MSC, G.711 is almost always used on the inter-MSC “legs” of the bearer paths. This is because today’s MSCs are based on landline circuit switches, whose interswitch trunking is based in turn on the 64 kbit/s “quantum.” Each transcoding instance entails a loss of quality and incurs some delay. Note, however, that transcoding to and from G.711 simplifies things in a way: to connect mobile subscribers using different codecs (e.g., because their carriers have deployed incompatible wireless technologies), no additional bearer-plane functionality is required. The two MSCs both “speak” G.711 (and they already transcode to and from this common codec as a matter of course). In particular, an MSC in one carrier’s network does not have to know which codecs are supported by its counterpart in the other carrier’s network. 9.2.2
Encapsulation of Digitized Sound
Now that we have digitized voice, what happens next? At ingress to the packet voice domain, the encoded voice is encapsulated and transmitted as a stream of packets. The encapsulation must adhere to a defined format so that the device at the far end will know how to feed the digitized signal to the voice decoder.
In traditional circuit-switched networking, voiceband transmission is voiceband transmission. That is, we do not have to afford special treatment to fax signals, modem signals or dual tone multifrequency (DTMF) digits. (For readers familiar with the marketing term “touch tone” but not DTMF, these are two names for the same thing.) The developers of fax and modem technology took advantage of the fact that waveform codecs (such as G.711) really are not specific to voice. Data is encoded as an analog signal using voiceband tones (i.e., tones whose frequencies are less than 4,000 Hz). The analog signal is, in turn, passed through a G.711 encoder. The latter has no idea it is digitizing data rather than voice, and the relationship between the bits that are encoded in an analog modem signal and those representing the resulting G.711 samples is oblique. But to the fax machine or modem at the far end, the reconstituted analog signal emanating from the G.711 decoder is an intelligible data stream. Note that the conversion from analog to digital and back to analog is a lossy process; therefore it matters how many times this translation takes place. (This is one reason why modem performance varies from place to place: for telephone customers who are not served by so-called subscriber loop carriers, this conversion normally takes place only once; in the presence of a subscriber loop carrier, it takes place a second time. The second conversion does not perceptibly affect speech quality, but it has a marked effect on modem performance.) DTMF transmission is similar to fax and modem transmission, except that speed is not an issue. That is, modems and fax machines try to “pump bits” through voice channels, and throughput is important. Modems have gotten faster over the years as modulation schemes have improved. DTMF applications (such as entering PINs for credit card calls) came about because telephone sets have traditionally lacked a more sophisticated signaling capability. 
These applications needed some cost effective way to obtain information from subscribers; for example, it would be enormously expensive to have each PIN verification processed by an operator. Thus DTMF sensors enable an array of services that otherwise would not be cost effective. Special-purpose encapsulation formats for fax, modem, and/or DTMF signals may be necessary as vocoders see increasingly widespread deployment. This is ironic, given that fax, modem, and DTMF signals would never have been prevalent if it had been easy to send packets in the first place. Now that we are evolving to packet telephony, why do we have to accommodate all of this legacy stuff? The reality is that packet telephony will phase in very slowly, and it does not make good economic sense for carriers to dump workable technology that still generates revenue. Likewise, corporate subscribers may have substantial investments in older technology (e.g., back-end systems that can be controlled by DTMF signaling) that they are not ready to write off. Interactive voice recognition (IVR) systems may also be affected by the emergence of new codecs. For complex DTMF-driven applications, navigating through the menus can be frustrating, to say the least. Many companies (such as airlines) now employ voice-driven systems. Early generations of IVR technology are not much of an improvement on their DTMF forerunners. But this may change substantially as the intelligence and accuracy of such systems evolves. Most IVR systems expect G.711-encoded voice as input. Of course, one can transcode to G.711, but IVR performance may vary with the “original” codec.
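Returning to DTMF for a moment: each digit is simply the sum of two voiceband sinusoids, one "row" frequency and one "column" frequency, which is exactly why DTMF passes transparently through a waveform codec such as G.711. A generation sketch, using the standard frequency pairs for a few digits:

```python
# A DTMF digit is the sum of two voiceband sinusoids (both well under
# 4,000 Hz), so it survives any waveform codec such as G.711.
import math

DTMF = {"1": (697, 1209), "2": (697, 1336), "5": (770, 1336), "0": (941, 1336)}

def dtmf_samples(digit, duration_s=0.05, rate_hz=8000):
    low, high = DTMF[digit]
    return [
        0.5 * math.sin(2 * math.pi * low * n / rate_hz)
        + 0.5 * math.sin(2 * math.pi * high * n / rate_hz)
        for n in range(int(duration_s * rate_hz))
    ]

samples = dtmf_samples("5")
print(len(samples))  # 400 samples: 50 ms at 8,000 Hz
```

A low-rate vocoder, by contrast, may mangle these tone pairs beyond recognition—hence the need for the special-purpose DTMF payload format mentioned later in this chapter.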
9.2.3
Packetization Delay and Playout Buffers
Two sources of delay associated with packet telephony are worth mentioning here. Bearer traffic must be packetized at ingress to the voice-over-packet domain; this process is the first source of delay. It would be very inefficient to transmit a packet every time a “quantum” of information was received from the encoder (either directly or via transmission through the circuit-switched domain). If, for example, each encoded block emanating from a vocoder were to be packetized separately, the payload would be dwarfed by the packet headers. If the packetizer is willing to wait, more payload will stream in from the encoder (samples in the case of a waveform coder; block encodings in the case of a vocoder) and the payload-to-overhead ratio will improve. Thus there is a fundamental trade-off between throughput efficiency and delay. In many situations, it may not be worthwhile to deploy a vocoder with low algorithmic delay (such as G.728). The packetization delay may be such that one is unable to reap the low-delay benefits of the vocoder. The number of encoded blocks per packet is often a configurable parameter; in such cases, we can think of packetization delay as an adjustable thing. The second source of delay resides at egress. Packet-switched networks typically suffer from higher jitter than circuit-switched networks: if packets are dispatched across a packet-switched domain at regular intervals, they may not reach the far end at regular intervals. Therefore, playout buffers are usually present at egress points from the packet-switched domain; these buffers “smooth out” jitter but at the cost of additional delay. Buffer sizing (as well as the attendant delay) depends on the severity of jitter. When a PC user streams an audio clip from a Web site, there is a significant delay before the playout commences. The media player is waiting for its buffer occupancy to reach a threshold level; in our experience, this is a fairly extreme example of playout buffer delay.
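The packetization trade-off is easy to quantify. The sketch below assumes G.729 (10-ms, 10-byte frames, since 8 kbit/s × 10 ms = 80 bits) carried over IPv4 (20-byte header) plus UDP (8) plus RTP (12, no CSRCs), and adds the codec's 15-ms algorithmic delay; transmission, propagation, and playout-buffer delays are ignored:

```python
# Coding + packetization delay and payload efficiency for G.729,
# assuming IPv4 (20) + UDP (8) + RTP (12, no CSRCs) headers.

SAMPLE_INTERVAL_MS = 0.125                 # 8,000 Hz sampling
G729_BLOCK_SAMPLES, G729_LOOKAHEAD_MS = 80, 5.0
G729_FRAME_MS, G729_FRAME_BYTES = 10, 10   # 8 kbit/s * 10 ms = 80 bits
HEADER_BYTES = 20 + 8 + 12

algorithmic_ms = G729_BLOCK_SAMPLES * SAMPLE_INTERVAL_MS + G729_LOOKAHEAD_MS

for frames_per_packet in (1, 2, 4):
    packetization_ms = frames_per_packet * G729_FRAME_MS
    payload = frames_per_packet * G729_FRAME_BYTES
    efficiency = payload / (payload + HEADER_BYTES)
    print(f"{frames_per_packet} frame(s)/packet: "
          f"{algorithmic_ms + packetization_ms:.0f} ms delay, "
          f"{efficiency:.0%} of bits are payload")
```

With one frame per packet, only a fifth of the transmitted bits are voice; quadrupling the frames per packet halves the overhead share but adds 30 ms of delay, which illustrates why the blocks-per-packet parameter matters.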
9.3
Voice over IP

Voice can be borne by a variety of packet technologies, including ATM and Frame Relay; there is nothing sacrosanct about IP in this regard, and our exposition to this point applies to packet telephony in general. Since the clear industry direction for packet voice is VoIP, we now turn our attention to its particulars. Note that we do present a limited discussion of Voice over ATM in Section A.3. We point out that, since ATM was designed from the beginning with real-time services in mind, “how to ‘do’ Voice over ATM” is quite well established. In contrast, carrier grade VoIP is still a work in progress. 9.3.1
Real-Time Services in IP Networks: RTP over UDP
“Voice over IP” is essentially an abbreviation for Voice over Real Time Transport Protocol (RTP) over UDP over IP. That is, encoded voice “frames” are encapsulated in RTP packets and then handed off to the UDP layer. RTP is not just for voice. It was also designed as a transport protocol for real-time video, and care was taken to “leave the door open” for other real-time applications to use RTP in the future.
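The encapsulation is concrete enough to sketch in code. Per RFC 3550, the fixed RTP header occupies just 12 octets in front of the voice payload; its fields are examined in detail below. A minimal Python parser for those fields (illustrative only—a real stack must also validate lengths, extensions, and profiles):

```python
# Parsing the fixed RTP header fields (RFC 3550) from raw bytes.
import struct

def parse_rtp_header(packet: bytes):
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header = {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "cc": b0 & 0x0F,             # number of CSRC identifiers
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,   # e.g., 18 for G.729 under RTP/AVP
        "sequence": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
    cc = header["cc"]
    header["csrc"] = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    return header

# Version 2, PT 18 (G.729), sequence 1, no CSRCs, made-up SSRC:
pkt = (bytes([0x80, 18, 0, 1])
       + (0).to_bytes(4, "big")
       + (0xDEADBEEF).to_bytes(4, "big"))
h = parse_rtp_header(pkt)
print(h["version"], h["payload_type"], h["sequence"])  # 2 18 1
```

Note how little of the packet is header: everything after these 12 octets (plus any CSRC identifiers) is encoded voice.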
RFC 3550
The base RTP specification, RFC 3550 [18], has limited scope. In keeping with modular design principles, this document defines functionality that is required by most if not all real-time applications. Details of specific applications are relegated to separate profile and payload format documents. We discuss a few examples later in this section. Along with RTP itself, RFC 3550 defines the RTP Control Protocol (RTCP). With RTCP, the distinction between the control and bearer planes gets a little blurry: RTCP provides rudimentary control capabilities and a means of identifying multimedia session participants. RTCP can also be used to monitor data delivery performance. The RTP/RTCP specification does not define any resource reservation functionality. Thus RTP/RTCP cannot, by itself, offer QoS guarantees. By design, RTP and RTCP are independent of the transport and network layers. So it is not necessary that RTP be carried over UDP (or even IP, for that matter). It is expedient to carry RTP over UDP, however; we have seen that TCP’s retransmission capabilities are ill-suited to real-time applications. In some situations, firewall issues may dictate the use of TCP (but the extra overhead in the TCP header is simply wasted in this case). The standard insists that RTP and RTCP packet streams be distinguishable at a lower layer (e.g., by assigning different UDP port numbers to the two protocols—the default port numbers for audio/videoconferencing applications are 5004 for RTP and 5005 for RTCP). The RTP header format is displayed in Figure 9.1. Up to and including the synchronization source (SSRC) identifier, all of the fields shown must always be present. The first field, V, indicates that the current protocol version number is 2. The P bit, if set, indicates that padding octet(s) follow the payload. If the X bit is set, it means that exactly one header extension follows the fixed header shown in the figure. 
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    ...contributing source (CSRC) identifier(s), if any...     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 9.1  RTP header.

The CC field tells how many contributing source (CSRC) identifiers follow the mandatory header fields. The CC field is 4 bits long and the value zero is allowed. The interpretation of the M, or marker, bit varies depending on the profile; we discuss profiles later. Since many codecs are in use nowadays, we need a way to identify the codec that produced the RTP payload. The 7-bit payload type (PT) field is used for this purpose
in audio, video, and multimedia applications. The sequence number increments by 1 for each packet sent. The contents of the timestamp field can be (but do not have to be) based on the Network Time Protocol [19] format. We will not cover the SSRC identifier and CSRC identifier fields, except to say that a session participant’s SSRC ID can change during the life of the session. Note that no length field appears in the RTP header; RTP payload length can be deduced from the UDP length field (or other lower-layer header) and other information (e.g., the CC field). The RTCP packet header is similar to that of RTP. RFC 3550 defines the following five RTCP packet types:

1. Sender reports (SRs) are used by session participants to send transmission and reception statistics.
2. Session participants that “listen” but do not send RTP packets use receiver reports (RRs) to send reception statistics.
3. Source description (SDES) packets can contain various items such as a user’s preferred display NAME or EMAIL address. There is only one mandatory SDES item: this is the canonical name, or CNAME. CNAMEs often take the form user@host and are important because they remain constant; SSRC identifiers are bound to them. If it becomes necessary to change a participant’s SSRC ID during the lifetime of a session, RTCP is used to establish a binding between the new SSRC ID and the participant’s CNAME.
4. A BYE packet is used to indicate that an entity wants to terminate its participation in a session.
5. APP packets are used for application-specific purposes.

Payload Formats and the RTP/AVP Profile
RFC 3551 [20] specifies the so-called RTP audio/video profile (RTP/AVP), which is specific to the domain of audio and videoconferencing. Among other things, RTP/AVP includes a default mapping of PT numbers to encoding schemes. For instance, the PT number for G.729 is 18. RTP/AVP also defines the RTP payload format for G.729: that is, it defines the manner in which the G.729 parameters should be arranged in encapsulating RTP packets. RTP/AVP provides PT and payload format specifications for other common audio and video codecs. (Note that some PT values are statically assigned, whereas others are left for the applications themselves to negotiate via signaling.) Payload formats for many codecs are specified in separate RFCs. Examples include [21] for AMR, [22] for comfort noise, and [23] for DTMF tones (as of this writing there is Internet draft activity that seeks to update or obsolete the current DTMF RFC). To understand the need for comfort noise, consider the following. Even during two-way conversations (let alone conference calls with many participants), participants do not speak all of the time. Conservative estimates indicate that participants are silent, on average, at least 60% of the time. Why transmit vocoder frames when there is no content? Actually, it is disconcerting for the listener to hear “talk bursts” interspersed with periods of complete silence. Therefore, so-called comfort noise
packets are transmitted during periods of speaker inactivity. This allows the decoder at the far end to produce background noise, which sounds far more natural to the listener. Comfort noise packets consume much less transmission bandwidth than speech packets, as fidelity is not important in reproducing background noise. Some voice codecs provide for discontinuous transmission with comfort noise generation whereas others do not. The intent of the comfort noise RTP payload format specification is to standardize this capability for use with codecs that do not offer built-in support.
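To put rough numbers on the savings, the following sketch (our own illustrative arithmetic, not figures from any specification) estimates the IP-layer bandwidth of a G.729 stream with 20-ms packets, with and without discontinuous transmission:

```python
# Back-of-envelope VoIP bandwidth estimate: G.729 at 8 kb/s, 20-ms
# packetization, 40 bytes of IPv4/UDP/RTP header overhead per packet,
# and the ~60% silence figure quoted above.

PAYLOAD_BITS_PER_SEC = 8_000        # G.729 coder output rate
PTIME_SEC = 0.020                   # 20-ms packets (two 10-ms frames each)
HEADER_BYTES = 20 + 8 + 12          # IPv4 + UDP + RTP headers

payload_bytes = PAYLOAD_BITS_PER_SEC * PTIME_SEC / 8   # 20 bytes per packet
packets_per_sec = 1 / PTIME_SEC                        # 50 packets per second
gross_bps = packets_per_sec * (payload_bytes + HEADER_BYTES) * 8

# With discontinuous transmission, speech packets flow only ~40% of the
# time; comfort noise packets are small and infrequent, so this rough
# estimate simply ignores them.
dtx_bps = 0.40 * gross_bps

print(gross_bps)   # 24000.0 bits/s
print(dtx_bps)     # 9600.0 bits/s
```

Note that the fixed 40 bytes of header overhead triple the nominal 8-kb/s coder rate; silence suppression claws much of that back.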
References

[1] Recommendation G.711, Pulse Code Modulation of Voice Frequencies, ITU-T, 1972.
[2] Cox, R. V., "Three New Speech Coders From the ITU Cover a Range of Applications," IEEE Communications Magazine, September 1997, pp. 40–47.
[3] Recommendation G.721, 32 kb/s Adaptive Differential Pulse Code Modulation (ADPCM), ITU-T, 1988.
[4] Recommendation G.726, 40, 32, 24, 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM), ITU-T, 1990.
[5] Recommendation G.728, Coding of Speech at 16 kbit/s Using Low-Delay Code Excited Linear Prediction, ITU-T, 1992.
[6] Recommendation G.723.1, Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbit/s, ITU-T, 1996.
[7] Recommendation G.729, Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP), ITU-T, 1996.
[8] Recommendation G.729, Annex A, Reduced Complexity 8 kbit/s CS-ACELP Speech Codec, ITU-T, 1996.
[9] Schroder, G., and M. H. Sherif, "The Road to G.729: ITU 8 kb/s Speech Coding Algorithm With Wireline Quality," IEEE Communications Magazine, September 1997, pp. 48–54.
[10] Salami, R., et al., "ITU-T G.729 Annex A: Reduced Complexity 8 kb/s CS-ACELP Codec for Digital Simultaneous Voice and Data," IEEE Communications Magazine, September 1997, pp. 56–63.
[11] Recommendation G.723.1, Annex A, Silence Suppression Scheme, ITU-T, 1996.
[12] Recommendation G.729, Annex B, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70, ITU-T, 1996.
[13] Benyassine, A., et al., "ITU-T Recommendation G.729 Annex B: A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to Recommendation V.70," IEEE Communications Magazine, September 1997, pp. 64–70.
[14] TS 26.071, AMR Speech Codec; General Description, 3GPP.
[15] Rabiner, L., and R. Schafer, Digital Processing of Speech Signals, Englewood Cliffs, NJ: Prentice-Hall, 1978.
[16] Haykin, S., Adaptive Filter Theory, 4th ed., Upper Saddle River, NJ: Pearson Education, 2001. [17] Hardy, W. C., VoIP Service Quality, New York: McGraw-Hill, 2003. [18] Schulzrinne, H., et al., RFC 3550, RTP: A Transport Protocol for Real-Time Applications, IETF, July 2003. [19] Mills, D., RFC 1305, Network Time Protocol (Version 3) Specification, Implementation and Analysis, IETF, March 1992. [20] Schulzrinne, H., and S. Casner, RFC 3551, RTP Profile for Audio and Video Conferences With Minimal Control, IETF, July 2003.
[21] Sjoberg, J., et al., RFC 3267, Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF, June 2002.
[22] Zopf, R., RFC 3389, Real-Time Transport Protocol (RTP) Payload for Comfort Noise (CN), IETF, September 2002.
[23] Schulzrinne, H., and S. Petrack, RFC 2833, RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals, IETF, May 2000.
CHAPTER 10
Media Gateway Control and Other Softswitch Topics

Recall the following softswitch terminology: bearer traffic enters/exits the fabric of a distributed switch via media gateways (MGs). Often the media gateways belonging to a single switch are geographically dispersed. Media gateway controllers (abbreviated hereafter as MGCs or controllers) direct the operation of media gateways. When a distributed switch receives a call setup request, it is the media gateway controller that must determine the identities of the ingress and egress media gateways. The media gateway controller then instructs these media gateways to set up a bearer path through the switch fabric. The two best-known protocols for this purpose are called MGCP and Megaco/H.248. (MGCP and Megaco are both abbreviations for Media Gateway Control Protocol.) Signaling gateways are the ingress/egress points for call control signaling.

Megaco/H.248 [1] and MGCP [2] are the primary topics in this chapter. Before addressing them, we take a look at generic media gateway control requirements. We also cover some preliminaries [notably Session Description Protocol (SDP)]. Then we discuss Megaco/H.248 (in considerable detail), followed by MGCP (in less detail). We finish the chapter with a few other softswitch topics. Even though we cover Megaco/H.248 before MGCP, it was historically the other way around: the IETF and ITU-T joined forces to develop a successor to MGCP, and Megaco/H.248 is the result. As part of the process, IETF's Megaco Working Group produced a requirements document [3]. Note that early versions of MGCP (as well as some precursor protocols that were subsumed by the development of MGCP) predate the requirements RFC. So the authors of the requirements RFC had the benefit of experience gained in the development of MGCP.
10.1
Requirements

The question "How should MGCs exert control over media gateways?" raises questions about the capabilities of the MGs themselves.

What Is a Media Gateway Supposed to Do?
In broad strokes, the distinctions between softswitch functional elements (media gateway controllers, media gateways, and signaling gateways) are clear. The details are not necessarily so clean-cut, however, and implementations may vary. This presents a challenge: on the one hand, any protocol for media gateway control must
make some assumptions about the division of functionality between controllers and media gateways. On the other hand, the protocol should not be prejudiced toward any particular implementation approach. Moreover, the sum of required media gateway functionalities across all plausible deployment scenarios is quite extensive. It would not be cost effective to require that every media gateway support every functionality; in many, if not most, situations, a proper subset would be perfectly adequate. The Megaco and MGCP RFCs each impose baseline requirements on media gateways and then relegate optional functionality to so-called packages. Controllers can query media gateways to determine which packages they support. Lastly, required media gateway functionality might be expected to evolve as new technologies and protocols emerge. The flexibility to define new packages affords a degree of extensibility.

The requirements document assumes that MGs possess the following capabilities:

• Support for maintenance functions, including establishment and maintenance of associations with controllers: The protocol must be flexible enough to support load sharing among controllers and robustness in the face of failure scenarios (i.e., fail-over capabilities). Although we do not discuss it in detail in this book, such functionality is crucial for achieving carrier-grade reliability.
• Connection management, including:
  • Capacity to allocate resources to connections (and to later deallocate those resources). The umbrella term "resources" is meant to include transcoders, voice recognition systems, and the functional components that play announcements.
  • Support for conferences as well as point-to-point connections.
• Ability to report status of resources to the MGC.
• Ability to detect events and apply signals. The degree to which this is necessary depends on the bearer types that a given MG terminates.
• Ability to recognize inband signaling and act accordingly. The degree to which this is necessary depends on the bearer types that a given MG terminates. The ability to detect DTMF digits is a very common example.

Events, signals, and inband signaling take a variety of forms. Moreover, the distinction between events and inband signaling is sometimes hazy. To clarify the terminology, we offer the following examples. Example events for analog lines are on-hook and off-hook transitions. Causing a phone at the far end of an analog line to ring is an example of applying a signal. Additional examples of signals include applying dial tone and playing announcements. Signals are typically generated by the MG. Unfortunately, the industry standard terminology in this area seems to suffer from a substantial degree of built-in ambiguity. When we speak of signaling traffic, we are often talking about messages that "live" in the packet domain (e.g., SS7 or ISDN call-control messages). DTMF tones on analog lines exemplify an altogether
different type of signaling. Where confusion would otherwise be possible, we will include the word message when referring to the former type of signaling traffic. In the case that a media gateway terminates a link carrying signaling messages, there are two choices (assuming that we have not incorporated signaling gateway functionality into our media gateway):

• Backhaul the signaling messages to a signaling gateway;
• Report signaling messages to a media gateway controller as events.
Other Protocol Requirements

The requirements document also says that the protocol must:

• Support bearer connections involving arbitrary combinations of TDM, analog, ATM, Frame Relay, and IP. Note that this includes TDM-TDM, TDM-analog, and analog-analog connections. Numerous requirements pertain to specific bearer types; we will not detail such requirements here.
• Allow the MGC to assign varying priorities to different connections.
• Incorporate a means of specifying QoS parameters on a per-connection basis.
• Offer a means for the MGC to specify QoS thresholds and for the MG to report threshold violations.
• Support a mechanism for the MG to report performance statistics and accounting information.
• Support a means for the MGC to specify which performance statistics and accounting information should be collected and reported by the MG.
• Be flexible in allocation of intelligence between MGC and MG.
Note that these are protocol requirements; this does not mean that every MG supports every aspect of the protocol. This list is by no means exhaustive. We hope it serves to illustrate the point that any robust protocol for media gateway control must support a complex array of functions. First of all, today's circuit switches are sophisticated beasts, and softswitches must be capable of duplicating their functionality. (Otherwise, telcos will stick with existing technology for a long time.) If softswitches are to function transparently in telco networks, then MGCs and MGs must conduct dialogs of wide scope. This in turn demands a versatile lingua franca. Moreover, softswitches add interworking between packet-switched and circuit-switched bearers to the mix.

10.1.1 ID Bindings
Softswitches must maintain bindings between identifiers in circuit-switched and packet-switched domains. The requirements RFC emphasizes this point more clearly for ATM bearers than for IP bearers (but it is fundamental in all cases). Suppose we have a softswitch that terminates ISUP trunks and has an IP-based fabric. ISUP trunks are distinguished by their circuit identification codes (CICs). In an IP network, an audio stream can be uniquely identified by an IP address and port
number. As calls come and go, the softswitch must keep track of bindings between CICs and (IP address, port number) pairs. These bindings, which are necessary to establish end-to-end bearer paths, are created and dissolved by media gateways at the behest of their controllers. For other softswitch configurations (e.g., an architecture that serves ISDN customers in the circuit-switched domain and employs an ATM fabric), the details vary but the principle is the same.
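The bookkeeping described above can be pictured as a small lookup table on the controller. The following sketch (class and method names are our own invention, not part of any protocol) tracks bindings between CICs and (IP address, port number) pairs:

```python
# Hypothetical controller-side table tracking bindings between ISUP
# circuit identification codes (CICs) and the RTP addresses chosen by
# a media gateway. Created at call setup, dissolved at teardown.

class BindingTable:
    def __init__(self):
        self._by_cic = {}                 # CIC -> (ip, port)

    def bind(self, cic, ip, port):
        self._by_cic[cic] = (ip, port)    # call setup

    def release(self, cic):
        return self._by_cic.pop(cic, None)  # call teardown

    def lookup(self, cic):
        return self._by_cic.get(cic)

table = BindingTable()
table.bind(cic=17, ip="192.168.0.50", port=4444)
print(table.lookup(17))   # ('192.168.0.50', 4444)
table.release(17)         # binding dissolved when the call ends
```

For an ATM fabric the values would be ATM-domain identifiers instead of (IP address, port number) pairs, but the table itself looks the same.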
10.2
SDP in Brief

As we saw in Chapter 9, a wide variety of voice-encoding schemes is now available. In the world of packet telephony, it is clear that vocoder selection must be negotiable; this is a major departure from the "hard-coded" approach of today's circuit-switched networks. Addresses in the packet domain must also be exchanged dynamically. Thus one needs a standard way to specify session parameters; SDP [4] was created for this purpose. Note that SDP's scope includes multimedia sessions (although our examples will only involve audio). Many protocols use SDP to specify session parameters (notably Megaco, MGCP, and Session Initiation Protocol; the latter is discussed at length in Chapters 11 and 12). Use of SDP is not always mandated, but it is predominant in current implementations. SDP is a text-based protocol. Some rudimentary "literacy" in SDP is necessary to understand sample Megaco messages that appear later in this chapter. In this section, we present just enough content so that the reader can "parse" those messages. To that end, we present sample SDP text below. More information on SDP appears in Chapter 12.

v=0
o=- 2890844526 2890842807 IN IP4 192.168.0.50
s=-
t=0 0
c=IN IP4 192.168.0.50
m=audio 4444 RTP/AVP 18
a=ptime:20
We are primarily interested in the last three lines of our SDP sample. The syntax of the "c=", or Connection Data, field is as follows:

c=<network type> <address type> <connection address>
It is easy to (correctly) guess that "IN" means Internet and "IP4" means that the last field on the line is an IPv4 address. For the last two lines of the SDP text, the reader may want to refer to supporting material in Chapter 9 and/or consult the AVP specification [5] directly. The syntax of the "m=", or Media Description, field is:

m=<media> <port> <transport> <fmt list>
The first two subfields are therefore self-explanatory. The next subfield says that encoded voice will be transported over RTP; moreover, packet contents will be interpreted according to the audio/video profile specification. Recalling that G.729
is AVP payload type 18, we see that the last subfield on this line specifies the codec. (During a negotiation, a list of codecs may appear on this line. Note also that, from the point of view of RTP payloads, G.729 and G.729A are indistinguishable. So either codec could be used.) The last line says that the “ptime”, or packet time, attribute is 20 milliseconds. (Here “a=” stands for “attribute equals.”) Since G.729 groups samples into blocks of 80, a new block comes along every 80 * 125 µsec, or 10 msec. Thus it would be possible to transmit a new RTP packet every 10 msec, or at any integer multiple thereof. With a ptime attribute of 20 msec, each RTP packet contains two block encodings. (In fact, this is the default.) We briefly describe the first four lines. The line “v=0” simply gives the SDP version number. For some mandatory fields, a “null” value is specified by a dash. There are two examples here: the username (the dash in “o=-” at the beginning of the second line) and the session name (the dash in “s=-” on the third line). The “o=” field is intended to serve as a globally unique session identifier; we omit further details. Had we wanted to specify start and stop times for our session, we would have entered nonzero values on the “t=” line.
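A minimal parse of the sample above can illustrate how the fields fit together. The sketch below (our own code; real SDP handling is considerably richer) extracts just the values a gateway would need:

```python
# Minimal parse of the sample SDP text from this section: split each
# line at "=", then pick apart the c=, m=, and a= values.

sdp = """\
v=0
o=- 2890844526 2890842807 IN IP4 192.168.0.50
s=-
t=0 0
c=IN IP4 192.168.0.50
m=audio 4444 RTP/AVP 18
a=ptime:20
"""

fields = {}
for line in sdp.splitlines():
    key, _, value = line.partition("=")
    fields.setdefault(key, []).append(value)

conn_addr = fields["c"][0].split()[2]             # connection address
media, port, transport, *fmts = fields["m"][0].split()
ptime = int(fields["a"][0].split(":")[1])         # packet time in ms

print(conn_addr, media, port, transport, fmts, ptime)
```

Running this yields the IPv4 address 192.168.0.50, media type "audio" on port 4444, transport "RTP/AVP" with format list ["18"] (G.729), and a ptime of 20 ms, matching the walkthrough above.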
10.3
Megaco/H.248

IETF and ITU-T have jointly developed a protocol for media gateway control. It is called Megaco by IETF and H.248 by ITU-T. Since the name "Megaco" is mnemonic, we will use it in preference to "H.248." Version 2 of Megaco [1] was approved by ITU-T in the spring of 2002 and remains in force at the time of this writing. The corresponding IETF document, which is to supplant RFC 3525 [6], has not reached RFC status; it seems to be stuck in a "holding pattern" as an Internet draft. The requirements document discussed in Section 10.1 was produced by IETF's Megaco Working Group. Megaco, which came after MGCP, was produced by the same working group in concert with ITU-T's Study Group 16. The Megaco specification is an outgrowth of the requirements process. IETF and ITU-T have jointly endorsed Megaco/H.248 as the standard for media gateway control. Thus Megaco is intended to supplant MGCP in the long run. Note that, since MGCP enjoyed a substantial "deployment head start," Megaco will not instantly predominate.

10.3.1 Introducing the Megaco Connection Model
To understand Megaco, one must first understand its connection model. The connection model is an abstraction of a media gateway’s resources. When a media gateway controller makes a request of a gateway (and the gateway answers), both devices are “thinking” in terms of the connection model. Of course, when a controller is talking to multiple gateways (e.g., in setting up an end-to-end call), it must make sure that the information is consistent. The Megaco connection model’s two central concepts are termination and context. Roughly speaking:
• Terminations are the "places" where media streams enter and/or exit media gateways.
• Contexts describe the bindings between terminations.
10.3.2
Terminations
In Figure 10.1, an end-to-end bearer path has been set up between subscribers in areas 1 and 2 (users 1 and 2, say). For the sake of discussion, let us assume that:

• The shaded area labeled "distributed fabric" is a VoIP domain that uses the G.729A codec.
• There is a circuit switch (although none is shown) between user 1 and media gateway A. The portion of the bearer path connecting area 1 to media gateway A is an ISUP trunk. Signaling between area 1 and the softswitch will reach the signaling gateway via an SS7 network. Voice is encoded on the ISUP trunk using G.711.
• User 2 accesses the network via a private branch exchange. The portion of the bearer path connecting area 2 to media gateway B is an ISDN line. ISDN signaling between area 2 and the softswitch will enter and exit the softswitch via media gateway B. Voice is encoded on the ISDN line using G.711.
Contexts
In our example, terminations T1 and T2 are associated with one another, as are terminations T3 and T4. In Megaco terms, these associations are embodied in contexts. Note that each context is local to a media gateway. Thus, in Figure 10.2, there are two contexts:
Figure 10.1 Megaco connection model: Terminations.
Figure 10.2 Megaco connection model: Contexts.
1. A context residing in media gateway A that associates termination T1 with termination T2.
2. A context residing in media gateway B that associates terminations T3 and T4.

Each context specifies the direction(s) of media flow among its terminations ("who hears/sees whom," or the topology of the context, in the words of Megaco's authors). Media mixing parameters, if necessary, are also part of the context specification. This may seem trivial for the simple two-way conversation of our example. For a broader perspective, consider an audio/video teleconference in which:

• A small population of "active" participants can speak. All active participants can see one another.
• A much larger population of "passive" participants can listen but cannot speak to the other participants.
• Among the passive participants, some have video-capable terminals whereas others do not.
Megaco's notion of a context is flexible enough to support a rich variety of conferencing scenarios.

10.3.4 Megaco Commands

Most Megaco commands can only be issued by MGCs: the controllers give the instructions, and the gateways carry them out. The two exceptions are the Notify command, which can only be issued by MGs, and the ServiceChange command, which can be issued by either an MG or an MGC. There are eight Megaco commands in all; they are listed in Table 10.1.
Table 10.1 Megaco Commands

Add: Adds a termination to a context.
Modify: Issued by MGC whenever it wishes to modify the properties, events, and/or signals of a termination.
Subtract: Removes a termination from its current context; returns statistics on the termination's participation in the context.
Move: Moves a termination to a different context.
AuditValue: Returns the current state of termination(s). Issued by MGC whenever it requires information about properties, events, signals, and/or statistics of termination(s).
AuditCapabilities: Issued by MGC when it wishes to ascertain which termination properties are supported by an MG.
Notify: Issued by the MG whenever it needs to inform the MGC that event(s) have occurred (e.g., off-hook for an analog line).
ServiceChange: Can be issued by MG or MGC to take termination(s) out of service (or return termination(s) to service). This command has other uses; we do not give a complete list here.
The Add, Modify, Subtract, and Notify commands are the "workhorses"; for many call flows, this set of four commands is sufficient. Note the lack of explicit commands for creating and destroying contexts. A context is created when the first termination is added and is deleted when the last termination is subtracted.

10.3.5 Example Call Flow
In Figures 10.1 and 10.2, a bearer path is already present. How did it get set up in the first place? We display a simplified signaling flow in Figure 10.3.
Figure 10.3 Simplified Megaco call flow.
Setting Up the Call
Let us suppose that user 1 originates the call. User 1’s serving switch sends an ISUP IAM to the softswitch. (Recall that ISUP is the predominant call-control protocol in SS7 networks. The reader may want to refer to Section 6.5.6’s brief discussion of ISUP.) Receipt of the IAM at the signaling gateway triggers the messaging exchange of Figure 10.3. (We assume that the signaling gateway, which is not shown in the figure, has alerted the media gateway controller that a call setup request has arrived from the SS7 domain.) Let us look at the messages in Figure 10.3 one at a time. The first Add command implicitly instructs MG A to create a context and tells MG A to place a specific ISUP trunk in that context. (That is, the controller will populate the Add command with the CIC from the incoming IAM.) The second Add command (which is also perched on the first arrow from the MGC to MG A in the figure) tells MG A to place an RTP termination in the same context and set its Mode to ReceiveOnly. Other than mode parameters of the Add and Modify commands, we have omitted all command parameters for simplicity. Typically, the controller would not request a specific (IP address, port number) pair, but would instead ask the MG to make the selection; we assume that this is the case here. MG A’s Reply will contain the address and port number that it selects. The controller then conducts a similar signaling interchange with MG B. An ISDN line and an RTP termination are added. (Creation of a context to hold these terminations is again implicit.) The second Add in this exchange is populated with the IP address and port number selected by MG A. That is why it is possible to go ahead and set the RTP termination’s mode to SendReceive. MG B’s Reply contains the IP address and port number of the RTP termination that it added; now the controller can forward this information to MG A and change that RTP termination’s mode to SendReceive. It does so by issuing a Modify command. 
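The implicit context handling in this flow can be sketched as follows. This is a toy model with invented class and method names, not an implementation of the protocol: a context is created by the first Add and deleted when the last termination is subtracted.

```python
# Toy model of a media gateway's context bookkeeping during the call
# flow of Figure 10.3. "$" in Megaco lets the MG choose an identifier;
# here, passing context_id=None plays that role.

import itertools

class MediaGateway:
    def __init__(self):
        self.contexts = {}              # context id -> set of terminations
        self._ids = itertools.count(1)

    def add(self, termination, context_id=None):
        if context_id is None:          # MG chooses: context created implicitly
            context_id = next(self._ids)
            self.contexts[context_id] = set()
        self.contexts[context_id].add(termination)
        return context_id

    def subtract(self, termination, context_id):
        ctx = self.contexts[context_id]
        ctx.discard(termination)
        if not ctx:                     # last termination subtracted:
            del self.contexts[context_id]   # context implicitly deleted

mg_a = MediaGateway()
ctx = mg_a.add("T1")        # ISUP trunk; new context created implicitly
mg_a.add("T2", ctx)         # RTP termination joins the same context
# ...voice packets flow...
mg_a.subtract("T1", ctx)
mg_a.subtract("T2", ctx)    # second Subtract deletes the context
print(mg_a.contexts)        # {}
```

The same bookkeeping happens independently at MG B; the controller is what keeps the two gateways' views consistent.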
Megaco commands are grouped into transactions. In all likelihood, the controller would combine the two Add commands to MG A into a single transaction. That is why Figure 10.3 only shows one arrow for the two commands. MG A copies the transaction's ID into its Reply (so that there will be no confusion if multiple transactions are simultaneously in progress). Similarly, the two Adds that are dispatched to MG B would constitute one transaction.

Tearing Down the Call
When the conversation is finished, the controller tells MG A to Subtract both terminations from the context that they inhabited throughout the call. The second Subtract removes the last termination from the context and therefore implicitly deletes the context. The Subtracts that are sent to MG B have an entirely analogous effect. Figure 10.4 reflects the state of the system after the call has been torn down. The bearer path and the contexts that appeared in Figure 10.2 are gone. Note that the RTP terminations (T2 and T3) are also missing. In the Megaco connection model, terminations in the packet domain are created and destroyed as calls come and go. TDM terminations (such as T1 and T4) as well as analog line terminations (recall that T5 terminates an analog line) are fundamentally different: they are created as the result of a provisioning process, and they "live" as long as the configurations of the media gateways in question remain the same. Such terminations must reside somewhere when they are idle; the Megaco connection model defines a special context called the null context expressly for this purpose.

Figure 10.4 After teardown: RTP terminations deleted.

Let us return for a moment to Figure 10.3. Terminations T1 and T4 are shifted from the null context by Add commands and are returned to the null context by Subtract commands. Note that Megaco semantics do not allow a Move to or from the null context; the Subtract and Add commands must be used for this purpose. We assume that termination T5 is "parked" in the null context throughout the call flow of Figure 10.3.

Additional Comments
In Figure 10.3, the gray rectangle labeled "… voice packets flow…" depicts the "lifespan" of a bearer path through the softswitch. In a manner that we now describe, the label on that rectangle may be a bit misleading (but a more accurate label would have been a mouthful). In a real call flow, the Megaco messages of Figure 10.3 would be dovetailed with:

• ISUP messages going to and from the left-hand side of the diagram;
• ISDN call-control messages going to and from the right-hand side. (Q.931, the basic ISDN call-control protocol, is briefly discussed in Section 11.2.1.)
The ISUP and ISDN portions of the call setup must be completed before we can say that a true end-to-end bearer path exists (and it is only at this point in time that voice samples begin to flow). Softswitches tend to be geographically distributed entities (although there is no rule saying that the switch components must be in different geographic locations). Note that the latency incurred by the Megaco signaling exchanges depends on the distances between the media gateways and their controller. If the MGs are far from
the controller, there will be a nontrivial effect on so-called "post dial delay." This is true even if the MGs are relatively close to one another, for the simple reason that the MGs do not signal one another directly.

10.3.6 Usage of the Move Command
The usefulness of Megaco's Move command is demonstrated by the following call-waiting example. This is a variant of an example presented in the Megaco specification. As a point of departure for our current example, we refer to Figure 10.2. Suppose that a user at the far end of termination T5 (user 3, say) calls user 2 while the user 1-user 2 call is still in progress. Moreover, assume that a call-waiting feature is available to user 2; user 2 chooses to answer the call and place user 1 on hold. Then the controller will:

• Add termination T5 to a context (implicitly directing that a new context should be created in the process);
• Move termination T2 to the new context.

Users 2 and 3 can now talk, whereas user 1 is on hold (the context containing termination T1 still exists, but it temporarily contains no other endpoints). The system configuration at this point is schematically represented in Figure 10.5.
Elevator music (for user 1) is optional. Under the assumption that MG A supports an ElevatorMusic signal (e.g., by supporting a package that defines such a thing), the controller could choose to apply that signal by issuing an appropriate Modify command to MG A. We have “grayed out” a bearer path from area 1 to the softswitch to show that user 1 is still connected to MG A: the trunk connecting user 1’s serving switch to MG A has not been released. The ElevatorMusic signal, if applied, would reach user 1 via this trunk. When the conversation between users 2 and 3 is over, the controller will:
Figure 10.5 Call waiting example.
• Move termination T2 back to the first context. (And remember to turn off the ElevatorMusic signal!)
• Subtract termination T5 from the context that contains it (resulting in the deletion of the now-empty context).

At this point, we have returned to the configuration of Figure 10.2.
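The context manipulations above can be sketched with plain dictionaries; the context labels C1 and C2 are our own, and the Move semantics shown are a simplification:

```python
# Call-waiting sketch: Move re-homes T2 without subtracting it from the
# gateway. C1 holds the original user 1 <-> user 2 call.

contexts = {"C1": {"T1", "T2"}}

def move(termination, src, dst):
    contexts[src].discard(termination)
    contexts[dst].add(termination)

# user 3 calls user 2:
contexts["C2"] = {"T5"}        # Add T5 (new context created implicitly)
move("T2", "C1", "C2")         # users 2 and 3 talk; T1 sits alone, on hold

# the call-waiting leg ends:
move("T2", "C2", "C1")         # T2 rejoins T1
contexts["C2"].discard("T5")   # Subtract T5...
if not contexts["C2"]:
    del contexts["C2"]         # ...implicitly deleting the empty context

print(contexts)                # {'C1': {'T1', 'T2'}}
```

Note what Move buys here: T2's binding to the gateway survives the whole episode, so the ISDN side of user 2's connection is never torn down and re-established.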
Is this example realistic? Here the softswitch is implementing the call-waiting feature; user 2 is not employing a call-waiting feature offered by his/her private branch exchange. This example is therefore more realistic if the softswitch has replaced user 2's private branch exchange. We have omitted many details from our presentation of this example, in which our goal was simply to demonstrate the usefulness of Megaco's Move command. Note that signals are covered in Section 10.3.7; we give some examples in Section 10.3.10.

10.3.7 Descriptors
Megaco protocol entities have to keep track of numerous parameters. Related parameters are grouped into descriptors. The Megaco RFC often refers to atomic parameters as properties. Wherever a set of related parameters is particularly large, a hierarchy of descriptors comes into play. The Media descriptor is a case in point. All media stream parameters are specified in Media descriptors, which contain other descriptors. The hierarchy of media stream properties is as follows:

• Media descriptor:
  • TerminationState descriptor. This descriptor contains those properties that are not specific to any one media stream but rather apply to the termination as a whole. It includes the ServiceStates property (whose allowable values are "test," "out of service," and "in service") and the EventBufferControl property, which modulates the processing of detected events. (This property is applicable when the Events descriptor is nonempty. We cover Events descriptors later in this section.)
  • Stream descriptor. This descriptor specifies the properties of a single media stream. These properties are further categorized as follows:
    • LocalControl descriptor. Termination properties specific to the stream in question reside here. This descriptor includes the Mode property, whose allowable values are "SendOnly," "ReceiveOnly," "SendReceive," "Inactive," and "Loopback." (Note that the Megaco RFC is inconsistent in its naming of these properties. Sometimes the names are hyphenated, as in "send-only.")
    • Local descriptor. This descriptor refers to media received by the MG. When the protocol is encoded in text, SDP is used for this descriptor.
    • Remote descriptor. This descriptor refers to media transmitted by the MG. When the protocol is encoded in text, SDP is used for this descriptor.
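For concreteness, a text-encoded Media descriptor for the RTP termination of our example might look roughly as follows. The layout, stream number, and SDP values are illustrative only; consult the sample messages in the Megaco/H.248 specification for the authoritative syntax:

```
Media {
    Stream = 1 {
        LocalControl {
            Mode = SendReceive
        },
        Local {
            v=0
            c=IN IP4 192.168.0.50
            m=audio 4444 RTP/AVP 18
            a=ptime:20
        }
    }
}
```

The nesting mirrors the hierarchy just described: the Stream descriptor wraps a LocalControl descriptor (carrying the Mode property) and a Local descriptor whose body is the SDP of Section 10.2.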
10.3
Megaco/H.248
An Events descriptor is essentially a list of events that the MG is commanded to detect and report. For example, an MG may be asked to detect on-hook and off-hook transitions and fax tones. Typically an MG will send a Notify message to its controller whenever it detects such an event. Each Events descriptor has a RequestIdentifier. When reporting the occurrence of an event, the notifying MG attaches the appropriate RequestIdentifier to an ObservedEvents descriptor. Additional actions may be appropriate for specific events (e.g., cease to apply ringing or apply dial tone upon detecting off-hook). To efficiently support such behavior, an Events descriptor can incorporate an embedded Signals descriptor. Events and Signals are defined in packages.
It would be inefficient for an MG to report dialed digits one at a time. That is, when an end user is attempting to place a call, it would be laborious for the MG to Notify its controller of each dialed digit individually. Megaco uses DigitMap descriptors to specify dialing plans. (In private branch exchange and Centrex environments, dialing plans typically offer features such as four-digit dialing for internal calls. Implementation requires some pattern-matching capability on the part of the entity that collects dialed digits. A dialing plan is essentially a specification of the patterns that need to be matched.) Via the DigitMap mechanism, controllers can export dialing plans to MGs. An MG that has received such information can perform pattern matching locally and can notify its controller only when a string of dialed digits is complete. DigitMaps are particularly useful in deployments where media gateways replace private branch exchange or Centrex equipment.
Flow directions between terminations in a context are specified via a Topology descriptor. The Megaco RFC explicitly says that it is not mandatory for MGs to support Topology descriptors. A Topology descriptor consists of one or more triples of the form (T1, T2, association), where T1 and T2 are terminations (or possibly wildcards) and association is one of “isolate,” “oneway,” or “bothway.” (Wildcards are introduced in Section 10.3.8.) In the case of a oneway association, the direction of media flow is T1 → T2. The default is bothway. More specifically, if no Topology descriptor is given, then a full mesh of bothway connections among the terminations in the context is assumed.
Revisiting the Distinction Between Terminations and Contexts
Contexts describe what goes on inside an MG. Terminations, being an MG’s “touch points” to the rest of the network, describe what goes on outside. When trying to sort out the difference between the LocalControl and Topology descriptors, it helps to keep this distinction in mind. The Topology descriptor determines the flow of media between terminations in the same media gateway. The mode property in the LocalControl descriptor determines the flow of media between terminations in different media gateways. Media, Events, ObservedEvents, Signals, and DigitMap descriptors belong to terminations. Topology descriptors belong to contexts.
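To make the DigitMap idea concrete, here is a small sketch of digit collection against a dialing plan. This is our own simplified illustration, not Megaco's actual DigitMap syntax (which also provides timers, ranges, and alternatives); here a plan is just a list of patterns in which "x" matches any single digit.

```python
def digits_complete(dialed: str, plan: list[str]) -> bool:
    """Return True once the dialed string fully matches some pattern.

    A pattern is a string of literal digits and 'x' wildcards, e.g.
    "xxxx" for four-digit internal dialing, or "9xxxxxxxxxx" for an
    external call behind a PBX-style access code.
    """
    def matches(pattern: str) -> bool:
        return len(dialed) == len(pattern) and all(
            p == "x" or p == d for p, d in zip(pattern, dialed)
        )
    return any(matches(p) for p in plan)

# A toy dialing plan: 4-digit internal calls, or '9' plus 10 digits.
plan = ["xxxx", "9xxxxxxxxxx"]

assert not digits_complete("12", plan)       # keep collecting digits
assert digits_complete("1234", plan)         # internal call is complete
assert digits_complete("95125551212", plan)  # external call is complete
```

An MG running logic like this would send a Notify only when digits_complete returns True, instead of once per dialed digit.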
Media Gateway Control and Other Softswitch Topics
10.3.8
Sample Megaco Messages
Armed with a basic knowledge of Megaco descriptors, we are now ready to examine a few Megaco messages in detail. The Megaco RFC specifies text and binary encodings of the protocol. For both UDP and TCP, the default port number for text-encoded operation is 2944 and the default port number for binary-encoded operation is 2945. In our examples, we employ the text encoding for readability.
In many instances, a controller will not know what identifiers to assign when it issues a command (e.g., port numbers for RTP streams). The controller will ask the MG to select values for these identifiers and to populate its reply with those values. The so-called CHOOSE wildcard is used for this purpose. In our examples, the CHOOSE wildcard is denoted “$”; this is consistent with the examples presented in the Megaco RFC itself. We remark here that there is also an ALL wildcard, although we will not need it in our examples.
Recall that Megaco uses SDP for the Local and Remote descriptors. For messages traveling controller → media gateway, Local/Remote descriptors are actually cast in a slightly modified version of SDP. The following departures from the SDP specification [4] are allowed:
• The “s=”, “t=”, and/or “o=” fields may be omitted.
• In place of a single parameter value:
  • The “$” wildcard may appear.
  • Alternatives may be specified.
Note that Local/Remote descriptors in messages traveling MG → controller must adhere to the SDP specification.
We review the call setup procedure of Figure 10.3. The controller initiates a transaction commanding MG A to Add a TDM termination and an RTP termination to a context. MG A’s reply contains the IP address and port number of the latter termination. The controller initiates a transaction commanding MG B to Add a TDM termination and an RTP termination to a context. In its reply, MG B populates the IP address and port number of the RTP termination; the controller then communicates this information to MG A via a Modify command. The messages below detail the second transaction and reply (that is, the transaction between the controller and MG B; in the interest of brevity, we do not display the messages constituting either of the transactions between the controller and MG A). In this example, the participants’ IP addresses are as follows:
• Controller: 192.168.0.51;
• MG A: 192.168.0.30;
• MG B: 192.168.0.50.
The default port number for text-encoded Megaco is 2944. We assume that the Megaco protocol entities on all three network elements are listening at this port number. Although each of the participating network elements has one and only one address in this example, the reader should not assume that this is universally true. In
practice, it is common for MGs to have multiple IP interfaces (each having a different IP address). In particular, bearer terminations might very well reside at different IP addresses than Megaco protocol entities.
The following transaction request travels controller → MG B.

MEGACO/1 [192.168.0.51]:2944
Transaction = 122603 {
    Context = $ {
        Add = BT0001,
        Add = $ {
            Media {
                Stream = 1 {
                    LocalControl {
                        Mode = SendReceive
                    },
                    Local {
                        v=0
                        c=IN IP4 $
                        m=audio $ RTP/AVP 18
                        a=ptime:20
                    },
                    Remote {
                        v=0
                        c=IN IP4 192.168.0.30
                        m=audio 2222 RTP/AVP 18
                        a=ptime:20
                    }
                }
            }
        }
    }
}
Notes on the Add transaction. The first line contains the IP address and port number of the originator (in this case, the controller). The controller is asking MG B to create a new context and furnish a contextID; this is the first of several uses of $, the CHOOSE wildcard. In the first Add, the controller requests a specific TDM channel. The Megaco RFC does not specify terminationID semantics; the name “BT0001” is totally arbitrary. The second Add requests an RTP termination (note that RTP/AVP appears in its Local and Remote descriptors). This Add features the $ wildcard in three places: for the terminationID (immediately after the word “Add”) and for the IP address and port number of the RTP termination (in the Local descriptor). We will return to the Local descriptor in a moment. Although the mode setting is the only content of the LocalControl descriptor in this example, other properties can be specified there (e.g., jitter requirements).
Recall that the Local descriptor refers to media received by the MG, whereas the Remote descriptor refers to media transmitted by the MG. In the Local descriptor, the controller is asking MG B to specify the IP address and port number where it wishes to receive RTP packets. As noted earlier, RTP traffic may arrive at different IP interface(s) than Megaco messages, so the presence of an IP address field here is not redundant. (Note that, if the destination IP addresses for RTP and Megaco traffic are the same, the port numbers must be different.) The Remote descriptor specifies the IP address (192.168.0.30) and port number (2222) for RTP traffic emanating from this context on MG B. Note that the IP address is that of MG A and that the port number is not the same as that of the Megaco protocol entity. We are assuming in this example that the TDM termination BT0001 has been provisioned in SendReceive mode.
The following Reply travels MG B → controller.
MEGACO/1 [192.168.0.50]:2944
Reply = 122603 {
    Context = 5050 {
        Add = BT0001,
        Add = BR0720 {
            Media {
                Stream = 1 {
                    Local {
                        v=0
                        o=- 2890844526 2890842807 IN IP4 192.168.0.50
                        s=-
                        t=0 0
                        c=IN IP4 192.168.0.50
                        m=audio 4444 RTP/AVP 18
                        a=ptime:20
                    }
                }
            }
        }
    }
}
Notes on the Reply transaction. The first line contains the IP address and port number of the originating protocol entity. The controller knows that this message is in fact a response to the transaction that appears above because the TransactionIDs match. We see that MG B has filled in all of the wildcards: namely, the ContextID (5050) for the newly created context, as well as the TerminationID (BR0720), IP address (192.168.0.50), and port number (4444) for the newly created RTP termination. We remark that MG B has also added “o=”, “s=”, and “t=” fields to the Local descriptor. (These fields are missing from the Local and Remote descriptors in the earlier Add transaction. As noted earlier, controllers are allowed to bend the SDP syntax rules, but media gateways are not.) MG B has omitted the LocalControl and Remote descriptors from its reply. Since the controller did not ask MG B to populate any fields in those descriptors, it would be redundant to include them.
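To act on such a reply, the controller must extract the MG-chosen address and port from the SDP in the Local descriptor. The helper below is our own sketch (it assumes a single "c=" line and a single audio "m=" line):

```python
import re

def rtp_address(sdp: str) -> tuple[str, int]:
    """Extract the IP address and audio port from SDP 'c=' and 'm=' lines."""
    ip = re.search(r"c=IN IP4 (\S+)", sdp).group(1)
    port = int(re.search(r"m=audio (\d+)", sdp).group(1))
    return ip, port

# Local descriptor from MG B's reply (abbreviated):
local_sdp = """v=0
c=IN IP4 192.168.0.50
m=audio 4444 RTP/AVP 18
a=ptime:20"""

assert rtp_address(local_sdp) == ("192.168.0.50", 4444)
```

The controller would then place these values in the Remote descriptor that it sends to MG A via Modify.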
10.3.9
Three-Way Calling Example
In this section, we present a simple conferencing scenario: suppose that a subscriber in area 3 (user 3, say) has been added to the conversation of our previous example. User 3 accesses the network through an analog line that is terminated by the softswitch. That analog line termination is labeled T5 in Figure 10.6 (as well as Figure 10.2). As suggested by Figure 10.6, Megaco’s connection model handles the three-way connection in a simple way: termination T5 is added to the context that already contained terminations T1 and T2. Let us assume that a two-way conversation (between user 1 in area 1 and user 2 in area 2) is in progress; suppose that the participants want to add user 3 to the call. The signaling flow shown in Figure 10.7 accomplishes this goal. We detail the Add, Notify and Modify messages depicted in the figure. The replies do not add content, so we have chosen to omit them. (Note that we circumvent the matter of how current participant(s) tell the softswitch that they want to add a third participant. This is beyond our scope; here we are only interested in the Megaco portion of the call flow.) The Add transaction (which travels Controller → MG A) looks like this:
Figure 10.6 Simple conference call example. (The figure shows termination T5, which serves user 3’s analog line in area 3, joining the context that already contains terminations T1 and T2.)
Figure 10.7 Signaling for three-way call. (The figure shows Add/Reply, Notify/Reply, and Modify/Reply exchanges between the media gateway controller and media gateway A, bracketed by the two-way and three-way conversations.)
MEGACO/1 [192.168.0.51]:2944
Transaction = 155851 {
    Context = 5643 {
        Add = AA0049 {
            Media {
                Stream = 1 {
                    LocalControl {
                        Mode = SendReceive
                    }
                }
            },
            Events = 4211 {al/of(strict=state)},
            Signals {al/ri}
        }
    }
}
Note the presence of Events and Signals descriptors for the analog line. In those descriptors, “al” refers to the analog line package, “of” denotes off-hook, and “ri” denotes ringing. Thus the controller is telling MG A to ring user 3’s phone and to report off-hook when and if this transition occurs. Since the newly added termination connects to an analog line, Local and Remote descriptors are unnecessary.
When user 3 answers, we have the Notify transaction traveling MG A → controller:

MEGACO/1 [192.168.0.30]:2944
Transaction = 158053 {
    Context = 5643 {
        Notify = AA0049 {
            ObservedEvents = 4211 {
                20031231T22020002:al/of(init=false)
            }
        }
    }
}
The ContextID, TerminationID, and EventsID in the Notify transaction match those in the Add transaction that went before.
Lastly, we have the Modify transaction traveling controller → MG A:

MEGACO/1 [192.168.0.51]:2944
Transaction = 155854 {
    Context = 5643 {
        Modify = AA0049 {
            Events = 4212 {al/on(strict=state)},
            Signals {}
        }
    }
}
Here the controller has asked MG A to turn off ringing (as indicated by the null Signals descriptor) and to detect and report on-hook. Note that the EventsID is different than that of the earlier Add transaction; the new Events descriptor replaces the old Events descriptor for termination AA0049.
In this example, users 1 and 2 do not hear ringing while waiting for user 3 to answer. If we wanted them to, we would merely have to apply ringback signals to the appropriate terminations (by modifying them in the course of the first transaction, say). Note that we would need to remove the ringback signals later (as part of the last transaction).
We assume that users 1, 2, and 3 can all hear one another once the Modify transaction above is processed and accepted. Recall that contexts have topologies, and that the default topology is a full mesh of bothway connections among the terminations in a context. That is why no Topology descriptor appears in either of the messages above: none is necessary, since the intended behavior is the default. 10.3.10
Megaco Miscellanea
Other Descriptors
We have only covered a handful of the descriptors defined in the Megaco RFC. Here we briefly mention a few more. The specification defines descriptors for special types of bearer traffic (Modem and Multiplex descriptors). Moreover, we have not covered error reporting and handling (which involves Error descriptors) or performance and resource usage reporting (which involves Statistics descriptors). We remark that performance and resource usage reporting is crucial—since controllers do not encounter bearer traffic directly, they have no other way of knowing what happens in the bearer plane.
Packages
The Megaco RFC defines some 13 packages. Here are a few highlights (for details, the reader can consult the specification [6] directly): the base root package defines gateway-wide properties, such as the maximum number of terminations per context. In Section 10.3.9, we saw an example use of the analog line supervision package. Ringback and dial tone reside in the call progress tones generator package. Echo cancellation can be turned on or off by setting the echo cancellation property in the TDM circuit package. Other packages are defined within ITU-T’s H.248 series of recommendations: fax, text conversation, call discrimination, and other functionalities in [7], user interface elements and actions in [8], dynamic tones in [9], and announcements in [10]. Packages that assist with switch management include [11–13]. A package is defined by specifying its properties, events, signals, statistics, and procedures. We remind the reader that, in the interest of extensibility, Megaco’s authors left the door open for new packages to be defined as needed. Transport
The “base” Megaco specification covers IP transport of Megaco messages via UDP and TCP. Transport over SCTP and ATM are defined in [14] and [15], respectively. Revisiting the Call-Waiting Example
Recall the circumstances of the call-waiting example of Section 10.3.6. User 2 has a call-waiting service that is administered by a softswitch. Users 1 and 2 are in the midst of a conversation when another user attempts to call user 2. User 2 is alerted of the second incoming call by a call-waiting tone. That tone, which is defined in the call progress tones generator package, is applied (to user 2’s termination) as a result of a Modify command with an appropriate Signals descriptor.
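As a sketch of what the controller might emit, the following builds a text-encoded Modify carrying a single signal. All identifiers here are illustrative, and the signal name cg/cw (a call-waiting tone in the call progress tones generator package) is an assumption that should be checked against the package definition:

```python
def modify_with_signal(transaction_id: int, context_id: int,
                       termination_id: str, signal: str) -> str:
    """Build a text-encoded Megaco Modify that applies one signal."""
    return (
        f"MEGACO/1 [192.168.0.51]:2944\n"
        f"Transaction = {transaction_id} {{\n"
        f"    Context = {context_id} {{\n"
        f"        Modify = {termination_id} {{\n"
        f"            Signals {{{signal}}}\n"
        f"        }}\n"
        f"    }}\n"
        f"}}"
    )

# Apply a call-waiting tone to user 2's termination (all IDs made up).
msg = modify_with_signal(160001, 5643, "AA0031", "cg/cw")
assert "Modify = AA0031" in msg
assert "Signals {cg/cw}" in msg
```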
10.4
MGCP MGCP is defined in RFC 3435 [2]. As noted in Section 10.3, Megaco is the standard supported by IETF and ITU-T. MGCP is not on the “standards track”: RFC 3435 is an informational RFC. However, MGCP is widely deployed. Unlike Megaco, MGCP is purely a text-based protocol (recall that Megaco offers binary and text-encoding options). 10.4.1
Example Call Flow
In this section, we recast the call flow of Section 10.3.5 in MGCP terms. In so doing, we introduce some MGCP terminology as a “mapping” of the following Megaco terms. Even though the two protocols aim to solve the same problems, Megaco and MGCP use different vocabularies. Therefore the mappings are loose and should not be taken as precise equivalences.
• Megaco media gateway controller → MGCP call agent.
• Megaco context → MGCP connection.
• Megaco termination → MGCP endpoint.
• Megaco Add command → MGCP CreateConnection command, or “verb.” The latter is abbreviated CRCX.
• Megaco Modify command → MGCP ModifyConnection command. The abbreviation for this MGCP verb is MDCX.
• Megaco Delete command → MGCP DeleteConnection command. The abbreviation for this MGCP verb is DLCX.
• Megaco mode property → MGCP ConnectionMode parameter. The abbreviation for this MGCP parameter is “m:”.
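The mapping above can be captured as a lookup table; this is merely a reading aid of our own, not anything defined by either protocol:

```python
# Megaco term -> (rough MGCP counterpart, abbreviation where one exists)
MEGACO_TO_MGCP = {
    "media gateway controller": ("call agent", None),
    "context": ("connection", None),
    "termination": ("endpoint", None),
    "Add": ("CreateConnection", "CRCX"),
    "Modify": ("ModifyConnection", "MDCX"),
    "Delete": ("DeleteConnection", "DLCX"),
    "mode property": ("ConnectionMode parameter", "m:"),
}

assert MEGACO_TO_MGCP["Add"] == ("CreateConnection", "CRCX")
assert MEGACO_TO_MGCP["context"][0] == "connection"
```

A table like this is handy when reading traces from both protocols side by side, but keep in mind (as discussed below) that the correspondences are loose.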
With this mapping in mind, we see that Figure 10.8 is entirely analogous to our earlier call flow (see the Megaco call flow presented in Figure 10.3). We have seen that, in both the Megaco and MGCP signaling flows, the first connection/context must be modified—that is, converted to a bidirectional connection once the far end’s session parameters are available. The same capability can be used to effect mid-call modifications. 10.4.2
Brief Comparison with Megaco
In this section, we discuss conceptual similarities and differences between Megaco and MGCP. The levels of difficulty for implementing the two protocols are beyond the scope of our discussion.
The differences between MGCP and Megaco start with the differences in their connection models. The fundamental notions in the MGCP model are endpoint (which is similar to a Megaco termination) and connection (which is similar to a Megaco context). MGCP views digital channels (e.g., TDM trunks and ISDN lines) and analog lines as endpoints. In Figure 10.9, endpoint E1 terminates a TDM trunk, endpoint E4 terminates an ISDN line, and endpoint E5 terminates an analog line. This is analogous to the Megaco example of Figure 10.2. A few other entities (such as announcement server and interactive voice response access points) are also categorized as endpoints.

Figure 10.8 Simple MGCP signaling flow. (The figure shows the call agent exchanging CRCX, MDCX, and DLCX transactions with media gateways A and B, analogous to the Megaco flow of Figure 10.3.)

Figure 10.9 MGCP connection model. (The figure shows endpoints E1, E4, and E5, with one connection per media gateway joined by a bearer path across the distributed fabric.)

So far, the “mapping” of Megaco terminations to MGCP endpoints seems to be going well. But the correspondence only holds to a limited degree: in Figure 10.9, note that there are no endpoints facing the switch fabric. In fact, there is no such thing as an RTP endpoint in the MGCP model. The session description parameters (that is, IP addresses and port numbers for RTP streams, codec specification, and so on) belong to connections on the media gateways. (By way of review: in Megaco, we do have RTP terminations, which are in turn contained in contexts.) The analogy between MGCP connections and Megaco contexts is a little more robust; an MGCP connection describes the binding between ingress and egress points on a media gateway. Two connections appear in Figure 10.9, one for each media gateway that the bearer path traverses. (The choice of the term connection here is perhaps unfortunate, since it might lead to confusion with the notion of an end-to-end connection. In MGCP, connections are grouped together in calls. Although each call has a unique CallID, this parameter is vestigial; the MGCP RFC says that the CallID has “little semantic meaning in the protocol,” although it may be useful for accounting purposes.) MGCP defines virtual endpoints; the examples cited in the MGCP RFC are digital signal processing resources that belong to announcement servers, interactive voice response devices, or conference bridge devices.
Conferences
To set up a conference call with MGCP, multiple connections to a single endpoint are established. There are two apparent approaches, which are distinguished according to whether that endpoint is a conference bridge endpoint:
• Case I: Endpoint is not a conference bridge endpoint. The specification says that some types of endpoints (notably TDM channels and analog lines) should support the establishment of multiple simultaneous connections. Referring to Figure 10.9, the media gateway should be able to establish a three-way call by connecting endpoints E5 and E1. This capability would support an ad hoc conference like that of the Megaco scenario presented in Section 10.3.9. The problem is that, if the user at the far end of E1 hangs up, the other participants on the call are also disconnected.
• Case II: Endpoint is a conference bridge endpoint. In this case, individual participants can presumably come and go at will without causing the other participants to disconnect. So this approach appears more robust than the first. It requires participant(s) to anticipate the need for conference bridge resources, however, so ad hoc conferences present a problem.
Regardless of the approach that is chosen, MGCP does allow a degree of control over “who hears/sees whom”: one of the allowable values for the ConnectionMode parameter is “conference.” The MGCP specification [2] says that a media stream received through a connection in this mode is forwarded to:
• The endpoint associated with the connection;
• All other conference-mode connections.
Still and all, MGCP’s approach to conference calls is not conceptually as clean as that of Megaco. MGCP’s connection concept is not as general as Megaco’s notion of a context. Moreover, in the MGCP connection model, there is nothing as flexible as Megaco’s Topology descriptor. (Recall that the Topology descriptor determines “who hears/sees whom.”) 10.4.3
Other MGCP Verbs
Event notification. The Megaco Notify command maps to an MGCP verb of the same name. As with Megaco, this is the means by which a media gateway reports the occurrence of events (e.g., off-hook and on-hook transitions for analog lines). An MGCP call agent uses the NotificationRequest verb to specify those events that the media gateway should report. Recall that Megaco’s approach is slightly different: rather than using a separate command for this purpose, controllers incorporate Events descriptors in Add and/or Modify commands.
The MGCP EndpointConfiguration verb is used to specify the bearer encoding scheme for a TDM endpoint. The values “A-law” and “mu-law,” which distinguish between the two common variants of G.711, are the only ones defined in the MGCP RFC. In our view, this is a shortcoming; Megaco’s approach is more sensible. Megaco uses Local and Remote descriptors (whose lingua franca is SDP) within Modify commands for this purpose.
The MGCP AuditEndpoint and AuditConnection verbs allow call agents to query the status of endpoints and connections. The functionality of the AuditEndpoint verb is quite similar to that of Megaco’s AuditValue command. AuditConnection has no direct Megaco counterpart. Note, however, that many of the properties of an MGCP connection (e.g., RTP stream parameters) correspond to properties of a Megaco termination.
MGCP’s RestartInProgress command, which is similar to Megaco’s ServiceChange command, can be used to take endpoints in and out of service. The Lack of a Move Command
As we saw in the call-waiting example of Section 10.3.6, Megaco’s Move command is a nice convenience. There is no MGCP equivalent (a MoveConnection verb was proposed but later abandoned). Call waiting and other similar features are, of course, possible with MGCP. MGCP signaling to implement such features is somewhat more laborious than with Megaco, however. We regard this as a drawback of MGCP. 10.4.4
Transactions and Provisional Responses
Each MGCP message is either a command, a response, or a response acknowledgment. Responses and response acknowledgments have 3-digit return codes, which can be categorized as follows:
• 0xx: response acknowledgment;
• 1xx: provisional response;
• 2xx: final response indicating successful completion;
• 4xx, 5xx: final response indicating error condition;
• 8xx: package-specific response.
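The categorization above translates directly into a small helper for triaging responses (the function is our own sketch, not part of the MGCP RFC):

```python
def classify(code: int) -> str:
    """Map a 3-digit MGCP return code to its category."""
    if 0 <= code <= 99:
        return "response acknowledgment"
    if 100 <= code <= 199:
        return "provisional response"
    if 200 <= code <= 299:
        return "final response (success)"
    if 400 <= code <= 599:
        return "final response (error)"
    if 800 <= code <= 899:
        return "package-specific response"
    return "unknown"

assert classify(200) == "final response (success)"
assert classify(250) == "final response (success)"   # e.g., DLCX success
assert classify(100) == "provisional response"
assert classify(521) == "final response (error)"
```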
Response acknowledgments are only generated for final responses. Each MGCP command includes one (and only one) verb and a transactionID. TransactionIDs are used to correlate commands with responses and response acknowledgments (which also include transactionIDs). According to the specification, MGCP is transported over UDP, which does nothing to ensure reliability. Thus a call agent or media gateway may not be able to tell whether a message it sent was received. Therefore it is important to recognize and discard duplicates; transactionIDs are essential for this purpose. The “three-way handshake” that is present in the pattern

command → final response → response acknowledgment

is also intended to compensate for the shortcomings of unreliable transport. Provisional responses are useful in situations where commands cannot be executed quickly (e.g., when a media gateway’s processing capability is congested due to a “burst” of call setup requests). Provisional responses can be used to keep timers (e.g., retransmission timers in protocol state machines) from expiring when it is appropriate to do so. The Megaco RFC briefly mentions the need for provisional responses, but MGCP does a substantially better job in this area.
We have seen that Megaco supports the incorporation of multiple commands into a single transaction. This is an advantage over MGCP, which cannot group multiple verbs within the same transaction. However, MGCP does provide for multiple transactions and/or responses to be encapsulated in a single UDP/IP packet:
successive messages inhabiting the same packet must be separated by a line consisting entirely of a single “.”. 10.4.5
MGCP Packages
RFC 3435 (the “base” MGCP RFC) does not define any packages—this aspect is relegated to other documents. At the time of this writing, packages that define the following MGCP functionalities exist: channel-associated signaling [16], ATM bearers [17], and Bulk Auditing [18]. Note that RFC 3435 is not a “stand-alone” document, since some of its basic functionality is specified as a package (see reference [19]). One of the precursors of MGCP was IP Device Control Protocol. This is presumably the reason that ATM bearers were “tacked on” via a package definition RFC.
10.5
Interworking with Circuit-Switched Networks The softswitch architecture was motivated by the need to interwork with circuit-switched domains. Physical separation of bearer and control, while not mandated in the softswitch realm, is clearly a key benefit in many deployment scenarios (think of the backhaul example we explored in earlier chapters). Although softswitch is arguably the main topic of this book, it is not the only useful approach to packet telephony. We elaborate on this point in later chapters. In alternative architectures, logical separation of bearer and control remains an important concept. 10.5.1
Latency Trade-offs
Let us envision a simple call flow in which the called party accesses a softswitch via an analog line. When does the softswitch apply ringing to that analog line? Certainly the media gateway controller will attempt to set up a bearer path through the softswitch fabric as soon as it realizes the need to do so. That process takes time to complete, however. The latency involved may be substantial if the softswitch is experiencing a high rate of call setup requests. The distance of the media gateways from their controller, if large, may also be a nontrivial contributing factor. Should the controller apply ringing to the analog line immediately, on the assumption that the cross-fabric bearer setup will, in all probability, complete successfully? The motivation for doing so is that the cross fabric setup can proceed while the switch waits for the end user to answer, thereby reducing overall latency. The downside is also clear: if the called party answers immediately, he/she may not initially be able to hear the caller’s voice (and vice versa). Worse yet, if the cross-fabric setup fails, we will have disturbed the called party unnecessarily. Different implementations may opt for different approaches to this problem. Had we chosen to incorporate end-to-end message flows in our Megaco/MGCP examples, we would have encountered similar problems. That is, ISUP and ISDN
messaging would be interleaved with Megaco/MGCP messaging. Exactly how would the interleaved flows be ordered? The choices would again involve latency trade-offs. The standards do not address these issues; in selecting a preferred approach, each implementor must evaluate his or her priorities.
10.6
Inhabiting the Bearer, Service, and Control Planes Try as we might, it is impossible to achieve perfect separation of the bearer plane from the service and control planes. The need to handle events and signals is a key exemplar. This fact accounts for some of the complexity of media gateway control. To reinforce this point, which is an important theme of this chapter, we revisit DTMF:
• DTMF tones are ubiquitous in telephony—they are used to navigate menus in a variety of applications (e.g., voice mail and banking systems) and to enter data (e.g., personal ID numbers for authentication of credit card calls). Softswitches will be expected to support services like those available in today’s PSTN.
• By definition, DTMF tones and other voiceband bearer signals do not pass through media gateway controllers. So controllers must rely on media gateways for notification of DTMF activity.

10.7
Signaling Between Two Softswitches For the sake of simplicity, we have so far assumed that the calling and called parties are served by the same softswitch. Clearly this cannot always be true. When a softswitch is connected to other switches via the TDM domain, ISUP and/or ISDN signaling can be used. This is true regardless of whether the other switches are softswitches or circuit switches. If we wish to interconnect two softswitches in the packet domain, however, ISUP and ISDN no longer suffice. We discuss two protocols suited to this purpose: Bearer Independent Call Control (BICC) and Session Initiation Protocol. The former is covered briefly in the next section; the latter is covered at length in subsequent chapters. 10.7.1
BICC
BICC is based on ISUP, which was introduced in Section 6.5.6. Thus it is a telco-style approach to call control; it basically extends ISUP to handle packet bearers and multiple codecs. BICC is a relatively modest extension of ISUP. The idea is to replicate the features of today’s circuit-switched networks in the realm of packet telephony, thereby enabling softswitches to support the types of services offered now. The initial BICC specification, which was set forth in [20], was followed by the Capability Set 2 (CS2) specification. ITU-T’s Q.1902.x series of recommendations defines BICC CS2; the functional description appears in [21].
The ISUP specification stretches across multiple documents (ITU-T’s Q.76x series of recommendations). The BICC specification basically amounts to a set of “delta documents.” That is, the ITU-T recommendations that cover BICC are written as a set of exceptions to the corresponding ISUP recommendations. Although this arrangement makes for very tough reading, it does make some sense: BICC can be transported over any protocol stack that supports ISUP (and the array of SS7 transport options has widened considerably in recent years). BICC’s initial capability set supported ATM bearers. IP bearers were covered later (see [22]; not surprisingly, that recommendation endorses SDP for IP bearers). So BICC’s order of development reversed that of MGCP and Megaco.
References

[1] Recommendation H.248.1, Media Gateway Control Protocol Version 2, ITU-T, 2002.
[2] Andreasen, F., and B. Foster, RFC 3435, Media Gateway Control Protocol (MGCP) Version 1.0, IETF, January 2003.
[3] Greene, N., M. Ramalho, and B. Rosen, RFC 2805, Media Gateway Control Protocol Architecture and Requirements, IETF, April 2000.
[4] Handley, M., and V. Jacobson, RFC 2327, SDP: Session Description Protocol, IETF, April 1998.
[5] Schulzrinne, H., and S. Casner, RFC 3551, RTP Profile for Audio and Video Conferences With Minimal Control, IETF, July 2003.
[6] Groves, C., et al., RFC 3525, Gateway Control Protocol Version 1, IETF, June 2003.
[7] Recommendation H.248.2, Facsimile, Text Conversation, and Call Discrimination Packages, ITU-T, 2000.
[8] Recommendation H.248.3, User Interface Elements and Actions Package, ITU-T, 2000.
[9] Recommendation H.248.6, Dynamic Tone Definition Package, ITU-T, 2000.
[10] Recommendation H.248.7, Generic Announcement Package, ITU-T, 2000.
[11] Recommendation H.248.8, Error Codes and Service Change Reason Description, ITU-T, 2000.
[12] Recommendation H.248.10, Congestion Handling Package, ITU-T, 2001.
[13] Recommendation H.248.11, Media Gateway Overload Control Package, ITU-T, 2002.
[14] Recommendation H.248.4, Transport over SCTP, ITU-T, 2000.
[15] Recommendation H.248.5, Transport over ATM, ITU-T, 2000.
[16] Foster, B., RFC 3064, MGCP CAS Packages, IETF, February 2001.
[17] Kumar, R., RFC 3441, Asynchronous Transfer Mode (ATM) Package for the Media Gateway Control Protocol (MGCP), IETF, January 2003.
[18] Foster, B., D. Auerbach, and F. Andreasen, RFC 3624, The Media Gateway Control Protocol (MGCP) Bulk Audit Package, IETF, November 2003.
[19] Foster, B., and F. Andreasen, RFC 3660, Basic Media Gateway Control Protocol (MGCP) Packages, IETF, December 2003.
[20] Recommendation Q.1901, Bearer Independent Call Control Protocol, ITU-T, June 2000.
[21] Recommendation Q.1902.1, Bearer Independent Call Control Protocol (CS2) Functional Description, ITU-T, July 2001.
[22] Recommendation Q.1970, Bearer Independent Call Control IP Bearer Control Protocol, ITU-T, July 2001.
CHAPTER 11

Session Control

In this chapter, we discuss various approaches to session control. We will sometimes use the word session, in preference to call, to connote something more general than a typical bidirectional telephone conversation. Thus a session might be any number of things, such as a videoconference or a half-duplex voice “conversation” (see Section 13.7.1). The conferencing theme has many variants (e.g., a conference in which some participants transmit and receive voice and video while others participate only via voice, or a conference in which some participants transmit and receive media streams while others can only receive). We have seen that the Megaco and MGCP protocol designs incorporate some flexibility along these lines (as evidenced, for example, by Megaco’s Topology descriptor). We will see in this chapter that media gateway control (in the form of Megaco or MGCP) is not “the only game in town” when it comes to flexible session control.
11.1 “Generic” Session Control

Protocol details vary from one session control scheme to another. Later in this chapter, we will encounter some differences in basic functionality. As a starting point, however, we will concentrate on the similarities between session control protocols. There is a substantial amount of common functionality due to the fact that various protocols are aimed at solving the same or similar problems. Figure 11.1 presents a signaling flow; we have intentionally avoided casting this flow in terms of any specific protocol, using generic names for the messages instead. The flow in the figure is simplified; in particular, no signaling or switching intermediaries are shown and no address resolution is depicted. It is useful to think of the calling and called parties as end users’ terminal equipment (or, more precisely, as processes running therein) rather than the end users themselves.

The steps in our generic signaling flow are as follows: after (1) the calling party issues some sort of session setup request, (2) the called party sends a message acknowledging the request and indicating that it has begun processing for this request. Once its processing is complete, (3) the called party confirms its availability and willingness to participate in a session. It is not a given that the calling and called parties support the same codecs (or have the same preferences). So (4) the parties enter into a dialog to compare capabilities and preferences. Once an agreement has been reached, (5) bearer channel(s) are established. At this point, the interactive session between end users can begin.
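The seven steps above can be enacted as a toy exchange between two endpoint processes. This is purely illustrative: the message names and the in-memory “network” (one queue per direction) are our own inventions, not taken from any real protocol.

```python
import threading
from queue import Queue

# Toy "network": one message queue per direction.
to_callee, to_caller = Queue(), Queue()

def caller(log):
    to_callee.put("SETUP-REQUEST")              # (1) session setup request
    log.append(to_caller.get())                 # (2) "I'm working on it"
    log.append(to_caller.get())                 # (3) session setup confirmation
    to_callee.put("CAPABILITIES(G.711,G.729)")  # (4) offer our codecs
    log.append(to_caller.get())                 # (4) learn the chosen codec
    to_callee.put("ESTABLISH-BEARER")           # (5) establish bearer
    to_callee.put("END-SESSION")                # (6) end participation
    log.append(to_caller.get())                 # (7) acknowledgment

def callee(log):
    log.append(to_callee.get())                 # (1) setup request arrives
    to_caller.put("PROCESSING")                 # (2) acknowledge receipt
    to_caller.put("SETUP-CONFIRM")              # (3) confirm willingness
    log.append(to_callee.get())                 # (4) receive the codec offer
    to_caller.put("SELECTED(G.711)")            # (4) pick one codec
    log.append(to_callee.get())                 # (5) bearer established
    log.append(to_callee.get())                 # (6) far end withdraws
    to_caller.put("END-ACK")                    # (7) acknowledge end of session

caller_log, callee_log = [], []
t = threading.Thread(target=callee, args=(callee_log,))
t.start()
caller(caller_log)
t.join()
print(caller_log)  # ['PROCESSING', 'SETUP-CONFIRM', 'SELECTED(G.711)', 'END-ACK']
```

Note that the caller blocks after step (1) until the callee’s acknowledgment arrives, mirroring the ordering constraints discussed below.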
Figure 11.1 Generic signaling flow for session control. [Messages between caller and callee: (1) session setup request; (2) “I’m working on it”; (3) session setup confirmation; (4) negotiate terminal capabilities; (5) establish bearer; voice and/or video packets flow; (6) end participation in session; (7) acknowledge end of session.]
Note that steps (4) and (5) are shown with bidirectional arrows: at least in principle, either endpoint could initiate the interchange. The session ends when (6) one endpoint announces its intention to withdraw from the session and (7) the other endpoint acknowledges. Of course, we could reverse the arrows for steps (6) and (7); the main thing is that the two point in opposing directions.

Why are there so many steps, even in a simplified flow? The session’s bandwidth requirements may not be known until (4) negotiate terminal capabilities is complete. So, at least in networks with explicit bandwidth reservation and connection admission control, step (4) is a prerequisite for (5) establish bearer. Moreover, it does not make sense to perform step (4) until we know that the called party is available and willing. Thus (3) session setup confirmation is a prerequisite to step (4). If calling and called parties are Voice over IP entities on the same LAN, steps (2) and (3) of the signaling flow may occur very close together in time. If interworking between domains is necessary, or even if requests must be authenticated and approved by a controller (not shown in the figure), then the delay between steps (2) and (3) may well be nontrivial. In this case, step (2) allows the calling party’s protocol state machine to set its internal timers appropriately.1 Some protocols piggyback the codec negotiation of step (4) on other messages in the call flow. The primary reason for this is to speed things up. In this case, a separate step (5) may also be unnecessary. We chose to show the codec negotiation as a separate step to emphasize that this function has no counterpart in traditional telco signaling (read Section 11.1.1 for more on this point).
1. This state machine may have one timer for the gap between its dispatch of the message in step (1) and receipt of the acknowledgment in step (2); if the timer expires, the state machine might reissue the setup request on the assumption that the original request was lost. Once step (2) has completed, a second timer may govern the state machine’s willingness to wait for step (3); this timer may be set to a higher value than the first timer.
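The two-timer behavior described in footnote 1 can be sketched as follows. The timer values, retry limit, and message names are assumptions made for illustration; they are not drawn from any particular protocol specification.

```python
T1_ACK = 0.5       # seconds to wait for the step (2) acknowledgment (assumed)
T2_CONFIRM = 30.0  # longer wait allowed for the step (3) confirmation (assumed)
MAX_RETRIES = 3

def await_confirmation(receive, send_setup):
    """Caller-side timer logic: retransmit the setup request until step (2)
    arrives, then wait (longer) for step (3). `receive(timeout)` returns the
    next message or None on timeout; `send_setup()` (re)issues the request."""
    send_setup()
    for _ in range(MAX_RETRIES):
        msg = receive(timeout=T1_ACK)
        if msg == "PROCESSING":                        # step (2) received
            return receive(timeout=T2_CONFIRM) == "SETUP-CONFIRM"
        send_setup()                                   # assume request was lost
    return False                                       # give up

# Scripted transport: the first poll "times out," then the peer responds.
replies = iter([None, "PROCESSING", "SETUP-CONFIRM"])
sent = []
ok = await_confirmation(lambda timeout: next(replies),
                        lambda: sent.append("SETUP-REQUEST"))
print(ok, len(sent))  # True 2: success after one retransmission
```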
11.1.1 Comparison with ISUP Call Flow
It is worthwhile to compare the signaling flow of the previous section with a basic ISUP call flow (see Figure 6.8). The initial session setup request [step (1) in the generic signaling flow of Figure 11.1] takes the form of an ISUP IAM. Step (2) “maps” to an ACM. No codec negotiations need to take place, so step (4) has no ISUP counterpart. Moreover, steps (3) and (5) coincide. Together, they correspond roughly to an ISUP ANM. Steps (6) and (7) correspond, respectively, to ISUP REL and RLC messages.

The most important distinction between an ISUP call flow and that of Figure 11.1 is this: the participants in the ISUP call flow are switches, whereas the participants in the latter are the end users’ terminals. Telephone sets have traditionally possessed very little intelligence. In the typical legacy deployment, this means that essentially all of the intelligence (and, perhaps more importantly, the vast majority of control capabilities) resides in the network.

11.1.2 Modularity in Protocol Design
Just as large-scale software projects are modular, protocol design should be modular. We discussed this concept when we introduced protocol stacks in Chapter 6; a given module (or layer) relies on the services of the layer below it and provides services to the layer above it. In the current context of control and service planes, numerous capabilities are required. Modularity is still important: it is expedient to subdivide the necessary functionality into a variety of protocol specifications, even when the protocols reside in different planes. We need a way for media gateways to talk to their controllers (the subject of Chapter 10), for end systems to talk to each other and to various network elements, and so on.

Of course, a significant degree of modularity is already present in the legacy case: ISUP call-control signaling is specified separately from the G.711 codec, for example. There was an underlying assumption that the two protocols would go together, however, embodied in the fact that the switching infrastructure works in 64 kbit/s “quanta.” This was entirely reasonable, considering that ISUP had very limited “competition” and G.711 had essentially none at all. (For completeness, we note that Telephone User Part preceded ISUP and is still used in some parts of the world.)

These days, there is an additional question of how constituent protocols should fit together in a full-featured network deployment. The pundits keep saying that the current era is one of fundamental change. For the most part, we are inclined to agree, although we fully expect such change to be painfully slow. What is beyond debate is this: there are many people who think they understand the best way to effect fundamental change and are driving a variety of standards bodies to formulate new protocols at a dizzying rate. Many good ideas are being promulgated in the standards bodies; there is also quite a bit of one-upmanship.
In the current era of protocol proliferation, modular protocol design is all the more important. Suppose, for example, that we have settled on a particular media gateway control protocol and now need to select a protocol for media gateway controllers to talk to each other. It would be nice if, after making the first selection, we still had complete freedom in making the second selection. Ideally, a “clean” design process would yield this sort of independence, or at least minimize (and clearly document) any dependencies. A second goal is to interoperate gracefully with existing protocols. (Similarity with existing protocols, especially those that enjoy large embedded bases, eases the learning curve for humans and arguably eases the software implementation process.) Unfortunately, the second goal usually competes with rather than complements the first goal.

There are multiple approaches to the question, “What is the best mix of protocols?” for a given network deployment. For example, an explicit “umbrella” standard could specify a collection of protocols that interwork to form a coherent whole. At the other end of the spectrum, protocol combinations could be addressed by flexible recommendations or left entirely to the discretion of operators and/or equipment vendors. Note that, in either case, agreement on the scope of each candidate protocol is very important. Otherwise, it would be difficult to avoid overlap in functionality while assuring that a given collection of protocols is adequate for the task at hand.
11.2 The H.323 Protocol Suite

H.323 [1], which was developed by the ITU-T, is an “umbrella” specification of the sort just mentioned. H.323 “intersects” the bearer plane as well as the control plane. In particular, this standard covers codecs; support for G.711 voice is mandatory, whereas support for video is optional. Any H.323 terminal that does have video capabilities must, at a minimum, support the Quarter Common Intermediate Format (QCIF) specified in [2]. The standard lists other recommended audio and video codecs.

Another interesting thing about H.323 is that it incorporates ITU-T standards (e.g., the standards mentioned in the previous paragraph) as well as IETF specifications. In the bearer plane, for example, encoded audio and/or video streams are transported as RTP payloads. We have RTP (along with RTCP) running over UDP/IP.

H.323 Terminology
In the H.323 lexicon, the end user’s communication device is called a terminal. Wherever an H.323 network is connected to a legacy circuit-switched network, a gateway takes care of the necessary interworking (in the bearer plane as well as the control plane). A gatekeeper controls other network elements such as terminals, gateways, and multipoint controllers, which in turn exert control over multiparty conferences. Gateways, gatekeepers, and multipoint controllers are not mandatory: it is possible to implement the scenario of Figure 11.1 with two terminals and no other H.323 entities.

11.2.1 Heritage of H.323: ISDN
Intelligent end systems are de rigueur in data networking. Evolution toward intelligent terminals is not an entirely new concept when it comes to telephony, either: this was the goal of ISDN. ISDN call-control signaling is specified in ITU-T recommendation Q.931 [3]. Q.931 signaling is used as a point of departure not only by H.323 but also in ITU-T’s development of its ATM User Network Interface.

The two ISDN phones in Figure 11.2 are served by different telco switches. Between the phones and their serving switches, we have ISDN signaling. The two switches signal each other using ISUP messages. To help the reader keep track, we labeled the signaling domains at the top of the diagram. ISDN and ISUP are not exactly the same, but the ITU-T did make every effort to harmonize these standards. So SETUP is very similar to (and compatible with) IAM. ALERTING and ACM serve much the same purpose, as do CONNECT and ANM, and so on. Note that the originating phone receives two “progress alerts”: CALL PROCeeding and ALERTING. The originating switch sends the former to let the calling phone know that it has received the SETUP and is trying to contact the far-end switch. ALERTING, on the other hand, means that the far-end telephone has been contacted and is processing the SETUP (for example, it may be ringing).

Figure 11.2 looks “asymmetric” in the sense that no RELEASE/REL COM interchange takes place between the destination switch and the called party’s phone. In the figure, we assume for the sake of discussion that the calling party hangs up first. This is what triggers his/her phone to send a DISCONNECT message to the originating switch, which in turn RELeases the ISUP trunk, and so on. At the end of this example, the called party’s phone is not yet “on hook.”

11.2.2 H.323 Call Control and Media Control Signaling
Figure 11.2 ISDN call flow. [ISDN signaling (SETUP, CALL PROC, ALERTING, CONNECT, DISCONNECT, RELEASE, REL COM) runs between each phone and its serving switch; ISUP signaling (IAM, ACM, ANM, REL, RLC) runs between the originating and destination switches.]

H.323 signaling is specified in recommendations H.225.0 [4] and H.245 [5]. The content of the former is further subdivided into two major pieces: call-control signaling (this is similar to Q.931, on which it is based) and registration, admission, and status (RAS) signaling. H.245 is used for tasks like codec negotiation between endpoints.

In Figure 11.3 we display an H.323 signaling flow. Since H.225.0 and H.245 are both present, we identify the pertinent protocol alongside each message name. As in Figure 11.1, we make the simplifying assumption that the terminals signal one another directly. Note that no RAS signaling appears in this diagram. The numbering of the messages is intended to help the reader “map” the H.225.0 messages to the generic flow that appears in Figure 11.1. The fact that the calling party receives two “status update” messages (i.e., CALL PROCeeding and ALERTING) comes directly from H.323’s ISDN “roots.” This makes more sense when intermediate switches are present in the signaling flow (as in Figure 11.2). Note also that the H.225.0 messages in Figure 11.3 have the same names as their ISDN counterparts in Figure 11.2. In this example, codec negotiation and bearer establishment are the province of H.245. H.323 also defines a mode in which the necessary information about terminal capabilities is piggybacked on the H.225.0 call-control messages.

11.2.3 Talking to the Gatekeeper: RAS Signaling
Figure 11.3 Simplistic H.323 session control flow. [Between originating and destination terminals: (1) H.225.0 SETUP; (2a) H.225.0 CALL PROC; (2b) H.225.0 ALERTING; (3) H.225.0 CONNECT; (4 and 5) H.245 session establishment; voice and/or video packets flow; (6) H.245 session release; (7) H.225.0 RLC.]

By itself, the functionality reflected in Figure 11.3 is impractical for all but the tiniest deployments. Large deployments necessitate additional capabilities such as address translation, authorization, and admission control. H.323 gatekeepers are responsible for these functions and a few others:

• Address translation makes it possible to place a call using a phone number, e-mail address, or H.323 URI. The gatekeeper is responsible for resolving these IDs to (IP address, port number) pairs.

• Call authorization and admission control are both involved in deciding whether requests will be granted. The former covers things like basic registration, security, who is allowed to call whom, and so on. Admission control determines whether adequate resources are available to serve a given call request.

• By definition, a zone is the set of endpoints managed by a single gatekeeper. Zone management takes care of tasks like adding a new endpoint to a zone and removing an endpoint from a zone.

• Call management controls things like call-forwarding behavior.
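The first two gatekeeper functions can be sketched in a few lines of Python. Everything here (the class shape, the capacity figure, the alias and address values) is our own illustrative assumption; real RAS messages carry far more information.

```python
# Toy gatekeeper: alias -> (IP address, port) registrations plus a simple
# admission check against a fixed call capacity (assumed for illustration).
class Gatekeeper:
    def __init__(self, capacity):
        self.registry = {}        # zone membership: alias -> transport address
        self.capacity = capacity  # max simultaneous calls we will admit
        self.active_calls = 0

    def register(self, alias, ip, port):   # cf. Registration Request/Confirm
        self.registry[alias] = (ip, port)

    def resolve(self, alias):              # address translation
        return self.registry.get(alias)

    def admit(self):                       # cf. Admission Request/Confirm
        if self.active_calls < self.capacity:
            self.active_calls += 1
            return True
        return False                       # admission rejected

gk = Gatekeeper(capacity=1)
gk.register("alice", "192.0.2.10", 1720)   # 1720: H.323 call-signaling port
print(gk.resolve("alice"))                 # ('192.0.2.10', 1720)
print(gk.admit(), gk.admit())              # True False: capacity exhausted
```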
We give an example of H.225.0 RAS signaling in Figure 11.4. In the interest of brevity, this figure only shows call authorization (this is the Registration Request/Registration Confirm interchange) and admission control (the Admission Request/Admission Confirm interchange). The signaling flows of the two figures in this section should actually be dovetailed as follows. In a network governed by a gatekeeper, both terminals must complete the registration shown in Figure 11.4 (and the originating terminal must obtain permission to place the call in the form of an Admission Confirm message) before the flow of Figure 11.3 can begin. The destination terminal does not send its Admission Request until it receives the SETUP message from the originating terminal. In effect, the destination terminal is asking for permission to answer the call; once it receives the Admission Confirm from the gatekeeper, the remaining steps in the call control flow of Figure 11.3 proceed as shown. In particular, the gatekeeper does not have to be involved. Note also that the H.225.0 SETUP and CALL PROCeeding messages do not have to traverse the gatekeeper.

Why Is RAS Signaling Necessary?

IP networking is typically a more dynamic environment than that of traditional telephony, and RAS signaling is necessary to “fill in the gaps.” Endpoints might often be moved from one zone to another, their identifiers might take a variety of forms, and so on. H.323 endpoints must be able to signal their identities to their gatekeepers, due to the simple fact that “nailed up” connections are atypical.

11.2.4 Evolution of H.323
Figure 11.4 H.323 RAS signaling. [Terminal-to-gatekeeper exchange: H.225.0 Registration Request / Registration Confirm, followed by H.225.0 Admission Request / Admission Confirm.]

H.323 version 4, which appeared in 2000, brought major enhancements that made it more practical for large telco deployments. The security framework is more robust than in previous versions. Some pragmatic tunneling capabilities were also added. With this release, H.323 (and the “component” protocols under its aegis) reached a level of relative maturity. At the time of this writing, version 5 is in the final stages of approval. Version 5 is being characterized as a maintenance release that solidifies the H.323 protocol family’s basic functionality. The interested reader can keep track of the latest developments by consulting www.h323forum.org.

Tunneling
Suppose two circuit-switched telephone subscribers want to talk to each other, and the optimal bearer path traverses an intermediate H.323 domain. Imagine, for example, that two PBXs are interconnected via an H.323 network. Early versions of the H.323 standard did not address this scenario directly. Version 4 introduced a means to “tunnel” ISUP messages (i.e., encapsulate entire ISUP messages within H.225.0 payloads) through an H.323 domain. This makes the presence of the intermediate H.323 domain transparent to ISUP entities that signal each other across that domain. Tunneling capabilities are also specified for PBX signaling protocols.
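Tunneling of this sort treats the inner message as an opaque byte string carried in the outer protocol’s payload. The framing below is a deliberately simplified sketch of the idea; the real H.225.0 encapsulation is ASN.1-encoded and looks nothing like this.

```python
import struct

def encapsulate(outer_header: bytes, inner_msg: bytes) -> bytes:
    """Wrap an opaque inner message (say, an entire ISUP message) in an
    outer message, prefixing the payload with a 2-byte length."""
    return outer_header + struct.pack("!H", len(inner_msg)) + inner_msg

def decapsulate(frame: bytes, header_len: int) -> bytes:
    """Recover the inner message byte for byte; the tunnel never parses it."""
    (length,) = struct.unpack_from("!H", frame, header_len)
    start = header_len + 2
    return frame[start:start + length]

isup_iam = b"\x01\x00\x0a"             # placeholder bytes standing in for an IAM
frame = encapsulate(b"H225", isup_iam)
print(decapsulate(frame, header_len=4) == isup_iam)  # True: transparent end to end
```

The point of the sketch is the transparency property: the ISUP entities at either end see exactly the bytes they sent, regardless of what the intermediate domain does with the outer message.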
11.3 SIP Basics

We begin our discussion of SIP by casting our generic signaling flow in terms of SIP messages. The result appears in Figure 11.5. As in Figures 11.1 and 11.3, this example’s simplicity is deceptive: it does not reflect the challenges one encounters in large-scale deployments. There is also a major contrast: Figure 11.1’s steps (4) negotiate terminal capabilities and (5) establish bearer apparently lack counterparts in the SIP call flow of the current section. As we will see later, the initial SIP INVITE’s payload usually contains information about supported codecs. So the terminal capability negotiation is not really missing; in fact, it begins right away. Recall from Section 11.2.2 that H.323 offers a similar option (notwithstanding the fact that Figure 11.3 shows “dedicated” terminal negotiation messages).
Figure 11.5 Simplistic SIP signaling flow. [Between user agent 1 and user agent 2: (1) INVITE; (2) 180 Ringing; (3) 200 OK; ACK; voice and/or video packets flow; (6) BYE; (7) 200 OK.]
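To make the INVITE of step (1) concrete, the snippet below assembles a bare-bones SIP INVITE carrying an SDP payload. The addresses, tag, and Call-ID are invented for illustration, and a production message would include further mandatory headers (Max-Forwards, Contact) that we omit for brevity.

```python
# A minimal SDP offer: connection address and an audio line proposing
# two codecs by RTP payload type (0 = PCMU/G.711 mu-law, 8 = PCMA/A-law).
sdp_body = "\r\n".join([
    "v=0",
    "o=antigone 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
]) + "\r\n"

# SIP request: start line, headers, blank line, then the SDP payload.
invite = "\r\n".join([
    "INVITE sip:creon@thebes.example SIP/2.0",
    "Via: SIP/2.0/UDP 192.0.2.10:5060",
    "From: <sip:antigone@greek_tragedies.com>;tag=1928301774",
    "To: <sip:creon@thebes.example>",
    "Call-ID: a84b4c76e66710",
    "CSeq: 1 INVITE",
    "Content-Type: application/sdp",
    f"Content-Length: {len(sdp_body)}",
    "",
    sdp_body,
])
print(invite.splitlines()[0])  # INVITE sip:creon@thebes.example SIP/2.0
```

The codec offer rides in the message body, which is why the generic steps (4) and (5) have no standalone counterparts in Figure 11.5.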
The establish bearer step is missing from SIP. In cases where resources are explicitly allocated to sessions, that process is carried out by other protocols; SIP has no resource reservation capabilities. User agent 1 does, however, confirm receipt of the 200 OK response (to its own INVITE message) with an ACK message. The “INVITE-200 OK-ACK” exchange is called a three-way handshake. Another thing to take away from Figure 11.5 is that SIP endpoints are called user agents (UAs); these are roughly comparable to H.323 terminals. (There does, however, seem to be a difference in mentality: we think of a UA as a collection of software, whereas the word “terminal” makes us think “hardware.”)

SIP is defined in RFC 3261 [6]; numerous other RFCs define extensions of SIP, make recommendations regarding its use with other protocols, and set forth use cases. Before delving into details, it is worthwhile to give a brief overview of what SIP is and what it is not. SIP’s design is loosely based on that of HTTP [7]. In RFC 3261, the authors subdivide SIP’s functionality into five major categories:

• User location: Which end system(s) will be involved in the session?

• User availability: Are the called party(s) willing to communicate?

• User capabilities: What media should be employed for this session, and what are the associated parameter settings?

• Session setup: Once the previous questions about the user(s) have been resolved, this is the function that establishes session parameters for all parties.

• Session management: This is a catch-all that covers transfer/termination of sessions, modification of session parameters, and service invocation.
It is important to note that SIP does not provide these functions all by itself; it works in concert with other protocols. In particular, SIP per se does not know how to describe media types or set the associated parameters. SDP [8] is the current favorite for this task, but the authors of SIP purposely “left the door open” for other protocols to play this role. It bears repeating that SIP does not support resource reservation; in fact, it has no QoS “hooks” whatsoever. More than anything, SIP is a signaling framework. That is, SIP facilitates exchange of information among session participants. The exact type and format of that information varies with the intended application and with the protocols used in concert with SIP.

SIP Identifiers

SIP users are usually “named” by URIs. SIP URIs resemble e-mail addresses prefixed with the scheme sip:. Examples include:

sip:antigone@greek_tragedies.com
sip:antigone@greek_tragedies.com:5001

Note from the second example that port numbers can be explicitly specified; if these are not present, well-known port numbers are assumed. Telephone numbers also make nice identifiers: tel:12025551212@washdc_gateway.com is an example (telephone URIs are defined in RFC 2806 [9]). Other types of URIs exist (and can be used with SIP); we do not present an exhaustive list.
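A rough parser for the SIP URI forms shown above. This is a sketch only: RFC 3261’s actual grammar also allows URI parameters and headers, which we ignore, and 5060 (SIP’s well-known port) is assumed when no port is given.

```python
def parse_sip_uri(uri: str):
    """Split 'sip:user@host[:port]' into its parts; default to port 5060."""
    scheme, _, rest = uri.partition(":")
    if scheme != "sip":
        raise ValueError("not a sip: URI")
    user, _, hostport = rest.partition("@")
    host, _, port = hostport.partition(":")
    return user, host, int(port) if port else 5060

print(parse_sip_uri("sip:antigone@greek_tragedies.com"))
print(parse_sip_uri("sip:antigone@greek_tragedies.com:5001"))
```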
DNS functionality is used to resolve domain names in SIP URIs to IP addresses. (We discussed DNS in Section 7.3.3.) The resulting syntax is similar; the URIs of the previous paragraph might translate to sip:antigone@<IP address> and sip:antigone@<IP address>:5001.
The latter indicates that a SIP entity will be listening on port 5001 at the IP address shown and that the name of the associated UA is antigone.

11.3.1 SIP Requests and Responses
There are two types of SIP messages: requests (which are also called methods) and responses. Actually, RFC 3261 does make a distinction between request and method by saying that a method is “the primary function that a request is meant to invoke on a server.” However, we will not be careful to maintain this distinction. By definition, a SIP entity plays the role of client when it generates a request and plays the role of server when it responds to a request. To avoid confusion, it is important to be aware that SIP entities, including user agents, routinely play both roles. To clarify this point, let us refer to Figure 11.5. When user agent 1 sends the initial INVITE, it plays the role of client; user agent 2 acts as server when it sends 180 Ringing and 200 OK messages. At the end of the signaling flow, the roles are reversed: when user agent 2 sends the BYE message (which SIP defines as a request), it acts as a client.

Requests. As of this writing, 13 SIP methods are defined in standards-track RFCs (i.e., there are 13 types of SIP requests). Table 11.1 lists the six methods defined in RFC 3261 itself. This serves to give the reader a sense of SIP’s “base” capability set. We already introduced INVITE, ACK, and BYE in Figure 11.5. (Strangely, SIP classifies ACK as a request.)

Table 11.1 SIP Methods Defined in RFC 3261

REGISTER: Used to create bindings between a URI and one or more contact addresses.
INVITE: Used to initiate a session.
ACK: Used by session originator to confirm receipt of a final response to its INVITE request.
CANCEL: Used to abandon a request that is still pending.
BYE: Used to terminate a session.
OPTIONS: Used to query the capabilities of a SIP server or client.

We said that SIP relies on DNS functionality to resolve URIs to IP addresses. (It would be more accurate to say that SIP proxy servers rely on DNS functionality; we discuss proxy servers later.) How are IP addresses bound to URIs in the first place? The REGISTER method provides a way for a user agent to establish or dissolve such bindings dynamically. As a simple example, a subscriber may be REGISTERed when he/she arrives at work in the morning. The SIP URI resolves to the subscriber’s work IP address. Upon returning home in the evening, the subscriber’s UA updates the URI-IP address binding to reflect the change in location. The SIP URI now resolves to the subscriber’s home IP address.

To illustrate the usefulness of the CANCEL method, let us vary the example of the previous paragraph: suppose the subscriber REGISTERs at both locations simultaneously so that phones in both places will ring whenever a call comes in. SIP servers support such “forking” behavior. If the subscriber answers at one location, the server will CANCEL the INVITE sent to the other location.

We said that, in the sample call flow of Figure 11.5, information about codec support is piggybacked on the initial INVITE message; if user agent 2 sees a codec it “likes” in the INVITE payload, it can specify this codec in its 200 OK message and the call can proceed. There is another way to go about it: user agent 1 could have queried user agent 2’s capabilities using the OPTIONS method and populated the INVITE based on the resulting information. (In this scenario, the OPTIONS request would precede the INVITE.) Table 11.1 does not tell the whole story: seven other SIP methods are defined outside RFC 3261. We defer discussion of these additional SIP methods until Chapter 12.

Responses. SIP response status codes, of which there are many, can be grouped into six categories. Each status code consists of three decimal digits, with the first digit indicating the code’s category. “1xx” responses are known as provisional responses; they report on requests whose processing is not yet completed. All of the other response categories are final responses. Receipt of a final response indicates that processing of the associated request is now complete. The response categories are listed and briefly described in Table 11.2.
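The first-digit rule and the provisional/final distinction just described can be captured in a few lines; the category names follow Table 11.2.

```python
CATEGORIES = {
    1: "Information", 2: "Success", 3: "Redirection",
    4: "Client error", 5: "Server error", 6: "Global failure",
}

def classify(status_code: int):
    """Return (category name, is_final) for a SIP status code."""
    category = CATEGORIES[status_code // 100]  # first digit picks the category
    return category, status_code >= 200        # only 1xx responses are provisional

print(classify(180))  # ('Information', False): provisional, e.g. 180 Ringing
print(classify(200))  # ('Success', True): final
print(classify(302))  # ('Redirection', True): final
```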
Table 11.2 SIP Response Message Categories

1xx (Information): Used to indicate status of a request in progress. Examples: 100 trying; 181 call being forwarded.
2xx (Success): Self-explanatory. Example: 200 OK.
3xx (Redirection): Further action required. Examples: 301 moved permanently; 302 moved temporarily.
4xx (Client error): Receiving server could not process the request. Examples: 401 unauthorized; 404 not found.
5xx (Server error): Request processing failed although the request was valid. Example: 503 service unavailable.
6xx (Global failure): Self-explanatory. Example: 600 busy everywhere.

11.4 SIP Functional Entities

So far, we have only been exposed to UAs. These are the SIP entities in the end users’ terminals. SIP networks also feature proxy servers and redirect servers. Among other things, these elements locate “called” UAs on behalf of “calling” UAs; they are routing intermediaries, in other words.

11.4.1 Proxy Servers and Redirect Servers
The word server means more than one thing in the SIP lexicon. To avoid later confusion, we take some care here to distinguish between two uses of this term:

• Recall that every SIP message is either a request or a response; we say that SIP is a request/response protocol. In this context, server is simply a term for an entity that responds to a request. Various SIP entities, including user agents and proxy servers, can respond to requests (i.e., act as servers) as well as generate requests (i.e., act as clients). Thus, the “server” role is a transient one. To summarize, this use of the term refers to an entity’s role in a particular message exchange, not to its “role in life.”

• A server is the source of something you need. Here are some concrete examples: file servers provide access to storage media, Web servers provide content, and authentication servers provide access to things like subscriber passwords. In this context, the “server” role is indicative of the device’s “mission in life” (that is, the role is static).

The terms proxy server and redirect server should be interpreted in light of the second context. The difference between the two is what they provide:

• Proxy servers forward requests and responses. For example, proxy servers forward INVITEs toward destination user agents. Proxy servers can also forward responses toward originating user agents. There are two types of proxy servers:
  • Stateful proxies (temporarily) keep track of the requests they forward. Stateful proxies are further subdivided into transaction stateful and call stateful proxies. To describe the difference between the two, suppose we have a successful session. Roughly speaking, a call stateful proxy retains information about that session from INVITE until BYE. (Thus, the proxy must allocate memory for the purpose.) A transaction stateful proxy that is not call stateful “remembers” the INVITE until it receives the 200 OK response; this entity realizes that these two messages belong to the same transaction and, since 200 OK is a final response, discards all information about this transaction. When the BYE request comes along, the transaction stateful proxy regards it as something completely new.
  • Stateless proxies “forward ‘em and forget ‘em.” That is, stateless proxies do not retain any information about the SIP messages they forward.

• Redirect servers do not forward INVITEs. Instead, they respond to the inviting user agents (or their proxies) with information that will assist them in reaching the subscribers they wish to invite. The user agents or proxies are then responsible for reissuing the INVITEs. So redirect servers provide routing information.
When a proxy server forwards an INVITE (from a UA, say) it might send a 1xx response (recall that 1xx responses are provisional responses, so the INVITE is still pending after the UA receives such a response). On the other hand, when a redirect server responds to an INVITE, it does so in the form of a 3xx response. Such a response is final, so after the INVITE’s issuer receives and processes the 3xx response, that INVITE transaction is finished. A new INVITE will then be issued (this INVITE is populated with information gleaned from the 3xx response).

11.4.2 Back-to-Back User Agents
We have seen that a stateless proxy is a signaling pass-through that maintains no awareness of SIP sessions. A back-to-back user agent (B2BUA) is an entity that resides at the opposite end of the spectrum, so to speak. The B2BUA’s job is to engage in separate SIP sessions with end users and fool those users (or rather the SIP software in their terminals) into thinking that they are participating in end-to-end sessions. One could say that B2BUAs terminate SIP signaling (but in a particular way). B2BUAs are far from stateless: they have to retain bindings between identifiers associated with “incoming” and “outgoing” sessions. To illustrate, suppose user 1 wants to INVITE user 2 to a SIP session and that there is a B2BUA situated between the two users. Rather than passing user 1’s INVITE to user 2, the B2BUA crafts a completely new INVITE. When user 2’s 200 OK comes back, the B2BUA must correlate that message to its own INVITE and then in turn to the original triggering INVITE received from user 1. Why would a carrier implement a B2BUA? As we will see in Chapter 12, SIP messages can accumulate a substantial amount of routing information as they travel from one proxy to another; there is also information that identifies end users. If a service provider’s users want to be anonymous, or if the service provider does not want to disclose addresses of SIP proxies within its network, then B2BUAs might be a pragmatic choice. B2BUAs can also be useful when it comes to traversing firewalls and NATs.

11.4.3 Registrars
One other SIP functional entity bears mentioning: this is the registrar. As the name suggests, this entity can accept and process REGISTER requests. Note that proxy servers, redirect servers, and registrars are all functional entities—a single network element might incorporate more than one of these entities. Before moving on to other protocols, we note that much more information on SIP appears in Chapter 12. Among other things, that chapter presents detailed SIP signaling flows. At this point, it may be worthwhile to glance ahead at Figures 12.1 and 12.2. These are the overview diagrams for our detailed signaling flows and may serve to flesh out the current discussion of SIP entities in the reader’s mind.
References

[1] Recommendation H.323 Version 4, Packet-Based Multimedia Communications Systems, ITU-T, 2000.
[2] Recommendation H.261, Video Codec for Audiovisual Services at p × 64 kbit/s, ITU-T, March 1993.
[3] Recommendation Q.931, ISDN User Network Interface Layer 3 Specification for Basic Call Control, ITU-T, May 1998.
[4] Recommendation H.225.0 Version 4, Call Signaling Protocols and Media Stream Packetization for Packet-Based Multimedia Communication Systems, ITU-T, 2000.
[5] Recommendation H.245 Version 8, Control Protocol for Multimedia Communication, ITU-T, 2001.
[6] Rosenberg, J., et al., RFC 3261, SIP: Session Initiation Protocol, IETF, June 2002.
[7] Fielding, R., et al., RFC 2616, Hypertext Transfer Protocol—HTTP/1.1, IETF, June 1999.
[8] Handley, M., and V. Jacobson, RFC 2327, SDP: Session Description Protocol, IETF, April 1998.
[9] Vaha-Sipila, A., RFC 2806, URLs for Telephone Calls, IETF, April 2000.
CHAPTER 12
More on SIP and SDP

SDP [1] was produced by IETF’s Multiparty Multimedia Session Control (mmusic) working group. Incidentally, the original version of SIP (RFC 2543, now obsolete) also came out of the mmusic working group. But SIP activity mushroomed; as a result, a separate SIP working group was chartered.
12.1 A Detailed SDP Example

Recall that SDP is text-based. Each line in an SDP session description takes the form <type>=<value>, where <type> is always exactly one (case-sensitive) character. SDP can be used to convey a wide range of session information. Before examining the following session description in detail, let us note the following caveat: there is really no such thing as an SDP packet. SDP information is always encapsulated within another protocol’s packet (prime examples are Megaco, MGCP, and SIP). To underscore this point, SDP is sometimes called a “metaprotocol.”

v=0
o=- 3240312009 3240312009 IN IP4 192.168.0.30
s=Standalone SDP Example
c=IN IP4 224.0.12.17/15
t=3240312009 0
m=audio 10108 RTP/AVP 0 100
m=video 52170 RTP/AVP 31
a=rtpmap:0 PCMU/8000
a=rtpmap:100 telephone-event/8000
a=ptime:20
a=fmtp:100 0-11
The first line of the session description gives the SDP version number. The format of the “o=” (owner/creator and session identifier) line is

o=<username> <session id> <version> <network type> <address type> <address>

In the “o=” line, the “-” is a placekeeper; the username subfield is basically null. The SDP RFC recommends (but does not mandate) the use of Network Time Protocol (NTP) [2] time stamps for the session ID and version subfields. In this example, both fields are in fact NTP timestamps; this means that their values can be interpreted as the number of seconds since January 1, 1900. The point of all this is to ensure (or at least make it very likely) that the 5-tuple <username> <session id> <network type> <address type> <address> is a globally unique identifier for the session at hand, and moreover that distinct versions of this session will not be confused. The last three subfields say that the host lives in an IPv4 network and disclose its IPv4 address. The “s=” (Session Name) field is self-explanatory. The subfields of the “c=” (connection data) line, which are of the form

<network type> <address type> <connection address>

often coincide with the last three subfields of the “o=” line. The current example is an exception, however: the IP address shown is a multicast address (as are all IP addresses with a number between 224 and 239, inclusive, preceding the first “.”). The number after the “/”, which is called a TTL, says that no packet from this session shall traverse more than 15 IP routers. Without giving details on IP multicast, let us say that it is wise to include a TTL here to avoid congestion. The “t=” line consists simply of <start time> <stop time>; again these subfields are NTP time stamps, converted to decimal. This feature is useful for scheduled conferences. In cases where we do not wish to impose a time limit, a stop time of 0 is given. The SDP RFC says that, if the start time is also 0, the session is regarded as permanent. From our point of view, this overstates the case: it is reasonable to indicate start and stop times of 0 for sessions that will be set up and torn down dynamically (as we have done in the Megaco examples of Chapter 10). We do not really mean that such sessions are permanent, but only that they are not scheduled. The “m=” (media name and transport address) lines are easy to parse once one knows that their format is

m=<media> <port> <transport> <fmt list>
We see that there are audio and video streams assigned to UDP ports 10108 and 52170, respectively. Next we see that both streams are transported over RTP, and that the item(s) in <fmt list> refer to the AVP [3]. Looking first at the audio stream, the first payload type is 0, which AVP statically assigns to G.711. The second payload type, 100, is dynamically assigned; more on the latter in a moment. For the video stream, payload type 31 is H.261. Next we have several “a=” (media attribute) lines. In general, the format can be a=<attribute> or a=<attribute>:<value>. The last four lines of our session description can be interpreted as follows:

• The first of the session attributes says that payload type 0 is G.711 (pulse code modulation, µ-law, sampled at 8,000 Hz); this actually reiterates the assignment given in the AVP so it is not, strictly speaking, necessary.

• The second “a=” line binds payload type 100 to the RTP payload type “telephone-event,” which is defined in [4].

• We have seen “a=ptime:20” before; this session attribute means that an RTP packet will be transmitted every 20 milliseconds.

• The last line of the entire session description conforms to the format a=fmtp:<format> <list of values>, which is used for format-specific attributes. SDP is not concerned with the semantics of format-specific attributes, which are
conveyed unchanged to the entities that will employ the formats in question. In this example, we are completing the task begun in the second “a=” line by listing which telephone-event types are supported. We will save the reader from suspense by saying that telephone-events 0 through 11 are associated with the DTMF “digits” 0-9, *, and #.

12.1.1 Additional Line Types
Most of the lines in this session description are mandatory. Exceptions include:

• The “c=” line can be omitted if the requisite information is included “in all media” (although we have never actually seen it omitted).

• The SDP “syntax” does not require that any “a=” lines be present (although the sample session description would be ambiguous if those lines were stripped out).

The SDP specification defines a number of other line “types” that do not appear in our examples: they are either not relevant or the encapsulating protocol already has a means to specify the same content. As an example of the latter, we could specify an “a=sendrecv” attribute in SDP. But in our Megaco examples (see Section 10.3.8), we used the mode property in Megaco’s LocalControl descriptor to accomplish the same task. Moreover, this makes good sense: if all we wanted to do was change the mode of a termination, it would be wasteful to send an entire SDP description. Additional optional SDP types include an “i=” line type for free-format session information, “e=” (email address) and “p=” (phone number) types, a “b=” line type for bandwidth information, and a “k=” line type for conveying encryption keys. RFC 3312 [5] extends SDP by defining media-level attributes for quality of service. We introduce a few of these additional attributes and briefly discuss their use in Section 12.6.
12.2 A Detailed SIP Example

In this section, we present SIP signaling flows and dissect the messages therein. Since SIP is text-based, we can print the messages verbatim and still make some sense of them (as we have done with MGCP, Megaco, and SDP). The flows involve a proxy server and a redirect server, so they are more realistic than the oversimplified flow depicted in Figure 11.5. As in that earlier example, there are only two users; they are called Zebra and BrerRabbit.

12.2.1 Registration Procedures

The users must first register. In Figure 12.1, we display the registration process for Zebra. We do not display BrerRabbit’s registration, as it is entirely similar. Note the similarity of the current example to H.323 RAS signaling as discussed in Section 11.2.3.
Figure 12.1 SIP registration. (Entities, as labeled in the figure: Zebra at 192.168.0.30, port 5001; proxy server at 192.168.0.51, port 5060; redirect server at 192.168.0.50, port 5070; BrerRabbit at 192.168.0.50, port 5000. Zebra sends REGISTER to the proxy server, which returns 100 Trying and forwards the REGISTER to the registrar/redirect server; the resulting 200 OK is relayed back to Zebra.)
We now display the first REGISTER message from the flow of Figure 12.1. Explanatory comments follow the message itself.

REGISTER sip:jackalope.tri.sbc.com SIP/2.0
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK001
To: Zebra<sip:[email protected]>
From: Zebra<sip:[email protected]>
Call-ID: [email protected]
CSeq: 1 REGISTER
Max-Forwards: 70
Expires: 60000
Contact: <sip:[email protected]:5001;user=phone>
Content-Length: 0
To understand the messages in this flow, it helps to know that the proxy server is running on a machine with host name jackalope and is listening at the “well-known” port for SIP (which is 5060). Also, Zebra is a display name for the SIP URI sip:[email protected]. We have labeled the entities in Figure 12.1 with their IP addresses and port numbers. Our purpose is to assist the reader in interpreting “raw” IP addresses and port numbers that appear in the messages. Note that, because we only had three machines at our disposal, the registrar/redirect server and BrerRabbit’s UA ran on the same host. Each of the four entities appearing in the figure has a different port number, however, so focusing on port numbers may help avoid confusion. The first line of the message above exemplifies the following general request-line format:

SIP-Method Request-URI SIP-Version

Zebra specifies in its Request-URI that it wants its proxy server to “field” the request. Zebra does not need to know where the registrar is, or even whether the registrar and proxy server are running on different machines. This message comes from a UA residing at the IP address and port number shown in the Via: line. This message asks the registrar to bind the SIP URI in the From: line to the SIP URI in the Contact: line. Thus it binds the former URI to a User Agent (UA) running at the IP address and port number contained in the latter URI. The Expires field is expressed in seconds; Zebra’s UA is asking that the registration endure for this amount of time (unless a subsequent message explicitly changes it). The Content-Length: line indicates that there is no additional payload; the SIP
header is the entire message. We postpone discussion of the other fields for now; their meanings are a bit easier to explain in later messages. The proxy server sends a provisional response to Zebra; the first line of that message is SIP/2.0 100 Trying. The general response-line format is

SIP-Version Status-Code Reason-Phrase
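Returning to the registrar’s side of this exchange: the From:-to-Contact: binding behaves like a small table whose entries lapse after the Expires interval. A toy Python model (ours; a real RFC 3261 registrar can hold multiple contacts per address-of-record):

```python
import time

class ToyRegistrar:
    """Bind an address-of-record (the From: URI of a REGISTER) to a
    contact URI until an absolute expiry time."""

    def __init__(self):
        self._bindings = {}  # aor -> (contact, absolute expiry time)

    def register(self, aor, contact, expires, now=None):
        now = time.time() if now is None else now
        self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None or now >= entry[1]:
            return None      # no binding, or the Expires interval lapsed
        return entry[0]

reg = ToyRegistrar()
# Hypothetical URIs for illustration (the book's URIs are partly redacted):
reg.register("sip:zebra@example", "sip:192.168.0.30:5001", expires=60000)
```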
Rather than reproduce the response message in its entirety, let us say that its remaining lines are exact copies of lines from the preceding REGISTER message (namely, the Via:, To:, From:, Call-ID:, CSeq:, and Content-Length: lines). The remaining lines from the REGISTER message are omitted from the 100 Trying response. Zebra’s UA knows that this response pertains to its pending REGISTER request because the From, Call-ID, CSeq, and Via fields match. (In fact, for any response, these fields must be identical to their counterparts in the associated request.) Next, the REGISTER message is forwarded by the proxy server.

REGISTER sip:192.168.0.50:5070 SIP/2.0
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK002
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK001
To: Zebra<sip:[email protected]>
From: Zebra<sip:[email protected]>
Call-ID: [email protected]
CSeq: 1 REGISTER
Max-Forwards: 69
Expires: 60000
Contact: <sip:[email protected]:5060>
Contact: <sip:[email protected]:5001;user=phone>
Content-Length: 0
The first Via: line in the message is new; it has been added by the proxy server. In this example, the proxy server knows (as a result of explicit configuration) the registrar/redirect server’s IP address and port number. These fields are “hard-coded” in the first Via: line. (We doubt that this is typical of live network deployments.) The second Via: line is unchanged from the first REGISTER message. Looking collectively at the two REGISTER messages we have seen so far, a pattern emerges. Keep in mind that these two messages are “incarnations” of the same request; the Via: line(s) reveal the path that the message has traversed thus far. This information will be used in routing responses. Note that a similar change has been made to the Contact: information. That is, the first Contact: line is new; the second Contact: line has been copied across from the previous REGISTER message. The To:, From:, Call-ID:, and CSeq: lines are exactly the same as in the earlier REGISTER message. Note that the Max-Forwards value has been decremented (so its function is just what the name implies). The registration attempt is successful, as indicated by the following message from the registrar to the proxy server.

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK002
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK001
To: Zebra<sip:[email protected]>;tag=c666e2c8
From: Zebra<sip:[email protected]>
Call-ID: [email protected]
CSeq: 1 REGISTER
Expires: 3600
Contact: <sip:[email protected]:5060>
Contact: <sip:[email protected]:5001;user=phone>
Content-Length: 0

Before forwarding the 200 OK message to Zebra, the proxy server strips out the first Via: line and the first Contact: line. These are the only changes the proxy server makes.

12.2.2 Making a Call
Now that our users have registered, they can make calls. For example, Zebra can call BrerRabbit, as illustrated in Figure 12.2. In the interest of space, we dropped the hosts’ IP addresses from the figure. As discussed earlier, the port numbers (which remain in the figure) are more useful to the reader anyway, since they uniquely identify the entities participating in the call flow.

Figure 12.2 Making a SIP “call.” (Entities, left to right: Zebra at port 5001, proxy server at port 5060, redirect server at port 5070, BrerRabbit at port 5000. Zebra sends INVITE to the proxy server, which returns 100 Trying and forwards the INVITE to the redirect server; the redirect server answers 302 Moved Temporarily, which the proxy server ACKs; the proxy server then sends a new INVITE to BrerRabbit; 180 Ringing and 200 OK travel back through the proxy server to Zebra; Zebra ACKs; voice and/or video packets flow; finally, Zebra sends BYE and BrerRabbit answers 200 OK.)

Let us start by looking at the INVITE that the proxy server forwards toward the redirect server. (In the figure, this is the second INVITE from the top.)

INVITE sip:[email protected]:5070;user=phone SIP/2.0
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK004
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK003
To: BrerRabbit<sip:[email protected]:5060;user=phone>
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 1 INVITE
Max-Forwards: 69
Subject: Let’s talk about SIP
Record-Route: <sip:[email protected]:5060;maddr=192.168.0.51>
Contact: <sip:[email protected]:5001;user=phone>
Content-Type: application/sdp
Content-Length: 201

v=0
o=- 3240312009 3240312009 IN IP4 192.168.0.30
s=
c=IN IP4 192.168.0.30
t=3240312009 0
m=audio 10108 RTP/AVP 0 100
a=rtpmap:0 PCMU/8000
a=rtpmap:100 telephone-event/8000
a=ptime:20
a=fmtp:100 0-11
The Via: lines in this message are similar to those of the second REGISTER message we examined in Section 12.2.1. The interpretation is the same: the request emanated from Zebra’s UA and went from there to the proxy server, which inserted the first Via: line. The To: and From: lines are intuitive here: they just say that Zebra is trying to call BrerRabbit. Note that the To: line essentially characterizes BrerRabbit as a user that is registered with the proxy server. This is because Zebra does not know where BrerRabbit currently is, or that the proxy server is not the registrar. It just relies on the proxy server to get the necessary information. The value of the Call-ID field is different here than in the registration procedure of the previous section. That signaling interchange was completed with the last 200 OK, and the old Call-ID referred to that interchange. (Thus the name Call-ID is a bit of a misnomer.) The first INVITE in Figure 12.2 kicks off an entirely new dialog. (Here note that RFC 3261 has a formal definition of the word dialog, but that we are being somewhat loose with this term.) This message features some fields we did not encounter in the previous section. Just like an e-mail message, an INVITE can have a subject; this is a text string that the receiving UA can display to the INVITEd user. This is also the first time we have seen Record-Route: and Content-Type: lines. The Record-Route: line was inserted by the proxy server. In so doing, it has asked that future requests in the same dialog be routed through it. The Content-Type: line indicates that an SDP payload follows the header; the Content-Length: field indicates that this payload is 201 bytes long. Notice the blank line between the SIP header and the SDP payload. In fact, every SIP header must end with a blank line, but this was hard to see in our previous message displays (because all of them had Content-Length 0). 
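The Via: discipline just described, add a Via: when forwarding a request and strip it when forwarding the response, can be sketched as follows. This is our illustration, with headers modeled as a simple Python dict:

```python
def forward_request(msg, own_via):
    """What a proxy does to a request: prepend its own Via and decrement
    Max-Forwards before sending the message onward."""
    out = dict(msg)
    out["Via"] = [own_via] + msg.get("Via", [])  # newest Via goes on top
    out["Max-Forwards"] = msg["Max-Forwards"] - 1
    return out

def forward_response(msg):
    """What a proxy does to a response: strip its own (topmost) Via; the
    next Via on the list identifies the response's next hop."""
    out = dict(msg)
    out["Via"] = msg["Via"][1:]
    return out

invite = {"Via": ["SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK003"],
          "Max-Forwards": 70}
proxied = forward_request(invite,
                          "SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK004")
```

Running the response back through `forward_response` restores the original Via: list, which is exactly how the 200 OK retraces the request's path.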
Next, the redirect server responds with a message that looks like this:

SIP/2.0 302 Moved Temporarily
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK004
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK003
To: BrerRabbit<sip:[email protected]:5060;user=phone>;tag=f8505b69
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 1 INVITE
Contact: <sip:[email protected]:5000;user=phone>
Content-Length: 0
The crucial piece of information in this 302 Moved Temporarily appears in the Contact: line. This is where the registrar/redirect server indicates that BrerRabbit is currently registered at the IP address shown, and that its UA is listening on port 5000. Recall that 3xx responses are final, and therefore the INVITE is no longer pending. The proxy server ACKnowledges the 302 Moved Temporarily response before moving on. After ACKnowledging the redirect server’s message, the proxy server generates a new INVITE as follows:

INVITE sip:[email protected]:5000;user=phone SIP/2.0
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK005
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK003
To: BrerRabbit<sip:[email protected]:5060;user=phone>
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 1 INVITE
Max-Forwards: 69
Subject: Let’s talk about SIP
Record-Route: <sip:[email protected]:5060;maddr=192.168.0.51>
Contact: <sip:[email protected]:5001;user=phone>
Content-Type: application/sdp
Content-Length: 201
...SDP payload goes here...

Note that we have omitted the SDP payload from this message, as it is identical with that of the previous INVITE. In fact, the new INVITE differs from its predecessor in only two ways:

• In the request-line, the Request-URI now reflects BrerRabbit’s current location.

• The branch parameter in the first Via: line is new. The INVITE that the proxy server received from Zebra’s UA is still pending, even though the INVITE previously sent by the proxy server is not pending. Thus we can think of the proxy server’s newly issued INVITE as a new branch in its continuing efforts to process the still-pending INVITE that began this signaling interchange.
We do not display details of the 180 Ringing message that is originated by BrerRabbit’s UA and then forwarded by the proxy server. The following 200 OK response indicates that the session setup procedure is successful.

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.0.51:5060;branch=z9hG4bK005
Via: SIP/2.0/UDP 192.168.0.30:5001
To: BrerRabbit<sip:[email protected]:5060;user=phone>;tag=771dcb18
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 1 INVITE
Record-Route: <sip:[email protected]:5060;maddr=192.168.0.51>
Contact: <sip:[email protected]:5000>
Content-Type: application/sdp
Content-Length: 147

v=0
o=- 3240316793 3240316793 IN IP4 192.168.0.50
s=
c=IN IP4 192.168.0.50
t=3240316793 0
m=audio 10042 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=ptime:20
Once the 200 OK is forwarded by the proxy server and the ACKnowledgment makes its way in the other direction (again, details are omitted), the voice conversation begins. When the conversation is over, Zebra initiates the call termination sequence by issuing the following BYE request:

BYE sip:[email protected]:5060;maddr=192.168.0.51 SIP/2.0
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK006
To: BrerRabbit<sip:[email protected]:5060;user=phone>;tag=771dcb18
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 2 BYE
Max-Forwards: 70
Route: <sip:[email protected]:5000>
Content-Length: 0
The Call-ID field has remained constant throughout the entire signaling flow. That is how BrerRabbit’s UA knows which call is being terminated. Note that the CSeq field is different, however. It now refers to the BYE rather than to an INVITE (or any other SIP message that preceded the exchange of RTP packets). The Call-ID and CSeq are identical to those of the very last message in the flow:

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.0.30:5001;branch=z9hG4bK007
To: BrerRabbit<sip:[email protected]:5060;user=phone>;tag=771dcb18
From: Zebra<sip:[email protected]:5001;user=phone>
Call-ID: [email protected]
CSeq: 2 BYE
Content-Length: 0
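The correlation rule the text relies on (Call-ID constant for the whole dialog, CSeq distinguishing the transactions within it) can be sketched as a matching predicate. This is our simplification; RFC 3261's full matching rules also involve To/From tags and the Via branch parameter:

```python
def response_matches(request, response):
    """Toy check that a response answers a given request: the Call-ID and
    CSeq fields must be copied verbatim from the request."""
    return (request["Call-ID"] == response["Call-ID"] and
            request["CSeq"] == response["CSeq"])

# Hypothetical field values for illustration:
bye   = {"Call-ID": "dialog-1", "CSeq": "2 BYE"}
ok    = {"Call-ID": "dialog-1", "CSeq": "2 BYE"}     # answers the BYE
stale = {"Call-ID": "dialog-1", "CSeq": "1 INVITE"}  # earlier transaction
```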
12.2.3 The Offer/Answer Model
SDP does not specify the manner in which session parameters should be negotiated. The use of SIP and SDP together to conduct such negotiations proceeds according to the offer/answer model of RFC 3264 [5]. The first SDP session description that is sent in a SIP dialog is called the offer; the SDP session description that is sent in the opposite direction is called the answer. The codec negotiation is not complex in the previous example: Zebra includes the offer (which only contains one codec) in its INVITE, and BrerRabbit accepts that choice of codec in its answer, which is included with its 200 OK response. Since it must be ACKed, we say that 200 OK is a reliable response. If the sender of a 200 OK does not receive an ACK, it will suspect that something is wrong. Contrast this with the 100 Trying response, which is never ACKed, and the 180 Ringing response, which is not ACKed in the foregoing call flow. The SDP “answer” cannot be included in an unreliable SIP response; if that response is lost in transmission, the sender will have no way to know.

Alternative version of the codec negotiation. In the call flow of Figure 12.2, the offer could have been omitted from the INVITE and included instead in BrerRabbit’s 200 OK. In that case, Zebra’s answer would have been included in its ACK. This would have allowed the session setup to progress to the point where BrerRabbit’s willingness to engage in a session had been established. Note the following regarding this alternative: even though Zebra is still the initiator of the SIP session, it plays the role of answerer in the codec negotiation. Let us also make the point that BrerRabbit expects an ACK from Zebra (it is mandatory). So, if the ACK is lost in transmission (and the “answer” SDP session description is lost along with it), BrerRabbit will realize that something is wrong. After reading Chapter 10, it may seem strange that there is no “mode” parameter—recall that the RTP termination on the first media gateway was initially in recvonly mode. That termination could only be upgraded to sendrecv once a target IP address and port number was obtained from the second media gateway. SIP session parameter negotiation can also be more complicated than in the flow we just presented. We will see an example in Section 12.6.
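The codec negotiation described above amounts to an ordered intersection: the answerer keeps the offered payload types it supports, in the offerer's preference order. A sketch (ours, not from the book), using the payload types from the example flow:

```python
def answer_codecs(offered, supported):
    """RFC 3264-style selection: keep the offered payload types that the
    answerer supports, preserving the offerer's preference order."""
    return [pt for pt in offered if pt in supported]

offer = [0, 100]                    # PCMU and telephone-event (the INVITE's m= line)
answer = answer_codecs(offer, {0})  # an answerer supporting PCMU only
print(answer)                       # [0]
```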
12.3 Forking of SIP Requests

SIP allows requests such as INVITE to be forked. This means that a proxy could forward an INVITE toward multiple UAs. Branch parameters and the CANCEL method are crucial for supporting a flexible forking capability. The proxy uses branch parameters to keep track of the different outgoing forks associated with a single incoming INVITE. CANCEL is important for “pruning the dead ends,” as we will soon see. Forking capability is useful, for example, in the case of a “follow me” service (see Section 1.2). A subscriber might want to have phones ring at work and at home. Forking can be done in parallel (the phones at both locations ring simultaneously; when one is answered, the proxy sends a CANCEL request toward the other phone) or sequentially (e.g., the office phone rings first; if unanswered, the proxy CANCELs that fork and sends an INVITE toward the home phone).
12.4 SIP for Interswitch Signaling

In the Megaco and MGCP signaling flows of Chapter 10, the media gateways were part of the same softswitch. What happens when a call spans two softswitches? If the softswitches are connected by TDM trunks, the signaling proceeds just as if one (or both) of the switches were a circuit switch instead. One would expect packet connectivity between softswitches, however; in this case, SIP is an appropriate vehicle for interswitch signaling. In the call flows from Chapter 10, media gateways use controllers as intermediaries instead of communicating session information directly to one another. Here, the situation is quite similar. In Figure 12.3, we have redrawn the Megaco call flow of that earlier chapter (see Figure 10.3). The media gateway controller in the middle has bifurcated so that there are now two intermediaries: MGC 1 (which controls MG A) and MGC 2 (which controls MG B). We see that the SIP three-way handshake is interspersed with the Megaco portions of the call setup. In response to MGC 1’s Add request, MG A fills in the missing media parameters for the termination that faces MG B. (By way of review, MG A supplies the IP address and port number in the Local descriptor of its Megaco reply.) The resulting “fleshed out” SDP description is encapsulated in the SIP INVITE for transmission to MGC 2, which passes that description on to MG B. (More precisely, MG B learns of MG A’s IP address and port number by looking at the Remote descriptor in the Megaco Add transaction it receives from MGC 2.)

Comments. If the bearer path traverses more than one MG in either softswitch (or both softswitches), there is additional Megaco signaling that is not shown. However, the portion of the call flow that is shown is largely the same either way. Of course, we could just as easily have employed MGCP signaling within one or both of the softswitches. MGC 1 and MGC 2 view each other as SIP proxies. There could be additional SIP proxies between the two (not shown in the figure for reasons of simplicity). Numerous configurations are possible, including variations on the SIP flow of Figure 12.2. For example, MGC 1 might consult a redirect server directly to determine where to send the INVITE, receiving this information in the body of a 302 Moved Temporarily message. Or MGC 1 might send the INVITE to an intermediate proxy server that in turn consults a redirect server before forwarding MGC 1’s INVITE.
Figure 12.3 SIP for interswitch signaling. (Entities: MG A, MGC 1, MGC 2, MG B. MGC 1 exchanges Add/Reply with MG A and sends a SIP INVITE to MGC 2, which exchanges Add/Reply with MG B; 180 Ringing and 200 OK travel back to MGC 1, which issues Modify/Reply toward MG A and returns a SIP ACK; voice packets flow; at call teardown, a BYE/200 OK exchange is accompanied by Subtract (x2)/Reply transactions on both sides.)
12.4.1 Comparison with BICC
Recall that there is an alternative to SIP—BICC. When we discussed it (in Section 10.7.1), we said that BICC is based on ISUP. If we assume that BICC would be running atop a traditional SS7 stack, then BICC would have a clear disadvantage—namely, that it would rely on global title translation whenever call routing got complicated. We believe that SIP’s flexible “routing resolution” capabilities represent a more sensible option. If BICC were running atop some “flavor” of sigtran, SIP’s advantage might be slightly less clear. Any analysis of the competitive merits would depend on the protocol layers between SCTP and BICC. Without going into detail, we believe that SIP is still a better choice overall. Moreover, the industry seems to be heading in this direction. Note, however, that BICC has one clear advantage over SIP—built-in compatibility with ISUP.
12.5
Additional SIP Methods In Section 11.3, we discussed the SIP methods defined in RFC 3261 itself (in particular, see Table 11.1). Table 12.1 lists the remaining SIP methods currently defined in standards-track RFCs. In the table, we reference the defining RFC for each method. The MESSAGE method is included in the table for completeness; we will not comment on it further. One of the strengths of SIP is that it can be used to pass along information that it does not understand. The INFO method is used when applications wish to pass information back and forth, but the content is not of concern to intermediate SIP entities. Data encapsulated within an INFO message has no effect on the state of SIP session(s). In Section 12.7, we will see an example INFO usage in which the “applications” are ISUP entities. The provisional response ACKnowledgment, or PRACK, method is used in situations that call for provisional responses to be reliable. Recall that provisional responses such as 180 Ringing were not acknowledged in the signaling flows of Sections 12.2 and 12.4; in fact, the ACK method is used only for final responses.
Table 12.1 SIP Methods Defined Outside RFC 3261

INFO (RFC 2976 [6]): Carries information that is generated during a session and that relates to the control of that session by a non-SIP application.
PRACK (RFC 3262 [7]): Used when reliable transmission of provisional responses is required.
SUBSCRIBE (RFC 3265 [8]): Used to request notification whenever specific events occur (e.g., changes in call state).
NOTIFY (RFC 3265 [8]): Used to send notification of SUBSCRIBEd events.
UPDATE (RFC 3311 [9]): Used to furnish updated session information while an INVITE request is still pending.
MESSAGE (RFC 3428 [10]): Used to transfer instant messages.
REFER (RFC 3515 [11]): Instructs the recipient to contact a third party and provides the information necessary to do so.
The sender of a provisional response can ask for a PRACK by including a line of the form Require: 100rel in that response’s header. It is illegal to PRACK a 100 Trying response (or for the sender to ask for a PRACK). SUBSCRIBE and NOTIFY make it possible for SIP entities to be notified of events that take place in other domains (e.g., the PSTN). We elaborate on the SUBSCRIBE, NOTIFY, and REFER methods in Chapter 13. 12.5.1
UPDATEs and re-INVITEs
Suppose a party to a session wants to effect a change to the session parameters. If the session is established (i.e., a success response to the initial INVITE has been received), then a new INVITE can be issued; this is often called a re-INVITE. If session establishment has not been completed, it is illegal to issue a re-INVITE. It is still possible to request that session parameters be changed, however, by using the UPDATE method. Other than this difference in allowed usage, UPDATE and (re-)INVITE are entirely similar.
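The rule just stated amounts to a one-line decision. A trivial sketch, with a helper name of our own invention:

```python
def modification_method(dialog_established):
    # Before the initial INVITE transaction completes, issuing a re-INVITE
    # is illegal, so UPDATE is the only option; once the session is
    # established, a re-INVITE is the conventional choice.
    return "INVITE" if dialog_established else "UPDATE"

early = modification_method(False)  # still awaiting a final response: UPDATE
later = modification_method(True)   # established session: re-INVITE
```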
12.6
Resource Reservation and SIP The previous call flows in this chapter (see Figures 12.2 and 12.3) assume that adequate resources are available. Of course the call setup fails if this is not true; this is a distinct possibility, especially in deployments that feature explicit resource reservation and call admission control. With the call flows as they are drawn, we would have already rung the called party’s phone by the time it became clear that the call setup would fail. Consider Figure 12.3. The SIP portion of the call setup is particularly simple there: it is just the typical INVITE / 200 OK / ACK (with a 180 Ringing thrown in for good measure). Up until the ACK, the only message that MGC 2 receives from MGC 1 is the INVITE itself. If we wish to avoid spurious ringing on the called party’s phone, the MGCs must engage in additional dialog. Moreover, session parameters may be negotiated dynamically by the MGCs. (Recall that the MGCs see each other as SIP proxies.) RFC 3312 [12] addresses this problem. In Figure 12.4, we present a signaling flow that is compatible with the scheme proposed in that RFC. In this depiction, the interchange takes place between two User Agents (UAs), but it could just as well take place between two proxies acting on behalf of UAs. The flow of Figure 12.4 is applicable to SIP interswitch signaling scenarios: in such scenarios, MGCs regard one another as SIP proxies with UAs “behind them,” so to speak. In our example, the UAs use RSVP to reserve resources for the call. IP adheres to a unidirectional paradigm, so resource reservation for the UA 1 → UA 2 and UA 2 → UA 1 bearer directions requires two separate RSVP signaling interchanges. Moreover, RSVP signaling proceeds in the opposite direction of bearer traffic. Thus, UA 2 must send an RSVP Resv message toward UA 1 to reserve UA 1 → UA 2 bearer capacity (and vice versa). We annotate the call flow of the figure as follows:
[Figure 12.4 shows the message sequence between UA 1 and UA 2: INVITE; 183 Session Progress; PRACK / 200 OK; UPDATE / 200 OK; 180 Ringing; 200 OK / ACK; the session is then established. In the margins of the figure, each UA launches an RSVP Resv toward the other once it has the far end's addressing information, and the UPDATE and its 200 OK are sent once the respective propagated Resv messages have been received.]
Figure 12.4 SIP call flow incorporating resource management.
• The INVITE contains UA 1's initial SDP offer. Included in the offer are the IP address and port number that UA 1 wishes to assign to this call. UA 2 needs this information for two reasons: as a target address for bearer traffic once the session has been established, and as a necessary parameter for RSVP signaling to allocate resources along the UA 1 → UA 2 bearer path. In the right-hand margin of the diagram, we indicate that UA 2 launches an RSVP Resv message toward UA 1 as soon as it gets the prerequisite addressing information.
• UA 2's initial response to the INVITE is a 183 Session Progress message containing its SDP answer. Now 1xx responses are provisional, and SDP offer/answer dialogs must be conducted reliably, so UA 2 requests a PRACK (by including a Require: 100rel).
• Once it receives the IP address and port number in UA 2's answer, UA 1 can initiate RSVP signaling to reserve resources for the UA 2 → UA 1 bearer path.
• The PRACK and ensuing 200 OK are a pair (i.e., the latter is a final response to the former). Here we are simply following the rules pertaining to reliable provisional responses; no SDP descriptions are interchanged.
• The UPDATE and the 200 OK that follows it are also a request/response pair. However, these messages do contain SDP descriptions. The Resv message initiated by UA 2 is propagated "upstream" toward UA 1, with routers along the way reserving the necessary resources hop by hop. Once the propagated Resv reaches UA 1, the UA 1 → UA 2 bearer reservation is complete. In effect, UA 1 says in its UPDATE message: "I can send now." By the time it sends its 200 OK, UA 2 has likewise received a propagated Resv (the culmination of the RSVP interchange initiated by the far end), so UA 2 is also able to say: "I can send now." At this point, the QoS negotiation is complete.
• The 200 OK and ACK at the end of the signaling flow represent the completion of the three-way handshake that began with the INVITE.
Not everything about the foregoing signaling flow is cast in stone. Suppose, for instance, that UA 2 had decided to send 200 OK (in response to UA 1’s UPDATE) before receiving the propagated Resv message. Then UA 2 would need to send an updated session description once the Resv arrived. It could do so in its subsequent 180 Ringing provisional response (but in this case the 180 Ringing would have to be PRACKed). This is quite reasonable, since our model in this example is that UA 2 should not alert its user of the incoming call until resource reservation in both directions has been confirmed. Another variant is that the initial offer could have been included in the 183 Session Progress instead of the INVITE. 12.6.1
QoS Attributes in SDP
RFC 3312 [12] extends SDP by introducing QoS attributes. Recall the “I can send now” statements that appear in the previous signaling flow description. These declarations are formally expressed using QoS attributes. We provide a glimpse of the new attributes by excerpting SDP media descriptions from the INVITE and UPDATE messages in our signaling flow. Note well that these are not complete SDP session descriptions; each only incorporates media level attributes. First, let us look at the media attributes contained in the INVITE: m=audio 10108 RTP/AVP 0 c=IN IP4 192.168.0.30 a=curr:qos e2e none a=des:qos mandatory e2e sendrecv
The syntax of the “m=” (media name and transport address) and “c=” (connection data) lines was covered in Section 12.1. A “c=” line that appears after an “m=” line applies only to the media stream associated with that “m=” line. In this SDP fragment, the “a=curr” line says that the current level of end-to-end QoS is “none”; resources have not yet been reserved in either direction. The “a=des” line says that the desired level of end-to-end QoS is sendrecv. That is, resources must be reserved in both directions (as indicated by the word “mandatory”). By the time it sends the UPDATE message, UA 1 has received an RSVP Resv message indicating that resources have been reserved along its downstream path toward UA 2. The “a=curr” line in the following SDP fragment therefore reflects that UA 1 is now authorized to send. m=audio 10108 RTP/AVP 0 c=IN IP4 192.168.0.30 a=curr:qos e2e send a=des:qos mandatory e2e sendrecv
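The attribute lines above follow a regular syntax, so they can be parsed mechanically. The following sketch (our own helper, not part of any SDP library) extracts the current and desired precondition status from a media description, assuming the RFC 3312 syntax shown above:

```python
def parse_qos_attributes(sdp_text):
    """Return RFC 3312-style QoS precondition attributes from an SDP fragment.

    Produces dicts such as {"kind": "curr", "type": "e2e", "direction": "none"}
    for a=curr lines and {"kind": "des", "strength": "mandatory", ...} for
    a=des lines. Other SDP lines (m=, c=, ...) are ignored.
    """
    result = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("a=curr:qos "):
            status_type, direction = line[len("a=curr:qos "):].split()
            result.append({"kind": "curr", "type": status_type,
                           "direction": direction})
        elif line.startswith("a=des:qos "):
            strength, status_type, direction = line[len("a=des:qos "):].split()
            result.append({"kind": "des", "strength": strength,
                           "type": status_type, "direction": direction})
    return result

invite_media = """\
m=audio 10108 RTP/AVP 0
c=IN IP4 192.168.0.30
a=curr:qos e2e none
a=des:qos mandatory e2e sendrecv
"""

# The INVITE's offer: nothing reserved yet, both directions required.
attrs = parse_qos_attributes(invite_media)
```

A UA would compare the parsed "curr" status against the "des" status to decide whether the preconditions are met before completing session establishment.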
UA 2 wants to know when it would be appropriate to alert its user of the incoming call. Therefore, in its 183 Session Progress message, UA 2 includes an “a=conf” line (not shown in the interest of simplicity) asking UA 1 to send confirmation
whenever its downstream bearer path has been set up. The second of these media descriptions is the desired confirmation. RFC 3312 also allows for endpoints to independently negotiate QoS parameters in their access networks. In such a scenario, separate lines with “local” and “remote” qualifiers would replace the single line with the “e2e” qualifier. For example, a=curr:qos local none a=curr:qos remote none
would replace the line a=curr:qos e2e none
in the SDP payload of the INVITE. If only one of the access networks implemented QoS negotiation, only the “local” or “remote” line would appear. 12.6.2
More on Parameter Negotiation
In the signaling flow of Figure 12.4, four SDP session descriptions are sent in all: two “offers” from UA 1 and two “answers” from UA 2. It is easy to imagine other scenarios that require more than a single offer/answer interchange. For instance, the offerer might list several codecs. The answerer could indicate which codecs it supports by responding with a subset of the original list. One more step would remain: the offerer would make the final codec selection and, finally, inform the answerer. RFC 3312 was evidently aimed at integrating RSVP within SIP signaling flows. The authors introduce a general notion of a precondition, say that they will only define “QoS” preconditions in the document in question, and finally give examples in which the user agents employ RSVP to make sure that the QoS preconditions are met before session establishment is completed. The model proposed by RFC 3312 does not mandate RSVP for resource management, however. Moreover, the specification leaves the door open for other types of preconditions. One might expect that the SDP extensions of RFC 3312 would be used in conjunction with MGCP and/or Megaco, or that there would be some other “formula” for integrating media gateway control with RSVP. Although there have been some Internet drafts addressing this general topic area, there are no active ones in the Megaco Working Group at the time of this writing (nor have any such drafts made it to RFC status).
12.7
SIP-T and Beyond It is important for SIP to interwork with PSTN signaling protocols. The SIP-T RFC [13] concentrates on interworking between SIP and SS7 signaling domains (more specifically, between SIP and ISUP). The status of this RFC is “best current practice.” For a simple two-party call, we can identify the following four possibilities:
1. The "SIP bridging" case: origination and termination are both in the PSTN, but call control signaling must traverse an intermediate SIP domain.
2. PSTN origination, IP termination.
3. IP origination, PSTN termination.
4. IP origination, IP termination.

Of these possibilities, SIP alone suffices only for the last; the other three require some degree of ISUP-SIP interworking. The SIP-T document provides signaling flows for those three cases. SIP-T seeks to satisfy three main interworking requirements:

• Transparency: The presence of an intermediate SIP signaling domain should not alter the exchange between ISUP endpoints in any way. This requirement only makes sense in case 1, the SIP bridging case.
• Routability: SIP messages must be routable in the SIP signaling domain purely on the basis of information in their SIP headers. This requirement applies to cases 1 and 2.
• Ability to transfer mid-call ISUP messages: This requirement applies to case 1.

Let us concentrate first on the SIP bridging case. SIP-T's authors observe that there is no one-to-one mapping between ISUP message fields and SIP message fields. Any mapping of ISUP messages to SIP would therefore be imperfect and would generally entail loss of information. It would thus be difficult if not impossible to:

• Translate ISUP message contents to SIP at ingress to a SIP signaling domain;
• Reconstitute the ISUP messages at egress from that SIP domain.
Thus SIP-T recommends that the transparency requirement be satisfied by encapsulating ISUP messages within SIP payloads. The encapsulation format is specified in RFC 3204 [14]. The philosophy behind the routability requirement is that SIP proxies cannot be expected to screen piggybacked ISUP content for routing information. Instead, ISUP information that is pertinent to routing is translated into a form that can be incorporated into the SIP header. ISUP defines information request (INR) and information (INF) messages. They allow one signaling exchange to ask for and receive information from another signaling exchange during a call. There are also messages that allow a call to be SUSpended (without tearing down the connection) and later RESumed. INR/INF and SUS/RES are rather obscure. In spite of their rarity, these message types are the motivation for the aforementioned mid-call requirement. For INR/INF and SUS/RES interchanges between ISUP protocol entities, the intermediate SIP domain is strictly a pass-through (recall that the mid-call requirement applies only to the SIP bridging case). In particular, no changes to SIP session state are implied. The SIP INFO method is employed for pass-through signaling. RFC 3398 [15], a standards-track RFC, continues in the direction laid out by SIP-T. It does so by fleshing out the correspondence between ISUP parameters and SIP message headers. The bulk of the document specifies the mapping between parameters of ISUP Initial Address Messages and SIP INVITE message headers. The
resulting benefit is that SIP proxies can make informed routing decisions. RFC 3398 is pertinent to cases 1 and 2 (the two PSTN origination scenarios listed earlier in this section). Once an initial INVITE has been routed properly, there is typically enough information in the Via: and Route: headers for correct routing of later messages pertaining to the same session. RFC 3398 also discusses the mapping of SIP status codes to ISUP Address Complete Messages and Release Messages.
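To give the flavor of such a mapping, here is a toy lookup for a few well-known ISUP cause values. The three table entries follow RFC 3398; the fallback status is our own simplification, so consult the RFC for the authoritative mapping:

```python
# Illustrative subset of the ISUP-cause-to-SIP-status mapping discussed
# above. The fallback below is our own choice, not part of RFC 3398.
CAUSE_TO_STATUS = {
    1:  (404, "Not Found"),            # unallocated (unassigned) number
    17: (486, "Busy Here"),            # user busy
    34: (503, "Service Unavailable"),  # no circuit/channel available
}

def sip_status_for_cause(cause_value):
    # Unmapped causes get a generic failure in this sketch.
    return CAUSE_TO_STATUS.get(cause_value, (500, "Server Internal Error"))

status = sip_status_for_cause(17)  # user busy maps to 486 Busy Here
```

A gateway would generate the looked-up status line when an ISUP Release message arrives for a call whose signaling crossed into the SIP domain.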
12.8
Authentication and Security
SIP supports HTTP digest authentication [16]. In this scheme, a user agent initially sends an unauthenticated REGISTER or INVITE. A SIP proxy, registrar, redirect server, or far-end user agent then issues a challenge in a 401 or 407 final response (the details differ depending on whether the challenger is a proxy or another functional element). The UA responds by incorporating its credentials into a new INVITE.
Wireless networks already employ sophisticated authentication schemes. As wireless carriers move to deploy SIP-based services, they will base their authentication procedures on those existing schemes.
We saw in Section 11.4.2 that information such as URIs and IP addresses can be hidden from the "outside world" by B2BUAs, but that such devices have to maintain state information in order to create the illusion of end-to-end SIP sessions. Let us look at some other ways to protect routing information and other message content.
SIP messages contain URIs and IP addresses, both in the message headers and in SDP payloads. Secure/Multipurpose Internet Mail Extensions (S/MIME) [17] can be used for end-to-end encryption of SDP payloads. End-to-end encryption of entire SIP messages is also possible: SIP's secure version employs the same approach as (and an analogous URI scheme to) HTTP over TLS [18]. However, much of SIP's usefulness is tied to its routing functionality: SIP proxy servers and redirect servers can be configured in a variety of ways, and end-to-end encryption is problematic because it hides the very headers those servers must read.
TLS [19] can instead be employed on a hop-by-hop basis. In this approach, SIP functional entities maintain TLS adjacencies. SIP messages are decrypted and examined by each proxy; the proxy makes its routing decision based on the decrypted content and then re-encrypts the SIP message before sending it on to the next proxy or UA (or other SIP functional element).
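The digest handshake described above culminates in a hash computation. The sketch below follows RFC 2617's simplest form (no qop or cnonce); the credential and nonce values are invented for illustration:

```python
from hashlib import md5

def h(data):
    # Hex-encoded MD5, as used throughout RFC 2617.
    return md5(data.encode()).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    ha1 = h(f"{username}:{realm}:{password}")  # identity hash
    ha2 = h(f"{method}:{uri}")                 # request hash
    return h(f"{ha1}:{nonce}:{ha2}")           # proves knowledge of password

resp = digest_response("alice", "example.com", "secret",
                       "INVITE", "sip:bob@example.com", "84a4cc6f")
# The UA places this value in the response= parameter of its Authorization
# (or Proxy-Authorization) header when it retries the INVITE.
```

The server performs the same computation with its stored credentials and compares results; the password itself never crosses the wire.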
Security is a major topic; we have not done justice to it. In closing this section, we remark that telcos are accustomed to the physical security associated with their SS7 infrastructures. Any large-scale move to a control plane based on SIP represents a drastically different security environment. We cannot overemphasize this point.
12.9
Further Reading Even though we devoted a fair number of pages to SIP, we have not come close to exhausting this topic.
It is evident from the foregoing discussion of SIP that address resolution and routing are crucial. RFC 3263 [20] gives details on the use of DNS to support SIP message routing. There was so much activity surrounding SIP that an additional IETF working group was chartered: the Session Initiation Protocol Investigation (sipping) working group. The sipping working group documents the use of SIP in applications; the original SIP working group remains focused on the essential functionality of the protocol. SIP-T came out of the sipping working group. Two recent sipping RFCs [21, 22] present a range of signaling scenarios. Like the SIP-T RFC, these RFCs are categorized as best current practices. For readers who wish to delve further into SIP, we recommend books by Camarillo [23] and Johnston [24].
References
[1] Handley, M., and V. Jacobson, RFC 2327, SDP: Session Description Protocol, IETF, April 1998.
[2] Mills, D., RFC 1305, Network Time Protocol (Version 3) Specification, Implementation, and Analysis, IETF, March 1992.
[3] Schulzrinne, H., and S. Casner, RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, IETF, July 2003.
[4] Schulzrinne, H., and S. Petrack, RFC 2833, RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals, IETF, May 2000.
[5] Rosenberg, J., and H. Schulzrinne, RFC 3264, An Offer/Answer Model with the Session Description Protocol (SDP), IETF, June 2002.
[6] Donovan, S., RFC 2976, The SIP INFO Method, IETF, October 2000.
[7] Rosenberg, J., and H. Schulzrinne, RFC 3262, Reliability of Provisional Responses in the Session Initiation Protocol (SIP), IETF, June 2002.
[8] Roach, A. B., RFC 3265, Session Initiation Protocol (SIP)-Specific Event Notification, IETF, June 2002.
[9] Rosenberg, J., RFC 3311, The Session Initiation Protocol (SIP) UPDATE Method, IETF, September 2002.
[10] Campbell, B., et al., RFC 3428, Session Initiation Protocol (SIP) Extension for Instant Messaging, IETF, December 2002.
[11] Sparks, R., RFC 3515, The Session Initiation Protocol (SIP) Refer Method, IETF, April 2003.
[12] Camarillo, G., W. Marshall, and J. Rosenberg, RFC 3312, Integration of Resource Management and Session Initiation Protocol (SIP), IETF, October 2002.
[13] Vemuri, A., and J. Peterson, RFC 3372, Session Initiation Protocol for Telephones (SIP-T): Context and Architectures, IETF, September 2002.
[14] Zimmerer, E., et al., RFC 3204, MIME Media Types for ISUP and QSIG Objects, IETF, September 2002.
[15] Camarillo, G., RFC 3398, Integrated Services Digital Network (ISDN) User Part (ISUP) to Session Initiation Protocol (SIP) Mapping, IETF, December 2002.
[16] Franks, J., et al., RFC 2617, HTTP Authentication: Basic and Digest Access Authentication, IETF, June 1999.
[17] Ramsdell, B., RFC 2633, S/MIME Version 3 Message Specification, IETF, June 1999. [18] Rescorla, E., RFC 2818, HTTP Over TLS, IETF, May 2000.
[19] Dierks, T., and C. Allen, RFC 2246, The TLS Protocol Version 1.0, IETF, January 1999.
[20] Rosenberg, J., and H. Schulzrinne, RFC 3263, Session Initiation Protocol (SIP): Locating SIP Servers, IETF, June 2002.
[21] Johnston, A. B., et al., RFC 3665, Session Initiation Protocol (SIP) Basic Call Flow Examples, IETF, December 2003.
[22] Johnston, A. B., et al., RFC 3666, Session Initiation Protocol (SIP) Public Switched Telephone Network (PSTN) Call Flows, IETF, December 2003.
[23] Camarillo, G., SIP Demystified, New York: McGraw-Hill Professional, 2002.
[24] Johnston, A. B., SIP: Understanding the Session Initiation Protocol, 2nd ed., Norwood, MA: Artech House, 2003.
CHAPTER 13
Implementing Services In this chapter, we discuss intelligent network (IN) and related approaches to providing telco services. Then we move on to SIP, where we cover SIP/IN hybrid services and SIP-based services. Finally, we describe Short Message Service (SMS), which provides text messaging capabilities in wireless networks. SMS is a departure from IN-style telco services. That is the main reason we cover SMS—we think it helps one to associate a broader connotation with the word “service.” We also draw some interesting comparisons between SMS and SIP.
13.1
Introduction
Traditionally, development of new services in the telco realm has been expensive and time-consuming. Telecom service providers have come to recognize that this is a major problem: to be profitable, services must enjoy long, useful (and revenue-generating) lifetimes and mass-market penetration. Niche markets and rapidly evolving user requirements are therefore extremely difficult to accommodate.
Telco switches have been programmable for quite some time; initially, services were programmed directly on the switches. If telco networks themselves were programmable, it would become much easier to develop and maintain new services. There has been progress toward this goal, but it has unfortunately been very slow. Many approaches to this problem have sprung up; one can achieve a rough (albeit imperfect) dichotomy as follows:

• Initiatives such as Telcordia's AIN project and ITU-T's IN standardization effort seek "programmability from the inside." That is, they seek to centralize service capabilities at SCPs, which reside in carriers' networks (and are not traditionally accessible from outside those domains). In this model, new services are developed by the SCP vendors (and/or the carriers themselves). WIN and CAMEL are variants that seek to extend IN concepts to the wireless realm.
• Initiatives such as Parlay/OSA and JAIN seek "programmability from the outside." Such initiatives aim to provide a secure environment that is "friendly" to third-party software developers.
We will expand these acronyms as we go along. We remark here that the term “intelligent network” refers to a specific set of ITU-T standards and also to a more generic philosophy of service implementation. When it is not otherwise clear from context, the reader should assume we are referring to the latter. We begin our coverage by examining the generic IN philosophy. In the intelligent network approach, finite state machines running in telephone switches can be interrupted at specified points. Interrupts are not generated for every call but only when specified conditions are met. The service developer specifies the code (a.k.a., service logic) that is to be executed whenever an interrupt is generated. The intelligent network model allows the service logic to reside outside the serving switch.
13.2
SS7 Service Architectures: Intelligent Network
Consider traditional telco services like calling name ID, call blocking, and toll-free service. How are these services provided in circuit-switched networks? To a large degree, the answer is that SCPs tell voice switches what to do. (Recall that SCPs reside in SS7 networks.)
SCPs did not exist when many services were initially conceived and implemented. Early on in the era of digital switching, service logic was executed by the switches themselves. (Digital switches, whose large-scale deployment began in the 1970s, are programmable.) Hosting the service logic entirely on the switches proved to be a maintenance nightmare, however: logic for new services had to be installed on every switch, interaction with previously existing services had to be debugged, and so on. Moreover, a service provider with a multivendor switching network had to go through this process separately for each vendor. Thus, the idea of controlling services from a central point was appealing. With the deployment of SS7 in the 1980s, that idea became realizable: the SS7 network provided a secure and reliable way for service control platforms to communicate with voice switches. Various organizations sought to establish a systematic way to move service logic out of voice switches, housing that logic instead in SCPs.
We can characterize the classic example, toll-free service, as follows:

• The service is triggered by the originating switch (that is, the calling party's serving switch) when it realizes that it does not have any routing information that is directly applicable to the called number.
• Two types of special handling are required (note that both are taken care of during call setup): alternative billing, and assistance in routing the call.

If every desirable service shared these characteristics, life would be easy. But:

• Some services (e.g., call blocking) are triggered by the terminating switch.
• There are services that require mid-call interaction. In a credit card call, for example, one can press and hold the "#" key to ask for dial tone (so as to place another call without reentering calling card number and PIN).
• Some tasks/services are not associated with calls at all. The MAP Update Location procedure covered in Section 8.7 is a case in point. By way of review, this procedure tells the home location register that a subscriber has "surfaced" in a given place. The procedure is therefore a crucial prerequisite for completion of mobile-terminated calls (but the Update Location itself is not associated with any particular call).
The ability to manipulate individual call "legs" (e.g., to support three-way calls and conference calls) is also desirable.
One of the early IN-type initiatives was launched by Bellcore (later renamed Telcordia). The Bellcore project, which was limited to the wireline environment, was called Advanced Intelligent Network (AIN). The goal of Bellcore's AIN initiative was to develop a conceptual model that would support a rich array of services, and to shorten the development cycle for new services. ITU-T's IN standards were inspired by AIN.
The IN conceptual model, as standardized by ITU-T, is made up of several planes: two layers of abstraction are interposed between service functionality (from the end user's point of view) and service implementation on physical devices. We elected not to list the (numerous) entities that reside in each plane: otherwise, there would be acronyms all over the place and prolixity would be unavoidable. However, we think it is worthwhile to briefly enumerate and describe the planes:

• The service plane describes what features a service offers to the end user; it says nothing about implementation.
• In the global functional plane (GFP), the software components used to implement the service are specified. The software components that live in this plane are called service independent building blocks (SIBBs). The GFP essentially pretends that the network is one machine.
• The distributed functional plane (DFP) is a more realistic representation of the network than the GFP. As we move from the GFP to the DFP, we admit that the former is oversimplified; here we model the network as a distributed entity.
• Network elements such as SCPs and voice switches reside in the physical plane.
Before going on, we should be careful to say that, in today’s networks, some service logic remains in voice switches. Not all services are equally amenable to the IN model—in some cases, it may be more trouble than it is worth to implement a service along these lines. Moreover, carriers may choose to maintain the status quo with switch-based services that were in place before IN was sufficiently mature. Throughout this section, our examples are given under the blanket assumption that services are implemented according to the IN model. The fact that this assumption sometimes differs from real-world implementations should not diminish the illustrative value of those examples.
13.2.1
The Global Functional Plane
The developers of IN concluded that standardizing services themselves was not a useful thing to do. One of the driving principles behind ITU-T's IN effort was that it would be far more useful to standardize the capabilities that services rely on: different services would use the same building blocks, but in different ways. This notion is reflected by the presence of SIBBs in the global functional plane and is very much in line with the concepts of modularity and code reuse in object-oriented programming. A special SIBB known as the basic call process (BCP) is an FSM; we introduced FSMs in Section 6.4. The idea is that a separate copy of the BCP is instantiated for each call. At points of transition between the states, it is possible to interrupt the BCP and execute service logic before returning—that is, to invoke other SIBB(s) with parameters appropriate to the situation at hand. Moreover, it is not necessary to return to the BCP at the point where the interrupt was generated. The call blocking example is a case in point: if the calling party is on the called party's "do not accept" list, the call will be aborted (perhaps after playing an announcement to the calling party). There are SIBBs for a variety of functions, including authentication, user interaction, and charging. Except for the ability to interrupt the BCP at specified points, the SIBBs are "black boxes." 13.2.2
The Distributed Functional Plane
The BCP is an abstraction; let us take a step toward implementation by going from the global functional plane to the DFP. As we do so, the BCP “morphs” into two basic call state models (BCSMs): the originating BCSM and the terminating BCSM. As the name suggests, the former (a.k.a., the O-BCSM) resides in the originating switch, whereas the latter resides in the terminating switch. So, at the DFP level, we explicitly recognize that distinct software processes are acting on behalf of the calling and called parties. The O-BCSM and the T-BCSM are finite state machines; thus, we have replaced the BCP by a pair of more granular abstractions. Recall the IN mentality: the service logic resides, outside the switch, in an SCP. Thus, when an “interesting” event occurs, we need a way to notify the SCP. Moreover, call processing may have to be suspended until the switch receives instructions from the SCP. In the case of a toll-free call, for instance, the originating switch does not know how to route the call. The O-BCSM generates an interrupt asking for routing information and suspends processing until that information arrives. In IN terminology, the interrupts are called detection points (DPs). We confess that there are some mixed metaphors in the previous paragraph: some of the entities mentioned reside in the DFP, whereas others live in the physical plane. To pave the way for our forthcoming discussion of SIP/IN interworking, it behooves us to align ourselves with IN jargon by speaking of service control functions (SCFs) instead of SCPs. SCFs live in the distributed functional plane. SCPs, which are instantiations of SCFs, inhabit the physical plane. Detection Points
DPs are associated with events that cause state transitions in the BCSMs. When execution reaches an armed DP, the SCF is informed of the event. Control may be
transferred to the SCF (with processing in the BCSM suspended until the SCF returns control thereto; this is called a request), or the BCSM may continue processing (this is called a notify). We already saw an example request; we will see an example notify shortly. DPs are further classified as follows:

• A trigger DP is armed when a user's service subscription is configured. For example, when a user subscribes to a call blocking service, a trigger DP in the T-BCSM is armed. When this DP is tripped, the SCF will check the calling party number against the called party's "do not accept" list.
• An event DP is armed dynamically by service logic during call processing.
Example. Implementation of a "ringback on busy" service requires us to have a trigger DP and an event DP at our disposal, as we now describe. When a subscriber to this service tries to call a busy line, the service logic is invoked by a trigger DP called O_Called_Party_Busy. The "O_" prefix refers to the originating BCSM; O_Called_Party_Busy was armed when the user subscribed to the service. When that trigger DP fires, the service logic must arm an event DP known as T_Disconnect in the terminating switch. (In effect, the service logic is asking the terminating switch to "let me know when the busy party hangs up.") Both detection points in this example are of the "notify" variety. There is a vague similarity here to the usage of the Megaco and MGCP Notify commands. We also point out the following distinction: recall that, in the Megaco Notify example of Section 10.3.9, the media gateway reported the occurrence of an off-hook event for an analog line. In the current example, it does not matter whether the line in question is analog; in the IN model, that aspect has been "abstracted out."
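The ringback-on-busy example can be sketched in Python. The DP names O_Called_Party_Busy and T_Disconnect come from the example above; the classes, the arming mechanism, and the callback signature are invented for illustration, and the suspend-and-wait semantics of request DPs are not modeled.

```python
# Sketch of trigger vs. event detection points for ringback on busy.
# A trigger DP is armed at subscription time; an event DP is armed
# dynamically by the service logic while the call is being processed.

class BCSM:
    def __init__(self, name):
        self.name = name
        self.armed = {}                  # DP name -> (kind, callback)

    def arm(self, dp, kind, callback):
        self.armed[dp] = (kind, callback)

    def fire(self, dp, **event):
        """Called by switch internals when the event behind a DP occurs."""
        if dp not in self.armed:
            return None                  # DP not armed: nothing to report
        kind, callback = self.armed[dp]
        # "notify" DPs report and let processing continue; a "request" DP
        # would suspend processing until the SCF answers (not modeled).
        return callback(dp=dp, kind=kind, **event)

log = []

def ringback_logic(dp, kind, **event):
    log.append(dp)
    if dp == "O_Called_Party_Busy":
        # Dynamically arm an event DP in the terminating switch:
        # "let me know when the busy party hangs up."
        t_bcsm.arm("T_Disconnect", "notify", ringback_logic)

o_bcsm = BCSM("originating")
t_bcsm = BCSM("terminating")
# Trigger DP: armed when the user subscribed to the service.
o_bcsm.arm("O_Called_Party_Busy", "notify", ringback_logic)

o_bcsm.fire("O_Called_Party_Busy")   # subscriber calls a busy line
t_bcsm.fire("T_Disconnect")          # the busy party hangs up later
```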
13.2.3 IN Capability Sets
When it was released, IN capability set 1 (CS-1) represented a step forward. But CS-1 suffers from the following serious limitations:

1. It is geared toward a model where services reside within a single operator's network.
2. It lacks support for interactions that are not associated with any call.
3. Multiparty calling, videoconferencing, and the like are largely out of scope.

Given that wireless networks are evolving much more rapidly than their wireline counterparts, items 1 and 2 point to serious shortcomings: roaming requires that services be supported across network boundaries, and mobility management requires support for transactions not associated with calls. CS-2 addresses item 1 by allowing SCFs (which are potentially in different networks) to communicate among themselves. Regarding item 2, CS-2 introduces a call-unrelated counterpart of the basic call process; improved support for teleconferencing is also added. CS-2 brought other enhancements, but in the interest of brevity we do not detail them here. CS-3 introduced a few new capabilities but was mostly a maintenance release.
184
Implementing Services
ITU-T proceeded to standardize CS-4, which "blesses" the IETF's PINT and SPIRITS protocols. But by this time, many people felt that IN was running out of steam, so to speak. We cover PINT and SPIRITS in Section 13.6.1.
13.2.4 Limitations and Trade-offs of IN
To a degree, we could view IN as an effort to extend PBX-type services to the mass consumer market, in both the wireline and wireless domains. IN did meet with some success (notably for 800 number portability, followed by local number portability and finally wireless local number portability). But why was IN not a bigger success? To our minds, the following reasons stand out:

• SCPs did not allow the degree of vendor independence that telcos had hoped for. As a result of interoperability issues (e.g., vendor A's SCF only works with vendor A's voice switches), telcos often ended up buying separate SCPs from each of their switching vendors.
• IN did not shorten the service development cycle as much as its proponents hoped.
• The IN model is somewhat cumbersome. In our opinion, the class 5 switch is still a bottleneck: it instantiates a complex state machine for every call.
• The telecommunications industry changed rapidly. As a result, the bar was continually raised, and the cycle of standardizing a new capability set, then developing and finally deploying compliant products, was too slow to keep pace.
• The importance of interworking between different network environments in general (and wireless networks in particular) increased more quickly than the IN community could respond.
• Messaging changed dramatically. Business users now need to manage e-mail, fax, and voice mail messages. For many users, short messages and/or instant messages are added to the mix. The IN community did not anticipate the need for unified messaging. (To be fair, few people saw it coming.)

The telephone itself is part of the problem. Here we are referring to:

• The user interface: 12 buttons, accompanied in most cases by a tiny text display (or no display at all). It is difficult to configure and control a complex service via such a primitive interface. Thus, it is very difficult to create new services that are easy to use.
• The lack of intelligence and signaling capabilities in the typical telephone. The redial button on an analog wireline telephone comes to mind. Although it serves a purpose, it is not very smart. Suppose a subscriber makes a call that is answered by a DTMF-driven menu system. The subscriber navigates through the menus to a certain point, perhaps entering a PIN for authentication somewhere along the way. But suppose the subscriber gets disconnected prematurely. The telephone does not know the difference between the called number and the DTMF digits that were entered after the call was answered. So, if the subscriber attempts to access the system once again by pressing the redial
button, the phone replays the entire string of digits that was entered since the last off-hook transition (or, rather, however many DTMF digits the phone can remember).

Although it has not fulfilled its initial promise, IN is still useful for flexible call control in circuit-switched networks. We are interested mainly in the conceptual IN architecture, as previously described. For completeness, we mention that the protocol for IN signaling is called INAP. INAP is carried over SS7 links.

Mobility in wireless networks poses problems that are not encountered in wireline environments. Next, we look briefly at two standards that apply the IN philosophy to these problems.
13.3 CAMEL and WIN

Customized Applications for Mobile Enhanced Logic (CAMEL [1, 2]), an IN-like standard, is deployed in GSM networks. It reuses IN terminology and concepts. In particular, one speaks of arming CAMEL triggers, just as with IN. Initially, CAMEL's primary use was to support roaming for prepaid wireless subscribers. As with IN, the developers had high hopes, and CAMEL has evolved through multiple capability sets. Some carriers now use CAMEL for more than just prepaid services. Overall, however, CAMEL appears to have limited horizons.

To support prepaid roaming, it is necessary for a service control platform in one network to exert a degree of control over a voice switch in another network. In our discussion of IN, we saw that CS-1 does not support such internetwork control scenarios. CAMEL signaling interchanges are conducted between the home location register and the visitor location register. (We confess that we are oversimplifying a bit here.)

Just as CAMEL was devised to solve problems in GSM networks, Wireless Intelligent Network (WIN) extends IN notions to solve mobility-related problems in North American CDMA and TDMA networks. CAMEL and WIN signaling is transported over SS7 links. For CAMEL, the signaling protocol is called the CAMEL Application Part (CAP [3]). Message formats to support WIN signaling appear in the ANSI-41 standards.
13.4 Parlay/OSA

The stimulus for Parlay (see www.parlay.org) was a regulatory mandate: British Telecom (BT) had to allow other service providers to access its switches. BT was very concerned about security; as a result, it became a cofounder of the Parlay group, which sought to define open, secure interfaces for third-party access to telco networks. Parlay's creators wanted to open the interface between the SCP and the switch to allow third-party developers to create new services. Moreover, Parlay specifies a framework for security (i.e., authentication and authorization for third-party applications wishing to access telco network resources) and service management (including service discovery).
Another group, Open Service Access (OSA), embarked on a similar work program. Parlay and OSA decided to align their specifications; here we treat Parlay and OSA as a single combined entity.

The Parlay group specifies application programming interfaces (APIs). One of the design goals of the APIs is to make it easy for software developers to create applications that involve telephony, regardless of the developers' level of telecommunications expertise. This is seen as a key to bringing innovation to the realm of telco service creation. Actually, the Parlay group maintains two sets of specifications: Parlay itself and a variant called ParlayX. The two specifications differ "under the hood": Parlay employs CORBA as its middleware layer, whereas ParlayX uses XML. The ParlayX APIs are easier to use (but somewhat less powerful) than their Parlay siblings.

A word on the difference between a protocol and an API may be in order here. The former specifies message formats for communicating parties. The latter specifies procedures that software developers can call within their code. The advantage of an API in the present context is that software developers can harness the capabilities of underlying protocol entities without getting involved in protocol stack "nuts and bolts." For more on the distinction between protocols and APIs, we recommend the enlightening discussion in Mueller's book [4].
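To make the protocol/API distinction concrete, here is a deliberately simplistic Python sketch. The wire format and the send_sms call are invented for illustration; real Parlay APIs ride on CORBA and conceal far more machinery than this.

```python
# Hypothetical sketch contrasting an API with the protocol beneath it.
# The "wire format" below is invented purely for illustration.

def _encode_request(operation, **fields):
    # Protocol layer: a message format that communicating parties agree on.
    body = ";".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{operation}|{body}".encode("ascii")

def send_sms(number, text):
    # API layer: what the application developer actually calls. The
    # developer needs no knowledge of the encoding or transport details.
    frame = _encode_request("SendSms", number=number, text=text)
    return frame    # a real stack would transmit this and await a reply

frame = send_sms("15551234", "hello")
```

The application code above never touches the message bytes; swapping the encoding (the protocol) would not change a single line of the caller's code, which is precisely the point of an API.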
13.5 JAIN

Java Advanced Intelligent Network (JAIN) is an initiative aimed at Java software developers who lack in-depth telecom experience. Thus, it bears some similarities to Parlay/OSA (although JAIN is not as comprehensive). Again, the idea is to stimulate innovation by making it possible for a larger community to participate in service creation.
13.6 SIP and Services

Our purpose in this section is to make the case that SIP is useful for services. We start by covering PINT and SPIRITS, which enable services that straddle circuit-switched and packet-switched networks (using SIP in the process). It is telling that Igor Faynberg, cochair of PINT, has coauthored a book on intelligent networks [5].
13.6.1 SIP and Intelligent Networks: PINT and SPIRITS
In this section, we look at the output of two IETF working groups: PSTN and Internet Internetworking (PINT; this working group has concluded) and Services in the PSTN requesting Internet services (SPIRITS, which is still active at the time of this writing). PINT's aim is to enable IP-based services to request PSTN services. With SPIRITS, it is the other way around. Of the two, PINT came first. The main PINT RFC [6] introduced the SUBSCRIBE and NOTIFY methods (which were later absorbed into the "baseline" SIP specification—see [7]). Here is an example use of PINT functionality: a user visits a company's Web site, fills out a form, and requests
that a copy of the form be sent to a fax number. The PINT service might want to SUBSCRIBE to information about the success (or failure) of the fax delivery. When the fax transmission's success/failure code becomes available in the PSTN, it is conveyed to the PINT service by means of a NOTIFY.

Complex services will require bidirectional interaction, so we expect that PINT and SPIRITS will be used together more often than not. Moreover, exposition can be cumbersome if we insist on maintaining the distinction between the two. Therefore, we will not be careful to do so in the rest of this section but will instead just refer to SPIRITS.

As is the case with IN, SPIRITS does not aim to specify services so much as it aims to specify flexible building blocks. However, it can be useful to have a paradigm example in mind. One of the paradigm examples in the SPIRITS architecture document [8] is Internet caller ID (ICID). Our primary purpose here is to familiarize the reader with SIP's SUBSCRIBE and NOTIFY commands; ICID is well suited to that purpose.

We now give a thumbnail description of ICID. A PC user wants to be notified whenever there is an incoming call to his/her phone. The notification, which includes the identity of the caller, is presented to the subscriber by an ICID client running on the PC. To realize this service, we clearly need SPIRITS functionality: the incoming call request "lives" in the PSTN, but the ICID client is running on a PC that is only reachable through the public Internet.

How should the ICID service work? When the terminating switch becomes aware of the call request (e.g., by receiving an ISUP Initial Address Message), it must generate an event notification. This means that, somewhere along the way, a detection point must have been armed in the terminating BCSM. Let us assume that, at the request of the ICID subscriber, the ICID service logic arms an event DP.
When that event DP fires, a message is sent to the ICID client on the subscriber's PC, which in turn alerts the subscriber. In Figure 13.1, there are two SIP user agents: one inside the ICID end user's PC, and the other inside (or alongside) the ICID SCF. The SPIRITS gateway is a SIP proxy. The SIP SUBSCRIBE and NOTIFY methods can be used to implement ICID in the following way:

1. The PC-based user agent issues a SIP SUBSCRIBE indicating that it wants to arm the appropriate DP in the terminating BCSM. Thus, we have labeled this user agent the SPIRITS SUBSCRIBEr (the capital letters indicate that we are referring to the user agent's role rather than to the end user of the ICID service).
2. The user agent in the SCF sends a 200 OK. After the SCF successfully arms the DP on the terminating switch, its user agent sends a SIP NOTIFY to inform the PC-based user agent of the subscription status.
3. When the DP fires, the SCF user agent alerts the PC user agent by issuing another SIP NOTIFY.

RFC 3265 [7] says that the notification in step 2 should be sent, even though it is not an indication that the trigger has actually fired. The 200 OK tells the far-end UA
Figure 13.1 Schematic SPIRITS configuration. (The SPIRITS SUBSCRIBEr resides in the ICID end user's PC in the IP domain; the SPIRITS NOTIFYer resides in the SCF in the PSTN; the SPIRITS gateway sits between them, relaying event subscriptions and event notifications.)
to expect that a NOTIFY message will follow soon. The SCF UA can send a 202 Accepted instead if it wants to acknowledge the SUBSCRIBE request without promising that the trigger will be armed. Since the NOTIFY method is used to communicate the status of the subscription request, there is really no need to send a provisional response (e.g., 100 Trying). A certain amount of provisioning is required to make ICID work: both UAs have to be able to find the SPIRITS gateway and vice versa.

Why would anyone implement ICID? It would only be worthwhile if it allowed the subscriber to influence the processing of the call in some useful way. Thus, for a service built around ICID, the IN trigger would have to be a request DP (i.e., a DP for which control is transferred to the ICID SCF). Presumably it would be useful to provide calling name ID, regardless of the other particulars of the chosen service. This means that the SCF would need to dip the calling name database and incorporate the resulting information into the NOTIFY message of step 3. The following service examples come to mind:

• A call-forwarding capability (e.g., for a traveler who has to be away from his/her phone but does not know the forwarding number before reaching his/her destination, or who wants to forward calls selectively).
• An Internet call-waiting service, which "kicks in" when:
  • The subscriber is engaged in a dial-up Internet session on the subscribed line.
  • The subscriber's serving switch receives an incoming call request for that line.

Instead of engaging in the default behavior (playing a busy signal or perhaps forwarding to voice mail), Internet call waiting enables the subscriber to determine the disposition of incoming calls on a case-by-case basis.
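The three-step ICID exchange can be made concrete by writing the messages out. The following Python sketch assembles skeletal SUBSCRIBE and NOTIFY requests; mandatory SIP headers (Via, CSeq, Call-ID, tags, Contact) are omitted, and the event-package name and URIs are invented, so these show only the shape of the exchange, not valid on-the-wire SIP.

```python
# Skeletal SUBSCRIBE/NOTIFY messages for the ICID flow. The "call-events"
# event package and all URIs are illustrative inventions.

def subscribe(watcher, target, event, expires=3600):
    return (f"SUBSCRIBE sip:{target} SIP/2.0\r\n"
            f"From: sip:{watcher}\r\n"
            f"To: sip:{target}\r\n"
            f"Event: {event}\r\n"
            f"Expires: {expires}\r\n\r\n")

def notify(notifier, watcher, event, state, body=""):
    return (f"NOTIFY sip:{watcher} SIP/2.0\r\n"
            f"From: sip:{notifier}\r\n"
            f"To: sip:{watcher}\r\n"
            f"Event: {event}\r\n"
            f"Subscription-State: {state}\r\n\r\n"
            f"{body}")

# Step 1: the PC user agent asks to arm the DP in the terminating BCSM.
req = subscribe("icid-client@pc.example.net", "scf.example.com",
                "call-events")
# Step 2: after arming succeeds, the SCF reports that the subscription
# is live (this NOTIFY does not mean the trigger has fired).
ack = notify("scf.example.com", "icid-client@pc.example.net",
             "call-events", "active")
# Step 3: the DP fires; the caller's identity rides in the NOTIFY body.
alert = notify("scf.example.com", "icid-client@pc.example.net",
               "call-events", "active", body="caller=15559876")
```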
An Internet call-waiting service must be able to alert the subscriber of the incoming call (this is ICID capability again), receive the subscriber’s instructions for disposition of the call, and see to it that those instructions are carried out. In the case where the subscriber’s intent is to forward the call, the second step would normally be carried out using the REFER method [9]. The REFER request would contain the forwarding information, which could “point” to a PSTN termination or to a VoIP
domain (e.g., a VoIP gateway with connectivity to an H.323 terminal or SIP phone). The third step (carrying out the subscriber's instructions) is beyond our scope. Although each REFER request requires a final response, the REFER method does not employ the three-way handshake we saw with INVITE (that is, 200 OK is not followed by an ACK). As a result, REFER's associated state machine is less complicated than INVITE's. The same is true of SUBSCRIBE and NOTIFY.

In its current Internet draft incarnation, the SPIRITS protocol specifies an Internet call waiting service whose ICID portion differs from the SUBSCRIBE/NOTIFY approach previously described. (We refer to this service by its acronym, ICW, to distinguish it from the version described in previous paragraphs.) The ICW specification is at least partly the result of a desire to codify implementation approaches that predate SPIRITS [10]. One difference is worth mentioning: ICW uses a trigger detection point in the serving switch; presumably this DP is armed when the end user's subscription (to the service) is processed.

It may seem like a small thing at first glance, but we claim that the security models for the two approaches are quite different. For ICW, telco provisioning systems are responsible for arming the necessary triggers. In the ICID approach, entities outside the service provider's network are empowered to change settings on voice switches inside the network. With the ICID approach, service providers must account for (and adequately protect against) the actions of rogue user agents. For example, a rogue UA might arm triggers for many lines that are not actually subscribed to the service and simply not respond to NOTIFY messages reporting that the triggers have fired. Eventually, the NOTIFY transactions would time out in the SCF. But the SCF's memory and/or processing capacity might be overloaded (with unpredictable results), especially if the SCF has been dimensioned for a fairly small customer base.
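The forwarding step rests on the Refer-To header of the REFER request. The sketch below is skeletal in the same spirit as the earlier message examples: URIs are invented and most mandatory SIP headers are omitted. It shows how the forwarding target can name either a PSTN termination (a tel: URI) or a SIP endpoint reachable through a VoIP gateway.

```python
# Skeletal REFER request carrying forwarding information. All URIs are
# illustrative; a real REFER also needs Via, CSeq, Call-ID, Contact, etc.

def refer(referrer, target, refer_to):
    return (f"REFER sip:{target} SIP/2.0\r\n"
            f"From: sip:{referrer}\r\n"
            f"To: sip:{target}\r\n"
            f"Refer-To: {refer_to}\r\n\r\n")

# Forward the waiting call to a PSTN number ...
r1 = refer("icw@scf.example.com", "switch.example.com", "tel:+15550100")
# ... or to a SIP phone behind a VoIP gateway.
r2 = refer("icw@scf.example.com", "switch.example.com",
           "sip:traveler@gw.example.net")
```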
Why would a carrier want to offer an Internet call-waiting service? If offered exclusively, it might enable the service provider to bring in additional revenue by packaging services in attractive bundles. For instance, it might help the company sell the services of its own (or an affiliated) Internet service provider to its existing customers. If ICW and calling name ID services were combined at an attractive price, it might be possible to attract new subscribers.

The Internet call-waiting service concept is relatively new. But depending on the speed of migration from dial-up to broadband Internet access [e.g., cable modem and digital subscriber line (DSL)], this service may have a fairly short useful lifetime. Contrast this with the revenue-generating lifespans of services like call waiting and caller ID, which might be measured in decades. Although those services represented a substantial investment of time and money on the part of service providers, there has been ample opportunity to recoup the investment several times over. Many services in the future may have short lifetimes, so the need for faster and cheaper development cycles is real.

At the time of this writing, the SPIRITS protocol has not reached RFC status. There is an Internet draft that is updated from time to time, however, and the protocol described therein is based on SIP. In closing, we note that there is a SPIRITS protocol requirements RFC [11]. SIP is making inroads into wireless networks. As we continue our discussion of SIP-based services in Section 13.7, we shift our attention to the wireless realm.
13.7 SIP in Wireless Networks

The 3rd Generation Partnership Project (3GPP, www.3gpp.org) seeks to evolve GSM networks to support third-generation services. As an aside, we note that the CDMA "camp" has similar initiatives underway—the interested reader can consult www.3gpp2.org. Our discussion will focus on GSM and 3GPP.

A word about the history of wireless networks may be in order here. First-generation mobile phones employed analog transmission in the radio frequency (RF) domain. So-called second-generation (2G) technologies such as TDMA, CDMA, and GSM are digital. 2G represents a big step forward from analog, but its data networking capabilities are minimal (2G technologies offer little more than wireless versions of dial-up service). Third-generation (3G) wireless initiatives seek to offer a much richer array of capabilities to the end user. Broadly speaking, the goal is to enable services that combine voice, video, and data in useful ways. As wireless networks evolve, SIP promises to be an important piece of the puzzle.

In recent years, CDMA and GSM carriers have augmented their networks by adding IP routers. They have also adapted their RF interfaces to incorporate a modicum of statistical multiplexing capability. While this still falls short of the "vision" of 3G, it is now possible to implement SIP-based services (that is, services with SIP, rather than SS7, inhabiting the control plane). Push to Talk over Cellular (PoC), an early example, is the topic of Section 13.7.1.

To reach full-fledged 3G functionality, further enhancements to RF infrastructures and mobile handsets are required. These improvements, which encompass support for higher data rates and multiple simultaneous sessions, will not be emphasized here. 3G architectures also incorporate enhancements to IP domains within PLMNs. We touch on aspects that are pertinent to enriched services in Section 13.7.3.
13.7.1 Push To Talk over Cellular
PoC is a half-duplex voice service that dispenses with customary call setup procedures. That is, PoC departs from the model in which the caller dials the called party's number, an end-to-end TDM channel is allocated, the called party's phone rings, and finally the called party decides to accept or reject the call. In some usage patterns, the standard call setup procedures become laborious; PoC is aimed at user populations that would benefit from "lightweight" call control. For example, construction workers or other blue-collar workers at job sites may need to relay information in short "talkbursts." Setting up a standard circuit-switched call is overkill for such applications. Fleet dispatching applications are also candidates for simplified control.

PoC's operation is often compared with that of walkie-talkies. A major difference is that PoC is designed to use wireless wide area networks and therefore does not suffer from the distance limitations of walkie-talkies.

Push to Talk was pioneered (and the name trademarked) by Nextel, a U.S. wireless carrier, using proprietary technology. At the time of this writing, PoC is in the process of being standardized by the Open Mobile Alliance for (interoperable) use in 3GPP and 3GPP2 networks. The interested reader can find pointers to the PoC standardization effort at www.openmobilealliance.org.
Various carriers are introducing (or have already introduced) prestandard PoC implementations that use RTP as the voice bearer and rely on SIP (and perhaps a bit of RTCP) for control. The standardized version will retain similar general characteristics. The PoC initiative will standardize the management of "buddy lists." Buddy lists enhance ease of use by offering subscribers a simple way to contact individuals or groups with whom they frequently interact. PoC will also standardize ways to make use of presence, availability, and location information.

Why PoC? Now that wireless carriers have made basic data networking functionality available to their subscribers, they are eager to roll out revenue-generating services that take advantage of the new capabilities. PoC fills the bill. It is not an accident that PoC voice is half duplex, for at least two reasons:

1. Half-duplex mode hides latency. Latency is substantial in today's PLMN IP domains; in particular, it is much greater than in PLMN circuit-switched domains. If one tried to implement a full-duplex voice service in the current environment, delays in the bearer plane would be painfully obvious. Such a service would be very hard to use.
2. Media mixing is not necessary in half-duplex mode. In PoC, only one party can have the "floor" at any given time. This means that group PoC sessions are much easier to implement than full-duplex conference calls (although the PoC server must replicate voice packets). Unlike PoC servers, conference bridges must also mix media streams (since they do not know which participant is currently speaking; moreover, multiple participants may try to speak at the same time). On the other hand, part of the complexity of PoC resides in its floor control scheme.

Thus, we see that, to a substantial degree, PoC embodies the "art of the possible."
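The two observations above can be illustrated with a toy floor-control arbiter in Python: one party holds the floor at a time, and the server replicates (rather than mixes) the speaker's packets. The class and method names are invented; real OMA floor control (request queueing, timers, preemption) is far richer than this.

```python
# Toy floor-control arbiter for a half-duplex PoC-style group session.
# One speaker at a time; the server replicates the speaker's packets to
# every other participant and never mixes streams.

class PocSession:
    def __init__(self, members):
        self.members = set(members)
        self.floor_holder = None
        self.delivered = []                # (recipient, packet) pairs

    def request_floor(self, member):
        if self.floor_holder is None:
            self.floor_holder = member
            return True
        return False                       # busy: someone else is talking

    def release_floor(self, member):
        if self.floor_holder == member:
            self.floor_holder = None

    def talkburst(self, member, packet):
        if self.floor_holder != member:
            return 0                       # no floor, no audio forwarded
        # Replication only -- no mixing, since only one party can speak.
        recipients = self.members - {member}
        self.delivered.extend((r, packet) for r in recipients)
        return len(recipients)

s = PocSession({"alice", "bob", "carol"})
s.request_floor("alice")                   # alice gets the floor
sent = s.talkburst("alice", b"rtp-payload")
```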
13.7.2 SIP Header Compression
Wireless networks suffer from high bit error rates. Moreover, bandwidth is at a premium, since licenses for wireless spectrum are expensive. SIP signaling exchanges can consume a substantial amount of bandwidth (as can IP traffic in general). As we have seen, SIP headers tend to be voluminous given the amount of routing information they can convey; this is especially true in IPv6 deployments. On the other hand, SIP signaling exchanges feature a significant amount of repeated and/or predictable information: SIP’s text-based nature makes the messages easy for humans to understand, but no one would accuse the protocol of being compact. Thus, it should be possible to compress SIP headers for transmission over the RF interface without any loss of information. But the high bit error rates that are endemic to the wireless domain must be carefully taken into account. This is the motivation behind SigComp [12], which came out of IETF’s Robust Header Compression (rohc) working group. Other RFCs that pertain to SIP compression are [13, 14].
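SigComp itself specifies a decompression virtual machine, which we do not attempt to sketch. The following Python fragment merely illustrates why SIP is such a good compression target: deflate with a preset dictionary of common SIP tokens (an invented, partial dictionary) squeezes a skeletal INVITE considerably, precisely because so much of the header material is repeated and predictable.

```python
# Illustration of SIP's compressibility using zlib with a preset
# dictionary. This is NOT SigComp -- just a demonstration that the
# repeated, predictable header material shrinks well.
import zlib

# A partial, invented dictionary of tokens that recur in SIP messages.
SIP_DICT = (b"SIP/2.0\r\nVia: SIP/2.0/UDP \r\nFrom: <sip:\r\nTo: <sip:"
            b"\r\nCall-ID: \r\nCSeq: \r\nContact: <sip:\r\nContent-Length: ")

def compress(msg):
    c = zlib.compressobj(zdict=SIP_DICT)
    return c.compress(msg) + c.flush()

def decompress(blob):
    d = zlib.decompressobj(zdict=SIP_DICT)
    return d.decompress(blob) + d.flush()

invite = (b"INVITE sip:bob@example.com SIP/2.0\r\n"
          b"Via: SIP/2.0/UDP host.example.net\r\n"
          b"From: <sip:alice@example.net>\r\n"
          b"To: <sip:bob@example.com>\r\n"
          b"Call-ID: 8472@host.example.net\r\n"
          b"CSeq: 1 INVITE\r\n"
          b"Content-Length: 0\r\n\r\n")

blob = compress(invite)    # round-trips losslessly, and is much smaller
```

Lossless round-tripping is the essential property: compression over the RF interface must not discard any signaling information, and the scheme must tolerate the link's high bit error rates (which zlib, unlike SigComp's framework, does not address).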
13.7.3 IP Multimedia Subsystem
How should carriers go about deploying SIP-based services? Carriers face many issues (such as operations, administration, maintenance, provisioning, billing, security/privacy, availability, and reliability) that differ from those encountered with services residing in the public Internet. Of course, public Internet services also have to be administered. Since carriers want to maintain a high degree of control over their networks, however, the two environments are very different. In addition, wireless carriers require sophisticated mobility management functionality.

SIP's development in the IETF was driven by people who embraced the notion of lightweight control; the progenitors wanted to develop tools that would enable multimedia services in the public Internet. This was especially true early on. Since telcos got interested in SIP, the standards have "bulked up" a bit. As we saw in the previous section, wireless carriers want to deploy SIP-based services. Is SIP now ready for the telco environment? If so, how does its usage vary from other scenarios? To shed some light on these questions, we take a cursory look at 3GPP's IP Multimedia Subsystem (IMS) specifications. In broad terms, IMS is a platform for wireless telco services that feature:

• IP-based bearers;
• SIP as the primary session control protocol (in lieu of SS7). Here we note that SIP does not take on all of the functions performed by SS7 in today's networks: for example, subscriber database platforms are queried using Diameter [15].
If PoC were the only forthcoming service to rely on SIP, IMS would be unnecessary. The IMS standardization effort comes out of a belief that numerous services will not only use SIP but will also be "consumers" of the same types of information. For example, SIP-based instant messaging may become widespread in the years to come. Like PoC, instant messaging service can be enhanced by incorporating presence information. IMS is intended as a flexible platform for service creation and management. The IMS overview document [16] states explicitly that the goal is to standardize flexible service-supporting mechanisms rather than the services themselves. Thus, IMS bears more than a passing resemblance to the IN philosophy discussed in Section 13.2.

IMS Functional Elements
IMS defines a variety of call session control functions (CSCFs). Note this terminology's similarity to that of IN's distributed functional plane (see Section 13.2.2). The CSCFs function primarily as SIP proxies, although they may behave as user agents under certain circumstances.

Let us first discuss the serving-CSCF (S-CSCF). In the IMS model, service logic resides on application servers (ASs); the S-CSCF acts as an intermediary between ASs and the rest of the network. For example, in an IMS PoC implementation, the user's handset would register with the S-CSCF. After checking the user's PoC subscription, the S-CSCF would provide access to the PoC server.

The proxy-CSCF (P-CSCF) serves as the first point of contact for the handset-resident user agent. That is, the handset's SIP user agent talks to the SIP entity in
the P-CSCF. Note that, between the handset UA and the P-CSCF, there are intervening IP network elements. In particular, the gateway router in the visited PLMN is responsible for finding the P-CSCF (e.g., by launching a DNS query). Note that gateway routers go by different names depending on the cellular technology; that is why we prefer to remain nonspecific. The P-CSCF is also responsible for terminating SigComp (i.e., compressing SIP messages in the downlink direction and decompressing SIP messages in the uplink direction).

It may seem a little odd to distinguish between the S-CSCF and the P-CSCF. Keep in mind that CSCFs are functional entities; for a subscriber who is attached to his/her home network, the serving- and proxy-CSCF roles may be played by the same network element. A quick look at the roaming case, which is a little more complicated, serves to motivate the S-CSCF/P-CSCF distinction. A simple supporting illustration appears in Figure 13.2. After it attaches to the visited PLMN, the subscriber's handset establishes IP connectivity with the outside world through the gateway IP router. The gateway IP router lacks a SIP stack; it can locate the P-CSCF on behalf of the user but does not know about SIP entities in other networks. The P-CSCF is in turn responsible for forwarding SIP requests and responses between the handset and the S-CSCF.

A third functional entity, the interrogating-CSCF (I-CSCF), is optional. In small deployments, separate I-CSCFs are unnecessary. When present, the I-CSCF assigns users to S-CSCFs as those users register. In large deployments, the I-CSCF function could be used to implement load balancing (e.g., based on subscriber identity). Moreover, telcos may not want to disclose their topologies (or even the number of S-CSCFs) to one another. The I-CSCF can be used to hide this type of information from external entities.

Service Mobility
In Figure 13.2, why is the S-CSCF located in the subscriber's home network? It is an arrangement that supports service mobility. Simply stated, a subscriber should be able to access the same services when roaming as are available in the home network. This idea, which has been bouncing around for years in the IN community, often goes under the name virtual home environment (VHE). To date, the VHE concept has remained somewhat nebulous. By listing VHE as a requirement, the framers of the IMS specifications have expressed a view that IMS is a suitable vehicle for realizing service mobility.

Figure 13.2  IMS serving-CSCF and proxy-CSCF.

Triggering IMS Services
How are IMS services triggered? Here we present a simplified view of 3GPP's answer to this question; for more information, see the IM Call Model Technical Specification [17]. The influence of IN concepts is very prominent in this document. This makes sense: application servers (ASs) would get bogged down if they had to peruse every SIP signaling flow, message by message, to determine when they needed to do something. So ASs rely on S-CSCFs to notify them of pertinent events (much as CAMEL SCPs rely on mobile switching centers for notification).

How does an S-CSCF know when to notify an AS? Figure 13.3 gives a simplified view of the triggering architecture described in [17]. The service point triggers (SPTs) in the diagram are the points in SIP signaling that potentially cause the S-CSCF to send a SIP message to some AS (although there is only one AS in the figure, there may be many ASs in an actual deployment). SPTs can be based on the type of SIP message received, the presence (or absence) of specific header fields, and/or the contents of specific header fields. The filter criteria determine which SPTs actually cause the S-CSCF to send message(s) to one or more ASs. Filter criteria also specify which application servers should be contacted. When a user registers with IMS, the S-CSCF queries the subscriber database for the appropriate filter criteria. This allows for SPTs to be "armed" on a per-subscriber basis. In the temporal realm, the interchange depicted in Figure 13.3 would play out as follows:

• When the subscriber registers, the S-CSCF acquires the associated filter criteria from the subscriber database. The filter criteria tell the S-CSCF which service point triggers are pertinent for this subscriber.
• The incoming SIP message (an INVITE, say) matches one of the filter criteria, so the S-CSCF dispatches a SIP message to the AS.
• Upon receipt of this message, the AS invokes the appropriate service logic.
• Depending on the application particulars, the S-CSCF continues its tasks as a SIP proxy (e.g., message forwarding; this is the horizontal dotted arrow emanating from the "filter criteria" box) or waits for instructions from the AS (this is the dotted arrow emanating from the "service logic" box).
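This triggering logic lends itself to a small sketch. The following Python fragment is our own illustrative model (the class, its fields, and the AS address are assumptions, not 3GPP's actual initial-filter-criteria data model); it shows how an S-CSCF-like dispatcher might decide which application servers to notify:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class FilterCriterion:
    """One filter criterion: the SPTs it arms and the AS to contact on a match.

    Field names are illustrative only; real initial filter criteria are richer.
    """
    method: Optional[str] = None           # SPT: required SIP request method
    required_header: Optional[str] = None  # SPT: a header that must be present
    app_server: str = ""                   # AS to notify when the SPTs match

    def matches(self, message: Dict) -> bool:
        if self.method is not None and message.get("method") != self.method:
            return False
        if (self.required_header is not None
                and self.required_header not in message.get("headers", {})):
            return False
        return True

def dispatch(message: Dict, criteria: List[FilterCriterion]) -> List[str]:
    """Return the application servers the S-CSCF should contact for this message."""
    return [c.app_server for c in criteria if c.matches(message)]

# Per-subscriber criteria, as retrieved from the subscriber database at registration:
criteria = [FilterCriterion(method="INVITE", app_server="as1.home.example")]
invite = {"method": "INVITE", "headers": {"From": "sip:alice@home.example"}}
```

In this sketch, registering a subscriber amounts to loading the `criteria` list; thereafter each incoming message is checked against it, mirroring the per-subscriber "arming" of SPTs described above.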
Note that this example is highly simplified. In particular, space limitations prevent us from covering the following topics:

• What happens when multiple ASs "subscribe" to the same SPT.
• The S-CSCF service control model. This defines finite state machines that reside in S-CSCFs and ASs. Dialog between S-CSCFs and ASs occurs in the context of these state machines.
• How IMS interacts with circuit-switched domains (the short answer is that it does so via a softswitch).
• How IMS interacts with other control platforms (e.g., CAMEL platforms and OSA/Parlay gateways). For CAMEL interworking, see [18, 19].

Figure 13.3  Triggering IMS applications.
Additional Comments on IMS
Before moving on, it is worth noting that the IMS specifications mandate support of IPv6 (support of IPv4 is optional) and forbid forking on the part of CSCFs. Moreover, there are some IMS-specific SIP headers that carry information about charging and networks that are transited in roaming scenarios. In keeping with IETF’s guidelines for extending SIP, the additional headers are defined as private headers (or P-headers) in an informational RFC [20]. IMS signaling also involves specialized use of SDP: namely, the “local” and “remote” QoS descriptors defined in RFC 3312 [21]. We refer the reader to the brief discussion that appears in Section 12.6.1. 3GPP has published four main IMS technical specifications. We already cited [16, 17]. For detailed specifications on IMS-compliant use of SIP and SDP, the interested reader can consult [22]. Detailed renditions of numerous call flows appear in [23].
13.8  Short Message Service

People make a big deal (perhaps justifiably) of SIP's ability to transfer free-format information. That is, SIP does not care about the contents of the payload. But this is not new—SMS has offered the same type of flexibility for some time. SMS is heavily used in wireless networks; there has been talk of extending SMS functionality to wireline networks. However, we know of no wireline deployments at the time of this writing. For GSM networks (and their 3G descendants), SMS is specified in [24]. SMS was originally conceived and deployed as a text messaging service; it adheres to a "store and forward" paradigm. In some ways, the solution is inelegant:
• At various stages of its end-to-end path, a short message (SM) is likely to be carried over a variety of protocols. In PLMNs, SS7 MAP [25] is the primary transport vehicle for SMs. A different protocol stack carries SMs over the RF interface. A third protocol stack can be (and often is, at least in the United States) used for intercarrier transport of SMs, and so on.
• There are differences in text-encoding schemes (e.g., between CDMA and GSM carriers). There are also differences in the maximum message sizes (they are typically somewhere around 150 characters). Based on these, we see that SMs really are, well, short.
• On today's typical handsets, SMs are difficult to type: the 12-key keypad is a poor substitute for a "real" keyboard.
SMS is all about the "art of the possible." One can almost reconstruct the thought process that went into its creation: "How could we get a text message from one (digital) handset to another? Well, the SS7 network that interconnects the mobile switching centers is a packet network...let's concoct a delivery mechanism that uses excess capacity on our SS7 links."

13.8.1  SMS in Support of Other Applications
SMS suffers from throughput and transport efficiency limitations. Despite these limitations, SMS has been remarkably successful; people have employed it to do useful things. This is true from a technical point of view as well as a marketing point of view. We give an example for each point of view:

• Over-the-air provisioning. This is very useful because it allows the carrier to update the settings on the handset without requiring the subscriber to visit a store location. As handsets have evolved complex architectures, SMS has evolved the ability to address SMs to specific functional components residing within the handset.
• Voting applications. Here we are referring to mass-market campaigns that allow subscribers to cast their votes for things like "most valuable player" or "most glamorous movie star."
SMS is used for many things; this list is intended as an illustration and is by no means exhaustive.
13.9  Further Reading

Just as telephone switching technology is changing, service architectures are also evolving. This chapter only scratches the surface of this subject area. For a general overview of IN concepts and their potential evolution, we recommend Zuidweg's recent book [26]. We have been particularly sparing in our coverage of WIN (the ANSI-41 counterpart of CAMEL). Christenson et al. [27] have written an informative book on this topic.
We have also given short shrift to OSA/Parlay. Mueller’s book [4] is a very good reference; the author is active in the Parlay group.
References

[1] TS 22.078, Customised Applications for Mobile Networks Enhanced Logic (CAMEL); Service Description, Stage 1, 3GPP.
[2] TS 23.078, Customised Applications for Mobile Network Enhanced Logic (CAMEL); Stage 2 Specification, 3GPP.
[3] TS 29.078, Customised Applications for Mobile Networks Enhanced Logic (CAMEL); CAMEL Application Part (CAP) Specification, 3GPP.
[4] Mueller, S. M., APIs and Protocols for Convergent Network Services, New York: McGraw-Hill Professional, 2002.
[5] Faynberg, I., et al., The Intelligent Network Standards: Their Application to Services, New York: McGraw-Hill, 1996.
[6] Petrack, S., and L. Conroy, RFC 2848, The PINT Service Protocol: Extensions to SIP and SDP for IP Access to Telephone Call Services, IETF, June 2000.
[7] Roach, A. B., RFC 3265, Session Initiation Protocol (SIP)-Specific Event Notification, IETF, June 2002.
[8] Slutsman, L., I. Faynberg, and M. Weissman, RFC 3136, The SPIRITS Architecture, IETF, June 2001.
[9] Sparks, R., RFC 3515, The Session Initiation Protocol (SIP) Refer Method, IETF, April 2003.
[10] Lu, H., RFC 2995, Pre-SPIRITS Implementation of PSTN-Initiated Services, IETF, November 2000.
[11] Faynberg, I., et al., RFC 3298, Service in the Public Switched Telephone Network/Intelligent Network (PSTN/IN) Requesting Internet Service (SPIRITS) Protocol Requirements, IETF, August 2002.
[12] Price, R., et al., RFC 3320, Signaling Compression (SigComp), IETF, January 2003.
[13] Garcia-Martin, M., RFC 3485, The Session Initiation Protocol (SIP) and Session Description Protocol (SDP) Static Dictionary for Signaling Compression (SigComp), IETF, February 2003.
[14] Camarillo, G., RFC 3486, Compressing the Session Initiation Protocol (SIP), IETF, February 2003.
[15] Calhoun, P., et al., RFC 3588, Diameter Base Protocol, IETF, September 2003.
[16] TS 23.228, IP Multimedia Subsystem (IMS); Stage 2, 3GPP.
[17] TS 23.218, IP Multimedia (IM) Session Handling; IM Call Model; Stage 2, 3GPP.
[18] TS 23.278, Customised Applications for Mobile Network Enhanced Logic (CAMEL); Stage 2; IM CN Interworking, 3GPP.
[19] TS 29.278, Customised Applications for Mobile Network Enhanced Logic (CAMEL); CAMEL Application Part (CAP) Specification for IP Multimedia Subsystem (IMS), 3GPP.
[20] Garcia-Martin, M., E. Henrikson, and D. Mills, RFC 3455, Private Header (P-Header) Extensions to the Session Initiation Protocol (SIP) for the 3rd-Generation Partnership Project (3GPP), IETF, January 2003.
[21] Camarillo, G., W. Marshall, and J. Rosenberg, RFC 3312, Integration of Resource Management and Session Initiation Protocol (SIP), IETF, October 2002.
[22] TS 24.229, IP Multimedia Call Control Protocol Based on Session Initiation Protocol (SIP) and Session Description Protocol (SDP); Stage 3, 3GPP.
[23] TS 24.228, Signalling Flows for the IP Multimedia Call Control Based on SIP and SDP; Stage 3, 3GPP.
[24] TS 23.040, Technical Realization of the Short Message Service (SMS); Stage 2, 3GPP.
[25] TS 29.002, Mobile Application Part, 3GPP.
[26] Zuidweg, J., Next Generation Intelligent Networks, Norwood, MA: Artech House, 2002.
[27] Christenson, G., P. Florack, and R. Duncan, Wireless Intelligent Networking, Norwood, MA: Artech House, 2000.
CHAPTER 14

Properties of Circuit-Switched Networks

In this brief chapter, we take a look at the features and functionalities of today's public telephone networks. Where appropriate, we compare and contrast PSTN/PLMN traditions and design goals with those of data networks. The gist of the discussion is this: the world's PSTN/PLMN infrastructure does what it was designed to do on a truly enormous scale. Naturally, packet telephony will be compared with today's telco networks; this means that "the bar is set very high." By and large, data networking gear is not built to the same reliability standard as telco equipment. To be fair, data networking equipment also does what it was designed to do. We are pointing out, however, that data networking gear has not traditionally been designed to provide "five 9's" reliability (or, for that matter, to provide real-time services). To a significant degree, telcos are positioned as public utilities; surely this is a big reason why telco networks are designed to such a high level of reliability. Perhaps our experience coincides with that of many readers: when our PC breaks, we use the telephone to call the help desk.
14.1  Telco Routing and Traffic Engineering

The term traffic engineering refers to network dimensioning. For the most part we will think of traffic engineering as a decision process for which the main inputs are traffic forecasts and the outputs are transmission link capacities. In this section, we turn our attention to circuit-switched networks. Based on the expected volume of traffic, the network topology, and the rules that will be used to route the traffic, there is a well-developed methodology for determining link capacities to meet a target blocking probability. (When the network cannot complete a call because it cannot find idle transmission capacity along any of its candidate routes, we say that the call is blocked.) This begs the question "How are the network topology and routing rules determined?" Let us assume for the time being that our network topology is fixed. We now undertake an overview of routing with a mind toward understanding the bearer plane's traffic dynamics. In a nutshell, the lack of routing flexibility conspires with a certain predictability in offered traffic patterns. This confluence makes it possible to determine "how much network to build" with a remarkable degree of accuracy. In other words, routing and dimensioning go together. For readers who want more information, we recommend Girard's book [1]. We also note that Ash's encyclopedic book [2] describes the evolution of telco
routing in great detail and offers a useful discussion of trunk reservation (see Section 14.1.2). Traditional telco routing is fixed—switch routing tables are configured manually. When a call is placed, the caller’s serving switch proceeds through a static list of routing table entries in a fixed order until the call is successfully completed or the list is exhausted. In many, if not most, cases, this list is short—in the United States, two or three entries is fairly typical for local calls. (Selection of routing table entries is based on the called party’s number.) Traditional telco routing is also hierarchical. At the bottom of the hierarchy are so-called end office switches—these are the switches that directly serve end users. If two end office switches are directly connected, then this is the preferred routing option. If (and only if) no direct channel is available, the originating end office switch goes up the hierarchy to a so-called tandem switch. (To make sense of the name, one can think of a tandem switch as “lashing together” pairs of switches that are lower in the hierarchy.) If the call cannot be completed via this tandem switch, an attempt might be made via another tandem switch. If the latter attempt also fails, the call is blocked (at which point the originating end office switch will send a “fast busy” signal). This description typifies routing of local calls in the United States, where local carriers’ networks have two-layer hierarchies (consisting of end office switches and tandem switches that connect directly to these end office switches. We will refer to such tandem switches as local tandems.) Local carriers still employ fixed hierarchical routing today. Long-distance routing has become more sophisticated in the last 20 years. Let us first discuss the fixed hierarchical routing that was universal until the 1980s. Taken as a whole, the U.S. telephone infrastructure was designed as a five-layer hierarchy. 
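The fixed, ordered route selection described above can be sketched as follows. The trunk-group names are hypothetical; the point is simply that the originating switch walks a short, manually configured list and blocks the call when the list is exhausted:

```python
from typing import Dict, List, Optional

def route_call(routing_list: List[str], idle_trunks: Dict[str, int]) -> Optional[str]:
    """Proceed through the static routing-table entries in fixed order.

    Returns the first trunk group with an idle trunk; None means the call is
    blocked (the caller hears a "fast busy").
    """
    for trunk_group in routing_list:
        if idle_trunks.get(trunk_group, 0) > 0:
            return trunk_group
    return None

# End office A's preplanned list for calls terminating on end office C:
# the direct route is preferred; overflow goes up the hierarchy to local tandems.
routes_a_to_c = ["direct A-C", "via tandem T1", "via tandem T2"]
```

Selecting `routes_a_to_c` in the first place would be driven by the called party's number; dynamic routing schemes differ precisely in that this list is no longer static.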
The typical pattern was that two switches could be directly connected if they were, at most, one layer apart in the hierarchy. Thus, a call that needed to go up the hierarchy would do so one layer at a time: the bearer path would consist of a trunk from the originating end office switch to a local tandem, a trunk from the local tandem to a long distance tandem residing in the middle of the five-layer hierarchy, and so on. Switches at the top two layers of the hierarchy did not directly connect to end office or local tandem switches. Nowadays, the largest long-distance carriers implement dynamic routing schemes and the rigid five-layer hierarchy is no longer ubiquitous. Before discussing some of the challenges of dynamic routing, we note that telco routing is static in another way: once a call is set up, voice samples follow the same path throughout the life of the call. As we have seen, this also contrasts with traditional IP routing.

14.1.1  Truitt's Model
Consider the simple network depicted in Figure 14.1. In this section, we assume that switches A and C are end office switches; switch B is a tandem switch. To carry the A-C offered load, what is the optimal dimensioning of the A-C and A-B transmission links? Here the A-C offered load is the stream of setup requests for calls originating at switch A and terminating at switch C, or vice versa.

Figure 14.1  Topology for discussion of Truitt's model.

Note that, for purposes of dimensioning, it does not matter which of the two switches originates the call attempt; for ease of exposition, we can assume that all calls originate at switch A. Moreover, the problem is symmetric: an A-C call traverses the A-B link if and only if it also traverses the B-C link. So these two links carry an equal proportion of the A-C offered load. It stands to reason that one hop is better than two: calls that are carried directly on the A-C link consume resources on only one transmission link (and only two switches), so the direct connection is generally preferred. Therefore any incoming call attempt that can be served on the A-C link will be served on the A-C link. When capacity on the A-C link is saturated, incoming call attempts will overflow to the A-B link; whenever the latter is also saturated, calls will be blocked. Let us make a semiformal statement of the problem at hand by saying that:

• The objective is to minimize cost (or some "proxy" for cost).
• The constraint is that we must not exceed a target blocking probability. In landline networks, the target blocking probability is usually 1% in the busy hour. For wireless networks, 2% is typical. For the sake of definiteness we will assume 1% for the remainder of this section.
Now if A and C were truly the only end office switches in the network, it would always be cheaper to add capacity on the A-C link than on the A-B and B-C links. What we are really trying to do here is decompose a larger problem and work on a tractable subproblem: suppose that, for each end office switch X not shown in Figure 14.1, the capacities of the A-X and C-X links are held fixed. Moreover, overflow traffic from the A-X offered load is carried on the A-B link (and similarly for the C-X offered load and the C-B link). Recall that we are discussing fixed hierarchical routing in this section—end office switches never play a tandem role; all overflow traffic goes through switch B. With this reasoning in the back of our minds, it really does make sense to ask whether it is cheaper to satisfy the A-C load by adding capacity on the direct route, or instead by adding capacity on the tandem route. Suppose that, starting with a small number of A-C trunks, we decide to add trunks until the blocking probability satisfies the constraint. Up to a certain point (10% blocking, say), the number of trunks required does not seem unreasonable. But "grinding out" another order of magnitude (i.e., going from 0.1 to 0.01) proves to be quite expensive and we begin to lose our resolve. The trouble is that, at this point, the overflow process is a relative trickle compared to the original offered load. A-C offered load is the only traffic that can use
the A-C direct link; if we add enough capacity to reach the target blocking probability, we find that utilization on the direct link is poor. Why is it more attractive to add capacity to the A-B and B-C links? There, we are able to realize "economies of scale," so to speak, stemming from the fact that many overflow traffic streams are also carried on these links. Individually, the traffic intensities of the overflow streams may be quite small. When there are many end office switches, however, the traffic intensity of their superposition is "big enough to be worthwhile."

The foregoing discussion begs the question: where is the crossover point? That is, where does it cease to be cost-effective to add capacity along the direct route? Truitt conducted a detailed analysis of this problem [3]. In this context, the topology of Figure 14.1 is often called Truitt's triangle. Similar reasoning can be used to determine whether the A-C offered load is sufficient to warrant any dedicated A-C capacity at all. Truitt's work is an enormously influential foundation in telco traffic engineering: in his voluminous book [2], Ash states that variants of Truitt's ideas appear throughout.

14.1.2  Dynamic Nonhierarchical Routing, Metastable States, and Trunk Reservation
In a network with fixed hierarchical routing, calls may be blocked even when sufficient capacity is available. This is because the switches do not have the flexibility to exploit available transmission capacity unless that capacity happens to lie along routes specified in preplanned routing tables. Thus the motivation for dynamic routing is simple enough: if capacity is available, let us find it and use it, rather than blocking calls unnecessarily. If it is implemented in a simple-minded way, dynamic routing introduces a few new problems. In a broad sense, routing and therefore capacity utilization is less predictable than in the old fixed hierarchical approach. Blocking probabilities can actually go up when dynamic routing is introduced. As a plausibility argument for this point, we again use the simplistic network topology of Figure 14.1. However, let us:

• Drop the assumption that B is a tandem switch.
• Add an assumption that each of the three transmission links in the figure has a 100-trunk capacity.

In this network, there are two possible bearer paths for each pair of switches; one is direct and the other goes through the third switch. For example, a call connecting switches A and B can either go through C (we will call this the alternate route) or not (the primary route). Consider the following two network states:

• 300 calls are active: 100 on the A-B primary route, 100 on the B-C primary route, and the remaining 100 on the A-C primary route.
• 150 calls are active: 50 on the A-B alternate route, 50 on the B-C alternate route, and the remaining 50 on the A-C alternate route.
In both cases, our network’s transmission capacity is completely saturated. The first scenario is far preferable to the second; in it, the network is able to carry twice as many calls. The trouble with the second scenario is that calls borne on alternate routes consume more resources than calls on primary routes. This general problem is not unique to the oversimplified scenario at hand. A network state with a high percentage of alternate-routed calls is sometimes called a metastable state. When a network finds itself in a metastable state, it may take a substantial amount of time to get out of that state. To explain why this is the case, consider the following state in our example network: 50 calls are active on the B-C alternate route and another 50 calls are active on the A-B alternate route. Transmission capacity on the A-C link is saturated with alternate-routed calls. If an A-C call comes along, it will have to be borne along the alternate route. Although this example is simplistic, it illustrates the general metastable phenomenon: the presence of many alternate-routed calls increases the probability that newly arriving call requests must also be served on alternate routes. Note that the scenario described here would not occur if switch B was a local tandem, whereas switches A and C were end office switches. Connections between switches B and C would never pass through switch A, since switch A never plays a tandem role. Similarly, A-B connections would never pass through switch C. In fact, metastable states do not present a problem in networks with fixed hierarchical routing. In networks with dynamic routing, how can we avoid metastable states? Looking back at the previous example, some of the capacity on the A-C link should have been reserved for A-C calls. At the very least, the alternate-routed call that consumed the last A-C trunk should have been blocked instead. This is the notion of trunk reservation. 
This topic is much discussed in the literature on telephone networks; empirical evidence, simulation studies, and rigorous results in the mathematical literature on loss networks have firmly established the usefulness of trunk reservation and addressed the question "how many trunks should be reserved?" Examples of the rigorous mathematical variety include work of Kelly [4], Mitra and Seery [5], and Nguyen [6].
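As a rough illustration, a per-link admission rule with trunk reservation might look like the following sketch (the parameterization is our own; deployed implementations differ):

```python
def admit(idle_trunks: int, alternate_routed: bool, reserved: int) -> bool:
    """Admission decision for one transmission link under trunk reservation.

    A primary-routed call may seize any idle trunk. An alternate-routed call
    is admitted only if, after it seizes a trunk, at least `reserved` trunks
    remain idle for the link's own primary traffic. Blocking the
    alternate-routed calls that would consume the last few trunks is what
    keeps the network away from metastable states.
    """
    if idle_trunks <= 0:
        return False  # link saturated: every call is blocked
    if alternate_routed:
        return idle_trunks > reserved
    return True
```

In the earlier example, a rule like this applied to the A-C link would have blocked the alternate-routed call that consumed its last trunk, leaving room for direct A-C traffic.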
14.1.3  Optional Section: Traffic Intensity and the Erlang B Formula
Traffic engineering topics such as those introduced in Sections 14.1.1 and 14.1.2 have been studied extensively, resulting in rich mathematical modeling literature. In this section, we outline the derivation of the Erlang B formula—one of the essential building blocks in this subject area. We also mention adaptations to the basic formula and give references for further reading. The mathematical content presented here, which may not be to every reader’s taste, is not a prerequisite for any other part of this book. Suppose setup requests arrive at a rate of λ calls per unit time and that the average call duration is 1/µ units of time. Then the ratio λ/µ is called the traffic intensity. We say that the load on the system is λ/µ Erlangs. Traffic intensity, being the product of a rate and a time, is a dimensionless quantity. If we pretend that the system has infinite capacity (so that no calls are ever blocked), then λ/µ is the steady-state mean number of calls in progress at any given time.
Erlang computed the blocking probability for a system with the capacity for N simultaneous calls, given a traffic intensity of λ/µ. (All probabilities discussed here are steady-state probabilities.) Erlang assumed that the system was memoryless. (For readers unfamiliar with probability theory, this means the following: although probabilities of future events depend on the current state of the system, they are completely independent of past events—the path that the system took to get into this state is irrelevant. For readers well versed in probability, it means that call holding times are independent identically distributed exponential random variables, call interarrival times are also independent identically distributed exponential random variables, and moreover that the holding times and interarrival times are independent of one another.) Let p_k be the probability that there are exactly k calls in progress for k = 0, 1, ..., N (recall that N is the capacity of the system). Then for 1 ≤ k ≤ N we can relate p_{k−1} and p_k as follows: in steady state,

    transition rate (S_{k−1} → S_k) = transition rate (S_k → S_{k−1}),

where S_k is the state of having k active calls in the system. The left-hand side of the equation is λp_{k−1} regardless of the value of k (simply because calls arrive at rate λ). Regarding the right-hand side of the equation: the departure rate of calls is proportional to the number of active calls and is inversely proportional to the mean holding time 1/µ. Putting it all together, we have λp_{k−1} = kµp_k; this recursion, along with the fact that the probabilities must sum to 1, leads to the celebrated formula

    p_N = (ρ^N / N!) / (Σ_{k=0}^{N} ρ^k / k!)
where ρ = λ/µ. Since an incoming call is blocked if and only if the system is operating at full capacity, pN is in fact the blocking probability. How realistic were Erlang’s assumptions? For a large population of users who act independently of one another, a memoryless arrival process (this is also called a Poisson arrival process) is a reasonable model. The assumption that the departure process is also memoryless is less intuitive (at least to us). It turns out, however, that calculations based on this assumption are still reasonable in practice. A famous theorem in queuing theory says that Poisson Arrivals See Time Averages, suggesting that system dynamics may not be terribly sensitive to deviations in this regard. The Erlang B formula is not perfect, however. We have tacitly assumed that, even though there is variability in the arrival process, the arrival rate itself is constant. Certainly this is not true in practice. Hill and Neal [7] showed how an Erlang-based model could be adjusted for day-to-day variation in traffic load. Another important consideration is this: although the load that is offered to a direct link (e.g., the A-C link in the example of Section 14.1.1) may be characterized by a memoryless arrival process, the overflow traffic is definitely not memoryless. This and other topics in this section are afforded careful treatment in Wolff’s queuing theory book [8].
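The blocking probability is easy to compute. The sketch below (function names are ours) evaluates the Erlang B formula via the recurrence B(k) = ρB(k−1)/(k + ρB(k−1)), which is algebraically equivalent to the closed form but avoids large factorials, and then searches for the smallest trunk count that meets a target blocking probability:

```python
def erlang_b(rho: float, n: int) -> float:
    """Blocking probability p_N for offered load rho (in Erlangs) and n trunks.

    Uses the stable recurrence B(k) = rho*B(k-1) / (k + rho*B(k-1)) rather
    than evaluating rho**n / n! directly.
    """
    b = 1.0  # B(0): with zero trunks, every arriving call is blocked
    for k in range(1, n + 1):
        b = rho * b / (k + rho * b)
    return b

def trunks_needed(rho: float, target: float = 0.01) -> int:
    """Smallest number of trunks whose blocking probability meets the target."""
    n = 0
    while erlang_b(rho, n) > target:
        n += 1
    return n
```

For ρ = 1 Erlang and N = 2 trunks, the closed form gives (1/2)/(1 + 1 + 1/2) = 0.2, which the recurrence reproduces. One can also verify the economy-of-scale effect discussed in Section 14.1.1: at a 1% blocking target, the number of trunks needed per Erlang of offered load falls as the load grows.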
14.2
14.2
Comparison with IP Routing and Dimensioning
205
Comparison with IP Routing and Dimensioning It is already quite clear (by comparing the exposition of the previous section with Section 7.5’s discussion of IP routing) that IP routing and telco routing are quite different. Here we add the following comments: •
Suppose that an IP bearer path fluctuates during the life of a call. Then, at the very least, significant jitter is likely to result. Moreover, packets can arrive at their destination out of order. If this happens with any regularity, it wreaks havoc on real-time applications, which cannot be expected to deal with reordering. MPLS can be employed to provide path stability.
•	Stability is prized in the telco environment, and telco routing schemes reflect that. In fixed hierarchical routing, a modest number of routes is typically available for each originating-terminating pair of switches. Calls that exhaust their collection of candidate routes are blocked. As a result, a spike in call attempts for one switch pair has a limited effect on the rest of the network. Moreover, circuit switches implement so-called gapping to cope with intense bursts of activity. (For example, if many people rush to call a radio station in response to a promotion, the serving switch does not try to process every call request, thereby allowing it to react gracefully to overflow conditions. Many callers are unable to get through, of course, but the sudden spate of calls does not knock down the switch.) We have also seen that long-distance carriers employ trunk reservation to lend stability to dynamic routing schemes. While there are schemes to avoid hysteresis in the data networking realm, we believe it is fair to say that IP networks do not typically offer the stability guarantees of their circuit-switched counterparts. To the best of our knowledge, there is no widely deployed analog of trunk reservation in IP networking (although one could argue that, with the advent of MPLS, such a thing is warranted: in an MPLS domain, a VoIP bearer path is likely to remain constant throughout the life of a call).

•	There is no simple way to dimension independently for bearer and control traffic in IP networks because the two types of traffic are not systematically segregated. Note that there may still be a logical separation of bearer and control and some degree of physical separation (e.g., bearer traffic may not flow through SIP proxies). Contrast this with circuit-switched domains: SS7 signaling traffic and TDM bearer traffic are carried on different networks. Among other things, this means that the two networks can be dimensioned independently. The optimization problems faced in dimensioning circuit-switched domains are therefore fundamentally simpler than those faced in IP networks.

•	In IP networks, the same routing protocols are applied to signaling and bearer traffic (excepting the signaling that drives the routing protocols themselves). In circuit-switched networks, SS7 traffic is routed independently of bearer traffic. IP routing protocols are, in our opinion, far superior to SS7’s routing scheme.

•	The upshot of Section 14.1 is that telco transmission links are dimensioned according to mathematical formulae. There is a well-established methodology that tells traffic engineers when and where to add capacity. Telcos have also explored “time of day” routing schemes to take advantage of noncoincident busy hours. When chronic congestion occurs in IP networks, the typical solution is to “throw bandwidth at the problem.” That is, network engineers typically add transmission capacity in a relatively ad hoc manner to relieve routing hot spots; they may also adjust load-balancing parameters. But they have far less formal methodology to rely on than their telco counterparts.
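The dimensioning formulae alluded to above are exemplified by the classical Erlang B blocking formula. As a sketch (the function names are ours), the standard recursion can be used to find the number of trunks needed to meet a blocking objective:

```python
def erlang_b(offered_load, trunks):
    """Blocking probability for `offered_load` erlangs carried on
    `trunks` circuits, via the numerically stable Erlang B recursion."""
    b = 1.0  # blocking probability with zero trunks
    for n in range(1, trunks + 1):
        b = offered_load * b / (n + offered_load * b)
    return b

def trunks_required(offered_load, blocking_objective):
    """Smallest trunk count whose Erlang B blocking meets the objective."""
    n = 0
    while erlang_b(offered_load, n) > blocking_objective:
        n += 1
    return n
```

For example, carrying 2 erlangs of traffic at a 1% blocking objective requires 7 trunks under this formula.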
14.3 Security

We have seen that SS7 networks interconnect voice switches with one another, and with service control points, using signaling transfer points. SS7 networks essentially connect to nothing else (except for low-level hardware such as digital cross-connect systems). In particular, SS7 signaling does not extend to end users. The result is a very secure environment: How can you hack a network that you cannot touch? We say that SS7 networks have physical security. This is a big part of the reason why one never hears of a virus attack on the PSTN. Some telephone traffic is already carried over the public Internet; no doubt this will continue. However, the public Internet is much less secure than the traditional telco environment. For this and other reasons, we believe that carriers’ packet voice offerings will be realized, to a significant degree, over private data networks.
14.4 Quality of Service

QoS is another reason to deploy private data networks in support of packet telephony. The public Internet follows a best-effort service model in which there is no admission control. Admission control is essential for any network that provides QoS guarantees: incoming call requests must be refused whenever the network determines that it does not have enough available resources to meet its QoS objectives. We have seen that wireline telcos engineer their networks to meet blocking probability objectives. Whenever a call setup is successful, resources are reserved along an end-to-end bearer path, so QoS is not a problem. Wireless networks also engineer to blocking probability objectives. A wireless telco that is meeting its blocking probability objectives may still have to upgrade its cell sites if customers experience a high rate of dropped calls. But by and large, the arrangement is similar to that of wireline networks: if a call attempt is successful, reserved bandwidth is available (end to end) throughout the life of the call. Today’s corporate customers demand that their service providers enter into formal service-level agreements. Initially, service-level agreements primarily covered data services, but they are starting to incorporate performance requirements for packet voice. This in turn means that service providers must have a way to guarantee that sufficient resources are available to maintain “toll quality” voice service.
14.5 Scalability

Telephony is, of course, deployed on a very large scale. Although we might argue that people with telephone service still outnumber those with Internet service, the gap has certainly narrowed in recent years; the public Internet is no longer orders of magnitude smaller than the world’s telephone infrastructure. In circuit-switched and packet-switched networking, there is (justifiably) a big emphasis on technologies that are adaptable to very large subscriber populations.
14.6 Survivability and Reliability

For our purposes, a reliable network is one that functions normally the vast majority of the time. A survivable network is one that can continue to function, albeit at reduced capacity, even if disastrous things happen to isolated components within the network. Today’s telephone networks are extremely reliable. Typically, PSTN/PLMN infrastructure is designed for “five 9’s” availability; this means that the network is able to complete calls, as in normal operation, 99.999% of the time. Core network equipment designs incorporate redundancy throughout: dual power supplies, backup storage, backup processors, “working” and “protect” switching fabrics, working/protect port pairs on line cards, and so on. The study of reliability involves measures such as mean time between failures (MTBF) and mean time to repair (MTTR). Reliability is a branch of applied probability and, as such, has a robust mathematical framework associated with it. The interested reader can consult a textbook on reliability; for those new to the subject, the book by Billinton and Allan [9] is a good starting point. More rigorous mathematical treatments can be found in [10–12]. To implement a survivable network, it is important to avoid single points of failure in critical systems. We saw an example of this in Chapter 8 when we talked about mated pairs of STPs. Fiber-optic deployments commonly feature diverse “working” and “protect” paths. If, for example, a backhoe ruptures a fiber-optic link, the outage is quickly detected and the system switches to the “protect” path. For many data networking equipment manufacturers, the enterprise market is key. The cost/benefit “equation” for enterprise networks traditionally has not favored telco-grade reliability. However, redundant fiber-optic links are used for wide area network transport of data traffic as well as voice traffic.
Moreover, data networking has become increasingly critical to corporations, so outages are very disruptive when they do occur. Reliability and survivability requirements are therefore becoming more important, especially for core routers, Ethernet technologies for metropolitan area networks, and so on.
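To make the availability arithmetic concrete, here is a minimal sketch (the function names are ours) relating MTBF, MTTR, downtime, and redundancy:

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a repairable component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def annual_downtime_minutes(avail):
    """Expected downtime per (365-day) year at a given availability."""
    return (1.0 - avail) * 365 * 24 * 60

def parallel_availability(avail, n=2):
    """Availability of n redundant components, assuming independent
    failures -- the rationale behind working/protect pairs."""
    return 1.0 - (1.0 - avail) ** n
```

“Five 9’s” corresponds to roughly 5.3 minutes of downtime per year; under the (idealized) independence assumption, a working/protect pair of components that are each 99.9% available yields 99.9999%.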
14.7 Billing Functionality

The telco environment requires complex billing systems: voice switches generate charging data records on a per-call basis. Why is this necessary? Such records are vital to the settlement process in which long-distance carriers pay access charges to local exchange carriers (at least in the United States). Wireless carriers pay roaming charges to one another; they also pay for the use of long-distance networks. Again, carriers “settle up” based on analysis of per-call records. Traditionally, data networks have not featured such complex billing infrastructures (nor have cable companies, for that matter). Why are flat-rate cable television and Internet services so common? Surely market drivers are one reason, but they are not the only reason: it is far simpler and cheaper to implement a flat-rate billing system than it is to create and manage detailed usage logs.
14.8 Emergency Service and Other Government Mandates

Telcos are regulated; at this point, it is not yet clear whether packet telephony will be regulated to a similar degree. In the United States, regulatory mandates include:

•	Emergency services. This includes technology that allows emergency personnel to locate wireless callers. The US government also contracts with telcos to provide special priority call handling for government officials in the event of a national emergency.

•	Lawful interception. Telcos are required to provide surveillance access to law enforcement personnel.
References

[1] Girard, A., Routing and Dimensioning in Circuit-Switched Networks, Reading, MA: Addison-Wesley, 1990.
[2] Ash, G. R., Dynamic Routing in Telecommunications Networks, New York: McGraw-Hill, 1997.
[3] Truitt, C. J., “Traffic Engineering Techniques for Determining Trunk Requirements in Alternate Routed Networks,” Bell System Technical Journal, Vol. 33, No. 2, March 1954.
[4] Kelly, F. P., “Routing and Capacity Allocation in Networks with Trunk Reservation,” Mathematics of Operations Research, Vol. 15, November 1990, pp. 771–793.
[5] Mitra, D., and J. B. Seery, “Comparative Evaluations of Randomized and Dynamic Routing Strategies for Circuit-Switched Networks,” IEEE Transactions on Communications, January 1991, pp. 102–116.
[6] Nguyen, V., “On the Optimality of Trunk Reservation in Overflow Processes,” Probability in the Engineering and Informational Sciences, Vol. 5, 1991, pp. 369–390.
[7] Hill, D. W., and S. R. Neal, “The Traffic Capacity of a Probability-Engineered Trunk Group,” Bell System Technical Journal, Vol. 55, No. 7, 1976.
[8] Wolff, R. W., Stochastic Modeling and the Theory of Queues, Englewood Cliffs, NJ: Prentice-Hall, 1989.
[9] Billinton, R., and R. N. Allan, Reliability Evaluation of Engineering Systems: Concepts and Techniques, 2nd ed., New York: Plenum Press, 1992.
[10] Ascher, H., and H. Feingold, Repairable Systems Reliability: Modeling, Inference, Misconceptions, and Their Causes, New York: Marcel Dekker, 1984.
[11] Barlow, R. E., Engineering Reliability, Philadelphia: Society for Industrial and Applied Mathematics, 1998, ASA–SIAM Series on Statistics and Applied Probability 2.
[12] Lawless, J. F., Statistical Models and Methods for Lifetime Data, 2nd ed., New York: John Wiley and Sons, 2002.
CHAPTER 15
Evolving Toward Carrier-Grade Packet Voice: Recent and Ongoing Developments

The telecommunications industry does appear poised to move toward packet telephony on a large scale. Historically, the IETF has been the standards body for all things IP. But pieces of the carrier-grade puzzle have been missing from the “IETF toolbox.” Despite this fact, Voice over IP has significant momentum in the industry. (For thoughts on this trend, see Section A.3.7 in the appendix.) The missing pieces of the puzzle have received a great deal of attention over the last several years. Activity in the standards arena has not been limited to the IETF. In this chapter, we take stock of developments in the following areas:

•	QoS and traffic engineering;
•	Billing;
•	Interworking with circuit-switched domains.

We also discuss IPv6, header compression, and middlebox traversal.
15.1 QoS and Traffic Engineering in IP Networks

QoS support in IP networks entails many things; there is no simple taxonomy of requirements. Class-based queuing is a reasonable starting point because it is necessary at every IP router along the data path. After covering class-based queuing, we branch out into other requirements associated with routing and signaling. Next we look at techniques for verifying and enforcing traffic contracts. We then discuss standardized QoS performance measures. Finally, we look at interworking between networks that adhere to different performance standards.

15.1.1 Class-Based Queuing
To support QoS, IP routers must implement some sort of class-based queuing. This means that:

•	Incoming packets must be classified. Potential classification criteria include source addresses, destination addresses, port numbers, protocol identifiers, and application layer information.

•	Routers must subdivide their buffer capacity between the traffic classes (i.e., maintain separate input and/or output queues).

•	Routers must employ queue scheduling schemes that meet the requirements of the traffic classes.
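The classification step can be sketched as an ordered rule table keyed on the criteria listed above; the rules and port ranges below are illustrative assumptions, not drawn from any standard configuration:

```python
# Ordered (predicate, class) rules; first match wins. The specific
# protocols and port ranges here are illustrative only.
RULES = [
    (lambda p: p["proto"] == "udp" and 16384 <= p["dst_port"] < 32768, "voice"),
    (lambda p: p["proto"] == "tcp" and p["dst_port"] == 443, "interactive"),
]

def classify(packet):
    """Map a packet (here, a dict of header fields) to a traffic class."""
    for predicate, traffic_class in RULES:
        if predicate(packet):
            return traffic_class
    return "best-effort"  # default class for unmatched traffic
```

In a real router this lookup is performed in hardware on the five-tuple (and sometimes deeper header fields), but the logic is the same.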
Common families of scheduling algorithms include the following:

•	Round-robin. The name is self-explanatory.

•	Priority queuing. If the traffic type for queue A has higher priority than that of queue B and queue A is nonempty, then queue A is served. This affords very low latency for the highest-priority queue(s), but traffic types that are “further down the food chain” suffer starvation during periods of resource contention.

•	Weighted fair queuing (WFQ). Each queue has a preassigned proportion of available output bandwidth. The idea of weighted fair queuing is this: over any time period, each queue is served according to its assigned proportion. Of course, this is only possible in a fluid flow model (think of very short time intervals); packets are certainly not infinitely divisible. So approximations are necessary; because of ATM’s fixed cell length, good WFQ approximations are easier to come by in ATM switches than in IP routers.
How fair can you be if there is extreme variability in packet length? Note that fairness is reckoned in terms of throughput (e.g., megabits) rather than in number of packets. Note also that it is easier to approximate fairness on high-speed links (which are typical in large network cores) than on low-speed links: it takes very little time for a router to “clock out” a packet on a high-speed link, even if that packet is large. Not every scheme fits cleanly into this taxonomy. In the literature, one can find any number of articles proposing queue scheduling algorithms; many authors also define measures of fairness. Examples include [1–8]. In selecting a scheduling algorithm, each router manufacturer must analyze the trade-off between desirable fairness properties and computational complexity.

15.1.2 DiffServ and IntServ Revisited
We described the two main IP QoS approaches, DiffServ and IntServ, in Section 7.7. Both require class-based queuing, but the granularities involved are very different. Recall that DiffServ routers implement PHBs; PHBs are selected based on the DSCPs in the IP headers. Routers maintain separate ingress and/or egress queues for each DSCP. Depending on the sophistication of the scheduling algorithm, the implementation challenges may still be significant. Note, however, that the number of queues that have to be maintained remains constant even as the number of concurrent data sessions grows. This is an attractive feature, but we point out that DiffServ cannot offer “hard” QoS guarantees (at least not by itself). Since packets are marked at ingress to a DiffServ domain, the task of classification at each intermediate router boils down to simply checking DSCPs.

To achieve quantifiable and reliable QoS for delay-sensitive applications, admission control and resource allocation are necessary. Recall that IntServ is based on explicit resource reservation. Initially, IntServ’s intended use was per-session resource allocation, with RSVP [9] as the signaling vehicle. In this scenario, the classification task performed at domain ingress must be repeated at each hop. Due to scalability concerns, however, the IntServ model is rarely implemented end-to-end on a per-session basis. We remark here that it is possible to conduct per-session resource allocation on a large scale: the world’s PSTN/PLMN infrastructure is living proof. However, we do take the point that it is counterproductive to attempt this approach on, say, 2.4-Gbit/s links between core IP routers. Routers interconnected by such a link would have to maintain queues for an enormous number of sessions (implying that state variables would not only have to be maintained, but that instantiation and destruction of these state variables would have to be managed by signaling). RSVP was later extended for use with MPLS [10], which offers a way to “carve up” buffer and bandwidth resources in an IP domain. A second approach, Constraint-Based Routed Label Distribution Protocol (CR-LDP) [11], also came out of the MPLS Working Group. (MPLS was introduced in Section 7.7.3; see that section for additional references.)

Deployment Options
For “hard” QoS, IntServ can be employed at the edge, with MPLS in the core. MPLS LSPs would not normally be allocated to serve individual sessions. Instead, each LSP would aggregate many session flows. The ability to merge traffic streams as they travel from ingress to core is a crucial capability in MPLS. In situations where “hard” QoS guarantees are not required, DiffServ at the edge/MPLS in the core is an effective combination. The choice between IntServ/MPLS and DiffServ/MPLS combinations may depend on access-network bandwidth. For example, the IntServ model appears in the 3GPP specifications. This is not surprising: access-network bandwidth is a scarce resource for wireless carriers, since it consumes licensed spectrum. DiffServ may be a more common option in LAN environments. For softswitch deployments, MPLS is an attractive option for connectivity between Media Gateways. IP and/or Label Switched Routers within softswitch fabrics may be dedicated to their softswitch roles in many carrier-grade deployments, at least in the early years. That is, service providers may not rush to carry large volumes of TCP/IP data traffic on the selfsame routers; traffic engineering is significantly easier in the absence of bursty data networking applications. DiffServ and possibly IntServ will be used to manage bandwidth between Media Gateways and IP phones.

Constraint-Based Routing
In QoS-enabled IP routers, not all traffic is treated the same; this is the idea behind class-based queuing. In QoS-enabled IP networks, it might make sense to route traffic according to its QoS requirements. More generally, the notion of constraint-based routing is that routing constraints might vary with the type of traffic. Rather than basing routing decisions purely on standard link metrics, additional criteria might be taken into account. QoS routing is a special case of constraint-based routing; policy-based routing is another special case.
Although it is certainly not a new idea, it is unclear whether there will be a widely deployed consensus approach to constraint-based routing. Constraint-based routing has made its way into the standards, at least: see CR-LDP [11] or OSPF’s traffic engineering extensions [12], for instance. Note also that RSVP-Traffic Engineering (RSVP-TE) [10] defines an EXPLICIT_ROUTE object.

15.1.3 Verifying and Enforcing Traffic Contracts
Corporate customers increasingly want to multiplex voice, video, and traditional bursty data traffic on their access links (i.e., the links that connect to WAN service providers). These customers, in order to make sure they are getting their money’s worth, want service providers to quantify network performance. By the same token, service providers need to know how much traffic their customers are actually injecting into their networks. So metering and performance reporting capabilities are required by both sides. Moreover, service providers may want to police and/or shape incoming traffic to make their networks less prone to congestion. Policing means discarding traffic when its rate exceeds the contracted rate. (A customer may have a very fast access link to a service provider’s network but only have the contractual right to run that link at high utilization for short periods of time.) Discards can take place at network ingress. There is also a less-severe variant in which “excess” traffic is marked for potential discard. If an intermediate network element experiences congestion, it knows which packets to discard first. Traffic shaping refers to techniques that groom ingress traffic. One well-known approach to shaping is the so-called token bucket. Tokens stream into the bucket at a specified rate. A packet that arrives at network ingress is only allowed to pass if and when enough tokens are available to “escort” the packet (at which point the tokens are destroyed; the number of tokens required is usually prorated according to packet size). Tokens that overflow the bucket are discarded. The parameters for a token bucket are its size (i.e., how many tokens can it hold) and the token rate. As long as tokens are available, a token bucket allows traffic to enter the network at wire speed. Often it is desirable to remove some of the burstiness from a traffic source. One can employ a leaky bucket for this purpose. 
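The token bucket just described can be sketched as follows (a minimal illustration; the class and parameter names are ours):

```python
class TokenBucket:
    """Token bucket policer: tokens accrue at `rate` units per second up
    to `capacity`; a packet passes only if enough tokens are available."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # bucket starts full
        self.last = 0.0

    def allow(self, packet_size, now):
        # Refill tokens for the elapsed interval; overflow is discarded.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_size <= self.tokens:
            self.tokens -= packet_size  # "escorting" tokens are destroyed
            return True
        return False
```

Note that a full bucket lets a burst enter at wire speed, which is exactly the behavior described above; the bucket size and token rate are the two contract parameters.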
A leaky bucket admits packets into the network at a fixed bit rate. When packets arrive at the bucket faster than the prescribed rate, they wait in the bucket, provided that there is room; otherwise, they are discarded. Buckets of one or both types are often used in series.

15.1.4 ITU-T and 3GPP QoS Standards
We have talked about techniques for providing QoS. Using these techniques, what kind of performance can one expect? On one hand, we could look at the performance that is possible with today’s “QoS aware” routers. Or we could take a requirements approach instead: What performance targets must be met for a given application to function properly? We hope the two approaches lead to a common ground of performance measures that are achievable and also consistent with a positive user experience.
Mindful of this philosophy, ITU-T Study Group 13 has authored recommendations on IP QoS. Recommendation Y.1540 [13] defines the following performance parameters: IP packet transfer delay (IPTD), delay variation (IPDV), loss ratio (IPLR), error ratio (IPER), and spurious IP packet rate (SIPR). A companion recommendation, Y.1541 [14], specifies end-to-end target values for five QoS classes. A sixth QoS class is for “best-effort” service: no quantified performance targets are assigned to this class. The values for IPTD and IPDV appear in Table 15.1. For each of the classes 0–4, the upper bound for IPLR is 10^-3 and the upper bound for IPER is 10^-4. SIPR is not quantified for any of the traffic classes. Class 5 is the best-effort class previously mentioned. How can a service provider meet the specified performance targets? Recommendation Y.1541 does not place any restrictions in this regard. However, it does give examples of node and network mechanisms that could be used to deliver the specified QoS:
•	For classes 0 and 1 (which are the only classes that place limits on delay variation), queues with preferential servicing, coupled with traffic grooming, may be necessary. Note that these mechanisms are implemented in the nodes. Interactive VoIP and video conferencing applications are sensitive to jitter as well as delay, so they require this sort of QoS to function properly. Although class 1 is less than ideal here, the idea is that such services should still be usable in less-favorable delay conditions so long as jitter is minimal.

•	For classes 0 and 2 (which are the classes with the most stringent delay bounds), routing is constrained: large distances and/or high hop counts make it difficult if not impossible to meet the delay requirements. Thus, low latency requires network-level mechanisms; contrast this with the node-based queuing techniques mentioned in the previous bullet. For high-volume transactional applications, latency is likely to be important (since low-latency performance in the signaling plane helps keep the number of in-progress transactions from getting out of hand), but delay variation does not matter so much. Class 2 is aimed at supporting such applications.

•	QoS class 3 may be sufficient to support transactional applications that are slightly less demanding.
Table 15.1 ITU-T Performance Objectives

QoS Class   Delay Objective:   Delay Variation Objective: 10^-3 quantile
            mean IPTD ≤        of {IPDV − min(IPDV)} ≤
0           100 ms             50 ms
1           400 ms             50 ms
2           100 ms             Unspecified
3           400 ms             Unspecified
4           1 sec              Unspecified
5           Unspecified        Unspecified
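The objectives in Table 15.1 can be encoded and queried mechanically. The sketch below is our own construction (the names are illustrative): it returns the Y.1541 classes whose objectives satisfy a given application’s needs.

```python
# Y.1541 end-to-end objectives from Table 15.1 (None = unspecified).
Y1541 = {
    0: {"mean_iptd_s": 0.100, "ipdv_s": 0.050},
    1: {"mean_iptd_s": 0.400, "ipdv_s": 0.050},
    2: {"mean_iptd_s": 0.100, "ipdv_s": None},
    3: {"mean_iptd_s": 0.400, "ipdv_s": None},
    4: {"mean_iptd_s": 1.000, "ipdv_s": None},
    5: {"mean_iptd_s": None, "ipdv_s": None},  # best effort
}

def candidate_classes(max_mean_delay_s, need_low_jitter):
    """Return the Y.1541 classes whose objectives meet the application's
    mean-delay budget and (if required) bounded delay variation."""
    out = []
    for cls, obj in Y1541.items():
        if obj["mean_iptd_s"] is None or obj["mean_iptd_s"] > max_mean_delay_s:
            continue  # delay objective unspecified or too loose
        if need_low_jitter and obj["ipdv_s"] is None:
            continue  # jitter objective unspecified
        out.append(cls)
    return out
```

For instance, an interactive VoIP application with a 100-ms budget and jitter sensitivity is left with class 0 only, which matches the discussion in the bullets above.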
•	QoS class 4 is intended to support applications, such as video streaming, that require a certain level of reliability (i.e., low loss and error ratios) but are not particularly sensitive to delay.
Seitz’s overview article [15] does a nice job of summarizing the content of Y.1540 and Y.1541, and discusses ITU-T’s approach to formulating these recommendations. ITU-T Recommendation Y.1221 [16] is more explicit than Y.1541 regarding the means of meeting performance objectives. The Y.1221 specification defines three transfer capabilities:

•	Dedicated bandwidth transfer capability, which can be associated with “specified loss commitments” and “specified delay commitments.” This seems to be a repackaging of IETF RFCs 2212 [17] and 2598 [18]. (The actual wording is that the dedicated bandwidth transfer capability “strives for compatibility” with those RFCs. Note that RFC 2598 is an outdated specification of the expedited forwarding per-hop behavior; it has been obsoleted by RFC 3246 [19].)

•	Statistical bandwidth transfer capability, which can be associated with “specified loss commitments.” This seems to be a repackaging of IETF RFCs 2211 [20] and 2597 [21].

•	Best-effort transfer capability.
3GPP’s QoS architecture document [22] defines four traffic classes: conversational, streaming, interactive, and background. There are significant differences between the 3GPP and ITU-T approaches. First of all, the set of parameters is different. (We do not go into detail.) Second, although specification [22] does mention delay variation, it does not prescribe target values. Moreover, the delay specification is given as a maximum value rather than as a mean. The values are 100 ms for the conversational class and 280 ms for the streaming class; for the interactive and background classes, no values are specified. Performance over the RF interface varies greatly depending on conditions. It is therefore much harder to make performance guarantees than in a wireline environment: quantifying delay variation might have been a case of “going out on a limb.” So it is not surprising that 3GPP was loath to prescribe target values for delay variation. Even in good conditions, the bandwidth of the RF interface is more limited than that of its broadband landline counterparts. For this reason, the 3GPP specifications place upper limits on session bandwidth, whereas the ITU-T specifications do not. At this point in time, it is problematic to determine appropriate QoS classes and performance targets for hybrid wireline-wireless sessions. In addition to the delay variation issue noted previously, recall that delay itself is specified differently (as a mean according to ITU-T but as a maximum in 3GPP). When a connection crosses two networks, the delay “budget” must be apportioned between the two networks in some way. This is easier to do with a mean than with a maximum, since mean delays simply add across concatenated networks. At the time of this writing, efforts to standardize mappings between ITU-T and 3GPP QoS classes had not gotten off the ground. Before moving on, we point out the following curious difference between the ITU-T and 3GPP specifications: the ITU-T delay bound for streaming applications
(mean value no greater than 1 second) is much less stringent than that of 3GPP (maximum value of 280 ms). Streaming is, by definition, half duplex. As a result, users are much less aware of delay than in conversational services. So why does 3GPP specify such a tight delay bound? We believe that there is an implicit difference in intended applications: wireless carriers are interested in building interactive services (somewhat similar to PoC, say; see Section 13.7.1) that rely on video streaming.
15.2 Service-Level Agreements and Policy Control

Using the tools and techniques discussed in Sections 15.1.1, 15.1.2, and 15.1.3, service providers can safely enter into meaningful service-level agreements. That is, they can enter into contracts that obligate them to meet quantifiable performance measures. What are the economics of the situation? It is probably unrealistic to assume that service providers will upgrade the QoS-delivery capabilities of their networks unless they have a way to charge customers for higher grades of service (higher than that afforded by the traditional best-effort model, that is). Otherwise, there is no motivation to bear the cost of such upgrades. Once they have entered into service-level agreements, service providers must “deliver the goods.” So service providers need efficient ways to implement appropriate policies for access to (and use of) network resources. IETF’s Resource Allocation Protocol (rap) working group has produced a body of work that addresses this issue. The baseline document [23] specifies the Common Open Policy Server (COPS) protocol. COPS is a multipurpose framework. Its use in the QoS context, using RSVP as the vehicle for resource reservation, is described in [24]. The necessary RSVP extensions are defined in [25]. This covers connection admission control in the “traditional” sense: requests are admitted if and only if adequate network resources are available. Policy-based admission control is a more general topic. Among other things, it offers the flexibility to provide higher grades of service to customers willing to pay premium prices. For example, explicit resource reservation might be available only to a provider’s most-valued corporate customers. Or, in terms of the ITU-T QoS standards of the previous section, classes 0 and 2 might be unavailable to all other customers. Corporations have long interconnected their campuses via leased lines, which provide dedicated bandwidth.
In this setting, there is no need for service providers to provide per-session policy control. Whenever employees are present at their corporate campuses, they have access to the leased transmission capacity (subject to the policies implemented by their own information technology departments). The leased-line business has been a huge “cash cow” for telcos. In this light, we offer the following motivation for policy-based admission control:

•	Corporations want to reduce cost by paying only for what they use rather than paying for dedicated, 24 × 7 capacity. However, they still want performance guarantees.

•	“Road warriors” are increasingly dependent on mobile access to corporate systems.
Policy-based admission control is covered in [26]. For more information, the interested reader can visit the rap working group’s home page.
15.3 SDP and SDPng

SDP [27] was originally designed to support announcement of multicast sessions. Like the original version of SIP (RFC 2543, now obsolete), SDP was produced by IETF’s mmusic working group. SDP’s developers needed a way for the originator of a multicast session to communicate session parameters (e.g., codec information) to a population of potential participants. In the intended use, the session announcement would be a one-way affair: a potential participant could only become an actual participant if it supported the offered session parameters. Thus, the idea of employing SDP in the course of session parameter negotiation came after the fact. In our discussions of Megaco, MGCP, and SIP, we have seen that SDP can in fact be used in this way. However, SDP itself does not explicitly distinguish between potential session configurations and actual session configurations. The mmusic working group is developing a long-term replacement protocol that overcomes the limitations of SDP; the new protocol is informally called SDP next generation, or SDPng. SDPng distinguishes between potential and actual configurations. The specification sets forth session parameter negotiation procedures. Unlike SDP descriptions, SDPng descriptions are XML documents. For details on SDPng, see the latest version of the document titled “Session Description and Capability Negotiation.” At the time of this writing, that document was an Internet draft in its seventh revision. The mmusic group is also working on an updated version of the SDP specification; this was also an Internet draft at the time of this writing. The new version of SDP will address the most pressing problems with SDP as currently specified. These problems are:

•	SDP’s incompatibility with NATs and firewalls;

•	The need to use SDP with TCP and/or SCTP.
SDPng descriptions tend to be larger than their SDP counterparts, so the latter may be more attractive to wireless carriers. There is also the matter of SDP’s large installed base.
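To make the discussion concrete, the sketch below builds and parses a minimal SDP description. The description itself is invented for illustration (the addresses, session name, and codec choice are our own), and real descriptions typically carry more fields:

```python
# Illustrative sketch: parsing a minimal SDP description to recover the
# offered media parameters. The description below is a hypothetical
# example, not taken from any particular session.
MINIMAL_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=Example session
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

def parse_sdp(text):
    """Return a dict of SDP fields; 'm' and 'a' lines may repeat."""
    session = {"m": [], "a": []}
    for line in text.strip().splitlines():
        key, _, value = line.partition("=")
        if key in ("m", "a"):
            session[key].append(value)
        else:
            session[key] = value
    return session

session = parse_sdp(MINIMAL_SDP)
media_type, port, transport, payload_type = session["m"][0].split()
print(media_type, port, payload_type)   # audio 49170 0
print(session["a"][0])                  # rtpmap:0 PCMU/8000
```

Note that nothing in the description marks the m= line's payload types as "potential" rather than "actual"; that distinction, explicit in SDPng, must be supplied by the offer/answer conventions of the signaling protocol carrying the description.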
15.4 Sigtran Adaptation Layers

SCTP [28] is seeing increasingly widespread use for SS7 traffic. Although the SCTP standard is stable, there is still some flux regarding sigtran adaptation layers. In this section, we briefly describe the landscape.
With SCTP, it is possible to transport SS7 traffic over an IP bearer rather than a traditional TDM bearer. (It is possible, but not ideal, to transport SS7 traffic over TCP.) We covered SCTP, which was produced by IETF's sigtran working group, in Section 7.8.2. For the following discussion, it may be helpful to refer to Figure 8.3.

SCTP provides a robust option for carrying higher layer protocols (such as ISUP, SCCP, TCAP, MAP, and ANSI-41) across an IP network. What goes between SCTP and the higher layer protocol entity? For compatibility reasons, the fact that SCTP is now present should be transparent to that entity. The required transparency is achieved by a so-called adaptation layer; the answer to the above question depends on where one chooses to put the adaptation layer. Options include:

• M2UA [29]. As the name suggests, M2UA sits between SCTP and MTP3. In a softswitch, the signaling gateway (SG) may terminate MTP2 on attached TDM SS7 links, but pass MTP3 (and all of the layers above it) through to the media gateway controller. On the SG-MGC link, we would have MTP3 over M2UA, which in turn runs over SCTP.
• M3UA [30], which sits between SCTP and higher layer protocols such as SCCP and ISUP. This is another option for backhaul of SCCP and/or ISUP messages between signaling gateways and media gateway controllers. In the ISUP case, we would have ISUP over M3UA over SCTP.
• SCCP user adaptation layer (SUA). Intended to carry higher layer protocols (such as TCAP) that have relied on SCCP in the past, SUA would take the place of SCCP; the protocol stack would be TCAP over SUA over SCTP. In the softswitch case, SUA might be used between a signaling gateway (which terminates SCCP on its adjacent TDM SS7 links) and an SCP.

Unlike M2UA and M3UA, SUA was still an Internet draft at the time of this writing. The document has progressed through a series of revisions; check under the sigtran working group for the latest version. In the sigtran working group, several other adaptation layers have been specified (or specifications are in preparation).
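The stacking options above can be summarized in a short sketch. The mapping is drawn directly from the discussion; the function and names are our own:

```python
# Sketch of the sigtran protocol stacks discussed above: which adaptation
# layer is used depends on which SS7 layer is passed through to the IP side.
ADAPTATION = {
    "MTP3": "M2UA",   # SG terminates MTP2; MTP3 and above are backhauled
    "ISUP": "M3UA",   # SG terminates MTP3; ISUP rides over M3UA
    "SCCP": "M3UA",
    "TCAP": "SUA",    # SUA takes the place of SCCP entirely
}

def stack(upper_layer):
    """Return the on-the-wire protocol stack, top to bottom."""
    return [upper_layer, ADAPTATION[upper_layer], "SCTP", "IP"]

print(stack("ISUP"))   # ['ISUP', 'M3UA', 'SCTP', 'IP']
print(stack("TCAP"))   # ['TCAP', 'SUA', 'SCTP', 'IP']
```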
15.5 Middlebox Traversal

NATs and firewalls are ubiquitous in today's IP networks. Let us discuss the former in connection with SIP. Suppose we have an application running on an end system that sits behind a NAT; the application is trying to set up a session with an entity in another domain. Recall that the source IP address and port number that are known to the application are not exposed to the "outside world": the NAT function exposes a different (IP address, port number) pair and keeps track of the binding between the two. The NAT function does not make analogous changes to address/port identifiers that appear in SIP headers, however. This function is relegated to a so-called application-level gateway (ALG). Often, the NAT and ALG functions reside on the same network element. When new applications come along, however, new functionality must be added to the ALG. This slows deployment of new applications. Another problem with this
arrangement is that additional processing load is placed on the ALG, leading to scalability issues.

The philosophy behind the STUN protocol [31] is that it would be better for the end-system application to learn the NAT bindings. (STUN stands for simple traversal of UDP through NATs.) Then it could place the externally visible address/port pairs in the SIP headers (and SDP payloads, for that matter). However, one problem is that the end-system application may not know whether NAT(s) are present or, if so, how many NATs are present. The idea of STUN is to locate an entity (known as a STUN server) in the public Internet that can respond to binding requests. The application sends binding requests to the STUN server, which sees the externally visible address/port pair since it is on the other side of the NAT(s). The STUN server, upon receiving a binding request, incorporates the desired information in the payload of its binding response. Note that the NAT function does not alter the payload.

As the authors of the specification admit, STUN has serious limitations. There are a variety of NAT types, and STUN does not work with all of them. It does not enable incoming TCP connections through NATs, nor does it work correctly when the end systems that are trying to communicate reside behind the same NAT.

STUN was developed in IETF's Middlebox Communication (midcom) working group. Middlebox is an umbrella term intended to encompass network intermediaries such as the following:

• Firewalls, which perform policy-based packet filtering;
• Network address translators;
• Intrusion detectors;
• Load balancers;
• Security gateways.
Realizing that STUN is an incomplete, stop-gap solution, the midcom group is developing a MIDCOM protocol. The midcom architecture RFC [32] talks about moving “application intelligence’’ from middleboxes to external MIDCOM agents. Although generality is a design goal of the MIDCOM effort, NATs and firewalls are the main focus of the current work. In the case of SIP interworking with NATs, the approach seems to boil down to something quite tangible: namely, colocating a MIDCOM agent with the SIP proxy server. The problem space that the midcom group seeks to attack is a big one, and the solution space is immature, to say the least. Part of the trouble is that there are many variants of the basic NAT, making it difficult or impossible to deal with NATs in a uniform way. NATs play an important role in conserving IP address space, and they are part of the de facto privacy/security infrastructure in today’s IP networks. In some ways, the problems associated with NATs are hateful. As a simple example, consider the following: RFC 3550 [33] says that, for RTP and RTCP streams associated with the same session, the RTCP port number should equal the RTP port number plus 1. But NATs do not necessarily preserve this port number adjacency. As a result, RFC 3605 [34] specifies a way to include RTCP port numbers in SDP descriptions. Examples like this one serve to strengthen our impression that it will be very difficult to establish a sensible framework that is adequately encompassing.
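The port-adjacency problem above is easy to demonstrate with a toy model. The NAT below allocates external ports sequentially; this policy is invented for illustration (real NAT behavior varies, which is precisely the difficulty), but it shows how an unrelated flow can destroy the RTP/RTCP adjacency that RFC 3550 assumes:

```python
# Toy NAT illustrating why RTP/RTCP port adjacency (RFC 3550) can break.
# The sequential port-allocation policy is invented for illustration.
class ToyNat:
    def __init__(self, external_ip="198.51.100.1", first_port=62000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.bindings = {}   # (internal_ip, internal_port) -> (ext_ip, ext_port)

    def map(self, internal_ip, internal_port):
        key = (internal_ip, internal_port)
        if key not in self.bindings:
            self.bindings[key] = (self.external_ip, self.next_port)
            self.next_port += 1
        return self.bindings[key]

nat = ToyNat()
rtp = nat.map("10.0.0.5", 4000)      # RTP stream
other = nat.map("10.0.0.9", 5000)    # unrelated flow grabs the next port
rtcp = nat.map("10.0.0.5", 4001)     # RTCP: internally RTP port + 1 ...
print(rtp, rtcp)                     # ... but externally 62000 and 62002
assert rtcp[1] != rtp[1] + 1         # adjacency lost
```

Hence the need for RFC 3605's mechanism for carrying the RTCP port explicitly in the SDP description.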
It is not clear to us that there is sufficient will in the industry to standardize NAT behavior. (Although it would be painful to do so, we also think it would be a big help; keep in mind that NATs are likely to proliferate to support interworking between IPv4 and IPv6 domains.) This is yet another reason we expect to see carriers deploying VoIP over private IP networks—otherwise, they may not have enough control to make sure that their solutions consistently “work as advertised.” It will be interesting to see how the midcom work program develops.
15.6 Comments and Further Reading

15.6.1 More on IP QoS
IP QoS is the dominant topic in this chapter. Even so, space limitations prevent us from doing justice to this subject area. In the following paragraphs, we give a few references for further investigation.

First, we list some useful overview documents. In 2000, the Internet Architecture Board issued a discussion document [35] highlighting key issues with IP QoS. The document attempts to establish a consistent vocabulary and describes the limitations of available approaches. RFC 3272 [36] is similar in approach to the aforementioned Internet Architecture Board document but is more encompassing. RFC 3272 came out of IETF's Traffic Engineering working group (tewg). Faynberg and Lu published an article [37] that, in the context of ITU-T's ongoing QoS architecture effort, gives a good overview of IP QoS techniques.

In his article [15], Seitz envisions a three-step realization process for "an end-to-end IP QoS solution enabling successful IP/PSTN convergence." The first step is getting service providers to agree on common, quantifiable IP QoS objectives. The standards discussed in Section 15.1.4 concentrate primarily on the first step. The second step is developing and deploying mechanisms that support the objectives. Our discussion in Sections 15.1.1, 15.1.2, and 15.1.3 is pertinent to step 2, but is far from being the last word. The third and last step is coming up with signaling protocols that allow users to dynamically harness the QoS mechanisms based on their current needs. In this book, our coverage of step 3 is limited at best. The output of IETF's tewg working group is pertinent here, especially to Seitz's step 2.

Does step 3 represent huge changes to current signaling protocols, or relatively small changes? The answer to this question is unclear; this is arguably the area where IP QoS is most immature. In this regard, IETF's Next Steps in Signaling (nsis) working group bears watching.
15.6.2 IPv6 and ROHC
Migration to IPv6 has not begun en masse. But numerous IETF working groups are addressing aspects of IPv6, the migration thereto, and the implied transition period in which interworking between IPv6 and IPv4 domains will be paramount. We have the IPv6, Mobility for IPv6 (mip6), MIPv6 Signaling and Handoff Optimization (mipshop), Site Multihoming in IPv6 (multi6), and IPv6 Operations (v6ops) working groups.
IPv6 is not a topic of emphasis in this book, so we will not detail the activities of the aforementioned working groups. We note, however, that IPv6 does have performance implications (and therefore impacts QoS), particularly in wireless networks.

There is an interesting conflict between objectives in the wireless realm. On one hand, growth in wireless networks has been touted as an important driver for IPv6 deployment. Let us look at pressures that threaten to exhaust the IPv4 address space. A given subscriber might have multiple devices connected to a PLMN (a PC and a cell phone, or a personal digital assistant with voice capability, for instance); each device would need to be separately IP-addressable. Moreover, while wireless penetration rates are generally high in developed countries, emerging economies promise to grow the worldwide subscriber base substantially. For countries that do not have reliable PSTNs (or whose PSTNs do not extend outside large population centers), third generation wireless networks may be a sensible alternative. Lastly, telematic applications will place additional pressure on the IP address space. Examples include vending machines connecting to the wireless wide area network to inform back-end systems of their stocking levels, home appliances that are remotely accessible via IP-based connectivity, and automobiles.

Wireless carriers are eager to market third generation services in terms of "always on" service capabilities; this was a key selling point of cable modem and DSL. Third generation mobile devices would require "permanent" IP addresses in order to realize this vision, or so the argument goes. Here we play devil's advocate, so to speak, by making the following "contrarian" observations:

• By and large, cable modem and DSL users obtain IP addresses temporarily via DHCP, just like dial-up users. Often, these are private IP addresses hidden behind network address translators; any number of service providers can reuse the same private IP address space.
• It is unclear that telematic applications require always-on connectivity. In the vending machine example, it appears much more likely to us that the machine would periodically connect to the network, obtain a temporary IP address assignment via DHCP, upload its information to the owner's IT network, and then disconnect. There are issues if the IT network wants to contact the vending machine: how do we "wake it up" and ask it to contact the network? Wireless carriers have encountered this problem; there are means to solve it, although they tend to be slow.
Although the wireless industry may see a long-term need to migrate to IPv6 (to sidestep limitations on its marketing opportunities), it is less able than the wireline industry to accommodate the implied increase in overhead. Recall that IPv6 addresses are 128 bits long, whereas IPv4 addresses are only 32 bits long. For this reason, header compression is important. Header compression schemes have been around for a long time. Early schemes were not designed for the wireless environment, however, and do not perform well in the wireless domain (because bit error rates tend to be high). IETF’s rohc working group has turned its attention to this problem. Recalling that numerous IP addresses can appear in a single SIP message, we see that the header compression problem is not limited merely to UDP/IP and TCP/IP headers. Before enumerating several rohc specifications, we note that header
compression algorithms maintain state machines. The difficulty with this is that it consumes memory and processing power in wireless handsets; these are scarce resources with many competing claimants. In the long run, we believe that affordable handsets will have enough resources to run the necessary state machines.

Although work is ongoing, rohc's main specifications are stable. RFC 3095 [38] is the baseline document. RFCs 3096 [39], 3242 [40], 3243 [41], 3408 [42], and 3409 [43] are concerned with header compression for RTP streams over UDP. Since bearer traffic flows throughout the life of a VoIP call (whereas signaling occurs mostly at the beginning and end of a call), it is not surprising that RTP streams have consumed so much of the rohc group's attention. SIP header compression is important, however. This is especially true for applications like PoC (see Section 13.7.1), which makes heavy use of SIP. Compression of SIP and other signaling protocols generally goes under the name SigComp; see RFCs 3320 [44], 3321 [45], and 3322 [46]. Details are available at the rohc working group's home page.
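To convey the flavor of header compression (though not the actual ROHC or SigComp algorithms), the sketch below keeps a per-flow context at each end of the link and sends only the fields that changed; that context is exactly the state whose memory and processing cost concerns handset designers:

```python
# Toy delta-based header compression conveying the flavor of ROHC/SigComp:
# both ends keep per-flow context, so after one full header only changed
# fields cross the link. This is an illustration, not the ROHC algorithm.
def compress(header, context):
    if not context:                       # first packet: send everything
        context.update(header)
        return dict(header)
    delta = {k: v for k, v in header.items() if context.get(k) != v}
    context.update(delta)
    return delta

def decompress(delta, context):
    context.update(delta)
    return dict(context)

tx_ctx, rx_ctx = {}, {}
pkt1 = {"src": "10.0.0.5", "dst": "10.0.0.9",
        "sport": 4000, "dport": 4000, "seq": 1}
pkt2 = dict(pkt1, seq=2)                  # only the sequence number changes

wire1 = compress(pkt1, tx_ctx)            # full header (5 fields)
wire2 = compress(pkt2, tx_ctx)            # just {"seq": 2}
assert decompress(wire1, rx_ctx) == pkt1
assert decompress(wire2, rx_ctx) == pkt2
print(len(wire1), len(wire2))             # 5 1
```

The sketch also hints at ROHC's sensitivity to loss: if a delta is dropped, the receiver's context falls out of sync, which is why the real profiles devote so much machinery to context repair.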
15.6.3 Routing for Voice over IP Protocols: iptel Working Group
The IP Telephony (iptel) working group was chartered to solve problems associated with naming and routing for Voice over IP. Using BGP-4 as a model, the group has developed the Telephony Routing over IP (TRIP) protocol [47]. Like BGP-4, TRIP is used to distribute reachability information across domain boundaries. The iptel group charter acknowledges that TRIP alone does not cover all of the scenarios under which signaling servers exchange routing information; work is ongoing.
15.6.4 ENUM
Why not associate a multiplicity of services with a single telephone number? Here the goal is to go beyond traditional vertical services like caller ID to include:

• Additional modes of communication like fax, e-mail, and instant messaging;
• A variety of destination types for full-duplex voice service (SIP phones and circuit-switched phones come to mind).
The idea of electronic numbering (ENUM) is to use DNS functionality to realize this goal. The IETF's Telephone Number Mapping (enum) working group has produced a baseline specification (RFC 2916 [48]). Technically, the result of a so-called ENUM query is a series of DNS naming authority pointer (NAPTR) resource records. RFC 2916 is under revision, partly because the companion document defining NAPTR resource records has been obsoleted by the work of the Uniform Resource Names (urn) working group [49–53]. The urn working group has concluded. The results of an ENUM query identify the available methods for contacting the subscriber associated with a given telephone number (typically in order of preference). The technical difficulties of ENUM are relatively minor when compared with the bureaucratic challenges. Telephone numbering is administered as specified in
ITU-T recommendation E.164 [54]; naturally, many national and international agencies are involved. There are ongoing efforts to establish an international "public" ENUM infrastructure: that is, a worldwide DNS hierarchy that can resolve ENUM queries. Trials have taken place in Europe and Asia; at the time of this writing, no large-scale public ENUM trial had been scheduled in the United States. For up-to-date information, the reader can consult www.itu.int. For information on developments in the United States, one can consult www.enum-forum.org. For those seeking information on national developments in other countries, one approach is to monitor the mailing list of IETF's enum working group.
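The number-to-domain conversion at the heart of an ENUM query is mechanical, as the following sketch (based on the procedure in RFC 2916: strip non-digits, reverse the digits, dot-separate them, and append e164.arpa) shows. The sample number is a hypothetical Swedish number; a real deployment would then query the resulting domain for NAPTR records:

```python
# Sketch of the E.164-to-domain mapping defined in RFC 2916. The resulting
# domain is what an ENUM client would query for NAPTR resource records.
def enum_domain(e164_number):
    digits = [c for c in e164_number if c.isdigit()]
    return ".".join(reversed(digits)) + ".e164.arpa"

print(enum_domain("+46-8-9761234"))   # 4.3.2.1.6.7.9.8.6.4.e164.arpa
```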
15.6.5 Service Architectures
We introduced the IMS in Section 13.7.3. IMS provides a standard way for third generation wireless carriers to roll out SIP-based services. As yet, there is no wireline counterpart. However, TISPAN is looking at the possibility of defining an IMS-like service architecture for the wireline domain. TISPAN is a technical body that operates under the auspices of the European Telecommunications Standards Institute (ETSI); see www.etsi.org for more information. TISPAN was formed in 2003, so it is too soon to tell whether the wireline IMS effort is likely to gain broad support.
References

[1] McKeown, N., "The iSLIP Scheduling Algorithm for Input-Queued Switches," IEEE/ACM Transactions on Networking, Vol. 7, No. 2, 1999, pp. 188–201.
[2] Bennett, J., and H. Zhang, "Hierarchical Packet Fair Queuing Algorithms," ACM SIGCOMM, 1996.
[3] Bennett, J., and H. Zhang, "Why WFQ Is Not Good Enough for Integrated Services Networks," Proc. of the 6th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), April 1996.
[4] Bennett, J., and H. Zhang, "WF²Q: Worst-Case Fair Weighted Fair Queuing," INFOCOM, 1996.
[5] Parekh, A., and R. Gallager, "A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case," IEEE/ACM Transactions on Networking, Vol. 1, No. 3, June 1993, pp. 344–357.
[6] Zhang, L., "Virtual Clock: A New Traffic Control Algorithm for Packet Switching Networks," ACM SIGCOMM Computer Communications Review, Vol. 20, No. 4, 1990, pp. 19–29.
[7] Suri, S., G. Varghese, and G. Chandranmenon, "Leap Forward Virtual Clock: A New Fair Queueing Scheme with Guaranteed Delays and Throughput Fairness," IEEE Infocom, 1997, pp. 557–565.
[8] Golestani, S., "A Self-Clocked Fair Queuing Scheme for Broadband Applications," IEEE Infocom, 1994.
[9] Braden, R., et al., RFC 2205, Resource ReSerVation Protocol (RSVP)–Version 1 Functional Specification, IETF, September 1997.
[10] Awduche, D., et al., RFC 3209, RSVP-TE: Extensions to RSVP for LSP Tunnels, IETF, December 2001.
[11] Jamoussi, B., et al., RFC 3212, Constraint-Based LSP Setup Using LDP, IETF, January 2002.
[12] Katz, D., K. Kompella, and D. Yeung, RFC 3630, Traffic Engineering (TE) Extensions to OSPF Version 2, IETF, September 2003.
[13] Recommendation Y.1540, IP Packet Transfer and Availability Performance Parameters, ITU-T, December 2002.
[14] Recommendation Y.1541, Network Performance Objectives for IP-Based Services, ITU-T, May 2002.
[15] Seitz, N., "ITU-T QoS Standards for IP-Based Networks," IEEE Communications Magazine, June 2003, pp. 82–89.
[16] Recommendation Y.1221, Traffic Control and Congestion Control in IP Based Networks, ITU-T, March 2002.
[17] Shenker, S., C. Partridge, and R. Guerin, RFC 2212, Specification of Guaranteed Quality of Service, IETF, September 1997.
[18] Jacobson, V., K. Nichols, and K. Poduri, RFC 2598, An Expedited Forwarding PHB, IETF, June 1999.
[19] Davie, B., et al., RFC 3246, An Expedited Forwarding PHB, IETF, March 2002.
[20] Wroclawski, J., RFC 2211, Specification of the Controlled-Load Network Element Service, IETF, September 1997.
[21] Heinanen, J., et al., RFC 2597, Assured Forwarding PHB Group, IETF, June 1999.
[22] TS 23.107, Quality of Service (QoS) Concept and Architecture, 3GPP.
[23] Durham, D., et al., RFC 2748, The COPS (Common Open Policy Service) Protocol, IETF, January 2000.
[24] Herzog, S., et al., RFC 2749, COPS Usage for RSVP, IETF, January 2000.
[25] Herzog, S., RFC 2750, RSVP Extensions for Policy Control, IETF, January 2000.
[26] Yavatkar, R., D. Pendarakis, and R. Guerin, RFC 2753, A Framework for Policy-Based Admission Control, IETF, January 2000.
[27] Handley, M., and V. Jacobson, RFC 2327, SDP: Session Description Protocol, IETF, April 1998.
[28] Stewart, R., et al., RFC 2960, Stream Control Transmission Protocol, IETF, October 2000.
[29] Morneault, K., et al., RFC 3331, Signaling System 7 (SS7) Message Transfer Part 2 (MTP2)—User Adaptation Layer, IETF, September 2002.
[30] Sidebottom, G., K. Morneault, and J. Pastor-Balbas, RFC 3332, Signaling System 7 (SS7) Message Transfer Part 3 (MTP3) User Adaptation Layer (M3UA), IETF, September 2002.
[31] Rosenberg, J., et al., RFC 3489, STUN—Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs), IETF, March 2003.
[32] Srisuresh, P., RFC 3303, Middlebox Communication Architecture and Framework, IETF, August 2002.
[33] Schulzrinne, H., et al., RFC 3550, RTP: A Transport Protocol for Real-Time Applications, IETF, July 2003.
[34] Huitema, C., RFC 3605, Real-Time Control Protocol (RTCP) Attribute in Session Description Protocol (SDP), IETF, October 2003.
[35] Huston, G., RFC 2990, Next Steps for the IP QoS Architecture, IETF, November 2000.
[36] Awduche, D., et al., RFC 3272, Overview and Principles of Internet Traffic Engineering, IETF, May 2002.
[37] Lu, H.-L., and I. Faynberg, "An Architectural Framework for Support of Quality of Service in Packet Networks," IEEE Communications Magazine, June 2003, pp. 98–105.
[38] Bormann, C., et al., RFC 3095, Robust Header Compression (ROHC): Framework and Four Profiles: RTP, UDP, ESP, and Uncompressed, IETF, July 2001.
[39] Degermark, M., RFC 3096, Requirements for Robust IP/UDP/RTP Header Compression, IETF, July 2001.
[40] Jonsson, L.-E., and G. Pelletier, RFC 3242, Robust Header Compression (ROHC): A Link-Layer Assisted Profile for IP/UDP/RTP, IETF, April 2002.
[41] Jonsson, L.-E., and G. Pelletier, RFC 3243, Robust Header Compression (ROHC): Requirements and Assumptions for 0-byte IP/UDP/RTP Compression, IETF, April 2002.
[42] Liu, Z., and K. Le, RFC 3408, Zero-byte Support for Bidirectional Reliable Mode (R-mode) in Extended Link-Layer Assisted Robust Header Compression (ROHC) Profile, IETF, December 2002.
[43] Svanbro, K., RFC 3409, Lower Layer Guidelines for Robust RTP/UDP/IP Header Compression, IETF, December 2002.
[44] Price, R., et al., RFC 3320, Signaling Compression (SigComp), IETF, January 2003.
[45] Hannu, H., RFC 3321, Signaling Compression (SigComp)—Extended Operations, IETF, January 2003.
[46] Hannu, H., RFC 3322, Signaling Compression (SigComp) Requirements & Assumptions, IETF, January 2003.
[47] Rosenberg, J., H. Salama, and M. Squire, RFC 3219, Telephony Routing Over IP (TRIP), IETF, January 2002.
[48] Faltstrom, P., RFC 2916, E.164 Number and DNS, IETF, September 2000.
[49] Mealling, M., RFC 3401, Dynamic Delegation Discovery System (DDDS) Part One: The Comprehensive DDDS, IETF, October 2002.
[50] Mealling, M., RFC 3402, Dynamic Delegation Discovery System (DDDS) Part Two: The Algorithm, IETF, October 2002.
[51] Mealling, M., RFC 3403, Dynamic Delegation Discovery System (DDDS) Part Three: The Domain Name System (DNS) Database, IETF, October 2002.
[52] Mealling, M., RFC 3404, Dynamic Delegation Discovery System (DDDS) Part Four: The Uniform Resource Identifiers (URI) Resolution Application, IETF, October 2002.
[53] Mealling, M., RFC 3405, Dynamic Delegation Discovery System (DDDS) Part Five: URI.ARPA Assignment Procedures, IETF, October 2002.
[54] Recommendation E.164, The International Public Telecommunications Numbering Plan, ITU-T.
CHAPTER 16
Conclusion

Many telcos are now deploying packet telephony on a substantial scale. The IP Centrex industry is picking up steam. Softswitch deployments in telco core networks are also taking place. With these trends, service providers are gaining operational experience. Softswitch vendors and service providers have worked to adapt traffic engineering models to packet domains. Deployments furnish opportunities to validate and/or calibrate those traffic models. However, numerous aspects of packet telephony are still in early phases of the maturation cycle. In this book, we highlighted the following issues:
• Routing and address resolution that encompasses packet-switched and circuit-switched domains is still a work in progress.
• Today's solutions for IP QoS are immature. In a nutshell, there is a very big difference between "make sure it gets there" and "make sure it gets there on time." TCP/IP is well suited to the former but is not enough for the latter:
  • It is possible to provide high-quality voice service over "managed" IP networks, but robustness and efficiency improvements are still needed. Above all, scalable IP QoS solutions are still evolving. Operational experience (something that was in short supply until very recently) will surely be crucial to furthering this evolution. On today's wide area network links, high-quality VoIP is often realized as VoIP over ATM. This is due to ATM's QoS capabilities.
  • Voice over ATM, which can take advantage of ATM's robust QoS support, is central to some packet telephony implementations. But Voice over ATM per se has not captured the industry's imagination, and may see limited success in the marketplace. In particular, it appears highly unlikely that end-to-end Voice over ATM will be widely deployed.
• Enhanced service offerings are crucial to the widespread acceptance of packet telephony. "Enhanced service offerings" could mean services that cannot be offered over today's circuit-switched networks. But it could just as easily mean services that were conceived a long time ago but have typically been offered in limited ways. In the past, PBX/Centrex features have been available only in wireline environments. To be fair, wireless carriers now offer private dialing plans to corporate customers. Integration of service offerings across wireless and wireline domains is an appealing prospect to corporate customers and consumers alike. This is an area that is truly in its infancy. Softswitch technology, along with protocols such as SIP, shows promise here. Although
we believe that SIP is well-suited to the development of integrated services, it is important to sound the following note of caution:

• Since it is relatively new, SIP is probably not ready to "take on the world." Maturity will come over time with the accumulation of operational experience. Meanwhile, however, it is hard to ignore that SS7 is established technology; it has proven its robustness on a truly enormous scale. The advent of sigtran means that the bandwidth limitations inherent in TDM-based SS7 implementations can be overcome. We believe that SS7 call control signaling will be phased out very slowly.
APPENDIX A
Data Link Layer Protocols

Our primary topics in this appendix are Frame Relay, ATM, and Ethernet. Frame Relay and ATM have strong similarities centering on the fact that both are connection-oriented. Ethernet is connectionless. We will define these terms in the exposition that follows. Before moving into our main topics, we look briefly at a precursor to many link layer protocols: HDLC.
A.1 HDLC

We begin our discussion of the data link layer with HDLC. For our purposes, the most important thing about HDLC is that it defines a frame structure that other link layer protocols have copied. More precisely, many other link layer protocols have adapted HDLC's frame structure to suit their own purposes. For example, PPP [1] uses a framing mechanism based on HDLC framing (see [2]). PPP is perhaps best known for its widespread use in dial-up sessions, but it is deployed in many other ways (e.g., [3]).

HDLC defines three frame types, of which we mention two: information and supervisory. Payload received from higher layers is transported within information frames. Supervisory frames are used to implement error and flow control. Although we do not describe the particulars, we note that IEEE logical link control (LLC) [4] uses a variant of HDLC's error control mechanism. LLC is employed in Ethernet, Token Ring, and other LAN technologies.

HDLC frames begin and end with a 1-byte flag field (which contains the pattern '01111110'). The flag pattern is used to delimit frames. Thus, we must make sure that the flag pattern never appears in mid-frame: wherever five consecutive '1's occur in the payload, the sending link layer protocol entity inserts a '0' immediately after them. This process is reversed at the receiving end.

Rather than documenting HDLC frame formats (which is not necessary for our purposes), we turn our attention in the next section to Frame Relay. This is another technology that uses HDLC-derived framing.
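The bit-stuffing procedure just described can be sketched as follows. This is an illustration of the idea, operating on bits represented as characters rather than on a real serial stream:

```python
# Sketch of HDLC-style bit stuffing: after five consecutive 1s in the
# payload, the sender inserts a 0 so the 01111110 flag pattern can never
# occur mid-frame; the receiver reverses the process.
def stuff(bits):
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")     # inserted bit
            run = 0
    return "".join(out)

def unstuff(bits):
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            i += 1              # skip the inserted 0
            run = 0
        i += 1
    return "".join(out)

payload = "0111111101111110"    # contains the flag pattern
stuffed = stuff(payload)
assert "01111110" not in stuffed
assert unstuff(stuffed) == payload
print(stuffed)                  # 011111011011111010
```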
A.2 Frame Relay

Frame Relay is a descendant of the X.25 protocol. X.25 was designed for use on links with relatively high bit error rates; it uses an error control scheme based on
that of HDLC. X.25 also has flow control features. Because its error and flow control schemes add lots of overhead, X.25 is not very efficient for transmission over highly reliable links. As less reliable analog links were replaced by highly reliable links, an alternative was needed. This was the motivation for Frame Relay, which is sort of a "stripped down" version of X.25. Frame Relay does not perform any error correction; it simply discards frames that it recognizes as corrupted and assumes that higher layer protocols will take care of the rest (e.g., by detecting missing data and requesting retransmission whenever necessary).
A.2.1 The Frame Relay Header
The Frame Relay frame and header formats appear in Figure A.1. The 1-byte Flag field contains the HDLC flag pattern and is used to delimit frames. (Note that there is no field indicating the length of the frame, so Frame Relay switches have no other means of delineating frames. Most Frame Relay switches support a maximum frame size of 1,600 or more bytes.) The 2-byte cyclic redundancy check (CRC) field is used to detect corrupted frames (which are discarded). For our purposes, the key header fields are as follows (we will not cover the "grayed-out" header fields):

• Data link connection identifier (DLCI). This 10-bit field is used as a forwarding table index, as described in Section A.2.2. Note that the DLCI does not appear contiguously in the Frame Relay header; we have labeled the placement of the six most significant bits (MSBs) and the four least significant bits (LSBs).
• Forward explicit congestion notification (FECN). When a Frame Relay switch experiences congestion, it can notify the destination endpoint by setting this bit to 1.
• Backward explicit congestion notification (BECN). When a Frame Relay switch experiences congestion, it can notify the originating endpoint by setting this bit to 1.
• Discard eligibility (DE). When a customer exceeds its contracted data rate, this bit can be set to 1 at ingress to the Frame Relay network. When a Frame Relay switch experiences congestion, frames with DE=1 are the first to be discarded.
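A sketch of how a receiver might pull these fields out of the 2-byte header follows. The field layout is taken from Figure A.1; the example DLCI value is our own:

```python
# Sketch of parsing the 2-byte Frame Relay header described above.
# Byte 1: DLCI (6 MSBs) | C/R | EA=0.
# Byte 2: DLCI (4 LSBs) | FECN | BECN | DE | EA=1.
def parse_fr_header(b1, b2):
    return {
        "dlci": ((b1 >> 2) << 4) | (b2 >> 4),   # reassemble the split 10-bit DLCI
        "fecn": (b2 >> 3) & 1,
        "becn": (b2 >> 2) & 1,
        "de":   (b2 >> 1) & 1,
    }

# Hypothetical frame with DLCI 100 and DE set: 100 = 0b0001100100,
# so the 6 MSBs are 000110 and the 4 LSBs are 0100.
b1 = 0b000110_00          # DLCI MSBs, C/R=0, EA=0
b2 = 0b0100_0011          # DLCI LSBs, FECN=0, BECN=0, DE=1, EA=1
h = parse_fr_header(b1, b2)
print(h)                  # {'dlci': 100, 'fecn': 0, 'becn': 0, 'de': 1}
```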
Figure A.1 Frame Relay format including header. (The frame consists of an opening Flag byte, the 16-bit header, the User Data field, the 2-byte CRC, and a closing Flag byte. The header carries the DLCI, split into a 6-bit MSB portion and a 4-bit LSB portion, together with the C/R, FECN, BECN, DE, EA0, and EA1 bits.)
A.2.2 Label Switching and Virtual Circuits
How does a Frame Relay switch decide where to send an incoming packet? It does so via a table lookup. The table is indexed by incoming port and incoming DLCI, and the table lookup actually yields two things: outgoing port and outgoing DLCI. Before queuing a frame for transmission on the appropriate output port, the Frame Relay switch replaces the incoming DLCI with the outgoing DLCI. Table A.1 presents an illustrative example; here we have the mapping (incoming port/DLCI = 2/3) → (outgoing port/DLCI = 4/7).
The table lookup described here can be done quickly; it is much simpler than the "longest match" lookups performed by IP routers. As the example demonstrates, two end systems that are exchanging frames across a Frame Relay switching network are not likely to see the same DLCI(s). We say that DLCIs have only local significance. To see why DLCIs lack end-to-end significance, let us refer again to the example of Table A.1: by looking at the line for incoming port 3, we see that a different input stream already has "dibs" on outgoing port/DLCI = 4/3. If we tried to make DLCIs remain constant along end-to-end paths, we would create more problems than we would solve (particularly in regard to scalability). So how can we manage end-to-end paths through Frame Relay networks? This is accomplished using virtual circuits (VCs). Data makes its way through individual Frame Relay switches by means of bindings between (input port/DLCI) pairs and (output port/DLCI) pairs. A VC is simply a set of such bindings (one at each switch along a route) that collectively define an end-to-end path. Frame Relay VCs are typically established by network administrators via element management systems; these are called permanent VCs. Switched VCs (which are dynamically set up and torn down via signaling) have been discussed over the years but have not seen widespread deployment. With the advent of MPLS, we may see dynamically established paths realized as MPLS label switched paths. Several standards bodies were involved with the standardization of Frame Relay; the Frame Relay Forum was among them. The Frame Relay Forum merged with the MPLS Forum in 2003, creating the MPLS and Frame Relay Alliance. The interested reader can consult www.mplsforum.org for more information on Frame Relay.
Table A.1 Label Switching Illustration

Incoming Port   Incoming DLCI 1   Incoming DLCI 2   Incoming DLCI 3    ...   Incoming DLCI 1,024
1
2                                                   Port: 4, DLCI: 7
3                                                   Port: 4, DLCI: 3
...
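The lookup behind Table A.1 can be sketched in a few lines (an illustration of ours, not any vendor's implementation); the table contents below mirror the example in the text, and the incoming DLCI on the port 3 entry is our assumption:

```python
# Forwarding table indexed by (incoming port, incoming DLCI);
# the lookup yields (outgoing port, outgoing DLCI).
forwarding_table = {
    (2, 3): (4, 7),  # the mapping from Table A.1
    (3, 3): (4, 3),  # a different stream already holds port 4 / DLCI 3
}

def switch_frame(in_port: int, in_dlci: int):
    out_port, out_dlci = forwarding_table[(in_port, in_dlci)]
    # The switch rewrites the DLCI before queuing the frame on out_port,
    # which is why DLCIs have only local significance.
    return out_port, out_dlci
```

A frame arriving on port 2 with DLCI 3 leaves on port 4 carrying DLCI 7.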
A.3 Asynchronous Transfer Mode

Frame Relay is often called a label switching technology because it uses a short, locally significant label (namely, the DLCI) as a forwarding table index. ATM is also a label switching technology; the details of the label are different, and variable-length frames are replaced by fixed-length cells. But the basic idea is similar. In particular, a virtual circuit is created by establishing a label binding at each switch along an end-to-end path. Until you have a virtual circuit or virtual path (we will discuss the latter in Section A.3.5), you do not have a way to forward data through an ATM network (and similarly for Frame Relay). We say that ATM and Frame Relay are connection-oriented. Note that frames reach the terminating endpoint of a VC in the same order that they entered the originating endpoint. This contrasts with IP networking.

ATM does add a few twists. In particular, the developers of ATM paid close attention to QoS issues. Whereas Frame Relay was developed to transport data traffic across wide area network links, ATM was conceived as a multiservice technology for voice, video, and data. Moreover, Frame Relay was not designed for links with bit rates in excess of 45 Mbit/s; ATM interfaces are available at rates up to at least 2.4 Gbit/s.

A.3.1 The ATM Header
The ATM cell format is displayed in Figure A.2. (Here, following industry standard parlance, we use the term cell as a synonym for "fixed-length frame." Every ATM cell is 53 bytes long.) When forwarding cells along a virtual circuit, an ATM switch uses the virtual path identifier/virtual channel identifier (VPI/VCI) as the label: the incoming port number and VPI/VCI are used as indices into a forwarding table. The table lookup yields an outgoing port number and VPI/VCI. (To emphasize that the VPI/VCI concatenation is "of a piece," we have cross-hatched this portion of the cell in the figure.) At a network-network interface, the VPI occupies the first 12 bits of the ATM cell header. Note that in Figure A.2, at a user-network interface, the first 4 bits are used for something else (the generic flow control (GFC) field, which we will not discuss further); as a result, the VPI is only 8 bits long. The VCI is always 16 bits long. We will not discuss the payload type identifier (PTI), except to say that it can be used to distinguish network management traffic from user data. The cell loss priority (CLP) bit is analogous to Frame Relay's discard eligibility bit. The header error check is based on a cyclic redundancy check calculation; it is used to detect the presence of bit errors and for cell delineation. (Notice in this regard that no flag pattern is present to delineate ATM cells.)

A.3.2 ATM Approach to Quality of Service and Statistical Multiplexing
ATM was designed from the beginning with robust QoS in mind. To this end, ATM defines several QoS classes. Depending on the QoS class of an ATM connection, the ATM network may provide guarantees on a number of traffic parameters, including cell delay variation (CDV), cell transfer delay (CTD) and cell loss ratio (CLR).
Figure A.2 ATM cell format. (5-byte header: GFC or VPI | VPI | VCI | PTI | CLP | Header Error Check; followed by the 48-byte cell payload)
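The field boundaries in Figure A.2 can be illustrated with a small decoding sketch (ours, for UNI-format headers only; an NNI decoder would instead take the full first 12 bits as the VPI):

```python
def parse_atm_header_uni(hdr: bytes):
    """Decode a 5-byte ATM cell header in UNI format (4-bit GFC, 8-bit VPI)."""
    gfc = hdr[0] >> 4
    vpi = ((hdr[0] & 0x0F) << 4) | (hdr[1] >> 4)
    vci = ((hdr[1] & 0x0F) << 12) | (hdr[2] << 4) | (hdr[3] >> 4)
    pti = (hdr[3] >> 1) & 0x07     # payload type identifier
    clp = hdr[3] & 1               # cell loss priority
    hec = hdr[4]                   # header error check (CRC over octets 1-4)
    return gfc, vpi, vci, pti, clp, hec
```

For instance, the header bytes 00 10 02 00 00 decode to VPI 1 and VCI 32.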
Generally, sources must behave in accordance with traffic contracts (or cells may be discarded to prevent congestion). Pertinent source traffic descriptors include peak cell rate, sustainable cell rate, and maximum burst size. The classes are as follows:

• Constant bit rate (CBR). Intended for emulation of TDM circuits, this class has tight bounds on the aforementioned parameters; it is not suitable for bursty traffic sources.
• Real-time variable bit rate (rt-VBR). This class also provides guarantees on CDV, CTD, and CLR. Source descriptors pertinent to this class include sustainable cell rate and maximum burst size. As an example, note that video traffic is by nature burstier than voice traffic. Note also that silence-suppressed voice has a variable bit rate.
• Nonreal-time variable bit rate (nrt-VBR). This class provides guarantees on (mean) CTD and CLR but not on CDV.
• Available bit rate (ABR). This class is intended to provide feedback-based flow control (thus there are some similarities to TCP's role in IP domains). ABR was basically never deployed.
• Unspecified bit rate (UBR). This is for best-effort service and is widely deployed.
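Policing against such a traffic contract is commonly described in terms of the generic cell rate algorithm (GCRA). The following is a minimal sketch of its virtual-scheduling form (parameter names are ours):

```python
class GCRA:
    """Generic cell rate algorithm (virtual-scheduling form): a sketch of
    how a switch might police a contract. `increment` is the nominal
    inter-cell spacing (1 / cell rate) and `limit` is the tolerance,
    both in the same time unit."""

    def __init__(self, increment: float, limit: float):
        self.T = increment
        self.tau = limit
        self.tat = 0.0  # theoretical arrival time of the next cell

    def conforming(self, arrival: float) -> bool:
        if arrival < self.tat - self.tau:
            return False  # cell arrived too early: tag (CLP=1) or discard
        self.tat = max(arrival, self.tat) + self.T
        return True
```

With an increment of 10 and a limit of 2, a cell at time 0 conforms, a second cell at time 5 does not, and a third at time 9 does.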
ATM switches make heavy use of weighted fair queuing and/or other class-based queuing schemes. Allocation of buffer space and bandwidth for the real-time traffic classes is delicate, and it is fair to say that this is a sophisticated subject area. Gory details are available in [5]; the authors, who worked for a major ATM switch vendor at the time, are experts.

A.3.3 The ATM Control Plane
ATM VCs can be managed administratively (via ATM element management systems), in which case they are called permanent virtual circuits (PVCs). They can also be set up and torn down dynamically, in which case they are called switched virtual circuits (SVCs). The SVC signaling protocol at the User-Network Interface (UNI) is based on ISDN signaling. Two versions of the ATM UNI, specified by the ITU-T and the ATM Forum, respectively, are [6] and [7]. Earlier version(s) of the ATM UNI may also remain in widespread use.

What about signaling SVCs across an ATM network? In particular, where is the routing intelligence? The Private Network-Network Interface (PNNI [8]) is a signaling and routing protocol aimed at this problem. Like OSPF (see Section 7.5.2), PNNI routes dynamically based on a shortest path formulation. PNNI protocol entities flood routing information through the domains they inhabit; using that information, each PNNI switch builds a routing table. Like OSPF, PNNI is a link state protocol. PNNI incorporates a notion of hierarchy that is more flexible than that of OSPF areas. Unlike OSPF, PNNI is source routed: the originating node selects a route when setting up an SVC. Intermediate nodes are obligated to stick with this route unless adequate resources are unavailable (or administrative policies prohibit it). The call-control portion of PNNI is similar to UNI signaling.

Soft-Permanent Connections
We have talked about SVCs, which are dynamically established via signaling. SVCs have been studied extensively over the years but have not been used as widely as expected. (However, some ATM-based softswitches use SVCs.) There is another type of ATM connection: the soft-permanent connection. A soft-permanent virtual circuit (S-PVC) is a hybrid of the SVC and PVC connection types, as we now explain. S-PVC setups are triggered by administrative action (typically through an element management system), but the setups are actually carried out using signaling in the same manner as SVC setups. The attraction of S-PVCs is that they can be automatically rerouted in failure conditions. Some ATM element management systems implement their own rerouting capabilities; however, these capabilities are not standardized, and the rerouting process tends to be very slow. The "permanent" in "soft-permanent" refers to the idea that, under normal circumstances, an S-PVC would be expected to remain in place for a considerable duration. S-PVCs are widely deployed.

Ironically, S-PVCs were first conceived for another reason: to make it easy to work with end systems that did not have SVC signaling capability. If such an end system was attached to a network that did have SVC capability, then administrative action would be required to configure the PVC "leg" connecting the end system to the rest of the network. However, the S-PVC notion allowed the rest of the end-to-end connection to be set up dynamically via signaling.

A.3.4 ATM Adaptation Layers and Options for Voice over ATM
ATM Adaptation Layers (AALs) are the glue between the “raw” ATM layer and higher layers. Roughly speaking, AALs are packetization schemes. We list AALs and their intended uses below.
• AAL1. This adaptation layer [9] is used for PCM voice (i.e., G.711), either as individual channels [10] or via emulation of channelized DS1s and/or DS3s [11].¹ AAL1 can also be used for leased lines. AAL1 VCs are normally associated with the CBR traffic type.
• AAL2. This adaptation layer can also be used for voice, primarily over rt-VBR VCs with vocoders such as AMR. (G.711 voice can be transported over CBR AAL2 VCs, but compared with AAL1 there is additional overhead.) The interesting thing about AAL2 is that it can multiplex voice calls within a single ATM VC. Further discussion of AAL2 appears later in this section.
• AAL5. This adaptation layer [12] performs simple segmentation and reassembly. AAL5 has historically been employed for UBR VCs carrying data traffic. Over VBR VCs, AAL5 can also be used for VoIP.
• Signaling ATM adaptation layer (SAAL). This adaptation layer [13] is used for SS7 signaling over so-called high-speed links. High-speed links are realized using circuit emulation over CBR VCs.
Historical Note on AAL3/4

Another AAL (AAL3/4) was standardized by the ITU-T but is rarely mentioned (thus we have omitted it from our list). AAL3 and AAL4 were intended to provide simple framing support for connection-oriented and connectionless data services, respectively. Eventually these two AALs were combined. In addition to the 5-byte header, AAL3/4 consumes 4 bytes of the ATM payload. AAL5 does not require the additional 4 bytes and is therefore preferred.

Options for Voice over ATM
Overlay networks of PVCs (or S-PVCs) are expensive to manage (not to mention their stranded bandwidth). Therefore, SVCs are probably a more attractive option for Voice over ATM. However, performing SVC setup on a per-call basis places a heavy signaling load on ATM network elements; sluggish call setup performance may result. We might espouse the philosophy that, in a broadband network, an individual telephone call consumes too little bandwidth to merit per-call signaling at every switch. AAL2 was designed with this philosophy in mind.

More on AAL2. As noted, AAL2 [14–18] adds a layer of multiplexing within an ATM connection. In the case of voice traffic in a large network, mapping each call to a different SVC results in a large number of ATM connections and places a huge signaling burden on the ATM call-control processors (although it is efficient in terms of transmission bandwidth). AAL2 addresses this problem by introducing channel identifiers (CIDs) into the ATM payloads. The CIDs are used to distinguish different calls that are carried inside the same ATM VC. The endpoints of the ATM VC can signal call setups and teardowns at the AAL2 layer without involving the ATM control plane. This can be done within the AAL2 VC itself (in which case a specific CID is set aside for signaling) or out of band.

We think AAL2 is quite ingenious. However, it has seen limited uptake in the industry. One of the reasons is that, practically speaking, a mesh of VCs must be maintained among the AAL2 endpoints. (The notion of an "AAL2 switching node" is mentioned in the standards. The idea is that such a node would terminate AAL2 VCs and repackage incoming AAL2-layer traffic into outgoing VCs based on the AAL2 CIDs. This would allow one to connect AAL2 endpoints to the AAL2 switching node rather than to each other, thereby reducing the number of AAL2 VCs required to do the job. To the best of our knowledge, nobody ever built an AAL2 switching node.)

1. A single digital signal 1 (DS1) carries 24 voice channels; its bit rate is around 1.5 Mbit/s. The 30-channel European counterpart, E1, can also be emulated. A single digital signal 3 (DS3) carries 672 voice channels and operates at around 45 Mbit/s.
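The CID-based multiplexing just described can be sketched as follows (an illustration of ours; we assume, per I.363.2 [14], that CID values from 8 upward are available for user channels, with lower values reserved):

```python
class Aal2Mux:
    """Sketch of several calls multiplexed into one ATM VC, distinguished
    by AAL2 channel identifiers (CIDs)."""

    FIRST_USER_CID = 8  # lower CID values are reserved (e.g., for signaling)

    def __init__(self):
        self.calls = {}  # cid -> call descriptor
        self.next_cid = self.FIRST_USER_CID

    def add_call(self, call_id: str) -> int:
        """Bind a new call to the next free CID on this VC."""
        cid = self.next_cid
        self.next_cid += 1
        self.calls[cid] = call_id
        return cid

    def demux(self, cid: int) -> str:
        """On receipt, the CID in the AAL2 packet header selects the call."""
        return self.calls[cid]
```

Because the CID is carried inside the ATM payload, adding or dropping a call touches only the AAL2 layer, not the ATM control plane.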
A.3.5 Virtual Paths
We said previously that, in the case of virtual circuits, forwarding decisions are based on the combined VPI/VCI field in the cell header. It is also possible to switch cells based only on the VPI field; ATM defines the notion of a virtual path (VP) for this purpose. A single VP can have many VCs within it. We have seen that AAL2 adds a more granular layer of multiplexing; VPs represent a less granular multiplexing option at the ATM layer. Scalability is one of the key motivations for this feature. As we have already seen, individual VCs possess QoS attributes. Switching at the VPI/VCI level of granularity entails class-based queuing to deliver the contracted QoS for each VC. In the core of a large network, there may be many VCs passing through an identical series of switches because their destinations are in relatively close proximity. The idea of switching at the VPI level is to ship all cells through the virtual path in the same order they were received (which requires only one queue per VPI at each switch in the interior of the VP). Class-based queuing mechanisms that operate at ingress to the VP make sure that cells from the constituent VCs are transported in an appropriate order.

To the best of our knowledge, virtual paths have never seen large-scale deployment. Service providers took the view that the administrative headaches involved in maintaining a logical network of virtual paths outweighed the potential benefits.

A.3.6 MPLS over ATM: VC Merge Capability
We recall the following information about MPLS from Section 7.7.3. MPLS is all about efficient transport of IP packets through wide area networks. MPLS sets up paths at the data link layer; these are known as label switched paths (LSPs). MPLS nodes, which are called label switched routers (LSRs), often have ATM fabrics. When traversing a network of ATM LSRs, an LSP bears a strong resemblance to an ATM VC. There is a key difference, however, which we now explain. It will often be the case that traffic streams entering an ATM LSR on different ports will exit that LSR on the same port. It may not be necessary to distinguish between the two traffic streams at any of the downstream LSRs in the current MPLS domain. This is the case if the two traffic streams are headed to the same egress node and do not require different treatment vis-à-vis queuing, discard priority, and so on.
In principle, it should only be necessary to allocate one label (i.e., VPI/VCI) for the combined traffic stream on the outgoing interface. A network operator might wish to use only one label to avoid consuming too many labels or to limit the signaling traffic associated with distributing the label bindings. (Note that both of these reasons boil down to scalability.) Suppose we have two traffic streams that are routed from ATM LSRs 1 and 2 through LSR 3 to LSR 4. If we assign the same VPI/VCI to all cells from the combined traffic stream as they exit LSR 3 on the link to LSR 4, then we must make certain that we avoid the so-called "cell interleave" problem. That is, as cells from two IP packets (one each from LSR 1 and LSR 2) arrive at LSR 3, they must not be interleaved on the outgoing interface. Thus the scenario depicted in Figure A.3 must not occur; otherwise, reassembly will not be successful and the frames will have to be discarded. This means that cells must be buffered (at LSR 3, in this example) until an entire packet's worth of cells has been received, then forwarded contiguously on the outgoing interface. This functionality is called VC merge capability.

We now summarize the VC merge issue. When traffic streams enter an ATM LSR on different interfaces but leave, destined for the same egress ATM LSR, on the same interface, we would often like to merge the two streams under a single label. This conserves labels and reduces the signaling traffic associated with label bindings, but it requires VC merge capability to avoid the cell interleave problem just mentioned. ATM was not originally devised with this capability in mind; in the early years of MPLS, many ATM switches did not support VC merge. On the downside, VC merge requires additional buffering resources. The notion of merging MPLS LSPs, however, is a boon to scalability. In closing, we note that recent work of the ATM Forum [19–21] has concentrated on ATM-MPLS interworking.
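The per-stream buffering that VC merge requires can be sketched as follows (our illustration; we abstract away the AAL5 details and simply assume the switch can recognize the last cell of each packet, which AAL5 marks in the PTI field):

```python
class VcMerge:
    """Sketch of VC merge at an ATM LSR: cells are buffered per incoming
    stream until a complete packet has arrived, then forwarded back-to-back
    under the single outgoing label, so cells of different packets never
    interleave on the outgoing interface."""

    def __init__(self, out_label):
        self.out_label = out_label
        self.buffers = {}  # incoming stream -> buffered cells

    def cell_in(self, stream, cell, last_of_packet: bool):
        self.buffers.setdefault(stream, []).append(cell)
        if last_of_packet:
            packet = self.buffers.pop(stream)
            # Emit the whole packet contiguously with the merged label.
            return [(self.out_label, c) for c in packet]
        return []  # hold partial packets back
```

Even if cells from LSR 1 and LSR 2 arrive interleaved, each packet leaves LSR 3 as a contiguous run of cells.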
A.3.7 Why Not Voice over ATM?
Figure A.3 VC merge illustration: packets 1 and 2 enter ATM LSR 3 from ATM LSRs 1 and 2 and exit toward ATM LSR 4.

Actually, ATM is widely deployed in wide area network settings; voice traffic does run over ATM networks. There are deployments of various stripes:

• Circuit emulation service (in which ATM switches essentially act as glorified digital cross-connect systems);
• Switched services (at the ATM layer and, to a limited extent, at the AAL2 layer [22]);
• VoIP traffic (which is carried across ATM networks using AAL5);
• Loop emulation service [23], which is used to provide Voice over Digital Subscriber Line.
Therefore, let us rephrase the question as follows: Why is voice not carried "directly" over ATM (as in the second item above) more often? We offer the following answers:

• The continuing evolution and sustained popularity of Ethernet mean that ATM is "not the only game in town." In our opinion, this is the most important reason of the lot.
• It is natural to develop new services (and host the service logic) in an IP domain. We have never encountered discussions that mention ATM and service logic in the same breath. While this does not necessarily imply that we should also deploy IP bearer and control planes, the idea of universality does appeal to people. Moreover, it may be less expensive in the long run to manage fewer technologies.
• Voice over IP can run over a wide variety of data link layers.
• Existing IP over ATM schemes (e.g., MPLS with ATM as the data link layer) are imperfect.
In some circles, there seems to be a perception that ATM “lost out” to IP. We believe that this is largely incorrect. To the degree that ATM’s horizons have been limited by another technology, that technology is Ethernet. People did not believe that Ethernet would scale nearly as well as it has. Early on, ATM proponents thought that ATM would become a LAN technology. For instance, there were manufacturers that made 25 Mbit/s network interface cards, and software developers entertained the idea of including SVC capabilities in operating systems. But ATM never got a foothold in the LAN environment; Ethernet evolved in a way that allowed equipment manufacturers to keep prices relatively low, even as that technology’s capability set improved. Thus the vision of end-to-end ATM has never been realized. We may see large volumes of voice traffic being carried over MPLS, with ATM LSRs making up the data link layer. If so, “toll quality” voice may require sophisticated class-based queuing mechanisms at LSP ingress and merge points.
A.4 Ethernet

Ethernet is the dominant data link layer technology for LANs. As it overcomes its distance and scalability limitations, Ethernet's "reach" is extending to metropolitan and wide area networks (MANs and WANs). The frame structure remains the same in both realms (until and unless jumbo frames or something similar are standardized). Beyond the frame structure, however, MAN/WAN Ethernet solutions bear little resemblance to traditional LAN implementations.
Note that, unlike ATM and Frame Relay, Ethernet is connectionless: there is no notion of a virtual circuit.

A.4.1 History of Ethernet
The initial version of Ethernet appeared in 1976, followed in 1980 by the DEC-Intel-Xerox (DIX) standard. The DIX standard specified a transmission rate of 10 Mbit/s. The first Ethernet systems, which used coaxial cable, were based on bus topologies. Such networks were difficult to install and troubleshoot, leading to the introduction (in the late 1980s) of twisted-pair Ethernet. Twisted-pair Ethernet cables are similar to telephone wire. Bus topologies gave way to the familiar "star" topologies (in which computers on the same LAN are connected to a central "hub"). Meanwhile, the IEEE published the first version of its 802.3 standard in 1985. From there, the industry was able to coalesce around a single standard.

A.4.2 Ethernet Frame Structure
The Ethernet frame structure is shown in Figure A.4. The 7-byte preamble field is an alternating series of 0s and 1s that enables synchronization. The 1-byte start-of-frame delimiter (SFD) contains the pattern '10101011'. The destination address and source address are 6-byte fields in which the first bit is used to distinguish between multicast and unicast addresses. Note that source addresses are always unicast. It is worth noting here that Ethernet addresses are globally unique. Each Ethernet address begins with an organizationally unique identifier or initial address block that identifies the manufacturer of the hardware; the remaining bits are assigned by the manufacturer. We will cover the optional field marked VLAN info in Section A.4.4. If the value of the 2-byte type/length field is between 46 and 1,500 (inclusive), then that value indicates the number of bytes in the data field. (Payloads shorter than 46 bytes must be padded to enable reliable collision detection. Note that the minimum frame length for high-speed implementations, such as Gigabit Ethernet, is longer than for 10/100 Mbit/s implementations; we omit the details.) If the value of the type/length field is greater than or equal to 1,536, then it indicates the payload type. The frame check sequence is a cyclic redundancy check.
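The type/length convention can be captured in a tiny sketch (ours), mirroring the ranges given above:

```python
def classify_type_length(value: int) -> str:
    """Interpret the 2-byte Ethernet type/length field."""
    if 46 <= value <= 1500:
        return "length"  # number of bytes in the data field
    if value >= 1536:
        return "type"    # a payload type, e.g. 0x0800 for IPv4
    return "other"       # remaining values are not used this way
```

So a value of 1,500 is read as a payload length, while 0x0800 (2,048) is read as a payload type.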
Figure A.4 Ethernet frame structure. (Preamble | SFD | Destination address | Source address | VLAN info (optional) | Type/Length | Data | Frame check sequence)
A.4.3 CSMA/CD and Its Scalability Limitations
The 1990s brought Fast Ethernet (which operates at 100 Mbit/s), Gigabit Ethernet, and full-duplex mode. To understand the significance of the latter, we need to fill in a little more background. The data link layer is subdivided into the media access control (MAC) layer and the LLC layer. The MAC layer interfaces with the physical layer, whereas the LLC layer [4] sits directly above the MAC layer and interfaces with the network layer. (Strictly speaking, the LLC layer plays the role of MAC-client only on end systems; on hubs, bridges, and switches, the MAC-client is called the bridge entity. We discuss bridging in Section A.4.4.)

The original MAC layer is based on carrier sense multiple access with collision detection (CSMA/CD [24]). As with many (if not most) LAN technologies, the transmission medium is shared. It will routinely happen that multiple hosts want to transmit data simultaneously, so there must be a means of deciding who gets to transmit at any given time. How does CSMA/CD resolve this resource contention problem? If a host wants to transmit data, and it does not "hear" anybody else transmitting, it starts to send its data. If another host begins to transmit right around the same time, the frames sent by the two hosts will interfere with one another and become garbled; this is a collision. Whenever this happens, the two hosts will notice the problem (this is the "CD" in CSMA/CD). Each will stop transmitting and set a randomized timer. Whichever host's timer expires first will retry its transmission. (Hearing this, the other host will wait its turn.) We say that the two hosts in the foregoing example are in the same collision domain. Clearly, as more and more hosts are added to a collision domain, the incidence of collisions increases; at some point, throughput drops precipitously.
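The randomized timer is specified in 802.3 as truncated binary exponential backoff; here is a sketch (ours) of the slot-count computation:

```python
import random

def backoff_slots(attempt: int) -> int:
    """Truncated binary exponential backoff used by CSMA/CD: after the
    n-th successive collision, wait a random number of slot times drawn
    from 0 .. 2^min(n, 10) - 1; the frame is dropped after 16 attempts."""
    if attempt > 16:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(attempt, 10)
    return random.randrange(2 ** k)
```

After a third collision, for example, a host waits between 0 and 7 slot times; the doubling range is what makes repeated collisions progressively less likely.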
A.4.4 Hubs, Bridges, and Switches
Ethernet hubs are fairly simple devices. Each incoming packet is replicated and sent out on every port (except for the port it came in on). Devices connected to the same hub are therefore in the same collision domain, and we have seen that collision domains have definite scalability limits. (Moreover, stringing multiple hubs together does not do anything to solve the scalability problem.) This led to the development of Ethernet bridges. Bridging functionality, which is formally defined in [25], includes learning and filtering. Learning simply means that, by observing the source addresses on incoming frames, bridges develop an association of hosts with ports. Based on the results of the learning exercise, Ethernet bridges build filtering databases. Once it knows where a host "lives" (as evidenced by an entry in its filtering database), an Ethernet bridge will forward packets destined for that host only on the appropriate port. In this fashion, bridges subdivide networks into multiple collision domains. This is the aforementioned filtering functionality.

Spanning Trees
In a network with multiple bridges, routing loops must be prevented. For example, before the learning process has progressed to a stable point, it would be common for two bridges to receive, from one another, frames sent by the same originating host. For this reason, Ethernet bridges residing on the same LAN run a distributed spanning tree algorithm. Recall from Section 7.5.1 that a spanning tree is a subgraph with the following property: for any pair of nodes, there is one and only one interconnecting path. (This solves the problem of routing loops, since trees do not have loops.) Often, the spanning tree is not the whole network (in the sense that additional links may be present for the sake of redundancy). If a link goes down, the spanning tree is recalculated. However, this is not instantaneous; broadcast storms tend to occur (and persist until the spanning tree algorithm has converged). Note that all traffic is routed on the spanning tree: Ethernet does not come with ready-made load balancing capabilities. (To be fair, link aggregation is a partial exception; see the next section.) Schemes for addressing this problem have been proposed, but to the best of our knowledge there is not a mature standard in this area.

Other Scalability Enhancements: Full-Duplex Links, Link Aggregation, and Virtual LANs
Imagine a point-to-point link connecting two bridges. If access to this link is governed by CSMA/CD, it cannot be run at very high utilization (if both bridges are transmitting the majority of the time, collisions will happen often and congestion will result). Full-duplex mode allows both bridges to transmit simultaneously, allowing for much higher utilization. Full-duplex mode requires implementation of flow control capabilities (which are optional in half-duplex mode).

Suppose we want to add capacity by adding a second link between two bridges that are already interconnected. Clearly, we cannot have parallel links in a spanning tree. Link aggregation is a means of representing multiple physical links as a single logical link. The spanning tree algorithm only sees the logical link; the link aggregation sublayer takes care of load balancing among the constituent physical links.

The IEEE 802.1Q standard [26] defines the notion of a virtual LAN (VLAN). VLANs allow administrators to assign end systems to logical groups. Filtering is then performed based on those logical groups. Among other benefits, VLANs enhance scalability by segregating switched networks into multiple broadcast domains. The VLAN info field in the Ethernet header (see Figure A.4) is actually subdivided into two 2-byte fields. The first is a type/length field with a designated value indicating that the second 2-byte field is a so-called VLAN tag. The VLAN tag contains a 3-bit user_priority field (which supports rudimentary QoS functionality) and a 12-bit VLAN ID. The maximum number of VLANs that can inhabit a single switched network is 4,094 (of the 2^12 = 4,096 possible values of the VLAN ID, two are reserved). VLAN tags are primarily useful on interbridge links. They are not typically used on links from end systems to hubs or bridges (so Ethernet cards in workstations normally do not need to support VLAN tagging). Bridges keep track of end systems' VLAN membership, transparently to the end systems.
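The tag layout can be illustrated with a short sketch (ours; the one remaining bit of the tag, between the priority and the VLAN ID, is a drop-eligibility/CFI bit that the text does not discuss):

```python
def parse_vlan_tag(tag: bytes):
    """Decode the 4-byte 802.1Q VLAN info field: a type value of 0x8100
    announces that the next 2 bytes carry a 3-bit priority and a
    12-bit VLAN ID."""
    tpid = (tag[0] << 8) | tag[1]
    if tpid != 0x8100:
        raise ValueError("not an 802.1Q tag")
    tci = (tag[2] << 8) | tag[3]      # tag control information
    priority = tci >> 13              # 3-bit user_priority
    vlan_id = tci & 0x0FFF            # 12-bit VLAN ID
    return priority, vlan_id
```

For example, the bytes 81 00 20 05 decode to priority 1 on VLAN 5.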
A note on the difference between bridges and switches. Early generations of Ethernet bridges did not support full-duplex mode or VLAN tagging. Moreover, they did not typically possess multipath switching fabrics, meaning that simultaneous bursts from several interfaces would cause congestion. This was tolerable with a small number of ports but became a limiting factor on the number of ports a bridge could reasonably support. An Ethernet switch (which is also called a LAN switch) is basically a fancy bridge: a device with numerous ports and a multipath switching fabric, as well as support for VLANs and full-duplex transmission. Bridge is still the term used in the standards (i.e., bridging functionality is a formally defined thing); LAN switch is a more informal term.

A.4.5 Further Reading
We hope this brief introduction to Ethernet makes it clear that much has changed since the early days. By way of expository references on Ethernet and its evolution, the books by Spurgeon [27] and Seifert [28] are informative. Spurgeon also maintains a compendium of information at www.ethermanage.com/ethernet/ethernet.html. The IEEE 802 LAN/MAN Standards Committee is the key standards body for Ethernet and related technologies. The landscape in this active area is still changing, and reference material becomes outdated quickly. For an outline of the IEEE 802 standards, the reader can consult [29]. There are numerous active IEEE 802 working groups; examples include 802.17 (resilient packet rings) and various wireless initiatives (802.11, 802.15, 802.16, and 802.20). Additional information is available at grouper.ieee.org/groups.
References

[1] Simpson, W., RFC 1661, The Point-to-Point Protocol (PPP), IETF, July 1994. Part of IETF STD 51.
[2] Simpson, W., RFC 1662, PPP in HDLC-like Framing, IETF, July 1994. Part of IETF STD 51.
[3] Malis, A., and W. Simpson, RFC 2615, PPP over SONET/SDH, IETF, June 1999.
[4] ANSI/IEEE Std 802.2, 1998 Edition, Local and Metropolitan Area Networks—Specific Requirements Part 2: Logical Link Control, IEEE, 1998. Adopted by the ISO/IEC and redesignated as ISO/IEC 8802-2:1998.
[5] Giroux, N., and S. Ganti, Quality of Service in ATM Networks: State-of-the-Art Traffic Management, Upper Saddle River, NJ: Prentice Hall PTR, 1999.
[6] Recommendation Q.2931, User-Network Interface (UNI) Layer 3 Specification for Basic Call/Connection Control, ITU-T, February 1995.
[7] ATM Forum Technical Committee, af-sig-0061.000, ATM User-Network Interface (UNI) Signaling Specification, Version 4.0, ATM Forum, July 1996.
[8] ATM Forum Technical Committee, af-pnni-0055.000, Private Network-Network Interface Specification Version 1.0 (PNNI 1.0), ATM Forum, March 1996.
[9] Recommendation I.363.1, B-ISDN ATM Adaptation Layer (AAL) Specification, Types 1 and 2, ITU-T, 1996.
[10] ATM Forum Technical Committee, af-vtoa-0119.000, Low Speed Circuit Emulation Service (LSCES), ATM Forum, May 1999.
[11] ATM Forum Technical Committee, af-vtoa-0078.000, Circuit Emulation Service Interoperability Specification Version 2.0, ATM Forum, January 1997.
[12] Recommendation I.363.5, B-ISDN ATM Adaptation Layer Type 5 Specification, ITU-T, August 1996.
[13] Recommendation Q.2100, B-ISDN Signaling ATM Adaptation Layer (SAAL)—Overview Description, ITU-T, July 1994.
[14] Recommendation I.363.2, B-ISDN ATM Adaptation Layer 2 Specification, ITU-T, November 2000.
[15] Recommendation I.366.1, Segmentation and Reassembly Service Specific Convergence Sublayer for the AAL Type 2, ITU-T, June 1998.
[16] Recommendation I.366.2, AAL Type 2 Service Specific Convergence Sublayer for Narrowband Services, ITU-T, March 2000.
[17] Recommendation Q.2630.1, AAL Type 2 Signaling Protocol—Capability Set 1, ITU-T, December 1999.
[18] Recommendation Q.2630.2, AAL Type 2 Signaling Protocol—Capability Set 2, ITU-T, December 2000.
[19] ATM Forum Technical Committee, af-aic-0178.001, ATM-MPLS Network Interworking Version 2.0, ATM Forum, August 2003.
[20] ATM Forum Technical Committee, af-aic-0196.000, ATM-MPLS Network Interworking (N-to-One Model) Version 1.0, ATM Forum, August 2003.
[21] ATM Forum Technical Committee, af-aic-0197.000, ATM-MPLS Network Interworking Signaling Specification Version 1.0, ATM Forum, August 2003.
[22] ATM Forum Technical Committee, af-vtoa-0113.000, ATM Trunking Using AAL2 for Narrowband Services, ATM Forum, February 1999.
[23] ATM Forum Technical Committee, af-vmoa-145.001, Loop Emulation Service Using AAL2 Rev 1, ATM Forum, February 2003.
[24] IEEE Std 802.3, Local and Metropolitan Area Networks—Specific Requirements Part 3: Carrier Sense Multiple Access With Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications, IEEE, March 2000.
[25] ANSI/IEEE Std 802.1D, 1998 Edition, Local and Metropolitan Area Networks Common Specifications Part 3: Media Access Control (MAC) Bridges, IEEE, 1998. Adopted by the ISO/IEC and redesignated as ISO/IEC 15802-3:1998.
[26] IEEE Std 802.1Q, Virtual Bridged Local Area Networks, IEEE, May 2003.
[27] Spurgeon, C., Ethernet: The Definitive Guide, Sebastopol, CA: O'Reilly and Associates, February 2000.
[28] Seifert, R., Gigabit Ethernet: Technology and Application for High-Speed LANs, Reading, MA: Addison-Wesley, May 1998.
[29] IEEE Std 802-2001, IEEE Standard for Local and Metropolitan Area Networks: Overview and Architecture, IEEE, February 2002.
About the Author

Matthew Stafford is currently a principal member of technical staff at Cingular Wireless, where he works on next-generation services. Matthew holds Ph.D. degrees in mathematics (1989, Northwestern University) and operations research (2000, University of Texas at Austin). His career in telecommunications began in 1996 with SBC. In 2001, Matthew's group made the transition to Cingular Wireless. Matthew lives in Austin, Texas.
Index

3rd Generation Partnership Project (3GPP), 190, 195 defined, 190 QoS standards, 212–15
A Adaptive multirate (AMR) codec, 110–11 Address space IPv4, 67–68 IPv6, 68–69 Advanced Intelligent Network (AIN), 181 Algorithmic delay, 110 Application-level gateways (ALGs), 68 Assured forwarding (AF), 78 Asynchronous Transfer Mode (ATM), 4, 230–36 cell format, 231 cells, 46 defined, 230 header, 230 LSRs, 234–35 MPLS over, 234–35 QoS approach, 230 soft-permanent connections, 232 statistical multiplexing, 230–31 voice over, 233–34, 235–36 ATM Adaptation Layers (AALs), 232–33 AAL2, 233–34 AAL3/4, 233 defined, 232 list of, 233 Authentication, 176 Authentication, authorization, and accounting (AAA), 70
B Backhaul example, 12–13 Back-to-back user agents (B2BUAs), 157 Basic call process (BCP), 182 Basic call state models (BCSMs), 182 Bearer connectivity
legacy switch, 28 next generation switch, 29 Bearer Independent Call Control (BICC), 143–44 defined, 143 initial capability set, 144 interswitch signaling vs., 170 specification, 143–44 Bearer plane, 107–16 inhabiting, 143 interworking, 111–13 separating, 4, 21–30 voice encoding, 107–11 VoIP, 113–16 Bearer traffic defined, 11 new types, 25–26 STPs and, 32 Best-effort service model, 77 Billing functionality, 208 Border Gateway Protocol (BGP), 73, 221
C Call-control signaling, 4, 33 Caller ID service, 34, 35 Caller ID subscription, 34 Call state model. See Finite state machines (FSMs) Carrier sense multiple access with collision detection (CSMA/CD), 238 Centrex service, 41 Channels, 11 Circuit identification codes (CICs), 121, 122 Circuit-switched networks billing functionality, 207–8 deployment success, 7 emergency service, 208 functionality processing, 6–7 interworking with, 142–43 investment, 7 IP routing/dimensioning comparison, 205–6
Circuit-switched networks (continued) limitations, 35–36 properties, 199–208 QoS, 206 reliability, 7, 207 routing, 26 scalability, 207 security, 206 stability, 205 survivability, 207 voice optimization, 6 Circuit switches packet switches vs., 19 schematic, 16 service plane assistance, 35 Class 5 switches, 41 Class-based queuing, 209–10 Codecs AMR, 110–11 defined, 107 G.711, 107–8 waveform, 109 Code division multiple access (CDMA), 58 Code excited linear predictive (CELP) schemes, 109 Comfort noise, 115–16 Common Channel Signaling 6 (CCS6), 56 Constraint-based routing, 211–12 Control plane, 31–32 ATM, 231–32 illustrated, 32 inhabiting, 14 location, 34 packet-switched, 39 separating, 4, 21–30 Customized Applications for Mobile Enhanced Logic (CAMEL), 185
D Data link layer, 45–46 defined, 45 error detection, 47 examples, 45–46 responsibilities, 45 Data link layer protocols, 227–40 ATM, 230–36 Ethernet, 236–40 Frame Relay, 227–29 HDLC, 227 Data networks, 2 Descriptors, Megaco, 130–31, 133, 135, 136 Detection points (DP), 182
DiffServ, 78–79, 210–12 architecture, 78 Code Point (DSCP), 78 defined, 78, 210 deployment options, 211 at edge, 80 RFCs, 78 Digital subscriber line (DSL), 189 DigitMap, 131 Distributed architecture bearer and control separation, 21–26 defined, 11, 21 Distributed fabric, 6, 13 Distributed functional plane (DFP), 181, 182–83 Distributed switching, 14, 40 Domain name system (DNS), 69 Dual tone multifrequency (DTMF), 112, 143 Dynamic Host Configuration Protocol (DHCP), 68 Dynamic routing, 26–27 nonhierarchical, 202–3 problems, 202
E Electrical and Electronics Engineers (IEEE), 68 Electronic numbering (ENUM), 221–22 defined, 221 infrastructure, 222 Emergency services, 208 Encapsulation, of digital sound, 111–12 End office switch, 41 Enhanced service offering, 225–26 Erlang B formula, 104 Ethernet, 236–40 bridges, 238 CSMA/CD, 238 defined, 236 frame structure, 237 history, 237 hubs, 238 switches, 238–40 Expedited forwarding (EF), 78 Exterior gateway protocols, 73
F Fabrics, 11 defined, 5, 11 defining, 16–19 distributed, 6, 13 intergateway switching, 14, 17, 40
packet, 26–30 Finite state machines (FSMs) defined, 53 illustrated, 55 states, 53, 54 state transitions, 53, 54–55 Frame Relay, 227–29 defined, 227–28 header, 228 header fields, 228 label switching, 229 virtual circuits, 229 Frames, 46 Functional localization, 24
G G.711 codec, 107–8 decoder, 112 encoder, 112 sampling rate, 107 voice quality, 108 G.723.1 vocoder, 110 G.728 vocoder, 110 G.729 vocoder, 110 Gateway functionality, 103 Gateway MSCs (GMSCs), 102 Geographical localization, 25 Global functional plane (GFP), 181, 182 Global title translation (GTT), 95, 96–98 defined, 95, 96 process, 96–97 at SCCP layer, 98
H H.323 protocol suite, 148–52 call control, 149–50 defined, 148 evolution of, 151–52 gatekeepers, 150–51 heritage, 148–49 media control signaling, 149–50 terminology, 148 tunneling, 152 HDLC, 227 Home location register (HLR), 101, 102 HTTP digest authentication, 176
I INAP, 185 Integrated Local Management Interface (ILMI), 68
Integrated Services Digital Network (ISDN), 31, 32 call flow, 149 User Part (ISUP), 55, 58–59 Intelligent networks (INs), 180–85 AIN, 181 capability sets, 183–84 conceptual model, 181 distributed functional plane, 181, 182–83 global functional plane, 181, 182 JAIN, 186 limitations, 184–85 signaling protocol, 185 SIP and, 186–89 trade-offs, 184–85 WIN, 185 Interactive voice recognition (IVR), 112 Interfaces on legacy voice switches, 23 open, 22–23 Intergateway switching fabrics, 14, 40 Interior gateway protocols, 73 Internet Engineering Task Force (IETF), 63 Internet Protocol (IP), 4, 63–85 addressing/address resolution, 67–69 defined, 48 differentiated services, 78–79 headers, 48, 64–67 Mobile, 84 MPLS and, 79–80 multiservice networks, 80–81 QoS, 76–77 reachability information, 76 routing, 71–76 statistical multiplexing, 77–78 versions, 63 See also IP routers; IPv4; IPv6; protocols Interswitch signaling, 168–70 BICC comparison, 170 SIP for, 169 IntServ, 79, 210–12 defined, 210 deployment options, 211 IP Multimedia Subsystem (IMS), 192–95 defined, 192 as flexible platform, 192 functional elements, 192–93 I-CSCF, 193 P-CSCF, 192–93 S-CSCF, 192, 194 services, triggering, 194–95 services mobility, 193–94
IP packet transfer delay (IPTD), 213 IP routers defined, 48 with Ethernet and Frame Relay interfaces, 53 packet flow through, 52 IP Security Protocol (IPSec), 70 IPv4 address space, 67–68 predominance, 63 See also Internet Protocol (IP) IPv4 header, 64–65 fields, 64–65 illustrated, 64 IPv6 header vs., 67 IPv6 address space, 68–69 migration to, 63 Mobility for (mip6), 219 Operations (v6ops), 219 ROHC and, 219–21 Signaling and Handoff Optimization (mipshop), 219 site-local addresses, 69 Site Multihoming in (multi6), 219 See also Internet Protocol (IP) IPv6 header, 65–67 extension, 66–67 fields, 65–66 fixed length, 65 illustrated, 66 IPv4 header vs., 67 performance and, 66 ISDN User Part (ISUP), 55, 58–59 call-control signaling, 147 call flow diagram, 59 defined, 55 messages, 58, 59 use, 58 ITU-T, 212–15 3GPP specifications and, 214 performance objectives, 213 Y.1221 recommendation, 214
J Java Advanced Intelligent Network (JAIN), 186
L Label switched paths (LSPs), 79 Label switched routers (LSRs), 234–35
Label switching, 229, 230 Latency, 142–43 Lines, 40 Link aggregation, 239 Linksets, 91 Load balancing, 75 Local area networks (LANs), 1 defined, 2 virtual (VLANs), 239–40 See also Ethernet Long-distance routing, 200
M Maximum transmission units (MTUs), 65 Media gateway controllers (MGCs), 14, 119 Media Gateway Control Protocol (MGCP), 119, 137–42 AuditConnection verb, 140 AuditEndpoint verb, 140 call flow example, 137–38 conference call setup, 139–40 connection model, 139 connections, 139 EndpointConfiguration verb, 140 Megaco vs., 137, 138–40 MoveConnection verb, 141 NotificationRequest verb, 140 packages, 142 response acknowledgments, 141 RestartInProgress verb, 141 signaling flow, 138 virtual endpoints, 139 Media gateways, 40, 119 behavior, 24 capabilities, 120 controller instruction, 18 defined, 14 functions, 119–22 number of, 16 switching fabrics, 15 Megaco/H.248, 119, 123–37 Add command, 126 Audit/Capabilities command, 126 Audit Value command, 126 call flow, 126–29 call flow illustration, 126 call setup, 127 call teardown, 127–28 call waiting example, 129, 137 CHOOSE wildcard, 132 commands, 125–26, 127 conference call example, 135
connection model, 123–24 connection model illustrations, 124, 125 contexts, 124–25, 131 descriptors, 130–31, 133, 135, 136 MGCP vs., 137, 138–40 Modify command, 126 Move command, 126, 129–30 Notify command, 126 packages, 137 properties, 130 sample messages, 132–34 ServiceChange command, 126 Subtract command, 126 terminations, 124, 131 three-way calling example, 134–36 transport, 137 Message Transfer Part Level 1 (MTP1), 56, 93 Message Transfer Part Level 2 (MTP2), 56–57, 94 defined, 56 protocol entries, 94 sequence numbers, 56 signal units, 94 Message Transfer Part Level 3 (MTP3), 57 fields, 95 routing capabilities, 94 Metastable states, 203 Middlebox traversal, 217–19 Mobile Application Part (MAP), 56, 57–58, 100–103 GSM signaling flows, 101 messages, 102, 103 Update Location Request, 101 Mobile IP, 84 Mobile switching centers (MSCs), 100–101 defined, 100 Gateway (GMSCs), 102 Mobility management, 58, 100 Motivation, this book, 7–8 Multiplexing, 50 Multiprotocol label switching (MPLS), 79–80 at core, 80 IP QoS and, 80 philosophy, 79 Multiservice networks, 80–81
N Network address translation (NAT), 67–68, 217–19 Network layer, 46 Network optimization, 71–76 defined, 71
model limitations, 72 objectives, 72 Next generation switches bearer connectivity, 29 benefits, 22 defined, 14 fabric definition, 15, 16 features, 22 as geographically distributed entities, 128 schematic representation, 17 signaling between, 143–44 with signaling paths, 18 See also Switches Next generation switching architectures, 21
O Offer/answer model, 167–68 Open interfaces, 22–23 Open Service Access (OSA), 186 Open Shortest Path First (OSPF), 72–73 defined, 72–73 routers, 73 Organization, this book, 7–8
P Packetization delay, 113 Packets defined, 18 fabrics, 26–30 header information, 18 SCTP, 82 Packet switches circuit switches vs., 19 in inter gateway switching, 17 Packet telephony carrier motive/opportunity, 5–6 case for, 2–3 economic benefits, 3 in maturation cycle, 225 motivation, 21–30 services, 3 telco deployment, 225 universality, 4 Parameter negotiation, 174 Parlay, 185–86 Per-hop forwarding behaviors (PHBs), 78 Permanent virtual circuits (PVCs), 231 Physical layer, 47 Playout buffers, 113 Point codes, 92 Poisson arrival process, 204
Policing, 212 Policy-based admission control, 215–16 Priority queuing, 210 Private branch exchanges (PBXs), 41 Private network-network Interface (PNNI), 232 Protocols, 43–61 ATM, 230–36 BGP, 73 BGP-4, 221 design, modularity in, 147–48 Ethernet, 236–40 Frame Relay, 227–29 FSM and, 53–55 H.323 suite, 148–52 HDLC, 227 INAP, 185 IP, 48, 63–85 layer 4, 81–84 MGCP, 119–44 OSPF, 72–73 RADIUS, 70 RIP, 73 routing, 72–75 RSVP, 79 RTP, 113–16 SCTP, 82–84 SIP, 152–57 SS7, 55–69 STUN, 218 summary, 60–61 TCP, 48–51 TRIP, 221 UDP, 52–53, 81–82, 83–84 Protocol stack data link layer, 45–46 defined, 43 generic layer descriptions, 44–48 last in/first out data structure comparison, 44 network layer, 46 physical layer, 47 process, 43–44 SS7, 89, 92–93 summary, 60 transport layer, 46 Proxy servers, 155, 156–57 defined, 156 function, 157 stateful, 156 stateless, 156 See also Session Initiation Protocol (SIP)
PSTN and Internet Interworking (PINT), 186–87 aim, 186 RFC, 186 Public land mobile networks (PLMNs), 41 Public-switched telephone networks (PSTNs), 40 Push to Talk, 190–91
Q Quality of service (QoS), 39 3GPP standards, 212–15 ATM, 230–31 attributes in SDP, 173–74 circuit-switched networks, 206 defined, 77 framework, 77 IP and, 76–77 MPLS and, 80 statistical multiplexing and, 84 traffic engineering in IP networks and, 209–15 Queuing class-based, 209–10 priority, 210 weighted fair (WFQ), 210
R Reachability information, 76 Real Time Transport Protocol (RTP), 113–16 Control Protocol (RTCP), 114 header, 114 payload formats, 115–16 uses, 113 Redirect servers, 155, 156–57 defined, 156 functions, 156 response, 165 See also Session Initiation Protocol (SIP) Registrars, 157 Registration, admission, and status (RAS) signaling, 149–50, 151 illustrated, 151 necessity of, 151 Re-INVITEs, 171 Reliability, 207 Remote Authentication Dial-In User Service (RADIUS), 70 Resource ReSerVation Protocol (RSVP), 79 signaling, 171 SIP and, 171–74
Traffic Engineering (RSVP-TE), 212 RFCs DiffServ, 78 PINT, 186 SIP-T, 174 SPIRITS and, 189 Round-robin scheduling, 210 Routesets, 92 Routing, 39 circuit-switched networks, 26 constraint-based, 211–12 data network schemes, 39 dynamic, 26–27, 202–3 fixed hierarchical, 203 hot spots, 75 link cost adjustment and, 76 long-distance, 200 loop example, 74 packet networks, 26–29 protocol convergence, 73–75 protocols, 72–75 scalability, 75 telco, 199–200 trade-offs, 75–76 Routing Information Protocol (RIP), 73
S Scalability, circuit-switched networks, 207 Scheduling algorithms, 210 SCTP, 52–53, 82–84 common header, 82 data chunk header, 83 defined, 82 header fields, 83 packets, 82 SS7 traffic and, 217 TCP comparison, 83–84 use, 216 SDPng, 216 Secure/Multipurpose Internet Mail Extensions (S/MIME), 176 Security, 176 circuit-switched networks, 206 TLS, 176 Segmentation/reassembly, 50 Service architectures, 222 Service control points (SCPs) defined, 34, 89 description, 91 representation of, 91 Service creation environment, 36 Service-level agreements (SLAs), 215–16
Service plane circuit switches and, 35 illustrated, 35 inhabiting, 143 Services, 3, 32–33 alternative billing schemes, 33 caller ID, 34, 35 implementing, 179–97 introducing, 23–25 maintaining, 23–25 operational definition, 32 short message, 33 SIP and, 186–89 SS7 architecture, 180–85 unconditional, 33 vertical, 32–33 Services in PSTN requesting Internet services (SPIRITS), 186–89 aim, 187 functionality, 187 gateway, 188 RFC status and, 189 schematic configuration, 188 Session control, 145–57 “generic,” 145–48 H.323 protocol suite, 148–52 signaling flow for, 146 SIP, 152–57 Session Description Protocol (SDP), 119, 122–23 defined, 122 detailed example, 159–61 line types, 161 offer/answer model, 167–68 payload, 166 QoS attributes in, 173–74 scope, 122 SDPng and, 216 Session Initiation Protocol (SIP), 100, 152–57 back-to-back user agents, 157 basics, 152–55 call, making, 164–67 call flow, with resource management, 172 definition, 153 detailed example, 161–68 end-to-end encryption, 176 forking requests, 168 functional entities, 155–57 header compression, 192, 221 HTTP digest authentication support, 176 identifiers, 153–54 INs and, 186–89
Session Initiation Protocol (SIP) (continued) for interswitch signaling, 168–70 methods, 154, 170–71 methods definition, 154 offer/answer model, 167–68 proxy servers, 155, 156–57 redirect servers, 155, 156–57 registrars, 157 registration illustration, 162 registration procedures, 161–64 requests and responses, 154–55 response message categories, 155 response status codes, 155 RSVP and, 171–74 services and, 186–89 signaling flow, 152 strengths, 170 in wireless networks, 190–95 See also SIP-T Short Message Service (SMS), 179, 195–96 defined, 33 in support of other applications, 196 use, 195 SigComp, 191 Signaling call-control, 4, 33 gateways, 14, 40 interswitch, 168–70 between next generation switches, 143–44 RAS, 149–50, 151 Signaling Connection Control Part (SCCP), 57, 95–98 classes of service, 95 GTT, 95, 96–98 in-sequence delivery, 95 Signaling System 7. See SS7 Signaling transfer points (STPs) bearer traffic and, 32 defined, 31 responsibilities, 31 routing decisions, 31 SIP-T, 174–76 interworking requirements, 175 RFC, 174 transparency requirement, 175 See also Session Initiation Protocol (SIP) Softswitches. See Next generation switches Spanning trees, 238–39 SS7, 55–59, 89–105 addressing, 91–92 architecture, 89–91 defined, 55
footprint, 100 ISUP, 58–59 ITU-T standards, 89 link types, 90 MAP, 56, 57–58, 100–103 message types, 56 MTP1, 94 MTP2, 56–57, 94 MTP3, 57, 94–95 network management traffic, 93 networks, 89 packet formats, 93 point codes, 92 protocol stack, 89, 92–93 routing, 91–92 routing example, 96 SCCP, 57, 95–98 SCPs, 89, 91 service architectures, 180–85 signaling transfer points, 90 strengths, 105 subsystem number, 92 summary, 103–5 TCAP, 57, 98–100 “traditional” stack illustration, 93 traffic over IP network, 82–83 voice switches, 90 weaknesses, 104 Stability, 205 States active, 53 analyzing information, 53, 54 collecting information, 53, 54 defined, 53 idle, 53 routing and alerting, 53, 54 See also Finite state machines (FSMs) State transitions, 54–55 defined, 53 illustrated, 55 See also finite state machines (FSMs) Statistical multiplexing, 77–78 ATM, 230–31 control plane, 231–32 QoS and, 84 switches, 231 STUN protocol, 218 Subsystem number (SSN), 91, 92 Survivability, 207 Switches, 11 “big,” 16 circuit, 15
Class 5, 41 components, 14–15 components illustration, 16 defined, 11 design, 3–4 end office, 41 Ethernet, 238–40 legacy voice, 22, 23 packet, 17 See also Next generation switches
T TCP/IP networking, 51–52 Telco routing, 199–200 Telephony Routing over IP (TRIP) protocol, 221 Time division multiple access (TDMA), 58 Time division multiplexing (TDM), 41 TISPAN, 222 Traffic contracts, 212 Traffic engineering, 203–4 defined, 199 in IP networks, 209–15 telco routing and, 199–200 Traffic intensity, 203 Traffic shaping, 212 Transaction Capabilities Application Part (TCAP), 57, 98–100 components, 99 defined, 98 invoke component, 99 number portability, 99–100 queries, 100 reject component, 99 return error component, 99 return response component, 99 transactions, 99 Transcoding, 111 Transmission Control Protocol (TCP), 48–51 alternatives, 52–53 applications running over, 48 defined, 48 flow control, 49, 50, 84 functionality, 49–50 header, 48–49 multiplexing, 50 ordering, 84 segmentation and reassembly, 50 sequencing/eliminating duplication, 49–50 sophistication, 50 UDP/SCTP comparison, 83–84 See also Protocols
Transport layer, 46 Transport Layer Security (TLS), 69 hop-by-hop basis, 176 working group, 69 Truitt’s model, 200–202 topology, 201 triangle, 202 Trunks, 40 Tunneling, 152
U UDP, 52–53, 81–82, 83–84 header, 82 TCP comparison, 83–84 uses, 81 Uniform Resource Locators (URLs), 69 User-Network Interface (UNI), 232
V Vertical services, 32–33 Virtual circuits (VCs), 229 Virtual home environment (VHE), 193–94 Virtual LANs (VLANs), 239–40 Virtual paths (VPs), 234 Vocoders defined, 109 G.723.1, 110 G.728, 110 G.729, 110 GSM adaptive multirate, 110–11 Voice codecs bit rates, 29 low bit-rate, 29–30 Voice encoding, 107–11 digital, 108 G.711 codec, 107–8 G.723.1 vocoder, 110 G.728 vocoder, 110 G.729 vocoder, 110 Voice over ATM, 113 Voice over IP (VoIP), 113–16 carrier grade, 113 protocol routing, 221 Voice telephony, 1–2
W Waveform codecs, 109 Weighted fair queuing (WFQ), 210 Wireless Intelligent Network (WIN), 185
X X.25 protocol, 227–28
Recent Titles in the Artech House Telecommunications Library Vinton G. Cerf, Senior Series Editor Access Networks: Technology and V5 Interfacing, Alex Gillespie Achieving Global Information Networking, Eve L. Varma et al. Advanced High-Frequency Radio Communications, Eric E. Johnson et al. ATM Interworking in Broadband Wireless Applications, M. Sreetharan and S. Subramaniam ATM Switches, Edwin R. Coover ATM Switching Systems, Thomas M. Chen and Stephen S. Liu Broadband Access Technology, Interfaces, and Management, Alex Gillespie Broadband Local Loops for High-Speed Internet Access, Maurice Gagnaire Broadband Networking: ATM, SDH, and SONET, Mike Sexton and Andy Reid Broadband Telecommunications Technology, Second Edition, Byeong Lee, Minho Kang, and Jonghee Lee The Business Case for Web-Based Training, Tammy Whalen and David Wright Centrex or PBX: The Impact of IP, John R. Abrahams and Mauro Lollo Chinese Telecommunications Policy, Xu Yan and Douglas Pitt Communication and Computing for Distributed Multimedia Systems, Guojun Lu Communications Technology Guide for Business, Richard Downey, Seán Boland, and Phillip Walsh Community Networks: Lessons from Blacksburg, Virginia, Second Edition, Andrew M. Cohill and Andrea Kavanaugh, editors Component-Based Network System Engineering, Mark Norris, Rob Davis, and Alan Pengelly Computer Telephony Integration, Second Edition, Rob Walters Customer-Centered Telecommunications Services Marketing, Karen G. Strouse Deploying and Managing IP over WDM Networks, Joan Serrat and Alex Galis, editors Desktop Encyclopedia of the Internet, Nathan J. Muller Digital Clocks for Synchronization and Communications, Masami Kihara, Sadayasu Ono, and Pekka Eskelinen Digital Modulation Techniques, Fuqin Xiong E-Commerce Systems Architecture and Applications, Wasim E. Rajput Engineering Internet QoS, Sanjay Jha and Mahbub Hassan Error-Control Block Codes for Communications Engineers, L. H. Charles Lee
Essentials of Modern Telecommunications Systems, Nihal Kularatna and Dileeka Dias FAX: Facsimile Technology and Systems, Third Edition, Kenneth R. McConnell, Dennis Bodson, and Stephen Urban Fundamentals of Network Security, John E. Canavan Gigabit Ethernet Technology and Applications, Mark Norris Guide to ATM Systems and Technology, Mohammad A. Rahman A Guide to the TCP/IP Protocol Suite, Floyd Wilder Home Networking Technologies and Standards, Theodore B. Zahariadis Information Superhighways Revisited: The Economics of Multimedia, Bruce Egan Installation and Maintenance of SDH/SONET, ATM, xDSL, and Synchronization Networks, José M. Caballero et al. Integrated Broadband Networks: TCP/IP, ATM, SDH/SONET, and WDM/Optics, Byeong Gi Lee and Woojune Kim Internet E-mail: Protocols, Standards, and Implementation, Lawrence Hughes Introduction to Telecommunications Network Engineering, Second Edition, Tarmo Anttalainen Introduction to Telephones and Telephone Systems, Third Edition, A. Michael Noll An Introduction to U.S. Telecommunications Law, Second Edition, Charles H. Kennedy IP Convergence: The Next Revolution in Telecommunications, Nathan J. Muller LANs to WANs: The Complete Management Guide, Nathan J. Muller The Law and Regulation of Telecommunications Carriers, Henk Brands and Evan T. Leo Managing Internet-Driven Change in International Telecommunications, Rob Frieden Marketing Telecommunications Services: New Approaches for a Changing Environment, Karen G. Strouse Mission-Critical Network Planning, Matthew Liotine Multimedia Communications Networks: Technologies and Services, Mallikarjun Tatipamula and Bhumip Khashnabish, editors Next Generation Intelligent Networks, Johan Zuidweg Open Source Software Law, Rod Dixon Performance Evaluation of Communication Networks, Gary N. Higginbottom Performance of TCP/IP over ATM Networks, Mahbub Hassan and Mohammed Atiquzzaman
Practical Guide for Implementing Secure Intranets and Extranets, Kaustubh M. Phaltankar Practical Internet Law for Business, Kurt M. Saunders Practical Multiservice LANs: ATM and RF Broadband, Ernest O. Tunmann Principles of Modern Communications Technology, A. Michael Noll A Professional’s Guide to Data Communication in a TCP/IP World, E. Bryan Carne Programmable Networks for IP Service Deployment, Alex Galis et al., editors Protocol Management in Computer Networking, Philippe Byrnes Pulse Code Modulation Systems Design, William N. Waggener Security, Rights, and Liabilities in E-Commerce, Jeffrey H. Matsuura Service Level Management for Enterprise Networks, Lundy Lewis Signaling and Switching for Packet Telephony, Matthew Stafford SIP: Understanding the Session Initiation Protocol, Second Edition, Alan B. Johnston Smart Card Security and Applications, Second Edition, Mike Hendry SNMP-Based ATM Network Management, Heng Pan Spectrum Wars: The Policy and Technology Debate, Jennifer A. Manner Strategic Management in Telecommunications, James K. Shaw Strategies for Success in the New Telecommunications Marketplace, Karen G. Strouse Successful Business Strategies Using Telecommunications Services, Martin F. Bartholomew Telecommunications Cost Management, S. C. Strother Telecommunications Department Management, Robert A. Gable Telecommunications Deregulation and the Information Economy, Second Edition, James K. Shaw Telecommunications Technology Handbook, Second Edition, Daniel Minoli Telemetry Systems Engineering, Frank Carden, Russell Jedlicka, and Robert Henry Telephone Switching Systems, Richard A. Thompson Understanding Modern Telecommunications and the Information Superhighway, John G. Nellist and Elliott M. Gilbert Understanding Networking Technology: Concepts, Terms, and Trends, Second Edition, Mark Norris Videoconferencing and Videotelephony: Technology and Standards, Second Edition, Richard Schaphorst Visual Telephony, Edward A. Daly and Kathleen J. Hansell
Wide-Area Data Network Performance Engineering, Robert G. Cole and Ravi Ramaswamy Winning Telco Customers Using Marketing Databases, Rob Mattison WLANs and WPANs towards 4G Wireless, Ramjee Prasad and Luis Muñoz
World-Class Telecommunications Service Development, Ellen P. Ward

For further information on these and other Artech House titles, including previously considered out-of-print books now available through our In-Print-Forever® (IPF®) program, contact:

Artech House 685 Canton Street Norwood, MA 02062 Phone: 781-769-9750 Fax: 781-769-6334 e-mail: [email protected]
Find us on the World Wide Web at: www.artechhouse.com
Artech House 46 Gillingham Street London SW1V 1AH UK Phone: +44 (0)20 7596-8750 Fax: +44 (0)20 7630-0166 e-mail: [email protected]