The IMA Volumes in Mathematics and its Applications Volume 143
Series Editors
Douglas N. Arnold Arnd Scheel
Institute for Mathematics and its Applications (IMA) The Institute for Mathematics and its Applications was established by a grant from the National Science Foundation to the University of Minnesota in 1982. The primary mission of the IMA is to foster research of a truly interdisciplinary nature, establishing links between mathematics of the highest caliber and important scientific and technological problems from other disciplines and industries. To this end, the IMA organizes a wide variety of programs, ranging from short intense workshops in areas of exceptional interest and opportunity to extensive thematic programs lasting a year. IMA Volumes are used to communicate results of these programs that we believe are of particular value to the broader scientific community. The full list of IMA books can be found at the Web site of the Institute for Mathematics and its Applications: http://www.ima.umn.edu/springer/volumes.html Douglas N. Arnold, Director of the IMA
IMA ANNUAL PROGRAMS
1982-1983  Statistical and Continuum Approaches to Phase Transition
1983-1984  Mathematical Models for the Economics of Decentralized Resource Allocation
1984-1985  Continuum Physics and Partial Differential Equations
1985-1986  Stochastic Differential Equations and Their Applications
1986-1987  Scientific Computation
1987-1988  Applied Combinatorics
1988-1989  Nonlinear Waves
1989-1990  Dynamical Systems and Their Applications
1990-1991  Phase Transitions and Free Boundaries
1991-1992  Applied Linear Algebra
1992-1993  Control Theory and its Applications
1993-1994  Emerging Applications of Probability
1994-1995  Waves and Scattering
1995-1996  Mathematical Methods in Material Science
1996-1997  Mathematics of High Performance Computing
1997-1998  Emerging Applications of Dynamical Systems
1998-1999  Mathematics in Biology
Continued at the back
Prathima Agrawal, Daniel Matthew Andrews, Philip J. Fleming, George Yin, Lisa Zhang
Editors
Wireless Communications
Springer
Prathima Agrawal Department of Electrical and Computer Engineering Auburn University Auburn, AL 36849-5201 USA www.eng.auburn.edu/~pagrawal
Daniel Matthew Andrews Bell Laboratories Lucent Technologies Murray Hill, NJ 07974-0636 USA cm.bell-labs.com/cm/ms/who/andrews/
Philip J. Fleming Network Advanced Technology Motorola, Inc. Arlington Heights, IL 60004 USA
George Yin Department of Mathematics Wayne State University Detroit, MI 48202 USA www.math.wayne.edu/~gyin/
Lisa Zhang Computing Sciences Research Center Bell Laboratories Lucent Technologies Murray Hill, NJ 07974 USA cm.bell-labs.com/who/ylz/
Series Editors Douglas N. Arnold Arnd Scheel Institute for Mathematics and its Applications University of Minnesota Minneapolis, MN 55455 USA
Mathematics Subject Classification (2000): 90B18, 94-06, 94A05, 60G35 Library of Congress Control Number: 2006933293 ISBN-10: 0-387-37269-5 ISBN-13: 978-0387-37269-3 © 2007 Springer Science+Business Media, LLC All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
Camera-ready copy provided by the IMA. 9 8 7 6 5 4 3 2 1 springer.com
FOREWORD
This IMA Volume in Mathematics and its Applications
Wireless Communications contains papers based on invited lectures at the very successful IMA Summer Program on Wireless Communications, held on June 22 - July 1, 2005. We would like to thank Prathima Agrawal (Auburn University), Daniel Matthew Andrews (Lucent Technologies), Philip J. Fleming (Motorola, Inc.), George Yin (Wayne State University), and Lisa Zhang (Lucent Technologies) for their superb role as workshop organizers and editors of the proceedings. We take this opportunity to thank the National Science Foundation for its support of the IMA.
Series Editors Douglas N. Arnold, Director of the IMA Arnd Scheel, Deputy Director of the IMA
PREFACE
This volume presents papers based on invited talks given at the 2005 IMA Summer Workshop on Wireless Communications, held at the Institute for Mathematics and its Applications, University of Minnesota, June 22 - July 1, 2005. The conference provided a well-blended program to facilitate communication between academia and industry, and to bridge the mathematical sciences, engineering, information theory, and communication communities. The emphasis was on the design and analysis of computationally efficient algorithms to better understand the behavior of wireless telecommunication networks and to control them. As an archive, this volume presents some of the highlights of the conference and collects papers covering a broad spectrum of topics. All papers have been reviewed.

Without the help, assistance, support, and encouragement of many people, this workshop could not have come into being. We thank the invited speakers, the poster presenters, and all attendees for making the conference a successful event. Our thanks go to Douglas N. Arnold and Fadil Santosa for helping us shape the conference and providing us with valuable comments and suggestions. We are grateful to Debra Lewis and the IMA staff for their tireless help in the preparation stage and during the conference. The assistance from Arnd Scheel in preparing the proceedings is gratefully acknowledged. We also thank Patricia V. Brick and Dzung N. Nguyen for their help in putting the final product together so beautifully.

Prathima Agrawal
Department of Electrical and Computer Engineering
Auburn University

Daniel Matthew Andrews
Bell Laboratories
Lucent Technologies

Philip J. Fleming
Network Advanced Technology
Motorola, Inc.

George Yin
Department of Mathematics
Wayne State University

Lisa Zhang
Bell Laboratories
Lucent Technologies
CONTENTS

Foreword

Preface

A survey of scheduling theory in wireless data networks
    Matthew Andrews

Wireless channel parameters maximizing TCP throughput
    Francois Baccelli, Rene L. Cruz, and Antonio Nucci

Heavy traffic methods in wireless systems: towards modeling heavy tails and long range dependence
    Robert T. Buche, Arka Ghosh, Vladas Pipiras, and Jim X. Zhang

Structural results on optimal transmission scheduling over dynamical fading channels: a constrained Markov decision process approach
    Dejan V. Djonin and Vikram Krishnamurthy

Entropy, inference, and channel coding
    J. Huang, C. Pandit, S.P. Meyn, M. Medard, and V. Veeravalli

Optimization of wireless multiple antenna communication system throughput via quantized rate control
    M.A. Khojastepour, X. Wang, and M. Madihian

Communication strategies and coding for relaying
    Gerhard Kramer

Scheduling and control of multi-node mobile communications systems with randomly-varying channels by stability methods
    Harold J. Kushner

A game theoretic approach to interference management in cognitive networks
    Nie Nie, Cristina Comaniciu, and Prathima Agrawal

Enabling interoperability of heterogeneous ad hoc networks
    Santosh Pandey and Prathima Agrawal

Overlay networks for wireless ad hoc networks
    Christian Scheideler

Dimensionality reduction, compression and quantization for distributed estimation with wireless sensor networks
    Ioannis D. Schizas, Alejandro Ribeiro, and Georgios B. Giannakis

Fair allocation of a wireless fading channel: an auction approach
    Jun Sun and Eytan Modiano

Modelling and stability of FAST TCP
    Jiantao Wang, David X. Wei, Joon-Young Choi, and Steven H. Low

List of workshop participants
A SURVEY OF SCHEDULING THEORY IN WIRELESS DATA NETWORKS MATTHEW ANDREWS*
Abstract. We survey some results for scheduling data in wireless data systems such as 1xEV-DO. An important feature of such systems is that the channel rates between the basestation and the mobile users are both user-dependent and time-varying. The wireless data scheduling problem has recently received a great deal of attention in the literature. However, comparisons of results are sometimes difficult due to the fact that many different models have been studied. In this survey we describe some of the models that have been proposed and analyze the performance of different scheduling algorithms within these models.
* Bell Laboratories, Murray Hill, NJ 07974 (andrews@research.bell-labs.com).

1. Introduction. The advent of third-generation wireless systems such as CDMA2000 1xEV-DO [15, 18] means that mobile Internet users can now obtain high data rates in cellular systems. However, in order to utilize the wireless capacity effectively, we require efficient methods for deciding how the wireless resources should be assigned. In particular, we require scheduling algorithms that determine which user should be served in each time step.

In this paper we survey some of the basic results about scheduling in wireless data systems. The most important feature of such systems is that due to channel fading and user mobility, the rates at which users can receive data are both user-dependent and time-varying. In particular, when a user is close to the transmitter it can typically receive data at a higher rate than when it is farther away from the transmitter.

In recent years, there has been a great deal of work on developing effective scheduling algorithms for wireless data systems. Unfortunately, many of the results in the literature consider different models, which makes comparing results difficult. In this survey, we define a number of models that have been studied and describe what scheduling results are known in each of them. We begin by describing in detail the basic scheduling problem together with the different models that have been considered.

The model. We consider a set of $n$ mobile data users in a wireless cell served by a single basestation (see Figure 1). We focus on the downlink (basestation to mobile) direction since for many applications such as web browsing the majority of data flows in that direction. The basestation maintains a separate queue of data for each mobile user. Time is slotted and in each time slot the basestation can transmit data to exactly one user. In order to make this decision the basestation knows at all time steps a vector $(r_0(t), \ldots, r_{n-1}(t))$, where $r_i(t)$ is the amount of data that can be transmitted to user $i$ at time step $t$. In the EV-DO system this value is known as the Data Rate Control (DRC) value. As already mentioned, channel fading and user mobility mean that DRC values are user-dependent and time-varying.

FIG. 1. A wireless system.

The scheduler at the basestation knows the value of $r_i(t)$ because at each time step mobile user $i$ measures the strength of a pilot signal transmitted by the basestation. From the strength of that signal user $i$ can calculate the quality of the channel between the basestation and itself and determine the rate at which the basestation should transmit in order to achieve a low error rate. The user then sends this rate to the basestation in a control message.

The time-varying nature of the channel rates makes the scheduling problem much more complex than in the wireline setting since the "correct" decision about which user to serve will change from time slot to time slot. In general we want to "ride the peaks" of the channel processes and try to pick a user whose current channel condition is better than average. On the other hand, we want to schedule fairly and not starve any users whose channel conditions are poor.

Formally, the scheduling problem is as follows. In each time step the scheduler receives the channel rate vector $(r_0(t), \ldots, r_{n-1}(t))$. It then makes a decision about which user to serve. If user $i$ is chosen then $r_i(t)$ bits are served from the queue of user $i$ (or all the data is served if the queue size is less than $r_i(t)$).

As already mentioned, one of the aims of this survey is to highlight the differences between some of the models that are considered in the literature. The models differ in the assumptions that are made about the arrival model and in the assumptions that are made about the channel process between the basestation and the mobile users. With regards to the traffic model, the two options usually considered are an infinitely-backlogged model and a model where the queues for each user are fed by an external arrival process.
More formally, these models are defined as follows:
• In the infinitely backlogged model each user always has data to transmit. Since there is no arrival process as such, metrics such as queue size and delay do not make much sense. Instead we wish to optimize some function of the throughputs achieved by the users. For example, if $R_i$ is some measure of the long-term throughput achieved by user $i$, a popular goal is to optimize the Proportional Fair metric $\sum_i \log R_i$.
• In the model with an external arrival process, at each time step the scheduler receives a vector $(a_0(t), \ldots, a_{n-1}(t))$, where $a_i(t)$ represents the amount of data that arrives for user $i$ in time slot $t$. This vector is in addition to the channel rate vector described above. In this case, metrics such as queue size and delay become relevant in addition to throughput. An essential attribute that we would like a scheduler to possess is stability: we say that a scheduler is stable if it keeps the queue sizes bounded whenever this is feasible.

With regards to the channel process, the two models that are studied are a model in which the channel rates are generated according to a stationary stochastic process and a worst-case model in which the channel rates are generated by an adversary:
• For the model in which the channel rates are generated according to a stationary stochastic process we assume that there is a finite set of (aggregate) channel states denoted by $\mathcal{M} = \{1, \ldots, M\}$. Associated with each state $m \in \mathcal{M}$ is a fixed vector of data rates $(\mu^m_1, \ldots, \mu^m_N)$. The channel process is defined by an ergodic Markov Chain $m(t)$ with state space $\mathcal{M}$. In particular, whenever $m(t) = m$ the channel rate vector is given by $r_i(t) = \mu^m_i$. In this model we typically aim to derive the "optimal" scheduling rule with respect to some metric. It is often convenient to compare a candidate scheduling algorithm against an ideal Static Service Split (SSS) rule in which we have a set of $\phi_{mi}$ such that $\sum_i \phi_{mi} = 1$ for each state $m$. Whenever the state of the Markov Chain is $m$ the SSS rule serves user $i$ with probability $\phi_{mi}$. We note however that it is typically not feasible to implement the optimal SSS rule since the scheduler is not aware of the structure of the underlying Markov Chain.
• For the adversarial model we do not assume any type of stationarity. Instead at each time step $t$ the channel rate vector $(r_0(t), \ldots, r_{n-1}(t))$ can be an arbitrary vector that is defined by an adversary. We can think of the adversary as trying to create as much trouble for the scheduling algorithm as possible. As in the stationary model we typically wish to compare a candidate online scheduling algorithm against an "ideal" algorithm. The SSS rules do not make much sense here because the optimal scheduling decision for a particular rate vector may change over time. Instead, we assume that at each time step, the adversary has its own schedule that will produce good performance in conjunction with the channel rate vectors that it generates. Our aim is to match the adversary's schedule as closely as possible.

By combining the two possible traffic arrival models with the two channel models we obtain four possible models, each of which has been studied in the literature. In the next four sections of the paper we consider each model separately and present some of the results that are known. We then briefly discuss the case of wireless mesh networks in which there are multiple transmitters and data may need to pass through more than one node.
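To make the Static Service Split benchmark of the stationary model concrete, the following Python sketch computes the long-term service rates of an SSS rule and the resulting Proportional Fair metric. It is only an illustration under stated assumptions: the stationary distribution `pi` of the channel Markov Chain is taken as known (the text notes that a real scheduler does not know it), users and states are indexed from 0, and the function names and the two-state example are ours. It uses the standard fact that for an ergodic chain the long-run rate of user $i$ is $\sum_m \pi_m \phi_{mi} \mu^m_i$.

```python
import math

def sss_rates(pi, mu, phi):
    """Long-term service rate of each user under a Static Service Split rule.

    pi[m]     : stationary probability of channel state m (assumed known here)
    mu[m][i]  : data rate of user i when the channel is in state m
    phi[m][i] : probability that the SSS rule serves user i in state m
    """
    for row in phi:
        # Each row of phi must be a probability distribution over users.
        assert abs(sum(row) - 1.0) < 1e-9
    n_users = len(mu[0])
    return [
        sum(pi[m] * phi[m][i] * mu[m][i] for m in range(len(pi)))
        for i in range(n_users)
    ]

def pf_metric(rates):
    """Proportional Fair metric: the sum of the logs of the long-term rates."""
    return sum(math.log(rate) for rate in rates)

# Hypothetical two-state, two-user channel: each user is "good" in one state.
pi = [0.5, 0.5]
mu = [[100.0, 20.0], [20.0, 100.0]]
ride_peaks = [[1.0, 0.0], [0.0, 1.0]]   # always serve the user with the good channel
even_split = [[0.5, 0.5], [0.5, 0.5]]   # ignore the channel state

rates_peaks = sss_rates(pi, mu, ride_peaks)
rates_even = sss_rates(pi, mu, even_split)
```

In this toy example the rule that rides the peaks gives each user a rate of 50, against 30 for the channel-oblivious split, so it also scores higher on the Proportional Fair metric.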
2. Infinitely backlogged queues - Stationary channel process. In the first model that we consider we assume that all users always have data to send and that the channel conditions are generated by a stationary stochastic process. This is the model that generated one of the most widely used wireless scheduling algorithms, namely the Proportional Fair scheduling algorithm of Tse [28]. In each time step Proportional Fair serves user,
$$ j = \arg\max_i \frac{r_i(t)}{R_i(t)}, $$
where $R_i(t)$ is the value at time $t$ of an exponentially filtered average service rate that is updated according to,
$$ R_i(t+1) = \begin{cases} (1-\tau) R_i(t) + \tau r_i(t) & \text{if } i = j \\ (1-\tau) R_i(t) & \text{if } i \ne j \end{cases} $$
for some time constant $\tau$. (In practice $1/\tau$ is typically on the order of 1000 slots.) Note that the Proportional Fair algorithm gives priority to users with a high instantaneous channel value ($r_i(t)$) and a low current average service rate ($R_i(t)$).

The Proportional Fair algorithm has an extremely elegant theoretical property. It maximizes, over all feasible scheduling rules, the function $\sum_i \log R_i$, where $R_i$ is the long-term service rate of user $i$. This objective is sometimes known as the Proportional Fair metric for the following reason. If $(R_0^*, \ldots, R_{n-1}^*)$ is the vector of feasible rates that maximizes $\sum_i \log R_i$ then for any other feasible vector of rates $(R_0, \ldots, R_{n-1})$, we have that $\sum_i (R_i - R_i^*)/R_i^* \le 0$. In other words, if we move from $R^*$ to another feasible rate allocation and we scale the improvement for user $i$ in proportion to the current allocation, the aggregate improvement cannot be positive. Another way to look at the metric is that multiplying one user's rate by a factor $c$ has the same effect on the objective as multiplying another user's rate by the same factor $c$. Lastly, observe that by using $\sum_i \log R_i$ as a metric we do not starve any user completely since $\log 0 = -\infty$.

At a high level the reason why Proportional Fair is optimal with respect to the metric $\sum_i \log R_i$ is as follows. If we let $S(t) = \sum_i \log R_i(t)$ then $(\nabla S)(t) = (1/R_0(t), \ldots, 1/R_{n-1}(t))$. We then have,
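$$
\begin{aligned}
\sum_i \log R_i(t+1) - \sum_i \log R_i(t)
&\approx (\nabla S)(t) \cdot \big(R_0(t+1) - R_0(t), \ldots, R_{n-1}(t+1) - R_{n-1}(t)\big) \\
&= \Big(\tfrac{1}{R_0(t)}, \ldots, \tfrac{1}{R_{n-1}(t)}\Big) \cdot \big(R_0(t+1) - R_0(t), \ldots, R_{n-1}(t+1) - R_{n-1}(t)\big) \\
&= \Big(\tfrac{1}{R_0(t)}, \ldots, \tfrac{1}{R_{n-1}(t)}\Big) \cdot \big(-\tau R_0(t), \ldots, \tau r_j(t) - \tau R_j(t), \ldots, -\tau R_{n-1}(t)\big) \\
&= \frac{\tau r_j(t)}{R_j(t)} - \tau n,
\end{aligned}
$$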
whenever user $j$ is served. Since $\tau$ and $n$ are fixed this implies that in order to maximize the change in $\sum_i \log R_i(t)$ we should serve the user that maximizes $r_i(t)/R_i(t)$. This is exactly what Proportional Fair does.

A formal proof of the optimality of Proportional Fair has been obtained in multiple contexts (e.g. [20, 1, 25]). In this survey we state the asymptotic result of Stolyar [25]. Let $R_i^*$ be the steady-state rate allocation of the optimal SSS rule with respect to the metric $\sum_i \log R_i$. Consider a sequence of processes indexed by $\tau_0, \tau_1, \tau_2, \ldots$, where $\tau_k \downarrow 0$. For each $\tau_k$ we have a fixed initial state $R_i^{\tau_k}(0)$ for the average service rates and a fixed initial state $m^{\tau_k}(0)$ for the Markov Chain that governs the channel process. Suppose that the channel rates evolve according to the Markov Chain and the average service rates evolve according to the Proportional Fair algorithm. Let $D_i^{\tau_k}(t)$ be the amount of service received by user $i$ in time slot $t$ under the process indexed by $\tau_k$. Let $R_i^{\tau_k}(\ell_1, \ell_2)$ be the average service rate received between time slots $\ell_1$ and $\ell_2$, namely,
$$ R_i^{\tau_k}(\ell_1, \ell_2) = \frac{1}{\ell_2 - \ell_1 + 1} \sum_{t=\ell_1}^{\ell_2} D_i^{\tau_k}(t). $$
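Then, the main theorem of Stolyar [25] implies,

THEOREM 2.1. Let $A$ be a bounded subset of $\mathbb{R}_+^n$. Then, for any $\epsilon > 0$, there exist parameters $T_1$ and $T_2$ (both depending on $A$ and $\epsilon$) such that,
$$ \lim_{\tau_k \downarrow 0} \ \sup_{R^{\tau_k}(0) \in A,\ \ell_1 > T_1/\tau_k,\ \ell_2 > T_2/\tau_k} \sqrt{\big(E[R_i^{\tau_k}(\ell_1, \ell_2)] - R_i^*\big)^2} \le \epsilon, $$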
where $E[\cdot]$ denotes expectation. In other words, as $\tau \downarrow 0$ the long-term average service rate approaches the average service rate of the optimal SSS rule.

Another interesting property of Proportional Fair is that if the channel processes take the form $r_i(t) = \alpha_i \cdot b_i(t)$, where $\alpha_i$ is a constant and the $b_i(t)$ processes are i.i.d., then in the long run the fraction of slots allocated to each user is $1/n$. That is, if the channel rate fluctuations around the mean are the same for all users then the scheduler serves each user for an equal amount of time. This property was derived by Holtzman in [17].
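The Proportional Fair rule and the equal-time property just described can be illustrated with a short Python simulation. This is a sketch rather than a reference implementation: the names `pf_step` and `time_shares` are ours, the fading factors $b_i(t)$ are assumed uniform on $[0.5, 1.5]$, the filter constant is assumed to be $\tau = 0.001$, and the averages are initialised to the scale factors purely to shorten the transient.

```python
import random

def pf_step(R, r, tau):
    """One slot of Proportional Fair.

    Serves the user maximising r_i / R_i and applies the exponential filter
    to every user's average rate. Returns (served user, updated averages).
    """
    j = max(range(len(R)), key=lambda i: r[i] / R[i])
    R_new = [
        (1 - tau) * R[i] + (tau * r[i] if i == j else 0.0)
        for i in range(len(R))
    ]
    return j, R_new

def time_shares(alpha, slots=100_000, tau=0.001, seed=0):
    """Fraction of slots given to each user when r_i(t) = alpha_i * b_i(t)."""
    rng = random.Random(seed)
    n = len(alpha)
    R = list(alpha)            # assumed initial averages (any positive values work)
    served = [0] * n
    for _ in range(slots):
        # Assumption: i.i.d. fading factors, uniform on [0.5, 1.5] for every user.
        r = [alpha[i] * rng.uniform(0.5, 1.5) for i in range(n)]
        j, R = pf_step(R, r, tau)
        served[j] += 1
    return [count / slots for count in served]

# Three users whose mean rates differ by a factor of 20.
shares = time_shares([1.0, 5.0, 20.0])
```

Under these assumptions each entry of `shares` should be close to $1/3$, even though the users' mean channel rates differ widely.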
Proportional Fair with Minimum/Maximum Rate Constraints. Note that although the Proportional Fair algorithm maximizes the metric $\sum_i \log R_i$, it does not provide any absolute guarantees on the service rate provided to any individual user. For some applications, e.g. streaming video, we may need to provide a minimum bandwidth to the users in order for the application to be useful. In some cases we may also want to limit the amount of service that a user receives, e.g. if we want to encourage a user to upgrade to a more expensive service. Suppose therefore that for each user $i$ we have a minimum rate $R_i^{\min}$ and a maximum rate $R_i^{\max}$ and we want the average service rate $R_i$ to satisfy $R_i^{\min} \le R_i \le R_i^{\max}$. The optimization problem then becomes,
$$ \max \sum_i \log R_i \quad \text{subject to} \quad R_i^{\min} \le R_i \le R_i^{\max}. \qquad (2.1) $$

An algorithm for this problem called Proportional Fair with Minimum/Maximum Rate Constraints (PFMR) was presented in [9]. The algorithm operates by maintaining a token counter $T_i(t)$ for each user $i$. The role of this token counter is to enforce the rate constraints. It is updated according to
$$ T_i(t+1) = \begin{cases} T_i(t) + R_i^{token} - r_i(t) & \text{if user } i \text{ is served} \\ T_i(t) + R_i^{token} & \text{otherwise} \end{cases} $$
where $R_i^{token} = R_i^{\min}$ if $T_i(t) \ge 0$ and $R_i^{token} = R_i^{\max}$ if $T_i(t) < 0$. At time step $t$ the PFMR algorithm serves the user,
$$ j = \arg\max_i \ e^{a_i T_i(t)} \frac{r_i(t)}{R_i(t)}, $$
where $a_i$ is a parameter that determines the timescale over which the rate constraints are satisfied. The basic idea of the token counter is that if the average service rate to user $i$ is less than $R_i^{\min}$ then $T_i(t)$ becomes positive and so we are more likely to serve user $i$. If the average service rate to user $i$ is more than $R_i^{\max}$ then $T_i(t)$ becomes negative and so we are less likely to serve user $i$.

Recall that $\tau$ is the time constant of the exponential filter used to define $R_i(t)$. The paper [9] shows that if $\tau \downarrow 0$ and $a_i \propto \tau$ for all $i$ then as long as the algorithm converges it converges to the optimal solution of the problem (2.1). We remark that in [21], Liu, Chong and Shroff considered a similar problem to (2.1) and presented a different algorithm that is based on the theory of stochastic approximation.
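A single slot of a PFMR-style scheduler can be sketched in Python as follows. The sketch rests on an assumption that should be checked against [9]: the token counter is taken to gain $R_i^{token}$ in every slot and to lose the amount served, $r_i(t)$, when user $i$ is chosen, which matches the qualitative behaviour described above (a user served below its minimum rate accumulates positive tokens). The function name `pfmr_step` and all numerical values are hypothetical.

```python
import math

def pfmr_step(R, T, r, r_min, r_max, a, tau):
    """One slot of a PFMR-style scheduler (illustrative sketch only).

    R     : current exponentially filtered average rates
    T     : current token counters
    r     : instantaneous channel rates for this slot
    r_min : per-user minimum rate targets
    r_max : per-user maximum rate targets
    a     : per-user timescale parameters
    tau   : filter constant of the average-rate update
    Returns (served user, new averages, new token counters).
    """
    n = len(R)
    # Proportional Fair priority, scaled by exp(a_i * T_i).
    j = max(range(n), key=lambda i: math.exp(a[i] * T[i]) * r[i] / R[i])
    R_new, T_new = [], []
    for i in range(n):
        # Tokens arrive at the minimum rate while the counter is non-negative
        # and at the maximum rate while it is negative.
        token_rate = r_min[i] if T[i] >= 0 else r_max[i]
        served = r[i] if i == j else 0.0
        R_new.append((1 - tau) * R[i] + tau * served)
        # Assumed update: add the token rate, subtract what was served.
        T_new.append(T[i] + token_rate - served)
    return j, R_new, T_new

# With equal token counters the rule reduces to Proportional Fair ...
j_pf, R1, T1 = pfmr_step([10.0, 10.0], [0.0, 0.0], [100.0, 50.0],
                         [5.0, 5.0], [1000.0, 1000.0], [0.01, 0.01], 0.1)
# ... while a large positive token counter overrides a poor channel.
j_tok, _, _ = pfmr_step([10.0, 10.0], [0.0, 500.0], [100.0, 10.0],
                        [5.0, 5.0], [1000.0, 1000.0], [0.01, 0.01], 0.1)
```

In the first call user 0 is served because it has the better channel; in the second, user 1 is served despite a tenfold worse channel because its token counter signals that it is behind its minimum rate.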
3. External arrival process - Stationary channel process. The results described in the previous section show that by using the Proportional Fair algorithm we can achieve fair rate allocations in the case that each user always has data to serve. However, in some situations different users have different amounts of data to serve and our goal is to serve all the data. Suppose that for each user traffic arrives according to some stationary random process. Let $a_i(t)$ be the amount of data arriving for user $i$ at time $t$. We sometimes refer to the process defined by $(r_i(t), a_i(t))$ as the input process of the system. Let $q_i(t)$ be the amount of user $i$ data waiting for service at time $t$. The queueing process is updated as follows. If user $i$ is served at time $t$ then
$$ q_i(t+1) = \max\{q_i(t) - r_i(t), 0\} + a_i(t), $$
else
$$ q_i(t+1) = q_i(t) + a_i(t). $$

Let $\lambda_i$ be the mean arrival rate for user $i$. We say that the input process is schedulable if there is an SSS rule under which the average long-term service rate $R_i$ is greater than $(1 + \epsilon)\lambda_i$ for some $\epsilon > 0$. We would like the scheduling algorithm to be stable. That is, we would like the algorithm to ensure that the queue process has a stationary distribution whenever the input process is schedulable. Note that if the queue process has a stationary distribution then the aggregate queue size cannot drift to infinity.
3.1. Proportional fair is unstable. An extremely natural question is: "Is the Proportional Fair scheduling algorithm stable?" This question was studied in [3]. However, before we present the answer we must examine exactly how the Proportional Fair algorithm is defined in the case that some of the queue sizes are extremely small. We may not want to serve a user that only has a small amount of data to serve. In [3], three options were considered for whether or not a user is eligible for service.
• Option A1. All users are eligible for service at every time slot.
• Option A2. User $i$ is only eligible for service at time slot $t$ if $q_i(t) > 0$.
• Option A3. User $i$ is only eligible for service at time slot $t$ if $q_i(t) \ge r_i(t)$; i.e. there is enough data to fully utilize the time slot.
Among all eligible users, the one with the highest value of $r_i(t)/R_i(t)$ is selected for service. However, this still does not entirely define the algorithm since there remains the question of how we update the average service rate $R_i(t)$ when the amount of data in the queue of the served user is less than the instantaneous service rate. In [3] the following options were considered.
• Option B1. When user $i$ is served then $R_i(t)$ is updated by $R_i(t+1) = (1 - \tau) R_i(t) + \tau r_i(t)$.
• Option B2. When user $i$ is served then $R_i(t)$ is updated by $R_i(t+1) = (1 - \tau) R_i(t) + \tau \min\{r_i(t), q_i(t)\}$.
(We remark that as far as we are aware most practical implementations of Proportional Fair use options A2 and B1.) By considering all possible combinations of the "A" and "B" options we obtain six different algorithms. The main result of [3] is that none of these six algorithms is stable.¹ The instability example is extremely simple and consists of two users. The arrival process is constant, $a_1(t) = 49$ and $a_2(t) = 94$ for all $t$. The channel process for user 2 is constant, $r_2(t) = 100$ for all $t$. The channel process for user 1 is periodic with period 10, namely,
$$ r_1(t) = \begin{cases} 1000 & \text{if } t \bmod 10 = 0 \\ 100 & \text{otherwise.} \end{cases} $$
This example is schedulable since we could schedule user 1 whenever $t \bmod 20 = 0$ and we could schedule user 2 in all other slots. In other words, half of the slots where $r_1(t) = 1000$ are assigned to user 1, and all other slots are assigned to user 2. This would result in an average service rate to user 1 of 50 and an average service rate to user 2 of 95. These service rates are more than the respective arrival rates. However, it is shown in [3] that Proportional Fair is not able to make the correct slot assignments. In particular, for each of the six versions of Proportional Fair, it is shown that user 2 receives only 9 out of 10 slots and hence its average service rate is only 90. Since the arrival rate for user 2 is 94 this means that the queue for user 2 grows without bound.

¹ We remark that the instability example of [3] does not quite fit into the model that we have defined in this paper since the channel process is periodic rather than ergodic. (However, we conjecture that the result could also be extended to an example with an ergodic channel process.)

3.2. Max-weight is stable. Since the Proportional Fair algorithm is not stable, the next question to ask is whether or not there exists a stable algorithm. The basic problem with Proportional Fair is that it does not take into account the queue lengths and so it does not know how to react when one queue starts to get too large. A simple algorithm that does not suffer from this problem is the Max-Weight algorithm that always serves the user with the maximum value of $q_i(t) r_i(t)$. Note that this algorithm favors users with large instantaneous channel rates and users with large queues. Various analyses have shown that this algorithm is stable, for example [26, 27, 8, 7, 22, 19].

At a high level, the reason why the Max-Weight algorithm is stable is that $\|q(t)\|$ has a negative drift, where $\|q(t)\| = \sqrt{\sum_i (q_i(t))^2}$. To see why this is true, let $x_i(t)$ be the amount of service that user $i$ receives at time $t$ under Max-Weight and let $y_i(t)$ be the amount of service that user $i$ receives at time $t$ under the optimal SSS rule. By the definition of Max-Weight we have that $\sum_i q_i(t) x_i(t) \ge \sum_i q_i(t) y_i(t)$ and by the definition of schedulability we have that for large $w$, $E[\sum_{t \le t' \le t+w} (y_i(t') - a_i(t'))] > \epsilon w \lambda_i$ for all $i$ and for all $t$. Therefore,
$$
\begin{aligned}
\|q(t+1)\|^2 &= \sum_i \big(q_i(t) + a_i(t) - x_i(t)\big)^2 \\
&= \|q(t)\|^2 + 2\sum_i q_i(t)\big(a_i(t) - x_i(t)\big) + \sum_i \big(a_i(t) - x_i(t)\big)^2 \\
&\le \|q(t)\|^2 + 2\sum_i q_i(t)\big(a_i(t) - y_i(t)\big) + \sum_i \big(a_i(t) - x_i(t)\big)^2.
\end{aligned}
$$
Since for all $t$ and for large $w$, $E[\sum_{t \le t' \le t+w} (y_i(t') - a_i(t'))] > \epsilon w \lambda_i$, we can use the above inequality to show that $\|q(t)\|$ has a negative drift over long timescales whenever some queue becomes large. We omit the details from this survey.
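The contrast between the two algorithms on the two-user instability example of Section 3.1 can be reproduced with a few lines of Python. The sketch below makes several assumptions that are ours rather than the survey's: Proportional Fair is run with options A2 and B1, $\tau = 0.001$, initial averages $R_i(0) = 1$, empty initial queues, and a horizon of 20,000 slots; service is applied before the slot's arrivals, following the queue update of Section 3. Users are indexed 0 and 1 in the code, corresponding to users 1 and 2 in the text.

```python
def simulate(policy, slots=20_000, tau=0.001):
    """Simulate the two-user instability example under 'pf' or 'mw'.

    Returns (final queue lengths, largest queue length observed).
    """
    q = [0.0, 0.0]          # queue lengths, initially empty (assumption)
    R = [1.0, 1.0]          # Proportional Fair averages (assumed initial value)
    max_q = 0.0
    for t in range(slots):
        # User 0 has a periodic peak every 10 slots; user 1 is constant.
        r = [1000.0 if t % 10 == 0 else 100.0, 100.0]
        if policy == "pf":
            # Option A2: only users with a non-empty queue are eligible.
            eligible = [i for i in range(2) if q[i] > 0]
            j = max(eligible, key=lambda i: r[i] / R[i]) if eligible else None
            # Option B1: the served user's average is updated with r_j(t).
            R = [(1 - tau) * R[i] + (tau * r[i] if i == j else 0.0)
                 for i in range(2)]
        else:
            # Max-Weight: serve the user maximising q_i(t) * r_i(t).
            j = max(range(2), key=lambda i: q[i] * r[i])
        if j is not None:
            q[j] = max(q[j] - r[j], 0.0)
        # Constant arrivals of 49 and 94 per slot.
        q[0] += 49.0
        q[1] += 94.0
        max_q = max(max_q, q[0], q[1])
    return q, max_q

pf_queues, pf_max = simulate("pf")
mw_queues, mw_max = simulate("mw")
```

Under these assumptions the second user's queue under Proportional Fair grows by roughly 4 units per slot, while under Max-Weight both queues remain bounded: the second queue settles near the level at which Max-Weight starts to skip every other peak slot of the first user.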
4. External arrival process - Adversarial channel process. Proportional Fair and Max-Weight are simple, appealing algorithms with well-defined provable properties. However, all of the results mentioned in the previous two sections make the assumption that the channel process can be modeled by a stationary stochastic process. There may be some situations however where this stationarity assumption does not hold. In particular, consider a vehicle driving away from a basestation. In this case the channel has a negative drift. Hence it makes sense to also consider an adversarial process that allows us to do worst case analysis.

In this section we consider the scenario in which the data is generated by an external arrival process. In this case we assume that the adversary generates the arrival process as well as the channel process. As is usual in adversarial analyses, we must make some restrictions on the adversary, otherwise it could overload the system and prevent any algorithm from having reasonable performance.

We therefore define the adversarial model as follows. At each time step $t$ the adversary generates the channel rate for user $i$, $r_i(t)$, and the amount of arriving data for user $i$, $a_i(t)$. In order to define whether this situation is schedulable, we assume that the adversary has its own "hidden" schedule. If user $i$ is served by this hidden schedule at time $t$ then we write $z_i(t) = 1$, else $z_i(t) = 0$. We say that the input process is schedulable with parameters $(w, \epsilon)$ if for any sequence of $w$ time steps, the amount of data that arrives for user $i$ in those time steps is less by a factor $(1 - \epsilon)$ than the amount of service that the hidden schedule gives to user $i$, i.e. for any $t_0$, $\sum_{t=t_0}^{t_0+w} a_i(t) \le (1 - \epsilon) \sum_{t=t_0}^{t_0+w} r_i(t) z_i(t)$.

4.1. Tracking algorithm is stable. Ideally we would like a scheduler such that if the input process is schedulable then the queue sizes are bounded, i.e. there exists a $B$ such that $q_i(t) \le B$ for all $i, t$.
This question was addressed in the paper [11]. The first part of this paper looks at impossibility results. In particular, two results were proved. In order to understand the meaning of these results we let $\mathcal{R}$ be the set of channel rates used by the adversary. We also let $R^{\inf} = \inf\{r \in \mathcal{R} : r > 0\}$ and $R^{\sup} = \sup\{r \in \mathcal{R}\}$. Paper [11] shows:
• If $\epsilon > 0$ then for any online scheduling algorithm A, the adversary can create a schedulable input process such that some queue is unbounded under algorithm A. In this example the rate set is infinite and satisfies $R^{\inf} = 0$, i.e. the nonzero rates used by the adversary can be arbitrarily small.
• If $\epsilon = 0$ then for any online scheduling algorithm A, the adversary can create another schedulable input process such that some queue is unbounded under algorithm A. In this example the rate set is infinite and satisfies $R^{\inf} > 0$, i.e. the nonzero rates used by the adversary are bounded away from zero.
The intuition behind these results is that at each time step the adversary can determine which user will be served by algorithm A at the next time step. It then injects data in such a way that the only way to keep the queues bounded is to serve a user different from the one served by algorithm A.

These results left open the question of whether there is a stable algorithm for the situation in which $\mathcal{R}$ is finite or the situation in which $\epsilon > 0$ and $R^{\inf} > 0$. The paper [11] shows that in both these situations we can obtain a stable online algorithm. Let us focus on the case that $\mathcal{R}$ is finite. The algorithm of [11] is called the Tracking Algorithm since it operates by trying to track what the adversary's schedule is doing. In order to describe the algorithm in more detail suppose that after the online algorithm has made its decision in time slot $t$, the adversary reveals which user was served by its schedule at time step $t$. Suppose also that the channel rate vector at time $t$ is $(r_0(t), \ldots, r_{n-1}(t)) = (\mu_0, \ldots, \mu_{n-1})$. The next time after $t$ that the channel rate vector equals $(\mu_0, \ldots, \mu_{n-1})$ the Tracking algorithm serves the user that the adversary served at time step $t$.
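The basic Tracking rule described above amounts to a lookup table keyed by the channel rate vector. The Python sketch below illustrates only this basic version, under the idealised assumption that the adversary's choice is revealed after each slot; the class and method names are ours, and the choice of which user to serve the first time a rate vector is seen (`default_user`) is not specified in the text and is an assumption.

```python
class TrackingScheduler:
    """Basic Tracking rule: replay the adversary's last choice per rate vector."""

    def __init__(self, default_user=0):
        # Maps a rate vector to the user the adversary served the last time
        # that vector occurred.
        self.last_adversary_choice = {}
        # Assumption: user served when a rate vector is seen for the first time.
        self.default_user = default_user

    def decide(self, rates):
        """Choose the user to serve for the current channel rate vector."""
        return self.last_adversary_choice.get(tuple(rates), self.default_user)

    def reveal(self, rates, adversary_user):
        """Record which user the adversary's schedule served in this slot."""
        self.last_adversary_choice[tuple(rates)] = adversary_user


# Hypothetical trace of (rate vector, user served by the adversary).
trace = [((100, 10), 0), ((10, 100), 1), ((100, 10), 0),
         ((10, 100), 1), ((100, 10), 1)]
scheduler = TrackingScheduler()
decisions = []
for rates, adversary_user in trace:
    decisions.append(scheduler.decide(rates))
    scheduler.reveal(rates, adversary_user)
```

The table holds one entry per distinct rate vector, so it can grow to $|\mathcal{R}|^n$ entries when every combination of per-user rates occurs.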
This very simple algorithm ensures that the queue size for each user i is always bounded by $(2F^n + 1)R^{\sup}$, where $F = |\mathcal{R}|$, the number of possible channel rates. Unfortunately, this quantity is exponential in the number of users. Note however that in the Tracking algorithm just described, we need to keep track of which user was last served by the adversary for each possible channel rate vector. We show in [11] that for a slightly more complex version of the Tracking algorithm we can reduce the queue size bound to $(2nF^2 + 1)R^{\sup}$ by only maintaining state for each possible rate vector for each pair of users, rather than for each rate vector for all users. Note that the above description of the Tracking algorithm relies on knowing what the adversary did at the previous time step. In reality we cannot calculate this but we can calculate something similar that is sufficient to make the Tracking algorithm implementable. Recall that for the input process to be schedulable we must have that
$$\sum_{t=t_0}^{t_0+w} a_i(t) \le (1-\varepsilon) \sum_{t=t_0}^{t_0+w} r_i(t)\, z_i(t), \tag{4.1}$$
for any $t_0$. Hence we can approximate the adversary's schedule in the following manner. We divide time into windows of length w. At the end of each window we know exactly what the arrivals were during that window and we know what the channel process was during the window. This means that if we can solve the above integer program (4.1) with respect to the variables $z_i(t)$ we can find the adversary's schedule for the previous w time steps. By using ideas similar to the above, this allows the Tracking algorithm to operate. The bound on the queue size becomes $(2nF^2 + 1)R^{\sup} w$. However, this is still not quite satisfactory since for large values of w, the integer program (4.1) might be intractable. It is shown in [11] however, that the Tracking algorithm is well-defined even if we allow the adversary's schedule to be fractional (i.e. we divide service among multiple different users in each time slot). In this case the integer program becomes a more tractable linear program and we obtain the same bound on queue size.
The above discussion focused on the case in which the feasible rate set is finite. If $\mathcal{R}$ is infinite but $R^{\inf} > 0$ and $\varepsilon > 0$ we show in [11] that we can still apply these results to obtain a stable schedule by rounding down each channel rate to the closest value $\gamma_k = R^{\sup}(1 - \varepsilon/2)^k$ for $0 \le k \le \lceil \log_{1-\varepsilon/2}(R^{\inf}/R^{\sup}) \rceil$.
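The rounding step for an infinite rate set can be illustrated with a small helper. This is a sketch under stated assumptions: the function name `round_down_rate` and its argument names are hypothetical, and the floating-point guard is an implementation detail of the example, not part of [11].

```python
import math

def round_down_rate(r, r_sup, eps):
    """Return (k, gamma_k), where gamma_k = r_sup * (1 - eps/2)**k is the
    largest grid value that does not exceed the rate r."""
    if not (0 < r <= r_sup and 0 < eps < 2):
        raise ValueError("need 0 < r <= r_sup and 0 < eps < 2")
    base = 1.0 - eps / 2.0
    # First guess from the logarithm; may be one too small.
    k = max(0, math.floor(math.log(r / r_sup) / math.log(base)))
    # Increase k until the grid value is no larger than r
    # (the small tolerance absorbs rounding error when r lies on the grid).
    while r_sup * base ** k > r * (1.0 + 1e-12):
        k += 1
    return k, r_sup * base ** k


k, gamma = round_down_rate(0.5, 1.0, 0.5)
```

Each rate loses at most a factor $(1 - \varepsilon/2)$ in the rounding, so the rounded process uses only finitely many rates when $R^{\inf} > 0$.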
4.2. Max-weight produces large queues. We have just described an online scheduling algorithm that is stable whenever $\mathcal{R}$ is finite or when $\varepsilon > 0$ and $R^{\inf} > 0$. However, the Tracking algorithm is somewhat complex since it requires calculating the adversary's schedule over the past w time steps. It is therefore natural to ask whether there are any simpler algorithms that are also stable whenever these conditions hold. In particular, we would like to know how well the Max-Weight scheduling algorithm of Section 3 performs in this context. Unfortunately, the extremely interesting question of whether Max-Weight is always stable remains unresolved. However, the paper [12] shows that in some cases Max-Weight can perform significantly worse than the Tracking algorithm.
Recall that the more complex version of the Tracking algorithm produces queue sizes that are polynomial in the number of users. In contrast, [12] shows that for the rate set $\mathcal{R}$ used in the EV-DO system the adversary can create an input process with $\varepsilon = 0$ such that the queue size of one user can be as large as $8184 \cdot 2^{n}$ when $n < 2048$. The paper [12] also presents some simulation results for a natural example in which $\varepsilon > 0$ and the channel rates are governed by a sequence of users moving past a linear array of basestations. In this case Max-Weight produced queue buildups that were significantly larger than those produced by the Tracking algorithm.
Another contribution of [12] is that it defined a simpler Tracking algorithm that only used state for single users, rather than pairs of users. In particular, for each user i and for each $\mu \in \mathcal{R}$, the simpler Tracking algorithm maintains a counter $c_i(\mu)$ that equals the number of times that the adversary served user i when $r_i(t) = \mu$ minus the number of times that the algorithm itself served user i when $r_i(t) = \mu$. At all times it serves the user with the maximum value of $c_i(r_i(t))$. Although we are not able to prove any bounds for this simpler Tracking algorithm, it performed better than the Tracking algorithms with provable performance bounds in many simulation examples.
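A minimal sketch of this counter-based rule is given below. The class name, the tie-breaking rule (lowest user index) and the assumption that the adversary's choice is revealed after each slot are illustrative choices for the example and are not specified in the text above.

```python
from collections import defaultdict

class CounterTracking:
    """Simpler Tracking rule: one counter c_i(mu) per (user, rate) pair."""

    def __init__(self, n_users):
        self.n_users = n_users
        # counter[(i, mu)] = (times the adversary served i at rate mu)
        #                  - (times this algorithm served i at rate mu)
        self.counter = defaultdict(int)

    def choose(self, rates):
        """Serve the user maximizing c_i(r_i(t)); ties go to the lowest index."""
        user = max(range(self.n_users),
                   key=lambda i: (self.counter[(i, rates[i])], -i))
        self.counter[(user, rates[user])] -= 1
        return user

    def observe(self, rates, adversary_user):
        """Credit the user that the adversary served in this slot."""
        self.counter[(adversary_user, rates[adversary_user])] += 1


alg = CounterTracking(n_users=2)
served = []
for rates, adversary_user in [((1, 2), 1), ((1, 2), 1), ((1, 2), 0)]:
    served.append(alg.choose(rates))
    alg.observe(rates, adversary_user)
```

The state here grows with the number of (user, rate) pairs rather than with the number of rate vectors.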
5. Infinitely backlogged queues - Adversarial channel process. In this section we once again assume that the channel process is generated by an adversary. However, we now consider the case in which each user always has data to serve. Our objective is to maximize $\sum_i \log R_i$, where $R_i$ is a measure of the service rate to user i. However, in contrast to Section 2, it no longer makes sense to measure a long-term average of service rate using an exponential filter since the feasible service rates could change dramatically over time due to the channel assignment process defined by the adversary. We therefore define $R_i(t)$ to be the total service for user i up to time t. That is, if user j is served at time t, we update $R_i(t)$ by
$$R_i(t) = \begin{cases} R_i(t-1) + r_i(t) & \text{if } i = j \\ R_i(t-1) & \text{if } i \ne j. \end{cases}$$
In the spirit of competitive analysis, our goal is to define an online scheduling algorithm that always produces a rate assignment such that $\sum_i \log R_i(t)$ is as close as possible to the value produced by the optimal offline algorithm. In [4] it was shown that it is impossible to match the optimal value of the objective. In particular, if there are n users in the system,
THEOREM 5.1. For any online algorithm A, the adversary can construct a channel rate process such that for some time t,
$$\sum_i \log R_i(t) \le \sum_i \log R_i^*(t) - \Omega(n \log n),$$
where $R_i^*(t)$ is the total service rate for user i of the offline scheduling algorithm that is optimal for time t.
The proof is short and so we reproduce it here.
Proof. Let p be a parameter. We define a series of special time steps by
$$t_k = \sum_{i=1}^{k} p^i.$$
We let $T = t_n$. We also define a sequence of sets $S_k$. For $t_{k-1} < t \le t_k$ the rate vector is defined by
$$r_i(t) = \begin{cases} 1 & \text{if } i \in S_{k-1} \\ 0 & \text{otherwise.} \end{cases}$$
It remains to define the sets $S_k$. The initial set is $S_0 = \{0, 1, \ldots, n-1\}$. We define
$$i_k = \arg\min_{i \in S_{k-1}} R_i(t_k).$$
(This means that $i_k$ is the user in $S_{k-1}$ that has received the least amount of service by time $t_k$.) We define $S_k = S_{k-1} - \{i_k\}$. Note that for online algorithms this process is well-defined since $i_k$ does not depend on $S_k$. We now analyze how much service each user receives. By the definition of $i_k$,
$$R_{i_k}(t_k) \le \frac{p^k + p^{k-1} + \cdots + p}{n-k+1}.$$
Since $r_{i_k}(t) = 0$ for $t > t_k$,
$$R_{i_k}(T) \le \frac{p^k + p^{k-1} + \cdots + p}{n-k+1}.$$
Hence,
$$\sum_i \log R_i(T) \le \log\frac{p}{n} + \log\frac{p^2+p}{n-1} + \cdots + \log(p^n + p^{n-1} + \cdots + p)$$
$$= \log \frac{p\,(p^2+p) \cdots (p^n + p^{n-1} + \cdots + p)}{n!}$$
$$< \log \frac{p^2 p^3 \cdots p^{k+1} \cdots p^{n+1}}{n!\,(p-1)^n}$$
$$= \Big(\sum_{i=1}^{n} \log p^i\Big) + n \log\frac{p}{p-1} - \log(n!)$$
$$= \Big(\sum_{i=1}^{n} \log p^i\Big) - \Omega(n \log n).$$
(The last equality holds because $\frac{p}{p-1}$ is maximized for $p = 2$.) On the other hand, there is a valid schedule that assigns all the time slots between $t_i$ and $t_{i+1}$ to user i. This implies,
$$\sum_i \log R_i^*(T) \ge \sum_{i=1}^{n} \log p^i.$$
The result follows. □
We now present a positive result. In particular we show that an extremely simple randomized algorithm can match the bound of Theorem 5.1 up to constant factors.
LEMMA 5.1. For any sequence of rate vectors, if we serve each user at each time step with probability $1/n$, the expected throughputs satisfy
$$\sum_i \log E[R_i(T)] \ge \sum_i \log R_i^*(T) - O(n \log n).$$
Proof. Follows immediately from the fact that $E[R_i(T)] = \sum_{t \le T} r_i(t)/n \ge R_i^*(T)/n$. □
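The argument of Lemma 5.1 can be checked numerically on a random rate sequence. In the sketch below the function names and the example rate values are hypothetical; the offline optimum is replaced by the upper bound $\sum_{t \le T} r_i(t)$, which is the same bound used in the proof.

```python
import math
import random

def uniform_random_expected_service(rates):
    """rates[t][i] = r_i(t). Expected total service E[R_i(T)] when each slot
    is given to a user chosen uniformly at random."""
    n = len(rates[0])
    return [sum(row[i] for row in rates) / n for i in range(n)]

def offline_upper_bound(rates):
    """Upper bound on R_i^*(T): no schedule gives user i more than sum_t r_i(t)."""
    n = len(rates[0])
    return [sum(row[i] for row in rates) for i in range(n)]


random.seed(0)
n, T = 5, 200
rates = [[random.choice([1.0, 2.0, 4.0]) for _ in range(n)] for _ in range(T)]
expected = uniform_random_expected_service(rates)
best = offline_upper_bound(rates)
# Gap between the two sums of logarithms; the proof shows it is at most n log n.
gap = sum(math.log(b) for b in best) - sum(math.log(e) for e in expected)
```

Against this upper bound the gap equals $n \log n$ exactly, since every user's expected service is a $1/n$ fraction of the bound.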
We remark that we can approximate the performance of this randomized algorithm by treating it as a fractional schedule that assigns a 1/n fraction of each slot to each user and then "tracking" this schedule's performance using the Tracking algorithm of Section 4.
6. Wireless mesh networks. Up until now we have assumed a situation in which we only have a single basestation and multiple mobile receivers. In this section we briefly discuss the case of wireless mesh networks in which traffic is routed through a network consisting of multiple nodes. We consider the following model. We assume a set of sessions, each one consisting of a path through the network from a source node to a destination node. For each node-pair (i, j) we have a channel rate $r_{i,j}(t)$ that indicates the rate at which we can transmit from node i to node j at time t. At each time step, node i can transmit to at most one neighbor. If it selects neighbor j then the transmission rate is $r_{i,j}(t)$. If session k passes through node i then we let $q_i^k(t)$ be the amount of session-k data queued at node i at time t. We also let $n_i^k$ be the next hop after node i on the path of session k.
For the case in which the channel conditions are generated by a stationary stochastic process and data is injected into each session according to an external arrival process, a generalization of the Max-Weight algorithm defined in Section 3 is known to be stable. For this network version of Max-Weight, at each time step t and at each node i, the scheduler at node i calculates
$$k^* = \arg\max_k \left\{ \left(q_i^k(t) - q_{n_i^k}^k(t)\right) r_{i,n_i^k}(t) \right\}.$$
If $q_i^{k^*}(t) \ge q_{n_i^{k^*}}^{k^*}(t)$ then node i sends session-$k^*$ data to node $n_i^{k^*}$ at time t. The amount of data that is sent is $\min\{q_i^{k^*}(t), r_{i,n_i^{k^*}}(t)\}$.
For the case in which the channel conditions are generated by a stationary stochastic process and each session always has data to inject, suppose that we wish to maximize $\sum_k \log R_k(t)$ where $R_k(t)$ is an exponentially filtered average of the session-k data that is served. We can solve this problem by using the Max-Weight scheduling algorithm and then injecting data into session k whenever $1/R_k(t) - T q^k_{\mathrm{begin}}(t) > 0$, where T is a small parameter and $q^k_{\mathrm{begin}}(t)$ is the amount of session-k data queued at the first node on session k's path. This mechanism can be viewed as a method for combining congestion control and scheduling in wireless networks. Note that if a wireless node on the path of session k becomes congested, the Max-Weight scheduling algorithm will create "backpressure" that will cause queue buildups at all nodes on session k's path that are upstream from the congested node. Eventually, $q^k_{\mathrm{begin}}(t)$ will become large. When this happens, the rule for data injection means that we are less likely to inject session-k data. Joint optimization of scheduling and congestion control can provide benefits in wireless networks for many reasons. A discussion of this issue may be found in [5].
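The per-node decision of the network version of Max-Weight, together with the injection rule, can be sketched as follows. The dictionary-based data layout and the function names are assumptions made for illustration; a queue that is not listed (for example at a session's destination) is treated as empty.

```python
def backpressure_choice(node, queues, next_hop, rate):
    """Network Max-Weight decision at one node.

    queues[(i, k)]   : session-k data queued at node i
    next_hop[(i, k)] : next hop after node i on session k's path
    rate[(i, j)]     : current channel rate from node i to node j
    Returns (session, next node, amount sent), or None if nothing is sent.
    """
    best = None
    for (i, k), nxt in next_hop.items():
        if i != node:
            continue
        diff = queues.get((i, k), 0.0) - queues.get((nxt, k), 0.0)
        weight = diff * rate[(i, nxt)]
        if best is None or weight > best[0]:
            best = (weight, k, nxt, diff)
    # Send only if the queue differential of the chosen session is nonnegative.
    if best is None or best[3] < 0:
        return None
    _, k, nxt, _ = best
    return k, nxt, min(queues.get((node, k), 0.0), rate[(node, nxt)])


def should_inject(avg_rate, q_begin, t_param):
    """Injection rule: inject session data while 1/R_k(t) - T*q_begin(t) > 0."""
    return 1.0 / avg_rate - t_param * q_begin > 0


queues = {(0, "a"): 10, (1, "a"): 4, (0, "b"): 6}
next_hop = {(0, "a"): 1, (0, "b"): 2}
rate = {(0, 1): 2, (0, 2): 5}
decision = backpressure_choice(0, queues, next_hop, rate)
```

In this example session "a" has weight (10 - 4) * 2 = 12 and session "b" has weight (6 - 0) * 5 = 30, so node 0 sends session-"b" data to node 2.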
The above algorithms for scheduling and congestion control in wireless mesh networks are special cases of the Greedy Primal-Dual algorithm for network control defined by Stolyar in [24]. (Similar algorithms were also proposed in [16, 23].) We remark that this Greedy Primal-Dual framework provides numerous extensions to these algorithms, including an algorithm for determining routes through the network in the case that routing is not fixed. For the case in which the input process is generated by an adversary much less is known. If only the traffic is generated by an adversary but the channel rates are constant then an algorithm similar to Max-Weight can keep all queues stable whenever possible [2]. This result also holds for the variable channel rate case in which all data is destined for a single destination and the packet routes form a rooted tree [13, 14]. However, for the general problem no algorithm is known that keeps the queues stable whenever possible. One partial result that was presented in [10] considers an algorithm that is a hybrid of the Tracking algorithm of Section 4 and the Nearest-to-Source algorithm [6] for wireline networks that always gives priority to data that is closest to its source. The paper [10] shows that this hybrid algorithm is stable as long as the adversary does not correlate the traffic arrivals with the channel rate process. That is, stability holds as long as the adversary does not overload the network if we restrict attention to a set of time slots that all have a common channel rate vector.
7. Discussion. In this survey paper we have presented a number of different models in which scheduling in wireless data networks can be analyzed. In these models we have discussed algorithms that perform well and we have presented some limits on achievable performance. One of the features of these results that we wish to highlight is that different algorithms work well in different models. For example in the stationary channel model Proportional Fair works well when we are trying to provide a fair throughput allocation and each user always has data to serve. In contrast, if the queues are fed by some arrival process then Proportional Fair is not always the ideal algorithm since it can lead to unstable queues. A number of open problems remain. First as mentioned earlier, what is the performance of the Max-Weight algorithm in adversarial channels? More generally, is there an algorithm that is simpler than the Tracking algorithm and guarantees stability whenever possible in the case of adversarial channels? Lastly, what is the best scheduling algorithm to use in wireless mesh networks? In particular, is there an algorithm that maintains network stability whenever possible and is completely distributed in the sense that it does not require any queue state information to be exchanged between neighboring nodes?
REFERENCES
[1] R. AGRAWAL AND V. SUBRAMANIAN. Optimality of certain channel aware scheduling policies. In Proceedings of the 40th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, October 2002.
[2] W. AIELLO, E. KUSHILEVITZ, R. OSTROVSKY, AND A. ROSEN. Adaptive packet routing for bursty adversarial traffic. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing, pp. 359-368, Dallas, TX, May 1998.
[3] M. ANDREWS. Instability of the proportional fair scheduling algorithm for HDR. IEEE Transactions on Wireless Communications, 3(5), 2004.
[4] M. ANDREWS. Maximizing profit in overloaded networks. In Proceedings of IEEE INFOCOM '05, Miami, FL, March 2005.
[5] M. ANDREWS. Joint optimization of scheduling and congestion control in communication networks. In Proceedings of 40th Annual Conference on Information Sciences and Systems, Princeton, NJ, March 2006.
[6] M. ANDREWS, B. AWERBUCH, A. FERNANDEZ, J. KLEINBERG, T. LEIGHTON, AND Z. LIU. Universal stability results and performance bounds for greedy contention-resolution protocols. Journal of the ACM, 48(1): 39-69, January 2001.
[7] M. ANDREWS, K. KUMARAN, K. RAMANAN, A. STOLYAR, R. VIJAYAKUMAR, AND P. WHITING. CDMA data QoS scheduling on the forward link with variable channel conditions. Bell Labs Technical Memorandum, April 2000.
[8] M. ANDREWS, K. KUMARAN, K. RAMANAN, A. STOLYAR, R. VIJAYAKUMAR, AND P. WHITING. Providing quality of service over a shared wireless link. IEEE Communications Magazine, February 2001.
[9] M. ANDREWS, L. QIAN, AND A. STOLYAR. Optimal utility based multi-user throughput allocation subject to throughput constraints. In Proceedings of IEEE INFOCOM '05, 2005.
[10] M. ANDREWS AND L. ZHANG. Routing and scheduling in multihop wireless networks with time-varying channels. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, January 2004.
[11] M. ANDREWS AND L. ZHANG. Scheduling over a time-varying user-dependent channel with applications to high speed wireless data. Journal of ACM, September 2005.
[12] M. ANDREWS AND L. ZHANG. Scheduling over non-stationary wireless channels with finite rate sets. IEEE/ACM Transactions on Networking, 2006.
[13] E. ANSHELEVICH, D. KEMPE, AND J. KLEINBERG. Stability of load balancing algorithms in dynamic adversarial systems. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, Montreal, Canada, May 2002.
[14] B. AWERBUCH, P. BERENBRINK, A. BRINKMANN, AND C. SCHEIDELER. Simple routing strategies for adversarial systems. In Proceedings of the 42nd Annual Symposium on Foundations of Computer Science, pp. 158-167, Las Vegas, NV, October 2001.
[15] P. BENDER, P. BLACK, M. GROB, R. PADOVANI, N. SINDHUSHAYANA, AND A. VITERBI. CDMA/HDR: A bandwidth efficient high speed data service for nomadic users. IEEE Communications Magazine, July 2000.
[16] A. ERYILMAZ AND R. SRIKANT. Fair resource allocation in wireless networks using queue-length based scheduling and congestion control. In Proceedings of IEEE INFOCOM '05, Miami, FL, March 2005.
[17] J. HOLTZMAN. CDMA forward link waterfilling power control. In Proceedings of the IEEE Semiannual Vehicular Technology Conference, VTC2000-Spring, pp. 1663-1667, Tokyo, Japan, May 2000.
[18] A. JALALI, R. PADOVANI, AND R. PANKAJ. Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system. In Proceedings of the IEEE Semiannual Vehicular Technology Conference, VTC2000-Spring, Tokyo, Japan, May 2000.
[19] N. KAHALE AND P.E. WRIGHT. Dynamic global packet routing in wireless networks. In Proceedings of IEEE INFOCOM '97, Kobe, Japan, April 1997.
[20] H. KUSHNER AND P. WHITING. Asymptotic properties of proportional-fair sharing algorithms. In 40th Annual Allerton Conference on Communication, Control, and Computing, 2002.
[21] X. LIU, E. CHONG, AND N.B. SHROFF. A framework for opportunistic scheduling in wireless networks. Computer Networks, 41(4): 451-474, 2003.
[22] M. NEELY, E. MODIANO, AND C. ROHRS. Power and server allocation in a multibeam satellite with time varying channels. In Proceedings of IEEE INFOCOM '02, New York, NY, June 2002.
[23] M. NEELY, E. MODIANO, AND C. LI. Fairness and optimal stochastic control for heterogeneous networks. In Proceedings of IEEE INFOCOM '05, Miami, FL, March 2005.
[24] A. STOLYAR. Maximizing queueing network utility subject to stability: Greedy primal-dual algorithm. Queueing Systems, 50(4): 401-457, 2005.
[25] A. STOLYAR. On the asymptotic optimality of the gradient scheduling algorithm for multiuser throughput allocation. Operations Research, 53: 12-25, 2005.
[26] L. TASSIULAS AND A. EPHREMIDES. Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks. IEEE Transactions on Automatic Control, 37(12): 1936-1948, December 1992.
[27] L. TASSIULAS AND A. EPHREMIDES. Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Transactions on Information Theory, 39: 466-478, 1993.
[28] D. TSE. Multiuser diversity in wireless networks. http://www.eecs.berkeley.edu/~dtse/stanford416.ps.
WIRELESS CHANNEL PARAMETERS MAXIMIZING TCP THROUGHPUT
FRANÇOIS BACCELLI*, RENE L. CRUZ†, AND ANTONIO NUCCI‡
Abstract. We consider a single TCP session traversing a wireless channel, with a constant signal to interference and noise ratio (SINR) at the receiver. We consider the problem of determining the optimal transmission energy per bit, to maximize TCP throughput. Specifically, in the case where direct sequence spread spectrum modulation is used over a fixed bandwidth channel, we find the optimal processing gain m that maximizes TCP throughput. In the case where there is a high signal to noise ratio, we consider the scenario where adaptive modulation is used over a fixed bandwidth channel, and find the optimal symbol alphabet size M to maximize TCP throughput. Block codes applied to each packet for forward error correction can also be used, and in that case we consider the joint optimization of the coding rate to maximize TCP throughput. Finally, we discuss the issue of assigning target SINR values. In order to carry out our analysis, we obtain a TCP throughput formula in terms of the packet transmission error probability p and the transmission capacity C, which is of independent interest. In our TCP model, the window size is cut in half for each packet transmission loss, and also cut in half whenever the window size exceeds C. This formula is then used to characterize the optimal processing gain or the optimal symbol alphabet size as the solution of a simple fixed point equation that depends on the wireless channel parameters and the parameters of the TCP connection.
Key words. CDMA, adaptive modulation, processing gain, block coding, signal to noise and interference ratio, power control, bandwidth sharing, congestion control, congestion avoidance, additive increase multiplicative decrease algorithm, TCP throughput, optimization, stochastic process, stationary distribution, Mellin transform.
AMS(MOS) subject classifications. Primary 94A05, 94C99, 60K30.
1. Introduction. Cellular wireless networks were originally designed to support voice, which has stringent delay requirements. In these networks, a power control algorithm is used to maintain a target SINR for each user. The power control algorithm adapts to fast multi-path fading that arises due to mobility of users or the sources of scattering, so that a constant bit rate and a required maximum bit error rate is maintained for each connection, with low transport latency. Thus, for example, when a user encounters a fading channel condition, the transmission power is boosted so that the voice conversation can continue in real time.
These systems have been adapted to carry data as well. A fixed capacity channel may be allocated for a data user. We are generally interested in optimizing the channel parameters in order to provide the best performance for the data user. For simplicity here we assume that a data user corresponds to a single TCP connection. We assume that the channel is allocated a SINR which will be maintained at a constant target value. We focus on the case of a long lived TCP connection. The target SINR value may be adapted over time to respond to user mobility, and we consider the regime where the TCP connection reaches a steady state between updates of the target SINR value.
We consider first the case of CDMA, and consider the problem of optimizing the processing gain m and coding rate $\rho$ to maximize TCP throughput. By adjusting the processing gain and coding rate, we trade off the bit transmission rate C and packet transmission error rate p. By making the processing gain m large we increase the energy per bit and so the packet transmission loss probability p is small. However, the bit transmission rate C is proportional to $1/m$. We also consider the context of a high SINR channel over a fixed bandwidth. In this case, we study adaptive modulation, where the symbol alphabet size M is adapted. In this case the raw transmission rate C is proportional to $\log_2(M)$, but the probability of a packet error p grows quickly with M. We find the optimal value of the alphabet size M in order to maximize TCP throughput.
To understand the nature of the optimization we consider, it is useful to think of two extremes. At one extreme we can make the bit-transmission rate C high, but the packet transmission error rate p will be large. Packet transmission errors will generally cause the TCP protocol to reduce its window size, and in turn decrease throughput. At the other extreme we can make the packet transmission error rate p very small, at the expense of reduced transmission rate C. The bit transmission rate C ultimately limits the TCP window size, since buffer overflows will occur when the window gets sufficiently large, and TCP will cut the window size in half, responding to the congestion that occurs at the buffer. Thus, TCP throughput is small at this extreme as well.
*INRIA-ENS.
In order to find the optimal operating point, we consider a model for analyzing the TCP throughput where packet losses due to transmission errors and packet losses due to congestion events are distinguished. In Section 3, we consider a fluid model where the window size is cut in half for each packet loss due to a transmission error or buffer overflow event. The packet loss probability is p, and we assume that a congestion event happens when the window size reaches a value that matches the bit-transmission rate, C, on the channel. This model is appropriate for small buffers. This makes sense within e.g. the CDMA context where the bit-transmission rate is small, compared to what is available on wireline networks, and where large downlink buffers would imply large RTTs and hence poor TCP performance as TCP throughput is known to be roughly inversely proportional to RTT. For this model, we find a formula for the TCP throughput as a function of p and C, which is of independent interest. This is a generalization of the well known "square root" formula (e.g. see [11]) for TCP throughput where packet transmission errors and buffer overflows are not separately modeled. We also obtain formulas for the probability density function of the TCP window size.
In Section 4, we show how to use the analytical framework for TCP throughput in the particular case of wireless channels with fixed SINR. We consider several cases ranging from CDMA to adaptive modulation, with or without FEC. We show that the TCP throughput can be fairly sensitive to the physical layer parameters, i.e. the processing gain m or symbol alphabet size M, and we give simple analytical characterizations of the optimal values for these parameters. We also discuss, in Section 5, the problem of assignment of target SINR values in a network context, and specifically discuss the case of a wireless cellular CDMA downlink. Before describing our model and analysis in more detail, we first discuss some related work.
2. Related work. Many authors have considered the general problem of performance of the TCP protocol over wireless links. A comparison of various approaches to the problem is given in [4]. One branch of related work is concerned with the problem of determining whether packet losses are due to congestion or due to transmission errors, so that TCP can go into congestion avoidance mode only when packet losses are due to congestion. A second branch of work considers the approach of splitting the TCP connection at the wire-line/wireless infrastructure boundary, so that the TCP connection is isolated from the packet transmission errors on the wireless channel. A third approach to optimizing TCP performance over wireless channels is to optimize the link layer for TCP performance. The approach we take in this paper falls into the third category.
Zorzi and Rao [14] considered the effect of correlated errors on TCP throughput. Correlated errors typically occur due to multi-path fading. In our model, we assume a fixed SINR (thanks to power control), so errors can be modeled as independent. Chaskar, Lakshman, and Madhow [5] consider the use of link layer ARQ over the wireless channel to hide packet transmission errors. In this case, TCP primarily reacts to buffer overflows only. In [5], it is suggested that the physical layer parameters should be optimized so that the buffer overflow probability q times the bandwidth delay product squared is equal to one. Liu, Goeckel, and Towsley [10] considered the problem of adapting the coding rate $\rho$ to the channel conditions, with the objective of maximizing TCP throughput. It was already noted in [10] that optimal values of operating parameters for the channel for TCP were different than those for UDP. In [9], Liu, Zhou and Giannakis used simulation to study cross-layer optimization within the adaptive modulation setting.
The main novelty of the present paper is the fact that it provides an analytic framework for this adaptation. Several recent papers provide analytical formulas for the throughput of a large collection of competing TCP flows with both congestion and transmission error losses (see [2] and the references therein). The main difference with [2] and related papers is that we are here focusing on the rate of a single TCP connection constrained to remain below the rate C and subject to both random transmission error losses and to losses that occur when its rate reaches or exceeds C. To the best of our knowledge, this setting, the associated formulas for TCP throughput and the analytic characterizations of the optimal wireless channel parameters that are derived in the present paper are all new.
3. TCP throughput.
3.1. Model. In this section, we analyze the throughput of a single TCP flow over a wireless channel. Each packet contains L bits, and when it is transmitted over the channel, it is lost with probability p. We assume that packet transmission errors are independent. Later on, in Section 5, we will consider how the packet transmission loss probability p depends on , and other physical layer parameters. We assume it takes $L/C$ seconds to transmit each packet on the channel. The parameter C is called the raw transmission capacity and has units of bits/sec.
We now consider a fluid model for the TCP flow. Our model is in terms of the parameters C and p discussed above. Consider a random process $X(t)$ that models the instantaneous throughput for the TCP flow. The instantaneous throughput is assumed to be proportional to the TCP window size. Our model for the dynamics of $X(t)$ is as follows. Let R be the round trip delay (assumed constant here), in units of seconds. When there is no loss, at time t, $X(t)$ increases at rate $L/R^2$. If $X(t)$ reaches C, a congestion event occurs, causing $X(t)$ to be reduced in half to the value $C/2$. Packet losses are modeled by a time in-homogeneous Poisson process, where the rate of loss at time t is $\lambda(t) = pX(t)/L$. This reflects the fact that the rate of packet loss is higher when the packet rate is higher. When there is a packet loss at time t, the value of $X(t)$ is reduced in half.
Of course a more refined model would take into account the presence of a buffer and the fact that losses only take place when this buffer overflows. In Section 6, we will show by simulation that the conclusions obtained from our simplified bufferless model are still valid for this refined model.
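The dynamics of this bufferless fluid model are simple enough to simulate event by event. The sketch below is an illustration only: the function name and arguments are hypothetical, and loss times are drawn exactly by inverting the cumulative hazard of the loss process between events, which is one possible way to sample the in-homogeneous Poisson process.

```python
import math
import random

def simulate_mean_throughput(C, R, L, p, horizon, seed=0):
    """Time average of X(t) over roughly `horizon` seconds of the fluid model."""
    rng = random.Random(seed)
    slope = L / R**2                       # growth rate of X(t) between events
    x, elapsed, area = C / 2.0, 0.0, 0.0
    while elapsed < horizon:
        t_cap = (C - x) / slope            # time until X(t) would reach C
        if p > 0:
            # Solve (p/L) * (x*t + slope*t^2/2) = e for t, with e ~ Exp(1).
            e = rng.expovariate(1.0)
            t_loss = (-x + math.sqrt(x * x + 2.0 * slope * e * L / p)) / slope
        else:
            t_loss = math.inf
        dt = min(t_cap, t_loss)
        area += x * dt + 0.5 * slope * dt * dt   # integral of X over the segment
        elapsed += dt
        x = (x + slope * dt) / 2.0         # halve on a loss or a congestion event
    return area / elapsed


no_loss = simulate_mean_throughput(5.0, 100.0, 40.0, 0.0, 1e5)
lossy = simulate_mean_throughput(5.0, 100.0, 40.0, 0.01, 1e6)
```

With p = 0 the trajectory is a pure sawtooth between C/2 and C, so the time average is 3C/4; any transmission loss can only lower it.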
3.2. Distributions. We first define some notation. Let $\alpha = pR^2/L^2$. Define $A_0 = 1$ and for $l \ge 1$ define
$$A_l = \prod_{j=1}^{l} \left( \frac{-4}{2^{2j} - 1} \right). \tag{3.1}$$
For $l \ge 1$ and $n \ge 2$ define
$$a_{l,n} = A_{l-1} \exp\!\left(-(2^{2(l-1)} - 1)\,\alpha (C/2^n)^2/2\right) \left[ 1 + \frac{4}{2^{2l} - 1} \exp\!\left(-(3)\,2^{2(l-1)}\,\alpha (C/2^n)^2/2\right) \right]. \tag{3.2}$$
The following is proved in Appendix B.
THEOREM 3.1. Let $f(x)$ be the stationary probability density function for the instantaneous throughput $X(t)$ of the TCP connection. The density f satisfies the differential equation
$$\frac{df(x)}{dx} = \begin{cases} -\alpha x f(x) & \text{if } C/2 < x \le C \\ -\alpha x f(x) + 4\alpha x f(2x) & \text{if } 0 \le x < C/2 \end{cases} \tag{3.3}$$
where $\alpha = pR^2/L^2$. The density $f(x)$ is discontinuous at $x = C/2$ and such that
$$f\big((C/2)^+\big) - f\big((C/2)^-\big) = f(C^-). \tag{3.4}$$
The density $f(x)$ is given by
$$f(x) = \sum_{l=0}^{n} V_{n-l}\, A_l\, e^{-2^{2l} \alpha x^2/2} \tag{3.5}$$
if $C/2^{n+1} < x < C/2^n$ for $n \ge 0$, 0 otherwise, where the constants $V_0, V_1, \ldots$ are the solutions to the equations
$$V_1 = V_0 \left( 1 + \tfrac{1}{3}\, e^{-3\alpha C^2/8} \right), \tag{3.6}$$
$$V_n = \sum_{l=1}^{n} a_{l,n} V_{n-l} \quad \text{for } n > 1, \tag{3.7}$$
and
$$\sum_{m=0}^{\infty} V_m \left[ \int_{C/2^{m+1}}^{C/2^m} \exp(-\alpha x^2/2)\,dx \right] = \left[ \sum_{k=0}^{\infty} 2^{-k} A_k \right]^{-1}. \tag{3.8}$$
It is interesting to note that in all cases examined so far, the constants $V_n$ appear to converge to a constant as $n \to \infty$.
3.3. Limiting cases. First consider the case where $p = \alpha = 0$. In this case it can be verified that $\sum_{l=0}^{n} V_{n-l} A_l = 0$ for all $n \ge 1$ and hence $f(x)$ is a uniform distribution on $[C/2, C]$. This is expected since in this case the trajectory of $X(t)$ is a sawtooth pattern varying between $C/2$ and $C$.
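The recursion for the constants $V_n$ and the piecewise formula (3.5) can be evaluated numerically. The following sketch fixes $V_0 = 1$ and leaves out the normalization (3.8), so it returns the density only up to a constant factor. The function names and the truncation level `n_max` are assumptions of the example, and the bracketed factor of $a_{l,n}$ and the $V_1$ equation are used in the form obtained from continuity of (3.5) at $x = C/2^n$ for $n \ge 2$ and from the jump condition at $x = C/2$.

```python
import math

def density_coefficients(C, alpha, n_max=30):
    """Return (A, V) from (3.1), (3.2), (3.6), (3.7) with the choice V_0 = 1."""
    A = [1.0]
    for l in range(1, n_max + 1):
        A.append(A[-1] * (-4.0 / (4.0**l - 1.0)))
    V = [1.0, 1.0 + math.exp(-3.0 * alpha * C * C / 8.0) / 3.0]
    for n in range(2, n_max + 1):
        y = alpha * (C / 2.0**n) ** 2 / 2.0
        total = 0.0
        for l in range(1, n + 1):
            a_ln = A[l - 1] * math.exp(-(4.0 ** (l - 1) - 1.0) * y) * (
                1.0 + 4.0 / (4.0**l - 1.0) * math.exp(-3.0 * 4.0 ** (l - 1) * y))
            total += a_ln * V[n - l]
        V.append(total)
    return A, V

def unnormalized_density(x, C, alpha, A, V):
    """Evaluate the sum in (3.5) for 0 < x < C, with V_0 = 1."""
    if not 0.0 < x < C:
        return 0.0
    n = int(math.floor(math.log2(C / x)))   # x lies in (C/2^(n+1), C/2^n)
    n = min(n, len(V) - 1)                   # crude truncation very close to x = 0
    return sum(V[n - l] * A[l] * math.exp(-(4.0**l) * alpha * x * x / 2.0)
               for l in range(n + 1))


A0, V0 = density_coefficients(5.0, 0.0)      # limiting case alpha = 0
A1, V1 = density_coefficients(5.0, 0.1)
jump = (unnormalized_density(2.5 + 1e-9, 5.0, 0.1, A1, V1)
        - unnormalized_density(2.5 - 1e-9, 5.0, 0.1, A1, V1))
```

For $\alpha = 0$ the sum vanishes below $C/2$ and is constant on $(C/2, C)$, which is the uniform limiting case; for $\alpha > 0$ the jump at $C/2$ equals the value of the density just below $C$.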
FIG. 1. Probability Density of Instantaneous TCP Throughput: Examples.
Next consider the case where C approaches infinity. In this case, it can be verified that Vn tends to Va for all n. Hence in this case we have f(x) == Va(E~a Aze-22lax2 /2) for all x > O. This is precisely the same result obtained in [3]. This is expected since letting C approach infinity is equivalent to having transmission losses only, which coincides with previous models. 3.4. Examples. We now illustrate the above results with some numerical examples. Suppose C == 5, R == 100 and L == 40. In Figure 1, the density f(x) of the instantaneous throughout is plotted versus x for two different values of the packet loss probability p. The left plot shows the case p small. In this case, note that there is a visible discontinuity in the density at x == C /2, reflecting the jumps from C to C /2 when the instantaneous throughput hits the boundary C. The right plot shows the case p large. In this case the packet loss probability is high enough so that it is unlikely that the instantaneous throughput is near the boundary C, and thus there is no visible discontinuity in the density at x == C /2. Note that in all cases, f (x) is nearly zero when x is close to zero, which is because packet loss becomes an unlikely event when the instantaneous throughput is near x == O. 3.5. Mean values. The mean throughput of TCP is given by
$$\mathrm{TCP}(C,p) = \int_0^C x f(x)\,dx.$$
WIRELESS CHANNEL PARAMETERS MAXIMIZING TCP THROUGHPUT 25
Specifically, by using (3.5), after some algebraic manipulation, we can show that
In the following theorem we obtain a closed form equation for TCP(C,p), which leads to a simple approximation for TCP(C,p). Let $\hat f(u)$ be the Mellin transform of $f(x)$:
$$\hat f(u) = \int_0^\infty f(x)\,x^{u-1}\,dx, \tag{3.10}$$
with $u \ge 1$. Let $\Gamma(u)$ be the Mellin transform of $e^{-x}$, i.e.
$$\Gamma(u) = \int_0^\infty e^{-x} x^{u-1}\,dx.$$
Note that $\Gamma(\cdot)$ is the well known gamma function and satisfies the identity $\Gamma(u+1) = u\Gamma(u)$. For all $l \ge 0$, define
$$\Pi_l(u) = \prod_{k=0}^{l}\left(1 - 2^{-u-2k}\right). \tag{3.11}$$
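The following is proved in Appendix C.
THEOREM 3.2 (Mean TCP Throughput). The Mellin transform of the probability density $f(x)$ of the instantaneous TCP throughput is given by
$$\hat f(u) = \frac{\sum_{l \ge 0} \Pi_l(u)\, C^u \left(\frac{\alpha C^2}{2}\right)^l \frac{\Gamma(u/2)}{\Gamma(u/2+l+1)}}{\sum_{l \ge 0} \Pi_l(1)\, C \left(\frac{\alpha C^2}{2}\right)^l \frac{\Gamma(1/2)}{\Gamma(1/2+l+1)}}. \tag{3.12}$$
In particular, the mean TCP throughput is
$$\mathrm{TCP}(C,p) = \hat f(2), \tag{3.13}$$
i.e. (3.12) evaluated at $u = 2$.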
REMARK 3.1. We have two different expressions for the mean stationary throughput, obtained by two different methods: that of Equation (3.9) and that of Theorem 3.2 above. It would be interesting to prove directly that the two expressions are equal, but we have not been able to do this. We have checked numerically whether the two expressions yield the same result, and in all cases we tested, the results matched closely. From (3.13) we obtain the following approximation, proved in Appendix D.
COROLLARY 3.1 (Approximation to Mean TCP Throughput). The mean TCP throughput satisfies
$$\mathrm{TCP}(C,p) = \frac{3C}{4} - \frac{11}{256}\,\frac{pR^2C^3}{L^2} - \frac{497}{491520}\,\frac{p^2R^4C^5}{L^4} + \frac{925667}{377487360}\,\frac{p^3R^6C^7}{L^6} + o\!\left(\frac{p^3R^6C^7}{L^6}\right). \tag{3.14}$$
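As a rough numerical illustration of Corollary 3.1, the sketch below evaluates the ratio-of-series expression of Theorem 3.2 at $u = 2$ and compares it with the low-order terms of the expansion. The function names, the truncation length `terms`, and the reading of $\Pi_l(u)$ as a product over $k = 0, \dots, l$ are assumptions made for this example; the parameter `alpha` stands for $pR^2/L^2$.

```python
import math

def pi_l(u, l):
    """Finite product prod_{k=0}^{l} (1 - 2^{-(u + 2k)}), as (3.11) is read here."""
    prod = 1.0
    for k in range(l + 1):
        prod *= 1.0 - 2.0 ** (-(u + 2 * k))
    return prod

def mean_tcp_series(C, alpha, terms=60):
    """Truncated evaluation of the Mellin-transform ratio at u = 2 (assumed form)."""
    x = alpha * C * C / 2.0
    num = sum(pi_l(2, l) * C ** 2 * x ** l * math.gamma(1.0) / math.gamma(l + 2.0)
              for l in range(terms))
    den = sum(pi_l(1, l) * C * x ** l * math.gamma(0.5) / math.gamma(l + 1.5)
              for l in range(terms))
    return num / den

def mean_tcp_low_order(C, alpha):
    """First three terms of the expansion, written in terms of alpha = p R^2 / L^2."""
    return 0.75 * C - 11.0 / 256.0 * alpha * C ** 3 - 497.0 / 491520.0 * alpha ** 2 * C ** 5

if __name__ == "__main__":
    C, alpha = 5.0, 0.008  # alpha * C^2 / 2 = 0.1, i.e. the "few losses" regime
    print(mean_tcp_series(C, alpha), mean_tcp_low_order(C, alpha))
```

For small values of $\alpha C^2/2$ the two numbers should be close; the gap widens as losses become more frequent and higher-order terms matter.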
Higher order expansions are easily obtained using e.g. Maple. As expected, the values of the throughput are insensitive to the value of C if the packet loss probability p is sufficiently high. In fact we have the following corollary. A sketch of the proof is provided in Appendix E. COROLLARY 3.2. For all p > 0 we have
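$$\lim_{C \to \infty} \mathrm{TCP}(C,p) = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sqrt{\alpha}}\,\frac{\Pi_\infty(2)}{\Pi_\infty(1)} \approx \frac{1.309}{\sqrt{p}\,R/L}. \tag{3.15}$$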
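The constant 1.309 in Corollary 3.2 can be checked numerically. The sketch below assumes that $\Pi_\infty(u)$ denotes the infinite product $\prod_{k \ge 0}(1 - 2^{-u-2k})$, truncated here after a fixed number of factors; the function names and the truncation length are choices made for this example.

```python
import math

def pi_infinity(u, factors=80):
    """Truncated infinite product prod_{k>=0} (1 - 2^{-(u + 2k)})."""
    prod = 1.0
    for k in range(factors):
        prod *= 1.0 - 2.0 ** (-(u + 2 * k))
    return prod

def large_capacity_constant():
    """sqrt(2/pi) * Pi_inf(2) / Pi_inf(1), the numerator constant in (3.15)."""
    return math.sqrt(2.0 / math.pi) * pi_infinity(2) / pi_infinity(1)

def tcp_limit(p, R, L):
    """Limiting mean throughput for C -> infinity, using alpha = p R^2 / L^2."""
    alpha = p * R * R / (L * L)
    return large_capacity_constant() / math.sqrt(alpha)

if __name__ == "__main__":
    print(large_capacity_constant())
```

The factors approach 1 geometrically, so a few dozen of them are already enough for double precision.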
The result in Corollary 3.2 coincides with [3], which is expected since letting $C$ approach $\infty$ is equivalent to considering random transmission losses only.
4. Optimization of wireless channel parameters. In this section we consider optimization of the wireless channel parameters to maximize the throughput of the TCP session that passes through it. We can use the formula for the TCP throughput in (3.14) to evaluate TCP(C,p) as a function of $C$ and $p$. The wireless channel parameters will determine $C$ and $p$, and we wish to maximize TCP(C,p) with respect to these parameters. We will consider two regimes, the low SINR regime and the high SINR regime. For example, the low SINR regime arises in the context of code-division multiple access (CDMA) systems. In this regime, a processing gain $m$ is used to adjust the transmission data rate, or equivalently the transmission energy used per bit. A large processing gain is generally needed to compensate for low SINR. We shall consider the problem of optimizing the processing gain to maximize TCP throughput. We shall also consider the use of error correction codes, and jointly optimize the coding rate and the processing gain. When the SINR is high, it is possible to reliably send more than one bit per transmitted symbol. If the number of possible values of each symbol is $M$, then $M$ is called the alphabet size. The number of bits per symbol is $\log_2 M$. A large SINR generally enables use of an alphabet size greater than two with low probability of symbol error. We consider the problem of adjusting the alphabet size in order to maximize TCP throughput. In the next subsection, we present a wireless channel model, which yields formulas for $p$ and $C$ as a function of the wireless channel parameters.
4.1. Wireless channel model. We first consider a model appropriate for a low SINR at the receiver, in the context of a CDMA network.
4.1.1. Code division multiplexing without FEC. We will consider the case of code-division multiplexing with direct sequence spread spectrum modulation, with a chip duration of $T_c$ seconds. Binary Phase Shift Keying (BPSK) is assumed as the underlying modulation scheme. Each bit transmitted is encoded into $m$ chips using a spreading sequence, where $m$ is called the processing gain. The bit transmission rate is thus
$$C = \frac{1}{mT_c}. \tag{4.1}$$
Typically, many users may transmit at the same time, causing interference at each receiver. Each user has a receiver which correlates the incoming signal with the spreading sequence used at the transmitter. Let $\gamma$ be the signal to noise plus interference ratio (SINR) at the output of the correlator. We shall define $\gamma$ mathematically when we discuss assignment of SINR values later. If we model the interference at the correlator output as Gaussian, the probability of a bit error is
$$\mathrm{BER} = Q(\sqrt{m\gamma}), \tag{4.2}$$
where $Q(\cdot)$ is the complementary CDF of the zero mean unit variance Gaussian density, i.e. $Q(x) = (1/\sqrt{2\pi}) \int_x^\infty e^{-y^2/2}\,dy$. In the uncoded case that we consider here (no FEC), each packet corresponds to transmission of $L$ bits on the channel. The probability of a packet error is
$$p = 1 - (1 - \mathrm{BER})^L. \tag{4.3}$$
We will use the following approximation for $p$, which is valid when BER is small:
$$p \approx L\,Q(\sqrt{m\gamma}). \tag{4.4}$$
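To see how tight the small-BER approximation is, the following sketch computes the Gaussian tail with the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and compares $L\,Q(\sqrt{m\gamma})$ with the packet error probability under independent bit errors. The independence assumption, the function names and the sample parameter values are choices made for this example.

```python
import math

def q_function(x):
    """Tail probability of a standard Gaussian, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def packet_error_exact(m, gamma, L):
    """Packet error probability if the L bit errors are independent (assumption)."""
    ber = q_function(math.sqrt(m * gamma))
    return 1.0 - (1.0 - ber) ** L

def packet_error_approx(m, gamma, L):
    """Small-BER approximation p ~ L * Q(sqrt(m * gamma))."""
    return L * q_function(math.sqrt(m * gamma))

if __name__ == "__main__":
    m, gamma, L = 459, 0.03, 320
    print(packet_error_exact(m, gamma, L), packet_error_approx(m, gamma, L))
```

The approximation is the union bound on the exact expression, so it always overestimates slightly, and the gap shrinks as the processing gain grows.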
4.1.2. Code division multiplexing with FEC. We shall also consider the case where a block code for forward error correction (FEC) is used. In this case, the $L$ bits of each packet are encoded into $N$ bits, where $N \ge L$. The ratio $\rho = L/N$ is called the coding rate. Since we count throughput in terms of information conveyed, the bit transmission capacity in this case is
$$C = \frac{\rho}{mT_c}. \tag{4.5}$$
The collection of $N$ bits is called a codeword, and there is a codeword for each of the $2^L$ possible bit patterns of a packet. In general, the $N$ codeword bits contain redundancy, so that if bit transmission errors occur, the bit pattern of the originally encoded packet can sometimes be recovered at the receiver. We assume that up to $t$ errors can be corrected, and that $t + 1$ or more bit transmission errors result in a packet loss. The probability of a packet error is thus
$$p = \sum_{j=t+1}^{N} \binom{N}{j} \left[Q(\sqrt{m\gamma})\right]^{j} \left[1 - Q(\sqrt{m\gamma})\right]^{N-j}.$$
Define $\theta = (t + 1)/N$. We use the following large deviations-based estimate for $p$: (4.6) where $q = Q(\sqrt{m\gamma})$ and $h(x) = -x \log_2(x) - (1-x) \log_2(1-x)$ is the binary entropy function. Notice that this approximation uses the dominant term of Stirling's formula, and that better expansions could be used. In order to determine $t$, or equivalently $\theta$, we use the Gilbert-Varshamov bound ([12] p. 463, [15], §III.A), which implies that for a given coding rate $\rho$, there exists a block code with "error correction capability" $\theta$, where $\theta$ satisfies
$$\rho = 1 - h(2\theta). \tag{4.7}$$
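Relation (4.7) has no closed-form inverse, but $1 - h(2\theta)$ decreases monotonically from 1 to 0 as $\theta$ goes from 0 to 1/4, so $\theta$ can be recovered from the coding rate by bisection. The sketch below is one simple way to do this; the function names and the iteration count are assumptions of this example.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def theta_from_rate(rho, iterations=100):
    """Solve rho = 1 - h(2 * theta) for theta in [0, 1/4] by bisection."""
    lo, hi = 0.0, 0.25
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # 1 - h(2 * theta) decreases in theta, so a value above rho means theta is too small.
        if 1.0 - binary_entropy(2.0 * mid) > rho:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(theta_from_rate(0.5))
```

For a rate-1/2 code this gives a correctable fraction of errors of roughly 5.5 percent of the codeword length.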
4.1.3. Adaptive modulation. Next, we discuss a scenario appropriate for high SINRs, where more than one bit can be conveyed per transmitted symbol. In the adaptive modulation we consider, the alphabet size $M$ can be varied. Each symbol can thus take on $M$ possible values. The duration of each symbol is $T$ seconds, and the bandwidth of the transmitted signal is proportional to $1/T$. The bit transmission rate in this scenario is therefore
$$C = \frac{\log_2 M}{T}. \tag{4.8}$$
We define the signal to noise ratio $\gamma$ in this case to be the ratio of the average symbol energy at the receiver to the noise power spectral density. It should be noted that $\gamma$ is also equal to the ratio of the signal power at the receiver to the noise power at the receiver, after channel bandpass filtering. As stated in [6], for large values of $\gamma$ the probability of a bit error is well approximated by
Since we have $L$ bits per packet, we use the following approximation for the probability of a packet transmission error, which is valid when BER is small:
(4.9)
First we consider a single TCP session $i$ which is assigned a given SINR value $\gamma_i = \gamma$. We are interested in optimizing the processing gain $m_i = m$ and the coding rate $\rho_i = \rho$ in order to maximize the mean TCP throughput. Recall that we have $C = \rho/(mT_c)$, and that the packet error probability $p$ is a decreasing function of $m$ and an increasing function of $\rho$. We wish to maximize TCP(C,p) over all possible values of $m$ and $\rho$.
4.2. Optimization of TCP throughput. In this subsection we consider optimization of TCP throughput in the three cases outlined in the previous subsection.
4.2.1. Code division multiplexing without FEC. We first present some numerical results. In Figure 2, we consider the case where $\gamma_i = 0.03$ and the RTT value is $R = 0.1$ second. On the top right curve we plot the TCP throughput as a function of the processing gain $m$. We see that there is a fair amount of sensitivity to the processing gain, and there is a unique maximum around $m = 450$. To get a better sense of this, on the top left plot we illustrate the TCP throughput ISO curves, i.e. values of $C$ and $p$ that yield the same TCP throughput. Superimposed with these ISO curves, we plot the locus of $(p, C)$ values corresponding to different values of processing gain. We see that the optimum processing gain corresponds to crossing a "knee" of a TCP ISO curve. For reference, we consider optimization of the "UDP throughput" $C(1-p)$ in the bottom two plots in Figure 2. In this case we use (4.1) for $C$ and (4.4) for $p$ in the expansion for TCP(C,p) in (3.14), for an explicit optimization of the mean throughput. Using the approximation $Q(x) \approx \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}$ in (4.4), we have
$$p \approx \frac{L\,e^{-m\gamma/2}}{\sqrt{2\pi m\gamma}}. \tag{4.10}$$
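If $\alpha C^2/2$ is small, then the first order expansion of the "few losses" case (see (D.3)), corresponding to the first two terms in (3.14), gives the following expression for the mean rate
$$\mathrm{TCP}(C,p) \approx \frac{1}{4T_c}\left(\frac{3}{m} - \frac{11\beta}{T_c^2}\,\frac{e^{-\gamma m/2}}{m^3\sqrt{m}}\right), \tag{4.11}$$
with $\beta = \frac{R^2}{64 L \sqrt{2\pi\gamma}}$. Differentiating with respect to $m$, we get that the optimal $m$ solves
$$\frac{1}{4T_c}\left(-\frac{3}{m^2} + \frac{11\beta}{T_c^2}\,\frac{e^{-\gamma m/2}}{m^3\sqrt{m}}\left(\frac{7}{2m} + \frac{\gamma}{2}\right)\right) = 0,$$
that is
$$e^{-\gamma m/2} = \frac{6T_c^2}{11\beta}\,\frac{m\sqrt{m}}{\gamma + \frac{7}{m}}. \tag{4.12}$$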
FIG. 2. Example of cross-layer optimization result - Uncoded case: $\gamma_i = 0.03$, $R = 0.1$ second. (Top: TCP throughput isocurves in the $(p, C)$ plane and TCP throughput versus processing gain $m$. Bottom: UDP goodput isocurves and UDP goodput versus processing gain $m$.)
Higher order approximations of $Q$ can of course be used in the same way when needed. Taking $L = 320$, $R = .1$, $\gamma = 3 \cdot 10^{-2}$, $T_c = 10^{-7}$, we get $\beta \sim 1.15 \cdot 10^{-6}$; so using the first order approximation for the TCP throughput, the optimal processing gain $m$ satisfies
$$e^{-.015m} = 4.727 \cdot 10^{-9}\,\frac{m\sqrt{m}}{.03 + \frac{7}{m}}. \tag{4.13}$$
The solution is $m = 459$. The associated value for $\alpha C^2/2$ is $pC^2R^2/(2L^2) \sim 0.815$ which, upon examination of (3.14), justifies the use of the few loss approximation (the correction brought by the second order term is approximately 4/1000 of the value given by the first order approximation).
REMARK 4.1. Our model has an interesting relation to [5], where all packet transmission errors are hidden with a link layer mechanism (e.g. link layer ARQ), so that the only packet losses that TCP reacts to are buffer overflows at the link layer buffer. Let $W_{bd}$ be the bandwidth delay product (where bandwidth is defined in units of TCP packets/sec). The authors of [5] propose sizing the buffer so that the overflow probability $q$ satisfies $q(W_{bd})^2 = 1$. With our method, we find the optimal operating point at $\alpha C^2/2 = 0.815$, which is equivalent to $p(W_{bd})^2/2 = 0.815$.
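The optimum can also be located without solving the stationarity condition, by a direct search over integer processing gains. The sketch below plugs $C = 1/(mT_c)$ and the approximation $p \approx L\,Q(\sqrt{m\gamma})$, with $Q(x) \approx e^{-x^2/2}/(\sqrt{2\pi}\,x)$, into the first two terms of (3.14). The function names and the search range `m_max` are assumptions of this example, and the result should only be expected to land in the neighbourhood of the value quoted above, since the constants are rounded in the text.

```python
import math

def q_approx(x):
    """Large-argument approximation Q(x) ~ exp(-x^2 / 2) / (sqrt(2 pi) x)."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)

def tcp_first_order(m, gamma, L, R, Tc):
    """First two terms of the expansion: 3C/4 - (11/256) p R^2 C^3 / L^2."""
    C = 1.0 / (m * Tc)                        # bit rate for processing gain m
    p = L * q_approx(math.sqrt(m * gamma))    # packet loss probability
    return 0.75 * C - 11.0 / 256.0 * p * R ** 2 * C ** 3 / L ** 2

def best_processing_gain(gamma, L, R, Tc, m_max=2000):
    """Exhaustive search over integer gains 1..m_max (hypothetical helper)."""
    return max(range(1, m_max + 1),
               key=lambda m: tcp_first_order(m, gamma, L, R, Tc))

if __name__ == "__main__":
    m_star = best_processing_gain(gamma=0.03, L=320, R=0.1, Tc=1e-7)
    print(m_star, tcp_first_order(m_star, 0.03, 320, 0.1, 1e-7))
```

An exhaustive search is cheap here because the objective is a scalar function of one integer; a root finder applied to the stationarity condition would serve equally well.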
FIG. 3. Example of joint optimization of processing gain $m$, and coding rate K/L. $\gamma_i = 0.03$, $R = 0.1$ second. (Top: optimum processing gain versus coding rate for TCP throughput and for UDP goodput. Bottom: optimal TCP and UDP throughput versus coding rate and versus processing gain.)
4.2.2. Code division multiplexing with FEC. In this case, we wish to jointly optimize the processing gain $m$ and coding rate $\rho$ in order to maximize TCP throughput. In this case we use (4.5) for $C$ and (4.6) for $p$, where $\theta$ is determined from $\rho$ using (4.7). In Figure 3, we again consider the case where $\gamma = 0.03$ and the RTT value is $R = 0.1$ second, and plot results for when both the processing gain $m$ and the coding rate $\rho = L/N$ are jointly optimized. The top left plot shows how the optimum processing gain $m^*$ changes as a function of the coding rate $\rho$. On the graph we have also labeled the approximate value of the TCP throughput corresponding to each point. For example, for a coding rate of about 0.4, the optimum processing gain is about 100, and the corresponding TCP throughput is about 31 Kbps. For comparison, in the top right plot we show how the optimum processing gain $m^*$ changes as a function of the coding rate $\rho$, where we optimize UDP goodput, i.e. $(1-p)C$. In the bottom left plot we show the optimal TCP and UDP throughput as a function of the coding rate $\rho$. In the bottom right plot we show the optimum TCP and UDP throughput as a function of the processing gain $m$.
Next, we describe a procedure for an explicit joint optimization. Define $J = \mathrm{TCP}(C,p)$. We are interested in maximizing $J$ with respect to the processing gain $m$ and the coding rate $\rho$. For convenience we optimize the TCP throughput $J$ with respect to $m$ and $\theta$, and then the optimal coding rate $\rho^*$ is determined by $\theta^*$ through (4.7). Note that $\theta^*$ and $m^*$ satisfy $\partial J/\partial\theta = 0$ and $\partial J/\partial m = 0$. Define $F_1(m,\theta) = \partial J/\partial m$ and $F_2(m,\theta) = \partial J/\partial\theta$. In Appendix F, we provide the calculation of $F_1$ and $F_2$. We can solve the two non-linear equations $F_1(m,\theta) = 0$ and $F_2(m,\theta) = 0$ in order to determine the optimal processing gain $m^*$ and the optimal value of the coding rate $\rho^*$, where $\rho^*$ is found from $\theta^*$ by using equation (4.7).
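4.2.3. Adaptive modulation. Using the approximation $Q(x) \approx \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}$ in (4.9) we obtain
$$p \sim 2L\,\frac{\sqrt{M}-1}{\sqrt{M}\log_2(M)}\,\frac{\sqrt{M-1}}{\sqrt{6\pi\gamma}}\,e^{-\frac{3\gamma}{2(M-1)}}. \tag{4.14}$$
Using $C = \log_2(M)/T$ and (4.14) in the expansion (D.3) gives:
$$\mathrm{TCP}(C,p) \approx \frac{1}{4T}\left(3\log_2(M) - \frac{\eta}{T^2}\,\frac{(\sqrt{M}-1)\sqrt{M-1}}{\sqrt{M}\log_2(M)}\,e^{-\frac{3\gamma}{2(M-1)}}\right) \tag{4.15}$$
with $\eta$ a constant that depends on $R$, $L$ and $\gamma$ but not on $M$ or $T$. Differentiating with respect to $M$, we get that the optimal $M$ solves
$$\frac{1}{4T}\left(\frac{3}{M\log(2)} - \frac{\eta}{T^2}\,e^{-\frac{3\gamma}{2(M-1)}}\,k(M)\right) = 0$$
with
$$k(M) = \frac{3\gamma}{2(M-1)^2}\,\frac{(\sqrt{M}-1)\sqrt{M-1}}{\sqrt{M}\log_2(M)} + \frac{\sqrt{M-1}}{2M\log_2(M)} + \frac{\sqrt{M}-1}{2\sqrt{M-1}\,\sqrt{M}\log_2(M)} - \frac{(\sqrt{M}-1)\sqrt{M-1}}{M\log_2^2(M)}\left(\frac{\log_2(M)}{2\sqrt{M}} + \frac{1}{\sqrt{M}\log(2)}\right).$$
Hence, the optimal $M$ solves
$$e^{-\frac{3\gamma}{2(M-1)}} = \frac{3T^2}{\log(2)\,\eta\,M\,k(M)}. \tag{4.16}$$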
Consider the case where $T = 10^{-7}$, $R = .1$, and $L = 1500$. In this case, we get $M^* = 7$ for $\gamma = 60$ and $M^* = 14$ for $\gamma = 200$. In Figure 4, we plot the mean TCP throughput as a function of $M$ for $\gamma = 60$ and $\gamma = 200$. We see that this analysis predicts the optimal value of $M$ closely.
FIG. 4. Mean TCP throughput as a function of the constellation size $M$ based on (4.9).
5. Assignment of SINR values. In this section, we first recall classical results on the downlink of a cellular CDMA network in §5.1. Based on this, we propose in §5.2 an assignment of SINR values (Equation (5.4) below) to the different users, that takes into account their interactions in the network. These SINR values are hence those to be used in e.g. Equation (4.12) when considering the optimization of such a CDMA framework.
Suppose there are multiple users on the downlink of a CDMA cellular system, where the users may be associated with different base stations. The signal transmitted for the $i$-th user at the associated base station is denoted by $s_i(t)$ and is given by
$$s_i(t) = \sqrt{P_i}\,p_i(t)\,b_i(t),$$
where $p_i(t)$ is called the spreading code, $b_i(t)$ is called the data signal, and $P_i$ is a constant. The spreading code $p_i(t)$ takes on values in $\{-1, 1\}$ and is constant over intervals of duration $T_c$. Specifically,
$$p_i(t) = \sum_{k=-\infty}^{\infty} c_k^i\,u(t/T_c - k),$$
where $u(x) = 1$ if $0 < x < 1$ and $u(x) = 0$ otherwise, and for each user $i$, the elements of the sequence $\{c_k^i\}_{k=-\infty}^{\infty}$ are either $+1$ or $-1$. The constant $T_c$ is called the chip duration and is constant across users. The data signal $b_i(t)$ also takes its values in $\{-1, 1\}$. For the $i$-th user, we assume that each bit to be sent is repeated over $m_i$ chips, where $m_i$ is an integer. Specifically, we have
$$b_i(t) = \sum_{n=-\infty}^{\infty} b_n^i\,u\!\left(\frac{t}{m_iT_c} - n\right).$$
The quantity $m_i$ is called the processing gain for user $i$. The data rate for user $i$ is thus $1/(m_iT_c)$. Note that $(s_i(t))^2 = P_i$, and so the parameter $P_i$ is called the transmission power for the $i$-th user. Typically either pseudo-random or known deterministic sequences are used to define the spreading codes. For purposes of analysis here we assume that $\mathbb{P}\{c_k^i = 1\} = \mathbb{P}\{c_k^i = -1\} = 1/2$ for all $i, k$, and that $c_k^i$ is independent of $c_{k'}^{i'}$ if $i \ne i'$ or $k \ne k'$. We shall also assume that the data bits are random and independent, i.e. $\mathbb{P}\{b_n^i = 1\} = \mathbb{P}\{b_n^i = -1\} = 1/2$ for all $i, n$, and that $b_n^i$ is independent of $b_{n'}^{i'}$ if $i \ne i'$ or $n \ne n'$.
We assume a so-called flat fading (i.e. frequency non-selective) channel model. Let $\sqrt{g_{ki}}$ be the signal path gain from the base station associated with user $k$ to the location of user $i$. For example, the useful signal at user $i$ is given by $\sqrt{g_{ii}}\,s_i(t)$. However, the signal intended for another user $k$, namely $\sqrt{g_{ki}}\,s_k(t)$, also arrives at the location of user $i$, with possibly a time shift reflecting the different distances between base stations and users. In addition, an external white Gaussian noise signal $n_i(t)$ is also present at the receiver for user $i$, with two-sided power spectral density $N_0$. The total signal at the receiver of user $i$ is
$$r_i(t) = \sqrt{g_{ii}}\,s_i(t) + \sum_{k:k \ne i} \sqrt{g_{ki}}\,s_k(t - a_k^i) + n_i(t).$$
The numbers $a_k^i$ characterize the propagation delays between transmitters and receivers.
We shall assume that $a_k^i = 0$ for all users $k$ which are associated with the base station that user $i$ is associated with. This is because the associated signals from such users $k$ travel along the same path as the signal from user $i$. For other values of $k$, we shall assume that $a_k^i$ is a random variable uniformly distributed between 0 and $m_iT_c$. In general the values of $a_k^i$ could be larger than $m_iT_c$, but since we assume that all chips and data bits are random, there should be no loss of generality in this assumption.
5.1. Probability of a bit error. Consider the receiver at user $i$, in particular the operation of decoding the data bit represented by $b_0^i$. The receiver first correlates the incoming received signal with the spreading code for user $i$, namely $p_i(t)$, and integrates the output of the correlator over the interval $[0, m_iT_c]$. The output of the integrator is $Z_0^i$, where
$$\begin{aligned}
Z_0^i &= \int_0^{m_iT_c} p_i(t)\,r_i(t)\,dt \\
&= \int_0^{m_iT_c} p_i(t)\Big(\sqrt{g_{ii}}\,s_i(t) + \sum_{k:k \ne i} \sqrt{g_{ki}}\,s_k(t - a_k^i) + n_i(t)\Big)\,dt \\
&= b_0^i\sqrt{g_{ii}P_i}\int_0^{m_iT_c} (p_i(t))^2\,dt + \sum_{k:k \ne i} \sqrt{g_{ki}P_k}\int_0^{m_iT_c} p_i(t)\,p_k(t - a_k^i)\,b_k(t - a_k^i)\,dt + Y_0^i \\
&= b_0^i\sqrt{g_{ii}P_i}\,m_iT_c + \sum_{k:k \ne i} \sqrt{g_{ki}P_k}\,\rho_{i,k} + Y_0^i,
\end{aligned}$$
where
$$Y_0^i = \int_0^{m_iT_c} p_i(t)\,n_i(t)\,dt$$
and
$$\rho_{i,k} = \int_0^{m_iT_c} p_i(t)\,p_k(t - a_k^i)\,b_k(t - a_k^i)\,dt.$$
The first term above, $b_0^i\sqrt{g_{ii}P_i}\,m_iT_c$, can be thought of as the "signal component"; the second term, $\sum_{k:k \ne i} \sqrt{g_{ki}P_k}\,\rho_{i,k}$, is the "interference," and the last term $Y_0^i$ is the "noise". The receiver decides whether $b_0^i = 1$ or $b_0^i = -1$ on the basis of whether $Z_0^i > 0$ or $Z_0^i < 0$, respectively. The probability of a bit error for user $i$, $\mathrm{BER}_i$, is therefore
$$\mathrm{BER}_i = \mathbb{P}\Big\{\sum_{k:k \ne i} \sqrt{g_{ki}P_k}\,\rho_{i,k} + Y_0^i > \sqrt{g_{ii}P_i}\,m_iT_c\Big\}.$$
Note that $Y_0^i$ is a Gaussian random variable with zero mean and variance $N_0m_iT_c$. Using the central limit theorem, we approximate the interference term as being Gaussian as well, and independent of the noise term. In Appendix A we show that the interference term, $\sum_{k:k \ne i} \sqrt{g_{ki}P_k}\,\rho_{i,k}$, has variance $(2/3)\sum_{k:k \ne i} g_{ki}P_km_i(T_c)^2$. The sum of the interference and noise term therefore has zero mean and variance
$$\Big(\sum_{k:k \ne i} (2/3)\,g_{ki}P_km_i(T_c)^2\Big) + N_0m_iT_c.$$
Approximating the interference terms as Gaussian, a standard analysis yields that the probability of a bit error, $\mathrm{BER}_i$, is
$$\mathrm{BER}_i = Q(\sqrt{m_i\Gamma_i}), \tag{5.1}$$
where
$$\Gamma_i = \frac{g_{ii}P_i}{N_0/T_c + (2/3)\sum_{k:k \ne i} g_{ki}P_k}. \tag{5.2}$$
The quantity $m_i\Gamma_i$ is known as the "$E_b/N_0$", or energy per bit per noise power density.
5.2. SINR allocation. Due to mobility of users, the gain values $\{g_{ki}\}$ change with time. We assume that a closed loop power control algorithm is used, to vary the power values $\{P_i\}$ to maintain the SINR values $\{\Gamma_i\}$ at prescribed values. Define the SINR vector $\Gamma(P) = [\Gamma_1, \Gamma_2, \dots, \Gamma_N]^T$. We say that an SINR target vector $\gamma$ is feasible if there exists a set of non-negative power values $\{P_i\}$ such that $\Gamma(P) = \gamma$. If $\gamma$ is feasible, then the power control algorithm sets the transmission powers $\{P_i\}$ accordingly to achieve the target SINR vector $\gamma$. Next we examine the feasibility condition for a target SINR vector. If $\Gamma(P) \ge \gamma$, then for all $i$ we have
$$\frac{g_{ii}P_i}{\sigma_i^2 + (2/3)\sum_{k:k \ne i} g_{ki}P_k} \ge \gamma_i,$$
where $\sigma_i^2 = N_0/T_c$. Equivalently, we have
$$P_i - (2/3)\gamma_i \sum_{k:k \ne i} (g_{ki}/g_{ii})P_k \ge \gamma_i\sigma_i^2/g_{ii},$$
or in matrix notation, $P - FP \ge b$, where $F = \{F_{i,j}\}$ is an $N \times N$ matrix with $F_{i,i} = 0$ and $F_{i,j} = (2/3)\gamma_i(g_{ji}/g_{ii})$ if $i \ne j$, and
$$b = [\gamma_1\sigma_1^2/g_{11},\ \gamma_2\sigma_2^2/g_{22},\ \dots,\ \gamma_N\sigma_N^2/g_{NN}]^T.$$
There exists a non-negative finite $P$ satisfying the above, if and only if the spectral radius of the matrix $F$ is less than unity. In this case, the minimal $P$ satisfying the above is $P^* = (I - F)^{-1}b$. The minimal power vector $P^*$ can be found by an iterative distributed algorithm. There may be additional constraints on the power vector, e.g. there is generally a peak power constraint for each base station. A simple sufficient condition for the spectral radius of $F$ to be less than $\eta$ is that each row sum of $F$ is less than $\eta$. In general, we can set $\eta$ strictly less than 1 as a safety margin, as suggested in [1]. Setting $\eta = 1$, $\gamma$ is feasible if for all $i$ we have
$$(2/3)\gamma_i \sum_{k:k \ne i} (g_{ki}/g_{ii}) \le 1. \tag{5.3}$$
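The fixed point $P^* = (I - F)^{-1}b$ can be reached by repeating the update $P \leftarrow FP + b$, which converges whenever the spectral radius of $F$ is below one. The sketch below is a centralized illustration of that iteration, not the distributed algorithm itself; the three-user gain matrix, the noise level and the function names are made-up values for this example, with `g[k][i]` standing for $g_{ki}$.

```python
def build_system(gamma, g, sigma2):
    """Return (F, b) with F[i][j] = (2/3) gamma_i g_ji / g_ii and b_i = gamma_i sigma_i^2 / g_ii."""
    n = len(gamma)
    F = [[0.0 if i == j else (2.0 / 3.0) * gamma[i] * g[j][i] / g[i][i]
          for j in range(n)] for i in range(n)]
    b = [gamma[i] * sigma2[i] / g[i][i] for i in range(n)]
    return F, b

def row_sums_below(F, eta=1.0):
    """Sufficient feasibility test: every row sum of F is strictly below eta."""
    return all(sum(row) < eta for row in F)

def minimal_power(F, b, iterations=500):
    """Iterate P <- F P + b starting from zero (converges if spectral radius < 1)."""
    n = len(b)
    P = [0.0] * n
    for _ in range(iterations):
        P = [b[i] + sum(F[i][j] * P[j] for j in range(n)) for i in range(n)]
    return P

if __name__ == "__main__":
    g = [[1.0, 0.1, 0.2], [0.1, 1.0, 0.1], [0.2, 0.1, 1.0]]  # hypothetical gains
    F, b = build_system([0.5, 0.5, 0.5], g, [0.01, 0.01, 0.01])
    print(row_sums_below(F), minimal_power(F, b))
```

Starting from zero, the iterates increase monotonically towards $P^*$, which is why the limit is the minimal power vector.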
Note that $g_{ki}/g_{ii} = 1$ for all users $k$ that are associated with the same base station that user $i$ is. Furthermore, the value of $g_{ki}/g_{ii}$ is the same for all users $k$ which are associated with the same base station. Suppose there are $B$ base stations. Let $b_i$ be the base station associated with user $i$. Assuming $b \ne b_i$, define $\alpha_b(i) = g_{ki}/g_{ii}$, where $k$ is such that user $k$ is associated with base station $b$. Let $N_b$ be the number of users associated with base station $b$. Defining $N_i^{int} = \sum_{k:k \ne i} (g_{ki}/g_{ii})$, we thus have
$$N_i^{int} = N_{b_i} - 1 + \sum_{b:b \ne b_i} \alpha_b(i)N_b.$$
We call $N_i^{int}$ the effective number of interfering users for user $i$. The attenuation factor $\alpha_b(i)$ can be measured by user $i$ by comparing the power received in pilot tones from base station $b$ and its assigned base station. The values of $N_b$ can be considered to be slowly varying, and reported directly to the base station associated with a given user $i$. If each user $i$ reports the measured value of $\alpha_b(i)$ to its associated base station for all $b$, then the value of $N_i^{int}$ is known to the base station associated with user $i$, $b_i$. Hence base station $b_i$ can calculate an appropriate value for the target SINR $\gamma_i$. In particular, we can set
$$\gamma_i = \frac{1.5\,\eta}{N_i^{int}}, \tag{5.4}$$
where $\eta$ is a parameter set to a number strictly less than unity, as a safety factor. Note that in general, since the matrix $F$ varies with time, the feasibility of a set of target SINR values also changes with time. Another simple sufficient condition for the spectral radius of $F$ to be less than $\eta$ is that each column sum is less than $\eta$, i.e. for each $j \in [1, N]$ we have
$$(2/3) \sum_{k:k \ne j} \gamma_k(g_{jk}/g_{kk}) < 1. \tag{5.5}$$
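The allocation rule (5.4) amounts to a two-line computation once the attenuation factors and user counts are known. The sketch below computes the effective number of interfering users and the resulting target SINR; the function names, the two-station scenario and the default safety factor are assumptions chosen only for illustration.

```python
def effective_interferers(own_station, users_per_station, attenuation):
    """N_int = N_{b_i} - 1 + sum over other stations b of alpha_b(i) * N_b."""
    n_int = users_per_station[own_station] - 1
    for b, n_b in enumerate(users_per_station):
        if b != own_station:
            n_int += attenuation[b] * n_b
    return n_int

def target_sinr(n_int, eta=0.9):
    """Target SINR 1.5 * eta / N_int, with eta < 1 acting as a safety factor."""
    return 1.5 * eta / n_int

if __name__ == "__main__":
    # Hypothetical scenario: 10 users in the own cell, 8 in a neighbour seen at 25% gain.
    n_int = effective_interferers(0, [10, 8], [1.0, 0.25])
    print(n_int, target_sinr(n_int))
```

By construction the resulting target makes the row sum $(2/3)\gamma_i N_i^{int}$ equal to $\eta$, so the row-sum condition holds with the chosen margin.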
k:k=/=j Unlike (5.3), the condition (5.5) specifies an explicit coupling between the feasible SINR values of the different users. In summary, we a set of target SINR values for the users is specified by the vector 1. A target vector 1 is feasible if and only if the spectral radius of F is less than one. Alternatively, we can use either (5.3) or (5.5) as the basis for allocating target SINR values 1i to the users. 6. Simulation. The aim of this section is to analyze the effect of a buffer by simulation. We simulate the case of one base station with a buffer that is shared between N TCP users on the downlink. The general setting is that of §4.1.1, with no FEC. The assignment of SINR targets is that of §5: all users having the same SINR target since there is only one base station. We used the hybrid simulator Netscale [7]. More precisely, the simulated dynamics is as follows: as in the mathematical model, each TCP flow is persistent (i.e. always has packets to download); it evolves according to either the slow start or the congestion avoidance phase. In the congestion avoidance phase, it increases its window size W(t) of 1 unit every time
38
FRANQOIS BACCELLI, RENE L. CRUZ, AND ANTONIO NUCCI TABLE 1 Comparison of TCP Formulas with Simulation Results.
TCP Formulas TCP Simulation
Optimal processing gain
Throughput
17 18
155 167
W(t) acks and halves it at each packet loss. Losses occur due to either transmission errors (with probability p as defined in (4.3)) or congestion. In the simulation, if the window size is less than 3 or if several losses occur in the same RTT, the flow may not recover from losses: after some inactive waiting period, a time-out is triggered and the flow restarts in slow-start phase (see [13]). The main difference with the dynamics described above is that congestion losses now stem from the interaction of the N flows via the shared buffer. When one or more flows exceed their transmission capacity C, as defined in (4.1), the shared buffer starts filling in and may eventually overflow, creating congestion losses for these flows and possibly for the other flows as well. In the simulations, we used a size of 200kb for the shared buffer. The minimal value of the RTT is R ==0.5 s. (in the simulation, the RTT increases when the buffer fills in); the packet size is L==2kb and the max window size (not used in the mathematical model but taken into account in the simulation) is of 240 packets. In all cases (see e.g. Figure 5), we again observe a sharp maximum when plotting the long term average TCP rate of one flow in function of the common value chosen for the processing gains of all flows. For N == 2 users, Table 1 gives the optimal processing gains and the corresponding throughput per user (in kbps) as obtained by our mathematical model of Section 4 and by this simulation. Figure 5 plots the UDP and TCP goodput in function of the processing gain in the case with N == 10 users. We observe that in both cases, the optimal gain is slightly larger than what is predicted by theory. This could be explained by the fact that in the simulations, the throughput degradation after losses can be higher, because of time-outs. 
However our simulations suggest that the presence of a shared buffer does not modify our global conclusions that the maximum TCP throughput is achieved by a precise tuning of the processing gain which is quite different from that for a UDP flow.
FIG. 5. N = 10: TCP and UDP goodput as a function of the processing gain by Netscale simulation (in blue) and according to our mathematical model (in green). (1 cell, 10 terminals; curves: UDP goodput, TCP model goodput, TCP measured goodput.)
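APPENDIX
A. Variance of interference. To calculate the variance of the interference, it suffices to show that $\rho_{i,k}$ has variance $(2/3)m_i(T_c)^2$. Note that $\rho_{i,k}$ is a zero mean random variable. Since the statistics of $p_i(t)\,p_k(t - a_k^i)\,b_k(t - a_k^i)$ are identical to that of $p_k(t - a_k^i)$, we have that $E[(\rho_{i,k})^2]$ is
$$E\Big[\Big(\int_0^{m_iT_c} p_k(t - a_k^i)\,dt\Big)^2\Big].$$
Hence, writing $a$ for the delay modulo $T_c$, which is uniformly distributed between 0 and $T_c$,
$$\begin{aligned}
E\Big[\Big(\int_0^{m_iT_c} p_i(t)\,p_k(t - a_k^i)\,b_k(t - a_k^i)\,dt\Big)^2\Big]
&= m_i E\Big[\Big(\int_0^{T_c} p_k(t - a_k^i)\,dt\Big)^2\Big] \\
&= m_i E\Big[\Big(\int_0^{T_c} p_k(t - a)\,dt\Big)^2\Big] \\
&= m_i E\big[(a\,c_{-1}^k + (T_c - a)\,c_0^k)^2\big] \\
&= m_i\big(E[a^2] + E[(T_c - a)^2]\big) \\
&= 2m_i E[a^2] \\
&= 2m_i (1/T_c) \int_0^{T_c} x^2\,dx \\
&= (2/3)\,m_i T_c^2.
\end{aligned}$$
Therefore, the variance of the interference is
$$\sum_{k:k \ne i} (2/3)\,g_{ki}\,m_i (T_c)^2 P_k.$$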
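The per-chip second moment used in this calculation, $E[(a\,c_{-1} + (T_c - a)\,c_0)^2] = (2/3)T_c^2$, is easy to check by Monte Carlo simulation. The sketch below draws a uniform offset and two independent random chips; the function name, sample size and seed are arbitrary choices for this example.

```python
import random

def chip_overlap_second_moment(Tc=1.0, samples=200000, seed=0):
    """Estimate E[(a * c_prev + (Tc - a) * c_cur)^2] for a ~ U(0, Tc) and random +/-1 chips."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        a = rng.uniform(0.0, Tc)          # chip offset within one chip period
        c_prev = rng.choice((-1, 1))      # interfering chip before the boundary
        c_cur = rng.choice((-1, 1))       # interfering chip after the boundary
        value = a * c_prev + (Tc - a) * c_cur
        total += value * value
    return total / samples

if __name__ == "__main__":
    print(chip_overlap_second_moment(), 2.0 / 3.0)
```

The cross term vanishes on average because the two chips are independent and zero mean, which leaves $E[a^2] + E[(T_c - a)^2]$.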
B. Proof of Theorem 1. Let $f_t(x)$ be the probability density function for $X(t)$. We wish to examine how the density $f_t$ evolves over time, and obtain the steady state density if it exists. For $x < C/2$ we have that $f_t(x)dx$ is
$$\begin{aligned}
\mathbb{P}\{X(t) \in [x, x + dx]\} &= \mathbb{P}\{X(t - dt) \in [x - (L/R^2)dt,\ x - (L/R^2)dt + dx]\}\,(1 - (p/L)x\,dt) \\
&\quad + \mathbb{P}\{X(t - dt) \in [2x, 2x + 2dx]\}\,(p/L)2x\,dt \\
&= f_{t-dt}(x - (L/R^2)dt)\,(1 - (p/L)x\,dt)\,dx + f_{t-dt}(2x)\,(p/L)2x \cdot 2\,dx\,dt.
\end{aligned}$$
Rewriting this, we have
$$f_t(x) = f_{t-dt}(x - (L/R^2)dt)\,(1 - (p/L)x\,dt) + f_{t-dt}(2x)\,(p/L)4x\,dt$$
or
$$f_t(x) - f_{t-dt}(x - (L/R^2)dt) = -f_{t-dt}(x - (L/R^2)dt)\,(p/L)x\,dt + f_{t-dt}(2x)\,(p/L)4x\,dt.$$
Thus,
$$\frac{f_t(x) - f_{t-dt}(x) + f_{t-dt}(x) - f_{t-dt}(x - (L/R^2)dt)}{dt} = -f_{t-dt}(x - (L/R^2)dt)\,(p/L)x + f_{t-dt}(2x)\,(p/L)4x.$$
Letting $dt$ and $dx$ approach zero, we obtain the following equation (à la Fokker-Planck)
$$\frac{\partial f_t(x)}{\partial t} + (L/R^2)\frac{\partial f_t(x)}{\partial x} = -(p/L)x f_t(x) + 4(p/L)x f_t(2x). \tag{B.1}$$
Similarly, for $x > C/2$ we have that $f_t(x)dx$ is
$$\begin{aligned}
\mathbb{P}\{X(t) \in [x, x + dx]\} &= \mathbb{P}\{X(t - dt) \in [x - (L/R^2)dt,\ x - (L/R^2)dt + dx]\}\,(1 - (p/L)x\,dt) \\
&= f_{t-dt}(x - (L/R^2)dt)\,(1 - (p/L)x\,dt)\,dx.
\end{aligned}$$
Rewriting this, we have
$$f_t(x) = f_{t-dt}(x - (L/R^2)dt)\,(1 - (p/L)x\,dt)$$
or
$$f_t(x) - f_{t-dt}(x - (L/R^2)dt) = -f_{t-dt}(x - (L/R^2)dt)\,(p/L)x\,dt.$$
Thus,
$$\frac{f_t(x) - f_{t-dt}(x) + f_{t-dt}(x) - f_{t-dt}(x - (L/R^2)dt)}{dt} = -f_{t-dt}(x - (L/R^2)dt)\,(p/L)x.$$
Letting $dt$ and $dx$ approach zero, we obtain
$$\frac{\partial f_t(x)}{\partial t} + (L/R^2)\frac{\partial f_t(x)}{\partial x} = -(p/L)x f_t(x). \tag{B.2}$$
We assume that $f_t(x) \to f(x)$ as $t \to \infty$, i.e. that $f$ is the steady state density of $X(t)$. From the above, setting $\frac{\partial f_t(x)}{\partial t} = 0$, it follows that $f$ satisfies the differential equation
$$\frac{df(x)}{dx} = \begin{cases} -\alpha x f(x) & \text{if } C/2 < x < C \\ -\alpha x f(x) + 4\alpha x f(2x) & \text{if } 0 \le x < C/2, \end{cases} \tag{B.3}$$
where $\alpha = pR^2/L^2$, which proves (3.3). For $C/2 < x < C$, note that a solution to the equation $\frac{df(x)}{dx} = -\alpha x f(x)$ is given by
$$f(x) = V_0 \exp(-\alpha x^2/2) \quad \text{for } x \in (C/2, C),$$
where $V_0$ is any constant. It is interesting to note that $f$ has a Gaussian shape. For $C/4 < x < C/2$, we therefore have
$$\frac{df(x)}{dx} = -\alpha x f(x) + 4V_0\alpha x \exp(-4\alpha x^2/2).$$
The solution to this is
$$f(x) = V_1 \exp(-\alpha x^2/2) + V_0(-4/3) \exp(-4\alpha x^2/2), \quad \text{for } x \in (C/4, C/2),$$
where $V_1$ is any constant. For $x \in (C/8, C/4)$, $f(x)$ must therefore satisfy
$$\frac{df(x)}{dx} = -\alpha x f(x) + V_1 4\alpha x \exp(-4\alpha x^2/2) + V_0(-4/3)4\alpha x \exp(-16\alpha x^2/2).$$
A solution to this is
$$f(x) = V_2 \exp(-\alpha x^2/2) + V_1(-4/3) \exp(-4\alpha x^2/2) + V_0(-4/3)(-4/15) \exp(-16\alpha x^2/2), \quad \text{for } x \in (C/8, C/4),$$
where $V_2$ is any constant. The general solution can be found by induction to be
$$f(x) = V_n \exp(-\alpha x^2/2) + \sum_{l=1}^{n} V_{n-l} \left[\prod_{k=1}^{l} \left(\frac{-4}{2^{2k} - 1}\right)\right] \exp(-2^{2l}\alpha x^2/2),$$
for $x \in (C/2^{n+1}, C/2^n)$, which proves (3.5). Next we find the value of the constants $V_0, V_1, V_2, \dots$. Note that $f_t(C/2 + dx)dx$ is
$$\begin{aligned}
\mathbb{P}\{X(t) \in [C/2, C/2 + dx]\} &= \mathbb{P}\Big\{X(t - dt) \in \Big[C/2 - \tfrac{L}{R^2}dt,\ C/2 - \tfrac{L}{R^2}dt + dx\Big]\Big\}\,(1 - (p/L)(C/2)\,dt) \\
&\quad + \mathbb{P}\Big\{X(t - dt) \in \Big[C - \tfrac{L}{R^2}dt,\ C - \tfrac{L}{R^2}dt + dx\Big]\Big\}\,(1 - (p/L)C\,dt) \\
&= f_{t-dt}\Big(C/2 - \tfrac{L}{R^2}dt\Big)(1 - (p/L)(C/2)\,dt)\,dx + f_{t-dt}\Big(C - \tfrac{L}{R^2}dt\Big)(1 - (p/L)C\,dt)\,dx.
\end{aligned}$$
Letting $dx$ and $dt$ approach zero, and $t \to \infty$, we obtain
$$f((C/2)^+) = f((C/2)^-) + f(C^-), \tag{B.4}$$
which proves (3.4). This can be used to find a relationship between $V_0$ and $V_1$: using (B.4), we have
$$V_0 \exp(-\alpha C^2/8) = V_1 \exp(-\alpha C^2/8) + V_0(-4/3) \exp(-\alpha C^2/2) + V_0 \exp(-\alpha C^2/2).$$
Thus,
$$V_1 \exp(-\alpha C^2/8) = V_0 \exp(-\alpha C^2/8) + (1/3)V_0 \exp(-\alpha C^2/2),$$
or equivalently VI == Va(l For n
+ (1/3) exp( -30:C2 /8)), which is equation (3.6).
> 1, ft(C/2 n + dx)dx is
Letting dt approach zero, and t
f((C/2 n ) -
---t
00,
we obtain f((C/2 n )+ )
).
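Thus we have
\[ V_n \exp\big(-\alpha (C/2^n)^2/2\big) + \sum_{l=1}^{n} V_{n-l} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2l}\alpha (C/2^n)^2/2\big) = V_{n-1} \exp\big(-\alpha (C/2^n)^2/2\big) + \sum_{l=1}^{n-1} V_{n-1-l} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2l}\alpha (C/2^n)^2/2\big). \]
Therefore we have that $V_n \exp(-\alpha (C/2^n)^2/2)$ is equal to
\[ \begin{aligned} & V_{n-1}\Big[\exp\big(-\alpha (C/2^n)^2/2\big) + (4/3)\exp\big(-4\alpha (C/2^n)^2/2\big)\Big] \\ &\quad - \sum_{l=2}^{n} V_{n-l} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2l}\alpha (C/2^n)^2/2\big) + \sum_{l=1}^{n-1} V_{n-l-1} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2l}\alpha (C/2^n)^2/2\big) \\ ={}& V_{n-1}\Big[\exp\big(-\alpha (C/2^n)^2/2\big) + (4/3)\exp\big(-4\alpha (C/2^n)^2/2\big)\Big] \\ &\quad - \sum_{l=2}^{n} V_{n-l} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2l}\alpha (C/2^n)^2/2\big) + \sum_{l=2}^{n} V_{n-l} \Big[\prod_{j=1}^{l-1} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2(l-1)}\alpha (C/2^n)^2/2\big), \end{aligned} \]
which is equal to
\[ V_{n-1}\Big[\exp\big(-\alpha (C/2^n)^2/2\big) + (4/3)\exp\big(-4\alpha (C/2^n)^2/2\big)\Big] + \sum_{l=2}^{n} V_{n-l} \Big[\prod_{j=1}^{l-1} \frac{-4}{2^{2j}-1}\Big] \exp\big(-2^{2(l-1)}\alpha (C/2^n)^2/2\big) \Big[1 + \frac{4}{2^{2l}-1}\exp\big(-(3)2^{2(l-1)}\alpha (C/2^n)^2/2\big)\Big]. \]
Thus we have
\[ V_n = \sum_{l=1}^{n} V_{n-l} \Big[\prod_{j=1}^{l-1} \frac{-4}{2^{2j}-1}\Big] \exp\big(-(2^{2(l-1)}-1)\alpha (C/2^n)^2/2\big) \Big[1 + \frac{4}{2^{2l}-1}\exp\big(-(3)2^{2(l-1)}\alpha (C/2^n)^2/2\big)\Big], \]
which is equation (3.7).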
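As a numerical sanity check of this recursion, the constants can be computed directly and the resulting piecewise density tested for continuity at $C/2^n$, $n > 1$, and for the jump at $C/2$. The following Python sketch is only illustrative: the function names are hypothetical, the values of $\alpha$ and $C$ used to exercise it are arbitrary, and $V_0$ is left unnormalized (set to 1) rather than fixed by the normalization condition.

```python
import math


def prod_coeff(l):
    """P_l = prod_{j=1}^{l} (-4 / (2^(2j) - 1)); the empty product P_0 is 1."""
    out = 1.0
    for j in range(1, l + 1):
        out *= -4.0 / (4 ** j - 1)
    return out


def constants(alpha, C, n_max, V0=1.0):
    """Return [V_0, ..., V_{n_max}]: V_1 from (3.6), V_n for n > 1 from (3.7)."""
    V = [V0, V0 * (1.0 + math.exp(-3.0 * alpha * C * C / 8.0) / 3.0)]
    for n in range(2, n_max + 1):
        y = alpha * (C / 2 ** n) ** 2 / 2.0
        total = 0.0
        for l in range(1, n + 1):
            total += (V[n - l] * prod_coeff(l - 1)
                      * math.exp(-(4 ** (l - 1) - 1) * y)
                      * (1.0 + 4.0 / (4 ** l - 1)
                         * math.exp(-3.0 * 4 ** (l - 1) * y)))
        V.append(total)
    return V


def density(x, V, alpha, C):
    """Unnormalized f(x) from (3.5) for x in (C/2^(n+1), C/2^n)."""
    n = 0
    while x <= C / 2 ** (n + 1):
        n += 1
    if n >= len(V):
        raise ValueError("x is too small for the number of constants computed")
    y = alpha * x * x / 2.0
    # The l = 0 term is V_n * exp(-alpha x^2 / 2), since P_0 = 1 and 4^0 = 1.
    return sum(V[n - l] * prod_coeff(l) * math.exp(-(4 ** l) * y)
               for l in range(0, n + 1))
```

Evaluating `density` just to the left and right of $C/4$, $C/8$, ... should give matching values, while the difference across $C/2$ should equal the value just below $C$.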
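To find the constants $\{V_n\}$ we will also need the following equation:
\[ \begin{aligned} 1 = \int_0^C f(x)\,dx &= \int_{C/2}^{C} f(x)\,dx + \sum_{n=1}^{\infty} \int_{C/2^{n+1}}^{C/2^n} f(x)\,dx \\ &= V_0 \int_{C/2}^{C} \exp(-\alpha x^2/2)\,dx + \sum_{n=1}^{\infty} \int_{C/2^{n+1}}^{C/2^n} \Big[ V_n \exp(-\alpha x^2/2) + \sum_{l=1}^{n} V_{n-l} \Big[\prod_{j=1}^{l} \frac{-4}{2^{2j}-1}\Big] \exp(-2^{2l}\alpha x^2/2) \Big]\,dx, \end{aligned} \]
which proves (3.8). The proof of (3.9) is along the same lines.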
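C. Proof of Theorem 3.2 (Mean TCP throughput). We have
\[ \frac{df(x)}{dx} = \begin{cases} -\alpha x f(x) & \text{if } C/2 < x < C, \\ -\alpha x f(x) + 4\alpha x f(2x) & \text{if } 0 \le x < C/2, \end{cases} \tag{C.1} \]
where $\alpha = pR^2/L^2$. The function $f$ is differentiable everywhere but at $C/2$, where it is differentiable on the right and on the left, with a jump such that
\[ f((C/2)^+) = f((C/2)^-) + f(C^-). \tag{C.2} \]
Note that $f(x) = 0$ for $x \notin [0, C]$. Let $\hat f(u)$ be the Mellin transform of $f(x)$:
\[ \hat f(u) = \int_0^{\infty} f(x)\,x^{u-1}\,dx \tag{C.3} \]
with $u \ge 1$. Multiplying both sides of equation (C.1) by $x^u$ and integrating w.r.t. $x$, we get
\[ f(C^-)\,C^u - f((C/2)^+)\,(C/2)^u - u \int_{C/2}^{C} f(x)\,x^{u-1}\,dx = -\alpha \int_{C/2}^{C} f(x)\,x^{u+1}\,dx \tag{C.4} \]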
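and
\[ f((C/2)^-)\,(C/2)^u - u \int_0^{C/2} f(x)\,x^{u-1}\,dx = -\alpha \int_0^{C/2} f(x)\,x^{u+1}\,dx + 4\alpha \int_0^{C/2} f(2x)\,x^{u+1}\,dx. \tag{C.5} \]
Adding (C.4) and (C.5) and using (C.2), we obtain
\[ u \hat f(u) = (1 - 2^{-u})\big( f(C^-)\,C^u + \alpha \hat f(u+2) \big). \tag{C.6} \]
Note the similarity of equation (C.6) to the well known equation $\Gamma(u+1) = u\Gamma(u)$, satisfied by the Gamma function. Motivated by this observation, we now express the unknown function $\hat f(\cdot)$ in terms of a new unknown function $g(\cdot)$ by defining
\[ \hat f(u) = g(u)\,\Gamma(u/2)\,\Big(\frac{2}{\alpha}\Big)^{u/2}. \tag{C.7} \]
Multiplying (C.6) by $(\alpha/2)^{u/2} / \big(2\Gamma(u/2+1)\big)$, one gets
\[ g(u) = (1 - 2^{-u})\big( \Psi(u) + g(u+2) \big), \tag{C.8} \]
with
\[ \Psi(u) = f(C^-)\,C^u \Big(\frac{\alpha}{2}\Big)^{u/2} \frac{1}{2}\,\frac{1}{\Gamma(u/2+1)}. \]
For all integers $l$ let
\[ \Pi_l(u) = \prod_{k=0}^{l} \big(1 - 2^{-u-2k}\big). \]
Note that the infinite product $\Pi_\infty(u)$ is well defined and non zero. The general solution of (C.8) is
\[ g(u) = \sum_{l \ge 0} \Pi_l(u)\,\Psi(u + 2l) + \Pi_\infty(u)\,h(u), \]
with $h(u)$ some finite periodic function with period 2. Hence
\[ \hat f(u) = \frac{f(C^-)}{2} \sum_{l \ge 0} \Pi_l(u)\,C^{u+2l} \Big(\frac{\alpha}{2}\Big)^{l} \frac{\Gamma(u/2)}{\Gamma(u/2+l+1)} + \Pi_\infty(u)\,h(u)\,\Gamma(u/2)\Big(\frac{2}{\alpha}\Big)^{u/2}. \]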
Since the density $f(x)$ has its support on $[0, C]$, the function $\hat f(u)$ grows at most as $C^u$ when $u$ tends to $\infty$. The growth of the function
\[ \frac{f(C^-)}{2} \sum_{l \ge 0} \Pi_l(u)\,C^{u+2l} \Big(\frac{\alpha}{2}\Big)^{l} \frac{\Gamma(u/2)}{\Gamma(u/2+l+1)} \]
is easily seen to be at most $C^u$. So if $h(\cdot)$ is non zero, the order of $\hat f(u)$ would be that of
\[ \Gamma(u/2)\Big(\frac{2}{\alpha}\Big)^{u/2}, \]
which is a contradiction with the fact that the order of this function is at most $C^u$ when $u$ tends to $\infty$. Hence $h(u) = 0$ necessarily, so that
\[ \hat f(u) = \frac{f(C^-)}{2} \sum_{l \ge 0} \Pi_l(u)\,C^{u+2l} \Big(\frac{\alpha}{2}\Big)^{l} \frac{\Gamma(u/2)}{\Gamma(u/2+l+1)}. \tag{C.9} \]
The constant $f(C^-)$ is determined by the relation
\[ 1 = \hat f(1) = \frac{f(C^-)}{2} \sum_{l \ge 0} \Pi_l(1)\,C^{1+2l} \Big(\frac{\alpha}{2}\Big)^{l} \frac{\Gamma(1/2)}{\Gamma(1/2+l+1)}. \]
We finally get the following expression for the Mellin transform of interest:
\[ \hat f(u) = \frac{\sum_{l \ge 0} \Pi_l(u)\,C^{u+2l} \big(\frac{\alpha}{2}\big)^{l} \frac{\Gamma(u/2)}{\Gamma(u/2+1+l)}}{\sum_{l \ge 0} \Pi_l(1)\,C^{1+2l} \big(\frac{\alpha}{2}\big)^{l} \frac{\Gamma(1/2)}{\Gamma(1/2+l+1)}}, \tag{C.10} \]
which proves (3.12). The mean TCP throughput is $TCP(C,p) = \hat f(2)$, which proves (3.13).

D. Proof of Corollary 3.1 (Approximation to Mean TCP throughput). We have
\[ TCP(C,p) = \hat f(2) = \frac{3C}{4}\,\frac{1 + \frac{15}{64}\,pR^2C^2/L^2 + o(pR^2C^2/L^2)}{1 + \frac{7}{24}\,pR^2C^2/L^2 + o(pR^2C^2/L^2)} = \frac{3C}{4} - \frac{pR^2C^3}{L^2}\,\frac{11}{256} + o(pR^2C^2/L^2)\,C. \tag{D.1} \]
Using the Maple software tool, we obtain the following second order expansion
\[ \hat f(2) = \frac{3C}{4} - \frac{pR^2C^3}{L^2}\,\frac{11}{256} - \frac{p^2R^4C^5}{L^4}\,\frac{497}{491520} + o(p^2R^4C^4/L^4)\,C. \tag{D.2} \]
This suggests using the approximation for the "few losses" case:
\[ \hat f(2) \approx \frac{3C}{4} - \frac{pR^2C^3}{L^2}\,\frac{11}{256}. \tag{D.3} \]
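The expansion coefficients can also be cross-checked without a computer algebra system. The sketch below is an illustration in Python (not the Maple computation mentioned in the text), with hypothetical function names: it expands the ratio of the two series defining $\hat f(2)$ in powers of $y = \alpha C^2 = pR^2C^2/L^2$ using exact rational arithmetic, relying on the identity $\Gamma(u/2)/\Gamma(u/2+l+1) = 1/\prod_{k=0}^{l}(u/2+k)$.

```python
from fractions import Fraction


def pi_l(u, l):
    """Pi_l(u) = prod_{k=0}^{l} (1 - 2^(-u-2k)), as an exact rational."""
    out = Fraction(1)
    for k in range(l + 1):
        out *= 1 - Fraction(1, 2 ** (u + 2 * k))
    return out


def gamma_ratio(u, l):
    """Gamma(u/2) / Gamma(u/2 + l + 1) = 1 / prod_{k=0}^{l} (u/2 + k)."""
    out = Fraction(1)
    for k in range(l + 1):
        out /= Fraction(u, 2) + k
    return out


def series(u, order):
    """Coefficients of y^l in sum_l Pi_l(u) (y/2)^l Gamma-ratio (C^u factored out)."""
    return [pi_l(u, l) * Fraction(1, 2 ** l) * gamma_ratio(u, l)
            for l in range(order + 1)]


def ratio_coeffs(order=2):
    """Coefficients q_k with f_hat(2) / C = q_0 + q_1 y + q_2 y^2 + ..."""
    num, den = series(2, order), series(1, order)
    q = []
    for k in range(order + 1):
        # Standard power-series division: num = q * den, solved term by term.
        acc = num[k] - sum(q[i] * den[k - i] for i in range(k))
        q.append(acc / den[0])
    return q
```

Calling `ratio_coeffs(2)` returns the zeroth, first and second order coefficients, which can be compared with $3/4$, $-11/256$ and $-497/491520$ in (D.1) and (D.2).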
E. Proof of Corollary 3.2 (sketch). From Stirling's formula, if $n$ is large,
\[ \frac{A^n}{n!} \sim \Big(\frac{Ae}{n}\Big)^{n} \frac{1}{\sqrt{2\pi n}}. \]
Hence
\[ \log\Big(\frac{A^n}{n!}\Big) \sim n\log(A) + n - n\log(n) - \frac{1}{2}\log(n) - \frac{1}{2}\log(2\pi). \]
For $A$ large, the last expression is maximized with respect to $n$ when $n$ satisfies
\[ \log(A) - \log(n) - \frac{1}{2n} = 0, \]
i.e., for $n$ close to $A$. If $\alpha C^2 = pR^2C^2/L^2$ is large, the last observation identifies the dominant term in the numerator of (3.13) and the dominant term in the denominator. Hence the ratio tends to
\[ \hat f(2) = \frac{1}{\sqrt{\alpha}}\,\sqrt{\frac{2}{\pi}}\,\frac{\Pi_\infty(2)}{\Pi_\infty(1)} = \frac{1.309}{\sqrt{p}\,R/L}. \]
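The numerical constant in this limit can be reproduced from the infinite products $\Pi_\infty(u) = \prod_{k \ge 0}(1 - 2^{-u-2k})$. The short Python sketch below truncates each product after a fixed number of factors; the truncation length and the function names are illustrative choices, and the factors approach 1 geometrically, so a few dozen terms are more than enough for double precision.

```python
import math


def pi_inf(u, terms=60):
    """Truncated infinite product Pi_inf(u) = prod_{k>=0} (1 - 2^(-u-2k))."""
    out = 1.0
    for k in range(terms):
        out *= 1.0 - 2.0 ** (-(u + 2 * k))
    return out


def large_loss_constant():
    """sqrt(2/pi) * Pi_inf(2) / Pi_inf(1), the constant multiplying 1/sqrt(alpha)."""
    return math.sqrt(2.0 / math.pi) * pi_inf(2) / pi_inf(1)
```

The returned value can be compared with the constant 1.309 quoted above.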
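F. Calculation of $\partial J/\partial\theta$ and $\partial J/\partial m$. We have
\[ \frac{\partial J}{\partial \theta} = \frac{\partial J}{\partial C}\,\frac{\partial C}{\partial \theta} + \frac{\partial J}{\partial p}\,\frac{\partial p}{\partial \theta}. \tag{F.1} \]
From (3.14), we have
\[ \frac{\partial J}{\partial p} \approx -\frac{R^2C^3}{L^2}\,\frac{11}{256}, \tag{F.2} \]
and
\[ \frac{\partial J}{\partial C} \approx \frac{3}{4} - p\,\frac{R^2C^2}{L^2}\,\frac{33}{256}. \tag{F.3} \]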
Now
(F.4) Next, we have
(_p)
Be _ ~ BO - BO
(F.5)
mTc
2 ) = ( mTc
20
log2 ( 1 - 20) .
(F.6)
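We thus have an equation expressing $\partial J/\partial\theta$ in terms of $m$ and $\theta$, say
\[ \frac{\partial J}{\partial \theta} = F_1(m, \theta), \tag{F.7} \]
where the function $F_1(m, \theta)$ is determined by substitution in the equations above. Next, we have
\[ \frac{\partial J}{\partial m} = \frac{\partial J}{\partial C}\,\frac{\partial C}{\partial m} + \frac{\partial J}{\partial p}\,\frac{\partial p}{\partial m}. \tag{F.8} \]
The quantities $\partial J/\partial C$ and $\partial J/\partial p$ were given above. We have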
Be
am
p
- m 2T
c
(F.9)
·
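In order to compute $\partial p/\partial m$, note that
\[ \frac{\partial p}{\partial m} = \frac{\partial p}{\partial q}\,\frac{\partial q}{\partial m}. \]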
We have
op ~ ~ [2N[h(O)+O log2(q)+(I-O) log2(I-Q)]] Bq Bq == pL(lj(plog2))(Ojq - (1- O)j(l- q))
= pL(l/ (p log 2)) (qrl -=- qq))
,
(F.10)
(F. 11)
and
~ == __1_e- m "! / 2 {1 om
J27f
V~·
(F.12)
We thus have an equation expressing $\partial J/\partial m$ in terms of $m$ and $\theta$, say
\[ \frac{\partial J}{\partial m} = F_2(m, \theta), \tag{F.13} \]
where the function $F_2(m, \theta)$ is determined by substitution in the equations above.
7. Conclusion. In this paper we have considered how to set wireless channel parameters so as to maximize the throughput of a single TCP connection passing through the channel. We considered the optimization of the transmission energy consumed per bit. We assumed that the SINR at the receiver of the channel is held constant. We considered two regimes: the first where the SINR value is low and where energy per bit is increased by increasing the processing gain, and the second where the SINR value is high and the energy per bit can be decreased by increasing the symbol alphabet size. In both cases, we found that the TCP throughput can be fairly sensitive to these parameters, and our results suggest that these parameters should be set carefully and as a function of the system scenario. We have also found that the use of forward error correction, when the coding rate is optimized, can significantly increase TCP throughput. We have also proposed a way to assign SINR values in a network context, where the interaction between users should be taken into account.
Acknowledgments. We would like to thank J. Bolot for his input to this line of thought, as well as D. Hong and L. Fournie for their comments on the paper and their help with the simulations used in §6. We are also grateful to G. Giannakis for having suggested applying our ideas, which were initially motivated by CDMA, to adaptive modulation.
REFERENCES
[1] F. BACCELLI, B. BLASZCZYSZYN, AND M. KARRAY, "Up and Downlink Admission and Congestion Control and Maximal Load in Large Homogeneous CDMA Networks," ACM MONET, 9(6), Dec. 2004.
[2] F. BACCELLI, K.B. KIM, AND D. DE VLEESCHAUWER, "Analysis of the Competition between Wired, DSL and Wireless Users in an Access Network," Proceedings of IEEE INFOCOM'05, March 2005.
[3] F. BACCELLI, D.R. MCDONALD, AND J. REYNIER, "A Mean-Field Model for Multiple TCP Connections through a Buffer Implementing RED," Performance Evaluation, 49: 77-97, 2002.
[4] H. BALAKRISHNAN, V.N. PADMANABHAN, S. SESHAN, AND R.H. KATZ, "A comparison of mechanisms for improving TCP performance over wireless links," IEEE/ACM Trans. on Networking, 5(6): 756-769, 1997.
[5] H. CHASKAR, T.V. LAKSHMAN, AND U. MADHOW, "TCP Over Wireless with Link Level Error Control: Analysis and Design Methodology," IEEE/ACM Trans. Networking, 7(5): 605-615, October 1999.
[6] A. GOLDSMITH, Wireless Communications, Cambridge University Press, 2005.
[7] D. HONG, Netscale, http://www.n2nsoft.com/. 2005.
[8] T.V. LAKSHMAN AND U. MADHOW, "The performance of TCP/IP for networks with high bandwidth-delay products and random loss," IEEE/ACM Trans. Networking, 5(3): 336-350, June 1997.
[9] Q. LIU, S. ZHOU, AND G. GIANNAKIS, "Queuing with Adaptive Modulation and Coding over Wireless Links: Cross-Layer Analysis and Design," IEEE Trans. Wireless Communications, 4(3), May 2005.
[10] B. LIU, D.L. GOECKEL, AND D. TOWSLEY, "TCP-cognizant adaptive forward error correction in wireless networks," GLOBECOM 2002 - IEEE Global Telecommunications Conference, 21(1): 2139-2144, November 2002.
[11] M. MATHIS, J. SEMKE, J. MAHDAVI, AND T. OTT, "The macroscopic behaviour of the TCP congestion avoidance algorithm," ACM Computer Comm. Review, 27(3): 67-82, July 1997.
[12] J.G. PROAKIS, Digital Communications, McGraw-Hill, 3rd Edition, 1995.
[13] W. STEVENS, TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. IETF RFC 2001.
[14] M. ZORZI AND R.R. RAO, "The effect of correlated errors on the performance of TCP," IEEE Comm. Letter, 1(5): 127-129, 1997.
[15] Q. ZHAO, P. COSMAN, AND L. MILSTEIN, "Optimal Allocation of Bandwidth for Source Coding, Channel Coding and Spreading in CDMA Systems," IEEE Transactions on Communications, 52(10): 1797-1808, October 2004.
HEAVY TRAFFIC METHODS IN WIRELESS SYSTEMS: TOWARDS MODELING HEAVY TAILS AND LONG RANGE DEPENDENCE
ROBERT T. BUCHE*, ARKA GHOSH†, VLADAS PIPIRAS‡, AND JIM X. ZHANG§
Abstract. Heavy traffic models for wireless queueing systems under short range dependence and light tail assumptions on the data traffic have been studied recently. We outline one such model considered by Buche and Kushner [7]. At the same time, similarly to what happened for wireline networks, the emergence of high capacity applications (multimedia, gaming) and inherent mechanisms (multi-access interference) of wireless networks have led to growing evidence of long range dependence and heavy tail characteristics in data traffic. Extending heavy traffic methods under these assumptions presents significant challenges. We discuss an approach for extending the methods in [7] under a heavy tail assumption only. The corresponding heavy traffic model is based on (non-Gaussian) stable Lévy motion, not Brownian motion, which is associated with a light tail assumption. When long range dependence is also present, a promising alternative approach and model based on a Poisson measure representation, motivated by Kurtz [17], are described. The corresponding heavy traffic model is now driven by fractional Brownian motion. As stochastic control analysis for stable Lévy motion or fractional Brownian motion is currently undeveloped, the queue limit models characterizing the wireless system can be studied only under given controls, such as stabilizing controls or else heuristic policies.

Key words. Wireless queueing networks, heavy traffic method, long range dependence, heavy tails, stable Lévy motion, fractional Brownian motion, controls.

AMS(MOS) subject classifications. 93E99, 60G18, 60G51, 60G57.
1. Introduction. Heavy traffic methods for wireline queueing systems are well developed (see [21] and the references therein). The basic idea of the heavy traffic method is to simplify the equations for the queueing system dynamics, which have complex stochastics, by looking at asymptotics which retain much of the structure of the original system. A heavy traffic assumption, loosely speaking, means that the queueing system is operating at near capacity. The asymptotic model of the queueing system under heavy traffic is obtained using weak convergence methods and is expressed as a stochastic differential equation with reflection (SDER).

*NC State University, Mathematics Department, Campus Box 8205, Raleigh, NC 27695 (rtbuche@unity.ncsu.edu), supported by Intelligent Automation Inc., Rockville, MD, through ARO grant W911NF-04-C-0138.
†Iowa State University, Statistics Department, 303 Snedecor Hall, Ames, IA 50011-1210 (apghosh@iastate.edu).
‡University of North Carolina, Department of Statistics and OR, Smith Bldg, CB3260, Chapel Hill, NC 27599 (pipiras@email.unc.edu).
§NC State University, Mathematics Department, Campus Box 8205, Raleigh, NC 27695 (jimxzhang@ncsu.edu), supported by Intelligent Automation Inc., Rockville, MD, through ARO grant W911NF-04-C-0138.
Heavy traffic methods for wireless queueing systems have recently been addressed by a few authors. The major difference from the wireline case is the random variations of the wireless medium (the channel variations). In Buche and Kushner [7], a forward-link, one-cell model with a fixed number of queues is considered. The channel process affecting the departure rates is a semi-Markov process. The associated stochastic control problem is to allocate the resource (e.g., power) to the queues where only part of the total power can be allocated; the rest is preassigned to meet a heavy traffic condition (a balance between the mean arrival and mean departure rates). In [32] (see also [31]), a general heavy traffic model is considered under a resource pooling condition, where the channel process is Markovian. Under the resource pooling condition, there is an associated state-space collapse to a one-dimensional workload process. Given a channel state, a collection of service rates is available to choose from and the optimal choice is a MaxWeight discipline that minimizes the system workload. In [1], an SDE model for the queue dynamics with lower and upper reflection boundaries (corresponding to buffer underflow and overflow, respectively) is proposed along with an ergodic control problem to minimize the long-run energy consumption cost subject to an overflow (packet drop) constraint. This model allows an explicit solution for the optimal control and is applied to a static channel wireless model. In all of these models, bounded second moment conditions or light tails (LT), and weak or short range dependence (SRD), are imposed, leading to Brownian-driven models. Growing empirical evidence and other factors have recently led to a major paradigm shift in various queueing systems from the above assumptions to those of heavy tails (HT) and long range dependence (LRD).
This shift first occurred in wireline systems [24, 28] and was then followed by similar findings in wireless systems [2, 13, 14, 27]. In particular, there is evidence of HT and LRD in wireless internet traffic due to large file sizes from WWW page downloads and multimedia applications including streaming (music, news clips, movies) and interactive video (e-commerce, gaming). Unlike in wireline systems, LRD can also be explained by the multi-access interference (MAI) in wireless systems [38]. In [37], this effect was heuristically accounted for in a rate control setting (no queueing model). The LRD traffic affects network requirements (e.g., capacity and queue size) and their design, for example, in planning for the next generation, 4G wireless networks [16].

The main goal of the paper is to discuss initial work toward extending heavy traffic methods and obtaining the limit queue models for wireless systems under HT and LRD assumptions. These are challenging problems with the limit dynamics expected to be driven by stable Lévy motion (sLm) or fractional Brownian motion (fBm). Stochastic control theory for these processes is as yet undeveloped, but the queueing model can be used to characterize and make recommendations about the system design. Using the queue limit models, a variety of qualitative and quantitative information can be obtained: dimensioning the queues, first passage times, stationary distributions, and others in the spirit of [22, 34].

Extending the methods of [7] to incorporate only HT seems plausible and is discussed in Section §3. The perturbed test function method [19, 23] was used in [7] to show the convergence to the driving Brownian motion and to obtain the limit queue dynamics. This method is based on the well-posedness of the martingale problem (MP) associated with the infinitesimal generator of the limit process. Under HT, we expect the limit process to be sLm, which has an infinitesimal generator with an associated well-posed MP. However, in the case of LRD, the limit process is expected to be fBm, which does not have an associated infinitesimal generator, and hence no associated MP.

To extend the heavy traffic methods to the case of HT and LRD, referred to below simply as the LRD case, we introduce a simplified wireless model in Section §4. In this model, the arrival process is described by the discrete source model [25], where sources turn on according to a Poisson process and stay "On" for a random duration. The HT distribution of the "On" times is used to capture LRD of the data traffic. We employ a general approach of Kurtz [17] to represent the data input and output processes through a convenient Poisson random measure (PRM) on which the heavy traffic analysis can then be based. The discrete source model is closely related to the so-called On/Off model [25, 35] used in [18] for a wireline heavy traffic analysis under LT and SRD assumptions. The weak convergence analysis of [18] sheds light on the PRM approach used in our wireless model.

The rest of the paper is organized as follows. In Section §2, the heavy traffic method under LT and SRD assumptions is discussed. In particular, [7] is summarized, highlighting the issues relevant for extension to the cases of HT and LRD. In Section §3, extension of the methods used in [7] to the HT case is considered.
In Section §4, we address the extension to the LRD case by using the PRM approach.

Glossary of abbreviations. The following abbreviations are used throughout the paper:
fBm: fractional Brownian motion
HT: heavy tails, heavy-tailed
LT: light tails, light-tailed
LRD: long range dependence, long range dependent
MAI: multi-access interference
MP: martingale problem
OMM: orthogonal martingale measure
PRM: Poisson random measure
SA: stochastic approximation
SDER: stochastic differential equation with reflection
sLm: stable Lévy motion
SRD: short range dependence, short range dependent
2. Heavy traffic model and stochastic control under light tails (LT) and short range dependence (SRD). We give here a short description of the heavy traffic wireless queueing model of [7], focusing on the aspects of the model needed for the extensions to the cases of HT and LRD. We consider a one-cell forward-link wireless system with $K$ queues at the base station, each dedicated to a mobile. Data destined for a particular mobile arrives randomly via wireline to the associated queue. The departures from the queue to the mobile are via the wireless medium (channel), whose state is modeled by a Markov or semi-Markov process. The vector channel state is indexed by $j \in \mathcal{J}$; each $j$ specifies the channel for all $K$ mobiles. In particular, we suppose that $j$ specifies $\lambda_i^d(j)$, $i \le K$, an achievable transmission rate per unit power. The actual transmission rate depends on the power applied to a queue and the channel state. We do not impose latency requirements on the data. Interference modeling at the mobiles is not considered in the discussion below. (General interference modeling is discussed in [7], though not the relationship between MAI and LRD considered in [38].)
Heavy traffic analysis. In the heavy traffic method, one considers an embedded sequence of (scaled) queueing systems (Eq. (2.8) below) indexed by a scaling parameter $n$, representing the speed of the system in our case. The mean arrival rate and departure rate of the data are $O(n)$. The channel state is given by the process $L(n^{\nu} t)$, where $0 < \nu < 1$, so that the rate of channel variations is slower than the rate of arrivals. This is reasonable as the channel coherence times (how long the channel is in a given state) are typically much longer than the service times for packets. The state space scaling is according to the channel process and is given by $1/n^{\gamma}$, where $\gamma = 1 - \nu/2$. This "heavy traffic scaling" leads to a driving Brownian motion in the weak limit and is studied below in this section. The departures depend on the power, and the design problem is to determine how to allocate power to the queues to meet some performance objective (e.g., keep the queue sizes from becoming too large while maintaining good throughput). The available power is split up into a nominal power $\bar p_i(j)$ to apply to queue $i$ in channel state $j$ and a reserve power $u_i(j, x)$ which is queue state (i.e., $x$) dependent; the total power applied is $\bar p_i(j) + u_i(j, x)/n^{\nu/2}$. The nominal power is for balancing the mean arrivals, and the reserve power, a small amount subject to control, is used for controlling the stochastic variations. In particular, the mean arrival rate to queue $i$ is equal to
\[ \bar\lambda_i^a \equiv \lambda_i^a \bar v_i^b = \sum_{j \in \mathcal{J}} \lambda_i^d(j)\,\bar p_i(j)\,\pi(j), \quad i \le K, \tag{2.1} \]
where $\pi$ is the stationary distribution of the channel, $\lambda_i^a$ is the mean packet arrival rate, and $\bar v_i^b$ is the mean packet size. The system is in heavy traffic in the sense that the reserve power gives little additional capacity beyond that needed to handle the mean arrival rate. Although there is little reserve power, its effect on the queueing dynamics is significant in heavy traffic. Applying the heavy traffic scaling with parameter $n$ to the queue balance equations, we obtain
\[ x_i^n(t) = x_i^n(0) + A_i^n(t) - D_i^n(t), \tag{2.2} \]
where $x_i^n$ is the (scaled) size of queue $i$, $A_i^n$ is the cumulative arrival process, and $D_i^n$ is the cumulative departure process. Taking into account Eq. (2.1), the equation (2.2) can be expressed in more detail and in a "centered form" as described next. The arrival process $A_i^n(t)$ can be given by either a fluid arrival model or a random arrival model. We will see that these models are equivalent as $n \to \infty$. For the fluid model, we have
\[ A_i^n(t) = n^{\nu/2}\,\bar\lambda_i^a\,t. \tag{2.3} \]
For the random arrival model, the $l$-th interarrival time for queue $i$ for the $n$-th system is given by $\Delta_{i,l}^{a,n}/n$. Let $nS_i^{a,n}(t)$ be the number of packets that have arrived to queue $i$ by time $t$, and $v_{i,l}^b$ be the $l$-th packet size. Then,
\[ A_i^n(t) = M_i^{a,n}(t) + \frac{\bar v_i^b}{\bar\Delta_i^a\, n^{\gamma}} \sum_{l=1}^{nS_i^{a,n}(t)} \Delta_{i,l}^{a,n}, \tag{2.4} \]
where $M_i^{a,n}(t)$ is the scaled variation of the arrivals about the mean process. Assume that the centered and scaled sequences of processes in (2.5) for the interarrival times and packet sizes are tight. This is typically the case in the traditional wireline setting under LT and SRD assumptions and is appropriate here since the data is arriving to the queues via wireline. Note that the state space scaling $\gamma > 1/2$ used here is "stronger" than the standard $1/\sqrt{n}$ scaling for the wireline case. This implies that $M_i^{a,n}(t) \Rightarrow 0$. The sum term in Eq. (2.4) is the time $t$ plus an error term $\epsilon^n(t)$ that goes to 0 as $n \to \infty$. Hence, Eq. (2.3) and Eq. (2.4) are equivalent asymptotically.
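The claim that the sum of the (scaled) interarrival times tracks the time $t$ up to a vanishing error can be illustrated by simulation. The Python sketch below is a toy example under an assumption not made in the text: interarrival times are taken to be i.i.d. exponential (a light-tailed choice), with the $l$-th gap in the $n$-th system equal to $\Delta_l/n$. The function names are hypothetical.

```python
import random


def arrival_clock(n, t, mean_gap=1.0, seed=0):
    """Return (count, elapsed) for the n-th system: the number of packets
    that have arrived by time t and the sum of their interarrival times
    Delta_l / n (exponential gaps are an illustrative assumption)."""
    rng = random.Random(seed)
    elapsed, count = 0.0, 0
    while True:
        gap = rng.expovariate(1.0 / mean_gap) / n
        if elapsed + gap > t:
            return count, elapsed
        elapsed += gap
        count += 1


def clock_error(n, t, mean_gap=1.0, seed=0):
    """Error t - (sum of interarrival times of packets arrived by t)."""
    _, elapsed = arrival_clock(n, t, mean_gap=mean_gap, seed=seed)
    return t - elapsed
```

The error is the time since the last arrival, which is at most one interarrival time and therefore of order $1/n$; the number of arrivals divided by $n$ is correspondingly close to $t$ divided by the mean gap.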
The departure process is given by Eq. (2.6), where $I\{\cdot\}$ is the indicator function and $I\{x_i^n(s) > 0\}$ constrains the queues from being negative. The relation Eq. (2.6) can be rewritten as
\[ D_i^n(t) = M_i^{d,n}(t) + n^{\nu/2}\,\bar\lambda_i^a\,t - z_i^n(t) + \int_0^t \sum_{j \in \mathcal{J}} I\{L^n(s) = j\}\,\lambda_i^d(j)\,u_i(j, x^n(s))\,ds, \]
where $M_i^{d,n}(t)$ is given by (2.7) and $z_i^n(t)$ is the reflection process, which represents the work that could have been done using the nominal power $\bar p_i(\cdot)$, had the queues not been empty. (We assume that the reflection directions are orthogonal to the faces, so that the so-called completely-S conditions on the reflection process needed in the weak convergence analysis are satisfied. In general, these conditions in the wireless setting are complex [7].) Combining the expressions above and neglecting the error $\epsilon^n(t)$, the prelimit equation is then given by
\[ x_i^n(t) = x_i^n(0) + M_i^{a,n}(t) - M_i^{d,n}(t) + z_i^n(t) - \int_0^t \sum_{j \in \mathcal{J}} I\{L^n(s) = j\}\,\lambda_i^d(j)\,u_i(j, x^n(s))\,ds. \tag{2.8} \]
The following theorem is the main result in [7]:

THEOREM 2.1. Under suitable assumptions, $M_i^{a,n}(\cdot) \Rightarrow 0$, the zero process, and $x^n(\cdot)$ converges weakly to a process $x(\cdot)$ which satisfies the SDER (2.9), where $w_i(\cdot)$ is the $i$-th component of the Brownian motion $w(\cdot) = \{w_i(\cdot), i \le K\}$, $z_i(\cdot)$ is the reflection process and
\[ b_i(u, x) = -\sum_j \lambda_i^d(j)\,u_i(j, x)\,\pi(j). \]
Theorem 2.1 establishes the weak convergence to the controlled reflected diffusion (or SDER) in Eq. (2.9) approximating the actual system. Of main interest in the weak convergence analysis is proving convergence to the driving Brownian motion, which is described next. In Section §3, we consider the weak convergence analysis to sLm under HT assumptions.
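Convergence to Brownian motion. To prove $M_i^{d,n} \Rightarrow w_i$, a perturbed test function method used in stochastic approximation (SA) analysis [19, 23] is applied. For the prelimit model given above, the "step-size" of the SA corresponds to a scheduling interval $\Delta^{s,n} = c\,n^{-\nu}$, which is on the order of the channel coherence time and over which the resource allocations are held constant. In this case, one can write $M_i^{d,n}(t)$ as a scaled sum of centered departures over the scheduling intervals, with
\[ \xi_{i,l}^n = \sum_j \big[ I\{L(cl) = j\} - \pi(j) \big]\,\lambda_i^d(j)\,\bar p_i(j). \tag{2.10} \]
Set also $\xi_l^n = (\xi_{i,l}^n, i \le K)$, $\mathcal{F}_m^n = \sigma(\xi_l^n, l < m)$, $E_m^n(\cdot) = E(\cdot \mid \mathcal{F}_m^n)$ and $\Gamma_m^n = \sum_{l \ge m} E_m^n \xi_l^n$. The main assumptions needed to show the convergence are:

(A1) There is a matrix $\Sigma_0$ such that
\[ \lim_{n,m,k \to \infty} \frac{1}{k} \sum_{l=m}^{m+k-1} E_m^n\big( \xi_l^n [\xi_l^n]' \big) - \Sigma_0 = 0. \]

(A2) There is a matrix $\Sigma_1$ such that
\[ \lim_{n,m,k \to \infty} \frac{1}{k} \sum_{l=m}^{m+k-1} E_m^n\big( \xi_l^n [\Gamma_{l+1}^n]' \big) - \Sigma_1 = 0. \]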
These assumptions follow from the assumed bounded departure rates and ergodic properties of the channel process (see [7]). Under these assumptions, one can establish the conditions needed to show tightness and, loosely speaking, convergence of the infinitesimal operator associated with the prelimit process to that associated with the limit Brownian motion (see Section 7.4.3 in [23], for example). In the cases of HT and LRD, we expect convergence to sLm or fBm. In the HT case considered in Section §3, we shall illustrate how the above assumptions can be violated and discuss alternate procedures for showing convergence. The LRD case considered in Section §4 is even more delicate.

Stochastic control problem and numerical methods. With the limit Brownian-driven SDER model in Eq. (2.9), one can consider an associated stochastic control problem choosing a cost rate $k(x)$ which penalizes the queue size $x$. In particular, an infinite horizon cost ($x(0) = x$)
\[ W(x) = E_x \int_0^{\infty} e^{-\beta s}\,k(x(s))\,ds, \quad \beta > 0 \text{ and small}, \]
is considered in [7], with the cost rate $k(x)$ of the form $\sum_{i=1}^{K} c_i x_i^{p_i}$, where $c_i$, $p_i$ are constants, $p_i > 1$, with the objective of minimizing the cost over the reserve power $u$. Given the reflection term in Eq. (2.9), this problem cannot be solved analytically and numerical methods must be used. Preliminary results on this problem can be found in [7, 9]. Since these results are not a focus of the paper, we do not discuss them here.
3. Extension to heavy tails (HT): Convergence to stable Lévy motion (sLm). We consider here HT transmissions in the arrival and departure processes and the resulting modifications to the model and methods of Section §2 for showing convergence to the driving process in the limit model. In the HT case, we expect convergence to sLm based on the results in [34] and [25]. We propose carrying out the weak convergence analysis directly in the Skorohod topology and identifying the limit via the infinitesimal generator for sLm and the associated MP. The condition (A1) in Section §2, for example, will not hold in the HT case. To illustrate how (A1) can fail, consider a general multimedia transmission (e.g., movie) model with MPEG encoding. The departure model for this example is different from that considered in Section §2. In particular, in the current example, the data to be transmitted is divided into scenes; different scenes will have different bit-rate requirements depending on the scene content [12]. From [12], the scene size can be Pareto distributed, which is HT. Over a scheduling interval $\Delta_l^{s,n}$ in the scaled system several scenes can be transmitted (since the data rates are $O(n)$ and the channel variations are $O(n^{\nu})$, $0 < \nu < 1$). We thus assume that $\xi_{i,l}^n \Delta_l^{s,n}$, the (centered about the mean) amount transmitted over the $l$-th scheduling interval for queue $i$, $i \le K$, has HT and hence infinite second moments. Assume also that the channel and scene processes are mutually independent and each is independent over the scheduling intervals $\{\Delta_l^{s,n}; l\}$. Finally, assume that the scheduling intervals are constant: $\Delta_l^{s,n} = \Delta^{s,n}$. Then, the variables $\xi_{i,l}^n$ are HT and independent across the time index $l$. The conditional expectation $E_m^n$ in (A1) becomes an expectation and $\Sigma_0$ cannot be found due to the infinite second moments of $\xi_{i,l}^n$.
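The effect of infinite second moments can be seen in a small simulation. The Python sketch below is only an illustration under assumptions not taken from the text: scene sizes are drawn i.i.d. from a Pareto law with minimum 1 and a tail index `shape` between 1 and 2 (so the mean is finite but the variance is not), and all function names are hypothetical. It estimates the tail constant $x^{\text{shape}} P(\text{size} > x)$, forms the centered sum scaled by $n^{1/\text{shape}}$, and reports how much of the empirical second moment is carried by the single largest sample.

```python
import random


def pareto_scenes(shape, n, seed=0):
    """Draw n i.i.d. Pareto scene sizes with P(size > x) = x^(-shape), x >= 1."""
    rng = random.Random(seed)
    return [rng.paretovariate(shape) for _ in range(n)]


def tail_constant(samples, shape, x):
    """Estimate x^shape * P(size > x); the exact value is 1 for this law."""
    frac = sum(1 for s in samples if s > x) / len(samples)
    return x ** shape * frac


def scaled_centered_sum(samples, shape):
    """n^(-1/shape) * sum(size - mean), the stable-type scaling for 1 < shape < 2."""
    n = len(samples)
    mean = shape / (shape - 1.0)
    return sum(s - mean for s in samples) / n ** (1.0 / shape)


def max_share_of_second_moment(samples):
    """Fraction of sum(size^2) contributed by the single largest sample."""
    squares = [s * s for s in samples]
    return max(squares) / sum(squares)
```

With a tail index below 2, the share returned by `max_share_of_second_moment` typically does not shrink to zero as the sample grows, which is the practical symptom of the empirical covariance in (A1) failing to settle to a limit.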
3.1. Characterizing sLm: martingale problem (MP). As in the case of LT and SRD in Section §2, the martingale problem (MP) will be used to characterize the desired form of the limit process. The martingale problem for sLm can be formulated as follows. Let $D[0, \infty)$ be the space of paths that are right continuous with left-hand limits, and let $C_b^2$ denote the bounded $C^2$ functions. Then, given a coordinate mapping process $Z(\cdot) \in D[0, \infty)$, under what conditions does one have existence and uniqueness of a probability measure $P$ on $(D[0, \infty), \mathcal{F}_t = \sigma(Z_s, s \le t))$ such that (a) $P(Z_0 = z_0) = 1$, and (b) for every $f \in C_b^2$,
\[ f(Z(t)) - f(Z(0)) - \int_0^t \mathcal{L}f(Z(s))\,ds \ \text{ is a } P\text{-local martingale?} \tag{3.1} \]
In (3.1), $\mathcal{L}$ is the infinitesimal generator associated with sLm, given by
\[ \mathcal{L}f(x) = \int \big[ f(x + z) - f(x) - 1_{\{|z| \le 1\}}(z)\,f'(x)\,z \big]\,n(dz), \tag{3.2} \]
where
\[ n(dz) = \frac{\text{const}}{|z|^{1+\alpha}}\,dz \tag{3.3} \]
is a Lévy measure associated with (symmetric) sLm with index $\alpha$. See, for example, [3, 15, 26]. Existence and uniqueness of the solution to the MP gives existence and uniqueness of the corresponding process $Z(t)$.

Remark. The above discussion concerns convergence to sLm. For more general queueing limit models, for example, where the arrival and service processes are state dependent, one may need to show convergence to a more general driving process $X(t)$ such as that given by an SDE driven by sLm
\[ dX(t) = a(X(t-))\,dZ(t), \tag{3.4} \]
where $a(\cdot)$ is some Borel measurable function. In this case, the relationship between unique weak solutions to the SDE (3.4) and the MP needs to be considered carefully. Conditions for weak uniqueness of SDE (3.4) need to be considered; some results on existence and uniqueness of SDEs driven by sLm can be found in [5, 30, 36].
3.2. Weak convergence to sLm. We discuss here the problem of weak convergence of the departure process to sLm, with the details to be carried out in future work (the arrival process follows analogously). For the HT multimedia model given in the beginning of the section, we consider a similar (centered and scaled) departure process to $M_i^{d,n}(t)$ in Eq. (2.10), given by $Z_i^{d,n}(t)$ in Eq. (3.5). However, there are some differences between $M_i^{d,n}(t)$ and $Z_i^{d,n}(t)$ that are important to point out. A first difference is that $M_i^{d,n}(t)$ is a scaled sum of centered departures over the scheduling intervals $\Delta_l^{s,n}$, where the $\Delta_l^{s,n}$ are on the order of the channel coherence time (how long the channel is in a given state). But for high-speed wireless networks used by multimedia applications, the scheduling intervals can be very small, much smaller than the channel coherence time. Consequently, we let $Z_i^{d,n}(t)$ be simply the scaled sum of the centered scene sizes. A second difference is that for $M_i^{d,n}(t)$ the rate of transmission is given by $\lambda_i^d(j)\,\bar p_i(j)$; in the multimedia example, the scene sizes $\zeta$ can influence the power $p_i$, denoted now by $p_i(\zeta, j)$, and the canonical rate per unit power, denoted now by $\rho_i(\zeta, j)$. The dependence on $\zeta$ in $p_i(\zeta, j)$ can model the choice of a collection of antennas in an antenna array to utilize for transmission. (Transmission strategies incorporating the use of multiple antennas to achieve fast transmission in wireless systems are an area of active research.) An analogue of $M_i^{d,n}(t)$ in Eq. (2.10) in the HT case is the following:
\[ Z_i^{d,n}(t) = \frac{1}{n^{1/\alpha}} \sum_{l=1}^{nS_i^n(t)} \big[ \zeta_{i,l}^n - \bar\zeta_i^n \big], \tag{3.5} \]
where $\zeta_{i,l}^n$ is the size of the $l$-th scene in queue $i$, assumed (presently) i.i.d. with mean $\bar\zeta_i^n$ and HT distribution satisfying
\[ \lim_{x \to \infty} x^{\alpha} P(\zeta_{i,l}^n > x) = K, \quad K \text{ constant}, \tag{3.6} \]
and
\[ S_i^n(t) = \frac{1}{n} \max\Big\{ m : \sum_{l=1}^{m} \Delta_{i,l}^{sc,n} \le t \Big\}, \tag{3.7} \]
where $\Delta_{i,l}^{sc,n}$ is the service time of the $l$-th scene in queue $i$, which depends on the channel state $j$ and scene content (size $\zeta$). (Do not confuse Eq. (3.5) with the reflection process $z_i^n$.) Note that the state space scaling for $Z_i^{d,n}(t)$ is as in [34] and according to the tail distribution of the scene size (see Eq. (3.6)). We now lay down some assumptions leading to a representation of $Z_i^{d,n}(t)$ as an integral against a martingale measure (Eq. (3.9)). This is the same form as a representation of (unscaled) processes studied in Bass [4], which included the particular case of sLm. We will outline an approach for showing that the scaled processes converge to sLm, which seems promising given the similar forms representing the scaled and unscaled processes. Let $\tau_{i,l}^n = \sum_{k=1}^{l-1} \Delta_{i,k}^{sc,n}$ be the start time for serving the $l$-th scene; the service time for the $l$-th scene is $\Delta_{i,l}^{sc,n} = \tau_{i,l+1}^n - \tau_{i,l}^n$. Since the service times $O(n)$ are faster than the channel state variations $O(n^{\nu})$, we can assume that the channel state remains the same for each service interval for a scene. Under this assumption an error is incurred, corresponding to the cases where the channel changes state during the service of a scene, which is asymptotically unimportant. Assume that $\zeta_{i,l}^n = z$, that is, the $l$-th scene is of size $z$, and that the channel state is $j$ for the duration of serving the scene. In addition, assume that the nominal power applied for the service is $p_i(z,j)$ and the transmission rate per unit power is $\rho_i^n(z,j) = n\rho_i(z,j)$. Then, the duration of the service time is
$$\Delta_{i,l}^{sc,n} = \frac{z}{\rho_i^n(z,j)\,p_i(z,j)}.$$
Under this model, the condition analogous to (2.1) is
Af ~ ~fvf
==
JL
Pi(Z,j)Pi(Z,j)7f(j)7fsC(dz),
i::; K,
(3.8)
JEY'
where $\pi^{sc}$ is the distribution for the scene size. There can be great freedom in choosing $p$ and $\rho$ to meet the above equality, and we assume that we can choose the deterministic functions $p$ and $\rho$ such that Eq. (3.8) holds and $\Delta_{i,l}^{sc,n}$ can be modeled as having an exponential distribution with parameter $\lambda_i^n$. This may seem like a strong assumption, but it can be viewed as an assumption on how the scenes are sent to the mobile, which should not reflect the heavy tails in the scene content (the details of the scene). It also allows us to consider a tractable Poisson model parallel to the development in [4]. Under the above assumptions, the pairs $(\tau_l^n, \zeta_l^n)$, $l = 0, \ldots$, are i.i.d. (in the following we drop the $i$ subscript for notational convenience), where the $\tau_l^n$ correspond to the jump times of a Poisson process with parameter $\lambda^n$ and the $\zeta_l^n$ have distribution $\xi^n(dz)$. Define the Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}_+$ by
$$\mu^n(ds,dz) = \sum_{l=1}^{\infty} \delta_{(\tau_l^n,\,\zeta_l^n)}$$
having the mean measure $\nu^n = \lambda^n\,ds\,\xi^n(dz)$, where $\lambda^n = n\lambda$. Then, $Z^{d,n}(t)$ can be represented as an integral over the martingale measure $(\mu^n - \nu^n)$ given by ([4]):
$$Z^{d,n}(t) = \frac{1}{n^{1/\alpha}} \int_0^t \int_{\mathbb{R}_+} z\,(\mu^n - \nu^n)(ds,dz). \tag{3.9}$$
More generally, Zd,n(t) can be a solution to the SDE (3.10)
with the scaled jump size
in the case of Eq. (3.9).
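By Itô's formula for semimartingales and using the fact that $(\mu^n - \nu^n)$ is a martingale measure, one can obtain [4] that, for $f \in C^2$,
$$f(Z^{d,n}(t)) - f(Z^{d,n}(0)) - \int_0^t \mathcal{L}^n f(Z^{d,n}(s))\,ds$$
is a martingale, where
$$\mathcal{L}^n f(x) = \int \left[f(x + F^n(z)) - f(x) - f'(x)F^n(z)\right] \lambda^n\,\xi^n(dz), \qquad F^n(z) = \frac{z}{n^{1/\alpha}}.$$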
In other words, the prelimit process $Z^{d,n}(t)$ has the "infinitesimal generator" $\mathcal{L}^n$. Note that for an $\alpha$-stable Lévy motion, its infinitesimal generator $\mathcal{L}$ is given by Eqs. (3.2)-(3.3). In analogy to the Brownian case, we plan to study the convergence of $Z^{d,n}$ to sLm $Z^d$ through the convergence of $|\mathcal{L}^n f^n(Z^{d,n}(t)) - \mathcal{L} f(Z^{d,n}(t))| \to 0$ (in a suitable sense) and $|f^n - f| \to 0$ ($f^n$ is a "perturbation" of $f$), as in the perturbed test function method. Although this approach for showing weak convergence still needs to be developed, it seems to be a direct approach and may be useful when considering more complex models in a control analysis, for example, when the reserve power $u$ affects the driving process through $Z^{d,n}(t-)$ in Eq. (3.10).
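The scaling in Eq. (3.5) can be explored numerically with a small sketch. The choices below are assumptions made only for illustration: scene sizes follow a Pareto law (which satisfies a tail condition of the form (3.6) with $K = x_{\min}^{\alpha}$), the number of served scenes is fixed rather than given by the random count in Eq. (3.7), and all function names are hypothetical.

```python
import random


def pareto_scene_size(u, alpha, x_min):
    """Inverse-CDF draw from u in [0, 1):
    P(size > x) = (x_min / x)**alpha for x >= x_min."""
    return x_min * (1.0 - u) ** (-1.0 / alpha)


def pareto_mean(alpha, x_min):
    """Mean of the Pareto law above (finite only for alpha > 1)."""
    return alpha * x_min / (alpha - 1.0)


def scaled_centered_sum(sizes, mean, n, alpha):
    """(1 / n**(1/alpha)) * sum of (size - mean), the form of Eq. (3.5)."""
    return sum(s - mean for s in sizes) / n ** (1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, x_min, n = 1.5, 1.0, 10_000
    sizes = [pareto_scene_size(rng.random(), alpha, x_min) for _ in range(n)]
    print(scaled_centered_sum(sizes, pareto_mean(alpha, x_min), n, alpha))
```

Dividing by $n^{1/\alpha}$ rather than $\sqrt{n}$ is what keeps the sum of heavy-tailed terms on a non-degenerate scale when $1 < \alpha < 2$.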
4. Incorporating long range dependence (LRD): Poisson random measure (PRM) approach and convergence to fractional Brownian motion (fBm). In this section, we consider extensions of the heavy traffic analysis to the LRD case. We start with some motivation (Subsection §4.1). In Subsection §4.2, we discuss a related model in the LT and SRD case with a focus on the methods for showing the tightness conditions needed in the weak convergence analysis. This is often the most difficult aspect of the weak convergence analysis. The methods for showing tightness in the LRD case are more complex and it is useful to compare them against those used in the LT and SRD case. In Subsection §4.3, we then consider the heavy traffic analysis of a simple wireless model with LRD.
4.1. Motivation. To capture LRD, the related On/Off and infinite source Poisson (also called discrete source) models are often used, for example, in [17, 24, 25], and their large scale analysis naturally involves fBm as the driving process. Both [17] and [25] use a Poisson random measure (PRM) approach to obtain fBm, though the approach in [17] is more general as the inputs can be stochastic processes. As fBm is not a semimartingale and does not have an infinitesimal generator or the associated martingale problem, the methods used in the SRD case can no longer be applied. In the PRM approach, as in our model in Subsection §4.3, the arrival and departure processes are conveniently represented in terms of an integral
against a PRM. This helps in showing tightness of the processes by using the properties of orthogonal martingale measures (OMM) which are obtained from the centering and scaling of the PRMs in the heavy traffic analysis. These properties come from stochastic calculus for semimartingales even though the limiting fBm is not a semimartingale.
4.2. Related heavy traffic result for an exponential On/Off model - the case of LT and SRD. A heavy traffic model for an exponential On/Off source model input leading to a Brownian limit model can be found in Kushner and Martins [18]. We outline here the tightness analysis of [18] so that it can be compared to that based on the PRM approach. In [18], the arrival process is the superposition of $N$ independent On/Off sources where the "On" times are exponentially distributed with parameter $\lambda$ and the "Off" times are exponentially distributed with parameter $\mu$. When the source is on, the input is according to a Poisson process with rate $\nu$. The departures are modeled as a sequence of service times $\{\Delta_k^d, k = 1, 2, \ldots\}$ which are i.i.d. and independent of the arrival process. Because of the independence assumption between arrivals and departures, the weak convergence analysis for the queueing system can focus separately on the arrival and departure processes. We will focus on the arrival process. The arrival process in [18] can be described by $\{y_i(\cdot), i = 1, \ldots, N\}$, where $y_i$ is the indicator function for source $i$ being "On", and $\{a_i(\cdot), i = 1, \ldots, N\}$, giving the arrivals, which are Poisson with rate $\nu$. The arrival process has a martingale decomposition given by ((3.1) in [18])
$$dy_i(t) = \left[-\mu y_i(t) + \lambda(1 - y_i(t))\right]dt + d\tilde y_i(t), \qquad da_i(t) = \nu y_i(t)\,dt + d\tilde a_i(t), \tag{4.1}$$
where $\tilde y_i(\cdot)$ and $\tilde a_i(\cdot)$ are $\mathcal{F}_t$-martingales and $\mathcal{F}_t = \sigma(\{y_i(s), a_i(s); s \le t, i \le N\})$. Part of the centered and scaled (by $1/\sqrt{N}$) arrival process needed in the heavy traffic method is described by centering and scaling $\sum_{i=1}^{N} y_i(t)$. Denote this centered and scaled process by $Z^N(t)$. It turns out (Theorem 3, [18]) that the weak limit of $Z^N$, as $N \to \infty$, is governed by the linear SDE
$$dZ(t) = -(\lambda + \mu)Z(t)\,dt + dW(t), \tag{4.2}$$
which can be seen from the form of the prelimit (obtained from (4.1))
$$Z^N(t_N + t) = e^{-(\lambda+\mu)t} Z^N(t_N) + \int_{t_N}^{t_N+t} e^{-(\lambda+\mu)(t_N+t-s)}\,d\tilde Z^N(s), \tag{4.3}$$
where $\tilde Z^N(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \tilde y_i(t)$ corresponds to the Wiener process in the limit (4.2). In order to show the weak convergence of $\{Z^N(t_N + \cdot), N < \infty\}$, the following tightness result of Kurtz ([10], Chap. 3, Theorem 8.6, p. 138 and
[18], p. 1102) is used: the sequence of processes $\{X_n(\cdot); n \ge 1\}$ with paths in $D[0,\infty)$ is tight if (i) the collection of random variables $\{X_n(t); n \ge 1\}$ is tight for each $t \in V$, $V$ a dense set of $[0,\infty)$, and (ii) for each $T \ge 0$,
$$\lim_{\Delta\to 0}\ \limsup_{n\to\infty}\ \sup_{\tau \le T} E\left[\min\{1, |X_n(\tau + \Delta) - X_n(\tau)|\}\right] = 0, \tag{4.4}$$
where the sup is over all stopping times $\tau \le T$. Part (ii) follows here from the special forms in Eq. (4.1) and Eq. (4.3) of the above processes by showing ((3.7) in [18]) that
$$E\left[(\tilde Z^N(t_N + \theta + \Delta) - \tilde Z^N(t_N + \theta))^2\right] = O(\Delta),$$
$$E\left[(Z^N(t_N + \theta + \Delta) - Z^N(t_N + \theta))^2\right] \le 2\Delta\,E|Z^N(t_N)|^2 + O(\Delta).$$
For part (i), tightness of $\{Z^N(t); N \ge 1\}$, for each $t \in V$, follows from the stochastic stability techniques in [19] (this is Theorem 2 in [18]; the same techniques are applied in a stochastic approximation rate of convergence analysis in [6]). The method relies on showing a supermartingale property for a Liapunov function ($V(x) = x^2$ in this case). Showing tightness of the arrival process is completed by showing tightness of $\tilde A^N(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \tilde a_i(t)$. This is done by applying the techniques used for showing tightness of $Z^N(\cdot)$. Due to the general arrival process, [17] can no longer use expressions of the form Eq. (4.1) and Eq. (4.3). As a result, the techniques used for showing tightness (parts (i) and (ii) above) are more complex in [17].
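To get a feel for the centered and scaled On-count process of Subsection §4.2, the following sketch simulates the number of "On" sources among $N$ independent exponential On/Off sources. It is an illustration under assumptions, not the construction in [18]: the switching rates are passed explicitly as the hypothetical parameters `rate_on` (Off to On) and `rate_off` (On to Off), and the simulation uses a standard Gillespie scheme for the aggregate count.

```python
import math
import random


def stationary_on_probability(rate_on, rate_off):
    """Long-run fraction of time a source is On when it switches
    Off->On at rate_on and On->Off at rate_off."""
    return rate_on / (rate_on + rate_off)


def simulate_on_count(n_sources, rate_on, rate_off, horizon, rng, k0=0):
    """Gillespie simulation of the number of On sources on [0, horizon].
    Returns the list of (jump time, On count) pairs, starting at (0, k0)."""
    t, k = 0.0, k0
    history = [(t, k)]
    while True:
        up = (n_sources - k) * rate_on     # some Off source turns On
        down = k * rate_off                # some On source turns Off
        t += rng.expovariate(up + down)
        if t > horizon:
            return history
        k += 1 if rng.random() * (up + down) < up else -1
        history.append((t, k))


def centered_scaled(k, n_sources, p_on):
    """Z^N = (On count - N * p_on) / sqrt(N)."""
    return (k - n_sources * p_on) / math.sqrt(n_sources)


if __name__ == "__main__":
    rng = random.Random(0)
    n, r_on, r_off = 400, 1.0, 3.0
    p = stationary_on_probability(r_on, r_off)
    hist = simulate_on_count(n, r_on, r_off, 5.0, rng, k0=int(n * p))
    print([round(centered_scaled(k, n, p), 3) for _, k in hist[-5:]])
```

For large $N$ the printed values fluctuate around zero on an $O(1)$ scale, which is the mean-reverting behavior described by the linear SDE (4.2).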
4.3. PRM approach and application: wireless model. We discuss here the PRM approach applied to a simple wireless model in heavy traffic. The details of the heavy traffic analysis will be part of a future work.
Model and PRM representations. In Fig. 1, we consider a simplified wireless model that allows us to use the mathematical approach motivated by Kurtz [17] in obtaining limit models of the queue dynamics under HT and LRD. We suppose that each mobile user is requesting one type of file, say multiple WWW-pages. Requests are made at times $S_k$, $k \ge 1$, which are the arrival times of a Poisson process with a constant intensity $\lambda^a$. The file sizes are $W_k$, $k \ge 1$, which are assumed to come from an HT distribution. Once a request is made, the server immediately starts sending these WWW-files in the form of packets to the tower through wireline at the rate 1. The data input or rate process $X$ in Kurtz's [17] description is then $X_k(s) = s$. The files are transmitted to the tower in times $\tau_k = W_k$, $k \ge 1$. The resulting arrival process is an infinite source Poisson arrival model. Observe that the total amount of data aggregated until time $t$ over all documents being transmitted simultaneously to the tower is
FIG. 1. Wireless model.
$$U^a(t) = \int_0^t X_{N(s)}\big(\tau_{N(s)} \wedge (t - s)\big)\,dN(s) = \int_{(0,t] \times \mathbb{R}_+} (r \wedge (t - s))\,\xi^a(ds \times dr), \tag{4.5}$$
where $N$ is the Poisson process corresponding to the jumps $S_k$, $k \ge 1$, and $\xi^a = \sum_{k=0}^{\infty} \delta_{(S_k, \tau_k)}$ is a PRM on $\mathbb{R}_+ \times \mathbb{R}_+$ with the mean measure $\lambda^a m \times \nu^a$ (here, $m$ is the Lebesgue measure and $\nu^a$ is the distribution of sizes $W_k = \tau_k$). The PRM representation, which was key in the weak convergence arguments used by Kurtz [17], plays a fundamental role here as well.
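The representation in Eq. (4.5) with unit input rate reduces to a sum over the requests made by time $t$, each contributing $\tau_k \wedge (t - S_k)$. The sketch below evaluates that sum for a simulated infinite source Poisson model. The Pareto law for the durations and the helper names are assumptions made for illustration only.

```python
import random


def aggregate_input(t, starts, durations):
    """U^a(t) for unit input rate: each session that started at s <= t
    has contributed min(duration, t - s) units of data by time t."""
    return sum(min(tau, t - s) for s, tau in zip(starts, durations) if s <= t)


def poisson_times(rate, horizon, rng):
    """Jump times of a Poisson process with the given rate on (0, horizon]."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def pareto_duration(rng, tail_index, x_min=1.0):
    """Heavy-tailed duration with P(tau > r) = (x_min / r)**tail_index."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / tail_index)


if __name__ == "__main__":
    rng = random.Random(0)
    starts = poisson_times(2.0, 10.0, rng)
    durations = [pareto_duration(rng, 1.5) for _ in starts]
    print(aggregate_input(10.0, starts, durations))
```

For example, with starts at 0, 1 and 5 and durations 2, 10 and 1, the total at $t = 3$ is $2 + 2 = 4$: the first file is complete, the second has been arriving for two time units, and the third has not started.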
Remark. More general models can be handled where the WWW-file input rate is random. This would be appropriate for the case where these files are sent to the tower through a wireless medium. In this case, $X_{N(s)}$ in Eq. (4.5) would be a general random process.
On the departure side of the tower (transmission from tower to mobile), let $S_k^d$, $k \ge 1$, be the starting times of the transmission of the $k$-th document. Suppose that $S_k^d = S_k$, $k \ge 1$, that is, the transmission from the tower to a mobile starts as soon as the mobile makes a request. This may seem to be an oversimplification. However, we assume that the amount of the document that can be sent from the tower to the mobile depends on how much of the document is received by the tower from the server at any given time. This is reflected in the rate function $X_k^d$ for departures as described below. Let $p(t)$ denote the total power applied to the data at the tower for transmissions to the mobiles, $C(t)$ denote the channel process modeling the wireless medium, and $\Gamma(C(t))$ denote the departure rate per unit power. Observe that
$$d(s) = \int_0^s p(v)\,\Gamma(C(v))\,dv$$
is the departure process if the tower transmits documents continuously to the mobile (note that this may not happen because there may not be packets available at the tower to transmit to the mobile at some time points, leading to idleness at the tower). Since $S_k^d = S_k$, the queue-length at the tower at time $s$ from the start of transmission is $\gamma(s - d(s))$ for all $s \ge 0$, where $\gamma(x(t)) = x(t) - \inf_{s \le t} x(s)$ is the Skorohod map. This leads to the departure rate process
$$X_k^d(s) = s - \gamma(s - d(s)).$$
The first term on the right side is for the arrivals at the tower (at the constant deterministic rate of 1). On the departure side, the time to transmit the $k$-th document from the tower to a mobile will depend both on the departure rate $X_k^d$ and the size of the document $W_k$ as $\tau_k^d = \inf\{t \ge 0 : X_k^d(t) = W_k\}$. Consider now the PRM $\xi^d = \sum_{k=0}^{\infty} \delta_{(S_k^d, X_k^d, \tau_k^d)}$ on $\mathbb{R}_+ \times D(\mathbb{R}_+, \mathbb{R}_+) \times \mathbb{R}_+$ with the mean measure $\lambda^a m \times \nu^d$, where $\nu^d$ is the joint measure, according to the distribution $\mu$ on $E \triangleq D(\mathbb{R}_+, \mathbb{R}_+) \times \mathbb{R}_+$, of each element of the i.i.d. sequence $(X_k^d, \tau_k^d)$. Then, the controlled departure process is
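$$U^d(t) = \int_0^t X^d_{N(s)}\big(\tau^d_{N(s)} \wedge (t - s)\big)\,dN(s) = \int_{(0,t] \times D(\mathbb{R}_+,\mathbb{R}_+) \times \mathbb{R}_+} u(r \wedge (t - s))\,\xi^d(ds \times du \times dr). \tag{4.6}$$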
This gives the queue equation at the tower as
$$Q(t) = U^a(t) - U^d(t).$$
Note that there are no reflection terms in the queue equation, since they are absorbed in the definition of the departure rate function $X_k^d$. The key in the above representation is that $Q$ is expressed in terms of a PRM. This structure is exploited for carrying out the weak convergence analysis and identifying the limit process.
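The way the reflection is absorbed into the departure rate function can be sketched on a time grid. The code below is a discretized illustration under assumptions: paths are sampled at grid points with $x(0) = 0$, the offered service $d$ is supplied as a precomputed list, and the function names are hypothetical.

```python
from itertools import accumulate


def skorohod_map(x):
    """gamma(x)(t) = x(t) - inf_{s<=t} x(s) on a grid (x[0] == 0 assumed)."""
    running_min = accumulate(x, min)
    return [xt - m for xt, m in zip(x, running_min)]


def departure_rate_function(grid, d):
    """X^d(s) = s - gamma(s - d(s)) for unit-rate input, where d[i] is the
    cumulative service offered up to time grid[i]."""
    netput = [s - ds for s, ds in zip(grid, d)]
    queue = skorohod_map(netput)
    return [s - q for s, q in zip(grid, queue)]


if __name__ == "__main__":
    grid = [0, 1, 2, 3]
    offered = [0, 0.5, 2.5, 3.0]
    # Departed data can never exceed what has arrived (s) by time s.
    print(departure_rate_function(grid, offered))
```

In this small example the offered service overtakes the arrivals by time 2, so everything that has arrived is sent and the extra offered service is lost to idleness; between times 2 and 3 only half a unit of service is offered, so half a unit waits at the tower.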
Heavy traffic method and weak convergence. Turning to the discussion of the heavy traffic analysis, we first discuss the case when $W_k$ are not
HT and which does not involve LRD in order to illustrate the weak convergence analysis using the PRM approach. We then discuss the extensions to the LRD case along with important extensions of the model. From the representation in Eq. (4.5), we have $E[U^a(t)] = \alpha t$, where
(4.7)
Similarly, from (4.6), $E[U^d(t)] = \beta t$, where
$$\beta = \lambda^a \int u(r)\,\mu(du \times dr). \tag{4.8}$$
The heavy traffic method considers a sequence of embedded systems indexed by the scaling parameter $n$. At the $n$-th embedded system, we assume that the arrival (hence the departure) process is turned on by a Poisson process with rate $n\lambda^a$, and for the departure process we assume that the i.i.d. pair $(X_k^{d,n}, \tau_k^{d,n})$ has the distribution $\mu^n$ on $E$. This scaling then yields a corresponding change in $\alpha^n$ and $\beta^n$ in Eq. (4.7) and Eq. (4.8), respectively, for the $n$-th system. A heavy traffic limit for such a model involves the assumption that under the above rescaling of network measurements, the scaled arrival and departure rates are asymptotically the same. In particular, given the above assumptions and scaling, the $n$-th embedded queue is then
$$Q^n(t) = \frac{U^{a,n}(t) - \alpha^n t}{\sqrt{n}} - \frac{U^{d,n}(t) - \beta^n t}{\sqrt{n}} + \frac{\alpha^n t - \beta^n t}{\sqrt{n}}.$$
The heavy traffic assumption is simply
$$\lim_{n\to\infty} \frac{\alpha^n - \beta^n}{\sqrt{n}} = c, \tag{4.9}$$
where $c$ is some constant. We now consider the (weak) limits of the scaled and centered processes $\big((U^{a,n}(t) - \alpha^n t)/\sqrt{n},\ (U^{d,n}(t) - \beta^n t)/\sqrt{n}\big)$. We focus on the departure process, since the arrival process can be viewed as a special case in that the "workload" has a constant instead of random rate. Observe that
$$V^n(t) = \frac{U^{d,n}(t) - \beta^n t}{\sqrt{n}} = \int_{(0,t] \times E} u(r \wedge (t - s))\,\Xi^n(ds \times du \times dr),$$
where $\Xi^n(A) \triangleq \big(\xi^{d,n}(A) - \lambda^{a,n} m \times \nu^{d,n}(A)\big)/\sqrt{n}$ for $A \subset \mathbb{R}_+ \times E$. Under the assumption that the jump process for the cumulative number of sources turned "On" has constant intensity, $\Xi^n$ is an orthogonal martingale measure (OMM).
The collection of random processes $\{V^n(\cdot); n\}$ (dropping the "d" dependence in the superscripts for notational simplicity) is tight if the collection of random variables $\{V^n(t); n \ge 1\}$, for each $t \in V$, is tight (as in part (i) of (4.4)) and
$$E\left[(V^n(t + h) - V^n(t))^2 (V^n(t) - V^n(t - h))^2\right] \le C_T h^{\theta}, \tag{4.10}$$
for some $C_T > 0$, $\theta > 1$, $0 < h < 1$ and $h \le t \le T$. The condition (4.10) is equivalent to that in (4.4) (see [10], Chap. 3, Theorems 8.8 and 8.6). With the process now represented as an integral against an orthogonal martingale measure, the evaluation involving the increments of the process in the left-hand side of the inequality in (4.10) depends on the discontinuities in the process given by the OMM. In particular, the calculation uses the quadratic variations $\Pi_l^n(A)_t$ for the process
$$\sum_{s \le t} \left[\Xi^n(A, s) - \Xi^n(A, s-)\right]^l, \qquad l > 2,$$
for an arbitrary set $A \in \mathcal{B}(E)$, the Borel sigma algebra generated by the sets in $E$. For (4.10) to hold, one needs (3.8) and (3.9) in Kurtz [17] to hold, which are conditions on the powers of the increments of the workload output processes. To show that $\{V^n(t); n \ge 1\}$, for each $t \in V$, is tight, the stochastic stability results are not applied as in [18]. Instead, this follows from conditions on the workload output process assuring a central limit theorem for triangular arrays, which are given in (3.6) and (3.7) in Kurtz [17]. The weak convergence analysis establishes the weak convergence of $V^n(\cdot)$ in the Skorohod topology on $D(\mathbb{R}_+, \mathbb{R}_+)$ to $V(\cdot)$ where, for $t_2 > t_1 > 0$,
$$V(t_1, t_2) \triangleq \int_{(0,t] \times E} \big(u(r \wedge (t_2 - s)) - u(r \wedge (t_1 - s))\big)\,W(ds \times du \times dr),$$
$W$ is Gaussian noise on $(0,\infty) \times E$ corresponding to $m \times \zeta$, and $\zeta$ depends on the scalings used in the prelimit and is associated with the variance of $V(t_1, t_2)$, described next. In particular, for the weak convergence analysis, one assumes that
$$\lim_{n\to\infty} a_n^{2}\,\lambda^n \nu^n = \zeta, \tag{4.11}$$
where $a_n$ is the state-space scaling ($1/\sqrt{n}$ in the current example), with the convergence in a "somewhat vague" topology on $M(E)$, the set of measures on $E$ [17]. It follows that the variance of $V(t_1, t_2)$ is
$$E\left[V(t_1, t_2)^2\right] = \int_{(0,t] \times E} \big(u(r \wedge (t_2 - s)) - u(r \wedge (t_1 - s))\big)^2\,ds\,\zeta(du \times dr).$$
The $\lambda^n$, $\nu^n$ in Eq. (4.11) can depend on the $n$-th embedded system in a "multiplicative" way (sometimes called a single system). In particular, we let $\lambda^n = n\lambda$ and $\nu^n = \nu$. The physical interpretation of the above assumptions is: for the $n$-th embedded system the rate of source activations increases linearly with $n$, and the pair of workload and duration of service time is distributed according to the same $\nu$. Under such scaling assumptions on $\lambda$ and $\nu$, it is immediate that Eq. (4.11) holds, where the limiting $\zeta$ is simply $\lambda\nu$. For the arrival process, the analogous procedure gives
$$W^n(t) = \frac{U^{a,n}(t) - \alpha^n t}{\sqrt{n}} = \int_{(0,t] \times \mathbb{R}_+} (r \wedge (t - s))\,\Xi^n(ds \times dr),$$
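where now $\Xi^n(A) \triangleq \big(\xi^{a,n}(A) - \lambda^{a,n} m \times \nu^{a,n}(A)\big)/\sqrt{n}$ for $A \subset \mathbb{R}_+ \times \mathbb{R}_+$. The weak convergence analysis is expected to yield that $W^n(\cdot)$ converges weakly (in the Skorohod topology on $D(\mathbb{R}_+, \mathbb{R}_+)$) to $W(\cdot)$, where
$$W(t_1, t_2) = \int_{(0,t] \times \mathbb{R}_+} \big(r \wedge (t_2 - s) - r \wedge (t_1 - s)\big)\,W(ds \times dr),$$
and $W$ is Gaussian noise on $\mathbb{R}_+ \times \mathbb{R}_+$ corresponding to $m \times \lambda^a \nu^a$.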
Convergence to fBm. Continuing our example with constant rate 1 input to the tower, suppose that the durations $\tau_k = W_k$ are HT as
$$\lim_{r\to\infty} r^{\beta - 1} P\{\tau_k > r\} = c, \qquad 2 < \beta < 3.$$
The HT inputs lead to LRD in the input process [25]. This leads to new scalings in the weak convergence analysis dependent on $\beta$; in particular, we can have that $\zeta$ in Eq. (4.11) is given by
see [17]. One obtains the weak limit $V^a(\cdot)$, represented as an integral against a Gaussian random measure, similar to the example above. The limit process is identified as fBm through direct calculation of the variance and using the fact that fBm, denoted $B_H(\cdot)$, is the unique zero mean process with stationary Gaussian increments and variance $E B_H(t)^2 = \sigma^2 t^{2H}$, $\sigma > 0$, $1/2 < H < 1$, where $H$ is the Hurst parameter [11]. There is much flexibility in the scaling through component scalings on the source rate for the "On"-times and the data input process, which includes a separate scaling on the input duration. The particular choice of scaling is part of our future research.
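The variance characterization of fBm quoted above determines the whole covariance structure, since stationary increments give $\mathrm{Var}(B_H(t) - B_H(s)) = \sigma^2 |t - s|^{2H}$ and hence $\mathrm{Cov}(B_H(s), B_H(t)) = \tfrac{1}{2}\sigma^2 (s^{2H} + t^{2H} - |t - s|^{2H})$. The short sketch below evaluates this formula; the function names are hypothetical and the code is meant only as a numerical check of the identification step, not as part of the convergence argument.

```python
def fbm_covariance(s, t, hurst, sigma=1.0):
    """Cov(B_H(s), B_H(t)) implied by stationary increments and
    Var B_H(t) = sigma^2 * t^(2H), for s, t >= 0."""
    h2 = 2.0 * hurst
    return 0.5 * sigma ** 2 * (s ** h2 + t ** h2 - abs(t - s) ** h2)


def increment_covariance(hurst, lag, sigma=1.0):
    """Covariance of the unit increments B_H(k+1) - B_H(k) and
    B_H(k+lag+1) - B_H(k+lag)."""
    h2 = 2.0 * hurst
    return 0.5 * sigma ** 2 * (abs(lag + 1) ** h2 - 2 * abs(lag) ** h2
                               + abs(lag - 1) ** h2)


if __name__ == "__main__":
    # For 1/2 < H < 1 the increments stay positively correlated at long lags.
    print([round(increment_covariance(0.75, k), 4) for k in (1, 10, 100)])
```

With $H = 1/2$ the covariance reduces to $\sigma^2 \min(s, t)$ and disjoint increments are uncorrelated, which is the Brownian case; for $1/2 < H < 1$ the increment covariances are positive and decay slowly, which is the LRD property.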
Extensions of the model. The model in Fig. 1 can naturally be extended to more detailed modeling cases. In particular, the intensity $\lambda^a$ can depend on the time of day, reflecting the diurnal variations in measured data. The intensity can also depend on the number of active downloads at time $t$, which is derived from $N(s)$ and $\tau_{N(s)}$: the user is less likely to request
FIG. 2. Wireless model with power control.
another download at time $t$ if several are occurring at time $t$. One could also try to incorporate the LRD model for the multi-access interference in [38]. Under this interference modeling, the retransmission of packets to the mobile will have effects on the departure modeling. It is very natural to consider extending the power $p(\cdot)$ to depend on the queue state $Q(\cdot)$ and the channel state $C(\cdot)$. On one hand, one wishes to have "fair" controls, balancing the queue size and, on the other hand, one wishes for good throughput, taking advantage of the best channel condition without regard to the queue state. Controlling only for maximum throughput can lead to unfair policies. Furthermore, unlimited power is not available, so there is a power constraint: the total power applied at a given time has to be $\le P$, a maximum amount available for use. A general set-up for the power control problem is shown in Fig. 2. Since the stochastic optimal control problems are hard for the limit forms of $Q(\cdot)$, which include fBm and sLm components, one could consider an a priori structure on the control policies, such as threshold controls on the queue size for a given channel condition, etc. One could also consider multiclass cases
of the above with both WWW and multimedia (movie) sources, where the multimedia has latency requirements.
REFERENCES
[1] B. ATA, J.M. HARRISON, AND L.A. SHEPP, Drift Rate Control of a Brownian Processing System, The Annals of Applied Probability, 15(2): 1145-1160, 2005.
[2] D.R. BASGEET, J. IRVINE, A. MUNRO, P. DUGENIE, D. KALESHI, AND O. LAZARO, Impact of Mobility on Aggregate Traffic in Mobile Multimedia System, The 5th International Symposium on Wireless Personal Multimedia Communications (IEEE), 2: 333-337, Oct. 2002.
[3] R.F. BASS, Uniqueness in Law for Pure Jump Markov Processes, Probability Theory and Related Fields, 79: 271-287, 1988.
[4] R.F. BASS, Stochastic differential equations with jumps, Probability Surveys, 1: 1-19, 2004.
[5] R.F. BASS, K. BURDZY, AND Z.Q. CHEN, Stochastic differential equations driven by stable processes for which pathwise uniqueness fails, Stochastic Processes and their Applications, 111(1): 1-15, 2004.
[6] R. BUCHE AND H.J. KUSHNER, Rate of convergence for constrained stochastic approximation algorithms, SIAM Journal on Control and Optimization, 40: 1011-1041, 2001.
[7] R. BUCHE AND H.J. KUSHNER, Control of Mobile Communications With Time-Varying Channels in Heavy Traffic, IEEE Transactions on Automatic Control, 47(6): 992-1003, 2002.
[8] R. BUCHE AND H.J. KUSHNER, Control of Mobile Communication Systems With Time-Varying Channels via Stability Methods, IEEE Transactions on Automatic Control, 49(11): 1954-1962, 2004.
[9] R.T. BUCHE AND C. LIN, Heavy traffic control policies for wireless systems with time-varying channels, Proceedings, American Control Conference, 6: 3972-3974, 2005.
[10] S.N. ETHIER AND T.G. KURTZ, Markov Processes: Characterization and Convergence, Wiley, New York, 1986.
[11] P. EMBRECHTS AND M. MAEJIMA, Selfsimilar Processes, Princeton University Press, 2002.
[12] M. IZQUIERDO AND D.S. REEVES, A survey of statistical source models for variable-bit-rate compressed video, Multimedia Systems, 7: 199-213, 1999.
[13] M. JIANG, M. NIKOLIC, S. HARDY, AND L. TRAJKOVIC, Impact of self-similarity on wireless data network performance, IEEE International Conference on Communications, 2: 477-481, 2001.
[14] R. KALDEN AND S. IBRAHIM, Searching for self-similarity in GPRS, Proceedings of the 5th annual Passive and Active Measurement Workshop (PAM 2004), Antibes Juan-les-Pins, France, April 19-20, 2004.
[15] T. KOMATSU, On the martingale problem for generators of stable processes with perturbations, Osaka J. of Mathematics, 21(1): 113-132, 2004.
[16] A. KRENDZEL, Y. KOUCHERYAVY, J. HARJU, AND S. LOPATIN, Network Planning Problems in 3G/4G Wireless Systems, The 1st COST 290 Management Committee Meeting, Malta, Oct. 2004.
[17] T.G. KURTZ, Limit Theorems for workload input models, Stochastic Networks: Theory and Applications, Eds. F.P. Kelly, S. Zachary & I. Ziedins, Oxford, 1996, pp. 119-139.
[18] H.J. KUSHNER AND L.F. MARTINS, Heavy Traffic Analysis of a Data Transmission System with many Independent Sources, SIAM J. on Appl. Math., 53(4): 1095-1122.
[19] H.J. KUSHNER, Approximation and Weak Convergence Methods for Random Processes, MIT Press, 1984.
[20] H.J. KUSHNER AND P. DUPUIS, Numerical Methods for Stochastic Control Problems in Continuous Time, Second Edition, Springer, New York, 2001.
[21] H.J. KUSHNER, Heavy Traffic Analysis of Controlled Queueing and Communication Networks, Springer, 2002.
[22] H.J. KUSHNER, J. YANG, AND D. JARVIS, Controlled and optimally controlled multiplexing systems: A numerical exploration, Queueing Systems, 20: 255-291, 1995.
[23] H.J. KUSHNER AND G. YIN, Stochastic approximation and recursive algorithms and applications, second edition, Springer, New York, 2003.
[24] W.E. LELAND, M.S. TAQQU, W. WILLINGER, AND D.V. WILSON, On the self-similar nature of Ethernet traffic, IEEE/ACM Transactions on Networking, 2(1): 1-15, 1994.
[25] T. MIKOSCH, S. RESNICK, H. ROOTZEN, AND A. STEGEMAN, Is Network Traffic Approximated by Stable Levy Motion or Fractional Brownian Motion?, The Annals of Applied Probability, 12(1): 23-68, 2002.
[26] R. MIKULEVICIUS AND H. PRAGARAUSKAS, On the martingale problem associated with nondegenerate Levy operators, Lithuanian Mathematical Journal, 31(3): 297-311, 1992.
[27] R. NARASIMHA AND R. RAO, Modeling Variable Bit Rate Video On Wired and Wireless Networks Using Discrete-Time Self-Similar Systems, Proceedings, IEEE International Conference on Personal Wireless Communications, pp. 290-294, 2002.
[28] K. PARK AND W. WILLINGER, Self-Similar Network Traffic and Performance Evaluation, J. Wiley & Sons, Inc., New York, 2000.
[29] V. PIPIRAS AND M. TAQQU, Integration questions related to fractional Brownian motion, Probability Theory and Related Fields, 118: 251-291, 2000.
[30] H. PRAGARAUSKAS AND P.A. ZANZOTTO, On one-dimensional stochastic differential equations driven by stable processes, Lithuanian Mathematical Journal, 40(3): 277-295, 2000.
[31] S. SHAKKOTTAI, R. SRIKANT, AND A. STOLYAR, Pathwise optimality of the exponential rule for wireless channels, Advances in Applied Probability, 36(4): 1021-1045, 2004.
[32] A.L. STOLYAR, MaxWeight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic, The Annals of Applied Probability, 14(1): 1-53, February 2004.
[33] D. STROOCK, Diffusion processes associated with Levy generators, Z. Wahrscheinlichkeitstheorie verw. Gebiete, 32: 209-244, 1975.
[34] W. WHITT, An overview of Brownian and non-Brownian FCLTs for the single-server queue, Queueing Systems, 36: 39-70, 2000.
[35] W. WILLINGER, M.S. TAQQU, R. SHERMAN, AND D.V. WILSON, Self-similarity through high-variability: Statistical analysis of ethernet LAN traffic at the source level, IEEE/ACM Trans. Networking, 5(1): 71-86, Feb. 1997.
[36] P.A. ZANZOTTO, On stochastic differential equations driven by a Cauchy process and other stable Levy motions, The Annals of Probability, 30(2): 802-825, 2002.
[37] J. ZHANG, M. HU, AND N.B. SHROFF, Bursty Data Over CDMA: MAI Self Similarity, Rate Control, and Admission Control, Proceedings, IEEE INFOCOM, 1: 391-399, 2002.
[38] J. ZHANG AND T. KONSTANTOPOULOS, Multiple-Access Interference Processes Are Self-Similar in Multimedia CDMA Cellular Networks, IEEE Transactions on Information Theory, 51(3): 1024-1038, 2005.
[39] J.A. ZHAO, B. LI, C.W. KOK, AND I. AHMAD, MPEG-4 Video Transmission over Wireless Networks: A Link Level Performance Study, Wireless Networks, 10: 133-146, 2004.
STRUCTURAL RESULTS ON OPTIMAL TRANSMISSION SCHEDULING OVER DYNAMICAL FADING CHANNELS: A CONSTRAINED MARKOV DECISION PROCESS APPROACH
DEJAN V. DJONIN* AND VIKRAM KRISHNAMURTHY†
Abstract. The problem of transmission scheduling over a correlated time-varying wireless channel is formulated as a Constrained Markov Decision Process. The model includes a transmission buffer and a finite state Markov model for the time-varying radio channel and incoming traffic. The resulting cross-layer optimization problem is formulated to minimize the transmission cost under a constraint on a buffer cost such as the transmission delay. Under assumptions on submodularity and convexity of the cost function, it is shown that the optimal randomized policy is monotonically increasing in the buffer state. Furthermore, the influence of the channel and traffic correlation matrices on the optimal transmission cost is investigated. It is shown that a comparison between the optimal transmission costs of two different channels can be performed by considering the stochastic dominance relation of their conditional probability distributions. As an example of this result, channels with smaller scattering and the same mean can achieve a smaller average transmission cost for the same average buffer cost.
Key words. Value function, scheduling, optimal policy, Markov Decision Process, correlated sources, correlated channels, transmission scheduling, supermodularity, stochastic dominance, latency, adaptive modulation.
AMS(MOS) subject classifications. Primary 94A05, 94A14, 90B18, 90B36, 93E20.
*Department of Electrical Engineering, University of British Columbia, Vancouver, BC, Canada. The work of the first author was supported in part by an NSERC PostDoctoral Fellowship.
†Department of Electrical Engineering, University of British Columbia, Vancouver, BC, Canada. The work of the second author was supported in part by an NSERC strategic grant.
1. Introduction. Consider the uplink transmission problem comprising a single user with Markovian traffic arrival, a finite buffer and a Markovian fading channel. In this paper we derive structural results and a computationally efficient algorithm for the uplink transmission scheduling policy that optimizes a transmission cost, such as power cost, subject to a delay constraint. Several wireless standards such as EDGE, IS-856, 802.11a, b and g, WCDMA and 1xEV-DO provide a framework for transmission scheduling based on the channel state. Several recent papers have studied resource allocation adaptation for transmission over time-varying fading channels under constraints on the transmission delay [2], [9], [14], [10], [12], [3] and [13]. All of these papers formulate the problem of finding the optimal policies as an unconstrained Markov Decision Process (MDP) and use dynamic programming methods
to compute optimal transmission policies. The structure of optimal deterministic rate adaptation policies for non-correlated channels has been analyzed in [12]. In comparison to the above papers, here we present the transmission scheduling problem as a constrained MDP (CMDP) where the global constraints in the CMDP take into account the limitations on the transmission latency. The following two modeling assumptions are employed in our analysis. First, the packet arrivals and channel variation are modeled as a discrete time finite state Markov chain. Transmission scheduler decisions are made at discrete time instants. Restricting the analysis to discrete time finite state processes avoids the technicalities and complexities associated with semi-Markov decision processes, which seldom lead to practical algorithms. The discrete time Markov chain assumption on the channel variation implies that the channel is block fading, which is the model used by many authors. Second, it is assumed that the channel state is exactly known (or fully observed) and also that the incoming traffic state is exactly known.
1.1. Main results. In order to establish our structural results on the optimal transmission policies, we combine three tools: the Lagrangian dynamic programming approach to constrained MDPs [1], supermodularity [23] and sensitivity of MDPs to transition probabilities [15]. The first step in this paper is the formulation of the generic rate adaptive transmission scheduling problem as an average cost infinite horizon CMDP. Under suitable regularity conditions, a stationary optimal policy exists. For the unconstrained case or with local constraints only, this stationary policy is pure, i.e., the optimal action is a deterministic function of the state, and the problem can easily be solved via stochastic dynamic programming methods. However, for a CMDP (with global constraints), the optimal scheduling policy is in general randomized, i.e., the optimal action at a given time instant is a probabilistic function of the current state, and stochastic dynamic programming methods cannot be applied directly to solve for the optimal policy. An infinite horizon average cost CMDP can be formulated as a linear programming (LP) problem, and the optimal policy (possibly randomized) of the CMDP can be obtained by solving this LP. However, in this paper we are interested in deriving structural results on the optimal transmission scheduling policies and not simply in solving a CMDP for the optimal policy. Structural results such as supermodularity [23] have been developed for MDPs using the stochastic dynamic programming formulation. In order to exploit these structural results for a CMDP, we need to reformulate the CMDP as an MDP with Lagrangian costs. Therefore, we employ the results from [1] to establish the equivalence in optimal average costs between the CMDP and the unconstrained MDP with Lagrangian instantaneous costs. Further details are given in Section 3.2.
Our second result is the use of supermodular properties on a lattice, developed in [23], to establish several structural results on the nature of the optimal transmission policy as a function of the buffer state. For an active constraint on the buffer cost (e.g. latency), it is shown that the optimal transmission policy is a pure policy. A novel result of Theorem 3.1 states that if the Lagrangian cost function of this model satisfies certain supermodularity and convexity properties, then the optimal pure scheduling policies are monotonically non-decreasing in the buffer occupancy. This means that irrespective of the current channel and source states, as well as the transition probabilities that describe the channel and source Markov chains, the optimal transmission scheduler will take more packets from the buffer as the buffer occupancy increases. This has practical implications for deriving computationally efficient policy search algorithms (such as policy iteration [17]), as the search space for the optimal policy can be significantly reduced to the subset of policies that are non-decreasing in the buffer size. In particular, when the rate adaptation offers only two actions, our structural results imply that for each traffic and channel state, the scheduling policies are threshold policies in the buffer component of the state variable. A threshold policy takes the same action when the state variable is less than a fixed threshold and takes a different action when the state variable is greater than or equal to that threshold. In Theorem 3.2 of Section 3.3 we also give the piece-wise linear characterization of the dependence of the optimal transmission cost on the buffer cost constraint. For a general constraint we demonstrate that optimal randomized policies are a probabilistic mixture of two pure monotone non-decreasing policies and present a computationally efficient algorithm to compute these mixed policies.

Our final result explores the influence of the structure of the transition probability matrix of the Markovian channel on the optimal transmission policy subject to a delay (latency) constraint. We present a method utilizing stochastic ordering to compare two Markovian channels based on their transition probability matrices. Section 3.5 and Theorem 3.3 establish two results that can determine the influence of channel correlations on the optimal cost with a single active constraint. As an illustration of these results, we prove that wireless channels with smaller scattering, the same mean and the same latency constraint have lower transmission costs (such as power consumption) when optimal scheduling is applied. Another example is maximum ratio combining and the favorable influence of adding new diversity branches on the optimal transmission cost. Our results exploit the influence of the transition probability matrices on the value function of a Markov Decision Process considered in [15].

The outline of the remainder of the paper is as follows. In Section 2 we pose the problem of the choice of the optimal adaptive transmission policies for a generic transmission model as a CMDP and define all the ingredients that constitute it. A practical communication example that fits into this model is given in Section 2.4. Results on the monotonically increasing structure of the optimal pure and randomized policies are given in Section 3.2. The dependence of the optimal cost on the constraint is discussed in Section 3.3. Section 3.5 deals with a mathematical tool that can be used to classify and compare different channel environments. Proofs are deferred to Section 4.
2. Scheduling problem formulation as constrained Markov decision process.

2.1. Review of constrained Markov decision processes and Lagrangian optimization. The aim of this section is to quickly review the key results on CMDPs and the Lagrangian formulation of the optimal constrained cost.

Notation: Upper case bold letters denote random variables, while lower case letters are reserved for the instances of random variables. Let $X(y)$ denote the random variable $X$ conditioned on the outcome $y$ of the random variable $Y$. Let $|\mathcal{C}|$ denote the cardinality of a finite set $\mathcal{C}$, $\mathbb{P}[\cdot]$ the probability measure and $\mathbb{E}[\cdot]$ the expectation operator. Let $\mathbb{N}_0$ be the set of natural numbers including 0.

Let $\mathcal{S}$ denote an arbitrary finite set called the state space.¹ Let $n = 0, 1, 2, \ldots$ denote discrete time. Let $\mathcal{A}_s$, $s \in \mathcal{S}$, denote an arbitrary collection of finite sets called action sets. The evolution of an MDP can be described as follows. When the system is in state $s \in \mathcal{S}$, a finite number of possible actions, which are elements of the set $\mathcal{A}_s$, can be taken. Let $a_n$ denote the action taken by the decision maker at time $n$. The system evolution is Markovian with transition probabilities

$$\mathbb{P}[S_{n+1} = s_i \mid S_n = s_j, a_n = a] = p(s_i \mid s_j, a) \tag{2.1}$$

for some $s_i, s_j \in \mathcal{S}$, $a \in \mathcal{A}_{s_j}$ and $n = 0, 1, \ldots$. Let $\mathcal{O}_n$, $n \ge 0$, denote the $\sigma$-algebra generated by the observed system states $S_0, \ldots, S_n$ at time $n$. Define the set of Markovian admissible policies $\Phi = \{A = \{a_n\} \mid a_n \text{ is measurable w.r.t. } \mathcal{O}_n,\ \forall n = 0, 1, \ldots\}$. This means that $a_n$ is a (potentially) random function of the current state $S_n$. Let $D$ denote the set of all pure policies, where $a_n$ is a deterministic function of the current state $S_n$. The finite cost $c(S_n, a_n) \ge 0$ is the instantaneous cost of taking action $a_n$ in state $S_n$. For any admissible policy $\pi \in \Phi$, let the infinite horizon cost conditioned on the initial state $S_0 = s_0$ be defined as

$$C_{s_0}(\pi) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[\sum_{n=1}^{N} c(S_n, A_n) \,\middle|\, S_0 = s_0\right] \tag{2.2}$$

where the expectation is over the randomized actions $A_n$ and the system state $S_n$ evolution for $n = 1, 2, \ldots$. The goal is to compute the optimal policy $\pi^*$ that minimizes the cost (2.2),

$$C_{s_0}(\pi^*) = \min_{\pi \in \Phi} C_{s_0}(\pi), \tag{2.3}$$

¹ See Section 2.4 for the remark considering continuous Markov processes, e.g. for modeling Rayleigh fading channels.
subject to the global constraint

$$D_{s_0}(\pi) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[\sum_{n=1}^{N} d(S_n, A_n) \,\middle|\, S_0 = s_0\right] \le \tilde{D}. \tag{2.4}$$

Here $d(s, a) \ge 0$ is a known instantaneous finite cost function, and the constraint cost $\tilde{D} \ge 0$ is a user specified parameter. Any policy $\pi^*$ that minimizes $C_{s_0}(\pi)$ will be called the optimal policy. The transmission cost of the policy $\pi^*$ that is optimal subject to constraint (2.4) will be denoted by $C^*(\tilde{D})$. We will call the constraint (2.4) active if equality holds in (2.4) for the optimal policy $\pi^*$. Denote by $\mathcal{D}$ the set of all constraint costs $\tilde{D}$ such that (2.4) is active for the optimal transmission scheduler, i.e.,

$$\mathcal{D} = \{\tilde{D} \mid D(\pi^*) = \tilde{D};\ \pi^* \text{ is the optimal policy for constraint cost } \tilde{D}\}. \tag{2.5}$$

The set $\mathcal{D}$ will be called the set of feasible constraints. A CMDP is considered unichain if every feasible policy where $a_n$ is a deterministic function of $S_0, \ldots, S_n$ induces a single recurrent class plus a possibly empty set of transient states. For finite CMDPs with unichain structure and bounded costs it is sufficient to regard the set of admissible policies that are not history-dependent [1], as the optimal policy can always be found within the set of Markovian admissible policies $\Phi$. The optimal cost and policy of the constrained Markov Decision Process can be found using an unconstrained MDP and a Lagrangian approach.

THEOREM 2.1. The optimal cost function of a finite unichain CMDP problem satisfies

$$C^*(\tilde{D}) = \min_{\pi \in \Phi} \sup_{\lambda \ge 0}\, J(\pi, \lambda) - \lambda \tilde{D} = \sup_{\lambda \ge 0} \min_{\pi \in \Phi}\, J(\pi, \lambda) - \lambda \tilde{D} \tag{2.6}$$

where

$$J(\pi, \lambda) = \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[\sum_{n=1}^{N} c(S_n, A_n; \lambda) \,\middle|\, S_0 = s_0\right] \tag{2.7}$$

and $c(s, a; \lambda)$ is the Lagrangian cost given by

$$c(s, a; \lambda) = c(s, a) + \lambda\, d(s, a) \tag{2.8}$$

for a certain Lagrangian multiplier $\lambda > 0$.
Note that the minimization in the rightmost expression in (2.6) can be performed over the set of pure policies only. This important theorem establishes that the CMDP problem can be solved by solving the appropriate Lagrangian unconstrained MDP problem with the relative value iteration or policy iteration algorithms available for that case.
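The reduction described above can be illustrated with a small sketch: form the Lagrangian cost (2.8) and solve the resulting unconstrained average-cost MDP by relative value iteration. The two-state, two-action model below (state 1 = backlog, action 1 = transmit) and all numeric values are hypothetical and are not taken from the paper; the function names are likewise illustrative assumptions.

```python
def lagrangian_cost(c, d, lam):
    """Lagrangian cost c(s,a;lam) = c(s,a) + lam*d(s,a), cf. (2.8)."""
    return [[c[s][a] + lam * d[s][a] for a in range(len(c[s]))]
            for s in range(len(c))]

def relative_value_iteration(P, cost, ref=0, tol=1e-10, max_iter=10_000):
    """Average-cost relative value iteration.

    P[s][a][t] is the probability of moving from s to t under action a and
    cost[s][a] the instantaneous cost. Returns (gain, V, policy) with the
    differential value function normalized so that V[ref] = 0.
    """
    n = len(cost)
    V = [0.0] * n
    gain, Q = 0.0, []
    for _ in range(max_iter):
        Q = [[cost[s][a] + sum(P[s][a][t] * V[t] for t in range(n))
              for a in range(len(cost[s]))] for s in range(n)]
        TV = [min(row) for row in Q]
        gain = TV[ref]                      # estimate of the average cost
        V_new = [v - gain for v in TV]      # keep V[ref] at zero
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    policy = [min(range(len(Q[s])), key=lambda a, s=s: Q[s][a])
              for s in range(n)]
    return gain, V, policy

# Hypothetical toy model: idling in state 1 keeps the backlog forever.
P = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.0, 1.0], [0.5, 0.5]]]
c = [[0.0, 1.0], [0.0, 1.0]]   # transmission cost: pay 1 when transmitting
d = [[0.0, 0.0], [1.0, 1.0]]   # buffer cost: pay 1 while backlogged

gain, V, policy = relative_value_iteration(P, lagrangian_cost(c, d, 4.0))
gain0, _, policy0 = relative_value_iteration(P, lagrangian_cost(c, d, 0.0))
```

With the multiplier set to 4 the backlog is expensive enough that transmitting in state 1 becomes optimal, whereas with a zero multiplier the scheduler never transmits; this is the trade-off that the multiplier controls.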
2.2. Markovian dynamics for the channel and incoming traffic. We now formulate the generic transmission control problem as a CMDP. Consider the transmission scheduling problem comprising a single user with Markovian traffic arrival, a finite transmission buffer and a Markovian fading channel as shown in Fig. 1. Each state $s \in \mathcal{S}$ is composed of three components, $s = [h, b, f]$, and $\mathcal{S} = \mathcal{H} \times \mathcal{B} \times \mathcal{F}$ where $\times$ denotes the Cartesian product. $\mathcal{H}$ denotes the channel state space, $\mathcal{B}$ denotes the buffer state space and $\mathcal{F}$ denotes the traffic state space. The following assumption for the stated transmission scheduling problem will be used to establish the main results of the paper. ASSUMPTION 1. The CMDP that models the transmission scheduling problem is unichain.
FIG. 1. Block scheme of the transmission adaptation system. (Figure not reproduced; the recoverable block labels are "Higher Layer Application" and "Buffer", with channel state $h_n$.)
As a consequence of this assumption, all components of the state variable form ergodic processes for any feasible Markov policy $\pi \in \Phi$.
Note that $h$ is an uncontrollable component of the system state since it is not control dependent. Let $H(h_n)$ be the random variable of the channel state $h_{n+1} \sim p^h(\cdot \mid h_n)$ as conditioned on the previous state $h_n$. The second component of the state space is the current buffer state $b \in \mathcal{B}$, where the set $\mathcal{B} = \{0, 1, \ldots, L-1\}$ corresponds to all possible states of the buffer occupancy in bits. The third component of the state space is $f \in \mathcal{F} = \{0, 1, \ldots, P-1\}$, which is the current perfectly observable incoming traffic state. The finite traffic state space satisfies $\mathcal{F} \subset \mathbb{N}_0$ and $P = |\mathcal{F}|$. ASSUMPTION 3. The incoming traffic state forms an ergodic Markov chain with transition probabilities $p^f(f_{n+1} \mid f_n)$ and is independent of the actions, the buffer and the channel state. Therefore, the incoming traffic state is also an uncontrollable component of the state space.
2.3. Scheduling action space and transition probabilities. Let $W(a)$ be the buffer retrieval function that denotes the number of bits to be taken from the buffer provided that action $a \in \mathcal{A}_s$ is applied in state $s \in \mathcal{S}$. The buffer evolves according to the Lindley recursion

$$b_{n+1} = \min\big(b_n + G f_{n+1} - W(a_n),\, L\big) \tag{2.9}$$

where $f_n$ is the number of incoming packets at time $n$ and $G$ is the incoming traffic packet size in bits. Without loss of generality we can assume that $W(a)$ is a non-decreasing function of $a$. An action $a_n$ is available in state $s_n = [h_n, b_n, f_n]$ (or $a_n \in \mathcal{A}_{s_n}$) if

$$W(a_n) \le b_n. \tag{2.10}$$

This condition prohibits actions that would lead to negative buffer occupancy. The next component of the CMDP description is the set of transition probabilities. As a consequence of Assumptions 2 and 3 and (2.9), the transition probability between the states $s_n = [h_n, b_n, f_n]$ and $s_{n+1} = [h_{n+1}, b_{n+1}, f_{n+1}]$ is given by

$$p(s_{n+1} \mid s_n, a_n) = \delta\big(b_{n+1} - \min(b_n + G f_{n+1} - W(a_n),\, L)\big) \times p^h(h_{n+1} \mid h_n)\, p^f(f_{n+1} \mid f_n) \tag{2.11}$$

where $a_n \in \mathcal{A}_{s_n}$ and $\delta(x)$ is the Kronecker delta function, i.e. $\delta(x) = 1$ if $x = 0$ and $\delta(x) = 0$ otherwise.
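The buffer recursion and the factorized transition probability above can be sketched as follows. This is a minimal illustration under assumptions: the function names, the two-state channel and traffic matrices, and the values of $G$ and $L$ are hypothetical, and the buffer is capped at $L$ exactly as the recursion is written in (2.9).

```python
def next_buffer(b, f_next, a, G, L, W):
    """Lindley recursion: remove W(a) bits, add G*f_next bits, cap at L."""
    return min(b + G * f_next - W(a), L)

def is_available(a, b, W):
    """Availability condition: never retrieve more bits than are buffered."""
    return W(a) <= b

def transition_prob(s_next, s, a, ph, pf, G, L, W):
    """Kronecker delta on the buffer update times the (independent)
    channel and traffic transition probabilities."""
    h, b, f = s
    h2, b2, f2 = s_next
    delta = 1.0 if b2 == next_buffer(b, f2, a, G, L, W) else 0.0
    return delta * ph[h][h2] * pf[f][f2]

# Hypothetical two-state channel and traffic chains.
ph = [[0.9, 0.1], [0.2, 0.8]]
pf = [[0.7, 0.3], [0.4, 0.6]]
G, L = 1, 3
W = lambda a: a                 # one bit retrieved per unit of action

s = (0, 2, 1)                   # [h, b, f]
row_sum = sum(transition_prob((h2, b2, f2), s, 1, ph, pf, G, L, W)
              for h2 in range(2) for b2 in range(L + 1) for f2 in range(2))
```

Because the buffer update is deterministic given the next traffic state, each pair of next channel and traffic states contributes exactly one reachable buffer value, so the probabilities out of any state sum to one.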
2.4. Example of optimal transmission scheduling as CMDP. In this subsection, a transmission scheduling example is formulated as a CMDP. This example is used subsequently in Section 3.

REMARK 2.1. While the structural results and examples in this paper are derived for denumerable Markov chains, under suitable regularity conditions (cf. [16]) the results also hold for continuous state Markov processes, such as the Gaussian AR channel model (see Example 1). However, for notational consistency, we assume that a continuous-state Markov process is discretized (approximated) with sufficient precision by a denumerable Markov chain.

EXAMPLE 1. Transmission rate adaptation for average power minimization.

Transition Probabilities: Consider the complex channel gain $h_n$ at a certain transmission block $n$ modeled as a $p$-th order autoregressive continuous state model

$$h_n = \sum_{j=1}^{p} \alpha_j h_{n-j} + v_n \tag{2.12}$$

where $v_n$ is white (complex) Gaussian noise with variance $\sigma^2$ and mean $\mu$. In view of Remark 2.1, we assume that the continuous-state AR model (2.12) is replaced with a discretized model

$$h_n = \Xi\left(\sum_{j=1}^{p} \alpha_j h_{n-j} + v_n\right). \tag{2.13}$$

The function $\Xi : \mathbb{C} \to \mathcal{H}$ represents a quantization function performed at the channel gain estimator that ensures that the complex gains $h_n$ belong to the finite set $\mathcal{H} \subset \mathbb{C}$. For example, the quantizer $\Xi$ can perform uniform quantization of both the real and imaginary components of the complex channel gain with a finite number of uniformly spaced quantization levels. In this model, the state of the channel at the $n$-th block is the vector $\mathbf{h}_n = \{h_{n-1}, \ldots, h_{n-p}\}$. If $p = 1$ this model represents a first order Markov chain as was used in e.g. [24]. Assuming coherent detection at the receiver, the amplitude channel gain of the complex gain $h$ is $|h|$.

Action Space: The transmission scheduler performs power adaptation of transmitted code words. Let the actions $a \in \mathcal{A} = \{0, \ldots, A\}$ correspond to rate $a/M$ codes employed at the transmission scheduler, where $M$ is the number of symbols in the code word. Therefore, it follows that $W(a) = a$, where $W(a)$ is defined in (2.9). Each rate action determines a specific transmission power for a fixed BER.

Cost: Define the transmission cost as the power

$$c([h, b, f], a) = \frac{\sigma^2}{\Gamma(\mathrm{BER})\,|h|^2}\left(2^{W(a)/M} - 1\right), \tag{2.14}$$

that is necessary to achieve code rate $W(a)/M$ for the specified bit error rate BER and noise variance $\sigma^2$. The SNR gap $\Gamma(\mathrm{BER})$ of a practical modulation corresponding to the Shannon capacity formula can be found in e.g. [11]. The previous power cost comes from the expression for the transmission rate $R$

$$R = \log_2\left(1 + \Gamma(\mathrm{BER})\,|h|^2\,\frac{P}{\sigma^2}\right), \tag{2.15}$$

that can be achieved for signal to noise ratio $P/\sigma^2$ with bit error rate BER.

Constraint: The global constraint is the latency constraint

$$\limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\left[\sum_{n=1}^{N} \frac{B_n}{G\bar{F}}\right] \le \tilde{D} \tag{2.16}$$

for a given latency constraint $\tilde{D}$ and random variable $B_n$ describing the buffer occupancy at time $n$. According to Little's formula, (2.16) describes the constrained average delay incurred in the buffer. In terms of the notation of (2.4),

$$d([h, b, f], a) = \frac{b}{G\bar{F}} \tag{2.17}$$

where $\bar{F}$ is the average number of incoming packets.
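As a quick numeric check of Example 1, the sketch below evaluates the power cost in the form (SNR gap times channel gain in the denominator, noise variance in the numerator) that follows from inverting the rate expression, together with the delay cost from Little's formula. The function and parameter names (`gamma_ber`, `f_bar`) and all numeric values are illustrative assumptions, not quantities specified in the paper.

```python
import math

def power_cost(h, a, M, sigma2, gamma_ber, W=lambda a: a):
    """Power needed to support rate W(a)/M at channel gain h."""
    return sigma2 / (gamma_ber * abs(h) ** 2) * (2.0 ** (W(a) / M) - 1.0)

def achievable_rate(h, power, sigma2, gamma_ber):
    """Rate supported by a given power; the inverse of power_cost."""
    return math.log2(1.0 + gamma_ber * abs(h) ** 2 * power / sigma2)

def delay_cost(b, G, f_bar):
    """Buffer occupancy (bits) divided by the mean arrival rate (bits)."""
    return b / (G * f_bar)

# Hypothetical numbers: M = 4 symbols, unit noise variance, SNR gap 0.5.
p_idle = power_cost(1 + 0j, 0, 4, 1.0, 1.0)
p_full = power_cost(2.0, 4, 4, 1.0, 0.5)
rate_back = achievable_rate(2.0, power_cost(2.0, 3, 4, 1.0, 0.5), 1.0, 0.5)
delay = delay_cost(6, 2, 1.5)
```

The power cost depends on the channel state and the action only, while the delay cost depends on the buffer state only; this separation is what the later structural results exploit.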
3. Structural results for transmission scheduling. In this section we present the main results of this paper and discuss their importance through several examples. Proofs of the results are presented in Section 4. Throughout this section we assume that Assumptions 1, 2 and 3 hold.²

3.1. Submodular functions and stochastic dominance. We take a brief detour to give important explanations and definitions that will be necessary to state the results and formalize the proofs.

DEFINITION 1. [23] A function $f : A \times X \times P \to \mathbb{R}$ is supermodular (has increasing differences) in $(a, x)$ for a fixed parameter $p \in P$, if for all $a' \ge a$ and $x' \ge x$,

$$f(a', x'; p) - f(a, x'; p) \ge f(a', x; p) - f(a, x; p). \tag{3.1}$$
A function $f : A \times X \times P \to \mathbb{R}$ is submodular (has decreasing differences) in $(x, a)$ if the conditions of the previous definition are satisfied with the inequality in (3.1) reversed. A central question of interest for establishing the monotonic structure of the scheduling policies is to identify when

$$\pi(x) = \arg\min_{a \in A} f(a, x; p) \tag{3.2}$$

is non-decreasing in $x$ for any parameter $p \in P$. The answer is due to Topkis [23], who shows that submodularity of the function $f$ in the pair $(x, a)$ implies that $\pi(x)$ is a non-decreasing function.

² The unichain assumption is not necessary for our proofs if infinite horizon discounted costs [17] are used in lieu of average costs.

We shall also need the following definition of stochastically dominating random variables [15], [22], [20]. Let $T_Q$ be the set of all bounded functions $f : Q \to \mathbb{R}$ such that $f(q) < \infty$ for all $q \in Q$ and a certain finite set $Q$.

DEFINITION 2. [15], [22], [20] Let $X_1$ and $X_2$ be r.v. with support on $Q$ and let $\mathcal{G} \subset T_Q$. $X_1$ dominates $X_2$, or $X_1 \ge_{\mathcal{G}} X_2$, if $\mathbb{E}[u(X_1)] \ge \mathbb{E}[u(X_2)]$ for all functions $u \in \mathcal{G}$.

Each set $\mathcal{G}$ thus defines a dominance partial ordering. Let $X(h)$ be a random variable with support on a finite ordered set $Q$ and conditioned on the scalar parameter $h \in Q$. We will call $X(h)$ stochastically non-decreasing on $\mathcal{G}$ if $X(h_1) \ge_{\mathcal{G}} X(h_2)$ for all $h_1 > h_2$. Similarly, $X(h)$ is stochastically non-increasing on $\mathcal{G}$ if $X(h_1) \le_{\mathcal{G}} X(h_2)$ for all $h_1 > h_2$. It is well known [22] that if $\mathcal{G}$ is the set of increasing functions on a finite set $Q$, then the stochastic dominance ordering corresponds to the first-order stochastic dominance ordering. In this case, $X_1 \ge_1 X_2$ if and only if $F_1(x) \le F_2(x)$ for all $x$, where $F_1(x)$ and $F_2(x)$ are the cumulative distribution functions of the r.v. $X_1$ and $X_2$, respectively. By definition, second-order stochastic dominance follows from Definition 2 by restricting the set of functions $\mathcal{G}$ to all non-decreasing and concave functions. It is of interest to note that if $X_1$ first-order stochastically dominates $X_2$, then $X_1$ second-order stochastically dominates $X_2$. Also, if $X_1$ second-order stochastically dominates $X_2$, and if $X_1$ and $X_2$ have the same mean, then $X_1$ has smaller variance than $X_2$.
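The cumulative-distribution characterization of first-order dominance lends itself to a direct numeric check. The sketch below is illustrative only: the function names and the example probability vectors are assumptions, and the random variables are taken to share a common ordered finite support.

```python
from itertools import accumulate

def first_order_dominates(p1, p2, tol=1e-12):
    """True if X1 (pmf p1) first-order dominates X2 (pmf p2), i.e. the
    CDF of X1 lies at or below the CDF of X2 at every support point."""
    F1, F2 = accumulate(p1), accumulate(p2)
    return all(a <= b + tol for a, b in zip(F1, F2))

def stochastically_nondecreasing(P, tol=1e-12):
    """Rows of P are the conditional pmfs of H(h) for increasing h; the
    check requires each row to dominate the row before it (for first-order
    dominance, adjacent rows suffice because the ordering is transitive)."""
    return all(first_order_dominates(P[i + 1], P[i], tol)
               for i in range(len(P) - 1))

# Hypothetical pmfs on the ordered support {0, 1, 2}.
p_good = [0.1, 0.3, 0.6]
p_bad = [0.3, 0.3, 0.4]
P_chain = [[0.7, 0.2, 0.1],
           [0.3, 0.4, 0.3],
           [0.1, 0.2, 0.7]]
```

A transition matrix passing `stochastically_nondecreasing` describes a channel in which a better current state makes better next states more likely, which is the kind of condition used later for the channel comparison results.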
3.2. Effect of buffer occupancy on the optimal transmission scheduling policies. In this section we establish the first main result of this paper: sufficient conditions for the optimal transmission policy $\pi$ of the described transmission scheduling model to be monotonically non-decreasing in the buffer state $b$. To state our result, we need the following assumption and definitions.

ASSUMPTION 4. The set of feasible actions $\mathcal{A}_s$ in state $s = [h, b, f] \in \mathcal{S}$ is a non-empty set of actions $a \in \mathcal{A}$ for which $b + f' - W(a) \le L$ and $b - W(a) \ge 0$ for any $f' \in \mathcal{F}$.

Note that this assumption prohibits transmitter buffer overflows and underflows.

DEFINITION 3. For any $0 \le q \le 1$, a mixed policy $\pi$ is a randomized policy formed of two pure policies $\pi_1$ and $\pi_2$ such that policy $\pi_1$ is applied with probability $q$ and policy $\pi_2$ is applied with probability $1 - q$.

According to this definition, a mixed policy $\pi$ is a randomized policy that is a convex combination of the pure policies $\pi_1$ and $\pi_2$.

DEFINITION 4. A pure policy $\pi$ is non-decreasing in the buffer state $b$ if the ordinal number (index) of the action $a = \pi([h, b, f])$ taken in state $[h, b, f]$ is non-decreasing in the buffer state $b$ for each channel state $h$ and traffic state $f$.

THEOREM 3.1. Let Assumptions 1, 2, 3 and 4 hold. Let the instantaneous Lagrangian cost $c([h, b, f], a; \lambda)$ (2.8) be submodular in $(b, a)$, convex in $b$ and increasing in the buffer state $b$. Let $W(a)$ defined in (2.9) be a concave increasing function of $a$. Then for cost constraint $\tilde{D} > 0$, the optimal randomized policy $\pi^*([h, b, f])$ is a mixed policy of two pure policies $\pi_1([h, b, f])$ and $\pi_2([h, b, f])$ that are non-decreasing functions of the buffer state $b$ (see Definition 4). Furthermore, there exists only one state $s \in \mathcal{S}$ such that $\pi_1(s) \ne \pi_2(s)$.

As the buffer retrieval function $W(a)$ is chosen to be increasing in the ordinal number of action $a$, Theorem 3.1 states that with increasing buffer occupancy $b$, the pure policies $\pi_1$ and $\pi_2$ that constitute the optimal policy $\pi^*$ take more bits from the buffer, irrespective of the channel state. Therefore the average number of bits taken from the buffer by the optimal mixed policy $\pi^*$ is also an increasing function of the buffer occupancy $b$. Note that Theorem 3.1 does not place conditions separately on the instantaneous cost $c([h, b, f], a)$ and the constraint related cost $d([h, b, f], a)$ defined in Section 2, i.e. both of these functions can depend on all three components of the state $[h, b, f]$.

Example of Theorem 3.1. Consider the power adaptation Example 1 with the transmission cost given in (2.14) and the buffer cost given in (2.17). Since the power does not depend on the buffer state $b$ and the instantaneous buffer cost $d([h, b, f], a)$ does not depend on $a$, the Lagrangian cost $c([h, b, f], a; \lambda)$ is submodular in $(b, a)$. In Example 1 the instantaneous transmission cost $c([h, b, f], a)$ (2.14) is a convex function of $a$ and not a function of the buffer state $b$, while the instantaneous buffer cost $d([h, b, f], a)$ (2.17) is a convex function of $b$ and not a function of $a$. Hence the Lagrangian cost $c([h, b, f], a; \lambda)$ is convex in $b$.
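The submodularity claim in the example can be checked numerically on a grid. The sketch below is a brute-force test of the decreasing-differences inequality; the helper name `is_submodular`, the separable cost with hypothetical parameters ($M = 4$, multiplier 2, $G\bar{F} = 3$) and the counter-example are all illustrative assumptions.

```python
def is_submodular(f, xs, actions, tol=1e-9):
    """Check f(a2, x2) - f(a, x2) <= f(a2, x) - f(a, x) for all a2 >= a
    and x2 >= x on the given sorted grids (decreasing differences)."""
    for i, x in enumerate(xs):
        for x2 in xs[i:]:
            for j, a in enumerate(actions):
                for a2 in actions[j:]:
                    if f(a2, x2) - f(a, x2) > f(a2, x) - f(a, x) + tol:
                        return False
    return True

M, lam, mean_bits = 4, 2.0, 3.0   # hypothetical parameters

def lagrangian(a, b):
    """Separable cost: a power term in a only plus lam times a delay
    term in b only, mirroring the structure of Example 1."""
    return (2.0 ** (a / M) - 1.0) + lam * b / mean_bits

buffers = list(range(6))
actions = list(range(5))
separable_ok = is_submodular(lagrangian, buffers, actions)
product_ok = is_submodular(lambda a, b: a * b, buffers, actions)
```

A separable cost has identical differences at every buffer level, so the inequality holds with equality, whereas a product cost such as $a \cdot b$ has increasing differences and fails the test.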
Assuming that $W(a)$ is an increasing function of action $a$ (which simply amounts to a notational convention), Theorem 3.1 holds.

COROLLARY 1. Under the conditions of Theorem 3.1, the optimal policy $\pi^*(s)$, $s \in \mathcal{S}$, of Example 1 is a mixed policy of two deterministic policies that are non-decreasing functions of the buffer state for each channel and traffic state.

Note that each of the pure policies that constitute a mixed policy in Theorem 3.1 can be calculated from an unconstrained MDP with a fixed Lagrangian multiplier $\lambda > 0$. The algorithm for computation of the Lagrangian multiplier is presented in Section 3.4.

3.3. Effect of the constraint on the optimal transmission scheduling cost. This section explores the dependence of the optimal transmission cost $C^*(\tilde{D})$ on the buffer cost constraint $\tilde{D}$. The function $C^*(\tilde{D})$ also represents a lower bound on the achievable region of transmission costs for a given buffer cost constraint $\tilde{D}$.
Let $\pi^*_\lambda$ be the optimal pure policy of the Lagrangian formulation (2.6) of the CMDP model for a certain Lagrange multiplier $\lambda$ and let $\Pi_p = \{\pi^*_\lambda \mid 0 < \lambda < \infty\}$. Since in the finite CMDP model $|\mathcal{S}| < \infty$ and $|\mathcal{A}| < \infty$, $\Pi_p = \{\pi^*_1, \pi^*_2, \ldots, \pi^*_Q\}$ for some $Q < \infty$, as the number of possible pure policies for transmission schedulers is finite. Therefore, there is a finite number of pairs $[C(\pi^*_l), D(\pi^*_l)]$, $l = 1, \ldots, Q$, that satisfy the equations (2.3) and (2.4). Without loss of generality, assume the ordering $D(\pi^*_{l_1}) < D(\pi^*_{l_2})$ for $l_1 < l_2$ and $l_1, l_2 \in \{1, \ldots, Q\}$. If the conditions of Theorem 3.1 are satisfied, then each of the schedulers $\pi^*_l$, $l = 1, \ldots, Q$, possesses the non-decreasing structure in the buffer state.

THEOREM 3.2. Let Assumptions 1, 2 and 3 hold.

(1) For any buffer length $L \in \mathbb{N}_0$, the optimal average transmission scheduling cost $C^*(\tilde{D})$ is a piece-wise linear non-increasing function of $\tilde{D} \in \mathcal{D}$ that can be expressed as

$$C^*(\tilde{D}) = \max_{\lambda \in \Lambda} \left[\big(D(\pi^*_\lambda) - \tilde{D}\big)\lambda + C(\pi^*_\lambda)\right] \tag{3.3}$$

where $\Lambda$ defined as

(3.4)

is a finite set.

(2) In addition to the above stated assumptions, suppose that Assumption 4 holds, that $c([h, b, f], a; \lambda)$ is jointly convex in $b$, $a$ and $\lambda$, and that $W(a)$ is concave increasing in $a$. Then $C^*(\tilde{D})$ is a piece-wise linear convex non-increasing function of $\tilde{D}$.

Interpretation of Theorem 3.2. In Example 1 the Lagrangian cost function $c([h, b, f], a; \lambda)$ is jointly convex in $b$, $a$ and $\lambda$, and $W(a)$ is concave in $a$. This implies that the optimal power is a convex decreasing function of the average delay in the buffer. The vertices of the piece-wise linear convex function (3.3) are the power and delay costs attained by optimal pure transmission schedulers, whereas the points on the linear segments between the vertices can be attained by using mixed policies.

The main use of Theorem 3.2 is that it gives us a representation of the transmission cost $C^*$ of the optimal randomized policy for the transmission scheduling problem as a piece-wise linear convex function of the constraint $\tilde{D}$. This leads to an efficient algorithm for the calculation of the optimal randomized policy, as discussed in Section 3.4.

For a given feasible cost constraint $\tilde{D} \in \mathcal{D}$, let the optimal policy be a mixed policy of pure policies $\pi^*_j$ and $\pi^*_k$, where the first policy is taken with probability $q$ and the second with probability $1 - q$. Note that, according to [18], policies $\pi^*_j$ and $\pi^*_k$ differ in only one state. Under Assumption 1,

$$\tilde{D} = q\,D(\pi^*_j) + (1 - q)\,D(\pi^*_k). \tag{3.5}$$
Using the above discussion and notation it can be concluded that the set of feasible cost constraints is $\mathcal{D} = [D(\pi^*_1), D(\pi^*_Q)]$. Under the conditions of Theorem 3.2, the pairs $[C(\pi^*_l), D(\pi^*_l)]$, $l = 1, \ldots, Q$, constitute the vertices of a piece-wise linear convex function $C^*(\tilde{D})$. Furthermore, Theorem 3.2 implies that given a parameter $\tilde{D} \in \mathcal{D}$, the pure policies $\pi^*_j$ and $\pi^*_k$ that form the optimal mixed policy are adjacent, i.e. $k = j + 1$, as all other combinations of pure policies would give a larger power cost due to the convexity of $C^*(\tilde{D})$. Under the conditions of Theorem 3.1, the pure policies $\pi^*_j$ and $\pi^*_{j+1}$ that form the optimal mixed policy differ in only one state.

Due to the piece-wise linearity and convexity of the function $C^*(\tilde{D})$, several consequences of Theorem 3.2 can be stated. Lower and upper bounds on the achievable performance are given by the following corollaries.

COROLLARY 2. (Lower bound) Let $\Delta \subseteq \Lambda$, where $\Lambda$ is the set of coefficients of linear segments defined in (3.4). Then a lower bound on the convex function $C^*(\tilde{D})$ is given by the piece-wise linear function

$$\max_{\lambda \in \Delta} \left[\big(D(\pi^*_\lambda) - \tilde{D}\big)\lambda + C(\pi^*_\lambda)\right] \le C^*(\tilde{D}). \tag{3.6}$$

The practical importance of Corollary 2 is that knowledge of only a subset of the segments of the piece-wise linear function $C^*(\tilde{D})$ can produce a lower bound on the achievable transmission cost region. For example, if any two adjacent pure policies $\pi^*_j$ and $\pi^*_{j+1}$ are known, then the line through the points $[C(\pi^*_j), D(\pi^*_j)]$ and $[C(\pi^*_{j+1}), D(\pi^*_{j+1})]$ constitutes a lower bound on $C^*(\tilde{D})$.

COROLLARY 3. (Upper bound) If only a subset of $Z$ points $[C_l, D_l]$, $l = 1, 2, \ldots, Z$, on the convex piece-wise linear function $C^*(\tilde{D})$ is known, then an upper bound on the function $C^*(\tilde{D})$ is formed by the piece-wise linear function with vertices $[C_l, D_l]$, $l = 1, 2, \ldots, Z$.

REMARK 3.1. The simplest of the upper bounds can be constructed by considering $Z = 2$ in Corollary 3. Then only the extremal policies $\pi^*_1$ and $\pi^*_Q$ are needed, and their respective transmission and buffer costs are $[C_1, D_1] = [C(\pi^*_1), D(\pi^*_1)]$ and $[C_2, D_2] = [C(\pi^*_Q), D(\pi^*_Q)]$. Note that the pure policy $\pi^*_1$ can be computed from the Lagrangian formulation of the CMDP model for $\lambda \to \infty$, while $\pi^*_Q$ follows from the same model for $\lambda = 0$. Under special conditions on the transmission cost $c(s, a)$ and buffer cost $d(s, a)$ of Section 2, the optimal pure policy $\pi^*_1$ can be found without the use of stochastic dynamic programming methods. For example, when $c(s, a)$ does not depend on the buffer occupancy $b$ and traffic state $f$, and $d(s, a)$ is increasing in $b$ while not dependent on the channel state $h$ and traffic state $f$ (as in Example 1), then the optimal policy $\pi^*_1$ is equal to

$$\pi^*_1([h, b, f]) = \max\{a \mid a \in \mathcal{A}_{[h, b, f]}\}. \tag{3.7}$$
3.4. Algorithm for computation of the Lagrange multiplier and optimal mixed policy. The optimal pure policy for a CMDP can be found using relative value iteration (RVI) in case the constraint (2.4) is active. However, we can still pose the question of computing the suitable Lagrangian multiplier $\lambda$ that satisfies the constraint with equality. Note that the average constraint cost of the optimal policy $\pi^*_\lambda$ for Lagrangian multiplier $\lambda$ can be given with

(3.8)

For the model of Section 2 and positive transmission and buffer costs, $D(\pi^*_\lambda)$ is a piece-wise constant decreasing function of $\lambda$. A simple algorithm designed to find the smallest $\lambda$ (that will be called $\lambda^*$) such that the constraint (2.4) is satisfied can be formulated as follows:

$$\lambda_{n+1} = \lambda_n + \epsilon_n \big(D(\pi^*_{\lambda_n}) - \tilde{D}\big) \tag{3.9}$$

where $\epsilon_n$ is the step size. The convergence to $\lambda^*$ is ensured as the function

$$\lambda\,\big(D(\pi^*_\lambda) - \tilde{D}\big) + C(\pi^*_\lambda)$$
is a piece-wise linear concave function of $\lambda$ that attains its global maximum at $\lambda^*$ and its derivative is $D(\pi^*_\lambda) - \tilde{D}$. Therefore the algorithm (3.9) is just a gradient ascent algorithm.

We now demonstrate how to employ the estimated parameter $\lambda^*$ to find the optimal randomized policy for any feasible constraint $\tilde{D} \in \mathcal{D}$ with RVI. Assume that the conditions of Theorem 3.1 hold. First, find $\lambda^*$ for a given feasible constraint $\tilde{D}$. In view of Theorem 3.2 and [6], perturb the parameter $\lambda^*$ by some $\delta\lambda$ to get $\lambda_- = \lambda^* - \delta\lambda$ and $\lambda_+ = \lambda^* + \delta\lambda$. Next, find the optimal pure policies $\pi^*_{\lambda_-}$ and $\pi^*_{\lambda_+}$ and their respective average constraint costs $D_- = D(\pi^*_{\lambda_-})$ and $D_+ = D(\pi^*_{\lambda_+})$. As stated in Theorem 3.1, the optimal randomized policy is a mixed policy of two pure policies; let the parameter $q$ be the probability of taking the policy $\pi^*_{\lambda_-}$ and $1 - q$ the probability of taking the policy $\pi^*_{\lambda_+}$. The parameter $q$ can then be computed such that $q D_- + (1 - q) D_+ = \tilde{D}$.

3.5. Influence of channel dynamics. In this section we present new results on the influence of the fading distributions on the performance of scheduling algorithms. We start with a general result on the influence of channel dynamics on the optimal transmission scheduling cost in Theorem 3.3. This result is later made more specific in Theorem 3.4 for the first order dominance and Corollary 6 for the second order dominance of the transition probabilities of the channel.

Consider channels $P$ and $Q$, and let $H_P(h_n)$ and $H_Q(h_n)$ be random variables with support on $\mathcal{H}$ that describe the channel state in the next decision epoch as conditioned on the previous state $h_n \in \mathcal{H}$, for channels $P$ and $Q$ respectively. Where needed, we explicitly state the dependence of costs, feasible delays and value functions on the transition probabilities by using the appropriate subscripts $P$ and $Q$. For example, $C^*_P$ and $C^*_Q$ denote the optimal costs of (2.3) for channels $P$ and $Q$ respectively.

For a unichain unconstrained MDP model with instantaneous costs given in (2.8), the differential value function $V(s; \lambda)$ of the optimal pure policy $\pi^*$ can be calculated from [4]

$$C^*(\lambda) + V(s; \lambda) = \min_{a \in \mathcal{A}_s} \left[c(s, a; \lambda) + \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, V(s'; \lambda)\right] \tag{3.10}$$
where $V(s_r; \lambda) = 0$ for a certain reference state $s_r \in \mathcal{S}$. $C^*(\lambda)$ is the optimal average cost defined in (2.3). The previous equation is also referred to as Bellman's equation. The differential value function can be calculated using relative value iteration, forming the sequence of estimates $V_m(s; \lambda)$. If convergence is achieved, $V(s; \lambda) = \limsup_{m \to \infty} V_m(s; \lambda)$. For a feasible action $a \in \mathcal{A}_s$ in state $s$ we can define the state-action value function

$$Q(s, a; \lambda) = c(s, a; \lambda) + \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, V(s'; \lambda). \tag{3.11}$$
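Returning to the multiplier search of Section 3.4, which relies on solving the Bellman equation above once per multiplier, the outer loop can be sketched as a gradient-type iteration followed by the mixing step. This is a sketch under loud assumptions: the inner MDP solve is replaced by a hypothetical lookup table `constraint_cost` (a piece-wise constant decreasing map from the multiplier to the constraint cost of the optimal pure policy), the step size $1/n$ is an illustrative choice not specified in the text, and the projection onto non-negative multipliers is added for safety.

```python
def constraint_cost(lam):
    """Hypothetical stand-in for solving the Lagrangian MDP at lam and
    evaluating the average constraint cost of the resulting pure policy."""
    if lam < 1.0:
        return 4.0
    if lam < 4.0:
        return 2.0
    return 1.0

def find_multiplier(cost_of_lam, d_target, lam0=0.0, n_iter=1000):
    """Gradient-type iteration: move lam in the direction of the
    constraint violation, with an assumed step size of 1/n."""
    lam = lam0
    for n in range(1, n_iter + 1):
        lam = max(0.0, lam + (cost_of_lam(lam) - d_target) / n)
    return lam

def mixing_probability(cost_of_lam, lam_star, d_target, delta):
    """Perturb lam_star to both sides and solve
    q * D_minus + (1 - q) * D_plus = d_target for q."""
    d_minus = cost_of_lam(lam_star - delta)
    d_plus = cost_of_lam(lam_star + delta)
    q = (d_target - d_plus) / (d_minus - d_plus)
    return q, d_minus, d_plus

lam_star = find_multiplier(constraint_cost, 3.0)
q, d_minus, d_plus = mixing_probability(constraint_cost, lam_star, 3.0, 0.1)
```

In this toy table the target constraint cost of 3 lies between the costs 4 and 2 of two adjacent pure policies, so the iterate settles near the breakpoint between them and the two policies are mixed with equal probability.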
The function $Q(s,a;\lambda)$ can be perceived as the equivalent instantaneous cost for the dynamic stochastic problem, and it can be used to find the optimal action in a given state. The optimal policy can now be found according to $\pi^*_\lambda(s) = \arg\min_a Q(s,a;\lambda)$. The optimal cost $C^*(\lambda)$ for a given $\lambda$ can be calculated using (3.10).

Let us return to the transmission scheduling problem formulated in Section 2. Recall from Section 3.1 that $\mathcal{T}_\Omega$ denotes the set of bounded functions on a finite set $\Omega$.

THEOREM 3.3. Consider two fading channels $P$ and $Q$ with $H_P(h)$ and $H_Q(h)$ the random variables of the next channel state conditioned on the previous state $h \in \mathcal{H}$, for channels $P$ and $Q$ respectively. Let Assumptions 1, 2, 3 hold. Suppose that $V_P([h,b,f];\lambda), V_Q([h,b,f];\lambda) \in \mathcal{G}$ for any $b \in \mathcal{B}$, $f \in \mathcal{F}$ and $\lambda \in \mathbb{R}_+$, where $\mathcal{G}$ is a subset of all bounded functions on $\mathcal{H}$, i.e. $\mathcal{G} \subset \mathcal{T}_{\mathcal{H}}$. Then if $H_P(h)$ dominates $H_Q(h)$ on $\mathcal{G}$, it follows that

$$C^*_P \ge C^*_Q \tag{3.12}$$

for any feasible buffer cost constraint $\bar{D} \in \mathcal{D}_P \cap \mathcal{D}_Q$, where $\mathcal{D}_P$, $\mathcal{D}_Q$ are the feasible cost constraint sets (2.5) of channels $P$ and $Q$ respectively, and $C^*_P$ and $C^*_Q$ are the optimal costs (2.3) of channels $P$ and $Q$ respectively.

In the sequel we discuss the conditions on the channel and cost under which the functions $V([h,b,f];\lambda)$ are elements of a set of functions $\mathcal{G} \subset \mathcal{T}_{\mathcal{H}}$, for any $b \in \mathcal{B}$, $f \in \mathcal{F}$ and $\lambda \in \mathbb{R}_+$. In the following Theorem we give
sufficient conditions under which the differential value functions $V([h,b,f];\lambda)$ belong to the set of non-increasing functions in $h$.

THEOREM 3.4. Let Assumptions 1, 2, 3 hold. A sufficient condition for $V([h,b,f];\lambda)$ to be a non-increasing function of $h$ for any $b \in \mathcal{B}$, $f \in \mathcal{F}$ and $\lambda \in \mathbb{R}_+$ is that: (1) $H(h)$ be first order stochastically nondecreasing in $h$, and (2) $c([h,b,f],a)$ be a non-increasing function of $h$.
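Condition (1) and the first order dominance used in the results that follow can be checked numerically for a finite-state channel. The sketch below assumes the channel states are indexed from worst to best and that each channel is given as a row-stochastic matrix; the helper names and the two 3-state matrices are hypothetical and are not taken from the chapter.

```python
import numpy as np


def tail_sums(P):
    """T[h, k] = P(next state >= k | current state h) for a row-stochastic P."""
    return np.cumsum(P[:, ::-1], axis=1)[:, ::-1]


def is_stochastically_nondecreasing(P, tol=1e-12):
    """True if every upper-tail probability is nondecreasing in the current state h."""
    return bool(np.all(np.diff(tail_sums(P), axis=0) >= -tol))


def dominates_first_order(P, Q, tol=1e-12):
    """True if, for every current state h, row h of P first-order dominates row h of Q."""
    return bool(np.all(tail_sums(P) - tail_sums(Q) >= -tol))


# Hypothetical 3-state channels; state 0 is the worst, state 2 the best.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
Q = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])

# A non-increasing function of the channel state, standing in for V([h, b, f]; lambda).
v = np.array([3.0, 2.0, 1.0])
expected_next_P = P @ v
expected_next_Q = Q @ v
```

For a non-increasing `v`, first order dominance of `P` over `Q` gives `P @ v <= Q @ v` componentwise, which is the mechanism behind the cost comparison between channels.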
The results of Theorem 3.4 can be combined with the results of Theorem 3.3 to establish the following Corollary.

COROLLARY 4. Let Assumptions 1, 2, 3 hold. Let $H_P(h)$ and $H_Q(h)$ be first order stochastically increasing in $h$ and let $H_P(h)$ be first order stochastically dominating $H_Q(h)$ for any $h$. Then, if $c([h,b,f],a)$ is a non-increasing function of $h$,

$$C^*_P \le C^*_Q \tag{3.13}$$

for any feasible average buffer cost constraint $\bar{D} \in \mathcal{D}_P \cap \mathcal{D}_Q$.

The example below gives a consequence of this result for the case of autoregressive (AR) models for channel fading gains. Consider the special case of the first order AR model of (2.13) with $p = 1$, i.e. (3.14).

COROLLARY 5. Let Assumptions 1, 2, 3 hold. Consider channels $P$ and $Q$ modeled as in Example 1 and their respective fading gains that have a Rician distribution. If both channels are first order AR models (3.14) with the same noise variance $\sigma^2$ and $a_1^P \ge a_1^Q$, then $H_P(h)$ is first order stochastically dominating $H_Q(h)$ for any $h$. Then the minimum necessary power cost to achieve a certain feasible average buffer cost $\bar{D} \in \mathcal{D}_P \cap \mathcal{D}_Q$ would be less for channel $P$ than for channel $Q$.
Next we consider the second-order stochastic dominance ordering of $H_P(h)$ over $H_Q(h)$, as this gives an even more precise comparison between two channels. The definition and a discussion of second order stochastic dominance are given in Section 3.1. The following Corollary is stated without proof as it follows the same ideas as Theorem 3.4 and Corollary 4.

COROLLARY 6. Let Assumptions 1, 2, 3 hold. Let $H_P(h)$ and $H_Q(h)$ be second-order stochastically increasing in $h$ and let $H_P(h)$ be second-order stochastically dominating $H_Q(h)$ for any $h$. If $c([h,b,f],a)$ is a non-increasing and convex function of $h$ for any $b \in \mathcal{B}$, $f \in \mathcal{F}$ and $\lambda \in \mathbb{R}_+$, then

$$C^*_P \le C^*_Q \tag{3.15}$$

for any feasible average buffer cost constraint $\bar{D} \in \mathcal{D}_P \cap \mathcal{D}_Q$.
Fading channels with stronger scattering have larger variance of the fading gain h. Therefore, based on Corollary 6 we can state the following
rule of thumb: channels with less scattering and the same mean require less average transmission cost for the same buffer cost constraint.

The second order dominance property of Corollary 6 can also be used to investigate the performance of scheduling algorithms used in conjunction with Maximum Ratio Combining (MRC) for multichannel receivers (cf. [21]). Consider the use of an optimal scheduling algorithm over a Rayleigh flat fading channel with the distribution

$$p(h|h') = \frac{1}{\bar\gamma}\,e^{-h/\bar\gamma} \tag{3.16}$$

of the received power gain $h$, with mean $\bar\gamma$ and variance $\bar\gamma^2$, independent of the previous channel state $h'$. Recall that the power gain is proportional to the signal to noise ratio. The instantaneous cost $c(s,a)$ can then be given as in (2.14) and the constrained buffer cost as in (2.17). We explore the influence of a multichannel MRC receiver with $K$ diversity branches with Rayleigh fading gains $h_l$, $l = 1,2,\ldots,K$, on the performance of the scheduling algorithm for a fixed buffer cost constraint $\bar{D}$. To ensure a fair comparison, it is assumed that each diversity branch has the same noise power as in the single diversity system. Furthermore, the average powers are equally distributed across all diversity branches, i.e. $\mathbb{E}[h_l] = \bar\gamma/K$, $l = 1,2,\ldots,K$. The equivalent power gain after MRC combining is given by [21]

$$h = \sum_{l=1}^{K} h_l. \tag{3.17}$$

Under the above assumptions on the distribution of signal and noise powers across the diversity branches, the equivalent power gain after MRC diversity combining has the chi-square distribution (cf. [21], p. 267)

$$p(h|h') = \frac{1}{(K-1)!\,(\bar\gamma/K)^K}\,h^{K-1}\,e^{-Kh/\bar\gamma} \tag{3.18}$$
with mean $\bar\gamma$ and variance $\bar\gamma^2/K$. Since the distribution of the single channel receiver gain has the same mean but larger variance than the corresponding multichannel receiver gain, we can employ Corollary 6 and the properties of second order stochastic dominance to show that multichannel MRC receivers will always have smaller average transmission cost than single channel receivers, for a fixed feasible average buffer cost. Using the same approach it can also be shown that receivers with more branches perform better. The following Corollary summarizes this discussion.

COROLLARY 7. Consider the Maximum Ratio Combining reception system (cf. (3.17)) with $K$ diversity branches with independent fading processes. Let each branch follow a Rayleigh fading distribution with average
signal to noise ratio $\mathrm{SNR}/K$ for some fixed constant SNR. Under the conditions of Corollary 6, the average transmission cost is a decreasing function of the number of diversity branches for any feasible buffer cost constraint $\bar{D}$.

4. Proofs.

Proof of Theorem 3.1. Let us first consider the case in which the buffer cost constraint $\bar{D}$ in (2.4) is chosen such that a pure stationary policy exists that is optimal for the stated communication model and constraint (2.4) holds with equality for the optimal policy, i.e. the constraint (2.4) is active. Later this condition is relaxed and the non-decreasing property is shown for general constraints and randomized policies. This implies that there exist a Lagrange multiplier $\lambda$ and an optimal pure policy $\pi^*_\lambda$ that attain the sup and min respectively in the right-hand side of (2.6), such that the global constraint satisfies $D(\pi^*_\lambda) = \bar{D}$. In order to prove that $\pi^*_\lambda([h,b,f])$ is an increasing function of the buffer occupancy $b$, we have to demonstrate that $Q([h,b,f],a;\lambda)$ is a submodular function of the pair $(b,a)$. First notice that, according to the statement of the theorem, $c([h,b,f],a;\lambda)$ is a submodular function of $(b,a)$. Therefore we only need to show that the second term of (3.11)
$$Q_1([h,b,f],a;\lambda) = \sum_{h' \in \mathcal{H}} \sum_{f' \in \mathcal{F}} p_h(h'|h)\,p_f(f'|f)\,V([h', b - \Psi(a) + G_{f'}, f'];\lambda) \tag{4.1}$$
is a submodular function of $(b,a)$ for any $h$ and $f$. Here we used Assumptions 1, 2, 3, 4 of our model to simplify (3.11). We first state the following Lemma, whose proof is given after the present proof.

LEMMA 4.1. Under Assumptions 1, 2, 3, 4, $V([h,b,f];\lambda)$ is a convex increasing function of the buffer state $b$ for any $h$, $f$ and $\lambda$, given an instantaneous cost function $c([h,b,f],a;\lambda)$ that is convex increasing in $b$ and a concave function $\Psi(a)$.

Now, if the function $V([h,b,f];\lambda)$ is convex in the buffer state $b$, it can be shown that $V([h', b - \Psi(a) + G_{f'}, f'];\lambda)$ is submodular in $(b,a)$ for any $h' \in \mathcal{H}$, $f' \in \mathcal{F}$. This follows by noting that for a convex function $V([h,b,f];\lambda)$ of $b$ it holds that

$$V([h,x,f];\lambda) + V([h,y,f];\lambda) \ge V([h,\alpha x + (1-\alpha)y,f];\lambda) + V([h,(1-\alpha)x + \alpha y,f];\lambda) \tag{4.2}$$

for any $0 \le \alpha \le 1$. This is a direct consequence of the definition of the convex function $V([h,b,f];\lambda)$. Substituting $x = b - \Psi(a') + G_f$, $y = b' - \Psi(a) + G_f$ and $\alpha = (\Psi(a') - \Psi(a))/(\Psi(a') - \Psi(a) + b' - b)$ in the previous inequality and rearranging the terms we get the following

$$V([h, b - \Psi(a') + G_f, f];\lambda) + V([h, b' - \Psi(a) + G_f, f];\lambda) \ge V([h, b' - \Psi(a') + G_f, f];\lambda) + V([h, b - \Psi(a) + G_f, f];\lambda) \tag{4.3}$$
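The step from convexity to inequality (4.3) can be checked numerically on a small grid. The sketch below is an illustration only: the convex increasing function `V`, the concave increasing function `Psi`, the arrival constant `G`, and the grids are all hypothetical stand-ins, and the channel and traffic states are suppressed. It also checks that the stated choice of $\alpha$ maps $x$ and $y$ onto the two arguments on the right-hand side of (4.3).

```python
import itertools
import math


def V(b):
    """Hypothetical convex increasing function of the buffer state."""
    return math.exp(0.3 * b)


def Psi(a):
    """Hypothetical concave increasing number of packets sent under action a."""
    return math.sqrt(a)


G = 1.0  # hypothetical number of arriving packets


def gap_43(b, b2, a, a2):
    """Left side minus right side of (4.3), for b2 >= b and a2 >= a."""
    lhs = V(b - Psi(a2) + G) + V(b2 - Psi(a) + G)
    rhs = V(b2 - Psi(a2) + G) + V(b - Psi(a) + G)
    return lhs - rhs


buffers = range(6)
actions = range(5)
# combinations_with_replacement on a sorted range yields ordered pairs (u, v) with u <= v.
min_gap = min(
    gap_43(b, b2, a, a2)
    for b, b2 in itertools.combinations_with_replacement(buffers, 2)
    for a, a2 in itertools.combinations_with_replacement(actions, 2)
)

# Check the substitution for alpha with b = 1, b' = 4, a = 1, a' = 4.
b, b2, a, a2 = 1, 4, 1, 4
x = b - Psi(a2) + G
y = b2 - Psi(a) + G
alpha = (Psi(a2) - Psi(a)) / (Psi(a2) - Psi(a) + b2 - b)
first_rhs_argument = alpha * x + (1 - alpha) * y   # should equal b' - Psi(a') + G
second_rhs_argument = (1 - alpha) * x + alpha * y  # should equal b - Psi(a) + G
```

A nonnegative `min_gap` over the grid is consistent with submodularity of $V(b - \Psi(a) + G)$ in $(b,a)$; it is a sanity check, not a proof.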
Rearranging the terms of the previous inequality we get

$$V([h, b' - \Psi(a') + G_f, f];\lambda) - V([h, b' - \Psi(a) + G_f, f];\lambda) \le V([h, b - \Psi(a') + G_f, f];\lambda) - V([h, b - \Psi(a) + G_f, f];\lambda) \tag{4.4}$$
which, for $a' \ge a$ and $b' \ge b$, is equivalent to the submodularity of $V([h, b - \Psi(a) + G_f, f];\lambda)$ in the pair $(b,a)$ for some channel and traffic states $h$ and $f$. Furthermore, a positive weighted sum of submodular functions is also submodular, which establishes the submodularity of $Q([h,b,f],a;\lambda)$ in $(b,a)$ and the monotone structure in buffer occupancy of the optimal pure policy for an active constraint.

Using the result of [6], the optimal randomized policy for a general constraint $\bar{D}$ is a mixed policy of two pure policies that can be computed for two different Lagrange multipliers. As discussed above, for $\lambda > 0$ all optimal pure policies are non-decreasing in the buffer occupancy, so both of the policies that constitute the mixed policy for a general constraint $\bar{D}$ possess that non-decreasing structure as well. Further, according to [18], the number of states with randomized actions in a unichain MDP model with only one constraint is no more than 1. Therefore the pure policies that constitute the optimal mixed policy differ in only one state. This concludes the proof. □

Proof of Lemma 4.1. The proof follows by mathematical induction using relative value iteration. As RVI converges for any initial $V^0([h,b,f];\lambda)$, let us choose $V^0([h,b,f];\lambda)$ to be a convex increasing function of the buffer state $b$. The non-decreasing property of the function $V^m([h,b,f];\lambda)$ in $b$ follows from the non-decreasing property of the instantaneous Lagrangian cost $c([h,b,f],a;\lambda)$ in $b$. Now we show that the increasing and convex property of $V^m([h,b,f];\lambda)$ implies the increasing and convex property of $V^{m+1}([h,b,f];\lambda)$ in $b$. Note that under Assumption 4 the minimization in the Lindley formula (2.11) can be omitted for any feasible action $a$ and buffer occupancy $b$. According to the value iteration algorithm,

$$V^{m+1}([h,b,f];\lambda) = \min_a Q^m([h,b,f],a;\lambda). \tag{4.5}$$
Under the conditions of the lemma, as shown above, if $V^m([h,b,f];\lambda)$ is convex in $b$, then $Q^m([h,b,f],a;\lambda)$ is submodular in $(b,a)$. Therefore,

$$Q^m([h,b',f],a';\lambda) - Q^m([h,b,f],a';\lambda) \le Q^m([h,b',f],a;\lambda) - Q^m([h,b,f],a;\lambda)$$

for some $a' \ge a$ and $b' \ge b$. Using the convexity of $Q^m([h,b,f],a;\lambda)$ in $b$, the previous inequality implies
$$Q^m([h,b',f],a';\lambda) - Q^m([h,b,f],a';\lambda) \le Q^m([h,b'+p,f],a;\lambda) - Q^m([h,b+p,f],a;\lambda)$$

for certain $p \ge 0$. Rearranging the terms of the previous inequality and substituting $b' = b$ and $b = b - p$ we get

$$Q^m([h,b+p,f],a;\lambda) - Q^m([h,b,f],a';\lambda) \ge Q^m([h,b,f],a;\lambda) - Q^m([h,b-p,f],a';\lambda).$$

Now, substituting $a' = a = \arg\min_a Q^m([h,b,f],a;\lambda)$ and using (4.5) we get

$$V^{m+1}([h,b+p,f];\lambda) - V^{m+1}([h,b,f];\lambda) \ge V^{m+1}([h,b,f];\lambda) - V^{m+1}([h,b-p,f];\lambda)$$

which is equivalent to convexity of $V^{m+1}([h,b,f];\lambda)$ in $b$. This concludes the proof. □

Proof of Theorem 3.2. (1) The non-decreasing property of the function $C^*(\bar{D})$ follows from the positivity of the costs $c(s,a)$ and $d(s,a)$. As discussed in Section 3.3, a finite state CMDP with a finite number of actions has only a finite number of pure policies. Only a finite number of pure policies exist that attain the minimum in (2.6), and their respective Lagrange multipliers can be given by (3.4) for $\bar{D} > 0$. Therefore (2.6) can be rewritten as
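$$C^*(\bar{D}) = \sum_{\lambda \in \Lambda} \big(J(\pi,\lambda) - \lambda\bar{D}\big)\, I\Big\{\lambda = \sup_{\lambda \ge 0}\,\min_{\pi \in \Phi_D} J(\pi,\lambda) - \lambda\bar{D}\Big\} \tag{4.6}$$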
where $I\{x\}$ is the indicator function that returns 1 if $x$ is true, and 0 otherwise. The piecewise linear characterization (3.3) of the achievable region $C^*(\bar{D})$ follows by differentiating (4.6) with respect to $\bar{D}$ and using continuity arguments for the Lagrangian costs $c(s,a;\lambda)$ in $\lambda$.

(2) According to Theorem 2.1, for a feasible constraint $\bar{D} \in \mathcal{D}$ the optimal average cost can be calculated as

$$C(\bar{D}) = \sup_{\lambda > 0}\; C^*(\lambda) - \lambda\bar{D} \tag{4.7}$$

where $C^*(\lambda)$ is defined in (3.10). The non-decreasing property of the function $C(\bar{D})$ follows from the positivity of the costs $c(s,a)$ and $d(s,a)$. To prove that $C(\bar{D})$ is convex in $\bar{D}$, we have to show that $C^*(\lambda) - \lambda\bar{D}$ is convex in $\lambda$ for each $\bar{D}$. This follows from the fact that $\max_a f(x,a)$ is convex in $x$ if $f(x,a)$ is convex in $x$ for each $a$ [19].
This is equivalent to showing that $C^*(\lambda)$ is convex in $\lambda$. Further, $C^*(\lambda)$ can be calculated for a known differential value function $V([h,b,f];\lambda)$ from

$$C^*(\lambda) = \min_a\Big[c([h_r,b_r,f_r],a;\lambda) + \sum_{h' \in \mathcal{H}} \sum_{f' \in \mathcal{F}} p_h(h'|h_r)\,p_f(f'|f_r)\,V([h', b_r - \Psi(a) + G_{f'}, f'];\lambda)\Big] \tag{4.8}$$
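for a certain reference state $s_r = [h_r, b_r, f_r]$. The previous equation follows from (3.10) under Assumptions 1, 2, 3 and 4. Next, we show that if $V([h,b,f];\lambda)$ is jointly convex in the buffer state $b$ and $\lambda$, then $V([h, b - \Psi(a) + G_f, f];\lambda)$ is also jointly convex in $a$ and $\lambda$. If $V([h,b,f];\lambda)$ is convex in $b$ and $\lambda$, it follows that

$$\begin{aligned}
&\alpha V([h, b - \Psi(a_1) + G_f, f];\lambda_1) + (1-\alpha) V([h, b - \Psi(a_2) + G_f, f];\lambda_2) \\
&\quad \overset{(a)}{\ge} V([h, b - (\alpha\Psi(a_1) + (1-\alpha)\Psi(a_2)) + G_f, f];\, \alpha\lambda_1 + (1-\alpha)\lambda_2) \\
&\quad \overset{(b)}{\ge} V([h, b - \Psi(\alpha a_1 + (1-\alpha)a_2) + G_f, f];\, \alpha\lambda_1 + (1-\alpha)\lambda_2)
\end{aligned} \tag{4.9}$$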
where (a) follows from the convexity of $V([h,b,f];\lambda)$ in $b$ and $\lambda$, and (b) follows from the concavity of $\Psi(a)$ and the non-decreasing property of $V([h,b,f];\lambda)$ in $b$. The proof that $V([h,b,f];\lambda)$ is convex in $b$ and $\lambda$ is omitted, as it stems from the convexity of $c([h,b,f],a;\lambda)$ in $b$ and $\lambda$ by following steps similar to those in the proof of Lemma 4.1. Therefore, $V([h, b - \Psi(a) + G_f, f];\lambda)$ is jointly convex in $a$ and $\lambda$. The convexity of $C^*(\lambda)$ now follows by noting that

$$c([h_r,b_r,f_r],a;\lambda) + \sum_{h' \in \mathcal{H}} \sum_{f' \in \mathcal{F}} p_h(h'|h_r)\,p_f(f'|f_r)\,V([h', b_r - \Psi(a) + G_{f'}, f'];\lambda)$$
of (4.8) is jointly convex in $a$ and $\lambda$. Using the property that $g(x) = \min_a f(x,a)$ is a convex function of $x$ for a jointly convex function $f$ of $(x,a)$, it follows that $C^*(\lambda)$ is also a convex function of $\lambda$, which concludes the proof. □

Proof of Theorem 3.3. Based on Theorem 1 we have that

$$C_P = \sup_{\lambda > 0}\,\min_{\pi \in \Phi} J_P(\pi,\lambda) - \lambda\bar{D}. \tag{4.10}$$
A similar equation follows for channel $Q$. Now $\min_\pi J_P(\pi;\lambda)$ can be determined through the relative value iteration algorithm of Section III, and the proof follows by mathematical induction. Start by choosing $V^0_P(s;\lambda) \ge V^0_Q(s;\lambda)$ for any initial state $s$. Then $V^m_P(s;\lambda) \ge V^m_Q(s;\lambda)$ implies $V^{m+1}_P(s;\lambda) \ge V^{m+1}_Q(s;\lambda)$ since

$$\begin{aligned}
V^{m+1}_P(s;\lambda) &= \min_a\Big[c(s,a;\lambda) + \sum_{s' \in \mathcal{S}} p(s'|s,a)\,V^m_P(s';\lambda)\Big] \\
&= \min_a\Big[c(s,a;\lambda) + \sum_{f' \in \mathcal{F}} p^f(f'|f) \sum_{h'} p^h_P(h'|h)\,V^m_P([h', \min(b + G_{f'} - \Psi(a), L), f'];\lambda)\Big] \\
&\overset{(a)}{\ge} \min_a\Big[c(s,a;\lambda) + \sum_{f' \in \mathcal{F}} p^f(f'|f) \sum_{h'} p^h_Q(h'|h)\,V^m_P([h', \min(b + G_{f'} - \Psi(a), L), f'];\lambda)\Big] \\
&\overset{(b)}{\ge} \min_a\Big[c(s,a;\lambda) + \sum_{f' \in \mathcal{F}} p^f(f'|f) \sum_{h'} p^h_Q(h'|h)\,V^m_Q([h', \min(b + G_{f'} - \Psi(a), L), f'];\lambda)\Big] \\
&= V^{m+1}_Q(s)
\end{aligned} \tag{4.11}$$

where (a) follows from the stochastic dominance of $H_P(h)$ over $H_Q(h)$, while (b) follows from the induction assumption $V^m_P(s;\lambda) \ge V^m_Q(s;\lambda)$. Since the value iteration is converging, the previous result further implies (4.12) for any initial state distribution and $\lambda \in \mathbb{R}_+$. For $\bar{D} \in \mathcal{D}_P \cap \mathcal{D}_Q$ the constraint in (2.4) is active and it follows that

$$C_P = \sup_{\lambda > 0}\,\min_u C_P(\lambda,u) - \lambda\bar{D} \;\ge\; \sup_{\lambda > 0}\,\min_u C_Q(\lambda,u) - \lambda\bar{D} = C_Q \tag{4.13}$$

which concludes the proof. □
Proof of Theorem 3.4. The proof of this Theorem proceeds by mathematical induction, using relative value iteration for the calculation of $V^m([h,b,f];\lambda)$. Start by choosing $V^0([h,b,f];\lambda)$ that is non-increasing in $h$. Then, assuming that $V^m([h,b,f];\lambda)$ is non-increasing in $h$ and that the conditions in the statement of the theorem hold, it follows that $V^{m+1}_P([h,b,f];\lambda)$ given by

$$V^{m+1}_P([h,b,f];\lambda) = \min_a\Big[c([h,b,f],a) + \sum_{f' \in \mathcal{F}} p(f'|f) \sum_{h'} p_P(h'|h)\,V^m_P([h', \min(b + G_{f'} - \Psi(a), L), f'];\lambda)\Big] \tag{4.14}$$

is also non-increasing in $h$, where we have used the stochastic dominance property of $H_P(h')$. □
5. Conclusions. This paper establishes general structural results on optimal pure and randomized policies for the constrained MDP formulation of transmission scheduling problems. It is shown that the optimal policies are monotonically increasing provided that certain conditions on the convexity of the instantaneous transmission and buffer costs are satisfied.

A particularly interesting and useful extension of the work presented in this paper is to devise efficient adaptive control algorithms that can improve the control policies in unknown environments. Since the state space of the MDP for our transmission scheduling model can be large, it is of interest to employ the structure of the optimal policies in order to speed up the convergence of algorithms such as Q-learning or TD learning [5]. Some similar algorithms that use the structure of the policies and value function in order to simplify the policy improvement step of iterative algorithms have been established in [7], [8].

The presented results can also be used to investigate the influence of correlations of the channel and traffic on the optimal average transmission costs, such as power cost. We have established the condition under which it is certain that wireless channels with less scattering have smaller necessary transmission power for the same transmission latency. This result is also extended to show that maximum ratio combining decreases the necessary average transmission power for a fixed average latency.

We also refer the reader to [25], [26] for gradient based simulation optimization algorithms for adaptive control of constrained Markov decision processes. The algorithms in [25], [26] use measure valued differentiation to optimize the constrained MDP via a primal dual type stochastic approximation algorithm.

In future work we plan to use the structural results in this paper to analyse constrained MDPs in cross layer admission control of multiclass networks comprising CDMA users; see [27] for the constrained MDP formulation.
REFERENCES
[1] E. ALTMAN, Constrained Markov Decision Processes: Stochastic Modeling, London: Chapman and Hall CRC, 1999.
[2] B.E. COLLINS, Transmission policies for time varying channels with average delay constraints, in Proc. of Allerton Conf. on Comm., Control and Comp. (1999).
[3] R. BERRY AND R. GALLAGER, Communication over fading channels with delay constraints, IEEE Trans. on Inform. Theory (2002), pp. 1135-1149.
[4] D.P. BERTSEKAS, Dynamic Programming and Optimal Control, Vol. 2, Belmont, Massachusetts: Athena Scientific, 1996.
[5] D.P. BERTSEKAS AND J. TSITSIKLIS, Neuro-Dynamic Programming, Belmont, Massachusetts: Athena Scientific, 1996.
[6] F.J. BEUTLER AND K.W. ROSS, Optimal policies for controlled Markov chains with a constraint, Journal of Math. Anal. and Applications, 112 (1985), pp. 236-252.
[7] C. BOUTILIER, R. DEARDEN, AND M. GOLDSZMIDT, Exploiting structure in policy construction, in Proc. Fourteenth Inter. Conf. on AI (IJCAI-95) (1995), pp. 1104-1111.
[8] C. GUESTRIN, D. KOLLER, R. PARR, AND S. VENKATARAMAN, Efficient solution algorithms for factored MDPs, Journal of Artificial Intelligence Research, 19 (2003), pp. 399-468.
[9] D. RAJAN, A. SABHARWAL, AND B. AAZHANG, Delay and rate constrained transmission policies over wireless channels, in Proc. of Globecom Conf. (2001), pp. 806-810.
[10] H. WANG AND N.B. MANDAYAM, Delay and energy constrained dynamic power control, in Proc. of Globecom Conf., 2 (2001), pp. 1287-1291.
[11] J.M. CIOFFI, A multicarrier primer, available at http://www.stanford.edu/group/cioffi/pdf/multicarrier.pdf (Nov. 1999).
[12] M. GOYAL, A. KUMAR, AND V. SHARMA, Power constrained and delay optimal policies for scheduling transmission over a fading channel, in Proc. of INFOCOM (2003), pp. 311-320.
[13] A.K. KARMOKAR, D.V. DJONIN, AND V.K. BHARGAVA, Delay constrained rate and power adaptation over correlated fading channels, in Proc. of Globecom Conf. (2004), pp. 3448-3453.
[14] B. PRABHAKAR, E. UYSAL-BIYIKOGLU, AND A.E. GAMAL, Energy-efficient transmission over a wireless link via lazy packet scheduling, in Proc. of INFOCOM (2001), pp. 386-394.
[15] A. MULLER, How does the value function of a Markov decision process depend on the transition probabilities?, Mathematics of Operations Research (1997), pp. 872-895.
[16] O. HERNANDEZ-LERMA AND J.-B. LASSERRE, Discrete-time Markov control processes: Basic optimality criteria, Springer, New York, 1996.
[17] M.L. PUTERMAN, Markov Decision Processes: Discrete Stochastic Dynamic Programming, New York: John Wiley & Sons, 1994.
[18] K.W. ROSS, Randomized and past-dependent policies for Markov decision processes with multiple constraints, Operations Research, 37 (1987), pp. 474-477.
[19] S. BOYD AND L. VANDENBERGHE, Convex Optimization, Cambridge University Press, 2003.
[20] M. SHAKED AND J.G. SHANTHIKUMAR, Stochastic Orders and Their Applications, Academic Press, San Diego, CA, 1994.
[21] M.K. SIMON AND M.-S. ALOUINI, Digital Communication over Fading Channels: A Unified Approach to Performance Analysis, John Wiley & Sons, New York, 2000.
[22] J.E. SMITH AND K.F. MCCARDLE, Structural properties of stochastic dynamic programs, Operations Research (2002), pp. 796-809.
[23] D.M. TOPKIS, Supermodularity and Complementarity, Princeton University Press, Princeton, NJ, 1998.
[24] R.A. ZIEGLER AND J.M. CIOFFI, Estimation of time-varying digital radio channels, IEEE Trans. on Vehicular Tech. (1992), pp. 134-151.
[25] F. VAZQUEZ ABAD AND V. KRISHNAMURTHY, Self Learning Control of Markov Chains - A Gradient Approach, Proceedings of 41st IEEE Conf. on Decision and Control, Las Vegas, pp. 1940-1945, 2002.
[26] F. VAZQUEZ ABAD AND V. KRISHNAMURTHY, Constrained Stochastic Approximation Algorithms for Adaptive Control of Constrained Markov Decision Processes, Proceedings of 42nd IEEE Conf. on Decision and Control, pp. 2823-2828, 2003.
[27] S. SINGH, V. KRISHNAMURTHY, AND H.V. POOR, Integrated Voice/Data Call Admission Control for Wireless DS-CDMA Systems with Fading, IEEE Transactions Signal Processing, Vol. 50, No. 6, pp. 1483-1495, June 2002.
ENTROPY, INFERENCE, AND CHANNEL CODING
J. HUANG*, C. PANDIT†, S.P. MEYN‡, M. MEDARD§, AND V. VEERAVALLI¶
Abstract. This article surveys applications of convex optimization theory to topics in Information Theory. Topics include optimal robust algorithms for hypothesis testing; a fresh look at the relationships between channel coding and robust hypothesis testing; and the structure of optimal input distributions in channel coding. A key finding is that the optimal distribution achieving channel capacity is typically discrete, and that the distribution achieving an optimal error exponent for rates below capacity is always discrete. We find that the resulting codes significantly outperform traditional signal constellation schemes such as QAM and PSK.

AMS(MOS) subject classifications. Primary: 94A24, 94A13, 94A17. Secondary: 94A34, 94A40, 60F10.

Key words. Information theory; channel coding; error exponents; fading channels.
1. Introduction. This article surveys applications of convex optimization theory to topics in information theory. Our main focus is on channel coding, and the relationships between channel coding and robust hypothesis testing. The optimization problems considered in this paper concern minimization or maximization of a convex function over the space of probability measures. The focus is on the following three central areas of information theory: hypothesis testing, channel capacity, and the exponential bounds on the probability of error in channel coding. One foundation of this work lies in the theory of convex optimization [5, 9]. In particular, the structural properties obtained are based on convex duality theory and the Kuhn-Tucker alignment conditions. A second foundation is entropy. Recall that for two distributions $\mu$, $\pi$ the relative entropy, or Kullback-Leibler divergence, is defined as

$$D(\mu\,\|\,\pi) = \begin{cases} \displaystyle\int \log\Big(\frac{d\mu}{d\pi}\Big)\,d\mu & \text{if } \mu \prec \pi, \\ \infty & \text{otherwise.} \end{cases}$$

Relative entropy plays a fundamental role in hypothesis testing and communications, and it arises as the natural answer to several important questions in applications in data compression, model-selection in statistics, and signal processing [29, 15, 12, 4, 16, 13, 14, 20, 36, 8, 17, 31].

*Marvell Technology, Santa Clara, CA (jianyih@marvell.com).
†Morgan Stanley and Co., 1585 Broadway, New York, NY 10019 (charuhas.pandit@morganstanley.com).
‡Department of Electrical and Computer Engineering and the Coordinated Sciences Laboratory, University of Illinois at Urbana-Champaign (meyn
1.1. Channel models. We consider a stationary, memoryless channel with input alphabet $\mathsf{X}$, output alphabet $\mathsf{Y}$, and transition density defined by
$$P(Y \in dy \mid X = x) = p(y|x)\,dy, \qquad x \in \mathsf{X},\ y \in \mathsf{Y}. \tag{1.1}$$
It is assumed that $\mathsf{Y}$ is equal to either $\mathbb{R}$ or $\mathbb{C}$, and we assume that $\mathsf{X}$ is a closed subset of $\mathbb{R}$. For a given input distribution $\mu$ on $\mathsf{X}$, the density for the marginal distribution of the output is denoted,

$$p_\mu(y) = \int \mu(dx)\,p(y|x), \qquad y \in \mathsf{Y}. \tag{1.2}$$
Many complex channel models in which $\mathsf{X}$ is equal to $\mathbb{C}$ are considered by viewing $\mu$ as the amplitude of $X$. Details on this transformation are given prior to Theorem 1.1 below. Throughout the paper we restrict to noncoherent channels in which neither the transmitter nor the receiver knows the channel state.

A signal constellation is a finite set of points in $\mathsf{X}$ that is used to define possible codewords. Two well known examples when $\mathsf{X} = \mathbb{C}$ are quadrature-amplitude modulation (QAM) and phase-shift keyed (PSK) coding. In these coding schemes the codewords are chosen using a random code constructed with a uniform distribution across the given signal constellation. These methods are largely motivated by properties of the additive white Gaussian noise (AWGN) channel, where it is known that a random code book obtained using a Gaussian distribution achieves capacity.

The problem of constellation design has recently received renewed attention in information theory and communication theory. While many techniques in information theory such as coding have readily found their way into communication applications, the signal constellations that information theory envisages and those generally considered by practitioners differ significantly. In particular, while the optimum constellation for an AWGN channel is a continuous constellation that allows for a Gaussian distribution on the input, commonly used constellations over AWGN channels, such as quadrature amplitude modulation (QAM), are not only discrete, but also generally regularly spaced. This gap between theory and practice can be explained in part by the difficulty of deploying continuous constellations in practical systems. However, there is also a body of work which strongly suggests that the continuous paradigm favored by theoreticians is inappropriate for realistic channel models in the majority of today's applications, such as wireless communication systems.
Under any of the following conditions the optimal capacity achieving distribution has a finite number of mass points, or in the case of a complex channel, the amplitude has finite support:
(i) The AWGN channel under a peak power constraint [38, 37, 32, 10].
(ii) Channels with fading, such as Rayleigh [1] and Rician fading [22, 21]. Substantial generalizations are given in [25].
(iii) Lack of channel coherence [27]. For the noncoherent Rayleigh fading channel, a Gaussian input is shown to generate bounded mutual information as SNR goes to infinity [11, 30].
(iv) Under general conditions a binary distribution is optimal, or approximately optimal for sufficiently low SNR ([19], and [39, Theorem 3]).
This article provides theory to explain why optimal distributions are typically discrete based on the Kuhn-Tucker alignment conditions.
1.2. Capacity and error exponents. Operator-theoretic notation is convenient in the convex-analytic setting of this article. We let $\mathcal{M}$ denote the set of probability distributions on the Borel sigma field on some state space $\mathsf{X}$, which is always taken to be a closed subset of Euclidean space. For any distribution $\mu \in \mathcal{M}$, and any measurable function $f\colon \mathsf{X} \to \mathbb{R}$, we denote

$$\langle \mu, f \rangle := \int f(x)\,\mu(dx).$$

Mutual information is defined as,

$$I(\mu) = \int\!\!\int \log\Big[\frac{p(y|x)}{p_\mu(y)}\Big]\,p(y|x)\,dy\,\mu(dx), \qquad \mu \in \mathcal{M}, \tag{1.3}$$
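and channel capacity is determined by maximizing mutual information subject to two linear constraints:
(i) The average power constraint that $\langle \mu, \phi \rangle \le \sigma_P^2$, where $\phi(x) := x^2$ for $x \in \mathbb{R}$.
(ii) The peak power constraint that $\mu$ is supported on $\mathsf{X} \cap [-M, M]$ for a given $M \le \infty$.
Hence the input distribution is constrained by $\mu \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$, where

$$\mathcal{M}(\sigma_P^2, M, \mathsf{X}) := \big\{\mu \in \mathcal{M} : \langle \mu, \phi \rangle \le \sigma_P^2,\ \mu\{[-M, M]\} = 1\big\}, \tag{1.4}$$

and the capacity $C(\sigma_P^2, M, \mathsf{X})$ is expressed as the value of a convex program,

$$\sup\ I(\mu) \quad \text{subject to} \quad \mu \in \mathcal{M}(\sigma_P^2, M, \mathsf{X}). \tag{1.5}$$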
[Figure 1: plot of the error exponent $E_r(R)$ against the rate $R$ for the optimal code and for QAM.]
Fig. 1: The error exponent $E_r(R)$ for the two codes shown in Figure 11. The 3-point constellation performs better than 16-point QAM for all rates $R \le C$.

The channel reliability function is denoted,

$$E(R) = \lim_{N\to\infty}\Big[-\frac{1}{N}\log P_e(N,R)\Big], \qquad R > 0, \tag{1.6}$$
where $P_e(N,R)$ is the minimal probability of error, and the minimum is over all block codes of length $N$ and rate $R$. The random coding exponent $E_r(R)$ may be expressed for a given $R < C$ via,

$$E_r(R) = \sup_{0 \le \rho \le 1}\Big(\sup_{\mu \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})} \big[-\rho R - \log(G^\rho(\mu))\big]\Big), \tag{1.7}$$

where for each $\rho \ge 0$ we define,

$$G^\rho(\mu) := \int \Big[\int \mu(dx)\,p(y|x)^{1/(1+\rho)}\Big]^{1+\rho}\,dy. \tag{1.8}$$

The following random-coding bound holds under the assumptions imposed in this paper,

$$P_e(N,R) \le \exp(-N E_r(R)), \qquad N \ge 1,\ R \ge 0.$$

Moreover, the equality $E(R) = E_r(R)$ holds for rates greater than the critical rate $R_{\mathrm{crit}}$ [7, 18].
If one can design a distribution with a large error exponent, then the associated random code can be constructed with a correspondingly small block-length. This has tremendous benefit in implementation. Figure 1 shows an example taken from [26] that illustrates how a better designed code can greatly outperform QAM for rates $R$ below capacity.

Optimization of the random coding exponent is addressed as follows. Rather than parameterize the optimization problem by the given rate $R > 0$, we consider for each Lagrange multiplier $\rho$ the convex program,

$$\inf\ G^\rho(\mu) \quad \text{subject to} \quad \mu \in \mathcal{M}(\sigma_P^2, M, \mathsf{X}). \tag{1.9}$$
The value is denoted $G^{\rho*}$, and we have from (1.7)

$$E_r(R) = \sup_{0 \le \rho \le 1}\big[-\rho R - \log(G^{\rho*})\big].$$

Our objective is to explore the structure of optimal input distributions achieving either channel capacity or the random coding exponent $E_r(R)$. Instead of studying individual channel models, which have been the topics of numerous papers, we take a systematic approach to study these problems under very general channel conditions. We believe this viewpoint will clarify our applications of optimization theory to information theory.
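To make (1.7) and (1.8) concrete, the following sketch evaluates their discrete analogues for a finite-alphabet channel, where integrals become sums. It is an illustration under stated assumptions, not the continuous-alphabet setting of this article: the input distribution is held fixed rather than optimized, the supremum over $\rho$ is approximated on a grid, and the binary symmetric channel with crossover probability 0.1, the function names, and the use of natural logarithms are choices made for the example.

```python
import numpy as np


def G_rho(mu, W, rho):
    """Discrete analogue of (1.8): sum_y [sum_x mu(x) W[x, y]^(1/(1+rho))]^(1+rho)."""
    inner = mu @ W ** (1.0 / (1.0 + rho))
    return float(np.sum(inner ** (1.0 + rho)))


def exponent_for_fixed_input(mu, W, R, rhos=np.linspace(0.0, 1.0, 1001)):
    """Grid approximation of sup over rho in [0, 1] of -rho*R - log G_rho(mu), in nats.

    The input distribution mu is fixed, so this is a lower bound on (1.7) in general.
    """
    return max(-rho * R - np.log(G_rho(mu, W, rho)) for rho in rhos)


# Hypothetical example: binary symmetric channel, uniform input.
eps = 0.1
W_bsc = np.array([[1 - eps, eps],
                  [eps, 1 - eps]])
mu_uniform = np.array([0.5, 0.5])

# Mutual information of the uniform input (nats); for this symmetric channel it is the capacity.
capacity = np.log(2) + eps * np.log(eps) + (1 - eps) * np.log(1 - eps)

exponent_at_zero_rate = exponent_for_fixed_input(mu_uniform, W_bsc, 0.0)
exponent_at_capacity = exponent_for_fixed_input(mu_uniform, W_bsc, capacity)
```

In this example the exponent decreases from $-\log 0.8$ at $R = 0$ to zero at capacity, which matches the qualitative shape described for $E_r(R)$.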
1.3. Assumptions and examples. The basis of our analysis is the structure of two sensitivity functions that may be interpreted as gradients with respect to $\mu$ of the respective objective functions $I(\mu)$ and $G^\rho(\mu)$. The channel sensitivity function is defined by

$$g_\mu(x) := \int \log\big[p(y|x)/p_\mu(y)\big]\,p(y|x)\,dy, \qquad x \in \mathbb{R}, \tag{1.10}$$

where $p_\mu$ was defined in (1.2). For each $x$, $g_\mu(x)$ is the relative entropy between two probability distributions $p(y|x)$ and $p_\mu(y)$. The error exponent sensitivity function is given by,

$$g^\rho_\mu(x) := \int \Big[\int \mu(dz)\,p(y|z)^{1/(1+\rho)}\Big]^{\rho}\,p(y|x)^{1/(1+\rho)}\,dy, \qquad x \in \mathsf{X}. \tag{1.11}$$

The following limit follows from elementary calculus,

$$-g_\mu(x) = \lim_{\rho \to 0} \frac{\log g^\rho_\mu(x)}{\rho}, \qquad x \in \mathbb{R}.$$
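The limit relating the two sensitivity functions can be checked numerically on a finite-alphabet channel, replacing the integrals in (1.10) and (1.11) by sums. The sketch below is an illustration only: the 2-by-2 transition matrix, the input distribution, the function names, and the small value of $\rho$ are hypothetical, and the transition matrix is assumed strictly positive so that the logarithms are finite.

```python
import numpy as np


def channel_sensitivity(mu, W):
    """Discrete analogue of (1.10): g_mu(x) = sum_y W[x, y] log(W[x, y] / p_mu(y))."""
    p_mu = mu @ W  # output marginal, cf. (1.2)
    return np.sum(W * np.log(W / p_mu), axis=1)


def exponent_sensitivity(mu, W, rho):
    """Discrete analogue of (1.11)."""
    Ws = W ** (1.0 / (1.0 + rho))
    inner = (mu @ Ws) ** rho  # one value per output symbol y
    return Ws @ inner         # one value per input symbol x


# Hypothetical strictly positive channel and input distribution.
W = np.array([[0.8, 0.2],
              [0.3, 0.7]])
mu = np.array([0.4, 0.6])

g = channel_sensitivity(mu, W)
mutual_information = float(mu @ g)  # I(mu) is the mu-average of g_mu

rho = 1e-6
limit_estimate = np.log(exponent_sensitivity(mu, W, rho)) / rho  # approximates -g
```

Averaging `exponent_sensitivity(mu, W, rho)` against `mu` recovers the discrete $G^\rho(\mu)$, in the same way that averaging `g` against `mu` recovers the mutual information.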
The existence of a solution to (1.5) or (1.9) requires some conditions on the channel and its constraints. We list here the remaining assumptions imposed on the real channel in this paper.
(A1) The input alphabet $\mathsf{X}$ is a closed subset of $\mathbb{R}$, $\mathsf{Y} = \mathbb{C}$ or $\mathbb{R}$, and $\min(\sigma_P^2, M) < \infty$.
(A2) For each $n \ge 1$, $\lim_{|x| \to \infty} P(|Y| < n \mid X = x) = 0$.
(A3) The function $\log(p(\,\cdot \mid \cdot\,))$ is continuous on $\mathsf{X} \times \mathsf{Y}$ and, for any $y \in \mathsf{Y}$, $\log(p(y \mid \cdot\,))$ is analytic within the interior of $\mathsf{X}$. Moreover, $g_\mu$ is an analytic function within the interior of $\mathsf{X}$, for any $\mu \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$.
Conditions (A1)-(A3) are also the standing conditions in [25, 26].

A complex channel model is more realistic in the majority of applications. We describe next a general complex model, defined by a transition
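density $p_\bullet(v|u)$ on $\mathbb{C} \times \mathbb{C}$. The input is denoted $U$, the output $V$, with $U \in \mathsf{U}$, where $\mathsf{U}$ is a closed subset of $\mathbb{C}$, and $V \in \mathsf{V} = \mathbb{C}$. The input and output are related by the transition density via,
$$P\{V \in dv \mid U = u\} = p_\bullet(v|u)\, dv, \qquad u, v \in \mathbb{C}.$$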
The optimization problem (1.5) is unchanged: The average power constraint is given by $E[|U|^2] \le \sigma_P^2$, and the peak-power constraint indicates that $|U| \le M$, where $|z|$ denotes the modulus of a complex number $z \in \mathbb{C}$. It is always assumed that the complex model is symmetric:
$$p_\bullet(v|u) = p_\bullet(e^{j\alpha} v \mid e^{j\alpha} u), \qquad u, v \in \mathbb{C},\ \alpha \in \mathbb{R}. \tag{1.12}$$
Under (1.12) we define,
(i) $X = |U|$, $\mathsf{X} = \mathsf{U} \cap \mathbb{R}_+$, and $\mathcal{M}$ again denotes probability distributions on $\mathcal{B}(\mathsf{X})$;
(ii) For any $\mu \in \mathcal{M}$, we define $\mu_\bullet$ as the symmetric distribution on $\mathbb{C}$ whose magnitude has distribution $\mu$. That is, we have the polar-coordinates representation,
$$\mu_\bullet(dx \times d\alpha) = \frac{1}{2\pi x}\,\mu(dx)\, d\alpha, \qquad x > 0,\ 0 < \alpha < 2\pi, \tag{1.13}$$
and we set $\mu(\{0\}) = \mu_\bullet(\{0\})$. This is denoted $\mu^x_\bullet$ in the special case $\mu = \delta_x$. For each $x \in \mathsf{X}$, the distribution $\mu^x_\bullet$ coincides with the uniform distribution on the circle $\{z \in$
(iv) $g_\mu\colon \mathbb{R}_+ \to \mathbb{R}_+$ is defined as the channel sensitivity function corresponding to the transition density $p$. This may be expressed,
where $\mu_\bullet$ and $\mu$ correspond as in (ii). The symmetry condition (1.12) is a natural assumption in many applications since phase information is lost at high bandwidths. It is shown in [25] that in all of the standard complex channel models, Assumptions (A1)-(A3) hold for the corresponding real channel with input $X = |U|$. Moreover, both the capacity and random coding exponent $E_r(R)$ for the complex and real models coincide. We recall the most common channel models here:
The Rician channel. This is the general complex fading channel, in which the input and output are related by,
$$V = (A + a)U + N \tag{1.14}$$
where $U$ and $V$ are the complex-valued channel input and output, $a \ge 0$, and $A$ and $N$ are independent complex Gaussian random variables, $A \sim N_{\mathbb{C}}(0, \sigma_A^2)$ and $N \sim N_{\mathbb{C}}(0, \sigma_N^2)$. The Rician channel reduces to the complex AWGN channel when $\sigma_A^2 = 0$. Throughout this paper we assume that $N$ and $A$ are circularly symmetric. Consequently, $V$ has a circularly symmetric distribution whenever the distribution of $U$ is circularly symmetric. □
On setting $a = 0$ in (1.14) we obtain another important special case:
The Rayleigh channel. The model (1.14) with $a = 0$ is known as the Rayleigh channel. Under our standing assumption that $N$, $A$ have circularly symmetric distributions, it follows that the output distribution is symmetric for any input distribution (not necessarily symmetric). Based on this property, the model may be normalized as follows, as in [2]: Setting $X = |U|\sigma_A/\sigma_N$ and $Y = |V|^2/\sigma_N^2$, the output $Y$ is exponentially distributed with mean $1 + x^2$ given $X = x$, and we obtain a real channel model with transition density
$$p(y|x) = \frac{1}{1 + x^2} \exp\Bigl(-\frac{y}{1 + x^2}\Bigr), \qquad x, y \in \mathbb{R}_+. \tag{1.15}$$
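For the normalized Rayleigh model, the sensitivity function (1.10) reduces to a one-dimensional integral that can be approximated by quadrature. The sketch below is illustrative rather than the authors' procedure: it assumes that the density in (1.15) is the exponential density with mean $1 + x^2$, uses a plain trapezoidal rule on a truncated range (the step size and truncation point are arbitrary choices), and evaluates the mutual information through the identity $I(\mu) = \langle \mu, g_\mu \rangle$. The binary input used here is the two-point distribution discussed later in Section 4.

```python
import math

def log_density(y, x):
    """log p(y|x), assuming Y given X = x is exponential with mean 1 + x^2."""
    m = 1.0 + x * x
    return -y / m - math.log(m)

def sensitivity(x, support, weights, step=0.01, tail=40.0):
    """Trapezoidal approximation of g_mu(x) = int p(y|x) log(p(y|x) / p_mu(y)) dy."""
    y_max = tail * (1.0 + max(max(support), x) ** 2)  # truncation point (assumption)
    n = int(y_max / step)
    total = 0.0
    for k in range(n + 1):
        y = k * step
        lp = log_density(y, x)
        # log p_mu(y) via log-sum-exp, to avoid underflow for large y
        logs = [math.log(w) + log_density(y, s) for s, w in zip(support, weights)]
        top = max(logs)
        lp_mu = top + math.log(sum(math.exp(v - top) for v in logs))
        term = math.exp(lp) * (lp - lp_mu)
        total += 0.5 * term if k in (0, n) else term
    return total * step

def mutual_information(support, weights):
    """I(mu) = <mu, g_mu> for a discrete input distribution mu."""
    return sum(w * sensitivity(s, support, weights) for s, w in zip(support, weights))

# Binary input with mass points at 0 and 5 (weights as quoted in Section 4).
support, weights = [0.0, 5.0], [0.5346, 0.4654]
info = mutual_information(support, weights)
print(info)  # nats/symbol; can be compared with the value reported in Section 4
```

Working with log-densities keeps the integrand finite far into the tail, where the individual densities underflow in floating point.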
□
The phase-noise channel. This noncoherent AWGN channel emerges in communication systems where it is not possible to provide a carrier phase reference at the receiver [28]. The channel model is represented by
$$V = U e^{j\theta} + N,$$
where $U$ and $V$ are the complex-valued channel input and output, $N$ is an independent complex Gaussian random variable with variance $2\sigma_N^2$, and $\theta$ is an independent random phase distributed uniformly on $[-\pi, \pi]$. It is easy to see that the input phase does not convey any information, and the mutual information is determined by the conditional probability density of the channel output amplitude $Y$ given the channel input magnitude $X$,
$$p(y|x) = \frac{y}{\sigma_N^2} \exp\Bigl(-\frac{y^2 + x^2}{2\sigma_N^2}\Bigr) I_0\Bigl(\frac{xy}{\sigma_N^2}\Bigr), \tag{1.16}$$
where $I_0$ is the zeroth-order modified Bessel function of the first kind. □
The sensitivity function $g_\mu$ is easily computed numerically for the Rayleigh or phase-noise channels based on (1.15) or (1.16). For the general Rician model, computation of $g_\mu$ appears to be less straightforward since this requires computation of $g_{\mu_\bullet}$, which involves integration over the complex plane. The capacity-achieving input distribution is discrete under conditions imposed here when $M$ is finite [25]. We find that the distribution optimizing the error exponent $E_r$ for a given positive rate $R < C$ always has finite support, with or without a peak power constraint. Consequently,
in the symmetric complex channel, the distribution is symmetric, and its magnitude has finite support. The following result provides a summary of results obtained for the real channel or the symmetric complex channel. The proof of Theorem 1.1 (ii) follows directly from Propositions 3.2 and 3.3.
THEOREM 1.1. The following hold for the real channel model under Assumptions (A1)-(A3):
(i) If $M < \infty$ then there exists an optimizer $\mu^*$ of the convex program (1.5) defining capacity. The distribution $\mu^*$ has finite support.
(ii) Consider the convex program (1.9) under the relaxed condition that $\min(\sigma_P^2, M) < \infty$. For each $\rho$ there exists an optimizer $\mu^\rho$, and any optimizer has finite support. Moreover, for each $R \in (0, C)$ there exists $\rho^*$ achieving the maximum in (1.7) so that,
$$E_r(R) = -\rho^* R - \log(G^{\rho^* *}) = -\rho^* R - \log(G^{\rho^*}(\mu^{\rho^*})).$$
□
The remainder of the paper is organized as follows: Section 2 reviews well known results on hypothesis testing, along with a promising new approach to robust hypothesis testing. The formulae for capacity and the random coding exponent are explained, based on results from robust and ordinary hypothesis testing. Section 3 contains results from [25, 26] showing why the capacity-achieving input distribution and the distribution optimizing the error exponent are discrete. The numerical results contained in Section 4 show that the resulting codes can significantly out-perform traditional signal constellation schemes such as QAM and PSK. Section 5 concludes the paper.
2. Hypothesis testing and reliable communication. The focus of this section is to characterize optimal input distributions in hypothesis testing, channel capacity, and error exponents. A goal is to clarify the relationship between the solutions to these three optimization problems. In Section 2.1 we survey some results from [23, 6, 41] on asymptotic hypothesis testing based on Sanov's Theorem. These results will be used to set the stage for the convex analytic methods and geometric intuition to be applied in the remainder of the paper. We first briefly recall Sanov's Theorem: If $X$ is a real-valued sequence, the empirical distributions are defined as the sequence of discrete probability distributions on $\mathcal{B}$,
$$\Gamma_N(A) = \frac{1}{N} \sum_{k=0}^{N-1} \mathbb{1}\{X_k \in A\}, \qquad A \in \mathcal{B}. \tag{2.1}$$
Suppose that $X$ is i.i.d. with marginal distribution $\pi$. Sanov's Theorem states that for any closed convex set $A \subseteq \mathcal{M}$,
$$\lim_{N \to \infty} -N^{-1} \log P\{\Gamma_N \in A\} = \inf\{D(\mu \| \pi) : \mu \in A\}.$$
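Sanov's Theorem can be checked directly in the simplest setting. The sketch below is an illustration only: it assumes a hypothetical Bernoulli($p$) marginal with $p = 0.3$ and takes $A$ to be the closed convex set of distributions on $\{0, 1\}$ whose mean is at least $a = 0.5$, so that the infimum of the relative entropy over $A$ is the binary divergence $D(a \| p)$. The probability of the event is computed exactly from the binomial distribution.

```python
import math

def kl_bernoulli(a, p):
    """Relative entropy D(Bernoulli(a) || Bernoulli(p)) in nats."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_binomial_tail(n, p, k_min):
    """log P{S_n >= k_min} for S_n ~ Binomial(n, p), summed in the log domain."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

p, a = 0.3, 0.5   # hypothetical marginal and threshold
n = 4000          # sample size N

# The empirical distribution lies in A exactly when at least a*N samples equal 1.
rate = -log_binomial_tail(n, p, math.ceil(a * n)) / n
limit = kl_bernoulli(a, p)
print(rate, limit)
```

The finite-$N$ exponent exceeds the limit by a term of order $\log(N)/N$, so the two values agree only to a few decimal places at this sample size.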
The relative entropy is jointly convex on $\mathcal{M} \times \mathcal{M}$, and hence computation of the minimum of $D(\mu \| \pi)$ amounts to solving a convex program.
2.1. Neyman-Pearson hypothesis testing. Consider the binary hypothesis testing problem based on a finite number of observations from a sequence $X = \{X_t : t = 1, \dots\}$, taking values in the set $\mathsf{X} = \mathbb{R}^d$. It is assumed that, conditioned on the hypotheses $H_0$ or $H_1$, these observations are independent and identically distributed (i.i.d.). The marginal probability distribution on $\mathsf{X}$ is denoted $\pi^j$ under hypothesis $H_j$ for $j = 0, 1$. The goal is to classify a given set of observations into one of the two hypotheses. For a given $N \ge 1$, suppose that a decision test $\phi_N$ is constructed based on the finite set of measurements $\{X_1, \dots, X_N\}$. This may be expressed as the characteristic function of a subset $A_N \subset \mathsf{X}^N$. The test declares that hypothesis $H_1$ is true if $\phi_N = 1$, or equivalently, $(X_1, X_2, \dots, X_N) \in A_N$. The performance of a sequence of tests $\phi := \{\phi_N : N \ge 1\}$ is reflected in the error exponents for the type-II error probability and type-I error probability, defined respectively by,
$$I_\phi := -\liminf_{N \to \infty} \frac{1}{N} \log\bigl(P_{\pi^1}(\phi_N(X_1, \dots, X_N) = 0)\bigr),$$
$$J_\phi := -\liminf_{N \to \infty} \frac{1}{N} \log\bigl(P_{\pi^0}(\phi_N(X_1, \dots, X_N) = 1)\bigr).$$
The asymptotic N-P criterion of Hoeffding [23] is described as follows: For a given constant $\eta \ge 0$, an optimal test is the solution to the following optimization problem,
$$\sup_\phi\ I_\phi \quad \text{subject to} \quad J_\phi \ge \eta, \tag{2.2}$$
where the supremum is over all test sequences $\phi$. The optimal value of the exponent $I_\phi$ in the asymptotic N-P problem is described in terms of relative entropy. It is shown in [41] that one may restrict to tests of the following form without loss of generality: for a closed set $A \subseteq \mathcal{M}$,
$$\phi_N = \mathbb{1}\{\Gamma_N \in A\}, \tag{2.3}$$
where $\{\Gamma_N\}$ denotes the sequence of empirical distributions (2.1). Sanov's Theorem tells us that for any test of this form,
For an arbitrary measure $\pi \in \mathcal{M}$ and for $\beta \in \mathbb{R}_+$, consider the divergence set,
$$Q^+_\beta(\pi) := \{\gamma \in \mathcal{M} : D(\gamma \| \pi) \le \beta\}. \tag{2.4}$$
[Figure 2: the convex sets $Q^+_\eta(\pi^0)$ and $Q^+_{\beta^*}(\pi^1)$, separated by the set $\mu(\log(\ell)) = \mu^*(\log(\ell))$.]
Fig. 2: The Neyman-Pearson hypothesis testing problem. The likelihood ratio test is interpreted as a separating set between the convex sets $Q^+_\eta(\pi^0)$ and $Q^+_{\beta^*}(\pi^1)$.
The divergence set $Q^+_\beta(\pi)$ is a closed convex subset of $\mathcal{M}$ since $D(\,\cdot \,\|\, \cdot\,)$ is jointly convex and lower semi-continuous on $\mathcal{M} \times \mathcal{M}$. Consequently, from Sanov's Theorem the smallest set $A$ that gives $J_\phi \ge \eta$ is the divergence set $A^* = Q^+_\eta(\pi^0)$, and hence the solution to (2.2) is the value of the convex program,
$$\beta^* := \inf\{D(\gamma \| \pi^1) : \gamma \in Q^+_\eta(\pi^0)\}. \tag{2.5}$$
Theorem 2.1 may be interpreted geometrically as follows. We have $\gamma^* \in Q^+_\eta(\pi^0) \cap Q^+_{\beta^*}(\pi^1)$, and the convex sets $Q^+_\eta(\pi^0)$ and $Q^+_{\beta^*}(\pi^1)$ are separated by the following set, which corresponds to the test sequence in (2.7):
$$\mathcal{H} = \{\gamma \in \mathcal{M} : \langle \gamma, \log \ell \rangle = \langle \gamma^*, \log \ell \rangle\}.$$
This geometry is illustrated in Figure 2.
THEOREM 2.1. Suppose that $\{\pi^0, \pi^1\}$ have strictly positive densities on $\mathsf{X} = \mathbb{R}^d$, denoted $\{p^0, p^1\}$, and suppose that the optimal value of $I_\phi$ in (2.2) is finite and non-zero. Then the following statements hold,
(i) The optimal value of (2.2) is given by the minimal Kullback-Leibler divergence $\beta^*$ given in (2.5).
(ii) There exists $\rho^* > 0$ such that the following alignment condition holds for the optimizer $\gamma^*$ of (2.5):
$$\log \frac{d\gamma^*}{d\pi^1}(x) + \rho^* \log \frac{d\gamma^*}{d\pi^0}(x) \le \beta^* + \rho^* \eta, \qquad x \in \mathsf{X},$$
with equality almost everywhere. Consequently, the optimizer $\gamma^* \in Q^+_\eta(\pi^0)$ has density,
$$q^*(x) = k_0\, [p^0(x)]^{\rho^*/(1+\rho^*)}\, [p^1(x)]^{1/(1+\rho^*)}, \qquad x \in \mathsf{X},$$
where $k_0 > 0$ is a normalizing constant.
(iii)
$$\beta^* = \max_{\rho \ge 0} \Bigl\{ -\rho\eta - (1 + \rho) \log\Bigl( \int (p^0(x))^{\rho/(1+\rho)} (p^1(x))^{1/(1+\rho)}\, dx \Bigr) \Bigr\}, \tag{2.6}$$
where the maximum is attained at the value of $\rho^*$ in (ii).
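(iv) The following log-likelihood ratio test (LRT) is optimal, described as the general test (2.3) using the set,
$$A := \{\gamma \in \mathcal{M} : \langle \gamma, \log \ell \rangle \le \beta^* - \eta\}, \tag{2.7}$$
where $\ell$ denotes the likelihood ratio $\ell = d\pi^0 / d\pi^1$.
Proof. Part (i) of Theorem 2.1 is due to Hoeffding [23]. This result follows from Sanov's Theorem as described above. Parts (ii) and (iii) were established in [6]. We sketch a proof here since similar ideas will be used later in the paper. To prove (ii) we construct a dual for the convex program (2.5). Consider the relaxation in which the constraint $\gamma \in Q^+_\eta(\pi^0)$ is relaxed through the introduction of a Lagrange multiplier $\rho \in \mathbb{R}_+$. The Lagrangian is denoted,
$$\mathcal{L}(\gamma; \rho) := D(\gamma \| \pi^1) + \rho \bigl( D(\gamma \| \pi^0) - \eta \bigr), \tag{2.8}$$
and the dual function $\Psi\colon \mathbb{R}_+ \to \mathbb{R}$ is defined as the infimum of the Lagrangian over all probability distributions,
$$\Psi(\rho) := \inf_{\gamma \in \mathcal{M}} \mathcal{L}(\gamma; \rho). \tag{2.9}$$
An optimizer $\gamma^\rho$ can be characterized by taking directional derivatives. Let $\gamma \in \mathcal{M}$ be arbitrary, and define $\gamma^\theta = \theta\gamma + (1 - \theta)\gamma^\rho$. Then the first order condition for optimality is,
$$0 \le \frac{d}{d\theta} \mathcal{L}(\gamma^\theta; \rho) \Big|_{\theta = 0} = \Bigl\langle \gamma - \gamma^\rho,\ \log \frac{d\gamma^\rho}{d\pi^1} + \rho \Bigl( \log \frac{d\gamma^\rho}{d\pi^0} - \eta \Bigr) \Bigr\rangle.$$
On setting $\gamma = \delta_x$, the point-mass at $x \in \mathsf{X}$, we obtain the bound,
$$\log \frac{d\gamma^\rho}{d\pi^1}(x) + \rho \log \frac{d\gamma^\rho}{d\pi^0}(x) \ge D(\gamma^\rho \| \pi^1) + \rho D(\gamma^\rho \| \pi^0),$$
and on integrating both sides of this inequality with respect to $\gamma^\rho$ we conclude that equality holds a.e. $[\gamma^\rho]$. This proves (ii), with $\gamma^* = \gamma^{\rho^*}$, where $\rho^*$ is chosen so that $\gamma^{\rho^*} \in Q^+_\eta(\pi^0)$. Substituting the optimizer $\gamma^\rho$ into the Lagrangian (2.8) leads to an explicit expression for the dual,
$$\Psi(\rho) = \Bigl\langle \gamma^\rho,\ \Bigl[ \log \frac{d\gamma^\rho}{d\pi^1} + \rho \log \frac{d\gamma^\rho}{d\pi^0} \Bigr] \Bigr\rangle - \rho\eta. \tag{2.10}$$
The term within the brackets is constant, which gives,
$$\Psi(\rho) = -\rho\eta - (1 + \rho) \log\Bigl( \int (p^0(x))^{\rho/(1+\rho)} (p^1(x))^{1/(1+\rho)}\, dx \Bigr).$$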
To complete the proof of (iii) we argue that $\beta^*$ is the value of the dual, $\beta^* = \max\{\Psi(\rho) : \rho \ge 0\}$.
Given this combined with the expression (2.10) for $\Psi(\rho)$ we obtain the representation (iii). Part (iv) follows from Sanov's Theorem and the geometry illustrated in Figure 2: The likelihood ratio test is interpreted as a separating set between the convex sets $Q^+_\eta(\pi^0)$ and $Q^+_{\beta^*}(\pi^1)$. □
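The duality between the constrained divergence minimization and the maximization over $\rho$ in Theorem 2.1 (iii) can be verified numerically on a finite alphabet, where the integral becomes a sum. The sketch below is an illustration under that assumption, with hypothetical distributions `p0`, `p1` and threshold `eta`; the primal value is obtained from the geometric-mixture family $q_\rho \propto (p^0)^{\rho/(1+\rho)} (p^1)^{1/(1+\rho)}$ by bisection on $\rho$, and the dual value by a ternary search, neither of which is claimed to be the authors' numerical method.

```python
import math

def kl(q, p):
    """Relative entropy D(q || p) in nats on a finite alphabet."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def tilted(p0, p1, rho):
    """Distribution proportional to p0^(rho/(1+rho)) * p1^(1/(1+rho)); returns (q, Z)."""
    a = rho / (1.0 + rho)
    raw = [x0 ** a * x1 ** (1.0 - a) for x0, x1 in zip(p0, p1)]
    z = sum(raw)
    return [r / z for r in raw], z

def dual(p0, p1, eta, rho):
    """Psi(rho) = -rho*eta - (1+rho) * log Z(rho), the expression maximized in (iii)."""
    _, z = tilted(p0, p1, rho)
    return -rho * eta - (1.0 + rho) * math.log(z)

def beta_star_primal(p0, p1, eta, hi=100.0):
    """Bisect on rho until D(q_rho || p0) = eta, then return D(q_rho || p1)."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        q, _ = tilted(p0, p1, mid)
        if kl(q, p0) > eta:
            lo = mid   # q_rho still too far from p0: increase rho
        else:
            hi = mid
    q, _ = tilted(p0, p1, 0.5 * (lo + hi))
    return kl(q, p1)

def beta_star_dual(p0, p1, eta, hi=100.0):
    """Ternary search for the maximum of the concave function rho -> Psi(rho)."""
    lo = 0.0
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if dual(p0, p1, eta, m1) < dual(p0, p1, eta, m2):
            lo = m1
        else:
            hi = m2
    return dual(p0, p1, eta, 0.5 * (lo + hi))

# Hypothetical example with eta below D(p1 || p0), so that beta* > 0.
p0, p1, eta = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5], 0.05
primal = beta_star_primal(p0, p1, eta)
dual_value = beta_star_dual(p0, p1, eta)
print(primal, dual_value)
```

The two values coincide up to numerical tolerance, reflecting the fact that the derivative of $\Psi$ at $\rho$ equals $D(q_\rho \| p^0) - \eta$, which vanishes at the multiplier found by the bisection.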
Robust hypothesis testing. In typical applications it is unrealistic to assume precise values are known for the two marginals $\pi^0$, $\pi^1$. Consider the following relaxation in which hypothesis $H_i$ corresponds to the assumption that the marginal distribution lies in a closed, affine subset $\mathbb{P}_i \subset \mathcal{M}$. A robust N-P hypothesis testing problem is formulated in which the worst-case type-II exponent over $\pi^1 \in \mathbb{P}_1$ is maximized, subject to a uniform constraint on the type-I exponent over all $\pi^0 \in \mathbb{P}_0$:
$$\sup_\phi\ \inf_{\pi^1 \in \mathbb{P}_1} I^{\pi^1}_\phi \quad \text{subject to} \quad \inf_{\pi^0 \in \mathbb{P}_0} J^{\pi^0}_\phi \ge \eta. \tag{2.11}$$
A test is called optimal if it solves this optimization problem. The optimization problem (2.11) is considered in [34, 35, 33] in the special case in which the uncertainty sets are defined by specifying a finite number of generalized moments: A finite set of real-valued continuous functions $\{f_j : j = 1, \dots, n\}$ and real constants $\{c^i_j : j = 1, \dots, n\}$ are given, and
$$\mathbb{P}_i := \{\pi \in \mathcal{M} : \langle \pi, f_j \rangle = c^i_j,\ j = 0, \dots, n\}, \qquad i = 0, 1. \tag{2.12}$$
As a notational convenience we take $f_0 \equiv 1$ and $c^i_0 = 1$. It is possible to construct a simple optimal test based on a linear function of the data. Although the test itself is not a log-likelihood test, it has a geometric interpretation that is entirely analogous to that given in Theorem 2.1. The value $\beta^*$ in an optimal test can be expressed, (2.13)
Moreover, the infimum is achieved by some $\mu^* \in Q^+_\eta(\mathbb{P}_0) \cap Q^+_{\beta^*}(\mathbb{P}_1)$, along with least favorable distributions $\pi^0_* \in \mathbb{P}_0$, $\pi^1_* \in \mathbb{P}_1$, satisfying
The distribution $\mu^*$ has the form $\mu^*(x) = \ell_0(x)\, \pi^0_*(x)$, where the function $\ell_0$ is a linear combination of the constraint functions $\{f_i\}$ used to define $\mathbb{P}_0$. The function $\log \ell_0$ defines a separating hyperplane between the convex sets $Q^+_\eta(\mathbb{P}_0)$ and $Q^+_{\beta^*}(\mathbb{P}_1)$, as illustrated in Figure 3.
[Figure 3: the regions $Q^+_\eta(\mathbb{P}_0)$ and $Q^+_{\beta^*}(\mathbb{P}_1)$, separated by the hyperplane $\langle \mu, \log(\ell_0) \rangle = \langle \mu^*, \log(\ell_0) \rangle$.]
Fig. 3: The two-moment worst-case hypothesis testing problem. The uncertainty classes $\mathbb{P}_i$, $i = 0, 1$ are determined by a finite number of linear constraints, and the thickened regions $Q^+_\eta(\mathbb{P}_0)$, $Q^+_{\beta^*}(\mathbb{P}_1)$ are each convex. The linear threshold test is interpreted as a separating hyperplane between these two convex sets.
Note that $\log \ell_0$ is defined everywhere, yet in applications the likelihood ratio $d\mu^* / d\pi^0_*$ may be defined only on a small subset of $\mathsf{X}$.
PROPOSITION 2.1. Suppose that the moment classes $\mathbb{P}_0$ and $\mathbb{P}_1$ each satisfy the non-degeneracy condition that the vector $(c^i_0, \dots, c^i_n)$ lies in the relative interior of the set of all possible moments $\{\mu(f_0, \dots, f_n) : \mu \in \mathcal{M}\}$. Then, there exist $\{\lambda_0, \dots, \lambda_n\} \subset \mathbb{R}$ such that the function $\ell_0 = \sum \lambda_i f_i$ is non-negative valued, and the following test is optimal
$$\phi^*_N = 0 \iff \frac{1}{N} \sum_{t=0}^{N-1} \log(\ell_0(X_t)) \le \eta. \tag{2.14}$$
Proof. The result is given as [34, Proposition 2.4]. We sketch the proof here, based on the convex geometry illustrated in Figure 3. Since $Q^+_\eta(\mathbb{P}_0)$ and $Q^+_{\beta^*}(\mathbb{P}_1)$ are compact sets it follows from their construction that there exists $\mu^* \in Q^+_\eta(\mathbb{P}_0) \cap Q^+_{\beta^*}(\mathbb{P}_1)$. Moreover, by convexity there exists some function $h\colon \mathsf{X} \to \mathbb{R}$ defining a separating hyperplane between the sets $Q^+_\eta(\mathbb{P}_0)$ and $Q^+_{\beta^*}(\mathbb{P}_1)$, satisfying
The remainder of the proof consists of the identification of $h$ using the Kuhn-Tucker alignment conditions based on consideration of a dual functional as in the proof of Theorem 2.1. □
2.2. Mutual information. In this section we derive the expression (1.5) for channel capacity based on Theorem 2.1, following ideas in Anantharam [3] (see also [12, 15]). Consider the decoding problem in which a set of $N$-dimensional code words are generated by a sequence of random variables with marginal distribution $\mu$. The receiver is given the output sequence $\{Y_1, \dots, Y_N\}$ and considers an arbitrary sequence from the code book $\{X^i_1, \dots, X^i_N\}$, where $i$ is the index in a finite set $\{1, \dots, e^{NR}\}$, and $R$ is the rate of the code. Since $X^i$ has marginal distribution $\mu$, $Y$ has marginal distribution $p_\mu$ defined in (1.2).
For each $i$, this decision process can be interpreted as a binary hypothesis testing problem in which $H_0$ is the hypothesis that $\{(X^i_j, Y_j) : j = 1, \dots, N\}$ has marginal distribution
$$\pi^0 := \mu \otimes p_\mu\,[dx, dy] := \mu(dx)\, p_\mu(y)\, dy.$$
That is, $X^i$ and $Y$ are independent if codeword $i$ was not sent. Hypothesis $H_1$ is the hypothesis that $i$ is the true code word, so that the joint marginal distribution is
$$\pi^1 := \mu \odot p\,[dx, dy] := \mu(dx)\, p(y|x)\, dy.$$
Suppose that the error exponent $\eta > 0$ is given, and an optimal N-P LRT is applied. Then $J_\phi = \eta$ means that,
$$\eta = -\lim_{N \to \infty} \frac{1}{N} \log\bigl(P\{\text{Code word } i \text{ is accepted} \mid i \ne i^*\}\bigr),$$
where the index $i^*$ denotes the code word sent. Considering $e^{RN}$ codewords, our interest lies in the probability that at least one of the $e^{RN} - 1$ incorrect code words is mistaken for the true code word. We obtain through the union bound,
$$P\{\text{The true code word } i^* \text{ is rejected}\} \le \sum_{i \ne i^*} P\{\text{Code word } i \text{ is accepted} \mid i \ne i^*\},$$
from which we obtain,
$$\eta - R \le -\lim_{N \to \infty} \frac{1}{N} \log\bigl(P\{\text{The true code word } i^* \text{ is rejected}\}\bigr). \tag{2.15}$$
We must have $R < \eta$ to ensure that the right hand side is positive, so that the probability that the true code word $i^*$ is rejected vanishes as $N \to \infty$. One must also ensure that $\eta$ is not too large, since it is necessary that $\beta^* > 0$ so that $I_\phi > 0$ under the LRT. Hence an upper bound on $R$ is the supremum over $\eta$ satisfying $\beta^* > 0$, which is precisely mutual information:
$$I(\mu) = D(\mu \odot p \,\|\, \mu \otimes p_\mu). \tag{2.16}$$
This conclusion is illustrated in Figure 4. The channel capacity is defined to be the maximum of $I$ over all input distributions $\mu$ satisfying the given constraints. We thus arrive at the convex program (1.5).
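The identification of mutual information with the relative entropy between the joint distribution $\mu \odot p$ and the product distribution $\mu \otimes p_\mu$ can be made concrete on a finite alphabet. The sketch below is illustrative: the binary symmetric channel with crossover probability 0.1 is a hypothetical example, and the result is compared with the standard closed form $\log 2 - H_b(\varepsilon)$ (in nats) for a uniform input.

```python
import math

def mutual_information(mu, channel):
    """I(mu) = D(mu (.) p || mu (x) p_mu) for a finite-alphabet channel, in nats."""
    n_out = len(channel[0])
    # Output marginal p_mu(y) = sum_x mu(x) p(y|x)
    p_mu = [sum(m * row[y] for m, row in zip(mu, channel)) for y in range(n_out)]
    total = 0.0
    for m, row in zip(mu, channel):
        for y in range(n_out):
            joint = m * row[y]       # pi^1 = mu (.) p: input and output coupled by the channel
            product = m * p_mu[y]    # pi^0 = mu (x) p_mu: input and output independent
            if joint > 0:
                total += joint * math.log(joint / product)
    return total

# Hypothetical example: binary symmetric channel with a uniform input.
eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]
info = mutual_information([0.5, 0.5], bsc)
closed_form = math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)
print(info, closed_form)
```

Maximizing this quantity over the admissible input distributions gives the capacity, which is the convex program referred to in the text.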
[Figure 4: the distributions $\mu \odot p$ and $\mu \otimes p_\mu$.]
Fig. 4: The channel capacity is equal to the maximal relative entropy between $\mu \otimes p_\mu$ and $\mu \odot p$, over all input distributions $\mu$ satisfying the given constraints.
2.3. Error exponents. A representation of the channel-coding random coding exponent can be obtained based on similar reasoning. Here we illustrate the form of the solution, and show that it may be cast as a robust hypothesis testing problem of the form considered in Section 2.1. For a given $\mu \in \mathcal{M}$, denote by $\mathbb{P}_0$ the space of product measures on $\mathsf{X} \times \mathsf{Y}$,
$$\mathbb{P}_0 = \{\mu \otimes \nu : \nu \text{ is a probability measure on } \mathsf{Y}\},$$
and define the corresponding divergence set for a given $R > 0$,
$$Q^+_R(\mathbb{P}_0) := \bigcup_{\nu} Q^+_R(\mu \otimes \nu).$$
Equivalently, $Q^+_R(\mathbb{P}_0) = \{\gamma : \min_\nu D(\gamma \| \mu \otimes \nu) \le R\}$. The robust hypothesis testing problem is binary, with $H_1$ as defined in the channel capacity problem, but with $H_0$ defined using $\mathbb{P}_0$:
$H_0$: $\{(X_j, Y_j) : j = 1, \dots, N\}$ has marginal distribution $\pi^0 \in \mathbb{P}_0$.
$H_1$: $\{(X_j, Y_j) : j = 1, \dots, N\}$ has marginal distribution $\pi^1 := \mu \odot p$.
Proposition 2.2 shows that the random coding exponent $E_r(R)$ can be represented as the solution to the robust N-P hypothesis testing problem (2.11) with $\eta = R$ and $\mathbb{P}_1 = \{\mu \odot p\}$.
PROPOSITION 2.2.
$$E_r(R) = \sup_\mu \Bigl( \inf\{\beta : Q^+_\beta(\mu \odot p) \cap Q^+_R(\mathbb{P}_0) \ne \emptyset\} \Bigr). \tag{2.17}$$
Suppose that there exists a triple $(\mu^*, \nu^*, \gamma^*)$ that solve (2.17) in the sense that
Then, there exists a channel transition density $\check{p}$ such that $\gamma^* = \mu^* \odot \check{p}$, $\nu^* = \check{p}_{\mu^*}$,
and the rate can be expressed as mutual information,
Proof. Blahut in [6] establishes several representations for $E_r$, beginning with the following
$$E_r(R) = \sup_\mu\ \inf_{\mu \odot \check{p} \in Q^+_R} D(\mu \odot \check{p} \,\|\, \mu \odot p) \tag{2.18}$$
where the supremum is over all $\mu$, subject to the given constraints, and the infimum is over all transition densities $\check{p}$ satisfying $\mu \odot \check{p} \in Q^+_R$ where
$$Q^+_R := \{\mu \odot \check{p} : D(\mu \odot \check{p} \,\|\, \mu \otimes \check{p}_\mu) \le R\}.$$
The optimization problem (2.17) is a relaxation of (2.18), in which the distributions $\{\nu\}$ in the definition of $\mathbb{P}_0$ are constrained to be of the form $\check{p}_\mu$, and the distributions $\{\gamma\}$ are constrained to be of the form $\mu \odot \check{p}$ for some transition density $\check{p}$. It remains to show that these restrictions hold for any solution to (2.17), so that the relaxed optimization problem (2.17) is equivalent to (2.18). For fixed $\mu$, denote the infimum over $\beta$ in (2.17) by,
$$\beta^*(\mu) := \inf\{\beta : Q^+_\beta(\mu \odot p) \cap Q^+_R(\mathbb{P}_0) \ne \emptyset\}. \tag{2.19}$$
If $(\nu^*, \gamma^*)$ solve (2.19) in the sense that
$$D(\gamma^* \,\|\, \mu \odot p) = \beta^*(\mu), \qquad D(\gamma^* \,\|\, \mu \otimes \nu^*) = R,$$
then the distribution $\gamma^*$ solves the ordinary N-P hypothesis testing problem with $\pi^0 = \mu \otimes \nu^*$ and $\pi^1 = \mu \odot p$. It then follows that the first marginal $\gamma^*_1$ is equal to $\mu$ by the representation given in Theorem 2.1 (ii). Moreover, the second marginal $\gamma^*_2$ can be taken to be $\nu^*$ since for any $\nu$,
These conclusions imply that $\gamma^* = \mu \odot \check{p}$ for some channel density $\check{p}$, and $\nu^* = \check{p}_\mu$, which establishes the desired equivalence between the optimization problems (2.17) and (2.18). □
The solution to the optimization problem (2.19) is illustrated in Figure 5. The channel transition density $\check{p}$ shown in the figure solves
$$\beta^*(\mu) = \inf\{\beta : Q^+_\beta(\mu \odot p) \cap Q^+_R(\mu \otimes \check{p}_\mu) \ne \emptyset\} = D(\mu \odot \check{p} \,\|\, \mu \odot p).$$
The error exponent is equal to the maximal relative entropy $\beta^*(\mu)$ over all $\mu$, and the rate can be expressed as mutual information $R = I(\mu^*; \check{p}) := D(\mu^* \odot \check{p} \,\|\, \mu^* \otimes \check{p}_\mu)$ where $\mu^*$ is the optimizing distribution.
Fig. 5: The error exponent is equal to the solution of a robust N-P hypothesis testing problem.
3. Convex optimization and channel coding. The alignment conditions for the N-P hypothesis testing problem were derived in Theorem 2.1 based on elementary calculus. Similar reasoning leads to alignment conditions characterizing channel capacity and the optimal error exponent.
3.1. Mutual information. Recall that the channel sensitivity function $g_\mu(x)$ is the point-wise relative entropy, $g_\mu(x) := D(p(\,\cdot \mid x) \,\|\, p_\mu(\,\cdot\,))$. The next result, taken from [25], shows that $g_\mu$ is the gradient of $I$ at $\mu$.
PROPOSITION 3.1. For any given $\mu, \mu^0 \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$,
(i) $I(\mu) = \langle \mu, g_\mu \rangle = \min_{\mu' \in \mathcal{M}} \langle \mu, g_{\mu'} \rangle$.
(ii) Letting $\mu_\theta := (1 - \theta)\mu^0 + \theta\mu$, the first-order sensitivity with respect to $\theta \in [0, 1]$ is
$$\frac{d}{d\theta} I(\mu_\theta) \Big|_{\theta = 0} = \langle \mu - \mu^0, g_{\mu^0} \rangle. \tag{3.1}$$
□
The dual functional $\Psi\colon \mathbb{R}_+ \to \mathbb{R}_+$ is defined by
$$\Psi(r) = \sup_{\mu \in \mathcal{M}_0} \bigl[ I(\mu) - r \langle \mu, \phi \rangle \bigr], \qquad r \ge 0, \tag{3.2}$$
where $\mathcal{M}_0 = \mathcal{M}(M^2, M, \mathsf{X}) = \mathcal{M}(\infty, M, \mathsf{X})$ denotes the constraint set without an average power constraint. The dual functional is a convex, decreasing function of $r$, as illustrated in Figure 6. Note that we do not exclude $M = \infty$. In this case, $\mathcal{M}_0 = \mathcal{M}$, which denotes the set of probability distributions on $\mathsf{X}$. The parameter $r$ provides a convenient parameterization of the optimization problem (1.5). The proof of Theorem 3.1 may be found in [25, Theorem 2.8].
THEOREM 3.1. If $M < \infty$, then an optimizing distribution $\mu_r$ exists for (3.2) for each $r > 0$, and the following hold:
(i) The alignment condition holds,
$$g_{\mu_r}(x) \le \Psi(r) + r x^2, \qquad |x| \le M,$$
with equality a.e. $[\mu_r]$.
[Figure 6: plot of $\Psi(r)$ against $r$, with the point $r_0$ and the value $C_0 - r_0 \sigma^2(r_0)$ marked.]
Fig. 6: The dual functional is convex and decreasing. For a given $r_0 > 0$, the slope determines an average power constraint $\sigma^2(r_0)$, and the corresponding capacity $C_0 := C(\sigma^2(r_0), M, \mathsf{X})$ may be determined as shown in the figure.
(ii) Let $\sigma^2(r) := -\frac{d}{dr} \Psi(r)$. The distribution $\mu_r$ is optimal under the corresponding average power constraint:
Moreover, we have $I(\mu_r) = \Psi(r) + r \sigma^2(r)$.
(iii) The capacity $C(\,\cdot\,, M, \mathsf{X})$ is concave in its first variable, with
$$\frac{d}{d\sigma_P^2} C(\sigma_P^2, M, \mathsf{X}) = r,$$
□
3.2. Error exponents. Boundedness of the sensitivity function is central to the analysis of [25].
LEMMA 3.1. $0 < g^\rho_\mu(x) \le 1$ for each $x$, and $g^\rho_\mu \to 0$ as $x \to \infty$.
Continuity of $G^\rho$ follows easily from Lemma 3.1. The following set of results establishes convexity and differentiability of $G^\rho$ with respect to $\mu$. For $\mu, \mu^0 \in \mathcal{M}$ and $\theta \in [0, 1]$ we denote $\mu_\theta := (1 - \theta)\mu^0 + \theta\mu$.
PROPOSITION 3.2. For any given $\mu, \mu^0 \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$ and $\rho > 0$,
(i) For a given $\rho$, the mapping $G^\rho\colon \mathcal{M}(\sigma_P^2, M, \mathsf{X}) \to \mathbb{R}_+$ is continuous in the weak topology.
(ii) The functional $G^\rho$ is convex, and can be expressed as the maximum of linear functionals,
(iii) Fix $\rho \ge 0$, $\mu^0 \in \mathcal{M}$. The first order sensitivity is given by
□
For fixed $\rho$, the optimization problem (1.9) is a convex program since $G^\rho$ is convex. Continuity then leads to existence of an optimizer. The
Fig. 7: Numerical results for the Rayleigh channel $Y = AX + N$ subject to peak-power constraint for rate $R < C$, with $\sigma_A^2 = \sigma_N^2 = 1$ and $\rho = 0.5$.
following result from [26] summarizes the structure of the optimal input distribution. It is similar to Theorem 3.1 which requires the peak power constraint $M < \infty$. We stress that this condition is not required here.
THEOREM 3.2. For each $\rho \ge 0$, there exists $\mu^\rho \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$ that achieves $G^{\rho *}$. Moreover, a distribution $\mu^0 \in \mathcal{M}(\sigma_P^2, M, \mathsf{X})$ is optimal if and only if there exist a real number $\lambda_1$ and a positive real number $\lambda_2$ such that
If these conditions hold, then
□
Shown in Figure 7 are plots of $g^\rho_\mu$ for two distributions. The input distribution $\mu_0$ violates the alignment condition in Theorem 3.2 (ii), and hence is not optimal. The alignment condition does hold for $\mu_1$, and we conclude that this distribution does optimize (1.5).
PROPOSITION 3.3. For given $\rho$, any optimal input distribution $\mu^*$ achieving $G^{\rho *}$ is discrete, with a finite number of mass points in any interval.
Proof. To see that the optimizer $\mu^*$ is discrete consider the alignment conditions. There exists a quadratic function $q^*$ satisfying $q^*(x) \le g^\rho_{\mu^*}(x)$, with equality a.e. $[\mu^*]$. Lemma 3.1 asserts that $g^\rho_{\mu^*}$ takes on strictly positive values and vanishes at infinity. It follows that $q^*$ is not a constant function, and hence $q^*(x) \to -\infty$ as $x \to \infty$. This shows that the optimizer has bounded support, with
Moreover, since $g^\rho_{\mu^*}$ is an analytic function on $\mathsf{X}$ it then follows that $g^\rho_{\mu^*}(x) = q^*(x)$ is only possible for a finite number of points. □
Optimal binary distributions. Gallager in [19] bounds the random coding exponent by a linear functional over the space of probability measures. The bound is shown to be tight for low SNR, and thus the error exponent optimization problem is converted to a linear program over the space of probability measures. An optimizer is an extreme point, which is shown to be binary. Similar arguments used in [25] can be generalized to the model considered here. We begin with consideration of zero SNR, which leads us to consider the sensitivity function using $\mu = \delta_0$, the point mass at 0, denoted $g^\rho_0 := g^\rho_{\delta_0}$. It is easy to see that $g^\rho_0(0) = 1$, and we have seen that $g^\rho_0(x) \le 1$ everywhere. Given the analyticity assumption, it follows that this function has zero derivative at the origin, and non-positive second derivative. We thus obtain the bound,
$$\frac{d \log(1 - g^\rho_0(x))}{d \log(x)} \Big|_{x = 0} \ge 2,$$
with equality holding if and only if the second derivative of $g^\rho_0(x)$ is non-zero at $x = 0$. We have the following proposition concerning the binary distribution at low SNR. This extends Theorem 3.4 of [25] which covers channel capacity. However, unlike this previous result and the result of [19], we do not require a peak power constraint on the input distribution.
PROPOSITION 3.4. Consider a channel with $\mathsf{X} = \mathbb{R}_+$. For a fixed $\rho > 0$, suppose that the following hold,
(i) $\frac{d^2}{dx^2} g^\rho_0(0) = 0$.
(ii) There is a unique $x_1 > 0$ satisfying
$$\frac{d \log(1 - g^\rho_0(x))}{d \log(x)} \Big|_{x = x_1} = 2.$$
(iii) There is 'positive sensitivity' at $x_1$:
$$\frac{d}{dx} \Bigl( \frac{d \log(1 - g^\rho_0(x))}{d \log(x)} \Bigr) \Big|_{x = x_1} \ne 0.$$
Then, for all SNR sufficiently small, the optimal input distribution is binary with one point at the origin. □
The proof of Proposition 3.4 may be found in [26]. We illustrate the proof and its assumptions using the Rayleigh channel. Given the channel transition probability function (1.15), the sensitivity function is,
$x \ge 0$.
[Figure 8: left panel, $g^\rho_0(x)$ and an aligned quadratic; right panel, the log-derivative $d\log(1 - g^\rho_0(x))/d\log(x)$, with the level 2 and the point $x_1$ marked.]
Fig. 8: Optimal binary distribution for the Rayleigh channel. At left is a plot of $g^\rho_0$ together with the quadratic function aligned with $g^\rho_0$. The two functions meet at only two points. Shown at right is a plot of the log-derivative: The nonzero point of support for the optimal binary input is found by setting this equal to 2.
From the plot shown at left in Figure 8 we see that there exists a quadratic function $q_0$ satisfying $q_0(x) \le g^\rho_0(x)$, with equality at the origin and precisely one $x_1 > 0$. The plot at right shows that (iii) holds, and hence that all of the assumptions of Proposition 3.4 are satisfied. Consequently, with vanishing SNR, the optimizing distribution is binary, and approximates the binary distribution supported on $\{0, x_1\}$.
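The point $x_1$ in Proposition 3.4 can be located numerically for the Rayleigh channel. The sketch below rests on two assumptions that are not stated in this form in the text: that the normalized density (1.15) is exponential with mean $1 + x^2$, and that (1.11) with $\mu = \delta_0$ then integrates to the closed form $g^\rho_0(x) = (1 + x^2)^{a} / (1 + a x^2)$ with $a = \rho/(1+\rho)$. The value $\rho = 0.5$ and the bisection bracket are illustrative choices.

```python
import math

def log_g0(x, rho):
    """log g_0^rho(x) under the assumed closed form (1 + x^2)^a / (1 + a x^2), a = rho/(1+rho)."""
    u = x * x
    a = rho / (1.0 + rho)
    return a * math.log1p(u) - math.log1p(a * u)

def log_derivative(x, rho):
    """d log(1 - g_0^rho(x)) / d log(x), using the analytic derivative of log g_0^rho."""
    u = x * x
    a = rho / (1.0 + rho)
    lg = log_g0(x, rho)
    g = math.exp(lg)
    dlogg_dx = 2.0 * x * (a / (1.0 + u) - a / (1.0 + a * u))
    one_minus_g = -math.expm1(lg)   # accurate for small x, where g is close to 1
    return -x * g * dlogg_dx / one_minus_g

def find_x1(rho, lo=0.05, hi=50.0):
    """Bisection for log_derivative(x) = 2, assuming a single crossing inside [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_derivative(mid, rho) > 2.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho = 0.5
x1 = find_x1(rho)
print(x1, log_derivative(x1, rho))
```

Under this closed form $1 - g^\rho_0(x)$ vanishes like $x^4$ at the origin, so the log-derivative starts near 4 and decays to 0, which is consistent with condition (i) and with a crossing of the level 2 at a positive $x_1$.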
4. Signal constellation design. We now show how the conclusions of this paper may be applied in design. For a symmetric, complex channel we have seen in Theorem 1.1 that the optimal input distribution is circularly symmetric on $\mathbb{C}$, and discrete in magnitude. We consider in this section discrete approximations of this optimal distribution on $\mathbb{C}$. We propose the following approach to signal constellation design and coding in which the signal alphabet and associated probabilities are chosen to provide a random code that approximates the random code obtained through the nonlinear program (1.5). We conclude this paper with examples to illustrate the performance of this approach, as compared to standard approaches based on QAM or PSK.
Complex AWGN channel. This is the complex channel model given by $Y = X + N$ with $N$ complex Gaussian. Examples considered in [7] suggest that QAM typically outperforms PSK when the constellation sizes are fixed, and the signal to noise ratio is large. For small SNR, it is known that QAM and PSK are almost optimal (see [7, Figure 7.11], [40], and related results in [37]). Figure 9 shows results using two signal constellation methods: 4-point QAM and a 5-point distribution which contains the 4-point QAM plus a point at the origin. The 5-point distribution is an approximation to the optimal input distribution, which is binary in magnitude, and uniformly distributed in phase. The 5-point constellation performs better than 4-point QAM by about 13%, with lower power consumption. □
Fig. 9: Left: the 4-point QAM signal constellation for complex AWGN channel $Y = X + N$, with $\sigma_P^2 = 9$ and $\sigma_N^2 = 1$, i.e. SNR = 9 (9.54 dB). The mutual information achieved by this 4-point QAM is 1.38 nats/symbol. Right: A 5-point signal constellation for complex AWGN channel $Y = X + N$, with $\sigma_N^2 = 1$, with 4 points (with equal probability) at the same position as QAM plus one point at the origin with probability equal to 0.1077. The constellation achieves 1.52 nats/symbol mutual information.
Fig. 10: Left: the 16-point QAM signal constellation for Rayleigh channel $Y = AX + N$, with $\sigma_A^2 = 1$, $\sigma_N^2 = 1$, and average power constraint $\sigma_P^2 = 11.7$. The mutual information achieved is 0.1951 nats/symbol. Right: A 2-point constellation with one point at the origin (with probability 0.5346) and another point with magnitude 5, for the same channel model and average power constraint. The mutual information achieved is 0.4879 nats/symbol, which is 2.5 times more than that achieved by the 16-point QAM.
Rayleigh channel with low SNR. We consider the normalized model in which $A$, $N$ are each Gaussian, mutually independent, and circularly symmetric, with $\sigma_A^2 = 1$, $\sigma_N^2 = 1$. Consideration of the magnitude of $X$ and $Y$ leads to the real channel model with transition density shown in (1.15). We compare codes obtained from the two constellations illustrated in Figure 10. The first constellation is a 16-point QAM. Since the code used in QAM is a random code with uniform distribution, the average power is given by $\sigma_X^2 = 11.7$. The second constellation has only two elements: one point at the origin and another point at position $5 \in \mathbb{C}$. The weights are chosen so that the average power is again $\sigma_X^2 = 11.7$, which results in $\mu\{0\} = 1 - \mu\{5\} = 0.5346$. This is the optimal input distribution when the peak-power constraint $M = 5$ is imposed.
Fig. 11: The plot at left shows the 16-point QAM signal constellation, and at right is shown a three-point constellation with one point at the origin, one point on a circle of radius 2.7, and the third point on a circle of radius 8. The respective probabilities are uniform for the QAM code, and given by (0.465, 0.138, 0.397) for the respective codewords in the three-point constellation.
Computations show that the simpler coding scheme achieves mutual information 0.4879 nats/symbol, which is about 2.5 times the mutual information achieved by the 16-point QAM code.
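The value of 0.4879 nats/symbol can be reproduced with a short numerical check. The sketch below assumes, as in the normalized model, that given $|X| = a$ the squared output magnitude $R = |Y|^2$ is exponentially distributed with mean $1 + a^2$, so that the mutual information reduces to a one-dimensional integral over $R$. The function names, step size and truncation point are illustrative choices rather than anything specified in the paper. The same routine is applied to the three-point magnitude distribution $\{0.0, 2.7, 8.0\}$ with weights $\{0.465, 0.138, 0.397\}$ used in the high-SNR example.

```python
import math

def rayleigh_mutual_information(radii, probs, step=0.02, tail=40.0):
    """I(X;Y) in nats for Y = A X + N with A, N ~ CN(0, 1), computed via R = |Y|^2."""
    means = [1.0 + a * a for a in radii]            # E[R | |X| = a]
    n = int(tail * max(means) / step)               # truncate far into the tail
    total = 0.0
    for k in range(n + 1):
        r = k * step
        cond = [math.exp(-r / m) / m for m in means]    # exponential densities
        marg = sum(p * f for p, f in zip(probs, cond))
        value = sum(p * f * math.log(f / marg)
                    for p, f in zip(probs, cond) if p > 0.0 and f > 0.0)
        weight = 0.5 if k in (0, n) else 1.0            # trapezoidal rule
        total += weight * value * step
    return total

def average_power(radii, probs):
    """Average input power for a magnitude distribution."""
    return sum(p * a * a for a, p in zip(radii, probs))

two_point = ([0.0, 5.0], [0.5346, 1.0 - 0.5346])
three_point = ([0.0, 2.7, 8.0], [0.465, 0.138, 0.397])

print(average_power(*two_point), rayleigh_mutual_information(*two_point))
print(average_power(*three_point), rayleigh_mutual_information(*three_point))
```

The average powers come out close to the stated constraints of 11.7 and 26.4, which is a quick consistency check on the quoted probability weights.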
Rayleigh channel with high SNR. In this final example the same parameters used in the previous experiment are maintained, except now the average power is increased to $\sigma_X^2 = 26.4$. The optimal input distribution is given as follows when the channel is subject to the peak power constraint $|X| \le 8$: the phase may be taken uniformly distributed without any loss of generality, and the magnitude has three points of support at {0.0, 2.7, 8.0} with respective probabilities {0.465, 0.138, 0.397}. Consequently, we propose a constellation whose magnitude is restricted to these three radii. This is compared to 16-point QAM. The two constellation designs are illustrated in Figure 11. If the probability weights {0.465, 0.138, 0.397} are used in the proposed constellation design, then the resulting mutual information is 0.5956 nats/symbol, which is about 3 times larger than the mutual information achieved by the 16-point QAM.

5. Conclusions. Many problems in information theory may be cast as a convex program over a set of probability distributions. Here we have seen three: hypothesis testing, channel capacity, and computation of the random coding exponent. Another example considered in [24] is computation of the distortion function in source coding. Although the optimization problem in each case is infinite dimensional when the state space is not finite, in each example we have considered it is possible to construct a finite dimensional algorithm, and convergence is typically very fast. We believe this is in part due to the extremal nature of optimizers. Since optimizers
have few points of support, this means the optimizer is on the boundary of the constraint set, and hence sensitivity is typically non-zero. There are many unanswered questions: (i) The theory described here sets the stage for further research on channel sensitivity. For example, how sensitive is the error exponent to SNR, coherence, channel memory, or other parameters? (ii) It is possible to extend most of these results to multiple access channels. However, we have not yet extended the cutting plane algorithm to MIMO channels, and we do not know if the resulting algorithms will be computationally feasible. (iii) Can we apply the results and algorithms we have here to adaptively construct efficient constellations for fading channels?
REFERENCES
[1] I.C. ABOU-FAYCAL, M.D. TROTT, AND S. SHAMAI. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inform. Theory, 47(4):1290-1301, May 2001.
[2] I.C. ABOU-FAYCAL, M.D. TROTT, AND S. SHAMAI. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inform. Theory, 47(4):1290-1301, 2001.
[3] V. ANANTHARAM. A large deviations approach to error exponents in source coding and hypothesis testing. IEEE Trans. Inform. Theory, 36(4):938-943, 1990.
[4] R.R. BAHADUR. Some Limit Theorems in Statistics. SIAM, Philadelphia, PA, 1971.
[5] D.P. BERTSEKAS. Nonlinear Programming. Athena Scientific, Belmont, MA, 1999.
[6] R.E. BLAHUT. Hypothesis testing and information theory. IEEE Trans. Inform. Theory, IT-20:405-417, 1974.
[7] R.E. BLAHUT. Principles and Practice of Information Theory. McGraw-Hill, New York, 1995.
[8] J.M. BORWEIN AND A.S. LEWIS. A survey of convergence results for maximum entropy. In A. Mohammad-Djafari and G. Demoment, editors, Maximum Entropy and Bayesian Methods, pp. 39-48. Kluwer Academic, Dordrecht, 1993.
[9] S. BOYD AND L. VANDENBERGHE. Convex Optimization. Cambridge University Press, Cambridge, 2004.
[10] T.H. CHAN, S. HRANILOVIC, AND F.R. KSCHISCHANG. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. To appear in IEEE Trans. Inform. Theory, 2004.
[11] RONG-RONG CHEN, B. HAJEK, R. KOETTER, AND U. MADHOW. On fixed input distributions for noncoherent communication over high SNR Rayleigh fading channels. IEEE Trans. Inform. Theory, 50(12):3390-3396, 2004.
[12] T. COVER AND J. THOMAS. Elements of Information Theory. Wiley, New York, 1991.
[13] I. CSISZAR. Sanov property, generalized I-projection and a conditional limit theorem. Ann. Probab., 12(3):768-793, 1984.
[14] I. CSISZAR. The method of types. IEEE Trans. Inform. Theory, 44(6):2505-2523, 1998. Information theory: 1948-1998.
[15] A. DEMBO AND O. ZEITOUNI. Large Deviations Techniques and Applications. Springer-Verlag, New York, second edition, 1998.
[16] PAUL DUPUIS AND RICHARD S. ELLIS. A Weak Convergence Approach to the Theory of Large Deviations. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, 1997. A Wiley-Interscience Publication.
[17] S.-C. FANG, J.R. RAJASEKERA, AND H.-S.J. TSAO. Entropy Optimization and Mathematical Programming. International Series in Operations Research & Management Science, 8. Kluwer Academic Publishers, Boston, MA, 1997.
[18] R.G. GALLAGER. Information Theory and Reliable Communication. Wiley, New York, 1968.
[19] R.G. GALLAGER. Power limited channels: Coding, multiaccess, and spread spectrum. In R.E. Blahut and R. Koetter, editors, Codes, Graphs, and Systems, pp. 229-257. Kluwer Academic Publishers, Boston, 2002.
[20] J.D. GIBSON, R.L. BAKER, T. BERGER, T. LOOKABAUGH, AND D. LINDBERGH. Digital Compression for Multimedia. Morgan Kaufmann Publishers, San Francisco, CA, 1998.
[21] M.C. GURSOY, H.V. POOR, AND S. VERDU. The noncoherent Rician fading channel - part I: Structure of capacity achieving input. IEEE Trans. Wireless Communication (to appear), 2005.
[22] M.C. GURSOY, H.V. POOR, AND S. VERDU. The noncoherent Rician fading channel - part II: Spectral efficiency in the low power regime. IEEE Trans. Wireless Communication (to appear), 2005.
[23] W. HOEFFDING. Asymptotically optimal tests for multinomial distributions. Ann. Math. Statist., 36:369-408, 1965.
[24] J. HUANG. Characterization and computation of optimal distribution for channel coding. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 2004.
[25] J. HUANG AND S.P. MEYN. Characterization and computation of optimal distribution for channel coding. IEEE Trans. Inform. Theory, 51(7):1-16, 2005.
[26] J. HUANG, S.P. MEYN, AND M. MEDARD. Error exponents for channel coding and signal constellation design. Submitted for publication, October 2005.
[27] M. KATZ AND S. SHAMAI. On the capacity-achieving distribution of the discrete-time non-coherent additive white Gaussian noise channel. In Proc. IEEE Int'l. Symp. Inform. Theory, Lausanne, Switzerland, June 30-July 5, p. 165, 2002.
[28] M. KATZ AND S. SHAMAI. On the capacity-achieving distribution of the discrete-time non-coherent additive white Gaussian noise channel. In 2002 IEEE International Symposium on Information Theory, p. 165, 2002.
[29] S. KULLBACK. Information Theory and Statistics. Dover Publications Inc., Mineola, NY, 1997. Reprint of the second (1968) edition.
[30] A. LAPIDOTH AND S.M. MOSER. Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels. IEEE Trans. Inform. Theory, 49(10), Oct. 2003.
[31] DAVID J.C. MACKAY. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. Available from http://www.inference.phy.cam.ac.uk/mackay/itila/.
[32] R. PALANKI. On the capacity-achieving distributions of some fading channels. Presented at 40th Allerton Conference on Communication, Control, and Computing, 2002.
[33] C. PANDIT. Robust Statistical Modeling Based on Moment Classes with Applications to Admission Control, Large Deviations and Hypothesis Testing. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, IL, USA, 2004.
[34] C. PANDIT AND S.P. MEYN. Worst-case large-deviations with application to queueing and information theory. To appear, Stoch. Proc. Applns., 2005.
[35] C. PANDIT, S.P. MEYN, AND V.V. VEERAVALLI. Asymptotic robust Neyman-Pearson testing based on moment classes. In Proceedings of the International Symposium on Information Theory (ISIT), June 2004.
[36] J. RISSANEN. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989.
[37] S. SHAMAI AND I. BAR-DAVID. The capacity of average and peak-power-limited quadrature Gaussian channels. IEEE Trans. Inform. Theory, 41(4):1060-1071, 1995.
[38] J.G. SMITH. The information capacity of amplitude and variance-constrained scalar Gaussian channels. Inform. Contr., 18:203-219, 1971.
[39] S. VERDU. On channel capacity per unit cost. IEEE Trans. Inform. Theory, 36(5):1019-1030, 1990.
[40] S. VERDU. Spectral efficiency in the wideband regime. IEEE Trans. Inform. Theory, 48(6):1319-1343, June 2002.
[41] OFER ZEITOUNI AND MICHAEL GUTMAN. On universal hypotheses testing via large deviations. IEEE Trans. Inform. Theory, 37(2):285-290, 1991.
OPTIMIZATION OF WIRELESS MULTIPLE ANTENNA COMMUNICATION SYSTEM THROUGHPUT VIA QUANTIZED RATE CONTROL
M.A. KHOJASTEPOUR*, X. WANG†, AND M. MADIHIAN*
Abstract. The facility of information exchange using wireless communication systems has affected many aspects of the modern lifestyle. Along with the growth of wireless applications, we have witnessed an ever-growing demand for high data rate wireless communication systems. However, the hostility of the wireless fading environment and channel variation make the design of high rate communication systems very challenging. To this end, multiple antenna systems have been shown to be very effective in fading environments, providing significant performance improvements and higher achievable data rates in comparison to single antenna systems. The performance gain achieved by a multiple antenna system increases when the knowledge of the channel state information (CSI) at each end, either the receiver or the transmitter, is increased. Although perfect CSI is desirable, practical systems are usually built on estimating the CSI at the receiver and possibly feeding back the CSI to the transmitter through a feedback link with limited capacity. While most research efforts have focused on outage probability minimization through an adaptive transmission scheme, the overall evaluation of the system throughput is not well addressed. However, the throughput is the actual performance measure for most practical applications, such as data transfer or video streaming. In this work, we consider the problem of throughput maximization through quantized feedback, which is an appropriate model for practical systems where the feedback link has limited capacity. We derive the optimal quantized rate control design for a general multiple transmit and multiple receive antenna system, and provide the mathematical framework to find such an optimal solution. Moreover, an adaptive gradient search algorithm is proposed that can efficiently find the optimal solution. It is shown that the proposed quantized rate control design considerably improves the throughput of a system for a given average power. Equivalently, for a targeted throughput, a large saving in power can be obtained through quantized rate control. More importantly, only a few bits of feedback per block of transmission are needed to achieve most of the gain available from knowledge of CSI at the transmitter. The practicality of such low rate feedback strongly motivates the use of the proposed rate control strategy in order to maximize the system throughput.
Keywords: Multiple antenna systems, rate control, power control, quantized feedback, channel state information, adaptive algorithm, stochastic optimization.

*NEC Laboratories America, 4 Independence Way, Suite 200, Princeton, NJ.
†Department of Electrical Engineering, Columbia University, New York, NY.

1. Introduction. Increasing demand for high-speed and multimedia applications drives the wireless market to grow at an explosive rate in order to deliver wireless data communications such as Internet access, as well as messaging, video-conferencing and other high-speed data transmission applications. The time-varying nature of the channel quality in a wireless environment, known as fading, causes random fluctuations in the received power level that considerably decrease the probability of reliable decoding of the received packets. As a result, the received packets have to
be dropped if the attempted transmission rate exceeds the instantaneous channel capacity, an event referred to as outage. Therefore, overcoming the effect of fading is the main challenge in increasing the throughput and achieving high-speed transmission in wireless communication systems. Two promising technologies to combat fading are (i) using feedback in order to adaptively change the transmission strategy for different channel states, and (ii) using multiple antenna systems in order to provide multiple copies of the transmitted signals through space diversity.

On the one hand, different levels of channel state information at the transmitter (CSIT) or the receiver (CSIR) directly affect the design of the communication system and result in different achievable performance. Feeding back the channel state information to the transmitter can significantly improve the outage performance of wireless communication systems. However, some performance metrics are less sensitive than others to the level of channel state information. For example, if the channel state information is perfectly known at the receiver, the ergodic capacity of a fading channel can only be marginally improved by providing perfect knowledge at the transmitter [1]. However, the outage capacity is significantly affected by the knowledge of the channel state at the transmitter [2-4].

On the other hand, multiple antenna systems have proved to be very effective in combating fading and provide significant performance improvements and higher achievable data rates in comparison to single antenna systems [2-8]. The gains from having knowledge of the channel state information are even more considerable for multiple antenna systems. However, attaining perfect knowledge of the channel state information is usually not practical when this knowledge is obtained through a feedback link with limited capacity. Nevertheless, the large gain from having side information at the transmitter motivates obtaining and using even partial information. The performance improvement obtained by increasing the number of antennas in a MIMO system with perfect channel state information at the transmitter and receiver can also be understood from the increasing number of channel parameters. The complexity of estimating all the parameters for a practical system then grows quickly. Moreover, the channel state information at the transmitter is normally obtained by feedback from the receiver to the transmitter. Therefore, the larger the number of channel parameters, the higher the feedback rate needed to provide the channel state information at the transmitter.

In this paper, we consider general multiple antenna communication systems over block fading channels with the goal of maximizing the overall system throughput, defined as the time average of the total information in packets that have been successfully decoded over the total transmission time. Since outage is the primary cause of physical layer packet drops and of decreased wireless system throughput in block fading channels,
there has been a considerable amount of effort focused on outage minimization through adaptive transmission schemes [2-5,8-10]. Although minimizing the outage probability results in smaller frame error rates and an increase in throughput, we show that it is not sufficient for throughput maximization. In order to optimize the throughput, the transmission strategy should adapt the transmission rate to the channel variations. When the channel has better quality, a higher rate should be used, and when the channel suffers from a deep fade and has poor quality, the transmission rate should be lowered enough to allow the packet to go through. Therefore, especially if the feedback capacity is limited, it is more important to control the transmission rate than to minimize the outage probability through beamforming [11], precoding [12], power control [9,10] or any other adaptive transmission strategy.

While we consider the block fading channel model in this work, it should be pointed out that the same intuition about adapting the transmission rate to the channel variations holds for the ergodic fading model [1]. The work in [1] has shown that, in the presence of channel state information at the transmitter as well as the receiver, the capacity of the channel can be achieved by controlling the rate and the power. However, it can be shown that for ergodic channels rate control is not necessary to achieve the capacity [13], while our results confirm that rate control is essential to maximize the throughput in block fading channels. The subtle difference is that in the ergodic environment each codeword experiences many different channel realizations; therefore it is possible to achieve any rate close to the ergodic capacity with all codewords transmitted at the same rate, using an appropriate power adaptation strategy [13].
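The distinction between minimizing outage and maximizing throughput can be illustrated with a toy calculation. The sketch below is not the MIMO design developed in this paper. It assumes a single-antenna Rayleigh block fading channel with unit-mean exponentially distributed power gain $g$, constant transmit power $P$, supportable rate $\log(1 + Pg)$ nats, and a feedback index that tells the transmitter which interval between consecutive thresholds contains $g$. All function names and the grid search are illustrative.

```python
import math

def fixed_rate_throughput(rate, power):
    """Attempted rate (nats) times the probability that the block is not in outage."""
    g_min = (math.exp(rate) - 1.0) / power   # smallest gain supporting `rate`
    return rate * math.exp(-g_min)           # P(g >= g_min) for unit-mean exponential g

def quantized_throughput(thresholds, power):
    """Throughput when the rate is matched to the lower edge of the reported interval."""
    edges = sorted(thresholds) + [math.inf]
    total = 0.0
    for low, high in zip(edges[:-1], edges[1:]):
        prob = math.exp(-low) - math.exp(-high)        # P(low <= g < high)
        total += math.log(1.0 + power * low) * prob    # gains below edges[0] are in outage
    return total

power = 10.0
grid = [0.02 * k for k in range(1, 151)]               # candidate thresholds in (0, 3]
best_one = max(quantized_throughput([g], power) for g in grid)
best_two = max(quantized_throughput([g1, g2], power)
               for i, g1 in enumerate(grid) for g2 in grid[i + 1:])
low_outage = fixed_rate_throughput(0.05, power)        # tiny rate: rare outage, little data
print(low_outage, best_one, best_two)
```

In this toy model a very small fixed rate makes outage rare but delivers almost no data, while adding a second rate level (one extra feedback bit) strictly improves on the best single rate.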
We find the optimal quantized rate control strategy through finite rate feedback and we show that significant gains can be achieved by controlling the transmission rates of multiple antenna systems over wireless block fading channels. More importantly, we show that very low rate feedback is sufficient to achieve a high percentage of the possible gain. This low rate feedback simplifies the design procedure and makes our proposed scheme easy to implement in practice. Moreover, we present a mathematical framework based on the stochastic gradient search algorithm and numerical optimization methods in order to find the optimal solution for multiple input and multiple output (MIMO) antenna systems.

The rest of the paper is organized as follows. First, we review related work and prior art on adaptive modulation and rate control for the purpose of throughput maximization in Section 2, and then we describe the system and channel model in Section 3. In Section 4, we briefly review some outage minimization results under different channel state information assumptions. In Section 5, we discuss low rate feedback design and strategies and also discuss the effect of estimation error at the receiver. We then discuss the distribution of the rate supportable by the channel in Section 6. In Section 7, we present the optimal rate control strategy through quantized
feedback for a system with perfect channel state information at the receiver, which employs optimal codes in each block and where the feedback link is error free. In Section 8, the design of the quantized rate control based on the stochastic gradient search algorithm is presented. Finally, we conclude in Section 9.
2. Related work. Rate control for communication in fading environments has been studied earlier in the context of adaptive modulation. The role of feedback in improving the performance of communication systems in fading environments has long been established [14], where the feedback information is used to adapt the characteristics of the transmitted signals (such as power and rate) to the variation of the fading channel. The early works on such adaptive modulation were motivated by the capacity analysis of single input single output (SISO) communication systems in an ergodic fading environment with knowledge of the channel state information at the transmitter [1], and its extension under different adaptive transmission and diversity techniques for a Rayleigh fading environment [15]. It has been shown that, using uncoded transmission in SISO systems over a Rayleigh fading environment, a typical gain of 20 dB can be achieved by adapting the modulation and power control relative to a non-adaptive transmission strategy [16]. However, if adaptive modulation is not used and only power is controlled, about 5-10 dB of this gain will be lost [16]. The variable-rate and variable-power scheme of [16] for transmission of uncoded MQAM signals over a fading channel exhibits a large gap of 11 dB to the Shannon capacity of the fading channel with knowledge of the channel state at the transmitter and the receiver. Subsequent work applies coset codes to the adaptive modulation scheme, which almost cuts the gap in half [17]. Although the 11 dB gap to the Shannon limit was partially bridged with the addition of trellis and lattice codes to adaptive modulation [17], a gap of about 6 dB still remains. The above works consider SISO communication systems in a fading environment and exploit adaptive modulation and channel coding schemes to come close to the ergodic capacity (with CSIT) of the channel.

However, it is known that the gain in capacity from knowledge of the channel state at the transmitter in a fading environment is negligible [1], and in practice the capacities of the SISO fading channel with and without CSIT are very close. With the recent improvements in coding theory and the advent of very good codes such as turbo codes [18] and low density parity check (LDPC) codes [19] that can come very close to the Shannon limit of the channel, it is possible to approach the ergodic capacity (without CSIT) of the fading channel, which is theoretically close to the capacity with CSIT for SISO systems. While coding provides most of the performance gain in achieving the ergodic capacity of a fading channel, communication in a block fading channel is limited by its outage performance, which is not much improved
even if the strongest code is exploited. In order to minimize outage and maximize the throughput, signal adaptation such as rate control or adaptive modulation becomes an integral part of a system in order to achieve close to optimal performance. Recent work on adaptive turbo coded modulation over flat fading channels [20] has shown considerable improvement (about 3 dB) over adaptive trellis coded modulation [17] and as a result reduces the gap to the ergodic capacity of the fading channel to less than 3 dB. The problem formulation in this work [20] is applicable to block fading channels where each codeword only spans the duration of the coherence interval of the channel. However, this work is still based on the ergodic capacity formulation, where the goal is to maximize the throughput that is defined as (2.1) subject to an average power constraint (2.2) and a bit error rate (BER) constraint (2.3), where $\gamma$ represents the received SNR, $p(\gamma)$ denotes the distribution of $\gamma$, $b(\gamma)$ is the instantaneous transmission rate, and $S(\gamma)$ is the average power used for the channel state $\gamma$. This definition of throughput is based only on the attempted transmission rate and does not account for decoding errors. It is therefore different from the definition of throughput in this paper, which is defined (see Equation 7.4) as the temporal average of the sum of all the reliably decoded information received at the receiver. Moreover, the constraint (2.3) seems to be an artificial limit rather than a necessary constraint, imposed in order to make the problem tractable. On the one hand, the total information delivered to the receiver is not maximized by using equal BER for both the channel states with high transmission rate and those with low transmission rate. On the other hand, bit error rate (BER) is not really an appropriate measure for most modern packetized communication applications, in which a packet is dropped even if one bit is in error. For such a system the packet error rate, also known as the frame error rate (FER), is a more appropriate measure of performance, where the outage probability as an information theoretic measure serves as a good indicator of FER for practical systems [21]. To comply with this fact, our definition of throughput in (7.4) is based on the temporal average of the information rate of the decodable packets at the receiver.
Some extensions and interesting works on the problem of throughput maximization via adaptive modulation for SISO systems, under a correlated fading assumption and with multiple classes of data, can be found in [22,23]. In this work, we are interested in the problem of throughput maximization for multiple input and multiple output (MIMO) systems, in which the gain from using adaptive transmission is much larger and the penalty for not using an adaptive transmission scheme is potentially huge. The work of [24] proposes a simple adaptive modulation scheme for MIMO systems which uses the SINR (signal to interference and noise ratio) information at the receiver in order to maximize the data rate for each transmit antenna, and then through an iterative adaptive modulation algorithm selects an optimal (sub)set of transmit antennas and maximizes the data rate on the selected transmit antennas. The later work on throughput maximization with multiple codes [25] shows that splitting the channel orthogonally in time, frequency, or among the inputs of a MIMO system and then transmitting different codewords on each orthogonal sub-channel is considerably suboptimal and significantly reduces the achievable average throughput of the system. However, in this work [25], no feedback link is used to choose multiple codes in order to maximize the throughput, and essentially the system works with perfect CSIR and no CSIT. The problem of adaptive modulation for multiple antenna systems with perfect knowledge of CSIR and CSIT has also been addressed in [26]. This work [26] addresses the problem of bit allocation for the different transmission modes of a MIMO channel by using the channel singular value decomposition, and assigns an appropriate number of bits to each eigenmode of the channel. However, the assumption of perfect CSIT is not normally realistic for a practical communication system.
Another point where our work differs from prior work on adaptive modulation for MIMO systems is that, instead of no knowledge of CSIT or perfect knowledge of CSIT, we consider a quantized CSIT that is made possible through a feedback link with very limited capacity. Also, we consider coding over all antennas and do not consider rate splitting over different antennas, as the latter is shown to be suboptimal [25].

3. System model. We consider the following complex baseband model for a multiple antenna system with $M$ transmit antennas and $N$ receive antennas, depicted in Figure 1,

$$y = Hx + w, \qquad (3.1)$$
where $x_{M \times 1}$ is the vector of transmitted symbols, $H_{N \times M}$ is the channel matrix, $w_{N \times 1}$ is a circularly symmetric complex additive white Gaussian noise vector with zero mean and variance one, and $y_{N \times 1}$ is the received signal. We consider a block fading channel model in which the channel remains constant during transmission of each packet (or codeword of length $T$) and changes independently from one block to another, where the distribution of the channel state is known a priori. The average power constraint on the transmissions can be expressed as $E[x^H x] \le P$. Equivalently, since $\mathrm{tr}(xx^H) = \mathrm{tr}(x^H x)$ and the expectation and trace commute, we have $\mathrm{tr}(E[xx^H]) \le P$. Alternatively, we can consider a codeword $X = (x_1 x_2 \dots x_T)$ and the received vectors $Y = (y_1 y_2 \dots y_T)$ of block length $T$ over which the channel is constant, and write the channel model as

$$Y = HX + W, \qquad (3.2)$$

with $W = (w_1 w_2 \dots w_T)$, and the power constraint is expressed as $E\,\mathrm{tr}[X^H X] \le PT$, where $PT$ is the total average power constraint per transmission block of length $T$.

FIG. 1. Representation of the system model, including a general M x N MIMO system with a channel estimator which provides the knowledge of the channel state information at the receiver and a feedback link (perfect or finite rate of feedback) which provides the channel state information at the transmitter.

The discussion of this paper is applicable to the independent and identically distributed (i.i.d.) block fading channel model of [4,8] as well as the correlated fading model of [27], rank deficient channels such as the keyhole channel, and Rician fading in the presence of a line of sight. Therefore, the channel matrix $H$ is adopted to represent different cases. However, unless otherwise stated, we consider the i.i.d. Rayleigh channel model, which
means the elements of the channel matrix H are independent and identically distributed circularly symmetric complex Gaussian random variables with mean zero and variance one. For a system which employs a space-time code (e.g., other than transmission of independent information from different antennas) the instantaneous mutual information of the channel depends on the code structure. More precisely, for any space-time block codes [5,28] or linear dispersion code [29] we can express the instantaneous mutual information of the channel in terms of an effective channel Matrix 'H. This mutual information is an indicator of the supportable rate by the channel as a function of the channel state because this rate can be approached with a long enough outer code. In block transmission model of (3.2), if there are Q independent transmitted symbols, Ql, q2, ... .qo in each block of length T where the transmitted signals are linear combination of these signals defined as Q
X ==
L (SqCq + s~Dq)
(3.3)
q=l
for some fixed $M \times T$ complex matrices, the transmission code is called a linear dispersion code (LD code for short). The matrices $C_q$ and $D_q$ for $q = 1, 2, \ldots, Q$ are called dispersion matrices; they completely determine the code structure, and the codewords are obtained through choices of the independent symbols $s_1, s_2, \ldots, s_Q$. For a practical code, the values of the symbols $s_1, s_2, \ldots, s_Q$ are typically chosen from an r-PSK or r-QAM constellation, and the design of the dispersion matrices depends on the choice of constellation. However, for a fair comparison with BLAST codes, we consider independent complex Gaussian input symbols in order to formulate the outage probability of linear dispersion codes, which can act as a lower bound to the performance of LD codes with the same codeword size. In order to find the effective channel model, we first decompose $s_q$ into its real and imaginary parts and rewrite (3.3) in the following form
$$X = \sum_{q=1}^{Q} \left( \mathrm{Re}(s_q)\, A_q^{T} + j\, \mathrm{Im}(s_q)\, B_q^{T} \right) \qquad (3.4)$$
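where $A_q = (C_q + D_q)^T$ and $B_q = (C_q - D_q)^T$. For any vector $z \in \mathbb{C}^{N}$ and a matrix $A \in \mathbb{C}^{N \times M}$, let us define [4,29]
$$\hat{z} = \begin{bmatrix} \mathrm{Re}(z) \\ \mathrm{Im}(z) \end{bmatrix} \qquad (3.5)$$
and
$$\hat{A} = \begin{bmatrix} \mathrm{Re}(A) & -\mathrm{Im}(A) \\ \mathrm{Im}(A) & \mathrm{Re}(A) \end{bmatrix}. \qquad (3.6)$$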
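The real-valued representation in (3.5) and (3.6) is useful because it turns a complex matrix-vector product into a real one. The following is a minimal sketch of that property, not part of the original derivation; the helper names `real_vec`, `real_mat`, and `matvec` and the numerical values are illustrative assumptions.

```python
# Real-valued representation of complex vectors and matrices, as in (3.5)-(3.6).
# Helper names (real_vec, real_mat, matvec) are illustrative, not from the text.

def real_vec(z):
    """Stack real parts on top of imaginary parts: [Re(z); Im(z)]."""
    return [c.real for c in z] + [c.imag for c in z]

def real_mat(a):
    """Build [[Re(A), -Im(A)], [Im(A), Re(A)]] from a complex matrix A."""
    top = [[c.real for c in row] + [-c.imag for c in row] for row in a]
    bottom = [[c.imag for c in row] + [c.real for c in row] for row in a]
    return top + bottom

def matvec(a, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

A = [[1 + 2j, -1j], [0.5 - 1j, 3 + 0j]]   # arbitrary example matrix
z = [2 - 1j, -1 + 4j]                     # arbitrary example vector

lhs = real_vec(matvec(A, z))             # real form of the complex product A z
rhs = matvec(real_mat(A), real_vec(z))   # product of the two real forms
max_gap = max(abs(p - q) for p, q in zip(lhs, rhs))
print(max_gap)
```

The two sides agree up to floating-point rounding, which is the property that allows a complex linear model to be rewritten as an equivalent real one.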
WIRELESS MULTIPLE ANTENNA COMMUNICATION SYSTEM
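The equivalent real channel matrix $\mathcal{H}$ can be written as
$$\mathcal{H} = \begin{bmatrix} A_1 h_1 & B_1 h_1 & \cdots \\ \vdots & \vdots & \\ A_1 h_N & B_1 h_N & \cdots \end{bmatrix} \qquad (3.7)$$
and the equivalent channel model is given by
$$y = \mathcal{H} s + w \qquad (3.8)$$
where $s = [s_1\ s_2\ \cdots\ s_Q]^T$.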
4. Outage probability under different channel state information assumptions. The outage probability can be considered a lower bound to the probability of error in a communication system. This bound can be approached by using an effective coding scheme, a proper input alphabet, and long enough codes. In this paper, we consider the throughput as our performance measure, where the goal is to maximize the total throughput of the multiple antenna system. Throughput, however, is directly a function of the outage probability, because some packets cannot be reliably decoded at the receiver as a result of channel outage. Because outage analysis is central to our throughput optimization, we review the effect of different channel state information assumptions [2-4] on the outage probability of the system. Because we are interested in throughput maximization with a finite number of feedback bits, we discuss only three scenarios for the outage minimization problem: (i) perfect channel state information is available only at the receiver (denoted by CSIR) [30,31]; (ii) perfect channel state information is available at both the transmitter and the receiver (denoted by CSIR & CSIT) [32]; and (iii) perfect channel state information is available at the receiver, and $\log_2(L)$ bits of feedback from the receiver to the transmitter are used to perform optimal quantized power control [9].
4.1. Perfect CSIR only. We first consider the case where the channel is known exactly at the receiver only and is not known at the transmitter. For a given channel state $H$ and a channel input $x$ such that $\mathbb{E}[xx^H] = Q$ with average power constraint $\mathrm{tr}(Q) \le P$, the outage probability is defined as the probability that the instantaneous mutual information of the channel, given by
$$I(x; y \mid H) = \log\det\left(I + H Q H^H\right) \qquad (4.1)$$
falls below the attempted rate of transmission R, i.e.,
$$P_{out}(R, P) = \inf_{Q:\ Q \succeq 0,\ \mathrm{tr}(Q) \le P} \mathrm{Prob}\left(\log\det\left(I + H Q H^H\right) < R\right), \qquad (4.2)$$
where the channel realization $H$ is randomly distributed. Unlike the problem of ergodic capacity analysis of multiple antenna systems in a Rayleigh fading environment, the outage minimization problem in a block fading environment is not easy. The main difficulty is that the objective function in the outage probability minimization problem for multiple antenna systems is not a convex function of the input power distribution. In [4] it has been conjectured that the minimizer is a vector of independent Gaussian random variables that has equal power $P/k$ over $k$ fixed positions and zero power in the remaining $M - k$ positions. This conjecture has been partly proved for multiple input and single output systems in [33-40]. However, the complete answer to this conjecture is not yet known. Telatar's conjecture in [4] suggests that, because of the symmetry of the channel, the optimal $Q$ is of the form
$$Q = \frac{P}{k}\, \mathrm{diag}(q_1, q_2, \ldots, q_M), \qquad q_i = 1 \text{ for } 1 \le i \le k, \quad q_i = 0 \text{ otherwise}. \qquad (4.3)$$
This statement is somewhat misleading, and it is possible to have a symmetric channel distribution for which the above form of $Q$ is not optimal [9]. However, for the case of symmetric and independent Rayleigh fading it seems that the conjecture is true. A more appealing property of the outage minimizing distribution in the Rayleigh fading environment is that, for the practical range of outage probability (0.001, 0.1), the more transmit antennas are used, the better. We refer the reader to [9] for a more detailed discussion of this conjecture. Therefore, we restrict our attention to an i.i.d. zero-mean complex Gaussian input to maximize the mutual information between the input $x$ and output $y$ [4] while all the transmit antennas are used. Given a choice of input such that $\mathbb{E}[xx^H] = \frac{P}{M} I$, the mutual information (in nats/s/Hz) is given by
$$I(x; y \mid H) = \log\det\left(I + \frac{P}{M} H H^H\right) \qquad (4.4)$$
and the minimum outage probability is then obtained by
$$P_{out}(R, P) = \mathrm{Prob}\left(\log\det\left(I + \frac{P}{M} H H^H\right) < R\right). \qquad (4.5)$$
For the case of multiple transmit antennas and one receive antenna, the above formulation can be written in closed form as
$$P_{out}(R, P) = \mathrm{Prob}\left(\log\left(1 + \frac{P}{M}\gamma\right) < R\right) \qquad (4.6)$$
$$= \mathrm{Prob}\left(\gamma < \frac{M(e^R - 1)}{P}\right) \qquad (4.7)$$
$$= \frac{\Gamma\left(M, \frac{M(e^R - 1)}{P}\right)}{\Gamma(M, \infty)} \qquad (4.8)$$
since $H \sim \mathcal{CN}(0, I)$, which implies that the instantaneous channel signal-to-noise ratio, $\gamma = HH^H$, is chi-squared distributed with $2M$ degrees of freedom, and its probability density function is given by
$$f_\gamma(\gamma) = \frac{\gamma^{M-1} e^{-\gamma}}{\Gamma(M, \infty)} \qquad (4.9)$$
where $\Gamma(a, x) = \int_0^x u^{a-1} e^{-u}\, du$ denotes the incomplete gamma function. From a theoretical point of view, Equation (4.4) reflects the outage probability of a multiple antenna system using BLAST [6,7,41], where independent streams are sent through the transmit antennas. Equation (4.4) has a simple form since it depends only on the input and output of the channel at each channel use. However, for the general channel model (3.8), the outage probability is given in terms of the equivalent channel matrix $\mathcal{H}$ as
$$P_{out}(R, P) = \mathrm{Prob}\left(\frac{1}{T}\log\det\left(I + \frac{P}{M} \mathcal{H}\mathcal{H}^H\right) < R\right), \qquad (4.10)$$
where R is the attempted rate of transmission.
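As a sanity check on the closed form for multiple transmit antennas and one receive antenna, the sketch below compares the incomplete gamma expression with a Monte Carlo estimate. It is an illustration under stated assumptions rather than the authors' procedure: $M$ is taken to be an integer (so the incomplete gamma ratio reduces to a finite sum), rates are in nats, and the function names and parameter values are hypothetical.

```python
import math
import random

def miso_outage_closed_form(M, R, P):
    """Prob(log(1 + (P/M) * gamma) < R) for gamma ~ Gamma(M, 1), integer M.

    Uses the finite-sum form of the regularized lower incomplete gamma
    function, valid for integer M: 1 - exp(-x) * sum_{k<M} x^k / k!.
    """
    x = M * (math.exp(R) - 1.0) / P
    partial = sum(x ** k / math.factorial(k) for k in range(M))
    return 1.0 - math.exp(-x) * partial

def miso_outage_monte_carlo(M, R, P, trials=100_000, seed=1):
    """Estimate the same probability by sampling i.i.d. CN(0, 1) channel entries."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)  # each real/imaginary part has variance 1/2
    outages = 0
    for _ in range(trials):
        gamma = sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                    for _ in range(M))
        if math.log(1.0 + P / M * gamma) < R:
            outages += 1
    return outages / trials

M, R, P = 2, 1.0, 10.0   # illustrative: 2 transmit antennas, 1 nat/s/Hz, P = 10 (10 dB)
exact = miso_outage_closed_form(M, R, P)
estimate = miso_outage_monte_carlo(M, R, P)
print(exact, estimate)
```

The two numbers should agree to within the Monte Carlo sampling error, which shrinks as the number of trials grows.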
4.2. Perfect CSIR and CSIT. If channel state information is available to the transmitter (e.g., through a feedback link) as well as the receiver, this knowledge can be used in choosing the channel input. In particular, the transmitter can use the knowledge of the channel realization to adapt the power allocation for each block of transmission (power control in time) and across antennas (power control in space) by choosing the right $Q$ in Equation (4.2). However, this knowledge does not alter the optimality of the random i.i.d. complex Gaussian distribution as the choice of input distribution. Therefore, channel state information at the transmitter is used solely to perform power control across both space and time. It was shown in [2] that the optimal transmission scheme for a general multiple transmit and receive antenna case is in fact decomposable into an optimal power control scheme across time concatenated with an optimal power control across space. The former is usually referred to as "power control" and the latter as "beamforming". For example, for multiple transmit antennas and one receive antenna, the optimal beamforming technique directs the transmission along the direction given by $H^H / \sqrt{HH^H}$. The optimal power control scheme for each block determines the power along this direction, $P(H, R)$, based on the knowledge of the channel $H$ at this block and the transmission rate $R$, while the long term average power constraint $\mathbb{E}[P(H, R)] \le P$ is satisfied. However, for more than one receive antenna, there is in general more than one possible beamforming direction, and furthermore the available
power at each block should be optimally split across these directions in order to minimize the outage probability at this block. For the general channel model (3.8), the minimum outage probability with perfect channel state information at both the transmitter and the receiver is given in terms of the equivalent channel matrix $\mathcal{H}$ as (4.11)
where $\gamma = \mathcal{H}\mathcal{H}^H$ is the equivalent one-dimensional channel quality [9] and $R$ is the attempted rate of transmission. The power $P(\gamma, R)$ in (4.11) is the solution to the short term power control problem in the form (4.12)
The threshold value $\gamma_0$ can then be found as the solution of the long term power control problem such that the long term average power constraint $\mathbb{E}[P(\mathcal{H}, R)] \le P$ is satisfied. We have
$$\int_{\gamma_0}^{\infty} \frac{e^R - 1}{\gamma}\, f_\gamma(\gamma)\, d\gamma \le P \qquad (4.13)$$
where the evaluated $\gamma_0$ represents the cutoff value: power is assigned for the transmission if the equivalent one-dimensional channel quality $\gamma = \mathcal{H}\mathcal{H}^H$ is better than this threshold. Otherwise, no power is assigned for the transmission, which results in an outage event. Therefore, the outage probability can be rewritten as
$$P_{out}(R, P) = \mathrm{Prob}(\gamma < \gamma_0) = \int_{0}^{\gamma_0} f_\gamma(\gamma)\, d\gamma \qquad (4.14)$$
where $f_\gamma(\gamma)$ denotes the probability density function of the equivalent one-dimensional channel quality $\gamma = \mathcal{H}\mathcal{H}^H$ [9].
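To make the role of the cutoff $\gamma_0$ concrete, the following sketch solves the power constraint (4.13) with equality by bisection and then evaluates the outage probability (4.14). It assumes, purely for illustration, that $\gamma$ follows the density $\gamma^{M-1}e^{-\gamma}/(M-1)!$ of the single receive antenna case with integer $M \ge 2$; the function names and numerical values are hypothetical.

```python
import math

def upper_tail(n, x):
    """Prob(G >= x) for G ~ Gamma(n, 1) with integer n >= 1."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

def average_power(gamma0, M, R):
    """Left side of (4.13) for the pdf gamma^(M-1) e^(-gamma) / (M-1)!, M >= 2.

    Integrating (e^R - 1)/gamma against that pdf over [gamma0, inf) gives
    (e^R - 1)/(M - 1) times the upper tail of a Gamma(M-1, 1) variable.
    """
    return (math.exp(R) - 1.0) / (M - 1) * upper_tail(M - 1, gamma0)

def cutoff_threshold(M, R, P, tol=1e-12):
    """Smallest gamma0 whose average power meets the budget P (bisection)."""
    if average_power(0.0, M, R) <= P:
        return 0.0                      # budget covers every channel state
    lo, hi = 0.0, 1.0
    while average_power(hi, M, R) > P:  # grow the bracket until the budget is met
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_power(mid, M, R) > P:
            lo = mid
        else:
            hi = mid
    return hi

M, R, P = 2, 1.0, 1.0                   # illustrative values, rate in nats
gamma0 = cutoff_threshold(M, R, P)
outage = 1.0 - upper_tail(M, gamma0)    # Prob(gamma < gamma0), as in (4.14)
print(gamma0, outage)
```

For $M = 2$ the constraint reduces to $(e^R - 1)e^{-\gamma_0} = P$, so the bisection result can be checked against $\gamma_0 = \ln((e^R - 1)/P)$.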
4.3. Perfect CSIR and partial CSIT via quantized feedback. The optimal power control through quantized feedback design for general multiple transmit antenna systems has been derived in [9]. It has been shown in [9] that the minimum outage probability for quantized power control with $\log_2(L)$ bits of feedback (corresponding to $L$ power levels) is obtained as
$$P_{out}(R, P) = \mathrm{Prob}\left(\frac{1}{T}\log\det\left(I + P(\gamma, R)\, \gamma\right) < R\right) \qquad (4.15)$$
where the power allocation function $P(\gamma, R)$ is defined as follows.
$$P(\gamma, R) = P_i \quad \text{if } \gamma \in [\gamma_i, \gamma_{i+1}) \text{ for all } i \in \{1, 2, \ldots, L-1\} \qquad (4.16)$$
and (4.17) for some $0 < \gamma_1 < \gamma_2 < \cdots < \gamma_L < \infty$, where the $i$th power level is given by
$$P_i = \frac{e^R - 1}{\gamma_i}, \quad \text{for all } i \in \{1, 2, \ldots, L-1\}. \qquad (4.18)$$
Therefore, the minimum outage probability can equivalently be written as
$$P_{out}(R, P) = \mathrm{Prob}(\gamma < \gamma_1) = \int_{0}^{\gamma_1} f_\gamma(\gamma)\, d\gamma. \qquad (4.19)$$
5. Throughput maximization using rate control through quantized feedback design. Motivated by the significant potential performance improvement achieved by feedback, we discuss in this section the means of providing such information to the transmitter. In theory, we usually assume that the channel state information is known to the receiver. However, this knowledge is obtained through channel estimation, which is imperfect in two senses: (i) the estimated value has an error, and (ii) estimation uses some part of the available system resources. For example, in preamble-based channel estimation for $M \times N$ multiple antenna systems, there are $MN$ unknowns that can be estimated with finite variance through transmission of a long enough preamble prior to transmission of the actual message. The values of the $MN$ unknown channel coefficients can be determined through at least $MN$ independent measurements. Choosing a simple preamble of the form $\sqrt{P_{pre}}\, I$ would then be sufficient, and the resulting mutual information of the channel through $T$ uses of the channel (assume $T > M$ is the coherence interval) is then lower bounded by [10,42] (5.1) where
$$P_e = \frac{TP - P_{pre}}{T - M}$$
is the total average power used to transmit the actual data, $P$ is the total available average power, and $P_{pre}$ is the power used in sending the preamble to estimate the channel. Therefore, the knowledge of the channel state at the transmitter has a finite variance (or error) in its estimation and is not perfect. Furthermore, this knowledge comes at the price of spending the power $P_{pre}$ and the time fraction $\frac{M}{T}$ for training as part of the available
[Figure 2 plot: "Comparison of Power Control and Beamforming for 2x1 MIMO System"; horizontal axis: Average Power (dB), from -5 to 20; logarithmic vertical axis; labeled curves include "CSIR and CSIT", "Power Control", and "Estimated CSIR & CSIT".]
FIG. 2. Comparison of the outage performance of a 2x1 MIMO system with or without feedback. This figure also shows the performance with beamforming only and power control only. In general, power control becomes more important at high SNR values in comparison to beamforming.
system resources, which are not used to send the actual data. A typical increase in the outage probability due to estimation error for a frame length of 1000 symbols is depicted in Figure 2. This figure reveals that partial feedback (a few bits per frame) with an imperfect estimate of the channel state information at the receiver results in a considerable performance gain in comparison to a system without feedback, even one with perfect channel state information at the receiver. Assuming perfect or partial knowledge of the channel state information at the receiver, it can be sent to the transmitter via feedback. However, even if perfect channel state information is available at the receiver, perfect knowledge at the transmitter cannot be assumed unless the rate of feedback is infinite, because the channel parameters are continuous random variables. Therefore, similar to the earlier work on power control [9,10], we will discuss the cases where only partial channel state information is available at the transmitter through a finite rate feedback. It has been shown in [9] that considerable gain can be achieved with partial feedback to the transmitter for the purpose of power control in time and
minimizing the outage probability. Using partial feedback, similar gains in outage minimization are also achievable through beamforming [11,43-48]. However, for any number of feedback bits, beamforming has a limited gain in outage performance in comparison to power control. From a practical point of view, a small rate of feedback can be considered to be available from the destination to the source without wasting too much of the system resources [10-12,48,49]. However, no matter how low the feedback rate is, because of the fading there is a probability of outage in receiving the crucial feedback information at the transmitter, on which our design strategy depends. Therefore, it is important to incorporate the possibility of outage (or loss of feedback information) in the design of finite rate feedback strategies. Finally, the performance of a practical finite-length code cannot be captured by the mutual information between the source and destination, even if a finite constellation input is used in the evaluation of the mutual information. In particular, the frame error rate of a practical code behaves very differently from that of the optimal code: not only does the frame error rate of a practical code as a function of SNR fail to go to zero where the optimal codes achieve zero error rate, but the drop in the frame error rate versus SNR is also not necessarily sharp. Moreover, for a multiple antenna system the performance of the code also depends on the specific channel realization matrix $H$, which makes the problem even more involved. To be more specific, the evaluation of the code performance for two different channel realizations $H_1$ and $H_2$ usually shows different behavior even though the instantaneous mutual information of the channel in these two cases is equal, i.e., $I(x; y \mid H_1) = I(x; y \mid H_2)$. For the rest of this paper, we consider two different scenarios.
First, we consider the case where the feedback is used for power control and a constant transmission rate is used for all the blocks; we do, however, optimize the value of the attempted transmission rate. Second, we consider the case where the transmission power is fixed over each block but the transmission rate varies based on the channel state. At the receiver, the value of the transmission rate is chosen from a predetermined set of rates and is then fed back to the transmitter. Furthermore, for both scenarios we assume that
• (i) the codes used in each block of transmission are optimal in the sense that they achieve arbitrarily small probability of error for rates very close to the instantaneous mutual information of the channel,
• (ii) the knowledge of the channel state information at the receiver is perfect and available without wasting any system resources,
• (iii) the knowledge of the channel state information at the transmitter is available through a fixed and finite rate of feedback, e.g., $\log_2(L)$ bits of feedback per block of transmission, where the feedback has no error.
6. Distribution of the supportable rate by the channel. The instantaneous mutual information of the channel for a block of transmission with the given channel state $\gamma$ defines the maximum transmission rate that can be achieved with arbitrarily low probability of error in this block of transmission. For a given average power $P(\gamma)$ per block, the supportable rate is given by
$$C(\gamma) = \log\det\left(1 + \frac{P(\gamma)}{M}\, \gamma\right) \qquad (6.1)$$
where $\gamma = \mathcal{H}\mathcal{H}^H$ is the equivalent channel quality that includes the effect of the given space-time codes. For a given average power $P$, the cumulative distribution function of the supportable rate by the channel, $F_C(R)$, is defined as
$$F_C(R) \triangleq \mathrm{Prob}\left(\log\det\left(1 + \frac{P(\gamma)}{M}\, \gamma\right) < R\right) \qquad (6.2)$$
which is equal to the probability of outage, $P_{out}(R, P)$, for a given rate $R$. It has been observed that the probability density function of the supportable rate by the channel,
$$f_C(R) = \frac{\partial}{\partial R} F_C(R),$$
is asymptotically Gaussian when either the number of transmit antennas or the number of receive antennas goes to infinity [50]. Hochwald et al. have derived analytical expressions for the mean and variance of the distribution in three cases: (i) when the number of transmit antennas grows large and the number of receive antennas remains fixed, (ii) when the number of receive antennas grows large and the number of transmit antennas remains fixed, and (iii) when both the number of transmit antennas and the number of receive antennas grow large but their ratio remains constant. However, even for a small number of transmit and receive antennas and the practical range of the average transmission power $P$, it can be verified that the distribution is in fact very close to a Gaussian distribution. In fact, for most practical purposes, including the quantized rate control design, it is enough to find the mean and variance of the distribution through simulation. Figure 3 shows the probability density function of the supportable rate by a 2 x 2 multiple antenna system. This figure shows the actual distribution and the Gaussian distribution with the same mean
[Figure 3 plot: "Distribution of the supported rate by 2x2 MIMO channel using BLAST code"; horizontal axis: Supported rate Rs; actual and Gaussian-approximation curves for P = 0 dB, 1 dB, 2 dB, and 3 dB.]
FIG. 3. Sample distribution of the supportable rate by the channel as a function of the available average power. The figure shows the probability distribution function for a 2 x 2 multiple transmit and multiple receive antenna system that exploits BLAST codes. Gaussian approximations of the distributions are also plotted, which show the accuracy of the approximation.
and variance, which reveals the accuracy of the Gaussian approximation. It can also be seen that the distribution differs for the various average powers and that it is better approximated by the Gaussian distribution as the average power increases. The works of Smith and Shafi [51] and of Wang and Giannakis [52] contain more detailed discussions of the Gaussian approximation of the mutual information for multiple transmit and multiple receive antenna systems. For the rate control strategy in this paper, we find the actual distribution of the channel through simulation. However, using the Gaussian approximation is beneficial in finding a closed form expression for the gradient in Section 8. Having a closed form expression for the gradient is usually helpful for faster and smoother convergence of gradient based search optimization algorithms. Still, we find the actual mean and variance of the distribution through simulation, without using any approximate formula, to be used in the Gaussian approximation.
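The simulation-based estimate of the mean and variance described above can be sketched as follows. This is a minimal illustration, assuming a 2 x 2 channel with i.i.d. $\mathcal{CN}(0,1)$ entries, independent streams with equal power $P/2$ per antenna, and rates in nats; the function names, the seed, and the sample size are arbitrary choices and not taken from the text.

```python
import math
import random

def supportable_rate_2x2(P, rng):
    """log det(I + (P/2) H H^H) in nats for one 2x2 channel with CN(0, 1) entries."""
    s = math.sqrt(0.5)
    h = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)] for _ in range(2)]
    a = P / 2.0
    # Entries of the Hermitian Gram matrix G = H H^H.
    g11 = abs(h[0][0]) ** 2 + abs(h[0][1]) ** 2
    g22 = abs(h[1][0]) ** 2 + abs(h[1][1]) ** 2
    g12 = h[0][0] * h[1][0].conjugate() + h[0][1] * h[1][1].conjugate()
    det = (1 + a * g11) * (1 + a * g22) - (a ** 2) * abs(g12) ** 2
    return math.log(det)

def gaussian_cdf(x, mean, std):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

rng = random.Random(7)
P = 1.0                                   # 0 dB, an illustrative choice
rates = sorted(supportable_rate_2x2(P, rng) for _ in range(20_000))
n = len(rates)
mean = sum(rates) / n
var = sum((r - mean) ** 2 for r in rates) / (n - 1)
std = math.sqrt(var)

# Largest gap between the empirical CDF and the matched Gaussian CDF.
max_gap = max(abs((i + 1) / n - gaussian_cdf(r, mean, std)) for i, r in enumerate(rates))
print(mean, var, max_gap)
```

The printed gap gives a rough numerical measure of how well the matched Gaussian tracks the simulated distribution; repeating the run for other values of `P` shows how the fit changes with the average power.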
7. Optimal quantized feedback strategy. In this section, we evaluate the throughput performance of an analytical model for a multiple transmit antenna system under certain idealized assumptions. First, we assume that perfect channel state information is available at the receiver at no cost, without wasting any system resources. Second, we assume that for a given channel state information the transmitted symbols are Gaussian and the corresponding coding scheme is capacity achieving, i.e., the code maximizes the instantaneous mutual information of the channel given the channel condition $\mathcal{H}$. Finally, we assume that an error-free feedback link with a given rate is available from the receiver to the transmitter. This feedback link is used to provide the channel state information at the transmitter. Based on the above assumptions, we derive the optimal rate control strategy for an $M \times N$ multiple antenna system with $M$ transmit and $N$ receive antennas in a block fading channel via a finite number of bits of feedback. The objective is to maximize the total throughput of the system by choosing the attempted transmission rate from a finite number of possible rates based on the estimate of the channel at the receiver. We consider the general model of (3.8), where $\mathcal{H}$ represents the equivalent channel model [29] in which the coding matrix is absorbed in the channel matrix. Therefore, for an attempted transmission rate of $R$ and average power $P_{\mathcal{H}}(\mathcal{H})$ per block of transmission, the outage probability is given by
$$P_{out}(R, P) = \mathrm{Prob}\left(\log\det\left(I + \frac{P_{\mathcal{H}}(\mathcal{H})}{M}\, \mathcal{H}\mathcal{H}^H\right) < R\right) \qquad (7.1)$$
where $\mathcal{H}$ is the equivalent channel. The problem of outage minimization is then formulated as
$$\min_{\mathcal{P},\ P(\gamma) \in \mathcal{P}} \mathrm{Prob}\left(\log\det\left(1 + \frac{P(\gamma)}{M}\, \gamma\right) < R\right) \qquad (7.2)$$
subject to
$$\mathbb{E}[P(\gamma)] \le P \qquad (7.3)$$
where $P$ is the long term average power, $\gamma = \mathcal{H}\mathcal{H}^H$ is the effective channel quality (defined in Section 4.1), $\mathcal{P} = \{P_1, P_2, \ldots, P_L\}$ is a fixed power level codebook with $L$ power levels, and $P(\gamma)$ is a quantized power strategy that maps any point from the set of effective channel qualities $\gamma \in \Gamma$ to a power level in $\mathcal{P}$. When perfect knowledge of the channel state information is available at the transmitter, the power control
strategy takes its value from a continuous set, which can also be interpreted as $L \to \infty$. We denote the optimal solution of the above outage probability minimization problem by $P_{out}^{(L)}(R, P)$, where $L$ denotes the number of power levels that can be used by the transmitter. We artificially denote the minimum outage probability without CSI at the transmitter by $P_{out}^{(1)}(R, P)$, because the transmission power is constant when no channel state information is available at the transmitter. On the other hand, when we have perfect knowledge at the transmitter, the power level can take its value from the set of positive real numbers, and we artificially denote the minimum outage probability with perfect CSI at the transmitter [32] by $P_{out}^{(\infty)}(R, P)$. The throughput for a block fading channel is defined as the average rate of information transmission from the transmitter to the receiver with asymptotically zero error probability. Because of the possibility of outage in block fading channels, the throughput is less than the attempted rate of transmission. For a constant attempted transmission rate $R$, long term average power $P$ per packet, and a given power control strategy with $\log_2(L)$ bits of feedback, the throughput $T(R, P)$ is defined as
$$T(R, P) = R\left(1 - P_{out}^{(L)}(R, P)\right). \qquad (7.4)$$
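The trade-off in (7.4) between a higher attempted rate and a higher outage probability can be illustrated with a short numerical sketch. It assumes the closed-form outage of the multiple transmit, single receive antenna case with no CSI at the transmitter, integer $M$, and rates in nats; the grid search and all names and values below are illustrative choices, not the optimization method of the text.

```python
import math

def miso_outage(M, R, P):
    """Closed-form outage for M transmit antennas and one receive antenna (integer M)."""
    x = M * (math.exp(R) - 1.0) / P
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(M))

def throughput(M, R, P):
    """Throughput R * (1 - P_out(R, P)) of (7.4), in nats/s/Hz."""
    return R * (1.0 - miso_outage(M, R, P))

def best_rate(M, P, r_max=10.0, steps=10_000):
    """Coarse grid search for the attempted rate that maximizes throughput."""
    grid = [r_max * (i + 1) / steps for i in range(steps)]
    return max(grid, key=lambda r: throughput(M, r, P))

M, P = 4, 10.0 ** 0.1       # illustrative: 4 x 1 system at 1 dB average power
R_star = best_rate(M, P)
print(R_star, throughput(M, R_star, P))
```

Rates well below `R_star` waste supportable capacity, while rates well above it are penalized by outage, so the throughput has an interior maximum.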
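Therefore, the problem of throughput maximization with quantized power control can be formulated as
$$\max_{R}\ R\left(1 - \min_{\mathcal{P},\ P(\gamma) \in \mathcal{P}} \mathrm{Prob}\left(\log\det\left(I + \frac{P(\gamma)}{M}\, \gamma\right) < R\right)\right) \qquad (7.5)$$
subject to
$$\mathbb{E}[P(\gamma)] \le P. \qquad (7.6)$$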
The feedback can be used to provide some information about the channel state at the transmitter and to improve a given performance metric. On the one hand, the feedback can be used to control the power at the transmitter to minimize the probability of outage, which also translates to minimizing the packet error. In this case, if maximizing the throughput is considered as the performance metric instead of minimizing the outage probability, the system throughput is optimized by choosing the right value for the attempted rate of transmission $R$ in (7.5). On the other hand, the feedback can be used to control the transmission rate per packet to maximize the throughput directly without any power control. The throughput maximization problem with quantized rate control can be formulated as
$$\max_{\mathcal{R},\ R(\gamma) \in \mathcal{R}} R(\gamma)\left(1 - \mathrm{Prob}\left(\log\det\left(I + \frac{P}{M}\, \gamma\right) < R(\gamma)\right)\right). \qquad (7.7)$$
The optimal rate control assumes exact knowledge of $\gamma$ and chooses the rate
$$C(\gamma) = \log\det\left(I + \frac{P(\gamma)}{M}\, \gamma\right)$$
based on the channel state $\gamma = \mathcal{H}\mathcal{H}^H$. However, when the feedback has a finite rate of $\log_2(L)$ bits per block, the most efficient use of the feedback signal at the transmitter for rate control is to use a different transmission rate $R_i$ for each feedback signal $i \in \{1, 2, \ldots, L\}$. Therefore, for $q$ bits of feedback, we need to find $L = 2^q$ transmission rates $R_1, R_2, \ldots, R_L$ and a mapping function
$$R(\gamma): \Gamma \to \mathcal{R} \qquad (7.8)$$
where $\mathcal{R} = \{R_1, R_2, \ldots, R_L\}$, such that the total system throughput is maximized while the average power $P$ is used in each block. Therefore, the set $\Gamma$ is partitioned into $L$ sets $\Gamma_1, \Gamma_2, \ldots, \Gamma_L$ such that if, for a block of transmission, $\gamma \in \Gamma_i$, then the feedback signal $i$ is sent to the transmitter and the associated transmission rate $R_i$ is used in this block. Without loss of generality, assume that the transmission rates satisfy $R_1 > R_2 > \cdots > R_L$, corresponding to the partition $\Gamma_1, \Gamma_2, \ldots, \Gamma_L$. It can be shown that the optimal solution to the quantized throughput maximization problem (7.7) avoids outage for all the channel conditions in the first $L - 1$ partitions and lets outage occur only in the last partition, $\Gamma_L$. Moreover, the partitioning of the optimal solution is such that a channel condition belongs either to (i) the partition $\Gamma_i$ whose rate $R_i$ is the largest that can guarantee no outage for this channel quality, or (ii) the partition $\Gamma_L$. Therefore, we have the following result about the structure of the optimal solution.
THEOREM 7.1. Let $R(\gamma) \in \mathcal{R}$, $\mathcal{R} = \{R_1, R_2, \ldots, R_L\}$, be the optimal solution for the optimization problem (7.7), where $R_1 > R_2 > \cdots > R_L$. Then, for all $\gamma$ except in a set of measure zero, the transmitted packet is not in outage and we have
$$R(\gamma) = R_i \quad \text{if } R_i \le C(\gamma) < R_{i-1}, \quad 2 \le i \le L, \qquad (7.9)$$
and
$$R(\gamma) = R_1 \quad \text{if } R_1 \le C(\gamma); \qquad (7.10)$$
otherwise, the transmitted packet is in outage and we have
$$R(\gamma) = R_L \quad \text{if } C(\gamma) < R_L. \qquad (7.11)$$
Proof: Let $P_{out}(\gamma, R(\gamma), P)$ represent the outage event for a given channel condition $\gamma$, transmitted power $P$, and attempted rate of transmission $R(\gamma)$; i.e., $P_{out}(\gamma, R(\gamma), P) = 1$ if the rate $R(\gamma)$ is greater than the instantaneous mutual information of the channel with fixed channel quality $\gamma$ using average transmit power $P$, and $P_{out}(\gamma, R(\gamma), P) = 0$ otherwise. First, we note that (i) $\forall \gamma \in \bigcup_{i=1}^{L-1} \Gamma_i : P_{out}(\gamma, R(\gamma), P) = 0$. Suppose not; then there exists some $\gamma_0 \in \bigcup_{i=1}^{L-1} \Gamma_i$ such that $P_{out}(\gamma_0, R(\gamma_0), P) = 1$. We can remove $\gamma_0$ from the set $\bigcup_{i=1}^{L-1} \Gamma_i$ and assign it to $\Gamma_L$ without changing the overall throughput. Second, we note that (ii) $\forall \gamma$, if $\exists i$, $1 < i < L - 1$ : $R(\gamma) = R_i$, then $P_{out}(\gamma, R_{i-1}, P) = 1$. Suppose not; then there exist $\gamma_0$ and $i$ such that $R(\gamma_0) = R_i$ and $P_{out}(\gamma_0, R_{i-1}, P) = 0$. We can reassign $\gamma_0$ to the set $\Gamma_{i-1}$ instead of $\Gamma_i$, so that $R(\gamma_0) = R_{i-1}$. With this repartitioning, the transmission is still not in outage and the total throughput is increased (or might be equal in the degenerate case $f_\gamma(\gamma_0) = 0$) because $R_{i-1} > R_i$. Clearly, this repartitioning satisfies Condition (ii). Therefore, without loss of generality we assume that the optimal solution satisfies Conditions (i) and (ii). In fact, if for some choices of channel coding or channel distribution any other solution also minimizes the outage probability for the same average power, it can be equivalently transformed to a solution that satisfies both Conditions (i) and (ii). An example of such a situation occurs when the distribution of the channel either has a discontinuity or has some intervals for which the supportable rate of the channel is zero. Although both of these conditions are mathematically interesting to explore, neither of them happens in practical wireless systems with Rayleigh or Rician distributions.
Now, we can easily prove the theorem. Consider $\gamma$ such that $R_i \le C(\gamma) < R_{i-1}$ for some $i$, $2 \le i \le L$. We want to show that $R(\gamma) = R_i$, or equivalently $\gamma \in \Gamma_i$. Assume that $\gamma \in \Gamma_j$ for some $1 \le j < i$. This is in contradiction with Condition (ii) because $P_{out}(\gamma, R_i, P) = 0$. Also, we cannot have $\gamma \in \Gamma_j$ for some $i < j \le L - 1$, because $P_{out}(\gamma, R_{i-1}, P) = 1$, which is in contradiction with Condition (i). Now, it only remains to show that $\gamma \notin \Gamma_L$ except for a set of measure zero. Let $\Gamma_i'$ be defined as the set of all $\gamma$ such that $R_i \le C(\gamma) < R_{i-1}$, and assume that the set $\Gamma_i'$ has nonzero probability.
Therefore, we can find a new rate $R_i' > R_i$ and partition the set $\Gamma_i'$ into two new sets: $\Gamma_i'' \triangleq \{\gamma : R_i' \le C(\gamma) < R_{i-1}\}$, in which the rate $R_i'$ will be used, and the set $\Gamma_i' - \Gamma_i''$, in which the rate $R_L$ will be used. This repartitioning will increase the throughput, which is in contradiction with the optimality of the original solution with the set of rates $\mathcal{R} = \{R_1, R_2, \ldots, R_L\}$. Therefore, the channel quality $\gamma$ has to be in $\Gamma_i$, which means $R_i \le C(\gamma) < R_{i-1}$. The sufficiency condition can be argued similarly: if $R(\gamma) = R_i$ for some $i$, $1 \le i \le L - 1$, it has to provide zero outage due to Condition (i), and therefore $R_i \le C(\gamma)$. Moreover, the rate $R_{i-1}$ cannot be supported by the channel because of Condition (ii), and therefore $C(\gamma) < R_{i-1}$. The above argument can also be used for $\gamma$ such that $R_1 \le C(\gamma)$, for which we want to show that $R(\gamma) = R_1$, or equivalently $\gamma \in \Gamma_1$. Now, we note that the only region left from the set of possible $\gamma$'s is $C(\gamma) \le R_L$, which constitutes the last partition $\Gamma_L$, for which outage occurs for any rate chosen from the set $\mathcal{R}$. Thus, we can simply assume that $R(\gamma) = R_L$. ∎
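The structure established by the theorem amounts to a simple selection rule at the receiver: feed back the index of the largest rate level that the current channel can support, and fall back to the last index when none can. The sketch below illustrates that rule; the function name `select_rate` and the example rate levels are hypothetical.

```python
def select_rate(capacity, rates):
    """Pick the feedback index and rate for a block.

    `rates` must be sorted in decreasing order (R_1 > R_2 > ... > R_L).
    Returns (index, rate, in_outage) with a 1-based index.
    """
    for i, rate in enumerate(rates, start=1):
        if rate <= capacity:
            return i, rate, False       # largest rate the channel can support
    return len(rates), rates[-1], True  # nothing fits: last cell, outage

rates = [3.0, 2.0, 1.0]                 # hypothetical rate levels in nats/s/Hz
print(select_rate(2.5, rates))
print(select_rate(3.5, rates))
print(select_rate(0.5, rates))
```

Only the third call, where the supportable rate lies below the smallest level, ends in outage.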
8. Optimal design of the quantized rate control based on stochastic gradient search and simulation optimization. In this section, we use the distribution of the supportable rate by the channel and use the result of Theorem 7.1 to design the optimal rate control through quantized feedback. As discussed in Section 6, this distribution depends on the number of antennas both at the transmitter and the receiver, the used space-time codes (e.g. orthogonal designs, BLAST, or linear dispersion codes), and the average power of the code as it is depicted in Figure 3. Because in practice the distribution of supportable rate by the channel is a continuous distribution, we restrict our analysis to the cases where the density function is continuous and nonzero in the interval (0,00). It is not hard to see that this condition is satisfied for the cases of Rician or Rayleigh fading channels using M x N multiple antenna systems. Based on these assumption we have the following simplified result. COROLLARY 8.1. If the probability density function fc(R) of the supportable rate by the channel is continuous and positive in (0, CX)), then the optimal L level quantized rate control function R( "() for (7.7) follows the form of Equations (7.9)-(7.11). The proof is almost immediate using the same argument as in the proof of Theorem 7.1 and considering the fact that the set of measure zero in the proof of Theorem 7.1 will not appear here because of the conditions on fc(R). Figures 3 and 4 shows an exemplary visualization of the distribution of the supportable rate by the channel and the optimal rate allocation strategy with L == 1,2,3 levels of feedback that corresponds no feedback, one bit of feedback, and log2(3) bits of feedback per block, respectively. Using Theorem 8.1, we can see that the outage event occurs only in the interval [0, R L ) and the corresponding outage probability is defined as
The achievable average throughput of the system can also be written in terms of the rate levels $R_1, R_2, \ldots, R_L$ as

$$\mathbb{E}[R(\gamma)] = R_L\bigl(F_C(R_{L-1}) - F_C(R_L)\bigr) + \cdots + R_2\bigl(F_C(R_1) - F_C(R_2)\bigr) + R_1\bigl(1 - F_C(R_1)\bigr). \qquad (8.2)$$

Therefore, the optimization problem can be rewritten over the vector of the threshold values for the rate, $\underline{R} = [R_1, R_2, \ldots, R_L]^T$, in the following form:

$$\max_{\underline{R}} \; R_L\bigl(F_C(R_{L-1}) - F_C(R_L)\bigr) + \cdots + R_2\bigl(F_C(R_1) - F_C(R_2)\bigr) + R_1\bigl(1 - F_C(R_1)\bigr). \qquad (8.3)$$
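As a rough illustration of how (8.2) can be evaluated, the following Python sketch computes the average throughput and the outage probability for a given set of rate levels. It assumes, purely for illustration, that $F_C$ is the Gaussian approximation mentioned in Section 6; the function names and the numerical mean and standard deviation are hypothetical and do not come from the chapter.

```python
import math


def gaussian_cdf(x, mean, std):
    """Gaussian CDF, used here as a stand-in for F_C (an assumption)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def average_throughput(rates, cdf):
    """Evaluate (8.2) for rate levels R_1 > R_2 > ... > R_L.

    rates[0] is R_1 (the highest level); cdf(r) returns F_C(r).
    """
    if any(hi <= lo for hi, lo in zip(rates, rates[1:])):
        raise ValueError("rate levels must be strictly decreasing")
    total = 0.0
    upper = 1.0  # plays the role of F_C(R_0) = 1 for the top bin
    for rate in rates:
        level_cdf = cdf(rate)
        total += rate * (upper - level_cdf)  # rate times probability of its bin
        upper = level_cdf
    return total


def outage_probability(rates, cdf):
    """Outage occurs only when C < R_L, so its probability is F_C(R_L)."""
    return cdf(rates[-1])


# Hypothetical mean and standard deviation of the supportable rate.
MEAN_C, STD_C = 2.0, 0.6
cdf = lambda r: gaussian_cdf(r, MEAN_C, STD_C)

one_level = average_throughput([1.5], cdf)
two_levels = average_throughput([2.2, 1.4], cdf)
outage = outage_probability([2.2, 1.4], cdf)
print(one_level, two_levels, outage)
```

Passing the CDF as a callable keeps the throughput formula independent of the fading model, so an empirical CDF obtained from simulation could be substituted for the Gaussian stand-in.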
WIRELESS MULTIPLE ANTENNA COMMUNICATION SYSTEM
[Figure 4: "Finite rate levels and distribution of supported rate for 2x2 MIMO". Horizontal axis: supported rate $R_s$. Curves: P = 0 dB, actual; P = 0 dB, approximation.]
FIG. 4. Sample distribution of the supportable rate by the channel. The figure shows the probability distribution function for a 2 x 2 multiple transmit and multiple receive antenna system that exploits BLAST codes. Finite-level rate control for one, two, and three levels is depicted in the figure.
Although the above maximization problem is not a convex (or concave) optimization problem, the gradient search method has been observed to converge to a single vector of rates irrespective of the starting point. Therefore, we conjecture that the above optimization problem has a unique maximizer. For the case $L = 1$, we formally prove that although the objective is not concave, it has a unique maximizer. In fact, we can prove that the objective function is log-concave. We have the following theorem.

THEOREM 8.1. For the general multiple transmit and multiple receive antenna system and the K-block channel fading model, there exists a unique maximizer for the throughput optimization problem with perfect channel state information at the receiver and no channel state information at the transmitter.

Proof. We first note that the objective function is not a concave function. Figure 5 shows the throughput versus the attempted transmission rate $R$ for a single transmit and single receive antenna system as well as a 4 x 1 multiple antenna system employing BLAST codes with an average power of 1 dB per block. It is clear from the figure that the objective function is not a convex or concave function of the attempted transmission rate $R$. However, the same figure suggests the existence of a unique maximum. We show that the objective function is log-concave and therefore the optimization problem has a unique maximum.

[Figure 5: "Throughput behavior versus attempted transmission rate for different MIMO systems, P_ave = 1 dB". Horizontal axis: attempted transmission rate.]
FIG. 5. Throughput versus attempted rate of transmission for a MIMO system and a SISO system. It can be seen that the throughput is not a concave function of the attempted rate of transmission; however, it has a unique maximizer (simulation results also support this fact).

For a K-block fading channel the optimization problem can be written as

$$\max_{R} J(R) = \max_{R} \; R \cdot \mathrm{Prob}\Bigl(\sum_{i=1}^{K} \log\det\bigl(I + \tfrac{P}{M}\gamma_i\bigr) \ge R\Bigr) \qquad (8.4)$$
where $\gamma_i = H_i H_i^H$ is the equivalent channel quality for the $i$th block. By taking the logarithm of the objective function $J(R)$, we have

$$\log J(R) = \log R + \log \mathrm{Prob}\Bigl(\sum_{i=1}^{K} \log\det\bigl(I + \tfrac{P}{M}\gamma_i\bigr) \ge R\Bigr). \qquad (8.5)$$

For all $1 \le i \le K$, define the random variables

$$C_i = \log\det\bigl(I + \tfrac{P}{M}\gamma_i\bigr) \qquad (8.6)$$

and

$$C = \sum_{i=1}^{K} C_i = \sum_{i=1}^{K} \log\det\bigl(I + \tfrac{P}{M}\gamma_i\bigr). \qquad (8.7)$$

Therefore, we can rewrite $\log J(R)$ as

$$\log J(R) = \log R + \log \mathrm{Prob}(C \ge R) = \log R + \log\bigl(1 - F_C(R)\bigr) \qquad (8.8)$$

where $F_C(R)$ is the cumulative distribution function of $C$, so that $1 - F_C(R)$ is the probability that the rate $R$ is supported, in agreement with the $L = 1$ case of (8.2). To show that the objective function is log-concave we need to show that its logarithm is a concave function. Because $\log R$ is a concave function and the sum of two concave functions is a concave function, it is enough to show that $1 - F_C(R)$ is a log-concave function. Although the sum of log-concave functions is not always log-concave, a very strong property of log-concave functions is that log-concavity is preserved under integration in some cases; e.g., the convolution of two log-concave functions is a log-concave function [53]. In particular, a log-concave density has both a log-concave cumulative distribution function and a log-concave complement $1 - F_C(R)$, since each is the convolution of the density with the indicator function of a half-line. Therefore, because the random variable $C$ is the sum of the $K$ random variables $C_i$, $1 \le i \le K$, its distribution is obtained through convolution, and it will be log-concave if the distributions of the $C_i$'s are log-concave. This latter property is usually easier to check. For example, the distribution of the channel quality for an $M \times 1$ or a $1 \times N$ multiple antenna system follows a chi-squared distribution. Therefore, the cumulative distribution function of $C_i$ can be written in terms of the cumulative distribution function of a chi-squared random variable, and it can be readily verified through direct differentiation that this cumulative distribution function is log-concave. Here, we do not provide a direct analytical proof of log-concavity for the distribution of the instantaneous channel capacity of a general $M \times N$ multiple antenna system over a single block fading channel. However, considering the Gaussian approximation of the instantaneous channel capacity of multi-antenna systems, and using the fact that the cumulative distribution function of a Gaussian distribution and its complement are both log-concave, it is not hard to believe that $1 - F_C(R)$ is log-concave. ∎

The stationary points of the optimization problem (8.3) satisfy $\nabla_{\underline{R}} J = 0$, where $J = R_L\bigl(F_C(R_{L-1}) - F_C(R_L)\bigr) + \cdots + R_2\bigl(F_C(R_1) - F_C(R_2)\bigr) + R_1\bigl(1 - F_C(R_1)\bigr)$. Therefore, we have
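$$\nabla_{\underline{R}} J = \begin{bmatrix} \dfrac{\partial \mathbb{E}[R(\gamma)]}{\partial R_1} \\ \dfrac{\partial \mathbb{E}[R(\gamma)]}{\partial R_2} \\ \vdots \\ \dfrac{\partial \mathbb{E}[R(\gamma)]}{\partial R_L} \end{bmatrix} = \begin{bmatrix} -R_1 f_C(R_1) + \bigl(1 - F_C(R_1)\bigr) + R_2 f_C(R_1) \\ -R_2 f_C(R_2) + \bigl(F_C(R_1) - F_C(R_2)\bigr) + R_3 f_C(R_2) \\ \vdots \\ -R_L f_C(R_L) + \bigl(F_C(R_{L-1}) - F_C(R_L)\bigr) \end{bmatrix} = 0. \qquad (8.9)$$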
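The rows of (8.9) can be checked numerically against a finite-difference derivative of the throughput. The sketch below does this under the assumption that $F_C$ and $f_C$ are Gaussian (the approximation of Section 6); the function names, the rate levels, and the numerical parameters are hypothetical choices made only for the check.

```python
import math


def gaussian_cdf(x, mean, std):
    """Gaussian CDF, standing in for F_C (an assumption)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gaussian_pdf(x, mean, std):
    """Gaussian density, standing in for f_C (an assumption)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def throughput(rates, cdf):
    """Objective J of (8.3); rates[0] is R_1, the highest level."""
    total, upper = 0.0, 1.0
    for rate in rates:
        total += rate * (upper - cdf(rate))
        upper = cdf(rate)
    return total


def throughput_gradient(rates, cdf, pdf):
    """Analytic gradient, one entry per row of (8.9)."""
    grad = []
    for i, rate in enumerate(rates):
        upper = 1.0 if i == 0 else cdf(rates[i - 1])
        # The last row has no R_{i+1} term, which is why it looks different.
        next_rate = rates[i + 1] if i + 1 < len(rates) else 0.0
        grad.append(-rate * pdf(rate) + (upper - cdf(rate)) + next_rate * pdf(rate))
    return grad


def finite_difference_gradient(rates, cdf, h=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(rates)):
        up, down = list(rates), list(rates)
        up[i] += h
        down[i] -= h
        grad.append((throughput(up, cdf) - throughput(down, cdf)) / (2.0 * h))
    return grad


MEAN_C, STD_C = 2.0, 0.6  # hypothetical parameters
cdf = lambda r: gaussian_cdf(r, MEAN_C, STD_C)
pdf = lambda r: gaussian_pdf(r, MEAN_C, STD_C)

rates = [2.4, 1.9, 1.3]
analytic = throughput_gradient(rates, cdf, pdf)
numeric = finite_difference_gradient(rates, cdf)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, max_error)
```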
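Note that the first $(L - 1)$ rows have a similar form, but the last row is different. We can rewrite this set of nonlinear equations in the form

$$(R_1 - R_2)\, f_C(R_1) = 1 - F_C(R_1), \qquad (8.10)$$

$$(R_i - R_{i+1})\, f_C(R_i) = F_C(R_{i-1}) - F_C(R_i), \quad 2 \le i \le L - 1, \qquad (8.11)$$

$$R_L\, f_C(R_L) = F_C(R_{L-1}) - F_C(R_L). \qquad (8.12)$$

We can conclude from this last set of equations that if the number of rate levels goes to infinity, we have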
(8.13)

and therefore the average throughput in each bin is asymptotically equal.

In the remainder of this section we discuss a numerical method based on the gradient descent algorithm [53,54] to solve the throughput maximization problem (7.7) with quantized feedback and to find the optimal values of the rate control thresholds $R_1, R_2, \ldots, R_L$. Numerical algorithms based on gradient search rely on the fact that if the gradient of the objective function at the current solution point is not zero, then by taking a sufficiently small step in the direction opposite to the gradient, it is possible to find a new point at which the value of the objective function is smaller. For a maximization problem such as (8.3), the step is taken along the gradient instead, which is equivalent to descending on $-J$. However, in practice it is hard to know the right step size. The algorithm may not converge if the step size is too large, and it may converge far too slowly if the step size is too small.

However, there are a number of effective algorithms for adjusting the step size. The Barzilai-Borwein method is a steepest descent method for unconstrained optimization that has proved to be very effective for most practical applications. The Barzilai-Borwein gradient search method differs from the usual steepest descent method in the way that the step length is chosen, and it does not guarantee descent in the objective function. Combined with the technique of nonmonotone line search, such a method has found successful applications in unconstrained optimization, convex constrained optimization, and stochastic optimization. A number of recent works have also developed and improved the Barzilai-Borwein gradient search method for some cases [55-57]. The adaptive gradient search algorithm is then defined by the following recursive equation, whose solution converges to a local extremum point of the optimization problem. However, our simulation results for multiple antenna systems over Rayleigh fading channels show that for any random starting point the solution converges to the same point. We have
$$\underline{R}^{(k+1)} = \underline{R}^{(k)} + \mu_k \nabla_{\underline{R}} J\bigl(\underline{R}^{(k)}\bigr) \qquad (8.14)$$

where the gradient $\nabla_{\underline{R}} J$ is given by (8.9), and $\mu_k$ is the sequence of step sizes. This sequence can either be chosen as an appropriate fixed sequence of decreasing positive real numbers such that $\sum_{k=1}^{\infty} \mu_k^2 < \infty$ and $\sum_{k=1}^{\infty} \mu_k \to \infty$, or it can be found dynamically through the Barzilai-Borwein gradient search [55-57], which improves the performance of the algorithm and usually converges faster. It should be pointed out that if the probability density function of the equivalent channel condition $\gamma$ is analytically known, the value of the gradient can be found analytically. However, in many practical cases it is simpler to estimate the gradient by using its simplified form (8.9) through Monte Carlo simulation [58-60]. The gradient search stochastic optimization method [61,62] is a very powerful technique when the actual gradient and value of the objective function are not known but can be well estimated through Monte Carlo simulation or evaluation of the system performance [63].

Figure 4 shows the optimal quantized rate control levels with one, two, and three rate levels for a 2 x 2 multiple transmit and multiple receive antenna system using the BLAST code. The probability density function of the supportable rate by the channel is also depicted to better illustrate how those rates are chosen. The average throughput is then the sum, over the bins, of the area under the probability density curve in each bin times the transmission rate used in that bin. The outage probability of the scheme is represented by the area under the probability density curve in the first bin. As a final note in this section, it should be pointed out that the Gaussian approximation discussed in Section 6 is very useful for finding an analytical expression for the gradient (8.9) in terms of the mean and variance. Therefore, the gradient can be well approximated with a relatively small number of simulations.
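The gradient search with a fixed decreasing step sequence can be sketched as follows. This is only an illustration under stated assumptions: $F_C$ and $f_C$ are taken to be Gaussian with hypothetical parameters, the update is written as an ascent step because (8.3) is a maximization, and the step sequence $\mu_k = 0.5\,k^{-0.75}$ is one arbitrary choice that satisfies $\sum \mu_k^2 < \infty$ and $\sum \mu_k = \infty$. It is not the Barzilai-Borwein scheme, and all function names are hypothetical.

```python
import math


def gaussian_cdf(x, mean, std):
    """Gaussian CDF, standing in for F_C (an assumption)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def gaussian_pdf(x, mean, std):
    """Gaussian density, standing in for f_C (an assumption)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def throughput(rates, cdf):
    """Objective J of (8.3); rates[0] is R_1, the highest level."""
    total, upper = 0.0, 1.0
    for rate in rates:
        total += rate * (upper - cdf(rate))
        upper = cdf(rate)
    return total


def gradient(rates, cdf, pdf):
    """Gradient of J with respect to the rate levels, as in (8.9)."""
    grad = []
    for i, rate in enumerate(rates):
        upper = 1.0 if i == 0 else cdf(rates[i - 1])
        next_rate = rates[i + 1] if i + 1 < len(rates) else 0.0
        grad.append(-rate * pdf(rate) + (upper - cdf(rate)) + next_rate * pdf(rate))
    return grad


def gradient_search(start, cdf, pdf, iterations=5000, scale=0.5, decay=0.75):
    """Iterate R <- R + mu_k * grad J with mu_k = scale / k**decay.

    With 0.5 < decay <= 1 the steps are square-summable but not summable.
    """
    rates = list(start)
    for k in range(1, iterations + 1):
        step = scale / k ** decay
        grad = gradient(rates, cdf, pdf)
        rates = [r + step * g for r, g in zip(rates, grad)]
    return rates


MEAN_C, STD_C = 2.0, 0.6  # hypothetical parameters
cdf = lambda r: gaussian_cdf(r, MEAN_C, STD_C)
pdf = lambda r: gaussian_pdf(r, MEAN_C, STD_C)

start_a, start_b = [3.0, 0.5], [1.2, 1.0]
end_a = gradient_search(start_a, cdf, pdf)
end_b = gradient_search(start_b, cdf, pdf)
print(end_a, end_b, throughput(end_a, cdf))
```

Running the search from two different starting points and comparing the end points is a simple way to probe, for a given distribution, the starting-point independence reported above for Rayleigh fading. When only samples of the channel are available, the analytic `gradient` function would be replaced by a Monte Carlo estimate.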
9. Discussion and conclusion. In many practical communication systems, the goal is to maximize the overall throughput, that is, the total sum of information packets that have been successfully decoded over a long period of time. Although minimizing the outage probability results in smaller frame error rates and an increase in throughput, it is not sufficient for throughput maximization. In order to optimize the throughput, the transmission strategy should adapt the transmission rate to the channel variations. When the channel has better quality, a higher rate should be used, and when the channel suffers from deep fading and has poor quality, the transmission rate should be lower to allow the packet to go through. While we consider the block fading channel model in this work, it should be pointed out that the same intuition about adapting the transmission rate to the channel variations holds for the ergodic fading model [1]. The work in [1] has shown that in the presence of channel state information at the transmitter as well as the receiver, the capacity of the channel can be achieved by controlling the rate and the power. However, it can be shown that for ergodic channels rate control is not necessary to achieve the capacity [13], while rate control is an absolute necessity for maximizing the throughput in block fading channels.

Figure 6 shows the increase in throughput for a 4 x 4 multiple antenna system using BLAST codes as a function of the average transmission power when the transmission rate is chosen properly for each available average power. However, if the transmission rate is kept constant, the penalty could be huge. For example, Figure 7 compares the throughput of a 2 x 2 multiple antenna system using BLAST codes for two different scenarios: first, when the transmission rate is chosen properly as a function of the available average power; second, when the transmission rate is kept constant at 2 bits per channel use. Clearly, for high available average power a system with adaptive rate can carry much higher throughput than a system with a fixed transmission rate.

[Figure 6: "Throughput of 4x4 MIMO system using BLAST codes". Horizontal axis: average power (dB). Curves: no feedback, 1-bit feedback, 4-bit feedback, perfect feedback.]
FIG. 6. Throughput performance of a 4x4 MIMO system using the BLAST code and rate control strategies with one and four bits of feedback. The performance with no feedback and with optimal feedback (i.e., an infinite number of bits) is also plotted for reference. Comparison of the curves shows that to achieve the full throughput potential of the system, the transmission rate should be adapted to the SNR. The figure shows how increasing the number of feedback bits improves the performance of the system. It should also be noted that the performance of a system with four bits of feedback is very close to that of a system with the optimal infinite bits of feedback.

[Figure 7: "Comparison of throughput with and without (constant R=2) rate control for 2x2 MIMO using BLAST codes". Horizontal axis: average power (dB). Curves: no feedback, 1-bit feedback, 4-bit feedback, perfect feedback.]
FIG. 7. Representation of the throughput for systems with rate control and systems with constant rate and power control for different cases of feedback, i.e., no feedback, 1-bit feedback, 4-bit feedback, and perfect feedback.

Figure 5 shows the throughput of a single transmit and single receive antenna system as well as a 4 x 1 multiple antenna system using the same average power of 1 dB. It can be seen that the throughput changes with the attempted transmission rate. Too small or too large a transmission rate per packet results in a loss in system performance. This observation suggests that to achieve a targeted throughput, we should pick a higher transmission rate and let outage occur for the transmission of some packets. In fact, for a given number of feedback bits, the optimal outage probability is almost independent of the available average power. Figure 8 shows that to maximize the throughput the probability of outage is about 20%, 5%, and 0.1% for a system using no feedback, 1-bit feedback, and 4-bit feedback, respectively. Figure 5 also shows that there is a unique transmission rate that maximizes the throughput. It further reveals that increasing the number of antennas is useful only if we pick the right transmission rate per packet. In general, the larger the number of antennas in the system, the more sensitive the throughput is to the transmission rate.

[Figure 8: "Probability of the outage corresponding to rate control for 2x2 MIMO using BLAST code". Horizontal axis: average power (dB). Curves: no feedback, 1-bit feedback, 4-bit feedback.]
FIG. 8. Probability of outage corresponding to the different rate control strategies with various feedback bits. It can be observed that on average the optimal value of the outage probability that corresponds to the optimal finite rate control does not fluctuate with SNR. However, it is a function of the number of feedback bits. For no feedback, the probability of packet error is about 20%, and it decreases with an increasing number of feedback bits. The figure shows that the probability of outage is about 5% for 1-bit feedback and about 0.1% for the 4-bit feedback case.

Figure 9 shows the throughput performance of a 2x2 MIMO system using the Alamouti code and rate control strategies with one and four bits of feedback. The performance with no feedback and with optimal feedback (i.e., an infinite number of bits) is also plotted for reference. This figure also confirms that to achieve the full throughput potential of the system, the transmission rate should be adapted to the available average power. By comparing with Figure 10, it can be seen that for a 2x2 MIMO system with various numbers of feedback bits, a system with the BLAST code generally outperforms a system with the Alamouti code. It should be pointed out that the feedback can be used either to control the power, which lowers the probability of outage and as a result gains throughput, or to control the transmission rate. Figures 11 and 12 compare these two feedback strategies. It can be observed that the power control strategy is more sensitive to the available average power. Also, for a higher number of feedback bits, rate control performs strictly better than the power control strategy.

[Figure 9: "Throughput of 2x2 MIMO system using Alamouti codes". Horizontal axis: average power (dB). Curves: no feedback, 1-bit feedback, 4-bit feedback, perfect feedback.]
FIG. 9. Throughput performance of a 2x2 MIMO system using the Alamouti code and rate control strategies with one and four bits of feedback. The performance with no feedback and with optimal feedback (i.e., an infinite number of bits) is also plotted for reference. The curves show that to achieve the full throughput potential of the system, the transmission rate should be adapted to the available average power. The figure shows how increasing the number of feedback bits improves the performance of the system. It should also be noted that the performance of a system with four bits of feedback is very close to that of a system with the optimal infinite bits of feedback. By comparing with Figure 10, it can be seen that for a 2x2 MIMO system with various numbers of feedback bits, a system with the BLAST code generally outperforms a system with the Alamouti code.
[Figure 10: "Throughput of 2x2 MIMO system using BLAST codes". Horizontal axis: average power (dB). Curves: no feedback, 1-bit feedback, 4-bit feedback, perfect feedback.]
FIG. 10. Throughput performance of a 2x2 MIMO system using the BLAST code and rate control strategies with one and four bits of feedback. The performance with no feedback and with optimal feedback (i.e., an infinite number of bits) is also plotted for reference. The curves show that to achieve the full throughput potential of the system, the transmission rate should be adapted to the available average power. The figure shows how increasing the number of feedback bits improves the performance of the system. It should also be noted that the performance of a system with four bits of feedback is very close to that of a system with the optimal infinite bits of feedback. By comparing with Figure 9, it can be seen that for a 2x2 MIMO system with various numbers of feedback bits, a system with the BLAST code generally outperforms a system with the Alamouti code.
It is shown in Figures 9 and 10 that increasing the number of feedback bits improves the performance of the system. It should also be noted that the performance of a system with four bits of feedback is very close to that of a system with the optimal infinite bits of feedback. Figure 13 shows the percentage of the gain obtained by using feedback, relative to not using feedback, as a function of the number of feedback bits. As the number of feedback bits gets larger, more gain is achieved, and the full gain is reached for a system with an infinite number of feedback bits. However, most of the gain can be achieved with a small number of feedback bits. For example, about 89% or 93% of the gain can be achieved using only 4 or 5 bits of feedback, respectively. This fact shows that low-rate feedback can effectively improve the performance for throughput maximization.
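One plausible reading of the percentage gain discussed above is the fraction of the gap between the no-feedback and perfect-feedback throughputs that a given quantized scheme closes. The short Python sketch below encodes that reading; the normalization, the function names, and the throughput numbers are assumptions made for illustration and are not values taken from Figure 13. The number of feedback bits is taken as $\log_2 L$ for $L$ rate levels, as in Section 8.

```python
import math


def feedback_bits(levels):
    """Feedback bits per block needed to index `levels` rate levels."""
    return math.log2(levels)


def feedback_gain_percent(throughput, no_feedback, perfect_feedback):
    """Share (in percent) of the no-feedback to perfect-feedback gap closed."""
    span = perfect_feedback - no_feedback
    if span <= 0.0:
        raise ValueError("perfect feedback must outperform no feedback")
    return 100.0 * (throughput - no_feedback) / span


# Hypothetical throughputs in bits per channel use.
T_NONE, T_QUANTIZED, T_PERFECT = 1.20, 1.44, 1.50
gain = feedback_gain_percent(T_QUANTIZED, T_NONE, T_PERFECT)
print(feedback_bits(3), gain)
```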
[Figure 11: "Variable rate vs. constant rate throughput comparison for 2x2 MIMO system using BLAST code". Horizontal axis: average power (dB).]
FIG. 11. Representation of the throughput for the system with and without rate control for a 2x2 MIMO system using the BLAST code. A system without rate control incurs a significant loss in performance in terms of throughput if the system is used over a different range of SNR. At an SNR much smaller than the nominal SNR, the throughput drops because of the increasing outage, whereas at a relatively large SNR, the loss in throughput is due to not choosing the appropriate rate of transmission.
Moreover, it should be noted that the gain from using a quantized feedback strategy can be huge. For example, Figure 9 shows that for a targeted throughput of 3 bits per channel use for a 2 x 2 multiple antenna system using the Alamouti code, the required average power for a system with no feedback is about 10 dB. However, if full feedback is used, the same throughput can be achieved with less than 6 dB of average transmission power. Therefore, there is a gap of more than 4 dB in the required average power, of which almost 3.8 dB can be recovered by using quantized rate control with only 5 bits of feedback. The power saving through quantized rate control becomes even more apparent and appealing for a higher targeted throughput. The huge gain in performance through quantized rate control that is shown in this paper strongly motivates such a design. More importantly, the fact that these large performance gains are achievable with very low-rate feedback suits practical systems well and should be considered for high data rate communications over wireless channels.
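To put the decibel figures quoted above on a linear scale, the following small Python sketch converts the power savings into power ratios. The input values are the approximate numbers stated in the text (about 10 dB, less than 6 dB, almost 3.8 dB); the helper name is hypothetical, and the 6 dB value is used as an upper bound.

```python
def db_to_linear(db):
    """Convert a power ratio in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


NO_FEEDBACK_DB = 10.0    # required power without feedback (about 10 dB)
FULL_FEEDBACK_DB = 6.0   # upper bound on required power with full feedback
RECOVERED_DB = 3.8       # saving quoted for 5 bits of quantized feedback

gap_db = NO_FEEDBACK_DB - FULL_FEEDBACK_DB
ratio_full = db_to_linear(gap_db)          # power reduction factor at a 4 dB gap
ratio_5bit = db_to_linear(RECOVERED_DB)    # power reduction factor at 3.8 dB
print(gap_db, ratio_full, ratio_5bit)
```

Under these numbers a 4 dB gap corresponds to roughly a 2.5-fold reduction in required power, and 3.8 dB to roughly a 2.4-fold reduction.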
[Figure 12: "Throughput comparison with and without rate control for 2x2 MIMO using BLAST code". Curves: 1-bit rate control; 1-bit power control, R=2; 1-bit power control, R=3.]
FIG. 12. Comparison of the throughput for the system with 1-bit power control and constant rate versus a 1-bit rate control with constant rate for a 2x2 MIMO system using the BLAST code. A system without rate control incurs a significant loss in performance in terms of throughput if the system is used over a different range of SNR. At an SNR much smaller than the nominal SNR, the throughput drops because of the increasing outage, whereas at a relatively large SNR, the loss in throughput is due to not choosing the appropriate rate of transmission. Simulation results show that the throughput of a system with rate control exceeds the throughput of a system with power control with the same number of bits of feedback.
[Figure 13: "Throughput gain with the channel state information for 2x2 MIMO system using BLAST codes". Horizontal axis: number of feedback bits.]
FIG. 13. Percentage of the gain by using feedback versus the number of feedback bits. As the number of feedback bits goes to infinity the full (100%) gain is achieved, whereas with zero bits of feedback, i.e., no feedback, the gain is zero. It can be observed from the figure that about 93% of the gain is achieved through only 5 bits of feedback.

REFERENCES
[1] A. GOLDSMITH AND P. VARAIYA, "Capacity of fading channels with channel side information," IEEE Transactions on Information Theory, 43(6): 1986-1992, Nov. 1997.
[2] E. BIGLIERI, G. CAIRE, AND G. TARICCO, "Limiting performance of block-fading channels with multiple antennas," IEEE Transactions on Information Theory, 47: 1273-1289, Sep. 2001.
[3] G. CAIRE, G. TARICCO, AND E. BIGLIERI, "Optimum power control over fading channels," IEEE Transactions on Information Theory, 45(5): 1468-1489, Jul. 1999.
[4] E. TELATAR, "Capacity of multi-antenna Gaussian channels," AT&T Bell Labs Internal Tech. Memo, June 1995.
[5] V. TAROKH, N. SESHADRI, AND A.R. CALDERBANK, "Space-time codes for high data rate wireless communication: performance criterion and code construction," IEEE Transactions on Information Theory, 44(2): 744-765, Mar. 1998.
[6] G.J. FOSCHINI AND M.J. GANS, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, 6: 311, 1998.
[7] G.J. FOSCHINI AND M.J. GANS, "Capacity when using multiple antennas at transmit and receive sites and Rayleigh-faded matrix channel is unknown to the transmitter," Advances in Wireless Communications, Eds. J.M. Holtzman and M. Zorzi, Kluwer Academic Publishers, 1998.
[8] T.L. MARZETTA AND B.M. HOCHWALD, "Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading," IEEE Transactions on Information Theory, 45(1): 139-157, Jan. 1999.
[9] M.A. KHOJASTEPOUR, X. WANG, AND M. MADIHIAN, "Optimal quantized power control in multiple antenna communication systems," in Forty-third Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Monticello, IL, September 28-30, 2005.
[10] S. BHASHYAM, A. SABHARWAL, AND B. AAZHANG, "Feedback gain in multiple antenna systems," IEEE Transactions on Communications, 50(5): 785-798, May 2002.
[11] K.K. MUKKAVILLI, A. SABHARWAL, E. ERKIP, AND B. AAZHANG, "On beamforming with finite rate feedback in multiple-antenna systems," IEEE Transactions on Information Theory, pp. 2562-2579, 2003.
[12] D.J. LOVE AND R.W. HEATH JR., "Limited feedback unitary precoding for orthogonal space-time block codes," IEEE Transactions on Signal Processing, 53: 64-73, 2005.
[13] G. CAIRE AND S. SHAMAI, "On the capacity of some channels with channel state information," IEEE Trans. on Inform. Theory, 45: 1468-1489, 1998.
[14] J. HAYES, "Adaptive feedback communications," IEEE Transactions on Communications, 16: 29-34, 1968.
[15] M.S. ALOUINI AND A.J. GOLDSMITH, "Capacity of Rayleigh fading channels under different adaptive transmission and diversity-combining techniques," IEEE Journal on Selected Areas in Communications, Special Issue on Multi-Media Network Radios, 17: 837-850, 1999.
[16] A.J. GOLDSMITH AND S.-G. CHUA, "Variable-rate variable-power MQAM for fading channels," IEEE Trans. on Communications, 45: 1218-1230, 1997.
[17] A.J. GOLDSMITH AND S.-G. CHUA, "Adaptive coded modulation for fading channels," IEEE Trans. on Communications, 46: 595-602, 1998.
[18] C. BERROU, A. GLAVIEUX, AND P. THITIMAJSHIMA, "Near Shannon limit error correcting coding and decoding: Turbo-codes," in IEEE Int. Conf. on Commu., Geneva, Switzerland, May 1993, pp. 1064-1070.
[19] T. RICHARDSON, M. SHOKROLLAHI, AND R. URBANKE, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. on Info. Theory, 47(2): 619-637, Feb. 2001.
[20] S. VISHWANATH AND A.J. GOLDSMITH, "Adaptive turbo-coded modulation for flat-fading channels," IEEE Trans. on Communications, 51: 964-972, 2003.
[21] N. PRASAD, M.K. VARANASI, AND M.A. KHOJASTEPOUR, "Outage theorems for MIMO fading channels," in Forty-third Annual Allerton Conference on Communication, Control, and Computing, 2005.
[22] R.K. MALLIK, M.Z. WIN, J.W. SHAO, M.-S. ALOUINI, AND A.J. GOLDSMITH, "Channel capacity of adaptive transmission with maximal ratio combining in correlated Rayleigh fading," IEEE Transactions on Wireless Communications, 3: 1124-1133, 2004.
[23] M.J. HOSSAIN, P.K. VITTHALADEVUNI, M.-S. ALOUINI, V.K. BHARGAVA, AND A.J. GOLDSMITH, "Adaptive hierarchical modulation for simultaneous voice and multi-class data transmission over fading channels," IEEE Transactions on Vehicular Technology (accepted for publication), 2005.
[24] P. SEBASTIAN, H. SAMPATH, AND A. PAULRAJ, "Adaptive modulation for multiple antenna systems," in Proceedings of Thirty-fourth Asilomar Conference on Signals, Systems, and Computers, 2000.
[25] S.A. JAFAR, S. VISHWANATH, AND A.J. GOLDSMITH, "Throughput maximization with multiple codes and partial outages," in Proceedings of Globecom Conference, San Antonio, Texas, USA, 2001.
[26] J.C. ROH AND B.D. RAO, "Adaptive modulation for multiple antenna channels," in Proceedings of Thirty-sixth Asilomar Conference on Signals, Systems, and Computers, 2002.
[27] C.-N. CHUAH, D.N.C. TSE, J.M. KAHN, AND R.A. VALENZUELA, "Capacity scaling in MIMO wireless systems under correlated fading," IEEE Transactions on Information Theory, 48: 637-650, 2002.
[28] V. TAROKH, H. JAFARKHANI, AND A.R. CALDERBANK, "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, 45: 1456-1467, 1999.
[29] B. HASSIBI AND B.M. HOCHWALD, "High-rate codes that are linear in space and time," IEEE Transactions on Information Theory, 48: 1804-1824, 2002.
[30] L. OZAROW, S. SHAMAI, AND A.D. WYNER, "Information theoretic considerations for cellular mobile radio," IEEE Transactions on Vehicular Technology, 43: 359-378, May 1994.
[31] R. McELIECE AND W. STARK, "Channels with block interference," IEEE Transactions on Information Theory, 30(1): 44-53, Jan. 1984.
[32] G. CAIRE, G. TARICCO, AND E. BIGLIERI, "Optimum power control over fading channels," IEEE Transactions on Information Theory, 45(5): 1468-1489, 1999.
[33] E. JORSWIECK, "Outage probability of multiple antenna systems: Optimal transmission and impact of correlation," in Proc. of 13. Joint Conference on Coding and Communications, March 2004.
[34] H. BOCHE AND E. JORSWIECK, "Outage probability of multiple antenna systems: Optimal transmission and impact of correlation," in Proc. IZS, March 2004.
[35] E. JORSWIECK AND H. BOCHE, "Optimal transmission strategy for multiple antenna systems with uninformed transmitter and correlation," in Proc. Asilomar, 2003.
[36] E. JORSWIECK AND H. BOCHE, "On the impact of correlation on the capacity in MIMO systems without CSI at the transmitter," in Proceedings of CISS, 2003.
[37] E. JORSWIECK AND H. BOCHE, "Behaviour of outage probability in MISO systems with no channel state information at the transmitter," in Proceedings of ITW, 2003.
[38] H. BOCHE AND E. JORSWIECK, "Optimal power allocation for MISO systems and complete characterization of the impact of correlation on the capacity," in Proc. of ICASSP, 2003.
[39] H. BOCHE AND E. JORSWIECK, "On Schur-convexity of expectation of weighted sum of random variables with applications," Journal of Inequalities in Pure and Applied Mathematics, 5: 46, 2004.
[40] H. BOCHE AND E. JORSWIECK, "On the Schur-concavity of the ergodic and outage capacity with respect to correlation in multi-antenna systems with no CSI at the transmitter," in Proceedings of Allerton, 2002.
[41] G.J. FOSCHINI, Bell Labs Technical Journal, 1: 41-59, 1996.
[42] A. SABHARWAL, E. ERKIP, AND B. AAZHANG, "On channel state information in multiple antenna block fading channels," in Proceedings ISITA, Hawaii, Nov 2000.
[43] A. NARULA, M.J. LOPEZ, M.D. TROTT, AND G.W. WORNELL, "Efficient use of side information in multiple-antenna data transmission over fading channels," IEEE Journal on Selected Areas of Communications, 16(8): 1423-1436, Oct 1998.
[44] A. NARULA, M.D. TROTT, AND G.W. WORNELL, "Performance limits of coded diversity methods for transmitter antenna arrays," IEEE Transactions on Information Theory, 45(7): 2418-2433, Nov 1999.
[45] E. VISOTSKY AND U. MADHOW, "Space-time precoding with imperfect feedback," in Proceedings ISIT 2000, Sorrento, Italy, June 2000.
[46] S.A. JAFAR AND A. GOLDSMITH, "On optimality of beamforming for multiple antenna systems with imperfect feedback," in Proceedings ISIT 2001, Washington DC, USA, June 2001.
[47] K.K. MUKKAVILLI, A. SABHARWAL, M. ORCHARD, AND B. AAZHANG, "Transmit diversity with channel feedback," in Proceedings International Symposium on Telecommunications, Tehran, Iran, September 2001.
[48] D.J. LOVE, R.W. HEATH JR., AND T. STROHMER, "Grassmannian beamforming for multiple-input multiple-output wireless systems," IEEE Transactions on Information Theory, 49: 2735-2747, 2003.
[49] D.J. LOVE, R.W. HEATH JR., W. SANTIPACH, AND M.L. HONIG, "What is the value of limited feedback for MIMO channels?," IEEE Communications Magazine, 42: 54-59, 2004.
[50] B.M. HOCHWALD, T.L. MARZETTA, AND V. TAROKH, "Multi-antenna channel-hardening and its implication for rate feedback and scheduling," Submitted for publication to IEEE Transactions on Information Theory, 2003.
[51] P.J. SMITH AND M. SHAFI, "On a Gaussian approximation to the capacity of wireless MIMO systems," in Proceedings of International Conference on Communications, 2002.
162
M.A. KHOJASTEPOUR, X. WANG, AND M. MADIHIAN
[52] Z. WANG AND G.B. GIANNAKIS, "Outage mutual information of space-time mimo systems," in Proceedings of 40th Annual Allerton Conference on Communication, Control, and Computing, 2002. [53] S.P. BOYD AND L. VANDENBERGHE, Convex optimization, New York: Cambridge, 2004. [54] .J.C. SPALL, Introduction to stochastic search and optimization: estimation, simulation, and control, Hoboken, NJ: Wiley-Interscience, 2003. [55] M. RAYDAN AND B.F. SVAITER, "Relaxed steepest descent and cauchy-barzilaiborwein method," Computational Optimization and Applications, 21: 155167, 2002. "A new gradient descent method for unconstrained optimiza[56] N. ANDREI, tion," Submitted for Publication to Journal of Mathematics of Compu-
[57]
[58] [59] [60] [61] [62] [63]
tation, ..http://www.ici.ro/camo/neculai/newstep.pdf @ 02/22/2005"; also: ICI Technical Report, April 2004. N. ANDREI, "Relaxed gradient descent and a new gradient descent methods for unconstrained optimization," Submitted for Publication to Journal of Mathematical Programming, "http://www.ici.ro/camo/neculai/newgrad. pdf @ 02/22/2005"; also: ICI Technical Report, August 2004. S. ANDARDOTTIR, "A review of simulation optimization techniques," 1998. P. G LASSERMAN, Gradient estimation via perturbation analysis, Boston : Kluwer Academic Publishers, 1991. M. Fu AND J.-Q. Hu, Conditional Monte Carlo: Gradient Estimation and Optimization Applications, Boston: Kluwer Academic Press, 1997. M. Fu, "Optimization via simulation: A review," Annals of Operations Research, 53: 199-248, 1994. H.J. KUSHNER AND G. YIN, Stochastic approximation and recursive algorithms and applications, New York: Springer, 2003. .J. NOCEDAL AND S. WRIGHT, Numerical Optimization, Springer Verlag, 1999.
COMMUNICATION STRATEGIES AND CODING FOR RELAYING

GERHARD KRAMER*

Abstract. Information-theoretic models for relaying are developed and applied to wireline and wireless networks. For wireline networks with node constraints, data compression is shown to improve rates. For wireless networks with half-duplex constraints, decode-and-forward strategies are developed that can give substantial rate gains over no-relay transmission and traditional multihopping. The performance of the strategies is verified by designing coded modulations that approach certain information-theoretic limits. The codes used are irregular low-density parity-check (LDPC) codes.

Key words. Relay channel, capacity, code, half-duplex, low-density parity-check code, network.

AMS(MOS) subject classifications. 94A05, 94A15, 94A24.
1. Introduction. Information theory usually models a communication channel by a conditional probability distribution. For example, a model for communicating a symbol from one point to another might involve the conditional probability distribution $P_{Y|X}(\cdot)$ that evaluates to

$$P_{Y|X}(b|a), \quad a \in \mathcal{X},\ b \in \mathcal{Y} \tag{1.1}$$
where $X$ and $Y$ are random variables taking on values in the discrete and finite alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The aim of communication is to transmit reliably a message index $W$ taking on one of $M$ values from a source to a sink. Suppose that to accomplish this task one transmits a string of $n$ symbols $X^n = X_1, X_2, \ldots, X_n$ over the channel. The rate of communication is then

$$R = \log_2(M)/n \tag{1.2}$$

bits per channel use. The maximum rate $C$ at which one can transmit reliably is called the capacity of the channel. A relay channel is a multiterminal channel with three parties or nodes: a source (node 1), a relay (node 2), and a sink (node 3). A possible model for relaying might involve the probabilities (see [6, 20])

$$P_{Y_2 Y_3|X_1 X_2}(b_2, b_3 | a_1, a_2) \tag{1.3}$$
where $X_1$ is the source's channel input, $Y_3$ is the sink's channel output, and $X_2$ and $Y_2$ are the relay's input and output, respectively. The idea is that the source and sink can only transmit and receive, respectively, but the relay can both transmit and receive.

*Communications and Statistical Sciences Department, Bell Labs, Lucent Technologies, Murray Hill, NJ 07974.

Suppose that the source and relay transmit the strings $X_1^n = X_{11}, X_{12}, \ldots, X_{1n}$ and $X_2^n = X_{21}, X_{22}, \ldots, X_{2n}$, respectively, over the channel. Suppose further that the relay can react quickly so that its input $X_{2i}$ can be any function of its past outputs $Y_2^{i-1}$. The relay channel is said to be memoryless if one has (see [12])

$$P_{Y_{2i} Y_{3i} | W X_1^i X_2^i Y_2^{i-1} Y_3^{i-1}}(b_{2i}, b_{3i} | w, a_1^i, a_2^i, b_2^{i-1}, b_3^{i-1}) = P_{Y_2 Y_3|X_1 X_2}(b_{2i}, b_{3i} | a_{1i}, a_{2i}) \tag{1.4}$$
for all $a_1^i$, $a_2^i$, $b_2^i$, $b_3^i$, and $i = 1, 2, \ldots, n$. We will consider only memoryless channels. Again, the maximum rate $C$ at which one can transmit reliably is called the capacity of the channel. A relay network is a generalization of a relay channel to a system with $T$ nodes: a source (node 1), $T - 2$ relays (nodes 2 to $T - 1$), and a sink (node $T$). A model for relaying would involve the probabilities (see [12, 20])

$$P_{Y_2 \cdots Y_T | X_1 \cdots X_{T-1}}(b_2, \ldots, b_T | a_1, \ldots, a_{T-1}). \tag{1.5}$$

The relay network is memoryless if the natural extension of the condition (1.4) is true, i.e., if the $i$th channel outputs $Y_{ti}$, $t = 2, 3, \ldots, T$, depend only on the $i$th channel inputs $X_{ti}$, $t = 1, 2, \ldots, T - 1$, given the message, the present (or $i$th) and past channel inputs, and the past channel outputs. The capacity $C$ is again the maximum rate at which one can transmit reliably.

2. Wireline and wireless networks.
2.1. Wireline models. The memoryless relay channel defined by (1.3) models a variety of communication problems. Consider, for example, a wireline network with three terminals depicted in Figure 1. The idea is that the source (node 1) is wired to the relay (node 2), which is wired to the sink (node 3). One might therefore expect that $Y_2$ is a noisy function of $X_1$ only, and that $Y_3$ is a noisy function of $X_2$ only. In this case the channel distribution (1.3) satisfies

$$P_{Y_2 Y_3|X_1 X_2}(b_2, b_3 | a_1, a_2) = P_{Y_2|X_1}(b_2 | a_1) \cdot P_{Y_3|X_2}(b_3 | a_2) \tag{2.1}$$

for all $a_1, a_2, b_2, b_3$. If the channels are essentially noise-free we might even choose to consider

$$P_{Y_2 Y_3|X_1 X_2}(b_2, b_3 | a_1, a_2) = 1(b_2 = a_1) \cdot 1(b_3 = a_2) \tag{2.2}$$

where $1(\cdot)$ is the indicator function that takes on the value 1 if its argument is true and is zero otherwise.

FIG. 1. A wireline network.

Some wireline problems have constraints on the network nodes and not only (capacity constraints) on the network channels or edges. For instance, suppose the relay (node 2) has limited processing power and can either transmit or receive, but not both. For noise-free networks, one might model this via the constraint
(2.3)

Note that (2.1) is no longer true. However, we can write

$$P_{Y_2 Y_3|X_1 X_2}(b_2, b_3 | a_1, a_2) = P_{Y_2|X_1 X_2}(b_2 | a_1, a_2) \cdot P_{Y_3|X_2}(b_3 | a_2) \tag{2.4}$$

for all $a_1, a_2, b_2, b_3$. More generally, a relay channel is said to be physically degraded [6] if one can write

$$P_{Y_2 Y_3|X_1 X_2}(b_2, b_3 | a_1, a_2) = P_{Y_2|X_1 X_2}(b_2 | a_1, a_2) \cdot P_{Y_3|X_2 Y_2}(b_3 | a_2, b_2) \tag{2.5}$$

for all $a_1, a_2, b_2, b_3$. The channels (2.1), (2.2), and (2.4) are therefore physically degraded.
2.2. Wireless models. Consider the wireless network depicted in Figure 2. For such problems one usually replaces the probability distribution (1.1) with a probability density $p_{Y|X}(\cdot)$. A commonly-studied class of probability densities is based on an additive white Gaussian noise (AWGN) model with

$$Y_2 = \frac{h_{12}}{d_{12}^{\alpha/2}} X_1 + Z_2 \tag{2.6}$$

$$Y_3 = \frac{h_{13}}{d_{13}^{\alpha/2}} X_1 + \frac{h_{23}}{d_{23}^{\alpha/2}} X_2 + Z_3 \tag{2.7}$$

where $X_1, X_2, Y_2, Y_3, Z_2, Z_3$ are complex random variables, $h_{ij}$ and $d_{ij}$ are the respective (fading) channel gain and distance between nodes $i$ and $j$, and $\alpha$ is an attenuation exponent (e.g., $\alpha = 2$ for free space propagation). The average energies (or powers) of the inputs are constrained as

$$\sum_{i=1}^{n} E[|X_{ti}|^2]/n \le P_t, \quad t = 1, 2. \tag{2.8}$$

The idea of the above model is that the wireless channel permits broadcasting ($X_1$ affects both $Y_2$ and $Y_3$) but that this causes interference ($X_1$ and $X_2$ interfere at node 3).
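The broadcast and interference effects can be seen in a small numerical sketch. The Python code below evaluates one channel use of an AWGN relay channel, assuming the usual form in which each link scales its input by $h_{ij}/d_{ij}^{\alpha/2}$ and adds unit-variance proper complex Gaussian noise. The function names and the dictionary layout for gains and distances are illustrative assumptions, not part of the original text.

```python
import math
import random


def proper_complex_gaussian(rng, variance=1.0):
    """Zero-mean proper complex Gaussian sample with E[|Z|^2] = variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def link_gain(h, d, alpha):
    """Amplitude gain h / d^(alpha/2) of one link (assumed model form)."""
    return h / d ** (alpha / 2.0)


def relay_channel_use(x1, x2, h, d, alpha, z2=0j, z3=0j):
    """Return (y2, y3) for source input x1 and relay input x2.

    h and d map node pairs (i, j) to the fading gain and the distance.
    """
    # Broadcasting: x1 reaches both the relay (y2) and the sink (y3).
    y2 = link_gain(h[(1, 2)], d[(1, 2)], alpha) * x1 + z2
    # Interference: x1 and x2 are superimposed at the sink.
    y3 = (link_gain(h[(1, 3)], d[(1, 3)], alpha) * x1
          + link_gain(h[(2, 3)], d[(2, 3)], alpha) * x2 + z3)
    return y2, y3


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = [(1, 2), (1, 3), (2, 3)]
    h = {pair: proper_complex_gaussian(rng) for pair in pairs}  # Rayleigh fading draw
    d = {(1, 2): 0.25, (1, 3): 1.0, (2, 3): 0.75}               # hypothetical geometry
    y2, y3 = relay_channel_use(0.5 + 0j, 0.5j, h, d, alpha=4,
                               z2=proper_complex_gaussian(rng),
                               z3=proper_complex_gaussian(rng))
    print(y2, y3)
```

Setting the noise arguments to zero and the gains to one isolates the pure path-loss effect, which is convenient for checking a geometry by hand.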
FIG. 2. A wireless network.

We will assume that $h_{ij}$ is a realization of a complex random variable $H_{ij}$. We say that the channel exhibits Rayleigh fading if the $H_{ij}$ are statistically independent, proper [15], complex, Gaussian, zero-mean, unit-variance random variables. We further assume that $Z_2$ and $Z_3$ are independent, proper, complex, Gaussian, unit-variance random variables that are independent of $X_1$, $X_2$, and the $H_{ij}$ for all $i, j$. We remark that the model defined by (2.6) and (2.7) implicitly permits the relay to transmit and receive at the same time in the same frequency band. However, this is often not possible due to the large differences in transmit and receive energies at the antennas of wireless devices. Most practical wireless relays operate under a half-duplex constraint that one can model as (see [11] and [12])

(2.9)

Note the similarity between (2.3) and (2.9). We remark that without or with the half-duplex constraint, the wireless models we consider do not satisfy (2.5) and are hence not physically degraded.

3. Example of a code for wireline relaying. Consider the wireline network in Figure 1 with the node constraint (2.3) and $Y_3 = X_2$. Suppose for simplicity that the alphabets $\mathcal{X}_1, \mathcal{X}_2, \mathcal{Y}_2, \mathcal{Y}_3$ of the respective random variables $X_1, X_2, Y_2, Y_3$ are all the set $\{0, 1\}$. One might guess that the capacity of this network is 1/2 bit per use because the relay can either receive or transmit, but not both. Consider, however, the code tree depicted in Figure 3 that is labeled with the symbols $X_2$ that the relay transmits after having decoded an $X_1$. The idea is that after decoding $X_1 = 0$ the relay sends $X_2 = 0$, while after decoding $X_1 = 1$ the relay sends $X_{21}, X_{22} = 1, 0$. Note that every codeword in the code tree has exactly one 0, so that the source can transmit exactly one new message bit for every relay codeword. Note further that the tree is labeled so that the code is prefix-free or instantaneously decodable [7, Ch. 5]. This means that the sink can correctly parse its received sequence to extract the message bits.

FIG. 3. A code tree for the relay in Figure 1.

For example, suppose that the message $w$ has 8 bits 0, 1, 0, 0, 1, 1, 1, 0. The transmit and receive sequences are then
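$$\begin{array}{l|ccccccccccccc}
i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 \\ \hline
X_1^{13} & 0 & 1 & x & 0 & 0 & 1 & x & 1 & x & 1 & x & 0 & x \\
Y_2^{13} & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & x \\
X_2^{13} & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0
\end{array} \tag{3.1}$$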
where x denotes a "don't care" symbol. Note that, for this particular message, we have transmitted 8 bits in 13 channel uses, which gives a rate $R(w) = 8/13$ that is larger than 1/2. We would like to determine the rate of the above code when the source transmits $b$ bits for large $b$. The codewords have variable lengths, so we define a random variable $L_2$ to be the length of the codeword of any of the message bits. If the source bits are coin-tossing, we compute $E[L_2] = \frac{1}{2}(1) + \frac{1}{2}(2) = \frac{3}{2}$ and

$$R = \frac{1}{E[L_2]} = 2/3 \text{ bits per channel use} \tag{3.2}$$

implying that we can substantially increase the rate beyond 1/2 bits per use. In fact, it turns out that better compression codes such as Huffman codes or arithmetic source codes (see [7, Ch. 5]) can achieve $R = 0.773$ bits per use. Furthermore, this rate is the capacity of the network, as discussed in Section 4.2 below.
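The example above can be replayed with a short Python sketch. It assumes that the relay hears the source only in channel uses where it sends 0, which is how the example sequences behave. The first function rebuilds the source and relay sequences for a given message. The second gives a rough numerical check of the 0.773 figure under one further assumption: if a fraction $p$ of the relay symbols equals 1, the relay stream conveys at most $h(p)$ bits per use to the sink (with $h$ the binary entropy function) while the source reaches the relay in a fraction $1 - p$ of the uses, so the two limits are balanced where $h(p) = 1 - p$. All function names are illustrative.

```python
import math

# Relay codewords read off the code tree: bit 0 -> "0", bit 1 -> "1, 0".
RELAY_CODE = {0: [0], 1: [1, 0]}


def relay_schedule(bits):
    """Return the source sequence x1 and relay sequence x2 for a message.

    Assumption: the relay can listen only in uses where it sends 0.
    The string "x" marks a don't-care source symbol.
    """
    x1, x2 = [bits[0]], [0]  # use 1: the relay sends 0 and hears the first bit
    for k, bit in enumerate(bits):
        word = RELAY_CODE[bit]
        nxt = bits[k + 1] if k + 1 < len(bits) else "x"
        # The source idles while the relay sends 1s and places its next bit
        # in the use where the relay sends the terminating 0.
        x1 += ["x"] * (len(word) - 1) + [nxt]
        x2 += word
    return x1, x2


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def balanced_rate(tol=1e-12):
    """Solve h(p) = 1 - p on (0, 1/2) by bisection; return (p, 1 - p)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < 1 - mid:
            lo = mid
        else:
            hi = mid
    return lo, 1 - lo


if __name__ == "__main__":
    message = [0, 1, 0, 0, 1, 1, 1, 0]
    x1, x2 = relay_schedule(message)
    print("x1:", x1)
    print("x2:", x2)
    print("rate for this message:", len(message), "/", len(x2))
    print("E[L2] for fair bits:", 0.5 * 1 + 0.5 * 2)
    p, rate = balanced_rate()
    print("balanced p = %.4f, rate = %.4f bits per use" % (p, rate))
```

For the 8-bit message of the example, the sketch produces relay and source sequences of length 13, in line with the rate 8/13 quoted above.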
4. Capacity and coding. The capacity of a point-to-point memoryless channel (1.1) is known to be

$$C = \max_{P_X(\cdot)} I(X; Y) \tag{4.1}$$

where $I(X; Y)$ is the mutual information between the random variables $X$ and $Y$ (see [7, Ch. 2 and Ch. 8]). For the complex-alphabet AWGN model we have

$$Y = X + Z \tag{4.2}$$

where $\sum_{i=1}^{n} E[|X_i|^2]/n \le P$ and $Z$ is proper, Gaussian, unit-variance, and independent of $X$. The maximization in (4.1) is now performed over all probability density functions $p_X(\cdot)$ and the result is (see [7, Ch. 10])

$$C = \log_2(1 + P) \text{ bits per channel use} \tag{4.3}$$

where we recall that the channel has complex alphabets. Consider next the relay channel (1.3). The capacity of this channel is still not known except for special cases. However, good achievable rates and upper bounds on the capacity are known. For example, a standard cut-set upper bound on the capacity is (see [6])

$$C \le \max_{P_{X_1 X_2}(\cdot)} \min\{ I(X_1; Y_2 Y_3 | X_2),\ I(X_1 X_2; Y_3) \}. \tag{4.4}$$

We next consider random coding strategies that achieve good rates for the relay channels we are interested in.
4.1. Random codes for relay channels. Several types of relaying strategies have been developed and we list some of these below.
• Amplify-and-forward: the relay amplifies the most recent $Y_2$. More generally, the relay transmits some function of a small number of the past $Y_2$.
• Decode-and-forward: the relay decodes the source message, re-encodes it, and transmits the resulting codeword. The relay usually uses a different code book than the source. This method includes traditional multihopping where the source transmits and the relay is silent in a first block, while the source is silent and the relay transmits in a second block.
• Compress-and-forward: the relay quantizes, compresses, and channel encodes a string of $Y_2$ and transmits the resulting quantized values digitally to the sink. A more sophisticated quantization exploits the statistical dependence between $Y_2$ and $Y_3$ to reduce the compression rate (see [6, 12]).

We will consider only the decode-and-forward strategy and some of its variants. The coding procedure described next appeared in [5, 21]. Consider the AWGN channel with (2.6) and (2.7) and $d_{ij} = h_{ij} = 1$ for all $i, j$. We generate two codebooks $\mathcal{C}_1'$ and $\mathcal{C}_2$ that both have $2^{nR}$ codewords of length $n$ (we assume that $2^{nR}$ is an integer for simplicity). Every codeword $\underline{x}_1'(w)$, $w = 1, 2, \ldots, 2^{nR}$, in $\mathcal{C}_1'$ is generated by choosing each of its $n$ symbols independently using a proper, complex, Gaussian distribution with zero mean and variance $P_1'$ where $P_1' \le P_1$. The codewords $\underline{x}_2(w)$, $w = 1, 2, \ldots, 2^{nR}$, in $\mathcal{C}_2$ are generated in the same way except that the Gaussian distribution has variance $P_2$. The transmission protocol is depicted in Figure 4 and it operates as follows.
FIG. 4. A decode-and-forward strategy. The upper and lower blocks show what the source and relay transmit, respectively.

• Suppose $W$ has $nRB$ bits. We split these into $B$ equally-sized blocks of $nR$ bits $w_1, w_2, \ldots, w_B$. Set $w_0 = w_{B+1} = 1$.
• In block $b$, $b = 1, 2, \ldots, B + 1$, the source transmits

$$\underline{x}_1'(w_b) + \beta\, \underline{x}_2(w_{b-1}) \tag{4.5}$$

where $\beta = \sqrt{(P_1 - P_1')/P_2}$.
• In block $b$ the relay transmits $\underline{x}_2(w_{b-1})$.

Note that using randomly-generated codebooks with the above transmission protocol will ensure that the power constraints (2.8) can be satisfied. The decoding procedure is as follows.
• After block $b$, $b = 1, 2, \ldots, B$, the relay decodes $w_b$ by using its $b$th block of channel outputs.
• After block $b$, $b = 2, 3, \ldots, B + 1$, the sink decodes $w_{b-1}$ by using its $(b - 1)$st and $b$th block of channel outputs.

One can show, using virtually the same analysis as for deriving (4.3), that the above decode-and-forward strategy achieves the rates $R$ satisfying

$$R < \log(1 + P_1') \tag{4.6}$$

$$R < \log(1 + P_1' + (1 + \beta)^2 P_2) \tag{4.7}$$

where the first and second bounds arise due to the respective relay and sink decoding steps.
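The trade-off between the two bounds can be explored numerically. The Python sketch below evaluates the smaller of (4.6) and (4.7) for a given power split, with $\beta$ chosen so that the source uses its full power, and then searches a grid of splits. It assumes base-2 logarithms (rates in bits per use) and unit gains as in this subsection; the function names and the grid search are illustrative choices rather than part of the original analysis.

```python
import math


def df_rate(p1, p2, p1_prime):
    """Smaller of the relay bound (4.6) and the sink bound (4.7), in bits per use.

    p1_prime is the power the source spends on the fresh codeword; the rest,
    p1 - p1_prime, is spent on repeating the relay codeword, scaled by beta.
    """
    beta = math.sqrt(max(p1 - p1_prime, 0.0) / p2)
    relay_bound = math.log2(1 + p1_prime)
    sink_bound = math.log2(1 + p1_prime + (1 + beta) ** 2 * p2)
    return min(relay_bound, sink_bound)


def best_split(p1, p2, steps=1000):
    """Grid search over p1_prime in [0, p1] for the largest decode-and-forward rate."""
    grid = [p1 * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda q: df_rate(p1, p2, q))


if __name__ == "__main__":
    p1 = p2 = 1.0  # hypothetical powers
    q = best_split(p1, p2)
    beta = math.sqrt(max(p1 - q, 0.0) / p2)
    print("best p1_prime:", q)
    print("source power used:", q + beta ** 2 * p2)
    print("rate:", df_rate(p1, p2, q))
```

With unit gains the relay bound grows with the fresh-codeword power while the sink bound stays above it, so under these assumptions the search puts all source power into the fresh codeword.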
4.2. Generalizations. The above strategy generalizes in a natural way to other memoryless relay channels (1.3). Summarizing, we find that for large $B$ one can achieve any rate up to

$$R = \max_{P_{X_1 X_2}(\cdot)} \min\{ I(X_1; Y_2 | X_2),\ I(X_1 X_2; Y_3) \} \tag{4.8}$$

where the first and second mutual information expressions in (4.8) arise due to the respective relay and sink decoding steps. We remark that (4.8) is almost the same as (4.4) except that there is no $Y_3$ in the first mutual information expression in (4.8). We also remark that for physically degraded relay channels satisfying (2.5) we have that

$$X_1 - [X_2, Y_2] - Y_3$$

forms a Markov chain so that $I(X_1; Y_3 | X_2 Y_2) = 0$. This implies that

$$I(X_1; Y_2 Y_3 | X_2) = I(X_1; Y_2 | X_2) \tag{4.9}$$

by the chain rule for mutual information [7, p. 22]. Thus, the cut-set bound (4.4) is the same as the decode-and-forward rate (4.8) and is hence the capacity. This means that (4.8) is the capacity of the wireline models (2.1), (2.2), and (2.4). Furthermore, one can check that (4.8) gives $C = 0.773$ bits per use for the channel studied in Section 3.

A generalization of the achievable rate (4.8) is to relay networks. Let $\pi(\cdot)$ be a permutation on $\{1, 2, \ldots, T\}$ with $\pi(1) = 1$ and $\pi(T) = T$, and let $\pi(i : j) = \{\pi(i), \pi(i + 1), \ldots, \pi(j)\}$. One can show that one can achieve rates up to (see [12, 22, 23])

$$R = \max_{\pi(\cdot)} \min_{1 \le t \le T-1} I(X_{\pi(1:t)}; Y_{\pi(t+1)} | X_{\pi(t+1:T-1)}) \tag{4.10}$$

where one can choose any distribution on $X_1, X_2, \ldots, X_{T-1}$. Furthermore, the rate (4.10) is the capacity of physically degraded relay networks that include wireline networks (see [2, p. 69] and [12, Remark 10]). For example, consider the wireline network with four nodes shown in Figure 5. Suppose that node 1 transmits a message to node 4, and that both nodes 2 and 3 have the node constraint described by (2.3). We find that (4.10) gives the capacity of this channel.

FIG. 5. A wireline network with four nodes.

5. Wireless networks. From here on we consider relay channels defined by (2.6) and (2.7) with Rayleigh fading, i.e., the $H_{ij}$ are independent, proper, complex, Gaussian, zero-mean, unit-variance random variables. Suppose further that the source node does not know the realizations of these random variables, the relay knows $H_{12}$ only, and the sink knows $H_{13}$ and $H_{23}$ only. These restrictions on channel knowledge apply to the practical case where node $j$ can accurately estimate its channel gains $H_{ij}$ but it cannot (or wishes not to) synchronize its waveform with the other transmitters. Suppose we encode by using the strategy described in Section 4.1 that is depicted in Figure 4. One can show that it is best to choose $P_1' = P_1$ or $\beta = 0$ in (4.5). The resulting protocol is depicted in Figure 6. It has recently been shown that this strategy achieves capacity if the relay is in the vicinity of the source node, but not necessarily colocated with it [12].
FIG. 6. A decode-and-forward strategy for full-duplex relays.

FIG. 7. A decode-and-forward strategy for half-duplex relays.

5.1. Half-duplex relays. The strategies in Figures 4 and 6 apply to full-duplex relays. We next wish to consider half-duplex devices satisfying (2.9). A natural approach is to modify the protocol in Figure 6 by using the protocol depicted in Figure 7. The relay now decodes only every second message block $w_1, w_3, w_5, \ldots$. We will design codes for such a protocol. However, we remark that there are better decode-and-forward strategies that additionally modulate the timing of the relay so as to transmit extra information to the sink (see [11]). Consider, e.g., the geometry shown in Figure 8 where $d$ is a real number, $d_{13} = 1$, $d_{12} = |d|$, and $d_{23} = |1 - d|$ (the relay is to the left of the source if $d$ is negative). Suppose further that the sink has two antennas. We replace (2.7) with the $2 \times 1$ vector

$$\underline{Y}_3 = \frac{\underline{h}_{13}}{d_{13}^{\alpha/2}} X_1 + \frac{\underline{h}_{23}}{d_{23}^{\alpha/2}} X_2 + \underline{Z}_3 \tag{5.1}$$

where $\underline{h}_{13}$ and $\underline{h}_{23}$ are realizations of the $2 \times 1$ vectors $\underline{H}_{13}$ and $\underline{H}_{23}$ whose components are independent, proper, complex, Gaussian, zero-mean, unit-variance random variables (we thus have multiantenna Rayleigh fading [8, 18]). Similarly, $\underline{Z}_3$ is a $2 \times 1$ vector with independent, proper, complex, Gaussian, unit-variance entries. To make the problem more realistic, suppose we use quaternary phase-shift keying (QPSK) modulation at both the source and relay. Suppose further that $\alpha = 4$, $P_1 = P_2 = 0.25$ (which we write as $E_s/N_0 = -6$ dB), and the relay listens exactly half of the time. The rates (4.8) of the half-duplex decode-and-forward strategy of Figure 7 are shown in Figure 9 as a function of $d$. Also shown are the rates when there is no relay ($R \approx 0.54$ bits per use) and when one uses traditional multihopping (see Section 4.1) with optimized listen and transmit times for the relay. Observe that decode-and-forward can achieve substantial rate gains over both no-relay transmission and traditional multihopping. For instance,
the points marked with * in Figure 9 are located at $(d, R) = (0.25, 0.5)$ and $(d, R) = (0.25, 1)$. Decode-and-forward can thus approximately double the rate as compared to no-relay transmission at $d = 0.25$. The next section describes codes that achieve these points with low error probability.

FIG. 8. A geometry with the relay on a line going through the source and sink.

FIG. 9. Decode-and-forward rates. (Plot of rate versus $d$ for the 1x1x2 half-duplex channel with Rayleigh fading, $\alpha = 4$, QPSK, and $E_s/N_0 = -6$ dB; curves: decode-and-forward with the relay talking 1/2 of the time, and multihopping with optimum duplexing.)
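Some intuition for the gain at $d = 0.25$ comes from the average received signal-to-noise ratios on the three links. The Python sketch below computes them per receive antenna, assuming unit-variance fading, unit-variance noise, and a path loss of $d_{ij}^{-\alpha}$ in power as in the AWGN model of Section 2.2. The function names are illustrative, and the sketch is not defined at $d = 0$ or $d = 1$, where a distance vanishes.

```python
import math


def average_snr_db(power, distance, alpha):
    """Average received SNR in dB for unit-variance fading and noise."""
    return 10.0 * math.log10(power / distance ** alpha)


def link_snrs_db(d, power=0.25, alpha=4.0):
    """Average SNRs for the line geometry: d12 = |d|, d13 = 1, d23 = |1 - d|."""
    return {
        "source-relay": average_snr_db(power, abs(d), alpha),
        "source-sink": average_snr_db(power, 1.0, alpha),
        "relay-sink": average_snr_db(power, abs(1.0 - d), alpha),
    }


if __name__ == "__main__":
    for name, snr in link_snrs_db(0.25).items():
        print("%-13s %6.2f dB" % (name, snr))
```

Under these assumptions the source-to-relay link is far stronger than the direct link at $d = 0.25$, which is consistent with the relay being able to decode after hearing only part of a codeword.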
5.2. Code design. Consider the geometry of Figure 8 with $d = 0.25$. We design irregular low-density parity-check (LDPC) codes [9, 14] of length 16,000 for single-antenna, no fading, AWGN channels with binary phase-shift keying (BPSK) by using the curve-fitting procedure described in [4, Section III]. For the actual transmission, the coded bits are mapped to QPSK symbols by using the Gray mapping. The decoder uses the standard graph representation of an LDPC code with variable nodes on the left and check nodes on the right. The left and right nodes are connected by edges whose nodes are chosen with a random permutation that avoids 2-cycles. The decoder iterates 60 times between the left and right nodes by using an a posteriori probability (APP) decoder.
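One way to picture the random edge permutation is the socket construction sketched below in Python. Every variable node and check node gets as many sockets as its degree, the check sockets are shuffled, and repeated edges are repaired by random swaps. The sketch interprets a 2-cycle as two edges between the same variable node and check node, which is an assumption about the terminology; the repair rule, the function name, and the small degree profile are also illustrative and are not taken from the original design.

```python
import random
from collections import Counter


def random_edges_without_2cycles(var_degrees, chk_degrees, seed=0, max_swaps=10000):
    """Return a list of (variable, check) edges with no repeated pair.

    var_degrees[v] and chk_degrees[c] give the number of sockets per node.
    """
    if sum(var_degrees) != sum(chk_degrees):
        raise ValueError("left and right socket counts must match")
    rng = random.Random(seed)
    left = [v for v, deg in enumerate(var_degrees) for _ in range(deg)]
    right = [c for c, deg in enumerate(chk_degrees) for _ in range(deg)]
    rng.shuffle(right)  # random permutation of the check sockets
    for _ in range(max_swaps):
        counts = Counter(zip(left, right))
        repeated = [k for k in range(len(left))
                    if counts[(left[k], right[k])] > 1]
        if not repeated:
            return list(zip(left, right))
        # Repair step: move one offending check socket to a random position.
        k = rng.choice(repeated)
        j = rng.randrange(len(left))
        right[k], right[j] = right[j], right[k]
    raise RuntimeError("could not remove all 2-cycles; try another seed")


if __name__ == "__main__":
    # Hypothetical regular profile: 20 variable nodes of degree 2,
    # 10 check nodes of degree 4.
    edges = random_edges_without_2cycles([2] * 20, [4] * 10)
    print(len(edges), "edges,", len(set(edges)), "distinct")
```

Because only check sockets are exchanged, the node degrees are preserved by every swap; an irregular design would simply pass its own degree lists.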
FIG. 10. Decode-and-forward frame error rates when using LDPC codes designed for single-antenna AWGN channels. (Plot of FER versus $E_s/N_0$ [dB] for LDPC codes with $n = 16{,}000$, random edge permutation, and 60 iterations; curves: 1x2 no relay, $R = 1/2$ ($R_c = 1/4$), and 2x2 distr. D-BLAST, $d = 0.25$, $\alpha = 4$, $R = 3/4$ ($R_c = 3/8$), 1 detector activation, relay decodes early; the $R = 1/2$ and $R = 3/4$ capacity limits are marked.)

Consider first the case $R = 1/2$ without a relay. We design an LDPC code with rate $R_c = 1/4$ that has a (single-antenna, no fading, BPSK) AWGN decoding threshold of $E_b/N_0 = -0.4$ dB, which is about 0.3 dB away from the (single-antenna, no fading, BPSK) AWGN capacity. The resulting frame error rate (FER) curve is shown on the right in Figure 10. Observe that the code operates within 1.3 dB of capacity at an FER of $10^{-3}$. The extra loss (as compared to 0.3 dB for the single-antenna case) can be attributed to the relatively short code length and the fading.

Consider next the case $R = 1$ with decode-and-forward. We design an LDPC code with rate $R_c = 3/8$ that has a (single-antenna, no fading, BPSK) AWGN decoding threshold of $E_b/N_0 = 0.1$ dB, which is about 0.45 dB away from the (single-antenna, no fading, BPSK) AWGN capacity. The encoding and decoding procedure is as follows (see Figure 7).
• In the odd-numbered blocks $b = 1, 3, 5, \ldots$, the source transmits 4000 QPSK symbols (or 8,000 of the 16,000 codeword bits) by using the rate $R_c = 3/8$ LDPC code.
• After every odd-numbered block $b$, the relay decodes the information bits of the $R_c = 3/8$ code from this block. Note that the relay has received only half of this codeword's symbols.
• In the even-numbered blocks $b = 2, 4, 6, \ldots$, the source transmits using the rate $R_c = 1/4$ code described above.
• In the even-numbered blocks, the relay encodes the information bits decoded from the previous block by using the $R_c = 3/8$ encoder and transmits the last 4000 QPSK symbols of this codeword (or the last 8,000 of the 16,000 codeword bits).
• After every even-numbered block, the sink decodes the information bits of the rate $R_c = 3/8$ code. The sink performs only one detector activation per codeword (we remark that multiple detector activations would improve the performance a little [4]).
• The sink cancels the interference caused by the symbols of the $R_c = 3/8$ code from the even-numbered blocks.
• After every second even-numbered block, the sink decodes the information bits of the $R_c = 1/4$ code.

The overall rate is $R = 2(3/8) + 2(1/4)(1/2) = 1$ bit per use, where the leading factors 2 are due to the QPSK modulation. There are three decoding steps to consider in the above procedure.
• The FER of the relay decoding step is not shown in Figure 10 because it lies far to the left of the other two curves.
• The FER of the sink decoding the information bits from the $R_c = 3/8$ code is shown as the left curve in Figure 10 (labeled "2 x 2 distr. D-BLAST" because our decode-and-forward strategies are closely related to a distributed version of D-BLAST [8]).
• The FER of the sink decoding the information bits from the $R_c = 1/4$ code is the same as the case where there is no relay, and is the right curve in Figure 10.

We see that the dominating FER is in both cases (without and with a relay) due to the direct link between the source and sink. The reliability of the two schemes is therefore the same. However, the decode-and-forward scheme doubles the rate. Note also that we have used only codes designed for single-antenna channels. This might be important if one needs to use "off-the-shelf" codes such as those prescribed in standards.
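The overall rate of 1 bit per use can be confirmed by counting information bits and channel uses over a window of four blocks. The Python sketch below does this bookkeeping with exact fractions. It assumes, as the protocol description suggests, that each rate-3/8 codeword occupies one odd-numbered block (source half) plus the following even-numbered block (relay half), and that each rate-1/4 codeword is spread over two even-numbered blocks. The constant and function names are illustrative.

```python
from fractions import Fraction

CODEWORD_BITS = 16000       # LDPC code length
BITS_PER_QPSK_SYMBOL = 2
BLOCK_SYMBOLS = 4000        # channel uses per block


def overall_rate():
    """Information bits per channel use over a window of four blocks."""
    channel_uses = 4 * BLOCK_SYMBOLS
    # Blocks 1 and 3 each start a rate-3/8 codeword that the relay completes
    # in blocks 2 and 4: two codewords per window.
    info_bits_38 = 2 * Fraction(3, 8) * CODEWORD_BITS
    # The source's symbols in blocks 2 and 4 carry one rate-1/4 codeword.
    symbols_14 = 2 * BLOCK_SYMBOLS
    assert symbols_14 * BITS_PER_QPSK_SYMBOL == CODEWORD_BITS
    info_bits_14 = Fraction(1, 4) * CODEWORD_BITS
    return (info_bits_38 + info_bits_14) / channel_uses


if __name__ == "__main__":
    print("overall rate:", overall_rate(), "bit per use")
    # The closed-form expression from the text gives the same value.
    print("formula:", 2 * Fraction(3, 8) + 2 * Fraction(1, 4) * Fraction(1, 2))
```

The same counting shows the split between the two codes: three quarters of the rate comes from the relayed rate-3/8 code and one quarter from the directly decoded rate-1/4 code.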
6. Concluding remarks. We have developed codes for wireline and wireless relay channels. For the wireless case, we have chosen to study fast-fading channels and have designed distributed iteratively-decodable codes that approach certain information rates. We remark that it is also interesting to study slow-fading channels. This type of problem has been addressed in the context of CDMA [16], distributed space-time coding [1, 3, 13], and distributed convolutional and turbo coding [10, 17, 19, 24].
REFERENCES

[1] P.A. ANGHEL, G. LEUS, AND M. KAVEH, "Distributed space-time cooperative systems with regenerative relays," IEEE Trans. Wireless Commun., submitted.
[2] M.R. AREF, Information Flow in Relay Networks. Ph.D. thesis, Stanford Univ., Stanford, CA, Oct. 1980.
[3] S. BARBAROSSA, L. PESCOSOLIDO, D. LUDOVICI, L. BARBETTA, AND G. SCUTARI, "Cooperative wireless networks based on distributed space-time coding," Proc. Int. Workshop on Wireless Ad-hoc Networks (IWWAN), June 2004.
[4] S. TEN BRINK, G. KRAMER, AND A. ASHIKHMIN, "Design of low-density parity-check codes for modulation and detection," IEEE Trans. Commun., 52(5): 670-678, April 2004.
[5] A.B. CARLEIAL, "Multiple-access channels with different generalized feedback signals," IEEE Trans. Inf. Theory, 28(6): 841-850, Nov. 1982.
[6] T.M. COVER AND A.A. EL GAMAL, "Capacity theorems for the relay channel," IEEE Trans. Inf. Theory, 25: 572-584, Sept. 1979.
[7] T.M. COVER AND J.A. THOMAS, Elements of Information Theory. New York: Wiley, 1991.
[8] G.J. FOSCHINI, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs. Tech. J., 1(2): 41-59, 1996.
[9] R.G. GALLAGER, "Low-density parity-check codes," IEEE Trans. Inf. Theory, 8: 21-28, Jan. 1962.
[10] T.E. HUNTER, S. SANAYEI, AND A. NOSRATINIA, "The outage behavior of coded cooperation," Proc. IEEE Int. Symp. Inf. Theory, Adelaide, Australia, June-July 2004, p. 270.
[11] G. KRAMER, "Models and theory for relay channels with receive constraints," Proc. 42nd Annual Allerton Conf. on Commun., Control, and Computing, Monticello, IL, Sept. 29-Oct. 1, 2004, pp. 1312-1321.
[12] G. KRAMER, M. GASTPAR, AND P. GUPTA, "Cooperative strategies and capacity theorems for relay networks," IEEE Trans. Inf. Theory, 51(9): 3037-3063, Sept. 2005.
[13] J.N. LANEMAN AND G.W. WORNELL, "Distributed space-time-coded protocols for exploiting cooperative diversity in wireless networks," IEEE Trans. Inf. Theory, 49(10): 2415-2425, Oct. 2003.
[14] M.G. LUBY, M. MITZENMACHER, M.A. SHOKROLLAHI, AND D.A. SPIELMAN, "Efficient erasure correcting codes," IEEE Trans. Inf. Theory, 47(2): 569-584, Feb. 2001.
[15] F.D. NEESER AND J.L. MASSEY, "Proper complex random processes with applications to information theory," IEEE Trans. Inf. Theory, 39(4): 1293-1302, July 1993.
[16] A. SENDONARIS, E. ERKIP, AND B. AAZHANG, "User cooperation diversity-Part II: Implementation aspects and performance analysis," IEEE Trans. Commun., 51(11): 1939-1948, Nov. 2003.
[17] A. STEFANOV AND E. ERKIP, "Cooperative coding for wireless networks," IEEE Trans. Commun., 52(9): 1470-1476, Sept. 2004.
[18] I.E. TELATAR, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecom., 10: 585-595, Nov. 1999.
[19] M.C. VALENTI AND B. ZHAO, "Distributed turbo codes: towards the capacity of the relay channel," Proc. IEEE Vehicular Tech. Conf. (VTC), Orlando, FL, Oct. 2003, pp. 322-326.
[20] E.C. VAN DER MEULEN, Transmission of Information in a T-Terminal Discrete Memoryless Channel. Ph.D. thesis, Univ. of California, Berkeley, CA, June 1968.
[21] F.M.J. WILLEMS, Informationtheoretical Results for the Discrete Memoryless Multiple Access Channel. Doctor in de Wetenschappen Proefschrift, Katholieke Universiteit Leuven, Leuven, Belgium, Oct. 1982.
[22] L.-L. XIE AND P.R. KUMAR, "A network information theory for wireless communication: scaling laws and optimal operation," IEEE Trans. Inf. Theory, 50(5): 748-767, May 2004.
[23] L.-L. XIE AND P.R. KUMAR, "An achievable rate for the multiple level relay channel," IEEE Trans. Inf. Theory, 51(4), April 2005.
[24] B. ZHAO AND M.C. VALENTI, "Distributed turbo coded diversity for relay channel," Electron. Lett., 39(10): 786-787, May 2003.
SCHEDULING AND CONTROL OF MULTI-NODE MOBILE COMMUNICATIONS SYSTEMS WITH RANDOMLY-VARYING CHANNELS BY STABILITY METHODS

HAROLD J. KUSHNER*

Abstract. We consider a communications network consisting of many mobiles. There are random external data processes arriving at some of the mobiles, each destined for a unique destination or set of destinations. Each mobile can serve as a node in the possibly multi-hop (and not necessarily unique) path from source to destination. At each mobile the data is queued according to the source-destination pair. Time is divided into small scheduling intervals. The capacities of the connecting channels are randomly varying. The system resources such as transmission power and/or time, bandwidth, and perhaps antennas, must be allocated to the various queues in a queue and channel-state dependent way to assure stability and good operation. Lost packets might or might not have to be retransmitted. At the beginning of the intervals, the channels are estimated via pilot signals and this information is used for the scheduling decisions, which are made at the beginning of the intervals. Stochastic stability methods are used to develop scheduling policies. The resulting controls are readily implementable and allow a range of tradeoffs between current rates and queue lengths, under very weak conditions. The basic methods are an extension of recent works for a system with one transmitter that communicates with many mobiles. The choice of Liapunov function allows a choice of the effective performance criteria. All essential factors are incorporated into a "mean rate" function, so that the results cover many different systems. Because of the non-Markovian nature of the problem, we use the perturbed stochastic Liapunov function method, which is designed for such problems. Various extensions (such as the requirement of acknowledgments) are given, as well as a useful method for getting the a priori routes.

Key words. Scheduling in stochastic networks, randomly varying link capacities, mobile networks, stochastic stability, stability of networks with randomly varying links, routing in ad-hoc networks, perturbed stochastic Liapunov functions.

AMS(MOS) subject classifications. 49Q05, 49K40, 60K25, 90B15, 93D09, 93E15.
1. Introduction. The paper considers the problem of scheduling in a network of M mobiles (to be referred to as nodes) with time-varying link capacities. There are many (S) external sources with bursty data processes, each sending its data to its unique origin node, to be sent through the network to a unique (except for the multicasting case) destination node. At each mobile, the data is queued until transmitted, in an infinite buffer depending on the source-destination pair. Some mobiles serve as intermediaries in the possibly multi-hop connections between sources and destinations. The routes between source and destination need not be unique. We are concerned with the efficient and stabilizing allocation of the system resources, say, transmission power, time and bandwidth, to the various queues at each mobile in a queue and channel-state dependent way. Time is divided into small scheduling intervals. The capacities of the connecting channels in each interval form a correlated random process. At the beginning of the intervals, the capacities (or surrogates such as the S/N ratios) are estimated where possible via pilot signals, and this information is used for the scheduling during that interval. The resource allocation decisions are made at the beginning of the intervals.

Owing to the random nature of the arrival and channel processes, the computation or even the existence of stabilizing policies is not at all obvious. The approach is a network extension of the development for the one-node case in [4]. The channel processes are usually non-Markovian.$^1$ Even if they and the arrival processes were Markovian, it would be extremely difficult to use classical stability methods, but the versatile perturbed Liapunov function method [4, 7] can be used to obtain stabilizing scheduling policies. Let $X$ denote the vector of all queue values at all of the nodes (all data quantities are measured in packets). With the perturbed Liapunov function method one starts with a basic Liapunov function $V(X)$ that works for an approximating "mean flow" system where the randomness has been averaged out in a particular "controlled" way. Then one gets a perturbation $\delta V(n)$ so that $V(X(n)) + \delta V(n)$ can be used as a Liapunov function for the true non-Markov physical system. Analogously to the "stability" method for selecting controls, the controls are determined by "approximately" minimizing a conditional expectation of the rate of change of the basic Liapunov function along the random path. The formulas are simple and the algorithm is readily implemented.

*Applied Mathematics Department, Brown University, Providence, RI 02912, USA (hjk@dam.brown.edu). This work was partially supported by NSF grant DMS-0506928 and ARO contract W911NF-05-10928.
HAROLD J. KUSHNER
For simplicity, we use a basic Liapunov function that is a polynomial which is the sum of terms, each depending on a single component of the state of the queue. This seems to be adequate for current needs, but a large family of strictly convex separable functions can also be used. The end result is that, if a certain "mean flow" is stabilizable, then so is the physical system under our scheduling rule. This stabilizability condition can often be readily verified, and appears to be very close to a necessary condition. Some useful extensions are discussed in Section 4. There we give the modest changes that are required when a packet can be lost and the receipts on each individual link must be acknowledged. The multicasting case is briefly outlined and there is a discussion of a simple model where the number of sources can vary in time. Various extensions are implicitly included in the basic formulation, for example, channel breakdowns, priority users, and random connectivity.

The (n + 1)st scheduling interval will be called the nth slot. The argument (n) denotes the beginning of the nth slot, and is referred to as "time n." Let $X_{i,k}(n)$ denote the queue size at time n at node k of data coming from source i (defined to be zero if node k is not on the path for source i). Define the vectors $X_k(n) = \{X_{i,k}(n), i \le S\}$ and $X(n) = \{X_k(n), k \le M\}$, with canonical values $X_k$ and $X$, resp. With given weights$^2$ $w_{i,k}$, the basic Liapunov function will be

$$V(X) = \sum_{i,k} w_{i,k} X_{i,k}^p, \qquad p \ge 2. \tag{1.1}$$

$^1$ E.g., Rayleigh fading.
MULTI-NODE MOBILE COMMUNICATIONS SYSTEMS
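As a small illustration of (1.1), the following sketch evaluates the basic Liapunov function for a given queue state. The nested-list layout (`X[i][k]`, `w[i][k]`) and the function name `liapunov` are assumptions made for this example only; they are not part of the paper's notation.

```python
def liapunov(X, w, p=2):
    """Evaluate V(X) = sum_{i,k} w[i][k] * X[i][k]**p, as in (1.1).

    X[i][k]: queue of source i at node k (hypothetical layout).
    w[i][k]: the corresponding positive weight.
    """
    if p < 2:
        raise ValueError("(1.1) requires p >= 2")
    return sum(w_ik * x_ik ** p
               for w_row, x_row in zip(w, X)
               for w_ik, x_ik in zip(w_row, x_row))


X = [[3, 0], [1, 2]]          # two sources, two nodes
w = [[1.0, 1.0], [2.0, 0.5]]
print(liapunov(X, w, p=2))    # 9 + 0 + 2 + 2 = 13.0
print(liapunov(X, w, p=3))    # 27 + 0 + 2 + 4 = 33.0
```

Larger p penalizes the longest queues more heavily, which is one way to see how the choice of V shifts the tradeoff between queue lengths and current rates.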
A stability analysis should assure robustness of behavior to small changes in the process dynamics; hence it is preferable to use methods that do not require the Markov property. The perturbed Liapunov function method does not require Markovianness. In applications, there are often many criteria that are of interest, e.g., mean delay and variance of delay. One should experiment with the form of the Liapunov function and examine the effects of the associated scheduling rules in order to get insight into the tradeoffs between competing criteria. Such an experimental procedure would give more insight and better rules than those obtained with a single fixed rule. The wide choice of functions $V(X)$ facilitates such experimentation.

There is much work on scheduling in the presence of various types of channel and data process randomness. But very little is available on scheduling for the general network case when the channels are randomly varying in a non-trivial way. For the one-node case where the rate of transmission is proportional to power, [1, 9] obtain rules for power allocation whose form is similar to ours when $p = 2$ (such rules are called "max weight" there), and which are based on stability considerations. The method uses large deviations estimates and the setup is Markovian. See also [4]. The reference [11], perhaps the first to deal with random channels in a network, allocates power. Since their channel-rate and data-arrival processes are all i.i.d. sequences (this assumption is required by their method), the possible applications are very limited. The papers [2, 3] deal with related problems, again essentially for one-node systems. There is a set of parallel processors, and the connectivities between the sources and the processors (but not the outgoing channels) vary randomly. They prove results concerning the limit (as $t \to \infty$) of (queue length at t)/t, and give conditions under which this limit is zero.
This is used to show that the integral of the "rates" of transmission per unit time converges. They allocate a single resource (e.g., bandwidth) and the rate is proportional to the allocation. Our proof is easily adapted to that problem, with the definition of stability to be used here. The work [10], for a one-node model, has a Markovian channel-state process, the data input sequence is i.i.d., and a "complete resource pooling" condition is required. The decision rule is the same as ours for a quadratic Liapunov function. The emphasis is on stability and simplification of the model in the heavy traffic limit. The paper [6] treats the same subject as this paper, but the routes are restricted to be unique, and the set of extensions is different. In [6], when acknowledgments are required, they are sent to the origin node; here (Section 4), transmission on each link must be acknowledged. The developments differ in the type of Liapunov function perturbations that are used. See also [5] for a stability analysis as the heavy traffic regime is approached.

$^2$ We could use $V(X) = \sum_{i,k} w_{i,k}[X_{i,k} + h_{i,k}]^{p_{i,k}}$, where $p_{i,k} \ge 2$, $h_{i,k} \ge 0$, or $V(X) = \sum_{i,k} V_{i,k}(X_{i,k})$, where the $V_{i,k}(\cdot)$ are strictly convex non-negative functions, whose first derivative $DV_{i,k}(X_{i,k})$ is $O(V_{i,k}(X_{i,k}))$ and second derivative is $O(DV_{i,k}(X_{i,k}))$. One can choose the function, for example, to model upper bound constraints on some queues. The choice of the functions and powers allows a variety of tradeoffs between queue size and throughput. We use (1.1) since the notation is simpler. But the development is parallel for the other cases, and the same conditions would be used.
2. The problem formulation. Definitions. Let k denote a canonical node and let (i, k) denote the queue of source i data at node k. Since the routing is not necessarily unique, queue (i, k) might have possible forward links to any number of other nodes. Let $\{f(i,k,\alpha), \alpha\}$ denote the possible next nodes for queue (i, k). These are indexed by the parameter $\alpha$, whose value ranges over a set that depends on i, k. This set will not be specified, but all summations over $\alpha$, for fixed i, k, are assumed to be over this set. Similarly, queue (i, k) might receive data from any number of other nodes. Let $\{b(i,k,\beta), \beta\}$ denote the possible nodes from which (i, k) can receive data, indexed by the parameter $\beta$, whose value ranges over a set that depends on i, k. This set will not be specified, but all summations over $\beta$, for fixed i, k, are assumed to be over this set. If no route for source i uses node k, then queue (i, k) does not exist, and we ignore $X_{i,k}$, $f(i,k,\alpha)$ and $b(i,k,\beta)$. If the routing from (i, k) is unique, then $\alpha$ takes only one value. If k is the origin node for source i, then terms involving $b(i,k,\beta)$ are ignored, as are terms involving $f(i,k,\alpha)$ if node k is the terminal node for source i.

Let $L_k(n)$ denote the (vector) set of channel states at node k, at time n. It is a vector consisting of the states of all of the possible links $\{(i, f(i,k,\alpha)); i, \alpha\}$ that are outgoing from node k. $L_k(n)$ could be just the set of S/N ratios at the receivers corresponding to unit transmitted power, or it might be some other indicator of the link capacities. It is notationally convenient to work with the vector $L_k(n)$, rather than with the individual links, since the decisions at each node k depend on the states of all of the possible outgoing links. $L_k(n)$ might denote other quantities in addition to the channel quality. For example, there might be power constraints that vary randomly due to interference from exogenous sources. These could be included in the $L_k(n)$.
If some link at node k is unavailable at time n, then that fact could also be included in $L_k(n)$. For notational simplicity, we suppose that the channel state vector takes only finitely many values for each node k. The (vector-valued) symbol j is used for the canonical value of $L_k(n)$, for any k, n. The range of j will depend on the node k, but will be suppressed in the notation. Let $d_{i,k,\alpha}(n)$ denote the number of packets sent from queue i of node k to queue i of node $f(i,k,\alpha)$ at time n. It will depend on the channel state and the allocated resources (e.g., power, frequency, bandwidth). It is always zero if node k is not on any path for source i. Let $a_{i,k}(n)$ denote the actual random number of arrivals in slot n from the exterior, if any, from source i at node k. These will be non-zero only for the unique node k(i) at which source i enters the network. Let $\mathcal{F}_n$ denote the minimal $\sigma$-algebra that measures all of the system's data up until time n as well as the channel states $\{L_k(n), k\}$ in slot n. These channel states are assumed to be available at time n. Let $E_n$ denote the expectation conditioned on $\mathcal{F}_n$. We say that the packets sent in slot n are sent at time n, when the scheduling decisions are made.
Stability: Definition. An appropriate definition of stability is a "uniform mean recurrence time" property. Suppose that there are $0 < q_0 < \infty$ and a real-valued function $F(\cdot) \ge 0$ such that the following holds: For any n and the random time $\sigma_1 = \min\{k \ge n : |X(k)| \le q_0\}$, we have$^3$

$$E_n[\sigma_1 - n] \le F(X(n)). \tag{2.1}$$

Then the system is said to be stable. If $|X(n)|$ reaches a level $q_1 > q_0$, then the conditional expectation of the time required to return to a value $q_0$ or less is bounded by a function of $q_1$, uniformly in the past history and in n.$^4$ Note that the right side of (2.1) depends only on $X(n)$, and nothing else, even though there is a conditional expectation $E_n$ on the left side, and the channel and arrival processes are random and correlated.
$^3$ $\sigma_1 = \infty$, unless otherwise defined.
$^4$ This implies that the sequence $\{X(n)\}$ is tight or bounded in probability (see, for example, [7, Theorem 2, Chapter 6]).

The decision rule. The number of packets, $d_{i,k,\alpha}(n)$, transmitted from queue (i, k) in slot n to node $f(i,k,\alpha)$ depends on the allocated resources, such as power, bandwidth, or time. Such resources are subject to constraints, either locally (at each node) or globally (for the entire network). The constraints might be just bounds on the total resources available at a node or on the number of packets that can be sent in a slot, in which case the determination of the $d_{i,k,\alpha}(n)$ for all $i, \alpha$ can all be made at node k. If the constraints involve more than one node, then making the assignments requires coordination among the nodes. In classical control theory, stability ideas are often used to obtain controls that assure a stable system. Typically, one chooses a Liapunov function and then selects the control that minimizes its "rate of change" on the path. The idea is similar in our case. We will choose the $d_{i,k,\alpha}(n)$ that minimize an approximation to $E_nV(X(n+1)) - V(X(n))$. To motivate what will be done, let us start with the evaluation

$$\begin{aligned}
w_{i,k}\big[E_nX^p_{i,k}(n+1) - X^p_{i,k}(n)\big]
&= w_{i,k}X^{p-1}_{i,k}(n)\Big[-\sum_{\alpha} d_{i,k,\alpha}(n) + E_na_{i,k}(n) + \sum_{\beta} d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + \text{terms of order } (p-2) \text{ in } X_{i,k}(n).
\end{aligned}$$
Note that $d_{i,b(i,k,\beta),k}(n)$ is the number of packets sent from queue $(i, b(i,k,\beta))$ to queue (i, k) at time n. Hence the last sum is the total number of packets arriving at node k at time n from all nodes. The sum over i, k of the terms in the second line that do not involve the $a_{i,k}$ can be written as

$$-\sum_{i,k,\alpha}\Big[w_{i,k}X^{p-1}_{i,k}(n) - w_{i,f(i,k,\alpha)}X^{p-1}_{i,f(i,k,\alpha)}(n)\Big]d_{i,k,\alpha}(n). \tag{2.2}$$
This can be written as
The lower order terms in $X_{i,k}(n)$ are nonlinear functions of the $d_{i,k,\alpha}(n)$ and higher conditional moments of the $a_{i,k}(n)$, and would be very hard to deal with. It turns out, as in [4], that it is enough to base the decisions on (2.2) or (2.3). If the decisions at node k need not be coordinated with those at any other node, then the decision is a maximizer in

$$\max_{\{d_{i,k,\alpha}(n);\, i,\alpha\}} \sum_{i,\alpha}\Big[w_{i,k}X^{p-1}_{i,k}(n) - w_{i,f(i,k,\alpha)}X^{p-1}_{i,f(i,k,\alpha)}(n)\Big]d_{i,k,\alpha}(n), \tag{2.4}$$

subject to the local constraints. If there are constraints that involve the decisions at a set of nodes, then the decisions for such a set must be made together, and the decision rule is a maximizer in

$$\max_{\{d_{i,k,\alpha}(n);\, i,k,\alpha\}} \sum_{i,k,\alpha}\Big[w_{i,k}X^{p-1}_{i,k}(n) - w_{i,f(i,k,\alpha)}X^{p-1}_{i,f(i,k,\alpha)}(n)\Big]d_{i,k,\alpha}(n), \tag{2.5}$$
or, equivalently, in
subject to the constraints. If $\{X(n)\}$ is not a Markov process, then $V(X)$ cannot be a Liapunov function for the system. However, as shown in the next section, the perturbed Liapunov function method [4, 7, 8] can be used to prove that the maximizing rules (2.4), (2.5), or (2.6) assure stability under reasonable conditions.

Let $u_{i,k,\alpha}(j, X)$ denote the control function at queue i at node k for the transmission to node $f(i,k,\alpha)$. The $u_{i,k,\alpha}(j, X)$ represents the resources (power, time, bandwidth, etc.) that are allocated at queue (i, k) to the link to node $f(i,k,\alpha)$. Also, unless otherwise noted, its dependence on the queues is only on $X_k$ and the required queue values at the immediate upstream nodes, namely the $X_{i,f(i,k,\alpha)}$ for all $i, \alpha$. If no route for source i uses node k, then ignore $u_{i,k,\alpha}(j, X)$. The amount of data that is sent from queue (i, k) to queue i at node $f(i,k,\alpha)$ is determined by the allocated resources $u_{i,k,\alpha}(j, X)$ and the current channel state at node k. Let the function $g_{i,k,\alpha}(j, X_{i,k}, u_{i,k,\alpha}(j, X))$ denote the actual amount of data that is sent under current channel state j and control $u_{i,k,\alpha}(\cdot)$. This defines $d_{i,k,\alpha}(n)$; i.e., the channel rate for queue (i, k) on the link to node $f(i,k,\alpha)$, associated with channel state j and control $u_{i,k,\alpha}(j, X(n))$, is $d_{i,k,\alpha}(n) = g_{i,k,\alpha}(j, X_{i,k}(n), u_{i,k,\alpha}(j, X(n)))$. The $X_{i,k}$ appears as an argument of $g_{i,k,\alpha}(\cdot)$ only because the amount sent cannot be larger than the queue content.
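To make the local rule (2.4) concrete, here is a minimal sketch for one node under a deliberately simple constraint set. The constraint model is an assumption for illustration, not the paper's: each outgoing link has a per-slot packet cap `cap` (standing in for the effect of the current channel state), a queue cannot send more than it holds, and the node has a hypothetical total per-slot `budget`. For such nested caps a greedy allocation in decreasing order of the bracketed coefficient in (2.4) attains the maximum. The names `schedule_node`, `links`, `cap` and `budget` are all made up for this example.

```python
def schedule_node(k, X, w, links, cap, budget, p=2):
    """Greedy maximizer of the linear objective in (2.4) at node k.

    X[(i, node)], w[(i, node)]: queue size and weight of source i at a node.
    links: list of (i, f) pairs, f being a forward node f(i, k, alpha).
    cap[(i, f)]: packets the link can carry this slot (assumed given).
    budget: assumed cap on the total number of packets node k may send.
    """
    def pressure(i, node):
        return w[(i, node)] * X[(i, node)] ** (p - 1)

    # Coefficient of d_{i,k,alpha}(n) in (2.4): own pressure minus downstream pressure.
    coeff = {(i, f): pressure(i, k) - pressure(i, f) for (i, f) in links}
    d = {link: 0 for link in links}
    left = {i: X[(i, k)] for (i, _f) in links}   # packets still available per source
    for link in sorted(links, key=lambda lk: coeff[lk], reverse=True):
        if coeff[link] <= 0 or budget <= 0:
            break                                # sending more cannot raise the objective
        i = link[0]
        amount = min(cap[link], left[i], budget)
        d[link] = amount
        left[i] -= amount
        budget -= amount
    return d


X = {(0, 0): 5, (0, 1): 1, (0, 2): 4, (1, 0): 3, (1, 1): 0}
w = {key: 1.0 for key in X}
links = [(0, 1), (0, 2), (1, 1)]                 # source 0 may go to node 1 or 2
cap = {(0, 1): 2, (0, 2): 5, (1, 1): 2}
d = schedule_node(0, X, w, links, cap, budget=5)
print(d)   # {(0, 1): 2, (0, 2): 1, (1, 1): 2}
```

With p = 2 and equal weights the coefficients are plain queue differences, which is the "max weight" form mentioned in the introduction. Coupled constraints across nodes, as in (2.5), would need a joint optimization rather than this per-node greedy pass.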
Assumptions. The following assumptions are network analogs of those used in [4] and will be commented on further below.

A2.1. There are constraint sets $U_k$ such that $\{u_{i,k,\alpha}(j, X); i, \alpha\} \in U_k$. It is always assumed that the maximizing constrained $d_{i,k,\alpha}(n)$ in (2.4), (2.5), or (2.6) exist and are Borel functions of the $\{X(n), L_k(n); i, k\}$.

A2.2. There is a constant $K_1$ such that $E_n|a_{i,k}(n)|^p \le K_1$. There are $\bar\lambda^a_{i,k}$ such that the sums

$$\delta^{a,\nu}_{i,k}(n) = \sum_{l=n}^{\nu}\big[E_na_{i,k}(l) - \bar\lambda^a_{i,k}\big]$$

converge as $\nu \to \infty$, uniformly in $n, \omega$.

It follows from the definitions that $\bar\lambda^a_{i,k} = 0$ if node k is not the source node k(i) for source i. For future use, write $\bar\lambda^a_i = \bar\lambda^a_{i,k(i)}$, the mean input rate for source i (measured in packets per slot).

A2.3. For each node k there are $\Pi_{k,j} \ge 0$ such that $\sum_j \Pi_{k,j} = 1$ and $\sum_{l=n}^{\nu}\big[E_nI_{\{L_k(l)=j\}} - \Pi_{k,j}\big]$ converges as $\nu \to \infty$, uniformly in $n, \omega$.
A2.4. Define $K_0 = \max_{i,k,\alpha,j,u,X}\big[g_{i,k,\alpha}(j, X_{i,k}, u_{i,k,\alpha}(j, X))\big]$. There is a resource allocation $\{\bar u_{i,k,\alpha}(\cdot); i, k, \alpha\}$ such that the following holds under it. There are non-negative real numbers $\{\bar q^j_{i,k,\alpha}; i, k, j, \alpha\}$ such that $\bar q^j_{i,k,\alpha} = g_{i,k,\alpha}(j, X_{i,k}(n), \bar u_{i,k,\alpha}(j, X(n)))$ if $X_{i,k}(n) \ge K_0$.$^5$ Also, if $X_{i,k}(n) < K_0$, then $g_{i,k,\alpha}(j, X_{i,k}(n), \bar u_{i,k,\alpha}(j, X(n))) \le \bar q^j_{i,k,\alpha}$. The $\bar q^j_{i,k,\alpha}$ satisfy

$$\sum_{\beta,j}\bar q^j_{i,b(i,k,\beta),k}\,\Pi_{b(i,k,\beta),j} < \bar q_{i,k} \equiv \sum_{j,\alpha}\bar q^j_{i,k,\alpha}\,\Pi_{k,j}, \qquad \text{each } i,\ k \ne k(i), \tag{2.7}$$

and, for $k = k(i)$, $\bar\lambda^a_i < \bar q_{i,k}$.
Comments on the assumptions. (A2.1) simply states that there are constraints on the resources and allocations. (A2.2) and (A2.3) are mixing conditions on the data arrival and channel processes, resp., and do not appear to be restrictive. If the arrivals occur in batches, with the batches and intervals being mutually independent and each iid, then the sum in (A2.2) is just a constant times the residual time to the first arrival. See [4] for more discussion of this point. Both (A2.2) and (A2.3) say that the expectation of the future values of the random variables given the data in the remote past converges to the average in a "summable" way as the difference between the times goes to infinity. (A2.3) holds for the received signal power associated with Rayleigh fading. (A2.4) basically requires that there are controls under which the mean service rate at queue (i, k), for any i that uses node k, is slightly greater than the mean data arrival rate $\bar\lambda^a_i$, if the queues remain large, for all (i, k). Similar conditions occur frequently in studies of stability in stochastic networks.
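The summability in (A2.3) can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, that the channel state at one node is a two-state ergodic Markov chain with transition matrix `P` (the paper does not require the Markov property). Then $E_nI_{\{L_k(l)=j\}}$ given $L_k(n) = s$ is the $(s, j)$ entry of $P^{l-n}$, and the partial sums in (A2.3) converge geometrically. The function name `mixing_sums` and the numbers are hypothetical.

```python
def mixing_sums(P, pi, horizon):
    """Partial sums S[s][j] = sum_{m=0}^{horizon} (P^m[s][j] - pi[j]).

    S[s][j] plays the role of the sum in (A2.3) when L_k(n) = s.
    """
    n = len(P)
    Pm = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]  # P^0
    S = [[0.0] * n for _ in range(n)]
    for _ in range(horizon + 1):
        for s in range(n):
            for j in range(n):
                S[s][j] += Pm[s][j] - pi[j]
        # advance to the next power of P
        Pm = [[sum(Pm[s][m] * P[m][j] for m in range(n)) for j in range(n)]
              for s in range(n)]
    return S


P = [[0.9, 0.1], [0.2, 0.8]]     # assumed channel transition probabilities
pi = [2 / 3, 1 / 3]              # stationary distribution of P
S50 = mixing_sums(P, pi, 50)
S200 = mixing_sums(P, pi, 200)
print(S50[0][0], S200[0][0])     # both close to (1 - 2/3) / 0.3
```

For this chain the second eigenvalue is 0.7, so the terms decay like $0.7^m$ and the limit of the sum started in state 0 for $j = 0$ is $(1 - 2/3)/0.3$.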
A variation of (A2.2) and (A2.3). The convergence of the sum in (A2.2) can be replaced by the condition that $E_na_{i,k}(l) - \bar\lambda^a_{i,k} \to 0$ uniformly in $n, \omega$ as $l - n \to \infty$. Then the perturbation (3.1) would be replaced by

$$w_{i,k}X^{p-1}_{i,k}(n)\sum_{l=n}^{m+n}E_n\big[a_{i,k}(l) - \bar\lambda^a_{i,k}\big]$$

for large enough m. The error terms in the proof are slightly different, but the method is the same. Analogous remarks hold for (A2.3) and the perturbations (3.2).
An equivalent form of (A2.4). Abusing terminology slightly, for $k \ne k(i)$, define $\bar q_{i,b(i,k)} = \sum_{\beta,j}\bar q^j_{i,b(i,k,\beta),k}\,\Pi_{b(i,k,\beta),j}$, the average (over the channel variations) flow into (i, k) under the rates $\{\bar q^j_{i,k,\alpha}; i, k, j, \alpha\}$. Then it is implied by (A2.4) that the $\bar q^j_{i,k,\alpha}$ can be taken to satisfy

$$\sum_{j,\alpha}\bar q^j_{i,k(i),\alpha}\,\Pi_{k(i),j} > \bar\lambda^a_i, \tag{2.8a}$$

and that there is $c_0 > 0$ such that for $k \ne k(i)$,

$$\text{average into } (i,k) - \text{average out of } (i,k) = \bar q_{i,b(i,k)} - \bar q_{i,k} \le -c_0 < 0. \tag{2.8b}$$

$^5$ The lower bound $K_0$ is introduced in (A2.4) only because if the queue content is smaller than the maximum of what can be transmitted on a scheduling interval, then the mean (weighted with the $\Pi_{k,j}$) output might be too small to assure (2.7). For example, if a queue is empty, then there are no departures.
Section 5 gives a useful method for getting both the routing and the $\bar q^j_{i,k,\alpha}$.
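The mean-flow condition (2.7), together with the origin-node requirement, is easy to test for candidate rates. The sketch below assumes a single source with a unique route (so the index $\alpha$ drops out), with the nodes listed from origin to last hop; `q[h][j]` stands for the rate $\bar q^j$ at the h-th node in channel state j and `Pi[h][j]` for the corresponding $\Pi_{k,j}$. The function names and the numbers are hypothetical.

```python
def mean_rates(q, Pi):
    """bar q at each node of the route: sum_j q[h][j] * Pi[h][j]."""
    return [sum(q_j * p_j for q_j, p_j in zip(q_h, Pi_h))
            for q_h, Pi_h in zip(q, Pi)]


def satisfies_mean_flow(lam, q, Pi):
    """Check lam < bar q at the origin and inflow < outflow at later nodes (2.7)."""
    qbar = mean_rates(q, Pi)
    if not lam < qbar[0]:
        return False
    # At node h + 1 the mean inflow is the mean outflow of node h.
    return all(qbar[h] < qbar[h + 1] for h in range(len(qbar) - 1))


q = [[4, 1], [5, 2]]              # rates in channel states 0 and 1 at two nodes
Pi = [[0.5, 0.5], [0.6, 0.4]]     # assumed stationary channel-state probabilities
print(mean_rates(q, Pi))                 # approximately [2.5, 3.8]
print(satisfies_mean_flow(2.0, q, Pi))   # True
print(satisfies_mean_flow(2.6, q, Pi))   # False: arrival rate exceeds origin service
```

Non-unique routing would need the sums over $\alpha$ and $\beta$ from (2.7) in place of the simple chain used here.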
Example. Let the control be over bandwidth, with the rate proportional to bandwidth. Let the bandwidth allocated to (i, k) for transmission to node $f(i,k,\alpha)$ be denoted by $B^j_{i,k,\alpha}$, let the constants of proportionality be $c^j_{i,k,\alpha}$, and define the rate $\bar q^j_{i,k,\alpha} = c^j_{i,k,\alpha}B^j_{i,k,\alpha}$. There are the total bandwidth constraints $\sum_{i,\alpha}B^j_{i,k,\alpha} \le B_k$, for each j, k. Suppose that the set of inequalities $\sum_{j,\alpha}\bar q^j_{i,k,\alpha}\,\Pi_{k,j} > \bar\lambda^a_i$, all k, has a solution. Then the corresponding $\bar q^j_{i,k,\alpha}$ satisfy (A2.4).
3. The stability theorem and Liapunov function perturbations. Suppose that, for a random process $\{x(n)\}$, we have $E_nV(x(n+1)) - V(x(n)) = c_n$, where $\{c_n\}$ is a random sequence that is "mixing" in the following sense. There is a constant $c < 0$ such that $E_n[c_{n+m} - c] \to 0$ fast enough as $m \to \infty$ for the sum $\delta V_n = \sum_{i=n}^{\infty}E_n[c_i - c]$ to converge (and be bounded) uniformly in n, where $E_n$ now denotes the expectation conditioned on $\{c_l, l \le n\}$. Define $V_n = V(x(n)) + \delta V_n$. Then $E_n\delta V_{n+1} - \delta V_n = -(c_n - c)$ and $E_nV_{n+1} - V_n = c_n - [c_n - c] = c < 0$. The use of the perturbation has allowed us to replace $c_n$ by a "mean." The perturbed Liapunov function method is an extension of this idea.

The perturbation $\delta V(n)$ that will be used will be a sum of components, one associated with each possible external input process, and one associated with each input link and one with each output link of each queue. The motivation for their form should be apparent from the way that they are used in the proof. See also [5, 7] for more motivation of the construction of the perturbations. Recall that k(i) denotes the arrival node for source i. The perturbation associated with the arrivals from source i is

$$\delta V^a_{i,k}(n) = w_{i,k}X^{p-1}_{i,k}(n)\sum_{l=n}^{\infty}E_n\big[a_{i,k}(l) - \bar\lambda^a_{i,k}\big]. \tag{3.1}$$

This is zero if $k \ne k(i)$.
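The scalar identity at the start of this section, $E_n\delta V_{n+1} - \delta V_n = -(c_n - c)$, can be verified numerically in a toy setting. The sketch assumes, for illustration only, that $c_n = f(s_n)$ for a two-state Markov chain $s_n$ with transition matrix `P`, so that $E_nc_{n+m} = (P^mf)(s_n)$ and $c$ is the stationary mean of f. The names `perturbation`, `f` and the numbers are hypothetical.

```python
def perturbation(P, f, c, terms=400):
    """Truncated dV(s) = sum_{m >= 0} [(P^m f)(s) - c] for each state s."""
    n = len(P)
    Pmf = list(f)                 # (P^0 f)(s) = f(s)
    dV = [0.0] * n
    for _ in range(terms):
        for s in range(n):
            dV[s] += Pmf[s] - c
        # apply P once more: (P^{m+1} f)(s) = sum_r P[s][r] (P^m f)(r)
        Pmf = [sum(P[s][r] * Pmf[r] for r in range(n)) for s in range(n)]
    return dV


P = [[0.9, 0.1], [0.2, 0.8]]
pi = [2 / 3, 1 / 3]               # stationary distribution of P
f = [-2.0, 1.0]                   # c_n takes these values in states 0 and 1
c = sum(p_s * f_s for p_s, f_s in zip(pi, f))   # stationary mean, equal to -1
dV = perturbation(P, f, c)

for s in range(2):
    # E_n dV_{n+1} - dV_n when the current state is s
    drift = sum(P[s][r] * dV[r] for r in range(2)) - dV[s]
    print(s, drift, -(f[s] - c))  # the last two columns agree
```

In state 1 the raw drift $c_n = 1$ is positive, yet the perturbed function still decreases at the mean rate $c = -1$, which is the effect the perturbations (3.1) and (3.2) are built to achieve.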
The function $\delta V^{d,+}_{i,k,j,\alpha}(n)$ defined in (3.2) is concerned with the effects of the departure of packets from queue (i, k), via link $f(i,k,\alpha)$, on the value of $E_nX^p_{i,k}(n+1) - X^p_{i,k}(n)$ when j is the vector-valued channel state at node k, and under the fixed rate $\bar q^j_{i,k,\alpha}$ defined in (A2.4). The $\delta V^{d,-}_{i,k,j,\beta}(n)$ is concerned with the effects on $E_nX^p_{i,k}(n+1) - X^p_{i,k}(n)$ of the inputs to (i, k) from the link leading to it from node $b(i,k,\beta)$, when the vector-valued channel state at node $b(i,k,\beta)$ is j, and under the fixed rate $\bar q^j_{i,b(i,k,\beta),k}$.

$$\begin{aligned}
\delta V^{d,+}_{i,k,j,\alpha}(n) &= -w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k,\alpha}\sum_{l=n}^{\infty}E_n\big[I_{\{L_k(l)=j\}} - \Pi_{k,j}\big], \\
\delta V^{d,-}_{i,k,j,\beta}(n) &= w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,b(i,k,\beta),k}\sum_{l=n}^{\infty}E_n\big[I_{\{L_{b(i,k,\beta)}(l)=j\}} - \Pi_{b(i,k,\beta),j}\big].
\end{aligned} \tag{3.2}$$

The complete perturbation and time-dependent Liapunov function are, resp.,

$$\begin{aligned}
\delta V(n) &= \sum_{i,k}\delta V^a_{i,k}(n) + \sum_{i,k,j,\alpha}\delta V^{d,+}_{i,k,j,\alpha}(n) + \sum_{i,k,j,\beta}\delta V^{d,-}_{i,k,j,\beta}(n), \\
V(n) &= V(X(n)) + \delta V(n).
\end{aligned} \tag{3.3}$$
Theorem 3.1. The system is stable under (A2.1)-(A2.4).

Proof. The function $V(n)$ is the (time-varying) Liapunov function that is to be used. We need to show that $V(n)$ is a local supermartingale when the queue values are large. In particular, we need to show that there is $c > 0$ such that for large $X$, we have $E_nV(n+1) - V(n) \le -c$, and then to show that this inequality can be used to get (2.1). Thus, we need to evaluate

$$\begin{aligned}
E_nV(n+1) - V(n) &= \sum_{i,k}w_{i,k}E_n\big[X^p_{i,k}(n+1) - X^p_{i,k}(n)\big] \\
&\quad + \sum_{i,k}E_n\big[\delta V^a_{i,k}(n+1) - \delta V^a_{i,k}(n)\big] \\
&\quad + \sum_{i,k,j,\alpha}E_n\big[\delta V^{d,+}_{i,k,j,\alpha}(n+1) - \delta V^{d,+}_{i,k,j,\alpha}(n)\big] \\
&\quad + \sum_{i,k,j,\beta}E_n\big[\delta V^{d,-}_{i,k,j,\beta}(n+1) - \delta V^{d,-}_{i,k,j,\beta}(n)\big].
\end{aligned}$$

The components will be evaluated separately, and then the results summed. The summation will effectively cancel various "undesirable" terms, and replace them by averages. This is the key idea of the method. In the expansions to follow, K denotes a constant whose value might vary from usage to usage. A first order Taylor expansion yields

$$\begin{aligned}
\sum_{i,k}w_{i,k}E_n\big[X^p_{i,k}(n+1) - X^p_{i,k}(n)\big]
&= \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[E_na_{i,k}(n) - \sum_{\alpha}d_{i,k,\alpha}(n) + \sum_{\beta}d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + O(|X^{p-2}(n)|) + K.
\end{aligned} \tag{3.4}$$
Now consider (3.1) for any i, k. Recall that $\delta V^a_{i,k}(n) = 0$ if $k \ne k(i)$, the origin node for source i. If $k = k(i)$, then a first order expansion yields

$$E_n\delta V^a_{i,k}(n+1) - \delta V^a_{i,k}(n) = -w_{i,k}X^{p-1}_{i,k}(n)\big[E_na_{i,k}(n) - \bar\lambda^a_{i,k}\big] + O(|X^{p-2}(n)|) + K.$$

Thus,

$$\sum_{i,k}E_n\big[\delta V^a_{i,k}(n+1) - \delta V^a_{i,k}(n)\big] = -\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\big[E_na_{i,k}(n) - \bar\lambda^a_{i,k}\big] + O(|X^{p-2}(n)|) + K. \tag{3.5}$$
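Let us see what has been accomplished so far. On adding (3.4) and (3.5), we see that the terms $w_{i,k}X^{p-1}_{i,k}(n)E_na_{i,k}(n)$ are cancelled, and a "mean value" term $w_{i,k}X^{p-1}_{i,k}(n)\bar\lambda^a_{i,k}$ appears, together with a term of order $p - 2$. These lower order terms will be dominated by the terms of order $p - 1$ for large values of the queue state. The desire for such cancellations and replacements by mean values determined the form of (3.1). Let us now consider the perturbation defined by the first term in (3.2), which will facilitate dominating the $d_{i,k,\alpha}(n)$ in (3.4) by a term that can be effectively averaged. The definition (3.2) yields

$$\begin{aligned}
E_n\big[\delta V^{d,+}_{i,k,j,\alpha}(n+1) - \delta V^{d,+}_{i,k,j,\alpha}(n)\big]
&= -w_{i,k}E_nX^{p-1}_{i,k}(n+1)\,\bar q^j_{i,k,\alpha}\sum_{l=n+1}^{\infty}E_{n+1}\big[I_{\{L_k(l)=j\}} - \Pi_{k,j}\big] \\
&\quad + w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k,\alpha}\sum_{l=n}^{\infty}E_n\big[I_{\{L_k(l)=j\}} - \Pi_{k,j}\big].
\end{aligned} \tag{3.6}$$

Rewrite (3.6) by splitting out the lowest summand of the last term to get

$$\begin{aligned}
& w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k,\alpha}\big[I_{\{L_k(n)=j\}} - \Pi_{k,j}\big] \\
&\quad - w_{i,k}E_nX^{p-1}_{i,k}(n+1)\,\bar q^j_{i,k,\alpha}\sum_{l=n+1}^{\infty}E_{n+1}\big[I_{\{L_k(l)=j\}} - \Pi_{k,j}\big] \\
&\quad + w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k,\alpha}\sum_{l=n+1}^{\infty}E_n\big[I_{\{L_k(l)=j\}} - \Pi_{k,j}\big].
\end{aligned} \tag{3.7}$$

By expanding $X^{p-1}_{i,k}(n+1) - X^{p-1}_{i,k}(n)$ we can represent (3.7) as

$$E_n\big[\delta V^{d,+}_{i,k,j,\alpha}(n+1) - \delta V^{d,+}_{i,k,j,\alpha}(n)\big] = w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k,\alpha}\big[I_{\{L_k(n)=j\}} - \Pi_{k,j}\big] + O(|X^{p-2}(n)|) + K. \tag{3.8}$$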
An analogous procedure yields that

$$E_n\big[\delta V^{d,-}_{i,k,j,\beta}(n+1) - \delta V^{d,-}_{i,k,j,\beta}(n)\big] = -w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,b(i,k,\beta),k}\big[I_{\{L_{b(i,k,\beta)}(n)=j\}} - \Pi_{b(i,k,\beta),j}\big] + O(|X^{p-2}(n)|) + K. \tag{3.9}$$

Now add the expansions (3.4), (3.5), (3.8), and (3.9), over $i, k, j, \alpha, \beta$. Some terms in one expansion are the negative of terms in some other expansions. Adding the expansions and canceling such terms yields the expression

$$\begin{aligned}
E_nV(n+1) - V(n) &= \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\,\bar\lambda^a_{i,k} \\
&\quad + \sum_{i,k}\Big[-w_{i,k}X^{p-1}_{i,k}(n)\sum_{\alpha}d_{i,k,\alpha}(n) + w_{i,k}X^{p-1}_{i,k}(n)\sum_{\beta}d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + \sum_{i,k,j}w_{i,k}X^{p-1}_{i,k}(n)\sum_{\alpha}\bar q^j_{i,k,\alpha}\big[I_{\{L_k(n)=j\}} - \Pi_{k,j}\big] \\
&\quad - \sum_{i,k,j}w_{i,k}X^{p-1}_{i,k}(n)\sum_{\beta}\bar q^j_{i,b(i,k,\beta),k}\big[I_{\{L_{b(i,k,\beta)}(n)=j\}} - \Pi_{b(i,k,\beta),j}\big] \\
&\quad + O(|X^{p-2}(n)|) + K.
\end{aligned} \tag{3.10}$$

The terms in the second, third and fourth lines of (3.10) that do not involve the $\Pi_{k,j}$ variables are

$$\begin{aligned}
&-\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}d_{i,k,\alpha}(n) - \sum_{\beta}d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha,j}\bar q^j_{i,k,\alpha}I_{\{L_k(n)=j\}} - \sum_{\beta,j}\bar q^j_{i,b(i,k,\beta),k}I_{\{L_{b(i,k,\beta)}(n)=j\}}\Big].
\end{aligned}$$

For each $k, \alpha, \beta$, the indicator functions in the above sums over j select the actual current channel state $j = L_k(n)$ or $j = L_{b(i,k,\beta)}(n)$, as appropriate. Hence, the previous expression can be rewritten as
$$\begin{aligned}
&-\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}d_{i,k,\alpha}(n) - \sum_{\beta}d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}\bar q^{L_k(n)}_{i,k,\alpha} - \sum_{\beta}\bar q^{L_{b(i,k,\beta)}(n)}_{i,b(i,k,\beta),k}\Big].
\end{aligned} \tag{3.11}$$

It is simpler to complete the proof first under the assumption that $X_{i,k}(n) \ge K_0$ for all i, k, and then to add the few details for the general case. If all $X_{i,k}(n) \ge K_0$, then by (A2.4) there are resource allocations $\{\bar u_{i,k,\alpha}(\cdot); i, k, \alpha\}$ such that, for channel state j, the output from queue (i, k) to queue $(i, f(i,k,\alpha))$ will be $g_{i,k,\alpha}(j, X_{i,k}(n), \bar u_{i,k,\alpha}(j, X(n))) = \bar q^j_{i,k,\alpha}$. The $d_{i,k,\alpha}(n)$ are chosen by either the maximization rule (2.4), or by the rules (2.5) or (2.6) (which are equivalent to each other). The rule (2.4) is implied by (2.5) and by (2.6). On the other hand, the $\bar q^j_{i,k,\alpha}$ defined in (A2.4) are not necessarily maximizers in (2.6). Hence the expression (3.11) is non-positive. Using this non-positivity in (3.10) together with the definitions of $\bar q^j_{i,k,\alpha}$, $\bar q_{i,k}$ and $\bar q_{i,b(i,k)}$ yields the following upper bound to (3.10):

$$\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\big[\bar\lambda^a_{i,k} - \bar q_{i,k} + \bar q_{i,b(i,k)}\big] + O(|X^{p-2}(n)|) + K. \tag{3.12}$$

By (2.8), the terms in the brackets in the first line of (3.12) are $\le -c_0 < 0$. Thus we have proved that

$$E_nV(n+1) - V(n) \le -c_0\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n) + O(|X(n)|^{p-2}) + K. \tag{3.13}$$
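$\delta V(n)$ satisfies

$$|\delta V(n)| = O(|X(n)|^{p-1}) + K \tag{3.14}$$

and, by (3.13),

$$E_nV(n+1) - V(n) \to -\infty, \quad \text{uniformly in } n, \omega \text{ as } |X(n)| \to \infty. \tag{3.15}$$

By (3.15), there are $c_1 > 0$ and $q_0 > 0$ such that, for $|X(n)| \ge q_0$,

$$E_nV(n+1) - V(n) \le -c_1. \tag{3.16}$$

Given small $\delta > 0$, (3.14) implies that for $q_0$ sufficiently large,

$$|V(X(n)) - V(n)| \le \delta\big(1 + V(X(n))\big). \tag{3.17}$$

Let $\sigma_0$ be a stopping time for which $|X(\sigma_0)| = c_2 > q_0$, and define the stopping time $\sigma_1 = \min\{n > \sigma_0 : |X(n)| \le q_0\}$. Then, by (3.16), we have

$$E_{\sigma_0}V(\sigma_1) - V(\sigma_0) \le -c_1E_{\sigma_0}(\sigma_1 - \sigma_0). \tag{3.18}$$

Using (3.18) and the bound (3.17) on $V(n) - V(X(n))$ to bound $V(\sigma_i) - V(X(\sigma_i))$, $i = 0, 1$, yields

$$-\delta E_{\sigma_0}\big[1 + V(X(\sigma_1))\big] + E_{\sigma_0}V(X(\sigma_1)) \le E_{\sigma_0}V(\sigma_1) \le -c_1E_{\sigma_0}(\sigma_1 - \sigma_0) + \big[\delta + V(X(\sigma_0))(1 + \delta)\big]$$

or

$$E_{\sigma_0}(\sigma_1 - \sigma_0) \le \frac{2\delta + V(X(\sigma_0))(1 + \delta) + \delta E_{\sigma_0}V(X(\sigma_1))}{c_1},$$

which implies that the definition of stability (2.1) holds since $V(X(\sigma_1)) \le \sup_{|x| \le q_0}V(x)$.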
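Finally, we complete the details when some components of $X(n)$ are less than $K_0$. Recall the definition of $\bar u_{i,k,\alpha}(\cdot)$ and $\bar q^j_{i,k,\alpha}$ in (A2.4). Define

$$\bar g_{i,k,\alpha}(L_k(n), X(n)) = g_{i,k,\alpha}\big(L_k(n), X_{i,k}(n), \bar u_{i,k,\alpha}(L_k(n), X(n))\big).$$

For $X_{i,k}(n) \ge K_0$ we have $\bar g_{i,k,\alpha}(L_k(n), X(n)) = \bar q^{L_k(n)}_{i,k,\alpha}$ by the definition of the $\bar q^{L_k(n)}_{i,k,\alpha}$ in (A2.4). If $X_{i,k}(n) \le K_0$ then, also by (A2.4), $\bar g_{i,k,\alpha}(L_k(n), X(n)) \le \bar q^{L_k(n)}_{i,k,\alpha}$. Rewrite (3.11) as follows.

$$\begin{aligned}
&-\sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}d_{i,k,\alpha}(n) - \sum_{\beta}d_{i,b(i,k,\beta),k}(n)\Big] \\
&\quad + \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}\bar g_{i,k,\alpha}(L_k(n), X(n)) \\
&\qquad\qquad - \sum_{\beta}\bar g_{i,b(i,k,\beta),k}(L_{b(i,k,\beta)}(n), X(n))\Big] \\
&\quad + \sum_{i,k:\,X_{i,k}(n) \ge K_0}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}\bar q^{L_k(n)}_{i,k,\alpha} - \sum_{\alpha}\bar g_{i,k,\alpha}(L_k(n), X(n))\Big] \\
&\quad + \sum_{i,k:\,X_{i,k}(n) < K_0}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\alpha}\bar q^{L_k(n)}_{i,k,\alpha} - \sum_{\alpha}\bar g_{i,k,\alpha}(L_k(n), X(n))\Big] \\
&\quad - \sum_{i,k}w_{i,k}X^{p-1}_{i,k}(n)\Big[\sum_{\beta}\bar q^{L_{b(i,k,\beta)}(n)}_{i,b(i,k,\beta),k} - \sum_{\beta}\bar g_{i,b(i,k,\beta),k}(L_{b(i,k,\beta)}(n), X(n))\Big].
\end{aligned}$$

As was argued for the case where all $X_{i,k}(n) \ge K_0$, the sum of the first three lines is non-positive, since the $\{d_{i,k,\alpha}(n); i, k\}$ are chosen by the maximization rule. The two terms in each bracket in the fourth line are equal by the definition of the $\bar g_{i,k,\alpha}$ when $X_{i,k}(n) \ge K_0$. Hence this term is zero. By (A2.4), the bracketed terms in the last line are non-negative, hence the last line is non-positive. Thus the only possible positive term is the next to last line, and this is O(1) since it is a sum over i, k for which $X_{i,k}(n) \le K_0$. Thus (3.11) is O(1). From this point on the proof is completed just as for the case where all $X_{i,k}(n) \ge K_0$. $\blacksquare$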
Notes. The decision rule (2.4) requires that each node k know the value of the $X_{i,k}(n)$ and $X_{i,f(i,k,\alpha)}(n)$ for all $i, \alpha$ that are relevant at node k. In fact, if the value of the $X_{i,f(i,k,\alpha)}(n)$ were known only subject to a bounded error, then the proof still goes through under the same conditions. So, only an occasional approximate estimate of the queues at the upstream nodes is needed. Suppose that some links are preempted by priority users from time to time, where the intervals of availability are defined by a renewal process that is independent of the arrival and channel rate processes. Then it can be shown that the results continue to hold, but with the $\bar q_{i,k}$ multiplied by the fraction of time that the channel is available, so the capacity must be sufficient to handle the average down times. Under the other assumptions, condition (A2.4) is sufficient but not necessary for stability. But it is "nearly" necessary in the sense that if for each choice of the $\{\bar q^j_{i,k,\alpha}\}$ there is some $(i_0, k_0)$ such that $\bar q_{i_0,b(i_0,k_0)} - \bar q_{i_0,k_0} > 0$, then the system would not be stable.
4. Some extensions. The basic approach to scheduling and stability can be extended in many ways, and the examples described below illustrate some of the possibilities.

A. Acknowledgments of receipt required for each link. The foregoing development did not require that received packets be acknowledged. Suppose that packets on the link from any queue (i, k) to node $f(i,k,\alpha)$ that are not acknowledged within a window $W_{i,k,\alpha}$ of scheduling intervals will need to be requeued at (i, k) and retransmitted. The treatment of the acknowledgment and loss processes involves a more complicated notation and an additional perturbation to the Liapunov function. In order to keep the notation reasonable, we will suppose that the routing is unique for each source-destination pair. Thus, the indices $\alpha, \beta$ can and will be dropped. The approach for the non-unique routing case is essentially the same, with analogous results. The acks are sent back to the previous node when a packet is received, subject to a possible delay. If we fully accounted for the possibility that the packet loss or non-ack process depended on the traffic in the channel, and the channel characteristics, the resulting problem would be very difficult. Because of this, it is often assumed that the loss is a consequence of uncontrolled additional traffic in the channels. We will take the following often used approach. For each link, an ack for each received packet is sent to the node from which it just came. If a packet sent from queue (i, k) at time n is not acknowledged by time $n + W_{i,k}$, then that packet will be requeued at (i, k). The development in Section 3 can be readily modified to accommodate these changes. The development in [6] supposed that acks for source i data are sent only to the origin node k(i), and that packets lost anywhere must be retransmitted from that node. Here acks are required for each link. Until the end of the example, we suppose that the packet loss process is random.
Thus, the events that packets are lost are independent among the links, iid for the packets on each link, and independent of the channel states, decisions, and arrivals. Let $\zeta_{i,k}(n)$ denote the fraction of packets sent from queue (i, k) at time n that were not received at queue (i, f(i, k)). These would not be acknowledged by the end of the waiting period $W_{i,k}$, and must be requeued and retransmitted at that time. Let $\mathcal{F}_n$ now measure the $\zeta_{i,k}(l)$, $l < n$, for all i, k, as well. Define $p_{i,k} = E_n\zeta_{i,k}(n) = E\zeta_{i,k}(n)$. The queue dynamics are now
$$X_{i,k}(n+1) = X_{i,k}(n) + a_{i,k}(n) - d_{i,k}(n) + (1 - \zeta_{i,b(i,k)}(n))\,d_{i,b(i,k)}(n) + d_{i,k}(n - W_{i,k})\,\zeta_{i,k}(n - W_{i,k}).$$
The last term on the right represents the requeued packets, and the next to last term represents the packets sent from (i, b(i, k)) to (i, k) that were received. We have
$$\begin{aligned}
X^p_{i,k}(n+1) = {} & X^p_{i,k}(n) + pX^{p-1}_{i,k}(n)\big[-d_{i,k}(n) + (1-\zeta_{i,b(i,k)}(n))\,d_{i,b(i,k)}(n) \\
& + d_{i,k}(n-W_{i,k})\,\zeta_{i,k}(n-W_{i,k})\big] + pX^{p-1}_{i,k}(n)\,a_{i,k}(n) + O(|X^{p-2}_{i,k}(n)|) + K,
\end{aligned} \tag{4.1}$$
where K is a constant whose value might change from usage to usage. The additional Liapunov function perturbation component
$$\delta V^w(n) = \sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\sum_{l=n-W_{i,k}}^{n-1} d_{i,k}(l)\,\zeta_{i,k}(l) \tag{4.2}$$
will help us deal with averaging the increases in the various queues due to not receiving an ack in time. Recall that if k = k(i), the origin node for source i, then $d_{i,b(i,k)}(n) = 0$. Noting that, for $k \ne k(i)$, $E_n\zeta_{i,b(i,k)}(n) = p_{i,b(i,k)}$, we can write
$$\begin{aligned}
& E_n[V(X(n+1)) - V(X(n))] + E_n[\delta V^w(n+1) - \delta V^w(n)] \\
& \quad = \sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\,E_n a_{i,k}(n) \\
& \qquad + \sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\big[-d_{i,k}(n) + (1-p_{i,b(i,k)})\,d_{i,b(i,k)}(n) \\
& \qquad\qquad + \zeta_{i,k}(n-W_{i,k})\,d_{i,k}(n-W_{i,k})\big] \\
& \qquad + \sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\big[p_{i,k}\,d_{i,k}(n) - \zeta_{i,k}(n-W_{i,k})\,d_{i,k}(n-W_{i,k})\big] \\
& \qquad + O(|X^{p-2}(n)|) + K.
\end{aligned} \tag{4.3}$$
The second, third and fourth lines contain the highest order terms in $E_nV(X(n+1)) - V(X(n))$, and the next to last line is the highest order term in the expansion of $E_n[\delta V^w(n+1) - \delta V^w(n)]$. The terms with $d_{i,k}(n - W_{i,k})$ cancel each other, and we drop them now. The decision rule that replaces (2.4) is
$$\max_{\{d_{i,k}(n);\, i\}} \sum_i (1-p_{i,k})\big[w_{i,k}X^{p-1}_{i,k}(n) - w_{i,f(i,k)}X^{p-1}_{i,f(i,k)}(n)\big]\,d_{i,k}(n).$$
The rules (2.5), and (2.6) are modified similarly. The full new perturbed Liapunov function is
VW (n) == V(X(n)) + bV w (n) + L bV:~k(n) i,k
+ L(l - Pi,k)b~~kj(n) i,k
+ L(l- Pi,b(i,k))b~~kJ(n). i,k,j
(4.4)
Then, using (4.1), (4.3), (3.5), (3.8), and (3.9),
$$\begin{aligned}
E_n\tilde V^w(n+1) - \tilde V^w(n) = {} & \sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\,\bar\lambda^a_{i,k} \\
& + \sum_{i,k}\big[-(1-p_{i,k})\,w_{i,k}X^{p-1}_{i,k}(n)\,d_{i,k}(n) \\
& \qquad + (1-p_{i,b(i,k)})\,w_{i,k}X^{p-1}_{i,k}(n)\,d_{i,b(i,k)}(n)\big] \\
& + \sum_{i,k,j}(1-p_{i,k})\,w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,k}\big[I_{\{L_k(n)=j\}} - \Pi_{k,j}\big] \\
& - \sum_{i,k,j}(1-p_{i,b(i,k)})\,w_{i,k}X^{p-1}_{i,k}(n)\,\bar q^j_{i,b(i,k)}\big[I_{\{L_{b(i,k)}(n)=j\}} - \Pi_{b(i,k),j}\big] \\
& + O(|X^{p-2}(n)|) + K.
\end{aligned}$$
The second and third lines are due to (the non-arrival parts of) the third to fifth lines of (4.3). The fourth line is due to the perturbation weighted by $(1 - p_{i,k})$ in (4.4), and the next to last line to the perturbation weighted by $(1 - p_{i,b(i,k)})$. Dominating terms as in the part of the proof of the theorem concerning (3.11) yields the following upper bound to the last expression:
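$$\sum_{i,k} w_{i,k}X^{p-1}_{i,k}(n)\big[\bar\lambda^a_{i,k} - (1-p_{i,k})\,\bar q_{i,k} + (1-p_{i,b(i,k)})\,\bar q_{i,b(i,k)}\big] + O(|X^{p-2}(n)|) + K.$$
At most one of $\bar q_{i,b(i,k)}$ and $\bar\lambda^a_{i,k}$ can be non-zero for any i, k. The condition (A2.4) is modified to read, for all i,
$$(1-p_{i,k})\,\bar q_{i,k} > \bar\lambda^a_i, \quad \text{for } k = k(i),$$
$$(1-p_{i,k})\,\bar q_{i,k} - (1-p_{i,b(i,k)})\,\bar q_{i,b(i,k)} > 0, \quad \text{for } k \ne k(i).$$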
The proof is completed as in Theorem 3.1. If the packet loss process for the link out of (i, k) is correlated, then the process $\{\zeta_{i,k}(n), n\}$ is correlated and another perturbation is required to average it. Suppose that there are $p_{i,k}$ such that the sums in
$$\delta V^{w,+}_{i,k}(n) = w_{i,k}X^{p-1}_{i,k}(n)\sum_{l=n}^{\infty} E_n\big[\zeta_{i,k}(l) - p_{i,k}\big]\,d_{i,k}(l),$$
$$\delta V^{w,-}_{i,k}(n) = -w_{i,k}X^{p-1}_{i,k}(n)\sum_{l=n}^{\infty} E_n\big[\zeta_{i,b(i,k)}(l) - p_{i,b(i,k)}\big]\,d_{i,b(i,k)}(l),$$
are well defined and bounded, uniformly in n, $\omega$. Then add the $\delta V^{w,\pm}_{i,k}(\cdot)$ to $\tilde V^w(\cdot)$. The conclusion is unchanged.
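As an illustration of the per-link acknowledgment model of part A, the following minimal Python sketch simulates a single queue whose lost packets are requeued after a window of W slots. It is not taken from the paper: the Bernoulli arrivals, the fixed per-slot service amount, and the names `simulate_link`, `arrival_prob`, `service`, and `loss_prob` are assumptions chosen only to mirror the queue dynamics with iid losses.

```python
import random
from collections import deque


def simulate_link(n_slots, arrival_prob, service, loss_prob, window, seed=0):
    """Simulate X(n+1) = X(n) + a(n) - d(n) + (lost packets sent `window` slots ago).

    Hypothetical single-queue model: Bernoulli arrivals, at most `service`
    packets sent per slot, each sent packet lost independently with
    probability `loss_prob` and requeued `window` slots later.
    """
    if window < 1:
        raise ValueError("window must be at least 1 slot")
    rng = random.Random(seed)
    x = 0                              # current queue length X(n)
    in_flight = deque([0] * window)    # lost counts awaiting requeue, oldest first
    arrived = delivered = 0
    lengths = []
    for _ in range(n_slots):
        a = 1 if rng.random() < arrival_prob else 0
        d = min(x, service)            # packets transmitted this slot
        lost = sum(1 for _ in range(d) if rng.random() < loss_prob)
        requeued = in_flight.popleft() # plays the role of d(n - W) * zeta(n - W)
        in_flight.append(lost)
        x += a - d + requeued
        arrived += a
        delivered += d - lost
        lengths.append(x)
    return {"lengths": lengths, "arrived": arrived,
            "delivered": delivered, "unacked": sum(in_flight)}


if __name__ == "__main__":
    out = simulate_link(n_slots=10000, arrival_prob=0.3, service=1,
                        loss_prob=0.2, window=3)
    print("mean queue length:", sum(out["lengths"]) / len(out["lengths"]))
```

In this sketch, one would expect the queue to stay short when (1 - loss_prob) * service exceeds arrival_prob, which is the single-queue analogue of the modified condition (A2.4) at an origin node.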
B. Multicasting. Suppose that some sources have multiple destinations, with a unique route for each source-destination pair. Let the route network for each source form a tree, with the source as the root and the final destinations as the end branches. Suppose that if the tree branches at node k, then transmissions must be made to all of the branches simultaneously, as is commonly required in multicasting. If the route for source i uses node k, then redefine $f(i,k,\gamma)$ to denote the nodes at the end of the branches of the tree out of queue (i, k), where the dimension of the index parameter $\gamma$ is the number of branches. Then (2.4) is replaced by
$$\max_{\{d_{i,k}(n);\, i\}} \sum_i \Big[w_{i,k}X^{p-1}_{i,k}(n) - \sum_{\gamma} w_{i,f(i,k,\gamma)}X^{p-1}_{i,f(i,k,\gamma)}(n)\Big]\,d_{i,k}(n),$$
subject to the constraints at node k. Modify (2.5) and (2.6) analogously. The criterion (A2.4) is modified in an obvious manner to take account of the new flows.
C. Variable number of sources and destinations. When the number of sources, nodes and destinations vary randomly, the modeling problem can be quite vexing. For example, if a node disappears slowly as its links fade, what happens to its still untransmitted data? We take a simple approach, by supposing that there is a backbone network with an unchanging number of nodes, although the associated links in the backbone still vary randomly. There is a large and randomly varying number of sources that send data to the nodes in the backbone. The arriving packets from the randomly changing number of sources are multiplexed on arrival. These packets are assigned priority values, and at the backbone nodes the data is queued according to both priority and the node to which the packet would be sent next on its route to its final destination. Owing to the multiplexing and the large number of sources, it is assumed that the total arrival processes (per slot) from the exterior to the various queues (i, k) are mutually independent, and that the elements of each are iid, with bounded variances and means denoted by $\bar\lambda^a_{i,k}$. The index i denotes the ith queue at backbone node k; that queue is associated with both a priority and a next node, and might contain packets from many different sources. Let $\xi_{i,k;v,\gamma}(n)$ denote the fraction of the packets sent at time n from queue (i, k) to node $\gamma$ that will be assigned to queue v there. Again, owing to the multiplexing and the large number of sources, we suppose a conditional independence in routing, in that there are $p_{i,k;v,\gamma}$ such that $E_n\xi_{i,k;v,\gamma}(n) = p_{i,k;v,\gamma}$, where $E_n$ denotes the expectation conditioned on the data to time n. The queue dynamics are
v"
Condition (A2.4) is changed to require the existence of $\{\bar q^j_{i,k};\, i, k, j\}$ such that
$$\bar\lambda^a_{i,k} + \sum_{v,\gamma,j} \bar q^j_{v,\gamma}\,p_{v,\gamma;i,k}\,\Pi_{\gamma,j} - \sum_j \bar q^j_{i,k}\,\Pi_{k,j} < 0$$
for each i, k. The proof follows the lines of that of Theorem 3.1. The decision rule (2.4) is replaced by, for each node k and channel state j,
5. An a priori routing selection. A potentially useful approach for getting the routing and the u(·) functions is based on a type of fluid controlled-flow approximation. In applications the algorithm would be run periodically to produce new routings as conditions change. The example is intended to be illustrative of the possibilities only. Suppose that power only is to be allocated. Let $p^j_{i,k,\alpha}$ denote the power assigned to queue (i, k) for data transmitted to node $\alpha$ when the channel state at node k is j. The associated channel rate is $c^j_{i,k,\alpha}(p^j_{i,k,\alpha}) = q^j_{i,k,\alpha}$. The routes to be given might depend on the channel states, but the development in Section 3 is readily modified to account for this dependence. Suppose that there are upper bounds $Q_k$ such that, for each k, j,
$$\sum_{i,\alpha} q^j_{i,k,\alpha} \le Q_k. \tag{5.1}$$
This might reflect the fact that each packet takes a minimal time. Suppose that each node k has a constraint of the form
$$\sum_{i,\alpha} p^j_{i,k,\alpha} \le P_k, \quad \text{each } j, \tag{5.2}$$
where $P_k$ is the total energy/slot available at node k. We also need a constraint that assures that the average output for each non-source node equals the average input, and we write this as follows, for each i, $k \ne k(i)$:
$$\text{out} = \sum_{\alpha,j} c^j_{i,k,\alpha}(p^j_{i,k,\alpha})\,\Pi_{k,j} \ge \sum_{l,j} c^j_{i,l,k}(p^j_{i,l,k})\,\Pi_{l,j} = \text{in}. \tag{5.3}$$
If node k(i) is the input node for source i, then replace (5.3) by
$$\text{out} = \sum_{\alpha,j} c^j_{i,k(i),\alpha}(p^j_{i,k(i),\alpha})\,\Pi_{k(i),j} = \bar\lambda^a_i + \epsilon. \tag{5.4}$$
The (arbitrarily small) $\epsilon > 0$ is used to assure slight overcapacity so that (A2.4) will hold and the stability argument of Theorem 3.1 can be used.
Suppose that c(i) is the destination node for source i. Then to assure that all packets end up where they are intended, for each i use the constraint
$$\sum_{k,j} c^j_{i,k,c(i)}(p^j_{i,k,c(i)})\,\Pi_{k,j} = \bar\lambda^a_i + \epsilon. \tag{5.5}$$
Any $q^j_{i,k,\alpha} = c^j_{i,k,\alpha}(p^j_{i,k,\alpha})$ that satisfy the constraints (5.1)-(5.5) will yield an acceptable a priori route. But one might wish to select one via an optimization problem. One possible cost criterion is the total average power given by
$$\sum_{i,k,\alpha,j} p^j_{i,k,\alpha}\,\Pi_{k,j}. \tag{5.6}$$
Minimize (5.6), subject to (5.1)-(5.5). The above approach to getting the a priori routes might yield a distributed flow for some sources. However, given these routes, the maximization rules (2.4), (2.5), or (2.6) still work. Replace (2.4) by
$$\max_{\{d_{i,k,\alpha}(n);\, i,\alpha\}} \sum_{i,\alpha}\big[w_{i,k}X^{p-1}_{i,k}(n) - w_{i,f(j,i,k,\alpha)}X^{p-1}_{i,f(j,i,k,\alpha)}(n)\big]\,d_{i,k,\alpha}(n),$$
where for each i, j, k, $f(j,i,k,\alpha)$ indexes the links for which $p^j_{i,k,\alpha} > 0$ and $d_{i,k,\alpha}(n)$ is the amount sent to node $\alpha$ from queue (i, k). For multicasting, use (5.5) for all destination nodes for source i. The criterion (5.6) is concerned with total power. An alternative is to strive for maximum stability. To do this, rewrite (5.3) as
$$\sum_{\alpha,j} c^j_{i,k,\alpha}(p^j_{i,k,\alpha})\,\Pi_{k,j} - \sum_{l,j} c^j_{i,l,k}(p^j_{i,l,k})\,\Pi_{l,j} = b_{i,k},$$
where $b_{i,k} > 0$. With appropriate definitions, this can be made to include (5.3) and (5.4). Then either maximize $\sum_{i,k} b_{i,k}$, or seek $\max\min_{i,k} b_{i,k}$. This approach will get routes and $q^j_{i,k,\alpha}$ that yield the best $c_0$ in (A2.4). In addition, the dual variables associated with the constraints provide "price" guidelines that tell us the places where an increase in the resources would do the most good (in the sense of the mathematical programming formulation). The example in [6, Section 5] was concerned with a simpler model, where each packet that was transmitted was required to have a minimum S/N ratio at the receiver, and the final form of the optimization problem was a linear program.
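To make the constraints concrete, the sketch below checks (5.2)-(5.5) and evaluates the cost (5.6) for a candidate power allocation on a toy network. Everything specific here is an assumption for illustration only: the three-node line network 0 -> 1 -> 2 with a single source, the linear rate function c(p) = gain * p (the text does not specify the form of the rate functions), and the names `average_rate` and `check_route`. The bound (5.1) is omitted for brevity.

```python
def average_rate(power, gain, pi):
    """Average rate sum_j c^j(p^j) * Pi_j for an assumed linear rate c^j(p) = gain[j] * p."""
    return sum(g * p * w for g, p, w in zip(gain, power, pi))


def check_route(power, gain, pi, p_max, lam, eps, tol=1e-9):
    """Check (5.2)-(5.5) on a hypothetical line network 0 -> 1 -> 2 with one source.

    power[k][j] is the power node k uses on its single outgoing link when its
    channel state is j; pi[k][j] is the stationary probability of that state.
    """
    out0 = average_rate(power[0], gain[0], pi[0])  # average flow out of origin node 0
    out1 = average_rate(power[1], gain[1], pi[1])  # average flow out of relay node 1
    return {
        # (5.2): per-state power budget at each node
        "power_ok": all(p <= p_max[k] + tol for k in (0, 1) for p in power[k]),
        # (5.3): relay output is at least relay input
        "relay_ok": out1 >= out0 - tol,
        # (5.4): origin output equals mean arrival rate plus eps
        "origin_ok": abs(out0 - (lam + eps)) <= tol,
        # (5.5): flow into the destination equals mean arrival rate plus eps
        "dest_ok": abs(out1 - (lam + eps)) <= tol,
        # (5.6): total average power
        "avg_power": sum(p * w for k in (0, 1) for p, w in zip(power[k], pi[k])),
    }


if __name__ == "__main__":
    pi = [[0.5, 0.5], [0.5, 0.5]]
    gain = [[1.0, 2.0], [2.0, 4.0]]
    power = [[0.0, 1.05], [0.0, 0.525]]   # transmit only in the better channel state
    print(check_route(power, gain, pi, p_max=[2.0, 2.0], lam=1.0, eps=0.05))
```

A checker of this kind only verifies a given allocation; choosing the allocation that minimizes (5.6) would require an optimization routine, which is outside the scope of this sketch.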
Comment on another case: bandwidth allocation. Suppose that the basic control is over bandwidth allocation, with the number of packets/slot being proportional to bandwidth as $q^j_{i,k,\alpha} = b^j_{i,k,\alpha}\,p^j_{i,k,\alpha}$, where the $p^j_{i,k,\alpha}$ are the constants of proportionality and $b^j_{i,k,\alpha}$ is the assigned bandwidth. There would be a total BW constraint of the form $\sum_{i,\alpha} b^j_{i,k,\alpha} \le B_k$ at each node, replacing (5.2). Input-output constraints analogous to (5.3), (5.4), and (5.5) are still to hold. To get the routes, one could either strive for maximum stability or minimize the total average bandwidth, which is
$$\sum_{i,k,\alpha,j} b^j_{i,k,\alpha}\,\Pi_{k,j}.$$
REFERENCES
[1] M. ANDREWS, K. KUMARAN, K. RAMANAN, A. STOLYAR, R. VIJAYAKUMAR, AND P. WHITING. Providing quality of service over a shared wireless link. IEEE Communications Magazine, 2001.
[2] N. BAMBOS AND G. MICHAILIDIS. Queueing and scheduling in random environments. Adv. in Appl. Prob., 36:293-317, 2004.
[3] N. BAMBOS AND G. MICHAILIDIS. Queueing dynamics of random link topology: Stationary dynamics of maximal throughput schedules. Queueing Systems, 50:5-52, 2004.
[4] R. BUCHE AND H.J. KUSHNER. Control of mobile communication systems with time-varying channels via stability methods. IEEE Trans. on Autom. Contr., 49:1954-1962, 2004.
[5] R. BUCHE AND H.J. KUSHNER. Analysis and control of mobile communications with time varying channels in heavy traffic. IEEE Trans. Autom. Control, 47:992-1003, 2002.
[6] H.J. KUSHNER. Control of multi-node mobile communications networks with time varying channels via stability methods. Submitted, June 2005.
[7] H.J. KUSHNER. Approximation and Weak Convergence Methods for Random Processes with Applications to Stochastic Systems Theory. MIT Press, Cambridge, Mass., 1984.
[8] H.J. KUSHNER AND G. YIN. Stochastic Approximation Algorithms and Applications. Springer-Verlag, Berlin and New York, 1997. Second edition, 2003.
[9] S. SHAKKOTTAI AND A. STOLYAR. Scheduling for multiple flows sharing a time-varying channel: The exponential rule. In M. Suhov, editor, Analytic Methods in Applied Probability: In Memory of Fridrih Karpelevich, American Math. Soc. Transl., Series 2, Volume 207, pp. 185-202. American Mathematical Society, Providence, 2002.
[10] S. STOLYAR. Max weight scheduling in a generalized switch: state space collapse and workload minimization in heavy traffic. Ann. of Appl. Probab., 14:1-53, 2004.
[11] L. TASSIULAS AND A. EPHREMIDES. Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE Trans. Automatic Control, 39:466-478, 1993.
A GAME THEORETIC APPROACH TO INTERFERENCE MANAGEMENT IN COGNITIVE NETWORKS NIE NIE*, CRISTINA COMANICIU*t, AND PRATHIMA AGRAWAL+
Abstract. In this paper, we propose a game theoretic solution for joint channel selection and power allocation in cognitive radio networks. Our proposed algorithm enforces cooperation among nodes in an effort to reduce the overall energy consumption in the network. For designing the power control, we consider both the case in which no transmission power constraints are imposed, as well as the more practical case, in which the maximum transmission power is limited. We show that an iterative algorithm for channel scheduling and power allocation can be implemented, which converges to a pure strategy Nash equilibrium solution, i.e., a deterministic choice of channels and transmission powers for all users. Our simulation results also show that, while both channel allocation and power control can independently improve the system performance, there is a significant gain for the joint algorithm. Key words. Cognitive radio, channel allocation, power control, potential game. AMS(MOS) subject classifications. 91A80, 68M10.
*Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030 (nnie
1. Introduction. The explosive growth of wireless services and the increasing user population density call for intelligent ways of managing the scarce spectrum resources. With the new paradigm shift in the FCC's spectrum management policy [2], which creates opportunities for new, more aggressive spectrum reuse, cognitive radio technology lays the foundation for the deployment of smart flexible networks that cooperatively adapt to increase the overall network performance. The cognitive radio terminology was coined by Mitola [1], and refers to a smart radio which has the ability to sense the external environment, learn from the history, and make intelligent decisions to adjust its transmission parameters according to the current state of the environment. As the cognitive radios are essentially autonomous agents that learn their environment and optimize their performance by modifying their transmission parameters, their interactions can be modeled using a game theoretic framework. In this framework, the cognitive radios are the players and their actions are the selection of new transmission parameters, new transmission frequencies, etc., which influence their own performance as well as the performance of the neighboring players. Game theory has been extensively applied in microeconomics, and only more recently has received attention as a useful tool to design and
analyze distributed resource allocation algorithms (e.g., [4-8]). Some game theoretic models for cognitive radio networks were presented in [9], which identified potential game formulations for power control, call admission control and interference avoidance in cognitive radio networks. The convergence conditions for various game models in cognitive radio networks are investigated in [10]. In our previous work [11] we proposed a distributed channel allocation algorithm for cognitive radios using a fixed transmission power. In this paper, we extend this framework to a more realistic scenario that also allows the radios to control their transmission powers. Here we assume that the cognitive radios sense the environment by sending probes and measuring the available channels, distributively select the best channels, and then optimize their transmission powers according to their channel selection to minimize the energy consumption. With the goal of optimizing the link throughput and establishing a fair spectrum sharing over the network, the radios measure the local interference temperature on different frequencies and adjust by maximizing the data transmission rate for a given channel quality (using adaptive modulation) and by possibly switching to a different frequency channel. To tackle the above problem, we propose a game theoretic formulation in which the adaptive channel allocation and power control problem is modelled as a potential game, similar to the one presented in [11]. The radios are modelled as a collection of agents that distributively act to maximize their utilities in a cooperative fashion. The radios' decisions are based on their perceived utility associated with each possible action, which is related to the transmission power and to the channel selection. We study the convergence properties of the proposed adaptation algorithm and we design adaptation protocols for this algorithm.
Two scenarios (power control with and without a maximum transmission power limitation) are considered, and the effect of various maximum power levels on the system performance is investigated. We further study the tradeoffs related to energy consumption, throughput and fairness when power control and channel allocation are employed individually and jointly in the network. A glossary of the abbreviations defined in this paper follows:
CA_NPC     Channel Allocation, No Power Control
PC_NCA     Power Control, No Channel Allocation
CPC_NCA    Constraint Power Control, No Channel Allocation
JCAPC      Joint Channel Allocation with Power Control
JCACPC     Joint Channel Allocation with Constraint Power Control
2. System model. The cognitive radio network we consider consists of a set of N transmitting-receiving pairs of nodes, uniformly distributed in a square region of dimension D* x D*. We assume that the nodes are either fixed, or are moving slowly (slower than the convergence time for the proposed algorithms). Fig. 1 shows an example of a network realization,
FIG. 1. A snapshot of the nodes' positions and network topology. (Both axes range from 0 to 1000.)
TABLE 1. Modulation modes in adaptive modulation and corresponding SIR requirement for target BER = $10^{-3}$.
Modulation Mode    SIR (dB)
1024 QAM           35.5
256 QAM            29.4
64 QAM             23.3
16 QAM             16.9
QPSK               9.9
BPSK               6.8
where we used dashed lines to connect the transmitting node to its intended
receiving node. The nodes measure the spectrum availability and decide on the transmission channel. We assume that there are K frequency channels available for transmission, with K < N. By distributively selecting a transmitting frequency, the radios effectively construct a channel reuse distribution map with reduced co-channel interference.
The transmission link quality can be characterized by a required Bit Error Rate target (BER), which is specific for the given application. An equivalent SIR target requirement can be determined, based on the modulation type selected when employing an adaptive modulation scheme (see Table 1). The Signal-to-Interference Ratio (SIR) measured at the receiver j associated with transmitter i can be expressed as: (2.1)
where $p_i$ is the transmission power at transmitter i and $G_{ji}$ is the link gain between transmitter i and receiver j. $\sigma^2$ denotes the received noise power and is assumed to be the same for all receiver nodes. I(i, j) is the interference function characterizing the interference created by node i to node j and is defined as
$$I(i,j) = \begin{cases} 1 & \text{if transmitters } i \text{ and } j \text{ are transmitting over the same channel}, \\ 0 & \text{otherwise.} \end{cases} \tag{2.2}$$
Analyzing Eq. (2.1), we see that in order to maintain a certain BER constraint the nodes can adjust at both the physical and the network layer level. At the network level, the nodes can minimize the interference by appropriately selecting the transmission channel frequency. At the physical layer, power control can reduce interference and, for a feasible system, results in all users meeting their SIR constraints. Alternatively, the target SIR requirements can be changed (reduced or increased) by using different modulation levels. As an example of adaptation at the physical layer, we assume that software defined radios enable the nodes to adjust their transmission rates, and consequently the required SIR targets, by adaptively changing the modulation scheme. Also, the nodes can adjust their transmission power level via distributed power control to ensure that all the nodes sharing the same channel meet the target SIR requirement at their intended receiver. The BER requirement selected for simulations is $10^{-3}$, and we assume the use of an adaptive modulation scheme with six modes: BPSK, QPSK, 16 QAM, 64 QAM, 256 QAM and 1024 QAM. Table 1 shows the modulation modes and the corresponding SIR target requirements used for our simulations [12, 13]. For our simulations we also assume that all users have packets to transmit at all times (worst case scenario), and that multiple users are allowed to transmit at the same time over a shared channel. We assume that users in the network are identical, which means they have an identical action set and identical utility functions associated with their possible actions.
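The link-quality model of this section can be sketched in Python as follows. The SIR expression used here is an assumed standard form (desired received power divided by co-channel interference plus noise), written with the gain convention stated above; it is not a transcription of Eq. (2.1). The thresholds are those of Table 1, while the function names `sir_db` and `best_mode` and the numerical values in the example are hypothetical.

```python
import math

# SIR thresholds (dB) from Table 1, highest-rate mode first.
MODES = [("1024 QAM", 35.5), ("256 QAM", 29.4), ("64 QAM", 23.3),
         ("16 QAM", 16.9), ("QPSK", 9.9), ("BPSK", 6.8)]


def sir_db(i, power, gain, channel, noise):
    """SIR in dB at the receiver of link i, under an assumed form of Eq. (2.1).

    gain[j][i] is the link gain between transmitter i and receiver j.
    Only transmitters using the same channel as link i interfere.
    """
    interference = sum(power[k] * gain[i][k]
                       for k in range(len(power))
                       if k != i and channel[k] == channel[i])
    return 10.0 * math.log10(power[i] * gain[i][i] / (interference + noise))


def best_mode(sir):
    """Highest-rate modulation mode whose Table 1 threshold is met, else None."""
    for name, threshold in MODES:
        if sir >= threshold:
            return name
    return None


if __name__ == "__main__":
    power = [1e-7, 1e-7]
    gain = [[1e-4, 1e-5], [1e-5, 1e-4]]
    for channels in ([0, 0], [0, 1]):
        s = sir_db(0, power, gain, channels, noise=1e-13)
        print(channels, round(s, 2), best_mode(s))
```

In this example, moving the second link to a different channel removes the co-channel interference seen by the first link, so the first link can support a higher-rate mode.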
3. Power control. For users sharing the same frequency channel, the transmission powers affect the link quality and the interference temperature on that particular channel. The goal of power control is to adjust the transmission powers of all users to improve the link quality and to enable the group of users transmitting over the same channel to meet a certain target BER, which can be associated with the highest SIR target that can be met (the highest rate) using adaptive modulation. Given a target SIR $\gamma^*$, for a feasible system with N users, a non-negative power vector $P^*$ can be obtained by
$$P^* = (I - H)^{-1}\eta, \tag{3.1}$$
where $H = (h_{ij})$ is the normalized link gain matrix such that $h_{ij} = \gamma^* G_{ij}/G_{ii}$ for $i \ne j$ and $h_{ij} = 0$ for $i = j$, $\eta = (\eta_i)_{i=1,\dots,N}$ is the normalized noise vector such that $\eta_i = \gamma^*\sigma^2/G_{ii}$, and $\sigma^2$ is the received noise power. We consider that there are K frequency channels, and that each channel is shared by a group of users that transmit on the same frequency. Each user group determines its transmission powers via power control. Let $s_i \in \{1, 2, \dots, K\}$ denote the choice of transmitting channel for user i, $i \in N$. The power vector for the kth user group can be determined by
$$P_k^* = (I - H_k)^{-1}\eta_k, \quad \text{for } k = 1, 2, \dots, K, \tag{3.2}$$
where $H_k = (h_{ij})$ for $s_i = k$, $s_j = k$ and $i \ne j$, and $\eta_k$ is the normalized noise vector for $s_i = k$. The number of elements of $P_k^*$ is equal to the number of users who transmit on the same channel. For a feasible system, $P_k^*$ should be a non-negative vector, $P_k^*(i) > 0$, $i \in N_k$, under the assumption that the transmission power can be adjusted without limitations. However, in practice, the maximum output power of a transmitter is upper-bounded. Taking this limitation into account, the transmission power vector $P_k$ can still be determined by Eq. (3.2) but with the constraint that
$$0 \le P_k^*(i) \le P_{MAX}, \quad i \in N_k, \quad \text{for } k = 1, 2, \dots, K, \tag{3.3}$$
where $P_{MAX}$ denotes the maximum transmitter output power, which depends on the physical device and/or regulatory restrictions. Consequently, the constrained transmission power is $P_k(i) = \min\{P_k^*(i), P_{MAX}\}$, $i \in N_k$. It is clear that by selecting different transmitting channels, a user will belong to different user groups and will choose its operating power level with respect to the interference environment of that particular group. When the channels are allocated adaptively, the population and the members of these user groups will change with the current channel assignment.
4. A game theoretic formulation. Game theory represents a set of mathematical tools developed for the purpose of analyzing the interactions in decision processes. In this work, we model our channel allocation problem as a normal form game (see also [11]), which can be mathematically defined as $\Gamma = \{N, \{S_i\}_{i \in N}, \{U_i\}_{i \in N}\}$. N is the finite set of players (cognitive radios as decision makers). $S_i$ is the set of strategies associated with player i. In our case, the players' strategies are the choice of a transmitting channel, $s_i \in \{1, 2, \dots, K\}$. We define $\mathbb{S} = \times_{i \in N} S_i$ as the strategy space and $U_i: \mathbb{S} \to \mathbb{R}$ as the set of utility functions that the players associate with their strategies. For every player i in game $\Gamma$, the utility function $U_i$ is a function of $s_i$, the strategy selected by player i, and of the current strategy profile of its opponents, $s_{-i}$. The utility function characterizes a player's preference for a particular choice of strategy.
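The per-channel power computation of Section 3 (Eqs. (3.2) and (3.3)) can be illustrated with a short Python sketch. Instead of inverting $I - H_k$ directly, it uses the fixed-point iteration P <- HP + eta, a swapped-in technique that converges to the same solution when the system is feasible (spectral radius of H below 1). The normalization of H and eta, the function name `channel_powers`, and the example values are assumptions for illustration; the target SIR is a linear ratio, not dB.

```python
def channel_powers(gain, noise, target_sir, p_max=None, iters=500):
    """Approximate P* = (I - H)^{-1} eta for the users sharing one channel.

    gain[i][j] is the gain between transmitter j and receiver i.
    Assumed normalization: h_ij = target_sir * G_ij / G_ii for i != j,
    eta_i = target_sir * noise / G_ii.
    """
    n = len(gain)
    eta = [target_sir * noise / gain[i][i] for i in range(n)]
    p = eta[:]
    for _ in range(iters):
        # One sweep of P <- H P + eta, using the previous iterate for all users.
        p = [eta[i] + sum(target_sir * gain[i][j] / gain[i][i] * p[j]
                          for j in range(n) if j != i)
             for i in range(n)]
    if p_max is not None:
        p = [min(v, p_max) for v in p]   # constrained power, as in Eq. (3.3)
    return p


if __name__ == "__main__":
    gain = [[1e-4, 1e-6], [1e-6, 1e-4]]
    print(channel_powers(gain, noise=1e-13, target_sir=10.0))
    print(channel_powers(gain, noise=1e-13, target_sir=10.0, p_max=5e-9))
```

For an infeasible system the unconstrained iterates grow without bound, so in this sketch a finite `p_max` simply caps every user at the maximum power.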
In analyzing the outcome of the game, as the players make decisions independently and are influenced by the other players' decisions, we are interested in determining whether there exists a convergence point for the adaptive channel selection algorithm from which no player would deviate, i.e., a Nash equilibrium (NE). A strategy profile for the players, $S = [s_1, s_2, \dots, s_N]$, is a NE if and only if
$$U_i(S) \ge U_i(s_i', s_{-i}), \quad \forall i \in N,\ s_i' \in S_i. \tag{4.1}$$
If the equilibrium strategy profile in Eq. (4.1) is deterministic, a pure strategy Nash equilibrium exists. For finite games, even if a pure strategy Nash equilibrium does not exist, a mixed strategy Nash equilibrium can be found (the equilibrium is characterized by a set of probabilities assigned to the pure strategies).
4.1. Utility function. For our joint channel allocation and power control game, the utility function should characterize the preference of a user for a particular channel, given that the user knows that power control is employed by all the users sharing a given channel. Considering that users are willing to cooperate to achieve a fair allocation of resources, we impose that the utility function must account for both the interference perceived by the current user and the interference that particular user is creating to neighboring users sharing the same channel. A possible choice for the utility function may be
$$U_i(s_i, s_{-i}) = -\sum_{j \ne i,\, j=1}^{N} p_j(s_j)\,G_{ij}\,f(s_j, s_i) - \sum_{j \ne i,\, j=1}^{N} p_i(s_i)\,G_{ji}\,f(s_i, s_j), \quad \forall i = 1, 2, \dots, N. \tag{4.2}$$
For the above definition, we denote $P = [p_1, p_2, \dots, p_N]$ as the transmission powers for the N radios and $S = [s_1, s_2, \dots, s_N]$ as the strategy profile. The transmission powers depend on the strategy profile S (the channel allocation), as discussed in Section 3. $f(s_i, s_j)$ is an interference function defined as
$$f(s_i, s_j) = \begin{cases} 1 & \text{if } s_j = s_i \text{ (transmitters } j \text{ and } i \text{ choose the same channel)}, \\ 0 & \text{otherwise.} \end{cases}$$
Given that the above utility function accounts for both the interference measured at the current user's receiver, and the interference created by the user to others, the algorithm implementation for the channel selection becomes more complex, since it will require probing packets on a common access channel for measuring and estimating the interference a user will create to neighboring radios.
4.2. Potential game formulation and equilibrium convergence. In the previous section we defined a utility function and discussed its physical motivation. We now study the mathematical properties required of this function in order to obtain good convergence properties for the adaptation algorithm. It has been shown that certain classes of games converge to a Nash equilibrium when a best or better response adaptive strategy is employed; see Refs. [14-18]. In what follows, we show that for the utility function defined in Eq. (4.2), an exact potential game can be formulated, which ensures that a pure strategy Nash equilibrium solution is reached for the joint power control and channel selection algorithm. Characteristic of a potential game is the existence of a potential function that exactly reflects any unilateral change in the utility function of any player. The potential function models the information associated with the improvement paths of a game instead of the exact utility of the game [15]. An exact potential function is defined as a function $F_p: \mathbb{S} \to \mathbb{R}$ with the property that, for all i and all $s_i, s_i' \in S_i$,
$$U_i(s_i, s_{-i}) - U_i(s_i', s_{-i}) = F_p(s_i, s_{-i}) - F_p(s_i', s_{-i}). \tag{4.3}$$
If a potential function can be defined for a game, the game is an exact potential game. In an exact potential game, for a change in actions of a single player, the change in the potential function is equal to the value of the improvement deviation. Any potential game in which players take actions sequentially converges to a pure strategy Nash equilibrium that maximizes the potential function. For our previously formulated joint power control and channel allocation game with utility function $U_i$, we can define an exact potential function to be
$$F_p(S) = F_p(s_i, s_{-i}) = \sum_{i=1}^{N}\Big(-\alpha\sum_{j \ne i,\, j=1}^{N} p_j(s_j)\,G_{ij}\,f(s_j, s_i) - (1-\alpha)\sum_{j \ne i,\, j=1}^{N} p_i(s_i)\,G_{ji}\,f(s_i, s_j)\Big), \quad \forall i = 1, 2, \dots, N, \quad 0 < \alpha < 1. \tag{4.4}$$
The function in Eq. (4.4) essentially reflects the network utility. It can thus be seen that the potential game property Eq. (4.3) ensures that an increase in individual users' utilities contributes to the increase of the overall network utility. Without loss of generality, in this work we set $\alpha = 0.5$. See Appendix A for a proof that the function defined in Eq. (4.4) is an exact potential function.
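The exact potential property (4.3) can be checked numerically in a simplified setting. The sketch below evaluates the utility (4.2) and the potential (4.4) on random instances and measures the largest mismatch between the change in a user's utility and the change in the potential after a unilateral channel switch. It assumes fixed transmission powers that do not depend on the channel allocation, which is a simplification of the model in the text (where powers follow from power control); the function names and the random gains are hypothetical.

```python
import random


def utility(i, s, power, gain):
    """U_i of Eq. (4.2) with fixed powers: minus (interference received + caused)."""
    others = [j for j in range(len(s)) if j != i and s[j] == s[i]]
    received = sum(power[j] * gain[i][j] for j in others)
    caused = sum(power[i] * gain[j][i] for j in others)
    return -(received + caused)


def potential(s, power, gain, alpha=0.5):
    """F_p of Eq. (4.4) with fixed powers."""
    total = 0.0
    for i in range(len(s)):
        others = [j for j in range(len(s)) if j != i and s[j] == s[i]]
        received = sum(power[j] * gain[i][j] for j in others)
        caused = sum(power[i] * gain[j][i] for j in others)
        total -= alpha * received + (1.0 - alpha) * caused
    return total


def max_potential_gap(n=6, k=3, trials=200, seed=0):
    """Largest |dU_i - dF_p| over random unilateral channel changes, cf. Eq. (4.3)."""
    rng = random.Random(seed)
    power = [rng.uniform(0.5, 2.0) for _ in range(n)]
    gain = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(n)]
    worst = 0.0
    for _ in range(trials):
        s = [rng.randrange(k) for _ in range(n)]
        i = rng.randrange(n)
        t = s[:]
        t[i] = rng.randrange(k)          # user i unilaterally switches channel
        du = utility(i, t, power, gain) - utility(i, s, power, gain)
        df = potential(t, power, gain) - potential(s, power, gain)
        worst = max(worst, abs(du - df))
    return worst


if __name__ == "__main__":
    print("largest |dU - dF|:", max_potential_gap())
```

Under the fixed-power assumption the mismatch is zero up to floating-point rounding; the general case with channel-dependent powers is the subject of the proof in Appendix A and is not verified by this sketch.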
We note that the above property holds only if users take actions sequentially, following a best response strategy. Consequently, a certain coordination among users should be implemented in the distributed algorithm. The easiest way to implement a certain level of coordination is to allow users to play the game only when they win a Bernoulli trial with probability $p = 1/N$, where N is the number of users currently in the system. Another observation is that the evaluation of the utility function in Eq. (4.2) implies that an estimate of the power vector should be computed for each considered configuration of users on channels, in order to determine the interference power. For this purpose, knowledge of the channel gains for all users should be available; it can be obtained by channel probing, measurements, and exchanging control messages in the network.
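The coordination mechanism described above can be sketched as a best-response loop in which each user attempts to act with probability 1/N per round. The sketch makes several assumptions that are not in the text: transmission powers are held fixed rather than recomputed by power control, a round is used only when exactly one user wins its Bernoulli trial (one simple way to keep updates sequential), and the names `best_response_dynamics` and `is_nash` are hypothetical.

```python
import random


def utility(i, s, power, gain):
    """U_i of Eq. (4.2) with fixed powers: minus (interference received + caused)."""
    others = [j for j in range(len(s)) if j != i and s[j] == s[i]]
    return -sum(power[j] * gain[i][j] + power[i] * gain[j][i] for j in others)


def with_channel(s, i, c):
    """Copy of profile s in which user i uses channel c."""
    t = list(s)
    t[i] = c
    return t


def best_response_dynamics(power, gain, k, max_rounds=10000, seed=0, tol=1e-12):
    """Sequential best response; each user tries to act with probability 1/N per round."""
    rng = random.Random(seed)
    n = len(power)
    s = [rng.randrange(k) for _ in range(n)]      # random initial channel assignment
    for _ in range(max_rounds):
        winners = [i for i in range(n) if rng.random() < 1.0 / n]
        if len(winners) != 1:
            continue                              # assumed rule: skip unless exactly one winner
        i = winners[0]
        best = max(range(k), key=lambda c: utility(i, with_channel(s, i, c), power, gain))
        if utility(i, with_channel(s, i, best), power, gain) > utility(i, s, power, gain) + tol:
            s[i] = best
    return s


def is_nash(s, power, gain, k, tol=1e-12):
    """True if no user can improve its utility by a unilateral channel change."""
    return all(utility(i, s, power, gain) >= utility(i, with_channel(s, i, c), power, gain) - tol
               for i in range(len(s)) for c in range(k))


if __name__ == "__main__":
    rng = random.Random(1)
    n, k = 6, 3
    power = [1.0] * n
    gain = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(n)]
    s = best_response_dynamics(power, gain, k)
    print(s, is_nash(s, power, gain, k))
```

Because each accepted move strictly increases the potential under the fixed-power assumption, the loop can only make finitely many changes, which is why a deterministic channel assignment is eventually held.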
5. Performance evaluation. In this section, we present some numerical results to illustrate the performance of the proposed adaptive channel allocation and power control algorithm for both the non-constraint and the constraint power control scenarios. For simulation purposes, we consider a fixed wireless ad hoc network (as described in the system model section) with N = 20 and D = 1000 (20 transmitters and their receivers are randomly distributed over a 1000 m x 1000 m square area). The joint adaptation algorithm is illustrated for a network of 20 transmitting radios sharing K = 4 available channels. A random channel assignment is selected as the initial assignment. For a fair performance comparison, all the simulations start from the same initial channel allocation. The BER requirement is $10^{-3}$ and the noise power $\sigma^2$ is set to $10^{-13}$. For the numerical results, a path loss coefficient of n = 2 was selected. From the simulation results, we can see that the proposed adaptive channel allocation and power control game preserves the convergence property of the cooperative spectrum sharing algorithm proposed in [11] for fixed power transmission. Both the non-constraint and the constraint power control cases converge to a pure strategy Nash equilibrium (a deterministic channel assignment), but they reach different equilibrium points. As an example, Fig. 2 illustrates the convergence property for the joint power control and channel allocation algorithm when no constraints on the maximum transmission power are imposed. As performance measures for the proposed algorithm we consider the achieved SIRs and throughputs (adaptive modulation is used to ensure a certain BER target, as previously explained in Section 2). We consider the average performance per user as well as the variability in the achieved performance (fairness), measured in terms of variance. We also consider the total transmission power as a measure related to the energy efficiency.
We also study the improvement in performance that can be achieved by jointly optimizing over channels and powers (JCAPC), compared with the case in which only power control is used, given a random allocation of the channels (Power Control, No Channel Allocation (PC_NCA)), and with the case in which only channel adaptation is employed at a fixed transmission power level (Channel Allocation, No Power Control (CA_NPC)).

FIG. 2. Potential game: convergence of users' strategies (the strategies taken by the nodes versus T, the number of trials).

We first illustrate the results for unconstrained power control. We assume that all the users have the same initial transmission power. We examine initial power levels of $1.5 \times 10^{-5}$, $1.5 \times 10^{-6}$, $1.5 \times 10^{-7}$, $1.5 \times 10^{-8}$, $1.5 \times 10^{-9}$, and $1.5 \times 10^{-10}$ W. We find that the performance of the algorithm with unconstrained power control is independent of the initial power level. As an example, in Fig. 3, we show the histograms of the users' achieved SIRs for a) the initial random channel assignment with fixed transmission power; b) PC_NCA; c) CA_NPC; d) JCAPC, with an initial transmission power of $1.5 \times 10^{-7}$ W. For PC_NCA, power control reduces the interference temperature by adjusting the transmission power of neighboring nodes, and consequently improves the SIRs of the users who suffer from poor link conditions, such that no user has an SIR below 7 dB. In CA_NPC, adaptive channel allocation provides further improvement by creating a better frequency reuse allocation even without power control: no user has an SIR below 10 dB. When channel allocation and power control are employed jointly, however, the advantage is clear: the distribution of the users' achieved SIRs is concentrated around a mean value (with very low variance), with almost all the users maintaining an achieved SIR around 20 dB, which demonstrates fair and efficient spectrum sharing.
FIG. 3. Histogram of users' achieved SIRs: a) initial state, b) PC_NCA, c) CA_NPC, d) JCAPC; initial transmission power = $1.5 \times 10^{-7}$ W.
The performance in terms of the normalized achievable throughput at each receiver is similar, as illustrated in Fig. 4. By exploiting both channel allocation and power control in the proposed cooperative spectrum sharing algorithm, a fairer allocation of the throughput is achieved throughout the network. The users that initially had low performance improve noticeably, at the expense of a slight penalty for the users with initially high throughput.

We further investigated the performance of the proposed algorithm under a constraint on the maximum data transmission power (JCACPC). In the simulation, all the users initially operate at the maximum data transmission power. The range of power levels used as initial powers in the study of JCAPC is reused here as the range of maximum data transmission powers, in order to examine how different power limits affect the performance of JCACPC. In Fig. 5, we illustrate the histogram of achieved SIRs with the maximum data transmission power set to $1.5 \times 10^{-7}$ W. The figure shows that CPC_NCA, with its limit on transmission power and without the help of frequency reuse planning, performs worse in terms of achieved SIR. This happens because the benefits of power control are limited by the maximum transmission power constraint.
FIG. 4. Histogram of users' normalized throughputs: a) initial state, b) PC_NCA, c) CA_NPC, d) JCAPC; initial transmission power = $1.5 \times 10^{-7}$ W.

FIG. 5. Histogram of users' achieved SIRs: a) initial state, b) CPC_NCA, c) CA_NPC, d) JCACPC; maximum transmission power = $1.5 \times 10^{-7}$ W.
FIG. 6. Histogram of users' achieved SIRs of the JCACPC algorithm with various maximum data transmission powers: a) Pmax = 1.5e-8 W, b) Pmax = 1.5e-9 W, c) Pmax = 1.5e-10 W.
The proposed JCACPC algorithm maintains its advantages and demonstrates a performance close to that of JCAPC in Fig. 3. However, its performance depends on the upper bound of the transmission power. In Fig. 6, we illustrate how the distribution of users' achieved SIRs evolves as the upper bound on the transmission power drops from $1.5 \times 10^{-7}$ W to $1.5 \times 10^{-10}$ W. The JCACPC algorithm gradually degenerates into the CA_NPC algorithm as the maximum transmission power decreases. When the upper bound drops to $1.5 \times 10^{-10}$ W or below, JCACPC performs the same as CA_NPC, which means that constrained power control no longer contributes to the algorithm. A similar trend can be observed for the throughput in Fig. 7 and Fig. 8.

The reason is that when the power upper bound is relatively high, the joint channel allocation and power control algorithm has more freedom to adjust the users' powers and to benefit from the optimal power allocation. The transmission power distribution at the equilibrium point in this case is shown in Fig. 9. As the upper bound drops, some of the users may have to transmit at the maximum power and still fail to meet the target, as shown in Fig. 10. To some extent, the adaptive channel allocation can compensate for part of the loss of performance. However, as the transmission power constraint becomes stricter, more and more users are unable to meet the target, and consequently the performance degrades. Eventually, when the upper bound drops below a certain value, $1.5 \times 10^{-10}$ W in this example, all the users are forced to transmit at the same maximum power (see Fig. 11), and JCACPC yields the same performance as CA_NPC.

FIG. 7. Histogram of users' normalized throughputs: a) initial state, b) CPC_NCA, c) CA_NPC, d) JCACPC; maximum transmission power = $1.5 \times 10^{-7}$ W.

FIG. 8. Histogram of users' normalized throughputs of the JCACPC algorithm with various maximum data transmission powers: a) Pmax = 1.5e-8 W, b) Pmax = 1.5e-9 W, c) Pmax = 1.5e-10 W.

FIG. 9. Transmission power distribution at the equilibrium point of JCACPC; maximum transmission power = $1.5 \times 10^{-7}$ W.

FIG. 10. Transmission power distribution at the equilibrium point of JCACPC; maximum transmission power = $1.5 \times 10^{-8}$ W.

FIG. 11. Transmission power distribution at the equilibrium point of JCACPC; maximum transmission power = $1.5 \times 10^{-10}$ W.

In Fig. 12, we compare the energy consumption, in terms of total transmission power, for all five scenarios: CA_NPC, PC_NCA, CPC_NCA, JCAPC and JCACPC. We find that the energy consumption of CA_NPC increases linearly with the initial fixed transmission power. The total transmission power of CPC_NCA shows the same trend. PC_NCA and JCAPC show constant energy consumption across initial power levels, but PC_NCA may converge to a point with much higher energy consumption because it does not adapt the channel selection, which may require some users to use high powers to overcome the interference on their current channel.

FIG. 12. Total transmission power vs. maximum transmission power (constrained power control) or initial transmission power (other cases).

In Fig. 13 we summarize the performance comparison among all five cases in terms of average throughput per user and variance of the throughput per user. The variance quantifies fairness, with the fairest scheme having the lowest variance.

FIG. 13. Average throughput and variance of the throughput per user for all the cases.

6. Conclusion. In this work, we have proposed a game theoretic solution for joint channel selection and power allocation in a cognitive ad hoc network. Based on channel probing and measurements, the users extract information to independently assess their channel preferences and compute their optimal transmission power. We prove that the distributed algorithm can be modeled as an exact potential game, which is guaranteed to converge to a pure strategy Nash equilibrium, i.e., to a deterministic selection of channels and powers for all users. Our simulation results quantify the performance gain obtained by the proposed joint algorithm compared with employing only power control for a fixed channel allocation, or with adaptively choosing the channels at a fixed transmission power level.
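APPENDIX A. Proof of the exact potential function. Suppose the potential function of game $\Gamma$ is defined as in Eq. (4.4):

$$F_p(S) = \sum_{i=1}^{N}\Big(-a\sum_{j=1,\,j\neq i}^{N} p_j(s_j)\,G_{ij}\,f(s_j,s_i) - (1-a)\sum_{j=1,\,j\neq i}^{N} p_i(s_i)\,G_{ji}\,f(s_i,s_j)\Big),$$

where $0 < a < 1$. Then for all $i \in \{1, 2, \ldots, N\}$,

$$\begin{aligned}
F_p(s_i, S_{-i}) ={}& -a\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - (1-a)\sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j) \\
&+ \sum_{k=1,\,k\neq i}^{N}\Big(-a\sum_{j=1,\,j\neq k}^{N} p_j(s_j)G_{kj}f(s_j,s_k) - (1-a)\sum_{j=1,\,j\neq k}^{N} p_k(s_k)G_{jk}f(s_k,s_j)\Big) \\
={}& -a\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - (1-a)\sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j) \\
&+ \sum_{k=1,\,k\neq i}^{N}\big(-a\,p_i(s_i)G_{ki}f(s_i,s_k)\big) + \sum_{k=1,\,k\neq i}^{N}\big(-(1-a)\,p_k(s_k)G_{ik}f(s_k,s_i)\big) \\
&+ \sum_{k=1,\,k\neq i}^{N}\Big(-a\sum_{j=1,\,j\neq k,\,j\neq i}^{N} p_j(s_j)G_{kj}f(s_j,s_k) - (1-a)\sum_{j=1,\,j\neq k,\,j\neq i}^{N} p_k(s_k)G_{jk}f(s_k,s_j)\Big).
\end{aligned}$$

Let

$$Q(S_{-i}) = \sum_{k=1,\,k\neq i}^{N}\Big(-a\sum_{j=1,\,j\neq k,\,j\neq i}^{N} p_j(s_j)G_{kj}f(s_j,s_k) - (1-a)\sum_{j=1,\,j\neq k,\,j\neq i}^{N} p_k(s_k)G_{jk}f(s_k,s_j)\Big).$$

Then,

$$\begin{aligned}
F_p(s_i, S_{-i}) ={}& -a\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - (1-a)\sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j) \\
& -a\sum_{k=1,\,k\neq i}^{N} p_i(s_i)G_{ki}f(s_i,s_k) - (1-a)\sum_{k=1,\,k\neq i}^{N} p_k(s_k)G_{ik}f(s_k,s_i) + Q(S_{-i}).
\end{aligned}$$

Substituting k with j,

$$F_p(s_i, S_{-i}) = -(a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - (a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j) + Q(S_{-i}).$$

If user i changes its strategy from $s_i$ to $s'_i$, we get:

$$\begin{aligned}
F_p(s'_i, S_{-i}) ={}& -a\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s'_i) - (1-a)\sum_{j=1,\,j\neq i}^{N} p_i(s'_i)G_{ji}f(s'_i,s_j) \\
& -a\sum_{k=1,\,k\neq i}^{N} p_i(s'_i)G_{ki}f(s'_i,s_k) - (1-a)\sum_{k=1,\,k\neq i}^{N} p_k(s_k)G_{ik}f(s_k,s'_i) + Q(S_{-i}) \\
={}& -(a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s'_i) - (a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_i(s'_i)G_{ji}f(s'_i,s_j) + Q(S_{-i}).
\end{aligned}$$

Here $Q(S_{-i})$ is not affected by the strategy change of user i. Hence,

$$\begin{aligned}
F_p(s'_i, S_{-i}) - F_p(s_i, S_{-i}) ={}& -(a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s'_i) - (a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_i(s'_i)G_{ji}f(s'_i,s_j) \\
& - \Big(-(a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - (a + (1-a))\sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j)\Big) \\
={}& -\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s'_i) - \sum_{j=1,\,j\neq i}^{N} p_i(s'_i)G_{ji}f(s'_i,s_j) \\
& - \Big(-\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - \sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j)\Big).
\end{aligned}$$

From Eq. (4.2),

$$\begin{aligned}
U_i(s'_i, S_{-i}) - U_i(s_i, S_{-i}) ={}& -\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s'_i) - \sum_{j=1,\,j\neq i}^{N} p_i(s'_i)G_{ji}f(s'_i,s_j) \\
& - \Big(-\sum_{j=1,\,j\neq i}^{N} p_j(s_j)G_{ij}f(s_j,s_i) - \sum_{j=1,\,j\neq i}^{N} p_i(s_i)G_{ji}f(s_i,s_j)\Big), \qquad \forall i = 1, 2, \ldots, N,
\end{aligned}$$

which is the same expression as $F_p(s'_i, S_{-i}) - F_p(s_i, S_{-i})$. So, we prove that $F_p(S)$ defined in Eq. (4.4) is an exact potential function of game $\Gamma$.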
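The identity established in this appendix can also be checked numerically on a random instance. The sketch below is an illustration under stated assumptions, not part of the original proof: the interference function f is taken to be 1 when two users share a channel and 0 otherwise, the transmission powers are held fixed when a user switches channel, and all function names are hypothetical.

```python
import random


def f(c1, c2):
    """Assumed interference function: 1 if both users share a channel."""
    return 1.0 if c1 == c2 else 0.0


def utility(i, s, p, G):
    """U_i: minus the interference received by user i and caused by user i."""
    n = len(s)
    received = sum(p[j] * G[i][j] * f(s[j], s[i]) for j in range(n) if j != i)
    caused = sum(p[i] * G[j][i] * f(s[i], s[j]) for j in range(n) if j != i)
    return -received - caused


def potential(s, p, G, a=0.5):
    """F_p(S): weighted sum over all users, with 0 < a < 1."""
    n = len(s)
    total = 0.0
    for i in range(n):
        received = sum(p[j] * G[i][j] * f(s[j], s[i]) for j in range(n) if j != i)
        caused = sum(p[i] * G[j][i] * f(s[i], s[j]) for j in range(n) if j != i)
        total += -a * received - (1.0 - a) * caused
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 6, 3
    G = [[rng.random() for _ in range(n)] for _ in range(n)]
    p = [rng.random() for _ in range(n)]
    s = [rng.randrange(k) for _ in range(n)]
    i = 2
    s_new = list(s)
    s_new[i] = (s[i] + 1) % k  # unilateral deviation of user i
    print(potential(s_new, p, G) - potential(s, p, G),
          utility(i, s_new, p, G) - utility(i, s, p, G))
```

The two printed differences should agree up to floating-point rounding for any user, any deviation, and any a in (0, 1), which is exactly the exact-potential property.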
REFERENCES
[1] J. MITOLA III, Cognitive Radio: An Integrated Agent Architecture for Software Defined Radio, Doctor of Technology Dissertation, Royal Institute of Technology (KTH), Sweden, May 2000.
[2] Facilitating Opportunities for Flexible, Efficient, and Reliable Spectrum Use Employing Cognitive Radio Technologies, FCC Report and Order, FCC-05-57A1, March 11, 2005.
[3] J. MITOLA III, Cognitive Radio for Flexible Mobile Multimedia Communications, IEEE 1999 Mobile Multimedia Conference (MoMuC), November 1999.
[4] DAVID J. GOODMAN AND NARAYAN B. MANDAYAM, Network Assisted Power Control for Wireless Data, Mobile Networks and Applications, 6(5): 409-415, 2001.
[5] S. GINDE, J. NEEL, AND R. BUEHRER, Game Theoretic Analysis of Joint Link Adaptation and Distributed Power Control in GPRS, Vehicular Technology Conference, Orlando, October 2003.
[6] D. KRISHNASWAMY, Game-theoretic formulations for network-assisted resource management in wireless networks, Proceedings of IEEE Vehicular Technology Conference, Vancouver, September 2002.
[7] SHIN HORNG WONG AND IAN J. WASSELL, Application of Game Theory for Distributed Dynamic Channel Allocation, IEEE 55th Vehicular Technology Conference, Spring 2002, Birmingham, AL, pp. 404-408, May 2002.
[8] R. MENON, A. MACKENZIE, R. BUEHRER, AND J. REED, Game Theory and Interference Avoidance in Decentralized Networks, SDR Forum Technical Conference, November 15-18, 2004.
[9] J. NEEL, J.H. REED, AND R.P. GILLES, The Role of Game Theory in the Analysis of Software Radio Networks, SDR Forum Technical Conference, November 2002.
[10] J. NEEL, J.H. REED, AND R.P. GILLES, Convergence of Cognitive Radio Networks, Wireless Communications and Networking Conference, 2004.
[11] N. NIE AND C. COMANICIU, Adaptive Channel Allocation Spectrum Etiquette for Cognitive Radio Networks, IEEE Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN) 2005, Nov. 2005.
[12] S.T. CHUNG AND A.J. GOLDSMITH, Degree of Freedom in Adaptive Modulation: A Unified View, IEEE Transactions on Communications, 49(9), September 2001.
[13] J.G. PROAKIS, Digital Communications, The McGraw-Hill Companies, Inc., 2001.
[14] R. ROSENTHAL, A class of games possessing pure-strategy Nash equilibria, International Journal of Game Theory, 2: 65-67, 1973.
[15] D. MONDERER AND L. SHAPLEY, Potential Games, Games and Economic Behavior, 14: 124-143, 1996.
[16] G. ARSLAN AND J. SHAMMA, Distributed convergence to Nash equilibria with local utility measurements, in 43rd IEEE Conference on Decision and Control, pp. 1538-1543, 2004.
[17] JEFF S. SHAMMA AND GÜRDAL ARSLAN, Dynamic Fictitious Play, Dynamic Gradient Play, and Distributed Convergence to Nash Equilibria, IEEE Transactions on Automatic Control, 50(3), March 2005.
[18] SAMUEL IEONG, ROBERT McGREW, EUGENE NUDELMAN, YOAV SHOHAM, AND QIXIANG SUN, Fast and Compact: A Simple Class of Congestion Games, in Proceedings of American Association for Artificial Intelligence (AAAI), 2005.
ENABLING INTEROPERABILITY OF HETEROGENEOUS AD HOC NETWORKS
SANTOSH PANDEY* AND PRATHIMA AGRAWAL*†
Abstract. Diverse application requirements of wireless communications have resulted in multiple wireless standards. Devices using disparate wireless technologies often cannot communicate directly with each other. However, in many practical scenarios such as disaster management and public safety crises, communication among dissimilar wireless networks may become a necessity. Previous work on coverage and connectivity issues considers a homogeneous network. We investigate the problem of communication in an ad hoc heterogeneous network comprising several underlying homogeneous networks. Our proposed solution introduces special multi-interface devices, called drones, that enable communication between dissimilar networks. The optimal placement of drones with accompanying network interfaces is determined by a new placement and interface selection algorithm (PISA). PISA works in two phases. The first phase finds coarse placements for drones using a heterogeneous clustering algorithm. Then, using well-known genetic algorithms, optimal drone locations are determined based on the coarse placements derived in phase one. PISA also determines the type of interfaces for each drone. The algorithm is validated on example scenarios using MATLAB simulations. Solutions achievable using PISA result in high connectivity in the heterogeneous network with a minimal number of drones and network interfaces.

Key words. Heterogeneous network, connectivity, wireless ad hoc network, heterogeneous clustering, genetic algorithm, interface selection.
AMS(MOS) subject classifications. 91A80, 68M10, 05C40, 68W01, 91C20.
1. Introduction. In the past decade, wireless communication has evolved into an indispensable technology for critical applications such as public safety, disaster recovery, and military operations. To cater to their diverse requirements, wireless networks that use distinct frequency bands, communication technologies, end user devices, and network infrastructures have evolved. This has resulted in a large number of wireless networks that cannot directly communicate with each other. The recent Hurricane Katrina disaster highlighted the problems of communication between the wireless networks of different public safety departments. During major crises, personnel from different public organizations are brought together at a location. Examples of public safety organizations are police, fire, rescue squad and national guard. Ease of communication among them is essential to expedite their response. This calls for connectivity among the various dissimilar networks in the vicinity of a crisis location.

We consider the problem of connectivity when several homogeneous networks in a region combine to form a single heterogeneous network. For example, the police radio network and the fire department's radio network are both homogeneous wireless networks that may be independent of each other but communicate with some backbone infrastructure. During a disaster in which the backbone infrastructure is destroyed, they become isolated networks that can be combined to form a single heterogeneous network. Direct communication between dissimilar nodes in the resulting heterogeneous network may not be possible. This problem of connectivity in a heterogeneous network is not limited to public safety: wireless personal area networks, ad hoc networks, mesh networks, and sensor networks are other areas where this problem may be encountered.

A brute-force and expensive solution would be to equip each node with all additional network interfaces. Previous solutions for the public safety communication problem use a single high-end base station with all interfaces to relay communication between heterogeneous nodes^1. However, these solutions are impractical because they do not scale well with the number of interfaces and the number of underlying homogeneous networks. Moreover, in ad hoc, sensor, and mesh networks there is an additional degree of freedom: the individual nodes may reach a 'base station' via multiple hops using other nodes. This paper addresses the communication problem in such a heterogeneous ad hoc network. To the best of our knowledge, this is the first work dealing with the problem described above.

We propose a scalable low-cost solution for enabling communication in heterogeneous networks by deploying a small number of multiple-interface devices called drones. Development of such multi-interface devices is currently underway^2. A heterogeneous network comprising several groups of homogeneous networks is depicted in Fig. 1. The heterogeneous network may be formed of several groups of 802.11a, 802.11b, ultra-wide band (UWB) and WiMax nodes. It should be noted that all types of nodes may be present throughout the deployment region and not just in the areas indicated by the figure. These areas only represent their high density regions. These nodes can communicate within their own groups but cannot communicate with nodes of a different type (different groups). As shown in the figure, the drones act as bridges between different homogeneous networks. Thus the communication between two different types of nodes would be routed via single or multiple drones.

FIG. 1. Schematic representation of drone placement.

*Department of Electrical and Computer Engineering, Auburn University, Auburn, AL 36849 (pandesg
1 http://www.arinc.com/news/2005/09-22b-05.html.
2 http://www.advancenanotech.com/060228_wireless-research.html.

For a static network with no mobility of nodes, the drones may be equipped with only a few select interfaces in order to reduce the cost of each drone. For a mobile network, however, each drone may have all types of interfaces but may choose to keep only a few interfaces active in order to conserve its energy. The cost (in terms of expense or energy) of the solution increases with the number of interfaces on each drone. Thus, the objective of this work is to estimate strategic drone placements that improve network connectivity by minimizing the number of hops between any two nodes of the heterogeneous network while using a minimum number of drones and interfaces.

The set of problems to be addressed is: How many interfaces should the drones have? How should the drones be distributed? Where should they be placed to avoid network partitions? These issues are addressed by a placement and interface selection algorithm (PISA). In essence, PISA solves the drone placement problem in two phases: a coarse placement step followed by a refined placement step. In the first phase, candidate drone locations are identified using heterogeneous clustering. These locations are then used to select the drone locations and their interfaces in the second phase. Although we assume an ad hoc network, the proposed solution can be extended trivially to infrastructure-based networks, where nodes are one hop away from base stations.
The paper is organized as follows: Section 2 discusses previous work in related areas, Section 3 describes the details of PISA, Section 4 presents results of the successful application of PISA in different scenarios, and the conclusion is provided in Section 5.

2. Related work. For a homogeneous infrastructure-based network, our problem of drone placement is similar to the problem of base station placement in mobile cellular networks. In both problems, a minimum number of high-end devices (drones or base stations, respectively) are to be placed to maintain connectivity and coverage for the entire network. There are many proposed solutions for obtaining optimal base station placement for cellular network coverage. One such solution, using a combinatorial algorithm, was proposed in [7]. The initial base station locations are split into smaller groups and all base station combinations within each group are tested to obtain the best coverage. The base stations resulting in the best solutions from different groups are then merged to form new groups and the process is repeated. However, extensive computations are required as the number of base stations increases.

Evolutionary algorithms, which are adaptive heuristic search algorithms based on evolutionary ideas of natural selection and genetics, have also been used. In [8], the authors propose the use of evolutionary algorithms to generate a set of cell sites that satisfy multiple objectives (such as high coverage, low multi-coverage, etc.) of a coverage problem. In [5], the authors use a genetic algorithm, which is a type of evolutionary algorithm, to obtain the base station placements. An appropriate genetic representation was proposed in order to increase the efficiency of the genetic algorithm. A similar problem is also described for sensor networks. In [6] the authors propose techniques based on Voronoi diagrams for placing additional sensor nodes in a pre-deployed sensor network in order to improve the worst-case and best-case sensor coverage.

All of the above solutions assume a homogeneous network. Most of the previous work on heterogeneous ad hoc network topology control, such as [4], considers heterogeneous nodes to be nodes that have different transmission powers but are able to communicate with each other. In contrast, we consider heterogeneous nodes to be dissimilar nodes that cannot communicate with each other. Note that for simplicity we assume equal transmission power for all types of nodes and drone interfaces, but the solution is applicable otherwise too.

FIG. 2. PISA flowchart (input: node locations, interface type, 'k', 'p'; result: drone positions and interfaces).
3. Methodology. We assume that a fixed number of drones are to be deployed throughout the network. Our proposed algorithm, PISA, not only identifies the placements for drones but also determines the interfaces on each one of them. PISA is carried out in two steps, as shown in Fig. 2.
1. Coarse Placement: Using a heterogeneous clustering algorithm, we find regions of nodal heterogeneity within the deployment area. k clusters are formed in this step. The cluster centers are chosen as candidate locations where the drones may be placed.
2. Refined Placement: Using a genetic algorithm, we find the subset of locations from the set of candidate locations for drone placement. We assume p drones are to be placed to improve network connectivity while minimizing network partitions and the number of interfaces on each drone.

FIG. 3. Simple example to demonstrate PISA: (a) original configuration; (b) coarse placement, candidate locations marked as 'X'; (c) refined placement, drone with interface (1,2).

We present a simple example in order to outline our proposed solution. Nodes 101-105 and 201-205 in Fig. 3(a) are two groups of nodes with different interfaces. The coarse placement step identifies the heterogeneous clusters and their respective centroids (candidate locations), indicated as 301-305, as shown in Fig. 3(b). We then use a genetic algorithm to find a refined location, which is shown in Fig. 3(c). It is easy to calculate that the average number of hops between two nodes is minimum when the drone is placed at 305, i.e., the solution obtained by PISA. We explain each of the two steps in detail in the following sections.
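The hop-count comparison used in the example above can be computed with a breadth-first search over the connectivity graph. The following is a minimal sketch, assuming the network has already been reduced to an adjacency dictionary in which a drone appears as an ordinary node linked to every node it can reach on any of its interfaces; the function name and the small example graphs are hypothetical and do not reproduce the topology of Fig. 3.

```python
from collections import deque


def average_hops(adjacency):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes
    that can reach each other. Unreachable pairs are skipped, so partitions
    have to be detected separately."""
    total_hops = 0
    reachable_pairs = 0
    for source in adjacency:
        hops = {source: 0}
        queue = deque([source])
        while queue:  # standard breadth-first search from `source`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    queue.append(neighbour)
        total_hops += sum(hops.values())
        reachable_pairs += len(hops) - 1
    return total_hops / reachable_pairs if reachable_pairs else float("inf")


if __name__ == "__main__":
    # Two candidate drone positions "D" bridging nodes of types 1xx and 2xx.
    drone_at_edge = {"101": ["102"], "102": ["101", "D"],
                     "D": ["102", "201"], "201": ["D", "202"], "202": ["201"]}
    drone_in_middle = {"101": ["D"], "102": ["D"],
                       "D": ["101", "102", "201", "202"],
                       "201": ["D"], "202": ["D"]}
    print(average_hops(drone_at_edge), average_hops(drone_in_middle))
```

Evaluating this quantity for each candidate location and keeping the smallest value is one simple way to rank placements; the refined placement step described later optimizes a broader objective that also accounts for partitions and the number of interfaces.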
3.1. Coarse placement. The first step in the algorithm is to determine the coarse positions (candidate locations) for drone placement. Intuitively, a candidate location should be such that a drone placed there can interconnect the maximum number of heterogeneous nodes. Thus the problem of determining candidate locations can be translated to finding regions or clusters of heterogeneous nodes in the network. Regular clustering algorithms such as k-means or hierarchical clustering consider the distance between the nodes when forming clusters [9]. Along with the distance measure, we incorporate an additional heterogeneity criterion for cluster formation. This can be formulated as a multi-objective clustering problem in which clusters are formed based on multiple criteria (node distance and heterogeneity in our case). Multi-objective clustering techniques using evolutionary algorithms are described in [3] and [10]. However, in our case, because of the simplicity of the clustering criteria and the relaxed accuracy requirement for this step, we modify the basic k-means algorithm to obtain heterogeneous clusters. We choose k-means clustering as the underlying clustering technique since it has linear time complexity, while hierarchical clustering has quadratic time complexity [9]. The heterogeneous clustering algorithm (HCA) is described next.

3.1.1. Heterogeneous clustering algorithm (HCA). We first describe the basic k-means clustering algorithm, which forms k clusters. Initially, k clusters are formed by randomly assigning each node to one of the k clusters. The algorithm then iteratively reshuffles each node based on its Euclidean distances $(d_1, d_2, \ldots, d_k)$ from the centroids of the k clusters. The node is assigned to the cluster that is "closest" (minimum distance) to it. The algorithm converges when all the cluster elements remain unchanged in successive iterations. The modification of the "distance" parameter (the clustering objective function) that incorporates the heterogeneity criterion into the basic k-means algorithm is explained next.

Consider a network that consists of $N_t$ types of heterogeneous nodes. We define a measure of heterogeneity as follows. The parameter $n_i$ represents the heterogeneity due to a node of type $t_j$ for the $i$th cluster. It is calculated as:

$$n_i = N_i(t_j) - \frac{N_i(t - t_j)}{N_t - 1}, \qquad (3.1)$$

where $N_i(t_j)$ represents the number of nodes of type $t_j$ (the same type as the node under consideration) and $N_i(t - t_j)$ represents the number of nodes of all other types (except type $t_j$) in the $i$th cluster. Clearly, $n_i$ is positive if cluster i contains more type $t_j$ nodes than an equal-sized perfectly heterogeneous cluster^3, and vice versa. The modified distance of a node to the centroid of cluster i is denoted $d'_i$ and is calculated as in Eq. (3.2), where $d_i$ is the Euclidean distance, $n'_i$ is the normalized value of $n_i$, and $\alpha$ is the heterogeneity weighting factor. $n'_i \in (0, 1)$ is normalized considering all $n_i$, $i = 1, 2, \ldots, k$ clusters. Normalizing $n_i$ to $n'_i$ is required in order to compensate for different cluster sizes in the network. Note that when $\alpha = 0$, the heterogeneity parameter is neglected while clustering, and $\alpha = 1$ gives the heterogeneity parameter the same weight as the distance $d_i$.

3 All $N_t$ types of nodes are in equal proportion in a perfectly heterogeneous cluster.

To explain $d'_i$ intuitively, consider a node of type $t_j$ to be added to a cluster i. If the cluster has more type $t_j$ nodes than a perfectly heterogeneous cluster, i.e., $N_i(t_j) > N_i / N_t$, where $N_i$ is the total number of nodes in cluster i, then $n_i > 0$. Similarly, $n_i < 0$ if the cluster has fewer type $t_j$ nodes. Thus, the corresponding normalized $n'_i$, and hence the resultant $d'_i$, is greater in the former case than in the latter. Note that although $d'_i > d_i$ in all cases, the relative increase in $d'_i$ depends on $n'_i$. Thus, if cluster i has more type $t_j$ nodes, then $d'_i$ increases, which decreases the chance that the node is included in the cluster.
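The heterogeneity measure and its effect on the k-means assignment step can be sketched as follows. The first function follows Eq. (3.1) as described in the text. The second function is only an assumption: it takes the modified distance to be d(1 + αn'), with n' obtained by min-max normalization over the k clusters, which is consistent with the stated properties (α = 0 recovers the Euclidean distance, and the modified distance never falls below it) but is not confirmed by the source. All names are hypothetical.

```python
def heterogeneity(cluster_types, node_type, num_types):
    """n_i of Eq. (3.1): count of same-type nodes minus the count of
    other-type nodes divided by (N_t - 1)."""
    same = sum(1 for t in cluster_types if t == node_type)
    other = len(cluster_types) - same
    return same - other / (num_types - 1)


def modified_distances(distances, clusters, node_type, num_types, alpha=1.0):
    """Heterogeneity-aware distances from one node to each of the k clusters.

    ASSUMPTION: d'_i = d_i * (1 + alpha * n'_i), with n'_i the min-max
    normalized n_i across clusters. This form is a guess for illustration.
    """
    n = [heterogeneity(c, node_type, num_types) for c in clusters]
    lo, hi = min(n), max(n)
    n_norm = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in n]
    return [d * (1.0 + alpha * nn) for d, nn in zip(distances, n_norm)]


if __name__ == "__main__":
    # A type-1 node equidistant from an all-type-1 and an all-type-2 cluster.
    clusters = [[1, 1, 1], [2, 2, 2]]
    print(modified_distances([1.0, 1.0], clusters, node_type=1, num_types=2))
```

In the example the node is equally far from both centroids, yet the modified distance to the cluster that already holds three nodes of its own type is larger, so the assignment step would place it in the other cluster and increase heterogeneity there.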
3.1.2. Test example. In order to validate the heterogeneous clustering algorithm described above, we consider a simple example with 4 nodes of two types placed at the vertices of a square, as shown in Fig. 4(a). When 2 clusters are formed using the original k-means algorithm, the resultant horizontal and vertical clusters are as shown in Fig. 4(b) and 4(c), respectively. Because the horizontal and vertical distances between nodes are equal, both cases occur with equal probability. However, if heterogeneous clustering is desired, the horizontal clusters, which comprise heterogeneous nodes, are preferred over the vertical clusters, which comprise homogeneous nodes. Fig. 4(d) shows the percentage of horizontal and vertical clusters formed when clustering was carried out 1000 times using HCA with different values of $\alpha$. As seen from this figure, when heterogeneity and distance are equally weighted ($\alpha = 1$), the desired heterogeneous clusters are obtained in all runs. This clearly demonstrates that HCA successfully incorporates heterogeneity. We fix $\alpha = 1$ for the rest of the paper.

3.2. Refined placement. From the k candidate locations obtained in the previous step, we need to select p optimal locations for placing p drones. Each drone may have a different set of multiple interfaces from the total $N_t$ types of interfaces in the network. We use a genetic algorithm to determine both the drone locations and the set of interfaces for each drone.
3.2.1. Genetic algorithm (GA). Many genetic algorithm implementations have been proposed in the literature for the base station placement problem [5, 8]. We consider the basic genetic algorithm (GA) in this work [2]. In GA, many candidate solutions (the population) are considered at a time, with each solution (an individual of the population) represented as a string of binary sequences (chromosomes). The fitness of each individual is calculated using a fitness function. The individuals that are better (higher fitness value) in the previous iteration (generation) are chosen as parents to generate new individuals in the current iteration. This is done by either swapping random parts of the parent chromosomes (crossover) or by
FIG. 4. Heterogeneous clustering example. (a) Original nodes. (b) Desired horizontal clusters. (c) Undesired vertical clusters. (d) Effect of $\alpha$ on formation of heterogeneous clusters.
changing random bits in a chromosome (mutation). The resultant children replace the parents only if they have a higher fitness value than the parents. The algorithm terminates after a prespecified number of generations. We now explain the representation of individual chromosomes for our GA implementation. Our representation is based on the chromosome representation described in approach 3 of [5], since it was found to minimize the execution time and increase the fitness of the resultant solution. The drone location and its interfaces are represented as binary strings in an individual chromosome as follows. The drone location is selected from the k candidate
TABLE 1. Individual chromosome representation.

Drone   l bits   N_t bits   V_l   V_k   Interpretation
1       11101    110        29    24    drone to be placed at 24th candidate location with interfaces {1,2}
2       01110    101        14    12    drone to be placed at 12th candidate location with interfaces {1,3}
locations which can be represented by $l = \lceil \log_2(k) \rceil$ bits. Since $k \le 2^l$, we linearly scale the decimal value (say $V_l$) represented by the $l$ binary bits to map to the actual candidate location index (say $V_k$). Note that $V_l \in (0, 2^l)$ while $V_k \in (1, k)$. Thus the drone will be placed at the $V_k$-th candidate location from the k candidate locations. Thus,
\[ V_k = \mathrm{INT}\left( V_l \times \frac{k - 1}{2^l - 1} + 1 \right), \tag{3.3} \]
where $\mathrm{INT}(x)$ represents the nearest integer to $x$. We use an additional $N_t$ bits to represent the multiple interfaces for each drone. Each of these bits corresponds to a single type of interface present in the heterogeneous network. The subset of the $N_t$ bits that have value '1' represents the interfaces present on the corresponding drone. Thus a single drone can be completely represented by $l + N_t$ bits. Since we consider placing p drones, a possible solution (an individual chromosome) comprises p blocks of $l + N_t$ bits, i.e. $p(l + N_t)$ bits in all. For example, consider a heterogeneous network with $N_t = 3$ and 26 candidate locations from the coarse placement step ($k = 26$); thus $l = 5$. If 2 drones are to be placed for this network ($p = 2$), the length of the individual chromosome ($I$) will be 16. Table 1 shows the representation of an example individual chromosome $I = 1110111001110101$.

Next, we explain the fitness function used for the GA. When a multi-interface drone is added to a heterogeneous network, it increases the communication linkages in its neighborhood by interconnecting nodes of different types. Thus, the p drones in a particular solution (individual) are represented by linkages amongst different heterogeneous nodes in their respective coverage regions. The network can be represented as a graph with edges representing the communication links between nodes. These edges may be due to direct communication between homogeneous nodes or communication between heterogeneous nodes via a drone. Let this resultant graph be represented as H, with all the nodes as the vertices of the graph. We do not represent drones as additional vertices, as this facilitates the comparison of connectivity in the heterogeneous network with different numbers of drones.

The spectrum of graph H is used to measure the network connectivity as in [1]. The spectrum of a graph is represented by the eigenvalues and eigenvectors of the adjacency matrix or Laplacian matrix (the difference of the degree matrix and adjacency matrix) [11]. However, unlike [1], which uses the eigenvalue of the state transition matrix to quantify the connectivity, we use the maximum eigenvalue ($\lambda_m$) of the adjacency matrix of H. We find that this measure reflects the overall connectivity of the graph rather than local connectivity. We have validated this measure for various scenarios but do not report the results here due to lack of space. Thus, we define the fitness function for an individual as
\[ \Pi = \frac{(\lambda_m)^{\beta}}{(N_{interface}) \times (N_{partitions})}, \tag{3.4} \]
where $N_{interface}$ represents the total number of interfaces from all the p drones in an individual and $N_{partitions}$ represents the number of partitions in the graph H. $N_{partitions}$ is equal to the number of unit eigenvalues of the state transition matrix generated using the adjacency matrix of H [1]. The exponent $\beta$ was fixed to 4 by trial and error in order to give more weight to $\lambda_m$ in the calculation of $\Pi$. The fitness value increases with the connectivity across the network but decreases with the number of drone interfaces and network partitions. The fitness function tends to obtain a low-cost solution by minimizing the number of interfaces on each drone.
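The chromosome layout of Table 1 and the fitness function (3.4) can be illustrated with a minimal sketch. The function names are hypothetical and not taken from the paper. Two substitutions are made for simplicity and should be read as assumptions: the largest adjacency eigenvalue is approximated by power iteration on A + I, and the number of partitions is counted as connected components with a breadth-first search rather than through unit eigenvalues of the state transition matrix. The text does not specify how an individual with zero interfaces is scored, so the guard in `fitness` is also an assumption.

```python
from collections import deque
import math


def decode(chromosome, k, n_types):
    """Split a bit string into one (location index, interface set) pair per drone."""
    l = max(1, (k - 1).bit_length())  # l = ceil(log2(k)) location bits
    block = l + n_types
    drones = []
    for start in range(0, len(chromosome), block):
        v_l = int(chromosome[start:start + l], 2)
        # Equation (3.3); int(x + 0.5) rounds a non-negative x to the nearest integer.
        v_k = int(v_l * (k - 1) / (2 ** l - 1) + 1 + 0.5)
        bits = chromosome[start + l:start + block]
        interfaces = {i + 1 for i, bit in enumerate(bits) if bit == "1"}
        drones.append((v_k, interfaces))
    return drones


def largest_eigenvalue(adj, iterations=500):
    """Approximate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # Multiply by (A + I) so that bipartite graphs do not oscillate.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))  # Rayleigh quotient of A


def count_components(adj):
    """Count connected components (network partitions) with a BFS."""
    n = len(adj)
    seen = [False] * n
    components = 0
    for s in range(n):
        if seen[s]:
            continue
        components += 1
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and not seen[v]:
                    seen[v] = True
                    queue.append(v)
    return components


def fitness(adj, drones, beta=4):
    """Equation (3.4) for the graph H given by adj and a decoded individual."""
    n_interfaces = sum(len(ifaces) for _, ifaces in drones)
    n_interfaces = max(n_interfaces, 1)  # assumption: avoid division by zero
    return largest_eigenvalue(adj) ** beta / (n_interfaces * count_components(adj))


drones = decode("1110111001110101", k=26, n_types=3)
print(drones)  # [(24, {1, 2}), (12, {1, 3})], matching Table 1
```

In a full GA the adjacency matrix would be rebuilt for every individual from the node links plus the links created by its drones; that step depends on the node layout and is omitted here.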
3.2.2. Test example. We now consider a test example to demonstrate the validity of the proposed GA. Consider 3 columns of heterogeneous nodes as shown in Fig. 5. The nodes are placed at regular unit distance from each other, i.e. a node is a unit distance away from its north, south, east and west neighbors. The communication distance for all types of interfaces, for both the drones and the nodes, is assumed to be unit distance. However, since the nodes in adjacent columns are of different types, they cannot directly communicate with each other. Let the candidate locations be as shown in Fig. 5(a). We now use this as input to the GA in order to place 2 drones ($p = 2$). The result at the end of 300 generations is shown in Fig. 5(b). The interfaces of the respective drones are also indicated in the figure. As seen from this figure, the solution obtained by the GA has optimal placements (minimum average number of hops between any two nodes in the network) and interfaces for both the drones.

4. Test cases for PISA. We consider 2 examples to test our proposed PISA algorithm. In these examples, the location and type of nodes are known. PISA gives the solution for placements and interfaces of drones in the network.
FIG. 5. Genetic algorithm example. (a) Candidate locations. (b) Refined placement, with drones D{1,2} and D{2,3}.
4.1. Three column example. We reconsider the previous example of 3 columns of heterogeneous nodes discussed in Section 3.2.2. In this section we consider the application of PISA for obtaining the complete solution. The number of clusters, k, is varied from 4 to 8 while the number of drones to be placed, p, is varied from 2 to 4. Since a drone should have at least two different interfaces on it, any drone with fewer than 2 interfaces, i.e. fewer than 2 bits set in the $N_t$ chromosome bits, is ignored. The resultant fitness values are plotted in Fig. 6(a) under single_clustering. The drone locations corresponding to different fitness values are represented in Fig. 6(b). Note that due to the symmetry of the node distribution, locations that are mirror images of the solutions depicted in the figure would also result in the same fitness values. These are trivial and not shown in the figure. As seen from Fig. 6(a) under single_clustering, in many cases using k candidate locations obtained from a single HCA iteration results in lower fitness values. In one of the cases, $k = 7$ and $p = 3$, no solution was obtained and hence the resultant network is partitioned (Fig. 6). It is found that this is due to an inappropriate set of candidate locations obtained from HCA. To correct this, the coarse placement step is modified to incorporate multiple iterations (5 iterations in our case) of HCA. The resultant set of candidate locations is a union of
FIG. 6. Effect of multiple clustering iterations for 3 columns example. (a) Fitness values for different values of k and p (curves: single_clustering, multiple_clustering). (b) Drone placements for different fitness values: 6.45 (position W), 6.17 (position X), 4.85 (position V), 2.59 (position Z), 2.49 (no drones).
all the cluster centroids obtained from each HCA iteration. This greatly improves the fitness values of the resultant solutions for all values of k and p, as shown in Fig. 6(a) under multiple_clustering. Note that most of the solutions for different values of k and p result in drones placed at location 'W' in Fig. 6(b). This results in linkages amongst nodes {12, 13, 22, 23} and {22, 23, 32, 33}, which are within unit distance of the respective drones placed at 'W'. This is better than the previous solution of Section 3.2.2, which only resulted in links amongst nodes {13, 23} and {23,
FIG. 7. Distribution of drone placements for network formed of three types of nodes, each normally distributed with mean at C1, C2 and C3 respectively. (Legend: D{1,2,3}, D{1,2}, D{1,3}, D{2,3}; 1 sd contour.)
33}. In fact, it can be easily observed that no other placement of drones anywhere in the network would result in a lower average number of hops between two nodes than 'W' (or locations that are its mirror images). Thus, using multiple clustering iterations, almost all solutions result in an optimally connected heterogeneous network irrespective of the value of k.⁴ It is interesting to point out that all solutions, even for $p > 2$, consist of only 2 drones; the other $(p - 2)$ drones had 0 or 1 interface. Moreover, the drones between columns 1 and 2 consisted of interfaces '1' and '2', and similarly drones between columns 2 and 3 consisted of interfaces '2' and '3'. No solution had drones with all 3 interfaces.

⁴ Note that this is only valid when non-extreme values of k are considered.

4.2. Three Gaussian distribution example. In the previous example, PISA was tested with fixed node locations. We now test PISA for randomly distributed nodes. Consider a network comprising 3 types of nodes. These nodes are each Gaussian distributed with respective statistical means at C1, C2 and C3 as shown in Fig. 7. The figure also marks each statistical mean with a distinct symbol to indicate the type of nodes of each distribution. The means are chosen to form a triangular arrangement. Each of the Gaussian distributions has a standard deviation equal to 50 and a cross-covariance equal to 0. We assume that there are 50 nodes of each type; hence, a total of 150 nodes in the network. The communication distance is assumed to be 50 units for all types of nodes and drone interfaces. Intuitively, a favorable solution would be a single drone with 3 interfaces placed at the center of the triangle formed by C1, C2 and C3. We run a Monte Carlo simulation with the node locations randomly selected based on the Gaussian probability density function for each iteration.
The interconnections between similar nodes are obtained based on the communication distance amongst nodes. PISA is then applied to obtain the drone locations. Multiple clustering iterations, as explained in the previous case, are used in this case too. For each iteration, k and p are varied as {15, 20, 25, 30} and {2, 4, 6} respectively. Note that some of the combinations of k and p result in drones with 0 or 1 interface, which were ignored. The number of GA generations for this case was set to 500. In all, 26 Monte Carlo iterations are considered and the resultant drone placements are represented in Fig. 7. As seen from the figure, most of the solutions are drones with 3 interfaces which lie between the 3 means. The drones with 2 interfaces are usually between the means of the corresponding Gaussian distributions. The solution obtained from PISA thus follows the intuitive placement of a drone for this simple example.
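One iteration of the node generation used in this experiment can be sketched as follows. The numerical settings (50 nodes per type, standard deviation 50, communication distance 50) come from the text; the coordinates of the three means and the function name are hypothetical, chosen only to form a triangle, since the paper gives the means graphically in Fig. 7. The sketch covers only the sampling of nodes and the links between nodes of the same type, not PISA itself.

```python
import math
import random


def sample_network(means, nodes_per_type=50, sigma=50.0, comm_range=50.0, rng=None):
    """Draw Gaussian node positions per type and link same-type nodes in range."""
    rng = rng or random.Random()
    nodes = []  # each node is (x, y, type), with types numbered from 1
    for node_type, (mx, my) in enumerate(means, start=1):
        for _ in range(nodes_per_type):
            # Independent x and y coordinates, i.e. zero cross-covariance.
            nodes.append((rng.gauss(mx, sigma), rng.gauss(my, sigma), node_type))
    links = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            xi, yi, ti = nodes[i]
            xj, yj, tj = nodes[j]
            # Only nodes of the same type can communicate directly.
            if ti == tj and math.hypot(xi - xj, yi - yj) <= comm_range:
                links.append((i, j))
    return nodes, links


# Hypothetical means forming a triangle (not the values used in the paper).
means = [(450.0, 450.0), (550.0, 450.0), (500.0, 550.0)]
nodes, links = sample_network(means, rng=random.Random(1))
print(len(nodes), "nodes,", len(links), "same-type links")
```

Repeating this with fresh random draws and running the placement on each sample corresponds to the Monte Carlo iterations described above.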
5. Conclusion. This paper discusses the use of drones (multi-interface devices) to maintain connectivity in a heterogeneous network. The placement and interface selection for these drones is obtained by PISA. The two steps of PISA are validated via test examples. PISA applied to different test cases results in desirable solutions. It is observed that the final solution depends on the candidate locations generated during the coarse placement step. Multiple clustering iterations during this step improve the resultant solution. Also, if PISA is given more than the required number of drones to be placed, it returns the excess drones with zero interfaces. Thus, PISA solutions result in a minimum number of drones with a minimum number of interfaces on them. We are currently studying the effect of mobility of nodes on drone placements. As future work, we plan to extend genetic algorithms to automatically determine the number of clusters, k, for the coarse placement step and the number of drones, p, for the refined placement step. Various combinatorial algorithms can be investigated in the future to reduce the execution time for the refined placement step. The proposed work is useful across a wide spectrum of applications dealing with heterogeneous networks.
REFERENCES
[1] R.M. D'SOUZA, S. RAMANATHAN, AND D.T. LANG, Measuring performance of ad hoc networks using timescales for information flow, Proceedings of IEEE INFOCOM, 2 (2003), pp. 1564-1574.
[2] D.E. GOLDBERG, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Professional, 1989.
[3] J. HANDL AND J. KNOWLES, Multiobjective clustering with automatic determination of the number of clusters, Tech. Rep. TR-COMPSYSBIO-2004-02, UMIST, Manchester, August 2004.
[4] N. LI AND J.C. HOU, Topology control in heterogeneous wireless networks: Problems and solutions, Proceedings of IEEE INFOCOM, 1 (2004), pp. 232-243.
[5] K. LIESKA, E. LAITINEN, AND J. LAHTEENMAKI, Radio coverage optimization with genetic algorithms, IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, 1 (1998), pp. 318-322.
[6] S. MEGUERDICHIAN, F. KOUSHANFAR, M. POTKONJAK, AND M.B. SRIVASTAVA, Coverage problems in wireless ad-hoc sensor networks, Proceedings of IEEE INFOCOM, 3 (2001), pp. 1380-1387.
[7] A. MOLINA, G.E. ATHANASIADOU, AND A.R. NIX, The automatic location of base-stations for optimised cellular coverage: A new combinatorial approach, IEEE Vehicular Technology Conference, 1 (1999), pp. 606-610.
[8] L. RAISANEN AND R.M. WHITAKER, Multi-objective optimization in area coverage problems for cellular communication networks: Evaluation of an elitist evolutionary strategy, Proceedings of the ACM Symposium on Applied Computing (2003), pp. 714-720.
[9] M. STEINBACH, G. KARYPIS, AND V. KUMAR, A comparison of document clustering techniques, in KDD Workshop on Text Mining (2000).
[10] C.L. VALENZUELA, A simple evolutionary algorithm for multi-objective optimization (SEAMO), Proceedings of the 2002 Congress on Evolutionary Computation, 1 (2002), pp. 717-722.
[11] P. ZHU AND R.C. WILSON, A study of graph spectra for comparing graphs, British Machine Vision Conference (2005). http://www.bmva.ac.uk/bmvc/2005/papers/162/bmvc2005b.pdf.
OVERLAY NETWORKS FOR WIRELESS AD HOC NETWORKS

CHRISTIAN SCHEIDELER*

Abstract. Radio networks are widely used today. People access voice and data services via mobile phones, Bluetooth technology replaces unhandy cables by wireless links, and wireless networking is possible via IEEE 802.11 compatible network equipment. Nodes in such networks usually exchange their data packets with fixed base stations that connect them with a wired backbone. However, in applications such as search and rescue missions or environmental monitoring, no explicit communication infrastructure may be available. In this case, the wireless hosts have to organize in a so-called wireless ad hoc network. As long as all of the hosts are within transmission range of each other, the problem of exchanging information in such a network basically boils down to designing suitable medium access control protocols, but if not all hosts can directly communicate with each other, we also need suitable routing algorithms. Designing routing algorithms for wireless ad hoc networks is an extremely challenging task and is still research in progress. In this paper, we mostly focus on the simpler question of how to maintain an overlay network of wireless links between the hosts so that, as a minimum requirement, every node is reachable from every other node (i.e. the graph formed by the links is connected) as long as this is possible. Ideally, for every pair of nodes (v, w) there should also be a route from v to w with a close to minimum possible hop distance or energy consumption. The graph formed by the wireless links should also have a low degree to ensure a low maintenance cost, and it should be easy to update in case of arrivals or departures of nodes or changes in their positions. This paper will present various strategies for reaching these goals under ideal as well as (more) realistic models.

Key words. Wireless ad hoc networks, overlay networks, spanner, wireless models.

AMS(MOS) subject classifications. 68M10, 68R10, 90B18.
1. Introduction. The problem of designing an overlay network for wireless ad hoc networks has recently attracted a lot of attention. A basic requirement for these overlay network designs is that they maintain connectivity among the hosts, as long as this is possible. The most straightforward approach to achieve connectivity is to maintain a link between every pair of wireless hosts that are within transmission range of each other. However, this may require a high maintenance and update cost since the corresponding overlay network may have a high degree. Also, some links may have a high energy cost, and so a natural question is whether these can be dropped without endangering connectivity. An alternative approach would be to maintain connections only to the k nearest neighbors. However, Figure 1 demonstrates that it is easy to come up with examples in which the graph formed by the links would not be connected. So this approach does not work in general. As was shown by Xue and Kumar [30], it only works in specific cases. For example, if n hosts are distributed uniformly at random in a unit square and every host connects to more than 5.1774 log n of its nearest neighbors, then the network formed by these links is connected with a probability that tends to 1 as n increases. But connecting to fewer than 0.074 log n nearest neighbors results in almost sure disconnectivity.

FIG. 1. A counterexample for the naive approach with k = 2.

Another possible approach is that every host maintains connections to k hosts chosen uniformly at random among all hosts within its transmission range. This also does not guarantee connectivity in general but works well in certain cases. For example, Dubhashi et al. [8] recently showed that if every node has at least $\Theta(\log n)$ nodes within its transmission range, then choosing just 2 random nodes to connect to will establish connectivity almost surely.

In this paper, we focus only on approaches that guarantee connectivity no matter how the hosts are distributed, as long as this is in principle possible. Most of these approaches are based on so-called spanners, which are properly selected subgraphs of the graph of all possible connections between the wireless hosts so that the hosts are not only connected but their (hop or Euclidean) distance in that graph is closely related to their minimum (hop or Euclidean) distance when considering all possible connections. Spanners first appeared in computational geometry [10, 31], were then discovered as an interesting tool for approximating NP-hard problems [24], and have recently attracted a lot of attention in the context of routing and topology control in wireless ad hoc networks [1, 11, 12, 3, 23]. In the following, the wireless hosts are simply called nodes.

*Fakultät für Informatik, Technische Universität München, Boltzmannstr. 3, 85748 Garching b. München, Germany (phone: +49-89-289-17709, scheideler@in.tum.de). Work was done while working at the Johns Hopkins University, supported by NSF grants CCR-0311121 and CCR-0311795.
To simplify our presentation, we assume that the nodes are distributed in a perfect 2-dimensional Euclidean space, or formally, the nodes represent a set of points $V \subset \mathbb{R}^2$, but all of the approaches presented here can also be extended to higher dimensions. Given any pair of nodes $u = (u_x, u_y)$, $v = (v_x, v_y) \in \mathbb{R}^2$,
\[ \|uv\| = \sqrt{(u_x - v_x)^2 + (u_y - v_y)^2} \]
denotes the Euclidean distance between $u$ and $v$, and given any sequence of nodes $s = (u_1, u_2, \ldots, u_k)$ and any $\delta \ge 0$,
\[ \|s\|_\delta = \sum_{i=1}^{k-1} \|u_i u_{i+1}\|^\delta \]
denotes the $\delta$-cost of $s$. For any graph $G = (V, E)$, a node sequence $s = (u_1, u_2, \ldots, u_k)$ is called a path in $G$ if $(u_i, u_{i+1}) \in E$ for all $1 \le i < k$. Given any directed graph $G = (V, E)$ and any two nodes $u, v \in V$, the $\delta$-distance $d_G^\delta(u, v)$ of $u$ and $v$ in $G$ is the minimum $\delta$-cost $\|p\|_\delta$ over all paths $p$ from $u$ to $v$ in $G$. If $\delta = 0$, then $d_G^\delta(u, v)$ gives the topological (or hop) distance of $u$ and $v$ in $G$, and if $\delta = 1$, $d_G^\delta(u, v)$ gives the Euclidean distance of $u$ and $v$ in $G$. Cases with $\delta > 1$ are also of interest to us because the transmission of a packet over a distance of $r$ usually has an energy consumption that scales with $r^\delta$ for some $\delta > 1$. In reality, $\delta$ is usually in the range [2, 5], where it is closer to 2 outdoors and closer to 5 indoors.

We assume that every node has a maximum transmission range of 1, i.e., every node $u \in V$ can send messages only to nodes $v \in V$ with $\|uv\| \le 1$. From this assumption it follows that every overlay network connecting these nodes can only be a subgraph of the following graph.

DEFINITION 1.1. For any point set $V \subset \mathbb{R}^2$, the unit disk graph of $V$, called UDG($V$), is a directed graph that contains all edges $(u, v)$ with $\|uv\| \le 1$.

FIG. 2. A connected unit disk graph.

In the following, we will always assume that $V$ is chosen so that its UDG is connected and non-degenerate, i.e., there is a path in UDG($V$) between every pair of nodes and no two pairs of nodes have exactly the same Euclidean distance (see also Figure 2). The connectivity assumption is a prerequisite for our strategies below to establish a connected network among the nodes, and the non-degeneracy property will simplify the proofs. When $G$ is the UDG of $V$, we simply use $d^\delta(u, v)$ instead of $d_G^\delta(u, v)$.
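The definitions of the δ-cost, the unit disk graph and the δ-distance can be made concrete with a short sketch. The function names are hypothetical, points are plain coordinate tuples, and the δ-distance is computed with Dijkstra's algorithm, which is one standard choice and not something the text prescribes.

```python
import heapq
import math


def dist(u, v):
    """Euclidean distance ||uv|| between two points."""
    return math.hypot(u[0] - v[0], u[1] - v[1])


def delta_cost(path, delta):
    """delta-cost of a node sequence: sum of edge lengths raised to delta."""
    return sum(dist(a, b) ** delta for a, b in zip(path, path[1:]))


def unit_disk_graph(points, radius=1.0):
    """Adjacency lists of the UDG: an edge for every pair within the radius."""
    return {u: [v for v in points if v != u and dist(u, v) <= radius] for u in points}


def delta_distance(graph, source, target, delta):
    """Minimum delta-cost over all paths from source to target (Dijkstra)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + dist(u, v) ** delta
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target not reachable


points = [(0.0, 0.0), (0.8, 0.0), (1.7, 0.0)]
udg = unit_disk_graph(points)
for delta in (0, 1, 2):
    print(delta, delta_distance(udg, points[0], points[2], delta))
```

For the three collinear points above, the two end points are farther than 1 apart, so every path between them uses the middle node: the hop distance (δ = 0) is 2, the Euclidean distance in the graph (δ = 1) is about 1.7, and the energy-style cost for δ = 2 is about 0.64 + 0.81 = 1.45.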
1.1. Structure of the paper. The rest of this paper is organized as follows. First, we define different kinds of geometric spanners and provide relationships between them (Section 2.1). Afterwards, we study a general class of graphs called proximity graphs that contain many of the spanner constructions proposed for ad hoc networks (Section 2.2). Among these spanner constructions are sector-based spanners and planar spanners. Various sector-based spanners are reviewed in Section 2.3, and various planar spanners are reviewed in Section 2.4. All of these constructions are based on simple space and energy models. Namely, the nodes are distributed in a perfect 2-dimensional Euclidean space, every node has a transmission radius of 1, and the energy necessary for transmitting a message over a distance of $d$ is $d^\delta$ for some fixed constant $\delta \ge 2$. In Section 3 we show how to modify the spanner constructions in Section 2 so that they still work even under more realistic models. The paper ends with conclusions.

2. Spanners. First, we define spanners in which arbitrary pairs of nodes can, in principle, be connected by an edge (i.e., we do not limit the transmission range of nodes).

DEFINITION 2.1. Consider any finite set of nodes $V \subset \mathbb{R}^2$, and let $c \ge 1$ be any constant.
• A graph $G = (V, E)$ is called a geometric c-spanner of $V$ if for all $u, v \in V$ there exists a path $p$ from $u$ to $v$ in $G$ with $\|p\| \le c \cdot \|uv\|$. If $G$ is a geometric c-spanner, $c$ is called its stretch factor.
• $G$ is a $(c, \delta)$-power spanner of $V$ if for all $u, v \in V$ there is a path $p$ from $u$ to $v$ in $G$ with $\|p\|_\delta \le c \cdot \|uv\|^\delta$. If for all $\delta \ge 2$ there exists a constant $c$ so that $G$ is a $(c, \delta)$-power spanner, then we simply call $G$ a power spanner.
• $G$ is a weak c-spanner of $V$ if for all $u, v \in V$ there is a path $p$ from $u$ to $v$ in $G$ that is within a disk of diameter at most $c \cdot \|uv\|$.
• A graph $G = (V, E)$ is called a constrained (geometric, power, or weak) spanner of $V$ if for every pair of nodes $u, v \in V$ there is a path $p$ that, in addition to the specific requirement for the spanner type, also satisfies the condition that for every edge $e$ in $p$, $\|e\| \le \|uv\|$.

Since wireless nodes have a limited transmission range, the following spanner definitions are more relevant for ad hoc networks.
FIG. 3. Examples of a spanner, weak spanner, and power spanner.
DEFINITION 2.2. Let $V \subset \mathbb{R}^2$ be any finite set of nodes with a connected UDG.
• A graph $G = (V, E)$ is called a geometric c-spanner of UDG($V$) if for all $u, v \in V$ there exists a path $p$ from $u$ to $v$ in $G$ with $\|p\| \le c \cdot d(u, v)$.
• $G$ is a $(c, \delta)$-power spanner of UDG($V$) if for all $u, v \in V$ there is a path $p$ from $u$ to $v$ in $G$ with $\|p\|_\delta \le c \cdot d^\delta(u, v)$.
• $G$ is a weak c-spanner of UDG($V$) if for all $u, v \in V$ there is a path $p$ from $u$ to $v$ in $G$ that is within a disk of diameter at most $c \cdot d(u, v)$.

Interestingly, any constrained spanner of $V$ in which all edges of length more than 1 are removed is also a spanner of the UDG of $V$, as shown in the next theorem.

THEOREM 2.1. Any constrained geometric c-spanner / $(c, \delta)$-power spanner / weak c-spanner $G$ of $V$ restricted to edges of length at most 1 is also a geometric c-spanner / $(c, \delta)$-power spanner / weak c-spanner of the UDG of $V$.

Proof. Let $U$ be the UDG of $V$. Suppose that $G$ is a constrained $(c, \delta)$-power spanner of $V$ for some $\delta \ge 0$. Then it holds for every pair of nodes $u, v \in V$ with $\|uv\| \le 1$ that there is a path $p$ in $G \cap U$ with $\|p\|_\delta \le c \|uv\|^\delta$, because every edge of the constrained path has length at most $\|uv\| \le 1$. Now, consider an arbitrary pair $u, w \in V$, and let $p = (v_0, v_1, v_2, \ldots, v_k)$ be any path in $U$ with $v_0 = u$ and $v_k = w$ that has a $\delta$-cost of $d^\delta(u, w)$. Since $\|v_i v_{i+1}\| \le 1$ for all $i$, there is a path $p_i$ from $v_i$ to $v_{i+1}$ in $G \cap U$ with $\|p_i\|_\delta \le c \|v_i v_{i+1}\|^\delta$. Concatenating these paths, we end up with a path $p'$ with
\[ \|p'\|_\delta = \sum_{i=0}^{k-1} \|p_i\|_\delta \le \sum_{i=0}^{k-1} c \, \|v_i v_{i+1}\|^\delta = c \cdot d^\delta(u, w) . \]
Hence, $G \cap U$ is also a $(c, \delta)$-power spanner of $U$. Since a geometric c-spanner is just a $(c, 1)$-power spanner, this also proves the theorem for constrained geometric spanners.

Finally, consider the case that $G$ is a constrained weak c-spanner. Then it holds for every pair of nodes $u, v \in V$ with $\|uv\| \le 1$ that there is a path $p$ in $G \cap U$ that is within a disk of diameter at most $c \|uv\|$. Consider now an arbitrary pair $u, w \in V$, and let $p = (v_0, v_1, v_2, \ldots, v_k)$ be any path in $U$ with $v_0 = u$ and $v_k = w$ that has a Euclidean length of $d(u, w)$. Since $\|v_i v_{i+1}\| \le 1$ for all $i$, there is a path $p_i$ from $v_i$ to $v_{i+1}$ in $G \cap U$ that is within a disk of diameter at most $c \cdot \|v_i v_{i+1}\|$. Concatenating these paths, we end up with a path $p'$ that is within a disk of diameter at most $c \cdot d(u, w)$. To prove this, we need the following straightforward fact.

FACT 2.2. Any two disks of diameter $d_1$ and $d_2$ with a non-empty intersection are contained in a disk of diameter at most $d_1 + d_2$.

Using this fact in an inductive manner on the length of $p$, it follows that when replacing the paths $p_i$ in $p'$ by their disks, $p'$ is contained in a disk of diameter at most
\[ \sum_{i=0}^{k-1} c \cdot \|v_i v_{i+1}\| \le c \cdot d(u, w) . \]
□

Hence, it suffices to present and analyze algorithms for constrained spanners in order to obtain overlay networks that are also spanners of UDGs.
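For small instances, the geometric and power spanner conditions of Definition 2.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the graph is treated as undirected, and all-pairs shortest δ-costs are computed with the Floyd-Warshall algorithm. It returns the smallest c for which the graph is a (c, δ)-power spanner of the point set; for δ = 1 this is the stretch factor.

```python
import math
from itertools import combinations


def power_stretch(points, edges, delta=1.0):
    """Smallest c such that the graph is a (c, delta)-power spanner of points."""
    n = len(points)

    def length(i, j):
        return math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])

    # cost[i][j] holds the cheapest known delta-cost of a path from i to j.
    cost = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j in edges:
        w = length(i, j) ** delta
        cost[i][j] = cost[j][i] = min(cost[i][j], w)
    for m in range(n):  # Floyd-Warshall relaxation over intermediate nodes
        for i in range(n):
            for j in range(n):
                if cost[i][m] + cost[m][j] < cost[i][j]:
                    cost[i][j] = cost[i][m] + cost[m][j]
    return max(cost[i][j] / length(i, j) ** delta
               for i, j in combinations(range(n), 2))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(power_stretch(square, cycle, delta=1.0))  # geometric stretch factor
print(power_stretch(square, cycle, delta=2.0))  # power stretch for delta = 2
```

On the unit square with only the four sides as edges, opposite corners are at distance √2 but joined by a path of length 2, so the stretch factor is √2. For δ = 2 the same path costs 1 + 1 = 2, which equals the squared diagonal, so the power stretch is 1. A disconnected graph yields an infinite value.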
2.1. Relationships between spanners. Next, we study general relationships between the different kinds of spanners. All of these relationships hold for general spanners as well as constrained spanners. To keep the paper at a reasonable size, most of the proofs are left out. For detailed proofs, see [26, 28]. We start with a straightforward theorem.

THEOREM 2.3. Every graph $G = (V, E)$ that is a (constrained) geometric c-spanner is also a (constrained) weak c-spanner.

However, the theorem does not hold any more when considering power spanners.

THEOREM 2.4 ([28]). For any $\delta > 1$ there is a family of (constrained) $(c, \delta)$-power spanners which are not (constrained) weak C-spanners for any constant C.

Also, the reverse direction of Theorem 2.3 is not true, i.e., the fact that a graph is a weak spanner does not imply in general that it is also a geometric spanner.

THEOREM 2.5 ([28]). There exists a family of graphs $G = (V, E)$ with $V \subset \mathbb{R}^2$ all of which are (constrained) weak $2(\sqrt{2} + 1)$-spanners but not (constrained) geometric c-spanners for any constant c.

The next theorem studies the relationship between geometric spanners and power spanners, which is easy to show.

THEOREM 2.6. Every (constrained) geometric c-spanner is a (constrained) $(c^\delta, \delta)$-power spanner for every $\delta \ge 1$.

Hence, in order to prove that a graph is a power spanner, it suffices to prove that it is a geometric spanner. Interestingly, for $\delta \ge 2$, it even suffices to show that a graph is a weak spanner in order to prove that it is a power spanner.

THEOREM 2.7 ([28]). Let $G = (V, E)$ be a (constrained) weak c-spanner. Then $G$ is also a (constrained) $(C, \delta)$-power spanner for $\delta > 2$, where $C$ is a constant depending only on $c$ and $\delta$ (see [28] for its explicit form). It is even a power spanner for $\delta = 2$.

However, a weak c-spanner may not be a $(C, \delta)$-power spanner for any constant C if $\delta < 2$.

THEOREM 2.8 ([28]). For any $\delta < 2$ there exists a family of graphs $G = (V, E)$ with $V \subset \mathbb{R}^2$ which are (constrained) weak c-spanners for a constant c but not (constrained) $(C, \delta)$-power spanners for any constant C.

Summing up Theorems 2.3, 2.4, 2.5, 2.6, and 2.7, we obtain the following interesting relationship between the classes of all geometric spanners, weak spanners, and power spanners with $\delta \ge 2$:
\[ \text{Geometric spanners} \subset \text{Weak spanners} \subset \text{Power spanners}. \]
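The step behind Theorem 2.6 can be spelled out in one line. The derivation below is one standard way to see it and is not necessarily the argument used in [26, 28]. It assumes a path $p = (u_1, \ldots, u_k)$ from $u$ to $v$ with $\|p\| \le c\,\|uv\|$, as guaranteed by the geometric c-spanner property, and uses the elementary inequality $\sum_i a_i^\delta \le (\sum_i a_i)^\delta$ for non-negative $a_i$ and $\delta \ge 1$.

```latex
% Assumption: p = (u_1, ..., u_k) is a path from u to v with \|p\| \le c\,\|uv\|.
\begin{align*}
\|p\|_\delta
  &= \sum_{i=1}^{k-1} \|u_i u_{i+1}\|^{\delta} \\
  &\le \Bigl( \sum_{i=1}^{k-1} \|u_i u_{i+1}\| \Bigr)^{\delta}
     && \text{since } \textstyle\sum_i a_i^{\delta} \le \bigl(\sum_i a_i\bigr)^{\delta}
        \text{ for } a_i \ge 0,\ \delta \ge 1 \\
  &= \|p\|^{\delta} \;\le\; c^{\delta}\, \|uv\|^{\delta} .
\end{align*}
```

The same path is used on both sides, so if it satisfies the edge-length condition of a constrained spanner, that condition carries over unchanged.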
2.2. Proximity graphs. From our insights on spanners above it follows that it would often be sufficient to design protocols that guarantee a constrained weak $c$-spanner, whenever this is possible, because weak spanners are guaranteed to have energy-efficient paths. But how can such spanners be designed? Consider the following definition:

DEFINITION 2.3. For any node set $V \subset \mathbb{R}^2$, the graph $G = (V, E)$ is called a proximity graph of $V$ if and only if for all $u, w \in V$ it holds that
• $(u, w) \in E$ or
• there is a $v \in V$ with $(u, v) \in E$ and $\|vw\| < \|uw\|$.

For an example of a node $v$ satisfying the proximity conditions, see Figure 4. It is known that there are proximity graphs with a stretch factor as bad as $|V| - 1$ [4], but proximity graphs are always good weak spanners.

FIG. 4. Connections satisfying the RNG condition for v. (Removing the dashed connections gives a minimum set of connections satisfying the RNG condition.)

THEOREM 2.9. For any finite $V \subset \mathbb{R}^2$, every proximity graph of $V$ is a weak 2-spanner.

Proof. Let $G = (V, E)$ be any proximity graph of $V$. First we prove that $G$ is connected. A graph is connected if and only if for every pair of its nodes there is a path connecting these two nodes. So consider any pair of nodes $u, w \in V$. We distinguish between two cases:
1. $(u, w) \in E$: Then $u$ and $w$ are connected, and we are done.
2. There is a $v \in V$ with $(u, v) \in E$ and $\|vw\| < \|uw\|$: Then we use the edge $(u, v)$ and get closer to $w$ than we were before.
Since $V$ is finite, we only have to apply case 2 a finite number of times until case 1 holds. Hence, $G$ is connected. Moreover, it follows from the observation above that for any pair of nodes $u, w \in V$ there is a path $p$ that monotonically converges to $w$. Hence, $p$ is contained in a disk of diameter at most $2\|uw\|$, which proves the theorem. □

Hence, every proximity graph is also a power spanner of $V$ for every $\delta \ge 2$. To make proximity graphs useful for ad hoc networks, we consider a constrained form of proximity graphs, which are also known as relative neighborhood graphs [4].

DEFINITION 2.4. For any node set $V \subset \mathbb{R}^2$, the graph $G = (V, E)$ is called a relative neighborhood graph (RNG) of $V$ if and only if for all $u, w \in V$ it holds that
• $(u, w) \in E$ or
• there is a $v \in V$ with $(u, v) \in E$, $\|uv\| < \|uw\|$, and $\|vw\| < \|uw\|$.

It is easy to verify that relative neighborhood graphs satisfy the condition on constrained graphs we formulated for spanners in Definition 2.2. Hence, Theorems 2.1, 2.7, and 2.9 imply that relative neighborhood graphs are weak and power spanners of the UDG of $V$ for every $\delta \ge 2$. Though relative neighborhood graphs may be good weak spanners, they may not be geometric spanners or power spanners with a low cost. Two basic approaches have been pursued in the literature to obtain geometric spanners and/or power spanners with low cost:
• The nodes cut the space around them into sectors of equal angle $\theta$, where $\theta$ is sufficiently small. Such graphs are also known as $\theta$-graphs or Yao graphs [6, 25, 31].
• The nodes triangulate the space to form Delaunay-like graphs.
We first consider Yao graphs and their variants, which we also call sector-based spanners, and afterwards we study Delaunay graphs and their variants, which we also call planar spanners.

2.3. Sector-based spanners. The basic idea underlying the Yao graphs is to cut the space around each node into sectors of equal angle $\theta$ and to connect each node to the nearest neighbor in each of its sectors (see Figure 5). As we will see, this will give a relative neighborhood graph if $\theta$ is sufficiently small. For any pair of nodes $u, v$, let $C_{u,v}$ denote the sector (or cone) of $u$ containing $v$.
FIG. 5. An example of a Yao graph.
DEFINITION 2.5. Consider any finite $V \subset \mathbb{R}^2$ and let $k \in \mathbb{N}$. Suppose that the space around every node $v \in V$ is cut into $k$ sectors with angle $\theta = 2\pi/k$. Then the Yao graph $YG_\theta(V)$ of $V$ consists of the following set of edges:

$$E = \{(u,v) \mid u, v \in V \text{ and there is no } w \in V \text{ with } w \in C_{u,v} \text{ and } \|uw\| < \|uv\|\}.$$

We start with a basic property of Yao graphs.

THEOREM 2.10. If $\theta = 2\pi/k$ with $k > 6$, then $YG_\theta(V)$ is an RNG.

The theorem immediately implies that Yao graphs with $k > 6$ are weak spanners. But they are more than that, as shown in the next theorem.

THEOREM 2.11 ([25]). If $\theta = 2\pi/k$ with $k > 6$, then $YG_\theta(V)$ is a geometric spanner with stretch factor at most

$$\frac{1}{1 - 2\sin(\theta/2)}.$$

Combining this with Theorem 2.6 yields the following result.

COROLLARY 2.1. If $\theta = 2\pi/k$ with $k > 6$, then $YG_\theta(V)$ is a $(c,\delta)$-power spanner for every $\delta \ge 1$ with

$$c \le \left(\frac{1}{1 - 2\sin(\theta/2)}\right)^{\delta}.$$

A much better result was shown by Li et al. [21] for $\delta \ge 2$. We recently strengthened their result to any $\delta \ge 1$.

THEOREM 2.12 ([26]). If $\theta = 2\pi/k$ with $k > 6$, then $YG_\theta(V)$ is a $(c,\delta)$-power spanner for every $\delta \ge 1$ with

$$c \le \frac{1}{1 - (2\sin(\theta/2))^{\delta}}.$$
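To get a feeling for how the bounds of Theorem 2.11, Corollary 2.1, and Theorem 2.12 compare, the following minimal sketch evaluates them numerically. The function name `yao_bounds` and the chosen values of $k$ and $\delta$ are illustrative assumptions, not part of the cited results.

```python
import math

def yao_bounds(k, delta):
    """Evaluate the three stretch bounds for a Yao graph with k > 6 sectors.

    Returns (geometric bound, power bound via Corollary 2.1,
    power bound via Theorem 2.12) for theta = 2*pi/k.
    """
    if k <= 6:
        raise ValueError("the bounds are only stated for k > 6")
    theta = 2 * math.pi / k
    s = 2 * math.sin(theta / 2)         # s < 1 exactly when k > 6
    geometric = 1 / (1 - s)             # Theorem 2.11
    via_corollary = geometric ** delta  # Corollary 2.1
    direct = 1 / (1 - s ** delta)       # Theorem 2.12
    return geometric, via_corollary, direct

for k in (7, 8, 12):
    geo, cor, thm = yao_bounds(k, delta=2)
    print(f"k={k:2d}  geometric<={geo:.2f}  Cor. 2.1<={cor:.2f}  Thm. 2.12<={thm:.2f}")
```

For every admissible $k$ the bound of Theorem 2.12 is the smaller of the two power bounds, and the gap widens as $k$ approaches 6.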
FIG. 6. The Yao graph, the sparsified Yao graph, and the symmetric Yao graph of a point set.
The drawback of the Yao graph is that, although its out-degree is at most $k$, its in-degree may be as high as $n - 1$ (consider, for example, the disk in Figure 9 with one node in its center and all other nodes on its border). Various subgraphs of the Yao graph have been suggested to remove this drawback. We present two of them here (see also [3]).

DEFINITION 2.6. The sparsified Yao graph $SpYG_\theta(V)$ is a subgraph of $YG_\theta(V)$ with edge set

$$E = \{(u,v) \in E(YG_\theta(V)) \mid \text{for all } w \in V \setminus \{u\} \text{ with } (w,v) \in E(YG_\theta(V)) \text{ and } w \in C_{v,u}: \|vw\| > \|vu\|\}.$$

In words, for every sector of every node $v$, the sparsified Yao graph only keeps the shortest of all edges into $v$. Hence, the sparsified Yao graph has an in-degree of at most $k$ and an out-degree of at most $k$, and therefore a degree of at most $2k$.

DEFINITION 2.7. The symmetric Yao graph $SyYG_\theta(V)$ is a subgraph of $YG_\theta(V)$ with edge set

$$E = \{(u,v) \in E(YG_\theta(V)) \mid (v,u) \in E(YG_\theta(V))\}.$$

In words, the symmetric Yao graph only keeps an edge $(u, v)$ if not only $v$ is the nearest neighbor of $u$ in $C_{u,v}$ but also $u$ is the nearest neighbor of $v$ in $C_{v,u}$. Hence, the symmetric Yao graph has a degree of at most $k$. Obviously,

$$SyYG_\theta(V) \subseteq SpYG_\theta(V) \subseteq YG_\theta(V),$$

and Figure 6 shows that there are cases in which the edge sets of the different graphs are proper subsets of each other. Thus, it suffices to prove connectivity for $SyYG_\theta(V)$ in order to prove connectivity for both variants of the Yao graph.

THEOREM 2.13 ([11]). For all non-degenerate node sets $V$ and $k > 6$, $SyYG_\theta(V)$ is connected.
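The three constructions can be sketched in a few lines of Python. This is a centralized illustration under simplifying assumptions, not a distributed protocol: sector 0 is assumed to start at angle 0 for every node, ties in distance are broken by node index (so the node set is treated as non-degenerate), and all function names are hypothetical.

```python
import math

def sector(u, v, k):
    """Index (0..k-1) of the sector of u that contains v; sector 0 starts at angle 0."""
    angle = math.atan2(v[1] - u[1], v[0] - u[0]) % (2 * math.pi)
    return min(int(angle / (2 * math.pi / k)), k - 1)  # guard against rounding up to k

def yao_graph(points, k):
    """Directed edges (i, j): j is the nearest node in its sector of i."""
    edges = set()
    for i, u in enumerate(points):
        nearest = {}  # sector index -> (distance, node index)
        for j, v in enumerate(points):
            if i != j:
                s = sector(u, v, k)
                cand = (math.dist(u, v), j)
                if s not in nearest or cand < nearest[s]:
                    nearest[s] = cand
        edges.update((i, j) for _, j in nearest.values())
    return edges

def sparsified_yao(points, k):
    """Keep, per sector of each node v, only the shortest Yao edge into v."""
    best = {}  # (v, sector of v containing u) -> (distance, u)
    for u, v in yao_graph(points, k):
        key = (v, sector(points[v], points[u], k))
        cand = (math.dist(points[u], points[v]), u)
        if key not in best or cand < best[key]:
            best[key] = cand
    return {(u, v) for (v, _), (_, u) in best.items()}

def symmetric_yao(points, k):
    """Keep a Yao edge (u, v) only if (v, u) is a Yao edge as well."""
    yg = yao_graph(points, k)
    return {(u, v) for (u, v) in yg if (v, u) in yg}

def in_degree(edges, v):
    """Number of directed edges ending at node v."""
    return sum(1 for _, head in edges if head == v)

# Hypothetical instance in the spirit of Figure 9: one node in the center
# of the unit circle and m nodes on its border.
m, k = 40, 8
points = [(0.0, 0.0)] + [
    (math.cos(2 * math.pi * i / m + 0.01), math.sin(2 * math.pi * i / m + 0.01))
    for i in range(m)
]
yg, sp, sy = yao_graph(points, k), sparsified_yao(points, k), symmetric_yao(points, k)
print("in-degree of the center:", in_degree(yg, 0), "(Yao),", in_degree(sp, 0), "(sparsified)")
```

On this instance every border node selects the center as the nearest node in one of its sectors, so the in-degree of the center in the Yao graph equals the number of border nodes, whereas the sparsified variant keeps at most $k$ edges into it.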
FIG. 7. An example of a Delaunay graph.
Unfortunately, the symmetric Yao graph is not a good power spanner for any $\delta \ge 1$, which implies that it is not even a good weak spanner.

THEOREM 2.14 ([3]). The symmetric Yao graph is not a $(c,\delta)$-power spanner for any constant $c$ and any $\delta \ge 1$.

However, the sparsified Yao graph is a good weak spanner.

THEOREM 2.15 ([3]). If $k > 6$, then the sparsified Yao graph is a weak $c$-spanner with $c = \frac{1}{1 - 2\sin(\theta/2)}$.

Though the sparsified Yao graph is not a relative neighborhood graph like the original Yao graph, it is easy to check that when restricting to the UDG of $V$, the proof of Theorem 2.15 is still correct for all pairs $u, w$ with $\|uw\| \le 1$. Hence, it follows from the proof of Theorem 2.1 that the sparsified Yao graph is also a weak $c$-spanner of the UDG of $V$. Thus, Theorem 2.7 implies that it is also a power spanner of the UDG of $V$ for every $\delta \ge 2$ and therefore useful for wireless ad hoc networks.
2.4. Planar spanners. The most well-known class of planar spanners are the Delaunay graphs. The Delaunay graph of a set of points in $\mathbb{R}^2$ is equivalent to their Delaunay triangulation and the dual of their Voronoi diagram. Since the Delaunay triangulation of any point set in $\mathbb{R}^2$ is planar, the Delaunay graph is planar. In the following, let $\triangle(uvw)$ be the triangle formed by the nodes $u$, $v$, and $w$, and let $\bigcirc(uvw)$ be the unique circle through $u$, $v$, and $w$.

DEFINITION 2.8. For any $V \subset \mathbb{R}^2$, the Delaunay graph $Del(V)$ of $V$ consists of all edges $(u, v)$ that have a node $w \in V$ for which $\bigcirc(uvw)$ does not contain any other node of $V$.

For an example of a Delaunay graph see Figure 7. It is known [7, 14] that the Delaunay graph is a geometric $c$-spanner with $c = \frac{2\pi}{3\cos(\pi/6)} \approx 2.42$, but the Delaunay graph is difficult to maintain locally. Therefore, several variants of it have been proposed. The most well-known variant is the Gabriel graph.
FIG. 8. A Gabriel graph.
DEFINITION 2.9. For any $V \subset \mathbb{R}^2$, the Gabriel graph $GG(V)$ of $V$ consists of all edges $(u, w)$ with the property that there is no node $v \in V$ with

$$\|uv\|^2 + \|vw\|^2 < \|uw\|^2.$$

In words, the Gabriel graph of $V$ consists of all edges $\{u, w\}$ with the property that the open disk through $u$ and $w$ with diameter $\|uw\|$ does not contain any other node in $V$. An example of a Gabriel graph is given in Figure 8. The Gabriel graph has the following interesting properties:

THEOREM 2.16. For any $V \subset \mathbb{R}^2$, the Gabriel graph of $V$ is a relative neighborhood graph and a subgraph of the Delaunay graph of $V$.

Unfortunately, Theorem 2.5 implies that the Gabriel graph is not a geometric spanner. With better techniques one can even create a counterexample with stretch factor $\Omega(\sqrt{n})$ [21]. But Theorem 2.9 implies that the Gabriel graph is a weak 2-spanner, and even more importantly, it is an optimal power spanner for every $\delta \ge 2$.

THEOREM 2.17 ([21]). For every $\delta \ge 2$, the Gabriel graph is an optimal power spanner.

Unfortunately, the out-degree of a Gabriel graph can be as high as $n - 1$ (see Figure 9). Also, since the Gabriel graph is not a geometric spanner, one may ask whether there are locally constructible planar graphs that are geometric spanners. To investigate the latter issue, we define the following classes of graphs.

DEFINITION 2.10. A triangle $\triangle(uvw)$ satisfies the $k$-localized Delaunay property if the interior of the disk $\bigcirc(uvw)$ does not contain any node of $V$ that is a $k$-neighbor of $u$, $v$, or $w$ in $UDG(V)$, and $(u, v), (v, w), (w, u) \in UDG(V)$. Such a triangle is called a $k$-localized Delaunay triangle.

DEFINITION 2.11. The $k$-localized Delaunay graph over $V$, denoted by $LDel^{(k)}(V)$, has exactly all Gabriel edges and the edges of all $k$-localized Delaunay triangles.
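The Gabriel condition is easy to test directly. The following brute-force sketch is a centralized illustration only (it checks all nodes for each pair, whereas a local protocol would inspect only nearby nodes). It builds the Gabriel graph by checking, for every pair $\{u, w\}$, whether some third node $v$ satisfies $\|uv\|^2 + \|vw\|^2 < \|uw\|^2$. The function name and the test instance, modeled on the point set described for Figure 9, are assumptions made for illustration.

```python
import math
from itertools import combinations

def gabriel_graph(points):
    """Undirected Gabriel edges as index pairs (i, j) with i < j."""
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        u, w = points[i], points[j]
        d_uw = math.dist(u, w) ** 2
        # The edge is dropped if a third node lies in the open disk with diameter uw.
        blocked = any(
            math.dist(u, points[x]) ** 2 + math.dist(points[x], w) ** 2 < d_uw
            for x in range(len(points)) if x != i and x != j
        )
        if not blocked:
            edges.add((i, j))
    return edges

# One node in the center of the unit circle and m nodes on its border.
m = 12
points = [(0.0, 0.0)] + [
    (math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m)) for i in range(m)
]
edges = gabriel_graph(points)
center_degree = sum(1 for e in edges if 0 in e)
print("edges:", len(edges), " degree of the center:", center_degree)
```

On this instance the center keeps an edge to every border node, which illustrates why the degree of a Gabriel graph is not bounded by a constant.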
FIG. 9. Gabriel graph for the unit disk with one node in its center and all other nodes on its border.
Let the constrained Delaunay graph of a point set $V$ be defined as $UDel(V) = Del(V) \cap UDG(V)$. The following facts are known about $k$-localized Delaunay graphs.

THEOREM 2.18 ([20]). Localized Delaunay graphs have the following properties:
1. $UDel(V) \subseteq LDel^{(k)}(V)$ for all $k \ge 1$.
2. $LDel^{(k+1)}(V) \subseteq LDel^{(k)}(V)$ for all $k \ge 1$.
3. $LDel^{(2)}(V)$ is a planar graph.
4. $LDel^{(1)}(V)$ is not always planar.

Since $UDel(V)$ is a geometric $c$-spanner with $c \approx 2.42$, it follows that $LDel^{(2)}(V)$ is a geometric $c$-spanner with $c \approx 2.42$, and it is also planar. However, as mentioned above, Gabriel graphs and therefore all graphs of the localized Delaunay graph family have the problem that the degree may be very high (see Figure 9). This problem can be solved by constraining a Delaunay graph in the same way Yao graphs are constrained to sparsified Yao graphs: cut the space around each node into $k > 6$ sectors of equal angle, and accept only the connection of the closest node with an incoming edge in the original graph. Similar to the proof for the sparsified Yao graph, this gives a sparsified Delaunay graph that is still a weak spanner. Other constructions have been proposed that can even maintain a Euclidean $O(1)$-spanner, but at the cost of requiring an algorithm that may need a long time to stabilize at some solution [29].
3. From ideal to realistic models. In the previous section we saw that in the ideal world (i.e., a perfect 2-dimensional Euclidean space) it is possible to construct good spanners. But how about the real world? In the real world, many assumptions made above are not valid any more.
For example, instead of the unit disk graph, more involved models have to be chosen to model the transmission range of a node in real life. The position of a node or its distance or angle to another node may not be easy to determine. The energy consumed by transmitting a message over a distance of $d$ is not simply $d^\delta$ for some fixed $\delta \ge 2$. Also, the spanner constructions above can create very dense networks that can possibly create a lot of contention, reducing the effectiveness of wireless communication in practice. Finally, mobility has not been addressed above. We will present possible solutions to each of these problems.
3.1. Unit disk model. Certainly, the unit disk model is too simplistic to model the transmission range of wireless nodes. One alternative model would be the standard packet radio network model used in many papers on wireless broadcasting: We model the wireless medium as a graph $G = (V, E)$ where $V$ represents the set of wireless nodes and $(u, v) \in E$ if and only if $u$ is able to transmit a message to $v$. This model has two disadvantages. First of all, it is too general. Pathological cases can be constructed that would never occur in practice. It is possible, for example, to choose a graph $G$ that makes it impossible to construct a low-degree (and therefore low-contention) geometric spanner. In fact, it is easy to come up with a node distribution $V$ and graph $G$ where any geometric spanner with constant stretch factor would have to be a star graph, i.e., one node in it must have a degree of $n - 1$. Also, the packet radio network model does not allow us to say how the transmission range of a node changes when changing its transmission power.

An alternative model could be the following: We are given a set $V$ of wireless nodes that are distributed in an arbitrary way in a 2-dimensional Euclidean space. Consider any function $t$ with the property that there is a fixed constant $\gamma \in [0, 1)$ so that for any two points $p$ and $q$ in the Euclidean space,
1. $t(p, q) \in [(1 - \gamma) \cdot \|pq\|, (1 + \gamma) \cdot \|pq\|]$ and
2. $t(p, q) = t(q, p)$, i.e., $t$ is symmetric.
For any two nodes $u, v \in V$ where $u$ sends with transmission power corresponding to a value of $t_u$, $v$ is in the transmission range of $u$ if and only if $t(u, v) \le t_u$. Applying the UDG model to this new model, this means that two nodes $u$ and $v$ are within transmission range if $t(u, v) \le 1$. Thus, $t$ determines the transmission range of the nodes and $\gamma$ bounds the non-uniformity of the environment. Notice that we do not require $t$ to be monotonic in the distance or to satisfy the triangle inequality. This makes sure that our model even applies to highly irregular environments. In Figure 10, for example, the distance between $u$ and $v$ is greater than the distance between $u$ and $w$. Yet, the cost of communicating between $u$ and $w$, $t(u, w)$, is bigger than $t(u, v)$. Similar cost functions were also used in [17].
FIG. 10. The area covered by the maximum transmission range of node u is given by the shaded area. Given a maximum transmission range of 1, this means that $t(u, v) \le 1$ and $t(u, w) > 1$.
Does the cost model still allow us to construct good spanners? Yes, it does, because of condition 1 on the cost function above. This condition essentially states that $t(p, q)^\delta = \Theta(\|pq\|^\delta)$ for any constant $\delta \ge 0$. Hence, the following fact holds.

FACT 3.1. For any graph $G$ it holds that $G$ is a geometric $c$-spanner / $(c, \delta)$-power spanner / weak $c$-spanner of $V$ w.r.t. $\|\cdot\|$ if and only if $G$ is a geometric $c$-spanner / $(c, \delta)$-power spanner / weak $c$-spanner of $V$ w.r.t. $t$.

Using this fact, it is easy to verify that Theorem 2.1 still holds. Also, the relationships between the different classes of spanners in Section 2.1 still hold due to Fact 3.1. For the various spanner constructions that we suggested afterwards, we obtain the following results.

Proximity graphs. Definition 2.3 with $\|\cdot\|$ being replaced by $t$ still satisfies Theorem 2.9 since for any pair of nodes $u, w \in V$ in a proximity graph $G = (V, E)$, either $(u, w) \in E$ or $(u, v) \in E$ for some node $v$ with $t(v, w) < t(u, w)$. Hence, all nodes on a path from $u$ to $w$ lie within a transmission range of at most $t(u, w)$ around $w$.

Sector-based spanners. To make sure that the Yao graph is still a geometric spanner, an angle $\theta$ has to be chosen so that for any two nodes $u, w \in V$ it holds that either $(u, w) \in E$ or there is a node $v$ in the sector $C_{u,w}$ with $(u, v) \in E$ and $t(v, w) \le (1 - \epsilon)\, t(u, w)$ for some constant $\epsilon > 0$. For $\gamma = 0$ in the conditions for $t$ this is true for any $\theta < 2\pi/6$. If $\gamma > 0$, then it can be shown via trigonometric arguments (see also Figure 11) that

$$t(v, w) \le (1 + \gamma) \sqrt{\left(\frac{t(u,v)\sin\theta}{1+\gamma}\right)^2 + \left(\frac{t(u,v)}{1-\gamma} - \frac{t(u,v)\cos\theta}{1+\gamma}\right)^2}.$$

Simplifying this expression, we obtain

$$t(v, w) \le \frac{\sqrt{2(1 - \cos\theta) + 2\gamma^2 (1 + \cos\theta)}}{1 - \gamma} \cdot t(u, v).$$
FIG. 11. Node u has a connection to v because $t(u, v) < t(u, w)$, but w may be closer to u than v if $\gamma > 0$.

FIG. 12. The typical street corner problem. Nodes v and w and nodes v' and w' are within transmission range of each other, but v and w cannot reach any one of v' and w', and vice versa.
It holds that

$$\frac{\sqrt{2(1 - \cos\theta) + 2\gamma^2 (1 + \cos\theta)}}{1 - \gamma} < 1 \quad\Longleftrightarrow\quad \cos\theta > \frac{1 + \gamma}{2(1 - \gamma)},$$

which follows by squaring both sides and dividing by $1 + \gamma$. Hence, as long as $\theta \le \arccos\!\left(\frac{1+\gamma}{2(1-\gamma)}\right) - \epsilon$ for some constant $\epsilon > 0$, which is only possible if $\gamma < 1/3$, the Yao graph construction still yields a geometric spanner of constant stretch factor. Under this condition on $\theta$, the properties of the sparsified and the symmetric Yao graph can be shown as well.

Planar spanners. The planarity condition cannot be satisfied any more because the Delaunay condition of having no node in $\bigcirc(uvw)$ can create crossing edges when applying this condition to $t$. These crossing edges can be very hard to determine (see, for example, Figure 12). Nevertheless, the Gabriel graph is still an RNG due to the condition that $t(u, v)^2 + t(v, w)^2 < t(u, w)^2$, and therefore $t(u, v) < t(u, w)$ and $t(v, w) < t(u, w)$. Hence, the Gabriel graph (and also the other Delaunay graphs presented in Section 2.4 because they are supergraphs of the Gabriel graph) is still a weak spanner.
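The effect of the non-uniformity parameter $\gamma$ on the number of sectors can be explored numerically. The sketch below evaluates the factor in front of $t(u, v)$ in the simplified bound above and searches for the smallest $k$ (with $\theta = 2\pi/k$) for which this factor drops below 1. The function names, the search limit `k_max`, and the small tolerance used to guard against floating-point rounding are assumptions made for illustration.

```python
import math

def contraction_factor(theta, gamma):
    """Factor in front of t(u, v) in the simplified bound on t(v, w)."""
    c = math.cos(theta)
    return math.sqrt(2 * (1 - c) + 2 * gamma ** 2 * (1 + c)) / (1 - gamma)

def min_sectors(gamma, k_max=10_000, tol=1e-9):
    """Smallest k with contraction_factor(2*pi/k, gamma) < 1, or None if none is found.

    The tolerance avoids accepting a boundary case (factor equal to 1) that
    merely looks smaller than 1 because of rounding.
    """
    for k in range(3, k_max + 1):
        if contraction_factor(2 * math.pi / k, gamma) < 1 - tol:
            return k
    return None

for gamma in (0.0, 0.1, 0.2):
    print(f"gamma={gamma:.1f}  smallest k: {min_sectors(gamma)}")
```

For $\gamma = 0$ the search returns 7, in line with the requirement $k > 6$ from Section 2.3. Larger values of $\gamma$ require more sectors, and `None` is returned when no $k$ up to `k_max` suffices.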
3.2. Positions, distances and angles. All spanner graph constructions can be easily done in a distributed way if every node knows its position, which is possible if it has GPS. But if the nodes are not able to determine their positions through means like GPS, then other strategies have to be used. In the following, we list some options for the various spanner constructions.

Proximity graphs. In proximity graphs, only the distance has to be computed between nodes. Here, a reasonable strategy might be to measure the signal strength and from there compute the distance based on an appropriate path loss model. This computation may not be too reliable in a non-uniform environment, but as long as the model above with the cost function $t$ can be applied (i.e., the path loss only varies by a constant factor), the distance calculation will be off by at most a constant factor. This would suffice to obtain the same properties as shown in the previous subsection.

Sector-based spanners. Here, knowing the distance alone does not suffice because it is also important to compute the angle between the nodes. Here, trilateration techniques may be used: First, $u$ determines the pairwise distances among the nodes in its neighborhood (using the technique mentioned for proximity graphs, for example), and then $u$ tries to lay out the nodes so that all distance relationships are satisfied (up to a small constant factor). Using this virtual layout of the nodes, $u$ will then cut the layout into sectors and connect to nodes according to the Yao graph rules. Again, if the model with the cost function $t$ can be applied, then a similar relationship between $\theta$ and the error in the measurement can be shown as in the previous subsection, so that the Yao graph is still a geometric spanner.

Planar spanners. Gabriel graphs do not need to compute the distance between two nodes but may directly use the signal strength. To see this, recall the Gabriel condition

$$\|uv\|^2 + \|vw\|^2 < \|uw\|^2.$$

If the path loss scales quadratically with the distance (which is true in a perfect outdoor environment), then this expression simply states that

$$e(u, v) + e(v, w) < e(u, w)$$

where $e(x, y)$ is the energy necessary to send a message from $x$ to $y$. But even if this perfect situation is not given, the energy argument still guarantees that the Gabriel graph is a weak spanner as long as we can apply the cost function $t$ above for the path loss. Since the other Delaunay graphs are supergraphs of the Gabriel graph, these will also be weak spanners.
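The step from distances to energies can be written out in one line. As an assumption for this sketch, let the energy be exactly quadratic in the distance, $e(x, y) = a \cdot \|xy\|^2$, with some constant $a > 0$ that is the same for all pairs of nodes (the symbol $a$ is introduced here only for illustration). Then

```latex
\begin{align*}
e(u,v) + e(v,w) < e(u,w)
  &\iff a\,\|uv\|^2 + a\,\|vw\|^2 < a\,\|uw\|^2 \\
  &\iff \|uv\|^2 + \|vw\|^2 < \|uw\|^2 ,
\end{align*}
```

so a node can evaluate the Gabriel condition from measured energies alone, without ever converting them into distances.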
3.3. Energy cost model. So far, we assumed a simple energy cost model, i.e., given a distance of $x$, the energy consumption scales with $f(x) = x^\delta$ for some $\delta \ge 2$. In reality, there is a fixed minimum energy consumption, so a more realistic function would be $g(x) = \max\{e_0, x^\delta\}$ for some positive constant $e_0$. In this case, short edges should be avoided because sending a message along many short edges can now be much more expensive than sending a message along a long edge. In fact, in this cost model it is not true any more that every geometric spanner or weak spanner is also a power spanner. So we need to adjust our spanner constructions for this property to be true again. For this we need the concept of a dominating set.

DEFINITION 3.1. Given a node distribution $V \subset \mathbb{R}^2$ and a distance $d$, we say that a subset $U \subseteq V$ forms a dominating set of $V$ w.r.t. $d$ if for all $v \in V$ either $v \in U$ or there is a node $u \in U$ with $\|uv\| \le d$.
Suppose now that we found a dominating set $U$ for the given node distribution with respect to distance $d_0 = \sqrt[\delta]{e_0}$. Consider the graph $G = (U, E)$ with $E$ consisting of all edges $\{u, u'\} \in U^2$ with the property that there are $v, v' \in V$ with $\|uv\| \le d_0$, $\|u'v'\| \le d_0$ and $\|vv'\| \le d_{max}$, where $d_{max}$ is the maximum transmission range of a node. Since all edges in $G$ have a length of at least $d_0$, all results about the relationship between geometric, weak and power spanners hold again for $G$ because the threshold $e_0$ is effectively washed out.

THEOREM 3.2. For any pair $\{u, v\} \in U^2$, the energy of an energy-optimal path $p_{u,v}$ in $G$ is within a constant factor of the energy of an energy-optimal path $p^*_{u,v}$ in $UDG(V)$.

Proof. Consider any energy-optimal path $p^*_{u,v}$ in $UDG(V)$ w.r.t. $g(x)$. Let $v_1, \ldots, v_k$ be the nodes traversed in this path, and let $u_i$ be any node in $U$ with $\|u_i v_i\| \le d_0$ for all $i \in \{1, \ldots, k\}$. Then the energy of the path $p_{u,v} = (v_1, u_1, \ldots, u_k, v_k)$ is within a constant factor of the energy of $p^*_{u,v}$ because of the following facts:
• for all $i$, $g(\|v_i v_{i+1}\|) = \Omega(e_0)$,
• $g(\|v_1 u_1\|) = O(e_0)$ and $g(\|u_k v_k\|) = O(e_0)$,
• for all $i$ with $\|v_i v_{i+1}\| \le d_0$, $g(\|u_i u_{i+1}\|) \le g(3 d_0) = (3 d_0)^\delta \le 3^\delta g(\|v_i v_{i+1}\|) = O(g(\|v_i v_{i+1}\|))$,
• for all $i$ with $\|v_i v_{i+1}\| > d_0$, $g(\|u_i u_{i+1}\|) \le g(\|v_i v_{i+1}\| + 2 d_0) \le (3 \|v_i v_{i+1}\|)^\delta = O(g(\|v_i v_{i+1}\|))$. □

Hence, in order to repair our spanner constructions for $g(x)$, we can do the following:
1. Choose a dominating set $U$ w.r.t. $d_0$.
2. Apply the spanner construction to $G = (U, E)$ defined above to obtain a graph $G' = (U, E')$.
3. Construct a graph $G'' = (V, E'')$ out of $G'$ by replacing every edge $\{u, u'\} \in E'$ by at most 3 edges of length at most $\min\{d_{max}, \|uu'\|\}$ and adding all edges $\{u, v\}$ with $u \in U$, $v \in V \setminus U$ and $\|uv\| \le d_0$.
Step 3 is possible because by definition there is only an edge between two nodes $u, u' \in U$ in $G$ if there are two nodes $v, v' \in V$ with $\|uv\| \le d_0$, $\|u'v'\| \le d_0$ and $\|vv'\| \le d_{max}$.
Using this construction, all spanner results in the previous section hold, apart from the planarity property. Here, one would have to take care that when connecting the nodes in the dominating set in step 3, planarity is maintained, which is possible.
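The following sketch illustrates why short edges become expensive under $g$ and how a dominating set can be picked. It is a minimal, centralized example under stated assumptions: $d_0$ is taken to satisfy $d_0^\delta = e_0$, the dominating set is chosen by a simple greedy pass (one of many possible selection rules, not the distributed protocols from the literature), and the node placement, the constants, and all function names are hypothetical.

```python
import math

def g(x, e0, delta):
    """Energy for one hop of length x when every transmission costs at least e0."""
    return max(e0, x ** delta)

def path_energy(path, e0, delta):
    """Total energy of sending a message along consecutive points of `path`."""
    return sum(g(math.dist(a, b), e0, delta) for a, b in zip(path, path[1:]))

def greedy_dominating_set(points, d):
    """Greedy pass: a node joins U if no earlier member of U is within distance d.

    Every node is then either in U or within distance d of a node in U,
    and any two members of U are more than d apart.
    """
    chosen = []
    for i, p in enumerate(points):
        if all(math.dist(p, points[j]) > d for j in chosen):
            chosen.append(i)
    return chosen

e0, delta = 400.0, 2
d0 = e0 ** (1 / delta)  # assumption: d0 chosen so that d0**delta == e0
points = [(float(i), 0.0) for i in range(101)]  # 101 nodes on a line, spacing 1

U = greedy_dominating_set(points, d0)
short_hops = path_energy(points, e0, delta)                 # 100 hops of length 1
one_hop = path_energy([points[0], points[-1]], e0, delta)   # a single long hop
via_U = path_energy([points[i] for i in U] + [points[-1]], e0, delta)

print("dominating set:", U)
print("energy: short hops", short_hops, "| one hop", one_hop, "| via U", via_U)
```

In this instance the path over many short edges is several times more expensive than a single long hop, while the path along the dominating set nodes is cheaper than both. The sketch looks only at energy and ignores the maximum transmission range $d_{max}$.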
3.4. Contention. In all spanner constructions above, all nodes have equal roles. This, however, can be bad in areas with a high node density because then a lot of nodes have to coordinate their transmissions so that eventually messages can get through. An alternative solution is to partition the nodes into two groups: normal cluster nodes and cluster leaders, which we will also call passive and active nodes in the following. All communication is scheduled by the cluster leaders, or active nodes, in the sense that they determine who is allowed to transmit a message at a certain point in time. This significantly simplifies the contention problem if there are only a few cluster leaders within a transmission range. Ideally, the active nodes should form a connected set so that they can handle all non-local communication, and for each passive node there should be an active node within transmission range so that all messages from and to passive nodes can also be forwarded by the active nodes. Finding such a set of nodes is also known as the connected dominating set problem. Various distributed protocols have already been presented to find such a set. See, for example, [2, 9, 13, 16, 19]. As long as our model above with the cost function $t$ is applicable, these protocols actually yield a connected dominating set of constant density, i.e., every node has only a constant number of dominating set nodes within its transmission range. In addition, any of the spanner constructions can be applied to the active nodes to further reduce the number of potential communication links while keeping the network connected. In this way, the effort of coordinating the transmissions between the active nodes can be kept at a low constant, no matter how many nodes there are in the network, so that the effort of scheduling message transmissions can scale to any number of nodes.

3.5. Mobility. As long as the nodes do not move around (and no node fails), the overlay network has to be constructed only once. However, if the nodes move around, then frequent updates to the overlay network may be necessary. To limit the number of updates, we need a mechanism that only connects nodes that can remain connected for a certain time. Certainly, this requirement can only be satisfied for two nodes $v$ and $w$ if the relative speed of $v$ to $w$ is small. The easiest way to take this into account is to add additional dimensions for the speed (see also [27]). Since speed in a 2-dimensional Euclidean space is a 2-dimensional vector, this would result in a 4-dimensional space. Next we investigate whether our spanner constructions can still be used in this space.

Proximity graphs. For these graphs, only the distance between two nodes is relevant, not the dimension of the space. Hence, all the properties of proximity graphs are preserved.
Sector-based graphs. For Yao graphs, the following result is known.

THEOREM 3.3 ([22, 25]). Let $V$ be any set of $n$ points in $\mathbb{R}^d$ and let $0 < \theta < \pi/3$. Then the graph $YG_\theta(V)$ is a geometric spanner for $V$ with stretch factor $\frac{1}{1 - 2\sin(\theta/2)}$. The number of cones needed to obtain an angle 3 2 of $\theta$ is O (d -1/2 ( d : ) d-1 ).

Hence, the stretch factor is the same as for the 2-dimensional case, but the number of cones to obtain a certain angle $\theta$ grows exponentially in the dimension $d$.

Planar graphs. Certainly, planarity is not relevant any more in a space of more than 2 dimensions, but the other spanner results about the various Delaunay graphs can be shown to hold as before.

4. Conclusions. In this paper we gave an overview of spanner constructions relevant for wireless ad hoc networks. We studied the performance of these constructions both in an idealized model and under more realistic assumptions. An interesting open problem is how to route efficiently in these networks when using realistic communication and mobility models. For the UDG model and static nodes, there is already a large body of work on routing protocols (see, e.g., [5, 15, 17, 18]), but most of these results heavily rely on the assumption that the given overlay network is planar. Planarity, however, is hard to achieve under more realistic models, as we saw above. So further research is necessary.

REFERENCES

[1] K. ALZOUBI, X.-Y. LI, Y. WANG, P.-J. WAN, AND O. FRIEDER. Geometric spanners for wireless ad hoc networks. IEEE Transactions on Parallel and Distributed Systems, 14(4):408-421, 2003.
[2] K. ALZOUBI, P.-J. WAN, AND O. FRIEDER. New distributed algorithm for connected dominating set in wireless ad hoc networks. In Proceedings of the Thirty-Fourth Annual Hawaii International Conference on System Science (HICSS-35). IEEE Computer Society Press, 2002.
[3] F. MEYER AUF DER HEIDE, C. SCHINDELHAUER, K. VOLBERT, AND M. GRUNEWALD. Energy, congestion and dilation in radio networks. In Proc. of the 14th ACM Symp. on Parallel Algorithms and Architectures (SPAA), pp. 230-237, 2002.
[4] P. BOSE, L. DEVROYE, W. EVANS, AND D. KIRKPATRICK. On the spanning ratio of Gabriel graphs and beta-skeletons. In Proc. of Latin American Theoretical Informatics Conference (LATIN), 2002.
[5] P. BOSE, P. MORIN, I. STOJMENOVIC, AND J. URRUTIA. Routing with guaranteed delivery in ad hoc wireless networks. In Proc. of 3rd Int. Workshop on Discrete Algorithms and Methods for Mobility (Dial-M), pp. 48-55, 1999.
[6] K.L. CLARKSON. Approximation algorithms for shortest path motion planning. In Proc. of the 19th SIGACT Symposium, 1987.
[7] D.P. DOBKIN, S.J. FRIEDMAN, AND K.J. SUPOWIT. Delaunay graphs are almost as good as complete graphs. Discrete Computational Geometry, pp. 399-407, 1990.
[8] D. DUBHASHI, O. HAGGSTROM, A. PANCONESI, AND M. SOZIO. Irrigating ad hoc networks in constant time. In Proc. of the 17th ACM Symp. on Parallel Algorithms and Architectures (SPAA), pp. 106-115, 2005.
[9] D. DUBHASHI, A. MEI, A. PANCONESI, J. RADHAKRISHNAN, AND A. SRINIVASAN. Fast distributed algorithms for (weakly) connected dominating sets and linear-size skeletons. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-03), pp. 717-724, New York, January 12-14, 2003. ACM Press.
[10] D. EPPSTEIN. Handbook of Computational Geometry, chapter Spanning trees and spanners, pp. 425-461. Elsevier, 2000.
[11] M. GRUNEWALD, T. LUKOVSZKI, C. SCHINDELHAUER, AND K. VOLBERT. Distributed maintenance of resource efficient wireless network topologies. In European Conference on Parallel Computing (EUROPAR), pp. 935-946, 2002.
[12] L. JIA, R. RAJARAMAN, AND C. SCHEIDELER. On local algorithms for topology control and routing in ad hoc networks. In Proc. of the 15th ACM Symp. on Parallel Algorithms and Architectures (SPAA), pp. 220-229, 2003.
[13] L. JIA, R. RAJARAMAN, AND R. SUEL. An efficient distributed algorithm for constructing small dominating sets. In PODC: 20th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, 2001.
[14] J.M. KEIL AND C.A. GUTWIN. Classes of graphs which approximate the complete Euclidean graph. Discrete Computational Geometry, 7:13-28, 1992.
[15] E. KRANAKIS, H. SINGH, AND J. URRUTIA. Compass routing on geometric networks. In Proc. of 11th Canadian Conference on Computational Geometry, pp. 51-54, 1999.
[16] F. KUHN, T. MOSCIBRODA, AND R. WATTENHOFER. Radio network clustering from scratch. In European Symposium on Algorithms (ESA), 2004.
[17] F. KUHN, R. WATTENHOFER, Y. ZHANG, AND A. ZOLLINGER. Geometric ad-hoc routing: Of theory and practice. In Proc. of the 22nd IEEE Symp. on Principles of Distributed Computing (PODC), 2003.
[18] F. KUHN, R. WATTENHOFER, AND A. ZOLLINGER. Asymptotically optimal geometric mobile ad-hoc routing. In Proc. of 6th Int. Workshop on Discrete Algorithms and Methods for Mobility (Dial-M), pp. 24-33, 2002.
[19] F. KUHN AND R. WATTENHOFER. Constant-time distributed dominating set approximation. In Proceedings of the 22nd Annual ACM Symposium on Principles of Distributed Computing (PODC-03), pp. 25-32, New York, July 13-16, 2003. ACM Press.
[20] X.-Y. LI, G. CALINESCU, P.-J. WAN, AND Y. WANG. Localized Delaunay triangulation with application in ad hoc wireless networks. IEEE Transactions on Parallel and Distributed Systems, 14(10):1035-1047, 2003.
[21] X.-Y. LI, P.-J. WAN, AND Y. WANG. Power efficient and sparse spanner for wireless ad hoc networks. In Proc. of IEEE International Conference on Computer Communications and Networks (ICCCN), 2001.
[22] T. LUKOVSZKI. New Results on Geometric Spanners and Their Applications. PhD thesis, University of Paderborn, 1999.
[23] R. RAJARAMAN. Topology control and routing in ad hoc networks: a survey. SIGACT News, 33(2):60-73, 2002.
[24] S.B. RAO AND W.D. SMITH. Approximating geometrical graphs via spanners and banyans. In Proc. of the 30th ACM Symp. on Theory of Computing (STOC), pp. 540-550, 1998.
[25] J. RUPPERT AND R. SEIDEL. Approximating the d-dimensional complete Euclidean graph. In Proc. of the 3rd Canadian Conference on Computational Geometry (CCCG), pp. 207-210, 1991.
[26] C. SCHEIDELER. Performance Analysis of Mobile Ad Hoc Networks, chapter Overlay Networks for Wireless Systems. Nova Science Publishers, to appear in 2005.
[27] C. SCHINDELHAUER, T. LUKOVSZKI, S. RUHRUP, AND K. VOLBERT. Worst case mobility in ad hoc networks. In Proc. of the 13th ACM Symp. on Parallel Algorithms and Architectures (SPAA), pp. 230-239, 2003.
[28] C. SCHINDELHAUER, K. VOLBERT, AND M. ZIEGLER. Spanners, weak spanners, and power spanners for wireless networks. In Proc. of 15th Annual International Symposium on Algorithms and Computation (ISAAC '04), pp. 805-821, 2004.
[29] Y. WANG AND X.-Y. LI. Localized construction of bounded degree and planar spanner for wireless ad hoc networks. In DIALM-POMC '03: Proceedings of the 2003 Joint Workshop on Foundations of Mobile Computing, pp. 59-68, 2003.
[30] F. XUE AND P.R. KUMAR. The number of neighbors needed for connectivity of wireless networks. Wireless Networks, 10(2):169-181, 2004.
[31] A. C.-C. YAO. On constructing minimum spanning trees in k-dimensional space and related problems. SIAM Journal on Computing, 11:721-736, 1982.
DIMENSIONALITY REDUCTION, COMPRESSION AND QUANTIZATION FOR DISTRIBUTED ESTIMATION WITH WIRELESS SENSOR NETWORKS*
IOANNIS D. SCHIZAS†, ALEJANDRO RIBEIRO†, AND GEORGIOS B. GIANNAKIS†
Abstract. The distributed nature of observations collected by inexpensive wireless sensors necessitates transmission of the individual sensor data under stringent bandwidth and power constraints. These constraints motivate: i) a means of reducing the dimensionality of local sensor observations; ii) quantization of sensor observations prior to digital transmission; and iii) estimators based on the quantized digital messages. These three problems are addressed in the present paper. We start by deriving linear estimators of stationary random signals based on reduced-dimensionality observations. For uncorrelated sensor data, we develop mean-square error (MSE) optimal estimators in closed form, while for correlated sensor data, we derive sub-optimal iterative estimators which guarantee convergence at least to a stationary point. We then determine lower and upper bounds for the Distortion-Rate (D-R) function and a novel alternating scheme that numerically determines an achievable upper bound of the D-R function for general distributed estimation using multiple sensors. We finally derive distributed estimators based on binary observations along with their fundamental error-variance limits for pragmatic signal models including: i) known univariate but generally non-Gaussian noise probability density functions (pdfs); ii) known noise pdfs with a finite number of unknown parameters; and iii) practical generalizations to multivariate and possibly correlated pdfs. Estimators utilizing either independent or colored binary observations are developed, analyzed and tested with numerical examples.
Key words. Wireless Sensor Networks, Distributed Parameter Estimation, Distributed Compression, Canonical Correlation Analysis, Distortion-Rate Analysis, Quantization, Estimation.
AMS(MOS) subject classifications. Primary 68W15, 62G05, 68P30, 90B15.
*Prepared through collaborative participation in the Communications and Networks Consortium sponsored by the U. S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011. The U. S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.
†Department of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455. Tel/fax: 612-626-7781/612-625-4583 ({schizas,aribeiro,georgios}@ece.umn.edu).
1. Introduction. Wireless sensor networks (WSNs) consist of low-cost, energy-limited transceiver nodes spatially deployed in large numbers to accomplish monitoring, surveillance and control tasks through cooperative actions [14]. The potential of WSNs for surveillance is by now well appreciated, especially in the context of data fusion and distributed detection; e.g., [32,33] and references therein. However, except for recent works where spatial correlation is exploited to reduce the amount of information exchanged among nodes [2,5,9,11,15,21,25,26], use of WSNs for
the equally important problem of distributed parameter estimation remains largely uncharted territory. While a number of statistical and information theoretic tools have been developed over the years, the unique characteristics of WSNs require rethinking of many algorithms traditionally designed for centralized estimation. Indeed, the distributed nature of the observations necessitates transmission of the individual sensor data; moreover, the power/bandwidth available for transmission and signal processing is severely limited. To complicate matters even more, the parametric data models used and the sensor noise distributions are not easy to characterize; observations taken by (small and inexpensive) sensors are very noisy; and the WSN size and topology may change dynamically. To appreciate the challenges implied by these constraints, consider a mean-location parameter estimation problem with sensors collecting observations in order to estimate a parameter in additive zero-mean noise. The distributed nature of the observations necessitates transmission of the individual sensor data under stringent bandwidth and power constraints, thus requiring: i) a means of combining local sensor observations in order to reduce their dimensionality while keeping the estimation MSE as small as possible; ii) quantization of the combined observations prior to digital transmission; and iii) estimators based on the quantized digital messages, certainly different from estimators based on the original analog-amplitude observations. Overcoming the limitations of nonlinear/non-Gaussian data models and non-ideal channel links, one of the major goals in this paper is to form estimates at the fusion center (FC) of a random stationary vector based on analog-amplitude multi-sensor observations.
To enable estimation under the stringent power and computing limitations of WSNs, we seek linear dimensionality reducing operators (data compressing matrices) per sensor along with linear operators at the FC, in order to minimize the mean-square error (MSE) in estimation. If sufficiently strong error-control codes are used, we can treat links as ideal and formulate this intertwined compression-estimation task as a canonical correlation analysis problem [31]. Here, we explicitly account for non-ideal links and develop distributed estimators generally applicable to nonlinear and non-Gaussian setups (Section 2). We start by deriving in closed form the MSE optimal matrices for compression and estimation when the sensor data are uncorrelated (Section 2.1), and we prove that the optimal solution amounts to optimally compressing the linear minimum mean-square error (LMMSE) signal estimate formed at each sensor. With correlated (coupled) sensor observations, globally optimal distributed estimation has been shown to be NP-hard when reduced-dimensionality sensor data are concatenated at the FC [18]. For this case, we develop a block coordinate descent iterative estimator (Section 2.2) which always converges to a stationary point and subsumes a recent distributed reconstruction algorithm in [10].
When the sensors are allowed to transmit only digital-amplitude data (due to encoding rate constraints), an issue of paramount importance is to determine bounds on the minimum achievable distortion between the signal of interest and its estimate formed at the FC using the encoded information transmitted by the sensors subject to rate constraints (Distortion-Rate function). In the reconstruction scenario, the FC wishes to accurately estimate the sensor observations. In the estimation scenario, the FC is interested in accurately estimating an underlying random vector which is correlated with, but not equal to, the sensor observations. In the single-sensor setting, single-letter characterizations of the Distortion-Rate (D-R) function are known for both scenarios: for reconstruction see [8, p. 336], and for the estimation problem, which is also referred to as rate-distortion with a remote source, see [3, p. 78]. In the distributed scenario, where there are multiple sensors with correlated observations, neither problem is well understood. The best analytical inner and outer bounds for the D-R function for reconstruction can be found in [4]. An iterative scheme has been developed in [10], which numerically determines an achievable upper bound for distributed reconstruction but not for signal estimation. We present this D-R analysis in Section 3. We first determine the D-R function for estimating a vector parameter when applying rate-constrained encoding to the observation data, in closed form for the single-sensor case (Section 3.1). Without assuming that the number of parameters equals the number of observations, we prove that the optimal scheme achieving the D-R function amounts to first computing the minimum mean square error (MMSE) estimate at the sensor, and then optimally compressing the estimate via reverse water-filling (rwf).
The D-R function for the single-sensor setup serves as a non-achievable lower D-R bound for rate-constrained estimation in the multi-sensor setup. Next, we develop an alternating scheme that numerically determines an achievable D-R upper bound for the multi-sensor scenario (Section 3.2). Different from [10], which deals with WSN-based distributed reconstruction, our approach aims for general estimation problems. Returning to the issue of estimation once the actual observations have been collected at the FC, we study the intertwining between quantization and estimation (Section 4). We begin with mean-location parameter estimation in the presence of known univariate but generally non-Gaussian noise pdfs (Section 4.1.1). We next develop mean-location parameter estimators based on binary observations and benchmark their performance when the noise variance is unknown; however, the same approach in principle applies to any noise pdf that is known except for a finite number of unknown parameters (Section 4.1.2). Subsequently, we move to the most challenging case where the noise pdf is completely unknown (Section 4.2). Finally, we consider vector generalizations where each sensor observes a given (possibly nonlinear) function of the unknown parameter vector in the presence of multivariate and possibly colored noise (Section 4.3). While challenging in general, it will turn out that under relaxed conditions, the resultant Maximum Likelihood Estimator (MLE) is the maximum of a concave function, thus ensuring convergence of Newton-type iterative algorithms. Moreover, in the presence of colored Gaussian noise, we show that judiciously quantizing each sensor's data renders the estimators' variance stunningly close to the variance of the clairvoyant estimator that is based on the unquantized observations; thus, nicely generalizing the results of Sections 4.1.1, 4.1.2, and [27] to the more realistic vector parameter estimation problem (Section 4.3.1). Numerical examples corroborate our theoretical findings in Section 5, where we also test them on a motivating application involving distributed parameter estimation with a WSN for measuring a vector flow (Section 5.4). We conclude the paper in Section 6.
FIG. 1. Distributed setup for estimating a random signal s.
2. Dimensionality reduction for distributed estimation. In this section we develop linear distributed estimators in a setup where the sensors observe and transmit analog-amplitude data. Consider the WSN depicted in Fig. 1, comprising $L$ sensors linked with an FC. Each sensor, say the $i$th one, observes an $N_i \times 1$ vector $x_i$ that is correlated with a $p \times 1$ random signal of interest $s$. Through a $k_i \times N_i$ fat matrix $C_i$, each sensor transmits a compressed $k_i \times 1$ vector $C_i x_i$, using e.g., multicarrier modulation with one entry riding per subcarrier. Low-power and bandwidth constraints at the sensors encourage transmissions with $k_i \ll N_i$, while linearity in compression and estimation are well motivated by low-complexity requirements. Furthermore, we assume that:
(a1) No information is exchanged among sensors, and each sensor-FC link comprises a $k_i \times k_i$ full rank fading multiplicative channel matrix $D_i$ along with zero-mean additive FC noise $z_i$, which is uncorrelated with $x_i$, $D_i$, and across channels; i.e., noise covariance matrices satisfy $\Sigma_{z_i z_j} = 0$ for $i \ne j$. Matrices $\{D_i, \Sigma_{z_i z_i}\}_{i=1}^{L}$ are available at the FC.
(a2) Data $x_i$ and the signal of interest $s$ are zero-mean with full rank auto- and cross-covariance matrices $\Sigma_{ss}$, $\Sigma_{s x_i}$ and $\Sigma_{x_i x_j}$ $\forall\, i,j \in [1,L]$, all of which are available at the FC. In multicarrier links, full rank of the channel matrices $\{D_i\}_{i=1}^{L}$ is ensured if sensors do not transmit over subcarriers with zero channel gain.
Matrices $\{D_i\}_{i=1}^{L}$ can be acquired via training, and likewise the signal and noise covariances in (a1) and (a2) can be estimated via sample averaging as usual. With multicarrier (and generally any orthogonal) sensor access, the noise uncorrelatedness across channels is also well justified. Notice that unlike [10,18,37,38], we neither confine ourselves to a linear signal-plus-noise model $x_i = H_i s + n_i$, nor do we invoke any assumption on the distribution (e.g., Gaussianity) of $\{x_i\}_{i=1}^{L}$ and $s$. Equally important, we do not assume ideal channel links. Sensors transmit over orthogonal channels so that the FC separates and concatenates the received vectors $\{y_i(C_i) = D_i C_i x_i + z_i\}_{i=1}^{L}$, to obtain the $(\sum_{i=1}^{L} k_i) \times 1$ vector:
$$y(C_1,\ldots,C_L) := [\,y_1^T(C_1) \;\cdots\; y_L^T(C_L)\,]^T. \qquad (2.1)$$
Left multiplying $y$ by a $p \times (\sum_{i=1}^{L} k_i)$ matrix $B$, we form the linear estimate $\hat{s}$ of $s$. For a prescribed power $P_i$ per sensor, our problem is to obtain under (a1)-(a2) MSE optimal matrices $\{C_i^o\}_{i=1}^{L}$ and $B^o$; i.e., we seek:
$$(B^o, \{C_i^o\}_{i=1}^{L}) = \arg\min_{B,\{C_i\}_{i=1}^{L}} E[\|s - B\,y(C_1,\ldots,C_L)\|^2], \quad \text{s. to } \mathrm{tr}(C_i \Sigma_{x_i x_i} C_i^T) \le P_i,\; i \in \{1,\ldots,L\}. \qquad (2.2)$$
2.1. Decoupled distributed estimation. We consider first the case where $\Sigma_{x_i x_j} = 0$, $\forall\, i \ne j$, which shows up e.g., when matrices $\{H_i\}_{i=1}^{L}$ in the linear model $x_i = H_i s + n_i$ are mutually uncorrelated and also uncorrelated with the noise vectors $n_i$. Then, the multi-sensor optimization task in (2.2) reduces to a set of $L$ decoupled problems. Specifically, it is easy to show that the cost function in (2.2) can be written as [31]:
$$J(B,\{C_i\}_{i=1}^{L}) = \sum_{i=1}^{L} E[\|s - B_i(D_i C_i x_i + z_i)\|^2] - (L-1)\,\mathrm{tr}(\Sigma_{ss}), \qquad (2.3)$$
where $B_i$ is the $p \times k_i$ submatrix of $B := [B_1 \cdots B_L]$. As the $i$th nonnegative summand depends only on $B_i$, $C_i$, the MSE optimal matrices are given by
$$(B_i^o, C_i^o) = \arg\min_{B_i, C_i} E[\|s - B_i(D_i C_i x_i + z_i)\|^2], \quad \text{s. to } \mathrm{tr}(C_i \Sigma_{x_i x_i} C_i^T) \le P_i,\; i \in \{1,\ldots,L\}. \qquad (2.4)$$
Since the cost function in (2.4) corresponds to a single-sensor setup ($L = 1$), we will drop the subscript $i$ for notational brevity and write $B_i = B$, $C_i = C$, $x_i = x$, $z_i = z$, $P = P_i$ and $k = k_i$. The Lagrangian for minimizing (2.3) can be easily written as:
$$J(B, C, \mu) = J_o + \mathrm{tr}(B\Sigma_{zz}B^T) + \mu[\mathrm{tr}(C\Sigma_{xx}C^T) - P] + \mathrm{tr}[(\Sigma_{sx} - BDC\Sigma_{xx})\Sigma_{xx}^{-1}(\Sigma_{xs} - \Sigma_{xx}C^TD^TB^T)], \qquad (2.5)$$
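where $J_o := \mathrm{tr}(\Sigma_{ss} - \Sigma_{sx}\Sigma_{xx}^{-1}\Sigma_{xs})$ is the minimum attainable MMSE for linear estimation of $s$ based on $x$. Continuing, we derive a simplified form of (2.5), the minimization of which will provide closed-form solutions for the MSE optimal matrices $B^o$ and $C^o$. Aiming at this simplification, consider the SVD $\Sigma_{sx} = U_{sx}S_{sx}V_{sx}^T$, and the eigen-decompositions $\Sigma_{xx} = Q_x\Lambda_xQ_x^T$, $\Sigma_{zz} = Q_z\Lambda_zQ_z^T$ and $D^T\Sigma_{zz}^{-1}D = Q_{zd}\Lambda_{zd}Q_{zd}^T$, where $\Lambda_{zd} := \mathrm{diag}(\lambda_{zd,1} \cdots \lambda_{zd,k})$ and $\lambda_{zd,1} \ge \cdots \ge \lambda_{zd,k} > 0$. Notice that $\lambda_{zd,i}$ captures the SNR of the $i$th entry in the received signal vector at the FC. Further, define $A := Q_x^TV_{sx}S_{sx}^TS_{sx}V_{sx}^TQ_x$ with $\rho_a := \mathrm{rank}(A) = \mathrm{rank}(\Sigma_{sx})$, and $A_x := \Lambda_x^{-1/2}A\Lambda_x^{-1/2}$ with corresponding eigen-decomposition $A_x = Q_{ax}\Lambda_{ax}Q_{ax}^T$, where $\Lambda_{ax} = \mathrm{diag}(\lambda_{ax,1},\ldots,\lambda_{ax,\rho_a},0,\ldots,0)$ and $\lambda_{ax,1} \ge \cdots \ge \lambda_{ax,\rho_a} > 0$. Moreover, let $V_a := \Lambda_x^{-1/2}Q_{ax}$ denote the invertible matrix which simultaneously diagonalizes the matrices $A$ and $\Lambda_x$. Since matrices $(Q_{zd}, V_a, U_{sx}, \Lambda_{zd}, Q_x, D, \Sigma_{zz})$ are all invertible, for every matrix $C$ (or $B$) we can clearly find a unique matrix $\Phi_C$ (correspondingly $\Phi_B$) that satisfies:
$$C = Q_{zd}\Phi_CV_a^TQ_x^T, \qquad B = U_{sx}\Phi_B\Lambda_{zd}^{-1}Q_{zd}^TD^T\Sigma_{zz}^{-1}, \qquad (2.6)$$
where $\Phi_C := [\phi_{c,ij}]$ has size $k \times N$. Substituting (2.6) into (2.5) and minimizing with respect to $\Phi_B$ leaves a cost that depends on $\Phi_C$ and $\mu$ only:
$$J(\Phi_C, \mu) = J_o + \mathrm{tr}(\Lambda_{ax}) + \mu(\mathrm{tr}(\Phi_C\Phi_C^T) - P) - \mathrm{tr}\big((\Lambda_{zd}^{-1} + \Phi_C\Phi_C^T)^{-1}\Phi_C\Lambda_{ax}\Phi_C^T\big). \qquad (2.7)$$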
Applying the well known Karush-Kuhn-Tucker (KKT) conditions (e.g., [6, Ch. 5]) that must be satisfied at the minimum of (2.7), we prove in [31] that the matrix $\Phi_C^o$ minimizing (2.7) is diagonal with diagonal entries:
$$\phi_{c,ii}^o = \begin{cases} \pm\left(\sqrt{\dfrac{\lambda_{ax,i}}{\mu^o\lambda_{zd,i}}} - \dfrac{1}{\lambda_{zd,i}}\right)^{1/2}, & 1 \le i \le \kappa \\[2mm] 0, & \kappa+1 \le i \le k \end{cases} \qquad (2.8)$$
(2.9) When k > Pa, the MMSE remains invariant [31]; thus, it suffices to consider k E [1, Pal. Summarizing, we have established that: PROPOSITION 2.1. Under (al), (a 2), and for k ::; Pa, the matrices minimizing J(Bpxk' C kxN) == E[lls - Bpxk(DCkxNX + z)11 2 ], subject to tr(CkxN~xx ClxN):S P, are:
DIMENSIONALITY REDUCTION, COMPRESSION AND QUANTIZATION 265
co == Qzd~C V~ Q~ , BO = ~sxQx Vac/
(CC/ + A;l) -1 A;d1Q;dDT~;},
(2.10)
where ~c is given by (2.8), and the corresponding Lagrange multiplier /10 is specified by (2.9). The MMSE is
J n1in (k) == L;
~
~ Aax,i(¢~iY
i=l
i=l Azd,i
+ Z:: Aax,i -
~
-1
'0
+ (¢c,ii)
2'
(2.11)
According to Proposition 1, the optimal weight matrix $\Phi_C^o$ in $C^o$ distributes the given power across the entries of the pre-whitened vector $Q_x^Tx$ at the sensor in a waterfilling-like manner so as to balance channel strength and additive noise variance at the FC with the degree of dimensionality reduction that can be afforded. It is worth mentioning that (2.8) dictates a minimum power per sensor. Specifically, in order to ensure that $\mathrm{rank}(\Phi_C^o) = \kappa$ the power must satisfy:
$$P > \frac{\sum_{i=1}^{\kappa}(\lambda_{ax,i}\lambda_{zd,i}^{-1})^{1/2}}{\sqrt{\lambda_{ax,\kappa}\lambda_{zd,\kappa}}} - \sum_{i=1}^{\kappa}\lambda_{zd,i}^{-1}. \qquad (2.12)$$
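The waterfilling-like allocation described above can be illustrated numerically. The following is only a minimal sketch, not the authors' implementation: it assumes that the power constraint is met with equality, so that the multiplier follows from summing the squared diagonal entries of (2.8) to $P$, and it searches for the largest number of active entries by direct enumeration. The function names `waterfill_phi_sq` and `min_power_for_rank`, and the numerical values in the usage note, are hypothetical.

```python
import math


def waterfill_phi_sq(lam_ax, lam_zd, P):
    """Squared diagonal weights in the spirit of (2.8).

    lam_ax, lam_zd: lists of positive numbers (lambda_{ax,i}, lambda_{zd,i}).
    P: positive power budget.
    Returns (phi_sq, mu, kappa). Assumption of this sketch: the power
    constraint is active, i.e. sum(phi_sq) == P.
    """
    k = len(lam_ax)
    for kappa in range(k, 0, -1):  # try the largest active set first
        a, z = lam_ax[:kappa], lam_zd[:kappa]
        root = [math.sqrt(ai / zi) for ai, zi in zip(a, z)]
        # 1/sqrt(mu) obtained from sum_i phi_sq_i = P
        inv_sqrt_mu = (P + sum(1.0 / zi for zi in z)) / sum(root)
        phi_sq = [r * inv_sqrt_mu - 1.0 / zi for r, zi in zip(root, z)]
        if all(v > 0.0 for v in phi_sq):
            mu = 1.0 / inv_sqrt_mu ** 2
            return phi_sq + [0.0] * (k - kappa), mu, kappa
    raise ValueError("P must be positive")


def min_power_for_rank(lam_ax, lam_zd, kappa):
    """Right-hand side of the power threshold (2.12) for kappa active entries."""
    a, z = lam_ax[:kappa], lam_zd[:kappa]
    s = sum(math.sqrt(ai / zi) for ai, zi in zip(a, z))
    return s / math.sqrt(a[-1] * z[-1]) - sum(1.0 / zi for zi in z)
```

For instance, with the hypothetical values `lam_ax = [4.0, 1.0]` and `lam_zd = [1.0, 1.0]`, the threshold returned by `min_power_for_rank(..., 2)` is 1, so a budget of `P = 4.0` activates both entries while `P = 0.5` leaves only the first one active.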
The optimal matrices in Proposition 1 can be viewed as implementing a two-step scheme, where: i) we estimate $s$ based on $x$ at the sensor using the LMMSE estimate $\hat{s}_{LM} = \Sigma_{sx}\Sigma_{xx}^{-1}x$; and ii) compress and reconstruct $\hat{s}_{LM}$ using the optimal matrices $C^o$ and $B^o$ implied by Proposition 1 after replacing $x$ with $\hat{s}_{LM}$. For this estimate-first compress-afterwards (EC) interpretation, we prove in [31] that:
COROLLARY 2.1. For $k \in [1,\rho_a]$, the $k \times N$ matrix in (2.10) can be written as $C^o = \tilde{C}^o\Sigma_{sx}\Sigma_{xx}^{-1}$, where $\tilde{C}^o$ is the $k \times p$ optimal matrix obtained by Proposition 1 when $x = \hat{s}_{LM}$.
Thus, the EC scheme is MSE optimal in the sense of minimizing (2.3). Another interesting feature of the EC scheme implied by Proposition 1 is that the MMSE $J_{\min}(k)$ is non-increasing with respect to the reduced dimensionality $k$, given a limited power budget per sensor. Specifically, we establish in [31] that:
COROLLARY 2.2. If $C^o_{k_1\times N}$ and $C^o_{k_2\times N}$ are the optimal matrices determined by Proposition 1 with $k_1 < k_2$, under the same channel parameters $\lambda_{zd,i}$ for $i = 1,\ldots,k_1$, and common power $P$, the MMSE in (2.11) is non-increasing; i.e., $J_{\min}(k_1) \ge J_{\min}(k_2)$ for $k_1 < k_2$.
Notice that Corollary 2 advocates the efficient power allocation that the EC scheme performs among the compressed components.
2.2. Coupled distributed estimation. In this section, we allow the sensor observations to be correlated. Because $\Sigma_{xx}$ is no longer block diagonal, decoupling of the multi-sensor optimization problem cannot be effected in this case. The pertinent MSE cost is [c.f. (2.2)]:
$$J(\{B_i, C_i\}_{i=1}^{L}) = E\Big[\Big\|s - \sum_{i=1}^{L} B_i(D_iC_ix_i + z_i)\Big\|^2\Big]. \qquad (2.13)$$
Minimizing (2.13) does not lead to a closed-form solution and incurs complexity that grows exponentially with $L$ [18]. For this reason, we resort to iterative alternatives which converge at least to a stationary point of the cost in (2.13). To this end, let us suppose temporarily that matrices $\{B_l\}_{l=1,l\ne i}^{L}$ and $\{C_l\}_{l=1,l\ne i}^{L}$ are fixed and satisfy the power constraints $\mathrm{tr}(C_l\Sigma_{x_lx_l}C_l^T) = P_l$, for $l = 1,\ldots,L$ and $l \ne i$. Upon defining the vector $s_i := s - \sum_{l=1,l\ne i}^{L}(B_lD_lC_lx_l + B_lz_l)$ the cost in (2.13) becomes:
$$J(B_i, C_i) = E[\|s_i - B_iD_iC_ix_i - B_iz_i\|^2], \qquad (2.14)$$
which, being a function of $C_i$ and $B_i$ only, falls under the realm of Proposition 1. This means that when $\{B_l\}_{l=1,l\ne i}^{L}$ and $\{C_l\}_{l=1,l\ne i}^{L}$ are given, the matrices $B_i$ and $C_i$ minimizing (2.14) under the power constraint $\mathrm{tr}(C_i\Sigma_{x_ix_i}C_i^T) \le P_i$ can be directly obtained from (2.10), after setting $s = s_i$, $x = x_i$, $z = z_i$ and $\rho_a = \mathrm{rank}(\Sigma_{s_ix_i})$ in Proposition 1. The corresponding auto- and cross-covariance matrices needed must also be modified appropriately, namely $\Sigma_{ss} = \Sigma_{s_is_i}$ and $\Sigma_{sx} = \Sigma_{s_ix_i}$. We have thus established the following result for coupled sensor observations:
PROPOSITION 2.2. If (a1) and (a2) are satisfied, and $k_i \le \mathrm{rank}(\Sigma_{s_ix_i})$, then for given matrices $\{B_l\}_{l=1,l\ne i}^{L}$ and $\{C_l\}_{l=1,l\ne i}^{L}$ satisfying $\mathrm{tr}(C_l\Sigma_{x_lx_l}C_l^T) = P_l$, the optimal $B_i^o$ and $C_i^o$ matrices minimizing $E[\|s - \sum_{l=1}^{L}B_l(D_lC_lx_l + z_l)\|^2]$ are provided by Proposition 1, after setting $x = x_i$, $s = s_i$ and applying the corresponding covariance modifications.
Proposition 2 suggests the following alternating algorithm for distributed estimation in the presence of fading and FC noise:
Algorithm 1:
Initialize randomly the matrices $\{C_i^{(0)}\}_{i=1}^{L}$ and $\{B_i^{(0)}\}_{i=1}^{L}$, such that $\mathrm{tr}(C_i^{(0)}\Sigma_{x_ix_i}C_i^{(0)T}) = P_i$.
$n = 0$
repeat
  $n = n + 1$
  for $i = 1, \ldots, L$ do
    Given the matrices $C_1^{(n)}, B_1^{(n)}, \ldots, C_{i-1}^{(n)}, B_{i-1}^{(n)}, C_{i+1}^{(n-1)}, B_{i+1}^{(n-1)}, \ldots, C_L^{(n-1)}, B_L^{(n-1)}$, determine $C_i^{(n)}, B_i^{(n)}$ via Proposition 2.
  end for
until $|\mathrm{MSE}^{(n)} - \mathrm{MSE}^{(n-1)}| < \epsilon$ for given tolerance $\epsilon$
Notice that Algorithm 1 belongs to the class of block coordinate descent iterative schemes. At every step $i$ during the $n$th iteration, it yields the optimal pair of matrices $C_i$, $B_i$, treating the rest as given. Thus, the $\mathrm{MSE}^{(n)}$ cost per iteration is non-increasing and the algorithm always converges to a stationary point of (2.13). Beyond its applicability to possibly non-Gaussian and nonlinear model settings, it is the only available algorithm for handling fading and generally colored FC noise effects in distributed estimation.
FIG. 2. (Left): Distributed setup; (Right): Test channel for x Gaussian in a point-to-point link.
3. Distortion-rate analysis for distributed estimation. In contrast to the previous section, here we consider digital-amplitude data transmission (bits) from the sensors to the FC. In such a setup, all the sensors are characterized by a rate constraint. In order to determine the minimum possible distortion (MSE) between the signal of interest and the estimate at the FC, under encoding rate constraints, we perform D-R analysis and determine bounds for the D-R function. With reference to Fig. 2 (Left), consider a WSN comprising $L$ sensors that communicate with an FC. Each sensor, say the $i$th, observes an $N_i \times 1$ vector $x_i(t)$ which is correlated with a $p \times 1$ random signal (parameter vector) of interest $s(t)$, where $t$ denotes discrete time. Similar to [22,23,34], we assume that:
(a3) No information is exchanged among sensors and the links with the FC are noise-free.
(a4) The random vector $s(t)$ is generated by a stationary Gaussian vector memoryless source with $s(t) \sim \mathcal{N}(0, \Sigma_{ss})$; the sensor data $\{x_i(t)\}_{i=1}^{L}$ adhere to the linear-Gaussian model $x_i(t) = H_is(t) + n_i(t)$, where $n_i(t)$ denotes additive white Gaussian noise (AWGN); i.e., $n_i(t) \sim \mathcal{N}(0, \sigma^2I)$; noise $n_i(t)$ is uncorrelated across sensors, time and with $s$; and $H_i$ as well as (cross-) covariance matrices $\Sigma_{ss}$, $\Sigma_{sx_i}$ and $\Sigma_{x_ix_j}$ are known $\forall\, i,j \in \{1,\ldots,L\}$.
Notice that (a3) assumes that sufficiently strong channel codes are used; while whiteness of $n_i(t)$ and the zero-mean assumptions in (a4) are made without loss of generality. The linear model in (a4) is commonly encountered in estimation and in a number of cases it even accurately approximates non-linear mappings; e.g., via a first-order Taylor expansion in target tracking applications. Although confining ourselves to Gaussian vectors $x_i(t)$ is of interest on its own, following arguments similar to those in [3, p. 134] we can show that the D-R functions obtained in this paper bound from above their counterparts for non-Gaussian sensor data $x_i(t)$.
Blocks $x_i^{(n)} := \{x_i(t)\}_{t=1}^{n}$, comprising $n$ consecutive time instantiations of the vector $x_i(t)$, are encoded per sensor to yield each encoder's output $u_i^{(n)} = f_i^{(n)}(x_i^{(n)})$, $i = 1,\ldots,L$. These outputs are communicated through ideal orthogonal channels to the FC. There, the $u_i^{(n)}$'s are decoded to obtain an estimate of $s^{(n)} := \{s(t)\}_{t=1}^{n}$ denoted as $\hat{s}_R^{(n)}(u_1^{(n)},\ldots,u_L^{(n)}) = g_R^{(n)}(x_1^{(n)},\ldots,x_L^{(n)})$, since $u_i^{(n)}$ is a function of $x_i^{(n)}$. The rate constraint is imposed through a bound on the cardinality of the range of the sensor encoding functions, i.e., the cardinality of the range of $f_i^{(n)}$ must be no larger than $2^{nR_i}$, where $R_i$ is the available rate at the encoder of the $i$th sensor. The sum rate satisfies the constraint $\sum_{i=1}^{L}R_i \le R$, where $R$ is the total available rate shared by the $L$ sensors. Under this rate constraint, we want to determine the minimum possible MSE distortion $(1/n)\sum_{t=1}^{n}E[\|s(t) - \hat{s}_R(t)\|^2]$ for estimating $s$ in the limit of infinite block-length $n$. When $L = 1$, a single-letter information theoretic characterization is known for the latter, but no simplification is known for the distributed multi-sensor scenario.
3.1. Distortion-rate for centralized estimation. We will first determine the D-R function for estimating $s(t)$ in a single-sensor setup. The single-letter characterization of the D-R function in this setup allows us to drop the time index. Here, all $\{x_i\}_{i=1}^{L}$, collected in the vector $x$, are available to a single sensor, and $x = Hs + n$. We let $\rho := \mathrm{rank}(H)$ denote the rank of matrix $H$. The D-R function in such a scenario provides a lower (non-achievable) bound on the MMSE that can be achieved in a multi-sensor distributed setup, where each $x_i$ is observed by a different sensor. Existing works treat the case $N = p$ [29, 35], but here we look for the D-R function regardless of $N$, $p$, in the linear-Gaussian model framework.
3.1.1. Background on D-R analysis for reconstruction. The D-R function for encoding $x$, which has probability density function (pdf) $p(x)$, with rate $R$ at an individual sensor, and reconstructing it (in the MMSE sense) as $\hat{x}$ at the FC, is given by [8, p. 342]:
$$D_x(R) = \min_{p(\hat{x}|x)} E[\|x - \hat{x}\|^2], \quad \text{s. to } I(x;\hat{x}) \le R, \qquad (3.1)$$
where $x \in \mathbb{R}^N$ and $\hat{x} \in \mathbb{R}^N$, and the minimization is w.r.t. the conditional pdf $p(\hat{x}|x)$. Let $\Sigma_{xx} = Q_x\Lambda_xQ_x^T$ denote the eigenvalue decomposition of $\Sigma_{xx}$, where $\Lambda_x = \mathrm{diag}(\lambda_{x,1} \cdots \lambda_{x,N})$ and $\lambda_{x,1} \ge \cdots \ge \lambda_{x,N} > 0$. For $x$ Gaussian, $D_x(R)$ can be determined by applying rwf to the pre-whitened vector $x_w := Q_x^Tx$ [8, p. 348]. For a prescribed rate $R$, it turns out that $\exists\, k$ such that the first $k$ entries $\{x_w(i)\}_{i=1}^{k}$ of $x_w$ are encoded and reconstructed independently from each other using rate $\{R_i = 0.5\log_2(\lambda_{x,i}/d(k,R))\}_{i=1}^{k}$, where
$$d(k,R) = \Big(\prod_{i=1}^{k}\lambda_{x,i}\Big)^{1/k}2^{-2R/k}$$
with $R = \sum_{i=1}^{k}R_i$; and the last $N - k$ entries of $x_w$ are assigned no rate; i.e., $\{R_i = 0\}_{i=k+1}^{N}$. The corresponding MMSE for encoding $x_w(i)$, the $i$th entry of $x_w$, under a rate constraint $R_i$, is $D_i = E[\|x_w(i) - \hat{x}_w(i)\|^2] = d(k,R)$ when $i = 1,\ldots,k$ and $D_i = \lambda_{x,i}$ when $i = k+1,\ldots,N$. The resultant MMSE (D-R function) is:
$$D_x(R) = E[\|x - \hat{x}\|^2] = k\,d(k,R) + \sum_{i=k+1}^{N}\lambda_{x,i}. \qquad (3.2)$$
Especially for $d(k,R)$, it follows that $\max(\{\lambda_{x,i}\}_{i=k+1}^{N}) \le d(k,R) < \min\{\lambda_{x,1},\ldots,\lambda_{x,k}\}$. Intuitively, $d(k,R)$ is a threshold distortion determining which entries of $x_w$ are assigned nonzero rate. The first $k$ entries of $x_w$ with variance $\lambda_{x,i} > d(k,R)$ are encoded with non-zero rate, but the last $N - k$ ones are discarded in the encoding procedure (are set to zero). Associated with the rwf principle is the so-called test channel; see e.g., [8, p. 345]. The encoder's MSE optimal output is $u = Q_{x,k}^Tx + \zeta$, where $Q_{x,k}$ is formed by the first $k$ columns of $Q_x$, and $\zeta$ models the distortion noise that results due to the rate-constrained encoding of $x$. The zero-mean AWGN $\zeta$ is uncorrelated with $x$ and its diagonal covariance matrix $\Sigma_{\zeta\zeta}$ has entries $[\Sigma_{\zeta\zeta}]_{ii} = \lambda_{x,i}D_i/(\lambda_{x,i} - D_i)$. The part of the test channel that takes as input $u$ and outputs $\hat{x}$ models the decoder. The reconstruction $\hat{x}$ of $x$ at the decoder output is:
$$\hat{x} = Q_{x,k}\Theta_ku = Q_{x,k}\Theta_kQ_{x,k}^Tx + Q_{x,k}\Theta_k\zeta, \qquad (3.3)$$
where $\Theta_k$ is a diagonal matrix with non-zero entries $[\Theta_k]_{ii} = (\lambda_{x,i} - D_i)/\lambda_{x,i}$, $i = 1,\ldots,k$.
3.1.2. D-R analysis for estimation. The D-R function for estimating a source $s$ given observation $x$ (where the source and observation are probabilistically drawn from the joint pdf $p(x,s)$) with rate $R$ at an individual sensor, and reconstructing it (in the MMSE sense) as $\hat{s}_R$ at the FC is given by [3, p. 79]:
$$D_s(R) = \min_{p(\hat{s}_R|x)} E[\|s - \hat{s}_R\|^2], \quad \text{s. to } I(x;\hat{s}_R) \le R, \qquad (3.4)$$
where $s \in \mathbb{R}^p$ and $\hat{s}_R \in \mathbb{R}^p$, and the minimization is w.r.t. the conditional pdf $p(\hat{s}_R|x)$. In order to achieve the D-R function, one might be tempted to first compress $x$ by applying rwf at the sensor, without taking into account the data model relating $s$ with $x$, and subsequently use the reconstructed $\hat{x}$ to form the MMSE estimate $\hat{s}_{ce} = E[s|\hat{x}]$ at the FC. An alternative option would be to first form the MMSE estimate $\hat{s} = E[s|x]$, encode the latter using rwf at the sensor, and after decoding at the FC, obtain the reconstructed estimate $\hat{s}_{ec}$. Referring to the former option as Compress-Estimate (CE), and to the latter as Estimate-Compress (EC), we are interested in determining which one yields the smallest MSE under a rate constraint $R$. Another interesting question is whether any of the CE and EC schemes enjoys MMSE optimality (i.e., achieves (3.4)). With subscripts $ce$ and $ec$ corresponding to these two options, let us also define the errors $\tilde{s}_{ce} := s - \hat{s}_{ce}$ and $\tilde{s}_{ec} := s - \hat{s}_{ec}$.
FIG. 3. (Top): Test channel for the CE scheme; (Bottom): Test channel for the EC scheme.
For CE, we depict in Fig. 3 (Top) the test channel for encoding $x$ via rwf, followed by MMSE estimation of $s$ based on $\hat{x}$. Suppose that when applying rwf to $x$ with prescribed rate $R$, the first $k_{ce}$ components of $x_w$ are assigned non-zero rate and the rest are discarded. The MMSE optimal encoder's output for encoding $x$ is given, as in Section 3.1.1, by $u_{ce} = Q_{x,k_{ce}}^Tx + \zeta_{ce}$. The covariance matrix of $\zeta_{ce}$ has diagonal entries $[\Sigma_{\zeta_{ce}\zeta_{ce}}]_{ii} = \lambda_{x,i}D_i^{ce}/(\lambda_{x,i} - D_i^{ce})$ for $i = 1,\ldots,k_{ce}$, where $D_i^{ce} := E[(x_w(i) - \hat{x}_w(i))^2]$. Recalling that $D_i^{ce} = (\prod_{j=1}^{k_{ce}}\lambda_{x,j})^{1/k_{ce}}2^{-2R/k_{ce}}$ when $i = 1,\ldots,k_{ce}$ and $D_i^{ce} = \lambda_{x,i}$ when $i = k_{ce}+1,\ldots,N$, the reconstructed $\hat{x}$ in CE is [c.f. (3.3)]:
$$\hat{x} = Q_{x,k_{ce}}\Theta_{ce}Q_{x,k_{ce}}^Tx + Q_{x,k_{ce}}\Theta_{ce}\zeta_{ce}, \qquad (3.5)$$
where $[\Theta_{ce}]_{ii} = (\lambda_{x,i} - D_i^{ce})/\lambda_{x,i}$, for $i = 1,\ldots,k_{ce}$. Letting $\check{x} := Q_x^T\hat{x} = [\check{x}_1^T\; 0_{1\times(N-k_{ce})}]^T$, with $\check{x}_1 := \Theta_{ce}Q_{x,k_{ce}}^Tx + \Theta_{ce}\zeta_{ce}$, we have for the MMSE estimate $\hat{s}_{ce} = E[s|\hat{x}]$:
$$\hat{s}_{ce} = E[s|Q_x^T\hat{x}] = E[s|\check{x}_1] = \Sigma_{s\check{x}_1}\Sigma_{\check{x}_1\check{x}_1}^{-1}\check{x}_1, \qquad (3.6)$$
since $Q_x^T$ is unitary and the last $N - k_{ce}$ entries of $\check{x}$ are useless for estimating $s$. We have shown in [30] that the covariance matrix $\Sigma_{\tilde{s}_{ce}\tilde{s}_{ce}} := E[(s - \hat{s}_{ce})(s - \hat{s}_{ce})^T] = \Sigma_{ss} - \Sigma_{s\check{x}_1}\Sigma_{\check{x}_1\check{x}_1}^{-1}\Sigma_{\check{x}_1s}$ of $\tilde{s}_{ce}$ is:
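$$\Sigma_{\tilde{s}_{ce}\tilde{s}_{ce}} = \Sigma_{ss} - \Sigma_{sx}\Sigma_{xx}^{-1}\Sigma_{xs} + \Sigma_{sx}Q_x\Delta_{ce}Q_x^T\Sigma_{xs}, \qquad (3.7)$$
where $\Delta_{ce} := \mathrm{diag}(D_1^{ce}\lambda_{x,1}^{-2} \cdots D_N^{ce}\lambda_{x,N}^{-2})$.
In Fig. 3 (Bottom) we depict the test channel for the EC scheme. The MMSE estimate $\hat{s} = E[s|x]$ is followed by the test channel that results when applying rwf to a pre-whitened version of $\hat{s}$, with rate $R$. Let $\Sigma_{\hat{s}\hat{s}} = Q_{\hat{s}}\Lambda_{\hat{s}}Q_{\hat{s}}^T$ be the eigenvalue decomposition for the covariance matrix of $\hat{s}$, where $\Lambda_{\hat{s}} = \mathrm{diag}(\lambda_{\hat{s},1} \cdots \lambda_{\hat{s},p})$ and $\lambda_{\hat{s},1} \ge \cdots \ge \lambda_{\hat{s},p}$. Suppose now that the first $k_{ec}$ entries of $\hat{s}_w = Q_{\hat{s}}^T\hat{s}$ are assigned non-zero rate and the rest are discarded. The MSE optimal encoder's output is given by $u_{ec} = Q_{\hat{s},k_{ec}}^T\hat{s} + \zeta_{ec}$, and the estimate $\hat{s}_{ec}$ is:
$$\hat{s}_{ec} = Q_{\hat{s},k_{ec}}\Theta_{ec}Q_{\hat{s},k_{ec}}^T\hat{s} + Q_{\hat{s},k_{ec}}\Theta_{ec}\zeta_{ec}, \qquad (3.8)$$
where $Q_{\hat{s},k_{ec}}$ is formed by the first $k_{ec}$ columns of $Q_{\hat{s}}$. For the $k_{ec} \times k_{ec}$ diagonal matrices $\Theta_{ec}$ and $\Sigma_{\zeta_{ec}\zeta_{ec}}$ we have $[\Theta_{ec}]_{ii} = (\lambda_{\hat{s},i} - D_i^{ec})/\lambda_{\hat{s},i}$ and $[\Sigma_{\zeta_{ec}\zeta_{ec}}]_{ii} = \lambda_{\hat{s},i}D_i^{ec}/(\lambda_{\hat{s},i} - D_i^{ec})$, where $D_i^{ec} := E[(\hat{s}_w(i) - \hat{s}_{ec,w}(i))^2]$, and $\hat{s}_{ec,w} := Q_{\hat{s}}^T\hat{s}_{ec}$. Recall also that $D_i^{ec} = (\prod_{j=1}^{k_{ec}}\lambda_{\hat{s},j})^{1/k_{ec}}2^{-2R/k_{ec}}$ when $i = 1,\ldots,k_{ec}$ and $D_i^{ec} = \lambda_{\hat{s},i}$, for $i = k_{ec}+1,\ldots,p$. Upon defining $\Delta_{ec} := \mathrm{diag}(D_1^{ec} \cdots D_p^{ec})$, the covariance matrix of $\tilde{s}_{ec}$ is given by [30]:
$$\Sigma_{\tilde{s}_{ec}\tilde{s}_{ec}} = \Sigma_{ss} - \Sigma_{sx}\Sigma_{xx}^{-1}\Sigma_{xs} + Q_{\hat{s}}\Delta_{ec}Q_{\hat{s}}^T. \qquad (3.9)$$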
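The MMSE associated with CE and EC is given, respectively, by [c.f. (3.7) and (3.9)]:
$$D_{ce}(R) := \mathrm{trace}(\boldsymbol{\Sigma}_{\tilde{s}_{ce}\tilde{s}_{ce}}) = J_o + E_{ce}(R), \qquad D_{ec}(R) := \mathrm{trace}(\boldsymbol{\Sigma}_{\tilde{s}_{ec}\tilde{s}_{ec}}) = J_o + E_{ec}(R), \tag{3.10}$$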
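where $E_{ce}(R) := \mathrm{trace}(\boldsymbol{\Sigma}_{sx} \mathbf{Q}_x \boldsymbol{\Delta}_{ce} \mathbf{Q}_x^T \boldsymbol{\Sigma}_{xs})$, $E_{ec}(R) := \mathrm{trace}(\mathbf{Q}_{\hat{s}} \boldsymbol{\Delta}_{ec} \mathbf{Q}_{\hat{s}}^T)$, and $J_o := \mathrm{trace}(\boldsymbol{\Sigma}_{ss} - \boldsymbol{\Sigma}_{sx} \boldsymbol{\Sigma}_{xx}^{-1} \boldsymbol{\Sigma}_{xs})$ is the MMSE achieved when estimating $\mathbf{s}$ based on $\mathbf{x}$, without source encoding ($R \to \infty$). Since $J_o$ is common to both EC and CE, it is important to compare $E_{ce}(R)$ with $E_{ec}(R)$ in order to determine which estimation scheme achieves the smallest MSE. The following proposition provides such an asymptotic comparison:

PROPOSITION 3.1. If $R > R_{th} := 0.5 \max\big\{\log_2\big((\prod_{i=1}^{N} \lambda_{x,i})/\sigma^{2N}\big), \; \log_2\big((\prod_{i=1}^{p} \lambda_{\hat{s},i})/(\lambda_{\hat{s},p})^p\big)\big\}$, then it holds that $E_{ce}(R) = \gamma_1 2^{-2R/N}$ and $E_{ec}(R) = \gamma_2 2^{-2R/p}$, where $\gamma_1$ and $\gamma_2$ are constants.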
An immediate consequence of Proposition 3.1 is that the MSE for EC converges as $R \to \infty$ to $J_o$ with rate $O(2^{-2R/p})$. The MSE of CE converges likewise, but with rate $O(2^{-2R/N})$. For the typical case $N > p$, EC approaches the lower bound $J_o$ faster than CE, implying correspondingly a more efficient usage of the available rate $R$. This is intuitively reasonable since CE compresses $\mathbf{x}$, which contains the noise $\mathbf{n}$. Since the last $N - p$ eigenvalues of $\boldsymbol{\Sigma}_{xx}$ equal the noise variance $\sigma^2$, part of the available rate is consumed to compress the noise. On the contrary, the MMSE estimator $\hat{\mathbf{s}}$ in EC suppresses a significant part of the noise.
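The dependence of the decay rate on the number of encoded components can be seen directly from reverse water-filling (rwf). The following is a minimal Python sketch, not the authors' code: the function names `reverse_water_filling` and `total_distortion` are hypothetical, and the eigenvalues are assumed positive. It returns the per-component distortions $D_i$ used in the text, namely the common level $(\prod_{j \le k} \lambda_j)^{1/k} 2^{-2R/k}$ for the $k$ active components and $\lambda_i$ for the discarded ones.

```python
import math

def reverse_water_filling(eigs, rate_bits):
    """Per-component distortions D_i for a Gaussian source with the given eigenvalues."""
    lam = sorted(eigs, reverse=True)
    n = len(lam)
    for k in range(n, 0, -1):
        # Common distortion level if only the k largest components receive rate.
        log2_d = sum(math.log2(v) for v in lam[:k]) / k - 2.0 * rate_bits / k
        d = 2.0 ** log2_d
        # Accept the largest k whose level does not exceed the smallest active eigenvalue.
        if d <= lam[k - 1] * (1.0 + 1e-12):
            return [d] * k + lam[k:]
    raise ValueError("eigenvalues must be positive and rate_bits non-negative")

def total_distortion(eigs, rate_bits):
    """Sum of the rwf distortions, i.e. the MSE of reconstructing the source."""
    return sum(reverse_water_filling(eigs, rate_bits))
```

Once all $n$ components are active, the total distortion scales as $2^{-2R/n}$: spending $n$ extra bits divides it by four. A source with $n = N$ components therefore needs $N/p$ times more rate than one with $n = p$ components for the same relative reduction, which is the mechanism behind the two rates in Proposition 3.1.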
[Figure 4: two plots of distortion versus rate (bits). Left-panel curves: CE scheme, EC scheme, lower bound on MSE $J_o$. Right-panel curves: decoupled EC, upper bound determined by Alg. 2, non-achievable lower bound.]
FIG. 4. (Left): D-R region for EC and CE at SNR = 2; (Right): Distortion-rate bounds for estimating $\mathbf{s}$ in a two-sensor setup.
Let us examine now some special cases to gain more insight about Proposition 3.1.

Scalar model ($p = 1$, $N = 1$): Let $x = hs + n$, where $h$ is fixed, while $s$, $n$ are uncorrelated with $s \sim \mathcal{N}(0, \sigma_s^2)$, $n \sim \mathcal{N}(0, \sigma_n^2)$, and $\sigma_x^2 = h^2 \sigma_s^2 + \sigma_n^2$. With $\sigma_{\tilde{s}_{ce}}^2$ and $\sigma_{\tilde{s}_{ec}}^2$ denoting the variances of $\tilde{s}_{ce}$ and $\tilde{s}_{ec}$, respectively, we have shown in [30] that:

PROPOSITION 3.2. For $N = p = 1$, it holds that $\sigma_{\tilde{s}_{ce}}^2 = \sigma_{\tilde{s}_{ec}}^2$ and hence the D-R functions for EC and CE are identical; i.e., $D_{ce}(R) = D_{ec}(R)$.

Vector model ($p = 1$, $N > 1$): With $\mathbf{x} = \mathbf{h}s + \mathbf{n}$ and after setting $R_{th} := 0.5 \log_2(1 + \sigma_s^2 \|\mathbf{h}\|^2/\sigma^2)$, we have established that [30]:

PROPOSITION 3.3. For $R \le R_{th}$ it holds that $E_{ce}(R) = E_{ec}(R)$ and thus $D_{ec}(R) = D_{ce}(R)$. For $R > R_{th}$, we have $E_{ce}(R) > E_{ec}(R)$ and thus EC uses the available rate more efficiently.

We define the signal-to-noise ratio (SNR) as $\mathrm{SNR} := \mathrm{trace}(\mathbf{H} \boldsymbol{\Sigma}_{ss} \mathbf{H}^T)/(N \sigma^2)$, and compare in Fig. 4 (Left) the MMSE when estimating $\mathbf{s}$ using the CE and EC schemes. With $\boldsymbol{\Sigma}_{ss} = \sigma_s^2 \mathbf{I}_p$, $p = 4$ and $N = 40$, we observe that beyond a threshold rate, the distortion of EC converges to $J_o$ faster than that of CE, which corroborates Proposition 3.1.

Our analysis so far raises the question whether EC is MSE optimal. We have shown that this is the case when estimating $\mathbf{s}$ with a given rate $R$ and without forcing any assumption about $N$ and $p$. A related claim has been reported in [29,35] for $N = p$, but the extension to $N \ne p$ is not obvious. We have established that [30]:

PROPOSITION 3.4. The D-R function when estimating $\mathbf{s}$ based on $\mathbf{x}$ can be expressed as
$$D_s(R) = \min_{\substack{p(\hat{\mathbf{s}}_R|\mathbf{x}) \\ I(\mathbf{x};\hat{\mathbf{s}}_R) \le R}} E[\|\mathbf{s} - \hat{\mathbf{s}}_R\|^2] = E[\|\tilde{\mathbf{s}}\|^2] + \min_{\substack{p(\hat{\mathbf{s}}_R|\hat{\mathbf{s}}) \\ I(\hat{\mathbf{s}};\hat{\mathbf{s}}_R) \le R}} E[\|\hat{\mathbf{s}} - \hat{\mathbf{s}}_R\|^2], \tag{3.11}$$
where $\hat{\mathbf{s}} = \boldsymbol{\Sigma}_{sx} \boldsymbol{\Sigma}_{xx}^{-1} \mathbf{x}$ is the MMSE estimator, and $\tilde{\mathbf{s}} := \mathbf{s} - \hat{\mathbf{s}}$ is the corresponding MMSE error.
Proposition 3.4 reveals that the optimal means of estimating $\mathbf{s}$ is to first form the optimal MMSE estimate $\hat{\mathbf{s}}$ and then apply optimal rate-distortion encoding to this estimate. The lower bound on this distortion when $R \to \infty$ is $J_o = E[\|\tilde{\mathbf{s}}\|^2]$, which is intuitively appealing. The D-R function in (3.11) is achievable, because the rightmost term in (3.11) corresponds to the D-R function for reconstructing the MMSE estimate $\hat{\mathbf{s}}$, which is known to be achievable using random coding; see e.g., [3, p. 66].
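For the scalar model ($N = p = 1$) the structure "$J_o$ plus the distortion of encoding $\hat{s}$" can be checked numerically. The sketch below is an illustration under the scalar Gaussian assumptions of Proposition 3.2, with a hypothetical function name; it evaluates the scalar forms of (3.7) and (3.9), where all matrices reduce to numbers and rwf assigns the single component the distortion $\lambda 2^{-2R}$.

```python
def scalar_distortions(h, var_s, var_n, rate_bits):
    """Return (D_ce, D_ec, J_o) for x = h*s + n with Gaussian s and n."""
    var_x = h * h * var_s + var_n          # variance of the observation x
    cov_sx = h * var_s                     # cross-covariance E[s x]
    j_o = var_s - cov_sx ** 2 / var_x      # MMSE without any rate constraint
    shrink = 2.0 ** (-2.0 * rate_bits)     # rwf factor 2^{-2R} for one component

    # CE, scalar (3.7): J_o + cov_sx^2 * D_ce / var_x^2 with D_ce = var_x * 2^{-2R}.
    d_ce = j_o + cov_sx ** 2 * (var_x * shrink) / var_x ** 2

    # EC, scalar (3.9): J_o + D_ec with D_ec = var(s_hat) * 2^{-2R}.
    var_s_hat = cov_sx ** 2 / var_x
    d_ec = j_o + var_s_hat * shrink
    return d_ce, d_ec, j_o
```

Both expressions coincide for every rate, start at $\sigma_s^2$ for $R = 0$, and decrease to $J_o$ as $R$ grows, in line with Proposition 3.2 and with the decomposition in (3.11).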
3.2. Distortion-rate for distributed estimation. Let us now consider the D-R function for estimating $\mathbf{s}$ in a multi-sensor setup, under a total available rate $R$ which has to be shared among all sensors. Because analytical specification of the D-R function in this case remains intractable, we will develop an alternating algorithm that numerically determines an achievable upper bound for it. Combining this upper bound with the non-achievable lower bound, obtained by applying the MMSE optimal EC scheme to an equivalent single-sensor setup, will provide a (hopefully tight) region in which the D-R function lies. For simplicity in exposition, we confine ourselves to a two-sensor setup, but our results apply to any finite $L > 2$. To this end, we consider the following single-letter characterization of the upper bound on the D-R function:
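$$\bar{D}(R) = \min_{\substack{p(\mathbf{u}_1|\mathbf{x}_1),\, p(\mathbf{u}_2|\mathbf{x}_2),\, \hat{\mathbf{s}}_R \\ I(\mathbf{x};\mathbf{u}_1,\mathbf{u}_2) \le R}} E_{p(\mathbf{s},\mathbf{u}_1,\mathbf{u}_2)}\big[\|\mathbf{s} - \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)\|^2\big], \tag{3.12}$$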
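where the minimization is w.r.t. $\{p(\mathbf{u}_i|\mathbf{x}_i)\}_{i=1}^{2}$ and $\hat{\mathbf{s}}_R := \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)$. Achievability of $\bar{D}(R)$ can be established by readily extending to the vector case the scalar results in [7]. To carry out the minimization in (3.12), we develop an alternating scheme whereby $\mathbf{u}_2$ is treated as side information that is available at the decoder when optimizing (3.12) w.r.t. $p(\mathbf{u}_1|\mathbf{x}_1)$ and $\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)$. The side information $\mathbf{u}_2$ is considered as the output of an optimal rate-distortion encoder applied to $\mathbf{x}_2$ for estimating $\mathbf{s}$, without taking into account $\mathbf{x}_1$. Since $\mathbf{x}_2$ is Gaussian, the side information will have the form (c.f. subsection III-A.2) $\mathbf{u}_2 = \mathbf{Q}_2 \mathbf{x}_2 + \boldsymbol{\zeta}_2$, where $\mathbf{Q}_2 \in \mathbb{R}^{k_2 \times N_2}$ and $k_2 \le N_2$, due to the rate constrained encoding of $\mathbf{x}_2$. Recall that the $k_2 \times 1$ vector $\boldsymbol{\zeta}_2$ is uncorrelated with $\mathbf{x}_2$ and Gaussian; i.e., $\boldsymbol{\zeta}_2 \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\zeta_2\zeta_2})$. Based on $\boldsymbol{\psi} := [\mathbf{x}_1^T \; \mathbf{u}_2^T]^T$, the optimal estimator for $\mathbf{s}$ is the MMSE one: $\hat{\mathbf{s}} = E[\mathbf{s}|\mathbf{x}_1, \mathbf{u}_2] = \boldsymbol{\Sigma}_{s\psi} \boldsymbol{\Sigma}_{\psi\psi}^{-1} \boldsymbol{\psi} = \mathbf{L}_1 \mathbf{x}_1 + \mathbf{L}_2 \mathbf{u}_2$, where $\mathbf{L}_1$, $\mathbf{L}_2$ are $p \times N_1$ and $p \times k_2$ matrices such that $\boldsymbol{\Sigma}_{s\psi} \boldsymbol{\Sigma}_{\psi\psi}^{-1} = [\mathbf{L}_1 \; \mathbf{L}_2]$. If $\tilde{\mathbf{s}}$ is the corresponding MSE, then $\mathbf{s} = \hat{\mathbf{s}} + \tilde{\mathbf{s}}$, where $\tilde{\mathbf{s}}$ is uncorrelated with $\boldsymbol{\psi}$ due to the orthogonality principle. Noticing also that $\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)$ is uncorrelated with $\tilde{\mathbf{s}}$ because it is a function of $\mathbf{x}_1$ and $\mathbf{u}_2$, we have $E[\|\mathbf{s} - \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)\|^2] = E[\|\hat{\mathbf{s}} - \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)\|^2] + E[\|\tilde{\mathbf{s}}\|^2]$, or,
$$E[\|\mathbf{s} - \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)\|^2] = E[\|\mathbf{L}_1 \mathbf{x}_1 - (\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2) - \mathbf{L}_2 \mathbf{u}_2)\|^2] + E[\|\tilde{\mathbf{s}}\|^2]. \tag{3.13}$$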
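Clearly, it holds that $I(\mathbf{x}; \mathbf{u}_1, \mathbf{u}_2) = R_2 + I(\mathbf{x}_1; \mathbf{u}_1) - I(\mathbf{u}_2; \mathbf{u}_1)$, where $R_2 := I(\mathbf{x}; \mathbf{u}_2)$ is the rate consumed to form the side information $\mathbf{u}_2$, and the rate constraint in (3.12) becomes $I(\mathbf{x}; \mathbf{u}_1, \mathbf{u}_2) \le R \Leftrightarrow I(\mathbf{x}_1; \mathbf{u}_1) - I(\mathbf{u}_2; \mathbf{u}_1) \le R - R_2 := R_1$. The new signal of interest in (3.13) is $\mathbf{L}_1 \mathbf{x}_1$; thus, $\mathbf{u}_1$ has to be a function of $\mathbf{L}_1 \mathbf{x}_1$. Using the fact that $\mathbf{x}_1 \to \mathbf{L}_1 \mathbf{x}_1 \to \mathbf{u}_1$ constitutes a Markov chain, we show in [30] that $I(\mathbf{x}_1; \mathbf{u}_1) = I(\mathbf{L}_1 \mathbf{x}_1; \mathbf{u}_1)$. Using the latter, we obtain:
$$I(\mathbf{x}; \mathbf{u}_1, \mathbf{u}_2) = R_2 + I(\mathbf{L}_1 \mathbf{x}_1; \mathbf{u}_1) - I(\mathbf{u}_2; \mathbf{u}_1). \tag{3.14}$$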
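From the RHS of (3.14), we deduce the equivalent constraint $I(\mathbf{L}_1 \mathbf{x}_1; \mathbf{u}_1) - I(\mathbf{u}_2; \mathbf{u}_1) \le R_1$. Combining the latter with (3.13) and (3.12), we arrive at the D-R upper bound:
$$\bar{D}(R_1) = \min_{\substack{p(\mathbf{u}_1|\mathbf{L}_1\mathbf{x}_1),\, \hat{\mathbf{s}}_R \\ I(\mathbf{L}_1\mathbf{x}_1;\mathbf{u}_1) - I(\mathbf{u}_1;\mathbf{u}_2) \le R_1}} E[\|\mathbf{L}_1 \mathbf{x}_1 - \hat{\mathbf{s}}_{R,12}(\mathbf{u}_1, \mathbf{u}_2)\|^2] + E[\|\tilde{\mathbf{s}}\|^2], \tag{3.15}$$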
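where $\hat{\mathbf{s}}_{R,12}(\mathbf{u}_1, \mathbf{u}_2) := \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2) - \mathbf{L}_2 \mathbf{u}_2$. Through (3.15) we can determine an achievable D-R region, having available rate $R_1$ at the encoder and side information $\mathbf{u}_2$ at the decoder. Since $\mathbf{x}_1$ and $\mathbf{u}_2$ are jointly Gaussian, we can apply the Wyner-Ziv result [36], which allows us to consider that $\mathbf{u}_2$ is available both at the decoder and the encoder. This, in turn, permits re-writing the first term in (3.15) as:
$$\min_{\substack{p(\hat{\mathbf{s}}_R|\mathbf{L}_1\mathbf{x}_1, \mathbf{u}_2) \\ I(\mathbf{L}_1\mathbf{x}_1;\hat{\mathbf{s}}_R|\mathbf{u}_2) \le R_1}} E[\|\mathbf{L}_1 \mathbf{x}_1 - [\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2) - \mathbf{L}_2 \mathbf{u}_2]\|^2]. \tag{3.16}$$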
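If $\hat{\mathbf{s}}_1 := E[\mathbf{L}_1 \mathbf{x}_1|\mathbf{u}_2] = \mathbf{L}_1 \boldsymbol{\Sigma}_{x_1 u_2} \boldsymbol{\Sigma}_{u_2 u_2}^{-1} \mathbf{u}_2$ and $\tilde{\mathbf{s}}_1$ is the corresponding MSE, then we can write $\mathbf{L}_1 \mathbf{x}_1 = \hat{\mathbf{s}}_1 + \tilde{\mathbf{s}}_1$. For the rate constraint in (3.16), we have:
$$I(\mathbf{L}_1 \mathbf{x}_1; \hat{\mathbf{s}}_R|\mathbf{u}_2) = I(\mathbf{L}_1 \mathbf{x}_1 - \hat{\mathbf{s}}_1; \hat{\mathbf{s}}_R - \mathbf{L}_2 \mathbf{u}_2 - \hat{\mathbf{s}}_1|\mathbf{u}_2) = I(\tilde{\mathbf{s}}_1; \hat{\mathbf{s}}_R - \mathbf{L}_2 \mathbf{u}_2 - \hat{\mathbf{s}}_1), \tag{3.17}$$
where the first equality is true because $\mathbf{u}_2$ is known; while the second one holds since $\mathbf{u}_2$ is uncorrelated with $\tilde{\mathbf{s}}_1$, due to the orthogonality principle, and likewise $\mathbf{u}_2$ is uncorrelated with $\tilde{\mathbf{s}}_{R,12}(\mathbf{u}_1, \mathbf{u}_2) := \hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2) - \mathbf{L}_2 \mathbf{u}_2 - \hat{\mathbf{s}}_1$. Utilizing (3.16) and (3.17), we arrive at:
$$\bar{D}(R_1) = \min_{\substack{p(\tilde{\mathbf{s}}_{R,12}|\tilde{\mathbf{s}}_1) \\ I(\tilde{\mathbf{s}}_1;\tilde{\mathbf{s}}_{R,12}) \le R_1}} E[\|\tilde{\mathbf{s}}_1 - \tilde{\mathbf{s}}_{R,12}(\mathbf{u}_1, \mathbf{u}_2)\|^2] + E[\|\tilde{\mathbf{s}}\|^2]. \tag{3.18}$$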
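Notice that (3.18) is the D-R function for reconstructing the MSE $\tilde{\mathbf{s}}_1$ with rate $R_1$. Since $\tilde{\mathbf{s}}_1$ is Gaussian, we can readily apply rwf to the pre-whitened $\mathbf{Q}_{\tilde{s}_1}^T \tilde{\mathbf{s}}_1$ for determining $\bar{D}(R_1)$ and the corresponding test channel that achieves $\bar{D}(R_1)$. Through the latter, and considering the eigenvalue decomposition $\boldsymbol{\Sigma}_{\tilde{s}_1\tilde{s}_1} = \mathbf{Q}_{\tilde{s}_1} \mathrm{diag}(\lambda_{\tilde{s}_1,1}, \ldots, \lambda_{\tilde{s}_1,p}) \mathbf{Q}_{\tilde{s}_1}^T$, we find that the first encoder's output that minimizes (3.12) has the form:
$$\mathbf{u}_1 = \mathbf{Q}_{\tilde{s}_1,k_1}^T \mathbf{L}_1 \mathbf{x}_1 + \boldsymbol{\zeta}_1 = \mathbf{Q}_1 \mathbf{x}_1 + \boldsymbol{\zeta}_1, \tag{3.19}$$
where $\mathbf{Q}_{\tilde{s}_1,k_1}$ denotes the first $k_1$ columns of $\mathbf{Q}_{\tilde{s}_1}$, $k_1$ is the number of $\mathbf{Q}_{\tilde{s}_1}^T \tilde{\mathbf{s}}_1$ entries that are assigned with non-zero rate, and $\mathbf{Q}_1 := \mathbf{Q}_{\tilde{s}_1,k_1}^T \mathbf{L}_1$. The $k_1 \times 1$ AWGN $\boldsymbol{\zeta}_1 \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_{\zeta_1\zeta_1})$ is uncorrelated with $\mathbf{x}_1$. Additionally, we have $[\boldsymbol{\Sigma}_{\zeta_1\zeta_1}]_{ii} = \lambda_{\tilde{s}_1,i} D_i^1/(\lambda_{\tilde{s}_1,i} - D_i^1)$, where $D_i^1 = \big(\prod_{j=1}^{k_1} \lambda_{\tilde{s}_1,j}\big)^{1/k_1} 2^{-2R_1/k_1}$, for $i = 1, \ldots, k_1$, and $D_i^1 = \lambda_{\tilde{s}_1,i}$ when $i = k_1 + 1, \ldots, p$. This way, we are able to determine also $p(\mathbf{u}_1|\mathbf{x}_1)$. The reconstruction function has the form:
$$\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2) = \mathbf{Q}_{\tilde{s}_1,k_1} \boldsymbol{\Theta}_1 \mathbf{u}_1 + \mathbf{L}_1 \boldsymbol{\Sigma}_{x_1 u_2} \boldsymbol{\Sigma}_{u_2 u_2}^{-1} \mathbf{u}_2 + \mathbf{L}_2 \mathbf{u}_2 - \mathbf{Q}_{\tilde{s}_1,k_1} \boldsymbol{\Theta}_1 \mathbf{Q}_{\tilde{s}_1,k_1}^T \mathbf{L}_1 \boldsymbol{\Sigma}_{x_1 u_2} \boldsymbol{\Sigma}_{u_2 u_2}^{-1} \mathbf{u}_2, \tag{3.20}$$
where $[\boldsymbol{\Theta}_1]_{ii} = (\lambda_{\tilde{s}_1,i} - D_i^1)/\lambda_{\tilde{s}_1,i}$ [c.f. the analogous definitions below (3.5) and (3.8)], and the MMSE is $\bar{D}(R_1) = \sum_{j=1}^{p} D_j^1 + E[\|\tilde{\mathbf{s}}\|^2]$.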
The approach in this subsection can be applied in an alternating fashion from sensor to sensor in order to determine appropriate $p(\mathbf{u}_i|\mathbf{x}_i)$, for $i = 1, 2$, and $\hat{\mathbf{s}}_R(\mathbf{u}_1, \mathbf{u}_2)$ that at best globally minimize (3.15). The conditional pdfs can be determined by finding the appropriate covariances $\boldsymbol{\Sigma}_{\zeta_i\zeta_i}$. Furthermore, by specifying the optimal $\mathbf{Q}_1$ and $\mathbf{Q}_2$, we have a complete characterization of the encoders' structure. The resultant algorithm is summarized next:
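Algorithm 2:
Initialize $\mathbf{Q}_1^{(0)}$, $\mathbf{Q}_2^{(0)}$, $\boldsymbol{\Sigma}_{\zeta_1\zeta_1}^{(0)}$, $\boldsymbol{\Sigma}_{\zeta_2\zeta_2}^{(0)}$ by applying optimal D-R encoding to each sensor's test channel independently.
For a total rate $R$, generate $M$ random increments $\{r(m)\}_{m=0}^{M}$, such that $0 \le r(m) \le R$ and $\sum_{m=0}^{M} r(m) = R$. Set $R_1(0) = R_2(0) = 0$.
for $j = 1, \ldots, M$ do
    Set $R(j) = \sum_{l=0}^{j} r(l)$
    for $i = 1, 2$ do
        $\bar{i} = \mathrm{mod}(i, 2) + 1$   % The complementary index
        $R_0(j) = I(\mathbf{x}; \mathbf{u}_{\bar{i}}^{(j-1)})$
        Use $\mathbf{Q}_{\bar{i}}^{(j-1)}$, $\boldsymbol{\Sigma}_{\zeta_{\bar{i}}\zeta_{\bar{i}}}^{(j-1)}$, $R(j)$, $R_0(j)$ to determine $\mathbf{Q}_i^{(j)}$, $\boldsymbol{\Sigma}_{\zeta_i\zeta_i}^{(j)}$ and distortion $\bar{D}(R_i(j))$
    end for
    Update matrices $\mathbf{Q}_l^{(j)}$, $\boldsymbol{\Sigma}_{\zeta_l\zeta_l}^{(j)}$ that result in the smallest distortion $\bar{D}(R_l(j))$, with $l \in \{1, 2\}$
    Set $R_l(j) = R(j) - I(\mathbf{x}; \mathbf{u}_{\bar{l}}^{(j)})$ and $R_{\bar{l}}(j) = I(\mathbf{x}; \mathbf{u}_{\bar{l}}^{(j)})$
end for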
In Fig. 4 (Right), we plot the non-achievable lower bound which corresponds to one sensor having available the entire $\mathbf{x}$ and using the optimal EC scheme. Moreover, we plot an achievable D-R upper bound determined by letting the $i$-th sensor form its local estimate $\hat{\mathbf{s}}_i = E[\mathbf{s}|\mathbf{x}_i]$, and then apply optimal rate-distortion encoding to $\hat{\mathbf{s}}_i$. If $\hat{\mathbf{s}}_{R,1}$ and $\hat{\mathbf{s}}_{R,2}$ are the reconstructed versions of $\hat{\mathbf{s}}_1$ and $\hat{\mathbf{s}}_2$, respectively, then the decoder at the FC forms the final estimate $\hat{\mathbf{s}}_R = E[\mathbf{s}|\hat{\mathbf{s}}_{R,1}, \hat{\mathbf{s}}_{R,2}]$. We also plot the achievable D-R region determined numerically by the alternating algorithm. For each rate, we keep the smallest distortion returned after 500 executions of the algorithm simulated with $\boldsymbol{\Sigma}_{ss} = \mathbf{I}_p$, $p = 4$, and $N_1 = N_2 = 20$, at SNR = 2. We observe that the algorithm provides a tight upper bound for the achievable D-R region. Using also the non-achievable lower bound (solid line), we have effectively reduced the 'uncertainty region' where the D-R function lies.
4. Distributed quantization-estimation. Consider a WSN consisting of $N$ sensors deployed to estimate a deterministic $p \times 1$ vector parameter $\boldsymbol{\theta}$. The $n$th sensor observes an $M \times 1$ vector of noisy observations
$$\mathbf{x}(n) = \mathbf{f}_n(\boldsymbol{\theta}) + \mathbf{w}(n), \qquad n = 0, 1, \ldots, N - 1, \tag{4.1}$$
where $\mathbf{f}_n : \mathbb{R}^p \to \mathbb{R}^M$ is a known (generally nonlinear) function and $\mathbf{w}(n)$ denotes zero-mean noise with pdf $p_w(\mathbf{w})$, that is known possibly up to a finite number of unknown parameters. We further assume that $\mathbf{w}(n_1)$ is independent of $\mathbf{w}(n_2)$ for $n_1 \ne n_2$; i.e., noise variables are independent across sensors. We will use $\mathbf{J}_n$ to denote the Jacobian of the differentiable function $\mathbf{f}_n$ whose $(i,j)$th entry is given by $[\mathbf{J}_n]_{ij} = \partial[\mathbf{f}_n]_i/\partial[\boldsymbol{\theta}]_j$. Due to bandwidth limitations, the observations $\mathbf{x}(n)$ have to be quantized and estimation of $\boldsymbol{\theta}$ can only be based on these quantized values. We will henceforth think of quantization as the construction of a set of indicator variables
$$b_k(n) := \mathbf{1}\{\mathbf{x}(n) \in B_k(n)\}, \qquad k = 1, \ldots, K, \tag{4.2}$$
taking the value 1 when $\mathbf{x}(n)$ belongs to the region $B_k(n) \subset \mathbb{R}^M$, and 0 otherwise. Estimation of $\boldsymbol{\theta}$ will rely on this set of binary variables $\{b_k(n), k = 1, \ldots, K\}_{n=0}^{N-1}$. The latter are Bernoulli distributed with parameters $q_k(n)$ satisfying
$$q_k(n) := \Pr\{b_k(n) = 1\} = \Pr\{\mathbf{x}(n) \in B_k(n)\}. \tag{4.3}$$
In the ensuing sections, we will derive the Cramér-Rao Lower Bound (CRLB) to benchmark the variance of all unbiased estimators $\hat{\boldsymbol{\theta}}$ constructed using the binary observations $\{b_k(n), k = 1, \ldots, K\}_{n=0}^{N-1}$. We will further show that it is possible to find Maximum Likelihood Estimators (MLEs) that (at least asymptotically) are known to achieve the CRLB. Finally, we will reveal that the CRLB based on $\{b_k(n), k = 1, \ldots, K\}_{n=0}^{N-1}$ can come surprisingly close to the clairvoyant CRLB based on $\{\mathbf{x}(n)\}_{n=0}^{N-1}$ in certain applications of practical interest.
4.1. Scalar parameter estimation - Parametric approach. Consider the case where $\boldsymbol{\theta} \equiv \theta$ is a scalar ($p = 1$), $x(n) = \theta + w(n)$, and $p_w(w) \equiv p_w(w, \sigma)$ is known, with $\sigma$ denoting the noise standard deviation. Seeking first estimators $\hat{\theta}$ when the possibly non-Gaussian noise pdf is known, we move on to the case where $\sigma$ is unknown, and prove that in both cases the variance of $\hat{\theta}$ based on a single bit per sensor can come close to the variance of the sample mean estimator, $\bar{x} := N^{-1} \sum_{n=0}^{N-1} x(n)$.

4.1.1. Known noise pdf. When the noise pdf is known, we will rely on a single region $B_1(n)$ in (4.2) to generate a single bit $b_1(n)$ per sensor, using a threshold $\tau_c$ common to all $N$ sensors: $B_1(n) := B_c = (\tau_c, \infty)$, $\forall n$. Based on these binary observations, $b_1(n) := \mathbf{1}\{x(n) \in (\tau_c, \infty)\}$ received from all $N$ sensors, the fusion center seeks estimates of $\theta$. Let $F_w(u) := \int_u^{\infty} p_w(w)\, dw$ denote the Complementary Cumulative Distribution Function (CCDF) of the noise. Using (4.3), we can express the Bernoulli parameter as $q_1 = \int_{\tau_c - \theta}^{\infty} p_w(w)\, dw = F_w(\tau_c - \theta)$; and its MLE as $\hat{q}_1 = N^{-1} \sum_{n=0}^{N-1} b_1(n)$. Invoking now the invariance property of MLE, it follows readily that the MLE of $\theta$ is given by [27]$^1$:
$$\hat{\theta} = \tau_c - F_w^{-1}\Big(\frac{1}{N} \sum_{n=0}^{N-1} b_1(n)\Big). \tag{4.4}$$
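A minimal Python sketch of the estimator (4.4) for the Gaussian case follows. It is an illustration rather than the authors' implementation: the function names are hypothetical, the noise is assumed zero-mean Gaussian with known $\sigma$, and $\hat{q}_1$ is clipped away from 0 and 1 (an assumption added here) so that the inverse CCDF stays finite.

```python
import random
from statistics import NormalDist

def one_bit_mle(bits, tau_c, sigma):
    """Evaluate (4.4) for Gaussian noise: theta_hat = tau_c - F_w^{-1}(mean of the bits)."""
    n = len(bits)
    q_hat = sum(bits) / n
    # Assumption of this sketch: keep q_hat strictly inside (0, 1).
    q_hat = min(max(q_hat, 0.5 / n), 1.0 - 0.5 / n)
    # Gaussian CCDF: F_w(u) = 1 - Phi(u / sigma), hence F_w^{-1}(q) = sigma * Phi^{-1}(1 - q).
    inv_ccdf = sigma * NormalDist().inv_cdf(1.0 - q_hat)
    return tau_c - inv_ccdf

def simulate_bits(theta, tau_c, sigma, n, seed=0):
    """Each sensor sends 1 if its noisy observation exceeds the common threshold tau_c."""
    rng = random.Random(seed)
    return [1 if theta + rng.gauss(0.0, sigma) > tau_c else 0 for _ in range(n)]
```

For example, `one_bit_mle(simulate_bits(0.3, 0.0, 1.0, 20000), 0.0, 1.0)` should return a value close to 0.3 when the threshold lies within about one noise standard deviation of the parameter.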
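Furthermore, it can be shown that the CRLB, which bounds the variance of any unbiased estimator $\hat{\theta}$ based on $\{b_1(n)\}_{n=0}^{N-1}$, is [27]
$$\mathrm{var}(\hat{\theta}) \ge \frac{1}{N} \frac{F_w(\tau_c - \theta)[1 - F_w(\tau_c - \theta)]}{p_w^2(\tau_c - \theta)} := B(\theta). \tag{4.5}$$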
If the noise is Gaussian, and we define the $\sigma$-distance between the threshold $\tau_c$ and the (unknown) parameter $\theta$ as $\Delta_c := (\tau_c - \theta)/\sigma$, then (4.5) reduces to
$$B(\theta) = \frac{\sigma^2}{N} \frac{2\pi Q(\Delta_c)[1 - Q(\Delta_c)]}{e^{-\Delta_c^2}} := \frac{\sigma^2}{N} D(\Delta_c), \tag{4.6}$$
with $Q(u) := (1/\sqrt{2\pi}) \int_u^{\infty} e^{-w^2/2}\, dw$ denoting the Gaussian tail probability function. The bound $B(\theta)$ is the variance of $\bar{x}$ scaled by the factor $D(\Delta_c)$; recall that $\mathrm{var}(\bar{x}) = \sigma^2/N$ [13, p. 31]. Optimizing $B(\theta)$ with respect to $\Delta_c$ yields the optimum at $\Delta_c = 0$ and
$$B_{\min} = \frac{\pi}{2} \frac{\sigma^2}{N}, \tag{4.7}$$
the minimum CRLB. Eq. (4.7) reveals something unexpected: relying on a single bit per $x(n)$, the estimator in (4.4) incurs a minimal (just a $\pi/2$ factor) increase in its variance relative to the clairvoyant $\bar{x}$ which relies on the unquantized data $x(n)$. But this minimal loss in performance corresponds to the ideal choice $\Delta_c = 0$, which implies $\tau_c = \theta$ and requires perfect knowledge of the unknown $\theta$ for selecting the quantization threshold $\tau_c$. A closer look at $B(\theta)$ in (4.5) will confirm that the loss can be huge if $\tau_c - \theta \gg 0$. Indeed, as $\tau_c - \theta \to \infty$ the denominator in (4.5) goes to zero faster than its numerator, since $F_w$ is the integral of the non-negative pdf $p_w$; and thus, $B(\theta) \to \infty$ as $\tau_c - \theta \to \infty$. The implication of the latter is twofold: i) since it shows up in the CRLB, the potentially high variance of estimators based on quantized observations is inherent to the possibly severe bandwidth limitations of the problem itself and is not unique to a particular estimator; ii) for any choice of $\tau_c$, the fundamental performance limits in (4.5) are dictated by the end points $\tau_c - \Theta_1$ and $\tau_c - \Theta_2$ when $\theta$ is confined to the interval $[\Theta_1, \Theta_2]$. On the other hand, how successful the $\tau_c$ selection is depends on the dynamic range $|\Theta_1 - \Theta_2|$, which makes sense because the latter affects the error incurred when quantizing $x(n)$ to $b_1(n)$. Notice that in such joint quantization-estimation problems one faces two sources of error: quantization and noise. To account for both, the proper figure of merit for estimators based on binary observations is what we will term quantization signal-to-noise ratio (Q-SNR):
$$\gamma := \frac{|\Theta_1 - \Theta_2|^2}{\sigma^2}. \tag{4.8}$$
Notice that contrary to common wisdom, the smaller Q-SNR is, the easier it becomes to select $\tau_c$ judiciously. Furthermore, the variance increase in (4.5) relative to the variance of the clairvoyant $\bar{x}$ is smaller, for a given $\sigma$. This is because as the Q-SNR increases the problem becomes more difficult in general, but the rate at which the variance increases is smaller for the CRLB in (4.5) than for $\mathrm{var}(\bar{x}) = \sigma^2/N$.

$^1$ Although related results are derived in [27, Prop. 1] for Gaussian noise, it is straightforward to generalize the referred proof to cover also non-Gaussian noise pdfs.
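The penalty factor in (4.6) is easy to tabulate. The short Python sketch below (the function name is hypothetical, and Gaussian noise is assumed as in (4.6)) evaluates $D(\Delta_c) = 2\pi Q(\Delta_c)[1 - Q(\Delta_c)]/e^{-\Delta_c^2}$, which shows how quickly the loss grows once the threshold moves away from $\theta$.

```python
import math
from statistics import NormalDist

def crlb_factor(delta_c):
    """D(Delta_c) in (4.6): one-bit CRLB divided by var(sample mean) = sigma^2 / N."""
    q = 1.0 - NormalDist().cdf(delta_c)   # Gaussian tail probability Q(Delta_c)
    return 2.0 * math.pi * q * (1.0 - q) / math.exp(-delta_c ** 2)
```

At $\Delta_c = 0$ the factor equals $\pi/2$, it is symmetric in $\Delta_c$, and at $|\Delta_c| = 1$ it is already about 2.3.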
4.1.2. Known noise pdf with unknown variance. No matter how small the variance in (4.5) can be made by properly selecting $\tau_c$, the estimator $\hat{\theta}$ in (4.4) requires perfect knowledge of the noise pdf, which may not always be justifiable. A more realistic approach is to assume that the noise pdf is known (e.g., Gaussian) but some of its parameters are unknown. A case frequently encountered in practice is when the noise pdf is known except for its variance $E[w^2(n)] = \sigma^2$. Introducing the standardized variable $v(n) := w(n)/\sigma$ we write the signal model as
$$x(n) = \theta + \sigma v(n). \tag{4.9}$$
Let $p_v(v)$ and $F_v(v) := \int_v^{\infty} p_v(u)\, du$ denote the known pdf and CCDF of $v(n)$. Note that according to its definition, $v(n)$ has zero mean, $E[v^2(n)] = 1$, and the pdfs of $v$ and $w$ are related by $p_w(w) = (1/\sigma) p_v(w/\sigma)$. Note also that all two-parameter pdfs can be standardized likewise. To estimate $\theta$ when $\sigma$ is also unknown while keeping the bandwidth constraint to 1 bit per sensor, we divide the sensors in two groups each using a different region (i.e., threshold) to define the binary observations:
$$B_1(n) := \begin{cases} B_1 = (\tau_1, \infty), & n = 0, \ldots, (N/2) - 1, \\ B_2 = (\tau_2, \infty), & n = (N/2), \ldots, N - 1. \end{cases} \tag{4.10}$$
That is, the first $N/2$ sensors quantize their observations using the threshold $\tau_1$, while the remaining $N/2$ sensors rely on the threshold $\tau_2$. Without loss of generality, we assume $\tau_2 > \tau_1$. The Bernoulli parameters of the resultant binary observations can be expressed in terms of the CCDF of $v(n)$ as:
$$q_1(n) = \begin{cases} F_v\big[\frac{\tau_1 - \theta}{\sigma}\big] := q_1, & n = 0, \ldots, (N/2) - 1, \\ F_v\big[\frac{\tau_2 - \theta}{\sigma}\big] := q_2, & n = (N/2), \ldots, N - 1. \end{cases} \tag{4.11}$$
Given the noise independence across sensors, the MLEs of $q_1$, $q_2$ can be found, respectively, as
$$\hat{q}_1 = \frac{2}{N} \sum_{n=0}^{N/2-1} b_1(n), \qquad \hat{q}_2 = \frac{2}{N} \sum_{n=N/2}^{N-1} b_1(n). \tag{4.12}$$
Mimicking (4.4), we can invert $F_v$ in (4.11) and invoke the invariance property of MLEs, to obtain the MLE $\hat{\theta}$ in terms of $\hat{q}_1$ and $\hat{q}_2$. This result is stated in the following proposition that also derives the CRLB for this estimation problem$^2$.

PROPOSITION 4.1. Consider estimating $\theta$ in (4.9), based on binary observations constructed from the regions defined in (4.10). (a) The MLE of $\theta$ is
$$\hat{\theta} = \frac{F_v^{-1}(\hat{q}_2)\tau_1 - F_v^{-1}(\hat{q}_1)\tau_2}{F_v^{-1}(\hat{q}_2) - F_v^{-1}(\hat{q}_1)}, \tag{4.13}$$
with $F_v^{-1}$ denoting the inverse function of $F_v$, and $\hat{q}_1$, $\hat{q}_2$ given by (4.12). (b) The variance of any unbiased estimator of $\theta$, $\mathrm{var}(\hat{\theta})$, based on $\{b_1(n)\}_{n=0}^{N-1}$ is bounded by
$$B(\theta) := \frac{2\sigma^2}{N} \Big(\frac{\Delta_1 \Delta_2}{\Delta_2 - \Delta_1}\Big)^2 \Big[\frac{q_1(1 - q_1)}{p_v^2(\Delta_1)\Delta_1^2} + \frac{q_2(1 - q_2)}{p_v^2(\Delta_2)\Delta_2^2}\Big], \tag{4.14}$$

$^2$ Omitted due to space considerations, proofs pertaining to claims in this section can be found in [28].
[Figure 5: contour plots of the per-bit CRLB as a function of the $\sigma$-distances $\Delta_1$ and $\Delta_2$.]
FIG. 5. Per bit CRLB when the binary observations are independent (Section 4.1.2) and dependent (Section 4.1.3), respectively. In both cases, the variance increase with respect to the sample mean estimator is small when the $\sigma$-distances are close to 1, being slightly better for the case of dependent binary observations (Gaussian noise).
where $q_k$ is given by (4.11), and
$$\Delta_k := \frac{\tau_k - \theta}{\sigma}, \qquad k = 1, 2, \tag{4.15}$$
is the $\sigma$-distance between $\theta$ and the threshold $\tau_k$.

Eq. (4.14) is reminiscent of (4.5), suggesting that the variances of the estimators they bound are related. This implies that even when the known noise pdf contains unknown parameters the variance of $\hat{\theta}$ can come close to the variance of the clairvoyant estimator $\bar{x}$, provided that the thresholds $\tau_1$, $\tau_2$ are chosen close to $\theta$ relative to the noise standard deviation (so that $\Delta_1$, $\Delta_2$, and $\Delta_2 - \Delta_1$ in (4.15) are $\approx 1$). For the Gaussian pdf, Fig. 5 shows the contour plot of $B(\theta)$ in (4.14) normalized by $\sigma^2/N := \mathrm{var}(\bar{x})$. Notice that in the low Q-SNR regime $\Delta_1, \Delta_2 \approx 1$, and the relative variance increase $B(\theta)/\mathrm{var}(\bar{x})$ is less than 3.

4.1.3. Dependent binary observations. In the previous subsection, we restricted the sensors to transmit only 1 bit per $x(n)$ datum, and divided the sensors in two classes each quantizing $x(n)$ using a different threshold. A related approach is to let each sensor use two thresholds:
$$B_1(n) := B_1 = (\tau_1, \infty), \quad n = 0, 1, \ldots, N - 1, \qquad B_2(n) := B_2 = (\tau_2, \infty), \quad n = 0, 1, \ldots, N - 1, \tag{4.16}$$
where $\tau_2 > \tau_1$. We define the per sensor vector of binary observations $\mathbf{b}(n) := [b_1(n), b_2(n)]^T$, and the vector Bernoulli parameter $\mathbf{q} := [q_1(n), q_2(n)]^T$, whose components are as in (4.11). Note the subtle differences between (4.10) and (4.16). While each of the $N$ sensors generates 1 binary observation according to (4.10), each sensor creates 2 binary observations as per (4.16). The total number of bits from all sensors in the former case is $N$, but in the latter $N \log_2 3$, since our constraint $\tau_2 > \tau_1$ implies that the realization $\mathbf{b} = (0, 1)$ is impossible. In addition, all bits in the former case are independent, whereas correlation is present in the latter since $b_1(n)$ and $b_2(n)$ come from the same $x(n)$. Even though one would expect this correlation to complicate matters, a property of the binary observations defined as per (4.16), summarized in the next lemma, renders estimation of $\theta$ based on them feasible.

LEMMA 4.1. The MLE of $\mathbf{q} := (q_1(n), q_2(n))^T$ based on the binary observations $\{\mathbf{b}(n)\}_{n=0}^{N-1}$ constructed according to (4.16) is given by
$$\hat{\mathbf{q}} = \frac{1}{N} \sum_{n=0}^{N-1} \mathbf{b}(n). \tag{4.17}$$
Interestingly, (4.17) coincides with (4.12), proving that the corresponding estimators of $\theta$ are identical; i.e., (4.13) yields also the MLE $\hat{\theta}$ even in the correlated case. However, as the following proposition asserts, correlation affects the estimator's variance and the corresponding CRLB.

PROPOSITION 4.2. Consider estimating $\theta$ in (4.9), when $\sigma$ is unknown, based on binary observations constructed from the regions defined in (4.16). The variance of any unbiased estimator of $\theta$, $\mathrm{var}(\hat{\theta})$, based on $\{b_1(n), b_2(n)\}_{n=0}^{N-1}$ is bounded by
$$B_D(\theta) := \frac{\sigma^2}{N} \Big(\frac{\Delta_1 \Delta_2}{\Delta_2 - \Delta_1}\Big)^2 \Big[\frac{q_1(1 - q_1)}{p_v^2(\Delta_1)\Delta_1^2} + \frac{q_2(1 - q_2)}{p_v^2(\Delta_2)\Delta_2^2} - \frac{q_2(1 - q_1)}{p_v(\Delta_1) p_v(\Delta_2) \Delta_1 \Delta_2}\Big], \tag{4.18}$$
where the subscript $D$ in $B_D(\theta)$ is used as a mnemonic for the dependent binary observations this estimator relies on [c.f. (4.14)]. Unexpectedly, (4.18) is similar to (4.14). Actually, a fair comparison between the two requires compensating for the difference in the total number of bits used in each case. This can be accomplished by introducing the per-bit CRLBs for the independent and correlated cases respectively,
$$C(\theta) = N B(\theta), \qquad C_D(\theta) = N \log_2(3)\, B_D(\theta), \tag{4.19}$$
which lower bound the corresponding variances achievable by the transmission of 1 bit. Evaluation of $C(\theta)/\sigma^2$ and $C_D(\theta)/\sigma^2$ follows from (4.14), (4.18) and (4.19) and is depicted in Fig. 5 for Gaussian noise and $\sigma$-distances $\Delta_1$, $\Delta_2$ having amplitude as large as 5. Somewhat surprisingly, both approaches yield very similar bounds, with the one relying on dependent binary observations being slightly better in the achievable variance; or correspondingly, in requiring a smaller number of sensors to achieve the same CRLB.
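The estimator (4.13), which by Lemma 4.1 applies to both the independent and the dependent binary observations, can be sketched in a few lines of Python. The sketch assumes Gaussian standardized noise; the function names are hypothetical, and the returned scale estimate $(\tau_2 - \tau_1)/(F_v^{-1}(\hat{q}_2) - F_v^{-1}(\hat{q}_1))$ is not stated in the text but follows from inverting the two relations in (4.11). The bit averages are assumed to lie strictly between 0 and 1 and to differ from each other.

```python
import random
from statistics import NormalDist

def two_threshold_mle(q1_hat, q2_hat, tau1, tau2):
    """Evaluate (4.13) for Gaussian noise; also return the implied estimate of sigma."""
    std_normal = NormalDist()
    # Inverse CCDF of the standardized noise: F_v^{-1}(q) = Phi^{-1}(1 - q).
    a1 = std_normal.inv_cdf(1.0 - q1_hat)
    a2 = std_normal.inv_cdf(1.0 - q2_hat)
    theta_hat = (a2 * tau1 - a1 * tau2) / (a2 - a1)
    sigma_hat = (tau2 - tau1) / (a2 - a1)   # assumption: by-product of inverting (4.11)
    return theta_hat, sigma_hat

def empirical_q(theta, sigma, tau1, tau2, n, seed=0):
    """Average the per-sensor bit pairs as in (4.17): every sensor applies both thresholds."""
    rng = random.Random(seed)
    count1 = count2 = 0
    for _ in range(n):
        x = theta + rng.gauss(0.0, sigma)
        count1 += x > tau1
        count2 += x > tau2
    return count1 / n, count2 / n
```

With the exact Bernoulli parameters in place of their estimates, the function returns $\theta$ and $\sigma$ exactly (up to floating-point error), which is the invariance argument behind Proposition 4.1(a).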
4.2. Unknown noise pdf. In certain applications it may not be reasonable to assume knowledge about the noise pdf $p_w(w)$. These cases require non-parametric approaches such as the one pursued in this section.
FIG. 6. When the noise pdf is unknown, numerically integrating the CCDF using the trapezoidal rule yields an approximation of the mean.
We assume that $p_w(w)$ has zero mean so that $\theta$ in (4.1) is identifiable. Let $p_x(x)$ and $F_x(x)$ denote the pdf and CCDF of the observations $x(n)$. As $\theta$ is the mean of $x(n)$, we can write
$$\theta := \int_{-\infty}^{+\infty} x\, p_x(x)\, dx = -\int_{-\infty}^{+\infty} x \frac{\partial F_x(x)}{\partial x}\, dx = \int_0^1 F_x^{-1}(v)\, dv, \tag{4.20}$$
where in establishing the second equality we used the fact that the pdf is the negative derivative of the CCDF, and in the last equality we introduced the change of variables $v = F_x(x)$. But note that the integral of the inverse CCDF can be written in terms of the integral of the CCDF as (see also Fig. 6)
$$\theta = -\int_{-\infty}^{0} [1 - F_x(u)]\, du + \int_0^{+\infty} F_x(u)\, du, \tag{4.21}$$
allowing one to express the mean $\theta$ of $x(n)$ in terms of its CCDF. To avoid carrying out integrals with infinite range, let us assume that $x(n) \in (-T, T)$, which is always practically satisfied for $T$ sufficiently large, so that we can rewrite (4.21) as
$$\theta = \int_{-T}^{T} F_x(u)\, du - T. \tag{4.22}$$
Numerical evaluation of the integral in (4.22) can be performed using a number of known techniques. Let us consider an ordered set of interior points $\{\tau_k\}_{k=1}^{K}$ along with end-points $\tau_0 = -T$ and $\tau_{K+1} = T$. Relying on the fact that $F_x(\tau_0) = F_x(-T) = 1$ and $F_x(\tau_{K+1}) = F_x(T) = 0$, application of the trapezoidal rule for numerical integration yields (see also Fig. 6),
$$\theta = \frac{1}{2} \sum_{k=1}^{K} (\tau_{k+1} - \tau_{k-1}) F_x(\tau_k) - T + e_a, \tag{4.23}$$
with $e_a$ denoting the approximation error. Certainly, other methods like Simpson's rule, or the broader class of Newton-Cotes formulas, can be used to further reduce $e_a$.
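The identity (4.22) and its trapezoidal approximation can be checked numerically when the CCDF is available in closed form. The Python sketch below is illustrative only: the function names are hypothetical, the grid is equispaced, the Gaussian CCDF is just a convenient test case, and the end-point values $F_x(-T)$ and $F_x(T)$ are evaluated and kept explicitly in the rule rather than replaced by 1 and 0.

```python
from statistics import NormalDist

def mean_from_ccdf(ccdf, T, K):
    """Trapezoidal approximation of (4.22) using K equispaced interior thresholds."""
    step = 2.0 * T / (K + 1)
    taus = [-T + k * step for k in range(K + 2)]   # tau_0 = -T, ..., tau_{K+1} = T
    values = [ccdf(t) for t in taus]
    integral = sum(0.5 * (values[k] + values[k + 1]) * (taus[k + 1] - taus[k])
                   for k in range(K + 1))
    return integral - T

def gaussian_ccdf(theta, sigma):
    """CCDF of x = theta + w with w ~ N(0, sigma^2); used here only as a test case."""
    dist = NormalDist(mu=theta, sigma=sigma)
    return lambda u: 1.0 - dist.cdf(u)
```

For instance, `mean_from_ccdf(gaussian_ccdf(0.7, 1.0), 8.0, 200)` recovers the mean 0.7 to within the integration error, since the support $(-8, 8)$ contains essentially all of the probability mass.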
Whichever the choice, the key is that binary observations constructed from the region $B_k := (\tau_k, \infty)$ have Bernoulli parameters
$$q_k := \Pr\{x(n) > \tau_k\} = F_x(\tau_k). \tag{4.24}$$
Inserting the non-parametric estimators $\hat{F}_x(\tau_k) = \hat{q}_k$ in (4.23), our parameter estimator when the noise pdf is unknown takes the form:
$$\hat{\theta} = \frac{1}{2} \sum_{k=1}^{K} \hat{q}_k (\tau_{k+1} - \tau_{k-1}) - T. \tag{4.25}$$
Since the $\hat{q}_k$'s are unbiased, (4.23) and (4.25) imply that $E(\hat{\theta}) = \theta + e_a$. Because $\hat{\theta}$ in (4.25) is biased, its proper performance indicator is the Mean Squared Error (MSE), not the variance. Maintaining the bandwidth constraint of 1 bit per sensor (i.e. $K = 1$), let us divide the $N$ sensors in $K$ subgroups containing $N/K$ sensors each, and define the regions
$$B_1(n) := B_k = (\tau_k, \infty), \qquad n = (k - 1)(N/K), \ldots, k(N/K) - 1; \tag{4.26}$$
the region $B_1(n)$ will be used by sensor $n$ to construct and transmit the binary observation $b_1(n)$. Herein, the unbiased estimators of the Bernoulli parameters $q_k$ are
$$\hat{q}_k = \frac{1}{(N/K)} \sum_{n=(k-1)(N/K)}^{k(N/K)-1} b_1(n), \qquad k = 1, \ldots, K, \tag{4.27}$$
and are used in (4.25) to estimate $\theta$. It is easy to verify that $\mathrm{var}(\hat{q}_k) = q_k(1 - q_k)/(N/K)$, and that $\hat{q}_{k_1}$ and $\hat{q}_{k_2}$ are independent for $k_1 \ne k_2$. The resultant MSE, $E[(\theta - \hat{\theta})^2]$, will be bounded as follows$^3$.

PROPOSITION 4.3. Consider the estimator $\hat{\theta}$ given in (4.25), with $\hat{q}_k$ as in (4.27). Assume that for $T$ sufficiently large and known, $p_x(x) = 0$ for $|x| \ge T$; that the noise pdf has bounded derivative $\dot{p}_w(w) := \partial p_w(w)/\partial w$; and define $\tau_{\max} := \max_k\{\tau_{k+1} - \tau_k\}$ and $\dot{p}_{\max} := \max_{u \in (-T,T)}\{\dot{p}_w(u)\}$. The MSE is given by,
$$E[(\theta - \hat{\theta})^2] = |e_a|^2 + \mathrm{var}(\hat{\theta}), \tag{4.28}$$
with the approximation error $e_a$ and $\mathrm{var}(\hat{\theta})$ satisfying
$$|e_a| \le \frac{T \dot{p}_{\max}}{6} \tau_{\max}^2, \tag{4.29}$$
$$\mathrm{var}(\hat{\theta}) = \sum_{k=1}^{K} \frac{(\tau_{k+1} - \tau_{k-1})^2}{4} \frac{q_k(1 - q_k)}{N/K}, \tag{4.30}$$

$^3$ Omitted due to space considerations, proofs pertaining to claims in this work can be found in [28].
with $\{\tau_k\}_{k=1}^{K}$ a grid of thresholds in $(-T, T)$ and $\{q_k\}_{k=1}^{K}$ as in (4.24).

Note from (4.30) that the larger contributions to $\mathrm{var}(\hat{\theta})$ occur when $q_k \approx 1/2$, since this value maximizes the coefficients $q_k(1 - q_k)$; equivalently, this happens when the thresholds satisfy $\tau_k \approx \theta$ [c.f. (4.24)]. Thus, as with the case where the noise pdf is known, when $\theta$ belongs to an a priori known interval $[\Theta_1, \Theta_2]$, this knowledge must be exploited in selecting thresholds around the likeliest values of $\theta$. On the other hand, note that the $\mathrm{var}(\hat{\theta})$ term in (4.28) will dominate $|e_a|^2$, because $|e_a|^2 \propto \tau_{\max}^4$ as per (4.29). To clarify this point, consider an equispaced grid of thresholds with $\tau_{k+1} - \tau_k = \tau = \tau_{\max}$, $\forall k$, such that $\tau_{\max} = 2T/(K + 1) < 2T/K$. Using the (loose) bound $q_k(1 - q_k) \le 1/4$, the MSE is bounded by [c.f. (4.28) - (4.30)]
$$E[(\theta - \hat{\theta})^2] < \frac{4 T^6 \dot{p}_{\max}^2}{9 K^4} + \frac{T^2}{N}. \tag{4.31}$$
The bound in (4.31) is minimized by selecting $K = N$, which amounts to having each sensor use a different region to construct its binary observation. In this case, $|e_a|^2 \propto N^{-4}$ and its effect becomes practically negligible. Moreover, most pdfs have relatively small derivatives; e.g., for the Gaussian pdf we have $\dot{p}_{\max} = (2\pi e \sigma^4)^{-1/2}$. The integration error can be further reduced by resorting to a more powerful numerical integration method, although its difference with respect to the trapezoidal rule will not have any impact in practice. Since $K = N$, the selection $\tau_{k+1} - \tau_k = \tau$, $\forall k$, yields
$$\hat{\theta} = \tau \sum_{n=0}^{N-1} b_1(n) - T = T \Big[\frac{2}{N + 1} \sum_{n=0}^{N-1} b_1(n) - 1\Big], \tag{4.32}$$
that does not require knowledge of the threshold used to construct the binary observation at the fusion center of a WSN. This feature allows for each sensor to randomly select its threshold without using values pre-assigned by the fusion center; see also [16] for related random quantization algorithms. REMARK 4.1. While e~ ex T6 seems to dominate var( 0) ex T2 in (4.31), this is not true for the operational low-to-medium Q-SNR range for distributed estimators based on binary observations. This is because the support 2T over which Fx(x) in (4.22) is non-zero depends on a and the dynamic range 181 - 8 2 of the parameter e. And as the Q-SNR decreases, T ex a. But since Pmax ex a- 2 , e~ ex a 2 I N4 which is negligible when compared to the term var(O) ex a 2 IN. REMARK 4.2. Pdf-unaware bandwidth-constrained distributed estimation was introduced in [16], where it was referred to as universal. At the (relatively minor) restriction of deterministically-assigned thresholds, the estimator in (4.32) achieves a four times smaller variance than the universal estimator in [16] which can afford randomly assigned thresholds 1
DIMENSIONALITY REDUCTION, COMPRESSION AND QUANTIZATION 285
though it is true that Bin (4.32) can also be implemented with randomly assigned thresholds, its MSE in (4.31) has been derived for deterministically assigned ones. The reason behind this noticeable performance improvement is that the approach here implicitly utilizes the data pdf (through the numerical approximation of the CCDF) in constructing the asymptotic MLE of (4.25). The only extra condition required over [16] is for the pdf to be differentiable, which is typically satisfied in practice. Also, the approach herein is readily generalizable to estimation of vector parameters a practical scenario where universal estimators like those in [16] are yet to be found. Apart from providing useful bounds on the finite-sample performance, Eqs. (4.29), (4.30), and (4.31) establish asymptotic optimality of the estimators in (4.25) and (4.32) as summarized in the following:
COROLLARY 4.1. Under the assumptions of Proposition 4.3 and the conditions: i) $\tau_{\max} \propto K^{-1}$; and ii) $T^2/N,\ T^6/K^4 \to 0$ as $T, K, N \to \infty$, the estimators in (4.25) and (4.32) are asymptotically (as $K, N \to \infty$) unbiased and consistent in the mean-square sense.

The estimators in (4.25) and (4.32) are consistent even if the support of the data pdf is infinite, as long as we guarantee a proper rate of convergence relative to the number of sensors and thresholds.

REMARK 4.3. To compare the estimators in (4.4) and (4.32), consider that $\theta \in [\Theta_1, \Theta_2] = [-\sigma, \sigma]$, and that the noise is Gaussian with variance $\sigma^2$, yielding a Q-SNR $\gamma = 4$. No estimator can have variance smaller than $\mathrm{var}(\bar{x}) = \sigma^2/N$; however, for the (medium) $\gamma = 4$ Q-SNR value they can come close. For the known pdf estimator in (4.4), the variance is $\mathrm{var}(\hat{\theta}) \approx 2\sigma^2/N$. The unknown pdf estimator in (4.32) requires an assumption about the essentially non-zero support of the Gaussian pdf. If we suppose that the noise pdf is non-zero over $[-2\sigma, 2\sigma]$, the corresponding variance becomes $\mathrm{var}(\hat{\theta}) \approx 9\sigma^2/N$. The penalties due to the transmission of a single bit per sensor, relative to $\bar{x}$, are thus approximately 2 and 9. While the increasing penalty is expected as the uncertainty about the noise pdf increases, the relatively small loss is rather unexpected.
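The factor of roughly 2 quoted in Remark 4.3 for the known-pdf case can be sanity-checked with a short calculation. The sketch below is not the estimator of (4.4); it evaluates the Cramér-Rao bound for a single binary observation $b = 1\{x \ge \tau\}$ in Gaussian noise, obtained from the Fisher information of a Bernoulli variable. The Q-SNR definition $\gamma = |\Theta_2 - \Theta_1|^2/\sigma^2$ and the choice of one threshold at the midpoint $\tau = 0$ are assumptions made here for illustration; the names `gaussian_tail` and `one_bit_penalty` are hypothetical.

```python
import math

def gaussian_tail(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_bit_penalty(delta):
    """Per-sensor CRLB for theta from b = 1{x >= tau}, divided by sigma^2.

    delta = (tau - theta) / sigma is the threshold-to-parameter distance in
    units of sigma. This follows from the Fisher information of a Bernoulli
    variable with success probability Q(delta); it is a derivation made for
    this sketch, not a formula copied from the text.
    """
    q = gaussian_tail(delta)
    return 2.0 * math.pi * q * (1.0 - q) * math.exp(delta ** 2)

sigma = 1.0
theta_lo, theta_hi = -sigma, sigma                  # theta in [-sigma, sigma]
q_snr = (theta_hi - theta_lo) ** 2 / sigma ** 2     # assumed Q-SNR definition

best = one_bit_penalty(0.0)    # threshold exactly at theta
worst = one_bit_penalty(1.0)   # threshold at 0 while theta sits at +/- sigma
print(q_snr, round(best, 3), round(worst, 3))
```

Under these assumptions the penalty ranges from $\pi/2$ (threshold on top of the parameter) to a little under 2.3 at the edges of the parameter range, which is in line with the "approximately 2" of the remark.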
4.3. Vector parameter generalization. Let us now return to the general problem we started with in Section 2. We begin by defining the per-sensor vector of binary observations $b(n) := (b_1(n), \ldots, b_K(n))^T$, and note that since its entries are binary, realizations $\beta$ of $b(n)$ belong to the set

$$\mathcal{B} := \{\beta \in \mathbb{R}^K \mid [\beta]_k \in \{0, 1\},\ k = 1, \ldots, K\}, \tag{4.33}$$

where $[\beta]_k$ denotes the $k$th component of $\beta$. With each $\beta \in \mathcal{B}$ and each sensor we now associate the region

$$B_{\beta}(n) := \bigcap_{[\beta]_k = 1} B_k(n) \;\cap \bigcap_{[\beta]_k = 0} \bar{B}_k(n), \tag{4.34}$$
FIG. 7. (Left): The vector of binary observations $b$ takes on the value $\{y_1, y_2\}$ if and only if $x(n)$ belongs to the region $B_{\{y_1, y_2\}}$; (Right): Selecting the regions $B_k(n)$ perpendicular to the covariance matrix eigenvectors results in independent binary observations.
where $\bar{B}_k(n)$ denotes the set-complement of $B_k(n)$ in $\mathbb{R}^M$. Note that the definition in (4.34) implies that $x(n) \in B_{\beta}(n)$ if and only if $b(n) = \beta$; see also Fig. 7 (Left) for an illustration in $\mathbb{R}^2$ ($M = 2$). The corresponding probabilities are:

$$q_{\beta}(n) := \Pr\{b(n) = \beta\} = \int_{B_{\beta}(n)} p_w[u - f_n(\theta); \psi]\, du, \tag{4.35}$$
with $f_n$ as in (4.1), and $\psi$ containing the unknown parameters of the known noise pdf. Using definitions (4.35) and (4.33), we can write the pertinent log-likelihood function as

$$L(\theta, \psi) = \sum_{n=0}^{N-1} \sum_{\beta \in \mathcal{B}} \delta(b(n) - \beta) \ln q_{\beta}(n), \tag{4.36}$$

and the MLE of $\theta$ as

$$\hat{\theta} = \arg\max_{(\theta, \psi)} L(\theta, \psi). \tag{4.37}$$

The nonlinear search needed to obtain $\hat{\theta}$ could be challenging. Fortunately, as the following proposition asserts, under certain conditions that are usually met in practice, $L(\theta, \psi)$ is concave, which implies that computationally efficient search algorithms can be invoked to find its global maximum.

PROPOSITION 4.4. If the MLE problem in (4.37) satisfies the conditions:
[c1] The noise pdf $p_w(w; \psi) \equiv p_w(w)$ is log-concave [6, p. 104], and $\psi$ is known.
[c2] The functions $f_n(\theta)$ are linear; i.e., $f_n(\theta) = H_n \theta$, with $H_n \in \mathbb{R}^{M \times p}$.
[c3] The regions $B_k(n)$ are chosen as half-spaces.
then $L(\theta)$ in (4.36) is a concave function of $\theta$.
Note that [c1] is satisfied by common noise pdfs, including the multivariate Gaussian [6, p. 104], and that [c2] is typical in parameter estimation. Moreover, even when [c2] is not satisfied, linearizing $f_n(\theta)$ using Taylor's expansion is a common first step, typical in, e.g., parameter tracking applications. On the other hand, [c3] places a constraint on the regions defining the binary observations, whose selection is simply up to the designer.
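The concavity asserted by Proposition 4.4 can be illustrated numerically in the simplest setting. The sketch below assumes a scalar parameter, scalar observations $x(n) = h_n\theta + w(n)$ with standard Gaussian noise (log-concave, so [c1] holds), a linear model ([c2]), and half-line regions $\{x \ge \tau_n\}$ ([c3]). The values of `h`, `tau` and `bits` are made up for illustration, and checking second differences on a grid is only a spot check, not a proof.

```python
import math

def gaussian_tail(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def log_likelihood(theta, h, tau, bits, sigma=1.0):
    """Log-likelihood of binary observations b_n = 1{h_n * theta + w_n >= tau_n}."""
    total = 0.0
    for h_n, tau_n, b_n in zip(h, tau, bits):
        q = gaussian_tail((tau_n - h_n * theta) / sigma)   # Pr{b_n = 1}
        total += math.log(q) if b_n else math.log(1.0 - q)
    return total

# Hypothetical regressors, thresholds and received bits.
h = [1.0, -0.5, 2.0, 0.7]
tau = [0.2, 0.0, -1.0, 0.5]
bits = [1, 0, 1, 0]

grid = [-2.0 + 0.05 * i for i in range(81)]
values = [log_likelihood(t, h, tau, bits) for t in grid]

# A concave function has non-positive second differences on a uniform grid.
second_diffs = [values[i - 1] - 2.0 * values[i] + values[i + 1]
                for i in range(1, len(values) - 1)]
is_concave = all(d <= 1e-9 for d in second_diffs)
print(is_concave)
```

Because each summand is the logarithm of a Gaussian tail evaluated at an affine function of $\theta$, every second difference should come out negative for this data set.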
4.3.1. Colored Gaussian noise. Analyzing the performance of the MLE in (4.37) is only possible asymptotically (as $N$ or SNR go to infinity). Notwithstanding, when the noise is Gaussian, simplifications render variance analysis tractable and lead to interesting guidelines for constructing the estimator $\hat{\theta}$. Restrict $p_w(w; \psi) \equiv p_w(w)$ to the class of multivariate Gaussian pdfs, and let $C(n)$ denote the noise covariance matrix at sensor $n$. Assume that $\{C(n)\}_{n=0}^{N-1}$ are known and let $\{(e_m(n), \sigma_m^2(n))\}_{m=1}^{M}$ be the set of eigenvectors and associated eigenvalues:

$$C(n) = \sum_{m=1}^{M} \sigma_m^2(n)\, e_m(n)\, e_m^T(n). \tag{4.38}$$
For each sensor, we define a set of $K = M$ regions $B_k(n)$ as half-spaces whose borders are hyper-planes perpendicular to the covariance matrix eigenvectors; i.e.,

$$B_k(n) = \{x \in \mathbb{R}^M \mid e_k^T(n)\, x \ge \tau_k(n)\}, \quad k = 1, \ldots, K = M. \tag{4.39}$$

Fig. 7 (Right) depicts the regions $B_k(n)$ in (4.39) for $M = 2$. Note that since each entry of $x(n)$ offers a distinct scalar observation, the selection $K = M$ amounts to a bandwidth constraint of 1 bit per sensor per dimension.
The rationale behind this selection of regions is that the resultant binary observations $b_k(n)$ are independent, meaning that $\Pr\{b_{k_1}(n)\, b_{k_2}(n)\} = \Pr\{b_{k_1}(n)\} \Pr\{b_{k_2}(n)\}$ for $k_1 \ne k_2$. As a result, we have a total of $MN$ independent binary observations to estimate $\theta$. Herein, the Bernoulli parameters $q_k(n)$ take on a particularly simple form in terms of the Gaussian tail function,

$$q_k(n) := \Pr\{b_k(n) = 1\} = \frac{1}{\sqrt{2\pi}} \int_{\Delta_k(n)}^{\infty} e^{-u^2/2}\, du, \tag{4.40}$$

where we introduced the $\sigma$-distance between $f_n(\theta)$ and the corresponding threshold, $\Delta_k(n) := [\tau_k(n) - e_k^T(n) f_n(\theta)]/\sigma_k(n)$. Moreover, for simplicity we denote the Q function in (4.40) as $Q(\Delta_k(n))$.
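The independence claim is easy to probe with a small Monte Carlo experiment. The sketch below assumes a hypothetical $2 \times 2$ noise covariance $C = [[2, 1], [1, 2]]$, whose eigenvectors are $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$, sets $f_n(\theta) = 0$ and all thresholds to zero, and compares half-spaces aligned with the eigenvectors against half-spaces aligned with the coordinate axes. The function names and the sample size are choices made for this illustration.

```python
import math
import random

random.seed(0)

S = 1.0 / math.sqrt(2.0)
EIGENVECTORS = ((S, S), (S, -S))            # eigenvectors of C = [[2, 1], [1, 2]]
COORDINATE_AXES = ((1.0, 0.0), (0.0, 1.0))  # a naive alternative choice

def sample_noise():
    """Draw w ~ N(0, C) through the Cholesky factor of C."""
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    return math.sqrt(2.0) * z1, S * z1 + math.sqrt(1.5) * z2

def dependence_gap(directions, num_samples=200_000):
    """Estimate Pr{b_a = 1, b_b = 1} - Pr{b_a = 1} Pr{b_b = 1}.

    Each bit indicates whether the projection of the noise onto one of the
    two given directions is non-negative (threshold at zero).
    """
    (a0, a1), (b0, b1) = directions
    count_a = count_b = count_ab = 0
    for _ in range(num_samples):
        w0, w1 = sample_noise()
        bit_a = a0 * w0 + a1 * w1 >= 0.0
        bit_b = b0 * w0 + b1 * w1 >= 0.0
        count_a += bit_a
        count_b += bit_b
        count_ab += bit_a and bit_b
    n = float(num_samples)
    return count_ab / n - (count_a / n) * (count_b / n)

gap_eigen = dependence_gap(EIGENVECTORS)
gap_axes = dependence_gap(COORDINATE_AXES)
print(round(gap_eigen, 3), round(gap_axes, 3))
```

With eigenvector-aligned half-spaces the gap should be statistically indistinguishable from zero, whereas thresholding the raw coordinates leaves a clearly positive gap (about 1/12 for this covariance, whose correlation coefficient is 1/2).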
Due to the independence among binary observations we have $p(b(n)) = \prod_{k=1}^{K} [q_k(n)]^{b_k(n)} [1 - q_k(n)]^{1 - b_k(n)}$, leading to

$$L(\theta) = \sum_{n=0}^{N-1} \sum_{k=1}^{K} b_k(n) \ln q_k(n) + [1 - b_k(n)] \ln[1 - q_k(n)], \tag{4.41}$$
whose $NK$ independent summands replace the $N 2^K$ dependent terms in (4.36). Since the regions $B_k(n)$ are half-spaces, Proposition 4.4 applies to the maximization of (4.41) and guarantees that the numerical search for the estimator $\hat{\theta}$ in (4.41) is well-conditioned and will converge to the global maximum, at least when the functions $f_n$ are linear. More important, it will turn out that these regions render finite-sample performance analysis of the MLE in (4.37) tractable. In particular, it is possible to derive a closed-form expression for the Fisher Information Matrix (FIM) [13, p. 44], as we establish next.

PROPOSITION 4.5. The FIM, $I$, for estimating $\theta$ based on the binary observations obtained from the regions defined in (4.39), is given by

$$I = \sum_{n=0}^{N-1} J_n^T \left[ \sum_{k=1}^{K} \frac{e^{-\Delta_k^2(n)}\, e_k(n)\, e_k^T(n)}{2\pi\, \sigma_k^2(n)\, Q(\Delta_k(n))\,[1 - Q(\Delta_k(n))]} \right] J_n, \tag{4.42}$$

where $J_n$ denotes the Jacobian of $f_n(\theta)$.

Inspection of (4.42) shows that the variance of the MLE in (4.37) depends on the signal function containing the parameter of interest (via the Jacobians), the noise structure and power (via the eigenvalues and eigenvectors), and the selection of the regions $B_k(n)$ (via the $\sigma$-distances). Among these three factors only the last one is inherent to the bandwidth constraint, the other two being common to the estimator that is based on the original $x(n)$ observations. The last point is clarified if we consider the FIM $I_x$ for estimating $\theta$ given the unquantized vector observations $x(n)$. This matrix can be shown to be (see [28, Apx. D]),
$$I_x = \sum_{n=0}^{N-1} J_n^T \left[ \sum_{m=1}^{M} \frac{e_m(n)\, e_m^T(n)}{\sigma_m^2(n)} \right] J_n. \tag{4.43}$$
If we define the equivalent noise powers as

$$\rho_k^2(n) := 2\pi\, Q(\Delta_k(n))\,[1 - Q(\Delta_k(n))]\, e^{\Delta_k^2(n)}\, \sigma_k^2(n), \tag{4.44}$$

we can rewrite (4.42) in the form

$$I = \sum_{n=0}^{N-1} J_n^T \left[ \sum_{k=1}^{K} \frac{e_k(n)\, e_k^T(n)}{\rho_k^2(n)} \right] J_n, \tag{4.45}$$

which except for the noise powers has form identical to (4.43). Thus, comparison of (4.45) with (4.43) reveals that, from a performance perspective, the use of binary observations is equivalent to an increase in the noise variance from $\sigma_k^2(n)$ to $\rho_k^2(n)$, while the rest of the problem structure remains unchanged. Since we certainly want the equivalent noise increase to be as small as possible, minimizing (4.44) over $\Delta_k(n)$ calls for this distance to be set to zero, or equivalently, to select thresholds $\tau_k(n) = e_k^T(n) f_n(\theta)$. In this case, the equivalent noise power is
$$\rho_k^2(n) = \frac{\pi}{2}\, \sigma_k^2(n). \tag{4.46}$$

Surprisingly, even in the vector case a judicious selection of the regions $B_k(n)$ results in a very small penalty ($\pi/2$) in terms of the equivalent noise increase. Similar to Sections 4.1.1 and 4.1.2, we can thus claim that while requiring the transmission of 1 bit per sensor per dimension, the MLE in (4.37), based on $\{b(n)\}_{n=0}^{N-1}$, yields a variance close to the clairvoyant estimator's variance, based on $\{x(n)\}_{n=0}^{N-1}$, for low-to-medium Q-SNR problems.

5. Simulations. In this section we provide numerical results for the distributed estimation schemes developed in Sections I and III.

5.1. Distributed dimensionality reduction. We first test the MMSE performance versus $k$ for the EC scheme and the estimator returned by Algorithm 1. To assess the difference in handling noise effects, we also compare EC and Algorithm 1 with the schemes in [38] and [37], which we abbreviate as C'E and C''E because they perform compression (C) followed by estimation (E). Although C'E and C''E have been derived under ideal link conditions, we modify them here to account for $D_i$. Our comparisons will further include an option we term CE, which first compresses the data and reconstructs them at the FC using $C^o$ and $B^o$ found by (2.10) after setting $s = x$, and then estimates $s$ based on the reconstructed data vector $\hat{x}$. For benchmarking purposes, we also plot $J_o$, achieved when estimating $s$ based on uncompressed data transmitted over ideal links.

Test Case 1 (EC with uncorrelated sensor data): We consider first the decoupled case of Section 3, where MMSE performance is characterized by the single sensor ($L = 1$) setup. Fig. 8 (Left) depicts the MMSE versus $k$ for $J_o$, EC, CE, C'E and C''E for a linear model $x = Hs + n$, where $N = 50$ and $p = 10$. The matrices $H$, $\Sigma_{ss}$ and $\Sigma_{nn}$ are selected randomly such that $\mathrm{tr}(H \Sigma_{ss} H^T)/\mathrm{tr}(\Sigma_{nn}) = 2$, while $s$ and $n$ are uncorrelated. We set $\Sigma_{zz} = \sigma_z^2 I_k$, and select $P$ such that $10 \log_{10}(P/\sigma_z^2) = 7$ dB.
As expected, $J_o$ benchmarks all curves, while the worst performance is exhibited by C'E. Albeit suboptimal, CE comes close to the optimal EC. The monotonic decrease of MMSE with $k$ for EC corroborates Corollary 2. Contrasting it with the increase C''E exhibits in MMSE beyond a certain $k$, we can appreciate the importance of coping with noise effects. This increase is justifiable since each entry of the compressed data in C''E is allocated a smaller portion of the given power as $k$ grows. In EC however, the quality of channel links and the available power determine the number of the compressed components (which might lie in a vector space of dimensionality $\kappa \le k$), and allocate power optimally among them.

FIG. 8. MMSE comparisons versus $k$ for a centralized, $L = 1$ (Left), and a distributed 3-sensor setup (Right).

Test Case 2 (Algorithm 1 with correlated sensor data): Here we consider a 3-sensor setup using the same linear model as in Test Case 1, while setting $N_1 = N_2 = 17$ and $N_3 = 16$. FC noise $z_i$ is white with variance $\sigma_{z_i}^2$. The power $P_i$ and variance $\sigma_{z_i}^2$ are chosen such that $10 \log_{10}(P_i/\sigma_{z_i}^2) = 13$ dB, for $i = 1, 2, 3$, and the tolerance quantity for Algorithm 1 is set to $\epsilon = 10^{-3}$. Fig. 8 (Right) depicts the MMSE as a function of the total number $k_{tot} = \sum_{i=1}^{3} k_i$ of compressed entries across sensors for: i) a centralized EC setup for which a single (virtual) sensor ($L = 1$) has available the data vectors of all three sensors; ii) the estimator returned by Algorithm 1; iii) the decoupled EC estimator which ignores sensor correlations; iv) the C'E; and v) an iterative estimator developed in [31], denoted here as EC-d, which similar to C'E accounts for fading but ignores noise. Interestingly, our decentralized Algorithm 1 comes very close to the hypothetical single-sensor bound of the centralized EC estimator, while outperforming the decoupled EC one. Also worth noting is that EC-d performs close to Algorithm 1 for small values of $k_{tot}$, but as $k_{tot}$ increases it behaves as badly as C'E.

5.2. Scalar parameter estimation - parametric approach. We begin by simulating the estimator in (4.13) for scalar parameter estimation in the presence of AWGN with unknown variance. Results are shown in Fig. 9 for two different sets of $\sigma$-distances, $\Delta_1$, $\Delta_2$, corroborating the values predicted by (4.14) and the fact that the performance loss with respect to the clairvoyant sample mean estimator, $\bar{x}$, is indeed small.
FIG. 9. Noise of unknown power estimator. The simulation corroborates the close to clairvoyant variance prediction of (4.14) ($\sigma = 1$, $\theta = 0$, Gaussian noise). [Plots of empirical and theoretical variance versus number of sensors.]

FIG. 10. The variances of the estimators in (4.4) and (4.32) are close to the sample mean estimator variance ($\sigma^2 := E[w^2(n)] = 1$, $T = 3$, $\theta \in [-1, 1]$). [Plot versus number of sensors; curves: empirical (Gaussian noise), empirical (uniform noise), variance bound.]
5.3. Scalar parameter estimation - unknown noise pdf. Fig. 10 depicts theoretical bounds and simulated variances for the estimators (4.4) and (4.32) for an example Q-SNR $\gamma = 4$. The sample mean estimator variance, $\mathrm{var}(\bar{x}) = \sigma^2/N$, is also depicted for comparison purposes. The simulations corroborate the implications of Remark 4.3, reinforcing the idea that for low-to-medium Q-SNR problems quantization to a single bit per observation leads to minimal losses in variance performance. Note that for this particular example the unknown pdf variance bound, (4.31), overestimates the variance by a factor of roughly 1.2 for the uniform case and roughly 2.6 for the Gaussian case.
FIG. 11. The vector flow $v$ incises over a certain sensor capable of measuring the normal component of $v$.
5.4. Vector parameter estimation - A motivating application. In this section, we illustrate how a problem involving vector parameters can be solved using the estimators of Section 4.3.1. Suppose we wish to estimate a vector flow using incidence observations. With reference to Fig. 11, consider the flow vector $v := (v_0, v_1)^T$, and a sensor positioned at an angle $\phi(n)$ with respect to a known reference direction. We will rely on a set of so-called incidence observations $\{x(n)\}_{n=0}^{N-1}$ measuring the component of the flow normal to the corresponding sensor

$$x(n) := \langle v, \mathbf{n} \rangle + w(n) = v_0 \sin[\phi(n)] + v_1 \cos[\phi(n)] + w(n), \tag{5.1}$$
where $\langle \cdot, \cdot \rangle$ denotes inner product, $w(n)$ is zero-mean AWGN, and $n = 0, 1, \ldots, N - 1$ is the sensor index. The model (5.1) applies to the measurement of hydraulic fields, pressure variations induced by wind and radiation from a distant source [20]. Estimating $v$ fits the framework of Section 4.3.1, requiring the transmission of a single binary observation per sensor, $b_1(n) = 1\{x(n) \ge \tau_1(n)\}$. The FIM in (4.45) is easily found to be

$$I = \sum_{n=0}^{N-1} \frac{1}{\rho_1^2(n)} \begin{pmatrix} \sin^2[\phi(n)] & \sin[\phi(n)] \cos[\phi(n)] \\ \sin[\phi(n)] \cos[\phi(n)] & \cos^2[\phi(n)] \end{pmatrix}. \tag{5.2}$$
Furthermore, since $x(n)$ in (5.1) is linear in $v$ and the noise pdf is log-concave (Gaussian), the log-likelihood function is concave as asserted by Proposition 4.4. Suppose that we are able to place the thresholds optimally as implied by $\tau_1(n) = v_0 \sin[\phi(n)] + v_1 \cos[\phi(n)]$, so that $\rho_1^2(n) = (\pi/2)\sigma^2$. If we also make the reasonable assumption that the angles are random and uniformly distributed, $\phi(n) \sim U[-\pi, \pi]$, then the average FIM turns out to be:

$$\bar{I} = \frac{2}{\pi \sigma^2} \begin{pmatrix} N/2 & 0 \\ 0 & N/2 \end{pmatrix}. \tag{5.3}$$
FIG. 12. Average variance for the components of $v$. The empirical variance as well as the bound (5.4) are compared with the analog observations based MLE ($v = (1, 1)$, $\sigma = 1$). [Plot versus number of sensors.]
But according to the law of large numbers $I \approx \bar{I}$, and the estimation variance will be approximately given by

$$\mathrm{var}(\hat{v}_0) = \mathrm{var}(\hat{v}_1) = \frac{\pi \sigma^2}{N}. \tag{5.4}$$
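The prediction (5.4) can be probed with a small simulation of the flow model (5.1). The sketch below assumes clairvoyant threshold placement $\tau_1(n) = v_0 \sin[\phi(n)] + v_1 \cos[\phi(n)]$, as supposed in the text, and the settings $v = (1, 1)$, $\sigma = 1$ of Fig. 12. The solver (Fisher scoring with a crude cap on the step length), the function names, the number of sensors and the number of trials are choices made for this illustration and are not taken from the chapter.

```python
import math
import random

def gaussian_tail(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_bit_flow_mle(directions, bits, thresholds, sigma, iterations=12):
    """Maximize the one-bit log-likelihood over v = (v0, v1) by Fisher scoring.

    directions holds (sin(phi), cos(phi)) for each sensor. Steps longer than
    1 are shortened, a simple safeguard added for this sketch.
    """
    v0 = v1 = 0.0
    for _ in range(iterations):
        g0 = g1 = a = b = d = 0.0
        for (s, c), bit, tau in zip(directions, bits, thresholds):
            delta = (tau - (v0 * s + v1 * c)) / sigma
            q = min(max(gaussian_tail(delta), 1e-12), 1.0 - 1e-12)
            pdf = math.exp(-0.5 * delta * delta) / math.sqrt(2.0 * math.pi)
            score = pdf / (sigma * q) if bit else -pdf / (sigma * (1.0 - q))
            g0 += score * s
            g1 += score * c
            weight = pdf * pdf / (sigma * sigma * q * (1.0 - q))
            a += weight * s * s      # accumulate the 2x2 FIM [[a, b], [b, d]]
            b += weight * s * c
            d += weight * c * c
        det = a * d - b * b
        step0 = (d * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        norm = math.hypot(step0, step1)
        scale = 1.0 / norm if norm > 1.0 else 1.0
        v0 += scale * step0
        v1 += scale * step1
    return v0, v1

def simulate(num_sensors=100, trials=300, v=(1.0, 1.0), sigma=1.0, seed=1):
    """Empirical per-component mean-square error divided by pi*sigma^2/N."""
    rng = random.Random(seed)
    sq_err = 0.0
    for _ in range(trials):
        angles = [rng.uniform(-math.pi, math.pi) for _ in range(num_sensors)]
        directions = [(math.sin(p), math.cos(p)) for p in angles]
        thresholds = [v[0] * s + v[1] * c for s, c in directions]
        bits = []
        for (s, c), tau in zip(directions, thresholds):
            x = v[0] * s + v[1] * c + rng.gauss(0.0, sigma)   # model (5.1)
            bits.append(1 if x >= tau else 0)
        est = one_bit_flow_mle(directions, bits, thresholds, sigma)
        sq_err += (est[0] - v[0]) ** 2 + (est[1] - v[1]) ** 2
    return (sq_err / (2 * trials)) / (math.pi * sigma ** 2 / num_sensors)

ratio = simulate()
print(round(ratio, 2))
```

A ratio near 1 indicates agreement with (5.4); some excess over 1 is to be expected at finite $N$, since (5.4) relies on replacing the FIM by its average.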
Fig. 12 depicts the bound (5.4), as well as the simulated variances $\mathrm{var}(\hat{v}_0)$ and $\mathrm{var}(\hat{v}_1)$ in comparison with the clairvoyant MLE based on $\{x(n)\}_{n=0}^{N-1}$, corroborating our analytical expressions.

6. Conclusions. We considered the problem of estimation in wireless sensor networks, showing that the seemingly unrelated problems of dimensionality reduction, compression, quantization and estimation are actually intertwined due to the distributed nature of the WSN.

We started by deriving algorithms for estimating stationary random signals based on reduced-dimensionality observations collected by power-limited wireless sensors linked with a fusion center. We dealt with non-ideal channel links that are characterized by multiplicative fading and additive noise. When data across sensors are uncorrelated, we established global mean-square error optimal schemes in closed form and proved that they implement estimation followed by compression per sensor. Besides distributed estimation with reduced-dimensionality decoupled observations, such closed-form solutions are valuable for all applications where principal components and canonical correlation analysis are sought in the presence of multiplicative and additive noise. For correlated sensor observations, we developed an algorithm that relies on block coordinate descent iterations which are guaranteed to converge at least to a local stationary point of the associated mean-square error cost. The optimal estimators properly allocate the prescribed power following a waterfilling-like principle to judiciously balance channel effects and additive noise at the fusion center with the degree of dimensionality reduction that can be afforded.

Continuing, with digital-amplitude data transmission we determined the distortion-rate (D-R) function for estimating a random vector in a single-sensor setup and established the optimality of the estimate-first compress-afterwards (EC) approach along with the suboptimality of a compress-first estimate-afterwards (CE) alternative. When it comes to estimation using multiple sensors, the corresponding D-R function can be bounded from below using the single-sensor D-R function achieved using the EC scheme. An alternating algorithm was also derived for determining numerically an achievable D-R upper bound in the distributed multi-sensor setup. Using this upper bound in combination with the non-achievable lower bound we obtained a tight region in which the D-R function for distributed estimation lies.

We finally developed parameter estimators for realistic signal models and derived their fundamental variance limits under severe bandwidth constraints. The latter were adhered to by quantizing each sensor's observation to one or a few bits. By jointly accounting for the unique quantization-estimation tradeoffs present, these bit(s) per sensor were first used to derive distributed maximum likelihood estimators (MLEs) for scalar mean-location parameters in the presence of generally non-Gaussian noise when the noise pdf is completely known; subsequently, when the pdf is known except for a number of unknown parameters; and finally, when the noise pdf is unknown. The unknown pdf case was tackled through a non-parametric estimator of the unknown complementary cumulative distribution function based on quantized (binary) observations. In all three cases, the resulting estimators turned out to exhibit comparable variances that can come surprisingly close to the variance of the clairvoyant estimator which relies on unquantized observations.
This happens when the SNR capturing both quantization and noise effects assumes low-to-moderate values. Analogous claims were established for practical generalizations that were pursued in the multivariate and colored noise cases for distributed estimation of vector parameters under bandwidth constraints. Therein, MLEs were formed via numerical search but the log-likelihoods were proved to be concave thus ensuring fast convergence to the unique global maximum.
REFERENCES
[1] M. ABDALLAH AND H. PAPADOPOULOS, Sequential signal encoding and estimation for distributed sensor networks, in Proc. of the International Conference on Acoustics, Speech, and Signal Processing, 4: 2577-2580, Salt Lake City, Utah, May 2001.
[2] B. BEFERULL-LOZANO, R.L. KONSBRUCK, AND M. VETTERLI, Rate-Distortion problem for physics based distributed sensing, in Proc. of the International Conference on Acoustics, Speech, and Signal Processing, 3: 913-916, Montreal, Canada, May 2004.
[3] T. BERGER, Rate Distortion Theory: A Mathematical Basis for Data Compression. Prentice Hall, 1971.
[4] T. BERGER, Multiterminal Source Coding, in Lectures Presented at CISM Summer School on the Info. Theory Approach to Comm., July 1977.
[5] D. BLATT AND A. HERO, Distributed maximum likelihood estimation for sensor networks, in Proc. of the International Conference on Acoustics, Speech, and Signal Processing, 3: 929-932, Montreal, Canada, May 2004.
[6] S. BOYD AND L. VANDENBERGHE, Convex Optimization. Cambridge University Press, 2004.
[7] J. CHEN, X. ZHANG, T. BERGER, AND S.B. WICKER, An Upper Bound on the Sum-Rate Distortion Function and Its Corresponding Rate Allocation Schemes for the CEO Problem, IEEE Journal on Selected Areas in Communications, pp. 406-411, August 2004.
[8] T. COVER AND J. THOMAS, Elements of Information Theory. John Wiley and Sons, 2nd ed., 1991.
[9] E. ERTIN, R. MOSES, AND L. POTTER, Network parameter estimation with detection failures, in Proc. of the Intnl. Conference on Acoustics, Speech, and Signal Processing, 2: 273-276, Montreal, Canada, May 2004.
[10] M. GASTPAR, P.L. DRAGOTTI, AND M. VETTERLI, The distributed Karhunen-Loève transform, IEEE Transactions on Information Theory, submitted Nov. 2004 (available at http://www.eecs.berkeley.edu/~gastpar/).
[11] J. GUBNER, Distributed Estimation and Quantization, IEEE Transactions on Information Theory, 39: 1456-1459, 1993.
[12] P. ISHWAR, R. PURI, K. RAMCHANDRAN, AND S. PRADHAN, On Rate-Constrained Distributed Estimation in Unreliable Sensor Networks, IEEE Journal on Selected Areas in Communications, pp. 765-775, April 2005.
[13] S.M. KAY, Fundamentals of Statistical Signal Processing - Estimation Theory. Prentice Hall, 1993.
[14] S. KUMAR, F. ZHAO, AND D. SHEPHERD, eds., Special issue on collaborative information processing, Vol. 19 of IEEE Signal Proc. Magazine, March 2002.
[15] W. LAM AND A. REIBMAN, Quantizer design for decentralized systems with communication constraints, IEEE Transactions on Communications, 41: 1602-1605, Aug. 1993.
[16] Z.-Q. LUO, An isotropic universal decentralized estimation scheme for a bandwidth constrained ad hoc sensor network, IEEE Journal on Selected Areas in Communications, 23: 735-744, April 2005.
[17] Z.-Q. LUO, Universal Decentralized Estimation in a Bandwidth Constrained Sensor Network, IEEE Transactions on Information Theory, 51: 2210-2219, June 2005.
[18] Z.-Q. LUO, G.B. GIANNAKIS, AND S. ZHANG, Optimal linear decentralized estimation in a bandwidth constrained sensor network, in Proc. of the Intl. Symp. on Info. Theory, pp. 1441-1445, Adelaide, Australia, Sept. 4-9 2005.
[19] Z.-Q. LUO AND J.-J. XIAO, Decentralized estimation in an inhomogeneous sensing environment, IEEE Transactions on Information Theory, 51: 3564-3575, October 2005.
[20] A. MAINWARING, D. CULLER, J. POLASTRE, R. SZEWCZYK, AND J. ANDERSON, Wireless sensor networks for habitat monitoring, in Proc. of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, 3: 88-97, Atlanta, Georgia, 2002.
[21] R.D. NOWAK, Distributed EM algorithms for density estimation and clustering in sensor networks, IEEE Transactions on Signal Processing, 51: 2245-2253, August 2002.
[22] Y. OOHAMA, The Rate-Distortion Function for the Quadratic Gaussian CEO Problem, IEEE Transactions on Information Theory, pp. 1057-1070, May 1998.
[23] A. PANDYA, A. KANSAL, G. POTTIE, AND M. SRIVASTAVA, Fidelity and Resource Sensitive Data Gathering, in Proc. of the 42nd Allerton Conference, Allerton, IL, September 2004.
[24] H. PAPADOPOULOS, G. WORNELL, AND A. OPPENHEIM, Sequential signal encoding from noisy measurements using quantizers with dynamic bias control, IEEE Transactions on Information Theory, 47: 978-1002, 2001.
[25] S.S. PRADHAN, J. KUSUMA, AND K. RAMCHANDRAN, Distributed compression in a dense microsensor network, IEEE Signal Processing Magazine, 19: 51-60, March 2002.
[26] M.G. RABBAT AND R.D. NOWAK, Decentralized source localization and tracking, in Proc. of the 2004 IEEE Intnl. Conference on Acoustics, Speech, and Signal Processing, 3: 921-924, Montreal, Canada, May 2004.
[27] A. RIBEIRO AND G.B. GIANNAKIS, Bandwidth-Constrained Distributed Estimation for Wireless Sensor Networks, Part I: Gaussian Case, IEEE Transactions on Signal Processing, 54: 1131-1143, March 2006.
[28] A. RIBEIRO AND G.B. GIANNAKIS, Bandwidth-Constrained Distributed Estimation for Wireless Sensor Networks, Part II: Unknown pdf, IEEE Transactions on Signal Processing, 2006, to appear.
[29] D.J. SAKRISON, Source encoding in the presence of random disturbance, IEEE Transactions on Information Theory, pp. 165-167, January 1968.
[30] I.D. SCHIZAS, G.B. GIANNAKIS, AND N. JINDAL, Distortion-Rate Analysis for Distributed Estimation with Wireless Sensor Networks, IEEE Transactions on Information Theory, submitted December 2005 (available at http://spincom.ece.umn.edu/).
[31] I.D. SCHIZAS, G.B. GIANNAKIS, AND Z.-Q. LUO, Distributed estimation using reduced dimensionality sensor observations, IEEE Transactions on Signal Processing, submitted November 2005 (available at http://spincom.ece.umn.edu/).
[32] Y. SUNG, L. TONG, AND A. SWAMI, Asymptotic locally optimal detector for large-scale sensor networks under the Poisson regime, in Proc. of the International Conference on Acoustics, Speech, and Signal Processing, 2: 1077-1080, Montreal, Canada, May 2004.
[33] P.K. VARSHNEY, Distributed Detection and Data Fusion. Springer-Verlag, 1997.
[34] H. VISWANATHAN AND T. BERGER, The Quadratic Gaussian CEO Problem, IEEE Transactions on Information Theory, pp. 1549-1559, September 1997.
[35] J. WOLF AND J. ZIV, Transmission of noisy information to a noisy receiver with minimum distortion, IEEE Transactions on Information Theory, pp. 406-411, July 1970.
[36] A. WYNER AND J. ZIV, The Rate-Distortion Function for Source Coding with Side Information at the Decoder, IEEE Trans. on Info. Theory, pp. 1-10, January 1976.
[37] K. ZHANG, X.R. LI, P. ZHANG, AND H. LI, Optimal linear estimation fusion - Part VI: Sensor data compression, in Proc. of the Intl. Conf. on Info. Fusion, pp. 221-228, Queensland, Australia, 2003.
[38] Y. ZHU, E. SONG, J. ZHOU, AND Z. YOU, Optimal dimensionality reduction of sensor data in multisensor estimation fusion, IEEE Transactions on Signal Processing, 53: 1631-1639, May 2005.
FAIR ALLOCATION OF A WIRELESS FADING CHANNEL: AN AUCTION APPROACH

JUN SUN* AND EYTAN MODIANO*

Abstract. We study the use of an auction algorithm in allocating a wireless fading channel among a set of non-cooperating users in both downlink and uplink communication scenarios. For the downlink case, we develop a novel auction-based algorithm to allow users to fairly compete for a wireless fading channel. We use the all-pay auction mechanism whereby users bid for the channel, during each time-slot, based on the fade state of the channel, and the user that makes the higher bid wins use of the channel. Under the assumption that each user has a limited budget for bidding, we show the existence of a unique Nash equilibrium strategy. We show that the strategy achieves a throughput allocation for each user that is proportional to the user's budget and establish that the aggregate throughput received by the users using the Nash equilibrium strategy is at least 3/4 of what can be obtained using an optimal centralized allocation scheme that does not take fairness into account. For the uplink case, we present a game-theoretical model of a wireless communication system with multiple competing users sharing a multiaccess fading channel. With a specified capture rule and a limited amount of energy available, a user opportunistically adjusts its transmission power based on its own channel state to maximize its own individual throughput. We derive an explicit form of the Nash equilibrium power allocation strategy. Furthermore, this Nash equilibrium power allocation strategy is unique under certain capture rules. We also quantify the loss of efficiency in throughput due to users' selfish behavior. Moreover, as the number of users in the system increases, the total system throughput obtained by using a Nash equilibrium strategy approaches the maximum attainable throughput.

Key words. Stochastic processes, Mathematical programming/optimization.

AMS(MOS) subject classifications. Primary 91A80, 91A10, 93E03, 93A14.
*Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139. The work was supported by NASA Space Communication Project grant number NAG3-2835.

1. Introduction. The limited bandwidth and high demand in a communication network necessitate a systematic procedure for fair allocation. This is where the economic theory of pricing and auctions can be applied in the field of communications and networks research, for pricing and auctions are natural ways to allocate resources with limited supply. Recently, in the networks area, much work has been done to address the allocation of a limited resource in a complex, large-scale system such as the internet. These works approach the problem from a classical economic perspective where users have utility functions and cost functions, both measured in the same monetary unit. Pricing is used as a tool to balance users' demand for bandwidth.

Here, we are interested in solving a specific engineering problem of scheduling transmission among a set of users while achieving fairness in a specific wireless environment. We use game theoretical concepts such as Nash equilibrium as a tool for modelling the interaction among users. Both the objective and the constraint of the optimization problem that each user faces have physical meanings based on the underlying system. Our focus in this paper will be on the use of the all-pay auction in allocating a wireless fading channel for both the uplink and the downlink.

A fundamental characteristic of a wireless network is that the channel over which communication takes place is often time-varying. This variation of the channel quality is due to constructive and destructive interference between multipaths and shadowing effects (fading). In a single cell with one transmitter (base station or satellite) and multiple users communicating through time-varying fading channels, the transmitter can send data at higher rates to users with better channels. In a time-slotted system such as the HDR system, time slots are allocated among users according to their channel qualities.

The problem of resource allocation in wireless networks has received much attention in recent years. In [1] the authors try to maximize the data throughput of an energy and time constrained transmitter sending over a fading channel. A dynamic programming formulation that leads to an optimal transmission schedule is presented. Other works that address a similar problem, without consideration of fairness, include [7] and [8]. In [5], the authors consider scheduling policies for maxmin fair allocation of bandwidth, which maximizes the allocation for the most poorly treated sessions while not wasting any network resources, in wireless ad-hoc networks. In [4], the authors designed a scheduling algorithm that achieves proportional fairness, a notion of fairness originally proposed by Kelly [6].
In [9], the authors present a slot allocation scheme that maximizes expected system performance subject to the constraint that each user gets a fixed fraction of time slots. The authors do not use a formal notion of fairness, but argue that their system can explicitly set the fraction of time assigned to each user. Hence, while each user may get to use the channel an equal fraction of the time, the resulting throughput obtained by each user may be vastly different.

The following simple example illustrates the different allocations that may result from the different notions of fairness. We consider a communication system with one transmitter and two users, A and B, and the allocations that result from the notions of fairness discussed in the previous paragraph. We assume that the throughput is proportional to the channel condition. The channel coefficient is a quantitative measure of the channel condition ranging from 0 to 1, with 1 as the best channel condition. For user A and user B, the channel coefficients in the two time slots are (0.1, 0.2) and (0.3, 0.9) respectively. The throughput for each individual user and for the total system under the different fairness constraints is given in Table 1.

TABLE 1
Throughput results using different notions of fairness.

                       Throughput    Throughput    Total
                       for user A    for user B    throughput
No fair constraint     0             1.2           1.2
Maxmin fair            0.2           0.3           0.5
Time fraction          0.1           0.9           1.0

When there is no fairness constraint, maximizing the total system throughput requires the transmitter to allocate both time slots to user B. To achieve maxmin fair allocation, the transmitter would allocate slot one to user B and slot two to user A, resulting in a total throughput of 0.5. If the transmitter wants to maximize the total throughput subject to the constraint that each user gets one time slot (i.e., the approach of [9]), the resulting allocation, denoted as time fraction fair, is to give user A slot one and user B slot two. As a result, the total throughput is 1.0.

In the above example, the transmitter selects an allocation to ensure an artificially chosen notion of fairness. From Table 1, we can see that from the users' perspective, no notion is truly fair, as both users want slot two. In order to resolve this conflict, we use a new approach which allows users to compete for time slots. In this way, each user is responsible for its own action and its resulting throughput. We call the fraction of bandwidth received by each user competitive fair. Using this notion of competitive fairness, the resulting throughput obtained for each user can serve as a reference point for comparing various other allocations. Moreover, the competitive fair allocation scheme can provide fundamental insight into the design of a fair scheduler that makes sense.

In our model, users compete for time slots. For each time slot, each user has a different valuation (i.e., its own channel condition), and each user is only interested in getting a higher throughput for itself. Naturally, these characteristics give rise to an auction. In this paper we consider the all-pay auction mechanism. Using the all-pay auction mechanism, users submit a "bid" for the time slot and the transmitter allocates the slot to the user that made the highest bid. Moreover, in the all-pay auction mechanism, the transmitter gets to keep the bids of all users (regardless of whether or not they win the auction). Each user is assumed to have an initial amount of money. The money possessed by each user can be viewed as fictitious money that serves as a mechanism to differentiate the QoS given to the various users.
This fictitious money, in fact, could correspond to a certain QoS for which the user paid in real money. As for the solution of the slot auction game, we use the concept of Nash equilibrium, which is a set of strategies (one for each player) from which there is no profitable unilateral deviation.

In the downlink communication scenario, we consider a communication system with one transmitter and two users. For each time slot, channel states are independent and identically distributed with a known probability distribution. Each user wants to maximize its own expected total throughput subject to an average money constraint. We have the following main results for the downlink case:
• We find a unique Nash equilibrium when both channel states are uniformly distributed over [0, 1].
• We show that the Nash equilibrium strategy pair provides an allocation scheme that is fair in the sense that the price per unit of throughput is the same for both users.
• We show that the Nash equilibrium strategy of this auction leads to an allocation at which total throughput is no worse than 3/4 of the throughput obtained by an algorithm that attempts to maximize total system throughput without a fairness constraint.
• We provide an estimation algorithm that enables users to accurately estimate the amount of money possessed by their opponent, so that users do not need prior knowledge of each other's money.

The all-pay auction can be used to model uplink power allocation as well. In the second part of this paper, we present a distributed uplink power allocation scheme that is based on the all-pay auction. Specifically, we consider a communication system consisting of multiple users competing to access a satellite or a base station. Each user has an average power constraint. Time is slotted. During each time slot, each user chooses a power level for transmission based on the channel state of the current slot, which is known only to itself. Depending on the capture model and the received power of that user's signal, a transmitted packet may be captured even if multiple users are transmitting in the same slot. If the objective of each user in the system is to find a power allocation strategy that maximizes its probability of being captured subject to its average power constraint, we have a power allocation game that resembles the all-pay auction.
Compared with the all-pay auction, the average power constraint in the power allocation game corresponds to the average money constraint, and transmission power corresponds to money. Both power and money are spent once a transmission or a bid has taken place. In this uplink scenario, using the technique that solves for the Nash equilibrium in the all-pay auction, we get a similar Nash equilibrium strategy in the uplink game.

The game-theoretic formulation of the uplink power allocation problem stems from the desire for a distributed algorithm in a wireless uplink. Due to the variation of channel quality in a fading channel, one can exploit the channel variation opportunistically by allowing the user with the best channel condition to transmit, which requires the presence of a centralized scheduler that knows each user's channel condition. As the number of users in the network increases, the delay in conveying users' channel conditions to the scheduler will limit the system's performance. Hence, a distributed multi-access scheme with no centralized scheduler becomes an attractive alternative. However, in a distributed environment, users may want to change their communication protocols in order to improve their own performance, making it impossible to ensure that a particular algorithm will be adopted by all users in the network. Rather than following some mandated algorithm, in this paper users are assumed to act selfishly (i.e., choose their own power allocation strategies) to further their own individual interests.

With each user wanting to maximize its own expected throughput, we obtain a Nash equilibrium power allocation strategy which determines the optimal transmission power control strategy for each user. The obtained optimal power control strategy specifies how much power a user needs to use to maximize its own throughput for any possible channel state. Users get different average throughput based on their average power constraint. Hence, this transmission scheme can be viewed as a mechanism for providing quality of service (QoS) differentiation, whereby users are given different energy for transmission.

The obtained Nash equilibrium power allocation strategy is unique under a certain capture rule. When all users have the same energy constraint, we obtain a symmetric Nash equilibrium. Due to the selfish behavior of individual users, the overall system throughput will be less than that of a system where users employ the same mandated algorithm. This loss in efficiency is also quantified. In the multiple-user case, as the number of users in the system increases, the symmetric Nash equilibrium strategy approaches the optimal algorithm specified by a system designer (i.e., the algorithm that results in the largest total system throughput). In this case, there is no loss of efficiency when users employ the symmetric Nash equilibrium.

Game-theoretic approaches to resource allocation problems have been explored by many researchers recently (e.g., [2], [19]). In [2], the authors consider a resource allocation problem for a wireless channel, without fading, where users have different utility values for the channel.
They show the existence of an equilibrium pricing scheme where the transmitter attempts to maximize its revenue and the users attempt to maximize their individual utilities. In [19], the authors explore the properties of a congestion game where users of a congested resource anticipate the effect of their actions on the price of the resource. Again, the work of [19] focuses on a wireline channel without the notion of wireless fading. Our work attempts to apply game theory to the allocation of a wireless fading channel. In particular, we show that auction algorithms are well suited for achieving fair allocation in this environment. Other papers dealing with the application of game theory to resource allocation problems include [3], [23], and [24].

This paper is organized as follows. In Section 2, we describe the downlink communication system and the Nash equilibrium bidding strategy. Section 2.1 presents the problem formulation for the downlink case. In Section 2.2, the unique Nash equilibrium strategy pair and the resulting throughput for each user are provided for the case in which each user can use only one bidding function. In Section 2.3, we show the unique Nash equilibrium strategy pair for the case in which each user can use multiple bidding functions. In Section 2.4, we compare the throughput results of the Nash equilibrium strategy with two other centralized allocation algorithms. In Section 2.5, an estimation algorithm that enables the users to estimate the amount of money possessed by their opponent is developed. Section 3 presents the Nash equilibrium power allocation function for an uplink random access system. Section 3.1 describes the uplink communication scenario. In Section 3.2, the Nash equilibrium power allocation strategy is obtained for the two-user case. In Section 3.3, we present a symmetric Nash equilibrium power allocation function for multiple users with the same average power constraint. Finally, Section 4 concludes the paper.
2. Downlink transmission.
2.1. Downlink problem formulation. We consider a communication environment with a single transmitter sending data to two users over two different fading channels. We assume that there is always data to be sent to the users. Time is assumed to be discrete, and the channel state for a given channel changes according to a known probabilistic model independently over time. The two channels are also assumed to be independent of each other. The transmitter can transmit to only one user during a particular slot with a constant power $P$. The channel fade state thus determines the throughput that can be obtained.

For a given power level, we assume for simplicity that the throughput is a linear function of the channel state. This can be justified by the Shannon capacity at low signal-to-noise ratio, or by using a fixed modulation scheme [1]. The method used in this paper applies to a general throughput function as well. Let $X_i$ be a random variable denoting the channel state for the channel between the transmitter and user $i$, $i = 1, 2$. When transmitting to user $i$, the throughput will then be $P \cdot X_i$. Without loss of generality, we assume $P = 1$ throughout this paper.

We now describe the all-pay auction rule used in this paper. Let $\alpha$ and $\beta$ be the average amount of money available to user 1 and user 2 respectively during each time slot. We assume that the values of $\alpha$ and $\beta$ are known to both users. Both users know the distributions of $X_1$ and $X_2$. We also assume that the exact value of the channel state $X_i$ is revealed to user $i$ only at the beginning of each time slot. During each time slot, the following actions take place:
1. Each user submits a bid according to the channel condition revealed to it.
2. The transmitter chooses the user with the higher bid to transmit.
3. Once a bid is submitted by a user, it is taken by the transmitter regardless of whether the user gets the slot or not, i.e., there is no refund for the user who loses the bid.
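The slot-by-slot rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simulate_all_pay`, the linear bidding functions in the usage example, and the tie rule (a tie leaves the slot unused) are hypothetical choices that the text does not specify. Channel states are drawn uniformly on [0, 1] and throughput equals the channel state (P = 1).

```python
import random


def simulate_all_pay(f1, f2, n_slots=200_000, seed=0):
    """Monte Carlo run of the two-user all-pay slot auction.

    f1, f2 map a channel state in [0, 1] to a bid.
    Returns per-slot averages: (throughput_1, throughput_2, spend_1, spend_2).
    """
    rng = random.Random(seed)
    thr1 = thr2 = spend1 = spend2 = 0.0
    for _ in range(n_slots):
        x1, x2 = rng.random(), rng.random()  # each user sees only its own state
        b1, b2 = f1(x1), f2(x2)              # step 1: both users submit bids
        spend1 += b1                         # step 3: all-pay, no refund
        spend2 += b2
        if b1 > b2:                          # step 2: higher bid gets the slot
            thr1 += x1
        elif b2 > b1:
            thr2 += x2
        # Assumed tie rule: the slot goes unused (ties have probability zero here).
    n = float(n_slots)
    return thr1 / n, thr2 / n, spend1 / n, spend2 / n


if __name__ == "__main__":
    # Hypothetical linear bids with E[f1(X1)] = 0.5 and E[f2(X2)] = 1.0.
    g1, g2, a, b = simulate_all_pay(lambda x: x, lambda x: 2.0 * x)
    print(f"throughput: {g1:.3f}, {g2:.3f}; average spend: {a:.3f}, {b:.3f}")
```

With these particular bids, user 1 wins only when its channel state exceeds twice that of user 2, so its average throughput should be close to 1/6 while both users spend their full average budgets.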
The formulation of our auction differs from the type of auction used in economic theory in several ways. First, we look at a case where the number of objects in the auction goes to infinity, whereas in current auction research the number of objects is finite [20][21][22]. Second, in our auction formulation, the money used for bidding does not have a direct connection with the value of the time slot. Money is merely a tool for users to compete for time slots, and it has no value after the auction. Therefore, it is desirable for each user to spend all of its money. In auction theory, however, an object's value is measured in the same unit as the money used in the bidding process, hence a bidder's objective is to maximize the difference between the object's value and its cost. Lastly, in our formulation, the valuation of each commodity (time slot) changes due to the fading channel model, a notion that is not common in economic theory.

Besides the all-pay auction, the first-price auction and the second-price auction are two other commonly used auction mechanisms. In the first-price auction, each bidder submits a single bid without seeing the others' bids, and the object is sold to the bidder who makes the highest bid. The winner pays its bid. In the second-price auction, each bidder independently submits a single bid without seeing the others' bids, and the object is sold to the bidder who makes the highest bid. However, the price it pays is the second-highest bidder's bid [20]. We choose to use the all-pay auction in this paper to illustrate the auction approach to resource allocation in wireless networks. We believe that other auction mechanisms can be similarly applied, and their application to the wireless channel allocation problem is a direction for future research.

The objective for each user is to design a bidding strategy, which specifies how a user will act in every possible distinguishable circumstance, to maximize its own expected throughput per time slot subject to the expected or average money constraint. Once a user, say user 1, chooses a function, say $f_1^{(i)}$, for its strategy in the $i$th slot, it bids an amount of money equal to $f_1^{(i)}(x)$ when it sees that its channel condition in the $i$th slot is $X_1 = x$. Formally, let $F_1$ and $F_2$ be the sets of continuous and bounded real-valued functions with finite first and second derivatives over the supports of $X_1$ and $X_2$ respectively. Then the strategy spaces for user 1, say $\mathcal{S}_1$, and user 2, say $\mathcal{S}_2$, are defined as follows:

$$\mathcal{S}_1 = \Big\{ f_1^{(1)}, \dots, f_1^{(n)} \in F_1 \;\Big|\; \frac{1}{n} \sum_{i=1}^{n} E\big[f_1^{(i)}(X_1)\big] = \alpha \Big\}, \qquad \mathcal{S}_2 = \Big\{ f_2^{(1)}, \dots, f_2^{(n)} \in F_2 \;\Big|\; \frac{1}{n} \sum_{i=1}^{n} E\big[f_2^{(i)}(X_2)\big] = \beta \Big\}. \qquad (2.1)$$
For each user, a strategy is a sequence of bidding functions $f^{(1)}, \dots, f^{(n)}$. Without loss of generality, we restrict each user to have $n$ different bidding functions, where $n$ can be chosen arbitrarily large. Note that users choose a strategy for a block of $n$ time slots instead of just for a single time slot, with one bidding function for each slot. In order to maximize the overall throughput (over an infinite horizon), each user chooses bidding functions to maximize the expected total throughput over this block of $n$ slots. The term $E[f_1^{(i)}(X_1)]$ denotes the expected amount of money spent by user 1 if it uses bidding function $f_1^{(i)}$ for the $i$th slot in the block.

We first consider a special class of strategies in which each user can use only a single bidding function. More specifically, by setting $f_1 = f_1^{(1)} = \dots = f_1^{(n)}$ and $f_2 = f_2^{(1)} = \dots = f_2^{(n)}$, we have the following:

$$S_1 = \{ f_1 \in F_1 \mid E[f_1(X_1)] = \alpha \}, \qquad S_2 = \{ f_2 \in F_2 \mid E[f_2(X_2)] = \beta \}. \qquad (2.2)$$
By first considering the set of strategies in $S_1$ and $S_2$, we are able to find the Nash equilibrium strategy pair within the sets $\mathcal{S}_1$ and $\mathcal{S}_2$. Given a strategy pair $(f_1, f_2)$, where $f_1 \in S_1$ and $f_2 \in S_2$, the expected throughput or payoff function for user 1, assuming the constant power $P = 1$, is defined as

$$G_1(f_1, f_2) = E\big[ X_1 \cdot 1_{\{f_2(X_2) < f_1(X_1)\}} \big], \qquad (2.3)$$

where $1_{\{\cdot\}}$ denotes the indicator function of the event in braces, i.e., the event that user 1 wins the slot. Similarly, the throughput function for user 2, assuming $P = 1$, is

$$G_2(f_1, f_2) = E\big[ X_2 \cdot 1_{\{f_1(X_1) < f_2(X_2)\}} \big]. \qquad (2.4)$$

Throughout this paper, for simplicity, we let the channel state $X_i$ be uniformly distributed over [0, 1]. However, our approach can be extended to the case where the channel state has a general distribution. Due to space limitations, we omit the more complex analysis for a general channel state distribution.
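As a quick check of the payoff definitions, consider the special case in which both users happen to use the same strictly increasing bidding function (which requires $\alpha = \beta$). This case is chosen here only for illustration. User 1 then wins a slot exactly when $X_1 > X_2$, and with uniform channel states its expected throughput can be worked out by hand:

```latex
\begin{align*}
G_1 &= E\big[X_1 \cdot 1_{\{X_1 > X_2\}}\big]
     = \int_0^1 x_1 \, P(X_2 < x_1)\, dx_1
     = \int_0^1 x_1^2 \, dx_1 = \tfrac{1}{3}, \\
G_2 &= \tfrac{1}{3} \quad \text{(by symmetry)}, \qquad
G_1 + G_2 = \tfrac{2}{3} = E[\max\{X_1, X_2\}].
\end{align*}
```

In this symmetric case the slot always goes to the user with the better channel, so the auction loses no throughput relative to an unconstrained scheduler.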
2.2. Unique Nash equilibrium strategy with a single bidding function. We present in this section a unique Nash equilibrium strategy pair $(f_1^*, f_2^*)$. A strategy pair $(f_1^*, f_2^*)$ is said to be in Nash equilibrium if $f_1^*$ is the best response for user 1 to user 2's strategy $f_2^*$, and $f_2^*$ is the best response for user 2 to user 1's strategy $f_1^*$. We consider here the case where both users choose their strategies from the strategy spaces $S_1$ and $S_2$ (i.e., the single bidding function strategy) and the values of $\alpha$ and $\beta$ are known to both users.

To get the Nash equilibrium strategy pair, we first argue that an equilibrium bidding function must be nondecreasing. To see this, consider an arbitrary bidding function $f$ such that $f(a) > f(b)$ for some $a < b$. If user 1 chooses $f$ as its bidding function, user 1 will be better off if it bids $f(b)$ when the channel state is $a$ and $f(a)$ when the channel state is $b$. This way, its odds of winning the slot when the channel state is $b$, which is more valuable to it, will be higher than before, and it has an incentive to change its strategy (i.e., $f$ is not an equilibrium strategy). Hence, we conclude that, for each user, an equilibrium bidding function must be nondecreasing. We further restrict users' bidding functions to be strictly increasing for a technical reason which will be explained later. There is no loss of generality in this assumption because any continuous, bounded, nondecreasing function can be approximated arbitrarily closely by a strictly increasing function. Next, we show some useful properties associated with the equilibrium strategy pair $(f_1^*, f_2^*)$.

Lemma 1. If $(f_1^*, f_2^*)$ is a Nash equilibrium strategy pair, then $f_1^*(1) = f_2^*(1)$.

Proof. Suppose $f_1^*(1) \neq f_2^*(1)$. Without loss of generality, assume that $f_1^*(1) > f_2^*(1)$. Since both $f_1^*$ and $f_2^*$ are continuous, there exists $\delta > 0$ such that $f_1^*(x) > f_2^*(1) + \frac{f_1^*(1) - f_2^*(1)}{2}$ for all $x \in [1 - \delta, 1]$. User 1 can devise a new bidding strategy, say $\hat{f}_1$, by moving a small amount of money, say $\delta \cdot \frac{f_1^*(1) - f_2^*(1)}{2}$, away from the interval $[1 - \delta, 1]$ to some other interval, thus increasing user 1's throughput. Therefore, when $f_1^*(1) > f_2^*(1)$, the bidding strategy pair $(f_1^*, f_2^*)$ cannot be in equilibrium since the strategy pair $(\hat{f}_1, f_2^*)$ gives a higher throughput for user 1. A similar result holds for the case $f_2^*(1) > f_1^*(1)$. Thus, we must have $f_1^*(1) = f_2^*(1)$ if $(f_1^*, f_2^*)$ is an equilibrium strategy pair. □

We have just established that $f_1^*(1) = f_2^*(1)$ is a necessary condition for $(f_1^*, f_2^*)$ to be an equilibrium strategy pair. We also find that $f_1^*(0) = f_2^*(0) = 0$, since it does not make sense to bid for a slot with zero channel state. Thus, from now on, to find the Nash equilibrium strategy pair $(f_1^*, f_2^*)$, we will consider only function pairs $f_1 \in S_1$ and $f_2 \in S_2$ that are strictly increasing and satisfy the above two boundary conditions (i.e., $f_1(1) = f_2(1)$ and $f_1(0) = f_2(0) = 0$). These two boundary conditions, together with the strictly increasing property of $f_1 \in S_1$ and $f_2 \in S_2$, make the inverses of $f_1$ and $f_2$ well defined. Thus, we are able to define the following terms. With user 2's strategy $f_2$ fixed, let $g^{(1)}_{f_2} : (x_1, b) \to \mathbb{R}$ denote user 1's expected throughput in a slot conditioned on the following events:
• User 1's channel state is $X_1 = x_1$.
• User 1's bid is $b$.
Specifically, we can write the equation:
$$g^{(1)}_{f_2}(x_1, b) = x_1 \cdot P(f_2(X_2) < b), \qquad (2.5)$$

where $P(f_2(X_2) < b)$ is the probability that user 1 wins the time slot.
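Consequently, using a strategy $f_1$, user 1's throughput is given by:

$$G_1(f_1, f_2) = \int_0^1 g^{(1)}_{f_2}(x_1, f_1(x_1))\, p_{X_1}(x_1)\, dx_1 = \int_0^1 g^{(1)}_{f_2}(x_1, f_1(x_1))\, dx_1, \qquad (2.6)$$

where $p_{X_1}$ denotes the density of $X_1$ and the last equality results from the uniform distribution assumption. With user 1's strategy $f_1$ fixed, similar terms for user 2 can be defined. Then, user 2's throughput is given by:

$$G_2(f_1, f_2) = \int_0^1 g^{(2)}_{f_1}(x_2, f_2(x_2))\, dx_2. \qquad (2.7)$$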
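Due to the uniformly distributed channel state, $P(f_2(X_2) \le b)$ is given by

$$P(f_2(X_2) \le b) = P(X_2 \le f_2^{-1}(b)) = f_2^{-1}(b),$$

where $f_2^{-1}$ is well defined. Thus, we can rewrite Eq. (2.5) as

$$g^{(1)}_{f_2}(x_1, b) = x_1 \cdot f_2^{-1}(b).$$

Hence we have,

$$G_1(\alpha, \beta) = \int_0^1 x_1 \cdot f_2^{-1}(f_1(x_1))\, dx_1 \qquad (2.8)$$

$$G_2(\alpha, \beta) = \int_0^1 x_2 \cdot f_1^{-1}(f_2(x_2))\, dx_2. \qquad (2.9)$$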
The following lemma gives a necessary and sufficient condition for a Nash equilibrium strategy pair. For convenience, we denote $\frac{\partial g^{(1)}_{f_2}(x_1, b)}{\partial b}\big|_{b = b^*}$ (i.e., the marginal gain at $b = b^*$) as $Dg^{(1)}_{f_2}(x_1, b^*)$.

Lemma 2. A strategy pair $(f_1^*, f_2^*)$ is a Nash equilibrium strategy pair if and only if $Dg^{(1)}_{f_2^*}(x_1, f_1^*(x_1)) = c_1$ and $Dg^{(2)}_{f_1^*}(x_2, f_2^*(x_2)) = c_2$, for some constants $c_1$ and $c_2$, for all $x_1 \in [0, 1]$ and all $x_2 \in [0, 1]$.

To understand the lemma intuitively, suppose there exist $x \neq \hat{x}$ such that $Dg^{(1)}_{f_2^*}(x, f_1^*(x)) > Dg^{(1)}_{f_2^*}(\hat{x}, f_1^*(\hat{x}))$. Reducing the bid at $\hat{x}$ to $f_1^*(\hat{x}) - \delta$ and increasing the bid at $x$ to $f_1^*(x) + \delta$, for a small $\delta > 0$, will increase the throughput by approximately $\big(Dg^{(1)}_{f_2^*}(x, f_1^*(x)) - Dg^{(1)}_{f_2^*}(\hat{x}, f_1^*(\hat{x}))\big) \cdot \delta$. Thus, user 1 has an incentive to change its bidding function, and $(f_1^*, f_2^*)$ cannot be a Nash equilibrium strategy pair in this case.

Proof. The complete proof is given in the Appendix. □

With Lemma 2, we are able to find the unique Nash equilibrium strategy pair. The exact form of the equilibrium bidding strategies is presented in the following theorem.

Theorem 1. Under the assumption of a single bidding function, the following is a unique Nash equilibrium strategy pair for the auction:

$$f_1^*(x) = c \cdot x^{\gamma + 1} \qquad (2.10)$$

$$f_2^*(x) = c \cdot x^{\frac{1}{\gamma} + 1} \qquad (2.11)$$

where the constants $\gamma$ and $c$ are chosen such that

$$\int_0^1 c \cdot x^{\gamma + 1}\, dx = \alpha \qquad (2.12)$$

$$\int_0^1 c \cdot x^{\frac{1}{\gamma} + 1}\, dx = \beta. \qquad (2.13)$$
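The two money constraints of Theorem 1 can be solved for the constants directly. Evaluating the integrals of $c \cdot x^{\gamma+1}$ and $c \cdot x^{1/\gamma+1}$ over [0, 1] gives $c/(\gamma+2) = \alpha$ and $c\gamma/(2\gamma+1) = \beta$. The following is a minimal sketch of that calculation; the function names `equilibrium_params` and `equilibrium_bids` and the shorthand `t` for the money ratio are introduced here for illustration and do not appear in the paper.

```python
import math


def equilibrium_params(alpha, beta):
    """Solve c/(gamma+2) = alpha and c*gamma/(2*gamma+1) = beta for (gamma, c)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    t = beta / alpha
    # Eliminating c gives gamma^2 + 2*(1 - t)*gamma - t = 0; keep the positive root.
    gamma = (t - 1.0) + math.sqrt((t - 1.0) ** 2 + t)
    c = alpha * (gamma + 2.0)
    return gamma, c


def equilibrium_bids(alpha, beta):
    """Return the bidding functions c*x^(gamma+1) and c*x^(1/gamma+1)."""
    gamma, c = equilibrium_params(alpha, beta)

    def f1(x):
        return c * x ** (gamma + 1.0)

    def f2(x):
        return c * x ** (1.0 / gamma + 1.0)

    return f1, f2


if __name__ == "__main__":
    gamma, c = equilibrium_params(1.0, 2.0)
    print(f"gamma = {gamma:.4f}, c = {c:.4f}")
```

For $\alpha = 1$ and $\beta = 2$ the quadratic reduces to $\gamma^2 - 2\gamma - 2 = 0$, so $\gamma = 1 + \sqrt{3}$ and $c = 3 + \sqrt{3}$; both bidding functions then meet at the common top bid $c$ when the channel state is 1.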
Equations (2.12) and (2.13) impose the average money constraints. Fig. 1 shows an example of the Nash equilibrium bidding strategy pair when $\alpha = 1$ and $\beta = 2$. Since user 1 has less money than user 2, user 1 concentrates its bidding on time slots with very good channel state.

Proof. We show here that $f_1^*(x) = c \cdot x^{\gamma+1}$ and $f_2^*(x) = c \cdot x^{\frac{1}{\gamma}+1}$ is indeed a Nash equilibrium strategy pair by using the sufficiency condition of Lemma 2, and we leave the uniqueness part to the Appendix. It is easy to check that both conditions $f_1^*(1) = f_2^*(1)$ and $f_1^*(0) = f_2^*(0)$ are satisfied. Since both functions are strictly increasing, we can write $g^{(1)}_{f_2^*}(x, b) = x \cdot (f_2^*)^{-1}(b)$ and $g^{(2)}_{f_1^*}(x, b) = x \cdot (f_1^*)^{-1}(b)$. Also, since both $f_1^*$ and $f_2^*$ are differentiable, $g^{(1)}_{f_2^*}(x, b)$ and $g^{(2)}_{f_1^*}(x, b)$ are both differentiable with respect to $b$. Therefore,

$$\frac{\partial g^{(1)}_{f_2^*}(x, b)}{\partial b}\bigg|_{b = f_1^*(x)} = \frac{x}{f_2^{*\prime}\big((f_2^*)^{-1}(f_1^*(x))\big)} = \frac{x}{f_2^{*\prime}(x^{\gamma})} = \frac{\gamma}{c(1 + \gamma)}.$$

Similarly,

$$\frac{\partial g^{(2)}_{f_1^*}(x, b)}{\partial b}\bigg|_{b = f_2^*(x)} = \frac{x}{f_1^{*\prime}\big((f_1^*)^{-1}(f_2^*(x))\big)} = \frac{x}{f_1^{*\prime}(x^{1/\gamma})} = \frac{1}{c(1 + \gamma)}.$$

From Lemma 2, we see that $(f_1^*, f_2^*)$ is indeed a Nash equilibrium strategy pair because both $Dg^{(1)}_{f_2^*}(x, f_1^*(x))$ and $Dg^{(2)}_{f_1^*}(x, f_2^*(x))$ are constants. The proof of uniqueness of $(f_1^*, f_2^*)$ is given in the Appendix. □

FIG. 1. An example of Nash equilibrium strategy pair for $\alpha = 1$ and $\beta = 2$. (Two panels: bidding function for user 1 and bidding function for user 2, each plotted against the channel coefficient.)

Fig. 2 shows the resulting allocation scheme when both users employ the Nash equilibrium strategy shown in Fig. 1. Above the curve, time slots will be allocated to user 2 since user 2's bid is higher than user 1's in this region. Similarly, user 1 gets the slots below the curve. Here, user 2 is allocated more slots than user 1 since it has more money. If both players use Nash equilibrium strategies, the expected throughputs obtained are given by:

$$G_1(\alpha, \beta) = \int_0^1 x^{\gamma + 1}\, dx = \frac{1}{\gamma + 2} \qquad (2.14)$$

$$G_2(\alpha, \beta) = \int_0^1 x^{\frac{1}{\gamma} + 1}\, dx = \frac{\gamma}{2\gamma + 1}. \qquad (2.15)$$
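For concreteness, the equilibrium throughputs can be tied back to the money constraints by substituting the bidding functions of Theorem 1 into (2.8) and (2.9). The numerical values in the comment, for $\alpha = 1$ and $\beta = 2$, are worked out here as an illustration:

```latex
\begin{align*}
(f_2^*)^{-1}(f_1^*(x)) &= x^{\gamma}, &
G_1 &= \int_0^1 x \cdot x^{\gamma}\, dx = \frac{1}{\gamma + 2} = \frac{\alpha}{c}, \\
(f_1^*)^{-1}(f_2^*(x)) &= x^{1/\gamma}, &
G_2 &= \int_0^1 x \cdot x^{1/\gamma}\, dx = \frac{\gamma}{2\gamma + 1} = \frac{\beta}{c}.
\end{align*}
% Example: \alpha = 1, \beta = 2 gives \gamma = 1 + \sqrt{3} \approx 2.732 and
% c = 3 + \sqrt{3} \approx 4.732, so G_1 \approx 0.211 and G_2 \approx 0.423.
```

The constant $c$ therefore plays the role of a common price: each user's average spending divided by its throughput equals $c$.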
As can be seen, the ratio of the throughputs obtained, $G_1(\alpha, \beta) / G_2(\alpha, \beta)$, is equal to $\alpha / \beta$, which is the ratio of the money each user had initially. Thus, the Nash equilibrium strategy pair provides an allocation scheme that is fair in the sense that the price per unit of throughput is the same for both users.

FIG. 2. Allocation scheme from Nash equilibrium strategy pair for $\alpha = 1$ and $\beta = 2$. (Horizontal axis: channel coefficient of user 1; vertical axis: channel coefficient of user 2; the region above the curve is labeled "slots assigned to user 2".)

2.3. Unique Nash equilibrium strategy with multiple bidding functions. In the previous section, we restricted the strategy space of each user to a single bidding function (i.e., $S_1$ and $S_2$) instead of a sequence of bidding functions (i.e., $\mathcal{S}_1$ and $\mathcal{S}_2$). However, the money constraint imposed upon each user is a long-term average money constraint. A natural question to ask is the following: Is it profitable for an individual user to change its bidding functions over time while satisfying the long-term average money constraint? Therefore, in this section, we allow the users to use a strategy within the broader class of strategy spaces, $\mathcal{S}_1$ and $\mathcal{S}_2$, and explore whether there is an incentive for a user to do so (i.e., whether there exists a Nash equilibrium strategy with which it can increase its throughput).

To choose a strategy (i.e., a sequence of bidding functions) from the strategy space $\mathcal{S}_1$ or $\mathcal{S}_2$, a user encounters two problems. First, it must decide how to allocate its money among the $n$ bidding functions so that the average money constraint is still satisfied. Second, once the money allocated to the $i$th bidding function is specified, a user has to choose a bidding function for the $i$th slot. The second problem is already solved in the previous section (see Theorem 1). In this section, we focus on the first problem, specifically, how to allocate money between the bidding functions while satisfying the following condition: the total expected amount of money for the sequence of $n$ bidding functions is $n \cdot \alpha$ for user 1 and $n \cdot \beta$ for user 2. More precisely, the strategy spaces, or possible actions that can be taken by the users, are the following:

$$\hat{S}_1 = \{ \alpha_1, \dots, \alpha_n \mid \alpha_1 + \dots + \alpha_n = n \cdot \alpha \}$$

$$\hat{S}_2 = \{ \beta_1, \dots, \beta_n \mid \beta_1 + \dots + \beta_n = n \cdot \beta \}.$$

The objective of each user is still to maximize its own throughput. When user 1 and user 2 allocate $\alpha_i$ and $\beta_i$ to their $i$th bidding function, which is given in Theorem 1, the payoff functions are $G_1(\alpha_i, \beta_i)$ for user 1 and $G_2(\alpha_i, \beta_i)$ for user 2.
The following lemma gives us a Nash equilibrium strategy pair for the auction game described in this section.

Lemma 3. Given that user 2's strategy is to allocate its money evenly among its bidding functions (i.e., $\beta_i = \beta$, $i = 1, \dots, n$), user 1's best response is to allocate its money evenly as well (i.e., $\alpha_i = \alpha$, $i = 1, \dots, n$); and vice versa. Therefore, a Nash equilibrium strategy pair for this auction is for both users to allocate their money evenly.

Proof. Without loss of generality, we consider the case $n = 2$, where each user's strategy can consist of two different bidding functions. Suppose that user 2 allocates $\beta$ to both bidding functions $f_2^{(1)}$ and $f_2^{(2)}$, and user 1 allocates $\alpha_1$ to bidding function $f_1^{(1)}$ and $\alpha_2$ to bidding function $f_1^{(2)}$, where $\alpha_1 + \alpha_2 = 2\alpha$ and $\alpha_1 \neq \alpha_2$. We now show that the throughput for user 1, $G_1(\alpha_1, \beta) + G_1(\alpha_2, \beta)$, is maximized when $\alpha_1 = \alpha_2 = \alpha$. Consider the function $G_1(\alpha_1, \beta)$ with $\beta$ fixed. The equation for $G_1(\alpha_1, \beta)$ becomes

$$F(t) = \frac{t}{1 + t + \sqrt{(1 - t)^2 + t}},$$

where $t = \alpha_1 / \beta$. $F(t)$ is concave for $t \ge 0$. Thus, $G_1(\alpha_1, \beta) + G_1(\alpha_2, \beta)$ is maximized when $\alpha_1 = \alpha_2 = \alpha$. □

We have already obtained a Nash equilibrium strategy pair from the above lemma. The following theorem states that this Nash equilibrium strategy pair is in fact unique within the strategy space considered.

Theorem 2. For the auction in this section, a unique Nash equilibrium strategy for both users is to allocate their money evenly among the bidding functions.

Proof. The complete proof is in the Appendix. □
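The concavity argument of Lemma 3 can be checked numerically for $n = 2$. The sketch below is illustrative only: it assumes the closed form $G_1 = 1/(\gamma + 2)$ that follows from Theorem 1 with uniform channel states, and the names `g1` and `best_split` as well as the grid search are hypothetical choices, not the paper's method.

```python
import math


def g1(alpha, beta):
    """User 1's equilibrium throughput 1/(gamma+2) for per-slot budgets (alpha, beta)."""
    t = beta / alpha
    gamma = (t - 1.0) + math.sqrt((t - 1.0) ** 2 + t)
    return 1.0 / (gamma + 2.0)


def best_split(alpha, beta, steps=199):
    """Grid-search user 1's split (a1, 2*alpha - a1) against an evenly spending user 2.

    Returns the best a1 on the grid and the total throughput over the two slots.
    """
    best_a1, best_val = None, -1.0
    for k in range(1, steps + 1):
        a1 = 2.0 * alpha * k / (steps + 1)  # 0 < a1 < 2*alpha
        val = g1(a1, beta) + g1(2.0 * alpha - a1, beta)
        if val > best_val:
            best_a1, best_val = a1, val
    return best_a1, best_val


if __name__ == "__main__":
    a1, val = best_split(1.0, 2.0)
    print(f"best a1 = {a1:.3f}, two-slot throughput = {val:.4f}")
```

On this grid the maximum sits at the even split `a1 = alpha`, in line with the lemma; an uneven split such as (0.5, 1.5) against $\beta = 2$ gives a visibly lower two-slot throughput.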
In this section, users are given more freedom in choosing their strategies (i.e., they can choose $n$ different bidding functions). However, as Theorem 2 shows, the unique Nash equilibrium strategy pair is for each user to use a single bidding function from its strategy space. Thus, the throughput result obtained in the broader strategy spaces $\mathcal{S}_1$ and $\mathcal{S}_2$ is the same as the throughput result from the previous section. Therefore, there is no incentive for a user to use different bidding functions.

2.4. Comparison with other allocation schemes. So far, we have obtained a unique Nash equilibrium strategy pair and the resulting throughput when both players choose to use the Nash equilibrium strategy. Inevitably, due to the fairness constraint, total system throughput will decrease as compared to the maximum throughput attainable without any fairness constraint. Hence we would like to compare the total throughput of the Nash equilibrium strategy to that of an unconstrained strategy. We address this question by first considering an allocation scheme that maximizes total throughput subject to no constraint. Then, we investigate the throughput of another centralized allocation scheme that maximizes the total throughput subject to the constraint that the resulting throughputs of the individual users are kept at a certain ratio.
2.4.1. Maximizing throughput with no constraint. To maximize throughput without any constraints, the transmitter sends data to the user with the better channel state during each time slot. Then the expected throughput is $E[\max\{X_1, X_2\}]$. Since $X_1$ and $X_2$ are independent and uniformly distributed in [0, 1], we have $E[\max\{X_1, X_2\}] = \frac{2}{3}$. Using the Nash equilibrium strategy, the total expected system throughput, $G_1(\alpha, \beta) + G_2(\alpha, \beta)$, is $\frac{1}{2}$ in the worst case (i.e., one user gets all of the time slots while the other user is starving). Thus, the channel allocation scheme proposed in this paper can achieve at least 75 percent of the maximum attainable throughput. This gives us a lower bound on the throughput performance of the allocation scheme derived from the Nash equilibrium pair.
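The 75 percent bound can be explored numerically as a function of the money ratio. The sketch below assumes the equilibrium throughputs $1/(\gamma+2)$ and $\gamma/(2\gamma+1)$ obtained from Theorem 1 with uniform channel states; the function names `total_throughput` and `efficiency` are introduced here for illustration only.

```python
import math

# E[max{X1, X2}] for two independent channel states uniform on [0, 1].
MAX_THROUGHPUT = 2.0 / 3.0


def total_throughput(alpha, beta):
    """Sum of both users' equilibrium throughputs for budgets (alpha, beta)."""
    t = beta / alpha
    gamma = (t - 1.0) + math.sqrt((t - 1.0) ** 2 + t)
    return 1.0 / (gamma + 2.0) + gamma / (2.0 * gamma + 1.0)


def efficiency(alpha, beta):
    """Equilibrium total throughput relative to the unconstrained maximum."""
    return total_throughput(alpha, beta) / MAX_THROUGHPUT


if __name__ == "__main__":
    for ratio in (1, 2, 10, 100, 10_000):
        print(ratio, round(efficiency(1.0, float(ratio)), 4))
```

Equal budgets give an efficiency of 1, and the efficiency decreases toward (but stays above) 0.75 as one user's budget dominates, which is the worst case described above.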
2.4.2. Maximizing throughput with a constant throughput ratio constraint. Now, we investigate an allocation scheme with a fairness constraint that requires the resulting throughputs of the users to be kept at a constant ratio. Specifically, let $G_1$ and $G_2$ denote the expected throughput for user 1 and user 2 respectively. We have the following optimization problem:

$$\max \; G_1 + G_2 \quad \text{subject to} \quad G_1 / G_2 = a, \qquad (2.16)$$

where $a$ is a positive real number. The resulting optimal allocation scheme for the above problem is of the form shown in Fig. 3. The space spanned by $X_1$ and $X_2$ is divided into two regions by the separation line $X_2 = c \cdot X_1$, where $c$ is some positive real number. Above the line (i.e., $X_2 > c \cdot X_1$), the transmitter assigns the slot to user 2. Below the line (i.e., $X_2 < c \cdot X_1$), the transmitter assigns the slot to user 1.

FIG. 3. The optimal allocation scheme to achieve constant throughput ratio fairness. (Region above the separation line: user 2; region below the line: user 1.)

To prove the above, we use a method that is similar to the one in [9]. Specifically, let $A : (X_1, X_2) \to \{1, 2\}$ be an allocation scheme that maps a slot, in which the channel states are $X_1$ and $X_2$, to either user 1 or user 2. Under an allocation scheme $A$, the resulting throughputs for user 1 and user 2 are $G_1^A = E[X_1 \cdot \mathbf{1}_{A(X_1,X_2)=1}]$ and $G_2^A = E[X_2 \cdot \mathbf{1}_{A(X_1,X_2)=2}]$ respectively. Now, we define an allocation scheme as follows:

$$A^*(X_1, X_2) = \begin{cases} 1 & \text{if } (1 + \lambda^*) X_1 \ge (1 - a\lambda^*) X_2, \\ 2 & \text{otherwise,} \end{cases}$$

where $\lambda^*$ is chosen such that $G_1^{A^*} / G_2^{A^*} = a$ is satisfied. It is straightforward to verify that such a $\lambda^*$ exists. Consider an arbitrary allocation scheme $A$ that satisfies $G_1^A / G_2^A = a$. We have
$$\begin{aligned}
& E[X_1 \cdot \mathbf{1}_{A(X_1,X_2)=1}] + E[X_2 \cdot \mathbf{1}_{A(X_1,X_2)=2}] \\
&= E[X_1 \cdot \mathbf{1}_{A(X_1,X_2)=1}] + E[X_2 \cdot \mathbf{1}_{A(X_1,X_2)=2}] + \lambda^* \left( E[X_1 \cdot \mathbf{1}_{A(X_1,X_2)=1}] - a E[X_2 \cdot \mathbf{1}_{A(X_1,X_2)=2}] \right) \\
&= E[(X_1 + \lambda^* X_1) \cdot \mathbf{1}_{A(X_1,X_2)=1}] + E[(X_2 - a\lambda^* X_2) \cdot \mathbf{1}_{A(X_1,X_2)=2}] \\
&\le E[(X_1 + \lambda^* X_1) \cdot \mathbf{1}_{A^*(X_1,X_2)=1}] + E[(X_2 - a\lambda^* X_2) \cdot \mathbf{1}_{A^*(X_1,X_2)=2}] \\
&= E[X_1 \cdot \mathbf{1}_{A^*(X_1,X_2)=1}] + E[X_2 \cdot \mathbf{1}_{A^*(X_1,X_2)=2}] + \lambda^* \left( E[X_1 \cdot \mathbf{1}_{A^*(X_1,X_2)=1}] - a E[X_2 \cdot \mathbf{1}_{A^*(X_1,X_2)=2}] \right) \\
&= E[X_1 \cdot \mathbf{1}_{A^*(X_1,X_2)=1}] + E[X_2 \cdot \mathbf{1}_{A^*(X_1,X_2)=2}].
\end{aligned} \qquad (2.17)$$

The inequality in the middle follows from the definition of $A^*$. Specifically, if we were asked to choose an allocation scheme $A$ to maximize $E[(X_1 + \lambda^* X_1) \cdot \mathbf{1}_{A(X_1,X_2)=1}] + E[(X_2 - a\lambda^* X_2) \cdot \mathbf{1}_{A(X_1,X_2)=2}]$, then $A^*$ would be an optimal scheme by its definition. Thus, $A^*(X_1, X_2)$ is an optimal solution to the optimization problem in (2.16). To find the slope $c$ in Fig. 3, we first write the throughput for each user:
$$G_1^{A^*} = \int_0^1 \int_0^{c x_1} x_1 \, dx_2 \, dx_1 = \frac{c}{3} \qquad (2.18)$$

and

$$G_2^{A^*} = \int_0^1 \int_{c x_1}^{1} x_2 \, dx_2 \, dx_1 = \frac{1}{2} - \frac{c^2}{6}, \qquad (2.19)$$

where both expressions are written for $c \le 1$.

FIG. 4. Throughput result comparison for both users. (Left panel: throughput comparison for user 1, who has less money. Right panel: throughput comparison for user 2, who has more money. Horizontal axis in both panels: ratio of user 2 money to user 1 money.)
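The comparison between the constrained optimum and the Nash equilibrium can be sketched numerically. The code below assumes uniform channel states and a threshold rule $X_2 = c X_1$ with $c \le 1$, for which user 1's throughput is $c/3$ and user 2's is $1/2 - c^2/6$; it finds $c$ by bisection on the ratio constraint and compares the result with $F(\sigma)$ from Appendix C, taken here as the Nash equilibrium throughput. All function names are illustrative.

```python
import math

def nash_throughput(sigma: float) -> float:
    """F(sigma) from Appendix C, used here as the Nash equilibrium throughput (assumption)."""
    return sigma / (1.0 + sigma + math.sqrt((1.0 - sigma) ** 2 + sigma))

def constrained_optimum(a: float, iters: int = 200):
    """Return (c, G1, G2) for the threshold rule X2 = c * X1 with G1 / G2 = a.

    Only 0 < a <= 1 is covered, so that the slope c stays in (0, 1].
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("this sketch only covers 0 < a <= 1")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        g1, g2 = c / 3.0, 0.5 - c * c / 6.0
        if g1 / g2 < a:   # ratio too small: user 1 needs a larger region
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return c, c / 3.0, 0.5 - c * c / 6.0

def nash_to_optimum_ratio(a: float) -> float:
    """Nash throughput of user 1 divided by its constrained-optimal throughput."""
    _, g1_opt, _ = constrained_optimum(a)
    return nash_throughput(a) / g1_opt

if __name__ == "__main__":
    for a in (0.1, 0.25, 0.5, 1.0):
        print(a, constrained_optimum(a)[0], nash_to_optimum_ratio(a))
```

Because both schemes keep the two throughputs at the same ratio $a$, the ratio computed for user 1 is also the ratio for user 2.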
Since $G_1^{A^*} / G_2^{A^*} = a$, we get $c = \frac{-1 + \sqrt{1 + 3a^2}}{a}$. Under the Nash equilibrium strategy pair, the ratio of the resulting throughputs, $G_1(\alpha,\beta) / G_2(\alpha,\beta)$, is the same as the ratio of the money that the individual users possess ($\alpha/\beta$). For the optimization problem described in (2.16), by setting $a = \alpha/\beta$, we compare the resulting throughput with the throughput obtained when both users employ the Nash equilibrium strategy. Fig. 4 shows the comparison. For both users, the Nash equilibrium throughput is very close to the throughput obtained by solving the constrained optimization problem (within 97 percent, to be precise).

3. Uplink transmission.
3.1. Uplink problem formulation. The uplink communication environment that we consider here consists of multiple users who are sending data to a single base station or satellite over multiple fading channels. We assume that each user always has data to be sent to the base station. Time is assumed to be discrete, and the channel state for a given user changes according to a known probabilistic model independently over time. The channels between the users and the base station are assumed to be independent of each other. Let Xi be a random variable denoting the channel state for the channel between user i and the base station.
When multiple users are transmitting during the same time slot, it is still possible for the receiver to capture one (or more) user's data. The capture model can be described as a mapping from the received power of the transmitting users to the set $\{1, \ldots, n, 0\}$, where 0 indicates that no packet is successfully received. In this paper, we investigate two capture models, which are presented in later sections. We assume that each individual user is energy constrained. Specifically, each user $i$ has an average amount of energy $e_i$ available to it during each time slot. We assume that the $e_i$ values are known to all users, and that users know the distribution of the $X_i$'s. However, the exact value of the channel state $X_i$ is known to user $i$ only at the beginning of each time slot. With a given capture model and the energy constraint, the objective for each user is to design a power allocation strategy that maximizes its own expected throughput (or probability of success) per time slot subject to the expected or average power constraint. The power allocation strategy specifies how a user allocates power in every time slot upon observing its channel state. Under power allocation strategy $g_i(\cdot)$, user $i$ transmits a packet with power equal to $g_i(x)$ when it sees that its channel condition in this time slot is $X_i = x$. The received power at the base station is denoted as $f_i(x) = x \cdot g_i(x)$. Formally, let $F_i$ be the set of continuous and bounded real-valued functions with finite first and second derivatives over the support of $X_i$. Then, the strategy space for user $i$ (the set of all possible power allocation schemes), say $S_i$, is defined as follows: (3.1)
3.2. Two users case. We start by investigating users' strategies in a communication system consisting of exactly two users and one base station. The analytical method used in this section will help us obtain the equilibrium power allocation scheme in the multiple users case. We begin our analysis with the assumption that the channel state $X_i$ is uniformly distributed over $[0,1]$ for all $i$. The Nash equilibrium power allocation strategy with a general channel state distribution is presented in the subsequent section. Suppose user 1 and user 2 choose their power allocation strategies to be $g_1$ and $g_2$ respectively. Given a time slot with channel state realization $(x_1, x_2)$, user 1 and user 2 will transmit their packets using power levels $g_1(x_1)$ and $g_2(x_2)$ respectively. The corresponding received powers at the base station are $f_1(x_1) = x_1 \cdot g_1(x_1)$ and $f_2(x_2) = x_2 \cdot g_2(x_2)$. As in [12] and [13], the capture model used in this section is the following: if $[x_1 \cdot g_1(x_1)] / [x_2 \cdot g_2(x_2)] \ge K$, where $K \ge 1$, user 1's packet will be captured. Likewise, user 2's packet will be captured if $[x_2 \cdot g_2(x_2)] / [x_1 \cdot g_1(x_1)] \ge K$. Thus, given a power allocation strategy pair $(g_1, g_2)$, where $g_1 \in S_1$ and $g_2 \in S_2$, the expected throughput for user 1 is defined as the following:
where
Similarly, the throughput function for user 2: (3.3)
3.2.1. Nash equilibrium strategy. In this part, we present a Nash equilibrium power allocation strategy pair $(g_1^*, g_2^*)$. The derivation of the Nash equilibrium is similar to the derivation of the Nash equilibrium in the all-pay auction part. We consider here the case where both users choose their strategies from the strategy spaces $S_1$ and $S_2$, and the values of $e_1$ and $e_2$ are known to both users. To get the Nash equilibrium strategy pair, we first argue that at equilibrium the received power function $f_i^*(x_i)$ must be strictly increasing in $x_i$.

Lemma 4. Given a Nash equilibrium power allocation strategy pair $(g_1^*, g_2^*)$ and its corresponding received power functions $(f_1^*, f_2^*)$, the received power function $f_1^*(x_1)$ must be strictly increasing in $x_1$. Similarly, $f_2^*(x_2)$ must be strictly increasing in $x_2$.

Proof. For an arbitrary received power function $f$ which is not strictly increasing, we can always find another received power function that results in a larger throughput gain. To see this, consider time slots with channel state in the small intervals $(a - \delta, a + \delta)$ and $(b - \delta, b + \delta)$, where $a < b$. When $\delta$ is small, the received power function is close to $f(a)$ for time slots in the interval $(a - \delta, a + \delta)$. Likewise, the received power function is close to $f(b)$ for time slots in the interval $(b - \delta, b + \delta)$. Suppose the received power function $f$ is such that $f(a) = a \cdot g(a) > f(b) = b \cdot g(b)$ for some $a < b$. The total amount of transmission power used in time slots with channel state in the two intervals is given by:

$$[g(a) + g(b)] \, 2\delta = \left[ \frac{f(a)}{a} + \frac{f(b)}{b} \right] 2\delta.$$

Now, if user 1 employs a new power allocation strategy $\hat{g}$ such that $\hat{g}(b) = f(a)/b$ and $\hat{g}(a) = f(b)/a$, user 1 will achieve the same expected throughput as before. However, the amount of power used, $[\hat{g}(b) + \hat{g}(a)] \, 2\delta$, is less than $[g(a) + g(b)] \, 2\delta$, and the extra power can be used to get higher throughput. Hence, both equilibrium received power functions $f_1^*(x_1)$ and $f_2^*(x_2)$ must be strictly increasing in $x_1$ and $x_2$ respectively. □

With one user's power allocation strategy, say $g_2$, fixed, we now seek the optimal power allocation scheme for user 1. From Lemma 4, we see that the inverses of $f_1$ and $f_2$ are well defined. With user 2's strategy $g_2$ fixed, let $u_1^{(g_2)} : (x_1, b) \to \mathbb{R}$ denote user 1's expected throughput of a slot conditioning on the following events:
• User 1's channel state is $X_1 = x_1$.
• User 1's allocated power is $b$.
For convenience, we drop the term $g_2$ in the expression $u_1^{(g_2)}(x_1, b)$, and simply write it as $u_1(x_1, b)$. Specifically, we can then write:

$$u_1(x_1, b) = P(f_2(X_2) \cdot K \le x_1 \cdot b), \qquad (3.4)$$

where $P(f_2(X_2) \cdot K \le x_1 \cdot b)$ is the probability that user 1's packet gets captured in a time slot. Consequently, using a strategy $g_1$, user 1's throughput is given by:
$$G_1(e_1, e_2) = \int_0^1 u_1(x_1, g_1(x_1)) \cdot p_{X_1}(x_1) \, dx_1 = \int_0^1 u_1(x_1, g_1(x_1)) \, dx_1, \qquad (3.5)$$

where the last equality results from the uniform distribution assumption. With user 1's strategy $g_1$ fixed, similar terms for user 2 can be defined. Then, user 2's throughput is given by:

$$G_2(e_1, e_2) = \int_0^1 u_2(x_2, g_2(x_2)) \cdot p_{X_2}(x_2) \, dx_2 = \int_0^1 u_2(x_2, g_2(x_2)) \, dx_2. \qquad (3.6)$$

Due to the uniformly distributed channel state, $P(f_2(X_2) \cdot K \le x_1 \cdot b)$ is given by

$$P(f_2(X_2) \cdot K \le x_1 \cdot b) = P\left(X_2 \le f_2^{-1}\left(\tfrac{1}{K} x_1 \cdot b\right)\right) = f_2^{-1}\left(\tfrac{1}{K} x_1 \cdot b\right),$$

where $f_2^{-1}$ is well defined. Thus, we can rewrite Eq. (3.4) as $u_1(x_1, b) = f_2^{-1}\left(\tfrac{1}{K} x_1 \cdot b\right)$. Hence we have,

$$G_1(e_1, e_2) = \int_0^1 f_2^{-1}\left(\tfrac{1}{K} x_1 \cdot g_1(x_1)\right) dx_1, \qquad (3.7)$$

$$G_2(e_1, e_2) = \int_0^1 f_1^{-1}\left(\tfrac{1}{K} x_2 \cdot g_2(x_2)\right) dx_2. \qquad (3.8)$$

We begin our analysis of the Nash equilibrium strategy pair by first considering the power allocation on the boundary points 0 and 1. For a pair of power allocation functions $(g_1^*, g_2^*)$ to be a Nash equilibrium, it is straightforward to see that $g_1^*(0) = g_2^*(0) = 0$, since it does not make sense to allocate power for a slot with zero channel state. Likewise, we must have $g_1^*(1) \le K \cdot g_2^*(1)$ and $g_2^*(1) \le K \cdot g_1^*(1)$, since allocating power $g_1(1) = K g_2(1)$ or $g_1(1) = K g_2(1) + \epsilon$, where $\epsilon > 0$, results in the same throughput for user 1. We call these properties the boundary conditions of a Nash equilibrium strategy pair. With the boundary conditions satisfied, the following lemma gives a necessary and sufficient condition for a pair of power allocation strategies to be a Nash equilibrium strategy pair. For convenience, we denote the marginal gain for user 1 when $X_1 = x_1$ and the allocated power $b = b^*$ as

$$D u_1(x_1, b^*) = \left. \frac{\partial u_1(x_1, b)}{\partial b} \right|_{b = b^*}.$$
Lemma 5. Given a power allocation strategy pair $(g_1^*, g_2^*)$ that satisfies the boundary conditions, $(g_1^*, g_2^*)$ is a Nash equilibrium strategy pair if and only if $D u_1(x_1, g_1^*(x_1)) = c_1$ and $D u_2(x_2, g_2^*(x_2)) = c_2$, for some constants $c_1$ and $c_2$, for all $x_1 \in [0,1]$ and all $x_2 \in [0,1]$.

Note that the above lemma does not depend on the assumption of the uniformly distributed channel state. Thus, it is quite general and will be used in the subsequent section where channel states are not uniformly distributed. The proof is similar to the proof of Lemma 2. With Lemma 5, we are able to find the Nash equilibrium strategy pair. The exact form of the equilibrium power allocation strategies is presented in the following theorem.

Theorem 3. Given the average power constraints $e_1$ and $e_2$, the Nash equilibrium power allocation strategy pair has the following form:
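$$g_1^*(x) = c_1 \cdot x^{\gamma}, \qquad (3.9)$$

$$g_2^*(x) = c_2 \cdot x^{1/\gamma}, \qquad (3.10)$$

where the constants $c_1$, $c_2$ and $\gamma$ are chosen such that

$$\int_0^1 c_1 \cdot x^{\gamma} \, dx = e_1, \qquad (3.11)$$

$$\int_0^1 c_2 \cdot x^{1/\gamma} \, dx = e_2. \qquad (3.12)$$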
Equations (3.11) and (3.12) impose the average power constraints. The proof of the above theorem is similar to the proof of Theorem 1. From the above theorem, we see that Equations (3.9) and (3.10) specify the Nash equilibrium power allocation strategy pair. Since there are two equations with three unknowns, the resulting Nash equilibrium may not be unique in general. However, if a packet with stronger received power can always be captured (i.e., $K = 1$), the Nash equilibrium power allocation strategy is unique.

Corollary 1. For $K = 1$, the unique Nash equilibrium power allocation pair has the following form:

$$g_1^*(x) = c \cdot x^{\gamma}, \qquad g_2^*(x) = c \cdot x^{1/\gamma}, \qquad (3.13)$$

where the constants $c$ and $\gamma$ are chosen such that the average power constraints are satisfied.

Fig. 5 shows an example of the Nash equilibrium power allocation strategy pair when $e_1 = 1$ and $e_2 = 2$. Since user 1 has less average power than user 2, user 1 concentrates its power on time slots with a very good channel state. Fig. 6 shows the capture result when both users employ the Nash equilibrium strategy shown in Fig. 5. For a time slot with a channel state realization that falls into the region above the curve, user 2's packet will be successfully captured, since user 2's received power is higher than that of user 1 in this region. Here, user 2 has more successful transmissions than user 1 since it has more power.

FIG. 5. An example of Nash equilibrium strategy pair for $e_1 = 1$ and $e_2 = 2$. (Left panel: power allocation strategy for user 1 with average power = 1. Right panel: power allocation strategy for user 2 with average power = 2. Horizontal axis in both panels: channel coefficient.)
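For $K = 1$ and uniform channel states, the constants in Corollary 1 can be worked out in closed form. The sketch below assumes the form $g_1^*(x) = c x^{\gamma}$, $g_2^*(x) = c x^{1/\gamma}$ together with the constraints $\int_0^1 g_1^* = e_1$ and $\int_0^1 g_2^* = e_2$, which give $c/(\gamma+1) = e_1$ and $c\gamma/(\gamma+1) = e_2$, hence $\gamma = e_2/e_1$ and $c = e_1 + e_2$. The helper names are illustrative, and the numerical integration is only a sanity check.

```python
def nash_pair_k1(e1: float, e2: float):
    """Constants (c, gamma) for g1(x) = c * x**gamma and g2(x) = c * x**(1/gamma), K = 1."""
    gamma = e2 / e1
    c = e1 + e2
    return c, gamma

def midpoint_integral(func, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))

def capture_probability_user1(gamma: float) -> float:
    """User 1 is captured when x1 * g1(x1) >= x2 * g2(x2), i.e. when x2 <= x1**gamma."""
    return midpoint_integral(lambda x: x ** gamma)

if __name__ == "__main__":
    c, gamma = nash_pair_k1(1.0, 2.0)
    print("c =", c, "gamma =", gamma)
    print("average power, user 1:", midpoint_integral(lambda x: c * x ** gamma))
    print("average power, user 2:", midpoint_integral(lambda x: c * x ** (1.0 / gamma)))
    print("capture probability, user 1:", capture_probability_user1(gamma))
```

Under these assumptions the capture boundary is the curve $x_2 = x_1^{\gamma}$, and user 1's success probability works out to $e_1/(e_1 + e_2)$.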
3.2.2. General channel state distribution. In this section, we specify the conditions that a general channel state distribution has to satisfy in order for a Nash equilibrium strategy pair to exist.
FIG. 6. Results obtained when using the Nash equilibrium strategy pair for $e_1 = 1$ and $e_2 = 2$. (Plot title: result of the throughput with $e_1 = 1$ and $e_2 = 2$. Horizontal axis: channel coefficient of user 1.)
From Lemma 4, one can see that $f_1$ and $f_2$ have to be increasing functions regardless of the distribution of the $X_i$'s. Let $p_{X_i}(\cdot)$ denote the probability density function of $X_i$, with support over an interval starting at zero. Assuming $K = 1$, the probability that user 1's packet will be captured in a time slot with $X_1 = x_1$ and $g_1(x_1) = b$ can be written as the following:

$$u_1(x_1, b) = P(f_2(X_2) \le x_1 \cdot b) = P(X_2 \le f_2^{-1}(x_1 \cdot b)) = \int_0^{f_2^{-1}(x_1 \cdot b)} p_{X_2}(x_2) \, dx_2. \qquad (3.14)$$

From the optimality condition stated in Lemma 2, we have $D u_1(x_1, b) = c_1$, where $c_1$ is some constant. This condition can be expanded as follows:

$$\frac{\partial u_1(x_1, b)}{\partial b} = p_{X_2}\left(f_2^{-1}(x_1 \cdot b)\right) \frac{x_1}{f_2'\left(f_2^{-1}(x_1 \cdot b)\right)} = c_1. \qquad (3.15)$$

Now, let us focus on finding a symmetric Nash equilibrium power allocation strategy. Substituting $b = g_1(x_1)$, the term $f_2^{-1}(x_1 \cdot b)$ is equal to $f_2^{-1}(f_1(x_1)) = x_1$, since $f_1 = f_2$. Thus, Eq. (3.15) can be reduced to the following:

$$p_{X_2}(x_1) \frac{x_1}{f_2'(x_1)} = c_1. \qquad (3.16)$$

The above equation provides a condition on the distribution of the $X_i$ such that there exists a Nash equilibrium power allocation scheme. The condition can be restated as the following:
(3.17)
From the above condition, for example, we see that if $p_{X_2}(\cdot)$ is a strictly increasing polynomial, there exists a Nash equilibrium power allocation strategy.
3.3. Multiple users equilibrium strategies. In this section, we explore the Nash equilibrium power allocation strategies when $n$ users are competing to access the single base station. User $i$'s power allocation function is denoted as $g_i(\cdot)$. Given a time slot with channel state realization $x = (x_1, \ldots, x_n)$, the transmitting power for each user is $g_i(x_i)$. The corresponding received power at the base station is again denoted as $f_i(x_i) = x_i \cdot g_i(x_i)$. The new capture rule used in this section is the following: a packet from user 1 will be successfully received if the following holds:

$$f_1(x_1) \ge (1 + \Delta) \max(f_2(x_2), \ldots, f_n(x_n)).$$

A similar capture model can be found in [15] (i.e., the protocol model). The quantity $\Delta$ models situations where a guard zone is specified to prevent interference. Note also that the capture rule used in the two users' case can be viewed as a special case of the above capture rule. We start with each user facing the same average power constraint (i.e., $e_1 = e_2 = \cdots = e_n$). Since users are identical, it is reasonable to seek a symmetric Nash equilibrium power allocation strategy. Specifically, the set of strategies $(g_1 = g, \ldots, g_n = g)$ is said to be a symmetric Nash equilibrium if $g_i = g$ is the best power allocation strategy for user $i$ when all other users are also employing the power allocation strategy $g$. For a power allocation function $g$ to be a symmetric Nash equilibrium strategy, $f(x) = x g(x)$ must be a strictly increasing function, by an argument similar to the one in the two users case. The following theorem shows the existence and the form of a symmetric Nash equilibrium power allocation strategy.

Theorem 4. Given that each user has the same average power constraint, there exists a symmetric Nash equilibrium power allocation strategy with the following form:

$$g_i^*(x_i) = c \cdot x_i^{n-1} \qquad \forall i \in \{1, \ldots, n\}, \qquad (3.18)$$

where $c$ is chosen such that the average power constraint is satisfied.

Proof. The complete proof is given in the Appendix. □

With the symmetric Nash equilibrium power allocation strategy given in Eq. (3.18), the expected throughput for each user is given by:

$$\begin{aligned}
P\left(f(X_1) \ge (1 + \Delta) \max(f(X_2), \ldots, f(X_n))\right) &= P\left(X_1^n \ge (1 + \Delta) \max(X_2^n, \ldots, X_n^n)\right) \\
&= P\left(X_1 \ge (1 + \Delta)^{1/n} \max(X_2, \ldots, X_n)\right).
\end{aligned} \qquad (3.19)$$
To quantify the loss of efficiency due to users' selfish behavior, we consider a system where all users implement the same power allocation policy provided by a system designer such that the overall system throughput is maximized. To find such scheme, we solve the following optimization problem as in the two users' case:
By symmetry, we have the following upper bound for the above probability:
As in the two users' case, we consider a series of functions, $v_m(x) = x^m$ for $m \ge 1$. As $m \to \infty$, we have

$$P\left(X_1^{m+1} \ge (1 + \Delta) \cdot \max(X_2^{m+1}, \ldots, X_n^{m+1})\right) = P\left(X_1 \ge (1 + \Delta)^{\frac{1}{m+1}} \max(X_2, \ldots, X_n)\right) \to \frac{1}{n}.$$
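The limit above can be illustrated numerically. The sketch below assumes i.i.d. channel states that are uniform on $[0,1]$, for which $P(X_1 \ge t \max(X_2, \ldots, X_n)) = t^{-(n-1)}/n$ for any $t \ge 1$; with $t = (1+\Delta)^{1/(m+1)}$ this gives $(1+\Delta)^{-(n-1)/(m+1)}/n$. The function names and the Monte Carlo cross-check are illustrative additions, not part of the original derivation.

```python
import random

def capture_probability_closed_form(n: int, m: float, delta: float) -> float:
    """P(X1 >= (1 + delta)**(1/(m+1)) * max(X2..Xn)) for i.i.d. uniform [0, 1] states."""
    return (1.0 + delta) ** (-(n - 1) / (m + 1)) / n

def capture_probability_monte_carlo(n: int, m: float, delta: float,
                                    trials: int = 200_000, seed: int = 0) -> float:
    """Simulation estimate of the same probability (seeded for repeatability)."""
    rng = random.Random(seed)
    threshold = (1.0 + delta) ** (1.0 / (m + 1))
    wins = 0
    for _ in range(trials):
        x1 = rng.random()
        strongest_other = max(rng.random() for _ in range(n - 1))
        if x1 >= threshold * strongest_other:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    n, delta = 3, 0.5
    for m in (1, 10, 100, 10_000):
        print(m, capture_probability_closed_form(n, m, delta))
```

Setting $m = n - 1$ in the same expression evaluates the probability in Eq. (3.19) under the uniform assumption.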
Thus, there indeed exists a power allocation scheme that achieves the maximum possible throughput. In other words, it is possible to have a packet successfully captured in every time slot. Now, when users behave selfishly, the expected throughput for each user is given as follows from Eq. (3.19): (3.20) As $n$ increases, the above equation goes to $1/n$, which is the maximum attainable throughput. Therefore, as the number of users becomes large, the symmetric Nash equilibrium power allocation scheme is optimal in the sense that the throughput obtained approaches the maximum attainable throughput. For the special case where $\Delta = 0$, the capture rule becomes that the user with the largest received power is captured. With this simple rule, a Nash equilibrium strategy can be derived for a general channel state distribution (i.e., $X_i$ has probability density function $p_{X_i}(\cdot)$). From Eq. (D.5), we have

$$p_Z\left(f^{-1}(x_1 \cdot b)\right) \frac{x_1}{f'\left(f^{-1}(x_1 \cdot b)\right)} = c,$$

$$f'(x_1) = \frac{1}{c} x_1 \, p_Z(x_1), \qquad (3.21)$$

where $p_Z(\cdot)$ is the probability density function of $Z = \max(X_2, \ldots, X_n)$. Hence, we can write the received power function as the following:

$$f(x) = \frac{1}{c} \int x \, p_Z(x) \, dx.$$

From the above equation, one can get the optimal power allocation function by using $g(x) = \frac{f(x)}{x}$.
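As a consistency check, the integral formula for $f$ can be evaluated for uniform channel states and compared with Theorem 4. The sketch below assumes $\Delta = 0$, i.i.d. uniform $X_i$ (so that $p_Z(t) = (n-1) t^{n-2}$), and $f(0) = 0$ as the lower integration limit; under these assumptions $f(x) = \frac{n-1}{c\,n} x^n$ and $g(x) = f(x)/x$ is proportional to $x^{n-1}$. The names are illustrative.

```python
def received_power(x: float, n: int, c: float = 1.0, steps: int = 20_000) -> float:
    """f(x) = (1/c) * integral_0^x t * p_Z(t) dt with p_Z(t) = (n - 1) * t**(n - 2).

    Assumes n >= 2 users with i.i.d. uniform [0, 1] channel states and f(0) = 0.
    """
    h = x / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h          # midpoint of the k-th sub-interval
        total += t * (n - 1) * t ** (n - 2)
    return total * h / c

def power_allocation(x: float, n: int, c: float = 1.0) -> float:
    """g(x) = f(x) / x for x > 0."""
    return received_power(x, n, c) / x

if __name__ == "__main__":
    n = 4
    # Doubling the channel state should scale g by 2**(n - 1) if g is proportional to x**(n - 1).
    print(power_allocation(0.8, n) / power_allocation(0.4, n), 2 ** (n - 1))
```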
4. Conclusion. We apply an auction algorithm to the problem of fair allocation of a wireless fading channel. Using the all-pay auction mechanism, we are able to obtain a unique Nash equilibrium strategy. Our strategy allocates bandwidth to the users in accordance with the amount of money that they possess. Hence, this scheme can be viewed as a mechanism for providing quality of service (QoS) differentiation, whereby users are given fictitious money that they can use to bid for the channel. By allocating users different amounts of money, the resulting QoS differentiation can be achieved. We also show that the Nash equilibrium strategy of this auction leads to an allocation at which total throughput is no worse than 3/4 of the maximum possible throughput when fairness constraints are not imposed (i.e., slots are allocated to the user with the better channel). In this paper, we focused on finding a Nash equilibrium strategy when both channels are uniformly distributed. However, as we mentioned earlier, our analysis can be extended to channel states with a general distribution. An interesting extension would be to find the exact form of a Nash equilibrium with a general channel state distribution. In the uplink communication scenario, we consider a communication system with multiple users competing, in a non-cooperative manner, for access to a single satellite or base station. With a specified capture rule and an average power constraint, users opportunistically adjust their transmission power based on their channel state to maximize their throughput. A Nash equilibrium power allocation strategy is characterized, and the resulting throughput efficiency loss, due to selfish behavior, is quantified. As the number of users increases, the Nash equilibrium power allocation strategy approaches the optimal power allocation strategy that can be achieved in a cooperative environment.
APPENDIX

A. Proof of Lemma 2. We first show that if $(f_1^*, f_2^*)$ is a Nash equilibrium strategy pair, $D g^{(1)}_{f_2^*}(x_1, f_1^*(x_1))$ and $D g^{(2)}_{f_1^*}(x_2, f_2^*(x_2))$ must be constants for all $x_1 \in [0,1]$ and $x_2 \in [0,1]$. From user 1's perspective, with $f_2^*$ fixed, consider a small variation of the function $f_1^*$. Specifically, let $f_\delta = f_1^* + \delta(\hat{f} - f_1^*)$, where $\hat{f}$ is an arbitrary function in $S_1$. Since both $\hat{f}$ and $f_1^*$ are in $S_1$, they are both bounded (i.e., $|\hat{f}(x_1)| \le B$ and $|f_1^*(x_1)| \le B$ for all $x_1 \in [0,1]$). Therefore, we have $|f_\delta(x_1) - f_1^*(x_1)| \le 2B\delta$ for all $x_1 \in [0,1]$. Using Lagrange's form of Taylor's theorem, we get that for any $x_1 \in [0,1]$, there exists a real number $c[x_1] \in [f_1^*(x_1), f_\delta(x_1)]$ such that

(A.1)

The last term is bounded by $K \cdot \delta^2$ for some $K$, since both $\hat{f}$ and $f_1^*$ are bounded and $g^{(1)}_{f_2^*}(x_1, b)$ has a finite second derivative. Therefore, for small enough $\delta$, it is negligible compared with the other terms. Now we show that if $D g^{(1)}_{f_2^*}(x_1, f_1^*(x_1))$ is not a constant for all $x_1 \in [0,1]$, we can find a strategy $f_\delta$ which gives user 1 a higher throughput than $f_1^*$. To do that, we can write the following equations:

(A.2)
Now, since $D g^{(1)}_{f_2^*}(x_1, f_1^*(x_1))$ is not a constant for all $x_1 \in [0,1]$, we can find an $\hat{f}$ such that the above equation is positive, which implies that there is an incentive for user 1 to use $f_\delta$. Hence, $(f_1^*, f_2^*)$ is not a Nash equilibrium strategy pair. Similarly, we can show that $D g^{(2)}_{f_1^*}(x_2, f_2^*(x_2))$ is a constant for all $x_2 \in [0,1]$ if $(f_1^*, f_2^*)$ is a Nash equilibrium strategy pair. For the converse, consider again Eq. (A.2). Since

$$D g^{(1)}_{f_2^*}(x_1, f_1^*(x_1)) = \left. \frac{\partial g^{(1)}_{f_2^*}(x_1, b)}{\partial b} \right|_{b = f_1^*(x_1)}$$

equals a constant $c_1$ for all $x_1 \in [0,1]$, we have

$$\int_0^1 \left( \hat{f}(x_1) - f_1^*(x_1) \right) \left. \frac{\partial g^{(1)}_{f_2^*}(x_1, b)}{\partial b} \right|_{b = f_1^*(x_1)} dx_1 = c_1 \int_0^1 \left( \hat{f}(x_1) - f_1^*(x_1) \right) dx_1 = 0 \qquad (A.3)$$

for all $\hat{f} \in S_1$ (i.e., $\int_0^1 \hat{f}(x_1) \, dx_1 = \alpha$). Thus, there is no incentive for user 1 to use strategy $\hat{f}$. Therefore, $(f_1^*, f_2^*)$ is a Nash equilibrium strategy pair.
B. Proof of Theorem 1 (the Uniqueness). Consider any Nash equilibrium strategy pair $(f_1, f_2)$ under the all-pay auction rule. From the previous discussion, we know that the inverse functions, $f_2^{-1}$ and $f_1^{-1}$, are well defined. With user 2's strategy $f_2$ fixed, we have

$$g^{(1)}_{f_2}(x_1, b) = x_1 \cdot f_2^{-1}(b).$$

Similarly, with user 1's strategy $f_1$ fixed, we get

$$g^{(2)}_{f_1}(x_2, b) = x_2 \cdot f_1^{-1}(b).$$

From Lemma 2, we know that $D g^{(1)}_{f_2}(x_1, f_1(x_1))$ and $D g^{(2)}_{f_1}(x_2, f_2(x_2))$ are two constants for all $x_1 \in [0,1]$ and $x_2 \in [0,1]$, since $(f_1, f_2)$ is a Nash equilibrium strategy pair. Now, consider the set of channel state pairs $(x_1, x_2)$ such that $f_1(x_1) = f_2(x_2)$ (i.e., the two users' bids are equal). It forms a separation line in the space spanned by $X_1$ and $X_2$. Mathematically, this line can be defined as $h : [0,1] \to [0,1]$ such that $x_2 = h(x_1) = f_2^{-1}(f_1(x_1))$. By the all-pay auction rule, a slot with channel state $(x_1, x_2')$ will be assigned to user 2 if $(x_1, x_2')$ is above the line $x_2 = h(x_1)$, and to user 1 if $(x_1, x_2')$ is below the separation line. Fig. 2 shows an example of $h(x_1)$. The following lemma shows the uniqueness of $h(x_1)$. We then derive the uniqueness of the strategy pair $(f_1, f_2)$ from the lemma.

Lemma 6. If $D g^{(1)}_{f_2}(x_1, f_1(x_1))$ and $D g^{(2)}_{f_1}(x_2, f_2(x_2))$ are two constants, $c_1$ and $c_2$ respectively, for all $x_1 \in [0,1]$ and $x_2 \in [0,1]$, then $h(x_1) = x_1^{c_1/c_2}$.

Proof. Since $D g^{(1)}_{f_2}(x_1, f_1(x_1)) = c_1$, from $g^{(1)}_{f_2}(x_1, b) = x_1 \cdot f_2^{-1}(b)$, we have

$$\frac{x_1}{f_2'\left(f_2^{-1}(f_1(x_1))\right)} = c_1, \quad \text{i.e.,} \quad f_2'(h(x_1)) = \frac{x_1}{c_1}.$$

Similarly, for user 2, we get

$$\frac{x_2}{f_1'\left(f_1^{-1}(f_2(x_2))\right)} = c_2, \quad \text{i.e.,} \quad f_1'(h^{-1}(x_2)) = \frac{x_2}{c_2}.$$
We also know that $f_1(x_1) = f_2(h(x_1))$ and $f_1'(x_1) = f_2'(h(x_1)) \cdot h'(x_1)$. Thus, we have

$$f_1'(h^{-1}(x_2)) = f_2'(h(h^{-1}(x_2))) \cdot h'(h^{-1}(x_2)) = f_2'(x_2) \cdot h'(x_1) = f_2'(h(x_1)) \cdot h'(x_1). \qquad (B.3)$$
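By combining the equations $f_1'(h^{-1}(x_2)) = \frac{x_2}{c_2}$ and $f_1'(h^{-1}(x_2)) = f_2'(h(x_1)) \cdot h'(x_1)$, we get

$$\frac{x_2}{c_2} = f_2'(h(x_1)) \cdot h'(x_1).$$

Next we substitute Eq. (B.2), which gives $f_2'(h(x_1)) = x_1/c_1$, and $x_2 = h(x_1)$ in the above equation to obtain

$$x_1 \cdot \frac{dh(x_1)}{dx_1} = \frac{c_1}{c_2} h(x_1) \;\Rightarrow\; \frac{dh(x_1)}{h(x_1)} = \frac{c_1}{c_2} \frac{dx_1}{x_1} \;\Rightarrow\; \ln|h(x_1)| = \frac{c_1}{c_2} \ln|x_1| + c_3 \;\Rightarrow\; h(x_1) = e^{c_3} \cdot x_1^{c_1/c_2}.$$

Combined with the fact that $h(1) = 1$, we get $h(x_1) = x_1^{c_1/c_2}$. □

Now, we are in a position to derive the exact form of the Nash equilibrium strategy pair. From the equations $f_1'(h^{-1}(x_2)) = \frac{x_2}{c_2}$ and $x_2 = h(x_1)$, we get

$$f_1'(x_1) = \frac{h(x_1)}{c_2} = \frac{x_1^{c_1/c_2}}{c_2}.$$

Combined with the condition that $f_1(0) = 0$, we have $f_1(x) = \frac{1}{c_1 + c_2} x^{\frac{c_1}{c_2} + 1}$. Following a similar method, we get $f_2(x) = \frac{1}{c_1 + c_2} x^{\frac{c_2}{c_1} + 1}$. Letting $\gamma = \frac{c_1}{c_2}$ and $c = \frac{1}{c_1 + c_2}$, the Nash equilibrium strategy pair for the all-pay auction must have the following form:

$$f_1(x) = c \cdot x^{\gamma + 1}, \qquad f_2(x) = c \cdot x^{\frac{1}{\gamma} + 1}. \qquad (B.4)$$

The constants $\gamma$ and $c$ are chosen such that Equations (3.11) and (3.12) are satisfied. The uniqueness of the above Nash equilibrium strategy comes from the fact that there is a unique pair of $c$ and $\gamma$ that satisfies Equations (3.11) and (3.12).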
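The form in (B.4) can be turned into a small numerical sketch. It assumes uniform channel states and that the two users' money constraints read $\int_0^1 f_1(x)\,dx = \alpha$ and $\int_0^1 f_2(x)\,dx = \beta$ (the normalization suggested by Appendix A), which gives $c/(\gamma+2) = \alpha$ and $c\gamma/(2\gamma+1) = \beta$, so that $\alpha/\beta = (1+2\gamma)/(\gamma^2+2\gamma)$. With the separation line $h(x_1) = x_1^{\gamma}$, the throughputs are $G_1 = \int_0^1 x \cdot x^{\gamma}\,dx = 1/(\gamma+2)$ and $G_2 = \gamma/(2\gamma+1)$. Function names are illustrative.

```python
import math

def solve_gamma(alpha: float, beta: float, iters: int = 200) -> float:
    """Solve alpha / beta = (1 + 2g) / (g**2 + 2g) for g > 0 by bisection on log(g).

    The right-hand side is strictly decreasing in g, so the root is unique.
    """
    target = alpha / beta
    lo, hi = -40.0, 40.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = math.exp(mid)
        if (1.0 + 2.0 * g) / (g * g + 2.0 * g) > target:
            lo = mid   # implied money ratio too large: gamma must increase
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def all_pay_equilibrium(alpha: float, beta: float):
    """Return (c, gamma, G1, G2) for bids f1(x) = c*x**(gamma+1), f2(x) = c*x**(1/gamma+1)."""
    gamma = solve_gamma(alpha, beta)
    c = alpha * (gamma + 2.0)
    g1 = 1.0 / (gamma + 2.0)
    g2 = gamma / (2.0 * gamma + 1.0)
    return c, gamma, g1, g2

if __name__ == "__main__":
    print(all_pay_equilibrium(1.0, 1.0))
    print(all_pay_equilibrium(5.0, 8.0))
```

Under these assumptions the throughput ratio $G_1/G_2$ equals $\alpha/\beta$, and $G_1$ coincides with $F(\alpha/\beta)$ for the function $F$ defined in Appendix C.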
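C. Proof of Theorem 2. Again, we consider the $n = 2$ case for simplicity. For $\alpha_1 + \alpha_2 = 2\alpha$ and $\beta_1 + \beta_2 = 2\beta$, this theorem states that the pair $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ cannot be in equilibrium if $\alpha_1 \ne \alpha_2$ and $\beta_1 \ne \beta_2$. We show this by contradiction. Suppose the pair $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ are in equilibrium for $\alpha_1 \ne \alpha_2$ and $\beta_1 \ne \beta_2$. That is, for given $\beta_1$ and $\beta_2$, $\alpha_1$ and $\alpha_2$ are chosen such that user 1's throughput $G_1(\alpha_1, \beta_1) + G_1(\alpha_2, \beta_2)$ is the maximum. This implies the following:

$$\left. \frac{\partial G_1(\alpha, \beta_1)}{\partial \alpha} \right|_{\alpha = \alpha_1} = \left. \frac{\partial G_1(\alpha, \beta_2)}{\partial \alpha} \right|_{\alpha = \alpha_2}. \qquad (C.1)$$

To see this, if $\left. \frac{\partial G_1(\alpha, \beta_1)}{\partial \alpha} \right|_{\alpha = \alpha_1} > \left. \frac{\partial G_1(\alpha, \beta_2)}{\partial \alpha} \right|_{\alpha = \alpha_2}$, we will have $G_1(\alpha_1 + \delta, \beta_1) + G_1(\alpha_2 - \delta, \beta_2) > G_1(\alpha_1, \beta_1) + G_1(\alpha_2, \beta_2)$ by first order expansion, thus contradicting the statement that $G_1(\alpha_1, \beta_1) + G_1(\alpha_2, \beta_2)$ is the maximum throughput for user 1 for given $\beta_1$ and $\beta_2$.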
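Similarly, for given $\alpha_1$ and $\alpha_2$, if $\beta_1$ and $\beta_2$ maximize $G_2(\alpha_1, \beta_1) + G_2(\alpha_2, \beta_2)$, then

$$\left. \frac{\partial G_2(\alpha_1, \beta)}{\partial \beta} \right|_{\beta = \beta_1} = \left. \frac{\partial G_2(\alpha_2, \beta)}{\partial \beta} \right|_{\beta = \beta_2}. \qquad (C.2)$$

By taking the derivative of Equations (2.14) and (2.15), we get the following:

$$\left. \frac{\partial G_1(\alpha, \beta_1)}{\partial \alpha} \right|_{\alpha = \alpha_1} = \frac{\beta_1 \left( -2\sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} + \alpha_1 - 2\beta_1 \right)}{-2 \left( \alpha_1 + \beta_1 + \sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} \right)^2 \sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2}}, \qquad (C.3)$$

$$\left. \frac{\partial G_2(\alpha_1, \beta)}{\partial \beta} \right|_{\beta = \beta_1} = \frac{\alpha_1 \left( -2\sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} + \beta_1 - 2\alpha_1 \right)}{-2 \left( \alpha_1 + \beta_1 + \sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} \right)^2 \sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2}}. \qquad (C.4)$$

Substituting Eq. (C.3) into Eq. (C.1) and Eq. (C.4) into Eq. (C.2), we then have the following after combining Eq. (C.1) and Eq. (C.2):

$$\frac{\beta_1 \left( -2\sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} + \alpha_1 - 2\beta_1 \right)}{\alpha_1 \left( -2\sqrt{\alpha_1^2 - \alpha_1\beta_1 + \beta_1^2} + \beta_1 - 2\alpha_1 \right)} = \frac{\beta_2 \left( -2\sqrt{\alpha_2^2 - \alpha_2\beta_2 + \beta_2^2} + \alpha_2 - 2\beta_2 \right)}{\alpha_2 \left( -2\sqrt{\alpha_2^2 - \alpha_2\beta_2 + \beta_2^2} + \beta_2 - 2\alpha_2 \right)}. \qquad (C.5)$$

To simplify the above equation, we divide the numerator and the denominator on each side by $\alpha_i$, and let $\gamma_1 = \frac{\beta_1}{\alpha_1}$ and $\gamma_2 = \frac{\beta_2}{\alpha_2}$. We get

$$\frac{\gamma_1 \left( -2\sqrt{1 - \gamma_1 + \gamma_1^2} + 1 - 2\gamma_1 \right)}{-2\sqrt{1 - \gamma_1 + \gamma_1^2} + \gamma_1 - 2} = \frac{\gamma_2 \left( -2\sqrt{1 - \gamma_2 + \gamma_2^2} + 1 - 2\gamma_2 \right)}{-2\sqrt{1 - \gamma_2 + \gamma_2^2} + \gamma_2 - 2}$$

or, after rearranging terms, the following:

We define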
(C.6)
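Then Eq. (C.7) becomes $f(\gamma_1) = f(\gamma_2)$. Now we show that this implies $\gamma_1 = \gamma_2$ by observing that

and it is easy to check that $\frac{\partial f}{\partial \gamma} > 0$ $\forall \gamma > 0$. Now, we have $\frac{\beta_1}{\alpha_1} = \frac{\beta_2}{\alpha_2}$. We further show that $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$. Observe that for fixed $\beta_1$ and $\beta_2$, we can write

$$G_1(\alpha, \beta_i) = F\left(\frac{\alpha}{\beta_i}\right), \quad i = 1, 2, \qquad (C.8)$$

where

$$F(\sigma) = \frac{\sigma}{1 + \sigma + \sqrt{(1 - \sigma)^2 + \sigma}}.$$

Thus, we have

$$\left. \frac{\partial G_1(\alpha, \beta_1)}{\partial \alpha} \right|_{\alpha = \alpha_1} = \frac{1}{\beta_1} \left. \frac{\partial F(\sigma)}{\partial \sigma} \right|_{\sigma = \frac{\alpha_1}{\beta_1}}, \qquad (C.9)$$

$$\left. \frac{\partial G_1(\alpha, \beta_2)}{\partial \alpha} \right|_{\alpha = \alpha_2} = \frac{1}{\beta_2} \left. \frac{\partial F(\sigma)}{\partial \sigma} \right|_{\sigma = \frac{\alpha_2}{\beta_2}}. \qquad (C.10)$$

From Eq. (C.1), we have

$$\frac{1}{\beta_1} \left. \frac{\partial F(\sigma)}{\partial \sigma} \right|_{\sigma = \frac{\alpha_1}{\beta_1}} = \frac{1}{\beta_2} \left. \frac{\partial F(\sigma)}{\partial \sigma} \right|_{\sigma = \frac{\alpha_2}{\beta_2}}. \qquad (C.11)$$

It is easy to verify that $\frac{\partial F(\sigma)}{\partial \sigma} \ne 0$ $\forall \sigma > 0$. Therefore, since $\frac{\beta_1}{\alpha_1} = \frac{\beta_2}{\alpha_2}$, the above equation implies that $\beta_1 = \beta_2$, which contradicts our original assumption of $\beta_1 \ne \beta_2$. Therefore, the pair $(\alpha_1, \beta_1)$ and $(\alpha_2, \beta_2)$ cannot be in equilibrium if $\alpha_1 \ne \alpha_2$ and $\beta_1 \ne \beta_2$.

D. Proof of Theorem 4. With all users $i \ne 1$ using a fixed power allocation strategy $g$, we now explore the optimal power allocation strategy for user 1, which is denoted by $g_1$. Let $u_1^{(g)} : (x_1, b) \to \mathbb{R}$ denote user 1's expected throughput during a slot conditioning on the following events:
• User 1's channel state is $X_1 = x_1$.
• User 1's allocated power is $b$.
As before, we drop the term $g$ in the expression $u_1^{(g)}(x_1, b)$, and simply write it as $u_1(x_1, b)$. Specifically, we can then write:

$$u_1(x_1, b) = P\left( (1 + \Delta) \max(f_2(X_2), \ldots, f_n(X_n)) \le x_1 \cdot b \right) = P\left( (1 + \Delta) Y \le x_1 \cdot b \right)$$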
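where $Y = \max(f_2(X_2), \ldots, f_n(X_n))$. Since all users $i \ne 1$ use the same strategy $g$, we have $Y = \max(f(X_2), \ldots, f(X_n))$, where $f(X_i) = X_i \cdot g(X_i)$ for all $i \ne 1$. Moreover, since $f$ is strictly increasing, we can write:

$$Y = \max(f(X_2), \ldots, f(X_n)) = f(\max(X_2, \ldots, X_n)).$$

Denoting $Z = \max(X_2, \ldots, X_n)$, we have the following:

$$u_1(x_1, b) = P\left( f(Z) \le \frac{x_1 \cdot b}{1 + \Delta} \right) = P\left( Z \le f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right) \right) = \int_0^{f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right)} p_Z(z) \, dz, \qquad (D.1)$$

where $p_Z(\cdot)$ denotes the probability density function of the random variable $Z$. The optimization problem that user 1 faces can be written as the following:

$$\max_{g_1} \int_0^1 u_1(x_1, g_1(x_1)) \, dx_1 \quad \text{subject to} \quad \int_0^1 g_1(x_1) \, dx_1 = e. \qquad (D.2)$$

Writing the Lagrangian function, we have

$$\int_0^1 u_1(x_1, g_1(x_1)) \, dx_1 - \lambda \left( \int_0^1 g_1(x_1) \, dx_1 - e \right) = \int_0^1 \left[ u_1(x_1, g_1(x_1)) - \lambda g_1(x_1) \right] dx_1 + \lambda e. \qquad (D.3)$$

Therefore, for each fixed $x_1$, we want to choose a $g_1(x_1)$ to maximize the term $u_1(x_1, g_1(x_1)) - \lambda g_1(x_1)$. For convenience, let $b = g_1(x_1)$. Then, we have

$$L(b) = u_1(x_1, b) - \lambda b. \qquad (D.4)$$

Maximizing $L(b)$ with respect to $b$ yields the first order condition:

$$p_Z\left( f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right) \right) \cdot \frac{x_1}{1 + \Delta} \cdot \frac{1}{f'\left( f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right) \right)} - \lambda = 0. \qquad (D.5)$$

Since $Z = \max(X_2, \ldots, X_n)$ and the $X_i$'s are i.i.d., we have

$$p_Z(z) = (n - 1) z^{n-2}.$$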
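Now, consider $b = g_1(x_1) = c x_1^m$. Since we are seeking a symmetric Nash equilibrium power allocation strategy, user $i \ne 1$ will adopt the same strategy as user 1. Thus, we have $f(x) = x \cdot g(x) = x \cdot c x^m = c x^{m+1}$. The second term in Eq. (D.5) can be written as the following:

$$f'\left( f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right) \right) = c(m + 1) \left( \frac{1}{1 + \Delta} \right)^{\frac{m}{m+1}} x_1^m. \qquad (D.6)$$

Similarly,

$$p_Z\left( f^{-1}\left( \frac{x_1 \cdot b}{1 + \Delta} \right) \right) = p_Z\left( \left( \frac{1}{1 + \Delta} \right)^{\frac{1}{m+1}} x_1 \right) = (n - 1) \left( \frac{1}{1 + \Delta} \right)^{\frac{n-2}{m+1}} x_1^{n-2}. \qquad (D.7)$$

Eq. (D.5) can be rewritten in the following form:

$$\frac{(n - 1) \left( \frac{1}{1 + \Delta} \right)^{\frac{n-2}{m+1}} x_1^{n-2} \cdot \frac{x_1}{1 + \Delta}}{c(m + 1) \left( \frac{1}{1 + \Delta} \right)^{\frac{m}{m+1}} x_1^m} - \lambda = 0. \qquad (D.8)$$

Since the above equality has to hold for all $x_1 \in [0, 1]$, the following must be true:

$$n - 1 - m = 0.$$

Thus, we have $m = n - 1$ and $g_i(x) = c x^{n-1}$ for all $i = 1, \ldots, n$.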
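The conclusion $m = n - 1$ can be illustrated numerically. The sketch below assumes uniform i.i.d. channel states, so that $p_Z(z) = (n-1) z^{n-2}$, and evaluates the marginal gain $p_Z(z) \cdot \frac{x_1}{1+\Delta} \cdot \frac{1}{f'(z)}$ with $f(x) = c x^{m+1}$ and $z = x_1 (1+\Delta)^{-1/(m+1)}$. The first order condition requires this quantity to equal the same multiplier $\lambda$ for every $x_1$, which happens only when $m = n - 1$. The function name is illustrative.

```python
def marginal_gain(x1: float, n: int, m: float, c: float = 1.0, delta: float = 0.0) -> float:
    """Left-hand side of the first order condition (before subtracting lambda).

    Assumes uniform i.i.d. channel states and the trial strategy g(x) = c * x**m,
    so that f(x) = c * x**(m + 1).
    """
    z = x1 * (1.0 + delta) ** (-1.0 / (m + 1))   # f^{-1}(x1 * b / (1 + delta)) with b = c * x1**m
    p_z = (n - 1) * z ** (n - 2)                 # density of Z = max(X2, ..., Xn)
    f_prime = c * (m + 1) * z ** m               # f'(z)
    return p_z * (x1 / (1.0 + delta)) / f_prime

if __name__ == "__main__":
    n = 4
    for m in (1, 2, 3, 4):
        values = [marginal_gain(x, n, m, delta=0.2) for x in (0.2, 0.5, 0.9)]
        print(m, values)   # only m = n - 1 = 3 gives the same value at every x1
```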
REFERENCES
[1] A. FU, E. MODIANO, AND J. TSITSIKLIS, "Optimal energy allocation for delay-constrained data transmission over a time-varying channel," IEEE INFOCOM 2003, San Francisco, CA, April 2003.
[2] P. MARBACH AND R. BERRY, "Downlink resource allocation and pricing for wireless networks," IEEE INFOCOM 2002, New York, NY, June 2002.
[3] P. MARBACH, "Priority service and max-min fairness," IEEE INFOCOM 2002, New York, NY, June 2002.
[4] P. VISWANATH, D. TSE, AND R. LAROIA, "Opportunistic beamforming using dumb antennas," IEEE Tran. on Information Theory, 48(6): 1277-1294, June 2002.
[5] L. TASSIULAS AND S. SARKAR, "Maxmin fair scheduling in wireless networks," IEEE INFOCOM 2002, New York, NY, June 2002.
[6] F.P. KELLY, A.K. MAULLOO, AND D.K.H. TAN, "Rate control for communication networks: Shadow prices, proportional fairness and stability," Journal of Operation Research Society, 49: 237-252, 1998.
[7] A. EL GAMAL, E. UYSAL, AND B. PRABHAKAR, "Energy-efficient transmission over a wireless link via lazy packet scheduling," IEEE INFOCOM 2001, Anchorage, April 2001.
[8] B. COLLINS AND R. CRUZ, "Transmission policies for time varying channels with average delay constraints," Proceeding, 1999 Allerton Conf. on Commun., Control, and Comp., Monticello, IL, 1999.
[9] X. LIU, E.K.P. CHONG, AND N.B. SHROFF, "Opportunistic transmission scheduling with resource-sharing constraints in wireless networks," IEEE Journal of Selected Areas in Communications, 19(10): 2053-2064, October 2001.
[10] A. MACKENZIE AND S. WICKER, "Stability of multipacket slotted Aloha with selfish users and perfect information," IEEE INFOCOM 2003, San Francisco, CA, Mar. 2003.
[11] X. QIN AND R. BERRY, "Exploiting Multiuser Diversity for Medium Access Control in Wireless Networks," IEEE INFOCOM 2003, San Francisco, CA, Mar. 2003.
[12] S. GHEZ, S. VERDU, AND S. SCHWARTZ, "Stability properties of slotted Aloha with multipacket reception capability," IEEE Tran. on Automatic Control, 33(7): 640-649, July 1988.
[13] N. ABRAMSON, "The throughput of packet broadcasting channels," IEEE Tran. on Communications, Vol. COM-25, pp. 117-128, 1977.
[14] P. VISWANATH, D. TSE, AND R. LAROIA, "Opportunistic beamforming using dumb antennas," IEEE Tran. on Information Theory, 48(6): 1277-1294, June 2002.
[15] P. GUPTA AND P.R. KUMAR, "The capacity of wireless networks," IEEE Tran. on Information Theory, 46(2): 388-404, Mar. 2000.
[16] W. LUO AND A. EPHREMIDES, "Power levels and packet lengths in random multiple access," IEEE Tran. on Information Theory, 48(1): 46-58, Jan. 2002.
[17] E. ALTMAN, V. BORKAR, AND A.A. KHERANI, "Optimal random access in networks with two-way traffic," The 15th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2004), Barcelona, Spain, 5-8, Sept. 2004.
[18] D. BERTSEKAS AND R. GALLAGER, Data Networks, Prentice Hall, 1991.
[19] R. JOHARI AND J. TSITSIKLIS, "Network resource allocation and a congestion game," submitted May 2003.
[20] P. KLEMPERER, "Auction theory: A guide to the literature," Journal of Economics Surveys, 13(3): 227-286, July 1999.
[21] Y-K. CHE AND I. GALE, "Standard auctions with financially constrained bidders," Review of Economic Studies, 65: 1-21, Jan. 1998.
[22] T.R. PALFREY, "Multiple-object, discriminatory auctions with bidding constraints: A game-theoretic analysis," Management Science, 26: 935-946, Sep. 1980.
[23] D. FAMOLARI, N. MANDAYAM, AND D. GOODMAN, "A new framework for power control in wireless data networks: games, utility, and pricing," Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 1998.
[24] T. BASAR AND R. SRIKANT, "Revenue-maximizing pricing and capacity expansion in a many-users regime," IEEE INFOCOM 2002, 1: 23-27, June 2002.
MODELLING AND STABILITY OF FAST TCP*
JIANTAO WANG†, DAVID X. WEI†, JOON-YOUNG CHOI‡, AND STEVEN H. LOW†
Abstract. We discuss the modelling of FAST TCP and prove four stability results. Using the traditional continuous-time flow model, we prove, for general networks, that FAST TCP is globally asymptotically stable when there is no feedback delay and that it is locally asymptotically stable in the presence of feedback delay provided a local stability condition is satisfied. We present an experiment on an emulated network in which the local stability condition is violated. While the theory predicts instability, the experiment shows otherwise. We believe this is because the continuous-time model ignores the stabilizing effect of self-clocking. Using a discrete-time model that captures this effect, we show that FAST TCP is locally asymptotically stable for general networks if all flows have the same feedback delay, no matter how large the delay is. We also prove global asymptotic stability for a single bottleneck link in the absence of feedback delay. The techniques developed here are new and applicable to other protocols. Key words. FAST TCP, Modelling, Stability. AMS(MOS) subject classifications. 68M10.
1. Introduction. Congestion control is a distributed feedback algorithm to allocate network resources among competing users. The congestion control algorithm in the current Internet, TCP Reno, has prevented severe congestion while the Internet underwent explosive growth during the last decade. It is well known, however, that TCP Reno's performance degrades steadily as networks continue to scale up in capacity and size [5, 12]. This has motivated several recent proposals for congestion control of high-speed networks, including HSTCP [4], Scalable TCP [10], FAST TCP [7, 8], and BIC TCP [19] (see [7, 8] for extensive references). The details of the architecture, algorithms, and experimental evaluations of FAST TCP can be found in [7, 8]. A new discrete-time model of congestion control is also introduced in [7, 8], and a sufficient condition for the local asymptotic stability of FAST TCP is proved using the new model for the case of a single link in the absence of feedback delay. In this paper, we extend the analysis and prove four stability results.
Most of the stability analysis in the literature is based on the fluid model introduced in [5] (see surveys in [11, 9, 15] for extensions and related models). Two key features of these models are that a source controls its sending rate directly¹ and that the queueing delay at a link is proportional to the integral of the excess demand for its bandwidth. In reality, a source dynamically sets its congestion window rather than its sending rate. These models do not adequately capture the self-clocking effect, where a packet is sent only when an old one is acknowledged, except briefly and immediately after the congestion window is changed. Self-clocking automatically constrains the input rate at a link to its link capacity, after a brief transient, no matter how large the congestion windows are set. The new discrete-time link model proposed in [7, 8] captures this effect.
While the traditional continuous-time link model does not consider self-clocking, the new discrete-time link model ignores the fast dynamics at the links. We present both models of FAST TCP in Section 2. Experimental results are provided to show that, despite errors in these models, both of them seem to track the queue process reasonably well. Then we prove two stability results in each of these models.
In Section 3, we prove that FAST TCP is globally asymptotically stable in general networks when there is no feedback delay, using the continuous-time model. We also derive a sufficient condition for local asymptotic stability in general networks with feedback delay, using the techniques developed in [13, 16]. This local stability condition becomes necessary when the network consists of a single link and the sources are homogeneous. We then present an experiment on an emulated network (Dummynet) in which the local stability condition is violated. While the theory, and the numerical simulation of the continuous-time model, predict instability, the experiment suggests that FAST TCP is stable. We believe that this discrepancy is due to the self-clocking effect, which helps stability but is ignored in the continuous-time model.
In Section 4, we analyze the stability of FAST TCP using the discrete-time model. First, we prove that a general network of FAST TCP sources is locally asymptotically stable if all sources have the same delay, no matter how large the delay is. Then we restrict ourselves to a single link without feedback delay and prove the global asymptotic stability of FAST TCP. The analysis technique developed for the discrete-time model is new and applicable to analyzing other protocols. Finally, we conclude in Section 5 with limitations of this work.
* Partial and preliminary results have appeared in [17].
† California Institute of Technology, Pasadena, CA 91125 ({jiantao, weixl, slow}@caltech.edu).
‡ Pusan National University, Korea (jyc@pusan.ac.kr).
¹ Even when the congestion window size is used as the control variable, sending rate is often taken to be the window normalized by a constant round-trip time, and hence a source still controls its rate directly.
2. Model.
2.1. Notation. A network consists of a set of $L$ links indexed by $l$ with finite capacities $c_l$. It is shared by a set of $N$ flows identified by their sources indexed by $i$. Let $R$ be the routing matrix, where $R_{li} = 1$ if source $i$ uses link $l$, and $0$ otherwise. We use $t$ for time in the continuous-time model, and for time step in the discrete-time model. The meaning of $t$ should be clear from the context. FAST TCP updates its congestion window every fixed time period, which is used as the time unit.
Let $d_i$ denote the round-trip propagation delay of source $i$, and $q_i(t)$ denote the round-trip queueing delay. The round-trip time is given by $T_i(t) := d_i + q_i(t)$. We denote the forward feedback delay from source $i$ to link $l$ by $\tau^f_{li}$ and the backward feedback delay from link $l$ to source $i$ by $\tau^b_{li}$. The sum of the forward delay from source $i$ to any link $l$ and the backward delay from link $l$ to source $i$ is fixed, i.e., $\tau_i := \tau^f_{li} + \tau^b_{li}$ for any link $l$ on the path of source $i$. We make a subtle assumption here. In reality, the feedback delays $\tau^f_{li}$, $\tau^b_{li}$ include queueing delay and are time-varying. We assume for simplicity that they are constant, and mathematically unrelated to $T_i(t)$. Later, when we analyze linear stability around the network equilibrium in the presence of feedback delay, we will interpret $\tau_i$ as the equilibrium value of $T_i(t)$.
Let $w_i(t)$ be source $i$'s congestion window at time $t$ (discrete or continuous-time). The sending rate of source $i$ at time $t$ is defined as
$$x_i(t) := \frac{w_i(t)}{T_i(t)}, \tag{2.1}$$
where $T_i(t) = d_i + q_i(t)$. The aggregate rate at link $l$ is
$$y_l(t) := \sum_i R_{li}\, x_i(t - \tau^f_{li}). \tag{2.2}$$
Let $p_l(t)$ be the queueing delay at link $l$. The end-to-end queueing delay $q_i(t)$ observed by source $i$ is
$$q_i(t) = \sum_l R_{li}\, p_l(t - \tau^b_{li}). \tag{2.3}$$
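As a small illustration of the notation, the following sketch evaluates (2.1)-(2.3) for a toy two-link, three-source network with all feedback delays set to zero. The function name `rates_and_delays`, the routing matrix, and all numerical values are hypothetical and chosen only for this example.

```python
def rates_and_delays(R, w, d, p):
    """Evaluate (2.1)-(2.3) with all feedback delays set to zero.

    R[l][i] is 1 if source i uses link l, else 0; w and d hold the per-source
    windows and propagation delays; p holds the per-link queueing delays.
    """
    num_links, num_sources = len(R), len(w)
    # (2.3): end-to-end queueing delay observed by each source.
    q = [sum(R[l][i] * p[l] for l in range(num_links)) for i in range(num_sources)]
    # (2.1): sending rate = window / round-trip time.
    x = [w[i] / (d[i] + q[i]) for i in range(num_sources)]
    # (2.2): aggregate rate at each link.
    y = [sum(R[l][i] * x[i] for i in range(num_sources)) for l in range(num_links)]
    return x, y, q


# Hypothetical network: source 0 uses both links, sources 1 and 2 use one each.
R = [[1, 1, 0],
     [1, 0, 1]]
x, y, q = rates_and_delays(R, w=[60.0, 30.0, 40.0], d=[0.1, 0.1, 0.1], p=[0.05, 0.1])
```

With these numbers the source that crosses both links sees the sum of the two link delays, which is the only coupling between links in (2.3).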
A model of FAST TCP amounts to specifying how $w_i(t)$ and $p_l(t)$ evolve.
2.2. Discrete and continuous-time models. A FAST TCP source periodically updates its congestion window based on the average RTT and estimated queueing delay. The pseudo-code is
$$w \leftarrow (1-\gamma)\, w + \gamma \left( \frac{\mathrm{baseRTT}}{\mathrm{RTT}}\, w + \alpha \right)$$
where $\gamma \in (0,1]$, baseRTT is the minimum RTT observed, and $\alpha$ is a constant. We model this by the following discrete-time equation
$$w_i(t+1) = \gamma \left( \frac{d_i\, w_i(t)}{d_i + q_i(t)} + \alpha_i \right) + (1-\gamma)\, w_i(t), \tag{2.4}$$
where $w_i(t)$ is the congestion window of source $i$, $\gamma \in (0,1]$, and $\alpha_i$ is a constant that depends on source $i$. The corresponding continuous-time model is
$$\dot w_i(t) = \gamma \left( \frac{d_i\, w_i(t)}{d_i + q_i(t)} + \alpha_i - w_i(t) \right), \tag{2.5}$$
where time is measured in units of the FAST TCP update period. For the continuous-time model, queueing delay has been traditionally modelled by (e.g., [11])
$$\dot p_l(t) = \frac{1}{c_l} \bigl( y_l(t) - c_l \bigr). \tag{2.6}$$
In reality, TCP uses self-clocking to match the number of packets in flight to the congestion window size $w_i(t)$. When the congestion window is fixed, the source sends a new packet exactly after it receives an ACK packet. When the congestion window is increased, the source may send out more than one packet on the receipt of an ACK packet, so that the packets in flight catch up with the new window size. When the congestion window is decreased, the source sends no packet for a short while, so that the packets in flight drop. Therefore, one round-trip time after a congestion window is changed, packet transmission will be clocked at the same rate as the throughput the flow receives. We assume that the disturbance in the queues due to congestion window changes settles down quickly compared with the update period of the discrete-time model. A consequence of this assumption is that the link queueing delay vector, $p(t) = (p_l(t), \text{ for all } l)$, is determined implicitly by the sources' congestion windows in a static manner:
$$\sum_i R_{li}\, \frac{w_i(t - \tau^f_{li})}{d_i + q_i(t - \tau^f_{li})} \;\begin{cases} = c_l & \text{if } p_l(t) > 0 \\ \le c_l & \text{if } p_l(t) = 0 \end{cases} \tag{2.7}$$
where $q_i$ is the end-to-end queueing delay given by (2.3). In summary, the continuous-time model is specified by (2.5) and (2.6), and the discrete-time model is specified by (2.4) and (2.7), where the source rates and aggregate rates at links are given by (2.1) and (2.2), and the end-to-end delays are given by (2.3). While the continuous-time model does not take self-clocking into full account, the discrete-time model ignores the fast dynamics at the links. Before comparing these models, we clarify their common equilibrium structure by the following theorem cited from [7, 8].
THEOREM 2.1. Suppose that the routing matrix $R$ has full row rank. A unique equilibrium $(x^*, p^*)$ of the network exists. Moreover, $x^*$ is the unique maximizer of
$$\max_{x \ge 0} \; \sum_i \alpha_i \log x_i \quad \text{subject to} \quad Rx \le c \tag{2.8}$$
and $p^*$ is the unique minimizer of the Lagrangian dual problem. This implies in particular that the equilibrium rate $x^*$ is $\alpha_i$-weighted proportionally fair.
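To see why an equilibrium of (2.4) or (2.5) is consistent with (2.8), the following sketch writes out the stationarity condition of the Lagrangian next to the fixed-point condition of the window update. It is a standard Lagrangian argument and is not meant to reproduce the proof in [7, 8]; identifying the multipliers with the link queueing delays $p_l$ is the interpretation suggested by the theorem statement.

```latex
\begin{aligned}
\mathcal{L}(x,p) &= \sum_i \alpha_i \log x_i \;-\; \sum_l p_l \Bigl( \sum_i R_{li}\, x_i - c_l \Bigr), \\
\frac{\partial \mathcal{L}}{\partial x_i} = 0
  \;&\Longrightarrow\; \frac{\alpha_i}{x_i} = \sum_l R_{li}\, p_l = q_i
  \;\Longrightarrow\; x_i = \frac{\alpha_i}{q_i}, \\
\text{fixed point of (2.4):}\quad
w_i &= \gamma \Bigl( \frac{d_i\, w_i}{d_i + q_i} + \alpha_i \Bigr) + (1-\gamma)\, w_i
  \;\Longrightarrow\; \frac{w_i\, q_i}{d_i + q_i} = \alpha_i
  \;\Longrightarrow\; x_i\, q_i = \alpha_i .
\end{aligned}
```

Both lines give $x_i = \alpha_i / q_i$, and the complementary slackness condition $p_l \bigl( \sum_i R_{li} x_i - c_l \bigr) = 0$ has the same form as the static link model (2.7) at equilibrium.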
2.3. Validation. The continuous-time link model implies that the queue takes an infinite amount of time to converge after a window change. At the other extreme, the discrete-time link model assumes that the queue settles down in one sampling time. Neither is perfect, but we now present experimental results that suggest both track the queue dynamics well.
All the experiments reported in this paper are carried out on a Dummynet testbed [14]. A FreeBSD machine is configured as a Dummynet router that provides different propagation delays for different sources. It can be configured with different capacities and buffer sizes. In our experiments, the bottleneck link capacity is 800 Mbps, and the buffer size is 4000 packets with a fixed packet length of 1500 bytes. A Dummynet monitor records the queue size every 0.4 second. The congestion window size and RTT are recorded at the host every 50 ms. TCP traffic is generated using iperf. The publicly released code of FAST is used in all experiments involving FAST.
We present two experiments to validate the model, one closed-loop and one open-loop. In the first (closed-loop) experiment, there are 3 FAST TCP sources sharing a Dummynet router with a common propagation delay of 100 ms. The measured and predicted queue sizes are given in Figure 1. At the beginning of the experiment (before 4 seconds), the FAST sources are in the slow-start phase, and neither model gives an accurate prediction. After the sources enter the congestion avoidance phase, both models track the queue size well.
[Figure 1: queue size versus experiment time (seconds), with curves for the real queue, the discrete-time model, and the continuous-time model.]
FIG. 1. Model validation-closed loop.
To eliminate the modelling error in the congestion window adjustment algorithm itself while validating the link models, we decouple the TCP and queue dynamics by using open-loop window control. The second experiment involves three sources with propagation delays 50 ms, 100 ms, and 150 ms sharing the same Dummynet router. We changed the Linux 2.4.19 kernel so that the sources vary their window sizes according to the schedules shown in Figure 2(a). The sequences of congestion window sizes are then used in (2.1)-(2.2) and (2.6) to compute the queueing delay predicted by the continuous-time model. We also use them in (2.1)-(2.2) and (2.7) to compute the predictions of the discrete-time model. The queueing delay measured from the Dummynet and those predicted by these two models are shown in Figure 2(b), which indicates that both models track the queue size well. We next analyze the stability properties of these two models.
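The open-loop comparison can be mimicked numerically. The sketch below, for a single link with fixed windows and no feedback delay, computes the static queueing delay of the discrete-time link model (2.7) by bisection and integrates the continuous-time link model (2.6) with forward Euler. The function names, the step size, and the window values are assumptions made for illustration; only the propagation delays and the link capacity (800 Mbps with 1500-byte packets, about 66667 packets per second) are taken from the experiment described above.

```python
def static_queueing_delay(w, d, c, tol=1e-12):
    """Discrete-time link model (2.7), one link: solve sum_i w_i/(d_i+p) = c."""
    def demand(p):
        return sum(wi / (di + p) for wi, di in zip(w, d))
    if demand(0.0) <= c:
        return 0.0                # link not saturated, so no queue builds up
    lo, hi = 0.0, 1.0
    while demand(hi) > c:         # grow the bracket until demand drops below c
        hi *= 2.0
    while hi - lo > tol:          # bisection: demand is decreasing in p
        mid = 0.5 * (lo + hi)
        if demand(mid) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def integrate_queueing_delay(w, d, c, p0, horizon, dt=1e-4):
    """Continuous-time link model (2.6), one link, forward Euler with p >= 0."""
    p = p0
    for _ in range(int(horizon / dt)):
        y = sum(wi / (di + p) for wi, di in zip(w, d))
        p = max(0.0, p + dt * (y - c) / c)
    return p


# Hypothetical fixed windows (packets); delays in seconds; capacity in packets/s.
w = [3000.0, 4000.0, 5000.0]
d = [0.05, 0.10, 0.15]
c = 800e6 / (1500 * 8)
p_static = static_queueing_delay(w, d, c)
p_short = integrate_queueing_delay(w, d, c, p0=0.0, horizon=0.05)
p_long = integrate_queueing_delay(w, d, c, p0=0.0, horizon=5.0)
```

Shortly after the window change the integrated delay is still well below the static value, while after a few seconds the two agree, which is the sense in which the two link models bracket the real queue.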
3. Stability analysis with the continuous-time model. We present the stability analysis of the continuous-time model in general networks with and without feedback delays.
3.1. Global stability without feedback delay. In this subsection, we show that FAST TCP is globally asymptotically stable for general networks by constructing a Lyapunov function. When there is no feedback delay, equations (2.2) and (2.3) simplify to
$$y_l(t) = \sum_i R_{li}\, x_i(t) \quad \text{and} \quad q_i(t) = \sum_l R_{li}\, p_l(t). \tag{3.1}$$
Suppose that $R$ has full row rank, so that the system has unique equilibrium source rates and link prices. Let $w_i, p_l, q_i, \ldots$ be the equilibrium quantities, and denote $\delta w_i(t) := w_i(t) - w_i$, $\delta p_l(t) := p_l(t) - p_l$, $\delta q_i(t) := q_i(t) - q_i$, $\ldots$. From (2.5) the equilibrium window is given by
$$w_i = \frac{\alpha_i T_i}{q_i}, \tag{3.2}$$
where $T_i = d_i + q_i$ is the equilibrium round-trip delay. We can then rewrite (2.5) as
$$\dot w_i(t) = -\gamma \left( \frac{q_i(t)}{T_i(t)}\, w_i(t) - \alpha_i \right).$$
Therefore, we have
$$\delta \dot w_i(t) = -\gamma \left( \frac{q_i(t)}{T_i(t)}\, \delta w_i(t) + \frac{\alpha_i d_i}{q_i\, T_i(t)}\, \delta q_i(t) \right). \tag{3.3}$$
[Figure 2: (a) Scheduled congestion window versus experiment time (sec). (b) Resulting queue size versus experiment time (sec), with curves for the real queue, the continuous-time model, and the discrete-time model.]
FIG. 2. Model validation-open loop.
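Based on (2.1) and (3.2) we have
$$\delta x_i(t) = \frac{w_i + \delta w_i(t)}{T_i(t)} - \frac{w_i}{T_i} = \frac{\delta w_i(t)}{T_i(t)} - \left( \frac{1}{T_i} - \frac{1}{T_i(t)} \right) w_i = \frac{\delta w_i(t)}{T_i(t)} - \frac{\delta q_i(t)}{T_i(t)\, T_i} \cdot \frac{\alpha_i T_i}{q_i}.$$
Therefore, we have
$$\delta x_i(t) = \frac{\delta w_i(t)}{T_i(t)} - \frac{\alpha_i}{q_i\, T_i(t)}\, \delta q_i(t). \tag{3.4}$$
Based on (3.4) and (3.1), the derivative of link price is (from (2.6))
$$\delta \dot p_l(t) = \frac{1}{c_l} \sum_i R_{li}\, \delta x_i(t). \tag{3.5}$$
From (3.4) and (3.5), we have
$$\delta \dot p_l(t) = \frac{1}{c_l} \sum_i R_{li} \left( \frac{\delta w_i(t)}{T_i(t)} - \frac{\alpha_i}{q_i\, T_i(t)}\, \delta q_i(t) \right). \tag{3.6}$$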
With these preliminary results, we prove the following theorem.
THEOREM 3.1. The continuous-time model of FAST TCP is globally asymptotically stable when there is no feedback delay and $R$ has full row rank.
Proof: Consider the function
$$V(w(t), p(t)) := \frac{1}{2\gamma} \sum_i \frac{q_i}{\alpha_i d_i} \bigl( w_i(t) - w_i \bigr)^2 + \frac{1}{2} \sum_l c_l \bigl( p_l(t) - p_l \bigr)^2, \tag{3.7}$$
where $(w, p)$ is the unique equilibrium point, which exists according to Theorem 2.1. Clearly, $V$ is non-negative for all $(w(t), p(t))$ and zero if and only if $w(t) = w$ and $p(t) = p$. Taking the time derivative of $V(w(t), p(t))$ along the solution trajectory of (3.6) and (3.3) yields
$$\dot V = \sum_i \frac{q_i}{\gamma \alpha_i d_i}\, \delta w_i(t)\, \delta \dot w_i(t) + \sum_l c_l\, \delta p_l(t)\, \delta \dot p_l(t) = - \sum_i \left( \frac{q_i\, q_i(t)}{\alpha_i d_i\, T_i(t)}\, \delta w_i^2(t) + \frac{\alpha_i}{q_i\, T_i(t)}\, \delta q_i^2(t) \right),$$
where we have used $\delta q_i(t) = \sum_l R_{li}\, \delta p_l(t)$, so that the cross terms in $\delta w_i(t)\, \delta q_i(t)$ cancel. Hence $V > 0$ and $\dot V < 0$ at every point other than the equilibrium $(w, p)$, and $V = \dot V = 0$ at the equilibrium. Moreover, $V \to \infty$ as $\|(w(t), p(t))\| \to \infty$. This implies that the system specified by (3.6) and (3.3) is globally asymptotically stable. □
Note that the windows $w(t)$ and the end-to-end queueing delays $q(t)$ converge globally to their equilibrium values regardless of whether $R$ has full row rank. The link queueing delays $p(t)$ may not, unless $R$ has full row rank, in which case $p(t) = (R R^T)^{-1} R\, q(t)$ is uniquely defined and must also converge globally.
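The Lyapunov argument can be checked numerically on a small case. The sketch below integrates (2.5) and (2.6) for a single link without feedback delay using forward Euler and records the value of $V$ from (3.7) at every step. The function name, the normalized parameter values, and the step size are assumptions made for this illustration; forward Euler only approximates the continuous-time trajectory, so the monotone decrease it shows is numerical evidence, not a proof.

```python
def simulate_lyapunov(alpha, d, c, gamma, w0, p0, dt=0.01, steps=20000):
    """Euler-integrate (2.5)-(2.6) on one link and return V from (3.7) per step."""
    # Single-link equilibrium: x_i = alpha_i / p and sum_i x_i = c.
    p_eq = sum(alpha) / c
    w_eq = [a * (di + p_eq) / p_eq for a, di in zip(alpha, d)]

    def lyapunov(w, p):
        window_term = sum(p_eq / (a * di) * (wi - we) ** 2
                          for a, di, wi, we in zip(alpha, d, w, w_eq))
        return window_term / (2 * gamma) + 0.5 * c * (p - p_eq) ** 2

    w, p = list(w0), p0
    values = [lyapunov(w, p)]
    for _ in range(steps):
        x = [wi / (di + p) for wi, di in zip(w, d)]               # rates (2.1)
        dw = [gamma * (a - xi * p) for a, xi in zip(alpha, x)]    # window law (2.5)
        dp = (sum(x) - c) / c                                     # link law (2.6)
        w = [wi + dt * dwi for wi, dwi in zip(w, dw)]
        p = max(0.0, p + dt * dp)                                 # delay stays non-negative
        values.append(lyapunov(w, p))
    return values


# Hypothetical values in normalized units (time unit = one update period).
values = simulate_lyapunov(alpha=[5.0, 10.0], d=[1.0, 2.0], c=100.0, gamma=0.5,
                           w0=[60.0, 100.0], p0=0.3)
```

In this run the recorded values never increase from one step to the next, and the final value is a small fraction of the initial one.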
3.2. Local stability with feedback delay. When there are feedback delays, the global stability of FAST TCP in general networks is still open. In this subsection, we provide a sufficient condition for local asymptotic stability. We make two assumptions in this subsection. First, $R$ has full row rank and hence there is a unique equilibrium point $(w, p)$. Second, the round-trip feedback delays $\tau_i = \tau^f_{li} + \tau^b_{li}$ in (2.2) and (2.3) equal the equilibrium values of the round-trip times $T_i := d_i + \sum_l R_{li}\, p_l$. To linearize the model (2.5) and (2.6) around the unique equilibrium, define the routing matrices with feedback delay in the frequency domain as
$$[R_f(s)]_{li} := \begin{cases} e^{-\tau^f_{li} s} & \text{if } R_{li} = 1 \\ 0 & \text{if } R_{li} = 0 \end{cases} \qquad [R_b(s)]_{li} := \begin{cases} e^{-\tau^b_{li} s} & \text{if } R_{li} = 1 \\ 0 & \text{if } R_{li} = 0. \end{cases}$$
Let $w_i$, $p_l$, $x_i$, $q_i$, and $T_i$ be the corresponding equilibrium values. The following lemma provides the open-loop transfer function.
LEMMA 3.1. The open-loop transfer function of the linearized FAST TCP system is
$$L(s) = D\, R_f(s)\, \Lambda(s)\, X\, R_f^T(-s), \tag{3.8}$$
where
$$D := \mathrm{diag}\!\left( \frac{1}{c_l} \right), \quad X := \mathrm{diag}(x_i), \quad \Lambda(s) := \mathrm{diag}\bigl( \Lambda_i(s) \bigr), \quad \Lambda_i(s) := \frac{e^{-T_i s}}{T_i s} \cdot \frac{T_i s + \gamma T_i}{T_i s + \gamma q_i}.$$
Proof. See Appendix A. □
The following theorem provides a sufficient condition for local stability.
THEOREM 3.2. The FAST TCP system described by (2.5) and (2.6) is locally asymptotically stable if
$$\frac{M}{\phi} \sqrt{ \frac{\phi^2 + \gamma^2 T_{\max}^2}{\phi^2 + \gamma^2 q_{\min}^2} } < 1, \tag{3.9}$$
where $M := \max_i \sum_l R_{li}$ is the maximal number of links in the path of any source, $q_{\min} := \min_i q_i$, $T_{\max} := \max_i T_i$, and
$$\phi := \min_i \left( \frac{\pi}{2} - \tan^{-1} \frac{1 - q_i / T_i}{2 \sqrt{q_i / T_i}} \right). \tag{3.10}$$
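Proof. It is sufficient to show that the eigenvalues of the open-loop transfer function do not encircle $-1$ in the complex plane for $s = j\omega$, $\omega \ge 0$, when the condition in the theorem is satisfied [3]. The proof is similar to that in [2]. Note that both $X$ and $\Lambda(s)$ are diagonal matrices and that $AB$ and $BA$ have the same nonzero eigenvalues for two matrices $A$ and $B$ of appropriate dimensions. Hence the set of nonzero eigenvalues of $L(s)$ is the same as that of $\Lambda(s)\, \hat R^T(-j\omega)\, \hat R(j\omega)$ when $s = j\omega$, where $\hat R(j\omega)$ is defined as
$$\hat R(j\omega) := \mathrm{diag}\!\left( \frac{1}{\sqrt{c_l}} \right) R_f(j\omega)\, \mathrm{diag}\bigl( \sqrt{x_i} \bigr).$$
Following the argument of [13, 16], we study the convex hull of the Nyquist trajectories and ensure it does not encircle the critical point $-1$. More specifically, the set $\sigma(L(j\omega))$ of eigenvalues of $L(j\omega)$ satisfies [16] (possibly ignoring the zero eigenvalue):
$$\sigma(L(j\omega)) = \sigma\bigl( \Lambda(j\omega)\, \hat R^T(-j\omega)\, \hat R(j\omega) \bigr) \subseteq \rho\bigl( \hat R^T(-j\omega)\, \hat R(j\omega) \bigr) \cdot \mathrm{co}\bigl( 0 \cup \{ \Lambda_i(j\omega),\ i = 1, \ldots, N \} \bigr),$$
where $\rho(A)$ denotes the spectral radius of matrix $A$, $\mathrm{co}(\cdot)$ denotes the convex hull, and
$$\Lambda_i(j\omega) = \frac{e^{-j\omega T_i}}{j\omega T_i} \cdot \frac{j\omega T_i + \gamma T_i}{j\omega T_i + \gamma q_i}.$$
Similar to [2], the spectral radius of $\hat R^T(-j\omega)\, \hat R(j\omega)$ is bounded by $M$, the maximal number of links in the path of any source, $M = \max_i \sum_l R_{li}$. This implies
$$\sigma(L(j\omega)) \subseteq M \cdot \mathrm{co}\bigl( 0 \cup \{ \Lambda_i(j\omega),\ i = 1, \ldots, N \} \bigr).$$
Therefore a sufficient condition for local stability is that $M \Lambda_i(j\omega)$ does not encircle $-1$ for any $i$. We now prove that when the phase of $M \Lambda_i(j\omega)$ reaches $-\pi$, its magnitude is strictly less than 1, and hence the trajectory of $M \Lambda_i(j\omega)$ will not encircle $-1$ as $\omega$ goes from $0$ to $\infty$. It is not hard to show that the largest phase lag (i.e., the minimum phase) of $(j\omega T_i + \gamma T_i)/(j\omega T_i + \gamma q_i)$ is produced when $\omega T_i = \gamma \sqrt{T_i q_i}$, and equals
$$- \tan^{-1} \frac{1 - q_i / T_i}{2 \sqrt{q_i / T_i}}.$$
The above equation yields
$$\angle \Lambda_i(j\omega) \ge - \omega T_i - \frac{\pi}{2} - \tan^{-1} \frac{1 - q_i / T_i}{2 \sqrt{q_i / T_i}}.$$
Suppose that the phase of $\Lambda_i(j\omega)$ is $-\pi$ at frequency $\omega_i$. Then
$$-\pi = \angle \Lambda_i(j\omega_i) \ge - \omega_i T_i - \frac{\pi}{2} - \tan^{-1} \frac{1 - q_i / T_i}{2 \sqrt{q_i / T_i}}.$$
The definition (3.10) of $\phi$ then implies
$$\omega_i T_i \ge \frac{\pi}{2} - \tan^{-1} \frac{1 - q_i / T_i}{2 \sqrt{q_i / T_i}} \ge \phi.$$
It is easy to check that the magnitude of $\Lambda_i(j\omega)$ is a decreasing function of $\omega$. Therefore under the condition in the theorem, we have
$$|\Lambda_i(j\omega_i)| \le \frac{1}{\phi} \sqrt{ \frac{\phi^2 + \gamma^2 T_i^2}{\phi^2 + \gamma^2 q_i^2} } \le \frac{1}{\phi} \sqrt{ \frac{\phi^2 + \gamma^2 T_{\max}^2}{\phi^2 + \gamma^2 q_{\min}^2} } < \frac{1}{M},$$
and $M \Lambda_i(j\omega)$ cannot encircle $-1$. Hence the system is locally asymptotically stable if (3.9) is satisfied. □
The condition (3.9) can be hard to satisfy when $M$ is large. Nonetheless, it provides information on the effect of various parameters on stability. For example, it suggests that the equilibrium queueing delay should be large to guarantee stability.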
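As a quick way to explore how the parameters enter the sufficient condition, the following sketch evaluates the left-hand side of (3.9) with $\phi$ taken from (3.10). The function name and the numerical values are hypothetical; equilibrium delays are expressed in units of the update period, as in the model.

```python
import math


def stability_margin(q, T, gamma, M):
    """Left-hand side of (3.9); the sufficient condition holds when it is < 1.

    q and T are the equilibrium queueing delays and round-trip times per
    source, gamma is the FAST TCP step size, M the maximal path length.
    """
    # phi as defined in (3.10): worst case over all sources.
    phi = min(math.pi / 2 - math.atan((1 - qi / Ti) / (2 * math.sqrt(qi / Ti)))
              for qi, Ti in zip(q, T))
    q_min, T_max = min(q), max(T)
    ratio = (phi ** 2 + (gamma * T_max) ** 2) / (phi ** 2 + (gamma * q_min) ** 2)
    return M / phi * math.sqrt(ratio)


# Hypothetical single-link cases: queueing delay is 80% or 10% of the RTT.
margin_large_queue = stability_margin(q=[4.0], T=[5.0], gamma=0.5, M=1)
margin_small_queue = stability_margin(q=[0.5], T=[5.0], gamma=0.5, M=1)
```

In these two hypothetical cases the condition is satisfied when the queueing delay is a large fraction of the round-trip time and violated when it is a small fraction, in line with the remark above that a large equilibrium queueing delay helps.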
3.3. Numerical simulation and experiment. In general, the condition in Theorem 3.2 is only sufficient. When there is only one link and all sources have the same feedback delay, it is necessary as well. The theorem implies that FAST TCP may become unstable in a single bottleneck network with homogeneous sources. We now present an experiment with a single bottleneck link where the local stability condition is violated. Numerical simulation of the continuous-time model exhibits instability, confirming the theorem. Yet, the same network on Dummynet with the real FAST TCP implementation is stable. This suggests that the discrepancy is not in the stability theorem but rather in the continuous-time model.
In our experiment, the sources have an identical propagation delay of 100 ms and a constant $\alpha$ value of 70 packets. They share a bottleneck with a capacity of 800 Mbps. The simulations and experiments consist of three intervals. The interval length is 10 seconds for the continuous-time model simulation and 100 seconds for the experiment.² Three sources are active from the beginning of the experiment, seven additional sources activate in the second interval, and in the last interval, all sources become inactive except five of them. The simulation and experimental results are shown in Figure 3 and Figure 4, respectively. Figure 3 confirms the theorem: the continuous-time model is unstable under the chosen parameters, which violate the stability condition of Theorem 3.2. However, as Figure 4 shows, the real FAST TCP implementation is actually stable.³
We believe that the discrepancy is largely due to the fact that the continuous-time model does not capture the self-clocking effect accurately. Self-clocking ensures that packets are sent at the same rate as the throughput the source receives, except briefly when the window size changes, and helps stabilize the system. Indeed, for the case of one source over one link, a discrete-event model is used in [18] to prove that FAST TCP and Vegas are always stable regardless of the feedback delay. It also provides justification for the discrete-time model in (2.4).
² We use a longer duration in the Dummynet experiment because a FAST TCP source takes longer to converge due to slow-start, which is not included in our model.
³ The regular spikes every 10 seconds in the queue size are probably due to a certain background task in the sending host.
[Figure 3: (a) Queue size versus simulation time (sec). (b) Window size versus simulation time (sec), with curves for Flow 1, Flow 4, and Flow 10.]
FIG. 3. Numerical simulation of continuous-time model of FAST TCP.
[Figure 4: (a) Queue size versus experiment time (sec). (b) Window size versus time (sec).]
FIG. 4. Dummynet experiments of FAST TCP.
4. Stability analysis with the discrete-time model. We now analyze the stability of the discrete-time model. We first show that a network of homogeneous sources with the same feedback delay is locally stable no matter how large the delay is, agreeing with our experimental experience. We then show that at a single link, FAST TCP converges globally and exponentially in the absence of feedback delay.
4.1. Local stability with feedback delay. A network of FAST TCP sources is modelled by equations (2.3), (2.4), and (2.7). We assume $R$ has full row rank so that the equilibrium is unique. Since we are studying local stability around the equilibrium, we ignore all uncongested links (links where prices are zero in equilibrium) and assume that equality always holds in (2.7). The main result of this section provides a sufficient condition for local stability in general networks with common feedback delay. This proof generalizes the technique in [7, 8] from a single link to a network and includes feedback delay.
THEOREM 4.1. FAST TCP is locally asymptotically stable for arbitrary networks for any $\gamma \in (0, 1]$ if all sources have the same round-trip feedback delay, i.e., $\tau_i = \tau$ for all $i$.
In particular, when all feedback delays are ignored, $\tau_i = 0$ for all $i$, FAST TCP is locally asymptotically stable. This generalizes the stability result in [7, 8] from a single link to a network.
COROLLARY 4.1. FAST TCP is locally asymptotically stable in the absence of feedback delay for general networks with any $\gamma \in (0, 1]$.
The rest of this subsection is devoted to the proof of Theorem 4.1. We apply the Z-transform to the linearized system, and use the generalized Nyquist criterion to derive a sufficient stability condition. Define the forward and backward Z-transformed routing matrices $R_f(z)$ and $R_b(z)$ as
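$$[R_f(z)]_{li} := \begin{cases} z^{-\tau^f_{li}} & \text{if } R_{li} = 1 \\ 0 & \text{if } R_{li} = 0 \end{cases} \qquad [R_b(z)]_{li} := \begin{cases} z^{-\tau^b_{li}} & \text{if } R_{li} = 1 \\ 0 & \text{if } R_{li} = 0. \end{cases}$$
The relation $\tau^f_{li} + \tau^b_{li} = \tau_i$ gives
$$R_b(z) = R_f(z^{-1})\, \mathrm{diag}\bigl( z^{-\tau_i} \bigr). \tag{4.1}$$
Denote by $\delta w(z)$, $\delta q(z)$, and $\delta p(z)$ the corresponding Z-transforms of $\delta w(t)$, $\delta q(t)$, and $\delta p(t)$ for the linearized system, respectively. Let $q$ and $w$ be the end-to-end queueing delay and congestion window at equilibrium. Linearizing (2.7) yields
$$\sum_i R_{li} \left( \frac{\delta w_i(t - \tau^f_{li})}{d_i + q_i} - \frac{w_i\, \delta q_i(t - \tau^f_{li})}{(d_i + q_i)^2} \right) = 0,$$
where equality is assumed in (2.7). The corresponding Z-transform in matrix form is
$$R_f(z)\, D^{-1} M\, \delta w(z) = R_f(z)\, B\, \delta q(z), \tag{4.2}$$
where the diagonal matrices are
$$B := \mathrm{diag}\!\left( \frac{w_i}{(d_i + q_i)^2} \right), \quad M := \mathrm{diag}\!\left( \frac{d_i}{d_i + q_i} \right), \quad D := \mathrm{diag}(d_i).$$
Since $R_f(z)$ is generally not a square matrix, we cannot cancel it in (4.2). Equation (2.3) is already linear, and the corresponding Z-transform in matrix form is
$$\delta q(z) = R_b^T(z)\, \delta p(z). \tag{4.3}$$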
I ( Rj(z)B
-Rf(z) ) ( <5q(Z) 0 8p(z)
)=(
° )8w(z).
Rj(z)D- 1 M
Solving this equation with block matrix inverse gives the transfer function from <5w(z) to <5q(z):
<5q(z) 8w(z)
T
T
= Rb (z)(Rj(z)BR b (z))
-1
Rj(z)D
-1
M.
The Z-transform of the linearized congestion window update algorithm is
z<5w(z) ==, (M<5w(z) - DB<5q(z)) + (1 - ,)<5w(z). By combining the above equations, the open-loop transfer function £(z) from 6w(z) to 6w(z) is:
£(z) == -, (M - DBR[(z)(Rj(z)BR[(z))-l Rj(z)D- 1 M) z-l +(1 - ,)z-l I. A sufficient condition for local asymptotic stability can be derived based on the generalized Nyquist criterion [1, 3]. Since the open-loop system is stable, if we can show that the eigenvalue loci of £( ej W ) does not enclose -1 for w E [0, 27f), the closed-loop system is locally asymptotically stable. A sufficient condition for this is that the spectral radius of L( e j W ) is strictly less than 1 for w E [0,27r). When z == ej w , the spectral radii of £(z) and -z£(z) are the same. Hence, we only need to study the spectral radius of
Clearly, the eigenvalues of J(z) are dependent on ,. For any given z == el'", let the eigenvalues of J (z) be denoted by Ai (1'), i == 1 ... N, as functions of , E (0, 1]. It is clear that
Hence if $\rho(J(z)) < 1$ for any $z = e^{j\omega}$ for $\gamma = 1$, it will also hold for all $\gamma \in (0, 1]$. Therefore, it suffices to study the stability condition for $\gamma = 1$. Let $\mu_i := d_i / (d_i + q_i)$ be the $i$th diagonal entry of matrix $M$. Let $\mu_{\max} := \max_i \mu_i$. Since the end-to-end queueing delay $q_i$ cannot be zero at equilibrium (otherwise the rate would be infinitely large), we have $q_i > 0$ and $\mu_{\max} < 1$. The following key lemma characterizes the eigenvalues of $J(z)$ with $\gamma = 1$.
LEMMA 4.1. When $z = e^{j\omega}$ with $\omega \in [0, 2\pi)$ and $\gamma = 1$, the eigenvalues of $J(z)$ have the following properties:
1. There are $L$ zero eigenvalues with the corresponding eigenvectors being the columns of the matrix $M^{-1} D B\, R_b^T(z)$.
2. The nonzero eigenvalues have moduli less than 1 if $\tau_{\max} - \tau_{\min} < 1/4$, where $\tau_{\max} := \max_i \tau_i$ and $\tau_{\min} := \min_i \tau_i$.
Proof: At $\gamma = 1$, the matrix $J(z)$ is
$$J(z) = M - D B\, R_b^T(z) \bigl( R_f(z)\, B\, R_b^T(z) \bigr)^{-1} R_f(z)\, D^{-1} M.$$
It is easy to check that
$$J(z)\, M^{-1} D B\, R_b^T(z) = D B\, R_b^T(z) - D B\, R_b^T(z) = 0.$$
Since $M^{-1} D B\, R_b^T(z)$ has full column rank, it consists of $L$ linearly independent eigenvectors of $J(z)$ with corresponding eigenvalue 0. This proves the first assertion. For the second assertion, suppose that $\lambda$ is an eigenvalue of $J(z)$ for a given $z$. Define the matrix $A$ as
$$A := J(z) - \lambda I = M - \lambda I - D B\, R_b^T(z) \bigl( R_f(z)\, B\, R_b^T(z) \bigr)^{-1} R_f(z)\, D^{-1} M,$$
which is singular by definition. Recall the matrix inversion formula (see, e.g., [6])
$$(J + E H S)^{-1} = J^{-1} - J^{-1} E \bigl( H^{-1} + S J^{-1} E \bigr)^{-1} S J^{-1},$$
where $J$ here denotes a generic invertible matrix, not the matrix $J(z)$ above. If $J + E H S$ is singular, then either $J$ or $H^{-1} + S J^{-1} E$ must be singular. We can let
$$J := M - \lambda I, \quad E := -D B\, R_b^T(z), \quad H := \bigl( R_f(z)\, B\, R_b^T(z) \bigr)^{-1}, \quad S := R_f(z)\, D^{-1} M.$$
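Since $A = J + E H S$ is singular, either $J = M - \lambda I$ or $H^{-1} + S J^{-1} E$ is singular. The second term can be rewritten as $R_f(z) \bigl( B - M (M - \lambda I)^{-1} B \bigr) R_b^T(z)$.
Case 1: $M - \lambda I$ is singular. Since $M$ is diagonal, then for some $i$
$$0 < \lambda = \frac{d_i}{d_i + q_i} = \mu_i \le \mu_{\max} < 1.$$
Case 2: $R_f(z) \bigl( B - M (M - \lambda I)^{-1} B \bigr) R_b^T(z)$ is singular. It is clear that
$$B - M (M - \lambda I)^{-1} B = \mathrm{diag}\Bigl( \bigl( 1 - \mu_i (\mu_i - \lambda)^{-1} \bigr) \beta_i \Bigr) = -\lambda\, \mathrm{diag}\!\left( \frac{\beta_i}{\mu_i - \lambda} \right),$$
where $\beta_i$ is the $i$th diagonal entry of matrix $B$. Hence, $\lambda = 0$ is always an eigenvalue, as shown above. If $\lambda$ is nonzero, it has to be true that
$$\det\!\left( R_f(z)\, \mathrm{diag}\!\left( \frac{\beta_i}{\mu_i - \lambda} \right) R_b^T(z) \right) = 0. \tag{4.4}$$
When $z = e^{j\omega}$, we have $z^{-1} = \bar z$. Hence, equation (4.1) can be rewritten as
$$R_b^T(z) = \mathrm{diag}\bigl( z^{-\tau_i} \bigr)\, R_f^*(z),$$
where $R_f^*(z)$ denotes the conjugate transpose of $R_f(z)$. Substituting the above equation into (4.4) with $z = e^{j\omega}$ yields
$$\det\!\left( R_f(z)\, \mathrm{diag}\!\left( \frac{\beta_i\, e^{-j\omega\tau_i}}{\mu_i - \lambda} \right) R_f^*(z) \right) = 0. \tag{4.5}$$
Therefore, the following formula is also zero:
$$\det\!\left( R_f(z)\, \mathrm{diag}\!\left( \frac{e^{j(\theta_i + \psi)}\, \beta_i}{\mu_i - \lambda} \right) R_f^*(z) \right),$$
where $\theta_i := (\tau_{\max} - \tau_i)\, \omega$, and $\psi$ can be any value. When $\tau_{\max} - \tau_{\min} < 1/4$, we have for $\omega \in [0, 2\pi)$
$$0 \le \theta_i < \frac{\pi}{2}.$$
Suppose that there is a solution such that $|\lambda| \ge 1$. Based on Lemma 4.2, which will be presented later, there exists a $\psi$ such that $\mathrm{Im}\bigl( \mathrm{diag}( e^{j(\theta_i + \psi)} \beta_i / (\mu_i - \lambda) ) \bigr)$ is a positive diagonal matrix. Therefore the imaginary part of the matrix
$$R_f(z)\, \mathrm{diag}\!\left( \frac{e^{j(\theta_i + \psi)}\, \beta_i}{\mu_i - \lambda} \right) R_f^*(z)$$
is positive definite, and the real part is symmetric. From Lemma 4.3 below, it has to be nonsingular. This contradicts the equation
$$\det\!\left( R_f(z)\, \mathrm{diag}\!\left( \frac{e^{j(\theta_i + \psi)}\, \beta_i}{\mu_i - \lambda} \right) R_f^*(z) \right) = 0.$$
Hence, we have $|\lambda| < 1$. □
The proof of Theorem 4.1 will be complete after the next two lemmas.
LEMMA 4.2. Suppose that $0 < \mu_i < 1$ and $0 \le \theta_i < \pi/2$. If $|\lambda| \ge 1$, there exists a $\psi$ such that
$$\mathrm{Im}\!\left( \frac{e^{j(\theta_i + \psi)}}{\mu_i - \lambda} \right) > 0 \quad \text{for all } i.$$
Proof: See Appendix B. □
LEMMA 4.3. If the real part of a complex matrix is symmetric, and the imaginary part is positive definite, then the matrix is nonsingular.
Proof: See Appendix C. □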
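Appendix C is not reproduced in this section. For orientation, the following is one standard argument for a statement of the form of Lemma 4.3, offered as a sketch that may differ from the authors' proof. It assumes the matrix is written as $C = A + jB$ with $A$ Hermitian (which covers the real symmetric case) and $B$ Hermitian positive definite, and $x^{*}$ denotes the conjugate transpose of $x$.

```latex
\begin{aligned}
C x = 0,\ x \neq 0 \;&\Longrightarrow\; 0 = x^{*} C x = x^{*} A x + j\, x^{*} B x, \\
A = A^{*},\ B = B^{*} \succ 0 \;&\Longrightarrow\; x^{*} A x \in \mathbb{R}, \quad x^{*} B x > 0, \\
&\Longrightarrow\; \operatorname{Im}\bigl( x^{*} C x \bigr) = x^{*} B x > 0 .
\end{aligned}
```

The last line contradicts $x^{*} C x = 0$, so no nonzero $x$ satisfies $C x = 0$ and $C$ is nonsingular.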
4.2. Global stability for one link without feedback delay. In the absence of feedback delay, when there is only one link, the FAST TCP model can be simplified into
$$w_i(t+1) = \gamma \left( \frac{d_i\, w_i(t)}{d_i + q(t)} + \alpha_i \right) + (1 - \gamma)\, w_i(t), \tag{4.6}$$
$$\sum_i \frac{w_i(t)}{d_i + q(t)} \le c, \quad \text{with equality if } q(t) > 0, \tag{4.7}$$
where $q(t)$ is the queueing delay at the link (subscript is omitted). The main result of this section is that the system (4.6)-(4.7) is globally asymptotically stable and converges to the equilibrium exponentially fast starting from any initial point.
THEOREM 4.2. On a single link, FAST TCP converges exponentially to the equilibrium, in the absence of feedback delay.
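The behaviour claimed by the theorem can be observed by iterating (4.6)-(4.7) directly. The sketch below solves (4.7) for $q(t)$ by bisection at every step and then applies the window update (4.6). The function names, the parameter values (in normalized units), and the initial windows are assumptions made for illustration; the helper `nu` computes $Y(t) - c - \sum_i \alpha_i/d_i$ with $Y(t) = \sum_i w_i(t)/d_i$, the quantity tracked in the proof of this theorem.

```python
def queueing_delay(w, d, c, tol=1e-13):
    """Solve (4.7): sum_i w_i/(d_i+q) = c if the link is saturated, else q = 0."""
    def demand(q):
        return sum(wi / (di + q) for wi, di in zip(w, d))
    if demand(0.0) <= c:
        return 0.0
    lo, hi = 0.0, 1.0
    while demand(hi) > c:              # grow the bracket until demand is below c
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)          # bisection: demand is decreasing in q
        if demand(mid) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_fast(alpha, d, c, gamma, w0, steps):
    """Iterate the window update (4.6) with the static link model (4.7)."""
    w, history = list(w0), []
    for _ in range(steps):
        q = queueing_delay(w, d, c)
        history.append((list(w), q))
        w = [gamma * (di * wi / (di + q) + a) + (1 - gamma) * wi
             for wi, di, a in zip(w, d, alpha)]
    return history


def nu(w, d, alpha, c):
    """Normalized window sum minus its limit: Y(t) - c - sum_i alpha_i/d_i."""
    return (sum(wi / di for wi, di in zip(w, d)) - c
            - sum(a / di for a, di in zip(alpha, d)))


# Hypothetical values in normalized units (time unit = one update period).
alpha, d, c, gamma = [5.0, 10.0], [1.0, 2.0], 100.0, 0.5
history = simulate_fast(alpha, d, c, gamma, w0=[1.0, 1.0], steps=800)
w_final, q_final = history[-1]
```

In this run the windows first grow while the link is unsaturated, and once the queue is non-empty the value of `nu` shrinks by the factor $1 - \gamma$ at every step, while the windows and the queueing delay settle at $w_i = \alpha_i (d_i + q)/q$ and $q = \sum_i \alpha_i / c$.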
In the rest of this subsection, we prove the theorem in several steps. The first result is that equality always holds in (4.7) after some finite number $K_1$ of steps, i.e., $\sum_i w_i(t)/(d_i + q(t)) = c$ and $q(t) > 0$ for any $t > K_1$. Define the normalized congestion window sum as $Y(t) := \sum_i w_i(t)/d_i$. From (4.7), it is clear that $q(t) > 0$ if and only if $Y(t) > c$.
LEMMA 4.4. There exists $K_1 > 0$ such that the following are true for all $t > K_1$:
1. $q(t) > 0$.
2. $v(t+1) = (1-\gamma)\,v(t)$, where $v(t) := Y(t) - c - \sum_i \alpha_i/d_i$.
Proof: If initially $q(t) = 0$, which also means $Y(t) \le c$, then from (4.6) we have $Y(t+1) = Y(t) + \gamma \sum_i \alpha_i/d_i$, so $Y(t)$ increases linearly with $t$ and $Y(t) > c$ after finitely many steps. Therefore, there exists a $K_1$ such that $Y(t) > c$ and $q(t) > 0$ at $t = K_1$. We will show that $Y(t) > c$ implies $Y(t+1) > c$; hence $q(t) > 0$ for all $t > K_1$. Moreover, $v(t)$ converges exponentially to 0. Suppose $Y(t) > c$. From $\sum_i w_i(t)/(d_i + q(t)) = c$, we have
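$$\begin{aligned}
v(t+1) &= \sum_i \frac{w_i(t+1)}{d_i} - c - \sum_i \frac{\alpha_i}{d_i} \\
&= (1-\gamma)\sum_i \frac{w_i(t)}{d_i} + \gamma \sum_i \frac{w_i(t)}{d_i + q(t)} + \gamma\sum_i \frac{\alpha_i}{d_i} - c - \sum_i \frac{\alpha_i}{d_i} \\
&= (1-\gamma)\left(\sum_i \frac{w_i(t)}{d_i} - c - \sum_i \frac{\alpha_i}{d_i}\right) \\
&= (1-\gamma)\, v(t).
\end{aligned}$$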
This proves the second assertion. Moreover, it implies
$$Y(t+1) = (1-\gamma)\,Y(t) + \gamma\left(\sum_i \frac{\alpha_i}{d_i} + c\right).$$
Hence, $Y(t) > c$ implies $Y(t+1) > c$ and $q(t+1) > 0$. This completes the proof. □
For the rest of this subsection, we pick a fixed $\epsilon$ with $0 < \epsilon < \sum_i \alpha_i/d_i$. Define
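$$q_{\max} := \frac{d_{\max}}{c}\left(\sum_i \frac{\alpha_i}{d_i} + \epsilon\right) \quad \text{and} \quad q_{\min} := \frac{d_{\min}}{c}\left(\sum_i \frac{\alpha_i}{d_i} - \epsilon\right),$$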
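where $d_{\min} := \min_i d_i$ and $d_{\max} := \max_i d_i$. Then $q(t)$ is bounded by these two values after finitely many steps.
LEMMA 4.5. There exists a positive $K_2$ such that $q_{\min} \le q(t) \le q_{\max}$ for any $t \ge K_2$.
Proof: From Lemma 4.4, after finitely many steps $K_1$, $v(t+1) = (1-\gamma)\,v(t)$. Therefore, there exists a $K_2$ such that $|v(t)| < \epsilon$ for all $t \ge K_2$. It implies
$$Y(t) - c > \sum_i \frac{\alpha_i}{d_i} - \epsilon.$$
Since $\sum_i w_i(t)/(d_i + q(t)) = c$, we also have $Y(t) - c = q(t) \sum_i \frac{w_i(t)}{d_i (d_i + q(t))} \le q(t)\, c / d_{\min}$. Therefore
$$q(t) \ge \frac{d_{\min}}{c}\left(\sum_i \frac{\alpha_i}{d_i} - \epsilon\right) = q_{\min}.$$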
The proof for $q_{\max}$ is the same. □
Define $\mu_i(t) := d_i/(d_i + q(t))$, $\mu_{\max} := \max_i d_i/(d_i + q_{\min})$, and $\mu_{\min} := \min_i d_i/(d_i + q_{\max})$. Based on Lemma 4.5, we have $0 < \mu_{\min} \le \mu_i(t) \le \mu_{\max} < 1$ for any $t \ge K_2$. Define
$$\eta_i(t) := \frac{w_i(t) - \alpha_i}{\alpha_i d_i} - \frac{1}{q(t)} \tag{4.8}$$
and denote $\eta_{\max}(t) := \max_i \eta_i(t)$ and $\eta_{\min}(t) := \min_i \eta_i(t)$. We will show that the window update for source $i$ is proportional to $\eta_i(t)$, and that the system is at equilibrium if and only if all $\eta_i(t)$ are zero. The next lemma gives bounds on $\eta_i(t)$.
LEMMA 4.6. There exist two positive numbers $\delta_1$ and $\delta_2$ such that for all $t \ge K_2$
$$\eta_{\max}(t) > -\delta_1 (1-\gamma)^t \quad \text{and} \quad \eta_{\min}(t) < \delta_2 (1-\gamma)^t.$$
Proof: From (4.8), it is easy to check that $Y(t+1) - Y(t) = -\gamma\, v(t)$. By Lemma 4.4, when $t \ge K_2$ we have
$$|Y(t+1) - Y(t)| = \gamma\,|v(t)| = \Delta\,(1-\gamma)^t \tag{4.9}$$
where $\Delta := \gamma (1-\gamma)^{-K_2} |v(K_2)|$. The update of source $i$'s congestion window is
$$w_i(t+1) - w_i(t) = -\gamma\, \alpha_i\, q(t)\, \mu_i(t)\, \eta_i(t).$$
Choose $\delta_1$ large enough such that $\delta_1 N \gamma\, \alpha_{\min} q_{\min} \mu_{\min} / d_{\max} > \Delta$, where $\alpha_{\min} := \min_i \alpha_i$. We now prove $\eta_{\max}(t) > -\delta_1 (1-\gamma)^t$ for all $t \ge K_2$ by contradiction. Suppose that there is a time $t \ge K_2$ such that $\eta_{\max}(t) \le -\delta_1 (1-\gamma)^t$. Then all the $\eta_i(t)$ are negative, which implies
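$$\begin{aligned}
Y(t+1) - Y(t) &= \sum_i \frac{w_i(t+1) - w_i(t)}{d_i} \\
&= \sum_i \frac{-\gamma\, \alpha_i\, q(t)\, \mu_i(t)\, \eta_i(t)}{d_i} \\
&\ge N \left(-\eta_{\max}(t)\right) \gamma\, \alpha_{\min} q_{\min} \mu_{\min} / d_{\max} \\
&\ge \delta_1 N (1-\gamma)^t\, \gamma\, \alpha_{\min} q_{\min} \mu_{\min} / d_{\max} \\
&> \Delta (1-\gamma)^t.
\end{aligned}$$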
This contradicts equation (4.9) and proves the claim. The proof for $\eta_{\min}(t)$ is similar. □
Define $L(t)$ as:
$$L(t) := \eta_{\max}(t) - \eta_{\min}(t). \tag{4.10}$$
The following lemma implies that the difference between different $\eta_i(t)$ goes to zero exponentially fast.
LEMMA 4.7. There are two positive numbers $\delta_3$ and $\delta_4$ such that for $t \ge K_2$ we have
1. $L(t) \ge 0$.
2. $L(t+1) \le (1 - \gamma + \gamma\mu_{\max})\, L(t) + \delta_3 (1-\gamma)^t$.
3. $L(t) \le \delta_4 (1 - \gamma + \gamma\mu_{\max})^t$.
Proof: See Appendix D. □
LEMMA 4.8. Both $\eta_{\max}(t)$ and $\eta_{\min}(t)$ exponentially converge to zero.
Proof: When $t \ge K_2$, combining Lemma 4.6 and Lemma 4.7 yields bounds for $\eta_{\max}(t)$:
$$-\delta_1 (1-\gamma)^t < \eta_{\max}(t) = L(t) + \eta_{\min}(t) \le \delta_4 (1 - \gamma + \gamma\mu_{\max})^t + \delta_2 (1-\gamma)^t.$$
Since both the upper and lower bounds of $\eta_{\max}(t)$ converge to zero exponentially fast, $\eta_{\max}(t)$ goes to zero exponentially. The proof for $\eta_{\min}(t)$ is similar. □
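It may help to spell out what $\eta_i = 0$ means in terms of windows and queueing delay. The short derivation below is a sketch under the assumption that $\eta_i$ has the form $(w_i - \alpha_i)/(\alpha_i d_i) - 1/q$ and that (4.7) holds with equality; the starred symbols $q^*$ and $w_i^*$ are notation introduced here for the equilibrium values.

```latex
% Sketch: the equilibrium implied by eta_i = 0, assuming (4.7) holds with equality.
\begin{align*}
\eta_i = 0
  &\iff \frac{w_i - \alpha_i}{\alpha_i d_i} = \frac{1}{q}
   \iff w_i = \frac{\alpha_i (d_i + q)}{q}, \\
\sum_i \frac{w_i}{d_i + q} = c
  &\implies \sum_i \frac{\alpha_i}{q} = c
   \implies q^* = \frac{1}{c} \sum_i \alpha_i, \qquad
   w_i^* = \alpha_i \, \frac{d_i + q^*}{q^*}.
\end{align*}
```

Under these assumptions each source keeps $w_i^* q^*/(d_i + q^*) = \alpha_i$ packets queued at the link in equilibrium.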
Proof of Theorem 4.2: The system is at equilibrium if and only if $w_i(t) = w_i(t+1)$ for all $i$. This is equivalent to $\eta_i(t) = 0$ for all $i$ because of the equation proved in Lemma 4.6:
$$w_i(t+1) - w_i(t) = -\gamma\, \alpha_i\, q(t)\, \mu_i(t)\, \eta_i(t).$$
Since both $\eta_{\max}(t)$ and $\eta_{\min}(t)$ converge to zero exponentially from any initial value, the system converges globally to the equilibrium defined by $\eta_i(t) = 0$. □
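The window-update identity $w_i(t+1) - w_i(t) = -\gamma\,\alpha_i\,q(t)\,\mu_i(t)\,\eta_i(t)$ used in this proof can be checked numerically for a single source. The sketch below assumes the update rule $w_i(t+1) = \gamma\,(d_i w_i/(d_i+q) + \alpha_i) + (1-\gamma)\,w_i$ and the definitions $\eta_i = (w_i-\alpha_i)/(\alpha_i d_i) - 1/q$ and $\mu_i = d_i/(d_i+q)$. The function names and sampling ranges are hypothetical and chosen only for illustration.

```python
import random


def window_update(w, d, alpha, q, gamma):
    """Assumed single-source form of the update (4.6)."""
    return gamma * (d * w / (d + q) + alpha) + (1.0 - gamma) * w


def eta(w, d, alpha, q):
    """Assumed form of (4.8): normalized distance from equilibrium."""
    return (w - alpha) / (alpha * d) - 1.0 / q


random.seed(2)
max_err = 0.0  # largest relative mismatch between the two sides
for _ in range(1000):
    w = random.uniform(1.0, 1000.0)
    d = random.uniform(0.01, 0.5)
    alpha = random.uniform(1.0, 100.0)
    q = random.uniform(0.001, 0.5)
    gamma = random.uniform(0.05, 1.0)
    mu = d / (d + q)
    lhs = window_update(w, d, alpha, q, gamma) - w
    rhs = -gamma * alpha * q * mu * eta(w, d, alpha, q)
    max_err = max(max_err, abs(lhs - rhs) / max(1.0, abs(lhs)))

print("largest relative mismatch:", max_err)
```

A mismatch at the level of floating-point rounding is what the algebra predicts, since both sides equal $\gamma\,(\alpha_i - w_i q/(d_i+q))$.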
5. Conclusion. Using the traditional continuous-time model, we have proved that FAST TCP is globally asymptotically stable in a general network when there is no feedback delay. When feedback delays are present, a sufficient condition is provided for local stability in general networks. Using a discrete-time model that captures the stabilizing effect of self-clocking, we have proved that FAST TCP is locally asymptotically stable in a general network as long as all flows have the same feedback delay, no matter how large it is. We have also proved that FAST TCP is globally asymptotically stable at a single link in the absence of feedback delay.
This work can be extended in several ways. First, the condition derived for local asymptotic stability appears more restrictive than our experiments suggest. Moreover, we have found scenarios where the predictions of the discrete-time model disagree with experiment. These discrepancies should be clarified. Second, it will be interesting to extend the global stability analysis to general networks with feedback delays. Finally, the new model and the analysis techniques presented here can be applied to other congestion control algorithms.
APPENDIX
A. Proof of Lemma 3.1. The FAST TCP model (2.1, 2.3, 2.5, 2.2) and (2.6) can be linearized into
where $w_i$ and $q_i$ are equilibrium values. Since $\tau_i = \tau_{li}^f + \tau_{li}^b = T_i = d_i + q_i$ for all links $l$ on the path of source $i$, the following equation holds
(A.1)
FIG. 5. Illustration of Lemma 4.2 (complex plane with axes Re and Im).
The Laplace transform of the linearized system in matrix form is
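$$\begin{aligned}
\delta q(s) &= R_b(s)^T\, \delta p(s) \\
\delta x(s) &= D_3\, \delta w(s) - D_4\, \delta q(s) \\
s\, \delta w(s) &= -\gamma \left( D_2 D_3\, \delta w(s) + D_1 D_4\, \delta q(s) \right) \\
\delta y(s) &= R_f(s)\, \delta x(s) \\
s\, \delta p(s) &= D\, \delta y(s)
\end{aligned}$$
where the diagonal matrices are
$$D := \mathrm{diag}\left(\frac{1}{c_l}\right), \quad D_1 := \mathrm{diag}(d_i), \quad D_2 := \mathrm{diag}(q_i), \quad D_3 := \mathrm{diag}\left(\frac{1}{d_i + q_i}\right), \quad D_4 := \mathrm{diag}\left(\frac{w_i}{(d_i + q_i)^2}\right).$$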
By using the fact that T, == di + qi, Xi == Wi/Ti and (A.I), we can simplify 0 the open loop transfer function L( s) into (3.8).
B. Proof of Lemma 4.2. Consider the complex plane in Figure 5. Let the points $A$, $B$, and $\lambda$ represent the values $\mu_{\min}$, $\mu_{\max}$, and $\lambda$, respectively. $Z$ is the intersection of segment $A\lambda$ and the unit circle, and $\bar{\lambda}$ stands for the complex conjugate of $\lambda$.
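Let $\phi_i \in [0, 2\pi)$ be the phase of $1/(\mu_i - \lambda)$. Let $\phi_{\max} := \max_i \phi_i$ and $\phi_{\min} := \min_i \phi_i$. Clearly, $\phi_i \in [0, \pi)$ if $\mathrm{Im}(\lambda) > 0$, and $\phi_i \in (\pi, 2\pi)$ otherwise. Hence $0 \le \phi_{\max} - \phi_{\min} \le \pi$. Since every $\mu_i$ is in the range $[\mu_{\min}, \mu_{\max}]$, it is easy to check that every $\phi_i$ is in the range formed by the phases of $1/(\mu_{\min} - \lambda)$ and $1/(\mu_{\max} - \lambda)$. This implies
$$\phi_{\max} - \phi_{\min} \le \left| \angle \frac{1}{\mu_{\min} - \lambda} - \angle \frac{1}{\mu_{\max} - \lambda} \right| = \angle A\bar{\lambda}B = \angle A\lambda B < \angle OZB < \pi/2.$$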
Let $\epsilon > 0$ be small enough such that $\phi_{\max} - \phi_{\min} < \pi/2 - \epsilon$. Choosing $\psi = -\phi_{\min} + \epsilon$ gives
$$\begin{aligned}
\angle \frac{e^{j(\psi + \theta_i)} \beta_i}{\mu_i - \lambda} &= \phi_i + \psi + \theta_i \\
&= \phi_i - \phi_{\min} + \epsilon + \theta_i \quad (\text{greater than } 0) \\
&< \phi_{\max} - \phi_{\min} + \epsilon + \pi/2 < \pi.
\end{aligned}$$
The fact that its phase is in $(0, \pi)$ implies that
$$\mathrm{Im}\left( \frac{e^{j(\psi + \theta_i)} \beta_i}{\mu_i - \lambda} \right) > 0.$$
□
C. Proof of Lemma 4.3. Suppose that $A := A_r + jA_i$, where $A_r = A_r^T$ and $A_i$ is positive definite. If $A$ is singular, there exists a nonzero vector $v := v_r + jv_i$ such that $Av = 0$. Then $A_r v_r = A_i v_i$ and $A_i v_r = -A_r v_i$. Since $A_i > 0$ and $A_r = A_r^T$, we have
$$0 < v_r^T A_i v_r = -v_r^T A_r v_i = -v_r^T A_r^T v_i = -v_i^T A_r v_r = -v_i^T A_i v_i < 0,$$
a contradiction. Hence $A$ is nonsingular. □
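Lemma 4.3 can also be probed numerically. The sketch below is an illustration only. It builds random $2 \times 2$ matrices $A = A_r + jA_i$ with $A_r$ symmetric and $A_i = B^T B + 0.1 I$ (a positive definite matrix by construction), and tracks the smallest $|\det A|$ encountered. The matrix size, the shift $0.1$, and the helper `det2` are assumptions of the example. Because $\mathrm{Im}(v^* A v) = v^* A_i v \ge 0.1\,\|v\|^2$ for these matrices, every singular value is at least $0.1$, so $|\det A| \ge 0.01$ is expected.

```python
import random


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


random.seed(1)
min_abs_det = float("inf")
for _ in range(1000):
    # Symmetric real part A_r (it need not be definite).
    a, b, c = (random.uniform(-5.0, 5.0) for _ in range(3))
    A_r = [[a, b], [b, c]]
    # Positive definite imaginary part A_i = B^T B + 0.1 * I with B = [[p, q], [r, s]].
    p, q, r, s = (random.uniform(-2.0, 2.0) for _ in range(4))
    A_i = [[p * p + r * r + 0.1, p * q + r * s],
           [p * q + r * s, q * q + s * s + 0.1]]
    A = [[complex(A_r[i][k], A_i[i][k]) for k in range(2)] for i in range(2)]
    min_abs_det = min(min_abs_det, abs(det2(A)))

print("smallest |det A| over all samples:", min_abs_det)
```

The determinant staying bounded away from zero matches the lemma on these samples; the general statement rests on the argument in Appendix C.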
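D. Proof of Lemma 4.7. It is obvious that $L(t) \ge 0$ because of its definition in (4.10). We start with the update of $\eta_i(t)$:
$$\eta_i(t+1) = \left(1 - \gamma + \gamma\mu_i(t)\right) \eta_i(t) - \frac{1}{q(t+1)} + \frac{1}{q(t)}.$$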
For simplicity, we let $a_i(t) := 1 - \gamma + \gamma\mu_i(t)$ and denote $a_{\max} := 1 - \gamma + \gamma\mu_{\max}$; then $a_i(t) \le a_{\max}$. This definition simplifies the above equation into
$$\eta_i(t+1) = a_i(t)\, \eta_i(t) - \frac{1}{q(t+1)} + \frac{1}{q(t)}. \tag{D.1}$$
By comparing equation (D.1) for sources $i$ and $j$, we obtain
$$\eta_i(t+1) - \eta_j(t+1) = a_i(t)\, \eta_i(t) - a_j(t)\, \eta_j(t).$$
Without loss of generality, suppose that at time $t+1$, the largest and smallest values of $\eta$ are achieved at sources $i$ and $j$, respectively. This assumption implies
$$L(t+1) = \eta_i(t+1) - \eta_j(t+1).$$
The upper bound of $L(t+1)$ is derived by considering the following three cases separately.
Case 1: $\eta_i(t)$ and $\eta_j(t)$ have different signs. It is easy to see that
$$L(t+1) = a_i(t)\, \eta_i(t) - a_j(t)\, \eta_j(t) \le a_{\max}\left(\eta_i(t) - \eta_j(t)\right) \le a_{\max}\left(\eta_{\max}(t) - \eta_{\min}(t)\right) = a_{\max} L(t).$$
Case 2: Both $\eta_i(t)$ and $\eta_j(t)$ are positive. It yields
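$$\begin{aligned}
L(t+1) &= a_i(t)\, \eta_i(t) - a_j(t)\, \eta_j(t) \le a_{\max}\, \eta_{\max}(t) \\
&= a_{\max} L(t) + a_{\max}\, \eta_{\min}(t) \\
&\le a_{\max} L(t) + a_{\max} \delta_2 (1-\gamma)^t \\
&\le a_{\max} L(t) + \delta_3 (1-\gamma)^t
\end{aligned}$$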
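as long as $\delta_3 \ge a_{\max} \delta_2$.
Case 3: Both $\eta_i(t)$ and $\eta_j(t)$ are negative. The proof is similar to that for Case 2.
Summarizing all the above cases, we have proved $L(t+1) \le a_{\max} L(t) + \delta_3 (1-\gamma)^t$ for all $t \ge K_2$. Denote $b := 1 - \gamma$. Then $1 > a_{\max} > b \ge 0$. For any $t \ge K_2$, an upper bound of $L(t)$ is
$$\begin{aligned}
L(t) &\le a_{\max} L(t-1) + \delta_3 b^{t-1} \\
&\le a_{\max}^{t-K_2} L(K_2) + \delta_3 \left( b^{t-1} + b^{t-2} a_{\max} + \cdots + b^{K_2} a_{\max}^{t-K_2-1} \right) \\
&= \left( a_{\max}^{-K_2} L(K_2) - \delta_3 \frac{b^{K_2} a_{\max}^{-K_2}}{b - a_{\max}} \right) a_{\max}^t + \frac{\delta_3}{b - a_{\max}}\, b^t.
\end{aligned}$$
Note that the coefficient of $b^t$ is negative. By choosing $\delta_4$ as the coefficient of $a_{\max}^t$, we get
$$L(t) \le \delta_4\, a_{\max}^t = \delta_4 (1 - \gamma + \gamma\mu_{\max})^t.$$
□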
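The closed form obtained by unrolling the recursion $L(t) \le a_{\max} L(t-1) + \delta_3 b^{t-1}$ can be verified numerically. The sketch below treats the recursion as an equality, iterates it from $t = K_2$, and compares the result with the closed-form expression in $a_{\max}^t$ and $b^t$. All numerical values ($\gamma$, $\mu_{\max}$, $K_2$, $L(K_2)$, $\delta_3$) and the function names are arbitrary choices for illustration.

```python
def unrolled_bound(L_K2, a, b, delta3, K2, t):
    """Iterate L(s+1) = a*L(s) + delta3*b**s for s = K2, ..., t-1."""
    L = L_K2
    for s in range(K2, t):
        L = a * L + delta3 * b ** s
    return L


def closed_form_coefficients(L_K2, a, b, delta3, K2):
    """Coefficients of a**t and b**t in the closed form of the bound."""
    coeff_a = a ** (-K2) * L_K2 - delta3 * b ** K2 * a ** (-K2) / (b - a)
    coeff_b = delta3 / (b - a)
    return coeff_a, coeff_b


# Illustrative values (assumptions): gamma = 0.5 and mu_max = 0.9.
gamma, mu_max = 0.5, 0.9
a_max = 1.0 - gamma + gamma * mu_max   # 0.95
b = 1.0 - gamma                        # 0.5
K2, L_K2, delta3, t = 5, 3.0, 2.0, 40

iterated = unrolled_bound(L_K2, a_max, b, delta3, K2, t)
delta4, coeff_b = closed_form_coefficients(L_K2, a_max, b, delta3, K2)
closed = delta4 * a_max ** t + coeff_b * b ** t

print("iterated   :", iterated)
print("closed form:", closed)
print("delta4 * a_max**t:", delta4 * a_max ** t)
```

Since $b < a_{\max}$, the coefficient of $b^t$ is negative, so dropping that term can only enlarge the bound; this is why $\delta_4$ can be taken as the coefficient of $a_{\max}^t$.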
LIST OF WORKSHOP PARTICIPANTS
• Prathima Agrawal, Department of Electrical and Computer Engineering, Auburn University
• Rajeev Agrawal, Motorola
• In Soo Ahn, Department of Electrical and Computer Engineering, Bradley University
• Jeffrey G. Andrews, Department of Electrical and Computer Engineering, University of Texas - Austin
• Daniel Matthew Andrews, Bell Laboratories, Lucent Technologies
• Paul Anghel, Department of Electrical and Computer Engineering, University of Minnesota
• Douglas N. Arnold, Institute for Mathematics and its Applications, University of Minnesota
• Donald G. Aronson, Institute for Mathematics and its Applications, University of Minnesota
• François Baccelli, INRIA, École Normale Supérieure
• Randall A. Berry, Department of Electrical and Computer Engineering, Northwestern University
• Sem C. Borst, Bell Laboratories, Lucent Technologies
• Nigel Boston, Department of Mathematics, University of Wisconsin - Madison
• Maury Bramson, School of Mathematics, University of Minnesota
• Robert Buche, Mathematics and Operations Research, North Carolina State University
• Dov Chelst, Mathematical Sciences, DeVry University
• Rong-Rong Chen, Department of Electrical and Computer Engineering, University of Utah
• Kenneth L. Clarkson, Bell Laboratories, Lucent Technologies
• Cristina Comaniciu, Department of Electrical and Computer Engineering, Stevens Institute of Technology
• Neiyer S. Correal, Motorola Labs, Motorola
• Fadel Digham, Department of Electrical and Computer Engineering, University of Minnesota
• Tyrone Duncan, Department of Mathematics, University of Kansas
• Kossi Delali Edoh, Elizabeth City State University
• Martin Eiger, Telcordia Technologies, Telcordia
• Shahrokh Farahmand, Department of Electrical and Computer Engineering, University of Minnesota
• Philip Fleming, Network Advanced Technology, Motorola, Inc.
• Georgios Giannakis, Department of Electrical and Computer Engineering, University of Minnesota
• Martin Greiner, Corporate Technology Department, Siemens
• Katherine Guo, Networking Software Research, Bell Laboratories
• Kamal Hammouda, Department of Computer Science, University of Minnesota
• Nidhi Hegde, Research and Development, France Telecom
• Jim Hickenbothan, Control Science and Dynamical Systems, University of Minnesota
• John Hobby, Computer Sciences Research Center, Bell Laboratories
• Michael L. Honig, Department of Electrical and Computer Engineering, Northwestern University
• Hakim Al Hussier, Department of Electrical and Computer Engineering, University of Minnesota
• Chuanyi Ji, Georgia Institute of Technology
• Yasong Jin, Department of Mathematics, University of Kansas
• Nihar Jindal, Department of Electrical and Computer Engineering, University of Minnesota
• Xiao Jinjun, Department of Electrical and Computer Engineering, University of Minnesota
• Mostafa Kaveh, Department of Electrical and Computer Engineering, University of Minnesota
• Sang Wu Kim, Electrical and Computer Engineering, Iowa State University
• Sang-Min Kim, Department of Electrical Engineering, University of Minnesota
• Thierry Klein, Wireless Research Lab - Bell Labs, Lucent Technologies
• Gerhard Kramer, Communications and Statistical Sciences Department, Bell Labs, Lucent Technologies
• Vikram Krishnamurthy, Department of Electrical and Computer Engineering, University of British Columbia
• Komandur R. Krishnan, Network Design and Traffic Research, Telcordia
• P.R. Kumar, Department of Electrical and Computer Engineering, University of Illinois - Urbana-Champaign
• Harold J. Kushner, Division of Applied Mathematics, Brown University
• Spyros Kyperountas, Florida Communication Research Labs, Motorola
• Richard La, Department of Electrical and Computer Engineering, University of Maryland
• Chulhan Lee, Department of Electrical and Computer Engineering, University of Texas - Austin
• Juyul Lee, Department of Electrical and Computer Engineering, University of Minnesota
• Debra Lewis, Institute for Mathematics and its Applications, University of Minnesota
• Perry Li, Department of Mechanical Engineering, University of Minnesota
• Yong Li, Department of Electrical and Computer Engineering, University of Minnesota
• Yanzhu Lin, Department of Electrical and Computer Engineering, University of Utah
• Xin Liu, Department of Computer Science, University of California - Davis
• Yuanjin Liu, Department of Mathematics, Wayne State University
• David Love, Department of Electrical and Computer Engineering, Purdue University
• Steven Low, Department of Electrical Engineering and Computer Science, California Institute of Technology
• Antonio Garcia Marques, Department of Electrical and Computer Engineering, University of Minnesota
• Sean P. Meyn, Department of Electrical and Computer Engineering, University of Illinois - Urbana-Champaign
• Wei Mo, Department of Electrical and Computer Engineering, Iowa State University
• Eytan Modiano, Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology
• Jae Moon, Department of Electrical and Computer Engineering, University of Minnesota
• Eric Msechu, Department of Electrical and Computer Engineering, University of Minnesota
• Arnie Neidhardt, Mathematical Sciences Research Center, Telcordia
• Mahdi Nezafat, Department of Electrical and Computer Engineering, University of Minnesota
• Daesun Oh, Department of Electrical and Computer Engineering, University of Minnesota
• Ronghui Peng, Department of Electrical and Computer Engineering, University of Utah
• Vincent Poor, Department of Electrical Engineering, Princeton University
• Thomas Posbergh, Department of Electrical Engineering and Computer Science, University of Minnesota
• K. Venkatesh Prasad, Infotronics Research and Advanced Engineering, Ford
• Alexandre Proutiere, CORE/CPN, France Telecom
• Priya Ranjan, Distribution Systems Group, Intelligent Automation, Inc.
• Alejandro Ribeiro, Department of Electrical and Computer Engineering, University of Minnesota
• Timothy J. Salo, Salo IT Solutions, Inc.
• Tathagata Samanta, Mathematical Sciences, Florida Institute of Technology
• Arnd Scheel, Institute for Mathematics and its Applications, University of Minnesota
• Christian Scheideler, Fakultät für Informatik, Technische Universität München
• Syed Faisal Ali Shah, Department of Electrical Engineering, University of Minnesota
• Anshuman Sharma, Neurological, Medtronic
• Shagi-Di Shih, Department of Mathematics, University of Wyoming
• Rahul Sinha, Department of Electrical and Computer Engineering, Illinois Institute of Technology
• Qingshuo Song, Department of Mathematics, Wayne State University
• Kevin William Sowerby, Department of Electrical and Computer Engineering, University of Auckland
• Vijay Subramanian, Performance Analysis Department, Motorola
• Jun Tan, Motorola
• Choon Yik Tang, Honeywell
• Alain B. Tchagang, Department of Control Science and Dynamical Systems, University of Minnesota
• Ahmed Tewfik, Department of Electrical Engineering, University of Minnesota
• Dan Thomsen
• Eric van den Berg, Telcordia Technologies
• Sriram Vishwanath, Department of Electrical and Computer Engineering, University of Texas - Austin
• Pascal Olivier Vontobel, Department of Electrical and Computer Engineering, University of Wisconsin - Madison
• Tho T. Vu, Top-Vu Technology, Inc.
• Fan Wang, Motorola
• Xiaodong Wang, Department of Electrical Engineering, Columbia University
• Xin Wang, Department of Electrical and Computer Engineering, University of Minnesota
• Zhengdao Wang, Department of Electrical and Computer Engineering, Iowa State University
• Wing Shing Wong, Department of Information Engineering, Chinese University of Hong Kong
• Dapeng Wu, Department of Electrical and Computer Engineering, University of Florida
• Jinhong Wu, Department of Electrical and Computer Engineering, George Washington University
• Pengfei Xia, Department of Electrical and Computer Engineering, University of Minnesota
• Yunjung Yi, Honeywell
• George Yin, Department of Mathematics, Wayne State University
• Hossein Zare, Department of Electrical and Computer Engineering, University of Minnesota
• Ofer Zeitouni, School of Mathematics, University of Minnesota
• Lisa Zhang, Computing Sciences Research Center, Bell Laboratories, Lucent Technologies
• Qian Zhang, Department of Electrical and Computer Engineering, University of Wisconsin - Madison
• Qing Zhang, Department of Mathematics, University of Georgia
• Yuping Zhang, Department of Electrical and Computer Engineering, University of Minnesota
The IMA Volumes in Mathematics and its Applications
Current Volumes: Homogenization and EffectiveModuli of Materials and Media 1. Ericksen, D. Kinderlehrer, R. Kohn, and J.-L. Lions (eds.)
2
3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23
Oscillation Theory, Computation, and Methods of Compensated Compactness C. Dafermos, 1. Ericksen, D. Kinderlehrer, and M. Slemrod (eds.) Metastability and Incompletely Posed Problems S. Antman, 1. Ericksen, D. Kinderlehrer, and 1. Muller (eds.) Dynamical Problems in Continuum Physics 1. Bona, C. Dafermos, 1. Ericksen, and D. Kinderlehrer(eds.) Theory and Applications of Liquid Crystals 1. Ericksen and D. Kinderlehrer(eds.) Amorphous Polymers and Non-Newtonian Fluids C. Dafermos,1. Ericksen, and D. Kinderlehrer (eds.) Random Media G. Papanicolaou (ed.) Percolation Theory and Ergodic Theory of Infinite Particle Systems H. Kesten (ed.) Hydrodynamic Behavior and Interacting Particle Systems G. Papanicolaou (ed.) Stochastic Differential Systems,Stochastic Control Theory, and Applications W. Fleming and P.-L. Lions (eds.) Numerical Simulation in Oil Recovery M.F. Wheeler (ed.) Computational Fluid Dynamics and Reacting Gas Flows B. Engquist, M. Luskin, and A. Majda (eds.) Numerical Algorithms for Parallel Computer Architectures M.H. Schultz (ed.) Mathematical Aspects of ScientificSoftware l.R. Rice (ed.) Mathematical Frontiers in Computational Chemical Physics D. Truhlar (ed.) Mathematics in Industrial Problems A. Friedman Applications of Combinatorics and Graph Theory to the Biological and Social Sciences F. Roberts (ed.) q-Series and Partitions D. Stanton (ed.) Invariant Theory and Tableaux D. Stanton (ed.) Coding Theory and Design Theory Part I: Coding Theory D. Ray-Chaudhuri (ed.) Coding Theory and Design Theory Part II: Design Theory D. Ray-Chaudhuri (ed.) Signal Processing Part I: Signal Processing Theory L. Auslander, F.A. Grunbaum,1.W. Helton, T. Kailath, P. Khargonekar,and S. Mitter (eds.) Signal Processing Part II: Control Theory and Applications of Signal Processing L. Auslander, F.A. Grtinbaum, 1.W. Helton, T. Kailath, P. Khargonekar, and S. Mitter (eds.)
24 25 26 27
28 29 30
31 32 33 34
35 36
37 38 39 40
41 42
43
44 45
46
47 48
Mathematics in Industrial Problems, Part2 A. Friedman Solitons in Physics, Mathematics, and Nonlinear Optics P.J. Olver and D.H. Sattinger (eds.) Two PhaseFlowsand Waves D.O. Joseph and D.G. Schaeffer (eds.) Nonlinear Evolution Equations that Change Type B.L. Keyfitz and M. Shearer(eds.) Computer Aided Proofs in Analysis K. Meyerand D. Schmidt(eds.) Multidimensional Hyperbolic Problems and Computations A. Majda and J. Glimm(eds.) Microlocal Analysis and Nonlinear Waves M. Beals, R. Melrose, and J. Rauch (eds.) Mathematics in Industrial Problems, Part3 A. Friedman Radar and Sonar,Part I R. Blahut,W. Miller,Jr., and C. Wilcox Directions in Robust Statistics and Diagnostics: Part I W.A. Stahel and S. Weisberg (eds.) Directions in Robust Statistics and Diagnostics: Part II W.A. Stahel and S. Weisberg (eds.) Dynamical Issues in Combustion Theory P. Fife, A. Lifian, and F.A.. Williams (eds.) Computing and Graphics in Statistics A. Buja and P. Tukey (eds.) Patterns and Dynamics in Reactive Media H. Swinney, G. Aris,and D. Aronson (eds.) Mathematics in Industrial Problems, Part4 A. Friedman Radarand Sonar,Part II F.A. Grunbaum, M. Bernfeld, and R.E. Blahut(eds.) Nonlinear Phenomena in Atmospheric and Oceanic Sciences G.F. Carnevaleand R.T. Pierrehumbert (eds.) Chaotic Processes in the Geological Sciences D.A. Yuen (ed.) Partial Differential Equations withMinimal Smoothness and Applications B. Dahlberg, E. Fabes,R. Fefferman, D. Jerison, C. Kenig,and 1. Pipher (eds.) On the Evolution of Phase Boundaries M.E. Gurtin and G.B. McFadden TwistMappings and TheirApplications R. McGeheeand K.R. Meyer(eds.) New Directions in TimeSeriesAnalysis, PartI D. Brillinger, P. Caines, 1. Geweke, E. Parzen, M. Rosenblatt, and M.S. Taqqu (eds.) New Directions in Time SeriesAnalysis, PartII D. Brillinger, P. Caines, 1. Geweke, E. Parzen, M. Rosenblatt, and M.S. Taqqu (eds.) Degenerate Diffusions W.-M. Ni, L.A. Peletier,and J.-L. Vazquez (eds.) 
LinearAlgebra, Markov Chains, and Queueing Models C.D. Meyer and R.J. Plemmons (eds.)
49 50 51
52 53 54 55 56 57 58 59 60 61 62 63
64
65 66 67 68 69 70 71 72 73
74
Mathematics in Industrial Problems, Part 5 A. Friedman Combinatorial and Graph-Theoretic Problems in Linear Algebra R.A. Brualdi,S. Friedland, and V. Klee (eds.) Statistical Thermodynamics and Differential Geometry of Microstructured Materials H.T. Davis and J.e.c. Nitsche(eds.) Shock Induced Transitions and Phase Structures in General Media l.E. Dunn,R. Fosdick,and M. Slemrod(eds.) Variational and Free Boundary Problems A. Friedman and J. Spruck (eds.) Microstructure and Phase Transitions D. Kinderlehrer, R. lames, M. Luskin, and J.L. Ericksen(eds.) Turbulence in Fluid Flows: A Dynamical Systems Approach G.R. Sell, C. Foias, and R. Temam (eds.) Graph Theory and Sparse Matrix Computation A. George,J.R. Gilbert, and l.W.H. Liu (eds.) Mathematics in Industrial Problems, Part 6 A. Friedman Semiconductors, Part I W.M. Coughran, 1r.,J. Cole, P. Lloyd,and J. White (eds.) Semiconductors, Part II W.M. Coughran, Jr., J. Cole, P. Lloyd,and J. White (eds.) Recent Advances in Iterative Methods G. Golub, A. Greenbaum, and M. Luskin (eds.) Free Boundaries in Viscous Flows R.A. Brown and S.H. Davis (eds.) Linear Algebra for Control Theory P. Van Dooren and B. Wyman(eds.) Hamiltonian Dynamical Systems: History, Theory, and Applications H.S. Dumas, K.R. Meyer, and D.S. Schmidt(eds.) Systems and Control Theory for Power Systems J.H. Chow,P.V. Kokotovic, RJ. Thomas(eds.) Mathematical Finance M.H.A. Davis,D. Duffie, W.H. Fleming, and S.E. Shreve (eds.) Robust Control Theory B.A. Francisand P.P. Khargonekar (eds.) Mathematics in Industrial Problems, Part 7 A. Friedman Flow Control M.D. Gunzburger (ed.) Linear Algebra for Signal Processing A. Bojanczyk and G. Cybenko(eds.) Control and Optimal Design of Distributed Parameter Systems J.E. Lagnese, D.L. Russell, and L.W. White (eds.) Stochastic Networks F.P. Kelly and R.I. Williams (eds.) Discrete Probability and Algorithms D. Aldous, P. Diaconis, 1. Spencer, and J.M. Steele (eds.) 
Discrete Event Systems, Manufacturing Systems, and Communication Networks P.R. Kumarand P.P. Varaiya(eds.) Adaptive Control, Filtering, and Signal Processing KJ. Astrom, G.C. Goodwin, and P.R. Kumar (eds.)
75
76 77 78 79 80 81 82 83 84 85 86 87 88 89
90 91 92
93
94
75 Modeling, Mesh Generation, and Adaptive Numerical Methods for Partial Differential Equations
I. Babuska, J.E. Flaherty, W.D. Henshaw, J.E. Hopcroft, J.E. Oliger, and T. Tezduyar (eds.)
76 Random Discrete Structures
D. Aldous and R. Pemantle (eds.)
77 Nonlinear Stochastic PDEs: Hydrodynamic Limit and Burgers' Turbulence
T. Funaki and W.A. Woyczynski (eds.)
78 Nonsmooth Analysis and Geometric Methods in Deterministic Optimal Control
B.S. Mordukhovich and H.J. Sussmann (eds.)
79 Environmental Studies: Mathematical, Computational, and Statistical Analysis
M.F. Wheeler (ed.)
80 Image Models (and their Speech Model Cousins)
S.E. Levinson and L. Shepp (eds.)
81 Genetic Mapping and DNA Sequencing
T. Speed and M.S. Waterman (eds.)
82 Mathematical Approaches to Biomolecular Structure and Dynamics
J.P. Mesirov, K. Schulten, and D. Sumners (eds.)
83 Mathematics in Industrial Problems, Part 8
A. Friedman
84 Classical and Modern Branching Processes
K.B. Athreya and P. Jagers (eds.)
85 Stochastic Models in Geosystems
S.A. Molchanov and W.A. Woyczynski (eds.)
86 Computational Wave Propagation
B. Engquist and G.A. Kriegsmann (eds.)
87 Progress in Population Genetics and Human Evolution
P. Donnelly and S. Tavaré (eds.)
88 Mathematics in Industrial Problems, Part 9
A. Friedman
89 Multiparticle Quantum Scattering With Applications to Nuclear, Atomic and Molecular Physics
D.G. Truhlar and B. Simon (eds.)
90 Inverse Problems in Wave Propagation
G. Chavent, G. Papanicolaou, P. Sacks, and W.W. Symes (eds.)
91 Singularities and Oscillations
J. Rauch and M. Taylor (eds.)
92 Large-Scale Optimization with Applications, Part I: Optimization in Inverse Problems and Design
L.T. Biegler, T.F. Coleman, A.R. Conn, and F. Santosa (eds.)
93 Large-Scale Optimization with Applications, Part II: Optimal Design and Control
L.T. Biegler, T.F. Coleman, A.R. Conn, and F. Santosa (eds.)
94 Large-Scale Optimization with Applications, Part III: Molecular Structure and Optimization
L.T. Biegler, T.F. Coleman, A.R. Conn, and F. Santosa (eds.)
95 Quasiclassical Methods
J. Rauch and B. Simon (eds.)
96 Wave Propagation in Complex Media
G. Papanicolaou (ed.)
97 Random Sets: Theory and Applications
J. Goutsias, R.P.S. Mahler, and H.T. Nguyen (eds.)
98 Particulate Flows: Processing and Rheology
D.A. Drew, D.D. Joseph, and S.L. Passman (eds.)
99 Mathematics of Multiscale Materials
K.M. Golden, G.R. Grimmett, R.D. James, G.W. Milton, and P.N. Sen (eds.)
100 Mathematics in Industrial Problems, Part 10
A. Friedman
101 Nonlinear Optical Materials
J.V. Moloney (ed.)
102 Numerical Methods for Polymeric Systems
S.G. Whittington (ed.)
103 Topology and Geometry in Polymer Science
S.G. Whittington, D. Sumners, and T. Lodge (eds.)
104 Essays on Mathematical Robotics
J. Baillieul, S.S. Sastry, and H.J. Sussmann (eds.)
105 Algorithms For Parallel Processing
M.T. Heath, A. Ranade, and R.S. Schreiber (eds.)
106 Parallel Processing of Discrete Problems
P.M. Pardalos (ed.)
107 The Mathematics of Information Coding, Extraction, and Distribution
G. Cybenko, D.P. O'Leary, and J. Rissanen (eds.)
108 Rational Drug Design
D.G. Truhlar, W. Howe, A.J. Hopfinger, J. Blaney, and R.A. Dammkoehler (eds.)
109 Emerging Applications of Number Theory
D.A. Hejhal, J. Friedman, M.C. Gutzwiller, and A.M. Odlyzko (eds.)
110 Computational Radiology and Imaging: Therapy and Diagnostics
C. Börgers and F. Natterer (eds.)
111 Evolutionary Algorithms
L.D. Davis, K. De Jong, M.D. Vose, and L.D. Whitley (eds.)
112 Statistics in Genetics
M.E. Halloran and S. Geisser (eds.)
113 Grid Generation and Adaptive Algorithms
M.W. Bern, J.E. Flaherty, and M. Luskin (eds.)
114 Diagnosis and Prediction
S. Geisser (ed.)
115 Pattern Formation in Continuous and Coupled Systems: A Survey Volume
M. Golubitsky, D. Luss, and S.H. Strogatz (eds.)
116 Statistical Models in Epidemiology, the Environment, and Clinical Trials
M.E. Halloran and D. Berry (eds.)
117 Structured Adaptive Mesh Refinement (SAMR) Grid Methods
S.B. Baden, N.P. Chrisochoides, D.B. Gannon, and M.L. Norman (eds.)
118 Dynamics of Algorithms
R. de la Llave, L.R. Petzold, and J. Lorenz (eds.)
119 Numerical Methods for Bifurcation Problems and Large-Scale Dynamical Systems
E. Doedel and L.S. Tuckerman (eds.)
120 Parallel Solution of Partial Differential Equations
P. Bjørstad and M. Luskin (eds.)
121 Mathematical Models for Biological Pattern Formation
P.K. Maini and H.G. Othmer (eds.)
122 Multiple-Time-Scale Dynamical Systems
C.K.R.T. Jones and A. Khibnik (eds.)
123 Codes, Systems, and Graphical Models
B. Marcus and J. Rosenthal (eds.)
124 Computational Modeling in Biological Fluid Dynamics
L.J. Fauci and S. Gueron (eds.)
125 Mathematical Approaches for Emerging and Reemerging Infectious Diseases: An Introduction
C. Castillo-Chavez with S. Blower, P. van den Driessche, D. Kirschner, and A.A. Yakubu (eds.)
126 Mathematical Approaches for Emerging and Reemerging Infectious Diseases: Models, Methods, and Theory
C. Castillo-Chavez with S. Blower, P. van den Driessche, D. Kirschner, and A.A. Yakubu (eds.)
127 Mathematics of the Internet: E-Auction and Markets
B. Dietrich and R.V. Vohra (eds.)
128 Decision Making Under Uncertainty: Energy and Power
C. Greengard and A. Ruszczynski (eds.)
129 Membrane Transport and Renal Physiology
H. Layton and A.M. Weinstein (eds.)
130 Atmospheric Modeling
D.P. Chock and G.R. Carmichael (eds.)
131 Resource Recovery, Confinement, and Remediation of Environmental Hazards
J. Chadam, A. Cunningham, R.E. Ewing, P. Ortoleva, and M.F. Wheeler (eds.)
132 Fractals in Multimedia
M.F. Barnsley, D. Saupe, and E.R. Vrscay (eds.)
133 Mathematical Methods in Computer Vision
P.J. Olver and A. Tannenbaum (eds.)
134 Mathematical Systems Theory in Biology, Communications, Computation, and Finance
J. Rosenthal and D.S. Gilliam (eds.)
135 Transport in Transition Regimes
N. Ben Abdallah, A. Arnold, P. Degond, I. Gamba, R. Glassey, C.D. Levermore, and C. Ringhofer (eds.)
136 Dispersive Transport Equations and Multiscale Methods
N. Ben Abdallah, A. Arnold, P. Degond, I. Gamba, R. Glassey, C.D. Levermore, and C. Ringhofer (eds.)
137 Geometric Methods in Inverse Problems and PDE Control
C.B. Croke, I. Lasiecka, G. Uhlmann, and M.S. Vogelius (eds.)
138 Mathematical Foundations of Speech and Language Processing
M. Johnson, S. Khudanpur, M. Ostendorf, and R. Rosenfeld (eds.)
139 Time Series Analysis and Applications to Geophysical Systems
D.R. Brillinger, E.A. Robinson, and F.P. Schoenberg (eds.)
140 Probability and Partial Differential Equations in Modern Applied Mathematics
E.C. Waymire and J. Duan (eds.)
141 Modeling of Soft Matter
Maria-Carme T. Calderer and Eugene M. Terentjev (eds.)
142 Compatible Spatial Discretizations
Douglas N. Arnold, Pavel B. Bochev, Richard B. Lehoucq, Roy A. Nicolaides, and Mikhail Shashkov (eds.)