Majorization Theory and Matrix-Monotone Functions in Wireless Communications, by Eduard Jorswieck and Holger Boche, reviews the basic definitions of Majorization Theory and Matrix-Monotone Functions and describes their concepts clearly with many illustrative examples. In addition to this tutorial material, new results are presented with respect to Schur-convex functions and regarding the properties of matrix-monotone functions. The approach taken by the authors provides a valuable overview of the basic techniques for readers who are new to the subject. They then proceed to show, in separate chapters, cutting-edge applications of the two basic theories in wireless communications.
Majorization Theory and Matrix-Monotone Functions in Wireless Communications is an invaluable resource for students, researchers and practitioners involved in the state-of-the-art design of wireless communication systems.
Foundations and Trends® in Communications and Information Theory
This book is originally published as Foundations and Trends® in Communications and Information Theory, Volume 3, Issue 6 (2006), ISSN 1567-2190.
Majorization and Matrix-Monotone Functions in Wireless Communications

Eduard Jorswieck
Department of Electrical Engineering
Royal Institute of Technology
11400 Stockholm, Sweden
[email protected]

Holger Boche
Fraunhofer Institute for Telecommunications
Heinrich-Hertz-Institut
Einsteinufer 37
10587 Berlin, Germany
[email protected]

Boston – Delft
Foundations and Trends® in Communications and Information Theory
Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339
USA
Tel. +1-781-985-4510
www.nowpublishers.com
[email protected]

Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft
The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is E. Jorswieck and H. Boche, Majorization and Matrix-Monotone Functions in Wireless Communications, Foundations and Trends® in Communications and Information Theory, vol. 3, no. 6, pp. 553–701, 2006.
ISBN: 978-1-60198-040-3
© 2007 E. Jorswieck and H. Boche
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). The ‘services’ for users can be found on the internet at: www.copyright.com For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; www.nowpublishers.com;
[email protected] now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands, www.nowpublishers.com; e-mail:
[email protected]
Foundations and Trends® in Communications and Information Theory
Volume 3 Issue 6, 2006

Editorial Board
Editor-in-Chief: Sergio Verdú, Department of Electrical Engineering, Princeton University, Princeton, New Jersey 08544, USA
[email protected]

Editors: Venkat Anantharam (UC Berkeley), Ezio Biglieri (U. Torino), Giuseppe Caire (U. Southern California), Roger Cheng (U. Hong Kong), K.C. Chen (Taipei), Daniel Costello (U. Notre Dame), Thomas Cover (Stanford), Anthony Ephremides (U. Maryland), Andrea Goldsmith (Stanford), Dave Forney (MIT), Georgios Giannakis (U. Minnesota), Joachim Hagenauer (TU Munich), Te Sun Han (Tokyo), Babak Hassibi (Caltech), Michael Honig (Northwestern), Johannes Huber (Erlangen), Hideki Imai (Tokyo), Rodney Kennedy (Canberra), Sanjeev Kulkarni (Princeton), Amos Lapidoth (ETH Zurich), Bob McEliece (Caltech), Neri Merhav (Technion), David Neuhoff (U. Michigan), Alon Orlitsky (UC San Diego), Vincent Poor (Princeton), Kannan Ramchandran (UC Berkeley), Bixio Rimoldi (EPFL), Shlomo Shamai (Technion), Amin Shokrollahi (EPFL), Gadiel Seroussi (MSRI), Wojciech Szpankowski (Purdue), Vahid Tarokh (Harvard), David Tse (UC Berkeley), Ruediger Urbanke (EPFL), Steve Wicker (Cornell), Raymond Yeung (Hong Kong), Bin Yu (UC Berkeley)
Editorial Scope

Foundations and Trends® in Communications and Information Theory will publish survey and tutorial articles in the following topics:
• Coded modulation
• Multiuser detection
• Coding theory and practice
• Multiuser information theory
• Communication complexity
• Optical communication channels
• Communication system design
• Pattern recognition and learning
• Cryptology and data security
• Quantization
• Data compression
• Quantum information processing
• Data networks
• Rate-distortion theory
• Demodulation and equalization
• Shannon theory
• Denoising
• Detection and estimation
• Signal processing for communications
• Information theory and statistics
• Source coding
• Information theory and computer science
• Storage and recording codes
• Joint source/channel coding
• Wireless communications
• Speech and image compression
• Modulation and signal design
Information for Librarians

Foundations and Trends® in Communications and Information Theory, 2006, Volume 3, 6 issues. ISSN paper version 1567-2190. ISSN online version 1567-2328. Also available as a combined paper and online subscription.
Foundations and Trends® in Communications and Information Theory
Vol. 3, No. 6 (2006) 553–701
© 2007 E. Jorswieck and H. Boche
DOI: 10.1561/0100000026
Majorization and Matrix-Monotone Functions in Wireless Communications

Eduard Jorswieck¹ and Holger Boche²,³,⁴

¹ Royal Institute of Technology, Department of Electrical Engineering, Signal Processing, 11400 Stockholm, Sweden, [email protected]
² Technical University of Berlin, Department of Electrical Engineering, Heinrich-Hertz Chair for Mobile Communications, HFT-6, Einsteinufer 25, 10587 Berlin, Germany
³ Fraunhofer German-Sino Lab for Mobile Communications MCI, Einsteinufer 37, 10587 Berlin, Germany
⁴ Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany, [email protected]
Abstract

This short tutorial presents two mathematical techniques, namely Majorization Theory and Matrix-Monotone Functions, reviews their basic definitions, and describes their concepts clearly with many illustrative examples. In addition to this tutorial material, new results are presented with respect to Schur-convex functions and regarding the properties of matrix-monotone functions. The techniques are applied to solve communication and information theoretic problems in wireless communications. The impact of spatial correlation in multiple antenna systems is characterized for many important performance measures, e.g., average mutual information, outage probability, error performance, minimum E_b/N_0 and wideband slope, zero-outage capacity, and capacity region. The impact of user distribution in cellular systems is characterized for different scenarios, including perfectly informed transmitters and receivers, regarding, e.g., the average sum rate, the outage sum rate, and the maximum throughput. Finally, a unified framework for the performance analysis of multiple antenna systems is developed based on matrix-monotone functions. The optimization of transmit strategies for multiple antennas is carried out by optimization of matrix-monotone functions. The results within this framework resemble and complement the various results on optimal transmit strategies in single-user and multiple-user multiple-antenna systems.
Contents

1 Introduction 1
  1.1 Majorization Theory 1
  1.2 Matrix-Monotone Functions 3
  1.3 Classification and Organization 4
  1.4 Notation 7
2 Majorization Theory 9
  2.1 Definition and Examples 10
  2.2 Basic Results 18
  2.3 Majorization and Optimization 30
3 Matrix-Monotone Functions 33
  3.1 Definition and Examples 33
  3.2 Basic Characterizations 43
  3.3 Matrix Norms 47
  3.4 Further Properties 52
4 Application of Majorization in Wireless Communications 59
  4.1 Spatial Correlation in Multiple Antenna Systems 59
  4.2 User Distribution in Cellular Communication Systems 86
5 Application of Matrix-Monotone Functions in Wireless Communications 103
  5.1 Generalized Multiple Antenna Performance Measures 103
  5.2 Optimization of Matrix-Monotone Functions 114
6 Appendix 133
  6.1 Linear Algebra 133
  6.2 Convex Optimization 134
7 Acknowledgments 141
References 143
1 Introduction
This short tutorial presents two mathematical techniques, namely Majorization Theory and Matrix-Monotone Functions, which are applied to solve communication and information theoretic problems in wireless communications.
1.1 Majorization Theory

Inequalities have always been a major mathematical research area, beginning with Gauß, Cauchy, and others. Pure and applied mathematical analysis needs inequalities, e.g., absolute inequalities, triangle inequalities, integral or differential inequalities, and so on. The building blocks of Majorization are contained in the book [48]. The complete theory, including many applications, is presented in [92]. The theory addresses the question of how to order vectors with nonnegative real components, and of the corresponding order-preserving functions, i.e., functions f which satisfy f(x) ≥ f(y) whenever x ⪰ y. The characterization of this class of functions is important in order to exploit this monotonicity. In the wireless communication context, such functions arise naturally in resource allocation for multiple user systems or multiple
antenna systems, e.g., the sum rate of the multiple access channel (MAC) with K users and channels α_1, ..., α_K as a function of the power allocation p_1, ..., p_K with inverse noise power ρ:

C(p) = log(1 + ρ ∑_{k=1}^{K} p_k α_k).
Assume that the sum power is constrained to K, i.e., ∑_{k=1}^{K} p_k = K. Order the components α_1 ≥ α_2 ≥ ··· ≥ α_K ≥ 0 and p_1 ≥ p_2 ≥ ··· ≥ p_K ≥ 0. The function C turns out to be Schur-convex with respect to p, i.e., monotonically increasing with respect to the Majorization order: if p ⪰ q, then C(p) ≥ C(q). Therefore, the maximum value is attained by the power allocation vector that concentrates all power in one element, i.e., C([K, 0, ..., 0]) ≥ C(p) ≥ C(1). This monotone behavior is illustrated for K = 2 with power allocation p = [2 − p, p] in Figure 1.1. This result implies that TDMA is optimal, because the complete transmit power is optimally allocated to one user [80].
Fig. 1.1 Sum rate of the MAC with channels α_1 = 2, α_2 = 1 as a function of the power allocation p = [2 − p, p].
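To make the Schur-convexity claim concrete, here is a small numerical sketch (our own illustration, not part of the original text) of the sum rate for the two-user setting of Figure 1.1; the channel values and power constraint follow the example above.

```python
import math

def sum_rate(p, alpha, rho=1.0):
    """MAC sum rate C(p) = log(1 + rho * sum_k p_k * alpha_k)."""
    return math.log(1.0 + rho * sum(pk * ak for pk, ak in zip(p, alpha)))

alpha = [2.0, 1.0]   # channels alpha_1 >= alpha_2, as in Fig. 1.1

# Power allocations p = [2 - p, p] for 0 <= p <= 1; p = 0 gives the most
# spread-out vector [2, 0], p = 1 gives the uniform allocation [1, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
rates = [sum_rate([2.0 - p, p], alpha) for p in grid]

# Schur-convexity: C([2, 0]) >= C(p) >= C([1, 1]) along the grid.
assert all(r1 >= r2 for r1, r2 in zip(rates, rates[1:]))
```

Along the grid, [2 − p, p] with smaller p majorizes the allocations with larger p, so the monotone decrease of `rates` is exactly the Schur-convexity stated above.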
Most of the basic definitions and basic properties can be found in the text books [8, 48, 50, 51, 92]. Majorization theory is a valuable tool and it is successfully applied in many research areas, e.g., in optimization [39, 168], signal processing and mobile communications [59, 105], and quantum information theory [101].
1.2 Matrix-Monotone Functions
More than 70 years have passed since Löwner [88] proposed the notion of matrix-monotone functions. A real, continuous function f : I → R defined on a nontrivial interval I is said to be matrix monotone of order n if X ≥ Y ⇒ f(X) ≥ f(Y) for any pair of self-adjoint n × n matrices X and Y with eigenvalues in I. Löwner characterized the set of matrix-monotone functions of order n in terms of the positivity of certain determinants (the so-called Löwner determinants and the related Pick determinants), and proved that a function is matrix monotone if and only if it allows an analytic continuation to a Pick function, that is, an analytic function defined in the complex upper half-plane with nonnegative imaginary part. A function is called matrix monotone if it is matrix monotone for all orders n. A representation theorem was proven for the class of matrix-monotone functions [34, 83, 88, 156]: every matrix-monotone function f can be expressed as

f(t) = a + bt + ∫_0^∞ (st / (s + t)) dμ(s)    (1.1)

with a positive measure μ on [0, ∞) and real constants a, b ≥ 0. Representatives of the class of matrix-monotone functions arise naturally in the context of multiple antenna systems, in the single-user as well as in the multiuser context. The two most important examples are the mutual information and the minimum mean square error (MMSE) in multiple-input multiple-output (MIMO) systems. Consider the mutual
information¹ for the vector model y = Hx + n between x and y for independent complex zero-mean Gaussian distributed x and n with covariances Q and I:

f(HQH^H) = I(x; y) = log det(I + HQH^H).

The mutual information, written as the function f(HQH^H) = tr log(I + HQH^H), can be represented by the matrix-monotone function f(t) = log(1 + t), which has the integral representation

f(t) = ∫_1^∞ (t / (s(s + t))) ds.

Hence, all results that hold for matrix-monotone functions also hold for the mutual information and (as we will show later) for the MMSE. This approach allows us to unify many recent results, and it is possible to extract the main principles and concepts. Finally, matrix-monotone functions are applied in many other areas, e.g., in optimization [25] and signal processing for communications [71].
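As a sanity check (our own numerical illustration, reading off the measure dμ(s) = ds/s² on [1, ∞) as an assumption consistent with (1.1)), the integral representation of log(1 + t) can be verified with a simple midpoint rule after the substitution s = 1/u:

```python
import math

def log1p_via_integral(t, n=100_000):
    """Evaluate f(t) = integral over [1, inf) of t / (s (s + t)) ds numerically.

    Substituting s = 1/u maps [1, inf) onto (0, 1] and turns the integrand
    into t / (1 + t u), which the midpoint rule handles comfortably.
    """
    h = 1.0 / n
    return h * sum(t / (1.0 + t * h * (i + 0.5)) for i in range(n))

# The representation reproduces the natural logarithm of 1 + t.
for t in (0.5, 1.0, 3.0, 10.0):
    assert abs(log1p_via_integral(t) - math.log(1.0 + t)) < 1e-6
```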
1.3 Classification and Organization

1.3.1 Classification and Differences to Related Literature
The well-established book [92] contains more results on Majorization than this short tutorial. The main difference is that this tutorial focuses on a subset of topics from [92], especially results regarding averages and distributions of weighted random variables, as well as averages of trace functions. These topics are treated in more detail, new results are added (in Subsections 2.2.3 to 2.2.7), and the connection to the application in communication theory is always kept in mind. Furthermore, the first two tutorial chapters are rigorous in the sense that they contain all necessary definitions and results, but they additionally contain many remarks and examples which help the reader to understand the concepts. There exist approaches in the literature that propose a unified framework for analysis and optimization of MIMO systems. First, the

¹ Without any operational meaning for the moment.
PhD thesis [104] provides a framework for optimization of linear MIMO systems, also by using Majorization theory. The tutorial [107] extends these results to nonlinear decision feedback MIMO systems. Interestingly, the application of Majorization in the tutorial [107] is not the analysis of the impact of fading parameters on system performance but the optimization of single-user transmit strategies under various objective criteria. Another difference to the tutorial [107] is that the article at hand devotes two full chapters to a tutorial of the mathematical techniques used. Therefore, both tutorials complement one another well. Another related tutorial is [122], which studies the active field of interference function calculus. An interesting overview is given in the plenary lecture at the workshop on signal processing advances in wireless communications in June 2007 [12]. Furthermore, a unified analytical description of MIMO systems was studied in the PhD thesis [79]. The main focus of [79] is to derive a framework for analytically computing closed-form expressions of MIMO transceiver performance, which are then used for optimization. Finally, the connection between the capacity and the mean-square error (MSE) from an estimation and information theoretic point of view was analyzed in the PhD thesis [42]. The thesis contains one part that clearly shows the connection between the capacity and the MMSE for various channel models, e.g., discrete, continuous, scalar, and vector channels, and different input signals. In Subsection 5.1.2, three different relationships between the mutual information and the MMSE are described.

1.3.2 Organization
The first two chapters present the definitions, properties, and many examples to explain the foundations and concepts of the two techniques. The three main topics discussed are

(a) the partial order on vectors and matrices,
(b) the characterization of order-preserving functions,
(c) the optimization of Schur-convex and matrix-monotone functions.
The main goal of these two chapters is to make the reader familiar with the basic concepts and to enable him or her to apply these techniques to problems in his or her respective research area. The various examples illustrate the theoretical concepts and reconnect to practical problem statements. In “Majorization Theory,” we present novel results with respect to Schur-convexity and Schur-concavity for the most general classes of functions and constraints. Later, in “Application of Majorization in Wireless Communications,” these functions obtain their operational meaning in the context of communication theory. In “Matrix-Monotone Functions,” we present novel results in terms of bounds for matrix-monotone functions and optimization of matrix-monotone functions, and discuss the connection to matrix norms as well as to connections and means. In “Application of Majorization in Wireless Communications” and “Application of Matrix-Monotone Functions in Wireless Communications,” we apply the presented techniques to concrete problem statements from wireless communications. The four main application areas are

(a) spatial correlation in multiple antenna systems,
(b) user distributions in cellular systems,
(c) development of a unified performance measure,
(d) optimization of MIMO system performance.
The main goal of these two chapters is to show under which conditions and assumptions both techniques can be used. Furthermore, it is shown how to interpret the results carefully to gain engineering insights into the design of wireless communication systems. In “Application of Majorization in Wireless Communications,” a measure for spatial correlation in multiple antenna communications is developed. This measure is exploited for various performance measures and in many scenarios to analyze the impact of spatial correlation. A measure for the user distribution in cellular systems is developed and the sum performance of up- and downlink communication as a function of the user distribution is analyzed. In “Application of Matrix-Monotone Functions in Wireless Communications,” we develop a generalized performance measure which unifies mutual information and MMSE criteria. Finally, the
results from “Matrix-Monotone Functions” are used to optimize the single-user and multi-user MIMO system performance. The appendix contains two sections with basic results from Linear Algebra and Convex Optimization. These results are used extensively throughout the book.
1.4 Notation
Vectors are denoted by boldface small letters a, b, and matrices by boldface capital letters A, B. A^T, A^H, and A^{−1} are the transpose, the conjugate transpose, and the matrix inverse, respectively. The identity matrix is I, and 1 is the all-ones vector. ◦ is the Schur product and ⊗ is the Kronecker product. diag(X) is the vector of diagonal entries of the matrix X. Diag(x) is a diagonal matrix with the entries of the vector x on its diagonal. Diag(A, B) is a block-diagonal matrix with matrices A and B on the diagonal. A^{1/2} is the square root matrix of A, and [A]_{j,k} denotes the entry in the jth row and the kth column of A. The set of real numbers is denoted by R and the set of complex numbers by C. The set of positive integers is N_+. The set of all n × n positive semi-definite matrices is denoted by H_n. The multivariate complex Gaussian distribution with mean m and covariance matrix Q is denoted by CN(m, Q). The expectation is denoted by E. The partial order x ⪰ y for vectors says that x majorizes y; equivalently, x ⪯ y means that x is majorized by y. For matrices, the order A ≥ B means that A − B is positive semi-definite. The strict versions of these orders for vectors and matrices are denoted by ≻, ≺, >, and <. [a]_+ denotes the maximum of a and 0.
2 Majorization Theory
A total order is a binary relation on a set X. It is antisymmetric, transitive, and total. For example, the set of real numbers R can be totally ordered by the order relations less than < and greater than >. Consider the set of all vectors with two non-negative real components which sum up to one, i.e., {x ∈ R²_+ : x_1 + x_2 = 1}. Since all such vectors can be parameterized by (t, 1 − t)^T for 0 ≤ t ≤ 1, the corresponding order reduces to the one-dimensional case. Let x and y be two-dimensional non-negative real vectors. Without loss of generality, order the components of each vector in decreasing order and compare the first components x_1 and y_1. If x_1 ≥ y_1, then the vector x is greater than or equal to y, e.g., (0.8, 0.2) ⪰ (0.6, 0.4). However, this approach does not extend to the case with three or more components. This chapter studies a certain partial order on vectors with more than two components. We will assume that the vectors have non-negative real components. The partial order “Majorization” will describe when the components of a vector x are “less spread out” or “more nearly equal” than the components of another vector y. Further on, functions
that are monotone with respect to this order are called “Schur-convex” (monotonic increasing) or “Schur-concave” (monotonic decreasing) functions. Standard results as well as novel results regarding Schurconvex functions are reviewed, presented, and discussed. In order to keep the representation simple and increase readability, many examples illustrate the definitions and results.
2.1 Definition and Examples

2.1.1 Majorization for Vectors
Definition 2.1 (Majorization for vectors). For two vectors x, y ∈ Rⁿ with descending ordered components x_1 ≥ x_2 ≥ ··· ≥ x_n ≥ 0 and y_1 ≥ y_2 ≥ ··· ≥ y_n ≥ 0, one says that the vector x majorizes the vector y, and writes x ⪰ y, if

∑_{k=1}^{m} x_k ≥ ∑_{k=1}^{m} y_k for m = 1, ..., n − 1, and ∑_{k=1}^{n} x_k = ∑_{k=1}^{n} y_k.
Example 2.1. Consider the situation in Figure 2.1, which shows two different vectors. In scenarios A and B the two largest components are equal (λ_1^A = λ_1^B and λ_2^A = λ_2^B). The smallest three components in scenario B are equal (λ_3^B = λ_4^B = λ_5^B), but in scenario A the smallest three components are unequal (λ_3^A > λ_4^A > λ_5^A). In addition, the sums of all components in scenarios A and B are equal. Applying the order introduced in Definition 2.1, the vector in scenario A majorizes the vector in scenario B (λ^A ⪰ λ^B).

Fig. 2.1 Example vectors: λ^A ⪰ λ^B.
Remark 2.1. Note that sometimes the definition of majorization is the other way round. For example, in [50, p. 192], a vector x is said to majorize a vector y if the sum of the smallest m components of x is greater than or equal to the sum of the smallest m components of y for all 1 ≤ m ≤ n. This can lead to contradictory statements.

Note that the order of the components is not important to the definition, i.e., the vectors (0.8, 0.2) and (0.2, 0.8) are equal with respect to the partial order in Definition 2.1. The elements are always assumed to be ordered in decreasing order. The strict version of Definition 2.1 is denoted by x ≻ y and means that the sum of the largest m components of the vector x is greater than the sum of the largest m components of the vector y for all 1 ≤ m < n, and ∑_{k=1}^{n} x_k = ∑_{k=1}^{n} y_k. For further definitions, the non-strict notions ⪰ and ⪯ will be used.
Example 2.2. The following vectors can be compared using Majorization:

(1/n, 1/n, ..., 1/n) ≺ (1/(n−1), 1/(n−1), ..., 1/(n−1), 0) ≺ ··· ≺ (1/2, 1/2, 0, ..., 0) ≺ (1, 0, ..., 0).
Example 2.3. Since Majorization provides only a partial order, there exist vectors (with at least three components) that cannot be compared, e.g., (0.6, 0.25, 0.15) and (0.55, 0.40, 0.05).
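Definition 2.1 translates directly into code; the following sketch (our own illustration) checks the partial sums of the decreasingly ordered vectors and reproduces the spirit of Examples 2.2 and 2.3:

```python
def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y in the sense of Definition 2.1."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    if abs(sum(xs) - sum(ys)) > tol:
        return False                      # total sums must be equal
    px = py = 0.0
    for xk, yk in zip(xs[:-1], ys[:-1]):  # partial sums for m = 1, ..., n-1
        px, py = px + xk, py + yk
        if px < py - tol:
            return False
    return True

# Example 2.2: the uniform vector is majorized by every "sparser" one.
assert majorizes([0.5, 0.5, 0.0], [1/3, 1/3, 1/3])
assert majorizes([1.0, 0.0, 0.0], [0.5, 0.5, 0.0])

# Example 2.3: some vectors are incomparable (majorization is a partial order).
u, v = [0.6, 0.25, 0.15], [0.55, 0.40, 0.05]
assert not majorizes(u, v) and not majorizes(v, u)
```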
Closely related to Majorization are doubly stochastic matrices.

Definition 2.2 (Doubly stochastic matrix). An n × n matrix P is doubly stochastic if P_ij ≥ 0 for 1 ≤ i, j ≤ n and

∑_{i=1}^{n} P_ij = 1 for 1 ≤ j ≤ n, and ∑_{j=1}^{n} P_ij = 1 for 1 ≤ i ≤ n.
Related to the two-dimensional example from the introduction to this section, the 2 × 2 matrix

[ t       (1 − t) ]
[ (1 − t)       t ]

is doubly stochastic for all 0 ≤ t ≤ 1. In fact, the set of all 2 × 2 doubly stochastic matrices can be parameterized in this way. The properties of doubly stochastic matrices are described in the following theorems.

Theorem 2.1 (Birkhoff 1946). The permutation matrices constitute the extreme points of the set of doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex hull of the permutation matrices.

Theorem 2.2 (Representation of doubly stochastic matrices). Every n × n doubly stochastic matrix can be represented as a convex combination of at most n² − 2n + 2 permutation matrices. The number n² − 2n + 2 cannot be replaced by a smaller number.

The next theorem connects the partial order of majorization and doubly stochastic matrices.

Theorem 2.3 (Majorization and doubly stochastic matrices). A necessary and sufficient condition for x ⪰ y is that there exists a doubly stochastic matrix P such that y = P x.

In order to illustrate this relationship and the partial order itself, we provide two examples.
Example 2.4. Consider the example from the beginning, i.e., x = (0.8, 0.2) ⪰ (0.6, 0.4) = y. The corresponding doubly stochastic matrix is given by

[ 0.6 ]   [ 2/3  1/3 ] [ 0.8 ]
[ 0.4 ] = [ 1/3  2/3 ] [ 0.2 ].
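The mapping of Theorem 2.3 and the matrix of Example 2.4 can be checked numerically; the last lines also illustrate Birkhoff's theorem (Theorem 2.1) for this particular P (our own illustration, not from the text):

```python
# Doubly stochastic matrix from Example 2.4 applied to x = (0.8, 0.2).
P = [[2/3, 1/3],
     [1/3, 2/3]]
x = [0.8, 0.2]

y = [sum(P[i][j] * x[j] for j in range(2)) for i in range(2)]
assert all(abs(yi - ti) < 1e-12 for yi, ti in zip(y, [0.6, 0.4]))

# Birkhoff (Theorem 2.1): P is a convex combination of permutation matrices;
# here P = (2/3) * I + (1/3) * swap.
I, S = [[1, 0], [0, 1]], [[0, 1], [1, 0]]
recon = [[2/3 * I[i][j] + 1/3 * S[i][j] for j in range(2)] for i in range(2)]
assert all(abs(P[i][j] - recon[i][j]) < 1e-12 for i in range(2) for j in range(2))
```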
The following concept is used to compare vectors with different ℓ₁ norm.

Definition 2.3 (Weak majorization). For x, y ∈ Rⁿ with ordered components x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n, y weakly submajorizes x, i.e., y ⪰_w x, if

∑_{k=1}^{m} x_k ≤ ∑_{k=1}^{m} y_k for all m = 1, ..., n.

For x, y ∈ Rⁿ with ordered components x_1 ≥ x_2 ≥ ··· ≥ x_n and y_1 ≥ y_2 ≥ ··· ≥ y_n, y weakly supermajorizes x, i.e., y ⪰^w x, if

∑_{k=1}^{m} x_{n−k+1} ≤ ∑_{k=1}^{m} y_{n−k+1} for all m = 1, ..., n.
The connection to doubly stochastic matrices is provided after the next definition.

Definition 2.4 (Substochastic and superstochastic matrix). A nonnegative matrix P = [p_ij] for which there exists a doubly stochastic matrix D = [d_ij] with p_ij ≤ d_ij for all i, j is called a substochastic matrix. Similarly, a nonnegative matrix P = [p_ij] for which there exists a doubly stochastic matrix D = [d_ij] with p_ij ≥ d_ij for all i, j is called a superstochastic matrix.
Remark 2.2. Note that by theorems of von Neumann, the existence of such a doubly stochastic matrix is assured [92, Thm. 2.C.1]. The
necessary and sufficient conditions for weak (sub- and super-) majorization are

y ⪰_w x if and only if x = P₁ y with a substochastic matrix P₁, and
y ⪰^w x if and only if x = P₂ y with a superstochastic matrix P₂.

Another type of majorization is defined next.

Definition 2.5 (Log-majorization). Assume x = [x_1, ..., x_n] and y = [y_1, ..., y_n] with x_k ≥ 0 and y_k ≥ 0. If

∏_{k=1}^{m} x_k ≤ ∏_{k=1}^{m} y_k for m = 1, ..., n − 1, and ∏_{k=1}^{n} x_k = ∏_{k=1}^{n} y_k,
then x is log-majorized by y, i.e., y ⪰_log x.

Log-majorization is stronger than weak majorization. This is described in the next theorem [92, 5.A.2.b].

Theorem 2.4 (Theorem 2.7 in [166]). Let the components of x, y ∈ Rⁿ_+ be nonnegative. Then x ⪯_log y implies x ⪯_w y.

2.1.2 Schur-convexity and Schur-concavity
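A randomized spot check (our own illustration, not from the text) of Theorem 2.4: vectors x whose logarithms are contracted toward their mean are log-majorized by y by construction (the contraction is a doubly stochastic map on the logs), and the weak-majorization conclusion is then verified on the partial sums:

```python
import math, random

def partial_sums(v):
    """Cumulative sums of the decreasingly ordered components of v."""
    s, out = 0.0, []
    for vk in sorted(v, reverse=True):
        s += vk
        out.append(s)
    return out

def log_majorizes(y, x):
    """y >=_log x per Definition 2.5 (partial products of largest components)."""
    ys, xs = sorted(y, reverse=True), sorted(x, reverse=True)
    py = px = 1.0
    for k in range(len(x) - 1):
        py, px = py * ys[k], px * xs[k]
        if py < px - 1e-12:
            return False
    return abs(py * ys[-1] - px * xs[-1]) < 1e-9   # equal total products

random.seed(0)
for _ in range(1000):
    y = [random.uniform(0.1, 2.0) for _ in range(4)]
    # Contract the logs of y halfway toward their mean; this preserves the
    # product and makes the logs "less spread out", so y >=_log x holds.
    logs = [math.log(v) for v in y]
    mean = sum(logs) / len(logs)
    x = [math.exp(0.5 * (l - mean) + mean) for l in logs]
    assert log_majorizes(y, x)
    # Theorem 2.4: x <=_w y, i.e., every partial sum of x is at most that of y.
    assert all(px <= py + 1e-9 for px, py in zip(partial_sums(x), partial_sums(y)))
```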
The next definition describes a function Φ which is applied to vectors x and y with x ⪰ y. The function is “order-preserving” with respect to the partial order of Majorization if it is monotonic with respect to this order.

Definition 2.6 (Schur-convex/Schur-concave). A real-valued function Φ defined on A ⊂ Rⁿ is said to be Schur-convex on A if

x ⪰ y on A ⇒ Φ(x) ≥ Φ(y).
Similarly, Φ is said to be Schur-concave on A if

x ⪰ y on A ⇒ Φ(x) ≤ Φ(y).

Next, consider one example of a Schur-convex function.

Example 2.5. Suppose that x, y ∈ Rⁿ_+ are positive real vectors and the function Φ₂ is defined as the sum of the squared components of the vector, i.e., Φ₂(x) = ∑_{k=1}^{n} |x_k|². Then it is easy to show that the function Φ₂ is Schur-convex on Rⁿ_+, i.e., x ⪰ y ⇒ Φ₂(x) ≥ Φ₂(y).

In order to check whether a one-dimensional real-valued function f : R_+ → R is monotonically increasing or decreasing, one studies the first derivative. The function is (strictly) monotonically increasing if f′(x) ≥ (>) 0 and (strictly) monotonically decreasing if f′(x) ≤ (<) 0 for all x ∈ R_+. The following lemma (see [92, Thm. 3.A.4]) is sometimes called Schur's condition. It provides an approach for testing whether a function of a vector argument is Schur-convex or not. The approach is similar to the one-dimensional case described above.

Lemma 2.5 (Schur's condition). Let I ⊂ R be an open interval and let f : Iⁿ → R be continuously differentiable. Necessary and sufficient conditions for f to be Schur-convex on Iⁿ are

f is symmetric¹ on Iⁿ    (2.1)

and

(x_i − x_j) (∂f/∂x_i − ∂f/∂x_j) ≥ 0 for all 1 ≤ i, j ≤ n.    (2.2)

Remark 2.3. Since f(x) is symmetric, Schur's condition can be reduced as in [92, p. 57] to

(x_1 − x_2) (∂f/∂x_1 − ∂f/∂x_2) ≥ 0.    (2.3)

¹ A function is called symmetric if the argument vector can be arbitrarily permuted without changing the value of the function.
From Lemma 2.5 it follows that f(x) is a Schur-concave function on Iⁿ if f(x) is symmetric and

(x_1 − x_2) (∂f/∂x_1 − ∂f/∂x_2) ≤ 0.    (2.4)

Finally, if we assume the components of the vector to be ordered, the function is Schur-convex if ∂f/∂x_1 − ∂f/∂x_2 ≥ 0.
Remark 2.4. The reduction of Schur's condition to only the largest and second largest components in (2.3) and (2.4) reveals a helpful property. Since the function is assumed to be symmetric, it suffices to consider only two components and keep the other n − 2 components fixed. Often, this reduces the vector problem to a scalar problem by using the parameterization of the two components by a scalar t, as in the introduction to this chapter.
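For Φ₂ from Example 2.5, Schur's condition gives (x_1 − x_2)(∂Φ₂/∂x_1 − ∂Φ₂/∂x_2) = 2(x_1 − x_2)² ≥ 0, so Φ₂ is Schur-convex. The sketch below (our own illustration) checks the defining monotonicity directly: averaging two components of x (a so-called T-transform, a doubly stochastic map) always produces a vector majorized by x, so Φ₂ must not increase:

```python
import random

def phi2(x):
    """Phi_2(x) = sum of squared components (Example 2.5)."""
    return sum(xk * xk for xk in x)

random.seed(1)
for _ in range(1000):
    x = [random.random() for _ in range(5)]
    # T-transform: average two components; the result y is majorized by x.
    i, j = random.sample(range(5), 2)
    lam = random.random()
    y = list(x)
    y[i] = lam * x[i] + (1 - lam) * x[j]
    y[j] = lam * x[j] + (1 - lam) * x[i]
    # Schur-convexity of Phi_2: x majorizes y implies Phi_2(x) >= Phi_2(y).
    assert phi2(y) <= phi2(x) + 1e-12
```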
Remark 2.5. The definition of Schur-convexity and Schur-concavity can be extended if another function Ψ : R → R is applied to Φ(x). Assume that Φ is Schur-concave; if the function Ψ is monotonically increasing, then the composition Ψ(Φ(x)) is Schur-concave, too. If we take, for example, the function Ψ(t) = log(t) for t ∈ R_+ and the Schur-convex function Φ₂ from the example above, we can state that the composition Ψ(Φ₂(x)) is Schur-convex on Rⁿ_+. This result can be generalized to many possible compositions of monotonically increasing as well as decreasing functions, and Schur-convex as well as Schur-concave functions (see Chapter 3 in [92]).

Next, another important class of Schur-convex functions is illustrated.

Definition 2.7 (Symmetric gauge function). A function f : Rⁿ → R is called a symmetric gauge function if

(1) f(u) > 0 for all u ≠ 0,
(2) f(γu) = |γ| f(u) for γ ∈ R,
(3) f(u + v) ≤ f(u) + f(v),
(4) f(u_1, ..., u_n) = f(±u_{i_1}, ..., ±u_{i_n}) for any permutation (i_1, ..., i_n) and any choice of signs.
Example 2.6. The following are examples of symmetric gauge functions:

(a) f(x) = max_i |x_i|,
(b) f(x) = (∑_i |x_i|^p)^{1/p} for p ≥ 1.
Lemma 2.6 (Symmetric gauge functions are Schur-convex). Since every symmetric gauge function is symmetric and convex, it is also Schur-convex.
2.1.3 Majorization for Functions
Consider the transition from finite vectors to the infinite case. The definition of majorization can be easily extended to the case of continuous functions.

Definition 2.8 (Normalized function). A function S is said to be normalized if it fulfills

∫_0^1 S(t) dt = 1.

Definition 2.9 (Majorization for functions). A nondecreasing function S_1 is said to majorize a nondecreasing function S_2 if both are normalized, i.e.,

∫_0^1 S_1(t) dt = ∫_0^1 S_2(t) dt = 1,

and for every θ ∈ [0, 1]

∫_θ^1 S_1(t) dt ≥ ∫_θ^1 S_2(t) dt.
The definitions of Schur-convexity and Schur-concavity can be extended to functions as follows.

Definition 2.10 (Schur-concavity of functions). A real-valued functional H is said to be Schur-concave (resp. Schur-convex) if S_1 majorizes S_2 implies H(S_1) ≤ H(S_2) (resp. H(S_1) ≥ H(S_2)).
Example 2.7. Similar to Example 2.5, consider the functional H(S) = ∫_0^1 S(t)² dt. By approximating the integral by a sum, one sees that this functional is Schur-convex. In fact, the results derived in the next section for vectors of size n can be carried over carefully to these normalized functions as well, by replacing the sum with an integral in most equations.
2.2 Basic Results
In the textbook [92] there are many results that show Schur-convexity for certain classes of functions, for their concatenations, for the application of outer functions, and so on. In the following, the propositions that are useful for later applications are stated and mostly proven in a convenient way, using only the basic definitions and properties from the last section.

Proposition 2.7 (3.C.1 in [92]). If I ⊆ R is an interval and g : I → R is convex and twice differentiable, then

φ(x) = Σ_{k=1}^n g(x_k)

is Schur-convex on I^n. Consequently, x ≻ y on I^n implies φ(x) ≥ φ(y).

Proof. The first part of Schur's condition in (2.3) is verified since φ is obviously symmetric. The difference between the derivatives with respect to x_1 and x_2 is given by

Δ = ∂φ(x)/∂x_1 − ∂φ(x)/∂x_2 = g′(x_1) − g′(x_2) ≥ 0,

because g″(x) ≥ 0 implies that g′(x_1) ≥ g′(x_2) for all x_1 ≥ x_2. Therefore, Schur's condition in (2.3) holds.
Remark 2.6. Proposition 2.7 can also be stated with a concave function g, yielding Schur-concavity.
Proposition 2.8 (Theorem 1 in [93]). If φ is symmetric and convex, then φ is Schur-convex. Consequently, x ≻ y implies φ(x) ≥ φ(y).

Proof. By symmetry, the first part of Schur's condition in (2.3) is verified. Next, let us construct a vector y that is majorized by x. For two components, consider the doubly stochastic matrix

A = [ α  ᾱ ; ᾱ  α ]

with ᾱ = 1 − α and α ∈ [0, 1]. Define y = Ax; then x ≻ y with y_1 = αx_1 + ᾱx_2 and y_2 = ᾱx_1 + αx_2. Because φ is convex,

φ(y_1, y_2) = φ(αx_1 + ᾱx_2, ᾱx_1 + αx_2) = φ(α(x_1, x_2) + ᾱ(x_2, x_1)) ≤ αφ(x_1, x_2) + ᾱφ(x_2, x_1) = φ(x_1, x_2).
Remark 2.7. Note that the reverse statement is not true: there are Schur-convex functions that are not convex [166, Ex. II.3.15]. Let φ : [0, 1]² → R be the function

φ(x_1, x_2) = log(1/x_1 − 1) + log(1/x_2 − 1).

Checking Schur's condition, one observes that φ is Schur-convex on {x ∈ [0, 1]² : x_1 + x_2 ≤ 1}. However, the function log(1/t − 1) is convex on (0, 1/2] but not on [1/2, 1), so φ is not convex.
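Remark 2.7 can be illustrated numerically; the evaluation points below are assumed for the sketch.

```python
import math

# Remark 2.7: phi(x1, x2) = log(1/x1 - 1) + log(1/x2 - 1) is Schur-convex
# on {x1 + x2 <= 1} but not convex.  Checked on specific (assumed) points.
def g(t):
    return math.log(1.0 / t - 1.0)

def phi(x1, x2):
    return g(x1) + g(x2)

# Schur-convexity: [0.6, 0.2] majorizes [0.5, 0.3] (same sum, more spread).
assert phi(0.6, 0.2) >= phi(0.5, 0.3)

# Non-convexity of g on [1/2, 1): the chord over (0.6, 0.9) lies below the graph.
mid = g(0.75)
chord = 0.5 * (g(0.6) + g(0.9))
print(mid > chord)  # True: the convexity requirement g(mid) <= chord fails
```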
2.2.1 Inequalities in Matrix Theory
There are also many applications of Majorization to matrix theory (see Chapter 9 in [92] and [166]).

Proposition 2.9 (Schur inequality). Let H be an n × n Hermitian matrix. The vector of diagonal entries of H is majorized by the vector of eigenvalues of H, i.e., λ(H) ≻ [H_ii]_{i=1}^n.

Proof. By the eigenvalue decomposition, we have H = UΛU^H. The diagonal elements H_11, ..., H_nn of H are

H_ii = Σ_j u_ij u_ij^* λ_j = Σ_j p_ij λ_j,

where p_ij = |u_ij|². Since U is unitary, the matrix P with entries P_ij = p_ij is doubly stochastic. Hence

[H_11, ..., H_nn] = [λ_1, ..., λ_n] P = λP,

and therefore λ ≻ [H_11, ..., H_nn].
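Proposition 2.9 lends itself to a quick numerical sketch, together with the classical determinant form of Hadamard's inequality, det H ≤ ∏_i H_ii, which follows from it. The random test matrix and the seed are illustrative choices.

```python
import numpy as np

# Schur inequality: the diagonal of a Hermitian matrix is majorized by its
# eigenvalues.  A known consequence for positive semidefinite H is
# det(H) <= prod(diag(H)) (Hadamard's determinant inequality).
def majorizes(a, b, tol=1e-6):
    """True if vector a majorizes b: equal sums, dominating partial sums."""
    a = np.sort(np.asarray(a))[::-1]
    b = np.sort(np.asarray(b))[::-1]
    return (abs(a.sum() - b.sum()) <= tol
            and np.all(np.cumsum(a) >= np.cumsum(b) - tol))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = X @ X.conj().T                 # positive semidefinite, Hermitian
lam = np.linalg.eigvalsh(H)        # real eigenvalues
diag = np.real(np.diag(H))
assert majorizes(lam, diag)
print(np.prod(diag) >= np.linalg.det(H).real)  # True
```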
Proposition 2.10 (Hadamard inequality). If H is an n × n positive semidefinite matrix such that H_11 ≥ ··· ≥ H_nn, then

∏_{k=l}^n H_kk ≥ ∏_{k=l}^n λ_k(H)    (2.5)

for all l = 1, ..., n, with ordered eigenvalues λ_1(H) ≥ ··· ≥ λ_n(H). Equality holds if H is diagonal.

Proof. Use Proposition 2.7 with g(x) = log x: g is concave, and λ ≻ [H_11, ..., H_nn] by Proposition 2.9.

The eigenvalues of the sum of Hermitian matrices are characterized in many different ways [82]. In the following, a result is stated without proof that uses Majorization theory to give bounds on the spectrum of the sum of Hermitian matrices. The proofs can be found in the textbook [92] and in [8, Thm. III.2.1].
Theorem 2.11 (Weyl's inequalities, 9.G.1 in [92]). If G and H are n × n positive semidefinite matrices with ordered eigenvalues λ_1 ≥ ··· ≥ λ_n ≥ 0, then

(λ_1(G) + λ_n(H), ..., λ_n(G) + λ_1(H)) ≺ (λ_1(G + H), ..., λ_n(G + H)) ≺ (λ_1(G) + λ_1(H), ..., λ_n(G) + λ_n(H)).
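The outer rows of Theorem 2.11 imply the scalar bounds λ_1(G) + λ_n(H) ≤ λ_1(G + H) ≤ λ_1(G) + λ_1(H), which can be spot-checked numerically; the random test matrices are an assumption of this sketch.

```python
import numpy as np

# Spot check of the extreme rows of Weyl's inequalities on random PSD matrices.
rng = np.random.default_rng(2)
A, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
G, H = A @ A.T, B @ B.T                      # positive semidefinite
lmax = lambda M: np.linalg.eigvalsh(M)[-1]   # largest eigenvalue
lmin = lambda M: np.linalg.eigvalsh(M)[0]    # smallest eigenvalue

assert lmax(G + H) <= lmax(G) + lmax(H) + 1e-9   # upper Weyl bound
assert lmax(G + H) >= lmax(G) + lmin(H) - 1e-9   # lower Weyl bound
print("Weyl bounds hold")
```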
Corollary 2.1 (Weyl’s first inequality). If G and H are n × n positive semi-definite matrices with ordered eigenvalues λ1 ≥ · · · ≥ λn ≥ 0, then λ1 (G + H) ≤ λ1 (G) + λ1 (H). Most proofs of Weyl’s inequalities are based on the following theorem.
Theorem 2.12 (20.A.2 in [92]). If H is an n × n positive semidefinite matrix with ordered eigenvalues λ_1 ≥ ··· ≥ λ_n ≥ 0, then

max_U tr(U H U^H) = Σ_{l=1}^k λ_l(H)   and   min_U tr(U H U^H) = Σ_{l=1}^k λ_{n−l+1}(H),

where k = 1, ..., n and the extrema are over k × n complex matrices U satisfying U U^H = I_k.

The multiplicative version of the above result characterizes the spectrum of the product of Hermitian matrices.
Theorem 2.13 (H.1.a in [92]). If G and H are full-rank n × n positive semidefinite matrices, then

(log λ_1(GH), ..., log λ_n(GH)) ≺ (log λ_1(G) + log λ_1(H), ..., log λ_n(G) + log λ_n(H)).    (2.6)
Remark 2.8. The statement in (2.6) leads by Theorem 2.4 directly to

(λ_1(GH), ..., λ_n(GH)) ≺_w (λ_1(G)λ_1(H), ..., λ_n(G)λ_n(H)).

2.2.2 Stochastic Majorization
Proposition 2.14 (Jensen's inequality). Let X be a random variable taking values in the open convex set X with finite expectation E[X]. If φ : X → R is convex, then

E[φ(X)] ≥ φ(E[X])    (2.7)

with equality if and only if φ is affine on the convex hull of the support of X. Conversely, if (2.7) holds for all random variables X taking values in X such that the expectation exists, then φ is convex. A proof of this proposition can be found, e.g., in [92, 16.C.1].

Theorem 2.15 (Theorem in [93]). If X_1, ..., X_n are exchangeable random variables, a ≻ b, and φ is a continuous convex function that is symmetric in its n arguments, then

E[φ(a_1 X_1, ..., a_n X_n)] ≥ E[φ(b_1 X_1, ..., b_n X_n)].

If φ is strictly convex, equality occurs only when a = b, possibly after reordering components, or when the X_i are zero with probability one.
Proof. Define the vector a = [a_1, ..., a_n]. The function ψ(a) = E[φ(a_1 X_1, ..., a_n X_n)] is obviously convex because φ is assumed to be convex. To show that ψ is symmetric, the exchangeability of X_1, ..., X_n and the symmetry of φ are used. Since ψ is symmetric and convex, it is Schur-convex by Proposition 2.8, which proves the claim.

Next, a number of novel Schur-convexity and Schur-concavity results are listed for certain classes of functions. In later sections, these functions will obtain an operational meaning in multiantenna and multiuser communications theory.

2.2.3 Average Scalar Function of Sum of Weighted Random Variables
Assume that w_1, ..., w_n are independent and identically distributed (iid) random variables following some probability density function (pdf). A special case is that of s_1, ..., s_n being iid standard exponentially distributed, i.e., p(s_1) = exp(−s_1). Furthermore, the vector μ has non-negative entries ordered in non-increasing order μ_1 ≥ μ_2 ≥ ··· ≥ μ_n ≥ 0.
Lemma 2.16 (Average of function of weighted sum). Suppose the function f : R_+ → R_+ is concave. Then

G(μ) = E_{w_1,...,w_n}[ f( Σ_{k=1}^n μ_k w_k ) ]    (2.8)

is Schur-concave. If f is convex, then the function G in (2.8) is Schur-convex.

Proof. Check the conditions of Theorem 2.15. Since w_1, ..., w_n are exchangeable random variables, the function is symmetric with respect to μ. Furthermore, the function f(Σ_{k=1}^n μ_k w_k) is jointly concave (convex) in μ_1, ..., μ_n, and hence Schur-concavity (Schur-convexity) follows.
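Lemma 2.16 can be illustrated by Monte Carlo simulation. The weight vectors [2, 0] ≻ [1, 1], the choice f = √·, and the sample size are assumptions of this sketch.

```python
import math, random

# Monte Carlo sketch of Lemma 2.16: for concave f (here sqrt) and iid
# exponential weights, G(mu) = E[f(sum mu_k w_k)] is Schur-concave, so the
# spread-out vector [2, 0] (which majorizes [1, 1]) gives the smaller average.
random.seed(0)

def G(mu, samples):
    return sum(math.sqrt(mu[0] * w1 + mu[1] * w2) for w1, w2 in samples) / len(samples)

samples = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(200_000)]
g_spread = G([2.0, 0.0], samples)   # majorizing weight vector
g_even   = G([1.0, 1.0], samples)   # majorized weight vector
print(g_spread < g_even)            # True: consistent with Schur-concavity
```

Using the same samples for both weight vectors (common random numbers) keeps the comparison noise low.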
2.2.4 Average Scalar Function of Maximum of Weighted Random Variables
Assume that w_1, ..., w_n are iid random variables following a Gamma distribution with scale parameter λ = 1 and shape parameter corresponding to the diversity order, i.e.,

p(w) = w^{k−1} exp(−w) / Γ(k).

Furthermore, the vector μ has non-negative entries ordered in non-increasing order μ_1 ≥ μ_2 ≥ ··· ≥ μ_n ≥ 0. Denote the cumulative distribution function (cdf) of the maximum max_k μ_k w_k by P(x), i.e.,

P(x) = ∏_{k=1}^n γ(n, x/μ_k)/Γ(n) = ∏_{k=1}^n ( 1 − exp(−x/μ_k) Σ_{l=0}^{n−1} (x/μ_k)^l / l! )

with the incomplete Gamma function γ(n, x) = ∫_0^x t^{n−1} exp(−t) dt.
Theorem 2.17 (Average of maximum of weighted random variables). Assume that the function f satisfies the following properties for constants ε_1 and ε_2:

(1) lim_{x→0} f(x)(P(x) + ε_1) = c_1 < ∞,
(2) lim_{x→∞} f(x)(P(x) + ε_1) = c_2 < ∞, and
(3) ∫_0^∞ f′(x)(P(x) + ε_2) dx = c_3 < ∞.

If, in addition, f is monotonically increasing and concave, then

G(μ) = E_{w_1,...,w_n}[ f( max_{1≤k≤n} μ_k w_k ) ]    (2.9)

is Schur-convex. If f has the above properties and is monotonically decreasing and convex, then G in (2.9) is Schur-concave.

Proof. Since the random variables w_1, ..., w_n are exchangeable, the function is symmetric with respect to μ. The expectation in (2.9) can
be written, using integration by parts, as

∫_0^∞ f(x) P′(x) dx = [ f(x)(P(x) + ε_1) ]_0^∞ − ∫_0^∞ f′(x)(P(x) + ε_2) dx.

The constants ε_1 and ε_2 arise because d(P(x) + ε)/dx = P′(x). With assumptions (1) and (2) the first term on the right-hand side exists; the second term exists by the third assumption. Note that c_1 and c_2 are independent of μ, so it suffices to study

G̃(μ) = − ∫_0^∞ f′(x) ∏_{k=1}^n γ(n, x/μ_k)/Γ(n) dx.

Next, check Schur's condition directly, i.e.,

Δ = ∂G̃(μ)/∂μ_2 − ∂G̃(μ)/∂μ_1 = ∫_0^∞ f′(x) ∏_{k=3}^n [γ(n, x/μ_k)/Γ(n)] γ(x, μ_1, μ_2) dx

with

γ(x, μ_1, μ_2) = [γ(n, x/μ_1)/Γ(n)] (x/μ_2²)(x/μ_2)^{n−1} exp(−x/μ_2)/Γ(n) − [γ(n, x/μ_2)/Γ(n)] (x/μ_1²)(x/μ_1)^{n−1} exp(−x/μ_1)/Γ(n).

For all x ≥ 0 and μ_1 > μ_2, the function γ(x, μ_1, μ_2) is positive for x ≤ x* and negative for x ≥ x*. First, consider f monotonically increasing and concave. Then, since f′ is non-increasing,

∫_0^∞ f′(x) γ(x, μ_1, μ_2) dx ≤ f′(x*) ∫_0^∞ γ(x, μ_1, μ_2) dx.    (2.10)

Finally, it holds that ∫_0^∞ γ(x, μ_1, μ_2) dx ≤ 0 for all μ_1, μ_2. Therefore, the Schur-convexity follows. If f is monotonically decreasing and convex, then

∫_0^∞ f′(x) γ(x, μ_2, μ_1) dx ≥ f′(x*) ∫_0^∞ γ(x, μ_2, μ_1) dx ≥ 0    (2.11)

and the Schur-concavity follows in this case.
Example 2.8. We provide two examples to show the application of Theorem 2.17. First, let f(x) = log(1 + x). Choose ε_1 = ε_2 = −1 to fulfill the conditions. The first condition follows directly:

lim_{x→0} log(1 + x)(P(x) − 1) = 0.

The second condition is checked by noting that the incomplete Gamma function can be rewritten as γ(n, x) = (n − 1)! [1 − exp(−x) Σ_{k=0}^{n−1} x^k/k!], i.e., P(x) approaches one at least as fast as 1 − exp(−x) up to polynomial factors, and hence

lim_{x→∞} log(1 + x)(P(x) − 1) = lim_{x→∞} log(1 + x) exp(−x) = 0.

The next example chooses f(x) = 1/(1 + x). Again, let ε_1 = ε_2 = −1 to get lim_{x→0} (P(x) − 1)/(1 + x) = −1 and lim_{x→∞} exp(−x)/(1 + x) = 0.
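Theorem 2.17 with f(x) = log(1 + x) from Example 2.8 (and exponential weights, i.e., shape one) can likewise be checked by Monte Carlo; the test vectors and sample size are illustrative assumptions.

```python
import math, random

# Monte Carlo sketch of Theorem 2.17: for f increasing and concave
# (f = log(1+x), Example 2.8) and exponential weights,
# G(mu) = E[f(max_k mu_k w_k)] is Schur-convex, so the spread-out vector
# [2, 0] (which majorizes [1, 1]) gives the larger average.
random.seed(1)

def G(mu, samples):
    return sum(math.log(1.0 + max(mu[0] * w1, mu[1] * w2))
               for w1, w2 in samples) / len(samples)

samples = [(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(200_000)]
g_spread = G([2.0, 0.0], samples)
g_even   = G([1.0, 1.0], samples)
print(g_spread > g_even)  # True: consistent with Schur-convexity
```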
2.2.5 Average of Scalar Function of Maximized Weighted Sum
Assume that w_1, ..., w_n are iid random variables following a Gamma distribution with parameters λ = 1 and k > 0. Define a vector p = [p_1, ..., p_n] with non-negative entries. Furthermore, the vector μ has non-negative entries ordered in non-increasing order μ_1 ≥ μ_2 ≥ ··· ≥ μ_n ≥ 0.

Theorem 2.18 (Average of maximized weighted sum). Consider the function f(x) = log(1 + x). Then

H(μ) = max_{p : p_k ≥ 0, |p| ≤ P} E[ f( Σ_{k=1}^n p_k μ_k w_k ) ]

is Schur-convex with respect to μ.

The proof can be found in [68]. Interestingly, this result implies that the optimal p* ≻ μ. Note that this result cannot be generalized to an arbitrary concave function f.
2.2.6 Probability of Scalar Functions of Sums of Weighted Random Variables
Assume again that w_1, ..., w_n are iid random variables following a Gamma distribution with scale parameter λ = 1 and shape k, i.e., p(w) = w^{k−1} exp(−w)/Γ(k). For k = 1, the special case arises in which s_1, ..., s_n are iid standard exponentially distributed, i.e., p(s_1) = exp(−s_1). Furthermore, the vector μ has non-negative entries ordered in non-increasing order μ_1 ≥ μ_2 ≥ ··· ≥ μ_n ≥ 0. In contrast to the clear behavior of the average in Lemma 2.16, the probability of a function of the weighted sum of the random variables w_1, ..., w_n shows varying behavior.

Theorem 2.19 (Probability of weighted sum). Suppose that the inverse function f^{−1} of f exists and f is non-negative. If f^{−1}(x) ≥ 2 and μ^(1) ≻ μ^(2), then

Pr[ f( Σ_{k=1}^n μ_k^(1) w_k ) ≤ x ] ≤ Pr[ f( Σ_{k=1}^n μ_k^(2) w_k ) ≤ x ].

If f^{−1}(x) ≤ 1 and μ^(1) ≻ μ^(2), then

Pr[ f( Σ_{k=1}^n μ_k^(1) w_k ) ≤ x ] ≥ Pr[ f( Σ_{k=1}^n μ_k^(2) w_k ) ≤ x ].

The result says that Pr[f(Σ_{k=1}^n μ_k w_k) ≤ x] is Schur-concave in μ for x ≥ f(2) and Schur-convex for x ≤ f(1). The proof can be found in [73]. The special case of Gaussian random variables was solved in [133]. It relies heavily on the properties of the distribution of the sum of weighted random variables, especially on the unimodality of the corresponding pdf. With respect to majorization there is no clear behavior in the interval f(1) < x < f(2). However, the minimum of the probability can be characterized in closed form [73, Thm. 3].

2.2.7 Average Trace of Function of Sum of Weighted Dyads
Consider the random vectors h1 , . . . , hn of size m with iid zero-mean complex Gaussian distributed entries with variance one. Furthermore,
the vector μ has non-negative entries ordered in non-increasing order μ_1 ≥ μ_2 ≥ ··· ≥ μ_n ≥ 0.

Theorem 2.20 (Average trace of weighted dyadic sum). Assume that φ is a matrix-convex function.² Then, the function

Φ(μ) = E_{h_1,...,h_n}[ tr φ( Σ_{k=1}^n μ_k h_k h_k^H ) ]    (2.12)

is Schur-convex with respect to μ. If φ is matrix-concave, the function Φ(μ) is Schur-concave.

Proof. For verifying Schur's condition, the first derivatives of (2.12) with respect to μ_1 and μ_2 are important. These partial derivatives are given by

∂Φ(μ)/∂μ_1 = E[ tr( h_1 h_1^H · φ^[1]( Σ_{k=1}^n μ_k h_k h_k^H ) ) ]    (2.13)
∂Φ(μ)/∂μ_2 = E[ tr( h_2 h_2^H · φ^[1]( Σ_{k=1}^n μ_k h_k h_k^H ) ) ].    (2.14)

The two largest components μ_1 and μ_2 are parameterized by μ_1 = μ + t and μ_2 = μ − t. Then, the difference between the first derivatives in (2.13) and (2.14) is a function of t given by

Γ(t) = E[ tr( Δ · φ^[1](R + tΔ) ) ]    (2.15)

with

W_k = h_k h_k^H,   R = μ(W_1 + W_2) + Σ_{k=3}^n μ_k W_k,   and   Δ = W_1 − W_2.

Next, the matrix-monotone function φ(A) can be written as³

φ(A) = ∫_0^∞ sA [sI + A]^{−1} dμ(s)

² This class of functions is defined in Subsection 3.1.2.
³ See the representation theorem in Subsection 3.2.1.
with probability measure dμ(s). The "first derivative" as defined in Section 3.2.2 is then given by

φ^[1](A) = ∫_0^∞ s² [sI + A]^{−2} dμ(s).    (2.16)
The result from (2.16) is inserted into Γ(t) in (2.15), and integration and expectation are exchanged to obtain

Γ(t) = ∫_0^∞ s² E[ tr( Δ · [R(s) + tΔ]^{−2} ) ] dμ(s)    (2.17)

with R(s) = sI + μ(W_1 + W_2) + Σ_{k=3}^n μ_k W_k. Finally, we study the trace expression in (2.17) as a function of t. To this end, we define for all s ≥ 0

M(t) = tr( Δ · [R(s) + tΔ]^{−2} ).    (2.18)

Note that Γ(0) = 0. The first derivative of M(t) with respect to t is smaller than or equal to zero for all s ≥ 0 and all W_1, W_2. Therefore, the integral in (2.17) is smaller than or equal to zero, because the outer integral is taken over a positive measure. With T = [R(s) + tΔ]^{−1} it holds that

∂M(t)/∂t = − tr(ΔTTΔT) − tr(ΔTΔTT) = −2 tr( T · Q · T )   with   Q = ΔTΔ.    (2.19)

Note that the matrix T is positive definite. Finally, the matrix Q can be written as

Q = W_1 T W_1 − W_2 T W_1 − W_1 T W_2 + W_2 T W_2 = [W_1 T^{1/2} − W_2 T^{1/2}] [T^{1/2} W_1 − T^{1/2} W_2] = C C^H ≥ 0.    (2.20)

Inequality (2.20) shows that the matrix Q is positive semidefinite, and therefore the first derivative of M(t) with respect to t in (2.19) is smaller than or equal to zero. Hence Γ(t) in (2.17) is smaller than or equal to zero, which verifies Schur's condition and completes the proof.
Theorem 2.20 is a generalization of Theorem 2.15 to the matrix case.
2.3 Majorization and Optimization
It was mentioned in the introduction of this chapter that Majorization theory is applied successfully to optimization. In this section, results regarding the optimization of certain Schur-convex and Schur-concave functions are listed and proven. The simplest case is a programming problem in which a Schur-concave function is to be maximized and the constraint set fits the ℓ_1-norm constraint, i.e., the optimization vector x is restricted by ||x||_{ℓ_1} = X.

Theorem 2.21. Consider the Schur-concave function f : R_+^n → R_+ and the following programming problem:

max_x f(x)   s.t.   x ≥ 0, |x| = X.    (2.21)

Then the global maximum is achieved by x_ = (X/n)·1. In contrast, the corresponding minimization problem

min_x f(x)   s.t.   x ≥ 0, |x| = X    (2.22)

has the global minimum x̄ = [X, 0, 0, ..., 0].
Proof. The proof follows directly from the fact that every feasible x satisfies

x_ = (X/n)·1 ≺ x ≺ [X, 0, 0, ..., 0] = x̄,

as well as from the Schur-concavity of the objective function, i.e., f(x̄) ≤ f(x) ≤ f(x_).

Example 2.9. Let f(x) = Σ_{k=1}^n √x_k and X = 1. From Proposition 2.7 (in its concave version, Remark 2.6) it follows that f is Schur-concave. Then from Theorem 2.21 it follows that

f(x̄) = 1 ≤ f(x) ≤ √n = f(x_).
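Theorem 2.21 and Example 2.9 can be verified numerically on random feasible points; the dimension and the sampling scheme are assumptions of this sketch.

```python
import math, random

# Example 2.9: for the Schur-concave f(x) = sum sqrt(x_k) with |x| = 1,
# every feasible x satisfies f(corner) <= f(x) <= f(uniform).
random.seed(2)
n = 4

def f(x):
    return sum(math.sqrt(t) for t in x)

lo = f([1.0] + [0.0] * (n - 1))   # corner point, value 1
hi = f([1.0 / n] * n)             # uniform point, value sqrt(n)
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    x = [t / sum(raw) for t in raw]   # random feasible point on the simplex
    assert lo - 1e-9 <= f(x) <= hi + 1e-9
print(lo, hi)  # 1.0 and sqrt(n)
```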
Example 2.10. Order the components of x in decreasing order, i.e., x_1 ≥ x_2 ≥ ··· ≥ x_n ≥ 0. Another example is f(x) = x_1 = max_{1≤k≤n} x_k. This function is obviously Schur-convex; the maximum is achieved by x̄ and the minimum by x_.

Next, the question arises what happens if the equality constraint in the optimization problems (2.21) and (2.22) is relaxed, i.e., if the constraint set is, e.g., |x| ≤ X. The corresponding programming problem can be stated as

max_{0 ≤ ξ ≤ X} ( max_{x ≥ 0, |x| = ξ} f(x) ).    (2.23)

For the optimum of (2.23), it follows that x* = (ξ*/n)·1 if f is Schur-concave, where ·* denotes the optimum.

Example 2.11. f(x) = Σ_{k=1}^n √x_k is Schur-concave and monotonically increasing in the ℓ_1 norm of x. Therefore, the solution to (2.23) is still x* = (X/n)·1.

Example 2.12. Consider f(x) = |x|(2 − |x|) Σ_{k=1}^n √x_k. This function is also Schur-concave, but it is not monotonically increasing in the ℓ_1 norm of x. Inserting the optimum x(ξ)* = (ξ/n)·1 into the function yields the scalar function f(ξ) = √n ξ^{3/2}(2 − ξ). The global maximum is attained at ξ* = 6/5, i.e., x* = (6/(5n))·1.

The generalization of this approach is proven in [168, Thm. 2.4].

Theorem 2.22. Let f : R_+^n → R_+ be a Schur-convex function with only one critical point a. If f(a) is a local minimum, then it must be the global minimum.
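As a closing check on Example 2.12, the scalar reduction f(ξ) = √n ξ^{3/2}(2 − ξ) can be maximized by a simple grid search (the grid resolution is an implementation choice):

```python
# Example 2.12: after restricting to x = (xi/n) * 1, the objective becomes
# f(xi) = sqrt(n) * xi**1.5 * (2 - xi); its maximizer on (0, 2) is xi* = 6/5.
n = 3
f = lambda xi: (n ** 0.5) * xi ** 1.5 * (2.0 - xi)
grid = [i / 100000 for i in range(1, 200000)]   # xi in (0, 2)
xi_star = max(grid, key=f)
print(round(xi_star, 3))  # ~1.2 = 6/5
```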
3 Matrix-Monotone Functions
The last chapter discussed a certain partial order for vectors, Majorization, and characterized the order-preserving functions: Schur-convex and Schur-concave functions. This chapter introduces a certain order on matrices, namely the Löwner order. The order-preserving functions are called matrix-monotone functions, and their representations and properties will be discussed.
3.1 Definition and Examples
To define monotonicity for matrix-valued functions of matrices, a partial order on the set of all n × n positive semidefinite matrices is required. Define the set of all positive semidefinite matrices of size n × n as H_n. One standard ordering, called the Löwner ordering, is given by

A ≤ B meaning B − A is positive semidefinite,
A < B meaning B − A is positive definite.

In order to transfer the notions of monotonicity and convexity to matrices and matrix-valued functions, the following definitions are helpful.
Let the function φ be an arbitrary matrix-valued function that maps from the set of positive semidefinite matrices to the set of positive semidefinite matrices. The function φ is defined as follows [8, 51].

Definition 3.1 (Matrix function). Let the eigenvalue decomposition of A be given as A = UΛU^H. Then φ(A) means

φ(A) = U φ(Λ) U^H   with   φ(Λ) = diag(φ(λ_1), ..., φ(λ_n)).

That means that the function φ acts only on the eigenvalues of the matrix A and keeps the eigenvectors unaltered.
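Definition 3.1 translates directly into code. The sanity checks against A² and A^{−1} below use an assumed random test matrix.

```python
import numpy as np

# Definition 3.1: phi acts on a Hermitian matrix through its eigenvalues,
# leaving the eigenvectors unaltered.
def matrix_function(phi, A):
    lam, U = np.linalg.eigh(A)                   # A = U diag(lam) U^H
    return U @ np.diag(phi(lam)) @ U.conj().T

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 3))
A = X @ X.T + np.eye(3)                          # positive definite test matrix

# phi(t) = t^2 must reproduce the ordinary matrix square ...
assert np.allclose(matrix_function(lambda t: t ** 2, A), A @ A)
# ... and phi(t) = 1/t the matrix inverse.
assert np.allclose(matrix_function(lambda t: 1.0 / t, A), np.linalg.inv(A))
print("matrix function consistent with Definition 3.1")
```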
Remark 3.1. There are many possible definitions of how a function acts on a matrix. Definition 3.1 is motivated by the Spectral Theorem (Proposition 6.2 in Section 6.1). It follows that the matrix function φ can be regarded as a scalar function f : R_+ → R_+. The interested reader is referred to Chapter 6 of [51].

3.1.1 Matrix Monotonicity
The concept of matrix-monotonicity was first studied in [88].

Definition 3.2 (Matrix-increasing). Let A ⊂ H_n. A function φ : A → H_n is matrix-increasing of order n on A if

A ≤ B   implies   φ(A) ≤ φ(B)    (3.1)

for A, B ∈ A. The function is strictly matrix-increasing of order n on A if

A < B   implies   φ(A) < φ(B)

for A, B ∈ A.

Remark 3.2. If a function is matrix-increasing for all orders n ≥ 1, it is simply called matrix-increasing or sometimes matrix-monotone.
Recently, it was shown that the class of matrix-monotone functions of order n + 1 is a proper subset of the class of matrix-monotone functions of order n [45]. Let us study some examples and counterexamples of matrix-monotonicity.

Lemma 3.1 (16.E.3.b in [92]). On the set of positive definite matrices, the function φ(A) = A^{−1} is strictly decreasing.

Proof. Let Q_1 ≥ Q_2 > 0 and define g(t) = (tQ_1 + (1 − t)Q_2)^{−1} = Q(t)^{−1}. It suffices to prove that g is decreasing in 0 ≤ t ≤ 1. Note that Q(t)Q(t)^{−1} = I. As a result,

(∂Q(t)/∂t) Q^{−1}(t) + Q(t) (∂Q(t)^{−1}/∂t) = 0.

Hence,

∂Q(t)^{−1}/∂t = −Q(t)^{−1} (∂Q(t)/∂t) Q(t)^{−1} = −Q(t)^{−1} (Q_1 − Q_2) Q(t)^{−1},

which is negative definite for Q_1 > Q_2, proving the strict decrease.

Example 3.1. On the set of positive definite matrices, the function φ(A) = A² is not strictly increasing. Consider the following counterexample:

A = [ 40  8 ; 8  20 ],   B = [ 32  0 ; 0  11 ],   A − B = [ 8  8 ; 8  9 ],   A² − B² = [ 640  480 ; 480  343 ].

Here A − B is positive definite, but the eigenvalues of A² − B² are −10.94626578 and 993.9462658, so A² − B² is indefinite. The example is a special case of [92, 16.E.4], and it shows that conclusions from the scalar case must be drawn very carefully in the matrix case.
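Example 3.1 can be reproduced numerically:

```python
import numpy as np

# Example 3.1: A >= B >= 0 but A^2 - B^2 is indefinite, so phi(A) = A^2 is
# not matrix-increasing.
A = np.array([[40.0, 8.0], [8.0, 20.0]])
B = np.array([[32.0, 0.0], [0.0, 11.0]])

assert np.all(np.linalg.eigvalsh(A - B) >= 0)   # A - B is positive (semi)definite
eig = np.linalg.eigvalsh(A @ A - B @ B)
print(eig)  # one negative and one positive eigenvalue: A^2 - B^2 is indefinite
assert eig[0] < 0 < eig[1]
```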
3.1.2 Matrix Convexity
The concept of matrix-convexity was first studied in [83].

Definition 3.3 (Matrix-convex). Let A ⊂ H_n. A function φ : A → H_n is matrix-convex of order n on A if

φ(αA + ᾱB) ≤ αφ(A) + ᾱφ(B)   for all α ∈ [0, 1] and A, B ∈ A    (3.2)

with ᾱ = 1 − α. φ is strictly matrix-convex of order n on A if

φ(αA + ᾱB) < αφ(A) + ᾱφ(B)   for all α ∈ (0, 1) and A ≠ B ∈ A.

A function is simply called matrix-convex if it is matrix-convex for all orders n ≥ 1.
Theorem 3.2 (Theorem 2.5 in [46]). A nonnegative continuous function on [0, ∞) is operator monotone if and only if it is operator concave.
Remark 3.3. However, not every matrix-convex function is necessarily matrix-monotone. The function φ(A) = A2 is matrix-convex but not matrix-monotone as shown in Example 3.1.
Proposition 3.3 (Proposition 16.E.6 in [92]). Let φ be a function defined on a convex set A of m × k matrices, taking values in H_n for some n. If A is open and the function g below is twice differentiable for all A, B ∈ A, the following are equivalent:

(1) φ is matrix-convex on A.
(2) For all fixed A and B in A, the function g(α) = φ(αA + ᾱB) is convex in α ∈ [0, 1], in the sense that ηg(α) + η̄g(β) − g(ηα + η̄β) is positive semidefinite for all α, β, η ∈ [0, 1].
(3) For all fixed A, B ∈ A, d²g(α)/dα² is positive semidefinite for 0 < α < 1.
Proof. Matrix convexity on A is fulfilled if and only if, for all x ∈ R^n, ψ(A) = x φ(A) x^H is convex. ψ(A) is convex on A if and only if ψ″(A) = x (d²g(α)/dα²) x^H ≥ 0 for all A ∈ A.

The next example illustrates the notion of matrix-concavity.

Example 3.2. Consider f(X) = log(I + X). In order to show that f is matrix-concave, we need the following identity:

d²/dα² log(I + αX) = −X [I + αX]^{−2} X ≤ 0.

As a result, for g(α) = log(I + αX + ᾱY) it holds that

d²g(α)/dα² ≤ 0,

and matrix-concavity follows by Proposition 3.3. Matrix-monotonicity then follows from Theorem 3.2.

The properties of the matrix function φ transfer to the scalar function φ, but not vice versa.

Corollary 3.1. Every matrix-monotone function is monotonic (increasing or decreasing), whereas not every monotonic function is matrix-monotone. Every matrix-convex function is convex, whereas not every convex function is matrix-convex.
Example 3.3. The function φ(x) = x2 is increasing and convex but not matrix monotone. The function φ(x) = exp(x) is convex but not matrix convex. The connection between the scalar properties of monotonicity and concavity and the matrix-monotonicity is illustrated in the Venn-diagram in Figure 3.1.
Fig. 3.1 Venn-diagram: Matrix-monotone functions are matrix-concave, concave, and monotone.
3.1.3 Fréchet Derivative
Corresponding to the first and second derivatives of scalar functions, there exists a derivative of the matrix-valued function φ. We follow closely the derivation in [8, Sec. V.3 and Sec. X.4].

Definition 3.4 (Fréchet differentiable). The map φ is called (Fréchet) differentiable at A if there exists a linear transformation Dφ(A) on the space of positive semidefinite matrices such that for all H

||φ(A + H) − φ(A) − Dφ(A)(H)|| = o(||H||)    (3.3)

holds. The linear operator Dφ(A)(H) is then called the derivative of φ at A in direction H.

The difference from the scalar case is that a direction H is needed to define the derivative.

Lemma 3.4 (First derivative in a direction). The first derivative of φ(A) = A^p in direction B is given by

Dφ(A)(B) = Σ_{k=1}^p A^{k−1} B A^{p−k}.
Proof. Compute the first derivative of φ(ε) = (A + εB)^p with respect to ε at the point ε = 0. Using the product rule, it holds that

dφ(ε)/dε = (d(A + εB)/dε)(A + εB)^{p−1} + (A + εB)(d(A + εB)/dε)(A + εB)^{p−2} + ··· + (A + εB)^{p−1}(d(A + εB)/dε)
         = Σ_{k=1}^p (A + εB)^{k−1} (d(A + εB)/dε) (A + εB)^{p−k}
         = Σ_{k=1}^p (A + εB)^{k−1} B (A + εB)^{p−k}.

At the point ε = 0, we have

dφ(ε)/dε |_{ε=0} = Σ_{k=1}^p A^{k−1} B A^{p−k}.

This completes the proof.
Example 3.4. Next, three examples are provided to illustrate the notion of the derivative of a matrix function.

(1) Let φ(A) = A². Then Dφ(A)(B) = AB + BA.
(2) Let φ(A) = A^{−1} for invertible A. Then Dφ(A)(B) = −A^{−1} B A^{−1}.
(3) Let φ(A) = A^H A. Then Dφ(A)(B) = A^H B + B^H A.
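Lemma 3.4 and Example 3.4(1) can be checked against a finite-difference approximation; the step size and the random test matrices are assumptions of this sketch.

```python
import numpy as np

# Frechet derivative check: for phi(A) = A^2, the derivative in direction B
# is AB + BA (Example 3.4(1)); compare with (phi(A + eps*B) - phi(A)) / eps.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)); A = A @ A.T        # symmetric PSD
B = rng.standard_normal((3, 3)); B = (B + B.T) / 2  # symmetric direction

eps = 1e-6
fd = ((A + eps * B) @ (A + eps * B) - A @ A) / eps  # finite difference
assert np.allclose(fd, A @ B + B @ A, atol=1e-4)
print("D(A^2)(B) = AB + BA confirmed numerically")
```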
The derivative is linear, i.e., D(φ_1 + φ_2)(A) = Dφ_1(A) + Dφ_2(A). The composite of two differentiable maps φ and γ is differentiable.
The chain rule is D(γ ∘ φ)(A) = Dγ(φ(A)) · Dφ(A). Interestingly, if the trace operator is applied to the function φ, the following result can be proven; it is a special case of the results in [47].

Lemma 3.5 (Derivative of trace of matrix function). Let φ be a continuous analytic function mapping positive semidefinite matrices to positive semidefinite matrices. Then, the first derivative of the function tr[φ(C + εD)] with respect to ε at the point ε = 0 is given by

(∂/∂ε) tr[φ(C + εD)] |_{ε=0} = tr( φ′(C) · D )    (3.4)

with the ordinary derivative φ′ of the (scalar) function φ.

Proof. Since the function is continuous and analytic, it can be approximated arbitrarily well by a polynomial. Both sides of (3.4) are linear in φ; therefore, it suffices to prove (3.4) for the powers φ(t) = t^p with p ∈ N_+. It holds that

(∂/∂ε) tr[φ(C + εD)] |_{ε=0} = tr( Σ_{k=1}^p C^{k−1} D C^{p−k} ) = tr( DC^{p−1} + CDC^{p−2} + ··· ) = p · tr( DC^{p−1} ) = tr( φ′(C) · D ).
Example 3.5. Consider the function φ(A) = A^k. The first derivative of tr φ at point A in direction D is given by

(∂/∂ε) tr[φ(A + εD)] |_{ε=0} = k tr( A^{k−1} D ).
Definition 3.5 (Twice Fréchet differentiable). Let φ be a differentiable map from X to Y. At each point u, the derivative Dφ(u) is an element of the Banach space L(X, Y). This yields a map Dφ from X into L(X, Y), defined as Dφ : u → Dφ(u). If this map is differentiable at u, the derivative of the map Dφ at the point u is called the second derivative of φ at u. It is denoted by D²φ(u).
Remark 3.4. The second derivative of a function φ at point X has two parameters, i.e., D2 φ(X)(A)(B) = D2 φ(X)(B)(A) and it is symmetric in these two parameters.
Example 3.6. Two examples of second derivatives are provided.

(1) Let φ(A) = A³. The first derivative is Dφ(A)(B) = A²B + ABA + BA². The second derivative is

D²φ(A)(B_1, B_2) = AB_1B_2 + B_1AB_2 + B_1B_2A + AB_2B_1 + B_2AB_1 + B_2B_1A.

(2) Let φ(A) = A^{−1}. We have seen that Dφ(A)(B) = −A^{−1}BA^{−1}. The second derivative is

D²φ(A)(B_1, B_2) = A^{−1}B_1A^{−1}B_2A^{−1} + A^{−1}B_2A^{−1}B_1A^{−1}.
3.1.4 First Divided Difference
In the following, we define the first divided difference of φ, which is closely related to the derivative of φ discussed above. Furthermore, the matrix constructed from these first divided differences is used to characterize the class of matrix-monotone functions [8, Sec. V.3].

Definition 3.6 (First divided difference). Let I be an open interval and let φ be a continuously differentiable function on I. Then we denote by φ^[1] the function on I × I defined as

φ^[1](λ_1, λ_2) = (φ(λ_1) − φ(λ_2)) / (λ_1 − λ_2)   if λ_1 ≠ λ_2,
φ^[1](λ_1, λ_1) = φ′(λ_1).

The expression φ^[1](λ_1, λ_2) is called the first divided difference of φ at (λ_1, λ_2).

Definition 3.7 (Matrix of first divided differences). The matrix of first divided differences φ^[1](A) for positive semidefinite A = UΛU^H is defined as

φ^[1](A) = U φ^[1](Λ) U^H.

For the diagonal matrix Λ with entries λ_1, ..., λ_n, the function φ^[1](Λ) is defined as the n × n matrix with entries

[φ^[1](Λ)]_{j,k} = φ^[1](λ_j, λ_k).
The connection between the matrix of first divided differences and the Fréchet derivative is described in the next lemma.

Lemma 3.6 (Lemma V.3.1 in [8]). If φ is a polynomial function and A is positive semidefinite, then

Dφ(A)(H) = φ^[1](A) ∘ H,

where ∘ denotes the Hadamard (entrywise) product.

The next corollary directly shows the relationship between the first divided difference matrix φ^[1] and the ordinary derivative φ′ of the scalar function φ(t).

Corollary 3.2 (First divided difference and first derivative). For any matrix-monotone function φ and Hermitian matrices A and D, the following identity holds:

tr( φ^[1](A) ∘ D ) = tr( φ′(A) · D ).    (3.5)
Proof. Let the eigenvalue decomposition of A be given by A = UΛU^H. It holds that

tr( φ^[1](A) ∘ D ) = tr( Dφ(A)(D) ) = (d/dε) tr φ(A + εD) |_{ε=0}
  = (d/dε) tr φ(UΛU^H + εD) |_{ε=0}
  = tr( U [ (d/dε) φ(Λ + ε U^H D U) |_{ε=0} ] U^H )
  = tr( U Dφ(Λ)(U^H D U) U^H )
  = tr( φ^[1](Λ) ∘ Z )   with   Z = U^H D U
  = Σ_{k=1}^n [φ^[1](Λ)]_{k,k} Z_{k,k}
  = Σ_{k=1}^n [φ′(Λ)]_{k,k} Z_{k,k}
  = tr( φ′(Λ) U^H D U ) = tr( φ′(A) · D ).    (3.6)

In (3.6), the facts are used that the diagonals of φ^[1](Λ) and φ′(Λ) are identical and that the trace of AB equals the trace of BA.
3.2 Basic Characterizations

3.2.1 Representation Theorem
The following integral representations for operator-monotone and operator-convex functions are part of Löwner's deep theory [166].

Theorem 3.7 (Representation from Löwner's theory). Every matrix-monotone function φ can be expressed as

φ(t) = a + bt + ∫_0^∞ (st/(s + t)) dμ(s)    (3.7)
with a positive measure μ on [0, ∞) and real constants a, b ≥ 0. Every matrix-convex function ψ can be represented as

ψ(t) = a + bt + ct² + ∫_0^∞ (st²/(s + t)) dμ(s)    (3.8)

with a positive measure μ on [0, ∞) and real numbers a, b, c ≥ 0.
Remark 3.5. There are different approaches and proofs of this fundamental theorem in the literature [34, 83, 88, 156]. Furthermore, the notion φ is used to denote a matrix function as well as the associated scalar function. Applied to the matrix-valued function φ, the matrix valued function can be represented as Z ∞ φ(A) = aI + bA + sA (sI + A)−1 dµ(s). (3.9) 0
The representation in (3.9) works according to Definition 3.1. Interestingly, the matrix φ̃(A) with entries

[φ̃(A)]_{kl} = a δ_{kl} + b A_{kl} + ∫_0^∞ s [A (sI + A)^{-1}]_{kl} dµ(s)    (3.10)

is identical to φ(A) in (3.9). In conclusion, the matrix-monotone function φ can be represented by three components, i.e., two scalars and one measure: MM = (a, b, µ(s)).

Example 3.7. The following list provides only a small number of representatives. However, these functions will be of certain importance in applications later.

(1) MM = (0, 0, (1/s²) u(s − 1)) with step function u(s) leads to
φ(t) = ∫_1^∞ (st/(s + t)) (1/s²) ds = log(1 + t).
(2) MM = (−1, 0, (1/s³) u(s − 1)) leads to
φ(t) = −1 + ∫_1^∞ (st/(s + t)) (1/s³) ds = −log(1 + t)/t.
(3) MM = (0, 0, δ(s − 1)) with the delta function δ(s) leads to
φ(t) = ∫_0^∞ (st/(s + t)) δ(s − 1) ds = t/(1 + t).
(4) MM = (0, 0, (sin(rπ)/π) s^{r−2}) with 0 < r < 1 yields
φ(t) = (sin(rπ)/π) ∫_0^∞ (st/(s + t)) s^{r−2} ds = t^r.
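The closed forms in Example 3.7 can be checked against the representation (3.7) by direct quadrature. The following sketch (assumed test point t = 2.5 and exponent r = 0.5; SciPy used for the integrals) evaluates cases (1), (2), and (4).

```python
import numpy as np
from scipy.integrate import quad

t, r = 2.5, 0.5

# (1) dmu(s) = u(s-1)/s^2 ds  ->  phi(t) = log(1 + t).
val1, _ = quad(lambda s: (s * t / (s + t)) / s**2, 1, np.inf)
assert abs(val1 - np.log1p(t)) < 1e-8

# (2) a = -1, dmu(s) = u(s-1)/s^3 ds  ->  phi(t) = -log(1 + t)/t.
val2, _ = quad(lambda s: (s * t / (s + t)) / s**3, 1, np.inf)
assert abs(-1.0 + val2 - (-np.log1p(t) / t)) < 1e-8

# (4) dmu(s) = (sin(r*pi)/pi) s^(r-2) ds  ->  phi(t) = t^r for 0 < r < 1.
g = lambda s: (s * t / (s + t)) * s**(r - 2)
val4 = (np.sin(r * np.pi) / np.pi) * (quad(g, 0, 1)[0] + quad(g, 1, np.inf)[0])
assert abs(val4 - t**r) < 1e-6
```

Splitting the last integral at s = 1 keeps the integrable endpoint singularity s^{r−1} separate from the infinite tail, which helps the adaptive quadrature.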
3.2.2 Derivative of Matrix-Monotone Function
Theorem 3.8 (Derivative of matrix-monotone function). The first derivative of an arbitrary matrix-monotone function φ at A in direction B is given by

Dφ(A)(B) = bB + ∫_0^∞ s² [sI + A]^{-1} B [sI + A]^{-1} dµ(s).    (3.11)
Proof. The directional derivative is defined as Dφ(A)(B) = ∂φ(A + εB)/∂ε |_{ε=0}. It holds

φ(A + εB) = aI + b(A + εB) + ∫_0^∞ s(A + εB) [sI + A + εB]^{-1} dµ(s).

The first derivative of φ(A + εB) with respect to ε is given by

∂φ(A + εB)/∂ε = bB + ∫_0^∞ sB [sI + A + εB]^{-1} dµ(s)
− ∫_0^∞ s [A + εB] [sI + A + εB]^{-1} B [sI + A + εB]^{-1} dµ(s).

At the point ε = 0 we obtain

∂φ(A + εB)/∂ε |_{ε=0} = bB + ∫_0^∞ [ I − A [sI + A]^{-1} ] sB [sI + A]^{-1} dµ(s)
= bB + ∫_0^∞ [sI + A − A] [sI + A]^{-1} sB [sI + A]^{-1} dµ(s),

which is equal to (3.11).
Corollary 3.3 (Derivative of trace of matrix-monotone function). The first derivative of the trace of an arbitrary matrix-monotone function is given by

φ′(A) = ∂ tr φ(A)/∂A = bI + ∫_0^∞ s² [sI + A]^{-2} dµ(s).    (3.12)

Remark 3.6. The function φ′ in (3.12) corresponds to the function φ′(A) defined by φ′(A) = U diag[φ′(λ₁), …, φ′(λ_n)] U^H because of (3.10). The matrix function φ′ defined in (3.12) will be used when describing further properties of matrix-monotone functions. First derivatives will also become important when studying optimality conditions for certain programming problems. We provide one example of the function φ′.

Example 3.8. Consider the function tr φ(A) = tr log(I + A) and assume that one does not know the derivative with respect to A. By (3.12), with b = 0 and dµ(s) = (1/s²) u(s − 1) ds from Example 3.7, the derivative is given by

φ′(A) = ∂ tr log(I + A)/∂A = ∫_1^∞ [sI + A]^{-2} ds = [I + A]^{-1}.    (3.13)
Definition 3.8 (Completely monotone function). A function g is called completely monotone if (−1)^k g^(k)(t) ≥ 0 for all k ≥ 0, i.e., g^(k) ≥ 0 for even k and g^(k) ≤ 0 for odd k.
Remark 3.7. Note that the derivative of a matrix-monotone function is always completely monotone and has a unique representation [155, p. 161]

g(t) = ∫_0^∞ exp(−ts) dµ(s)

for a positive measure µ(s).

Example 3.9. Therefore, the function φ′(A) in (3.13) is completely monotone and can be represented by

g(t) = 1/(1 + t) = ∫_0^∞ exp(−ts) exp(−s) ds = ∫_0^∞ exp(−s(1 + t)) ds.
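The Laplace-transform representation in Example 3.9 can be verified directly at a few test points (a small numerical sketch, not part of the original text):

```python
import numpy as np
from scipy.integrate import quad

# g(t) = 1/(1 + t) is completely monotone: it is the Laplace transform of exp(-s).
for t in (0.0, 0.5, 3.0):
    val, _ = quad(lambda s, t=t: np.exp(-s * (1.0 + t)), 0, np.inf)
    assert abs(val - 1.0 / (1.0 + t)) < 1e-8
```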
3.3 Matrix Norms
One main question regarding the matrix functions characterized in the last section is how to map them to real numbers in order to measure, e.g., the performance of a communication system. Often the simplest solution is to take the trace of the matrix function. It will be shown in later chapters that the most common performance measures are the trace of a certain matrix-monotone performance function. However, an arbitrary norm also maps the output of the matrix function to the positive real line. Therefore, matrix norms and their properties are studied in this subsection. Since we will consider only positive definite matrices, the definitions are given for positive definite matrices and not for the general case of Hermitian matrices. Most of the definitions can be found in [50, 51].

Definition 3.9 (Matrix norm). For all positive semi-definite matrices A, B, a norm || · || satisfies

(1) ||A|| ≥ 0 (nonnegative)
(2) ||A|| = 0 if and only if A = 0 (positive)
(3) ||cA|| = |c| ||A|| for all complex scalars c (homogeneous)
(4) ||A + B|| ≤ ||A|| + ||B|| (triangle inequality)
(5) ||AB|| ≤ ||A|| · ||B|| (submultiplicative)

A norm that fulfills only properties (1)–(4) is called a vector-norm on matrices. Let us study three examples of matrix norms.

Example 3.10. The ℓ₁ norm for positive semi-definite A is defined by

||A||₁ = Σ_{i=1}^n Σ_{j=1}^n |a_{ij}|.

The ℓ₂ norm or Euclidean norm for positive semi-definite A is defined by

||A||₂ = ( Σ_{i=1}^n Σ_{j=1}^n |a_{ij}|² )^{1/2}.

The ℓ∞ norm is defined by

||A||∞ = max_{1≤i,j≤n} |a_{ij}|.
3.3.1 Unitary Invariant Norms and Matrix-Monotone Functions

Since the class of matrix norms is very large and only certain cases are needed for the further analysis, the following subclass of vector-norms on matrices is defined.

Definition 3.10 (Unitary invariant norm). A vector-norm || · || on positive semi-definite matrices is said to be unitary invariant if ||UAV|| = ||A|| for all positive semi-definite A and for all unitary matrices U, V. The unitary invariant norm is denoted by |||A|||.
Remark 3.8. Furthermore, the unitary invariant norms are related to symmetric gauge functions Φ on ℝⁿ by [8, Thm. IV.2.1]

|||H|||_Φ = Φ(λ(H)).    (3.14)

The vector λ(H) denotes the vector with the singular values of H. For each unitary invariant norm there is a corresponding symmetric gauge function and vice versa (see [51, Thm. 3.5.18]). For the definition and discussion of symmetric gauge functions see Definition 2.7 and the result in Lemma 2.6. Let us give some examples of unitary invariant matrix norms.

Example 3.11. Interestingly, the only unitary invariant vector norm is the ℓ₂ norm. Examples of unitary invariant matrix norms for positive semi-definite A with ordered eigenvalues α₁ ≥ α₂ ≥ ⋯ ≥ αₙ ≥ 0 are:

(1) Schatten p norm: |||A||| = ( Σ_{k=1}^n α_k^p )^{1/p}, with the special cases p = 2 (Frobenius norm), p → ∞ (spectral norm), and p = 1 (trace norm). The corresponding symmetric gauge function is Φ(x) = ( Σ_{k=1}^n x_k^p )^{1/p}.
(2) Ky Fan k norm with generating symmetric gauge function

Φ_k(x) = max_{1 ≤ i₁ < ⋯ < i_k ≤ n} Σ_{l=1}^k x_{i_l}

for all 1 ≤ k ≤ n.

The special role of the Ky Fan k norm is outlined in the following theorem.

Theorem 3.9 (7.4.45 in [50]). Let x and y be vectors in ℝ₊ⁿ. Then Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ on ℝ₊ⁿ if and only if Φ_k(x) ≤ Φ_k(y) for k = 1, 2, …, n, where Φ_k are the Ky Fan k norms defined above.
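Theorem 3.9 (Fan's dominance principle) is easy to probe numerically: in the sketch below the vector x is dominated by y in all Ky Fan gauges, so every ℓ_p gauge of x is dominated as well. The example vectors are assumptions for illustration.

```python
import numpy as np

def ky_fan(x, k):
    """Ky Fan k gauge Phi_k: sum of the k largest entries of the nonnegative vector x."""
    return np.sort(x)[::-1][:k].sum()

x = np.array([2.0, 2.0, 1.0, 1.0])
y = np.array([3.0, 2.0, 1.0, 0.5])

# All Ky Fan partial sums of x are dominated by those of y ...
assert all(ky_fan(x, k) <= ky_fan(y, k) for k in range(1, 5))

# ... hence every symmetric gauge of x is dominated, e.g. the l_p gauges:
for p in (1.0, 2.0, 4.0):
    assert (x**p).sum() ** (1 / p) <= (y**p).sum() ** (1 / p) + 1e-12
```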
Remark 3.9. The theorem says that in order to have |||A||| ≤ |||B||| for all unitary invariant norms, it is necessary and sufficient to check the Ky Fan norms for 1 ≤ k ≤ n. Note that this condition corresponds exactly to the statement that x is weakly majorized by y, i.e., x ⪯_w y. In other words, the symmetric gauge function is Schur-convex with respect to x, as stated in Lemma 2.6. Note that the Löwner order and the majorization of the eigenvalue vector are two possible partial orders of positive semidefinite matrices. If a matrix is greater than or equal to another matrix with respect to the Löwner order, i.e., A ≥ B, then the eigenvalues of matrix A also weakly majorize the eigenvalues of matrix B, i.e., λ(A) ⪰_w λ(B). In particular, if A ≥ B, then |||A||| ≥ |||B|||.

Next, we combine the class of matrix-monotone functions with an outer unitary invariant norm; the following results characterize how the properties of matrix-monotonicity are mapped by the norm operation. One simple unitary invariant norm is the trace norm.

Lemma 3.10. Consider a matrix-concave function φ : H_n → H_n. The scalar function tr φ : H_n → ℝ₊ is concave and monotone.

Proof. Both properties follow easily from the matrix-concavity in (3.2) and matrix-monotonicity in (3.1) by applying the linear trace operator to both sides of the inequalities.

The more general result is presented next. It corresponds to the Theorem in [5] and [166, Thm. 4.4].

Theorem 3.11. Consider a nonnegative matrix-concave function φ. It holds

|||φ(A) + φ(B)||| ≥ |||φ(A + B)|||.

For every nonnegative function g with g(0) = 0 and g(∞) = ∞ whose inverse function is operator monotone,

|||g(A + B)||| ≥ |||g(A) + g(B)|||.
Remark 3.10. This inequality can be complemented with results for concave φ. An application of the above result of further relevance is described in the following corollary [166, Cor. 4.8].

Corollary 3.4. For all A, B ≥ 0 and every ||| · |||,

|||log(I + A + B)||| ≤ |||log(I + A) + log(I + B)|||

and

|||exp(A) + exp(B)||| ≤ |||exp(A + B) + I|||.

Another lower bound on matrix-monotone functions is provided in the next result [166, Thm. 4.10].

Theorem 3.12. Let φ be a nonnegative matrix-monotone function and ||| · ||| a unitary invariant norm with normalization |||diag[1, 0, …, 0]||| = 1. Then for every matrix A ≥ 0,

φ(|||A|||) ≤ |||φ(A)|||.

The corresponding upper bound is provided in the next result [166, Thm. 4.12] using a different normalization of the norm.

Theorem 3.13. Let φ be a nonnegative matrix-monotone function and ||| · ||| a unitary invariant norm with normalization |||I||| = 1. Then for every matrix A ≥ 0,

φ(|||A|||) ≥ |||φ(A)|||.

Combining the last two results, it follows with E = diag[1, 0, …, 0] that for a general unitary invariant norm

|||E||| φ(|||A||| / |||E|||) ≤ |||φ(A)||| ≤ |||I||| φ(|||A||| / |||I|||).
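Corollary 3.4 can be checked numerically with the trace norm on randomly drawn positive semidefinite matrices. This is only an illustrative sketch (helper names and the random instances are assumptions); SciPy's `logm`/`expm` compute the matrix logarithm and exponential.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
n = 4
I = np.eye(n)

def rand_psd(n):
    X = rng.standard_normal((n, n)) / 2.0
    return X @ X.T                            # real symmetric PSD

def trace_norm(M):
    """Trace norm (Schatten 1) of a symmetric matrix: sum of |eigenvalues|."""
    return np.abs(np.linalg.eigvalsh((M + M.T) / 2)).sum()

A, B = rand_psd(n), rand_psd(n)

# |||log(I + A + B)||| <= |||log(I + A) + log(I + B)|||
assert trace_norm(logm(I + A + B).real) <= \
       trace_norm(logm(I + A).real + logm(I + B).real) + 1e-9

# |||exp(A) + exp(B)||| <= |||exp(A + B) + I|||
assert trace_norm(expm(A) + expm(B)) <= trace_norm(expm(A + B) + I) + 1e-9
```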
A map Φ from the set of positive semidefinite matrices to the set of positive semidefinite matrices is called a permutation operator if for all A the entries of Φ(A) are one fixed rearrangement of those of A.

Theorem 3.14 (Theorem 4.38 in [166]). For every permutation operator Φ and all ||| · ||| on the set of positive semidefinite n × n matrices,

(1/√n) |||A||| ≤ |||Φ(A)||| ≤ √n |||A|||    (3.15)

for positive semidefinite A, and the constants √n and 1/√n are best possible.

The special role of the ℓ₂ norm is underlined in the following result [166, Thm. 4.40].

Theorem 3.15. If |||Φ(A)||| = |||A||| holds for all permutation operators Φ and all positive semidefinite A, then ||| · ||| is a constant multiple of || · ||₂.
Definition 3.11 (Dual norm). Let || · || be a given norm on the set of positive semi-definite matrices. The dual norm of || · || with respect to the Frobenius inner product is defined by

||A||^D = max{ tr(AB^H) : ||B|| = 1, B ≥ 0 }.

By the duality theorem [51, Thm. 5.5.14], it follows that (|| · ||^D)^D = || · ||.
3.4 Further Properties
The next theorem is a structural result for traces of matrix-monotone functions. It derives an upper and lower bound. Furthermore, the matrices that achieve these bounds are characterized. Theorem 3.16 is a generalization of [36].
Theorem 3.16. For positive semidefinite matrices A and B with eigenvalues α₁ ≥ α₂ ≥ ⋯ ≥ αₙ and β₁ ≥ β₂ ≥ ⋯ ≥ βₙ, it holds

min_π tr φ(diag(α₁, …, αₙ) diag(β_{π(1)}, …, β_{π(n)})) ≤ tr φ(B^{1/2} A B^{1/2})
≤ max_π tr φ(diag(α₁, …, αₙ) diag(β_{π(1)}, …, β_{π(n)}))    (3.16)

with permutation π.
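Before the proof, the bounds in (3.16) can be verified numerically for φ(t) = log(1 + t) by brute force over all permutations (an illustrative sketch with assumed helper names):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
n = 3

def rand_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T

def tr_phi(M):
    """tr phi(M) for phi(t) = log(1 + t), M symmetric PSD."""
    return np.log1p(np.clip(np.linalg.eigvalsh(M), 0.0, None)).sum()

A, B = rand_psd(n), rand_psd(n)
alpha = np.sort(np.linalg.eigvalsh(A))[::-1]
beta = np.sort(np.linalg.eigvalsh(B))[::-1]

# Symmetric square root of B via eigendecomposition.
lam, U = np.linalg.eigh(B)
B_half = U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.T

mid = tr_phi(B_half @ A @ B_half)
# tr phi(diag(alpha) diag(beta_pi)) = sum_i log(1 + alpha_i * beta_pi(i)).
vals = [np.log1p(alpha * beta[list(pi)]).sum() for pi in permutations(range(n))]
assert min(vals) - 1e-9 <= mid <= max(vals) + 1e-9
```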
This result is stated in [17] without a proof.

Proof. The function φ(B^{1/2} A B^{1/2}) can be rewritten using the eigenvalue decompositions A = U_A Λ_A U_A^H, B = U_B Λ_B U_B^H, and U = U_B^H U_A. Without loss of generality, we assume that both A and B have full rank. Up to a unitary similarity, which leaves the trace unchanged, the function is

φ(B^{1/2} A B^{1/2}) = φ(Λ_B^{1/2} U Λ_A U^H Λ_B^{1/2}).

Let B₀ = Λ_B and A₁ = U Λ_A U^H. Note that B₀ and A₁ do not commute in general, i.e., B₀ A₁ ≠ A₁ B₀. Next, we parameterize a unitary matrix U(ε) = e^{εS} with a specific choice of S satisfying S = −S^H. We show that the directional derivative of the parameterized function

φ(ε) = φ(B₀^{1/2} e^{εS} A₁ e^{−εS} B₀^{1/2})    (3.17)

at the point ε = 0 is either positive for one specific choice of S or negative for another choice of S. Therefore, a non-commuting pair can be neither the global maximum nor the global minimum of the objective function. This is a contradiction, and it follows that the maximum and the minimum are attained for commuting matrices A and B. For the maximum, we choose

S = B₀^{1/2} φ′(B₀^{1/2} A₁ B₀^{1/2}) B₀^{1/2} A₁ − A₁ B₀^{1/2} φ′(B₀^{1/2} A₁ B₀^{1/2}) B₀^{1/2} ≠ 0.

Note that S^H = −S for this choice. Furthermore, S is zero only if A₁ and B₀ commute or if one of them is the zero matrix.
The Taylor series expansion of φ(ε) is given by

φ(ε) = φ( B₀^{1/2} A₁ B₀^{1/2} + ε (B₀^{1/2} S A₁ B₀^{1/2} − B₀^{1/2} A₁ S B₀^{1/2}) + ε² (…) + ⋯ ).

The first derivative of tr φ(ε) with respect to ε at the point ε = 0 is given by

∂ tr φ(ε)/∂ε |_{ε=0} = tr[ φ′(B₀^{1/2} A₁ B₀^{1/2}) ( B₀^{1/2} S A₁ B₀^{1/2} − B₀^{1/2} A₁ S B₀^{1/2} ) ]
= tr[ ( A₁ B₀^{1/2} φ′(B₀^{1/2} A₁ B₀^{1/2}) B₀^{1/2} − B₀^{1/2} φ′(B₀^{1/2} A₁ B₀^{1/2}) B₀^{1/2} A₁ ) S ]
= tr( S S^H ) > 0.    (3.18)

The same approach can be used to show that for another choice of S the derivative of tr φ(ε) at the point ε = 0 is negative.

Lemma 3.17. The values of the following two optimization problems are equal:

max_{tr(Q) ≤ P} |||φ(H Q H^H)||| = max_{tr(S) ≤ P} |||φ(H^H S H)|||.    (3.19)

In addition, with the singular value decomposition (SVD) of H = U_H Λ_H V_H^H, it holds that

|||φ(H Q H^H)||| = |||φ(H^H S H)||| if Q = V_H U_H^H S U_H V_H^H.    (3.20)
The dimensions of the three matrices in the SVD are given by: U_H is m × µ, Λ_H is µ × µ, and V_H^H is µ × n with µ = min(m, n).¹

Proof. The value of the two optimization problems in (3.19) does not depend on the left or right singular vectors of H, because |||φ(U A U^H)||| = |||φ(A)||| for unitary U and because tr(U Q U^H) = tr(Q). Denote the rank of the matrix H by ν. Furthermore, the value of the matrix-monotone function φ at the point zero is equal to zero, i.e., φ(0) = 0. Write the unitary invariant norm via its symmetric gauge function, i.e., Φ(λ(A)) = |||A|||. As a result, the LHS of (3.19) is

max_{tr(Q) ≤ P} |||φ(H Q H^H)|||
= max_{Σ_{k=1}^m p_k ≤ P} Φ( φ(λ₁(H^H H) p₁), …, φ(λ_n(H^H H) p_n) )
= max_{Σ_{k=1}^ν p_k ≤ P} Φ( φ(λ₁(H^H H) p₁), …, φ(λ_ν(H^H H) p_ν) ).    (3.21)

The RHS of (3.19) is

max_{tr(S) ≤ P} |||φ(H^H S H)|||
= max_{Σ_{k=1}^n s_k ≤ P} Φ( φ(λ₁(H H^H) s₁), …, φ(λ_n(H H^H) s_n) )
= max_{Σ_{k=1}^ν s_k ≤ P} Φ( φ(λ₁(H H^H) s₁), …, φ(λ_ν(H H^H) s_ν) )    (3.22)

with eigenvalues s₁, …, s_n of S. Equation (3.21) is equal to (3.22). The second part of the Lemma follows easily from

φ(H Q H^H) = φ(U_H Λ_H V_H^H Q V_H Λ_H U_H^H)
= φ(Λ_H V_H^H Q V_H Λ_H)
= φ(Λ_H V_H^H V_H U_H^H S U_H V_H^H V_H Λ_H)
= φ(Λ_H U_H^H S U_H Λ_H)
= φ(V_H Λ_H U_H^H S U_H Λ_H V_H^H)
= φ(H^H S H),    (3.23)

where the second and fifth equalities are to be read in the unitary invariant norm: conjugation with the economy-size factors changes only zero eigenvalues, which do not contribute since φ(0) = 0.

¹ This type of SVD is sometimes called "economy size decomposition."
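The identity (3.20) is easy to confirm numerically for the trace norm of φ(t) = log(1 + t). The sketch below draws an arbitrary positive semidefinite receive-side matrix S, forms the transmit-side Q of (3.20) from the economy-size SVD, and compares the two norms (all variable names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3                                          # H is n x m

H = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
U, sv, Vh = np.linalg.svd(H, full_matrices=False)    # economy-size SVD
V = Vh.conj().T

def norm_phi(M):
    """Trace norm of phi(M), phi(t) = log(1 + t), for Hermitian PSD M; phi(0) = 0."""
    return np.log1p(np.clip(np.linalg.eigvalsh(M), 0.0, None)).sum()

# An arbitrary PSD receive-side matrix S with tr(S) = 1 ...
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = X @ X.conj().T
S /= np.trace(S).real

# ... and the transmit-side Q = V U^H S U V^H from (3.20):
Q = V @ U.conj().T @ S @ U @ V.conj().T

assert abs(norm_phi(H @ Q @ H.conj().T) - norm_phi(H.conj().T @ S @ H)) < 1e-9
```

Both φ(HQH^H) and φ(H^H S H) share the nonzero eigenvalues of Λ_H (U^H S U) Λ_H, so their unitary invariant norms agree because φ(0) = 0.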
Definition 3.12 (Contraction). A matrix C is a contraction if C^H C ≤ I, or equivalently, ||C||∞ ≤ 1.

Theorem 3.18 (Theorem 1.15 in [166]). Let φ be a matrix-monotone function on [0, ∞) and ψ a matrix-convex function on [0, ∞) with ψ(0) ≤ 0. Then for every contraction C and every A ≥ 0,

φ(C^H A C) ≥ C^H φ(A) C  and  ψ(C^H A C) ≤ C^H ψ(A) C.

3.4.1 Connections to Connections

The material in this section serves as a brief outlook on a certain class of binary operations that is closely related to matrix-monotone functions.

Definition 3.13 (Connection). A binary operation σ on the class of positive definite matrices, (A, B) → AσB, is a connection if the following requirements are fulfilled [84]:

(1) A ≤ C and B ≤ D imply AσB ≤ CσD.
(2) C(AσB)C ≤ (CAC)σ(CBC).
(3) If a sequence Aₙ converges to A and a sequence Bₙ converges to B, respectively, then the sequence (AₙσBₙ) converges to AσB.
If 1σ1 = 1, then the connection is called a mean. Interestingly, the inequality in (2) leads directly to the following result if the inverse of C exists:

C(AσB)C − (CAC)σ(CBC) = C[ AσB − C^{-1} ((CAC)σ(CBC)) C^{-1} ] C
≥ C[ AσB − AσB ] C = 0.    (3.24)

The inequality in (3.24) follows from (2), applied with C^{-1} to the pair (CAC, CBC). Together with (2) itself, it therefore holds that

C(AσB)C = (CAC)σ(CBC).    (3.25)
Example 3.12. For positive invertible matrices A and B and for 0 ≤ α ≤ 1, the arithmetic mean, the geometric mean, and the harmonic mean are defined as follows:

A ∇_α B = (1 − α)A + αB,
A ♯_α B = A^{1/2} (A^{-1/2} B A^{-1/2})^α A^{1/2},
A !_α B = ( (1 − α)A^{-1} + αB^{-1} )^{-1}.

Interestingly, the inequalities between the harmonic, geometric, and arithmetic means hold also in the matrix case [4, 116]:

A !_α B ≤ A ♯_α B ≤ A ∇_α B.
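The matrix harmonic–geometric–arithmetic mean inequality can be verified numerically in the Löwner order; the sketch below uses random positive definite matrices and an assumed helper `mat_pow` for fractional matrix powers.

```python
import numpy as np

rng = np.random.default_rng(4)
n, alpha = 3, 0.3

def rand_pd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)            # symmetric positive definite

def mat_pow(M, p):
    """M^p for symmetric positive definite M via eigendecomposition."""
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(lam**p) @ U.T

A, B = rand_pd(n), rand_pd(n)

arith = (1 - alpha) * A + alpha * B
Ah, Aih = mat_pow(A, 0.5), mat_pow(A, -0.5)
geom = Ah @ mat_pow(Aih @ B @ Aih, alpha) @ Ah
harm = np.linalg.inv((1 - alpha) * np.linalg.inv(A) + alpha * np.linalg.inv(B))

def loewner_leq(X, Y, tol=1e-9):
    """X <= Y in the Loewner order: Y - X is positive semidefinite."""
    Dsym = (Y - X + (Y - X).T) / 2
    return np.linalg.eigvalsh(Dsym).min() >= -tol

assert loewner_leq(harm, geom) and loewner_leq(geom, arith)
```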
Remark 3.11. There exists a one-to-one correspondence between a connection σ and a matrix-monotone function φ ≥ 0 on [0, ∞). The connection σ can be defined via the corresponding function φ, which is called the representing function of σ, by

AσB = A^{1/2} φ(A^{-1/2} B A^{-1/2}) A^{1/2}

if A is invertible, and σ is a matrix mean if and only if φ(1) = 1. Regarding the three means from the last example, the corresponding matrix-monotone functions are (1 − α) + αt for ∇_α, t^α for ♯_α, and [(1 − α) + αt^{-1}]^{-1} for !_α. From this and the representation in (3.9), the following relation also follows:

1σA = φ(A) = aI + bA + ∫_0^∞ sA [sI + A]^{-1} dµ(s)
= aI + bA + (1/2) ∫_0^∞ ((sI) !_{1/2} A) dµ(s),

since sA(sI + A)^{-1} = (s^{-1} I + A^{-1})^{-1} = (1/2)((sI) !_{1/2} A).
Example 3.13. Consider the function tr φ(Z^{-1/2} H Q H^H Z^{-1/2}). It can be represented by

tr φ(Z^{-1/2} H Q H^H Z^{-1/2}) = tr( 1 σ_φ (Z^{-1/2} H Q H^H Z^{-1/2}) )
= tr( Z^{-1} ( Z σ_φ (H Q H^H) ) ).
4 Application of Majorization in Wireless Communications

4.1 Spatial Correlation in Multiple Antenna Systems
Recently, there has been a transition in communication theory in how fading variations are judged. Time variation and spectral variation of the propagation channel are nowadays welcome. In fact, they are exploited to increase the reliability and spectral efficiency of mobile communication systems. It is well known that fading variations in time, space, and frequency increase the diversity of the system (e.g., [159]). With the introduction of MIMO systems arises the question of how to model, analyze, and exploit the spatial correlation that is observed at the transmit antenna array and the receive antenna array (see, e.g., [165, ch. 2] for outdoor scenarios and [102, 160] for MIMO channels). If the antenna geometry is simple, e.g., a uniform linear array (ULA), the antenna correlation matrix has a Toeplitz structure. Since multiple antennas are on both sides of the link, there may or may not be correlation between transmit and receive antenna pairs. In the Kronecker model, the correlation is modeled locally at the transmit and receive side, and rich multipath scattering is assumed in between.
4.1.1 The Kronecker Model
Consider the quasi-static block flat-fading MIMO channel H. Correlation of the channel matrices arises in the common downlink transmission scenario in which the base station is unobstructed [131]. We follow the model in [38], where the subspaces and directions of the paths between the transmit antennas and the receive cluster change more slowly than the actual attenuation of each path. The most general form of the correlation model consists of a very large correlation matrix of size (n_T · n_R × n_T · n_R) which incorporates the transmit and receive correlation, i.e., it is the expectation of the outer product of the vectorized channel matrix

κ = E[vec(H) · vec(H)^H].    (4.1)

The correlation matrix κ in (4.1) expresses the correlation between each transmit or receive element and every other transmit or receive element. Often, the transmit and the receive antenna array are spatially separated. Then, the following simplification is possible (see, e.g., [30]):

Definition 4.1 (Kronecker correlation model). If the transmit correlation is independent of the receive antenna and the receive correlation does not depend on the transmit antenna, the correlation matrix in (4.1) is a block matrix that is given by

κ = R_R ⊗ R_T    (4.2)

and the corresponding correlation model is called the Kronecker correlation model.

The channel matrix H for the case in which we have the Kronecker assumption and correlated transmit and correlated receive antennas is modeled as

H = R_R^{1/2} · W · R_T^{1/2}    (4.3)

with transmit correlation matrix R_T = U_T D_T U_T^H and receive correlation matrix R_R = U_R D_R U_R^H. U_T and U_R are the matrices with the eigenvectors of R_T and R_R, respectively, and D_T, D_R are diagonal
matrices with the eigenvalues of R_T and R_R, respectively. The random matrix W has zero-mean independent identically distributed complex Gaussian entries, i.e., W ∼ CN(0, I). The matrix W models the rich multipath environment between the transmit and receive antenna arrays. The constructed matrix H in (4.3) satisfies (4.2) because vec(AXC) = (C^T ⊗ A) vec(X) and

E[vec(H) vec(H)^H] = E[ (R_R^{1/2} ⊗ R_T^{1/2}) vec(W) vec(W)^H (R_R^{1/2} ⊗ R_T^{1/2}) ]
= (R_R^{1/2} ⊗ R_T^{1/2})(R_R^{1/2} ⊗ R_T^{1/2}) = R_R ⊗ R_T.

In Figure 4.1, some of the basic assumptions on MIMO channels are illustrated. Often, it is assumed that the base station antennas are mounted on the rooftops of high buildings or on towers. Therefore, fewer local scatterers surround the base station antenna array, and increased spatial correlation can be observed. In contrast, the mobile moves around surrounded by buildings, cars, trees, and pedestrians. Therefore, it is often assumed that the mobile antennas are spatially uncorrelated. Note that polarization diversity provides an additional degree of freedom [96]. The analysis in [28, 29] is adapted to several special practical scenarios in which so-called keyholes occur. For example, in transmission scenarios with long corridors (see Figure 4.1) the channel can be singular. This is not because of correlation at the transmitter or the receiver but because of a keyhole in between.
Fig. 4.1 Propagation models: Correlated transmit antennas at base station, uncorrelated mobile with rich scattering, and key-hole channel.
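The Kronecker construction (4.3) can be validated by a short Monte Carlo experiment: the empirical covariance of the (row-wise) vectorized channel realizations should approach R_R ⊗ R_T. The correlation matrices, sample size, and tolerance below are assumptions for the experiment.

```python
import numpy as np

rng = np.random.default_rng(5)
trials = 200_000
nR = nT = 2

RT = np.array([[1.0, 0.7], [0.7, 1.0]])       # assumed transmit correlation
RR = np.array([[1.0, 0.3], [0.3, 1.0]])       # assumed receive correlation

def sqrtm_sym(M):
    lam, U = np.linalg.eigh(M)
    return U @ np.diag(np.sqrt(lam)) @ U.T

RR_h, RT_h = sqrtm_sym(RR), sqrtm_sym(RT)

# One realization of H = RR^{1/2} W RT^{1/2} per trial, W ~ CN(0, I), as in (4.3).
W = (rng.standard_normal((trials, nR, nT))
     + 1j * rng.standard_normal((trials, nR, nT))) / np.sqrt(2)
H = RR_h @ W @ RT_h

# Row-wise vec of H; for symmetric correlations its covariance is RR (x) RT.
v = H.reshape(trials, nR * nT)
kappa_hat = v.T @ v.conj() / trials
assert np.abs(kappa_hat - np.kron(RR, RT)).max() < 0.02
```

Note that the Kronecker-factor ordering depends on the vectorization convention; the row-wise stacking used here reproduces the ordering R_R ⊗ R_T of (4.2).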
In the case in which each receive antenna observes the same correlation between the transmit antennas, i.e., the transmit correlation is independent of the receive antenna and, vice versa, the receive correlation is independent of the transmit antenna, the correlation model in (4.1) simplifies to the model in (4.3). Note that the Kronecker model arises not only in MIMO communications but also in the modeling of electroencephalography (EEG) data. Methods to estimate the correlation matrices under the Kronecker assumption are described in [154].

Note that the Kronecker model is a limited correlation model that can only be applied successfully under certain conditions on the local scattering at the transmitter and receiver [85, 103]. Therefore, a more general model is to allow a sum of Kronecker products [11], i.e.,

κ = Σ_{k=1}^n R_k^R ⊗ R_k^T.    (4.4)

However, it turns out that even the model (4.4) cannot cover the complete set of positive semi-definite correlation matrices. One counterexample is explicitly given here for the case n_T = n_R = 2:¹

κ = [ 3/4   0     0     3/8
      0     1/4   1/8   0
      0     1/8   1/8   0
      3/8   0     0     3/8 ].

4.1.2 A Measure of Spatial Correlation
In order to provide a measure of correlation, we take two arbitrarily chosen transmit correlation matrices R_T^1 and R_T^2 with the constraint that tr(R_T^1) = tr(R_T^2) = n_T, which is equivalent to

Σ_{l=1}^{n_T} λ_l^{T,1} = Σ_{l=1}^{n_T} λ_l^{T,2},    (4.5)

where λ_l^{T,1}, 1 ≤ l ≤ n_T, and λ_l^{T,2}, 1 ≤ l ≤ n_T, are the eigenvalues of the covariance matrices R_T^1 and R_T^2, respectively.

¹ This example is taken from [52].
This constraint regarding the trace of the correlation matrix R_T is necessary because the comparison of two transmission scenarios is only fair if the average path loss is equal. Without receive correlation, the trace of the correlation matrix can be written as

tr(R_T) = Σ_{i=1}^{n_T} E[H^H H]_{ii} = Σ_{i=1}^{n_T} E[|h_i|²].    (4.6)

However, the RHS of (4.6) is the sum of the average path losses from the transmit antennas i = 1, …, n_T. In order to study purely the impact of correlation on the achievable capacity, the average path loss is kept fixed by applying the trace constraint on the correlation matrices R_T^1 and R_T^2. We will say that a correlation matrix R_T^1 is more correlated than R_T^2 with descending ordered eigenvalues λ_1^{T,1} ≥ λ_2^{T,1} ≥ ⋯ ≥ λ_{n_T}^{T,1} ≥ 0 and λ_1^{T,2} ≥ λ_2^{T,2} ≥ ⋯ ≥ λ_{n_T}^{T,2} ≥ 0 if

Σ_{k=1}^m λ_k^{T,1} ≥ Σ_{k=1}^m λ_k^{T,2},   1 ≤ m ≤ n_T − 1.    (4.7)
The measure of correlation is defined in a natural way: the larger the first m eigenvalues of the correlation matrix are (under the trace constraint in (4.6)), the more correlated the MIMO channel is. As a result, the most uncorrelated MIMO channel has equal eigenvalues, whereas the most correlated MIMO channel has only one non-zero eigenvalue, which is given by λ₁ = n_T. The following definition again provides the measure for the comparison of two correlation matrices.

Definition 4.2 (Measure for spatial correlation). The transmit correlation matrix R_T^1 is more correlated than R_T^2 if and only if

Σ_{l=1}^m λ_l^{T,1} ≥ Σ_{l=1}^m λ_l^{T,2} for m = 1, …, n_T, and Σ_{l=1}^{n_T} λ_l^{T,1} = Σ_{l=1}^{n_T} λ_l^{T,2}.    (4.8)

One says that the vector consisting of the ordered eigenvalues λ_T^1 majorizes λ_T^2, and this relationship can be written as λ_T^1 ⪰ λ_T^2 as in Definition 2.1.
Remark 4.1. Note that our definition of correlation in Definition 4.2 differs from the usual definition in statistics. In statistics, a diagonal covariance matrix indicates that the random variables are uncorrelated, independently of the auto-covariances on the diagonal. In our definition, we say that the antennas are uncorrelated if, in addition to statistical independence, the auto-covariances of all entries are equal. This difference from statistics occurs because the directions, i.e., the unitary matrices of the correlation, have no impact on our measure of correlation. Imagine the scenario in which all transmit antennas are uncorrelated but have different average transmit powers because of their amplifiers. In a statistical sense, one would say the antennas are uncorrelated. Our measure of correlation says that the antennas are correlated, because they have different transmit powers. The measure of correlation in Definition 4.2 is more suitable for the analysis of the performance of multiple antenna systems, because different transmit powers at the antennas obviously have a strong impact on the performance. We will not consider such effects.

Example 4.1. As an example of the proposed measure of correlation, consider the following simple 2 × 2 correlation matrix R, i.e., for 0 ≤ ρ ≤ 1,

R(ρ) = [ 1  ρ
         ρ  1 ].

The eigenvalues of the correlation matrix are given by 1 + ρ and 1 − ρ. Therefore, we have ρ₁ ≥ ρ₂ ⟹ λ(R(ρ₁)) ⪰ λ(R(ρ₂)), and R(ρ₁) is more correlated than R(ρ₂). The extreme cases are ρ = 0, which leads to the completely uncorrelated R(0) = I, and ρ = 1, which leads to the completely correlated R(1).
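Definition 4.2 translates directly into a small eigenvalue-majorization check; the sketch below (function name is an assumption) applies it to the 2 × 2 example above.

```python
import numpy as np

def more_correlated(R1, R2, tol=1e-9):
    """Definition 4.2: the ordered eigenvalues of R1 majorize those of R2 (equal traces)."""
    l1 = np.sort(np.linalg.eigvalsh(R1))[::-1]
    l2 = np.sort(np.linalg.eigvalsh(R2))[::-1]
    if abs(l1.sum() - l2.sum()) > tol:
        raise ValueError("traces must be equal for a fair comparison")
    return all(l1[:m].sum() >= l2[:m].sum() - tol for m in range(1, len(l1)))

R = lambda rho: np.array([[1.0, rho], [rho, 1.0]])

assert more_correlated(R(0.8), R(0.3))        # rho1 >= rho2  =>  more correlated
assert more_correlated(R(1.0), R(0.0))        # fully correlated majorizes identity
assert not more_correlated(R(0.3), R(0.8))
```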
Example 4.2. As another example of the measure of correlation, consider a random Gaussian distributed vector z of dimension n with z ∼ CN(0, R) and covariance matrix R. Denote the eigenvalues of the covariance matrix R by r = [r₁, …, r_n]. In the following, we show that the entropy h_r(z) is a Schur-concave function with respect to the correlation eigenvalues r. The entropy of z is given by [33, Thm. 9.6.5]

h_r(z) = log[ (2πe)^n det(R) ] = log[ (2πe)^n Π_{i=1}^n r_i ].    (4.9)

It can be written alternatively as

h_r(z) = log[ (2πe)^n Π_{i=1}^n r_i ] = n log(2πe) + Σ_{i=1}^n log r_i.    (4.10)

According to Proposition 2.7 in Section 2.2,

λ ⪰ µ ⟹ Σ_{i=1}^n log λ_i ≤ Σ_{i=1}^n log µ_i.
Another measure of spatial correlation, aimed at the low-power spectral efficiency of MIMO systems, is explained in [89].

Definition 4.3 (Dispersion of a random n × n matrix). The dispersion of a random n × n matrix A is defined as

ζ(A) = n E_A[tr(A²)] / E_A²[tr A].

Applied to an n × n correlation matrix R (with tr R = n), the dispersion reduces to the correlation number, i.e., ζ(R) = tr(R²)/n. Obviously, the function is Schur-convex with respect to the eigenvalues of the correlation matrix by Proposition 2.7. Therefore, the dispersion is a special case of the majorization-based measure of correlation.

Remark 4.2. In [53], another measure of spatial correlation is described. It is named the diversity measure, and it is given by

Ψ(R) = (tr R)² / tr(R²) = (tr R / ||R||_F)²
and is therefore closely related to the correlation number defined above:

Ψ(R) = n / ζ(R).

Therefore, the diversity measure is also a special case of the majorization-based measure of correlation.

In the next subsections, various examples are presented in which the majorization-based measure of correlation is applied in order to characterize the impact of spatial correlation on the system performance. For the different performance functions it will be shown that they are either Schur-convex or Schur-concave with respect to the eigenvalues of the correlation matrices. The interpretation will always be as follows: Consider a performance function which measures the percentage of erroneous transmissions, e.g., the BER, the SER, the MSE, or the outage probability. The lower the performance function, the better the performance.

• If the performance function is Schur-convex with respect to the correlation eigenvalues, the function increases with higher correlation and performance decreases. This leads to the statement: Spatial correlation decreases performance.
• If the performance function is Schur-concave with respect to the correlation eigenvalues, the function decreases with higher correlation and performance increases. This leads to the statement: Spatial correlation increases performance.

Now consider a performance function which measures the number of error-free transmitted bits, e.g., the transmission rate, the spectral efficiency, or the goodput. The higher the performance function, the better the performance.

• If the performance function is Schur-convex with respect to the correlation eigenvalues, the function increases with higher correlation and performance increases. This leads to the statement: Spatial correlation increases performance.
• If the performance function is Schur-concave with respect to the correlation eigenvalues, the function decreases with higher correlation and performance decreases. This leads to the statement: Spatial correlation decreases performance.

4.1.3 Error Performance of OSTBC
One concept for achieving a large portion of the capacity and performance gains in MIMO systems is space–time coding [41, 115, 135]. The design and analysis of orthogonal space–time block codes (OSTBC) were studied in [134, 136]. For the design of space–time codes it is usually assumed that the receiver has perfect channel state information (CSI) while the transmitter has no CSI. However, the situation in which space–time codes are combined with linear precoding has also been studied, e.g., in [57, 169, 170]. The average bit error performance of the Alamouti scheme is studied in [144], the average symbol error performance of OSTBC is analyzed in [130], and the performance of general space–time codes is derived in [23]. The impact of correlation on the outage probability in OSTBC MIMO systems is studied in [124].

We consider a single-user MIMO system with n_T transmit and n_R receive antennas. The receiver has perfect CSI while the transmitter has no CSI. The transmitter applies an OSTBC. The noise at the receiver is complex iid distributed with covariance σ_n² I_{n_R}. The total transmit power is constrained to P. We define the SNR as ρ = P/σ_n². The channel is assumed to be a flat-fading channel with matrix entries [h_{i,j}], i = 1, …, n_T, j = 1, …, n_R. The receiver applies a matched filter, and at output k of the matched filter we obtain

r_k = sqrt( Σ_{i=1}^{n_T} Σ_{j=1}^{n_R} |h_{i,j}|² ) x_k + ñ_k,   ∀k = 1, …, n_T,    (4.11)

where ñ_k is complex iid with variance σ_n² because the normalized matched filter matrix is unitary. The matched filter that leads to (4.11) is matched to an effective channel that takes into account the space–time code, not the actual physical channel.
Application of Majorization in Wireless Communications
We follow the approach in [110] in order to derive the BER of the OSTBC. Consider one output of the matched filter from (4.11); then the instantaneous SNR per bit is given by
$$\gamma = \sum_{i=1}^{n_T} \sum_{j=1}^{n_R} |h_{i,j}|^2 \rho = \sum_{i=1}^{n_T} \sum_{j=1}^{n_R} \lambda_i^T \lambda_j^R s_{i,j}\, \rho. \qquad (4.12)$$
The decision $\hat{x}_k = \mathrm{sign}(r_k)$ has bit error probability $Q(\sqrt{2\gamma})$. The Q-function is defined as $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp(-\frac{t^2}{2})\, dt$. Averaging the bit error probability over the pdf of the instantaneous SNR $\gamma$ provides the BER [110]. The pdf of $\gamma$ is a function of $n_T$ and $n_R$ and the correlation vectors $\mu$ and $\nu$. Hence, the BER as a function of the SNR $\rho$, the numbers of transmit and receive antennas $n_T$, $n_R$, and the correlation $\mu$, $\nu$ can be written as
$$\mathrm{BER} = E_\gamma\left[ Q\left(\sqrt{2\gamma}\right) \right] = E_{s_{1,1},\ldots,s_{n_T,n_R}}\left[ f\left( \rho \sum_{i=1}^{n_T} \sum_{j=1}^{n_R} \lambda_i^T \lambda_j^R s_{i,j} \right) \right].$$
If we collect all tuples of eigenvalues of the transmit and receive correlation matrices in one large vector $\eta$ with $\eta_1 = \lambda_1^T \lambda_1^R$, $\eta_2 = \lambda_1^T \lambda_2^R, \ldots, \eta_{n_T n_R} = \lambda_{n_T}^T \lambda_{n_R}^R$ and sort all components in non-increasing order $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_{n_T \cdot n_R} \geq 0$, the average BER can be rewritten as
$$\mathrm{BER} = E_{s_1,\ldots,s_{n_T \cdot n_R}}\left[ f\left( \rho \sum_{k=1}^{n_T \cdot n_R} \eta_k s_k \right) \right].$$
We can readily apply Lemma 2.16 to obtain the following corollary.

Corollary 4.1 (Average BER of OSTBC is Schur-convex). The average BER of OSTBC MIMO systems in spatially correlated Rayleigh fading is Schur-convex with respect to the transmit and receive correlation, i.e., the more correlated the transmit or receive antennas are (according to Definition 4.2), the worse is the performance.

The worst case performance is obtained for completely correlated transmit and receive antennas. It equals the performance of a SISO system with $n_T = 1$ and $n_R = 1$ but with channel gain $n_T n_R$, i.e., $\mathrm{BER}_{wc} = E_{s_1}[f(n_T n_R s_1)]$. The best case performance is achieved for completely uncorrelated transmit and receive antennas, i.e., $\mathrm{BER}_{bc} = E_{s_{1,1},\ldots,s_{n_T,n_R}}\left[f\left(\sum_{k=1}^{n_T} \sum_{l=1}^{n_R} s_{k,l}\right)\right] = E_v[f(v)]$ with $\chi^2$-distributed $v$ with $2 n_T n_R$ degrees of freedom. This is the maximum diversity gain of the system. In [76], it is shown that the full diversity gain is achieved as long as the channel correlation matrices have full rank. If the correlation matrices have full rank, spatial correlation only shifts the BER curves to the right but does not change their slope.

In Figure 4.2, the average BER for BPSK modulation and a $2 \times 2$ MIMO system applying an Alamouti STC is shown for three correlation scenarios. The result can be interpreted in the following way: since no CSI is available at the transmitter, the spatial dimension is best used if all spatial diversity is exploited and an OSTBC is used to achieve full diversity. Therefore, the reduction in diversity due to correlated transmit or receive antennas leads to a performance degradation, as shown in the corollary above. Closed-form expressions for the BER as well as an illustration can be found in [76]. Equal gain combining and selection combining are studied in [75].
Fig. 4.2 Average BER for Alamouti STC 2 × 2 and different spatial correlations.
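Corollary 4.1 can be illustrated with a short Monte Carlo sketch (our own, not from the text; the function and parameter names are illustrative): the average BER $E[Q(\sqrt{2\gamma})]$ with $\gamma$ as in (4.12) is estimated for an uncorrelated and a fully correlated $2 \times 2$ scenario.

```python
import math
import random

def q_func(x):
    """Gaussian tail function Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def avg_ber(mu, nu, rho, n_samples=20000, seed=1):
    """Monte Carlo estimate of the average OSTBC BER (sketch).

    mu, nu: transmit/receive correlation eigenvalues (lambda^T, lambda^R).
    The instantaneous SNR per bit is rho * sum_ij mu_i * nu_j * s_ij with
    s_ij iid standard exponential, as in (4.12).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        gamma = rho * sum(a * b * rng.expovariate(1.0)
                          for a in mu for b in nu)
        total += q_func(math.sqrt(2.0 * gamma))
    return total / n_samples

# Schur-convexity: [2,0] majorizes [1,1], so correlation should raise the BER.
ber_uncorr = avg_ber([1.0, 1.0], [1.0, 1.0], rho=5.0)  # full diversity (order 4)
ber_corr   = avg_ber([2.0, 0.0], [2.0, 0.0], rho=5.0)  # diversity collapses to 1
```

With the fully correlated eigenvalues the diversity collapses, and the estimated BER is markedly larger than in the uncorrelated case, in line with the corollary.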
4.1.4 Average Capacity of MISO Systems

Consider a single-user system with multiple transmit antennas and a single receive antenna. In [157], the potential of transmit diversity systems was pointed out. The capacity of a MISO system with imperfect feedback was first analyzed in [99, 100, 149]. In [55, 66], the optimum transmission strategy with covariance knowledge at the transmit array with respect to the ergodic capacity was analyzed. It has been shown that even partial CSI at the transmitter can increase the mutual information of a MISO system. Recently, transmission schemes for optimizing mutual information in MISO mean-feedback and covariance-feedback systems were derived in [99, 149]. The capacity can be achieved by Gaussian distributed transmit signals with a particular covariance matrix.

Consider the standard MISO block-flat-fading channel model $y = x^H h + n$ with complex $n_T \times 1$ transmit vector $x$, channel vector $h$ ($n_T \times 1$), and circularly symmetric complex Gaussian noise $n$ with variance $\sigma_n^2/2$ per dimension. The channel vector $h$ is constant for a block of $T$ symbols; then the channel changes to a completely new uncorrelated channel realization. For convenience, we define the inverse noise variance as $\rho = 1/\sigma_n^2$. In the following, we assume that the receiver knows $h$ perfectly. The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix $R$, i.e., $h \sim \mathcal{CN}(0, R)$. Denote the ordered eigenvalues of $R$ as $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_{n_T} \geq 0$, and let $w_k = |h_k|^2$ be iid standard exponentially distributed random variables. In [68], it is shown that the average mutual information with an uninformed transmitter is given by
$$C_{opt}^{noCSI}(\mu) = E\left[ \log\left(1 + \frac{\rho}{n_T} \sum_{l=1}^{n_T} \mu_l w_l\right) \right]. \qquad (4.13)$$
Furthermore, the ergodic capacity with perfect CSI at the transmitter is given by
$$C_{opt}^{pCSI}(\mu) = E\left[ \log\left(1 + \rho \sum_{l=1}^{n_T} \mu_l w_l\right) \right]. \qquad (4.14)$$
Corollary 4.2. The average mutual information in (4.13) and the ergodic capacity in (4.14) are Schur-concave functions with respect to the vector of eigenvalues $\mu$.

Proof. Since $f(x) = \log(1 + ax)$ for $a > 0$ is a concave function, this result follows directly from Lemma 2.16.

The ergodic capacity with covariance feedback is given by [68]
$$C_{opt}^{cfCSI}(\mu) = \max_{p \geq 0:\, \|p\|_1 \leq 1} E\left[ \log\left(1 + \rho \sum_{l=1}^{n_T} p_l \mu_l w_l\right) \right]. \qquad (4.15)$$

Corollary 4.3. The ergodic capacity in (4.15) is Schur-convex with respect to $\mu$.

Proof. This follows from Theorem 2.18.

These results lead to a complete characterization of the impact of correlation on the average mutual information of MISO systems. The inequality chain in the next corollary shows the relation between the different CSI schemes and different levels of correlation. Assume that the correlation vector $\mu_2$ majorizes $\mu_1$, i.e., $\mu_1 \preceq \mu_2$. We define the fully correlated vector $\psi = [n_T, 0, \ldots, 0]^T$ and the completely uncorrelated vector $\chi = [1, 1, \ldots, 1]^T$. Note that the vector $\psi$ majorizes all other vectors and that the vector $\chi$ is majorized by all other vectors.

Corollary 4.4. For the ergodic capacities in MISO systems with different levels of correlation and different CSI at the transmitter, we have the following inequalities:
$$C_{opt}^{noCSI}(\psi) \leq C_{opt}^{noCSI}(\mu_2) \leq C_{opt}^{noCSI}(\mu_1) \leq C_{opt}^{noCSI}(\chi) = C_{opt}^{cfCSI}(\chi) \leq C_{opt}^{cfCSI}(\mu_1) \leq C_{opt}^{cfCSI}(\mu_2) \leq C_{opt}^{cfCSI}(\psi) = C_{opt}^{pCSI}(\psi) \leq C_{opt}^{pCSI}(\mu_2) \leq C_{opt}^{pCSI}(\mu_1) \leq C_{opt}^{pCSI}(\chi). \qquad (4.16)$$

An illustration of this inequality chain can be found in [68].
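The Schur-concavity of (4.13) can be checked numerically. The following sketch (our own; helper names are illustrative) estimates $C_{opt}^{noCSI}$ by Monte Carlo for the extreme vectors $\psi = [2, 0]$ and $\chi = [1, 1]$ of a two-antenna system.

```python
import math
import random

def c_no_csi(mu, rho, n_samples=50000, seed=2):
    """Monte Carlo estimate of (4.13) for an uninformed transmitter (sketch):
    E[log(1 + rho/n_T * sum_l mu_l * w_l)], w_l iid standard exponential."""
    rng = random.Random(seed)
    n_t = len(mu)
    acc = 0.0
    for _ in range(n_samples):
        s = sum(m * rng.expovariate(1.0) for m in mu)
        acc += math.log(1.0 + rho / n_t * s)
    return acc / n_samples

psi = [2.0, 0.0]   # fully correlated: majorizes every feasible mu
chi = [1.0, 1.0]   # completely uncorrelated: majorized by every feasible mu

c_corr = c_no_csi(psi, rho=10.0)
c_uncorr = c_no_csi(chi, rho=10.0)
```

Since $\chi \preceq \psi$ and $C_{opt}^{noCSI}$ is Schur-concave, the uncorrelated estimate comes out larger, as the left part of the chain (4.16) requires.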
4.1.5 Outage Probability of MISO Systems

In contrast to the ergodic capacity, which measures the amount of information that can be transmitted error-free on average, the outage probability is a more subtle measure of the probability of successful transmission while the channel is in a certain state. Since the instantaneous capacity depends on the channel state, it is itself a random variable. Its first moment corresponds to the ergodic capacity, and its cumulative distribution function (cdf) is the outage probability. The outage probability gives the probability that a given transmission rate cannot be achieved in one fading block. Recently, the outage probability was studied for multiple antenna channels and space–time codes [97, 125].

The properties of the optimum transmission strategies change if we replace the ergodic capacity as objective function with the outage probability. E.g., for no CSI at the transmitter, the optimum transmission strategy is to use only a fraction of the available number of transmit antennas. Telatar has already conjectured this in [137]. In [64], a part of this conjecture is verified; in addition, a necessary and sufficient condition for the optimality of single-antenna processing was derived there. The complete solution of Telatar's conjecture can be found in [73].

Consider again the standard MISO block-flat-fading channel model $y = x^H h + n$ with complex $n_T \times 1$ transmit vector $x$, channel vector $h$ ($n_T \times 1$), and circularly symmetric complex Gaussian noise $n$ with variance $\sigma_n^2/2$ per dimension. The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix $R$, i.e., $h \sim \mathcal{CN}(0, R)$. Denote the ordered eigenvalues of $R$ as $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_{n_T} \geq 0$, and let $w_k = |h_k|^2$ be iid standard exponentially distributed random variables. In [73], different types of CSI are studied. Here, consider the case where the transmitter is uninformed and performs equal power allocation. However, the channel is assumed to be correlated. In this case, the outage probability is given by
$$P_{out}(\rho, R, \mu) = \Pr\left[ \log\left(1 + \rho \sum_{k=1}^{n_T} \mu_k w_k\right) \leq R \right].$$
Observe that the outage probability is symmetric but in general neither concave nor convex with respect to $\mu$. Therefore, the stochastic majorization approach presented in Section 2.2.2 cannot be applied, and a new technique is developed. It turns out that the behavior of the outage probability is chameleonic compared to the clear, unique behavior of the average mutual information.

Theorem 4.1 (Schur-convexity of outage probability). For a MISO system which applies equal power allocation and for fixed transmission rate $R$, the outage probability as a function of the correlation properties of the transmit antennas is characterized by the following statements:

• for SNR $\rho < \underline{\rho} = \frac{2^R - 1}{2}$, the outage probability is a Schur-concave function of the channel covariance matrix eigenvalues $\mu_1, \ldots, \mu_{n_T}$, i.e., correlation decreases the outage probability and $\mu_1 \succeq \mu_2 \Longrightarrow P_{out}(\rho, R, \mu_1) \leq P_{out}(\rho, R, \mu_2)$,

• for SNR $\rho > \overline{\rho} = 2^R - 1$, the outage probability is a Schur-convex function of the channel covariance matrix eigenvalues $\mu_1, \ldots, \mu_{n_T}$, i.e., correlation increases the outage probability and $\mu_1 \succeq \mu_2 \Longrightarrow P_{out}(\rho, R, \mu_1) \geq P_{out}(\rho, R, \mu_2)$.
The proof follows directly from Theorem 2.19. For more discussion and illustrations, the interested reader is referred to [73].
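For $n_T = 2$, the outage probability admits simple closed forms (the cdfs of an exponential and of a Gamma(2) sum), which makes the two regimes of Theorem 4.1 easy to check. The following sketch is our own, not taken from [73].

```python
import math

def pout_corr(rho, rate):
    """Outage probability for fully correlated eigenvalues mu = [2, 0]:
    Pr[log(1 + 2*rho*w) <= R] with w ~ Exp(1)."""
    t = (2.0 ** rate - 1.0) / (2.0 * rho)
    return 1.0 - math.exp(-t)

def pout_uncorr(rho, rate):
    """Outage probability for uncorrelated eigenvalues mu = [1, 1]:
    Pr[log(1 + rho*(w1 + w2)) <= R], w1 + w2 ~ Gamma(2, 1)."""
    t = (2.0 ** rate - 1.0) / rho
    return 1.0 - (1.0 + t) * math.exp(-t)

R = 1.0
low, high = 0.2, 10.0   # below (2^R - 1)/2 = 0.5 and above 2^R - 1 = 1
```

Note that $[2,0] \succeq [1,1]$, so the Schur-concave regime requires the correlated outage to be smaller at `low` SNR, and the Schur-convex regime requires it to be larger at `high` SNR.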
4.1.6 Outage Probability of OSTBC–MISO Systems
There has been a considerable amount of work on a variety of new codes and modulation signals, called space–time (ST) codes, in order to approach the huge capacity of multiple antenna channels. One scheme of particular interest is the Alamouti scheme [3] for two transmit antennas. Later on, [134, 138] proposed more general schemes referred to as
OSTBC with the same properties as the Alamouti scheme, like, e.g., a remarkably simple maximum-likelihood decoding algorithm. The performance of OSTBC with respect to mutual information was analyzed (among others) for the uncorrelated Rayleigh fading case in [118, 6] and for the more general case with different correlation scenarios and line of sight (LOS) components in [98]. More information about ST codes can be found in the books [86] and [109].

Consider again the standard MISO block-flat-fading channel model given by $y = x^H h + n$ with complex $n_T \times 1$ transmit vector $x$, channel vector $h$ ($n_T \times 1$), and circularly symmetric complex Gaussian noise $n$ with variance $\sigma_n^2/2$ per dimension. The inverse noise variance is denoted by $\rho = 1/\sigma_n^2$. The channel vector consists of complex Gaussian distributed entries with zero mean and covariance matrix $I$, i.e., $h \sim \mathcal{CN}(0, I)$. The transmitter has no CSI and applies an OSTBC. For data stream $k$, the received signal after channel matched filtering is given by
$$y_k = \|h\|^2 x_k + n_k. \qquad (4.17)$$
For an OSTBC with $n_T$ transmit antennas, it is shown in [87] that the maximum achievable rate is given by
$$r_c(n_T) = \frac{\left\lfloor \frac{n_T+1}{2} \right\rfloor + 1}{2 \left\lfloor \frac{n_T+1}{2} \right\rfloor}. \qquad (4.18)$$
It is important to note that $r_c(k+1) = r_c(k+2)$ for even $k$. Furthermore, it holds that $\lim_{n_T \to \infty} r_c(n_T) = 1/2$. The outage probability for the model in (4.17) is a function of the number of transmit antennas $n_T$, the rate $R$, and the SNR $\rho$, and it is given by [77]
$$P_{out}(\rho, n_T, R) = 1 - \frac{\Gamma\!\left(n_T, \left(2^{R/r_c(n_T)} - 1\right) \frac{n_T}{\rho}\right)}{\Gamma(n_T)} \qquad (4.19)$$
with the incomplete Gamma function $\Gamma(n, x)$ [1]. The proof of the next theorem can be found in [77].
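Expressions (4.18) and (4.19) are straightforward to evaluate: for integer $l$, the regularized incomplete Gamma ratio reduces to a finite sum. The following sketch (our own; function names are illustrative) reproduces the even-beats-odd behavior formalized in the next theorem.

```python
import math

def rc(n_t):
    """Maximum OSTBC code rate (4.18)."""
    m = (n_t + 1) // 2
    return (m + 1) / (2 * m)

def gamma_ratio_upper(l, x):
    """Regularized upper incomplete gamma Gamma(l, x)/Gamma(l) for integer l,
    via the finite-sum identity exp(-x) * sum_{k<l} x^k / k!."""
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(l))

def pout_ostbc(rho, l, rate):
    """Outage probability (4.19) with l active antennas."""
    x = (2.0 ** (rate / rc(l)) - 1.0) * l / rho
    return 1.0 - gamma_ratio_upper(l, x)
```

For example, at $R = 1$ two antennas beat four at low SNR, four beat two at higher SNR, and four beat three everywhere in between, consistent with the switching-point discussion below.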
Theorem 4.2. Fix $R$ and $\rho$. The minimum of the outage probability of the OSTBC MISO system with $n_T$ antennas,
$$\min_{1 \leq l \leq n_T} P_{out}(R, l, \rho) = \min_{1 \leq l \leq n_T} \left[ 1 - \frac{\Gamma\!\left(l, \left(2^{R/r_c(l)} - 1\right) \frac{l}{\rho}\right)}{\Gamma(l)} \right], \qquad (4.20)$$
is attained for even $l$ or for $l = n_T$.

In Figure 4.3, the outage probability as a function of the SNR is shown for $l = 2, 3, 4, 5, 6$ active antennas and rate $R = 1$. The figure also shows the switching SNR points from two to three antennas $\rho_{23}$, from two to four $\rho_{24}$, from four to five $\rho_{45}$, and from four to six $\rho_{46}$. In order to minimize the outage probability, we always choose the lowest of the curves. That means that the higher the SNR, the more antennas are used. Up to 4.8 dB, an orthogonal ST coded system with two antennas is optimal. A system with four antennas is optimal in the range from 4.8 dB up to 5.4 dB, and from 5.4 dB on, a system with six antennas has minimum outage probability. In Figure 4.3, it can be observed that, as indicated in the proof of Theorem 4.2, the switching points from an
Fig. 4.3 Outage probability of OSTBC with l = 2, 3, 4, 5, 6 active antennas and R = 1.
even to an odd number of antennas, $\rho_{23}$ and $\rho_{45}$, lie at a higher SNR than the switching points from an even to the next even number of antennas, $\rho_{24}$ and $\rho_{46}$, respectively.

4.1.7 Average Capacity of MIMO Systems
Consider the standard MIMO block-flat-fading channel model
$$y = Hx + n \qquad (4.21)$$
with complex $n_T \times 1$ transmit vector $x$, channel matrix $H$ with $n_R \times n_T$ entries, and circularly symmetric complex Gaussian noise $n$ with variance $\frac{\sigma_N^2}{2} I$ per dimension. For convenience, we define the inverse noise variance as $\rho = 1/\sigma_N^2$. We assume that the receiver knows $H$ perfectly. The channel matrix $H$ for the case in which we have correlated transmit and correlated receive antennas is modeled as in (4.2), i.e., $H = R_R^{1/2} \cdot W \cdot R_T^{1/2}$ with transmit correlation matrix $R_T = U_T D_T U_T^H$ and receive correlation matrix $R_R = U_R D_R U_R^H$. $U_T$ and $U_R$ are the matrices with the eigenvectors of $R_T$ and $R_R$, respectively, and $D_T$, $D_R$ are diagonal matrices with the eigenvalues of $R_T$ and $R_R$, respectively, i.e., $D_T = \mathrm{diag}[\lambda_1^T, \ldots, \lambda_{n_T}^T]$ and $D_R = \mathrm{diag}[\lambda_1^R, \ldots, \lambda_{n_R}^R]$. Without loss of generality, we assume that all eigenvalues are ordered in decreasing order, i.e., $\lambda_1^T \geq \lambda_2^T \geq \cdots \geq \lambda_{n_T}^T$. The random matrix $W$ has zero-mean independent identically distributed complex Gaussian entries, i.e., $W \sim \mathcal{CN}(0, I)$.

The average performance measure will be defined in the next chapter using matrix-monotone functions. Consider the following average performance function and assume for the moment that $\phi$ is the mutual information, i.e., $\phi(x) = \log(1 + x)$. The characterization of a general class of performance functions will be given in Section 5.1.4. Then the average performance reads
$$\Phi(\lambda^T, \lambda^R) = E\left[ \mathrm{tr}\, \phi\!\left( \rho \sum_{k=1}^{n_T} \lambda_k^T \tilde{w}_k \tilde{w}_k^H \right) \right]$$
with $\tilde{w}_k$ the $k$th column of $\tilde{W} = R_R^{1/2} W$.
Corollary 4.5. For fixed receive correlation vector $\bar{\lambda}^R$ and fixed vector $\lambda_0^T$, and for an arbitrary vector $\lambda_1^T$ which majorizes $\lambda_0^T$, i.e., $\lambda_1^T \succeq \lambda_0^T$, it follows that $\Phi(\lambda_0^T, \bar{\lambda}^R) \geq \Phi(\lambda_1^T, \bar{\lambda}^R)$. For fixed transmit correlation vector $\bar{\lambda}^T$ and fixed vector $\lambda_0^R$, and for an arbitrary vector $\lambda_1^R$ which majorizes $\lambda_0^R$, i.e., $\lambda_1^R \succeq \lambda_0^R$, it follows that $\Phi(\bar{\lambda}^T, \lambda_0^R) \geq \Phi(\bar{\lambda}^T, \lambda_1^R)$.

This corollary follows from Theorem 2.20. The result is illustrated in Figure 4.4 for a $2 \times 2$ MIMO system. Further discussions, analysis, and illustrations can be found in [71].

4.1.8 MIMO Spectral Efficiency in the Low Power Regime
In [143], the spectral efficiency in the wideband regime was studied using two novel performance metrics, namely the minimum $E_b/N_0$ and
Fig. 4.4 Average mutual information for 2 × 2 MIMO system as a function of transmit and receive correlation.
the wideband slope $S_0$. These quantities characterize the first and second order behavior of the capacity at low SNR values. For the uncorrelated Rayleigh fading case, these performance metrics were derived in [143, Thm. 12] for the informed transmitter case, and in [143, Thm. 13] for the uninformed transmitter case with perfect CSI at the receiver only. In [89], the uninformed transmitter case with correlation in Rician fading MIMO channels with polarization diversity was studied. It turned out that transmit and receive correlation have no impact on the minimum $E_b/N_0$ but do affect the wideband slope. This impact was quantified in [89] by the correlation number. In [141], the impact of antenna correlation on the capacity of multiantenna channels was analyzed by studying the minimum $E_b/N_0$, the wideband slope $S_0$, and the high SNR slope $S_\infty$ for certain classes of MIMO channels.

The transmitter has $n_T$ transmit antennas and the receiver applies $n_R$ receive antennas. The received signal vector $y$ in the quasi-static block flat-fading MIMO channel $H$ is given by $y = Hx + n$ with transmit signal $x$ and additive white Gaussian noise (AWGN) vector $n \sim \mathcal{CN}(0, \sigma_n^2 I)$. The channel matrix $H$ for the case in which we have correlated transmit and correlated receive antennas is modeled as $H = R_R^{1/2} \cdot W \cdot R_T^{1/2}$ with transmit correlation matrix $R_T$ and receive correlation matrix $R_R$.

In [143], the low-SNR regime has been analyzed and two performance measures, namely the minimum $E_b/N_0$ and the wideband slope $S_0$, were introduced. The system parameters bandwidth $B$, transmission rate $R$, transmit power $P$, and spectral efficiency $C(E_b/N_0)$ satisfy the fundamental limit
$$\frac{R}{B} \leq C\!\left(\frac{E_b}{N_0}\right). \qquad (4.22)$$
The function $C(E_b/N_0)$ is directly related to the common capacity expression $C(\mathrm{SNR})$, i.e., $C(E_b/N_0) = C(\mathrm{SNR})$ for the SNR which solves $\frac{E_b}{N_0}\, C(\mathrm{SNR}) = \mathrm{SNR}$.
At low SNR, the function $C(E_b/N_0)$ can be expressed as [143]
$$C\!\left(\frac{E_b}{N_0}\right) \approx \frac{S_0}{3\,\mathrm{dB}} \left[ \left.\frac{E_b}{N_0}\right|_{\mathrm{dB}} - \left.\frac{E_b}{N_0}\right|_{\min,\,\mathrm{dB}} \right] \qquad (4.23)$$
with
$$\left.\frac{E_b}{N_0}\right|_{\min} = \frac{\log_e 2}{\dot{C}(0)} \quad \text{and} \quad S_0 = \frac{2\,[\dot{C}(0)]^2}{-\ddot{C}(0)}. \qquad (4.24)$$
The closer $E_b/N_0$ gets to $(E_b/N_0)_{\min}$, the better is the approximation in (4.23). Note that the first and second derivatives in (4.24) are taken of the common capacity function $C(\mathrm{SNR})$. In [89, 142], the two performance measures in (4.24) were computed for the MIMO channel without CSI at the transmitter and with perfect CSI at the receiver. The minimum $E_b/N_0$ and the wideband slope are given by [143, Thm. 13]
$$\left.\frac{E_b}{N_0}\right|_{\min}^{noCSI} = \frac{\log_e 2}{n_R}, \qquad (4.25)$$
$$S_0^{noCSI} = \frac{2 n_T^2 n_R^2}{n_T^2 \sum_{k=1}^{n_R} (\lambda_k^R)^2 + n_R^2 \sum_{k=1}^{n_T} (\lambda_k^T)^2}. \qquad (4.26)$$
Note that we focus on the transmitted $E_b/N_0$ as in [89].
Lemma 4.3. Fix the receive correlation $\lambda^R$. The wideband slope $S_0^{noCSI}$ as a function of the transmit correlation is Schur-concave, i.e., $\lambda_1^T \succeq \lambda_2^T \Longrightarrow S_0^{noCSI}(\lambda_1^T) \leq S_0^{noCSI}(\lambda_2^T)$. For fixed transmit correlation, the wideband slope $S_0^{noCSI}$ is Schur-concave with respect to the receive correlation, i.e., $\lambda_1^R \succeq \lambda_2^R \Longrightarrow S_0^{noCSI}(\lambda_1^R) \leq S_0^{noCSI}(\lambda_2^R)$.
Lemma 4.3 is a direct consequence of Lemma 2.16.
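Lemma 4.3 can be verified directly on (4.26); a minimal sketch (our own) for a $2 \times 2$ system:

```python
def s0_no_csi(lam_t, lam_r):
    """Wideband slope (4.26) for an uninformed transmitter (sketch)."""
    n_t, n_r = len(lam_t), len(lam_r)
    denom = (n_t ** 2 * sum(x ** 2 for x in lam_r)
             + n_r ** 2 * sum(x ** 2 for x in lam_t))
    return 2.0 * n_t ** 2 * n_r ** 2 / denom

# Uncorrelated 2x2: slope 2*nT*nR/(nT+nR) = 2; correlation reduces it,
# down to 1 for complete transmit and receive correlation.
uncorr = s0_no_csi([1.0, 1.0], [1.0, 1.0])
corr_t = s0_no_csi([2.0, 0.0], [1.0, 1.0])
full_c = s0_no_csi([2.0, 0.0], [2.0, 0.0])
```

Since $[2,0] \succeq [1,1]$, Schur-concavity requires `corr_t <= uncorr`, and the two endpoint values match the uncorrelated and completely correlated bounds stated later in this subsection.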
For perfect CSI at the transmitter and the receiver, the minimum $E_b/N_0$ and the wideband slope $S_0$ are given by
$$\left.\frac{E_b}{N_0}\right|_{\min}^{pCSI} = \frac{\log_e 2}{E\,\lambda_{\max}(HH^H)}, \qquad (4.27)$$
$$S_0^{pCSI} = \frac{2\,\left(E\,\lambda_{\max}(HH^H)\right)^2}{E\left(\lambda_{\max}(HH^H)\right)^2}. \qquad (4.28)$$
The impact of correlation on the performance metric in (4.27) is characterized in the following theorem.

Theorem 4.4. With perfect CSI at the transmitter and receiver, the minimum $E_b/N_0$ is Schur-concave with respect to the transmit and receive correlation, i.e., for fixed receive correlation it holds that
$$\lambda_1^T \succeq \lambda_2^T \Longrightarrow \left.\frac{E_b}{N_0}\right|_{\min}^{pCSI}(\lambda_1^T) \leq \left.\frac{E_b}{N_0}\right|_{\min}^{pCSI}(\lambda_2^T).$$
Proof. The minimum $E_b/N_0$ and the wideband slope $S_0$ do not depend on the eigenvectors of the transmit and receive correlation matrices, since the pdf of $H$ is invariant against multiplication with unitary matrices from the left and from the right, i.e.,
$$E\,\lambda_{\max}(HH^H) = E\,\lambda_{\max}(R_T W R_R W^H) = E\,\lambda_{\max}(D_T W D_R W^H). \qquad (4.29)$$
Fix the receive correlation and express the expectation in (4.29) as a function of the vector of eigenvalues in $D_T$:
$$f(\lambda^T) = E\,\lambda_{\max}\!\left(\mathrm{diag}(\lambda^T)\, W D_R W^H\right). \qquad (4.30)$$
We have the following two observations:

(1) $f(\lambda^T)$ is symmetric with respect to $\lambda^T$, since for all permutation matrices $\Pi$ it holds that
$$f(\Pi \lambda^T) = E\,\lambda_{\max}\!\left(\mathrm{diag}(\Pi \lambda^T)\, W D_R W^H\right) = E\,\lambda_{\max}\!\left(\Pi\,\mathrm{diag}(\lambda^T)\,\Pi^T W D_R W^H\right) = E\,\lambda_{\max}\!\left(\mathrm{diag}(\lambda^T)\,\Pi^T W D_R W^H \Pi\right) = E\,\lambda_{\max}\!\left(\mathrm{diag}(\lambda^T)\, W D_R W^H\right).$$
The last equality follows from the fact that the pdfs of $W$ and $\Pi^T W$ are equal, because the permutation matrix $\Pi$ is unitary.

(2) $f(\lambda^T)$ is convex with respect to $\lambda^T$. This holds even for each realization of $W$. Define $\Lambda(t) = t\Gamma + (1-t)\Psi$ and $\bar{W} = W D_R^{1/2}$. It holds that
$$\lambda_{\max}\!\left(\Lambda(t)\, \bar{W} \bar{W}^H\right) = \lambda_{\max}\!\left(\bar{W}^H [t\Gamma + (1-t)\Psi] \bar{W}\right) = \lambda_{\max}\!\left(t\, \bar{W}^H \Gamma \bar{W} + (1-t)\, \bar{W}^H \Psi \bar{W}\right) \leq t\, \lambda_{\max}\!\left(\bar{W}^H \Gamma \bar{W}\right) + (1-t)\, \lambda_{\max}\!\left(\bar{W}^H \Psi \bar{W}\right). \qquad (4.31)$$
The last inequality is proven in Theorem 2.11 in Subsection 2.2.1. Using Theorem 2.15 in Subsection 2.2.2, we observe that the conditions for Schur-convexity, i.e., convexity and symmetry, are fulfilled, so $f$ is Schur-convex; since $(E_b/N_0)_{\min}^{pCSI} = \log_e 2 / f(\lambda^T)$, the minimum $E_b/N_0$ is Schur-concave. This completes the proof.

For covariance knowledge at the transmitter, the minimum $E_b/N_0$ and the wideband slope $S_0$ are given by
$$\left.\frac{E_b}{N_0}\right|_{\min}^{covCSI} = \frac{\log_e 2}{n_R \lambda_1^T}, \qquad (4.32)$$
$$S_0^{covCSI} = \frac{2 n_R^2}{E\left[\left(\sum_{k=1}^{n_R} \lambda_k^R w_k\right)^2\right]}. \qquad (4.33)$$
This corresponds to the result in [143, Eq. (236)].

Theorem 4.5. With covariance knowledge at the transmitter and perfect CSI at the receiver, the minimum $E_b/N_0$ is Schur-concave with respect to the transmitter correlation and does not depend on the receiver correlation. The wideband slope is Schur-concave with respect to the receiver correlation and does not depend on the transmitter correlation.

The proof follows from Lemma 2.16. Bounds on the achievable performance can be found in [70].

In Figure 4.5, the spectral efficiency over $E_b/N_0$ is shown for different MIMO systems with uninformed transmitter and perfectly informed
Fig. 4.5 Spectral efficiency over $E_b/N_0$ for different MIMO systems and transmitter and receiver correlation for uninformed transmitter. The solid lines are the minimum $E_b/N_0$ and wideband slope $S_0$ approximations, the symbols are the simulated results.
receiver. The impact of the number of receive antennas on the minimum $E_b/N_0$ can be observed. In addition, the wideband slope $S_0$ decreases with increasing transmitter and receiver correlation. The correlation eigenvalues in the simulation in Figure 4.5 are $[1.8, 0.2]$ for transmitter and receiver correlation. The minimum $E_b/N_0$ values for the SISO, the two-by-two, and the three-by-three cases are $-1.59$, $-4.602$, and $-6.3629$ dB, respectively.

The minimum $E_b/N_0$ of the three CSI scenarios (no CSI, perfect CSI, covariance knowledge) are connected by the following inequalities:
$$\left[\left.\frac{E_b}{N_0}\right|_{\min}^{nCSI}\right]^{-1} = \frac{n_R}{\log_e 2} \leq \frac{n_R \lambda_1^T}{\log_e 2} = \left[\left.\frac{E_b}{N_0}\right|_{\min}^{covCSI}\right]^{-1}$$
with equality for completely uncorrelated transmit antennas, and
$$\left[\left.\frac{E_b}{N_0}\right|_{\min}^{covCSI}\right]^{-1} = \frac{n_R \lambda_1^T}{\log_e 2} \leq \frac{E\,\lambda_{\max}(HH^H)}{\log_e 2} = \left[\left.\frac{E_b}{N_0}\right|_{\min}^{pCSI}\right]^{-1}$$
with equality for completely correlated transmit antennas and uncorrelated receive antennas. The last inequality follows from the fact that $E\,\lambda_{\max}(HH^H)$ is monotonically increasing with increasing transmit and receive correlation, and from
$$E\,\lambda_{\max}(HH^H) = E\,\lambda_{\max}(R_T W W^H) \geq \lambda_{\max}(R_T)\, E\,\lambda_{\max}(W W^H) \geq \lambda_1^T n_R. \qquad (4.34)$$
Next, the inequalities for the wideband slope are presented. Denote the completely correlated scenario, i.e., $\lambda_1^T = n_T$, $\lambda_2^T = \lambda_3^T = \cdots = \lambda_{n_T}^T = 0$ and $\lambda_1^R = n_R$, $\lambda_2^R = \lambda_3^R = \cdots = \lambda_{n_R}^R = 0$, by $S_{0,cc}$, and the completely uncorrelated case, i.e., $\lambda_1^T = \cdots = \lambda_{n_T}^T = \lambda_1^R = \cdots = \lambda_{n_R}^R = 1$, by $S_{0,uc}$. Then the following inequalities hold:
$$S_{0,uc}^{nCSI} = \frac{2 n_T n_R}{n_T + n_R} \geq S_0^{nCSI} \geq 1 = S_{0,cc}^{nCSI},$$
$$S_{0,uc}^{covCSI} = \frac{2 n_R}{n_R + 1} \geq S_0^{covCSI} \geq \frac{1}{2} = S_{0,cc}^{covCSI},$$
$$S_{0,cc}^{pCSI} = \frac{1}{2} \geq S_0^{pCSI}.$$
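The minimum $E_b/N_0$ values quoted for Figure 4.5 ($-1.59$, $-4.602$, and $-6.3629$ dB) follow directly from (4.25) with $n_R = 1, 2, 3$; a quick numerical check (our own sketch):

```python
import math

def eb_n0_min_db(n_r):
    """Minimum transmitted Eb/N0 in dB for an uninformed transmitter,
    from (4.25): Eb/N0_min = ln(2) / n_R."""
    return 10.0 * math.log10(math.log(2.0) / n_r)

siso, two_by_two, three_by_three = (eb_n0_min_db(n) for n in (1, 2, 3))
```

Note that, per (4.25), only the number of receive antennas enters; the transmit side and the correlation affect the wideband slope, not the minimum $E_b/N_0$.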
4.1.9 Delay Limited Capacity of Multiple Antenna Systems
Future wireless communication systems will support more and more delay-sensitive multimedia and entertainment streaming data that has to be transmitted successfully within a fixed time frame. This is sometimes called non-elastic traffic. Already, TCP/IP version 6 supports hard delay constraints. However, the lower layers cannot yet adequately handle those quality-of-service requirements. As a result, it is necessary to study hard delay constraints on the physical layer. In this subsection, we study the capacity of multiple antenna systems under hard delay constraints. Multiple antenna systems were extensively studied in terms of their performance and achievable rates [37, 137]. Recently, the impact of correlation on the average mutual information, on the outage probability, and on the minimum $E_b/N_0$ and the wideband slope $S_0$ was analyzed in [19, 68, 70, 141].
For the delay-constrained analysis, the ergodic capacity as well as the outage probability are not suitable: both approaches do not guarantee the successful transmission of information in any finite number of blocks. Therefore, we restrict the delay to one fading block and fix the outage probability to some $\epsilon$, i.e., $\Pr[C(\alpha) < R] = \epsilon$. Then we solve this for $R$ to obtain the so-called $\epsilon$-capacity. In order to avoid outages altogether, we set $\epsilon = 0$ and obtain the zero-outage capacity, or the delay limited capacity (DLC) $C^d$ with $\Pr[C(\alpha) < C^d] = 0$. The DLC is defined as the transmission rate that can be reliably supported in each fading state $H$ of the channel [44], i.e.,
$$\Pr\left[ \log \det\left(I + \rho H Q(H) H^H\right) < C^d(\rho) \right] = 0 \qquad (4.35)$$
under a long-term power constraint on $Q$, i.e., $E[\mathrm{tr}\, Q] \leq P$. From an information theoretic point of view, the notion of DLC is somewhat problematic, since the code that achieves capacity requires a long block length, but a block fading channel model is assumed. However, following the arguments in [27], the outage probability predicts surprisingly well the error probability of actual codes for practical values of block length [81].

Consider again the quasi-static block-flat correlated Rayleigh fading MISO channel. The block-flat-fading channel model is given by $y = x^H h + n$ and $h \sim \mathcal{CN}(0, R)$. Denote the ordered eigenvalues of $R$ as $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_{n_T} \geq 0$, and let $w_k = |h_k|^2$ be iid standard exponentially distributed random variables.
Theorem 4.6 (Section III.B in [44]). The DLC of fading MISO channels is given by
$$C^d(\rho, P, \mu) = \log_2\left(1 + \frac{\rho P}{E\left[\frac{1}{\|h\|^2}\right]}\right) \qquad (4.36)$$
with $\|h\|^2 = \sum_{k=1}^{n_T} \mu_k w_k$. The optimal transmit strategy consists of the encoder, power allocation, and beamforming. The optimal beamforming vector is given by $v^*(h) = h/\|h\|$ and the optimal power allocation by
$$p^*(h) = \frac{P}{E\left[\frac{1}{\|h\|^2}\right]} \cdot \frac{1}{\|h\|^2}. \qquad (4.37)$$
Theorem 4.7. The DLC in (4.36) is Schur-concave with respect to $\mu$, i.e., $\mu_1 \preceq \mu_2 \Longrightarrow C^d(\rho, P, \mu_1) \geq C^d(\rho, P, \mu_2)$.

The theorem is a consequence of Lemma 2.16. More details and further illustrations can be found in [74].
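For $n_T = 2$ and distinct eigenvalues, $E[1/\|h\|^2]$ has the closed form $(\ln\mu_1 - \ln\mu_2)/(\mu_1 - \mu_2)$ (a Frullani-integral identity for the hypoexponential sum; this closed form is our addition, consistent with the correlated-fading expression used for the broadcast example in the next subsection). It makes (4.36) and Theorem 4.7 easy to evaluate:

```python
import math

def e_inv_norm(mu1, mu2):
    """E[1/||h||^2] for ||h||^2 = mu1*w1 + mu2*w2, w_i iid Exp(1) (sketch):
    (ln(mu1) - ln(mu2)) / (mu1 - mu2) for distinct eigenvalues, 1/mu at equality."""
    if abs(mu1 - mu2) < 1e-12:
        return 1.0 / mu1
    return (math.log(mu1) - math.log(mu2)) / (mu1 - mu2)

def dlc(rho, p, mu1, mu2):
    """Delay limited capacity (4.36) for a two-antenna MISO channel."""
    return math.log2(1.0 + rho * p / e_inv_norm(mu1, mu2))
```

Spreading the eigenvalues (more correlation) increases $E[1/\|h\|^2]$ and thus decreases the DLC, as Theorem 4.7 states.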
4.1.10 Zero-Outage Capacity Region for SISO BC
Consider the downlink transmission of a cellular system. The base station has multiple antennas ($n_T$); denote the channels to the users as $h_1, \ldots, h_K$. The base applies an OSTBC (e.g., [3]). The data streams $d_1, \ldots, d_K$ of dimension $1 \times n_T$ of the $K$ users are weighted by a power allocation $p_1, \ldots, p_K$ and added before they enter the OSTBC as $s_1, \ldots, s_{n_T}$. Each mobile first performs channel matched filtering according to the effective OSTBC channel. Afterwards, the received signal at user $k$ is given by
$$y_k = a_k \sum_{l=1}^{K} x_l + n_k \qquad (4.38)$$
with fading coefficients $\alpha_k = a_k^2 = \|h_k\|^2$, transmit signal $x_l$ intended for user $l$, and noise $n_k$. We assume that the fading processes of users $k$ and $l$ for $k \neq l$ are independently distributed. Let $p_k$ be the power allocated to user $k$, i.e., $p_k = E[|x_k|^2]$. Denote the long-term sum transmit power constraint at the base station as $P$, i.e.,
$$E_{a_1,\ldots,a_K}\left[ \sum_{k=1}^{K} p_k(a_1, \ldots, a_K) \right] \leq P.$$
The noise power at the receivers is $\sigma_k^2 = 1/\rho$. The ratio of transmit power to noise power is given by $\mathrm{SNR} = P\rho$, which is called the transmit SNR. The channels are modeled by $h_k = w_k R_k^{1/2}$ with correlation matrix $R_k$ for user $1 \leq k \leq K$. Denote the eigenvalues of $R_k$ in decreasing order $\lambda_1^k \geq \cdots \geq \lambda_{n_T}^k \geq 0$. The following result is proven in [58] for the case where only linear precoding is allowed at the base station and the base knows only the norm of the channel vectors of all users.

Theorem 4.8. The zero-outage capacity region consists of all rates $r_1, \ldots, r_K$ for which
$$\frac{E\left[ \sum_{k=1}^{K} \alpha_k^{-1} \left(1 - 2^{-r_k}\right) \right]}{1 - \sum_{k=1}^{K} \left(1 - 2^{-r_k}\right)} \leq \mathrm{SNR}. \qquad (4.39)$$

From Lemma 2.16 it follows that the zero-outage capacity region shrinks with increasing correlation.

Corollary 4.6. The guaranteed MSE region without SIC shrinks with increasing spatial correlation at the mobile terminals, i.e., from $\lambda^k \succeq \gamma^k$ for $1 \leq k \leq K$, it follows that $\mathrm{MSE}(\lambda^1, \ldots, \lambda^K) \subseteq \mathrm{MSE}(\gamma^1, \ldots, \gamma^K)$.

In Figure 4.6, the zero-outage capacity region for two users and two transmit antennas with symmetric correlation is shown for different scenarios. Note that completely correlated transmit antennas lead to a vanishing zero-outage capacity. The uncorrelated scenario leads to $E[1/\alpha_1] = E[1/\alpha_2] = 1$, whereas correlation $\lambda$ increases this value to
$$E[1/\alpha_1] = E[1/\alpha_2] = \frac{\log(\lambda) - \log(2 - \lambda)}{2\lambda - 2}.$$
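Condition (4.39) can be turned into a small feasibility check. The sketch below (our own; names are illustrative) evaluates the transmit SNR required for a symmetric two-user rate pair under the uncorrelated scenario ($E[1/\alpha_k] = 1$) and the correlated scenario $\lambda = 1.9$ from Figure 4.6.

```python
import math

def required_snr(rates, e_inv):
    """Left-hand side of (4.39): transmit SNR needed to support the rate
    tuple with zero outage; e_inv[k] = E[1/alpha_k]. Returns inf when the
    rates are infeasible for any finite SNR (denominator <= 0)."""
    load = sum(1.0 - 2.0 ** (-r) for r in rates)
    if load >= 1.0:
        return math.inf
    num = sum(c * (1.0 - 2.0 ** (-r)) for c, r in zip(e_inv, rates))
    return num / (1.0 - load)

uncorr = required_snr([0.5, 0.5], [1.0, 1.0])

lam = 1.9
c = (math.log(lam) - math.log(2.0 - lam)) / (2.0 * lam - 2.0)
corr = required_snr([0.5, 0.5], [c, c])
```

The correlated scenario needs a strictly higher SNR for the same rate pair, i.e., the region shrinks, in line with Corollary 4.6.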
4.2 User Distribution in Cellular Communication Systems
In this section, the single antenna multi-user scenario is studied in a cellular environment. Cases of interest are the uplink and downlink transmission corresponding to the multiple-access channel (MAC) and
Fig. 4.6 Zero-outage capacity region for MISO BC with two transmit antennas and two users for different correlation scenarios λ = 1 and λ = 1.9.
broadcast channel (BC). The main focus of interest is on the impact of deterministic path-loss variation with distance in conjunction with random fast multipath fading. Random slow shadowing will not be considered for clarity of exposition. The distribution of users in a multiuser cellular communication system has a great impact on the performance. In general, the interplay between user distribution and performance depends on the performance measure and the type of scheduling and transmit strategy. The transmit strategy in turn depends on the available CSI. From a cross-layer optimization point of view, Figure 4.7 illustrates some of the relationships [62]. In [80], it was shown that the optimum strategy for maximizing the sum capacity with perfect CSI of a cellular single-input single-output (SISO) MAC is to allow only the best user to transmit at each time slot. The result in [80] has induced the notion of multiuser diversity, i.e., the achievable capacity of the system increases with the number of users while “riding on the peaks” [139]. In addition to this,
88
Application of Majorization in Wireless Communications
Fig. 4.7 Interplay between different terms in cross-layer optimization. The arrows correspond to different types of relationships and interactions, e.g., (1) the user distribution influences the fading statistics and thereby the CSI; (2) the user distribution influences the scheduling strategy, because cell-edge users are treated differently from close-to-the-base users; (3) the network utility function directly determines the optimal scheduling strategy; (4) the value of the network utility function depends on the availability of CSI. Note that there are many more relationships between the four terms.
the result in [80] has led to the development of opportunistic downlink scheduling algorithms [151] for the BC. In [15], the average sum rate of the SISO MAC with successive interference cancellation (SIC) under a sum transmit power constraint was studied for different types of CSI. Recently, the downlink case was analyzed in [72]. It turned out that the optimal scheduling depends strongly on the CSI at the transmitter. The average sum rate describes the long-term system throughput. This performance measure can be used by the system operator to optimize the overall throughput. The short-term system throughput is measured by the outage sum rate and its corresponding outage probability [10], which describes the probability that an outage occurs during the next transmission block. The properties of the outage probability with respect to the optimal transmit strategy and the channel statistics (e.g., the user distribution) differ from those of the average sum rate [19]. There are two further performance measures, namely the delay limited sum rate [44, 69] and the maximum throughput [2, 7], which describe the guaranteed performance and the goodput of the system.
4.2 User Distribution in Cellular Communication Systems
Recently, the scaling laws of wireless networks were analyzed under simplified assumptions, e.g., equal fading variances of the participating users (e.g., all users located on a unit circle around the base), or SNR approaching infinity. In [15, 72], different user distributions are compared using Majorization theory, and their impact on the average sum rate is characterized. For perfect and long-term CSI, the sum rate was shown to be Schur-convex with respect to the user distribution, and for an uninformed base station the sum rate is Schur-concave. Also, the asymptotic sum rate loss between the best-case and the worst-case user distribution was derived.

4.2.1 A Measure for User Distributions
Consider the downlink transmission. In the signal model, there are K mobile users who are going to receive data from one base station. The single-antenna quasi-static block flat-fading channels h_1, ..., h_K between the mobiles and the base are modeled as constant for a block of coherence length T and from block to block as zero-mean independent complex Gaussian distributed (CN(0, c_k)). The variance is c_k = E(h_k^* h_k) for 1 ≤ k ≤ K. The analysis of scaling laws often assumes iid channels across the users. In this section, we study the effect of different fading variances c_k of the users. In order to guarantee a fair comparison, we constrain the sum variance to be equal to the number of users, i.e., Σ_{k=1}^K c_k = K. In Figure 4.8, the implications of this constraint are shown. Starting from the symmetric scenario c_1 = c_2 = ... = c_K = 1, one mobile moves toward the base while another moves to the cell edge. The other extreme scenario occurs when all but one user have very small fading variances c_k. Under the normalization above, this leads to the variances c_2 = c_3 = ... = c_K = 0 and c_1 = K. Without loss of generality, we order the users in decreasing order of their fading variances, i.e., c_1 ≥ c_2 ≥ ... ≥ c_K. The constraint on the sum of the fading variances ensures that we compare scenarios in which the channels carry the same average sum power.
Fig. 4.8 Fair comparison of user distributions with K = 8 users. (a) Symmetric scenario c1 = (1, 1, . . . , 1). (b) One mobile moves to the base and another to the cell edge. The sum of their fading variances stays constant. c2 = (1 + α, 1, 1, . . . , 1 − α). (c) All but one mobile at the cell edge c3 = (8, 0, . . . , 0).
Definition 4.4 (More spread out user distribution). A user distribution c1 is called more spread out than a user distribution c2 if the fading variance vector c1 is majorized by c2, i.e., c1 ≺ c2.
Remark 4.3. In Figure 4.8, the symmetric scenario is majorized by all other scenarios and scenario three majorizes all other scenarios, i.e., c1 ≺ c2 ≺ c3.
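The majorization relations in Remark 4.3 can be verified directly from the partial-sum definition. The following sketch (NumPy; the helper `majorizes` and the choice α = 0.5 are our own, not from the text) checks the three scenarios of Figure 4.8:

```python
import numpy as np

def majorizes(a, b, tol=1e-9):
    """True if a majorizes b: equal totals and the decreasingly
    sorted partial sums of a dominate those of b."""
    a = np.sort(np.asarray(a, dtype=float))[::-1]
    b = np.sort(np.asarray(b, dtype=float))[::-1]
    if abs(a.sum() - b.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(a) >= np.cumsum(b) - tol))

K = 8
c1 = np.ones(K)                           # (a) symmetric scenario
c2 = np.r_[1.5, np.ones(K - 2), 0.5]      # (b) one user moved, alpha = 0.5
c3 = np.r_[float(K), np.zeros(K - 1)]     # (c) all but one user at the edge
```

In the notation of Remark 4.3, `majorizes(c2, c1)`, `majorizes(c3, c2)`, and `majorizes(c3, c1)` all hold, i.e., c1 ≺ c2 ≺ c3.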
Remark 4.4. Note that the measure of user distribution and the measure of spatial correlation can be combined for, e.g., multiuser MIMO systems. Under the Kronecker model assumption from Section 4.1.1, the channel of a user k can be modeled for all 1 ≤ k ≤ K as

H_k = √(c_k) R_{R,k}^{1/2} W_k R_{T,k}^{1/2}

with normalized transmit and receive correlation matrices as well as a normalized random matrix, i.e., for all 1 ≤ k ≤ K, tr R_{T,k} = n_T, tr R_{R,k} = n_R, and E|[W_k]_{ij}|² = 1. Then, the long-term fading is captured by c_k, the spatial correlation by R_{T,k} and R_{R,k}, and the rich multipath environment by W_k. In the following subsections, some examples for the application of the measure of user distribution are provided.

4.2.2 Average Sum Rate in MAC
There are K mobile users who are going to transmit data to a base station. The flat-fading channels h_1, ..., h_K between the mobiles and the base are independent zero-mean complex Gaussian distributed (CN(0, c_k)). The additive white Gaussian noise n(t) at the base receiver has variance σ_n². Furthermore, we assume that the sum transmit power is constrained to be P_sum. The SNR is given by ρ = P_sum/σ_n². The received signal at the base is given by

y(t) = Σ_{k=1}^K h_k x_k(t) + n(t).    (4.40)
The statistics of the fading channel coefficients are completely characterized by their second moments, i.e., c_i = E(h_i^* h_i) for 1 ≤ i ≤ K. The transmit power directly corresponds to the variance of the transmit signals, p_i = E(x_i^* x_i) for 1 ≤ i ≤ K. It follows from the transmit power constraint that the l1-norm of the power allocation vector p = [p_1, ..., p_K] is constrained to be one, i.e., ||p|| = Σ_{k=1}^K p_k = P = 1. The average sum rate is a function of the SNR ρ and the fading variance distribution c_1, ..., c_K. The ergodic sum capacity of the SISO MAC with perfect CSI at the base and the mobiles is given by

C_pCSI(ρ, c) = E( log[1 + ρ max(c_1 w_1, ..., c_K w_K)] ).    (4.41)

The optimum transmission strategy is to allocate power only to the best user, i.e., the user with maximum c_i w_i.

Theorem 4.9. For perfect CSI at the mobiles, only the best user is allowed to transmit at any one time. The average sum rate in (4.41) is a Schur-convex function w.r.t. the fading variance vector c.

The proof follows from Theorem 2.17.

The ergodic sum capacity of the SISO MAC with perfect CSI at the base and long-term CSI at the mobiles in terms of c_1, ..., c_K is given by

C_cfCSI(ρ, c) = max_{Σ_{i=1}^K p_i = 1, p_i ≥ 0 ∀ 1 ≤ i ≤ K}  E( log( 1 + ρ Σ_{k=1}^K c_k p_k w_k ) ).    (4.42)

The solution to the programming problem in (4.42) can be found in [15]. Note that usually one major difference between the uplink and the downlink lies in the power constraints.

Theorem 4.10. For mobiles which know the fading variances and perfect CSI at the base, the average sum rate in (4.42) increases with less spread out fading variances c, i.e., the average sum rate in (4.42) is a Schur-convex function w.r.t. the fading variance vector c.

The proof follows from Theorem 2.18.
Finally, the ergodic sum capacity of the SISO MAC with perfect CSI at the base and no CSI at the mobiles is given by

C_noCSI(ρ, c) = E( log( 1 + ρ Σ_{k=1}^K c_k w_k ) ).    (4.43)
Theorem 4.11. For uninformed mobiles and perfect CSI at the base, the average sum rate in (4.43) increases with more spread out fading variances c, i.e., the average sum rate in (4.43) is a Schur-concave function w.r.t. the fading variance vector c.

The proof follows from Lemma 2.16.

In the downlink, the base station has a sum power constraint on all users' signals, whereas in the uplink individual power constraints are applied. In order to take intercell interference into account, additional sum power constraints occur also in the uplink [61]. This provides the motivation for the optimization problems in (4.41)-(4.43).

4.2.3 Average Sum Rate in BC
In the signal model, there are K mobile users who are going to receive data from one base station. The single-antenna quasi-static block flat-fading channels h_1, ..., h_K between the mobiles and the base are modeled as constant for a block of coherence length T and from block to block as zero-mean independent complex Gaussian distributed (CN(0, c_k)). The variance is c_k = E(h_k^* h_k) for 1 ≤ k ≤ K. The additive zero-mean white Gaussian noise n_k(t) at each receiver is iid and has variance σ_n². Furthermore, we assume that the sum transmit power is constrained to be P. The SNR is given by ρ = P/σ_n². The received signal at mobile k at time t is

y_k(t) = h_k Σ_{l=1}^K x_l(t) + n_k(t).
We omit the time index for convenience. The statistics of the fading channel coefficients hi are completely characterized by ci . The transmit power directly corresponds to the variance of the transmit signals
p_i = E(x_i^* x_i) for 1 ≤ i ≤ K. The l1-norm of the power allocation vector p = [p_1, ..., p_K] is constrained to be one, ||p|| = Σ_{k=1}^K p_k = P = 1. For 1 ≤ k ≤ K, define w_k by ||h_k||² = c_k w_k, i.e., the w_k are iid standard exponentially distributed random variables. We assume that the receivers have perfect CSI. Furthermore, we collect the channel states in a vector h = [h_1, ..., h_K]. The next lemma yields the average sum capacity and average sum rate expressions for the three CSI scenarios considered. The proofs can be found in [72].

Lemma 4.12. The sum capacity with perfect CSI at the base station is achieved by TDMA. The optimal power allocation is to transmit in the direction of the best user l with ||h_l||² ≥ ||h_k||² for all 1 ≤ k ≤ K, k ≠ l. The ergodic sum capacity is then given by

C_pCSI(ρ, c) = E( log[1 + ρ max(||h_1||², ..., ||h_K||²)] ).    (4.44)

The optimal transmit strategy to achieve the average sum capacity with long-term CSI is TDMA: only the user with the highest channel variance c_k is allowed to transmit. The achievable average sum capacity is given by

C_cCSI(ρ, c) = E( log(1 + ρ c_1 w_1) ).    (4.45)

For no CSI at the base, the most robust transmit strategy against the worst-case user distribution is equal power allocation, and the ergodic sum rate² is given by

C_noCSI(ρ, c) = E( log( 1 + ρ Σ_{k=1}^K c_k w_k ) ).    (4.46)
Next, let us characterize the impact of the spread of the fading variances on the ergodic sum capacity for the cases with perfect and covariance CSI, and on the ergodic sum rate for the case with no CSI at the base.

² Since the optimal transmit strategy for no CSI is motivated by a compound channel approach, we cannot speak of a sum capacity; instead we use the term sum rate.
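For the long-term-CSI capacity (4.45), the expectation E log(1 + ρ c_1 w_1) with w_1 standard exponential has the classical closed form e^{1/(ρc_1)} E_1(1/(ρc_1)), where E_1 is the exponential integral. A quick Monte Carlo cross-check of this identity (NumPy only; the values ρ = 1, c_1 = 2 are illustrative, and E_1 is evaluated numerically here):

```python
import numpy as np

rng = np.random.default_rng(1)

def E1(x):
    """Numerical exponential integral E1(x) = int_1^inf e^{-x t}/t dt
    (trapezoidal rule on a log grid; enough accuracy for this check)."""
    t = np.logspace(0, 4, 20001)
    f = np.exp(-x * t) / t
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

rho, c1 = 1.0, 2.0                 # illustrative SNR and strongest variance
a = 1.0 / (rho * c1)
closed = float(np.exp(a) * E1(a))  # e^{1/(rho c1)} E1(1/(rho c1))
mc = float(np.mean(np.log1p(rho * c1 * rng.exponential(size=500_000))))
```

Both values agree to Monte Carlo accuracy, confirming the closed form for (4.45).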
Theorem 4.13. Assume perfect CSI at the mobiles. For perfect CSI at the base, the ergodic sum capacity in (4.44) is a Schur-convex function w.r.t. the fading variance vector c. For a base which knows the fading variances, the ergodic sum capacity in (4.45) is a Schur-convex function w.r.t. c. For an uninformed base station, the ergodic sum rate in (4.46) is a Schur-concave function w.r.t. c.

The proof and illustrations can be found in [72]. The proof is based on Lemma 2.16 and Theorem 2.17.

4.2.4 Sum Rate Related Measures
Consider the instantaneous sum rate with scheduling policy p(h)

C(α) = C(h, ρ) = log( 1 + ρ Σ_{k=1}^K p_k(h) ||h_k||² ).    (4.47)
The instantaneous sum rate depends on the deterministic SNR and on the channel, which is a random variable. That means the instantaneous sum rate is also a random variable (indicated by α). In the block-fading model, the channel is constant for the coherence time T. It is assumed that the coherence time T is large enough to allow coding that comes close to the mutual information. Then the mutual information in (4.47) has its usual meaning as the instantaneous capacity [10]. Since the scheduling policy depends on the channel state, it can also vary randomly from fading block to fading block. As a result, the instantaneous capacity itself is a random variable and has a pdf p_C(α). The average of this random variable,

E[C(α)] = ∫_0^∞ α p_C(α) dα,

is the average sum rate. For single-user systems with perfect CSI, it is called the ergodic capacity [95]. In multiuser systems with perfect CSI we can call it the ergodic sum capacity, and it describes the overall performance
of the system on average. For a finer analysis, the cdf of C is important. It gives the outage probability of the channel, i.e.,

Pr[C(α) < R] = ∫_0^R p_C(α) dα.
The outage probability gives the probability that a certain sum rate R cannot be achieved for a channel state. That the system is in an outage means that we cannot guarantee the successful delivery of any information at the sum rate R during this channel state. The feasible delay can be exploited either by increasing the length of one codeword or by introducing some kind of automatic repeat request (ARQ). If the block length of the codeword is increased, the outage probability for this codeword is reduced. Here, following [2], we consider the "Maximum Zero-Outage Throughput": the receiver requests retransmissions as long as outages occur, until the codeword is successfully decoded. Therefore, the complete information is reliably transmitted. The maximum throughput for this simple retransmission scheme is given by

T(SNR) = max_{R≥0} R (1 − Pr[C(h, SNR) ≤ R]).    (4.48)

In [2] the quantity in (4.48) is called the "Maximum Zero-Outage Throughput" (compare also [7]). For the delay-constrained analysis, the ergodic capacity as well as the outage probability and the maximum throughput are not suitable: these approaches do not guarantee the successful transmission of information in a finite number of blocks. Therefore, we restrict the delay to one fading block and fix the outage probability to some ε, i.e., Pr[C(α) < R] = ε. Then we solve this for R. In order to avoid outages altogether, we set ε = 0 and obtain the zero-outage sum rate, or delay-limited sum rate R* with Pr[C(α) < R*] = 0.

4.2.4.1 Outage Sum Rate
If the users are not equally distributed and this information is not available at the base station, it can be shown by a compound channel approach that equal power allocation across all users is optimal. Furthermore, the impact of the user distribution on the outage sum rate is characterized in the following theorem.

Theorem 4.14. Assume that the base station is uninformed and the user distribution is according to c. For fixed transmission rate R and SNR ρ < (2^R − 1)/2, the sum outage probability is a Schur-concave function of the user distribution c_1, ..., c_K, i.e., a less equal distribution of users decreases the sum outage probability. For SNR ρ > 2^R − 1, the sum outage probability is a Schur-convex function of the user distribution c_1, ..., c_K, i.e., a less equal distribution of users increases the sum outage probability.

The proof follows from Theorem 2.19.

Theorem 4.15. With perfect CSI at the base, the optimal scheduling is TDMA and the outage probability is given by

Pr[ max(||h_1||², ..., ||h_K||²) ≤ z ],    (4.49)

where z = (2^R − 1)/ρ. For fixed sum rate R and SNR

ρ ≤ ξ̂ = (2^R − 1)/K    (4.50)

the sum outage probability is Schur-concave with respect to c, and for SNR

ρ ≥ ξ̄ = (2^R − 1)/min_{1≤k≤K} c_k    (4.51)

the sum outage probability is Schur-convex.

Proof. In order to verify Schur's condition, note that the outage probability can be written as

P_out^pCSI(c) = Π_{k=1}^K (1 − exp(−z/c_k)),
which is obviously a symmetric function with respect to c. The difference of the first derivatives of P_out^pCSI(c) with respect to c_1 and c_2 is given by

Δ(c) = Π_{k=3}^K (1 − e^{−z/c_k}) · e^{−z/c_1 − z/c_2} · (z / (c_1² c_2²)) · [ (1 − e^{−z/c_1}) e^{z/c_1} c_1² − (1 − e^{−z/c_2}) e^{z/c_2} c_2² ]
     = Π_{k=1}^K (1 − e^{−z/c_k}) · ( g(z, c_1) − g(z, c_2) )

with g(z, c) = −(z/c²) exp(−z/c) / (1 − exp(−z/c)). The sign of Δ(c) depends on the monotonicity of the function g with respect to c. The sign of the first derivative of g with respect to c depends on

sign[ (1 − exp(−z/c)) (c − z) ].

If z > max c_k, the first derivative of g with respect to c is negative and the outage probability is Schur-concave. If z < min c_k, the first derivative of g with respect to c is positive and the outage probability is Schur-convex. Since max c_k ≤ K, the inequalities in (4.50) and (4.51) follow.

4.2.4.2 Further Sum Rate Performance Measures
If stringent delay requirements are relaxed and an arbitrary delay is allowed, the delay can be exploited either by increasing the length of one codeword or by introducing some kind of ARQ. If the block length of the codeword is increased, the outage probability for this codeword is reduced. Here, following [2], we focus on the "Maximum Zero-Outage Throughput." The receiver requests retransmissions as long as outages occur, until the codeword is successfully decoded. Therefore, the complete information is reliably transmitted. The probability that a codeword has to be transmitted s times is given by

Pr[ ⋂_{i=1}^{s−1} out_i ] · ( 1 − Pr[ out_s | ⋂_{i=1}^{s−1} out_i ] ),    (4.52)
where out_s means an outage in retransmission s. If the receiver considers only the actual packet for its decision, the probability of s transmissions is given by

Π_{k=1}^{s−1} Pr[kth outage] · ( 1 − Pr[sth outage] )    (4.53)

under the assumption of an independent block-fading channel. The maximum throughput is then defined to be

T(ρ, P) = sup_R  R / E[S]

with the average service time E[S]. Using (4.53), the maximum throughput for this simple retransmission scheme is given by

T^MZT(ρ, P) = max_{R≥0} R ( 1 − Pr[I(ρ, P, h) ≤ R] ).    (4.54)
In [2] the quantity in (4.54) is called the "Maximum Zero-Outage Throughput." Here, the transmission of a certain amount of data is not guaranteed within a limited delay. The measure in (4.54) stands in contrast to the delay-limited capacity in the last section. There are channel realizations in which more bits are reliably transmitted than T^MZT(ρ, P) and there are realizations in which fewer bits are transmitted. The probability that T^MZT(ρ, P) bits (out of R bits) are transmitted without errors is given by Pr[I(ρ, P, h) ≤ T^MZT(ρ, P)]. The connection between the delay-limited capacity and the maximum throughput becomes clear for rates that are smaller than the delay-limited capacity. In this case, the optimal power allocation which minimizes the outage probability corresponds to the optimal power allocation in the delay-limited case. Therefore, the achievable rates are equal, too. Even in the simplest setting, the optimization problem in (4.54) leads to a complicated solution containing again the Lambert W function [7]. Table 4.1 shows the maximum throughput for equally distributed users c = 1 for perfect CSI and no CSI. The multiuser diversity that stems from the exclusive scheduling of the best user can clearly be observed.
Table 4.1 Scaling of the maximum throughput with perfect and no CSI at the base at SNR 0 dB.

CSI\M      1       2       3       5       10      20
pCSI     0.264   0.523   0.753   1.132   1.785   2.5265
noCSI    0.264   0.417   0.521   0.664   0.861   1.0521
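The single-user entry of Table 4.1 can be reproduced from (4.54). For one Rayleigh user with rates measured in nats (an assumption on our part that matches the tabulated value), Pr[log(1 + ρw) ≤ R] = 1 − e^{−(e^R − 1)/ρ}, and the maximizing rate is R* = W(1) ≈ 0.567, the Lambert W function mentioned above. A grid-search sketch (NumPy; our own check, not from the text):

```python
import numpy as np

rho = 1.0                                    # SNR 0 dB
R = np.linspace(1e-4, 4.0, 40001)            # candidate rates (nats)
p_out = -np.expm1(-np.expm1(R) / rho)        # Pr[log(1 + rho*w) <= R]
vals = R * (1.0 - p_out)                     # objective of (4.54)
R_star, T0 = float(R[np.argmax(vals)]), float(vals.max())
```

The grid search gives R_star ≈ W(1) ≈ 0.567 and T0 ≈ 0.264, matching the M = 1 column of Table 4.1 (where the perfect-CSI and no-CSI entries coincide, as they must for a single user).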
Next, the strict zero-outage or delay-limited sum rate is studied. The delay-limited sum rate R_0 is defined by

Pr[ p(h) max_{1≤k≤K} ||h_k||² ≤ z ] = 0.    (4.55)

The delay-limited sum rate R_0 is given by

R_0 = log( 1 + ρ/β )

with β = E[ 1 / max_{1≤k≤K} ||h_k||² ]. The expectation as a function of the user distribution c simplifies to

β(c) = ∫_0^∞ (1/t²) Π_{k=1}^K ( 1 − exp(−t/c_k) ) dt.

Corollary 4.7. The delay-limited sum rate R_0 is Schur-concave with respect to c, i.e., the more balanced the user distribution is, the higher is R_0.

Proof. The proof is an application of Theorem 2.17. The function β(c) is the expectation of the convex and monotonically decreasing function f(x) = 1/x applied to the maximum of a set of iid weighted standard exponential random variables. The delay-limited sum rate R_0 depends on the inverse of β; therefore it is Schur-concave whenever β(c) is Schur-convex. It only remains to verify the technical conditions of the theorem. Note that P(x) = Π_{k=1}^n (1 − exp(−x/μ_k)) and f(x) = 1/x. The first limit exists because

lim_{x→0} (1/x)(1 − exp(−x)) = 1.

The second limit exists because

lim_{x→∞} (1 − exp(−x)) = 1.
Finally, the third limit also exists because

∫_0^∞ (1/x²) (1 − exp(−x))² dx = 2 log(2).
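The quantity β(c) and its Schur-convexity can be checked numerically: the integral form of β(c) should agree with a direct Monte Carlo estimate of E[1/max_k c_k w_k], and a more unequal variance vector should increase β, hence decrease R_0. A sketch (NumPy; the variance vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def beta_integral(c):
    """Numerical beta(c) = int_0^inf (1/t^2) prod_k (1 - e^{-t/c_k}) dt
    via the trapezoidal rule on a log grid."""
    t = np.logspace(-4, 4, 8001)
    f = np.prod(-np.expm1(-t[:, None] / np.asarray(c, dtype=float)), axis=1) / t**2
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def beta_mc(c, n=400_000):
    """Monte Carlo of beta = E[1 / max_k c_k w_k], w_k iid Exp(1)."""
    w = rng.exponential(size=(n, len(c)))
    return float(np.mean(1.0 / np.max(np.asarray(c) * w, axis=1)))

c_sym, c_spread = [1.0, 1.0, 1.0], [2.0, 0.7, 0.3]   # c_spread majorizes c_sym
```

For three balanced users the integral evaluates to 3 log(4/3) ≈ 0.863, and the unequal vector yields a strictly larger β, as Schur-convexity of β predicts.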
Remark 4.5. From the proof of Corollary 4.7, it follows that there must be at least two users in the cell; otherwise the delay-limited sum rate is zero.
5 Application of Matrix-Monotone Functions in Wireless Communications
5.1
Generalized Multiple Antenna Performance Measures
Multiple antennas can improve the spectral efficiency and reliability of wireless communication systems. In recent years, it was discovered that MIMO systems have the ability to reach higher transmission rates than one-sided array links [137, 158]. First, we review recent results for MISO systems since this case has recently gained much attention. In [157], the potential of multiple antenna systems was pointed out. The capacity of a MISO system with imperfect feedback was first analyzed in [149] and [99, 100]. In [55, 63], the optimum transmission strategy with covariance knowledge at the transmit array with respect to the ergodic capacity was analyzed. In [21, 111, 121], the downlink beamforming problem in MISO systems was solved. In [54], the ergodic capacity in the non-coherent transmission scenario with only covariance knowledge at the transmitter and the receiver is studied. Many results regarding the capacity of MISO and MIMO systems under different levels of CSI and the corresponding transmission strategies have recently been published [40]. It has been shown that even partial CSI at the transmitter can increase the capacity of a MISO system. Recently, transmission schemes
for optimizing capacity in MISO mean-feedback and covariance-feedback systems were derived in [149, 99]. The capacity can be achieved by Gaussian distributed transmit signals with a particular covariance matrix. In a block-fading model, the general signal processing structure which achieves capacity independent of the type of CSI consists of a Gaussian codebook, a number of beamformers, and a power allocation entity [9, 149]. Additionally, it was proved that the optimal transmit covariance matrix in the covariance feedback case has the same eigenvectors as the known channel covariance matrix. The complete characterization of the impact of correlation on the ergodic capacity in MISO systems can be found in [68]. In addition to the capacity, other performance metrics like the MMSE were analyzed in the literature, e.g., [117, 120]. The multiuser MIMO system is optimized with respect to the sum MSE in [129] and with per-user MMSE requirements in [128]. The analysis and design methodology of single-antenna and beamforming systems was extended and generalized to multi-antenna systems. Many novel approaches and techniques were developed, and an unmanageable bulk of papers, reports, and books was produced. However, some ideas recur as persistent concepts in many works. The main goal of this section is to identify these main concepts and express them on a meta-level by constructing a unified framework. In order not to have different statements for different performance metrics, we present a unifying framework in which a class of functions serves as the performance metric. The underlying mathematical structure is described by Löwner's representation of matrix-monotone functions. Let us start with some motivating and illustrative examples.

5.1.1 Examples in Single-User MIMO Systems
Consider the quasi-static block flat-fading MIMO system in Figure 5.1. The transmit signals are complex Gaussian distributed random vectors with zero mean and transmit covariance matrix Q. The transmitter structure that corresponds to this type of signaling is described as follows. The transmit covariance matrix is given by Q = E[x x^H].
Fig. 5.1 Single-user MIMO system.
Using the eigenvalue decomposition Q = U_Q Λ_Q U_Q^H, it becomes obvious how one can construct a particular transmit covariance matrix. The input data stream d(k) is split into m parallel data streams d_1(k), ..., d_m(k). Each parallel data stream is multiplied by a factor √p_1, ..., √p_m and then weighted by a beamforming vector u_1, ..., u_m, respectively. The number of parallel data streams is less than or equal to the number of transmit antennas (m ≤ n_T). The beamforming vectors have size 1 × n_T, with n_T the number of transmit antennas. The n_T signals of the weighted data streams x^i(k) = d_i(k) · √p_i · u_i are added up, x(k) = Σ_{i=1}^m x^i(k), and sent. Omitting the time index k for convenience, we obtain in front of the transmit antennas

x = Σ_{l=1}^m d_l · √p_l · u_l.    (5.1)

The transmit signal x has a covariance matrix Q with eigenvalues p_1, ..., p_m, 0, ..., 0 and eigenvectors u_1, ..., u_m. In order to construct a transmit signal with a given covariance matrix, two signal processing steps are necessary: the power control p_1, ..., p_m and the beamformers u_1, ..., u_m. The sum transmit power Σ_{k=1}^{n_T} p_k is constrained, i.e., Σ_{k=1}^{n_T} p_k = P.

The first performance measure is the ergodic capacity [10], i.e., the rate that can be transmitted reliably over ergodically (infinitely) many channel realizations by codes with very long (infinite) block length.¹

¹ Since we fix the transmit covariance matrix, the term capacity could cause confusion. Usually, the capacity is the ultimate rate that is achieved by optimization of the transmitter including the covariance matrix. However, we think of the linear precoding matrix Q as fixed and talk about the capacity for this fixed precoding Q.
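The two signal processing steps described above, power control and beamforming, determine the transmit covariance matrix as Q = Σ_l p_l u_l u_l^H. A small construction sketch (NumPy; the sizes n_T = 4, m = 2 and the power split are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)

nT, m, P = 4, 2, 1.0                    # hypothetical antenna/stream counts
A = rng.standard_normal((nT, m)) + 1j * rng.standard_normal((nT, m))
U, _ = np.linalg.qr(A)                  # orthonormal beamformers u_1, ..., u_m
p = np.array([0.7, 0.3]) * P            # power control, sums to P
Q = (U * p) @ U.conj().T                # Q = sum_l p_l u_l u_l^H
```

By construction, Q is Hermitian with eigenvalues p_1, ..., p_m, 0, ..., 0, eigenvectors u_1, ..., u_m, and tr Q = P, exactly as stated above.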
For the system in Figure 5.1 with n_T transmit and n_R receive antennas and n = min(n_T, n_R), m = max(n_T, n_R), it is given by

C(ρ, Q) = E_H log det( I + ρ H Q H^H )
        = E_H log Π_{k=1}^n ( 1 + ρ λ_k(HQH^H) )
        = E_H Σ_{k=1}^n log( 1 + ρ λ_k(HQH^H) )
        = E_H tr log( I + ρ H Q H^H )    (5.2)

with SNR ρ = 1/σ_n², channel matrix H, and the expectation taken with respect to H. The channel matrix is zero-mean iid Rayleigh distributed. A very important property that has been used many times is the invariance of the channel statistics under left and right multiplication of H by a unitary matrix U [94]. Furthermore, the function in (5.2) is obviously monotonically increasing in ρ, in tr Q for fixed normalized Q/tr Q, and also in Q, i.e., the function inside the trace is matrix-monotone with respect to Q. Moreover, the function is concave with respect to Q.

Next, consider the slightly modified system in Figure 5.2. In addition to the additive white Gaussian noise, another additive noise with a colored covariance matrix is present. It can correspond to intra- or inter-cell interference or to any other type of jamming signal.
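The chain of identities in (5.2), log-determinant equals the sum of log-eigenvalues equals the trace of the matrix logarithm, is easy to confirm for a single channel draw. A sketch (NumPy; the sizes and SNR are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
nT = nR = 4
rho = 2.0
H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
Q = np.eye(nT) / nT                            # uniform power allocation, tr Q = 1

A = rho * H @ Q @ H.conj().T                   # rho * H Q H^H, Hermitian PSD
c_logdet = float(np.linalg.slogdet(np.eye(nR) + A)[1])
lam = np.linalg.eigvalsh((A + A.conj().T) / 2) # eigenvalues lambda_k
c_eigsum = float(np.sum(np.log1p(lam)))        # sum_k log(1 + lambda_k)
```

The two values coincide to machine precision, since tr log(I + A) is exactly the sum of log(1 + λ_k(A)).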
Fig. 5.2 Single-user MIMO system with interference.
The ergodic capacity for the system in Figure 5.2 is given by

C(ρ, Q, Z) = E[ log det( (1/ρ)I + Z + HQH^H ) − log det( (1/ρ)I + Z ) ]
           = E tr log( I + ρ Z̃^{−1/2} HQH^H Z̃^{−1/2} )    (5.3)

with Z̃ = I + ρZ. In (5.3), Z̃ is the positive definite (normalized) noise-plus-interference matrix. Note that the function in (5.3) is concave in Q and convex in Z.

Next, consider a completely different performance measure, the normalized minimum mean-square error (MMSE). The linear MMSE receiver reduces the computational complexity at the receiver side. The MSE can be evaluated for each fading state and each symbol. The average MSE describes the quality of the data transmission. If we apply the linear MMSE receiver, the performance metric changes from the average mutual information to the normalized MSE [150]. In general, the Wiener filter for a linear system y = ax + n can be described as w = R_xy R_yy^{−1}. The reason why we speak about the normalized MMSE is that there are two ways of deriving the MMSE. In the first case, the actual transmit signal x is considered; in the second case, the source signal before linear precoding is considered. In the first case, the resulting weight w or the resulting MSE expression must be normalized by the transmit covariance matrix. However, both approaches lead to the same result. We consider the first approach: the linear MMSE receiver weights the received signal vector y by the Wiener filter

x̂ = ρ Q H^H ( Z̃ + ρ HQH^H )^{−1} y.    (5.4)

The covariance matrix of the estimation error is given by

R = E[ (x̂ − x)(x̂ − x)^H ] = Q − ρ Q H^H ( Z̃ + ρ HQH^H )^{−1} H Q.    (5.5)
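Expression (5.5) can be cross-checked against the equivalent inverse form R^{−1} = Q^{−1} + ρ H^H Z̃^{−1} H obtained from the matrix inversion lemma (the same manipulation used in Section 5.1.2.2, here with ρ kept explicit; the sizes below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
nT = nR = 3
rho = 1.5
inv = np.linalg.inv

H = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
Q = np.eye(nT) / nT
B = rng.standard_normal((nR, nR)) + 1j * rng.standard_normal((nR, nR))
Zt = B @ B.conj().T + np.eye(nR)       # positive definite noise-plus-interference

# (5.5): direct error covariance of the linear MMSE estimate
R_err = Q - rho * Q @ H.conj().T @ inv(Zt + rho * H @ Q @ H.conj().T) @ H @ Q
# equivalent form via the matrix inversion lemma
R_mil = inv(inv(Q) + rho * H.conj().T @ inv(Zt) @ H)
```

Both matrices agree to machine precision for any positive definite Z̃ and invertible Q.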
The average normalized sum MSE is defined as the trace of the error covariance matrix in (5.5) [49, 65]:

MSE(σ_n², Q, Z) = E_H tr( Q^{−1/2} R Q^{−1/2} )
                = n_T − E_H tr[ ρ HQH^H ( Z̃ + ρ HQH^H )^{−1} ]
                = n_T − E_H tr[ ρ Z̃^{−1/2} HQH^H Z̃^{−1/2} ( I + ρ Z̃^{−1/2} HQH^H Z̃^{−1/2} )^{−1} ]    (5.6)

and this average over the channel realizations is called the average sum MSE. Note that the MSE is convex in Q and concave in Z. In SISO systems, the relationship between the rate, the SNR, and the MSE is quite simple, i.e., C = log(1 + SNR) = log(1/MMSE). In MIMO systems, the connection is more complicated due to the spatial dimension; e.g., the pairwise error probability (PEP) between X and X̂ can be upper bounded by P(X → X̂) ≤ exp(−ρ ||H(X − X̂)||²). The connection between the performance measures mutual information and MSE is highlighted in the next subsection.

5.1.2 Relationship between MSE and Mutual Information
There are at least three connections between the MSE and the capacity that provide intuitive insights into the meaning of these performance metrics. The first is a function theoretic relationship, the second is a direct relationship by linear algebra, and the third is an analytical relationship following from the second one.
5.1.2.1
Function Theoretic Relationship
Let us compare the capacity and the normalized sum MSE. First, we rewrite the capacity as

C = tr log( I + ρ Z^{−1/2} HQH^H Z^{−1/2} ).

The capacity expression is the trace of a matrix-valued function. Let us denote this matrix-valued function by Φ₁(X) = log(I + X). Then the
capacity can be written as

C = tr Φ₁( ρ Z^{−1/2} HQH^H Z^{−1/2} ).    (5.7)

Let us turn to the sum MSE expression. It can be written using the matrix-valued function Φ₂(X) = I − X[I + X]^{−1} = [I + X]^{−1} as

MSE = tr Φ₂( ρ Z^{−1/2} HQH^H Z^{−1/2} ).    (5.8)

This leads directly to the next theorem.

Theorem 5.1. Both performance metrics, the instantaneous mutual information and the MSE, can be written in the generalized form

Φ(Z, Q, H) = tr φ( ρ Z^{−1/2} HQH^H Z^{−1/2} )    (5.9)

with a matrix-monotone function φ.

Proof. It has already been shown in Section 3.2 that log(1 + x) and 1/(1 + x) are matrix-monotone functions.

5.1.2.2
Linear Algebraic Relationship
The normalized MSE matrix is defined in (5.5). Denote the receive covariance matrix R_y = E[yy^H] = Z + HQH^H and the cross-covariance matrix R_xy = E[xy^H] = QH^H. Computing the inverse of the error covariance matrix R in (5.5) using the Matrix Inversion Lemma in (6.1), we obtain [132]

R^{−1} = [ Q − R_xy R_y^{−1} R_xy^H ]^{−1}
       = Q^{−1} + Q^{−1} R_xy [ R_y − R_xy^H Q^{−1} R_xy ]^{−1} R_xy^H Q^{−1}
       = Q^{−1} + H^H Z^{−1} H.

The capacity can be written as

C = log det( I + Z^{−1} HQH^H ) = log det( I + H^H Z^{−1} H Q )
  = log det Q + log det( Q^{−1} + H^H Z^{−1} H ) = log( det Q / det R ).
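The linear algebraic identity C = log(det Q / det R) can be verified numerically for a random channel (a sketch with illustrative sizes; not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
inv, slogdet = np.linalg.inv, np.linalg.slogdet

H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
Q = np.diag([0.5, 0.3, 0.2]).astype(complex)   # example transmit covariance
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Z = B @ B.conj().T + np.eye(n)                 # positive definite noise covariance

R = inv(inv(Q) + H.conj().T @ inv(Z) @ H)      # R^{-1} = Q^{-1} + H^H Z^{-1} H
lhs = float(slogdet(np.eye(n) + inv(Z) @ H @ Q @ H.conj().T)[1])
rhs = float(slogdet(Q)[1] - slogdet(R)[1])     # log(det Q / det R)
```

The determinant identity det(I + Z^{−1}HQH^H) = det Q · det(Q^{−1} + H^H Z^{−1} H) makes the two quantities equal to machine precision.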
5.1.2.3
Analytical Relationship
It is easily observed that, by taking the first derivative of the capacity with respect to ρ, the following expression closely related to the normalized sum MSE is obtained:

∂C(ρ)/∂ρ = ∂/∂ρ log det( Z + ρ HQH^H ) = tr[ Q^{1/2} H^H ( Z + ρ HQH^H )^{−1} H Q^{1/2} ].

This observation is generalized in the following important result. Note that the MMSE estimator in [43] estimates Hx instead of only x. Therefore, the expression differs from the MMSE in (5.6).

Theorem 5.2 (Theorem 2 in [43]). Denote the MMSE as mmse(ρ) = E||Hx − H x̂||², where x̂ is the conditional mean estimate. Then, with independent AWGN vector n,

(d/dρ) I( x; √ρ Hx + n ) = mmse(ρ).    (5.10)
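The derivative identity above can be confirmed by finite differences: d/dρ log det(Z + ρHQH^H) = tr[HQH^H (Z + ρHQH^H)^{−1}], which equals the trace expression with Q^{1/2} factored out by cyclic invariance of the trace. A numerical sketch (NumPy; illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
rho = 1.0
H = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
Q = np.eye(n) / n
Z = np.eye(n)
A = H @ Q @ H.conj().T                         # H Q H^H

def cap(r):
    # log det(Z + r * H Q H^H)
    return float(np.linalg.slogdet(Z + r * A)[1])

h = 1e-6
numeric = (cap(rho + h) - cap(rho - h)) / (2 * h)   # central finite difference
analytic = float(np.trace(A @ np.linalg.inv(Z + rho * A)).real)
```

The finite-difference slope matches the analytic trace expression to within the discretization error.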
Equation (5.10) holds for all random variables x with finite variance. Note that the gradient of the mutual information with respect to different parameters in linear vector Gaussian channels is studied in [108].

5.1.3 Examples in Multi-User MIMO Systems

5.1.3.1 Introduction and Motivation
In this subsection an overview of the recent results are given. It is impossible to collect and refer to all relevant work in the multiuser multi-antenna research area. Therefore, only a fraction of the interesting work is cited and the interested reader is referred to the references in these papers to obtain more information. In multiuser scenarios, the application of new services which create elastic traffic is on the rise. Therefore, it is necessary, to study the complete point-to-multi-point (downlink) and multi-point-to-point
(uplink) transmission system. The MIMO MAC model appears in the uplink transmission from multiple users to the multi-antenna base station. Each user is equipped with multiple antennas. One important performance metric analyzed is the throughput of such multiple-user systems under either individual power constraints in the uplink or a sum power constraint for the downlink transmission. In [112], the ergodic sum capacity of a multi-antenna Gaussian MAC is defined, and the impact of the number of transmit and receive antennas, as well as the number of users, is analyzed. The special case in which covariance information is only available at the mobiles is considered in [56]. In [113], the ergodic capacity, the ergodic sum rate region, the outage capacity, and algorithms for the vector MAC are analyzed. In [13, 164], the authors maximize the ergodic sum capacity of the MIMO MAC over the transmit covariance matrices for fixed individual power constraints. It was shown that the optimal transmit covariance matrices are characterized by an iterative water-filling solution which treats the other users as noise under individual power constraints. The system model dual to the MIMO MAC is the multiuser MIMO downlink transmission. It leads to the BC [31, 32], the multi-antenna Gaussian non-degraded BC [26], or the vector broadcast channel [163]. Recently, the sum capacity and achievable region of the multiuser MIMO BC were studied in [147, 146] and [152]. An upper bound on the capacity region of the BC was derived in [119]. The bound is found by computing the capacity of the cooperative system under worst case noise. The structure of the worst case noise for the MIMO BC is analyzed in [161]. In [22], [140], [146], and [162], the duality between the multiuser uplink and downlink channel was studied. It was shown that the achievable capacity region of the downlink transmission coincides with the capacity region of the uplink.
In addition to this, the maximum sum rate point on the envelope of the capacity region can be characterized by the capacity of the equivalent cooperative MIMO system with the worst case noise [119]. The capacity region of the non-degraded Gaussian BC is not an open research problem anymore. In [148], the authors show that all points below their proposed upper bound are achievable under the
assumption that Gaussian code books are optimal. This assumption has been verified in [153], and it has been shown that the capacity region of the Gaussian MIMO BC equals that of the Gaussian MIMO MAC. In addition to the throughput, another relevant performance metric is the MMSE in the uplink transmission when the base station applies a linear MMSE receiver. The resulting sum MSE has been studied in [67]. Both sum performance measures are discussed individually in detail in [60]. Some preliminary results can also be found in [59, 126]. The solution to the sum transmit power minimization with per-user MMSE requirements is provided in [123, 127, 128].
5.1.3.2 MIMO MAC
Consider the MIMO MAC. The communication channel between each user and the base station is modeled by a block-flat fading MIMO channel. We have K mobiles with $n_T$ antennas each. We can easily extend the results to the case in which every mobile has a different number of transmit antennas. The base station owns $n_R$ receive antennas. In the discrete-time model, the received vector y at the base station at any one time can be described by
$$ y = \sum_{k=1}^{K} H_k x_k + n \qquad (5.11) $$
with the receiver noise $n \in \mathbb{C}^{n_R \times 1}$ which is AWGN, flat-fading channel matrices $H_k \in \mathbb{C}^{n_R \times n_T}$, and transmit signals $x_k \in \mathbb{C}^{n_T \times 1}$. We assume uncorrelated noise with covariance $\sigma_n^2 I_{n_R}$. The inverse noise power is denoted by $\rho = \frac{1}{\sigma_n^2}$. Equation (5.11) can be rewritten in compact form as
$$ y = Hx + n \qquad (5.12) $$
with $H = [H_1, H_2, \ldots, H_K]$ and $x = [x_1^T, \ldots, x_K^T]^T$. We collect the transmit covariance matrices in a large block-diagonal matrix $Q = \mathrm{Diag}(Q_1, Q_2, \ldots, Q_K)$. The sum capacity of the MIMO MAC with SIC applied at the base station is given by [163]
$$ C(Q, H, \rho) = \log\det\left( I + \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right) = \operatorname{tr} \log\left( I + \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right). \qquad (5.13) $$
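The identity $\log\det = \operatorname{tr}\log$ used in (5.13) can be checked numerically; the sketch below applies the scalar logarithm to the eigenvalues of the Hermitian argument and uses hypothetical random channels and covariance matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
K, n_r, n_t, rho = 3, 4, 2, 0.8

Hs = [rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
      for _ in range(K)]
Qs = []
for _ in range(K):
    A = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    Qs.append(A @ A.conj().T)           # random PSD transmit covariances

S = np.eye(n_r) + rho * sum(H @ Q @ H.conj().T for H, Q in zip(Hs, Qs))

C_logdet = np.log(np.linalg.det(S)).real
# tr log: apply the scalar log to the eigenvalues of the Hermitian matrix S.
C_trlog = np.sum(np.log(np.linalg.eigvalsh(S)))

assert np.isclose(C_logdet, C_trlog)
```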
Next, consider a linear multiuser MMSE receiver and follow the derivation in the single-user case closely (see Section 5.1.1). Define the normalized MSE as
$$ \mathrm{MSE}(Q, H, \rho) = \operatorname{tr}\left( Q^{-1/2} K Q^{-1/2} \right) = K n_T - \sum_{i=1}^{n_R} \frac{\mu_i}{\sigma_n^2 + \mu_i} = K n_T - n_R + \sigma_n^2 \sum_{i=1}^{n_R} \frac{1}{\sigma_n^2 + \mu_i} = K n_T - n_R + \sigma_n^2 \operatorname{tr} A^{-1} \qquad (5.14) $$
with $\mu_i$ as the eigenvalues of $HQH^H$ and the matrix A as
$$ A = \sigma_n^2 I + \sum_{k=1}^{K} H_k Q_k H_k^H. \qquad (5.15) $$
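The chain of equalities in (5.14) is a purely algebraic consequence of $\mu/(\sigma^2+\mu) = 1 - \sigma^2/(\sigma^2+\mu)$; a small NumPy check with hypothetical test data:

```python
import numpy as np

rng = np.random.default_rng(2)
K, n_r, n_t, sigma2 = 2, 5, 2, 0.5

Hs = [rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
      for _ in range(K)]
Qs = []
for _ in range(K):
    A = rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))
    Qs.append(A @ A.conj().T)

G = sum(H @ Q @ H.conj().T for H, Q in zip(Hs, Qs))   # H Q H^H
mu = np.linalg.eigvalsh(G)                            # its eigenvalues mu_i
A_mat = sigma2 * np.eye(n_r) + G                      # matrix A in (5.15)

# Left and right ends of the chain in (5.14):
mse1 = K * n_t - np.sum(mu / (sigma2 + mu))
mse2 = K * n_t - n_r + sigma2 * np.trace(np.linalg.inv(A_mat)).real

assert np.isclose(mse1, mse2)
```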
The MSE is reduced by minimizing the sum on the RHS of (5.14). It is worth mentioning that the term $\sum_{i=1}^{n_R} \frac{1}{\sigma_n^2 + \mu_i}$ is a Schur-convex function with respect to the $\mu_i$ [92]. Therefore, the term is minimized if all eigenvalues $\mu_i$ are equal.

5.1.3.3 MIMO BC
Next, we study the downlink transmission from the base station to the mobiles. The base station is equipped with $n_T$ transmit antennas and each mobile has $n_R$ antennas. The channel matrices in the downlink transmission correspond to the Hermitian channel matrices from the uplink, i.e., $H_i^{dl} = H_i^H$ (reciprocity). The received vector $y_k$ at each mobile k can be written as
$$ y_k = H_k^H x_k + \sum_{l=1,\, l \neq k}^{K} H_k^H x_l + n_k \qquad (5.16) $$
with the flat-fading channel matrices $H_k$, the AWGN receiver noises $n_k$, and the transmit signal $x_k$ which is intended for mobile k. The noise at the mobiles is assumed uncorrelated and independent identically distributed. In Equation (5.16), the first term is the signal for user k, the second term is the interference from the signals for the other users, and the last term is the noise. The sum capacity of the MIMO BC with Costa Precoding at the base station is equal to the sum capacity of the MIMO MAC, and it is given by [147]
$$ C(Q, H, \rho) = \log\det\left( I + \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right). \qquad (5.17) $$
5.1.4 Generalized Sum Performance Measure
Both sum performance measures, (5.13) and (5.14) for the uplink and (5.17) for the downlink, can be written as
$$ \Phi(Q, H, \rho) = \operatorname{tr} \phi\left( \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right). \qquad (5.18) $$
The function in (5.18) is jointly concave in the set of transmit covariance matrices $Q_1, \ldots, Q_K$. This follows from the fact that the function depends on the weighted sum of covariance matrices:
$$ \operatorname{tr} \phi\left( \sum_{k=1}^{K} H_k \left( (1-\lambda) \tilde{Q}_k^{(1)} + \lambda \tilde{Q}_k^{(2)} \right) H_k^H \right) = \operatorname{tr} \phi\left( (1-\lambda) \sum_{k=1}^{K} H_k \tilde{Q}_k^{(1)} H_k^H + \lambda \sum_{k=1}^{K} H_k \tilde{Q}_k^{(2)} H_k^H \right) = \operatorname{tr} \phi\left( (1-\lambda) A + \lambda B \right) \qquad (5.19) $$
with $0 \leq \lambda \leq 1$ and fixed A and B. The function is concave, since φ is matrix-monotone.
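The claimed concavity can be probed numerically for the example choice $\phi(x) = \log(1+x)$: along any segment between two tuples of covariance matrices, the objective dominates the chord. A sketch with hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(3)
K, n_r, n_t, rho = 2, 3, 3, 1.0

def rand_psd(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T

Hs = [rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
      for _ in range(K)]
Q1 = [rand_psd(n_t) for _ in range(K)]   # first tuple of covariances
Q2 = [rand_psd(n_t) for _ in range(K)]   # second tuple

def phi_trace(Qs):
    # tr log(I + rho * sum_k H_k Q_k H_k^H), i.e. phi(x) = log(1 + x).
    S = np.eye(n_r) + rho * sum(H @ Q @ H.conj().T for H, Q in zip(Hs, Qs))
    return np.sum(np.log(np.linalg.eigvalsh(S)))

lam = 0.3
Qmix = [(1 - lam) * A + lam * B for A, B in zip(Q1, Q2)]

# Concavity: value at the convex combination dominates the mix of values.
assert phi_trace(Qmix) >= (1 - lam) * phi_trace(Q1) + lam * phi_trace(Q2) - 1e-9
```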
5.2 Optimization of Matrix-Monotone Functions
The performance measures derived and described in the last section can be represented as the trace of some matrix-monotone function which depends on either a single quadratic form $\rho Z^{-1/2} H Q H^H Z^{-1/2}$ or a sum of quadratic forms $\rho \sum_{k=1}^{K} H_k Q_k H_k^H$. Depending on the type of optimization problem, the expectation is taken with respect to H or $H_1, \ldots, H_K$, where H is distributed according to the Kronecker model from Section 4.1.1.

5.2.1 Single Quadratic Form
At first, we list some properties of the performance function. Let U be a unitary matrix and A, B two positive semidefinite matrices, both of dimension n × n. Furthermore, H is in general a rectangular matrix of dimension n × m. Then
$$ \operatorname{tr} \phi\left( U A U^H \right) = \operatorname{tr} \phi(A) \qquad (5.20) $$
$$ \operatorname{tr} \phi\left( H H^H \right) = \operatorname{tr} \phi\left( H^H H \right) \qquad (5.21) $$
$$ \operatorname{tr} \phi\left( A^{1/2} B A^{1/2} \right) = \operatorname{tr} \phi\left( B^{1/2} A B^{1/2} \right). \qquad (5.22) $$
Property (5.20) follows from the definition of the matrix-valued function φ. Property (5.21) follows from the fact that φ acts only on the eigenvalues, and the nonzero eigenvalues of $HH^H$ are equal to those of $H^H H$. The last property (5.22) follows from $\operatorname{tr} \phi(B^{1/2} A B^{1/2}) = \operatorname{tr} \phi(B^{1/2} A^{1/2} \cdot A^{1/2} B^{1/2}) = \operatorname{tr} \phi(A^{1/2} B A^{1/2})$ by property (5.21).

Theorem 5.3. Consider the following optimization problem with diagonal positive $\Gamma = \mathrm{diag}[\gamma_1, \ldots, \gamma_n]$:
$$ \max_{Q \succeq 0} \operatorname{tr} \Gamma \phi\left( \rho H Q H^H \right) \quad \text{s.t.} \quad \operatorname{tr} Q \leq P. \qquad (5.23) $$
Decompose H according to its SVD as $H = U D V^H$. Denote the eigenvalues of $H^H H$ by $d_1, \ldots, d_n$. Then the optimal Q has the eigenvalue decomposition $Q = V \Lambda V^H$ with eigenvalues $\Lambda = \mathrm{diag}(p_1, \ldots, p_n)$ characterized by
$$ p_k = \left[ \frac{1}{\rho d_k} \tilde{\phi}'\left( \frac{\mu}{\rho d_k \gamma_k} \right) \right]^+ $$
with $x^+ = \max(x, 0)$ and $\tilde{\phi}'$ as the inverse function of the first derivative of φ. µ is chosen such that $\sum_{k=1}^{n} p_k = P$.
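The three invariance properties (5.20)–(5.22) are easy to verify numerically for a concrete choice of φ; the sketch below uses the example $\phi(x) = \log(1+x)$ applied to the eigenvalues and hypothetical random matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 3

def tr_phi(M):
    # tr phi(M) with the example choice phi(x) = log(1 + x),
    # applied to the eigenvalues of the Hermitian PSD argument.
    return np.sum(np.log1p(np.linalg.eigvalsh(M)))

def rand_psd(k):
    A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return A @ A.conj().T

def sqrtm_psd(M):
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

A, B = rand_psd(n), rand_psd(n)
U = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))[0]
H = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

# (5.20) unitary invariance, (5.21) HH^H vs H^H H (phi(0) = 0 makes the
# extra zero eigenvalues harmless), (5.22) symmetry in A and B.
assert np.isclose(tr_phi(U @ A @ U.conj().T), tr_phi(A))
assert np.isclose(tr_phi(H @ H.conj().T), tr_phi(H.conj().T @ H))
A2, B2 = sqrtm_psd(A), sqrtm_psd(B)
assert np.isclose(tr_phi(A2 @ B @ A2), tr_phi(B2 @ A @ B2))
```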
The first part of Theorem 5.3 follows from Theorem 3.16. The second part uses the optimality conditions that are described in Appendix 6.2.3. The complete and detailed proof and illustrations for the case Γ = I can be found in [20].

Theorem 5.4. Consider the following optimization problem
$$ \max_{Q \succeq 0} E\, \operatorname{tr} \phi\left( \rho H Q H^H \right) \quad \text{s.t.} \quad \operatorname{tr} Q \leq P \qquad (5.24) $$
under the assumption that H is distributed according to the model in Section 4.1.1, i.e., $H \sim R_R^{1/2} W R_T^{1/2}$. Q is to be fixed for different realizations of W but may depend on $R_R$ and $R_T$. Denote and order the eigenvalues of $R_T$ by $\lambda_1^T \geq \cdots \geq \lambda_n^T \geq 0$. The optimal covariance matrix Q has eigenvectors which commute with the eigenvectors of $R_T$ and eigenvalues $p = [p_1, \ldots, p_n]$ that are characterized for all $1 \leq k \leq n$ by
$$ \rho \lambda_k^T\, E\left[ w_k^H R_R^{1/2}\, \phi'\left( \rho \sum_{l=1}^{n} p_l \lambda_l^T R_R^{1/2} w_l w_l^H R_R^{1/2} \right) R_R^{1/2} w_k \right] \begin{cases} = \mu & p_k > 0 \\ \leq \mu & p_k = 0 \end{cases} \qquad (5.25) $$
for some µ > 0 such that $\sum_{k=1}^{n} p_k = P$.
One proof can be found in [71]. The first part follows from invariance properties of W, and the second part follows from the optimality conditions that are described in Appendix 6.2.3.

Remark 5.1. On the one hand, there is a closed-form solution for the optimal eigenvectors that solve (5.24); on the other hand, the optimal eigenvalues that solve (5.24) are only given in an indirect form. However, the properties of the optimal eigenvalues can be analyzed based on this implicit characterization. The parameter ρ in (5.24) determines the number of eigenvalues greater than zero, i.e., the higher ρ, the more eigenvalues of Q are greater than zero. This leads to the question of where the transitions are, e.g., for which ρ there is only one eigenvalue of Q greater than zero, i.e., Q has rank one.
Corollary 5.1. The optimization problem (5.24) has a rank-one solution if and only if
$$ \lambda_2\, E\left[ \operatorname{tr}\left( R_R^{1/2}\, \phi'\left( R_R^{1/2} w_1 w_1^H R_R^{1/2} \right) R_R^{1/2} \right) \right] \leq \lambda_1\, E\left[ w_1^H R_R^{1/2}\, \phi'\left( R_R^{1/2} w_1 w_1^H R_R^{1/2} \right) R_R^{1/2} w_1 \right]. $$
This follows directly from (5.25).

5.2.2 Min–max Problems
In this section, three different min–max problems with objective function $\operatorname{tr} \phi(Z^{-1/2} H Q H^H Z^{-1/2})$ are considered. The constraint on the covariance matrix Q will always be $Q \succeq 0$ and $\operatorname{tr} Q \leq P$. The constraint set of the covariance matrix Z will vary. The results are stated in [18] without proof. Therefore, we give the complete proofs. In the analysis of vector channels, minimax problems of the following type are often studied:
$$ \min_{Z \in \mathcal{Z}} \max_{Q \in \mathcal{Q}} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right) \qquad (5.26) $$
in which the noise Z is positive semidefinite and in some set of admissible noise plus interference $\mathcal{Z}$, and the transmit strategy Q is positive semidefinite and belongs to some set of admissible transmit strategies $\mathcal{Q}$. The matrix H with singular value decomposition $H = U \Lambda V^H$ is fixed and depends on the transmission medium. If the AWGN channel is studied, H = I. Otherwise, the matrix H can be a flat-fading MIMO channel matrix, a frequency-selective channel matrix, etc. In general, we assume that the transmit strategy is power limited, i.e., $\mathcal{Q} = \{Q : \operatorname{tr}(Q) \leq P\}$. However, the noise Z can be created by a variety of effects, e.g., thermal noise, intercell, intracell, inter-symbol, or inter-carrier interference. Z has full rank. The concrete structure of Z depends on the application and the transmit and receive strategies. In
[14, 145], expressions like (5.26) in which φ was the mutual information were studied under different admissible sets $\mathcal{Z}$ and $\mathcal{Q}$. For mutual information, [106] studies a general game-theoretic framework for min–max optimization. The results in this section were stated in [17] without proofs.

5.2.2.1 Min–max Problem with a Trace Constraint
In this section, the worst case matrix Z with a trace constraint is characterized. The optimization problem is given by
$$ \Phi_I = \inf_{\operatorname{tr}(Z) \leq \sigma^2 n} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right). \qquad (5.27) $$
The following Theorem 5.5 characterizes the value of the minimax problem in (5.27).

Theorem 5.5. The value of the minimax problem in (5.27) is given by
$$ \Phi_I = \operatorname{tr} \phi\left( \frac{P}{\sigma^2 n} H H^H \right). \qquad (5.28) $$
Remark 5.2. Note that the argument of the RHS in (5.28) is independent of the function φ.

Proof. First, we prove that the minimax performance equals the minimax performance of the expression
$$ \Phi_I^D = \inf_{\operatorname{tr} \Lambda_Z \leq n\sigma^2} \max_{\operatorname{tr} \Lambda_Q \leq P} \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} \Lambda_H^{1/2} \Lambda_Q \Lambda_H^{1/2} \Lambda_Z^{-1/2} \right). \qquad (5.29) $$
The singular value decomposition of H is given by $H = U_H \Lambda_H^{1/2} V_H^H$. At first, to show that $\Phi_I \leq \Phi_I^D$, we have
$$ \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right) = \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} U_H \Lambda_H^{1/2} V_H^H Q V_H \Lambda_H^{1/2} U_H^H Z^{-1/2} \right) $$
$$ = \max_{\operatorname{tr}(V_H^H Q V_H) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} U_H \Lambda_H^{1/2} Q \Lambda_H^{1/2} U_H^H Z^{-1/2} \right) = \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} U_H \Lambda_H^{1/2} Q \Lambda_H^{1/2} U_H^H Z^{-1/2} \right). $$
Now, we choose $\hat{Z} = U_H \Lambda_Z U_H^H$ fixed; then it directly follows that
$$ \Phi_I = \inf_{\operatorname{tr}(Z) \leq \sigma^2 n} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right) \leq \inf_{\operatorname{tr}(Z) \leq \sigma^2 n} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} \Lambda_H^{1/2} Q \Lambda_H^{1/2} \Lambda_Z^{-1/2} \right) = \Phi_I^D. \qquad (5.30) $$
Next, we use Theorem 3.16 to show that $\Phi_I \geq \Phi_I^D$. With Theorem 3.16 we have
$$ \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} \Lambda_H^{1/2} Q \Lambda_H^{1/2} \Lambda_Z^{-1/2} \right) \geq \min_{\pi} \sum_{i=1}^{m} \phi\left( \lambda_{\pi_i}^{-1}(Z)\, \lambda_i(H Q H^H) \right). \qquad (5.31) $$
The maximum over Q of the term in (5.31) is greater than or equal to the term with the choice $U_Q = U_H$, i.e.,
$$ \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} \Lambda_H^{1/2} Q \Lambda_H^{1/2} \Lambda_Z^{-1/2} \right) \geq \min_{\pi} \sum_{i=1}^{m} \phi\left( \lambda_{\pi_i}^{-1}(Z)\, \lambda_i(H)\, \lambda_i(Q) \right). \qquad (5.32) $$
Inequality (5.32) is valid for all Z. Therefore, we have
$$ \inf_{\operatorname{tr}(Z) \leq n\sigma^2} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} \Lambda_H^{1/2} Q \Lambda_H^{1/2} \Lambda_Z^{-1/2} \right) \geq \inf_{\operatorname{tr}(Z) \leq n\sigma^2} \max_{\operatorname{tr}(Q) \leq P} \sum_{i=1}^{m} \phi\left( \lambda_i^{-1}(Z)\, \lambda_i(H)\, \lambda_i(Q) \right). \qquad (5.33) $$
From (5.33) it follows that $\Phi_I \geq \Phi_I^D$. From (5.30) and (5.33), it follows that $\Phi_I = \Phi_I^D$. Next, the minimax problem
$$ \inf_{\operatorname{tr}(Z) \leq \sigma^2 n} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right) \qquad (5.34) $$
yields two different Lagrangians: one for the maximization with respect to Q and one for the minimization with respect to Z. We start with the Lagrangian of the minimization problem
$$ L(\Lambda_Z, \xi_k, \mu) = \sum_{i=1}^{\nu} \phi\left( \frac{\lambda_i(H) \lambda_i(Q)}{\lambda_i(Z)} \right) + \mu \left( \sum_{l=1}^{\nu} \lambda_l(Z) - n\sigma^2 \right) + \sum_{k=1}^{\nu} \xi_k \lambda_k(Z). \qquad (5.35) $$
The objective function φ is monotonically increasing. Therefore, the Lagrangian multipliers $\xi_k$, which ensure that the eigenvalues of the matrix Z are greater than or equal to zero, are all equal to zero, because $\lambda_k(Z) > 0$ for all $1 \leq k \leq \nu$. Since the optimization problem is convex with respect to the eigenvalues of Z, we have the necessary and sufficient Karush–Kuhn–Tucker (KKT) condition from (5.35)
$$ \frac{\partial L(\Lambda_Z, \mu)}{\partial \lambda_i(Z)} = - \frac{\lambda_i(H) \lambda_i(Q)\, \phi'\!\left( \frac{\lambda_i(H) \lambda_i(Q)}{\lambda_i(Z)} \right)}{\lambda_i(Z)^2} + \mu = 0. \qquad (5.36) $$
We express (5.36) as
$$ \phi'\left( \frac{\lambda_i(H) \lambda_i(Q)}{\lambda_i^*(Z)} \right) = \mu\, \frac{\lambda_i^*(Z)^2}{\lambda_i(H) \lambda_i(Q)}. \qquad (5.37) $$
The Lagrangian multiplier µ has to be chosen such that $\sum_{i=1}^{\nu} \lambda_i^*(Z) = n\sigma^2$. The derivation for the maximization with respect to Q goes along similar lines. In the second part of the proof, we further characterize the eigenvalues of the worst case matrix Z. Denote $\rho = \frac{P}{n\sigma^2}$. We denote the eigenvalues of the optimal matrix $Q^*$ by $\lambda_i(Q^*)$ and the eigenvalues of the worst case matrix $Z^*$ by $\lambda_i(Z)$. The so-called "water-filling" solution for the eigenvalues of the matrix Q is given for all $\lambda_i(Q^*) > 0$ by
$$ \phi'\left( \frac{\lambda_k(H) \lambda_k(Q^*)}{\lambda_k(Z)} \right) = \tilde{\mu}\, \frac{\lambda_k(Z)}{\lambda_k(H)} \qquad (5.38) $$
with $\tilde{\mu} \geq 0$. We show that the choice $\lambda_i(Z^*) = \frac{1}{\rho} \lambda_i(Q^*)$ fulfills both optimality conditions (5.37) and (5.38) simultaneously. This result is
derived by computing the Lagrangian multipliers for (5.38) and (5.37) and by noting that
$$ \tilde{\mu}\, \lambda_k(Q^*) = \mu\, \lambda_k(Z^*). \qquad (5.39) $$
Finally, the two Lagrangian multipliers are related from (5.39) by
$$ \tilde{\mu} = \frac{n\sigma^2}{P}\, \mu. $$
As a result, the eigenvalues of the worst case matrix Z in (5.37) correspond to the weighted eigenvalues of the optimal matrix Q:
$$ \lambda_i^*(Z) = \frac{1}{\rho} \lambda_i^*(Q). $$
Finally, we insert $\lambda_k(Q^*)$ into the performance function and obtain (5.28), which completes the proof.
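The saddle-point structure can be illustrated numerically: if Z is aligned with the left singular vectors of H and its eigenvalues are set to $\lambda_i(Q)/\rho$, the objective value equals $\operatorname{tr}\phi(\rho H H^H)$ for any admissible eigenvalue profile of Q. A sketch with the example $\phi(x) = \log(1+x)$ and hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, P, sigma2 = 4, 2.0, 0.5
rho = P / (n * sigma2)

H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, d, Vh = np.linalg.svd(H)

def tr_phi(M):
    return np.sum(np.log1p(np.linalg.eigvalsh(M)))   # phi(x) = log(1 + x)

# Any strictly positive eigenvalue profile with tr Q = P.
q = rng.uniform(0.1, 1.0, n)
q *= P / q.sum()
Q = Vh.conj().T @ np.diag(q) @ Vh        # Q aligned with right singular vectors
Z = U @ np.diag(q / rho) @ U.conj().T    # worst case: lambda_i(Z) = lambda_i(Q)/rho

Zs = U @ np.diag(np.sqrt(rho / q)) @ U.conj().T   # Z^{-1/2}
val = tr_phi(Zs @ H @ Q @ H.conj().T @ Zs)

assert np.isclose(val, tr_phi(rho * H @ H.conj().T))   # matches (5.28)
assert np.isclose(np.trace(Z).real, n * sigma2)        # trace constraint tight
```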
Remark 5.3. If $m > \nu = \operatorname{rank}(Q)$, we have to allocate a small amount $\epsilon$ to the eigenvalues $\nu+1, \ldots, m$ of Z. By letting $\epsilon \to 0$, the minimax value of the theorem is achieved. If $m \leq \nu$, we can replace the infimum by the minimum in the minimization problem.

5.2.2.2 Min–max Problem with Eigenvalue Constraints
We assume that the eigenvalues of Z are fixed and ordered, i.e., $\lambda_1(Z) \geq \lambda_2(Z) \geq \cdots \geq \lambda_n(Z)$. Here, we study the impact of the unitary matrix $U_Z$. We write the set of unitary n × n matrices as $\mathcal{U}(n)$. Let us define the optimization problem as
$$ \Phi_{II} = \min_{W \in \mathcal{U}(n)} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} W^H H Q H^H W \Lambda_Z^{-1/2} \right). \qquad (5.40) $$
Using Theorem 3.16, one can easily prove the following corollary.

Corollary 5.2. The solution to the minimization problem in (5.40) is given by a permutation matrix P, i.e.,
$$ \Phi_{II} = \operatorname{tr} \phi\left( \Lambda_Z^{-1/2} P^H \Lambda_H^{1/2} \Lambda_Q \Lambda_H^{1/2} P \Lambda_Z^{-1/2} \right). $$
5.2.2.3 Min–max Optimization with Diagonal Constraints

We define the value of the minimax problem as
$$ \Phi_{III} = \min_{Z \in \mathcal{Z}_3} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right). \qquad (5.41) $$
The set $\mathcal{Z}_3$ denotes the set of positive semidefinite matrices which have fixed diagonal entries $\sigma_n^2$. Furthermore, we define the following programming problem
$$ \Phi_{III}^D = \max_{\sum_{k=1}^{n} p_k = P,\; p_k \geq 0} \operatorname{tr} \phi\left( H^H \mathrm{diag}([p_1, \ldots, p_n]) H \right). \qquad (5.42) $$
The minimax problem in (5.41) was studied in [147, Sec. III.c] in the context of uplink and downlink duality. Furthermore, the tutorial paper [91] applies Lagrange duality theory for convex optimization problems to show the uplink–downlink duality and analyzes an optimization problem similar to (5.41). Recall Lemma 3.17 and (3.19). In the context of wireless communications, the result in (3.19) is called reciprocity of uplink and downlink transmission. This result is needed in the next theorem.

Theorem 5.6. The value of (5.41) equals the value of (5.42), i.e., $\Phi_{III} = \Phi_{III}^D$.

Proof. We start with $\Phi_{III}$ in (5.41). By the reciprocity result from the last lemma, we have
$$ \Phi_{III} = \min_{Z \in \mathcal{Z}_3} \max_{\operatorname{tr}(Q) \leq P} \operatorname{tr} \phi\left( Z^{-1/2} H Q H^H Z^{-1/2} \right) = \min_{Z \in \mathcal{Z}_3} \max_{\operatorname{tr}(S) \leq P} \operatorname{tr} \phi\left( H^H Z^{-1/2} S Z^{-1/2} H \right) = \min_{Z \in \mathcal{Z}_3} \max_{\operatorname{tr}(SZ) \leq P} \operatorname{tr} \phi\left( H^H S H \right). \qquad (5.43) $$
Next, we define the maximization problem
$$ \Phi_{IIIa}(Z) = \max_{\operatorname{tr}(SZ) \leq P} \operatorname{tr} \phi\left( H^H S H \right). \qquad (5.44) $$
We obtain the upper bound for (5.43) by choosing a concrete Z in (5.44). This leads to the upper bound on $\Phi_{III}$. As a result, it holds that
$$ \Phi_{III}^D \leq \Phi_{III} \leq \Phi_{IIIa}(Z). \qquad (5.45) $$
For the lower bound in (5.45), we choose S to be diagonal. Then the minimization with respect to Z has no impact on the diagonal S, because the diagonal entries of Z are constrained to be less than or equal to σ². Note that the objective functions in (5.44) and (5.42) are equal, namely $\operatorname{tr} \phi(H^H S H)$. Next, we study the optimality conditions for the optimization problems in (5.44) and (5.42). The necessary and sufficient condition for optimality of $S^*$ in (5.44) is given by the KKT conditions (6.5) in Appendix 6.2.3. The Lagrangian of the optimization problem in (5.44) is given by
$$ L(S, \lambda, \Psi) = \operatorname{tr} \phi\left( H^H S H \right) - \lambda \left( \operatorname{tr}(SZ) - P \right) + \operatorname{tr}(\Psi S) $$
with Lagrangian multipliers λ ≥ 0 for the trace constraint and $\Psi \succeq 0$ for the positive semidefiniteness constraint. As a result, the KKT conditions for the problem (5.44) are given by
$$ \operatorname{tr}(SZ) \leq P, \quad \operatorname{tr}(\Psi S) = 0, \quad \left. \frac{\partial}{\partial S} L(S, \lambda) \right|_{S = S^*} = H \phi'\left( H^H S^* H \right) H^H - \lambda Z + \Psi = 0. \qquad (5.46) $$
The KKT conditions (see (6.5) in Appendix 6.2.3) for the optimization in (5.42) are derived via the Lagrangian
$$ \bar{L}(p, \nu, \psi) = \operatorname{tr} \phi\left( H^H \mathrm{diag}(p) H \right) - \nu \left( \sum_{k=1}^{n} p_k - P \right) + \sum_{k=1}^{n} p_k \psi_k $$
as
$$ \operatorname{tr}\, \mathrm{diag}(p) \leq P, \quad \operatorname{tr}\left( \mathrm{diag}(p)\, \mathrm{diag}(\psi) \right) = 0, \quad h_i\, \phi'\left( H^H \mathrm{diag}(p^*) H \right) h_i^H - \nu + \psi_i = 0. \qquad (5.47) $$
The row vector $h_i$ is the ith row vector of H. Observe that the first KKT condition in (5.47) is also fulfilled for all Z with fixed diagonal entries. The KKT conditions in (5.47) correspond to the KKT conditions in (5.46). This means that the value of $\Phi_{III}^D$ is equal to the value of $\Phi_{IIIa}(Z)$ for
$$ Z^* = \frac{1}{\lambda} H \phi'\left( H^H S^* H \right) H^H + \frac{1}{\lambda} \Psi. \qquad (5.48) $$
Note that the matrix $Z^*$ has full rank regardless of the rank of S. Furthermore, note that the matrix Z and the Lagrangian multiplier Ψ, which ensures positive semidefiniteness of S, fulfill
$$ \operatorname{tr}(S^* \Psi) = 0, \quad \operatorname{tr}(Z^* S^*) = P. $$
The Lagrangian multiplier λ is derived from (5.42) as
$$ \lambda = h_k\, \phi'\left( H^H \mathrm{diag}(p) H \right) h_k^H \qquad (5.49) $$
for all k for which $p_k > 0$. The columns k of the channel H that correspond to $p_k = 0$ can be omitted. For the minimax problem $\Phi_{III}$, this means that the effective channel can be reduced by canceling all those columns k of H. Therefore, the rows of $Z^{-1/2}$ that correspond to these k do not influence the value of $\Phi_{III}$. In particular, the kth eigenvalue of Z can be chosen arbitrarily according to the positiveness constraint and the diagonal constraint. As a result, the kth diagonal entry of Z can be smaller than σ² without increasing the value of the minimax problem $\Phi_{III}$. Finally, these kth diagonal entries of Z can be "filled up" to σ² without decreasing the value of $\Phi_{III}$.

5.2.3 MSE Based Performance Measures
Let us start with the weighted sum of MSEs for single-user MIMO systems [117]
$$ \sum_{k=1}^{n} \lambda_k \left[ \left( I + Q^{1/2} H^H H Q^{1/2} \right)^{-1} \right]_{kk} = \operatorname{tr}\left( \Lambda \left[ I + Q^{1/2} H^H H Q^{1/2} \right]^{-1} \right) $$
with diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Next, consider a matrix-convex performance function φ that is based on the individual MSEs and a scalar mapping f, e.g., $f(x) = -\log(x)$ for mutual information, where $\phi(x) = f(1/x)$. Then the following inequality holds for the
weighted sum performance:
$$ \sum_{k=1}^{n} \lambda_k f(\mathrm{MSE}_k) = \sum_{k=1}^{n} \lambda_k f\left( \left[ \left( I + Q^{1/2} H^H H Q^{1/2} \right)^{-1} \right]_{kk} \right) \leq \sum_{k=1}^{n} \lambda_k \left[ \phi\left( I + Q^{1/2} H^H H Q^{1/2} \right) \right]_{kk} = \operatorname{tr} \Lambda \phi\left( I + Q^{1/2} H^H H Q^{1/2} \right) = J(\lambda, Q). \qquad (5.50) $$
The inequality in (5.50) follows from Jensen's inequality. Note that the inequality is fulfilled with equality if the MSE matrix is a diagonal matrix. There are many cases in which the MSE matrix is actually diagonal, e.g., single-user MIMO with perfect CSI [105], single-user (and also multi-user) MIMO with covariance or mean feedback [167], and single-user MIMO with equal power allocation and isotropically distributed channel [78]. A counterexample is the multiuser MIMO case with perfect CSI, where the MSE matrix is not diagonal. However, in such cases, the function $J(\lambda, Q)$ provides an upper bound to the performance.

Example 5.1. This example is an easy extension of the case considered in [117] and serves merely for illustration. Consider the single-user MIMO case with perfect CSI at transmitter and receiver. The optimal transmit strategy is to send in the direction of the right eigenvectors of the channel matrix; the resulting power allocation problem reads
$$ J(\lambda) = \max_{p_k \geq 0:\; \sum_k p_k \leq P} \sum_{k=1}^{n} \lambda_k \phi\left( \gamma_k p_k \right) $$
with eigenvalues $\gamma_1, \ldots, \gamma_n$ of $H^H H$. The optimal power allocation is easily found as
$$ p_k^* = \left[ \frac{1}{\gamma_k} \tilde{\phi}'\left( \frac{\mu}{\lambda_k \gamma_k} \right) \right]^+ \qquad (5.51) $$
with $\tilde{\phi}'$ as the inverse of the first derivative of φ and µ such that $\sum_{k=1}^{n} p_k^* = P$. Also compare (5.51) to [90].
The optimal power allocation specializes directly to $p_k^* = \left[ \frac{\lambda_k}{\mu} - \frac{1}{\gamma_k} \right]^+$ for mutual information and $p_k^* = \left[ \sqrt{\frac{\lambda_k}{\mu \gamma_k}} - \frac{1}{\gamma_k} \right]^+$ for the MSE (see [117, (21)]). The resulting function J(λ) is thus given by
$$ J(\lambda) = \sum_{k=1}^{n} \lambda_k\, \psi\left( \frac{\mu}{\lambda_k \gamma_k} \right) $$
with the concatenated function $\psi(x) = \phi(\tilde{\phi}'(x))$. If the inequality in (5.50) is not tight or should not be used, it is also possible to define a class of optimization problems based on the diagonal elements of the MSE matrix as $f_0(d(M))$ with diagonal operator d and MSE matrix $M = \left[ I + Q^{1/2} H^H H Q^{1/2} \right]^{-1}$. The following class of MSE based optimization problems has been studied in [105, Thm. 1]:
$$ \min_{Q \succeq 0} f_0\left( d\left( \left[ I + Q^{1/2} H^H H Q^{1/2} \right]^{-1} \right) \right) \quad \text{s.t.} \quad \operatorname{tr} Q \leq P. \qquad (5.52) $$
The solution to (5.52) was completely characterized for Schur-convex and Schur-concave functions $f_0$. For these two classes, the optimal Q diagonalizes $H^H H$ as in Theorem 3.16.

5.2.4 Sum of Quadratic Hermitian Forms

From the discussion in Section 5.1.4, the sum performance in (5.18) of the MIMO MAC and BC was obtained: the sum capacity of the MIMO MAC with SIC and of the MIMO BC with Costa Precoding, as well as the average normalized sum MSE for the MIMO MAC, can be written in the following generalized representation using an arbitrary matrix-monotone performance function φ(X):
$$ \Phi(\rho, Q, H) = \operatorname{tr} \phi\left( \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right). \qquad (5.53) $$
5.2.4.1 Individual Power Constraints
In order to maximize the sum performance in (5.53) for fixed and known channel realizations $H_k$, find the optimal transmit covariance matrices $Q_1^*, \ldots, Q_K^*$, i.e., solve
$$ \max_{Q_1, \ldots, Q_K} \operatorname{tr} \phi\left( \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right) \quad \text{subject to} \quad \operatorname{tr} Q_k \leq p_k \text{ and } Q_k \succeq 0,\ 1 \leq k \leq K. \qquad (5.54) $$
The next result shows that the optimal covariance matrices can be found by iterative single-user performance optimization with colored noise. This approach corresponds to the iterative water-filling approach in [164] for sum capacity optimization, in which single-user water-filling is iteratively performed treating the other users as noise. On the one hand, this approach provides insight into the structure of the optimum transmit covariance matrices; on the other hand, under specific conditions, this approach is computationally more efficient than the joint optimization of the transmit covariance matrices. If the number of users is large in comparison to the number of transmit antennas of the users, the joint optimization is computationally more complex than the iterative optimization of each user separately.

Theorem 5.7. The optimal transmit covariance matrices solve the optimization problem in (5.54) if and only if they fulfill the following set of conditions for all $1 \leq i \leq K$:
$$ \rho H_i^H\, \phi'\left( \rho \sum_{k=1}^{K} H_k Q_k H_k^H \right) H_i = \nu_i I - \Psi_i, $$
$$ \operatorname{tr}(\Psi_i Q_i) = 0, \quad \Psi_i \succeq 0, \quad Q_i \succeq 0, \quad \nu_i \geq 0, \quad p_i - \operatorname{tr}(Q_i) \geq 0, \quad \nu_i \left( p_i - \operatorname{tr}(Q_i) \right) = 0. \qquad (5.55) $$
The proof is based on the optimality conditions in (6.5) and can be found in [20]. The following interpretation of the optimality result in Theorem 5.7 leads to an elegant solution of the optimization problem. Based on the optimality conditions in Theorem 5.7, it can be shown that a certain kind of iterative single-user optimization solves (5.54). For the kth user, we fix the transmit covariance matrices of all other users and write the normalized noise plus interference as
$$ Z_k = I + \rho \sum_{l=1,\, l \neq k}^{K} H_l Q_l H_l^H. \qquad (5.56) $$
Next, the following optimization is performed iteratively for all users $1, \ldots, K$:
$$ \max_{Q_k} \operatorname{tr} \phi\left( Z_k + \rho H_k Q_k H_k^H \right) \quad \text{subject to} \quad \operatorname{tr} Q_k \leq p_k \text{ and } Q_k \succeq 0. \qquad (5.57) $$

5.2.4.2 Power Allocation Under Sum Power Constraint
The channel realizations $H_k$ of all users k are assumed to be known. Keep the transmit covariance matrices $Q_1^0, Q_2^0, \ldots, Q_K^0$ fixed. Distribute a fixed amount of transmit power P across the mobiles, i.e., solve
$$ \max_{p_1, \ldots, p_K} \operatorname{tr} \phi\left( \rho \sum_{k=1}^{K} p_k H_k Q_k^0 H_k^H \right) \quad \text{subject to} \quad \sum_{k=1}^{K} p_k \leq P \text{ and } p_k \geq 0,\ 1 \leq k \leq K. \qquad (5.58) $$

Theorem 5.8. The optimal power allocation $p_1, \ldots, p_K$ solves the optimization problem in (5.58) if and only if it fulfills for all $1 \leq i \leq K$:
$$ \operatorname{tr}\left( \rho H_i Q_i^0 H_i^H\, \phi'\left( \rho \sum_{k=1}^{K} p_k H_k Q_k^0 H_k^H \right) \right) = \mu - \lambda_i, $$
$$ \lambda_k p_k = 0, \quad \lambda_k \geq 0, \quad \mu \geq 0, \quad p_k \geq 0, \quad P - \sum_{k=1}^{K} p_k \geq 0, \quad \mu \left( P - \sum_{k=1}^{K} p_k \right) = 0. \qquad (5.59) $$
The proof is again based on the optimality conditions in (6.5) and can be found in [20]. These conditions in (5.59) are fulfilled by the optimum power allocation vector for fixed covariance matrices. Observe that for all active users l, the Lagrangian multiplier $\lambda_l$ is equal to zero. Therefore, the condition $\operatorname{tr}\left( \rho H_i Q_i^0 H_i^H\, \phi'\left( \rho \sum_{k=1}^{K} p_k H_k Q_k^0 H_k^H \right) \right) = \mu$ is fulfilled. This directly leads to the following characterization of the SNR range in which single-user power allocation (or TDMA) is optimal.
Theorem 5.9. Let the users be ordered according to their maximum channel eigenvalues in decreasing order. Let $Q_1^s$ be the optimal transmit covariance matrix for the single-user channel $H_1$ according to the generalized single-user water-filling from Theorem 5.4. The optimal transmit strategy is described by the following points:

(1) For fixed ρ̂, it is optimal to allocate the complete sum power to user one, i.e., the user with the largest maximum channel eigenvalue, if and only if the following condition is satisfied:
$$ \lambda_{\max}\left( H_2^H\, \phi'\left( \hat{\rho}\, H_1 Q_1^s(\hat{\rho}) H_1^H \right) H_2 \right) \leq \lambda_{\max}\left( H_1^H\, \phi'\left( \hat{\rho}\, H_1 Q_1^s(\hat{\rho}) H_1^H \right) H_1 \right). \qquad (5.60) $$
Then the optimal set of covariance matrices is $Q_1(\hat{\rho}) = Q_1^s(\hat{\rho})$ and $Q_2(\hat{\rho}) = Q_3(\hat{\rho}) = \cdots = Q_K(\hat{\rho}) = 0$.

(2) If (5.60) is fulfilled for ρ̂, then (5.60) holds for all ρ ≤ ρ̂, too. Therefore, the single-user optimality range is exactly characterized by choosing the largest ρ̂ for which (5.60) holds.

(3) If $\lambda_{\max}(H_1 H_1^H) > \lambda_{\max}(H_j H_j^H)$ for all $2 \leq j \leq K$, then user one is supported first.
The proof can be found in [20].

5.2.4.3 Sum Performance Optimization Under Sum Power Constraints

Combining these two steps, power allocation and transmit covariance matrix optimization, one arrives at the general problem of sum
performance optimization under a sum power constraint:
$$ \max_{\tilde{Q}_1, \ldots, \tilde{Q}_K} \operatorname{tr} \phi\left( \rho \sum_{k=1}^{K} H_k \tilde{Q}_k H_k^H \right) \quad \text{subject to} \quad \sum_{k=1}^{K} \operatorname{tr} \tilde{Q}_k \leq P \text{ and } \tilde{Q}_k \succeq 0,\ 1 \leq k \leq K. \qquad (5.61) $$

Theorem 5.10. The optimal transmit covariance matrices solve (5.61) if and only if they fulfill the following set of conditions for all $1 \leq i \leq K$:
$$ \rho H_i^H\, \phi'\left( \rho \sum_{k=1}^{K} H_k \tilde{Q}_k H_k^H \right) H_i = \mu I - \Psi_i, $$
$$ \operatorname{tr}(\tilde{Q}_i \Psi_i) = 0, \quad \Psi_i \succeq 0, \quad \tilde{Q}_i \succeq 0, \quad \mu \geq 0, \quad P - \sum_{k=1}^{K} \operatorname{tr}(\tilde{Q}_k) \geq 0, \quad \mu \left( P - \sum_{k=1}^{K} \operatorname{tr}(\tilde{Q}_k) \right) = 0. \qquad (5.62) $$
The proof is based on the optimality conditions (6.5) and can be found in [20]. Furthermore, in [20], an algorithm based on alternating optimization is developed, and the optimality of its fixed point as well as its convergence are proved.

5.2.4.4 Illustration of Single-user Optimality Range
For comparison purposes, we consider the same two-user MIMO MAC as in Example 1 in [147], i.e., two transmit antennas per user and two receive antennas at the base station, with the following real-valued channel matrices for user one and user two, respectively:
$$ H_1 = \begin{bmatrix} 1 & 0.5 \\ 0.8 & 2 \end{bmatrix}, \qquad H_2 = \begin{bmatrix} 0.2 & 2 \\ 1 & 0.5 \end{bmatrix}. \qquad (5.63) $$
Channel matrix one has the larger maximum eigenvalue. This means that user one is the best user and is supported first. In Figure 5.3(a), we show the single-user power level and the maximum eigenvalue of the effective channel which user two experiences. The noise variance is fixed at $\sigma_n^2 = 1$.
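Condition (5.60) can be evaluated for this example with the sum-capacity choice $\phi(x) = \log(1+x)$, so that $\phi'(X) = (I+X)^{-1}$. The sketch below absorbs ρ̂ into the sum power P (with $\sigma_n^2 = 1$) and uses classical water-filling for $Q_1^s$, the perfect-CSI special case of Theorem 5.4; the helper names and the chosen SNR points are illustrative:

```python
import numpy as np

H1 = np.array([[1.0, 0.5], [0.8, 2.0]])
H2 = np.array([[0.2, 2.0], [1.0, 0.5]])

def waterfill(d2, P):
    # Classical capacity water-filling over the channel eigenvalues d2.
    idx = np.argsort(d2)[::-1]
    for m in range(len(d2), 0, -1):       # try m strongest modes active
        act = idx[:m]
        nu = (P + np.sum(1.0 / d2[act])) / m
        p = nu - 1.0 / d2[act]
        if np.all(p > 0):
            full = np.zeros_like(d2)
            full[act] = p
            return full
    raise RuntimeError("water-filling failed")

def single_user_optimal(P):
    # Evaluate condition (5.60) with phi'(X) = (I + X)^{-1}.
    d2, V = np.linalg.eigh(H1.T @ H1)
    Q1 = V @ np.diag(waterfill(d2, P)) @ V.T        # single-user covariance Q1^s
    M = np.linalg.inv(np.eye(2) + H1 @ Q1 @ H1.T)   # phi'(H1 Q1 H1^T)
    lhs = np.linalg.eigvalsh(H2.T @ M @ H2).max()
    rhs = np.linalg.eigvalsh(H1.T @ M @ H1).max()
    return lhs <= rhs

assert single_user_optimal(1e-3)       # low SNR: serving user 1 alone is optimal
assert not single_user_optimal(100.0)  # high SNR: more than one user is supported
```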
Fig. 5.3 Sum capacity optimization for MIMO MAC: optimal power allocation and comparison with a suboptimal transmit strategy. (a) Single-user power level for user 1 and maximum eigenvalue of the channel matrix for user 2 as a function of the maximum transmit sum power. (b) Sum capacity comparison between the optimal transmit strategy and the single-user-only strategy for the multiple antenna MAC with two users and the channels in (5.63).
In the example for the illustration in Figure 5.3(a), the single-user range extends up to an SNR of about −12 dB. From this point on, the optimal transmit strategy supports more than one user. Hence, the convenient scheme that supports only one user at a time (or time-division multiple access, TDMA) is not optimal for higher SNR values. In [80], it was shown that in the single-antenna MAC the optimal transmit strategy is to support only one user at a time. That led to the notion of multiuser diversity and the development of scheduling algorithms based on that information-theoretic result. In multiple antenna systems, this single-user optimality is not valid [16]. The necessary and sufficient optimality condition can be found in (5.60).
6 Appendix
6.1 Linear Algebra
Most of the material can be found in [50, 51]. Since this section collects basic results, the proofs are omitted.

Proposition 6.1 (Singular value decomposition). Every square matrix A of dimension n can be decomposed as A = V Σ W^H, where V, W are unitary matrices of size n × n and Σ is a diagonal matrix with nonnegative main diagonal entries whose rank equals the rank of A.
Proposition 6.2 (Eigenvalue decomposition). Let A be a square matrix of dimension n. Then A is Hermitian if and only if there exist a unitary matrix U of size n × n and a real diagonal matrix Λ of size n × n such that A = U Λ U^H.
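Both decompositions can be illustrated numerically; the following sketch (using numpy, with an arbitrary example matrix, not from the text) checks the factorizations of Propositions 6.1 and 6.2:

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.8, 2.0]])  # arbitrary square example matrix

# Singular value decomposition: A = V Sigma W^H (Proposition 6.1).
V, sigma, Wh = np.linalg.svd(A)
assert np.allclose(A, V @ np.diag(sigma) @ Wh)
assert np.all(sigma >= 0)  # nonnegative main diagonal entries

# Eigenvalue decomposition of a Hermitian matrix: B = U Lambda U^H
# with real Lambda (Proposition 6.2).
B = A + A.T  # symmetric (real Hermitian) by construction
lam, U = np.linalg.eigh(B)
assert np.allclose(B, U @ np.diag(lam) @ U.T)
assert np.all(np.isreal(lam))  # eigenvalues of a Hermitian matrix are real
```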
Lemma 6.3 (Matrix inversion lemma). Suppose A, B are square and invertible matrices. It holds that

(A + X B X^H)^{-1} = A^{-1} − A^{-1} X (B^{-1} + X^H A^{-1} X)^{-1} X^H A^{-1}.   (6.1)
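A numerical spot-check of (6.1) can be sketched as follows; the matrices A, B, X below are arbitrary illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)           # symmetric positive definite, hence invertible
B = np.eye(k)                     # invertible
X = rng.standard_normal((n, k))

# Left-hand side of (6.1): direct inverse of A + X B X^H.
lhs = np.linalg.inv(A + X @ B @ X.T)

# Right-hand side of (6.1).
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ X @ np.linalg.inv(np.linalg.inv(B) + X.T @ Ainv @ X) @ X.T @ Ainv

assert np.allclose(lhs, rhs)      # the two sides agree to machine precision
```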
6.2 Convex Optimization
The book [24] provides a very good overview of convex optimization. Here, we review certain results that are useful for immediate application in the current work. The material is mainly from [24, Ch. 5].

6.2.1 Lagrange Dual Function
Consider the following optimization problem given in standard form:

p* = min f_0(x)
s.t. f_i(x) ≤ 0 for i = 1, . . . , m and h_i(x) = 0 for i = 1, . . . , p   (6.2)

for x ∈ X.

Definition 6.1 (Lagrangian). The Lagrangian L : X × R^m × R^p → R for problem (6.2) is

L(x, λ, ν) = f_0(x) + Σ_{k=1}^m λ_k f_k(x) + Σ_{k=1}^p ν_k h_k(x).

The Lagrange multiplier λ_i is associated with the ith inequality constraint, whereas ν_i is associated with the ith equality constraint. The vectors λ and ν are called the dual variables or Lagrange multiplier vectors for the problem (6.2). The Lagrange dual function g : R^m × R^p → R is the minimum value of the Lagrangian over x:

g(λ, ν) = inf_{x ∈ X} L(x, λ, ν).
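To make the dual function concrete, the following sketch (a toy problem chosen for illustration, not from the text) computes g(λ) for the problem min x^2 s.t. 1 − x ≤ 0 by taking the infimum over a fine grid of x, and compares the result with the closed form obtained by minimizing the Lagrangian analytically:

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x).
def g(lam, xs=np.linspace(-5, 5, 20001)):
    # pointwise infimum of the Lagrangian over a fine grid of x
    return np.min(xs**2 + lam * (1 - xs))

lams = np.linspace(0, 4, 41)
vals = np.array([g(l) for l in lams])

# Closed form: minimizing over x gives x = lam/2, so g(lam) = lam - lam^2/4.
assert np.allclose(vals, lams - lams**2 / 4, atol=1e-4)

# Concavity of the dual function: second differences are nonpositive.
assert np.all(np.diff(vals, 2) <= 1e-8)
```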
Remark 6.1. Since the dual function is the pointwise infimum of a family of affine functions of (λ, ν), it is concave, even if the problem in (6.2) is not convex.
Lemma 6.4 (Section 5.1.3 [24]). The dual function yields lower bounds on the optimal value of the problem (6.2): for any λ ≥ 0 and any ν we have g(λ, ν) ≤ p*.
Proof. Suppose x̄ is a feasible point for the problem (6.2), i.e., f_i(x̄) ≤ 0 and h_i(x̄) = 0, and λ ≥ 0. Obviously,

Σ_{k=1}^m λ_k f_k(x̄) + Σ_{k=1}^p ν_k h_k(x̄) ≤ 0,

since the terms in the first sum are nonpositive and the terms in the second sum are all zero. As a result,

g(λ, ν) = inf_{x ∈ X} L(x, λ, ν) ≤ L(x̄, λ, ν) ≤ f_0(x̄).

This lower bound holds for all feasible points x̄ ∈ X.
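Lemma 6.4 can be illustrated numerically on a toy problem (chosen here purely for illustration, not from the text): for min x^2 s.t. 1 − x ≤ 0, the primal optimum is p* = 1 at x = 1, and every value of the dual function stays below it:

```python
import numpy as np

# Toy problem: minimize x^2 subject to 1 - x <= 0 (primal optimum p* = 1 at x = 1).
xs = np.linspace(-5, 5, 20001)

def g(lam):
    # dual function: infimum of the Lagrangian x^2 + lam*(1 - x) over x
    return np.min(xs**2 + lam * (1 - xs))

p_star = 1.0
for lam in np.linspace(0, 10, 101):
    assert g(lam) <= p_star + 1e-9  # weak duality: g(lam) <= p* for all lam >= 0
```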
Remark 6.2. In the Lagrangian, a constraint violation is weighted linearly by its corresponding multiplier. Since the constraints must be fulfilled in any case, a first approach would be to set the objective to ∞ outside the feasible region. In the Lagrangian L, this hard penalty on constraint violation is replaced by a linear one.
Definition 6.2 (Lagrange dual problem). A natural question regarding the Lagrange dual function is what the best lower bound is that can be obtained from it:

d* = max g(λ, ν)
s.t. λ ≥ 0.   (6.3)
This is called the Lagrange dual problem associated with the problem (6.2).
Remark 6.3. The Lagrange dual problem (6.3) is always a convex optimization problem, since the objective to be maximized is concave and the constraint set is convex. This holds independently of whether the primal problem (6.2) is convex. Furthermore, the optimal value of the dual problem is always smaller than or equal to the optimal value of the primal problem, d* ≤ p*. The difference p* − d* is called the optimal duality gap and is always nonnegative.

6.2.2 Strong Duality
Definition 6.3 (Strong duality). Strong duality holds if the duality gap p∗ − d∗ = 0. This means that the best bound that can be obtained from the Lagrange dual function is tight.
Remark 6.4. Strong duality does not hold in general; even convexity of the primal problem is not sufficient. The conditions under which strong duality holds are called constraint qualifications. One simple constraint qualification is Slater’s condition: there is an x ∈ X s.t. f_i(x) < 0 for all i = 1, . . . , m and h_i(x) = 0 for all i = 1, . . . , p. Slater’s theorem says that strong duality holds if Slater’s condition holds (and the problem is convex). Consider in the following the minimization problem with only inequality constraints

min f_0(x)
s.t. f_i(x) ≤ 0 for i = 1, . . . , m.

Denote the Lagrange multiplier for the ith inequality constraint by λ_i. The connection to max–min problems is explained by the following
representations: weak duality can be expressed as the inequality

sup_{λ ≥ 0} inf_x L(x, λ) ≤ inf_x sup_{λ ≥ 0} L(x, λ)

and strong duality as the equality

sup_{λ ≥ 0} inf_x L(x, λ) = inf_x sup_{λ ≥ 0} L(x, λ).

This can be generalized to arbitrary functions f(x, y). In order to decide whether the min–max expressions satisfy the saddle-point property

min_{x ∈ X} max_{y ∈ Y} f(x, y) = max_{y ∈ Y} min_{x ∈ X} f(x, y)   (6.4)

we use Theorem 1 in [35]. One result in [35, Thm. 1] states that (6.4) is fulfilled if X and Y are two compact Hausdorff spaces, f is lower semi-continuous for every y ∈ Y, f is upper semi-continuous for every x ∈ X, f is convex in x and concave in y, and the sets X and Y are convex, too. The following saddle-point interpretation can also be applied:

f(x*, y) ≤ f(x*, y*) ≤ f(x, y*)

for all x ∈ X and y ∈ Y. In other words, x* minimizes f(x, y*) and y* maximizes f(x*, y):

f(x*, y*) = inf_{x ∈ X} f(x, y*)   and   f(x*, y*) = sup_{y ∈ Y} f(x*, y).
This also implies that the strong max–min property holds (and therefore strong duality).

Example 6.1. Consider the following convex programming problem:

min x^2
s.t. x ≤ −3.

In standard form (6.2) we have f_0(x) = x^2 and f_1(x) = x + 3. The Lagrangian is given by

L(x, λ) = x^2 + λ(x + 3).
138
Appendix
The dual function is given by

g(λ) = −(1/4)λ^2 + 3λ.

The maximum of the dual function is attained at λ* = 6 and yields the value of the dual problem d* = 9. The solution of the primal problem is then x* = −(1/2)λ* = −3, and the value of the primal problem is p* = 9. Since the problem is convex, strong duality holds, i.e., p* = d*.

6.2.3 Optimality Conditions
Assume that strong duality holds. Let x* be a primal optimal and (λ*, ν*) a dual optimal point. This means that

f_0(x*) = g(λ*, ν*) = inf_x ( f_0(x) + Σ_{i=1}^m λ*_i f_i(x) + Σ_{i=1}^p ν*_i h_i(x) )
        ≤ f_0(x*) + Σ_{i=1}^m λ*_i f_i(x*) + Σ_{i=1}^p ν*_i h_i(x*) ≤ f_0(x*).

The two inequalities in this chain hold with equality. Therefore, it follows that x* minimizes the Lagrangian with respect to x. In addition, Σ_{i=1}^m λ*_i f_i(x*) = 0. Since every term in this sum is nonpositive, each must vanish, which is the so-called complementary slackness condition λ*_i f_i(x*) = 0 for all i = 1, . . . , m. This means that the ith optimal Lagrange multiplier is zero unless the ith constraint is active at the optimum. Finally, we arrive at the Karush–Kuhn–Tucker optimality conditions (KKT conditions).

Definition 6.4 (KKT conditions). The KKT conditions for the optimization problem in (6.2) are given by

∇f_0(x*) + Σ_{i=1}^m λ*_i ∇f_i(x*) + Σ_{i=1}^p ν*_i ∇h_i(x*) = 0,
f_i(x*) ≤ 0, i = 1, . . . , m,
h_i(x*) = 0, i = 1, . . . , p,
λ*_i ≥ 0, i = 1, . . . , m,
λ*_i f_i(x*) = 0, i = 1, . . . , m.   (6.5)
Lemma 6.5 (KKT optimality conditions). The KKT conditions are necessary optimality conditions for any nonlinear optimization problem under the assumption that the constraints are regular (see, e.g., [114]). If the primal problem is convex, the KKT conditions are also sufficient. If a convex optimization problem with differentiable objective and constraint functions satisfies Slater’s condition, then the KKT conditions provide necessary and sufficient conditions for optimality.
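As a concrete check, the KKT conditions (6.5) and the zero duality gap can be verified directly (a short Python sketch, not part of the original text) for the convex problem of Example 6.1, min x^2 s.t. x + 3 ≤ 0, whose optimum is x* = −3 with multiplier λ* = 6:

```python
# KKT check for: minimize x^2 subject to x + 3 <= 0.
x_star, lam_star = -3.0, 6.0

grad_f0 = 2 * x_star          # gradient of the objective x^2 at x*
grad_f1 = 1.0                 # gradient of the constraint x + 3
f1 = x_star + 3               # constraint value at the optimum

assert abs(grad_f0 + lam_star * grad_f1) < 1e-12  # stationarity
assert f1 <= 0                                    # primal feasibility
assert lam_star >= 0                              # dual feasibility
assert abs(lam_star * f1) < 1e-12                 # complementary slackness

# Strong duality: p* = (x*)^2 and d* = g(lambda*) = -(1/4) lambda*^2 + 3 lambda*.
p_star = x_star**2
d_star = -lam_star**2 / 4 + 3 * lam_star
assert abs(p_star - d_star) < 1e-12               # zero duality gap
```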
7 Acknowledgments
Part of the content of this book was presented during lectures at the Technische Universität Berlin, Germany, within the course “Applied Information Theory” from 2005–2007, at the Royal Institute of Technology, Stockholm, Sweden, within the course “Advanced Digital Communications,” and at the Beihang University, Beijing, China, within the course “Advanced Digital Communications” in 2007. This work has been supported in part by the Bundesministerium für Bildung und Forschung (BMBF) under Grant BU150 and in part by the Swedish Research Foundation (Vetenskapsrådet) under Grant 623-2005-5359.
References
[1] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. Dover Publications, 1970. [2] N. Ahmed and R. G. Baraniuk, “Throughput measures for delay-constrained communications in fading channels,” Allerton Conference on Communication, Control and Computing, October 2003. [3] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, October 1998. [4] T. Ando, “On some operator inequalities,” Mathematische Annalen, vol. 279, pp. 157–159, 1987. [5] T. Ando and X. Zhan, “Norm inequalities related to operator monotone functions,” Mathematische Annalen, vol. 315, pp. 771–780, 1999. [6] G. Bauch and J. Hagenauer, “Smart versus dumb antennas-capacities and FEC performance,” IEEE Communication Letters, vol. 6, no. 2, pp. 55–57, February 2002. [7] I. Bettesh and S. Shamai, “Optimal power and rate control for minimal average delay: The single-user case,” IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 4115–4141, September 2006. [8] R. Bhatia, Matrix Analysis. Springer-Verlag, 1997. [9] E. Biglieri, G. Caire, and G. Taricco, “Limiting performance of block-fading channels with multiple antennas,” IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1273–1289, May 2001. [10] E. Biglieri, J. Proakis, and S. Shamai (Shitz), “Fading channels: Information-theoretic and communications aspects,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2619–2692, October 1998.
[11] I. Bjelakovic and H. Boche, “Structure of optimal input covariance matrices for mimo systems with covariance feedback under general correlated fading,” Proceedings of IEEE International Symposium on Information Theory (ISIT 2006), July 2006. [12] H. Boche, “Advanced network calculus for interference functions,” in Plenary Talk at IEEE SPAWC, 2007. [13] H. Boche and E. A. Jorswieck, “Sum capacity optimization of the MIMO Gaussian MAC,” Proceedings of 5th International Symposium on Wireless Personal Multimedia Communications, invited paper, vol. 1, pp. 130–134, October 2002. [14] H. Boche and E. A. Jorswieck, “Multiuser MIMO systems: Worst case noise and transmitter cooperation,” Proceedings of ISSSPIT, invited paper, 2003. [15] H. Boche and E. A. Jorswieck, “Uplink sumrate maximization with different types of channel state information at the transmitters,” Proceedings of IEEE ISSPIT, 2003. [16] H. Boche and E. A. Jorswieck, “Multiple antenna multiple user channels: Optimisation in low SNR,” Proceedings of IEEE Wireless Communications and Networking Conference, 2004. [17] H. Boche and E. A. Jorswieck, “Optimization of matrix monotone functions: Saddle-point, worst case noise analysis, and applications,” Proceedings of IEEE ISIT, 2004. [18] H. Boche and E. A. Jorswieck, “Optimization of matrix monotone functions: Saddle-point, worst case noise analysis, and applications,” Proceedings of IEEE ISIT, 2004. [19] H. Boche and E. A. Jorswieck, “Outage probability of multiple antenna systems: Optimal transmission and impact of correlation,” Proceedings of IEEE IZS, pp. 116–119, 2004. [20] H. Boche and E. A. Jorswieck, “On the performance optimization in multiuser MIMO systems,” European Transactions on Telecommunications, vol. 18, no. 3, pp. 217–233, April 2007. [21] H. Boche and M. Schubert, “A new approach to power adjustment for spatial covariance based downlink beamforming,” Proceedings of IEEE ICASSP, vol. 5, pp. 2957–2960, May 2001. [22] H. Boche and M. 
Schubert, “A general theory for uplink and downlink beamforming,” Proceedings of IEEE 56th Vehicular Technology Conference, vol. 1, pp. 87–91, September 2002. [23] H. Bölcskei and A. J. Paulraj, “Performance of space-time codes in the presence of spatial fading correlation,” in Proceedings of Asilomar Conference, 2000. [24] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004. [25] J. Brinkhuis, Z. Q. Luo, and S. Zhang, “Matrix convex functions with applications to weighted centers for semidefinite programming,” Technical Report SEEM2005-06, 2005. [26] G. Caire and S. Shamai (Shitz), “On the multiple-antenna broadcast channel,” 35th Asilomar Conference on Signals, Systems and Computers, 2001.
[27] G. Caire, G. Taricco, and E. Biglieri, “Optimum power control over fading channels,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1468– 1489, July 1999. [28] D. Chizhik, G. J. Foschini, M. J. Gans, and R. A. Valenzuela, “Keyholes, correlations, and capacities of multielement transmit and receive antennas,” IEEE Transactions on Wireless Communications, vol. 1, no. 2, pp. 361–368, April 2002. [29] D. Chizhik, G. J. Foschini, and R. A. Valenzuela, “Capacities of multielement transmit and receive antennas,” IEE Electronics Letters, vol. 36, pp. 1099– 1100, June 2000. [30] C.-N. Chuah, D. N. C. Tse, and J. M. Kahn, “Capacity scaling in MIMO wireless systems under correlated fading,” IEEE Transactions on Information Theory, vol. 48, no. 3, pp. 637–650, March 2002. [31] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 2–14, January 1972. [32] T. M. Cover, “Comments on broadcast channels,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2524–2530, October 1998. [33] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley & Sons, 1991. [34] W. F. Donoghue Jr, Monotone Matrix Functions and Analytic Continuation. Springer-Verlag, 1974. [35] K. Fan, “Minimax theorems,” Proceedings National Academic Society, vol. 39, pp. 42–47, 1953. [36] M. Fiedler, “Bounds for the determinant of the sum of hermitian matrices,” Proceedings of the American Mathematical Society, vol. 30, no. 1, pp. 27–31, September 1971. [37] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, pp. 311–335, 1998. [38] D. Gerlach and A. Paulraj, “Adaptive transmitting antenna methods for multipath environments,” Global Telecommunications Conference, vol. 1, pp. 425– 429, December 1994. [39] A. Goel and A. 
Meyerson, “Simultaneous optimization via approximate majorization for concave profits or convex costs,” Algorithmica, vol. 44, pp. 301–323, 2006. [40] A. J. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 5, pp. 684–702, June 2003. [41] J.-C. Guey, M. P. Fitz, M. R. Bell, and W.-Y. Kuo, “Signal design for transmitter diversity wireless communication systems over Rayleigh fading channels,” in 1996 IEEE Vehicular Technology Conference, pp. 136–140, Atlanta, GA, 1996. [42] D. Guo, Gaussian Channels: Information, Estimation and Multiuser Detection. PhD thesis, Princeton University, 2004. [43] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, April 2005.
[44] S. Hanly and D. Tse, “Multiaccess fading channels: Part II: Delay-limited capacities,” IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816–2831, November 1998. [45] F. Hansen, G. Ji, and J. Tomiyama, “Gaps between classes of matrix monotone functions,” Bulletin London Mathematical Society, vol. 36, no. 1, pp. 53–58, 2004. [46] F. Hansen and G. K. Pedersen, “Jensen’s inequality for operators and Löwner’s theorem,” Mathematische Annalen, vol. 258, pp. 229–241, 1982. [47] F. Hansen and G. K. Pedersen, “Perturbation formulas for traces on C*-algebras,” Publication of the Research Institute for Mathematical Sciences, Kyoto University, vol. 31, pp. 169–178, 1995. [48] G. Hardy, J. E. Littlewood, and G. Pólya, Inequalities. Cambridge Mathematical Library, Second ed., 1952. [49] T. Haustein and H. Boche, “On optimal power allocation and bit-loading strategies for the MIMO transmitter with channel knowledge,” Proceedings of IEEE ICASSP 2003, vol. IV, pp. 405–409, 2003. [50] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985. [51] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge University Press, 1991. [52] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: Necessary and sufficient conditions,” Physics Letters A, vol. 223, pp. 1–8, 1996. [53] M. T. Ivrlac and J. A. Nossek, “Quantifying diversity and correlation in Rayleigh fading MIMO communication systems,” Proceedings of IEEE ISSPIT, vol. 1, pp. 158–161, 2003. [54] S. Jafar and A. Goldsmith, “Multiple antenna capacity in correlated Rayleigh fading with channel covariance information,” IEEE Transactions on Wireless Communication, vol. 4, no. 3, pp. 990–997, May 2005. [55] S. A. Jafar and A. Goldsmith, “On optimality of beamforming for multiple antenna systems with imperfect feedback,” International Symposium on Information Theory, p. 321, 2001.
Goldsmith, “Vector MAC capacity region with covariance feedback,” IEEE International Symposium on Information Theory, p. 54, 2001. [57] G. Jöngren, M. Skoglund, and B. Ottersten, “Combining beamforming and orthogonal space-time block coding,” IEEE Transactions on Information Theory, vol. 48, no. 3, pp. 611–627, March 2002. [58] E. Jorswieck, B. Ottersten, A. Sezgin, and A. Paulraj, “Guaranteed performance region in fading orthogonal space-time coded broadcast channels,” Proceedings of IEEE ISIT, 2007. [59] E. A. Jorswieck, Unified approach for optimisation of single-user and multi-user multiple-input multiple-output wireless systems. PhD thesis, Technical University of Berlin, Germany, September 2004. Available online: http://edocs.tu-berlin.de/diss/2004/jorswieck_eduard.htm. [60] E. A. Jorswieck, Transmission strategies for the MIMO MAC, ch. 21, pp. 423–442, Hindawi Publishing Corporation, 2005.
[61] E. A. Jorswieck, “Uplink throughput maximization with combined sum and individual power constraints,” IEEE Communications Letters, vol. 10, no. 12, pp. 816–818, December 2007. [62] E. A. Jorswieck, M. Bengtsson, and B. Ottersten, “On the interplay between scheduling, user distribution, CSI, and performance measures in cellular downlink,” in EUSIPCO, invited paper, 2006. [63] E. A. Jorswieck and H. Boche, “On transmit diversity with imperfect channel state information,” Proceedings of IEEE ICASSP, vol. 3, pp. III-2181–III-2184, May 2002. [64] E. A. Jorswieck and H. Boche, “Behavior of outage probability in MISO systems with no channel state information at the transmitter,” Proceedings of IEEE Information Theory Workshop, pp. 353–356, 2003. [65] E. A. Jorswieck and H. Boche, “On the optimal transmission strategy for the MIMO MAC with MMSE receiver,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 109–112, April 2003. [66] E. A. Jorswieck and H. Boche, “Optimal transmission with imperfect channel state information at the transmit antenna array,” Wireless Personal Communications, vol. 27, no. 1, pp. 33–56, 2003. [67] E. A. Jorswieck and H. Boche, “Transmission strategies for the MIMO MAC with MMSE receiver: Average MSE optimization and achievable individual MSE region,” IEEE Transactions on Signal Processing, vol. 51, no. 11, pp. 2872–2881, November 2003. [68] E. A. Jorswieck and H. Boche, “Optimal transmission strategies and impact of correlation in multi-antenna systems with different types of channel state information,” IEEE Transactions on Signal Processing, vol. 52, no. 12, pp. 3440–3453, December 2004. [69] E. A. Jorswieck and H. Boche, “Delay-limited capacity of parallel fading channels,” Proceedings of IEEE SPAWC, 2005. [70] E. A. Jorswieck and H. Boche, “Multiple-antenna capacity in the low-power regime: Channel knowledge and correlation,” Proceedings of IEEE ICASSP, 2005.
Boche, “Performance analysis of MIMO systems in spatially correlated fading using matrix-monotone functions,” IEICE Transactions on Fundamentals, vol. E98-A, no. 5, pp. 1454–1472, May 2006. [72] E. A. Jorswieck and H. Boche, “Throughput analysis of cellular downlink with different types of channel state information,” in Proceedings of IEEE ICC, 2006. [73] E. A. Jorswieck and H. Boche, “Outage probability in multiple antenna systems,” European Transactions on Telecommunications, vol. 18, no. 3, pp. 217– 233, April 2007. [74] E. A. Jorswieck, H. Boche, and A. Sezgin, “Delay-limited capacity and maximum throughput of spatially correlated multiple antenna systems under average and peak-power constraints,” Proceedings of IEEE Information Theory Workshop, 2004.
[75] E. A. Jorswieck, T. Oechtering, and H. Boche, “Performance analysis of combining techniques with correlated diversity,” Proceedings of IEEE WCNC, vol. 2, pp. 849–854, March 2005. [76] E. A. Jorswieck and A. Sezgin, “Impact of spatial correlation on the performance of orthogonal space-time block codes,” IEEE Communications Letters, vol. 8, no. 1, pp. 21–23, January 2004. [77] E. A. Jorswieck, A. Sezgin, and H. Boche, “Outage probability of OSTBC: Optimal transmit strategy and suboptimality of odd number of transmit antennas,” Proceedings of IEEE ICASSP, vol. 4, pp. IV-177–IV-180, 2006. [78] A. Jovicic, P. Viswanath, and S. R. Kulkarni, “Upper bounds to transport capacity of wireless networks,” IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2555–2565, November 2004. [79] M. Kiessling, Statistical Analysis and Transmit Processing for MIMO Wireless Systems in Correlated Fading Environments. PhD thesis, University of Stuttgart, 2004. [80] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” Proceedings IEEE ICC, vol. 1, pp. 331–335, June 1995. [81] R. Knopp and P. A. Humblet, “On coding for block fading channels,” IEEE Transactions on Information Theory, vol. 46, no. 1, pp. 189–205, 2000. [82] A. Knutson and T. Tao, “Honeycombs and sums of Hermitian matrices,” Notices of the American Mathematical Society, February 2001. [83] F. Kraus, “Über konvexe Matrixfunktionen,” Mathematische Zeitschrift, vol. 41, pp. 18–42, 1936. [84] F. Kubo and T. Ando, “Means of positive linear operators,” Mathematische Annalen, vol. 246, pp. 205–224, 1980. [85] T. A. Lamahewa, R. A. Kennedy, T. D. Abhayapala, and T. Betlehem, “MIMO channel correlation in general scattering environments,” Proceedings Australian Communication Theory Workshop, 2006. [86] E. G. Larsson and P. Stoica, Space-Time Block Coding for Wireless Communications. Cambridge University Press, 2003. [87] X. B.
Liang, “Orthogonal designs with maximal rates,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2468–2503, October 2003. [88] K. T. Löwner, “Über monotone Matrixfunktionen,” Mathematische Zeitschrift, vol. 38, pp. 177–216, 1934. [89] A. Lozano, A. M. Tulino, and S. Verdú, “Multiple-antenna capacity in the low-power regime,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2527–2544, October 2003. [90] A. Lozano, A. M. Tulino, and S. Verdú, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 3033–3051, July 2006. [91] Z.-Q. Luo and W. Yu, “An introduction to convex optimization for communications and signal processing,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1426–1438, August 2006. [92] A. W. Marshall and I. Olkin, “Inequalities: Theory of Majorization and Its Application,” in Mathematics in Science and Engineering, Academic Press, Inc. (London) Ltd., 1979.
[93] A. W. Marshall and F. Proschan, “An inequality for convex functions involving majorization,” Journal of Mathematical Analysis Applications, 1965. [94] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Transactions on Information Theory, vol. 45, no. 1, pp. 139–157, January 1999. [95] R. J. McEliece and W. E. Stark, “Channels with block interference,” IEEE Transactions on Information Theory, vol. 30, no. 1, pp. 44–53, January 1984. [96] R. U. Nabar, H. Bölcskei, V. Erceg, D. Gesbert, and A. J. Paulraj, “Performance of multiantenna signaling techniques in the presence of polarization diversity,” IEEE Transactions on Signal Processing, vol. 50, pp. 2553–2562, 2002. [97] R. U. Nabar, H. Bölcskei, and A. J. Paulraj, “Outage properties of space-time block codes in correlated Rayleigh or Ricean fading environments,” IEEE ICASSP, pp. 2381–2384, May 2002. [98] R. U. Nabar, H. Bölcskei, and A. J. Paulraj, “Diversity and outage performance in Ricean MIMO channels,” IEEE Transactions Wireless Communications, vol. 4, no. 5, pp. 2519–2532, September 2005. [99] A. Narula, M. J. Lopez, M. D. Trott, and G. W. Wornell, “Efficient use of side information in multiple-antenna data transmission over fading channels,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1423–1436, October 1998. [100] A. Narula, M. J. Trott, and G. W. Wornell, “Performance limits of coded diversity methods for transmitter antenna arrays,” IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2418–2433, November 1999. [101] M. A. Nielsen, “Conditions for a class of entanglement transformations,” Physical Review Letters, vol. 83, pp. 436–439, 1999. [102] H. Özcelik, Indoor MIMO Channel Models. PhD thesis, Technische Universität Wien, 2004. [103] H. Özcelik, M. Herdin, W. Weichselberger, J. Wallace, and E.
Bonek, “Deficiencies of the Kronecker MIMO radio channel model,” IEE Electronics Letters, vol. 39, no. 16, pp. 1209–1210, 2003. [104] D. P. Palomar, A unified framework for communications through MIMO channels. PhD thesis, Universitat Politècnica de Catalunya, 2003. [105] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Joint TX-RX beamforming design for multicarrier MIMO channels: A unified framework for convex optimization,” IEEE Transactions on Signal Processing, vol. 51, no. 9, pp. 2381–2401, September 2003. [106] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, “Uniform power allocation in MIMO channels: A game-theoretic approach,” IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1707–1727, July 2003. [107] D. P. Palomar and Y. Jiang, “MIMO transceiver design via majorization theory,” Foundations and Trends in Communications and Information Theory, vol. 3, no. 4–5, pp. 331–551, 2007. [108] D. P. Palomar and S. Verdú, “Gradient of mutual information in linear vector Gaussian channels,” IEEE Transactions on Information Theory, vol. 52, no. 1, pp. 141–154, January 2006.
[109] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge University Press, 2003. [110] J. G. Proakis, Digital Communications. New York: McGraw-Hill, Fourth ed., 2000. [111] F. Rashid-Farrokhi, K. J. R. Liu, and L. Tassiulas, “Transmit beamforming and power control for cellular wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1437–1450, October 1998. [112] W. Rhee and J. M. Cioffi, “Ergodic capacity of multi-antenna Gaussian multiple-access channels,” Proceedings of the 35th Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 507–512, November 2001. [113] W. Rhee and J. M. Cioffi, “On the capacity of multiuser wireless systems with multiple antennas,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2580–2595, October 2003. [114] S. M. Robinson, “First order conditions for general nonlinear optimization,” SIAM Journal of Applied Mathematics, vol. 30, no. 4, pp. 597–607, June 1976. [115] A. Roger Hammons Jr and H. E. Gamal, “On the theory of space-time codes for PSK modulation,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 524–542, March 2000. [116] M. Sagae and K. Tanabe, “Upper and lower bounds for the arithmetic-geometric-harmonic means of positive definite matrices,” Linear and Multilinear Algebra, vol. 37, pp. 279–282, 1994. [117] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion,” IEEE Transactions on Communications, vol. 49, no. 12, pp. 2198–2206, December 2001. [118] S. Sandhu and A. J. Paulraj, “Space-time block codes: A capacity perspective,” IEEE Communications Letters, vol. 4, no. 12, pp. 384–386, December 2000. [119] H. Sato, “An outer bound to the capacity region of broadcast channels,” IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 374–376, May 1978. [120] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H.
Sampath, “Optimal designs for space-time linear precoders and decoders,” IEEE Transactions on Signal Processing, vol. 50, no. 5, pp. 1051–1064, May 2002. [121] M. Schubert and H. Boche, “Solvability of coupled downlink beamforming problems,” Global Telecommunications Conference, vol. 1, pp. 614–618, November 2001. [122] M. Schubert and H. Boche, “QoS-based resource allocation and transceiver optimization,” in Foundations and Trends in Communications and Information Theory, vol. 2, Now publishers, 2005. [123] M. Schubert, S. Shi, and H. Boche, “Transceiver optimization for linear multiuser MIMO channels: Sum power minimization with per-user mmse requirements,” in Proceedings of EUSIPCO, 2006. [124] A. Sezgin, E. A. Jorswieck, and H. Boche, “Performance optimization of openloop MIMO systems with orthogonal space-time block codes,” IEEE Signal Processing Letters, vol. 14, no. 1, pp. 13–16, January 2007.
[125] A. Sezgin and T. J. Oechtering, “On the outage probability of quasi-orthogonal space-time codes,” in Proceedings of IEEE Information Theory Workshop 2004, San Antonio, TX, October 2004. [126] S. Shi and M. Schubert, “MMSE transmit optimization for multiuser multiantenna systems,” in Proceedings of IEEE ICASSP, 2005. [127] S. Shi, M. Schubert, and H. Boche, “Computational efficient transceiver optimization for multiuser MIMO systems: Power minimization with user-MMSE requirements,” in Proceedings Asilomar, 2006. [128] S. Shi, M. Schubert, and H. Boche, “Downlink MMSE transceiver optimization for multiuser MIMO systems: Duality and sum-MSE minimization,” IEEE Transactions on Signal Processing, (will appear), 2007. [129] S. Shi, M. Schubert, E. A. Jorswieck, and H. Boche, “Downlink sum-MSE transceiver optimization for linear multi-user MIMO systems,” in Proceedings of Asilomar Conference on Signal, Systems, and Computers, 2005. [130] H. Shin and J. H. Lee, “Exact symbol error probability of orthogonal space-time block codes,” in Proceedings of IEEE Globecom, pp. 1547–1552, November 2002. [131] D. Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multi-element antenna systems,” IEEE Transactions on Communications, vol. 48, no. 3, pp. 502–513, March 2000. [132] P. Stoica, Y. Jiang, and J. Li, “On MIMO channel capacity: An intuitive discussion,” IEEE Signal Processing Magazine, vol. 5, pp. 83–84, May 2005. [133] G. J. Székely and N. K. Barikov, “Extremal probabilities for Gaussian quadratic forms,” Probability Theory and Related Fields, vol. 126, pp. 184–202, April 2003. [134] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal design,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, July 1999. [135] V. Tarokh, H. Jafarkhani, and A. R.
Calderbank, “Space-time block coding for wireless communications: Performance results,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, pp. 451–460, March 1999. [136] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance analysis and code construction,” IEEE Transaction on Information Theory, vol. 44, no. 3, pp. 774–765, March 1998. [137] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, November/ December 1999. [138] O. Tirkkonen and A. Hottinen, “Square-matrix embeddable space-time block codes for complex signal constellations,” IEEE Transactions on Information Theory, vol. 48, no. 2, pp. 1122–1126, February 2002. [139] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2005. [140] D. N. Tse and P. Viswanath, “Downlink-uplink duality and effective bandwidths,” IEEE International Symposium on Information Theory, p. 52, 2002.
[141] A. Tulino, A. Lozano, and S. Verdú, “Impact of antenna correlation on the capacity of multiantenna channels,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2491–2509, July 2005.
[142] A. M. Tulino, A. Lozano, and S. Verdú, “Bandwidth-power tradeoff of multi-antenna systems in the low-power regime,” in Multiantenna Channels: Capacity, Coding and Signal Processing, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 2003.
[143] S. Verdú, “Spectral efficiency in the wideband regime,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1319–1343, June 2002.
[144] A. Vielmon, Y. Li, and J. R. Barry, “Performance of Alamouti transmit diversity over time-varying Rayleigh-fading channels,” in Proceedings of IEEE WCNC, 2005.
[145] S. Vishwanath, S. Boyd, and A. Goldsmith, “Worst-case capacity of Gaussian vector channels,” in Proceedings of IEEE Canadian Workshop on Information Theory, 2003.
[146] S. Vishwanath, N. Jindal, and A. Goldsmith, “On the capacity of multiple input multiple output broadcast channels,” in Proceedings of International Conference on Communications, vol. 3, pp. 1444–1450, April 2002.
[147] S. Vishwanath, N. Jindal, and A. Goldsmith, “Duality, achievable rates and sum-rate capacity of Gaussian MIMO broadcast channels,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2658–2668, October 2003.
[148] S. Vishwanath, G. Kramer, S. Shamai (Shitz), S. Jafar, and A. Goldsmith, “Capacity bounds for Gaussian vector broadcast channels,” DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 2003.
[149] E. Visotsky and U. Madhow, “Space-time transmit precoding with imperfect feedback,” IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2632–2639, September 2001.
[150] P. Viswanath, V. Anantharam, and D. N. C. Tse, “Optimal sequences, power control and user capacity of synchronous CDMA systems with linear MMSE multiuser receivers,” IEEE Transactions on Information Theory, vol. 45, no. 6, pp. 1968–1983, September 1999.
[151] P. Viswanath, D. Tse, and R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1277–1294, June 2002.
[152] P. Viswanath and D. N. C. Tse, “Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality,” IEEE Transactions on Information Theory, vol. 49, no. 8, pp. 1912–1921, August 2003.
[153] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “The capacity region of the Gaussian MIMO broadcast channel,” in Proceedings of Conference on Computer Systems and Sciences, pp. 7–12, 2004.
[154] K. Werner, M. Jansson, and P. Stoica, “On estimation of Kronecker structured covariance matrices,” IEEE Transactions on Signal Processing, to appear, 2006.
[155] D. V. Widder, The Laplace Transform. Princeton University Press, 1941.
[156] E. P. Wigner and J. von Neumann, “Significance of Löwner’s theorem in the quantum theory of collisions,” Annals of Mathematics, vol. 59, pp. 418–433, 1954.
[157] J. H. Winters, “The diversity gain of transmit diversity in wireless systems with Rayleigh fading,” IEEE Transactions on Vehicular Technology, vol. 47, no. 1, pp. 119–123, February 1998.
[158] P. Wolniansky, G. Foschini, G. Golden, and R. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proceedings of URSI International Symposium on Signals, Systems, and Electronics, (IEEE, New York, NY), pp. 295–300, 1998.
[159] G. Wornell, “Linear diversity techniques for fading channels,” in Wireless Communications: Signal Processing Perspectives, (H. V. Poor and G. W. Wornell, eds.), Chapter 1, 1998.
[160] K. Yu, Multiple-Input Multiple-Output Radio Propagation Channels: Characteristics and Models. PhD thesis, Royal Institute of Technology, Stockholm, Sweden, 2005.
[161] W. Yu, “The structure of the worst-noise in Gaussian vector broadcast channels,” American Mathematical Society – Discrete Mathematics and Theoretical Computer Science (DIMACS) series on Network Information Theory, 2003.
[162] W. Yu, “Uplink-downlink duality via minimax duality,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 361–374, February 2006.
[163] W. Yu and J. M. Cioffi, “Sum capacity of Gaussian vector broadcast channels,” IEEE Transactions on Information Theory, to appear, 2004.
[164] W. Yu, W. Rhee, S. Boyd, and J. M. Cioffi, “Iterative water-filling for Gaussian vector multiple-access channels,” IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 145–151, January 2004.
[165] P. Zetterberg, Mobile Cellular Communications with Base Station Antenna Arrays: Spectrum Efficiency, Algorithms and Propagation Models. PhD thesis, Royal Institute of Technology, Stockholm, 1997.
[166] X. Zhan, Matrix Inequalities, vol. 1790 of Lecture Notes in Mathematics. Springer-Verlag, Berlin Heidelberg, 2002.
[167] X. Zhang, E. Jorswieck, B. Ottersten, and A. Paulraj, “MSE based optimization of multiuser MIMO MAC with partial CSI,” in Proceedings of ASILOMAR CSSC, 2006.
[168] X.-M. Zhang, “Optimization of Schur-convex functions,” Mathematical Inequalities and Applications, vol. 1, no. 3, pp. 319–330, 1998.
[169] S. Zhou and G. B. Giannakis, “Optimal transmitter eigen-beamforming and space-time block coding based on channel mean feedback,” IEEE Transactions on Signal Processing, vol. 50, pp. 2599–2613, October 2002.
[170] S. Zhou and G. B. Giannakis, “Adaptive modulation for multi-antenna transmissions with channel mean feedback,” IEEE Transactions on Wireless Communications, vol. 3, no. 5, pp. 1626–1636, 2004.