Wireless Communications over MIMO Channels: Applications to CDMA and Multiple Antenna Systems
Volker Kühn, Universität Rostock, Germany
Copyright © 2006 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Email (for orders and customer service enquiries):
[email protected] Visit our Home Page on www.wiley.com All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to
[email protected], or faxed to (+44) 1243 770620. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought. Other Wiley Editorial Offices John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809 John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, ONT, L5R 4J3, Canada Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN-13 978-0-470-02716-5 (HB) ISBN-10 0-470-02716-9 (HB) Typeset in 10/12pt Times by Laserwords Private Limited, Chennai, India. Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, England. This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.
Contents

Preface
Acknowledgements
List of Abbreviations
List of Symbols

1 Introduction to Digital Communications
  1.1 Basic System Model
    1.1.1 Introduction
    1.1.2 Multiple Access Techniques
    1.1.3 Principle Structure of SISO Systems
  1.2 Characteristics of Mobile Radio Channels
    1.2.1 Equivalent Baseband Representation
    1.2.2 Additive White Gaussian Noise
    1.2.3 Frequency-Selective Time-Variant Fading
    1.2.4 Systems with Multiple Inputs and Outputs
  1.3 Signal Detection
    1.3.1 Optimal Decision Criteria
    1.3.2 Error Probability for AWGN Channel
    1.3.3 Error and Outage Probability for Flat Fading Channels
    1.3.4 Time-Discrete Matched Filter
  1.4 Digital Linear Modulation
    1.4.1 Introduction
    1.4.2 Amplitude Shift Keying (ASK)
    1.4.3 Quadrature Amplitude Modulation (QAM)
    1.4.4 Phase Shift Keying (PSK)
  1.5 Diversity
    1.5.1 General Concept
    1.5.2 MRC for Independent Diversity Branches
    1.5.3 MRC for Correlated Diversity Branches
  1.6 Summary

2 Information Theory
  2.1 Basic Definitions
    2.1.1 Information, Redundancy, and Entropy
    2.1.2 Conditional, Joint and Mutual Information
    2.1.3 Extension for Continuous Signals
    2.1.4 Extension for Vectors and Matrices
  2.2 Channel Coding Theorem for SISO Channels
    2.2.1 Channel Capacity
    2.2.2 Cutoff Rate
    2.2.3 Gallager Exponent
    2.2.4 Capacity of the AWGN Channel
    2.2.5 Capacity of Fading Channel
    2.2.6 Channel Capacity and Diversity
  2.3 Channel Capacity of MIMO Systems
  2.4 Channel Capacity for Multiuser Communications
    2.4.1 Single Antenna AWGN Channel
    2.4.2 Single Antenna Flat Fading Channel
    2.4.3 Multiple Antennas at Transmitter and Receiver
  2.5 Summary

3 Forward Error Correction Coding
  3.1 Introduction
  3.2 Linear Block Codes
    3.2.1 Description by Matrices
    3.2.2 Simple Parity Check and Repetition Codes
    3.2.3 Hamming and Simplex Codes
    3.2.4 Hadamard Codes
    3.2.5 Trellis Representation of Linear Block Codes
  3.3 Convolutional Codes
    3.3.1 Structure of Encoder
    3.3.2 Graphical Description of Convolutional Codes
    3.3.3 Puncturing Convolutional Codes
    3.3.4 ML Decoding with Viterbi Algorithm
  3.4 Soft-Output Decoding of Binary Codes
    3.4.1 Log-Likelihood Ratios – A Measure of Reliability
    3.4.2 General Approach for Soft-Output Decoding
    3.4.3 Soft-Output Decoding for Walsh Codes
    3.4.4 BCJR Algorithm for Binary Block Codes
    3.4.5 BCJR Algorithm for Binary Convolutional Codes
    3.4.6 Implementation in Logarithmic Domain
  3.5 Performance Evaluation of Linear Codes
    3.5.1 Distance Properties of Codes
    3.5.2 Error Rate Performance of Codes
    3.5.3 Information Processing Characteristic
  3.6 Concatenated Codes
    3.6.1 Introduction
    3.6.2 Performance Analysis for Serial Concatenation
    3.6.3 Performance Analysis for Parallel Concatenation
    3.6.4 Turbo Decoding of Concatenated Codes
    3.6.5 EXIT Charts Analysis of Turbo Decoding
  3.7 Low-Density Parity Check (LDPC) Codes
    3.7.1 Basic Definitions and Encoding
    3.7.2 Graphical Description
    3.7.3 Decoding of LDPC Codes
    3.7.4 Performance of LDPC Codes
  3.8 Summary

4 Code Division Multiple Access
  4.1 Fundamentals
    4.1.1 Direct-Sequence Spread Spectrum
    4.1.2 Direct-Sequence CDMA
    4.1.3 Single-User Matched Filter (SUMF)
    4.1.4 Spreading Codes
  4.2 OFDM-CDMA
    4.2.1 Multicarrier Transmission
    4.2.2 Orthogonal Frequency Division Multiplexing
    4.2.3 Combining OFDM and CDMA
  4.3 Low-Rate Channel Coding in CDMA Systems
    4.3.1 Conventional Coding Scheme (CCS)
    4.3.2 Code-Spread Scheme (CSS)
    4.3.3 Serially Concatenated Coding Scheme (SCCS)
    4.3.4 Parallel Concatenated Coding Scheme (PCCS)
    4.3.5 Influence of MUI on Coding Schemes
  4.4 Uplink Capacity of CDMA Systems
    4.4.1 Orthogonal Spreading Codes
    4.4.2 Random Spreading Codes and Optimum Receiver
    4.4.3 Random Spreading Codes and Linear Receivers
  4.5 Summary

5 Multiuser Detection in CDMA Systems
  5.1 Optimum Detection
    5.1.1 Optimum Joint Sequence Detection
    5.1.2 Joint Preprocessing and Subsequent Separate Decoding
    5.1.3 Turbo Detection with Joint Preprocessing and Separate Decoding
  5.2 Linear Multiuser Detection
    5.2.1 Decorrelator (Zero-Forcing, ZF)
    5.2.2 Minimum Mean Squared Error Receiver (MMSE)
    5.2.3 Linear Parallel Interference Cancellation (PIC)
    5.2.4 Linear Successive Interference Cancellation (SIC)
  5.3 Nonlinear Iterative Multiuser Detection
    5.3.1 Nonlinear Devices
    5.3.2 Uncoded Nonlinear Interference Cancellation
    5.3.3 Nonlinear Coded Interference Cancellation
  5.4 Combining Linear MUD and Nonlinear SIC
    5.4.1 BLAST-like Detection
    5.4.2 QL Decomposition for Zero-Forcing Solution
    5.4.3 QL Decomposition for MMSE Solution
    5.4.4 Turbo Processing
  5.5 Summary

6 Multiple Antenna Systems
  6.1 Introduction
  6.2 Spatial Diversity Concepts
    6.2.1 Receive Diversity
    6.2.2 Performance Analysis of Space–Time Codes
    6.2.3 Orthogonal Space–Time Block Codes
    6.2.4 Space–Time Trellis Codes
  6.3 Multilayer Transmission
    6.3.1 Channel Knowledge at the Transmitter and Receiver
    6.3.2 Channel Knowledge only at the Receiver
    6.3.3 Performance of Multilayer Detection Schemes
    6.3.4 Lattice Reduction-Aided Detection
  6.4 Linear Dispersion Codes
    6.4.1 LD Description of Alamouti’s Scheme
    6.4.2 LD Description of Multilayer Transmissions
    6.4.3 LD Description of Beamforming
    6.4.4 Optimizing Linear Dispersion Codes
    6.4.5 Detection of Linear Dispersion Codes
  6.5 Information Theoretic Analysis
    6.5.1 Uncorrelated MIMO Channels
    6.5.2 Correlated MIMO Channels
  6.6 Summary

Appendix A Channel Models
  A.1 Equivalent Baseband Representation
  A.2 Typical Propagation Profiles for Outdoor Mobile Radio Channels
  A.3 Moment-Generating Function for Ricean Fading

Appendix B Derivations for Information Theory
  B.1 Chain Rule for Entropies
  B.2 Chain Rule for Information
  B.3 Data-Processing Theorem

Appendix C Linear Algebra
  C.1 Selected Basics
  C.2 Householder Reflections and Givens Rotation
  C.3 LLL Lattice Reduction

Bibliography

Index
Preface

Motivation

Mobile radio communications are evolving from pure telephony systems to multimedia platforms offering a variety of services ranging from simple file transfers and audio and video streaming, to interactive applications and positioning tasks. Naturally, these services have different constraints concerning data rate, delay, and reliability (quality-of-service (QoS)). Hence, future mobile radio systems have to provide great flexibility and scalability to match these heterogeneous requirements. Additionally, bandwidth has become an extremely valuable resource, emphasizing the need for transmission schemes with high spectral efficiency. To cope with these challenges, three key areas have been the focus of research in the last decade and are addressed in this book: code division multiple access (CDMA), multiple antenna systems, and strong error control coding.

CDMA was chosen as the multiple access scheme in third generation mobile radio systems such as the Universal Mobile Telecommunications System (UMTS) and CDMA2000. The main ingredient of CDMA systems is the inherent spectral spreading that allows a certain coexistence with narrowband systems. Owing to the large bandwidth, it generally provides a higher diversity degree and thus a better link reliability. Compared to second generation mobile radio systems, the third generation offers increased flexibility, such as different and much higher data rates, as required for the large variety of services. The frequency reuse factor in such cellular networks allows neighboring cells to operate at the same frequency, leading to a more efficient use of the frequency resource. Moreover, this allows simpler soft handover compared to the ‘break before make’ strategy in the Global System for Mobile communications (GSM) when mobile subscribers change the serving cell. The main drawback of CDMA systems is the multiuser interference, which requires appropriate detection algorithms at the receiver.
Multiple antenna systems represent the second major research area. Owing to their high potential for improving the system efficiency, they have already found their way into several standards. On one hand, multiple antennas at the receiver and transmitter allow the transmission of several spatially separated data streams. For point-to-point communications, this is termed space division multiplexing (SDM), and in multiuser scenarios, it is called space division multiple access (SDMA). In both scenarios, the system’s spectral efficiency can be remarkably increased compared to the single antenna case. On the other hand, the link reliability can be improved by beamforming and diversity techniques.

As a third research area, powerful channel coding like concatenated codes or low-density parity check codes allows efficient communications in the vicinity of Shannon’s channel capacity. This leads to a power-efficient transmission that is of particular interest concerning the battery lifetime in mobile equipment and the discussion about electromagnetic exposure. Certainly, all mentioned areas have to be jointly considered and should be incorporated into third generation mobile radio systems and beyond.

Owing to the influence of the mobile radio channel, a power- and bandwidth-efficient transmission can be obtained only with appropriate signal processing either at the transmitter or the receiver. Assuming channel knowledge at the transmitter, a preequalization of the channel, like Tomlinson-Harashima precoding, allows very simple receiver structures. Derivatives are also applicable in multiuser downlink scenarios where a common base station can coordinate all transmissions and avoid interference prior to transmission. Even without channel knowledge at the transmitter, space–time coding schemes allow the full exploitation of diversity for multiple transmit antennas and flat fading channels with a simple matched filter at the receiver. All these techniques require joint preprocessing at a common transmitter, that is, a coordinated transmission has to be implemented, and they offer the advantage of a very simple receiver structure.

On the contrary, the generally asynchronous multiuser uplink consists of spatially separated noncooperating transmitters and a common powerful base station. In this scenario, a joint preprocessing is not possible and the receiver has to take over the part of jointly processing the signals. The same situation occurs when multiple antennas are used for spatial multiplexing without channel knowledge at the transmitter. At first sight, such a multiple antenna system seems to be quite different from the CDMA uplink. However, the mathematical description using vector notation illustrates their similarity.
Hence, the common task of receivers in both cases is to separate and recover the interfering signals so that the same detection algorithms can be used. The aim of this book is to explain the principles and main advances of the three research areas mentioned above. Moreover, the similarity between the SDM and the CDMA uplink is illustrated. Therefore, a unified description using vector notations and comprising multiple antenna as well as CDMA systems is presented. This model can be generalized to arbitrary vector channels, that is, channels with multiple inputs and outputs. It is used to derive efficient detection algorithms whose error rate performances are compared.
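As a rough illustration of the vector model mentioned above, the following sketch treats a received vector as a channel matrix applied to a vector of simultaneously transmitted symbols, and separates them by zero-forcing, that is, by inverting the matrix. The 2x2 channel values, the BPSK symbols, the omission of noise, and the helper names `matvec` and `inv2x2` are all assumptions made for illustration; they are not taken from the book. The columns of `H` may be read either as transmit antennas (SDM) or as users with their spreading sequences (CDMA uplink).

```python
# Hypothetical illustration of a vector channel y = H x (noise omitted),
# followed by a zero-forcing detector that inverts the channel matrix.

def matvec(H, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(h * s for h, s in zip(row, x)) for row in H]

def inv2x2(H):
    """Invert a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("channel matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Assumed example channel: rows are receiver outputs, columns are streams or users.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
x = [1 + 0j, -1 + 0j]            # two BPSK symbols transmitted simultaneously

y = matvec(H, x)                  # received vector: the two symbols interfere
x_zf = matvec(inv2x2(H), y)       # zero-forcing estimate removes the interference
detected = [1 if s.real >= 0 else -1 for s in x_zf]
```

Without noise the zero-forcing estimate recovers the transmitted symbols up to rounding; with noise, inverting the matrix can amplify it, which is one reason other detectors are also of interest.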
Structure of Book

Chapter 1: Introduction to Digital Communications

The book starts with an introduction to digital communication systems. Since the mobile radio channel dominates the design of these systems, its statistical properties are analyzed and appropriate models for frequency-selective channels with single as well as multiple inputs and outputs are presented. Afterwards, the basic principles of signal detection and some general expressions for the error rate performance are derived. These results are used in the next section to determine the performance of linear modulation schemes for different channel models. Finally, the principle of diversity is generally discussed and the effects are illustrated with numerical results. They are used in subsequent chapters in which frequency diversity in CDMA systems and space diversity in multiple antenna systems are explained.
Chapter 2: Information Theory

Chapter 2 deals with the information-theoretic analysis of mobile radio systems. Starting with some basic definitions, capacities of the additive white Gaussian noise (AWGN) channel and fading channels are derived. In particular, the difference between ergodic and outage capacity is discussed. The next section derives the capacity of multiple-input multiple-output (MIMO) systems in a general way without delivering specific results. Specific results are presented for CDMA and SDMA systems in Chapters 4 and 6. The chapter closes with a short information-theoretic survey of multiuser communications.
Chapter 3: Forward Error Correction Coding

The third chapter gives a short survey of selected channel coding topics that become relevant in subsequent chapters. Starting with a basic description of linear block and convolutional codes, soft-output decoding algorithms representing an essential ingredient in concatenated coding schemes are derived. Next, low-density parity check codes are briefly explained and the general performance of codes is evaluated. On one hand, the error rate performance is analyzed by the union bound technique, exploiting the distance properties of codes. On the other hand, the information processing characteristic is based on information theory and allows a comparison with ideal coding schemes. Finally, concatenated codes are considered, including turbo decoding whose analysis is based on EXtrinsic information transfer (EXIT) charts.
Chapter 4: Code Division Multiple Access

The multiple access scheme CDMA is described in Chapter 4. Besides single-carrier CDMA with the Rake receiver, multicarrier CDMA with different despreading or equalization techniques is also considered. Moreover, the basic differences between uplink and downlink are explained and some examples of spreading sequences are presented. Next, the high performance of low-rate coding, which exploits the inherent spreading in CDMA systems, is demonstrated. The chapter ends with an information-theoretic analysis of the CDMA uplink with random spreading, picking up the general results from Chapter 2.
Chapter 5: Multiuser Detection in CDMA Systems

While the fourth chapter is mainly restricted to single-user matched filters, Chapter 5 considers multiuser detection strategies for the CDMA uplink. After sketching the optimum detectors, low-cost linear detectors as well as nonlinear multistage detectors including turbo processing with channel decoding are derived. The chapter closes with a discussion on the combination of linear preprocessing and nonlinear interference cancellation based on the QL decomposition of the mixing matrix.
Chapter 6: Multiple Antenna Systems

Chapter 6 covers several topics related to point-to-point communications with multiple antennas. It starts with diversity concepts such as receive diversity and space–time coding. Next, the principle of spatial multiplexing is explained. Besides the detection algorithms already described in Chapter 5, a new approach based on lattice reduction is presented, showing a performance that is close to that of the optimum maximum likelihood detector. A unified description is provided by the linear dispersion codes addressed in Section 6.4. Finally, a brief information-theoretic analysis of multiple antenna systems is presented.
Appendices

In Appendix A, some basic derivations concerning the equivalent baseband representation and the moment-generating function of Rice fading are presented. Furthermore, it contains tables with frequently used channel models. Appendix B proves the chain rules for entropy and information, as well as the data-processing theorem. Finally, Appendix C presents some basics of linear algebra, Householder reflection, and Givens rotation, as well as the Lenstra, Lenstra and Lovász (LLL) algorithm for the lattice reduction used in Chapter 6.
Notational Remarks

In order to avoid confusion, some notational remarks should be made. Real and imaginary parts of a signal $x(t)$ are denoted by $x'(t)$ and $x''(t)$, respectively. To distinguish time-continuous signals and their sampled time-discrete versions, square brackets are used in the time-discrete case, leading to $x[k] = x(kT_s)$ with $T_s$ as the sampling interval. Moreover, $\mathcal{X}$ represents a stochastic process while $x[k]$ represents a corresponding sample function. Hence, probability density functions of continuous processes are denoted by $p_{\mathcal{X}}(x)$. The conditional probability density function $p_{\mathcal{X}|d}(x)$ considers the process $\mathcal{X}$, given a fixed hypothesis $d$, so that it is a function of only a single variable $x$. Multivariate processes comprising several random variables $\mathcal{X}_1 \cdots \mathcal{X}_n$ are denoted by $\underline{\mathcal{X}}$. Column vectors, row vectors, and matrices are distinguished by $\mathbf{x}$, $\underline{\mathbf{x}}$, and $\mathbf{X}$, respectively. A set $\mathbb{X}$ contains all the possible values a signal $x[k]$ can take, that is, $x[k] \in \mathbb{X}$ holds. It can be either an infinite set like $\mathbb{Z}$, $\mathbb{R}$, or $\mathbb{C}$, representing the sets of all integers, real numbers, or complex numbers, respectively, or a finite set like $\mathbb{X}$ consisting generally of $N$ symbols $\{X_0, \ldots, X_{N-1}\}$. Finally, log denotes the natural logarithm while other bases are explicitly mentioned.
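The relation between a time-continuous signal and its sampled version can be made concrete with a small sketch. The complex exponential used as the signal, the sampling interval of 0.125, and the names `x`, `Ts`, and `samples` are assumptions chosen for illustration only; they do not come from the book.

```python
import cmath
import math

def x(t, f0=1.0):
    """Hypothetical time-continuous complex signal: a unit-amplitude exponential."""
    return cmath.exp(2j * math.pi * f0 * t)

Ts = 0.125                                    # assumed sampling interval
samples = [x(k * Ts) for k in range(8)]       # time-discrete version: x[k] = x(k * Ts)
real_parts = [s.real for s in samples]        # real part of each sample
imag_parts = [s.imag for s in samples]        # imaginary part of each sample
```

Here `samples[k]` plays the role of the time-discrete signal in square brackets, while calling `x(t)` corresponds to the time-continuous signal in round brackets. In the same spirit, Python's `math.log` is the natural logarithm, matching the convention stated above.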
Acknowledgements

This book was written during my time at the Department of Communications Engineering at the University of Bremen and basically comprises the results of my research. Certainly, it could not have been written without the support and patience of many people. Therefore, I am obliged to everyone who assisted me during that time. In particular, I am obliged to Professor Kammeyer for all the valuable advice, encouragement, and discussions. The opportunity to work in his department was a precious experience. I would also like to acknowledge the proofreading by Ralf Seeger, Sven Vogeler, Peter Klenner, Petra Weitkemper, Ansgar Scherb, Jürgen Rinas, and Martin Feuersänger. Special thanks are due to Dirk Wübben and Ronald Böhnke for my fruitful discussions with them and their valuable hints. Of great benefit was also the joint work with Armin Dekorsy in the area of low rate coding for CDMA. Finally, I would like to thank my wife Claudia and my children Erik and Jana for their infinite patience and trust.

Volker Kühn
Bremen, Germany
List of Abbreviations

3GPP      Third Generation Partnership Project

ARQ       Automatic Repeat Request
ASK       Amplitude Shift Keying
AWGN      Additive White Gaussian Noise

BCJR      Bahl, Cocke, Jelinek, Raviv
BER       Bit Error Rate
BLAST     Bell Labs Layered Space Time
BPSK      Binary Phase Shift Keying
BSC       Binary Symmetric Channel

CCS       Conventional Coding Scheme
CDMA      Code Division Multiple Access
COST      European Cooperation in the field of Scientific and Technical Research
CSI       Channel State Information
CSS       Code-Spread Scheme

DAB       Digital Audio Broadcast
D-BLAST   Diagonal BLAST
DFT       Discrete Fourier Transform
DMT       Discrete Multi-Tone
DS-CDMA   Direct-Sequence CDMA
DSL       Digital Subscriber Line
DVB       Digital Video Broadcast

EGC       Equal Gain Combining
EXIT      EXtrinsic Information Transfer

FDD       Frequency Division Duplex
FDMA      Frequency Division Multiple Access
FEC       Forward Error Correction
FER       Frame Error Rate
FFT       Fast Fourier Transform
FHT       Fast Hadamard Transform
FIR       Finite Impulse Response

GF        Galois Field
GSM       Global System for Mobile communications

HSDPA     High Speed Downlink Packet Access

ICI       Inter-Carrier Interference
IDFT      Inverse Discrete Fourier Transform
IEEE      Institute of Electrical and Electronics Engineers
IFFT      Inverse Fast Fourier Transform
i.i.d.    independent identically distributed
IIR       Infinite Impulse Response
IOWEF     Input Output Weight Enumerating Function
IPC       Information Processing Characteristic
ISI       InterSymbol Interference

LD        Linear Dispersion
LDPC      Low-Density Parity Check
LLL       Lenstra, Lenstra and Lovász
LLR       Log-Likelihood Ratio
LoS       Line of Sight
LR        Lattice Reduction
LSB       Least Significant Bit

MAI       Multiple Access Interference
MAP       Maximum A Posteriori
MC        Multi-Carrier
MF        Matched Filter
MGF       Moment-Generating Function
MIMO      Multiple Input Multiple Output
MISO      Multiple Input Single Output
ML        Maximum Likelihood
MLD       Maximum Likelihood Decoding
MMSE      Minimum Mean Square Error
MRC       Maximum Ratio Combining
MSB       Most Significant Bit
MUD       Multi-User Detection
MUI       Multi-User Interference

NSC       Nonrecursive Nonsystematic Convolutional

OFDM      Orthogonal Frequency Division Multiplexing
ORC       Orthogonal Restoring Combining

PAM       Pulse Amplitude Modulation
PCCS      Parallel Concatenated Coding Scheme
PIC       Parallel Interference Cancellation
PSA       Post-Sorting Algorithm
PSK       Phase Shift Keying

QAM       Quadrature Amplitude Modulation
QLD       QL Decomposition
QoS       Quality of Service
QPSK      Quaternary Phase Shift Keying

RSC       Recursive Systematic Convolutional

SCCS      Serially Concatenated Coding Scheme
SDMA      Space Division Multiple Access
SIC       Successive Interference Cancellation
SIMO      Single Input Multiple Output
SINR      Signal to Interference plus Noise Ratio
SISO      Single Input Single Output
SNR       Signal-to-Noise Ratio
SPC       Single Parity Check Code
SQLD      Sorted QL Decomposition
STBC      Space–Time Block Code
STC       Space–Time Code
STTC      Space–Time Trellis Code
SUMF      Single-User Matched Filter
SUB       Single-User Bound
SVD       Singular Value Decomposition

TDMA      Time Division Multiple Access
TDD       Time Division Duplex

UEP       Unequal Error Protection
UMTS      Universal Mobile Telecommunications System

V-BLAST   Vertical BLAST

WSSUS     Wide Sense Stationary Uncorrelated Scattering
WLAN      Wireless Local Area Network

ZF        Zero Forcing
List of Symbols
$a_u[\ell]$  modulation symbols of user u at time instance $\ell$
$\mathbf{a}_u$  vector containing modulation symbols of user u
$A(D)$  distance spectrum of convolutional code
$A(W, D)$  input-output weight enumerating function of convolutional code
$b_u[\ell]$  code bit of user u at time instant $\ell$
$\mathbf{b}_u$  vector containing code bits of user u
$B_c$  coherence bandwidth of channel
$B_d$  Doppler bandwidth of channel
$\beta$  load of CDMA system
$c[k, \ell]$  chip of spreading code at time instance $\ell$ for kth symbol
$\mathbf{c}_u$  vector containing CDMA spreading sequence of user u
$C$  channel capacity
$C_{out}$  outage capacity
$D$  diversity degree
$d_u[i]$  information bit of user u at time instant i
$\mathbf{d}_u$  vector containing information bits of user u
$d_f$  free distance of convolutional code
$d_{\min}$  minimum Hamming distance of a code
$d_H(\mathbf{a}, \mathbf{b})$  Hamming distance between two vectors a and b
$\mathcal{D}_\mu$  set of received symbols that are closest to transmit symbol $X_\mu$
$\bar{\mathcal{D}}_\mu$  set complementary to $\mathcal{D}_\mu$
$\mathcal{D}_{\mu,\nu}$  set of received symbols that are closer to $X_\nu$ than to $X_\mu$
$\Delta^2_{\mu,\nu}$  squared Euclidean distance between symbols $X_\mu$ and $X_\nu$
$E_X\{\cdot\}$  expectation with respect to process X
$E_b$  average energy per information bit
$E_G(R_c)$  Gallager exponent
$E_s$  average energy per transmitted symbol
$E_0(\cdot, \cdot)$  Gallager function
$\mathrm{erfc}(\cdot)$  complementary error function
$f_0$  carrier frequency
$f_d$  Doppler frequency
$g_c$  coding gain of space–time coding schemes
$g_d$  diversity gain of space–time coding schemes
$G_p$  processing gain of CDMA system
$g_R(t)$  impulse shaping filter at receiver
$G_R(j\omega)$  spectrum of impulse shaping filter at receiver
$g_T(t)$  impulse shaping filter at transmitter
$G_T(j\omega)$  spectrum of impulse shaping filter at transmitter
$g_\mu(D)$  µth generator polynomial of convolutional or spreading code
$\gamma[k]$  signal-to-noise ratio at receive antenna and time instant k
$\bar{\gamma}$  average signal-to-noise ratio
$\mathbf{H}[k]$  matrix of MIMO channel at time instant k
$\underline{\mathbf{H}}$  extended matrix of MIMO channel
$\mathbf{H}_{red}$  reduced matrix of MIMO channel
$\mathbf{H}[k, \mu]$  submatrix of frequency selective MIMO channel corresponding to delay µ at time instant k
$h_{\rho,\xi}[k, \mu]$  channel coefficient between transmitter ξ and receiver ρ corresponding to delay µ at time instant k
$\eta$  spectral efficiency of CDMA system
$\mathrm{Im}\{\cdot\}$  imaginary part of a complex number
$I(X_\mu)$  information of event or symbol $X_\mu$
$\bar{I}(\mathcal{X})$  entropy of process X
$\bar{I}(\mathcal{X}, \mathcal{Y})$  joint entropy of processes X and Y
$\bar{I}(\mathcal{X}; \mathcal{Y})$  mutual information between processes X and Y
$I_0(\cdot)$  zero-th order modified Bessel function of first kind
$\bar{I}(\mathcal{X} \mid \mathcal{Y})$  uncertainty of X if Y is totally known
$\bar{I}_{diff}(\mathcal{X})$  differential entropy of process X
$K$  Rice factor: power ratio between line-of-sight and scattered components
$\kappa$  discrete delay
$L(x)$  log-likelihood ratio of variable x
$L_a(x)$  a priori log-likelihood ratio of variable x
$L_e(x)$  extrinsic log-likelihood ratio of variable x
$L_{\mathbf{a}}$  length of vector a
$L_c$  constraint length of convolutional code
$L_p$  period of puncturing
$L_\pi$  length of interleaver
$L_t$  length of channel impulse response $h_u[k]$
$\mathbf{L}$  lower triangular matrix from QL decomposition
$\lambda_\mu$  µth eigenvalue of a square matrix
$\boldsymbol{\Lambda}$  diagonal matrix containing eigenvalues $\lambda_\mu$
$M_X(s)$  moment generating function of process X
$m$  number of bits per symbol
$M$  size of modulation alphabet
$n_\nu[k]$  additive noise at receive antenna ν and time instant k
$N_I$  number of inputs of MIMO system
$N_O$  number of outputs of MIMO system
$N_R$  number of receive antennas
$N_s$  spreading factor of CDMA system
$N_T$  number of transmit antennas
$N_u$  number of active users
$N_0$  power spectral density of complex noise in equivalent baseband
$\omega_0$  angular carrier frequency
$p_X(x)$  probability density function of process X
$\Pi_u$  interleaver for user u
$\mathbf{P}$  n × $L_p$ puncturing matrix
$P_b$  bit error probability
$P_d$  pairwise error probability for code words with Hamming distance d
$P_s$  symbol error probability
$P_t$  target error probability
$P_w$  word error probability (probability of decoding or detection failure)
$P_{out}$  outage probability for fading channels
$\Phi_{XX}(f)$  power spectral density of process X
$\phi_{HH}(\Delta t, \tau)$  autocorrelation function of stochastic channel H
$\Phi_{HH}(f_D, \tau)$  scattering function of stochastic channel H
$\Phi_{HH}(f_D)$  Doppler power spectrum of stochastic channel H
$\Phi_{HH}(\tau)$  power delay profile of stochastic channel H
$\boldsymbol{\Phi}_{XX}$  covariance matrix of multivariate process X
$\Pr\{\cdot\}$  probability of certain event
$Q(\cdot)$  quantization function
$\mathbf{Q}$  unitary matrix from QL decomposition
$R_0$  cutoff rate
$R_c$  overall code rate
$R_c^{cc}$  code rate of convolutional code
$R_c^{rc}$  code rate of repetition code
$R_c^{wh}$  code rate of Walsh-Hadamard code
$r[\ell]$  matched filter output at time instance $\ell$
$\mathrm{Re}\{\cdot\}$  real part of a complex number
$s[k, \ell]$  signature at time instance $\ell$ for kth symbol
$\mathbf{S}[\ell]$  system matrix of CDMA system at time instance $\ell$
$\sigma_N^2$  average power of noise
$\sigma_\mu$  singular value of a certain matrix
$\boldsymbol{\Sigma}_S$  diagonal matrix containing singular values $\sigma_\mu$ of matrix S
$T_b$  duration of one information bit
$T_c$  duration of one chip in spread spectrum systems
$T_s$  symbol duration
$t_c$  coherence time of channel
$\tau_{\max}$  maximal delay of channel impulse response
$\mathbf{U}_S$  unitary matrix whose columns are the eigenvectors of $\mathbf{S}\mathbf{S}^H$
$\mathbf{V}_S$  unitary matrix whose columns are the eigenvectors of $\mathbf{S}^H\mathbf{S}$
$w_H(\mathbf{a})$  Hamming weight of vector a
$\mathbf{W}_{ZF}$  ZF filter matrix (decorrelator)
$\mathbf{W}_{MMSE}$  MMSE filter matrix
$x_{BP}(t)$  time-continuous transmitted signal in bandpass domain
$x_u[k]$  transmitted symbol of user u at time instant k
$\mathbf{x}_u$  transmitted signal vector of user u
$X$  (discrete) alphabet of transmitter consisting of symbols $X_\mu$
$y_\nu[k]$  received symbol at antenna ν and time instant k
$\mathbf{y}_u$  received signal vector of user u
$Y$  alphabet at receiver consisting of symbols $Y_\nu$
$*$  convolution
$\det(\mathbf{A})$  determinant of matrix A
$x'$  real part of signal x
$x''$  imaginary part of signal x
$x^*$  complex conjugate of signal x
$\mathbf{x}^H$  Hermitian of a vector x
$\hat{x}$  estimate of signal x
$\mathbf{x}^T$  transpose of a vector x
$\|\mathbf{x}\|^2$  norm of vector x (sum of squared magnitudes of elements)
$\|\mathbf{X}\|_F$  Frobenius norm of matrix X (see Appendix 1)
$\mathbf{X}^\dagger$  Moore-Penrose pseudo inverse of matrix X
$\mathcal{X}$  scalar processes are written by calligraphic letters
$\underline{\mathcal{X}}$  multivariate processes are written by underlined calligraphic letters
1 Introduction to Digital Communications
This first chapter introduces some of the basics of digital communications that are needed in subsequent chapters. Section 1.1 starts with a brief introduction to fundamental multiple access techniques and the general structure of communication systems. Some of the most important parts of these systems are discussed in more detail later in the chapter. Section 1.2 addresses the mobile radio channel, its statistical properties, and the way it is modeled. Sections 1.3 and 1.4 present some analysis of signal detection and the theoretical performance of linear schemes for different transmission channels. Finally, Section 1.5 explains the principle of diversity and delivers basic results for outage and ergodic error probabilities.
1.1 Basic System Model
1.1.1 Introduction
Vector channels, or multiple-input multiple-output (MIMO) channels, represent a very general description for a wide range of applications. They incorporate SISO (Single-Input Single-Output), MISO (Multiple-Input Single-Output) and SIMO (Single-Input Multiple-Output) channels as special cases. Often, MIMO channels are associated only with multiple antenna systems. However, they are not restricted to this case but can be used in a much broader context, for example, for any kind of multiuser communication. Therefore, the aim of this work is to study MIMO systems for two specific examples, namely Code Division Multiple Access (CDMA) and multiple antenna systems. Besides a unified description using vector notation, detection algorithms are derived that make the similarity of both systems obvious. Figure 1.1 illustrates the considered scenario in a very abstract form.
Figure 1.1 Principle of multiple access to a common channel
Generally, a common channel that may represent a single cell in a cellular network is accessed by $N_I$ inputs and $N_O$ outputs. In this context, the term channel is not limited to the physical transmission
medium, that is, the radio channel, but has a more general meaning and also incorporates parts of a digital communication system. The boundary between the transmitter and the receiver on the one side, and the channel on the other side, is not strict and depends on the considered scenario. Detailed information about specific descriptions can be found in subsequent chapters.
In principle, single-user and multiuser communications are distinguished. In the single-user case, the multiple inputs and outputs of a vector channel may correspond to different transmit and receive antennas, carrier frequencies, or time slots. Because the data stem from a single user, intelligent signaling can be performed at the transmitter, for example, to enable efficient receiver implementations. In Chapter 6, multiple antenna systems will be used in several ways, depending on the channel characteristics and the system requirements. Multiple antennas can be employed for increasing the system's diversity degree (Alamouti 1998; Naguib et al. 1998; Seshadri and Winters 1994; Tarokh et al. 1999a) and, therefore, enhance the link performance. The link reliability can also be improved by beamforming, which increases the signal-to-noise ratio (SNR). Alternatively, several data streams can be multiplexed over spatially separated channels in order to multiply the data rate without increasing bandwidth (Foschini 1996; Foschini and Gans 1998; Golden et al. 1998).
In contrast, the $N_I$ inputs and $N_O$ outputs may correspond to $N_u$ independent user signals in the multiuser case.¹ With reference to conventional mobile radio communications, a central base station coordinates the transmissions of uplink and downlink, that is, from mobile subscribers to the base station and vice versa, respectively. In the downlink, a synchronous transmission can be easily established because all the signals originate from the same transmitter.
Moreover, knowledge about interactions between different users and their propagation conditions can be exploited at the base station. This allows sophisticated signaling in order to avoid mutual interference and to distribute the total transmit power onto different users efficiently. In many cases, intelligent signaling at the base station goes hand in hand with low-complexity receivers, which is very important for mobile terminals with limited battery power. However, optimal signaling strategies for the downlink are known only for a few special cases and are still the subject of intensive research.
Establishing synchronous transmissions in the uplink requires considerable effort because mobile subscribers transmit more or less independently. They only communicate with the base station and not among themselves. In most practical cases, they have no information about their influence on other users so that mutual interference cannot be avoided prior to transmission. Hence, sophisticated detection algorithms have to be deployed at the base station to compensate for this drawback.
¹ In multiuser MIMO scenarios, transmitters and receivers are equipped with multiple antennas. These systems are a part of current research and possess an extremely high degree of freedom. They will be briefly introduced in Chapter 6.
1.1.2 Multiple Access Techniques
Looking at the transmission of multiple data streams sharing a common medium, their separation is managed by multiplexing techniques in single-user scenarios or multiple access techniques in multiuser communications. In order to ensure reliable communication, many systems try to avoid interference by choosing orthogonal access schemes so that no multiple access interference (MAI) disturbs the transmission. However, in most cases, orthogonality cannot be maintained due to the influence of the mobile radio channel. The next subsections introduce the most important multiplexing and multiple access strategies.
Time Division Multiplexing (TDM) and Multiple Access (TDMA)
This relatively common multiple access technique divides the time axis into different time slots, each of length $T_{slot}$, according to Figure 1.2. Each data packet or burst is assigned to a certain slot, whereby a user can also occupy several slots. A defined number $N_{slot}$ of slots form a frame that is periodically repeated. Hence, each user has periodic access to the shared medium. Due to the influence of the transmission channel (cf. Section 1.2) and the restrictions of practical filter design, guard intervals of length $\Delta T$ have to be inserted between successive slots in order to avoid interference between them. Within these intervals, no information is transmitted, so they represent redundancy and reduce the spectral efficiency of the communication system.
Figure 1.2 Principle of time division multiple access
Figure 1.3 Principle of frequency division multiple access
Frequency Division Multiplexing (FDM) and Multiple Access (FDMA)
Alternatively, the frequency axis can be divided into $N_f$ subbands, each of width $B$, as illustrated in Figure 1.3. The data streams are now distributed on different frequency bands,
rather than on different time slots. However, in mobile environments, the signals' bandwidths are spread by the Doppler effect, so that neighboring subbands interfere. Thus, frequency division multiple access (FDMA) requires gaps of an appropriate width $\Delta f$ between the subbands, which combat this effect at the expense of a reduced spectral efficiency.
Code Division Multiplexing (CDM) and Multiple Access (CDMA)
In contrast to both the preceding schemes, CDMA allows simultaneous access to the channel in the same frequency range. The basic principle is to spectrally spread the data streams with specific sequences called spreading codes (spread spectrum technique). The signals can be distinguished by assigning them individual spreading codes. This opens a third dimension, as can be seen in Figure 1.4. An intuitive choice is orthogonal codes, which allow the parallel transmission of different user signals without mutual interference. However, the transmission channel generally destroys the orthogonality, and multiuser interference (MUI) becomes a limiting factor concerning spectral efficiency (cf. Chapters 4 and 5).
Figure 1.4 Principle of code and space division multiple access
Figure 1.5 Principle of space division multiple access
Space Division Multiplexing (SDM) and Multiple Access (SDMA)
The fourth access scheme exploits the resource space (Figure 1.4 and Figure 1.5). Spatially separated data streams can simultaneously access the channel in the same frequency band,
provided that the locations of transmit and receive antennas are appropriately chosen. In mobile environments, this requirement is sometimes difficult to fulfill because users change their position during the connection. Therefore, quasi-static scenarios or combinations with the aforementioned access techniques are often considered. Mutual interference is also likely to occur in space division multiple access (SDMA) systems because the transmitter and the receiver lack the perfect channel knowledge that would be necessary to avoid interference completely.
As expected, all the mentioned access schemes can be combined. The well-known Global System for Mobile Communications (GSM) and Digital Cellular System (DCS) 1800 standards both combine time division multiple access (TDMA) and FDMA. In UMTS (Universal Mobile Telecommunications System) or IMT-2000 (International Mobile Telecommunications) systems, CDMA is used in connection with TDMA and FDMA (Dahlman et al. 1998; Ojanperä and Prasad 1998b; Toskala et al. 1998). While TDMA, FDMA, and CDMA have been in use for a fairly long time, SDMA is rather recent in comparison. This development is a result of the demand to use licenses that are assigned to certain frequency bands as efficiently as possible. Hence, all the resources have to be exploited to reach this goal.
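The role of orthogonal spreading codes in CDMA can be illustrated with a small sketch. It assumes Walsh-Hadamard codes of length 4, one BPSK symbol per user, and an ideal noise-free channel that simply adds the chip streams; the function names and the data are hypothetical and chosen only for illustration.

```python
def walsh_hadamard(n):
    """Return the n x n Walsh-Hadamard matrix (entries +1/-1); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    rows = [[1]]
    while len(rows) < n:
        # Sylvester construction: [[H, H], [H, -H]]
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def spread(symbol, code):
    """Multiply one symbol by every chip of the spreading code."""
    return [symbol * chip for chip in code]


def despread(chips, code):
    """Correlate the received chips with one code and normalize by the code length."""
    return sum(r * c for r, c in zip(chips, code)) / len(code)


codes = walsh_hadamard(4)
symbols = [1, -1, -1, 1]  # hypothetical BPSK symbols, one per user

# Ideal channel (assumption): the chip streams of all users simply add up.
user_chips = [spread(s, c) for s, c in zip(symbols, codes)]
received = [sum(chips) for chips in zip(*user_chips)]

# Because the codes are orthogonal, each correlation returns one user's symbol.
recovered = [despread(received, c) for c in codes]
```

Under these assumptions `recovered` equals the transmitted symbols. As the text notes, a real transmission channel generally destroys this orthogonality, so the correlations would then also contain contributions from the other users.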
1.1.3 Principle Structure of SISO Systems
Since the behavior and the properties of a MIMO system depend strongly on the characteristics of the underlying SISO systems between each pair of transmit and receive antennas, this subsection describes their principle structure. Figure 1.6 shows a simplified time-discrete block diagram of a SISO communication link. The time-discreteness is expressed by square brackets, and the time indices $i$, $\ell$, and $k$ indicate different symbol rates of the corresponding signals. Neglecting many fundamental components of practical systems, such as source coding and analog-to-digital conversion, the transmitter consists of three blocks that are of special interest here: a forward error correction (FEC) encoder, an interleaver, and a signal mapper. Due to our focus on digital communications, the inputs and the outputs of the FEC encoder and the interleaver are binary, while the output of the signal mapper depends on the type of modulation and can take on symbols out of an $M$-ary alphabet X. Conventionally, $M = 2^m$ is a power of two. The receiver comprises the corresponding counterparts of the above-mentioned blocks in reverse order.
Figure 1.6 Principle structure of digital communication systems (signals $d[i]$, $b[\ell]$, $x[k]$, $y[k]$, $r[\ell]$, $\hat{d}[i]$)
The first
block performs some kind of signal processing (SP), depending on the channel, and the demapping. It is followed by a de-interleaver $\Pi^{-1}$ and an FEC decoder delivering estimates $\hat{d}[i]$ of the transmitted information bits.
All the blocks mentioned thus far will be described in more detail later. However, some remarks on the time-discrete channel depicted in Figure 1.6 are necessary at this point. In order to simplify the description and to concentrate on the main focus of this work, all time-continuous components of the modulator and the demodulator are declared as parts of a time-discrete channel model generally described in the equivalent baseband (cf. Section 1.2). Therefore, the only parts of the modulator and the demodulator that appear separately in Figure 1.6 are the signal mapper and the demapper. These assumptions coincide with a citation of Massey (1984): ‘the purpose of the modulation system is to create a good discrete channel from the modulator input to the demodulator output, and the purpose of the coding system is to transmit the information bits reliably through this discrete channel at the highest practicable rate.’ However, it is not always easy to strictly separate both devices (e.g. for coded modulation (Biglieri et al. 1991; Ungerboeck 1982)).
Interleaving
Interleaving plays an important role in many digital communication systems for manifold reasons. In the context of mobile radio communications, fading channels often lead to bursty errors, that is, several successive symbols may be corrupted by deep fades. This has a crucial impact on the decoding performance, for example, of convolutional codes because of their sensitivity to bursty errors (compare the decoding of convolutional codes in Chapter 3). In order to overcome this difficulty, interleaving is applied. At the transmitter, an interleaver simply permutes the data stream $b[\ell]$ in a specified manner, so that the symbols are transmitted in a different order. 
Consequently, a de-interleaver has to be employed at the receiver in order to restore the original order of the symbols. Moreover, we will see in Chapter 3 that interleaving is also employed in concatenated coding schemes.
Figure 1.7 Structure of block interleaver of length $L_\pi = 12$ (the input $b[\ell]$ is written column by column, the output $\tilde{b}[\ell]$ is read row by row)
b[0] b[3] b[6] b[9]
b[1] b[4] b[7] b[10]
b[2] b[5] b[8] b[11]
Block interleaver
There are several types of interleaving. The simplest one is termed block interleaver, which divides a sequence into blocks of length $L_\pi$. The symbols $b[\ell]$ within each block are then permuted by writing them column-wise into an array consisting of $L_{row}$ rows and $L_{col}$ columns, and reading them row by row. An example with $L_{row} = 3$ and $L_{col} = 4$ is shown in Figure 1.7. Interleaving turns the input sequence b[0], b[1], ..., b[11] into b[0], b[3], b[6], b[9], b[1], b[4], b[7], b[10], b[2], b[5], b[8], b[11]. Originally successive symbols are now separated by a spacing of $L_I = 4$. This gap is called the interleaving depth. The optimum number of rows and columns and, therefore, the interleaving depth depend on several factors that are discussed in subsequent chapters.
Convolutional interleaving
For the sake of completeness, convolutional interleaving should be mentioned here. It provides the same interleaving depth as block interleaving, but with lower delay and less memory. However, since this interleaver is not addressed later in the chapter, further details are not discussed and the reader is instead referred to (Viterbi and Omura 1979).
Random interleaving
The application of block interleaving in concatenated coding schemes generally leads to a weak performance. Due to the regular structure of the interleaver, it may happen that the temporal distance between pairs of symbols is not changed by interleaving, resulting in poor distance properties of the entire code (cf. Section 3.6). Therefore, random or pseudo-random interleavers are often applied in this context. Pseudo-random interleavers can be generated by calculating row and column indices with modulo arithmetic. For concatenated coding schemes, interleavers are optimized with respect to the constituent codes.
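The block interleaver described above (write column-wise, read row by row) can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that exactly one block of $L_{row} \cdot L_{col}$ symbols is processed at a time.

```python
def block_interleave(block, n_rows, n_cols):
    """Write the block column-wise into an n_rows x n_cols array, read it row by row."""
    if len(block) != n_rows * n_cols:
        raise ValueError("block length must equal n_rows * n_cols")
    # Written column-wise, element (row r, column c) holds input index c * n_rows + r.
    return [block[c * n_rows + r] for r in range(n_rows) for c in range(n_cols)]


def block_deinterleave(block, n_rows, n_cols):
    """Invert block_interleave: put every symbol back to its original position."""
    if len(block) != n_rows * n_cols:
        raise ValueError("block length must equal n_rows * n_cols")
    out = [None] * len(block)
    for j, value in enumerate(block):
        r, c = divmod(j, n_cols)  # position in the row-wise read order
        out[c * n_rows + r] = value
    return out


indices = list(range(12))                      # stands for b[0], ..., b[11]
interleaved = block_interleave(indices, 3, 4)  # L_row = 3, L_col = 4
restored = block_deinterleave(interleaved, 3, 4)

# Three successive transmitted symbols (e.g. hit by one deep fade) ...
burst = interleaved[4:7]
```

With these parameters `interleaved` reproduces the sequence given in the text, and `burst` contains the original indices 1, 4 and 7, so a short error burst on the channel is spread over non-adjacent positions after de-interleaving.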
Interleaving delay
A tight restriction on the total size of interleavers may occur for delay-sensitive applications such as full duplex speech transmission. Here, delays of only around 10 ms are tolerable. Since the interleaver first has to be written completely before it can be read out, its size $L_\pi$ directly determines the delay $\Delta t = L_\pi \cdot T_s$.
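As a small worked example of the relation between interleaver size and delay, the following sketch computes the delay of a given interleaver and the largest size that respects a delay budget. The symbol duration of 0.1 ms is an assumed value for illustration only; exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction


def interleaver_delay(length, symbol_duration):
    """Delay caused by filling a block interleaver: length times symbol duration."""
    return length * symbol_duration


def max_interleaver_length(max_delay, symbol_duration):
    """Largest interleaver length whose delay does not exceed max_delay."""
    return int(max_delay // symbol_duration)


ts = Fraction(1, 10000)        # assumed symbol duration: 0.1 ms
budget = Fraction(10, 1000)    # tolerable delay of 10 ms, as quoted in the text

largest = max_interleaver_length(budget, ts)
delay = interleaver_delay(largest, ts)
```

Under these assumptions the interleaver may hold at most 100 symbols, which exactly uses up the 10 ms budget.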
1.2 Characteristics of Mobile Radio Channels
1.2.1 Equivalent Baseband Representation
Wireless channels for mobile radio communications are challenging media that require careful system design for reliable transmission. As SISO channels, they represent an important building block of vector channels. Therefore, this section describes their time-discrete, equivalent baseband representation in more detail. Using a representation in the equivalent baseband is beneficial for simulation purposes, because the carrier, whose frequency is generally much higher than the signal bandwidth, need not be considered explicitly. Figure 1.8 depicts the entire channel model that comprises all time-continuous analog components, including those from the transmitter and the receiver. The whole structure describes a time-discrete model, whose input $x[k]$ is a sequence of generally complex-valued symbols of duration $T_s$ according to some finite symbol alphabet X. The output sequence $y[k]$ typically has the same rate $1/T_s$ and its symbols are distributed within the complex plane $\mathbb{C}$. The input $x[k]$ is first transformed by the transmit filter $g_T(t)$ of bandwidth $B$ into a time-continuous, band-limited signal
$$x(t) = T_s \cdot \sum_k x[k] \cdot g_T(t - kT_s) \tag{1.1}$$
called complex envelope. Denoting the symbols of the alphabet X by $X_\mu$ and assuming that the energy of the transmit filter impulse response is normalized to $\int_{-\infty}^{\infty} |g_T(t)|^2 \, dt = T_s^{-1}$, a single symbol $T_s \cdot x[k] \cdot g_T(t - kT_s)$ has the average energy
$$E_s = T_s^2 \cdot E\{|X_\mu|^2\} \cdot \int_{-\infty}^{\infty} |g_T(t)|^2 \, dt = T_s \cdot E\{|X_\mu|^2\} \tag{1.2}$$
resulting in an average power of $\sigma_X^2 = E_s/T_s = E\{|X_\mu|^2\}$.² For zero-mean and independent identically distributed (i.i.d.) symbols $x[k]$, the average spectral density of $x(t)$ is (Kammeyer 2004; Kammeyer and Kühn 2001; Proakis 2001)
$$\Phi_{XX}(j\omega) = T_s \cdot |G_T(j\omega)|^2 \cdot E\{|X_\mu|^2\} = E_s \cdot |G_T(j\omega)|^2. \tag{1.3}$$
Figure 1.8 Structure of the time-discrete, equivalent baseband representation of a mobile radio channel
Figure 1.9 Power spectral densities for a) complex envelope, b) bandpass signal and c) transmission channel for a rectangular shape of $G_T(j\omega)$
Obviously, it depends largely on the spectral shape of the transmit filter $g_T(t)$, and not on the kind of modulation scheme. Proceeding toward transmission, the real-valued bandpass signal
$$x_{BP}(t) = \sqrt{2} \cdot \mathrm{Re}\{x(t) \cdot e^{j\omega_0 t}\} = \sqrt{2} \cdot [x'(t)\cos(\omega_0 t) - x''(t)\sin(\omega_0 t)] \tag{1.4}$$
is obtained by shifting $x(t)$ into the bandpass region with the angular carrier frequency $\omega_0 = 2\pi f_0$ and taking the real part. The factor $\sqrt{2}$ in (1.4) keeps the signal power and the symbol energy constant during modulation. The average spectral density of $x_{BP}(t)$ has the form
$$\Phi_{X_{BP}X_{BP}}(j\omega) = \frac{E_s}{2} \cdot [|G_T(j\omega - j\omega_0)|^2 + |G_T(j\omega + j\omega_0)|^2]. \tag{1.5}$$
Besides the shift to $\pm\omega_0$, it differs from $\Phi_{XX}(j\omega)$ by the factor 1/2 due to the total transmit power constraint. Figure 1.9 sketches the spectral densities for a rectangular shape of $G_T(j\omega)$ with $B = 1/(2T_s)$. Now, $x_{BP}(t)$ is transmitted over the mobile radio channel, which is generally represented by its time-variant impulse response $h_{BP}(t, \tau)$ and an additive noise term $n_{BP}(t)$ with spectral density $N_0/2$:
$$y_{BP}(t) = h_{BP}(t, \tau) * x_{BP}(t) + n_{BP}(t). \tag{1.6}$$
The convolution in (1.6) is defined by
$$h_{BP}(t, \tau) * x_{BP}(t) = \int_0^{\infty} h_{BP}(t, \tau)\, x_{BP}(t - \tau)\, d\tau. \tag{1.7}$$
² The impulse response $g_T(t)$ itself has the dimension $s^{-1}$ because its spectrum $G_T(j\omega) = \mathcal{F}\{g_T(t)\}$ has no dimension.
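The statement that the factor $\sqrt{2}$ in (1.4) keeps the signal power constant can be checked numerically. The sketch below assumes a constant complex envelope, an arbitrary carrier frequency and sampling rate, and averaging over an integer number of carrier periods; all numerical values and function names are hypothetical.

```python
import cmath
import math


def bandpass_sample(x, f0, t):
    """Bandpass sample sqrt(2) * Re{x * exp(j*2*pi*f0*t)} for a complex envelope value x."""
    return math.sqrt(2.0) * (x * cmath.exp(2j * math.pi * f0 * t)).real


def bandpass_sample_iq(x, f0, t):
    """Same sample written with the real part x' and imaginary part x'' of the envelope."""
    w0t = 2.0 * math.pi * f0 * t
    return math.sqrt(2.0) * (x.real * math.cos(w0t) - x.imag * math.sin(w0t))


x = 0.6 + 0.8j                   # assumed constant complex envelope with |x|^2 = 1
f0, fs, n = 10.0, 1000.0, 1000   # assumed carrier, sampling rate, 10 full carrier periods

samples = [bandpass_sample(x, f0, k / fs) for k in range(n)]
power_bandpass = sum(s * s for s in samples) / n
power_baseband = abs(x) ** 2
```

Under these assumptions `power_bandpass` agrees with `power_baseband` up to rounding, and both forms of (1.4) give the same samples. Without the factor $\sqrt{2}$, the bandpass power would be only half of the baseband power.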
Figure 1.10 Power spectral densities for a) received bandpass signal, b) analytical signal and c) received complex envelope
In order to express $y_{BP}(t)$ in the equivalent baseband, negative frequencies are eliminated with the Hilbert transformation (Kammeyer 2004; Proakis 2001)
$$\mathcal{H}\{a(t)\} = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{a(\tau)}{t - \tau}\, d\tau \quad \text{with} \quad \mathcal{F}\{\mathcal{H}\{a(t)\}\} = -j\, \mathrm{sgn}(\omega)\, A(j\omega). \tag{1.8}$$
Therefore, adding $j\mathcal{H}\{y_{BP}(t)\}$ to the received signal $y_{BP}(t)$ yields the complex analytical signal
$$y_{BP}^+(t) = y_{BP}(t) + j\mathcal{H}\{y_{BP}(t)\} \quad \text{with} \quad Y_{BP}^+(j\omega) = \begin{cases} 2Y_{BP}(j\omega) & \text{for } \omega > 0 \\ 0 & \text{else} \end{cases} \tag{1.9}$$
whose spectrum vanishes for negative frequencies. However, for positive frequencies, the spectrum is doubled or, equivalently, the spectral power density is quadrupled (Figure 1.10). According to Figure 1.8, $y_{BP}^+(t)$ is shifted back into the baseband and weighted by a factor $1/\sqrt{2}$ in order to keep the average power constant. With reference to the background noise, this leads to a spectral density of $N_0$. As shown in Appendix A.1, the output of the receive filter $g_R(t)$ in Figure 1.8 has the form
$$y(t) = g_R(t) * h(t, \tau) * x(t) + n(t) = \sum_k x[k] \cdot \tilde{h}(t, kT_s) + n(t) \tag{1.10}$$
where $h(t, \tau) = \frac{1}{2} \cdot h_{BP}^+(t, \tau)\, e^{-j\omega_0 t}$ denotes the equivalent baseband representation of the channel impulse response and $n(t) = g_R(t) * (n^+(t)\, e^{-j\omega_0 t}/\sqrt{2})$ the filtered background noise. The filter $\tilde{h}(t, kT_s)$ comprises the transmit and receive filters as well as the channel impulse response, and represents the response of the time-discrete channel to an impulse transmitted at time instant $kT_s$.³
³ Note that the second parameter of $\tilde{h}(t, kT_s)$ does not represent delay $\tau$, but the transmission time $kT_s$.
The optimum receive filter $g_R(t)$ that maximizes the SNR at its sampled output has to be matched to the concatenation of the channel impulse response and the transmit filter (Forney 1972; Kammeyer 2004), that is, $g_R(t) = f^*(-t)$ with $f(t) = g_T(t) * h(t, \tau)$ holds. In order to avoid interference between successive symbols, the transmit and receive filters are generally chosen such that their convolution fulfills the first Nyquist criterion (Nyquist 1928; Proakis 2001). This criterion also ensures that the filtered and sampled noise remains
white, and a symbol-wise detection is still optimum. However, even if $g_T(t) * g_R(t)$ fulfills the first Nyquist criterion, the channel impulse response $h(t, \tau)$ between them destroys this property and the background noise $n(t)$ is colored. Therefore, a prewhitening filter $g_W[k]$ working at the sampling rate $1/T_s$ and decorrelating the noise samples $n(t)|_{t=kT_s}$ is required.⁴ Finally, the time-discrete equivalent baseband channel delivers a complex-valued signal $y[k] = g_W[k] * y(t)|_{t=kT_s}$ by sampling $y(t)$ at rate $1/T_s$ and filtering it with $g_W[k]$. Throughout this work, $g_T(t)$ is assumed to be a perfect lowpass filter of bandwidth $B = 1/(2T_s)$. With $g_R(t)$ matched to $h(t, \tau) * g_T(t)$ and a perfect prewhitening filter, the received signal $y[k]$ has the form
$$y[k] = \sum_{\kappa=0}^{L_t} h[k, \kappa] \cdot x[k - \kappa] + n[k] \tag{1.11}$$
where Lt denotes the total filter length of the time-discrete channel model h[k, κ] working at rate 1/Ts and n[k] is termed Additive White Gaussian Noise (AWGN). It is described in more detail in the next subsection, followed by a description of the frequency-selective fading channel.
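The time-discrete channel model (1.11) can be written down almost literally. The sketch below is a minimal illustration under stated assumptions: the channel taps are supplied by a hypothetical callable `taps_at(k)`, transmit symbols before $k = 0$ are taken as zero, and the noise is complex Gaussian with its power split equally between real and imaginary part.

```python
import math
import random


def tapped_delay_line(x, taps_at, noise_power=0.0, seed=0):
    """Sketch of (1.11): y[k] = sum over kappa of h[k, kappa] * x[k - kappa], plus n[k].

    x           -- list of (complex) transmit symbols x[k]
    taps_at     -- hypothetical callable: k -> list of taps [h[k, 0], h[k, 1], ...]
    noise_power -- total variance of the complex noise samples n[k]
    """
    rng = random.Random(seed)
    sigma = math.sqrt(noise_power / 2.0)  # standard deviation per real dimension
    y = []
    for k in range(len(x)):
        acc = 0j
        for kappa, h in enumerate(taps_at(k)):
            if k - kappa >= 0:  # symbols before k = 0 are assumed to be zero
                acc += h * x[k - kappa]
        acc += complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        y.append(acc)
    return y


def static_taps(k):
    """Assumed time-invariant two-tap channel, used only for this example."""
    return [1.0, 0.5]


y_noise_free = tapped_delay_line([1, 0, 0, 1j], static_taps)
y_noisy = tapped_delay_line([1, 0, 0, 1j], static_taps, noise_power=0.1)
```

In the noise-free case the first symbol reappears, weighted by 0.5, one symbol period later, which is the intersymbol interference caused by the second tap. A time-variant channel would be modeled by letting `taps_at` return different coefficients for different `k`.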
1.2.2 Additive White Gaussian Noise
Every data transmission is disturbed by noise stemming from thermal noise, noise of electronic devices, and other sources. Due to the superposition of many statistically independent processes at the receive antenna, the noise $n_{BP}(t)$ is generally modeled as white and Gaussian distributed. The attribute white describes the flat spectral density, which corresponds to uncorrelated successive samples in the time domain. For Gaussian distributed samples, this is equivalent to statistical independence. A model reflecting this behavior is the AWGN channel. As mentioned in the last section, its two-sided spectral power density $N_0/2$ results in infinite power due to the infinite bandwidth. Therefore, this model only gains practical relevance with a bandwidth limitation, for example, by filtering with $g_R(t)$.
In this subsection, the channel is assumed to be frequency-nonselective and time-invariant so that $h(t, \tau) = \delta(\tau)$ holds and the transmit and receive filters are perfect lowpass filters (cf. Figures 1.9 and 1.11b). They fulfill the first Nyquist condition (Nyquist 1928), that is, their spectra are symmetric with respect to the Nyquist frequency $f_N = 1/(2T_s)$.⁵ Therefore, the sampled equivalent baseband noise $n[k] = n(t)|_{t=kT_s}$ remains white (cf. Figure 1.11) (Kammeyer 2004) and has a spectral density of $N_0$ (cf. (1.9) and Figure 1.10).
Figure 1.11 a) Model and b) spectral density of AWGN channel in the equivalent baseband representation
⁴ In practice, the receive filter $g_R(t)$ is only matched to $g_T(t)$ due to lower implementation costs and imperfect knowledge of the channel impulse response.
⁵ For perfect lowpass filters, $B = f_N = 1/(2T_s)$ holds, that is, $2BT_s$ symbols can be transmitted per channel usage.
$N_0$ is equally distributed onto the real part $n'[k]$ and the imaginary part $n''[k]$, each with a density of $N_0/2$. They are independent of each other, resulting in the joint density
$$p_N(n) = p_{N'}(n') \cdot p_{N''}(n'') = \frac{1}{\sqrt{2\pi\sigma_{N'}^2}}\, e^{-\frac{n'^2}{2\sigma_{N'}^2}} \cdot \frac{1}{\sqrt{2\pi\sigma_{N''}^2}}\, e^{-\frac{n''^2}{2\sigma_{N''}^2}} = \frac{1}{\pi\sigma_N^2}\, e^{-\frac{|n|^2}{\sigma_N^2}} \tag{1.12}$$
of the complex baseband noise. The power of $n(t)$ and, therefore, $n[k]$ is
$$\sigma_N^2 = 2B \cdot N_0 = \frac{N_0}{T_s} = \sigma_{N'}^2 + \sigma_{N''}^2. \tag{1.13}$$
The perfect lowpass filter $g_R(t) = g_T^*(-t)$ with bandwidth $B$ is matched to $g_T(t)$ (Kammeyer 2004; Proakis 2001) and maximizes the SNR at its output. With the signal power $\sigma_X^2 = E_s/T_s$ we obtain the signal-to-noise ratio
$$\mathrm{SNR} = \frac{\sigma_{X_{BP}}^2}{\sigma_{N_{BP}}^2} = \frac{\sigma_X^2}{\sigma_N^2} = \frac{E_s/T_s}{N_0/T_s} = \frac{E_s}{N_0} \tag{1.14}$$
as a characteristic measure of the AWGN channel in the baseband, as well as in the bandpass regime.
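The noise model of this subsection can be sketched as follows: complex noise samples of total power $N_0/T_s$ are generated with independent real and imaginary parts of equal variance, and the SNR follows as $E_s/N_0$. The function names, the seed, the sample count, and the values $E_s = 1$, $N_0 = 0.5$, $T_s = 1$ are assumptions made only for this illustration.

```python
import math
import random


def awgn_samples(count, n0, ts=1.0, seed=0):
    """Complex Gaussian noise samples with total power n0 / ts, split equally
    between independent real and imaginary parts."""
    rng = random.Random(seed)
    sigma_part = math.sqrt(n0 / ts / 2.0)  # standard deviation per real dimension
    return [complex(rng.gauss(0.0, sigma_part), rng.gauss(0.0, sigma_part))
            for _ in range(count)]


def snr_db(es, n0):
    """Signal-to-noise ratio Es/N0 expressed in decibels."""
    return 10.0 * math.log10(es / n0)


noise = awgn_samples(100_000, n0=0.5)
measured_power = sum(abs(v) ** 2 for v in noise) / len(noise)
measured_real_power = sum(v.real ** 2 for v in noise) / len(noise)
snr = snr_db(es=1.0, n0=0.5)
```

With these assumed values the measured noise power should be close to 0.5, about half of it in the real part, and the SNR is roughly 3 dB. The measured powers are estimates and fluctuate with the seed and the number of samples.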
1.2.3 Frequency-Selective Time-Variant Fading
For mobile radio systems, the propagation of radio waves is disturbed by scattering, reflections, and shadowing. Generally, many replicas of the same signal arrive at the receive antenna with different delays, attenuations, and phases. Moreover, the channel is time-variant due to the movements of the transmitter or the receiver. A channel with $N$ propagation paths can be described by its equivalent baseband impulse response
$$h(t, \tau) = \sum_{\nu=0}^{N-1} h(t, \nu) \cdot \delta(\tau - \tau_\nu), \tag{1.15}$$
where $t$ denotes the observation time and $h(t, \nu)$ the complex-valued weighting coefficient corresponding to the $\nu$-th path with delay $\tau_\nu$.
Statistical Characterization
Due to the stochastic nature of mobile radio channels, they are generally classified by their statistical properties. The autocorrelation function
$$\phi_{HH}(\Delta t, \tau) = E\{h^*(t, \tau)\, h(t + \Delta t, \tau)\} \tag{1.16}$$
of $h(t, \tau)$ with respect to $t$ is an appropriate measure for this classification. The faster the channel changes, the faster $\phi_{HH}(\Delta t, \tau)$ vanishes in the direction of $\Delta t$. This relationship can also be expressed in the frequency domain. The Fourier transformation of $\phi_{HH}(\Delta t, \tau)$ with respect to $\Delta t$ yields the scattering function
$$\Phi_{HH}(f_d, \tau) = \mathcal{F}\{\phi_{HH}(\Delta t, \tau)\}. \tag{1.17}$$
The Doppler frequency fd originates from the relative motions between the transmitter and the receiver. Integrating over τ leads to the Doppler power spectrum ∞ HH (fd ) =
HH (fd , τ ) dτ,
(1.18)
0
describing the power distribution with respect to $f_d$. The range over which $\Phi_{HH}(f_d)$ is significantly nonzero is called Doppler bandwidth $B_d$. It represents a measure for the time variance of the channel and its reciprocal
\[
t_c = \frac{1}{B_d} \tag{1.19}
\]
denotes the coherence time. For $t_c \gg T_s$, the channel is slowly fading; for $t_c \ll T_s$, it changes remarkably during the symbol duration $T_s$. In the latter case, it is called time-selective and time diversity (cf. Section 1.4) can be gained when channel coding is applied. Integrating $\Phi_{HH}(f_d, \tau)$ versus $f_d$ instead of $\tau$ delivers the power delay profile
\[
\Phi_{HH}(\tau) = \int_{-f_{d\max}}^{f_{d\max}} \Phi_{HH}(f_d, \tau)\, df_d \tag{1.20}
\]
that describes the power distribution with respect to $\tau$. The coherence bandwidth defined by
\[
B_c = \frac{1}{\tau_{\max}} \tag{1.21}
\]
represents the bandwidth over which the channel is nearly constant. For frequency-selective channels, $B \gg B_c$ holds, that is, the signal bandwidth $B$ is much larger than the coherence bandwidth and the channel behaves differently in different parts of the signal's spectrum. In this case, the maximum delay $\tau_{\max}$ is larger than $T_s$ so that successive symbols overlap, resulting in linear channel distortions called intersymbol interference (ISI). If the coefficients $h[k, \kappa]$ in the time domain are statistically independent, frequency diversity is obtained (cf. Section 1.4). For $B \ll B_c$, the channel is frequency-nonselective, that is, its spectral density is constant within the considered bandwidth (flat fading). Examples for different power delay profiles can be found in Appendix A.2.

Modeling Mobile Radio Channels

Typically, frequency-selective channels are modeled with time-discrete finite impulse response (FIR) filters following the wide sense stationary uncorrelated scattering (WSSUS) approach (Höher 1992; Schulze 1989). According to (1.11), the signal is passed through a tapped-delay-line and weighted at each tap with complex channel coefficients $h[k, \kappa]$ as shown in Figure 1.12.

Figure 1.12 Tapped-delay-line model of frequency-selective channel with $L_t$ taps
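The tapped-delay-line model can be illustrated with a few lines of Python. The following is only a minimal sketch under simplifying assumptions that are not part of the text: the taps are time-invariant (the index $k$ of $h[k, \kappa]$ is dropped), and the function name `tapped_delay_line`, the example taps and the example symbols are hypothetical.

```python
import random


def tapped_delay_line(x, taps, noise_std=0.0, rng=None):
    """Return y[k] = sum_kappa h[kappa] * x[k - kappa] + n[k].

    Assumption for this sketch: the taps are time-invariant.
    """
    rng = rng or random.Random(0)
    # Split the noise power equally between real and imaginary part.
    s = noise_std / 2 ** 0.5
    y = []
    for k in range(len(x) + len(taps) - 1):
        acc = 0j
        for kappa, h in enumerate(taps):
            if 0 <= k - kappa < len(x):
                acc += h * x[k - kappa]
        acc += complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        y.append(acc)
    return y


# Hypothetical example: four BPSK symbols over a channel with L_t = 3 taps.
x = [1, -1, 1, 1]
h = [1.0, 0.5j, 0.25]
y = tapped_delay_line(x, h)  # noise-free, so the ISI is directly visible
```

In the noise-free case each output sample is a weighted sum of up to $L_t$ neighboring symbols, which is exactly the intersymbol interference discussed above.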
Although the coefficients comprise transmit and receive filters, as well as the channel impulse response $h(t, \tau)$ and the prewhitening filter $g_W[k]$ (as stated in Section 1.2.1), they are assumed to be statistically independent (uncorrelated scattering). The length $L_t = \tau_{\max}/T_s$ of the filter depends on the ratio of maximum channel delay $\tau_{\max}$ and symbol duration $T_s$. Thus, the delay axis is divided into equidistant intervals and, for example, the $n_\kappa$ propagation paths falling into the $\kappa$-th interval compose the coefficient
\[
h[k, \kappa] = \sum_{i=0}^{n_\kappa - 1} e^{\,j 2\pi f_{d,i} k T_s + j \varphi_i} \tag{1.22}
\]
with $\varphi_i$ as the initial phase of the $i$-th component. The power distribution among the taps according to the power delay profiles described in Appendix A.2 (Tables A.1 and A.3) can be modeled with the distribution of the delays $\kappa$. The more delays that fall into a certain interval, the higher the power associated with this interval. Alternatively, a constant number of $n$ propagation paths for each tap can be assumed. In this case,
\[
h[k, \kappa] = \rho_\kappa \cdot \sum_{i=0}^{n-1} e^{\,j 2\pi f_{d,i} k T_s + j \varphi_i} \tag{1.23}
\]
holds and the power distribution is taken into account by adjusting the parameters $\rho_\kappa$. The Doppler frequencies $f_{d,i}$ in (1.22) and (1.23) depend on the relative velocity $v$ between the transmitter and the receiver, the speed of light $c_0$ and the carrier frequency $f_0$:
\[
f_d = \frac{v \cdot f_0}{c_0} \cdot \cos\alpha . \tag{1.24}
\]
In (1.24), $\alpha$ represents the angle between the direction of arrival of the examined propagation path and the receiver's movement. Therefore, its distribution also determines that of $f_d$, leading to Table A.2. Maximum and minimum Doppler frequencies occur for $\alpha = 0^\circ$ and $\alpha = 180^\circ$, respectively, and determine the Doppler bandwidth $B_d = 2 f_{d\max}$. The classical Jakes distribution depicted in Figure 1.13
\[
\Phi_{HH}(f_d) =
\begin{cases}
\dfrac{A}{\sqrt{1 - (f_d/f_{d\max})^2}} & |f_d| \le f_{d\max} \\[1ex]
0 & \text{else,}
\end{cases} \tag{1.25}
\]
is obtained for isotropic radiations without line-of-sight (LoS) connection. For preferred directions of arrival, Gaussian distributions with appropriate means and variances are often assumed (cf. Table A.2 for $\tau > 0.5\,\mu\text{s}$).

Figure 1.13 Distribution of Doppler frequencies for isotropic radiations (Jakes spectrum), $p_{f_d}(f_d/f_{d\max})$ versus $f_d/f_{d\max}$

Unless otherwise stated, nondissipative channels are assumed, meaning that $E_H\{\sum_\kappa |h[k, \kappa]|^2\} = 1$ holds. In the following part, we focus on the statistics of a single channel coefficient and, therefore, drop the indices $k$ and $\kappa$. For a large number of propagation paths per tap, real and imaginary parts of $H$ are statistically independent and Gaussian distributed stochastic processes and the magnitude $|H| = \sqrt{H'^2 + H''^2}$ is Rayleigh distributed
\[
p_{|H|}(\xi) =
\begin{cases}
2\xi/\sigma_H^2 \cdot \exp(-\xi^2/\sigma_H^2) & \xi \ge 0 \\
0 & \text{else}
\end{cases} \tag{1.26}
\]
with mean $E_H\{|h|\} = \sqrt{\pi \sigma_H^2}/2$. In (1.26), $\sigma_H^2$ denotes the average power of $H$. The instantaneous power is chi-squared distributed with two degrees of freedom,
\[
p_{|H|^2}(\xi) =
\begin{cases}
1/\sigma_H^2 \cdot \exp(-\xi/\sigma_H^2) & \xi \ge 0 \\
0 & \text{else,}
\end{cases} \tag{1.27}
\]
while the phase is uniformly distributed in $[-\pi, \pi]$.

If a LoS connection exists between the transmitter and the receiver, the total power $P$ of the channel coefficient $h$ is shared among a constant LoS and a Rayleigh fading component with a variance of $\sigma_H^2$. The power ratio between both parts is called Rice factor $K = \sigma_{LoS}^2/\sigma_H^2$. Hence, the LoS component has a power of $\sigma_{LoS}^2 = K \sigma_H^2$ and the channel coefficient becomes
\[
h = \sqrt{\sigma_H^2 \cdot K} + \alpha \tag{1.28}
\]
with total power $P = (1 + K)\sigma_H^2$. The fading process $\alpha$ consists of real and imaginary parts that are statistically independent zero-mean Gaussian processes each with variance $\sigma_H^2/2$.
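As a plausibility check of (1.26) and (1.28), channel coefficients can be drawn at random and their empirical moments compared with the stated values. The sketch below is illustrative only; the function name `rice_coefficient`, the sample size and the seed are assumptions made for this example.

```python
import math
import random


def rice_coefficient(K, sigma_h2=1.0, rng=random):
    """Draw one coefficient h = sqrt(sigma_h2 * K) + alpha as in (1.28)."""
    los = math.sqrt(sigma_h2 * K)          # constant LoS component
    s = math.sqrt(sigma_h2 / 2.0)          # std. deviation per real dimension
    alpha = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    return los + alpha


rng = random.Random(1)
n_samples = 100_000

# Rice fading with K = 3: the total power should approach (1 + K) * sigma_h2.
rice = [rice_coefficient(3.0, 1.0, rng) for _ in range(n_samples)]
rice_power = sum(abs(h) ** 2 for h in rice) / n_samples

# K = 0 gives pure Rayleigh fading: the mean magnitude should approach
# sqrt(pi * sigma_h2) / 2.
rayleigh = [rice_coefficient(0.0, 1.0, rng) for _ in range(n_samples)]
rayleigh_mean = sum(abs(h) for h in rayleigh) / n_samples
```

With these settings `rice_power` should be close to 4 and `rayleigh_mean` close to $\sqrt{\pi}/2 \approx 0.886$, up to the statistical fluctuation of the finite sample.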
Figure 1.14 Rice distributions $p_{|H|}(\xi)$ for different Rice factors $K$: a) $\sigma_H^2 = 1$, b) $P = 1$

As shown in (Proakis 2001), the magnitude of $H$ is Ricean distributed
\[
p_{|H|}(\xi) =
\begin{cases}
2\xi/\sigma_H^2 \cdot \exp\!\left(-\xi^2/\sigma_H^2 - K\right) \cdot I_0\!\left(2\xi\sqrt{K/\sigma_H^2}\right) & \xi \ge 0 \\
0 & \text{else.}
\end{cases} \tag{1.29}
\]
In (1.29), $I_0(\cdot)$ denotes the zeroth-order modified Bessel function of first kind (Benedetto and Biglieri 1999). With reference to the squared magnitude, we obtain the density
\[
p_{|H|^2}(\xi) =
\begin{cases}
1/\sigma_H^2 \cdot \exp\!\left(-\xi/\sigma_H^2 - K\right) \cdot I_0\!\left(2\sqrt{\xi K/\sigma_H^2}\right) & \xi \ge 0 \\
0 & \text{else.}
\end{cases} \tag{1.30}
\]
The phase is no longer uniformly distributed. Figure 1.14a shows some Rice distributions for a constant fading variance $\sigma_H^2 = 1$ and varying Rice factor. For $K = 0$, the direct component vanishes and pure Rayleigh fading is obtained. In Figure 1.14b, the total average power is fixed to $P = 1$ and $\sigma_H^2 = P/(K + 1)$ is adjusted with respect to $K$. For a growing Rice factor, the probability density function becomes narrower and reduces to a Dirac impulse for $K \to \infty$. This extreme case corresponds to the AWGN channel without any fading.

The reason for especially discussing the above channels is that they represent extreme propagation conditions. The AWGN channel represents the best case because noise contributions can never be avoided perfectly. The frequency-nonselective Rayleigh fading channel describes the worst-case scenario. Finally, Rice fading can be interpreted as a combination of both, where the Rice factor $K$ adjusts the ratio between AWGN and fading parts.
1.2.4 Systems with Multiple Inputs and Outputs

So far, this section has only described systems with a single input and a single output. Now, the scenario is extended to MIMO systems that have already been introduced in Subsection 1.1.1. However, at this point we are restricted to a general description. Specific communication systems are treated in Chapters 4 to 6.

Figure 1.15 General structure of frequency-selective MIMO channel

According to Figure 1.1, the MIMO system consists of $N_I$ inputs and $N_O$ outputs. Based on (1.11), the output of a frequency-selective SISO channel can be described by
\[
y[k] = \sum_{\kappa=0}^{L_t - 1} h[k, \kappa] \cdot x[k - \kappa] + n[k].
\]
This relationship now has to be extended for MIMO systems. As a consequence, $N_I$ signals $x_\mu[k]$, $1 \le \mu \le N_I$, form the input of our system at each time instant $k$ and we obtain $N_O$ output signals $y_\nu[k]$, $1 \le \nu \le N_O$. Each pair $(\mu, \nu)$ of inputs and outputs is connected by a channel impulse response $h_{\nu,\mu}[k, \kappa]$ as depicted in Figure 1.15. Therefore, the $\nu$-th output at time instant $k$ can be expressed as
\[
y_\nu[k] = \sum_{\mu=1}^{N_I} \sum_{\kappa=0}^{L_t - 1} h_{\nu,\mu}[k, \kappa] \cdot x_\mu[k - \kappa] + n_\nu[k] \tag{1.31}
\]
where $L_t$ denotes the largest number of taps among all the contributing channels. Exploiting vector notations by comprising all the output signals $y_\nu[k]$ into a column vector $\mathbf{y}[k]$ and all the input signals $x_\mu[k]$ into a column vector $\mathbf{x}[k]$, (1.31) becomes
\[
\mathbf{y}[k] = \sum_{\kappa=0}^{L_t - 1} \mathbf{H}[k, \kappa] \cdot \mathbf{x}[k - \kappa] + \mathbf{n}[k]. \tag{1.32}
\]
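In (1.32), the channel matrix has the form
\[
\mathbf{H}[k, \kappa] =
\begin{bmatrix}
h_{1,1}[k, \kappa] & \cdots & h_{1,N_I}[k, \kappa] \\
\vdots & \ddots & \vdots \\
h_{N_O,1}[k, \kappa] & \cdots & h_{N_O,N_I}[k, \kappa]
\end{bmatrix} . \tag{1.33}
\]
Finally, we can combine the $L_t$ channel matrices $\mathbf{H}[k, \kappa]$ to obtain a single matrix $\mathbf{H}[k] = [\mathbf{H}[k, 0] \cdots \mathbf{H}[k, L_t - 1]]$. With the new input vector $\mathbf{x}_{L_t}[k] = [\mathbf{x}[k]^T \cdots \mathbf{x}[k - (L_t - 1)]^T]^T$ we obtain
\[
\mathbf{y}[k] = \mathbf{H}[k] \cdot \mathbf{x}_{L_t}[k] + \mathbf{n}[k]. \tag{1.34}
\]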
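The equivalence of the tap-wise sum (1.32) and the stacked form (1.34) can be checked with a small numerical sketch. The example below is noise-free and uses made-up numbers for $N_I = N_O = 2$ and $L_t = 2$; the helper `matvec` and all variable names are hypothetical.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# One N_O x N_I matrix H[k, kappa] per tap (illustrative values only).
H_taps = [
    [[1, 2], [3, 4]],        # H[k, 0]
    [[0.5, 0], [0, -1]],     # H[k, 1]
]
x_now = [1, -1]              # x[k]
x_prev = [2, 1]              # x[k - 1]

# Tap-wise sum as in (1.32), without noise.
y_sum = [0, 0]
for H_kappa, x_vec in zip(H_taps, [x_now, x_prev]):
    y_sum = [a + b for a, b in zip(y_sum, matvec(H_kappa, x_vec))]

# Stacked form as in (1.34): H[k] = [H[k,0] H[k,1]], x_Lt = [x[k]; x[k-1]].
H_stacked = [sum((H_kappa[row] for H_kappa in H_taps), []) for row in range(2)]
x_stacked = x_now + x_prev
y_stacked = matvec(H_stacked, x_stacked)
```

Both computations give the same output vector; the stacked form merely trades the sum over taps for a wider matrix and a longer input vector.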
1.3 Signal Detection

1.3.1 Optimal Decision Criteria

This section briefly introduces some basic principles of signal detection. Specific algorithms for special systems are described in the corresponding chapters. We assume a frame-wise transmission, that is, a sequence $\mathbf{x}$ consisting of $L_x$ discrete, independent, identically distributed (i.i.d.) symbols $x[k]$ is transmitted over a SISO channel as discussed in the last section. Moreover, we restrict ourselves to an uncoded transmission, while the detection of coded sequences is the subject of Chapter 3. The received sequence is denoted by $\mathbf{y}$ and comprises $L_y$ symbols $y[k]$.

Sequence Detection

For frequency-selective channels, $\mathbf{y}$ suffers from ISI and has to be equalized at the receiver. The optimum decision rule for general channels with respect to the frame error probability $P_f$ looks for the sequence $\tilde{\mathbf{x}}$ that maximizes the a posteriori probability $\Pr\{\mathcal{X} = \tilde{\mathbf{x}} \mid \mathbf{y}\}$, that is, the probability that $\tilde{\mathbf{x}}$ was transmitted under the constraint that $\mathbf{y}$ was received. Applying Bayes' rule
\[
\Pr\{\mathcal{X} = \tilde{\mathbf{x}} \mid \mathcal{Y} = \mathbf{y}\} = p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}) \cdot \frac{\Pr\{\mathcal{X} = \tilde{\mathbf{x}}\}}{p_{\mathcal{Y}}(\mathbf{y})}, \tag{1.35}
\]
we obtain the maximum a posteriori (MAP) sequence detector
\[
\hat{\mathbf{x}} = \operatorname*{argmax}_{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x}} \Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\} = \operatorname*{argmax}_{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x}} p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}) \cdot \Pr\{\tilde{\mathbf{x}}\} \tag{1.36}
\]
where $\mathbb{X}^{L_x}$ denotes the set of sequences with length $L_x$ and symbols $x[k] \in \mathbb{X}$ (see footnote 6). It illustrates that the sequence MAP detector takes into account the channel influence by $p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y})$ as well as a priori probabilities $\Pr\{\tilde{\mathbf{x}}\}$ of possible sequences. It has to be emphasized that $p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y})$ is a probability density function since $\mathbf{y}$ is distributed continuously. On the contrary, $\Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\}$ represents a probability because $\tilde{\mathbf{x}}$ serves as a hypothesis taken from a finite alphabet $\mathbb{X}^{L_x}$ and $\mathbf{y}$ represents a fixed constraint. If either $\Pr\{\tilde{\mathbf{x}}\}$ is not known a priori to the receiver or all sequences are uniformly distributed, resulting in a constant $\Pr\{\tilde{\mathbf{x}}\}$, we obtain the maximum likelihood (ML) sequence detector
\[
\hat{\mathbf{x}} = \operatorname*{argmax}_{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x}} p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}). \tag{1.37}
\]
Under these assumptions, it represents the optimal detector minimizing $P_f$. Since the symbols $x[k]$ in $\tilde{\mathbf{x}}$ are elements of a discrete set $\mathbb{X}$ (cf. Section 1.4), the detectors in (1.36) and (1.37) solve a combinatorial problem that cannot be solved by gradient methods. An exhaustive search within the set of all possible sequences $\tilde{\mathbf{x}} \in \mathbb{X}^{L_x}$ requires a computational effort that grows exponentially with $|\mathbb{X}|$ and $L_x$ and is prohibitive for most practical cases. An efficient algorithm for an equivalent problem, the decoding of convolutional codes (cf. Section 3.4), was found by Viterbi in 1967 (Viterbi 1967). Forney showed in (Forney 1972) that the Viterbi algorithm is optimal for detecting sequences in the presence of ISI.

6 For notational simplicity, $p_{\mathcal{Y}|\mathcal{X}=\tilde{\mathbf{x}}}(\mathbf{y})$ is simplified to $p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y})$ and equivalently $\Pr\{\mathcal{X} = \tilde{\mathbf{x}}\}$ to $\Pr\{\tilde{\mathbf{x}}\}$. The term $p_{\mathcal{Y}}(\mathbf{y})$ can be neglected because it does not depend on $\tilde{\mathbf{x}}$.
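Orthogonal Frequency Division Multiplexing (OFDM) and CDMA systems offer different solutions for sequence detection in ISI environments. They are described in Chapter 4.

Symbol-by-Symbol Detection

While the Viterbi algorithm minimizes the error probability when detecting sequences, the optimal symbol-by-symbol MAP detector
\[
\hat{x}[k] = \operatorname*{argmax}_{X_\mu \in \mathbb{X}} \Pr\{\mathcal{X}[k] = X_\mu \mid \mathbf{y}\}
= \operatorname*{argmax}_{X_\mu \in \mathbb{X}} \sum_{\substack{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x} \\ \tilde{x}[k] = X_\mu}} \Pr\{\mathcal{X} = \tilde{\mathbf{x}} \mid \mathbf{y}\}
= \operatorname*{argmax}_{X_\mu \in \mathbb{X}} \sum_{\substack{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x} \\ \tilde{x}[k] = X_\mu}} p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}) \cdot \Pr\{\tilde{\mathbf{x}}\} \tag{1.38}
\]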
minimizes the symbol error probability $P_s$. Obviously, the difference compared to (1.36) is the fact that all sequences $\tilde{\mathbf{x}}$ with $\tilde{x}[k] = X_\mu$ contribute to the decision, and not only the most probable one. Both approaches need not deliver the same decisions, as the following example demonstrates. Consider a sequence $\mathbf{x} = [x[0], x[1]]$ of length $L_x = 2$ with binary symbols $x[k] \in \{X_0, X_1\}$. The conditional probabilities $\Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\} = \Pr\{\tilde{x}[0], \tilde{x}[1] \mid \mathbf{y}\}$ are summarized as an example in Table 1.1. While the MAP sequence detector delivers the sequence $\hat{\mathbf{x}} = [X_0, X_1]$ with the highest a posteriori probability $\Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\} = 0.27$, the symbol-by-symbol detector bases its decision on the marginal probabilities
\[
\Pr\{\mathcal{X}[0] = X_\mu \mid \mathbf{y}\} = \sum_{X_\nu \in \mathbb{X}} \Pr\{\mathcal{X}[0] = X_\mu, \mathcal{X}[1] = X_\nu \mid \mathbf{y}\}
\]
(and an equivalent expression for $x[1]$), resulting in the decisions $\hat{x}[0] = \hat{x}[1] = X_0$. However, the difference between both approaches is only visible at low SNRs and vanishes at low error rates.

Again, for unknown a priori probability or uniformly distributed sequences, the corresponding symbol-by-symbol ML detector is obtained by
\[
\hat{x}[k] = \operatorname*{argmax}_{X_\mu \in \mathbb{X}} p_{\mathcal{Y}|\mathcal{X}[k] = X_\mu}(\mathbf{y}) = \operatorname*{argmax}_{X_\mu \in \mathbb{X}} \sum_{\substack{\tilde{\mathbf{x}} \in \mathbb{X}^{L_x} \\ \tilde{x}[k] = X_\mu}} p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}). \tag{1.39}
\]

Table 1.1 Illustration of sequence and symbol-by-symbol MAP detection

Pr{x̃[0], x̃[1] | y}     x̃[1] = X0     x̃[1] = X1     Pr{x̃[0] | y}
x̃[0] = X0               0.26          0.27          0.53
x̃[0] = X1               0.25          0.22          0.47
Pr{x̃[1] | y}            0.51          0.49
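The example of Table 1.1 can be reproduced in a few lines. The following sketch simply stores the four a posteriori probabilities of the table in a dictionary and applies both decision rules; the names `posterior`, `symbol_map` and the string labels are illustrative choices.

```python
# A posteriori probabilities Pr{x[0], x[1] | y} taken from Table 1.1.
posterior = {
    ("X0", "X0"): 0.26,
    ("X0", "X1"): 0.27,
    ("X1", "X0"): 0.25,
    ("X1", "X1"): 0.22,
}

# Sequence MAP detection as in (1.36): pick the most probable sequence.
seq_map = max(posterior, key=posterior.get)


def symbol_map(k):
    """Symbol-by-symbol MAP detection as in (1.38) for position k."""
    marginal = {}
    for sequence, prob in posterior.items():
        marginal[sequence[k]] = marginal.get(sequence[k], 0.0) + prob
    return max(marginal, key=marginal.get)


sym_map = (symbol_map(0), symbol_map(1))
```

Here `seq_map` equals `("X0", "X1")`, whereas `sym_map` equals `("X0", "X0")`, so the two detectors indeed disagree on the second symbol.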
Memoryless Channels

For memoryless channels like AWGN and flat fading channels and i.i.d. symbols $x[k]$, the a posteriori probability $\Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\}$ can be factorized into $\prod_k \Pr\{\tilde{x}[k] \mid y[k]\}$. Hence, the detector no longer needs to consider the whole sequence, but can instead decide symbol by symbol. In this case, the time index $k$ can be dropped and (1.38) becomes
\[
\hat{x} = \operatorname*{argmax}_{X_\mu \in \mathbb{X}} \Pr\{\mathcal{X} = X_\mu \mid y\}. \tag{1.40}
\]
Equivalently, the ML detector in (1.39) reduces to
\[
\hat{x} = \operatorname*{argmax}_{X_\mu \in \mathbb{X}} p_{\mathcal{Y}|X_\mu}(y). \tag{1.41}
\]
1.3.2 Error Probability for AWGN Channel

This section describes the general way to determine the probabilities of decision errors. The derivations are restricted to memoryless channels but can be extended to channels with memory or trellis-coded systems. In these cases, vectors instead of symbols have to be considered. For a simple AWGN channel, $y = x + n$ holds and the probability density function $p_{\mathcal{Y}|X_\mu}(y)$ in (1.41) has the form
\[
p_{\mathcal{Y}|X_\mu}(y) = \frac{1}{\pi \sigma_N^2} \cdot e^{-|y - X_\mu|^2/\sigma_N^2} \tag{1.42}
\]
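(cf. (1.12)). With (1.42), a geometrical interpretation of the ML detector in (1.41) shows that the symbol $X_\mu$ out of $\mathbb{X}$ that minimizes the squared Euclidean distance $|y - X_\mu|^2$ is determined. Let us now define the decision region
\[
\mathbb{D}_\mu = \left\{ y \;\middle|\; |y - X_\mu|^2 < |y - X_\nu|^2 \;\; \forall\, X_\nu \ne X_\mu \right\} \tag{1.43}
\]
for symbol $X_\mu$ comprising all symbols $y \in \mathbb{C}$ whose Euclidean distance to $X_\mu$ is smaller than to any other symbol $X_\nu \ne X_\mu$. The complementary set is denoted by $\bar{\mathbb{D}}_\mu$. Assuming that $X_\mu$ was transmitted, a detection error occurs for $y \notin \mathbb{D}_\mu$ or equivalently $y \in \bar{\mathbb{D}}_\mu$. The complementary set can be expressed by the union $\bar{\mathbb{D}}_\mu = \bigcup_{\nu \ne \mu} \bar{\mathbb{D}}_{\mu,\nu}$ of the sets
\[
\bar{\mathbb{D}}_{\mu,\nu} = \left\{ y \;\middle|\; |y - X_\mu|^2 > |y - X_\nu|^2 \right\} \tag{1.44}
\]
containing all symbols $y$ whose Euclidean distance to a specific $X_\nu$ is smaller than to $X_\mu$. This does not mean that $X_\nu$ has the smallest distance of all symbols to $y \in \bar{\mathbb{D}}_{\mu,\nu}$. The symbol error probability can now be bounded from above by the well-known union bound (Proakis 2001)
\[
P_s(X_\mu) = \Pr\{y \in \bar{\mathbb{D}}_\mu\} = \Pr\Big\{ y \in \bigcup_{\nu \ne \mu} \bar{\mathbb{D}}_{\mu,\nu} \Big\} \le \sum_{\nu \ne \mu} \Pr\{y \in \bar{\mathbb{D}}_{\mu,\nu}\}. \tag{1.45}
\]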
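The equality in (1.45) holds if and only if the sets $\bar{\mathbb{D}}_{\mu,\nu}$ are disjoint. The upper (union) bound simplifies the calculation remarkably because in many cases it is much easier to determine the pairwise sets $\bar{\mathbb{D}}_{\mu,\nu}$ than to exactly describe the decision region $\mathbb{D}_\mu$. Substituting $y = X_\mu + n$ in (1.44) yields
\[
\begin{aligned}
\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu}\} &= \Pr\{|\mathcal{Y} - X_\mu|^2 > |\mathcal{Y} - X_\nu|^2\} = \Pr\{|\mathcal{N}|^2 > |X_\mu - X_\nu + \mathcal{N}|^2\} \\
&= \Pr\Big\{ \underbrace{\operatorname{Re}\{(X_\mu - X_\nu) \cdot \mathcal{N}^*\}}_{\eta} < \underbrace{-\tfrac{1}{2} |X_\mu - X_\nu|^2}_{\xi} \Big\}.
\end{aligned} \tag{1.46}
\]
In (1.46), $\eta$ is a new zero-mean Gaussian distributed real random variable with variance $\sigma_\eta^2 = |X_\mu - X_\nu|^2 \sigma_N^2/2$ and $\xi$ a negative constant. This leads to the integral
\[
\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu}\} = \frac{1}{\sqrt{\pi |X_\mu - X_\nu|^2 \sigma_N^2}} \int_{-\infty}^{-|X_\mu - X_\nu|^2/2} e^{-\frac{\eta^2}{|X_\mu - X_\nu|^2 \sigma_N^2}} \, d\eta \tag{1.47}
\]
that is not solvable in closed form. With the complementary error function (Benedetto and Biglieri 1999; Bronstein et al. 2000)
\[
\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-\xi^2} d\xi = 1 - \frac{2}{\sqrt{\pi}} \int_0^x e^{-\xi^2} d\xi = 1 - \operatorname{erf}(x) \tag{1.48}
\]
and the substitution $\xi = \eta/(|X_\mu - X_\nu| \sigma_N)$ we obtain the pairwise error probability between symbols $X_\mu$ and $X_\nu$
\[
\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu}\} = \frac{1}{2} \cdot \operatorname{erfc}\!\left( \sqrt{\frac{|X_\mu - X_\nu|^2}{4 \sigma_N^2}} \right). \tag{1.49}
\]
Next, we normalize the squared Euclidean distance $|X_\mu - X_\nu|^2$ by the average symbol power $\sigma_X^2$
\[
\Delta_{\mu,\nu}^2 = \frac{|X_\mu - X_\nu|^2}{\sigma_X^2} = \frac{|X_\mu - X_\nu|^2}{E_s/T_s} \tag{1.50}
\]
so that the average error probability can be calculated with (1.14) to
\[
P_s = E\{P_s(X_\mu)\} = \sum_{X_\mu} P_s(X_\mu) \cdot \Pr\{X_\mu\}
\le \frac{1}{2} \sum_{X_\mu} \Pr\{X_\mu\} \cdot \sum_{X_\nu \ne X_\mu} \operatorname{erfc}\!\left( \sqrt{\left(\frac{\Delta_{\mu,\nu}}{2}\right)^{2} \frac{E_s}{N_0}} \right). \tag{1.51}
\]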
Equation (1.51) shows that the symbol error rate solely depends on the squared Euclidean distance between competing symbols and the SNR Es /N0 . Examples are presented for various linear modulation schemes in Section 1.4.
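The pairwise error probability (1.49) and the union bound (1.51) translate directly into code. The following sketch assumes equiprobable symbols and $T_s = 1$; the function names and the two example alphabets (BPSK and QPSK with unit average power) are chosen for illustration only.

```python
import math


def pairwise_error(x_mu, x_nu, sigma_n2):
    """Pairwise error probability between two symbols as in (1.49)."""
    return 0.5 * math.erfc(math.sqrt(abs(x_mu - x_nu) ** 2 / (4.0 * sigma_n2)))


def union_bound(alphabet, es_n0):
    """Union bound (1.51) for equiprobable symbols, with es_n0 on a linear scale."""
    M = len(alphabet)
    sigma_x2 = sum(abs(x) ** 2 for x in alphabet) / M   # average symbol power
    sigma_n2 = sigma_x2 / es_n0                         # from SNR = Es/N0, cf. (1.14)
    total = 0.0
    for mu, x_mu in enumerate(alphabet):
        for nu, x_nu in enumerate(alphabet):
            if nu != mu:
                total += pairwise_error(x_mu, x_nu, sigma_n2) / M
    return total


bpsk = [1.0, -1.0]
s = 1.0 / math.sqrt(2.0)
qpsk = [complex(s, s), complex(-s, s), complex(-s, -s), complex(s, -s)]

es_n0 = 10.0  # corresponds to 10 dB
bound_bpsk = union_bound(bpsk, es_n0)
bound_qpsk = union_bound(qpsk, es_n0)
```

For BPSK there is only one competing symbol, so the bound coincides with the exact error probability $\frac{1}{2}\operatorname{erfc}(\sqrt{E_s/N_0})$. For QPSK the pairwise regions overlap and the bound lies slightly above the exact value.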
1.3.3 Error and Outage Probability for Flat Fading Channels

Ergodic Error Probability

For nondispersive channels, the transmitted symbol $x$ is weighted with a complex-valued channel coefficient $h$ and $y = hx + n$ holds. Assuming perfect channel state information (CSI) at the receiver, that is, $h$ is perfectly known, we obtain
\[
\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu} \mid h\} = \Pr\{|\mathcal{Y} - h X_\mu|^2 > |\mathcal{Y} - h X_\nu|^2\}
= \frac{1}{2} \cdot \operatorname{erfc}\!\left( \sqrt{\left(\frac{\Delta_{\mu,\nu}}{2}\right)^{2} \frac{E_s}{N_0} \cdot |h|^2} \right) \tag{1.52}
\]
with $\nu \ne \mu$. Therefore, the symbol error probability is itself a random variable depending on the instantaneous channel energy $|h|^2$. The ergodic symbol error rate can be obtained by calculating the expectation of (1.52) with respect to $|h|^2$. A convenient way exploits the relationship
\[
\frac{1}{2} \cdot \operatorname{erfc}(x) = \frac{1}{\pi} \cdot \int_0^{\pi/2} \exp\!\left(-\frac{x^2}{\sin^2\theta}\right) d\theta \quad \text{for } x > 0 \tag{1.53}
\]
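which can be derived by changing from Cartesian to polar coordinates (Simon and Alouini 2000). Inserting (1.53) into (1.52), reversing the order of integration and performing the substitution $s(\theta) = -(\Delta_{\mu,\nu}/2)^2 E_s/N_0 / \sin^2(\theta)$ we obtain
\[
\begin{aligned}
E_H\big\{\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu} \mid h\}\big\}
&= \frac{1}{\pi} \int_0^{\pi/2} \int_0^\infty p_{|H|^2}(\xi) \cdot \exp\!\left(-\xi\, \frac{(\Delta_{\mu,\nu}/2)^2 E_s/N_0}{\sin^2(\theta)}\right) d\xi\, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi/2} \int_0^\infty \exp(\xi s(\theta)) \cdot p_{|H|^2}(\xi)\, d\xi\, d\theta.
\end{aligned} \tag{1.54}
\]
The inner integral in (1.54) describes the moment generating function (MGF)
\[
M_{|H|^2}(s) = \int_0^\infty p_{|H|^2}(\xi) \cdot e^{s\xi}\, d\xi \tag{1.55}
\]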
of the random process $|H|^2$ (Papoulis 1965; Simon and Alouini 2000). Using the MGF is a very general concept that will be used again in Section 1.5 when dealing with diversity. For the Rayleigh fading channel, the squared magnitude is chi-squared distributed with two degrees of freedom so that $M_{|H|^2}(s)$ has the form
\[
M_{|H|^2}(s) = \int_0^\infty \frac{1}{\sigma_H^2} \cdot e^{-\xi/\sigma_H^2} e^{s\xi}\, d\xi = \frac{1}{1 - s \sigma_H^2}. \tag{1.56}
\]
Replacing $s(\theta)$ by $-(\Delta_{\mu,\nu}/2)^2 E_s/N_0 / \sin^2(\theta)$ again, the subsequent integration with respect to $\theta$ can be solved in closed form. It yields for a nondissipative channel with $\sigma_H^2 = 1$
\[
E_H\big\{\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu} \mid h\}\big\}
= \frac{1}{\pi} \int_0^{\pi/2} \frac{\sin^2(\theta)}{\sin^2(\theta) + (\Delta_{\mu,\nu}/2)^2 E_s/N_0}\, d\theta
= \frac{1}{2} \left( 1 - \sqrt{\frac{(\Delta_{\mu,\nu}/2)^2 E_s/N_0}{1 + (\Delta_{\mu,\nu}/2)^2 E_s/N_0}} \right). \tag{1.57}
\]
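The closed-form result (1.57) can be cross-checked by evaluating the remaining integral over $\theta$ numerically. The sketch below uses a plain midpoint rule; the function names, the number of integration steps and the example values ($\Delta_{\mu,\nu}^2 = 4$ as for two antipodal symbols, $E_s/N_0 = 10$) are assumptions made for illustration.

```python
import math


def rayleigh_pairwise_closed_form(delta2, es_n0):
    """Closed-form expression on the right-hand side of (1.57)."""
    g = delta2 / 4.0 * es_n0          # (Delta/2)^2 * Es/N0
    return 0.5 * (1.0 - math.sqrt(g / (1.0 + g)))


def rayleigh_pairwise_numeric(delta2, es_n0, steps=20000):
    """Midpoint-rule evaluation of the integral over theta in (1.57)."""
    g = delta2 / 4.0 * es_n0
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        sin2 = math.sin((i + 0.5) * d_theta) ** 2
        total += sin2 / (sin2 + g)
    return total * d_theta / math.pi


delta2, es_n0 = 4.0, 10.0
p_closed = rayleigh_pairwise_closed_form(delta2, es_n0)
p_numeric = rayleigh_pairwise_numeric(delta2, es_n0)

# AWGN reference from (1.49)/(1.51) for the same normalized distance.
p_awgn = 0.5 * math.erfc(math.sqrt(delta2 / 4.0 * es_n0))
```

Both evaluations agree to many digits, and the Rayleigh value (about $2.3 \cdot 10^{-2}$ here) is several orders of magnitude larger than the AWGN reference at the same $E_s/N_0$.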
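Contrary to (1.51), the error probability does not decrease exponentially but much more slowly. In Appendix A.3 it is shown that the MGF of $|H|^2$ for a Ricean distribution with $P = 1$ has the form
\[
M_{|H|^2}(s) = \frac{1 + K}{1 + K - s} \cdot \exp\!\left(\frac{sK}{1 + K - s}\right). \tag{1.58}
\]
Here, no closed-form expression is available for
\[
E_H\big\{\Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu} \mid h\}\big\} = \frac{1}{\pi} \int_0^{\pi/2} M_{|H|^2}\!\left(-\frac{(\Delta_{\mu,\nu}/2)^2 E_s/N_0}{\sin^2(\theta)}\right) d\theta. \tag{1.59}
\]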
However, (1.59) can be easily computed numerically with arbitrary precision due to the finite limits of the integral. Equivalent to the procedure for the AWGN channel, the union bound in (1.45) can be applied to obtain an upper bound of the average error probability.

A comparison of (1.49) with (1.57) shows that the exponential decay of the error rate for the AWGN channel is replaced by a much lower slope. This can also be observed in Figure 1.16 illustrating the error probabilities for a binary antipodal modulation scheme (binary phase shift keying, BPSK) with equiprobable symbols. For small $K$, the Ricean fading channel behaves similarly to the pure Rayleigh fading channel without an LoS component. With growing $K$, the LoS component becomes more and more dominant, leading finally for $K \to \infty$ to the AWGN case. In general, fading channels require much higher SNRs than the AWGN channel in order to achieve the same error rates for uncoded systems.

Outage Probability

For many applications, the ergodic error probability is not the most important parameter. Instead, a certain transmission quality represented by, for example, a target error rate $P_t$, has to be guaranteed to a predefined percentage. Therefore, the outage probability $P_{out}$, that is, the probability that a certain error rate cannot be achieved, is important. For the frequency-nonselective Rayleigh fading channel, $P_{out}$ describes the probability that the instantaneous signal to noise ratio $\gamma = |h|^2 \cdot E_s/N_0$ falls below a predefined threshold $\gamma_t$
\[
P_{out} = \Pr\{|H|^2 \cdot E_s/N_0 < \gamma_t\} = \int_0^{\frac{\gamma_t}{E_s/N_0}} p_{|H|^2}(\xi)\, d\xi = 1 - \exp\!\left(-\frac{\gamma_t}{E_s/N_0}\right). \tag{1.60}
\]
Figure 1.16 Symbol error probability $P_s$ versus $E_b/N_0$ in dB for BPSK and transmission over AWGN, Rayleigh and Rice fading channels with different Rice factors $K$ (curves: Rayleigh, $K = 1$, $K = 5$, $K = 10$, $K = 100$, AWGN)

Figure 1.17 Outage probability $P_{out}$ versus $E_s/N_0$ in dB for BPSK, a frequency-nonselective Rayleigh fading channel and different target error rates $P_t$ (curves: $P_t = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, 10^{-5}$)

Hence, the outage probability can be calculated by defining a target error rate $P_t$, determining the required SNR $\gamma_t$ and inserting it into (1.60). For a binary antipodal modulation scheme with equiprobable symbols, Figure 1.17 illustrates the outage probability for different target error rates. It can be observed that the higher the quality constraints (low $P_t$), the higher the outage probability. For $10 \log_{10}(E_s/N_0) = 20$ dB, an error rate of $P_t = 10^{-5}$ can only be achieved to 91%, that is, $P_{out} \approx 0.09$, while $P_t = 10^{-1}$ can be ensured to nearly 99.2% ($P_{out} \approx 8 \cdot 10^{-3}$). For an outage probability of $P_{out} = 10^{-2}$, a gap of nearly 10 dB exists between $P_t = 0.1$ and $P_t = 10^{-5}$.
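The procedure just described (choose $P_t$, determine $\gamma_t$, insert it into (1.60)) can be sketched as follows. The sketch assumes BPSK, for which the error probability at instantaneous SNR $\gamma$ is $\frac{1}{2}\operatorname{erfc}(\sqrt{\gamma})$ according to (1.52), and it finds $\gamma_t$ by bisection; the function names and the search interval are illustrative assumptions.

```python
import math


def required_snr_bpsk(p_target):
    """Solve 0.5 * erfc(sqrt(gamma)) = p_target for gamma by bisection."""
    lo, hi = 0.0, 100.0  # assumed search interval on a linear SNR scale
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(math.sqrt(mid)) > p_target:
            lo = mid     # error rate still too high, more SNR is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


def outage_rayleigh(gamma_t, es_n0):
    """Outage probability (1.60) for flat Rayleigh fading with sigma_H^2 = 1."""
    return 1.0 - math.exp(-gamma_t / es_n0)


es_n0 = 10.0 ** (20.0 / 10.0)  # 20 dB on a linear scale
p_out_strict = outage_rayleigh(required_snr_bpsk(1e-5), es_n0)
p_out_relaxed = outage_rayleigh(required_snr_bpsk(1e-1), es_n0)
```

At 20 dB this gives an outage probability of roughly 0.09 for $P_t = 10^{-5}$ and roughly $8 \cdot 10^{-3}$ for $P_t = 10^{-1}$, in line with the values read from Figure 1.17.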
1.3.4 Time-Discrete Matched Filter

In Section 1.2.1 on page 10, the matched filter was already mentioned as a time-continuous filter that maximizes the SNR of its sampled output. After deriving a time-discrete channel model in the last section and discussing optimum detection strategies, we now consider the time-discrete matched filter. It provides a sufficient statistic, that is, its output contains the same mutual information as the input and no information is lost (cf. also Chapter 2) (Benedetto and Biglieri 1999). Hence, the optimum detection can still be applied after matched filtering.

In order to derive the matched filter in vector notation, we have a look at the general system described in (1.34). It comprises as special cases the transmission over a frequency-selective SISO channel ($N_I = N_O = 1$) as well as a MIMO system with flat fading channels ($L_t = 1$). We are now interested in a receive filter $\mathbf{w}$ maximizing the SNR for an isolated symbol, for example, $x[k]$ transmitted at time instant $k$ over a frequency-selective channel (see footnote 7). Note that this filter does not take into account intersymbol or multilayer interference, but just the ratio of desired signal power and noise power. Therefore, we comprise the $L_t$ coefficients belonging to the channel impulse response into a column vector $\mathbf{h}_k$ and obtain the $L_t \times 1$ vector
\[
\mathbf{y}_k = \mathbf{h}_k \cdot x[k] + \mathbf{n}_k. \tag{1.61}
\]
The filter output
\[
r[k] = \mathbf{w}^H \cdot \mathbf{y}_k = \mathbf{w}^H \cdot \mathbf{h}_k \cdot x[k] + \mathbf{w}^H \mathbf{n}_k \tag{1.62}
\]
can be split into the information bearing part $\mathbf{w}^H \mathbf{h}_k x[k]$ and a noise part $\mathbf{w}^H \mathbf{n}_k$ so that the SNR amounts to
\[
\mathrm{SNR} = \frac{E_X\{|\mathbf{w}^H \mathbf{h}_k x[k]|^2\}}{E_N\{|\mathbf{w}^H \mathbf{n}_k|^2\}} = \frac{\mathbf{w}^H \mathbf{h}_k\, E_X\{|x[k]|^2\}\, \mathbf{h}_k^H \mathbf{w}}{\mathbf{w}^H E_N\{\mathbf{n}_k \mathbf{n}_k^H\}\, \mathbf{w}}. \tag{1.63}
\]
Assuming i.i.d. noise samples with covariance matrix $E_N\{\mathbf{n}_k \mathbf{n}_k^H\} = \sigma_N^2 \cdot \mathbf{I}_{L_t}$ and an average signal power of $E\{|x[k]|^2\} = \sigma_X^2$, (1.63) becomes
\[
\mathrm{SNR} = \frac{\sigma_X^2}{\sigma_N^2} \cdot \frac{\mathbf{w}^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w}}{\mathbf{w}^H \mathbf{w}}. \tag{1.64}
\]
Without loss of generality, we can force the filter $\mathbf{w}$ to have unit energy, that is, $\mathbf{w}^H \mathbf{w} = 1$ holds. With this constraint, we obtain the optimization problem
\[
\mathbf{w} = \operatorname*{argmax}_{\tilde{\mathbf{w}}} \tilde{\mathbf{w}}^H \mathbf{h}_k \mathbf{h}_k^H \tilde{\mathbf{w}} \quad \text{s.t.} \quad \tilde{\mathbf{w}}^H \tilde{\mathbf{w}} = 1 \tag{1.65}
\]
which can be transformed into an unconstrained problem by using the Lagrange multiplier. In this case, the cost function
\[
L(\mathbf{w}, \lambda) = \mathbf{w}^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{w} - \lambda(\mathbf{w}^H \mathbf{w} - 1) \tag{1.66}
\]
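has to be maximized. Setting the partial derivative with respect to $\mathbf{w}^H$ to zero (with $\partial \mathbf{w}/\partial \mathbf{w}^H = \mathbf{0}$) yields the eigenvalue problem
\[
\frac{\partial L(\mathbf{w}, \lambda)}{\partial \mathbf{w}^H} = \mathbf{h}_k \mathbf{h}_k^H \mathbf{w} - \lambda \mathbf{w} = \left(\mathbf{h}_k \mathbf{h}_k^H - \lambda \mathbf{I}\right) \cdot \mathbf{w} = \mathbf{0}. \tag{1.67}
\]
Since $\mathbf{h}_k \mathbf{h}_k^H$ is a matrix with rank one, only one nonzero eigenvalue $\lambda = \|\mathbf{h}_k\|^2$ exists and the corresponding eigenvector is
\[
\mathbf{w} = \frac{\mathbf{h}_k}{\|\mathbf{h}_k\|}. \tag{1.68}
\]
Obviously, the matched filter is simply the Hermitian of the vector $\mathbf{h}_k$ normalized to unit energy. The resulting signal becomes
\[
r[k] = \frac{\mathbf{h}_k^H}{\|\mathbf{h}_k\|} \cdot \mathbf{y}_k = \|\mathbf{h}_k\| \cdot x[k] + \tilde{n}[k] \tag{1.69}
\]
and its SNR amounts to
\[
\mathrm{SNR} = \frac{\sigma_X^2}{\sigma_N^2} \cdot \frac{\mathbf{h}_k^H \mathbf{h}_k \mathbf{h}_k^H \mathbf{h}_k}{\mathbf{h}_k^H \mathbf{h}_k} = \|\mathbf{h}_k\|^2 \cdot \frac{E_s}{N_0}. \tag{1.70}
\]

7 Equivalently, we can consider symbol $x_k$ transmitted over the $k$th transmit antenna in a MIMO system. For this case, it is shown in Section 1.5 that the matched filter delivers optimum estimates.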
For a SISO flat fading channel with a single coefficient $h_k$, the matched filter reduces to a scalar weighting with $h_k^*/|h_k|$. Moreover, if real-valued modulation schemes are employed (see next section), the imaginary part of $r[k]$ does not contain any information and $r[k] = \operatorname{Re}\{\mathbf{w}^H \mathbf{y}_k\}$ represents a sufficient statistic.

Sometimes, it is desirable to obtain an unbiased estimate of the information bearing symbol, that is, $r[k] = x[k] + \tilde{n}[k]$ (cf. Subsection 1.5.1). Hence, the normalization of $\mathbf{w}$ with $\|\mathbf{h}_k\|$ has to be replaced by $\|\mathbf{h}_k\|^2$. On the contrary, no normalization may be needed in other scenarios as described in Section 3.1, leading to $\mathbf{w} = \mathbf{h}_k$. These differences do not affect the structure of the matched filter or the resulting SNR.

In a real-world scenario, a sequence of symbols is transmitted over the frequency-selective channel and ISI occurs. The received vector can be described with
\[
\mathbf{y} = \mathbf{X} \cdot \mathbf{h} + \mathbf{n}, \tag{1.71}
\]
where $\mathbf{y}$ and $\mathbf{n}$ are column vectors whose size depends on the length of the transmitted sequence $\mathbf{x}$ and $\mathbf{X}$ is a convolution matrix set up from $\mathbf{x}$. Extracting that part in $\mathbf{X}$ which contains the $k$th symbol results in
\[
\mathbf{y}_k = \mathbf{X}_k \cdot \mathbf{h} + \mathbf{n}_k \tag{1.72}
\]
with
\[
\mathbf{X}_k =
\begin{bmatrix}
x[k] & x[k-1] & \cdots & x[k - L_t + 1] \\
x[k+1] & x[k] & \cdots & x[k - L_t + 2] \\
\vdots & \vdots & \ddots & \vdots \\
x[k + L_t - 1] & x[k + L_t - 2] & \cdots & x[k]
\end{bmatrix}. \tag{1.73}
\]
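The matched filter delivers the estimate
\[
\begin{aligned}
r[k] &= \mathbf{h}^H \cdot \mathbf{y}_k = \mathbf{h}^H \cdot \mathbf{X}_k \cdot \mathbf{h} + \mathbf{h}^H \mathbf{n}_k \\
&= \sum_{i=1}^{L_t} h_i^* \cdot \left( \sum_{j=1}^{L_t} h_j \cdot x[k + i - j] + n[k + i - 1] \right) \\
&= \sum_{i=1}^{L_t} |h_i|^2 \cdot x[k] + \sum_{i=1}^{L_t} \sum_{\substack{j=1 \\ j \ne i}}^{L_t} h_i^* h_j \cdot x[k + i - j] + \tilde{n}[k].
\end{aligned} \tag{1.74}
\]
Here, the index $k + i - j$ follows from the entries of $\mathbf{X}_k$ in (1.73). While the SNR ignores the middle term and is still described by (1.70), the signal to interference plus noise ratio (SINR) equals
\[
\mathrm{SINR} = \frac{\sigma_X^2 \cdot \left( \sum_{i=1}^{L_t} |h_i|^2 \right)^{2}}{\sigma_N^2 \cdot \sum_{i=1}^{L_t} |h_i|^2 + \sigma_X^2 \cdot \sum_{\substack{i = -L_t + 1 \\ i \ne 0}}^{L_t - 1} \left| \sum_{j=1}^{L_t - |i|} h_j^* h_{j+|i|} \right|^{2}} \tag{1.75}
\]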
and is not the optimum one. Higher SINRs can be achieved by employing appropriate equalizers (Kammeyer 2004; Proakis 2001). However, optimum detection after matched filtering is still possible, because the matched filter output represents a sufficient statistic.
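The results of this subsection can be tried out numerically. The following sketch evaluates the SNR (1.64) for the matched filter (1.68) and for a deliberately mismatched filter, and the SINR (1.75) in the presence of ISI. The three-tap channel, the power values and all function names are made-up examples, and the zero-based list indices replace the one-based indices of the text.

```python
import math


def matched_filter(h):
    """Unit-energy matched filter w = h / ||h|| as in (1.68)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in h))
    return [c / norm for c in h]


def output_snr(w, h, sigma_x2, sigma_n2):
    """SNR of r[k] = w^H y_k as in (1.64)."""
    signal = abs(sum(wc.conjugate() * hc for wc, hc in zip(w, h))) ** 2
    energy = sum(abs(wc) ** 2 for wc in w)
    return sigma_x2 / sigma_n2 * signal / energy


def sinr_after_matched_filter(h, sigma_x2, sigma_n2):
    """SINR (1.75) of the matched filter output when ISI is present."""
    L = len(h)
    energy = sum(abs(c) ** 2 for c in h)
    isi = 0.0
    for i in range(-L + 1, L):
        if i == 0:
            continue
        # Channel autocorrelation at lag |i| (zero-based indices).
        acf = sum(h[j].conjugate() * h[j + abs(i)] for j in range(L - abs(i)))
        isi += abs(acf) ** 2
    return sigma_x2 * energy ** 2 / (sigma_n2 * energy + sigma_x2 * isi)


h = [1 + 0j, 0.5j, 0.25 + 0j]      # hypothetical channel with L_t = 3
sigma_x2, sigma_n2 = 1.0, 0.1

snr_mf = output_snr(matched_filter(h), h, sigma_x2, sigma_n2)
snr_first_tap = output_snr([1 + 0j, 0j, 0j], h, sigma_x2, sigma_n2)
sinr_mf = sinr_after_matched_filter(h, sigma_x2, sigma_n2)
```

For this channel the matched filter reaches $\|\mathbf{h}\|^2 \sigma_X^2/\sigma_N^2 = 13.125$, compared with 10 when only the first tap is used, while the SINR of about 3.2 shows how strongly the ISI term dominates once a whole sequence is transmitted.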
1.4 Digital Linear Modulation

1.4.1 Introduction

The analysis and investigations presented in this book are based on linear digital modulation schemes, that is, the modulator has no memory. Their performances are analyzed in this section for different channels. As before, we always assume perfect lowpass filters $g_T(t)$ and $g_R(t)$ at the transmitter and the receiver. Therefore, the description focuses on the time-discrete equivalent baseband signal $x[k]$ at the channel input (cf. Figures 1.6 and 1.8). The modulator just performs a simple mapping: it extracts $m$ successive bits out of $b(\ell)$ and maps the $m$-tuple $\tilde{\mathbf{b}}[k] = [b[mk] \cdots b[m(k + 1) - 1]]$, $k = \ell/m$, onto one of $M = 2^m$ possible symbols $X_\mu$. They form the signal alphabet $\mathbb{X} = \{X_0, \ldots, X_{M-1}\}$ that depends on the type of modulation. Assuming that all the symbols are equally likely, the convention in (1.2) delivers the average symbol energy
\[
E_s = T_s\, E\{|X_\mu|^2\} = T_s \sum_{\mu=0}^{M-1} \Pr\{X_\mu\} \cdot |X_\mu|^2 = \frac{T_s}{M} \sum_{\mu=0}^{M-1} |X_\mu|^2. \tag{1.76}
\]
Attention has to be paid when modulation schemes with different alphabet sizes are compared. In order to draw a fair comparison, the different numbers of bits per symbol have to be considered. This is done by calculating the performance with respect to the average energy per bit Eb = Es /m. In this case, the SNR is expressed by the measure Eb /N0 rather than Es /N0 .
In many of today's communication systems, the modulation type and size are not fixed parameters but are chosen according to instantaneous channel conditions and data rate demands of the users. These adaptive modulation schemes require some kind of channel knowledge at the transmitter. As an example, the SNR can be used to determine the highest possible modulation size under the constraint of a desired error probability. Adaptive modulation schemes are already applied in wireless local area networks (WLAN) using the IEEE 802.11g standard (Hanzo et al. 2003a) or in the High Speed Downlink Packet Access (HSDPA) transmission (Blogh and Hanzo 2002; Holma and Toskala 2004) of UMTS.

Mapping Strategies

While the symbol error rate (following subsections) only depends on the geometrical arrangement of the symbols as well as the SNR, the bit error rate is also affected by the specific mapping of the $m$-tuples $\tilde{\mathbf{b}}[k]$ onto the symbols $X_\mu$. In many cases, Gray mapping is applied, ensuring that neighboring symbols differ only in one bit. This results in a minimum bit error probability if error events are dominated by mixing up neighboring symbols at the receiver (which is true in most cases). Hence, we obtain a tight approximation
\[
P_b \approx \frac{1}{m} P_s. \tag{1.77}
\]
Alternatively, natural mapping just enumerates the symbols (e.g. counterclockwise for Phase Shift Keying (PSK), starting with the smallest phase) and assigns them the binary representations of their numbers. For both mapping strategies, the exact solution of the bit error rate requires the consideration of the specific error probabilities $\Pr\{y \in \bar{\mathbb{D}}_{\mu,\nu}\}$ between two competing symbols $X_\mu$ and $X_\nu$ and the corresponding number $w_{\mu,\nu}$ of differing bits concerning their binary representations. The exact bit error probability has the form
\[
P_b = \sum_{X_\mu} \Pr\{X_\mu\} \cdot \sum_{X_\nu \ne X_\mu} w_{\mu,\nu} \cdot \Pr\{\mathcal{Y} \in \bar{\mathbb{D}}_{\mu,\nu}\}. \tag{1.78}
\]
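The difference between Gray and natural mapping can be made concrete by counting the bits in which neighboring symbols differ, that is, the weights $w_{\mu,\nu}$ for adjacent symbols. The sketch below assumes an 8-PSK-like arrangement in which the symbols are numbered along the circle and the last symbol neighbors the first; it uses the common binary-reflected Gray code, which is one possible Gray mapping and not necessarily the one used in the book.

```python
def gray_code(index):
    """Binary-reflected Gray code of a symbol index."""
    return index ^ (index >> 1)


def bit_differences(a, b):
    """Number of differing bits (Hamming distance) between two labels."""
    return bin(a ^ b).count("1")


def neighbor_bit_errors(labels):
    """Bit differences between cyclically adjacent symbols (as for PSK)."""
    M = len(labels)
    return [bit_differences(labels[i], labels[(i + 1) % M]) for i in range(M)]


m = 3                      # bits per symbol
M = 1 << m                 # 8 symbols
natural_labels = list(range(M))
gray_labels = [gray_code(i) for i in range(M)]

gray_diffs = neighbor_bit_errors(gray_labels)
natural_diffs = neighbor_bit_errors(natural_labels)
avg_natural = sum(natural_diffs) / M
```

With Gray labels every neighbor error costs exactly one bit, which is the situation behind the approximation (1.77). With natural labels a neighbor error costs 1.75 bits on average in this example.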
1.4.2 Amplitude Shift Keying (ASK)

If the amplitude of real-valued symbols bears the information, the modulation is called Amplitude Shift Keying (ASK). Alternatively, it is also termed Pulse Amplitude Modulation (PAM). In order to have equal distances between neighboring symbols (Figure 1.18), the amplitudes are chosen as
$$X_\mu = (2\mu + 1 - M) \cdot e \quad \text{for } 0 \le \mu < M.$$
The parameter $e$ has to be determined such that the energy constraint in (1.76) is fulfilled, leading to
$$\frac{1}{M} \sum_{\mu=0}^{M-1} \big[(2\mu + 1 - M)\, e\big]^2 \overset{!}{=} \frac{E_s}{T_s} \quad\Rightarrow\quad e = \sqrt{\frac{3}{M^2 - 1} \cdot \frac{E_s}{T_s}} \tag{1.79}$$
for equally likely symbols. The minimum normalized squared Euclidean distance amounts to
$$\Delta_0^2 = \frac{(2e)^2}{E_s/T_s} = \frac{12}{M^2 - 1}. \tag{1.80}$$
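As a quick numerical check of (1.79) and (1.80), the following sketch builds the M-ASK alphabet and evaluates its average power and minimum normalized squared distance. The normalization $E_s/T_s = 1$ used as default and all function names are assumptions made for illustration.

```python
import math


def ask_alphabet(M, es_over_ts=1.0):
    """Amplitudes X_mu = (2*mu + 1 - M) * e with e chosen according to (1.79)."""
    e = math.sqrt(3.0 / (M ** 2 - 1) * es_over_ts)
    return [(2 * mu + 1 - M) * e for mu in range(M)]


def average_power(symbols):
    """Mean squared magnitude for equally likely symbols."""
    return sum(abs(x) ** 2 for x in symbols) / len(symbols)


def min_normalized_sq_distance(symbols, es_over_ts=1.0):
    """Smallest squared distance between adjacent amplitudes, normalized to Es/Ts."""
    s = sorted(symbols)
    return min((b - a) ** 2 for a, b in zip(s, s[1:])) / es_over_ts
```

For 4-ASK the alphabet has unit average power and a normalized minimum squared distance of 12/15, in line with (1.80).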
Figure 1.18 Symbol alphabets of linear amplitude modulation: 4-ASK, 16-QAM and 32-QAM ($e = \sqrt{E_s/T_s/5}$ for 4-ASK, $e = \sqrt{E_s/T_s/10}$ for 16-QAM and $e = \sqrt{E_s/T_s/20}$ for 32-QAM)
Performance for AWGN Channel

The symbol error probability, that is, the probability that a wrong symbol is detected at the receiver, can be easily calculated without the union bound. An error occurs if the real part of the noise $n[k]$ exceeds half of the distance $2e$ between neighboring symbols. Therefore, the inner sum in (1.51) only has to be evaluated for two adjacent symbols, that is, for the smallest Euclidean distance. While the outermost symbols have only one neighbor that they can be mixed up with, the inner symbols have two competing neighbors. For equiprobable symbols, this fact can be considered by weighting the symbol-specific error rates with the number of competing neighbors, resulting in a total average weight $(2(M-2)+2)/M = 2(M-1)/M$. Therefore, (1.80) and the derivation in Section 1.3 yield the average error probability
$$P_s^{M\text{-ASK}} = \frac{M-1}{M} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{3}{M^2-1} \cdot \frac{E_s}{N_0}}\right) \tag{1.81a}$$
$$\phantom{P_s^{M\text{-ASK}}} = \frac{M-1}{M} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{3m}{M^2-1} \cdot \frac{E_b}{N_0}}\right). \tag{1.81b}$$
The bit error probabilities can be determined by applying (1.77) or (1.78).

Performance for Flat Rayleigh Fading Channel

According to the second part of Section 1.3 and the argumentation above, the symbol error probability for a frequency-nonselective Rayleigh fading channel is obtained by applying (1.54) for $\Delta_{\mu,\nu} = \Delta_0$ and appropriate weighting
$$P_s^{M\text{-ASK}} = \frac{M-1}{M} \cdot \left(1 - \sqrt{\frac{3E_s/N_0}{M^2 - 1 + 3E_s/N_0}}\right). \tag{1.82}$$
Figure 1.19 shows the results for M-ASK and transmission over an AWGN and a frequency-nonselective Rayleigh fading channel. Obviously, the performance degrades with increasing M due to decreasing Euclidean distances between the symbols for constant average signal energy $E_s$. While the error rate shows an exponential decay for the AWGN channel, the slope is nearly linear for the flat Rayleigh fading channel. Therefore, in order to achieve the same average performance for fading channels, the transmit power has to be considerably higher than that for the AWGN channel. For 2-ASK and $P_s = 10^{-4}$, this gap is larger than 25 dB and grows for lower error rates. An appropriate way to bridge this gap is the application of error correcting codes as explained in Chapter 3.

Figure 1.19 Symbol error rates $P_s$ versus $E_b/N_0$ in dB for M-ASK (M = 2, 4, 8, 16) and transmission over an AWGN channel (solid lines) and a frequency-nonselective Rayleigh fading channel (dashed lines)
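The closed-form expressions (1.81a) and (1.82) are easy to evaluate directly. The following minimal sketch does so with Python's standard library; the function names and the dB helper are illustrative choices, not part of the original text.

```python
import math


def db_to_linear(x_db):
    """Convert a ratio given in dB to linear scale."""
    return 10.0 ** (x_db / 10.0)


def ask_ser_awgn(M, es_n0):
    """Symbol error probability of M-ASK over AWGN according to (1.81a)."""
    return (M - 1) / M * math.erfc(math.sqrt(3.0 / (M ** 2 - 1) * es_n0))


def ask_ser_rayleigh(M, es_n0):
    """Symbol error probability of M-ASK over flat Rayleigh fading, (1.82)."""
    return (M - 1) / M * (1.0 - math.sqrt(3.0 * es_n0 / (M ** 2 - 1 + 3.0 * es_n0)))
```

Evaluating both functions for M = 2 over a range of SNRs reproduces the qualitative behavior described above: the AWGN curve falls off exponentially, while the Rayleigh curve decays only inversely proportional to the SNR.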
1.4.3 Quadrature Amplitude Modulation (QAM)

For quadrature amplitude modulation (QAM), deployed, for example, in WLAN systems (ETSI 2001; Hanzo et al. 2000) and the HSDPA in UMTS (Holma and Toskala 2004), real and imaginary parts of a symbol can be chosen independently of each other. Hence, $m' = m/2$ bits are mapped onto both real and imaginary symbol parts, according to a real-valued $M'$-ASK with $M' = 2^{m'}$. The combination of both parts results in a square arrangement, for example, a 16-QAM with $m' = 2$ (see Figure 1.18). Adapting the condition in (1.76) to QAM yields
$$\frac{e^2}{M} \sum_{\mu=0}^{M'-1} \sum_{\nu=0}^{M'-1} \big[(2\mu + 1 - M')^2 + (2\nu + 1 - M')^2\big] = \frac{2e^2}{M'} \sum_{\mu=0}^{M'-1} (2\mu + 1 - M')^2 \overset{!}{=} \frac{E_s}{T_s}.$$
Due to $M = M'^2$, the parameter $e$ for M-QAM can be calculated as
$$e = \sqrt{\frac{3}{2(M-1)} \cdot \frac{E_s}{T_s}} \tag{1.83}$$
and the minimum squared Euclidean distance is
$$\Delta_0^2 = \frac{(2e)^2}{E_s/T_s} = \frac{6}{M-1}. \tag{1.84}$$
Without going into further details, it has to be mentioned that combinations of ASK and Phase Shift Keying (PSK) are also possible. As an example, Figure 1.18 shows the 32-QAM modulation scheme.

Performance for AWGN Channel

The symbol error rate for M-QAM can be immediately derived because real and imaginary parts represent $\sqrt{M}$-ASK schemes and can be detected independently. Hence, a correct decision is made if and only if both components are correctly detected. Since the signal energy $E_s$ is equally distributed between real and imaginary parts, we can use (1.81a) with $E_s|_{\sqrt{M}\text{-ASK}} = E_s|_{M\text{-QAM}}/2$ to obtain the error probabilities for both parts. The total error probability for M-QAM becomes
$$P_s^{M\text{-QAM}} = 1 - \left[1 - P_s^{\sqrt{M}\text{-ASK}}\!\left(\frac{E_s/2}{N_0}\right)\right]^2 = 2 P_s^{\sqrt{M}\text{-ASK}}\!\left(\frac{E_s/2}{N_0}\right) - \left[P_s^{\sqrt{M}\text{-ASK}}\!\left(\frac{E_s/2}{N_0}\right)\right]^2. \tag{1.85}$$
The squared error probability in (1.85) can be neglected for high signal to noise ratios, resulting with $m' = m/2$ in an upper bound
$$P_s^{M\text{-QAM}} < 2\,\frac{\sqrt{M}-1}{\sqrt{M}} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{3}{2(M-1)} \cdot \frac{E_s}{N_0}}\right) \tag{1.86a}$$
$$\phantom{P_s^{M\text{-QAM}}} = 2\,\frac{\sqrt{M}-1}{\sqrt{M}} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{3m}{2(M-1)} \cdot \frac{E_b}{N_0}}\right). \tag{1.86b}$$
Figure 1.20 illustrates the results for various QAM schemes. Solid lines depict the solutions for the AWGN channel while dashed and dash-dotted lines show the results for the Rayleigh fading channel. With reference to the Rayleigh fading channel, only a small difference between the exact solution and the approximation can be observed. For the AWGN channel, this difference is even smaller so that it is not shown in the figure. Due to the separability of real and imaginary parts, the bit error probability is identical to that of $\sqrt{M}$-ASK, taking into account that both parts have only half of the average symbol energy $E_s$.

Performance for Flat Rayleigh Fading Channel

According to the procedure described in the last section, the ergodic symbol error probability for a frequency-nonselective Rayleigh fading channel must be determined by solving
$$P_s^{M\text{-QAM}} = \mathrm{E}_H\big\{P_s^{M\text{-QAM}}(h)\big\} = \mathrm{E}_H\Big\{2 P_s^{\sqrt{M}\text{-ASK}}(h) - \big[P_s^{\sqrt{M}\text{-ASK}}(h)\big]^2\Big\}. \tag{1.87}$$
The expectation of the linear term is already known from M-ASK. Properly replacing $M$ by $\sqrt{M}$ and $E_s$ by $E_s/2$ in (1.82) delivers
$$2\,\mathrm{E}_H\big\{P_s^{\sqrt{M}\text{-ASK}}(h)\big\} = 2\,\frac{\sqrt{M}-1}{\sqrt{M}} \cdot (1 - \alpha) \quad \text{with} \quad \alpha = \sqrt{\frac{3E_s/N_0}{2(M-1) + 3E_s/N_0}}.$$
Figure 1.20 Symbol error probabilities $P_s$ versus $E_b/N_0$ in dB for M-QAM (M = 4, 16, 64) and transmission over an AWGN channel (solid lines) and a frequency-nonselective Rayleigh fading channel (exact solution dashed, approximations dash-dotted)

Calculating the expectation of the squared term requires the relationship
$$\left[\frac{1}{2}\operatorname{erfc}(x)\right]^2 = \frac{1}{\pi} \int_0^{\pi/4} \exp\!\left(-\frac{x^2}{\sin^2\theta}\right) d\theta \tag{1.88}$$
given in (Simon and Alouini 2000). Exploiting (1.88) provides the closed-form solution
$$\mathrm{E}_H\Big\{\big[P_s^{\sqrt{M}\text{-ASK}}(h)\big]^2\Big\} = \left(\frac{\sqrt{M}-1}{\sqrt{M}}\right)^2 \cdot \left[1 - \frac{4}{\pi}\,\alpha \tan^{-1}\!\left(\frac{1}{\alpha}\right)\right].$$
The combination of both parts finally leads to the solution
$$P_s^{\text{QAM}} = 2\,\frac{\sqrt{M}-1}{\sqrt{M}} \cdot (1 - \alpha) - \left(\frac{\sqrt{M}-1}{\sqrt{M}}\right)^2 \cdot \left[1 - \frac{4}{\pi}\,\alpha \cdot \tan^{-1}\!\left(\frac{1}{\alpha}\right)\right]. \tag{1.89}$$
Again, an upper bound is obtained by dropping the second part of (1.89)
$$P_s^{\text{QAM}} < 2\,\frac{\sqrt{M}-1}{\sqrt{M}} \cdot \left(1 - \sqrt{\frac{3E_s/N_0}{2(M-1) + 3E_s/N_0}}\right). \tag{1.90}$$
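To see how close the bound (1.90) is to the exact solution (1.89), both can be evaluated side by side. The sketch below is a direct transcription of the two formulas for a linear (not dB) value of $E_s/N_0 > 0$; the function names are hypothetical.

```python
import math


def _alpha(M, es_n0):
    """Auxiliary quantity alpha as defined for the Rayleigh fading case."""
    return math.sqrt(3.0 * es_n0 / (2.0 * (M - 1) + 3.0 * es_n0))


def qam_ser_rayleigh_exact(M, es_n0):
    """Exact symbol error probability of square M-QAM, equation (1.89)."""
    k = (math.sqrt(M) - 1.0) / math.sqrt(M)
    a = _alpha(M, es_n0)
    linear_term = 2.0 * k * (1.0 - a)
    squared_term = k ** 2 * (1.0 - 4.0 / math.pi * a * math.atan(1.0 / a))
    return linear_term - squared_term


def qam_ser_rayleigh_bound(M, es_n0):
    """Upper bound (1.90), obtained by dropping the squared term."""
    k = (math.sqrt(M) - 1.0) / math.sqrt(M)
    return 2.0 * k * (1.0 - _alpha(M, es_n0))
```

Since 4-QAM and QPSK are the same constellation, `qam_ser_rayleigh_exact(4, x)` can also serve as a cross-check for the QPSK result over Rayleigh fading.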
As illustrated in Figure 1.20, a small gap between the exact solution and its approximation remains over the whole SNR range. Again, a higher spectral efficiency is obtained with growing M at the expense of a larger error rate. Outage Probability The outage probability can be numerically evaluated by exploiting (1.86a) in order to determine the relation between γt and Pt . The obtained thresholds γt can be inserted into (1.60) leading to the results depicted in Figure 1.21. Pout increases as expected with growing M. For an outage probability of 1%, SNRs of 24.3 dB, 25.5 dB and 28.4 dB are required for 4-QAM, 16-QAM and 64-QAM, respectively.
Figure 1.21 Outage probability $P_{out}$ versus $E_b/N_0$ in dB for M-QAM (M = 4, 16, 64) and transmission over a frequency-nonselective Rayleigh fading channel, $P_t = 10^{-3}$
1.4.4 Phase Shift Keying (PSK)

For M-ary PSK, M symbols are arranged on a circle with radius $\sqrt{E_s/T_s}$, resulting in a constant symbol energy. The bits within the m-tuples $\tilde{\mathbf{b}}[k]$ determine the symbols’ phases, which are generally multiples of $2\pi/M$. Alternatively, an offset of $\pi/M$ can be chosen as shown in Figure 1.22 for Quaternary Phase Shift Keying (QPSK) and 8-PSK. Binary Phase Shift Keying (BPSK) for M = 2 and QPSK for M = 4 represent special cases, because they can be assigned to the class of amplitude modulation schemes, too (2-ASK and 4-QAM, respectively). For M > 4, real and imaginary parts are not independent of each other and have to be detected simultaneously. If the symbols are numbered counterclockwise, the normalized squared Euclidean distance between two symbols $X_\mu$ and $X_\nu$ is
$$\Delta_{\mu,\nu}^2 = \frac{|X_\mu - X_\nu|^2}{E_s/T_s} = 4 \sin^2\!\left((\mu - \nu)\frac{\pi}{M}\right). \tag{1.91}$$

Figure 1.22 Symbol alphabets of digital phase modulation: 2-ASK/2-PSK (BPSK), 4-QAM/4-PSK (QPSK) and 8-PSK, each with symbols on a circle of radius $\sqrt{E_s/T_s}$
The exact symbol error probability can generally not be expressed in closed form. Except for some special cases, it has to be calculated by numerical integration or approximated by the union bound, cf. (1.45). In the following part, the focus is again on coherent reception for the AWGN and the flat Rayleigh fading channel. Within the GSM extension EDGE (Enhanced Data Rates for GSM Evolution) (Olofsson and Furuskär 1998; Schramm et al. 1998), 8-PSK is used in contrast to Gaussian Minimum Shift Keying (GMSK) as in standard GSM systems. This increases the data rate significantly, since three bits are transmitted per symbol compared to only a single bit for GMSK.

Performance for AWGN Channel

Since BPSK with M = 2 is a special case of ASK, the error probability can be calculated with (1.81a), leading to
$$P_s^{\text{BPSK}} = \frac{1}{2} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{E_s}{N_0}}\right). \tag{1.92}$$
QPSK can be interpreted as two parallel BPSK schemes with the same $E_b/N_0$ (compare ASK and QAM). Therefore, the bit error probabilities are identical
$$P_b^{\text{QPSK}} = \frac{1}{2} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{E_b}{N_0}}\right) = \frac{1}{2} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{E_s}{2N_0}}\right) \tag{1.93}$$
while the symbol error probability for QPSK (following the same argumentation as for QAM) is upper bounded by
$$P_s^{\text{QPSK}} = \operatorname{erfc}\!\left(\sqrt{\frac{E_b}{N_0}}\right) - \left[\frac{1}{2} \cdot \operatorname{erfc}\!\left(\sqrt{\frac{E_b}{N_0}}\right)\right]^2 < \operatorname{erfc}\!\left(\sqrt{\frac{E_b}{N_0}}\right). \tag{1.94}$$
For M > 4, closed-form expressions are not available and the union bound can be applied. Substituting (1.91) into (1.51) yields the upper bound
$$P_s^{\text{PSK}} < \frac{1}{2} \sum_{\mu=1}^{M-1} \operatorname{erfc}\!\left(\sin(\mu\pi/M) \cdot \sqrt{\frac{E_s}{N_0}}\right). \tag{1.95}$$
A simple approximation stems from the fact that the error probability is dominated by those decisions that mix up neighboring symbols with a Euclidean distance $2\sin(\pi/M)\sqrt{E_s/T_s}$. Inserting (1.91) for $|\mu - \nu| = 1$ into (1.49), and taking into account that each symbol has two competing neighbors with equal error probability, we obtain with (1.92) a tight approximation (Kammeyer 2004; Proakis 2001)
$$P_s^{\text{PSK}} \approx \operatorname{erfc}\!\left(\sin(\pi/M) \cdot \sqrt{\frac{E_s}{N_0}}\right) = \operatorname{erfc}\!\left(\sin(\pi/M) \cdot \sqrt{m\,\frac{E_b}{N_0}}\right). \tag{1.96}$$
The exact solution with arbitrary accuracy can be obtained by numerically solving the integral (Craig 1991)
$$P_s^{\text{PSK}} = \frac{1}{\pi} \cdot \int_0^{(M-1)\pi/M} \exp\!\left(-m\,\frac{E_b}{N_0} \cdot \frac{\sin^2(\pi/M)}{\sin^2(\theta)}\right) d\theta. \tag{1.97}$$
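The three expressions (1.95) to (1.97) can be compared numerically. The sketch below evaluates the integral in (1.97) with a simple midpoint rule, which is an implementation choice made here and not prescribed by the text; the step count and function names are assumptions.

```python
import math


def psk_ser_union_bound(M, es_n0):
    """Union bound (1.95) over all competing symbols."""
    return 0.5 * sum(
        math.erfc(math.sin(mu * math.pi / M) * math.sqrt(es_n0)) for mu in range(1, M)
    )


def psk_ser_nearest_neighbor(M, es_n0):
    """Approximation (1.96) that considers only the two nearest neighbors."""
    return math.erfc(math.sin(math.pi / M) * math.sqrt(es_n0))


def psk_ser_craig(M, es_n0, steps=20000):
    """Numerical evaluation of (1.97) with the midpoint rule (Es/N0 = m * Eb/N0)."""
    upper = (M - 1) * math.pi / M
    h = upper / steps
    c = es_n0 * math.sin(math.pi / M) ** 2
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoints avoid the singular point theta = 0
        total += math.exp(-c / math.sin(theta) ** 2)
    return total * h / math.pi
```

For M = 2 the integral reduces to the BPSK result (1.92), which gives a convenient sanity check of the numerical integration.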
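Performance for Flat Rayleigh Fading Channel

For Rayleigh fading channels, the closed-form expression
$$P_s^{M\text{-PSK}} = \frac{M-1}{M} - \gamma \left[\frac{1}{2} + \frac{1}{\pi} \tan^{-1}\big(\gamma \cot(\pi/M)\big)\right] \quad \text{with} \quad \gamma = \sqrt{\frac{\sin^2(\pi/M)\,E_s/N_0}{1 + \sin^2(\pi/M)\,E_s/N_0}} \tag{1.98}$$
can be found in (Simon and Alouini 2000). For BPSK with M = 2, (1.98) reduces to the well-known form
$$P_s^{\text{BPSK}} = \frac{1}{2} \cdot \left(1 - \sqrt{\frac{E_s/N_0}{1 + E_s/N_0}}\right), \tag{1.99}$$
and for QPSK (M = 4)
$$P_s^{\text{QPSK}} = \frac{3}{4} - \frac{1}{\pi}\sqrt{\frac{E_s/N_0}{2 + E_s/N_0}} \cdot \cot^{-1}\!\left(-\sqrt{\frac{E_s/N_0}{2 + E_s/N_0}}\right) \tag{1.100}$$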
is obtained (Proakis 2001). As before, a simple approximation can be found by considering only the smallest Euclidean distance, leading to
$$P_s^{\text{PSK}} \approx 1 - \sqrt{\frac{\sin^2(\pi/M)\,E_s/N_0}{1 + \sin^2(\pi/M)\,E_s/N_0}}. \tag{1.101}$$
The corresponding results are illustrated in Figure 1.23. Since the error probabilities obtained by (1.96) and (1.97) are nearly identical for M > 4 and differ only at very low SNRs below 0 dB, only those obtained by numerical integration are shown. The same holds for (1.98) and (1.101) for the Rayleigh fading channel. Therefore, the simple approximation looking only at the nearest neighbors should be preferred.

Figure 1.23 Symbol error probabilities $P_s$ versus $E_b/N_0$ in dB for M-PSK (M = 2, 4, 8, 16) and transmission over an AWGN channel (solid lines) and a frequency-nonselective Rayleigh fading channel (dashed lines)

Outage Probability

In the same way as described for M-QAM, (1.97) can be used to determine $\gamma_t$ for a specified target error rate $P_t$. Figure 1.24 shows the corresponding results obtained by inserting $\gamma_t$ into (1.60). As expected, $P_{out}$ increases with growing M.

Figure 1.24 Outage probability $P_{out}$ versus $E_b/N_0$ in dB for M-PSK (M = 2, 4, 8, 16, 32), a frequency-nonselective Rayleigh fading channel and a target symbol error rate $P_t = 10^{-3}$
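The closed-form result (1.98) and the nearest-neighbor approximation (1.101) for M-PSK over flat Rayleigh fading can be compared with a few lines of code. This is a plain transcription of the two formulas for a linear value of $E_s/N_0$; the function names are illustrative, and the approximation is only meaningful for M > 2, where each symbol has two nearest neighbors.

```python
import math


def _gamma(M, es_n0):
    """Auxiliary quantity gamma from (1.98)."""
    g = math.sin(math.pi / M) ** 2 * es_n0
    return math.sqrt(g / (1.0 + g))


def psk_ser_rayleigh(M, es_n0):
    """Closed-form symbol error probability (1.98)."""
    gamma = _gamma(M, es_n0)
    cot = math.cos(math.pi / M) / math.sin(math.pi / M)
    return (M - 1) / M - gamma * (0.5 + math.atan(gamma * cot) / math.pi)


def psk_ser_rayleigh_approx(M, es_n0):
    """Nearest-neighbor approximation (1.101), intended for M > 2."""
    return 1.0 - _gamma(M, es_n0)
```

For M = 2, `psk_ser_rayleigh` reproduces (1.99); for larger M the approximation lies slightly above the exact value.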
1.5 Diversity

1.5.1 General Concept

The previous section illustrated the influence of flat fading channels on the error rate performance. The instantaneous SNR $\gamma[k] = |h[k]|^2 \cdot E_s/N_0$ at the receiver’s input is a random variable, according to the statistics of the current channel coefficient $h[k]$. Low SNRs caused by deep fades cannot be compensated by good channel states, resulting in a significantly increased error rate. In order to overcome or at least lower the fading’s influence, the probability of deep fades has to be reduced. This can be accomplished by diversity concepts where several replicas of a signal $x[k]$ are transmitted over different (frequency-nonselective) channels $h_\ell[k]$ with individual powers $\sigma_{H,\ell}^2$, $1 \le \ell \le D$. In order to perform fair comparisons between systems with different diversity degrees, the total transmitted energy is fixed and equally distributed onto the channels. According to Figure 1.25, the received signal can be expressed as
$$y_\ell[k] = h_\ell[k] \cdot \frac{x[k]}{\sqrt{D}} + n_\ell[k] \quad\Leftrightarrow\quad \mathbf{y}[k] = \frac{x[k]}{\sqrt{D}} \cdot \mathbf{h}[k] + \mathbf{n}[k] \tag{1.102}$$
where $\mathbf{y}[k]$, $\mathbf{h}[k]$ and $\mathbf{n}[k]$ are vectors comprising D received symbols $y_\ell[k]$, D channel coefficients $h_\ell[k]$, and D statistically independent noise samples $n_\ell[k]$, respectively. The described scenario may be obtained by using D receive antennas and one transmit antenna.

Figure 1.25 Illustration of D-fold diversity reception: the signal $x[k]$, scaled by $1/\sqrt{D}$, passes through the channels $h_1[k], \ldots, h_D[k]$ with additive noise $n_1[k], \ldots, n_D[k]$; a multi-channel receiver forms $\hat{x}[k]$ from $y_1[k], \ldots, y_D[k]$

For mutually independent channels, the probability that all $|h_\ell[k]|^2$ are simultaneously very small is much lower than the probability for a single channel. Therefore, a communication system should be designed in a way to exploit as much diversity as possible. If the transmission channel itself provides diversity, this can be used by appropriate receiver structures. However, for nondiversity channels, it is possible to artificially introduce diversity into the system. There are several sources that diversity can originate from.

• Frequency Diversity: If the channel is frequency-selective (cf. Section 1.2), its transfer function influences different parts of the signal’s spectrum differently. Hence, diversity is obtained in the frequency domain that can be exploited by appropriate receiver structures. For CDMA systems, the Rake receiver (cf. Subsection 4.1.1 on page 178) exploits frequency diversity by combining different propagation paths that are separable in time (Proakis 2001). In coded OFDM systems, the decoding process averages over carriers associated with different channel coefficients (Dekorsy et al. 1999a). Even conventional equalizers like the Viterbi equalizer or a linear FIR filter exploit frequency diversity (not the decision feedback equalizer). Finally, frequency diversity can be artificially introduced by operating with different carriers separated by at least the coherence bandwidth of the channel.

• Time Diversity: If the channel varies in time, the application of forward error correction (FEC) coding (see Chapter 3) yields diversity, provided that a code word or a coded sequence is longer than the coherence time of the channel. In this case, decoding performs a kind of averaging over good and bad channel states.

• Space Diversity: Recently, systems using multiple antennas at transmitter or receiver have gained much interest. For antenna separations larger than several wavelengths, the channels can be assumed to be independent, so that diversity is obtained even for frequency-nonselective and quasi-static channels. Hence, space is used to artificially enhance the diversity degree (see Chapter 6) (Fischer et al. 2001b).

• Polarization Diversity: If antennas support different polarizations, this can be used for polarization diversity. However, only the first three cases will be considered in this context.

The diversity degree highly depends on the correlation among the contributing channels and their power distribution. The highest degree will be obtained by statistically independent channels with equal average power. Further analysis will be presented later in the chapter.

Combining Methods

Receiving several replicas of the same signal requires some kind of combining in order to obtain a single representation of the desired symbol. There are different combining methods depending on the level of channel knowledge at the receiver.

• Maximum Ratio Combining (MRC): Maximum ratio combining achieves the maximum signal to noise ratio at the receiver’s output by weighting each received replica $y_j[k]$ by the corresponding complex conjugate channel coefficient $h_j^*[k]$ and subsequent summation over $j$. Therefore, this method requires the knowledge of amplitudes and phases of all involved channels, and requires scanning and tracking of all components. Due to this knowledge, MRC is not restricted to PSK but is also applicable to multiamplitude signals like QAM. However, it is sensitive to channel estimation errors.

• Equal Gain Combining (EGC): With equal gain combining, only the phase rotations for each $y_j[k]$ are compensated; the magnitudes remain unchanged. This method only requires the phases of all channel coefficients, not the magnitudes. However, due to missing knowledge of the channels’ magnitudes, this technique is not suited for ASK and QAM modulation and it performs worse than MRC.

• Square Law Combining: If the channel is highly time varying and its phase cannot be estimated accurately, square law combining of orthogonally modulated signals is an appropriate method to exploit diversity. Here, the squared magnitudes of the received signals are simply added, resulting in a noncoherent receiver. This technique can only be applied for orthogonal signaling schemes (Kammeyer 2004; Proakis 2001).

• Selection Combining: Selection combining represents the simplest combining method because it only selects a subset of all replicas for further processing and neglects all the remaining signals. This reduces the computational costs and may even lead to a better performance than MRC, because channels with very low SNR cannot be accurately estimated and contribute much noise.
The focus in the following part is on maximum ratio combining as the optimum strategy with respect to the error rate performance for perfect channel knowledge.

Maximum Ratio Combining

The optimum solution for a detection problem concerning a system as described in (1.102) is obtained by applying an ML detector
$$\hat{x}[k] = \operatorname*{argmax}_{\tilde{x}} \, \log p_{\mathbf{Y}|\tilde{x}}(\mathbf{y}[k]) = \operatorname*{argmin}_{\tilde{x}} \left\| \mathbf{y}[k] - \frac{\tilde{x}}{\sqrt{D}}\,\mathbf{h}[k] \right\|^2. \tag{1.103}$$
Setting the partial derivative of (1.103) with respect to $\tilde{x}^*$ to zero and applying the Wirtinger calculus ($\partial \tilde{x}/\partial \tilde{x}^* = 0$) leads to the optimum receiver structure
$$\mathbf{w} = \frac{\sqrt{D}}{\|\mathbf{h}[k]\|^2} \cdot \mathbf{h}[k]. \tag{1.104}$$
A comparison of (1.104) with the results of Section 1.3.4 illustrates that the matched filter has been essentially obtained, except for a slight difference concerning the scaling.⁸ Applying (1.104) to (1.102) yields the output of the filter
$$\tilde{x}[k] = \mathbf{w}^H \cdot \mathbf{y}[k] = \frac{\sqrt{D}}{\|\mathbf{h}[k]\|^2} \cdot \sum_{\ell=1}^{D} h_\ell^*[k] \cdot \left(h_\ell[k]\,\frac{x[k]}{\sqrt{D}} + n_\ell[k]\right) = x[k] + \frac{\sqrt{D}}{\|\mathbf{h}[k]\|^2} \cdot \sum_{\ell=1}^{D} h_\ell^*[k] \cdot n_\ell[k]. \tag{1.105}$$
As described above, the received replicas are weighted with the complex conjugate channel coefficients and summed. This technique is called Maximum Ratio Combining (MRC). The resulting SNR can be determined by calculating the signal power as well as the noise power at the filter output. Assuming that the noise samples are mutually independent, that is, $\mathrm{E}_N\{n_i[k]\,n_j^*[k]\} = 0$ holds for $j \neq i$, the contributing noise power becomes
$$\frac{D}{\|\mathbf{h}[k]\|^4} \cdot \mathrm{E}_N\left\{\sum_{\ell=1}^{D} h_\ell^*[k]\,n_\ell[k] \cdot \sum_{j=1}^{D} h_j[k]\,n_j^*[k]\right\} = \frac{D}{\|\mathbf{h}[k]\|^4} \cdot \sum_{\ell=1}^{D}\sum_{j=1}^{D} h_\ell^*[k]\,h_j[k] \cdot \mathrm{E}_N\big\{n_\ell[k]\,n_j^*[k]\big\}$$
$$= \frac{D}{\|\mathbf{h}[k]\|^4} \cdot \sum_{\ell=1}^{D} |h_\ell[k]|^2 \cdot \sigma_N^2 = \frac{D}{\|\mathbf{h}[k]\|^2} \cdot \sigma_N^2. \tag{1.106}$$
With $\mathrm{E}_X\{|x|^2\} = \sigma_X^2$, the SNR can be expressed by
$$\gamma[k] = \frac{1}{D}\,\|\mathbf{h}[k]\|^2 \cdot \frac{\sigma_X^2}{\sigma_N^2} = \frac{1}{D} \cdot \frac{E_s}{N_0} \cdot \sum_{\ell=1}^{D} |h_\ell[k]|^2 = \sum_{\ell=1}^{D} \gamma_\ell[k]. \tag{1.107}$$
Hence, the ‘global’ SNR $\gamma[k]$ is obtained by simply summing the ‘local’ ratios $\gamma_\ell[k]$, each with a mean $\bar{\gamma}_\ell = \sigma_{H,\ell}^2\,E_s/N_0/D$.

⁸ The optimum scaling derived here is necessary for multiamplitude modulation in order to apply the correct decision thresholds. For BPSK, the scalar weighting with $\sqrt{D}/\|\mathbf{h}[k]\|^2$ can be neglected.
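The combining rule (1.104) and the SNR relation (1.107) can be illustrated with a few lines of code operating on plain Python complex numbers. The sketch assumes perfectly known channel coefficients; all function names are hypothetical.

```python
import math


def mrc_weights(h):
    """Weights w = sqrt(D) * h / ||h||^2 according to (1.104)."""
    D = len(h)
    norm_sq = sum(abs(c) ** 2 for c in h)
    return [math.sqrt(D) * c / norm_sq for c in h]


def mrc_combine(y, h):
    """Filter output w^H y as in (1.105): conjugate weighting and summation."""
    w = mrc_weights(h)
    return sum(wl.conjugate() * yl for wl, yl in zip(w, y))


def local_snrs(h, es_n0):
    """Instantaneous per-branch SNRs gamma_l = |h_l|^2 * (Es/N0) / D."""
    D = len(h)
    return [abs(c) ** 2 * es_n0 / D for c in h]


def global_snr(h, es_n0):
    """SNR after MRC, the sum of the local SNRs as stated in (1.107)."""
    return sum(local_snrs(h, es_n0))
```

In the noise-free case `mrc_combine` returns the transmitted symbol itself, which reflects the scaling discussed in footnote 8.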
1.5.2 MRC for Independent Diversity Branches

If the channel coefficients in all branches are mutually independent and identically distributed (i.i.d.), $\gamma[k]$ is chi-squared distributed with 2D degrees of freedom (Bronstein et al. 2000; Simon and Alouini 2000)
$$p_\gamma(\xi) = \frac{\xi^{D-1}}{(D-1)! \cdot \bar{\gamma}^D} \cdot e^{-\xi/\bar{\gamma}} = \frac{D^D\,\xi^{D-1}}{(D-1)!\,(E_s/N_0)^D} \cdot e^{-\frac{D\,\xi}{E_s/N_0}} \tag{1.108}$$
instead of two degrees of freedom for the ‘local’ ratios $\gamma_\ell[k]$ (cf. Section 1.2). The second equality in (1.108) holds for nondissipative channels with $\sigma_H^2 = 1$. Numerical results are shown in Figure 1.26. With growing diversity degree D, the instantaneous SNR increasingly concentrates around $E_s/N_0$ and the SNR variations (influence of the fading) become smaller. For $D \to \infty$, the AWGN channel without any fading is obtained.

Outage Probability

In order to analyze the outage probability for MRC with i.i.d. diversity paths, the distribution given in (1.108) has to be integrated according to (1.60)
$$P_{out} = \int_0^{\gamma_t} p_\gamma(\xi)\, d\xi. \tag{1.109}$$
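The integral in (1.109) can be evaluated numerically, for instance with a midpoint rule as in the following sketch. It assumes nondissipative i.i.d. branches ($\sigma_H^2 = 1$, so that $\bar{\gamma} = E_s/N_0/D$), a given threshold $\gamma_t$ and a moderate diversity degree D; the function names and the step count are illustrative choices.

```python
import math


def mrc_snr_pdf(xi, D, es_n0):
    """Density (1.108) of the SNR after MRC for i.i.d. branches with unit power."""
    g = es_n0 / D  # average SNR per branch
    return xi ** (D - 1) / (math.factorial(D - 1) * g ** D) * math.exp(-xi / g)


def mrc_outage(gamma_t, D, es_n0, steps=20000):
    """Outage probability (1.109) via midpoint integration from 0 to gamma_t."""
    h = gamma_t / steps
    return h * sum(mrc_snr_pdf((i + 0.5) * h, D, es_n0) for i in range(steps))
```

For D = 1 the result coincides with the familiar expression $1 - e^{-\gamma_t/(E_s/N_0)}$ of a single Rayleigh fading channel, which is a useful check of the integration.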
However, $\gamma$ is now chi-squared distributed with 2D degrees of freedom. Hence, an analytical solution is hard to obtain and the easiest way is to perform the integration in (1.109) numerically. For BPSK, Figure 1.27 shows the corresponding results. The left diagram shows $P_{out}$ versus $E_b/N_0$ for a fixed target error rate of $P_t = 10^{-3}$. Obviously, $P_{out}$ decreases with growing SNR as well as with increasing diversity degree D. At very small values of $E_s/N_0$, we observe the astonishing effect that high diversity degrees provide a worse performance. This can be explained by the fact that the variations of $\gamma[k]$ become very small for large D. As a consequence, instantaneous SNRs lying above the average $E_s/N_0$ occur less frequently than for low D, resulting in the described effect. Figure 1.27b shows the results for a fixed $10\log_{10}(E_b/N_0) = 12$ dB. Obviously, diversity can considerably reduce the outage probability.

Figure 1.26 Probability density functions $p_\gamma(\xi)$ of normalized $\gamma[k]$ versus $\xi/(E_s/N_0)$ for different diversity degrees D (D = 1, 2, 4, 10, 20, 100) and i.i.d. diversity channels

Figure 1.27 Outage probabilities for BPSK and i.i.d. Rayleigh fading channels: a) $P_{out}$ versus $E_b/N_0$ in dB for a target error rate of $P_t = 10^{-3}$ and D = 1, 2, 4, 10, 20, 100, b) $P_{out}$ versus D at $10\log_{10}(E_b/N_0) = 12$ dB for $P_t = 10^{-1}, \ldots, 10^{-5}$

Ergodic Error Probability for MRC

Equivalent to the derivations in Section 1.4, the expectation of the instantaneous error probability with respect to $\gamma[k]$ has to be determined. An exact solution for BPSK and i.i.d. diversity branches can be found in (Proakis 2001)
$$P_s = \left(\frac{1-\mu}{2}\right)^D \cdot \sum_{\ell=0}^{D-1} \binom{D-1+\ell}{\ell} \cdot \left(\frac{1+\mu}{2}\right)^\ell \quad \text{with} \quad \mu = \sqrt{\frac{\bar{\gamma}}{1+\bar{\gamma}}}. \tag{1.110}$$
The parameter $\bar{\gamma}$ denotes the average SNR in each branch. In order to get a better illustration of (1.110), we derive a simple approximation for large SNR, that is, $\bar{\gamma} \gg 1$. In this case, the application of a Taylor series expansion shows that $(1+\mu)/2 \approx 1$ and $(1-\mu)/2 \approx 1/(4\bar{\gamma})$ hold. Therefore, with the relation (Proakis 2001)
$$\sum_{\ell=0}^{D-1} \binom{D-1+\ell}{\ell} = \binom{2D-1}{D} \tag{1.111}$$
the symbol error rate can be approximated for large SNRs by
$$P_s \approx \left(\frac{1}{4\bar{\gamma}}\right)^D \cdot \binom{2D-1}{D} = \left(\frac{D}{4E_s/N_0}\right)^D \cdot \binom{2D-1}{D}. \tag{1.112}$$
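The exact expression (1.110) and its high-SNR approximation (1.112) can be compared directly. The following sketch takes the average SNR per branch $\bar{\gamma}$ as a linear value; the function names are illustrative, and `math.comb` requires Python 3.8 or newer.

```python
import math


def bpsk_mrc_ser_exact(D, branch_snr):
    """Exact BPSK error probability for D i.i.d. Rayleigh branches, (1.110)."""
    mu = math.sqrt(branch_snr / (1.0 + branch_snr))
    series = sum(math.comb(D - 1 + l, l) * ((1.0 + mu) / 2.0) ** l for l in range(D))
    return ((1.0 - mu) / 2.0) ** D * series


def bpsk_mrc_ser_high_snr(D, branch_snr):
    """Asymptotic approximation (1.112), valid for branch_snr >> 1."""
    return (1.0 / (4.0 * branch_snr)) ** D * math.comb(2 * D - 1, D)
```

With a fixed total $E_s/N_0$, the per-branch value to pass in is $E_s/N_0/D$. Plotting both functions over the SNR shows the slope of D decades per 10 dB that the approximation predicts.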
Obviously, $P_s$ is proportional to the D-th power of the reciprocal of the signal to noise ratio. Since error rate curves are scaled logarithmically, their slope will be dominated by the diversity degree D at high SNRs. This will be demonstrated later in this section.

The expression given in (1.110) becomes computationally expensive and numerically difficult for high diversity degrees. A more convenient approach, especially for unequal power distributions, can be obtained by exploiting the alternative representation of the complementary error function already introduced in (1.53). This method can easily be applied to arbitrary modulation schemes and unequal power distributions. Moreover, it can be modified in order to also cope with the case of correlated diversity branches.

We start with statistically independent diversity branches. As a consequence, the joint density $p_{\gamma_1 \cdots \gamma_D}(\xi_1 \cdots \xi_D)$ can be factorized into the marginal densities $p_{\gamma_\ell}(\xi_\ell)$. This results in
$$P_s = \mathrm{E}_\gamma\{P_s(\gamma)\} = \mathrm{E}_{\gamma_1 \cdots \gamma_D}\{P_s(\gamma_1 + \cdots + \gamma_D)\} = \int_0^\infty \!\cdots\! \int_0^\infty P_s(\xi_1 + \cdots + \xi_D) \cdot p_{\gamma_1}(\xi_1) \cdots p_{\gamma_D}(\xi_D)\, d\xi_1 \cdots d\xi_D \tag{1.113}$$
where the integrals in (1.113) can be solved separately. Resuming the last section and exploiting the alternative representation of the complementary error function in (1.53), it turns out that the following single parameterized expression can be found for all linear modulation schemes.⁹
$$P_s(\gamma) = a \cdot \int_0^b \exp\!\left(-\frac{c \cdot \gamma}{\sin^2(\theta)}\right) d\theta \tag{1.114}$$
The parameters a, b and c are defined as
$$M\text{-ASK:} \quad a = \frac{2}{\pi} \cdot \frac{M-1}{M}, \quad b = \frac{\pi}{2}, \quad c = \frac{3}{M^2-1} \tag{1.115a}$$
$$M\text{-QAM:} \quad a = \frac{4}{\pi} \cdot \frac{\sqrt{M}-1}{\sqrt{M}}, \quad b = \frac{\pi}{2}, \quad c = \frac{3}{2(M-1)} \tag{1.115b}$$
$$M\text{-PSK:} \quad a = \frac{1}{\pi}, \quad b = \pi \cdot \frac{M-1}{M}, \quad c = \sin^2(\pi/M). \tag{1.115c}$$

⁹ For M-QAM, (1.114) represents only an approximation because the quadratic term is neglected.
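Inserting (1.114) into (1.113) and reversing the order of integration leads to
$$P_s = \int_0^\infty \!\cdots\! \int_0^\infty a \int_0^b \exp\!\left(-\frac{c \cdot (\xi_1 + \cdots + \xi_D)}{\sin^2(\theta)}\right) d\theta \cdot p_{\gamma_1}(\xi_1) \cdots p_{\gamma_D}(\xi_D)\, d\xi_1 \cdots d\xi_D$$
$$= a \int_0^b \prod_{\ell=1}^{D} \int_0^\infty \exp\!\left(-\frac{c \cdot \xi_\ell}{\sin^2(\theta)}\right) \cdot p_{\gamma_\ell}(\xi_\ell)\, d\xi_\ell\; d\theta. \tag{1.116}$$
From the derivations in Section 1.3 it is already known that the substitution $s = -c/\sin^2(\theta)$ in the inner integrals of (1.116) leads to the MGFs
$$M_{\gamma_\ell}(s) = \int_0^\infty p_{\gamma_\ell}(\xi) \cdot e^{s\xi}\, d\xi \tag{1.117}$$
of the random processes $\gamma_\ell$ (Papoulis 1965; Simon and Alouini 2000). Therefore, as an important intermediate result, the ergodic error probability can be calculated by numerically solving
$$P_s = a \int_0^b \prod_{\ell=1}^{D} M_{\gamma_\ell}\!\left(\frac{-c}{\sin^2(\theta)}\right) d\theta \overset{\text{i.i.d.}}{=} a \int_0^b \left[M_\gamma\!\left(\frac{-c}{\sin^2(\theta)}\right)\right]^D d\theta \tag{1.118}$$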
where the second equality in (1.118) holds for identically distributed branches.

Ergodic Error Probability for Rayleigh Fading Channel

Inserting the MGF in (1.56) for Rayleigh fading channels with chi-squared distributed $\gamma_\ell$ results in the following expression for D-fold diversity and statistically independent channels
$$P_s = a \int_0^b \prod_{\ell=1}^{D} \frac{\sin^2(\theta)}{\sin^2(\theta) + c\,\bar{\gamma}_\ell}\, d\theta \overset{\text{i.i.d.}}{=} a \int_0^b \left(\frac{D \sin^2(\theta)}{D \sin^2(\theta) + c\,E_s/N_0}\right)^D d\theta. \tag{1.119}$$
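The single finite integral in (1.119) lends itself to straightforward numerical evaluation. The sketch below uses the parameters of (1.115) and a midpoint rule; the integration method, the step count and all names are assumptions made for illustration, and the QAM case inherits the approximation mentioned in footnote 9.

```python
import math


def modulation_params(scheme, M):
    """Parameters (a, b, c) of (1.115) for 'ASK', 'QAM' or 'PSK'."""
    if scheme == "ASK":
        return 2.0 / math.pi * (M - 1) / M, math.pi / 2.0, 3.0 / (M ** 2 - 1)
    if scheme == "QAM":
        root = math.sqrt(M)
        return 4.0 / math.pi * (root - 1.0) / root, math.pi / 2.0, 3.0 / (2.0 * (M - 1))
    if scheme == "PSK":
        return 1.0 / math.pi, math.pi * (M - 1) / M, math.sin(math.pi / M) ** 2
    raise ValueError("unknown modulation scheme")


def rayleigh_mrc_ser(scheme, M, branch_snrs, steps=5000):
    """First expression in (1.119): independent Rayleigh branches with average SNRs branch_snrs."""
    a, b, c = modulation_params(scheme, M)
    h = b / steps
    total = 0.0
    for i in range(steps):
        s2 = math.sin((i + 0.5) * h) ** 2
        prod = 1.0
        for g in branch_snrs:
            prod *= s2 / (s2 + c * g)  # MGF of one Rayleigh branch at s = -c/sin^2(theta)
        total += prod
    return a * h * total
```

For i.i.d. branches with total $E_s/N_0$ the list is simply `[es_n0 / D] * D`; unequal power profiles only require different list entries.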
Owing to the convention that the transmit power is equally distributed among all diversity paths, the average SNR per path for identically distributed branches is $\bar{\gamma}_\ell = E_s/N_0/D$, leading to the second equality in (1.119). For Ricean fading, the MGF of (1.58) has to be inserted. Figure 1.28 shows the results for BPSK and identically distributed branches. As already explained, with increasing diversity degree D, the error rates decrease due to smaller variations of the global SNR. In the limit for $D \to \infty$, the performance of an AWGN channel is obtained. This asymptotic case is normally not of practical interest because channel estimation becomes a critical task due to local SNRs that tend to zero when D goes to infinity. The largest gains are obtained at high SNRs where the slopes of the curves are determined by D. For illustration, the dashed lines represent the results obtained with (1.112). It can be seen that this approximation is quite tight for high SNR and low diversity degrees. However, for low SNR and especially for high D it becomes quite loose. For D > 10, the approximation is only tight for error rates that are beyond the scope of practical systems.

Figure 1.28 Symbol error probabilities $P_s$ versus $E_b/N_0$ in dB for the transmission of BPSK over i.i.d. Rayleigh fading channels with different diversity degrees D (D = 1, 2, 4, 10, 20, 100, and AWGN), dashed lines for the asymptotic approximation with (1.112)

Figure 1.29 illustrates the symbol error rates for a fixed $10\log_{10}(E_b/N_0) = 12$ dB for different modulation schemes. For D = 100, the AWGN performance is nearly approached. However, the largest gain is already obtained for D < 100, especially when the error rate for an AWGN channel is quite high.

Figure 1.29 Symbol error probabilities $P_s$ versus diversity degree D for different modulation schemes (BPSK, QPSK, 8-PSK, 16-PSK, 16-QAM) at $10\log_{10}(E_b/N_0) = 12$ dB for i.i.d. Rayleigh fading channels
Figure 1.30 Symbol error probabilities for i.i.d. Rayleigh and Rice fading channels with D = 1, 2, 4, 10, 20: a) $P_s$ versus $E_b/N_0$ in dB for K = 10 (dashed: Rice, solid: Rayleigh), b) $P_s$ versus K at $10\log_{10}(E_b/N_0) = 12$ dB

Ergodic Error Probability for Ricean Fading Channel

Scenarios in which the transmit and receive antennas have a direct LoS connection suffer less from fading. However, even in these cases, diversity improves the system performance. The improvement naturally depends on the Rice factors of the contributing transmission paths. In Appendix A.3 the MGF for Ricean fading is derived
$$M_{|H|^2}(s) = \frac{K+1}{K+1-sP} \cdot \exp\!\left(\frac{sKP}{(K+1)-sP}\right). \tag{1.120}$$
Inserting (1.120) into (1.118) and applying the substitution $s = -c/\sin^2(\theta)$ allows a numerical computation of the ergodic error probability. Figure 1.30 shows the error rate performance of a system with D i.i.d. Rice fading channels for different Rice factors K. The average power was fixed to P = 1, leading to different variances $\sigma_H^2$ for different Rice factors K. From the left diagram it can be seen that, as expected, the Rice fading channel outperforms the Rayleigh fading channel for all D and a strong LoS part with K = 10. For D = 10 the AWGN curve can be approached for Ricean fading while the Rayleigh fading channel loses more than 5 dB. From Figure 1.30b it becomes obvious that the diversity gain decreases with growing K. For small K, that is, weak LoS components, the diversity gains known from Rayleigh fading are possible. For $K \to \infty$, the performance of the AWGN channel is reached even without diversity. Hence, diversity concepts are only an appropriate means in severe fading environments.

Unequal Power Distributions

If the power is not equally distributed among the different transmission paths, the first expression in (1.119) with different $\bar{\gamma}_\ell$ has to be used. Without loss of generality, it can be assumed that the paths are arranged in ascending order with respect to their average power $\sigma_{H,\ell}^2$ and that the average total power is restricted to $\mathrm{E}\{\|\mathbf{h}\|^2\} = \sum_{\ell} \sigma_{H,\ell}^2 = D$, resulting in an overall average SNR
$$\bar{\gamma} = \sum_{\ell=1}^{D} \bar{\gamma}_\ell = \frac{E_s}{D N_0} \cdot \sum_{\ell=1}^{D} \sigma_{H,\ell}^2 = \frac{E_s}{N_0} \tag{1.121}$$

Figure 1.31 Symbol error probabilities $P_s$ versus diversity degree D for BPSK and different power profiles (i.i.d., linear decay, exponential decay with δ = 5, 10, 100) at $10\log_{10}(E_b/N_0) = 12$ dB
after MRC.10 For a linear increase of σH2 , we obtain σH2 = δ · with the slope δ = 2/ (D + 1). The average SNRs of the contributing channels become γ =
2 Es Es /N0 · · σH2 = DN0 D D+1
with
1 ≤ ≤ D.
(1.122)
For an exponential growth, the SNRs are determined by γ = C ·
Es · exp(δ/D) DN0
with 1 ≤ ≤ D
(1.123)
with δ > 0 and C as a normalization factor. Figure 1.31 shows the results for BPSK. The best performance is obtained for i.i.d. channels. A linear decay leads to a slight degradation, while the exponential decays with δ ≥ 5 lose considerably. The reason is that, with growing δ, the power is concentrated on a single fading channel and, thus, thwarting the diversity effect. konterkariert At an error rate of Ps = 10−4 , the exponential profile with δ = 10 needs D = 20 paths for obtaining an equivalent diversity degree as in the case of i.i.d. branches with D = 5. For δ = 100, approximately 200 paths are required. 10 This
assumption coincides with the i.i.d. case where σH2 ≡ 1 holds.
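The power profiles of (1.122) and (1.123) and their effect on the error rate can be explored numerically. The following is a minimal sketch, not the computation behind Figure 1.31: it assumes the BPSK constants a = 1/π, b = π/2 and c = 1 for the MGF-based integral (these come from outside this section), and all function names are illustrative.

```python
import math

def linear_profile(D):
    """Branch powers delta * l with slope delta = 2 / (D + 1), cf. (1.122)."""
    delta = 2.0 / (D + 1)
    return [delta * l for l in range(1, D + 1)]

def exponential_profile(D, delta):
    """Branch powers C * exp(delta * l / D), cf. (1.123), normalised to sum to D."""
    # Shift the exponent by -delta to avoid overflow; the factor C absorbs the shift.
    raw = [math.exp(delta * (l / D - 1.0)) for l in range(1, D + 1)]
    scale = D / sum(raw)
    return [scale * r for r in raw]

def bpsk_error_mrc(powers, es_n0, steps=2000):
    """Midpoint rule for (1/pi) * int_0^{pi/2} prod_l s2 / (s2 + gamma_l) dtheta.

    Assumption: BPSK constants a = 1/pi, b = pi/2, c = 1 (Rayleigh fading, MRC).
    """
    D = len(powers)
    gammas = [p * es_n0 / D for p in powers]   # average SNR per branch
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        s2 = math.sin((i + 0.5) * h) ** 2
        term = 1.0
        for g in gammas:
            term *= s2 / (s2 + g)
        total += term
    return total * h / math.pi

es_n0 = 10.0 ** (12.0 / 10.0)   # 12 dB, with Es = Eb for uncoded BPSK
D = 8
for name, prof in [("i.i.d.", [1.0] * D),
                   ("linear", linear_profile(D)),
                   ("exp, delta=10", exponential_profile(D, 10.0))]:
    print(f"{name:14s} Ps = {bpsk_error_mrc(prof, es_n0):.3e}")
```

Both profiles are normalised so that the branch powers sum to D, which keeps the overall SNR in (1.121) fixed and makes the three cases directly comparable.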
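1.5.3 MRC for Correlated Diversity Branches

If the different diversity branches are mutually dependent, the factorization of the probability density in (1.113) cannot be performed anymore and solving the integral becomes a difficult task. Generally, the correlations between different channels can be expressed by a correlation matrix

\[ \boldsymbol{\Phi}_{HH} = \begin{pmatrix}
\sigma_{H_1}^2 & \sigma_{H_1}\sigma_{H_2}\rho_{1,2} & \cdots & \sigma_{H_1}\sigma_{H_D}\rho_{1,D} \\
\sigma_{H_1}\sigma_{H_2}\rho_{2,1} & \sigma_{H_2}^2 & \cdots & \sigma_{H_2}\sigma_{H_D}\rho_{2,D} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{H_1}\sigma_{H_D}\rho_{D,1} & \sigma_{H_2}\sigma_{H_D}\rho_{D,2} & \cdots & \sigma_{H_D}^2
\end{pmatrix} \tag{1.124} \]

whose elements are $\Phi_{\mu,\nu} = \mathrm{E}_H\{h_\mu h_\nu^*\} = \sigma_{H_\mu}\sigma_{H_\nu}\rho_{\mu,\nu}$. The coefficient $\rho_{\mu,\nu}$ describes the correlation between the channel coefficients $h_\mu$ and $h_\nu$, normalized to $\sigma_{H_\mu}\sigma_{H_\nu}$. Due to $\rho_{\mu,\mu} = 1$, the main diagonal consists of the powers $\sigma_{H_\mu}^2$ of the different channels.

A solution of the integral in (1.113) can be derived by the well-known Karhunen-Loève transformation (Kammeyer 2004; Mertins 1999). It linearly transforms the vector $\mathbf{h}$ with correlated elements $h_\mu$ into a vector $\mathbf{w}$ with uncorrelated elements $w_\mu$. Since the channel coefficients are assumed to be complex Gaussian distributed, the uncorrelated elements $w_\mu$ are also statistically independent. The matrix $\mathbf{U} = [\mathbf{u}_1 \cdots \mathbf{u}_D]$ describing this linear transformation simply consists of the eigenvectors $\mathbf{u}_\mu$ of $\boldsymbol{\Phi}_{HH}$. Therefore, we only have to solve the eigenvalue problem

\[ \boldsymbol{\Phi}_{HH}\,\mathbf{u}_\mu = \lambda_\mu \mathbf{u}_\mu \quad \Rightarrow \quad \det(\boldsymbol{\Phi}_{HH} - \lambda_\mu \mathbf{I}_D) = 0 \tag{1.125} \]

delivering the eigenvalue decomposition $\boldsymbol{\Phi}_{HH} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$. The diagonal matrix $\boldsymbol{\Lambda}$ contains the eigenvalues $\lambda_\mu$ of $\boldsymbol{\Phi}_{HH}$. Since $\mathbf{U}$ is a unitary matrix, the transformation is performed by

\[ \mathbf{h} = \mathbf{U} \cdot \mathbf{w} \quad \Rightarrow \quad \mathbf{w} = \mathbf{U}^{-1} \cdot \mathbf{h} = \mathbf{U}^H \cdot \mathbf{h} \tag{1.126} \]

resulting in $\mathrm{E}\{\mathbf{w}\mathbf{w}^H\} = \mathrm{E}_H\{\mathbf{U}^H \mathbf{h}\mathbf{h}^H \mathbf{U}\} = \mathbf{U}^H \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H \mathbf{U} = \boldsymbol{\Lambda}$. Hence, we obtain D independent equivalent channels with coefficients $w_\mu$ and average powers $\lambda_\mu$. For these equivalent channels, (1.118) can be directly applied again by inserting the corresponding MGFs. For Rayleigh fading channels, we obtain the expression

\[ P_s = a \int_0^b \prod_{\ell=1}^{D} \frac{\sin^2(\theta)}{\sin^2(\theta) + c \cdot \bar{\gamma}_\ell}\, d\theta \quad \text{with} \quad \bar{\gamma}_\ell = \lambda_\ell \cdot \frac{E_s}{D N_0}. \tag{1.127} \]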
Obviously, only nonzero eigenvalues contribute to the product in (1.127) and the effective diversity degree becomes smaller for correlated channels.

Constant Correlations between Equal Power Paths

In the following part, the influence of correlations between different paths is illustrated for a specific example. We assume that all channels have the same average power, that is, $\sigma_H^2 = \sigma_{H_1}^2 = \cdots = \sigma_{H_D}^2 = 1$, and that the correlation between diversity branches is the same for all pairs, that is, $\rho_{\mu,\nu} = \rho$ for all $\mu \ne \nu$. Hence, the correlation matrix becomes

\[ \boldsymbol{\Phi}_{HH} = \sigma_H^2 \cdot \begin{pmatrix}
1 & \rho & \cdots & \rho \\
\rho & 1 & \cdots & \rho \\
\vdots & \vdots & \ddots & \vdots \\
\rho & \rho & \cdots & 1
\end{pmatrix} \tag{1.128} \]

and the argument of the determinant in (1.125) has the form

\[ \begin{pmatrix}
a & b & \cdots & b \\
b & a & \cdots & b \\
\vdots & \vdots & \ddots & \vdots \\
b & b & \cdots & a
\end{pmatrix} \]

whose determinant is $(a-b)^{D-1}[a + b(D-1)]$ (Simon and Alouini 2000). Substituting $a = \sigma_H^2 - \lambda$ and $b = \sigma_H^2 \rho$ yields the expression $(\sigma_H^2 - \lambda - \sigma_H^2\rho)^{D-1}[\sigma_H^2 - \lambda + \sigma_H^2\rho(D-1)] = 0$. The solutions are the (D − 1)-fold zero $\lambda_1 = \sigma_H^2(1-\rho)$ and the single zero $\lambda_2 = \sigma_H^2(1 + \rho(D-1))$. Inserting these eigenvalues into (1.127) delivers for $\sigma_H^2 = 1$ the ergodic error probability

\[ P_s = a \int_0^b \frac{1}{\left[ 1 + \frac{c E_s/N_0}{D \sin^2(\theta)} (1-\rho) \right]^{D-1}} \cdot \frac{1}{1 + \frac{c E_s/N_0}{D \sin^2(\theta)} (1 + \rho(D-1))}\, d\theta. \tag{1.129} \]

A plausibility check shows the well-known result of i.i.d. diversity branches for ρ = 0, and the nondiversity case (D = 1) for ρ = 1 (total correlation). Figure 1.32 illustrates the corresponding results. Similar to unequal power distributions among the contributing paths, correlation decreases the benefits of diversity. For ρ = 0.4, the error rate hardly falls below $10^{-6}$ even for D = 1000. This error rate is already achieved with D = 19 for i.i.d. channels, that is, the effective diversity degree is only 19. For ρ = 1, that is, totally correlated channels, no diversity gain can be exploited.

Figure 1.32 Symbol error probabilities versus diversity degree D for BPSK, identically distributed channels and varying correlation coefficient ρ (i.i.d., ρ = 0.2, 0.4, 0.6, 0.8, 1.0) at 10 log10(Eb/N0) = 12 dB
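The eigenvalues of the constant-correlation matrix and the resulting error probability (1.127) can be checked numerically. The sketch below is illustrative only: it assumes the BPSK constants a = 1/π, b = π/2 and c = 1, which are not stated in this section, and the function names are hypothetical.

```python
import math
import numpy as np

def constant_correlation_matrix(D, rho, sigma2=1.0):
    """Correlation matrix (1.128): sigma2 on the diagonal, sigma2 * rho elsewhere."""
    return sigma2 * ((1.0 - rho) * np.eye(D) + rho * np.ones((D, D)))

def bpsk_error_correlated(phi, es_n0, steps=2000):
    """Evaluate (1.127) by the midpoint rule from the eigenvalues of phi.

    Assumption: BPSK constants a = 1/pi, b = pi/2, c = 1.
    """
    D = phi.shape[0]
    eigvals = np.linalg.eigvalsh(phi)        # real eigenvalues of a Hermitian matrix
    h = (np.pi / 2.0) / steps
    s2 = np.sin((np.arange(steps) + 0.5) * h) ** 2
    integrand = np.ones(steps)
    for lam in eigvals:
        if lam > 1e-12:                      # zero eigenvalues contribute a factor of 1
            integrand *= s2 / (s2 + lam * es_n0 / D)
    return float(integrand.sum() * h / np.pi)

D, rho = 4, 0.4
phi = constant_correlation_matrix(D, rho)
# Expect (D - 1) eigenvalues equal to 1 - rho and one equal to 1 + rho * (D - 1).
print(np.linalg.eigvalsh(phi))
print(bpsk_error_correlated(phi, 10.0 ** 1.2))
```

For ρ = 1, all but one eigenvalue vanish and the routine reduces to the single-branch Rayleigh result, which mirrors the plausibility check stated above.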
1.6 Summary

This chapter introduced the basics of digital communications over wireless channels. After presenting the general system model, the main characteristics of mobile radio channels were explained. Since they can be described by stochastic processes, their statistical properties like delay spread and coherence bandwidth, as well as Doppler bandwidth and coherence time, were derived from the power delay profile and the Doppler power spectrum, respectively. The relation of these quantities to the bandwidth and symbol duration of the transmitted signal determines whether a channel is time or frequency selective. The description of the channel model in the equivalent baseband and its extension to MIMO channels with multiple inputs and outputs closed that section. The next two sections presented detection principles and the most important linear modulation schemes, including a performance analysis. With increasing alphabet size, both spectral efficiency and error probability increase, so that a trade-off has to be found. Finally, the concept of diversity was discussed and its significant benefits were illustrated. Several combining techniques were mentioned, with the main focus on maximum ratio combining. Essentially, diversity reduces the variations of the SNR at the output of the combiner, leading to a smaller variance of the instantaneous error probability.
2 Information Theory

This chapter briefly introduces Shannon's information theory, which was founded in 1948 and represents the basis for all communication systems. Although this theory is used here only with respect to communication systems, it can be applied in a much broader context, for example, for the analysis of stock markets (Sloane and Wyner 1993). Furthermore, the emphasis is on the channel coding theorem; source coding and cryptography are not addressed. The channel coding theorem delivers ultimate bounds on the efficiency of communication systems. Hence, we can evaluate the performance of practical systems as well as encoding and decoding algorithms. However, the theorem is not constructive, that is, it does not show us how to design good codes. Nevertheless, practical codes have already been found that approach the limits predicted by Shannon (ten Brink 2000b).

This chapter starts with some definitions concerning information, entropy, and redundancy for scalars as well as vectors. On the basis of these definitions, Shannon's channel coding theorem with channel capacity, Gallager exponent, and cutoff rate will be presented. The meaning of these quantities is illustrated for the Additive White Gaussian Noise (AWGN) and flat fading channels. Next, the general method to calculate capacity will be extended to vector channels with multiple inputs and outputs. Finally, some information-theoretic aspects of multiuser systems are explained.
2.1 Basic Definitions

2.1.1 Information, Redundancy, and Entropy

In order to obtain a tool for evaluating communication systems, the term information must be mathematically defined and quantified. A random process $\mathcal{X}$ that can take on values out of a finite alphabet $\mathbb{X}$ consisting of elements $X_\mu$ with probabilities $\Pr\{X_\mu\}$ is assumed. Intuitively, the information $I(X_\mu)$ of a symbol $X_\mu$ should fulfill the following conditions.

1. The information of an event is always nonnegative, that is, $I(X_\mu) \ge 0$.
2. The information of an event $X_\mu$ depends on its probability, that is, $I(X_\mu) = f(\Pr\{X_\mu\})$. Additionally, the information of a rare event should be larger than that of a frequently occurring event.
3. For statistically independent events $X_\mu$ and $X_\nu$ with $\Pr\{X_\mu, X_\nu\} = \Pr\{X_\mu\} \Pr\{X_\nu\}$, the common information of both events should be the sum of the individual contents, that is, $I(X_\mu, X_\nu) = I(X_\mu) + I(X_\nu)$.
Combining conditions two and three leads to the relation $f(\Pr\{X_\mu\} \cdot \Pr\{X_\nu\}) = f(\Pr\{X_\mu\}) + f(\Pr\{X_\nu\})$. The only function that fulfills this condition is the logarithm. Taking care of $I(X_\mu) \ge 0$, the information of an event or a symbol $X_\mu$ is defined by (Shannon 1948)

\[ I(X_\mu) = \log_2 \frac{1}{\Pr\{X_\mu\}} = -\log_2 \Pr\{X_\mu\}. \tag{2.1} \]
Since digital communication systems are based on the binary representation of symbols, the logarithm to base 2 is generally used and $I(X_\mu)$ is measured in bits. However, different definitions exist using, for example, the natural logarithm (nat) or the logarithm to base 10 (Hartley). The average information of the process $\mathcal{X}$ is called entropy and is defined by

\[ \bar{I}(\mathcal{X}) = \mathrm{E}_X\{I(X_\mu)\} = -\sum_{\mu} \Pr\{X_\mu\} \cdot \log_2 \Pr\{X_\mu\}. \tag{2.2} \]

It can be shown that the entropy becomes maximum for equally probable symbols $X_\mu$. In this case, the entropy of an alphabet consisting of $2^k$ elements equals

\[ \bar{I}_{\max}(\mathcal{X}) = \sum_{\mu} 2^{-k} \cdot \log_2 2^k = \log_2 |\mathbb{X}| = k \ \text{bit}. \tag{2.3} \]

Generally, $0 \le \bar{I}(\mathcal{X}) \le \log_2 |\mathbb{X}|$ holds. For an alphabet consisting of only two elements with probabilities $\Pr\{X_1\} = P_e$ and $\Pr\{X_2\} = 1 - P_e$, we obtain the binary entropy function

\[ \bar{I}_2(P_e) = -P_e \cdot \log_2(P_e) - (1 - P_e) \cdot \log_2(1 - P_e). \tag{2.4} \]
This is depicted in Figure 2.1. Obviously, the entropy reaches its maximum $\bar{I}_{\max} = 1$ bit for the highest uncertainty at $\Pr\{X_1\} = \Pr\{X_2\} = P_e = 0.5$. It is zero for $P_e = 0$ and $P_e = 1$ because the symbols are already known a priori and do not contain any information. Moreover, entropy is a concave function with respect to $P_e$. This is a very important property that also holds for more than two variables.

Figure 2.1 Binary entropy function ($\bar{I}_2(P_e)$ versus $P_e$)

A practical interpretation of the entropy can be obtained from the rate distortion theory (Cover and Thomas 1991). It states that the minimum average number of bits required for representing the events x of a process $\mathcal{X}$ without losing information is exactly its entropy $\bar{I}(\mathcal{X})$. Encoding schemes that use fewer bits cause distortions. Finding powerful schemes that need as few bits as possible to represent a random variable is generally nontrivial and subject to source or entropy coding. The difference between the average number $\bar{m}$ of bits a particular entropy encoder needs and the entropy is called redundancy

\[ R = \bar{m} - \bar{I}(\mathcal{X}); \qquad r = \frac{\bar{m} - \bar{I}(\mathcal{X})}{\bar{I}(\mathcal{X})}. \tag{2.5} \]

In (2.5), R and r denote the absolute and the relative redundancy, respectively. Well-known examples of entropy coding schemes are the Huffman and Fano codes, run-length codes and Lempel-Ziv codes (Bell et al. 1990; Viterbi and Omura 1979; Ziv and Lempel 1977).
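The definitions (2.2), (2.4) and (2.5) translate directly into a few lines of code. The following sketch is for illustration only; the function names and the example code-word lengths are assumptions and do not stem from the text.

```python
import math

def entropy(probs):
    """Entropy (2.2) in bits; terms with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def binary_entropy(pe):
    """Binary entropy function (2.4)."""
    return entropy([pe, 1.0 - pe])

def redundancy(probs, code_lengths):
    """Absolute and relative redundancy (2.5) of a code with the given word lengths."""
    m_bar = sum(p * l for p, l in zip(probs, code_lengths))  # average code-word length
    h = entropy(probs)
    return m_bar - h, (m_bar - h) / h

# A dyadic source is matched exactly by a prefix code with lengths -log2(p).
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs), redundancy(probs, [1, 2, 3, 3]))

# For a non-dyadic source, the same word lengths leave some redundancy.
print(redundancy([0.4, 0.3, 0.2, 0.1], [1, 2, 3, 3]))
```

The dyadic example shows the case in which an entropy encoder reaches the entropy exactly, so that both redundancies vanish.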
2.1.2 Conditional, Joint and Mutual Information

Since the scope of this work is the communication between two or more subscribers, at least two processes $\mathcal{X}$ and $\mathcal{Y}$ with symbols $X_\mu \in \mathbb{X}$ and $Y_\nu \in \mathbb{Y}$, respectively, have to be considered. The first process represents the transmitted data, the second the corresponding received symbols. For the moment, the channel is supposed to have discrete input and output symbols and it can be statistically described by the joint probabilities $\Pr\{X_\mu, Y_\nu\}$ or, equivalently, by the conditional probabilities $\Pr\{Y_\nu \mid X_\mu\}$ and $\Pr\{X_\mu \mid Y_\nu\}$ and the a priori probabilities $\Pr\{X_\mu\}$ and $\Pr\{Y_\nu\}$. Following the definitions given in the previous section, the joint information of two events $X_\mu \in \mathbb{X}$ and $Y_\nu \in \mathbb{Y}$ is

\[ I(X_\mu, Y_\nu) = \log_2 \frac{1}{\Pr\{X_\mu, Y_\nu\}} = -\log_2 \Pr\{X_\mu, Y_\nu\}. \tag{2.6} \]

Consequently, the joint entropy of both processes is given by

\[ \bar{I}(\mathcal{X}, \mathcal{Y}) = \mathrm{E}_{X,Y}\{I(X_\mu, Y_\nu)\} = -\sum_{\mu} \sum_{\nu} \Pr\{X_\mu, Y_\nu\} \cdot \log_2 \Pr\{X_\mu, Y_\nu\}. \tag{2.7} \]
Figure 2.2 illustrates the relationships between different kinds of entropies. Besides the terms I¯(X ), I¯(Y), and I¯(X , Y) already defined, three additional important entropies exist.
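Figure 2.2 Illustration of entropies for two processes (diagram relating $\bar{I}(\mathcal{X}, \mathcal{Y})$, $\bar{I}(\mathcal{X})$, $\bar{I}(\mathcal{Y})$, $\bar{I}(\mathcal{X} \mid \mathcal{Y})$, $\bar{I}(\mathcal{X}; \mathcal{Y})$, and $\bar{I}(\mathcal{Y} \mid \mathcal{X})$)

At the receiver, y is totally known and the term $\bar{I}(\mathcal{X} \mid \mathcal{Y})$ represents the information of $\mathcal{X}$ that is not part of $\mathcal{Y}$. Therefore, the equivocation $\bar{I}(\mathcal{X} \mid \mathcal{Y})$ represents the information that was lost during transmission

\[ \bar{I}(\mathcal{X} \mid \mathcal{Y}) = \bar{I}(\mathcal{X}, \mathcal{Y}) - \bar{I}(\mathcal{Y}) = \mathrm{E}_{X,Y}\{-\log_2 \Pr\{X_\mu \mid Y_\nu\}\} = -\sum_{\mu} \sum_{\nu} \Pr\{X_\mu, Y_\nu\} \cdot \log_2 \Pr\{X_\mu \mid Y_\nu\}. \tag{2.8} \]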
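From Figure 2.2, we recognize that $\bar{I}(\mathcal{X} \mid \mathcal{Y})$ equals the difference between the joint entropy $\bar{I}(\mathcal{X}, \mathcal{Y})$ and the sink's entropy $\bar{I}(\mathcal{Y})$. Equivalently, we can write $\bar{I}(\mathcal{X}, \mathcal{Y}) = \bar{I}(\mathcal{X} \mid \mathcal{Y}) + \bar{I}(\mathcal{Y})$, leading to the general chain rule for entropies.

Chain Rule for Entropies

In Appendix B.1, it is shown that the entropy's chain rule (Cover and Thomas 1991)

\[ \bar{I}(\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n) = \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1) \tag{2.9} \]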
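holds for a set of random variables $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$ belonging to a joint probability $\Pr\{X_1, X_2, \ldots, X_n\}$.

In contrast, $\bar{I}(\mathcal{Y} \mid \mathcal{X})$ represents information of $\mathcal{Y}$ that is not contained in $\mathcal{X}$. Therefore, it cannot stem from the source $\mathcal{X}$ and is termed irrelevance.

\[ \bar{I}(\mathcal{Y} \mid \mathcal{X}) = \bar{I}(\mathcal{X}, \mathcal{Y}) - \bar{I}(\mathcal{X}) = \mathrm{E}_{Y,X}\{-\log_2 \Pr\{Y_\nu \mid X_\mu\}\} = -\sum_{\mu} \sum_{\nu} \Pr\{X_\mu, Y_\nu\} \cdot \log_2 \Pr\{Y_\nu \mid X_\mu\} \tag{2.10} \]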
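Naturally, the average information of a process $\mathcal{X}$ cannot be increased by some knowledge about $\mathcal{Y}$ so that

\[ \bar{I}(\mathcal{X} \mid \mathcal{Y}) \le \bar{I}(\mathcal{X}) \tag{2.11} \]

holds. Equality in (2.11) is obtained for statistically independent processes. The most important entropy $\bar{I}(\mathcal{X}; \mathcal{Y})$ is called mutual information and describes the average information common to $\mathcal{X}$ and $\mathcal{Y}$. According to Figure 2.2, it can be determined by

\[ \bar{I}(\mathcal{X}; \mathcal{Y}) = \bar{I}(\mathcal{X}) - \bar{I}(\mathcal{X} \mid \mathcal{Y}) = \bar{I}(\mathcal{Y}) - \bar{I}(\mathcal{Y} \mid \mathcal{X}) = \bar{I}(\mathcal{X}) + \bar{I}(\mathcal{Y}) - \bar{I}(\mathcal{X}, \mathcal{Y}). \tag{2.12} \]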
Mutual information is the term that has to be maximized in order to design a communication system with the highest possible spectral efficiency. The maximum mutual information that can be obtained is called channel capacity and will be derived for special cases in subsequent sections. Inserting (2.2) and (2.7) into (2.12) yields

\[ \bar{I}(\mathcal{X}; \mathcal{Y}) = \sum_{\mu} \sum_{\nu} \Pr\{X_\mu, Y_\nu\} \cdot \log_2 \frac{\Pr\{X_\mu, Y_\nu\}}{\Pr\{X_\mu\} \cdot \Pr\{Y_\nu\}} = \sum_{\mu} \Pr\{X_\mu\} \sum_{\nu} \Pr\{Y_\nu \mid X_\mu\} \log_2 \frac{\Pr\{Y_\nu \mid X_\mu\}}{\sum_{l} \Pr\{Y_\nu \mid X_l\} \Pr\{X_l\}}. \tag{2.13} \]
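For a discrete channel, (2.13) can be evaluated directly from the a priori probabilities and the transition probabilities. The following sketch uses a binary symmetric channel as an assumed example; the function name and the list-of-lists layout of the transition matrix are illustrative choices.

```python
import math

def mutual_information(p_x, p_y_given_x):
    """Mutual information (2.13) in bits.

    p_x[mu] = Pr{X_mu}, p_y_given_x[mu][nu] = Pr{Y_nu | X_mu}.
    """
    n_y = len(p_y_given_x[0])
    # Pr{Y_nu} = sum_l Pr{Y_nu | X_l} Pr{X_l}
    p_y = [sum(p_x[m] * p_y_given_x[m][n] for m in range(len(p_x)))
           for n in range(n_y)]
    total = 0.0
    for m, px in enumerate(p_x):
        for n in range(n_y):
            pyx = p_y_given_x[m][n]
            if px > 0.0 and pyx > 0.0:
                total += px * pyx * math.log2(pyx / p_y[n])
    return total

pe = 0.1                                  # assumed crossover probability
bsc = [[1.0 - pe, pe], [pe, 1.0 - pe]]
print(mutual_information([0.5, 0.5], bsc))   # uniform input
print(mutual_information([0.9, 0.1], bsc))   # skewed input carries less information
```

For the symmetric channel with uniform input, the result coincides with one minus the binary entropy function (2.4) evaluated at the crossover probability.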
As (2.13) shows, mutual information depends on the conditional probabilities $\Pr\{Y_\nu \mid X_\mu\}$ determined by the channel and the a priori probabilities $\Pr\{X_\mu\}$. Hence, the only parameter that can be optimized for a given channel in order to maximize the mutual information is the statistics of the input alphabet.

Chain Rule for Information

If the mutual information depends on a signal or parameter z, (2.12) changes to $\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Z}) = \bar{I}(\mathcal{X} \mid \mathcal{Z}) - \bar{I}(\mathcal{X} \mid \mathcal{Y}, \mathcal{Z})$. This leads directly to the general chain rule for information (Cover and Thomas 1991) (cf. Appendix B.2)

\[ \bar{I}(\mathcal{X}_1, \ldots, \mathcal{X}_n; \mathcal{Z}) = \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i; \mathcal{Z} \mid \mathcal{X}_{i-1}, \ldots, \mathcal{X}_1). \tag{2.14} \]
For only two random variables X and Y, (2.14) becomes I¯(X , Y; Z) = I¯(X ; Z) + I¯(Y; Z | X ) = I¯(Y; Z) + I¯(X ; Z | Y) .
(2.15)
From (2.15), we learn that first detecting x from z and subsequently y, now for known x, leads to the same mutual information as starting with y and proceeding with the detection of x. As a consequence, the detection order of x and y has no influence from the information theoretic point of view. However, this presupposes an error-free detection of the first signal that usually cannot be ensured in practical systems, resulting in error propagation.

Data Processing Theorem

With (2.14), the data processing theorem can now be derived. Imagine a Markovian chain $\mathcal{X} \to \mathcal{Y} \to \mathcal{Z}$ of three random processes $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$, that is, $\mathcal{Y}$ depends on $\mathcal{X}$ and $\mathcal{Z}$ depends on $\mathcal{Y}$, but $\mathcal{X}$ and $\mathcal{Z}$ are mutually independent for known y. Hence, the entire information about $\mathcal{X}$ contained in $\mathcal{Z}$ is delivered by $\mathcal{Y}$ and $\bar{I}(\mathcal{X}; \mathcal{Z} \mid \mathcal{Y}) = 0$ holds. With this assumption, the data processing theorem

\[ \bar{I}(\mathcal{X}; \mathcal{Z}) \le \bar{I}(\mathcal{X}; \mathcal{Y}) \quad \text{and} \quad \bar{I}(\mathcal{X}; \mathcal{Z}) \le \bar{I}(\mathcal{Y}; \mathcal{Z}) \tag{2.16} \]

is derived in Appendix B.3. If $\mathcal{Z}$ is a function of $\mathcal{Y}$, (2.16) states that information about $\mathcal{X}$ obtained from $\mathcal{Y}$ cannot be increased by some processing of $\mathcal{Y}$ leading to $\mathcal{Z}$. Equality holds if $\mathcal{Z}$ is a sufficient statistic of $\mathcal{Y}$, which means that $\mathcal{Z}$ contains exactly the same information about $\mathcal{X}$ as $\mathcal{Y}$, that is, $\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Z}) = \bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Y}) = 0$ holds.
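The data processing theorem (2.16) can be illustrated with a cascade of two discrete channels. The sketch below is a numerical illustration, not a proof; the two example channels (a binary symmetric channel followed by a hypothetical channel with an erasure-like third output) and all names are assumptions.

```python
import math

def mutual_information(p_x, p_y_given_x):
    """Mutual information in bits for input pmf p_x and transition matrix p_y_given_x."""
    n_y = len(p_y_given_x[0])
    p_y = [sum(p_x[m] * p_y_given_x[m][n] for m in range(len(p_x)))
           for n in range(n_y)]
    total = 0.0
    for m, px in enumerate(p_x):
        for n in range(n_y):
            pyx = p_y_given_x[m][n]
            if px > 0.0 and pyx > 0.0:
                total += px * pyx * math.log2(pyx / p_y[n])
    return total

def output_pmf(p_x, p_y_given_x):
    """Pr{Y_nu} = sum_mu Pr{X_mu} Pr{Y_nu | X_mu}."""
    return [sum(p_x[m] * p_y_given_x[m][n] for m in range(len(p_x)))
            for n in range(len(p_y_given_x[0]))]

def cascade(p_y_given_x, p_z_given_y):
    """Transition probabilities Pr{Z | X} of the Markov chain X -> Y -> Z."""
    return [[sum(row[j] * p_z_given_y[j][k] for j in range(len(row)))
             for k in range(len(p_z_given_y[0]))]
            for row in p_y_given_x]

p_x = [0.3, 0.7]
x_to_y = [[0.9, 0.1], [0.1, 0.9]]              # binary symmetric channel
y_to_z = [[0.8, 0.15, 0.05], [0.05, 0.15, 0.8]]  # three outputs, middle one erasure-like

i_xy = mutual_information(p_x, x_to_y)
i_yz = mutual_information(output_pmf(p_x, x_to_y), y_to_z)
i_xz = mutual_information(p_x, cascade(x_to_y, y_to_z))
print(i_xz, "<=", i_xy, "and", i_xz, "<=", i_yz)
```

Replacing the second channel by an identity mapping yields equality in the first relation of (2.16), since the processing then loses nothing.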
2.1.3 Extension for Continuous Signals

If the random process $\mathcal{X}$ consists of continuously distributed variables, the probabilities $\Pr\{X_\mu\}$ defined earlier have to be replaced by probability densities $p_X(x)$. Consequently, all sums become integrals and the differential entropy is defined by

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = -\int_{-\infty}^{\infty} p_X(x) \cdot \log_2 p_X(x)\, dx = \mathrm{E}\{-\log_2 p_X(x)\}. \tag{2.17} \]

Contrary to the earlier definition, the differential entropy is not restricted to be nonnegative. Hence, the aforementioned interpretation is not valid anymore. Nevertheless, $\bar{I}_{\mathrm{diff}}(\mathcal{X})$ can still be used for the calculation of mutual information and channel capacity, which will be demonstrated in Section 2.2. For a real random process $\mathcal{X}$ with a constant probability density $p_X(x) = 1/(2a)$ in the range $|x| \le a$, a being a positive real constant, the differential entropy has the value

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \int_{-a}^{a} \frac{1}{2a} \cdot \log_2(2a)\, dx = \log_2(2a). \tag{2.18} \]

With reference to a real Gaussian distributed process with mean $\mu_X$ and variance $\sigma_X^2$, we obtain

\[ p_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \cdot \exp\left( -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right) \]

and

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \frac{1}{2} \cdot \log_2(2\pi e \sigma_X^2). \tag{2.19a} \]

If the random process is circularly symmetric complex, that is, real and imaginary parts are independent with powers $\sigma_{X'}^2 = \sigma_{X''}^2 = \sigma_X^2/2$, the Gaussian probability density function (PDF) has the form

\[ p_X(x) = p_{X'}(x') \cdot p_{X''}(x'') = \frac{1}{\pi\sigma_X^2} \cdot \exp\left( -\frac{|x - \mu_X|^2}{\sigma_X^2} \right). \]

In this case, the entropy is

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \log_2(\pi e \sigma_X^2). \tag{2.19b} \]
Comparing (2.19a) and (2.19b), we observe that the differential entropy of a complex Gaussian random variable equals the joint entropy of two independent real Gaussian variables with halved variance.
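The closed-form results (2.18), (2.19a) and (2.19b) can be cross-checked against a direct numerical evaluation of (2.17). The sketch below uses a plain midpoint rule over a truncated integration range; the step count, the truncation at twelve standard deviations and the function names are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Real Gaussian density with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def diff_entropy_numeric(pdf, lo, hi, steps=20000):
    """Midpoint-rule approximation of (2.17) on the interval [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = pdf(lo + (i + 0.5) * h)
        if p > 0.0:
            total -= p * math.log2(p) * h
    return total

def diff_entropy_real_gaussian(sigma2):
    """Closed form (2.19a)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma2)

def diff_entropy_complex_gaussian(sigma2):
    """Closed form (2.19b)."""
    return math.log2(math.pi * math.e * sigma2)

sigma2 = 2.0
span = 12.0 * math.sqrt(sigma2)
print(diff_entropy_numeric(lambda x: gaussian_pdf(x, 0.0, sigma2), -span, span),
      diff_entropy_real_gaussian(sigma2))

# A uniform density on |x| <= a with 2a < 1 has negative differential entropy.
a = 0.25
print(diff_entropy_numeric(lambda x: 1.0 / (2.0 * a), -a, a), math.log2(2.0 * a))
```

The second example makes the remark after (2.17) concrete: unlike the entropy of a discrete process, the differential entropy can become negative.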
2.1.4 Extension for Vectors and Matrices

When dealing with vector channels that have multiple inputs and outputs, we use vector notations as described in Section 1.2.4. Therefore, we stack n random variables $x_1, \ldots, x_n$ of the process $\mathcal{X}$ into the vector $\mathbf{x}$. With the definition of the joint entropy in (2.7), we obtain

\[ \bar{I}(\mathcal{X}) = -\sum_{\mathbf{x} \in \mathbb{X}^n} \Pr\{\mathbf{x}\} \cdot \log_2 \Pr\{\mathbf{x}\} = -\sum_{\nu_1=1}^{|\mathbb{X}|} \cdots \sum_{\nu_n=1}^{|\mathbb{X}|} \Pr\{X_{\nu_1}, \cdots, X_{\nu_n}\} \cdot \log_2 \Pr\{X_{\nu_1}, \cdots, X_{\nu_n}\}. \tag{2.20} \]
Applying the chain rule for entropies in (2.9) recursively leads to an upper bound

\[ \bar{I}(\mathcal{X}) = \sum_{\mu=1}^{n} \bar{I}(\mathcal{X}_\mu \mid x_1, \cdots, x_{\mu-1}) \le \sum_{\mu=1}^{n} \bar{I}(\mathcal{X}_\mu) \tag{2.21} \]
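where equality holds exactly for statistically independent processes $\mathcal{X}_\mu$. Following the previous subsection, the differential entropy for real random vectors becomes

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = -\int_{\mathbb{R}^n} p_X(\mathbf{x}) \cdot \log_2 p_X(\mathbf{x})\, d\mathbf{x} = \mathrm{E}\{-\log_2 p_X(\mathbf{x})\}. \tag{2.22} \]

Under the restriction $\|\mathbf{x}\| \le a$, a being a positive real constant, the entropy is maximized for a uniform distribution. Analogous to Section 2.1.1, we obtain

\[ p_X(\mathbf{x}) = \begin{cases} 1/V_n(a) & \text{for } \|\mathbf{x}\| \le a \\ 0 & \text{else} \end{cases} \quad \text{with} \quad V_n(a) = \frac{2\pi^{n/2} a^n}{n\,\Gamma(n/2)}, \tag{2.23} \]

that is, the PDF is constant inside a ball of radius a in the n-dimensional space, whose volume is $V_n(a)$. The gamma function in (2.23) is defined by $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$ (Gradshteyn 2000). It becomes $\Gamma(n) = (n-1)!$ and $\Gamma(n + \tfrac{1}{2}) = (2n)!\sqrt{\pi}/(n! \cdot 2^{2n})$ for $n = 1, 2, 3, \ldots$. The expectation in (2.22) now delivers

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \log_2\left( \frac{2\pi^{n/2} a^n}{n\,\Gamma(n/2)} \right). \tag{2.24} \]

In contrast, for a given covariance matrix $\boldsymbol{\Phi}_{XX} = \mathrm{E}_X\{\mathbf{x}\mathbf{x}^T\}$ of a real-valued process $\mathcal{X}$, the maximum entropy is achieved by a multivariate Gaussian density

\[ p_X(\mathbf{x}) = \frac{1}{\sqrt{\det(2\pi\boldsymbol{\Phi}_{XX})}} \cdot \exp\left( -\frac{\mathbf{x}^T \boldsymbol{\Phi}_{XX}^{-1} \mathbf{x}}{2} \right) \tag{2.25} \]

and amounts to

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \frac{1}{2} \cdot \log_2 \det(2\pi e \boldsymbol{\Phi}_{XX}). \tag{2.26} \]

For complex elements of $\mathbf{x}$ with the same variance $\sigma_X^2$, the Gaussian density becomes

\[ p_X(\mathbf{x}) = \frac{1}{\det(\pi\boldsymbol{\Phi}_{XX})} \cdot \exp\left( -\mathbf{x}^H \boldsymbol{\Phi}_{XX}^{-1} \mathbf{x} \right) \tag{2.27} \]

with $\boldsymbol{\Phi}_{XX} = \mathrm{E}_X\{\mathbf{x}\mathbf{x}^H\}$ and the corresponding entropy has the form

\[ \bar{I}_{\mathrm{diff}}(\mathcal{X}) = \log_2 \det(\pi e \boldsymbol{\Phi}_{XX}), \tag{2.28} \]

if the real and imaginary parts are statistically independent.

Figure 2.3 Simple model of a communication system (d → FEC encoder with Rc = k/n → x → channel → y → FEC decoder → d̂, annotated with $\bar{I}(\mathcal{X})$, $\bar{I}(\mathcal{Y})$, $\bar{I}(\mathcal{X} \mid \mathcal{Y})$, $\bar{I}(\mathcal{X}; \mathcal{Y})$, and $\bar{I}(\mathcal{Y} \mid \mathcal{X})$)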
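The Gaussian entropies (2.26) and (2.28) only require the determinant of the covariance matrix. The following sketch evaluates them with a numerically stable log-determinant; the function names and the example covariance matrix are assumptions for illustration.

```python
import numpy as np

def diff_entropy_real_gaussian_vector(cov):
    """Differential entropy (2.26) in bits of a real Gaussian vector."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)       # natural log of |det(cov)|
    # log2 det(2 pi e Phi) = n * log2(2 pi e) + log2 det(Phi)
    return float(0.5 * (n * np.log2(2.0 * np.pi * np.e) + logdet / np.log(2.0)))

def diff_entropy_complex_gaussian_vector(cov):
    """Differential entropy (2.28) in bits of a circularly symmetric complex Gaussian vector."""
    cov = np.asarray(cov, dtype=complex)
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return float(n * np.log2(np.pi * np.e) + logdet / np.log(2.0))

correlated = [[1.0, 0.8], [0.8, 1.0]]
print(diff_entropy_real_gaussian_vector(correlated))   # correlated components
print(diff_entropy_real_gaussian_vector(np.eye(2)))    # independent components
```

The correlated example yields a smaller entropy than the independent one with the same variances, in line with the upper bound (2.21) for independent components.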
2.2 Channel Coding Theorem for SISO Channels

2.2.1 Channel Capacity

This section describes the channel capacity and the channel coding theorem defined by Shannon. Figure 2.3 depicts the simple system model. A Forward Error Correction (FEC) encoder, which is explained in more detail in Chapter 3, maps k data symbols represented by the vector $\mathbf{d}$ onto a vector $\mathbf{x}$ of length n > k. The ratio $R_c = k/n$ is termed code rate and determines the portion of information in the whole message $\mathbf{x}$. The vector $\mathbf{x}$ is transmitted over the channel, resulting in the output vector $\mathbf{y}$ of the same length n. Finally, the FEC decoder tries to recover $\mathbf{d}$ on the basis of the observation $\mathbf{y}$ and the knowledge of the code's structure.

As already mentioned in Section 2.1.2, mutual information $\bar{I}(\mathcal{X}; \mathcal{Y})$ is the crucial parameter that has to be maximized. According to (2.13), it only depends on the conditional probabilities $\Pr\{Y_\nu \mid X_\mu\}$ and the a priori probabilities $\Pr\{X_\mu\}$. Since the $\Pr\{Y_\nu \mid X_\mu\}$ are given by the channel characteristics and can hardly be influenced, mutual information can only be maximized by properly adjusting $\Pr\{X_\mu\}$. Therefore, the channel capacity C describes the maximum mutual information

\[ C = \sup_{\Pr\{\mathcal{X}\}} \sum_{\mu} \sum_{\nu} \Pr\{Y_\nu \mid X_\mu\} \cdot \Pr\{X_\mu\} \cdot \log_2 \frac{\Pr\{Y_\nu \mid X_\mu\}}{\sum_{l} \Pr\{Y_\nu \mid X_l\} \cdot \Pr\{X_l\}} \tag{2.29} \]
obtained for optimally choosing the source statistics $\Pr\{\mathcal{X}\}$ (see footnote 1). It can be shown that mutual information is a concave function with respect to $\Pr\{\mathcal{X}\}$. Hence, only one maximum exists, which can be determined by the sufficient conditions

\[ \frac{\partial C}{\partial \Pr\{X_\mu\}} = 0 \quad \forall \quad X_\mu \in \mathbb{X}. \tag{2.30} \]
Owing to the use of the logarithm to base 2, C is measured in (bits/channel use) or (bits/s/Hz). In many practical systems, the statistics of the input alphabet is fixed or the effort for optimizing it is prohibitively high. Therefore, uniformly distributed input symbols are assumed and the expression

\[ \bar{I}(\mathcal{X}; \mathcal{Y}) = \log_2 |\mathbb{X}| + \frac{1}{|\mathbb{X}|} \cdot \sum_{\mu} \sum_{\nu} \Pr\{Y_\nu \mid X_\mu\} \cdot \log_2 \frac{\Pr\{Y_\nu \mid X_\mu\}}{\sum_{l} \Pr\{Y_\nu \mid X_l\}} \tag{2.31} \]

is called channel capacity although the maximization with respect to $\Pr\{\mathcal{X}\}$ is missing. The first term in (2.31) represents $\bar{I}(\mathcal{X})$ and the second the negative equivocation $-\bar{I}(\mathcal{X} \mid \mathcal{Y})$.

Channel Coding Theorem

The famous channel coding theorem of Shannon states that at least one code of rate $R_c \le C$ exists for which an error-free transmission can be ensured. The theorem assumes perfect Maximum A Posteriori (MAP) or maximum likelihood decoding (cf. Section 1.3) and the code's length may be arbitrarily long. However, the theorem does not show a way to find this code. For $R_c > C$, it can be shown that an error-free transmission is impossible even with tremendous effort (Cover and Thomas 1991).

For continuously distributed signals, the probabilities in (2.29) have to be replaced by corresponding densities and the sums by integrals. In the case of a discrete signal alphabet and a continuous channel output, we obtain the expression

\[ C = \sup_{\Pr\{\mathcal{X}\}} \sum_{\mu} \int_{\mathbb{Y}} p_{Y|X_\mu}(y) \cdot \Pr\{X_\mu\} \cdot \log_2 \frac{p_{Y|X_\mu}(y)}{\sum_{l} p_{Y|X_l}(y) \cdot \Pr\{X_l\}}\, dy. \tag{2.32} \]
Examples of capacities for different channels and input alphabets are presented later in this chapter.
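For a channel with binary input, the supremum in (2.29) can be approximated by a simple search over the input distribution. The sketch below uses a brute-force grid search, which is an illustrative shortcut and not a method from the text; the two example channels (a binary symmetric channel and an asymmetric channel in which only one input is disturbed) and the function names are assumptions.

```python
import math

def mutual_information(p_x, p_y_given_x):
    """Mutual information in bits, cf. (2.13)."""
    n_y = len(p_y_given_x[0])
    p_y = [sum(p_x[m] * p_y_given_x[m][n] for m in range(len(p_x)))
           for n in range(n_y)]
    total = 0.0
    for m, px in enumerate(p_x):
        for n in range(n_y):
            pyx = p_y_given_x[m][n]
            if px > 0.0 and pyx > 0.0:
                total += px * pyx * math.log2(pyx / p_y[n])
    return total

def capacity_binary_input(p_y_given_x, grid=10001):
    """Grid search over Pr{X_1} in [0, 1]; returns (capacity estimate, best Pr{X_1})."""
    best_i, best_p = 0.0, 0.0
    for k in range(grid):
        p0 = k / (grid - 1)
        i = mutual_information([p0, 1.0 - p0], p_y_given_x)
        if i > best_i:
            best_i, best_p = i, p0
    return best_i, best_p

bsc = [[0.9, 0.1], [0.1, 0.9]]
asym = [[1.0, 0.0], [0.5, 0.5]]      # first input is received without errors
print(capacity_binary_input(bsc))    # symmetric channel: uniform input is optimal
print(capacity_binary_input(asym))   # asymmetric channel: optimum is not uniform
```

The asymmetric example illustrates why (2.31), which fixes a uniform input, is in general only a lower bound on the capacity defined in (2.29).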
2.2.2 Cutoff Rate

Up to this point, no expression addressing the error rate attainable for a certain code rate $R_c$ and codeword length n has been derived. This drawback can be overcome with the cutoff rate and the corresponding Bhattacharyya bound. Valid codewords are denoted by $\mathbf{x}$ and the code, representing the set of all codewords, by $\Gamma$. Furthermore, assuming that $\mathbf{x} \in \Gamma$ of length n was transmitted, its decision region $\mathbb{D}(\mathbf{x})$ is defined such that the decoder decides correctly for all received vectors $\mathbf{y} \in \mathbb{D}(\mathbf{x})$. For a discrete output alphabet of the channel, the word error probability $P_w(\mathbf{x})$ of $\mathbf{x}$ can be expressed by

\[ P_w(\mathbf{x}) = \Pr\{\mathcal{Y} \notin \mathbb{D}(\mathbf{x}) \mid \mathbf{x}\} = \sum_{\mathbf{y} \notin \mathbb{D}(\mathbf{x})} \Pr\{\mathbf{y} \mid \mathbf{x}\}. \tag{2.33} \]

Footnote 1: If the maximum capacity is really reached by a certain distribution, the supremum can be replaced by the maximum operator.
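Since the decision regions $\mathbb{D}(\mathbf{x})$ for different $\mathbf{x}$ are disjoint, we can alternatively sum the probabilities $\Pr\{\mathcal{Y} \in \mathbb{D}(\mathbf{x}') \mid \mathbf{x}\}$ of all competing codewords $\mathbf{x}' \ne \mathbf{x}$ and (2.33) can be rewritten as

\[ P_w(\mathbf{x}) = \sum_{\mathbf{x}' \in \Gamma \setminus \{\mathbf{x}\}} \Pr\{\mathcal{Y} \in \mathbb{D}(\mathbf{x}') \mid \mathbf{x}\} = \sum_{\mathbf{x}' \in \Gamma \setminus \{\mathbf{x}\}} \sum_{\mathbf{y} \in \mathbb{D}(\mathbf{x}')} \Pr\{\mathbf{y} \mid \mathbf{x}\}. \tag{2.34} \]

The right-hand side of (2.34) replaces $\mathbf{y} \notin \mathbb{D}(\mathbf{x})$ by the sum over all competing decision regions $\mathbb{D}(\mathbf{x}' \ne \mathbf{x})$. Since $\Pr\{\mathbf{y} \mid \mathbf{x}'\}$ is larger than $\Pr\{\mathbf{y} \mid \mathbf{x}\}$ for all $\mathbf{y} \in \mathbb{D}(\mathbf{x}')$,

\[ \Pr\{\mathbf{y} \mid \mathbf{x}'\} \ge \Pr\{\mathbf{y} \mid \mathbf{x}\} \quad \Rightarrow \quad \sqrt{\frac{\Pr\{\mathbf{y} \mid \mathbf{x}'\}}{\Pr\{\mathbf{y} \mid \mathbf{x}\}}} \ge 1 \tag{2.35} \]

holds. The multiplication of (2.34) with (2.35) and the extension of the inner sum in (2.34) to all possible received words $\mathbf{y} \in \mathbb{Y}^n$ leads to an upper bound

\[ P_w(\mathbf{x}) \le \sum_{\mathbf{x}' \in \Gamma \setminus \{\mathbf{x}\}} \sum_{\mathbf{y} \in \mathbb{D}(\mathbf{x}')} \Pr\{\mathbf{y} \mid \mathbf{x}\} \cdot \sqrt{\frac{\Pr\{\mathbf{y} \mid \mathbf{x}'\}}{\Pr\{\mathbf{y} \mid \mathbf{x}\}}} \le \sum_{\mathbf{x}' \in \Gamma \setminus \{\mathbf{x}\}} \sum_{\mathbf{y} \in \mathbb{Y}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\} \cdot \Pr\{\mathbf{y} \mid \mathbf{x}'\}}. \tag{2.36} \]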
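The computational costs for calculating (2.36) are very high for practical systems because the number of codewords and especially the number of possible received words is very large. Moreover, we do not know a good code yet and we are not interested in the error probabilities of single codewords $\mathbf{x}$. A solution would be to calculate the average error probability over all possible codes $\Gamma$, that is, we determine the expectation $\mathrm{E}_X\{P_w(\mathbf{x})\}$ with respect to $\Pr\{\mathcal{X}\}$. Since all possible codes are considered with equal probability, all words $\mathbf{x} \in \mathbb{X}^n$ are possible. In order to reach this goal, it is assumed that $\mathbf{x}$ and $\mathbf{x}'$ are identically distributed and independent so that $\Pr\{\mathbf{x}, \mathbf{x}'\} = \Pr\{\mathbf{x}\} \cdot \Pr\{\mathbf{x}'\}$ holds (see footnote 2). The expectation of the square root in (2.36) becomes

\[ \begin{aligned}
\mathrm{E}\left\{ \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\} \Pr\{\mathbf{y} \mid \mathbf{x}'\}} \right\}
&= \sum_{\mathbf{x} \in \mathbb{X}^n} \sum_{\mathbf{x}' \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\} \Pr\{\mathbf{y} \mid \mathbf{x}'\}}\, \Pr\{\mathbf{x}\} \Pr\{\mathbf{x}'\} \\
&= \sum_{\mathbf{x} \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\}}\, \Pr\{\mathbf{x}\} \cdot \sum_{\mathbf{x}' \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}'\}}\, \Pr\{\mathbf{x}'\} \\
&= \left[ \sum_{\mathbf{x} \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\}} \cdot \Pr\{\mathbf{x}\} \right]^2.
\end{aligned} \tag{2.37} \]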
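Since (2.37) does not depend on $\mathbf{x}'$ any longer, the outer sum in (2.36) becomes a constant factor $2^k - 1$ that can be approximated by $2^{nR_c}$ with $R_c = k/n$. We obtain (Cover and Thomas 1991)

\[ \bar{P}_w = \mathrm{E}_X\{P_w(\mathbf{x})\} < 2^{nR_c} \cdot \sum_{\mathbf{y} \in \mathbb{Y}^n} \left[ \sum_{\mathbf{x} \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\}} \cdot \Pr\{\mathbf{x}\} \right]^2 \tag{2.38a} \]

\[ = 2^{\,nR_c + \log_2 \sum_{\mathbf{y} \in \mathbb{Y}^n} \left( \sum_{\mathbf{x} \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\}} \cdot \Pr\{\mathbf{x}\} \right)^2} \tag{2.38b} \]

that is still a function of the input statistics $\Pr\{\mathcal{X}\}$. In order to minimize the average error probability, the second part of the exponent in (2.38b) has to be minimized. Defining the cutoff rate as

\[ R_0 = \max_{\Pr\{\mathcal{X}\}} \left( -\frac{1}{n} \cdot \log_2 \sum_{\mathbf{y} \in \mathbb{Y}^n} \left[ \sum_{\mathbf{x} \in \mathbb{X}^n} \sqrt{\Pr\{\mathbf{y} \mid \mathbf{x}\}} \cdot \Pr\{\mathbf{x}\} \right]^2 \right), \tag{2.39} \]

which depends only on the conditional probabilities $\Pr\{\mathbf{y} \mid \mathbf{x}\}$ of the channel, we obtain an upper bound for the minimum average error rate

\[ \min_{\Pr\{\mathcal{X}\}} \mathrm{E}\{P_w\} < 2^{-n(R_0 - R_c)} = 2^{-n \cdot E_B(R_c)}. \tag{2.40} \]

Footnote 2: This assumption also includes codes that map different information words onto the same codeword, leading to $\mathbf{x} = \mathbf{x}'$. Since the probability of these codes is very low, their contribution to the ergodic error rate is rather small.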
In (2.40), $E_B(R_c) = R_0 - R_c$ denotes the Bhattacharyya error exponent. This result demonstrates that arbitrarily low error probabilities can be achieved for $R_0 > R_c$. If the code rate $R_c$ approaches $R_0$, the length n of the code has to be infinitely increased for an error-free transmission. Furthermore, (2.40) now allows an approximation of error probabilities for finite codeword lengths. For memoryless channels, the vector probabilities can be factorized into symbol probabilities, simplifying the calculation of (2.39) tremendously. Applying the distributive law, we finally obtain

\[ R_0 = \max_{\Pr\{\mathcal{X}\}} \left( -\log_2 \sum_{y \in \mathbb{Y}} \left[ \sum_{x \in \mathbb{X}} \sqrt{\Pr\{y \mid x\}} \cdot \Pr\{x\} \right]^2 \right). \tag{2.41} \]

Owing to the applied approximations, $R_0$ is always smaller than the channel capacity C. For code rates with $R_0 < R_c < C$, the bound in (2.40) cannot be applied. Moreover, owing to the introduction of the factor in (2.35), the bound becomes very loose for a large number of codewords.

Continuously Distributed Output

In Benedetto and Biglieri (1999, page 633), an approximation of $R_0$ is derived for the AWGN channel with a discrete input $\mathcal{X}$ and a continuously distributed output. The derivation starts with the calculation of the average error probability and finally arrives at a result of the form (2.40). Using our notation, we obtain

\[ R_0 = \log_2(|\mathbb{X}|) - \log_2\left( 1 + \frac{1}{|\mathbb{X}|}\, r(\mathbb{X}, N_0) \right) \tag{2.42} \]

with

\[ r(\mathbb{X}, N_0) = \min_{\Pr\{\mathcal{X}\}} |\mathbb{X}|^2 \cdot \sum_{\mu=1}^{|\mathbb{X}|} \sum_{\substack{\nu=1 \\ \nu \ne \mu}}^{|\mathbb{X}|} \Pr\{X_\mu\} \Pr\{X_\nu\} \exp\left( -\frac{|X_\mu - X_\nu|^2}{4N_0} \right). \tag{2.43} \]
However, performing the maximization is a difficult task and hence a uniform distribution of X is often assumed. In this case, the factor in front of the double sum and the a priori probabilities eliminate each other.
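For a memoryless discrete channel, the cutoff rate expression in (2.41) and the bound (2.40) are easy to evaluate once the input distribution is fixed. The sketch below omits the maximization over the input statistics and assumes a binary symmetric channel with uniform input as example; the function names and the numbers are illustrative.

```python
import math

def cutoff_rate(p_x, p_y_given_x):
    """Argument of the maximisation in (2.41) for a fixed input pmf p_x (in bits)."""
    total = 0.0
    for n in range(len(p_y_given_x[0])):
        inner = sum(math.sqrt(p_y_given_x[m][n]) * p_x[m] for m in range(len(p_x)))
        total += inner ** 2
    return -math.log2(total)

def bhattacharyya_bound(n, r0, rc):
    """Upper bound 2^(-n (R0 - Rc)) from (2.40); only meaningful for Rc < R0."""
    return 2.0 ** (-n * (r0 - rc))

pe = 0.05                                   # assumed crossover probability
bsc = [[1.0 - pe, pe], [pe, 1.0 - pe]]
r0 = cutoff_rate([0.5, 0.5], bsc)
capacity = 1.0 + pe * math.log2(pe) + (1.0 - pe) * math.log2(1.0 - pe)
print(r0, capacity)                         # R0 stays below the capacity
for n in (10, 100, 1000):
    print(n, bhattacharyya_bound(n, r0, 0.25))
```

The loop shows how the bound on the average word error probability decays exponentially with the codeword length n as long as the code rate stays below the cutoff rate.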
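2.2.3 Gallager Exponent

As already mentioned, the error exponent of Bhattacharyya becomes very loose for large codeword sets. In order to tighten the bound in (2.38a), Gallager introduced an optimization parameter $\rho \in [0, 1]$, leading to the expression (Cover and Thomas 1991)

\[ \bar{P}_w(\rho) = \mathrm{E}_X\{P_w(\rho, \mathbf{x})\} < 2^{\rho n R_c} \cdot \sum_{\mathbf{y} \in \mathbb{Y}^n} \left[ \sum_{\mathbf{x} \in \mathbb{X}^n} \Pr\{\mathbf{y} \mid \mathbf{x}\}^{\frac{1}{1+\rho}} \cdot \Pr\{\mathbf{x}\} \right]^{1+\rho} = 2^{\,\rho n R_c + \log_2 \sum_{\mathbf{y} \in \mathbb{Y}^n} \left( \sum_{\mathbf{x} \in \mathbb{X}^n} \Pr\{\mathbf{y} \mid \mathbf{x}\}^{\frac{1}{1+\rho}} \cdot \Pr\{\mathbf{x}\} \right)^{1+\rho}}. \tag{2.44} \]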
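Similar to the definition of the cutoff rate in (2.39), we can now define the Gallager function

\[ E_0(\rho, \Pr\{\mathcal{X}\}) = -\frac{1}{n} \cdot \log_2 \sum_{\mathbf{y} \in \mathbb{Y}^n} \left[ \sum_{\mathbf{x} \in \mathbb{X}^n} \Pr\{\mathbf{y} \mid \mathbf{x}\}^{\frac{1}{1+\rho}} \cdot \Pr\{\mathbf{x}\} \right]^{1+\rho}. \tag{2.45} \]

Comparing (2.37) with (2.45), it becomes obvious that the bounds of Gallager and Bhattacharyya are identical for ρ = 1, and $R_0 = \max_{\Pr\{\mathcal{X}\}} E_0(1, \Pr\{\mathcal{X}\})$ holds. The average word error probability in (2.40) becomes

\[ \bar{P}_w(\rho) < 2^{-n(E_0(\rho, \Pr\{\mathcal{X}\}) - \rho \cdot R_c)}. \tag{2.46} \]

For memoryless channels, the Gallager function can be simplified to

\[ E_0(\rho, \Pr\{\mathcal{X}\}) = -\log_2 \sum_{y \in \mathbb{Y}} \left[ \sum_{x \in \mathbb{X}} \Pr\{y \mid x\}^{\frac{1}{1+\rho}} \cdot \Pr\{x\} \right]^{1+\rho}. \tag{2.47} \]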
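With the Gallager exponent

\[ E_G(R_c) = \max_{\Pr\{\mathcal{X}\}} \max_{\rho \in [0,1]} \left( E_0(\rho, \Pr\{\mathcal{X}\}) - \rho \cdot R_c \right), \tag{2.48} \]

we finally obtain the minimum error probability

\[ \bar{P}_w = \min_{\Pr\{\mathcal{X}\}, \rho} \bar{P}_w(\rho) < 2^{-n \cdot E_G(R_c)}. \tag{2.49} \]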
A curve sketching of EG (Rc ) is now discussed. By partial derivation of E0 (ρ, Pr{X }) with respect to ρ, it can be shown that the Gallager function increases monotonically with ρ ∈ [0, 1] from 0 to its maximum R0 . Furthermore, fixing ρ in (2.48), EG (Rc ) describes a straight line with slope −ρ and offset E0 (ρ, Pr{X }). As a consequence, we have a set of straight lines – one for each ρ – whose initial values at Rc = 0 grow with increasing ρ.
Figure 2.4 Curve sketching of Gallager exponent EG (Rc )
Each of these lines is determined by searching the optimum statistics $\Pr\{\mathcal{X}\}$. The Gallager exponent is finally obtained by finding the maximum among all lines for each code rate $R_c$. This procedure is illustrated in Figure 2.4. The critical rate

\[ R_{\mathrm{crit}} = \left. \frac{\partial}{\partial\rho} E_0(\rho, \Pr\{\mathcal{X}\}) \right|_{\rho=1} \tag{2.50} \]

represents the maximum code rate for which ρ = 1 is the optimal choice. It is important to mention that $\Pr\{\mathcal{X}\}$ in (2.50) already represents the optimal choice for a maximal rate. In the range $0 < R_c \le R_{\mathrm{crit}}$, the parametrization by Gallager does not affect the result and $E_G(R_c)$ equals the Bhattacharyya exponent $E_B(R_c)$ given in (2.40). Hence, the cutoff rate can be used for approximating the error probability. For $R_c > R_{\mathrm{crit}}$, the Bhattacharyya bound cannot be applied anymore and the tighter Gallager bound with ρ < 1 has to be used.

According to (2.49), we can achieve arbitrarily low error probabilities by appropriately choosing n as long as $E_G(R_c) > 0$ holds. The maximum rate for which an error-free transmission can be ensured is reached at the point where $E_G(R_c)$ approaches zero. It can be shown that this point is obtained for $\rho \to 0$, resulting in

\[ R_{\max} = \lim_{\rho \to 0} \frac{E_0(\rho, \Pr\{\mathcal{X}\})}{\rho} = \max_{\Pr\{\mathcal{X}\}} \bar{I}(\mathcal{X}; \mathcal{Y}) = C. \tag{2.51} \]
Therefore, the maximum rate for which an error-free transmission can be ensured is exactly the channel capacity $C$ (which was already stated in the channel coding theorem). Transmitting at $R_c = C$ requires an infinite codeword length $n \to \infty$. For the sake of completeness, it has to be mentioned that an expurgated exponent $E_x(\rho, \Pr\{\mathcal{X}\})$ with $\rho \ge 1$ exists, leading to tighter results than the Gallager exponent for rates below $R_{\mathrm{ex}} = \left. \frac{\partial}{\partial \rho} E_x(\rho, \Pr\{\mathcal{X}\}) \right|_{\rho=1}$ (Cover and Thomas 1991).
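The relations between the Gallager function, the cutoff rate and the capacity can be checked numerically. The following minimal sketch evaluates (2.47) for a binary symmetric channel with a uniform input, which is assumed here to be the optimum input statistics for this symmetric channel. The function names, the crossover probability 0.1 and the grid search over $\rho$ are illustrative assumptions and not part of the original text.

```python
import math

def gallager_e0(rho, px, pyx):
    """Gallager function (2.47) of a discrete memoryless channel.

    px[i] = Pr{x_i}, pyx[i][j] = Pr{y_j | x_i}.
    """
    total = 0.0
    for j in range(len(pyx[0])):
        inner = sum(px[i] * pyx[i][j] ** (1.0 / (1.0 + rho))
                    for i in range(len(px)))
        total += inner ** (1.0 + rho)
    return -math.log2(total)

def gallager_exponent(rate, px, pyx, grid=1000):
    """Inner maximization of (2.48) over rho in [0, 1] for fixed input statistics."""
    return max(gallager_e0(k / grid, px, pyx) - (k / grid) * rate
               for k in range(grid + 1))

def binary_entropy(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

# Illustrative channel: BSC with crossover probability 0.1, uniform input
P_ERR = 0.1
PX = [0.5, 0.5]
PYX = [[1.0 - P_ERR, P_ERR], [P_ERR, 1.0 - P_ERR]]

cutoff_rate = gallager_e0(1.0, PX, PYX)            # E0 at rho = 1
capacity_estimate = gallager_e0(1e-6, PX, PYX) / 1e-6  # E0(rho)/rho for small rho
```

For this channel, `cutoff_rate` should agree with the closed form $1 - \log_2(1 + 2\sqrt{p(1-p)})$ and `capacity_estimate` with $1 - \bar{I}_2(p)$, while `gallager_exponent` is positive below the capacity and vanishes above it.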
2.2.4 Capacity of the AWGN Channel

AWGN Channel with Gaussian Distributed Input

In this and the next section, the results derived so far are discussed for some practical channels. We start with the equivalent baseband representation of the AWGN channel depicted in Figure 1.11. If the generally complex input and output signals are continuously distributed, differential entropies have to be used. Since the information in $\mathcal{Y}$ for known $\mathcal{X}$ can only stem from the noise $\mathcal{N}$, the mutual information illustrated in Figure 2.3 has the form

$$\bar{I}(\mathcal{X}; \mathcal{Y}) = \bar{I}_{\mathrm{diff}}(\mathcal{Y}) - \bar{I}_{\mathrm{diff}}(\mathcal{Y} \mid \mathcal{X}) = \bar{I}_{\mathrm{diff}}(\mathcal{Y}) - \bar{I}_{\mathrm{diff}}(\mathcal{N}). \tag{2.52}$$
The maximization of (2.52) with respect to $p_X(x)$ only affects the term $\bar{I}_{\mathrm{diff}}(\mathcal{Y})$ because the background noise cannot be influenced. For statistically independent processes $\mathcal{X}$ and $\mathcal{N}$, the corresponding powers can simply be added, $\sigma_Y^2 = \sigma_X^2 + \sigma_N^2$, and, hence, fixing the transmit power directly fixes $\sigma_Y^2$. According to Section 2.1.3, the maximum mutual information for a fixed power is obtained for a Gaussian distributed process $\mathcal{Y}$. However, this can only be achieved for a Gaussian distribution of $\mathcal{X}$. Hence, we have to substitute (2.19b) into (2.52). Inserting the results of Section 1.2.2 ($\sigma_X^2 = 2BE_s$ and $\sigma_N^2 = 2BN_0$), we obtain the channel capacity

$$C^{2\text{-dim}} = \log_2(\pi e \sigma_Y^2) - \log_2(\pi e \sigma_N^2) = \log_2\left( \frac{\sigma_X^2 + \sigma_N^2}{\sigma_N^2} \right) = \log_2\left( 1 + \frac{E_s}{N_0} \right). \tag{2.53}$$
Obviously, the capacity grows logarithmically with the transmit power or, equivalently, with $E_s/N_0$. If only the real part of $\mathcal{X}$ is used for data transmission, such as for real-valued binary phase shift keying (BPSK) or amplitude shift keying (ASK), the number of bits transmitted per channel use is halved. However, we have to take into account that only the real part of the noise disturbs the transmission so that the effective noise power is also halved ($\sigma_{N'}^2 = \frac{1}{2}\sigma_N^2 = BN_0$). If the transmit power remains unchanged ($\sigma_{X'}^2 = \sigma_X^2 = 2BE_s$), (2.53) becomes

$$C^{1\text{-dim}} = \frac{1}{2} \cdot \log_2(\pi e \sigma_{Y'}^2) - \frac{1}{2} \cdot \log_2(\pi e \sigma_{N'}^2) = \frac{1}{2} \cdot \log_2\left( 1 + 2\frac{E_s}{N_0} \right). \tag{2.54}$$

In many cases, the evaluation of systems in terms of a required $E_s/N_0$ does not lead to a fair comparison. This is especially the case when the number of channel symbols transmitted per information bit varies. Therefore, a better comparison is obtained by evaluating systems with respect to the required energy $E_b$ per information bit. For binary modulation schemes with $m = 1$, it is related to $E_s$ by

$$k \cdot E_b = n \cdot E_s \quad \Longrightarrow \quad E_s = \frac{k}{n} \cdot E_b = R_c \cdot E_b \tag{2.55}$$

because the FEC encoder should not change the energy. Substituting $E_s$ in (2.53) and (2.54) delivers

$$C^{2\text{-dim}} = \log_2\left( 1 + R_c \frac{E_b}{N_0} \right), \qquad C^{1\text{-dim}} = \frac{1}{2} \cdot \log_2\left( 1 + 2R_c \frac{E_b}{N_0} \right).$$
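As a small numerical illustration of (2.53) and (2.54), the sketch below evaluates both capacities for a linear $E_s/N_0$. The function names are illustrative assumptions; the formulas are those given above.

```python
import math

def db_to_linear(value_db):
    """Convert a ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)

def capacity_complex(es_n0):
    """Capacity (2.53) of the complex AWGN channel in bit per channel use."""
    return math.log2(1.0 + es_n0)

def capacity_real(es_n0):
    """Capacity (2.54) of the real-valued AWGN channel in bit per channel use."""
    return 0.5 * math.log2(1.0 + 2.0 * es_n0)

# A 3 dB step at high SNR for the complex and the real-valued channel
gain_complex = capacity_complex(db_to_linear(33.0)) - capacity_complex(db_to_linear(30.0))
gain_real = capacity_real(db_to_linear(33.0)) - capacity_real(db_to_linear(30.0))

# Ratio of both capacities at very low SNR
low_snr_ratio = capacity_real(1e-4) / capacity_complex(1e-4)
```

At high SNR, a 3 dB step should yield roughly 1 bit for the complex and 0.5 bit for the real-valued channel, whereas both capacities nearly coincide for very small $E_s/N_0$.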
Figure 2.5 Channel capacities for AWGN channel with Gaussian distributed input (panel a: capacity C versus Es/N0 in dB; panel b: capacity C versus Eb/N0 in dB; curves: real and complex)

Since the highest spectral efficiency in maintaining an error-free transmission is obtained for $R_c = C$, these equations only implicitly determine $C$. We can resolve them with respect to $E_b/N_0$ and obtain the common result

$$\frac{E_b}{N_0} = \frac{2^{2C} - 1}{2C} \tag{2.56}$$

for real- as well as complex-valued signal alphabets. For $C \to 0$, the required $E_b/N_0$ does not tend to zero but to a finite value

$$\lim_{C \to 0} \frac{E_b}{N_0} = \lim_{C \to 0} \frac{2^{2C} \cdot \log 2 \cdot 2}{2} = \log 2 \mathrel{\hat{=}} -1.59\ \mathrm{dB}. \tag{2.57}$$
Hence, no error-free transmission is possible below this ultimate bound. Figure 2.5 illustrates the channel capacity for the AWGN channel with Gaussian input. Obviously, real and complex-valued transmissions have the same capacity for $E_s/N_0 \to 0$ or, equivalently, $E_b/N_0 \to \log(2)$. For larger signal-to-noise ratios (SNRs), the complex system has a higher capacity because it can transmit twice as many bits per channel use compared to the real-valued system. This advantage affects the capacity linearly, whereas the drawback of a halved SNR compared to the real-valued system has only a logarithmic influence. Asymptotically, doubling the SNR (3 dB step) increases the capacity by 1 bit/s/Hz for the complex case.

AWGN Channel with Discrete Input

Unfortunately, no closed-form expressions exist for discrete input alphabets and (2.32) has to be evaluated numerically. Owing to the reasons discussed on page 59, we assume a uniform distribution of $\mathcal{X}$:

$$C = \log_2(|\mathbb{X}|) + \frac{1}{|\mathbb{X}|} \cdot \sum_{\mu} \int_{\mathbb{Y}} p_{Y|X_\mu}(y) \cdot \log_2 \frac{p_{Y|X_\mu}(y)}{\sum_{l} p_{Y|X_l}(y)} \, dy. \tag{2.58}$$
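A numerical evaluation of (2.58) can be sketched as follows for a real-valued alphabet such as BPSK. The sketch assumes symbols normalized to unit average energy, a real noise variance of $N_0/2$ per symbol energy (that is, $1/(2E_s/N_0)$ after normalization) and a simple midpoint rule; the function names and the integration parameters are illustrative choices.

```python
import math

def gauss_pdf(y, mean, var):
    """Real Gaussian density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def capacity_real_alphabet(symbols, es_n0, n_steps=4000, span=8.0):
    """Evaluate (2.58) for a real alphabet with unit average symbol energy."""
    var = 1.0 / (2.0 * es_n0)           # noise variance for Es normalized to 1
    sigma = math.sqrt(var)
    lower = min(symbols) - span * sigma
    upper = max(symbols) + span * sigma
    dy = (upper - lower) / n_steps
    total = 0.0
    for i in range(n_steps):
        y = lower + (i + 0.5) * dy      # midpoint rule
        pdfs = [gauss_pdf(y, x, var) for x in symbols]
        denom = sum(pdfs)
        for p in pdfs:
            if p > 0.0:                 # terms with p = 0 contribute nothing
                total += p * math.log2(p / denom) * dy
    return math.log2(len(symbols)) + total / len(symbols)

BPSK = [-1.0, 1.0]
c_half = capacity_real_alphabet(BPSK, 10.0 ** (-2.823 / 10.0))
c_high = capacity_real_alphabet(BPSK, 10.0)
```

For BPSK, the result approaches 1 bit per channel use at high SNR and stays below the real-valued Gaussian capacity (2.54) at every SNR.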
Figure 2.6 Capacity of AWGN channel for different PSK constellations (panel a: capacity C versus Es/N0 in dB; panel b: capacity C versus Eb/N0 in dB; curves: BPSK, QPSK, 8-PSK, 16-PSK, 32-PSK, real Gaussian and complex Gaussian)

An approximation of the cutoff rate was already presented in (2.42). Figure 2.6 shows the capacities for the AWGN channel and different PSK schemes. Obviously, for very low SNRs $E_s/N_0 \to 0$, no difference between discrete input alphabets and a continuously Gaussian distributed input can be observed. However, for higher SNR, the Gaussian input represents an upper bound that cannot be reached by discrete modulation schemes. Their maximum capacity is limited to the number of bits transmitted per symbol ($\log_2 |\mathbb{X}|$). Since BPSK consists of real symbols $\pm\sqrt{E_s/T_s}$, its capacity is upper bounded by that of a continuously Gaussian distributed real input, and the highest spectral efficiency that can be obtained is 1 bit/s/Hz. The other schemes have to be compared to a complex Gaussian input. For very high SNRs, the uniform distribution is optimum again since the maximum capacity reaches exactly the number of bits per symbol.

Regarding ASK and quadrature amplitude modulation (QAM) schemes, approximating a Gaussian distribution of the alphabet by signal shaping can improve the mutual information although it need not be the optimum choice. The maximum gain is determined by the power ratio of uniform and Gaussian distributions if both have the same differential entropy. With (2.18) and (2.19a) for real-valued transmissions, we obtain

$$\log_2(2a) = \frac{1}{2} \log_2(2\pi e \sigma_X^2) \quad \Rightarrow \quad \sigma_X^2 = \frac{2a^2}{\pi e}. \tag{2.59}$$
Since the average power for the uniformly distributed signal is

$$\int_{-\infty}^{\infty} p_X(x) x^2 \, dx = \int_{-a}^{a} \frac{x^2}{2a} \, dx = \left. \frac{x^3}{6a} \right|_{-a}^{a} = \frac{a^2}{3},$$

the power ratio between uniform and Gaussian distributions for identical entropies becomes (with (2.59))

$$\frac{a^2/3}{\sigma_X^2} = \frac{a^2/3}{2a^2/(\pi e)} = \frac{\pi e}{6} \to 1.53\ \mathrm{dB}. \tag{2.60}$$
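The arithmetic behind (2.59) and (2.60) can be reproduced with a few lines. The sketch below uses an arbitrary interval half-width $a = 1$ (the result does not depend on it); the variable names are illustrative.

```python
import math

a = 1.0                                         # half-width of the uniform density
power_uniform = a ** 2 / 3.0                    # average power of the uniform signal
power_gauss = 2.0 * a ** 2 / (math.pi * math.e)  # variance from (2.59)

# Both distributions have the same differential entropy by construction
entropy_uniform = math.log2(2.0 * a)
entropy_gauss = 0.5 * math.log2(2.0 * math.pi * math.e * power_gauss)

shaping_gain_db = 10.0 * math.log10(power_uniform / power_gauss)
```

The value of `shaping_gain_db` is $10\log_{10}(\pi e/6)$, which rounds to the 1.53 dB stated in (2.60).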
Figure 2.7 Capacity of AWGN channel for different QAM constellations (solid lines: uniform distribution, dashed lines: Gaussian distribution; panel a: mutual information versus Es/N0 in dB; panel b: mutual information versus Eb/N0 in dB; curves: Gaussian, 16-QAM and 64-QAM)

Theoretically, we can save 1.53 dB transmit power when changing from a uniform to a Gaussian continuous distribution without loss of entropy. The distribution of the discrete signal alphabet has the form (Fischer et al. 1998)

$$\Pr\{X_\mu\} = K(\lambda) \cdot e^{-\lambda |X_\mu|^2} \tag{2.61}$$
where $K(\lambda)$ must be chosen appropriately to fulfill the condition $\sum_\mu \Pr\{X_\mu\} = 1$. The parameter $\lambda \ge 0$ has to be optimized for each SNR. For $\lambda = 0$, the uniform distribution with $K(0) = |\mathbb{X}|^{-1}$ is obtained. Figure 2.7 depicts the corresponding results. We observe that signal shaping can close the gap between the capacities for a continuous Gaussian input and a discrete uniform input over a wide range of $E_s/N_0$. However, the absolute gains are rather low for these small alphabet sizes and amount to 1 dB for 64-QAM. As mentioned before, for high SNRs, $\lambda$ tends to zero, resulting in a uniform distribution achieving the highest possible mutual information.

The last aspect in this subsection addresses the influence of quantization on the capacity. Quantizing the output of an AWGN channel leads to a model with discrete inputs and outputs that can be fully described by the conditional probabilities $\Pr\{Y_\nu \mid X_\mu\}$. They depend on the SNR of the channel and also on the quantization thresholds. We will concentrate in the following part on BPSK modulation. A hard decision at the output delivers the binary symmetric channel (BSC). Its capacity can be calculated by

$$C = 1 + P_s \log_2(P_s) + (1 - P_s) \log_2(1 - P_s) = 1 - \bar{I}_2(P_s) \tag{2.62}$$

where $P_s = \frac{1}{2} \operatorname{erfc}\left( \sqrt{E_s/N_0} \right)$ denotes the symbol error probability. Generally, we obtain $2^q$ output symbols $Y_\nu$ for a $q$-bit quantization. The quantization thresholds have to be chosen such that the probabilities $\Pr\{Y_\nu \mid X_\mu\}$ with $1 \le \mu \le |\mathbb{X}|$ and $1 \le \nu \le 2^q$ maximize the mutual information. Figure 2.8 shows the corresponding results. On the one hand, the loss due to a hard decision prior to decoding can be up to 2 dB, that is, the minimum $E_b/N_0$ for which an error-free transmission is possible in principle is approximately 0.4 dB. On the other hand, a 3-bit quantization loses only slightly compared to the continuous case. For high SNRs, the influence of quantization is rather small.

Figure 2.8 Capacity of AWGN channel for BPSK and different quantization levels (panel a: channel capacity versus Es/N0 in dB; panel b: channel capacity versus Eb/N0 in dB; curves: q = 1, q = 2, q = 3, q = ∞ and Gaussian)
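The hard-decision loss quoted above can be reproduced from (2.62). The following sketch evaluates $E_b/N_0 = (E_s/N_0)/C$ for $R_c = C$ at a very small $E_s/N_0$, which is assumed here to approximate the minimum required $E_b/N_0$, and compares it with the ultimate limit $\log 2$ from (2.57). Function names and the evaluation point $10^{-4}$ are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """Binary entropy function in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity_bpsk(es_n0):
    """Capacity (2.62) of BPSK over AWGN with hard decision (q = 1)."""
    ps = 0.5 * math.erfc(math.sqrt(es_n0))
    return 1.0 - binary_entropy(ps)

def min_eb_n0_hard_db(es_n0=1e-4):
    """Required Eb/N0 in dB for Rc = C, evaluated at a small Es/N0."""
    return 10.0 * math.log10(es_n0 / bsc_capacity_bpsk(es_n0))

hard_limit_db = min_eb_n0_hard_db()
soft_limit_db = 10.0 * math.log10(math.log(2.0))   # ultimate limit from (2.57)
hard_decision_loss_db = hard_limit_db - soft_limit_db
```

Under these assumptions, the hard-decision limit lies slightly below 0.4 dB and the difference to the limit of −1.59 dB is close to 2 dB, in line with the values given in the text.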
2.2.5 Capacity of Fading Channel

In Section 1.3.3, the error probability for frequency-nonselective fading channels was discussed and it was recognized that the error rate itself is a random variable that depends on the instantaneous fading coefficient $h$. For the derivation of channel capacities, we encounter the same situation. Again, we can distinguish between ergodic and outage capacity. The ergodic capacity $\bar{C}$ represents the average capacity among all channel states and is mainly chosen for fast fading channels when coding is performed over many channel states. In contrast, the outage capacity $C_{\mathrm{out}}$ denotes the capacity that is not reached with an outage probability $P_{\mathrm{out}}$. It is particularly used for slowly fading channels where the coherence time of the channel is much larger than a coding block, which is therefore affected by a single channel realization. For the sake of simplicity, we restrict the derivation to complex Gaussian distributed inputs because there exist no closed-form expressions for discrete signal alphabets. Starting with the result of the previous section, we obtain the instantaneous capacity

$$C(\gamma) = \log_2\left( 1 + |h|^2 \cdot \frac{E_s}{N_0} \right) = \log_2(1 + \gamma) \tag{2.63}$$

that depends on the squared magnitude of the instantaneous channel coefficient $h$ and, thus, on the current SNR $\gamma = |h|^2 E_s/N_0$. Averaging (2.63) with respect to $\gamma$ delivers the ergodic channel capacity

$$\bar{C} = \mathrm{E}\{C(\gamma)\} = \int_0^\infty \log_2(1 + \xi) \cdot p_\gamma(\xi) \, d\xi. \tag{2.64}$$
In order to compare the capacities of fading channels with that of the AWGN channel, we have to apply Jensen's inequality (Cover and Thomas 1991). Since $C(\gamma)$ is a concave function, it states that

$$\mathrm{E}_X\{f(x)\} \le f\left( \mathrm{E}_X\{x\} \right). \tag{2.65}$$

For convex functions, '$\le$' has to be replaced by '$\ge$'. Moving the expectation in (2.64) into the logarithm leads to

$$\bar{C} = \mathrm{E}\left\{ \log_2(1 + \gamma) \right\} \le \log_2\left( 1 + \mathrm{E}\{\gamma\} \right). \tag{2.66}$$

From (2.66), we immediately see that the capacity of a fading channel with average SNR $\mathrm{E}\{\gamma\} = E_s/N_0$ for $\sigma_H^2 = 1$ cannot exceed that of an AWGN channel with the same average SNR.

Ergodic Capacity

We now want to calculate the ergodic capacity for particular fading processes. If $|\mathcal{H}|$ is Rayleigh distributed, we know from Section 1.5 that $|\mathcal{H}|^2$ and $\gamma$ are chi-squared distributed with two degrees of freedom. According to Section 1.3.3, we have to insert $p_\gamma(\xi) = 1/\bar{\gamma} \cdot \exp(-\xi/\bar{\gamma})$ with $\bar{\gamma} = \sigma_H^2 \cdot E_s/N_0$ into (2.64). Applying the partial integration technique, we obtain

$$\bar{C} = \int_0^\infty \log_2(1 + \xi) \cdot \frac{1}{\bar{\gamma}} \cdot e^{-\xi/\bar{\gamma}} \, d\xi = \log_2(e) \cdot \exp\left( \frac{1}{\sigma_H^2 E_s/N_0} \right) \cdot \operatorname{expint}\left( \frac{1}{\sigma_H^2 E_s/N_0} \right) \tag{2.67}$$

where the exponential integral function is defined as $\operatorname{expint}(x) = \int_x^\infty e^{-t}/t \, dt$ (Gradshteyn 2000). Figure 2.9 shows a comparison between the capacities of AWGN and flat Rayleigh fading channels (bold lines). For sufficiently large SNR, the curves are parallel and we can observe a loss of roughly 2.5 dB due to fading. Compared with the bit error rate (BER) loss of approximately 17 dB in the uncoded case, this loss is rather small. It can be explained by the fact that the channel coding theorem presupposes infinitely long codewords, allowing the decoder to exploit a high diversity gain. This leads to a relatively small loss in capacity compared to the AWGN channel. Astonishingly, the ultimate limit of −1.59 dB is the same for AWGN and Rayleigh fading channels.
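The closed form (2.67) can be cross-checked against a direct numerical integration of (2.64). The sketch below is written under two assumptions that are not part of the original text: the exponential integral is computed from its power series (adequate only for moderate arguments, roughly $0 < x \le 5$), and the integral is approximated with a midpoint rule after the substitution $u = \xi/\bar{\gamma}$. All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expint(x, terms=60):
    """Exponential integral E1(x) from its power series (moderate x only)."""
    series = 0.0
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k            # term now holds (-x)^k / k!
        series += term / k
    return -EULER_GAMMA - math.log(x) - series

def ergodic_capacity_closed(gamma_bar):
    """Closed form (2.67) for Rayleigh fading with average SNR gamma_bar."""
    x = 1.0 / gamma_bar
    return math.log2(math.e) * math.exp(x) * expint(x)

def ergodic_capacity_numeric(gamma_bar, n_steps=60000, u_max=60.0):
    """Midpoint-rule evaluation of (2.64) with the substitution u = xi / gamma_bar."""
    du = u_max / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du
        total += math.log2(1.0 + gamma_bar * u) * math.exp(-u)
    return total * du

GAMMA_BAR = 10.0                                  # average SNR of 10 dB
c_closed = ergodic_capacity_closed(GAMMA_BAR)
c_numeric = ergodic_capacity_numeric(GAMMA_BAR)
c_awgn = math.log2(1.0 + GAMMA_BAR)
```

Both evaluations should agree closely, and the result stays below the AWGN capacity at the same average SNR, as required by (2.66).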
Outage Probability and Outage Capacity

With the same argumentation as in Section 1.3, we now define the outage capacity $C_{\mathrm{out}}$ with the corresponding outage probability $P_{\mathrm{out}}$. The latter describes the probability of the instantaneous capacity $C(\gamma)$ falling below a threshold $C_{\mathrm{out}}$:

$$P_{\mathrm{out}} = \Pr\{C(\gamma) < C_{\mathrm{out}}\} = \Pr\{\log_2(1 + \gamma) < C_{\mathrm{out}}\}. \tag{2.68}$$

Inserting the density $p_\gamma(\xi)$ with $\bar{\gamma} = E_s/N_0$ into (2.68) leads to

$$P_{\mathrm{out}} = \Pr\{\gamma < 2^{C_{\mathrm{out}}} - 1\} = 1 - \exp\left( \frac{1 - 2^{C_{\mathrm{out}}}}{E_s/N_0} \right). \tag{2.69}$$
Figure 2.9 Ergodic and outage capacity of flat Rayleigh fading channels for Gaussian input versus Eb/N0 (Cp denotes the outage capacity for Pout = p; plotted axes: C versus Es/N0 in dB; curves: C1, C5, C10, C20, C50, AWGN and ergodic capacity)

Resolving the last equation with respect to $C_{\mathrm{out}}$ yields

$$C_{\mathrm{out}} = \log_2\left( 1 - E_s/N_0 \cdot \log(1 - P_{\mathrm{out}}) \right). \tag{2.70}$$
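Equations (2.69) and (2.70) are inverse to each other, which the following minimal sketch illustrates for the Rayleigh fading case. The function names and the example SNR of 10 dB are illustrative assumptions.

```python
import math

def outage_probability(c_out, es_n0):
    """Outage probability (2.69) for flat Rayleigh fading, linear Es/N0."""
    return 1.0 - math.exp((1.0 - 2.0 ** c_out) / es_n0)

def outage_capacity(p_out, es_n0):
    """Outage capacity (2.70) for flat Rayleigh fading, linear Es/N0."""
    return math.log2(1.0 - es_n0 * math.log(1.0 - p_out))

ES_N0 = 10.0                                 # 10 dB
c_50 = outage_capacity(0.5, ES_N0)
c_10 = outage_capacity(0.1, ES_N0)
c_1 = outage_capacity(0.01, ES_N0)
round_trip = outage_probability(c_10, ES_N0)  # should return 0.1
```

The outage capacity shrinks quickly as the tolerated outage probability is reduced, which is visible when comparing `c_50`, `c_10` and `c_1`.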
Conventionally, $C_{\mathrm{out}}$ is written as $C_p$ where $p = P_{\mathrm{out}}$ determines the corresponding outage probability. Figure 2.9 shows the outage capacities for different values of $P_{\mathrm{out}}$. For $P_{\mathrm{out}} = 0.5$, $C_{50}$, which can be ensured with a probability of only 50%, is close to the ergodic capacity $\bar{C}$. The outage capacity $C_{\mathrm{out}}$ decreases dramatically for smaller $P_{\mathrm{out}}$. At a spectral efficiency of 6 bit/s/Hz, the loss in terms of $E_b/N_0$ amounts to nearly 8 dB for $P_{\mathrm{out}} = 0.1$ and roughly 18 dB for $P_{\mathrm{out}} = 0.01$ compared to the AWGN channel. Figure 2.10 depicts the outage probability versus the outage capacity for different values of $E_s/N_0$. As expected, for high SNRs, large capacities can be achieved with very low outage probability. However, $P_{\mathrm{out}}$ grows rapidly with decreasing $E_s/N_0$. The asterisks denote the outage probability of the ergodic capacity $\bar{C}$. As already observed in Figure 2.9, it is close to 0.5.
2.2.6 Channel Capacity and Diversity

As explained in the previous subsection, the instantaneous channel capacity is a random variable that depends on the squared magnitude of the actual channel coefficient $h$. Since $|h|^2$ and, thus, also $\gamma$ vary over a wide range, the capacity also has a large variance. Besides the ergodic capacity $\bar{C}$, the outage capacity $C_{\mathrm{out}}$ is an appropriate measure for the channel quality. From Section 1.5, we know that diversity reduces the SNR's variance and, therefore, also reduces the outage probability of the channel. Assuming that a signal $x$ is transmitted over $D$ statistically independent channels with coefficients $h_\ell$, $1 \le \ell \le D$, the instantaneous capacity after maximum ratio combining becomes

$$C(\gamma) = \log_2\left( 1 + \sum_{\ell=1}^{D} |h_\ell|^2 \frac{E_s}{DN_0} \right) = \log_2(1 + \gamma). \tag{2.71}$$
Figure 2.10 Outage probability of flat Rayleigh fading channels for Gaussian input (axes: Pout versus Cout; curves: Es/N0 = 0, 5, 10, 15, 20, 25 and 30 dB; asterisks: outage probability of the ergodic capacity)

The only difference compared to (2.63) is that the new random variable $\gamma = \sum_{\ell=1}^{D} |h_\ell|^2 \frac{E_s}{DN_0}$ is chi-squared distributed with $2D$ degrees of freedom (instead of two). The probability density $p_\gamma(\xi)$ of $\gamma$ was already presented in (1.108) on page 40. The ergodic capacity is obtained by averaging $C(\gamma)$ in (2.71) with respect to $p_\gamma(\xi)$. Solving the integral

$$\bar{C} = \int_0^\infty \log_2(1 + \xi) \cdot \frac{D^D \, \xi^{D-1}}{(D-1)! \, (E_s/N_0)^D} \cdot e^{-\frac{\xi D}{E_s/N_0}} \, d\xi \tag{2.72}$$

numerically leads to the results in Figure 2.11. We have assumed independent Rayleigh fading channels with uniform average power distribution. As expected, the ergodic capacity grows with increasing $D$. While the largest gains are obtained for the transition from $D = 1$ to $D = 2$, a higher diversity degree only leads to minor improvements. Asymptotically, the capacity of the AWGN channel is reached for $D \to \infty$. For $D = 50$, almost no difference to the AWGN channel can be observed.

Figure 2.11 Ergodic capacity for Rayleigh fading channels with different diversity degree (axes: ergodic capacity versus Eb/N0 in dB; curves: D = 1, 2, 5, 50 and AWGN)

Figure 2.12 Outage probability of flat Rayleigh fading channels for Gaussian input (panel a: Es/N0 = 10 dB; panel b: Es/N0 = 20 dB; axes: Pout versus Cout; curves: D = 1, 2, 5, 10 and 50)

With regard to the outage probability, (2.68) has to be calculated for $\gamma = \sum_{\ell=1}^{D} |h_\ell|^2 \frac{E_s}{DN_0}$ as in (2.71). Resolving (2.68) with respect to $\gamma$ and inserting the chi-squared distribution with $2D$ degrees of freedom from (1.108) yields

$$P_{\mathrm{out}} = \Pr\{\gamma < 2^{C_{\mathrm{out}}} - 1\} = \frac{1}{(D-1)!} \cdot \int_0^{\frac{2^{C_{\mathrm{out}}} - 1}{E_s/N_0/D}} \xi^{D-1} \cdot e^{-\xi} \, d\xi. \tag{2.73}$$
The integral in (2.73) is often referred to as incomplete Gamma function because the upper limit is finite. The obtained outage probabilities are illustrated in Figure 2.12 versus the corresponding outage capacities. For low outage probabilities, for example, Pout = 0.1, diversity increases the outage capacity, that is, larger capacities can be guaranteed with a certain probability. The gains become bigger for high SNRs (compare Es /N0 = 10 dB and Es /N0 = 20 dB). However, for Pout > 0.7, the relations are reversed and large diversity degrees lead to higher outage probabilities. This behavior is not surprising because diversity reduces the variations of the SNR, that is, very low SNRs as well as very high SNRs occur less frequently. Therefore, very high capacities much above the ergodic capacity C¯ do not occur very often for large D, resulting in high outage probabilities. Nevertheless, these cases are pathological because outage probabilities above 0.7 are generally not desired in practical systems. Finally, Figure 2.13 depicts Cout versus Es /N0 for different diversity degrees D and outage probabilities Pout . We see that diversity can dramatically increase the outage capacity especially for large Es /N0 . Moreover, the largest gains are obtained for transitions between small D and for low outage probabilities.
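For integer diversity degrees, the incomplete Gamma integral in (2.73) has the well-known finite-sum form $1 - e^{-x} \sum_{k=0}^{D-1} x^k/k!$ with $x$ denoting the upper integration limit. The sketch below uses this identity; the function name and the example operating points are illustrative assumptions.

```python
import math

def outage_probability_diversity(c_out, es_n0, d):
    """Outage probability (2.73) for D-fold diversity, linear Es/N0, integer d >= 1."""
    x = (2.0 ** c_out - 1.0) / (es_n0 / d)       # upper integration limit
    partial_sum = sum(x ** k / math.factorial(k) for k in range(d))
    return 1.0 - math.exp(-x) * partial_sum

ES_N0 = 10.0   # 10 dB

# Low outage capacity: diversity reduces the outage probability
p_low = [outage_probability_diversity(2.0, ES_N0, d) for d in (1, 2, 5)]

# Outage capacity above the ergodic capacity: the relation is reversed
p_high = [outage_probability_diversity(4.0, ES_N0, d) for d in (1, 5)]
```

For $D = 1$ the expression reduces to (2.69). The two lists illustrate the behavior described above: diversity lowers the outage probability at moderate $C_{\mathrm{out}}$, but increases it for thresholds well above the ergodic capacity.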
Figure 2.13 Outage capacity of flat Rayleigh fading channels for Gaussian input (panel a: Pout = 0.01; panel b: Pout = 0.1; axes: Cout versus Es/N0 in dB; curves: D = 1, 2, 5, 10 and 50)
2.3 Channel Capacity of MIMO Systems

As explained earlier, MIMO systems can be found in a wide range of applications. The specific scenario very much affects the structure and the statistical properties of the system matrix given in (1.32) on page 17. Therefore, general statements concerning the ergodic or the outage capacity of arbitrary MIMO systems will not be derived here. The reader is referred to Chapters 4 and 6 where specific examples like code division multiple access (CDMA) or multiple antenna systems are discussed in more detail. In fact, this section only derives the basic principle of how to calculate the instantaneous capacity of a general system described by a matrix $\mathbf{S}$ that is not further specified. This approach is later adapted to specific transmission scenarios like CDMA, space division multiple access (SDMA), or space time coding.

The MIMO system comprises point-to-point MIMO communications between a single transmitter-receiver pair, each equipped with multiple antennas, as well as multiuser communications. Since the latter case covers some additional aspects, it will be explicitly discussed in Section 2.4.

In the following part, we will restrict ourselves to frequency-nonselective fading channels. Since we focus on the instantaneous capacity $C(\mathbf{S})$, we can neglect the time index $k$. The general system model with $N_I$ inputs and $N_O$ outputs was already described in (1.32) as

$$\mathbf{y} = \mathbf{S} \cdot \mathbf{x} + \mathbf{n}. \tag{2.74}$$
The channel matrix H is replaced by the NO × NI system matrix S because it represents not only the channel but may also comprise other parts of the transmission system such as spreading in CDMA systems. The coefficients Sν,µ of S describe the transmission between the µth input and the νth output. For the sake of simplicity, Gaussian distributed input signals and perfect channel state information at the receiver are assumed.
The assumption of a Gaussian input leads to an upper bound of the capacity for discrete input alphabets. The only difference compared to single-input single-output (SISO) systems is that we have to deal with vectors instead of scalars. Adapting (2.52) to the MIMO case described in (2.74), we obtain

$$\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S}) = \bar{I}_{\mathrm{diff}}(\mathcal{Y} \mid \mathbf{S}) - \bar{I}_{\mathrm{diff}}(\mathcal{N}). \tag{2.75}$$

Using vector notations and the definitions given in Sections 2.1.3 and 2.1.4, the instantaneous mutual information becomes

$$\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S}) = \log_2 \det(\pi e \boldsymbol{\Phi}_{YY}) - \log_2 \det(\pi e \boldsymbol{\Phi}_{NN}). \tag{2.76}$$
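Next, we have to address the covariance matrices for the channel output $\mathbf{y}$ and the noise process $\mathcal{N}$. Taking (2.74) into account and assuming mutually independent noise contributions at the $N_O$ outputs of the system, we obtain

$$\boldsymbol{\Phi}_{YY} = \mathrm{E}_{X,N}\{(\mathbf{S}\mathbf{x} + \mathbf{n}) \cdot (\mathbf{S}\mathbf{x} + \mathbf{n})^H\} = \mathbf{S} \boldsymbol{\Phi}_{XX} \mathbf{S}^H + \boldsymbol{\Phi}_{NN} \tag{2.77a}$$

and

$$\boldsymbol{\Phi}_{NN} = \sigma_N^2 \cdot \mathbf{I}_{N_O}. \tag{2.77b}$$

The insertion of (2.77a) and (2.77b) into (2.76) yields

$$\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S}) = \log_2\left( \frac{\det \boldsymbol{\Phi}_{YY}}{\det \boldsymbol{\Phi}_{NN}} \right) = \log_2 \det\left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \mathbf{S} \boldsymbol{\Phi}_{XX} \mathbf{S}^H \right). \tag{2.78}$$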
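In order to reduce the vector notation to a set of independent scalar equations, we now apply the singular value decomposition (SVD) (see Appendix C.3 and (Golub and van Loan 1996)) to the system matrix

$$\mathbf{S} = \mathbf{U}_S \cdot \boldsymbol{\Sigma}_S \cdot \mathbf{V}_S^H. \tag{2.79}$$

In (2.79), $\mathbf{U}_S$ and $\mathbf{V}_S$ are square unitary matrices (see Appendix C.3), that is, the relations $\mathbf{U}_S \mathbf{U}_S^H = \mathbf{U}_S^H \mathbf{U}_S = \mathbf{I}_{N_O}$ and $\mathbf{V}_S \mathbf{V}_S^H = \mathbf{V}_S^H \mathbf{V}_S = \mathbf{I}_{N_I}$ hold. The $N_O \times N_I$ matrix $\boldsymbol{\Sigma}_S$ contains on its diagonal the singular values $\sigma_i$ of $\mathbf{S}$. For $N_I < N_O$, it has the form

$$\boldsymbol{\Sigma}_S = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_{N_I} \\ 0 & \cdots & 0 \end{pmatrix}, \tag{2.80a}$$

while

$$\boldsymbol{\Sigma}_S = \begin{pmatrix} \sigma_1 & & 0 & 0 \\ & \ddots & & \vdots \\ 0 & & \sigma_{N_O} & 0 \end{pmatrix} \tag{2.80b}$$

holds for $N_I > N_O$. The rank $r$ of $\mathbf{S}$ is limited to the minimum of $N_I$ and $N_O$, that is, $r \le \min(N_I, N_O)$. The application of the SVD to (2.78) results in

$$\begin{aligned} \bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S}) &= \log_2 \det\left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \mathbf{U}_S \boldsymbol{\Sigma}_S \mathbf{V}_S^H \boldsymbol{\Phi}_{XX} \mathbf{V}_S \boldsymbol{\Sigma}_S^H \mathbf{U}_S^H \right) \\ &= \log_2 \det\left( \mathbf{U}_S \left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \boldsymbol{\Sigma}_S \mathbf{V}_S^H \boldsymbol{\Phi}_{XX} \mathbf{V}_S \boldsymbol{\Sigma}_S^H \right) \mathbf{U}_S^H \right) \\ &= \log_2 \det\left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \boldsymbol{\Sigma}_S \mathbf{V}_S^H \boldsymbol{\Phi}_{XX} \mathbf{V}_S \boldsymbol{\Sigma}_S^H \right). \end{aligned} \tag{2.81}$$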
The last equality holds because the determinant of a matrix does not change if it is multiplied by a unitary matrix from the left and by its Hermitian from the right. We now have to distinguish two special cases concerning the kind of channel knowledge at the transmitters.

No Cooperation between MIMO Inputs

First, we assume that the different inputs of the MIMO system do not or cannot cooperate with each other. This might be the case in the uplink of a CDMA system where mobile subscribers can only communicate with a common base station and not among themselves. Moreover, if no channel knowledge is available, it is impossible to adapt the signal vector $\mathbf{x}$ to the channel properties. In these cases, an optimization of $\boldsymbol{\Phi}_{XX}$ with respect to $\mathbf{S}$ cannot be performed and the best strategy is to transmit $N_I$ independent data streams with equal power $E_s/T_s$. Hence, the covariance matrix of $\mathbf{x}$ becomes $\boldsymbol{\Phi}_{XX} = E_s/T_s \cdot \mathbf{I}_{N_I}$ and the mutual information in (2.81) represents the channel capacity ($\sigma_N^2 = N_0/T_s$)

$$C(\mathbf{S}) = \log_2 \det\left( \mathbf{I}_{N_O} + \frac{E_s}{N_0} \mathbf{S}\mathbf{S}^H \right) = \log_2 \prod_{\nu=1}^{r} \left( 1 + \sigma_\nu^2 \frac{E_s}{N_0} \right) = \sum_{\nu=1}^{r} \log_2\left( 1 + \sigma_\nu^2 \frac{E_s}{N_0} \right). \tag{2.82}$$
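The equivalence of the determinant form and the sum over eigenmodes in (2.82) can be verified for a small example. The sketch below is restricted to a real-valued $2 \times 2$ system matrix so that it needs no linear algebra library; this restriction, the function names and the example matrix are illustrative assumptions.

```python
import math

def singular_values_2x2(s):
    """Singular values of a real 2x2 matrix via the eigenvalues of S^T S."""
    (a, b), (c, d) = s
    g11 = a * a + c * c
    g22 = b * b + d * d
    g12 = a * b + c * d
    mean = 0.5 * (g11 + g22)
    diff = math.sqrt((0.5 * (g11 - g22)) ** 2 + g12 ** 2)
    return [math.sqrt(max(mean + diff, 0.0)), math.sqrt(max(mean - diff, 0.0))]

def capacity_from_singular_values(singular_values, es_n0):
    """Right-hand side of (2.82): sum over the eigenmodes."""
    return sum(math.log2(1.0 + sv * sv * es_n0) for sv in singular_values if sv > 0.0)

def capacity_from_determinant_2x2(s, es_n0):
    """Left-hand side of (2.82) for a real 2x2 matrix.

    Uses det(I + c*G) = 1 + c*trace(G) + c^2*det(G) with G = S S^T.
    """
    (a, b), (c, d) = s
    trace_g = a * a + b * b + c * c + d * d
    det_g = (a * d - b * c) ** 2
    return math.log2(1.0 + es_n0 * trace_g + es_n0 ** 2 * det_g)

S_EXAMPLE = [[1.0, 0.5], [0.2, 0.8]]   # hypothetical system matrix
ES_N0 = 10.0
c_sum = capacity_from_singular_values(singular_values_2x2(S_EXAMPLE), ES_N0)
c_det = capacity_from_determinant_2x2(S_EXAMPLE, ES_N0)
```

Both values agree up to rounding errors. For identical singular values, the capacity is simply the number of eigenmodes times the capacity of a single subchannel.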
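The coefficients $\sigma_\nu^2$ in (2.82) denote the squared nonzero singular values of $\mathbf{S}$ or, equivalently, the nonzero eigenvalues of $\mathbf{S}^H\mathbf{S}$. The interpretation of (2.82) shows that we sum up the capacities of $r$ independent subchannels with different powers $\sigma_\nu^2$. These subchannels represent the eigenmodes of the channel described by $\mathbf{S}$. The ergodic capacity is obtained by calculating the expectation of (2.82) with respect to $\sigma_\nu^2$.

Cooperation among MIMO Inputs

If the transmitters are allowed to cooperate and if they have knowledge of the instantaneous system matrix $\mathbf{S}$, we can do better than simply transmitting independent data streams with equal power over the $N_I$ inputs. According to (2.82), the eigenmodes of $\mathbf{S}$ represent independent subsystems that do not interact. Applying the eigenvalue decomposition to the covariance matrix of $\mathbf{x}$ that still has to be determined yields

$$\boldsymbol{\Phi}_{XX} = \mathrm{E}_X\{\mathbf{x}\mathbf{x}^H\} = \mathbf{V}_X \boldsymbol{\Lambda} \mathbf{V}_X^H \tag{2.83}$$

where $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_\nu)$ denotes a diagonal matrix with powers $\lambda_\nu$ on the main diagonal. The maximum mutual information is obtained by choosing $\mathbf{V}_X = \mathbf{V}_S$. Inserting (2.83) into (2.81) then leads to

$$\begin{aligned} \bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S}) &= \log_2 \det\left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \boldsymbol{\Sigma}_S \mathbf{V}_S^H \mathbf{V}_S \boldsymbol{\Lambda} \mathbf{V}_S^H \mathbf{V}_S \boldsymbol{\Sigma}_S^H \right) \\ &= \log_2 \det\left( \mathbf{I}_{N_O} + \frac{1}{\sigma_N^2} \boldsymbol{\Sigma}_S \boldsymbol{\Lambda} \boldsymbol{\Sigma}_S^H \right) = \sum_{\nu=1}^{r} \log_2\left( 1 + \lambda_\nu \frac{\sigma_\nu^2}{\sigma_N^2} \right). \end{aligned} \tag{2.84}$$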
Hence, we fully exploit the eigenmodes of the system $\mathbf{S}$ with a transmit vector $\mathbf{x} = \mathbf{V}_S \tilde{\mathbf{x}}$ where $\tilde{\mathcal{X}}$ is isotropic, that is, it has no preferred direction. The multiplication with the unitary matrix $\mathbf{V}_S$ does not change its statistical properties but rotates the coordinate system. It is particularly important that the total power of $\mathcal{X}$ does not change so that $\sigma_X^2 = \sigma_{\tilde{X}}^2$ holds. However, in order to maximize $\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathbf{S})$, we still have to find the optimal power levels $\lambda_\nu$ for the data streams $x_\nu$. Since only $r$ nonzero singular values of $\mathbf{S}$ exist, we would waste power by choosing $\lambda_\nu > 0$ for $r < \nu \le N_I$. Therefore, power is spent only on those subchannels with $\sigma_\nu^2 > 0$. For these $r$ channels, we have to solve the following convex optimization problem

$$\max_{\lambda_1 \cdots \lambda_r} \sum_{\nu=1}^{r} \log_2\left( 1 + \lambda_\nu \frac{\sigma_\nu^2}{\sigma_N^2} \right) \quad \text{with} \quad \lambda_\nu \ge 0 \quad \text{and} \quad \sum_{\nu=1}^{r} \lambda_\nu = N_I \cdot \frac{E_s}{T_s}. \tag{2.85}$$
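The first expression in (2.85) has to be maximized, while the second and third ones represent the constraints. By using the Lagrange function

$$L(\boldsymbol{\lambda}, \boldsymbol{\mu}, \xi) = -\sum_{\nu=1}^{r} \left[ \log_2\left( 1 + \lambda_\nu \frac{\sigma_\nu^2}{\sigma_N^2} \right) + \mu_\nu \lambda_\nu - \xi \cdot \lambda_\nu \right], \tag{2.86}$$

with definitions $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_r]^T$ and $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_r]^T$, we obtain the necessary and sufficient Karush-Kuhn-Tucker conditions (Cover and Thomas 1991)

$$\text{a)} \ \frac{\partial L}{\partial \lambda_\nu} = 0, \qquad \text{b)} \ \mu_\nu \ge 0, \qquad \text{c)} \ \mu_\nu \cdot \lambda_\nu = 0 \qquad \text{for} \quad 1 \le \nu \le r. \tag{2.87}$$

First, the partial derivative of $L$ with respect to $\lambda_\nu$ yields

$$\frac{\partial L}{\partial \lambda_\nu} = -\frac{\log_2(e) \, \sigma_\nu^2/\sigma_N^2}{1 + \lambda_\nu \sigma_\nu^2/\sigma_N^2} - \mu_\nu + \xi = 0. \tag{2.88}$$

Resolving this with respect to $\mu_\nu$ and exploiting condition b) in (2.87), we obtain

$$\mu_\nu = \xi - \frac{\log_2(e) \, \sigma_\nu^2/\sigma_N^2}{1 + \lambda_\nu \sigma_\nu^2/\sigma_N^2} \ge 0 \quad \Longleftrightarrow \quad \lambda_\nu \ge \frac{\log_2(e)}{\xi} - \frac{\sigma_N^2}{\sigma_\nu^2}. \tag{2.89}$$

Next, we have to apply the third Karush-Kuhn-Tucker condition, often termed complementary slackness. It states that $\mu_\nu$ has to be zero as long as the power $\lambda_\nu > 0$ holds and that $\mu_\nu$ can become arbitrarily large for $\lambda_\nu = 0$. Hence, inserting the left-hand side of (2.89) into condition c) of (2.87) results in

$$\lambda_\nu \cdot \left( \xi - \frac{\log_2(e) \, \sigma_\nu^2/\sigma_N^2}{1 + \lambda_\nu \sigma_\nu^2/\sigma_N^2} \right) = 0 \quad \Leftrightarrow \quad \lambda_\nu = 0 \ \lor \ \lambda_\nu = \frac{\log_2(e)}{\xi} - \frac{\sigma_N^2}{\sigma_\nu^2}. \tag{2.90}$$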
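Figure 2.14 Principle of waterfilling (water level log2(e)/ξ above the ground levels σN²/σν² of six channels ν; power levels λ1, λ3, λ4 and λ6 are positive, λ2 = 0 and λ5 = 0)

We have to distinguish the two cases

$$\frac{\log_2(e)}{\xi} > \frac{\sigma_N^2}{\sigma_\nu^2} \quad \Longrightarrow \quad \lambda_\nu = \frac{\log_2(e)}{\xi} - \frac{\sigma_N^2}{\sigma_\nu^2} > 0$$

$$\frac{\log_2(e)}{\xi} \le \frac{\sigma_N^2}{\sigma_\nu^2} \quad \Longrightarrow \quad \lambda_\nu = 0.$$

Finally, the unknown variable $\xi$ is chosen such that the total power constraint is fulfilled and

$$\sum_{\nu=1}^{r} \lambda_\nu = \sum_{\nu=1}^{r} \left( \frac{\log_2(e)}{\xi} - \frac{\sigma_N^2}{\sigma_\nu^2} \right) = N_I \cdot \frac{E_s}{T_s} \tag{2.91}$$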
must hold. The procedure is typically known as waterfilling and is illustrated in Figure 2.14. It can be interpreted as pouring water into a vessel with a bumpy ground. Naturally, the water surface is flat. Therefore, at those positions with a high ground level, the water column is rather small, whereas it is high for deep ground levels. Moreover, it may happen for low water levels that the ground is not totally covered, leading to a zero height water column. The heights of the water columns represent the different power levels and the heights of the vessel ground represent the inverse SNRs of the equivalent channels. Hence, the higher the normalized noise level $\sigma_N^2/\sigma_\nu^2$, the less power is spent on channel $\nu$. In other words, more power is assigned to good channels and only low power to bad channels. If a channel is too bad, it is not used at all ($\lambda_\nu = 0$). The higher the total power $E_s/T_s$, the lower the probability of unused channels.

These conclusions hold only for Gaussian distributed input signals. For discrete input alphabets, different strategies have to be pursued with regard to the power distribution. The reason is that only $m$ bits can be transmitted per symbol for $2^m$-ary modulation schemes. Once an error-free uncoded transmission is reached on a certain layer, more transmit power does not increase the layer's spectral efficiency unless a different modulation scheme is chosen. The increase in spectral efficiency is quantized, leading to power distributions different from those of the continuous Gaussian case.

In conclusion, we can say that without channel knowledge at the transmitter, the best strategy is to transmit $N_I$ independent data streams with equal power. However, if the transmitters have perfect channel state information (CSI) and are allowed to cooperate, the covariance matrix $\boldsymbol{\Phi}_{XX}$ has to be chosen such that the eigenmodes of the channel are exploited and the power levels of the resulting $r$ independent channels have to be adjusted according to the waterfilling principle.

Assuming identical singular values of the matrix $\mathbf{S}$, (2.82) shows us that the capacity $C(\mathbf{S})$ grows linearly with the rank $r$ of $\mathbf{S}$. Hence, the capacity of full-rank MIMO systems can be linearly increased with $r = \min\{N_I, N_O\}$, for example, by using more antennas at the transmitter and receiver. Since $C(\mathbf{S})$ grows only logarithmically with the SNR, the potential of MIMO systems is much larger. Similar to scalar Rayleigh fading channels, ergodic and outage capacities can be determined for vector channels. Since concrete MIMO systems have not been considered, examples will be presented in Chapters 4 and 6 dealing with CDMA and multiple antenna systems.
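The waterfilling rule of (2.89) to (2.91) can be sketched as a short search for the water level $\log_2(e)/\xi$. In the sketch below, `gains` holds the normalized subchannel gains $\sigma_\nu^2/\sigma_N^2$ and `total_power` stands for the power budget $N_I \cdot E_s/T_s$; the function names, this parametrization and the example values are illustrative assumptions rather than the procedure of the original text.

```python
import math

def waterfilling(gains, total_power):
    """Power levels lambda_nu for subchannel gains sigma_nu^2 / sigma_N^2."""
    # Candidate set: all usable subchannels, strongest first
    active = sorted((i for i, g in enumerate(gains) if g > 0.0),
                    key=lambda i: gains[i], reverse=True)
    powers = [0.0] * len(gains)
    while active:
        # Water level log2(e)/xi that meets the power constraint (2.91)
        level = (total_power + sum(1.0 / gains[i] for i in active)) / len(active)
        weakest = active[-1]
        if level - 1.0 / gains[weakest] > 0.0:
            for i in active:
                powers[i] = level - 1.0 / gains[i]
            break
        active.pop()              # weakest channel gets no power, repeat
    return powers

def mutual_information(gains, powers):
    """Sum over the eigenmodes as in (2.84)."""
    return sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))

GAINS = [2.0, 1.0, 0.1]           # hypothetical normalized gains
TOTAL_POWER = 1.0
powers_wf = waterfilling(GAINS, TOTAL_POWER)
powers_equal = [TOTAL_POWER / len(GAINS)] * len(GAINS)
rate_wf = mutual_information(GAINS, powers_wf)
rate_equal = mutual_information(GAINS, powers_equal)
```

In this example the weakest subchannel is switched off and the remaining power is split unevenly between the two stronger ones, which yields a larger mutual information than the equal-power allocation.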
2.4 Channel Capacity for Multiuser Communications

In the previous section, the principal method to determine the capacity of a general MIMO system with $N_I$ inputs and $N_O$ outputs has been derived without considering specific system properties. Although this approach also covers multiuser scenarios, there is a major difference between point-to-point MIMO and multiuser communications. While the optimization of the former scheme generally aims at maximizing the system capacity, also denoted as sum rate, each subscriber in a multiuser environment has its own quality-of-service (QoS) constraints such as target data and error rates. Here, we are interested not only in the maximum cell throughput or sum rate but also in the set of achievable individual data rates for all users, called the capacity region. This point of view allows us to include the aspect of fairness among users. The capacity region contains constellations yielding the maximum sum rate.

A simple multiuser AWGN channel where the base station and each user only have a single antenna is discussed first. The scenario is then extended to fading channels and, finally, the multiuser MIMO case is addressed. Since the capacity of an AWGN channel is achieved for Gaussian distributed inputs, we assume that the users transmit independent complex Gaussian distributed signals with power levels $P_u = E_{s,u}/T_s$. Moreover, this choice greatly simplifies the results, allowing us to concentrate on the main aspects of multiuser scenarios. We do not consider detailed derivations and restrict ourselves to the discussion of general results.
2.4.1 Single Antenna AWGN Channel

Uplink Transmission

Considering the uplink with $N_u$ mobile users and a common base station with a single receive antenna, the optimum strategy is to transmit all user signals simultaneously using the entire bandwidth (Tse and Viswanath 2005). This is similar to CDMA systems except that capacity-achieving channel coding is applied instead of direct-sequence spreading, as will be discussed in Chapter 4. As mentioned above, we consider independent complex Gaussian distributed signals with power levels $P_u = E_{s,u}/T_s$. The received symbol at time instant $k$ consists of the transmitted signals $x_u[k]$ and the noise $n[k]$ according to

$$y[k] = \sum_{u=1}^{N_u} x_u[k] + n[k]. \tag{2.92}$$
The question to be solved now is what individual rates can be supported for the users. Obviously, the achievable rate $R_u$ of an arbitrary user $u$ is limited even in the absence of all other users by the background noise $n[k]$ and is expressed by the individual capacity $C_u$ derived in Section 2.2.
$$ R_u \le C_u = \log_2(1 + \mathrm{SNR}_u) = \log_2\left(1 + \frac{E_{s,u}}{N_0}\right). \tag{2.93} $$
Merely summing up the rates in (2.93) for $1 \le u \le N_u$ would result in a very optimistic sum rate because the mutual interference is not considered. Instead, we can imagine the ideal situation where a single user would transmit with the power of all $N_u$ users, leading to the maximum capacity or sum rate
$$ \sum_{u=1}^{N_u} R_u \le C_{\max} = \log_2\left(1 + \frac{\sum_{u=1}^{N_u} E_{s,u}}{N_0}\right). \tag{2.94} $$
In order to determine the set of individual rates $R_u$ that can be simultaneously supported, we look at the receiver for the two-user case. It can be easily shown that (2.94) can be rewritten as
$$ \log_2\left(1 + \frac{E_{s,1} + E_{s,2}}{N_0}\right) = \log_2\left(1 + \frac{E_{s,1}}{N_0}\right) + \log_2\left(1 + \frac{E_{s,2}}{E_{s,1} + N_0}\right) \tag{2.95a} $$
$$ = \log_2\left(1 + \frac{E_{s,2}}{N_0}\right) + \log_2\left(1 + \frac{E_{s,1}}{E_{s,2} + N_0}\right). \tag{2.95b} $$
From (2.95a) and (2.95b), the following conclusion can be drawn. The optimum receiver performs successive decoding with interference cancellation as will be discussed for CDMA systems in Chapter 5. It achieves exactly the maximum sum capacity defined in (2.94). The first user to be detected suffers from the background noise as well as multiuser interference (MUI) induced by the other user. Since both contributions are independent of each other, their corresponding powers can be added, leading to the signal to interference plus noise ratio (SINR) $E_{s,u}/(E_{s,v \neq u} + N_0)$. If the detection starts with user two, the second term on the right-hand side of (2.95a) denotes the maximum rate of user two. Therefore, if its rate fulfills
$$ R_2 \le C_2^{\mathrm{MUI}} = \log_2\left(1 + \frac{E_{s,2}}{E_{s,1} + N_0}\right), \tag{2.96a} $$
it can be detected without errors and, hence, perfectly cancelled from the received signal $y[k]$. The remaining signal of user one is now only disturbed by the thermal noise, leading to its maximum rate
$$ R_1 = C_1 = \log_2\left(1 + \frac{E_{s,1}}{N_0}\right) = \bar{I}\big(x_1[k]; y[k] \mid x_2[k]\big). \tag{2.96b} $$
This is exactly the maximum mutual information between $x_1[k]$ and $y[k]$ if the signal $x_2[k]$ is totally known to the receiver. Equivalently, we can start with user one, with the resulting rates
$$ R_1 \le C_1^{\mathrm{MUI}} = \log_2\left(1 + \frac{E_{s,1}}{E_{s,2} + N_0}\right) \tag{2.97a} $$
and
$$ R_2 = C_2 = \log_2\left(1 + \frac{E_{s,2}}{N_0}\right) = \bar{I}\big(x_2[k]; y[k] \mid x_1[k]\big). \tag{2.97b} $$
Figure 2.15 Capacity region for AWGN uplink transmission and $N_u = 2$ (bold dashed line for orthogonal schemes). Axes: $R_1$ and $R_2$; marked values $C_1^{\mathrm{MUI}}$, $C_1$, $C_2^{\mathrm{MUI}}$, $C_2$ and corner points $(C_1^{\mathrm{MUI}}, C_2)$ and $(C_1, C_2^{\mathrm{MUI}})$.
Hence, we have already obtained two rate pairs $(C_1^{\mathrm{MUI}}, C_2)$ and $(C_1, C_2^{\mathrm{MUI}})$ of the capacity region in Figure 2.15. These two points represent the maximum sum rate achievable with successive decoding. The whole capacity region comprising all supported pairs $(R_1, R_2)$ is the area inside the pentagon and can be mathematically expressed as
$$ \big\{ (R_1, R_2) \mid R_u \le C_u,\ u = 1, 2 \ \wedge\ R_1 + R_2 \le C_{\max} \big\}. \tag{2.98} $$
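The pentagon in (2.98) can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original derivation: the function names and the numerical values ($E_{s,1}/N_0 = 10$, $E_{s,2}/N_0 = 5$) are assumptions chosen for illustration. It computes both corner points and checks that each of them attains the sum rate (2.94).

```python
import math

def awgn_capacity(snr):
    """Capacity in bit/s/Hz of a complex AWGN channel, cf. (2.93)."""
    return math.log2(1.0 + snr)

def two_user_corner_points(es1, es2, n0):
    """Return both corner points and C_max of the two-user capacity region."""
    c1 = awgn_capacity(es1 / n0)               # user 1 without interference
    c2 = awgn_capacity(es2 / n0)               # user 2 without interference
    c1_mui = awgn_capacity(es1 / (es2 + n0))   # user 1 detected first, (2.97a)
    c2_mui = awgn_capacity(es2 / (es1 + n0))   # user 2 detected first, (2.96a)
    c_max = awgn_capacity((es1 + es2) / n0)    # maximum sum rate, (2.94)
    return (c1_mui, c2), (c1, c2_mui), c_max

def in_capacity_region(r1, r2, es1, es2, n0):
    """Check the three constraints of (2.98) for a rate pair (r1, r2)."""
    (_, c2), (c1, _), c_max = two_user_corner_points(es1, es2, n0)
    return r1 <= c1 and r2 <= c2 and r1 + r2 <= c_max

# Illustrative values (assumed): E_s1/N_0 = 10 and E_s2/N_0 = 5
corner_a, corner_b, c_max = two_user_corner_points(10.0, 5.0, 1.0)
```

For these values, a rate pair such as (3.4, 2.5) satisfies both individual constraints but violates the sum rate constraint, which is exactly the part of the rectangle that the pentagon cuts off.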
Without the constraint in (2.94), we would have obtained a rectangle instead of a pentagon, which is obviously too optimistic. The points on the line between the two corner points $(C_1^{\mathrm{MUI}}, C_2)$ and $(C_1, C_2^{\mathrm{MUI}})$ in Figure 2.15, all of which yield the maximum sum rate, can be reached by appropriately switching between the two corresponding transmission strategies. Surprisingly, the maximum sum rate is larger than the maximum of the capacities $(C_1, C_2)$. Therefore, we can conclude that orthogonal multiple access schemes such as time division multiple access (TDMA) or frequency division multiple access (FDMA), which share the resources (time or frequency) among the users, show an inferior performance. In the following discussion, we still consider the two-user case and fix the received symbol energies to $E_{s,1}$ and $E_{s,2}$. The first user shall obtain a portion $\alpha$ of the resource to be shared and the rest $(1 - \alpha)$ is spent on the second user. For TDMA, the received powers can be expressed as $P_1 = E_{s,1}/(\alpha T_s)$ and $P_2 = E_{s,2}/[(1 - \alpha) T_s]$. The received noise power is not affected by $\alpha$ and amounts to $N = N_0 B = N_0/T_s$. On the contrary, FDMA partitions the bandwidth according to $B_1 = \alpha B$ and $B_2 = (1 - \alpha) B$, leading to the noise powers $N_1 = \alpha B N_0$ for the first user and $N_2 = (1 - \alpha) B N_0$ for the second user. Hence, the resulting SNRs of both users are divided for TDMA as well as FDMA by the factors $\alpha$ and $(1 - \alpha)$, respectively, and the corresponding capacities are
$$ C_1 = \alpha \cdot \log_2\left(1 + \frac{E_{s,1}}{\alpha N_0}\right), \qquad C_2 = (1 - \alpha) \cdot \log_2\left(1 + \frac{E_{s,2}}{(1 - \alpha) N_0}\right). $$
This leads to the sum rate
$$ C_{\max}^{\mathrm{orth}} = \alpha \cdot \log_2\left(1 + \frac{E_{s,1}}{\alpha N_0}\right) + (1 - \alpha) \cdot \log_2\left(1 + \frac{E_{s,2}}{(1 - \alpha) N_0}\right) \tag{2.99} $$
with $0 \le \alpha \le 1$. For this scenario, we see from Figure 2.15 that the sum rate of orthogonal multiple access schemes reaches the maximal sum capacity only at distinct points and is inferior elsewhere (Tse and Viswanath 2005). The generalization of the optimum strategy with successive decoding for $N_u$ users is straightforward. For each of the $N_u!$ possible detection orders, a set of $N_u$ individual rate constraints is obtained. As an example, the rates
$$ R_u \le C_u^{\mathrm{MUI}} = \log_2\left(1 + \frac{E_{s,u}}{\sum_{v=u+1}^{N_u} E_{s,v} + N_0}\right) \tag{2.100a} $$
are valid if the signals are detected in ascending order. The maximum sum rate amounts to
$$ C_{\max} = \log_2\left(1 + \frac{\sum_{u=1}^{N_u} E_{s,u}}{N_0}\right). \tag{2.100b} $$
The complete capacity region requires the consideration of all sets $\mathcal{K} \subset \{1, \ldots, N_u\}$ containing different subsets of users and the corresponding sum rates. Hence, it describes an $N_u$-dimensional polymatroid according to
$$ \left\{ (R_1, \ldots, R_{N_u}) \,\middle|\, \sum_{u \in \mathcal{K}} R_u \le \log_2\left(1 + \frac{\sum_{u \in \mathcal{K}} E_{s,u}}{N_0}\right) \ \text{for all}\ \mathcal{K} \subset \{1, \ldots, N_u\} \right\}. \tag{2.101} $$
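Two of the statements above lend themselves to a quick numerical check. The sketch below uses illustrative function names and values that are assumptions, not taken from the text. It evaluates the successive decoding rates (2.100a) and the orthogonal sum rate (2.99). For the assumed energies, the rates of (2.100a) add up to $C_{\max}$ in (2.100b), and the orthogonal scheme attains $C_{\max}$ when the resource share is proportional to the received energy, $\alpha = E_{s,1}/(E_{s,1} + E_{s,2})$, but not at $\alpha = 0.5$.

```python
import math

def successive_decoding_rates(es, n0):
    """Rates (2.100a) for detection in ascending order: user u sees the
    users v > u, which are not yet cancelled, as additional noise."""
    rates = []
    for u in range(len(es)):
        interference = sum(es[u + 1:])
        rates.append(math.log2(1.0 + es[u] / (interference + n0)))
    return rates

def max_sum_rate(es, n0):
    """Maximum sum rate (2.100b)."""
    return math.log2(1.0 + sum(es) / n0)

def orthogonal_sum_rate(alpha, es1, es2, n0):
    """Sum rate (2.99) of TDMA/FDMA for a resource share 0 < alpha < 1."""
    return (alpha * math.log2(1.0 + es1 / (alpha * n0))
            + (1.0 - alpha) * math.log2(1.0 + es2 / ((1.0 - alpha) * n0)))

# Illustrative energies (assumed): three users with E_s = 4, 2, 1 and N_0 = 1
rates = successive_decoding_rates([4.0, 2.0, 1.0], 1.0)
```

Changing the detection order changes the individual rates but not their sum, which is why all $N_u!$ orders lie on the same sum rate face of the region.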
Downlink Transmission
In the downlink, a common base station transmits to different mobile units. Each mobile unit receives all transmitted signals and has to extract the desired one. Observing the $u$th receiver, we obtain the signal
$$ y_u[k] = \sum_{v=1}^{N_u} x_v[k] + n_u[k]. \tag{2.102} $$
A major difference compared to the uplink is that we have to share the total transmit power among the users while each user had its own power constraint in the uplink scenario. Nevertheless, the individual rate constraints in (2.93) and (2.100a) still hold. The constraint in (2.93) covers the case when all the transmit power is spent on only a single user so that there is no interference. In order to arrive at the optimum transmission strategy, we look again at the two-user case and assume without loss of generality that $E_{s,1} > E_{s,2}$ with $E_{s,1} + E_{s,2} = E_s$ holds. If the code rate of user two is adjusted appropriately so that it can detect its signal even under interference of user one without errors, then user one can also detect the signal of user two without errors. After detecting user two's signal, user one can proceed with the processing of its own signal. Hence, we again have the strategy of successive detection as in the uplink. The maximum rate of the user to be detected first is
$$ R_2 \le C_2^{\mathrm{MUI}} = \log_2\left(1 + \frac{E_{s,2}}{N_0 + E_{s,1}}\right). \tag{2.103} $$
Since the other user can detect and subtract $x_2$ first, it can transmit at the rate
$$ R_1 \le C_1 = \log_2\left(1 + \frac{E_{s,1}}{N_0}\right). \tag{2.104} $$
Adapting the rates appropriately, the detection order can also be reversed. For the general case of $N_u$ users and an ascending order of detection, the $u$th user can transmit at the rate
$$ R_u \le C_u^{\mathrm{MUI}} = \log_2\left(1 + \frac{E_{s,u}}{N_0 + \sum_{v=u+1}^{N_u} E_{s,v}}\right). \tag{2.105} $$
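The effect of the shared power budget can be illustrated with a short sketch. The function name and the power split parameter `beta` are assumptions introduced here. A fraction `beta` of the total energy $E_s$ is spent on user one and the rest on user two, and the rates follow (2.104) and (2.103). By the identity (2.95a), the two rates always add up to $\log_2(1 + E_s/N_0)$ in this AWGN setting, so the split only moves the operating point along the sum rate line.

```python
import math

def downlink_rates(beta, es_total, n0):
    """Two-user AWGN downlink with successive detection.

    beta is the (assumed) fraction of the total energy assigned to user one.
    User two is detected first and sees user one as interference, (2.103);
    user one subtracts user two's signal and is limited by noise only, (2.104).
    """
    es1 = beta * es_total
    es2 = (1.0 - beta) * es_total
    r2 = math.log2(1.0 + es2 / (n0 + es1))
    r1 = math.log2(1.0 + es1 / n0)
    return r1, r2
```

For example, `downlink_rates(1.0, 15.0, 1.0)` spends all power on user one and leaves user two with rate zero, which is the single-user case covered by (2.93).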
2.4.2 Single Antenna Flat Fading Channel
Uplink Transmission
The major difference compared to the AWGN uplink channel is that fading coefficients $h_u[k]$ affect the SNR at the receiver. The coefficients are assumed to be independent and identically distributed. The received signal has the form
$$ y[k] = \sum_{u=1}^{N_u} h_u[k] \cdot x_u[k] + n[k]. \tag{2.106} $$
In the following part, we distinguish between slowly and fast fading channels.
Slowly fading channels
For slowly fading channels, the coefficients are constant during one coding block, resulting in $h_u[k] = h_u$. Hence, we have a scenario similar to the AWGN uplink in each coding block except that the received signal powers are different owing to the fading. As a consequence, the instantaneous capacity region for a single channel snapshot is obtained from the single antenna AWGN results by replacing the user-specific SNR $E_{s,u}/N_0$ with $|h_u|^2 E_{s,u}/N_0$. Certainly, simultaneous transmission with successive decoding still achieves capacity. The major difference between AWGN and slowly fading channels is that this capacity region is itself a random variable as explained in Section 2.2.5. Assuming identical transmit powers $E_s/T_s$ for all users, the outage probability that a certain sum rate $R$ cannot be supported becomes
$$ P_{\mathrm{out}} = \Pr\left\{ \log_2\left(1 + \frac{E_s}{N_0} \cdot \sum_{u=1}^{N_u} |h_u|^2\right) < R \right\}. \tag{2.107} $$
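The outage probability (2.107) is easy to estimate by simulation. The following Monte Carlo sketch assumes i.i.d. Rayleigh fading with $E\{|h_u|^2\} = 1$, so that each $|h_u|^2$ is exponentially distributed with unit mean; the function name, the trial count and the fixed seed are illustrative choices.

```python
import math
import random

def outage_probability(rate, snr, num_users, trials=100_000, seed=1):
    """Monte Carlo estimate of (2.107) for i.i.d. Rayleigh fading.

    snr is E_s/N_0 as a linear value; |h_u|^2 is drawn from an exponential
    distribution with mean one (assumption of this sketch).
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        gain_sum = sum(rng.expovariate(1.0) for _ in range(num_users))
        if math.log2(1.0 + snr * gain_sum) < rate:
            outages += 1
    return outages / trials
```

For a single user, the estimate can be compared with the closed form $1 - \exp\big(-(2^R - 1) N_0/E_s\big)$ that follows from the exponential distribution of $|h|^2$. With more users, the summed channel gain is much less likely to be small, so the outage probability for a fixed sum rate drops quickly.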
Fast fading channels without CSI at transmitter
Fast fading channels differ from slowly fading ones in that (many) different channel snapshots fall into one coding block. As we already know from Section 2.2.5, the ergodic capacity is an appropriate means for the characterization of such channels. Without channel state information (CSI) at the transmitter, the best strategy is that all users transmit with the same average signal power $E_s/T_s$ and the ergodic maximum sum rate becomes
$$ \bar{C}_{\max} = E\left\{ \log_2\left(1 + \frac{E_s}{N_0} \cdot \sum_{u=1}^{N_u} |H_u|^2\right) \right\}. \tag{2.108} $$
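As a rough numerical illustration of (2.108), the expectation can be approximated by averaging over many channel realizations. The sketch below again assumes i.i.d. Rayleigh fading with unit average channel gain; function name, trial count and seed are illustrative. The result can be compared with the AWGN sum rate (2.94) for the same total received energy, $\log_2(1 + N_u E_s/N_0)$.

```python
import math
import random

def ergodic_sum_rate(snr, num_users, trials=100_000, seed=1):
    """Monte Carlo estimate of the ergodic sum rate (2.108).

    snr is E_s/N_0 as a linear value; each |H_u|^2 is exponentially
    distributed with mean one (Rayleigh fading, assumption of this sketch).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        gain_sum = sum(rng.expovariate(1.0) for _ in range(num_users))
        total += math.log2(1.0 + snr * gain_sum)
    return total / trials
```

For $N_u = 4$ and $E_s/N_0 = 10$, the estimate stays slightly below the AWGN value $\log_2 41 \approx 5.36$ bit/s/Hz.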
For Rayleigh fading channels, the sum of the channel coefficients' squared magnitudes is chi-squared distributed with $2N_u$ degrees of freedom. An upper bound is obtained with Jensen's inequality from (2.65)
$$ E\left\{ \log_2\left(1 + \frac{E_s}{N_0} \cdot \sum_{u=1}^{N_u} |H_u|^2\right) \right\} \le \log_2\left(1 + \frac{E_s}{N_0} \cdot E\left\{ \sum_{u=1}^{N_u} |H_u|^2 \right\}\right) = \log_2\left(1 + N_u \frac{E_s}{N_0}\right) \tag{2.109} $$
where we have assumed that $E\{|H_u|^2\} = 1$ holds for $1 \le u \le N_u$. Comparing (2.109) with (2.94), it becomes obvious that the last term describes the maximal sum rate for the AWGN uplink with $\sum_{u=1}^{N_u} E_{s,u} = N_u E_s$. Hence, fast fading without channel knowledge at the transmitter always decreases the capacity. Simultaneous transmission with successive decoding still achieves capacity.
Downlink Transmission
For the downlink, we distinguish the situations with and without channel knowledge at the transmitter. Without CSI at the transmitter, the same argumentation as for the AWGN case holds: Superposition coding with successive decoding is capacity achieving. On the contrary, CSI at the transmitter allows powerful scheduling, which will be explained for fast fading channels.
Fast fading channels with CSI at transmitter
If the common base station has instantaneous channel knowledge of all channels, orthogonal multiple access with appropriate scheduling of the users is possible and improves the
maximal throughput remarkably. This effect is called multiuser diversity and is illustrated subsequently. The key idea is that only the user with the best instantaneous propagation conditions, that is, the largest channel gain $|h_u[k]|^2$, transmits. If we know all channel states within one coding block, the power of each user is distributed onto its active time instances according to the waterfilling principle derived in Section 2.3. If the channel statistics are identical for all users, they asymptotically obtain the same number of slots in which they can transmit. Moreover, all users achieve the same asymptotic rate and fulfill their power constraints.
Multiuser Diversity
Multiuser diversity describes an effect that occurs for channel-dependent scheduling. We consider the downlink for a common base station that has to serve $N_u$ users and has knowledge of the instantaneous channel quality. As mentioned earlier, the best strategy to achieve maximum throughput is to pick the user with the strongest channel. If the channels are independent and identically distributed, every user can transmit at the same average data rate. In order to illustrate the effect of always choosing the strongest channel, we have a look at a simple system consisting of $N_u$ users with independent flat Rayleigh fading channels. The corresponding SNRs $\gamma_i = |h_i|^2 E_s/N_0$ are chi-squared distributed with two degrees of freedom according to (1.27) on page 15. We are now interested in finding the PDF $p_{\gamma_{\max}}(\gamma)$ for
$$ \gamma_{\max} = \max\{\gamma_1, \ldots, \gamma_i, \ldots, \gamma_{N_u}\}. \tag{2.110} $$
The probability that $\gamma_{\max}$ is smaller than a constant $\gamma$ is given by the cumulative distribution function (CDF) $P_{\gamma_{\max}}(\gamma)$
$$ \Pr\{\gamma_{\max} < \gamma\} = P_{\gamma_{\max}}(\gamma) = \Pr\{\gamma_1 < \gamma, \gamma_2 < \gamma, \cdots, \gamma_{N_u} < \gamma\}. \tag{2.111} $$
For independent processes $\gamma_i$, the joint probability in (2.111) can be factorized into the marginal probabilities $\Pr\{\gamma_i < \gamma\}$, resulting in
$$ P_{\gamma_{\max}}(\gamma) = \prod_{i=1}^{N_u} P_{\gamma_i}(\gamma). \tag{2.112} $$
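The PDF is obtained by the derivative of (2.112). With $dP_{\gamma_i}(\gamma)/d\gamma = p_{\gamma_i}(\gamma)$, the application of the product rule yields
$$ p_{\gamma_{\max}}(\gamma) = \frac{dP_{\gamma_{\max}}(\gamma)}{d\gamma} = \sum_{i=1}^{N_u} p_{\gamma_i}(\gamma) \cdot \prod_{j=1, j \neq i}^{N_u} P_{\gamma_j}(\gamma). \tag{2.113} $$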
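For chi-squared distributed processes with $P_{\gamma_i}(\gamma) = 1 - e^{-\gamma/\bar{\gamma}_i}$, (2.113) results in
$$ p_{\gamma_{\max}}(\gamma) = \sum_{i=1}^{N_u} \frac{1}{\bar{\gamma}_i} \cdot e^{-\gamma/\bar{\gamma}_i} \cdot \prod_{j=1, j \neq i}^{N_u} \left(1 - e^{-\gamma/\bar{\gamma}_j}\right). \tag{2.114} $$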
Figure 2.16 Probability density and cumulative distribution of largest out of $N_u$ i.i.d. processes (bold line: single coefficient), marks indicate mean SNRs. Panels: a) PDF of max. SNR, b) CDF of max. SNR, both versus $\gamma$, for a single coefficient and $N_u = 4, 6, 8, 10$.
If the processes are furthermore identically distributed with $\bar{\gamma}_i = \bar{\gamma}$, (2.114) becomes
$$ p_{\gamma_{\max}}(\gamma) = \frac{N_u}{\bar{\gamma}} \cdot e^{-\gamma/\bar{\gamma}} \cdot \left(1 - e^{-\gamma/\bar{\gamma}}\right)^{N_u - 1}. \tag{2.115} $$
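The closed forms (2.112) and (2.115) can be evaluated directly. The sketch below uses assumed function names and a simple midpoint integration rule. It implements the PDF and CDF of the largest SNR for identically distributed users and integrates $\gamma \cdot p_{\gamma_{\max}}(\gamma)$ numerically to obtain the mean. For exponentially distributed SNRs, the mean of the maximum is known to equal $\bar{\gamma} \cdot (1 + 1/2 + \cdots + 1/N_u)$, which serves as a cross-check.

```python
import math

def pdf_max_snr(gamma, num_users, mean_snr=1.0):
    """PDF (2.115) of the largest of num_users i.i.d. exponential SNRs."""
    x = math.exp(-gamma / mean_snr)
    return num_users / mean_snr * x * (1.0 - x) ** (num_users - 1)

def cdf_max_snr(gamma, num_users, mean_snr=1.0):
    """CDF (2.112) for identical distributions: product of equal CDFs."""
    return (1.0 - math.exp(-gamma / mean_snr)) ** num_users

def mean_max_snr(num_users, mean_snr=1.0, step=1e-3, upper=40.0):
    """Mean of the largest SNR by midpoint integration of gamma * pdf.

    step and upper are numerical parameters of this sketch; upper should be
    many multiples of mean_snr so that the neglected tail is negligible.
    """
    count = int(upper / step)
    total = 0.0
    for i in range(count):
        gamma = (i + 0.5) * step
        total += gamma * pdf_max_snr(gamma, num_users, mean_snr)
    return total * step
```

With $\bar{\gamma} = 1$ and $N_u = 4$, the mean of the selected SNR is about 2.08, that is, roughly 3 dB above the average SNR of a single user.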
Figure 2.16 illustrates the probability density of the largest out of the Nu i.i.d. processes. Obviously, the probability that the largest process takes a very small value becomes much smaller with increasing Nu . Consequently, the probability of large values increases, which is also indicated by the mean values. Therefore, the average SNR of those channels used by the base station grows significantly compared to the average SNR of the entire ensemble. Figure 2.17 shows the corresponding ergodic capacities C¯ versus Es /N0 and the outage probabilities versus the sum rate R. Obviously, C¯ increases remarkably for growing number of users. At low SNR, the capacity can be doubled compared to the single user case because the probability that one out of Nu users has a good channel is much larger than the probability of a single user. Although the effect of multiuser diversity seems to look similar to that of conventional diversity, both techniques are totally different. While conventional diversity techniques average over many good and bad channels in order to reduce the overall variance, multiuser diversity concepts use only the good channels and increase the mean SNR. Thus, we skip at least the worst channels and try to ride on the peaks.
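The gain in ergodic capacity can be reproduced qualitatively with a small simulation. The sketch below is a simplification under stated assumptions: i.i.d. Rayleigh fading with unit average gain, constant transmit power in every slot (no waterfilling over time), and a scheduler that always serves the user with the largest instantaneous gain. Function name, trial count and seed are illustrative.

```python
import math
import random

def ergodic_capacity_best_user(snr, num_users, trials=100_000, seed=1):
    """Ergodic capacity when only the strongest of num_users users is served.

    snr is E_s/N_0 as a linear value; |h_i|^2 is exponentially distributed
    with mean one and the transmit power is kept constant (assumptions).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_gain = max(rng.expovariate(1.0) for _ in range(num_users))
        total += math.log2(1.0 + snr * best_gain)
    return total / trials
```

Calling the function with `num_users=1` gives the single-user Rayleigh reference, and increasing `num_users` shows how selecting the peak of several independent channels raises the average rate.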
Figure 2.17 Illustration of multiuser diversity for different $N_u$. a) Ergodic capacities $\bar{C}$ versus $E_s/N_0$ in dB, b) outage probabilities $P_{\mathrm{out}}$ versus $R$ for $10 \log_{10}(E_s/N_0) = 10$ dB; curves for a single user and $N_u = 2, 4, 8, 10$.
2.4.3 Multiple Antennas at Transmitter and Receiver
Uplink with Single Transmit and Multiple Receive Antennas
We start the multiuser MIMO discussion with the simple case of a single transmit antenna and multiple receive antennas for the uplink. Except for the power constraints, this scenario is equivalent to a point-to-point MIMO system discussed in Section 2.3 and represents the classical SDMA system where $N_u$ users equipped with single antennas transmit independent data streams to a common base station with multiple receive antennas. It is shown in (Tse and Viswanath 2005) that the optimal receiver performs linear minimum mean squared error (MMSE) filtering with successive interference cancellation (cf. also Subsection 5.2.2). For the two-user case, the instantaneous capacity region is described again by a pentagon as depicted in Figure 2.15. The intersections with the axes represent the cases where only a single user is active and beamforming is applied at the receiver. These scenarios yield the instantaneous capacities
$$ R_u[k] \le \log_2\left(1 + \|\mathbf{h}_u[k]\|^2 \frac{E_{s,u}}{N_0}\right), \tag{2.116} $$
where the scalar channel coefficient $h_u[k]$ known from Subsection 2.4.2 has been replaced by the corresponding vector $\mathbf{h}_u[k]$ owing to multiple receive antennas at the base station. The maximum sum rate is obtained assuming a situation where a single user transmits two independent data streams with powers $P_1$ and $P_2$ over two transmit antennas. Inserting the corresponding covariance matrix $\boldsymbol{\Phi}_{XX} = \mathrm{diag}[P_1, P_2]$ into the general result of (2.78) from the last section leads to
$$ R_1[k] + R_2[k] \le \log_2 \det\left( \mathbf{I}_{N_R} + \frac{1}{\sigma_N^2} \cdot \mathbf{H}[k] \begin{bmatrix} P_1 & 0 \\ 0 & P_2 \end{bmatrix} \mathbf{H}^H[k] \right) $$
$$ = \log_2 \det\left( \mathbf{I}_{N_R} + \frac{E_{s,1}}{N_0} \cdot \mathbf{h}_1[k] \mathbf{h}_1^H[k] + \frac{E_{s,2}}{N_0} \cdot \mathbf{h}_2[k] \mathbf{h}_2^H[k] \right). \tag{2.117} $$
The extension to $N_u$ users is straightforward and yields
$$ \sum_{u=1}^{N_u} R_u[k] \le \log_2 \det\left( \mathbf{I}_{N_R} + \sum_{u=1}^{N_u} \frac{E_{s,u}}{N_0} \mathbf{h}_u[k] \mathbf{h}_u^H[k] \right). \tag{2.118} $$
The achievable rate of user two that experiences user one's signal as interference becomes
$$ R_2[k] \le \log_2 \det\left( 1 + E_{s,2} \, \mathbf{h}_2^H[k] \left( N_0 \mathbf{I}_{N_R} + E_{s,1} \mathbf{h}_1[k] \mathbf{h}_1^H[k] \right)^{-1} \mathbf{h}_2[k] \right). \tag{2.119} $$
An equivalent expression holds for user one. For base stations with a single receive antenna, successive decoding and interference cancellation was already found to be the best receiver concept. In that case, interference is treated as temporally white Gaussian noise. On the contrary, we recognize from the term $\mathbf{h}_1[k] \mathbf{h}_1^H[k]$ in (2.119) that interference is treated as spatially colored noise when multiple receive antennas are used. We will see in Subsection 5.2.2 that the term $\mathbf{h}_2^H[k] (N_0 \mathbf{I}_{N_R} + E_{s,1} \mathbf{h}_1[k] \mathbf{h}_1^H[k])^{-1}$ in (2.119) represents an MMSE filter that directly addresses the noise color and represents the optimum linear filter for Gaussian inputs. Naturally, outage probabilities for slowly fading channels and ergodic capacities for fast fading channels can be derived from these instantaneous results.
Uplink with Multiple Antennas at Transmitter and Receiver
Now, we extend the previous scenario by allowing multiple antennas at each mobile. This leads to a system with $N_u$ users that have $N_{T_u}$ antennas and transmit as many independent data streams with power levels $P_{u,i}$ with $1 \le u \le N_u$ and $1 \le i \le N_{T_u}$ subject to the constraint $\sum_{i=1}^{N_{T_u}} P_{u,i} = P_u$. In the absence of multiple access interference, for example, in a single-user environment, we can directly apply the results in (2.78) from the last section and obtain the individual rate constraints
$$ R_u \le C_u = \log_2 \det\left( \mathbf{I}_{N_R} + \frac{1}{\sigma_N^2} \mathbf{H}_u \boldsymbol{\Phi}_{X_u X_u} \mathbf{H}_u^H \right). \tag{2.120} $$
The transmit covariance matrix $\boldsymbol{\Phi}_{X_u X_u} = \mathbf{U}_u \boldsymbol{\Lambda}_u \mathbf{U}_u^H$ of user $u$ can be decomposed into the unitary matrix $\mathbf{U}_u$ and the diagonal matrix $\boldsymbol{\Lambda}_u$. The latter determines the transmit powers of the data streams generated by user $u$. In multiuser scenarios, the maximum sum rate of the entire system is limited to
$$ C_{\max} = \log_2 \det\left( \mathbf{I}_{N_R} + \frac{1}{\sigma_N^2} \sum_{u=1}^{N_u} \mathbf{H}_u \boldsymbol{\Phi}_{X_u X_u} \mathbf{H}_u^H \right). \tag{2.121a} $$
The capacity region is obtained by considering all possible subsets $\mathcal{K} \subset \{1, \ldots, N_u\}$ as in Subsection 2.4.1. It amounts to
$$ \left\{ (R_1, \ldots, R_{N_u}) \,\middle|\, \sum_{u \in \mathcal{K}} R_u \le \log_2 \det\left( \mathbf{I}_{N_R} + \frac{1}{\sigma_N^2} \sum_{u \in \mathcal{K}} \mathbf{H}_u \boldsymbol{\Phi}_{X_u X_u} \mathbf{H}_u^H \right) \ \text{for all}\ \mathcal{K} \subset \{1, \ldots, N_u\} \right\}. \tag{2.121b} $$
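Returning to the case of single-antenna users, the relation between the sum rate (2.117), the interference-free rate (2.116) and the MMSE rate (2.119) can be verified numerically. The sketch below is restricted to $N_R = 2$ receive antennas so that determinant and inverse can be written out by hand; all function names and the channel values used for checking are assumptions. It relies on the fact that the determinant in (2.117) factorizes into the single-user term of user one and the MMSE term of user two, which is the matrix counterpart of (2.95a).

```python
import math

def outer(h):
    """Outer product h h^H of a length-2 complex vector as a 2x2 nested list."""
    return [[h[i] * h[j].conjugate() for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def sum_rate(h1, h2, es1, es2, n0):
    """Two-user sum rate bound (2.117) for two receive antennas."""
    a, b = outer(h1), outer(h2)
    m = [[(1.0 if i == j else 0.0) + es1 / n0 * a[i][j] + es2 / n0 * b[i][j]
          for j in range(2)] for i in range(2)]
    return math.log2(det2(m).real)   # determinant of a Hermitian matrix is real

def single_user_rate(h, es, n0):
    """Rate bound (2.116) of one user without interference."""
    return math.log2(1.0 + sum(abs(x) ** 2 for x in h) * es / n0)

def mmse_rate_user2(h1, h2, es1, es2, n0):
    """Rate bound (2.119) of user two with user one as coloured interference."""
    a = outer(h1)
    cov = [[(n0 if i == j else 0.0) + es1 * a[i][j] for j in range(2)]
           for i in range(2)]
    cov_inv = inv2(cov)
    # quadratic form h2^H * cov^{-1} * h2, real and non-negative
    q = sum(h2[i].conjugate() * cov_inv[i][j] * h2[j]
            for i in range(2) for j in range(2))
    return math.log2(1.0 + es2 * q.real)
```

Detecting user two first with the MMSE filter and user one afterwards without interference therefore reaches one corner of the pentagon, in the same way as in the single-antenna case.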
The common base station deploys $\sum_{u=1}^{N_u} N_{T_u}$ MMSE filters, one for each data stream. The detection is performed by successive MMSE filtering and interference cancellation. For each power allocation scheme and each set of steering matrices $\mathbf{U}_u$, the achievable rates can be described for the two-user case by a pentagon such as the one depicted in Figure 2.15. The whole capacity region is a convex hull comprising all pentagons obtained
for different $\boldsymbol{\Lambda}_u$ and power allocations. Since mobile users generally have only knowledge of their own channel, an overall optimization is hardly possible. Without any channel knowledge at the transmitters and independent Rayleigh fading components of the MIMO channels, identical power levels $P_{u,i} = P_u/N_{T_u}$ should be used, leading to a diagonal covariance matrix $\boldsymbol{\Lambda}_u = P_{u,i} \cdot \mathbf{I}_{N_{T_u}}$.
Downlink with Multiple Transmit and Single Receive Antennas
The major difference between the previously discussed uplink and the downlink is that transmit antennas at a common base station can cooperate with each other. We start with a single receive antenna for each user. Hence, interference cannot be spatially suppressed at the mobile but it can be avoided by appropriate signaling at the transmitter. Assuming that the channel to user $u$ is described by the row vector $\mathbf{h}_u$, the use of a linear prefilter $\mathbf{w} = \mathbf{h}_u^H$ maximizes the SNR because it is matched to the spatial channel. In fact, beamforming is applied and the beam pattern points directly in the optimal direction. However, MUI is not considered here and this prefilter will probably disturb all other users. The MIMO channel matrix is obtained by arranging all row vectors $\mathbf{h}_u$ into the $N_u \times N_T$ channel matrix
$$ \mathbf{H} = \begin{bmatrix} \mathbf{h}_1 \\ \vdots \\ \mathbf{h}_{N_u} \end{bmatrix}. $$
Interference from user $u$ to all other users is avoided by using a prefilter that is orthogonal to all but the $u$th row in $\mathbf{H}$. In the context of a CDMA uplink, we will see in Section 5.2 that the decorrelator represents such a filter. There, it is deployed as a receive filter that perfectly suppresses all interference at the expense of severe background noise amplification. Since the base station knows the transmission channels of all users, we can implement the decorrelator (zero forcing, ZF) at the transmitter and can avoid the MUI. We obtain
$$ \mathbf{W}_{\mathrm{ZF}} = \mathbf{H}^{\dagger} = \mathbf{H}^H \left( \mathbf{H} \mathbf{H}^H \right)^{-1}. \tag{2.122} $$
The $u$th column $\mathbf{w}_{\mathrm{ZF},u}$ of $\mathbf{W}_{\mathrm{ZF}}$ represents the prefilter of user $u$. At its receiver, we obtain the signal
$$ y_u = \mathbf{h}_u \cdot A \mathbf{W}_{\mathrm{ZF}} \cdot \mathbf{x} + n_u = \mathbf{h}_u \cdot A \mathbf{H}^H \left( \mathbf{H} \mathbf{H}^H \right)^{-1} \cdot \mathbf{x} + n_u. \tag{2.123} $$
Geometrically, the steering vector $\mathbf{w}_{\mathrm{ZF},u}$ for each symbol $x_u$ in $\mathbf{x}$ does not necessarily point directly in the direction of the desired user. Instead, it points in a direction perpendicular to all other users so that they will not be disturbed. Hence, we can imagine that only a small portion of the transmit power arrives at the desired receiver. In order to satisfy the QoS constraints of the associated user, a potentially unlimited transmit power would have to be spent on this user, violating the overall power constraint. This effect is equivalent to the amplification of the background noise that occurs for the decorrelator. Therefore, the scaling factor $A$ has to adjust the total transmit power according to some defined constraints, which may lead to poor SNRs at certain receivers. Hence, the question arises as to how the optimum linear prefilter looks. Since all user signals influence each other, the optimum choice of filters and transmit powers has to be determined for all channels $\mathbf{h}_u$ simultaneously. A separate optimization would fail. As a
measure of performance, the SINR at the mobile receivers can be used. It has been shown in (Tse and Viswanath 2005) that a duality exists between uplink and downlink. Therefore, the MMSE filter, which is known to maximize the SINR at the receiver, would be the optimum linear prefilter at the transmitter, too. There also exists an equivalent transmitter structure for successive interference cancellation at the receiver. Here, nonlinear precoding techniques such as the Tomlinson-Harashima precoding have to be applied (Fischer 2002).
Downlink with Single Transmit and Multiple Receive Antennas
In environments with a single transmit antenna at the base station and multiple receive antennas at each mobile, superposition coding and receive beamforming with interference cancellation is the optimal strategy and maximizes the SINRs at the receive filter outputs. Considering the two-user case, both transmit signals $x_u[k]$ are assumed to have identical powers $E_s/T_s$. The receive filters are matched to the spatial channel vectors $\mathbf{h}_u$ and deliver the outputs
$$ r_u[k] = \frac{\mathbf{h}_u^H}{\|\mathbf{h}_u\|} \cdot \mathbf{h}_u x[k] + \frac{\mathbf{h}_u^H}{\|\mathbf{h}_u\|} \mathbf{n}[k] = \|\mathbf{h}_u\| x[k] + \tilde{n}[k]. \tag{2.124} $$
At each mobile, superposition decoding has to be applied. Without loss of generality, we can assume that $\|\mathbf{h}_1\|^2 > \|\mathbf{h}_2\|^2$ holds so that the rates each user can support are
$$ R_1 \le C_1 = \log_2\left(1 + \|\mathbf{h}_1\|^2 \frac{E_{s,1}}{N_0}\right) $$
$$ R_2 \le C_2^{\mathrm{MUI}} = \log_2\left(1 + \frac{\|\mathbf{h}_2\|^2 E_{s,2}}{\|\mathbf{h}_2\|^2 E_{s,1} + N_0}\right). $$
Hence, user two decodes only its own signal disturbed by thermal noise and the signal of user one. On the contrary, user one first detects the signal of user two, subtracts it from the received signal and decodes its own signal afterwards.
Downlink with Multiple Transmit and Receive Antennas
Finally, we briefly consider the multiuser MIMO downlink where transmitters and receivers are both equipped with multiple antennas. Here, the same strategies as in the multiuser MIMO uplink have to be applied. For each user, the base station transmits parallel data streams over its antennas.
With full CSI at the transmitter, linear prefiltering in the zeroforcing or MMSE sense or nonlinear precoding can be applied. At the receivers, MMSE filtering with successive interference cancellation represents the optimum strategy.
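As a small illustration of the zero-forcing prefilter in (2.122), the following sketch computes $\mathbf{W}_{\mathrm{ZF}}$ for two single-antenna users and an arbitrary number of transmit antennas, so that only a 2x2 matrix has to be inverted. The helper functions and the channel values used for checking are assumptions. The rule for the scaling factor $A$ in `power_scaling` is also only one possible choice: it assumes uncorrelated unit-power symbols and a total transmit power constraint.

```python
def hermitian(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))]
            for c in range(len(m[0]))]

def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def zf_prefilter(h):
    """W_ZF = H^H (H H^H)^{-1} for a 2 x N_T channel matrix, cf. (2.122)."""
    h_herm = hermitian(h)
    return matmul(h_herm, inv2(matmul(h, h_herm)))

def power_scaling(w, total_power):
    """Scalar A such that A*W*x has the given total power, assuming
    uncorrelated unit-power symbols (an assumption of this sketch)."""
    filter_energy = sum(abs(x) ** 2 for row in w for x in row)
    return (total_power / filter_energy) ** 0.5
```

The product of the channel matrix and the prefilter is the identity matrix, so each user receives only its own symbol. For nearly parallel rows in the channel matrix, the filter energy grows and $A$ shrinks, which is the power penalty described above.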
2.5 Summary
This chapter has addressed some fundamentals of information theory. After the definitions of information and entropy, mutual information and the channel capacity have been derived. With these quantities, the channel coding theorem of Shannon was explained. It states that an error-free transmission can in principle be achieved with an optimal coding scheme if the code rate is smaller than the capacity. The channel capacity has been illustrated for
the AWGN channel and fading channels. The basic difference between them is that the instantaneous capacity of fading channels is a random variable. In this context, ergodic and outage capacities as well as the outage probability have been defined. They were illustrated by several examples including some surprising results for diversity. The principal method of an information theoretic analysis of MIMO systems was explained in Section 2.3. Basically, the SVD of the MIMO system matrix delivers a set of parallel SISO subsystems whose capacities are already known from the results of previous sections. Particular examples will be presented in Chapters 4 and 6. Finally, multiuser scenarios were briefly discussed. As a main result, we saw that orthogonal multiple access schemes do not always represent the best choice. Instead, systems with inherent MUI but appropriate code and receiver design often achieve a higher sum capacity. If the channel is known to the transmitter, channel-dependent scheduling exploits the multiuser diversity and increases the maximum throughput remarkably.
3 Forward Error Correction Coding
Principally, three fundamental coding principles are distinguished: source coding, channel or forward error correction (FEC) coding and cryptography. The task of source coding is to compress the sampled and quantized signal such that a minimum number of bits is needed for representing the originally analog signal in digital form. On the contrary, codes for cryptography try to cipher a signal so that it can only be interpreted by the desired user and not by third parties. In this chapter, channel coding techniques that pursue a totally different intention are considered. They should protect the information against transmission errors in the sense that an appropriate decoder at the receiver is able to detect or even correct errors that have been introduced during transmission. This task is accomplished by adding redundancy to the information, that is, the data rate to be transmitted is increased. In this manner, channel coding works contrary to source coding which aims to represent a message with as few bits as possible. Since channel coding is only one topic among several others in this book, it is not the aim to treat this topic comprehensively. Further information can be found in Blahut (1983), Bossert (1999), Clark and Cain (1981), Johannesson and Zigangirov (1998), Lin and Costello (2004). This chapter starts with a brief introduction reviewing the system model and introducing some fundamental basics. Section 3.2 explains the concept of linear block codes, their description by generator and parity check matrices as well as syndrome decoding. Next, convolutional codes, which represent one of the most important classes of error-correcting codes in digital communications, are introduced. Besides the definition of their encoder structure, their graphical representation, and the explanation of puncturing, the Viterbi decoding algorithm is derived, whose invention launched the breakthrough for these kinds of codes in practical systems.
Section 3.4 derives special decoding algorithms that provide reliability information at their outputs. They are of fundamental importance for concatenated coding schemes addressed in Section 3.6. Section 3.5 discusses the performance of codes by different means. The distance properties of codes are examined and used for
the derivation of an upper bound on the error probability. Moreover, an information theoretical measure termed information processing characteristic (IPC) is used for evaluation. Finally, Section 3.6 treats concatenated coding schemes and illustrates the turbo decoding principle.
3.1 Introduction
FEC coding plays an important role in many digital systems, especially in today's mobile communication systems, which would not be realizable without coding. Indeed, FEC codes are applied in standards like GSM (Global System for Mobile Communications) (Mouly and Pautet 1992), UMTS (Universal Mobile Telecommunication System) (Holma and Toskala 2004; Laiho et al. 2002; Ojanperä and Prasad 1998b; Steele and Hanzo 1999) and Hiperlan/2 (ETSI 2000, 2001) or IEEE802.11 (Hanzo et al. 2003a). However, channel coding is not restricted to communications but can also be found in storage applications. In this area, compact disks, digital versatile disks, digital audio tapes (DAT) and hard disks in personal computers use FEC strategies. Since the majority of digital communication systems transmit binary data with symbols taken from the finite Galois field GF(2) = {0, 1} (Blahut 1983; Lin and Costello 2004; Peterson and Weldon 1972), we only consider binary codes throughout this book. Moreover, we restrict the derivations in this chapter to a blockwise BPSK transmission over frequency-nonselective channels with perfect channel state information (CSI) at the receiver. On the basis of these assumptions and the principal system structure illustrated in Figure 1.6, we obtain the model in Figure 3.1. First, the encoder collects $k$ information bits out of the data stream $d[i]$ and builds a vector $\mathbf{d}$. Second, it maps this vector onto a new vector $\mathbf{b}$ of length $n > k$. The resulting data stream $b[\ell]$ is interleaved, BPSK modulated, and transmitted over the channel. The frequency-nonselective channel consists of a single coefficient $h[\ell]$ per time instant and the additive white Gaussian noise (AWGN) component $n[\ell]$.
Figure 3.1 Structure of coded communication system with BPSK (blocks: FEC encoder, BPSK, channel with coefficient $h[\ell]$ and noise $n[\ell]$, matched filter $h[\ell]^*/|h[\ell]|$ with real part extraction, FEC decoder with CSI input).
According to Section 1.3.1, the optimum ML sequence detector determines that code sequence $\tilde{\mathbf{b}}$ with the largest conditional probability density $p_{\mathcal{Y}|\tilde{\mathbf{b}}}(\mathbf{y})$. Equivalently, we can also estimate the sequence $\mathbf{x}$ because BPSK simply maps a bit in $\mathbf{b}$ onto a binary symbol in $\mathbf{x}$.¹ Since the logarithm is a strictly monotone function, we obtain
$$ \hat{\mathbf{x}} = \operatorname*{argmax}_{\tilde{\mathbf{x}}} \, p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}) = \operatorname*{argmax}_{\tilde{\mathbf{x}}} \, \log p_{\mathcal{Y}|\tilde{\mathbf{x}}}(\mathbf{y}). \tag{3.1} $$
For flat fading channels with y[] = h[] · x[] + n[], the conditional densities pY|˜x (y) can be factorized ! . .2 # .y[] − h[]x[] 7 ˜ . 1 pY|x[] · exp − pY|˜x (y) = ˜ (y[]) with pY|x[] ˜ (y[]) = π σN2 σN2 where σN2 denotes the power of the complex noise. Inserting the conditional probability density into (3.1) leads to . .2 .y[] − h[] · x[] ˜ . = argmax xˆ = argmin x[] ˜ · Re h∗ [] · y[] x˜
= argmax x˜
x˜
x[] ˜ · |h[]| · r[] with
r[] =
1 · Re h∗ [] · y[] . |h[]|
(3.2)
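The decision rule in (3.2) can be illustrated with a short Python sketch. It is a brute-force correlation over a toy codebook and not an efficient decoder. The bit mapping (0 to +1, 1 to −1), the channel coefficients, and the noise samples are assumptions chosen for illustration only.

```python
def matched_filter(y, h):
    """r[l] = Re{h*[l] y[l]} / |h[l]| as in (3.2)."""
    return [(hl.conjugate() * yl).real / abs(hl) for yl, hl in zip(y, h)]

def ml_detect(y, h, codebook):
    """Return the BPSK sequence maximizing sum_l x[l] |h[l]| r[l]."""
    r = matched_filter(y, h)

    def metric(x):
        return sum(xl * abs(hl) * rl for xl, hl, rl in zip(x, h, r))

    return max(codebook, key=metric)

# Toy codebook: (3,1) repetition code after BPSK mapping (assumed: 0 -> +1, 1 -> -1)
codebook = [(+1, +1, +1), (-1, -1, -1)]

# Assumed flat fading coefficients; the middle one is deeply faded
h = [0.9j, 0.1 + 0j, -0.8 + 0.2j]
x = (-1, -1, -1)
noise = [0.1 + 0.05j, 0.3 - 0.1j, -0.05 + 0.1j]
y = [hl * xl + nl for hl, xl, nl in zip(h, x, noise)]

x_hat = ml_detect(y, h, codebook)
```

In this example the faded middle symbol yields a matched filter output with the wrong sign, but its small weight |h[ℓ]| keeps it from dominating the decision.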
Therefore, the optimum receiver for coded BPSK can be split into two parts, a matched filter and the FEC decoder, as illustrated in Figure 3.1. First, the matched filter (cf. Section 1.3.4 on page 26) weights the received symbols y[ℓ] with h*[ℓ]/|h[ℓ]| and, for BPSK, extracts the real parts. This multiplication corrects the phase shifts induced by the channel. In the decoder, r[ℓ] is first weighted with the CSI |h[ℓ]|, which is fed through the de-interleaver to its input.² Owing to this scaling, unreliable received symbols attenuated by channel coefficients with small magnitudes contribute only little to the decoding decision, whereas symbols received over coefficients with large magnitudes have a great influence. Finally, the ML decoder determines the codeword $\hat{\mathbf{x}}$ with the maximum correlation to the sequence {· · · |h[ℓ]| r[ℓ] · · ·}. Owing to the weighting with the CSI, each information symbol x[ℓ] is multiplied in total with |h[ℓ]|². Hence, the decoder exploits diversity like the maximum ratio combiner for diversity reception discussed in Section 1.5.1, that is, decoding exploits time diversity in time-selective environments. The computational complexity of the brute force approach that directly correlates this sequence with all possible hypotheses $\tilde{\mathbf{x}}$ of the code grows exponentially with the sequence length and is prohibitively high for most practical implementations. Less complex algorithms will be introduced in subsequent sections.
As mentioned above, the encoding process simply maps a vector of k binary symbols onto another vector consisting of n symbols. Owing to this assignment, which must be one-to-one, only $2^k$ vectors out of $2^n$ possible vectors are used as codewords. In other words, the encoder selects a k-dimensional subspace out of an n-dimensional vector space. A proper choice allows the detection and even the correction of transmission errors.
¹ In the following derivation, the influence of the interleaver is neglected.
² Both steps can be combined so that a simple scaling of y[ℓ] with h*[ℓ] is sufficient. In this case, the product h*[ℓ] y[ℓ] already bears the CSI, which does not have to be sent explicitly to the decoder.
The ratio
$$R_c = \frac{k}{n}, \qquad (3.3)$$
is called code rate and describes the relative amount of information in a codeword. Consequently, the absolute redundancy is n − k, the relative redundancy (n − k)/n = 1 − Rc. We strictly distinguish between the code representing the set of codewords (subspace with k dimensions) and the encoder (Bossert 1999). The latter just performs the mapping between d and b. Systematic encoding means that the information bits in d are explicitly contained in b, for example, the encoder appends some additional bits to d. If information bits and redundant bits cannot be distinguished in b, the encoding is called nonsystematic. Note that the position of systematic bits in a codeword can be arbitrary.
Optimizing a code means arranging a set of codewords in the n-dimensional space such that certain properties are optimal. There exist different criteria for improving the performance of the entire coding scheme. As will be shown in Subsection 3.5.1, the pairwise Hamming distances between codewords are maximized and the corresponding number of pairs with small distances is minimized (Bossert 1999; Friedrichs 1996; Johannesson and Zigangirov 1998; Lin and Costello 2004). A different approach proposed in Hüttinger et al. (2002) and addressed in Subsection 3.5.3 focuses on the mutual information between encoder input and decoder output being the basis of information theory. Especially for concatenated codes, this approach seems to be well suited for predicting the performance of codes accurately (Hüttinger et al. 2002; ten Brink 2000a,b, 2001c). However, the optimization of codes is highly nontrivial and still an unsolved problem in the general case.
Similar to Section 1.3.2 where the squared Euclidean distance between symbols determined the error rate performance, an equivalent measure exists for codes. The Hamming distance $d_H(\mathbf{a}, \mathbf{b})$ denotes the number of differing symbols between the codewords a and b.
For binary codes, the Hamming distance and Euclidean distance are equivalent measures. The minimum distance $d_{min}$ of a code, that is, the minimum Hamming distance that can occur between any pair of codewords, determines the number of correctable and detectable errors. An (n, k, $d_{min}$) code can certainly correct
$$t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor \qquad (3.4a)$$
and detect
$$t' = d_{min} - 1 \qquad (3.4b)$$
errors.³ In (3.4a), ⌊x⌋ denotes the largest integer not exceeding x. Sometimes a code may correct or detect even more errors, but this cannot be ensured for all error patterns. With reference to convolutional codes, the minimum Hamming distance is called free distance $d_f$. In Subsection 3.5.1, the distance properties of codes are discussed in more detail.
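As a small illustration of (3.4a) and (3.4b), the following Python sketch computes the minimum distance of a code by exhaustive pairwise comparison. The four codewords form a hypothetical (5,2) code chosen only for this example.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions in which the codewords a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(code):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

# Hypothetical (5,2) code used only for illustration
code = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]

d_min = minimum_distance(code)
t_correct = (d_min - 1) // 2   # (3.4a)
t_detect = d_min - 1           # (3.4b)
```

The exhaustive search over all pairs is only feasible for very small codes. For linear codes it suffices to determine the minimum weight of the nonzero codewords.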
³ This is a commonly used notation for a code of length n with k information bits and a minimum Hamming distance $d_{min}$.
3.2 Linear Block Codes
3.2.1 Description by Matrices
Linear block codes represent a huge family of practically important codes. This section describes some basic properties of block codes and considers selected examples. As already mentioned, we restrict ourselves to binary codes, whose symbols are elements of GF(2). Consequently,
the rules of finite algebra have to be applied. With regard to the definitions of finite groups, fields, and vector spaces, we refer to Bossert (1999). All additions and multiplications have to be performed modulo 2 according to the rules in GF(2), which are denoted by ⊕ and ⊗, respectively. Hard decision decoding often exploits the algebraic structure of a code in order to find efficient algorithms. In addition, soft-in soft-out decoders exist that are of special interest in concatenated schemes. They will be derived in Section 3.4.
Generator Matrix
An (n, k) linear block code can be completely described by a generator matrix G consisting of n rows and k columns. Each information word is represented by a column vector $\mathbf{d} = [d_1, \ldots, d_k]^T$ of length k and assigned to a codeword $\mathbf{b} = [b_1, \ldots, b_n]^T$ of length n by⁴
$$\mathbf{b} = \mathbf{G} \otimes \mathbf{d} \quad \text{with} \quad \mathbf{G} = \begin{bmatrix} G_{1,1} & \cdots & G_{1,k} \\ \vdots & & \vdots \\ G_{n,1} & \cdots & G_{n,k} \end{bmatrix}. \qquad (3.5)$$
The code Γ represents the set of all $2^k$ codewords and is defined as
$$\Gamma = \left\{ \mathbf{G} \otimes \mathbf{d} \mid \mathbf{d} \in \mathrm{GF}(2)^k \right\} \qquad (3.6)$$
where $\mathrm{GF}(2)^k$ denotes the k-dimensional vector space where each dimension can take values out of GF(2). The codeword b can be interpreted as a linear combination of the columns of G where the symbols in d are the coefficients of this combination. Owing to the assumed linearity and the completeness of the code space, all columns of G represent valid codewords. Therefore, they span the code space, that is, they form its basis.
Elementary matrix operations
Re-sorting the rows of G leads to a different succession of the symbols in a codeword. Codes that emanate from each other by re-sorting their symbols are called equivalent codes. Although the mapping d → b is different for equivalent codes, their distance properties (see also Section 3.5.1) are still the same. However, the capability of detecting or correcting bursty errors may be destroyed. With reference to the columns of G, the following operations are allowed without changing the code.
1. Re-sorting of columns
2. Multiplication of a column with a scalar according to the rules of finite algebra
3. Linear combination of columns.
By applying the operations listed above, each generator matrix can be put into the Gaussian normal form
$$\mathbf{G} = \begin{bmatrix} \mathbf{I}_k \\ \mathbf{P} \end{bmatrix}. \qquad (3.7)$$
⁴ In many text books, row vectors are used to describe information and codewords. Since we generally define vectors as column vectors, the notation is adapted appropriately.
In (3.7), $\mathbf{I}_k$ represents the k × k identity matrix and P a parity matrix with n − k rows and k columns. Generator matrices of this form describe systematic encoders because the multiplication of d with the upper part of G results in d again. The rest of the codeword represents redundancy and is generated by linearly combining subsets of bits in d.
Parity Check Matrix
Equivalent to the generator matrix, the n × (n − k) parity check matrix H can be used to define a code. Assuming a structure of G as given in (3.7), it has the form
$$\mathbf{H} = \begin{bmatrix} -\mathbf{P}^T \\ \mathbf{I}_{n-k} \end{bmatrix}. \qquad (3.8)$$
The minus sign in (3.8) can be neglected for binary codes. Obviously, the relation
$$\mathbf{H}^T \otimes \mathbf{G} = \begin{bmatrix} -\mathbf{P} & \mathbf{I}_{n-k} \end{bmatrix} \otimes \begin{bmatrix} \mathbf{I}_k \\ \mathbf{P} \end{bmatrix} = -\mathbf{P} \oplus \mathbf{P} = \mathbf{0}_{(n-k) \times k} \qquad (3.9)$$
always holds regardless of whether G and H have the Gaussian normal form or not. Since the columns of G form the basis of the code space,
$$\mathbf{H}^T \otimes \mathbf{b} = \mathbf{0}_{(n-k) \times 1} \qquad (3.10)$$
is valid for all b ∈ Γ, that is, the columns in H are orthogonal to all codewords in Γ. Hence, the code Γ represents the null space concerning H and can be expressed by
$$\Gamma = \left\{ \mathbf{b} \in \mathrm{GF}(2)^n \mid \mathbf{H}^T \otimes \mathbf{b} = \mathbf{0}_{(n-k) \times 1} \right\}. \qquad (3.11)$$
Syndrome decoding
The parity check matrix can be used to detect and correct transmission errors. We assume that the symbols of the received word r = b ⊕ e have already been hard decided, and e denotes the error pattern with nonzero elements at erroneous positions. The syndrome is defined by
$$\mathbf{s} = \mathbf{H}^T \otimes \mathbf{r} = \mathbf{H}^T \otimes (\mathbf{b} \oplus \mathbf{e}) = \mathbf{H}^T \otimes \mathbf{b} \oplus \mathbf{H}^T \otimes \mathbf{e} = \mathbf{H}^T \otimes \mathbf{e} \qquad (3.12)$$
and represents a vector consisting of n − k elements. We see from (3.12) that it is independent of the transmitted codeword b and depends only on the error pattern e. For $\mathbf{s} = \mathbf{0}_{(n-k) \times 1}$, the transmission was error free or the error pattern was a valid codeword (e ∈ Γ). In the latter case, the error is not detectable and the decoder fails. If a binary (n, k, $d_{min}$) code must be able to correct t errors, each possible error pattern has to be uniquely assigned to a syndrome. Hence, as many syndromes as error patterns are needed and the following Hamming bound or sphere packing bound is obtained:
$$2^{n-k} \ge \sum_{r=0}^{t} \binom{n}{r}. \qquad (3.13)$$
Equality holds for perfect codes that provide exactly as many syndromes (left-hand side of (3.13)) as necessary for uniquely labeling all error patterns with $w_H(\mathbf{e}) \le t$. This corresponds
to the densest possible packing of codewords in the n-dimensional space. Only very few perfect codes are known today. One example is the family of Hamming codes that will be described subsequently. Since the code consists of $2^k$ out of $2^n$ possible elements of the n-dimensional vector space, there exist many more error patterns ($2^n - 2^k$) than syndromes. Therefore, decoding principles such as standard array decoding or syndrome decoding (Bossert 1999; Lin and Costello 2004) group error vectors e leading to the same syndrome $\mathbf{s}_\mu$ into a coset
$$\mathcal{M}_\mu = \left\{ \mathbf{e} \in \mathrm{GF}(2)^n \mid \mathbf{H}^T \otimes \mathbf{e} = \mathbf{s}_\mu \right\}. \qquad (3.14)$$
For each coset $\mathcal{M}_\mu$ with $\mu = 0, \ldots, 2^{n-k} - 1$, a coset leader $\mathbf{e}_\mu$ is determined, which generally has the minimum Hamming weight among all elements of $\mathcal{M}_\mu$. Syndromes and coset leaders are stored in a lookup table. After the syndrome s has been calculated, the table is scanned for the corresponding coset leader. Finally, the error correction is performed by subtracting the coset leader from the received word
$$\hat{\mathbf{b}} = \mathbf{r} \oplus \mathbf{e}_\mu. \qquad (3.15)$$
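The table-based procedure of (3.12) to (3.15) can be sketched in Python as follows. The parity part P below describes a hypothetical (5,2) code chosen for illustration. Matrices are stored as lists of rows, and all helper names are our own.

```python
from itertools import product

def mat_vec_gf2(rows, v):
    """Multiply a matrix (given as list of rows) with a vector over GF(2)."""
    return tuple(sum(a & b for a, b in zip(row, v)) % 2 for row in rows)

def transpose(m):
    return [list(col) for col in zip(*m)]

def identity(size):
    return [[int(i == j) for j in range(size)] for i in range(size)]

# Hypothetical parity part with n - k = 3 rows and k = 2 columns
P = [[1, 0], [0, 1], [1, 1]]
k, n = 2, 5
G = identity(k) + P                  # n x k generator matrix, form (3.7)
H = transpose(P) + identity(n - k)   # n x (n-k) parity check matrix, form (3.8)
HT = transpose(H)

# Lookup table syndrome -> coset leader: scanning the error patterns by
# increasing weight keeps the lightest pattern of every coset (3.14).
table = {}
for e in sorted(product((0, 1), repeat=n), key=sum):
    table.setdefault(mat_vec_gf2(HT, e), e)

def syndrome_decode(r):
    """Correct r by adding the coset leader of its syndrome, see (3.15)."""
    e = table[mat_vec_gf2(HT, r)]
    return tuple(ri ^ ei for ri, ei in zip(r, e))

b = mat_vec_gf2(G, (1, 1))           # encode the information word d = [1 1]
r = list(b)
r[3] ^= 1                            # single transmission error
b_hat = syndrome_decode(tuple(r))
```

The table holds $2^{n-k}$ entries regardless of the number of codewords, which is the reason this approach is attractive for high-rate codes.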
This decoding scheme represents the optimum maximum likelihood hard decision decoding. Unlike the direct approach of (3.2), which compares all possible codewords with the received vector, the exponential dependency between decoding complexity and the cardinality of the code is broken by exploiting the algebraic code structure. More sophisticated decoding principles such as soft-in soft-out decoding are presented in Section 3.4.
Dual code
On the basis of the above properties, using H instead of G for encoding leads to a code $\Gamma^\perp$ whose elements are orthogonal to Γ. It is called dual code and is defined by
$$\Gamma^\perp = \left\{ \tilde{\mathbf{b}} \in \mathrm{GF}(2)^n \mid \tilde{\mathbf{b}}^T \otimes \mathbf{b} = 0 \;\; \forall \, \mathbf{b} \in \Gamma \right\}. \qquad (3.16)$$
The codewords of $\Gamma^\perp$ are obtained by $\tilde{\mathbf{b}} = \mathbf{H} \otimes \tilde{\mathbf{d}}$ with $\tilde{\mathbf{d}} \in \mathrm{GF}(2)^{n-k}$. Owing to the dimension of H, the dual code has the same length as Γ but consists of only $2^{n-k}$ elements. This fact can be exploited for low complexity decoding. If n − k ≪ k holds, it may be advantageous to perform the decoding via the dual code and not with the original one (Offer 1996).
3.2.2 Simple Parity Check and Repetition Codes
The simplest form of encoding is to repeat each information bit n − 1 times. Hence, an (n, 1, n) repetition code (RP) with code rate Rc = 1/n is obtained, which consists of only 2 codewords, the all-zero and the all-one word, each of length n:
$$\Gamma = \left\{ [0 \cdots 0]^T, \; [1 \cdots 1]^T \right\}.$$
Since the two codewords differ in all n bits, the minimum distance amounts to $d_{min} = n$. The generator and parity check matrices have the form
$$\mathbf{G} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \qquad \mathbf{H} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}. \qquad (3.17)$$
As the information bit d is simply repeated, the multiplication of b with HT results in the modulo-2-addition of d with each of its replicas, which yields the all-zero vector. The corresponding dual code is the (n, n − 1, 2) single parity check (SPC) code. The generator matrix equals H in (3.17) except that the order of the identity and the parity part has to be reversed. We recognize that the encoding is systematic. The row consisting only of ones delivers the sum over all n − 1 information bits. Hence, the encoder appends a single parity bit so that all codewords have an even Hamming weight. Obviously, the minimum distance is dmin = 2 and the code rate Rc = (n − 1)/n.
3.2.3 Hamming and Simplex Codes
Hamming codes are probably the most famous codes that can correct single errors (t = 1) and detect double errors (t′ = 2). They always have a minimum distance of $d_{min} = 3$ whereby the code rate tends to unity for n → ∞.
Definition 3.2.1 A binary (n, k, 3) Hamming code of order r has the block length $n = 2^r - 1$ and encodes $k = n - r = 2^r - r - 1$ information bits. The rows of H represent all decimal numbers between 1 and $2^r - 1$ in binary form.
Hamming codes are perfect codes, that is, the number of syndromes equals exactly the number of correctable error patterns. For r = 2, 3, 4, 5, 6, 7, . . ., the binary (n, k) Hamming codes (3,1), (7,4), (15,11), (31,26), (63,57), and (127,120) exist. As an example, generator and parity check matrices of the (7,4) Hamming code are given in systematic form.
$$\mathbf{G} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}; \qquad \mathbf{H} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3.18)$$
The dual code obtained by using H as the generator matrix is called the simplex code. It consists of $2^{n-k} = 2^r$ codewords and has the property that all columns of H and, therefore, all codewords have the constant weight $w_H(\mathbf{b}) = 2^{r-1}$ (except the all-zero word). The name simplex stems from the geometrical property that all codewords have the same mutual Hamming distance $d_H(\mathbf{b}, \mathbf{b}') = 2^{r-1}$.
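The statements of this subsection can be checked numerically. The Python sketch below evaluates the Hamming bound (3.13) for single-error-correcting Hamming codes and enumerates the simplex code generated by the parity check matrix H of (3.18). The function names are our own and serve only as illustration.

```python
from itertools import product
from math import comb

def hamming_parameters(r):
    """Block length n and number of information bits k for order r."""
    n = 2 ** r - 1
    return n, n - r

def is_perfect(n, k, t):
    """True if the Hamming bound (3.13) is met with equality."""
    return 2 ** (n - k) == sum(comb(n, i) for i in range(t + 1))

# Parity check matrix H of the (7,4) Hamming code from (3.18), stored row by row
H = [[0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1],
     [1, 0, 0], [0, 1, 0], [0, 0, 1]]

def simplex_codewords(H):
    """Use H as generator matrix: b = H (x) d for all d in GF(2)^r."""
    r = len(H[0])
    return [tuple(sum(h & d for h, d in zip(row, dvec)) % 2 for row in H)
            for dvec in product((0, 1), repeat=r)]

weights = sorted(sum(c) for c in simplex_codewords(H))
```

For r = 3, every nonzero simplex codeword has the weight $2^{r-1} = 4$, as stated in the text.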
3.2.4 Hadamard Codes
Hadamard codes can be constructed from simplex codes by extending all codewords with a preceding zero (Bossert 1999). This results in a generator matrix whose structure is identical to that of the corresponding simplex code except an additional first row containing only zeros. Hence, the rows of G consist of the binary representations of all decimal numbers between 0 and $2^k - 1$. Hadamard codes have the parameters $n = 2^r$ and k = r so that $M = 2^r$ codewords of length n = M exist. The code rate amounts to $R_c = r/2^r = \log_2(M)/M$. For k = 3 and M = 8, we obtain the generator matrix
$$\mathbf{G} = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}^T. \qquad (3.19)$$
Since the rows of G contain all possible vectors with weight 1, G represents a systematic encoder although it does not have the Gaussian normal form. Therefore, the information bits are distributed within the codeword at positions $\mu = 2^{-(l+1)} M$ with $0 \le l < k$. Moreover, the property of simplex codes that all pairs of codewords have identical Hamming distances is retained. This distance amounts to $d = 2^{r-1}$.
The so-called Hadamard matrix $\mathbf{B}_H$ comprises all codewords. It can be recursively constructed with
$$\mathbf{B}_{H,m} = \begin{bmatrix} \mathbf{B}_{H,m-1} & \mathbf{B}_{H,m-1} \\ \mathbf{B}_{H,m-1} & \bar{\mathbf{B}}_{H,m-1} \end{bmatrix} \qquad (3.20)$$
where $\mathbf{B}_H$ and $\bar{\mathbf{B}}_H$ are complementary matrices, that is, zeros and ones are exchanged. Using $\mathbf{B}_{H,0} = 1$ for initialization, we obtain Hadamard codes whose block lengths $n = 2^r$ are a power of two. With a different initialization, codes whose block lengths are multiples of 12 or 20 can also be constructed.
The application of BPSK maps the logical bits onto antipodal symbols $x_\nu = \pm\sqrt{E_s/T_s}$. This leads to orthogonal Walsh sequences that are used in CDMA systems for spectral spreading (see Chapter 4). They can also be employed as orthogonal modulation schemes allowing simple noncoherent detection techniques (Benthin 1996; Proakis 2001; Salmasi and Gilhousen 1991).
An important advantage of Hadamard codes is the fact that they can be very efficiently soft-input ML decoded. The direct approach in (3.2) correlates the received word with all possible codewords and subsequently determines the maximum. The correlation can be efficiently implemented by the Fast Hadamard transformation. This linear transformation is similar to the well-known Fourier transformation and exploits symmetries of a butterfly structure. Moreover, the received symbols are only multiplied with ±1, allowing very efficient implementations.
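The correlation-based decoding described above can be sketched in a few lines of Python. The encoder follows the structure of G in (3.19), so that code bit i is the modulo-2 sum of those information bits selected by the binary representation of i. The BPSK mapping (0 to +1, 1 to −1) and all function names are assumptions of this sketch.

```python
def fht(values):
    """Fast Hadamard transform of a sequence whose length is a power of two."""
    v = list(values)
    half = 1
    while half < len(v):
        for start in range(0, len(v), 2 * half):
            for i in range(start, start + half):
                a, b = v[i], v[i + half]
                v[i], v[i + half] = a + b, a - b   # butterfly: only +/- operations
        half *= 2
    return v

def hadamard_encode(d_bits):
    """Code bit i = parity of (i AND d), matching the rows of G in (3.19)."""
    d = sum(bit << l for l, bit in enumerate(d_bits))
    return [bin(i & d).count("1") % 2 for i in range(2 ** len(d_bits))]

def hadamard_decode(r, k):
    """Soft-input ML decoding: correlate with all codewords, pick the maximum."""
    corr = fht(r)
    d = max(range(len(corr)), key=corr.__getitem__)
    return [(d >> l) & 1 for l in range(k)]

d_bits = [1, 0, 1]
b = hadamard_encode(d_bits)
x = [1 - 2 * bit for bit in b]      # assumed BPSK mapping 0 -> +1, 1 -> -1
r = list(x)
r[2] = -0.4 * r[2]                  # one unreliable symbol with wrong sign
d_hat = hadamard_decode(r, 3)
```

The transform needs on the order of M · log2(M) additions instead of M² for the direct correlation with all M codewords.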
3.2.5 Trellis Representation of Linear Block Codes
Similar to convolutional codes that will be introduced in the next section, linear block codes can be graphically described by trellis diagrams (Offer 1996; Wolf 1978). This representation is based on the parity check matrix $\mathbf{H} = [\mathbf{h}_1^T \cdots \mathbf{h}_n^T]^T$. The number of states depends on the length of the row vectors $\mathbf{h}_\nu$ and equals $2^{n-k}$. A state is described by a vector $\mathbf{s} = [s_1, \ldots, s_{n-k}]$ with the binary elements $s_\nu \in \mathrm{GF}(2)$. At the beginning (ν = 0), we start with $\mathbf{s} = \mathbf{0}_{1 \times (n-k)}$. If s′ denotes the preceding state at time instant ν − 1 and s the successive state at time instant ν, we obtain the following description for a state transition
$$\mathbf{s} = \mathbf{s}' \oplus b_\nu \cdot \mathbf{h}_\nu, \qquad 1 \le \nu \le n. \qquad (3.21)$$
Figure 3.2 Trellis representation for (7,4,3)-Hamming code from Section 3.2.3 (states 000 to 111 on the vertical axis, time instants ν = 0 to 7 on the horizontal axis, branch types $b_\nu = 1$ and $b_\nu = 0$)
Hence, the state remains unchanged for bν = 0 and changes for bν = 1. From (3.10), we can directly see that the linear combination of the rows hν taking the coefficients from a codeword b ∈ results in the all-zero vector 01×(n−k) . Therefore, the corresponding trellis is terminated, that is, it starts and ends in the all-zero state. Figure 3.2 shows the trellis for a (7,4,3) Hamming code with a parity check matrix, discussed in the previous section, in systematic form. Obviously, two branches leave each state during the first four transitions, representing the information part of the codewords. The parity bits are totally determined by the information word and, therefore, only one branch leaves each state during the last three transitions, leading finally back to the allzero state. The trellis representation of block codes can be used for soft-input soft-output decoding, for example, with the algorithm by Bahl, Cocke, Jelinek, and Raviv (BCJR) presented in Section 3.4.
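The construction rule (3.21) can be turned into a short Python sketch that builds the terminated trellis from the rows of H. A forward pass collects all reachable states, and a backward pass keeps only those branches that can still reach the all-zero state. The representation of branches as tuples is our own choice.

```python
def build_trellis(H):
    """Branches (s_prev, bit, s_next) per time step of the terminated trellis."""
    n, m = len(H), len(H[0])
    zero = (0,) * m

    def step(s, h):
        return tuple(a ^ b for a, b in zip(s, h))

    # Forward pass: states reachable from the all-zero state via (3.21)
    layers = [{zero}]
    for h in H:
        layers.append({s for p in layers[-1] for s in (p, step(p, h))})

    # Backward pass: keep only branches leading back to the all-zero state
    alive = {zero}
    trellis = []
    for nu in range(n, 0, -1):
        h = H[nu - 1]
        branches = [(p, bit, s)
                    for p in layers[nu - 1]
                    for bit, s in ((0, p), (1, step(p, h)))
                    if s in alive]
        alive = {p for p, _, _ in branches}
        trellis.append(branches)
    return trellis[::-1]

# Rows h_1 ... h_7 of the (7,4) Hamming parity check matrix in systematic form
H = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1),
     (1, 0, 0), (0, 1, 0), (0, 0, 1)]
trellis = build_trellis(H)
branch_counts = [len(step_branches) for step_branches in trellis]
```

For this matrix, two branches leave every state during the first four transitions and a single branch during the last three.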
3.3 Convolutional Codes
Convolutional codes are employed in many modern communication systems and belong to the class of linear codes. Contrary to the large number of block codes, only a few convolutional codes are relevant in practice. Moreover, they have very simple structures and
can be graphically described by the finite state and trellis diagrams. Their breakthrough came with the invention of the Viterbi algorithm (Viterbi 1967). Besides its ability of processing soft inputs instead of hard decision inputs, its major advantage is the decoding complexity reduction. While the complexity of the brute force maximum likelihood approach described in Subsection 1.3.1 on page 18 grows exponentially with the sequence length, only a linear dependency exists for the Viterbi algorithm. There exists a duality between block and convolutional codes. On the one hand, convolutional codes have memory such that successive codewords are not independent from each other and sequences instead of single codewords have to be processed at the decoder. Therefore, block codes can be interpreted as special convolutional codes without memory. On the other hand, we always consider finite sequences in practice. Hence, we can imagine a whole sequence as a single codeword so that convolutional codes are a special implementation of block codes. Generally, it depends on the kind of application which interpretation is better suited. The minimum Hamming distance of convolutional codes is termed free distance and is denoted by df .
3.3.1 Structure of Encoder
Convolutional codes exist for a variety of code rates Rc = k/n. However, codes with k = 1 are employed in most systems because this reduces the decoding effort and higher rates can be easily obtained by appropriate puncturing (cf. Section 3.3.3). As a consequence, we restrict the description to rate 1/n codes. Therefore, the input vector of the encoder reduces to a scalar d[i] and successive codewords b[i] consisting of n bits are correlated. Owing to Rc = 1/n, the bit rate is multiplied with n as indicated by the time index ℓ in Figure 3.1. Here, we combine n code bits belonging to an information bit d[i] to a codeword $\mathbf{b}[i] = [b_1[i], \ldots, b_n[i]]^T$ that obviously has the same rate and time index as d[i].
The encoder can be implemented by a linear shift register as depicted in Figure 3.3. Besides the code rate, the constraint length $L_c$ is another important parameter describing the number of clock pulses an information bit affects the output. The larger $L_c$ and, thus, the register memory, the better the performance of a code. However, we will see that this coincides with an exponential increase in decoding complexity.
Figure 3.3 Structure of convolutional encoders with Rc = 1/2 and $L_c = 3$. a) Nonrecursive encoder with $g_1(D) = 1 + D + D^2$ and $g_2(D) = 1 + D^2$. b) Recursive encoder with $g_1(D) = 1$ and $g_2(D) = (1 + D^2)/(1 + D + D^2)$
We now refer to the simple example in Figure 3.3 to explain the principal encoding process. At each clock pulse, one information bit d[i] is fed into the register whose elements are linearly combined by modulo-2-adders. They deliver n = 2 outputs $b_\nu[i]$, ν = 1, 2, at each clock pulse building the codeword b[i]. Hence, the encoder has a code rate Rc = 1/2 and a memory of 2 so that $L_c = 2 + 1 = 3$ holds. The optimal encoder structure, that is, the connections between register elements and adders, cannot be obtained with algebraic tools but has to be determined by a computer-aided code search. Possible performance criteria are the distance spectrum or the input–output weight enumerating function (IOWEF) that is described in Section 3.5. Tables of optimum codes for various code rates and constraint lengths can be found in Johannesson and Zigangirov (1998), Proakis (2001), Wicker (1995).
Nonrecursive Nonsystematic Encoders
Principally, we distinguish between recursive and nonrecursive structures resembling infinite impulse response (IIR) and finite impulse response (FIR) filters, respectively. Obviously, the nonrecursive encoder in Figure 3.3a is nonsystematic since none of the coded output bits permanently equals d[i]. For a long time, only nonsystematic nonrecursive convolutional (NSC) encoders have been employed because no good systematic encoders without feedback exist. This is different from linear block codes that show the same error rate performance for systematic and nonsystematic encoders.
The linear combinations of the register contents are described by n generators that are assigned to the n encoder outputs. Each generator $\mathbf{g}_\nu$ comprises $L_c$ scalars $g_{\nu,\mu} \in \mathrm{GF}(2)$ with $\mu = 0, \ldots, L_c - 1$. A nonzero scalar $g_{\nu,\mu} = 1$ indicates a connection between register element µ and the νth modulo-2-adder, while the connection is missing for $g_{\nu,\mu} = 0$. Using the polynomial representation
$$g_\nu(D) = \sum_{\mu=0}^{L_c-1} g_{\nu,\mu} \cdot D^\mu, \qquad (3.22)$$
the example in Figure 3.3a becomes
$$g_1(D) = g_{1,0} + g_{1,1} D + g_{1,2} D^2 = 1 + D + D^2$$
$$g_2(D) = g_{2,0} + g_{2,1} D + g_{2,2} D^2 = 1 + D^2.$$
Vector notations as well as octal or decimal representations can be used alternatively. For a generator polynomial $g(D) = 1 + D + D^3$, we obtain $\mathbf{g} = [g_0 \; g_1 \; g_2 \; g_3] = [1 \; 1 \; 0 \; 1] \,\hat{=}\, 11_{10} \,\hat{=}\, 15_8$. With respect to the decimal notation, the coefficient $g_0$ always denotes the least significant bit (LSB) and $g_{L_c-1}$ denotes the most significant bit (MSB), leading to 1 + 2 + 8 = 11. For octal notation, 3-bit tuples are formed starting from the right-hand side, resulting in $[1 \; 0 \; 1] = 5_8$ for the rightmost tuple. If less than three bits remain, zeros are added to the left, here $[0 \; 0 \; 1] = 1_8$.
The νth output stream of a convolutional encoder has the form
$$b_\nu[i] = \sum_{\mu=0}^{L_c-1} d[i-\mu] \cdot g_{\nu,\mu} \mod 2 \quad \Rightarrow \quad b_\nu(D) = d(D) \otimes g_\nu(D). \qquad (3.23)$$
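The encoding rule (3.23) and the two numeric labels of a generator can be reproduced with the following Python sketch. It assumes a register that is initialized with zeros, and the function names are ours.

```python
def conv_encode(d, generators):
    """Nonrecursive rate-1/n encoder: b_nu[i] = sum_mu d[i-mu] g_nu,mu mod 2."""
    codewords = []
    for i in range(len(d)):
        codewords.append(tuple(
            sum(g[mu] & d[i - mu] for mu in range(len(g)) if i - mu >= 0) % 2
            for g in generators))
    return codewords

def decimal_label(g):
    """Decimal notation: g_0 is the least significant bit."""
    return sum(bit << mu for mu, bit in enumerate(g))

def octal_label(g):
    """Octal notation: 3-bit tuples of [g_0 ... g_{Lc-1}] from the right."""
    return format(int("".join(str(bit) for bit in g), 2), "o")

g1 = [1, 1, 1]   # g1(D) = 1 + D + D^2
g2 = [1, 0, 1]   # g2(D) = 1 + D^2
impulse_response = conv_encode([1, 0, 0, 0], [g1, g2])
```

Feeding a single one into the encoder returns the generator coefficients themselves, which reflects the FIR structure of the nonrecursive encoder.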
We recognize that the coded sequence $b_\nu[i]$ is generated by convolving the input sequence d[i] with the νth generator, which is equivalent to the multiplication of the corresponding polynomials d(D) and $g_\nu(D)$. This explains the naming of convolutional codes.
Recursive Systematic Encoders
With the first presentation of 'Turbo Codes' in 1993 (Berrou et al. 1993), recursive systematic convolutional (RSC) encoders have found great attention. Although they were known much earlier, their importance for concatenated codes has become obvious only since then. Recursive encoders have an IIR structure and are mainly used as systematic encoders, although this is not mandatory. The structure of RSC encoders can be derived from their nonrecursive counterparts by choosing one of the polynomials as denominator. For codes with n = 2, we can choose $g_1(D)$ as well as $g_2(D)$ for the feedback. In Figure 3.3b, we used $g_1(D)$ and obtained the modified generator polynomials
$$\tilde{g}_1(D) = 1 \qquad (3.24a)$$
$$\tilde{g}_2(D) = \frac{g_2(D)}{g_1(D)} = \frac{1 + D^2}{1 + D + D^2} \qquad (3.24b)$$
and the output bits
$$\tilde{b}_1(D) = \tilde{g}_1(D) \otimes d(D) = d(D) \qquad (3.25a)$$
$$\tilde{b}_2(D) = \tilde{g}_2(D) \otimes d(D) = g_2(D) \otimes a(D). \qquad (3.25b)$$
The polynomial $a(D) = d(D)/g_1(D)$ in (3.25b) represents the input of the shift register depicted in Figure 3.3b. Since D is a delay operator, we obtain the following temporal relationship
$$a(D) \otimes \left( 1 + D + D^2 \right) = d(D) \quad \Leftrightarrow \quad a[i] = d[i] \oplus a[i-1] \oplus a[i-2].$$
From this, the recursive encoder structure becomes obvious. The assumption $g_{1,0} = 1$ does not restrict the generality and leads to
$$a[i] = d[i] \oplus \sum_{\mu=1}^{L_c-1} g_{1,\mu} \cdot a[i-\mu] \mod 2. \qquad (3.26)$$
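A minimal Python sketch of the recursion (3.26) together with the outputs (3.25a) and (3.25b) is given below. It assumes a register initialized with zeros and $g_{1,0} = 1$. The function name and the list representation of the polynomials are our own.

```python
def rsc_encode(d, g1, g2):
    """Recursive systematic encoder with feedback g1 and forward polynomial g2."""
    memory = [0] * (len(g1) - 1)          # holds a[i-1], a[i-2], ...
    codewords = []
    for bit in d:
        a = bit
        for mu in range(1, len(g1)):      # feedback according to (3.26)
            a ^= g1[mu] & memory[mu - 1]
        taps = [a] + memory               # a[i], a[i-1], ..., a[i-Lc+1]
        parity = 0
        for mu, coeff in enumerate(g2):   # b2 = g2 (x) a, see (3.25b)
            parity ^= coeff & taps[mu]
        codewords.append((bit, parity))   # systematic bit b1 = d, see (3.25a)
        memory = [a] + memory[:-1]
    return codewords

g1 = [1, 1, 1]   # feedback polynomial 1 + D + D^2
g2 = [1, 0, 1]   # forward polynomial 1 + D^2

parity_weight_1 = [p for _, p in rsc_encode([1, 0, 0, 0, 0, 0], g1, g2)]
parity_weight_2 = [p for _, p in rsc_encode([1, 0, 0, 1, 0, 0], g1, g2)]
```

For a single one at the input, the parity stream of this example does not die out within the observed six clock pulses, whereas the second one at i = 3 drives the register back to the all-zero state.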
For notational simplicity, we neglect the tilde in the following part and denote recursive polynomials also by $g_\nu(D)$. It has to be mentioned that nonrecursive nonsystematic codes and their recursive systematic counterparts have the same distance spectra A(D). However, the mapping between input and output sequences and, thus, the IOWEF A(W, D) (see Subsection 3.5.1) are different. Recursive codes have an infinite impulse response due to their feedback structure, that is, they require a minimum input weight of w = 2 to obtain a finite output weight. This is one important property that predestines them for the application in concatenated coding schemes (cf. Section 3.6).
Termination of Convolutional Codes
In practical systems, we always have sequences of finite lengths, for example, they consist of N codewords b[i]. Owing to the memory of the encoder, the decoder cannot decide on the basis of single codewords but has to consider the entire sequence or at least larger
parts of it. Hence, a decoding delay occurs because a certain part of the received sequence has to be processed until a reliable decision of the first bits can be made (see Viterbi decoding). Another consequence of a sequencewise detection is the unreliable estimation of the last bits of a sequence if the decoder does not know the final state of the encoder (truncated codes). In order to overcome this difficulty, $L_c - 1$ tail bits are appended to the information sequences forcing the encoder to end in a predefined state, conventionally the all-zero state. With this knowledge, the decoder is enabled to estimate the last bits very reliably. Since tail bits do not bear any information but represent redundancy, they reduce the code rate Rc. For a sequence consisting of N codewords, we obtain
$$R_c^{\mathrm{tail}} = \frac{N}{n \cdot (N + L_c - 1)} = R_c \cdot \frac{N}{N + L_c - 1}. \qquad (3.27)$$
For $N \gg L_c$, the reduction of Rc can be neglected. A different approach to allow reliable detection of all bits without reducing the code rate are tailbiting codes. They initialize the encoder with its final state. The decoder only knows that the initial and final states are identical but it does not know the state itself. A detailed description can be found in Calderbank et al. (1999).
3.3.2 Graphical Description of Convolutional Codes
Since the encoder can be implemented by a shift register, it represents a finite state machine. This means that its output only depends on the input and the current state but not on preceding states. The number of possible states is determined by the length of the register (memory) and amounts to $2^{L_c-1}$ in the binary case. Figure 3.4 shows the state diagrams of the nonrecursive and the recursive examples of Figure 3.3. Owing to $L_c = 3$, both encoders have four states. The transitions between them are labeled with the associated information bit d[i] and the generated code bits $b_1[i], \ldots, b_n[i]$. Hence, the state diagram totally describes the encoder.
Figure 3.4 Finite state diagrams of convolutional codes with a) $g_1(D) = 1 + D + D^2$ and $g_2(D) = 1 + D^2$ b) $g_1(D) = 1$ and $g_2(D) = (1 + D^2)/(1 + D + D^2)$ (states 00, 10, 01, 11; transitions labeled d[i]/b1[i]b2[i])
Figure 3.5 Trellis diagram for nonrecursive convolutional code with $g_1(D) = 1 + D + D^2$ and $g_2(D) = 1 + D^2$ (states 00, 10, 01, 11 on the vertical axis; time instants i = 0, 1, 2, 3, 4, . . . , N, N + 1, N + 2 on the horizontal axis; branches labeled d[i]/b1[i]b2[i])
Although the state diagram fully describes a convolutional encoder, it does not contain a temporal component that is necessary for decoding. This missing part is delivered by the trellis diagram shown in Figure 3.5. It stems from the state diagram by arranging the states vertically as nodes and repeating them horizontally to illustrate the time axis. The state transitions are represented by branches labeled with the corresponding input and output bits. Generally, the encoder is initialized with zeros so that we start in the all-zero state. After $L_c$ steps, the trellis is fully developed, that is, two branches leave each state and two branches reach every state. If the trellis is terminated as shown in Figure 3.5, the last state is the all-zero state again.
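The branch labels of the state diagram and the trellis can be generated directly from the generators. The following Python sketch does this for nonrecursive encoders. It assumes that a state is written as the register content (d[i−1], d[i−2]), and the dictionary layout is our own choice.

```python
def transitions(generators):
    """Map (state, input bit) -> (next state, code bits) of an NSC encoder."""
    m = len(generators[0]) - 1                    # memory Lc - 1
    table = {}
    for s in range(2 ** m):
        state = tuple((s >> (m - 1 - j)) & 1 for j in range(m))
        for d in (0, 1):
            register = (d,) + state               # d[i], d[i-1], ..., d[i-Lc+1]
            code_bits = tuple(
                sum(g[mu] & register[mu] for mu in range(m + 1)) % 2
                for g in generators)
            table[state, d] = (register[:-1], code_bits)
    return table

# Generators g1(D) = 1 + D + D^2 and g2(D) = 1 + D^2 of the running example
table = transitions([[1, 1, 1], [1, 0, 1]])
```

For instance, `table[(0, 0), 1]` yields the next state (1, 0) and the code bits (1, 1), which corresponds to the branch label 1/11 leaving the all-zero state.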
3.3.3 Puncturing Convolutional Codes In modern communication systems, adaptivity is an important feature. In the context of link adaptation, the code rate as well as the modulation scheme are adjusted with respect to the channel quality. During good transmission conditions, weak codes with large Rc are sufficient so that high data rates can be transmitted with little redundancy. In bad channel states, strong FEC codes are required and Rc is decreased. Moreover, the code rate is adjusted with respect to the importance of different information parts for unequal error protection (UEP) (Hagenauer 1989). Finally, the concept of incremental redundancy in automatic repeat request (ARQ) schemes implicitly decreases the code rate when transmission errors have been detected (Hagenauer 1988). A popular method for adapting the code rate is by puncturing. Although puncturing can be applied to any code, we restrict to the description for convolutional codes. The basic principle is that after encoding, only a subset of the code bits is transmitted, while the others are suppressed. This decreases the number of transmitted bits and, therefore, increases the code rate. Besides its flexibility, a major advantage of puncturing is that it does not affect the decoder so that a number of code rates can be achieved with only a single hardware implementation of the decoder. Principally, the optimum subset of bits to be transmitted has to be adapted to the specific mother code and can only be found by a computer-aided code search. In practice, puncturing is performed periodically where one period comprises Lp codewords. A pattern in the form
of a matrix $\mathbf{P}$ determines the transmitted and suppressed bits during one period. This matrix consists of $n$ rows and $L_p$ columns with binary elements $p_{\mu,\nu} \in \mathrm{GF}(2)$
\[
\mathbf{P} =
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,L_p} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,L_p} \\
\vdots & \vdots & & \vdots \\
p_{n,1} & p_{n,2} & \cdots & p_{n,L_p}
\end{pmatrix}
= \begin{pmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_{L_p} \end{pmatrix} .
\tag{3.28}
\]
The columns $\mathbf{p}_\nu$ are periodically assigned to successive codewords $\mathbf{b}[i] = [b_1[i], \ldots, b_n[i]]^T$ such that $\nu = (i \bmod L_p) + 1$ holds. Each column contains the puncturing pattern for a whole codeword. A zero at the $\mu$th position indicates that the $\mu$th bit $b_\mu[i]$ is suppressed, while a one indicates that it is transmitted. Generally, $\mathbf{P}$ contains $l + L_p$ ones with $1 \le l \le (n-1) \cdot L_p$, that is, only $l + L_p$ bits are transmitted instead of $n \cdot L_p$ without puncturing. Hence, the code rate amounts to
\[
R_c = \frac{L_p}{L_p + l}
\tag{3.29}
\]
and can vary in the interval
\[
\frac{L_p}{L_p \cdot n} = \frac{1}{n} \le R_c \le \frac{L_p}{L_p + 1} .
\tag{3.30}
\]
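The periodic assignment of the columns of the puncturing matrix to successive codewords can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `puncture` and `punctured_rate` and the example pattern are hypothetical and not taken from the text, and a rate-$1/n$ mother code is assumed as in (3.29).

```python
import numpy as np

def puncture(code_bits, P):
    """Keep only the code bits marked by a one in the puncturing matrix P.

    code_bits: array of shape (N, n), one codeword b[i] per row.
    P:         binary array of shape (n, Lp); column (i mod Lp) applies to
               codeword i (zero-based version of nu = (i mod Lp) + 1).
    """
    code_bits = np.asarray(code_bits)
    P = np.asarray(P)
    n, Lp = P.shape
    N = code_bits.shape[0]
    cols = np.arange(N) % Lp              # pattern column for each codeword
    mask = P[:, cols].T.astype(bool)      # shape (N, n), True = transmit
    return code_bits[mask]                # row-major order keeps the time order

def punctured_rate(P):
    """Code rate Lp / (Lp + l) of (3.29) for a rate-1/n mother code."""
    P = np.asarray(P)
    Lp = P.shape[1]
    return Lp / P.sum()                   # P contains Lp + l ones

# Hypothetical example pattern: n = 2, Lp = 2, one bit suppressed per period
P = np.array([[1, 1],
              [1, 0]])
```

With this example pattern, three of four code bits are transmitted per period, so the rate 1/2 mother code is punctured to rate 2/3.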
Certainly, the largest achievable code rate increases with growing puncturing period $L_p$. Moreover, puncturing reduces the performance of the mother code because the Hamming distances between codewords are decreased. However, it can be shown that punctured codes are as good as nonpunctured codes of the same rate.

Catastrophic Convolutional Codes

Puncturing has to be applied carefully because it can generate catastrophic codes. These are not suited for error protection because a finite number of transmission errors can cause a theoretically infinite number of decoding errors, so that coding degrades the performance. Systematic encoders are in principle not catastrophic. For NSC encoders, the following sufficient criteria allow catastrophic codes to be recognized:
• All generator polynomials have a common factor.
• The finite state diagram contains a closed loop with zero weight (except the loop in the all-zero state).
• All modulo-2 adders have an even number of connections. This leads to a loop in the all-one state with zero weight.
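The first criterion in the list above can be checked mechanically with the Euclidean algorithm over GF(2). The following sketch is an illustration only: the helper names are hypothetical, polynomials are represented as integers whose bit $i$ is the coefficient of $D^i$, and any common factor of degree at least one is flagged (the text does not discuss whether a pure delay $D^l$ should be excluded).

```python
from functools import reduce

def gf2_mod(a, b):
    """Remainder of a divided by b for GF(2) polynomials stored as bit masks."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)   # subtract (XOR) a shifted copy of b
    return a

def gf2_gcd(a, b):
    """Greatest common divisor of two GF(2) polynomials (Euclidean algorithm)."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def has_common_factor(generators):
    """True if all generator polynomials share a factor of degree >= 1."""
    return reduce(gf2_gcd, generators).bit_length() > 1

# g1 = 1 + D + D^2 -> 0b111, g2 = 1 + D^2 -> 0b101 (code of Figure 3.5)
good = [0b111, 0b101]
# Assumed counter-example: g1 = 1 + D, g2 = 1 + D^2 = (1 + D)^2
bad = [0b011, 0b101]
```

In the assumed counter-example, both modulo-2 adders have an even number of connections, so the third criterion flags the same encoder.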
3.3.4 ML Decoding with Viterbi Algorithm

A major advantage of convolutional codes is the possibility to perform an efficient soft-input maximum likelihood decoding (MLD), while this is often too complex for block codes.⁵ The focus in this section is on the classical Viterbi algorithm delivering hard decision estimates of the information bits. Section 3.4 addresses algorithms that provide reliability information for each decision and are therefore suited for decoding concatenated codes.

⁵ Syndrome decoding for linear block codes performs MLD with hard decision input.

In the following part, we assume that no a priori information of the information bits $d[i]$ is available and that all information sequences are equally likely. In this case, MLD is the optimum decoding approach. Since a convolutional encoder delivers a sequence of codewords $\mathbf{b}[i]$, we have to rewrite the ML decision rule in (3.2) slightly. If a sequence $\mathbf{x}$ consists of $N$ codewords $\mathbf{x}[i]$ each comprising $n$ code bits $x_\nu[i]$, we obtain
\[
\hat{\mathbf{x}} = \operatorname*{argmax}_{\tilde{\mathbf{x}}} \sum_{i=0}^{N-1} \sum_{\nu=1}^{n} \tilde{x}_\nu[i] \cdot |h_\nu[i]| \cdot r_\nu[i] .
\tag{3.31}
\]
According to (3.31), we have to sum the incremental metrics $\sum_{\nu=1}^{n} \tilde{x}_\nu[i] \cdot |h_\nu[i]| \cdot r_\nu[i]$ for all sequences and decide in favor of the one with the largest (cumulative) path metric. This is obviously impractical because the number of possible sequences grows exponentially with their length. Since convolutional encoders are finite state machines, their output at a certain time instant depends only on the input and the current state. Hence, they can be interpreted as a Markov process of first order, that is, the history of previous states is irrelevant once the last state is known. Exploiting this property leads to the famous Viterbi decoding algorithm, whose complexity depends only linearly on the sequence length $N$ (Kammeyer 2004; Kammeyer and Kühn 2001; Proakis 2001).

In order to explain the Viterbi algorithm, we now have a look at the trellis segment depicted in Figure 3.6. We assume that the encoder and decoder both start in the all-zero state. The preceding states are denoted by $s'$ and successive states by $s$. They represent the register content, for example, $s = [1\ 0]$. To simplify the notation, the set $\mathcal{S} = \mathrm{GF}(2)^{L_c-1}$ containing all possible states $s$ is defined. For our example with four states, we obtain $\mathcal{S} = \{[0\ 0], [0\ 1], [1\ 0], [1\ 1]\}$. Moreover, the set $\mathcal{S}_{\to s}$ comprises all states $s'$ for which a transition to state $s$ exists.

Figure 3.6 Segment of trellis diagram for the illustration of Viterbi algorithm (at time instant $i$, the branches from the preceding states $s' = u_1$ and $s' = u_2$ with cumulative metrics $M_{u_1}[i-1]$ and $M_{u_2}[i-1]$ merge in state $s = v$ with metric $M_v[i]$; they are labeled $z^{(u_1 \to v)}, \gamma^{(u_1 \to v)}[i]$ and $z^{(u_2 \to v)}, \gamma^{(u_2 \to v)}[i]$)

Viterbi Algorithm

① Start in the all-zero state of the trellis at time instant $i = 0$ and initialize the cumulative path metrics $M_s[i = 0] = 0$ of all states $s \in \mathcal{S}$.
② At time instant $i$, calculate incremental metrics (branch metrics)
\[
\gamma^{(s' \to s)}[i] = \sum_{\nu=1}^{n} r_\nu[i] \cdot |h_\nu[i]| \cdot z_\nu^{(s' \to s)}
\tag{3.32}
\]
for all $s' \in \mathcal{S}_{\to s}$ and $s \in \mathcal{S}$, where $z_\nu^{(s' \to s)} = \pm\sqrt{E_s/T_s}$ denotes the $\nu$th code symbol of transition $s' \to s$.

③ Add incremental metrics of ② to cumulative metrics of corresponding states at time instant $i - 1$: $M_{s'}[i-1] + \gamma^{(s' \to s)}[i]$ with $s' \in \mathcal{S}_{\to s}$ and $s \in \mathcal{S}$.

④ At each state $s$, choose the path with the largest cumulative metric and discard the competing path:
\[
M_s[i] = \max_{s' \in \mathcal{S}_{\to s}} \left\{ M_{s'}[i-1] + \gamma^{(s' \to s)}[i] \right\} .
\]
Owing to the Markov property, we have to consider just one preceding time instant. Once we have chosen the best path among those arriving at a state s, all other paths cannot outperform the best path in the future and are discarded. Therefore, the computational complexity grows only linearly with the sequence length N .
⑤ Repeat procedure from ② until all $N$ received codewords $\mathbf{r}[i]$ have been processed.

⑥ Determine the survivor at end of trellis diagram:
• Terminated trellis: Continue the procedure for $L_c - 1$ tail bits and determine the path with the best metric $M_0[N + L_c - 1]$ in the all-zero state.
• Truncated trellis: Determine the path with the best global cumulative metric $M_s[N]$ among all $s \in \mathcal{S}$.
⑦ Estimates of the information bits are delivered by tracing back the survivor determined in ⑥ to the all-zero state at i = 0.
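The seven steps can be sketched for the four-state example code with $g_1(D) = 1 + D + D^2$ and $g_2(D) = 1 + D^2$ and a terminated trellis. This is a minimal illustration under assumptions, not a reference implementation: the names `step`, `encode` and `viterbi` are hypothetical, code symbols are normalized to $\pm 1$ (a common scaling does not change the maximization), and states other than the all-zero state start with metric $-\infty$ to enforce the known start state.

```python
import numpy as np

G = [(1, 1, 1), (1, 0, 1)]      # taps of g1, g2 on (d[i], d[i-1], d[i-2])
MEM = 2                          # Lc - 1 memory elements
N_STATES = 1 << MEM

def step(state, d):
    """One encoder transition; state packs (d[i-1], d[i-2]) into two bits."""
    reg = (d, (state >> 1) & 1, state & 1)
    out = [sum(g * r for g, r in zip(gen, reg)) % 2 for gen in G]
    nxt = (d << 1) | (state >> 1)
    return nxt, np.array([1 - 2 * b for b in out], dtype=float)  # b=0 -> +1

def encode(bits):
    """Encode and terminate with MEM zero tail bits; returns shape (N+MEM, n)."""
    state, out = 0, []
    for d in list(bits) + [0] * MEM:
        state, z = step(state, d)
        out.append(z)
    return np.array(out)

def viterbi(r, n_info, h_abs=None):
    """Soft-input Viterbi decoding of a terminated block, metric as in (3.32)."""
    w = r if h_abs is None else r * h_abs           # r_nu[i] * |h_nu[i]|
    T = w.shape[0]
    metric = np.full(N_STATES, -np.inf)
    metric[0] = 0.0                                 # step 1: start in state 0
    prev = np.zeros((T, N_STATES), dtype=int)
    inp = np.zeros((T, N_STATES), dtype=int)
    for i in range(T):                              # steps 2 to 5
        new = np.full(N_STATES, -np.inf)
        for s in range(N_STATES):
            if metric[s] == -np.inf:
                continue                            # state not reached yet
            for d in (0, 1):
                nxt, z = step(s, d)
                cand = metric[s] + float(np.dot(w[i], z))
                if cand > new[nxt]:                 # step 4: keep the survivor
                    new[nxt], prev[i, nxt], inp[i, nxt] = cand, s, d
        metric = new
    s, bits = 0, []                                 # step 6: all-zero end state
    for i in range(T - 1, -1, -1):                  # step 7: trace back
        bits.append(int(inp[i, s]))
        s = prev[i, s]
    return bits[::-1][:n_info]
```

A call such as `viterbi(encode(bits), len(bits))` returns the transmitted bits in the noiseless case; a single sign error in the received symbols is corrected as well.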
According to the above procedure, a whole sequence has to be processed until estimates of the information bits are available. However, for a continuous transmission or very long blocks, this leads to long delays that may not be acceptable for certain applications. Moreover, limited memory in the decoder may require processing shorter parts of the block. It has been shown that a reliable decision of a bit $d[i]$ is obtained when a subsequence from $\mathbf{r}[i]$ to $\mathbf{r}[i + K]$ with sufficiently large $K$ has been processed. A rule of thumb states that the decision depth $K$ has to be approximately five times the constraint length $L_c$ (Heller and Jacobs 1971). After processing $K$ steps, early parts of competing paths have merged with high probability and the decision is reliable.

Punctured Codes

Prior to decoding, positions of punctured bits have to be filled with dummy bits. Assuming an antipodal transmission, zeros can be inserted. Looking at (3.31), we recognize that $r_\nu[i] = 0$ does not affect the incremental metric. However, puncturing reduces the Hamming distances between code sequences. Therefore, the decision depth $K$ has to be increased (Hagenauer 1988) in order to keep the decision reliable.
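The insertion of zeros at punctured positions can be sketched as follows. The function name `depuncture` and the example pattern are hypothetical; the sketch assumes that the receiver knows the puncturing matrix and the number $N$ of codewords.

```python
import numpy as np

def depuncture(received, P, N):
    """Rebuild an (N, n) array of soft values, zeros at punctured positions.

    received: 1-D array of matched filter outputs of the transmitted bits.
    P:        binary puncturing matrix of shape (n, Lp).
    N:        number of codewords of the mother code in the block.
    """
    P = np.asarray(P)
    n, Lp = P.shape
    mask = P[:, np.arange(N) % Lp].T.astype(bool)   # True = bit was sent
    r = np.zeros((N, n))                            # zero = "no information"
    r[mask] = received                              # filled in time order
    return r

# Assumed example pattern with n = 2 and Lp = 2
P = np.array([[1, 1],
              [1, 0]])
r = depuncture([0.9, -1.1, 0.8, -0.7, 1.2, 1.0], P, N=4)
```

The resulting array has the shape expected by a decoder for the unpunctured mother code, which is why the same decoder can serve all code rates.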
3.4 Soft-Output Decoding of Binary Codes

In the last decade, tremendous effort has been spent to design and analyze concatenated codes. As shown in Sections 3.6.2 and 3.6.3, the analytical performance analysis often presupposes an optimum MLD of the entire scheme, which is infeasible in most practical systems. Instead, a concatenation of distinct decoders mutually exchanging information was found to be a suboptimum but practical solution. In order to avoid a loss of information by hard decision decoding, algorithms that process and provide soft information for each bit are required.

The optimum soft-in soft-out decoder would deliver a list of a posteriori probabilities $\Pr\{\tilde{\mathbf{x}} \mid \mathbf{y}\}$, one for each possible codeword $\tilde{\mathbf{x}} \in \Gamma$. Since the number of codewords of practical codes can easily exceed the number of atoms in the universe, this is an infeasible approach. Instead, the generally trellis-based decoders work on a symbol-by-symbol basis and deliver soft information for each bit separately.

In this section, an appropriate measure of reliability is first defined, followed by the derivation of the soft-output decoding algorithms for block codes and convolutional codes. For the sake of simplicity, we always assume BPSK-modulated signals according to Section 1.4 and a memoryless channel.
3.4.1 Log-Likelihood Ratios – A Measure of Reliability

Since the information is represented by a random process $B$, a suitable soft information is of course the probability. BPSK maps the coded bits $b$ onto antipodal signals $x = (1 - 2b)\sqrt{E_s/T_s}$, that is, $b = 0$ corresponds to $x = +\sqrt{E_s/T_s}$ and $b = 1$ to $x = -\sqrt{E_s/T_s}$. Owing to the restriction to the binary case, $\Pr\{B = 0\} + \Pr\{B = 1\} = 1$ holds, that is, we have only one independent parameter and the entire information is determined by either $\Pr\{B = 0\}$ or $\Pr\{B = 1\}$. Hence, we can also use the logarithmic ratio of the probabilities leading to the log-likelihood ratio (LLR)
\[
L(b) = L(x) = \log \frac{\Pr\{B = 0\}}{\Pr\{B = 1\}} = \log \frac{\Pr\{X = +\sqrt{E_s/T_s}\}}{\Pr\{X = -\sqrt{E_s/T_s}\}}
\tag{3.33}
\]
as an appropriate measure of reliability for $B = b$ (Hagenauer 1996b). The sign of $L(b)$ determines the hard decision, while the magnitude denotes the reliability. The larger the difference between $\Pr\{B = 0\}$ and $\Pr\{B = 1\}$, the larger the magnitude of the LLR. If $B = 1$ and $B = 0$ are equally likely, a decision is totally random (unreliable) and $L(b) = 0$ holds. Since the logarithm is a strictly monotone function, we can also calculate probabilities from the LLRs. Resolving (3.33) with respect to $\Pr\{B = 0\}$ and $\Pr\{B = 1\}$ results in
\[
\Pr\{B = v\} = \frac{e^{-v \cdot L(b)}}{1 + e^{-L(b)}} \quad \text{with } v \in \{0, 1\}
\tag{3.34a}
\]
for the logical variable $b$ and in
\[
\Pr\{X = x\} = \frac{1}{1 + e^{-\operatorname{sgn}(x) \cdot L(x)}} \quad \text{with } x \in \left\{ +\sqrt{E_s/T_s},\, -\sqrt{E_s/T_s} \right\}
\tag{3.34b}
\]
for antipodal signals. The probability for a correct decision $\hat{b} = b$ is determined in the following way. For $b = 0$, we obtain the true data if $L(\hat{b})$ is positive, that is,
\[
\Pr\{\hat{B} = b \mid b = 0\} = \frac{1}{1 + e^{-L(\hat{b})}} = \frac{e^{|L(\hat{b})|}}{1 + e^{|L(\hat{b})|}} \quad \text{for } L(\hat{b}) > 0 .
\]
Equivalently, $L(\hat{b})$ has to be negative for $b = 1$:
\[
\Pr\{\hat{B} = b \mid b = 1\} = \frac{1}{1 + e^{L(\hat{b})}} = \frac{1}{1 + e^{-|L(\hat{b})|}} = \frac{e^{|L(\hat{b})|}}{1 + e^{|L(\hat{b})|}} \quad \text{for } L(\hat{b}) < 0 .
\]
Combining both expressions finally yields
\[
\Pr\{\hat{B} = b\} = \frac{e^{|L(\hat{b})|}}{1 + e^{|L(\hat{b})|}} .
\tag{3.35}
\]
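Moreover, the expectation of the antipodal signal $x$ becomes, with (3.34b),
\[
\mathrm{E}\{X\} = \sum_{x = \pm\sqrt{E_s/T_s}} x \cdot \Pr\{X = x\}
= \sqrt{\frac{E_s}{T_s}} \cdot \left( \frac{e^{L(x)}}{1 + e^{L(x)}} - \frac{1}{1 + e^{L(x)}} \right)
= \sqrt{\frac{E_s}{T_s}} \cdot \tanh\bigl( L(x)/2 \bigr) .
\tag{3.36}
\]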
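The relations (3.33) to (3.36) between probabilities and LLRs are easy to check numerically. The sketch below uses hypothetical function names and normalizes $\sqrt{E_s/T_s}$ to one; it makes no attempt to guard against overflow for very large LLR magnitudes.

```python
import math

def llr_from_prob(p0):
    """LLR of (3.33) from Pr{B = 0} = p0 with 0 < p0 < 1."""
    return math.log(p0 / (1.0 - p0))

def prob_from_llr(L, v):
    """Pr{B = v} for v in {0, 1} according to (3.34a)."""
    return math.exp(-v * L) / (1.0 + math.exp(-L))

def prob_correct(L):
    """Probability of a correct hard decision, (3.35), written as
    1 / (1 + e^{-|L|}), which equals e^{|L|} / (1 + e^{|L|})."""
    return 1.0 / (1.0 + math.exp(-abs(L)))

def soft_bit(L):
    """Expectation of the antipodal symbol, (3.36), for sqrt(Es/Ts) = 1."""
    return math.tanh(L / 2.0)

L = llr_from_prob(0.9)   # Pr{B=0} = 0.9 gives L = log(9)
```

For this example, the soft bit equals $0.9 - 0.1 = 0.8$, that is, the difference of the two symbol probabilities.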
For the sake of uniformity, the logical values '0' and '1' are used in the following derivation. However, equivalent expressions can be obtained with antipodal signals $\pm\sqrt{E_s/T_s}$. Looking at the transmission of information, a decision is based on the matched filter output
\[
r = \frac{1}{|h|} \operatorname{Re}\{h^* y\} = |h| \cdot x + \frac{1}{|h|} \cdot \operatorname{Re}\{h^* n\} .
\tag{3.37}
\]
According to the MAP criterion in Section 1.3, we have to choose that $\hat{b}$ that maximizes the a posteriori probability
\[
\hat{b} = \operatorname*{argmax}_{v \in \{0, 1\}} \Pr\{B = v \mid r\} .
\]
Replacing the probabilities in (3.33) by a posteriori probabilities and applying Bayes' rule, we obtain
\[
L(\hat{b}) = L(b \mid r) = \log \frac{\Pr\{B = 0 \mid r\}}{\Pr\{B = 1 \mid r\}}
= \underbrace{\log \frac{p_{R|b=0}(r)}{p_{R|b=1}(r)}}_{L(r \mid b)} + \underbrace{\log \frac{\Pr\{B = 0\}}{\Pr\{B = 1\}}}_{L_a(b)} .
\tag{3.38}
\]
Hence, the LLR in (3.38) consists of two components for an uncoded transmission. The term $L(r \mid b)$ depends on the channel statistics $p_{R|b}(r)$ and, therefore, only on the matched filter output $r$. On the contrary, $L_a(b)$ is independent of $r$ and represents a priori knowledge about the bit $b$. The LLR $L(r \mid b)$ can be very easily calculated for memoryless channels such as the AWGN and flat fading channels depicted in Figure 3.1. Inserting the conditional probability densities⁶
\[
p_{R|b}(r) = \frac{1}{\sqrt{\pi \sigma_N^2}} \cdot \exp\left[ - \frac{\bigl( r - |h| (1 - 2b) \sqrt{E_s/T_s} \bigr)^2}{\sigma_N^2} \right]
\tag{3.39}
\]
into (3.38) results in
\[
L(r \mid b) = 4 |h| \frac{\sqrt{E_s/T_s}}{\sigma_N^2}\, r = \underbrace{4 |h|^2 \frac{E_s}{N_0}}_{L_{ch}}\, \tilde{r}
\quad \text{with} \quad
\tilde{r} = \frac{r}{|h| \sqrt{E_s/T_s}} .
\tag{3.40}
\]
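As a numerical cross-check of (3.40), the channel LLR can be computed once via the channel reliability and once directly from the log-ratio of the two densities in (3.39). The function names and the example values are hypothetical; the relation $\sigma_N^2 = N_0/T_s$ is assumed here because it makes both forms of (3.40) agree.

```python
import math

def channel_llr(r_tilde, h_abs, es_n0):
    """L(r|b) = L_ch * r_tilde with L_ch = 4 |h|^2 Es/N0 (Es/N0 linear, not dB)."""
    l_ch = 4.0 * h_abs ** 2 * es_n0
    return l_ch * r_tilde

def llr_from_densities(r, h_abs, es, ts, sigma2):
    """log p(r|b=0) - log p(r|b=1) using the exponent of (3.39);
    the common prefactor cancels in the ratio."""
    a = h_abs * math.sqrt(es / ts)
    log_p0 = -((r - a) ** 2) / sigma2
    log_p1 = -((r + a) ** 2) / sigma2
    return log_p0 - log_p1

# Assumed example values
h_abs, es, ts, n0 = 0.5, 2.0, 1.0, 0.5
r_tilde = 0.3
r = r_tilde * h_abs * math.sqrt(es / ts)            # undo the normalization
L1 = channel_llr(r_tilde, h_abs, es / n0)
L2 = llr_from_densities(r, h_abs, es, ts, sigma2=n0 / ts)
```

Both routes give the same value, which illustrates that the LLR is a scaled version of the normalized matched filter output.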
In (3.40), $\tilde{r}$ is a normalized version of the matched filter output $r$ with an information-bearing part of unit energy. Hence, the LLR is directly obtained from $\tilde{r}$ by some appropriate scaling with the channel reliability $L_{ch}$ that depends on $E_s/N_0$ as well as the channel gain $|h|^2$. As a consequence, it is natural to use LLRs in subsequent decoding algorithms. For this goal, we need an appropriate algebra called L-Algebra (Hagenauer 1996b).

As already known from block and convolutional codes, the parity check bits are generated by modulo-2 sums of certain information bits $d_i$. In order to calculate the LLR of a parity bit, we look at a simple SPC code with two statistically independent information bits $b_1 = d_1$, $b_2 = d_2$ and the parity bit $b_3 = d_1 \oplus d_2$. The LLR $L(b_3)$ is given by
\[
L(b_3) = L(d_1 \oplus d_2) = \log \frac{\Pr\{B_3 = 0\}}{\Pr\{B_3 = 1\}}
= \log \frac{\Pr\{D_1 = 0\} \cdot \Pr\{D_2 = 0\} + \Pr\{D_1 = 1\} \cdot \Pr\{D_2 = 1\}}{\Pr\{D_1 = 0\} \cdot \Pr\{D_2 = 1\} + \Pr\{D_1 = 1\} \cdot \Pr\{D_2 = 0\}} .
\]
Rearranging the probabilities such that we obtain likelihood ratios $\Pr\{D = 0\}/\Pr\{D = 1\}$ and applying the relationships $\tanh(x/2) = (e^x - 1)/(e^x + 1)$ as well as $\log[(1 + x)/(1 - x)] = 2 \operatorname{artanh}(x)$ yields
\[
L(b_3) = 2 \operatorname{artanh}\bigl( \lambda_1 \cdot \lambda_2 \bigr) \quad \text{with } \lambda_\mu = \tanh\bigl( L(d_\mu)/2 \bigr) .
\tag{3.41}
\]
By induction, it can be shown that (3.41) can be generalized to $N$ independent variables
\[
L(d_1 \oplus \cdots \oplus d_N) = 2 \operatorname{artanh}\left( \prod_{\mu=1}^{N} \tanh\bigl( L(d_\mu)/2 \bigr) \right) .
\tag{3.42}
\]
With (3.42), we now have a rule for calculating the LLR of a modulo-2 sum of statistically independent random variables.

⁶ Be aware that only half of the noise power affects the transmission owing to the extraction of the real part.
An approximation with lower computational complexity can be derived by exploiting the behavior of the tanh function. It saturates at $\pm 1$ for arguments with large magnitude and has a nearly linear shape with a slope of 1 around the origin. Therefore, large magnitudes at the input result in a multiplication with $\pm 1$ while the smallest magnitude mainly determines the magnitude of the output. Hence, we obtain the approximation
\[
L(d_1 \oplus \cdots \oplus d_N) \approx \min_{\mu} \bigl( |L(d_\mu)| \bigr) \cdot \prod_{\mu=1}^{N} \operatorname{sgn}\bigl( L(d_\mu) \bigr) .
\tag{3.43}
\]
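The exact rule (3.42) and the approximation (3.43) can be compared directly. The following sketch uses hypothetical function names; note that `math.atanh` raises an error if the product of the tanh terms reaches exactly $\pm 1$, which happens in floating point for very large LLR magnitudes, so a practical implementation would need clipping.

```python
import math

def boxplus(llrs):
    """LLR of the modulo-2 sum of independent bits, exact rule (3.42)."""
    prod = 1.0
    for L in llrs:
        prod *= math.tanh(L / 2.0)
    return 2.0 * math.atanh(prod)

def boxplus_minsum(llrs):
    """Approximation (3.43): product of the signs times the smallest magnitude."""
    sign = 1.0
    for L in llrs:
        if L < 0:
            sign = -sign
    return sign * min(abs(L) for L in llrs)

def xor_llr_from_probs(L1, L2):
    """Reference for two bits, computed from the probabilities directly."""
    p1 = 1.0 / (1.0 + math.exp(-L1))   # Pr{D1 = 0}
    p2 = 1.0 / (1.0 + math.exp(-L2))   # Pr{D2 = 0}
    p_even = p1 * p2 + (1 - p1) * (1 - p2)
    return math.log(p_even / (1.0 - p_even))

exact = boxplus([2.0, -3.0])
approx = boxplus_minsum([2.0, -3.0])
```

In this example the approximation returns $-2$, while the exact value has the same sign and a somewhat smaller magnitude, as expected from the saturation argument above.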
3.4.2 General Approach for Soft-Output Decoding

In the first step, we derive a direct approach for calculating the LLRs for each information bit in a symbol-by-symbol manner. To this end, we consider an encoder that maps the information vector $\mathbf{d} = [d_1, \ldots, d_k]^T$ onto the codeword $\mathbf{b} = [b_1, \ldots, b_n]^T$ consisting of $n$ bits. After BPSK modulation, the vector $\mathbf{x}$ is transmitted. On the basis of the matched filter output $\mathbf{r} = [r_1, \ldots, r_n]^T$, we now have to determine $L(d_\mu \mid \mathbf{r})$. In the context of concatenated codes, we will see that it is necessary to calculate LLRs $L(b_\nu \mid \mathbf{r})$ of coded bits as well. This can be accomplished by simply replacing the targeted $d_\mu$ with the desired code bit $b_\nu$ in the following derivation.

According to the symbol-by-symbol MAP criterion of Section 1.3, the a posteriori probability $\Pr\{X = x \mid \mathbf{r}\}$ represents an appropriate soft information containing all available information. Since we consider a binary transmission, the random variable $X$ can take only two different values and the LLR of the corresponding probabilities also comprises the entire information:
\[
L(\hat{d}_\mu) = \log \frac{\Pr\{D_\mu = 0 \mid \mathbf{r}\}}{\Pr\{D_\mu = 1 \mid \mathbf{r}\}} = \log \frac{p_{D_\mu, R}(d_\mu = 0, \mathbf{r})}{p_{D_\mu, R}(d_\mu = 1, \mathbf{r})} .
\tag{3.44}
\]
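The probability densities in the numerator and denominator can be extended by exploiting the relation $\Pr\{B_\mu = v\} = \sum_{\mathbf{b},\, b_\mu = v} \Pr\{\mathbf{b}\}$. In order to obtain the separation into $d_\mu = 0$ and $d_\mu = 1$, we divide the code space $\Gamma$ into two sets of equal size, namely, $\Gamma_\mu^{(0)}$ comprising all codewords whose $\mu$th information bit is zero and $\Gamma_\mu^{(1)}$ with all remaining codewords corresponding to $d_\mu = 1$. This results in⁷
\[
L(\hat{d}_\mu) = \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} p_{B,R}(\mathbf{b}, \mathbf{r})}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} p_{B,R}(\mathbf{b}, \mathbf{r})}
= \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} p_{R|b}(\mathbf{r}) \cdot \Pr\{\mathbf{b}\}}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} p_{R|b}(\mathbf{r}) \cdot \Pr\{\mathbf{b}\}} .
\tag{3.45}
\]
For memoryless channels, the conditional probability densities can be factorized into $n$ terms
\[
p_{R|b}(\mathbf{r}) = \prod_{\nu=1}^{n} p_{R_\nu | b_\nu}(r_\nu) .
\]
Moreover, a codeword $\mathbf{b}$ is totally determined by the corresponding information word $\mathbf{d}$. Since the information bits are assumed to be statistically independent, $\Pr\{\mathbf{b}\} = \Pr\{\mathbf{d}\} = \prod_{\mu=1}^{k} \Pr\{d_\mu\}$ holds and we obtain
\[
L(\hat{d}_\mu) = \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \prod_{\nu=1}^{n} p_{R_\nu | b_\nu}(r_\nu) \cdot \prod_{\nu=1}^{k} \Pr\{d_\nu\}}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \prod_{\nu=1}^{n} p_{R_\nu | b_\nu}(r_\nu) \cdot \prod_{\nu=1}^{k} \Pr\{d_\nu\}} .
\tag{3.46}
\]

⁷ For notational simplicity, we use the simpler expression $\Pr\{\mathbf{b}\}$ instead of $\Pr\{B = \mathbf{b}\}$.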
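If we consider systematic encoders with $d_\mu = b_\mu$ for $1 \le \mu \le k$, the terms corresponding to $\nu = \mu$ in the products of the numerator and denominator are constant. Hence, $p_{R_\mu | b_\mu}(r_\mu)$ as well as $\Pr\{d_\mu\}$ can be extracted from the sums, yielding
\[
L(\hat{d}_\mu) = L(r_\mu \mid d_\mu) + L_a(d_\mu) + \underbrace{\log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \prod_{\nu=1, \nu \ne \mu}^{n} p_{R_\nu | b_\nu}(r_\nu) \cdot \prod_{\nu=1, \nu \ne \mu}^{k} \Pr\{d_\nu\}}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \prod_{\nu=1, \nu \ne \mu}^{n} p_{R_\nu | b_\nu}(r_\nu) \cdot \prod_{\nu=1, \nu \ne \mu}^{k} \Pr\{d_\nu\}}}_{L_e(\hat{d}_\mu)} .
\tag{3.47}
\]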
Equation (3.47) shows that $L(\hat{d}_\mu)$ consists of three parts for systematic encoding: the intrinsic information
\[
L(r_\mu \mid d_\mu) = \log \frac{p_{R_\mu | d_\mu = 0}(r_\mu)}{p_{R_\mu | d_\mu = 1}(r_\mu)} = 4 |h_\mu|^2 \frac{E_s}{N_0}\, \tilde{r}_\mu
\tag{3.48a}
\]
obtained from the weighted matched filter output of the symbol $d_\mu$ itself, the a priori information
\[
L_a(d_\mu) = \log \frac{\Pr\{D_\mu = 0\}}{\Pr\{D_\mu = 1\}}
\tag{3.48b}
\]
that is already known from the uncoded case and, as a third part, $L_e(\hat{d}_\mu)$. The last component does not depend on the $\mu$th bit itself but on all other bits of a codeword. Therefore, it is called extrinsic information. For memoryless channels, all three parts are independent of each other so that the LLRs can simply be summed. The extrinsic information is responsible for the coding gain since it exploits the structure of the code. In the case of nonsystematic encoding, extrinsic and intrinsic components cannot be separated. However, soft-output decoding is still possible.

Since the log-likelihood values $L(r_\mu \mid b_\mu)$ are scaled versions of the matched filter output, it would be desirable to express (3.47) by these LLRs instead of probability densities. The a priori probabilities can be substituted by (3.34a) and for the conditional probability densities
\[
\begin{aligned}
p_{R_\nu | b_\nu}(r_\nu) &= \Pr\{b_\nu \mid r_\nu\} \cdot \frac{p_{R_\nu}(r_\nu)}{\Pr\{b_\nu\}} \\
&= \frac{\exp\bigl[ -b_\nu L(b_\nu \mid r_\nu) \bigr]}{1 + \exp\bigl[ -L(b_\nu \mid r_\nu) \bigr]} \cdot \frac{1 + \exp\bigl[ -L_a(b_\nu) \bigr]}{\exp\bigl[ -b_\nu L_a(b_\nu) \bigr]} \cdot p_{R_\nu}(r_\nu) \\
&= \frac{\exp\bigl[ -b_\nu L(r_\nu \mid b_\nu) \bigr]}{1 + \exp\bigl[ -L(b_\nu \mid r_\nu) \bigr]} \cdot \bigl( 1 + \exp[-L_a(b_\nu)] \bigr) \cdot p_{R_\nu}(r_\nu)
\end{aligned}
\tag{3.49}
\]
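holds. It has to be emphasized that the LLRs $L_a(b_\nu)$ and $L(r_\nu \mid b_\nu)$ do not depend on the particular $b_\nu$ but only on the ratio of probabilities corresponding to the two possible values of $b_\nu$. Hence, inserting (3.49) into (3.47) and cancelling all terms independent of $b_\nu$ results in
\[
L_e(\hat{d}_\mu) = \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ -b_\nu L(r_\nu \mid b_\nu) \bigr] \cdot \prod_{\nu=1, \nu \ne \mu}^{k} \exp\bigl[ -b_\nu L_a(b_\nu) \bigr]}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ -b_\nu L(r_\nu \mid b_\nu) \bigr] \cdot \prod_{\nu=1, \nu \ne \mu}^{k} \exp\bigl[ -b_\nu L_a(b_\nu) \bigr]} .
\tag{3.50}
\]
With the definition
\[
L(b_\nu; r_\nu) =
\begin{cases}
L(r_\nu \mid b_\nu) + L_a(b_\nu) & \text{for } 1 \le \nu \le k \\
L(r_\nu \mid b_\nu) & \text{for } k < \nu \le n
\end{cases}
\tag{3.51}
\]
(3.47) finally becomes
\[
L(\hat{d}_\mu) = L(r_\mu \mid d_\mu) + L_a(d_\mu) + \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ -b_\nu L(b_\nu; r_\nu) \bigr]}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ -b_\nu L(b_\nu; r_\nu) \bigr]}
\tag{3.52a}
\]
\[
\phantom{L(\hat{d}_\mu)} = L(r_\mu \mid d_\mu) + L_a(d_\mu) + \log \frac{\sum_{\mathbf{x} \in \Gamma_\mu^{(+1)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ x_\nu L(b_\nu; r_\nu)/2 \bigr]}{\sum_{\mathbf{x} \in \Gamma_\mu^{(-1)}} \prod_{\nu=1, \nu \ne \mu}^{n} \exp\bigl[ x_\nu L(b_\nu; r_\nu)/2 \bigr]} .
\tag{3.52b}
\]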
Equation (3.52a) demonstrates that we only need the scaled matched filter outputs and the a priori LLRs for each bit in order to determine the extrinsic information. An equivalent expression is obtained by using antipodal symbols $x_\nu = \pm 1$ instead of logical bits $b_\nu \in \{0, 1\}$. The relationship $b_\nu = (1 - x_\nu)/2$ leads to (3.52b). However, a direct implementation of (3.52a) or (3.52b) requires a prohibitive computational effort because the number of codewords $\mathbf{b} \in \Gamma_\mu^{(0,1)}$ over which we have to sum the terms in the numerator and denominator becomes excessively high for codes with reasonable cardinality. For example, the (255,247,3)-Hamming code consists of $2^{247} \approx 2.3 \cdot 10^{74}$ codewords. Walsh codes represent an exception for which the described soft-output decoding is applicable; they are described in the next subsection.
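For very small codes, the direct approach can still be written down in a few lines, which also makes the exponential growth of the codeword enumeration tangible. The sketch below evaluates the complete LLR, that is, the three terms of (3.52a) merged into one ratio of sums over all codewords, for a systematic generator matrix. The function name and the (3,2) single parity check example are assumptions for illustration.

```python
import itertools
import math
import numpy as np

def soft_output_bruteforce(G, L_in):
    """A posteriori LLRs of all k information bits by codeword enumeration.

    G:    (k, n) systematic generator matrix over GF(2).
    L_in: length-n vector of the LLRs L(b_nu; r_nu) defined in (3.51).
    """
    G = np.asarray(G) % 2
    L_in = np.asarray(L_in, dtype=float)
    k, n = G.shape
    num = np.zeros(k)                       # sums over codewords with d_mu = 0
    den = np.zeros(k)                       # sums over codewords with d_mu = 1
    for d in itertools.product((0, 1), repeat=k):   # 2^k information words
        b = np.dot(d, G) % 2
        w = math.exp(-float(np.dot(b, L_in)))       # prod_nu exp(-b_nu L_nu)
        for mu in range(k):
            if d[mu] == 0:
                num[mu] += w
            else:
                den[mu] += w
    return np.log(num / den)

# Assumed example: (3,2) single parity check code, b3 = d1 xor d2
G_spc = [[1, 0, 1],
         [0, 1, 1]]
L_in = [1.0, -2.0, 3.0]
L_out = soft_output_bruteforce(G_spc, L_in)
```

For this SPC example, each output equals the input LLR of the bit itself plus an extrinsic term that follows the tanh rule (3.41) applied to the two other bits.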
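3.4.3 Soft-Output Decoding for Walsh Codes

Symbol-by-symbol soft-output decoding of Walsh codes can be efficiently implemented by the Fast Hadamard Transform (FHT). Combining the three terms in (3.52a) into a single expression and exchanging the order of the product and the exponential function yields
\[
L(\hat{d}_\mu) = \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \exp\left[ -\sum_{\nu=1}^{n} b_\nu \cdot L(b_\nu; r_\nu) \right]}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \exp\left[ -\sum_{\nu=1}^{n} b_\nu \cdot L(b_\nu; r_\nu) \right]}
\tag{3.53a}
\]
\[
\phantom{L(\hat{d}_\mu)} = \log \frac{\sum_{\mathbf{b} \in \Gamma_\mu^{(0)}} \exp\bigl[ -\mathrm{FHT}\{L(\mathbf{b}; \mathbf{r})\} \bigr]}{\sum_{\mathbf{b} \in \Gamma_\mu^{(1)}} \exp\bigl[ -\mathrm{FHT}\{L(\mathbf{b}; \mathbf{r})\} \bigr]} .
\tag{3.53b}
\]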
From (3.53a), we see that each possible codeword $\mathbf{b} \in \Gamma$ has to be correlated with the LLRs $L(b_\nu; r_\nu)$ (inner sums in the numerator and denominator). While this requires, in general, a computational effort that grows proportionally to $M^2$, an efficient implementation exists for Walsh codes. The FHT delivers the correlations with all possible codewords directly at reduced costs that are proportional to $M \log_2(M)$. It works similarly to the fast Fourier transform (FFT) algorithm, whereby only multiplications with zeros and ones are required, which can be efficiently implemented by summing over subsets of LLRs. In summary, soft-output decoding of Walsh codes is performed by dividing the FHT outputs for each bit into appropriate subsets $\Gamma_\mu^{(0)}$ and $\Gamma_\mu^{(1)}$ and inserting them into (3.53b).

For codes with large cardinality, it is possible to perform the decoding with the dual code (see page 97). This leads to lower computational costs if the dual code has much fewer codewords than the original code (Offer 1996). A different possibility is the representation of the code by its trellis diagram. In the following subsections, we present trellis-based symbol-by-symbol decoding algorithms for linear block codes and convolutional codes as well as simplified versions operating in the logarithmic domain.
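The butterfly structure behind the FHT can be sketched as follows. The function names are hypothetical, the transform is given in natural (Sylvester) ordering with $\pm 1$ entries, and the mapping of Walsh codewords onto the rows of the Hadamard matrix via $b_\nu = (1 - x_\nu)/2$ is an assumption of this illustration.

```python
import numpy as np

def fht(x):
    """Fast Hadamard transform of a length-M vector, M a power of two.

    Returns H_M @ x for the Sylvester Hadamard matrix H_M using
    M log2(M) additions and subtractions.
    """
    y = np.array(x, dtype=float)
    M = y.size
    assert M > 0 and M & (M - 1) == 0, "length must be a power of two"
    h = 1
    while h < M:
        for start in range(0, M, 2 * h):
            a = y[start:start + h].copy()
            b = y[start + h:start + 2 * h].copy()
            y[start:start + h] = a + b            # butterfly: sum branch
            y[start + h:start + 2 * h] = a - b    # butterfly: difference branch
        h *= 2
    return y

def logical_correlations(L):
    """sum_nu b_nu L_nu for all M codewords with b = (1 - x)/2, x = Hadamard rows."""
    L = np.asarray(L, dtype=float)
    return (L.sum() - fht(L)) / 2.0
```

The second function illustrates how the correlations with logical codewords needed in the exponents of (3.53a) follow from one transform of the LLR vector; the subsequent split into the two subsets per bit is not shown.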
3.4.4 BCJR Algorithm for Binary Block Codes

Basically, the algorithm to be presented now is not restricted to decoding purposes but can be generally applied for estimating a posteriori LLRs in systems represented by a trellis diagram. Hence, it can be used for decoding convolutional and linear block codes as well as for the equalization of dispersive channels (Bahl et al. 1974; Douillard et al. 1995; Jordan 2000; Li et al. 1995). The so-called BCJR algorithm was presented for the first time in 1972 by Bahl, Cocke, Jelinek and Raviv (Bahl et al. 1972). The savings in computational complexity compared to a direct implementation of (3.52a) are based on the fact that the encoder can be interpreted as a Markov process of the first order (compare the Viterbi decoder on page 106).

We start the derivation for binary linear block codes by going back to (3.44) and focus on soft-output decoding of the information bit $d_\mu$, that is, estimating $L(\hat{d}_\mu)$. Looking at the trellis representation of linear block codes (cf. Section 3.2.5), $d_\mu$ initiates a transition from a state $s'$ at time instant $\mu - 1$ to a state $s$ at time $\mu$. The set of all possible transitions $(s', s)$ can be separated into two subsets, those corresponding to $d_\mu = 0$ and those associated with $d_\mu = 1$. Moreover, $\mathbf{r}$ can be divided into three parts: the vector $\mathbf{r}_{k<\mu}$ comprising all received symbols before time instant $\mu$, the symbol $r_\mu$, and the vector $\mathbf{r}_{k>\mu}$ containing all symbols received after time instant $\mu$. With this, (3.44) becomes
\[
L(\hat{d}_\mu) = \log \frac{\sum_{(s', s),\, d_\mu = 0} p(s', s, \mathbf{r})}{\sum_{(s', s),\, d_\mu = 1} p(s', s, \mathbf{r})}
= \log \frac{\sum_{(s', s),\, d_\mu = 0} p\bigl( s', s, \mathbf{r}_{k<\mu}, r_\mu, \mathbf{r}_{k>\mu} \bigr)}{\sum_{(s', s),\, d_\mu = 1} p\bigl( s', s, \mathbf{r}_{k<\mu}, r_\mu, \mathbf{r}_{k>\mu} \bigr)} .
\tag{3.54}
\]
If LLRs of coded bits $b_\nu$ are to be calculated, $d_\mu$ has to be replaced in (3.54) by $b_\nu$. The probability density functions in the numerator and denominator can be factorized by the chain rule
\[
p\bigl( s', s, \mathbf{r}_{k<\mu}, r_\mu, \mathbf{r}_{k>\mu} \bigr) = p\bigl( \mathbf{r}_{k>\mu} \mid s', s, \mathbf{r}_{k<\mu}, r_\mu \bigr) \cdot p\bigl( s, r_\mu \mid s', \mathbf{r}_{k<\mu} \bigr) \cdot p\bigl( s', \mathbf{r}_{k<\mu} \bigr) .
\tag{3.55}
\]
Since the encoder can be modeled as a Markov process of the first order, the first term in (3.55) can be rewritten as
\[
\beta_\mu(s) := p\bigl( \mathbf{r}_{k>\mu} \mid s', s, \mathbf{r}_{k<\mu}, r_\mu \bigr) = p\bigl( \mathbf{r}_{k>\mu} \mid s \bigr)
\tag{3.56}
\]
because the history described by $s'$, $\mathbf{r}_{k<\mu}$ and $r_\mu$ is irrelevant once $s$ is known. Illustratively, $\beta_\mu(s)$ represents the probability that the sequence $\mathbf{r}_{k>\mu}$ will be received if we start at time instant $\mu$ from state $s$. The second factor in (3.55) can be simplified in the same way. Once the state $s'$ at time $\mu - 1$ is known, the sequence $\mathbf{r}_{k<\mu}$ is obsolete. The probability density has the form
\[
\gamma_\mu(s', s) := p\bigl( s, r_\mu \mid s' \bigr) = p\bigl( r_\mu \mid s', s \bigr) \cdot \Pr\{s \mid s'\} .
\tag{3.57}
\]
Hence, it comprises the conditional probability density of the matched filter output for a flat fading channel (cf. (3.39))
\[
p\bigl( r_\mu \mid s', s \bigr) = \frac{1}{\sqrt{\pi \sigma_N^2}} \cdot \exp\left[ - \frac{\bigl( r_\mu - |h_\mu|\, z^{(s' \to s)} \sqrt{E_s/T_s} \bigr)^2}{\sigma_N^2} \right]
\tag{3.58}
\]
and the a priori probability $\Pr\{d_\mu\} = \Pr\{s \mid s'\}$ of a bit $d_\mu$ associated with a transition from $s'$ to $s$. According to Section 3.3.4, $z^{(s' \to s)} = \pm 1$ in (3.58) denotes the code symbol corresponding to the transition from $s'$ to $s$. If a priori knowledge is available in the form of $L_a(d_\mu)$, we obtain
\[
\Pr\{s \mid s'\} =
\begin{cases}
\bigl( 1 + \exp[-L_a(d_\mu)] \bigr)^{-1} & \text{for } d_\mu = 0 \\
\bigl( 1 + \exp[L_a(d_\mu)] \bigr)^{-1} & \text{for } d_\mu = 1 \\
0 & \text{transition does not exist.}
\end{cases}
\tag{3.59}
\]
Finally, we define the joint probability density
\[
\alpha_{\mu-1}(s') := p\bigl( s', \mathbf{r}_{k<\mu} \bigr) .
\tag{3.60}
\]
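Inserting (3.56), (3.58), and (3.60) into (3.54) leads to
\[
L(\hat{d}_\mu) = \log \frac{\sum_{(s', s),\, d_\mu = 0} \alpha_{\mu-1}(s') \cdot \gamma_\mu(s', s) \cdot \beta_\mu(s)}{\sum_{(s', s),\, d_\mu = 1} \alpha_{\mu-1}(s') \cdot \gamma_\mu(s', s) \cdot \beta_\mu(s)} .
\tag{3.61}
\]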
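Hence, the terms in the sums of (3.61) can be split into three parts: $\alpha_{\mu-1}(s')$ covers the past $k < \mu$, $\gamma_\mu(s', s)$ represents the present, and $\beta_\mu(s)$ the future $k > \mu$. The probability densities $\alpha_{\mu-1}(s')$ and $\beta_\mu(s)$ can be calculated recursively:
\[
\alpha_\mu(s) = p\bigl( s, \mathbf{r}_{k<\mu+1} \bigr) = \sum_{s'} p\bigl( s', s, \mathbf{r}_{k<\mu}, r_\mu \bigr) = \sum_{s'} \gamma_\mu(s', s) \cdot \alpha_{\mu-1}(s')
\tag{3.62a}
\]
\[
\beta_{\mu-1}(s') = p\bigl( \mathbf{r}_{k>\mu-1} \mid s' \bigr) = \frac{1}{\Pr\{s'\}} \sum_{s} p\bigl( s', s, r_\mu, \mathbf{r}_{k>\mu} \bigr) = \sum_{s} \gamma_\mu(s', s) \cdot \beta_\mu(s) .
\tag{3.62b}
\]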
From (3.62a), we see that $\alpha_\mu(s)$ can be calculated by a forward recursion because $\alpha_\mu(s)$ at time instant $\mu$ depends on the preceding values $\alpha_{\mu-1}(s')$ at time instant $\mu - 1$. Analogously, $\beta_{\mu-1}(s')$ is determined by a backward recursion. The calculations are illustrated in Figure 3.7 showing a trellis segment with forward and backward recursions.

Figure 3.7 Section of trellis diagram for explanation of BCJR algorithm (trellis segment between time instants $\mu - 1$ and $\mu$ with $\alpha_{\mu-1}(s')$, $\gamma_\mu(s', s)$ and $\beta_\mu(s)$; forward recursion $\alpha_\mu(v) = \alpha_{\mu-1}(u_1) \cdot \gamma_\mu(u_1, v) + \alpha_{\mu-1}(u_2) \cdot \gamma_\mu(u_2, v)$; backward recursion $\beta_{\mu-1}(u) = \beta_\mu(v_1) \cdot \gamma_\mu(u, v_1) + \beta_\mu(v_2) \cdot \gamma_\mu(u, v_2)$)
Initialization

A recursive calculation always requires an initialization at the beginning of the procedure. Computing $\gamma_\mu(s', s)$ and $\alpha_\mu(s)$ in the forward recursion is similar to the conventional Viterbi algorithm except that no paths are discarded. At time instant $\mu = 0$, the trellis starts in the all-zero state with the initialization
\[
\alpha_0(s') =
\begin{cases}
1 & \text{for } s' = \mathbf{0}_{1 \times (n-k)} \\
0 & \text{for } s' \ne \mathbf{0}_{1 \times (n-k)} .
\end{cases}
\tag{3.63}
\]
Since block codes are represented by a terminated trellis, we end again in the all-zero state. This leads to the initialization
\[
\beta_n(s) =
\begin{cases}
1 & \text{for } s = \mathbf{0}_{1 \times (n-k)} \\
0 & \text{for } s \ne \mathbf{0}_{1 \times (n-k)} .
\end{cases}
\tag{3.64}
\]

Extended Battail Algorithm

The decoding algorithm can be simplified for linear block codes. Using the bits $b_\mu$ of a codeword $\mathbf{b} \in \Gamma$ as coefficients for linearly combining the rows $\mathbf{h}_\mu$ of the parity check matrix $\mathbf{H}$ always leads to the all-zero vector. Since the trellis representation of linear block codes is also based on the linear combination of the $\mathbf{h}_\mu$, each codeword corresponds to a certain path through the trellis starting and ending in the all-zero state. Resolving (3.10) with respect to the $\mu$th component, we obtain
\[
\sum_{\nu=1}^{n} b_\nu \cdot \mathbf{h}_\nu \bmod 2 = \mathbf{0}_{1 \times (n-k)}
\quad \Rightarrow \quad
b_\mu \cdot \mathbf{h}_\mu = \sum_{\nu=1, \nu \ne \mu}^{n} b_\nu \cdot \mathbf{h}_\nu \bmod 2 .
\tag{3.65}
\]
Hence, skipping the $\mu$th row in $\mathbf{H}$ leads to a trellis with two possible final states. For $b_\mu = 0$, the trellis ends in the all-zero state, for $b_\mu = 1$ it is the state $s = \mathbf{h}_\mu$. Figure 3.8 illustrates the modified trellis of the Hamming code when the second row $\mathbf{h}_2 = [1\ 0\ 1]$ of $\mathbf{H}$ is skipped (cf. Figure 3.2). The extended Battail algorithm exploits this relationship and avoids a backward recursion. The extrinsic information can be calculated by (Offer 1996)
\[
L_e(\hat{d}_\mu) = \log \alpha_n(\mathbf{0}) - \log \alpha_n(\mathbf{h}_\mu) .
\tag{3.66}
\]
Therefore, only a forward recursion similar to the Viterbi algorithm has to be performed for each bit to be estimated.
3.4.5 BCJR Algorithm for Binary Convolutional Codes

This subsection describes a symbol-by-symbol soft-output decoding algorithm for binary convolutional codes. Since most parts of the algorithm are identical to those in the previous section, only the differences are mentioned. Contrary to linear block codes, successive codewords are not independent anymore. Hence, an information sequence $\mathbf{d}$ is mapped by the encoder onto a sequence $\mathbf{b} = \bigl[ \mathbf{b}[0]^T \cdots \mathbf{b}[N-1]^T \bigr]^T$ consisting of codewords $\mathbf{b}[i] = \bigl[ b_1[i] \cdots b_n[i] \bigr]^T$. Consequently, the decoder processes the whole sequence $\mathbf{r} = \bigl[ \mathbf{r}[0]^T \cdots \mathbf{r}[N-1]^T \bigr]^T$ at the matched filter output instead of single codewords $\mathbf{r}[i]$.

Figure 3.8 Modified trellis diagram of (7,4) Hamming code, second column $\mathbf{h}_2 = [1\ 0\ 1]$ of $\mathbf{H}$ skipped (states 000 to 111 versus time instants 0 to 7; the two possible final states carry $\alpha_7(\mathbf{0})$ and $\alpha_7(\mathbf{h}_2)$; branches are distinguished for $b_\mu = 1$ and $b_\mu = 0$)

The main difference concerns the calculation of the incremental metrics in (3.58), that is, the conditional probability density of the channel. Since each segment of the trellis now corresponds to an $n$-bit codeword, the incremental metric becomes
\[
p\bigl( \mathbf{r}[i] \mid s', s \bigr) = \prod_{\mu=1}^{n} \frac{1}{\sqrt{\pi \sigma_N^2}} \cdot \exp\left[ - \frac{\bigl( r_\mu[i] - |h_\mu[i]|\, z_\mu^{(s' \to s)} \sqrt{E_s/T_s} \bigr)^2}{\sigma_N^2} \right] .
\tag{3.67}
\]
According to Section 3.3.4, $\mathbf{z}^{(s' \to s)}$ denotes the codeword corresponding to the state transition $(s', s)$. The variables $\alpha_{i-1}(s')$, $\gamma_i(s', s)$ and $\beta_i(s)$ are calculated in the same way as for linear block codes.

Initialization

For a terminated trellis, the same initialization as that introduced in the previous section can be applied. If the trellis is truncated and the last state of the encoder is not known to the receiver, two different initializations of the backward recursion are possible. On the one hand, we can use the results of the forward recursion resulting in
\[
\beta_N(s) = \alpha_N(s) .
\tag{3.68}
\]
On the other hand, $\beta_N(s)$ can be set to a constant value for all $1 \le s \le 2^{L_c - 1}$
\[
\beta_N(s) = 2^{1 - L_c} .
\tag{3.69}
\]
3.4.6 Implementation in Logarithmic Domain

Regarding the implementation of the BCJR algorithm on specific hardware, it is beneficial to perform all calculations in the logarithmic domain. This circumvents problems with the numerical representation of very large or very small numbers. Moreover, multiplications are transformed into additions, reducing the computational costs remarkably. Since we generally assume Gaussian distributed background noise, the natural logarithm is the obvious choice. Without loss of generality, we restrict ourselves to binary codes so that only two branches leave each state $s'$ and arrive at each state $s$. Looking at the derivations in the previous sections, we recognize that the arguments of the logarithm are always sums. The following transformation allows us to simplify the logarithm of sums:
\[
\log\bigl( e^{x_1} + e^{x_2} \bigr) = \max[x_1, x_2] + \log\bigl( 1 + e^{-|x_1 - x_2|} \bigr) .
\tag{3.70}
\]
Obviously, the expression on the left-hand side is mainly transformed into a maximum search. The correction term $\log(1 + \exp(-|x_1 - x_2|))$ only depends on the absolute difference between $x_1$ and $x_2$ and can be easily quantized. This enables the use of simple lookup tables, avoiding the computation of exponential and logarithmic functions. Equation (3.70) can also be recursively applied to more than two terms, that is, we start with the first two terms, then apply (3.70) to the intermediate result and the third term, and so on (Robertson et al. 1995).

Using the notation for convolutional codes, the transformation of the incremental metric in (3.67) into the logarithmic domain delivers
\[
\bar{\gamma}_i(s', s) = \log \gamma_i(s', s) = C - \sum_{\mu=1}^{n} \frac{\bigl( r_\mu[i] - |h_\mu[i]|\, z_\mu^{(s' \to s)} \sqrt{E_s/T_s} \bigr)^2}{\sigma_N^2} + \log \Pr\{s \mid s'\}
\tag{3.71}
\]
where $C$ is independent of particular state transitions and can therefore be neglected. Furthermore, calculating the square in the numerator of (3.71), we recognize that the squared terms $r_\mu[i]^2$ and $\bigl( |h_\mu[i]|\, z_\mu^{(s' \to s)} \sqrt{E_s/T_s} \bigr)^2$ do not differ for different state transitions and can also be cancelled. Consequently, we obtain
\[
\bar{\gamma}_i(s', s) = \sum_{\mu=1}^{n} 2 \cdot \frac{\sqrt{E_s/T_s}}{\sigma_N^2} \cdot |h_\mu[i]| \cdot r_\mu[i] \cdot z_\mu^{(s' \to s)} + \log \Pr\{s \mid s'\}
= \sum_{\mu=1}^{n} \frac{1}{2} \cdot L(r_\mu[i] \mid b_\mu) \cdot z_\mu^{(s' \to s)} + \log \Pr\{s \mid s'\} .
\tag{3.72}
\]
Hence, only a sum of LLRs has to be calculated rather than probability densities. The a priori probabilities in (3.72) can be substituted by applying (3.34b) and (3.70):

$$\log \Pr\{s \mid s'\} = -\log\left(1 + e^{\pm L_a(d[i])}\right) = -\left(\max\left[0, \pm L_a(d[i])\right] + \log\left(1 + e^{-|L_a(d[i])|}\right)\right). \tag{3.73}$$

In (3.73), the plus sign holds for the transition $(s' \to s)$ corresponding to an information bit $d[i] = 0$ and, consequently, the minus sign for $d[i] = 1$. For the variables $\bar\alpha_i(s) = \log[\alpha_i(s)]$ and $\bar\beta_{i-1}(s') = \log[\beta_{i-1}(s')]$, we obtain the expressions

$$\bar\alpha_i(s) = \log \sum_{s'} \exp\left[\log\left(\gamma_i(s', s) \cdot \alpha_{i-1}(s')\right)\right] = \max_{s'}\left[\bar\gamma_i(s', s) + \bar\alpha_{i-1}(s')\right] + \log\left(1 + e^{-|\Delta[i]|}\right) \tag{3.74}$$

and

$$\bar\beta_{i-1}(s') = \log \sum_{s} \exp\left[\log\left(\gamma_i(s', s) \cdot \beta_i(s)\right)\right] = \max_{s}\left[\bar\gamma_i(s', s) + \bar\beta_i(s)\right] + \log\left(1 + e^{-|\Delta[i]|}\right). \tag{3.75}$$

In (3.74) and (3.75), $\Delta[i]$ denotes the difference between the two terms from which the maximum is taken. Looking at the previous equations and assuming that the correction term is obtained from a lookup table, it becomes obvious that no exponential or logarithmic functions have to be calculated. All multiplications are transformed into additions and all additions into maximizations. Therefore, the computational costs are reduced remarkably without loss in performance. This version of the BCJR algorithm is often called log-MAP (Robertson et al. 1997).

An approximation of the BCJR algorithm that reduces the computational complexity further is obtained if the correction terms are neglected. This results in the so-called Max-Log-MAP algorithm. It can be shown that the hard decisions of its output values equal those of the Viterbi decoder (Robertson et al. 1995). The whole output can be described by

$$L(\hat d[i]) = \max_{\substack{(s', s) \\ d[i]=0}}\left[\bar\alpha_{i-1}(s') + \bar\gamma_i(s', s) + \bar\beta_i(s)\right] - \max_{\substack{(s', s) \\ d[i]=1}}\left[\bar\alpha_{i-1}(s') + \bar\gamma_i(s', s) + \bar\beta_i(s)\right]. \tag{3.76}$$

With regard to the initialization,

$$\bar\alpha_0(s') = \begin{cases} 0 & \text{for } s' = \mathbf{0}_{1 \times (n-k)} \\ -\infty & \text{for } s' \neq \mathbf{0}_{1 \times (n-k)} \end{cases} \tag{3.77}$$

and

$$\bar\beta_N(s) = \begin{cases} 0 & \text{for } s = \mathbf{0}_{1 \times (n-k)} \\ -\infty & \text{for } s \neq \mathbf{0}_{1 \times (n-k)} \end{cases} \tag{3.78}$$

holds for a terminated trellis. In the case of a truncated trellis, $\bar\beta_N(s) \equiv 0$ can be used for all states $s$.

There also exists an extension of the Viterbi algorithm that delivers soft outputs. The so-called Soft-Output Viterbi Algorithm is described in Hagenauer and Höher (1989).
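The transformation (3.70) and its recursive use can be illustrated with a minimal Python sketch. The function names `max_star` and `max_star_all` are chosen here for illustration and do not stem from the text; the exact correction term is computed directly, whereas a hardware implementation would typically read it from a quantized lookup table.

```python
import math

def max_star(x1, x2):
    """Jacobian logarithm (3.70): log(exp(x1) + exp(x2))."""
    # A term equal to -inf corresponds to probability zero and contributes nothing.
    if x1 == -math.inf:
        return x2
    if x2 == -math.inf:
        return x1
    # Maximum search plus correction term log(1 + exp(-|x1 - x2|)).
    return max(x1, x2) + math.log1p(math.exp(-abs(x1 - x2)))

def max_star_all(values, exact=True):
    """Apply (3.70) recursively to several terms.

    exact=True  keeps the correction term (log-MAP behavior),
    exact=False drops it (Max-Log-MAP approximation).
    """
    acc = -math.inf
    for v in values:
        acc = max_star(acc, v) if exact else max(acc, v)
    return acc

# Example: three terms with log-values log(1), log(2), log(3)
terms = [math.log(1.0), math.log(2.0), math.log(3.0)]
exact_value = max_star_all(terms)                 # log(1 + 2 + 3)
approx_value = max_star_all(terms, exact=False)   # max only
```

With `exact=False` the result is simply the largest term, which shows the size of the approximation error made by neglecting the correction terms.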
3.5 Performance Evaluation of Linear Codes

3.5.1 Distance Properties of Codes

The performance of a code is only asymptotically determined by its minimum Hamming distance $d_{\min}$, that is, for high signal-to-noise ratios (SNRs). For low and medium SNRs, the whole distance spectrum, that is, all distances that can occur between pairs of codewords, has to be considered. Determining the distance spectrum of a code may become very demanding, especially for extremely long codes. A simplification occurs for linear codes, allowing us to restrict ourselves to the comparison of each codeword $\mathbf{b} \in \Gamma \setminus \{\mathbf{0}\}$ with the all-zero word instead of calculating its Hamming distances to all other words in $\Gamma$. The distance to the all-zero word is obtained by simply counting the nonzero symbols in $\mathbf{b}$, that is, by determining its Hamming weight $w_H(\mathbf{b})$. The whole spectrum of a code can be expressed by the polynomial

$$A(D) = \sum_{\mathbf{b} \in \Gamma} D^{w_H(\mathbf{b})} = \sum_{d=0}^{n} A_d \cdot D^d = 1 + \sum_{d=d_{\min}}^{n} A_d \cdot D^d. \tag{3.79}$$
The coefficients $A_d$ represent the number of codewords $\mathbf{b}$ with weight $w_H(\mathbf{b}) = d$. Therefore, the relation $\sum_{d=0}^{n} A_d = 2^k = |\Gamma|$ holds, that is, the sum of all coefficients $A_d$ equals the number of codewords. For convolutional codes, $A_d$ describes the number of code sequences with weight $d$. The minimum weight and the distance spectrum depend solely on the code and not on the encoder.

If concatenated codes are considered or the bit error rate (BER) has to be determined, the so-called input–output weight enumerating function (IOWEF) (Benedetto and Montorsi 1996a,b; Benedetto et al. 1996) is important. It reflects not only the weight $d$ of the coded bits but also the weight $w$ of the corresponding information bits. Therefore, it describes the input–output relationship of the encoder. The IOWEF is obtained by simply counting the number of nonzero symbols for each pair of input and output vectors of the encoder, leading to a two-dimensional polynomial

$$A(W, D) = \sum_{w=0}^{k} \sum_{d=0}^{n} A_{w,d} \cdot W^w D^d = 1 + \sum_{w=1}^{k} \sum_{d=d_{\min}}^{n} A_{w,d} \cdot W^w D^d. \tag{3.80}$$

In (3.80), $A_{w,d}$ denotes the number of codewords with weight $d$ and weight $w$ of the associated information vector $\mathbf{d}$. Certainly, $\sum_w A_{w,d} = A_d$ holds. Especially for concatenated codes, we need conditional IOWEFs that are restricted to only a single input weight $w$

$$A(w, D) = \sum_{d=0}^{n} A_{w,d} \cdot D^d \tag{3.81}$$

or a single output weight $d$

$$A(W, d) = \sum_{w=0}^{k} A_{w,d} \cdot W^w. \tag{3.82}$$

Finally, calculating the bit error probability requires the coefficients

$$C_d = \frac{1}{k} \cdot \left.\frac{\partial A(W, d)}{\partial W}\right|_{W=1} = \sum_{w=1}^{k} \frac{w}{k} \cdot A_{w,d} \tag{3.83}$$

because they represent the average number of nonzero information bits, normalized to $k$, considering all code sequences with output weight $d$.
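For short block codes, the coefficients in (3.79), (3.80) and (3.83) can be obtained by exhaustive enumeration. The following sketch assumes a systematic generator matrix `G` of a (7, 4) Hamming code, chosen here purely for illustration; the function names are hypothetical as well. Note that $A_{w,d}$ and $C_d$ depend on the encoder (that is, on `G`), whereas $A_d$ depends only on the code.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed example: systematic generator matrix [I | P] of a (7,4) Hamming code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def iowef(G):
    """Count A_{w,d}: codewords of weight d generated by inputs of weight w."""
    k, n = len(G), len(G[0])
    A_wd = Counter()
    for d_vec in product((0, 1), repeat=k):
        # Encode over GF(2): b = d * G mod 2
        b = [sum(d_vec[j] * G[j][i] for j in range(k)) % 2 for i in range(n)]
        A_wd[(sum(d_vec), sum(b))] += 1
    return A_wd

def spectra(A_wd, k):
    """Derive A_d (3.79) and C_d (3.83) from the IOWEF coefficients."""
    A_d, C_d = Counter(), Counter()
    for (w, d), count in A_wd.items():
        A_d[d] += count
        C_d[d] += Fraction(w, k) * count
    return A_d, C_d

A_wd = iowef(G)
A_d, C_d = spectra(A_wd, k=len(G))
```

The enumeration visits all $2^k$ information words, so it is only feasible for small $k$; this is exactly the difficulty mentioned in the text for long codes.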
IOWEF for some Linear Block Codes

As already mentioned, it is a demanding task to determine the IOWEF $A(W, D)$ or even $A(D)$ for arbitrary codes. Closed-form expressions are known only for specific or very simple codes. Here, the distance spectra and IOWEFs of some linear block codes considered in this book are illustrated.

Starting with the $(n, 1, n)$-repetition code from page 97, recall that it consists of only two codewords, the all-zero word and the all-one word with weight $n$. Hence, the distance spectrum has the form $A(D) = 1 + D^n$ and the IOWEF becomes $A(W, D) = 1 + W D^n$.

Looking at the $(n, n-1, 2)$-single parity check code of Subsection 3.2.2, we know that only even weights occur. Since $\binom{n-1}{w}$ denotes the number of possibilities to arrange $w$ ones in the information part of length $k = n - 1$, we obtain the IOWEF

$$A(W, D) = \sum_{w=0}^{n-1} \binom{n-1}{w} \cdot W^w \cdot D^{\left\lfloor \frac{w+1}{2} \right\rfloor \cdot 2} \tag{3.84}$$

where $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$. The distance spectrum becomes

$$A(D) = \sum_{d=0}^{\left\lfloor \frac{n-1}{2} \right\rfloor} \binom{n}{2d} \cdot D^{2d} + \begin{cases} 0 & \text{for } n \text{ odd} \\ D^n & \text{for } n \text{ even.} \end{cases} \tag{3.85}$$

Next, we look at the Simplex code with $n = 2^r - 1$ and $k = r$. All its codewords except the all-zero word have a Hamming weight of $w_H(\mathbf{b}) = 2^{r-1}$. Hence, the distance spectrum has the form

$$A(D) = 1 + (2^r - 1) \cdot D^{2^{r-1}} \tag{3.86}$$

and the IOWEF becomes

$$A(W, D) = 1 + \sum_{w=1}^{k=r} \binom{r}{w} \cdot W^w D^{2^{r-1}}. \tag{3.87}$$

Since the Hadamard code of Subsection 3.2.4 has been obtained by extending each codeword of the Simplex code by a preceding zero, their distance spectra and IOWEFs are identical.

Finally, we consider the Hamming code of order $r$, which is the dual code of the Simplex code. The distance spectra of dual codes are related to each other by the MacWilliams identity (Blahut 1983; Friedrichs 1996; Peterson and Weldon 1972)

$$A^{\perp}(D) = 2^{-k} \cdot (1 + D)^n \cdot A\left(\frac{1 - D}{1 + D}\right) \tag{3.88}$$

where $n$ and $k$ are given by the original $(n, k)$-code $\Gamma$. In our case, the Simplex code serves as the original code with $n = 2^r - 1$ and $k = r$. Applying (3.88) to (3.86) yields

$$A(D) = (1 - 2^{-r}) \cdot (1 - D)(1 - D^2)^{2^{r-1} - 1} + 2^{-r} \cdot (1 + D)^{2^r - 1} \tag{3.89a}$$

$$= \frac{n}{n+1} \cdot (1 - D)\left(1 - D^2\right)^{\frac{n-1}{2}} + \frac{1}{n+1} \cdot (1 + D)^n \tag{3.89b}$$

for the Hamming code.
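The MacWilliams identity (3.88) is easy to check numerically. The sketch below expands $2^{-k} \sum_d A_d (1 - D)^d (1 + D)^{n-d}$, which is (3.88) written out term by term, using integer polynomial arithmetic on coefficient lists. The helper names are illustrative assumptions, not notation from the text.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of D)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_pow(p, e):
    """Raise a polynomial to a non-negative integer power."""
    result = [1]
    for _ in range(e):
        result = poly_mul(result, p)
    return result

def macwilliams(A, k):
    """Distance spectrum of the dual code from the spectrum A = [A_0, ..., A_n]."""
    n = len(A) - 1
    total = [0] * (n + 1)
    for d, a_d in enumerate(A):
        if a_d == 0:
            continue
        # A_d * (1 - D)^d * (1 + D)^(n - d)
        term = poly_mul(poly_pow([1, -1], d), poly_pow([1, 1], n - d))
        for j, c in enumerate(term):
            total[j] += a_d * c
    # Division by 2^k must be exact for a valid spectrum.
    assert all(c % 2**k == 0 for c in total)
    return [c // 2**k for c in total]

def simplex_spectrum(r):
    """Spectrum (3.86) of the Simplex code with n = 2^r - 1, k = r."""
    n = 2**r - 1
    A = [0] * (n + 1)
    A[0] = 1
    A[2**(r - 1)] = n
    return A

hamming_7_4 = macwilliams(simplex_spectrum(3), k=3)
hamming_15_11 = macwilliams(simplex_spectrum(4), k=4)
```

For $r = 3$ this reproduces the spectrum $1 + 7D^3 + 7D^4 + D^7$ of the (7, 4) Hamming code.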
Figure 3.9 Modified trellis diagram for determining IOWEF of convolutional code from Figure 3.4a (branch labels: 00 → 10: $WD^2$; 10 → 01: $D$; 10 → 11: $WD$; 11 → 11: $WD$; 11 → 01: $D$; 01 → 10: $W$; 01 → 00: $D^2$)
IOWEF for Convolutional Codes

We now take a look at the IOWEF of convolutional codes. We start the derivation by modifying the finite state diagram from Figure 3.4a: the self-loop in the all-zero state is cut. Defining the all-zero state as the starting and ending point results in the structure depicted in Figure 3.9. Each branch is labeled with a monomial denoting the input weight $w$ and the output weight $d$ of the corresponding state transition. The weight of a sequence of successive transitions is calculated by multiplying their labels. The whole IOWEF is obtained by summing the polynomials of all possible paths starting and ending in the all-zero state.

First, we consider a continuous data transmission with only a single divergence from and a single reunion with the all-zero sequence. The above-mentioned procedure can be described mathematically by a series of the form

$$A(W, D) = \sum_{i=0}^{\infty} \mathbf{a}_{\setminus\{0\}} \cdot \mathbf{T}^{i}_{\setminus\{0\}} \cdot \mathbf{b}_{\setminus\{0\}}, \tag{3.90}$$

where the matrix

$$\mathbf{T}_{\setminus\{0\}} = \begin{pmatrix} 0 & W & 0 \\ D & 0 & WD \\ D & 0 & WD \end{pmatrix} \tag{3.91}$$

comprises the labels of all possible state transitions excluding the all-zero state. An element $T_{\nu\mu}$ represents the label of the transition from state $\nu$ into state $\mu$, where the indices are the decimal representations of the state vectors $s'$ and $s$ consisting of binary elements. For example, the second row describes the branches leaving state $s' = [1\ 0]$, that is, $T_{21} = D$ belongs to the transition to state $s = [0\ 1]$ with zero input weight and an output weight of one, and $T_{23} = WD$ to the transition to state $s = [1\ 1]$. Since a self-loop does not exist in state $s' = [1\ 0]$, $T_{22} = 0$ holds.

According to Figure 3.9, we always leave the all-zero state with a transition into state $s = [1\ 0]$ with label $WD^2$. This is ensured by the row vector $\mathbf{a}_{\setminus\{0\}} = [0\ \ WD^2\ \ 0]$. Equivalently, we reach the all-zero state only from state $s' = [0\ 1]$ with label $D^2$, described by the vector $\mathbf{b}_{\setminus\{0\}} = [D^2\ \ 0\ \ 0]^T$. For a practical implementation, the series is truncated after a certain number of transitions.
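The truncated series can be evaluated with elementary polynomial arithmetic. The following sketch uses the branch labels stated in the text for the three non-zero states [0 1], [1 0], [1 1]. Polynomials in $W$ and $D$ are stored as dictionaries mapping `(w, d)` to a coefficient, and instead of limiting the number of transitions, all terms with output weight above `d_max` are discarded. Both the data structure and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

def pmul(p, q, d_max):
    """Multiply two polynomials in W and D, dropping terms with D-exponent > d_max."""
    out = defaultdict(int)
    for (w1, d1), c1 in p.items():
        for (w2, d2), c2 in q.items():
            if d1 + d2 <= d_max:
                out[(w1 + w2, d1 + d2)] += c1 * c2
    return dict(out)

def padd(p, q):
    """Add two polynomials."""
    out = defaultdict(int, p)
    for key, c in q.items():
        out[key] += c
    return dict(out)

def mono(w, d):
    """Monomial W^w D^d."""
    return {(w, d): 1}

ZERO = {}
# Transition labels between states [0 1], [1 0], [1 1] (rows: from, columns: to)
T = [[ZERO,       mono(1, 0), ZERO],
     [mono(0, 1), ZERO,       mono(1, 1)],
     [mono(0, 1), ZERO,       mono(1, 1)]]
a = [ZERO, mono(1, 2), ZERO]   # leave the all-zero state: W D^2 into [1 0]
b = [mono(0, 2), ZERO, ZERO]   # return to the all-zero state: D^2 from [0 1]

def iowef_series(a, T, b, d_max):
    """Sum a * T^i * b over i, keeping only output weights d <= d_max."""
    total, v = {}, list(a)              # v holds the row vector a * T^i
    while any(v):                       # stops once every entry has been truncated away
        for vj, bj in zip(v, b):
            total = padd(total, pmul(vj, bj, d_max))
        v = [
            # entry m of v * T: sum over source states j
            _sum_polys([pmul(v[j], T[j][m], d_max) for j in range(len(T))])
            for m in range(len(T))
        ]
    return total

def _sum_polys(polys):
    acc = {}
    for p in polys:
        acc = padd(acc, p)
    return acc

A_wd = iowef_series(a, T, b, d_max=7)
```

The loop terminates because every cycle in the diagram of Figure 3.9 increases the output weight, so all entries of `v` eventually exceed `d_max`. The first coefficient obtained is $W D^5$, that is, a single path with input weight one and output weight five.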
Figure 3.10 Coefficients $A_d$ and $C_d$ versus distance $d$ (logarithmic ordinate) for different convolutional codes a) $L_c = 4$, $g_1 = 17_8$ and $g_2 = 15_8$, b) $L_c = 9$, $g_1 = 561_8$ and $g_2 = 753_8$

For concatenated codes, we have to consider a blockwise transmission and especially multiple divergences from the all-zero sequence. This can be accomplished by extending the matrix in (3.91). By including all transitions from and into the all-zero state, we obtain

$$\mathbf{T} = \begin{pmatrix} 1 & [0\ \ WD^2\ \ 0] \\ [D^2\ \ 0\ \ 0]^T & \mathbf{T}_{\setminus\{0\}} \end{pmatrix} = \begin{pmatrix} 1 & 0 & WD^2 & 0 \\ D^2 & 0 & W & 0 \\ 0 & D & 0 & WD \\ 0 & D & 0 & WD \end{pmatrix} \tag{3.92}$$

and the modified vectors $\mathbf{a} = [1\ 0\ 0\ 0]$ and $\mathbf{b} = [1\ 0\ 0\ 0]^T$.

Figure 3.10 shows the coefficients $A_d$ and $C_d$ for two convolutional codes with constraint lengths $L_c = 4$ and $L_c = 9$ and a common code rate $R_c = 1/2$. Obviously, $A_d$ and $C_d$ increase exponentially with the distance $d$. The free distance of a code corresponds to the first nonzero coefficient $A_d$ (excluding the all-zero word) and amounts to $d_f = 6$ for the $L_c = 4$ code and $d_f = 12$ for $L_c = 9$. Hence, $d_f$ becomes larger for increasing constraint length $L_c$.
3.5.2 Error Rate Performance of Codes

The analytical evaluation of the error rate performance of codes is based on the maximum likelihood detection already described in Section 3.1. Since a codeword consists of $n$ symbols, the detection cannot be performed symbolwise; instead, the entire codeword has to be considered. For convolutional codes, $n$ has to be replaced by the total number $nN$ of symbols in a sequence. Since we restrict ourselves to BPSK, the vectors $\mathbf{b}$ and $\mathbf{x}$ are linearly related to each other and can be used equivalently. The derivation is divided into two parts: we start with the AWGN channel and continue with flat fading channels.
Error Rate Performance for AWGN Channel

Without loss of generality, we assume that we transmit the all-zero word $\mathbf{b} = \mathbf{0}_{n \times 1}$, that is, $\mathbf{x}^{(0)} = \left[\sqrt{E_s/T_s}\ \cdots\ \sqrt{E_s/T_s}\right]^T$ holds. The matched filter output has the form

$$\mathbf{r} = \mathrm{Re}\{\mathbf{y}\} = \mathbf{x}^{(0)} + \mathbf{n}' \tag{3.93}$$
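where $\mathbf{n}' = \mathrm{Re}\{\mathbf{n}\}$ contains real Gaussian distributed noise samples with zero mean and the variance $\sigma_{N'}^2 = N_0/(2T_s)$. According to Section 3.1, the optimum detector in (3.2) determines the word $\hat{\mathbf{x}}$ with the largest conditional density $p_{R|\hat{\mathbf{x}}}(\mathbf{r})$. The set

$$M_{0,i} = \left\{\mathbf{r} \mid p_{R|\mathbf{x}^{(i)}}(\mathbf{r}) > p_{R|\mathbf{x}^{(0)}}(\mathbf{r})\right\} = \left\{\mathbf{r} \mid \mathbf{r}^T \mathbf{x}^{(i)} > \mathbf{r}^T \mathbf{x}^{(0)}\right\} \tag{3.94}$$

comprises all received vectors $\mathbf{r} = [r_1 \cdots r_n]^T$ that have a larger conditional probability with respect to $\mathbf{x}^{(i)}$ than to the true $\mathbf{x}^{(0)}$. Therefore, these vectors would lead to a wrong decision. Rewriting the inner product in (3.94) and inserting (3.93) delivers the pairwise error probability⁸

$$\begin{aligned} \Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)}\} &= \Pr\{\mathbf{r} \in M_{0,i} \mid \mathbf{x}^{(0)}\} \\ &= \Pr\left\{\sum_{\nu=1}^{n} r_\nu x_\nu^{(0)} < \sum_{\nu=1}^{n} r_\nu x_\nu^{(i)}\right\} \\ &= \Pr\left\{\sum_{\nu=1}^{n} \left(x_\nu^{(0)} - x_\nu^{(i)}\right) n'_\nu < -\sum_{\nu=1}^{n} \left(x_\nu^{(0)} - x_\nu^{(i)}\right) x_\nu^{(0)}\right\}. \end{aligned} \tag{3.95}$$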
Owing to the assumption $x_\nu^{(0)} \equiv \sqrt{E_s/T_s}$, the differences between $x_\nu^{(0)}$ and $x_\nu^{(i)}$ take only the values $0$ and $2\sqrt{E_s/T_s}$. Since $\mathbf{x}^{(0)}$ and $\mathbf{x}^{(i)}$ differ in exactly $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})$ positions, the left-hand side of the inequality in (3.95) describes a zero-mean Gaussian distributed random variable $\eta$ with variance $\sigma_\eta^2 = 4 d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) \sigma_{N'}^2 E_s/T_s$, where $\sigma_{N'}^2 = N_0/(2T_s)$ is the variance of the real noise samples. The right-hand side has the constant value $-2 d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) E_s/T_s$. Analogously to (1.49), we obtain the pairwise error probability

$$P_d = \Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)}\} = \frac{1}{2} \cdot \mathrm{erfc}\left(\sqrt{d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) \cdot E_s/N_0}\right) \tag{3.96}$$

for two codewords $\mathbf{x}^{(0)}$ and $\mathbf{x}^{(i)}$ with Hamming distance $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})$. An average error probability for $\mathbf{x}^{(0)}$ can be obtained by evaluating the probability of the union of all sets $M_{0,i}$ for $i \neq 0$. However, it may become very difficult to determine $\bigcup_i M_{0,i}$ exactly. A good approximation is obtained by the well-known union bound (Bossert 1999; Proakis 2001)

$$\begin{aligned} P_e(\mathbf{x}^{(0)}) = \Pr\left\{\mathbf{r} \in \bigcup_{i=1}^{2^k - 1} M_{0,i} \,\middle|\, \mathbf{x}^{(0)}\right\} &\le \sum_{i=1}^{2^k - 1} \Pr\left\{\mathbf{r} \in M_{0,i} \mid \mathbf{x}^{(0)}\right\} \\ &\le \frac{1}{2} \sum_{i=1}^{2^k - 1} \mathrm{erfc}\left(\sqrt{d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) \frac{E_s}{N_0}}\right) \end{aligned} \tag{3.97}$$

where equality in the first bound holds if and only if all sets $M_{0,i}$ are disjoint. Owing to the linearity of the considered codes, (3.97) is not only valid for a specific $\mathbf{x}^{(0)}$ but also represents the general probability of a decoding failure.

⁸ We apply the same simplifications as in footnote 6 on page 18.

Figure 3.11 Bit error probabilities $P_b$ versus $E_b/N_0$ in dB for convolutional code with $g_1 = 15_8$ and $g_2 = 17_8$ and different maximum distances considered (union bound for AWGN; curves for $d_{\max} = d_f$, $d_{\max} = 7, 8, 10, 20$ and simulation)

The argument of the complementary error function depends only on the Hamming distances and the SNR. Therefore, instead of running over all competing codewords or sequences, we can simply use the distance spectrum defined in (3.79) to rewrite (3.97) as

$$P_e \le \frac{1}{2} \sum_{d=d_{\min}}^{n} A_d \cdot \mathrm{erfc}\left(\sqrt{d \frac{E_s}{N_0}}\right) = \sum_{d=d_{\min}}^{n} A_d \cdot P_d. \tag{3.98}$$

With regard to bit error probabilities, we have to consider the specific mapping of information vectors $\mathbf{d}$ onto code vectors $\mathbf{b}$ or, equivalently, $\mathbf{x}$. This can be accomplished by replacing the coefficients $A_d$ in (3.98) by $C_d$ defined in (3.83). We obtain

$$P_b \le \frac{1}{2} \sum_{d=d_{\min}}^{n} C_d \cdot \mathrm{erfc}\left(\sqrt{d \frac{E_s}{N_0}}\right) = \frac{1}{2} \sum_{d=d_{\min}}^{n} C_d \cdot \mathrm{erfc}\left(\sqrt{d R_c \frac{E_b}{N_0}}\right). \tag{3.99}$$
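The bound (3.99) can be evaluated directly once the coefficients $C_d$ are available. The sketch below is an illustration under stated assumptions: it uses the rate-1/2 code with generators $7_8$ and $5_8$ (not the $15_8$, $17_8$ code of Figure 3.11), whose transfer function $WD^5/(1 - 2WD)$ gives $C_d = (d - 4) \cdot 2^{d-5}$ for $d \ge 5$. The function name and the parameter `d_max`, which truncates the sum, are chosen here for the example.

```python
import math

def union_bound_ber(C, rate, ebn0_db, d_max=None):
    """Evaluate (3.99): Pb <= 1/2 * sum_d C_d * erfc(sqrt(d * Rc * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)          # dB -> linear
    return 0.5 * sum(
        c * math.erfc(math.sqrt(d * rate * ebn0))
        for d, c in C.items()
        if d_max is None or d <= d_max
    )

# Assumed example coefficients: C_d = (d - 4) * 2^(d - 5), d = 5 ... 20
C = {d: (d - 4) * 2 ** (d - 5) for d in range(5, 21)}

for snr_db in (2.0, 4.0, 6.0):
    asymptotic = union_bound_ber(C, 0.5, snr_db, d_max=5)   # free-distance term only
    full = union_bound_ber(C, 0.5, snr_db)                  # all terms up to d = 20
    print(f"Eb/N0 = {snr_db:3.1f} dB: d_max = d_f -> {asymptotic:.3e}, d_max = 20 -> {full:.3e}")
```

Comparing the two columns shows how much the terms beyond the free distance contribute at low and medium SNRs, and how the first term dominates as the SNR grows.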
The union bound approximation of the BER for an NSC code with generators $g_1 = 15_8$ and $g_2 = 17_8$ is illustrated in Figure 3.11. The results have been obtained from (3.99) by replacing $n$ as the upper limit of the summation by the parameter $d_{\max}$. For high SNRs, the asymptotic performance is dominated by the minimum Hamming distance $d_f$, as can be seen from the curve for $d_{\max} = d_{\min} = d_f$. For medium SNRs, higher Hamming distances also have to be included. However, for small SNRs, the union bound diverges for large $d_{\max}$, as the comparison with the simulation results shows.

Figure 3.12 shows the BERs for convolutional codes with different constraint lengths $L_c$. Obviously, the performance improves with growing memory. However, the decoding costs also grow exponentially with the constraint length. Hence, a trade-off between performance and complexity has to be found. Since the additional gains become smaller for growing $L_c$, it is questionable whether Shannon's channel capacity can be reached by simply enlarging the memory of convolutional codes. With reference to this goal, the concatenated codes presented in Section 3.6 are more promising.

Figure 3.12 Bit error probabilities $P_b$ versus $E_b/N_0$ in dB for convolutional codes of different constraint lengths (Proakis 2001) (union bound for AWGN, bold dashed line: capacity bound; curves for the uncoded case and $L_c = 3, \ldots, 9$)

Error Rate Performance for Flat Fading Channels

For flat fading channels, only the pairwise error probability has to be recalculated. Each symbol $x_\nu$ is weighted with a complex channel coefficient $h_\nu$ of unit average power. Assuming that the coefficients are perfectly known to the receiver, the output of the matched filter including subsequent weighting with the CSI has the form

$$r_\nu = \mathrm{Re}\{h_\nu^* \cdot y_\nu\} = |h_\nu|^2 x_\nu^{(0)} + \mathrm{Re}\{h_\nu^* n_\nu\} = |h_\nu|^2 x_\nu^{(0)} + \tilde n_\nu \tag{3.100}$$

and the probability in (3.95) becomes

$$\Pr\{\mathbf{r} \in M_{0,i} \mid \mathbf{x}^{(0)}, \mathbf{h}\} = \Pr\left\{\sum_{\nu=1}^{n} \left(x_\nu^{(0)} - x_\nu^{(i)}\right) \tilde n_\nu < -\sum_{\nu=1}^{n} \left(x_\nu^{(0)} - x_\nu^{(i)}\right) |h_\nu|^2 x_\nu^{(0)}\right\}. \tag{3.101}$$
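Again, the differences between $x_\nu^{(0)}$ and $x_\nu^{(i)}$ take only the values $0$ and $2\sqrt{E_s/T_s}$. We now define the set $\mathcal{L}$ of those indices $\nu$ for which $x_\nu^{(0)}$ and $x_\nu^{(i)}$ differ. Obviously, $\mathcal{L}$ consists of $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})$ elements. The right-hand side of the inequality in (3.101) has the constant value $-2 E_s/T_s \sum_{\nu \in \mathcal{L}} |h_\nu|^2$. Since the noise is circularly symmetric, the left-hand side is a zero-mean Gaussian distributed random variable $\eta$ with variance

$$\sigma_\eta^2 = 4 \cdot \frac{E_s}{T_s} \cdot \frac{\sigma_N^2}{2} \cdot \sum_{\nu \in \mathcal{L}} |h_\nu|^2 = 2 \cdot \frac{E_s N_0}{T_s^2} \cdot \sum_{\nu \in \mathcal{L}} |h_\nu|^2. \tag{3.102}$$

We obtain the pairwise error probability

$$\Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)} \mid \mathbf{h}\} = \frac{1}{2} \cdot \mathrm{erfc}\left(\sqrt{\sum_{\nu \in \mathcal{L}} |h_\nu|^2 \cdot E_s/N_0}\right). \tag{3.103}$$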
Determining its expectation requires averaging over all contributing channel coefficients $h_\nu$ with $\nu \in \mathcal{L}$. We distinguish two cases.

Block fading channel

For a block fading channel where all $n$ symbols of a codeword are affected by the same channel coefficient $h_\nu = h$, the sum in (3.103) becomes $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) \cdot |h|^2$. In this case, we have to average over a single coefficient and can exploit the results of Section 1.3. For a Rayleigh fading channel with $\sigma_H^2 = 1$, we obtain

$$P_d = \Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)}\} = \frac{1}{2}\left[1 - \sqrt{\frac{d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) E_s/N_0}{1 + d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)}) E_s/N_0}}\right]. \tag{3.104}$$

Inserting (3.104) into the right-hand side of (3.98) provides the ergodic error probability $P_e \le \sum_{d=d_{\min}}^{n} A_d \cdot P_d$. However, the union bound technique applied to convolutional codes and block fading channels converges only for extremely high SNRs. Even for SNRs in the range of 20–30 dB, the results are not meaningful at all.

Perfectly interleaved channel

If the channel is perfectly interleaved, the coefficients $h_\nu$ are statistically independent of each other and identically distributed. In this case, we use the equivalent expression for the complementary error function already known from Section 1.3, and (3.103) becomes

$$\Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)} \mid \mathbf{h}\} = \frac{1}{\pi} \cdot \int_0^{\pi/2} \exp\left(-\frac{\sum_{\nu \in \mathcal{L}} |h_\nu|^2 \cdot E_s/N_0}{\sin^2(\theta)}\right) d\theta. \tag{3.105}$$

The ergodic error probability has to be calculated by averaging (3.105) with respect to the set of channel coefficients $h_\nu$ for $\nu \in \mathcal{L}$. This procedure was already applied for diversity reception in Section 1.3. Hence, it becomes obvious that coding over time-selective channels can exploit diversity. The achievable diversity degree depends on the coherence time of the channel and the data rate. We denote the process comprising $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})$ channel coefficients by $\mathcal{H}$. The moment-generating function $M_{|\mathcal{H}|^2}(s)$ of the squared magnitudes $|h_\nu|^2$, $\nu \in \mathcal{L}$, requires a multivariate integration which can be separated into single integrations for i.i.d. coefficients $h_\nu$. With

$$M_{|\mathcal{H}|^2}(s) = \int_0^{\infty} e^{s\xi} \cdot p_{|\mathcal{H}|^2}(\xi)\, d\xi = \left[\int_0^{\infty} e^{s\xi} \cdot p_{|H|^2}(\xi)\, d\xi\right]^{d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})} \tag{3.106}$$

and the substitution $s = -\frac{E_s/N_0}{\sin^2(\theta)}$, we finally obtain

$$\Pr\{\mathbf{x}^{(0)} \to \mathbf{x}^{(i)}\} = \frac{1}{\pi} \cdot \int_0^{\pi/2} \left[M_{|H|^2}\left(-\frac{E_s/N_0}{\sin^2(\theta)}\right)\right]^{d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})} d\theta. \tag{3.107}$$
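As a worked intermediate step for the Rayleigh case, the single-coefficient moment-generating function can be written out explicitly. The derivation below assumes that $|H|^2$ is exponentially distributed with mean $\sigma_H^2$, which is the standard model for Rayleigh fading; the result is the integrand that appears when (3.107) is specialized to Rayleigh fading.

```latex
\begin{align*}
p_{|H|^2}(\xi) &= \frac{1}{\sigma_H^2}\, e^{-\xi/\sigma_H^2}, \qquad \xi \ge 0 \\
M_{|H|^2}(s) &= \int_0^\infty e^{s\xi}\, \frac{1}{\sigma_H^2}\, e^{-\xi/\sigma_H^2}\, d\xi
              = \frac{1}{1 - s\,\sigma_H^2}, \qquad s < 1/\sigma_H^2 \\
M_{|H|^2}\!\left(-\frac{E_s/N_0}{\sin^2(\theta)}\right)
             &= \frac{1}{1 + \sigma_H^2\, \dfrac{E_s/N_0}{\sin^2(\theta)}}
              = \frac{\sin^2(\theta)}{\sin^2(\theta) + \sigma_H^2\, E_s/N_0}
\end{align*}
```

Raising the last expression to the power $d_H(\mathbf{x}^{(0)}, \mathbf{x}^{(i)})$ and integrating over $\theta$ yields the pairwise error probability for the perfectly interleaved Rayleigh channel.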
Inserting the results already known for Rayleigh fading with $\sigma_H^2 = 1$ (cf. (1.56)) and Rice fading with $P = 1$ (cf. (1.58)) finally leads to the union bound approximations

$$P_e^{\text{Rayleigh}} \le \frac{1}{\pi} \sum_{d=d_{\min}}^{n} A_d \cdot \int_0^{\pi/2} \left[\frac{\sin^2(\theta)}{\sin^2(\theta) + \sigma_H^2 E_s/N_0}\right]^d d\theta \tag{3.108}$$

and

$$P_e^{\text{Rice}} \le \frac{1}{\pi} \sum_{d=d_{\min}}^{n} A_d \cdot \int_0^{\pi/2} \left[\frac{(1 + K) \sin^2(\theta)}{(1 + K) \sin^2(\theta) + E_s/N_0}\right]^d \cdot \exp\left(-\frac{d K E_s/N_0}{(1 + K) \sin^2(\theta) + E_s/N_0}\right) d\theta, \tag{3.109}$$
respectively. Again, bit error probabilities are obtained by replacing the coefficients $A_d$ in (3.108) and (3.109) by $C_d$ defined in (3.83). The corresponding error probability curves are depicted in Figure 3.13a for a Rayleigh fading channel and convolutional codes of different constraint lengths. Since the free distance $d_f$ of a code determines the diversity degree for perfectly interleaved fading channels, the slopes of the curves become steeper with increasing $L_c$ and, thus, growing $d_f$. In order to evaluate the diversity degree of each code, the dashed lines represent the theoretical steepness for the associated diversity order $d_f$. Obviously, the solid lines are parallel to the dashed lines, indicating the same slope. In Figure 3.13b, the results for a half-rate code with $L_c = 4$ and a Rice fading channel with unit power $P$ are depicted. As expected, the AWGN performance is approached for increasing Rice factor $K$. Figure 3.14 demonstrates the tightness of the union bound for medium and high SNRs. At low SNRs, it diverges and different bounding techniques should be preferred.
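The integral in the Rayleigh bound has no simple closed form for general $d$ but is easily evaluated numerically. The following sketch uses a plain midpoint rule; the function names, the number of integration steps and the coefficient dictionary passed to `rayleigh_union_bound` are assumptions for illustration. For $d = 1$ the result can be compared with the closed-form expression (3.104), and the high-SNR slope of a single term reflects the diversity order $d$.

```python
import math

def rayleigh_pairwise(d, es_n0, sigma_h2=1.0, steps=2000):
    """(1/pi) * int_0^{pi/2} [sin^2 / (sin^2 + sigma_H^2 * Es/N0)]^d dtheta (midpoint rule)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        s2 = math.sin((i + 0.5) * h) ** 2
        total += (s2 / (s2 + sigma_h2 * es_n0)) ** d
    return total * h / math.pi

def rayleigh_union_bound(coeffs, es_n0):
    """Union bound for perfectly interleaved Rayleigh fading.

    coeffs maps distance d to A_d (word error bound) or C_d (bit error bound).
    """
    return sum(c * rayleigh_pairwise(d, es_n0) for d, c in coeffs.items())

# Check against the closed form for a single coefficient (d = 1)
gamma = 10.0
closed_form = 0.5 * (1.0 - math.sqrt(gamma / (1.0 + gamma)))
numeric = rayleigh_pairwise(1, gamma)

# Diversity order: a 10 dB SNR increase lowers the d = 5 term by about 10^5
slope_ratio = rayleigh_pairwise(5, 1e3) / rayleigh_pairwise(5, 1e4)
```

The value `slope_ratio` illustrates why the curves become steeper with growing free distance: each term of the bound decays with the $d$-th power of the SNR.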
Figure 3.13 Bit error probabilities $P_b$ versus $E_b/N_0$ in dB for convolutional codes (union bound, bold dashed lines: capacity bounds) a) Different constraint lengths ($L_c = 3, 5, 7, 9$) and Rayleigh fading channel b) $g_1 = 15_8$ and $g_2 = 17_8$ and Rice fading channels (AWGN and $K = 0, 1, 5, 10, 100$)
Figure 3.14 Comparison of union bound (UB) and simulation (sim) for convolutional code with $g_1 = 15_8$ and $g_2 = 17_8$ and different channels (AWGN, Rayleigh, Rice with $K = 5$; $P_b$ and BER versus $E_b/N_0$ in dB)
3.5.3 Information Processing Characteristic

Applying the union bound to evaluate the error rate performance of codes always assumes optimal MLD. However, especially for the concatenated codes introduced in Section 3.6, optimum decoding is not feasible and nonoptimum techniques like turbo decoding have to be applied. In order to assess the performance of encoder and (suboptimum) decoder pairs, the mutual information can be used (Hüttinger et al. 2002; ten Brink 2001c).

Simplifying the system given in Figure 3.1 leads directly to the model in Figure 3.15. For the sake of simplicity, we restrict our analysis to the AWGN channel. Without loss of generality, we choose a sequence $\mathbf{d}$ consisting of $N_d$ information bits. This sequence is encoded with a code rate $R_c = N_d/N_x$ and BPSK modulated, that is, we transmit a sequence $\mathbf{x}$ of $N_x$ BPSK symbols over the channel. At the receiver, the matched filtered sequence $\mathbf{r}$ is decoded, delivering $\tilde{\mathbf{d}}$. For the moment, the interleaver/de-interleaver pair is neglected. The sequences $\mathbf{d}$, $\mathbf{x}$, $\mathbf{r}$, and $\tilde{\mathbf{d}}$ are samples of the corresponding processes $\mathcal{D}$, $\mathcal{X}$, $\mathcal{R}$, and $\tilde{\mathcal{D}}$.

The optimality of a code and a corresponding encoder–decoder pair can be evaluated by comparing the mutual information $\bar I(\mathcal{D}; \tilde{\mathcal{D}})$ between the encoder input and the decoder output with the mutual information $\bar I(\mathcal{X}; \mathcal{R})$ between the channel input and matched filter output. The larger this difference, the larger the suboptimality of the encoder–decoder pair.

Figure 3.15 Simplified model of communication system ($\mathbf{d}$ → FEC encoder → $\mathbf{x}$ → additive noise $\mathbf{n}$ → $\mathbf{r}$ → FEC decoder → $\tilde{\mathbf{d}}$)
From the data-processing theorem (cf. Section 2.1), we already know that signal processing cannot increase the capacity and that $\bar I(\mathcal{D}; \tilde{\mathcal{D}}) \le \bar I(\mathcal{X}; \mathcal{R})$ is always true. Since the mutual information depends on the length of the transmitted sequence, it is preferable to define the information processing characteristic (IPC) (Hüttinger et al. 2002)

$$\mathrm{IPC}(C) = \frac{1}{N_d} \cdot \bar I(\mathcal{D}; \tilde{\mathcal{D}}) \tag{3.110}$$

as an appropriate measure. It describes the average information common to the data $\mathcal{D}$ and its estimate $\tilde{\mathcal{D}}$, normalized to the length of $\mathbf{d}$. Hence, it can take values between zero and one. As the noise is white and stationary and since we transmit $N_x$ BPSK symbols, $\bar I(\mathcal{X}; \mathcal{R}) = N_x \cdot C$ equals $N_x$ times the channel capacity $C$ and we obtain the relationship

$$\mathrm{IPC}(C) = \frac{1}{N_d} \cdot \bar I(\mathcal{D}; \tilde{\mathcal{D}}) \le \frac{1}{N_d} \cdot \bar I(\mathcal{X}; \mathcal{R}) = \frac{N_x}{N_d} \cdot C = \frac{C}{R_c}. \tag{3.111}$$

Equation (3.111) illustrates that the IPC is upper bounded by the ratio of channel capacity and code rate. This inequality holds for code rates $R_c > C$, for which an error-free transmission is impossible owing to the channel coding theorem. On the other hand, $\mathrm{IPC}(C)$ cannot exceed 1 because we can transmit at most 1 bit/symbol for BPSK even for code rates below the capacity ($C/R_c > 1$). Consequently,

$$\mathrm{IPC}(C) \le \min[1, C/R_c] \tag{3.112}$$

holds. A perfect coding scheme with an optimum decoder can achieve at most equality in (3.112). $\mathrm{IPC}(C = R_c) = 1$ is only obtained for codes that reach Shannon's channel capacity. Furthermore, it is shown in Hüttinger et al. (2002) that a perfect coding scheme does not benefit from soft-output decoding, that is, $\bar I(\mathcal{D}; \tilde{\mathcal{D}}) = \bar I(\mathcal{D}; \hat{\mathcal{D}})$ with $\hat{\mathcal{D}} = \mathrm{sgn}(\tilde{\mathcal{D}})$.

For practical codes, $\bar I(\mathcal{D}; \tilde{\mathcal{D}})$ and, thus, $\mathrm{IPC}(C)$ are hard to determine owing to the generally nonlinear behavior of the decoder. An optimum decoding algorithm providing a posteriori probabilities $\Pr\{\mathbf{x}^{(i)} \mid \mathbf{r}\}$ for each possible code sequence $\mathbf{x}^{(i)}$ would not lose any information. Although a direct implementation requires prohibitively high computational costs, we can determine the corresponding IPC by applying the chain rule for entropy (cf. (2.9))

$$\bar I(\mathcal{D}; \mathcal{R}) = \bar I(D_1; \mathcal{R}) + \bar I(D_2; \mathcal{R} \mid D_1) + \cdots + \bar I(D_{N_d}; \mathcal{R} \mid D_1 \cdots D_{N_d - 1}). \tag{3.113}$$
It is shown in Hüttinger et al. (2002) that $\bar I(D_i; \mathcal{R} \mid D_1 \cdots D_{i-1}) = \bar I(D_1; \mathcal{R})$ holds, leading to $\bar I(\mathcal{D}; \mathcal{R}) = N_d \cdot \bar I(D_1; \mathcal{R})$. Therefore, we obtain the mutual information for optimum soft-output sequence decoding by applying symbol-by-symbol decoding and restricting the IPC analysis to the first information bit $d_1$ of a sequence:

$$\mathrm{IPC}(C) = \bar I(D_1; \mathcal{R}) = \bar I(D_1; \tilde D_1) = 1 + \frac{1}{2} \cdot \int_{-\infty}^{\infty} \sum_{\mu=0}^{1} p_{\tilde D_1 | D_1 = \mu}(\xi) \cdot \log_2 \frac{p_{\tilde D_1 | D_1 = \mu}(\xi)}{\sum_{\nu=0}^{1} p_{\tilde D_1 | D_1 = \nu}(\xi)}\, d\xi. \tag{3.114}$$
Figure 3.16 IPC charts (IPC versus $C/R_c$) for different nonsystematic nonrecursive convolutional encoders with $R_c = 1/2$ (bold line represents ideal coding scheme; curves for the uncoded case and $L_c = 3, 5, 7$) a) optimum sequence soft-output decoding b) optimum symbol-by-symbol soft-output decoding with BCJR

Hence, we simply have to carry out a simulation with the BCJR algorithm performing an optimum symbol-by-symbol soft-output decoding, estimating $p_{\tilde D_1 | D_1 = i}(\xi)$ for $i \in \{0, 1\}$ by determining the corresponding histograms for the first bit $d_1$ at the decoder output and inserting the obtained densities into (3.114). As the decoder knows the initial state of the trellis, it can estimate the first bit much more reliably than all other bits $d_{i>1}$. Therefore, sequencewise soft-output decoding will result in a higher IPC than symbol-by-symbol decoding.

The results obtained for nonsystematic nonrecursive convolutional codes are shown in Figure 3.16a. The curve for the ideal coding scheme is obtained from equality in (3.112), showing that the IPC depends linearly on $C$ until $C$ reaches $R_c$. For $C < R_c$, soft-output sequence decoding of convolutional codes nearly achieves the performance of an ideal coding scheme (bold line). Although an error-free transmission is impossible in this region, the observed behavior is of special interest in the context of concatenated codes because the constituent codes with code rates $R_{c,i} > C$ are operating above capacity. Only the overall rate of the concatenated code is below the channel capacity. In the region $0.4 R_c \le C \le 0.7 R_c$, a small gap to the ideal coding scheme occurs, while for $C \ge 0.7 R_c$, the optimum performance is reached again.

For symbol-by-symbol soft-output decoding, we have to assume perfect interleaving before encoding and after decoding that destroys the memory in the end-to-end system (see the gray colored interleavers in Figure 3.15). Since the $\tilde D_i$ are now mutually independent, the chain rule in (3.113) becomes

$$\sum_{i=1}^{N_d} \bar I(D_i; \tilde D_i) = N_d \cdot \bar I(D; \tilde D) \le \bar I(\mathcal{D}; \tilde{\mathcal{D}}) \tag{3.115}$$
where the inequality holds because memory increases the capacity. Consequently, the IPC for symbol-by-symbol decoding is obtained by extending (3.114) to all information bits $d_i$, $1 \le i \le N_d$, resulting in

$$\bar I(D; \tilde D) = 1 + \frac{1}{2} \cdot \int_{-\infty}^{\infty} \sum_{\mu=0}^{1} p_{\tilde D | D = \mu}(\xi) \cdot \log_2 \frac{p_{\tilde D | D = \mu}(\xi)}{\sum_{\nu=0}^{1} p_{\tilde D | D = \nu}(\xi)}\, d\xi. \tag{3.116}$$

Figure 3.17 IPC charts (IPC versus $C/R_c$) for different systematic recursive convolutional encoders with $R_c = 1/2$ (bold line represents ideal coding scheme; curves for the uncoded case and $L_c = 3, 5, 7$) a) optimum symbol-by-symbol soft-output decoding with BCJR b) symbol-by-symbol soft-output decoding with Max-Log-MAP
The results are shown in Figure 3.16b. A comparison with Figure 3.16a reveals the high loss due to symbol-by-symbol decoding. The performance of the optimum coding scheme is not reached over a large range of capacities $C < 0.7$. If the capacity $C$ is smaller than the code rate $R_c$, the performance becomes even worse than in the uncoded case. Furthermore, we can observe a point of intersection for codes with different constraint lengths at roughly $C = 0.55$. Hence, weaker codes perform better for low $C$. Only for larger capacities can strong codes benefit from their better error-correcting capabilities.

Figure 3.17a illustrates the results for recursive systematic encoding and symbol-by-symbol MAP decoding. Since no soft-output sequence decoding is carried out, a high degradation compared to the optimum coding scheme can be observed. However, the performance is always better than for an uncoded transmission. This is a major difference compared to nonsystematic nonrecursive encoding. It has to be mentioned that the curves for all RSC codes intersect exactly at $C = R_c = 0.5$. A reason for this behavior has not yet been found. Finally, suboptimum symbol-by-symbol Max-Log-MAP decoding loses a little compared to the optimum BCJR algorithm, mainly at low capacities, as depicted in Figure 3.17b. In this area, the performance is slightly worse than in the uncoded case.

In summary, we can state that memory increases the performance of convolutional codes in the high SNR regime because the free distance dominates the error probability. On the contrary, for low SNR and, hence, small channel capacity $C$, the results of this subsection show that low-memory (weak) codes are superior. This behavior will be of importance in the next section because the constituent codes of a concatenated scheme often operate in the area $C < R_c$.
3.6 Concatenated Codes

3.6.1 Introduction

Reviewing the results of the previous section, especially Figures 3.12 and 3.13a, it seems questionable whether Shannon's channel capacity can be reached by simply increasing the constraint length of convolutional codes. Moreover, the computational decoding complexity increases exponentially with the encoder's memory, leading quickly to impractical computational costs. Also, the linear block codes described so far are not suited to reach the ultimate limit. Exceptions are the low-density parity check (LDPC) codes (Gallager 1963) that perform close to the capacity limit; these are introduced in Section 3.7.

A different approach for reaching the channel capacity was found by Forney (1966). He concatenated several simple codes with the aim of increasing the overall performance while maintaining moderate decoding costs. In 1993, concatenated codes gained great attention with the presentation of the so-called turbo codes (Berrou et al. 1993). These originally half-rate codes represent a parallel concatenation of two convolutional codes that are decoded iteratively and approach Shannon's capacity up to a small gap of 0.5 dB. At that time, this was a phenomenal performance that triggered worldwide research activities in the area of concatenated codes.

Since then, a lot of research has been done in this field (Benedetto et al. 1998; Benedetto and Montorsi 1996a,b; Hagenauer 1996b; Kühn 1998a,b; Robertson et al. 1997; ten Brink 2001b,c) and the basic understanding has improved. Woven codes (Freudenberger et al. 2001; Jordan et al. 2004; Lucas et al. 1998) have also shown exceptional performance. More generally, the concatenation with the corresponding decoding principle is not restricted to FEC codes but can be applied to a variety of concatenated systems like modulation and coding (Höher and Lodge 1999), coding and equalization (Bauch and Franz 1998; Hanzo et al. 2002a), or coding and multiuser (multilayer) detection (Hochwald and ten Brink 2003; Kühn 2003; Sezgin et al. 2003a).

In principle, we distinguish between parallel and serial code concatenations. A serial concatenation of $N$ codes with rates $R_{c,i} = k_i/n_i$ is depicted in Figure 3.18. Each encoder processes the entire data stream of the preceding encoder, where successive encoders $C_i$ and $C_{i+1}$ are separated by an individual interleaver $\Pi_i$. As we will see in the next subsection, the interleaver plays a crucial role in concatenating codes. The overall code rate is simply the product of the rates of all contributing codes

$$R_c = \prod_{i=1}^{N} R_{c,i}. \tag{3.117}$$
The corresponding decoders are arranged in reverse order and linked by de-interleavers. However, as will be shown later, the signal flow is not unidirectional: decoders may also feed information back to the preceding instances.
Figure 3.18 Serial code concatenation (block diagram: encoder 1, interleaver Π1, encoder 2, ..., interleaver ΠN−1, encoder N)
Figure 3.19 Parallel code concatenation (block diagram: encoders 1 to N, interleavers Π1 to ΠN−1, multiplexer MUX)

On the contrary, each encoder processes the same information bits in the case of a parallel concatenation (Figure 3.19). However, the order of the input bits differs from encoder to encoder owing to the interleaving. This results in substreams at the encoder outputs that are not only permuted but also nearly independent. They are finally multiplexed, resulting in a total code rate of
\[ R_c = \frac{k}{n_1 + \cdots + n_N} = \frac{1}{\frac{1}{R_{c,1}} + \cdots + \frac{1}{R_{c,N}}}. \tag{3.118} \]
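The two rate formulas (3.117) and (3.118) can be checked numerically with a minimal sketch. The function names are illustrative and not taken from the text. The serial example uses the rates 2/3 and 3/4 that appear later for scheme SC1; the parallel example assumes the view of footnote 12, in which one constituent code carries the information bits (rate 2/3) and the other contributes only one parity bit per two information bits (rate 2).

```python
from fractions import Fraction

def serial_rate(rates):
    """Overall rate of a serial concatenation, cf. (3.117): product of all rates."""
    total = Fraction(1)
    for r in rates:
        total *= Fraction(r)
    return total

def parallel_rate(rates):
    """Overall rate of a parallel concatenation, cf. (3.118): 1 / sum(1 / R_i)."""
    return 1 / sum(1 / Fraction(r) for r in rates)

# Serial: outer rate 2/3, inner rate 3/4 (assumed example values).
serial = serial_rate([Fraction(2, 3), Fraction(3, 4)])
# Parallel: rate 2/3 (information + parity) and rate 2 (parity only).
parallel = parallel_rate([Fraction(2, 3), Fraction(2, 1)])
print(serial, parallel)
```

Exact fractions are used instead of floating-point numbers so that the resulting overall rates can be compared without rounding effects.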
The decoding process is treated in the following sections. For the sake of simplicity, the remainder of this part is restricted to the concatenation of two codes. Therefore, only one interleaver is employed and its index can be omitted. In the case of a serial concatenation, encoder 1 is denoted as the outer encoder and, consequently, encoder 2 as the inner encoder.

Since concatenated codes comprise several components, a number of parameters have to be optimized. Questions such as the distribution of the code rates among the encoders or the order of strong and weak codes for a serial concatenation have to be answered. Besides the constituent codes, the interleaver represents a key component. Investigations in the past years have shown that random or pseudorandom interleaving is superior to simple block interleaving in many cases (Barbulescu and Pietrobon 1995; Hübner et al. 2003, 2004; Jordan et al. 2001).

In order to analyze the BER performance of concatenated codes analytically, we can apply the union bound given in (3.99)
\[ P_b \le \frac{1}{2} \sum_{d=d_{\min}}^{n} C_d \cdot \operatorname{erfc}\!\left( \sqrt{d R_c \frac{E_b}{N_0}} \right). \]
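As a rough illustration of how the union bound is evaluated once the coefficients Cd are known, the following sketch sums the bound over a truncated distance spectrum. The spectrum used here is an assumption for illustration only and is not taken from this section: it contains the first terms Cd = (d − 4) · 2^(d−5) of the memory-2 NSC code with octal generators (7, 5), and the function name is hypothetical.

```python
import math

def union_bound(spectrum, code_rate, ebn0_db):
    """Evaluate P_b <= 1/2 * sum_d C_d * erfc(sqrt(d * R_c * Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # convert dB to linear scale
    return 0.5 * sum(c_d * math.erfc(math.sqrt(d * code_rate * ebn0))
                     for d, c_d in spectrum.items())

# Assumed example spectrum, truncated after ten terms (d = 5 ... 14).
spectrum = {d: (d - 4) * 2 ** (d - 5) for d in range(5, 15)}

for snr_db in (2.0, 4.0, 6.0):
    print(snr_db, union_bound(spectrum, 0.5, snr_db))
```

Truncating the sum is only reasonable at moderate to high SNRs, where the terms with small d dominate; at low SNRs the bound is loose.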
Determining the coefficients Cd requires knowledge of the IOWEF (3.80)
\[ A(W, D) = \sum_{w=0}^{k} \sum_{d=0}^{n} A_{w,d} \cdot W^w D^d \]
of the whole concatenated code. An exact calculation of the IOWEF is a demanding task or even impossible because the number of states of the underlying supertrellis becomes extremely high and because concatenated codes are time variant owing to the inherent interleaving. A simplified analysis was derived in Benedetto and Montorsi (1996a,b) and Benedetto et al. (1998) on the basis of the concept of a uniform interleaver. The uniform interleaver is a theoretical device that covers all possible permutations of a sequence with a certain length Lπ. All permutations are equally likely so that they are weighted with $\binom{L_\pi}{w}^{-1}$, where w denotes the weight of the permuted sequence. Therefore, the uniform interleaver represents an average interleaver and, consequently, an average IOWEF of the concatenated code will be obtained. As a consequence, the coefficients of the IOWEF are no longer restricted to integers but can also be real numbers. Figure 3.20 illustrates an example of a uniform interleaver of length Lπ = 4 for a sequence with weight w = 2.

Figure 3.20 Illustration of uniform interleaver of length Lπ = 4 and an input sequence with weight w = 2 (the input sequence is mapped onto all sequences of length 4 and weight 2)

The next two sections analyze the error rate performance of concatenated codes using the IOWEF and the union bound technique. In the previous sections, we saw that the union bound diverges at low SNRs, that is, exactly in the region where concatenated codes are intended to operate. Hence, it seems questionable whether the results obtained with the union bound technique are meaningful. Furthermore, the application of the union bound presupposes an optimal MLD of the concatenated code. As we will see in Section 3.6.4, this solution is generally not feasible and suboptimum iterative approaches are applied. However, comparisons between analytical and simulation results have shown a good consistency for not too low SNRs, that is, code rates below the cutoff rate of the channel (Benedetto et al. 1998). Therefore, at least the asymptotic behavior can be predicted with the union bound. When approaching the channel capacity at very low SNRs, the so-called EXtrinsic Information Transfer (EXIT) charts presented in Section 3.6.5 allow more accurate predictions because they take the iterative decoding approach into account.
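The uniform interleaver of Figure 3.20 can be reproduced by brute force for short sequences. The sketch below enumerates all permutations of the positions of a length-4 sequence of weight 2 and counts how often each output pattern occurs. The function name is illustrative; the approach is only feasible for very small Lπ because the number of permutations grows factorially.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def uniform_interleaver(sequence):
    """Probability of each output sequence when every permutation of the
    input positions is equally likely."""
    counts = Counter(permutations(sequence))
    total = factorial(len(sequence))
    return {out: Fraction(n, total) for out, n in counts.items()}

outputs = uniform_interleaver((0, 1, 0, 1))  # length 4, weight 2
for out, prob in sorted(outputs.items()):
    print(out, prob)
```

Each distinct output pattern receives the same probability, which equals the inverse of the binomial coefficient used as the weighting factor in the text.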
3.6.2 Performance Analysis for Serial Concatenation

As mentioned above, the focus is only on the concatenation of two constituent codes depicted in Figure 3.21. The introduction of the uniform interleaver decouples both component codes so that all possible sequences with a certain weight l at the output of the outer encoder are mapped onto all possible input sequences with weight l of the inner encoder.
Figure 3.21 Serial concatenation of two codes (signal flow: d, encoder 1, b1, interleaver, encoder 2, b2)

Therefore, the IOWEF of the entire code is obtained by multiplying the conditional IOWEFs of the constituent codes according to Benedetto et al. (1998)
\[ A^{ser}(W, D) = \sum_{l} \frac{A_1(W, l) \cdot A_2(l, D)}{\binom{L_\pi}{l}} = \sum_{w} \sum_{d} A^{ser}_{w,d} \cdot W^w D^d . \tag{3.119} \]
From (3.119), we see that the output weight of the outer encoder equals the input weight of the inner encoder. Averaging over all possible permutations requires the division by $\binom{L_\pi}{l}$, that is, the number of possibilities to arrange l ones in a block of length Lπ. The coefficients Cd are obtained by
\[ C_d = \sum_{w} \frac{w}{L_\pi \cdot R_{c,1}} \cdot A^{ser}_{w,d} . \tag{3.120} \]
In relation to the concatenation of block codes, we have to consider the particularity that the interleaver length Lπ may cover several codewords of the inner and outer codes. Hence, the IOWEFs inserted in (3.119) have to be extended to describe a set of statistically independent (parallel) codewords. Assuming that Lπ is m1 times as large as the codeword length n1 of the outer code, the resulting IOWEF is simply the m1th power of A1(W, D)
\[ A_1^{m_1}(W, D) = [A_1(W, D)]^{m_1} . \tag{3.121} \]
If the interleaver additionally comprises m2 information words of the inner encoder C2, the overall IOWEF becomes
\[ A^{ser}(W, D) = \sum_{l} \frac{A_1^{m_1}(W, l) \cdot A_2^{m_2}(l, D)}{\binom{L_\pi}{l}} = \sum_{w} \sum_{d} A^{ser}_{w,d} \cdot W^w D^d . \tag{3.122} \]
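The averaging in (3.119) and the coefficients in (3.120) can be sketched for a toy serial concatenation of two short block codes. The codes are assumptions chosen only because their IOWEFs can be enumerated by brute force: a (4,3) single parity check code as outer code and a systematic (7,4) Hamming code as inner code, so that Lπ = 4 and m1 = m2 = 1. All function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

def iowef(G):
    """Brute-force IOWEF of a small block code: {(input weight, output weight): count}."""
    k, n = len(G), len(G[0])
    table = defaultdict(int)
    for info in product((0, 1), repeat=k):
        code = [sum(info[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        table[(sum(info), sum(code))] += 1
    return dict(table)

def serial_iowef(A1, A2, L):
    """Average IOWEF of the serial concatenation, cf. (3.119)."""
    A = defaultdict(Fraction)
    for (w, l), a1 in A1.items():
        for (l2, d), a2 in A2.items():
            if l2 == l:  # outer output weight equals inner input weight
                A[(w, d)] += Fraction(a1 * a2, comb(L, l))
    return dict(A)

def coefficients(A_ser, k):
    """C_d = sum_w w / (L_pi * R_c1) * A_ser[w, d], where L_pi * R_c1 = k, cf. (3.120)."""
    C = defaultdict(Fraction)
    for (w, d), a in A_ser.items():
        C[d] += Fraction(w, k) * a
    return dict(C)

# Assumed toy codes: (4,3) single parity check (outer), (7,4) Hamming (inner).
G_outer = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
G_inner = [[1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0, 1],
           [0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
A_ser = serial_iowef(iowef(G_outer), iowef(G_inner), L=4)
C = coefficients(A_ser, k=3)
for d in sorted(C):
    print(d, C[d])
```

The coefficients come out as fractions rather than integers, which reflects the statement that the uniform interleaver yields an average IOWEF.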
For convolutional codes, the extension to several codewords per interleaver in (3.121) and (3.122) is generally irrelevant because the sequence length matches the interleaver size.

Code Design

Without further derivations, some guidelines are now presented for the design of a serial code concatenation on the basis of results obtained from (3.119) and the union bound in (1.45). Detailed derivations can be found in Benedetto et al. (1998, 1996) and Kammeyer and Kühn (2001).

1. The inner code should always be a recursive convolutional code. Only in this case is a gain obtained by increasing the interleaver size, especially at low SNRs.
2. The effective distance deff of the inner code should be as large as possible. It is defined as the minimum Hamming weight among all codewords whose input weight amounts to wH(d) = 2. This constraint can be fulfilled by choosing a recursive convolutional code whose feedback polynomial is prime.
3. Another criterion that partly contradicts the last one states that sequences with an input weight 3 of the inner encoder have a large influence on the error rate performance. Hence, for an odd minimum Hamming distance of the outer code, a feedback polynomial of the inner code containing the factor 1 + D is often advantageous.
4. The free distance of the outer code should be as large as possible because it represents the input weight of the inner encoder. This ensures a large interleaving gain.
5. The number of sequences with wH(b1) = df and their input weights should be minimized for the outer code. This generally leads to the choice of nonrecursive convolutional codes as outer codes.

All these guidelines are valid for code rates below the cutoff rate (Benedetto et al. 1998). For codes working above the cutoff rate, that is, at extremely low SNRs, different guidelines may be valid. These are discussed in Section 3.6.5.

Example: Serial Concatenation of Convolutional Codes

An example for the performance of two serially concatenated convolutional codes is now illustrated. As constituent codes, half-rate codes with constraint length Lc = 4 have been chosen. They are punctured appropriately in order to obtain an overall code rate of Rc = 1/2. Figure 3.22a shows the structure of the nonrecursive nonsystematic encoder with the generator polynomials g1(D) = 1 + D + D^3 and g2(D) = 1 + D + D^2 + D^3. It is used as the outer encoder. Figure 3.22b depicts one of the two possible recursive systematic encoders with g2(D) = (1 + D + D^2 + D^3)/(1 + D + D^3). Alternatively, the encoder with g2(D) = (1 + D + D^3)/(1 + D + D^2 + D^3) (not depicted) is used as the inner encoder. Moreover, random interleavers of different lengths are employed. The concatenation is compared to a conventional half-rate NSC code with constraint length Lc = 9. This specific choice allows for similar decoding complexities.
As a measure of the decoding complexity, the number of states in the trellis is used. In Subsection 3.6.4, it will be explained that decoding is performed iteratively. Assuming that the Max-Log-MAP algorithm requires twice as many operations as the Viterbi algorithm owing to the forward and backward processing of the trellis, and taking into account that two constituent codes have to be decoded, we can carry out 8 decoding iterations for a trellis with 8 states (Lc = 4) to have the same complexity as for a single Lc = 9 code (256 states/(2 · 2 · 8 states) = 8 iterations). The code parameters of the considered serial concatenations are summarized in Table 3.1 where Rc denotes the code rate after puncturing.
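The complexity argument above is a short calculation that can be written down directly. The sketch assumes, as in the text, that the number of trellis states is 2^(Lc − 1), that a soft-output decoder costs twice as much as the Viterbi algorithm, and that two constituent decoders run per iteration. The function name is illustrative.

```python
def equivalent_iterations(ref_states, const_states, num_codes=2, siso_factor=2):
    """Number of iterations whose cost matches one Viterbi pass over the
    reference trellis, using the number of trellis states as complexity measure."""
    return ref_states // (siso_factor * num_codes * const_states)

ref_states = 2 ** (9 - 1)    # reference NSC code with Lc = 9
const_states = 2 ** (4 - 1)  # constituent codes with Lc = 4
print(equivalent_iterations(ref_states, const_states))
```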
Figure 3.22 a) NSC encoder with g1(D) = 1 + D + D^3, g2(D) = 1 + D + D^2 + D^3 b) RSC encoder with g2(D) = (1 + D + D^2 + D^3)/(1 + D + D^3) (shift register diagrams with three delay elements D each)
Table 3.1 Parameters of serial code concatenations

name | outer NSC code C1 | inner RSC code C2 | puncturing
SC1 | Lc = 4, Rc = 2/3, g1(D) = 1 + D + D^2 + D^3, g2(D) = 1 + D + D^3 | Lc = 4, Rc = 3/4, g1(D) = 1, g2(D) = (1 + D + D^3)/(1 + D + D^2 + D^3) | P1 = 11 01, P2 = 11 10 01
SC2 | Lc = 4, Rc = 2/3, g1(D) = 1 + D + D^2 + D^3, g2(D) = 1 + D + D^3 | Lc = 4, Rc = 3/4, g1(D) = 1, g2(D) = (1 + D + D^2 + D^3)/(1 + D + D^3) | P1 = 11 01, P2 = 11 10 01
SC3 | Lc = 4, Rc = 3/4, g1(D) = 1 + D + D^2 + D^3, g2(D) = 1 + D + D^3 | Lc = 4, Rc = 2/3, g1(D) = 1, g2(D) = (1 + D + D^3)/(1 + D + D^2 + D^3) | P1 = 11 10 01, P2 = 11 01
NSC | Lc = 9, Rc = 1/2, g1 = 561 (octal), g2 = 753 (octal) | |
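Guidelines 2 and 3 refer to properties of the feedback polynomial that can be verified with a few lines of GF(2) arithmetic. The sketch below stores a polynomial as an integer bit mask (bit i corresponds to D^i) and tests the two feedback polynomials of Table 3.1. It interprets "prime" as irreducible over GF(2), which is an assumption about the intended meaning; all function names are illustrative.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as bit masks (bit i <-> D^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a, b):
    """Remainder of a divided by b over GF(2)."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def is_irreducible(p):
    """Trial division by every polynomial of degree 1 ... deg(p) // 2."""
    degree = p.bit_length() - 1
    return all(gf2_mod(p, q) != 0 for q in range(2, 1 << (degree // 2 + 1)))

poly_a = 0b1011  # 1 + D + D^3
poly_b = 0b1111  # 1 + D + D^2 + D^3
print(is_irreducible(poly_a), is_irreducible(poly_b))
print(gf2_mul(0b11, 0b101) == poly_b)  # (1 + D)(1 + D^2)
```

Trial division is sufficient here because the polynomials have degree 3; for larger degrees a dedicated irreducibility test would be more efficient.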
Figure 3.23 Coefficients Cd for NSC code (g1 = 561, g2 = 753, octal) and serial concatenation SC1 for different interleaver sizes (axes: Cd versus d; curves: NSC, Lπ = 90, Lπ = 900, Lπ = 9000)

Figure 3.23 illustrates the coefficients Cd obtained from (3.120). The outer NSC code was punctured to Rc,1 = 2/3 and the inner code to Rc,2 = 3/4. Obviously, the concatenated codes have a much smaller minimum Hamming distance than the Lc = 9 code. As a result of the puncturing, the free distances of the constituent codes have been largely reduced, resulting in an overall minimum Hamming distance of only four (see footnote 9). On the contrary, the NSC code with Lc = 9 has a free distance of df = 12 and, therefore, a better asymptotic performance. However, with increased interleaver size, the average number of sequences with a low Hamming weight becomes very small, that is, only few interleavers will cause sequences with low weight. Hence, it should be possible to find at least one interleaver with high minimum distance. Moreover, at low SNRs, the coefficients Cd have even more impact on the error rate performance than the distance d itself.

Footnote 9: Be aware that this is only an average minimum distance due to the uniform interleaving. A lot of permutations exist that lead to larger minimum distances. This is expressed by coefficients Cd being much smaller than one.

Figure 3.24 Error rate performance of NSC code and serial concatenations SC1 and SC2 for different interleaver lengths (axes: Pb versus Eb/N0 in dB; curves: NSC, SC1 and SC2 for Lπ = 90, 900, 9000)

This is confirmed in Figure 3.24 showing the bit error probabilities of the considered codes. Increasing the interleaver size results in a remarkable gain even at low SNRs. However, the minimum distance is not increased by larger interleavers; only the average number of sequences with small Hamming weight is lowered. Asymptotically, the convolutional code with Lc = 9 outperforms the serial concatenation owing to its larger free distance. A quantitative comparison for very low SNRs is not possible because of the inaccuracy of the union bound in this area. Additionally, we see that the code SC1, whose feedback polynomial contains the factor 1 + D, performs slightly better at medium and high SNRs than the code SC2 with a prime feedback polynomial. This was already stated in guideline 3 of the code design.

Finally, Figure 3.25 demonstrates the effect when the code rates of inner and outer codes are exchanged. This is accomplished by simply switching the puncturing pattern between both codes. It has to be mentioned that the exchange of code rates also affects the interleaver size because the overall codeword length was assumed to be constant. It can be seen that the puncturing scheme for SC1 (outer rate Rc,1 = 2/3 and inner rate Rc,2 = 3/4) shows a better performance at low and medium SNRs. Hence, the union bound tells us that it is better to have a strong outer code with a high minimum distance than a strong inner code. Please note that MLD is assumed.
3.6.3 Performance Analysis for Parallel Concatenation

Parallel code concatenation greatly stimulated the development of code concatenation in general. The most famous representatives are the turbo codes, presented for the first time in 1993 by Berrou and Glavieux (1993), Berrou et al. (1993). They approached Shannon's capacity within 0.5 dB for a half-rate code, initiating a worldwide boom. Besides the choice of important encoder ingredients such as the RSC codes and the specific interleaver structure, the iterative decoding strategy is responsible for the big success of turbo codes. Meanwhile, a lot of research in the area of concatenated codes has been done in the last decade and the gap to Shannon's channel capacity was closed up to 0.1 dB (ten Brink 2000b) and even below.

Figure 3.25 Error rate performance of NSC code and serial concatenations SC1 and SC3 for different interleaver lengths (axes: Pb versus Eb/N0 in dB; curves: NSC, SC1 for Lπ = 90, 900, 9000, SC3 for Lπ = 80, 800, 8000)

The focus in the following part is on the description of turbo codes, that is, the concatenation of systematic constituent encoders (see footnote 10). The corresponding structure for a parallel concatenation of two codes is shown in Figure 3.26. Since each information bit should be transmitted at most once, d can be directly fed to the puncturing device. Hence, each encoder only generates additional parity bits b1 and b2. In order to adjust the entire code rate, appropriate puncturing of d, b1, and b2 may be performed. Finally, the resulting sequences are multiplexed to a vector b.

Figure 3.26 Parallel concatenation of two codes for systematic encoding (block diagram: d feeds encoder 1 directly and encoder 2 after interleaving; d, b1 and b2 pass the puncturing and multiplexing unit P + MUX, which delivers b)

Footnote 10: Generally, the parallel concatenation of nonsystematic encoders is also possible (Costello et al. 2000).

In contrast to the serial concatenation, all parallelly arranged encoders obtain the same information bits but in different orders. Hence, the weight of their input sequences is always the same. Moreover, the encoder outputs contain only parity bits. Their weights are denoted by p and the conditional IOWEF is obtained by
\[ A(w, P) = \sum_{p} A_{w,p} \cdot P^p , \tag{3.123} \]
where Aw,p denotes the number of sequences with an input weight w and a parity weight p. The entire Hamming weight of a sequence amounts to d = w + p. Using the uniform interleaver, the conditional IOWEF of the parallel concatenation becomes (Benedetto and Montorsi 1996a; Kühn 1999)
\[ A^{par}(w, P) = \frac{A_1(w, P) \cdot A_2(w, P)}{\binom{L_\pi}{w}} = \sum_{p} A^{par}_{w,p} \cdot P^p . \tag{3.124} \]
By multiplying the conditional IOWEFs A1(w, P) and A2(w, P), we ensure that all output sequences with the same input weight w are combined. Uniform interleaving is now only applied to the information bits, leading to the normalization with $\binom{L_\pi}{w}$ and resulting in
\[ C_d = \sum_{w+p=d} \frac{w}{L_\pi} \cdot A^{par}_{w,p} , \tag{3.125} \]
where the sum runs over all pairs (w, p) whose sum equals the entire Hamming weight d = w + p. Note that information as well as parity bits may be punctured; this has to be considered in (3.125).

Code Design

As in the case of the serial concatenation, some guidelines exist for the code construction. Since the target of concatenating codes was the construction of powerful codes on the basis of simple constituent encoders, convolutional codes with small constraint lengths are mainly employed. Although codes with large memory may also show a good performance (Costello et al. 2000), they contradict the requirement of a feasible decoding effort. It can be shown that the bit error rate behaves asymptotically like
\[ P_b \sim L_\pi^{1-w_{\min}} \quad \Longrightarrow \quad P_b \sim L_\pi^{-1} \ \text{ for } w_{\min} = 2 \tag{3.126} \]
where wmin represents the minimum input weight of an encoder to obtain an output sequence with finite weight. Obviously, wmin = 2 holds for RSC codes (cf. page 101) so that the error rate decreases when the interleaver is enlarged. This is not true for nonrecursive codes with wmin = 1. Hence, RSC codes are used as constituent codes in a parallel concatenation. Moreover, codes with rate 1/n are preferred because higher rates can be easily achieved by appropriate puncturing. Finally, the effective distance deff (see page 138, guideline 2 of the code design in Section 3.6.2) has to be maximized rather than the free distance df. From this it can be shown that the feedback polynomial should be prime.

Besides the constituent codes, the interleaver also has a deep impact on the performance. First of all, it has to avoid the simultaneous generation of two code sequences with low weight. In that case, the minimum Hamming distance of the entire code would be small. Hence, if the first encoder outputs a low-weight sequence, the information bits should be permuted in such a way that the output of the second encoder has a large weight. Therefore, the interleaver directly influences the distance spectrum of the concatenated code (see footnote 11).

Footnote 11: This cannot be observed for the uniform interleaver that comprises all possible permutations.

Table 3.2 Parameters of parallel code concatenations

name | constituent code | puncturing
PC1 | Lc = 4, Rc = 2/3, g1(D) = 1, g2(D) = (1 + D + D^2 + D^3)/(1 + D + D^3) | P = 11 02 11 01
PC2 | Lc = 4, Rc = 2/3, g1(D) = 1, g2(D) = (1 + D + D^3)/(1 + D + D^2 + D^3) | P = 11 02 11 01
PC3 | Lc = 4, Rc = 2/3, g1(D) = 1, g2(D) = (1 + D + D^2 + D^3)/(1 + D + D^3) | P = 11 12 10 01
NSC | Lc = 9, Rc = 1/2, g1 = 561 (octal), g2 = 753 (octal) |

Example: Turbo Codes

The performance of the parallel code concatenation is now illustrated for the example of a turbo code. As constituent codes, the same convolutional codes as in Subsection 3.6.2 have been chosen. They are both punctured to the rate Rc,1 = Rc,2 = 2/3, yielding an entire code rate of Rc = 1/2 (see footnote 12). The turbo codes are compared to a conventional half-rate NSC code with constraint length Lc = 9 because of a similar decoding complexity. All configurations are listed in Table 3.2.

Footnote 12: Both the component codes have the same code rate of 2/3 only if the information bits are assigned to both encoder outputs although they are transmitted only once. If they are assigned to only one constituent code, this code has the rate 2/3, while the remaining code transmits only 1 parity bit per two information bits (Rc,2 = 2).

Figure 3.27 Coefficients Cd for NSC code and turbo code PC1 for different interleaver lengths (axes: Cd versus d; curves: NSC, Lπ = 60, Lπ = 600, Lπ = 6000)

Figure 3.27 illustrates the coefficients Cd. As in the case of the serial concatenation, the coefficients become smaller with increasing interleaver size. Hence, sequences with low weight occur less frequently and the performance at low SNRs is expected to be better for turbo codes. The minimum distance of the turbo code amounts to 10, while that of the convolutional code is 12 so that the Lc = 9 convolutional code will outperform the turbo code asymptotically.

Figure 3.28 Error rate performance of NSC code and of half-rate turbo codes PC1 and PC2 (axes: Pb versus Eb/N0 in dB; curves: NSC, PC1 and PC2 for Lπ = 60, 600, 6000)

On the basis of the coefficients shown in Figure 3.27, Figure 3.28 depicts the bit error rate performance of the turbo code. As expected, the performance improves for larger interleavers. The NSC code is outperformed over the whole range of depicted SNRs. The slope of its BER curve becomes steeper for large SNRs, leading to a better asymptotic performance outside the scope of Figure 3.28. Furthermore, we observe a slight difference between the codes PC1 and PC2. PC1 outperforms PC2 owing to its prime feedback polynomial (see footnote 13). This verifies the influence of guideline 2 of the code design.

Footnote 13: The feedback polynomial of PC2 can be factorized to 1 + D + D^2 + D^3 = (1 + D)(1 + D^2).

Figure 3.29 Error rate performance of NSC code and half-rate turbo codes PC1 and PC3 (axes: Pb versus Eb/N0 in dB; curves: NSC, PC1 and PC3 for Lπ = 60, 600, 6000)

Figure 3.29 compares different puncturing schemes for the turbo code. Obviously, puncturing only the parity bits (PC3) leads to a worse performance. The scheme PC1 also punctures some systematic information bits and, thus, keeps more of the redundancy of the code. Although the minimum Hamming distance is 10 for both schemes, the coefficients Cd are much smaller for PC1 so that it has fewer sequences with low weight, leading to a better overall performance.

Comparison of Serial and Parallel Concatenations

Figure 3.30 illustrates the comparison of serial and parallel code concatenations for the examples discussed in the previous subsections. All codes have the overall rate Rc = 1/2 and the interleaver lengths were chosen such that the codeword lengths are identical for serial and parallel concatenations. It can be clearly seen that the parallel concatenation shows the best performance for all codeword lengths. For large interleavers, the difference seems to become smaller at low SNRs. Asymptotically, the parallel concatenation shows a lower error rate owing to its larger minimum Hamming distance.

Figure 3.30 Comparison of best serial concatenation (SC1) and best parallel concatenation (PC1) for half-rate codes and different codeword lengths (axes: Pb versus Eb/N0 in dB; curves: serial and parallel for nN = 120, 1200, 12000)
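The combination rule (3.124) and the coefficients (3.125) of this section can be sketched for a toy parallel concatenation. As an assumption made purely for illustration, both constituent encoders use the parity part of a systematic (7,4) Hamming code instead of a terminated RSC code, so that the conditional IOWEF can be enumerated by brute force with Lπ = 4. No puncturing is considered, and all function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

def conditional_iowef(parity_rows):
    """Brute-force {w: {p: count}} for a systematic block code whose parity
    part is given row-wise (one row per information bit)."""
    k, r = len(parity_rows), len(parity_rows[0])
    table = defaultdict(lambda: defaultdict(int))
    for info in product((0, 1), repeat=k):
        parity = [sum(info[i] * parity_rows[i][j] for i in range(k)) % 2
                  for j in range(r)]
        table[sum(info)][sum(parity)] += 1
    return table

def parallel_iowef(A1, A2, L):
    """Average conditional IOWEF of the parallel concatenation, cf. (3.124)."""
    A = defaultdict(Fraction)
    for w in list(A1):
        for p1, a1 in A1[w].items():
            for p2, a2 in A2[w].items():
                A[(w, p1 + p2)] += Fraction(a1 * a2, comb(L, w))
    return dict(A)

def coefficients(A_par, L):
    """C_d = sum over w + p = d of w / L_pi * A_par[w, p], cf. (3.125)."""
    C = defaultdict(Fraction)
    for (w, p), a in A_par.items():
        C[w + p] += Fraction(w, L) * a
    return dict(C)

# Assumed toy constituent code: parity part of a systematic (7,4) Hamming code.
parity_rows = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
A1 = conditional_iowef(parity_rows)
A_par = parallel_iowef(A1, A1, L=4)
C = coefficients(A_par, L=4)
for d in sorted(C):
    print(d, C[d])
```

In contrast to the serial case, the product is formed for equal input weights w, and the parity weights of both encoders simply add up.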
3.6.4 Turbo Decoding of Concatenated Codes

As already mentioned, the specific decoding of concatenated codes plays a major role in their success. Since the optimal MLD of the entire code is infeasible, this solution has to be approached iteratively by decoding all contributing codes separately. The key idea behind this approach is that the interleaver decouples the concatenated codes so that the extrinsic information obtained by the two decoders can be assumed to be mutually independent. Hence, exchanging extrinsic information between the involved decoders may improve the overall performance, leading, hopefully, to a near maximum likelihood estimate.
Figure 3.31 Iterative decoding for a serial concatenation of two codes (block diagram: decoder 2 with channel input L(r | b2) and a priori input $L_{a,2}^{(\nu)}(b_1)$, de-interleaver, decoder 1 with input $L_{a,1}^{(\nu)}(b_1)$ and outputs $L_1^{(\nu)}(\hat{b}_1)$ and $L_1^{(\nu)}(\hat{d})$, feedback of $L_{e,1}^{(\nu-1)}(\hat{b}_1)$ through the interleaver)

Iterative Decoding for Serial Concatenations

The decoding of the serial concatenation according to Figure 3.18 is now described. The corresponding decoder structure employs soft-in soft-out symbol-by-symbol decoders such as the BCJR algorithm for each constituent code and is depicted in Figure 3.31. The first iteration of the process starts with the inner decoder 2 delivering the LLR $L_2^{(1)}(\hat{b}_1)$ whose de-interleaved version represents the input $L_{a,1}^{(1)}(b_1)$ of the outer decoder 1. This decoder now provides estimates $L_1^{(1)}(\hat{d})$ of the information bits as well as LLRs of the coded bits
\[ L_1^{(1)}(\hat{b}_1) = L_{a,1}^{(1)}(b_1) + L_{e,1}^{(1)}(\hat{b}_1). \tag{3.127} \]
From (3.127), we see that the LLRs consist of the input signal itself and the extrinsic part $L_{e,1}^{(1)}(\hat{b}_1)$, which is extracted by subtracting $L_{a,1}^{(1)}(b_1)$ from $L_1^{(1)}(\hat{b}_1)$. The interleaved extrinsic LLRs are fed back as a priori information $L_{a,2}^{(2)}(b_1)$ to decoder 2 for the second iteration (see footnote 14). Now the second iteration starts with an improved decoder 2 that can also exploit the a priori information $L_{a,2}^{(2)}(b_1)$ according to (3.59) and (3.73). As shown in (3.47) for systematic encoding, its output generally consists of three parts
\[ L_2^{(\nu)}(\hat{b}_1) = L_{ch} \cdot r_s + L_{a,2}^{(\nu)}(b_1) + L_{e,2}^{(\nu)}(\hat{b}_1) \tag{3.128} \]
where rs only contains the received systematic symbols. Since $L_{a,2}^{(\nu)}(b_1)$ was generated by $L_{e,1}^{(\nu-1)}(\hat{b}_1)$ of decoder 1, it must not be delivered back to the outer decoder but has to be subtracted prior to the de-interleaving, resulting in
\[ L_{a,1}^{(\nu)}(b_1) = L_2^{(\nu)}(\hat{b}_1) - L_{a,2}^{(\nu)}(b_1). \tag{3.129} \]
Repeating the described procedure multiple times results in an iterative decoding scheme. This explains the name turbo codes, because the feedback of decoder outputs resembles the principle of a turbo engine.

Footnote 14: Remember that the coded bits of the outer code are the information bits of the inner code in a serial concatenation.

Some simulation results for the constituent codes listed in Table 3.1 and a simple AWGN channel are now presented. Unless otherwise stated, decoding with the Max-Log-MAP algorithm is performed. First, Figure 3.32 shows the BERs for different decoding iterations and an interleaver size of Lπ = 80 bits. Obviously, the error rate decreases continuously with each additional iteration. However, the incremental gain becomes smaller and smaller because the correlation among the extrinsic LLRs gets larger with each iteration step. These correlations can be reduced by increasing the length of the interleaver. For Eb/N0 > 4 dB, the union bound is very tight and can be approached by the iterative decoding scheme.
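The bookkeeping behind (3.127) and (3.129) consists of a subtraction followed by a permutation. The following sketch shows only this exchange step and leaves out the soft-in soft-out decoders themselves. The permutation convention (output position i takes the value at input position perm[i]) and all function names are assumptions made for illustration.

```python
def interleave(values, perm):
    """Output position i takes the value at input position perm[i]."""
    return [values[p] for p in perm]

def deinterleave(values, perm):
    """Inverse of interleave() for the same permutation."""
    out = [0.0] * len(values)
    for i, p in enumerate(perm):
        out[p] = values[i]
    return out

def a_priori_for_outer(L2_out, La2, perm):
    """Cf. (3.129): remove the a priori part of decoder 2, then de-interleave."""
    return deinterleave([l2 - la for l2, la in zip(L2_out, La2)], perm)

def a_priori_for_inner(L1_out, La1, perm):
    """Cf. (3.127): extract the extrinsic part of decoder 1, then interleave."""
    return interleave([l1 - la for l1, la in zip(L1_out, La1)], perm)

perm = [2, 0, 3, 1]
La2 = [0.5, -0.25, 0.5, 0.0]     # a priori LLRs given to decoder 2
L2_out = [1.5, -0.75, 2.5, 0.5]  # assumed output LLRs of decoder 2
La1 = a_priori_for_outer(L2_out, La2, perm)
print(La1)
```

Subtracting the a priori part before passing LLRs on ensures that a decoder never receives information it has produced itself in the previous iteration.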
Figure 3.32 Illustration of performance improvements by iterative decoding of serial concatenation SC3 and a random interleaver of length Lπ = 80 (axes: BER versus Eb/N0 in dB; curves: union bound, simulations)

Figure 3.33 Comparison of different interleaver lengths for serial concatenation SC3 (axes: BER versus Eb/N0 in dB; curves: Lπ = 80, 800, 8000 for iterations 1, 3 and 10)

Figure 3.33 illustrates the influence of the interleaver size Lπ. As can be seen, the gains by additional decoding iterations become larger with increasing Lπ because the above-mentioned correlations are reduced. For Lπ = 8000, a BER of 10^−5 is obtained at an SNR of 2 dB. Decreasing the interleaver length to Lπ = 800 requires roughly Eb/N0 = 3 dB for achieving the same performance. For Lπ = 80, this error rate is reached at Eb/N0 = 5 dB.

Figure 3.34 compares different puncturing schemes to achieve a total code rate of 1/2. Contrary to Figure 3.25, keeping the inner code stronger with a rate 2/3 and puncturing the outer code to a rate 3/4 (scheme SC3) leads to better results than with scheme SC1. Finally, Figure 3.35 shows the comparison of Log-MAP and Max-Log-MAP decoding algorithms. It can be seen that the difference in the first iteration is negligible, while it becomes larger and larger in subsequent iterations. In the sixth iteration, the loss due to the Max-Log-MAP algorithm is 0.5 dB.

Figure 3.34 Comparison of puncturing schemes SC1 and SC3 for different interleaver lengths (axes: BER versus Eb/N0 in dB; curves: SC1 for Lπ = 90, 900, 9000, SC3 for Lπ = 80, 800, 8000)

Figure 3.35 Comparison of Max-Log-MAP and Log-MAP for serial concatenation SC3 with Lπ = 8000 (axes: BER versus Eb/N0 in dB; curves: Max-Log-MAP and Log-MAP for iterations 1, 2 and 6)

Iterative Decoding for Parallel Concatenations

With reference to the parallel concatenation, both constituent encoders process the same information bits but in different orders. Consequently, the corresponding decoders estimate the same bits. However, their estimation is based on different parity bits that are decoupled because of the interleaver. As in the case of the serial concatenation, the extrinsic information at the decoder output is extracted and exchanged between the decoders.

Figure 3.36 Iterative decoding for a parallel concatenation of two codes (block diagram: depuncturing and demultiplexing unit, decoder 1, interleaver, decoder 2, de-interleavers in the feedback and output paths)

Figure 3.36 shows the decoder structure. First, the received symbols are distributed to the corresponding decoders (MUX) and dummy bits are inserted at the positions of the punctured bits (P^−1). The iterative process then starts with decoder 1 providing LLRs of the information bits
\[ L_1^{(1)}(\hat{d}) = L(r_1 \mid d) + L_{e,1}^{(1)}(\hat{d}). \]
The systematic part as well as the extrinsic information are fed to decoder 2, which additionally receives LLRs L(r2 | d) of the parity bits generated by encoder 2 (see footnote 15). Its extrinsic information is extracted, de-interleaved, and fed back as a priori knowledge $L_{a,1}^{(2)}(d) = L_{e,2}^{(1)}(\hat{d})$ to decoder 1. Now, the second iteration starts with an improved decoder output
\[ L_1^{(2)}(\hat{d}) = L(r_1 \mid d) + L_{a,1}^{(2)}(d) + L_{e,1}^{(2)}(\hat{d}) \tag{3.130} \]
exploiting the a priori information. As in the case of a serial concatenation, the extrinsic information is extracted in each iteration and supplied as a priori knowledge to the opposite decoder. The a priori LLR $L_{a,1}(d)$ stems from decoder 2 and has to be eliminated before passing the signal again to decoder 2. Performing these steps several times leads to the well-known turbo decoding.

Footnote 15: Since the systematic information bits are an explicit part of the decoder's output, they can be provided to decoder 2 via decoder 1.

Some simulation results for different turbo coding schemes listed in Table 3.2 are now discussed. The interleaver is chosen randomly, following the concept of uniform interleaving. Unless otherwise mentioned, decoding is performed with the Max-Log-MAP algorithm. Figure 3.37 shows the BERs for a turbo code with Lπ = 60. Puncturing is performed in such a way that the parity bits of both encoders are transmitted alternately, that is, the information bits are always transmitted. Obviously, the gains after the third iteration become very small for this small interleaver. The union bound is reached at Eb/N0 = 2.5 dB. Note that this does not mean that the performance of an overall maximum likelihood decoder is achieved, because the union bound represents only an upper bound and the true error rate may lie below it, as depicted in Figure 3.37.

Figure 3.38 illustrates the turbo code's performance for different interleaver lengths. While the interleaver size has nearly no impact in the first iteration, large differences can be observed for subsequent decoding iterations. The longer the interleavers, the higher the additional gains for successive iterations. This behavior can be explained by the fact that large interleavers ensure a better decoupling of the a priori information, that is, the assumption of uncorrelated successive symbols La(di) is better fulfilled for large interleavers.
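The exchange of extrinsic information between two decoders can be illustrated with a deliberately simplified toy system. As an assumption made only to keep the sketch short, the constituent codes are single parity check codes (one parity bit per pair of information bits) instead of RSC codes, so that the soft-output decoder reduces to the well-known tanh rule and no BCJR algorithm is needed. Positive LLRs indicate bit 0; the permutation, the LLR values and all function names are illustrative.

```python
import math

def boxplus(a, b):
    """LLR of the XOR of two independent bits with LLRs a and b."""
    t = math.tanh(a / 2.0) * math.tanh(b / 2.0)
    t = max(min(t, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    return 2.0 * math.atanh(t)

def spc_decoder(L_sys, L_par, L_apri):
    """Extrinsic LLRs for information bits protected pairwise by one parity bit."""
    L_in = [s + a for s, a in zip(L_sys, L_apri)]
    L_ext = [0.0] * len(L_in)
    for j, lp in enumerate(L_par):
        i0, i1 = 2 * j, 2 * j + 1
        L_ext[i0] = boxplus(L_in[i1], lp)
        L_ext[i1] = boxplus(L_in[i0], lp)
    return L_ext

def turbo_decode(L_sys, L_par1, L_par2, perm, iterations=4):
    """Exchange extrinsic LLRs between two decoders; perm[i] is the
    natural-order index of the i-th bit seen by encoder 2."""
    n = len(L_sys)
    L_e2 = [0.0] * n                       # extrinsic LLRs of decoder 2
    L_sys_int = [L_sys[p] for p in perm]   # systematic LLRs in interleaved order
    for _ in range(iterations):
        L_e1 = spc_decoder(L_sys, L_par1, L_e2)   # a priori: extrinsic of decoder 2
        L_a2 = [L_e1[p] for p in perm]            # interleave
        L_e2_int = spc_decoder(L_sys_int, L_par2, L_a2)
        L_e2 = [0.0] * n
        for i, p in enumerate(perm):              # de-interleave
            L_e2[p] = L_e2_int[i]
    return [s + e1 + e2 for s, e1, e2 in zip(L_sys, L_e1, L_e2)]

# All-zero codeword assumed; the first systematic LLR has the wrong sign.
L_sys = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0]
L_par1 = [2.0, 2.0, 2.0]
L_par2 = [2.0, 2.0, 2.0]
perm = [0, 3, 1, 4, 2, 5]
L_final = turbo_decode(L_sys, L_par1, L_par2, perm)
decisions = [0 if l > 0 else 1 for l in L_final]
print(decisions)
```

Each decoder receives only the extrinsic output of the other decoder as a priori information, never its own, which corresponds to the elimination of the a priori LLR described above.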
Figure 3.37 Illustration of performance improvements by iterative decoding for turbo code PC3, interleaver size $L_\pi = 60$ (BER versus $E_b/N_0$ in dB; curves: union bound and simulations)
Figure 3.38 Comparison of turbo code PC3 performance for different interleaver lengths (BER versus $E_b/N_0$ in dB; curves: $L_\pi = 60$, $L_\pi = 600$, $L_\pi = 6000$ for iterations 1, 3, and 10)
Hence, a gain of 1.5 dB between the third and the tenth iteration is obtained for $L_\pi = 6000$ and only 1 dB for $L_\pi = 600$.

Since the systematic information bits are transmitted only once for classical turbo codes, the question arises whether puncturing should be applied only to the parity bits or also to the information bits. The analysis with the union bound in Section 3.6.3 delivered the result that puncturing information bits is superior because the code's structure is maintained. Figure 3.39 shows the corresponding results obtained by Monte Carlo simulations. We see that the union bound results cannot be confirmed. Obviously, puncturing only the parity bits leads to a better first iteration and a faster convergence of the iterative process. Asymptotically, it seems that the gap can be decreased. The systematic information bits help the decoder especially at low SNRs. This was also observed in ten Brink (2000b), where code doping is applied to ensure convergence of the iterative process. Moreover, puncturing an information bit affects both codes simultaneously, while puncturing a parity bit has an impact only on the generating constituent code.

Figure 3.39 Comparison of different puncturing schemes PC1 and PC2 for turbo codes, $L_\pi = 6000$ (BER versus $E_b/N_0$ in dB; legend: PC3, PC1 for iterations 1, 3, and 10)

Finally, serial and parallel concatenations should be compared. From Figure 3.40, we can conclude that the parallel concatenation of the considered convolutional code outperforms the serial one for all three interleaver lengths. Especially at very low SNRs, the turbo code starts with lower error rates and thus reaches the waterfall region, where the BER decreases rapidly, earlier. However, this result cannot be generalized to all possible serial and parallel concatenations.

Figure 3.40 Comparison of serial and parallel concatenations of convolutional codes for different interleaver lengths and 10 decoding iterations (BER versus $E_b/N_0$ in dB; legend: SC3, PC3 for nN = 120, nN = 1200, nN = 12000)
3.6.5 EXIT Charts Analysis of Turbo Decoding

The analysis presented in Subsection 3.6.3 focused on the distance properties of concatenated codes and approximated the BER performance with the union bound. However, the union bound diverges at very low SNR, that is, exactly in the region of interest when approaching channel capacity. Moreover, this technique assumes optimum maximum likelihood decoding, which is not feasible in practice, so that suboptimum iterative turbo decoding is applied instead. Turbo decoding was the subject of the previous subsection, which delivered some examples concerning the error rate performance of turbo codes without deriving guidelines for their construction.

Realizing that the basic principle of turbo decoding is the exchange of extrinsic information, ten Brink developed a new analysis tool based on the so-called EXIT charts (ten Brink 2000a,b, 2001a,b,c). With this approach, the iterative nature of the decoding process is taken into consideration and the performance of concatenated codes can be tightly predicted. This holds not only for concatenated codes but also for many types of concatenated systems (Kühn 2004; Sezgin et al. 2003a,b). Alternative solutions like density evolution (Divsalar et al. 2001) are less accurate. However, the use of mutual information presupposes infinitely long sequences (interleavers), which are impossible in practice. Nevertheless, results will demonstrate that this technique accurately predicts the true performance and allows the optimization of the constituent codes.

The basic idea behind this semianalytical approach is to determine the decoder's input-output relationship with respect to the mutual a priori information $\bar{I}_a$ and the mutual extrinsic information $\bar{I}_e$. At this point, we have to distinguish serial and parallel concatenations. We start with the serial code concatenation that is depicted once again in Figure 3.41.
The inner decoder D2 processes the channel output $r$ as well as the a priori LLRs $L_{a,2}^{(\nu)}(b)$ (cf. Figure 3.31) provided by the outer decoder D1 in the previous iteration $(\nu - 1)$. The information part contained in $L_{a,2}^{(\nu)}(b)$ that is common to the true code bits $b$ is the mutual a priori information (see also the definition of mutual information in Section 2.1.2)

$$\bar{I}_{a,2}^{(\nu)} = \bar{I}\big(b; L_{a,2}^{(\nu)}(b)\big) = \bar{I}_{e,1}^{(\nu-1)}. \tag{3.131}$$

The second equality holds because the amount of information does not change by permutations. The mutual extrinsic information at the output of D2

$$\bar{I}_{e,2}^{(\nu)} = \bar{I}\big(b; L_{e,2}^{(\nu)}(\hat{b})\big) = f\big(E_s/N_0, \bar{I}_{a,2}^{(\nu)}\big) = f\big(C, \bar{I}_{a,2}^{(\nu)}\big) \tag{3.132}$$

depends on both inputs and can be expressed as a function of $\bar{I}_{a,2}^{(\nu)}$ and the channel capacity $C$ or, equivalently, $E_s/N_0$. With the above argumentation that interleaving does not change the mutual information, the a priori mutual information of the outer decoder D1 becomes

$$\bar{I}_{a,1}^{(\nu)} = \bar{I}_{e,2}^{(\nu)}. \tag{3.133}$$

Figure 3.41 Simplified system model for serial code concatenation (blocks: encoders C1 and C2, interleaver, additive noise $n$, inner decoder D2, de-interleaver, outer decoder D1; exchanged quantities $\bar{I}_{a,2}$, $\bar{I}_{e,2}$, $\bar{I}_{a,1}$, $\bar{I}_{e,1}$)

Figure 3.42 Simplified system model for parallel code concatenation (blocks: encoders C1 and C2, multiplexer, additive noise $n$, demultiplexer, decoders D1 and D2; exchanged quantities $\bar{I}_{a,1}$, $\bar{I}_{e,1}$, $\bar{I}_{a,2}$, $\bar{I}_{e,2}$)
This outer decoder only processes the output of the inner decoder and has no direct access to the channel. Therefore, its output is independent of the SNR and the mutual information of its extrinsic output $L_{e,1}^{(\nu)}(\hat{b})$

$$\bar{I}_{e,1}^{(\nu)} = \bar{I}\big(b; L_{e,1}^{(\nu)}(\hat{b})\big) = f\big(\bar{I}_{a,1}^{(\nu)}\big) \tag{3.134}$$

is only a function of $\bar{I}_{a,1}^{(\nu)}$.

Looking at the parallel code concatenation depicted in Figure 3.42, we have similar relationships. However, a slight difference appears in that both decoders have access to the channel output, so that their outputs are both functions of the SNR as well as the a priori information. Moreover, the mutual information is always related to the information bits $d$ instead of the code bits $b$ because the decoders exchange information with respect to $d$. We obtain

$$\bar{I}_{e,1}^{(\nu)} = \bar{I}\big(d; L_{e,1}^{(\nu)}(\hat{d})\big) = f\big(E_s/N_0, \bar{I}_{a,1}^{(\nu)}\big) = f\big(C, \bar{I}_{e,2}^{(\nu-1)}\big) \tag{3.135}$$

$$\bar{I}_{e,2}^{(\nu)} = \bar{I}\big(d; L_{e,2}^{(\nu)}(\hat{d})\big) = f\big(E_s/N_0, \bar{I}_{a,2}^{(\nu)}\big) = f\big(C, \bar{I}_{e,1}^{(\nu)}\big). \tag{3.136}$$
Next, we have to answer the question of how to determine the mutual information $\bar{I}_e$ and $\bar{I}_a$. We follow the semianalytical approach of ten Brink (2001c), where $\bar{I}_a$ is obtained by modeling the a priori LLRs $L_a(d_\mu)$ at the decoder inputs as Gaussian distributed random variables whose means depend on the transmitted data symbols. The motivation comes from the assumption that the extrinsic information at the output of a decoder is Gaussian distributed. Although this is only approximately true after some iterations, the tight results justify the assumption. According to (3.40) on page 111, the LLR of a signal $y = x + n$ with the binary transmit symbol $x = \pm 1$ has the form

$$L(y) = L_{ch} \cdot y = L_{ch} \cdot (x + n) = \frac{2}{\sigma_N^2} \cdot x + \frac{2}{\sigma_N^2} \cdot n \tag{3.137}$$

where $n$ denotes real white Gaussian noise with variance $\sigma_N^2 = N_0/(2T_s)$. Owing to $x = \pm 1$, $\sigma_X^2 = E_s/T_s = 1$ holds and $L_{ch} = 2/\sigma_N^2$ is obtained. Obviously, the mean of this LLR for fixed $x$ is $2x/\sigma_N^2$, that is, its magnitude is $2/\sigma_N^2$, while its variance amounts to

$$\sigma_{N_a}^2 = E\left\{ \left( \frac{2}{\sigma_N^2} \cdot n \right)^2 \right\} = \left( \frac{2}{\sigma_N^2} \right)^2 \cdot \sigma_N^2 = \frac{4}{\sigma_N^2}. \tag{3.138}$$

Comparing (3.138) with the mean, we recognize that the magnitude of the latter is half of the variance (ten Brink 2001c). Since $x$ is antipodal but the information bit $d_\mu$ and the code bit $b_\mu$ are unipolar, we define the new variable $a_\mu = 1 - 2d_\mu$ or $a_\mu = 1 - 2b_\mu$, depending on the considered concatenation. Hence, we can model the Gaussian distributed LLR by

$$L_a(a) = \frac{\sigma_{N_a}^2}{2} \cdot a + n_a. \tag{3.139}$$
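The model (3.139) can be checked empirically with a short sketch. The function name `apriori_llrs`, the variance value, and the sample size are illustrative assumptions, not taken from the text.

```python
import numpy as np

def apriori_llrs(a, sigma2_na, rng):
    """Draw artificial a priori LLRs according to (3.139):
    L_a = (sigma2_na / 2) * a + n_a with Var{n_a} = sigma2_na."""
    noise = rng.normal(0.0, np.sqrt(sigma2_na), size=a.shape)
    return 0.5 * sigma2_na * a + noise

rng = np.random.default_rng(1)
sigma2_na = 4.0                 # assumed a priori variance
a = np.ones(200_000)            # all symbols a = +1 (fixed transmit symbol)
llr = apriori_llrs(a, sigma2_na, rng)

mean_est = llr.mean()           # should be close to sigma2_na / 2
var_est = llr.var()             # should be close to sigma2_na
```

For a fixed symbol, the empirical mean should be about half of the empirical variance, which is the relation stated after (3.138).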
From (3.139), we see that the artificially generated a priori LLR is created by taking the true data $a$ (based on either the information bits or the code bits), weighting them with $\sigma_{N_a}^2/2$, and adding white Gaussian noise $n_a$ with zero mean and variance $\sigma_{N_a}^2$. According to (2.58) on page 65, the average mutual information between equiprobable binary symbols $a_\mu$ and the a priori LLRs is defined as

$$\bar{I}_a = 1 + \frac{1}{2} \cdot \sum_{\alpha = \pm 1} \int_{-\infty}^{\infty} p_{N_a}(\xi - \alpha A) \log_2 \left[ \frac{p_{N_a}(\xi - \alpha A)}{p_{N_a}(\xi - A) + p_{N_a}(\xi + A)} \right] d\xi \tag{3.140}$$

with $A = \sigma_{N_a}^2/2$. Inserting the Gaussian distribution into (3.140) and exploiting the integrand's symmetry yields

$$\bar{I}_a = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_{N_a}^2}} \cdot \exp\left[ -\frac{(\xi - \sigma_{N_a}^2/2)^2}{2\sigma_{N_a}^2} \right] \log_2\left( 1 + e^{-\xi} \right) d\xi. \tag{3.141}$$
Equation (3.141) shows that $\bar{I}_a = f(\sigma_{N_a}^2)$ is only a function of the a priori variance $\sigma_{N_a}^2$. Since this function is monotonically increasing as depicted in Figure 3.43, the variance of $N_a$ can be determined by $\sigma_{N_a}^2 = f^{-1}(\bar{I}_a)$. Hence, the a priori information is modeled according to (3.139), where the noise variance $\sigma_{N_a}^2$ is chosen according to the given $\bar{I}_a$.

Figure 3.43 Relationship between $\sigma_{N_a}^2$ and the mutual a priori information $\bar{I}_a$ ($\bar{I}_a$ from 0 to 1 versus $\sigma_{N_a}^2$ from 0 to 40)
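A numerical evaluation of (3.141) and of its inverse can be sketched as follows. The grid resolution, the integration limits of ten standard deviations, and the bisection bracket are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np

def mutual_info(sigma2):
    """Evaluate (3.141) by a Riemann sum on a fine grid."""
    if sigma2 <= 0.0:
        return 0.0
    sigma = np.sqrt(sigma2)
    mean = sigma2 / 2.0
    xi = np.linspace(mean - 10 * sigma, mean + 10 * sigma, 20001)
    pdf = np.exp(-(xi - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # log2(1 + exp(-xi)), written in a numerically stable form
    integrand = pdf * np.logaddexp(0.0, -xi) / np.log(2.0)
    dx = xi[1] - xi[0]
    return 1.0 - float(np.sum(integrand) * dx)

def inverse_mutual_info(i_a, lo=0.0, hi=400.0, steps=60):
    """Bisection for sigma2 = f^{-1}(I_a); relies on f being
    monotonically increasing."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mutual_info(mid) < i_a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma2_half = inverse_mutual_info(0.5)  # variance that yields I_a = 0.5
```

With such an inverse, a desired $\bar{I}_a$ can be translated into the noise variance needed to generate a priori LLRs according to (3.139).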
Figure 3.44 Mutual extrinsic versus mutual a priori information for RSC code with $g_1(D) = 1$, $g_2(D) = (1 + D + D^3)/(1 + D + D^2 + D^3)$ and AWGN, MAP algorithm: solid line, Max-Log-MAP algorithm: dashed line ($\bar{I}_e$ versus $\bar{I}_a$; curves for $E_b/N_0$ = −1, 0, 1, 2, and 3 dB)

Owing to the nonlinear behavior of soft-in soft-out decoding algorithms, the mutual extrinsic information cannot be calculated analytically. Hence, simulations have to be carried out to measure the conditional histograms $\hat{p}_{Z|a=\pm 1}(z)$ of the extrinsic information $z_\mu = L_e(a_\mu)$ at the decoder output. The average extrinsic information is obtained by solving

$$\bar{I}_e = 1 + \frac{1}{2} \cdot \sum_{\alpha = \pm 1} \int_{-\infty}^{\infty} \hat{p}_{Z|a=\alpha}(z) \log_2 \left[ \frac{\hat{p}_{Z|a=\alpha}(z)}{\hat{p}_{Z|a=+1}(z) + \hat{p}_{Z|a=-1}(z)} \right] dz \tag{3.142}$$

numerically (footnote 16). Figure 3.44 shows the relationship $\bar{I}_e = f(E_s/N_0, \bar{I}_a)$ for a systematic recursive convolutional code and an AWGN channel with different SNRs. The decoder processes both the output of the AWGN channel and a priori LLRs with specified $\bar{I}_a$. Obviously, the mutual extrinsic information increases with growing SNR and $\bar{I}_a$. If the a priori information is certain, that is, $\bar{I}_a = 1$, the extrinsic information at the decoder output is reliable too, regardless of the channel quality. On the contrary, for high SNR, nearly no a priori information is needed for good decoding results. Moreover, the Max-Log-MAP algorithm seems to lose only at very low SNRs and for small mutual a priori information. These small losses will be important at the end of this section.

Footnote 16: For decoding algorithms delivering true LLRs, the mutual information can be calculated without using histograms. Since we also look at the Max-Log-MAP decoder providing only approximations of the true LLRs, we restrict ourselves to the histogram-based way.

Figure 3.45a compares the results for different RSC codes if only a priori information is provided to the decoder, as in the case of the outer decoder in a serial concatenation. Astonishingly, all curves intersect at the point (0.5, 0.5). Up to now, no explanation was found for this behavior. On the left-hand side of this intersection point, stronger codes with large constraint length $L_c$ perform worse than codes with small $L_c$. Moreover, the mutual extrinsic information is smaller than the mutual a priori information at the decoder's input. On the right-hand side, the relations change and strong codes perform better. Here, $\bar{I}_e$ exceeds $\bar{I}_a$.

Figure 3.45 Mutual a) extrinsic and b) total information versus mutual a priori information for RSC codes with different $L_c$, AWGN channel and MAP decoding (panel a: $\bar{I}_e$ versus $\bar{I}_a$; panel b: $\bar{I}_t$ versus $\bar{I}_a$; curves: $L_c = 3$, $L_c = 5$, $L_c = 7$, uncoded)

Figure 3.45b depicts the total mutual information $\bar{I}_t = \bar{I}(b; L(\hat{b}))$ at the decoder's output on the basis of $L(\hat{b}) = L_a(b) + L_e(\hat{b})$. There is still the intersection at $\bar{I}_a = 0.5$. Additionally, it becomes obvious that the decoder always provides an improvement (possibly very small) compared to the uncoded case. Comparing panels a and b of Figure 3.45, we observe that the sum of $L_a(b)$ and $L_e(\hat{b})$ does not result in a sum of the corresponding mutual information $\bar{I}_a$ and $\bar{I}_e$. Instead, $L(\hat{b})$ has to be used directly to determine $\bar{I}_t$, or an approximation called information combining (Land et al. 2004, 2003) can be applied.

Serial Concatenation

The basic idea behind EXIT charts is that extrinsic information provided by one constituent decoder is used in the turbo decoding process as a priori information for the other decoder. A graphical illustration of this process is obtained by drawing the curves of the inner and outer codes into one diagram and exchanging the abscissa and ordinate for the outer code. This leads to an EXIT chart like the one depicted in Figure 3.46. It shows the curves of an outer NSC code punctured to $R_{c,1} = 3/4$ and an inner RSC code with rate $R_{c,2} = 2/3$, resulting in an overall code rate of $R_c = 1/2$. Be aware that the SNR $E_s/N_0 = R_c E_b/N_0$ at the input of the inner decoder is adjusted according to the overall code rate $R_c$ and not to the individual rate $R_{c,2}$.

The bold solid curve of the outer code does not depend on the SNR, whereas the dashed curves of the inner code strongly depend on the SNR. It can be seen that the curves touch for $E_b/N_0 = 1$ dB (!). For larger $E_b/N_0$, for example, the solid line with pentagons for $E_b/N_0 = 1.2$ dB, the curves of the inner code lie above the curve of the outer code and a gap opens through which the decoding process can converge. Only in this case does the a priori information lead to increased extrinsic information, so that reliability is improved in each iteration.

Figure 3.46 EXIT chart for serial concatenation of outer NSC code ($L_{c,1} = 4$, $R_{c,1} = 3/4$) and inner RSC code ($L_{c,2} = 4$, $R_{c,2} = 2/3$) as used in Figure 3.35, AWGN channel, and Log-MAP decoding (ordinate $\bar{I}_{e,2} = \bar{I}_{a,1}$, abscissa $\bar{I}_{a,2} = \bar{I}_{e,1}$; inner decoder curves for $E_b/N_0$ = −1, 0, 1, 1.2, 2, and 3 dB, and the outer decoder curve)

Figure 3.47 EXIT chart for serial concatenation of outer NSC and inner RSC codes as used in Figure 3.35, AWGN channel with $E_b/N_0 = 1.2$ dB, and MAP decoding (ordinate $\bar{I}_{e,2} = \bar{I}_{a,1}$, abscissa $\bar{I}_{a,2} = \bar{I}_{e,1}$)

Figure 3.47 illustrates the convergence of the turbo decoding process for an SNR of 1.2 dB. The squares indicate the mutual extrinsic information $\bar{I}_{e,2} = \bar{I}_{a,1}$ at the output of the inner decoder, and the pentagons indicate the mutual extrinsic information $\bar{I}_{e,1} = \bar{I}_{a,2}$ at the output of the outer decoder. Obviously, the inner decoder starts without any a priori information ($\bar{I}_{a,2}^{(1)} = 0$) and delivers $\bar{I}_{e,2}^{(1)} \approx 0.58$. This extrinsic information is exactly the a priori information $\bar{I}_{a,1}^{(1)}$ of the outer decoder, which provides $\bar{I}_{e,1}^{(1)} = 0.07$. This becomes the a priori information of the inner decoder in the second iteration, which delivers $\bar{I}_{e,2}^{(2)} \approx 0.61$, and so on. Owing to the tunnel between the boundary lines, the mutual extrinsic information always increases from iteration to iteration and finally approaches $\bar{I}_{e,1} = \bar{I}_{e,2} = 1$. If the boundaries intersect before the point $(\bar{I}_a, \bar{I}_e) = (1, 1)$, the trajectory cannot proceed, the iterative decoding process gets stuck, and the convergence is lost. The larger the gap, the fewer iterations are needed until convergence, but the larger also the gap to Shannon's channel capacity.

Hence, for large interleavers, EXIT charts represent a suitable means of predicting the convergence behavior for iterative decoding of concatenated codes. Moreover, it is possible to design codes whose transfer curves match each other so that a very narrow tunnel leads to the point $(\bar{I}_a, \bar{I}_e) = (1, 1)$. In ten Brink (2000b), a serially concatenated code is presented that approaches Shannon's channel capacity within a gap of 0.1 dB.

Figure 3.48 shows the results obtained with the Max-Log-MAP decoder. Astonishingly, the boundary lines do not tightly predict the true behavior of the turbo decoder for $E_b/N_0 = 1.8$ dB. Although the decoder slowly converges to $(\bar{I}_a, \bar{I}_e) = (1, 1)$, the boundaries predict a faster convergence. For smaller $E_b/N_0$, convergence cannot be achieved anymore, although it is predicted. The reason for this discrepancy is the assumption that each decoder obtains true LLRs. In fact, the decoders are only provided with approximations of LLRs delivered by suboptimum Max-Log-MAP decoders. At higher $E_b/N_0$, for example, 2 dB, the true decoding behavior again matches the boundaries' prediction. Hence, EXIT charts based on the Max-Log-MAP algorithm are not suited for determining the minimum $E_b/N_0$ required to ensure convergence.
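The staircase trajectory between the two boundary curves can be sketched in a few lines. The transfer functions used here are purely hypothetical analytic stand-ins chosen for illustration; they are not the measured characteristics of the codes in Figures 3.46 to 3.48.

```python
def exit_trajectory(inner, outer, max_iter=200, tol=1e-6):
    """Staircase trajectory between two EXIT curves (serial concatenation).

    inner(i_a2) -> i_e2 and outer(i_a1) -> i_e1 are transfer functions.
    Returns the points (i_a2, i_e2), one per iteration.
    """
    points = []
    i_a2 = 0.0                   # inner decoder starts without a priori info
    for _ in range(max_iter):
        i_e2 = inner(i_a2)       # inner decoder output
        i_e1 = outer(i_e2)       # i_e2 is the a priori input of the outer decoder
        points.append((i_a2, i_e2))
        if i_e1 - i_a2 < tol:    # no further progress: converged or stuck
            break
        i_a2 = i_e1              # fed back for the next iteration
    return points

# Hypothetical curves (assumption): the same outer curve, two inner curves.
outer = lambda i: i ** 2
open_tunnel = exit_trajectory(lambda i: 0.6 + 0.4 * i, outer)
closed_tunnel = exit_trajectory(lambda i: 0.4 + 0.6 * i, outer)
```

For these stand-in curves, the first pair only meets at (1, 1), so the trajectory runs through the tunnel, whereas the second pair intersects at an inner-decoder output of 2/3, where the trajectory gets stuck.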
Figure 3.48 EXIT chart for serial concatenation of outer NSC and inner RSC codes as used in Figure 3.35, AWGN channel, and Max-Log-MAP decoding (panel a: $E_b/N_0 = 1.8$ dB; panel b: $E_b/N_0 = 2$ dB; ordinate $\bar{I}_{e,2} = \bar{I}_{a,1}$, abscissa $\bar{I}_{a,2} = \bar{I}_{e,1}$)

Parallel Concatenation

The analysis of parallel code concatenations is depicted in Figure 3.49. The left diagram shows the results for symbol-by-symbol MAP decoding. Obviously, the prediction tightly matches the true turbo decoding behavior. In Figure 3.49b, it can be seen that the turbo decoder gets stuck when the Max-Log-MAP algorithm is applied, although the boundaries predict convergence. As already explained for the serial concatenation, the reason for this failure is that the assumptions made for the EXIT chart analysis are not fulfilled when employing the Max-Log-MAP decoder. One assumption was that LLRs represent the extrinsic information that is exchanged between the decoders. The Max-Log-MAP algorithm does not deliver exact LLRs but only approximations (see page 121), and the predicted convergence cannot be achieved.

Figure 3.49 EXIT chart for parallel concatenation of two RSC codes as used in Figure 3.39 (P1), AWGN channel with $E_b/N_0 = 1$ dB (panel a: MAP decoding; panel b: Max-Log-MAP decoding; ordinate $\bar{I}_{e,2} = \bar{I}_{a,1}$, abscissa $\bar{I}_{a,2} = \bar{I}_{e,1}$)

In order to verify this explanation, Figure 3.50a shows the results for Max-Log-MAP decoding if the a priori information is provided by an additionally introduced MAP decoder. We recognize that convergence is obtained. On the contrary, Figure 3.50b shows the results for MAP decoding when the a priori information is provided by a Max-Log-MAP decoder. In this case, the iterative decoding scheme does not converge. From this, we can conclude that for low SNRs, when the boundaries come close to each other, optimum MAP decoding is crucial because the extrinsic information delivered by the Max-Log-MAP decoder leads to a significant loss that prevents convergence.
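The nature of the Max-Log-MAP approximation can be illustrated with the Jacobian logarithm. This relation is general background and is not derived in this section; the function names are hypothetical.

```python
import math

def max_star(a, b):
    """Jacobian logarithm: exact ln(e^a + e^b), the operation a Log-MAP
    decoder applies when combining path metrics."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: the correction term is dropped."""
    return max(a, b)

# The approximation error is largest when both arguments are equal.
worst_case_error = max_star(1.0, 1.0) - max_log(1.0, 1.0)  # equals ln 2
```

The dropped correction term lies between 0 and ln 2 and matters most when competing metrics are similar, which fits the observation that the losses appear at low SNRs and for small a priori information.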
Figure 3.50 EXIT chart for parallel concatenation of two RSC codes as used in Figure 3.39, AWGN channel with $E_b/N_0 = 1$ dB, a) Max-Log-MAP decoding with a priori information by MAP decoder, b) MAP decoding with a priori information by Max-Log-MAP decoder (ordinate $\bar{I}_{e,2} = \bar{I}_{a,1}$, abscissa $\bar{I}_{a,2} = \bar{I}_{e,1}$)

3.7 Low-Density Parity Check (LDPC) Codes

3.7.1 Basic Definitions and Encoding

We saw in the previous section that concatenated codes exhibit an amazing performance close to the capacity limit stated by Shannon. Another class of capacity-achieving codes are LDPC codes. Although they are not considered anymore in subsequent chapters, they will be briefly introduced here because of their practical importance. A comprehensive overview can be found in Chung et al. (2001), Forney (2001), Gallager (1963), Lin and Costello (2004), MacKay (1999), Richardson et al. (2001), Shokrollahi (2002), Tanner (1981) and references therein.

LDPC codes were first discovered in 1963 by Gallager (1963). However, they were forgotten for quite a long time, probably because encoding and decoding were too complex for practical implementations at that time. Their latest rediscovery took place after the invention of turbo codes in 1993, which made the turbo decoding principle very popular. As LDPC codes show a performance close to the capacity limit, it is natural to compare them with concatenated codes. First, their block error rate performance is generally better than that of concatenated codes. Second, the flattening of the BER curve that can be observed for turbo codes appears at much lower error rates for LDPC codes. Finally, their decoding is not based on a trellis diagram, which would cause infeasible computational costs for good LDPC codes.

LDPC codes belong to the general class of linear block codes and can be described by parity check and generator matrices. Their definition is based on the parity check matrix H already defined in Section 3.2. Principally, regular and irregular codes are distinguished. The parity check matrices of regular LDPC codes, which have been defined by Gallager (1963), have an identical number of $u$ ones in each row and $v$ ones in each column. On the contrary, $u$ and $v$ can vary from row to row or from column to column for irregular codes (Luby et al. 2001; Richardson et al. 2001). When appropriately designed, irregular codes are superior to regular codes, especially at high code rates. For example, Chung et al. (2001) constructed irregular LDPC codes operating at only 0.0045 dB from the Shannon limit.
Next we give the definition of regular LDPC codes introduced by Gallager (1963) as well as some comments on random and irregular LDPC codes. Afterwards, a technique for the low-complexity encoding of LDPC codes is presented.
Regular LDPC Codes by Gallager

Definition 3.7.1 (LDPC codes) A regular LDPC code is defined by its parity check matrix H generally consisting of $n$ rows and $J$ columns. It has the following properties:
1. Each row has the same number of $u$ ones.
2. Each column has the same number of $v$ ones.
3. At most a single one is allowed to be at a common position between two arbitrary columns.
4. The parameters $u$ and $v$ should be small compared to the code length $n$.

The density of an LDPC code is defined as the ratio of the number of ones and the total number of elements in H and amounts to

$$\rho = \frac{v}{n} = \frac{u}{J} \tag{3.143}$$

for regular codes. Note that the number of columns $J$ may be different from $n - k$ as defined in Section 3.2. The number of independent columns in H represents the dimension of the dual code (the null space) and therefore also determines the dimension of the code itself.

A variety of ways exist to construct a parity check matrix for LDPC codes. Many of them are based on Euclidean and finite geometries (Lin and Costello 2004). We will briefly review the procedure proposed by Gallager in his original work (footnote 17). The matrix H consisting of $n$ rows and $J$ columns is composed of $u$ submatrices $H_i$, each of size $n \times J/u$:

$$H = \begin{bmatrix} H_1 & H_2 & \cdots & H_u \end{bmatrix}. \tag{3.144}$$

The first matrix $H_1$ consists of $v$ consecutive ones in each column that are located at nonoverlapping positions $(\ell - 1)v + 1, \ldots, \ell v$ in column $\ell$ with $1 \le \ell \le J/u$. Hence, there exists only a single one in each row, as can be seen in (3.145).

$$H_1 = \begin{bmatrix} 1 & & \\ \vdots & & \\ 1 & & \\ & 1 & \\ & \vdots & \\ & 1 & \\ & & \ddots \end{bmatrix} \tag{3.145}$$

The remaining submatrices are obtained by permuting the elements in each column. These permutations have to be carried out such that the properties in Definition 3.7.1 are fulfilled.

Footnote 17: Be aware that the parity check matrix H is defined differently than in many textbooks. The number of rows equals $n$ instead of the number of columns.
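Gallager's construction can be sketched as follows, using the convention of this section that H has $n$ rows and $J$ columns. The sketch makes a simplifying assumption: the further submatrices are plain random row permutations of $H_1$, and property 3 of Definition 3.7.1 is not enforced. The function name `gallager_h` and the parameter values are illustrative.

```python
import numpy as np

def gallager_h(J, u, v, rng):
    """Regular parity check matrix with n = v*J/u rows and J columns:
    u ones per row and v ones per column."""
    assert J % u == 0, "J must be a multiple of u"
    cols = J // u                      # columns per submatrix
    n = v * cols                       # code length (number of rows)
    h1 = np.zeros((n, cols), dtype=int)
    for l in range(cols):
        h1[l * v:(l + 1) * v, l] = 1   # v consecutive ones in column l
    # Simplification (assumption): random row permutations of H1,
    # without checking property 3 of Definition 3.7.1.
    blocks = [h1] + [h1[rng.permutation(n), :] for _ in range(u - 1)]
    return np.hstack(blocks)

rng = np.random.default_rng(2)
H = gallager_h(J=12, u=3, v=4, rng=rng)
density = H.sum() / H.size             # should equal v/n = u/J
```

A practical construction would additionally reject permutations that place more than a single one at a common position of two columns.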
Random Code Construction

A different method is to select a parity check matrix quasi-randomly. In this case, H is constructed column by column. At each step, a new column that has not been used or rejected before is selected randomly. The following enumeration briefly describes the procedure.
1. Randomly select a new column that has not been accepted or rejected before.
2. If the new column has more than a single '1' in common with the old columns, it is rejected.
3. Construct a new intermediate matrix H by appending the new column to the old columns. If the weight constraints of Definition 3.7.1 are violated, reject the column; otherwise accept it.
4. Continue the procedure until $J$ columns have been found.

Randomly constructed codes have a couple of disadvantages. The encoding process suffers from a lack of structure and can become computationally expensive. Moreover, it has been shown that decoding these codes may require a lot of iterations, and they are generally not suited for majority logic decoding (see next subsection). However, very long randomly selected codes perform very close to Shannon's limit, as demonstrated in Chung et al. (2001). Regarding the effort to be spent on the code construction, both approaches, Gallager's permutations as well as the random construction, require extensive computer-based search routines.

Irregular LDPC Codes

As mentioned above, irregular codes have a parity check matrix with a varying number of ones in rows and columns. The number of ones in a row or column is called the degree (see also Subsection 3.7.2). Their performance highly depends on the degree distribution for rows and columns. For example, many degree-2 nodes will cause a small minimum Hamming distance, leading to a poor asymptotic error rate performance at high SNRs. Irregular LDPC codes can outperform their regular counterparts, especially at low SNRs.

LDPC Encoding

Looking at the encoding process, we can try to find the generator matrix G corresponding to H.
Since the linearly independent columns of H span the space of the dual code (see page 97), determining G is equivalent to finding the null space of H. The dimensionality of the null space determines the dimension of the code and, thus, the number of information bits per codeword $k$. If $J = n - k$ holds, H can be transformed by Gaussian elimination into a systematic form $H_s = \begin{bmatrix} -P & I_{n-k} \end{bmatrix}^T$. This allows the construction of $G_s$ according to (3.7) and (3.8) on page 96. Once the systematic generator matrix has been determined, all transformations applied to force H into a systematic shape have to be applied to $G_s$ in reverse order to obtain the final generator matrix G.

While H itself is a sparse matrix with only a few nonzero elements, this is not necessarily true for G. Unfortunately, a high density of G implies many modulo-2 sums in the encoding process, leading generally to a complexity of $O(n^2)$. Moreover, G may become quite large for long powerful LDPC codes so that encoding becomes demanding in terms of memory and computational costs. Efficient encoding techniques that are mainly based on the transformation of H into a triangular form are derived by Richardson and Urbanke (2005) and will be briefly explained now.

Figure 3.51 Parity check matrix for efficient systematic encoding of LDPC codes (light gray area: sparse matrix part, dark gray area: nonsparse part; panels a) and b) with block dimensions $k$, $n - k$, $n$, $g$, and $n - k - g$)

The basic idea for low-complexity encoding is based on the fact that systematic encoders generating codewords of the form $b^T = \begin{bmatrix} d^T & p^T \end{bmatrix}$ only have to determine the $n - k$ parity check bits in $p$ instead of all $n$ code bits. Assuming linearly independent columns in H, that is, $J = n - k$, we know from Section 3.2 that the corresponding parity check matrix is composed of a parity part P and an identity matrix $I_{n-k}$. Relaxing this constraint may lead to a structure as depicted in Figure 3.51a. Obviously, H does not have a purely systematic form. However, if the elements in the lower left triangle are zero, the parity bits $p_j$, $1 \le j \le n - k$, can be recursively calculated. For the leftmost column, the element $k + 1$ has to be nonzero because it is associated with the first parity bit

$$p_1 = \sum_{i=1}^{k} H_{i,1} \cdot d_i \tag{3.146a}$$

depending only on the information bits $d_i$ with $1 \le i \le k$. Here and in the following, all sums are taken modulo 2. For the remaining parity bits associated with all subsequent columns on the right,

$$p_j = \sum_{i=1}^{k} H_{i,j} \cdot d_i \oplus \sum_{i=k+1}^{k+j-1} H_{i,j} \cdot p_{i-k} \tag{3.146b}$$

with $j > 1$ holds. Hence, each parity bit depends on the information word and the parity bits determined earlier. Unfortunately, the parity check matrix is no longer sparse when transformed into this form. Richardson and Urbanke (2005) presented a solution that approximates the triangular structure of H as depicted in Figure 3.51b. A nonsparse dark gray area whose width $g$ has to be minimized is allowed, while the rest of the matrix is sparse. With this approach, a codeword is split into three parts, the information part $d$ and two parity parts denoted by $q$ and $p$, that is, $b^T = \begin{bmatrix} d^T & q^T & p^T \end{bmatrix}$ holds. The encoding process now consists of two steps. First, the coefficients of $q$ are determined by

$$q_1 = \sum_{i=1}^{k} H_{i,n-k-g+1} \cdot d_i \tag{3.147a}$$

and

$$q_j = \sum_{i=1}^{k} H_{i,n-k-g+j} \cdot d_i \oplus \sum_{i=k+1}^{k+j-1} H_{i,n-k-g+j} \cdot q_{i-k} \tag{3.147b}$$

with $2 \le j \le g$. Their calculation is based on the nonsparse part of H. Next, the bits of $p$ can be determined according to

$$p_1 = \sum_{i=1}^{k} H_{i,1} \cdot d_i \oplus \sum_{i=k+1}^{k+g} H_{i,1} \cdot q_{i-k} \tag{3.148a}$$

and

$$p_j = \sum_{i=1}^{k} H_{i,j} \cdot d_i \oplus \sum_{i=k+1}^{k+g} H_{i,j} \cdot q_{i-k} \oplus \sum_{i=k+g+1}^{j+k+g-1} H_{i,j} \cdot p_{i-k-g} \tag{3.148b}$$

with $2 \le j \le n - k - g$. Richardson and Urbanke (2001) showed that this modification of the parity check matrix leads to a low-complexity encoding process and causes nearly no performance loss.
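The recursion of (3.146a) and (3.146b) for a parity check matrix in purely triangular form can be sketched as follows. The small random matrix is an illustrative assumption and is not sparse; the sketch does not cover the two-step scheme of (3.147) and (3.148), and the function name is hypothetical.

```python
import numpy as np

def encode_triangular(H, d):
    """Systematic encoding by back-substitution, cf. (3.146a), (3.146b).

    H has n rows and n-k columns; its lower (n-k) x (n-k) block must have
    ones on the diagonal and zeros below it.
    """
    n, m = H.shape
    k = n - m
    p = np.zeros(m, dtype=int)
    for j in range(m):
        acc = int(H[:k, j] @ d)              # contribution of information bits
        acc += int(H[k:k + j, j] @ p[:j])    # contribution of earlier parity bits
        p[j] = acc % 2
    return np.concatenate([d, p])

rng = np.random.default_rng(3)
k, m = 4, 3                                  # toy sizes (assumption)
A = rng.integers(0, 2, size=(k, m))          # part acting on the information bits
T = np.triu(rng.integers(0, 2, size=(m, m)), 1) + np.eye(m, dtype=int)
H = np.vstack([A, T])
d = rng.integers(0, 2, size=k)

b = encode_triangular(H, d)
syndrome = (H.T @ b) % 2                     # all-zero for a valid codeword
```

Each parity bit is obtained from one column of H, so the cost is governed by the number of ones in H rather than by a dense generator matrix.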
3.7.2 Graphical Description

Graphs are a very illustrative way of describing LDPC codes. We will see later that the graphical representation allows an easy explanation of the decoding process for LDPC codes. Generally, graphs consist of vertices (nodes) and edges connecting the vertices (Lin and Costello 2004; Tanner 1981). The number of connections of a node is called its degree. Principally, cyclic and acyclic graphs can be distinguished, where the latter type does not possess any cycles or loops. The girth of a graph denotes the length of its shortest cycle. Generally, loops cannot be totally avoided. However, at least short cycles of length four should be avoided because they lead to poor distance properties and, thus, asymptotically weak codes. Finally, a bipartite graph consists of two disjoint subsets of vertices where edges only connect vertices of different subsets but no vertices of the same subset. These bipartite graphs will now be used to illustrate LDPC codes graphically.

Actually, graphs are graphical illustrations of parity check matrices. Remember that the $J$ columns of H represent parity check equations according to $s = H^T \otimes r$ in (3.12), that is, $J$ check sums between certain sets of code bits are calculated. We now define two sets of vertices. The first set V comprises $n$ variable nodes, each of them representing exactly one received code bit $r_\nu$. These nodes are connected via edges with the elements of the second set P containing $J$ check nodes representing the parity check equations. A connection between variable node $i$ and check node $j$ exists if $H_{i,j} = 1$ holds. On the contrary, no connection exists for $H_{i,j} = 0$. The parity check matrix of regular LDPC codes has $u$ ones in each row, that is, each variable node is of degree $u$ and connected by exactly $u$ edges. Since each column contains $v$ ones, each check node has degree $v$, that is, it is linked to exactly $v$ variable nodes.

Figure 3.52 Bipartite Tanner graph illustrating the structure of a regular code of length $n = 6$ (variable nodes $r_1, \ldots, r_6$ in set V; check nodes $s_1$, $s_2$, $s_3$ in set P)

Following the above partitioning, we obtain a bipartite graph, also termed Tanner or factor graph, as illustrated in Figure 3.52. Certainly, the code in our example does not fulfill the third and the fourth criteria of Definition 3.7.1. Moreover, its graph contains several cycles, of which the shortest one is emphasized by bold edges. Its length and, therefore, the girth of this graph amounts to four. If all four conditions of the definition by Gallager were fulfilled, no cycles of length four would occur. Nevertheless, the graph represents a regular code of length $n = 6$ because all variable nodes are of degree two and all check nodes have degree four. The density of the corresponding parity check matrix

$$H = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$

amounts to $\rho = 4/6 = 2/3$. We can see from Figure 3.52 and the above parity check matrix that the fifth code bit is checked by the first two sums and that the third check sum comprises the code bits $b_2$, $b_3$, $b_4$, and $b_6$. These positions form the set $P_3 = \{2, 3, 4, 6\}$. Since they correspond to the nonzero elements in the third column of H, the set is also termed the support of column three. Similarly, the set $V_2 = \{1, 3\}$ belongs to variable node two and contains all check nodes it is connected with. Equivalently, it can be called the support of row two. Such sets are defined for all nodes of the graph and used in the next subsection for explaining the decoding principle.
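The supports and the presence of length-four cycles can be read off the parity check matrix directly. The following sketch uses the example matrix of this subsection; the helper names are hypothetical and node indices are 1-based to match the text.

```python
import numpy as np

# Example parity check matrix: n = 6 rows (variable nodes),
# J = 3 columns (check nodes).
H = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

def check_support(H, j):
    """Set P_j: variable nodes connected to check node j (1-based)."""
    return {int(i) + 1 for i in np.flatnonzero(H[:, j - 1])}

def variable_support(H, i):
    """Set V_i: check nodes connected to variable node i (1-based)."""
    return {int(j) + 1 for j in np.flatnonzero(H[i - 1, :])}

def has_four_cycle(H):
    """Two columns sharing more than one row form a cycle of length four."""
    overlap = H.T @ H              # common variable nodes per pair of checks
    np.fill_diagonal(overlap, 0)
    return bool((overlap > 1).any())

density = H.sum() / H.size
```

The overlap test is the graph counterpart of property 3 in Definition 3.7.1: a pair of check nodes with two common variable nodes closes a cycle of length four.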
3.7.3 Decoding of LDPC Codes

One-Step Majority Logic Decoding

We start the discussion of decoding LDPC codes by looking at a rather old-fashioned algorithm, namely, one-step majority logic decoding. The reason is that this algorithm can be used as a final stage if the message passing decoding algorithm, which will be introduced subsequently, fails to deliver a valid codeword. One-step majority logic decoding belongs to the class of hard decision decoding algorithms, that is, hard decided channel outputs are processed. The basic idea behind this decoding algorithm is that we have a set of parity check equations and that each code bit is probably protected by more than one of these check sums. Taking our example of the last subsection, we get

\[
\begin{aligned}
\hat{x}_1 \oplus \hat{x}_2 \oplus \hat{x}_4 \oplus \hat{x}_5 &= 0 \\
\hat{x}_1 \oplus \hat{x}_3 \oplus \hat{x}_5 \oplus \hat{x}_6 &= 0 \\
\hat{x}_2 \oplus \hat{x}_3 \oplus \hat{x}_4 \oplus \hat{x}_6 &= 0.
\end{aligned}
\]

Throughout this chapter, it is assumed that the coded bits bν, 1 ≤ ν ≤ n, are modulated onto antipodal symbols xν using BPSK. At the matched filter output, the received symbols rν are hard decided delivering x̂ν = sign(rν). The vector x̂ comprising all these estimates can be multiplied from the left-hand side with H^T, yielding the syndrome s. Each element in s belongs to a certain column of H and represents the output of the corresponding check sum. Looking at a certain code bit bν, it is obvious that all parity check equations incorporating x̂ν may contribute to its decision. Resolving the above equations with respect to x̂2 (that is, ν = 2), we obtain for the first and the third equations

\[
\hat{x}_2 = \hat{x}_1 \oplus \hat{x}_4 \oplus \hat{x}_5, \qquad
\hat{x}_2 = \hat{x}_3 \oplus \hat{x}_4 \oplus \hat{x}_6 .
\]

Both equations deliver a partial decision on the corresponding code bit b2. Unfortunately, x̂4 contributes to both equations so that these intermediate results will not be mutually independent. Therefore, a simple combination of both partial decisions will not deliver the optimum solution, whose determination will generally be quite complicated.
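The syndrome computation and the support sets of the example code can be sketched in a few lines of Python. This is only an illustration: the function names, the received values r, and the mapping of nonnegative matched filter outputs to bit 0 are assumptions made here and are not taken from the text.

```python
# Parity check matrix of the n = 6 example: rows = code bits, columns = check sums.
H = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]

def hard_decision(r):
    """Map matched filter outputs to bits (assumed mapping: r >= 0 -> bit 0)."""
    return [0 if value >= 0 else 1 for value in r]

def syndrome(H, bits):
    """Compute s = H^T * bits over GF(2); entry j is the output of check sum j."""
    n, J = len(H), len(H[0])
    return [sum(H[i][j] * bits[i] for i in range(n)) % 2 for j in range(J)]

def check_support(H, j):
    """Set P_j: 1-based positions of the code bits taking part in check sum j."""
    return [i + 1 for i in range(len(H)) if H[i][j - 1] == 1]

def variable_support(H, i):
    """Set V_i: 1-based indices of the check sums that contain code bit i."""
    return [j + 1 for j in range(len(H[0])) if H[i - 1][j] == 1]

r = [-0.9, 0.2, -1.1, 0.8, 1.2, 0.7]   # hypothetical matched filter outputs
bits = hard_decision(r)                 # the second bit is received in error
s = syndrome(H, bits)                   # nonzero entries mark violated check sums
```

In this hypothetical example the transmitted codeword is 1 1 1 0 0 0, and the violated check sums are exactly those contained in V2, the two equations that were resolved for the second bit above.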
For this reason, one looks for sets of parity check equations that are orthogonal with respect to the considered bit bν. Orthogonality means that all columns of H selected for the detection of the bit bν have a one at the νth position, but no further one is located at the same position in more than one column. This requirement implies that each check sum uses disjoint sets of symbols to obtain an estimate b̂ν. Using such an orthogonal set, the resulting partial decisions are independent of each other and the final result is obtained by simply deciding in favor of the majority of partial results. This explains the name majority logic decoding.

Message Passing Decoding Algorithms

Compared with hard decision decoding, the performance can be significantly enhanced by using the soft values at the matched filter output. We now derive the sum-product algorithm, also known as message passing decoding algorithm or belief propagation algorithm (Forney 2001; Kschischang et al. 2001). It represents a very efficient iterative soft-decision decoding
algorithm approaching the maximum likelihood solution at least for acyclic graphs. Message passing algorithms can be described using conditional probabilities as in the case of the BCJR algorithm. Since we consider only binary LDPC codes, log-likelihood values will be used, resulting in a more compact derivation.

Figure 3.53 Illustration of message passing algorithm

Decoding based on a factor graph as illustrated in Figure 3.53 starts with an initialization of the variable nodes. Their starting values are the matched filter outputs appropriately weighted to obtain the LLRs

\[
L^{(0)}(\hat{b}_i) = L(\tilde{r}_i \mid b_i) = L_{\mathrm{ch}} \cdot \tilde{r}_i
\tag{3.149}
\]

(see Section 3.4). These initial values indicated by the iteration superscript (0) are passed to the check nodes via the edges. An arbitrary check node sj corresponds to a modulo-2 sum of connected code bits bi ∈ Pj. Resolving this sum with respect to a certain bit,

\[
b_i = \bigoplus_{\nu \in \mathcal{P}_j \setminus \{i\}} b_\nu ,
\]

delivers extrinsic information Le(b̂i). Exploiting the L-Algebra results of Section 3.4, the extrinsic log-likelihood ratio for the jth check node and code bit bi becomes

\[
L_{e,j}^{(0)}(\hat{b}_i) = \log
\frac{1 + \prod_{\nu \in \mathcal{P}_j \setminus \{i\}} \tanh\big(L^{(0)}(\hat{b}_\nu)/2\big)}
     {1 - \prod_{\nu \in \mathcal{P}_j \setminus \{i\}} \tanh\big(L^{(0)}(\hat{b}_\nu)/2\big)} .
\tag{3.150}
\]

The extrinsic LLRs are passed via the edges back to the variable nodes. The exchange of information between variable and check nodes explains the name message passing decoding. Moreover, since each message can be interpreted as a 'belief' in a certain bit, the algorithm is often termed belief propagation decoding algorithm. If condition three in Definition 3.7.1 is fulfilled, the extrinsic LLRs arriving at a certain variable node are independent of each other and can be simply summed. If condition three is violated, the extrinsic LLRs are not independent anymore and summing them is only an approximate solution. We obtain a new estimate of our bit

\[
L^{(\mu)}(\hat{b}_i) = L_{\mathrm{ch}} \cdot \tilde{r}_i + \sum_{j \in \mathcal{V}_i} L_{e,j}^{(\mu-1)}(\hat{b}_i)
\tag{3.151}
\]
where µ = 1 denotes the current iteration. Now, the procedure is continued, resulting in an iterative decoding algorithm. The improved information at the variable nodes is passed again
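to the check nodes. Attention has to be paid that extrinsic information Le,j(b̂i) delivered by check node j will not return to its originating node. For µ ≥ 1, we obtain

\[
L_{e,j}^{(\mu)}(\hat{b}_i) = \log
\frac{1 + \prod_{\nu \in \mathcal{P}_j \setminus \{i\}} \tanh\Big(\big(L^{(\mu)}(\hat{b}_\nu) - L_{e,j}^{(\mu-1)}(\hat{b}_\nu)\big)/2\Big)}
     {1 - \prod_{\nu \in \mathcal{P}_j \setminus \{i\}} \tanh\Big(\big(L^{(\mu)}(\hat{b}_\nu) - L_{e,j}^{(\mu-1)}(\hat{b}_\nu)\big)/2\Big)} .
\tag{3.152}
\]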
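The update rules (3.149) to (3.152) can be sketched as a small Python routine for the n = 6 example code. It is a minimal illustration under stated assumptions, not an optimized decoder: the function name, the convention that a positive LLR favors bit 0, the clipping constant that avoids log(0), the stopping test on the parity checks, and the channel LLRs in the usage example are all choices made here.

```python
import math

def sum_product_decode(H, llr_channel, max_iterations=20):
    """Sum-product decoding; H[i][j] = 1 links variable node i and check node j.

    Assumed convention: a positive LLR favors bit 0.
    Returns (hard-decided bits, a posteriori LLRs, iterations used).
    """
    n, J = len(H), len(H[0])
    P = [[i for i in range(n) if H[i][j]] for j in range(J)]   # bits of check j
    V = [[j for j in range(J) if H[i][j]] for i in range(n)]   # checks of bit i
    L = list(llr_channel)                                      # initialization (3.149)
    Le = [{i: 0.0 for i in P[j]} for j in range(J)]            # no extrinsic LLRs yet
    bits = [0 if value >= 0 else 1 for value in L]
    for iteration in range(1, max_iterations + 1):
        # Check node update: (3.150) in the first pass, (3.152) afterwards.
        Le_new = []
        for j in range(J):
            messages = {}
            for i in P[j]:
                product = 1.0
                for nu in P[j]:
                    if nu != i:
                        # Subtract what check j itself contributed to bit nu.
                        product *= math.tanh((L[nu] - Le[j][nu]) / 2.0)
                product = max(min(product, 1.0 - 1e-12), -1.0 + 1e-12)
                messages[i] = math.log((1.0 + product) / (1.0 - product))
            Le_new.append(messages)
        Le = Le_new
        # Variable node update (3.151).
        L = [llr_channel[i] + sum(Le[j][i] for j in V[i]) for i in range(n)]
        bits = [0 if value >= 0 else 1 for value in L]
        # Stop as soon as all parity checks are satisfied (zero syndrome).
        if all(sum(bits[i] for i in P[j]) % 2 == 0 for j in range(J)):
            return bits, L, iteration
    return bits, L, max_iterations

H = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0], [0, 1, 1]]
# Hypothetical channel LLRs for the codeword 1 1 1 0 0 0; the second value has the wrong sign.
llr = [-2.0, 0.5, -2.0, 2.0, 2.0, 2.0]
decoded, posterior, used = sum_product_decode(H, llr)
```

Because the graph of this small code contains cycles of length four, summing the extrinsic LLRs is only approximate here, as noted above; the sketch is meant to show the message flow rather than the achievable performance.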
After each full iteration, the syndrome can be checked (hard decision). If it equals 0, the algorithm stops, otherwise it continues until an appropriate stopping criterion such as the maximum number of iterations applies. If the sum-product algorithm does not deliver a valid codeword after the final iteration, the one-step majority logic decoder can be applied to those bits which are still pending. The convergence of the iterative algorithm highly depends on the girth of the graph, that is, the minimum length of cycles. On the one hand, the girth must not be too small for efficient decoding; on the other hand, a large girth may cause small minimum Hamming distances, leading to a worse asymptotic performance. Moreover, the convergence is also influenced by the row and column weights of H. To be more precise, the degree distribution of variable and check nodes affects the message passing algorithm very much. Further information can be found in Forney (2001), Kschischang et al. (2001), Lin and Costello (2004), Richardson et al. (2001).

Complexity

In this short analysis concerning the complexity, we assume a regular LDPC code with u ones in each row and v ones in each column of the parity check matrix. At each variable node, 2u · I additions of extrinsic LLRs have to be carried out per iteration. This includes the subtractions in the tanh argument of (3.152). At the check nodes, v − 1 calculations of the tanh function and two logarithms are required per iteration assuming that the logarithm is applied separately to the numerator and denominator with subsequent subtraction. Moreover, 2v − 3 multiplications and 3 additions have to be performed. This leads to Table 3.3.
3.7.4 Performance of LDPC Codes

Finally, some simulation results concerning the error rate performance of LDPC codes are presented. Figure 3.54 shows the BER evolution with increasing number of decoding iterations. Significant gains can be observed up to 15 iterations, while further iterations only lead to marginal additional improvements. The BER of 10^−5 is reached at an SNR of 1.4 dB. This is 2 dB apart from Shannon's channel capacity lying at −0.6 dB for a code rate of Rc = 0.32.

Table 3.3 Computational costs for message passing decoding algorithm

type               number per iteration
additions          2u · n + 3 · J
log and tanh       (v + 1) · J
multiplications    (2v − 3) · J
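As a quick plausibility check, the entries of Table 3.3 can be evaluated for the small example code of Figure 3.52 with n = 6, J = 3, u = 2, and v = 4. This is only an illustrative evaluation of the table expressions for a toy code and not a result from the simulations:

\[
\begin{aligned}
\text{additions:} \quad & 2u \cdot n + 3 \cdot J = 2 \cdot 2 \cdot 6 + 3 \cdot 3 = 33 \\
\text{log and tanh:} \quad & (v + 1) \cdot J = 5 \cdot 3 = 15 \\
\text{multiplications:} \quad & (2v - 3) \cdot J = 5 \cdot 3 = 15
\end{aligned}
\]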
Figure 3.54 BER performance of irregular LDPC code of length n = 29507 with k = 9507 for different iterations and AWGN channel (bold line: uncoded system). [Plot: BER versus Eb/N0 in dB; curves labeled #1, #5, #10, #15.]
Figure 3.55 BER performance of irregular LDPC code of length n = 20000 as well as serially and parallel concatenated codes, both of length n = 12000 from Tables 3.1 and 3.2 for AWGN channel (bold line: uncoded system). [Plot: BER versus Eb/N0 in dB; curves: uncoded, LDPC, PC3, SC3.]

Next, Figure 3.55 compares LDPC codes with serially and parallel concatenated convolutional codes known from Section 3.6. Obviously, the LDPC code performs slightly worse than the turbo code PC3 and much better than the serial concatenation SC3. This comparison is only drawn to illustrate the similar behavior of LDPC and concatenated convolutional codes. Since the lengths of the codes are different and no analysis was made with respect to the decoding complexity, these results cannot be generalized. The frame error rates for the half-rate LDPC code of length n = 20000 are depicted in Figure 3.56. The slopes of the curves are extremely steep indicating that there may be a
cliff above which the transmission becomes rapidly error free. Substantial gains in terms of Eb/N0 can be observed for the first 15 iterations.

Figure 3.56 Frame error rate performance of irregular LDPC code of length n = 20000 with rate Rc = 0.5 for different iterations and AWGN channel. [Plot: FER versus Eb/N0 in dB; curves labeled #10, #15, #20.]
3.8 Summary

This third chapter gave a survey of error control coding schemes. Starting with basic definitions, linear block codes such as repetition, single parity check, Hamming, and Simplex codes have been introduced. They exhibit a rather limited performance being far away from Shannon's capacity limits. Next, convolutional codes that are widely used in digital communication systems have been explained. A special focus was put on their graphical illustration by the trellis diagram, the code rate adaptation by puncturing, and the decoding with the Viterbi algorithm. Moreover, recursive convolutional codes were introduced because they represent an important ingredient for code concatenation. Principally, the performance of convolutional codes is enhanced with decreasing code rate and growing constraint length. Unfortunately, large constraint lengths correspond to high decoding complexity, leading to practical limitations.

In Section 3.4, soft-output decoding algorithms were derived because they are required for decoding concatenated codes. After introducing the L-Algebra with the definition of LLRs as an appropriate measure of reliability, a general soft-output decoding approach as well as the trellis-based BCJR algorithm have been derived. Without these algorithms, most of today's concatenated coding schemes would not work. For practical purposes, the suboptimal but less complex Max-Log-MAP algorithm was explained.

Section 3.5 evaluated the performance of error-correcting codes. Since the minimum Hamming distance only determines the asymptotic behavior of a code at large SNRs, the complete distance properties of codes were analyzed with the IOWEF. This function was used to calculate the union upper bound that assumes optimal MLD. The union bound tightly predicts the error rate performance for medium and high SNRs, while it diverges at low
SNR. Finally, IPCs have been introduced. This technique exploits information theoretical measures such as the mutual information and considers specific decoding algorithms that do not necessarily fulfill the maximum likelihood criterion. In the last two sections, capacity approaching codes were presented. First, serially and parallel concatenated codes also known as turbo codes were derived. We started looking at their Hamming distance properties. Basically, concatenated codes do not necessarily have large minimum Hamming distances. However, codewords with low weight occur very rarely, especially for large interleaver lengths. The application of the union bound illuminated some design guidelines concerning the choice of the constituent codes and the importance of the interleaver. Principally, the deployment of recursive convolutional codes ensures that the codes’ error rate performance increases with growing interleaver length. Since the ML decoding of the entire concatenated code is infeasible, an iterative decoding concept also termed turbo decoding was explained. The convergence of the iterative scheme was analyzed with the EXIT charts technique. Last but not least, LDPC codes have been introduced. They show a performance similar to that of concatenated convolutional codes.
4 Code Division Multiple Access

In Section 1.1.2 different multiple access techniques were introduced. Contrary to time division multiple access (TDMA) and frequency division multiple access (FDMA) schemes, each user occupies the whole time-frequency domain in code division multiple access (CDMA) systems. The signals are separated with spreading codes that are used for artificially increasing the signal bandwidth beyond the necessary value. Despreading can only be performed with knowledge of the employed spreading code.

For a long time, CDMA or spread spectrum techniques were restricted to military applications. Meanwhile, they found their way into mobile radio communications and have been established in several standards. The IS95 standard (Gilhousen et al. 1991; Salmasi and Gilhousen 1991) as a representative of the second generation mobile radio system in the United States employs CDMA as well as the third generation Universal Mobile Telecommunication System (UMTS) (Holma and Toskala 2004; Toskala et al. 1998) and IMT2000 (Dahlman et al. 1998; Ojanperä and Prasad 1998a,b) standards. Many reasons exist for using CDMA, for example, spread spectrum signals show a high robustness against multipath propagation. Further advantages are more related to the cellular aspects of communication systems.

In this chapter, the general concept of CDMA systems is described. Section 4.1 explains the way of spreading, discusses the correlation properties of spreading codes, and demonstrates the limited performance of a single-user matched filter (MF). Moreover, the differences between principles of uplink and downlink transmissions are described. In Section 4.2, the combination of OFDM (Orthogonal Frequency Division Multiplexing) and CDMA as an example of multicarrier (MC) CDMA is compared to the classical single-carrier CDMA. A limiting factor in CDMA systems is multiuser interference (MUI). Treated as additional white Gaussian noise, interference is mitigated by strong error correction codes in Section 4.3 (Dekorsy 2000; Kühn et al. 2000b). On the contrary, multiuser detection strategies that will be discussed in Chapter 5 cancel or suppress the interference (Alexander et al. 1999; Honig and Tsatsanis 2000; Klein 1996; Moshavi 1996; Schramm and Müller 1999; Tse and Hanly 1999; Verdu 1998; Verdu and Shamai 1999). Finally, Section 4.4 presents some information on the theoretical results of CDMA systems.
4.1 Fundamentals

4.1.1 Direct-Sequence Spread Spectrum

The spectral spreading inherent in all CDMA systems can be performed in several ways, for example, frequency hopping and chirp techniques. The focus here is on the widely used direct-sequence (DS) spreading where the information bearing signal is directly multiplied with the spreading code. Further information can be found in Cooper and McGillem (1988), Glisic and Vucetic (1997), Pickholtz et al. (1982), Pickholtz et al. (1991), Proakis (2001), Steele and Hanzo (1999), Viterbi (1995), Ziemer and Peterson (1985).

For notational simplicity, the explanation is restricted to a chip-level-based system model as illustrated in Figure 4.1. The whole system works at the discrete chip rate 1/Tc and the channel model from Figure 1.12 includes the impulse-shaping filters at the transmitter and the receiver. Certainly, this implies a perfect synchronization at the receiver. For the moment, the description is restricted to an uncoded system; it can be easily extended to coded systems as is done in Section 4.2.

The generally complex-valued symbols a[ℓ] at the output of the signal mapper are multiplied with a spreading code c[ℓ, k]. The resulting signal

\[
x[k] = \sum_{\ell} a[\ell] \cdot c[\ell, k]
\quad \text{with} \quad
c[\ell, k] =
\begin{cases}
\pm \dfrac{1}{\sqrt{N_s}} & \text{for } \ell N_s \le k < (\ell + 1) N_s \\
0 & \text{else}
\end{cases}
\tag{4.1}
\]

has a chip index k that runs Ns times faster than the symbol index ℓ. Since c[ℓ, k] is nonzero only for ℓNs ≤ k < (ℓ + 1)Ns, spreading codes of consecutive symbols do not overlap. The spreading factor Ns is often termed processing gain Gp and denotes the number of chips c[ℓ, k] multiplied with a single symbol a[ℓ]. In coded systems, Gp also includes the code rate Rc and, hence, describes the ratio between the durations of an information bit (Tb) and a chip (Tc)

\[
G_p = \frac{T_b}{T_c} = \frac{T_s}{R_c \cdot T_c} = \frac{N_s}{R_c} .
\tag{4.2}
\]
This definition is of special interest in systems with varying code rates and spreading factors, as discussed in Section 4.3. The processing gain describes the ability to suppress interfering signals. The larger Gp, the higher the suppression.

Figure 4.1 Structure of direct-sequence spread spectrum system. [Block diagram: symbols a[ℓ] are multiplied with the spreading code c[ℓ, k] to give x[k]; the channel h[k, κ] with additive noise n[k] delivers y[k]; the matched filter delivers r[ℓ].]
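The spreading operation in (4.1) and the processing gain in (4.2) can be illustrated with a few lines of Python. The chip pattern, the symbol values, and the function names are hypothetical choices for this sketch; only the scaling of the chips to ±1/√Ns and the relation Gp = Ns/Rc are taken from the text.

```python
import math

def short_code(signs):
    """Binary spreading code with chips +-1/sqrt(Ns), so that its energy is one."""
    Ns = len(signs)
    return [s / math.sqrt(Ns) for s in signs]

def spread(symbols, code):
    """x[k] = a[l] * c[l, k] for a short code: each symbol uses the same Ns chips."""
    return [a * c for a in symbols for c in code]

def processing_gain(Ns, Rc=1.0):
    """Gp = Ns / Rc according to (4.2)."""
    return Ns / Rc

signs = [1, 1, 1, -1, -1, 1, -1]      # hypothetical chip pattern with Ns = 7
code = short_code(signs)
a = [1, -1, 1]                        # hypothetical BPSK symbols a[l]
x = spread(a, code)                   # 3 symbols times 7 chips
code_energy = sum(c * c for c in code)
```

Since the code is normalized to unit energy, the chip sequence x[k] carries the same energy per symbol as a[ℓ]; spreading only distributes it over Ns chips.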
Owing to their importance in practical systems, the following description is restricted to binary spreading sequences, that is, the chips take the values ±1/√Ns. Hence, the signal-to-noise ratio (SNR) per chip is Ns times smaller than for a symbol a[ℓ] and Es/N0 = Ns · Ec/N0 holds. Since the local generation of spreading codes at the transmitter and the receiver has to be easily established, feedback shift registers providing periodical sequences are often used (see Section 4.1.4). Short codes and long codes are distinguished. The period of short codes equals exactly the spreading factor Ns, that is, each symbol a[ℓ] is multiplied with the same code. On the contrary, the period of long codes exceeds the duration of one symbol a[ℓ] so that different symbols are multiplied with different segments of a long sequence. For notational simplicity, we refer only to short codes unless otherwise stated. In Figure 4.1, spreading with short codes for Ns = 7 is illustrated by showing the signals a[ℓ], c[ℓ, k], and x[k].

Figure 4.2 shows the power spectral densities of a[ℓ] and x[k] for a spreading factor Ns = 4, an oversampling factor of w = 8, and rectangular pulses of the chips. Obviously, the densities have a (sin(x)/x)^2 shape and the main lobe of x[k] is four times broader than that of a[ℓ]. However, the total power of both signals is still the same, that is, spreading does not affect the signal's power. Hence, the power spectral density around the origin is larger for a[ℓ].

Figure 4.2 Power spectral densities of original and spread signal for Ns = 4. [Plot: power spectral density versus f · Tsampl; curves: unspread, spread.]

As we know from Section 1.2, the output of a generally frequency-selective channel is obtained by the convolution of the transmitted signal x[k] with the channel impulse response h[k, κ] plus an additive noise term

\[
y[k] = x[k] * h[k, \kappa] + n[k] = \sum_{\kappa=0}^{L_t - 1} h[k, \kappa] \cdot x[k - \kappa] + n[k] .
\tag{4.3}
\]

Generally, it can be assumed that the channel remains constant during one symbol duration. In this case, the channel impulse response h[k, κ] can be denoted by h[ℓ, κ], which will be used in the following derivation. Inserting the structure of the spread spectrum signal given in (4.1) and exchanging the order of the two sums delivers

\[
\begin{aligned}
y[k] &= \sum_{\kappa=0}^{L_t - 1} h[\ell, \kappa] \cdot \sum_{\ell} a[\ell] \cdot c[\ell, k - \kappa] + n[k] \\
     &= \sum_{\ell} a[\ell] \cdot \sum_{\kappa=0}^{L_t - 1} h[\ell, \kappa] \cdot c[\ell, k - \kappa] + n[k] \\
     &= \sum_{\ell} a[\ell] \cdot s[\ell, k] + n[k]
     \quad \text{with} \quad s[\ell, k] = c[\ell, k] * h[\ell, k] .
\end{aligned}
\tag{4.4}
\]
The convolution between the spreading code c[ℓ, k] and the channel impulse response is termed signature s[ℓ, k] and describes the effective channel including the spreading. Hence, the receive filter maximizing the SNR at its output has to be matched to the signature s[ℓ, k] and not only to the physical channel impulse response. It inherently performs the despreading as well. Next, the specific structures of the MF for frequency-selective and nonselective channels are explained in more detail.

Matched Filter for Frequency-Nonselective Fading

For the sake of simplicity, the discussion starts with the MF for frequency-nonselective channels represented by a single coefficient h[ℓ]. Therefore, the signature reduces to s[ℓ, k] = h[ℓ] · c[ℓ, k] and the received signal becomes

\[
y[k] = \sum_{\ell} a[\ell] \cdot h[\ell] \, c[\ell, k] + n[k] = \sum_{\ell} a[\ell] \cdot s[\ell, k] + n[k] .
\tag{4.5}
\]

The MF that maximizes the SNR has the form gMF[ℓ, k] = s*[ℓ, (ℓ + 1)Ns − k].¹ The convolution of y[k] with gMF[ℓ, k] now yields

\[
\begin{aligned}
r_{T_c}[k] &= \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} y[k - k'] \cdot g_{\mathrm{MF}}[\ell, k'] \\
           &= \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} \Big[ \sum_{\ell} a[\ell] \cdot s[\ell, k - k'] + n[k - k'] \Big] \cdot s^*[\ell, (\ell + 1) N_s - k'] .
\end{aligned}
\]

Exchanging the order of the two sums and locating all terms independent of k' in front of this sum leads to the chip rate filter output

\[
\begin{aligned}
r_{T_c}[k] &= \sum_{\ell} a[\ell] \cdot \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} s[\ell, k - k'] \cdot s^*[\ell, (\ell + 1) N_s - k'] + n_{T_c}[k] \\
           &= \sum_{\ell} a[\ell] \cdot \phi_{SS}[(\ell + 1) N_s - k] + n_{T_c}[k] .
\end{aligned}
\tag{4.6}
\]

¹ For simplicity, the normalization of gMF[ℓ, k] to unit energy has been dropped.
In (4.6), nTc[k] denotes the noise contribution at the MF output and φSS[k] denotes the autocorrelation of the signature s[ℓ, k], which is defined by

\[
\begin{aligned}
\phi_{SS}[k] &= \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} s[\ell, k + k'] \cdot s^*[\ell, k'] \\
             &= |h[\ell]|^2 \cdot \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} c[\ell, k + k'] \cdot c[\ell, k'] \\
             &= |h[\ell]|^2 \cdot \phi_{CC}[k] .
\end{aligned}
\tag{4.7}
\]

For frequency-nonselective channels, φSS[k] simply consists of the product of the channel coefficient's squared magnitude and the spreading code's autocorrelation function φCC[k]. Hence, the output of the MF is simply the correlation between y[k] and s[ℓ, k]. Naturally, the autocorrelation function has its maximum at the origin, implying that the optimum sampling time with the maximum SNR for rTc[k] is k = (ℓ + 1)Ns. According to (4.1), φCC[0] = 1 holds. Furthermore, the spreading code is restricted to one symbol duration Ts, resulting in φCC[k] = 0 for |k| ≥ Ns. Hence, only one term of the outer sum contributes to the result and we obtain

\[
r[\ell] = r_{T_c}[k] \Big|_{k = (\ell+1) N_s}
        = \sum_{k' = \ell N_s}^{(\ell+1) N_s - 1} y[k'] \cdot s^*[\ell, k']
        = |h[\ell]|^2 \cdot a[\ell] + \tilde{n}[\ell] .
\tag{4.8}
\]

The MF delivers the original symbol a[ℓ] weighted with the squared magnitude |h[ℓ]|^2 of the channel coefficient and disturbed by white Gaussian noise with zero mean and variance σ_Ñ^2 = |h[ℓ]|^2 σ_N^2. Since the signal-to-noise ratio

\[
\mathrm{SNR} = \frac{|h[\ell]|^4 \, \sigma_A^2}{\sigma_{\tilde{N}}^2} = |h[\ell]|^2 \cdot \frac{E_s}{N_0}
\]

is the same as that for narrow-band transmission, spread spectrum gives no advantage in single-user systems with flat fading channels.

Matched Filter for Frequency-Selective Fading

The broadened spectrum leads in many cases to a frequency-selective behavior of the mobile radio channel. For appropriately chosen spreading codes, no equalization is necessary and the MF is still a suitable means. The signature cannot be simplified as for flat fading channels so that the length of the signature now exceeds Ns samples, and successive symbols interfere. Correlating the received signal y[k] with the signature s[ℓ, k] yields after some manipulations

\[
\begin{aligned}
r[\ell] &= \sum_{k = \ell N_s}^{(\ell+1) N_s + L_t - 1} s[\ell, k]^* \cdot y[k] \\
        &= \sum_{\kappa=0}^{L_t - 1} h[\ell, L_t - 1 - \kappa]^* \cdot \sum_{k = \ell N_s + L_t - 1}^{(\ell+1) N_s + L_t - 2} y[k - \kappa] \cdot c[\ell, k - L_t + 1] .
\end{aligned}
\tag{4.9}
\]
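The two forms of (4.9) can be compared numerically with a short Python sketch for a single symbol (ℓ = 0) without noise and without intersymbol interference. The spreading code, the channel taps, and the function names are hypothetical; the sketch only assumes a real-valued code and the signature definition from (4.4).

```python
def signature(code, h):
    """s[k] = sum over kappa of h[kappa] * c[k - kappa]; length Ns + Lt - 1."""
    Ns, Lt = len(code), len(h)
    s = [0j] * (Ns + Lt - 1)
    for kappa in range(Lt):
        for k in range(Ns):
            s[k + kappa] += h[kappa] * code[k]
    return s

def rake_direct(y, code, h):
    """First form of (4.9): correlate y[k] with the complete signature."""
    s = signature(code, h)
    return sum(s_k.conjugate() * y_k for s_k, y_k in zip(s, y))

def rake_fingers(y, code, h):
    """Second form of (4.9): one correlator (finger) per channel tap."""
    Ns, Lt = len(code), len(h)
    r = 0j
    for kappa in range(Lt):
        # Correlate the delayed input with the code delayed by Lt - 1 chips.
        finger = sum(y[k - kappa] * code[k - Lt + 1]
                     for k in range(Lt - 1, Ns + Lt - 1))
        r += h[Lt - 1 - kappa].conjugate() * finger
    return r

code = [0.5, 0.5, -0.5, 0.5]                  # hypothetical code, Ns = 4, unit energy
h = [0.8 + 0.0j, 0.4 + 0.2j, 0.0 + 0.1j]      # hypothetical channel taps, Lt = 3
a = -1                                        # transmitted symbol
s = signature(code, h)
y = [a * s_k for s_k in s]                    # noise-free received samples
r_direct = rake_direct(y, code, h)
r_fingers = rake_fingers(y, code, h)
r_flat = rake_direct([a * v for v in signature(code, [0.6 + 0.8j])], code, [0.6 + 0.8j])
```

Both forms give the same value, which equals the symbol weighted with the energy of the signature. For the flat channel with |h|^2 = 1 the output reduces to the symbol itself, in line with (4.8).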
Implementing (4.9) directly leads to the well-known Rake receiver that was originally introduced by Price and Greene (1958). It represents the matched receiver for spread spectrum communications over frequency-selective channels. From Figure 4.3 we recognize that the Rake receiver basically consists of a parallel concatenation of several correlators also called fingers, each synchronized to a dedicated propagation path. The received signal y[k] is first delayed in each finger by 0 ≤ κ < Lt, then weighted with the spreading code (with a constant delay Lt − 1), and integrated over a spreading period. Notice that integration starts after Lt − 1 samples have been received, that is, even the most delayed replica h[ℓ, Lt − 1] · x[k − Lt + 1] is going to be sampled. Next, the branch signals are weighted with the complex conjugated channel coefficients and summed up. Therefore, the Rake receiver performs maximum ratio combining of the propagation paths and fully exploits the diversity (see Section 1.5) provided by the frequency-selective channel.

Figure 4.3 Structure of Rake receiver as parallel concatenation of several correlators

All components of the Rake receiver perform linear operations and their order can be changed. This may reduce the computational costs of an implementation that depends on the specific hardware and the system parameters such as spreading factor, maximum delay, and number of Rake fingers. A possible structure is shown in Figure 4.4. The tapped delay line represents a filter matched only to the channel impulse response and not to the whole signature. We need only a single correlator at the filter output to perform the despreading.

Next, we have to consider the output signal r[ℓ] in more detail. Inserting

\[
y[k] = \sum_{\ell'} a[\ell'] \cdot s[\ell', k] + n[k]
\]

into (4.9) yields

\[
r[\ell] = \sum_{\kappa=0}^{L_t - 1} h[\ell, L_t - 1 - \kappa]^* \cdot \sum_{k} \Big[ \sum_{\ell'} a[\ell'] \cdot s[\ell', k - \kappa] + n[k - \kappa] \Big] \cdot c[\ell, k - L_t + 1] .
\]
Figure 4.4 Structure of Rake receiver as serial concatenation of channel matched filter and correlator

Since the signatures s[ℓ, k] exceed the duration of one symbol, symbols at ℓ' = ℓ ± 1 overlap with a[ℓ] and cause intersymbol interference (ISI). These signal parts are comprised in a term nISI[ℓ] so that in the following derivation we can focus on ℓ' = ℓ. Moreover, the noise contribution at the Rake output is denoted by ñ[ℓ]. With s[ℓ, k] = Σκ h[ℓ, κ] · c[ℓ, k − κ], we obtain

\[
r[\ell] = n_{\mathrm{ISI}}[\ell] + \tilde{n}[\ell] + a[\ell] \cdot \sum_{\kappa=0}^{L_t - 1} \sum_{\kappa'=0}^{L_t - 1} h[\ell, L_t - 1 - \kappa]^* \cdot h[\ell, \kappa'] \cdot \sum_{k = \ell N_s + L_t - 1}^{(\ell+1) N_s + L_t - 2} c[\ell, k - \kappa - \kappa'] \cdot c[\ell, k - L_t + 1] .
\tag{4.10}
\]

The last sum in (4.10) again represents the autocorrelation φCC[ℓ, κ + κ' − (Lt − 1)] of the spreading code c[ℓ, k]. The substitution κ → Lt − 1 − κ finally results in

\[
r[\ell] = a[\ell] \cdot \sum_{\kappa=0}^{L_t - 1} \sum_{\kappa'=0}^{L_t - 1} h[\ell, \kappa]^* \cdot h[\ell, \kappa'] \cdot \phi_{CC}[\ell, \kappa' - \kappa] + n_{\mathrm{ISI}}[\ell] + \tilde{n}[\ell]
\tag{4.11a}
\]

\[
\phantom{r[\ell]} = r^{a}[\ell] + r^{\mathrm{PCT}}[\ell] + n_{\mathrm{ISI}}[\ell] + \tilde{n}[\ell] .
\tag{4.11b}
\]

We see from (4.11a) that the autocorrelation function of spreading codes influences the output of the Rake receiver. If it is impulse-like, that is, φCC[ℓ, κ] ≈ 0 for κ ≠ 0, each branch of the Rake receiver extracts exactly one propagation path and suppresses the other interfering signal components. More precisely, the first (left) finger extracts the path with the largest delay (h[ℓ, Lt − 1]) because we start integrating at k = ℓNs + Lt − 1, while the last (right) finger detects the path with the smallest delay corresponding to h[ℓ, 0]. Owing to this temporal reversal, all signal components are summed synchronously and the output of the Rake receiver consists of four parts as stated in (4.11b). The first term

\[
r^{a}[\ell] = \sum_{\kappa=0}^{L_t - 1} |h[\ell, \kappa]|^2 \cdot a[\ell]
\tag{4.12}
\]

obtained for κ = κ' combines the desired signal parts transmitted over different propagation paths according to the maximum ratio combining (MRC) principle.² This maximizes the SNR and delivers an Lt-fold diversity gain. The second term

\[
r^{\mathrm{PCT}}[\ell] = \sum_{\kappa=0}^{L_t - 1} a[\ell] \cdot \sum_{\substack{\kappa'=0 \\ \kappa' \neq \kappa}}^{L_t - 1} h^*[\ell, \kappa] \, h[\ell, \kappa'] \cdot \phi_{CC}[\ell, \kappa' - \kappa]
\tag{4.13}
\]

represents path crosstalk between different Rake fingers caused by imperfect autocorrelation properties of the spreading code.³ For random spreading codes and rectangular chip impulses, φCC[ℓ, κ] ≈ 1/√Ns holds for κ > 0 and a large spreading factor Ns. Hence, the power of asynchronous signal components is attenuated by the factor 1/Ns. Path crosstalk can be best suppressed for spreading codes with impulse-like autocorrelation functions.

² Compared to (1.104), the normalization with Σκ |h[ℓ, κ]|^2 (sum over κ = 0, ..., Lt − 1) was neglected.

³ The exact expression should consider the case that the data symbol may change during the correlation due to the relative delay κ − κ'. In this case, the even autocorrelation function (ACF) has to be replaced by the odd ACF defined in (4.37) on page 191.

It has to be mentioned that the Rake fingers need not be separated by fixed time delays as depicted in Figure 4.3. Since they have to be synchronized onto the channel taps, which are not likely to be spaced equidistantly, the Rake fingers are individually delayed. This requires appropriate synchronization and tracking units at the receiver. Nevertheless, the Rake receiver collects the whole signal energy of all multipath components and maximizes the SNR.

Figure 4.5 shows the bit error rates (BERs) versus Eb/N0 for an uncoded single-user DS spread spectrum system with random spreading codes of length Ns = 16. The mobile radio channel was assumed to be perfectly interleaved, that is, successive channel coefficients are independent of each other. The number of channel taps varies between Lt = 1 and Lt = 8 and their average power is uniformly distributed. Obviously, the performance becomes better with increasing diversity degree D = Lt. However, for growing Lt, the difference between the theoretical diversity curves from (1.118) and the true BER curves increases as well. This effect is caused by the growing path crosstalk between the Rake fingers due to imperfect autocorrelation properties of the employed spreading codes.

Figure 4.5 Illustration of path crosstalk and diversity gain of Rake receiver. [Plot: BER versus Eb/N0 in dB; curves: Lt = 1, Lt = 2, Lt = 4, Lt = 8, theory, AWGN.]
Channel and Rake receiver outputs can also be expressed in vector notation. We combine all received samples y[k] into a single vector y and all transmitted symbols a[ℓ] into a vector a. Furthermore, s[ℓ] contains all Ns + Lt − 1 samples of the signature s[ℓ, k] for k = ℓNs, ..., (ℓ + 1)Ns + Lt − 2. Then, we obtain

\[
\mathbf{y} = \mathbf{S} \cdot \mathbf{a} + \mathbf{n} ,
\tag{4.14}
\]

where the system matrix S contains the signatures s[ℓ] as depicted in Figure 4.6. Each signature is positioned in an individual column but shifted by Ns samples. Therefore, Lt − 1 samples overlap, leading to interfering consecutive symbols. For Ns ≫ Lt, this interference can be neglected.

Figure 4.6 Structure of system matrix S for frequency-selective fading. [Sketch: columns s[0], s[1], s[2], each shifted by Ns rows and overlapping in Lt − 1 rows.]

With vector notation and neglecting the normalization to unit energy, the Rake's output signal in (4.9) becomes

\[
\mathbf{r} = \mathbf{S}^H \cdot \mathbf{y} = \mathbf{S}^H \mathbf{S} \cdot \mathbf{a} + \mathbf{S}^H \mathbf{n} .
\tag{4.15}
\]
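The structure of the system matrix and the vector form of the Rake output in (4.14) and (4.15) can be sketched as follows. The signature values, the symbols, and the helper names are hypothetical, and the sketch assumes a time-invariant signature, that is, the same column entries for every symbol, and a noise-free channel.

```python
def system_matrix(signature, Ns, num_symbols):
    """Place the signature of symbol l in column l, shifted down by l * Ns rows."""
    rows = (num_symbols - 1) * Ns + len(signature)
    S = [[0j] * num_symbols for _ in range(rows)]
    for l in range(num_symbols):
        for k, value in enumerate(signature):
            S[l * Ns + k][l] = value
    return S

def channel_output(S, a):
    """y = S * a, the noise-free part of (4.14)."""
    return [sum(row[l] * a[l] for l in range(len(a))) for row in S]

def rake_output(S, y):
    """r = S^H * y as in (4.15)."""
    return [sum(S[k][l].conjugate() * y[k] for k in range(len(S)))
            for l in range(len(S[0]))]

Ns = 4
s = [0.5, 0.7, -0.3, 0.3, 0.2]       # hypothetical signature with Ns + Lt - 1 = 5 samples (Lt = 2)
a = [1, -1, 1]                       # hypothetical BPSK symbols
S = system_matrix(s, Ns, len(a))     # 13 rows, 3 columns
y = channel_output(S, a)
r = rake_output(S, y)
```

In this example each output consists of the symbol weighted with the signature energy 0.96 plus a contribution of magnitude 0.1 from each neighboring symbol, caused by the single overlapping sample (Lt − 1 = 1). This is the interference between consecutive symbols mentioned above.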
4.1.2 Direct-Sequence CDMA

In CDMA schemes, spread spectrum is used for separating the signals of different subscribers. This is accomplished by assigning each user $u$ a unique spreading code $c_u[\ell, k]$ with $1 \le u \le N_u$. The ratio between the number of active users $N_u$ and the spreading factor $N_s$ is denoted as the load

    \beta = \frac{N_u}{N_s} \qquad (4.16)

of the system. For $\beta = 1$, the system is said to be fully loaded. Assuming an error-free transmission, the spectral efficiency $\eta$ of a system is defined as the average number of information bits transmitted per chip

    \eta = \frac{m N_u}{G_p} = m R_c \cdot \frac{N_u}{N_s} = m R_c \cdot \beta \qquad (4.17)

and is averaged over all active users. In (4.17), $m = \log_2(M)$ denotes the number of bits per symbol $a[\ell]$ for $M$-ary modulation schemes. Obviously, spectral efficiency and system load are identical for systems with $m R_c = 1$. Mathematically, the received signal can be conveniently described by using vector notation. To this end, the system matrix $\mathbf{S}$ in (4.14) has to be extended so that it contains the signatures of all users as illustrated in Figure 4.7.

Figure 4.7 Structure of system matrix S for direct-sequence CDMA: a) synchronous downlink, b) asynchronous uplink (blocks for $\ell = 0, 1, 2$, columns indexed by user $u$)

Each block of the matrix corresponds to a certain time index $\ell$ and contains the signatures $\mathbf{s}_u[\ell]$ of all users. Owing to this arrangement, the vector

    \mathbf{a} = \left[ a_1[0] \;\; a_2[0] \;\cdots\; a_{N_u}[0] \;\; a_1[1] \;\; a_2[1] \;\cdots \right]^T \qquad (4.18)
consists of all the data symbols of all users in temporal order.

Downlink Transmission

At this point, we have to distinguish between uplink and downlink transmissions. In the downlink depicted in Figure 4.8, a central base station or access point transmits the user signals $x_u[k]$ synchronously to the mobile units. Hence, looking at the link between the base station and one specific mobile unit $u$, all signals are affected by the same channel $h_u[\ell, \kappa]$. Consequently, the signatures of different users $v$ vary only in the spreading code, that is, $s_v[\ell, \kappa] = c_v[\ell, \kappa] * h_u[\ell, \kappa]$ holds, and the received signal for user $u$ becomes

    \mathbf{y}_u = \mathbf{S} \cdot \mathbf{a} + \mathbf{n}_u = \mathbf{T}_{h_u[\ell,\kappa]} \mathbf{C} \cdot \mathbf{a} + \mathbf{n}_u. \qquad (4.19)

Figure 4.8 Structure of downlink for direct-sequence CDMA system (each symbol stream $a_v[\ell]$ is spread with $c_v[\ell, k]$ and weighted with $\sqrt{P_v}$; the sum of the signals $x_1[k], \ldots, x_{N_u}[k]$ passes through the common channel $h_u[\ell, \kappa]$ and is disturbed by $n_u[k]$, yielding $y_u[k]$)
In (4.19), $\mathbf{T}_{h_u[\ell,\kappa]}$ denotes the convolutional matrix of the time-varying channel impulse response $h_u[\ell, \kappa]$ and $\mathbf{C}$ is a block diagonal matrix

    \mathbf{C} = \begin{bmatrix} \mathbf{C}[0] & & & \\ & \mathbf{C}[1] & & \\ & & \mathbf{C}[2] & \\ & & & \ddots \end{bmatrix} \qquad (4.20)

containing in its blocks $\mathbf{C}[\ell] = \left[ \mathbf{c}_1[\ell] \cdots \mathbf{c}_{N_u}[\ell] \right]$ the spreading codes $\mathbf{c}_u[\ell] = \left[ c_u[\ell, \ell N_s] \cdots c_u[\ell, (\ell + 1)N_s - 1] \right]^T$ of all users. This structure simplifies the mitigation of MUI because the equalization of the channel can restore the desired correlation properties of the spreading codes, as we will see later.

However, the channels from the common base station to different mobile stations are different; in particular, the path loss may vary. To ensure the demanded Quality of Service (QoS), for example, a certain signal to interference plus noise ratio (SINR) at the receiver input, power control strategies are applied. The aim is to transmit only as much power as necessary to obtain the required SINR at the mobile receiver. Enhancing the transmit power of one user directly increases the interference seen by all other subscribers so that a multidimensional problem arises.

In the considered downlink, the base station chooses the transmit power according to the requirements of each user and the entire network. Since each user receives the whole bundle of signals, it is likely that the desired signal is disturbed by high-power signals whose associated receivers experience poor channel conditions. This imbalance of power levels, termed the near–far effect, penalizes weak users because they suffer more from the strong interference. Therefore, the dynamics of downlink power control are limited. In wideband CDMA systems like UMTS (Holma and Toskala 2004), the dynamics are restricted to 20 dB to keep the average interference level low. Mathematically, power control can be described by introducing a diagonal matrix $\mathbf{P}$ into (4.14) containing the user-specific power amplification $P_u$ (see Figure 4.8):

    \mathbf{y} = \mathbf{S} \mathbf{P}^{1/2} \cdot \mathbf{a} + \mathbf{n} \qquad (4.21)
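A minimal sketch of (4.21) for a single symbol interval may help to see why unequal power levels matter. It assumes a flat, noise-free channel and two hypothetical, deliberately non-orthogonal unit-energy codes; the matched filter correlates the received vector with each code.

```python
import math


def matched_filter_outputs(codes, powers, symbols):
    """Form y = S P^(1/2) a (noise omitted) and return S^T y for real codes."""
    n_s = len(codes[0])
    y = [sum(math.sqrt(p) * a * c[k] for c, p, a in zip(codes, powers, symbols))
         for k in range(n_s)]
    return [sum(c[k] * y[k] for k in range(n_s)) for c in codes]


# Two unit-energy codes with crosscorrelation 0.5 (assumed for illustration)
codes = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, 0.5]]
symbols = [1, -1]                             # user 1 sends +1, user 2 sends -1

balanced = matched_filter_outputs(codes, [1.0, 1.0], symbols)
unbalanced = matched_filter_outputs(codes, [1.0, 16.0], symbols)

print("equal powers:  ", balanced)
print("P_2 = 16 * P_1:", unbalanced)
```

With equal powers the output for user 1 keeps the correct sign; with the second user amplified by 12 dB the interference term dominates and the sign of the first output flips, although no noise is present.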
Uplink Transmission

Generally, the uplink signals are transmitted asynchronously, which is indicated by different starting positions of the signatures $\mathbf{s}_u[\ell]$ within each block as depicted in Figure 4.7b. Moreover, the signals are transmitted over individual channels as shown in Figure 4.9. Hence, the spreading codes have to be convolved individually with their associated channel impulse responses and the resulting signatures $\mathbf{s}_u[\ell]$ from (4.4) are arranged in a matrix $\mathbf{S}$ according to Figure 4.7b. The main difference compared to the downlink is that the signals interfering at the base station have experienced different path losses because they were transmitted over different channels. Again, power control adjusts the power level $P_u$ of each user such that its required SINR is obtained at the receiving base station. Contrary to the downlink, the dynamics are much larger and can amount to 70 dB in wideband CDMA systems (Holma and Toskala 2004). However, practical impairments like fast fading channels and imperfect power control lead to SINR imbalances in the uplink as well. Additionally, identical power levels are not likely in environments supporting multiple services with different QoS constraints. Hence, near–far effects also influence the uplink performance in a CDMA system. Receivers that are insensitive to different power levels are called near–far resistant. In the context of multiuser detectors, a near–far-resistant receiver will be introduced.

Figure 4.9 Structure of uplink for direct-sequence CDMA system (each symbol stream $a_u[\ell]$ is spread with $c_u[\ell, k]$, weighted with $\sqrt{P_u}$ and transmitted as $x_u[k]$ over its individual channel $h_u[\ell, \kappa]$; the sum of all channel outputs plus noise $n[k]$ gives $y[k]$)

Multirate CDMA Systems

As mentioned in the previous paragraphs, modern communication systems like UMTS or CDMA 2000 are designed to provide a variety of services, like speech and data transmission, as well as multimedia applications. These services require different data rates that can be supported by different means. One possibility is to adapt the spreading factor $N_s$. Since the chip duration $T_c$ is a constant system parameter, decreasing $N_s$ enhances the data rate while keeping the overall bandwidth constant ($T_c = T_s/N_s \rightarrow B = N_s/T_s$). However, a large spreading factor corresponds to a good interference suppression, and subscribers with large $N_s$ are more robust against MUI and path crosstalk. On the contrary, users with low $N_s$ become quite sensitive to interference as can be seen from (4.27) and (4.31). These correspondences are similar to near–far effects: a small spreading factor is equivalent to a low transmit power and vice versa. Hence, users with low spreading factors need either a higher power level than the interferers, a cell with only a few interferers, or sophisticated detection techniques at the receiver that are insensitive to these effects.

The multicode technique offers another possibility to support multiple data rates. Instead of decreasing the spreading factor, several spreading codes are assigned to a subscriber demanding high data rates. Of course, this approach consumes resources in terms of spreading codes that can no longer be offered to other users. However, it does not suffer from an increased sensitivity to interference.

A third approach proposed in the UMTS standard and limited to 'hot-spot' scenarios with low mobility is the HSDPA (high speed downlink packet access) channel. It employs adaptive coding and modulation schemes as well as multiple antenna techniques (cf. Chapter 6). Moreover, the connection is not circuit switched but packet oriented, that is, there exists no permanent connection between mobile and base station; instead, data packets are transmitted according to certain scheduling schemes. Owing to the variable coding and modulation schemes, an adaptation to actual channel conditions is possible but requires slowly fading channels. Contrary to standard UMTS links, the spreading factor is fixed to $N_s = 16$ and no power control is applied (3GPP 2005b).
4.1.3 Single-User Matched Filter (SUMF)

The optimum single-user matched filter (SUMF) does not care about other users and treats their interference as additional white Gaussian distributed noise. In frequency-selective environments, the SUMF is simply a Rake receiver. As described earlier, its structure can be mathematically described by correlating $\mathbf{y}$ with the signature of the desired user. Using vector notation, the output for user $u$ is given by

    \mathbf{r}_u = \mathbf{S}_u^H \cdot \mathbf{y} = \mathbf{S}_u^H \mathbf{S}_u \cdot \mathbf{a}_u + \mathbf{S}_u^H \mathbf{S}_{\backslash u} \cdot \mathbf{a}_{\backslash u} + \mathbf{S}_u^H \cdot \mathbf{n} \qquad (4.22)

where $\mathbf{S}_u$ contains exactly those columns of $\mathbf{S}$ that correspond to user $u$ (cf. Figure 4.6). Consequently, $\mathbf{S}_{\backslash u}$ consists of the remaining columns not associated with $u$. The same notation holds for $\mathbf{a}_u$ and $\mathbf{a}_{\backslash u}$. The noise $\tilde{\mathbf{n}} = \mathbf{S}_u^H \mathbf{n}$ is now colored with the covariance matrix

    \boldsymbol{\Phi}_{\tilde{N}\tilde{N}} = \mathrm{E}\{\tilde{\mathbf{n}} \tilde{\mathbf{n}}^H\} = \sigma_N^2 \, \mathbf{S}_u^H \mathbf{S}_u .

If the signatures in $\mathbf{S}_u$ are mutually orthogonal to those in $\mathbf{S}_{\backslash u}$, then $\mathbf{S}_u^H \mathbf{S}_{\backslash u}$ is always zero and $\mathbf{r}_u$ does not contain any MUI. In that case, the MF describes the optimum detector and the performance of a CDMA system would be that of a single-user system with $L_t$-fold diversity. However, although the spreading codes may be appropriately designed, the mobile radio channel generally destroys any orthogonality. Hence, we obtain MUI, that is, symbols of different users interfere. This MUI limits the system performance dramatically. The output of the Rake receiver for user $u$ can be split into four parts

    r_u[\ell] = r_u^{a}[\ell] + r_u^{\mathrm{MUI}}[\ell] + r_u^{\mathrm{ISI}}[\ell] + \tilde{n}_u[\ell] \qquad (4.23)
Comparing (4.23) with (4.11) shows that path crosstalk, ISI, and noise are still present, but a fourth term denoting the multiple access interference stemming from other active users now additionally disturbs the transmission. This term can be quantified by

    r_u^{\mathrm{MUI}}[\ell] = \sum_{\substack{v=1 \\ v \neq u}}^{N_u} \sum_{\kappa=0}^{L_t-1} \sum_{\kappa'=0}^{L_t-1} \sqrt{P_v} \cdot h_u^*[\ell, \kappa] \, h_v[\ell, \kappa'] \cdot \phi_{C_u C_v}[\ell, \kappa - \kappa'] \cdot a_u[\ell] \, a_v[\ell] \qquad (4.24)

where the factor $P_v$ adjusts the power of user $v$. From (4.24), we see that the crosscorrelation function $\phi_{C_u C_v}[\ell, \kappa - \kappa']$ of the spreading codes determines the influence of MUI. For orthogonal sequences, $r_u^{\mathrm{MUI}}[\ell]$ vanishes and the MF is optimum. Moreover, the SUMF is not near–far resistant because high power levels $P_v$ of interfering users increase the interfering power and, therefore, the error rate.

Assuming a high number of active users, the interference is often modeled as additional Gaussian distributed noise due to the central limit theorem. In this case, the SNR defined in (1.14) has to be replaced by a signal to interference plus noise ratio (SINR):

    \mathrm{SNR} = \frac{\sigma_X^2}{\sigma_N^2} \quad \longrightarrow \quad \mathrm{SINR} = \frac{\sigma_X^2}{\sigma_N^2 + \sigma_I^2} \qquad (4.25)
The term $\sigma_I^2$ denotes the interference power, that is, the denominator in (4.25) represents the sum of interference and noise power. Generally, these powers vary in time because they depend on the instantaneous channel conditions. For simplicity, the following analysis is restricted to the additive white Gaussian noise (AWGN) channel. Assuming random spreading codes, the power of each interfering user is suppressed on average by a factor $N_s$ and

    \sigma_I^2 = \frac{E_s}{T_s} \cdot \sum_{v \neq u} P_v \cdot \phi_{u,v}^2[0] = \frac{1}{N_s} \cdot \frac{E_s}{T_s} \cdot \sum_{v \neq u} P_v \qquad (4.26)

holds. Next, the difference between uplink and downlink is illuminated, especially for real-valued modulation schemes. For the sake of simplicity, only binary phase shift keying (BPSK) and quaternary phase shift keying (QPSK) are considered.

Downlink Transmission for AWGN Channel

Three cases are distinguished:

1. No power control and real symbols. If the modulation alphabet contains only real symbols, we consider only the real part of the matched filtered signal and only half of the noise power disturbs the transmission. Hence, the noise power of the real part, $\sigma_{N'}^2 = N_0/2/T_s$, has to be inserted into (4.25) (cf. page 12). Without power control, all users experience the same channel in the downlink so that their received power levels $P_v = 1$ are identical. The resulting average SINR for BPSK can be approximated by

    \mathrm{SINR} \approx \frac{E_s}{N_0/2 + (N_u - 1) E_s / N_s} = \frac{E_b}{N_0/2 + (N_u - 1) E_b / N_s}. \qquad (4.27)
Obviously, enlarging the spreading factor $N_s$ results in a better suppression of interfering signals for fixed $N_u$. Figure 4.10 shows the SINR versus the number of active users and the SINR versus $2E_b/N_0$ (see footnote 4). We recognize that the SINR decreases dramatically for a growing number of users. For very high loads, the SINR is dominated by the interference and the noise plays only a minor role. This directly affects the bit error probability so that the performance will degrade dramatically. According to the general result in (1.49) on page 21, the error probability amounts to

    P_b = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{\sigma_X^2}{\sigma_N^2}} \right) = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{\sigma_X^2}{2 \sigma_{N'}^2}} \right)

for BPSK transmission over an AWGN channel. The argument of the complementary error function is half of the effective SNR $\sigma_X^2 / \sigma_{N'}^2$ after extracting the real part. Using this result and substituting the SNR by the SINR, we obtain for the considered CDMA system

    P_b \approx \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{\mathrm{SINR}}{2}} \right) = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{E_b}{N_0 + 2 (N_u - 1) E_b / N_s}} \right). \qquad (4.28)

Figure 4.11 shows the corresponding results. As predicted, the bit error probability increases dramatically with growing system load $\beta$. For large $\beta$, it is totally dominated by the interference.

Footnote 4: For BPSK, $E_b = E_s$ holds. Furthermore, we use the effective SNR $2E_b/N_0$ after extracting the real part since this determines the error rate in the single-user case.

Figure 4.10 SINR for downlink of DS-CDMA system with BPSK, random spreading ($N_s = 16$) and AWGN channel, $1 \le N_u \le 20$ (left: SINR in dB versus $N_u = \beta \cdot N_s$ with parameter $E_b/N_0$; right: SINR in dB versus $2E_b/N_0$ in dB with parameter $N_u$, $\beta$)

Figure 4.11 Bit error probability for downlink of DS-CDMA system with BPSK, random spreading ($N_s = 16$) and an AWGN channel, $1 \le N_u \le 20$ (BER versus $E_b/N_0$ in dB with parameter $N_u$, $\beta$)
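The Gaussian approximation in (4.27) and (4.28) is easy to evaluate numerically. The following sketch normalises $E_b = 1$; the function names and the chosen operating points are assumptions made for illustration only.

```python
import math


def sinr_downlink_bpsk(eb_n0_db, n_users, n_s):
    """Average SINR according to (4.27) with E_b normalised to 1."""
    n0 = 10 ** (-eb_n0_db / 10)               # N_0 for E_b = 1
    return 1.0 / (n0 / 2 + (n_users - 1) / n_s)


def ber_from_sinr(sinr):
    """Bit error probability according to (4.28)."""
    return 0.5 * math.erfc(math.sqrt(sinr / 2))


n_s = 16
for n_users in (1, 4, 8, 16):
    sinr = sinr_downlink_bpsk(10.0, n_users, n_s)
    print(f"N_u = {n_users:2d}: SINR = {10 * math.log10(sinr):6.2f} dB, "
          f"P_b = {ber_from_sinr(sinr):.3e}")
```

For a single user the expression reduces to the effective SNR $2E_b/N_0$, while for high $E_b/N_0$ the SINR saturates at $N_s/(N_u - 1)$, which is the interference-limited behaviour described above.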
2. No power control and complex symbols. If we use a complex QPSK symbol alphabet, the total noise power $\sigma_N^2$ instead of $\sigma_{N'}^2$ affects the decision and (4.27) becomes with $E_s = 2E_b$

    \mathrm{SINR} \approx \frac{E_s}{N_0 + (N_u - 1) E_s / N_s} = \frac{E_b}{N_0/2 + (N_u - 1) E_b / N_s}. \qquad (4.29)

This is the same expression in terms of $E_b$ as in (4.27). Therefore, the bit error rates of inphase and quadrature components equal exactly those of BPSK in (4.28) when $E_b$ is used. This result coincides with those presented in Section 1.4.

3. Power control and real symbols. As a last scenario, we look at a BPSK system with power control where the received power of a single user $v$ is much higher than that of the other users ($P_v \gg P_{u \neq v}$). The SINR results in

    \mathrm{SINR}_u \approx \frac{E_b}{N_0/2 + E_b / N_s \cdot \sum_{v \neq u} P_v}. \qquad (4.30)

Figure 4.12 Bit error probability for downlink of DS-CDMA system with power control, BPSK, random spreading ($N_s = 16$) and an AWGN channel, $N_u = 3$ users (BER versus $P_v$ with parameter $E_s/N_0$)
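As a rough numerical illustration of (4.30), the sketch below evaluates the SINR and the resulting bit error probability of a unit-power user while one interferer raises its power. $E_b$ is normalised to 1 and the scenario (three users, $N_s = 16$, chosen power steps) is an assumption for this example.

```python
import math


def sinr_with_power_control(eb_n0_db, interferer_powers, n_s):
    """SINR of (4.30) with E_b = 1: 1 / (N_0/2 + sum(P_v) / N_s)."""
    n0 = 10 ** (-eb_n0_db / 10)
    return 1.0 / (n0 / 2 + sum(interferer_powers) / n_s)


def ber(sinr):
    """Gaussian approximation P_b = 0.5 * erfc(sqrt(SINR / 2))."""
    return 0.5 * math.erfc(math.sqrt(sinr / 2))


n_s = 16
results = []
for p_v in (1.0, 10.0, 100.0):
    # one interferer with power p_v, a second one with unit power
    sinr = sinr_with_power_control(10.0, [p_v, 1.0], n_s)
    results.append((p_v, ber(sinr)))
    print(f"P_v = {p_v:6.1f}: P_b = {results[-1][1]:.3e}")
```

The error probability grows monotonically with $P_v$, and for very large $P_v$ the noise term $N_0/2$ becomes irrelevant, which is the lack of near–far resistance of the SUMF.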
Figure 4.12 shows the results obtained for $N_u = 3$ users, where one of the interferers varies its power level while the others keep their levels constant. Obviously, the performance degrades dramatically with growing power amplification $P_v$ of user $v$. For $P_v \rightarrow \infty$, the SNR has no influence anymore and the performance is dominated by the interferer. Hence, the SUMF is not near–far resistant.

Uplink Transmission

The main difference between uplink and downlink transmissions is the fact that in the first case each user is affected by its individual channel, whereas the signals arriving at a certain mobile are passed through the same channel in the downlink. We now assume a perfect power control that ensures the same power level for all users at the receiver. Note that this differs from the downlink where all users are influenced by the same channel and a power control would result in different power levels. Again, we restrict ourselves to the AWGN channel but allow random phase shifts by $\varphi_u$ on each channel. We distinguish two cases:

1. Real symbols and AWGN channel with random phases. After coherent reception by multiplying with $e^{-j\varphi_u}$, real-valued modulation schemes like BPSK benefit from the fact that the interference is distributed in the complex plane due to $e^{j(\varphi_v - \varphi_u)}$ with $\varphi_v - \varphi_u \neq 0$ while the desired signal is contained only in the real part. Hence, only half of the interfering power affects the real part and the average SINR becomes

    \mathrm{SINR} \approx \frac{E_s}{N_0/2 + 1/2 \cdot (N_u - 1) E_s / N_s} \overset{\mathrm{BPSK}}{=} \frac{2 E_b}{N_0 + (N_u - 1) E_b / N_s}. \qquad (4.31)

Figure 4.13 SINR for uplink of DS-CDMA system with BPSK, random spreading ($N_s = 16$) and AWGN channels with random phases, $1 \le N_u \le 20$ (left: SINR in dB versus $N_u = \beta \cdot N_s$ with parameter $E_b/N_0$; right: SINR in dB versus $E_b/(N_0/2)$ in dB with parameter $N_u$, $\beta$)
Figure 4.13 shows the corresponding results for AWGN channels. A comparison with Figure 4.10 shows that the SINRs are much larger, especially for high loads, and that a gain of 3 dB is asymptotically achieved. With regard to the performance,

    P_b \approx \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{\mathrm{SINR}}{2}} \right) = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{E_s}{N_0 + (N_u - 1) E_s / N_s}} \right) \qquad (4.32)

delivers the results depicted in Figure 4.14. A comparison with the downlink in Figure 4.11 illustrates the benefits of real-valued modulation schemes in the uplink, too. However, it has to be emphasized that complex modulation alphabets have a higher spectral efficiency, that is, more bits per symbol can be transmitted.
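The asymptotic 3 dB gain of BPSK in the uplink can be checked directly from (4.27) and (4.31). The sketch below again normalises $E_b = 1$; the helper name and the operating points are illustrative assumptions.

```python
import math


def sinr_bpsk(eb_n0_db, n_users, n_s, uplink):
    """SINR for BPSK: (4.27) for the downlink, (4.31) for the uplink, E_b = 1."""
    n0 = 10 ** (-eb_n0_db / 10)
    interference = (n_users - 1) / n_s
    if uplink:
        interference /= 2                     # only half of it hits the real part
    return 1.0 / (n0 / 2 + interference)


def gain_db(eb_n0_db, n_users, n_s=16):
    """Uplink SINR advantage over the downlink in dB."""
    up = sinr_bpsk(eb_n0_db, n_users, n_s, uplink=True)
    down = sinr_bpsk(eb_n0_db, n_users, n_s, uplink=False)
    return 10 * math.log10(up / down)


for eb_n0_db in (0.0, 10.0, 20.0, 100.0):
    print(f"E_b/N_0 = {eb_n0_db:5.1f} dB: gain = {gain_db(eb_n0_db, 16):.3f} dB")
```

The gain vanishes for a single user and approaches $10 \log_{10} 2 \approx 3$ dB once the interference dominates the noise.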
Figure 4.14 Bit error probability for uplink of DS-CDMA system with BPSK, random spreading codes ($N_s = 16$) and an AWGN channel with random phases, $1 \le N_u \le 20$ (BER versus $E_b/N_0$ in dB with parameter $N_u$, $\beta$)

2. Complex symbols and AWGN channel with random phases. It is straightforward to recognize that the distribution of the interference in the complex plane provides no advantage for complex-valued modulation schemes like QPSK. Therefore, the entire interfering power $\sigma_I^2$ disturbs the transmission and the SINR becomes

    \mathrm{SINR} \approx \frac{E_s}{N_0 + (N_u - 1) E_s / N_s} = \frac{E_b}{N_0/2 + (N_u - 1) E_b / N_s}. \qquad (4.33)
Comparing (4.31) with (4.33) in terms of $E_b$, we see that BPSK and QPSK behave differently in the uplink.

It has to be mentioned that orthogonal spreading codes could be employed for synchronous frequency-nonselective channels. In this case, no MUI would disturb the transmission and the above discussion would be superfluous. Nevertheless, the assumptions simplified the above analysis and the principal differences between uplink and downlink still hold for frequency-selective channels. For a synchronous downlink transmission, even the use of scrambled orthogonal sequences (see page 192) makes sense because synchronous signal components are perfectly suppressed and only asynchronous parts interfere. Furthermore, the equalization of the single channel could restore orthogonality. For totally asynchronous transmissions, orthogonal codes generally do not lead to any advantage.

We saw that the auto- and crosscorrelation properties of spreading codes play a crucial role in CDMA systems. Therefore, the next section briefly introduces some important code families. Afterwards, the performance of a single-user MF is discussed in the context of OFDM-CDMA and in coded environments.
4.1.4 Spreading Codes

Requirements for Spreading Codes

As illustrated above, the correlation properties of spreading codes have a deep impact on the performance of spread spectrum and CDMA systems. To simplify the notation, we consider the spreading codes $c_u[\ell, k]$ only within one symbol period. Hence, the time variable $\ell$ is a constant and can be neglected in this section. The even or periodic correlation function for real signals is defined by

    \phi^{\mathrm{even}}_{C_u C_v}[\kappa] = \sum_{k=0}^{N_s - 1} c_u[k] \, c_v[k + \kappa]. \qquad (4.34)

For $u = v$, it represents the even (periodic) autocorrelation function, and for $u \neq v$, the even crosscorrelation function. Regarding short codes, $\phi_{C_u C_v}[\kappa]$ is itself periodic with a period that equals the spreading factor $N_s$.

On the one hand, different propagation paths must be separated by the Rake receiver to exploit the diversity provided by a frequency-selective channel and to avoid path crosstalk. This requires an impulse-like shape of the autocorrelation function, that is, $\phi_{C_u C_u}[\kappa] \approx 0$ for $\kappa \neq 0$ is desirable. This property is also important for synchronization purposes. On the other hand, interfering users must be suppressed sufficiently by low crosscorrelations $\phi_{C_u C_{v \neq u}}[\kappa] \approx 0$ for arbitrary $\kappa$. Unfortunately, both conditions cannot be fulfilled simultaneously as shown in Sarvate and Pursley (1980). Hence, a trade-off between autocorrelation and crosscorrelation properties is required. Moreover, a large number of spreading sequences should exist to provide many users access to the system.

Regarding the uplink of a CDMA system, the transmission is generally asynchronous. Therefore, changes of the data symbols $a_u[\ell]$ occur during correlation and the definition given in (4.34) cannot be applied anymore. In fact, we need the nonperiodic correlation function

    \phi^{\mathrm{non}}_{C_u C_v}[\kappa] = \begin{cases} \sum_{k=0}^{N_s - 1 - \kappa} c_u[k] \cdot c_v[k + \kappa] & 0 \le \kappa < N_s \\ \sum_{k=0}^{N_s - 1 + \kappa} c_u[k - \kappa] \cdot c_v[k] & -N_s < \kappa \le 0. \end{cases} \qquad (4.35)

With (4.35), the even correlation function in (4.34) becomes

    \phi^{\mathrm{even}}_{C_u C_v}[\kappa] = \phi^{\mathrm{non}}_{C_u C_v}[\kappa] + \phi^{\mathrm{non}}_{C_u C_v}[N_s - \kappa]. \qquad (4.36)

Equivalently, the odd correlation function

    \phi^{\mathrm{odd}}_{C_u C_v}[\kappa] = \phi^{\mathrm{non}}_{C_u C_v}[\kappa] - \phi^{\mathrm{non}}_{C_u C_v}[N_s - \kappa] \qquad (4.37)

describes the correlation between two BPSK modulated signals $u[k]$ and $v[k]$ if they have a mutual delay $\kappa$ and one of the information symbols $a_u[\ell]$ or $a_v[\ell]$ changes its sign during correlation.
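The three correlation functions can be sketched in a few lines. The example sequences are arbitrary assumptions (the first one is the length-7 Barker sequence). The periodic and odd functions are computed directly from their definitions; with the sign convention of (4.35), the wrapped-around part is then the nonperiodic function evaluated at the negative lag $\kappa - N_s$, which coincides with the lag $N_s - \kappa$ for autocorrelations.

```python
def nonperiodic_corr(cu, cv, kappa):
    """Nonperiodic correlation of (4.35) for -N_s <= kappa < N_s."""
    n = len(cu)
    if kappa >= 0:
        return sum(cu[k] * cv[k + kappa] for k in range(n - kappa))
    return sum(cu[k - kappa] * cv[k] for k in range(n + kappa))


def even_corr(cu, cv, kappa):
    """Periodic (even) correlation of (4.34), index taken modulo N_s."""
    n = len(cu)
    return sum(cu[k] * cv[(k + kappa) % n] for k in range(n))


def odd_corr(cu, cv, kappa):
    """Odd correlation: the wrapped-around part enters with inverted sign."""
    n = len(cu)
    return sum(cu[k] * cv[(k + kappa) % n] * (1 if k + kappa < n else -1)
               for k in range(n))


cu = [1, 1, 1, -1, -1, 1, -1]                 # Barker sequence of length 7
cv = [1, -1, 1, 1, -1, -1, -1]                # arbitrary second sequence (assumed)
n_s = len(cu)

for kappa in range(n_s):
    direct = nonperiodic_corr(cu, cv, kappa)
    wrapped = nonperiodic_corr(cu, cv, kappa - n_s)
    print(kappa, even_corr(cu, cv, kappa), direct + wrapped,
          odd_corr(cu, cv, kappa), direct - wrapped)
```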
Orthogonal Spreading Codes

With respect to MUI, orthogonal spreading codes would be optimum because they suppress interfering signals perfectly. Examples of orthogonal codes are Hadamard codes or Walsh sequences (Harmuth 1964, 1971; Walsh 1923) that were already introduced as forward error correction (FEC) codes in Section 3.2.4. For a synchronous transmission over frequency-nonselective channels, signals can be perfectly separated because

    \phi_{C^w_u C^w_v}[\kappa = 0] = \begin{cases} 1 & u = v \\ 0 & \text{else} \end{cases} \qquad (4.38)

holds. Walsh sequences exist for those lengths $N_s$ for which $N_s$, $N_s/12$ or $N_s/20$ are powers of 2. The number of Walsh sequences of a given length $N_s$ equals exactly $N_s$. Since autocorrelation and crosscorrelation properties cannot be perfect at the same time, we can conclude that the autocorrelation properties of Walsh sequences are rather bad. Moreover, even the crosscorrelation can take quite large values for $\kappa \neq 0$. Hence, for asynchronous transmissions or frequency-selective mobile radio channels, the orthogonality is destroyed and severe interference makes a reliable transmission nearly impossible.

A solution to this problem is to combine Walsh sequences $c^w_u[k]$ with an outer scrambling code $c^s[k]$. This second code does not perform an additional spreading because the duration of its chips is the same as for $c^w_u[k]$. In cellular networks, $c^s[k]$ is usually identical for all users within the same cell (Dahlman et al. 1998; Gilhousen et al. 1991) and allows for cell identification. Therefore, the new code

    c_u[k] = c^w_u[k] \cdot c^s[k] \qquad (4.39)

maintains orthogonality for synchronous signal components and suppresses asynchronous parts like a random code.

Maximum Length Sequences (m-sequences)

To allow an efficient implementation, spreading codes are often generated by feedback shift registers. Figure 4.15 shows an example with a register length of $m = 9$. The register can be initialized arbitrarily with $\pm 1$ and generates a periodic sequence. If the polynomial $g(D) = g_0 + g_1 D + \cdots + g_m D^m$ describing the feedback structure of the register is primitive, the period of the sequence is maximized to $2^m - 1$ and the register passes through all $2^m - 1$ states except the all-one-state within one period. Therefore, these sequences are termed m-sequences or maximum length sequences. One important property of m-sequences is that they have a near-optimum autocorrelation function

    \phi^{\mathrm{m\text{-}seq}}_{C_u C_u}[\kappa] = \begin{cases} 1 & \text{for } \kappa = 0 \\ -1/N_s & \text{else.} \end{cases} \qquad (4.40)

Hence, they are called quasi-orthogonal since they tightly approach an impulse-like shape. The power of asynchronous replicas of the desired signal can be suppressed by a factor $N_s^{-2}$. With respect to the crosscorrelation, m-sequences perform much worse. Moreover, given a certain spreading factor $N_s$, there exist only a few m-sequences. This dramatically limits the applicability in CDMA systems because only few users can be supported.

Figure 4.15 Feedback shift register of length $m = 9$ for generating an m-sequence with period $2^9 - 1 = 511$ (feedback taps $g_0 = g_4 = g_9 = 1$, $g_1 = g_2 = g_3 = 0$, $g_5 = g_6 = g_7 = g_8 = 0$; output $c[k]$ scaled by $1/\sqrt{N_s}$)

As an example, there exist only six m-sequences of length $N_s = 31$ whose feedback polynomials are

    g_1(D) = 1 + D^2 + D^5                  g_2(D) = 1 + D^3 + D^5
    g_3(D) = 1 + D + D^2 + D^3 + D^5        g_4(D) = 1 + D + D^2 + D^4 + D^5
    g_5(D) = 1 + D + D^3 + D^4 + D^5        g_6(D) = 1 + D^2 + D^3 + D^4 + D^5.     (4.41)
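A short sketch shows how such a sequence can be generated and checked. It assumes that the exponents of $g_1(D)$ directly define the recursion $s[k] = s[k-2] \oplus s[k-5]$ and maps bits to chips via $c = 1 - 2s$ without the $1/\sqrt{N_s}$ scaling; whether this matches the exact register layout of Figure 4.15 is not claimed here, and the function names are hypothetical.

```python
def lfsr_bits(exponents, length):
    """Generate s[k] = XOR of s[k - i] over all i in exponents (i >= 1)."""
    m = max(exponents)
    state = [1] * m                           # state[i - 1] holds s[k - i]
    out = []
    for _ in range(length):
        feedback = 0
        for i in exponents:
            feedback ^= state[i - 1]
        out.append(feedback)
        state = [feedback] + state[:-1]       # shift register by one position
    return out


def periodic_autocorr(c, kappa):
    n = len(c)
    return sum(c[k] * c[(k + kappa) % n] for k in range(n))


m = 5
n_s = 2 ** m - 1
bits = lfsr_bits((2, 5), 2 * n_s)             # g_1(D) = 1 + D^2 + D^5
chips = [1 - 2 * b for b in bits[:n_s]]       # map {0, 1} to {+1, -1}

print("period check:", bits[:n_s] == bits[n_s:])
print("autocorrelation:", [periodic_autocorr(chips, k) for k in range(n_s)])
```

Without normalization, the periodic autocorrelation equals $N_s$ at $\kappa = 0$ and $-1$ elsewhere; dividing by $N_s$ gives the values in (4.40).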
Gold Codes

Gold discovered in 1967 that the crosscorrelation between certain pairs of m-sequences takes only three different values. Moreover, such preferred pairs can be used to construct a whole family of codes that have the same period as well as the same correlation property (Gold 1967). This is accomplished by multiplying the outputs of the corresponding shift registers as shown in Figure 4.16. Different codes are generated by inserting different delays between both registers. The delay $n$ can be adjusted between $n = 0$ and $n = 2^m - 1$. Hence, a set of $2^m + 1$ Gold codes can be constructed on the basis of a preferred pair of m-sequences, including the two generating m-sequences themselves.

Figure 4.16 Pair of feedback shift registers of length $m = 9$ for generating a Gold code of length $2^9 - 1 = 511$ (the output of the second register is delayed by $z^{-n}$ and multiplied with the output of the first; $c[k]$ is scaled by $1/\sqrt{N_s}$)

Table 4.1 Preferred pairs of m-sequences for generation of Gold codes with length 31 (x: preferred pair)

              g1(D)   g2(D)   g3(D)   g4(D)   g5(D)   g6(D)
    g1(D)       –               x       x       x       x
    g2(D)               –       x       x       x       x
    g3(D)       x       x       –       x       x
    g4(D)       x       x       x       –               x
    g5(D)       x       x       x               –       x
    g6(D)       x       x               x       x       –

Table 4.1 shows preferred pairs of m-sequences for the feedback polynomials listed in (4.41). The sequence length is $N_s = 31$ (register length $m = 5$). Gold codes can also be generated by a single shift register whose length is twice as large as $m$ (Pickholtz et al. 1982). Consequently, Gold codes are not maximum length sequences. Furthermore, they exist only for shift registers whose lengths are not multiples of four. The even crosscorrelation function of Gold codes takes the values

    \phi^{\mathrm{Gold}}_{C_u C_v}[\kappa] = \frac{1}{N_s} \cdot \begin{cases} -1 \\ -2^{\lfloor (m+2)/2 \rfloor} - 1 \\ 2^{\lfloor (m+2)/2 \rfloor} - 1, \end{cases} \qquad (4.42)

where $\lfloor x \rfloor$ denotes the integer part of $x$ (Sarvate and Pursley 1980).

A large number of sequences like Kasami sequences or complex-valued spreading codes exist, which are not addressed in this book. Since code optimization is not the focus here, random codes without any designed correlation properties are often used.
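The construction can be sketched for $m = 5$. The example assumes that $g_1(D)$ and $g_6(D)$ from (4.41) form a preferred pair and that each polynomial defines the recursion $s[k] = \bigoplus_i s[k-i]$ over its exponents; the helper names are hypothetical. Correlations are left unnormalized, so (4.42) predicts the values $-1$, $-9$ and $7$.

```python
def lfsr_bits(exponents, length):
    """Generate s[k] = XOR of s[k - i] over all i in exponents (i >= 1)."""
    state = [1] * max(exponents)              # state[i - 1] holds s[k - i]
    out = []
    for _ in range(length):
        feedback = 0
        for i in exponents:
            feedback ^= state[i - 1]
        out.append(feedback)
        state = [feedback] + state[:-1]
    return out


def bipolar(bits):
    return [1 - 2 * b for b in bits]


def periodic_corr(cu, cv, kappa):
    n = len(cu)
    return sum(cu[k] * cv[(k + kappa) % n] for k in range(n))


m = 5
n_s = 2 ** m - 1
u = bipolar(lfsr_bits((2, 5), n_s))           # g_1(D) = 1 + D^2 + D^5
v = bipolar(lfsr_bits((2, 3, 4, 5), n_s))     # g_6(D) = 1 + D^2 + D^3 + D^4 + D^5

# Both m-sequences plus all chip-wise products with relative delay n
family = [u, v] + [[u[k] * v[(k + n) % n_s] for k in range(n_s)]
                   for n in range(n_s)]

values = set()
for i in range(len(family)):
    for j in range(i + 1, len(family)):
        for kappa in range(n_s):
            values.add(periodic_corr(family[i], family[j], kappa))

print(len(family), "codes, crosscorrelation values:", sorted(values))
```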
4.2 OFDM-CDMA

4.2.1 Multicarrier Transmission

The above sections showed that spreading with sequences having appropriate correlation properties is a suitable means to overcome the frequency selectivity of the transmission channel. Moreover, diversity is gained by separating different propagation paths during despreading and combining them according to the MRC principle (see Rake receiver on page 178). A different way to treat frequency selectivity is to use narrow-band signals whose bandwidth $B$ is much smaller than the coherence bandwidth of the channel. In this case, ISI can be neglected and equalization is carried out by multiplying the received signal with the complex conjugate channel coefficient. However, a small bandwidth corresponds to large symbol durations and low data rates. To overcome this penalty, multiple narrow-band data streams can be transmitted in parallel on different subcarriers.

Figure 4.17 shows the structure of a MC transmitter. As in a conventional single-carrier system, the bits $d[i]$ are fed to a linear $M = 2^m$-ary mapper. Afterwards, the symbols $\tilde{a}[\ell]$ of duration $T_s$ are mapped onto $N_c$ parallel streams $a[l, \mu] = \tilde{a}[l N_c + \mu]$ with $0 \le \mu < N_c$ where they are lowpass filtered with $g_T(t)$ (refer to Section 1.2). Next, the signals are multiplied with their associated carriers. Note that the frequencies $f_\mu$ do not describe the transmission band but still the equivalent baseband, that is, $-f_{\max} \le f_\mu \le +f_{\max}$ holds. The complex baseband signal has the form

    x_{\mathrm{MC}}(t) = \sum_{l=-\infty}^{\infty} \sum_{\mu=0}^{N_c - 1} a[l, \mu] \cdot g_T(t - l T_{\mathrm{MC}}) \cdot e^{j 2\pi f_\mu t}. \qquad (4.43)

Figure 4.17 Structure of multicarrier transmitter (mapper, serial-to-parallel conversion of $\tilde{a}[\ell]$ into $a[l, 0], \ldots, a[l, N_c - 1]$, filtering with $g_T(t)$, multiplication with $e^{j 2\pi f_\mu t}$ and summation to $x(t)$)

The symbol rate of one MC symbol amounts to

    \frac{1}{T_{\mathrm{MC}}} = \frac{1}{N_c \cdot m \cdot T_b} = \frac{1}{N_c \cdot T_s} \qquad (4.44)

and the bandwidth of each sub-stream is $N_c$ times smaller than for a single-carrier transmission.

However, MC systems also come with some drawbacks. First, the superposition of $N_c$ independent signals results in a complex envelope $|x(t)|$ that varies over a wide range. This is especially a problem for power amplifiers. They are designed to work efficiently at a defined working point, that is, a certain magnitude of the signal. If this working point is exceeded, the amplifier loses its linearity and efficiency. Therefore, signals whose magnitudes have a large peak-to-average ratio are often nonlinearly distorted. This leads to a higher error rate as well as to out-of-band radiation that is particularly harmful in FDMA systems. Additionally, the narrow bandwidth leads to flat fading conditions on each subcarrier. Therefore, no frequency diversity can be exploited.
4.2.2 Orthogonal Frequency Division Multiplexing

OFDM represents a special kind of MC technique. The first ideas for a MC transmission can be traced back to (Saltzberg 1967; Weinstein 1971) and have been reinvented several times (Bingham 1990; Kammeyer et al. 1992; Kolb 1981). A practical breakthrough came with the definition of the European terrestrial digital audio broadcasting (DAB) (Hoeg and Lauterbach 2001; Schulze and Lüders 2005). DAB employed OFDM for the first time in a mobile radio application. Shortly after DAB, the terrestrial digital video broadcasting (DVB-T), also employing OFDM, was defined (Reimers 1995; Schulze 1998). Today, OFDM is used in wireless local area networks (WLAN) according to the HIPERLAN/2 or the IEEE 802.11a standards (ETSI 2000, 2001; Hiperlan2 1999). OFDM is also applied in wired transmission systems like digital subscriber line (DSL) technologies. In this context, the MC technique is termed discrete multitone (DMT).

Transmitter Design

Starting with the above-described MC transmitter, we obtain an OFDM transmitter by using the following specifications. The lowpass filter $g_T(t)$ has a rectangular shape

    g_T(t) = \begin{cases} 1 & \text{for } 0 \le t < T_{\mathrm{MC}} \\ 0 & \text{else.} \end{cases} \qquad (4.45)

Since the rectangular impulse in (4.45) corresponds to the sinc-function in the frequency domain

    \left| G_T(j 2\pi f) \right| = \left| \frac{\sin(2\pi f T_{\mathrm{MC}}/2)}{2\pi f T_{\mathrm{MC}}/2} \right| = \left| \operatorname{sinc}(\pi f T_{\mathrm{MC}}) \right|, \qquad (4.46)

the spectrum has equidistant zeros at multiples of $\Delta f = 1/T_{\mathrm{MC}}$. Adjusting the carrier spacing according to $f_\mu - f_{\mu-1} = \Delta f = 1/T_{\mathrm{MC}}$, the sinc-functions overlap such that the maximum of each subcarrier coincides with the zeros of all other sinc-functions. Therefore, the first Nyquist condition is fulfilled in the frequency domain and parallel signals are mutually orthogonal so that no interference disturbs the transmission. This explains the naming 'orthogonal' in OFDM and is illustrated in Figure 4.18.

In Chapter 1, we derived a time-discrete channel model that is used for subsequent investigations. Therefore, we do not proceed with the time-continuous signal $x_{\mathrm{MC}}(t)$ but with a sampled version of it. Inserting (4.45) into (4.43), we obtain with $f_\mu = \mu/T_{\mathrm{MC}} = \mu/(N_c T_s)$ the time-discrete signal

    x[\ell] := x(\ell T_s) = \sum_{\mu=0}^{N_c - 1} a[l, \mu] \cdot e^{j 2\pi \mu \ell T_s/(N_c T_s)} = \sum_{\mu=0}^{N_c - 1} a[l, \mu] \cdot e^{j 2\pi \mu \ell / N_c} = N_c \cdot \mathrm{IDFT}_{(\mu)}\{a[l, \mu]\} \qquad (4.47)

of rate $1/T_s = N_c/T_{\mathrm{MC}}$. The samples in the interval $l N_c \le \ell < (l + 1) N_c$ represent the $l$-th OFDM symbol of length $N_c$. Moreover, we recognize from (4.47) that the OFDM transmitter can be efficiently implemented with the inverse fast Fourier transform (IFFT). Figure 4.19 shows the whole OFDM transmitter.
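Equation (4.47) can be sketched with a direct (non-fast) transform. The example computes one OFDM symbol from hypothetical QPSK symbols and recovers them with a DFT at the receiver; a practical implementation would use an FFT library instead of the quadratic-complexity sums shown here.

```python
import cmath


def ofdm_modulate(symbols):
    """x[n] = sum_mu a[mu] * exp(j 2 pi mu n / N_c), i.e. N_c * IDFT as in (4.47)."""
    n_c = len(symbols)
    return [sum(a * cmath.exp(2j * cmath.pi * mu * n / n_c)
                for mu, a in enumerate(symbols))
            for n in range(n_c)]


def dft(x):
    """Unnormalized discrete Fourier transform."""
    n_c = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * mu * n / n_c)
                for n, v in enumerate(x))
            for mu in range(n_c)]


# One OFDM symbol with N_c = 8 subcarriers carrying QPSK symbols (assumed)
a = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
x = ofdm_modulate(a)
recovered = [v / len(a) for v in dft(x)]

# Two different sampled subcarriers are orthogonal over one OFDM symbol
n_c = len(a)
inner = sum(cmath.exp(2j * cmath.pi * 1 * n / n_c) *
            cmath.exp(-2j * cmath.pi * 3 * n / n_c) for n in range(n_c))
print(max(abs(r - s) for r, s in zip(recovered, a)), abs(inner))
```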
B = Nc / TMC ...
...
... f0 f 1 f2
...
... fNc −1
Figure 4.18 Spectrum of OFDM transmit signal
f
CODE DIVISION MULTIPLE ACCESS
197
Figure 4.19 Structure of OFDM transmitter (the symbols a[l, 0], ..., a[l, Nc − 1] pass through the IFFT and a parallel-to-serial converter (P/S) to form x[k]; the guard interval is inserted before the time-discrete channel)

The Guard Interval

The most important property of OFDM is the orthogonality in the time domain as well as in the frequency domain. In the time domain, this is ensured by using nonoverlapping rectangular impulses gT(t); in the frequency domain, the specific spacing of the subcarriers avoids intercarrier interference (ICI). However, this orthogonality is destroyed by frequency-selective channels as depicted in Figure 4.20. Without loss of generality, we regard the OFDM symbol at time instant 0 and its direct neighbors. If Lt is the length of the channel impulse response, Lt − 1 samples of consecutive OFDM symbols overlap due to the post-transient phase. This effect is termed ISI and destroys orthogonality in time. Moreover, the pre-transient phase leads to ICI.

Orthogonality can be maintained by inserting a cyclic prefix, called the guard interval, in front of each OFDM symbol. As demonstrated in Figure 4.21, each OFDM symbol is preceded by a copy of its own tail in the guard interval. If the length Ng of the guard interval is at least as long as the length Lt of the channel impulse response, the post-transient phase of symbol −1 and the pre-transient phase of symbol 0 are restricted to the guard interval and do not affect the core OFDM symbols. The insertion of the cyclic prefix results in a cyclic convolution of channel impulse response and OFDM symbol. According to the properties of the discrete Fourier transform (DFT), a cyclic convolution of two signals corresponds to the product of the associated DFT spectra. Hence, for κmax < Ng, we obtain

\[
\tilde{x}[k] = x[k] \circledast h[l, k - lN_c]
\quad \circ\!-\!\bullet \quad
\tilde{X}[l,\mu] = \underbrace{\mathrm{DFT}_k\{x[k]\}}_{a[l,\mu]} \cdot \underbrace{\mathrm{DFT}_k\{h[l,k]\}}_{H[l,\mu]} = a[l,\mu] \cdot H[l,\mu]
\tag{4.48}
\]

where $\circledast$ denotes the cyclic convolution with respect to k. We can conclude from (4.48) that the symbols a[l, µ] on each subchannel are multiplied by a scalar complex channel
Figure 4.20 Influence of frequency-selective channel on OFDM transmission (OFDM symbols l = −1, 0, +1 of length Nc and the received symbols after the channel h[l, κ]; the post-transient phase causes ISI, the pre-transient phase causes ICI; time axis t marked at 0, Nc, 2Nc)

Figure 4.21 Effect of guard interval (OFDM symbols l = −1, 0, +1, each preceded by a guard interval of length Ng; the post-transient phase (ISI) and the pre-transient phase (ICI) of the received symbols fall into the guard intervals; time axis t marked at −(Nc + Ng), −Ng, 0, Nc, Nc + Ng)
coefficient H[l, µ]. Therefore, we have flat fading conditions on each subcarrier and no diversity can be gained for uncoded OFDM transmissions.

Receiver Design

The above-mentioned properties simplify the receiver structure remarkably, as depicted in Figure 4.22. First, the guard interval is removed and only the core OFDM symbol in the time interval lNc ≤ k < (l + 1)Nc is processed. Since this also removes the pre-transient and post-transient phases, we have a steady state within each core symbol and can process successive OFDM symbols independently. Furthermore, orthogonality between subcarriers is maintained. After transforming the core symbol back into the frequency domain, we obtain the received vector at time instant l

\[
\tilde{y}[k] = x[k] \circledast h[l, k - lN_c] + \tilde{n}[k]
\quad \circ\!-\!\bullet \quad
y[l,\mu] = H[l,\mu] \cdot a[l,\mu] + n[l,\mu].
\tag{4.49}
\]

Owing to the cyclic prefix, we have transformed the frequency-selective channel into a flat fading channel on each subcarrier, and the received vector can be described in the frequency domain by

\[
\mathbf{y}[l] = \mathbf{H}[l] \cdot \mathbf{a}[l] + \mathbf{n}[l]
\tag{4.50}
\]

where $\mathbf{H}[l] = \mathrm{diag}\big(H[l,0] \,\cdots\, H[l,N_c-1]\big)$ represents a matrix with the transfer function on the main diagonal. The vectors a[l] and n[l] contain the information symbols a[l, µ] and the noise samples n[l, µ] for 0 ≤ µ < Nc, respectively. The noise statistics are not affected by the Fourier transformation because it is orthogonal, that is, n[l, µ] is still Gaussian distributed with zero mean and variance $\sigma_N^2 = N_0/T_s$. According to Figure 4.22, equalization is now performed by weighting the symbols y[l, µ] with scalar equalizer coefficients:

\[
\hat{a}[l,\mu] = E[l,\mu] \cdot y[l,\mu]
\quad \Leftrightarrow \quad
\hat{\mathbf{a}}[l] = \mathbf{E}[l] \cdot \mathbf{y}[l]
\tag{4.51}
\]
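The chain from (4.48) to (4.51) can be checked numerically. The following is a minimal sketch, not the book's implementation: it assumes a single noise-free OFDM symbol, a time-invariant channel, and a guard interval of at least Lt − 1 samples. The function names `dft`, `idft` and `ofdm_link` are hypothetical, and a plain O(N²) DFT stands in for the FFT to keep the example self-contained.

```python
import cmath


def dft(x):
    """Unnormalized DFT (O(N^2)); stands in for the FFT in this small example."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * mu * k / n) for k in range(n))
            for mu in range(n)]


def idft(a):
    """Inverse DFT with 1/N scaling, so that dft(idft(a)) returns a."""
    n = len(a)
    return [sum(a[mu] * cmath.exp(2j * cmath.pi * mu * k / n) for mu in range(n)) / n
            for k in range(n)]


def ofdm_link(a, h, n_guard):
    """Send one OFDM symbol a over impulse response h (no noise), equalize with ZF.

    Assumption: n_guard >= len(h) - 1 >= 1, so the linear convolution acts
    as a cyclic convolution on the core symbol.
    """
    n_c = len(a)
    x = idft(a)                                   # core symbol in the time domain
    tx = x[-n_guard:] + x                         # cyclic prefix = copy of the tail
    rx = [sum(h[kappa] * tx[k - kappa]
              for kappa in range(len(h)) if 0 <= k - kappa < len(tx))
          for k in range(len(tx) + len(h) - 1)]   # linear convolution with channel
    core = rx[n_guard:n_guard + n_c]              # remove guard interval
    y = dft(core)                                 # back to the frequency domain
    h_freq = dft(list(h) + [0.0] * (n_c - len(h)))
    a_hat = [y[mu] / h_freq[mu] for mu in range(n_c)]   # E = 1/H as in (4.51)
    return y, h_freq, a_hat


a = [1, -1, 1j, -1j, 1, 1, -1, -1]
y, h_freq, a_hat = ofdm_link(a, h=[1.0, 0.5, 0.25], n_guard=2)
print(max(abs(y[m] - h_freq[m] * a[m]) for m in range(len(a))))  # numerically zero
```

Each received subcarrier sample equals H[l, µ] · a[l, µ], so a single complex division per subcarrier recovers the symbols. Shortening the guard interval below Lt − 1 breaks the cyclic structure, and this identity no longer holds.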
Figure 4.22 Structure of OFDM receiver (the guard interval is removed from the output ỹ[k] of the time-discrete channel; after serial-to-parallel conversion (S/P) and FFT, each sample y[l, µ] is weighted with E[l, µ] to give â[l, µ], µ = 0, ..., Nc − 1)
In (4.51), E[l] is a diagonal matrix containing the equalizer coefficients E[l, µ] of each subcarrier. For the single-user case, $E[l,\mu] = H^{-1}[l,\mu]$ is an appropriate choice, that is, the channel is perfectly equalized according to the zero-forcing (ZF) criterion. However, estimates on very weak subchannels are very unreliable, leading to high error rates in uncoded systems. If channel coding is applied, reliability information has to be passed to the decoder for each bit. For BPSK and QPSK, this is equivalent to using $E[l,\mu] = H^{*}[l,\mu]$. For modulation schemes with M > 4, appropriate soft-output demodulation has to be performed. Further information on equalization techniques in a multiuser context is presented in the next subsection.

The main benefit of OFDM is the extremely simple equalization. This advantage is obtained at the expense of a reduced spectral efficiency due to the insertion of the cyclic prefix. Removing the prefix at the receiver results in a violation of the MF principle and consequently in an SNR loss

\[
\delta^2 = 1 - \frac{N_g}{N_c + N_g} = \frac{N_c}{N_c + N_g} = \frac{1}{1 + N_g/N_c}.
\tag{4.52}
\]
Obviously, the SNR loss and, equivalently, the efficiency of OFDM depend on the ratio Ng/Nc, that is, the length of the guard interval relative to the core OFDM symbol. The larger the ratio, the larger the loss (smaller δ²). A practical value is Ng/Nc = 0.25, resulting in a loss of approximately 1 dB. Since the minimum length of the guard interval is fixed by the channel length Lt, the efficiency can be improved by increasing the number of carriers Nc. However, this also enlarges the total symbol duration Nc + Ng, which may become critical for time-varying channels.5 Therefore, we always have to find a compromise between frequency selectivity, time selectivity, and spectral efficiency in OFDM systems. It is often stated that OFDM has a lower spectral efficiency compared to single-carrier systems due to the guard interval. However, a fair comparison can only be drawn if real transmit filters gT(t) with similarly sharp edges in the power density spectrum, for example root raised cosine filters with a realistic roll-off factor, are taken into consideration.
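As a quick numerical check of (4.52), the loss can be evaluated in decibels. This is only a sketch; the function name `guard_interval_loss_db` and the example lengths (16, 64, 256 samples) are illustrative assumptions, not values from the text.

```python
import math


def guard_interval_loss_db(n_guard, n_core):
    """SNR loss in dB caused by discarding the cyclic prefix, following (4.52)."""
    delta_sq = n_core / (n_core + n_guard)   # delta^2 = Nc / (Nc + Ng)
    return -10.0 * math.log10(delta_sq)


# Ng/Nc = 0.25 gives roughly 1 dB, as stated in the text.
print(round(guard_interval_loss_db(16, 64), 2))
# Same guard interval with four times as many subcarriers: smaller loss.
print(round(guard_interval_loss_db(16, 256), 2))
```

The second call illustrates the remark that, for a fixed channel length, increasing Nc improves the efficiency at the price of a longer OFDM symbol.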
4.2.3 Combining OFDM and CDMA

Spread spectrum and OFDM represent two opposite techniques for treating frequency-selective channels. Spread spectrum systems use large bandwidths in order to separate the propagation paths with the Rake receiver. Since this separation is not perfect, the receiver suffers from path crosstalk. On the contrary, OFDM divides the signal into parallel narrow-band components, each of which experiences only a flat channel. This allows very simple receivers at the expense of an SNR loss due to the guard interval. A promising candidate for future mobile radio systems is the combination of both approaches (Hanzo et al. 2003a,b). There exist several possibilities to combine MC techniques and CDMA (DaSilva and Sousa 1993; Dekorsy 2000; Fazel and Papke 1993; Kaiser 1998; Vandendorpe 1995; Yee et al. 1993). We restrict ourselves to the combination of OFDM and CDMA called OFDM-CDMA with spectral spreading in the frequency domain (Dekorsy 2000; Kühn et al. 2000b).

5 It was always assumed that the channel remains constant during one OFDM symbol. Otherwise, the orthogonality between subcarriers is lost.
Figure 4.23 Structure of OFDM-CDMA system (spreading, S/P, OFDM transmitter, channel, OFDM receiver, P/S, despreading)

Figure 4.24 Alternative description of OFDM-CDMA system (S/P conversion of au[l] into Nb parallel symbols, spreading with Cu[l] by the factor Ns to give xu[l], OFDM channel Hu[l] with additive n[l] + MUI yielding y[l], despreading with Eu[l], and P/S conversion to âu[l])
Figure 4.23 shows the basic structure of an OFDM-CDMA system. It simply consists of a serial concatenation of the classical DS spreading part and the block-oriented OFDM part. In contrast to single-carrier CDMA, spreading is now performed in the frequency domain. This is demonstrated in more detail in Figure 4.24. Generally, the serial-to-parallel converter extracts Nb symbols out of the symbol stream ãu of user u that contribute to one OFDM symbol. Each of these symbols is spread by a factor Ns, resulting in Nc = Nb · Ns chips modulating the different subcarriers. Since these chips represent the input of the IFFT block, spreading takes place in the frequency domain. Exploiting the OFDM properties derived in the last section, we obtain Nc parallel data streams that are weighted with the coefficients Hu[l, µ] of the channel transfer function (flat fading) and disturbed by independent noise samples n[l, µ] as well as potential multiuser interference. Hence, path crosstalk known from the Rake receiver is avoided at the expense of a reduced spectral efficiency due to the cyclic prefix. Assuming a quasi-synchronous transmission, exactly this prefix enables us to perform simple block-oriented signal processing at the receiver because interblock interference is mitigated.6 On the contrary, single-carrier CDMA requires that a sequence of consecutive symbols, which mutually interfere, be considered. The l-th received OFDM symbol $\mathbf{y}[l] = \big[y_0[l] \,\cdots\, y_{N_c-1}[l]\big]^T$ has the well-known form

\[
\mathbf{y}[l] = \mathbf{S}[l] \cdot \mathbf{a}[l] + \mathbf{n}[l],
\tag{4.53}
\]

where a[l] comprises the symbols of all users contributing to this symbol

\[
\mathbf{a}[l] = \big[a_1[l,0] \,\cdots\, a_1[l,N_b-1] \,\cdots\, a_{N_u}[l,N_b-1]\big]^T.
\tag{4.54}
\]

6 Quasi-synchronous means that the sum of maximal mutual delay among all users and maximal channel delay is limited to the length of the guard interval. In this case, the core OFDM symbols of all users can be processed in a single step and intersymbol interference is mitigated.
Synchronous Downlink Transmission

As explained before on page 186, the downlink toward a specific user is characterized by the fact that all signals experience the same channel. Looking at the u-th mobile receiver, the signature matrix S[l] becomes

\[
\mathbf{S}_u[l] = \mathbf{H}_u[l] \cdot \mathbf{C}[l]
\tag{4.55}
\]

with $\mathbf{H}_u[l] = \mathrm{diag}\big(H_u[l,0] \,\cdots\, H_u[l,N_c-1]\big)$ containing the channel transfer function of user u on its diagonal. The spreading matrix $\mathbf{C}[l] = \big[\mathbf{C}_1[l] \,\cdots\, \mathbf{C}_{N_u}[l]\big]$ concatenates the user-specific matrices

\[
\mathbf{C}_u[l] =
\begin{bmatrix}
\mathbf{c}_u[l,0] & & \\
& \ddots & \\
& & \mathbf{c}_u[l,N_b-1]
\end{bmatrix}
\tag{4.56}
\]

which consist of vectors $\mathbf{c}_u[l,\mu] = \big[c_{u,0}[l,\mu] \,\cdots\, c_{u,N_s-1}[l,\mu]\big]^T$ representing the spreading code of user u for symbol au[l, µ]. If only a single symbol is mapped onto one OFDM symbol, the matrices Cu[l] reduce to column vectors. In multirate CDMA systems with various symbol rates and spreading factors, the number Nb of symbols associated with one OFDM symbol and, hence, the length of the spreading code parts cu[l, µ] vary among users. This leads to different structures of the matrices Cu[l]. At the receiver, we now have to perform the despreading. The specific form of the signature matrix in (4.55) allows different, very efficient ways of despreading (Fazel and Kaiser 2003; Hanzo et al. 2003a).

Maximum ratio combining (MRC)

According to the classical MF, we have to multiply y[l] with the Hermitian form of $\mathbf{S}_u[l] = \mathbf{H}_u[l] \cdot \mathbf{C}_u[l]$, leading to

\[
\mathbf{r}_u[l] = \mathbf{E}_u^{\mathrm{MRC}}[l] \cdot \mathbf{y}[l] = \mathbf{S}_u^H[l] \cdot \mathbf{y}[l] = \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^H[l] \cdot \mathbf{y}[l].
\tag{4.57}
\]
Multiplication with $\mathbf{H}_u^H[l]$ weights each chip with the corresponding conjugate complex channel coefficient, ensuring a coherent reception. The subsequent despreading with $\mathbf{C}_u^T[l]$ performs an MRC. Inserting the structure of y[l] into (4.57), the Nb output symbols for user u are obtained by

\[
\mathbf{r}_u[l] = \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^H[l] \cdot \mathbf{H}_u[l] \cdot \mathbf{C}[l] \cdot \mathbf{a}[l] + \tilde{\mathbf{n}}[l].
\tag{4.58}
\]
Although this approach maximizes the SNR and perfectly exploits diversity, it does not consider MUI, which dramatically limits the system performance (Dekorsy 2000; Kaiser 1998). The diagonal matrix $\mathbf{H}_u^H[l] \cdot \mathbf{H}_u[l]$ between $\mathbf{C}_u^T[l]$ and $\mathbf{C}[l]$ in (4.58) destroys the orthogonality of the spreading codes because the chips of the spreading codes are weighted with different magnitudes. The performance degradation is the same as in single-carrier CDMA systems.

Orthogonal restoring combining (ORC)

The influence of MUI can be easily overcome in OFDM-CDMA systems. Restoring the orthogonality is possible by perfectly equalizing the channel, also known as the ZF solution (Fazel and Kaiser 2003). In OFDM-based systems, this is easily implemented by dividing each symbol in y[l] by the corresponding channel coefficient. With $\mathbf{H}_u^{-1}[l] = \mathrm{diag}\big(H_u^{-1}[l,0] \,\cdots\, H_u^{-1}[l,N_c-1]\big)$, we obtain

\[
\begin{aligned}
\mathbf{r}_u[l] = \mathbf{E}_u^{\mathrm{ORC}}[l] \cdot \mathbf{y}[l]
&= \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^{-1}[l] \cdot \big(\mathbf{H}_u[l] \cdot \mathbf{C}[l] \cdot \mathbf{a}[l] + \mathbf{n}[l]\big) \\
&= \mathbf{C}_u^T[l] \cdot \mathbf{C}[l] \cdot \mathbf{a}[l] + \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^{-1}[l] \cdot \mathbf{n}[l].
\end{aligned}
\tag{4.59}
\]

If the partial spreading codes cu[l, µ] of different users are mutually orthogonal, $\mathbf{C}_u^T[l] \cdot \mathbf{C}[l] = \big[\mathbf{0}_{N_b \times (u-1)N_b} \;\; \mathbf{I}_{N_b} \;\; \mathbf{0}_{N_b \times (N_u-u)N_b}\big]$ holds. Hence, the multiplication with $\mathbf{C}_u^T[l]$ suppresses all users except user u and (4.59) becomes

\[
\mathbf{r}_u[l] = \mathbf{a}_u[l] + \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^{-1}[l] \cdot \mathbf{n}[l].
\tag{4.60}
\]
We see that the desired symbols au[l] have been perfectly extracted, and only the modified background noise disturbs a decision. However, this background noise is often significantly amplified by the division by small channel coefficients, leading to high error probabilities, especially at low SNRs. This effect is well known from ZF equalization (Kammeyer 2004) and linear multiuser detection (Moshavi 1996). A comparison with the linear ZF detector in Subsection 5.2.1 on page 234 shows the following equivalence. For a fully loaded system with Ns = Nu, C[l] is an orthogonal Nu × Nu matrix. Neglecting time indices, the ZF criterion (4.59) delivers with S = HC

\[
\mathbf{E} = \big(\mathbf{S}^H \mathbf{S}\big)^{-1} \mathbf{S}^H = \mathbf{C}^{-1} \mathbf{H}^{-1} \mathbf{H}^{-H} \mathbf{C}^{-H} \mathbf{C}^H \mathbf{H}^H = \mathbf{C}^T \mathbf{H}^{-1}.
\tag{4.61}
\]

Obviously, (4.61) coincides with $\mathbf{E}_u^{\mathrm{ORC}}[l]$ in (4.59). For the downlink, OFDM-CDMA thus allows a very efficient implementation of the ZF multiuser detector.

Equal gain combining (EGC)

Two approaches exist that try to find a compromise between interference suppression and noise amplification. In the first, instead of dividing by a channel coefficient, we just correct the phase shift and keep the amplitude constant. Hence, all chips experience the same 'gain', resulting in

\[
\begin{aligned}
\mathbf{r}_u[l] = \mathbf{E}_u^{\mathrm{EGC}}[l] \cdot \mathbf{y}[l]
&= \mathbf{C}_u^T[l] \cdot
\begin{bmatrix}
\frac{H_u^*[l,0]}{|H_u[l,0]|} & & \\
& \ddots & \\
& & \frac{H_u^*[l,N_c-1]}{|H_u[l,N_c-1]|}
\end{bmatrix}
\big(\mathbf{H}_u[l] \cdot \mathbf{C}[l] \cdot \mathbf{a}[l] + \mathbf{n}[l]\big) \\
&= \mathbf{C}_u^T[l] \cdot
\begin{bmatrix}
\frac{|H_u[l,0]|^2}{|H_u[l,0]|} & & \\
& \ddots & \\
& & \frac{|H_u[l,N_c-1]|^2}{|H_u[l,N_c-1]|}
\end{bmatrix}
\cdot \mathbf{C}[l] \cdot \mathbf{a}[l] + \tilde{\mathbf{n}}[l].
\end{aligned}
\tag{4.62}
\]
From (4.62) we see that the equalizer coefficients have unit magnitudes so that the noise is not amplified. Second, amplitude variations of the channel transfer function are not emphasized by the equalizer, so that the originally perfect correlation properties of the spreading codes are degraded less by equalization than with MRC.

Minimum mean squared error (MMSE)

A second possibility to avoid an amplification of the background noise is to use the MMSE solution. Starting with the MMSE criterion

\[
\begin{aligned}
\mathbf{E}_u &= \underset{\mathbf{W}}{\mathrm{argmin}} \; \mathrm{E}\Big\{ \big\| \mathbf{W} \mathbf{y}[l] - \mathbf{a}_u[l] \big\|^2 \Big\} \\
&= \underset{\mathbf{W}}{\mathrm{argmin}} \; \mathrm{E}\Big\{ \big\| \mathbf{W} \big(\mathbf{H}_u[l] \mathbf{C}[l] \mathbf{a}[l] + \mathbf{n}[l]\big) - \mathbf{a}_u[l] \big\|^2 \Big\}
\end{aligned}
\tag{4.63}
\]

for user u, a solution is obtained by setting the derivative with respect to $\mathbf{W}^H$ to zero and solving the equation system. This yields

\[
\mathbf{E}_u^{\mathrm{MMSE}}[l] = \mathbf{C}_u^T[l] \cdot \mathbf{H}_u^H[l] \cdot \left( \mathbf{H}_u[l] \cdot \mathbf{H}_u^H[l] + \frac{\sigma_N^2}{\sigma_A^2} \cdot \mathbf{I}_{N_c} \right)^{-1}.
\tag{4.64}
\]
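Since Hu[l] is a diagonal matrix, the application of (4.64) results in

\[
\mathbf{r}_u[l] = \mathbf{E}_u^{\mathrm{MMSE}}[l] \cdot \mathbf{y}[l]
= \mathbf{C}_u^T[l]
\begin{bmatrix}
\frac{|H_u[l,0]|^2}{|H_u[l,0]|^2 + \sigma_N^2/\sigma_A^2} & & \\
& \ddots & \\
& & \frac{|H_u[l,N_c-1]|^2}{|H_u[l,N_c-1]|^2 + \sigma_N^2/\sigma_A^2}
\end{bmatrix}
\mathbf{C}[l] \mathbf{a}[l] + \tilde{\mathbf{n}}[l].
\tag{4.65}
\]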
Obviously, we have to add the ratio between noise power $\sigma_N^2$ and signal power $\sigma_A^2$ to the squared magnitudes in the denominators. This avoids the noise amplification at the subcarriers with deep fades. For infinitely high SNR, $\sigma_N^2/\sigma_A^2 \to 0$ holds and the MMSE equalization coincides with the ORC scheme.
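The four combining rules of (4.57), (4.59), (4.62) and (4.64) differ only in the scalar weight applied to each subcarrier before despreading. The sketch below makes this explicit for a noise-free downlink with Nb = 1 and unit-norm Walsh codes. All names (`walsh_matrix`, `equalizer`, `downlink_detect`, `noise_to_signal`) and the example channel are illustrative assumptions, and scrambling is omitted for brevity.

```python
import math


def walsh_matrix(n):
    """Sylvester-type Hadamard matrix (n a power of two) with unit-norm rows."""
    rows = [[1.0]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    scale = 1.0 / math.sqrt(n)
    return [[scale * v for v in r] for r in rows]


def equalizer(h, scheme, noise_to_signal=0.0):
    """Per-subcarrier weight E[l, mu] for channel coefficient h."""
    if scheme == "MRC":
        return h.conjugate()
    if scheme == "ORC":
        return 1.0 / h
    if scheme == "EGC":
        return h.conjugate() / abs(h)
    if scheme == "MMSE":
        return h.conjugate() / (abs(h) ** 2 + noise_to_signal)
    raise ValueError("unknown scheme: " + scheme)


def downlink_detect(symbols, channel, user, scheme, noise_to_signal=0.0):
    """Spread one symbol per user, apply the channel (no noise), equalize, despread."""
    n_c = len(channel)
    codes = walsh_matrix(n_c)                       # user u uses row codes[u]
    chips = [sum(codes[u][mu] * symbols[u] for u in range(len(symbols)))
             for mu in range(n_c)]
    received = [channel[mu] * chips[mu] for mu in range(n_c)]
    return sum(codes[user][mu]
               * equalizer(channel[mu], scheme, noise_to_signal)
               * received[mu] for mu in range(n_c))


channel = [1.0, 0.5j, -2.0, 0.25]
# Only user 1 transmits; the output of user 0 is pure multiuser interference.
print(abs(downlink_detect([0, 1, 0, 0], channel, 0, "ORC")))   # numerically zero
print(abs(downlink_detect([0, 1, 0, 0], channel, 0, "MRC")))   # clearly nonzero
```

With ORC the interference from user 1 vanishes, whereas MRC leaves a residual because the weights |Hu[l, µ]|² differ between subcarriers. Without noise the example cannot show the price of ORC, namely the noise amplification on weak subcarriers discussed above.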
Similar to the ORC solution, we can compare (4.64) with the linear MMSE multiuser detector on page 238. For a fully loaded system with Ns = Nu and orthogonal spreading codes, C[l] is an orthogonal Nu × Nu matrix and $\mathbf{C}^T[l] \, \mathbf{C}[l] = \mathbf{I}_{N_u}$ holds. The MMSE criterion in (5.37) delivers with S = HC

\[
\begin{aligned}
\mathbf{E} &= \left( \mathbf{S}^H \mathbf{S} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \right)^{-1} \mathbf{S}^H
= \left( \mathbf{C}^T \mathbf{H}^H \mathbf{H} \mathbf{C} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \right)^{-1} \mathbf{C}^T \mathbf{H}^H \\
&= \mathbf{C}^{-1} \left( \mathbf{H}^H \mathbf{H} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \right)^{-1} \mathbf{C}^{-T} \mathbf{C}^T \mathbf{H}^H \\
&= \mathbf{C}^T \left( \mathbf{H}^H \mathbf{H} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \right)^{-1} \mathbf{H}^H.
\end{aligned}
\tag{4.66}
\]

Since the channel matrices are diagonal, (4.66) and (4.64) are identical. Hence, OFDM-CDMA allows a very efficient implementation of the MMSE multiuser detector for the downlink without matrix inversion.

To evaluate the performances of the described equalization techniques, we consider the synchronous downlink of an OFDM-CDMA system with BPSK modulation. Scrambled Walsh codes with a spreading factor Ns = 16 are employed. The choice of Nc = 16 subcarriers results in a mapping of one information bit onto one OFDM symbol. Moreover, a 4-path Rayleigh fading channel is used, requiring a guard interval of length Lt − 1 = 3 samples. The Eb/N0 loss due to the insertion of the cyclic prefix has not been considered because it is identical for all equalization schemes. As explained earlier, the frequency selectivity of the channel destroys the Walsh codes' orthogonality and MUI disturbs the transmission.

Figure 4.25 Error rate performance of OFDM-CDMA system with Gp = 16 and different equalization techniques for a 4-path Rayleigh fading channel a) Nu = 8 active users (β = 1/2), b) Nu = 16 active users (β = 1) (BER versus Eb/N0 in dB; curves: Nu = 1, MRC, ORC, EGC, MMSE)

Figure 4.26 Error rate performance of OFDM-CDMA system with Gp = 16 and different equalization techniques for a 4-path Rayleigh fading channel a) Eb/N0 = 8 dB, b) Eb/N0 = 12 dB (BER versus Nu; curves: Nu = 1, MRC, ORC, EGC, MMSE)

For a load of β = 1/2, we see from Figure 4.25a that the MMSE approach performs best over the whole range of SNRs. EGC comes very close to the MMSE solution at low and medium SNRs, but loses up to 3 dB for high SNRs. MRC performs much worse except in the low SNR regime where the background noise dominates the system reliability. In this area, ORC represents the worst approach; due to the noise amplification, it can outperform MRC only for SNRs larger than 14 dB. None of the equalizing schemes can reach the single-user bound (SUB) that represents the achievable error rate in the absence of interference. Figure 4.25b depicts the results for Nu = 16, that is, a fully loaded system with β = 1. The advantage of the MMSE solution becomes larger. EGC in particular degrades considerably and is even outperformed by ORC at high SNRs. The higher the load, the better is the performance of ORC compared to EGC and MRC because interference becomes the dominating penalty. As will be shown in Section 5.2, linear multiuser detection schemes are not able to reach the SUB for high load.

The discussed effects are confirmed in Figure 4.26, where the bit error rate is depicted versus the number of users. First, we recognize that ORC is independent of the load β since the whole interference is suppressed. Different SNRs just lead to a vertical shift of the curve (cf. Figures 4.26a and b). Moreover, ORC outperforms MRC and EGC for high loads and SNRs. MMSE equalization shows the best performance except for very low loads. In that region, EGC and, especially, MRC show a better performance because the interference power is low and optimizing the SNR ensures the best performance. Figure 4.27 points out another interesting aspect that holds for single-carrier CDMA systems also.
Since the frequency selectivity destroys the orthogonality of spreading codes, there is a trade-off between diversity and MUI. Which effect prevails depends on the kind of equalization that is applied. For the MMSE equalizer, the diversity gain dominates and the error rate performance is improved for growing Lt. On the contrary, the MUI conceals the diversity effect for EGC and performance degrades for increasing Lt.
Figure 4.27 Error rate performance of OFDM-CDMA system with Gp = 16 and Nu = 16 for EGC and MMSE equalization and Lt-path Rayleigh fading channels (BER versus Eb/N0 in dB; curves for Lt = 2, 3, 4, 8)

Quasi-Synchronous Uplink Transmission

With respect to the uplink, equalization is not as easy because each user is affected by an individual channel. For simplicity, we assume a coarse synchronization ensuring that the maximum delay κ between two users is limited to the length Ng of the guard interval minus the maximum channel delay κmax:

\[
\kappa \le N_g - \kappa_{\max}
\tag{4.67}
\]
In this case, a block-oriented processing is possible and a single FFT block can transform the OFDM symbols of all users simultaneously into the frequency domain. Hence, the signature matrix S[l] becomes

\[
\mathbf{S}[l] = \big[\mathbf{s}_1[l] \,\cdots\, \mathbf{s}_{N_u}[l]\big]
\tag{4.68}
\]

with $\mathbf{s}_u[l] = \mathrm{diag}\big(H_u[l,0] \,\cdots\, H_u[l,N_c-1]\big) \cdot \mathbf{c}_u[l]$. The signature of a user is obtained by multiplying the coefficients of the channel transfer function element-wise with the chips of the spreading code. The data vector a[l] is defined as described in (4.54). The simple MF provides the sufficient statistics, that is, we do not lose any information and an optimum overall processing is still possible. Hence, despreading with MRC has to be applied, resulting in

\[
\mathbf{r}[l] = \mathbf{E}^{\mathrm{MRC}}[l] \cdot \mathbf{y}[l] = \mathbf{S}^H[l] \cdot \mathbf{y}[l] = \mathbf{S}^H[l] \cdot \mathbf{S}[l] \cdot \mathbf{a}[l] + \mathbf{S}^H[l] \cdot \mathbf{n}[l].
\tag{4.69}
\]
Owing to the nondiagonal structure of $\mathbf{S}^H[l] \cdot \mathbf{S}[l]$, MUI degrades the system performance. This is confirmed by the results shown in Figure 4.28. With growing β, error floors occur so that a reliable uncoded transmission is not possible for loads larger than 0.5. The larger β, the smaller is the influence of the background noise, as depicted in Figure 4.28.

Figure 4.28 Error rate performance of OFDM-CDMA uplink with Gp = 16, BPSK, and 4-path Rayleigh fading channels a) BER versus Eb/N0 in dB for β = 1/16, 0.5, 1, 1.5, 2 and the SUB, b) BER versus β for Eb/N0 = 0, 5, 10, 15, 20 dB

In conclusion, OFDM is well suited to synchronous downlink transmissions, while the discussed benefits cannot be exploited in the uplink. Here, each signal experiences its own channel so that a common equalization is not possible. Moreover, different carrier frequency offsets between transmitter and receiver pairs destroy even the orthogonality between subcarriers of the same user and cause ICI. Therefore, more sophisticated detection algorithms as presented in Chapter 5 are required.
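The nondiagonal structure of S^H[l] · S[l] in (4.69) is easy to reproduce. The sketch below builds the signatures of (4.68) for two users and returns their Gram matrix. The function name `uplink_gram` and the numerical channel values are arbitrary illustrative assumptions.

```python
def uplink_gram(codes, channels):
    """Gram matrix S^H S for signatures s_u = diag(H_u) c_u as in (4.68)."""
    signatures = [[h * c for h, c in zip(channels[u], codes[u])]
                  for u in range(len(codes))]
    return [[sum(su[m].conjugate() * sv[m] for m in range(len(su)))
             for sv in signatures]
            for su in signatures]


# Two orthogonal length-4 spreading codes with unit norm.
codes = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5]]

# Identical flat channels: the codes stay orthogonal.
flat = uplink_gram(codes, [[1.0] * 4, [1.0] * 4])
# Individual frequency-selective channels per user: cross terms appear (MUI).
selective = uplink_gram(codes, [[1.0, 0.5, 2.0, 0.25],
                                [0.3, 1.0, 0.7, 1.5]])
print(flat[0][1], selective[0][1])
```

Because every user contributes its own diagonal channel matrix, no single per-subcarrier weight can restore orthogonality for all users at once, which is why the downlink equalizers do not carry over to the uplink.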
4.3 Low-Rate Channel Coding in CDMA Systems

The previous sections illustrated that MUI dramatically degrades the system performance. Using OFDM-CDMA in a downlink transmission allows an appropriate equalization that suppresses the interference efficiently. However, this is not possible in an asynchronous uplink transmission. One possibility is the application of multiuser detection techniques that exploit the interference's structure and are discussed in Chapter 5. Alternatively, we can interpret the interference as additional AWGN. This assumption is approximately fulfilled for a large number of users according to the central limit theorem. It is well known that noise can be combated best by strong error-correcting codes.

One important feature of CDMA systems is the inherent spectral spreading, already depicted in Figures 4.1 and 4.2. As shown in Figure 4.29, this spreading (cf. Figure 4.24) can also be described as simply repeating each symbol a[l] Ns times and subsequently scrambling with a user-specific sequence c[l, k] (Dekorsy 2000; Dekorsy et al. 2003; Frenger et al. 1998a; Kühn et al. 2000a,b; Viterbi 1990). Scrambling means that the repeated data stream is multiplied symbol-wise with the user-specific sequence without spectral spreading. Therefore, an 'uncoded' CDMA system with DS spreading can also be interpreted as a system with a scrambled repetition code of low rate 1/Ns. The block matched filter in Figure 4.29 may describe the OFDM equalizers discussed in Subsection 4.2.2 (Figure 4.24) or a Rake receiver as depicted in Figure 4.4, excluding the summation over Ns chips after the multiplication with c[l, k]. The summation itself is common to OFDM-CDMA and single-carrier CDMA systems and is carried out by the repetition decoder.

Figure 4.29 Illustration of direct-sequence spreading as repetition coding and scrambling (blocks: repetition encoder of rate 1/Ns for a[l], scrambling with c[l, k], channel h[k, κ] with additive MUI + n[k] yielding y[k], matched filter, descrambling with c[l, k], repetition decoder delivering â[l]; label: super channel)

If the repetition is counted among the channel-coding parts of a communication system, only scrambling remains a CDMA-specific task and the system part between channel encoder and decoder depicted in Figure 4.29 can be regarded as a user-specific time-discrete super channel. However, repetition codes are known to have very poor error-correcting capabilities in view of their very low code rate. Hence, the task is to replace them with more powerful low-rate FEC codes that perform well at very low SNRs. This book does not claim to present the best code suited to this problem. Rather, some important aspects concerning the code design are illuminated and the performances of four different coding schemes are compared. Specifically, we look at traditional convolutionally encoded systems in which the rates of convolutional and repetition code are exchanged, a code-spread system, and serial as well as parallel code concatenations. The performance evaluation was carried out for an OFDM-CDMA uplink with Nc = 64 subcarriers and a 4-path Rayleigh fading channel with uniform power delay profile.7 Successive channel impulse responses are statistically independent, that is, perfect interleaving in the time domain is assumed. For notational simplicity, we restrict the analysis to BPSK, although a generalization to multilevel modulation schemes is straightforward. In the next four subsections, the error rate performance of each coding scheme is analyzed for the single-user case. In Subsection 4.3.5, all schemes are finally compared in multiuser scenarios.
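The interpretation of DS spreading as a rate-1/Ns repetition code followed by scrambling can be illustrated in a few lines. This is a minimal sketch assuming real-valued ±1 scrambling chips and an ideal, noise-free super channel; the function names `spread` and `despread` and the example sequence are hypothetical.

```python
def spread(symbols, scrambling, n_s):
    """Repeat every symbol n_s times (repetition code), then scramble chip-wise."""
    chips = []
    for l, a in enumerate(symbols):
        for k in range(n_s):
            chips.append(a * scrambling[l * n_s + k])
    return chips


def despread(chips, scrambling, n_s):
    """Descramble chip-wise, then sum over n_s chips (repetition decoder)."""
    descrambled = [y * c for y, c in zip(chips, scrambling)]
    return [sum(descrambled[i:i + n_s]) / n_s
            for i in range(0, len(descrambled), n_s)]


symbols = [1, -1, 1]
scrambling = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]   # arbitrary +/-1 sequence
chips = spread(symbols, scrambling, 4)
print(despread(chips, scrambling, 4))
```

Replacing the inner loop of `spread` by a stronger low-rate encoder while keeping the chip-wise multiplication is exactly the step taken in the following subsections.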
4.3.1 Conventional Coding Scheme (CCS)

The first approach, abbreviated as CCS, does not change the classical DS spreading and can be interpreted as a concatenation of convolutional code and repetition code. It is illustrated in Figure 4.30. The convolutional code is described by its constraint length Lc and the code rate $R_c^{cc} = 1/n$. Subsequent repetition encoding with rate $R_c^{rc} = 1/N_s = n/G_p$ ensures a constant processing gain $G_p = R_c^{-1} = (R_c^{cc} \cdot R_c^{rc})^{-1}$. The influence of different convolutional codes is illuminated by choosing different combinations of $R_c^{cc}$ and $R_c^{rc}$ while their product remains constant. The employed convolutional codes are summarized in Table 4.2. They have been found by a nested code search (Frenger et al. 1998b) and represent codes with maximum free distance and minimum number of sequences with weight df.

Figure 4.30 Conventional coding scheme (CCS) consisting of outer convolutional code, interleaver, and inner repetition code (signal flow: d[i], convolutional code with Rccc = 1/n, repetition code with Rcrc = 1/Ns = n/Gp yielding b[k], BPSK mapping to a[k], scrambling with c[l, k] to give x[k])

Table 4.2 Parameters of coding schemes for OFDM-CDMA system with processing gain Gp = Rc^−1 = 64

Scheme   Lc   Rccc   Generators (octal)                        Rcrc   df
CCS 2    7    1/2    133, 171                                  1/32   10
CCS 4    7    1/4    117, 127, 155, 171                        1/16   20
CCS 8    7    1/8    117, 127, 155, 171, 135, 173, 135, 145    1/8    40
CSS      7    1/64   (Frenger et al. 1998b)                    1      320

7 Similar results can be obtained for single-carrier CDMA systems. The differences concern only the path crosstalk of the Rake receiver and the Eb/N0 loss due to the cyclic prefix for OFDM-CDMA.
We see from Figure 4.31 that the performance can be improved by decreasing the code rate. The largest gains are obtained by changing from Rccc = 1/2 to Rccc = 1/4, while a further reduction of Rccc leads only to minor improvements. The reason is that convolutional codes of very low rate incorporate a repetition of parity bits as well. The contribution of repeated bits becomes larger for decreasing constraint lengths and code rates. Therefore, no large gains can be expected for extremely low-rate convolutional codes. This is confirmed by the free distances summarized in Table 4.2, which grow only in the same proportion as the code rate is reduced.
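The bookkeeping behind Table 4.2 can be verified with exact fractions. The sketch below recomputes the processing gain from the two code rates. As an additional illustration that is not taken from the text, it also multiplies the free distance by Ns, since each code bit of the convolutional code is repeated Ns times by the repetition code. The dictionary `SCHEMES` and the function `overall` are hypothetical names.

```python
from fractions import Fraction

# (rate of convolutional code, rate of repetition code, free distance d_f)
SCHEMES = {
    "CCS 2": (Fraction(1, 2), Fraction(1, 32), 10),
    "CCS 4": (Fraction(1, 4), Fraction(1, 16), 20),
    "CCS 8": (Fraction(1, 8), Fraction(1, 8), 40),
    "CSS": (Fraction(1, 64), Fraction(1, 1), 320),
}


def overall(scheme):
    """Return (processing gain, distance of the concatenation in chips)."""
    r_cc, r_rc, d_f = SCHEMES[scheme]
    processing_gain = 1 / (r_cc * r_rc)
    # Repetition with N_s = 1 / r_rc multiplies every Hamming distance by N_s.
    distance_in_chips = d_f / r_rc
    return processing_gain, distance_in_chips


for name in SCHEMES:
    gain, distance = overall(name)
    print(name, gain, distance)
```

All four schemes end up with the same processing gain and the same overall distance of the concatenated code, which is consistent with the observation that lowering the rate of the convolutional code beyond 1/4 yields only minor improvements.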
4.3.2 Code-Spread Scheme (CSS)

Reducing Rccc to the minimum value of Rc = 1/Gp results in a single very low-rate convolutional code, and the repetition code is discarded. The corresponding structure of the transmitter is depicted in Figure 4.32. The convolutional encoder already performs the entire spreading so that the coded sequence is directly scrambled with the user-specific sequence. Many ideas of this so-called code-spreading can be found in Viterbi (1990). Frenger et al. (1998b) list an enormous number of low-rate convolutional codes found by computer search. These codes have a maximum free distance df and a minimum number of sequences with weight df. However, the obtained codes also include a kind of unequal repetition code, that is, different bits of a code word are repeated unequally (Frenger et al. 1998b). Therefore, the performance of CSS is comparable to that of the CCSs, as the results in Figure 4.31 show.
Figure 4.31 Performance of single-user OFDM-CDMA system with Nc = 64 subcarriers, 4-path Rayleigh fading and different convolutional codes from Table 4.2 (BER versus Eb/N0 in dB; curves: CCS 2, CCS 4, CCS 8, CSS)

Figure 4.32 CSS consisting of single low-rate convolutional code (signal flow: d[i], convolutional code with Rccc = 1/Gp yielding b[k], BPSK mapping to a[k], scrambling with c[l, k] to give x[k])
4.3.3 Serially Concatenated Coding Scheme (SCCS)

Instead of reducing the code rate of the convolutional code, we know from Section 3.6 that parallel and serial concatenations of very simple component codes lead to extremely powerful codes. Hence, the inner repetition code should be at least partly replaced by a stronger code. With regard to the serial concatenation, we know from Section 3.6 that the inner code should be a recursive convolutional code in order to exploit the benefits of large interleavers (Benedetto et al. 1996). In the following, two different concatenated coding schemes are considered: a serial concatenation of two convolutional codes (serially concatenated convolutional code, SCCC) and a serial concatenation of an outer convolutional code and an inner Walsh code (SCCW) (Dekorsy et al. 1999a,b). The latter scheme is used in the uplink of IS-95 (Gilhousen et al. 1991; Salmasi and Gilhousen 1991), where Walsh codes are employed as an orthogonal modulation scheme allowing a simple noncoherent demodulation. Although Walsh codes are not recursive convolutional codes, they offer the advantage of a small code rate (large spreading) and low computational decoding costs even for soft-output decoding (see Fast Hadamard Transform in Subsection 3.4.5).
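The low decoding cost of Walsh codes rests on the fast Hadamard transform, which correlates a received vector with all M codewords in M·log2(M) additions. Subsection 3.4.5 is not reproduced here, so the following is only a generic textbook sketch of the butterfly algorithm. The function name `fast_hadamard_transform` is hypothetical and Sylvester (natural) ordering of the codewords is assumed.

```python
def fast_hadamard_transform(values):
    """Correlate values with all rows of the Sylvester-ordered Hadamard matrix.

    The length of values must be a power of two.
    """
    out = list(values)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for i in range(start, start + half):
                upper, lower = out[i], out[i + half]
                out[i], out[i + half] = upper + lower, upper - lower   # butterfly
        half *= 2
    return out


# Walsh codeword number 5 of length 8: entry k is (-1)^(popcount(5 AND k)).
codeword = [(-1) ** bin(5 & k).count("1") for k in range(8)]
correlations = fast_hadamard_transform(codeword)
print(correlations.index(max(correlations)))   # index of the transmitted codeword
```

For a noisy input the same correlations serve as decision variables. A soft-output decoder would process all of them rather than only the maximum.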
212
CODE DIVISION MULTIPLE ACCESS c[l, k]
d[i]
b1 [l]
conv. code Rccc =
1 n
SCCS
inner code
b2 [l ]
Rcinner
b[k]
rep. code Rcrc =
BPSK
a[k]
x[k]
1 Ns
Figure 4.33 Structure of serially concatenated coding scheme (SCCS) La (bˆ1 [l])
rep. dec.
L(bˆ2 [l ])
inner dec.
−1
Le (bˆ1 [l])
convolutional decoder
ˆ d[i]
Figure 4.34 Decoder structure of serially concatenated coding scheme (SCCS) Figure 4.33 shows the structure of the SCCS. The outer convolutional encoder is followed by an interleaver and an inner code that can be chosen as described above. The final repetition code may be necessary to ensure a constant processing gain. Since we are not interested in interleaver design for concatenated codes, we simply use random interleavers as described in Chapter 3 and vary only the length Lπ . The corresponding decoder structure is shown in Figure 4.34. First, the received signal is equalized in the frequency domain according to the MRC principle including the descrambling.8 Next, an integrate-and-dump filter decodes the repetition code and delivers the log-likelihood ratios (LLRs) L(bˆ2 [l ]). Now, the iterative decoding process starts with the inner soft-in soft-out decoder. The extrinsic part Le (bˆ1 [l]) of its output is deinterleaved and fed to the outer soft-output convolutional decoder. Again, extrinsic information is extracted and fed back as a priori information La (bˆ1 [l]) to the inner decoder. This iterative turbo processing is carried out several times until convergence is obtained (cf. Section 3.6). Owing to the high number of parameters, we fix the code rate of the outer convolutional code to Rccc = 1/2. Hence, introducing the inner code affects only the repetition code whose code rate Rcrc increases in the same way as Rcinner decreases (see. Table 4.3). Although theoretical analysis tells us that the minimum distance of the outer code should be as large as possible (see page 138), the iterative decoding process benefits from a stronger inner code. This is confirmed by simulation results showing that lower rates of the outer convolutional code, for example, Rccc = 1/6, coming along with higher rates of the inner codes, for example, Rcrc = 1, lead to a significant performance loss. 
The interleaver between the outer convolutional encoder and the inner encoder is a randomly chosen interleaver of length N = 600 or N = 6000 (see footnote 9).

Footnote 8: In single-carrier CDMA, this corresponds to the Rake receiver of Figure 4.4 excluding the summation over N_s chips after the multiplication with c[ℓ, k].
Footnote 9: The shorter interleaver may be suited for full-duplex speech transmission, while the longer one is restricted to data transmission with weaker delay constraints.
Table 4.3 Main parameters of serially concatenated coding schemes (feedback polynomial of recursive convolutional encoders indicated by superscript r)

Scheme   Outer NSC code, R_c^{cc} = 1/2   Inner code                                          Repetition code
SCCW 1   g1 = 7_8, g2 = 5_8               Walsh, R_c^{wh} = 4/16                              R_c^{rc} = 1/8
SCCW 2   g1 = 7_8, g2 = 5_8               Walsh, R_c^{wh} = 6/64                              R_c^{rc} = 1/3
SCCW 3   g1 = 7_8, g2 = 5_8               Walsh, R_c^{wh} = 8/256                             -
SCCW 4   g1 = 23_8, g2 = 55_8             Walsh, R_c^{wh} = 6/64                              R_c^{rc} = 1/3
SCCW 5   g1 = 133_8, g2 = 171_8           Walsh, R_c^{wh} = 6/64                              R_c^{rc} = 1/3
SCCC 1   g1 = 7_8, g2 = 5_8               g1 = 7_8, g2^r = 5_8                                R_c^{rc} = 1/16
SCCC 2   g1 = 7_8, g2 = 5_8               g1 = 23_8, g2 = 27_8, g3^r = 35_8, g4 = 37_8        R_c^{rc} = 1/8
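The repetition code rates listed in Table 4.3 can be cross-checked with a few lines of Python. The sketch below is only a bookkeeping aid under stated assumptions: it assumes a processing gain of G_p = 64 (the value quoted for the simulations later in this chapter), that the product of the three code rates equals 1/G_p, and that the inner convolutional codes of SCCC 1 and SCCC 2 have rates 1/2 and 1/4 (two and four generator polynomials). The function name `repetition_rate` is hypothetical.

```python
from fractions import Fraction


def repetition_rate(outer_rate, inner_rate, processing_gain=64):
    """Repetition code rate that fills up the processing gain.

    Assumes R_outer * R_inner * R_rep = 1 / G_p (illustrative helper,
    not taken from the book).
    """
    target = Fraction(1, processing_gain)
    return target / (Fraction(outer_rate) * Fraction(inner_rate))


outer = Fraction(1, 2)  # half-rate outer convolutional code
schemes = {
    "SCCW 1": Fraction(4, 16),  # Walsh code with M = 16
    "SCCW 2": Fraction(6, 64),  # Walsh code with M = 64
    "SCCC 1": Fraction(1, 2),   # inner convolutional code, two generators (assumed rate 1/2)
    "SCCC 2": Fraction(1, 4),   # inner convolutional code, four generators (assumed rate 1/4)
}
for name, inner in schemes.items():
    print(name, repetition_rate(outer, inner))
```

Exact fractions are used instead of floats so that the results can be compared directly with the table entries.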
Figure 4.35 shows the error rate performance of the concatenation of an outer half-rate convolutional code with L_c = 3 and different inner Walsh codes (SCCW) for both interleaver lengths L_π. We observe that the weakest (shortest) Walsh code (SCCW 1) performs best at low SNR, that is, the iterative process converges earlier. For medium SNR, the SCCW 2 system with M = 64 represents the best choice, and for high SNR, the code with M = 256 (SCCW 3) shows the best asymptotic performance. Moreover, increasing the interleaver length from L_π = 600 to L_π = 6000 leads to improvements of 0.5 dB for SCCW 1, 0.7 dB for SCCW 2, and 1 dB for SCCW 3. Compared to a single convolutional code with L_c = 7, the SCCWs perform better for medium and high SNR, but not for extremely low SNR. However, the low SNR regime is exactly the working point for high MUI. This region will be of special interest in Chapter 5 where we consider multiuser detection techniques that include channel coding.

Figure 4.35 Performance of SCCW systems with L_c = 3 convolutional code for different Walsh codes and interleaver lengths, 10 decoding iterations (panels: a) L_π = 600, b) L_π = 6000; curves: SCCW 1, SCCW 2, SCCW 3, CCS 8; axes: BER versus E_b/N_0 in dB)

We now choose the M = 64 Walsh code as the inner code and vary the constraint length of the outer convolutional code. Obviously, the SCCW 2 system with L_c = 3 performs best over a wide range of BERs, as can be seen from Figure 4.36a. Only asymptotically can SCCW 4 and SCCW 5 benefit from their stronger outer convolutional codes. A comparison of Figures 4.36a and 4.36b illustrates that the larger the interleaver, the steeper the slope of the curves in the waterfall region and the clearer the asymptotic advantage. Since the decoding complexity is much lower for SCCW 2, this scheme is our favorite among the tested concatenations.

Figure 4.36 Performance of SCCW system with M = 64 Walsh code for different convolutional codes and interleaver lengths, 10 decoding iterations (panels: a) L_π = 600, b) L_π = 6000; curves: SCCW 2, SCCW 4, SCCW 5, CCS 8; axes: BER versus E_b/N_0 in dB)

Next, we compare the SCCW 2 scheme with two serially concatenated convolutional codes also listed in Table 4.3. The inner code is now a recursive systematic convolutional code. From Figure 4.37a we recognize that SCCW 2 performs better down to error rates of 10^−6. A stronger inner convolutional code in SCCC 2 cannot increase the performance of the iterative decoding process. Naturally, the code rate of SCCW 2 is much lower than for the SCCC approaches, but since we spread the signals by a fixed processing gain anyway, this is no disadvantage. Hence, low-rate coding in CDMA systems can be efficiently accomplished by serially concatenating an outer convolutional code with an inner Walsh code. For very low SNR, the single convolutional code CCS 8 still shows the best performance.
Figure 4.37 Performance of SCCC system with different convolutional codes and interleaver lengths, 10 decoding iterations (panels: a) L_π = 600, b) L_π = 6000; curves: SCCC 1, SCCC 2, SCCW 2, CCS 8; axes: BER versus E_b/N_0 in dB)

4.3.4 Parallel Concatenated Coding Scheme (PCCS)

Extremely low-rate codes for spread spectrum applications were introduced in (Viterbi 1995). The key idea behind these super-orthogonal codes is to incorporate low-rate Walsh codes into the structure of a convolutional encoder. Figure 4.29 shows an example using a recursive systematic convolutional (RSC) code with constraint length L_c = 5. The inner L_c − 2 = 3 register elements are fed to the Walsh encoder of rate R_c^{wh} = (L_c − 2)/2^{L_c−2} = 3/8. The bits a[i] and a[i − L_c + 1] are added element-wise to the Walsh coded bits b^{wh}[l]. This ensures that not only the original Walsh codewords but also their binary complements are valid code words, and that branches in the trellis leaving the same state are assigned to antipodal code words. The overall code rate depends on the constraint length of the convolutional code and amounts to

    R_c^{so} = \frac{1}{2^{L_c-2}} = \frac{1}{8}    (4.70)

because each information bit at the encoder input corresponds to n = 2^{L_c−2} output bits.

Naturally, super-orthogonal codes can also be used as constituent codes in a concatenated coding scheme (van Wyk and Linde 1998). In fact, we are looking at a PCCS according to Figure 3.19 using two super-orthogonal codes as depicted in Figure 4.38. For each information bit, two code words each of length n = 8 are generated, yielding a total code rate of the concatenated scheme of R_c^{pccs} = 1/16. Hence, a repetition code with rate R_c^{rc} = 1/4 is necessary to obtain the desired processing gain of G_p = 64.

At the receiver, appropriate turbo decoding has to be performed. The well-known Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm described in Section 3.4.4 has to be extended for super-orthogonal codes. Essentially, we need an incremental metric for each branch in the trellis comparing the hypothesis with the received codeword y[i]. This metric can be obtained by performing a fast Hadamard transform of y[i], which delivers, after appropriate scaling, an LLR for each possible Walsh codeword. These LLRs are then used as incremental metrics in the BCJR algorithm.

The results obtained for the above-described super-orthogonal code and the different interleaver sizes are shown in Figure 4.39 for a perfectly interleaved 4-path Rayleigh fading channel, a processing gain of G_p = 64, and BPSK modulation.
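The fast Hadamard transform used for the branch metrics can be sketched in a few lines of Python. This is a minimal illustration, not the decoder of the book: it assumes BPSK-mapped (+1/−1) Walsh codewords in Sylvester ordering, a noise-free toy input, and it omits the channel-dependent scaling that turns correlations into LLRs. The helper names `fht` and `walsh_codeword` are hypothetical.

```python
def fht(y):
    """Fast Hadamard transform (Sylvester ordering) of a list of length 2^m."""
    v = list(y)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine blocks of size h into blocks of size 2h.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


def walsh_codeword(index, n):
    """BPSK-mapped row `index` of the n x n Sylvester-Hadamard matrix."""
    return [1 - 2 * (bin(index & k).count("1") % 2) for k in range(n)]


# Noise-free toy example: correlate one received word with all 8 Walsh codewords.
received = walsh_codeword(5, 8)
correlations = fht(received)
best = max(range(8), key=lambda k: abs(correlations[k]))
print(best, correlations)
```

A large positive correlation points to the Walsh codeword itself, a large negative one to its binary complement, which is why the magnitude is used to pick the index in this sketch.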
Note that frame lengths of 600 bits and 6000 bits for interleaver sizes L_π = 300 and L_π = 3000, respectively, are the same as for the SCCS systems because only information bits are permuted in a parallel concatenation. Obviously, PCCS outperforms the conventional convolutional code with rate R_c^{cc} = 1/8. At a BER of 10^−3, the gains amount to 1.5 dB and 2.2 dB, while they increase to 3.5 dB and more than 5 dB at 10^−6. Compared to the serial code concatenations depicted in Figure 4.37, the performance can be improved by approximately 1 dB for the smaller interleaver and by more than 1.5 dB for the larger interleaver at 10^−6.

Figure 4.38 Encoder structure of super-orthogonal convolutional codes with L_c = 5 (Viterbi 1995) (block diagram: input d[i], register input a[i], four delay elements D, Walsh encoder with output b^{wh}[l], encoder output b[l]; tap labels: g_{1,1} = 0, g_{1,2} = 0, g_{1,3} = 1, g_{1,4} = 1, g_{2,0} = g_{2,1} = g_{2,2} = g_{2,3} = g_{2,4} = 1)

Figure 4.39 Performance of parallel concatenated super-orthogonal codes for N_u = 1, G_p = 64 and different interleaver lengths, 10 decoding iterations (curves: CCS 8, L_π = 300, L_π = 3000; axes: BER versus E_b/N_0 in dB)
4.3.5 Influence of MUI on Coding Schemes

Finally, we have to analyze the behavior of the coding schemes under the influence of severe MUI. Regardless of the specific coding scheme, it has to be recalled that the processing gain G_p defined in (4.2) comprises the spreading factor N_s as well as the code rate R_c, that is, G_p = N_s/R_c describes the entire spreading including the FEC code. Since we exchange the contributions of channel coding and spreading while keeping G_p constant, the system load β = N_u/N_s defined in (4.16) varies although N_u and the entire bandwidth are kept constant. For some of the mentioned coding schemes, spreading and coding cannot even be distinguished anymore. Therefore, we will base a comparison on the spectral efficiency defined in (4.17) instead of the system load.

Figure 4.40 Performance of OFDM-CDMA system with N_c = 64 subcarriers, 4-path Rayleigh fading and different convolutional codes from Table 4.2 (panels: a) η = 0.5, BER versus E_b/N_0 in dB; b) 10 log10(E_b/N_0) = 4 dB, BER versus N_u; curves: CCS 2, CCS 4, CCS 8, UMTS, CSS)

Figure 4.40a compares the same coding schemes as Figure 4.31, but now for a spectral efficiency of η = N_u/G_p = 32/64 = 1/2 instead of the single-user case. Note that half-rate coded TDMA or FDMA systems as employed in the Global System for Mobile telecommunications (GSM) (Mouly and Pautet 1992) can reach at most η = 1/2 if all time and frequency slots are occupied. Since they represent narrow-band systems without spectral spreading, low-rate coding cannot be applied. The ranking of the coding schemes is qualitatively the same as in the single-user case. We recognize that a BER of 10^−3 can only be achieved for the low-rate convolutional codes with R_c^{cc} ≤ 1/4. The half-rate code cannot reach this error rate. Lower error probabilities as required for data services cannot be supported for η = 1/2.

Figure 4.40b illustrates the performance for 10 log10(E_b/N_0) = 4 dB versus the number of active users. For this SNR and error rates below 10^−3, the convolutional code of rate R_c^{cc} = 1/2 can only support four users, while the codes with lower rates support up to 11 users. This corresponds to spectral efficiencies of η = 4/64 = 6.25 · 10^−2 and η = 11/64 ≈ 0.172, respectively. For higher efficiencies, larger SNRs are required to meet this error rate constraint.

Convolutional codes with higher constraint length as used in UMTS (Holma and Toskala 2004) can slightly improve the performance. In the UMTS uplink, a code with L_c = 9 and rate R_c^{cc} = 1/3 is employed. Its performance is also depicted in Figure 4.40 with the label 'UMTS'. Obviously, it performs better for low efficiencies, or equivalently at high SNRs.
For N_u > 10, it performs worse than CCS 8 and CSS, and for N_u > 24, even CCS 4 is better. In conclusion, none of the coding schemes described so far is able to reach a target error rate of 10^−3 for low SINRs.

Next, we look at the introduced concatenated coding schemes. A comparison with CCS 8 in Figure 4.41a shows that all depicted schemes can reach a BER of 10^−3 while much lower error rates can be ensured only by concatenated schemes. SCCC 1 shows the same performance as the conventional convolutional code, whereas the Walsh coded system SCCW 2 gains about 3 dB compared to them. Parallel concatenated super-orthogonal codes gain an additional 2 dB. At 10^−5, the differences become even larger, that is, SCCW 2 outperforms SCCC 1 by 5 dB and PCCS gains 4 dB compared to SCCW 2. All schemes show an error floor starting roughly at 10^−6.

Figure 4.41 Performance of concatenated coding schemes for different interleaver lengths, N_u = 32 active users, G_p = 64 and 10 decoding iterations (panels: a) L_π^{sccs} = 600, L_π^{pccs} = 300; b) L_π^{sccs} = 6000, L_π^{pccs} = 3000; curves: SCCW 2, SCCC 1, PCCS, CCS 8; axes: BER versus E_b/N_0 in dB)

For the larger interleaver, Figure 4.41b illustrates that all concatenated schemes perform better; in particular, their error floors move out of the visible area. The parallel concatenated super-orthogonal codes still show the best performance and gain approximately 4 dB compared to the SCCS. For SNRs larger than 8 dB, SCCC 1 performs better than SCCW 2, while the latter is superior between 4 and 8 dB. Below 2 dB, the conventional convolutional code represents the best choice.

A different visualization in Figure 4.42 depicts the error rate performance versus the number of active users N_u for an SNR of 3 dB. The dramatic performance degradation due to MUI becomes obvious. For this E_b/N_0 value, L_π = 600 and a target bit error rate of 10^−3, the conventional convolutional code and the SCCC 1 scheme can support only up to six users, while SCCW 2 and PCCS support up to 12 and 20 users, respectively. For the larger interleaver, the Walsh coded system reaches 20 users, SCCC 1 only 13, and PCCS 30 users. For higher SNRs not depicted here, the relations change and SCCC 1 outperforms SCCW 2. Owing to the waterfall region of concatenated coding schemes, the performance degrades very rapidly with increasing system load, while the degradation is rather smooth for the convolutional code.
Hence, there exists an area of very low SNR or very high load where conventional convolutional codes outperform concatenated schemes. Although these areas correspond to high error rates that will generally not satisfy typical QoS constraints, they represent the starting point of the iterative interference cancellation approaches discussed in Chapter 5. Therefore, we may expect that iterative interference cancellation incorporating FEC decoders converges earlier for convolutional codes than for concatenated codes.
Figure 4.42 Performance of concatenated coding schemes for different interleaver lengths, E_b/N_0 = 3 dB, G_p = 64 and 10 decoding iterations (panels: a) L_π^{sccs} = 600, L_π^{pccs} = 300; b) L_π^{sccs} = 6000, L_π^{pccs} = 3000; curves: SCCW 2, SCCC 1, PCCS, CCS 8; axes: BER versus N_u)
4.4 Uplink Capacity of CDMA Systems

The last section showed that strong error control coding can provide good performance even for highly loaded systems. However, it is not yet known whether relying solely on good codes is the best choice for communications in interference-limited environments. Hence, we compare the capacity of CDMA systems deploying optimal detectors and delivering the maximal spectral efficiency with that of systems using only linear receivers. For the sake of simplicity, we consider the uplink of a synchronous OFDM-CDMA system with real Gaussian inputs and binary spreading codes. Results are presented for an AWGN channel and a 4-path Rayleigh fading channel with uniform power distribution.

Looking at a single cell environment, the l-th received symbol consists of N_s chips and can be described by y[l] = S[l] · a[l] + n[l] with S[l] = [s_1[l] ··· s_{N_u}[l]] containing the signatures s_u[l], 1 ≤ u ≤ N_u, of all users in its columns. The capacity C(S) depends on the system matrix S and the user-specific SNRs. It represents the total number of information bits that can be reliably transmitted per N_s chips and has to be shared among the active users in this cell. The ergodic capacity is obtained by calculating the expectation \bar{C} = E{C(S)} with respect to the multivariate process S. Assuming an asymptotically symmetric situation where all users have identical conditions, the average capacity per user is obtained by

    C_u = \frac{\mathrm{E}\{C(S)\}}{N_u} = \frac{\bar{C}}{N_u}.    (4.71)

The division of \bar{C} by the spreading factor N_s delivers the spectral efficiency

    \eta = \frac{\bar{C}}{N_s} = \beta \cdot C_u    (4.72)

already defined in (4.17). It describes the average number of information bits transmitted per chip and is measured in bits/chip. While (4.72) assumes a perfect coding scheme ensuring an error-free transmission, (4.17) considers a practical code and is always related to a certain target error rate. The definitions can be transferred into each other by replacing C_u with the code rate R_c or, equivalently, \bar{C} with R_c N_u. For small β, only few users are active and the spectral efficiency η of the system will be low because the large bandwidth is not efficiently used. Conversely, many users will decrease C_u because the cell capacity \bar{C} is fixed and has to be shared.
4.4.1 Orthogonal Spreading Codes

We start our analysis with orthogonal spreading codes that can be employed for synchronous transmission in frequency-nonselective environments. In this case, no MUI disturbs the transmission, resulting in N_u independently transmitted parallel data streams. For real Gaussian distributed inputs, each of these streams experiences an AWGN channel with a user-specific capacity that equals exactly the expression given in (2.54)

    C_u^{orth.} = \frac{1}{2} \cdot \log_2\left( 1 + 2 \frac{E_s}{N_0} \right).

The capacity depends only on the SNR. The spectral efficiency

    \eta^{orth.} = \beta \cdot C_u^{orth.} = \frac{\beta}{2} \cdot \log_2\left( 1 + 2 \frac{E_s}{N_0} \right)    for 0 \le \beta \le 1    (4.73)

grows linearly up to β = 1 for fixed SNR. At this load, all orthogonal binary spreading codes are occupied. For β > 1, so-called Welch-bounded sequences have to be employed (Rupf and Massey 1994) with which η stays at a constant level depending on the actual SNR (Verdu and Shamai 1999). It has to be mentioned that the SUMF is the optimum receiver for orthogonal spreading with β ≤ 1 while random codes require much higher computational costs for optimum detection.
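As a small numerical illustration of (4.73), the following Python sketch evaluates the spectral efficiency for orthogonal spreading. It is restricted to the range 0 ≤ β ≤ 1 covered by (4.73); the function name `eta_orthogonal` and the chosen SNR value are assumptions made for the example only.

```python
import math


def eta_orthogonal(beta, es_n0):
    """Spectral efficiency (4.73) in bits/chip for orthogonal spreading.

    beta  : system load N_u / N_s, restricted to 0 <= beta <= 1
    es_n0 : linear (not dB) symbol SNR E_s/N_0
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("(4.73) only covers loads 0 <= beta <= 1")
    c_user = 0.5 * math.log2(1.0 + 2.0 * es_n0)  # per-user AWGN capacity
    return beta * c_user


# With E_s/N_0 = 1.5 each user carries exactly 1 bit per symbol,
# so the efficiency equals the load.
for beta in (0.25, 0.5, 1.0):
    print(beta, eta_orthogonal(beta, es_n0=1.5))
```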
4.4.2 Random Spreading Codes and Optimum Receiver

For random spreading codes, the optimum receiver performs a joint maximum likelihood decoding (see Chapter 5). Considering the uplink, the mobile units transmit independently from each other, that is, there is no cooperation among them. Hence, we have to apply (2.82), which becomes for a real-valued transmission

    C(S) = \frac{1}{2} \cdot \sum_{\nu=1}^{r} \log_2\left( 1 + 2\lambda_\nu \cdot \frac{E_s}{N_0} \right) = \frac{1}{2} \cdot \sum_{\nu=1}^{r} \log_2\left( 1 + 2\lambda_\nu \cdot C(S) \cdot \frac{E_b}{N_0} \right).

The eigenvalues λ_ν belong to the matrix S[l] · S^H[l]. Similar to Chapter 2, we can calculate ergodic and outage capacities. Analytical expressions for the eigenvalue distribution can only be obtained with a large system analysis where N_s and N_u tend to infinity while their ratio β = N_u/N_s is constant. Instead, we calculate the ergodic capacities by generating an appropriate number of system matrices S, performing an eigenvalue analysis, calculating the instantaneous capacities according to (2.82), and averaging them. User-specific capacities and spectral efficiencies are obtained by applying (4.71) and (4.72).
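The Monte-Carlo procedure described above can be sketched in plain Python. The sketch works with E_s/N_0 only, i.e. with the first form of the capacity expression, and uses the identity that the sum over the nonzero eigenvalues equals log2 det(I + 2 E_s/N_0 · S^T S), so that a Cholesky factorization replaces the explicit eigenvalue analysis. Unit-energy signatures with chips ±1/sqrt(N_s), the small default dimensions, and all function names are assumptions made for illustration.

```python
import math
import random


def capacity(S, es_n0):
    """0.5 * log2 det(I + 2*Es/N0 * S^T S) for a real N_s x N_u matrix S."""
    ns, nu = len(S), len(S[0])
    # Gram matrix G = I + 2*Es/N0 * S^T S (symmetric, positive definite).
    G = [[(1.0 if i == j else 0.0)
          + 2.0 * es_n0 * sum(S[k][i] * S[k][j] for k in range(ns))
          for j in range(nu)] for i in range(nu)]
    # Cholesky factorization G = L L^T; log2 det G = 2 * sum(log2 L[i][i]).
    L = [[0.0] * nu for _ in range(nu)]
    log2_det = 0.0
    for i in range(nu):
        for j in range(i + 1):
            s = G[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(s)
                log2_det += 2.0 * math.log2(L[i][j])
            else:
                L[i][j] = s / L[j][j]
    return 0.5 * log2_det


def ergodic_capacity(ns, nu, es_n0, trials=200, seed=0):
    """Average capacity over random binary signatures (Monte-Carlo estimate)."""
    rng = random.Random(seed)
    chip = 1.0 / math.sqrt(ns)  # unit-energy signatures (assumption)
    total = 0.0
    for _ in range(trials):
        S = [[rng.choice((-chip, chip)) for _ in range(nu)] for _ in range(ns)]
        total += capacity(S, es_n0)
    return total / trials


c_bar = ergodic_capacity(ns=16, nu=8, es_n0=2.0)
print("capacity per user:", c_bar / 8, "spectral efficiency:", c_bar / 16)
```

The last line applies (4.71) and (4.72) to the estimate; larger values of N_s, such as the N_s = 64 used for the figures, only increase the run time.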
Figure 4.43 Ergodic capacity/spectral efficiency of DS-CDMA system with N_s = 64 and AWGN channel (bold lines: orthogonal spreading, normal lines: random spreading) (panels: a) capacity per user C_u, b) spectral efficiency η; curves: β = 1/8, 1/4, 1/2, 3/4, 1, 1.5, 2; horizontal axis: E_b/N_0 in dB)

The results in Figure 4.43a illustrate the user-specific capacities C_u for an AWGN channel and a spreading factor N_s = 64. Normal lines correspond to random binary spreading codes while bold lines represent the optimum capacity for orthogonal spreading. In the latter case, no interference disturbs the transmission and C_u^{orth.} does not depend on the load for β ≤ 1. This leads to an upper bound, as can be seen from the curve with circles in Figure 4.43a. For β > 1, C_u^{orth.} also degrades because the overall spectral efficiency η shared among all users remains constant, leading to C_u^{orth.} = η/β. With regard to random spreading, C_u is largest for few users because the multiple access interference is low. With growing N_u, the interference becomes stronger and the capacity for each user degrades.

Figure 4.43b illustrates the overall spectral efficiency versus E_b/N_0. We recognize that η generally increases with growing β because more users are sharing the same medium. This is illustrated by the fact that C_u is approximately halved from 2.5 bits/s/Hz for β = 3/4 down to 1.3 bits/s/Hz for β = 2 at 10 log10(E_b/N_0) = 10 dB. Hence, the load grows by a factor 2.67 > 2 and the entire system efficiency η = β · C_u increases by a factor 1.4 (from 0.75 · 2.5 ≈ 1.9 to 2 · 1.3 = 2.6). While these gains are rather large for small β, they diminish for high loads, for example, as β approaches two. The efficiencies η^{orth.} for orthogonal codes always represent upper bounds for the random spreading case. As can be seen from Figure 4.43b, η^{orth.} does not grow anymore for β > 1.

The above-described behavior is again depicted in Figure 4.44 showing C_u and η versus the load β for different SNRs. While C_u decreases with growing load, the spectral efficiency increases. The curves always intersect at β = 1 because all users are assumed to have the same SNR, so that C_u = η/β = η holds at this point. Comparing Figure 4.44a with Figure 4.44b, we recognize that there is nearly no difference between the AWGN channel and OFDM-CDMA with a 4-path Rayleigh fading channel and uniform power delay profile if the loss due to the guard interval is neglected.
Figure 4.44 Ergodic capacity/spectral efficiency of DS-CDMA system with random spreading, N_s = 64 and AWGN channel (panels: a) AWGN channel, b) OFDM-CDMA, 4-path Rayleigh; curves: C_u and η for E_b/N_0 = 5 dB and E_b/N_0 = 10 dB; horizontal axis: β)
4.4.3 Random Spreading Codes and Linear Receivers

The results described above hold for optimum signal processing at the receiver. However, it has already been explained that optimal solutions like joint maximum likelihood decoding of all user signals are infeasible in practice. Moreover, the orthogonality is generally destroyed by the influence of the mobile radio channel so that the MF performs far from optimum. Hence, in the next chapter, suboptimum strategies that perform some kind of preprocessing for separating the users, followed by individual FEC decoding, will be discussed. In this context, we can distinguish linear and nonlinear techniques. We consider first the potential of linear preprocessors in terms of spectral efficiency η versus the load β. Specifically, the MF as well as ZF and MMSE filters with subsequent optimum user-specific FEC decoding are analyzed. The following results are extracted from Verdu and Shamai (1999) where a detailed description of the derivation can be found.

Single-User Matched Filter (SUMF)

We consider again a simple AWGN channel, real-valued Gaussian distributed input signals, and random spreading codes for all users. In contrast to orthogonal spreading, the single-user matched filter is not optimum anymore for random spreading codes. In Verdu and Shamai (1999), it is shown that the user-specific capacity can be expressed by

    C_{MF,u} = \frac{1}{2} \cdot \log_2\left( 1 + \frac{2 E_s/N_0}{1 + 2\beta E_s/N_0} \right).    (4.74)

This result has been obtained by applying a large system analysis where N_u and N_s grow infinitely while their ratio β remains constant. To compare different code rates or spreading factors, we have to find an expression depending on E_b/N_0 rather than E_s/N_0. If each user encodes with a rate R_c = C_u, E_s/N_0 = C_u E_b/N_0 holds and (4.74) becomes an implicit equation

    C_{MF,u} = \frac{1}{2} \cdot \log_2\left( 1 + \frac{2 C_u E_b/N_0}{1 + 2\beta C_u E_b/N_0} \right) = \frac{1}{2} \cdot \log_2(1 + x).    (4.75)

Resolving

    x = \frac{2 C_u E_b/N_0}{1 + 2\beta C_u E_b/N_0}    (4.76)

with respect to N_0/E_b and substituting C_u = 1/2 · log_2(1 + x) yields

    \frac{N_0}{E_b} = \log_2(1 + x) \cdot \left( \frac{1}{x} - \beta \right)    (4.77)

that has to be numerically solved. The value x can be inserted into (4.75) to obtain C_{MF,u} or the spectral efficiency η_{MF} = β · C_{MF,u}.

Zero-Forcing Receiver (Decorrelator, ZF)

Although the ZF receiver and the MMSE filter will be introduced in Sections 5.2.1 and 5.2.2, their principal performance should be analyzed here. From the large system analysis (N_u → ∞, N_s → ∞, β constant) in Verdu and Shamai (1999) we obtain

    C_{ZF,u} = \frac{1}{2} \cdot \log_2\left( 1 + 2(1 - \beta) \frac{E_s}{N_0} \right)    (4.78)

and consequently

    \eta_{ZF} = \frac{\beta}{2} \cdot \log_2\left( 1 + 2(1 - \beta) \frac{E_s}{N_0} \right).    (4.79)
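The implicit equations for the matched filter and the decorrelator can be solved with a simple bisection, as the following Python sketch suggests. Both receivers are written in the common form x = SINR(a) with a = log_2(1 + x) · E_b/N_0 = 2 C_u E_b/N_0; for the MF this is equivalent to (4.76) and (4.77). Applying the same substitution E_s/N_0 = C_u E_b/N_0 to (4.78) is an assumption carried over from the MF derivation, since the text does not spell it out for the ZF receiver. All function names are hypothetical.

```python
import math


def solve_x(sinr, eb_n0, iters=200):
    """Positive root of x = sinr(a) with a = log2(1 + x) * Eb/N0 (bisection).

    Returns 0.0 if no positive solution is found.
    """
    def g(x):
        a = math.log1p(x) / math.log(2.0) * eb_n0  # a = 2 * C_u * Eb/N0
        return sinr(a) - x

    lo, hi = 1e-9, 1.0
    if g(lo) <= 0.0:
        return 0.0
    while g(hi) > 0.0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eta_mf(beta, eb_n0_db):
    """Spectral efficiency of the matched filter, cf. (4.75)-(4.77)."""
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    x = solve_x(lambda a: a / (1.0 + beta * a), eb_n0)
    return 0.5 * beta * math.log2(1.0 + x)


def eta_zf(beta, eb_n0_db):
    """Spectral efficiency of the decorrelator, (4.79) with Es/N0 = C_u * Eb/N0."""
    if beta >= 1.0:
        return 0.0
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    x = solve_x(lambda a: (1.0 - beta) * a, eb_n0)
    return 0.5 * beta * math.log2(1.0 + x)


for beta in (0.25, 0.5, 0.75):
    print(beta, round(eta_mf(beta, 10.0), 3), round(eta_zf(beta, 10.0), 3))
```

Bisection is used here because the left-hand and right-hand sides cross only once for positive x in these two cases; any other scalar root finder would serve equally well.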
Comparing (4.78) with the spectral efficiency for orthogonal codes given in (4.73), we observe that the only difference is an SNR loss depending on the load β. Hence, the decorrelator totally removes the interference at the expense of an amplification of the background noise (see Section 5.2.1). This penalty is expressed by the factor (1 − β) in front of E_s/N_0.

Minimum Mean Squared Error (MMSE) Filter

With reference to the linear MMSE filter, Verdu and Shamai (1999) provide the solution

    C_{MMSE,u} = \frac{1}{2} \cdot \log_2\left( 1 + 2 \frac{E_s}{N_0} - \frac{1}{4} F\left( 2 \frac{E_s}{N_0}, \beta \right) \right)    (4.80)

with

    F(a, b) = \left( \sqrt{a (1 + \sqrt{b})^2 + 1} - \sqrt{a (1 - \sqrt{b})^2 + 1} \right)^2.    (4.81)

Again, we can apply the substitutions C = 1/2 · log_2(1 + x) and E_s/N_0 = C_{MMSE,u} E_b/N_0, resulting in the implicit equation

    x = 2 C_{MMSE,u} \frac{E_b}{N_0} - \frac{1}{4} F\left( 2 C_{MMSE,u} \frac{E_b}{N_0}, \beta \right)
      = \log_2(1 + x) \frac{E_b}{N_0} - \frac{1}{4} F\left( \log_2(1 + x) \frac{E_b}{N_0}, \beta \right)    (4.82)

which has to be solved numerically. The user-specific capacity is finally obtained by

    C_{MMSE,u} = \frac{1}{2} \log_2(1 + x)    \Longleftrightarrow    \eta_{MMSE} = \frac{\beta}{2} \log_2(1 + x).    (4.83)

Figure 4.45 Ergodic spectral efficiency of DS-CDMA system versus load for E_b/N_0 = 10 dB and AWGN channel (panels: a) E_b/N_0 = 5 dB, b) E_b/N_0 = 10 dB; curves: orth., MF, ZF, MMSE, opt.; axes: η versus β)
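The implicit equation (4.82) can be solved numerically in the same spirit. The Python sketch below implements F(a, b) from (4.81) and finds the positive root x by bisection; the bracket search, the iteration count, and the function names are choices made for this illustration and are not taken from the source.

```python
import math


def f_aux(a, b):
    """Auxiliary function F(a, b) from (4.81)."""
    root_b = math.sqrt(b)
    return (math.sqrt(a * (1.0 + root_b) ** 2 + 1.0)
            - math.sqrt(a * (1.0 - root_b) ** 2 + 1.0)) ** 2


def mmse_x(beta, eb_n0, iters=200):
    """Positive root x of the implicit equation (4.82), found by bisection.

    eb_n0 is the linear (not dB) value of E_b/N_0; returns 0.0 if no
    positive solution is found.
    """
    def g(x):
        a = math.log1p(x) / math.log(2.0) * eb_n0  # log2(1 + x) * Eb/N0
        return a - 0.25 * f_aux(a, beta) - x

    lo, hi = 1e-9, 1.0
    if g(lo) <= 0.0:
        return 0.0
    while g(hi) > 0.0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eta_mmse(beta, eb_n0_db):
    """Spectral efficiency of the linear MMSE receiver according to (4.83)."""
    x = mmse_x(beta, 10.0 ** (eb_n0_db / 10.0))
    return 0.5 * beta * math.log2(1.0 + x)


for beta in (0.25, 0.5, 1.0, 1.5):
    print(beta, round(eta_mmse(beta, 10.0), 3))
```

Sweeping β for a fixed E_b/N_0 in this way yields one curve of the kind shown for the MMSE receiver in Figure 4.45.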
The results for the receiver concepts analyzed above are depicted in Figure 4.45. We recognize that the highest efficiency is always obtained for orthogonal spreading codes because no MUI disturbs the transmission and each user experiences a simple AWGN channel. Therefore, the spectral efficiency grows linearly up to a load of β = 1 for fixed E_b/N_0. For β > 1, it stays at a constant level depending on the actual SNR. For random spreading codes, the spectral efficiency of the optimum receiver performing a joint maximum likelihood decoding of all users was determined as described in Subsection 4.4.2. As expected, it shows the best performance of all multiuser detection techniques.

Among the linear receivers, the MF performs well only for very low loads where the background noise dominates the transmission. However, its spectral efficiency increases monotonically with growing β. The decorrelator (ZF) shows near-optimum performance up to a load of approximately β = 0.25. The load with the highest spectral efficiency depends on the SNR because the decorrelator suffers severely from the background noise (cf. Figures 4.45a and b). For loads above this optimum, its performance degrades dramatically and reaches η_{ZF} = 0 below β = 1. The MMSE receiver overcomes the drawback of amplifying the noise and shows the best performance of all linear schemes. However, even the MMSE filter shows a maximum spectral efficiency for finite load and degrades beyond this optimum. A large gap remains between linear and nonlinear techniques, especially for the MF, even with optimum channel coding. Therefore, the next chapter focuses not only on linear but also on nonlinear multiuser detection strategies that jointly process all user signals.
4.5 Summary

This chapter has introduced CDMA systems. They incorporate spread spectrum techniques like DS spreading that multiply the signal's bandwidth. Owing to spectral spreading, a certain robustness against the frequency selectivity of mobile radio channels is achieved. With appropriately designed spreading codes, the MF called Rake receiver represents an efficient and powerful structure. For multiuser scenarios, the detrimental effect of MUI that degrades the system performance remarkably when using a simple Rake receiver has been demonstrated. Moreover, the difference between principles of uplink and downlink transmissions was explained. Furthermore, some families of appropriate spreading codes such as Walsh-Hadamard codes, m-sequences, and Gold codes, as well as their correlation properties have been discussed.

In Section 4.2, the combination of orthogonal FDMA (OFDM) and CDMA was derived. Explaining first the principles of OFDM, OFDM-CDMA systems have been analyzed in more detail. Especially, the advantage of a simple channel equalization for the downlink that partially restores the orthogonality of spreading codes illustrated the attractiveness of OFDM-CDMA.

Next, low-rate coding in CDMA systems was investigated. Owing to the inherent spreading in CDMA systems, which is mainly done by repetition coding and scrambling, there is a lot of space for powerful low-rate codes that partially replace the repetition code. It was shown that the error rate performance can be significantly improved by choosing appropriate codes even under severe MUI. Especially, the parallel concatenated super-orthogonal codes exhibit an amazing performance. However, the final examination of the capacities clearly illustrated that a single MF even with a capacity achieving error correction code leads to a poor overall system capacity so that powerful multiuser detection strategies will be discussed in the next chapter.
5 Multiuser Detection in CDMA Systems

In this chapter, the uplink of a coded DS-CDMA system in which a common base station has to detect all incoming signals is considered. This is a major difference compared to the downlink, where the mobile generally knows only its own spreading code and is interested only in its own signal. The downlink requires the application of blind or semiblind multiuser detection (MUD) techniques (Honig and Tsatsanis 2000), which are not the focus of this work. A comparison with the detection algorithms for multilayer space-time transmission (Böhnke et al. 2003; Foschini et al. 1999; Golden et al. 1998; Wübben et al. 2001) presented in Chapter 6 demonstrates the strong equivalence of CDMA systems and multiple antenna systems when used for multilayer transmission.

The next section starts with optimum MUD strategies. Since they are generally infeasible for practical implementations, linear joint detection techniques are derived in Section 5.2. Efficient approximations are obtained with multistage detectors such as iterative parallel and successive interference cancellation (SIC) schemes. A further performance improvement is derived in Section 5.3 by exploiting the discrete nature of the signal alphabet, leading to nonlinear iterative MUD algorithms. Finally, the combination of linear joint detection and nonlinear interference cancellation is discussed in Section 5.4. The chapter closes with a summary of the main results.
5.1 Optimum Detection

With regard to the uplink, each user maps a word d_u consisting of k information bits by appropriate forward error correction (FEC) encoding onto a code word b_u = [b_u[0] ··· b_u[n − 1]]. After subsequent linear phase shift keying (PSK) or quadrature amplitude modulation (QAM) delivering the sequence a_u, each symbol a_u[ℓ] is spread with a spreading code c_u[ℓ]. Since we look at the generally asynchronous uplink, random spreading codes are assumed. In short code CDMA systems, c_u[ℓ] is independent of the time index ℓ while it varies from symbol to symbol in long code systems. The obtained sequence x_u is transmitted over the individual channel with impulse response h_u[ℓ]. At the receiver, that is, a common base station, all transmitted signals are superimposed and additive white Gaussian noise (AWGN) n is added, resulting in

    y = H \cdot x + n = S \cdot a + n.    (5.1)

In (5.1), H = [T_{h_1}, ..., T_{h_{N_u}}] represents a concatenation of several user-specific convolution matrices T_{h_u} (Kammeyer 2004) that cover all time instants of the considered sequence. The vector x = [x_1^T, ..., x_{N_u}^T]^T comprises the spread sequences of all users. According to (4.4) on page 176, the convolution of spreading code c_u[ℓ] and channel impulse response h_u[ℓ] can be expressed by the signature s_u[ℓ]. This leads to the second part of (5.1) where S comprises the signatures of all users and a the modulated symbols (cf. Section 4.1.2).
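5.1.1 Optimum Joint Sequence Detection
In this section, we assume perfect knowledge of the channel impulse responses and the spreading codes. As already mentioned in Section 1.3, the optimum detector performs a joint maximum a posteriori detection including FEC decoding for all users based on the received sequence $\mathbf{y}$. Mathematically, this can be expressed by
$$\hat{\mathbf{d}}_{\mathrm{map}} = \operatorname*{argmax}_{\tilde{\mathbf{d}}} \Pr\{\tilde{\mathbf{d}} \mid \mathbf{S}, \mathbf{y}\} = \operatorname*{argmax}_{\tilde{\mathbf{d}}} \; p_{\mathbf{Y}|\tilde{\mathbf{d}},\mathbf{S}}(\mathbf{y}) \cdot \Pr\{\tilde{\mathbf{d}}\}, \tag{5.2}$$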
where the second equality is obtained by applying the Bayes rule and exploiting the fact that $p_{\mathbf{Y}|\mathbf{S}}(\mathbf{y})$ is independent of the hypothesis $\tilde{\mathbf{d}}$ and, therefore, does not affect the decision. As already explained in Section 1.3, the hypotheses $\tilde{\mathbf{d}}$ are generally uniformly distributed or the probabilities $\Pr\{\tilde{\mathbf{d}}\}$ are not known a priori at the receiver. In these cases, $\Pr\{\tilde{\mathbf{d}}\}$ is not available and the joint maximum likelihood detector has to be applied:
$$\hat{\mathbf{d}}_{\mathrm{mld}} = \operatorname*{argmax}_{\tilde{\mathbf{d}}} \; p_{\mathbf{Y}|\tilde{\mathbf{d}},\mathbf{S}}(\mathbf{y}). \tag{5.3}$$
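The conditional probability density function in (5.3) has the form
$$p_{\mathbf{Y}|\tilde{\mathbf{d}},\mathbf{S}}(\mathbf{y}) = \frac{1}{\det(\pi \mathbf{\Phi}_{NN})} \cdot \exp\Big[ -\big(\mathbf{y} - \mathbf{S}\,\mathbf{a}(\tilde{\mathbf{d}})\big)^H \mathbf{\Phi}_{NN}^{-1} \big(\mathbf{y} - \mathbf{S}\,\mathbf{a}(\tilde{\mathbf{d}})\big) \Big], \tag{5.4}$$
where $\mathbf{a}(\tilde{\mathbf{d}})$ denotes the modulated sequence associated with the hypothesis $\tilde{\mathbf{d}}$. Inserting (5.4) into (5.3), neglecting all terms independent of the hypothesis $\tilde{\mathbf{d}}$, and applying the natural logarithm, we obtain with $\mathbf{\Phi}_{NN} = \sigma_N^2 \mathbf{I}$
$$\hat{\mathbf{d}}_{\mathrm{mld}} = \operatorname*{argmax}_{\tilde{\mathbf{d}}} \; \log \exp\left[ -\frac{\big(\mathbf{y} - \mathbf{S}\mathbf{a}(\tilde{\mathbf{d}})\big)^H \big(\mathbf{y} - \mathbf{S}\mathbf{a}(\tilde{\mathbf{d}})\big)}{\sigma_N^2} \right] = \operatorname*{argmin}_{\tilde{\mathbf{d}}} \; \big\| \mathbf{y} - \mathbf{S} \cdot \mathbf{a}(\tilde{\mathbf{d}}) \big\|^2. \tag{5.5}$$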
From (5.5), we recognize that the joint maximum likelihood detector searches for the hypothesis $\tilde{\mathbf{d}}$ that minimizes the squared Euclidean distance between the received sequence $\mathbf{y}$ and $\mathbf{S} \cdot \mathbf{a}(\tilde{\mathbf{d}})$. Since $\mathbf{d}$ consists of discrete values, gradient-based methods cannot be applied and an exhaustive search is necessary. Obviously, even for medium-sized systems the required computational complexity is far too high for practical implementations because it grows exponentially with the number of users as well as the lengths of the sequences.
Contrary to the joint detection, each user can also be detected individually (Verdu 1998). Individual MAP detection of a certain user $u$ can be expressed by
$$\hat{\mathbf{d}}_u^{\mathrm{map}} = \operatorname*{argmax}_{\tilde{\mathbf{d}}_u} \Pr\{\tilde{\mathbf{d}}_u \mid \mathbf{S}, \mathbf{y}\} = \operatorname*{argmax}_{\tilde{\mathbf{d}}_u} \; p_{\mathbf{Y}|\tilde{\mathbf{d}}_u,\mathbf{S}}(\mathbf{y}) \cdot \Pr\{\tilde{\mathbf{d}}_u\}. \tag{5.6}$$
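Analogously to (5.3), the individual ML detector is obtained by
$$\hat{\mathbf{d}}_u^{\mathrm{mld}} = \operatorname*{argmax}_{\tilde{\mathbf{d}}_u} \; p_{\mathbf{Y}|\tilde{\mathbf{d}}_u,\mathbf{S}}(\mathbf{y}). \tag{5.7}$$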
Individual detection does not always lead to the same result as the joint detection described in (5.2), which has already been illustrated by the simple example on page 19.
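The exhaustive search behind (5.5) can be sketched in a few lines of Python. This is only a minimal illustration under simplifying assumptions that are not part of the text: the transmission is uncoded BPSK, so that every hypothesis is directly a symbol vector from $\{+1,-1\}^{N_u}$, and the signature matrix is real-valued. The function name `joint_ml_detect` and the toy values of `S` and `noise` are hypothetical.

```python
from itertools import product

def joint_ml_detect(S, y):
    """Return the BPSK vector a in {+1,-1}^Nu that minimizes ||y - S a||^2."""
    n_users = len(S[0])
    best_a, best_dist = None, float("inf")
    for a in product((+1, -1), repeat=n_users):   # 2**Nu hypotheses
        dist = sum(
            (y_k - sum(s * a_v for s, a_v in zip(row, a))) ** 2
            for row, y_k in zip(S, y)
        )
        if dist < best_dist:
            best_a, best_dist = a, dist
    return best_a, best_dist

# Hypothetical toy system: Ns = 4 chips, Nu = 3 users, real-valued signatures.
S = [[1, 1, 1],
     [1, -1, 1],
     [1, 1, -1],
     [1, 1, 1]]
a_true = (+1, -1, -1)
noise = [0.1, -0.2, 0.05, 0.1]
y = [sum(s * a for s, a in zip(row, a_true)) + n for row, n in zip(S, noise)]

a_hat, dist = joint_ml_detect(S, y)
```

The loop visits all $2^{N_u}$ hypotheses, which makes the exponential growth of the complexity with the number of users directly visible.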
5.1.2 Joint Preprocessing and Subsequent Separate Decoding
The first step toward a reduced complexity receiver is to separate FEC decoding and MUD. To exploit as much information as possible, a hard decision multiuser detector must be avoided. Instead, the joint preprocessor should deliver log-likelihood ratios $L(\hat{b}_u[\ell] \mid \mathbf{y})$ of the coded bits $b_u[\ell]$ for each user $u$ that can be directly processed by the FEC decoders.
For the sake of notational simplicity, we restrict ourselves to synchronous single carrier CDMA systems in frequency-nonselective environments and to OFDM-CDMA systems for frequency-selective channels assuming a coarse synchronization (Kühn 2003) (cf. Section 4.2). In these scenarios, the system matrix $\mathbf{S}$ is block diagonal and can be split into several $N_s \times N_u$ matrices $\mathbf{S}[\ell]$. Hence, symbol-wise preprocessing is optimal and we omit the time variable $\ell$. However, the derivation can generally be extended to the asynchronous case. Furthermore, we consider only binary phase shift keying (BPSK) modulation so that no soft-output demodulation is required.¹ In this case, $b_u \in \{0, 1\}$ and $a_u \in \{+1, -1\}$ can be used equivalently.
The resulting structure is shown in Figure 5.1. Each transmitter consists of an FEC encoder including the BPSK mapping and a user-specific interleaver $\Pi_u$ (Bruck et al. 2000). Spreading with the codes $\mathbf{c}_u[\ell]$ and the transmission over the channels $\mathbf{h}_u[\ell]$ are combined in the blocks denoted by the signatures $\mathbf{s}_u[\ell]$. Finally, the received signal $\mathbf{y}[\ell]$ is obtained by summing up all transmit signals and the background noise $\mathbf{n}[\ell]$. At the receiver, the processing for $\mathbf{y} = \mathbf{y}[\ell]$ at time instant $\ell$ starts with the calculation of the log-likelihood ratios
$$L(\hat{a}_u \mid \mathbf{y}) = \log \frac{\Pr\{a_u = +1 \mid \mathbf{y}\}}{\Pr\{a_u = -1 \mid \mathbf{y}\}} = \log \frac{p_{\mathbf{Y}|a_u=+1}(\mathbf{y}) \cdot \Pr\{a_u = +1\}}{p_{\mathbf{Y}|a_u=-1}(\mathbf{y}) \cdot \Pr\{a_u = -1\}} = L(\mathbf{y} \mid \hat{a}_u) + L_a(a_u). \tag{5.8}$$
¹ In the case of QPSK, a soft-output demodulator demultiplexes inphase and quadrature components into a single data stream. For M-ary modulation schemes with $M > 4$, appropriate soft-output demodulation algorithms have to be employed.
Figure 5.1 CDMA system with joint preprocessing and individual FEC decoding
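Extending $a_u$ in the numerator and denominator of $L(\mathbf{y} \mid \hat{a}_u)$ to $\mathbf{a}$ and summing over all possible combinations of $a_{\nu \neq u}$ yields with $\mathbf{\Phi}_{NN} = \sigma_N^2 \cdot \mathbf{I}_{N_s}$
$$L(\hat{a}_u \mid \mathbf{y}) = \log \frac{\sum_{\mathbf{a},\, a_u=+1} p_{\mathbf{Y}|\mathbf{a}}(\mathbf{y}) \cdot \Pr\{\mathbf{a}\}}{\sum_{\mathbf{a},\, a_u=-1} p_{\mathbf{Y}|\mathbf{a}}(\mathbf{y}) \cdot \Pr\{\mathbf{a}\}} = \log \frac{\sum_{\mathbf{a},\, a_u=+1} \exp\big( -\|\mathbf{y} - \mathbf{S}\mathbf{a}\|^2 / \sigma_N^2 \big) \cdot \Pr\{\mathbf{a}\}}{\sum_{\mathbf{a},\, a_u=-1} \exp\big( -\|\mathbf{y} - \mathbf{S}\mathbf{a}\|^2 / \sigma_N^2 \big) \cdot \Pr\{\mathbf{a}\}}. \tag{5.9}$$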
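The probabilities $\Pr\{\mathbf{a}\}$ in (5.9) are generally not known a priori so that they are not considered for the moment. However, we will come back to them when talking about iterative schemes. Performing some manipulations on the squared magnitudes in the numerator and denominator and dropping all terms that do not depend on $\mathbf{a}$ results in
$$-\|\mathbf{y} - \mathbf{S}\mathbf{a}\|^2 \;\Rightarrow\; 2 \cdot \mathrm{Re}\{\mathbf{a}^H \mathbf{S}^H \mathbf{y}\} - \mathbf{a}^H \mathbf{S}^H \mathbf{S} \mathbf{a} = 2 \cdot \mathrm{Re}\{\mathbf{a}^H \mathbf{r}\} - \mathbf{a}^H \mathbf{R} \mathbf{a}. \tag{5.10}$$
We recognize that the matched filtered signal $\mathbf{r} = \mathbf{S}^H \mathbf{y}$ is correlated with the hypothesis $\mathbf{a}$. In the case of BPSK, $\mathbf{a}$ and $\mathbf{r}' = \mathrm{Re}\{\mathbf{S}^H \mathbf{y}\}$ are real-valued and (5.10) simplifies to
$$-\|\mathbf{y} - \mathbf{S}\mathbf{a}\|^2 \;\Rightarrow\; 2\,\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a} \tag{5.11}$$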
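with $\mathbf{R}' = \mathrm{Re}\{\mathbf{R}\}$. The output of the matched filter represents a sufficient statistic, that is, it contains the same mutual information as the channel output $\mathbf{y}$, and subsequent processing stages can still provide optimal log-likelihood ratios. We obtain
$$L(\mathbf{r}' \mid \hat{a}_u) = \log \frac{\sum_{\mathbf{a},\, a_u=+1} \exp\big[ (2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a}) / (2\sigma_{N'}^2) \big]}{\sum_{\mathbf{a},\, a_u=-1} \exp\big[ (2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a}) / (2\sigma_{N'}^2) \big]}, \tag{5.12}$$
where $\sigma_{N'}^2 = \sigma_N^2/2$ denotes the noise power of the real part, so that (5.12) is consistent with the exponent in (5.9). Obviously, the computational complexity still depends exponentially on the number of users (and the alphabet size). A modest complexity reduction can be achieved by applying the already known approximation of (3.70), resulting in
$$L(\mathbf{r}' \mid \hat{a}_u) \approx \frac{1}{2\sigma_{N'}^2} \cdot \max_{\mathbf{a},\, a_u=+1} \big[ 2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a} \big] - \frac{1}{2\sigma_{N'}^2} \cdot \max_{\mathbf{a},\, a_u=-1} \big[ 2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a} \big]. \tag{5.13}$$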
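The exact log-likelihood ratio (5.12) and its max-log approximation (5.13) can be compared with the following sketch. It assumes BPSK, real-valued matched filter outputs given as plain Python lists, and a parameter `sigma2_real` that stands for the noise variance of the real part; all function names and numerical values are illustrative assumptions.

```python
from itertools import product
from math import exp, log

def metric(a, r, R):
    """Correlation metric 2 a^T r' - a^T R' a from (5.11)."""
    n = len(a)
    lin = sum(a[u] * r[u] for u in range(n))
    quad = sum(a[u] * R[u][v] * a[v] for u in range(n) for v in range(n))
    return 2.0 * lin - quad

def llr_exact(u, r, R, sigma2_real):
    """LLR of user u as in (5.12): sum over all hypotheses per sign of a_u."""
    sums = {+1: 0.0, -1: 0.0}
    for a in product((+1, -1), repeat=len(r)):
        sums[a[u]] += exp(metric(a, r, R) / (2.0 * sigma2_real))
    return log(sums[+1] / sums[-1])

def llr_maxlog(u, r, R, sigma2_real):
    """Max-log approximation as in (5.13): keep only the best hypothesis per sign."""
    best = {+1: float("-inf"), -1: float("-inf")}
    for a in product((+1, -1), repeat=len(r)):
        best[a[u]] = max(best[a[u]], metric(a, r, R))
    return (best[+1] - best[-1]) / (2.0 * sigma2_real)

# Hypothetical matched filter outputs of two correlated users.
R = [[1.0, 0.5], [0.5, 1.0]]
r = [0.9, -0.4]
exact_0 = llr_exact(0, r, R, 0.25)
approx_0 = llr_maxlog(0, r, R, 0.25)
```

Both functions still enumerate all $2^{N_u}$ hypotheses; the approximation only replaces the sums of exponentials by maximum searches. For very small noise variances the exponentials in `llr_exact` may overflow, which a practical implementation would avoid by working in the logarithmic domain.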
5.1.3 Turbo Detection with Joint Preprocessing and Separate Decoding
In the next step, individual de-interleaving and FEC decoding are performed. According to the turbo principle, it is possible to give feedback information to the preprocessor so that an iterative process arises (Hagenauer 1996a; Lampe and Huber 2002; Reed et al. 1998). If soft-output decoding algorithms like the BCJR algorithm explained in Section 3.4.3 are employed, delivering the LLRs $L(\hat{a}_u) = L_a(a_u)$, these LLRs can be used to calculate a priori probabilities
$$\Pr\{a_u\} = \frac{e^{L_a(a_u)/2}}{1 + e^{L_a(a_u)}} \cdot e^{a_u L_a(a_u)/2} \tag{5.14}$$
that can be inserted into (5.9). Because of the individual interleavers, the LLRs $L_a(a_u)$ are assumed to be statistically independent so that
$$\Pr\{\mathbf{a}\} = \prod_{u=1}^{N_u} \frac{e^{L_a(a_u)/2}}{1 + e^{L_a(a_u)}} \cdot e^{a_u L_a(a_u)/2} \tag{5.15}$$
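holds. Using the results for BPSK from (5.11) and inserting (5.15) into (5.9) results in (with $\sigma_{N'}^2 = \sigma_N^2/2$ denoting the noise power of the real part)
$$\begin{aligned} L(\hat{a}_u \mid \mathbf{r}') &= \log \frac{\sum_{\mathbf{a},\, a_u=+1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2)} \cdot \prod_{\mu=1}^{N_u} \frac{e^{L_a(a_\mu)/2}}{1+e^{L_a(a_\mu)}}\, e^{a_\mu L_a(a_\mu)/2}}{\sum_{\mathbf{a},\, a_u=-1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2)} \cdot \prod_{\mu=1}^{N_u} \frac{e^{L_a(a_\mu)/2}}{1+e^{L_a(a_\mu)}}\, e^{a_\mu L_a(a_\mu)/2}} \\ &= \log \frac{\sum_{\mathbf{a},\, a_u=+1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2) + \sum_{\mu=1}^{N_u} a_\mu L_a(a_\mu)/2}}{\sum_{\mathbf{a},\, a_u=-1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2) + \sum_{\mu=1}^{N_u} a_\mu L_a(a_\mu)/2}}. \end{aligned} \tag{5.16}$$
The terms $e^{L_a(a_\mu)/2} / (1 + e^{L_a(a_\mu)})$ are independent of the hypotheses $\mathbf{a}$ within the summations and, therefore, are identical in the numerator and denominator so that they cancel. Furthermore, the $u$-th contribution in the inner sums of numerator and denominator in (5.16) is constant due to the restrictions of the outer summations over $\mathbf{a}$ with $a_u = +1$ or $a_u = -1$. Hence, we can extract the common factor $\exp[L_a(a_u)]$ from the ratio and obtain
$$L(\hat{a}_u \mid \mathbf{r}') = L_a(a_u) + L_e(\hat{a}_u) \tag{5.17a}$$
with
$$L_e(\hat{a}_u) = \log \frac{\sum_{\mathbf{a},\, a_u=+1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2) + \sum_{\mu=1,\, \mu \neq u}^{N_u} a_\mu L_a(a_\mu)/2}}{\sum_{\mathbf{a},\, a_u=-1} e^{(2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a})/(2\sigma_{N'}^2) + \sum_{\mu=1,\, \mu \neq u}^{N_u} a_\mu L_a(a_\mu)/2}}. \tag{5.17b}$$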
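The role of the a priori information in (5.14) and (5.17) can be made concrete with a short sketch. It assumes BPSK, real-valued matched filter outputs, and a list `L_a` of a priori LLRs as they would be fed back from the decoders; the function names and the parameter `sigma2_real` (noise variance of the real part) are illustrative assumptions.

```python
from itertools import product
from math import exp, log

def apriori_prob(a_u, L_a):
    """Pr{a_u} for a_u in {+1,-1} computed from the a priori LLR, cf. (5.14)."""
    return exp(L_a / 2.0) / (1.0 + exp(L_a)) * exp(a_u * L_a / 2.0)

def extrinsic_llr(u, r, R, sigma2_real, L_a):
    """Extrinsic LLR L_e of user u, cf. (5.17b)."""
    n = len(r)
    sums = {+1: 0.0, -1: 0.0}
    for a in product((+1, -1), repeat=n):
        corr = 2.0 * sum(a[v] * r[v] for v in range(n)) - sum(
            a[v] * R[v][w] * a[w] for v in range(n) for w in range(n))
        # a priori knowledge about all users except user u
        prior = sum(a[v] * L_a[v] / 2.0 for v in range(n) if v != u)
        sums[a[u]] += exp(corr / (2.0 * sigma2_real) + prior)
    return log(sums[+1] / sums[-1])

def posterior_llr(u, r, R, sigma2_real, L_a):
    """Complete LLR, cf. (5.17a): a priori part plus extrinsic part."""
    return L_a[u] + extrinsic_llr(u, r, R, sigma2_real, L_a)

# Hypothetical example: the decoder is almost sure that user 1 sent +1.
R = [[1.0, 0.5], [0.5, 1.0]]
r = [0.9, -0.4]
le_known_interferer = extrinsic_llr(0, r, R, 0.25, [0.0, 20.0])
```

With a very reliable a priori LLR for the interfering user, only hypotheses with the corresponding interferer symbol contribute noticeably, so the extrinsic LLR approaches the value obtained for a perfectly known interferer.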
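The approximation already known from (5.13) becomes (with $\sigma_{N'}^2 = \sigma_N^2/2$)
$$L(\hat{a}_u \mid \mathbf{r}') \approx L_a(a_u) + \max_{\mathbf{a},\, a_u=+1} \left[ \frac{2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a}}{2\sigma_{N'}^2} + \frac{1}{2} \sum_{\mu=1,\, \mu \neq u}^{N_u} a_\mu L_a(a_\mu) \right] - \max_{\mathbf{a},\, a_u=-1} \left[ \frac{2\mathbf{a}^T \mathbf{r}' - \mathbf{a}^T \mathbf{R}' \mathbf{a}}{2\sigma_{N'}^2} + \frac{1}{2} \sum_{\mu=1,\, \mu \neq u}^{N_u} a_\mu L_a(a_\mu) \right]. \tag{5.18}$$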
The receiver structure depicted in Figure 5.2 illustrates how to calculate the solutions presented in (5.16) and (5.18). In the first iteration, no a priori information is available and only the terms given in (5.11) have to be calculated. They remain constant during subsequent iteration steps and have to be determined only once. After the FEC decoding has been performed for the first time, the LLRs $L_a(b_u)$ can be used as a priori information according to (5.15) to improve the estimates $L(\hat{a}_u \mid \mathbf{r}')$. As already mentioned, the complexity of the joint preprocessor still grows exponentially with the number of users while the decoding part depends linearly on $N_u$.

Figure 5.2 Turbo receiver for joint preprocessing and individual FEC decoding

For a small OFDM-CDMA system with a spreading factor $N_s = 4$, some results are shown in Figure 5.3. They have been obtained for a 4-path Rayleigh fading channel with uniform power delay profile and random spreading codes. A half-rate convolutional code with constraint length $L_c = 3$ and generators $g_1 = 5_8$, $g_2 = 7_8$ was employed. At the receiver, perfect LLRs from (5.16) as well as the approximation of (5.18) have been used. Obviously, the error rate can be significantly reduced with the second iteration. Subsequent iterations not shown here do not lead to substantial additional improvements because we are already close to the single-user bound (SUB). This lower bound is reached at a signal-to-noise ratio (SNR) of 7 dB for $\beta = 1$ as well as $\beta = 2$. Only for small SNRs, the loss compared with the SUB amounts to 0.2 dB for $\beta = 1$ and 0.5 dB for $\beta = 2$. Moreover, the approximate solution (× and +) performs nearly as well as the optimum solution.

Figure 5.3 Performance of optimum multiuser detection with subsequent decoding for the first two iterations (first iteration: ◦ and ×, second iteration: and +); BER versus $E_b/N_0$ in dB for a) $\beta = 1$ and b) $\beta = 2$; curves: SUB, perfect LLR, approximate LLR
5.2 Linear Multiuser Detection
A further reduction of the computational complexity can be achieved by applying only linear joint preprocessing. A common property of all linear techniques is that they do not exploit the knowledge of the finite signal alphabet but assume continuously distributed transmit signals. Therefore, the complexity grows only polynomially with the number of active users instead of exponentially. The following approaches are related to simple linear algebra: the linear equation system $\mathbf{y} = \mathbf{S}\mathbf{a} + \mathbf{n}$ has to be solved with respect to $\mathbf{a}$, that is, we look for a matrix $\mathbf{W}$ with $\hat{\mathbf{a}} = \mathbf{W} \cdot \mathbf{y}$. For simplicity, we first restrict ourselves to the uncoded case. The system matrix $\mathbf{S}$ consists of $N_u$ columns and $N_s$ rows.
5.2.1 Decorrelator (Zero-Forcing, ZF)
First, we consider the decorrelator, which is equivalent to the zero-forcing (ZF) equalizer (Honig and Tsatsanis 2000; Moshavi 1996). It searches for the unconstrained vector $\tilde{\mathbf{a}}_{ZF} \in \mathbb{C}^{N_u}$ that minimizes the squared Euclidean distance to the received vector $\mathbf{y}$ according to
$$\tilde{\mathbf{a}}_{ZF} = \operatorname*{argmin}_{\tilde{\mathbf{a}} \in \mathbb{C}^{N_u}} \big\| \mathbf{y} - \mathbf{S}\tilde{\mathbf{a}} \big\|^2. \tag{5.19}$$
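The multiplication of $\tilde{\mathbf{a}}_{ZF}$ with $\mathbf{S}$ leads to the best reconstruction of $\mathbf{y}$. Although (5.19) looks very similar to the maximum likelihood solution, the search space is not restricted to a finite alphabet (unconstrained) but comprises the entire $N_u$-dimensional space of complex numbers. Therefore, the final detection requires a scalar hard decision $\hat{\mathbf{a}}_{ZF} = \mathcal{Q}(\tilde{\mathbf{a}}_{ZF})$ of the soft values in $\tilde{\mathbf{a}}_{ZF}$. Nevertheless, the decorrelator represents the joint maximum likelihood detector for continuously distributed signal alphabets. Without the exploitation of the finite signal space there is no combinatorial problem and the optimization is solved by setting the derivative of the squared Euclidean distance in (5.19) with respect to $\tilde{\mathbf{a}}^H$ to zero. With the relation $\partial \tilde{\mathbf{a}} / \partial \tilde{\mathbf{a}}^H = \mathbf{0}$ (Fischer 2002), we obtain the Wirtinger derivative
$$\frac{\partial}{\partial \tilde{\mathbf{a}}^H} \big(\mathbf{y} - \mathbf{S}\tilde{\mathbf{a}}\big)^H \big(\mathbf{y} - \mathbf{S}\tilde{\mathbf{a}}\big) = \frac{\partial}{\partial \tilde{\mathbf{a}}^H} \big( \mathbf{y}^H \mathbf{y} - \mathbf{y}^H \mathbf{S}\tilde{\mathbf{a}} - \tilde{\mathbf{a}}^H \mathbf{S}^H \mathbf{y} + \tilde{\mathbf{a}}^H \mathbf{S}^H \mathbf{S}\tilde{\mathbf{a}} \big) = -\mathbf{S}^H \mathbf{y} + \mathbf{S}^H \mathbf{S}\tilde{\mathbf{a}} \overset{!}{=} \mathbf{0} \tag{5.20}$$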
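whose solution is obviously (Verdu 1998)
$$\tilde{\mathbf{a}}_{ZF} = \mathbf{W}_{ZF}^H \cdot \mathbf{y} = \big(\mathbf{S}^H \mathbf{S}\big)^{-1} \mathbf{S}^H \cdot \mathbf{y} = \mathbf{R}^{-1} \cdot \mathbf{r} \tag{5.21}$$
with $\mathbf{W}_{ZF} = \mathbf{S} \cdot \big(\mathbf{S}^H \mathbf{S}\big)^{-1}$. We recognize that the decorrelator starts again with a bank of matched filters (multiplying $\mathbf{y}$ with $\mathbf{S}^H$). Afterwards, the intermediate results in $\mathbf{r}$ are decorrelated with $\mathbf{R}^{-1}$, that is, the correlations between the matched filter outputs are removed. If $\mathbf{S}$ consists of linearly independent columns, which is generally fulfilled for $N_u \le N_s$, the correlation matrix $\mathbf{R} = \mathbf{S}^H \mathbf{S}$ has full rank and its inverse exists. Inserting the structure of $\mathbf{y}$ into (5.21) leads to
$$\tilde{\mathbf{a}}_{ZF} = \mathbf{R}^{-1} \mathbf{S}^H \cdot \big(\mathbf{S}\mathbf{a} + \mathbf{n}\big) = \mathbf{a} + \mathbf{R}^{-1} \mathbf{S}^H \cdot \mathbf{n} = \mathbf{a} + \mathbf{W}_{ZF}^H \cdot \mathbf{n}. \tag{5.22}$$
The output of the decorrelator consists of the desired symbol vector $\mathbf{a}$ and a modified noise term. Therefore, as already mentioned in Section 4.4.3, the interference can be totally suppressed. However, the background noise is multiplied with the inverse of the correlation matrix $\mathbf{R}$, leading, especially for high loads $\beta$, to a strong noise amplification and, hence, to low SNRs. This drawback is also indicated by the error covariance matrix
$$\begin{aligned} \mathbf{\Phi}_{ZF} &= \mathrm{E}\Big\{ \big(\tilde{\mathbf{a}}_{ZF} - \mathbf{a}\big) \big(\tilde{\mathbf{a}}_{ZF} - \mathbf{a}\big)^H \Big\} = \mathrm{E}\Big\{ \big(\mathbf{a} + \mathbf{W}_{ZF}^H \mathbf{n} - \mathbf{a}\big) \big(\mathbf{a} + \mathbf{W}_{ZF}^H \mathbf{n} - \mathbf{a}\big)^H \Big\} = \mathbf{W}_{ZF}^H\, \mathrm{E}\{\mathbf{n}\mathbf{n}^H\}\, \mathbf{W}_{ZF} \\ &= \sigma_N^2\, \mathbf{W}_{ZF}^H \mathbf{W}_{ZF} = \sigma_N^2\, \mathbf{R}^{-1} \mathbf{S}^H \mathbf{S} \mathbf{R}^{-1} = \sigma_N^2\, \mathbf{R}^{-1}. \end{aligned} \tag{5.23}$$
It contains the mean squared error (MSE) of each user on its diagonal and equals the inverse of the correlation matrix $\mathbf{R}$ multiplied with the noise power $\sigma_N^2$.

Real-Valued Modulation Schemes
For real-valued modulation schemes such as BPSK or M-ary amplitude shift keying (ASK), significant improvements can be achieved by the following modification. Since we know that real symbols have been transmitted, only the real part of the matched filter outputs $\mathbf{r}' = \mathrm{Re}\{\mathbf{S}^H \mathbf{y}\}$ is of interest. Consequently, the inverse has to be determined only from $\mathbf{R}' = \mathrm{Re}\{\mathbf{S}^H \mathbf{S}\}$ and we obtain
$$\tilde{\mathbf{a}}_{ZF} = \big(\mathrm{Re}\{\mathbf{S}^H \mathbf{S}\}\big)^{-1} \cdot \mathrm{Re}\{\mathbf{S}^H \cdot \mathbf{y}\} = \mathbf{R}'^{-1} \cdot \mathbf{r}'. \tag{5.24}$$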
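For real-valued signatures and BPSK, (5.21) and (5.24) coincide, which allows a compact sketch of the decorrelator and of its noise enhancement according to (5.23). The code below is an illustration only: it uses plain Python lists, a small hand-written Gaussian elimination instead of a numerical library, and a hypothetical $4 \times 3$ signature matrix; none of the function names stem from the text.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b_i)] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def gram(S):
    """Correlation matrix R = S^T S."""
    n_u = len(S[0])
    return [[sum(row[u] * row[v] for row in S) for v in range(n_u)]
            for u in range(n_u)]

def matched_filter(S, y):
    """Matched filter bank r = S^T y."""
    return [sum(row[u] * y_k for row, y_k in zip(S, y)) for u in range(len(S[0]))]

def decorrelate(S, y):
    """Soft decorrelator output R^{-1} r."""
    return solve(gram(S), matched_filter(S, y))

def noise_gain(S):
    """Diagonal of R^{-1}; times the noise power it gives the MSE per user."""
    R = gram(S)
    n = len(R)
    return [solve(R, [1.0 if v == u else 0.0 for v in range(n)])[u]
            for u in range(n)]

S = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]   # hypothetical signatures
a = [+1, -1, -1]
y = [sum(s * a_v for s, a_v in zip(row, a)) for row in S]   # noise-free case
a_soft = decorrelate(S, y)
a_hard = [1 if x >= 0 else -1 for x in a_soft]               # hard decision Q(.)
gains = noise_gain(S)
```

In the noise-free case the soft outputs equal the transmitted symbols, which reflects the complete interference suppression. The entries of `gains` exceed the single-user value $1/R_{u,u}$ and thereby quantify the noise amplification caused by the correlations.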
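The advantage of (5.24) becomes obvious when the received signal is described only by real matrices. Splitting up real and imaginary parts, denoted by $(\cdot)'$ and $(\cdot)''$, doubles the sizes of all vectors and matrices, leading to
$$\mathbf{y}_r = \begin{bmatrix} \mathbf{y}' \\ \mathbf{y}'' \end{bmatrix} = \begin{bmatrix} \mathbf{S}' & -\mathbf{S}'' \\ \mathbf{S}'' & \mathbf{S}' \end{bmatrix} \cdot \begin{bmatrix} \mathbf{a}' \\ \mathbf{a}'' \end{bmatrix} + \begin{bmatrix} \mathbf{n}' \\ \mathbf{n}'' \end{bmatrix} = \begin{bmatrix} \mathbf{S}' \\ \mathbf{S}'' \end{bmatrix} \cdot \mathbf{a}' + \begin{bmatrix} \mathbf{n}' \\ \mathbf{n}'' \end{bmatrix} = \tilde{\mathbf{S}}_r\, \mathbf{a}' + \mathbf{n}_r. \tag{5.25}$$
Since $\mathbf{a}'' = \mathbf{0}$ holds for real-valued modulation schemes, only the first $N_u$ columns of the real-valued system matrix influence the channel output. This halves the effective interference so that $\tilde{\mathbf{S}}_r$ is better conditioned than $\mathbf{S}$ and the background noise is less amplified.

Moore-Penrose Pseudo Inverse
Next, we have to discuss the case when $\mathbf{R}$ is singular and its inverse does not exist. Since we assume binary spreading codes, this situation occurs for $N_u > N_s$ and requires the pseudo inverse or Moore-Penrose inverse. It is obtained from the singular-value decomposition of $\mathbf{S}$ (Golub and van Loan 1996) (see also Appendix C)
$$\mathbf{S} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^H = \mathbf{U} \begin{bmatrix} \mathbf{\Sigma}_0 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}^H \quad\Rightarrow\quad \mathbf{R} = \mathbf{S}^H \mathbf{S} = \mathbf{V} \begin{bmatrix} \mathbf{\Sigma}_0^2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}^H = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^H, \tag{5.26}$$
where $\mathbf{U} \in \mathbb{C}^{N_s \times N_s}$ and $\mathbf{V} \in \mathbb{C}^{N_u \times N_u}$ are unitary matrices. The diagonal matrix $\mathbf{\Sigma}_0$ contains all nonzero singular values of $\mathbf{S}$. With this definition, we obtain the pseudo inverses that also include the nonsingular case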
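$$\mathbf{S}^{\dagger} = \mathbf{V} \mathbf{\Sigma}^{\dagger} \mathbf{U}^H = \mathbf{V} \begin{bmatrix} \mathbf{\Sigma}_0^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{U}^H = \begin{cases} \big(\mathbf{S}^H \mathbf{S}\big)^{-1} \mathbf{S}^H & \mathrm{rank}(\mathbf{S}) = N_u \\ \mathbf{S}^H \big(\mathbf{S} \mathbf{S}^H\big)^{-1} & \mathrm{rank}(\mathbf{S}) = N_s \end{cases} \tag{5.27a}$$
and
$$\mathbf{R}^{\dagger} = \big(\mathbf{S}^H \mathbf{S}\big)^{\dagger} = \mathbf{V} \begin{bmatrix} \mathbf{\Sigma}_0^{-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}^H = \begin{cases} \big(\mathbf{S}^H \mathbf{S}\big)^{-1} & \mathrm{rank}(\mathbf{S}) = N_u \\ \mathbf{S}^H \big(\mathbf{S} \mathbf{S}^H\big)^{-2} \mathbf{S} & \mathrm{rank}(\mathbf{S}) = N_s. \end{cases} \tag{5.27b}$$
Using (5.27a) and (5.27b), we obtain the general result $\mathbf{W}_{ZF}^H = \mathbf{S}^{\dagger}$.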
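The two closed-form branches of (5.27a) can be tried out without a singular-value decomposition as long as $\mathbf{S}$ has full rank. The following sketch assumes a real-valued $\mathbf{S}$ of full column or full row rank and applies $\mathbf{S}^{\dagger}$ to a received vector; the helper names and the overloaded $2 \times 3$ example are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b_i)] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def pinv_apply(S, y):
    """Compute S^+ y for a real-valued, full-rank S following (5.27a)."""
    St = transpose(S)
    n_s, n_u = len(S), len(S[0])
    if n_u <= n_s:                                   # rank(S) = Nu
        return solve(matmul(St, S), matvec(St, y))   # (S^T S)^{-1} S^T y
    return matvec(St, solve(matmul(S, St), y))       # S^T (S S^T)^{-1} y

# Overloaded toy system (Nu = 3 users, Ns = 2 chips): users 1 and 3 share a signature.
S_over = [[1, 1, 1], [1, -1, 1]]
a = [1, 1, -1]
y_over = matvec(S_over, a)
a_min = pinv_apply(S_over, y_over)
```

In the overloaded example the result reproduces the received vector exactly but differs from the transmitted symbols, because the pseudo inverse selects the minimum-norm solution of an underdetermined equation system.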
Near-Far Scenarios
In scenarios with near-far effects (cf. page 183), the received signal can be expressed as $\mathbf{y} = \mathbf{S}\mathbf{P}^{1/2}\mathbf{a} + \mathbf{n}$. The corresponding decorrelating filter is
$$\mathbf{W}_{ZF}^H = \mathbf{P}^{-1/2}\, \mathbf{S}^{\dagger}, \tag{5.28}$$
that is, it consists of the already known decorrelator $\mathbf{S}^{\dagger}$ with subsequent scalar weighting of the filter outputs. Since the decorrelator totally removes the interference, weak users do not suffer from strong users and the decorrelator is called near-far resistant.
Figure 5.4a shows the bit error rate (BER) performance of the decorrelator for BPSK. Although the multiple access interference can be totally suppressed, the performance degrades with increased load. Only for very low $\beta$ can the SUB be reached. The reason for this behavior is the amplification of the background noise that becomes larger with growing load. For $\beta = 2$, a reliable transmission is not possible. However, except for low SNR and extremely high loads, the decorrelator performs better than the simple matched filter represented by dashed lines.
Figure 5.4 Error rate performance of decorrelator for different loads and uncoded OFDM-CDMA system with BPSK, processing gain $G_p = 32$, and 4-path Rayleigh fading channel (dashed lines: matched filter); a) error rate versus $E_s/N_0$ for $N_u = 2, 16, 32, 48, 64$ and SUB, b) error rate versus $\beta$ for 0, 5, 10, 15, and 20 dB

This can also be observed in Figure 5.4b showing the error rate performance versus $\beta$ for different SNRs $E_s/N_0$. While the error rate degrades dramatically for the matched filter even for moderate loads, the BER increases more slowly for the decorrelator. Nevertheless, the noise amplification leads to a weak performance: even at high SNRs, the error rate approaches 0.5 when $\beta$ tends to 2. It has to be emphasized that the example assumes independent fading channels for all users (see uplink transmission in Section 4.2) so that the interference is distributed in the complex plane. Since we use real-valued BPSK, only half of the interference power affects the transmission. Otherwise, the error rate of 0.5 would have already been reached for $\beta = 1$.
Figure 5.5 compares the performance of the decorrelator with that of a simple matched filter. It depicts the gain
$$\gamma_{ZF-MF} = 10 \log_{10} \frac{E_b/N_0 \big|_{MF}}{E_b/N_0 \big|_{ZF}} \tag{5.29}$$
of the decorrelator for different target error rates $P_t$, that is, the logarithmic ratio of the SNRs required to achieve the bit error rate $P_t$. Obviously, $\gamma_{ZF-MF}$ grows with $\beta$ and the slope becomes higher for low target error rates. Note that the matched filter cannot reach certain target error rates if the load is too high, resulting in infinitely large gains. For small SNRs and if $\beta$ approaches 2, the decorrelator performs worse than the matched filter, as can be seen from Figure 5.4.
Figure 5.5 Gain of decorrelator compared to MF for different loads and uncoded OFDM-CDMA system with BPSK and 4-path Rayleigh fading channel ($\gamma_{ZF-MF}$ in dB versus $\beta$ for $P_t = 10^{-1}$, $5 \cdot 10^{-2}$, $10^{-2}$, $5 \cdot 10^{-3}$)

5.2.2 Minimum Mean Squared Error Receiver (MMSE)
So far, we considered two extreme linear detectors: the matched filter in Chapter 4, which addresses only the background noise and, therefore, suffers severely from multiuser interference, and the decorrelator of the preceding section, which concentrates only on the interference and neglects the influence of the noise. A compromise between both is obtained with the minimum mean squared error (MMSE) detector $\mathbf{W}_{MMSE}$ minimizing the average squared Euclidean distance between the estimate $\tilde{\mathbf{a}}_{MMSE} = \mathbf{W}_{MMSE}^H \cdot \mathbf{y}$ and the true data vector $\mathbf{a}$:
$$\tilde{\mathbf{a}}_{MMSE} = \mathbf{W}_{MMSE}^H \cdot \mathbf{y} \quad \text{with} \quad \mathbf{W}_{MMSE} = \operatorname*{argmin}_{\tilde{\mathbf{W}} \in \mathbb{C}^{N_s \times N_u}} \mathrm{E}\Big\{ \big\| \tilde{\mathbf{W}}^H \mathbf{y} - \mathbf{a} \big\|^2 \Big\}. \tag{5.30}$$
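Again, a solution of the problem defined in (5.30) is found by setting the partial derivative of the squared Euclidean distance with respect to $\tilde{\mathbf{W}}$ to zero. With the relation $\partial \tilde{\mathbf{W}}^H / \partial \tilde{\mathbf{W}} = \mathbf{0}$ (Fischer 2002), we obtain
$$\frac{\partial}{\partial \tilde{\mathbf{W}}} \mathrm{E}\Big\{ \mathrm{tr}\Big[ \big(\tilde{\mathbf{W}}^H \mathbf{y} - \mathbf{a}\big) \big(\tilde{\mathbf{W}}^H \mathbf{y} - \mathbf{a}\big)^H \Big] \Big\} = \frac{\partial}{\partial \tilde{\mathbf{W}}} \mathrm{tr}\Big[ \tilde{\mathbf{W}}^H \mathbf{\Phi}_{YY} \tilde{\mathbf{W}} - \tilde{\mathbf{W}}^H \mathbf{\Phi}_{YA} - \mathbf{\Phi}_{AY} \tilde{\mathbf{W}} + \mathbf{\Phi}_{AA} \Big] = \tilde{\mathbf{W}}^H \mathbf{\Phi}_{YY} - \mathbf{\Phi}_{AY} \overset{!}{=} \mathbf{0} \tag{5.31}$$
leading to the well-known Wiener solution
$$\mathbf{W}_{MMSE}^H = \mathbf{\Phi}_{AY} \cdot \mathbf{\Phi}_{YY}^{-1}. \tag{5.32}$$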
To determine the covariance matrices in (5.32), some general assumptions are made that are fulfilled in most practical communication systems. First, the AWGN channel adds white noise, that is, successive noise samples are uncorrelated. Second, the data symbols $a_u$ of different users are statistically independent. Furthermore, the data symbols are independent of the noise samples, leading to the following set of equations:
$$\mathbf{\Phi}_{NN} = \mathrm{E}\{\mathbf{n}\mathbf{n}^H\} = \sigma_N^2 \cdot \mathbf{I}_{N_s} \tag{5.33a}$$
$$\mathbf{\Phi}_{AA} = \mathrm{E}\{\mathbf{a}\mathbf{a}^H\} = \sigma_A^2 \cdot \mathbf{I}_{N_u} \tag{5.33b}$$
$$\mathbf{\Phi}_{AN} = \mathrm{E}\{\mathbf{a}\mathbf{n}^H\} = \mathbf{0}. \tag{5.33c}$$
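These results deliver the covariance matrix of the received samples
$$\mathbf{\Phi}_{YY} = \mathrm{E}\{\mathbf{y}\mathbf{y}^H\} = \mathbf{S} \mathbf{\Phi}_{AA} \mathbf{S}^H + \mathbf{\Phi}_{NN} = \sigma_A^2 \cdot \mathbf{S}\mathbf{S}^H + \sigma_N^2 \cdot \mathbf{I}_{N_s} \tag{5.34}$$
and the crosscovariance matrix
$$\mathbf{\Phi}_{AY} = \mathrm{E}\{\mathbf{a}\mathbf{y}^H\} = \mathbf{\Phi}_{AA} \mathbf{S}^H + \mathbf{\Phi}_{AN} = \sigma_A^2 \cdot \mathbf{S}^H. \tag{5.35}$$
Inserting (5.34) and (5.35) into (5.32), we obtain the MMSE filter
$$\mathbf{W}_{MMSE}^H = \sigma_A^2\, \mathbf{S}^H \big( \sigma_A^2\, \mathbf{S}\mathbf{S}^H + \sigma_N^2\, \mathbf{I}_{N_s} \big)^{-1} = \mathbf{S}^H \Big( \mathbf{S}\mathbf{S}^H + \frac{\sigma_N^2}{\sigma_A^2}\, \mathbf{I}_{N_s} \Big)^{-1} \tag{5.36}$$
which can be shown to be equivalent to²
$$\mathbf{W}_{MMSE}^H = \Big( \mathbf{S}^H \mathbf{S} + \frac{\sigma_N^2}{\sigma_A^2}\, \mathbf{I}_{N_u} \Big)^{-1} \cdot \mathbf{S}^H = \Big( \mathbf{R} + \frac{N_0}{E_s}\, \mathbf{I}_{N_u} \Big)^{-1} \cdot \mathbf{S}^H. \tag{5.37}$$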
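The equivalence of (5.36) and (5.37) and the two limiting cases of the MMSE filter can be checked numerically. The sketch below assumes a real-valued signature matrix, plain Python lists and a hand-written linear solver; the constant `c` stands for the ratio $\sigma_N^2/\sigma_A^2$, and all names and numbers are illustrative assumptions.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b_i)] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def mmse_nu(S, y, c):
    """Form (5.37): (S^T S + c I_Nu)^{-1} S^T y, an Nu x Nu system."""
    n_u = len(S[0])
    M = [[sum(row[u] * row[v] for row in S) + (c if u == v else 0.0)
          for v in range(n_u)] for u in range(n_u)]
    r = [sum(row[u] * y_k for row, y_k in zip(S, y)) for u in range(n_u)]
    return solve(M, r)

def mmse_ns(S, y, c):
    """Form (5.36): S^T (S S^T + c I_Ns)^{-1} y, an Ns x Ns system."""
    n_s = len(S)
    M = [[sum(p * q for p, q in zip(S[i], S[j])) + (c if i == j else 0.0)
          for j in range(n_s)] for i in range(n_s)]
    z = solve(M, y)
    return [sum(S[k][u] * z[k] for k in range(n_s)) for u in range(len(S[0]))]

S = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, 1, 1]]   # hypothetical signatures
a = [+1, -1, -1]
noise = [0.1, -0.2, 0.05, 0.1]
y = [sum(s * a_v for s, a_v in zip(row, a)) + n for row, n in zip(S, noise)]
est_nu = mmse_nu(S, y, 0.5)
est_ns = mmse_ns(S, y, 0.5)
```

Both forms deliver the same estimate, so the smaller of the two matrices can be inverted in practice. For `c` tending to zero the result approaches the decorrelator output, while for very large `c` it becomes a scaled version of the matched filter output.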
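Analyzing the result in (5.37), we see that the MMSE detector starts like the decorrelator with a matched filter bank. Moreover, $\mathbf{W}_{MMSE}$ represents a compromise between matched filter and decorrelator. For $\sigma_N^2 \to 0$, that is, infinitely large SNR, the identity matrix in the inverse vanishes and we obtain the decorrelator in (5.21) suppressing the interference perfectly. On the contrary, the identity matrix dominates the inverse for $\sigma_N^2 \to \infty$ so that $\mathbf{R}$ can be neglected. In this case, we simply obtain the matched filter that addresses only the noise (Müller 1998). The MMSE detector does not suppress the multiuser interference perfectly and some residual interference still disturbs the transmission. Moreover, the estimate is biased. The error covariance matrix with (5.1) and (5.37) now becomes
$$\mathbf{\Phi}_{MMSE} = \mathrm{E}\Big\{ \big(\tilde{\mathbf{a}}_{MMSE} - \mathbf{a}\big) \big(\tilde{\mathbf{a}}_{MMSE} - \mathbf{a}\big)^H \Big\} = \sigma_A^2 \bigg[ \mathbf{I}_{N_u} - \Big( \mathbf{R} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \Big)^{-1} \mathbf{R} \bigg] = \sigma_N^2 \Big( \mathbf{R} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \Big)^{-1}. \tag{5.38}$$
The normalized MSE per user can be easily calculated and amounts to
$$\frac{1}{N_u \sigma_A^2}\, \mathrm{tr}\big[ \mathbf{\Phi}_{MMSE} \big] = \frac{1}{N_u} \frac{\sigma_N^2}{\sigma_A^2}\, \mathrm{tr}\bigg[ \Big( \mathbf{V} \mathbf{\Lambda} \mathbf{V}^H + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \Big)^{-1} \bigg] = \frac{1}{N_u} \sum_{\mu=1}^{\mathrm{rank}(\mathbf{S})} \frac{N_0/E_s}{\lambda_\mu + N_0/E_s}, \tag{5.39}$$
where $\lambda_\mu$ denotes the $\mu$-th nonzero eigenvalue of the correlation matrix $\mathbf{R} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^H$. Hence, the average error depends on the singular-value distribution of $\mathbf{S}$. For large SNR $E_s/N_0 \to \infty$, the error vanishes; for $E_s/N_0 \to 0$, it becomes 1.
² A simple way to derive (5.37) is to start with a matched filter bank and to minimize $\|\mathbf{W}^H \mathbf{S}^H \mathbf{y} - \mathbf{a}\|^2 = \|\mathbf{W}^H \mathbf{r} - \mathbf{a}\|^2$.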
Real-Valued Modulation Schemes
Looking again at a BPSK transmission, the performance can be improved by concentrating on the important real parts. Similar to the derivation for the decorrelator we obtain
$$\tilde{\mathbf{a}}_{MMSE} = \Big( \mathbf{R}' + \frac{N_0}{2E_s}\, \mathbf{I}_{N_u} \Big)^{-1} \cdot \mathbf{r}'. \tag{5.40}$$
Note that the noise power in (5.40) is halved compared to the complex-valued case!

Near-Far Scenarios
In near-far scenarios, the MMSE filter has to be adapted to the different power levels among the users (Feuersänger and Kühn 2001). Substituting $\mathbf{S}$ with $\mathbf{S}\mathbf{P}^{1/2}$, it becomes
$$\mathbf{W}_{MMSE}^H = \mathbf{P}^{1/2} \mathbf{S}^H \big( \mathbf{S}\mathbf{P}\mathbf{S}^H + \sigma_N^2\, \mathbf{I}_{N_s} \big)^{-1} \tag{5.41a}$$
$$\phantom{\mathbf{W}_{MMSE}^H} = \mathbf{P}^{1/2} \big( \mathbf{S}^H \mathbf{S}\mathbf{P} + \sigma_N^2\, \mathbf{I}_{N_u} \big)^{-1} \mathbf{S}^H. \tag{5.41b}$$
From (5.41a) and (5.41b), we observe that the power levels affect the behavior of the MMSE filter. If the SNR grows without bound, the influence of the background noise vanishes and the MMSE filter concentrates on the interference suppression. Hence, the MMSE detector approaches the decorrelator and is asymptotically near-far resistant (Verdu 1998). However, for finite SNR, weak users suffer more from strong users than in the case of the decorrelator, that is, near-far resistance is not generally given.
Figure 5.6a shows the BER performance of the MMSE detector for an OFDM-CDMA system and a 4-path Rayleigh fading channel. Similar to the decorrelator, performance degrades for increased system load $\beta$. However, the comparison with the decorrelator in Figure 5.6b illustrates that the degradation is much smaller than for the ZF solution. The MMSE filter outperforms the decorrelator for all loads $\beta$ and all SNRs and the gain grows with the system load. This is also confirmed by Figure 5.7.

Figure 5.6 Error rate performance of MMSE filter for different loads and OFDM-CDMA system with 4-path Rayleigh fading channel (dashed line: decorrelator); a) error rate versus $E_s/N_0$ for $\beta \approx 0$, 0.5, 1, 1.5, 2 and SUB, b) error rate versus $\beta$ for 0, 5, 10, 15, and 20 dB

Figure 5.7 Gain of MMSE filter compared to decorrelator for different loads and an uncoded OFDM-CDMA system with BPSK and 4-path Rayleigh fading channel (gain in dB versus $\beta$ for $P_t = 10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$)

Concluding, we can state that linear MUD techniques reduce multiple access interference efficiently. In particular, the MMSE filter outperforms the simple matched filter for all SNRs. Although the computational complexity of the described linear detectors is much lower than that of the optimum multiuser detector, it still grows cubically with the system size. Hence, calculating the pseudo inverse of $\mathbf{S}$ becomes a demanding task for large systems. The next subsections describe some iterative schemes approximating the ZF and MMSE solutions with lower computational costs.
5.2.3 Linear Parallel Interference Cancellation (PIC)
By calculating the pseudo inverse of the system matrix $\mathbf{S}$, we basically solve a linear equation system. From linear algebra (Golub and van Loan 1996) we know that the solution can be obtained iteratively. Starting with a matched filter bank, we have to process the signal
$$\mathbf{r} = \mathbf{R}\mathbf{a} + \mathbf{S}^H \mathbf{n} \quad\Rightarrow\quad \tilde{\mathbf{a}} = \mathbf{W}^H \cdot \mathbf{r} \quad\Leftrightarrow\quad \mathbf{M} \cdot \tilde{\mathbf{a}} = \mathbf{r}. \tag{5.42}$$
Instead of looking in (5.42) for a matrix $\mathbf{W}$ that leads to a good estimate $\tilde{\mathbf{a}}$, we can describe the problem as solving the linear equation system $\mathbf{M} \cdot \tilde{\mathbf{a}} = \mathbf{r}$. The matrix $\mathbf{M}$ will be determined later. A single row of this linear equation system can be presented in the form
$$r_u = M_{u,u}\, \tilde{a}_u + \sum_{v=1}^{u-1} M_{u,v}\, \tilde{a}_v + \sum_{v=u+1}^{N_u} M_{u,v}\, \tilde{a}_v, \tag{5.43}$$
that is, the received value $r_u$ consists of the superposition of the scaled desired symbol $\tilde{a}_u$ and the weighted interfering symbols $\tilde{a}_{v \neq u}$. An iterative solution of the equation system is obtained by using the weighted matched filter outputs of the interfering symbols as initial soft estimates $\tilde{a}_{v \neq u}^{(0)} = r_{v \neq u} / M_{v,v}$. Subtracting them from $r_u$ leads to an improved estimate $\tilde{a}_u^{(1)}$ after the first iteration. The interference cancellation is simultaneously applied to all users and repeated with updated estimates $\tilde{a}_u^{(\mu)}$ in subsequent iterations. In the $\mu$-th iteration, the $u$-th symbol becomes
$$\tilde{a}_u^{(\mu)} = M_{u,u}^{-1} \cdot \bigg[ r_u - \sum_{v=1}^{u-1} M_{u,v}\, \tilde{a}_v^{(\mu-1)} - \sum_{v=u+1}^{N_u} M_{u,v}\, \tilde{a}_v^{(\mu-1)} \bigg]. \tag{5.44}$$

Figure 5.8 Structure of multistage detector for iterative parallel interference cancellation
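The simultaneous application of (5.44) for all symbols $a_u$, $1 \le u \le N_u$, is also called the Jacobi algorithm and known as linear parallel interference cancellation (PIC). An implementation leads directly to the multistage detector depicted in Figure 5.8 (Honig and Tsatsanis 2000; Moshavi 1996). Several identical modules highlighted by the gray shaded areas are serially concatenated. Each module represents one iteration step so that we need $m$ stages for $m$ iterations. The choice of the matrix $\mathbf{M}$ determines the kind of detector that is approximated. For $\mathbf{M} = \mathbf{R}$, we approximate the decorrelator, and the coefficients $M_{u,v} = R_{u,v}$ used in (5.44) equal the elements of the correlation matrix. The MMSE filter is approximated for $\mathbf{M} = \mathbf{R} + \sigma_N^2/\sigma_A^2 \cdot \mathbf{I}_{N_u}$. Hence, the diagonal elements of $\mathbf{M}$ have to be replaced with $M_{u,u} = R_{u,u} + N_0/E_s$.

Convergence Behavior of Decorrelator Approximation
The convergence properties of this iterative algorithm depend on the eigenvalue distribution of $\mathbf{M}$. Therefore, (5.44) is described using vector notation. The matrix $\mathbf{A} = \mathrm{diag}(\mathrm{diag}(\mathbf{R}))$ is diagonal and contains the diagonal elements of the correlation matrix $\mathbf{R}$. The PIC approximating the decorrelator delivers
$$\begin{aligned} \tilde{\mathbf{a}}_{ZF}^{(0)} &= \mathbf{A}^{-1} \cdot \mathbf{r} \\ \tilde{\mathbf{a}}_{ZF}^{(1)} &= \mathbf{A}^{-1} \big[ \mathbf{r} - (\mathbf{R} - \mathbf{A})\, \tilde{\mathbf{a}}_{ZF}^{(0)} \big] = \mathbf{A}^{-1/2} \big[ \mathbf{I}_{N_u} - \mathbf{A}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{A}^{-1/2} \big] \mathbf{A}^{-1/2} \mathbf{r} \\ \tilde{\mathbf{a}}_{ZF}^{(2)} &= \mathbf{A}^{-1} \big[ \mathbf{r} - (\mathbf{R} - \mathbf{A})\, \tilde{\mathbf{a}}_{ZF}^{(1)} \big] = \mathbf{A}^{-1/2} \Big[ \mathbf{I}_{N_u} - \mathbf{A}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{A}^{-1/2} + \big( \mathbf{A}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{A}^{-1/2} \big)^2 \Big] \mathbf{A}^{-1/2} \mathbf{r} \\ &\;\;\vdots \\ \tilde{\mathbf{a}}_{ZF}^{(m)} &= \mathbf{A}^{-1/2} \sum_{\mu=0}^{m} \big[ \mathbf{A}^{-1/2} (\mathbf{A} - \mathbf{R}) \mathbf{A}^{-1/2} \big]^{\mu} \mathbf{A}^{-1/2} \mathbf{r}. \end{aligned} \tag{5.45}$$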
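The output after the $m$-th iteration in (5.45) represents the $m$-th order Taylor series approximation of $\mathbf{R}^{-1}$ (Müller and Verdu 2001). Rewriting it with the normalized correlation matrix $\bar{\mathbf{R}} = \mathbf{A}^{-1/2} \mathbf{R} \mathbf{A}^{-1/2}$ yields
$$\tilde{\mathbf{a}}_{ZF}^{(m)} = \mathbf{A}^{-1/2} \sum_{\mu=0}^{m} \big( \mathbf{I}_{N_u} - \bar{\mathbf{R}} \big)^{\mu} \mathbf{A}^{-1/2} \mathbf{r}. \tag{5.46}$$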
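This series only converges to the true inverse of $\mathbf{R}$ if the magnitudes of all eigenvalues of $\mathbf{I}_{N_u} - \bar{\mathbf{R}}$ are smaller than 1. This condition is equivalent to $\lambda_{\max}(\bar{\mathbf{R}}) < 2$. Since $\lambda_{\max}$ tends asymptotically to $(1 + \sqrt{\beta})^2$ (Müller 1998), we obtain an approximation of the maximum load below which the Jacobi algorithm will converge:
$$\beta_{\max} = \frac{N_{u,\max}}{N_s} < \big( \sqrt{2} - 1 \big)^2 \approx 0.17. \tag{5.47}$$
Obviously, the Jacobi algorithm or, equivalently, the linear PIC converges toward the true decorrelator only for very low loads. Hence, this technique is not suited for highly loaded systems.

Convergence Behavior of MMSE Approximation
According to the last section, we have to replace the diagonal matrix $\mathbf{A}$ with the matrix $\mathbf{D} = \mathbf{A} + \sigma_N^2/\sigma_A^2\, \mathbf{I}_{N_u}$ to approximate the MMSE detector. With this substitution, we obtain the following estimates after different iterations:
$$\begin{aligned} \tilde{\mathbf{a}}_{MMSE}^{(0)} &= \mathbf{D}^{-1} \cdot \mathbf{r} \\ \tilde{\mathbf{a}}_{MMSE}^{(1)} &= \mathbf{D}^{-1} \big[ \mathbf{r} - (\mathbf{R} - \mathbf{A})\, \tilde{\mathbf{a}}_{MMSE}^{(0)} \big] = \mathbf{D}^{-1/2} \big[ \mathbf{I}_{N_u} - \mathbf{D}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{D}^{-1/2} \big] \mathbf{D}^{-1/2} \mathbf{r} \\ \tilde{\mathbf{a}}_{MMSE}^{(2)} &= \mathbf{D}^{-1} \big[ \mathbf{r} - (\mathbf{R} - \mathbf{A})\, \tilde{\mathbf{a}}_{MMSE}^{(1)} \big] = \mathbf{D}^{-1/2} \Big[ \mathbf{I}_{N_u} - \mathbf{D}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{D}^{-1/2} + \big( \mathbf{D}^{-1/2} (\mathbf{R} - \mathbf{A}) \mathbf{D}^{-1/2} \big)^2 \Big] \mathbf{D}^{-1/2} \mathbf{r} \\ &\;\;\vdots \\ \tilde{\mathbf{a}}_{MMSE}^{(m)} &= \mathbf{D}^{-1/2} \sum_{\mu=0}^{m} \big[ \mathbf{D}^{-1/2} (\mathbf{A} - \mathbf{R}) \mathbf{D}^{-1/2} \big]^{\mu} \mathbf{D}^{-1/2} \mathbf{r}. \end{aligned} \tag{5.48}$$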
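The Jacobi iteration (5.44) and its limited convergence region can be illustrated with a few lines of Python. The sketch assumes real-valued quantities stored in plain lists and starts from the weighted matched filter outputs; passing $\mathbf{M} = \mathbf{R}$ approximates the decorrelator, while adding $N_0/E_s$ to the diagonal approximates the MMSE filter. The function name and both example matrices are hypothetical.

```python
def jacobi_pic(M, r, n_iter):
    """Linear PIC (Jacobi iteration) for M a = r following (5.44)."""
    n = len(r)
    a = [r[u] / M[u][u] for u in range(n)]      # initial soft estimates
    for _ in range(n_iter):
        # all users are updated in parallel from the previous estimates
        a = [
            (r[u] - sum(M[u][v] * a[v] for v in range(n) if v != u)) / M[u][u]
            for u in range(n)
        ]
    return a

def times(M, a):
    return [sum(m * a_v for m, a_v in zip(row, a)) for row in M]

# Weakly correlated users: the iteration converges to the solution of M a = r.
M_weak = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
a_weak = [1.0, -1.0, 1.0]
est_weak = jacobi_pic(M_weak, times(M_weak, a_weak), 60)

# Strongly correlated users: the largest eigenvalue of the normalized
# correlation matrix is 2.2 > 2, so the iteration diverges.
M_strong = [[1.0, 0.6, 0.6], [0.6, 1.0, 0.6], [0.6, 0.6, 1.0]]
a_strong = [1.0, 1.0, -1.0]
est_strong = jacobi_pic(M_strong, times(M_strong, a_strong), 50)
err_strong = max(abs(x - t) for x, t in zip(est_strong, a_strong))
```

In the second example the error grows from stage to stage although the matrix is positive definite, which corresponds to the violated condition $\lambda_{\max}(\bar{\mathbf{R}}) < 2$.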
To determine the convergence properties concerning the MMSE filter, (5.48) can be transformed into the form of (5.46):
$$\tilde{\mathbf{a}}_{MMSE}^{(m)} = \mathbf{D}^{-1/2} \sum_{\mu=0}^{m} \bigg[ \mathbf{I}_{N_u} - \mathbf{D}^{-1/2} \Big( \mathbf{R} + \frac{\sigma_N^2}{\sigma_A^2} \mathbf{I}_{N_u} \Big) \mathbf{D}^{-1/2} \bigg]^{\mu} \mathbf{D}^{-1/2} \mathbf{r}. \tag{5.49}$$
Now, the same argumentation as for the decorrelator can be applied and the condition for convergence becomes (Grant and Schlegel 2001)
$$\max_{u=1 \ldots N_u} \frac{A_{u,u}^2 \lambda_u + \sigma_N^2/\sigma_A^2}{A_{u,u}^2 + \sigma_N^2/\sigma_A^2} < 2 \quad\Rightarrow\quad \beta < \min_{u=1 \ldots N_u} \Bigg( \sqrt{2 + \frac{\sigma_N^2}{A_{u,u}^2\, \sigma_A^2}} - 1 \Bigg)^2.$$
The first difference compared to the decorrelator is that the maximum load $\beta$ depends on the inverse SNR $\sigma_N^2/\sigma_A^2 = N_0/E_s$. This term enlarges the convergence region slightly. However, for high SNR, $\sigma_N^2/\sigma_A^2$ becomes small and both the decorrelator and the MMSE filter are approached only for low loads. This behavior is illustrated in Figure 5.9a showing the results for the first five iterations and a load $\beta = 0.5$. Only for very low SNR (large $\sigma_N^2/\sigma_A^2$) does the iterative approximation reach the true MMSE filter. For higher SNRs, $\beta = 0.5$ is beyond the convergence region and the PIC performs even worse than the matched filter. Figure 5.9b shows the results for $E_b/N_0 = 10$ dB versus $\beta$. Again, it is confirmed that convergence can be ensured only for low load.
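The dependence of the convergence region on the SNR can be evaluated numerically. The following sketch is based on two assumptions that go beyond the text: all diagonal terms in the convergence condition are normalized to one, and the asymptotic expression $(1+\sqrt{\beta})^2$ is used for the largest eigenvalue. The function names are hypothetical.

```python
from math import sqrt

def max_load_jacobi(n0_over_es=0.0):
    """Approximate maximum load for a converging linear PIC.

    n0_over_es = 0 corresponds to the decorrelator approximation (5.47).
    """
    return (sqrt(2.0 + n0_over_es) - 1.0) ** 2

def max_load_at_snr_db(es_n0_db):
    """Same bound, parameterized by Es/N0 in dB."""
    return max_load_jacobi(10.0 ** (-es_n0_db / 10.0))

def min_n0_over_es(beta):
    """Smallest N0/Es for which a load beta lies inside the convergence region."""
    return max((1.0 + sqrt(beta)) ** 2 - 2.0, 0.0)

beta_zf = max_load_jacobi()            # decorrelator approximation
beta_0db = max_load_at_snr_db(0.0)     # MMSE approximation at Es/N0 = 0 dB
beta_20db = max_load_at_snr_db(20.0)   # MMSE approximation at Es/N0 = 20 dB
```

Under these assumptions a half-loaded system lies inside the convergence region only if $N_0/E_s$ exceeds roughly 0.91, that is, for SNRs around 0 dB or below, which fits the observation that the PIC reaches the MMSE filter only at very low SNR.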
Figure 5.9 Performance of linear PIC approximating the MMSE filter (upper dashed line: matched filter; lower dashed line: true MMSE filter). Panel a) β = 0.5: BER versus Eb/N0 in dB for one to five iterations. Panel b) Eb/N0 = 10 dB: BER versus β.

5.2.4 Linear Successive Interference Cancellation (SIC)

The poor convergence properties of the linear PIC can be substantially improved. Imagine that the interference cancellation described in (5.44) is carried out successively for different users starting with u = 1 and ending with u = $N_u$. Considering the µ-th iteration for user u, the PIC uses only the estimates $\tilde a_{v \neq u}^{(\mu-1)}$ of the previous iteration µ − 1. However, updated estimates $\tilde a_v^{(\mu)}$ of the current iteration are already available for the users v < u and can be used instead. This leads to the Gauss-Seidel iteration

$$\tilde a_u^{(\mu)} = M_{u,u}^{-1} \left[ r_u - \sum_{v=1}^{u-1} M_{u,v}\, \tilde a_v^{(\mu)} - \sum_{v=u+1}^{N_u} M_{u,v}\, \tilde a_v^{(\mu-1)} \right]. \tag{5.50}$$
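To make the difference between the parallel and the successive update concrete, the following sketch runs a Jacobi sweep (linear PIC) and an in-place Gauss-Seidel sweep (linear SIC) on a small toy system. The 3 × 3 matrix, the noise-free matched filter outputs, and the function names are illustrative assumptions; the matrix is chosen to be symmetric positive definite but with off-diagonal entries large enough that the Jacobi iteration diverges.

```python
def jacobi_sweep(M, r, a):
    """One linear PIC iteration: every user uses only estimates of the previous sweep."""
    n = len(r)
    return [
        (r[u] - sum(M[u][v] * a[v] for v in range(n) if v != u)) / M[u][u]
        for u in range(n)
    ]


def gauss_seidel_sweep(M, r, a):
    """One linear SIC iteration: estimates are overwritten in place, user by user."""
    n = len(r)
    for u in range(n):
        # a[v] already holds the updated value for v < u and the old one for v > u.
        interference = sum(M[u][v] * a[v] for v in range(n) if v != u)
        a[u] = (r[u] - interference) / M[u][u]
    return a


# Toy system (assumption): strongly correlated users, no noise.
M = [[1.0, 0.6, 0.6],
     [0.6, 1.0, 0.6],
     [0.6, 0.6, 1.0]]
a_true = [1.0, -1.0, 1.0]
r = [sum(M[u][v] * a_true[v] for v in range(3)) for u in range(3)]

a_pic = [0.0, 0.0, 0.0]
a_sic = [0.0, 0.0, 0.0]
for _ in range(30):
    a_pic = jacobi_sweep(M, r, a_pic)
    a_sic = gauss_seidel_sweep(M, r, a_sic)

err_pic = max(abs(x - t) for x, t in zip(a_pic, a_true))
err_sic = max(abs(x - t) for x, t in zip(a_sic, a_true))

if __name__ == "__main__":
    print("Jacobi error after 30 sweeps:      ", err_pic)
    print("Gauss-Seidel error after 30 sweeps:", err_sic)
```

In this toy case the Jacobi error grows from sweep to sweep while the Gauss-Seidel error shrinks, and the successive variant needs only one estimate vector because old values are overwritten.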
Besides improved convergence properties, another advantage is the in-place implementation, that is, updated estimates can directly overwrite old values because the latter are not used any longer, thereby saving valuable memory. The analysis of the convergence behavior is not as easy as for the PIC. In Golub and van Loan (1996) it is shown that the algorithm always converges for Hermitian positive definite matrices M. Fortunately, in the context of our CDMA system M represents the correlation matrix $\mathbf{R}$ or $\mathbf{R} + (\sigma_N^2/\sigma_A^2)\,\mathbf{I}_{N_u}$. Hence, M can be assumed to be Hermitian and positive definite so that the Gauss-Seidel algorithm always converges.

Figure 5.10a confirms the promised convergence properties. Considering a half-loaded system, five iterations suffice to approach the true MMSE filter. At low SNRs, the performance of the MMSE filter is reached with even fewer iterations. Figure 5.10b shows that with increasing load more iterations are needed. For loads above β = 1, the first iteration can perform even worse than the matched filter. However, successive iterations substantially improve the performance.

Comparing the computational costs of a direct matrix inversion with the iterative approximations in terms of number of multiplications, we see from (5.50) that $N_u$ multiplications per iteration and user are needed. For m iterations, this leads to $m N_u^2$ multiplications
compared to a complexity of $O(N_u^3)$ for the direct matrix inversion. Hence, as long as the number of iterations is smaller than $N_u$, we save computational costs.

Figure 5.10 Performance of linear SIC approximating the MMSE filter (upper dashed line: matched filter, lower dashed line: true MMSE filter). Panel a) β = 0.5: BER versus Eb/N0 in dB for one to five iterations. Panel b) Eb/N0 = 10 dB: BER versus β.

Besides PIC and SIC strategies, there exist further iterative approaches like the conjugate gradient method and a general polynomial series expansion of the inverse (Müller 1998). These approaches are not pursued here.

None of the linear techniques described so far reaches the SUB, that is, interference remains in the system after filtering. The information theoretic analysis in Section 4.3 showed that the optimum detector performs much better than linear techniques. Therefore, we have to look for nonlinear approaches that come closer to the optimum solution. These techniques exploit the finite signal alphabet to improve the MUD.
5.3 Nonlinear Iterative Multiuser Detection

A major drawback of the previously introduced linear detectors is that they do not exploit the discrete nature of the transmit signals. This shortcoming can be overcome by introducing nonlinear devices into the multistage structure. This means that the signals $\tilde a_{v \neq u}^{(\mu)}$ in (5.44) or (5.50) are passed through a suitable nonlinear device before they are used for interference cancellation. For simplicity, we restrict the analysis to a normalized BPSK, that is, we transmit x = ±1. An extension to quaternary phase shift keying (QPSK) that treats real and imaginary parts separately is straightforward, while schemes with more levels need more sophisticated methods.
5.3.1 Nonlinear Devices

The simplest nonlinearity is naturally a hard decision, that is, determining the sign of a signal

$$Q_{\mathrm{HD}}(y) = \mathrm{sgn}(y). \tag{5.51}$$

If the tentative decision is correct, the interference can be cancelled perfectly. However, if the decision is wrong, which may be very likely in the early stages of the detection process, especially for large β, interference is not reduced but in fact increased and the situation becomes worse. Therefore, more sophisticated functions taking into account the reliability of the signals should be preferred. A selection analysed by Kühn et al. (2002) is depicted in Figure 5.11.

Figure 5.11 Examples for nonlinear devices (panels: HD, clipper, tanh, NL 1, NL 2)

To keep the influence of wrong decisions as small as possible, it is advantageous not to decide on unreliable small samples but to keep them small. Obviously, interference is generally not perfectly cancelled by these approaches, but the error made by wrong decisions is remarkably reduced. The simplest form that follows this strategy is the clipper or limiter. It has a linear shape for |y| ≤ 1 and outputs ±1 for larger inputs |y| > 1

$$Q_{\mathrm{clip}}(y) = \begin{cases} -1 & \text{for } y < -1 \\ y & \text{for } |y| \le 1 \\ +1 & \text{for } y > +1. \end{cases} \tag{5.52}$$

Hence, the clipper exploits the fact that the transmitted signals cannot be larger than 1.³ Interference is totally cancelled if the signal has the correct sign and a magnitude larger than 1. For small values, the reliability is low and the interference can only be partly reduced. In case of a wrong sign, the degradation is not as large as for the hard decision.

³ For notational simplicity, we assume the normalization to Es/Ts = 1.

A smooth version of the clipper is obtained with the tanh-function avoiding sharp edges. We know from Section 3.4 on page 110 that the expectation of a bit is obtained from its log-likelihood ratio L by tanh(L/2). However, the LLR can be determined only if the signal to interference plus noise ratio (SINR) is perfectly known. This represents a big difficulty because we do not know the exact interference level in each iteration. Therefore, we introduce a parameter α according to

$$Q_{\tanh}(y) = \tanh(\alpha y) \tag{5.53}$$

that depends on the SNR as well as the effective interference and has to be optimized with respect to a minimum error rate. Figure 5.12 compares the tanh-function for different α with the hard decision and the clipper. For small α, the tanh is very smooth and even large inputs are treated as unreliable. On the contrary, α = 1 comes close to the clipper in the nearly linear area around the origin and large α > 1 approach the hard decision.

Next, two further nonlinear functions are proposed. The first one (NL 1) has a linear shape around the origin and hops to ±1 for values larger than a certain threshold α

$$Q_{\mathrm{NL1}}(y) = \begin{cases} -1 & \text{for } y < -\alpha \\ y & \text{for } |y| \le \alpha \\ +1 & \text{for } y > \alpha. \end{cases} \tag{5.54}$$

The difference compared to the clipper is that this nonlinearity starts to totally remove the interference already for magnitudes smaller than 1. The parameter α has to be optimized according to the load and the SNR. The second function (NL 2) avoids any cancellation for unreliable
values and allows interference reduction only above a threshold α

$$Q_{\mathrm{NL2}}(y) = \begin{cases} -1 & \text{for } y < -1 \\ y & \text{for } -1 \le y \le -\alpha \\ 0 & \text{for } |y| < \alpha \\ y & \text{for } \alpha \le y \le +1 \\ +1 & \text{for } y > +1. \end{cases} \tag{5.55}$$

Obviously, it reduces to a simple clipper for α = 0.

Figure 5.12 Comparison of tanh for different α with hard decision and clipper (q(x) versus x for α = 0.5, 1, 2, 4, HD, and clipper)

Finally, we will look at coded CDMA systems. If the computational costs do not represent a restriction, the channel decoder can be used as a nonlinear device (Hagenauer 1996a). Since it exploits the redundancy of the code, it can increase the reliability of the estimates remarkably. Again, we have to distinguish between hard-output and soft-output decoding. For convolutional codes presented in Chapter 3, hard-output decoding can be performed by the Viterbi algorithm while soft-output decoding can be carried out by the BCJR or Max-Log-MAP algorithms.
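The five memoryless devices (5.51) to (5.55) can be written down directly. The sketch below assumes real-valued inputs and normalized BPSK; the function names are illustrative, and mapping an input of exactly zero to +1 in the hard decision is an arbitrary tie-break that the text does not specify.

```python
import math


def q_hd(y: float) -> float:
    """Hard decision (5.51); zero is mapped to +1 as an arbitrary tie-break."""
    return 1.0 if y >= 0.0 else -1.0


def q_clip(y: float) -> float:
    """Clipper (5.52): linear for |y| <= 1, saturating at +/-1."""
    return max(-1.0, min(1.0, y))


def q_tanh(y: float, alpha: float = 1.0) -> float:
    """Scaled tanh (5.53); alpha has to be tuned to load and SNR."""
    return math.tanh(alpha * y)


def q_nl1(y: float, alpha: float) -> float:
    """NL 1 (5.54): linear for |y| <= alpha, jumps to +/-1 beyond the threshold."""
    if y > alpha:
        return 1.0
    if y < -alpha:
        return -1.0
    return y


def q_nl2(y: float, alpha: float) -> float:
    """NL 2 (5.55): no cancellation for |y| < alpha, clipper behavior otherwise."""
    if abs(y) < alpha:
        return 0.0
    return q_clip(y)


if __name__ == "__main__":
    for y in (-1.5, -0.5, -0.2, 0.2, 0.5, 1.5):
        print(y, q_hd(y), q_clip(y), round(q_tanh(y, 2.0), 3),
              q_nl1(y, 0.4), q_nl2(y, 0.4))
```

Two limiting cases follow directly from the definitions: NL 2 with α = 0 and NL 1 with α = 1 both coincide with the clipper.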
5.3.2 Uncoded Nonlinear Interference Cancellation

Uncoded Parallel Interference Cancellation

First, we have to optimize the parameter α for the nonlinear functions NL 1, NL 2, and tanh. We start our analysis with the PIC whose structure for the linear case in Figure 5.8 has to be extended. Figure 5.13 shows the µ-th stage of the resulting multistage receiver. Prior to the interference cancellation, the interference-reduced signals $\tilde r_v^{(\mu-1)}$ of the previous iteration are scaled with coefficients $M_{v,v}^{-1}$. The application of the nonlinear function now yields estimates for all signals

$$\tilde a_v^{(\mu-1)} = Q\left( M_{v,v}^{-1} \cdot \tilde r_v^{(\mu-1)} \right). \tag{5.56}$$
Figure 5.13 µ-th stage of a multistage detector for nonlinear parallel interference cancellation

For user u, all estimates $\tilde a_{v \neq u}^{(\mu-1)}$ are first weighted with the correlation coefficients $M_{u,v \neq u}$, then summed up and finally subtracted from the matched filter output $r_u$

$$\tilde r_u^{(\mu)} = r_u - \sum_{v \neq u} M_{u,v} \cdot \tilde a_v^{(\mu-1)}. \tag{5.57}$$
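One stage of the nonlinear PIC, that is, (5.56) followed by (5.57), can be sketched as follows. The two-user correlation matrix, the noise-free matched filter outputs, and the choice of the clipper as nonlinearity are assumptions made only for illustration.

```python
def clip(y: float) -> float:
    """Clipper nonlinearity used as Q(.) in this sketch."""
    return max(-1.0, min(1.0, y))


def pic_stage(M, r, r_tilde_prev, q=clip):
    """One nonlinear PIC stage.

    Returns the interference-reduced signals of the current stage together
    with the tentative estimates that were used for the cancellation.
    """
    n = len(r)
    # (5.56): scale with 1/M[v][v] and apply the nonlinearity.
    a_prev = [q(r_tilde_prev[v] / M[v][v]) for v in range(n)]
    # (5.57): subtract the weighted estimates of all other users.
    r_tilde = [
        r[u] - sum(M[u][v] * a_prev[v] for v in range(n) if v != u)
        for u in range(n)
    ]
    return r_tilde, a_prev


# Toy example (assumption): two BPSK users, no noise.
M = [[1.0, 0.5],
     [0.5, 1.0]]
a_true = [1.0, -1.0]
r = [sum(M[u][v] * a_true[v] for v in range(2)) for u in range(2)]

r_tilde = list(r)  # the first stage starts from the matched filter outputs
for _ in range(10):
    r_tilde, _ = pic_stage(M, r, r_tilde)

a_hat = [clip(r_tilde[u] / M[u][u]) for u in range(2)]

if __name__ == "__main__":
    print("estimates after 10 stages:", a_hat)
```

In this noise-free toy case the tentative estimates move toward ±1 from stage to stage, so the residual interference in the stage outputs decreases.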
After this cancellation step is performed for all users, the procedure is repeated. If the iterative scheme converges and the global optimum is reached, the interference is cancelled more and more until the single-user performance is obtained. However, the iterative algorithm may get stuck in a local optimum.

Figure 5.14 shows the performance of the nonlinearities NL 1 and NL 2 versus the design parameter α. Looking at NL 2, we observe that $\alpha_{\mathrm{opt}}^{\mathrm{NL\,2}} = 0$ is always the best choice regardless of the number of iterations. Hence, NL 2 reduces to a simple clipper. With regard to NL 1, the optimum α depends on the iteration. In the first stage, the minimum BER is also delivered by a clipper obtained with $\alpha_{\mathrm{opt}}^{\mathrm{NL\,1}} = 1$. For the fifth stage, $0.3 \le \alpha_{\mathrm{opt}}^{\mathrm{NL\,1}} \le 0.4$ is the best choice. Moreover, the comparison of NL 2 with NL 1 shows that NL 1 is at least as good as NL 2 and generally outperforms NL 2 (Zha and Blostein 2003).

The same analysis has been performed for the tanh-function. From Figure 5.15 we recognize that 1 ≤ α ≤ 2 is an appropriate choice for a large variety of loads. With growing β, the optimum α becomes smaller and approaches 1 for β = 1.25. This indicates that the SINR is small for large β. However, the differences are rather small in this interval. Only very low values of α result in a severe degradation because no interference is cancelled for α = 0, leading to the matched filter performance. If α is chosen too large, the tanh function saturates for most inputs and the error rate performance equals that of a hard decision.

Figure 5.16a now compares all proposed nonlinearities for a fully loaded OFDM-CDMA system with β = 1 and five iterations. The tanh-function with optimized α shows the best performance among all schemes. NL 1 and clipper come closest to the tanh. The hard decision already loses 2 dB compared to the tanh. Although the nonlinearities consider the finite nature of the signal alphabet and all nonlinearities clearly outperform the matched
filter, we observe an error floor, that is, the SUB cannot be reached. Figure 5.16b illustrates this loss versus β. At a load of β = 1, the error rate is increased by one decade compared to the single-user case; for β = 1.5, only the tanh can achieve a slight improvement compared to a simple matched filter. Therefore, we can conclude that nonlinear devices taking into account the finite nature of the signal alphabet improve the convergence behavior of PIC. The SUB is approximately reached up to loads of β = 0.5. For higher loads, performance degrades dramatically until no benefit over the matched filter can be observed.

Figure 5.14 PIC optimization for NL 1 and NL 2 in an uncoded OFDM-CDMA system with a 4-path Rayleigh fading channel and Eb/N0 = 8 dB (solid lines: NL 1, dashed lines: NL 2). Panels a) first iteration and b) fifth iteration: BER versus α for β = 0.5, 0.75, 1, 1.25, with the SUB as reference.

Figure 5.15 PIC optimization for tanh in an uncoded OFDM-CDMA system with a 4-path Rayleigh fading channel and Eb/N0 = 8 dB. Panels a) first iteration and b) fifth iteration: BER versus α for β = 0.5, 0.75, 1, 1.25, with the SUB as reference.
Figure 5.16 PIC performance comparison of different nonlinearities with optimized α in an uncoded OFDM-CDMA system with 4-path Rayleigh fading channel. Panel a) β = 1: BER versus Eb/N0 in dB. Panel b) Eb/N0 = 8 dB: BER versus β. Curves: MF, HD, NL 1, clip, tanh.

Uncoded Successive Interference Cancellation

From linear interference cancellation techniques, we already know that SIC according to the Gauss-Seidel algorithm converges much better than the PIC. Consequently, we now analyze the nonlinear SIC. Figure 5.17 illustrates the influence of the parameter α for NL 1 and NL 2 on the SIC performance. As already observed for PIC, $\alpha_{\mathrm{opt}}^{\mathrm{NL\,2}} = 0$ is the best choice regardless of the load and the considered iteration, and reduces nonlinearity
NL 2 to a simple clipper. However, the influence of α on the error rate performance is much larger than for PIC. For $\alpha^{\mathrm{NL\,2}} \to 1$, which leads to a large interval of magnitudes where no interference is cancelled, the error rate tends to 0.5 for all iterations while the loss was quite moderate for the PIC. With regard to NL 1, α has nearly no influence at the first iteration. In subsequent stages, for example, the fifth iteration, the influence increases with growing load β and the lowest error rate is obtained for $\alpha_{\mathrm{opt}}^{\mathrm{NL\,1}} = 0.4$. Again, NL 1 with optimum α shows a better performance than NL 2.

Figure 5.17 SIC optimization for NL 1 and NL 2 in an uncoded OFDM-CDMA system with a 4-path Rayleigh fading channel and Eb/N0 = 8 dB (solid lines: NL 1, dashed lines: NL 2). Panels a) first iteration and b) fifth iteration: BER versus α for β = 0.5, 0.75, 1, 1.25, with the SUB as reference.

Figure 5.18 depicts the optimization for the tanh-function. Astonishingly, the results in the first iteration differ from those of the PIC. The lowest error probability is obtained for $\alpha_{\mathrm{opt}}^{\tanh} = 2$ regardless of the load β. Also, in the subsequent stages this choice of α represents a very good solution and it coincides with the results of the PIC.

Figure 5.18 SIC optimization for tanh in an uncoded OFDM-CDMA system with a 4-path Rayleigh fading channel and Eb/N0 = 8 dB. Panels a) first iteration and b) fifth iteration: BER versus α for β = 0.5, 0.75, 1, 1.25, with the SUB as reference.

As can be seen from Figure 5.19, NL 1 outperforms all other schemes and represents the best nonlinearity under consideration. For β = 1, the SUB is reached within a gap of 0.5 dB for all SNRs. The clipper (NL 2 with α = 0) and the tanh come closest to NL 1 while hard decisions lose remarkably. From Figure 5.19b we see that, compared to the SUB, NL 1 and the tanh are able to keep the loss quite low up to a load of β = 1.5. Even for this high load, the gain over the matched filter is significant. Hence, we can conclude that the considered nonlinearities with optimum design parameters improve the performance for both PIC and SIC. However, SIC still shows a better convergence behavior and comes close to the SUB even for high loads.

Performance of Nonlinear SIC for QPSK Modulation

If we change from BPSK to QPSK, the effective interference is doubled (cf. Chapter 4).
Only slight changes are necessary to adapt the presented algorithms to QPSK. All nonlinearities have to be applied separately to the real and imaginary parts of the signals. Because
of the doubled interference, the results we obtain for β = 0.75 and QPSK are nearly the same as for β = 1.5 and BPSK.

Figure 5.19 Performance comparison of different nonlinearities with optimized α for SIC in an uncoded OFDM-CDMA system with 4-path Rayleigh fading channel (five iterations). Panel a) β = 1: BER versus Eb/N0 in dB. Panel b) Eb/N0 = 8 dB: BER versus β. Curves: MF, HD, NL 1, clip, tanh.

Figure 5.20 SIC optimization for NL 1 and NL 2 in an uncoded OFDM-CDMA system with β = 1, QPSK, and a 4-path Rayleigh fading channel (solid lines: NL 1, dashed lines: NL 2). Panels a) Eb/N0 = 8 dB and b) Eb/N0 = 20 dB: BER versus α for the first, fifth, and 10th iteration.

Figure 5.20 analyzes the influence of the parameter α of the nonlinearities for an OFDM-CDMA system with β = 1 and a 4-path Rayleigh fading channel. For medium SNRs like Eb/N0 = 8 dB, the results coincide with those already obtained for BPSK. Nearly no influence can be observed in the first stage. For further iterations and larger
SNR, for example, 20 dB, NL 1 requires a larger α to perform optimally. Because of the high interference, the estimates are less reliable and the step toward ±1 occurs at higher amplitudes. With reference to the tanh, $\alpha_{\mathrm{opt}}^{\tanh} = 2$ still represents a very good choice.

Figure 5.21 shows the BER performance for 5 and 10 iterations and optimized α for NL 1 and tanh. While the hard decision does not gain from additional iterations, the nonlinear function NL 1, the tanh-function, and the clipper improve the error rate remarkably. The tanh with optimized α is still the best choice. However, for this high load there remains a large gap to the SUB (bold line) that roughly amounts to 4 dB at an error rate of $2 \cdot 10^{-3}$.

Figure 5.21 SIC performance for nonlinearities with optimized α in an uncoded OFDM-CDMA system with QPSK and 4-path Rayleigh fading channel (β = 1). Panels a) fifth iteration and b) 10th iteration: BER versus Eb/N0 in dB. Curves: MF, HD, NL 1, clip, tanh, SUB.
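The adaptation to QPSK described in this subsection, applying the nonlinearity separately to real and imaginary parts inside a successive cancellation sweep, can be sketched as follows. The 2 × 2 Hermitian correlation matrix, the unnormalized QPSK symbols with components ±1, and the noise-free matched filter outputs are assumptions made for illustration only.

```python
def clip(y: float) -> float:
    """Real-valued clipper."""
    return max(-1.0, min(1.0, y))


def componentwise(q, z: complex) -> complex:
    """Apply a real nonlinearity separately to real and imaginary part."""
    return complex(q(z.real), q(z.imag))


def nonlinear_sic_sweep(M, r, a, q=clip):
    """One nonlinear SIC sweep; estimates are updated in place, user by user."""
    n = len(r)
    for u in range(n):
        interference = sum(M[u][v] * a[v] for v in range(n) if v != u)
        a[u] = componentwise(q, (r[u] - interference) / M[u][u])
    return a


# Toy example (assumption): two QPSK users, Hermitian correlation matrix, no noise.
M = [[1.0 + 0.0j, 0.3 + 0.2j],
     [0.3 - 0.2j, 1.0 + 0.0j]]
a_true = [1.0 + 1.0j, -1.0 + 1.0j]
r = [sum(M[u][v] * a_true[v] for v in range(2)) for u in range(2)]

a_hat = [0.0 + 0.0j, 0.0 + 0.0j]
for _ in range(20):
    nonlinear_sic_sweep(M, r, a_hat)

if __name__ == "__main__":
    print("estimates after 20 sweeps:", a_hat)
```

Any of the real-valued devices of Section 5.3.1 can be passed as `q`; the sweep itself is unchanged compared to the BPSK case apart from the complex arithmetic.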
5.3.3 Nonlinear Coded Interference Cancellation

Resuming the way from linear multistage receivers to nonlinear interference cancellation schemes, it is straightforward to incorporate the channel decoder into the iterative structures for coded CDMA systems. Again, we restrict ourselves to BPSK and QPSK schemes for notational simplicity. The structure of the transmitter is already known from Figure 5.1. The corresponding receiver for PIC is depicted in Figure 5.22. After the matched filter bank, the obtained signals $r_u$, $1 \le u \le N_u$, are de-interleaved and FEC decoded. The decoders deliver either soft-outputs $L(\hat b_u)$ of the code bits like log-likelihood ratios or hard estimates $\hat b_u$. Soft-outputs can be generated by the BCJR or the Max-Log-MAP decoder while hard-outputs are obtained by the Viterbi algorithm.

Next, the outputs are interleaved and processed by a nonlinear function. This is necessary for soft-outputs because log-likelihoods are generally not limited in magnitude while the true code bits are either +1 or −1. From Chapter 3, we know that the expectation of a bit can be calculated with its log-likelihood ratio by tanh(L/2) (see (3.36) on page 110). This is exactly the reason for using the tanh. Finally, the interfering signals are weighted with
the correlation coefficients $M_{u,v}$, summed up, and subtracted from the matched filter output $r_u$. The obtained estimate represents the input of the next stage.

Figure 5.22 Single stage of a nonlinear PIC receiver in coded CDMA systems

For the following simulation results, an OFDM-CDMA system with a half-rate convolutional code of constraint length Lc = 7 is considered. As in a mobile radio channel, a 4-path Rayleigh fading channel with uniform power delay profile is employed. Moreover, BPSK and QPSK are alternatively chosen and on average each user has the same SNR.

Figure 5.23a shows the performance of the coded PIC. After four PIC iterations, the SUB is obtained even for β = 1.5, which is equivalent to a spectral efficiency of η = Rc · β = 0.75. We see that decoding helps improve the convergence of iterative interference cancellation schemes. The reliability of the estimated interference is enhanced, leading to a better cancellation step. At low SNR, a gap to the SUB occurs that grows for increasing load. For β = 2, the PIC scheme does not converge anymore.

Figure 5.23b compares hard- and soft-decision outputs at the decoder. The upper bold solid line denotes the matched filter performance and the lower bold solid curve represents the SUB. Naturally, the performances for hard- and soft-outputs after the first decoding (single-user matched filter (SUMF), upper solid line) are the same. For subsequent iterations, the soft-output always outperforms the hard-decision output. However, the differences are rather small and amount to at most 0.5 dB. For this example, both hard- and soft-output decoding reach the SUB at error rates below $10^{-4}$. Nevertheless, for extremely high loads, convergence may be maintained with soft-output decoding while hard-decision decoding will fail.

Next, PIC and SIC are compared in Figure 5.24a. The upper bold solid line represents the matched filter performance without interference cancellation, and the lower bold solid curve represents the SUB.
After three PIC iterations, both SIC and PIC reach the SUB. However, SIC converges faster and in this example needs one iteration less than the PIC scheme. Hence, the benefits of SIC are preserved when coding is applied. In Figure 5.24b, the load is increased to β = 2, that is, the spectral efficiency η = 1 bit/s/Hz of such a system is twice as high as for half-rate coded TDMA or FDMA systems (Kühn 2001a,c). We see that the SIC scheme performs as well for β = 2 as the PIC approach performs for β = 1.5. For a doubly loaded system, the PIC scheme does not converge anymore.

Figure 5.23 PIC performance of coded OFDM-CDMA system with 4-path Rayleigh fading channel, convolutional code with Rc = 1/2, and Lc = 7 (bold line: single-user bound). Panel a) four iterations, comparing various loads (β = 0.5, 1, 1.5, 2). Panel b) β = 1.5, comparing hard-output (×) and soft-output decoding for iterations 1 to 4. Both panels: BER versus Eb/N0 in dB.

Figure 5.24 Performance of PIC (×) and SIC for coded OFDM-CDMA system with 4-path Rayleigh fading channel (bold line: single-user bound). Panel a) β = 1.5, comparing convergence of PIC and SIC for iterations 1 to 3. Panel b) four iterations, comparing PIC and SIC for β = 1.5 and β = 2. Both panels: BER versus Eb/N0 in dB.

Sorted Nonlinear Successive Interference Cancellation

There exists a major difference between PIC and SIC. Owing to the problem of error propagation, the order of detection is crucial for SIC. This dependency is illustrated in
Figure 5.25 comparing sorted and unsorted SIC. Sorting is implemented by calculating the average magnitude of the decoder output and starting with the strongest user. This is probably not the best strategy, but it can be easily implemented. Obviously, sorting leads to a faster convergence. Especially at low SNR, the gap to the SUB can be decreased. Therefore, sorting is always applied for SIC in subsequent parts.

Figure 5.25 SIC performance (solid: unsorted, dashed: sorted) for coded OFDM-CDMA system with 4-path Rayleigh fading channel and β = 2 (bold line: single-user bound). BER versus Eb/N0 in dB for iterations 1 to 4.

Figure 5.26 now compares the performance of SIC for BPSK and QPSK modulations. We know from Section 4.1 that the use of QPSK in the uplink doubles the effective interference. The spectral efficiency is also doubled because we transmit twice as many bits
per symbol as for BPSK, that is, $\eta_{\mathrm{QPSK}} = 2\,\eta_{\mathrm{BPSK}}$ holds. After four PIC iterations, we observe that the SUB has been reached for QPSK and β = 1. This is equivalent to β = 2 for BPSK. No convergence is obtained for β = 2 and QPSK because the initial SINR is too low for achieving reliable estimates from the decoders.

Figure 5.26 SIC performance for OFDM-CDMA system with a half-rate convolutional code (Lc = 7) and 4-path Rayleigh fading channel (bold line: single-user bound). BER versus Eb/N0 in dB for BPSK and QPSK with β = 1 and β = 2.

Influence of Convolutional Codes

Finally, the influence of different convolutional codes is analyzed. Figure 5.27 compares two half-rate convolutional codes: the already used Lc = 7 code and a weaker Lc = 3 code. Looking at Figure 5.27a, we see that three or four iterations suffice to reach the SUB for BPSK. With regard to QPSK, even 10 iterations cannot close the gap of approximately 4 dB. On the contrary, convergence starts earlier with the weak Lc = 3 code as shown in Figure 5.27b. Although the Lc = 3 code has a worse SUB, it performs better than the strong convolutional code and reaches its SUB even for β = 2 and QPSK. Although the difference between the two SUBs amounts to 2 dB in favor of the stronger convolutional code, the Lc = 3 code now gains 2 dB compared to the Lc = 7 code.

The explanation for this behavior can be found by observing the SUB curves for both codes in Figure 5.27. We see that the Lc = 3 code has a slightly better performance at low SNR. For larger SNR, the curves intersect and the Lc = 7 code becomes superior. However, the first interference cancellation stage suffers from noise as well as from severe interference that was not yet cancelled. For increasing loads, the SINR at the decoder inputs becomes smaller and smaller until it reaches the intersection of both curves. For higher loads, the weak code now performs better. In our example, parameters were chosen such that the strong code cannot achieve convergence while the Lc = 3 code still reaches its SUB. Therefore, we can conclude that strong error control codes are not
always the best choice and that the coding scheme has to be carefully adapted to the detector.⁴

Figure 5.27 Sorted SIC performance for half-rate coded OFDM-CDMA system with β = 2, a 4-path Rayleigh fading channel and different convolutional codes (bold line: single-user bound). Panels a) Lc = 7 and b) Lc = 3: BER versus Eb/N0 in dB for BPSK and QPSK.
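Three small building blocks used throughout this section can be sketched in a few lines: the soft bit tanh(L/2) computed from a decoder LLR, the detection order obtained by sorting the users according to the average magnitude of their decoder outputs, and the spectral efficiency η = Rc · β for BPSK (doubled for QPSK). The function names and the example LLR values are illustrative assumptions, not taken from the simulations.

```python
import math


def soft_bit(llr: float) -> float:
    """Expectation of a +/-1 bit given its log-likelihood ratio."""
    return math.tanh(llr / 2.0)


def detection_order(llrs_per_user):
    """User indices sorted by descending average |LLR| (strongest user first)."""
    avg = [sum(abs(l) for l in llrs) / len(llrs) for llrs in llrs_per_user]
    return sorted(range(len(avg)), key=lambda u: avg[u], reverse=True)


def spectral_efficiency(code_rate: float, beta: float, bits_per_symbol: int = 1) -> float:
    """eta = Rc * beta for BPSK (1 bit per symbol), doubled for QPSK (2 bits)."""
    return code_rate * beta * bits_per_symbol


if __name__ == "__main__":
    # Hypothetical decoder outputs of three users (three code bits each).
    llrs = [[0.5, -0.2, 0.1], [4.0, -3.5, 5.0], [1.0, -2.0, 1.5]]
    print("detection order:", detection_order(llrs))
    print("soft bits of user 1:", [round(soft_bit(l), 3) for l in llrs[1]])
    print("eta for Rc = 1/2, beta = 1.5, BPSK:", spectral_efficiency(0.5, 1.5))
```

As noted in the text, sorting by average magnitude is a simple heuristic rather than an optimal ordering.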
5.4 Combining Linear MUD and Nonlinear SIC

5.4.1 BLAST-like Detection

As we saw in the previous results, the first detection stage suffers severely from multiuser interference. Hence, its error rate will dominate the performance of subsequent detection steps because of error propagation. The overall performance and the convergence speed can be improved by a linear suppression of the interference prior to the first detection stage. The Bell Labs Layered Space-Time (BLAST) detection (Foschini 1996; Foschini and Gans 1998; Golden et al. 1998; Wolniansky et al. 1998) pursues this approach for multiple antenna systems. It can be directly applied to CDMA systems since both systems have similar structures and the same mathematical description y = Sa + n.

In a first step, a linear filter $\mathbf{w}_1$ is applied suppressing the interference for user 1. The filter can be designed according to the ZF or MMSE criterion, both discussed in Section 5.2, that is, $\mathbf{w}_1$ denotes the first column of $\mathbf{W}_1 = \mathbf{W}_{\mathrm{ZF}}$ or $\mathbf{W}_1 = \mathbf{W}_{\mathrm{MMSE}}$, yielding

$$\tilde a_1 = \mathbf{w}_1^H \cdot \mathbf{y}. \tag{5.58}$$

Next, the symbol $\hat a_1 = Q(\tilde a_1)$ can be decided with improved reliability because less interference disturbs this decision. Instead of performing a hard decision, other nonlinear functions as analyzed in Section 5.3 can be used. After detecting $\hat a_1$, its influence on the remaining signals can be removed by subtracting its contribution $\mathbf{s}_1 \hat a_1$ from the received vector y (interference cancellation)

$$\tilde{\mathbf{y}}_2 = \mathbf{y} - \mathbf{s}_1 \hat a_1, \tag{5.59}$$

where the vector $\mathbf{s}_1$ represents the first column of the system matrix S (see Figure 4.7 or (4.68)). The residual signal $\tilde{\mathbf{y}}_2$ is then processed by a second filter $\mathbf{w}_2$. It is obtained by removing $\mathbf{s}_1$ from S and calculating the ZF or MMSE filter $\mathbf{W}_2$ for the reduced system matrix $\tilde{\mathbf{S}}_2 = [\mathbf{s}_2 \cdots \mathbf{s}_{N_u}]$. The first column of $\mathbf{W}_2$ denotes the filter $\mathbf{w}_2$ that is used for suppressing the interference of the second user by $\tilde a_2 = \mathbf{w}_2^H \cdot \tilde{\mathbf{y}}_2$. This procedure is repeated until all users have been detected.

To determine the linear filters in the different detection steps, the system matrices describing the reduced systems have to be inverted. This causes high implementation costs. However, a much more convenient way exists that avoids multiple matrix inversions. This approach leads to identical results and is presented in the next section.
⁴ It has to be mentioned that this conclusion holds for uniform power distribution among the users. If different power levels occur (near-far effects), strong codes can have a better performance than weak codes (Caire et al. 2004).

5.4.2 QL Decomposition for Zero-Forcing Solution

In this subsection, an alternative implementation of the BLAST detector is introduced. It saves computational complexity compared to the original detector introduced in the last section. As for the linear multiuser detectors, we can distinguish the ZF and the MMSE solution. We start with the derivation of the QL decomposition for the linear ZF solution.⁵ Going back to the model y = Sa + n, the system matrix S can be decomposed into an $N_s \times N_u$ matrix Q with orthogonal columns $\mathbf{q}_u$ of unit length and an $N_u \times N_u$ lower triangular matrix L (Golub and van Loan 1996)

$$\mathbf{y} = \mathbf{Q}\mathbf{L}\mathbf{a} + \mathbf{n}. \tag{5.60}$$

Because $\mathbf{Q}^H \mathbf{Q} = \mathbf{I}_{N_u}$, the multiplication of y with $\mathbf{Q}^H$ yields

$$\tilde{\mathbf{y}} = \mathbf{Q}^H \cdot \mathbf{y} = \mathbf{L}\mathbf{a} + \mathbf{Q}^H \mathbf{n} =
\begin{bmatrix}
L_{1,1} & & & 0 \\
L_{2,1} & L_{2,2} & & \\
\vdots & & \ddots & \\
L_{N_u,1} & L_{N_u,2} & \cdots & L_{N_u,N_u}
\end{bmatrix} \cdot \mathbf{a} + \tilde{\mathbf{n}} \tag{5.61}$$

with $\tilde{\mathbf{n}}$ still representing white Gaussian noise because the columns of Q are orthonormal. To clarify the effect of multiplying with $\mathbf{Q}^H$, we consider the matched filter outputs $\mathbf{r} = \mathbf{S}^H \mathbf{y} = \mathbf{R}\mathbf{a} + \mathbf{S}^H \mathbf{n}$. Performing a Cholesky decomposition (Golub and van Loan 1996) of $\mathbf{R} = \mathbf{L}^H \mathbf{L}$ with the lower triangular matrix L results in

$$\mathbf{r} = \mathbf{L}^H \mathbf{L}\mathbf{a} + \mathbf{S}^H \mathbf{n} \quad\Rightarrow\quad \tilde{\mathbf{r}} = \mathbf{L}\mathbf{a} + \mathbf{L}^{-H} \mathbf{S}^H \mathbf{n}. \tag{5.62}$$

The comparison of (5.61) with (5.62) illustrates that the multiplication of y with $\mathbf{Q}^H$ can be split into two steps. First, a matched filter is applied, providing the colored noise vector $\mathbf{S}^H \mathbf{n}$ with the covariance matrix $\sigma_N^2 \mathbf{R} = \sigma_N^2 \mathbf{L}^H \mathbf{L}$. Therefore, the second step represents a multiplication with $\mathbf{L}^{-H}$, which can be interpreted as whitening. Because of the triangular structure of L in (5.61), the received vector $\tilde{\mathbf{y}}$ has been partly freed of interference, for example, $\tilde y_1$ depends only on $a_1$ disturbed by the noise term $\tilde n_1$. Hence, it can be directly estimated by appropriate scaling and the application of a nonlinearity Q(·)

$$\hat a_1 = Q\left( L_{1,1}^{-1} \cdot \tilde y_1 \right). \tag{5.63}$$

The obtained estimate can be inserted in the second row to subtract interference from $\tilde y_2$ and so on. We obtain the u-th estimate by

$$\hat a_u = Q\left( \frac{1}{L_{u,u}} \cdot \left[ \tilde y_u - \sum_{v=1}^{u-1} L_{u,v} \cdot \hat a_v \right] \right). \tag{5.64}$$
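The recursion (5.63) and (5.64) can be sketched without an explicit QL routine by following the two-step view of (5.62): matched filtering, then whitening with the triangular factor of $\mathbf{R} = \mathbf{L}^H \mathbf{L}$. The example below is restricted to real-valued signals; the 3 × 2 system matrix, the noise samples, the hard decision as nonlinearity, and the helper names are assumptions for illustration.

```python
import math


def reverse_cholesky(R):
    """Lower triangular L with R = L^T L for a real symmetric positive definite R."""
    n = len(R)
    # Standard Cholesky G G^T of R with rows and columns in reversed order.
    F = [[R[n - 1 - i][n - 1 - j] for j in range(n)] for i in range(n)]
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = F[i][j] - sum(G[i][k] * G[j][k] for k in range(j))
            G[i][j] = math.sqrt(s) if i == j else s / G[j][j]
    # Undo the reversal: L = J G^T J, which is lower triangular again.
    return [[G[n - 1 - j][n - 1 - i] for j in range(n)] for i in range(n)]


def whiten(L, r):
    """Solve L^T y_tilde = r by back substitution (multiplication with L^{-T})."""
    n = len(r)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (r[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))) / L[i][i]
    return y


def ql_sic(L, y_tilde, q):
    """Successive detection according to (5.63) and (5.64)."""
    a_hat = []
    for u in range(len(y_tilde)):
        residual = y_tilde[u] - sum(L[u][v] * a_hat[v] for v in range(u))
        a_hat.append(q(residual / L[u][u]))
    return a_hat


def hard(y):
    """Hard decision used as Q(.) in this sketch."""
    return 1.0 if y >= 0.0 else -1.0


# Toy example (assumption): Ns = 3 chips, Nu = 2 users, BPSK, mild noise.
S = [[1.0, 0.5], [0.0, 1.0], [1.0, 0.0]]
a_true = [1.0, -1.0]
noise = [0.1, -0.1, 0.05]
y = [sum(S[i][u] * a_true[u] for u in range(2)) + noise[i] for i in range(3)]

R = [[sum(S[i][u] * S[i][v] for i in range(3)) for v in range(2)] for u in range(2)]
r = [sum(S[i][u] * y[i] for i in range(3)) for u in range(2)]  # matched filter

L = reverse_cholesky(R)
a_hat = ql_sic(L, whiten(L, r), hard)

if __name__ == "__main__":
    print("L =", L)
    print("detected symbols:", a_hat)
```

Only one triangular factorization is needed for all users, in contrast to the repeated inversions of the reduced system matrices in the original BLAST procedure.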
This procedure, abbreviated by QL-SIC, has to be continued until the last symbol $a_{N_u}$ has been estimated so that a SIC as depicted in Figure 5.28 is carried out. With this recursive procedure, the matrix inversions of the original BLAST detection can be circumvented and only a single QL decomposition has to be carried out. Furthermore, we can exploit the finite nature of the signal alphabet by introducing $\mathcal{Q}(\cdot)$. The linear filtering with $\mathbf{Q}$ has to cope partly with the same problems as the decorrelator. For the first user, all $N_u - 1$ interfering signals have to be linearly suppressed. Since all columns of $\mathbf{Q}$ have unit length, the noise is not amplified but the desired signal may be very weak. This results in a small diagonal element in $\mathbf{L}$ and, hence, a low SNR. In successive steps, more interference is already cancelled and the columns of $\mathbf{Q}$ tend more and more toward a matched filter. In multiple antenna systems, this also affects the achievable diversity gain as shown in Section 6.4.

⁵ Throughout the subsequent derivation, the QL decomposition will be used. Equivalently, the QR decomposition of $\mathbf{S}$ can often be found in publications.

Figure 5.28 Illustration of interference cancellation after decomposing $\mathbf{S}$ into $\mathbf{Q}$ and $\mathbf{L}$ and filtering $\mathbf{y}$ with $\mathbf{Q}^H$

From the above explanation, we can also conclude that the proposed scheme cannot be directly applied to systems with a load $\beta > 1$. For $N_u > N_s$, the system matrix $\mathbf{S}$ has more columns than rows. Since only $N_s$ orthogonal columns exist that span the $N_s$-dimensional space, $\mathbf{Q}$ is an $N_s \times N_s$ matrix. Consequently, $\mathbf{L}$ would not be a lower triangular matrix but will have the form

\[ \mathbf{L} = \begin{bmatrix} L_{1,1} & & & L_{1,N_s+1} & \cdots & L_{1,N_u} \\ L_{2,1} & L_{2,2} & & L_{2,N_s+1} & \cdots & L_{2,N_u} \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ L_{N_s,1} & L_{N_s,2} & \cdots \; L_{N_s,N_s} & L_{N_s,N_s+1} & \cdots & L_{N_s,N_u} \end{bmatrix} \]
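We recognize that all layers, even the first, now suffer from interference. Further information about overloaded systems in the context of multiple antenna systems can be found in Damen et al. (2003). With reference to the MMSE solution, a direct implementation is possible even for $\beta > 1$ (see Section 5.4.3).

Improvement of Real Modulation Schemes

From linear detectors such as the decorrelator and MMSE filter, we already know that remarkable improvements can be achieved by taking into account that the imaginary part of the transmitted symbols does not contain information for real-valued modulation schemes. To exploit this knowledge for the QL decomposition also, we have to separate the real and imaginary parts for all terms of the equation $\mathbf{y} = \mathbf{S}\mathbf{a} + \mathbf{n}$. With $\mathbf{y}'$, $\mathbf{S}'$, $\mathbf{a}'$, and $\mathbf{n}'$ denoting real parts and $\mathbf{y}''$, $\mathbf{S}''$, $\mathbf{a}''$, and $\mathbf{n}''$ representing the imaginary parts, we obtain

\[ \begin{bmatrix} \mathbf{y}' \\ \mathbf{y}'' \end{bmatrix} = \begin{bmatrix} \mathbf{S}' & -\mathbf{S}'' \\ \mathbf{S}'' & \mathbf{S}' \end{bmatrix} \cdot \begin{bmatrix} \mathbf{a}' \\ \mathbf{a}'' \end{bmatrix} + \begin{bmatrix} \mathbf{n}' \\ \mathbf{n}'' \end{bmatrix}. \tag{5.65} \]

Since $\mathbf{a}'' = \mathbf{0}$ for real modulation schemes, $\mathbf{a} = \mathbf{a}'$ holds and (5.65) reduces to

\[ \begin{bmatrix} \mathbf{y}' \\ \mathbf{y}'' \end{bmatrix} = \begin{bmatrix} \mathbf{S}' \\ \mathbf{S}'' \end{bmatrix} \cdot \mathbf{a} + \begin{bmatrix} \mathbf{n}' \\ \mathbf{n}'' \end{bmatrix} = \mathbf{S}_r \cdot \mathbf{a} + \mathbf{n}_r. \tag{5.66} \]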
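As a small illustration of (5.66), the stacked real-valued model can be built as follows. This is a sketch assuming NumPy; the helper name `stack_real`, the dimensions and the noise level are hypothetical choices for the example.

```python
import numpy as np

def stack_real(S, y):
    """Build the real-valued S_r and y_r of (5.66) for real-valued symbols."""
    S_r = np.vstack([S.real, S.imag])          # 2*Ns x Nu
    y_r = np.concatenate([y.real, y.imag])     # length 2*Ns
    return S_r, y_r

rng = np.random.default_rng(0)
Ns, Nu = 4, 4                                  # fully loaded example, beta = 1
S = rng.standard_normal((Ns, Nu)) + 1j * rng.standard_normal((Ns, Nu))
a = rng.choice([-1.0, 1.0], size=Nu)           # BPSK symbols
n = 0.1 * (rng.standard_normal(Ns) + 1j * rng.standard_normal(Ns))
y = S @ a + n

S_r, y_r = stack_real(S, y)
n_r = np.concatenate([n.real, n.imag])

# Condition numbers of both models, printed for comparison only
print(np.linalg.cond(S), np.linalg.cond(S_r))
```

The real matrix `S_r` can then be passed to any QL decomposition routine in place of `S`.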
From (5.66), we see that the real system matrix has twice as many rows as $\mathbf{S}$. This generally leads to a better condition and, therefore, a lower noise amplification in the ZF case. Representing the received symbols by separated real and imaginary parts, we simply have to decompose $\mathbf{S}_r$. The obtained real matrices $\mathbf{Q}_r$ and $\mathbf{L}_r$ are then used for the subsequent detection procedure, as described above. The results will be presented at the end of this subsection.

Modified Gram-Schmidt Algorithm

QL decompositions can be implemented with different algorithms using Householder reflections or Givens rotations (see Appendix C). In this book, we will refer to the modified Gram-Schmidt algorithm (Golub and van Loan 1996) leading to

\[ \mathbf{S} = \begin{bmatrix} \mathbf{s}_1 & \cdots & \mathbf{s}_{N_u-1} & \mathbf{s}_{N_u} \end{bmatrix} = \mathbf{Q}\mathbf{L} = \begin{bmatrix} \mathbf{q}_1 & \cdots & \mathbf{q}_{N_u-1} & \mathbf{q}_{N_u} \end{bmatrix} \cdot \begin{bmatrix} L_{1,1} & & & \\ \vdots & \ddots & & \\ L_{N_u-1,1} & L_{N_u-1,2} & \cdots \; L_{N_u-1,N_u-1} & \\ L_{N_u,1} & L_{N_u,2} & \cdots \; L_{N_u,N_u-1} & L_{N_u,N_u} \end{bmatrix}. \tag{5.67} \]

The orthonormal columns $\mathbf{q}_u$ are determined successively one after the other. Neglecting for the moment an appropriate sorting, we start with the last column $\mathbf{q}_{N_u} = \mathbf{s}_{N_u}/\|\mathbf{s}_{N_u}\|$, that is, $\mathbf{q}_{N_u}$ points in the same direction as the last signature $\mathbf{s}_{N_u}$ and has unit norm. Because $\mathbf{s}_{N_u} = L_{N_u,N_u} \cdot \mathbf{q}_{N_u}$, $L_{N_u,N_u} = \|\mathbf{s}_{N_u}\|$ equals the length of the last signature. The next column vector $\mathbf{q}_{N_u-1}$ must be orthogonal to $\mathbf{q}_{N_u}$. Hence, we first subtract the projection of $\mathbf{s}_{N_u-1}$ onto $\mathbf{q}_{N_u}$

\[ L_{N_u,N_u-1} = \mathbf{q}_{N_u}^H \mathbf{s}_{N_u-1} \quad\Rightarrow\quad \tilde{\mathbf{q}}_{N_u-1} = \mathbf{s}_{N_u-1} - L_{N_u,N_u-1} \cdot \mathbf{q}_{N_u} \tag{5.68a} \]

and normalize the difference with $L_{N_u-1,N_u-1}$ to length 1.

\[ L_{N_u-1,N_u-1} = \|\tilde{\mathbf{q}}_{N_u-1}\| \quad\Rightarrow\quad \mathbf{q}_{N_u-1} = \tilde{\mathbf{q}}_{N_u-1}/L_{N_u-1,N_u-1} \tag{5.68b} \]
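The third column $\mathbf{q}_{N_u-2}$ has to be perpendicular to the plane spanned by $\mathbf{q}_{N_u}$ and $\mathbf{q}_{N_u-1}$. Hence, the above procedure has to be repeated and we obtain the general construction of the $u$-th column $\mathbf{q}_u$.

\[ L_{v,u} = \mathbf{q}_v^H \cdot \mathbf{s}_u \ \text{ for } u < v \le N_u \quad\Rightarrow\quad \tilde{\mathbf{q}}_u = \mathbf{s}_u - \sum_{v=u+1}^{N_u} L_{v,u} \cdot \mathbf{q}_v \tag{5.69a} \]

\[ L_{u,u} = \|\tilde{\mathbf{q}}_u\| \quad\Rightarrow\quad \mathbf{q}_u = \tilde{\mathbf{q}}_u / L_{u,u} \tag{5.69b} \]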
Table 5.1 Pseudo code for modified Gram-Schmidt algorithm

step   task
(1)    Initialize with $\mathbf{L} = \mathbf{0}$, $\mathbf{Q} = \mathbf{S}$
(2)    for $u = N_u, \ldots, 1$
(3)        determine diagonal element $L_{u,u} = \|\mathbf{q}_u\|$
(4)        normalize $\mathbf{q}_u = \mathbf{q}_u / L_{u,u}$ to unit length
(5)        for $v = 1, \ldots, u-1$
(6)            calculate projections $L_{u,v} = \mathbf{q}_u^H \cdot \mathbf{q}_v$
(7)            $\mathbf{q}_v = \mathbf{q}_v - L_{u,v} \cdot \mathbf{q}_u$
(8)        end
(9)    end
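The steps of Table 5.1 translate almost line by line into code. The following is a minimal sketch, assuming NumPy and zero-based indices; the function name `mgs_ql` is hypothetical and no sorting is applied.

```python
import numpy as np

def mgs_ql(S):
    """QL decomposition S = Q @ L by the modified Gram-Schmidt algorithm."""
    S = np.asarray(S, dtype=complex)
    Nu = S.shape[1]
    Q = S.copy()                                    # step (1)
    L = np.zeros((Nu, Nu), dtype=complex)
    for u in range(Nu - 1, -1, -1):                 # step (2): right to left
        L[u, u] = np.linalg.norm(Q[:, u])           # step (3)
        Q[:, u] = Q[:, u] / L[u, u]                 # step (4)
        for v in range(u):                          # step (5): unprocessed columns
            L[u, v] = np.vdot(Q[:, u], Q[:, v])     # step (6): q_u^H q_v
            Q[:, v] = Q[:, v] - L[u, v] * Q[:, u]   # step (7)
    return Q, L

# Usage with an arbitrary example matrix
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
Q, L = mgs_ql(S)
```

Note that `np.vdot` conjugates its first argument, which matches the Hermitian inner product in step (6). The diagonal of `L` is real and positive by construction.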
This procedure develops Q and L from right to left and has to be continued until all Nu columns in Q and the corresponding elements in L have been determined. Subtracting the projections of the unprocessed columns sv
(5.70)
Hence, the smallest diagonal element of $\boldsymbol{\Phi}_{\mathrm{ZF}}$ corresponds to the smallest row norm of $\mathbf{L}^{-1}$. The order of detection can be optimized after the QL decomposition by an algorithm proposed in Hassibi (2000) and termed Post-Sorting Algorithm (PSA). Starting with the unsorted QL decomposition according to the modified Gram-Schmidt algorithm, we have to permute the rows of $\mathbf{L}^{-1}$ according to a certain sorting criterion. However, permutations destroy the triangular structure of $\mathbf{L}^{-1}$. The structure can be restored by applying Householder reflections (Golub and van Loan 1996), that is, we multiply with a unitary matrix that forces certain elements of a row or a column to zero without changing its norm (see also Appendix C).

Figure 5.29 illustrates the principle of the PSA. In the first step, we have to find the row with the smallest norm (dark gray). It is exchanged with the first row of $\mathbf{L}^{-1}$, which is performed by a permutation matrix $\mathbf{P}_1$. In our example,

\[ \mathbf{P}_1 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

holds. Next, we apply the Householder reflection matrix $\boldsymbol{\Theta}_1$ to force the last three elements in the new first row to zero to match this row to the triangular structure. Note that Householder reflections do not affect the row norm so that the norm of the considered row is concentrated in a single nonzero element.

Figure 5.29 Illustration of post-sorting algorithm (white squares indicate zeros, light gray squares indicate nonzero elements, dark gray squares indicate row with minimum norm, crossed squares are neglected in subsequent steps)

Now, we have generated a tentative triangular matrix $\mathbf{P}_1\mathbf{L}^{-1}\boldsymbol{\Theta}_1$ whose first row has only a single nonzero element, that is, no interference disturbs the corresponding symbol. Assuming that this symbol is decided correctly (it has the lowest error probability of all symbols), its interference on the remaining symbols can be perfectly cancelled. Hence, it has no influence on subsequent cancellation steps so that the first row and the first column of $\mathbf{P}_1\mathbf{L}^{-1}\boldsymbol{\Theta}_1$ can be removed. The whole procedure is now repeated for the reduced matrix until the optimum order is obtained. We have to carry out at most $N_u$ permutations and Householder reflections so that we finally obtain

\[ \mathbf{L}_{\mathrm{opt}}^{-1} = \mathbf{P}_{N_u} \cdots \mathbf{P}_1 \mathbf{L}^{-1} \boldsymbol{\Theta}_1 \cdots \boldsymbol{\Theta}_{N_u} \quad\Leftrightarrow\quad \mathbf{L}_{\mathrm{opt}} = \boldsymbol{\Theta}_{N_u}^H \cdots \boldsymbol{\Theta}_1^H \mathbf{L} \mathbf{P}_1^H \cdots \mathbf{P}_{N_u}^H \tag{5.71} \]

and

\[ \mathbf{S} = \mathbf{Q}\mathbf{L} = \mathbf{Q}_{\mathrm{opt}}\mathbf{L}_{\mathrm{opt}} \quad\Leftrightarrow\quad \mathbf{Q}_{\mathrm{opt}} = \mathbf{Q}\boldsymbol{\Theta}_1 \cdots \boldsymbol{\Theta}_{N_u}. \tag{5.72} \]
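The post-sorting steps can be sketched as follows. This is an illustrative reading of the description above rather than the reference implementation: NumPy is assumed, the names `householder_row` and `post_sort` are hypothetical, the Householder matrices are built explicitly for clarity (not efficiency), and the permutations are tracked in an index vector `order` instead of matrices.

```python
import numpy as np

def householder_row(r):
    """Hermitian unitary Theta such that r @ Theta = [alpha, 0, ..., 0]."""
    x = np.asarray(r, dtype=complex).conj()      # r^H treated as a column vector
    n, norm_x = x.size, np.linalg.norm(x)
    if norm_x == 0.0:
        return np.eye(n, dtype=complex)
    phase = x[0] / abs(x[0]) if abs(x[0]) > 0 else 1.0
    v = x.copy()
    v[0] += phase * norm_x
    return np.eye(n, dtype=complex) - 2.0 * np.outer(v, v.conj()) / np.vdot(v, v)

def post_sort(Q, L):
    """Return Q_opt, L_opt and order with Q_opt @ L_opt = S[:, order]."""
    Nu = L.shape[0]
    M = np.linalg.inv(L).astype(complex)         # work on L^{-1}
    Q_opt = Q.astype(complex).copy()
    order = np.arange(Nu)
    for k in range(Nu):
        # row of the reduced matrix with the smallest norm
        m = k + int(np.argmin(np.linalg.norm(M[k:, k:], axis=1)))
        M[[k, m], :] = M[[m, k], :]              # permutation P_k
        order[[k, m]] = order[[m, k]]
        Theta = np.eye(Nu, dtype=complex)
        Theta[k:, k:] = householder_row(M[k, k:])
        M = M @ Theta                            # restore zeros right of the diagonal
        Q_opt = Q_opt @ Theta
    return Q_opt, np.linalg.inv(M), order

# Usage: start from an unsorted QL decomposition (here derived from NumPy's QR)
rng = np.random.default_rng(3)
S = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
Qr, R = np.linalg.qr(S[:, ::-1])
Q, L = Qr[:, ::-1], R[::-1, ::-1]
Q_opt, L_opt, order = post_sort(Q, L)
```

In this sketch the detected vector is in sorted order, so entry `i` of the estimate belongs to symbol `order[i]` of the original vector.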
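Using this QL decomposition provides the best order of detection in the SIC stage and delivers estimates in a permuted vector $\hat{\mathbf{a}}_{\mathrm{opt}} = \mathbf{P}_{N_u} \cdots \mathbf{P}_1 \hat{\mathbf{a}}$. Its symbols have to be re-sorted into the original succession by

\[ \hat{\mathbf{a}} = \mathbf{P}_1^H \cdots \mathbf{P}_{N_u}^H \cdot \hat{\mathbf{a}}_{\mathrm{opt}}. \tag{5.73} \]

Obviously, the sorting does not change the system properties because the received signal can be expressed as

\[ \begin{aligned} \mathbf{y} &= \mathbf{S}_{\mathrm{opt}} \cdot \mathbf{a}_{\mathrm{opt}} + \mathbf{n} = \mathbf{Q}_{\mathrm{opt}} \cdot \mathbf{L}_{\mathrm{opt}} \cdot \mathbf{a}_{\mathrm{opt}} + \mathbf{n} \\ &= \mathbf{Q} \underbrace{\boldsymbol{\Theta}_1 \cdots \boldsymbol{\Theta}_{N_u} \cdot \boldsymbol{\Theta}_{N_u}^H \cdots \boldsymbol{\Theta}_1^H}_{\mathbf{I}_{N_u}} \mathbf{L} \underbrace{\mathbf{P}_1^H \cdots \mathbf{P}_{N_u}^H \cdot \mathbf{P}_{N_u} \cdots \mathbf{P}_1}_{\mathbf{I}_{N_u}} \mathbf{a} + \mathbf{n} = \mathbf{Q}\mathbf{L}\mathbf{a} + \mathbf{n}. \end{aligned} \]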
Besides the QL decomposition of $\mathbf{S}$, the optimum sorting requires the inversion of $\mathbf{L}$ and several permutations and Householder reflections. To reduce the computational costs, the next subsection introduces a sub-optimum algorithm whose performance is very close to the optimum solution.

Sorted QL Decomposition

A sub-optimum but very efficient solution for the sorting problem, which directly affects the QL decomposition, is presented in Wübben et al. (2001). The main problem to be solved is that the Gram-Schmidt algorithm used for the QL decomposition starts with $L_{N_u,N_u}$ in the lower right corner and proceeds up to $L_{1,1}$, while we would like to fix the largest possible $L_{1,1}$ first and continue down to the shortest row at the bottom. In other words, the order of detection is reverse to the order of decomposition. We saw from the PSA that the order of detection can be adapted by permuting the columns $\mathbf{q}_u$ of $\mathbf{Q}$ as well as the rows of $\mathbf{L}$. This coincides with a different sorting of the column vectors $\mathbf{s}_u$ and the data symbols $a_u$ in $\mathbf{a}$. A sub-optimum permutation can be carried out during the QL decomposition itself. The algorithm proposed in Wübben et al. (2001) is based on a QR decomposition but can be directly adapted to the QL decomposition considered here. The basic idea behind the algorithm is that the determinant of a triangular matrix equals the product of the diagonal elements and is invariant, up to its sign, with respect to permutations of rows or columns. The Sorted QL Decomposition (SQLD) now assumes that starting with the smallest possible $L_{N_u,N_u}$ will finally lead to large values in the upper left part of $\mathbf{L}$, since the product is constant. The algorithm is summarized as a pseudo code in Table 5.2. After the initialization $\mathbf{Q} = \mathbf{S}$, the column $\mathbf{q}_{k_u}$ with the smallest norm is determined and exchanged with the right-most unprocessed vector.
Since the projections of the remaining columns onto a new vector are immediately subtracted in each step, no Householder reflections are explicitly necessary. At the end of the procedure, we obtain an orthonormal matrix $\mathbf{Q}$, a triangular matrix $\mathbf{L}$, as well as a set of permutation matrices $\mathbf{P}_u$ with $1 \le u \le N_u$. It has to be mentioned that the proposed SQLD does not always lead to the optimum detection order. Problems occur especially in situations where two column vectors have large lengths but point in similar directions. In this case, these large vectors are among the latest columns to be orthogonalized but since the projection of one vector onto the other vector is very large, the orthogonal component becomes very small. Hence, we obtain a very small diagonal element in the upper left corner of $\mathbf{L}$.
Table 5.2 Pseudo code for sorted QL decomposition

step   task
(1)    Initialize with $\mathbf{L} = \mathbf{0}$, $\mathbf{Q} = \mathbf{S}$
(2)    for $u = N_u, \ldots, 1$
(3)        search for minimum norm among remaining columns in $\mathbf{Q}$: $k_u = \operatorname{argmin}_{v=1,\ldots,u} \|\mathbf{q}_v\|^2$
(4)        exchange columns $u$ and $k_u$ in $\mathbf{Q}$, and determine $\mathbf{P}_u$
(5)        determine diagonal element $L_{u,u} = \|\mathbf{q}_u\|$
(6)        normalize $\mathbf{q}_u = \mathbf{q}_u / L_{u,u}$ to unit length
(7)        for $v = 1, \ldots, u-1$
(8)            calculate projections $L_{u,v} = \mathbf{q}_u^H \cdot \mathbf{q}_v$
(9)            $\mathbf{q}_v = \mathbf{q}_v - L_{u,v} \cdot \mathbf{q}_u$
(10)       end
(11)   end
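Table 5.2 can be sketched in code as follows, assuming NumPy and zero-based indices. The function name `sqld` is hypothetical, and the permutations are collected in an index vector `order` instead of matrices. As an assumption beyond the wording of the table, the sketch also exchanges the already computed entries in the corresponding columns of `L`, which is needed for `Q @ L` to reproduce the permuted system matrix.

```python
import numpy as np

def sqld(S):
    """Sorted QL decomposition: returns Q, L, order with Q @ L = S[:, order]."""
    S = np.asarray(S, dtype=complex)
    Nu = S.shape[1]
    Q = S.copy()                                     # step (1)
    L = np.zeros((Nu, Nu), dtype=complex)
    order = np.arange(Nu)
    for u in range(Nu - 1, -1, -1):                  # step (2)
        # step (3): smallest norm among the unprocessed columns 0..u
        k = int(np.argmin(np.sum(np.abs(Q[:, :u + 1]) ** 2, axis=0)))
        # step (4): exchange columns u and k, keep L and the order consistent
        Q[:, [u, k]] = Q[:, [k, u]]
        L[:, [u, k]] = L[:, [k, u]]
        order[[u, k]] = order[[k, u]]
        L[u, u] = np.linalg.norm(Q[:, u])            # step (5)
        Q[:, u] = Q[:, u] / L[u, u]                  # step (6)
        for v in range(u):                           # step (7)
            L[u, v] = np.vdot(Q[:, u], Q[:, v])      # step (8)
            Q[:, v] = Q[:, v] - L[u, v] * Q[:, u]    # step (9)
    return Q, L, order

# Usage with an arbitrary example matrix
rng = np.random.default_rng(2)
S = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
Q, L, order = sqld(S)
```

The lower right diagonal element equals the smallest column norm of `S`, which reflects the idea of starting with the smallest possible $L_{N_u,N_u}$.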
However, events leading to sub-optimum sorting are very rare and the SQLD represents an appropriate pre-sorting algorithm. For the aforementioned situations, the presented PSA can be applied for further improvements. It then requires only a few additional permutations (low complexity) because of the pre-sorting and can still achieve the optimum order of detection.

The whole receiver structure is depicted in Figure 5.30. First, the system matrix $\mathbf{S}$, either ideally known or estimated, is decomposed according to the SQLD algorithm and potentially post-processed by the PSA. The latter delivers the matrices $\mathbf{Q}$ for linear pre-filtering of the received vector $\mathbf{y}$, resulting in $\tilde{\mathbf{y}}$, $\mathbf{L}$ for the SIC providing $\hat{\mathbf{a}}_{\mathrm{opt}}$, and $\mathbf{P}$ for the inverse permutation, leading to the final estimate $\hat{\mathbf{a}}$.

Figure 5.30 Block diagram of SQLD-SIC detector with post-sorting algorithm (PSA)

Simulation Results

Figure 5.31a analyzes the influence of $\alpha$ for NL 1 on the error rate performance. Obviously, there is only a slight dependency for QPSK and small values of $\alpha$ should be preferred. Although the average BER across all users is considered, similar results are obtained for user-specific error probabilities. Figure 5.31b compares different nonlinearities. Their performances are nearly identical because the average error rate is dominated by the first layer where no interference cancellation was applied. Because of the higher effective interference, the SQLD-SIC performs worse for QPSK. For BPSK, the SUB is reached asymptotically.

Figure 5.31 Performance of ZF-SQLD-SIC for uncoded OFDM-CDMA system, 4-path Rayleigh fading channel, and $\beta = 1$; a) optimizing $\alpha$ for NL 1 at $E_b/N_0 = 8$ dB (BER versus $\alpha$), b) different nonlinearities with optimized $\alpha$ (BER versus $E_b/N_0$ in dB)

Figure 5.32a illustrates the performance improvement obtained by linear preprocessing with $\mathbf{Q}^H$. Comparing the error rate performance of the SQLD-SIC approach with that of a sole SIC after the first detection stage, we observe that linear pre-filtering enhances the performance significantly. While it approaches the SUB, the pure SIC saturates at $P_b = 10^{-2}$. Only at very low SNR are the performances comparable. Additionally, the dashed line with circles emphasizes the loss that occurs for SQLD-SIC when the real nature of BPSK is not considered in the QL decomposition. At an error rate of $10^{-2}$, the loss amounts to 7 dB.

Figure 5.32b depicts the results for QPSK. The gain of SQLD-SIC compared to SIC becomes larger for growing interference as shown by the difference between BPSK and QPSK. However, for both detection schemes a large loss compared to the SUB can be observed. Moreover, it is demonstrated that even sub-optimum sorting for the SQLD-SIC improves the performance compared to unsorted QLD-SIC. Applying the optimum PSA yields an additional gain of more than 3 dB over the SQLD-SIC.

Figure 5.32 Performance of different detection schemes with NL 1 for uncoded OFDM-CDMA system, 4-path Rayleigh fading channel, and $\beta = 1$; a) BPSK, b) QPSK (BER versus $E_b/N_0$ in dB)

We now have a look at the variations of user-specific error rates. From Subsection 5.2.1, we know that the decorrelator totally suppresses multiuser interference at the expense of a noise amplification. Moreover, it has already been mentioned that the first row of $\mathbf{Q}^H$ also suppresses the interference perfectly but may suffer from small SNRs, while the last row represents a matched filter dealing perfectly with the background noise.⁶ Hence, the SNRs of the remaining rows are somewhere in between. This is illustrated in Figure 5.33 showing the minimum and the maximum user-specific error rates, as well as the average error rates. To improve the weak users especially, two possibilities exist that can also be combined. First, we can replace the ZF approach with an MMSE solution as presented in the next subsection. Second, iterative turbo processing with an initial QL detection stage as explained in Subsection 5.4.4 can significantly improve the performance of a CDMA system.

⁶ The noise power is not influenced but the signal power may be very small.

Figure 5.33 Error rates for weakest, strongest, and average user after ZF-SQLD-SIC for uncoded OFDM-CDMA system with BPSK, 4-path Rayleigh fading channel, and $\beta = 1$ (BER versus $E_b/N_0$ in dB)
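5.4.3 QL Decomposition for MMSE Solution

From Subsection 5.2.2 we already know the linear MMSE detector

\[ \mathbf{W}_{\mathrm{MMSE}}^H = \left( \mathbf{S}^H\mathbf{S} + \frac{N_0}{E_s}\mathbf{I}_{N_u} \right)^{-1} \cdot \mathbf{S}^H \]

that looks for a compromise between interference suppression and amplification of thermal noise. Hence, some residual interference remains in all outputs. To find the QL decomposition according to the MMSE criterion and an appropriate order of detection, the MMSE solution is now described as the ZF solution of a modified system (Böhnke et al. 2003; Wübben et al. 2003a). This modification is obtained by extending the received vector $\mathbf{y}$ and the system matrix $\mathbf{S}$ according to Hassibi (2000)

\[ \underline{\mathbf{y}} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0}_{N_u \times 1} \end{bmatrix}, \qquad \underline{\mathbf{S}} = \begin{bmatrix} \mathbf{S} \\ \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} \end{bmatrix}. \tag{5.74} \]

Applying now the decorrelator with $\mathbf{W}_{\mathrm{ZF}}^H = (\mathbf{S}^H\mathbf{S})^{-1}\mathbf{S}^H$ to the modified system yields

\[ \hat{\mathbf{a}}_{\mathrm{MMSE}} = \left( \underline{\mathbf{S}}^H \underline{\mathbf{S}} \right)^{-1} \underline{\mathbf{S}}^H \cdot \underline{\mathbf{y}} \tag{5.75a} \]

\[ = \left( \begin{bmatrix} \mathbf{S}^H & \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{S} \\ \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} \end{bmatrix} \right)^{-1} \begin{bmatrix} \mathbf{S}^H & \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{y} \\ \mathbf{0}_{N_u \times 1} \end{bmatrix} = \left( \mathbf{S}^H\mathbf{S} + \frac{\sigma_N^2}{\sigma_A^2} \cdot \mathbf{I}_{N_u} \right)^{-1} \cdot \mathbf{S}^H \mathbf{y}. \tag{5.75b} \]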
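The equivalence stated in (5.75a) and (5.75b) is easy to verify numerically. The following sketch assumes NumPy; the dimensions (an overloaded example with $\beta = 1.5$) and the values of `sigma_n` and `sigma_a` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Ns, Nu = 4, 6                       # overloaded example, beta = Nu / Ns = 1.5
sigma_n, sigma_a = 0.5, 1.0         # hypothetical noise and symbol standard deviations
S = rng.standard_normal((Ns, Nu)) + 1j * rng.standard_normal((Ns, Nu))
y = rng.standard_normal(Ns) + 1j * rng.standard_normal(Ns)

# Extended system according to (5.74)
S_ext = np.vstack([S, (sigma_n / sigma_a) * np.eye(Nu)])
y_ext = np.concatenate([y, np.zeros(Nu)])

# Decorrelator (ZF) applied to the extended system, see (5.75a)
a_zf_ext = np.linalg.solve(S_ext.conj().T @ S_ext, S_ext.conj().T @ y_ext)

# MMSE filter applied to the original system, see (5.75b)
a_mmse = np.linalg.solve(
    S.conj().T @ S + (sigma_n / sigma_a) ** 2 * np.eye(Nu), S.conj().T @ y
)
```

Both vectors coincide up to rounding. The extended matrix has full column rank even though `S` has more columns than rows, which is why the approach remains applicable for loads above one.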
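We recognize from (5.75b) that the ZF approach of the modified system leads directly to the MMSE solution for the original system. Hence, we can apply the same QL decomposition techniques derived in the previous section to $\underline{\mathbf{S}}$ instead of $\mathbf{S}$. We obtain

\[ \underline{\mathbf{S}} = \begin{bmatrix} \mathbf{S} \\ \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} \end{bmatrix} = \underline{\mathbf{Q}} \cdot \underline{\mathbf{L}} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix} \cdot \underline{\mathbf{L}} = \begin{bmatrix} \mathbf{Q}_1 \cdot \underline{\mathbf{L}} \\ \mathbf{Q}_2 \cdot \underline{\mathbf{L}} \end{bmatrix} \tag{5.76} \]

where $\underline{\mathbf{Q}}$ is now an $(N_s + N_u) \times N_u$ matrix with orthonormal columns while $\underline{\mathbf{L}}$ is again a lower triangular $N_u \times N_u$ matrix. Obviously, they are different from $\mathbf{Q}$ and $\mathbf{L}$ of the ZF solution. The matrices $\mathbf{Q}_1$ and $\mathbf{Q}_2$ divide the matrix $\underline{\mathbf{Q}}$ into the upper $N_s$ and the lower $N_u$ rows. Although $\underline{\mathbf{Q}}$ has orthogonal columns of unit length, this certainly does not hold for $\mathbf{Q}_1$ and $\mathbf{Q}_2$. An important difference compared to the ZF solution has to be mentioned. Since the extended system matrix $\underline{\mathbf{S}}$ consists of $N_s + N_u$ rows and $N_u$ columns, it can always be QL decomposed regardless of the load $\beta$. Hence, the MMSE approach is also applicable for overloaded systems. Looking at the filtered received vector

\[ \tilde{\mathbf{y}} = \underline{\mathbf{Q}}^H \underline{\mathbf{y}} = \mathbf{Q}_1^H\mathbf{y} + \mathbf{Q}_2^H\mathbf{0} = \mathbf{Q}_1^H\mathbf{y} = \mathbf{Q}_1^H\mathbf{S}\mathbf{a} + \mathbf{Q}_1^H\mathbf{n}, \tag{5.77} \]

we observe that only the first part $\mathbf{Q}_1$ of $\underline{\mathbf{Q}}$ contributes to $\tilde{\mathbf{y}}$ because $\mathbf{y}$ was extended by zeros. Multiplying only with $\mathbf{Q}_1$ slightly reduces the computational costs. Inserting $\mathbf{S} = \mathbf{Q}\mathbf{L}$ into (5.77) results in

\[ \tilde{\mathbf{y}} = \mathbf{Q}_1^H\mathbf{Q}\mathbf{L}\mathbf{a} + \mathbf{Q}_1^H\mathbf{n}. \tag{5.78} \]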
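Since the columns in $\mathbf{Q}_1$ and $\mathbf{Q}$ are not orthogonal, $\mathbf{Q}_1^H \cdot \mathbf{Q}$ does not equal the identity matrix and the interference between two neighboring rows is not totally suppressed. This illustrates the known trade-off between interference suppression and noise amplification. With

\[ \underline{\mathbf{Q}}^H \underline{\mathbf{S}} = \mathbf{Q}_1^H\mathbf{S} + \frac{\sigma_N}{\sigma_A} \cdot \mathbf{Q}_2^H \overset{!}{=} \underline{\mathbf{L}} \quad\Rightarrow\quad \mathbf{Q}_1^H\mathbf{S} = \underline{\mathbf{L}} - \frac{\sigma_N}{\sigma_A} \cdot \mathbf{Q}_2^H, \tag{5.79} \]

we can replace the term $\mathbf{Q}_1^H \cdot \mathbf{S}$ in (5.77) and obtain

\[ \tilde{\mathbf{y}} = \underline{\mathbf{L}} \cdot \mathbf{a} - \frac{\sigma_N}{\sigma_A} \cdot \mathbf{Q}_2^H\mathbf{a} + \mathbf{Q}_1^H\mathbf{n} \tag{5.80} \]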
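for the filtered received signal. The second term on the right-hand side of (5.80) represents the residual interference after filtering with $\mathbf{Q}_1^H$. The smaller the noise power $\sigma_N^2$, the less interference remains in the system. For $\sigma_N^2 \to 0$, the MMSE solution tends to the ZF solution as already shown in Section 5.2.

Sorting for the MMSE QL Decomposition

As in the case of the ZF-QL decomposition, the order of detection has to be determined with respect to the error covariance matrix $\boldsymbol{\Phi}_{\mathrm{MMSE}}$. Using the extended system model with $\underline{\mathbf{S}}$, the ZF criterion delivers with the results of Subsection 5.2.1

\[ \boldsymbol{\Phi}_{\mathrm{MMSE}} = \underline{\boldsymbol{\Phi}}_{\mathrm{ZF}} = \sigma_N^2 \cdot \left( \underline{\mathbf{S}}^H \underline{\mathbf{S}} \right)^{-1} = \sigma_N^2 \cdot \underline{\mathbf{L}}^{-1} \underline{\mathbf{L}}^{-H}. \tag{5.81} \]

Hence, the row norms of $\underline{\mathbf{L}}^{-1}$ determine the optimum sorting. In the MMSE case, the inverse of $\underline{\mathbf{L}}$ need not be calculated explicitly because it is already contained in (5.76).

\[ \frac{\sigma_N}{\sigma_A} \cdot \mathbf{I}_{N_u} = \mathbf{Q}_2 \cdot \underline{\mathbf{L}} \quad\Rightarrow\quad \underline{\mathbf{L}}^{-1} = \frac{\sigma_A}{\sigma_N} \cdot \mathbf{Q}_2 \tag{5.82} \]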
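The relations (5.80) and (5.82) can be checked with a short numerical experiment. The sketch below assumes NumPy, hypothetical dimensions and noise parameters, BPSK symbols, and a QL decomposition derived from `np.linalg.qr`; the variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(1)
Ns, Nu = 6, 4                        # hypothetical dimensions
sigma_n, sigma_a = 0.3, 1.0          # hypothetical standard deviations
S = rng.standard_normal((Ns, Nu)) + 1j * rng.standard_normal((Ns, Nu))
a = rng.choice([-1.0, 1.0], size=Nu)
n = sigma_n / np.sqrt(2) * (rng.standard_normal(Ns) + 1j * rng.standard_normal(Ns))
y = S @ a + n

# QL decomposition of the extended matrix, see (5.76)
S_ext = np.vstack([S, (sigma_n / sigma_a) * np.eye(Nu)])
Qr, R = np.linalg.qr(S_ext[:, ::-1])
Q_ext, L_ext = Qr[:, ::-1], R[::-1, ::-1]
Q1, Q2 = Q_ext[:Ns], Q_ext[Ns:]      # upper Ns rows and lower Nu rows

# Inverse of L as a by-product, see (5.82)
L_inv = (sigma_a / sigma_n) * Q2

# Filtered signal (5.77) and its decomposition (5.80)
y_tilde = Q1.conj().T @ y
rhs = L_ext @ a - (sigma_n / sigma_a) * Q2.conj().T @ a + Q1.conj().T @ n

# Sorting metric: row norms of L^{-1}, available directly from Q2
row_norms = np.linalg.norm(L_inv, axis=1)
```

No explicit inversion of the triangular matrix is needed to obtain `row_norms`, which is the saving described in the text.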
Therefore, the inverse of $\underline{\mathbf{L}}$ is obtained as a by-product of the initial QL decomposition. Sorting by the SQLD or the optimum PSA simply has to exploit the row norms of $\mathbf{Q}_2$ in the lower part of $\underline{\mathbf{Q}}$. This compensates for the higher computational costs for QL decomposing of the extended matrix $\underline{\mathbf{S}}$.

Simulation Results

Figure 5.34a depicts the results obtained for ZF- and MMSE-SQLD-SIC for a BPSK modulated system with $\beta = 1$. Both approaches show nearly identical performances and reach the SUB above 10 dB, while the curve for noniterative SIC without precoding saturates at an error rate of $10^{-2}$. The real-valued nature of BPSK has been exploited so that the background noise amplification of the ZF solution does not have such a severe influence. However, five iterations of the nonlinear SIC without linear preprocessing (shown in Figure 5.19) achieve the same performance with lower computational complexity. Hence, filtering with $\mathbf{Q}^H$ is not necessary for moderate SINR.

For QPSK leading to a twice-as-high effective interference, the results are depicted in Figure 5.34b. We see that the simple nonlinear SIC without precoding (bold dashed line) cannot ensure a reliable transmission. Among the schemes with linear preprocessing, the ZF-SQLD-SIC with optimum post sorting loses approximately 3 dB compared to the unsorted MMSE-QLD-SIC at an error rate of $10^{-3}$. Further improvements can be achieved by appropriate sorting. The MMSE-SQLD-SIC leads to an additional gain of 4 dB, while optimum post sorting provides a gain of 6.5 dB and comes close to the SUB. The turbo SIC without precoding (see Figure 5.21) with 10 detection stages saturates at an error rate of $10^{-3}$ for large SNR. On the contrary, all QL-based schemes do not show an error floor in the depicted range.

Figure 5.34 Performance of QLD based SIC for ZF and MMSE with and without optimum PSA for uncoded OFDM-CDMA system, a 4-path Rayleigh fading channel, and $\beta = 1$; a) BPSK, b) QPSK (BER versus $E_b/N_0$ in dB)
5.4.4 Turbo Processing

To improve the performance of the SQLD-SIC schemes, iterative turbo detection can be applied. Once all the signals have been detected, the filter $\mathbf{Q}^H$ has to be substituted with a bank of matched filters and SIC as described in Section 5.3 is carried out. Hence, the linear filter is just used as a catalyst improving the performance of the first stage and then dropped. The reason for this approach is twofold: First, the interference has already been partly cancelled after filtering with $\mathbf{Q}^H$. Because of the triangular structure of $\mathbf{L}$, the signal $\tilde{y}_1$ in the first row does not suffer from interference of $a_2 \cdots a_{N_u}$ and, hence, its decision cannot be improved by the knowledge of the other signals. Second, suppressing the interference by ZF leads to a strong noise amplification; the MMSE solution represents a trade-off between interference suppression and noise amplification. The optimum receiver in the absence of interference, for example, after a perfect cancellation, would be the matched filter performing just the maximum ratio combining. Hence, after the QL decomposition and a first SIC stage delivering $\hat{\mathbf{a}}^{(1)}$, we use the output of the matched filter bank to apply the already introduced SIC.

Figure 5.35a shows the results obtained for a load $\beta = 1$ and the ZF criterion. Both SQLD-SIC with and without PSA gain significantly by subsequent detection stages and outperform the pure SIC (bold dashed line) at high SNRs. Only at very low SNR, the performance of SIC is the same. Optimum post sorting of the initial SQLD improves the performance even after several iterations, whereby the gain becomes smaller with each stage.

Figure 5.35 Performance of iterative SQLD-SIC with and without PSA for uncoded OFDM-CDMA system, QPSK, a 4-path Rayleigh fading channel, and $\beta = 1$; a) ZF, b) MMSE (dashed lines: initial stage, dotted lines: three subsequent stages, solid lines: 10 subsequent stages; BER versus $E_b/N_0$ in dB)

Figure 5.35b compares the same detection schemes for the MMSE criterion. Contrary to the ZF case, the PSA provides a performance close to the SUB, even in the initial detection stage. Hence, subsequent iterations are not necessary, saving valuable computational resources. Even without PSA, the MMSE approach performs much better than ZF and the gains due to additional iterations are smaller. At an error rate of $10^{-2}$, the MMSE solution gains about 1 dB compared to the ZF approach; at $10^{-3}$, the gain has increased to 2 dB.

From Figure 5.36 we see that for higher loads, for example, $\beta = 1.5$, there is still a small improvement achievable by iterations. However, a convergence toward the SUB cannot be obtained anymore. Nevertheless, optimum sorting still provides a remarkable gain over the sub-optimum sorting of the SQLD.

Figure 5.36 Performance of iterative MMSE-SQLD-SIC for uncoded OFDM-CDMA system, QPSK, a 4-path Rayleigh fading channel, and $\beta = 1.5$ (curves: SQLD initial stage, SQLD after 10 iterations, PSA initial stage; BER versus $E_b/N_0$ in dB)

If channel coding is applied, we can obviously incorporate the decoder into the interference cancellation steps as described in Subsection 5.3.3. However, long codes for spreading or variations of the channels during one coded frame would lead to various system matrices $\mathbf{S}[k]$ that have to be QL decomposed. With high probability, the sorted version of the QL decomposition would deliver different detection orders. Therefore, the interference cancellation would start with a different user for each block. This causes problems for the channel decoding because decoding can start only if a complete frame of a certain user is processed. Since the SIC in Subsection 5.3.3 was evaluated for a coded OFDM-CDMA system with seven fading blocks per frame, we now use the unsorted version of the QL decomposition, allowing the same permutation for all blocks. Hence, after filtering with $\mathbf{Q}^H$, the coded frame of user 1 is totally freed from interference, can be decoded, and the estimated coded bits are used to cancel interference from subsequent signals. Then, the procedure is continued for user 2 and so on.

Figure 5.37 Performance of iterative MMSE-SQLD-SIC and SIC for coded OFDM-CDMA system, QPSK, a 4-path Rayleigh fading channel, and $\beta = 2$; a) $L_c = 3$, b) $L_c = 7$ (BER versus $E_b/N_0$ in dB)

Figure 5.37a shows the results for a half-rate convolutional code with constraint length $L_c = 3$, QPSK, and a load of $\beta = 2$. The dashed lines indicate the error rate performance for the SQLD-SIC scheme after different iterations, where the final 10th iteration corresponds to the bold dashed line.
With the iterative scheme, it is possible to approach the SUB within a gap of 1 dB for SNRs above 6 dB. Comparing the results with those of Figure 5.27 (solid line with circles), it becomes obvious that the linear prefilter improves the performance at low SNR while both schemes have approximately the same performance for higher SNR.
Results for a convolutional code with Lc = 7 are shown in Figure 5.37b. The solid lines with circles and squares represent the SIC performance after 4 and 10 iterations, respectively (cf. Figure 5.27). Obviously, the SQLD-SIC scheme provides a much faster convergence. After 4 iterations, the simple SIC with 10 iterations is outperformed. Nevertheless, there remains a large gap of 4 dB to the SUB that cannot be closed. The comparison with the Lc = 3 code shows that the performance is slightly worse although the computational costs are much higher. Four iterations for the Lc = 7 code require a complexity that is more than six times higher than the complexity for 10 iterations of the Lc = 3 code.
5.5 Summary

Contrary to Chapter 4, which treated multiple access interference as additional noise, this chapter addressed MUD strategies that exploit the interference's specific structure. We started with linear MUD approaches that do not consider the discrete nature of the signal alphabets but assume continuously distributed inputs. The decorrelator fulfilling the ZF criterion totally suppresses the interference at the expense of a severe amplification of the background noise. The MMSE filter represents a compromise between interference suppression and noise amplification and showed the best performance among all linear techniques. It was shown that both approaches can be approximated by iterative algorithms saving valuable computational costs, especially for large systems. While the linear PIC converges only for low loads, the successive counterpart showed a very robust behavior. Generally, we can state that the MMSE solution leads to substantial gains compared to the simple matched filter, although it is still far away from reaching capacity.

A major shortcoming of all the linear techniques is the assumption of continuously distributed inputs. This drawback was overcome by introducing nonlinear devices into the iterative interference cancellation structures. Simple hard decisions as well as more sophisticated soft-output devices including FEC decoders have been examined. All nonlinearities lead to substantial gains over the linear approaches. The best performance was achieved with the incorporation of the FEC decoder, approaching the SUB up to a maximum load of $\beta = 2$ even for complex QPSK modulation. The price to pay for this performance is a high computational complexity because the decoding has to be carried out several times for each user.

Finally, the combination of linear and nonlinear MUD strategies has been analyzed. On the basis of a QL decomposition of the system matrix, a linear prefilter partly suppresses the interference, leading to a better convergence of the subsequent nonlinear SIC stage. Appropriate sorting was shown to be an important feature because it minimizes the risk of error propagation. Besides the optimum sorting algorithm, a sub-optimum low-cost alternative was presented, termed sorted QL decomposition (SQLD). Moreover, the QL decomposition based on the MMSE criterion was also derived. Compared to pure SIC without pre-filtering, an improved convergence, especially for extremely high loads, was observed.
6 Multiple Antenna Systems

Multiple antenna systems became popular roughly a decade ago as a result of the fundamental work of Alamouti (1998), Foschini (1996), Foschini and Gans (1998), Kühn and Kammeyer (2004), Seshadri and Winters (1994), Seshadri et al. (1997), Tarokh et al. (1998), Telatar (1995), Wittneben (1991), Wolniansky et al. (1998), and many others. The reason for the great interest is that multiple antennas offer an efficient way to increase the spectral efficiency of mobile radio systems by exploiting the resource space. After their large potential had been widely recognized, they found their way into several standards. As an example, very simple structures can be found in Release 99 of UMTS systems (Holma and Toskala 2004). More sophisticated methods are under discussion for further evolutions (3GPP 2005a; Hanzo et al. 2002b).

This chapter gives a brief overview of different multiple-input multiple-output (MIMO) strategies for point-to-point communications without claiming to be comprehensive. Multiuser scenarios are briefly discussed in Section 2.4. After the introduction, Section 6.2 addresses spatial diversity concepts. Starting with simple receive diversity, different possibilities to obtain a diversity gain with multiple transmit antennas are discussed. While these techniques improve the link reliability, multilayer transmission presented in Section 6.3 multiplies the data rate without increasing the signal bandwidth. Linear dispersion (LD) codes introduced in Section 6.4 represent a comprehensive description of space–time coding (STC) and multilayer transmission and allow optimal trade-offs between diversity and multiplexing gains. Finally, the high potential of multiple antenna systems is illustrated by looking at the channel capacity in Section 6.5.
6.1 Introduction
There exists a multitude of reasons for using multiple antenna systems. This section gives a brief overview of different strategies without claiming to be comprehensive. Principally, two different categories can be distinguished. The first objective is to improve the link reliability, that is, the ergodic error probability or the outage probability is reduced. This can be accomplished by enhancing the instantaneous signal-to-noise ratio (SNR) (beamforming)
or by decreasing the variations of the SNR (diversity). If multiple access or cochannel interference in cellular networks disturbs the transmission, interferers that are separable in space can be suppressed with multiple antennas, resulting in an improved signal-to-interference-plus-noise ratio (SINR). Section 6.2 is restricted to diversity techniques for reducing SNR variations. The second objective, discussed in Section 6.3, is to multiply the data rate by transmitting several data streams simultaneously over different antennas. This approach is denoted as space division multiple access (SDMA) and can certainly be combined with other multiple access schemes. Since bandwidth has become a very valuable and expensive resource, using the space for increasing data rates without expanding the bandwidth is very attractive. Moreover, we will see in Section 6.5 dealing with channel capacity aspects that the potential capacity gain of multiple antenna systems is much larger than the gain obtained by simply increasing the transmit power. A technique termed multi-stratum codes offers a combination of diversity and multiplexing gains (Böhnke et al. 2004a,b,c; Wachsmann 2001). The manner in which multiple antennas should be used depends on the properties of the channel, especially on the rank $r$ of $\mathbf{H}$ or its covariance matrix $\mathbf{H}\mathbf{H}^H$. As an example, we know from Section 1.5 that correlation among the subchannels reduces the diversity gain. In the case of a strong line-of-sight component (Rice fading), diversity is also not an appropriate means because fading is not a severe problem. If we can exploit other sources of diversity, for example, frequency diversity with the Rake receiver or time diversity due to coding over time-varying channels, we are probably already close to the additive white Gaussian noise (AWGN) performance and little can be gained by a further increase of the diversity degree. In each of these cases, multiple antennas should be used in a different way.
If we look for spatial multiplexing, we know from Section 2.3 that we need a channel whose rank is larger than one. Otherwise, we cannot reliably transmit parallel data streams. Hence, for highly correlated channels with a rank $r = 1$, beamforming that exploits only the strongest eigenmode of a channel would be an appropriate choice instead of multilayer transmission. Therefore, the manner in which multiple antennas are used has to be properly adapted to the general propagation conditions. In order to simplify notation, this chapter is restricted to frequency-nonselective channels. Hence, the impulse response $h_{\nu,\mu}[\ell,\kappa]$ between transmit antenna $\mu$ and receive antenna $\nu$ reduces to scalar coefficients $h_{\nu,\mu}[\ell]$ and the channel matrix in (1.33) of Section 1.2.4 becomes $\mathbf{H}[\ell] = \mathbf{H}[\ell,0]$. Figure 6.1 illustrates the resulting structure of the communication system. The received signal can be described by
$$ \mathbf{y}[\ell] = \mathbf{H}[\ell] \cdot \mathbf{x}[\ell] + \mathbf{n}[\ell]. \tag{6.1} $$
Figure 6.1 Structure of MIMO channel
Some extensions for frequency-selective environments are discussed in Al-Dhahir (2001), Al-Dhahir and Sayed (2000), Wübben (2006), Wübben and Kammeyer (2003), Wübben et al. (2003b).
6.2 Spatial Diversity Concepts
This section addresses the application of multiple antennas at the receiver and/or the transmitter for the purpose of increasing the diversity degree. As already mentioned, only frequency-nonselective channels are considered for notational as well as conceptual simplicity. Another reason is that spatial diversity concepts achieve the highest gains for channels that do not provide diversity in other dimensions such as frequency or time. Moreover, only Rayleigh fading channels without a line-of-sight component are considered. Since we know that correlations among the contributing channels reduce the diversity gain (see Section 1.5), we further assume that the channels are totally uncorrelated. For Rice and correlated fading channels, refer to the theoretical results presented in Section 1.5. Uncorrelated channels can be achieved by an appropriate antenna spacing depending on the spatial channel characteristics, for example, the angle spread. Assuming a uniform linear array with equidistantly arranged antennas and an isotropic scattering environment where signals impinge from all directions with the same probability, a small distance $d = \lambda/2$ between neighboring elements may be sufficient. The parameter $\lambda$ denotes the wavelength and is related to the carrier frequency $f_0$ by $\lambda = c/f_0$, where $c$ describes the speed of light. On the contrary, $d \gg \lambda/2$ must hold in scenarios with small angle spread and $d$ can take values up to $10\lambda$. This obviously requires a device large enough to host several antennas with appropriate distances. This section is divided into three parts: First, receive diversity is briefly explained. Next, orthogonal space–time block codes (STBCs) are addressed. Nonorthogonal block codes are not considered here and the interested reader is referred to Bossert et al. (2000, 2002), Gabidulin et al. (2000), Lusina et al. (2001, 2002, 2003).
In the last subsection, space–time trellis codes providing an additional coding gain are introduced. An overview of space–time coding can also be found in Liew and Hanzo (2002).
6.2.1 Receive Diversity
The simplest method to achieve spatial diversity is to use multiple antennas at the receiver. The structure of the system is depicted in Figure 6.2. It can be mathematically described with
$$ \mathbf{y}[\ell] = \mathbf{h}[\ell] \cdot x[\ell] + \mathbf{n}[\ell] \tag{6.2} $$
where $\mathbf{h}[\ell] = \bigl[h_1[\ell], \ldots, h_{N_R}[\ell]\bigr]^T$ comprises all contributing channel coefficients. Since there is no interference, a simple matched filter performing maximum ratio combining represents the optimum receiver and we obtain
$$ r[\ell] = \frac{\mathbf{h}[\ell]^H}{\|\mathbf{h}[\ell]\|^2} \cdot \mathbf{y}[\ell] = x[\ell] + \tilde{n}[\ell] \tag{6.3} $$
where $\tilde{n}[\ell] = \mathbf{h}[\ell]^H \cdot \mathbf{n}[\ell] / \|\mathbf{h}[\ell]\|^2$ denotes the noise at the matched filter output.
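The combining rule (6.3) can be illustrated with a few lines of Python. The following is only a minimal sketch: the helper names `mrc_combine` and `complex_gauss`, the BPSK symbol, the number of antennas, and the noise variance are assumptions chosen for illustration and are not taken from the text.

```python
import random

def mrc_combine(h, y):
    """Maximum ratio combining as in (6.3): r = h^H y / ||h||^2."""
    norm_sq = sum(abs(hv) ** 2 for hv in h)
    return sum(hv.conjugate() * yv for hv, yv in zip(h, y)) / norm_sq

def complex_gauss(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample with variance var."""
    s = (var / 2) ** 0.5
    return complex(rng.gauss(0, s), rng.gauss(0, s))

rng = random.Random(0)                                  # fixed seed (assumption)
n_r = 4                                                 # number of receive antennas (assumption)
x = 1 + 0j                                              # one BPSK symbol
h = [complex_gauss(rng) for _ in range(n_r)]            # uncorrelated Rayleigh coefficients
n = [complex_gauss(rng, 0.1) for _ in range(n_r)]       # noise samples
y = [hv * x + nv for hv, nv in zip(h, n)]               # received vector, cf. (6.2)
r = mrc_combine(h, y)                                   # r = x + filtered noise
```

In the noise-free case the combiner output equals the transmitted symbol exactly, which reflects the normalization by the squared norm of the channel vector in (6.3).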
Figure 6.2 Structure of receive diversity system
Comparing (6.3) with the theoretical result from Section 1.5, we recognize that the full diversity degree $D = N_R$ is achieved as long as the channel coefficients remain uncorrelated. The single-input multiple-output (SIMO) channel is transformed by matched filtering into an equivalent single-input single-output (SISO) channel with smaller variations of the SNR. The only difference between (6.3) and (1.104) concerns the total received power, which has not been normalized to $E_s/T_s$ (missing factor $\sqrt{D} = \sqrt{N_R}$). The reason for this is that the application of multiple receive antennas yields not only a diversity gain but also an array gain because the $N_R$-fold power is collected, leading to a gain of $10 \log_{10}(N_R)$ dB. This gain is independent of diversity considerations and is also available for totally correlated channels. Since Section 1.5 illustrates only the diversity effect, this array gain was suppressed by normalizing the received power. Hence, we always have to look carefully at the definition of the SNR when multiple antennas are applied. Figure 6.3 illustrates this difference by showing the results known from Section 1.5. In Figure 6.3a, the total transmitted energy per information bit is fixed at $E_b$. Hence, it is independent of $N_R$ and is equally distributed onto the diversity paths so that only the $N_R$th part of $E_b$ can be exploited at each receive antenna. In this scenario, the gain obtained solely by diversity can be observed. On the contrary, Figure 6.3b depicts the error rate versus the average $E_s/N_0$ at each receive antenna. Therefore, the total received power increases linearly with $N_R$ and the entire SNR after maximum ratio combining becomes $N_R$ times larger, indicating the additional array gain. Comparing the difference between adjacent curves in both plots, we recognize a difference of 3 dB that exactly represents the gain obtained by doubling the number of receive antennas.
Figure 6.3 Performance of receive diversity for BPSK and uncorrelated Rayleigh fading channels: a) symbol error probability $P_s$ versus $E_b/N_0$ in dB (SNR per bit), b) $P_s$ versus $E_s/N_0$ in dB (SNR per receive antenna); bold dashed line denotes AWGN channel; curves for $N_R = 1, 2, 4, 8, 16$
We can conclude that receive diversity is an efficient and simple possibility to increase the link reliability. However, its applicability becomes immediately limited if the size of the receiving terminal is very small. Cell phones for mobile radio communications have become smaller and smaller in recent years so that it is a difficult task to place several antennas on such small devices. Even if we succeed, it is questionable whether the spacing would be large enough to guarantee uncorrelated channels. Although different polarizations represent a further dimension to obtain diversity, the decoupling is generally imperfect, leading to cross talk. In this situation, the question arises whether diversity can also be exploited with multiple antennas at the transmitter.
6.2.2 Performance Analysis of Space–Time Codes
In this subsection, the general concept of space–time transmit diversity is addressed, that is, using multiple antennas at the transmitter. A straightforward implementation where a signal $x[\ell]$ is transmitted simultaneously over several antennas will not provide the desired diversity gain. Looking at the received signal
$$ y[\ell] = \frac{1}{\sqrt{N_T}} \cdot x[\ell] \cdot \sum_{\nu=1}^{N_T} h_\nu + n[\ell] \tag{6.4} $$
we see that an incoherent superposition is obtained, resulting in a new Rayleigh-distributed channel.¹ Hence, the equivalent SISO channel still has SNR variations as large as the original single-input single-output system and no diversity has been gained. To overcome this dilemma, appropriate coding is required at the transmitter. This coding is performed in the dimensions space and time, leading to the name space–time codes. First, this subsection discusses the potential of STCs and derives some guidelines concerning the code construction. In the next two subsections, specific codes, namely, orthogonal space–time block codes (oSTBCs) and space–time trellis codes (STTCs), are introduced. The general structure of the considered system is depicted in Figure 6.4. The data bits $d[i]$ are fed into the space–time encoder that outputs $L$ vectors $\mathbf{x}[k] = \bigl[x_1[k] \cdots x_{N_T}[k]\bigr]^T$ of length $N_T$. They are transmitted over a MIMO channel according to (6.1). The channel coefficients $h_{\mu,\nu}[k] = h_{\mu,\nu}$ are assumed to be constant during one encoded frame so that the received signal becomes
$$ \mathbf{y}[k] = \mathbf{H} \cdot \mathbf{x}[k] + \mathbf{n}[k]. \tag{6.5} $$
¹ Note that the total transmit power has been normalized according to the agreement on page 289 so that it is independent of the number of antennas.
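The statement that sending the same signal over all antennas yields no diversity can be made plausible with a small Monte Carlo experiment. The sketch below rests on assumptions that are not part of the text: i.i.d. complex Gaussian coefficients with unit variance, a hypothetical helper `deep_fade_prob`, and an arbitrarily chosen fade threshold of 0.1. It estimates how often the power gain of the effective channel in (6.4) falls below the threshold and, for contrast, how often this happens if the powers of all paths could be combined ideally.

```python
import math
import random

def deep_fade_prob(n_t, mode="sum", threshold=0.1, trials=20000, seed=1):
    """Estimate the probability that the effective power gain is below threshold."""
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)                     # standard deviation per real dimension
    fades = 0
    for _ in range(trials):
        h = [complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for _ in range(n_t)]
        if mode == "sum":
            # same signal on all antennas: |sum(h) / sqrt(N_T)|^2, cf. (6.4)
            gain = abs(sum(h)) ** 2 / n_t
        else:
            # idealized reference: path powers add up (normalized to unit mean)
            gain = sum(abs(v) ** 2 for v in h) / n_t
        fades += gain < threshold
    return fades / trials

p_single = deep_fade_prob(1)                   # one antenna, Rayleigh fading
p_naive = deep_fade_prob(4)                    # four antennas, identical signals
p_combined = deep_fade_prob(4, mode="combine") # four paths with ideal combining
```

Under these assumptions, `p_single` and `p_naive` both come out close to $1 - e^{-0.1}$, the value expected for a single Rayleigh channel, whereas `p_combined` is much smaller.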
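Figure 6.4 Structure of transmit diversity system with $N_R$ receive antennas
Combining all $L$ vectors $\mathbf{x}[k]$, $\mathbf{y}[k]$, and $\mathbf{n}[k]$ within one coded frame as column vectors into the matrices $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{N}$, respectively, results in
$$ \mathbf{Y} = \mathbf{H} \cdot \mathbf{X} + \mathbf{N}, \tag{6.6} $$
where the $N_T \times L$ matrix
$$ \mathbf{X} = \begin{bmatrix} \mathbf{x}[0] & \mathbf{x}[1] & \cdots & \mathbf{x}[L-1] \end{bmatrix} = \begin{bmatrix} x_1[0] & x_1[1] & \cdots & x_1[L-1] \\ x_2[0] & x_2[1] & \cdots & x_2[L-1] \\ \vdots & \vdots & \ddots & \vdots \\ x_{N_T}[0] & x_{N_T}[1] & \cdots & x_{N_T}[L-1] \end{bmatrix} \tag{6.7} $$
denotes the entire data frame encoded in space and time. The code comprising all possible code matrices is termed $\mathcal{X}$. The matrices $\mathbf{N}$ and $\mathbf{Y}$ have the dimensions $N_R \times L$. Next, we derive some general results concerning the achievable diversity and coding gains that can be used for the code design. An optimum maximum likelihood decision and a perfectly known channel matrix $\mathbf{H}$ are assumed at the receiver. We start with the pairwise error probability between two competing codewords $\mathbf{X}$ and $\tilde{\mathbf{X}}$ already known from Section 1.3. Contrary to Section 1.3, we now receive a mixture of all transmit signals at each receive antenna. Therefore, we have to look at the squared Frobenius norm (see Appendix C on page 336) $\|\mathbf{H}\mathbf{X} - \mathbf{H}\tilde{\mathbf{X}}\|_F^2$ of the noiseless received signals of both codewords instead of $\|\mathbf{X} - \tilde{\mathbf{X}}\|_F^2$. The conditional pairwise error probability of (1.49) then becomes
$$ \Pr\bigl\{\mathbf{X} \to \tilde{\mathbf{X}} \mid \mathbf{H}\bigr\} = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\frac{\|\mathbf{H}\mathbf{X} - \mathbf{H}\tilde{\mathbf{X}}\|_F^2}{4\sigma_N^2}} \right). \tag{6.8} $$
We now normalize the space–time codewords to $\mathbf{B} = \mathbf{X}/\sqrt{E_s/T_s}$ and $\tilde{\mathbf{B}} = \tilde{\mathbf{X}}/\sqrt{E_s/T_s}$ in the same way as was done in Section 1.3. This changes the squared Euclidean distance to
$$ \bigl\|\mathbf{H} \cdot (\mathbf{X} - \tilde{\mathbf{X}})\bigr\|_F^2 = \bigl\|\mathbf{H} \cdot (\mathbf{B} - \tilde{\mathbf{B}})\bigr\|_F^2 \cdot \frac{E_s}{T_s} \tag{6.9} $$
and (6.8) becomes with $\sigma_N^2 = N_0/T_s$ for complex-valued signals
$$ \Pr\bigl\{\mathbf{X} \to \tilde{\mathbf{X}} \mid \mathbf{H}\bigr\} = \frac{1}{2} \cdot \operatorname{erfc}\left( \sqrt{\bigl\|\mathbf{H}(\mathbf{B} - \tilde{\mathbf{B}})\bigr\|_F^2 \cdot \frac{E_s}{4N_0}} \right). \tag{6.10} $$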
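The complementary error function can be upper bounded by $\operatorname{erfc}(\sqrt{x}) < e^{-x}$. Denoting the $\mu$th row of $\mathbf{H}$ with $\mathbf{h}_\mu$ leads to an upper bound
$$ \begin{aligned} \Pr\bigl\{\mathbf{B} \to \tilde{\mathbf{B}} \mid \mathbf{H}\bigr\} &\le \frac{1}{2} \cdot \exp\left( -\bigl\|\mathbf{H}(\mathbf{B} - \tilde{\mathbf{B}})\bigr\|_F^2 \cdot \frac{E_s}{4N_0} \right) \\ &\le \frac{1}{2} \cdot \exp\left( -\sum_{\mu=1}^{N_R} \bigl\|\mathbf{h}_\mu(\mathbf{B} - \tilde{\mathbf{B}})\bigr\|^2 \cdot \frac{E_s}{4N_0} \right) \\ &\le \frac{1}{2} \cdot \prod_{\mu=1}^{N_R} \exp\left( -\mathbf{h}_\mu (\mathbf{B} - \tilde{\mathbf{B}})(\mathbf{B} - \tilde{\mathbf{B}})^H \mathbf{h}_\mu^H \cdot \frac{E_s}{4N_0} \right). \end{aligned} \tag{6.11} $$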
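Obviously, the matrix $\mathbf{A} = (\mathbf{B} - \tilde{\mathbf{B}})(\mathbf{B} - \tilde{\mathbf{B}})^H$ is Hermitian and its rank $r$ equals that of $\mathbf{B} - \tilde{\mathbf{B}}$. Moreover, it is positive semidefinite and its $r$ nonzero eigenvalues $\lambda_\nu$ obtained by an eigenvalue decomposition $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$ are real and positive. The pairwise error probability can now be expressed as
$$ \begin{aligned} \Pr\bigl\{\mathbf{B} \to \tilde{\mathbf{B}} \mid \mathbf{H}\bigr\} &\le \frac{1}{2} \cdot \prod_{\mu=1}^{N_R} \exp\left( -\mathbf{h}_\mu \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H \mathbf{h}_\mu^H \cdot \frac{E_s}{4N_0} \right) \\ &\le \frac{1}{2} \cdot \prod_{\mu=1}^{N_R} \exp\left( -\boldsymbol{\beta}_\mu \boldsymbol{\Lambda} \boldsymbol{\beta}_\mu^H \cdot \frac{E_s}{4N_0} \right). \end{aligned} \tag{6.12} $$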
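The new row vectors $\boldsymbol{\beta}_\mu = \mathbf{h}_\mu \mathbf{U} = [\beta_{\mu,1} \cdots \beta_{\mu,N_T}]$ still consist of complex rotationally invariant Gaussian distributed random variables $\beta_{\mu,\nu}$ because $\mathbf{U}$ is unitary (Naguib et al. 1997). Hence, the squared magnitudes of their elements are chi-squared distributed with two degrees of freedom. In order to obtain a pairwise error probability that is independent of the instantaneous channel matrix $\mathbf{H}$, we have to calculate the expectation of (6.12) with respect to $\mathbf{H}$. This results in
$$ \begin{aligned} \Pr\bigl\{\mathbf{B} \to \tilde{\mathbf{B}}\bigr\} = \mathrm{E}_{\mathbf{H}}\Bigl\{ \Pr\bigl\{\mathbf{B} \to \tilde{\mathbf{B}} \mid \mathbf{H}\bigr\} \Bigr\} &\le \frac{1}{2} \cdot \prod_{\mu=1}^{N_R} \prod_{\nu=1}^{r} \mathrm{E}_{\beta}\left\{ \exp\left( -\lambda_\nu \cdot |\beta_{\mu,\nu}|^2 \cdot \frac{E_s}{4N_0} \right) \right\} \\ &\le \frac{1}{2} \cdot \prod_{\mu=1}^{N_R} \prod_{\nu=1}^{r} \int_0^\infty e^{-\xi} \cdot \exp\left( -\xi \cdot \lambda_\nu \frac{E_s}{4N_0} \right) d\xi \\ &\le \frac{1}{2} \cdot \left[ \prod_{\nu=1}^{r} \frac{1}{1 + \lambda_\nu \cdot \frac{E_s}{4N_0}} \right]^{N_R} \end{aligned} \tag{6.13} $$
where $r$ denotes the rank of $\mathbf{A}$, that is, the number of nonzero eigenvalues. A further upper bound that is tight for large SNRs is obtained by dropping the $+1$ in the denominator.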
Rewriting (6.13) finally leads to the expression
$$ \Pr\bigl\{\mathbf{B} \to \tilde{\mathbf{B}}\bigr\} < \frac{1}{2} \cdot \left[ \left( \prod_{\nu=1}^{r} \lambda_\nu \right)^{1/r} \cdot \frac{E_s}{4N_0} \right]^{-rN_R}. \tag{6.14} $$
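From (6.14), the following conclusions can be drawn. Owing to the similarity with (1.112), where the reciprocal of the SNR is taken to the power of $D$, the exponent $rN_R$ is called the diversity gain. Hence, in order to achieve the maximum possible diversity degree, the minimum rank $r$ among all pairwise differences $\mathbf{B} - \tilde{\mathbf{B}}$ should be maximized, leading to the diversity gain
$$ g_d = N_R \cdot \min_{(\mathbf{B},\tilde{\mathbf{B}})} \operatorname{rank}\bigl\{\mathbf{B} - \tilde{\mathbf{B}}\bigr\}. \tag{6.15} $$
On the other hand, the coding gain leading to a horizontal shift of the error rate curves can be described by
$$ g_c = \min_{(\mathbf{B},\tilde{\mathbf{B}})} \left( \prod_{\nu=1}^{r} \lambda_\nu \right)^{1/r}. \tag{6.16} $$
If the code design ensures full-rank differences with $r = \operatorname{rank}\{\mathbf{A}\} = N_T$, the product of the eigenvalues equals the determinant $\det(\mathbf{A})$ with $\mathbf{A} = (\mathbf{B} - \tilde{\mathbf{B}})(\mathbf{B} - \tilde{\mathbf{B}})^H$ and we obtain
$$ g_c = \min_{(\mathbf{B},\tilde{\mathbf{B}})} \left( \prod_{\nu=1}^{N_T} \lambda_\nu \right)^{1/N_T} = \min_{(\mathbf{B},\tilde{\mathbf{B}})} \Bigl[ \det\bigl( (\mathbf{B} - \tilde{\mathbf{B}})(\mathbf{B} - \tilde{\mathbf{B}})^H \bigr) \Bigr]^{1/N_T}. \tag{6.17} $$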
We obtain the code design criteria according to Tarokh et al. (1998):
• rank criterion: In order to obtain the maximum diversity gain, the first design goal is to maximize the minimum rank $r$ of all matrices $\mathbf{X} - \tilde{\mathbf{X}}$. The diversity degree equals $rN_R$; its maximum is $N_T N_R$.
• determinant criterion: For a diversity gain of $rN_R$, the coding gain is maximized if the minimum of $\bigl(\prod_{\nu=1}^{r} \lambda_\nu\bigr)^{1/r}$ is maximized over all codeword pairs.
A code optimization according to these criteria cannot be performed analytically but has to be carried out as a computer-based code search. The next two subsections introduce examples for space–time coding schemes. First, orthogonal STBCs are presented. Since their codewords are obtained by orthogonal matrix design, the determinant is constant and no coding gain is obtained. However, full diversity gains are achievable and the receiver structures are very simple. Second, space–time trellis codes are briefly described, providing additional coding gains at the expense of much higher decoding complexity.
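The two criteria can be evaluated by brute force for a small code. The sketch below does this for the $2 \times 2$ codewords of Alamouti's scheme, which is introduced in Section 6.2.3, combined with a unit-energy QPSK alphabet. The choice of alphabet, the helper names `alamouti`, `rank_and_det` and `design_metrics`, and the restriction to $2 \times 2$ matrices (so that rank and determinant can be computed by hand) are assumptions made for this illustration only.

```python
import itertools
import math

# unit-energy QPSK alphabet (assumption for this example)
QPSK = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]

def alamouti(a1, a2):
    """Normalized 2x2 codeword B built from two symbols (rows = antennas)."""
    s = 1 / math.sqrt(2)
    return [[s * a1, -s * a2.conjugate()],
            [s * a2,  s * a1.conjugate()]]

def rank_and_det(b, b_tilde, tol=1e-12):
    """Rank and determinant of A = (B - B~)(B - B~)^H for 2x2 codewords."""
    d = [[b[i][k] - b_tilde[i][k] for k in range(2)] for i in range(2)]
    a = [[sum(d[i][k] * d[j][k].conjugate() for k in range(2)) for j in range(2)]
         for i in range(2)]
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    trace = (a[0][0] + a[1][1]).real
    rank = 2 if det > tol else (1 if trace > tol else 0)
    return rank, det

def design_metrics(n_r=1):
    """Diversity gain (6.15) and coding gain (6.17) by exhaustive pair search."""
    codewords = [alamouti(a1, a2) for a1, a2 in itertools.product(QPSK, repeat=2)]
    min_rank, min_gc = 2, float("inf")
    for b, b_tilde in itertools.permutations(codewords, 2):
        rank, det = rank_and_det(b, b_tilde)
        min_rank = min(min_rank, rank)
        if rank == 2:
            min_gc = min(min_gc, det ** (1 / 2))   # det(A)^(1/N_T) with N_T = 2
    return n_r * min_rank, min_gc

g_d, g_c = design_metrics(n_r=1)
```

For larger codes or more antennas an exhaustive search of this kind quickly becomes expensive, which is consistent with the remark that the optimization has to be carried out as a computer-based code search.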
6.2.3 Orthogonal Space–Time Block Codes
Figure 6.5 shows the principal structure of a space–time block coding system for $N_R = 1$ receive antenna. The subsequent derivation covers the more general case of an arbitrary number of receive antennas. In contrast to the general concept of space–time coding depicted in Figure 6.4, the signal mapper and space–time encoder are separated. First, the data bits are mapped onto symbols $a[\ell]$ that are elements of a finite signal constellation according to the linear modulation schemes presented in Section 1.4. Next, the space–time block encoder collects a block of $K$ successive symbols $a[\ell]$ and maps them onto a sequence of $L$ consecutive vectors $\mathbf{x}[k] = \bigl[x_1[k] \cdots x_{N_T}[k]\bigr]^T$, $0 \le k < L$. Hence, the generated symbols $a[\ell]$ are encoded in two dimensions, namely, in space and time, explaining the name space–time coding. The code rate amounts to
$$ R_c = \frac{K}{L}. \tag{6.18} $$
Figure 6.5 System structure for space–time block codes with $N_R = 1$ receive antenna
The system can certainly be improved by an outer forward error correction (FEC) coding scheme. In the following part, we make the widely used assumption that the channel remains constant during one coding block. Therefore, we can drop the time indices of the channel coefficients ($h_\mu[k] \to h_\mu$) in subsequent derivations.
Alamouti's Scheme
In order to illustrate how oSTBCs work, a simple example introduced by Alamouti (1998) is used. Originally, it employs $N_T = 2$ transmit antennas and $N_R = 1$ receive antenna. However, it can be easily extended to more receive antennas. To be precise, we have to consider blocks of $K = 2$ consecutive symbols, say $a_1 = a[2\ell]$ and $a_2 = a[2\ell + 1]$. These two symbols are now encoded in the following way. At time instant $2k = 2\ell$, symbol $x_1[2k] = a_1/\sqrt{2}$ is transmitted at the first antenna and $x_2[2k] = a_2/\sqrt{2}$ at the second antenna. At the next time instant $2k + 1$, the symbols are flipped and $x_1[2k + 1] = -a_2^*/\sqrt{2}$ as well as $x_2[2k + 1] = a_1^*/\sqrt{2}$ hold. The whole codeword arranged in space and time can be described using vector notations
$$ \mathbf{X}_2 = \begin{bmatrix} \mathbf{x}[2k] & \mathbf{x}[2k+1] \end{bmatrix} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} a_1 & -a_2^* \\ a_2 & a_1^* \end{bmatrix} \tag{6.19} $$
where the factor $1/\sqrt{2}$ ensures that the total average transmit power per symbol equals $E_s/T_s$. The entire set of codewords is denoted by $\mathcal{X}_2$. The columns comprise the symbols transmitted at a certain time instant, while the rows represent the symbols transmitted over a certain antenna. Since $K = 2$ symbols $a_1$ and $a_2$ are transmitted during $L = 2$ time instants, the rate of this code is $R_c = K/L = 1$. It is important to mention that the columns in $\mathbf{X}_2$ are orthogonal and so Alamouti's scheme does not provide a coding gain.
A different implementation was chosen in the UMTS standard (3GPP 1999) without changing the achievable diversity gain. Here, the code matrix has the form
$$ \mathbf{X}_2 = \begin{bmatrix} \mathbf{x}[2k] & \mathbf{x}[2k+1] \end{bmatrix} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} a_1 & a_2 \\ -a_2^* & a_1^* \end{bmatrix}. \tag{6.20} $$
The advantage of this implementation is that the original symbols $a_1$ and $a_2$ are transmitted over the same antenna. Therefore, the first antenna is used in the same way as without space–time coding. Switching from $N_T = 1$ to $N_T = 2$ just requires the activation of the second antenna without influencing the data stream $x_1[\ell]$. Nevertheless, we will restrict our analysis to the first notation of (6.19). The corresponding two received symbols can be expressed by
$$ y[2k] = \frac{1}{\sqrt{2}} \cdot (h_1 a_1 + h_2 a_2) + n[2k] \tag{6.21a} $$
$$ y[2k+1] = \frac{1}{\sqrt{2}} \cdot \bigl(h_1 (-a_2^*) + h_2 a_1^*\bigr) + n[2k+1]. \tag{6.21b} $$
Using vector notations, we can combine the two received symbols and the two noise samples into vectors $\mathbf{y} = \bigl[y[2k] \;\; y[2k+1]\bigr]^T$ and $\mathbf{n} = \bigl[n[2k] \;\; n[2k+1]\bigr]^T$, respectively. With $\mathbf{h} = [h_1 \;\; h_2]^T$, this yields the compact description
$$ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} a_1 & a_2 \\ -a_2^* & a_1^* \end{bmatrix} \cdot \begin{bmatrix} h_1 \\ h_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \end{bmatrix} = \mathbf{X}_2^T \cdot \mathbf{h} + \mathbf{n}. \tag{6.22} $$
Rewriting (6.22) by taking the complex conjugate of the second line, we obtain
$$ \tilde{\mathbf{y}} = \begin{bmatrix} y_1 \\ y_2^* \end{bmatrix} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{bmatrix} \cdot \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2^* \end{bmatrix} = \frac{1}{\sqrt{2}} \cdot \mathbf{H}[\mathbf{X}_2] \cdot \mathbf{a} + \tilde{\mathbf{n}}. \tag{6.23} $$
With this slight modification, we have transformed the multiple-input single-output (MISO) channel $\mathbf{h}$ into an equivalent MIMO channel $\mathbf{H}[\mathbf{X}_2]$. The matrix describing this equivalent channel has orthogonal columns. In this case, we already know from Chapter 4 that the matched filter represents the optimum detector according to the maximum likelihood principle. The matched filter output becomes
$$ \tilde{\mathbf{r}} = \mathbf{H}^H[\mathbf{X}_2] \cdot \tilde{\mathbf{y}} = \frac{1}{\sqrt{2}} \cdot \begin{bmatrix} |h_1|^2 + |h_2|^2 & 0 \\ 0 & |h_1|^2 + |h_2|^2 \end{bmatrix} \cdot \mathbf{a} + \mathbf{H}^H[\mathbf{X}_2] \cdot \tilde{\mathbf{n}}. \tag{6.24} $$
Looking at the diagonal elements that equal the squared norm of the contributing channel coefficients, we observe that the Alamouti scheme provides the full diversity degree $D = N_T = 2$ that can be achieved with two transmit antennas. Moreover, no interference between $a_1$ and $a_2$ disturbs the transmission because $\mathbf{H}^H[\mathbf{X}_2]\mathbf{H}[\mathbf{X}_2]$ is a diagonal matrix. Owing to this reason and the fact that the noise remains white when multiplied by a matrix consisting of orthogonal columns, the ML decision with respect to the vector $\mathbf{a}$ can be split into element-wise decisions
$$ \hat{a}_\mu = \operatorname*{argmin}_{\tilde{a}} \left| \tilde{r}_\mu - \frac{1}{\sqrt{2}} \bigl(|h_1|^2 + |h_2|^2\bigr) \tilde{a} \right|^2. \tag{6.25} $$
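The chain from (6.21) to (6.25) can be traced with a short Python sketch. It is meant as an illustration only: the function names `alamouti_channel` and `alamouti_detect`, the unit-energy QPSK alphabet, and the example channel coefficients are assumptions that do not appear in the text.

```python
import math

# unit-energy QPSK alphabet (assumption for this example)
QPSK = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]

def alamouti_channel(h1, h2, a1, a2, n1=0j, n2=0j):
    """Received samples y[2k] and y[2k+1] according to (6.21a) and (6.21b)."""
    s = 1 / math.sqrt(2)
    y1 = s * (h1 * a1 + h2 * a2) + n1
    y2 = s * (-h1 * a2.conjugate() + h2 * a1.conjugate()) + n2
    return y1, y2

def alamouti_detect(h1, h2, y1, y2, alphabet=QPSK):
    """Matched filter as in (6.24) followed by element-wise decisions."""
    y2c = y2.conjugate()                       # conjugate second sample, cf. (6.23)
    r1 = h1.conjugate() * y1 + h2 * y2c        # first row of H^H[X_2] times y~
    r2 = h2.conjugate() * y1 - h1 * y2c        # second row of H^H[X_2] times y~
    gain = (abs(h1) ** 2 + abs(h2) ** 2) / math.sqrt(2)

    def decide(r):
        # nearest scaled constellation point
        return min(alphabet, key=lambda a: abs(r - gain * a) ** 2)

    return decide(r1), decide(r2)

h1, h2 = 0.7 - 0.2j, -0.4 + 1.1j               # arbitrary example coefficients
y1, y2 = alamouti_channel(h1, h2, QPSK[0], QPSK[3])
a1_hat, a2_hat = alamouti_detect(h1, h2, y1, y2)
```

Because the equivalent channel matrix has orthogonal columns, the two decisions do not interact, so the detector needs only one pass over the alphabet per symbol instead of a joint search over all symbol pairs.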
Although (6.24) looks similar to the result of simple receive diversity, there exists a major difference. Indeed, the diversity gain is exactly the same for receive and transmit diversity concepts. However, the factor $1/\sqrt{2}$ in (6.24) leads to an SNR loss of 3 dB. The reason is that the receiver was assumed to have perfect channel knowledge so that beamforming with an antenna gain of $10 \log_{10}(N_R) \approx 3$ dB is possible for $N_R = 2$. On the contrary, we have no channel knowledge at the transmitter so that space–time transmit diversity techniques do not achieve any antenna gain. Like all space–time coding schemes, the Alamouti scheme can be easily combined with multiple receive antennas. According to (6.23), we obtain a vector
$$ \tilde{\mathbf{y}}_\mu = \frac{1}{\sqrt{2}} \cdot \mathbf{H}_\mu[\mathbf{X}_2]\,\mathbf{a} + \tilde{\mathbf{n}}_\mu \tag{6.26} $$
containing two successive symbols at each receive antenna $1 \le \mu \le N_R$. They are now included in the vector $\tilde{\mathbf{y}} = \bigl[\tilde{\mathbf{y}}_1^T \cdots \tilde{\mathbf{y}}_{N_R}^T\bigr]^T$. Consequently, the equivalent channel matrix $\mathbf{H}[\mathbf{X}_2]$ also has to be extended. Following the notation in (6.23) it becomes
$$ \mathbf{H}[\mathbf{X}_2] = \begin{bmatrix} \mathbf{H}_1[\mathbf{X}_2] \\ \vdots \\ \mathbf{H}_{N_R}[\mathbf{X}_2] \end{bmatrix} = \begin{bmatrix} h_{1,1} & h_{1,2} \\ h_{1,2}^* & -h_{1,1}^* \\ \vdots & \vdots \\ h_{N_R,1} & h_{N_R,2} \\ h_{N_R,2}^* & -h_{N_R,1}^* \end{bmatrix}. \tag{6.27} $$
The receiver now consists of a bank of matched filters, one for each receive antenna. Their outputs are simply summed, yielding
$$ \tilde{\mathbf{r}} = \mathbf{H}^H[\mathbf{X}_2] \cdot \tilde{\mathbf{y}} = \frac{1}{\sqrt{2}} \sum_{\mu=1}^{N_R} \bigl( |h_{\mu,1}|^2 + |h_{\mu,2}|^2 \bigr) \cdot \mathbf{a} + \mathbf{H}^H[\mathbf{X}_2] \cdot \tilde{\mathbf{n}}. \tag{6.28} $$
As long as all channels remain uncorrelated, a maximum diversity degree of $D = 2N_R$ can be achieved.
Extension to More than Two Transmit Antennas
Using some basic results from matrix theory, one can show that Alamouti's scheme is the only orthogonal space–time code with rate 1. For more than two transmit antennas, several orthogonal codes have been found with lower rates, so that spectral efficiency is lost. The code matrix $\mathbf{X}_{N_T}$ generally consists of $N_T$ rows and $L$ columns and contains the symbols $a_1, \ldots, a_K$ as well as their complex conjugate counterparts $a_1^*, \ldots, a_K^*$. The construction of $\mathbf{X}_{N_T}$ has to be performed such that $\mathbf{X}_{N_T}$ has orthogonal rows, that is,
$$ \mathbf{X}_{N_T} \mathbf{X}_{N_T}^H = P \cdot \mathbf{I}_{N_T} \tag{6.29} $$
holds, where $P$ is a constant depending on the symbol powers that will be discussed on page 289. In the following part, all codeword matrices are presented without normalization.
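In Tarokh et al. (1999a), it is shown that there exist half-rate codes for an arbitrary number of transmit antennas. The code matrices for $N_T = 3$ and $N_T = 4$ are presented as examples. For $N_T = 3$, we obtain
$$ \mathbf{X}_3 = \begin{bmatrix} a_1 & -a_2 & -a_3 & -a_4 & a_1^* & -a_2^* & -a_3^* & -a_4^* \\ a_2 & a_1 & a_4 & -a_3 & a_2^* & a_1^* & a_4^* & -a_3^* \\ a_3 & -a_4 & a_1 & a_2 & a_3^* & -a_4^* & a_1^* & a_2^* \end{bmatrix} \tag{6.30} $$
providing a diversity degree of $D = N_T = 3$. Obviously, $\mathbf{X}_3$ consists of $L = 8$ columns and $K = 4$ different symbols $a_1, \ldots, a_4$ are encoded, leading to the rate $R_c = K/L = 1/2$. Each symbol $a_\mu$ occurs six times with full energy in $\mathbf{X}_3$. From (6.30), we can write the received vector as
$$ \mathbf{y} = \begin{bmatrix} h_1 & h_2 & h_3 & 0 & 0 & 0 & 0 & 0 \\ h_2 & -h_1 & 0 & -h_3 & 0 & 0 & 0 & 0 \\ h_3 & 0 & -h_1 & h_2 & 0 & 0 & 0 & 0 \\ 0 & h_3 & -h_2 & -h_1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & h_1 & h_2 & h_3 & 0 \\ 0 & 0 & 0 & 0 & h_2 & -h_1 & 0 & -h_3 \\ 0 & 0 & 0 & 0 & h_3 & 0 & -h_1 & h_2 \\ 0 & 0 & 0 & 0 & 0 & h_3 & -h_2 & -h_1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_1^* \\ a_2^* \\ a_3^* \\ a_4^* \end{bmatrix} + \mathbf{n}. \tag{6.31} $$
We observe in (6.31) that the last four symbols in $\mathbf{y}$ only depend on the complex conjugate transmit symbols. Hence, conjugating the last four rows similar to the procedure for Alamouti's scheme in (6.23) results in
$$ \tilde{\mathbf{y}} = \mathbf{H}[\mathbf{X}_3]\,\mathbf{a} + \tilde{\mathbf{n}} \quad\Rightarrow\quad \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5^* \\ y_6^* \\ y_7^* \\ y_8^* \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 & 0 \\ h_2 & -h_1 & 0 & -h_3 \\ h_3 & 0 & -h_1 & h_2 \\ 0 & h_3 & -h_2 & -h_1 \\ h_1^* & h_2^* & h_3^* & 0 \\ h_2^* & -h_1^* & 0 & -h_3^* \\ h_3^* & 0 & -h_1^* & h_2^* \\ 0 & h_3^* & -h_2^* & -h_1^* \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \\ n_5^* \\ n_6^* \\ n_7^* \\ n_8^* \end{bmatrix}. \tag{6.32} $$
Obviously, (6.32) uses only the original symbols $\mathbf{a} = [a_1 \cdots a_4]^T$ and not their complex conjugate versions. Moreover, the columns in $\mathbf{H}[\mathbf{X}_3]$ are orthogonal so that
$$ \mathbf{H}^H[\mathbf{X}_3] \cdot \mathbf{H}[\mathbf{X}_3] = 2 \cdot \sum_{\mu=1}^{N_T} |h_\mu|^2 \cdot \mathbf{I}_4 = 2 \cdot \bigl( |h_1|^2 + |h_2|^2 + |h_3|^2 \bigr) \cdot \mathbf{I}_4 \tag{6.33} $$
holds. Therefore, the optimum receiver is again a matched filter that multiplies the modified received vector $\tilde{\mathbf{y}}$ with $\mathbf{H}^H[\mathbf{X}_3]$. In the case of multiamplitude modulation, an appropriate scaling prior to the hard decision is necessary. For $N_T = 4$, a diversity gain of $D = N_T = 4$ is achieved with the code matrix
$$ \mathbf{X}_4 = \begin{bmatrix} a_1 & -a_2 & -a_3 & -a_4 & a_1^* & -a_2^* & -a_3^* & -a_4^* \\ a_2 & a_1 & a_4 & -a_3 & a_2^* & a_1^* & a_4^* & -a_3^* \\ a_3 & -a_4 & a_1 & a_2 & a_3^* & -a_4^* & a_1^* & a_2^* \\ a_4 & a_3 & -a_2 & a_1 & a_4^* & a_3^* & -a_2^* & a_1^* \end{bmatrix}. \tag{6.34} $$
Equivalent to the case of $N_T = 3$, we obtain a received vector $\mathbf{y}$ according to
$$ \mathbf{y} = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 & 0 & 0 & 0 & 0 \\ h_2 & -h_1 & h_4 & -h_3 & 0 & 0 & 0 & 0 \\ h_3 & -h_4 & -h_1 & h_2 & 0 & 0 & 0 & 0 \\ h_4 & h_3 & -h_2 & -h_1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & h_1 & h_2 & h_3 & h_4 \\ 0 & 0 & 0 & 0 & h_2 & -h_1 & h_4 & -h_3 \\ 0 & 0 & 0 & 0 & h_3 & -h_4 & -h_1 & h_2 \\ 0 & 0 & 0 & 0 & h_4 & h_3 & -h_2 & -h_1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_1^* \\ a_2^* \\ a_3^* \\ a_4^* \end{bmatrix} + \mathbf{n}. \tag{6.35} $$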
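The orthogonality relation (6.33) for the half-rate code with $N_T = 3$ can be checked numerically. The following sketch builds the equivalent channel matrix of (6.32) for arbitrarily chosen coefficients and evaluates its Gram matrix. The function names `equivalent_channel_x3` and `gram` as well as the example coefficients are assumptions made for illustration.

```python
def equivalent_channel_x3(h1, h2, h3):
    """Equivalent 8x4 channel matrix H[X_3] as in (6.32)."""
    g = [[h1,  h2,  h3,  0j],
         [h2, -h1,  0j, -h3],
         [h3,  0j, -h1,  h2],
         [0j,  h3, -h2, -h1]]
    # the lower four rows are the complex conjugates of the upper four
    g_conj = [[v.conjugate() for v in row] for row in g]
    return g + g_conj

def gram(h):
    """Return H^H H as a nested list."""
    cols = len(h[0])
    return [[sum(row[i].conjugate() * row[j] for row in h) for j in range(cols)]
            for i in range(cols)]

h1, h2, h3 = 0.3 + 0.8j, -1.1 + 0.2j, 0.5 - 0.4j      # arbitrary example values
g_matrix = gram(equivalent_channel_x3(h1, h2, h3))
expected = 2 * (abs(h1) ** 2 + abs(h2) ** 2 + abs(h3) ** 2)
```

For complex coefficients the upper $4 \times 4$ block alone does not have orthogonal columns. The off-diagonal terms only cancel once the conjugated lower block is included, which is why the conjugation step leading to (6.32) matters.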
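Complex conjugation of the last four elements in $\mathbf{y}$ leads to $\tilde{\mathbf{y}} = \mathbf{H}[\mathbf{X}_4] \cdot \mathbf{a} + \tilde{\mathbf{n}}$ with
$$ \mathbf{H}[\mathbf{X}_4] = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \\ h_2 & -h_1 & h_4 & -h_3 \\ h_3 & -h_4 & -h_1 & h_2 \\ h_4 & h_3 & -h_2 & -h_1 \\ h_1^* & h_2^* & h_3^* & h_4^* \\ h_2^* & -h_1^* & h_4^* & -h_3^* \\ h_3^* & -h_4^* & -h_1^* & h_2^* \\ h_4^* & h_3^* & -h_2^* & -h_1^* \end{bmatrix}. \tag{6.36} $$
Again, the columns of $\mathbf{H}[\mathbf{X}_4]$ are mutually orthogonal and estimates $\hat{\mathbf{a}}$ are obtained by multiplying $\tilde{\mathbf{y}}$ with $\mathbf{H}^H[\mathbf{X}_4]$ and appropriate scaling. Looking at higher spectral efficiencies, only two codes with $N_T = 3$ and $N_T = 4$ have been found for $R_c > 1/2$ (Tarokh et al. 1999a,b). In order to distinguish them from the codes presented so far, we use the notations $\mathbf{T}_3$ and $\mathbf{T}_4$. For $N_T = 3$, the orthogonal space–time codeword is
$$ \mathbf{T}_3 = \begin{bmatrix} 2a_1 & -2a_2^* & \sqrt{2}a_3^* & \sqrt{2}a_3^* \\ 2a_2 & 2a_1^* & \sqrt{2}a_3^* & -\sqrt{2}a_3^* \\ \sqrt{2}a_3 & \sqrt{2}a_3 & -a_1 - a_1^* + a_2 - a_2^* & a_1 - a_1^* + a_2 + a_2^* \end{bmatrix}. \tag{6.37} $$
Since it comprises four time instants for transmitting three symbols, the code rate amounts to $R_c = 3/4$. Using (6.37), the received vector can be written as
$$ \mathbf{y} = 2 \begin{bmatrix} h_1 & h_2 & \frac{h_3}{\sqrt{2}} & 0 & 0 & 0 \\ 0 & 0 & \frac{h_3}{\sqrt{2}} & h_2 & -h_1 & 0 \\ -\frac{h_3}{2} & \frac{h_3}{2} & 0 & -\frac{h_3}{2} & -\frac{h_3}{2} & \frac{h_1 + h_2}{\sqrt{2}} \\ \frac{h_3}{2} & \frac{h_3}{2} & 0 & -\frac{h_3}{2} & \frac{h_3}{2} & \frac{h_1 - h_2}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_1^* \\ a_2^* \\ a_3^* \end{bmatrix} + \mathbf{n}. \tag{6.38} $$
Unfortunately, the channel matrix in (6.38) does not have the block diagonal structure so that a separation into rows associated only with the original symbols $a_1, \ldots, a_3$ and those associated with their complex conjugate versions is not possible. Hence, a direct construction of an equivalent matrix $\mathbf{H}[\mathbf{T}_3]$ containing the complex channel coefficients is not possible. However, we can separate real and imaginary parts of all components and stack them into vectors and matrices similar to the approach applied to linear multiuser detectors for real-valued modulation schemes discussed in Sections 5.2.1, 5.2.2, and 5.4.2. Denoting the real part of a complex symbol $y$ with $y'$ and the imaginary part with $y''$, we define the real-valued vectors
$$ \mathbf{y}^r = \bigl[ y_1' \cdots y_L' \;\; y_1'' \cdots y_L'' \bigr]^T \tag{6.39a} $$
$$ \mathbf{n}^r = \bigl[ n_1' \cdots n_L' \;\; n_1'' \cdots n_L'' \bigr]^T \tag{6.39b} $$
$$ \mathbf{a}^r = \bigl[ a_1' \cdots a_K' \;\; a_1'' \cdots a_K'' \bigr]^T. \tag{6.39c} $$
Keeping the factor 2 of (6.38), the received vector can now be expressed by $\mathbf{y}^r = 2\,\mathbf{H}^r[\mathbf{T}_3]\,\mathbf{a}^r + \mathbf{n}^r$ with
$$ \mathbf{H}^r[\mathbf{T}_3] = \begin{bmatrix} h_1' & h_2' & \frac{h_3'}{\sqrt{2}} & -h_1'' & -h_2'' & -\frac{h_3''}{\sqrt{2}} \\ h_2' & -h_1' & \frac{h_3'}{\sqrt{2}} & h_2'' & -h_1'' & -\frac{h_3''}{\sqrt{2}} \\ -h_3' & 0 & \frac{h_1' + h_2'}{\sqrt{2}} & 0 & -h_3'' & \frac{h_1'' + h_2''}{\sqrt{2}} \\ 0 & h_3' & \frac{h_1' - h_2'}{\sqrt{2}} & -h_3'' & 0 & \frac{h_1'' - h_2''}{\sqrt{2}} \\ h_1'' & h_2'' & \frac{h_3''}{\sqrt{2}} & h_1' & h_2' & \frac{h_3'}{\sqrt{2}} \\ h_2'' & -h_1'' & \frac{h_3''}{\sqrt{2}} & -h_2' & h_1' & \frac{h_3'}{\sqrt{2}} \\ -h_3'' & 0 & \frac{h_1'' + h_2''}{\sqrt{2}} & 0 & h_3' & -\frac{h_1' + h_2'}{\sqrt{2}} \\ 0 & h_3'' & \frac{h_1'' - h_2''}{\sqrt{2}} & h_3' & 0 & -\frac{h_1' - h_2'}{\sqrt{2}} \end{bmatrix}. \tag{6.40} $$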
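Owing to the separation of real and imaginary parts, we have again obtained a matrix with orthogonal columns
$$ \bigl(\mathbf{H}^r[\mathbf{T}_3]\bigr)^T \cdot \mathbf{H}^r[\mathbf{T}_3] = \sum_{\mu=1}^{N_T} |h_\mu|^2 \cdot \mathbf{I}_6 = \bigl( |h_1|^2 + |h_2|^2 + |h_3|^2 \bigr) \cdot \mathbf{I}_6. $$
After multiplying $\mathbf{y}^r$ with $\bigl(\mathbf{H}^r[\mathbf{T}_3]\bigr)^T$, real and imaginary parts of each symbol experience a diversity gain of $N_T$. For multiamplitude modulation, they have to be normalized and combined into a complex symbol again to allow the demodulation. Finally, a space–time coding scheme with $N_T = 4$ transmit antennas shall be presented. The space–time codeword is
$$ \mathbf{T}_4 = \begin{bmatrix} 2a_1 & -2a_2^* & \sqrt{2}a_3^* & \sqrt{2}a_3^* \\ 2a_2 & 2a_1^* & \sqrt{2}a_3^* & -\sqrt{2}a_3^* \\ \sqrt{2}a_3 & \sqrt{2}a_3 & -a_1 - a_1^* + a_2 - a_2^* & a_1 - a_1^* + a_2 + a_2^* \\ \sqrt{2}a_3 & -\sqrt{2}a_3 & a_1 - a_1^* - a_2 - a_2^* & -(a_1 + a_1^* + a_2 - a_2^*) \end{bmatrix}. \tag{6.41} $$
Again, three symbols are transmitted within a block covering four time instants, leading to $R_c = 3/4$. The received vector can be described using (6.41), yielding
$$ \mathbf{y} = 2 \begin{bmatrix} h_1 & h_2 & \frac{h_3 + h_4}{\sqrt{2}} & 0 & 0 & 0 \\ 0 & 0 & \frac{h_3 - h_4}{\sqrt{2}} & h_2 & -h_1 & 0 \\ \frac{-h_3 + h_4}{2} & \frac{h_3 - h_4}{2} & 0 & -\frac{h_3 + h_4}{2} & -\frac{h_3 + h_4}{2} & \frac{h_1 + h_2}{\sqrt{2}} \\ \frac{h_3 - h_4}{2} & \frac{h_3 - h_4}{2} & 0 & -\frac{h_3 + h_4}{2} & \frac{h_3 + h_4}{2} & \frac{h_1 - h_2}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_1^* \\ a_2^* \\ a_3^* \end{bmatrix} + \mathbf{n}. \tag{6.42} $$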
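The row orthogonality (6.29) of the rate-3/4 codewords can be verified numerically. The sketch below builds the unnormalized matrices of (6.37) and (6.41) for arbitrary symbol values and evaluates the product of each matrix with its Hermitian transpose. The function names `t3`, `t4` and `gram_rows` and the example symbols are assumptions for illustration. The fourth row used in `t4` follows the code design of Tarokh et al. (1999a) as far as it can be reconstructed here, so it should be checked against the original publication.

```python
import math

def t3(a1, a2, a3):
    """Unnormalized rate-3/4 codeword for N_T = 3, cf. (6.37); rows = antennas."""
    r2 = math.sqrt(2)
    c1, c2, c3 = a1.conjugate(), a2.conjugate(), a3.conjugate()
    return [[2 * a1, -2 * c2, r2 * c3,  r2 * c3],
            [2 * a2,  2 * c1, r2 * c3, -r2 * c3],
            [r2 * a3, r2 * a3, -a1 - c1 + a2 - c2, a1 - c1 + a2 + c2]]

def t4(a1, a2, a3):
    """Unnormalized rate-3/4 codeword for N_T = 4, cf. (6.41)."""
    r2 = math.sqrt(2)
    c1, c2 = a1.conjugate(), a2.conjugate()
    fourth = [r2 * a3, -r2 * a3, a1 - c1 - a2 - c2, -(a1 + c1 + a2 - c2)]
    return t3(a1, a2, a3) + [fourth]

def gram_rows(x):
    """Return X X^H as a nested list."""
    n, length = len(x), len(x[0])
    return [[sum(x[i][k] * x[j][k].conjugate() for k in range(length))
             for j in range(n)] for i in range(n)]

a1, a2, a3 = 0.6 + 0.3j, -0.2 + 0.9j, 1.0 - 0.5j       # arbitrary example symbols
p = 4 * (abs(a1) ** 2 + abs(a2) ** 2 + abs(a3) ** 2)   # expected constant P
g3 = gram_rows(t3(a1, a2, a3))
g4 = gram_rows(t4(a1, a2, a3))
```

For these matrices the constant $P$ in (6.29) evaluates to four times the sum of the symbol energies, which can be read off the diagonal entries of `g3` and `g4`.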
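The channel matrix for the real-valued received vector can now be expressed as
$$ \mathbf{H}^r[\mathbf{T}_4] = \begin{bmatrix} h_1' & h_2' & \frac{h_3' + h_4'}{\sqrt{2}} & -h_1'' & -h_2'' & -\frac{h_3'' + h_4''}{\sqrt{2}} \\ h_2' & -h_1' & \frac{h_3' - h_4'}{\sqrt{2}} & h_2'' & -h_1'' & \frac{-h_3'' + h_4''}{\sqrt{2}} \\ -h_3' & -h_4' & \frac{h_1' + h_2'}{\sqrt{2}} & -h_4'' & -h_3'' & \frac{h_1'' + h_2''}{\sqrt{2}} \\ -h_4' & h_3' & \frac{h_1' - h_2'}{\sqrt{2}} & -h_3'' & h_4'' & \frac{h_1'' - h_2''}{\sqrt{2}} \\ h_1'' & h_2'' & \frac{h_3'' + h_4''}{\sqrt{2}} & h_1' & h_2' & \frac{h_3' + h_4'}{\sqrt{2}} \\ h_2'' & -h_1'' & \frac{h_3'' - h_4''}{\sqrt{2}} & -h_2' & h_1' & \frac{h_3' - h_4'}{\sqrt{2}} \\ -h_3'' & -h_4'' & \frac{h_1'' + h_2''}{\sqrt{2}} & h_4' & h_3' & -\frac{h_1' + h_2'}{\sqrt{2}} \\ -h_4'' & h_3'' & \frac{h_1'' - h_2''}{\sqrt{2}} & h_3' & -h_4' & -\frac{h_1' - h_2'}{\sqrt{2}} \end{bmatrix}. \tag{6.43} $$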
Owing to the separation of real and imaginary parts, we have again obtained a matrix with orthogonal columns. Certainly, the real-valued description can also be applied to Alamouti’s scheme and to the codes X3 and X4 . Therefore, it is more general and can exploit more degrees of freedom because it is not restricted to use complex symbols and their conjugate versions. Linear STBCs constructed with real-valued notations are called linear dispersion codes (Hassibi and Hochwald 2002) and are addressed in Section 6.5. As already explained for Alamouti’s scheme, each of the discussed STBCs can be combined with several receive antennas. In this case, we obtain several equivalent channel matrices which are stacked into a large matrix according to (6.27). The receiver consists of a bank of NR matched filters and simply sums their outputs. This leads to an overall diversity degree of D = NT · NR . Although oSTBCs do not provide a coding gain, they have the great advantage that decoding simply requires some linear combinations of the received symbols. Moreover, they provide the full diversity degree achievable with a certain number of transmit and receive antennas. Normalizing the Transmit Power We now have to consider the transmit power of the presented STBCs in more detail. Certainly, there exist several possibilities for normalizing the transmit power. From the channel coding perspective, we know to distinguish Es and Eb . In the context of space–time coding, we have the possibility of fixing the average SNR per channel use, that is, per time instant. In this case, the constant P in (6.29) grows linearly with the length L of a space–time codeword and we obtain Es tr XNT XH . NT = L · Ts
(6.44)
Since the trace in (6.44) also depends on the number of transmit antennas, all codeword √ matrices have to be multiplied with the factor 1/ NT .
In order to draw a fair comparison among the discussed STC approaches, we can also fix the average power spent per data symbol to $E_s/T_s$, leading to

$$\mathrm{tr}\left\{\mathbf{X}_{N_T}\mathbf{X}_{N_T}^H\right\} = K \cdot \frac{E_s}{T_s}. \tag{6.45}$$

Starting with Alamouti's scheme, each of the two symbols $a_1$ and $a_2$ (including their complex conjugate versions) is transmitted twice during one block. This leads to a scaling factor of $1/\sqrt{K} = 1/\sqrt{2}$ as already used on page 283. For the codes $X_3$ and $X_4$, each symbol is transmitted six and eight times, respectively. Hence, we obtain the factors $1/\sqrt{6}$ and $1/\sqrt{8}$. In relation to $T_3$ and $T_4$, the scaling factors before the codeword matrices amount to $1/2$. With this normalization, the error rate is depicted against $E_s/N_0$.

Finally, a comparison of schemes with different spectral efficiencies is generally drawn with respect to $E_b/N_0$ instead of $E_s/N_0$. Normalizing to the number of receive antennas so that no array gain is measured, we obtain the following relationship between the average energy $E_b$ per information bit and the symbol energy $E_s$

$$E_s = \frac{m \cdot R_c}{N_R} \cdot E_b = \frac{m \cdot K}{L \cdot N_R} \cdot E_b, \tag{6.46}$$
where $m$ denotes the number of bits per symbol. Alternatively, the SNR at each receive antenna can also be used so that the array gain of the receiver becomes obvious. However, this must be explicitly mentioned.

Simulation Results

We now look at the error rate performance of the space–time block coding schemes explained so far. First, Figure 6.6a depicts the error rates of Alamouti's scheme with
different numbers of receive antennas. Since $X_2$ provides a diversity degree of $D = 2$, additional receive antennas multiply this degree, leading to $D = 4$, $D = 6$ and $D = 8$ for $N_R = 2$, $N_R = 3$, and $N_R = 4$, respectively. A comparison between theoretical results from Section 1.5 (lines) and simulation results (symbols) illustrates that both coincide perfectly. Hence, as long as the channel is ideally known to the receiver, optimum diversity performance is achieved. Figure 6.6b shows the performance of $X_2$ for different modulation schemes. Both quaternary phase shift keying (QPSK) and 8-PSK profit from an increased diversity degree.

Figure 6.6 Bit error rate of Alamouti's scheme for different modulation types and number of receive antennas (solid bold line: AWGN channel, dashed line: Rayleigh fading channel without diversity). Panels: a) BPSK with $N_R = 1, 2, 3, 4$; b) different modulations (QPSK and 8-PSK, each with $N_R = 1$ and $N_R = 4$). Axes: BER versus $E_b/N_0$ in dB.

Next, we compare space–time coding schemes for binary phase shift keying (BPSK) and a single receive antenna. From Figure 6.7a, it becomes obvious that $X_3$ and $T_3$ have identical diversity degrees, as do $X_4$ and $T_4$. The results are identical with those obtained from Section 1.5. However, the codes have different rates $R_c$, leading to different spectral efficiencies. Therefore, we have to depict the error rates against $E_b/N_0$ instead of $E_s/N_0$. Figure 6.7b shows the corresponding relations. The slopes of all curves are still the same as shown in Figure 6.7a but the curves of $X_3$, $X_4$, $T_3$, and $T_4$ are shifted horizontally by $10 \log_{10}(R_c)$. The half-rate codes $X_3$ and $X_4$ perform worse than $T_3$ and $T_4$, especially at small SNRs. Despite its higher diversity degree, $X_3$ outperforms Alamouti's scheme only for SNRs above 15 dB. Similar intersections exist for $X_4$ and $T_3$.

Figure 6.7 Bit error rate for different orthogonal STBCs, BPSK, and $N_R = 1$ receive antenna. Panels: a) error rate versus $E_s/N_0$; b) error rate versus $E_b/N_0$. Curves: $X_2$, $X_3$, $X_4$, $T_3$, $T_4$.

A fair comparison between different space–time coding schemes can be guaranteed if it is drawn for identical spectral efficiencies. This can be achieved by choosing an appropriate modulation scheme for each STC. Table 6.1 summarizes some constellations considered here.
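The interplay of code rate, modulation order, and the two SNR definitions can be made concrete with a small sketch. It assumes the code rates used in this section ($R_c = 1$ for Alamouti's scheme $X_2$, $R_c = 1/2$ for $X_3$ and $X_4$, $R_c = 3/4$ for $T_3$ and $T_4$) and evaluates $\eta = m \cdot R_c$ together with relation (6.46). The dictionary and function names are illustrative and not taken from the text.

```python
import math

# Code rates R_c = K / L of the orthogonal STBCs (assumed as stated in the lead-in)
CODE_RATE = {"X2": 1.0, "X3": 0.5, "X4": 0.5, "T3": 0.75, "T4": 0.75}

# Number of bits m per modulation symbol
BITS_PER_SYMBOL = {"BPSK": 1, "QPSK": 2, "8-PSK": 3, "16-QAM": 4}


def spectral_efficiency(code, modulation):
    """Overall spectral efficiency eta = m * R_c in bits/s/Hz."""
    return BITS_PER_SYMBOL[modulation] * CODE_RATE[code]


def es_n0_db(eb_n0_db, code, modulation, n_rx=1):
    """Convert Eb/N0 to Es/N0 (both in dB) using Es = m * R_c / N_R * Eb, cf. (6.46)."""
    factor = BITS_PER_SYMBOL[modulation] * CODE_RATE[code] / n_rx
    return eb_n0_db + 10.0 * math.log10(factor)


for code, modulation in [("X2", "QPSK"), ("X3", "16-QAM"), ("T3", "16-QAM")]:
    print(code, modulation, spectral_efficiency(code, modulation))
```

For BPSK and a single receive antenna, `es_n0_db` reduces to a shift by $10 \log_{10}(R_c)$, which is the horizontal displacement between the two panels of Figure 6.7.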
For $\eta = 2$ bits/s/Hz, Alamouti's scheme employs QPSK while $X_3$ and $X_4$ have to use 16-QAM or 16-PSK because of their lower code rate of $R_c = 1/2$. For $\eta = 3$ bits/s/Hz, we use 8-PSK for $X_2$ and 16-QAM for $T_3$ and $T_4$.

The results for $\eta = 1$ bit/s/Hz are depicted in Figure 6.8a. Since BPSK and QPSK show the same bit error rate (BER) performance against $E_b/N_0$, $X_3$ and $X_4$ do not suffer from a higher sensitivity of the modulation scheme and can fully exploit the larger
diversity degree. In Figure 6.8b, we observe different results for $\eta = 2$ bits/s/Hz. QPSK is much more robust than 16-QAM against the influence of noise. Hence, the higher diversity degree becomes obvious only for high SNRs. At low SNRs, Alamouti's scheme with QPSK still performs best.

Table 6.1 Combinations of space–time codes and modulation schemes for different overall spectral efficiencies

η             X2      X3       X4       T3       T4
1 bit/s/Hz    BPSK    QPSK     QPSK     –        –
2 bits/s/Hz   QPSK    16-QAM   16-QAM   –        –
3 bits/s/Hz   8-PSK   –        –        16-QAM   16-QAM

Figure 6.8 Bit error rate for different orthogonal STCs, $N_R = 1$ receive antenna and different spectral efficiencies (solid bold line: AWGN, bold dashed line: flat Rayleigh fading; in Figure b) both for 16-QAM). Panels: a) $\eta = 1$ bit/s/Hz ($X_2$ with BPSK, $X_3$ and $X_4$ with QPSK); b) $\eta = 2$ bits/s/Hz ($X_2$ with QPSK, $X_3$ and $X_4$ with 16-QAM). Axes: BER versus $E_b/N_0$ in dB.

Finally, Figure 6.9 illustrates the results obtained for a spectral efficiency of $\eta = 3$ bits/s/Hz. Because of the relatively high code rate of $R_c = 0.75$, we only have to switch from 8-PSK to 16-QAM. However, 16-QAM performs nearly as well as 8-PSK because it exploits the signal space more efficiently (cf. Section 1.4). Therefore, the loss obtained by changing from 8-PSK to 16-QAM is rather low and the diversity gain dominates the bit error rate for $T_3$ and $T_4$.

The following conclusion can be drawn in relation to the trade-off between diversity degree and modulation type for a fixed spectral efficiency $\eta$. In the high SNR regime, diversity is most important and overcompensates the larger sensitivity of high-order modulation schemes. At low SNRs, robust modulation schemes such as QPSK should be preferred because the diversity gain is smaller than the loss associated with a change of the modulation scheme.
Figure 6.9 Bit error rate for different orthogonal STCs and $N_R = 1$ receive antenna, spectral efficiency $\eta = 3$ bits/s/Hz. Curves: $X_2$ with 8-PSK, $T_3$ with 16-QAM, $T_4$ with 16-QAM. Axes: BER versus $E_b/N_0$ in dB.
6.2.4 Space–Time Trellis Codes

Contrary to the previously presented oSTBCs, STTCs can also provide a coding gain. Optimization criteria and some handmade codes were first presented in Seshadri et al. (1997) and Tarokh et al. (1997, 1998). Results of a systematic computer-based code search can be found in Bäro et al. (2000a,b) and some implementation aspects in Naguib et al. (1997, 1998).

Figure 6.10 shows the general structure of an encoder with $N_T = 2$ transmit antennas. Obviously, STTCs are related to the convolutional codes explained in Section 3.3. At each time instant $\ell$, a vector $\mathbf{d}[\ell] = \big[d_1[\ell] \cdots d_K[\ell]\big]^T$ is fed into the linear shift register consisting of $L_c$ blocks each comprising $K$ bits. The old content is shifted by $K$ positions to the right. Hence, the total length of the register is $L_c K$ bits and $L_c$ represents the constraint length as for convolutional codes. The variable $Q = L_c - 1$ denotes the memory of the register. The major difference compared to binary convolutional codes is the way in which the register content

$$\mathbf{q}[\ell] = \begin{bmatrix} \mathbf{d}[\ell] \\ \vdots \\ \mathbf{d}[\ell - Q] \end{bmatrix} = \big[\, \underbrace{q_1[\ell] \cdots q_K[\ell]}_{\text{input vector } \mathbf{d}[\ell]} \;\; \underbrace{q_{K+1}[\ell] \cdots q_{L_c K}[\ell]}_{\text{state}} \,\big]^T \tag{6.47}$$

is combined to form the outputs $b_1[\ell]$ and $b_2[\ell]$. Assuming an $M$-ary linear modulation scheme according to Section 1.4, the generator coefficients $g_{i,j}$ are generally nonbinary with $g_{i,j} \in \{0, 1, \ldots, M-1\}$. They can be included in the generator matrix

$$\mathbf{G} = \begin{bmatrix} g_{1,1} & g_{1,2} & \cdots & g_{1,L_c K} \\ g_{2,1} & g_{2,2} & \cdots & g_{2,L_c K} \\ \vdots & \vdots & \ddots & \vdots \\ g_{N_T,1} & g_{N_T,2} & \cdots & g_{N_T,L_c K} \end{bmatrix} \tag{6.48}$$

with which the output vector $\mathbf{b}[\ell] = \big[b_1[\ell] \cdots b_{N_T}[\ell]\big]^T$ can be described by

$$\mathbf{b}[\ell] = \mathbf{G} \cdot \mathbf{q}[\ell] \bmod M. \tag{6.49}$$

Figure 6.10 General structure of space–time trellis encoder for $N_T = 2$ transmit antennas ($Q = L_c - 1$)
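The encoding rule (6.49) can be sketched in a few lines. The example below is only an illustration: it assumes an all-zero initial register and uses the four-state QPSK generator matrix that the text gives later in (6.55). The function name `sttc_encode` and the variable `G_W442` are hypothetical.

```python
def sttc_encode(bits, G, K, M):
    """Encode a flat bit sequence with b[l] = G * q[l] mod M, cf. (6.49).

    bits : list of 0/1 values, length a multiple of K
    G    : generator matrix as a list of N_T rows (integer coefficients)
    Returns one tuple of N_T integers in {0, ..., M-1} per time instant.
    """
    if len(bits) % K != 0:
        raise ValueError("number of bits must be a multiple of K")
    n_cols = len(G[0])
    register = [0] * n_cols              # q[l]; all-zero start state is assumed
    outputs = []
    for start in range(0, len(bits), K):
        d = list(bits[start:start + K])  # new input vector d[l]
        # shift the old content K positions to the right, insert d[l] on the left
        register = d + register[:n_cols - K]
        outputs.append(tuple(
            sum(g * q for g, q in zip(row, register)) % M for row in G
        ))
    return outputs


# Generator matrix of the four-state QPSK code given in (6.55)
G_W442 = [[1, 2, 0, 0],
          [0, 0, 1, 2]]

symbols = sttc_encode([1, 0, 0, 1, 1, 1], G_W442, K=2, M=4)
print(symbols)  # [(1, 0), (2, 1), (3, 2)]
```

In this example the second output repeats the first one with a delay of one time instant. The integers still have to be mapped onto PSK or QAM symbols by the signal mappers.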
The $N_T$ integers $b_\mu[\ell] \in \{0, \ldots, M-1\}$ are then mapped onto $M$-ary phase shift keying (PSK) or quadrature amplitude modulation (QAM) symbols by $N_T$ independent signal mappers. In Tarokh et al. (1998), it is shown that the maximum $K$ is restricted by the modulation scheme if the maximum diversity degree of $N_T N_R$ is to be achieved. Hence, $K = \log_2(M)$ holds for $M$-ary modulation schemes.

The number of states naturally depends on the memory of the register. However, it may happen that the left-most and the right-most bit tuples $\mathbf{d}[\ell]$ and $\mathbf{d}[\ell - Q]$ are not fully connected to the generators. Assuming that the last $\tau$ elements of $\mathbf{q}[\ell]$ are not connected to the generators, only $QK - \tau$ memory elements are used and the number of states reduces to $2^{QK-\tau}$. In this case, the generator matrix is not fully loaded (Blum 2000).

Similar to convolutional codes, STTCs can also be graphically described with a trellis diagram. An example with four states and $N_T = 2$ transmit antennas is depicted in Figure 6.11 where $K = 2$ and $L_c = 2$ hold, resulting in $2^2 = 4$ states. At each time instant, two input bits $d_1[\ell]$ and $d_2[\ell]$ are encoded in a register with memory $Q = 1$, resulting in four branches leaving each state. On the left-hand side, the binary representation of each state, that is, the register content $[q_3[\ell]\; q_4[\ell]]$, is depicted. On the right-hand side, the output symbols $x_1[\ell]$ and $x_2[\ell]$ belonging to different branches are listed, wherein the first symbol pair belongs to the uppermost branch leaving a state and the last belongs to the lowest branch. Generally, natural mapping (see Section 1.4) is applied as can be seen from the signal space of QPSK.

Decoding Space–Time Trellis Codes

Owing to the equivalence between convolutional codes and STTCs, we can use the Viterbi algorithm for decoding. However, there exists a major difference. In the case of binary
convolutional codes, the $n$ bits $b_1[i], \ldots, b_n[i]$ belonging to one codeword $\mathbf{b}[i]$ are received successively. On the contrary, the $N_T$ symbols $x_1[\ell]$ up to $x_{N_T}[\ell]$ at the output of the STC encoder interfere incoherently at the receive antenna and are not obtained separately. For the general case of $N_R$ receive antennas,

$$\mathbf{y}[\ell] = \mathbf{H} \cdot \mathbf{x}[\ell] + \mathbf{n}[\ell] \tag{6.50}$$

with the $\mu$th received signal

$$y_\mu[\ell] = \mathbf{h}_\mu \cdot \mathbf{x}[\ell] + n_\mu[\ell] = \sum_{\nu=1}^{N_T} h_{\mu,\nu} \cdot x_\nu[\ell] + n_\mu[\ell] \tag{6.51}$$

holds. This modification has to be considered when calculating the incremental metrics $\gamma^{(s' \to s)}[\ell]$ given in (3.32). All interfering symbols at time instant $\ell$ originate from the same state $s'$. The incremental metric between the states $s'$ and $s$ becomes

$$\gamma^{(s' \to s)}[\ell] = \left\| \mathbf{y}[\ell] - \mathbf{H} \cdot \mathbf{z}^{(s' \to s)} \right\|^2 = \sum_{\mu=1}^{N_R} \left| y_\mu[\ell] - \sum_{\nu=1}^{N_T} h_{\mu,\nu} \cdot z_\nu^{(s' \to s)} \right|^2 \tag{6.52}$$

where $z_\nu^{(s' \to s)}$ denotes the hypothesis of the symbol transmitted over antenna $\nu$ for the transition between states $s'$ and $s$. Consequently, $\mathbf{z}^{(s' \to s)}$ comprises all $N_T$ hypotheses. The remaining parts of the Viterbi algorithm are identical to those for convolutional codes.

Figure 6.11 Trellis diagram for space–time code with $2^{QK} = 2^K = 4$ states, QPSK, and $N_T = 2$ transmit antennas. The figure lists, for each state, the output symbol pairs of the four branches and shows the QPSK signal space with the symbols labeled 0 to 3.

state $(q_3[\ell]\; q_4[\ell])$    output symbols $(x_1[\ell]\; x_2[\ell])$
00                                00, 02, 22, 20
10                                31, 33, 13, 11
01                                10, 12, 32, 30
11                                01, 03, 23, 21

Examples for Space–Time Trellis Codes

In the following part, some codes derived by Wittneben, Tarokh, Yan, and Bäro (Tarokh et al. 1998; Wittneben 1991, 1993; Yan and Blum 2000) are presented. This list does not claim to be comprehensive. In order to distinguish the codes, the following notation is used. The codes from Wittneben, Tarokh, Yan, and Bäro are denoted by W($M$, $Z$, $N_T$), T($M$, $Z$, $N_T$), Y($M$, $Z$, $N_T$), and B($M$, $Z$, $N_T$), respectively. The three parameters describe the constellation size $M$ of the linear modulation, the number of states $Z$ in the trellis, and the number of transmit antennas $N_T$. All codes achieve the maximum diversity gain $N_T N_R$ so that only the coding gain has to be considered.
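Returning briefly to the decoder, the incremental metric (6.52) is simply a squared Euclidean distance between the received vector and the noise-free superposition of all $N_T$ hypothesized symbols. The following sketch evaluates it for one trellis branch. The helper `qpsk` assumes that natural mapping places symbol index $k$ at phase $k\pi/2$ on the unit circle, and the channel values are arbitrary illustrative numbers.

```python
import cmath
import math


def qpsk(index):
    """Map an integer 0..3 onto the unit circle (assumed natural mapping)."""
    return cmath.exp(1j * math.pi / 2 * index)


def incremental_metric(y, H, z):
    """Squared Euclidean distance ||y - H z||^2 as in (6.52).

    y : received samples, one per receive antenna
    H : channel matrix as a list of N_R rows with N_T coefficients each
    z : hypothesis of the N_T transmitted symbols for one trellis branch
    """
    metric = 0.0
    for y_mu, h_mu in zip(y, H):
        expected = sum(h * z_nu for h, z_nu in zip(h_mu, z))
        metric += abs(y_mu - expected) ** 2
    return metric


# One receive antenna, two transmit antennas, noise-free reception (illustrative values)
H = [[1.0 + 0.0j, 0.5j]]
z_sent = (qpsk(1), qpsk(2))
y = [sum(h * s for h, s in zip(H[0], z_sent))]

metric_correct = incremental_metric(y, H, z_sent)            # close to 0
metric_wrong = incremental_metric(y, H, (qpsk(0), qpsk(0)))  # close to 1
print(metric_correct, metric_wrong)
```

A Viterbi decoder would evaluate this metric for every branch leaving every state and add it to the accumulated path metric, exactly as for convolutional codes.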
Delay diversity by Wittneben

The delay diversity scheme proposed by Wittneben (1991, 1993) represents an exception because it provides no coding gain. However, it can be interpreted as the simplest STTC and is illustrated in Figure 6.12. For the general case of $M$-ary modulation schemes, $K = \log_2(M)$ bits are fed into the shift register at each time instant. In the example, QPSK is used, resulting in $K = 2$. The number of transmit antennas equals the constraint length $L_c = N_T$ because each $K$-bit block is connected to a mapper of only one antenna. This leads to the general structure of the generator matrix

$$\mathbf{G} = \begin{bmatrix} 1 \;\, 2 \;\cdots\; 2^{K-1} & 0 \;\cdots\; 0 & \cdots & 0 \;\cdots\; 0 \\ 0 \;\cdots\; 0 & 1 \;\, 2 \;\cdots\; 2^{K-1} & \cdots & 0 \;\cdots\; 0 \\ \vdots & & \ddots & \vdots \\ 0 \;\cdots\; 0 & 0 \;\cdots\; 0 & \cdots & 1 \;\, 2 \;\cdots\; 2^{K-1} \end{bmatrix} \tag{6.53}$$

and the number of states grows exponentially with the number of transmit antennas. Obviously, the transmit antennas emit delayed versions of the same piece of information. As a consequence, the signal at receive antenna $\mu$ becomes

$$y_\mu[\ell] = \sum_{\nu=1}^{N_T} h_{\mu,\nu} \cdot x_\nu[\ell] + n_\mu[\ell] = \sum_{\nu=1}^{N_T} h_{\mu,\nu} \cdot x_1[\ell - \nu + 1] + n_\mu[\ell]. \tag{6.54}$$
We recognize that (6.54) describes the convolution of a sequence $x_1[\ell]$ with a frequency-selective channel $\mathbf{h}_\mu = [h_{\mu,1} \cdots h_{\mu,N_T}]$. Therefore, the flat MISO channel is transformed by the delay diversity scheme into a frequency-selective single-input single-output channel providing the full diversity degree of $D = N_T = L_c$. Decoding is identical to the equalization of intersymbol interference channels and can be performed by a Viterbi equalizer (Kammeyer 2004; Proakis 2001).

For the example W(4, 4, 2) of $N_T = 2$ transmit antennas, QPSK, and four states, we obtain the trellis segment depicted in Figure 6.13. While the second antenna always transmits symbol $\nu$ in the $\nu$th state, the first antenna transmits a symbol $\mu$ identifying the successive state $s = \mu$. The generator matrix has the form

$$\mathbf{W}(4, 4, 2) = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \end{bmatrix}. \tag{6.55}$$

Owing to the simplicity of this scheme, it can be easily extended to an arbitrary number of transmit antennas and, therefore, to a very high diversity gain. However, the decoding or detection complexity grows exponentially with $N_T$ and limits the potential due to practical restrictions.

Figure 6.12 Structure of delay diversity scheme by Wittneben with QPSK

Figure 6.13 Structure of delay diversity scheme by Wittneben with memory $Q = 1$, $N_T = 2$, and QPSK. The figure lists, for each state, the output symbol pairs of the four branches and shows the QPSK signal space with the symbols labeled 0 to 3.

state $(q_3[\ell]\; q_4[\ell])$    output symbols $(x_1[\ell]\; x_2[\ell])$
00                                00, 10, 20, 30
10                                01, 11, 21, 31
01                                02, 12, 22, 32
11                                03, 13, 23, 33

Space–time trellis codes with $N_T = 2$ transmit antennas

Next, we focus on schemes providing a coding gain $g_c$ with only two transmit antennas. Table 6.2 lists the codes T(4, $Z$, 2) and Y(4, $Z$, 2) by Tarokh et al. (1998) and Yan and Blum (2000) for $Z$ states and QPSK, Table 6.3 the codes B(4, $Z$, 2) by Bäro (Bäro et al. 2000a,b). As an example, the encoder structure of T(4, 8, 2) is shown in Figure 6.14. The corresponding generator matrix is not fully loaded because of $\tau = 1$, leading to $2^{KQ-1} = 8$ states. The coding gains $g_c$ listed in the tables have been obtained by analyzing the pairwise differences $\mathbf{X} - \tilde{\mathbf{X}}$ as described in Subsection 6.2.2. The codes by Yan achieve the highest coding gains, while those of Tarokh show no improvement for more than eight states.

Table 6.2 List of space–time trellis codes taken from Tarokh et al. (1998), Yan and Blum (2000) for $N_T = 2$, QPSK, $\eta = 2$ bits/s/Hz and diversity degree $D = 2$ (matrix rows are separated by semicolons)

Z    T(4, Z, 2)                         gc     Y(4, Z, 2)                         gc
4    [1 2 0 0; 0 0 1 2]                 2      [2 0 1 2; 2 2 2 1]                 √8
8    [0 0 1 2 2; 1 2 0 0 2]             √12    [0 2 1 0 2; 2 1 0 2 2]             4
16   [0 0 1 2 2 0; 1 2 2 0 0 2]         √12    [0 2 1 1 2 0; 2 2 1 2 0 2]         √32
32   [0 0 1 2 2 3 2; 1 2 1 2 0 3 2]     √12    [0 2 3 1 2 0 2; 2 0 1 2 1 2 2]     √40

Table 6.3 List of space–time trellis codes taken from Bäro et al. (2000a,b) for $N_T = 2$, QPSK, $\eta = 2$ bits/s/Hz and diversity degree $D = 2$ (matrix rows are separated by semicolons)

Z    B(4, Z, 2)                         gc
4    [0 2 3 1; 2 2 1 0]                 √8
8    [0 2 1 2 2; 1 2 0 0 2]             √12
16   [2 1 0 2 2 0; 0 2 2 1 0 2]         √20

Figure 6.14 Structure of encoder for T(4, 8, 2) for $N_T = 2$ and QPSK

These theoretical results are now evaluated by simulations. First, we analyze the achievable coding gains of the codes Y(4, $Z$, 2) for different numbers of states $Z$. The frame error rates (FERs) have been determined by transmitting code frames of length 130 symbols over time-invariant channels so that diversity is gained only in the spatial domain. The resulting frame and bit error rates for $N_R = 1$ receive antenna are depicted in Figure 6.15. It can be observed that the FER decreases with growing $Z$. Since all codes provide the full diversity gain of $D = N_T = 2$, the slopes of all curves are identical and only the coding gain is observed. For an FER of $10^{-2}$, the code with 32 states gains a little less than 3 dB compared to Y(4, 4, 2). According to Table 6.2, we should
have achieved a gain of

$$10 \cdot \log_{10} \frac{g_c(\mathrm{Y}(4, 32, 2))}{g_c(\mathrm{Y}(4, 4, 2))} = 10 \cdot \log_{10} \frac{\sqrt{40}}{\sqrt{8}} = 3.5\ \mathrm{dB}. \tag{6.56}$$

Figure 6.15 Error rate performance of code Y(4, $Z$, 2) by Yan for QPSK and $N_R = 1$ receive antenna (bold line: theoretical error rate for diversity $D = 2$). Panels: a) frame error rate; b) bit error rate, each for 4, 8, 16, and 32 states. Axes: FER or BER versus $E_b/N_0$ in dB.
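The arithmetic behind (6.56) can be reproduced directly from the coding gains listed in Table 6.2. The helper name `coding_gain_db` is illustrative.

```python
import math


def coding_gain_db(gc_new, gc_ref):
    """Gain in dB promised by the determinant criterion for two coding gains."""
    return 10.0 * math.log10(gc_new / gc_ref)


# Y(4, 32, 2) versus Y(4, 4, 2), values taken from Table 6.2
gain = coding_gain_db(math.sqrt(40), math.sqrt(8))
print(round(gain, 1))  # 3.5
```

The same function applies to any other pair of entries in Tables 6.2 and 6.3.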
In contrast, the BER depicted in Figure 6.15b shows only minor differences. At very low SNRs, the weak codes with low memory perform slightly better, a result that we already know from Chapter 3. At high SNRs, the codes with higher memory close the gap but cannot significantly outperform the weak codes. Moreover, the theoretical BER curve for twofold diversity ($D = N_T = 2$) is not reached and a gap of 2 dB remains. This observation can be explained by the fact that some frames cannot be correctly decoded and the decoding process itself artificially generates additional errors, increasing the BER. This happens specifically for bad instantaneous channels. With reference to the FER, the number of errors within one frame is not important and hence it is not affected. Therefore, the focus is on the FER in the following part. It has to be mentioned that things will change if the channel varies during one frame. In this case, the decoder exploits time diversity and the BER in particular can be improved remarkably.

A higher diversity degree is also obtained if the number of receive antennas is increased. Figure 6.16 shows the corresponding results for $N_R = 2$. Comparing the FERs with those of Figure 6.15, we see that all codes profit from the increased diversity degree and gain between 5 and 6 dB at an FER of $10^{-2}$. However, the differences between them become smaller and the gain of Y(4, 32, 2) compared to Y(4, 4, 2) reduces to only 2 dB. The BER is also improved by approximately 3 dB but all codes still perform very similarly so that no coding gain can be observed. The gap to the theoretical diversity curve is closed.
Figure 6.16 Error rate performance of code Y(4, $Z$, 2) by Yan for QPSK and $N_R = 2$ receive antennas (bold line: theoretical error rate for diversity $D = 4$). Panels: a) frame error rate; b) bit error rate, each for 4, 8, 16, and 32 states. Axes: FER or BER versus $E_b/N_0$ in dB.
Figure 6.17 Frame error rate performance of different codes for QPSK. Panels: a) $N_R = 1$; b) $N_R = 2$. Curves: W(4, 4, 2), T(4, 4, 2), Y(4, 4, 2), B(4, 4, 2), T(4, 16, 2), Y(4, 16, 2), B(4, 16, 2). Axes: FER versus $E_b/N_0$ in dB.

Figure 6.17a compares the performances of the codes W(4, 4, 2), T(4, $Z$, 2), Y(4, $Z$, 2), and B(4, $Z$, 2) for $N_R = 1$ receive antenna and 4 or 16 states. The theoretical differences indicated in Tables 6.2 and 6.3 cannot be confirmed. For example, a gain of

$$10 \cdot \log_{10} \frac{g_c(\mathrm{Y}(4, 16, 2))}{g_c(\mathrm{T}(4, 16, 2))} = 10 \cdot \log_{10} \frac{\sqrt{32}}{\sqrt{12}} = 2.13\ \mathrm{dB} \tag{6.57}$$

should occur between T(4, 16, 2) and Y(4, 16, 2). However, all codes with 4 states perform similarly to one another, as do all codes with 16 states. For W(4, 4, 2) and T(4, 4, 2), this is
not surprising because Tarokh's four-state code is identical to the delay diversity scheme with two transmit antennas. The code proposed by Yan shows no significant performance improvement. The only observable difference is the improvement of 2 dB obtained by increasing the number of states to 16.

For $N_R = 2$ receive antennas, larger differences between the curves can be observed, although the promised gains are not achieved. The codes from Yan perform best, closely followed by those of Bäro. However, the differences remain small. The reason for this behavior is the fact that the coding gain was calculated only with respect to the minimum determinant of the difference matrices described in Subsection 6.2.2. This criterion is comparable to the minimum Hamming distance of a code that dominates the error rate only asymptotically for large SNRs. In low or medium SNR regions, sequence pairs with larger distance also influence the performance, which is not considered in the theoretical derivation.

So far, only QPSK modulation has been used. For 8-PSK, 3 bits can be transmitted per time instant, resulting in a higher spectral efficiency $\eta = 3$ bits/s/Hz. Tarokh et al. (1998) presented some space–time trellis codes for 8-PSK. The generator matrices are given in Table 6.4. Owing to the high computational costs, no theoretical results on the coding gains exist. Figure 6.18 shows the corresponding simulation results. Only very small gains can be obtained by increasing the number of states and, therefore, the decoding complexity.

Table 6.4 List of space–time trellis codes taken from Tarokh et al. (1998) for $N_T = 2$, 8-PSK, $\eta = 3$ bits/s/Hz and diversity degree $D = 2$ (matrix rows are separated by semicolons)

Code           Generator matrix
T(8, 8, 2)     [0 0 0 5 2 4; 1 2 4 0 0 0]
T(8, 16, 2)    [0 0 0 5 2 4 1; 1 2 4 1 2 4 5]
T(8, 32, 2)    [0 0 0 5 2 4 3 2; 1 2 4 1 2 4 7 2]

Figure 6.18 Frame error rate performance for codes by Tarokh for 8-PSK and different number of states. Panels: a) $N_R = 1$; b) $N_R = 2$. Curves: T(8, 8, 2), T(8, 16, 2), T(8, 32, 2). Axes: FER versus $E_b/N_0$ in dB.
Table 6.5 List of space–time trellis codes taken from (Yan and Blum 2000) for NT = 3 and NT = 4, BPSK, η = 1 bit/s/Hz Y(2, Z, 3) * + 0 1 1 1 + * 0 1 1 1 0 1
Z 2 4 * 8
16
+ 1 0 1 1 1 1 0 1
* + 0 1 0 1 1 1 0 1 0 1
Y(2, Z, 4)
gc 4 √ 48 √ 80 √
128
gc
0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 0 1
4
2561/3
8
Moreover, the relations between the curves hardly change for different numbers of receive antennas.

Space–time codes for more than two transmit antennas

For more than $N_T = 2$ transmit antennas and BPSK modulation, Yan and Blum (2000) presented some codes for $N_T = 3$ and $N_T = 4$ transmit antennas. They are listed in Table 6.5. Again, coding gains between 2 and 2.4 dB are promised by the determinant criterion of Subsection 6.2.2 between two and four states for $N_T = 3$ and between four and eight states for $N_T = 4$. The corresponding simulation results are shown in Figure 6.19 for one receive antenna. Owing to the higher diversity degree of $D = N_T = 3$ or $D = N_T = 4$, an increase in the
number of states really leads to a measurable coding gain. The theoretical gains can be approximately confirmed. The same holds for the case of $N_R = 2$ receive antennas depicted in Figure 6.20. These diagrams additionally show the theoretical FERs (bold dashed curves) for $D$-fold diversity.

Figure 6.19 Frame error rate performance for codes by Yan for BPSK, different number of transmit antennas, and $N_R = 1$ receive antenna. Panels: a) $N_T = 3$ with W(2, 4, 3), Y(2, 4, 3), Y(2, 8, 3), Y(2, 16, 3); b) $N_T = 4$ with W(2, 4, 4), Y(2, 4, 4), Y(2, 8, 4), Y(2, 16, 4). Axes: FER versus $E_b/N_0$ in dB.

Figure 6.20 Frame error rate performance for codes by Yan for BPSK, different number of transmit antennas, and $N_R = 2$ receive antennas (bold dashed line: theoretical frame error rate). Panels and curves as in Figure 6.19.

The theoretical frame error rates can be calculated by first looking at the instantaneous frame error probability $P_f(\mathbf{H})$ as a function of the symbol error probability $P_s(\mathbf{H})$. For a frame of length $L$, we assume that the channel $\mathbf{H}$ remains constant. A frame error occurs if at least one symbol is wrong. In other words, the whole sequence is only correct if all $L$ symbols are correct. Since the noise is white, the error probabilities $P_s(\mathbf{H})$ of successive symbols are identical and independent and the probability that a frame is received correctly amounts to $\big(1 - P_s(\mathbf{H})\big)^L$. Hence, the frame error probability is

$$P_f(\mathbf{H}) = 1 - \big(1 - P_s(\mathbf{H})\big)^L. \tag{6.58}$$

The ergodic probability is now obtained by calculating the expectation of (6.58)

$$P_f = \mathrm{E}\{P_f(\mathbf{H})\} = \mathrm{E}\left\{1 - \big(1 - P_s(\mathbf{H})\big)^L\right\}, \tag{6.59}$$
which is not easy because expectations over powers of $P_s(\mathbf{H})$ have to be calculated. A tight approximation applies a series expansion to the $L$th power and considers only the linear terms. This yields

$$P_f \approx \mathrm{E}\left\{1 - \big(1 - L \cdot P_s(\mathbf{H})\big)\right\} \approx L \cdot P_s. \tag{6.60}$$

We recognize from Figure 6.20 that this theoretical result coincides with the error rates for delay diversity which, therefore, fully exploits the diversity degree of $D = N_R \cdot N_T$, while the codes by Yan additionally profit from the coding gain.

Generally, we can conclude that the coding gains of STTCs promised by the determinant criterion can hardly be achieved in practice. Only for high diversity degrees that require
either time-selective channels or many antennas at the transmitter or receiver do coding gains become visible. Naturally, STBCs as well as STTCs can be combined with classical error correction coding schemes (Bauch 1999; Bauch et al. 2000). This leads to concatenated schemes that can be processed iteratively according to the turbo principle introduced in Section 3.6.

Figure 6.21 Multilayer transmission with perfect channel knowledge at the transmitter and receiver
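As a brief numerical aside on the frame error analysis in (6.58) to (6.60), the quality of the linear approximation can be checked for a fixed symbol error probability, that is, for a single channel realization before the expectation is taken. The frame length of 130 symbols is the one used in the simulations above; the chosen values of $P_s$ and the function names are illustrative.

```python
def frame_error_exact(ps, frame_length):
    """P_f(H) = 1 - (1 - P_s(H))^L, cf. (6.58), for one fixed channel."""
    return 1.0 - (1.0 - ps) ** frame_length


def frame_error_linear(ps, frame_length):
    """Linear term of the series expansion, cf. (6.60): P_f ~ L * P_s."""
    return frame_length * ps


L = 130  # frame length used in the simulations of this section
for ps in (1e-5, 1e-4, 1e-3, 1e-2):
    exact = frame_error_exact(ps, L)
    linear = frame_error_linear(ps, L)
    print(f"Ps={ps:.0e}  exact={exact:.4e}  linear={linear:.4e}")
```

The linear term is an upper bound that is accurate as long as $L \cdot P_s \ll 1$. For larger symbol error probabilities it overestimates the frame error probability and can even exceed one.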
6.3 Multilayer Transmission

6.3.1 Channel Knowledge at the Transmitter and Receiver

The diversity techniques discussed so far improve the link reliability, mainly by using multiple antennas at the transmitter and one or more receive antennas. In contrast, we now try to enhance the data rate by transmitting parallel data streams, termed layers, over the antennas and thus perform spatial multiplexing. To this end, we recall the general MIMO concept illustrated in Figure 6.1 with the channel output $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ given in (6.1).

First, we focus on the case where the transmitter and receiver both have perfect channel knowledge. In this case, we know from Chapter 2 that we have to exploit the eigenmodes of the channel. Hence, a singular value decomposition (SVD)

$$\mathbf{H} = \mathbf{U} \cdot \boldsymbol{\Sigma} \cdot \mathbf{V}^H \tag{6.61}$$

of the channel matrix $\mathbf{H}$ has to be performed at the transmitter and receiver. Remember that $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices of size $N_R \times N_R$ and $N_T \times N_T$, respectively. The $N_R \times N_T$ matrix $\boldsymbol{\Sigma}$ has the form

$$\boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_0 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \tag{6.62}$$

where the $r \times r$ matrix $\boldsymbol{\Sigma}_0$ contains on its diagonal all $r$ nonzero singular values of $\mathbf{H}$. The parameter $r \le \min(N_T, N_R)$ denotes the rank of the channel and represents the maximum number of independent data streams that can be transmitted. The actual number of streams may be even smaller than $r$ if the chosen power distribution excludes some layers and spends power only on the strongest modes (compare with the waterfilling principle in Section 2.3).
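In the following part, it is assumed that $r$ independent data streams $a_\nu$ are transmitted as shown in Figure 6.21. We first adjust their power levels according to an appropriate criterion by multiplying $\mathbf{a} = [a_1, \ldots, a_r]^T$ with the square root of the diagonal matrix $\boldsymbol{\Lambda}_X = \mathrm{diag}[\lambda_1 \cdots \lambda_r]$. Next, each layer is coupled into a certain eigenmode of the channel by using a distinct eigenvector in $\mathbf{V}$. If $\mathbf{V}_X$ comprises those columns of $\mathbf{V}$ that correspond to the used eigenmodes (nonzero singular values), the transmit vector has the form

$$\mathbf{x} = \mathbf{V}_X \cdot \boldsymbol{\Lambda}_X^{1/2} \cdot \mathbf{a}. \tag{6.63}$$

Generally, $\mathbf{a}$ consists of $L_a$ i.i.d. symbols $a_\mu$ and we obtain with $\boldsymbol{\Phi}_{AA} = \mathrm{E}\{\mathbf{a}\mathbf{a}^H\} = E_s/T_s \cdot \mathbf{I}_{L_a}$ the covariance matrix of the transmit vector

$$\boldsymbol{\Phi}_{XX} = \mathrm{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{V}_X \boldsymbol{\Lambda}_X^{1/2} \boldsymbol{\Phi}_{AA} \boldsymbol{\Lambda}_X^{1/2} \mathbf{V}_X^H = \frac{E_s}{T_s} \cdot \mathbf{V}_X \boldsymbol{\Lambda}_X \mathbf{V}_X^H. \tag{6.64}$$

The receiver multiplies $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ from the left-hand side with an $L_a \times N_R$ matrix $\mathbf{U}_X^H$ that comprises those $L_a$ columns of $\mathbf{U}$ corresponding to the used eigenmodes. Assuming that the columns in $\mathbf{U}$ and $\mathbf{V}$ are sorted according to the magnitude of their associated singular values, we obtain

$$\mathbf{V}^H \cdot \mathbf{V}_X = \begin{bmatrix} \mathbf{I}_{L_a} \\ \mathbf{0}_{(N_T - L_a) \times L_a} \end{bmatrix} \quad \text{and} \quad \mathbf{U}_X^H \cdot \mathbf{U} = \begin{bmatrix} \mathbf{I}_{L_a} & \mathbf{0}_{L_a \times (N_R - L_a)} \end{bmatrix} \tag{6.65}$$

and the output signal $\tilde{\mathbf{a}}$ can be described by

$$\tilde{\mathbf{a}} = \mathbf{U}_X^H \cdot (\mathbf{H}\mathbf{x} + \mathbf{n}) = \mathbf{U}_X^H \mathbf{U} \cdot \boldsymbol{\Sigma} \cdot \mathbf{V}^H \mathbf{V}_X \cdot \boldsymbol{\Lambda}_X^{1/2} \cdot \mathbf{a} + \mathbf{U}_X^H \cdot \mathbf{n} = \boldsymbol{\Sigma}_0 \cdot \boldsymbol{\Lambda}_X^{1/2} \cdot \mathbf{a} + \tilde{\mathbf{n}}. \tag{6.66}$$

Since the matrices $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Lambda}_X$ are diagonal, (6.66) describes a set of decoupled scalar equations, and no interference disturbs the transmission. The $\nu$th symbol can be detected from

$$\tilde{a}_\nu = \sigma_\nu \cdot \sqrt{\lambda_\nu} \cdot a_\nu + \tilde{n}_\nu. \tag{6.67}$$

The SNR of this eigenmode amounts to

$$\gamma_\nu = \sigma_\nu^2 \cdot \lambda_\nu \cdot \frac{E_s/T_s}{\sigma_N^2} = \sigma_\nu^2 \cdot \lambda_\nu \cdot \frac{E_s}{N_0} \tag{6.68}$$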
and depends on the singular value σν of the channel as well as the chosen transmit power level λν . From information theory, we know that the optimum power distribution for Gaussian input signals is obtained by applying the waterfilling principle. If discrete linear modulation schemes such as PSK or QAM are employed in combination with codes that are not capacity achieving, waterfilling does not represent the optimum choice and finding the optimum power distribution is still an unsolved problem. In uncoded systems, appropriate bit and power-loading strategies have been proposed by Hughes-Hartogs (1989), Fischer and Huber (1996), Krongold et al. (1999, 2000), and Mutti and Dahlhaus (2004).
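The dependence of the eigenmode SNR (6.68) on the power levels $\lambda_\nu$ can be illustrated with a short sketch. It takes the nonzero singular values as given and applies the classical waterfilling rule $\lambda_\nu = \max\big(0,\, \theta - N_0/(E_s \sigma_\nu^2)\big)$, where the water level $\theta$ is chosen such that the power levels sum to a prescribed total. This sum constraint, the function names, and the numerical values are assumptions made for the illustration; the normalization used in Section 2.3 may differ.

```python
def waterfilling(singular_values, es_n0, total_power):
    """Return power levels lambda_nu (same order as singular_values).

    Assumes strictly positive singular values, a linear (not dB) Es/N0,
    and the constraint sum(lambda_nu) = total_power.
    """
    # "floor" of each mode: N0 / (Es * sigma_nu^2)
    floors = [1.0 / (es_n0 * s ** 2) for s in singular_values]
    active = sorted(range(len(floors)), key=lambda i: floors[i])
    level = 0.0
    while active:
        level = (total_power + sum(floors[i] for i in active)) / len(active)
        if level > floors[active[-1]]:
            break
        active.pop()  # the weakest remaining mode receives no power
    powers = [0.0] * len(floors)
    for i in active:
        powers[i] = level - floors[i]
    return powers


def eigenmode_snrs(singular_values, powers, es_n0):
    """gamma_nu = sigma_nu^2 * lambda_nu * Es/N0, cf. (6.68)."""
    return [s ** 2 * p * es_n0 for s, p in zip(singular_values, powers)]


sigmas = [2.0, 1.0, 0.1]  # illustrative singular values
lambdas = waterfilling(sigmas, es_n0=1.0, total_power=3.0)
print(lambdas)                               # third mode is switched off
print(eigenmode_snrs(sigmas, lambdas, 1.0))
```

In this example the weakest mode receives no power at all, so only two of the three possible layers are actually used.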
Beamforming

Classical beamforming exploits only a single eigenmode of the channel and is beneficial if the channel matrix is rank deficient and has only one dominating nonzero singular value. In this case, the transmit power is concentrated on this strong eigenmode and only a single data stream is transmitted. Denoting by $\sigma_{\max}$ the strongest singular value of $\mathbf{H}$ and by $\mathbf{u}_{\max}$ and $\mathbf{v}_{\max}$ the associated columns of the unitary matrices $\mathbf{U}$ and $\mathbf{V}$, we obtain the transmitted signal $\mathbf{x} = \mathbf{v}_{\max} \cdot a$. Filtering the received vector $\mathbf{y}$ with $\mathbf{u}_{\max}^H$ results in

$$\mathbf{u}_{\max}^H \cdot \mathbf{y} = \mathbf{u}_{\max}^H \mathbf{U} \cdot \boldsymbol{\Sigma} \cdot \mathbf{V}^H \mathbf{v}_{\max} \cdot a + \tilde{n} = \sigma_{\max} \cdot a + \tilde{n}. \tag{6.69}$$
Since the columns in $\mathbf{U}$ as well as in $\mathbf{V}$ are mutually orthogonal, the products $\mathbf{u}_{\max}^H \mathbf{U}$ and $\mathbf{V}^H \mathbf{v}_{\max}$ deliver vectors that have only a single one exactly at that position belonging to the maximum singular value $\sigma_{\max}$ in $\boldsymbol{\Sigma}$. Certainly, beamforming can also be applied when multiple antennas are only present either at the transmitter or at the receiver. For those scenarios, the channel matrix $\mathbf{H}$ reduces to a vector $\mathbf{h}$, and we obtain $\mathbf{v} = \mathbf{h}^H/\|\mathbf{h}\|$ or $\mathbf{u} = \mathbf{h}/\|\mathbf{h}\|$, respectively. Hence, beamforming either at the transmitter or at the receiver ensures maximum ratio combining for perfect channel knowledge.

In practical systems, the channel matrix $\mathbf{H}$ has to be estimated. Besides unavoidable estimation errors, there arises also the problem that the channel has to be known at the transmitter and receiver. The direct approach would estimate the channel at the receiver on the basis of appropriate training sequences and feed the estimates back to the transmitter using a return channel. For large systems, this would require high bandwidth on the feedback channel, which is generally not available and would reduce the overall spectral efficiency remarkably.

Alternatively, the channel can be estimated at the transmitter as well as at the receiver, requiring pilot signals in uplink and downlink. Unfortunately, uplink and downlink channels need not be identical. For frequency division duplex (FDD) systems, uplink and downlink are assigned to different frequency bands so that the short-term behavior is different. However, long-term statistics such as directions of arrival or departure can be assumed to be identical. Hence, an eigenvalue analysis can be performed on the basis of the long-term statistics, that is, the covariance matrices measured at the transmitter and receiver. The exploitation of long-term statistics is called eigenbeamforming (Brunner et al. 2001).
In time division duplex (TDD) systems, the frequency range of uplink and downlink is the same as they are separated by different time slots. If the channel varies slowly in time and reciprocity is given, estimates at the transmitter and receiver describe the same channel and instantaneous channel knowledge (short-term statistics) can be used.
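The single-eigenmode transmission of (6.69) can be checked numerically. The following is only a minimal sketch: the antenna numbers, the i.i.d. complex Gaussian channel, the noise level and all variable names are assumptions made for illustration and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
NT, NR = 4, 4  # illustrative antenna numbers (assumption)

# Assumed channel model: i.i.d. complex Gaussian entries (flat Rayleigh fading)
H = (rng.standard_normal((NR, NT)) + 1j * rng.standard_normal((NR, NT))) / np.sqrt(2)

# SVD H = U diag(s) V^H; NumPy returns the singular values in descending order
U, s, Vh = np.linalg.svd(H)
sigma_max = s[0]
u_max = U[:, 0]
v_max = Vh[0, :].conj()          # first column of V

a = (1 + 1j) / np.sqrt(2)        # a single QPSK symbol
x = v_max * a                    # transmit vector x = v_max * a
n = 0.01 * (rng.standard_normal(NR) + 1j * rng.standard_normal(NR))
y = H @ x + n                    # received vector

r = u_max.conj() @ y             # receive filter u_max^H y
n_tilde = u_max.conj() @ n       # filtered noise

# (6.69) predicts r = sigma_max * a + n_tilde up to rounding errors
residual = abs(r - (sigma_max * a + n_tilde))
print(residual)
```

The filter output contains the data symbol scaled by the strongest singular value only; all other eigenmodes of the channel are left unused.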
6.3.2 Channel Knowledge only at the Receiver

If the transmitter has no channel knowledge, we can still exploit the high potential of multiple antenna systems with only a moderate loss in capacity (see Section 6.5). The missing channel knowledge has to be compensated by more sophisticated signal processing at the receiver. This leads to the well-known Bell Labs Layered Space–Time (BLAST) architecture (Foschini 1996; Foschini and Gans 1998; Foschini et al. 1999) depicted in Figure 6.22. We see that the parallel data streams are transmitted with identical power levels over different antennas without further preprocessing. Neglecting the interleaver, each data stream is transmitted over a single antenna, leading to a vertical arrangement of the layers (see V-BLAST in Figure 6.23). Owing to NR receive antennas, the maximum diversity degree amounts to D = NR (see footnote 2).

Figure 6.22 Multilayer transmission with perfect channel knowledge only at receiver (BLAST)

Figure 6.23 Data streams of V-BLAST and D-BLAST for NT = 4 transmit antennas (streams x1 to x4 arranged over space)

An alternative implementation that increases the diversity degree for each layer is the diagonal BLAST (D-BLAST) architecture. It employs an interleaver in front of the transmit antennas and a corresponding de-interleaver at the receiver. According to Figure 6.23, interleaving is performed such that the antenna index $\nu = (\ell \bmod N_T) + 1$ is increased modulo NT with the time index $\ell$ so that NT consecutive symbols of a certain layer are switched successively onto all transmit antennas, resulting in a diagonal space–time arrangement. Therefore, each data stream is spread over all transmit antennas, increasing the achievable diversity degree from NR to NT · NR (see footnote 2). In both systems, a superposition of all layers is obtained at each receive antenna, requiring sophisticated space–time signal processing. For notational simplicity, we restrict our discussion to the simpler V-BLAST in the remainder of the chapter.

Footnote 2: It depends on the kind of detection algorithm whether the full diversity degree can be exploited. For maximum likelihood decoding (MLD), full diversity is obtained for all layers, while the QL-based successive interference cancellation (SIC) provides different diversity degrees for the layers (cf. Figure 6.27).
Similarity of CDMA and Multilayer Architectures

Next, the similarity of the considered multilayer architectures and the uplink of CDMA systems is briefly discussed. To this end, we look at Figure 6.22 and the received vector y = Hx + n = Ha + n. A comparison with the results from Chapters 4 and 5, especially (5.1) on page 228, demonstrates the equivalence of multiple antenna systems and the uplink of a CDMA system whose output vector can be described by y = Sa + n. Since a contains symbols of a discrete modulation alphabet in both systems, differences between both scenarios are restricted to the properties of the system matrix S and the channel matrix H. If the data layers are assigned to different users, the structural equivalence of BLAST and the uplink of an ordinary SDMA scheme becomes obvious.

In OFDM-CDMA systems or single-carrier CDMA systems with flat fading channels, S has the dimensions Ns × Nu and its columns represent the signatures of all users. Remember that the signature was obtained by convolving the spreading code with the channel impulse response. In relation to multiple antenna schemes, H is an NR × NT matrix purely containing the channel coefficients hµ,ν. Hence, the spreading factor Ns is similar to the number of receive antennas, while the number of users Nu is equivalent to the number of layers NT. Consequently, the load of a CDMA system β = Nu/Ns becomes NT/NR in multilayer architectures. The larger the difference between NT and NR with NR > NT, the easier the detection at the receiver. Despite these similarities, both systems differ in that the diversity degree increases with the number of receive antennas, while it does not for growing Ns and constant bandwidth. Owing to these analogies, the multiuser detection algorithms described in Chapter 5 can also be applied to BLAST-like architectures.
Strategies such as parallel or successive interference cancellation described in Section 5.3 as well as linear filters such as decorrelators and MMSE filters derived in Section 5.2 can be used. The next subsection compares their error rate performances under the specific conditions of multiple antenna systems and flat Rayleigh fading channels.
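As a rough illustration of how the linear multiuser detectors carry over to the model y = Ha + n, the following sketch applies a decorrelator and a linear MMSE filter to one random channel realization. It is not taken from the text: the function names, the unit-energy QPSK mapping, the noise variance and the channel model are assumptions.

```python
import numpy as np

def zf_detect(H, y):
    """Decorrelator: multiply with the Moore-Penrose pseudo-inverse of H."""
    return np.linalg.pinv(H) @ y

def mmse_detect(H, y, noise_var, symbol_var=1.0):
    """Linear MMSE filter (H^H H + noise_var/symbol_var * I)^-1 H^H y."""
    NT = H.shape[1]
    G = H.conj().T @ H + (noise_var / symbol_var) * np.eye(NT)
    return np.linalg.solve(G, H.conj().T @ y)

def qpsk_hard(z):
    """Hard decision onto a unit-energy QPSK alphabet (assumed mapping)."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

rng = np.random.default_rng(1)
NT, NR = 4, 6                      # load NT/NR = 2/3
H = (rng.standard_normal((NR, NT)) + 1j * rng.standard_normal((NR, NT))) / np.sqrt(2)
a = qpsk_hard(rng.standard_normal(NT) + 1j * rng.standard_normal(NT))

noise_var = 1e-2                   # assumed noise variance per receive antenna
n = np.sqrt(noise_var / 2) * (rng.standard_normal(NR) + 1j * rng.standard_normal(NR))
y = H @ a + n

a_zf = qpsk_hard(zf_detect(H, y))
a_mmse = qpsk_hard(mmse_detect(H, y, noise_var))
print(np.sum(a_zf != a), np.sum(a_mmse != a))   # symbol errors in this realization
```

Replacing H by a system matrix S with Ns rows and Nu columns gives the corresponding CDMA detectors, which is the structural equivalence described above.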
6.3.3 Performance of Multilayer Detection Schemes

In this section, the focus is on the combination of linear and nonlinear techniques as analyzed in Section 5.4. Since we look only at the first detection stage (no turbo detection), simple hard decisions are used as nonlinearities. The QLD-SIC strategies are compared with optimum maximum likelihood detection for different combinations of NT and NR as well as different modulation schemes (see footnote 3). Moreover, the influences of sorting and error propagation are illuminated.

Footnote 3: Besides the brute force approach that evaluates $\|\mathbf{y} - \mathbf{H}\tilde{\mathbf{x}}\|^2$ for each hypothesis $\tilde{\mathbf{x}}$ in order to find $\hat{\mathbf{x}} = \arg\min_{\tilde{\mathbf{x}}} \|\mathbf{y} - \mathbf{H}\tilde{\mathbf{x}}\|^2$, MLD can also be accomplished by means of sphere detection (Agrell et al. 2002; Fincke and Pohst 1985; Schnoor and Euchner 1994) with lower computational costs.

Figure 6.24 shows the results obtained for an uncoded QPSK system with NT = 4 transmit antennas, NR = 6 receive antennas, and i.i.d. Rayleigh fading channels between them. Diagram a) depicts the results for the zero-forcing criterion, while diagram b) shows those for the MMSE criterion. First, we see that the linear approaches lose about 1 dB compared to the unsorted interference cancellation schemes (QLD) because they do not consider the discrete nature of the signal alphabet. Applying the SQLD with suboptimum sorting, a gain of 3.3 dB is obtained for the zero-forcing as well as the MMSE solution at an error rate of $10^{-3}$.
Figure 6.24 Bit error rate performance of a) ZF and b) MMSE detectors for QPSK system with NT = 4 and NR = 6 antennas (bold line: maximum likelihood detection). Panels: a) ZF detection, b) MMSE detection; axes: BER versus Eb/N0 in dB; curves: linear, QLD, SQLD, SQLD+PSA.

The loss compared to the algorithm with optimum post sorting (SQLD+PSA), known as V-BLAST detection (Foschini et al. 1999) and described in Section 5.4.2, is negligible. This performance is obtained with only a fraction of the computational costs of the V-BLAST. The MMSE criterion outperforms the ZF approach by 0.6–0.7 dB at an error rate of $10^{-3}$ so that the SQLD+PSA algorithm approaches the ML detector up to a gap of 1 dB.

Next, we increase the load of the system by choosing only NR = 4 receive antennas. Figure 6.25 illustrates the QPSK signal spaces after each detection step of the SQLD-SIC algorithm. After filtering with $\mathbf{Q}^H$, we have to start the successive detection from top to bottom. The first layer does not suffer from interference and its symbols can be directly detected. For this scenario, an SNR of 16.9 dB was achieved. In the second layer, we can clearly recognize the superposition of two QPSK constellations prior to the interference cancellation step. The QPSK constellation with an SNR of 13.7 dB is obtained only after subtracting the contribution of the first layer. In all subsequent layers, the signal constellations become more obvious after each cancellation step.

Owing to the higher load, it can be expected that both the error rates and the gap to the maximum likelihood detector will increase. The results in Figure 6.26 confirm these predictions. The relations between the curves in diagrams a) and b) are similar to the case of six receive antennas. However, the algorithms based on the zero-forcing criterion do not work satisfactorily since the gap to the MLD amounts to more than 10 dB even with optimum post sorting. The MMSE variants perform much better because they avoid the noise amplification. The loss of the SQLD with PSA compared to the MLD grows up to 2.8 dB.
Moreover, the difference between the SQLD with and without post sorting becomes larger at high SNRs. Additionally, the curves for a theoretical device called genie-aided detector, indicated by the bold dashed lines, are shown. It allows a perfect interference cancellation and, therefore, avoids error propagation. Nevertheless, detection errors occur in each layer. The additional gain obtained by the genie detector emphasizes that even with optimum post sorting, error propagation is a severe problem. At an error rate of $2 \cdot 10^{-3}$, the loss amounts to 4 dB for the ZF solution and approximately 2 dB at $10^{-3}$ in the MMSE case.

Figure 6.25 Signal spaces per layer after different detection steps of SQLD-SIC for NT = 4 transmit and NR = 4 receive antennas with QPSK modulation. Rows: layers 1 to 4; columns: steps 1 to 4; SNRs after cancellation for layers 1 to 4: 16.9 dB, 13.7 dB, 14.6 dB, 15.5 dB.

Figure 6.26 Bit error rate performance of a) ZF and b) MMSE detectors for NT = NR = 4 antennas and QPSK (bold solid line: maximum likelihood detection, bold dashed line: genie-aided detector). Panels: a) ZF detection, b) MMSE detection; axes: BER versus Eb/N0 in dB; curves: linear, QLD, SQLD, SQLD+PSA.

This effect is further examined in Figure 6.27, which shows layer-specific error rates for ZF-SQLD-SIC with and without a genie-aided detector. With the genie-aided detector, substantial improvements are achieved with each cancellation step. Layer 4 does not suffer from interference at all and comes very close to the MLD performance. As we know from Section 1.5, the different slopes of the curves indicate that each layer experiences a different diversity degree. In the first layer, the main purpose of $\mathbf{Q}^H$ is to suppress interference so that only a diversity degree of NR − NT + 1 = 1 is achievable (Wübben et al. 2002). With each SIC step, we gain a degree of freedom because fewer layers have to be suppressed so that the linear filter tends toward a matched filter and collects more and more diversity. Hence, the diversity degree grows up to NR = 4 for the last layer. Without the genie, the performance of all layers is dominated by the first one because of error propagation.

Figure 6.27 Demonstration of error propagation for QPSK system with ZF-SQLD-SIC and NT = NR = 4 (bold solid line: maximum likelihood detection, bold dashed line: genie-aided detector). Axes: BER versus Eb/N0 in dB; curves: layers 1 to 4.

In order to achieve higher spectral efficiencies, we can employ modulation schemes with more bits per symbol. Figure 6.28 illustrates the results for 16-QAM. Since it is more sensitive to noise and interference than QPSK, larger SNRs are required to obtain similar error rates. Moreover, the SQLD-SIC even with optimum post sorting loses a lot compared to the MLD bound. For approximately the same number of receive antennas as transmit antennas, the matrix describing the MIMO channel is generally poorly conditioned. In this case, the noise amplification of the zero-forcing approach is very high and the MMSE filter also performs poorly.
A significant improvement coming close to the MLD performance even under these severe conditions is described in the next subsection. It was not already presented in the context of MUD techniques in Chapter 5 because the considered algorithm works best for small matrices and the system matrix S of a CDMA system is generally much larger than H.
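The successive detection "from top to bottom" discussed in this section can be sketched in a few lines. The sketch covers only the unsorted zero-forcing variant (no SQLD sorting, no post sorting), and since NumPy provides a QR but no QL routine, the QL factors are built from the QR decomposition of the column-reversed matrix. Function names, the QPSK mapping and the channel realization are assumptions for illustration.

```python
import numpy as np

def ql_decomposition(H):
    """Return Q, L with H = Q @ L and L lower triangular.

    Obtained from NumPy's QR decomposition of the column-reversed matrix.
    """
    Q_rev, R_rev = np.linalg.qr(H[:, ::-1])
    return Q_rev[:, ::-1], R_rev[::-1, ::-1]

def qpsk_hard(z):
    """Hard decision onto a unit-energy QPSK alphabet (assumed mapping)."""
    return (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2)

def zf_qld_sic(H, y):
    """Unsorted ZF successive interference cancellation based on H = Q L."""
    Q, L = ql_decomposition(H)
    y_tilde = Q.conj().T @ y                   # filtering with Q^H
    NT = H.shape[1]
    x_hat = np.zeros(NT, dtype=complex)
    for k in range(NT):                        # top to bottom
        # the first layer sees no interference; later layers subtract
        # the contributions of the layers that are already decided
        interference = L[k, :k] @ x_hat[:k]
        x_hat[k] = qpsk_hard((y_tilde[k] - interference) / L[k, k])
    return x_hat

rng = np.random.default_rng(2)
NT = NR = 4
H = (rng.standard_normal((NR, NT)) + 1j * rng.standard_normal((NR, NT))) / np.sqrt(2)
a = qpsk_hard(rng.standard_normal(NT) + 1j * rng.standard_normal(NT))

Q, L = ql_decomposition(H)
x_hat = zf_qld_sic(H, H @ a)                   # noiseless transmission
print(np.allclose(x_hat, a))
```

A wrong hard decision in an early layer enters all later interference terms, which is the error propagation that the genie-aided detector removes.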
Figure 6.28 Performance of a) ZF-SQLD-SIC and b) MMSE-SQLD-SIC detectors for a 16-QAM system with NT = 4 and NR = 4. Axes: BER versus Eb/N0 in dB; curves: linear, SQLD+PSA, ML.
6.3.4 Lattice Reduction-Aided Detection

The aforementioned techniques have severe problems if the channel matrix H is badly conditioned. The condition number of a matrix H is defined as

$$\kappa(\mathbf{H}) = \frac{\sigma_{\max}}{\sigma_{\min}} \geq 1, \tag{6.70}$$

that is, it equals the ratio of the largest and smallest singular value (Strang 1988). With $\|\mathbf{H}\|_2$ as the spectral norm of H (see definition in Appendix C on page 336), the condition number becomes

$$\kappa(\mathbf{H}) = \|\mathbf{H}\|_2 \cdot \|\mathbf{H}^{-1}\|_2. \tag{6.71}$$

For orthogonal real matrices with $\mathbf{H}^{-1} = \mathbf{H}^T$, the condition number is 1 and no noise amplification occurs for linear detectors. Hence, a matched filter and decorrelator are identical and optimum receivers. On the contrary, ill-conditioned matrices with large κ(H) lead to high noise enhancements and, therefore, to severe detection problems especially for the decorrelator. In conclusion, we can state that it would be desirable to have a roughly orthogonal matrix with a condition number close to 1 (see footnote 4). This goal can be accomplished with the lattice reduction (LR) technique (Windpassinger and Fischer 2003a,b; Wübben et al. 2004b,c).

Footnote 4: It is sufficient that the columns in H are orthogonal but have different norms. In this case, $\kappa(\mathbf{H}) \neq 1$ holds and $\mathbf{H}^H \mathbf{H}$ would result in a diagonal but not the identity matrix.

For the derivation, we use the real-valued description of a communication link as described in (5.65) on page 261

$$\begin{bmatrix} \mathbf{y}' \\ \mathbf{y}'' \end{bmatrix} = \begin{bmatrix} \mathbf{H}' & -\mathbf{H}'' \\ \mathbf{H}'' & \mathbf{H}' \end{bmatrix} \cdot \begin{bmatrix} \mathbf{x}' \\ \mathbf{x}'' \end{bmatrix} + \begin{bmatrix} \mathbf{n}' \\ \mathbf{n}'' \end{bmatrix} \quad \Leftrightarrow \quad \mathbf{y}^r = \mathbf{H}^r \mathbf{x}^r + \mathbf{n}^r,$$

where single and double primes denote real and imaginary parts, respectively. Restricting ourselves to multiamplitude signaling such as M-ASK or M-QAM, the data symbols $x'_\nu, x''_\nu \in \mathcal{X}$ can be expressed as integers with appropriate scaling. From Section 1.4, we know that the set $\mathcal{X}$ can be chosen according to

$$\mathcal{X}_{\mathrm{ASK}} = \{\pm e, \pm 3e, \ldots, \pm (M-1)e\} \quad \text{with} \quad e = \sqrt{\frac{3}{M^2-1} \cdot \frac{E_s}{T_s}} \tag{6.72a}$$

$$\mathcal{X}_{\mathrm{QAM}} = \{\pm e, \pm 3e, \ldots, \pm (\sqrt{M}-1)e\} \quad \text{with} \quad e = \sqrt{\frac{3}{2(M-1)} \cdot \frac{E_s}{T_s}}. \tag{6.72b}$$
Hence, $\mathcal{X}$ contains all odd multiples of e in a certain interval whose size depends on M. PSK modulation is explicitly excluded from this derivation. Instead of using only odd numbers, the transformation

$$x = (2b - 1) \cdot e \tag{6.73}$$

with consecutive integers $b \in \mathcal{B} = \{-M/2 + 1, \ldots, M/2\}$ delivers the same set $\mathcal{X}_{\mathrm{ASK}}$. An equivalent expression is obtained for QAM by exchanging M with $\sqrt{M}$. The product $\mathbf{H}^r \cdot \mathbf{b}$ describes the linear combination of the columns in $\mathbf{H}^r$ using as coefficients the integer elements in $\mathbf{b}$. Considering all possible vectors $\mathbf{b} \in \mathcal{B}^{2N_T}$, the columns span a lattice in which each linear combination determines a single point. With the generalization that all integers $b \in \mathbb{Z}$ are considered, the lattice becomes infinite.

An example for the case of NT = 1 is depicted in Figure 6.29. All points in this lattice can be reached by linear integer combinations of the vectors $\mathbf{h}^r_1$ and $\mathbf{h}^r_2$. In our example, the vectors $\mathbf{h}^r_1 = [2 \; 1]^T$ and $\mathbf{h}^r_2 = [2 \; 2]^T$ are obviously not orthogonal to each other but highly correlated. This is also indicated by the condition number $\kappa(\mathbf{H}^r) = 6.3$ so that a multiplication with the Moore–Penrose inverse of $\mathbf{H}^r$ would lead to an amplification of the background noise. The basic principle behind the LR technique is to find vectors $\tilde{\mathbf{h}}^r_1$ and $\tilde{\mathbf{h}}^r_2$ that generate the same lattice and are roughly orthogonal. For our example, we can find

$$\tilde{\mathbf{h}}^r_1 = 2\mathbf{h}^r_1 - \mathbf{h}^r_2 = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \tag{6.74a}$$

$$\tilde{\mathbf{h}}^r_2 = \mathbf{h}^r_1 - \mathbf{h}^r_2 = \begin{bmatrix} 0 \\ -1 \end{bmatrix} \tag{6.74b}$$

which are exactly orthogonal.

Figure 6.29 Illustration of the principle of lattice reduction for NT = 1 with $\mathbf{h}^r_1 = [2 \; 1]^T$ and $\mathbf{h}^r_2 = [2 \; 2]^T$

The new matrix

$$\mathbf{H}_{\mathrm{red}} = \begin{bmatrix} \tilde{\mathbf{h}}^r_1 & \tilde{\mathbf{h}}^r_2 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & -1 \end{bmatrix} \tag{6.75}$$

has been obtained by subtracting integer multiples of one column from the other. It has a condition number $\kappa(\mathbf{H}_{\mathrm{red}}) = 2$ that is much smaller than $\kappa(\mathbf{H}^r) = 6.3$ for the original matrix. Generally, any matrix $\tilde{\mathbf{H}}^r$ generated from $\mathbf{H}^r$ by exchanging columns, multiplying columns with −1, and adding integer multiples of one column to another spans the same lattice. These linear operations can be expressed by a $2N_T \times 2N_T$ unimodular matrix $\mathbf{T}$, that is, $\mathbf{T}$ consists of only integers and its determinant takes only the values ±1 (Schnoor and Euchner 1994). Therefore, the inverse of $\mathbf{T}$ always exists, contains only integers, and allows the operation $\mathbf{T} \cdot \mathbf{T}^{-1} = \mathbf{I}_{2N_T}$. The reduced channel matrix is obtained by

$$\mathbf{H}_{\mathrm{red}} = \mathbf{H}^r \cdot \mathbf{T}. \tag{6.76}$$
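The numbers of this two-dimensional example are easy to verify. The short check below uses the transformation matrix that the text states for this example, with first column (2, −1) and second column (1, −1); the variable names are illustrative assumptions.

```python
import numpy as np

Hr = np.array([[2.0, 2.0],
               [1.0, 2.0]])      # columns h1 = [2 1]^T and h2 = [2 2]^T
T = np.array([[2, 1],
              [-1, -1]])         # unimodular transformation of the example

H_red = Hr @ T                   # (6.76)

# np.linalg.cond returns sigma_max / sigma_min by default, as in (6.70)
kappa_orig = np.linalg.cond(Hr)
kappa_red = np.linalg.cond(H_red)

det_T = round(np.linalg.det(T))  # must be +1 or -1 for a unimodular matrix
T_inv = np.linalg.inv(T)         # contains only integers

print(H_red)                     # columns [2 0]^T and [0 -1]^T
print(round(kappa_orig, 1), round(kappa_red, 1), det_T)
```

The reduced basis is exactly orthogonal here, but its condition number is 2 rather than 1 because the two columns have different lengths.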
For the above example, $\mathbf{T} = \begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}$ holds. This result is explained in Appendix C.3 in more detail. The goal is now to find a transformation matrix $\mathbf{T}$ such that the condition number of $\mathbf{H}_{\mathrm{red}}$ is as small as possible; in particular, $\kappa(\mathbf{H}_{\mathrm{red}}) < \kappa(\mathbf{H}^r)$ should hold. Finding a nearly orthogonal basis is equivalent to the problem of finding vectors of minimum length. This can be easily seen from Figure 6.29: only if neither of the two vectors has a significant projection onto the other are both short and nearly perpendicular. A first step toward this goal is the QL decomposition of $\mathbf{H}^r$ since it already determines a set of orthogonal vectors $\mathbf{q}_\nu$ with $\nu = 1, \ldots, 2N_T$. However, their linear combination is given by $\mathbf{L}$ whose elements are generally real numbers and not integers. Nevertheless, this decomposition allows an efficient lattice reduction especially if appropriate sorting is applied. The exact definition of a reduced lattice and a detailed description of the LLL algorithm as an example for a lattice reduction can be found in Appendix C.3. Once we find a transformation matrix $\mathbf{T}$ and, thus, the reduced matrix $\mathbf{H}_{\mathrm{red}}$, we can rewrite the received signal as

$$\mathbf{y}^r = \mathbf{H}^r \mathbf{x}^r + \mathbf{n}^r = \mathbf{H}^r \mathbf{T} \cdot \mathbf{T}^{-1} \mathbf{x}^r + \mathbf{n}^r = \mathbf{H}_{\mathrm{red}} \cdot \mathbf{T}^{-1} \mathbf{x}^r + \mathbf{n}^r. \tag{6.77}$$

Applying the transformation in (6.73) to the whole vector $\mathbf{b} = [b_1, \ldots, b_{2N_T}]^T$, the product $\mathbf{T}^{-1} \mathbf{x}^r$ can be expressed by

$$\mathbf{T}^{-1} \mathbf{x}^r = \mathbf{T}^{-1} \left( 2\mathbf{b} - \mathbf{1}_{2N_T \times 1} \right) e = \left( 2\mathbf{z} - \mathbf{T}^{-1} \mathbf{1}_{2N_T \times 1} \right) e, \tag{6.78}$$

where the new variable $\mathbf{z} = \mathbf{T}^{-1} \mathbf{b}$ contains only integer numbers according to the properties of $\mathbf{T}$. The received signal

$$\mathbf{y}^r = 2e \, \mathbf{H}_{\mathrm{red}} \mathbf{z} - e \, \mathbf{H}_{\mathrm{red}} \mathbf{T}^{-1} \mathbf{1}_{2N_T \times 1} + \mathbf{n}^r \tag{6.79}$$
consists of three parts: the first term depends on the desired information z, the second is a constant and the third term describes the noise contribution. Since the modified channel matrix Hred is generally better conditioned, we can apply linear detectors as well as SQLD-SIC-based approaches known from Chapter 5 without suffering much from the noise
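amplification. The main difference compared to the conventional detection is that the decision is made with respect to the integer variable $\mathbf{z}$ instead of $\mathbf{x}^r$. Assuming the decorrelating detector for simplicity, we obtain with $\mathbf{H}^\dagger_{\mathrm{red}} \cdot \mathbf{H}_{\mathrm{red}} = \mathbf{I}_{2N_T}$

$$\frac{1}{2e} \cdot \mathbf{H}^\dagger_{\mathrm{red}} \mathbf{y}^r = \mathbf{z} - \frac{1}{2} \cdot \mathbf{T}^{-1} \mathbf{1}_{2N_T \times 1} + \frac{1}{2e} \cdot \mathbf{H}^\dagger_{\mathrm{red}} \mathbf{n}^r. \tag{6.80}$$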
Moving the constant offset to the left-hand side of (6.80) and performing an element-wise integer decision $\mathcal{Q}(\cdot)$ delivers the estimate

$$\hat{\mathbf{z}} = \mathcal{Q}\left\{ \frac{1}{2e} \cdot \mathbf{H}^\dagger_{\mathrm{red}} \mathbf{y}^r + \frac{1}{2} \cdot \mathbf{T}^{-1} \mathbf{1}_{2N_T \times 1} \right\}. \tag{6.81}$$

Owing to this decision, the LR-aided detection is a nonlinear technique although the decorrelator itself is linear. Finally, the estimate $\hat{\mathbf{z}}$ has to be transformed back into the original domain by $\hat{\mathbf{b}} = \mathbf{T}\hat{\mathbf{z}}$, leading, with (6.73), to

$$\hat{\mathbf{x}}^r = \left( 2\hat{\mathbf{b}} - \mathbf{1}_{2N_T \times 1} \right) e. \tag{6.82}$$

Certainly, the MMSE detector or the SQLD-SIC scheme can also be applied instead of the decorrelator. The complete receiver structure is depicted in Figure 6.30. Starting with the QL decomposition of $\mathbf{H}^r$ already known from Section 5.4, the matrices $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{P}$ are obtained. They are fed into the block termed lattice reduction that provides a QL representation of a reduced matrix $\mathbf{H}_{\mathrm{red}}$ consisting of a unitary matrix $\mathbf{Q}_{\mathrm{red}}$, a lower triangular matrix $\mathbf{L}_{\mathrm{red}}$, and a transformation matrix $\mathbf{T}$. They are subsequently used for the detection process, for example, linear filtering or SQLD-SIC, as well as for the transformation back into the original signal space.

Figure 6.30 Structure of lattice reduction-aided QLD-SIC detector

Contrary to the lattice theory, we have only finite signal spaces $\mathcal{X}$ and $\mathcal{B}$ in the original domain, and, therefore, also in the transformed domain. Moreover, it is very important to realize that the signal space of the new variable $\mathbf{z}$ is not identical to the original space $\mathcal{B}^{2N_T}$ as is illustrated in Figure 6.31. The dimensions are no longer linearly independent so that an optimum decision would require a vector quantization providing an estimate of the entire vector $\hat{\mathbf{z}}$. In order to keep the computational costs as low as possible, we apply instead a scalar quantization that separately delivers estimates $\hat{z}_\mu \in \mathbb{Z}$, that is, we look for the integer closest to our decision variable. Hence, it may happen that $\hat{\mathbf{z}}$ lies outside the transformed alphabet so that the backward transformation $\hat{\mathbf{b}} = \mathbf{T}\hat{\mathbf{z}}$ also delivers a result outside $\mathcal{B}^{2N_T}$. In this case, we have to apply a second quantization so that valid symbols are obtained. Although this procedure is not optimum, we will see that a performance close to that of the optimum maximum likelihood detector can be achieved.

Figure 6.31 Original and transformed signal spaces for NT = 1

With reference to the application of the LR-based detection in coded systems, it has to be mentioned that the hard decision in the transformed domain makes the calculation of LLRs for the original domain difficult. This is of special interest because hard decisions prior to decoding lead to a loss in performance. Extensions of the LR-aided detection providing soft information are still a current research topic.

Extension to MMSE Solution

So far, only the zero-forcing implementation of the LR has been described. A better performance can be obtained with the MMSE criterion. Remembering Section 5.4 on page 268, the QL decomposition for the MMSE solution was obtained by extending the system matrix $\mathbf{S}$ according to

$$\underline{\mathbf{S}} = \begin{bmatrix} \mathbf{S} \\ \frac{\sigma_N}{\sigma_A} \mathbf{I}_{N_u} \end{bmatrix}.$$
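A direct application of this approach to the reduced channel matrix $\mathbf{H}_{\mathrm{red}}$ results in

$$\begin{bmatrix} \mathbf{H}_{\mathrm{red}} \\ \frac{\sigma_N}{\sigma_X} \cdot \mathbf{I}_{N_T} \end{bmatrix} = \mathbf{Q}_{\mathrm{red}} \cdot \mathbf{L}_{\mathrm{red}} \cdot \mathbf{P}_{\mathrm{red}} \tag{6.83}$$

and a subsequent QL decomposition would deliver $\mathbf{Q}_{\mathrm{red}}$, $\mathbf{L}_{\mathrm{red}}$, and $\mathbf{P}_{\mathrm{red}}$. However, this does not yield the best performance because the MMSE solution inverts the matrix $\left[ (\mathbf{H}^r)^H \mathbf{H}^r + \sigma_N^2 / \sigma_X^2 \, \mathbf{I}_{2N_T} \right]$ so that we need a roughly orthogonal basis of the extended matrix $\underline{\mathbf{H}}$ and not of $\mathbf{H}$. Instead, the channel matrix $\mathbf{H}$ should be directly extended to

$$\underline{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \frac{\sigma_N}{\sigma_X} \cdot \mathbf{I}_{N_T} \end{bmatrix} \tag{6.84}$$

and the QL decomposition as well as the LR are applied to $\underline{\mathbf{H}}$. This results in $\underline{\mathbf{H}}_{\mathrm{red}} = \underline{\mathbf{H}} \mathbf{T}$ (Wübben et al. 2004b).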
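The zero-forcing variant of the LR-aided detector, that is, the steps (6.80) to (6.82) including the second quantization, can be sketched for the two-dimensional example with columns [2 1]^T and [2 2]^T. This is only an illustration under assumptions: the transformation matrix is the one given for that example instead of the output of an LLL reduction, and the function names, the choice e = 1, M = 4 and the noise vector are made up for the demonstration.

```python
import numpy as np

def quantize_ask(x, e, M):
    """Nearest point of {+-e, +-3e, ..., +-(M-1)e} for each element of x."""
    b = np.clip(np.rint((x / e + 1) / 2), 1 - M // 2, M // 2)
    return (2 * b - 1) * e

def lr_zf_detect(y, Hr, T, e, M):
    """LR-aided decorrelator following (6.80)-(6.82)."""
    H_red = Hr @ T
    ones = np.ones(Hr.shape[1])
    T_inv = np.linalg.inv(T)
    # (6.81): scalar integer decision in the transformed domain
    z_hat = np.rint(np.linalg.pinv(H_red) @ y / (2 * e) + 0.5 * T_inv @ ones)
    # back-transformation b = T z and second quantization onto the set B
    b_hat = np.clip(T @ z_hat, 1 - M // 2, M // 2)
    return (2 * b_hat - 1) * e                 # (6.82)

Hr = np.array([[2.0, 2.0],
               [1.0, 2.0]])
T = np.array([[2, 1],
              [-1, -1]])
e, M = 1.0, 4                                  # 4-ASK per real dimension (assumption)

x = np.array([1.0, -1.0])                      # transmitted symbols, b = [1, 0]
n = np.array([0.7, -0.7])                      # one hand-picked noise vector
y = Hr @ x + n

x_lr = lr_zf_detect(y, Hr, T, e, M)
x_zf = quantize_ask(np.linalg.pinv(Hr) @ y, e, M)   # conventional decorrelator
print(x_lr, x_zf)   # x_lr equals x, while x_zf is [3, -3] for this noise vector
```

For this particular noise vector the conventional decorrelator amplifies the noise enough to cross the decision thresholds in both dimensions, whereas the decision in the transformed domain is still correct. A single realization like this only illustrates the mechanism; it says nothing about average error rates.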
Simulation Results

Now, we want to compare the performance of the introduced LR approach to the detection techniques already described in Chapter 5. We consider multiple antenna systems with an identical number of receive and transmit antennas. Moreover, uncorrelated flat Rayleigh fading channels between different pairs of transmit and receive antennas are assumed. Note that no iterations according to the turbo principle are carried out so that we regard a one-stage detector. If the loss compared to the maximum likelihood detector is large, the performance can be improved by iterative schemes as shown in Chapter 5.

Figure 6.32 Performance of lattice reduction aided detection for QPSK system with NT = NR = 4 (solid bold line: MLD performance, bold dashed line: linear detectors). Panels: a) ZF, b) MMSE; axes: BER versus Eb/N0 in dB; curves: SQLD-SIC, LR, LR-SQLD-SIC.

Figure 6.32 compares the BER performance of an uncoded 4-QAM system with NT = NR = 4 antennas at the transmitter and receiver. Figure 6.32a summarizes the zero-forcing results. The simple decorrelator (bold dashed curve) based on the original channel matrix H shows the worst performance. It severely amplifies the background noise and cannot exploit diversity, so the slope of the curve corresponds to a diversity degree of D = NR − NT + 1 = 1. The ZF-SQLD-SIC detection gains about 7 dB at $10^{-2}$ compared to the decorrelator but is still far away from the maximum likelihood performance. It can only partly exploit the diversity as will be shown in Figure 6.33. The decorrelator based on the reduced channel matrix $\mathbf{H}_{\mathrm{red}}$, labeled LR, performs slightly worse than the ZF-SQLD-SIC at low SNRs and much better at high SNRs (see footnote 5). At an error rate of $2 \cdot 10^{-3}$, the gain already amounts to 4 dB. On the one hand, the LR-aided decorrelator does not enhance the background noise very much owing to the nearly orthogonal structure. On the other hand, it fully exploits the diversity in all layers as indicated by the higher slope of the error rate curve.
Footnote 5: As already mentioned, the system representation by a reduced channel matrix requires a decision in the transformed domain and a subsequent inverse transformation. Therefore, the whole detector is nonlinear although a linear device was employed in the transformed domain.

Since the reduced channel matrix $\mathbf{H}_{\mathrm{red}}$ is not perfectly orthogonal, multilayer interference still disturbs the decision. Hence, a subsequent nonlinear successive interference cancellation applying hard decisions (ZF-SQLD-SIC) can improve the performance by 1 dB. The gain is not as high as for the conventional SQLD-SIC owing to the good condition of $\mathbf{H}_{\mathrm{red}}$.

Looking at the MMSE solutions in Figure 6.32b, we recognize that all curves move closer to the MLD performance. The linear MMSE filter based on H performs worst, while the LR-based counterpart outperforms the MMSE-SQLD-SIC at high SNR. The LR-SQLD-SIC improves the performance such that the MLD curve is reached. Thus, we can conclude that the LR technique improves the performance significantly and that it is well suited for enhancing the signal detection in environments with severe multiple access interference. For the considered scenario, near-maximum likelihood performance is achieved with much lower computational costs.

Next, we analyze how the different detectors exploit diversity. From Figure 6.27, we know already that each layer experiences a different diversity degree for QLD-SIC-based approaches. This is again illustrated in Figure 6.33 for the ZF and MMSE criteria. The curves have been obtained by employing a genie-aided detector that perfectly avoids error propagation. Hence, the error rates truly represent the different diversity degrees and do not suffer from errors made in the previous detection steps. The results for the LR-based detection are depicted with only one curve because the error rates of all the layers are nearly identical. Hence, all layers experience the same diversity degree of D = 4 (compare slope with SQLD-SIC-4) so that even the first layer can be detected with high reliability. Since this layer dominates the average error rate especially in the absence of a genie, this represents a major benefit compared to QLD-SIC schemes. With reference to the MMSE solution, the differences are not as large but still observable.
At very low SNRs, the genie-aided MMSE-SQLD-SIC even outperforms the maximum likelihood detector because no layer suffers from interference and decisions are made layer by layer while the MLD has to cope with all layers simultaneously.
Figure 6.33 Illustration of diversity degree per layer for SQLD and lattice reduction aided detection for QPSK system with NT = NR = 4 (solid bold line: MLD performance). Panels: a) ZF, b) MMSE; axes: BER versus Eb/N0 in dB; curves: SQLD-SIC-1, SQLD-SIC-2, SQLD-SIC-3, SQLD-SIC-4, LR-SQLD-SIC.
Figure 6.34 Performance of lattice reduction aided detection for 16-QAM system with NT = NR = 4 (solid bold line: MLD performance, bold dashed line: linear detector). Panels: a) ZF, b) MMSE; axes: BER versus Eb/N0 in dB; curves: SQLD-SIC, LR, LR-SQLD-SIC.

Figure 6.34 shows the performance of the same system for 16-QAM. First, it has to be mentioned that the computational complexity of LR itself is independent of the size of the modulation alphabet. This is a major advantage compared to the ML detector because the number of hypotheses it has to test, $M^{N_T}$ for an alphabet of size M, grows rapidly with the alphabet size. Compared to QPSK, larger SNRs are needed to achieve the same error rates. However, the relations between the curves are qualitatively still the same. The LR-based SQLD-SIC gains 1 dB compared to the LR-based decorrelator, or 2 dB for the MMSE solution. The SQLD-SIC approach based on the original channel matrix is clearly outperformed, but the MLD performance is no longer reached and a gap of approximately 1 dB remains for the MMSE approach.

Finally, a larger system with NT = NR = 6 and 16-QAM is considered. Figure 6.35 shows that the LR-based SQLD-SIC still outperforms the detector based on H but the gap to the maximum likelihood detector becomes larger. The reason is the efficient but suboptimum LLL algorithm (see Appendix C.3) used for the LR. It loses in performance for large matrices because the inherent sorting gets worse. This is also the reason why the LR-aided detector was not introduced in the context of multiuser detection in CDMA systems in Chapter 5. The considered CDMA systems have many more inputs and outputs (larger system matrices S) than the multiple antenna systems analyzed here so that no advantage could have been observed when compared with the conventional SQLD-SIC.
Figure 6.35 Performance of lattice reduction aided detection for 16-QAM system with NT = NR = 6 (solid bold line: MLD performance). Panels: a) ZF, b) MMSE; axes: BER versus Eb/N0 in dB; curves: SQLD-SIC, LR-SQLD-SIC.

6.4 Linear Dispersion Codes

A unified description for space–time coding and spatial multilayer transmission can be obtained by linear dispersion (LD) codes that were first introduced by Hassibi and Hochwald (2000, 2001, 2002). Moreover, this approach offers the possibility of finding a trade-off between diversity and multiplexing gain (Heath and Paulraj 2002). Generally, the matrix $\mathbf{X}$ describing the space–time codeword or the BLAST transmit matrix is made up of K symbols $a_\mu$. As we know from STTCs, a linear description requires the symbols $a_\mu$ and their conjugate complex counterparts or, alternatively, the real-valued representation by $a'_\mu$ and $a''_\mu$ with $a_\mu = a'_\mu + j a''_\mu$. The codeword can be constructed by

$$\mathbf{X} = \sum_{\mu=1}^{K} \left( \mathbf{B}^c_{1,\mu} \cdot a_\mu + \mathbf{B}^c_{2,\mu} \cdot a^*_\mu \right) = \sum_{\mu=1}^{K} \left( \mathbf{B}^r_{1,\mu} \cdot a'_\mu + \mathbf{B}^r_{2,\mu} \cdot a''_\mu \right) = \sum_{\mu=1}^{2K} \mathbf{B}^r_\mu \cdot a^r_\mu. \tag{6.85}$$
The dispersion matrices $\mathbf{B}^c_{i,\mu}$ with i = 1, 2 are used for the complex description, where the index i = 1 is associated with the original symbols and i = 2 with their complex conjugate versions. The real-valued alternative in (6.85) also uses 2K matrices $\mathbf{B}^r_{i,\mu}$ and distinguishes between real and imaginary parts by using indices i = 1, 2, respectively. A generalization is obtained with the right-hand side in (6.85) assuming a set of 2K real-valued symbols $a^r_\mu$ with $1 \leq \mu \leq 2K$. The first K elements may represent the real parts $a'_\mu$ and the second K elements the imaginary parts $a''_\mu$. It depends on the choice of the matrices whether a space–time code, a multilayer transmission, or a combination of both is implemented. In the following, a few examples illustrate how LD codes work.
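6.4.1 LD Description of Alamouti's Scheme
First, we look at Alamouti's STBC. As we know, the codeword $\mathbf{X}_2$ comprises $K = 2$ symbols that are arranged over two antennas and two time slots. The matrix has the form
$$\mathbf{X}_2 = \begin{pmatrix} a_1 & -a_2^* \\ a_2 & a_1^* \end{pmatrix} = \begin{pmatrix} a'_1 & 0 \\ 0 & a'_1 \end{pmatrix} + \begin{pmatrix} 0 & -a'_2 \\ a'_2 & 0 \end{pmatrix} + \begin{pmatrix} j a''_1 & 0 \\ 0 & -j a''_1 \end{pmatrix} + \begin{pmatrix} 0 & j a''_2 \\ j a''_2 & 0 \end{pmatrix} .$$
For the complex-valued description, matching the codeword entry by entry with (6.85) gives the matrices
$$\mathbf{B}^c_{1,1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \mathbf{B}^c_{1,2} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad \mathbf{B}^c_{2,1} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mathbf{B}^c_{2,2} = \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix} .$$
Consequently, the real-valued case uses the matrices
$$\mathbf{B}^r_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \mathbf{B}^r_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \quad \mathbf{B}^r_3 = \begin{pmatrix} j & 0 \\ 0 & -j \end{pmatrix}, \quad \mathbf{B}^r_4 = \begin{pmatrix} 0 & j \\ j & 0 \end{pmatrix}$$
where $\mathbf{B}^r_1$ is associated with $a'_1$, $\mathbf{B}^r_2$ with $a'_2$, $\mathbf{B}^r_3$ with $a''_1$, and $\mathbf{B}^r_4$ with $a''_2$. In the same way, dispersion matrices can be developed for any linear STBC. However, codes without orthogonal designs may require high computational decoding costs because simple matched filtering is not optimum anymore.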
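The LD description of Alamouti's scheme can be checked numerically. The following sketch builds the codeword three ways (directly, from the complex-valued dispersion matrices, and from the real-valued ones) and compares the results. The function names and the example symbols are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Complex-valued dispersion matrices, indexed as B_c[i][mu]:
# i = 0 multiplies a_mu, i = 1 multiplies conj(a_mu); mu = 0, 1 stands for a_1, a_2.
B_c = [
    [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [1, 0]])],
    [np.array([[0, 0], [0, 1]]), np.array([[0, -1], [0, 0]])],
]

# Real-valued dispersion matrices, ordered as (a_1', a_2', a_1'', a_2'').
B_r = [
    np.array([[1, 0], [0, 1]], dtype=complex),
    np.array([[0, -1], [1, 0]], dtype=complex),
    np.array([[1j, 0], [0, -1j]]),
    np.array([[0, 1j], [1j, 0]]),
]

def alamouti_direct(a):
    """Alamouti codeword written out explicitly."""
    a1, a2 = a
    return np.array([[a1, -np.conj(a2)], [a2, np.conj(a1)]])

def ld_complex(a):
    """Codeword from the complex-valued LD description."""
    return sum(B_c[0][m] * a[m] + B_c[1][m] * np.conj(a[m]) for m in range(2))

def ld_real(a):
    """Codeword from the real-valued LD description."""
    a_r = np.concatenate([a.real, a.imag])  # real parts first, then imaginary parts
    return sum(B * s for B, s in zip(B_r, a_r))

a = np.array([1 + 1j, -1 + 1j]) / np.sqrt(2)  # two example QPSK symbols
X_direct = alamouti_direct(a)
X_complex = ld_complex(a)
X_real = ld_real(a)
```

All three constructions should agree, and the columns of the codeword stay orthogonal for any choice of symbols.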
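6.4.2 LD Description of Multilayer Transmissions
Next, we take a look at the multilayer transmission, for example, the BLAST architecture. Following the description of the previous section, $N_T$ independent symbols are simultaneously transmitted at each time instant. Hence, each codeword matrix has exactly $L = 1$ column so that the dispersion matrices reduce to column vectors. For the complex-valued variant, the vector $\mathbf{B}^c_{1,\mu}$ consists only of zeros with a single one at the $\mu$th position while $\mathbf{B}^c_{2,\mu} = \mathbf{0}_{N_T \times 1}$ holds. For the special case of $N_T = 2$, we obtain
$$\mathbf{B}^c_{1,1} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{B}^c_{1,2} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{B}^c_{2,1} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \mathbf{B}^c_{2,2} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} .$$
In contrast,
$$\mathbf{B}^r_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \mathbf{B}^r_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{B}^r_3 = \begin{pmatrix} j \\ 0 \end{pmatrix}, \quad \mathbf{B}^r_4 = \begin{pmatrix} 0 \\ j \end{pmatrix}$$
holds for the real-valued case.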
6.4.3 LD Description of Beamforming
Even beamforming in multiple-input multiple-output (MIMO) systems can be described by linear dispersion codes. While the matrices $\mathbf{B}^{c,r}_\mu$ used so far have been independent of the instantaneous channel matrix, the transmitter certainly requires channel state information (CSI) when beamforming is applied. Considering a MISO system, the channel matrix reduces to a row vector $\mathbf{h}$ that directly represents the singular vector to be used for beamforming (see page 306). Using the complex notation, the LD description becomes
$$\mathbf{x} = \mathbf{B}^c_1 \cdot a_1 \quad \Rightarrow \quad y = \mathbf{h} \cdot \mathbf{B}^c_1 \cdot a_1 + n$$
where the matrix $\mathbf{B}^c_1 = \mathbf{h}^H$ reduces to a column vector. Since $a_1 = a'_1 + j a''_1$ holds, the real-valued notation has the form
$$\mathbf{x} = \sum_{\mu=1}^{2} \mathbf{B}^r_\mu \cdot a^r_\mu = \mathbf{B}^r_1 \cdot a'_1 + \mathbf{B}^r_2 \cdot a''_1$$
with $\mathbf{B}^r_1 = \mathbf{h}^H$ and $\mathbf{B}^r_2 = j \mathbf{h}^H$. For MIMO systems with more than one receive antenna, the right singular vector corresponding to the largest singular value has to be chosen.
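6.4.4 Optimizing Linear Dispersion Codes
Using the real-valued description, the received data block can generally be expressed as
$$\mathbf{Y} = \mathbf{H} \cdot \mathbf{X} + \mathbf{N} = \mathbf{H} \cdot \sum_{\mu=1}^{2K} \mathbf{B}^r_\mu \cdot a^r_\mu + \mathbf{N}. \tag{6.86}$$
It consists of $N_R$ rows according to the number of receive antennas and $L$ columns denoting the duration of a space–time codeword. Stacking the columns of the matrices $\mathbf{B}^r_\mu$ in (6.86) into long vectors with the operator
$$\operatorname{vec}\{\mathbf{X}\} = \operatorname{vec}\left\{ \begin{pmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_n \end{pmatrix} \right\} = \begin{pmatrix} \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_n \end{pmatrix}$$
delivers
$$\sum_{\mu=1}^{2K} \operatorname{vec}\{\mathbf{B}^r_\mu\} \cdot a^r_\mu = \mathbf{B}^r \cdot \mathbf{a}^r \tag{6.87}$$
where the vector $\mathbf{a}^r$ comprises all data symbols $a^r_\mu$ and the matrix $\mathbf{B}^r$ contains in column $\mu$ the vector $\operatorname{vec}\{\mathbf{B}^r_\mu\}$. Since the time instants are not arranged in columns anymore but stacked one below the other, the channel matrix $\mathbf{H}$ has to be enlarged by repeating it $L$ times. This can be accomplished by the Kronecker product that is generally defined as
$$\mathbf{A} \otimes \mathbf{B} = \begin{pmatrix} A_{1,1} \cdot \mathbf{B} & \cdots & A_{1,N} \cdot \mathbf{B} \\ \vdots & \ddots & \vdots \\ A_{M,1} \cdot \mathbf{B} & \cdots & A_{M,N} \cdot \mathbf{B} \end{pmatrix} .$$
Applying the vec-operator to the matrices $\mathbf{Y}$ and $\mathbf{N}$ leads to the expression
$$\mathbf{y} = \operatorname{vec}\{\mathbf{Y}\} = (\mathbf{I}_L \otimes \mathbf{H}) \cdot \mathbf{B}^r \cdot \mathbf{a}^r + \operatorname{vec}\{\mathbf{N}\} = \tilde{\mathbf{H}} \cdot \mathbf{B}^r \cdot \mathbf{a}^r + \operatorname{vec}\{\mathbf{N}\}. \tag{6.88}$$
The optimization of LD codes can be performed with respect to different measures. Looking at the ergodic capacity already known from Section 2.3 on page 73, we have to choose the matrix $\mathbf{B}^r$ according to
$$\mathbf{B}^r = \operatorname*{argmax}_{\mathbf{B}} \; \log_2 \det \left( \mathbf{I} + \frac{\sigma_A^2}{\sigma_N^2} \cdot \tilde{\mathbf{H}} \mathbf{B} \mathbf{B}^H \tilde{\mathbf{H}}^H \right) \tag{6.89a}$$
where $\sigma_A^2 / \sigma_N^2$ is the ratio of symbol power to noise power, subject to a power constraint, for example,
$$\sum_{\mu=1}^{2K} \operatorname{tr}\left\{ \mathbf{B}^r_\mu (\mathbf{B}^r_\mu)^H \right\} = K. \tag{6.89b}$$
Results for this optimization can be found in Hassibi and Hochwald (2000, 2001, 2002). A different approach that also considers the error rate performance is presented in Heath and Paulraj (2002). Generally, the obtained LD codes do not solely pursue diversity or multiplexing gains but can achieve a trade-off between both aspects.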
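As a small numerical illustration of (6.86) to (6.89a), the sketch below checks that stacking the columns of $\mathbf{H}\mathbf{X}$ gives the same vector as the Kronecker form in (6.88), and evaluates the log-determinant objective for one fixed set of dispersion matrices. The Alamouti matrices are used only as an example. The helper names `vec` and `ld_objective`, the channel dimensions, and the parameter `snr` (standing for the power ratio in front of $\tilde{\mathbf{H}}\mathbf{B}\mathbf{B}^H\tilde{\mathbf{H}}^H$) are assumptions made for this sketch, which performs no optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
N_T, N_R, L = 2, 3, 2  # example dimensions (assumed)

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

# Real-valued dispersion matrices of Alamouti's scheme, used as an example code.
B_list = [
    np.array([[1, 0], [0, 1]], dtype=complex),
    np.array([[0, -1], [1, 0]], dtype=complex),
    np.array([[1j, 0], [0, -1j]]),
    np.array([[0, 1j], [1j, 0]]),
]
B_r = np.column_stack([vec(B) for B in B_list])  # column mu holds vec{B_mu}

# Random i.i.d. Rayleigh channel and random real-valued data symbols.
H = (rng.standard_normal((N_R, N_T)) + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)
a_r = rng.choice([-1.0, 1.0], size=len(B_list)) / np.sqrt(2)

# Noise-free received block, computed in matrix form and in stacked form.
X = sum(B * s for B, s in zip(B_list, a_r))
H_tilde = np.kron(np.eye(L), H)       # I_L (Kronecker) H
y_direct = vec(H @ X)                 # vec{H X}
y_stacked = H_tilde @ B_r @ a_r       # (I_L (Kronecker) H) B^r a^r

def ld_objective(B, H_tilde, snr):
    """log2 det(I + snr * H~ B B^H H~^H) for one channel realization."""
    G = H_tilde @ B
    _, logdet = np.linalg.slogdet(np.eye(G.shape[0]) + snr * (G @ G.conj().T))
    return logdet / np.log(2)

objective = ld_objective(B_r, H_tilde, snr=10.0)
```

An actual code design would average this objective over many channel realizations and maximize it over the dispersion matrices under the power constraint (6.89b). The sketch only evaluates it once.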
6.4.5 Detection of Linear Dispersion Codes
For the special case when LD codes are used to implement orthogonal STBCs, simple matched filters as explained in Section 6.2 represent the optimal choice. For multilayer transmissions as well as the general case, we can combine all matrices before the data vector $\mathbf{a}^r$ in (6.88) into an LD channel matrix $\mathbf{H}_{\mathrm{LD}} = \tilde{\mathbf{H}} \cdot \mathbf{B}^r$ and obtain, with $\mathbf{n} = \operatorname{vec}\{\mathbf{N}\}$,
$$\mathbf{y} = \mathbf{H}_{\mathrm{LD}} \cdot \mathbf{a}^r + \mathbf{n}. \tag{6.90}$$
With (6.90), we can directly apply multilayer detection techniques from Sections 5.4 and 6.3.
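A minimal sketch of detection based on (6.90), assuming Alamouti's dispersion matrices, BPSK-like real symbols, and a plain least-squares (zero-forcing) detector; none of these choices is prescribed by the text. Because the data symbols are real while $\mathbf{H}_{\mathrm{LD}}$ is complex, real and imaginary parts of the received vector are stacked before solving. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N_T, N_R, L = 2, 2, 2  # example dimensions (assumed)

def vec(M):
    """Stack the columns of M into one long vector."""
    return M.reshape(-1, order="F")

# Real-valued dispersion matrices of Alamouti's scheme (example code).
B_list = [
    np.array([[1, 0], [0, 1]], dtype=complex),
    np.array([[0, -1], [1, 0]], dtype=complex),
    np.array([[1j, 0], [0, -1j]]),
    np.array([[0, 1j], [1j, 0]]),
]
B_r = np.column_stack([vec(B) for B in B_list])

H = (rng.standard_normal((N_R, N_T)) + 1j * rng.standard_normal((N_R, N_T))) / np.sqrt(2)
H_ld = np.kron(np.eye(L), H) @ B_r          # LD channel matrix

a_r = rng.choice([-1.0, 1.0], size=4)        # transmitted real-valued symbols
noise = 0.05 * (rng.standard_normal(N_R * L) + 1j * rng.standard_normal(N_R * L))
y = H_ld @ a_r + noise

# Real-valued model: stack real and imaginary parts, then solve by least squares.
A = np.vstack([H_ld.real, H_ld.imag])
b = np.concatenate([y.real, y.imag])
a_zf, *_ = np.linalg.lstsq(A, b, rcond=None)
a_hat = np.sign(a_zf)                        # hard decision per symbol
```

For this orthogonal code the stacked matrix `A` has orthogonal columns, so the least-squares solution coincides with a scaled matched filter, in line with the remark on orthogonal STBCs above.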
6.5 Information Theoretic Analysis In this section, the theoretical results of Section 2.3 for multiple antenna systems are illustrated. We consider uncorrelated as well as correlated frequency-nonselective MIMO channels and determine the channel capacities for Gaussian distributed input signals for different levels of channel knowledge at the transmitter. Perfect channel knowledge at the receiver is always assumed.
6.5.1 Uncorrelated MIMO Channels
First, the uncorrelated SIMO channel is addressed, that is, we obtain simple receive diversity. The capacity can be directly obtained from (2.78) in Section 2.3. An easier way is to consider the optimal receive filter derived in Section 1.5 performing maximum ratio combining of all $N_R$ signals. This results in an equivalent SISO fading channel whose instantaneous SNR depends on the squared norm $\|\mathbf{h}[k]\|^2$. Hence, the instantaneous channel capacity has the form
$$C[k] = \log_2 \left( 1 + \|\mathbf{h}[k]\|^2 \, \frac{E_s}{N_0} \right). \tag{6.91}$$
Ergodic capacities and outage probabilities can be determined from (6.91) by using the statistics of $\|\mathbf{h}[k]\|^2$. For independent Rayleigh fading channels, the random variable is chi-squared distributed with $2N_R$ degrees of freedom. Figure 6.36a shows the ergodic capacity for an uncorrelated SIMO channel with up to four outputs versus the SNR per receive antenna. We observe that the capacity increases with growing number of receive antennas owing to the higher diversity degree and the array gain. The latter shifts the curves by $10 \log_{10}(N_R)$ dB to the left, that is, doubling the number of receive antennas leads to an array gain of 3 dB. Concentrating only on the diversity gain, we have to depict the curves versus the SNR after maximum ratio combining as shown in Figure 6.36b. We recognize that the capacity gains due to diversity are rather small and the slope of the curves is independent of $N_R$. Hence, the capacity enhancement depends mainly logarithmically on the SNR because the channel vector $\mathbf{h}$ obviously has rank $r = 1$ owing to $N_T = 1$, that is, only one nonzero eigenvalue exists so that only one data stream can be transmitted at a time. In this scenario, multiple receive antennas can only increase the link reliability, leading to moderate capacity enhancements. Nevertheless, the outage probability can be significantly decreased by diversity techniques (cf. Section 1.5).
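The ergodic capacity following from (6.91) can be estimated by Monte Carlo simulation. The sketch below averages the instantaneous capacity over i.i.d. Rayleigh fading realizations for one to four receive antennas at an SNR of 10 dB per receive antenna. The function name, the number of trials, and the seed are arbitrary assumptions.

```python
import numpy as np

def simo_ergodic_capacity(n_r, snr_db, n_trials=20000, seed=0):
    """Monte Carlo estimate of E{log2(1 + ||h||^2 Es/N0)} for i.i.d. Rayleigh fading."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    # Complex Gaussian coefficients with unit average power per receive antenna.
    h = (rng.standard_normal((n_trials, n_r))
         + 1j * rng.standard_normal((n_trials, n_r))) / np.sqrt(2)
    gain = np.sum(np.abs(h) ** 2, axis=1)   # ||h[k]||^2 after maximum ratio combining
    return np.mean(np.log2(1 + gain * snr))

capacities = {n_r: simo_ergodic_capacity(n_r, snr_db=10.0) for n_r in (1, 2, 3, 4)}
```

The estimates grow with the number of receive antennas but stay below $\log_2(1 + N_R E_s/N_0)$, the capacity of a nonfading channel with the same average SNR after combining.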
Figure 6.36 Channel capacity versus SNR for i.i.d. Rayleigh fading channels, $N_T = 1$ transmit antenna, and $N_R$ receive antennas; panels a) SNR per receive antenna and b) SNR after combining, each showing $C$ versus $E_s/N_0$ in dB for $N_R = 1, \ldots, 4$
In contrast, Figure 6.37a shows the capacity for a system with $N_T = 4$ transmit antennas and different numbers of receive antennas with i.i.d. channels where the total transmit power is fixed at $E_s/T_s$. First, we take a look at the case of a single receive and $N_T = 4$ transmit antennas. The instantaneous capacity of this scheme is
$$C[k] = \log_2 \left( 1 + \frac{\|\mathbf{h}[k]\|^2}{N_T} \cdot \frac{E_s}{N_0} \right) \tag{6.92}$$
because the transmit power is fixed independent of $N_T$. The comparison with Figure 6.36b, which normalizes the SNR to the number of receive antennas, shows that the combinations $N_R = 4$, $N_T = 1$ and $N_R = 1$, $N_T = 4$ provide identical results, that is, the system is symmetric. However, Figure 6.36a illustrates differences of $10 \log_{10}(N_R)$ dB between the curves. This discrepancy can be explained by the fact that perfect channel knowledge at the receiver was assumed, allowing receive beamforming and delivering an array gain, while no CSI was assumed at the transmitter. Obviously, transmit diversity schemes provide no array gain.
The capacity of the general case with multiple receive and transmit antennas can be directly calculated with (2.78) on page 74. From Figure 6.37, we observe that the slope of the curve grows with increasing $N_R$ according to the parameter $m = \min[N_R, N_T]$. This indicates that $m$ parallel virtual channels exist over which parallel data streams can be transmitted. Hence, the data rate is multiplied by $m$ so that multiple antenna systems may increase the capacity linearly with $m$, while the SNR may increase it only logarithmically. This emphasizes the high potential of multiple antennas at the transmitter and receiver. Figure 6.37b demonstrates the influence of perfect channel knowledge at the transmitter, allowing the application of the waterfilling principle introduced in Section 2.3.
A comparison with Figure 6.37a shows that the capacity is improved only for $N_T > N_R$ and high SNR. If we have more receive than transmit antennas, the best strategy for high SNRs is to distribute the power equally over all antennas. Since this is automatically done in the absence of channel knowledge, waterfilling provides no additional gain for $N_R = N_T = 4$.
Figure 6.37 Channel capacity versus SNR for i.i.d. Rayleigh fading channels, $N_T = 4$ transmit antennas, and $N_R$ receive antennas (SNR per receive antenna); panels a) no CSI at transmitter and b) waterfilling, each showing $C$ versus $E_s/N_0$ in dB for $N_R = 1, \ldots, 4$
Similar to Section 1.5, we can analyze the outage probability of multiple antenna systems, that is, the probability $P_{\text{out}}$ that a certain rate $R$ is not achieved. From Chapter 2, we know that diversity decreases the outage probability because the SNR variations are reduced. This behavior can also be observed from Figure 6.38. In particular, Figure 6.38a emphasizes that diversity reduces the outage probability and that the rapid growth of the curves starts later, at higher rates $R$. However, they also become steeper, that is, a link quickly becomes unreliable if a certain rate is exceeded. Generally, increasing $\max[N_T, N_R]$ while keeping the minimum constant does not lead to an additional eigenmode, and diversity increases the link reliability. In contrast, increasing $\min[N_T, N_R]$ shifts the curves to the right because the number of virtual channels and, therefore, the data rate is increased.
A strange behavior can be observed in Figure 6.39 for high rates $R$ above the ergodic capacity $C$. Here, increasing the number of transmit antennas, and thus the diversity degree, does not lead to a reduction of $P_{\text{out}}$. Comparing the curves for $N_R = 1$ and $N_T = 1, 2, 3, 4$ (MISO channels) directly, we recognize that $P_{\text{out}}$ even increases with $N_T$. The reason is that the variations of the SNR are reduced so that very low and also very high instantaneous values occur more rarely. Therefore, very high rates are obtained less frequently than for low diversity degrees.
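The capacity and outage behavior described above can be reproduced with a short Monte Carlo sketch. It assumes the standard equal-power expression $C = \log_2 \det(\mathbf{I} + \frac{E_s/N_0}{N_T} \mathbf{H}\mathbf{H}^H)$ for the case without CSI at the transmitter, which is taken here to correspond to (2.78). The function names, trial counts, and the chosen rate are illustrative assumptions.

```python
import numpy as np

def mimo_capacity_samples(n_t, n_r, snr_db, n_trials=5000, seed=0):
    """Instantaneous capacities of i.i.d. Rayleigh MIMO channels, equal power per antenna."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    H = (rng.standard_normal((n_trials, n_r, n_t))
         + 1j * rng.standard_normal((n_trials, n_r, n_t))) / np.sqrt(2)
    # I + (snr / n_t) * H H^H for every realization (stacked matrices).
    G = np.eye(n_r) + (snr / n_t) * (H @ H.conj().transpose(0, 2, 1))
    _, logdet = np.linalg.slogdet(G)
    return logdet / np.log(2)

def outage_probability(samples, rate):
    """Fraction of channel realizations whose capacity falls below the rate."""
    return np.mean(samples < rate)

snr_db = 10.0
# Ergodic capacity for N_T = 4 and a growing number of receive antennas.
ergodic = {n_r: mimo_capacity_samples(4, n_r, snr_db).mean() for n_r in (1, 2, 3, 4)}
# MISO outage probability at a rate above the ergodic capacity.
p_out_miso = {
    n_t: outage_probability(mimo_capacity_samples(n_t, 1, snr_db, n_trials=20000), rate=5.0)
    for n_t in (1, 4)
}
```

At 10 dB and a rate of 5 bits/s/Hz, which lies above the MISO ergodic capacity, the estimated outage probability for $N_T = 4$ exceeds the one for $N_T = 1$, matching the behavior discussed for Figure 6.39.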
6.5.2 Correlated MIMO Channels
Correlated MIMO systems are now considered. This scenario occurs if the antenna elements are arranged very close to each other and the impinging waves arrive from a few dominant directions. Hence, we do not have a diffuse electromagnetic field with a uniform distribution of the angles of arrival, but preferred directions $\theta_\mu$ with a certain angle spread $\Delta\theta_\mu$.
Figure 6.38 Outage probability versus rate $R$ in bits/s/Hz for i.i.d. Rayleigh fading channels and a signal-to-noise ratio of 10 dB; panels a) $N_T = 1$, b) $N_T = 2$, c) $N_T = 3$, d) $N_T = 4$, each showing $P_{\text{out}}$ versus $R$ for $N_R = 1, \ldots, 4$
Figure 6.40 compares the ergodic capacity of i.i.d. and correlated $4 \times 4$ MIMO channels for different levels of channel knowledge at the transmitter. First, it can be seen that perfect channel knowledge (CSI) at the transmitter does not increase the capacity of uncorrelated channels except for very low SNRs. Hence, the best strategy over a wide range of SNRs is to transmit four independent data streams. With reference to the correlated MIMO channel, we can state that channel knowledge at the transmitter increases the capacity. Hence, it is necessary to have CSI at the transmitter for correlated channels. Moreover, the ergodic capacity is greatly reduced because of correlations. Only for extremely low SNRs can correlations slightly improve the capacity because in this specific scenario, increasing the SNR by beamforming is better than transmitting parallel data streams.
Finally, we analyze the performance when only long-term channel knowledge is available at the transmitter. This means that we do not know the instantaneous channel matrix $\mathbf{H}[k]$ but its covariance matrix $\boldsymbol{\Phi}_{HH} = \mathrm{E}\{\mathbf{H}^H \mathbf{H}\}$. This approach is motivated by the fact that long-term statistics such as angles of arrival remain constant for a relatively long duration and can therefore be accurately estimated. Moreover, it is often assumed that these long-term properties are identical for uplink and downlink, allowing the application of $\hat{\boldsymbol{\Phi}}_{HH}$ measured in the downlink for the uplink transmission. From Figure 6.40, we see that the knowledge of the covariance matrix (lt CSI) leads to the same performance as optimal CSI for correlated channels. In the absence of correlations, only instantaneous channel information can improve the capacity and long-term statistics do not help at all.
Figure 6.39 Outage probability versus rate $R$ in bits/s/Hz for $N_R = 1$ and i.i.d. Rayleigh fading channels and a signal-to-noise ratio of 10 dB (curves for $N_T = 1, \ldots, 4$)
Figure 6.40 Channel capacity versus SNR for i.i.d. and correlated Rayleigh fading channels, $N_T = 4$ transmit, and $N_R = 4$ receive antennas (curves: i.i.d. without CSI, i.i.d. with CSI, correlated without CSI, correlated with CSI, correlated with long-term CSI)
6.6 Summary In this chapter, we analyzed the potential of multiple antenna techniques for point-to-point communications. Starting with diversity concepts, we saw that spatial diversity is obtained with multiple antennas at the receiver as well as the transmitter. Space–time transmit diversity schemes do not require channel knowledge at the transmitter but provide the full diversity degree. We distinguished orthogonal STBCs and STTCs. The latter yield an additional coding gain at the expense of a much higher decoding complexity. While diversity increases the link reliability, the great potential of MIMO systems can be exploited by multilayer transmissions discussed in Section 6.3. Here, parallel data streams termed layers are transmitted over different antennas. Without channel knowledge at the transmitter, the detection problem represents the major challenge. Besides multilayer (or multiuser) detection techniques already introduced in Chapter 5, a new algorithm based on the LR has been derived. It shows superior performance at moderate complexity. In Section 6.4, we demonstrated that LD codes provide a unified description of space–time coding and multilayer concepts. With this concept, the trade-off between diversity and multilayer gains can be optimized. Finally, the channel capacity of MIMO systems has been illustrated by numerical examples. It turned out that the rank of the channel matrix determines the major capacity improvement compared to SISO systems and that pure diversity concepts only lead to a minor capacity growth.
Appendix A
Channel Models
A.1 Equivalent Baseband Representation
The output of the receive filter $g_R(t)$ can be expressed by
$$y(t) = g_R(t) * \left[ y_{BP}^+(t) \, \frac{1}{\sqrt{2}} e^{-j\omega_0 t} \right] = g_R(t) * \left[ \big( y_{BP}(t) + j \mathcal{H}\{y_{BP}(t)\} \big) \frac{1}{\sqrt{2}} e^{-j\omega_0 t} \right] \tag{A.1}$$
$$\phantom{y(t)} = g_R(t) * \left[ \big( h_{BP}(t,\tau) * x_{BP}(t) + n_{BP}(t) + j \mathcal{H}\{ h_{BP}(t,\tau) * x_{BP}(t) + n_{BP}(t) \} \big) \frac{1}{\sqrt{2}} e^{-j\omega_0 t} \right]. \tag{A.2}$$
The convolution in (A.2) is defined in (1.7)
$$h(t,\tau) * x(t) = \int_0^\infty h(t,\tau) \, x(t-\tau) \, d\tau .$$
Exploiting the linearity of the Hilbert transform and the property $\mathcal{H}\{a(t) * b(t)\} = a(t) * \mathcal{H}\{b(t)\}$ yields
$$y(t) = g_R(t) * \left[ \big( h_{BP}(t,\tau) * x_{BP}(t) + j \, h_{BP}(t,\tau) * \mathcal{H}\{x_{BP}(t)\} + n^+(t) \big) \frac{1}{\sqrt{2}} e^{-j\omega_0 t} \right] = g_R(t) * \left[ h_{BP}(t,\tau) e^{-j\omega_0 t} * x(t) \right] + n(t) \tag{A.3}$$
with $x(t) = x^+(t) \frac{1}{\sqrt{2}} e^{-j\omega_0 t}$ equivalent to (1.1) and
$$n(t) = g_R(t) * \left[ n^+(t) \, \frac{1}{\sqrt{2}} \cdot e^{-j\omega_0 t} \right]. \tag{A.4}$$
Owing to $G_R(j\omega) = 0$ for $|\omega| > B$, $B \ll f_0$ and property (1.9) of the analytical signal,
$$g_R(t) * h_{BP}(t,\tau) e^{-j\omega_0 t} = \mathcal{F}^{-1}\left\{ G_R(j\omega) \cdot H_{BP}(t, j\omega - j\omega_0) \right\} = \mathcal{F}^{-1}\left\{ G_R(j\omega) \cdot \tfrac{1}{2} H_{BP}^+(t, j\omega - j\omega_0) \right\} = g_R(t) * \left[ \tfrac{1}{2} h_{BP}^+(t,\tau) e^{-j\omega_0 t} \right] = g_R(t) * h(t,\tau) \tag{A.5}$$
holds. Thus, we get
$$y(t) = g_R(t) * h(t,\tau) * x(t) + n(t) \tag{A.6}$$
$$\phantom{y(t)} = T_s \cdot \sum_k x[k] \cdot g_R(t) * h(t,\tau) * g_T(t - kT_s) + n(t). \tag{A.7}$$
The twofold convolution can be interpreted as a single filter
$$\tilde{h}(t, kT_s) = g_R(t) * h(t,\tau) * g_T(t - kT_s) \tag{A.8}$$
and (A.7) becomes
$$y(t) = T_s \cdot \sum_k x[k] \cdot \tilde{h}(t, kT_s) + n(t). \tag{A.9}$$
A.2 Typical Propagation Profiles for Outdoor Mobile Radio Channels
In order to obtain realistic parameters of mobile radio channels, extensive measurements have been carried out by COST 207 (European Cooperation in the Fields of Scientific and Technical Research) (COST 1989) for the global system for mobile communications (GSM). The obtained power delay profiles are listed in Table A.1 and represent typical propagation scenarios.
Table A.1 Power delay profile $\Phi_{hh}(\tau)$ of COST 207 (COST 1989) (delays $\tau$ in µs)
Rural Area (RA): $\Phi_{hh}(\tau) = 9.21 \cdot \exp(-9.2\tau)$ for $0 \le \tau < 0.7$; 0 else
Typical Urban (TU): $\Phi_{hh}(\tau) = \exp(-\tau)$ for $0 \le \tau < 7$; 0 else
Bad Urban (BU): $\Phi_{hh}(\tau) = 0.67 \cdot \exp(-\tau)$ for $0 \le \tau < 5$; $0.335 \cdot \exp(5 - \tau)$ for $5 \le \tau < 10$; 0 else
Hilly Terrain (HT): $\Phi_{hh}(\tau) = 3.08 \cdot \exp(-3.5\tau)$ for $0 \le \tau < 2$; $0.1232 \cdot \exp(15 - \tau)$ for $15 \le \tau < 20$; 0 else
Equivalent results were obtained for the Doppler power spectra listed in Table A.2. Principally, the statistical characteristics of the Doppler power spectra are affected by the delay $\tau$. For delays smaller than 0.5 µs, $\Phi_{hh}(f_d)$ has a distribution according to the Jakes spectrum, while for larger $\tau$ Gaussian distributions with different means and variances occur. The rural area (RA) scenario represents a special case because it is characterized by a line-of-sight link (Rice fading).
Table A.2 Doppler power spectrum $\Phi_{hh}(f_d)$ of COST 207 (COST 1989)
$0 < \tau < 0.5$ µs: $\Phi_{hh}(f_d) = \dfrac{A}{\sqrt{1 - (f_d/f_{d\max})^2}}$ for $|f_d| \le f_{d\max}$; 0 else
$0.5 < \tau < 2$ µs: $\Phi_{hh}(f_d) = A \cdot \exp\left( -\dfrac{(f_d + 0.8 f_{d\max})^2}{2 (0.05 f_{d\max})^2} \right) + \dfrac{A}{10} \cdot \exp\left( -\dfrac{(f_d - 0.4 f_{d\max})^2}{2 (0.1 f_{d\max})^2} \right)$
$\tau > 2$ µs: $\Phi_{hh}(f_d) = B \cdot \exp\left( -\dfrac{(f_d - 0.7 f_{d\max})^2}{2 (0.1 f_{d\max})^2} \right) + \dfrac{B}{31.6} \cdot \exp\left( -\dfrac{(f_d + 0.4 f_{d\max})^2}{2 (0.15 f_{d\max})^2} \right)$
Rural Area ($\tau = 0$): $\Phi_{hh}(f_d) = \dfrac{0.41}{2\pi f_{d\max} \sqrt{1 - (f_d/f_{d\max})^2}} + 0.91 \cdot \delta(f_d - 0.7 f_{d\max})$
According to the requirements of the universal mobile telecommunication system (UMTS) standard, different propagation scenarios were defined. They are summarized in Table A.3. Five cases are distinguished that differ with respect to velocity and the number of taps.
Table A.3 Propagation conditions for UMTS in multipath fading environments (3GPP 2005b), delays $\tau$ in ns and relative powers $|h|^2$ in dB, $f_d$ classically distributed
(1) $v = 3$ km/h: $\tau = 0, 976$; $|h|^2 = 0, -10$
(2) $v = 3$ km/h: $\tau = 0, 976, 20000$; $|h|^2 = 0, 0, 0$
(3) $v = 120$ km/h: $\tau = 0, 260, 521, 781$; $|h|^2 = 0, -3, -6, -9$
(4) $v = 3$ km/h: $\tau = 0, 976$; $|h|^2 = 0, 0$
(5) $v = 250$ km/h: $\tau = 0, 260, 521, 781$; $|h|^2 = 0, -3, -6, -9$
A.3 Moment-Generating Function for Ricean Fading
The channel coefficient $h$ of a frequency-nonselective Ricean fading channel with average power $P$ and Rice factor $K$ has the form given in (1.28)
$$h = \sqrt{\frac{P}{K+1}} \cdot \left( \sqrt{K} + \alpha \right).$$
It consists of two parts, a constant line-of-sight component and a fading component represented by the factor $\alpha$ whose real and imaginary parts are statistically independent zero-mean Gaussian processes each with variance 1/2. Hence, the real part $\mathcal{H}'$ of $\mathcal{H}$ is Gaussian distributed
$$p_{\mathcal{H}'}(\xi) = \frac{1}{\sqrt{2\pi \sigma_{\mathcal{H}'}^2}} \cdot \exp\left( -\frac{\left( \xi - \sqrt{\frac{PK}{K+1}} \right)^2}{2 \sigma_{\mathcal{H}'}^2} \right) = \sqrt{\frac{K+1}{\pi P}} \cdot \exp\left( -\left( \xi \sqrt{\frac{K+1}{P}} - \sqrt{K} \right)^2 \right) \tag{A.10a}$$
with mean $\sqrt{PK/(1+K)}$ and variance $\sigma_{\mathcal{H}'}^2 = P/[2(K+1)]$, while the imaginary part $\mathcal{H}''$ is Gaussian distributed with the same variance but zero mean.
$$p_{\mathcal{H}''}(\xi) = \frac{1}{\sqrt{2\pi \sigma_{\mathcal{H}''}^2}} \cdot \exp\left( -\frac{\xi^2}{2 \sigma_{\mathcal{H}''}^2} \right) = \sqrt{\frac{K+1}{\pi P}} \cdot \exp\left( -\frac{(K+1) \xi^2}{P} \right) \tag{A.10b}$$
In order to calculate the density of $|\mathcal{H}|^2$, we have to deal with the densities of $(\mathcal{H}')^2$ and $(\mathcal{H}'')^2$. In Papoulis (1965), the general relation
$$p_{X^2}(\xi) = \frac{1}{2\sqrt{\xi}} \cdot \left[ p_X\left( \sqrt{\xi} \right) + p_X\left( -\sqrt{\xi} \right) \right] \tag{A.11}$$
between the pdf of a process $X$ and the pdf of $X^2$ is given. With (A.11), we obtain for the squared real part a noncentral chi-square distribution with one degree of freedom
$$p_{\mathcal{H}'^2}(\xi) = \sqrt{\frac{K+1}{\pi P \xi}} \cdot \exp\left( -\frac{\xi (K+1)}{P} - K \right) \cdot \cosh\left( 2 \sqrt{\frac{\xi K (K+1)}{P}} \right) \tag{A.12a}$$
and for the squared imaginary part a central chi-square distribution with one degree of freedom
$$p_{\mathcal{H}''^2}(\xi) = \sqrt{\frac{K+1}{\pi P \xi}} \cdot \exp\left( -\frac{\xi (K+1)}{P} \right). \tag{A.12b}$$
Since the squared magnitude of $\mathcal{H}$ is obtained by adding the squared magnitudes of the real and imaginary parts, their probability densities have to be convolved. This is equivalent to multiplying the corresponding moment-generating functions. They have the form (Proakis 2001)
$$M_{\mathcal{H}'^2}(s) = \sqrt{\frac{K+1}{K+1-sP}} \cdot \exp\left( \frac{sKP}{(K+1) - sP} \right) \tag{A.13a}$$
and
$$M_{\mathcal{H}''^2}(s) = \sqrt{\frac{K+1}{K+1-sP}} . \tag{A.13b}$$
Consequently, we obtain the overall moment-generating function
$$M_{|\mathcal{H}|^2}(s) = \frac{K+1}{K+1-sP} \cdot \exp\left( \frac{sKP}{(K+1) - sP} \right). \tag{A.14}$$
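The closed form (A.14) can be cross-checked by simulation. The following sketch draws Ricean channel coefficients according to (1.28), estimates $\mathrm{E}\{\exp(s |\mathcal{H}|^2)\}$ by averaging, and compares the result with the formula. The values of $P$, $K$, and $s$ (chosen negative so that the expectation surely exists), the sample size, and the function name are arbitrary assumptions.

```python
import numpy as np

def rice_mgf(s, P, K):
    """Moment-generating function of |H|^2 for Ricean fading, closed form (A.14)."""
    return (K + 1) / (K + 1 - s * P) * np.exp(s * K * P / (K + 1 - s * P))

rng = np.random.default_rng(0)
P, K, s = 1.0, 3.0, -0.8       # average power, Rice factor, MGF argument (assumed values)
n = 200000

# Fading component alpha: real and imaginary parts with variance 1/2 each.
alpha = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
h = np.sqrt(P / (K + 1)) * (np.sqrt(K) + alpha)

mgf_mc = np.mean(np.exp(s * np.abs(h) ** 2))   # Monte Carlo estimate
mgf_closed = rice_mgf(s, P, K)                 # formula (A.14)
mean_power = np.mean(np.abs(h) ** 2)           # should be close to P
```

For these parameters the closed form evaluates to $\frac{4}{4.8} e^{-0.5} \approx 0.505$, and the simulated average should agree within the Monte Carlo error.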
Appendix B
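Derivations for Information Theory
B.1 Chain Rule for Entropies
Let $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$ be random variables with the joint probability $\Pr\{\mathcal{X}_1, \ldots, \mathcal{X}_n\}$. The chain rule for entropy has the form
$$\bar{I}(\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n) = \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1). \tag{B.1}$$
Proof: We simply repeat the application of the chain rule for two random variables.
$$\begin{aligned}
\bar{I}(\mathcal{X}_1, \mathcal{X}_2) &= \bar{I}(\mathcal{X}_1) + \bar{I}(\mathcal{X}_2 \mid \mathcal{X}_1) \\
\bar{I}(\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3) &= \bar{I}(\mathcal{X}_1) + \bar{I}(\mathcal{X}_2, \mathcal{X}_3 \mid \mathcal{X}_1) \\
&= \bar{I}(\mathcal{X}_1) + \bar{I}(\mathcal{X}_2 \mid \mathcal{X}_1) + \bar{I}(\mathcal{X}_3 \mid \mathcal{X}_1, \mathcal{X}_2) \\
&\;\;\vdots \\
\bar{I}(\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n) &= \bar{I}(\mathcal{X}_1) + \bar{I}(\mathcal{X}_2, \ldots, \mathcal{X}_n \mid \mathcal{X}_1) \\
&= \bar{I}(\mathcal{X}_1) + \bar{I}(\mathcal{X}_2 \mid \mathcal{X}_1) + \bar{I}(\mathcal{X}_3, \ldots, \mathcal{X}_n \mid \mathcal{X}_1, \mathcal{X}_2) \\
&= \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1).
\end{aligned}$$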
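The chain rule (B.1) can be verified numerically for a small example. The sketch below draws a random joint distribution of three discrete random variables and computes the conditional entropies directly from their definition as averages over the conditioning values, not by subtracting joint entropies. The alphabet sizes and all names are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability mass function given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
p_xyz = rng.random((2, 3, 4))     # random joint pmf of (X1, X2, X3)
p_xyz /= p_xyz.sum()

p_x = p_xyz.sum(axis=(1, 2))      # marginal of X1
p_xy = p_xyz.sum(axis=2)          # marginal of (X1, X2)

H_joint = entropy(p_xyz)
H_x = entropy(p_x)
# Conditional entropies as averages of the entropies of the conditional pmfs.
H_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(2))
H_z_given_xy = sum(
    p_xy[i, j] * entropy(p_xyz[i, j] / p_xy[i, j])
    for i in range(2) for j in range(3)
)
```

The joint entropy should equal the sum of the three terms on the right-hand side of (B.1) up to rounding errors.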
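B.2 Chain Rule for Information
The general chain rule for information is as follows (Cover and Thomas 1991):
$$\bar{I}(\mathcal{X}_1, \ldots, \mathcal{X}_n; \mathcal{Z}) = \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i; \mathcal{Z} \mid \mathcal{X}_{i-1}, \ldots, \mathcal{X}_1). \tag{B.2}$$
Proof: We apply the chain rule for entropies
$$\begin{aligned}
\bar{I}(\mathcal{X}_1, \ldots, \mathcal{X}_n; \mathcal{Z}) &= \bar{I}(\mathcal{X}_1, \ldots, \mathcal{X}_n) - \bar{I}(\mathcal{X}_1, \ldots, \mathcal{X}_n \mid \mathcal{Z}) \\
&= \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1) - \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1, \mathcal{Z}) \\
&= \sum_{i=1}^{n} \left[ \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1) - \bar{I}(\mathcal{X}_i \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1, \mathcal{Z}) \right] \\
&= \sum_{i=1}^{n} \bar{I}(\mathcal{X}_i; \mathcal{Z} \mid \mathcal{X}_{i-1} \cdots \mathcal{X}_1).
\end{aligned}$$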
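B.3 Data-Processing Theorem
Data-processing theorem: We consider a Markovian chain $\mathcal{X} \to \mathcal{Y} \to \mathcal{Z}$ where $\mathcal{X}$ and $\mathcal{Z}$ are independent given $\mathcal{Y}$, that is, $\bar{I}(\mathcal{X}; \mathcal{Z} \mid \mathcal{Y}) = 0$. The data-processing theorem states the following:
$$\bar{I}(\mathcal{X}; \mathcal{Y}) \ge \bar{I}(\mathcal{X}; \mathcal{Z}). \tag{B.3}$$
Proof: The mutual information $\bar{I}(\mathcal{X}; \mathcal{Y}, \mathcal{Z})$ can be extended with the chain rule in two different ways:
$$\bar{I}(\mathcal{X}; \mathcal{Y}, \mathcal{Z}) = \bar{I}(\mathcal{X}; \mathcal{Z}) + \bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Z}) = \bar{I}(\mathcal{X}; \mathcal{Y}) + \bar{I}(\mathcal{X}; \mathcal{Z} \mid \mathcal{Y}). \tag{B.4}$$
Owing to the condition $\bar{I}(\mathcal{X}; \mathcal{Z} \mid \mathcal{Y}) = 0$, we see from (B.4)
$$\bar{I}(\mathcal{X}; \mathcal{Y}) = \bar{I}(\mathcal{X}; \mathcal{Z}) + \underbrace{\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Z})}_{\ge 0} . \tag{B.5}$$
Since (conditional) mutual information is always nonnegative, $\bar{I}(\mathcal{X}; \mathcal{Y} \mid \mathcal{Z}) \ge 0$ holds and we obtain the inequality
$$\bar{I}(\mathcal{X}; \mathcal{Y}) \ge \bar{I}(\mathcal{X}; \mathcal{Z}). \tag{B.6}$$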
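The data-processing theorem can be illustrated with a randomly drawn Markov chain. In the sketch below, $\mathcal{Z}$ depends on $\mathcal{X}$ only through $\mathcal{Y}$, so the joint distribution of $(\mathcal{X}, \mathcal{Z})$ follows from multiplying the joint distribution of $(\mathcal{X}, \mathcal{Y})$ with the transition matrix from $\mathcal{Y}$ to $\mathcal{Z}$. The alphabet sizes and all names are assumptions made for illustration.

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information in bits between the row and column variables of a joint pmf."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask]))

rng = np.random.default_rng(0)

p_x = rng.random(3)
p_x /= p_x.sum()
P_y_given_x = rng.random((3, 4))               # row i holds Pr{Y | X = i}
P_y_given_x /= P_y_given_x.sum(axis=1, keepdims=True)
P_z_given_y = rng.random((4, 3))               # row j holds Pr{Z | Y = j}
P_z_given_y /= P_z_given_y.sum(axis=1, keepdims=True)

p_xy = p_x[:, None] * P_y_given_x              # joint pmf of (X, Y)
p_xz = p_xy @ P_z_given_y                      # joint pmf of (X, Z) for the chain X -> Y -> Z

I_xy = mutual_information(p_xy)
I_xz = mutual_information(p_xz)
```

Whatever transition matrices are drawn, `I_xz` cannot exceed `I_xy`: processing $\mathcal{Y}$ further cannot add information about $\mathcal{X}$.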
Appendix C
Linear Algebra
C.1 Selected Basics
This appendix summarizes some basic results of linear algebra. It is not comprehensive and focuses only on topics needed in this book. Unless otherwise stated, we consider complex vectors and matrices. An $N \times N$ identity matrix is denoted by $\mathbf{I}_N$, $\mathbf{0}_{N \times M}$ is an $N \times M$ matrix containing only zeros and $\mathbf{1}_{N \times M}$ a matrix of the same size consisting only of ones.
Definition C.1.1 (Determinant) A determinant uniquely assigns a real or complex-valued number $\det(\mathbf{A})$ to an $N \times N$ matrix $\mathbf{A}$. The determinant is zero if a row (column) only consists of zeros or if it can be represented as a linear combination of other rows (columns). The determinant of a product of square matrices is identical to the product of the corresponding determinants
$$\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A}) \cdot \det(\mathbf{B}). \tag{C.1}$$
According to Telatar (1995), we can rewrite (C.1) as
$$\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A}). \tag{C.2}$$
Definition C.1.2 (Hermitian Operation) The Hermitian of a matrix (vector) is defined as the transposed matrix (vector) with complex conjugate elements
$$\mathbf{A}^H = \left( \mathbf{A}^* \right)^T \quad \text{and} \quad \mathbf{x}^H = \left( \mathbf{x}^* \right)^T . \tag{C.3}$$
The following rules exist:
• $(\mu \mathbf{A} + \nu \mathbf{B})^H = \mu^* \mathbf{A}^H + \nu^* \mathbf{B}^H$
• $(\mathbf{A}\mathbf{B})^H = \mathbf{B}^H \mathbf{A}^H$
• $\left( \mathbf{A}^H \right)^H = \mathbf{A}$
• $\left( \mathbf{A}^{-1} \right)^H = \left( \mathbf{A}^H \right)^{-1}$
• For real-valued matrices, the Hermitian form equals the transposed form: $\mathbf{A}^H = \mathbf{A}^T$.
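The determinant rules (C.1) and (C.2) and the product rule of the Hermitian operation are easy to check numerically. The sketch below uses random complex matrices. In (C.2), the two matrices may be rectangular as long as both products are defined, so that the identity matrices on the two sides have different sizes. Dimensions and names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """Random complex Gaussian matrix of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

N, M = 3, 5
A = crandn(N, M)                 # N x M
B = crandn(M, N)                 # M x N
C = crandn(N, N)
D = crandn(N, N)

# (C.1): determinant of a product of square matrices.
det_product = np.linalg.det(C @ D)
det_factors = np.linalg.det(C) * np.linalg.det(D)

# (C.2): det(I_N + AB) = det(I_M + BA), also for rectangular A and B.
det_left = np.linalg.det(np.eye(N) + A @ B)
det_right = np.linalg.det(np.eye(M) + B @ A)

# Hermitian of a product: (AB)^H = B^H A^H.
herm_left = (A @ B).conj().T
herm_right = B.conj().T @ A.conj().T
```

Identity (C.2) is what allows capacity expressions to be evaluated with the smaller of the two Gram matrices $\mathbf{H}\mathbf{H}^H$ and $\mathbf{H}^H\mathbf{H}$.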
Definition C.1.3 (Inner Product) The inner product or dot product (Golub and van Loan 1996) of two complex $N \times 1$ vectors $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ and $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$ is defined by
$$\mathbf{x}^H \mathbf{y} = \sum_{i=1}^{N} x_i^* y_i \tag{C.4}$$
where $x_i^*$ denotes the complex conjugate value of $x_i$. The definition of the inner product allows the calculation of the length of a vector consisting of complex elements:
$$\|\mathbf{x}\| = \sqrt{\mathbf{x}^H \mathbf{x}} = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_N|^2} . \tag{C.5}$$
Two vectors $\mathbf{x}$ and $\mathbf{y}$ are called unitary if their inner product is zero ($\mathbf{x}^H \mathbf{y} = 0$). This is a complex generalization of the orthogonality of real-valued vectors ($\mathbf{x}^T \mathbf{y} = 0$) and is sometimes called conjugated orthogonality (Zurmühl and Falk 1992). For real vectors, the unitary and orthogonal properties are identical.
Definition C.1.4 (Spectral norm) The spectral norm or $\ell_2$ norm of an arbitrary $N \times M$ matrix $\mathbf{A}$ is defined as (Golub and van Loan 1996)
$$\|\mathbf{A}\|_2 = \sup_{\mathbf{x} \ne \mathbf{0}} \frac{\|\mathbf{A}\mathbf{x}\|}{\|\mathbf{x}\|} . \tag{C.6}$$
It describes the maximal amplification of a vector $\mathbf{x}$ that experiences a linear transformation by $\mathbf{A}$. The spectral norm has the following basic properties.
• The spectral norm of a matrix equals its largest singular value $\sigma_{\max}$
$$\|\mathbf{A}\|_2 = \max_i (\sigma_i) = \sigma_{\max} . \tag{C.7}$$
• The spectral norm of the inverse $\mathbf{A}^{-1}$ is identical to the reciprocal of the smallest singular value $\sigma_{\min}$ of $\mathbf{A}$
$$\|\mathbf{A}^{-1}\|_2 = \frac{1}{\min_i (\sigma_i)} = \frac{1}{\sigma_{\min}} . \tag{C.8}$$
Definition C.1.5 (Frobenius norm) The Frobenius norm of an arbitrary $N \times M$ matrix $\mathbf{A}$ is defined as (Golub and van Loan 1996)
$$\|\mathbf{A}\|_F = \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{M} |A_{i,j}|^2} = \sqrt{\operatorname{tr}\{\mathbf{A}\mathbf{A}^H\}} . \tag{C.9}$$
Obviously, the squared Frobenius norm is $\|\mathbf{A}\|_F^2 = \operatorname{tr}\{\mathbf{A}\mathbf{A}^H\}$.
Definition C.1.6 (Rank) For an arbitrary matrix $\mathbf{A}$, the largest number $r$ of linearly independent columns always equals the largest number of linearly independent rows. This number $r$ is called the rank of a matrix and is denoted by $r = \operatorname{rank}(\mathbf{A})$.
From this definition, it follows directly that the rank of an $N \times M$ matrix is always less than or equal to the minimum of $N$ and $M$:
$$r = \operatorname{rank}(\mathbf{A}) \le \min(N, M). \tag{C.10}$$
We can derive the following properties with the definition of $\operatorname{rank}(\mathbf{A})$:
• An $N \times N$ matrix $\mathbf{A}$ is called regular if its determinant is nonzero and, therefore, $r = \operatorname{rank}(\mathbf{A}) = N$ holds. For regular matrices, the inverse $\mathbf{A}^{-1}$ with $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_N$ exists.
• If the determinant is zero, $r = \operatorname{rank}(\mathbf{A}) < N$ holds and the matrix is called singular. The inverse does not exist for singular matrices.
• For each $N \times N$ matrix $\mathbf{A}$ of rank $r$, there exists at least one $r \times r$ submatrix whose determinant is nonzero. The determinants of all $(r+1) \times (r+1)$ submatrices of $\mathbf{A}$ are zero.
• The rank of the product $\mathbf{A}\mathbf{A}^H$ is
$$\operatorname{rank}(\mathbf{A}\mathbf{A}^H) = \operatorname{rank}(\mathbf{A}). \tag{C.11}$$
Definition C.1.7 (Eigenvalue Problem) The calculation of the eigenvalues $\lambda_i$ and the eigenvectors $\mathbf{x}_i$ of a square $N \times N$ matrix $\mathbf{A}$ is called the eigenvalue problem. The goal is to find a vector $\mathbf{x}$ that is proportional to $\mathbf{A}\mathbf{x}$ and, therefore, fulfills the eigenvalue equation
$$\mathbf{A} \cdot \mathbf{x} = \lambda \cdot \mathbf{x}. \tag{C.12}$$
This equation can be rewritten as $(\mathbf{A} - \lambda \mathbf{I}_N)\, \mathbf{x} = \mathbf{0}$. Since we are looking for the nontrivial solution $\mathbf{x} \ne \mathbf{0}$, the columns of $(\mathbf{A} - \lambda \mathbf{I}_N)$ have to be linearly dependent, so that $\det(\mathbf{A} - \lambda \mathbf{I}_N) = 0$ holds. Hence, the eigenvalues $\lambda_i$ represent the zeros of the characteristic polynomial $p_N(\lambda) = \det(\mathbf{A} - \lambda \mathbf{I}_N)$ of degree $N$. Each $N \times N$ matrix has exactly $N$ eigenvalues that need not be different.
For each eigenvalue $\lambda_i$, the equation $(\mathbf{A} - \lambda_i \mathbf{I}_N)\, \mathbf{x}_i = \mathbf{0}$ has to be solved with respect to the eigenvector $\mathbf{x}_i$. There always exist solutions $\mathbf{x}_i \ne \mathbf{0}$. Besides $\mathbf{x}_i$, $c \cdot \mathbf{x}_i$ is also an eigenvector corresponding to $\lambda_i$. Hence, we can normalize the eigenvectors to unit length. The eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_k$ belonging to different eigenvalues $\lambda_1, \ldots, \lambda_k$ are linearly independent of each other (Horn and Johnson 1985; Strang 1988).
There exist the following relationships between the matrix $\mathbf{A}$ and its eigenvalues:
• The sum of all eigenvalues is identical to the sum of all $N$ diagonal elements, called the trace of a square matrix $\mathbf{A}$
$$\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{N} A_{i,i} = \sum_{i=1}^{N} \lambda_i . \tag{C.13}$$
• If a square matrix $\mathbf{A}$ has full rank $r = N$, the product of its eigenvalues equals the determinant of $\mathbf{A}$:
$$\prod_{i=1}^{N} \lambda_i = \det(\mathbf{A}). \tag{C.14}$$
If the matrix is rank deficient with $r < N$, the product of the nonzero eigenvalues equals the determinant of the $r \times r$ submatrix of rank $r$.
• An eigenvalue $\lambda_i = 0$ exists if and only if the matrix is singular, that is, $\det(\mathbf{A}) = 0$ holds.
Definition C.1.8 (Orthogonality) A real-valued matrix is called orthogonal if its columns are mutually orthogonal. Therefore, the inner product between different columns becomes $\mathbf{q}_i^T \mathbf{q}_j = 0$. If all the columns of an orthogonal matrix have unit length,
$$\mathbf{q}_i^T \mathbf{q}_j = \delta(i, j) \tag{C.15}$$
holds and the matrix is called orthonormal. Orthonormal matrices are generally denoted by $\mathbf{Q}$ and have the properties
$$\mathbf{Q}^T \mathbf{Q} = \mathbf{I}_N \quad \Leftrightarrow \quad \mathbf{Q}^T = \mathbf{Q}^{-1} . \tag{C.16}$$
Definition C.1.9 (Unitary Matrix) A complex $N \times N$ matrix with orthonormal columns is called a unitary matrix $\mathbf{U}$ with the properties
$$\mathbf{U}^H \mathbf{U} = \mathbf{U}\mathbf{U}^H = \mathbf{I}_N \quad \Leftrightarrow \quad \mathbf{U}^H = \mathbf{U}^{-1} . \tag{C.17}$$
The columns of $\mathbf{U}$ span an $N$-dimensional orthonormal vector space. From the definition of a unitary matrix $\mathbf{U}$, it follows that:
• all eigenvalues of $\mathbf{U}$ have unit magnitude ($|\lambda_i| = 1$);
• unitary matrices are normal because $\mathbf{U}\mathbf{U}^H = \mathbf{U}^H \mathbf{U} = \mathbf{I}_N$ holds;
• eigenvectors belonging to different eigenvalues are orthogonal to each other;
• the inner product $\mathbf{x}^H \mathbf{y}$ between two vectors remains unchanged if each vector is multiplied with a unitary matrix $\mathbf{U}$ because $(\mathbf{U}\mathbf{x})^H (\mathbf{U}\mathbf{y}) = \mathbf{x}^H \mathbf{U}^H \mathbf{U} \mathbf{y} = \mathbf{x}^H \mathbf{y}$ holds;
• the length of a vector does not change when multiplied with $\mathbf{U}$: $\|\mathbf{U}\mathbf{x}\| = \|\mathbf{x}\|$;
• a random matrix $\mathbf{B}$ has the same statistical properties as the matrices $\mathbf{B}\mathbf{U}$ and $\mathbf{U}\mathbf{B}$;
• the determinant of a unitary matrix has unit magnitude, $|\det(\mathbf{U})| = 1$ (Blum 2000).
Definition C.1.10 (Hermitian Matrix) A square matrix $\mathbf{A}$ is called Hermitian if it equals its complex conjugate transposed version.
$$\mathbf{A} = \mathbf{A}^H \tag{C.18}$$
LINEAR ALGEBRA
339
The real part of an Hermitian matrix is symmetric and the imaginary part is antisymmetric: (C.19) Re {A} = Re {A}T and Im {A} = −Im AT . Obviously, the symmetric and Hermitian properties are identical for real matrices. Hermitian matrices have the following properties (Strang 1988): • all diagonal elements Ai,i are real; • for each element, Ai,j = A∗j,i holds; • for all complex vectors x, the number xH Ax is real; • AAH = AH A holds because the matrix A is normal; • the determinant det(A) is real; • all eigenvalues λi of an Hermitian matrix are real; • the eigenvectors xi of a real symmetric matrix or an Hermitian matrix are orthogonal to each other, if they belong to different eigenvalues λi . Definition C.1.11 (Eigenvalue Decomposition) An N × N matrix A with N linear independent eigenvectors xi can be transformed into a diagonal matrix (Horn and Johnson 1985). This can be accomplished by generating the matrix U whose columns comprise all eigenvectors of A. It follows that λ1 λ2 (C.20) U−1 AU = = . .. λN with U = (x1 , x2 , . . . , xN ). The eigenvalue matrix is diagonal and contains the eigenvalues of A on its diagonal. From definition C.1.11, it follows directly that each matrix A can be expressed as A = UU−1 = UUH (eigenvalue decomposition). Definition C.1.12 (c) A generalization of definition (C.1.11) for arbitrary N × M matrices A is called singular value decomposition (SVD). A matrix A with rank r can be expressed as (C.21) A = UVH with the unitary N × N matrix U and the unitary M × M matrix V. The columns of U contain the eigenvectors of AAH and the columns of V contain the eigenvectors of AH A. The matrix is an N × M diagonal matrix with nonnegative, real-valued elements σk on its diagonal. Denoting the eigenvalues of AAH and, therefore, also of AH A with λk , 1 ≤ k ≤ r, the diagonal elements σk are the positive square roots of λk 8 σ k = λk 1 ≤ k ≤ r. (C.22)
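They are called singular values of A. For the N × M matrix $\boldsymbol{\Sigma}$ containing the singular values, we obtain
$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & & 0 & 0 & & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \qquad \text{(C.23)}$$
where the upper left block consists of r rows and r columns and is padded with N − r all-zero rows and M − r all-zero columns.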
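As a small numerical illustration of (C.22), the following sketch computes the singular values of a 2 × 2 matrix from the eigenvalues of $\mathbf{A}^H\mathbf{A}$, using the closed-form eigenvalues of a 2 × 2 Hermitian matrix. The helper names and the test matrix are chosen here for illustration only; the restriction to 2 × 2 matrices is an assumption that keeps the example free of numerical libraries.

```python
import math

def conj_transpose(a):
    """Hermitian transpose A^H of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def singular_values_2x2(a):
    """Singular values of a 2x2 matrix as square roots of the eigenvalues of A^H A."""
    g = matmul(conj_transpose(a), a)           # Hermitian 2x2 matrix A^H A
    p, q, b = g[0][0].real, g[1][1].real, g[0][1]
    mean = (p + q) / 2
    radius = math.sqrt(((p - q) / 2) ** 2 + abs(b) ** 2)
    eigenvalues = [mean + radius, max(mean - radius, 0.0)]  # clip rounding noise
    return [math.sqrt(lam) for lam in eigenvalues]

# Illustrative matrix (hypothetical choice)
A = [[2 + 0j, 2 + 0j], [1 + 0j, 2 + 0j]]
sigma = singular_values_2x2(A)
print(sigma)
```

Two quick consistency checks follow from the decomposition itself: the product of the singular values equals |det(A)| = 2, and the sum of their squares equals the squared Frobenius norm of A, here 13.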
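Definition C.1.13 (Spectral Theorem) A real symmetric matrix can be transformed into a diagonal matrix by multiplying with an orthogonal matrix:
$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^T \qquad \text{(C.24)}$$
and every Hermitian matrix can be transformed into a diagonal matrix by multiplying with a unitary matrix:
$$\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{-1} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H . \qquad \text{(C.25)}$$
Owing to definitions C.1.1 and C.1.9, the determinant of A becomes
$$\det(\mathbf{A}) = \det(\mathbf{U})\,\det(\boldsymbol{\Lambda})\,\det(\mathbf{U}^{-1}) = \det(\boldsymbol{\Lambda}). \qquad \text{(C.26)}$$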
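Definition C.1.14 (Square Root of a Matrix) An N × M matrix B is called square root of an N × N matrix A, if
$$\mathbf{B}\mathbf{B}^H = \mathbf{A} \qquad \text{(C.27)}$$
holds. From
$$\mathbf{A}^H = \left(\mathbf{B}\mathbf{B}^H\right)^H = \left(\mathbf{B}^H\right)^H\mathbf{B}^H = \mathbf{B}\mathbf{B}^H = \mathbf{A} \qquad \text{(C.28)}$$
we see that A is always Hermitian and from definition C.1.6, it follows that the rank of A is
$$\mathrm{rank}(\mathbf{A}) = \mathrm{rank}(\mathbf{B}\mathbf{B}^H) = \mathrm{rank}(\mathbf{B}). \qquad \text{(C.29)}$$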
Definition C.1.15 (Positive Semidefinite Matrix) A Hermitian N × N matrix A is called positive semidefinite (PSD) if
$$\mathbf{x}^H\mathbf{A}\mathbf{x} \ge 0 \qquad \text{(C.30)}$$
holds for each vector $\mathbf{x} \in \mathbb{C}^N$. The following rules are valid:
• A matrix A is positive semidefinite if a square root B according to definition C.1.14 exists; such a matrix is always Hermitian.
• A Hermitian matrix is positive semidefinite if and only if all its eigenvalues are nonnegative, so that $\lambda_i \ge 0$ for 1 ≤ i ≤ N holds.
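The link between the square root of definition C.1.14 and positive semidefiniteness can be checked numerically: for $\mathbf{A} = \mathbf{B}\mathbf{B}^H$, the quadratic form equals $\|\mathbf{B}^H\mathbf{x}\|^2$ and is therefore real and nonnegative. The sketch below uses an arbitrary 2 × 3 matrix B and an arbitrary vector x; both are hypothetical values picked only for illustration.

```python
# Arbitrary N x M matrix B (N = 2, M = 3) and test vector x, chosen for illustration
B = [[1 + 1j, 2 + 0j, 0 + 0j],
     [0 - 1j, 1 + 0j, 3 + 2j]]
x = [1 - 2j, 0.5 + 1j]
N, M = len(B), len(B[0])

# A = B B^H according to (C.27)
A = [[sum(B[i][m] * B[j][m].conjugate() for m in range(M)) for j in range(N)]
     for i in range(N)]

# Quadratic form x^H A x from (C.30)
quad = sum(x[i].conjugate() * A[i][j] * x[j] for i in range(N) for j in range(N))

# Squared length of B^H x, which should coincide with the quadratic form
bhx = [sum(B[i][m].conjugate() * x[i] for i in range(N)) for m in range(M)]
norm_sq = sum(abs(v) ** 2 for v in bhx)

# Hermitian symmetry of A, compare (C.28)
is_hermitian = all(abs(A[i][j] - A[j][i].conjugate()) < 1e-12
                   for i in range(N) for j in range(N))

print(is_hermitian, quad, norm_sq)
```

The printed quadratic form has a vanishing imaginary part and a real part equal to the squared norm, which is the reasoning behind the first rule above.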
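Figure C.1 Illustration of Householder reflection (shown: the vectors x and u, the correction $-2\mathbf{u}\mathbf{u}^H\mathbf{x}$ and the reflected vector $\mathbf{y} = \boldsymbol{\Theta}\mathbf{x}$)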
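C.2 Householder Reflections and Givens Rotation

Householder Reflections

Householder reflections are used to reflect a vector x at a plane or line onto a vector y of the same length by multiplying with a unitary matrix $\boldsymbol{\Theta}$. If x and y are column vectors with $\|\mathbf{x}\| = \|\mathbf{y}\|$, we obtain $\mathbf{y} = \boldsymbol{\Theta}\cdot\mathbf{x}$ with the unitary matrix
$$\boldsymbol{\Theta} = \mathbf{I}_N - (1 + w)\cdot\mathbf{u}\mathbf{u}^H \qquad \text{(C.31)}$$
and
$$\mathbf{u} = \frac{\mathbf{x} - \mathbf{y}}{\|\mathbf{x} - \mathbf{y}\|} \quad\text{and}\quad w = \frac{\mathbf{x}^H\mathbf{u}}{\mathbf{u}^H\mathbf{x}} . \qquad \text{(C.32)}$$
In the real-valued case, w = 1 holds and (C.31) becomes
$$\boldsymbol{\Theta} = \mathbf{I}_N - 2\cdot\mathbf{u}\mathbf{u}^H . \qquad \text{(C.33)}$$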
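The construction in (C.31) and (C.32) can be tried out on a small complex vector. The following is a minimal sketch, assuming that x differs from y (otherwise u is undefined); the function name `householder` and the test vector are illustrative choices, not part of the original text.

```python
import math

def householder(x, y):
    """Theta = I - (1 + w) u u^H mapping x onto y, following (C.31) and (C.32).

    Assumes ||x|| = ||y|| and x != y.
    """
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]
    norm_d = math.sqrt(sum(abs(v) ** 2 for v in d))
    u = [v / norm_d for v in d]
    uhx = sum(ui.conjugate() * xi for ui, xi in zip(u, x))   # u^H x
    w = uhx.conjugate() / uhx                                # x^H u / u^H x
    return [[(1 if i == j else 0) - (1 + w) * u[i] * u[j].conjugate()
             for j in range(n)] for i in range(n)]

# Illustrative vector; the target has the norm of x in its last element
x = [1 + 2j, 2 - 1j, 2 + 0j]
norm_x = math.sqrt(sum(abs(v) ** 2 for v in x))
y = [0j, 0j, complex(norm_x)]

theta = householder(x, y)
reflected = [sum(theta[i][j] * x[j] for j in range(3)) for i in range(3)]

# Theta^H Theta should be the identity matrix if Theta is unitary
gram = [[sum(theta[t][i].conjugate() * theta[t][j] for t in range(3))
         for j in range(3)] for i in range(3)]
print(reflected)
```

Up to rounding errors, `reflected` equals y, and `gram` equals the identity matrix, which confirms that the factor (1 + w) keeps the reflection unitary for complex vectors.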
The reflection is graphically illustrated in Figure C.1 for real-valued vectors. The vector $\mathbf{u}\mathbf{u}^H\mathbf{x}$ is the projection of x onto the vector u. Subtracting this projection twice from the vector x results in a reflection at the line perpendicular to u. As a special example, the Householder reflection can be used to force certain elements of a matrix to zero. This is applied in the post-sorting algorithm (PSA) in Section 5.4. In this case, the target vector becomes $\mathbf{y} = [\mathbf{0}^T \; \|\mathbf{x}\|]^T$, that is, the last element equals the norm of x, while the remaining part of y is zero. In the same way, Householder reflections can be used to QL decompose an M × N matrix A with M ≥ N, as shown by the pseudo code in Table C.1. If a row vector x has to be reflected instead of a column vector, $\boldsymbol{\Theta}$ and w become
$$\boldsymbol{\Theta} = \mathbf{I}_N - (1 + w)\cdot\mathbf{u}^H\mathbf{u} \quad\text{and}\quad w = \frac{\mathbf{u}\mathbf{x}^H}{\mathbf{x}\mathbf{u}^H} . \qquad \text{(C.34)}$$
The reflection is performed by $\mathbf{y} = \mathbf{x}\cdot\boldsymbol{\Theta}$.
Table C.1 Pseudo code for QL decomposition via Householder reflections

step   task
(1)    Initialize with L = A and Q = I_M
(2)    for k = N, . . . , 1
(3)       x = L[1 : M − N + k, k]
(4)       y = [0^T ‖x‖]^T
(5)       calculate u, w and Θ
(6)       L[1 : M − N + k, 1 : k] = Θ · L[1 : M − N + k, 1 : k]
(7)       Q[:, 1 : M − N + k] = Q[:, 1 : M − N + k] · Θ^H
(8)    end
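The steps of Table C.1 can be sketched in a few lines of Python. This is an illustrative transcription with zero-based indices, not a numerically hardened implementation: the function names are chosen here, a column that already equals its target is skipped, and no pivoting or scaling is applied. The 2 × 2 matrix reused from the lattice-reduction example of Section C.3 serves as test input.

```python
import math

def householder(x, y):
    """Theta = I - (1 + w) u u^H with u, w from (C.32); None if x already equals y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    norm_d = math.sqrt(sum(abs(v) ** 2 for v in d))
    if norm_d < 1e-12:
        return None
    u = [v / norm_d for v in d]
    uhx = sum(ui.conjugate() * xi for ui, xi in zip(u, x))
    w = uhx.conjugate() / uhx
    n = len(x)
    return [[(1 if i == j else 0) - (1 + w) * u[i] * u[j].conjugate()
             for j in range(n)] for i in range(n)]

def ql_decompose(a):
    """QL decomposition of an M x N matrix (M >= N) following Table C.1."""
    m, n = len(a), len(a[0])
    low = [row[:] for row in a]                                   # step (1): L = A
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(n, 0, -1):                                     # step (2)
        r = m - n + k
        x = [low[i][k - 1] for i in range(r)]                     # step (3)
        y = [0.0] * (r - 1) + [math.sqrt(sum(abs(v) ** 2 for v in x))]  # step (4)
        theta = householder(x, y)                                 # step (5)
        if theta is None:
            continue
        block = [[sum(theta[i][t] * low[t][j] for t in range(r)) for j in range(k)]
                 for i in range(r)]
        for i in range(r):                                        # step (6)
            low[i][:k] = block[i]
        q_block = [[sum(q[i][t] * theta[j][t].conjugate() for t in range(r))
                    for j in range(r)] for i in range(m)]
        for i in range(m):                                        # step (7): Q Theta^H
            q[i][:r] = q_block[i]
    return q, low

H = [[2.0, 2.0], [1.0, 2.0]]
Q, L = ql_decompose(H)
print(Q)
print(L)
```

For this input, the sketch returns Q = (1/√2)·[[1, 1], [−1, 1]] and L = (1/√2)·[[1, 0], [3, 4]] up to rounding errors, so that the product QL restores H.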
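Givens Rotation

Let G(i, k, θ) be an N × N identity matrix except for the elements $G_{i,i} = G_{k,k}^* = \cos\theta = \alpha$ and $-G_{i,k} = G_{k,i}^* = \sin\theta = \beta$. Hence, G(i, k, θ) has the form
$$\mathbf{G}(i,k,\theta) = \begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & \alpha & \cdots & -\beta & & \\ & & \vdots & \ddots & \vdots & & \\ & & \beta^* & \cdots & \alpha^* & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix} \qquad \text{(C.35)}$$
It is unitary and describes a rotation by θ in the N-dimensional vector space. If the angle θ is chosen appropriately, G(i, k, θ) can be used to force the ith element of a vector to zero. Considering an arbitrary column vector $\mathbf{x} = [x_1, \ldots, x_N]^T$, we choose
$$\alpha = \cos\theta = \frac{x_k}{\sqrt{|x_i|^2 + |x_k|^2}} \quad\text{and}\quad \beta = \sin\theta = \frac{x_i}{\sqrt{|x_i|^2 + |x_k|^2}} , \qquad \text{(C.36)}$$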
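and obtain the new vector y = G(i, k, θ)x:
$$\begin{pmatrix} 1 & & & & & & \\ & \ddots & & & & & \\ & & \frac{x_k}{\sqrt{|x_i|^2+|x_k|^2}} & \cdots & \frac{-x_i}{\sqrt{|x_i|^2+|x_k|^2}} & & \\ & & \vdots & \ddots & \vdots & & \\ & & \frac{x_i^*}{\sqrt{|x_i|^2+|x_k|^2}} & \cdots & \frac{x_k^*}{\sqrt{|x_i|^2+|x_k|^2}} & & \\ & & & & & \ddots & \\ & & & & & & 1 \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_k \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ 0 \\ \vdots \\ \sqrt{|x_i|^2+|x_k|^2} \\ \vdots \\ x_N \end{pmatrix}$$
Obviously, x and y are identical except for two elements: $y_i = 0$ and $y_k = \sqrt{|x_i|^2 + |x_k|^2}$. The Givens rotation can also be used to perform QL decompositions similar to the application of Householder reflections.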
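The effect of the rotation on a single vector can be reproduced with a short sketch. The function name `givens`, the zero-based indices and the test vector are assumptions made for this illustration; the sketch also assumes that $x_i$ and $x_k$ are not both zero.

```python
import math

def givens(n, i, k, x):
    """N x N Givens matrix built from (C.36) that zeroes x[i] (zero-based indices)."""
    r = math.sqrt(abs(x[i]) ** 2 + abs(x[k]) ** 2)
    alpha, beta = x[k] / r, x[i] / r
    g = [[complex(1 if a == b else 0) for b in range(n)] for a in range(n)]
    g[i][i], g[i][k] = alpha, -beta
    g[k][i], g[k][k] = beta.conjugate(), alpha.conjugate()
    return g

# Illustrative complex vector; element 1 is forced to zero using element 3
x = [1 + 1j, 2 - 1j, 0.5j, 3 + 0j]
G = givens(len(x), 1, 3, x)
y = [sum(G[a][b] * x[b] for b in range(len(x))) for a in range(len(x))]
print(y)
```

In this example, the second element of y vanishes and the fourth element becomes $\sqrt{|x_i|^2 + |x_k|^2} = \sqrt{14}$, while the remaining elements are unchanged.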
C.3 LLL Lattice Reduction

This appendix describes the LLL algorithm proposed by Lenstra, Lenstra and Lovász (Lenstra et al. 1982) for the reduction of a lattice basis. The LLL algorithm does not deliver the optimum solution, that is, the minimum basis, but it performs well for the considered multiple antenna systems and has a moderate computational complexity. Remember that all vectors and matrices have been defined with real elements. This doubles their sizes so that the channel matrix has $m = 2N_T$ columns and $n = 2N_R$ rows. First, we have to define a reduced lattice.

Definition C.3.1 (Reduced Lattice by Lenstra-Lenstra-Lovász) A basis $\mathbf{H}_{\mathrm{red}}$ with a QL decomposition $\mathbf{H}_{\mathrm{red}} = \mathbf{Q}_{\mathrm{red}}\cdot\mathbf{L}_{\mathrm{red}}$ is called LLL reduced with parameter δ, 1/4 < δ ≤ 1, if
$$|L_{\ell,k}| \le \frac{1}{2}\cdot|L_{\ell,\ell}| \quad\text{for } 1 \le k < \ell \le m \qquad \text{(C.37)}$$
and
$$\delta L_{k+1,k+1}^2 \le L_{k,k}^2 + L_{k+1,k}^2 \quad\text{for } 1 \le k \le m-1 \qquad \text{(C.38)}$$
hold. If (C.37) is fulfilled, that is, the diagonal elements of the lower triangular matrix $\mathbf{L}_{\mathrm{red}}$ are at least twice as large in magnitude as the off-diagonal elements of the same row, the basis is called size reduced. This condition guarantees that no basis vector (column of $\mathbf{H}_{\mathrm{red}}$) has a significant projection onto another one. Therefore, (C.37) coincides with the basic explanation given in Section 6.3. If it is violated for a pair (ℓ, k), we subtract an integer multiple of the ℓth column from the kth one such that (C.37) is fulfilled. However, (C.37) alone does not guarantee a minimum basis of the lattice. This effect will be illustrated by a small example at the end of this section. Additionally, the columns have to be sorted according to their lengths, that is, we have to start from right to left with the shortest column and end with the longest column. This condition coincides with the sorted QL decomposition that also starts from the right-hand side with the shortest column. Since column exchanges represent the main effort of the lattice-reduction algorithm, a pre-sorted QL decomposition can greatly reduce the computational costs (Wübben et al. 2004a,b). Condition (C.38) of the LLL algorithm just approximates the correct sorting because the lengths of columns are only compared on the basis of a small 2 × 2 submatrix as depicted in Figure C.2. The square of the diagonal element $L_{k+1,k+1}$ weighted with δ must not exceed the sum of $L_{k,k}^2$ and $L_{k+1,k}^2$. The parameter δ affects the quality of the reduced basis and is generally chosen as δ = 3/4 (Lenstra et al. 1982). If condition (C.38) is not fulfilled, columns have to be exchanged. The consideration of only small submatrices reduces the computational costs at the expense of a degraded performance, especially for large matrices. Therefore, the LLL algorithm shows a limited performance in code division multiple access
(CDMA) systems although the general concept of using a reduced basis of a lattice may lead to encouraging results.

Figure C.2 LLL algorithm for lattice reduction (shown: the lower triangular 5 × 5 matrix L with elements $L_{1,1}, \ldots, L_{5,5}$)

In conclusion, we first have to perform a QL decomposition of the channel matrix H or of its extended version. Using the sorted QL decomposition reduces the computational costs of the subsequent lattice reduction remarkably (Wübben et al. 2004a,b) because the number of required column exchanges is decreased. Afterwards, the LLL algorithm determines a new reduced basis $\mathbf{H}_{\mathrm{red}}$ of the lattice. The whole algorithm is described via pseudo code in Table C.2 (Fincke and Pohst 1985; Lenstra et al. 1982). Within the algorithm, A[a : b, c : d] denotes the submatrix of A with elements from rows a, . . . , b and columns c, . . . , d. With ⌈x⌋ we denote the integer closest to x. The algorithm processes the matrices columnwise. In steps (3–10), the size is reduced by subtracting integer multiples of the already processed columns to the right (column index ℓ) from the column k under consideration. Steps (11–16) perform the sorting. If condition (C.38) is violated, two columns are exchanged and the triangular structure of L is restored by the application of a Givens rotation.

Example for Lattice Reduction

In order to illustrate the lattice reduction, we revisit the example of Section 6.3.4 and consider the 2 × 2 channel matrix
$$\mathbf{H} = \begin{pmatrix} 2 & 2 \\ 1 & 2 \end{pmatrix} .$$
Obviously, its columns are not orthogonal and we can probably find a reduced basis. A QL decomposition leads to H = QL with
$$\mathbf{Q} = \frac{1}{\sqrt{2}}\cdot\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad \mathbf{L} = \frac{1}{\sqrt{2}}\cdot\begin{pmatrix} 1 & 0 \\ 3 & 4 \end{pmatrix} .$$
The transformation matrix T is initialized as identity matrix $\mathbf{T} = \mathbf{I}_2$. Now, we start with the size reduction and look at the elements $L_{2,2}$ and $L_{2,1}$ of L. Since μ = ⌈3/4⌋ = 1, we have to subtract the second columns of L and T from the corresponding first columns and obtain
$$\mathbf{L}_{\mathrm{red}} = \frac{1}{\sqrt{2}}\cdot\begin{pmatrix} 1 & 0 \\ -1 & 4 \end{pmatrix} \quad\text{and}\quad \mathbf{T} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} .$$
After steps (3–10) in Table C.2, we obtain the intermediate result
$$\mathbf{H}_{\mathrm{red}} = \mathbf{H}\mathbf{T} = \mathbf{Q}\mathbf{L}_{\mathrm{red}} = \begin{pmatrix} 0 & 2 \\ -1 & 2 \end{pmatrix} .$$
Obviously, a size reduction alone does not lead to the minimum basis already given in Section 6.3.4 because the second column still contains an integer multiple of the first column. Therefore,
Table C.2 LLL Lattice-Reduction Algorithm (Lenstra et al. 1982)

step   task
(1)    Initialization: Qred := Q, Lred := L, T := P
(2)    k = m − 1
(3)    while k ≥ 1
(4)       for ℓ = k + 1, . . . , m
(5)          µ = ⌈Lred[ℓ, k] / Lred[ℓ, ℓ]⌋
(6)          if µ ≠ 0
(7)             Lred[ℓ : m, k] = Lred[ℓ : m, k] − µ Lred[ℓ : m, ℓ]
(8)             T[:, k] = T[:, k] − µ T[:, ℓ]
(9)          end
(10)      end
(11)      if δ Lred[k + 1, k + 1]² > Lred[k, k]² + Lred[k + 1, k]²
(12)         Exchange columns k and k + 1 in Lred and T
(13)         Calculate Givens rotation matrix Θ such that element Lred[k, k + 1] becomes zero:
                Θ = [α −β; β α] with α = Lred[k + 1, k + 1] / ‖Lred[k : k + 1, k + 1]‖
                and β = Lred[k, k + 1] / ‖Lred[k : k + 1, k + 1]‖
(14)         Lred[k : k + 1, 1 : k + 1] = Θ · Lred[k : k + 1, 1 : k + 1]
(15)         Qred[:, k : k + 1] = Qred[:, k : k + 1] · Θ^T
(16)         k = min{k + 1, m − 1}
(17)      else
(18)         k := k − 1
(19)      end
(20)   end

INPUT: Q, L, P (default: P = I_m), OUTPUT: Qred, Lred, T
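The pseudo code of Table C.2 can be transcribed almost line by line. The following sketch is one possible reading, assuming real-valued inputs, a square lower triangular matrix L with nonzero diagonal, and P = I_m; the function name `lll_reduce` and the helper `matmul` are introduced here for illustration. The column index k is kept one-based as in the table, while list accesses are zero-based.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lll_reduce(q, low, delta=0.75):
    """LLL reduction of a basis given by its QL decomposition (sketch of Table C.2)."""
    m = len(low)
    q = [row[:] for row in q]                                     # step (1)
    low = [row[:] for row in low]
    t = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
    k = m - 1                                                     # step (2)
    while k >= 1:                                                 # step (3)
        for l in range(k + 1, m + 1):                             # steps (4)-(10)
            mu = math.floor(low[l - 1][k - 1] / low[l - 1][l - 1] + 0.5)
            if mu != 0:
                for r in range(l - 1, m):
                    low[r][k - 1] -= mu * low[r][l - 1]
                for r in range(m):
                    t[r][k - 1] -= mu * t[r][l - 1]
        lhs = delta * low[k][k] ** 2
        if lhs > low[k - 1][k - 1] ** 2 + low[k][k - 1] ** 2:     # step (11)
            for row in low + t:                                   # step (12)
                row[k - 1], row[k] = row[k], row[k - 1]
            norm = math.hypot(low[k - 1][k], low[k][k])           # step (13)
            alpha, beta = low[k][k] / norm, low[k - 1][k] / norm
            for j in range(k + 1):                                # step (14)
                top, bottom = low[k - 1][j], low[k][j]
                low[k - 1][j] = alpha * top - beta * bottom
                low[k][j] = beta * top + alpha * bottom
            for row in q:                                         # step (15): Q Theta^T
                left, right = row[k - 1], row[k]
                row[k - 1] = alpha * left - beta * right
                row[k] = beta * left + alpha * right
            k = min(k + 1, m - 1)                                 # step (16)
        else:
            k -= 1                                                # step (18)
    return q, low, t

# QL decomposition of the example matrix H = [[2, 2], [1, 2]]
s = 1 / math.sqrt(2)
Q = [[s, s], [-s, s]]
L = [[s, 0.0], [3 * s, 4 * s]]
Q_red, L_red, T = lll_reduce(Q, L)
H_red = matmul(Q_red, L_red)
print(T, H_red)
```

Applied to the QL decomposition of the 2 × 2 example matrix of this section, the sketch yields T = [[2, 1], [−1, −1]] and, up to rounding errors, a reduced basis with the orthogonal columns (2, 0) and (0, −1). Rounding to the closest integer is implemented with `math.floor(x + 0.5)` because Python's built-in `round` rounds ties to the nearest even integer.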
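it is essential to proceed with the sorting part of the algorithm. Condition (C.38) is violated on account of
$$\frac{3}{4}\cdot\left(\frac{4}{\sqrt{2}}\right)^2 = 6 > \left(\frac{1}{\sqrt{2}}\right)^2 + \left(\frac{-1}{\sqrt{2}}\right)^2 = 1$$
and so the columns in $\mathbf{L}_{\mathrm{red}}$ and T have to be exchanged. This yields
$$\mathbf{L}_{\mathrm{red}} = \frac{1}{\sqrt{2}}\cdot\begin{pmatrix} 0 & 1 \\ 4 & -1 \end{pmatrix} \quad\text{and}\quad \mathbf{T} = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix} .$$
Next, the triangular structure of $\mathbf{L}_{\mathrm{red}}$ has to be restored by applying the Givens rotation. With
$$\boldsymbol{\Theta} = \frac{1}{\sqrt{2}}\cdot\begin{pmatrix} -1 & -1 \\ 1 & -1 \end{pmatrix} ,$$
we obtain
$$\mathbf{L}_{\mathrm{red}} = \boldsymbol{\Theta}\cdot\mathbf{L}_{\mathrm{red}} = \begin{pmatrix} -2 & 0 \\ -2 & 1 \end{pmatrix} \quad\text{and}\quad \mathbf{Q}_{\mathrm{red}} = \mathbf{Q}\cdot\boldsymbol{\Theta}^T = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} .$$
The next intermediate result is
$$\mathbf{H}_{\mathrm{red}} = \mathbf{H}\mathbf{T} = \mathbf{Q}_{\mathrm{red}}\mathbf{L}_{\mathrm{red}} = \begin{pmatrix} 2 & 0 \\ 2 & -1 \end{pmatrix}$$
which still does not represent the minimum basis. We use $\mathbf{L}_{\mathrm{red}}$ for a new size reduction. With μ = ⌈−2/1⌋ = −2, this leads to
$$\mathbf{L}_{\mathrm{red}} = \begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \mathbf{T} = \begin{pmatrix} 2 & 1 \\ -1 & -1 \end{pmatrix} .$$
We finally obtain the reduced basis
$$\mathbf{H}_{\mathrm{red}} = \mathbf{H}\mathbf{T} = \mathbf{Q}_{\mathrm{red}}\mathbf{L}_{\mathrm{red}} = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix} .$$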
Bibliography

3GPP 1999 Physical Channels and Mapping of Transport Channels onto Physical Channels (FDD) 3rd Generation Partnership Project, Technical Specification Group Radio Access Network, TS 25.211, http://www.3gpp.org/ftp/Specs/html-info/25-series.htm.
3GPP 2005a Multiple-Input Multiple Output in UTRA 3rd Generation Partnership Project, Technical Specification Group Radio Access Network, TS25.876, http://www.3gpp.org/ftp/Specs/html-info/25-series.htm.
3GPP 2005b User Equipment (UE) radio Transmission and Reception (FDD) 3rd Generation Partnership Project, Technical Specification Group Radio Access Network, TS25.101, http://www.3gpp.org/ftp/Specs/html-info/25-series.htm.
Agrell E, Eriksson T, Vardy A and Zeger K 2002 Closest Point Search in Lattices. IEEE Transactions on Information Theory 48(8), 2201–2214.
Al-Dhahir N 2001 FIR Channel-Shortening Equalizers for MIMO ISI Channels. IEEE Transactions on Communications 49(2), 213–218.
Al-Dhahir N and Sayed A 2000 The Finite-Length Multiple-Input Multiple-Output MMSE-DFE. IEEE Transactions on Signal Processing 48(10), 2921–2936.
Alamouti S 1998 A Simple Transmit Diversity Technique for Wireless Communications. IEEE Journal on Selected Areas in Communications 16(8), 1451–1458.
Alexander P, Reed M, Asenstorfer J and Schlegel C 1999 Iterative Multiuser Interference Reduction: Turbo CDMA. IEEE Transactions on Communications 47(7), 1008–1014.
Bahl L, Cocke J, Jelinek F and Raviv J 1972 Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate. International Symposium on Information Theory, Asilomar, p. 90, Pacific Grove, CA, USA.
Bahl L, Cocke J, Jelinek F and Raviv J 1974 Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate. IEEE Transactions on Information Theory 20, 248–287.
Barbulescu A and Pietrobon S 1995 Interleaver Design for Three Dimensional Turbo Codes. IEEE Proceedings of International Symposium on Information Theory, p. 37, Whistler, B.C., Canada.
Bäro S, Bauch G and Hansmann A 2000a Improved Codes for Space-Time Trellis Coded Modulation. IEEE Communications Letters 4(1), 20–22.
Bäro S, Bauch G and Hansmann A 2000b New Trellis Codes for Space-Time Coded Modulation. Source and Channel Coding, ITG Conference, pp. 147–150, Munich, Germany.
Bauch G 1999 Concatenation of Space-Time Block Codes and Turbo-TCM. IEEE Proceedings of International Conference on Communications, pp. 1202–1206, Vancouver, WA, British Columbia, Canada.
Bauch G and Franz V 1998 Iterative Equalization and Decoding for the GSM-System. IEEE Proceedings of Vehicular Technology Conference, Ottawa.
Bauch G, Hagenauer J and Seshadri N 2000 Turbo-TCM and Transmit Antenna Diversity in Multipath Fading Channels. Proceedings of International Symposium on Turbo Codes, Brest, France.
Bell T, Cleary J and Witten I 1990 Text Compression, Prentice Hall, Englewood Cliffs, NJ.
Benedetto S and Biglieri E 1999 Principles of Digital Transmission with Wireless Applications. Kluwer-Academic/Plenum Publishers, New York.
Benedetto S, Divsalar D, Montorsi G and Pollara F 1998 Serial Concatenation of Interleaved Codes: Performance Analysis, Design, and Iterative Decoding. IEEE Transactions on Information Theory 44(3), 909–926.
Benedetto S and Montorsi G 1996a Design of Parallel Concatenated Convolutional Codes. IEEE Transactions on Communications 44(5), 591–600.
Benedetto S and Montorsi G 1996b Unveiling Turbo-Codes: Some Results on Parallel Concatenated Coding Schemes. IEEE Transactions on Information Theory 42(2), 409–428.
Benedetto S, Montorsi G, Divsalar D and Pollara F 1996 Serial Concatenation of Interleaved Codes: Performance Analysis, Design, and Iterative Decoding. TDA Report, Jet Propulsion Laboratory (California Institute of Technology) http://tmo.jpl.nasa.gov/progress_report/42-126/126D.pdf.
Benthin M 1996 Vergleich kohärenter und inkohärenter Codemultiplex-Übertragungskonzepte für zellulare Mobilfunksysteme. PhD thesis, Universität Hamburg-Harburg.
Berrou C and Glavieux A 1993 Turbo-Codes: General Principles Applications. Proceedings of the 6th Tirrenia International Workshop on Digital Communications, pp. 215–226, Tirrenia, Italy.
Berrou C, Glavieux A and Thitimajshima P 1993 Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes (1). IEEE Proceedings of International Conference on Communications, pp. 1064–1070, Geneva, Switzerland.
Biglieri E, Divsalar D, McLane P and Simon M 1991 Introduction to Trellis-Coded Modulation with Applications. Macmillan Publishing Company, New York.
Bingham J 1990 Multicarrier Modulation for Data Transmission: An Idea whose time has Come. IEEE Communications Magazine, 28(5), 5–14.
Blahut R 1983 Theory and Practice of Error Control Codes. Addison-Wesley, Reading, MA.
Blogh J and Hanzo L 2002 Third-Generation Systems and Intelligent Wireless Networking: Smart Antennas and Adaptive Modulation. IEEE-Press, John Wiley.
Blum R 2000 Analytical Tools for the Design of Space-Time Convolutional Codes. Conference on Information Sciences and Systems, Princeton University, Princeton, NJ.
Böhnke R, Kühn V and Kammeyer KD 2004a Efficient Near Maximum-Likelihood Decoding of Multistratum Space-Time Codes. IEEE Semiannual Vehicular Technology Conference (VTC2004-Fall), Los Angeles, CA.
Böhnke R, Kühn V and Kammeyer KD 2004b Multistratum Space-Time Codes for the Asynchronous Uplink of MIMO-CDMA Systems. International Symposium on Spread Spectrum Techniques and Applications (ISSSTA'04), Sydney, Australia.
Böhnke R, Kühn V and Kammeyer KD 2004c Quasi-Orthogonal Multistratum Space-Time Codes. IEEE Global Conference on Communications (Globecom'04), Dallas, TX.
Böhnke R, Wübben D, Kühn V and Kammeyer KD 2003 Reduced Complexity MMSE Detection for BLAST Architectures. IEEE Proceedings of Global Conference on Telecommunications, San Francisco, CA.
Bossert M 1999 Channel Coding for Telecommunications. John Wiley & Sons, New York.
Bossert M, Gabidulin E and Lusina P 2000 Space-Time Codes Based on Hadamard Matrices. IEEE Proceedings of International Symposium on Information Theory, p. 283, Sorrento, Italy.
Bossert M, Gabidulin E and Lusina P 2002 Space-time Codes Based on Gaussian Integers. IEEE Proceedings of International Symposium on Information Theory, p. 273, Lausanne, Switzerland.
Bronstein I, Semendjajew K, Musiol G and Mühlig H 2000 Taschenbuch der Mathematik, 5th edn. Harri Deutsch Verlag.
Bruck S, Sorger U, Gligorevic S and Stolte N 2000 Interleaving for Outer Convolutional Codes in DS-CDMA Systems. IEEE Transactions on Communications 48(7), 1100–1107.
Brunner C, Utschick W and Nossek J 2001 Exploiting the Short-Term and Long-Term Channel Properties in Space and Time: Eigenbeamforming Concepts for the BS in WCDMA. European Transactions on Telecommunications 12(5), 365–378.
Caire G, Müller R and Tanaka T 2004 Iterative Multiuser Joint Decoding: Optimum Power Allocation and Low-Complexity Implementation. IEEE Transactions on Information Theory 50(9), 1950–1973.
Calderbank A, Forney G and Vardy A 1999 Minimal Tail-Biting Trellises: The Golay Code and More. IEEE Transactions on Information Theory 45(5), 1435–1455.
Chung S, Forney G, Richardson T and Urbanke R 2001 On the Design of Low Density Parity Check Codes within 0.0045 dB of the Shannon Limit. IEEE Communication Letters 5, 58–60.
Clark G and Cain J 1981 Error-Correcting Coding for Digital Communications. Plenum Press, New York.
Cooper G and McGillem C 1988 Modern Communications and Spread Spectrum, 2nd edn. McGraw-Hill.
COST 1989 Final Report COST 207, Digital Land Mobile Radio Communications, final report, Commission of the European Communities, Luxembourg, 1989.
Costello D, Massey P, Collins O and Takeshita O 2000 Some Reflections on the Mythology of Turbo Codes. 3rd ITG Conference Source and Channel Coding, pp. 157–160, Munich, Germany.
Cover T and Thomas J 1991 Elements of Information Theory. Wiley & Sons, New York.
Craig J 1991 A New, Simple and Exact Result for Calculating the Probability of Error for Two-Dimensional Signal Constellations. IEEE MILCOM'91, pp. 25.5.1–25.5.5, Boston, MA.
Dahlman E, Gudmundson B, Nilsson M and Sköld J 1998 UMTS/IMT-2000 Based on Wideband CDMA. IEEE Communications Magazine, 36(9), 70–80.
Damen M, Gamal H and Caire G 2003 On Maximum Likelihood Detection and the Search for the Closest Lattice Point. IEEE Transactions on Information Theory 49(10), 2389–2402.
DaSilva V and Sousa E 1993 Performance of Orthogonal CDMA for Quasi-synchronous Communication Systems. IEEE Proceedings of International Conference on Universal Personal Communications, pp. 995–999, Ottawa, Ontario, Canada.
Dekorsy A 2000 Kanalcodierungskonzepte für Mehrträger-Codemultiplex in Mobilfunksystemen, Forschungsberichte aus dem Arbeitsbereich Nachrichtentechnik der Universität Bremen, Nr. 6. Shaker-Verlag, HRSG, K.-D. Kammeyer.
Dekorsy A, Kühn V and Kammeyer KD 1999a Exploiting Time and Frequency Diversity by Iterative Decoding in OFDM-CDMA Systems. IEEE Proceedings of Global Conference on Telecommunications, Rio de Janeiro, Brazil.
Dekorsy A, Kühn V and Kammeyer KD 1999b Iterative Decoding with M-ary Orthogonal Walsh Modulation in OFDM-CDMA Systems. Proceedings of 1st International OFDM-Workshop, Hamburg-Harburg, Germany.
Dekorsy A, Kühn V and Kammeyer KD 2003 Low Rate Channel Coding with Complex Valued Block Codes. IEEE Transactions on Communications 51(5), 800–809.
Divsalar D, Dolinar S and Pollara F 2001 Iterative Turbo Decoder Analysis Based on Density Evolution. IEEE Journal on Selected Areas in Communications 19(5), 891–907.
Douillard C, Jézéquel M and Berrou C 1995 Iterative Correction of Intersymbol Interference: Turbo-Equalization. European Transactions on Telecommunications, 6, 507–511.
ETSI 2000 Broadband Radio Access Networks (BRAN); HIPERLAN type 2; System Overview. Technical Report TS 101 683, European Telecommunications Standards Institute (ETSI), Sophia Antipolis, France.
ETSI 2001 Broadband Radio Access Networks (BRAN); HIPERLAN type 2; Physical (PHY) Layer. Norme ETSI, Document RTS0023003-R2, Sophia-Antipolis, France.
Fazel K and Kaiser S 2003 Multi Carrier and Spread Spectrum Systems. Wiley.
Fazel K and Papke L 1993 On the Performance of Convolutionally-Coded CDMA/OFDM for Mobile Radio Communications Systems. IEEE Proceedings IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. D.3.2.1–D.3.2.5, Yokohama, Japan.
Feuersänger M and Kühn V 2001 Combating Near Far Effects of Linear MMSE Multiuser Detection in Coded OFDM-CDMA. Third International Workshop on Multi-Carrier Spread-Spectrum (MC-SS 2001) & Related Topics, Oberpfaffenhofen, Germany.
Fincke U and Pohst M 1985 Improved Methods for Calculating Vectors of Short Length in a Lattice, Including a Complexity Analysis. Mathematics of Computation 44, 463–471.
Fischer R 2002 Precoding and Signal Shaping for Digital Transmission. John Wiley & Sons.
Fischer R and Huber J 1996 A New Loading Algorithm for Discrete Multitone Transmission. IEEE Global Conference on Telecommunications (Globecom'96), pp. 724–728, London, England.
Fischer R, Huber J and Wachsmann U 1998 On the Combination of Multilevel Coding and Signal Shaping. 2nd ITG Conference Source and Channel Coding, pp. 273–278, Munich, Germany.
Fischer S, Kühn V and Kammeyer KD 2001b Satellite Diversity for a LEO Satellite System with CDMA Technology. ASSI Satellite Communication Letter 1(1), 5–13.
Forney G 1966 Concatenated Codes. The M.I.T. Press, Cambridge, MA.
Forney G 1972 Maximum-Likelihood Sequence Estimation for Digital Sequences in the Presence of Intersymbol Interference. IEEE Transactions on Information Theory IT-18, 363–378.
Forney G 2001 Codes on Graphs: Normal Realizations. IEEE Transactions on Information Theory 47, 520–548.
Foschini G 1996 Layered Space-Time Architecture for Wireless Communication in a Fading Environment when Using Multiple Antennas. Bell Laboratories Technical Journal 1(2), 41–59.
Foschini G and Gans M 1998 On Limits of Wireless Communications in a Fading Environment when Using Multiple Antennas. Wireless Personal Communications 6(3), 311–335.
Foschini G, Golden G, Valenzuela R and Wolniansky P 1999 Simplified Processing for High Spectral Efficiency Wireless Communications Employing Multi-Element Arrays. IEEE Journal on Selected Areas in Communications 17(11), 1841–1852.
Frenger P, Orten P and Ottosson T 1998a Code-Spread CDMA using Low-Rate Convolutional Codes. IEEE Proceedings of International Symposium on Spread Spectrum Techniques and Applications, pp. 374–378, Sun City, South Africa.
Frenger P, Orten P, Ottosson T and Svensson A 1998b Multi-Rate Convolutional Codes. Technical Report 21, Göteborg, Chalmers University of Technology, Gothenburg, Sweden.
Freudenberger J, Bossert M, Zyablov V and Shavgulidze S 2001 Woven Codes with Outer Warp: Variations, Design, and Distance Properties. IEEE Journal on Selected Areas in Communications 19(5), 813–824.
Friedrichs B 1996 Kanalcodierung. Springer Verlag, Berlin, Germany.
Gabidulin E, Bossert M and Lusina P 2000 Space-Time Codes Based on Rank Codes. IEEE Proceedings of International Symposium on Information Theory, p. 284, Sorrento, Italy.
Gallager R 1963 Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
Gilhousen K, Jacobs I, Padovani R, Viterbi A, Weaver L and Wheatley C 1991 On the Capacity of a Cellular CDMA System. IEEE Transactions on Vehicular Technology 40, 57–62.
Glisic S and Vucetic B 1997 Spread Spectrum CDMA Systems for Wireless Communications. Artech House Publishers, Boston, MA.
Gold R 1967 Optimal Binary Sequences for Spread Spectrum Multiplexing. IEEE Transactions on Information Theory IT-13, 619–621.
Golden G, Foschini G, Wolniansky P and Valenzuela R 1998 V-BLAST: A High Capacity Space-Time Architecture for the Rich-Scattering Wireless Channel. Proceedings of the International Symposium on Advanced Radio Technologies, Boulder, Colorado.
Golub G and van Loan C 1996 Matrix Computations, 3rd edn. The Johns Hopkins University Press, London, England.
Gradshteyn IS and Ryzhik I 2000 Table of Integrals, Series and Products, 6th edn. Academic Press, San Diego, CA.
Grant A and Schlegel C 2001 Iterative Implementations for Linear Multiuser Detectors. IEEE Transactions on Communications 49(10), 1824–1834.
Hagenauer J 1988 Rate-Compatible Punctured Convolutional Codes (RCPC-Codes and their Applications). IEEE Transactions on Communications 36(4), 389–400.
Hagenauer J 1989 Unequal Error Protection (UEP) for Statistically Time-Varying Channels. Proceedings ITG-Conference 'Stochastic Models and Methods in Information Technology'. ITG-Bericht 107, 253–262.
Hagenauer J 1996a Forward Error Correcting for CDMA Systems. IEEE Proceedings of International Symposium on Spread Spectrum Techniques and Applications, pp. 566–569, Mainz, Germany.
Hagenauer J 1996b Source-Controlled Channel Decoding. IEEE Transactions on Communications 43(9), 2449–2457.
Hagenauer J and Höher P 1989 A Viterbi Algorithm with Soft-Decision Output and its Applications. IEEE Proceedings of Global Conference on Telecommunications, pp. 47.1.1–47.1.7, Dallas, Texas, USA.
Hanzo L, Liew T and Yeap B 2002a Turbo Coding, Turbo Equalisation and Space-Time Coding. IEEE-Press, John Wiley.
Hanzo L, Münster M, Choi B and Keller T 2003a OFDM and MC-CDMA for Broadband Multi-user Communications, WLANs and Broadcasting. IEEE-Press, John Wiley.
Hanzo L, Webb W and Keller T 2000 Single- and Multi-carrier Quadrature Amplitude Modulation – Principles and Applications for Personal Communications WLANs Broadcasting, 6th edn. IEEE-Press, John Wiley.
Hanzo L, Wong C and Yee M 2002b Adaptive Wireless Transceivers: Turbo-Coded, Space-Time Coded TDMA, CDMA and OFDM Systems. IEEE-Press, John Wiley.
Hanzo L, Yang LL, Kuan EL and Yen K 2003b Single- and Multi-Carrier DS-CDMA: Multi-User Detection, Space-Time Spreading, Synchronisation, Networking and Standards. IEEE-Press, John Wiley.
Harmuth HF 1964 Die Orthogonalteilung als Verallgemeinerung der Zeit- und Frequenzteilung. Archiv elektronische Übertragung 18, 43–50.
Harmuth HF 1971 Transmission of Information by Orthogonal Functions. Springer-Verlag, New York.
Hassibi B 2000 An Efficient Square-Root Algorithm for BLAST. IEEE Proceedings of International Conference on Acoustics, Speech and Signal Processing, pp. 5–9, Istanbul, Turkey.
Hassibi B and Hochwald B 2000 High Rate Codes That Are Linear in Space and Time. Allerton Conference on Communications, Control, and Computing, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Hassibi B and Hochwald B 2001 Linear Dispersion Codes. IEEE Proceedings of International Symposium on Information Theory, Washington, DC.
Hassibi B and Hochwald B 2002 High Rate Codes That Are Linear in Space and Time. IEEE Transactions on Information Theory 48(7), 1804–1824.
Heath R and Paulraj A 2002 Linear Dispersion Codes for MIMO Systems Based on Frame Theory. IEEE Transactions on Signal Processing 50(10), 2429–2441.
Heller J and Jacobs I 1971 Viterbi Decoding for Satellite and Space Communication. IEEE Transactions on Communications 19(5), 835–848.
Hiperlan2 1999 HiperLAN2 Global Forum http://www.hiperlan2.com/.
Hochwald B and ten Brink S 2003 Achieving Near-Capacity on a Multiple-Antenna Channel. IEEE Transactions on Communications Technology 51(3), 389–399.
Hoeg W and Lauterbach T 2001 Digital Audio Broadcasting. Wiley.
Höher P 1992 A Statistical Discrete-Time Model for WSSUS Multipath Channels. IEEE Transactions on Vehicular Technology 41(4), 461–468.
Höher P and Lodge J 1999 Turbo DPSK: Iterative Differential PSK Demodulation and Channel Decoding. IEEE Transactions on Communications 47(6), 837–843.
Holma H and Toskala A 2004 WCDMA for UMTS 3rd edn. Wiley.
Honig M and Tsatsanis M 2000 Multiuser CDMA Receivers. IEEE Signal Processing Magazine, 17(3), 49–61.
Horn R and Johnson C 1985 Matrix Analysis. Cambridge University Press.
Hübner A, Truhachev D and Zigangirov K 2003 On cycle-based permutor designs for serially concatenated convolutional codes. IEEE Proceedings of International Symposium on Information Theory, p. 279, Yokohama, Japan.
Hübner A, Truhachev D and Zigangirov K 2004 On Permutor Designs Based on Cycles for Serially Concatenated Convolutional Codes. IEEE Transactions on Communications 52(9), 1494–1503.
Hughes-Hartogs D 1989 Ensemble Modem Structure for Imperfect Transmission Media. U.S. Patents No. 4,679,227 and 4,731,816 and 4,833,706, Orlando, FL.
Hüttinger S, Huber J, Fischer R and Johannesson R 2002 Soft-Output-Decoding: Some Aspects from Information Theory. 4th ITG Conference on Source and Channel Coding, pp. 81–89, Berlin, Germany.
Johannesson R and Zigangirov K 1998 Fundamentals of Convolutional Coding. IEEE Digital & Mobile Communications, Piscataway, NJ.
Jordan F 2000 Ein Beitrag zur Optimierung von Empfängern unter GSM-Randbedingungen. PhD thesis, Universität Bremen.
Jordan R, Host R and Johannesson R 2001 On Interleaver Design for Serially Concatenated Convolutional Codes. IEEE Proceedings of International Symposium on Information Theory, p. 212, Washington DC, USA.
Jordan R, Johannesson R and Bossert M 2004 On Nested Convolutional Codes and Their Application to Woven Codes. IEEE Transactions on Information Theory 50(2), 380–384.
Kaiser S 1998 Multi-Carrier CDMA Mobile Radio Systems – Analysis and Optimization of Detection, Decoding and Channel Estimation. VDI.
Kammeyer KD 2004 Nachrichtenübertragung 3rd edn. Teubner, Stuttgart, Germany.
Kammeyer KD and Kühn V 2001 Matlab in der Nachrichtentechnik. J. Schlembach Fachbuchverlag, Weil der Stadt, Germany.
Kammeyer KD, Schulze H, Tuisel U and Bochmann H 1992 Digital Multicarrier-Transmission of Audio Signals Over Mobile Radio Channels. European Transactions on Telecommunications 3(3), 23–33.
Klein A 1996 Multi-User Detection of CDMA Signals – Algorithms and their Application to Cellular Mobile Radio. PhD thesis, Universität Kaiserslautern.
Kolb H 1981 Untersuchungen über ein Digitales Mehrfrequenzverfahren zur Datenübertragung. PhD thesis, Universität Erlangen.
Krongold B, Ramchandran K and Jones D 1999 An Efficient Algorithm for Optimum Margin Maximization in Multicarrier Communication Systems. IEEE Proceedings of Global Conference on Telecommunications, pp. 899–903, Rio de Janeiro, Brazil.
Krongold B, Ramchandran K and Jones D 2000 Computationally Efficient Optimal Power Allocation Algorithms for Multicarrier Communication Systems. IEEE Transactions on Communications 48(1), 23–27.
Kschischang F, Frey B and Loeliger HA 2001 Factor Graphs and the Sum Product Algorithm. IEEE Transactions on Information Theory 47(2), 498–519.
Kühn V 1998a Turbo-Codes and Turbo-Coded Modulation in CDMA Mobile Radio Systems for Short Frame Transmission. IEEE Proceedings of International Symposium on Spread Spectrum Techniques and Applications, vol. 2, pp. 364–368, Sun City, South Africa.
Kühn V 1998b Turbo-Codes und Turbo-Codierte Modulation in Codemultiplex-Mobilfunksystemen. PhD thesis, Department of Telecommunications, University of Paderborn, Aachen, Germany.
Kühn V 1999 Evaluating the Performance of Turbo-Codes and Turbo-Coded Modulation in a DS-CDMA Environment. IEEE Journal on Selected Areas in Communications 17(12), 2139–2147.
Kühn V 2001a Combined Linear and Nonlinear Multi-User Detection for Coded OFDM-CDMA. International Symposium on Theoretical Electrical Engineering (ISTET) 2001, Linz, Austria.
Kühn V 2001c Parallel Interference Cancellation in Coded DS-CDMA Systems. COST 262 Workshop on Multiuser Detection in Spread Spectrum Communications, pp. 31–38, Schloss Reisensburg, Germany.
Kühn V 2003 Iterative-Interference Cancellation and Channel Estimation for Coded OFDM-CDMA. International Conference on Communications (ICC03), Anchorage, AK.
Kühn V 2004 Analysis of Iterative Multi-User Detection Schemes with EXIT Charts. International Symposium on Spread Spectrum Techniques and Applications (ISSSTA'04), Sydney, Australia.
Kühn V, Böhnke R and Kammeyer KD 2002 Multi-User Detection in Multicarrier-CDMA Systems. elektrotechnik und informationstechnik Journal 11, 395–402.
Kühn V, Dekorsy A and Kammeyer KD 2000a Channel Coding Aspects in an OFDM-CDMA System. 3rd ITG Conference Source and Channel Coding, pp. 31–36, Munich, Germany.
Kühn V, Dekorsy A and Kammeyer KD 2000b Low Rate Channel Coding for CDMA Systems. International Journal of Electronics and Communications (AEÜ) 54(6), 353–363.
Kühn V and Kammeyer KD 2004 Multiple Antennas in Mobile Communications: Concepts and Algorithms. International Symposium on Electromagnetic Theory (URSI 2004), Pisa, Italy.
Laiho J, Wacker A and Novosad Te 2002 Radio Network Planning and Optimisation for UMTS. Wiley.
Lampe A and Huber J 2002 Iterative Interference Cancellation for DS-CDMA Systems with High System Loads Using Reliability-Dependent Feedback. IEEE Transactions on Vehicular Technology 51(3), 445–452.
Land I, Höher P and Huber J 2004 Bounds on Information Combining for Parity-Check Equations. International Zurich Seminar on Communications (IZS), pp. 68–71, Zurich, Switzerland.
Land I, Hüttinger S, Höher P and Huber J 2003 Bounds on Information Combining. International Symposium on Turbo Codes & Related Topics, pp. 39–42, Brest, France.
Lenstra A, Lenstra H and Lovász L 1982 Factoring Polynomials with Rational Coefficients. Mathematische Annalen 261, 515–534.
Li X, Vucetic B and Sato Y 1995 Optimum Soft-Output Detection for Channels with Intersymbol Interference. IEEE Transactions on Information Theory 41(3), 704–713.
Liew T and Hanzo L 2002 Space-Time Codes and Concatenated Channel Codes for Wireless Communications. IEEE Proceedings 90(2), 187–219.
Lin S and Costello D 2004 Error Control Coding, 2nd edn. Pearson Prentice-Hall.
Luby M, Mitzenmacher M, Shokrollahi A and Spielman D 2001 Efficient Erasure Correction Codes. IEEE Transactions on Information Theory 47, 569–584.
Lucas R, Bossert M and Breitbach M 1998 On Iterative Soft-Decision Decoding of Linear Binary Block Codes and Product Codes. IEEE Journal on Selected Areas in Communications 16(2), 276–296.
Lusina P, Gabidulin E and Bossert M 2001 Efficient Decoding of Space-Time Hadamard Codes Using the Hadamard Transform. IPISIT, p. 243.
Lusina P, Gabidulin E and Bossert M 2003 Maximum Rank Distance Codes as Space-Time Codes. IEEE Transactions on Information Theory 49(10), 2757–2760.
Lusina P, Shavgulidze S and Bossert M 2002 Space Time Block Code Construction Based on Cyclotomic Coset Factorization. IEEE Proceedings of International Symposium on Information Theory, p. 135, Lausanne, Switzerland.
MacKay D 1999 Good Error Correcting Codes Based on Very Sparse Matrices. IEEE Transactions on Information Theory 45, 399–432.
Massey J 1984 The How and Why of Channel Coding. IEEE Proceedings of International Zurich Seminar on Digital Communications, pp. 67–73, Zurich, Switzerland.
Mertins A 1999 Signal Analysis – Wavelets, Filter Banks, Time-Frequency Transforms and Applications. Wiley.
Moshavi S 1996 Multi-User Detection for DS-CDMA Communications. IEEE Communications Magazine, 34(10), 124–136.
Mouly M and Pautet M 1992 The GSM-System for Mobile Communications. Eigenverlag.
Müller R 1998 Power and Bandwidth Efficiency of Multiuser Systems with Random Spreading. PhD thesis, Universität Erlangen-Nürnberg.
Müller R and Verdu S 2001 Design and Analysis for Low Complexity Interference Mitigation on Vector Channels. IEEE Journal on Selected Areas in Communications 19(8), 1429–1441.
Mutti C and Dahlhaus D 2004 Adaptive Loading Procedures for Multiple-Input Multiple-Output OFDM Systems with Perfect Channel State Information. Joint COST Workshop on Antennas and Related System Aspects in Wireless Communications, Chalmers University of Technology, Gothenburg, Sweden.
Naguib A, Tarokh V, Seshadri N and Calderbank A 1997 Space-Time Coded Modulation for High Data Rate Wireless Communications. IEEE Proceedings of Global Conference on Telecommunications, vol. 1, pp. 102–109, Phoenix, Arizona, USA.
Naguib A, Tarokh V, Seshadri N and Calderbank A 1998 A Space-Time Coding Modem for High-Data-Rate Wireless Communications. IEEE Journal on Selected Areas in Communications 16(8), 1459–1478.
Nyquist H 1928 Certain Topics in Telegraph Transmission Theory. Transactions of AIEE 47, 617–644.
Offer E 1996 Decodierung mit Qualitätsinformation bei Verketteten Codiersystemen. Deutsche Forschungsanstalt für Luft- und Raumfahrt (DLR). VDI Fortschrittsberichte.
Ojanperä T and Prasad R 1998a An Overview of Air Interface Multiple Access for IMT-2000/UMTS. IEEE Communications Magazine, 36(9), 82–95.
Ojanperä T and Prasad R 1998b Wideband CDMA for Third Generation Mobile Communications. Artech House Publishers, Boston, MA.
Olofsson H and Furuskär A 1998 Aspects of Introducing EDGE in Existing GSM Networks. IEEE Proceedings of International Conference on Universal Personal Communications, pp. 421–426, Florence, Italy.
Papoulis A 1965 Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York.
Peterson W and Weldon E 1972 Error-Correcting Codes. MIT-Press, Cambridge, MA.
Pickholtz R, Milstein L and Schilling D 1991 Spread Spectrum for Mobile Communications. IEEE Transactions on Vehicular Technology 40(2), 313–321.
Pickholtz R, Schilling D and Milstein L 1982 Theory of Spread Spectrum Communications – a Tutorial. IEEE Transactions on Communications 30(5), 855–884.
Price R and Green P 1958 A Communication Technique for Multipath Channels. Proceedings IRE, vol. 46, pp. 555–570.
Proakis J 2001 Digital Communications 4th edn. McGraw-Hill.
Reed M, Schlegel C, Alexander P and Asenstorfer J 1998 Iterative Multiuser Detection for CDMA with FEC: Near-Single-User Performance. IEEE Transactions on Communications 46(12), 1693–1699.
Reimers U 1995 Digitale Fernsehtechnik. Springer-Verlag.
Richardson T, Shokrollahi A and Urbanke R 2001 Design of Capacity-Approaching Irregular Codes. IEEE Transactions on Information Theory 47, 619–637.
Richardson T and Urbanke R 2001 Efficient Encoding of Low-Density Parity-Check Codes. IEEE Transactions on Information Theory 47(2), 638–656.
Richardson T and Urbanke R 2005 Modern Coding Theory, to be published, http://lthcwww.epfl.ch/mct/index.php.
Robertson P, Höher P and Villebrun E 1997 Optimal and Sub-Optimal Maximum a Posteriori Algorithms Suitable for Turbo-Decoding. European Transactions on Telecommunications 8(2), 119–125.
Robertson P, Villebrun E and Hoeher P 1995 A Comparison of Optimal and Sub-Optimal MAP Decoding Algorithms Operating in the Log-Domain. IEEE Proceedings of International Conference on Communications, pp. 1009–1013, Seattle, WA.
Rupf M and Massey J 1994 Optimum Sequences Multisets for Synchronous Code-Division Multiple-Access Channels. IEEE Transactions on Information Theory 40, 1261–1266.
Salmasi A and Gilhousen K 1991 On the System Design Aspects of Code Division Multiple Access (CDMA) Applied to Digital Cellular and Personal Communication Networks. IEEE Proceedings of Vehicular Technology Conference, pp. 57–62, St. Louis, Missouri, USA.
Saltzberg B 1967 Performance of an Efficient Parallel Data Transmission System. IEEE Transactions on Communications Technology COM-15, 805–811.
Sarvate D and Pursley M 1980 Cross-correlation Properties of Pseudorandom and Related Sequences. IEEE Proceedings 68(5), 593–619.
Schnorr C and Euchner M 1994 Lattice Basis Reduction: Improved Practical Algorithms and Solving Subset Sum Problems. Mathematical Programming 66, 181–191.
Schramm P, Andreasson H, Edholm C, Edvardsson N, Höök M, Jäverbring S, Müller F and Sköld J 1998 Radio Interface Performance of EDGE, A Proposal for Enhanced Data Rates in Existing Digital Cellular Systems. IEEE Proceedings of Vehicular Technology Conference, pp. 1064–1068, Ottawa, Canada.
Schramm P and Müller R 1999 Spectral Efficiency of CDMA Systems with Linear MMSE Interference Suppression. IEEE Transactions on Communications 47(5), 722–731.
Schulze H 1989 Stochastische Modelle und Digitale Simulation von Mobilfunkkanälen. Kleinheubacher Berichte Nr. 32, 473–483.
Schulze H 1998 Möglichkeiten und Grenzen des Mobilempfang von DVB-T. 3. OFDM Fachgespräch, Braunschweig, Germany.
Schulze H and Lüders C 2005 Theory and Applications of OFDM and CDMA. Wiley.
Seshadri N, Tarokh V and Calderbank A 1997 Space-Time Codes for Wireless Communication: Code Construction. IEEE Proceedings of Vehicular Technology Conference, vol. 2-A, pp. 637–641, Phoenix, AZ.
Seshadri N and Winters J 1994 Two Signaling Schemes for Improving the Error Performance of Frequency Division (FDD) Transmission Systems Using Transmitter Antenna Diversity. International Journal of Wireless Networks 1(1), 49–60.
Sezgin A, Wübben D and Kühn V 2003a Analysis of Mapping Strategies for Turbo Coded Space-Time Block Codes. IEEE Information Theory Workshop (ITW03), Paris, France.
Sezgin A, Wübben D, Böhnke R and Kühn V 2003b On Exit Charts for Space Time Block Codes. IEEE Proceedings of International Symposium on Information Theory, Yokohama, Japan.
Shannon C 1948 A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423 (part 1), 623–656 (part 2).
Shokrollahi A 2002 LDPC Codes: An Introduction, online: http://algo.epfl.ch/contents/output/pubs/ldpc intro.pdf.
Simon M and Alouini MS 2000 Digital Communication over Fading Channels. John Wiley, New York.
Sloane N and Wyner A 1993 Claude Elwood SHANNON: Collected Papers, 1st edn. IEEE Press.
Steele R and Hanzo L 1999 Mobile Radio Communications, 2nd edn. Wiley.
Strang G 1988 Linear Algebra and its Applications, 3rd edn. Harcourt Brace Jovanovich College Publishers, Orlando, FL.
Tanner R 1981 A Recursive Approach to Low Complexity Codes. IEEE Transactions on Information Theory 27, 533–547.
Tarokh V, Jafarkhani H and Calderbank A 1999a Space-Time Block Codes from Orthogonal Designs. IEEE Transactions on Information Theory 45(5), 1456–1467.
Tarokh V, Jafarkhani H and Calderbank A 1999b Space-Time Block Coding for Wireless Communications: Performance Results. IEEE Journal on Selected Areas in Communications 17(3), 451–460.
Tarokh V, Seshadri N and Calderbank A 1997 Space-Time Codes for High Data Rate Wireless Communication: Performance Criteria. IEEE Proceedings of International Conference on Communications, vol. 1, pp. 299–303, Montreal, Canada.
Tarokh V, Seshadri N and Calderbank A 1998 Space-Time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Construction. IEEE Transactions on Information Theory 44(2), 744–765.
Telatar E 1995 Capacity of Multi-antenna Gaussian Channels. ATT-Bell Labs Internal Technical Memo.
ten Brink S 2000a Iterative Decoding Trajectories of Parallel Concatenated Codes. IEEE/ITG Conference on Source and Channel Coding, pp. 75–80, Munich, Germany.
ten Brink S 2000b Rate One-Half Code for Approaching the Shannon Limit by 0.1 dB. Electronics Letters 36(15), 1293–1294.
ten Brink S 2001a Code Characteristic Matching for Iterative Decoding of Serially Concatenated Codes. Annales des Telecommunication, 56(7–8), 394–408.
ten Brink S 2001b Code Doping for Triggering Iterative Decoding Convergence. IEEE Proceedings of International Symposium on Information Theory, p. 235, Washington DC, USA.
ten Brink S 2001c Convergence Behavior of Iteratively Decoded Parallel Concatenated Codes. IEEE Transactions on Communications 49(10), 1727–1737.
Toskala A, Castro J, Dahlman E, Latva-Aho M and Ojanperä T 1998 Frames FMA2 Wideband-CDMA for UMTS. European Transactions on Telecommunications 9(4), 325–335.
Tse D and Hanly S 1999 Linear Multiuser Receivers: Effective Interference, Effective Bandwidth and User Capacity. IEEE Transactions on Information Theory 45(2), 641–657.
Tse D and Viswanath P 2005 Fundamentals of Wireless Communications. Cambridge University Press, Cambridge, England.
Ungerboeck G 1982 Channel Coding with Multilevel Phase Signaling. IEEE Transactions on Information Theory IT-25, 55–67.
Vandendorpe L 1995 Multitone Spread Spectrum Multiple-Access Communications System in a Multipath Ricean Fading Channel. IEEE Transactions on Vehicular Technology 44(2), 327–337.
van Wyk D and Linde L 1998 A Turbo Coded DS/CDMA System with Embedded Walsh-Hadamard Codewords. IEEE Proceedings of International Symposium on Spread Spectrum Techniques and Applications, pp. 359–363, Sun City, South Africa.
Verdu S 1998 Multiuser Detection. Cambridge University Press, New York.
Verdu S and Shamai S 1999 Spectral Efficiency of CDMA with Random Spreading. IEEE Transactions on Information Theory 45(2), 622–640.
Viterbi A 1967 Error-Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory IT-13, 260–269.
Viterbi A 1990 Very Low Rate Convolutional Codes for Maximum Theoretical Performance of Spread-Spectrum Multiple-Access Channels. IEEE Journal on Selected Areas in Communications 8(4), 641–649.
Viterbi A 1995 CDMA Principles of Spread Spectrum Communication. Addison-Wesley Wireless Communication Series, Reading, MA.
Viterbi A and Omura J 1979 Principles of Digital Communication and Coding. McGraw-Hill, New York.
Wachsmann U, Thielecke J and Schotten H 2001 Exploiting the Data-Rate Potential of MIMO Channels: Multi-Stratum Space-Time Coding. Proc. IEEE VTC Spring 2001, Rhodes, Greece, May 6–9.
Walsh J 1923 A Closed Set of Orthogonal Functions. American Journal of Mathematics 45, 5–24.
Weinstein S 1971 Data Transmission by Frequency Division Multiplexing Using the Discrete Fourier Transform. IEEE Transactions on Communications COM-19, 628–634.
Wicker SB 1995 Error Control Systems for Digital Communication and Storage. Prentice-Hall, Englewood Cliffs, New Jersey.
Windpassinger C and Fischer R 2003a Low-Complexity Near-Maximum-Likelihood Detection and Precoding for MIMO Systems using Lattice Reduction. IEEE Information Theory Workshop (ITW 2003), Paris, France.
Windpassinger C and Fischer R 2003b Optimum and Sub-Optimum Lattice-Reduction-Aided Detection and Precoding for MIMO Communications. Canadian Workshop on Information Theory, pp. 88–91, Waterloo, Ontario, Canada.
Wittneben A 1991 Basestation Modulation Diversity for Digital SIMUL-CAST. IEEE Proceedings of Vehicular Technology Conference, pp. 848–853, St. Louis, CA.
Wittneben A 1993 A New Bandwidth Efficient Transmit Antenna Modulation Diversity Scheme for Linear Digital Modulation. IEEE Proceedings of International Conference on Communications, pp. 1630–1634, Geneva, Switzerland.
Wolf J 1978 Efficient Maximum Likelihood Decoding of Linear Block Codes Using a Trellis. IEEE Transactions on Information Theory IT-24, 76–80.
Wolniansky P, Foschini G, Golden G and Valenzuela R 1998 V-BLAST: An Architecture for Realizing Very High Data Rates Over the Rich-Scattering Wireless Channel. Invited Paper, Proceeding of International Symposium on Signals, Systems and Electronics-98, Pisa, Italy.
Wübben D 2006 Effiziente Detektionsverfahren für Multilayer-MIMO-Systeme. PhD thesis, Universität Bremen.
Wübben D and Kammeyer KD 2003 Impulse Shortening and Equalization of Frequency-Selective MIMO Channels with Respect to Layered Space-Time Architectures. EURASIP Signal Processing 83(8), 1643–1659.
Wübben D, Böhnke R, Kühn V and Kammeyer KD 2003a MMSE Extension of V-BLAST based on Sorted QR Decomposition. IEEE Semiannual Vehicular Technology Conference (VTC2003-Fall), Orlando, FL.
Wübben D, Böhnke R, Kühn V and Kammeyer KD 2003b Successive Detection Algorithm for Frequency Selective Layered Space-Time Receiver. International Conference on Communications (ICC03), Anchorage, AK.
Wübben D, Böhnke R, Kühn V and Kammeyer KD 2004a MMSE-Based Lattice-Reduction for Near-ML Detection of MIMO Systems. ITG Workshop on Smart Antennas, Munich, Germany.
Wübben D, Böhnke R, Kühn V and Kammeyer KD 2004b Near-Maximum-Likelihood Detection of MIMO Systems using MMSE-Based Lattice Reduction. IEEE International Conference on Communications (ICC’2004), Paris, France.
Wübben D, Böhnke R, Kühn V and Kammeyer KD 2004c On the Robustness of Lattice-Reduction Aided Detectors in Correlated MIMO Systems. IEEE Semiannual Vehicular Technology Conference (VTC2004-Fall), Los Angeles, CA.
Wübben D, Böhnke R, Rinas J, Kühn V and Kammeyer KD 2001 Efficient Algorithm for Decoding Layered Space-Time Codes. IEE Electronic Letters 37(22), 1348–1350.
Wübben D, Rinas J, Böhnke R, Kühn V and Kammeyer KD 2002 Efficient Algorithm for Detecting Layered Space-Time Codes. 4th International ITG Conference on Source and Channel Coding, Berlin, Germany.
Yan Q and Blum RS 2000 Optimum Space-Time Convolutional Codes. Wireless Communications and Networking Conference 2000, Chicago, USA.
Yee N, Linnartz JP and Fettweis G 1993 Multicarrier CDMA in Indoor Wireless Radio Networks. IEEE Proceedings of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. D.1.3.1–D.1.3.5, Yokohama, Japan.
Zha W and Blostein S 2003 Soft-Decision Multistage Multiuser Interference Cancellation. IEEE Transactions on Vehicular Technology 52(2), 380–389.
Ziemer R and Peterson R 1985 Digital Communications and Spread Spectrum Systems. Macmillan Publishing Company.
Ziv J and Lempel A 1977 A Universal Algorithm for Sequential Data-Compression. IEEE Transactions on Information Theory IT-23, 337–343.
Zurmühl R and Falk S 1992 Matrizen und ihre Anwendungen 1. Springer-Verlag, Berlin, Heidelberg, Germany.
Index
A priori information, 113, 120, 147, 153, 232 Adaptive modulation, 28 Additive white Gaussian noise, see AWGN Alamouti, 283 Amplitude shift keying, see ASK Analytical signal, 10 ASK, 28 Autocorrelation function, 191 Automatic repeat request, 105 AWGN, 11, 29, 31, 44, 126 channel capacity, 64 Bayes rule, 18 BCJR algorithm for binary block codes, 115 for convolutional codes, 118 in logarithmic domain, 120 initialization, 118 Beamforming, 306 Belief propagation, 167 Bhattacharyya bound, 60 error exponent, 61 Binary entropy function, 52 Binary symmetric channel, see BSC Bipartite graph, 165 Bit error probability, 34 BLAST, 307 BLAST detection, 258 BSC, 67 Capacity region, 78 Carrier frequency, 9
CDMA, 4, 19, 173 Chain rule for entropy, 54, 333 for information, 55, 333 of information, 132 Channel capacity, 55, 58, 219, 323 for CDMA uplink, 219 for diversity reception, 70 for multiple antenna systems, 323 of AWGN channel, 64 of MIMO systems, 73 of Rayleigh fading channel, 68 Channel coding, 92 Channel coding theorem, 59, 63 Channel state information, see CSI Check node, 166 Chi-squared distribution, 15 Cholesky decomposition, 259 Code division multiple access, see CDMA Code rate, 58, 93 Code-spread system, 210 Coding gain for STC, 282 Coherence bandwidth, 13 Coherence time, 13 Complementary error function, 21, 22 Complex envelope, 8 Concatenated codes, 135 parallel concatenation, 136, 141, 145, 149, 159 serial concatenation, 135, 137, 139, 145, 147, 157 Convolutional code, 100, 211 catastrophic code, 106 constraint length, 101
finite state diagram, 104 NSC encoder, 102 puncturing, 105 RSC encoder, 103 tailbiting code, 104 terminated code, 104 trellis diagram, 105 truncated code, 104 Correlated channels, 325 Correlation function, 191 COST 207, 330, 331 Crosscorrelation function, 191 CSI, 22, 93 Cutoff rate, 61 Cyclic convolution, 197 D-BLAST, 307 Data-processing theorem, 55, 334 Decision region, 20 Decorrelator, 223, 234, 241 Delay diversity, 296, 297 Determinant, 335 Determinant criterion, 282 Differential entropy, 56 Digital Audio Broadcasting (DAB), 195 Digital Video Broadcasting (DVB), 195 Direct-sequence spread spectrum, 174 Discrete multitone, 196 Distance spectrum of codes, 122 Diversity, 36 frequency diversity, 13, 37 gain for STC, 282 polarization diversity, 38 receive diversity, 277 space diversity, 37, 277 space–time transmit diversity, 279 time diversity, 13, 37, 93 Doppler bandwidth, 13, 14 frequency, 13, 14 power spectrum, 13, 331 Downlink, 2, 81, 182, 202 Downlink transmission, 186 Dual code, 97
Effective distance, 138 Eigenvalue decomposition, 339 Entropy, 52 Equal gain combining, 38, 203 Equalizer, 18 Equivalent baseband representation, 8 Equivalent code, 95 Equivocation, 54 Error covariance matrix, 234, 238 Error function, 21 Even correlation function, 191 Extended Battail algorithm, 118 Extrinsic information, 113 Fast Hadamard Transform, 99, 211 FDMA, 3 FEC, 92 Finite field, 92 Finite impulse response filter, 13 Flat fading, 13, 82, 128 Forward error correction, see FEC Frame error probability, 18 Free distance, 101 Frequency division duplex, 306 Frequency division multiple access, see FDMA Frequency-selective fading channels, 12 Frobenius norm, 280, 336 Fully loaded system, 181 Gallager bound, 62 exponent, 62 function, 62 Gauss-Seidel algorithm, 244 Gaussian distribution, 11 Gaussian normal form, 95 Generator matrix, 95 Givens rotation, 342 Global System for Mobile Communications (GSM), 5 Gold codes, 193 Graph acyclic, 165 cyclic, 165 degree of a node, 165 factor graph, 166
girth, 165 node, 165 Tanner graph, 166 vertex, 165 Gray mapping, 28 Guard interval, 3, 197 Hadamard code, 99, 192 Hadamard matrix, 99 Hamming bound, 96 code, 98 distance, 94 weight, 122 Hermitian, 335 Hilbert transformation, 10 Householder reflection, 341 ICI, 197 IDFT, 196 IFFT, 196 Information, 94 Information content, 52 Information processing characteristic, see IPC Input–output weight enumerating function, see IOWEF Intercarrier interference, see ICI Interference cancellation, 218 Interleaving, 6 block interleaver, 7 convolutional interleaver, 7 interleaving depth, 7 random interleaver, 7 uniform interleaver, 137 Intersymbol interference, see ISI Intrinsic information, 113 Inverse Discrete Fourier Transform, see IDFT Inverse Fast Fourier Transform, see IFFT IOWEF, 122 of serial code concatenation, 138 IPC, 131 Irrelevance, 54 ISI, 12, 180, 197
Jacobi algorithm, 241 Jakes spectrum, 14 Joint entropy, 53 Karhunen-Loève transformation, 47 L-Algebra, 111 Lattice reduction, 312, 343 LDPC codes, 160 irregular codes, 163 random code construction, 163 regular codes, 161 Line of sight, 331 Linear block codes, 94 Linear dispersion codes, 319 Linear multiuser detection, 233 Linear successive interference cancellation, 244 LLL algorithm, 343 LLR, 109 Log-likelihood ratio, see LLR Log-MAP, see BCJR algorithm Long spreading codes, 175 Low-density parity check codes, see LDPC codes M-sequences, 192 MAP criterion, 110 probability, 18 sequence detection, 18 symbol-by-symbol detection, 19 symbol-wise detection, 20 Matched filter, 10, 12, 25, 93, 177, 178, 259 Max-Log-MAP algorithm, 121 Maximum a posteriori, see MAP Maximum length sequences, 192 Maximum likelihood sequence detection, 18 symbol-by-symbol detection, 19 symbol-wise detection, 20 Maximum ratio combining, 38, 178, 202 MC-CDMA, 200 Message passing decoding, 167 MIMO, 1, 17 MISO, 1, 17
ML, see maximum likelihood MMSE detector, 204, 223, 242 Modified Gram-Schmidt algorithm, 261 Moore-Penrose inverse, 235 MUD, 227 Multicarrier, 195, 200 Multilayer transmission, 227, 304 Multiple access, 3 orthogonal, 81 Multiple access interference, 3 Multiple antennas, 304 Multiple-input multiple-output system, see MIMO Multiple-input single-output system, see MISO Multirate CDMA systems, 184 Multistage detector, 241 Multiuser communications, 1, 78 Multiuser detection, see MUD Mutual information, 55, 80
Natural mapping, 28, 294 Near-far effect, 183, 184 Near-far resistance, 184, 235 Nonperiodic correlations function, 191 Nonsystematic encoder, 94 Nyquist, first criterion, 11
Odd correlation function, 191 OFDM, 19, 196 OFDM-CDMA, 200 Orthogonal Frequency Division Multiplexing, see OFDM matrix, 338 restoring combining, 203 space–time block codes, 282 spreading codes, 192, 220 Outage capacity, 70 Outage probability, 23, 32, 36, 69 PAM, 28 Parallel concatenated coding scheme, 215 Parallel interference cancellation, see PIC Parity check codes, 98
Parity check matrix, 96 Path crosstalk, 180 Perfect codes, 98 Periodic correlation function, 191 Phase shift keying, see PSK PIC, 241 Positive semidefinite, 340 Post-sorting algorithm, 263 Power control, 183 Power delay profile, 13, 330 Preferred pairs, 193 Processing gain, 174 PSK, 33 Pulse amplitude modulation, see PAM Punctured codes, 108
QAM, 30 QL decomposition, 258, 268 Quadrature amplitude modulation, see QAM
Rake receiver, 178 Random spreading, 220 Rank criterion, 282 Rank of a matrix, 336 Rayleigh distribution, 15 fading, 29, 31, 43 Rayleigh fading, 68 Redundancy, 53, 94 Repetition code, 97 Rice distribution, 16 factor, 15 fading, 45, 331
Scattering function, 13 Scrambling code, 192, 208 SDMA, 4, 276 Selection combining, 38 Serially concatenated coding scheme, 211 Short spreading codes, 175 Signal shaping, 66 Signal to interference plus noise ratio, see SINR Signal to noise ratio, 12
SIMO, 1 Simplex code, 98 Single-input multiple-output system, see SIMO Single-input single-output system, see SISO Single-user matched filter, see SUMF Singular value decomposition, see SVD, 339 SINR, 186 SISO, 1, 17 SISO channel, 8 Soft-in soft-out decoding, 109 Soft-output decoding, 112 of block codes, 118 of convolutional codes, 118 of Walsh codes, 114 Sorted QL decomposition, see SQLD Source coding, 53 Space division multiple access, see SDMA Space–time codes, 279 Spatial multiplexing, 304 Spectral efficiency, 3, 219 Spectral norm, 336 Sphere packing bound, 96 SQLD, 264 Square law combining, 38 Standard array decoding, 97 Successive decoding, 79 Sufficient statistics, 25, 56, 230 Sum rate, 78 SUMF, 185 Super-orthogonal codes, 215 SVD, 74, 235, 304 Symbol error probability ASK and AWGN, 29 ASK and flat Rayleigh fading, 29 BPSK and AWGN, 34 QAM and AWGN, 31 QAM and flat Rayleigh fading, 32 QPSK and AWGN, 34
Symbol-by-symbol MAP decoder, 115 Syndrome, 96 Syndrome decoding, 97 System load, 181 Systematic encoder, 94 TDMA, 3 Time diversity, 129 Time division duplex, 306 Time division multiple access, see TDMA Time-selective channel, 13 Trellis diagram for block codes, 99 for convolutional codes, 105 Turbo codes, 144 Turbo decoding, 146, 147, 150 Turbo detection, 231 UMTS, 5, 331 Union bound, 20, 126, 136 Unitary matrix, 338 Universal Mobile Telecommunications System, see UMTS Uplink, 3, 82, 183, 188, 207 V-BLAST, 307 Variable node, 166 Vector channel, 1 Viterbi algorithm, 106 decision depth, 108 incremental metric, 108 survivor, 108 Walsh codes, 211 Walsh sequence, 99, 192 Waterfilling, 77, 84 Whitening, 259 Wide sense stationary uncorrelated scattering, see WSSUS Wiener solution, 237 WSSUS, 13 Zero-forcing, 223, 234, 241