Turbo Codes
INNOVATIVE TECHNOLOGY SERIES INFORMATION SYSTEMS AND NETWORKS
Turbo Codes: Error-correcting Codes of Widening Application
edited by
Michel Jézéquel & Ramesh Pyndiah
HERMES PENTON SCIENCE
First published in 2001 by Hermes Science Publications, Paris First published in 2002 by Hermes Penton Ltd Derived from Annales des Télécommunications, Vol. 56, no. 7-8, GET, Direction Scientifique, 46 rue Barrault, F 75634, Paris, Cedex 13, France. www.annales-des-telecommunications.com Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licences issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address: Hermes Penton Science 120 Pentonville Road London N1 9JN © Hermes Science Publications and GET, 2001 © Hermes Penton Ltd, 2002 The right of Michel Jézéquel and Ramesh Pyndiah to be identified as the editors of this work has been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
British Library Cataloguing in Publication Data A CIP record for this book is available from the British Library. ISBN 1 9039 9626 0
Typeset by Saxon Graphics Ltd, Derby. Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn. www.biddles.co.uk
Contents
Foreword, M. Jézéquel and R. Pyndiah ... vii
1. A comparison of turbo codes using different trellis terminations, J. Hokfelt, C.F. Leanderson and O. Edfors ... 1
2. Design of cyclic shift interleavers for turbo codes, J. Yuan, B. Vucetic, W. Feng and M. Tan ... 10
3. Code characteristic matching for iterative decoding of serially concatenated codes, S.T. Brink ... 29
4. Noncoherent iterative decoding of spectrally efficient coded modulations, G. Ferrari, G. Colavolpe and R. Raheli ... 59
5. Turbo soft interference cancellation for coded asynchronous DS/CDMA, J.-P. Cancès, G.R. Mohammadkhani and V. Meghdadi ... 82
6. A generalized framework for iterative source-channel decoding, N. Görtz ... 105
7. How do we implement block turbo codes? S. Kerouédan, P. Adde and R. Pyndiah ... 127
8. Turbo processing in transmit antenna diversity systems, G. Bauch, J. Hagenauer and N. Seshadri ... 142
9. Quantum factor graphs, M.G. Parker ... 178
Index ... 200
Foreword
Ten years have now gone by since Claude Berrou and Alain Glavieux, Professors at the Ecole Nationale Supérieure des Télécommunications de Bretagne, presented their research work on a new type of error-correcting code: turbo codes. At that time, they had no idea of the upheaval they were about to cause in the field of error-correcting codes. Initially, turbo codes were seen as a parallel concatenation of elementary convolutional codes, separated by an interleaving function. The decoding principle is to be found in an iterative exchange of information, called extrinsic information, between elementary decoders. It is this principle of decoding which is at the origin of the term "turbo". The turbo concept is now applied to block codes as well as to other parts of a digital transmission system, such as detection, demodulation and equalisation.

Since they provide an excellent compromise between complexity and performance, turbo codes have now become the reference in the field of error-correcting codes, and their range of applications increases by the day. After mobile communications (INMARSAT, UMTS) and interactive television (DVB-RCS, DVB-RCT), after wireless networks (IEEE 802.16.2) and the local radio loop (HiperAccess), other applications will probably integrate turbo codes into their future standards. These applications could include cable transmission (DSL or optical fibre), short-distance communication (Bluetooth) or data storage (magnetic tape, hard disk and others).

The number of articles on turbo codes in scientific journals and the number of conferences devoting specific sessions to turbo codes bear witness to the fact that scientific interest in them remains high. The last "Symposium on Turbo Codes and Related Topics", organised in Brest in September 2000, brought together more than 300 participants from some thirty countries.
It should be noted that an ever-increasing number of representatives of industry are attending such events; industrialists represented more than 40% of all attendees in Brest.

Nine contributions have been selected for this special issue, divided into three categories. The first concerns the construction of concatenated codes and their properties. The second category looks at the problems of decoding, while the final section examines the extension of turbo codes to other applications.

The first group brings together four contributions, the first of which, authored by Johan Hokfelt, Carl Fredrik Leanderson and Ove Edfors, studies the influence of the trellis termination on the performance of turbo codes. The second, written by Jinhong Yuan, Branka Vucetic, Wen Feng and Mark Tan, studies the design of cyclic shift interleavers. In the third, Stephan Ten Brink suggests a new
method for building serially concatenated convolutional codes (SCCC), based on the transfer of mutual information between the input and the output of soft-input/soft-output (SISO) decoders. In the fourth article, Gianluigi Ferrari, Giulio Colavolpe and Riccardo Raheli examine the iterative decoding of turbo codes in the context of non-coherent reception.

The second category is made up of three contributions. The first, authored by Jean-Pierre Cancès, Golam Reza Mohammadkhani and Vahid Meghdadi, looks at the problem of interference cancellation in the receiver for a coded, asynchronous DS/CDMA system. In the second article, Norbert Görtz presents a general method for joint, iterative source-channel decoding. The third article, by Sylvie Kerouédan, Patrick Adde and Ramesh Pyndiah, examines a solution for the implementation of block turbo codes.

The final category is made up of two articles. In the first, Gerhard Bauch, Joachim Hagenauer and Nambi Seshadri examine the extension of the turbo code principle to multiple antenna systems with space diversity. In the final article, Matthew G. Parker introduces a new concept for turbo processing, called "Quantum Factor Graphs".

Finally, we wish to dedicate this publication to Claude Shannon, who departed this life in February 2001. He was a redoubtable theoretician who laid the bases of Information Theory more than fifty years ago. Turbo codes represent one application of the numerous theoretical contributions Claude Shannon made. We would also like to address our warm thanks to all the experts who helped us to select the articles and without whom we would not have been able to produce this special issue on time.

Michel JÉZÉQUEL and Ramesh PYNDIAH
ENST Bretagne, Technopôle de Brest-Iroise
BP 832, 29285 Brest Cedex, France
Chapter 1
A comparison of turbo codes using different trellis terminations J. Hokfelt Fiberless Society, Lund, Sweden
C.F. Leanderson and O. Edfors Dept of Electroscience, Lund University, Sweden
I. Introduction

Turbo codes are in general implemented as two recursive convolutional encoders in parallel, where the input to the second encoder is an interleaved version of the original information sequence fed to the first encoder [1]. At the beginning of each information block, the encoders are initialized to their zero states. Similarly, at the end of each information block it is desirable to force the encoders back to the zero state, an operation known as trellis termination. For feedforward convolutional encoders, this is readily achieved by appending tail bits at the end of the encoder input sequence. However, the recursive property of the component encoders used in turbo codes implies a state dependency of these tail bits and, hence, individual tail sequences are required for each component encoder.

The performance of a specific trellis termination method depends on the particular interleaver used in the turbo encoder. This dependency is a result of interleaver edge effects [2]. These edge effects are described in this paper for the case of uniform interleaving [4]. It is demonstrated how the choice of different termination methods influences the performance for turbo codes with different interleaver lengths and different numbers of memory elements in the component encoders. The distance spectra are calculated using the concept of uniform interleaving, and the tangential sphere bound is used to upper bound the achievable maximum likelihood (ML) decoding performance. The investigated trellis termination strategies are: no termination at all, termination of the first encoder only, termination of both encoders within the length of the interleaver, and termination of both encoders with post-interleaver flushing [3].
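The state dependency of the tail bits can be made concrete with a small sketch. The code below is our own illustration, not from the paper: it assumes a four-state RSC encoder with feedback polynomial 1 + D + D² and feedforward polynomial 1 + D² (7 and 5 in octal), and the function names are hypothetical. Feeding the encoder its own feedback value as input forces the feedback bit to zero, so m such tail bits drive any state back to zero:

```python
# A minimal sketch (not from the paper) of trellis termination for a recursive
# systematic convolutional (RSC) encoder with feedback 1 + D + D^2 and
# feedforward 1 + D^2; the state is the register pair (s1, s2).

def rsc_encode(bits, state=(0, 0)):
    """Encode a bit sequence; returns (parity_bits, final_state)."""
    parity = []
    for u in bits:
        s1, s2 = state
        a = u ^ s1 ^ s2          # feedback bit
        parity.append(a ^ s2)    # feedforward 1 + D^2
        state = (a, s1)
    return parity, state

def tail_bits(state, m=2):
    """State-dependent tail that forces the encoder back to the zero state."""
    tail = []
    for _ in range(m):
        s1, s2 = state
        u = s1 ^ s2              # makes the feedback bit a = 0
        tail.append(u)
        state = (0, s1)          # register shifts a zero in
    return tail

info = [1, 0, 1, 1, 0]
_, state = rsc_encode(info)
tail = tail_bits(state)
_, final = rsc_encode(tail, state)
print(final)   # → (0, 0)
```

For a feedforward encoder the tail would simply be m zeros; here the tail depends on the final state, which is exactly why each component encoder of a turbo code needs its own tail sequence.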
II. Distance spectra

The calculation of the distance spectrum of a specific turbo code involves taking the particular interleaver into account, a task that becomes prohibitively complex even for short-length interleavers. A less computationally demanding method was introduced by Benedetto et al. in [4], where a method to derive the average distance spectrum for the ensemble of all interleavers of a certain length was presented. In this section we summarize their method, and present an extension by which we include the influences of different trellis terminations. The methodology as such is general, but for reasons of simplicity we restrict this presentation to two-component turbo codes with binary systematic recursive convolutional encoders, as introduced in [1]. Benedetto et al. introduced the input-redundancy weight enumerating function (IRWEF) [4]

A(W, Z) = Σ_w Σ_j A_{w,j} W^w Z^j    (1)
for a systematic (N + J, N)-code, where A_{w,j} is the number of codewords with input weight w and parity weight j. N is the number of information bits (corresponding to the interleaver length), J is the number of generated parity bits, and W and Z are dummy variables. Since both component encoders in a turbo code share the same input bits, though in different order, every codeword that belongs to a turbo code is composed of two component-code codewords that both result from sequences of the same weight w. For this reason, Benedetto et al. defined the conditional weight enumerating function (CWEF)

A_w(Z) = Σ_{j=0}^{J} A_{w,j} Z^j    (2)
which enumerates the number of codewords of various parity weights j, conditioned on the input weight w. The CWEFs of the first and second component encoders are denoted A_w^{C1}(Z) and A_w^{C2}(Z) respectively, and the CWEF of the overall turbo code A_w^{TC}(Z). By introducing a probabilistic interleaver construction called a uniform interleaver, for which all distinct mappings are equally probable, Benedetto et al. obtained the CWEF of the ensemble of all turbo codes using interleavers of length N as

A_w^{TC}(Z) = A_w^{C1}(Z) A_w^{C2}(Z) / C(N, w)    (3)
where 1/C(N, w) is the probability that a specific weight-w sequence is mapped to another specific weight-w sequence, C(N, w) denoting the binomial coefficient "N choose w". Finally, the number a_d of codewords of Hamming weight d is equal to

a_d = Σ_w A^{TC}_{w,d−w}    (4)
where A^{TC}_{w,d−w} are the coefficients in the turbo code CWEF. Since we are addressing systematic codes, the codeword weight is the sum of input and parity weight, i.e. d = w + j. When deriving the CWEF of the component codes of turbo codes, it is common practice to take only the error events that end up in the zero state into account, i.e. to consider only zero-terminating input sequences. Depending on the method of trellis termination, codewords might also exist that result from trellis paths that do not end up in the zero state after N trellis transitions. In the sequel, a method to derive the CWEF for various trellis termination methods is presented.

A. Interleaver edge effects

Interleaver edge effects refer to the implications on the distance spectrum resulting from the block partitioning of the input sequence, as the result of a limited-length interleaver [2]. Due to this truncation, low-weight parity words can be generated even though the encoder input sequences do not force the encoders back to the zero states. In terms of weight enumerating functions, this means that we require knowledge not only of the number of trellis paths that lead to the zero state after the last transition, but also the number of paths that lead to other final states. This can be obtained by partitioning the IRWEF defined by (1) into a state-dependent counterpart A_{t,s}(W, Z), which enumerates the number of trellis paths that lead to state s, having input weight w and parity weight j. An efficient method to find the state-dependent IRWEF of a convolutional encoder valid after t trellis transitions is to extend the IRWEF of the same encoder obtained for t − 1 transitions. The state and time dependent IRWEF is defined as

A_{t,s}(W, Z) = Σ_w Σ_j A_{t,s,w,j} W^w Z^j    (5)
where A_{t,s,w,j} is the number of paths with input weight w and parity weight j that lead to state s after t trellis transitions. Based on the encoder trellis, the coefficients of the state and time dependent IRWEF are calculated recursively in time as

A_{t,s,w,j} = Σ_{u∈{0,1}} A_{t−1, S(s,u), w−u, j−P(S(s,u),u)}    (6)
where S(s, u) is the state that leads to state s when the input symbol is u, and P(S(s, u), u) is the parity weight generated by the corresponding trellis transition. This recursion is illustrated in Figure 1 for a two-state encoder. At time t = 0, the recursive procedure is initialized with A_{0,0,0,0} = 1 and A_{0,s,w,j} = 0 for all (s, w, j) ≠ (0, 0, 0), which corresponds to an encoder initialized in the zero state.
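As an illustration of this recursion, the sketch below (our own, not the authors' code; it assumes a four-state RSC encoder with feedback 1 + D + D² and feedforward 1 + D²) accumulates the state and time dependent coefficients A_{t,s,w,j} over a short block:

```python
from collections import defaultdict

# Illustrative RSC encoder (feedback 1 + D + D^2, feedforward 1 + D^2);
# the state is the register pair (s1, s2).
def rsc_step(state, u):
    """One trellis transition: returns (next_state, parity_bit)."""
    s1, s2 = state
    a = u ^ s1 ^ s2          # feedback bit
    parity = a ^ s2          # feedforward 1 + D^2
    return (a, s1), parity

def state_time_irwef(n_transitions):
    """A[t][s][(w, j)] = number of length-t paths from the zero state to
    state s with input weight w and parity weight j (the recursion above)."""
    A = [defaultdict(lambda: defaultdict(int)) for _ in range(n_transitions + 1)]
    A[0][(0, 0)][(0, 0)] = 1              # encoder initialized in the zero state
    for t in range(1, n_transitions + 1):
        for s_prev, paths in A[t - 1].items():
            for u in (0, 1):
                s_next, p = rsc_step(s_prev, u)
                for (w, j), count in paths.items():
                    A[t][s_next][(w + u, j + p)] += count
    return A

A = state_time_irwef(4)
# Entries under the zero state at t = N are the zero-terminating coefficients;
# entries under the non-zero states give rise to the edge-effect codewords.
print(dict(A[4][(0, 0)]))
```

Forward accumulation over the trellis is exactly the extension from t − 1 to t described in the text; every one of the 2^t input sequences is counted exactly once across the states at time t.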
Figure 1. Recursive calculation of distance spectrum for a two-state encoder.
Let E^{C1}_{w,j} and E^{C2}_{w,j} denote the multiplicities of codewords with input weight w and parity weight j that correspond to trellis paths that do not end up in the zero state after encoding length-N input blocks, for component codes C1 and C2 respectively. We denote such codewords edge-effect codewords, and their multiplicities are the coefficients of the corresponding CWEFs, according to E^{Cl}_w(Z) = Σ_{j=0}^{J} E^{Cl}_{w,j} Z^j, l = 1, 2. The overall CWEFs, including both zero-terminating and edge-effect codewords, are then obtained as

Ã_w^{Cl}(Z) = A_w^{Cl}(Z) + E^{Cl}_w(Z), l = 1, 2    (7)
and the resulting CWEF for the turbo code is

Ã_w^{TC}(Z) = Ã_w^{C1}(Z) Ã_w^{C2}(Z) / C(N, w)    (8)
Note that A_w^{Cl}(Z) includes only trellis paths that end in the zero state after N transitions. Thus, A^{Cl}_{w,j} is obtained from (6) for t = N and s = 0. The difference between Ã_w^{Cl}(Z) and A_w^{Cl}(Z) equals E^{Cl}_w(Z), which depends on how the trellises are terminated. In order to evaluate this quantity, we calculate below E^{Cl}_{w,j} for four classes of trellis termination methods:

1. No termination of either component encoder.
2. Termination of the first component encoder.
3. Termination of both component encoders.
4. Post-interleaver flushing.

Class I. No trellis termination

With no termination of either component encoder, the multiplicities of codewords that stem from interleaver edge effects are calculated by summing the number of paths that end in the non-zero states after N trellis transitions. Thus,

E^{Cl}_{w,j} = Σ_{s=1}^{2^{m_l}−1} A_{N,s,w,j}, l = 1, 2
where m_1 and m_2 are the numbers of memory elements in encoders 1 and 2, respectively. The overall distance spectrum including edge-effect codewords, Ã_w^{TC}(Z), is calculated using (7) and (8).

Class II. Termination of the first encoder

By appending m_1 tail bits to the input sequence so that the first encoder is terminated in the zero state, the edge-effect codewords are entirely removed from the first component code. Note that the tail bits are included in the sequence that enters the interleaver, and that their Hamming weight is included in the input weight w. For the second encoder, the situation is identical to the case of no trellis termination. Hence,

E^{C1}_{w,j} = 0,   E^{C2}_{w,j} = Σ_{s=1}^{2^{m_2}−1} A_{N,s,w,j}
Class III. Termination of both encoders

It is also possible to terminate both component encoders in their zero states. At least two different ways of achieving this have been reported in the literature:

1. By imposing interleaver restrictions, the second encoder can be forced to end up in the same state as the first encoder [5, 6]. It is then sufficient to append a single set of tail bits according to termination Class II in order to terminate both encoders in their zero states.

2. By identifying specific, interleaver-dependent input positions it is possible to force the component encoders to their zero states independently of each other [7]. This is achieved without any restrictions on the choice of interleaver, but with a slight increase in the number of input bits dedicated to trellis termination (m termination bits are required, with max(m_1, m_2) ≤ m ≤ m_1 + m_2).

With both encoders terminated in their zero states, all edge-effect codewords are removed. Consequently,

E^{C1}_{w,j} = E^{C2}_{w,j} = 0
Class IV. Post-interleaver flushing

Trellis termination by post-interleaver flushing was proposed in [3]. With this method, both encoders are flushed independently of each other, after encoding their N-bit input sequences. The combination of the weight spectra of the component encoders is then similar to the case of no trellis termination, since the trellises are not terminated by the end of their length-N input sequences. However, extra codeword weight is added as a consequence of the encoder flushing. This is accounted for by adding the weight of the flush bits and the corresponding parity bits to the parity weight in the IRWEFs. More precisely,

E^{Cl}_{w,j} = Σ_{s=1}^{2^{m_l}−1} A_{N,s,w,j−F_l(s)}, l = 1, 2
where F_l(s), l = 1, 2, is the sum of the weights of the flush bits and parity bits generated when forcing encoder l to the zero state from state s.
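The flush weights F_l(s) are easy to tabulate for a small encoder. The sketch below is our own illustration, again assuming a four-state RSC component encoder with feedback 1 + D + D² and feedforward 1 + D² (not necessarily one of the encoders studied in this chapter):

```python
# Toy computation of flush weights F(s): for each starting state s, F(s) is the
# total Hamming weight of the flush bits plus the parities they generate while
# driving the (assumed) RSC encoder to the zero state.

def step(state, u):
    s1, s2 = state
    a = u ^ s1 ^ s2          # feedback bit
    return (a, s1), a ^ s2   # next state, parity bit (feedforward 1 + D^2)

def flush_weight(state, m=2):
    weight = 0
    for _ in range(m):
        s1, s2 = state
        u = s1 ^ s2          # flush bit: forces the feedback bit to 0
        state, p = step(state, u)
        weight += u + p      # flush bit weight plus parity weight
    assert state == (0, 0)   # the encoder is now in the zero state
    return weight

F = {s: flush_weight(s) for s in [(0, 0), (0, 1), (1, 0), (1, 1)]}
print(F)
```

The zero state needs no flushing (F = 0), so only the non-zero states contribute extra weight, matching the sum over s ≥ 1 in the Class IV expression.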
III. Evaluation

The distance spectra as such are not very useful when assessing the performance of turbo codes. However, in combination with proper bounding techniques a useful assessment can be made. In this section we use the method presented for calculating the distance spectra in combination with the tangential sphere bound [8, 9], which is an upper bound on the frame-error rate (FER) for ML decoding of codewords transmitted over an additive white Gaussian noise channel. The tangential sphere bound is used since, in contrast with the union bound, it provides a useful bound on the error performance also below the cut-off rate of the channel. To verify the bounds obtained, it would be of interest to present ML decoding simulations as a comparison. Such comparisons are, however, not feasible due to the prohibitive ML decoding complexity of turbo codes. We therefore compare with simulation results obtained when using standard suboptimal iterative decoding. The simulation results presented are obtained by 15 decoding iterations employing the modified BCJR (Bahl, Cocke, Jelinek and Raviv) decoding algorithm [1, 10]. We have compared rate 1/3 turbo codes using interleavers of lengths 100 and 500 bits, and various feedback and feedforward polynomials. We have limited the investigation to the most common setup, where two identical component encoders are used. Figures 2, 3 and 4 show the upper bounds (on ML decoding) together with the simulated (iterative decoding) performances of a large number of randomly chosen interleavers, for a selection of the codes investigated.
Figure 2. Performance of turbo codes with 500-bit random interleaving. The feedforward and feedback polynomials are 5 and 7 in octal notation, respectively. The lines represent the calculated tangential sphere bounds, while the corresponding simulations are indicated by markers.
Figure 3. Performance of turbo codes with 100-bit random interleaving. The feedforward and feedback polynomials are 17 and 15 in octal notation, respectively. The lines represent the calculated tangential sphere bounds, while the corresponding simulations are indicated by markers.
Figure 4. Performance of turbo codes with 500-bit random interleaving. The feedforward and feedback polynomials are 17 and 15 in octal notation, respectively. The lines represent the calculated tangential sphere bounds, while the corresponding simulations are indicated by markers.

The simulated error-rates exceed the derived upper bounds, which is a direct result of the suboptimal iterative decoding. Disregarding the absolute values, the bounds give a good indication of the relative performance of the different codes and termination methods. Thus, even though the distance spectra of different codes cannot be used for absolute performance prediction, they are useful when making intelligent design choices for turbo codes. These design choices are, however, beyond the scope of this paper.
IV. Conclusions

A method for deriving interleaver ensemble average distance spectra of turbo codes using different trellis termination methods has been presented. Using this method, we have investigated four principal classes of trellis termination: no termination, termination of the first encoder, termination of both encoders, and post-interleaver flushing. These methods have been evaluated using component encoders with constraint lengths 3 and 4, and two interleaver sizes: 100 and 500 bits. In general, the performance differences between the termination methods are small, except for the case of no termination. Among the three methods that involve termination, the best distance spectra are observed with post-interleaver flushing and with both encoders terminated. The performance degradation when no trellis termination is used shows little dependence on the interleaver size, but it is highly dependent on the choice of component encoders. In particular, the length
of the period of the encoder impulse responses is crucial; the larger the period, the larger the performance loss of not using any trellis termination. The large performance losses suffered when no trellis termination is used are results of the inferior average distance spectra achieved by the ensemble of turbo codes, corresponding to all possible interleavers. However, this performance degradation can be avoided by proper interleaver design, as discussed in [11].

REFERENCES

[1] BERROU (C.), GLAVIEUX (A.) and THITIMAJSHIMA (P.), "Near Shannon limit error-correcting coding and decoding: Turbo Codes," in Proc. 1993 IEEE Int. Conf. on Comm. (ICC), pp. 1064-1070, Geneva, Switzerland, May 1993.
[2] DOLINAR (S.) and DIVSALAR (D.), "Weight distributions for turbo codes using random and nonrandom permutations," TDA progress report 42-122, Jet Propulsion Lab., Pasadena, CA, Aug. 1995.
[3] DIVSALAR (D.) and POLLARA (F.), "Turbo codes for PCS applications," in IEEE Int. Conf. on Comm., New York, USA, 1995.
[4] BENEDETTO (S.) and MONTORSI (G.), "Unveiling turbo codes: Some results on parallel concatenated coding schemes," IEEE Trans. on Inf. Theory, vol. 42, pp. 409-428, Mar. 1996.
[5] BARBULESCU (A.S.) and PIETROBON (S.S.), "Terminating the trellis of turbo-codes in the same state," Electronics Letters, vol. 31, pp. 22-23, Jan. 1995.
[6] HATTORI (M.), MURAYAMA (J.) and MCELIECE (R.J.), "Pseudo-random and self-terminating interleavers for turbo codes," in Winter 1998 Inf. Theory Workshop, San Diego, USA, Feb. 1998.
[7] GUINAND (P.) and LODGE (J.), "Trellis termination for turbo encoders," in 17th Biennial Symp. on Communications, Kingston, Canada, May 1994.
[8] POLTYREV (G.), "Bounds on the decoding error probability of binary linear codes via their spectra," IEEE Trans. on Inf. Theory, pp. 1284-1292, Jul. 1994.
[9] SASON (I.) and SHAMAI (S.), "Improved upper bounds on the ML decoding error probability of parallel and serial concatenated turbo codes via their ensemble distance spectrum," IEEE Trans. on Inf. Theory, vol. 46, no. 1, pp. 24-47, Jan. 2000.
[10] BAHL (L.R.), COCKE (J.), JELINEK (F.) and RAVIV (J.), "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. on Inf. Theory, vol. 20, pp. 284-287, Mar. 1974.
[11] HOKFELT (J.), EDFORS (O.) and MASENG (T.), "On the theory and performance of trellis termination methods for turbo codes," IEEE Journal on Sel. Areas in Comm., vol. 19, no. 5, pp. 838-847, May 2001.
Chapter 2
Design of cyclic shift interleavers for turbo codes J. Yuan School of Electrical Engineering and Telecommunications, Univ of NSW, Australia
B. Vucetic, W. Feng and M. Tan School of Electrical and Information Engineering, University of Sydney, Australia
I. Introduction

Turbo codes have been shown to achieve near Shannon capacity limits on AWGN channels [1]. Interleaving size and structure affect the turbo code error performance considerably. At low SNRs, the interleaver size is the only important factor, as the code performance is dominated by the interleaver gain [2]. The effects induced by changing the interleaver structure in the low SNR region are not significant. However, both the interleaver size and structure affect the turbo code minimum free distance and first several distance spectral lines. They play an important role in determining the code performance at high SNRs and, consequently, the asymptotic performance of the turbo code. It is possible to design particular interleavers which can result in good code performance at high SNRs. This is achieved by breaking several low weight input patterns that produce low weight codewords in the overall turbo code [7].

A number of interleavers have been designed for turbo codes, such as block interleavers, "simile" interleavers [4], pseudo-random interleavers [3, 16, 17], convolutional interleavers [11], and algebraic interleavers [13, 14]. In these designs, most of the effort has been made to find interleaver structures that result in a better distance spectrum for turbo codes. In [12], a criterion based on the correlation properties of the extrinsic information is used in the interleaver design to improve the performance of suboptimal iterative decoding. Various interleaving structures have been shown to achieve a large minimum free distance and significant performance improvement at high SNRs for turbo codes, such as S-random [3] and code matched interleavers [6, 8]. These interleavers have a pseudo-random structure, which is always associated with high design complexity and a large memory requirement for practical implementation. Therefore, it is
desirable to construct an interleaver that has good performance, low design complexity and is easy to implement. In this paper we consider cyclic shift interleavers for turbo coding. The properties of cyclic shift interleavers are discussed. The cyclic shift interleavers are equivalent to the S-random interleavers in the ability to break low weight input patterns. The performance of turbo codes with cyclic shift interleavers is compared with the performance of S-random interleavers for various interleaver sizes. It is shown that a cyclic shift interleaver is capable of outperforming an S-random interleaver if the parameters of the cyclic shift interleaver are chosen properly. In addition to the performance improvement, the cyclic shift interleavers have the advantages of lower design complexity and smaller memory requirements relative to the S-random interleavers.
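As a first picture of the idea, a cyclic shift interleaver can be viewed as a row/column permutation with per-row cyclic shifts. The sketch below is a hedged illustration of one common variant, not necessarily the authors' exact construction (which is developed later in the paper); N = r·c, and the parameters r, c and the shift step s are purely illustrative:

```python
# Hedged sketch of a cyclic shift interleaver: write the length-N index
# sequence row-wise into an r x c array, cyclically shift row k by k*s
# positions, then read the array column-wise.

def cyclic_shift_interleaver(r, c, s):
    """Return the permutation (list of source indices) for an r*c interleaver."""
    # row k holds indices k*c .. k*c + c - 1, cyclically shifted by k*s
    rows = [[k * c + ((i + k * s) % c) for i in range(c)] for k in range(r)]
    # column-wise read-out
    return [rows[k][i] for i in range(c) for k in range(r)]

perm = cyclic_shift_interleaver(r=4, c=8, s=3)
assert sorted(perm) == list(range(32))      # a valid one-to-one mapping
print(perm[:8])
```

Such a permutation needs only the three parameters (r, c, s) to be stored, which is the source of the low memory requirement compared with a stored pseudo-random permutation of length N.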
II. Code performance analysis

A rate k/n (where k is the information sequence length and n is the encoded sequence length), memory order v turbo code which consists of two convolutional codes with an interleaver of length N can be represented by an equivalent (n(N + v), kN) block code if trellis termination is used to drive the constituent encoders to the all-zero state. It is possible to drive both constituent encoders to terminate in the all-zero state by the dual tail-biting proposed in [4, 15]. However, the trellis termination does not affect the interleaver design. The bit error probability of an equivalent block code decoded by a maximum-likelihood algorithm over an additive white Gaussian noise (AWGN) channel can be upper-bounded by the union bound [6]

P_b ≤ Σ_{d≥d_min} B_d Q(√(2dR E_b/N_0))    (1)
where R is the code rate, E_b/N_0 is the signal-to-noise ratio per information bit, d_min is the minimum Hamming distance, and B_d is the error coefficient, equal to the average number of bit errors caused by the transitions between the all-zero codeword and codewords of weight d (d ≥ d_min). The function Q(·) is defined as

Q(x) = (1/√(2π)) ∫_x^∞ exp(−t²/2) dt    (2)
The set of all pairs (d, B_d) represents the turbo code distance spectrum. It determines the contribution of the codewords with the same weight d to the bit error probability. The error coefficient B_d in (1) can be represented by

B_d = Σ_{w+z=d} (w/(kN)) A_{w,z}    (3)
where A_{w,z} is the number of codewords of the equivalent block code with input information weight w and parity check weight z. The overall Hamming weight of the codeword is d = w + z. From the code distance spectrum, we can obtain the contribution of each spectral line to the BER performance. The contribution of a spectral line (d_i, B_{d_i}), with distance d_i and error coefficient B_{d_i}, to the overall BER at a particular E_b/N_0, denoted by F_{d_i}(E_b/N_0), can be written as

F_{d_i}(E_b/N_0) = B_{d_i} Q(√(2 d_i R E_b/N_0))    (4)
Its relative contribution to the total BER can be represented as

F_{d_i}(E_b/N_0) / Σ_i F_{d_i}(E_b/N_0)    (5)
Then the contribution of the spectral line over an E_b/N_0 range [a, b] is obtained as

F_{d_i}^{ab} = ∫_a^b F_{d_i}(E_b/N_0) d(E_b/N_0)    (6)
where F_{d_i}^{ab} is called the contribution integral and a and b are specified E_b/N_0 values. F_{d_i}^{ab} can be further normalized as

F̄_{d_i}^{ab} = F_{d_i}^{ab} / Σ_i F_{d_i}^{ab}    (7)
where F̄_{d_i}^{ab} is called the relative contribution integral [8]. The relative contribution integral accurately represents the relative contribution of each spectral line to the overall BER in an SNR range. It can be used as the criterion to determine the contribution of each spectral line in an SNR region. For example, we consider a rate 1/3, 4-state turbo code with generator matrix G = [1, (1 + D²)/(1 + D + D²)] and an interleaver size of 1024 bits. The relative contribution integral for this code in the E_b/N_0 ranges 0-2 dB and 2-4 dB is shown in Table I.

From Table I, it is clear that at low SNR (0 ~ 2 dB), a large number of medium weight spectral lines determine the code performance. The influence of low weight spectral lines can be ignored. No individual spectral line is significant to the performance. However, at a medium to high SNR range (2 ~ 4 dB), a few low weight spectral lines determine the code performance. For the interleaver design, it is important to identify the input sequences which generate the codewords corresponding to these significant spectral lines. Table II lists all these input sequences. In Table II, the input weight is the weight of the input sequences and the parity-check weight is the weight of the corresponding parity-check sequences of one constituent code. For example, if a weight-4 input
Table I. Relative contribution integral F_{d_i}^{ab} of the 4-state turbo code (%).

d_i    0-2 dB      2-4 dB
7      1.04e-09    2.02e+00
8      3.03e-12    4.90e-03
9      1.96e-09    2.66e+00
10     4.18e-08    4.81e+01
11     1.98e-09    1.93e+00
12     2.69e-08    2.26e+01
13     1.48e-09    1.07e+00
14     1.32e-08    8.35e+00
15     9.42e-10    5.22e-01
16     6.01e-09    2.94e+00
17     6.67e-10    2.89e-01
18     2.72e-09    1.05e+00
19     6.19e-10    2.16e-01
20     2.20e-09    6.93e-01
21     6.11e-10    1.74e-01
22     1.87e-09    4.87e-01
23     5.47e-10    1.30e-01
24     1.39e-09    3.04e-01
25     4.36e-10    8.79e-02
sequence generates a weight-2 parity-check sequence at the first encoder and a weight-6 parity-check sequence at the second encoder, it will generate a codeword with total Hamming weight of 12 (4 + 2 + 6 = 12); if a weight-4 input sequence generates weight-4 parity-check sequences in both encoders, it will generate a codeword with total Hamming weight of 12 as well (4 + 4 + 4 = 12). As shown in Table I, the strongest contributions to performance at medium to high SNR (2 ~ 4 dB) come from codewords with weights 10 and 12. From Table II, these codewords are generated by the input patterns with weights 2, 4, 6 and 8. The contribution from codewords with weights 14 and 16 is the next strongest, but much weaker. They are generated by weight-2, 4, 6, 8 and 10 input patterns. Codewords with weight 9 can only come from input weight-3 and 5 patterns, and the contribution is even smaller. Obviously, the code performance can be
Turbo codes
Table II. The input patterns that generate significant low weight codewords.

codeword weight | Fdiab 2-4 dB | input weight | parity-check weight
      10        |    48.1%     |      2       | 4
                |              |      4       | 2, 4
                |              |      6       | 2
      12        |    22.6%     |      2       | 4, 6
                |              |      4       | 2, 4, 6
                |              |      6       | 2, 4
                |              |      8       | 2
      14        |    8.35%     |      2       | 4, 6, 8
                |              |      4       | 2, 4, 6, 8
                |              |      6       | 2, 4, 6
                |              |      8       | 2, 4
                |              |     10       | 2
      16        |    2.94%     |      2       | 4, 6, 8, 10
                |              |      4       | 2, 4, 6, 8, 10
                |              |      6       | 2, 4, 6, 8
                |              |      8       | 2, 4, 6
                |              |     10       | 2, 4
       9        |    2.66%     |      3       | 2, 4
                |              |      5       | 2
       7        |    2.02%     |      3       | 2
      11        |    1.93%     |      3       | 2, 4, 6
                |              |      5       | 2, 4
                |              |      7       | 2
      13        |    1.07%     |      3       | 2, 4, 6, 8
                |              |      5       | 2, 4, 6
                |              |      7       | 2, 4
                |              |      9       | 2
      18        |    1.05%     |      2       | 4, 6, 8, 10, 12
                |              |      4       | 2, 4, 6, 8, 10, 12
                |              |      6       | 2, 4, 6, 8, 10
                |              |      8       | 2, 4, 6, 8
                |              |     10       | 2, 4, 6
                |              |     12       | 2, 4
improved by eliminating the codewords that have considerable contributions to the performance. This can be done by disallowing the input patterns that generate these codewords. For this code, the most significant codewords to be considered in the interleaver design are the weight-10 and 12 codewords, since they contribute about 70% of the errors in the code BER performance in the Eb/N0 range 2-4 dB. The input patterns corresponding to these significant codewords are of weight 2, 4 and 6. An interleaver design method based on this approach is presented in the following sections.
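The parity-check weights discussed above can be checked by directly simulating one constituent encoder of the example code. The following sketch (the function name and test inputs are ours) runs the 4-state recursive encoder with G = [1, (1+D²)/(1+D+D²)] and counts the parity weight of a terminating input sequence:

```python
def rsc_parity_weight(bits):
    """Parity weight of the 4-state RSC encoder with G = [1, (1+D^2)/(1+D+D^2)]."""
    s1 = s2 = 0
    weight = 0
    for u in bits:
        a = u ^ s1 ^ s2       # feedback taps of g0 = 1 + D + D^2
        weight += a ^ s2      # feedforward taps of g1 = 1 + D^2
        s1, s2 = a, s1
    # a finite weight parity sequence requires the encoder to return to the zero state
    assert (s1, s2) == (0, 0), "input not divisible by g0(D)"
    return weight

# weight-2 inputs 1 + D^(3k) give parity weights (zmin - 2)k + 2 with zmin = 4
for k in (1, 2, 3):
    u = [1] + [0] * (3 * k - 1) + [1]
    print(k, rsc_parity_weight(u))   # prints 1 4, 2 6, 3 8
```

This confirms that the shortest weight-2 input pattern 1 + D³ produces the minimum parity weight of 4, and that each additional period of the feedback polynomial adds 2 to the parity weight.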
III. Interleaving

III.1. Interleaving Function

Interleaving is a process of rearranging the ordering of a data sequence in a one-to-one deterministic format. The inverse of this process is deinterleaving, which restores the received sequence to its original order. We consider an interleaver of size N. For simplicity, we assume that the data sequence at the input of the interleaver is binary, given by

c = (c1, c2, ..., cN)

where ci ∈ {0, 1}, 1 ≤ i ≤ N. The interleaver permutes the sequence c to a binary sequence

τ = (τ1, τ2, ..., τN)

where τj ∈ {0, 1}, 1 ≤ j ≤ N. The sequence τ has all the elements of c but in a different order. If we consider the input sequence c and the output sequence τ as a pair of sets with N elements, there is a one-to-one correspondence ci → τj between each element of c and each element of τ. Let us define a set A as A = {1, 2, ..., N}. The interleaver can then be defined by a one-to-one index mapping function

π : A → A, j = π(i)

where i and j are the indices of an element in the original sequence c and the interleaved sequence τ, respectively. The mapping function can be represented by an interleaving vector π = [π(1), π(2), ..., π(N)].
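The index mapping and its inverse can be sketched in a few lines of code; this is an illustrative 0-indexed version (the text above uses 1-based indices, and the function names are ours):

```python
def interleave(c, pi):
    # tau[j] = c[pi[j]]: the element of c indexed by pi[j] moves to position j
    return [c[i] for i in pi]

def deinterleave(tau, pi):
    # invert the permutation: put tau[j] back at position pi[j]
    c = [None] * len(pi)
    for j, i in enumerate(pi):
        c[i] = tau[j]
    return c

c = [1, 0, 1, 1, 0]
pi = [2, 0, 4, 1, 3]
tau = interleave(c, pi)          # [1, 1, 0, 0, 1]
assert deinterleave(tau, pi) == c
```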
III.2. Interleaving In Turbo Coding

In turbo coding, an interleaver is employed before the information data is encoded by the second component encoder. In general, the interleaver size N is significantly larger than the code memory v and the interleaver vector elements are chosen randomly. The basic role of a random interleaver with large size is to construct a long block code from small memory convolutional codes, as long codes can approach the Shannon capacity limit. Secondly, it spreads out burst errors. The interleaver provides "scrambled" information data to the second component encoder and decorrelates the inputs to the two component decoders so that an iterative suboptimum decoding algorithm based on "uncorrelated" information exchange between the two component decoders can be applied. For example, after correction of some of the errors in the first component decoder, some of the remaining errors can be spread by the interleaver such that they become correctable in the other
decoder. By increasing the number of iterations in the decoding process, the bit error rate performance approaches the channel capacity limit. The final role of the interleaver is to break low weight input sequences, and hence increase the code free Hamming distance or reduce the number of codewords with small distances in the code distance spectrum. In turbo coding, the interleaver size and structure affect the turbo code error performance considerably. At low SNRs, the interleaver size is the only important factor, as the code BER performance is dominated by the interleaver gain [2]. The effects induced by changing the interleaver structure in the low SNR region are not significant. However, both the interleaver size and structure affect the turbo code minimum free distance and the first several distance spectral lines. They play an important role in determining the code performance at high SNRs, and consequently, the asymptotic performance of the turbo code. It is possible to design particular interleavers which result in good code performance at high SNRs. This is achieved by breaking several low weight input patterns that produce low weight codewords in the overall turbo code.
III.3. Interleaver Design

From the previous code performance analysis, the design criteria of an interleaver for turbo codes can be formulated as follows:
1. Keep the interleaver random. Note that in iterative soft output decoding algorithms, the information exchange between the two component decoders is possible because of the interleaving/deinterleaving operations. The input and output sequences of the interleaver should be uncorrelated. The more "scrambled" the interleaver, the more "uncorrelated" the information exchange.
2. Eliminate low-weight codewords with significant contributions to the error performance. The most significant input patterns are those giving large contributions to the code error probability at high SNRs. These input patterns produce low weight codewords corresponding to the first several distance spectral lines in the turbo code distance spectrum. The most significant input patterns are determined on the basis of the performance analysis. The interleaver design makes sure that these input patterns are broken, such that the first several distance spectral lines of the original distance spectrum are eliminated.
3. Reduce the number of other low-weight codewords which cannot be eliminated. Obviously, there are quite a large number of low-weight codewords. In practice, it is not possible to eliminate all of them. Thus, one should eliminate as many of the significant codewords as possible in the interleaver design and reduce the number of the codewords which cannot be eliminated.
The elimination of a specific codeword can be done by breaking up the input pattern that generates that codeword.
IV. Cyclic shift interleavers Cyclic shift interleavers can be considered as a special case of convolutional interleavers. A standard convolutional interleaver [9, 10] consists of an input and output commutator and a bank of L shift registers as shown in Fig. 1. The information sequence to be interleaved is arranged in blocks of L bits. The input commutator cyclically inserts each block of L bits into the bank of L registers. The ith bit in each block is delayed by the ith shift register and the delay of the shift register is (i - 1)B. The output commutator cyclically samples the bank of L registers in the same order as the input one. The deinterleaver performs the inverse operation. That is, the ith bit in each block is delayed by the ith shift register, where the delay of the shift register is (L - i)B. The convolutional interleaving function, after the initial states of the shift registers have been cleared out, can be expressed as
Figure 1. A convolutional interleaver/deinterleaver.
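The delay-line structure of Fig. 1 can be sketched as a small stream model (function name ours). Branch i carries i·B registers, each clocked once per commutator cycle, so branch i delays its symbols by i·B·L output positions; the initial zero contents of the registers appear at the output, as in a hardware implementation:

```python
from collections import deque

def conv_interleave(symbols, L, B):
    """Stream symbols through L delay branches (branch i holds i*B registers)."""
    lines = [deque([0] * (i * B)) for i in range(L)]
    out = []
    for k, s in enumerate(symbols):
        line = lines[k % L]          # input commutator selects a branch cyclically
        line.append(s)
        out.append(line.popleft())   # output commutator samples the same branch
    return out

# with L = 2, B = 1 the second branch delays its symbols by one block of L symbols
assert conv_interleave([1, 2, 3, 4, 5, 6], 2, 1) == [1, 0, 3, 2, 5, 4]
```

The deinterleaver is obtained by giving branch i a delay of (L - 1 - i)·B registers instead, so that every symbol experiences the same total delay.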
The parameters of the convolutional interleavers are designed such that no continuous sequence of L symbols in the output sequence contains any symbols that were separated by fewer than LB - 1 symbols in the input sequence. This property of the convolutional interleavers can be represented as
whenever
If the parameters B and L of a convolutional interleaver are chosen properly, it can break some low weight input patterns which appear in a block interleaver, and it can give very good performance [11]. However, as a result of the convolutional operation, the interleaved sequence is expanded by (L - 1)LB symbols relative to the input sequence. To avoid the sequence expansion, we propose a cyclic shift interleaver structure that preserves the convolutional interleaver property. Consider an information sequence c = (c1, c2, ..., cN) of length N = m × n with m ≤ n. To interleave this sequence, one writes the indices i, 1 ≤ i ≤ N, in an m × n array C, column by column, and from the right to the left. For example, for m = 3 and n = 7, the array is
The interleaver description is completed by an integer B ≤ n/m that specifies a new matrix C', the i-th row of which is obtained from the i-th row of C by a cyclic shift of amplitude (i - 1)B. For example, with B = 2, one obtains
Let j(i) be the integer that appears in C' at the same place as i appears in C. The interleaver described by C' produces from c = (c1, c2, ..., cN) the interleaved sequence τ = (cj(1), cj(2), ..., cj(N)). In this example, the interleaved sequence is given by
Comparing the convolutional interleaver and the cyclic shift interleaver it is obvious that the cyclic shift interleaver retains the property of the convolutional
interleaver, expressed by (13) and (14) with L = m, but it overcomes its disadvantage of expanding the output sequence.
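The construction above can be sketched in code. The right-to-left, column-by-column fill follows the text; the direction of the cyclic shifts is an assumption on our part, since the example matrices of the original are not reproduced here:

```python
def cyclic_shift_matrix(m, n, B):
    """Interleaving matrix of Section IV: indices 1..m*n written column by
    column from right to left, then row i cyclically shifted by (i-1)*B
    positions (a right shift is assumed here)."""
    C = [[0] * n for _ in range(m)]
    idx = 1
    for col in range(n - 1, -1, -1):   # right to left
        for row in range(m):           # column by column
            C[row][col] = idx
            idx += 1
    shifted = []
    for i, row in enumerate(C):
        s = (i * B) % n                # amplitude (i-1)B for 1-based row index i
        shifted.append(row[-s:] + row[:-s] if s else row[:])
    return shifted

M = cyclic_shift_matrix(3, 7, 2)       # the m = 3, n = 7, B = 2 example
# first row is unshifted: M[0] == [19, 16, 13, 10, 7, 4, 1]
```

Reading the shifted matrix in the original fill order then yields the index sequence j(1), ..., j(N), so no interleaving vector needs to be stored.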
V. S-random interleavers

Since S-random interleavers have superior performance relative to block and pseudo-random interleavers, we will test the cyclic shift interleavers against the S-random interleavers. An S-random interleaver is defined as follows [3]. Each randomly selected integer is compared to the S1 previously selected integers. If the absolute value of the difference between the currently selected integer and any of the S1 previously selected integers is smaller than S2, the currently selected integer is rejected. This process is repeated until all N integers are selected. In general, an S-random interleaver can be described as
whenever
where S1 and S2 are two integers smaller than N. In a turbo encoder, these two parameters should, in general, be chosen to correspond to the maximum input pattern lengths to be broken by the interleaver. Thus, they should be chosen as large as possible. However, as the search time for this algorithm becomes prohibitively large for large values of S1 and S2, a good trade-off between interleaver performance and search time is obtained for S1, S2 < √(N/2). For turbo codes with identical component encoders, it is appropriate to set S = S1 = S2. For a turbo encoder, an S-random interleaver can break the input patterns with lengths up to S + 1 and generate high weight parity check sequences, as explained below [8]. Let {c} be the set of all the input patterns generating an error event of the component code. The length of an input pattern c is denoted by l(c), and the weight of the input pattern is denoted by w(c). If the length of an input pattern is small, it will likely produce a low weight codeword. Therefore, the interleaver should break this kind of input pattern. With an S-random interleaver, the input pattern will be mapped to another sequence c̃. If c̃ is not an error pattern, we say that the input pattern for the component encoder is broken. The second encoder will then produce a parity check sequence of infinite weight (if no termination is performed). Otherwise, if l(c) ≤ S + 1, then because of the S-constraint, l(c̃) > (w(c) - 1)(S + 1). As the path length increases, c̃ will likely produce a high weight parity check sequence. Thus, in both cases, the overall codeword weight will be high. Based on the previous discussion, we can conclude that an S-random interleaver can either break the input patterns with length up to S + 1 or expand these input patterns to longer error patterns of length more than (w - 1)(S + 1),
where w is the input sequence weight, no matter what the component code is. Thus S-random interleavers can achieve better performance compared to pseudo-random interleavers. It is worth noting that the S-random interleaver functions (15) and (16) agree with the property of the cyclic shift interleavers as shown in (13) and (14), if the parameters L and B of the cyclic shift interleaver are chosen properly. That is to say, these two types of interleavers are equivalent in the sense of their ability to break low weight input patterns.
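The selection procedure of [3] described above can be sketched as a rejection search; the restart logic and the function name are ours:

```python
import random

def s_random_interleaver(N, S, rng, max_restarts=2000):
    """Rejection search for an S-random permutation (S = S1 = S2): each newly
    selected integer must differ by at least S from each of the S previously
    selected ones; a stuck search is simply restarted."""
    for _ in range(max_restarts):
        remaining = set(range(N))
        pi = []
        while remaining:
            cands = [x for x in remaining
                     if all(abs(x - y) >= S for y in pi[-S:])]
            if not cands:
                break                      # dead end: restart the search
            x = rng.choice(cands)
            pi.append(x)
            remaining.remove(x)
        if not remaining:
            return pi
    raise RuntimeError("search failed; choose S below sqrt(N/2)")

pi = s_random_interleaver(64, 5, random.Random(1))
assert all(abs(pi[j] - pi[j - k]) >= 5
           for j in range(64) for k in range(1, min(5, j) + 1))
```

For large N and S close to √(N/2) the number of restarts grows quickly, which is exactly the search-time trade-off mentioned in the text.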
VI. Cyclic shift interleaver design and performance evaluation

VI.1. Interleaver Design

Let a cyclic shift interleaver size be N = m × n, where m and n are the number of rows and columns of the interleaving matrix, respectively, such that m ≤ n. To generate a cyclic shift interleaver based on (13) and (14) comparable to an S-random interleaver based on (15) and (16), we should choose
where S1 and S2 are the parameters of the S-random interleaver. For turbo codes with identical component codes, the parameters m and B from (17) become
In a cyclic shift interleaver design, B is an integer such that
Increasing the value of B results in breaking more low weight input patterns, which in turn improves the turbo code performance. In order to get an insight into the breaking up of the input patterns corresponding to the significant codewords in the cyclic shift interleaver design, let us consider an example of the turbo code with two identical half rate constituent codes with generator matrix G = [1, g1(D)/g0(D)], where g0(D) and g1(D) are the feedback and feedforward polynomials, respectively. Assume that the first encoder receives the input sequence P and generates the output pair (P, Y1), while the second encoder takes the interleaved input sequence Q and generates the output pair (Q, Y2). Y1 and Y2 are the parity-check sequences generated by the first and the second encoder, respectively. The length of the input sequences is equal to the interleaver size N. Let us denote by w the input weight of P and Q, by ω(Y1) the
weight of Y1 and by ω(Y2) the weight of Y2. The overall weight of the generated codeword is given by

d = w + ω(Y1) + ω(Y2). (20)
A weight-2 input sequence that generates a finite weight codeword can be represented by a polynomial

P2(D) = D^τ1 (1 + D^(μ k1))

where k1 = 1, 2, 3, ..., µ is the minimum distance between the two "1"s in a weight-2 input pattern that generates a finite weight codeword, and τ1 is the time delay, τ1 = 1, 2, 3, .... Let us denote by zmin the minimum weight of the parity-check sequence generated by a weight-2 input pattern. Then the parity-check weight of this weight-2 input pattern is given by

ω(Y1) = (zmin - 2) k1 + 2.
Supposing that an interleaver maps this input sequence to another weight-2 input sequence that generates a finite weight parity-check sequence, the input to the second encoder, Q2(D), can be represented in the polynomial form as

Q2(D) = D^τ2 (1 + D^(μ k2))
where k2 = 1, 2, 3, ..., and τ2 is the time delay. The parity-check weight of the second component encoder is given by

ω(Y2) = (zmin - 2) k2 + 2.
The overall weight of the generated codeword can be calculated from (20), (21) and (23) as:

d = (zmin - 2)(k1 + k2) + 6.
Let us denote by i1 and i2 the positions of the "1"s in the weight-2 input sequence, where i1, i2 ∈ A, and denote by π(i1), π(i2) the positions of the "1"s in the interleaved input sequence, where π(i1), π(i2) ∈ A. If an interleaver mapping function meets the following condition

|π(i1) - π(i2)| mod μ = 0 whenever |i1 - i2| mod μ = 0,

this interleaver will map the input sequence to another weight-2 input sequence that generates a finite weight parity-check sequence, as illustrated in Fig. 2. Both
encoders generate finite weight parity-check sequences, which is detrimental to the overall code error performance. The resulting codeword will have a finite weight given by (24). In order to avoid this type of mapping, the interleaving function should satisfy the following constraint:

|π(i1) - π(i2)| mod μ ≠ 0 whenever |i1 - i2| mod μ = 0.
For cyclic shift interleavers, considering the interleaving function (12) with L = m, the mapping condition (26) can be rewritten as
whenever
However, only those input patterns that generate low weight codewords which have large contributions to the performance need to be eliminated in the interleaver design. Thus, from (27) we can get the following design conditions
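The weight-2 design condition can be checked programmatically for a candidate interleaver. A 0-indexed sketch follows (the function name is ours; for the example 4-state code, μ = 3), restricted to small k since only the significant low weight patterns need to be broken:

```python
def satisfies_weight2_condition(pi, mu, k_max):
    """True if no pair of positions k*mu apart (k <= k_max) is mapped to a
    pair of positions that is again a multiple of mu apart (0-indexed pi)."""
    N = len(pi)
    for i in range(N):
        for k in range(1, k_max + 1):
            j = i + k * mu
            if j < N and (pi[i] - pi[j]) % mu == 0:
                return False
    return True

# the identity mapping breaks nothing, so it fails the condition:
assert not satisfies_weight2_condition(list(range(9)), 3, 2)
```

Such a check can be run after constructing a cyclic shift interleaver for given (m, B) to confirm that the significant weight-2 patterns are broken.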
In the case of the weight-3 input patterns that generate low weight codewords, the mapping from one weight-3 input pattern to another weight-3 input pattern that generates low weight codewords is not easy to characterize. Nevertheless, a method similar to that described above can be used to break those patterns. In addition, since a cyclic shift interleaver is equivalent to an S-random interleaver in the sense of its ability to break low weight input patterns, a cyclic shift interleaver can either break a short weight-3 input pattern with length up to m or expand it to a longer one with length more than 2mB. These longer input patterns will produce higher weight codewords [7]. This holds for other higher weight input patterns that generate codewords consisting of a single error event, where a single error event
Figure 2. A weight-2 input sequence is mapped to a sequence with the same pattern.
is defined as a path in the trellis that diverges from the all-zero state and then remerges with the all-zero state within a finite number of branches. In general, for all input patterns that generate a single error event, a cyclic shift interleaver can either break the input patterns with length up to m or expand these input patterns to longer error patterns with lengths more than (w - 1)mB for any constituent code, where w is the weight of the input patterns. Now we will focus on the higher weight input patterns that generate compound error events. A compound error event is defined as a concatenation of two or more single error events. Let us consider weight-4 input sequences that generate compound error events consisting of two single error events which are generated by two weight-2 input patterns. The weight-4 input sequences can be represented by a polynomial

P4(D) = D^τ1 (1 + D^(μ k'1)) + D^τ2 (1 + D^(μ k'2))
where k'1, k'2 = 1, 2, 3, ..., and τ1 and τ2 are time delays with τ2 > τ1 + μk'1. Suppose an interleaver maps this input sequence to another input sequence consisting of two weight-2 input sequences that generate finite weight parity-check sequences; the input to the second encoder, Q4(D), can be represented in the polynomial form as

Q4(D) = D^τ3 (1 + D^(μ k'3)) + D^τ4 (1 + D^(μ k'4))
where k'3, k'4 = 1, 2, 3, ..., and τ3 and τ4 are time delays with τ4 > τ3 + μk'3. The overall weight of the generated codeword can be calculated from (20) as

d = (zmin - 2)(k'1 + k'2 + k'3 + k'4) + 12.
Let us denote by i1, i2, i3 and i4 the positions of the "1"s in the weight-4 input sequence, where i1, i2, i3, i4 ∈ A and i1 < i2 < i3 < i4. Let us denote by π(i1), π(i2), π(i3) and π(i4) the positions of the "1"s in the interleaved input sequence, where π(i1), π(i2), π(i3) and π(i4) ∈ A. If an interleaver mapping function meets the following conditions
or
Figure 3. A weight-4 input sequence is mapped to another weight-4 input sequence.
this interleaver will map the input sequence to another weight-4 sequence that includes two weight-2 input patterns which generate finite weight parity-check sequences, as shown in Fig. 3. Both encoders generate finite weight parity-check sequences. The resulting codeword will have a finite weight given by (28). In order to avoid this type of mapping, the interleaver should satisfy the following condition

|π(i1) - π(i3)| mod μ ≠ 0 and |π(i2) - π(i4)| mod μ ≠ 0

whenever
or
For cyclic shift interleavers, comparing the interleaving function (12) and the mapping condition (29), we can see that it is not easy to break all these weight-4 input patterns. In the interleaver design, we try to break as many of the weight-4 input patterns generating low weight codewords as possible. In addition, for unbroken input patterns, we try to maximize the overall codeword weight by maximizing the value of k'1 + k'2 + k'3 + k'4, where the parameters k'1, k'2, k'3 and k'4 are shown in Fig. 3.
The effect of input patterns with high weight w > 4 on the code error performance is small. This is partly due to the fact that they usually produce high weight codewords as a result of the cyclic shift constraint. Therefore these input patterns do not need to be considered in the interleaver design, provided that the conditions for breaking low weight input patterns are satisfied.

VI.2. Performance Evaluations

Cyclic shift interleavers are designed for interleaver size 1024. Turbo code performance with the cyclic shift interleavers is evaluated by simulations on additive white Gaussian noise (AWGN) channels. In the simulation, iterative decoding with the soft-output Viterbi algorithm (SOVA) is employed, since it is less complex than a MAP method. However, SOVA performs about 0.5-0.7 dB worse than MAP decoding [7]. The number of iterations is 8. The performance of a cyclic shift interleaver is compared to that of an S-random interleaver for the same interleaver size. S was chosen to be 15 for interleaver size 1024. For the cyclic shift interleaver, m was set to S + 1 and B was chosen from the set {1, 2, ..., ⌊N/m²⌋}. The simulation results are shown in Fig. 4. It can be observed that the turbo code with an S-random interleaver can achieve a better performance relative to a
Figure 4. BER performance of the 16-state, rate 1/3, (1, 33/31) turbo code with S-random and cyclic shift interleavers on an AWGN channel, N = 1024.
Figure 5. BER performance of the 16-state, rate 1/3, (1, 33/31) turbo code with S-random and cyclic shift interleavers on an AWGN channel, N = 4096.
cyclic shift interleaver with B = 1. However, increasing the value of B for the cyclic shift interleaver improves the code performance. For interleaver size 1024, the cyclic shift interleaver with m = 16 and B = 2 outperforms the S-random interleaver with S = 15. For interleaver size 4096, the constraint S was 31 and the parameters m and B were 32 and 1, respectively. The simulation results for the turbo code with interleavers of size 4096 are shown in Fig. 5. It can also be seen that the turbo code with the cyclic shift interleaver outperforms the S-random interleaver. In addition to the performance improvement, the cyclic shift interleavers have the advantages of low design complexity and memory requirement. First, for given parameters, it is easier to construct a cyclic shift interleaver than to search for an S-random interleaver. Secondly, the interleaving vector of an S-random interleaver must be stored in memory for both the turbo encoder and decoder. For a cyclic shift interleaver, however, the interleaved or deinterleaved sequence can be generated from the interleaving matrix based on cyclic shifts; there is no need to store the interleaving vector. Therefore, the cyclic shift interleavers reduce the memory requirement and are easy to implement.
VII. Conclusions

We have shown that cyclic shift interleavers are capable of outperforming S-random interleavers when the parameters are appropriately selected. Most
importantly, the performance improvements are accompanied by lower design complexity and smaller memory requirements relative to S-random interleavers.

Acknowledgement

The authors would like to thank the anonymous reviewers for their valuable comments.
REFERENCES

[1] BERROU (C.) and GLAVIEUX (A.), "Near optimum error correcting coding and decoding: Turbo-codes", IEEE Trans. Commun., vol. 44, n° 10, Oct. 1996, pp. 1261-1271.
[2] BENEDETTO (S.) and MONTORSI (G.), "Unveiling turbo-codes: Some results on parallel concatenated coding schemes", IEEE Trans. Inform. Theory, vol. 42, n° 2, Mar. 1996, pp. 409-428.
[3] DIVSALAR (D.) and POLLARA (F.), "Turbo codes for PCS applications", in Proc. ICC'95, Seattle, WA, June 1995, pp. 54-59.
[4] BARBULESCU (A.S.) and PIETROBON (S.S.), "Terminating the trellis of turbo codes in the same state", Electronics Letters, vol. 31, pp. 22-23, 1995.
[5] RICHER (I.), "A simple interleaver for use with Viterbi decoding", IEEE Trans. Commun., COM-26, pp. 406-408, March 1978.
[6] YUAN (J.), VUCETIC (B.) and FENG (W.), "Combined turbo codes and interleaver design", IEEE Trans. Commun., vol. 47, n° 4, Apr. 1999, pp. 484-487.
[7] VUCETIC (B.) and YUAN (J.), "Turbo codes: principles and applications", Kluwer Academic Publishers, 2000.
[8] FENG (W.), YUAN (J.) and VUCETIC (B.), "A code matched interleaver design for turbo codes", submitted to IEEE Trans. Commun.
[9] RAMSEY (J.L.), "Realization of optimum interleavers", IEEE Trans. Inform. Theory, vol. 16, n° 3, May 1970, pp. 338-345.
[10] FORNEY (G.D.), Jr., "Burst-correcting codes for the classic bursty channel", IEEE Trans. Commun., vol. 19, n° 5, Oct. 1971, pp. 772-781.
[11] HALL (E.K.) and WILSON (S.G.), "Convolutional interleavers for stream-oriented parallel concatenated convolutional codes", in Proc. 1998 IEEE Int. Symposium on Inform. Theory, MIT, Cambridge, MA, USA, Aug. 1998, p. 33.
[12] HOKFELT (J.), EDFORS (O.) and MASENG (T.), "A turbo code interleaver design criterion based on the performance of iterative decoding", IEEE Communications Letters, vol. 5, n° 2, Feb. 2001.
[13] TAKESHITA (O.Y.) and COSTELLO (D.J.), Jr., "New classes of algebraic interleavers for turbo codes", Proc. ISIT'98, MA, USA, Aug. 1998, p. 419.
[14] Le DANTEC (C.) and PIRET (P.), "Algebraic and combinatorial methods producing good interleavers", Proc. 2nd Inter. Symp. on Turbo Codes & Related Topics, Brest, France, Sep. 2000, pp. 271-274.
[15] CROZIER (S.), GUINAND (P.), LODGE (J.) and HUNT (A.), "Construction and performance of new tail-biting turbo codes", 6th International Workshop on Digital Signal Processing Techniques for Space Applications (DSP'98), ESTEC, Noordwijk, The Netherlands, Sept. 1998, paper 1.3.
[16] ANDERSEN (J.D.), "Interleaver design for turbo coding", Proc. Inter. Symp. on Turbo Codes and Related Topics, Brest, France, Sep. 1997, pp. 154-157.
[17] DANESHGARAN (F.) and MONDIN (M.), "Design of interleavers for turbo codes based on a cost function", Proc. Inter. Symp. on Turbo Codes and Related Topics, Brest, France, Sep. 1997, pp. 255-258.
Chapter 3
Code characteristic matching for iterative decoding of serially concatenated codes

S. ten Brink
Bell Laboratories, Lucent Technologies, New Jersey, USA
I. Introduction

Since the discovery of parallel concatenated "turbo" codes [1], iterative decoding has created a lot of interest in the field of digital communications. The "classic" regular turbo code already approached the fundamental limits of error correcting coding [2] so closely that it appeared hard to beat this reference. With recent improvements in the area of irregular low-density parity-check codes (LDPCC) [3, 4], irregular turbo-like codes [5], and irregular repeat-accumulate codes (IRA) [6], code designs have become available which can outperform the classic regular turbo code. For this, particular attention has been paid to the convergence behavior of the iterative decoder, using methods of density evolution as proposed in [7]. In this paper we present code characteristic matching to design serially concatenated codes [8] composed of simple block and convolutional codes. Code characteristic matching uses extrinsic information transfer characteristics based on mutual information to find matching pairs of inner and outer decoders. This proves to be particularly useful for optimizing the convergence behavior of concatenated codes in the turbo cliff region, provided that long sequence lengths are used. A decoding trajectory visualizes the exchange of extrinsic information between inner and outer decoder in the extrinsic information transfer chart (EXIT chart). In [9] the EXIT chart was introduced to provide design guidelines for constituent codes of parallel concatenated codes (PCC). In this paper we extend these results to serially concatenated codes (SCC). We do not claim to present a rigorous proof of stability and convergence of iterative decoding; however, simulation results suggest that the EXIT chart predicts the best possible convergence behavior of the iterative decoder for large interleaving depth.
The paper is organized as follows: Sections III and IV introduce extrinsic information transfer characteristics for the inner and outer decoders. In Section V the convergence of the iterative decoder is traced in the EXIT chart. Section VI explains how to obtain bit error rate (BER) estimates from the EXIT chart. Code design examples with component codes chosen to optimize convergence behavior are given in Section VII. In Section VIII we illustrate the robustness of the mutual information measure by re-computing transfer characteristics for different a priori distributions and draw a comparison with transfer characteristics based on signal-to-noise ratio (SNR) measures. Finally, Section IX presents some conclusions.
II. Iterative decoding of serially concatenated codes

The iterative decoder for SCC is shown in Fig. 1. For each iteration, the inner decoder (soft in/soft out decoder, BCJR algorithm [10]) takes channel observations Z and a priori knowledge A1 on the inner information bits and outputs a posteriori soft values D1. The extrinsic and channel information E1 = D1 - A1 is passed through the bit deinterleaver to become the a priori input A2 for the outer decoder. The outer decoder feeds back extrinsic information E2 = D2 - A2, which becomes the a priori knowledge A1 for the inner decoder. The variables Z, A1, D1, E1, A2, D2 and E2 denote log-likelihood ratios (L-values [11]). In this paper we restrict ourselves to an overall code rate of R = R1 · R2 = 1 · 1/2 = 1/2, and thus all Eb/N0 values are given with respect to an R = 1/2 code.
Figure 1. Encoder, channel and iterative decoder.
III. Inner decoder transfer characteristics

In [12] and [13] the authors derive a soft input/soft output Viterbi algorithm (SOVA) targeted to be used as, e.g., an inner decoder in a serial concatenation with an outer algebraic (Reed-Solomon) code. Reference [13] introduces the notion of channel coding as being a means for improving the SNR at the receiver by exploiting the code constraints. Correspondingly, a soft in/soft out decoder can be viewed as a non-linear "SNR filter", or "SNR amplifier", with applications to iterative decoding [14], and convergence analysis thereof [15, 16]. In this section we apply the idea of decoder transfer characteristics to the description of iterative decoding of serially concatenated codes. For this we measure the mutual information between transmitted bits and the corresponding a priori input/extrinsic output at the inner and outer decoder. The inputs to the inner decoder are the noise-corrupted channel observations Z and the a priori knowledge A1 on the inner information bits. The decoder outputs extrinsic and channel information E1. From simulations of the iterative decoder we observed that the extrinsic information L-values E2 (i.e. A1) as fed back from the outer decoder are almost Gaussian distributed, as already noticed in [17]. Additionally, large interleavers keep the a priori L-values A1 fairly uncorrelated over many iterations. Hence, it seems appropriate to model the a priori input A1 by applying an independent Gaussian random variable nA1 with variance σ²A1 and mean zero. In conjunction with the known transmitted inner information bits x1 ∈ {±1} we write

A1 = μA1 · x1 + nA1 (1)
such that A1 is Gaussian distributed with mean x1 · μA1 and variance σ²A1. Since A1 is supposed to be an L-value based on Gaussian distributions, the mean value μA1 must fulfill

μA1 = σ²A1 / 2 (2)
and thus the conditional probability density function (PDF) is

pA1(ξ | X1 = x1) = (1 / (√(2π) σA1)) · exp( - (ξ - x1 · σ²A1/2)² / (2σ²A1) ). (3)
Note that (2) can also be derived using the symmetry condition of [7]. To measure the information content of the a priori knowledge, the mutual information IA1 = I(X1; A1), 0 ≤ IA1 ≤ 1, between the transmitted inner information bits X1 and the L-values A1 is used [18].
With (3), equation (4) becomes
For abbreviation we define

J(σ) := IA1(σA1 = σ).
The function J(σ) cannot be expressed in closed form. It is monotonically increasing and thus reversible, σA1 = J⁻¹(IA1). Mutual information is also used to quantify the extrinsic output, IE1 = I(X1; E1); it is computed according to (4), using the extrinsic output PDFs pE1(ξ | X1 = -1) and pE1(ξ | X1 = 1), respectively. Viewing IE1 as a function of IA1 and the Eb/N0 value, the inner extrinsic information transfer characteristic is defined as

IE1 = T1(IA1, Eb/N0).
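Although J(σ) has no closed form, it is easy to evaluate numerically; the following Monte Carlo sketch assumes the consistent-Gaussian a priori model of (1)-(3), under which J(σ) = 1 - E[log2(1 + exp(-A))] for A drawn with x1 = +1 (the sampling parameters are ours):

```python
import math
import random

def J(sigma, n=100000, seed=7):
    """Monte Carlo estimate of J(sigma): mutual information of the Gaussian
    a priori channel A = (sigma^2/2)*x + N(0, sigma^2), conditioned on x = +1,
    J(sigma) = 1 - E[log2(1 + exp(-A))]."""
    if sigma <= 0.0:
        return 0.0
    rng = random.Random(seed)
    mu = 0.5 * sigma * sigma
    total = 0.0
    for _ in range(n):
        a = rng.gauss(mu, sigma)
        # guard against overflow of exp(-a) for large negative samples
        total += (-a if a < -30.0 else math.log1p(math.exp(-a))) / math.log(2.0)
    return 1.0 - total / n

# J increases monotonically from 0 towards 1, hence it is invertible
assert J(0.0) == 0.0 and J(0.5) < J(1.5) < J(3.0) < 1.0
```

The inverse σA1 = J⁻¹(IA1) needed to set up the a priori input can then be obtained by simple bisection on this monotone function.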
To calculate the characteristic T1(IA1, Eb/N0) for a desired (IA1, Eb/N0) input combination, the distributions pE1 are most conveniently determined by means of Monte Carlo simulation. For this, the independent Gaussian random variable of (1) with σA1 = J⁻¹(IA1) is applied as a priori input to the inner decoder of interest. Note that using histogram measurements of the PDFs pE1 implies that we consider per-symbol mutual information (rather than per-sequence), neglecting any memory of the L-values E1. The rationale behind this is that any additional information given by the memory of the samples E1 is lost after interleaving anyway, so the next decoder stage (outer decoder) can only experience the benefits of per-symbol mutual information. Fig. 2 shows transfer characteristics of some recursive and non-recursive inner rate 1 convolutional codes at Eb/N0 = 1 dB (SCC code rate 1/2), with feedback polynomial Gr and feedforward polynomial G. The generator polynomials are given in octal numbers, with the most significant bit denoting the leftmost (input) connection to the shift register. Different shapes of the inner transfer characteristics can be obtained by changing the code memory and the feedforward/feedback polynomials of the shift register. A bigger code memory tends to result in a steeper
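The histogram-based measurement of mutual information can be sketched as follows (an illustrative implementation, not the authors' code; the decoder output is replaced here by synthetic Gaussian L-values so the script is self-contained):

```python
import numpy as np

def mutual_information_hist(x, llr, bins=100):
    """Estimate I(X; L) from samples: build conditional histograms of the
    L-values and evaluate a discretized version of the MI integral (4)."""
    lo, hi = llr.min(), llr.max()
    p_pos, edges = np.histogram(llr[x > 0], bins=bins, range=(lo, hi), density=True)
    p_neg, _ = np.histogram(llr[x < 0], bins=bins, range=(lo, hi), density=True)
    dx = edges[1] - edges[0]
    mi = 0.0
    for p_c in (p_pos, p_neg):
        mask = p_c > 0                      # skip empty bins (limit value is 0)
        mi += 0.5 * np.sum(p_c[mask] * np.log2(2.0 * p_c[mask]
                                               / (p_pos[mask] + p_neg[mask]))) * dx
    return mi

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=400_000)
sigma = 2.0
llr = sigma**2 / 2.0 * x + sigma * rng.standard_normal(x.size)
print(mutual_information_hist(x, llr))
```

For consistent L-values the same quantity can also be estimated by the time average 1 − E[log2(1 + exp(−x·L))], which gives a useful cross-check of the histogram measurement.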
Code characteristic matching for iterative decoding
Figure 2. Extrinsic information transfer characteristics of some recursive and non-recursive inner rate 1 decoders (Eb/N0 = 1 dB).

curve with a lower IE1(IA1 = 0) value. For most codes with feedforward polynomial G different from a power of two, we find IE1(0) ≈ 0, which makes them inappropriate for use as inner codes in an iterative decoding scheme, as will become more obvious in Section V. The transfer characteristic of the non-recursive memory 1 code does not reach (IA1, IE1) = (1, 1), causing yet another problem for the iterative decoding process to converge towards low BER. The value of IE1(IA1 = 1) is dominated by the average Hamming distance dH between codewords y1|x1,k=−1 and y1|x1,k=+1 for transmitted information bits x1,k = −1 and x1,k = +1 at discrete time k. For non-recursive inner codes, dH is limited by the code memory v1; the coded bit y1,k at discrete time k is only influenced by the past v1 + 1 (constraint length) information bits x1,j, j = k − v1, …, k, at the input of the inner encoder; therefore, only an increase of the code memory (and not of the sequence length) contributes to a bigger dH. For recursive shift registers, the coded bit y1,k at discrete time k is influenced by the current and all previously transmitted information bits x1,j, j = 0, …, k, and thus dH increases with the sequence length. That is why a recursive memory one code can already achieve IE1(1) ≈ 1 for sufficiently long sequences. The curves in Fig. 2 were computed with a sequence length of 10⁴ bits such that tail effects of open/terminated trellises can be neglected.
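The recursive/non-recursive distinction above can be illustrated with two toy rate-1 encoders (our own sketch: a differential encoder y_k = x_k XOR y_{k-1} standing in for the recursive code, and a non-recursive memory-1 encoder). Flipping a single information bit changes all subsequent coded bits of the recursive encoder, but only v1 + 1 = 2 coded bits of the non-recursive one:

```python
import numpy as np

def differential_encode(bits):
    """Recursive rate-1 (differential) encoder: y_k = x_k XOR y_{k-1}."""
    y, state = [], 0
    for b in bits:
        state ^= b
        y.append(state)
    return np.array(y)

def nonrecursive_encode(bits):
    """Non-recursive memory-1 rate-1 encoder: y_k = x_k XOR x_{k-1}."""
    prev, y = 0, []
    for b in bits:
        y.append(b ^ prev)
        prev = b
    return np.array(y)

rng = np.random.default_rng(2)
x = rng.integers(0, 2, 1000)
x2 = x.copy()
x2[100] ^= 1   # flip a single information bit

# recursive: every coded bit from the flip onward changes -> dH grows with length
print((differential_encode(x) != differential_encode(x2)).sum())   # 900
# non-recursive: only constraint-length-many coded bits are affected
print((nonrecursive_encode(x) != nonrecursive_encode(x2)).sum())   # 2
```

This is exactly the mechanism that lets a recursive memory-one inner code reach IE1(1) ≈ 1 for long sequences while a non-recursive one cannot.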
IV. Outer decoder transfer characteristics

The outer transfer characteristic describes the input/output relation between the outer a priori input A2 and the outer extrinsic output E2. Note that A2 and E2 refer to the outer coded bits.
For the computation of the mutual information IE2 = I(Y2; E2), we assume A2 to be Gaussian distributed and apply the same pragmatic calculation of transfer characteristics as introduced in Section III, using (3), (4) and (5) with respect to the transmitted outer coded bits Y2, working with measurements of the extrinsic output PDFs pE2(ξ | Y2 = −1) and pE2(ξ | Y2 = 1) respectively. It is instructive to recognize that the outer transfer characteristics are independent of the Eb/N0 value, as there is no additional decoder input for the channel observations. Extrinsic transfer characteristics of rate 1/2 outer codes over a range of different code memories are given in Fig. 3. The key in the upper left corner lists some systematic block codes of length N, with K systematic bits and N − K parity
Figure 3. Extrinsic information transfer characteristics of outer rate 1/2 decoders for some block and convolutional codes (note: a priori input on ordinate, extrinsic output on abscissa).
check bits. The generator polynomials Gi describing the parity check equations with respect to the systematic bits are given in octal numbers. The key in the lower right corner shows some recursive systematic convolutional codes of different memory. Note that the axes are swapped: the input IA2 is on the ordinate, the output IE2 on the abscissa. This is in preparation of the design tool of Section V, where we connect the inner and outer transfer characteristics in a single diagram. From Fig. 3 it becomes evident that the code memory is the single most important parameter of the outer code. Unlike for the inner transfer characteristics, we found that different generator polynomials, or a change from recursive systematic convolutional codes (RSC) to non-systematic, non-recursive codes (NSC), have only little influence on the shape of the outer transfer characteristic. In the remainder of this Section we further consider the two extreme cases of outer codes: a) the simple rate 1/2 repetition code, and b) a convolutional code with big code memory.

To a): The a posteriori probability decoding rule for the repetition decoder turns out to be a swapping operation: for two outer a priori L-values A2,0, A2,1 stemming from the same outer information bit, the a posteriori L-values are calculated as D2,0 = D2,1 = A2,0 + A2,1, and thus the corresponding extrinsic L-values at the decoder output are E2,0 = D2,0 − A2,0 = A2,1 and E2,1 = D2,1 − A2,1 = A2,0, which corresponds to a simple swapping operation performed on the outer coded bits. The extrinsic output L-values E2 are just permuted (i.e. interleaved) versions of the a priori input L-values A2, and thus the extrinsic transfer characteristic of the repetition code is IE2 = IA2 (diagonal line).

To b): For big code memory, the extrinsic transfer characteristic tends to approach a step function, IE2 ≈ 0 for IA2 < 0.5 and IE2 ≈ 1 for IA2 > 0.5.
Intuitively, the information bits are very much "hidden", or tied up, by the code constraints for big code memory, and no new (extrinsic) information can be gathered for a priori knowledge IA2 < 0.5. After exceeding the threshold IA2 > 0.5, almost perfect extrinsic information can be gained from the code redundancy. The Shannon channel coding theorem states that reliable (error-free) transmission is possible for communication with a code rate R2 equal to or smaller than the capacity (more precisely, mutual information) C of the channel. For our case, the code rate is R2 = 0.5, and IA2 directly stands for the mutual information C of the channel. Hence, error-free transmission is possible for R2 ≤ C = IA2, or 0.5 ≤ IA2, giving further justification for the step-function-like shape of the outer decoder characteristic for big code memory.
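The swapping rule of case a) above can be written in a few lines (illustrative sketch):

```python
def repetition_extrinsic(a0, a1):
    """Outer rate-1/2 repetition decoder: the a posteriori L-value is
    D = A0 + A1 for both copies, so each extrinsic output is simply the
    other copy's a priori value (a swap)."""
    d = a0 + a1
    return d - a0, d - a1   # = (a1, a0)

e0, e1 = repetition_extrinsic(2.0, -0.5)
print(e0, e1)   # -0.5 2.0
```

Since the decoder only permutes the incoming L-values, their distribution, and hence their mutual information, is unchanged, which is why the characteristic is the diagonal IE2 = IA2.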
V. Extrinsic information transfer chart

To visualize the exchange of extrinsic information, we plot both decoder characteristics into a single diagram, which is referred to as the Extrinsic Information Transfer Chart (EXIT chart). On the ordinate, the inner extrinsic and channel output IE1 becomes the outer a priori input IA2 (interleaving does not change the per-symbol mutual information). On the abscissa, the outer extrinsic output IE2 becomes the inner a priori input IA1. Provided that the independence and Gaussian assumptions hold for modelling extrinsic information (a priori information respectively), the transfer characteristics of Sections III, IV should approximate the true behavior of the iterative decoder. Moreover, the decoding trajectory that can be obtained graphically by simply drawing a zigzag path into the EXIT chart (bounded by the decoder transfer characteristics) should match the trajectory computed by simulation. In the following we consider the simple serially concatenated code of Fig. 4, with outer rate 1/2 recursive systematic convolutional code of memory 2 and inner differential code. Fig. 5 shows trajectories of iterative decoding at Eb/N0 = 0.6 dB, 1.1 dB and 2.5 dB as obtained from simulations of the iterative decoder. Note that the inner and outer transfer characteristics are just taken from Fig. 2 (for the respective Eb/N0 values) and Fig. 3. For Eb/N0 = 0.6 dB the trajectory gets stuck after about four iterations since both decoder characteristics intersect. For Eb/N0 = 1.1 dB the inner transfer characteristic has been raised just high enough to open a narrow tunnel ("bottleneck") for the trajectory to "sneak through" and to converge towards low BER (≈ 10⁻⁶). At Eb/N0 = 2.5 dB, fewer iterations are needed to converge towards low BER. This "turbo cliff" effect is illustrated by the corresponding BER chart in the lower right corner.
The simulated trajectories match with the characteristics very well, owing to the large interleaver which ensures that the independence assumption of (1) holds over many iterations; in addition to that, the robustness of the mutual information
Figure 4. Serially concatenated code consisting of outer rate 1/2 memory 2 recursive systematic convolutional code and inner rate 1 memory 1 (differential) code.
Figure 5. Extrinsic Information Transfer Chart (EXIT chart) of the SCC of Fig. 4; inner transfer characteristics over a set of Eb/N0 values; three iterative decoding trajectories at 0.6 dB, 1.1 dB and 2.5 dB; interleaver size 4·10⁵ bits.
measure copes with non-Gaussian distributions of the a priori information, as will be further detailed in Section VIII. It should be emphasized, however, that the decoding trajectory is a simulation result of the iterative decoder, purely based on measurements of mutual information as taken from the output of the respective component decoder. Only for calculating the transfer characteristics of the individual decoders did we impose the Gaussian and independence assumptions on the a priori inputs A1, A2. For short interleavers the trajectory tends to diverge from the characteristics towards smaller extrinsic output after a few iterations, owing to the increasing correlation of the extrinsic information. In the same way as for our simple example of Fig. 5, the convergence behavior of any combination of inner and outer codes can be studied in the EXIT chart.
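The zigzag exchange of extrinsic information can be mimicked with two transfer characteristics (the curves below are hypothetical, smooth stand-ins chosen for illustration, not the measured characteristics of Fig. 5):

```python
def exit_trajectory(T1, T2, n_iter=20):
    """Trace the zigzag decoding trajectory of the EXIT chart: the inner
    extrinsic output becomes the outer a priori input and vice versa."""
    ia1, path = 0.0, []
    for _ in range(n_iter):
        ie1 = T1(ia1)       # inner decoder: a priori -> extrinsic
        ie2 = T2(ie1)       # outer decoder: a priori (= ie1) -> extrinsic
        path.append((ie1, ie2))
        ia1 = ie2
    return path

# illustrative (hypothetical) monotone characteristics on [0, 1]:
T1 = lambda i: 0.35 + 0.65 * i**2   # inner, IE1(0) > 0 so the iteration starts
T2 = lambda i: i**0.5               # outer, extrinsic output exceeds its input

path = exit_trajectory(T1, T2)
print(path[-1])   # final point close to (1, 1)
```

With these curves the only fixed point of the composition is at mutual information 1, so the trajectory climbs through the open tunnel toward the (1, 1) corner, which is the graphical picture of convergence to low BER.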
VI. Obtaining BER from EXIT chart

The EXIT chart can be used to obtain an estimate of the BER after an arbitrary number of iterations. The soft output on the outer coded bits can be written as D2 = A2 + E2. For the sake of deriving a simple formula for the outer bit error
probability Pb2, we assume the a priori knowledge A2 and the extrinsic output E2 to be Gaussian distributed. Consequently, the soft output D2 is Gaussian distributed with variance σ²D2 and mean value μD2 = σ²D2/2, compare to (2). With the complementary error function, the bit error probability writes as

Pb2 ≈ 1/2 · erfc( μD2 / (√2 σD2) ) = 1/2 · erfc( σD2 / (2√2) ).   (11)
Assuming independence, it is

σ²D2 = σ²A2 + σ²E2.   (12)
The variances σ²A2, σ²E2 are calculated by inverting the J(·) function,

σA2 = J⁻¹(IA2),   σE2 = J⁻¹(IE2).   (13)
Finally, with (11), (12) and (13) the result is

Pb2 ≈ 1/2 · erfc( √( [J⁻¹(IA2)]² + [J⁻¹(IE2)]² ) / (2√2) ).   (14)
With (14), an estimate of the BER of the outer coded bits can be calculated (Fig. 6). For outer systematic codes, the systematic bits are part of the coded bits, and (14) also provides a good estimate for the BER on the outer information (i.e. systematic) bits, which is our figure of merit. For the inner decoder, the a posteriori output on the inner information bits can be written as D1 = A1 + E1, and with the same derivation as before we obtain an estimate on the BER of the inner information bits,

Pb1 ≈ 1/2 · erfc( √( [J⁻¹(IA1)]² + [J⁻¹(IE1)]² ) / (2√2) ).   (15)
As IA2 = IE1 on the inner transfer characteristic, and IA1 = IE2 on the outer transfer characteristic, equations (14) and (15) are equivalent. Note that the BER contour lines are independent of the Eb/N0 value of the channel and can be used without change for all EXIT charts in this paper.
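The BER estimate (14) can be evaluated with a numerically computed J(·) (a self-contained sketch; J and its bisection inverse are our own helpers, repeated here from the Section III example):

```python
import numpy as np
from math import erfc, sqrt, pi

def J(sigma, n=4001):
    """J(.) function of Section III, by numerical integration."""
    if sigma < 1e-9:
        return 0.0
    mu = sigma**2 / 2.0
    xi = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, n)
    pdf = np.exp(-(xi - mu) ** 2 / (2.0 * sigma**2)) / (sqrt(2.0 * pi) * sigma)
    return 1.0 - float(np.sum(pdf * np.log2(1.0 + np.exp(-xi))) * (xi[1] - xi[0]))

def J_inv(i):
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < i else (lo, mid)
    return 0.5 * (lo + hi)

def ber_estimate(i_a2, i_e2):
    """Estimate (14): Pb2 ~ 0.5*erfc(sigma_D2/(2*sqrt(2))) with
    sigma_D2^2 = J_inv(I_A2)^2 + J_inv(I_E2)^2."""
    s2 = J_inv(i_a2) ** 2 + J_inv(i_e2) ** 2
    return 0.5 * erfc(sqrt(s2) / (2.0 * sqrt(2.0)))

print(ber_estimate(0.0, 0.0))    # 0.5: no a priori/extrinsic knowledge
print(ber_estimate(0.99, 0.99))  # well below 1e-3
```

As expected, the estimate starts at 0.5 with no knowledge and drops steeply as the mutual information pair approaches (1, 1), which is what produces the BER contour lines of Fig. 6.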
VII. Code design with the EXIT chart

We are interested in finding serial code concatenations which allow convergence of the iterative decoding algorithm at Eb/N0 values close to the
Figure 6. Simulated trajectory of iterative decoding at Eb/N0 = 1.3 dB with BER scaling as contour plot; interleaver size 4·10⁵ coded bits, simulation length 10⁷ information bits.
theoretical limit. The Shannon capacity limit for rate 1/2 codes on the binary input/continuous output additive white Gaussian noise channel is at Eb/N0|Sh = 0.19 dB. To characterize a particular code concatenation we use the pinch-off limit Eb/N0|off, that is, the Eb/N0 value at which the inner and outer decoder transfer characteristics are just about to intersect in the EXIT chart, making convergence to low BER impossible. The pinch-off limit shows up in the BER chart as the turbo cliff. With the EXIT chart we can verify whether the inner and outer decoder interact efficiently. The outer extrinsic transfer characteristics of this Section are just taken from the set of curves depicted in Fig. 3. They are independent of the Eb/N0 value of the channel. The inner transfer characteristics are taken from a code search over inner rate one codes up to memory 6. While searching for good serial code concatenations we should be aware of the fact that we do not have the same flexibility as for the design of, e.g., irregular LDPC codes, where there are plenty of parameters (degree sequences) to adjust. Rather, we have to work with the transfer characteristics as given by the particular component code (memory, code polynomials), and are bound to finding good
Table I. Comparison of BER predictions from the EXIT chart (1.3 dB) with simulation results; columns of simulation results are marked with (s). For decoding passes 1 to 9, the table lists the BER of the inner decoder on the information bits, predicted by (15) and simulated, and the BER of the outer decoder on the coded and information bits, predicted by (14) and simulated; over the nine passes the BER drops from about 2·10⁻¹ to below 10⁻³, with predicted and simulated values in close agreement.
matching pairs of such codes. Those codes, however, exhibit, apart from the random interleaver, a simple, regular structure, with linear encoding/decoding complexity. As the transfer characteristics are computed based on the individual component codes, it is imperative to verify the EXIT chart predictions by simulating the full iterative decoder for the code concatenations of interest; the result of this simulation is given by the decoding trajectory; an interleaver size of 4·10⁵ bits is used.

A. Changing Inner and Outer Code Parameters

The pinch-off limit of the simple example in Fig. 4 is at about Eb/N0|off ≈ 1.01 dB. We can try to improve the convergence behavior by changing either the inner or the outer code parameters. For Fig. 7 we keep the outer code fixed to memory 2 and study the influence of changing the inner code parameters. The inner memory 4 code of the left EXIT chart does not match well to the outer memory 2 code, resulting in a "late" turbo cliff at about Eb/N0|off ≈ 2.60 dB. For the right EXIT chart we substituted every other coded bit of the inner memory 1 code by its systematic counterpart ("systematic doping"). By this, we are able to raise the beginning of the inner decoder transfer characteristic at the cost of sacrificing some extrinsic output strength at higher a priori input. Now, convergence is possible at 1 dB. For comparison, the transfer characteristic of the "undoped" inner memory 1 code of Fig. 4 is given as a dashed line in both charts for the respective Eb/N0 value.
Figure 7. EXIT charts with outer memory 2 code and different inner codes.
Fig. 8 shows the influence of changing the outer code parameters while keeping the inner code fixed to memory one. Apparently, a big outer code memory (left EXIT chart) does not go well with the inner memory 1 code;
Figure 8. EXIT charts with inner differential code of memory 1 and different outer codes.
Table II. Abbreviated notation for inner and outer codes to be used in Tab. III.

Inner Codes, R1 = 1, (v1; Gr,1, G1):
  Cin,1    (1; 03, 02)
  Cin,2    (2; 07, 04)
  Cin,3    (3; 013, 010)
  Cin,3'   (3; 017, 010)
  Cin,4    (4; 027, 020)
  Cin,4'   (4; 037, 020)

Outer Codes, R2 = 1/2, (v2; Gr,2, G2):
  Cout,r   rep. code
  Cout,b   (2, 4; 01, 03)
  Cout,1   (1; 03, 02)
  Cout,2   (2; 07, 05)
  Cout,3   (3; 013, 015)
  Cout,4   (4; 023, 037)
  Cout,5   (5; 067, 045)
  Cout,6   (6; 0147, 0117)
  Cout,8   (8; 0435, 0777)
intersections of both curves make convergence impossible at 1 dB, and it requires an Eb/N0 value of about 1.62 dB to raise the inner transfer characteristic enough to allow for convergence to low BER. The right EXIT chart shows a better match of inner and outer code characteristics, resulting in an "earlier" turbo cliff; with the outer memory 1 code convergence is possible at 1 dB. For comparison, the transfer characteristic of the outer memory 2 code of Fig. 4 is given as a dashed line in both charts. The EXIT chart technique can be used to quickly evaluate the pinch-off limits of arbitrary inner and outer code combinations. For this we compute inner transfer characteristics over a set of Eb/N0 values, say, from 0 dB to 3 dB in steps of 0.1 dB; the graphical intersection with the respective outer decoder transfer characteristic in the EXIT chart yields the desired pinch-off limit. Tab. III gives an overview of pinch-off limits for some inner and outer code concatenations.

B. Matching Inner and Outer Transfer Characteristics

From the previous Section we learned that inner and outer transfer characteristics should match like two pieces of a jigsaw puzzle to achieve a low Eb/N0|off value. Conversely, a mismatch of both characteristics results in an early intersection at low mutual information and, in turn, leaves "open space" between the two characteristics unused. This waste of "convergence potential" directly relates to a high Eb/N0|off value. Keeping this in mind, we can more vividly outline the design target of seeking a low Eb/N0|off as code characteristic matching.
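The Eb/N0 scan described above can be sketched as a search for the smallest grid value at which the EXIT tunnel is open (the transfer characteristics below are hypothetical stand-ins for illustration, not the measured curves behind Tab. III):

```python
import numpy as np

def pinch_off(T1, T2_inv, ebno_grid):
    """Scan Eb/N0 values and return the smallest one for which the inner
    characteristic lies strictly above the inverse outer characteristic,
    i.e. the EXIT tunnel is open (cf. the 0...3 dB scan in 0.1 dB steps)."""
    i = np.linspace(0.0, 0.999, 1000)
    for ebno in ebno_grid:
        if np.all(T1(i, ebno) > T2_inv(i)):
            return float(ebno)
    return None   # characteristics intersect for every grid value

# illustrative (hypothetical) curves; the inner characteristic rises with Eb/N0
T1 = lambda i, ebno: (0.3 + 0.1 * ebno) + (0.7 - 0.1 * ebno) * i**3
T2_inv = lambda i: 0.5 * i + 0.5 * i**2

print(pinch_off(T1, T2_inv, np.arange(0.0, 3.01, 0.1)))   # 2.0
```

For these toy curves the tunnel first opens at 2.0 dB; below that value the inner curve dips under the inverse outer curve at some a priori input, which is exactly the graphical intersection criterion.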
Table III. Pinch-off limits Eb/N0|off [dB] of some serial code concatenations on the Gaussian channel (R1 = 1, R2 = 1/2).

Outer \ Inner   Cin,1   Cin,2   Cin,3   Cin,3'  Cin,4   Cin,4'
  Cout,r         >3      0.88    0.83    0.49    0.74    0.46
  Cout,b         1.52    0.68    0.98    0.66    1.08    0.80
  Cout,1         0.88    0.94    1.30    1.28    1.65    1.62
  Cout,2         1.01    1.76    2.02    2.39    2.60    2.87
  Cout,3         1.13    1.99    2.21    2.66    2.82    >3
  Cout,4         1.21    2.12    2.31    2.80    2.95    >3
  Cout,5         1.34    2.32    2.49    >3      >3      >3
  Cout,6         1.45    2.46    2.61    >3      >3      >3
  Cout,8         1.62    2.66    2.78    >3      >3      >3
In our code search we did not find inner codes which match well with outer codes of large memory. However, we found inner transfer characteristics which match well with outer codes of small memory, or with an outer repetition code. In Section IV it turned out that the transfer characteristic of the outer repetition code is represented by the diagonal line IE2 = IA2 in the EXIT chart. Consequently, a matching inner transfer characteristic should resemble a straight line from IE1(IA1 = 0) > 0 to (IA1, IE1) = (1, 1) to allow for steady convergence. Fig. 9 shows an inner memory 3 and an inner memory 4 code whose extrinsic information transfer characteristics are close to the desired shape. From Tab. III the pinch-off limits are determined to be 0.49 dB for the memory 3 and 0.46 dB for the memory 4 code. However, the EXIT chart teaches us that both inner codes still exhibit more or less pronounced bottleneck regions where convergence is slowed down. To further approach the Shannon limit, we need to find ways to eliminate this defect.

C. Systematic Doping for Approaching the Capacity Limit

In our code search over inner rate one codes we noticed that the transfer characteristics of the most promising candidates start at the origin, IE1(IA1 = 0) ≈ 0, which, however, makes these codes unsuitable for iterative decoding; the iteration would not even get started. If we somehow could "open up" those inner transfer characteristics (i.e. achieve IE1(IA1 = 0) > 0), we would be able to regain the
Figure 9. EXIT charts of two inner codes which match well to an outer repetition code.
respective codes for use in an iterative decoding scheme. The trajectory then would find an entry point to initiate steady convergence towards low BER. The easiest way to achieve IE1(IA1 = 0) > 0 is to inject particles of a priori knowledge at the inner decoder. This can be done by assuming a small number of bits to be known at the receiver (e.g. every 100th coded bit), like pilot symbols in a channel estimation algorithm. Then, effectively, the transfer characteristic starts at IA1 > 0, which provides the desired jump start for the channel and extrinsic output, IE1(0) > 0. However, this increases the redundancy of the code, which is an undesired side effect. Apparently, with zero a priori knowledge IA1 = 0 at the beginning of the iterative decoding process, no information about the inner systematic bits can be gained: the coded bits at the output of the inner recursive encoder are too severely tied up by the code constraints. This motivates the following idea: if some of the systematic bits bypass the inner encoder, they can serve as uncoded, readily available (though noisy) information at the decoder, comparable to imperfect a priori knowledge, which contributes to achieving IE1(0) > 0. We have already used this principle of systematic doping for the inner memory 1 code of Fig. 7. Fig. 10 shows such a serial concatenation with outer repetition code and inner recursive convolutional encoder. A switch enables the inner coded bits to be replaced by inner systematic bits ("doping"), with a systematic-to-coded bit ratio of rd = ns/nc. The term "doping" is used rather than "puncturing" to express the notion that the coded bits are substituted by their respective systematic counterparts, without changing the code rate. Only a few systematic bits are needed to "open up" an inner transfer characteristic, and thus the doping ratios used in the following are rather small, i.e. rd << 1. With Fig. 11 we present two serial concatenations of an inner systematically doped code and an outer repetition code. Note that without systematic doping the inner transfer characteristics would start at the origin, IE1(0) = 0. The (4; 034, 037)
Figure 10. Serially concatenated code consisting of outer rate 1/2 repetition code and inner rate 1 recursive convolutional code with systematic doping.
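The doping switch of Fig. 10 can be sketched with a toy encoder (our own illustration; a differential code stands in for the inner recursive convolutional encoder). In every block of nc output bits, ns coded bits are replaced by the corresponding systematic bits, so the rate is unchanged:

```python
import numpy as np

def doped_differential_encode(bits, nc=50, ns=1):
    """Inner rate-1 recursive (differential) encoder with systematic doping:
    in each block of nc output bits, ns coded bits are replaced by the
    corresponding systematic bits; doping ratio rd = ns/nc."""
    y, state = [], 0
    for k, b in enumerate(bits):
        state ^= b                                # differential: y_k = x_k ^ y_{k-1}
        y.append(b if k % nc < ns else state)     # doping switch: systematic bit through
    return np.array(y)

x = np.array([1, 0, 1, 1, 0, 1, 0, 0])
print(doped_differential_encode(x, nc=4, ns=1))   # [1 1 0 1 0 0 0 0]
```

The output length equals the input length, in line with the remark that doping, unlike puncturing, does not change the code rate; only the doped positions carry uncoded (systematic) bits.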
Figure 11. EXIT charts of systematically doped inner codes with transfer characteristics matched to the outer repetition code characteristic.
code of the EXIT chart on the left hand side (encoder structure: see Fig. 10) has a rather low pinch-off limit of 0.41 dB, but obviously runs into a bottleneck problem at IA1 ≈ 0.2. Opposed to that, the inner transfer characteristic of the (3; 017, 07) code almost resembles the ideal shape of a straight line from IE1(0) > 0 to IE1(1) = 1. Therefore, the pinch-off limit is very low, Eb/N0|off ≈ 0.27 dB [19]. Two concatenations of a systematically doped inner code and a weak outer block code are depicted in Fig. 12. The inner decoder transfer characteristics are matched to the S-shaped transfer characteristic of the outer block code. For comparison purposes, the transfer characteristic of the repetition decoder is shown as a dashed line. From Tab. IV it can be seen that both code concatenations have the same pinch-off limit at Eb/N0|off = 0.38 dB. However, the codes differ in their convergence behavior: the (2; 06, 07) code converges fast for low to medium IA1, and runs into problems for IA1 close to one, as both characteristics close up. Conversely, the (2; 07, 03) code has a bottleneck problem around IA1 ≈ 0.5, but widens up towards IA1 close to 1. From the BER chart of Fig. 13 we can see that the (2; 06, 07) code shows a stronger flattening effect, whereas the serial concatenation with the inner (2; 07, 03) code has a sharp turbo cliff. The other codes are given as further references.
VIII. Robustness of mutual information

Although the decoder transfer characteristics of Sections III, IV are computed based on a simple Gaussian model for the a priori knowledge, they match very well

Table IV. Selection of some inner codes with low pinch-off limits Eb/N0|off [dB] in a serial code concatenation with an outer repetition or weak block code; (R1 = 1, R2 = 1/2), systematic doping of the inner code with doping ratio rd = ns/nc.

Outer Codes       Inner Codes (v1; Gr,1, G1)
(K, N; Gi)        (2; 06, 07)   (2; 07, 03)   (3; 013, 03)   (3; 016, 017)
                  rd = 1/15     rd = 1/30     rd = 1/50      rd = 1/50
rep. code         1.45          0.64          0.48           0.31
(2, 4; 01, 03)    0.38          0.38          0.78           1.20

Outer Codes       Inner Codes (v1; Gr,1, G1)
(K, N; Gi)        (3; 017, 07)  (4; 034, 037) (4; 037, 03)   (4; 037, 07)
                  rd = 1/50     rd = 1/50     rd = 1/50      rd = 1/50
rep. code         0.27          0.41          0.37           0.34
(2, 4; 01, 03)    0.93          1.25          0.73           1.03
Figure 12. EXIT charts of systematically doped inner codes with transfer characteristics matched to the outer block code.
Figure 13. BER curves of some rate 1/2 serially concatenated codes; interleaver size 3·10⁵ bits, 100 iterations.
with the iterative decoding trajectory, which is a simulation result of the iterative decoder. We attribute this to the robustness of the mutual information measure against changes in the shape of the actual extrinsic (and, in turn, a priori) distributions. To illustrate the robustness of mutual information, we start by measuring what shapes of extrinsic output distributions pE1 are actually involved in the iterative decoding process. Then we find a handy mathematical model to describe these distributions, and, finally, re-compute the decoder transfer characteristics using a priori knowledge based on extreme cases of the actually measured distributions.

A. Measuring the Actual Extrinsic Distributions

We chose an inner rate 1 memory 4 code with polynomials (Gr, G) = (037, 020) in combination with an outer rate 1/2 repetition code, yielding typical results for the distribution measurements. The corresponding EXIT chart is given in Fig. 14, with the iterative decoding trajectory at 0.8 dB. We measured the extrinsic distributions at the output of the inner decoder after zero, 5, 15 and 59 iterations; the results of these measurements for pE1(ξ | X1 = 1) are depicted in Fig. 15. Obviously, the shape of pE1 is very different from Gaussian
Figure 14. EXIT chart with iterative decoding trajectory at Eb/N0 = 0.8 dB; inner (4; 037, 020) code, outer repetition code; interleaver size 4·10⁵ bits.
at the beginning of the iteration, showing positive skewness and kurtosis; after 15 iterations it approaches the shape of a Gaussian distribution very closely; for more iterations, the maximum value is shifted towards higher L-values, and the shape becomes asymmetric again, with negative skewness and kurtosis.

B. Refining the Model of a priori Knowledge

We observed that the actually involved distributions pE1 can be well modeled by L-value distributions based on additive generalized Gaussian distributed noise. The class of Generalized Gaussian Distributions (GGD, e.g. [20]) is defined as

p(x) = c1 · exp( −(|x| / c2)^a ),   (16)
with

c1 = a / ( 2 · c2 · Γ(1/a) )   (17)
Figure 15. Measured shapes of extrinsic L-value distributions at the output of the inner decoder after different numbers of iterations (compare to Fig. 14); ordinate normalized to one.
and

c2 = β · √( Γ(1/a) / Γ(3/a) ).   (18)
The Gamma function is given by the integral

Γ(x) = ∫₀^∞ t^{x−1} e^{−t} dt.   (19)
The parameter a > 0 describes the exponential decay rate, β > 0 the abscissa scale parameter. The mean value of (16) is zero, the variance is β². Fig. 16 depicts some GGD for different values of a; special cases include a = 1 (Laplacian distribution), a = 2 (Gaussian distribution), and a → ∞ (uniform distribution).
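A GGD random variable with variance β² can be sampled via a Gamma-distributed magnitude (a standard construction, assuming NumPy; our own sketch, with the scale factor chosen so that the variance equals β² as stated below (16)):

```python
import numpy as np
from math import gamma, sqrt

def ggd_sample(a, beta, size, rng):
    """Sample zero-mean generalized Gaussian noise with decay rate a and
    variance beta^2: |X/scale|^a is Gamma(1/a)-distributed, sign is random."""
    scale = beta * sqrt(gamma(1.0 / a) / gamma(3.0 / a))   # abscissa scale
    g = rng.gamma(shape=1.0 / a, scale=1.0, size=size)
    return scale * g ** (1.0 / a) * rng.choice([-1.0, 1.0], size=size)

rng = np.random.default_rng(3)
for a in (1.0, 2.0, 8.0):            # Laplacian, Gaussian, near-uniform
    n = ggd_sample(a, beta=1.0, size=400_000, rng=rng)
    print(a, n.var())                # each variance close to 1.0
```

The same scale factor reappears in the Gaussian special case a = 2, where the construction reduces to ordinary zero-mean Gaussian noise with variance β².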
Figure 16. Generalized Gaussian Distributions (GGD) for different exponential decay rates a; variance is set to one, ordinate normalized to one.
The a priori L-values A1 are generated based on the transmitted inner information bits x1 ∈ {±1} and the additive noise model

zA1 = x1 + nA1.   (20)
Let nA1 be a generalized Gaussian distributed random variable with exponential decay rate a and variance β²; then, the likelihood function of the additive GGD channel of (20) is

p(zA1 | x1) = c1 · exp( −(|zA1 − x1| / c2)^a ).   (21)
The log-likelihood values (L-values) are calculated as

A1 = L(zA1) = ln [ p(zA1 | x1 = +1) / p(zA1 | x1 = −1) ],   (22)
giving rise to

L(zA1) = ( |zA1 + 1|^a − |zA1 − 1|^a ) / c2^a.   (23)
The function L(zA1) maps the channel value zA1 from the additive GGD channel to the corresponding L-value A1. A closed-form result for the PDF of the L-values A1 is only known for special cases, the most familiar one being the Gaussian case (a = 2), where (23) reduces to L(zA1) = 2/β² · zA1, yielding an L-value distribution which is Gaussian as well, compare to (1). For general a we computed the L-value distributions by Monte Carlo simulation (histogram measurements), applying (21) and (23). It should be emphasized that the GGD is only used for modelling the additive noise random variable nA1 of (20), and only indirectly, through the transforming function (23), for modelling the a priori L-values A1. Therefore, the shape of the L-value distributions of A1 will be different from a GGD. Using methods of curve fitting, we determined the best choice of a, β to approximate the shapes of the actual extrinsic output distributions pE1 of Section VIII-A. Tab. V shows the result of the curve fitting, which sought to minimize the Euclidean distance (mean squared error) between the actually measured distribution and the generated L-value distribution. We found that curve fitting by minimization of the cross-entropy between both distributions yields very much the same results. The exponential decay rate a is the crucial parameter controlling the shape of the L-value distribution. From Tab. V it can be seen that the decay rate spans from a ≈ 3 at the beginning of the iteration (positive skewness and kurtosis of the L-value distribution) down to a ≈ 1.6 towards the end of the iteration (negative skewness and kurtosis). After 15 iterations we have the closest approach to the Gaussian shape, with a ≈ 2. Figure 17 shows that the L-value distributions based on the additive GGD channel model match the actually measured distributions pE1 of Figure 15 reasonably well (see also the parameters a, β of Tab. V marked with an asterisk).
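The mapping (23) and its Gaussian special case can be checked numerically (our own sketch; the scale constant is the same abscissa scale parameter as in the GGD definition):

```python
import numpy as np
from math import gamma, sqrt

def ggd_llr(z, a, beta):
    """L-value of the additive GGD channel, per (23):
    L(z) = (|z+1|^a - |z-1|^a) / c^a with abscissa scale c;
    for a = 2 this reduces to L(z) = 2*z/beta^2."""
    c = beta * sqrt(gamma(1.0 / a) / gamma(3.0 / a))
    return (np.abs(z + 1.0) ** a - np.abs(z - 1.0) ** a) / c ** a

z = np.linspace(-3.0, 3.0, 7)
print(ggd_llr(z, a=2.0, beta=1.0))   # equals 2*z for beta = 1
```

For the Laplacian case a = 1 the mapping saturates for |z| > 1, producing the clipped, non-Gaussian L-value shapes observed at the end of the iteration.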
Table V. Extrinsic output PDFs approximated by L-value distributions based on the additive GGD noise model with exponential decay rate a and abscissa scale parameter β.

  it.      a       β            it.      a       β
  0.*)     2.993   3.488        16.      1.969   0.610
  1.       2.422   2.359        17.      1.917   0.554
  2.       2.336   1.939        18.      1.897   0.509
  3.       2.315   1.682        19.      1.883   0.477
  4.       2.270   1.524        20.      1.851   0.440
  5.*)     2.260   1.392        21.      1.844   0.416
  6.       2.242   1.296        22.      1.818   0.393
  7.       2.222   1.200        23.      1.812   0.381
  8.       2.201   1.118        24.      1.803   0.361
  9.       2.171   1.044        25.      1.784   0.347
  10.      2.148   0.987        30.      1.732   0.300
  11.      2.133   0.917        35.      1.723   0.275
  12.      2.113   0.854        40.      1.707   0.250
  13.      2.064   0.791        45.      1.695   0.240
  14.      2.034   0.729        50.      1.675   0.225
  15.*)    2.000   0.670        59.*)    1.601   0.195
Figure 17. Shapes of extrinsic L-value distributions (a priori distributions respectively) based on additive GGD noise model in comparison to actually measured L-value distributions.
C. Re-Computing Extrinsic Transfer Characteristics

Based on the more general model for a priori knowledge introduced in the previous Section, we can re-compute extrinsic transfer characteristics for different values of a and compare them to those obtained in the Gaussian a priori case (a = 2). For Fig. 18 we re-computed the inner decoder transfer characteristic of Fig. 14 for the two extreme cases a = 3 and a = 1.6, correspondingly using shapes of PDFs close to the actual a priori distributions at the beginning and towards the end of the iteration. The transfer characteristics based on mutual information (chart on the left hand side) stay virtually the same for different a, and hardly any difference between the three curves can be seen. This explains why the decoding trajectory stays on the Gaussian-based transfer characteristic even though the shape of the involved PDFs changes significantly during the course of the iteration (typically a ≈ 1.6 … 3). The right-hand chart of Fig. 18 shows transfer characteristics based on SNR values (linear scaling, not in dB); that is, we use γE1 = μ²E1/σ²E1 to visualize the extrinsic output, with

μE1 = E[ x1 · E1 ]   (24)
μE1 and σ²E1 being the mean and variance of the extrinsic output L-values, estimated conditioned on the known transmitted bits.
The a priori input is characterized by the SNR measure γA1, respectively. As evident from Fig. 18, the SNR transfer characteristics are very sensitive to the shape of the a priori distributions, as controlled by the decay rate α, and the decoding trajectory based on SNR measures does not match well, which can be partly attributed to the separate estimation of μE1 and σ²E1, not involving any Gaussian assumption. Mutual information turns out to be the more robust measure. Apparently it is essential for the iterative decoding process that the mutual information is carried over from the extrinsic output to the next a priori input, while the shape of the involved distributions is of only minor importance. Moreover, for calculating the transfer characteristics based on mutual information, we could apply any of the values of α. From Fig. 17 we can see that the Gaussian case α = 2 represents an average distribution, and thus the Gaussian model of (1) is a good choice which, in addition, is mathematically convenient to handle.
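The robustness of the mutual information measure can be probed numerically. A minimal sketch, assuming the consistent Gaussian model of (1) (a priori L-values distributed as N(±σ²/2, σ²) given the transmitted bit) and the standard identity I_A = 1 − E[log2(1 + e^(−A)) | X = +1]:

```python
import numpy as np

def mutual_info_gaussian(sigma):
    """Mutual information I(X;A) for a priori L-values A ~ N(sigma^2/2, sigma^2)
    given X = +1 (the consistency condition), by numerical integration."""
    if sigma <= 0.0:
        return 0.0
    mu = sigma * sigma / 2.0
    l = np.linspace(mu - 10.0 * sigma, mu + 10.0 * sigma, 20001)
    dl = l[1] - l[0]
    p = np.exp(-((l - mu) ** 2) / (2.0 * sigma * sigma)) / np.sqrt(2.0 * np.pi * sigma * sigma)
    return 1.0 - np.sum(p * np.log2(1.0 + np.exp(-l))) * dl

for s in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma = {s}: I_A = {mutual_info_gaussian(s):.4f}")
```

Swapping in the GGD densities of Table V in place of the Gaussian changes the resulting mutual information values only marginally, consistent with the robustness observed above.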
IX. Conclusion We have shown that the EXIT chart can be used to find pairs of inner and outer decoders which interact efficiently in iterative decoding. The serial
Figure 18. Mutual information transfer characteristics (left) and SNR transfer characteristics (right) computed using a priori L-value distributions based on three different GGD decay rates α = 1.6, α = 2 (Gaussian), and α = 3.
concatenation of inner rate-one recursive convolutional codes of small memory with outer repetition codes or weak block codes turned out to yield codes which achieve the closest approach to the Shannon limit while being encodable and decodable in linear time. By refining the model of a priori knowledge using generalized Gaussian distributions and recomputing extrinsic transfer characteristics, we illustrated the robustness of the mutual information measure against changes in the shape of the extrinsic distributions (a priori distributions, respectively). This contributes to making the EXIT chart an accurate engineering tool for predicting the convergence behavior of serial code concatenations for long interleavers.
REFERENCES

[1] BERROU (C.), GLAVIEUX (A.), THITIMAJSHIMA (P.), "Near Shannon limit error-correcting coding and decoding: Turbo-codes," Proc. IEEE Conf. on Commun., pp. 1064-1070, May 1993.
[2] SHANNON (C.E.), "A mathematical theory of communication," Bell Syst. Tech. Journal, vol. 27, pp. 379-423, 623-656, July, Oct. 1948.
[3] RICHARDSON (T.J.), SHOKROLLAHI (A.), URBANKE (R.), "Design of provably good low-density parity-check codes," submitted to IEEE Trans. Inform. Theory, 1999.
[4] CHUNG (S.Y.), FORNEY (G.D.), RICHARDSON (T.J.), URBANKE (R.), "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," submitted to IEEE Commun. Lett., 1999.
[5] FREY (B.J.), MACKAY (D.J.C.), "Irregular turbo-like codes," Proc. 2nd Internat. Symp. on Turbo Codes, pp. 67-72, Sept. 2000.
[6] JIN (H.), KHANDEKAR (A.), MCELIECE (R.), "Irregular repeat-accumulate codes," Proc. 2nd Internat. Symp. on Turbo Codes, pp. 201-210, Sept. 2000.
[7] RICHARDSON (T.J.), URBANKE (R.), "The capacity of low-density parity-check codes under message-passing decoding," submitted to IEEE Trans. Inform. Theory, 1999.
[8] BENEDETTO (S.), DIVSALAR (D.), MONTORSI (G.), POLLARA (F.), "Serial concatenation of interleaved codes: performance analysis, design and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, pp. 909-926, May 1998.
[9] TEN BRINK (S.), "Iterative decoding trajectories of parallel concatenated codes," Proc. IEEE/ITG Conf. on Source and Channel Coding, pp. 75-80, Jan. 2000.
[10] BAHL (L.), COCKE (J.), JELINEK (F.), RAVIV (J.), "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284-287, March 1974.
[11] HAGENAUER (J.), OFFER (E.), PAPKE (L.), "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, pp. 429-445, March 1996.
[12] BATTAIL (G.), "Pondération des symboles décodés par l'algorithme de Viterbi," Ann. Télécommun., vol. 42, pp. 31-38, Jan. 1987.
[13] HAGENAUER (J.), HOEHER (P.), "A Viterbi algorithm with soft-decision outputs and its applications," Proc. IEEE Globecom Conf., pp. 1680-1686, Nov. 1989.
[14] LODGE (J.), YOUNG (R.), HOEHER (P.), HAGENAUER (J.), "Separable MAP filters for the decoding of product and concatenated codes," Proc. IEEE Conf. on Commun., pp. 1740-1745, May 1993.
[15] EL GAMAL (H.), HAMMONS (A.R.), "Analyzing the turbo decoder using the Gaussian approximation," submitted to IEEE Journ. Sel. Areas Commun., Jan. 2000.
[16] DIVSALAR (D.), DOLINAR (S.), POLLARA (F.), "Low complexity turbo-like codes," Proc. 2nd Internat. Symp. on Turbo Codes, pp. 73-80, Sept. 2000.
[17] WIBERG (N.), Codes and decoding on general graphs, PhD thesis, Linköping University, 1996.
[18] COVER (T.M.), THOMAS (J.A.), Elements of Information Theory, New York: Wiley, 1991.
[19] TEN BRINK (S.), "Rate one-half code for approaching the Shannon limit by 0.1 dB," IEE Electron. Lett., vol. 36, pp. 1293-1294, July 2000.
[20] FARVARDIN (N.), VAISHAMPAYAN (V.), "Optimal quantizer design for noisy channels: An approach to combined source-channel coding," IEEE Trans. Inform. Theory, vol. 33, pp. 827-838, Nov. 1987.
Biography

Stephan ten Brink received the Dipl.-Ing. degree in Electrical Engineering and Information Technology from the University of Stuttgart, Germany in 1997. From 1997 to 2000 he was a research assistant at the Institute of Telecommunications, Stuttgart, working towards the doctoral degree. Since November 2000 he has been with the Wireless Research Lab of Bell Laboratories, Lucent Technologies, in Holmdel, New Jersey, USA. His research interests include multiuser detection, error-correcting coding and channel estimation for digital communication systems.
Chapter 4
Noncoherent iterative decoding of spectrally efficient coded modulations

G. Ferrari
Communication Sciences Institute, University of Southern California, Los Angeles, USA
G. Colavolpe and R. Raheli Università di Parma, Dipartimento di Ingegneria dell' Informazione, Parma, Italy
I. Introduction

Since their appearance, concatenated codes with iterative decoding [1, 2] have stimulated great research interest because of their performance close to the Shannon limit. Due to the growing data flow in future communication systems, where concatenated codes could be used, it will be more and more important to achieve high bit-rate transmissions, i.e., to merge large coding gains with spectrally efficient modulations. Hence, a very promising research area is the combination of concatenated codes and iterative decoding [1, 2] with modulation schemes which allow bandwidth efficiency, such as trellis coded modulation (TCM) [3].
Possible combinations of concatenated codes and spectrally efficient modulations have been considered in the literature. They are usually referred to as turbo trellis coded modulation (T-TCM) schemes. The first scheme in the literature appeared in [4], where the output bits of a turbo code are mapped, after puncturing, to a phase shift keying (PSK) or quadrature amplitude modulation (QAM) constellation. Another example of a "pragmatic approach" to spectrally efficient modulations for turbo coded systems has been proposed in [5]. In [6] an "ad-hoc" approach has been considered, by using Ungerboeck codes [3] as component codes and puncturing the modulated symbols. In [7] possible schemes to jointly optimize the parallel concatenated code and the mapping are proposed. In [8] a T-TCM scheme identical to that proposed in [6] is described and a suitable application of the soft-output Viterbi algorithm (SOVA) to multilevel modulation is considered. The versatility of T-TCM schemes, besides the performance, is the main concern in [9].
All the proposed schemes [4 - 9] consider transmission over an additive white Gaussian noise (AWGN) channel. It becomes a difficult task to extend the proposed structures to channels having memory. Bandpass transmission channels can be modeled as noncoherent in the sense that the transmitted signal undergoes an unknown phase rotation. This static phase rotation is responsible for an unlimited memory, at least in principle. Recently, noncoherent iterative decoding schemes for concatenated codes have been proposed based on suboptimal soft-output decoding algorithms suited for noncoherent channels [10]. The interest in noncoherent decoding algorithms to be used in iterative processing arises because phase-tracking schemes may deliver an unreliable phase estimate or require use of pilot symbols to avoid tracking losses for very low values of signal-to-noise ratio typical of concatenated coding schemes. Furthermore, noncoherent schemes exhibit inherent robustness to phase and frequency instabilities such as those caused by phase noise and uncompensated time-varying frequency offsets in local oscillators and Doppler shifts in wireless channels. These problems become more critical with an increased constellation size, as in the schemes considered in this paper. Moreover, since there is no inherent performance degradation in noncoherent decoding, provided the channel phase is sufficiently stable [11], an extension of the schemes proposed in [10], where binary phase shift keying (BPSK) was considered, to structures with an increased spectral efficiency, is challenging and of interest. In this paper, we propose possible solutions for noncoherent decoding of concatenated codes with spectrally efficient modulations. We consider two main classes of schemes. A first class is obtained by concatenating parallel coding schemes (turbo codes) with a differential code. 
In this case, at the receiver side we consider separate detection and decoding: a noncoherent differential detector is followed by a coherent turbo decoder. A second class is obtained by considering serially concatenated coding structures [2, 12] and parallelly concatenated coding schemes derived from the structures proposed in [7]. At the receiver side we consider joint detection and decoding for the component decoders which directly receive the channel outputs (the inner decoder for serially concatenated codes and both component decoders for parallel schemes). The basic noncoherent decoder uses the noncoherent soft-output algorithm proposed in [10], where a parameter N is related to the assumed phase memory. In order to achieve satisfactory decoding performance, N must be sufficiently large. Nonetheless, since the memory and computational requirements grow exponentially with N, it becomes essential to apply reduced-state techniques, such as those recently proposed in [13]. In Section II, we extend the considered soft-output noncoherent decoding algorithm to M-ary modulations. In Section III, we describe a suitable state reduction technique. In Section IV, we consider spectrally efficient schemes which employ separate detection and decoding at the receiver side, whereas in Section V we propose schemes which employ joint detection and decoding at the
receiver side. Numerical results are presented in Section VI and conclusions are drawn in Section VII.
II. Noncoherent soft-output decoding

In this section, we extend the noncoherent soft-output algorithm proposed for binary modulations in [10] to M-ary modulations. The algorithm will be described in the special case of trellis coded modulation [3], where each information symbol is related to more than one bit and the output symbol is mapped to a multilevel complex symbol. For the formulation, we consider the case of a recursive trellis code [3, 14]. Generalizations to other codes, in particular differential encoding, are straightforward. We assume that a sequence of independent M-ary information symbols {ak} undergoes trellis encoding. Each information symbol ak corresponds to a group of m = log2 M bits, i.e., ak = (ak(1), ..., ak(m)). These information bits are coded into m0 output bits through a recursive encoding rule. The M0-ary output symbol (ck(1), ..., ck(m0)), where m0 = log2 M0, is then mapped to a complex symbol ck belonging to the considered constellation. For systematic binary Ungerboeck codes of rate n/(n + 1) [3], M = 2^n and M0 = 2^(n+1). However, considering puncturing of the systematic output bits [7], M0 may be less than 2^(n+1). The sampled output {xk} of a filter matched to the transmitted pulse is a sufficient statistic for noncoherent decoding [15]. Each sample may be expressed as xk = ck e^(jθ) + nk, where {nk} are samples of a zero-mean complex-valued white Gaussian noise process and θ is a random variable uniformly distributed in (−π, π]. We denote by x1K = {xk}k=1..K the entire sequence of received samples or observations, where K is the transmission length. Similarly, we denote by a1K = {ak}k=1..K and c1K = {ck}k=1..K the entire sequences of information and code symbols, respectively. We now extend the algorithm introduced in [10], relative to the case of a recursive systematic code (RSC), to a TCM code. Note that in this case we must substitute a single information bit with an M-ary information symbol ak carrying log2 M bits.
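The channel model just described is easy to simulate. The sketch below (illustrative; all parameter values are arbitrary choices, not taken from the chapter) draws 8-PSK symbols, applies a random phase rotation θ and complex AWGN, and verifies that the magnitude of the correlation Σ xi ci*, the kind of quantity on which the noncoherent metrics are built, is insensitive to θ:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, sigma = 8, 64, 0.1

# Random 8-PSK code symbols c_k and observations x_k = c_k e^{j theta} + n_k
c = np.exp(2j * np.pi * rng.integers(0, M, K) / M)
theta = rng.uniform(-np.pi, np.pi)
noise = sigma * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
x = c * np.exp(1j * theta) + noise

# The phase rotates the correlation but leaves its magnitude (the
# noncoherent statistic) essentially unchanged:
corr = np.sum(x * np.conj(c))
print(abs(corr))        # close to K = 64 regardless of theta
print(np.angle(corr))   # recovers theta (up to noise)
```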
Denoting by μk an encoder state, the decoder state, which takes partially into account the channel memory, may be expressed as (1)

Sk = (ak−1, ak−2, ..., ak−N+1, μk−N+1)
where N is an integer. To account for the possible presence of parallel transitions, it is convenient to identify a trellis branch ek by its beginning state Sk and driving information symbol ak In fact, two states Sk and Sk+l could be connected by parallel transitions driven by different information symbols. We showed in [10] that a good approximation of the a posteriori probability (APP) of symbol ak can be determined on the basis of the considered observations xf. Denoting this value by P{ak \ xf}, it may be written as (2)
P{ak | x1K} ∝ Σ_{ek: a(ek)=ak} γk(ek) αk(ek) βk(ek) P{ak} P{S−(ek)}
in which a(ek) denotes the information symbol driving transition ek, and

where [·]* is the conjugate operator, ∝ denotes proportionality and I0(x) is the zeroth-order modified Bessel function of the first kind. The sum in (2) is extended over all transitions of epoch k driven by information symbol ak. The probability density function γk(ek), relative to a particular trellis transition, depends on the coding structure. P{ak} and P{S−(ek)} denote the a priori probabilities of information symbol ak and state S−(ek), respectively, where S−(ek) denotes the beginning state of transition ek. In the following, we will consider the expression of probabilities and probability density functions in the natural or logarithmic domain depending on the specific case, with the implicit assumption that the two formulations are equivalent. The performance of iterative decoding at low bit error rate (BER) can be improved by considering bit-interleaving [16]. Hence, equation (2) has to be modified in order to provide the a posteriori probabilities of single bits. Assuming that the information bits are independent within each symbol1, we can consider P{ak} = P{ak(1)} ... P{ak(m)} = Π_{i=1}^{m} P{ak(i)}. In the case of an iterative decoding process, where the P{ak(i)} are derived from input extrinsic information, this assumption is just an approximation. Equation (2) may be extended as follows (6)
where ak(ek)(i) denotes the i-th bit of the information symbol driving transition ek. Similarly to the well-known algorithm by Bahl, Cocke, Jelinek, and Raviv (BCJR), the probability density functions αk(ek) and βk(ek) can be approximately computed by means of forward and backward recursions [10]. For this reason, we refer to the considered noncoherent soft-output algorithm as a noncoherent BCJR-type algorithm. Denoting by S+(ek) the final state of transition ek, we may write (7)

1. This assumption is motivated by the presence of bit interleaving.
where the involved probability density functions are given by (9) and (10),
and a(ek−1) denotes the information symbol "lost" in the transition ek−1, i.e., the oldest information symbol in the initial state S−(ek−1). The couple (S+(ek−1), a(ek−1)) uniquely identifies S−(ek−1). With the present definition of state Sk and for a recursive code, a(ek−1) = ak−N. In (7), the sum is extended over all the transitions of epoch k − 1 that end in the initial state of branch ek. The sum in (8), relative to the trellis section at epoch k + 1, may be interpreted similarly. Proper boundary conditions have to be considered in order to correctly initialize the forward and backward recursions.
III. Reduced-state algorithm

Assuming that there are ξe possible encoder states, the number of decoder states is ξd = ξe M^(N−1). For example, if ξe = 16, M = 4 and N = 5, then ξd = 4096. In order to make noncoherent decoding with spectrally efficient modulations practical, a complexity reduction suitable to the proposed soft-output decoding algorithm is needed. We consider a recently proposed method which is an extension of reduced-state sequence detection (RSSD) [17-19] to BCJR-type algorithms [13]. The basic idea is to reduce the number of states and build a "survivor map" during the forward recursion (run first) to be used in the backward recursion and in the calculation of a posteriori probabilities. By defining a reduced state as sk = (ak−1, ..., ak−Q+1, μk−Q+1), with Q < N, a transition εk in the reduced-state trellis is associated with the symbols (ck−Q+1, ..., ck). We showed in [13] that a survivor may be associated with each transition εk in the reduced-state trellis. We may define by E(l)k−m(εk) the sequence of l transitions (εk−m−l+1, ..., εk−m) reaching epoch k − m along the survivor of transition εk. The encoder states, information symbols ak−j and code symbols ck−j in E(l)k−m(εk) are those associated
with the path history of transition εk. Hence, the probability density function may be correctly computed, making use of the built survivor map, as
In the reduced-state trellis, in analogy with equation (2), we wish to approximate the a posteriori probability as
(12)

where the two quantities αk and βk, in the reduced-state case, are defined as follows
For a recursive code we use the following approximation for the a priori probability of state sk [10]: P{sk} = Π_{i=1}^{Q−1} P{ak−i}. If Q < N, then αk(εk), as defined in (13) for the reduced-state case, is different from αk(ek) as defined in (4) for the full-state case. Similarly, βk(εk) ≠ βk(ek). However, recursions for the computation of αk and βk may be found in the reduced-state case as well. The survivor map is built during the forward recursion and employed in the backward recursion and to evaluate γk in (11). Referring to the original formulation proposed in [10], the extension of the previously introduced general recursions (7) and (8) to (13) and (14) is not immediate. We now show the mathematical derivation which leads to the forward recursion in the reduced-state trellis. More precisely, assuming the survivor map is known up to epoch k − 1, we show how to extend it to epoch k. The detailed mathematical derivation of the forward recursion in [10] for the full-state case cannot be applied in this case. In fact, considering in the reduced-state case the same approach followed in [10], we should compute αk as follows (15)
Assuming Q < N − 1 (state reduction), ak−Q depends on xk−N+1..k, making it impossible to evaluate this probability.
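As a quick check of the state-space sizes involved, a sketch computing the full-state count ξd = ξe · M^(N−1) quoted at the start of this section, together with the analogous count ξe · M^(Q−1) implied by the definition of the reduced state sk (the latter formula is an inference from that definition, not stated explicitly in the text):

```python
def decoder_states(xi_e, M, depth):
    """Number of decoder states for an encoder with xi_e states, M-ary
    symbols, and 'depth' past symbols retained in the decoder state."""
    return xi_e * M ** (depth - 1)

xi_e, M, N, Q = 16, 4, 5, 2
full = decoder_states(xi_e, M, N)      # 16 * 4^4 = 4096 (example in the text)
reduced = decoder_states(xi_e, M, Q)   # 16 * 4^1 = 64
print(full, reduced, full // reduced)
```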
Another approach has to be considered. More precisely, we may express αk as follows
(16)
Since P{ak−Q | εk} = P{ak−Q}, observing that εk−1 is uniquely determined by (ak−Q, εk) and using, as in [10], the approximation
we obtain the following approximate forward recursion in the reduced-state trellis
(18)
where αk−1(εk−1) = p(x1..k−N−1 | xk−N..k−1, εk−1), in agreement with (13). The problem in the computation of (18) is the evaluation of the two probability density functions p(xk−N..k | ak−Q, εk) and p(xk−N+1..k | εk). In fact, since Q < N, each of the two probability density functions should be correctly computed by averaging over previous information symbols. Since at epoch k the survivor of each transition εk−1 is known and since (ak−Q, εk) = (εk−1, εk), we replace p(xk−N..k | ak−Q, εk) with the probability density function p(xk−N..k | E(N−Q)k−2(εk−1), εk−1, εk), obtaining the following modified recursion (19)
We now express the forward recursion (19) in the logarithmic domain as follows
(20)
and using the max-log approximation ln Σi e^(λi) ≈ maxi λi [26], where the maximization is over ak−Q, we obtain (21)
The choice of the survivor associated with εk may be based on this max operation, which can be correctly carried out since the quantities αk−1(εk−1) and ln P{ak−Q} are known and ln p(xk−N..k | E(N−Q)k−2(εk−1), εk−1, εk) can be computed. The term ln p(xk−N+1..k | E(N−Q)k−2(εk−1), εk−1, εk) does not affect the max operation and, as a consequence, the survivor selection, but it affects the exact value of αk(εk). We denote by εmax_k−1 the previous transition of the survivor of transition εk. Equivalently, the symbol amax_k−Q may be considered. Once the transition εmax_k−1 has been associated with εk, we replace ln p(xk−N+1..k | εk) with the following probability density function in the logarithmic domain
where the expression x ~ y denotes that x and y are monotonically related quantities. The resulting forward recursion finally assumes the following form
The obtained forward recursion in the reduced-state trellis exhibits some analogy with the corresponding forward recursion in the full-state trellis [10].
This indirectly confirms the validity of the proposed intuitive approximations. The backward recursion can be similarly obtained with the further simplification that the survivor map is now already available because previously determined during the forward recursion. More precisely, remarking that (εk, ak) uniquely identifies εk+1, the backward recursion may be written as follows
A problem connected with trellis coded modulations (especially when the code is recursive and M > 2) is the initialization of the recursion in the reduced-state trellis. Even if this aspect may be neglected when considering continuous transmissions, it is very important in packet transmissions, since interleaving operates on the entire packet, and hence it is not allowed to discard the first decoded symbols. The survivor map is built during the forward recursion, but the survivors should be already available at the very first steps of this recursion. Hence, an initial transient period for the forward recursion may be considered, where a fictitious phase memory parameter is increased by 1 at each step to reach the final value N, as detailed in Appendix A. A valid alternative is to consider a sequence of pilot symbols at the beginning of the transmission, in order to correctly initialize the forward recursion. The transmission efficiency is not appreciably reduced as the overhead is less than 1% with the packet lengths considered in the numerical results.
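The max operation used in (21) is the standard max-log approximation, ln Σi e^(λi) ≈ maxi λi, whose error never exceeds the natural log of the number of terms. A small numeric check (the values are arbitrary):

```python
import math

def logsumexp(vals):
    """Numerically stable ln(sum(exp(v))) over a list of log-domain values."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

lams = [-1.2, 0.3, 2.7, 2.5]
exact = logsumexp(lams)
approx = max(lams)            # max-log approximation
print(exact, approx)          # the gap is at most ln(len(lams))
assert 0.0 <= exact - approx <= math.log(len(lams))
```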
IV. Separate detection and decoding

The first class of spectrally efficient schemes considered uses coding structures based on the concatenation of a T-TCM block followed by an inner differential encoder. At the receiver side, a noncoherent differential detector computes a posteriori bit probabilities which are passed to the following coherent turbo decoder as logarithmic likelihood ratios. The introduction of the inner differential encoding allows noncoherently non-catastrophic coding schemes [15, 20] to be obtained.
The scheme proposed in [4] is basically a systematic turbo code of rate 1/3 followed by a puncturer and a mapper. An immediate extension of this scheme to noncoherent decoding is shown in Fig. 1, where a sequence of independent bits {uk} undergoes systematic turbo encoding. The code bits {bk} at the output of the turbo encoder are punctured according to some puncturing pattern [7]. The systematic and code bits, after being serialized, are interleaved. After interleaving they are grouped into m = log2 M bits and mapped into M-ary complex symbols, undergoing differential encoding. In all block diagrams describing the proposed schemes, we associate solid lines with binary symbols and dashed lines with complex symbols. Furthermore, for notational consistency with Section II, we use the symbols ak and ck to denote the input and output symbols, respectively, of the component encoders which are noncoherently decoded according to the described algorithm. Note that the symbols ak are rendered independent by the interleaver, as required by the algorithm in Section II. A similar scheme derived from one of the structures proposed in [7] is shown in Fig. 2. This scheme is basically composed of two parallelly concatenated Ungerboeck codes, and puncturing on information bits is considered before mapping. In this figure, we consider a sequence of couples of information bits (uk(1), uk(2)). Both encoders receive this sequence and generate two sequences of coded bits (bk(1), bk(2)), but the systematic bits are punctured symmetrically in the two codes, as shown in Fig. 2. We simply consider differential encoding after mapping. Strictly speaking, symbols ak are not independent as assumed in the derivation of the noncoherent decoding algorithm. However, we observed by simulation that breaking this dependence by means of an interleaver (either bit-wise before mapping or symbol-wise after mapping) does not yield substantial performance improvement.
This behavior may be related to the implicit puncturing considered in the outer turbo code, which, in a certain sense, decorrelates the bits carried by a modulated symbol.
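The role of the differential encoder in this first class of schemes can be illustrated with a standard M-DPSK toy chain (a hedged sketch: the parameter values and the hard-decision detector are illustrative and not the chapter's soft-output receiver). The encoder sends ck = ck−1 · ak, and differential detection of xk x*k−1 removes the unknown channel phase:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, sigma = 4, 1000, 0.05
sym = np.exp(2j * np.pi * np.arange(M) / M)   # M-PSK alphabet

a = sym[rng.integers(0, M, K)]                # information symbols
c = np.empty(K + 1, dtype=complex)
c[0] = 1.0
for k in range(K):
    c[k + 1] = c[k] * a[k]                    # differential encoding

theta = rng.uniform(-np.pi, np.pi)            # unknown channel phase
x = c * np.exp(1j * theta) + sigma * (rng.standard_normal(K + 1)
                                      + 1j * rng.standard_normal(K + 1)) / np.sqrt(2)

# Differential detection: the unknown phase cancels in x_k * conj(x_{k-1})
z = x[1:] * np.conj(x[:-1])
a_hat = sym[np.argmin(np.abs(z[:, None] - sym[None, :]), axis=1)]
errors = np.count_nonzero(np.abs(a_hat - a) > 1e-9)
print("symbol errors:", errors)
```

At this (arbitrary, high) SNR the hard differential detector recovers the information symbols despite the random rotation θ, which is the property the noncoherent receiver chain relies on.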
V. Joint detection and decoding In this case, we consider coding structures which do not employ differential encoding. The proposed schemes perform noticeably well in the case of ideal coherent decoding, i.e., assuming perfectly known phase at the receiver side. Serially concatenated codes [2] have been proven to have remarkable performance (even better than that of turbo codes) with very simple component codes. However, this performance is obtained at the expense of the spectral efficiency of the code. For example, with rate 1/2 inner and outer convolutional codes, the overall rate is 1/4. In order to increase the efficiency of the serial code, we consider an inner Ungerboeck code, as shown in Fig. 3. A similar structure was also considered in [12], where an outer Reed-Solomon code and an inner Ungerboeck code were used. Various combinations of serial codes are considered, where the outer convolutional code is a simple non-recursive code [2, 28],
Figure 1. Berrou-type turbo code followed by differential encoding on the modulated symbols.
Figure 2. Benedetto-type turbo trellis code followed by differential encoding on the modulated symbols.
whereas the inner Ungerboeck code may be a recursive systematic code [14] or a non-recursive one [15]. It is worth noting that interleaving is bit-wise. In fact, the coded bits generated by the outer encoder are serialized and then interleaved. Fig. 3 refers to the case of an outer rate 1/2 code and inner rate 2/3 code. After interleaving, the bits feed the inner encoder in groups of two. The receiver is based on an inner noncoherent decoder of the inner Ungerboeck code, which gives a posteriori probabilities of the systematic bits of each modulated symbol (bits ak(1) and ak(2) in Fig. 3) by using the proposed reduced-state noncoherent algorithm. These soft outputs are passed, as logarithmic likelihood ratios, to the outer coherent decoder, which acts as a soft-input soft-output module [21]. Obviously, whether the overall serial code is noncoherently non-catastrophic depends on the characteristics of the inner Ungerboeck code. Hence, particular care has to be taken in choosing this code as a noncoherently non-catastrophic code [15, 20]. Besides serially concatenated coding structures, it is interesting to explore the possibility of deriving parallelly concatenated coding structures suitable to combined noncoherent detection and decoding. The scheme proposed in [6], employing 8-PSK as modulation format at the output of each encoder, cannot be used when considering a noncoherent decoding strategy. In fact, because of puncturing, the proposed BCJR-type noncoherent decoding algorithm fails, since
Figure 3. Serial concatenated code constituted by an outer convolutional code and an inner Ungerboeck code.
the metrics (9) and (10) reduce to 1 at every other time epoch. Hence, at every other transition in the decoder trellis, the forward and backward recursions cannot be correctly extended. This problem obviously affects the reduced-state version of the algorithm described in Section III as well. On the contrary, the scheme proposed in [7] may be directly employed for transmissions over noncoherent channels, provided that the punctured component Ungerboeck codes are noncoherently non-catastrophic. With respect to the scheme proposed in [7], the only proposed modification consists of considering a single bit interleaver between the two Ungerboeck codes, instead of a different bit interleaver for each bit stream, as shown in Fig. 2. The input bit streams are serialized in a single bit stream before being interleaved. The interleaved bit stream is then parallelized and undergoes trellis encoding. We noticed that using a single interleaver instead of separate interleavers for each bit stream improves the performance, at least at high signal-to-noise ratios [16]. This is intuitively related to the fact that low reliability values associated with the couple of bits embedded in the same symbol may be better spread over the whole bit sequence. Hence, the receiver has a structure similar to that of a turbo decoder, where each component decoder uses the reduced-state noncoherent soft-output decoding algorithm previously introduced. This scheme may be considered as a direct extension to spectrally efficient modulations of the noncoherent schemes proposed in [10] for binary modulations. In Fig. 4, we consider, for simplicity, the case of a turbo trellis encoder where each of the component Ungerboeck encoders receives a sequence of couples of information bits (ak(1), ak(2)) and generates a parity bit (ck(0) in the upper encoder and dk(0) in the lower encoder). Puncturing may be considered on one of the two
Figure 4. Benedetto et al. turbo trellis coded scheme with 8-PSK modulation. Puncturing may be embedded in the component Ungerboeck codes to consider QPSK modulation.
information bits (symmetrically in the two encoders): in the upper encoder the systematic bit ck(1) = ak(1) is transmitted, whereas in the lower encoder the bit dk(1) = aik(2) is transmitted.² As shown in Fig. 4, after interleaving the two original bit streams have to be separated in order to apply the proper puncturing. This is possible if the single interleaver is odd-odd, i.e., if it maps the bits stored in odd positions (bits {ak(1)}) to odd positions, so that they can be recovered after interleaving. In this case, the single odd-odd interleaver is equivalent to two separate interleavers. A QPSK symbol is generated at the output of each component encoder. The spectral efficiency in this case is 1 bit per channel use. Although the above scheme with QPSK has remarkable performance with coherent decoding, i.e., with an AWGN channel, we observed that the performance noticeably degrades when considering noncoherent decoding, because of the catastrophicity of the code. This motivates the following modification. The spectral efficiency remains the same by eliminating puncturing, hence transmitting an 8-PSK symbol at the output of each component encoder. In this case, both systematic bits at the input of each encoder are mapped to the corresponding generated complex symbol (in Fig. 4 we indicate by dotted lines the supplementary connections which must be considered). This means adding redundancy, at the cost of decreasing the robustness of the modulation constellation. In the coherent case, the performance worsens, whereas in the noncoherent case it improves. Combining modulation and coding when dealing with a noncoherent channel cannot be carried out as in the case of an AWGN channel, because the noncoherent catastrophicity must be taken into account.
Moreover, based on an exhaustive search over different constellation mappings, we noticed that the receiver performance in the noncoherent case does not seem to be appreciably influenced by the particular mapping rule (Gray, reordered, etc. [7]). The last parallel scheme considered deserves some remarks about its noncoherent catastrophicity. By reducing the modulation constellation from 8-PSK to QPSK, the code properties, in terms of modulated output symbols, may change. Hence, a code may not be simultaneously noncoherently non-catastrophic with and without puncturing. An open problem, currently under study, is the design of a good code for such a transmitter structure when considering puncturing and QPSK. An important aspect to be considered is the rotational invariance of the component codes, taking into account puncturing and mapping. The methods proposed in [22-24] may be considered. A relevant analysis concerning the rotational invariance of T-TCM schemes is addressed in [25].
VI. Numerical results

The performance of the receivers considered in Section IV and Section V is assessed by means of computer simulations in terms of BER versus Eb/N0, Eb being

2. The time instant of the second encoded bit is denoted by ik because of the presence of interleaving.
Noncoherent iterative decoding of coded modulations
the received signal energy per information bit and N0 the one-sided noise power spectral density. All the BCJR-type algorithms (noncoherent and coherent) considered in the proposed schemes apply the max-log approximation [26]. The generated extrinsic information is weighted by a coefficient as described in [27]. The value of this coefficient, obtained by trial and error, is about 0.3 in all schemes. In Fig. 5 the performance in the case of the code shown in Fig. 1 is presented. The code is that proposed in [4], with an internal random 32 × 32 interleaver. The component RSC codes have generators G1 = 37 and G2 = 21 (note 3). The turbo code has rate 1/2: for every 2 information bits (m = 2), two code bits are retained, with the puncturing pattern considered in [4]. After random bit interleaving, groups of 4 bits are mapped onto a 16-QAM symbol. It is important to observe that the particular chosen mapping (Gray, reordered, natural, etc.) does not seem to noticeably influence the performance of the noncoherent system. This may be due to the presence of bit interleaving followed by differential encoding. The spectral efficiency of this system is 2 bits per channel use. The inner noncoherent differential detector at the receiver side applies the reduced-state noncoherent decoding algorithm proposed in Section II by reducing the number of states to 16. The phase parameter N is set equal to 4 or 6. For comparison, we also show the performance of the equivalent coherent system (i.e., considering differential
Figure 5. Performance of the system proposed in Fig. 1. The considered numbers of iterations are 1, 3 and 5 in all cases.

3. In the case of binary codes, for example RSC codes, we refer to the generators of the code as {Gi}, following the octal notation in [1, 28]. When referring to the Ungerboeck codes, we indicate the generators of the code as {hi}, following the octal notation in [7].
encoding after the turbo code). In all cases the iterations are carried out in the outer coherent turbo decoder, and the numbers of considered iterations are 1, 3 and 5. It is evident that there is a slight improvement in the performance of the noncoherent system when increasing N from 4 to 6, and the loss, with respect to coherent decoding, is about 1 dB at BER below 10-4. In Fig. 6 we show the performance in the case of noncoherent decoding of the code proposed in Fig. 2. The component 16-state recursive Ungerboeck codes of the turbo code have generators h0 = 23, h1 = 16 and h2 = 27 [7], and there are two different 32 × 32 random bit-interleavers. We consider a 16-QAM modulation format. The system has an efficiency of 2 bits per channel use. As for the previous scheme, in this case also we consider the inner noncoherent detector with the number of states reduced to 16 and phase parameter N equal to 4 or 6, respectively. For comparison, we also show the performance of the equivalent coherent system. The numbers of iterations are 1, 3 and 6 in all cases. The performance loss of the noncoherent system with N = 6 with respect to the coherent system is about 1 dB. In Fig. 7, the performance in the case of the serial scheme shown in Fig. 3 is presented. The outer code is a non-recursive non-systematic convolutional code, with generators G1 = 7 and G2 = 5 and rate 1/2. The inner Ungerboeck code is recursive and systematic, with generators h0 = 23, h1 = 16 and h2 = 27 [7]. The inner interleaver is a 32 × 32 pseudorandom bit-interleaver. The bits at the output of the inner code are mapped to an 8-PSK symbol, considering reordered mapping [7]. The spectral efficiency of this system is 1 bit per channel use. The inner noncoherent decoder at the receiver side applies the reduced-state noncoherent
Figure 6. Performance of the system proposed in Fig. 2. The considered numbers of iterations are 1, 3 and 6 in all cases.
Figure 7. Performance of the system proposed in Fig. 3. The outer code has 8 states and the number of iterations is 10 in all cases.
decoding algorithm proposed in Section II. Various complexity reduction levels, denoted by the couple (N, Q), are considered. The phase parameter N ranges from 4 to 16, while Q is kept fixed to 2 (64 states). For comparison, we also show the performance of the equivalent coherent system, i.e., assuming perfect knowledge of the channel phase at the receiver side. In all cases, the number of considered iterations is 10. As one can see, for increasing values of the phase parameter N the performance of the noncoherent scheme approaches that of the coherent scheme. For N = 16 the performance loss at a BER of 10-5 is around 1 dB. In Fig. 8, we consider again a coding structure as given in Fig. 3, with the same inner Ungerboeck code as in Fig. 7 but considering an outer non-recursive non-systematic convolutional code, with generators G1 = 15 and G2 = 13 and rate 1/2. Hence, we replaced the outer 8-state code with a 16-state code. As in the previous case, the noncoherent inner decoder is identified by the couple (N, Q). The phase parameter N ranges from 4 to 16, and Q = 2. The numbers of considered iterations for both the coherent and noncoherent systems are 10. For N = 16 the performance loss of the noncoherent scheme with respect to that of the coherent scheme is only 0.5 dB at a BER of 10-4. In Fig. 9, we consider a serial structure similar to that in Fig. 3, given by an outer rate 2/3 non-recursive code with 16 states and generators G1 = 17, G2 = 06 and G3 = 15 [28] and an inner rate 3/4 non-recursive code with 8 states and generators G1 = 040, G2 = 402, G3 = 240 and G4 = 100 [15]. The inner random interleaver is bit-wise, with length 1536. The spectral efficiency is 2 bits per
Figure 8. Performance of the system proposed in Fig. 3. The outer code has 16 states and the number of iterations is 10 in all cases.
Figure 9. Performance of a system similar to that in Fig. 3. The modulation format is 16-QAM and the number of iterations is 1, 5 and 10 in all cases.
channel use and we consider a 16-QAM modulation format at the output of the inner code. The inner noncoherent decoder at the receiver side applies the reduced-state noncoherent decoding algorithm proposed in Section II (note 4). Various complexity reduction levels, denoted by the couple (N, Q), are considered. The numbers of iterations are 1, 5 and 10 in all cases, and a comparison with the equivalent coherent system is made. In Fig. 10, we show the performance in the case of noncoherent decoding of the code proposed in Fig. 4. The component 16-state recursive Ungerboeck codes of the proposed scheme have generators h0 = 23, h1 = 16 and h2 = 27 [7] and there is a single 64 × 64 pseudorandom bit-interleaver [1]. At the output of each component encoder both the systematic bits are retained and mapped, together with the parity bit, to an 8-PSK symbol. Reordered mapping is considered in this case as well. The system efficiency is 1 bit per channel use. The two component noncoherent decoders have a number of states reduced to 64 and phase parameter N equal to 4 and 6, respectively. For comparison, we also show the performance of the equivalent coherent system. The numbers of iterations are 1, 3 and 6 in all cases. Considering N = 6 and 6 decoding iterations, the performance loss of the noncoherent scheme with respect to the coherent scheme is about 1.5 dB.
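All the BCJR-type decoders used in these simulations rely on the max-log approximation mentioned at the beginning of this section, which replaces a log-sum-exp over competing branch metrics with a simple maximum. A minimal numeric sketch (illustrative values only):

```python
import math

def log_sum_exp(xs):
    """Exact Jacobian logarithm: log(sum(exp(x))), computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Branch metrics in the log domain (illustrative values).
metrics = [-1.2, -3.4, -0.7]

exact = log_sum_exp(metrics)
approx = max(metrics)            # the max-log approximation
# The approximation error is bounded by log(number of terms).
assert 0 <= exact - approx <= math.log(len(metrics))
```

The max-log decoder therefore keeps only the dominant hypothesis at each step, trading a small SNR loss for the elimination of all exponential/logarithm evaluations.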
Figure 10. Performance of the system proposed in Fig. 4. The modulation format is 8-PSK and the number of iterations is 1, 3 and 6 in all cases.

4. The derivation carried out in Section II in the case of a recursive code may be easily extended by observing that in this case a(εk-1) = (ak-2(1), ak-3(2), ak-1(3)). Hence, in this case the symbol a(εk-1) is not an information symbol, but is composed of bits coming from information symbols relative to different time instants.
We now compare the performance of the considered schemes under the same spectral efficiency. In fact, schemes with spectral efficiency of both 1 and 2 bits per channel use have been analyzed. As appears from Figures 5, 6, and 9, for schemes with spectral efficiency of 2 bits per channel use, the coherent receivers show a BER of 10-4 at a signal-to-noise ratio between 5 and 6 dB. The corresponding noncoherent schemes exhibit a performance degradation of about 1 dB. As shown in Figures 7, 8, and 10, for schemes with spectral efficiency of 1 bit per channel use, the performance of the coherent receivers is between 3 and 4 dB, whereas the noncoherent schemes exhibit a performance loss of less than 1 dB. As one can see, the performance of each scheme is strictly related to its spectral efficiency and is roughly independent of the specific detection strategy (separate or joint). Taking into consideration the performance/complexity trade-off, it turns out that the simple schemes with separate detection and decoding may offer a good solution.
VII. Conclusions

In this paper, we presented possible solutions for noncoherent decoding of concatenated codes with spectrally efficient modulations. We proposed a soft-output noncoherent decoding algorithm and showed that in the case of high-order constellations it is essential to apply complexity reduction techniques in order to obtain implementable systems. A state-reduction technique suited to BCJR-type algorithms was successfully applied. We considered a first class of schemes given by the concatenation of a parallel concatenated scheme with a differential encoder, and a second class constituted by serially concatenated schemes and a parallelly concatenated coding structure without differential encoding. In the first case we considered separate detection and decoding, and in the second case we considered joint detection and decoding. We demonstrated the performance for various values of the phase memory parameter N, number of trellis states and length of transmitted bit packets. In all cases, the performance of the noncoherent scheme approaches that of the equivalent coherent scheme for increasing values of the parameter N. The described separate and joint decoding schemes offer different levels of performance and complexity. The schemes based on separate detection and decoding have a low complexity, since the inner noncoherent detector accounts for differential encoding only. In particular, these schemes show a lower complexity with respect to the schemes where joint detection and decoding of trellis codes is considered. In terms of the performance/complexity trade-off, it turns out that the simple schemes with separate detection and decoding may offer a good solution in many situations.

Appendix A

In this appendix we show how the considered soft-output noncoherent algorithm presented in Section III has to be modified in the initial transient period, i.e., for
k ≤ Q.
The relevant probability density function may be computed by means of a simplified backward recursion:
As can be noticed from the derivation above, for k ∈ {1, ..., Q - 1} the forward recursion quantity αk does not appear in the a posteriori probability (25) (exactly computed without approximations) and survivors are not needed. In fact, according to the definition of αk, a correct initialization at epoch Q (in the logarithmic domain) for the forward recursion is
Hence, for k ∈ {Q + 1, ..., N}, the forward recursion may be written as
and the a posteriori symbol probability is obtained as follows:
5. For simplicity we consider the formulation relative to the a posteriori symbol probability. The extension for a posteriori bit probability is straightforward.
The backward recursion is easily extended in a similar fashion based on the survivor map built during the forward recursion. More precisely, the backward recursion may be approximated as follows:
where the couple (ak, εk) uniquely identifies εk+1. For k > N the general formulation previously introduced holds.

REFERENCES

[1] BERROU (C.), GLAVIEUX (A.), "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, pp. 1261-1271, October 1996.
[2] BENEDETTO (S.), DIVSALAR (D.), MONTORSI (G.), POLLARA (F.), "Serial concatenation of interleaved codes: performance analysis, design, and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, pp. 909-926, May 1998.
[3] UNGERBOECK (G.), "Channel coding with multilevel/phase signals," IEEE Trans. Inform. Theory, vol. IT-28, pp. 55-67, January 1982.
[4] LE GOFF (S.), GLAVIEUX (A.), BERROU (C.), "Turbo codes and high spectral efficiency modulation," in Proc. IEEE Intern. Conf. Commun. (ICC '94), New Orleans, LA, U.S.A., May 1994, pp. 645-649.
[5] WAHLEN (B.E.), MAI (C.Y.), "Turbo coding applied to pragmatic trellis-coded modulation," IEEE Commun. Letters, vol. 4, n° 2, pp. 65-67, February 2000.
[6] ROBERTSON (P.), WORZ (T.), "A novel bandwidth efficient coding scheme employing turbo codes," IEEE J. on Sel. Areas in Comm., vol. 16, pp. 206-218, February 1998.
[7] BENEDETTO (S.), DIVSALAR (D.), MONTORSI (G.), POLLARA (F.), "Parallel concatenated trellis coded modulation," in Proc. IEEE Intern. Conf. Commun. (ICC '96), June 1996, pp. 974-978.
[8] MINOWA (T.), OGIWARA (H.), "Application of soft-in soft-out Viterbi algorithm to turbo trellis-coded modulation," IEICE Trans. Fundam. of Elect., Commun. and Comp. Sciences, vol. E81-A, n° 10, pp. 2047-2054, October 1998.
[9] BENEDETTO (S.), MONTORSI (G.), "Versatile bandwidth-efficient parallel and serial turbo trellis-coded modulation," in Proc. Intern. Symp. on Turbo Codes & Relat. Topics (ISTC '00), Brest, France, September 2000, pp. 201-208.
[10] COLAVOLPE (G.), FERRARI (G.), RAHELI (R.), "Noncoherent iterative (turbo) decoding," IEEE Trans. Commun., vol. 48, n° 9, pp. 1488-1498, September 2000.
[11] COLAVOLPE (G.), RAHELI (R.), "The capacity of the noncoherent channel," European Transactions on Telecommunications, vol. 12, n° 4, July-August 2001.
[12] DENG (R.G.), COSTELLO (D.J.), Jr., "High rate concatenated coding systems using bandwidth efficient trellis inner codes," IEEE Trans. Commun., vol. 37, n° 5, pp. 420-427, May 1989.
[13] COLAVOLPE (G.), FERRARI (G.), RAHELI (R.), "Reduced-state BCJR-type algorithms," IEEE J. Sel. Areas Comm., vol. 19, n° 5, pp. 849-859, May 2001.
[14] BIGLIERI (E.), DIVSALAR (D.), MCLANE (P.J.), SIMON (M.K.), "Introduction to trellis-coded modulation with applications," Macmillan Publishing Company, 1991.
[15] COLAVOLPE (G.), RAHELI (R.), "Noncoherent sequence detection," IEEE Trans. Commun., vol. 47, n° 9, pp. 1376-1385, September 1999.
[16] FRAGOULI (C.), WESEL (R.D.), "Symbol interleaved parallel concatenated trellis coded modulation," in Proc. IEEE Intern. Conf. Commun. - Comm. Theory Mini-Conf. (ICC-CTMC '99), Vancouver, Canada, June 1999, pp. 42-46.
[17] EYUBOGLU (M.V.), QURESHI (S.U.H.), "Reduced-state sequence estimation with set partitioning and decision feedback," IEEE Trans. Commun., vol. 36, pp. 13-20, January 1988.
[18] DUEL-HALLEN (A.), HEEGARD (C.), "Delayed decision-feedback sequence estimation," IEEE Trans. Commun., vol. 37, pp. 428-436, May 1989.
[19] CHEVILLAT (P.R.), ELEFTHERIOU (E.), "Decoding of trellis-encoded signals in the presence of intersymbol interference and noise," IEEE Trans. Commun., vol. 37, pp. 669-676, July 1989.
[20] RAPHAELI (D.), "Noncoherent coded modulation," IEEE Trans. Commun., vol. 44, n° 2, pp. 172-183, February 1996.
[21] BENEDETTO (S.), DIVSALAR (D.), MONTORSI (G.), POLLARA (F.), "A soft-input soft-output APP module for iterative decoding of concatenated codes," IEEE Communications Letters, vol. 1, pp. 22-24, January 1997.
[22] WEI (L.-F.), "Rotationally invariant convolutional channel coding with expanded signal space - Part I: 180°," IEEE J. on Sel. Areas in Comm., vol. 2, n° 5, pp. 659-671, September 1984.
[23] WEI (L.-F.), "Rotationally invariant convolutional channel coding with expanded signal space - Part II: nonlinear codes," IEEE J. on Sel. Areas in Comm., vol. 2, n° 5, pp. 672-686, September 1984.
[24] PIETROBON (S.S.), UNGERBOECK (G.), PEREZ (L.C.), COSTELLO (D.J.), Jr., "Rotationally invariant nonlinear trellis codes for two-dimensional modulation," IEEE Trans. Inform. Theory, vol. 40, pp. 1773-1791, November 1994.
[25] LIU (W.), WILSON (S.G.), "Rotationally-invariant concatenated (turbo) TCM codes," Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems and Computers, Monterey, California, October 1999, pp. 32-36.
[26] ROBERTSON (P.), VILLEBRUN (E.), HOEHER (P.), "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding," Europ. Trans. on Telecommun. (ETT), vol. 8, n° 2, pp. 119-125, March-April 1997.
[27] COLAVOLPE (G.), FERRARI (G.), RAHELI (R.), "Extrinsic information in turbo decoding: a unified view," to be published in IEEE Trans. Commun., 2001. Also in Proc. IEEE Global Telecom. Conf. (GLOBECOM '99), Rio de Janeiro, Brazil, December 1999, pp. 505-509.
[28] PROAKIS (J.G.), "Digital communications," McGraw-Hill International Editions, 1995.
Chapter 5
Turbo soft interference cancellation for coded asynchronous DS/CDMA

J.-P. Cances, G.R. Mohammadkhani and V. Meghdadi
GESTE, Ecole Nationale Superieure d'Ingenieurs de Limoges, France
I. Introduction

In the past few years, adaptive multi-user receivers for CDMA cellular systems have attracted great interest. They are able to cope with the multiple access interference (MAI) which is inherent to the asynchronous nature of the proposed multiple access scheme, as well as with the intersymbol interference (ISI) due to the multipath propagation impediment. The goal of these receivers is clearly to reach the single user channel bound. Several kinds of architecture have been proposed for the receivers: from the simplest ones, using linear filtering techniques such as decorrelator receivers, to the most sophisticated, such as maximum likelihood receivers. This latter kind of architecture was first addressed by Verdu in 1986 [1], who demonstrated that the CDMA channel process can be viewed as a convolutional code with the number of states being exponential in the number of users. When users first encode their information sequences via single-user convolutional codes, the resulting system can be viewed as a concatenated coding system and, hence, turbo-decoding techniques may be applied. Giallorenzi and Wilson [2] first applied this new approach to an asynchronous system and they proposed a decoder consisting of a Viterbi algorithm running over a trellis with a number of states exponential in the product of the number of users and the constraint length of the convolutional codes. Other authors like Valenti, Hagenauer and Alexander [3, 4, 5, 6] have proposed new strategies by separating the CDMA channel coding process and the users' own convolutional codes in the receiver architecture. By doing this, they reduce the overall system complexity, since the computation requirement now becomes equal to O(Q(2^K + 2^ν)), where ν is the code constraint length, K is the number of users and Q the number of iterations. These receiver architectures are able to reach the single user bound at high SNR. Using the same approach as C.
Laot [7] for ISI rejection, which simplified the scheme of Douillard [8] who used two concatenated Viterbi decoders, we propose
a suboptimal approach to further lower the receiver complexity. Instead of using a soft-input soft-output (SISO) multiuser detector implemented with the log-MAP algorithm, we propose a nonlinear interference suppression technique based on soft interference cancellation and linear minimum mean square error filtering. This structure has already been studied by Wang and Poor in [9], but their approach to the asynchronous multipath fading case is completely different from our proposed system. The paper examines the implementation and the performance of this iterative receiver both in the context of a synchronous transmission system with short spreading sequence lengths (Gold sequences with N = 7 for the spreading factor) and in the context of an asynchronous transmission scheme with multipath Rayleigh fading. UMTS-TDD and UMTS-FDD transmission formats are compatible with this system. The receiver performs two successive soft output decisions achieved by a soft-input soft-output (SISO) multiuser detector and a bank of single-user SISO channel decoders using the log-MAP algorithm through an iterative process. At each iteration, extrinsic information is calculated from the detection and decoding stages and is then used as a priori information in the next iteration, just as in turbo decoding. Our paper details the computation of the extrinsic information, and the simulation results we present clearly prove that this new receiver is able to approach the single user channel bound with a computation requirement equal to O(Q(K^2 + 2^ν)) in the case of the synchronous AWGN transmission scheme.
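Taking the complexity orders above as O(Q(2^K + 2^ν)) for the separated turbo multiuser receiver and O(Q(K^2 + 2^ν)) for the proposed soft-cancellation receiver (our reading of the notation), the saving can be illustrated with hypothetical values:

```python
# Illustrative operation counts (hypothetical sizes):
# K users, code constraint length nu, Q turbo iterations.
K, nu, Q = 10, 3, 5
separated = Q * (2**K + 2**nu)   # SISO MUD over 2^K hypotheses + decoders
proposed = Q * (K**2 + 2**nu)    # soft cancellation + MMSE is ~K^2 per bit
assert proposed < separated      # quadratic in K instead of exponential
print(separated, proposed)       # 5160 540
```

For ten users the per-packet cost drops by roughly an order of magnitude, and the gap widens rapidly as K grows, which is the motivation for the suboptimal detector developed below.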
II. System model

a) Transmitter and channel description

We consider at first the general case of a coded CDMA system with K users, employing normalized spreading sequences s1, s2, ..., sK and signaling through respective multipath channels with additive white Gaussian noise. The transmitter is represented in Figure 1. The binary information data {dk(m)} for user k, k = 1, 2, ..., K, are convolutionally encoded with code rate Rk. An interleaver block is used to scatter the error bursts at the input of each channel decoder. The interleaved bits are BPSK converted, yielding data symbols of duration T. Each data symbol bk(i) is then modulated by a spreading waveform sk(t) and transmitted through the multipath channel. The transmitted signal due to the kth user is given by

xk(t) = Ak Σ_{i=0}^{M-1} bk(i) sk(t - iT)     (1)
where M is the number of data symbols per user per frame, T is the symbol interval, Ak is the amplitude of the kth user and sk(t), 0 ≤ t ≤ T, is the normalized
Figure 1. Transmitter architecture.
spreading sequence assigned to the kth user, which means that sk(t) has unit energy. Then, xk(t) propagates through a multipath channel with impulse response

hk(t) = Σ_{l=1}^{Lk} hkl δ(t - τkl)     (2)
where Lk is the number of paths in the kth user's channel and hkl and τkl are, respectively, the complex gain and delay of the lth path of the kth user's signal. At the receiver, the signal from the kth user's transmitter is then given by

yk(t) = xk(t) • hk(t)     (3)
where • denotes the convolution operator. The received signal at the base station receiver is the superposition of the K users' signals plus the additive white Gaussian noise and is denoted as

r(t) = Σ_{k=1}^{K} yk(t) + n(t)     (4)
where n(t) is a zero-mean complex white Gaussian noise process with power spectral density σ2.

b) Receiver description

The iterative turbo receiver is depicted in Figure 2. It consists of two stages: a SISO multiuser detector, followed by K parallel single-user SISO channel decoders. The two stages are separated by deinterleavers and interleavers. The SISO multiuser detector delivers the a posteriori log-likelihood ratio (LLR) of a transmitted "+1" and a transmitted "-1" for every code bit of every user:

Λ[bk(i)] = log ( P[bk(i) = +1 | r(t)] / P[bk(i) = -1 | r(t)] )     (5)
Using Bayes' rule, (5) can be written as

Λ[bk(i)] = log ( p[r(t) | bk(i) = +1] / p[r(t) | bk(i) = -1] ) + log ( P[bk(i) = +1] / P[bk(i) = -1] )     (6)
The second term in (6), denoted by λp2[bk(i)], represents the a priori LLR of the code bit bk(i), which is computed by the channel decoder of the kth user in the
Figure 2. Receiver architecture for the AWGN channel.
previous iteration, interleaved and then fed back to the SISO multiuser detector. The superscript p indicates the quantity obtained from the previous iteration. In the particular case of the first iteration, assuming equally likely code bits, we have λp2[bk(i)] = 0, 1 ≤ k ≤ K, 0 ≤ i < M. The first term in (6), denoted by λ1[bk(i)], represents the extrinsic information delivered by the SISO multiuser detector, based on the received signal r(t), the structure of the multiuser signal, the prior information about the code bits of all other users, {λp2[bl(j)], l ≠ k, 0 ≤ j < M}, and the prior information about the code bits of the kth user other than the ith bit, {λp2[bk(j)], j ≠ i}. The extrinsic information λ1[bk(i)], which is not influenced by the a priori information λp2[bk(i)] provided by the channel decoder, is then deinterleaved and fed into the kth user's channel decoder as the a priori information in the next iteration.
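The decomposition in (6), posterior LLR equal to an extrinsic term plus the a priori term, can be checked on a toy scalar observation (hypothetical values; the actual detector works on the vector statistic y(i)):

```python
# Toy scalar observation y = A*b + n, b in {+1,-1}, Gaussian noise of
# variance sigma2 (all values hypothetical). The a posteriori LLR then
# splits exactly into a channel/extrinsic term plus the a priori LLR.
A, sigma2 = 1.0, 0.5
y = 0.8                 # received matched-filter sample
prior_llr = 0.4         # lambda_2^p[b] fed back by the channel decoder

posterior_llr = 2 * A * y / sigma2 + prior_llr
extrinsic = posterior_llr - prior_llr    # what is passed to the decoder
assert abs(extrinsic - 2 * A * y / sigma2) < 1e-12
```

Subtracting the a priori term before feeding the decoder is exactly what prevents information from circulating twice in the iterative loop.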
III. SISO multiuser detectors for synchronous CDMA

In this part we focus our attention on the multiuser detector, which constitutes the main contribution of the paper. To simplify the formulation of the problem we only consider the case of a synchronous CDMA system, which means hk(t) = δ(t) for k = 1, ..., K. The received signal can then be written as:

r(t) = Σ_{k=1}^{K} Σ_{i=0}^{M-1} Ak bk(i) sk(t - iT) + n(t)     (7)
In this case, it is well known that the K-vector y(i) is a sufficient statistic for demodulating the ith code bits of the K users. The kth component of y(i) is written as:

yk(i) = ∫_{iT}^{(i+1)T} r(t) sk(t - iT) dt     (8)
Using (7), this sufficient vector y(i) can be expressed as:

y(i) = R A b(i) + z(i)     (9)
where R denotes the normalized cross-correlation matrix of the signal set s1, ..., sK, A = diag(A1, ..., AK), z(i) is a zero-mean Gaussian vector with covariance matrix σ2R, and b(i) = [b1(i), ..., bK(i)]T. We now examine the SISO multiuser detector structure.

a) SISO multiuser detector

We denote

B+k = { b = [b1, ..., bK]T ∈ {-1, +1}^K : bk = +1 }     (10)
B-k is defined in a similar way. From (9), the extrinsic information λ1[bk(i)] can be computed as:

λ1[bk(i)] = log [ Σ_{b ∈ B+k} p(y(i) | b) Π_{j ≠ k} P[bj] ] / [ Σ_{b ∈ B-k} p(y(i) | b) Π_{j ≠ k} P[bj] ]     (11)
where P[bj] denotes the probability P[bj(i) = bj]. The summations in the numerator (resp. denominator) in (11) are extended over all the 2^{K-1} possible vectors b in B+k (resp. B-k). We obtain [9]:
In (12) we see that the extrinsic information λ1[bk(i)] at the output of the SISO multiuser detector is made up of two parts. The first term is the channel value of the desired user, Ak yk(i)/σ2, and the second term is the information extracted from the other users' channel values {yj(i)}j≠k as well as from their prior information.
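The sufficient statistic of (9) can be reproduced numerically: in the noiseless case the matched-filter bank output equals R A b(i). A small self-contained sketch (hypothetical spreading codes and amplitudes):

```python
import random

# Toy synchronous CDMA snapshot (hypothetical codes and amplitudes):
# K users, length-N unit-energy spreading codes, noiseless for clarity.
K, N = 3, 7
rng = random.Random(1)
S = [[rng.choice([-1.0, 1.0]) / N**0.5 for _ in range(N)] for _ in range(K)]
A = [1.0, 0.8, 1.2]
b = [1, -1, 1]

# Chip-rate received vector, one sample per chip.
r = [sum(A[k] * b[k] * S[k][n] for k in range(K)) for n in range(N)]

# Matched-filter bank: y_k = <r, s_k>.
y = [sum(r[n] * S[k][n] for n in range(N)) for k in range(K)]

# Same statistic through the cross-correlation matrix: y = R A b.
R = [[sum(S[k][n] * S[j][n] for n in range(N)) for j in range(K)]
     for k in range(K)]
y2 = [sum(R[k][j] * A[j] * b[j] for j in range(K)) for k in range(K)]
assert all(abs(y[k] - y2[k]) < 1e-12 for k in range(K))
```

The off-diagonal entries of R are exactly the MAI coupling terms that the detector of the next subsection tries to cancel.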
b) Suboptimum multiuser detector with soft outputs

Due to the complexity, which is exponential in the number of users (see equation (12)), it is clear that the practical implementation of the multiuser detector using the Viterbi decoding algorithm is impossible. To approximate the optimal solution we propose to mitigate the multiuser interference using a soft cancellation algorithm combined with Minimum Mean Square Error (MMSE) filtering. The proposed suboptimum algorithm proceeds as follows. Using the log-likelihood ratios of the coded bits of each user, {λp2[bj(i)]}, j ≠ k, which are given by the channel decoders at the previous iteration step, we form a soft estimate of the coded bits of all users:

b~j(i) = tanh( λp2[bj(i)] / 2 ),  j = 1, ..., K
Then, we define the vector:

b~k(i) = b~(i) - b~k(i) ek
where ek is the kth vector of the canonical basis (ek = (0, ..., 0, 1, 0, ..., 0)T). Thus, b~k(i) is obtained from b~(i) by setting its kth component equal to zero. For each user k, a soft interference cancellation is performed at the outputs of the matched filters. We then obtain

y~k(i) = y(i) - R A b~k(i)
Such a cancellation scheme has already been proposed in [4] and [9]. The next step involves the use of an MMSE filtering stage in order to help remove the residual MAI in y~k(i). Using the vector wk(i), the filtering stage proceeds as follows:

zk(i) = wk(i)H y~k(i)
The vector wk(i) is chosen in order to minimize the mean squared error between the coded bit bk(i) and the filter output zk(i):

wk(i) = arg min E[ | bk(i) - wH y~k(i) |^2 ]
The procedure for determining the weights of wk(i) is detailed in [9].
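A compact sketch of one soft-cancellation plus MMSE pass is given below (Python/NumPy, all numeric values hypothetical). The covariance construction follows the standard MMSE derivation; see [9] for the exact weights used in the paper.

```python
import numpy as np

# One soft-cancellation + MMSE pass for the synchronous model y = R A b + z.
rng = np.random.default_rng(0)
K, N, sigma2 = 3, 7, 0.2
S = rng.choice([-1.0, 1.0], size=(K, N)) / np.sqrt(N)   # unit-energy codes
A = np.diag([1.0, 0.8, 1.2])
b = np.array([1.0, -1.0, 1.0])
R = S @ S.T
y = R @ A @ b                       # noiseless snapshot, for clarity

prior_llr = np.array([3.0, -2.5, 2.8])   # lambda_2^p from the decoders
b_soft = np.tanh(prior_llr / 2)          # soft bit estimates

z = np.empty(K)
for k in range(K):
    b_k = b_soft.copy()
    b_k[k] = 0.0                         # exclude the desired user
    y_tilde = y - R @ A @ b_k            # soft interference cancellation
    # MMSE weights w = C^{-1} (R A e_k), with C the covariance of y_tilde.
    V = np.diag(1.0 - b_soft**2)
    V[k, k] = 1.0                        # desired symbol has unit variance
    C = R @ A @ V @ A @ R.T + sigma2 * R
    w = np.linalg.solve(C, (R @ A)[:, k])
    z[k] = w @ y_tilde
    # cancellation leaves much less interference than the raw statistic
    desired = R @ A[:, k] * b[k]
    assert np.linalg.norm(y_tilde - desired) < np.linalg.norm(y - desired)
```

As the prior LLRs grow over the iterations, b_soft approaches b, the residual covariance shrinks, and the MMSE stage degenerates towards a simple matched filter on the cleaned statistic.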
IV. The case of an asynchronous multipath Rayleigh fading channel

In this section, we generalise the former principles to the case of an asynchronous multipath Rayleigh fading channel. First, we show that the spreading process can be viewed as a special convolutional code. Let us define τk as the relative delay between user k and user 1, taken as the reference; τ is the maximum delay and L the number of chips corresponding to this maximum delay. N is the spreading factor; in the case of short spreading sequences we have N = T/Tc, where Tc is the chip duration.
We have to define the new structure of the transmitted data vector b ∈ C^{M.K}, the received data vector r ∈ C^{N.M+L} and the vector y ∈ C^{M.K} at the output of the bank of matched filters. The received data vector at the front end of the receiver can be expressed as (in the case of an AWGN channel):

r = T.b + n
The matrix T ∈ C^{(N.M+L) x (M.K)} has its jth column (j ∈ {1, ..., M.K}) given by the vector tj ∈ C^{N.M+L}:
where the vector 0N.i has N.i components all equal to zero and j = (i-1).K + k. hk*sk is a column vector of size N+L which represents the discrete convolution between hk (the discrete channel impulse response for user k) and the spreading sequence sk. In the case of an AWGN channel, hk has only one non-zero component, corresponding to the delay of user k. Figure 3 describes the structures of b, y, r and T in the case of an AWGN channel. The vector y at the output of the bank of matched filters can then be written as y = T^T.r. Denoting R = T^T.T ∈ C^{M.K x M.K}, we then obtain y = R.b + z with z = T^T.n. R can be written as R = M + W, with W a diagonal matrix and M a matrix with all-zero diagonal components. Thus, we obtain the new expression:

y = W.b + M.b + z     (19)
W contains the different user powers and M describes the MAI structure. From equation (19) we can write, with mj denoting the jth row of M:

yj = Wjj bj + mj.b + zj

Then, y can be viewed as the output of a convolutional code disturbed by z. The convolutional code which models the spreading process has a coding rate equal to one and introduces no redundancy; it only brings correlation between coded bits. Besides, its generator polynomials depend on the choice of the spreading sequences sk. The complete system can be considered as the serial concatenation of two convolutional codes and hence turbo detection can be applied in its generalised form. However, due to the complexity of such a scheme, we consider only suboptimum decoding techniques based on MAI soft cancellation, just as in the synchronous case described before. In the case of multipath propagation, the channels under investigation are shown in Figure 4. We consider two kinds of multipath fading channels which are currently investigated for UMTS: the Outdoor-to-Indoor and Pedestrian B (ITU) channel and the Vehicular A (ITU) channel. For each multipath component, the fading process can be represented by a channel coefficient a(k) characterized by a low-pass random process with a power spectrum, called the Doppler spectrum, Φd(f), with Φd(f) = 0 for |f| > fd, where fd is the maximum Doppler frequency shift, fd = 2.B.Tc with B = fc.v/c and Tc equal to the chip duration. For mobile communications, the most commonly used channel model is described in [11], and it is often called the "Jakes model", which has the Doppler spectrum

Φd(f) = C / sqrt(1 - (f/fd)^2),  |f| < fd     (20)
Figure 3. Matrix and Vector descriptions for the Asynchronous System (AWGN).
Figure 4. Multipath propagation channel profiles.
where C is a real constant. This model characterizes the fading channel for an antenna responsive to the Ez field. However, even though the Jakes model is widely used, the fading process in the real world may take other forms. For example, as also shown in [11], the Doppler spectra of signals received by antennas responsive to H fields are different from (20). The Doppler spectrum is also often modeled as a low-pass process generated by passing a white Gaussian noise (WGN) process through a Butterworth low-pass filter with a cutoff frequency fd. We consider here a chip duration equal to 400 ns together with Gold spreading sequences of length 7. The relative delays between users and the delays on each multipath component are supposed to be within the range [0,T], where T is the symbol period (T = 400.7 = 2800 ns = 2.8 µs). There are two possible architectures for the receiver. In the first case, we apply the MUD algorithm after the combining of the different multipath components. This is done by adding a RAKE receiver, which coherently combines the contributions from the different paths, at the front end of the architecture depicted in Figure 2. The new receiver architecture is drawn in Figure 5(a). The Mux function depicted here is an MRC (Maximum Ratio Combining) device, which coherently adds the contributions from the different multipath components. It implies an accurate phase synchronisation to cope with the phase rotation on each multipath and a temporal synchronisation to add the different energies from the same bit. It is worth noting that the presence of multipath propagation entails a new computation of the cross-correlation signature sequence matrix, which now takes into account the relative distribution of delays and energies between multipath components. The computation of this matrix is detailed in equations (23) and (25).
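The Jakes spectrum (20) can be sketched numerically. This is a hedged illustration assuming the classical form Φd(f) = C/(π.fd.√(1-(f/fd)²)) for |f| < fd; the book's equation (20) is not reproduced in this extraction, so the exact normalisation with the constant C is an assumption:

```python
import numpy as np

def jakes_spectrum(f, fd, C=1.0):
    """Classical Jakes Doppler spectrum (a sketch).  C is an arbitrary real
    scaling constant, as in the text; the spectrum is zero outside |f| < fd."""
    f = np.asarray(f, dtype=float)
    s = np.zeros_like(f)
    inside = np.abs(f) < fd
    s[inside] = C / (np.pi * fd * np.sqrt(1.0 - (f[inside] / fd) ** 2))
    return s
```

Note the characteristic "bathtub" shape: the spectrum is symmetric and diverges as |f| approaches fd.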
In the second case, the MUD algorithm is applied before RAKE combining. The coherent combining after the MUD stage implies the addition of the different soft outputs from the Viterbi channel decoders; the hard decision is then taken from the sign of this combination. This architecture is depicted in Figure 5(b). From the complexity point of view, it is clear that the second structure requires a heavier computational burden. However, the cross-correlation matrix on each multipath component is more precisely computed, since we only have to track one multipath component at a time. For the Doppler rate, we consider a normalised Doppler frequency fd = 10^-4 for the Vehicular A channel and fd = 10^-5 for the Pedestrian B. Regarding the asynchronous nature of the transmission system, we decide to remain as compliant as possible with section III. In order to do this, we transform the asynchronous system into an equivalent synchronous one using the addition of fictitious users [12]. This method is usually referred to as "one-shot" detection. The relative delay between user k and user 1 is denoted τk, and we suppose that the {τk}, 2 ≤ k ≤ K, are uniformly distributed over [0, T]. To be more precise, we can consider the simplified case of two users in Figure 6(a). Consider bit N of user 1, which occupies the interval [0,T]. In this time interval, user 1 also overlaps part of bit
Figures 5a (above) and 5b (below). Receiver architecture for the asynchronous multipath Rayleigh fading channel. a) with MUD pre-processing b) with MUD post-processing.
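The MRC combining performed by the Mux device of Figure 5(a) can be sketched as follows. The function name and the assumption of perfectly known, delay-aligned complex path gains are illustrative, not from the text (in practice they come from channel estimation and the temporal/phase synchronisation mentioned above):

```python
import numpy as np

def mrc_combine(path_outputs, path_gains):
    """Maximum Ratio Combining sketch: weight each (delay-aligned) per-path
    matched-filter output by the conjugate of its channel gain and sum.
    `path_outputs` is an (L, N) array of per-path outputs, `path_gains`
    the L complex path coefficients (assumed known/estimated)."""
    path_outputs = np.asarray(path_outputs)
    g = np.conj(np.asarray(path_gains))[:, None]
    return np.sum(g * path_outputs, axis=0)
```

Multiplying by the conjugate gain removes the phase rotation on each path, so the energies of the different paths add coherently.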
Turbo codes
N-1 of user 2 over the interval [0,τ2] and part of bit N of user 2 over the interval [τ2, T]. We can view this situation as a three-user synchronous channel with signature waveforms:
Let e1 = ∫0T s̃11²(t).dt, (0 < e1 < 1), be the energy of the signature waveform s̃11(t) of fictitious user (1,1). The energy of the signature waveform s̃13(t) of fictitious user (1,3) is thus (1 - e1). Using the values of Λ2[bi(n)] at the output of the bank of channel decoders, we obtain a set of over-determined equations for the true bits. Using a least squares method and the Moore-Penrose generalised inverse [13], it is possible to recover the true bits. The extension towards the general case of K asynchronous users can easily be done. We take "one-shot" at the time interval [0,T] (synchronisation with the clock of user 1); considering the case of an AWGN channel, this results in 2.K-1 fictitious users, as shown in Figure 6(b). In this particular case, we can write the received signal r(t) = x(t) + n(t) in a form showing the different users' bit overlap [12].
with: 0 < τ2 < ... < τK < T
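The least-squares recovery of the true bits from the fictitious-user soft outputs, via the Moore-Penrose generalised inverse [13], can be sketched as follows. The mapping matrix A from true bits to fictitious-user values is a hypothetical stand-in for the over-determined system of the text:

```python
import numpy as np

def recover_true_bits(soft_outputs, A):
    """Least-squares recovery sketch: solve A.b ~= soft_outputs for the true
    bits b via the Moore-Penrose generalised inverse (np.linalg.pinv), then
    take hard decisions from the sign, as described in the text."""
    b_ls = np.linalg.pinv(A) @ np.asarray(soft_outputs, dtype=float)
    return np.sign(b_ls)
```

Since the system is over-determined, the pseudoinverse gives the minimum mean-square solution, averaging the soft information contributed by each fictitious user.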
Figures 6a and 6b. Equivalent synchronous system for a given asynchronous transmission system. a) in the case of two users b) in the general case.
Therefore we can obtain, in matrix notation, an expression similar to (9) for the output of the one-shot matched filter bank:
where R is a (2.K-1)×(2.K-1) matrix, termed the "partial cross-correlation" (PCC) matrix, which is given by
The entries of R are the cross-correlations between the left and right parts of the code sequences sk(t), k = 2,...,K, as well as with the code sequence s1(t), as shown in Figure 6(b). Using this equivalent synchronous model, and supposing that the signal propagates through a multipath propagation channel with delays (τ1,τ2,...,τL) (in ascending order) and energies (E1, E2,...,EL) (Σ Ei = 1, since we consider normalised channels), the "true" cross-correlation matrix is given by: a) in the case of MUD post-processing
where Ei is the instantaneous power on the ith multipath component and R(τL-τi) is a partial cross-correlation matrix computed with the same approach as (23), using (τL-τi) as the relative delays between user signature sequences. R(0) obviously corresponds to the Lth component, corresponding to the expression given in (23). b) in the case of MUD pre-processing
where yl(i) is the ith multipath component output corresponding to bit i. We found by simulation runs that there is no significant difference between post- and pre-processing MUD (less than 0.5 dB), and so the results will be given in the context of pre-processing MUD.
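The energy-weighted combination of per-path partial cross-correlation matrices described above (equation (24)-style, MUD post-processing case) can be sketched as follows; the function name is illustrative:

```python
import numpy as np

def effective_ccm(partial_ccms, energies):
    """Combine the per-multipath partial cross-correlation matrices
    R(tau_L - tau_i), weighted by the instantaneous path energies E_i
    (sum(E_i) = 1 for a normalised channel), into one effective
    cross-correlation matrix, as described in the text."""
    R = np.zeros_like(np.asarray(partial_ccms[0], dtype=float))
    for Ri, Ei in zip(partial_ccms, energies):
        R += Ei * np.asarray(Ri, dtype=float)
    return R
```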
All the computation steps given in sections III.A) and III.B) for the synchronous case can still be applied in the asynchronous multipath propagation case.
V. Simulation results

V.1. Synchronous transmission system with AWGN

In this section, we present some simulation results for the synchronous case using short spreading sequences (one symbol period = 7 chips) with a spreading factor equal to 7. All users employ the same rate-½ convolutional encoder with generators (23,35) in octal form, and the channel decoders are all implemented using the log-MAP algorithm. Each user employs a different, randomly generated interleaver [14]. The block size of the information bits for each user is 256 (a 16 × 16 interleaver). The chip waveform employed is the raised-cosine waveform, equally split between the transmitter and the receiver, with a roll-off factor of 0.4; the over-sampling factor used together with this particular waveform in the simulation runs is equal to four. We consider at first the case of six users with equal cross-correlations ρi,j = 0.7, for 1 ≤ i,j ≤ 6. All the users have the same power. In Figure 7, the performances of the proposed SISO multiuser detector are shown. The single-user performance (ρi,j = 0) is also shown for comparison purposes. It is clear that the single-user bound is reached at high SNR (for example, at BER = 10^-4, considering the fifth iteration, the difference is less than 0.2 dB). Next, we consider a severe near-far situation in which three strong users are 4 dB above the weak users. The user cross-correlations remain the same. Figure 8 represents the performances of one of the weak users and Figure 9, the
Figure 7. BER performances of the SISO synchronous multiuser detector with AWGN and six users with the same radiated power.
performances of one of the strong users.

Figure 8. BER performances of the SISO synchronous multiuser detector for the weak users in a severe near-far situation: three strong users are 4 dB above the weak users.

It is seen that in such a near-far situation, the weak users actually benefit from the strong interference, whereas the strong users suffer a performance loss from the weak interference, a phenomenon also previously observed in the optimal multiuser detector [15] and the multistage multiuser detector [16]. The very interesting fact in all these simulation results is that the performance of such a receiver after the first iteration corresponds to the performance of a traditional non-iterative receiver structure consisting of a linear MMSE multiuser detector followed by K parallel soft channel decoders. Thus, it is seen from these figures that at reasonably high SNR, the proposed SISO multiuser detector offers a significant performance gain over the non-iterative receiver.
Figure 9. BER performances of the SISO synchronous multiuser detector for the strong users in a severe near-far situation: three strong users are 4 dB above the weak users.
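The rate-½ convolutional code with octal generators (23,35) employed by every user in these simulations can be sketched as a shift-register encoder (a minimal, non-terminated sketch; real simulations would also terminate the trellis):

```python
def conv_encode(bits, g1=0o23, g2=0o35, K=5):
    """Rate-1/2 non-systematic convolutional encoder sketch with the octal
    generators (23,35) and constraint length 5 used in the simulations.
    For each input bit, two coded bits are produced as parities of the
    shift-register taps selected by g1 and g2."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)  # shift the new bit in
        out.append(bin(state & g1).count("1") % 2)   # first generator output
        out.append(bin(state & g2).count("1") % 2)   # second generator output
    return out
```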
V.2. Asynchronous transmission system with multipath Rayleigh fading

We consider the same transmitter as in V.1 (i.e. we have the same cross-correlation matrix as in the case of the AWGN channel) but, for simplicity reasons, we consider here rectangular chip waveforms and, due to the presence of multipath fading, we increase the size of the interleavers to 64 × 64. The channels under investigation have been described in section IV. In the case where the propagation delay is a fractional part of the chip period, we use a simple first-order interpolation between two consecutive matched-filter output samples (y((n + δ).Tc) = (1-δ).y(n.Tc) + δ.y((n + 1).Tc)). We are in the same case as Figure 9, with three strong users 4 dB above the weak users. The user cross-correlations remain the same as in the AWGN case. Figure 10 corresponds to the
Figures 10a and 10b. BER performances of the SISO multiuser detector in the case of an asynchronous Rayleigh multipath fading channel (Pedestrian B ITU) and in a severe near-far situation: three strong users are 4 dB above the weak users. a) The weak users b) The strong users.
performances obtained with the Pedestrian B channel for the weak users (10.a) and for the strong users (10.b). One can remark that the turbo effect is more effective for the weak users than for the strong users, particularly at the second iteration, but the single-user bound is now reached within 0.3 dB at BER = 10^-4, even in the case of the strong users. In fact, we can remark that the turbo effect is more efficient over Rayleigh multipath fading channels than over the AWGN channel. This is confirmed by the simulation results obtained for the Vehicular A channel in Figure 11. BER performances are better in this case, since the deep fading durations are shorter than those encountered with Pedestrian B, but the turbo effect is once again more efficient for the weak users (11.a). The single-user bound is reached at BER = 10^-4 for the weak users, whereas the strong users are within 0.3 dB (11.b).
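The first-order interpolation used in section V.2 for fractional chip delays follows directly from its formula:

```python
def interp_frac_delay(y, n, delta):
    """First-order (linear) interpolation between two consecutive
    matched-filter output samples for a fractional chip delay delta in [0,1):
    y((n + delta).Tc) = (1 - delta).y(n.Tc) + delta.y((n + 1).Tc)."""
    return (1.0 - delta) * y[n] + delta * y[n + 1]
```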
Figures 11a and 11b. BER performances of the SISO multiuser detector in the case of an asynchronous Rayleigh multipath fading channel (Vehicular A ITU) and in a severe near-far situation: three strong users are 4 dB above the weak users. a) The weak users b) The strong users.
Unlike the AWGN case, where the received powers are the same for all users at all times, the Rayleigh fading cases give the MUD one more signal parameter: the instantaneous signal power, which it can use to differentiate the users. This entails the following conclusion: the gain from using MUD over fading channels is greater than over AWGN channels with power control.
VI. Conclusion

In this paper, we have proposed a structure for an iterative soft interference canceller receiver which is able to reach the same performance as the single-user channel at high SNR, even in the case of an asynchronous transmission system propagating through a frequency-selective multipath fading channel. This turbo-receiver architecture is based on the use of a non-linear interference suppression technique and a linear minimum mean square filtering. The receiver performs two successive soft output decisions, achieved by a soft-input soft-output (SISO) multiuser detector and a bank of single-user SISO channel decoders using the log-MAP or the SOVA algorithm, through an iterative process. At each iteration, extrinsic information is calculated from the detection and decoding stages and is then used as a priori information in the next iteration, just as in turbo decoding. The performances are tested using short spreading sequences with high-level cross-correlation products and they demonstrate the ability of this receiver to cope with severe MAI. Besides, since MAI and ISI constitute the major impediment to the overall system performance in a high-rate coded CDMA network, this receiver is a good candidate to be implemented in the base stations of future UMTS cellular networks.

REFERENCES

[1] VERDU (S.), Minimum probability of error for asynchronous Gaussian multiple-access channels, IEEE Trans. Inform. Theory, IT-32, pp. 85-96, Jan. 86.
[2] GIALLORENZI (T.R.), WILSON (S.G.), Multiuser ML sequence estimation for convolutional coded asynchronous DS-CDMA systems, IEEE Trans. Commun., COM-44, n°9, pp. 1183-1196, Sep. 96.
[3] VALENTI (M.C.), WOERNER (B.D.), Combined multiuser reception and channel decoding for TDMA cellular systems, Proc. IEEE Vehicular Technology Conf., pp. 1915-1919, Ottawa 98.
[4] HAGENAUER (J.), Forward error correcting for CDMA systems, in Proc. IEEE Int. Symp. Spread Spectrum Techniques and Applications, Mainz, Germany, Sep. 96, pp. 566-569.
[5] ALEXANDER (P.D.), GRANT (A.J.), REED (M.C.), Iterative detection in code-division multiple-access with error control coding, European Trans. Telecommun. Special Issue CDMA Techniques Wireless Commun. Syst., 9, pp. 419-425, Sept./Oct. 98.
[6] ALEXANDER (P.D.), REED (M.C.), ASENSTORFER (J.A.), SCHLEGEL (C.B.), Iterative multiuser interference reduction: Turbo CDMA, IEEE Trans. Commun., COM-47, n°7, pp. 1008-1014, July 99.
[7] LAOT (C.), Egalisation autodidacte et turbo-égalisation: application aux canaux sélectifs en fréquence, PhD Thesis, Rennes I University, July 97.
[8] DOUILLARD (C.), JEZEQUEL (M.), PICART (A.), DIDIER (P.), GLAVIEUX (A.), Iterative correction of intersymbol interference: Turbo-equalization, European Trans. Telecommun., 6, pp. 507-511, Sept. 95.
[9] WANG (X.), POOR (H.V.), Iterative (turbo) soft interference cancellation and decoding for coded CDMA, IEEE Trans. Commun., COM-47, n°7, pp. 1046-1061, July 99.
[10] POOR (H.V.), VERDU (S.), Probability of error in MMSE multiuser detection, IEEE Trans. Inform. Theory, IT-43, pp. 858-871, May 97.
[11] JAKES (W.C.), Microwave Mobile Communications, John Wiley & Sons, 1974.
[12] VERDU (S.), Recent progress in multi-user detection, Advances in Commun. and Signal Processing, Springer Verlag (N.Y.), 1989, pp. 164-175.
[13] BOUILLON (T.L.), ODELL (P.L.), Generalised Inverse Matrices, Wiley-Interscience, 1971.
[14] ANDERSEN (J.D.), Interleaver design for turbo coding, in Proc. Int. Symposium on Turbo Codes and Related Topics, Brest, France, Sep. 97, pp. 154-156.
[15] VARANASI (M.K.), AAZHANG (B.), Near-optimum detection in synchronous code-division multiple-access systems, IEEE Trans. Commun., 39, pp. 725-736, May 91.
[16] TARKOY (F.), Iterative multiuser decoding for asynchronous users, in Proc. IEEE Int. Symp. on Inform. Theory (ISIT 97), Ulm, Germany, June 97, p. 30.
Biographies

Jean-Pierre Cancés graduated in electrical engineering from ENST Bretagne in 1990. He received his Ph.D degree from Télécom Paris in satellite communication engineering in 1993. He is now an assistant professor at the Ecole Nationale Supérieure d'Ingénieurs de Limoges (ENSIL). His current research interests include satellite communication systems, multiuser detection and synchronization algorithms.

Golam Reza Mohammadkhani received his BSc. and MSc. degrees in communication engineering from Sharif University of Technology, Tehran, Iran, in 1993 and 1995 respectively. He worked as a PhD student at the Ecole Nationale Supérieure d'Ingénieurs de Limoges (ENSIL). His current research interests are in the field of multiuser detection for asynchronous CDMA.

Vahid Meghdadi received his BSc. and MSc. degrees in communication engineering from Sharif University of Technology, Tehran, Iran, in 1989 and 1992 respectively. He worked as a lecturer at Mashhad University in 1993. Since 1994, he has been a Ph.D. student at the Ecole Nationale Supérieure d'Ingénieurs de Limoges (ENSIL), where he is now an assistant professor. His current research interests include satellite communication systems, multiuser detection and synchronization algorithms.
Chapter 6
A generalized framework for iterative source-channel decoding N. Görtz Institute for Communications Engineering, Munich University of Technology, Germany
I. Introduction

In practical communication systems, especially in mobile telephony applications, the block-length for source and channel coding is limited due to complexity and delay constraints. Therefore, the data-bits that are issued by a source encoder usually contain residual redundancies, i.e., source coding cannot be carried out perfectly in the sense of information theory. Furthermore, "perfect" channel codes are also only achievable with infinite block-length, i.e., the output bits of a practical channel decoder are not error-free, even if the code rate resides below the channel capacity. Thus, the application of the separation theorem of information theory is not really justified in practice. In order to improve the performance of communication systems without sacrificing resources like bandwidth or transmission power, many authors have considered joint source-channel coding and decoding. One direction of joint source-channel coding is given by channel-optimized vector quantization [1,2] and related approaches like the optimization of the index assignments, e.g., [3]. These techniques consider the bit-error probability that is to be expected in the data bits at the receiver side, possibly after a separate channel decoding step, in the design of the quantizer tables and the index-assignments for the reproduction levels, i.e., the transmitter is adapted to the channel. However, much work in the area, and also this paper, concentrates on the receiver side, especially on joint source-channel decoding (JSCD). The key idea of JSCD is to exploit the residual redundancies in the data bits to improve the overall quality of the transmission. In the past, two main approaches could be distinguished: in the first type, the residual redundancies in the data bits are used as a-priori information in channel decoding in order to reduce the bit-error rate after decoding.
One example is source-controlled channel decoding stated in [4] for the decoding of binary convolutional channel codes. The idea has been extended in [5] and [6] for the use of non-binary a-priori information. The problem of these
approaches is that the actual quality-criterion in the transmission of waveform signals (e.g., speech, audio) is not the bit-error probability, but the (possibly weighted) SNR between the input of the transmitter and the output of the receiver. However, powerful channel coding is required in such systems, especially for wireless transmission. The algorithms of the second class process reliability information on the received bits and statistical a-priori information on the quantizer indices to estimate the decoder output signal. Such algorithms have been stated, e.g., in [7, 8, 9, 10]. The techniques can be used after soft-output channel decoding [11, 12] or without application of any channel code. The a-posteriori probabilities (APPS) of the quantizer indices are computed for the estimation rather than the APPS of single index-bits, because the bit-APPS do not contain the correlation of the index-bits due to the non-uniform probability distribution of the indices, which usually results from imperfect source encoding. The drawback of this approach is that a possibly required channel code is decoded independently of this estimation procedure. Although both methods could be combined, e.g., by application of the APRI-SOVA algorithm [4] for channel decoding followed by soft-bit source decoding [9], there is still a separation between source and channel decoding, since the channel decoder does not take advantage of the results of soft-output source decoding. In [13] an optimal algorithm for joint source-channel decoding called Channel-Coded Optimal Estimation (CCOE) was introduced, for a system with a single quantizer index that is channel-encoded by a systematic encoder. The algorithm was generalized in [14] for non-systematic channel encoders and the transmission of several quantizer indices in parallel. A generalization for the application of short non-linear block codes has been stated in [15].
Channel-Coded Optimal Estimation fully closes the gap between the two approaches mentioned above, since it exploits the reliabilities of the channel output-bits, the channel-code redundancies, and the residual source correlations in one step, and it allows the decoder-output signal to be estimated directly. However, CCOE has the drawback of many optimal algorithms: it is not feasible for practical systems due to its complexity. Therefore, iterative source-channel decoding (ISCD) has been introduced in [16] and, in more detail, in [17], for systems with channel coding by a systematic encoder. For this special case, iterative source-channel decoding has been formulated as an approximation of the optimal algorithm in [18]. An iterative approach to source-channel decoding has also been stated in [19]. In this paper, iterative source-channel decoding (ISCD) is generalized for any binary channel code. It is practically feasible if a channel code is selected for which a symbol-by-symbol APP decoder can be implemented with realistic complexity. The ISCD algorithm is no longer restricted to the use of systematic channel encoders.
Iterative source-channel decoding
This paper is organized as follows: first, a model of a transmission system is given, for which the optimal joint source-channel decoding algorithm is briefly reviewed. Then, iterative source-channel decoding is derived from the optimal algorithm by the insertion of appropriate approximations. This allows the decomposition of the optimal decoder into two components, which can be identified as the constituent decoders for the channel-code and the source-code redundancies. Thus, new insight is gained into the relations between the optimal and the iterative decoding principles. Finally, after some remarks concerning the implementation, the performance of ISCD is analyzed by simulation results.
II. System model

Figure 1 shows the model of a transmission system that is used throughout the paper. The input vectors X1k, ..., XMk have to be transmitted at each time k. The vectors can be thought of as M independent source-signals. In a practical case they might also be the parameters¹ of a speech, audio, or image codec. The input vectors Xjk, j = 1,..., M, are quantized ("source-encoded"); the resulting quantizer indices Ijk are represented by Nj bits. Although for each input vector Xjk, j = 1,..., M, a different number Nj of bits can be used for the quantization, the number of output bits is fixed for each quantizer². Additionally, it is assumed that it is not possible to jointly encode the input vectors of several time-steps (which might be desirable in order to achieve more efficient source-encoding), since this would cause additional delay at the receiver. This prerequisite is realistic in common two-way communication scenarios, e.g., in telephony applications.
Figure 1. Model of the transmission system.
1. Usually, the codec-parameters are derived from a block of source-signal samples. For instance, in speech coding the duration of a signal block is typically around 20 ms. 2. The applications which are intended for the algorithms stated in this paper lie in the area of mobile communications, e.g., speech transmission. The possibility of variable-length encoding of the quantizer indices is not considered because of the well-known error-propagation and synchronization problems due to bit-errors. For instance, this is the reason why practical speech encoders for mobile radio systems usually do not use variable-length encoding of their quantizer indices. The combination of the algorithms described in this paper with more recent approaches of robust decoding of entropy-coded sources is an interesting topic for future work.
After source encoding, the index-bits are interleaved³, jointly channel-encoded, and the codewords Vk = {vl,k ∈ {0,1}, l = 1, ..., Nv}, with Nv > N = Σ_{j=1}^{M} Nj, are transmitted over an AWGN channel. Coherently detected binary phase-shift keying is assumed for the modulation. Therefore, the conditional p.d.f. (probability density function) of the received value v̂l,k ∈ R at the channel output, given that the code bit vl,k ∈ {0,1} has been transmitted, is given by
with the variance σn² = N0/(2Es). Es is the energy that is used to transmit each channel-code bit and N0 is the one-sided power spectral density of the channel noise. The joint conditional p.d.f. pc(V̂k | Vk) that the vector V̂k ∈ R^Nv has been received, given that the codeword Vk ∈ {0,1}^Nv was transmitted, equals the product of (1) over all Nv code-bits, since the channel noise is "white" and normally distributed, i.e.,
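The channel model of equations (1) and (2) can be sketched numerically. The BPSK mapping 0 → +1, 1 → -1 (in amplitude-normalised units) is an assumption of this sketch; equation (2) is simply the product of (1) over the code bits:

```python
import math

def channel_pdf(v_hat, v_bit, sigma2):
    """Conditional p.d.f. (1) of a received value v_hat, given that code bit
    v_bit in {0,1} was sent over the AWGN channel with noise variance
    sigma2 = N0/(2Es).  BPSK mapping 0 -> +1, 1 -> -1 is assumed."""
    mean = 1.0 - 2.0 * v_bit
    return (math.exp(-(v_hat - mean) ** 2 / (2.0 * sigma2))
            / math.sqrt(2.0 * math.pi * sigma2))

def codeword_pdf(v_hats, v_bits, sigma2):
    """Joint conditional p.d.f. (2): product of (1) over all code bits,
    valid because the channel noise is white and Gaussian."""
    p = 1.0
    for vh, vb in zip(v_hats, v_bits):
        p *= channel_pdf(vh, vb, sigma2)
    return p
```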
If the signals Xjk are autocorrelated, adjacent indices Ijk-1, Ijk show dependencies. They are modeled by first-order stationary Markov-processes, which are described by the index-transition probabilities P(Ijk = λj | Ijk-1 = µj), λj, µj = 0, ..., 2^Nj - 1 and j = 1,..., M. It is assumed that these probabilities and the probability-distributions of the indices are known. The indices are assumed to be mutually independent. This simplifies the analysis and the realization of source-channel decoding, and it is at least approximately fulfilled in many practically relevant situations. For instance, it was shown in [20] that the quantizer indices of CELP speech codecs [21] show relatively small mutual correlations⁴, but some of them (e.g., the quantizer indices of the block-energy and the spectral shape) are strongly autocorrelated over a few adjacent time-steps (typically, a time-step corresponds to a speech frame with a duration of 20 ms). An appropriate model for this type of correlation is given by first-order Markov-processes. In the terminology of turbo-codes, the coding scheme is serially concatenated, since the redundancies of the indices (i.e., the time-based dependencies and the non-uniform probability distributions) form the "outer" codes which are encoded

3. The interleaver, which is denoted by π(.), is treated as a part of the channel encoder. Consequently, it is "hidden" in the algorithms for channel decoding in the sequel.
4. The cross-correlations of the parameters of a source codec can be exploited by vector quantization to reduce the bit rate without the error-propagation problems which occur in schemes that also exploit the autocorrelation of the parameters, e.g., by differential encoding.
once more by the "inner" channel code. The bit-interleaver was included because it is a well-known fact that it improves the error-correction properties of concatenated channel codes, since the individual bits of the indices, which are coupled by the residual redundancies (the "outer" codes), are spread by the bit-interleaver over the whole range of input bits of the "inner" channel code. Additionally, the interleaver makes adjacent bits in a block independent, which is an assumption of the frequently used APP decoding-algorithm in [11] (BCJR-algorithm). In principle, it would be possible to use an interleaver-length that is larger than the number of bits that have to be transmitted at each time step. However, this case is not considered in this paper, since such an interleaver would cause additional delay.
III. Optimal joint source-channel decoding

For the system model stated above, the goal is to minimize the distortion of the decoder output signals X̂jk due to the channel noise, i.e., we would like to perform optimal joint source-channel decoding (JSCD) for a fixed transmitter. This allows existing systems to be improved by the algorithms stated below with decoder-only modifications. The mathematical formulation of the optimization criterion is given by the conditional expectation of the mean-squared error (accumulated over all signal vectors):
This distortion measure has to be minimized over the output vectors X̂jk of the joint source-channel decoder. In (3), Xjq(Ijk) is the entry with the index Ijk of the quantization table, which is used by the source encoder to quantize the vector Xjk, and V̂k0 = {V̂k, V̂k-1, ..., V̂0} is the set of channel words which were received up to the current time k. Let the realizations µ = {µ1, ..., µM} of the set of indices
take on "values" from the set S, i.e., µ ε S, with
Then, the minimization (3) results in mean-square estimators for the signals Xjk, even if the indices are not mutually independent, i.e.,
with the index a-posteriori probabilities (APPS)
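The mean-square estimator (6), which weights the quantizer reproduction vectors by the index APPs (7), can be sketched as follows; the toy codebook used in the illustration is hypothetical:

```python
import numpy as np

def ms_estimate(codebook, app):
    """Mean-square estimator sketch: decoder output = sum over all index
    values of APP(index) * quantizer reproduction vector.  `codebook` is
    the quantization table of the source encoder (one row per index value),
    `app` the index a-posteriori probabilities (assumed normalised)."""
    codebook = np.asarray(codebook, dtype=float)
    app = np.asarray(app, dtype=float)
    return app @ codebook
```

With a certain index (APP concentrated on one value) the estimator returns the corresponding table entry; with uncertainty it returns a weighted average, which minimises the expected squared error.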
Hence, the actual problem of joint source-channel decoding is how to compute (7). Using the Bayes-rule, the index APPS (7) can be expressed as
This results in
which can be written more compactly as
The term can be interpreted as index a-priori probability, since it is computed only from channel words that have been received in the past, i.e., the information carried by the currently received channel-word Vk is not used. The factor
does not depend on the random variable Ijk, i.e., it is a normalizing constant that makes the left-hand side of (10) a true probability that sums up to one over all possible values of Ijk. Thus, the p.d.f.s p(V̂k-10) and p(V̂k0) do not have to be explicitly computed, since the normalization of (10) can be easily achieved by computing Bk according to
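The implicit computation of Bk, i.e., scaling the unnormalised index APPs so that they sum to one, can be sketched as:

```python
def normalise_app(unnormalised):
    """Apply the normalisation that Bk represents: scale the unnormalised
    index APPs so that they sum to one over all possible index values."""
    total = sum(unnormalised)
    return [p / total for p in unnormalised]
```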
Considering the mutual independence of the indices and the Markov-models for their time-based dependencies, it can be shown by using the results5 in [14], that the two terms on the right-hand side of (10) are given by
5. The optimal algorithm for joint source-channel decoding has been stated in [14] and called Channel-Coded Optimal Estimation. In this paper, the general formula from [14] is used in a specialized form for mutually independent quantizer indices.
for λj ε Sj, j = 1, 2, . . . , M, and the index a-priori probabilities
The notation V(µ)k corresponds to the channel codeword that results from some input index-combination µ ε S. No restrictions apply for the channel code and the channel encoder, i.e., the result is quite general. Additionally, it should be noticed that the index a-priori probabilities (15) are computed from the "old" index APPS (10) at time k - 1, i.e., the equations (10), (14), and (15) form a recursion that allows the computation of the index APPS using only the known quantities, which were defined in section II. At k = 0, the "old" APPS are initialized by the "unconditioned" probability distributions of the indices in order to compute (15) for the first time.
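The recursion for the index a-priori probabilities (15), driven by the first-order Markov index-transition probabilities and the "old" APPs, can be sketched as follows (at k = 0 the "old" APPs are the unconditioned index distribution, as stated above):

```python
import numpy as np

def index_prior(old_app, transition):
    """A-priori probability of each index value at time k, sketched from (15):
    P(I_k = lam) = sum_mu P(I_k = lam | I_{k-1} = mu) * APP_{k-1}(mu).
    `transition[mu][lam]` is the Markov index-transition probability and
    `old_app` the index APPs from time k-1."""
    old_app = np.asarray(old_app, dtype=float)
    T = np.asarray(transition, dtype=float)
    return old_app @ T
```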
IV. Iterative source-channel decoding

The optimal algorithm for JSCD stated in section III is practically not feasible if realistic numbers of index-bits are first spread by an interleaver and then jointly channel encoded. This is due to the tremendous complexity of the summation in (14), which is taken over all possible combinations of the indices, excluding only the index Ijk under consideration. In principle, the a-posteriori probabilities (10) can be computed by the symbol-by-symbol APP-algorithm in [11] (BCJR-algorithm), if a non-binary channel code is used, with code-symbols that are matched to the numbers of bits in the quantizer indices. In this approach, the bits cannot be interleaved prior to channel encoding, because the goal is to exploit the a-priori probabilities of the indices in one step of the trellis of the non-binary code. Thus, the overall concatenated code has worse distance properties than a scheme with an interleaver. Additionally, the implementation of an APP-decoder for a non-binary channel code is practically feasible only for a small number of bits (e.g., two or three) in the code-symbols. Nonetheless, practical source encoders often issue quantizer indices with more than five bits. An efficient implementation of the APP-algorithm is possible if a binary channel code with a regular trellis with a small number of states is used⁶. This will be assumed in the sequel. Still, the quantizer-indices are non-binary: in general they consist of arbitrary numbers of bits. In order to be able to efficiently compute a-posteriori probabilities for the output data bits of the channel code, but to exploit the correlations of the bits of the non-binary indices, iterative source-channel decoding (ISCD) can be used. It has been introduced in [16, 17] for channel codes with systematic encoders. However, this restriction is not necessary, as we will see in the more general framework in the sequel.

6. This is the case for the frequently used binary convolutional codes with short constraint-length.
112
Turbo codes
The basic idea of iterative source-channel decoding is to use a binary channel code and to simplify the computation of the index APPs (10) in such a way that the APP-algorithm [11] (BCJR-algorithm) can be efficiently applied to exploit the channel-code redundancies, while the bit-correlations, which are contained in the index a-priori probabilities (15), are still utilized for the computation of the index APPs (7).
As a consequence of this idea, the term p(Vk | Ijk = λj, Vk-10) in (10), which contains the channel-code redundancies and which is given by the highly complex formula (14), must be approximated by the bit APPs P(C)(ijl,k = λjl | Vk0) (or some other quantity that can be efficiently computed by the APP-algorithm). The superscript "(C)" is introduced to identify quantities that belong to the APP-algorithm for channel decoding. In the next section, this algorithm is analyzed in some detail in order to break up the recursion (10), (14), and (15) into two constituent algorithms that operate iteratively with drastically lower complexity than the optimal scheme. A. General Description of the Symbol-by-Symbol APP-algorithm The symbol-by-symbol APP-algorithm for decoding of a binary channel code (i.e., the code symbols are bits) computes the bit APPs
The probability of each data bit ijl,k is "conditioned" on the whole received channel-word7 Vk (and on the underlying code constraints). In (16) the same (somewhat complex) bit-notation as above is used, to make the results of this section directly applicable in the rest of the paper. The bit APPs (16) can be rewritten as the marginal probability-distribution
The a-priori probability P(V(µ)k) of the code word can be replaced by the product of the a-priori probabilities P(C)a(iml,k = µml) of the data-bits, if the latter are assumed to be mutually independent (which is not true in our setup). Therefore, we obtain:
7. From the channel-coding point of view, there is no need for the time index k that identifies adjacent codewords, because the channel decoder only exploits the redundancies within one code word to compute the output bit-APPS (16). The latter only have a time index, because the a-priori information, that is passed to the channel decoder, depends on the previously transmitted codewords due to the correlation of the source-encoder indices.
Iterative source-channel decoding
113
Finally, (18) can be rewritten as
The bit APPs given by this formula can be efficiently computed by the APP-algorithm in [11], if the channel code is appropriately structured. For the application in an iterative decoding scheme it is convenient to define
using (19) for the second equality. This quantity contains the information on the data bits that has been derived by the APP-decoder by exploiting the received channel word Vk and the channel-code redundancies, i.e., it contains the extrinsic [22] and the received channel information. The a-priori information P(C)a(ijl,k = λjl) is removed (by the division in (20)) in order to extract the new information that can be used by further constituent decoders without feeding back information to them that they produced themselves or that they have already used. Since, due to the correlations of adjacent indices in the system model, the index a-priori probabilities depend on the previously received channel words, the notation is formally adapted to this by setting
and
B. Derivation of Iterative Source-channel Decoding (ISCD) Since we would like to make the APP-algorithm for decoding of a binary channel code, as summarized in section IV-A, applicable in a source-channel decoding scheme that involves non-binary quantizer indices, we have to • find appropriate bit a-priori information for the APP-algorithm. • find a way to use the bit-based output-information of the APP-decoder in source-channel decoding.
As a first step towards the solution of these two problems, the left-hand side of (14) is reformulated:
Now, in the numerator and in the denominator, the index probabilities are approximated by the product over the corresponding bit probabilities:
Thus, the mutual dependencies of the bits, which result from the residual redundancies in the quantizer indices, are neglected. Since, later on, the approximately computed quantities will vary within iterations, the iteration-counter ξ = 1, 2, ... has been introduced in (24). The latter equation can be rewritten as
If we insert (26) into (10) we obtain
as the approximation of P(Ijk = λj| Vk0). The use of the bit a-priori probabilities (21) and the definition (22) leads to
It should be noticed that the probabilities P(C)er,ξ(ijl,k = λjl) are given by (20), which is computed by the APP-algorithm for channel decoding. The formula (28) is strongly similar to the one used in the Optimal-Estimation (OE) algorithm [14], which was introduced in [9] for soft-bit source decoding: instead of a pure channel-term that inserts the reliabilities of the received index-bits into the estimation, we have a modified channel-term in (28) (between the brackets) that includes the reliabilities of the received bits and the additional information derived by the APP-algorithm from the channel-code redundancies. Thus, APP channel decoding produces a more reliable "virtual" channel for the OE-algorithm. For the concatenated code that consists of the explicit channel code and the implicit index-dependencies, the APP-algorithm and the OE-algorithm are the constituent decoders in the iterative decoding scheme described in the sequel. In the first "iteration", i.e., ξ = 1, the bit a-priori probabilities for the APP-algorithm can be computed from the index a-priori probabilities according to
This is all the a-priori information available before the first run of the APP-algorithm for channel decoding. Similarly to (15), the index a-priori probabilities are given by
with the difference that the term P'(Ijk-1 = µj | Vk-10) denotes the index APPs from the previous time k - 1 that resulted from (28) after the last iteration. It should be noticed that the index a-priori probabilities (30) are fixed quantities within the iterations, i.e., they have to be computed only once at each time k. Using the index APPs that result from (28), the mean-square estimates xjk for the transmitted signals could now be determined by (6). However, the index APPs are only approximations of the optimal values, since the bit a-priori information that was used for channel decoding did not contain the mutual dependencies of the bits: they were "summed away" in (29). The idea of how to improve the accuracy of the index APPs is taken over from iterative decoding of turbo codes [23]: from the intermediate results for the index APPs derived from (28), new bit APPs are computed by
The superscript "(S)" was introduced, since (31) is computed after source
decoding (which included the results of the previous channel-decoding step). Inserting (28) into (31) leads to
which equals
By definition of the extrinsic bit probabilities
(33) can be written as
A comparison of (35) and (29) reveals that both equations are structurally similar. In both equations a summation is carried out over the a-priori probabilities of the index with the fixed realization λjl ∈ {0, 1} for the index-bit ijl,k. While in (29) no further information on the bit ijl,k was available, now, after the run of the APP channel decoder, additional information P(C)er,ξ(ijl,k = λjl) on the bits has been computed. Our intention is to provide the APP channel decoder with improved a-priori information for the next iteration. Therefore, the information which is given by P(C)er,ξ(ijl,k) should not be included in the new a-priori information for this bit, since it has been generated by the APP channel decoder itself. This is the reason why it has been excluded in the definition (35) of the extrinsic information. On the other hand, the channel-decoder output-information P(C)er,ξ(ijv,k), v ≠ l, on the other bits is used in (35) to weight the index a-priori probabilities before summing them up. This way, new (extrinsic) information P(S)e,ξ(ijl,k) is computed compared to (29), because additional knowledge on the data bit ijl,k can be derived from the probabilities P(C)er,ξ(ijv,k), v ≠ l, and the coupling of the index-bits by the index a-priori probabilities. For the next iteration, which starts with a new run of the APP-algorithm, the required new a-priori information is, therefore, given by
Finally, the iteration counter is incremented, i.e., ξ → ξ + 1, and the next iteration is started. Iterative source-channel decoding (ISCD) can be summarized as follows:
1. At each time k, compute the initial index a-priori probabilities by (30).
2. Insert the results from step 1 in (29) to determine the bit a-priori information for the APP channel decoder for the first iteration.
3. Set the iteration counter to one, i.e., ξ = 1.
4. Perform APP channel decoding by an efficient realization of (20).
5. Perform source decoding through Optimal Estimation by inserting the results of APP channel decoding into (28) to compute new (temporary) index APPs.
6. If this was the last iteration, proceed with step 10; otherwise continue with step 7.
7. Use (31) and (34) to compute extrinsic bit information from the source-code redundancies.
8. Set the extrinsic information from step 7 equal to the new a-priori information for the APP channel decoder in the next iteration, i.e., perform (36).
9. Increment the iteration counter, i.e., ξ → ξ + 1, and proceed with step 4.
10. Estimate the receiver output signals by (6) using the index APPs from step 5.
C. Implementation of Iterative Source-channel Decoding An iterative source-channel decoder that works as summarized at the end of the previous section is depicted in Fig. 2. It directly fits into the transmission system in Fig. 1. The iterative source-channel decoder consists of two constituent decoders: the APP-algorithm for channel decoding and the Optimal-Estimation algorithm for source decoding. Both were described in the previous section. As usual in iterative decoding schemes, they are denoted as Soft-In/Soft-Out (SISO) decoders, since both process and issue bit-reliability information ("soft-values"). The soft-values that are passed between the SISO decoders in Fig. 2 are not the probabilities from the previous section, but log-likelihood ratios (L-values, [22]), which are directly related to them.
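The ten-step procedure above can be sketched as a control loop. The following Python sketch uses stub functions in place of the real constituent decoders; the function names and the toy probability updates are illustrative, not the actual realizations of (20) and (28):

```python
# Sketch of the ISCD loop (steps 1-10) with hypothetical constituent decoders.
# `app_channel_decoder` stands for an efficient realization of (20) and
# `oe_source_decoder` for the Optimal-Estimation step (28); both are stubs
# here so that only the control flow is shown.

def app_channel_decoder(bit_apriori, received):
    # Stub: would run the BCJR/APP algorithm over the channel-code trellis.
    # Here it simply passes the a-priori information through.
    return list(bit_apriori)

def oe_source_decoder(channel_info, index_apriori):
    # Stub: would combine channel information with the index a-priori
    # probabilities as in (28) to produce (temporary) index APPs.
    return [a * c for a, c in zip(index_apriori, channel_info)]

def extrinsic(new_apps, old_info):
    # Extrinsic information: remove the decoder's own input, cf. (34)/(36).
    return [n / o if o > 0 else 0.0 for n, o in zip(new_apps, old_info)]

def iscd(received, index_apriori, n_iterations=2):
    # Step 2: initial bit a-priori information derived from the index
    # a-priori probabilities (29); taken directly as a stand-in here.
    bit_apriori = list(index_apriori)
    for _ in range(n_iterations):                     # steps 3/9: counter
        ch = app_channel_decoder(bit_apriori, received)    # step 4
        apps = oe_source_decoder(ch, index_apriori)        # step 5
        bit_apriori = extrinsic(apps, ch)                  # steps 7-8
    return apps                                       # step 10: final APPs
```

The point of the structure is only that the two decoders alternate and exchange extrinsic information; the real quantities exchanged are those defined in (20), (28), (34), and (36).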
For instance, the extrinsic L-value from the SISO source decoder for the bit ijl,k in the iteration with the number ξ is defined by
using the natural logarithm. The other L-values are related to the corresponding probabilities in a similar fashion. In order to simplify the drawing, vectors L(Ik) of L-values that correspond to the index set defined in (4) are noted in Fig. 2, i.e.:
Figure 2. Iterative source-channel decoding.
The advantage of L-values is that they cause fewer numerical problems than probability-values in the implementation of algorithms with finite word-length on a digital computer. The BCJR-algorithm, an efficient implementation of the APP-algorithm, e.g., for decoding of binary convolutional channel codes, can be completely carried out in the L-value domain (Log-MAP-algorithm, [24]). The use of such an algorithm is assumed in Fig. 2. The received channel values vjl,k are converted to L-values at the input of the SISO channel decoder by multiplication with the factor Lc = 4 Es/N0. This follows from the definition of the L-values, the p.d.f. (1), and the usual assumption that the code-bits are equally likely:

L(vjl,k) = ln [ P(ijl,k = 0 | vjl,k) / P(ijl,k = 1 | vjl,k) ] = Lc · vjl,k   (39)
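The conversion of the received channel values into L-values with the factor Lc can be sketched in a few lines; the function name is illustrative and the BPSK/AWGN assumptions of the text are taken over:

```python
def channel_lvalues(received, es_n0):
    # Convert received channel values into L-values at the SISO channel
    # decoder input: L = Lc * v with Lc = 4 * Es/N0, assuming the AWGN
    # channel and equally likely code bits as stated in the text.
    lc = 4.0 * es_n0
    return [lc * v for v in received]
```

At Es/N0 = 1 a received value of 0.5 maps to the L-value 2.0, i.e., larger channel reliability simply scales the soft values linearly.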
Since the SISO source decoder, as stated above, processes probabilities, the interface to the APP-algorithm operating in the L-value domain is briefly discussed in the sequel: the computation of the index APPs by the Optimal-Estimation algorithm (28) requires the bit-probabilities P(C)er,ξ(ijl,k = λjl), l = 1, ..., Nj. The latter can be computed from the output L-values L(C)er,ξ(ijl,k) of the SISO channel decoder by

P(C)er,ξ(ijl,k = λjl) = exp(-λjl · L(C)er,ξ(ijl,k)) / (1 + exp(-L(C)er,ξ(ijl,k)))   (40)

by inversion of the L-value definition (37). Since the Optimal-Estimation algorithm computes the product over all these probabilities for one index, this operation can be simplified by inserting (40) into (28), i.e.,

P(Ijk = λj | Vk0) = Djk · [ Πl exp(-λjl · L(C)er,ξ(ijl,k)) ] · Pa(Ijk = λj)   (41)

with the normalizing constant

Djk = Πl [1 + exp(-L(C)er,ξ(ijl,k))]^-1   (42)

that does not depend on the realization λj. Now, the product in (41) can be turned into a summation in the L-value domain, i.e.:

P(Ijk = λj | Vk0) = Djk · exp(-Σl λjl · L(C)er,ξ(ijl,k)) · Pa(Ijk = λj)   (43)
Thus, the L-values from the SISO channel decoder can be integrated into the Optimal-Estimation algorithm for SISO source decoding without converting the single L-values back to probabilities if (43) is used instead of (28). This is a
simplification that also has numerical advantages. Additionally, the left-hand side of (43) is a probability that must sum up to one over all index-realizations λj ∈ Sj. Hence, the constant Djk can be computed from this condition instead of using (42). The computation of new bit APPs within the iterations is still carried out by (31), but the derivation of the extrinsic L-values L(S)e,ξ(ijl,k), which are issued by the SISO source decoder, can be simplified, since (34) requires a division which is turned into a simple subtraction in the L-value domain:
Thus, in the whole ISCD algorithm the L-values L(C)er,ξ(ijl,k) from the SISO channel decoder are used and the probabilities P(C)er,ξ(ijl,k = λjl) are no longer required.
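The evaluation of the index APPs directly in the L-value domain, as in (43), can be sketched as follows. The product of bit probabilities becomes a sum of L-values in the exponent, and the normalizing constant is obtained from the sum-to-one condition mentioned above instead of an explicit formula; variable names are illustrative:

```python
import math

def index_apps_from_lvalues(lvalues, apriori):
    # lvalues[l] : L-value for bit l issued by the SISO channel decoder
    # apriori[m] : a-priori probability of index realization m, m = 0..2^N-1
    # Returns P(I = m | V) for every realization m, cf. (43).
    n_bits = len(lvalues)
    unnorm = []
    for m, pa in enumerate(apriori):
        # Bit pattern of realization m (MSB first).
        bits = [(m >> (n_bits - 1 - l)) & 1 for l in range(n_bits)]
        # Sum of L-values replaces the product of bit probabilities.
        s = sum(b * lv for b, lv in zip(bits, lvalues))
        unnorm.append(math.exp(-s) * pa)
    d = 1.0 / sum(unnorm)      # normalization: APPs must sum to one
    return [d * u for u in unnorm]
```

A large positive L-value for a bit strongly favors the index realizations in which that bit is 0, as expected from the L-value definition (37).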
V. Simulation results Independent Gaussian random signals were correlated by a first-order recursive low-pass filter (coefficient a = 0.9) in order to generate the signals xjk, j = 1, ..., M. In the first simulations 5-bit optimal scalar quantizers (i.e., Nj = 5 for all j) were used as source encoders. Their output-bits were spread by a random interleaver at each time k and afterwards they were jointly channel-encoded by a terminated rate-1/2 recursive systematic convolutional code (RSC-code8, [23]). The codewords were transmitted over the AWGN-channel. In the first simulation M = 2 signals and a terminated memory-2 rate-1/2 RSC channel code9 were considered (code-rate R = 10/24). The results are depicted in Fig. 3. The mean SNR (averaged over all signals) is plotted over Eb/N0, the ratio of the energy Eb per transmitted data-bit and the one-sided power spectral density N0 of the channel noise. Clearly, the "optimal JSCD" decoder works best, followed by the iterative decoding (curves "ISCD"). It should be noticed that the difference between the "opt. JSCD" and the "ISCD" curves is only due to the approximations in the iterative decoding scheme, i.e., the neglect of the bit-correlations of the quantizer indices in (24) and (31). An additional decoder was realized (curve "BCJR-OE") which used the BCJR-algorithm for channel decoding with "zero" a-priori information followed by Optimal Estimation (OE) for source decoding, i.e., the scheme had the same constituent algorithms as ISCD. Due to its better initialization ISCD outperforms the "BCJR-OE" decoder, even if only one iteration is carried out. In contrast to "BCJR-OE", ISCD utilizes the time-based correlations of the index-bits for channel decoding. 8. Recursive Systematic Convolutional codes, [23]. 9. Generator polynomials: g0(D) = 1 + D + D2 and g1(D) = 1 + D2; g0(D) was used for the feedback part.
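The source model of these simulations — independent Gaussian samples filtered by a first-order recursive low-pass with a = 0.9, then quantized with 5 bits — can be reproduced by a sketch like the following. A uniform quantizer stands in for the optimal (Lloyd-Max) scalar quantizers of the text, and the clipping range xmax is an illustrative choice:

```python
import random

def correlated_source(n, a=0.9, seed=0):
    # First-order recursive low-pass: x[k] = a * x[k-1] + w[k], with w[k]
    # i.i.d. Gaussian; produces autocorrelated samples as in the paper.
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def quantize(x, n_bits=5, xmax=4.0):
    # Uniform n_bits scalar quantizer index in [0, 2^n_bits - 1].
    # (The paper uses optimal scalar quantizers; uniform is for illustration.)
    levels = 2 ** n_bits
    step = 2.0 * xmax / levels
    idx = int((min(max(x, -xmax), xmax - 1e-9) + xmax) / step)
    return min(idx, levels - 1)
```

With a = 0.9 adjacent indices are strongly correlated, which is precisely the residual redundancy that the OE constituent decoder exploits.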
Figure 3. Performance of iterative and optimal joint source-channel decoding. Transmission of M = 2 autocorrelated signals (a = 0.9) in parallel.
Obviously, ISCD cannot take full advantage of the mutual dependencies of the index-bits here, since the performance hardly increases with the number of iterations. This can be explained by the short length of the channel code, which does not allow sufficient interleaving10 to make the extrinsic bit-information from the source decoder "virtually" independent for the channel decoder. It is known that the performance of concatenated codes and their iterative decoding schemes is better if a code with "long" codewords is used. Therefore, a similar simulation as above was carried out, but with M = 50 jointly channel-encoded indices and a terminated memory-4 RSC-code11 (code-rate R = 250/508). The results are depicted in Fig. 4: now, the second iteration leads to a strong improvement compared to "ISCD, 1 it." and "BCJR-OE", i.e., the mutual correlations of the bits can be exploited in part by the iterative decoding scheme. Unfortunately, the simulation of the optimal decoder (including the interleaver) is too complex in this case, so a comparison with the best possible performance of the concatenated code (without the approximations inherent in ISCD) cannot be carried out. It is interesting to notice that more than two iterations do not significantly increase the SNR in Fig. 4. This is due to the "short" source codewords, which consist of just five bits, so that they influence only a small number of RSC-code bits. 10. One should recall that the length of the interleaver is limited to the number of information bits at each time k in order to avoid additional delay. 11. Generator polynomials: g0(D) = 1 + D3 + D4 and g1(D) = 1 + D + D3 + D4; g0(D) was used for the feedback part.
Figure 4. Performance of iterative source-channel decoding and its constituent algorithms. Transmission of M = 50 autocorrelated signals (a = 0.9) in parallel. In Fig. 5 the results of a simulation similar to the previous one are shown. The difference is that the source signal is only weakly autocorrelated, the low-pass-filter coefficient being set to a = 0.5. This time, there is almost no difference between "BCJR-OE" and "ISCD, 1 it.". This means that the bits have almost no autocorrelation. Qualitatively, the behavior of ISCD is the same as above. The second iteration produces a gain (a small one in this case) but more than two iterations again do not further increase the performance. Here, the gain by the second iteration is due to the well-known [25] non-uniform probability distribution of the indices at the output of an optimal quantizer, which leads to bit-correlations in the quantizer index that can only be exploited by the iterations. In Fig. 6 the influence of the interleaver on the performance of ISCD is investigated. For this purpose a terminated memory-6 rate-1/2 RSC-code12 was used and the source signals were quantized by 4 bits. The filter coefficient was set to a = 0.9, i.e., the signals were strongly autocorrelated. The performance of ISCD with 2 iterations was compared for M = 15 indices (resulting in 60 data bits) and M = 100 indices (resulting in 400 data bits), in both cases with a random interleaver and without an interleaver. Fig. 6 shows that the application of an interleaver significantly improves the performance of ISCD. The gain13 gets larger with increasing block length. 12. Generator polynomials: g0(D) = 1 + D2 + D3 + D5 + D6 and g1(D) = 1 + D + D2 + D3 + D6; g0(D) was used for the feedback part. 13. Even if no interleaver is used, a gain caused by the increasing block-length can be observed.
The reason is that the rate-loss due to the tail-bits of the terminated convolutional code is smaller if the block-length is enlarged (Eb/N0 is noted on the x-axis, where Eb is the energy per transmitted data-bit.)
Figure 5. Performance of iterative source-channel decoding and its constituent algorithms. Transmission of M = 50 autocorrelated signals (a = 0.5) in parallel.
Figure 6. Performance of iterative source-channel decoding for several block lengths and interleavers.
VI. Conclusions Iterative source-channel decoding was stated in a generalized framework for communication systems that use binary channel codes. It was derived from the optimal joint decoding algorithm by insertion of an approximation that allows decomposition of the optimal scheme into two constituent algorithms, namely the well-known symbol-by-symbol APP-algorithm for the channel-decoding part and the Optimal-Estimation algorithm for the source-decoding part. Both algorithms are applied alternately in several iterations to compensate for the loss of performance due to the approximation, and each exploits only "its" type of redundancy, i.e., either the channel-code redundancies (APP-algorithm) or the residual redundancies in the source-encoder indices (OE-algorithm). In the simulations, in which convolutional channel codes were used, it was shown that ISCD achieves a substantially better quality of transmission than its constituent algorithms applied only once. Furthermore, the application of a random interleaver leads to a significantly better performance compared to a system without an interleaver. Both the absolute performance of ISCD and the gain due to the interleaver grow if the block length is increased. In all simulations only two iterations were required to nearly achieve the maximum performance. Overall, ISCD works best in applications where long blocks of correlated data bits are transmitted. It is well suited for combination with practical source codecs for speech, audio, and image signals, because it exploits the remaining redundancies in a systematic way and the quantizer-indices are allowed to have different, arbitrary numbers of bits. Additionally, it offers scalable complexity at the decoder, since it is up to the system designer whether to perform iterations or not.
REFERENCES [1] FARVARDIN (N.), "A study of vector quantization for noisy channels", IEEE Transactions on Information Theory, vol. 36, n° 4, pp. 799-809, July 1990. [2] MILLER (D.), ROSE (K.), "Combined source-channel vector quantization using deterministic annealing", IEEE Transactions on Communications, vol. 42, n° 2/3/4, pp. 347-356, Feb./Mar./Apr. 1994. [3] ZEGER (K.), GERSHO (A.), "Pseudo-Gray coding", IEEE Transactions on Communications, vol. 38, n° 12, pp. 2147-2158, Dec. 1990. [4] HAGENAUER (J.), "Source controlled channel decoding", IEEE Transactions on Communications, vol. 43, n° 9, pp. 2449-2457, Sept. 1995. [5] HINDELANG (T.), RUSCITTO (A.), "Kanaldecodierung mit Apriori-Wissen bei nicht-binären Quellensymbolen," in Proceedings of the ITG-Fachtagung "Codierung für Quelle, Kanal und Übertragung", Mar. 1998, pp. 163-167.
[6] HEINEN (S.), GEILER (A.), VARY (P.), "MAP channel decoding by exploiting multilevel source a priori knowledge," in Proceedings of the ITG-Fachtagung "Codierung für Quelle, Kanal und Übertragung", Mar. 1998, pp. 89-94. [7] SAYOOD (K.), BORKENHAGEN (J.C.), "Use of residual redundancy in the design of joint source/channel coders", IEEE Transactions on Communications, vol. 39, n° 6, pp. 838-846, June 1991. [8] PHAMDO (N.), FARVARDIN (N.), "Optimal detection of discrete Markov sources over discrete memoryless channels - applications to combined source-channel coding," IEEE Transactions on Information Theory, vol. 40, n° 1, pp. 186-193, Jan. 1994. [9] FINGSCHEIDT (T.), VARY (P.), "Robust speech decoding: A universal approach to bit error concealment," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr. 1997, vol. 3, pp. 1667-1670. [10] SKOGLUND (M.), "Hadamard-based soft decoding for vector quantization over noisy channels," IEEE Transactions on Information Theory, vol. 45, n° 2, pp. 515-532, Mar. 1999. [11] BAHL (L.R.), COCKE (J.), JELINEK (F.), RAVIV (J.), "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. IT-20, pp. 284-287, Mar. 1974. [12] HAGENAUER (J.), HOEHER (P.), "A Viterbi algorithm with soft-decision outputs and its applications," in Proceedings of the Global Telecommunications Conference (GLOBECOM), Nov. 1989, vol. 3, pp. 1680-1686. [13] GÖRTZ (N.), "Joint source-channel decoding using bit-reliability information and source statistics," in Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aug. 1998, p. 9. [14] GÖRTZ (N.), "Joint source-channel decoding by channel-coded optimal estimation," in Proceedings of the 3rd ITG Conference Source and Channel Coding, Jan. 2000, pp. 267-272. [15] HEINEN (S.), BLECK (S.), VARY (P.), "Robust speech transmission over noisy channels employing nonlinear block codes," in Proceedings of the IEEE Speech Coding Workshop, June 1999, pp. 72-74. [16] GÖRTZ (N.), "Iterative source-channel decoding using soft-in/soft-out decoders," in Proceedings of the IEEE International Symposium on Information Theory (ISIT), June 2000, p. 173. [17] GÖRTZ (N.), "Analysis and performance of iterative source-channel decoding," in Proceedings of the International Symposium on Turbo Codes and Related Topics, Sept. 2000, pp. 251-254. [18] GÖRTZ (N.), "On the iterative approximation of optimal joint source-channel decoding," IEEE Journal on Selected Areas in Communications, accepted for publication. [19] HINDELANG (T.), FINGSCHEIDT (T.), SESHADRI (N.), COX (R.V.), "Combined source/channel (de-)coding: Can a priori information be used twice?," in Proceedings of the IEEE International Symposium on Information Theory (ISIT), June 2000, p. 266.
[20] GÖRTZ (N.), "Zero-redundancy error protection for CELP speech codecs," in Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), Sept. 1997, vol. 3, pp. 1283-1286. [21] SCHROEDER (M.R.), ATAL (B.S.), "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Mar. 1985, vol. 1, pp. 25.1.1-25.1.4. [22] HAGENAUER (J.), OFFER (E.), PAPKE (L.), "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, n° 2, pp. 429-445, Mar. 1996. [23] BERROU (C.), GLAVIEUX (A.), "Near optimum error correcting coding and decoding: Turbo-codes," IEEE Transactions on Communications, vol. 44, n° 10, pp. 1261-1271, Oct. 1996. [24] ROBERTSON (P.), HOEHER (P.), VILLEBRUN (E.), "Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding," European Transactions on Telecommunications (ETT), vol. 8, n° 2, pp. 119-125, Mar. 1997. [25] JAYANT (N.S.), NOLL (P.), Digital Coding of Waveforms, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1984.
Biography Norbert Görtz received the Dr.-Ing. degree in electrical engineering from the University of Kiel, Germany, where he served as a member of the research staff until October 2000. In November 2000 he joined the Munich University of Technology as a lecturer and senior researcher. His interests include speech coding, joint source-channel (de)coding, and the application of the turbo principle in communications.
Chapter 7
How do we implement block turbo codes? S. Kerouédan, P. Adde and R. Pyndiah ENST, Brittany, France
I. Introduction Since their invention in the early nineties, turbo codes have shown their ability to achieve highly reliable digital transmission. The concept, first introduced by C. Berrou [1], involves the parallel concatenation of two convolutional codes. An iterative decoding process based on a SISO (soft-input/soft-output) algorithm exhibits performance close to the Shannon limit. Nowadays, CTCs (convolutional turbo codes) are considered for a wide range of applications. The iterative process can also be used with a family of block codes, namely product codes, as shown by R. Pyndiah [2]. BTCs (block turbo codes) are especially attractive for applications requiring high code rates (up to 0.75) [3] or small blocks of data bits [4]. As explained in the next section, the decoding algorithm is based on Chase's algorithm [5], but it is not the only solution. For example, Kaneko's algorithm [6] can be the core of the decoding process, as described in [7]. We will here discuss our choices in the implementation of two block turbo decoders. In the first section, we briefly describe the SISO (Soft-In/Soft-Out) algorithm for the decoding of concatenated block codes and the turbo decoding process. Then we explain how to adapt the algorithm to the constraints of circuit implementation. Finally, we discuss the two prototypes realized.
II. Block turbo code: near-optimum decoding Let us consider two systematic linear block codes C1, with parameters (n1, k1, δ1), and C2, with parameters (n2, k2, δ2), where ni, ki and δi (i = 1, 2) stand for code length, number of information symbols and minimum Hamming distance, respectively. The product code P = C1 ⊗ C2 is illustrated in Figure 1. The parameters of the resulting product code P are given by n = n1 × n2, k = k1 × k2 and δ = δ1 × δ2, and the code rate R by R = R1 × R2, where Ri is the code rate of code Ci.
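These parameter relations can be checked numerically; the constituent codes in the example below (two extended Hamming (8,4,4) codes, a common choice for BTCs) are purely illustrative:

```python
def product_code_params(c1, c2):
    # c1, c2: (n, k, delta) of the constituent systematic linear block codes.
    # The product code P = C1 (x) C2 has n = n1*n2, k = k1*k2,
    # delta = delta1*delta2, and code rate R = R1*R2 = k/n.
    (n1, k1, d1), (n2, k2, d2) = c1, c2
    n, k, d = n1 * n2, k1 * k2, d1 * d2
    return n, k, d, k / n

# Example: two extended Hamming (8,4,4) codes give a (64,16,16) product
# code with rate 1/4.
```

Note how the minimum distance multiplies: two modest constituent codes yield a product code with a large minimum distance, which is what makes product codes attractive for turbo decoding.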
Let us consider the transmission of binary elements {0,1}, coded by a linear block code, over a Gaussian channel using binary symbols {-1,+1}. The optimum decision on the received samples R (observation at the channel output for a transmitted code word E) is given by:

D = Ci  if  Pr{E = Ci | R} ≥ Pr{E = Cj | R},  for all j ≠ i   (1)

where Ci = (ci1, ..., cil, ..., cin) is the ith code word of code C and D = (d1, ..., dl, ..., dn) is the decision corresponding to the maximum-likelihood transmitted sequence conditionally on R. One can show that the previous decoding rule, in the case of the AWGN channel, can be simplified into:

D = Ci  if  |R - Ci|2 ≤ |R - Cj|2,  for all j ≠ i

where |R - Ci|2 = Σl=1..n (rl - cil)2 is defined as the squared Euclidean distance between R and Ci. In 1972, Chase proposed an algorithm [5] which approximates the optimum sequence decoding of block codes with a low computation complexity and a small performance degradation. This algorithm yields the maximum-likelihood sequence D for a given soft input R. It is a suboptimal algorithm. In 1994, R. Pyndiah [2] supplemented this algorithm to compute the soft decisions associated with the maximum-likelihood sequence D, which give a measure of the reliability of each component of D. The optimum soft output is given by the Log-Likelihood Ratio (LLR) of the transmitted symbol ej ∈ {-1, +1}, element of the transmitted code word E:
Figure 1. Principle of a product code.
where dj is the optimum decision associated with ej, SΘj is the subset of code words Ci with cij = Θ, and p(R | E) is the probability density function of R conditionally on E. Developing this expression, noting that C+1(j) and C-1(j) are the code words in S+1j and S-1j at minimum Euclidean distance from R, leads to the following approximated soft output (cf. [4]):
then
This expression indicates that the soft output r'j is obtained by the addition of the soft input rj and a term wj, which we shall call extrinsic information. The search for C+1(j) and C-1(j) is done suboptimally as described below: 1. Search for the m least reliable binary symbols of R'k; their positions are called I1, I2, ..., Im and the reliabilities of these symbols are called MF1, MF2, ..., MFm. 2. Generate τ test vectors TQ, Q ∈ {1, ..., τ}, which are combinations of elementary test vectors Tj having "1" in position Ij and "0" elsewhere. 3. For each test vector TQ among the τ words, compute ZQ = TQ ⊕ sign(R'k).
4. Decode ZQ by the Berlekamp algorithm (result YQ). 5. For each vector YQ, compute the squared Euclidean distance MYQ between R and YQ. Rank the µ smallest metrics (µ < τ) from smallest to largest. The vector corresponding to the smallest metric MD is the decoded word D. 6. For each symbol j of D, if a concurrent word exists, we select it. The concurrent word is the word YQj with the metric closest to that of D and with the opposite symbol in the jth position. Figure 2 illustrates the selection of the concurrent word among the µ = 3 words closest to D. 7. Compute the reliability Fj for each element dj of D. The calculation depends on the existence or not of a concurrent word corresponding to the symbol j. There are three cases: • If there is a concurrent word, the reliability is the difference between the metric of this concurrent word and that of the decoded word (examples j = 5, j = 11 or j = 25 in Figure 2); • If there is no concurrent word, the reliability is a constant β (example j = 20 in Figure 2); the value of β is obtained by combination of the values of the least reliable symbols, the reliability of the jth symbol in word R'k, and the metric of D. C simulations give us the best combinations, which, for example, for a selection of the 5 least reliable symbols, can be β = |r'j| + |MF1| + |MF3| + |MF4| - MD; • If the received word is a code word, the reliability is γ = β + 2. 8. Compute the extrinsic information, i.e., the additional information given by the decoder.
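Steps 1-5 above can be sketched as follows. For illustration, the algebraic (Berlekamp) decoding of step 4 is replaced by an exhaustive nearest-codeword search over a small explicit codebook, which is only practical for toy codes; variable names are illustrative:

```python
from itertools import combinations

def chase_candidates(r, codebook, m=2):
    # r        : soft input vector (one row/column of the product code)
    # codebook : list of code words with symbols in {-1, +1}
    # m        : number of least reliable positions (step 1)
    n = len(r)
    hard = [1 if x >= 0 else -1 for x in r]

    # Step 1: positions of the m least reliable symbols.
    lr = sorted(range(n), key=lambda i: abs(r[i]))[:m]

    # Steps 2/3: build test vectors by flipping subsets of those positions.
    tested, results = set(), []
    for size in range(m + 1):
        for flip in combinations(lr, size):
            z = list(hard)
            for i in flip:
                z[i] = -z[i]
            # Step 4 (stand-in for Berlekamp): nearest code word in
            # Hamming distance to the test vector z.
            y = tuple(min(codebook,
                          key=lambda c: sum(a != b for a, b in zip(c, z))))
            if y not in tested:
                tested.add(y)
                # Step 5: squared Euclidean metric between r and y.
                metric = sum((a - b) ** 2 for a, b in zip(r, y))
                results.append((metric, y))
    results.sort()   # smallest metric first -> decoded word D
    return results
```

The first entry of the returned list is the decision D; the remaining entries, ranked by metric, are the candidates from which the concurrent words of step 6 would be selected.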
Iterative computation then introduces the turbo effect into the decoding algorithm. The iterative decoding procedure can be achieved by cascading several elementary decoders as illustrated in Figure 3, where k stands for the number of half-iterations:
Figure 2. For each position of D, we look for a concurrent among the µ = 3 closest words YQ.
Figure 3. Data flow through the elementary decoder.
1. R'k is the vector received from the previous half-iteration,
2. Rk is the vector received from the channel (Rk = R'k for the 1st half-iteration),
3. Wk is the vector that contains the extrinsic information, that is the additional information given by the decoder concerning the reliability of the decoded bits,
4. Dk is the result of binary decoding,
5. α and β are constants varying with the current half-iteration.
III. The algorithm: targeting an implementation
The concept of the iterative process which performs the turbo decoding involves implementing the column decoder and the row decoder successively, as described in Figure 4. Between the decoding steps, a reconstruction of the matrix is essential in order to recover the code words when decoding is completed.
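The alternation of column and row decoding over the product-code matrix can be sketched as follows; `elementary_decode` is a hypothetical helper returning hard decisions and extrinsic information for one soft-input vector, and α weights the extrinsic term fed back between half-iterations:

```python
import numpy as np

def turbo_decode_product(R, elementary_decode, alphas):
    """Iterative decoding of a product code received as a soft matrix R.

    alphas[k] is the extrinsic weight at half-iteration k; columns and
    rows are decoded alternately, rebuilding the matrix in between.
    """
    W = np.zeros_like(R)                  # extrinsic information matrix
    D = np.zeros(R.shape, dtype=int)
    for k, alpha in enumerate(alphas):
        Rp = R + alpha * W                # soft input R'k = Rk + alpha*Wk
        axis = k % 2                      # 0: decode columns, 1: rows
        W = np.zeros_like(R)
        for idx in range(R.shape[1 - axis]):
            vec = Rp[:, idx] if axis == 0 else Rp[idx, :]
            d, w = elementary_decode(vec)
            if axis == 0:
                D[:, idx], W[:, idx] = d, w
            else:
                D[idx, :], W[idx, :] = d, w
    return D
```

Each pass over the loop body is one half-iteration; one full iteration decodes all columns and then all rows.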
Figure 4. The decoding iterative process.
To achieve the optimal decoder it is necessary to store all the possible concurrent words and to compare their metrics to that of the decided word, but it is not realistic to store more than a few concurrent words due to the storage elements required. We have to make simplifications which will degrade performance but reduce the complexity of the circuit. The difficulty is in making the right compromise [9, 10, 11].
A. Elementary decoder architecture
Following the steps of the algorithm described above, we consider the elementary decoder described in Figure 2, using the architecture illustrated in Figure 5. We distinguish five parts in the circuit, working in parallel thanks to a sequencer:
a. the input sequential part, which concerns all the functions whose calculations are performed at the same rate as the input symbols (a counter used for the timing of the elementary decoder, the parity and syndrome computing sub-blocks, and the processing unit that determines the m least reliable binary symbols);
b. the algebraic decoding part, which determines the optimum code word for a given input binary vector;
c. the selection part, which selects the maximum likelihood code word (that is, the word having the minimal distance to R'k) and the µ concurrent code words;
d. the output sequential part, which computes the extrinsic information, where calculations are performed at the same rate as the output symbols;
e. two storage elements, each divided into 3 RAMs in order to perform various operations in parallel (writing, computing, reading).
B. Performance vs. complexity
Such a structure requires memory elements (in particular for input data storage, concurrent words, test vectors, and extrinsic information) which can considerably increase the circuit area. We will discuss here the different options for optimizing the trade-off between performance and complexity for the elementary decoder.
This is especially necessary in the case of long codes, which is why we chose to illustrate this study with the decoding of a BCH(128,120,4) X BCH(128,120,4) product code. The simulations are given for 6 iterations. The area evaluations given in Tables I, II and III are derived from the synthesis of the diagram of Figure 5 in the STMicroelectronics CMOS 0.18 µm target library. The unit of area is the gate; the number of flip-flops is also given, as well as the critical time path through the elementary decoder.
1) Number of concurrent words
When implementing the most efficient algorithm, 16 test vectors, 16 concurrent words and their Euclidean distances must be stored and computed. This requires a
Figure 5. Block diagram of the elementary decoder.
large number of sequential and combinational elements, depending on the block code used, as explained in [9]. The solution we propose is to fix a priori the number of concurrent words: we keep 1 or 3 competitors. The storage capacity required for each configuration involves 6 RAMs of 128 words of q = 5 bits, which is about 3.8 kbits. Simulations (Figure 6) show a loss of around 0.2 dB at BER = 10^-6,
Figure 6. BER as a function of SNR for BCH(128,120,4) X BCH(128,120,4) in the case of memorizing 1, 3 or 16 concurrent words (q = 5 bits, τ = 16 test vectors, 6 iterations, m = 5).
Table I. Comparison between the "µ = 1 concurrent word" and "µ = 3 concurrent words" algorithms, with τ = 16 test vectors and q = 5 bits.

                        Number of gates   Number of    Critical
                        except RAM        flip-flops   path
  1 concurrent word     4333              276          6.44 ns
  3 concurrent words    5659              396          7.73 ns
between the solutions with 16 concurrent words and 1 concurrent word. Table I shows that implementation complexity increases by 13.3 % between the solutions with 1 competitor and 3 competitors, which is quite reasonable. We did not synthesize the solution with 16 competitors, but [9] estimates that the area would increase by 60 %.
2) Number of bits to quantize data
The number of quantization bits q is the number of bits required to represent the samples of the received word. A received symbol of q bits contains a sign bit and the magnitude, which is the reliability of the symbol. If q is equal to or less than 3, simulation results show a significant degradation (Fig. 7), so we do not synthesize this configuration. When choosing between 4 and 5 bits, some additional resources are required. In particular, we can notice an increase in memory for the storage of the received data, and an increase in combinational elements to compute the new reliability of each symbol. The gain from using 5 bits instead of 4 bits is in the range of 0.1 dB at BER = 10^-6, see Figure 7. In this case, the area increases by about 4 %, see Table II.
3) Number τ of test vectors
To create more test vectors, we must increase the number m of "least reliable symbols" and memorize more test vectors. It also implies more comparisons between Euclidean distances to sort the words. The need for RAM is identical in the three configurations: 3.8 kbits of storage capacity. In Figure 8 and Table III, we can observe that going from the "8 vectors" solution to the "16 vectors" one increases the area by 3.5 % for a gain of about 0.07 dB at BER = 10^-6. A "32 vectors" solution increases the area by 8 % with respect to the "16 vectors" solution for a gain of only 0.04 dB at BER = 10^-6.
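The sign-magnitude representation discussed in subsection 2 can be illustrated with a small quantizer sketch; the step size below is illustrative, as the actual fixed-point scaling of the circuit is not specified here:

```python
import numpy as np

def quantize(samples, q, step=0.25):
    """Quantize real channel samples on q bits: one sign bit plus a
    (q-1)-bit magnitude, the magnitude serving as the reliability."""
    levels = 2 ** (q - 1) - 1            # largest representable magnitude
    mag = np.clip(np.round(np.abs(samples) / step), 0, levels)
    return np.sign(samples) * mag * step
```

With q = 4, magnitudes saturate at 7 steps, so strong samples are clipped while weak ones keep only a coarse reliability; this is the degradation traded against area in Table II.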
4) Comments
The gain between the different solutions sometimes seems minute, but in terms of Bit Error Rate 0.1 dB can be essential for some applications; that is why we cannot say which solution is the best compromise: it all depends on the application. On the one hand, the "32 vectors" solution versus the "16 vectors" solution does not offer a
Figure 7. BER as a function of SNR for BCH(128,120,4) X BCH(128,120,4) in the case of a quantization on 3, 4 or 5 bits (τ = 16 test vectors, µ = 1 concurrent word, 6 iterations, m = 5).
Table II. Comparison between the quantization on "4 bits" or "5 bits", with τ = 16 test vectors and µ = 1 concurrent word.

            Number of gates   Number of    Critical   RAM
            except RAM        flip-flops   path
  5 bits    4333              276          6.44 ns    3.8 kbits
  4 bits    3941              266          6.64 ns    3 kbits
Table III. Comparison of area between the algorithms where "τ = 32 vectors", "τ = 16 vectors" or "τ = 8 vectors" are used, with q = 5 bits and µ = 1 concurrent word.

               Number of gates   Number of    Critical
               except RAM        flip-flops   path
  32 vectors   5135              337          7 ns
  16 vectors   4333              276          6.44 ns
  8 vectors    3981              253          6.84 ns
Figure 8. BER as a function of SNR for BCH(128,120,4) X BCH(128,120,4) in the case of choosing 32, 16 or 8 test vectors (µ = 1 concurrent word, q = 5 bits, 6 iterations, m = 5).

significant gain for an increase in area of 8 %, so we can class it as a "bad configuration". On the other hand, we can notice that the "16 test vectors, 5 bits and 3 concurrent words" solution is 16.7 % larger than the "8 test vectors, 5 bits and 1 concurrent word" solution, for a BER of 6·10^-7 at SNR = 4 dB compared with a BER of 3·10^-6 at SNR = 4 dB.
IV. Towards a prototype
After discussing the choice of the correct parameters for the algorithm, and designing the elementary decoder to implement it, we are now going to focus on the architecture of a turbo decoder.
A. To perform the iterative process: 2 choices
A first analysis of the block turbo decoder algorithm shows that there are two different structures which can be adopted for hardware implementation [8, 12]. The first solution is a modular structure where a module integrates the elementary decoder illustrated in Figure 2; several modules are then pipelined to realize the block turbo decoder. In the second solution, several
iterations are realized by a single circuit, which is the elementary decoder. We have developed two prototypes: the first one (prototype A), built in 1997, uses the modular architecture [13]; the second one (prototype B), developed more recently, uses the second architecture [14]. Table IV summarizes the different parameters adopted for each prototype.

Table IV. Different parameters for the two prototypes (decoding of the product code BCH(32,26,4) X BCH(32,26,4)).

                                Prototype A    Prototype B
  Quantization bits             4              5
  Number of iterations          4              7.5
  Number of test vectors        8              16
  Number of concurrent words    1              3

α: programmable parameter depending on the current half-iteration (both prototypes).
β: programmable parameter depending on the current half-iteration (prototype A); adaptive coefficient depending on the received word [15] (prototype B).

1) First solution: the modular structure
The elementary decoder was implemented on an FPGA circuit (Xilinx 4010), one per half-iteration. This solution requires 2 kbytes of RAM per half-iteration. It was validated by VHDL functional simulations, which were compared with the initial simulations (C programs). The logic schemes were obtained from VHDL modelling, using automatic synthesis. The noisy data are generated by a C program, stored in a large RAM (4 X 256 kbits) and selected randomly. Two breadboards were used:
1. the first with the encoder, the Gaussian channel simulator and two decoders for the first iteration,
2. the second with six decoders for the three following iterations; this second circuit can be duplicated if necessary to increase the number of iterations.
As shown in Figure 9, the experimental results are very close to the simulation results. The differences when SNR > 3.5 dB are due especially to the noise model and to approximations in the calculations. Simulations are also less precise at high SNR, due to the small number of errors counted in simulation compared with those measured. The data rate is about 6 Mbps.
2) Second solution: the iterations on the same elementary decoder
The architecture adopted to implement this solution consists of three blocks. The first one, the input unit, is used to store incoming data in the matrices. During the
reception of one matrix, the previous matrix (when it exists) is sent to the processing unit, which is in charge of the turbo decoding. In this second block, 7.5 iterations are performed. Then the decoded matrix is sent to the output unit, which returns the binary decision. The whole system was described in VHDL, simulated and compared with the results of C program simulations. The circuit is implemented in a Xilinx FPGA (Virtex technology, XCV200). The circuit is fully synthesized and the breadboard is now in a test phase. The data rate is around 1 Mbit/s for 7.5 iterations.
B. Comments
Obviously the modular prototype is bulkier than the second prototype, due to the repetition of the elementary decoder circuit and memory. At the performance level, it is essential to note that the elementary decoder is different in the two cases, due to the choice of parameters discussed in Section III of this article (cf. Table IV). The experimental results (Fig. 10) illustrate the significant gain provided by the
Figure 9. Experimental results compared to simulation results with the modular architecture (prototype A).
Figure 10. Comparison between the two prototypes. The parameters are different for each prototype (cf. Table IV).
second prototype, but it is absolutely realistic to implement the more efficient elementary decoder in the modular solution. In this case, experimental results will certainly be identical for both prototypes, except in terms of frequency. Indeed, the drawback of prototype B is the much lower data rate, but this can be sufficient for some applications.
V. Conclusion
This study has presented our implementation of two block turbo decoders. The experimental results corroborate the simulation results and show the good performance of block turbo decoding. We chose a particular decoding algorithm and a BCH product code, but this is not the only possibility. In [17], the author discusses her implementation of an RS block turbo code. Some other teams implement their
algorithms on a DSP [16], but what is essential is that such prototypes, whether in FPGA, ASIC or DSP technology, demonstrate the real efficiency of turbo decoding for future applications.
Acknowledgement
The authors would like to acknowledge Jean-Rene Inisan and Yves Sichez for their help in the realization of the first prototype, Nassim Abid, Mathieu Deilhes, Marta Fernandez Sanchez, Bertrand Mercier and Mohamed Sardi, third-year students in Electronics at the ENST Bretagne, and Magali Le Gall, member of the Electronics Department, for their help in designing, simulating and testing the second prototype.
REFERENCES
[1] BERROU (C.), GLAVIEUX (A.), THITIMAJSHIMA (P.), "Near Shannon limit error-correcting coding and decoding: Turbo-codes", IEEE Int. Conf. on Comm. ICC '93, May 1993, 2/3, pp. 1064-1071.
[2] PYNDIAH (R.), GLAVIEUX (A.), PICART (A.), JACQ (S.), "Near optimum decoding of product codes", in Proc. of IEEE GLOBECOM '94 Conference, Nov.-Dec. 1994, San Francisco, 1/3, pp. 339-343.
[3] PYNDIAH (R.), "Near optimum decoding of product codes: Block Turbo Codes", IEEE Trans. on Comm., Aug. 1998, 46, n° 8, pp. 1003-1010.
[4] PYNDIAH (R.), "Iterative decoding of product codes: block turbo codes", Int. Symposium on turbo codes and related topics, Brest, Sept. 1997, pp. 71-79.
[5] CHASE (D.), "A class of algorithms for decoding block codes with channel measurement information", IEEE Trans. Inform. Theory, Jan. 1972, IT-18, pp. 170-182.
[6] KANEKO (T.), NISHIJIMA (T.), INAZUMI (H.), HIRASAWA (S.), "An efficient maximum-likelihood decoding algorithm for linear block codes with algebraic decoder", IEEE Trans. Inform. Theory, Mar. 1994, 40, pp. 320-327.
[7] DAVE (S.), KIM (J.), KWATRA (S.C.), "An efficient decoding algorithm for Block Turbo Codes", IEEE Trans. on Communications, Jan. 2001, 49, n° 1, pp. 41-46.
[8] ADDE (P.), PYNDIAH (R.), RAOUL (O.), INISAN (J.R.), "Block turbo decoder design", Int. Symposium on turbo codes and related topics, Brest, Sept. 1997, pp. 166-169.
[9] ADDE (P.), PYNDIAH (R.), RAOUL (O.), "Performance and complexity of block turbo decoder circuits", Third International Conference on Electronics, Circuits and Systems ICECS '96, 13-16 Oct. 1996, Rodos, Greece, pp. 172-175.
[10] KEROUEDAN (S.), ADDE (P.), FERRY (P.), "Comparaison performances/complexité de décodeurs de codes BCH utilisés en turbo-décodage", GRETSI '99, 13-17 Sept. 1999.
[11] ADDE (P.), PYNDIAH (R.), BUDA (F.), "Design and performance of a product code turbo encoding-decoding prototype", Annals of telecommunications, 54, n° 3-4, pp. 214-219.
[12] RAOUL (O.), ADDE (P.), PYNDIAH (R.), "Architecture et conception d'un circuit turbo-décodeur de codes produits", GRETSI '95, Juan-les-Pins, Sept. 1995, pp. 981-984.
[13] ADDE (P.), PYNDIAH (R.), INISAN (J.R.), SICHEZ (Y.), "Conception d'un turbo décodeur de code produit", GRETSI '97, Sept. 1997, Grenoble, pp. 1169-1172.
[14] KEROUEDAN (S.), ADDE (P.), "Implementation of a block turbo decoder in a single chip", 2nd International Symposium on Turbo Codes and Related Topics, 3-7 Sept. 2000, Brest.
[15] ADDE (P.), PYNDIAH (R.), "Recent simplification and improvement of block turbo code", 2nd International Symposium on Turbo Codes and Related Topics, 3-7 Sept. 2000, Brest.
[16] GOALIC (A.), PYNDIAH (R.), "Real time turbo decoding of product codes on a digital signal processor", Int. Symposium on turbo codes and related topics, Brest, Sept. 1997.
[17] ROBERT (S.), "Les turbo codes en blocs", séminaire UBS, Lorient, Feb. 2001.
Chapter 8
Turbo processing in transmit antenna diversity systems
G. Bauch and J. Hagenauer
Institute for Communications Engineering, Munich University of Technology, Germany
N. Seshadri Broadcom Corporation, California, USA
I. Introduction
In future wireless communications systems, high data rates need to be reliably transmitted over time-varying bandlimited channels. The wireless channel suffers mainly from time-varying fading due to multipath propagation and destructive superposition of signals received over different paths. Therefore, special techniques have to be applied to enable bandwidth-efficient transmission. A widely applied technique to reduce the detrimental effects of multipath fading is antenna diversity. Usually, multiple antennas are used at the receiver with some kind of combining of the received signals, e.g. maximum ratio combining. However, it is hard to use receive antenna diversity efficiently at the mobile units, since they should remain relatively simple, inexpensive and small. Consequently, receive diversity has been used almost exclusively at the base station. Recently, different transmit diversity techniques have been introduced to benefit from antenna diversity also in the downlink while putting the diversity burden on the base station. A transmit diversity scheme for two transmit antennas and a simple decoding algorithm was introduced by Alamouti in [1] and generalized to an arbitrary number of transmit antennas as space-time block coding by Tarokh et al. in [2]. Actually, the term code is somewhat misleading since there is no coding gain. A space-time block code is basically a transmit diversity scheme which achieves the maximum possible diversity level of nT · nR for nT transmit and nR receive antennas under the constraint of allowing simple combining. To obtain further coding gain, a space-time block code has to be concatenated with an outer FEC code. In recent years, "Turbo" codes have been shown to perform close to the Shannon limit in AWGN channels. There have also been a number of proposals for
bandwidth-efficient Turbo coding schemes combining the ideas of Turbo codes and higher-order modulation. In [3], Robertson and Wörz proposed a sophisticated scheme using Ungerböck codes in their recursive systematic form as component codes for Turbo-TCM. We will first describe the multiple-input multiple-output (MIMO) channel model in Section II and motivate the use of multiple-antenna systems by results on the capacity of MIMO channels in Section III. In Section IV, we explain the idea of space-time block coding, give an interpretation of space-time block coding as a transformation of the fading MIMO channel towards a Gaussian single-input single-output (SISO) channel and derive analytical results on the bit error rate of space-time block codes. Furthermore, we give a general linear combining rule for space-time block codes. In Section V, we will use Turbo-TCM as an outer code to be used with a space-time block code. Finally, in Section VI, we describe the MAP equalizer for space-time block codes in frequency-selective MIMO channels and propose a double iterative scheme for turbo equalization and turbo decoding of the concatenation of Turbo-TCM and space-time block codes.
II. Channel model
We first consider a flat fading multiple-input multiple-output (MIMO) channel as shown in Figure 1, which describes a system with nT transmit antennas and nR receive antennas. The tap gain from transmit antenna i to receive antenna j at time k is denoted by hk(ij). Later, we will drop the time index k if the taps are assumed to be constant over the considered time period.
Figure 1. Flat fading MIMO channel model
The antennas are separated far enough to ensure independently fading channels from each transmit to each receive antenna. Therefore, the channel taps are modelled as independent complex Gaussian random variables of equal variance, satisfying E{|hk(ij)|^2} = 1, i.e. the channel is passive. We will consider two versions of time variance: in quasi-static fading the tap gains are assumed to be constant over a block and change independently from block to block; this describes a system which applies frequency hopping. We will also consider a channel where the tap gains vary according to a classical Doppler spectrum, which is characterized by the product fd·Ts of maximal Doppler shift and symbol duration. The symbol transmitted from antenna i at time k is denoted xk(i). The mean energy per symbol xk(i) is given by
E_s^{(i)} = E\{ |x_k^{(i)}|^2 \},

whereas
E_s = \sum_{i=1}^{n_T} E_s^{(i)}

is the total energy per use of the MIMO channel. A channel use is defined as the simultaneous transmission of a symbol xk(i) from all transmit antennas i = 1, ..., nT. The observed value at receive antenna j at time k is given by

y_k^{(j)} = \sum_{i=1}^{n_T} h_k^{(ij)} x_k^{(i)} + n_k^{(j)}.
The additive noise nk(j) at each receive antenna j is assumed to be white and Gaussian with spectral power density N0 in the equivalent lowpass domain, i.e.

E\{ n_k^{(j)} \, \big(n_{k+n}^{(j)}\big)^{*} \} = N_0 \, \delta_n,
where δn is the Kronecker delta. The flat fading channel model implies negligible time dispersion, which can be assumed for indoor applications. However, in general there will be significant time dispersion and the wireless MIMO channel will consist of a frequency-selective fading channel from each transmit antenna to each receive antenna, as depicted in Figure 2. The memory D per transmit antenna is considered to be the same for all channels. If we assume the transmit antennas to be located at the same base station, e.g. distributed on the roof of a house, this assumption is justified by the fact that the number of multipath components with different delay is dictated by large structures and reflecting objects. The d-th tap of the frequency-selective channel from transmit antenna i to receive antenna j at time k is denoted hk(ij)(d), d = 0, ..., D. All taps hk(ij)(d) are independent complex Gaussian random variables with zero mean and equal mean power, and satisfy

\sum_{d=0}^{D} E\{ |h_k^{(ij)}(d)|^2 \} = 1,
Figure 2. MIMO ISI channel model.
i.e. the channel is passive. In this paper we consider only quasi-static fading in the frequency-selective case.
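A minimal numerical sketch of one use of the flat fading MIMO channel defined above: the unit-variance complex Gaussian taps (E{|h|²} = 1) and the complex AWGN of power N0 per receive antenna follow the text, while the function name and arguments are illustrative.

```python
import numpy as np

def mimo_channel_use(x, n_t, n_r, n0, rng):
    """One use of the flat fading MIMO channel: y = H x + n.

    x: length-n_t vector of transmit symbols; H: n_r x n_t matrix of
    independent complex Gaussian taps with E{|h|^2} = 1 (passive
    channel); n: complex AWGN with E{|n|^2} = N0 per receive antenna.
    """
    H = (rng.standard_normal((n_r, n_t)) +
         1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    n = (rng.standard_normal(n_r) +
         1j * rng.standard_normal(n_r)) * np.sqrt(n0 / 2)
    return H @ x + n, H
```

In quasi-static fading, H would be drawn once per block; under a Doppler model it would be redrawn (correlated) at every time index k.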
III. Capacity of MIMO channels
The application of multiple antenna systems is motivated by the large capacities of MIMO channels. If the channel taps are unknown to the transmitter but perfectly known to the receiver, the capacity of a flat fading MIMO channel is given by [4, 5]

C = \log_2 \det\!\left( \mathbf{I}_{n_R} + \frac{E_s}{n_T N_0} \, \mathbf{H} \mathbf{H}^H \right),
where the nR × nT matrix

\mathbf{H} = \big[ h^{(ij)} \big], \quad j = 1, \dots, n_R, \; i = 1, \dots, n_T,

contains the channel taps. In fading channels the capacity is limited by deep fades. However, during most of the transmission time a significantly higher capacity is available. This is particularly important if transmission is done in bursts without coding across bursts. If the system is designed to provide a certain data rate, a certain percentage of bursts will be erroneous. This percentage is called outage, and it is assumed that an outer ARQ protocol takes care of these erroneous packets. Thus, in fading channels it is useful to consider the outage capacity instead of the Shannon capacity. The x%-outage capacity Cx is the maximum data rate which can be transmitted error-free in (100 − x)% of the time, i.e. we expect x% outage. Since the channel is assumed to be constant during transmission of each burst, the
outage capacity can be evaluated by Monte Carlo simulation if we compute the channel capacity for a large number of channel realizations H. It can be shown that the asymptotic slope of the capacity curve over Es/N0 in dB is determined by the minimum min{nT, nR} of the numbers of transmit and receive antennas [4]. Therefore, a significant gain in terms of outage capacity is achieved when the number of antennas is increased at both the transmitter and the receiver. Figure 3 shows numerical results on the 10%-outage capacity of a flat fading MIMO channel with an equal number of nT = nR transmit and receive antennas. It can be observed that the outage capacity increases linearly with nT = nR. However, if the number of antennas is increased only at the transmitter or the receiver, only a parallel shift of the capacity curve over Es/N0 is achieved, but the asymptotic slope remains constant since min{nT, nR} is not increased. This is depicted in Figure 4 for receive antenna diversity and nT = 1 transmit antenna. We will come back to this result later. For frequency-selective MIMO channels, we obtain in principle the same results.
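The Monte Carlo evaluation can be sketched as follows, assuming the standard capacity formula C = log2 det(I + (Es/(nT N0)) H H^H) for taps unknown at the transmitter and perfectly known at the receiver; the trial count and seed are illustrative:

```python
import numpy as np

def outage_capacity(n_t, n_r, es_n0_db, outage=0.10, trials=2000, seed=1):
    """x%-outage capacity of a flat fading MIMO channel by Monte Carlo:
    draw many channel realizations H, compute the capacity of each, and
    take the outage-quantile of the resulting distribution."""
    rng = np.random.default_rng(seed)
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    caps = np.empty(trials)
    for t in range(trials):
        H = (rng.standard_normal((n_r, n_t)) +
             1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        gram = np.eye(n_r) + (es_n0 / n_t) * (H @ H.conj().T)
        caps[t] = np.log2(np.linalg.det(gram).real)
    return np.quantile(caps, outage)   # rate exceeded in (100 - x)% of bursts
```

Comparing, say, nT = nR = 2 against nT = nR = 1 at the same Es/N0 reproduces the trend of Figure 3: the 10%-outage capacity grows with the number of antennas.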
IV. Space-time block codes
A. Principle of Space-Time Block Codes
In [4] and [5], it was shown that the outage capacity of fading channels can be increased significantly by using multiple antennas at both the transmitter and the
Figure 3. 10% outage capacity of flat fading MIMO channel with nT = nR
Figure 4. 10% outage capacity of flat fading MIMO channel with nT = 1.
receiver. There have been different proposals to make use of multiple antennas in a communications system. The data rate can be increased directly by transmitting independent symbols simultaneously from different antennas. If M-ary modulation is applied, a maximum likelihood receiver has to choose the most likely out of M^nT signals, which results in prohibitive complexity even for a moderate number of transmit antennas. Therefore, suboptimum detection has to be applied. An example is the Bell Labs Layered Space-Time architecture (BLAST) as proposed in [6]. Here, in each step only the signal transmitted from one antenna is detected, while the signals transmitted from the other antennas are suppressed by means of zero forcing or MMSE and by interference cancellation using the results of previous detection steps. However, the performance of this scheme is limited by error propagation [7]. Furthermore, such systems require at least as many receive antennas as transmit antennas [6, 8]. This is undesired in some mobile receivers, which should remain small, simple and cheap. An alternative approach is to use multiple transmit antennas to increase the diversity level, i.e. the asymptotic slope of the BER curve over SNR. The same information is transmitted from different antennas, making use of the fact that the probability of simultaneous deep fades in all channels is smaller than in the case of one transmit antenna. Suitable pre-processing at the transmitter makes sure that the received signals can be combined such that the signals transmitted simultaneously from different antennas add up only constructively. An advantage
of transmit antenna diversity is that additional receive antenna diversity is possible to improve the performance, but it is not mandatory. A simple transmit antenna diversity scheme for nT = 2 transmit antennas was proposed by Alamouti in [1] and generalized to an arbitrary number of transmit antennas as Space-Time Block Codes by Tarokh et al. in [2] and [9]. We retain the term Space-Time Block Codes since it is generally accepted in the literature. However, we emphasize that space-time block codes are basically a pure diversity scheme and do not achieve an additional coding gain. Therefore, the term code is actually a little misleading. The mathematical basis of space-time block codes is the theory of real orthogonal designs, which was developed by Radon at the beginning of the 20th century in [10] and extended to generalized complex orthogonal designs in [2]. A space-time block code is defined by a generalized complex orthogonal design, i.e. a P × nT matrix

\mathbf{B} = \begin{pmatrix} b_{11} & \cdots & b_{1 n_T} \\ \vdots & & \vdots \\ b_{P1} & \cdots & b_{P n_T} \end{pmatrix}

with orthogonal columns. The entries bij of B are elements xt of an M-ary signal constellation, their complex conjugates xt* or linear combinations of xt and xt*. A block of K symbols xt, t = 1, ..., K, is input to the space-time block "encoder". The K symbols are mapped on the P · nT entries bij of the matrix B according to the mapping rule of the space-time block code. Then, all entries bij in the same row of B are transmitted simultaneously from the nT antennas. The entries in the same column of B are transmitted from the same antenna in successive time slots. Therefore, the columns of B represent "space", the rows represent "time". In contrast to space-time trellis codes, the mapping rule of a space-time block code is independent of the particular modulation constellation. The crucial property of space-time block codes is the orthogonality of the columns of the matrix B, since it makes it possible to separate the symbols transmitted simultaneously from different antennas at the receiver by simple linear combining. This will be explained in the next sections.
B. Example of Space-Time Block Code with Two Transmit Antennas
The simplest space-time block code, which was proposed by Alamouti in [1], serves as an example to show the principle of space-time block codes. Alamouti's scheme for nT = 2 transmit antennas is determined by the complex orthogonal design

\mathbf{B}_2 = \begin{pmatrix} x_1 & x_2 \\ -x_2^* & x_1^* \end{pmatrix}.
Here, a block is made up of K = 2 symbols, since the symbols x1, x2 and their conjugates are entries of the matrix B2. The example given in Figure 5 considers 8-PSK modulation. Three source bits are viewed as a symbol ct and mapped on a complex constellation point xt. In the example of Figure 5, the symbol c1 = 2 is mapped on x1 = j, and the symbol c2 = 7 is mapped on x2 = 1/√2 − j/√2. The complex symbols x1 and x2 are written in a P × nT = 2 × 2 matrix according to the mapping rule B2 given in (8). In the first time slot, the symbols in the first row of B2 are transmitted simultaneously, i.e. x1(1) = x1 is transmitted from antenna 1, whereas x1(2) = x2 is transmitted from antenna 2. In the next time slot, antenna 1 transmits x2(1) = −x2*, and simultaneously antenna 2 transmits x2(2) = x1*. Since P = 2 symbol durations are necessary to transmit K = 2 symbols, the space-time block code is of rate

R = K/P = 1,
and no bandwidth expansion takes place. It is obvious that the full diversity gain can be obtained only if

E_s^{(1)} = E_s^{(2)} = \frac{E_s}{2},
because each symbol has to be transmitted with the same energy from all transmit antennas. Due to the orthogonality of the space-time block code, simple linear combining is possible at the receiver. For simplicity, we consider nR = 1 receive antenna. Furthermore, the channel is required to be constant during transmission of a space-time block code matrix, i.e. during P = 2 symbol durations. This is not a critical assumption since P is small. At the receiver we observe the superposition of the simultaneously transmitted signals perturbed by noise:

y_1 = h_1 x_1 + h_2 x_2 + n_1, \qquad y_2 = -h_1 x_2^* + h_2 x_1^* + n_2. \qquad (11)
Figure 5. Space-time block code transmitter.
In matrix notation,

\begin{pmatrix} y_1 \\ y_2^* \end{pmatrix} = \underbrace{\begin{pmatrix} h_1 & h_2 \\ h_2^* & -h_1^* \end{pmatrix}}_{\mathbf{H}} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} n_1 \\ n_2^* \end{pmatrix}, \qquad (12)

where we used the complex conjugate of the second equation in (11). Left-multiplying (12) by the transposed and conjugated channel matrix H^H, i.e.

\tilde{\mathbf{y}} = \mathbf{H}^H \mathbf{y}, \qquad (13)

using

\mathbf{H}^H \mathbf{H} = (|h_1|^2 + |h_2|^2)\, \mathbf{I}_2, \qquad (14)

yields

\tilde{y}_1 = (|h_1|^2 + |h_2|^2)\, x_1 + \tilde{n}_1 \quad \text{and} \quad \tilde{y}_2 = (|h_1|^2 + |h_2|^2)\, x_2 + \tilde{n}_2. \qquad (15)

Thus, due to the orthogonal structure of the space-time block code, we obtain decoupled equations for x1 and x2. The equations (15) correspond to maximum ratio combining. Hence, we obtain the same diversity level as in receive antenna diversity with nT = 1 transmit antenna and nR = 2 receive antennas. Figure 6 illustrates the linear combining and the analogy to receive antenna diversity with maximum ratio combining in a combining network.
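The decoupling property can be checked numerically. The sketch below assumes tap gains h1, h2 constant over the two symbol slots and applies the standard Alamouti combining, so that both estimates come out scaled by |h1|² + |h2|²; the variable names are illustrative.

```python
import numpy as np

def alamouti_combine(y1, y2, h1, h2):
    """Linear combining for Alamouti's scheme (nT = 2, nR = 1):
    recover decoupled estimates of x1 and x2 from the two received
    samples, each scaled by |h1|^2 + |h2|^2 (maximum ratio combining)."""
    x1_hat = np.conj(h1) * y1 + h2 * np.conj(y2)
    x2_hat = np.conj(h2) * y1 - h1 * np.conj(y2)
    return x1_hat, x2_hat

# Noiseless check of the decoupling with the 8-PSK symbols of Figure 5.
rng = np.random.default_rng(7)
h1, h2 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
x1 = np.exp(2j * np.pi * 2 / 8)            # c1 = 2  ->  x1 = j
x2 = np.exp(2j * np.pi * 7 / 8)            # c2 = 7  ->  x2 = (1 - j)/sqrt(2)
y1 = h1 * x1 + h2 * x2                     # slot 1: x1 and x2 sent together
y2 = -h1 * np.conj(x2) + h2 * np.conj(x1)  # slot 2: -x2*, x1*
x1_hat, x2_hat = alamouti_combine(y1, y2, h1, h2)
gain = abs(h1) ** 2 + abs(h2) ** 2
```

With noise present, the same combining yields the MRC-scaled symbols plus a combined noise term, as in (15).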
Figure 6. Linear combining for detection of the space-time block code B2.
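The decoupling in (15) can be checked numerically. The following sketch is our own minimal, noiseless Alamouti example (the variable names and the QPSK symbols are not from the chapter); it reproduces the maximum ratio combining behaviour of the linear combiner:

```python
import numpy as np

rng = np.random.default_rng(0)

# Alamouti code B2: K = 2 symbols sent over P = 2 slots from nT = 2 antennas.
# Slot 1 transmits (x1, x2); slot 2 transmits (-x2*, x1*).
x1, x2 = (1 + 1j) / np.sqrt(2), (1 - 1j) / np.sqrt(2)  # two QPSK symbols
h1, h2 = rng.normal(size=2) + 1j * rng.normal(size=2)  # taps, constant over P slots

# Received samples at the single (nR = 1) receive antenna, noiseless for clarity
y1 = h1 * x1 + h2 * x2
y2 = -h1 * np.conj(x2) + h2 * np.conj(x1)

# Linear combining: left-multiply [y1, y2*]^T by the Hermitian channel matrix
x1_hat = np.conj(h1) * y1 + h2 * np.conj(y2)
x2_hat = np.conj(h2) * y1 - h1 * np.conj(y2)

# Orthogonality decouples the symbols: x_hat_k = (|h1|^2 + |h2|^2) * x_k,
# i.e. maximum ratio combining over nT * nR = 2 diversity branches.
gain = abs(h1) ** 2 + abs(h2) ** 2
assert np.allclose(x1_hat, gain * x1)
assert np.allclose(x2_hat, gain * x2)
```

The cross terms cancel exactly because of the orthogonal design, which is the decoupling property the text attributes to (15).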
disadvantage of bandwidth expansion by a factor of 1/R. The simple space-time block code
of rate R = 3/4 for nT = 4 transmit antennas given in [11] serves as an example. Since in each time slot only three antennas are active, the energy per transmitted symbol is given by
A rate 3/4 space-time block code for nT = 3 transmit antennas is obtained from (16) by simply dropping one column.

D. Generalized Combining and Soft-Output MAP-Detection of Space-Time Block Codes

D.1. Linear Combining

In Section IV-B we have described the linear combining for the space-time block code B2 according to (8). In general, all space-time block codes can be detected by the following linear combining:
Turbo processing in transmit antenna diversity systems
153
where IK and 0K×K are the K × K identity matrix and all-zero matrix, respectively, and the nRP × 2K matrix H is defined such that
The constant factor c depends on the space-time block code and is c = 1 for the space-time block codes given in this paper. The variance per real dimension of the additive white Gaussian noise nk is given by
where σ² is the variance per real dimension of the noise at each of the receive antennas. The linear combining can be simplified to the method described in Section IV-B for those space-time block codes which in a certain row contain only symbols xt or only conjugate complex symbols x*t. It is an important property of space-time block codes that not all base stations in a network need to be equipped with multiple transmit antennas. A base station with only one antenna simply transmits only one column s of the space-time block code. At the receiver, the other taps are set to h(ij) = 0 for i ≠ s by the channel estimation and the detection is done according to (18). Consequently, the mobile station requires no explicit information on whether transmit antenna diversity is currently applied.

D.2. Soft-Output MAP-Detection

Due to the orthogonality of the space-time block code matrix B, we can obtain decoupled expressions for the symbols xt, t = 1, ..., K, after linear combining. Since no information is lost in the combining according to (18), a MAP detector for each symbol xt maximizes the probability
or the logarithmic probability
Here, Pa(xt) is a-priori information about xt, which can be obtained e.g. from knowledge of the source statistics. If no a-priori information is available, we assume all symbols to be equally likely, i.e. Pa(xt) = 1/M. Using (18) and (20), the transition probability log p(ỹt | xt) becomes
Collecting all terms independent of xt in the constant, it follows from (22) with (23)
The constant can be evaluated by normalization such that the sum of the probabilities P(xt | ỹt) over all possible xt equals one. For M-PSK constellations, |xt|² is the same for all possible xt. Therefore, the third term in (24) can be included in the constant. In summary, for equally likely M-PSK symbols, the MAP detector first combines the received values yk(j) according to (18) and outputs the combined values ỹt. Then, the constellation point with minimal Euclidean distance from ỹt is chosen. If a soft output has to be calculated according to Figure 7 to be passed to an outer decoder, the squared Euclidean distance between the constellation points and ỹt has to be divided by 2σ² and normalized. The correction term in (24) also has to be considered for constellations with different energy per symbol. The main advantage of space-time block codes is that the maximal possible diversity level is obtained with low detection complexity. If M-ary modulation is applied, (24) has to be evaluated for all M possible symbols. The increase in complexity for larger M is mainly determined by the evaluation of (24). Therefore, the complexity increases only linearly with M. In contrast, the complexity of space-time trellis codes increases according to M^(nT-1).

E. Interpretation of Space-Time Block Codes as a Transformation of the Channel

From (18) it can be seen that space-time block codes together with a linear combiner at the receiver transform the fading MIMO channel towards a SISO channel with a lower probability of deep fades compared to the channel from a
Figure 7. Soft-output MAP detector for space-time block codes and K M-PSK modulated symbols.
certain transmit antenna to a certain receive antenna. In the previous section we have explained that the rate loss of a space-time block code with R < 1 is exactly recovered by the energy gain after linear combining. Therefore, except for a bandwidth expansion, all space-time block codes are equivalent to a hypothetical space-time block code of rate R = 1 for which the mean energy per symbol is given by
Consequently, the space-time block code, the fading MIMO channel, and the linear combiner can be described by an equivalent channel model as depicted in Figure 8, which is determined by
The resulting channel tap is given by
Furthermore, the mean energy Es per symbol xt becomes
and the variance
of the additive white Gaussian noise nk corresponds to the variance σ² of the noise at a certain receive antenna. The effect of transmit and receive antenna diversity can be illustrated using the density function of the SNR after combining or in the equivalent channel model according to Figure 8. We will need these density functions in order to calculate the bit error probability in Section IV-F. We consider quasi-static fading with independent complex Gaussian channel taps h(ij) with variance σh² = 1/2 per real dimension. For a particular channel realisation, the average SNR per bit for M-ary modulation in diversity branch (i,j), i = 1, ..., nT, j = 1, ..., nR, is defined by
The expectation value of (30) yields
The average per bit SNR γb after linear combining is the sum of the average SNRs γb(ij) in the diversity branches, i.e.
and the expectation value of (32) becomes
Figure 8. Equivalent channel model for space-time block codes and linear combining at the receiver.
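The hardening of this equivalent channel can be illustrated with a small Monte-Carlo sketch (our own check, assuming the chapter's c = 1 and taps with variance 1/2 per real dimension): the squared equivalent tap is the sum of the nT·nR branch gains, chi-squared with 2nTnR degrees of freedom, whose relative spread shrinks with the diversity level.

```python
import numpy as np

rng = np.random.default_rng(1)
nT, nR, trials = 2, 2, 200_000

# Channel taps h^(ij): complex Gaussian, variance 1/2 per real dimension,
# so each branch gain |h^(ij)|^2 has unit mean.
H = rng.normal(scale=np.sqrt(0.5), size=(trials, nT, nR)) \
    + 1j * rng.normal(scale=np.sqrt(0.5), size=(trials, nT, nR))

# Squared equivalent tap (c = 1): sum of the branch gains, i.e. a
# chi-squared variable with 2*nT*nR degrees of freedom, expected value nT*nR.
g = np.sum(np.abs(H) ** 2, axis=(1, 2))
assert abs(g.mean() - nT * nR) < 0.05

# Normalized variance is 1/(nT*nR): the fading hardens towards AWGN as the
# diversity level nT*nR grows.
assert abs(g.var() / g.mean() ** 2 - 1 / (nT * nR)) < 0.01
```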
The sum
is a chi-squared random variable with 2nTnR degrees of freedom and the density function
Using (32) and (31), the density function of the average per bit SNR γb after combining becomes
We will need fγb(γb) for the BER calculation in Section IV-F. However, first we investigate the density function fγ(γ) of the SNR γb after combining, normalized to its expected value γb,0. From
the density function follows as
The density function fγ(γ) is depicted in Figure 9 for different diversity levels nT·nR. It can be observed how the variance of the SNR decreases with increasing diversity level, i.e. the fading channel is shifted towards an AWGN channel, whose density function fγ(γ) is an impulse at γ = 1. Due to the chosen normalization, the curves in Figure 9 describe any diversity scheme with diversity level nT·nR, no matter which particular diversity method is applied. The diversity can be achieved e.g. by transmit antenna diversity, receive antenna diversity, frequency diversity, or a combination of different methods.

F. Bit Error Probability of Space-Time Block Codes

The transmission of a space-time block code matrix together with linear combining at the receiver corresponds to transmission over the SISO-AWGN channel of Figure 8 with the per-bit SNR γb according to (32). The bit error
Figure 9. Density function of the normalized SNR according to (37) after combining for diversity levels nTnR.
probability Pb(γb) for this particular matrix can be calculated using the well-known expressions for an AWGN channel. Using the SNR density function fγb(γb) given in (36), we can now calculate the bit error probability Pb of a space-time block code in quasi-static fading with independent complex Gaussian channel taps from
For BPSK and QPSK with Gray mapping, we obtain
According to [12], there exists the closed form solution
for (39), where
For higher order modulation, there exists no closed form solution [13]. However, for high SNR we can use the approximation
for M-PSK with Gray mapping [14]. For the bit error probability (39) of M-PSK, the approximation follows as
where
Figure 10 gives simulation results for the BER of space-time block codes with 8-PSK modulation in quasi-static flat fading. The analytical results obtained from (44) match the simulation results almost perfectly.

G. Outage Capacity of Space-Time Block Codes

In Section III, it was stated that the asymptotic slope of the capacity curve over SNR is determined by the minimum min{nT, nR} of the number of transmit and receive antennas. From the equivalent channel model in Figure 8, it is obvious that the capacity of a system applying space-time block codes is always limited by the effective min{nT, nR} = 1. Therefore, the outage capacity of the underlying MIMO channel can never be reached using space-time block codes if additional receive antenna diversity is applied, i.e. nR > 1. This is due to the fact that diversity is obtained by transmitting the same information over all antennas, i.e. the signals transmitted from different antennas are not independent. For a space-time block code of rate R = 1, the capacity for a block with constant channel taps h(ij) follows from the equivalent channel model in Figure 8 to be
Figure 10. BER of space-time block codes with nT transmit and nR receive antennas for 8-PSK with Gray mapping and quasi-static flat fading.
For nR = 1, this is equivalent to the capacity of a multiple-input-single-output (MISO) channel according to (5). Therefore, the capacity of the underlying MISO channel can be reached. However, for nR > 1, the capacity curve is only shifted, since the sum Σi Σj |h(ij)|² is a chi-squared random variable with 2nTnR degrees of freedom and expected value nTnR, but the asymptotic slope is not increased. From an information theory point of view it can be concluded that transmit antenna diversity should be used in systems with nR = 1 receive antenna. If multiple receive antennas are available, the data rate should instead be increased by transmitting independent data from the different antennas. This approach is chosen e.g. in the Bell Labs Layered Space-Time Architecture (BLAST) [6]. However, such systems either require high detection complexity or suffer from error propagation like BLAST [7]. Therefore, in practice, transmit antenna diversity can also be useful if multiple receive antennas are applied.
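The claim that additional receive antennas only shift the capacity curve, without increasing its asymptotic slope, can be checked with a Monte-Carlo sketch (our own estimator; the 10% outage level and the SNR points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
nT, trials = 2, 100_000

def outage_capacity(snr_db, n_r, q=0.10):
    """Monte-Carlo 10% outage capacity of the STBC equivalent channel, R = 1."""
    snr = 10.0 ** (snr_db / 10.0)
    H = rng.normal(scale=np.sqrt(0.5), size=(trials, nT, n_r)) \
        + 1j * rng.normal(scale=np.sqrt(0.5), size=(trials, nT, n_r))
    g = np.sum(np.abs(H) ** 2, axis=(1, 2))  # chi-squared, 2*nT*n_r DoF
    c = np.log2(1.0 + snr / nT * g)          # transmit energy split over nT antennas
    return np.quantile(c, q)

# More receive antennas shift the curve, but the asymptotic slope stays
# 1 bit per 3 dB of SNR: the effective min(nT, nR) of the equivalent
# channel is always 1.
for n_r in (1, 2):
    slope = outage_capacity(33.0, n_r) - outage_capacity(30.0, n_r)
    assert 0.8 < slope < 1.2
```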
V. Concatenation of space-time block codes and an outer channel code

Space-time block codes are a pure diversity scheme which does not provide any coding gain. In order to also obtain a coding gain, the space-time block code has to be concatenated with an outer channel code (see Figure 11). Since the input symbols of a space-time block "encoder" are constellation elements of a higher order modulation, trellis-coded modulation (TCM) is well suited for this task. The space-time block code transforms the MIMO channel into a SISO channel according to Figure 8. Therefore, the same design criteria are relevant for the outer code as in transmission over SISO channels. Usually, channel coding is done in blocks. If the channel tap he of the equivalent channel model in Figure 8 is constant during transmission of a coded block, Ungerbock codes are suitable, which are designed for AWGN channels and maximize the free Euclidean distance. In fast fading environments, i.e. if the channel tap he is time-varying over a block, parameters like the effective length of the code and the minimum squared product distance are important [15]. Such TCM codes have been proposed by Schlegel and Costello for 8-PSK in [16]. Furthermore, in fast fading, an interleaver Πc should be used between the outer code and the space-time block code (see Figure 11). For 8-PSK, an Ungerbock code with 8 states exists which is optimum for AWGN channels as well as for fast fading channels [15]. This code, whose encoder is shown in Figure 12, will be used in this paper. Bandwidth efficient schemes with turbo codes, so-called turbo trellis-coded modulation, have been shown to perform close to the Shannon limit in AWGN channels. A sophisticated scheme for turbo-TCM was proposed by Robertson and Wörz in [3]. In AWGN channels, this scheme outperforms an alternative approach proposed in [17], which uses a binary punctured turbo code and Gray mapping on higher order modulation constellation points, by 0.5 dB in bit error rate and by 3
Figure 11. Concatenation of a space-time block code and an outer channel code.
Figure 12. Encoder of recursive systematic Ungerbock code with 8 states for 8-PSK.
dB in frame error rate. Therefore, we consider this scheme here for channel coding in a transmit antenna diversity system with space-time block codes.

A. Concatenation of Turbo-TCM and Space-Time Block Codes

Turbo-TCM as proposed by Robertson and Wörz in [3] uses Ungerbock codes in their recursive systematic form as component codes in a parallel concatenated coding scheme. The encoder is depicted in Figure 13, where the Ungerbock code according to Figure 12 is chosen as component code. The sequence of information bits is encoded by the first component code. The second component code operates on an interleaved version of the information bits. The permutation rule ΠT of the interleaver is crucial. Interleaving is performed on groups of bits rather than on individual bits. For M-ary modulation, b - 1 = log2 M - 1 bits are combined into a group dt. The interleaver then maps groups in even positions t, i.e. t = 0, 2, 4, ..., to even positions t', whereas groups in odd positions t, i.e. t = 1, 3, 5, ..., are mapped to odd positions. This permutation rule ensures that all systematic bits are contained in exactly one transmitted code symbol xt, since all code symbols of the first component code in odd positions t and all code symbols of the second component code in even positions t' are punctured. Therefore, only every other code symbol is transmitted and the turbo-TCM achieves the same rate of b - 1 bit/symbol as ordinary TCM. The code symbols xt of the turbo-TCM are interleaved according to the permutation rule Πc. The interleaved symbols xk are mapped to transmit symbols according to the space-time block code and transmitted over the MIMO channel. The decoding procedure at the receiver is illustrated in Figure 14. Usually, no a-priori information log Pa(xk) will be available for the code symbols. Then, after linear combining according to (18), we obtain from (24) for each transmitted code symbol xk the logarithmic probability log p(ỹk | xk), which, after deinterleaving with Πc⁻¹, is the channel information log p(ỹt | xt) of the turbo-TCM decoder. The component decoders are soft-in/soft-out MAP decoders with inputs for channel information and independent a-priori information log Pa(dt) about the group dt of b - 1 information bits which determine the trellis transition at time t.
Figure 13. Turbo-TCM with Ungerbock codes.
Figure 14. Decoding of the concatenation of a space-time block code and turbo-TCM with Ungerbock codes.
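The parity-preserving permutation rule ΠT and the alternating puncturing can be sketched as follows (the particular permutation here is random and ours; the chapter only fixes its even-to-even, odd-to-odd property):

```python
import random

random.seed(0)
L = 16  # number of information groups d_t (even, for a balanced toy example)

# Parity-preserving interleaver Pi_T: even positions t map to even positions
# t', odd positions map to odd positions.
even = [t for t in range(0, L, 2)]
odd = [t for t in range(1, L, 2)]
random.shuffle(even)
random.shuffle(odd)
# perm[t'] = original position of the group transmitted at interleaved time t'
perm = [even[t // 2] if t % 2 == 0 else odd[t // 2] for t in range(L)]

# Puncturing: encoder 1 transmits its code symbols at even t only; encoder 2
# (working on interleaved time t') transmits at odd t' only.
systematic_from_enc1 = {t for t in range(0, L, 2)}
systematic_from_enc2 = {perm[t2] for t2 in range(1, L, 2)}

# Every information group d_t is systematic in exactly one transmitted symbol
assert systematic_from_enc1 | systematic_from_enc2 == set(range(L))
assert systematic_from_enc1 & systematic_from_enc2 == set()
```

Because the permutation preserves the parity of the position index, the two punctured symbol streams partition the information groups exactly, which is what makes the rate equal to that of ordinary TCM.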
Due to puncturing, channel information is available only for every other code symbol of the component codes. The channel information for the punctured symbols is set to

log p(ỹt | xt) = 0,    for t odd for decoder 1, and for t even for decoder 2.
However, each code symbol is determined by its systematic part dt and a parity bit pt. The systematic part dt of the punctured code symbols of the first component code is included as the systematic part in the transmitted code symbols of the second component code. Therefore, for the first decoding step of the first component decoder, we can obtain a-priori information log Pa(dt), t = 1, 3, 5, ..., for the trellis transitions for which no channel information is available from the deinterleaved channel information for the transmitted code symbols of the second component code. This a-priori information is set to
where pt(2) is the parity bit of the second component code and the time index t refers to the deinterleaved sequence of code symbols. This a-priori information is justified by the assumption that all bit groups dt are equally likely and that for the parity bit pt(2) the values pt(2) = 0 and pt(2) = 1 are equally likely and statistically independent of dt. In this case, applying Bayes' rule, (48) follows from
For the other trellis transitions t = 0, 2,4, ..., all possible groups dt are assumed to be equally likely, i.e.
The MAP component decoder computes the a-posteriori information log P(dt | ỹ), which consists of three parts: channel information, a-priori information and extrinsic information. The a-priori information log Pa(dt) is subtracted symbolwise from the a-posteriori information log P(dt | ỹ). The resulting information is interleaved according to the permutation rule ΠT and passed to the a-priori input of the second component decoder. Consequently, at the a-priori input of the second component decoder, for the punctured symbols of the second component code, whose direct channel information is log p(ỹt | xt) = 0, we have channel information and extrinsic information via the first component decoder. For the transmitted symbols, only the extrinsic information of the first component decoder is used as a-priori information. After decoding of the second component code, the a-priori information log Pa(dt') is subtracted from the
a-posteriori information log P(dt' | ỹ) and the result is fed back to the first component decoder as a-priori information for the next decoding step. Several iterations can be performed before the a-posteriori information of one of the component decoders is used as the final result. Finally, we want to discuss whether an improvement can be obtained by a turbo feedback from the turbo-TCM decoder to the space-time block decoder. Using fed-back information in iterative decoding can only improve the performance if extrinsic information is generated. The extrinsic information about a symbol is the information obtained from the information about the other symbols in a block. From Figure 14 it is obvious that the a-priori information log Pa(xk) of the space-time block decoder affects the a-posteriori information of the symbol xk only, since due to the orthogonality of the space-time block code, the detection of the symbols is decoupled. Therefore, no extrinsic information is obtained, and the effect of a feedback from the turbo-TCM decoder to the space-time block decoder would be the same as a feedback to the input of the turbo-TCM decoder itself. Clearly, this cannot improve the performance, due to multiple use of the same information. Again, we note that the term space-time block "code" for the diversity scheme is misleading. The decoding complexity of the concatenated scheme is mainly determined by the decoding complexity of the outer code. The space-time block code provides diversity at virtually no additional complexity. Of course, it is also possible to design special codes for turbo space-time coded modulation. Besides its good performance, the advantage of the serial concatenation of a channel code and a space-time block code proposed in this paper is the low detection complexity. Furthermore, standard channel coding methods can be applied.
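The a-priori initialization (48), which marginalizes the unknown parity bit of the second component code, can be illustrated with a toy numeric sketch (the 8-PSK index mapping, the noise variance and the received value below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy setting: groups d_t of b-1 = 2 bits, 8-PSK symbols x(d, p) indexed by
# (group, parity bit). This natural index mapping is ours, not the chapter's.
M = 8
const = np.exp(2j * np.pi * np.arange(M) / M)  # 8-PSK constellation
def x(d, p):
    return const[2 * d + p]

sigma2 = 0.5                   # noise variance per real dimension (arbitrary)
y = x(2, 1) + 0.1 + 0.05j      # combined received value, near symbol x(2, 1)

def density(y, s):
    # Gaussian channel density up to a constant factor
    return np.exp(-abs(y - s) ** 2 / (2 * sigma2))

# A-priori information for group d: marginalize the parity bit p, assumed
# equally likely 0/1 and statistically independent of d, as in (48).
pa = np.array([0.5 * (density(y, x(d, 0)) + density(y, x(d, 1)))
               for d in range(4)])
pa /= pa.sum()                 # normalize to probabilities

assert np.isclose(pa.sum(), 1.0)
assert pa.argmax() == 2        # the transmitted group is the most likely
```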
This makes the system very flexible, since transmit antenna diversity can easily be switched off if only one transmit antenna is available, without requiring a modification of the channel coding scheme. This is also an advantage for the successive introduction of transmit antenna diversity in existing mobile systems.

B. Simulation Results

Simulations were carried out for transmit antenna diversity with nT = 2 transmit antennas and the space-time block code B2 according to (8). For channel coding we used the 8-PSK Ungerbock code with 8 states according to Figure 12 as well as turbo-TCM with the same Ungerbock code as component code. Therefore, the transmission rate is R = 2 bit/channel use. For decoding we used the BCJR-MAP algorithm [18] and assumed that the receiver has perfect knowledge of the channel coefficients. All interleavers are pseudo-random. The block size was 1000 information bits, i.e. L = 500 code symbols of the outer code, which is reasonable for a mobile radio system. For larger blocks, we expect higher turbo gains due to better interleaving. Figure 15 shows the frame error rate Pf for quasi-static flat fading, where a frame error means one or more bit errors. It is a typical phenomenon for turbo
Figure 15. Frame error rate of the concatenation of a space-time block code (STBC) with an outer channel code (TCM) or Turbo-TCM (TTCM). nT = 2 transmit antennas, nR receive antennas, 8-PSK, frame size L = 500 symbols, quasi-static flat fading.
decoding in fading channels that only a marginal additional gain is obtained if more than one or two iterations are performed. This is also true for large blocks of e.g. 10000 information bits. The turbo gain after two iterations is about 1.5 dB. The diversity gain due to the space-time block code compared to turbo-TCM without transmit antenna diversity (TTCM), i.e. nT = nR = 1, is obvious. For comparison we included simulation results for the concatenation of ordinary TCM with a space-time block code (TCM + STBC) in Figure 15. Similar results are obtained as for turbo-TCM without iterations. The 10% outage capacity for nR = 1 receive antenna is 2 bit/channel use at Eb/N0 = 7.3 dB. Therefore, at a frame error rate of 10%, the concatenated scheme performs within 2 dB from the 10% outage capacity. For nR = 2, the 10% outage capacity is at 2.3 dB and the concatenated scheme performs within 2.9 dB from this limit. In quasi-static flat fading, no gain in BER is observed after turbo iterations. This is due to the fact that the BER curves of turbo-TCM in AWGN channels show a relatively small waterfall region. For very low SNR, no improvement is obtained by iterations. For high SNR, the BER is small even without iterations. In quasi-static fading, each block is transmitted over an AWGN channel with different SNR. Only a
small fraction of the blocks is transmitted in the SNR range where turbo iterations are very effective. Therefore, virtually no improvement due to turbo iterations is observed in the average BER over all blocks. In fast fading with a classical Doppler spectrum and a product fdTs = 0.01 of maximum Doppler shift and symbol duration, however, we observe a significant gain after one or two turbo iterations also in the BER (see Figure 16). Due to the time variance of the channel during transmission of a coded block, an additional diversity effect is obtained by the interleaver ΠT between the turbo-TCM component codes and the interleaver Πc between the outer channel code and the space-time block code. Here, the slope of the curve for the concatenation of a space-time block code and turbo-TCM is steeper than for the concatenation of a space-time block code and ordinary TCM, due to the diversity effect of the interleaver ΠT between the component codes. Therefore, in fast fading environments, a parallel concatenated outer coding scheme is useful even if no turbo iterations are performed. Similar results are obtained for quasi-static fading where coding is done across bursts, i.e. if frequency hopping is applied.
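A time-varying tap with approximately classical Doppler spectrum, as used in these simulations, can be generated with a standard sum-of-sinusoids (Jakes-type) construction; the sketch below is our own and all parameter choices are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
fdTs = 0.01  # maximum Doppler shift times symbol duration, as in the text
N = 64       # number of sinusoids (our choice)
L = 5000     # samples, i.e. several coded blocks

# Each of N paths arrives from a random angle theta, contributing a Doppler
# shift fd*cos(theta); the superposition approximates Rayleigh fading with
# a classical (Jakes) Doppler spectrum.
theta = rng.uniform(0.0, 2.0 * np.pi, N)
phi = rng.uniform(0.0, 2.0 * np.pi, N)
t = np.arange(L)
h = np.sum(np.exp(1j * (2.0 * np.pi * fdTs * np.cos(theta)[:, None] * t
                        + phi[:, None])), axis=0) / np.sqrt(N)

# Roughly unit mean power ...
power = np.mean(np.abs(h) ** 2)
assert 0.7 < power < 1.3
# ... and clear decorrelation within a block: at lag 100 (fd*tau = 1) the
# classical spectrum gives a small correlation, roughly |J0(2*pi)|.
rho = abs(np.mean(h[:-100] * np.conj(h[100:]))) / power
assert rho < 0.9
```

This decorrelation within a coded block is what the interleavers ΠT and Πc exploit to obtain the additional diversity effect described above.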
Figure 16. Bit error rate of the concatenation of a space-time block code (STBC) with an outer channel code (TCM) or Turbo-TCM (TTCM). nT = 2 transmit antennas, nR receive antennas, 8-PSK, L = 500, flat fading (fdTs = 0.01).
In summary, in addition to transmit antenna diversity, substantial benefits can be obtained by the concatenation of a space-time block code and turbo-TCM, as long as the channel is time varying during transmission of a coded block or frequency hopping is applied.
VI. Space-time block codes in frequency-selective channels

A. MAP-Equalization of Space-Time Block Codes

Due to time dispersion, the orthogonality of space-time block codes is lost in heavily frequency-selective channels. In CDMA systems, the combining can be done in the fingers of a Rake receiver. In TDMA/FDMA systems, an equalizer has to be applied. In this paper, we consider only the optimum MAP equalizer, which makes use of the dependencies of the signals transmitted from different antennas, but does not exploit the orthogonal structure of the space-time block codes. The orthogonality could be exploited in linear or DFE equalizer structures. The MAP equalizer directly computes the a-posteriori probabilities log P(xt | ỹ). Therefore, no separate combining is necessary. As an example, Figure 17 shows the channel model with memory D = 4 per transmit antenna for the space-time block code B2 according to (8) with nT = 2 transmit antennas. In transmit antenna diversity with space-time block codes, K information symbols are transmitted from nT antennas in P successive time slots according to the mapping rule of the generalized complex orthogonal design BnT. If the channel memory D is a multiple of P, we can assign P time slots to a transition in the equalizer trellis, such that a trellis transition is determined by a complete space-time block code matrix and a state is determined by K · D/P information symbols xt. Then, for M-ary modulation, the trellis has only M^(K·D/P) states. Consequently, for a space-time block code of rate R = 1, the equalizer complexity, measured as the number of trellis states, corresponds to the complexity in the single transmit antenna case. In contrast to other multiple transmit antenna schemes, such as space-time trellis codes or BLAST, transmit antenna diversity does not increase the equalizer complexity here. However, the channel memory D taken into account by the equalizer should be a multiple of P, since otherwise the equalizer would have to consider as many states as for the next multiple of P. The metric increment for the trellis transition at time k from state s' to state s of a soft-in/soft-out equalizer, such as the BCJR-MAP or Viterbi algorithm, is calculated according to
Figure 17. Frequency-selective channel model for transmission with the space-time block code B2
where the a-priori information can be obtained from the feedback of an outer decoder.
B. Double Iterative Scheme for Turbo-Equalization and Turbo-Decoding of Space-Time Block Codes with Turbo-TCM

If turbo-TCM is used for outer channel coding, the concatenated scheme can be detected in a double iterative process including turbo-equalization and turbo-decoding (see Figure 18). For turbo-equalization [19, 20], the turbo-TCM decoder also needs to deliver a-posteriori information about the code symbols xt. The puncturing rule has to be taken into account in the calculation of the extrinsic information which is fed back to the equalizer. Therefore, first the a-posteriori information about the code symbols with odd time index t is punctured at the output of the first component decoder. At the output of the second component decoder, the a-posteriori information about the code symbols with even time index is punctured. After at least one decoder iteration, channel information and extrinsic information of both component decoders is included in the a-posteriori information of both component decoders. Therefore, subtracting the channel information stemming from the equalizer from the a-posteriori information of the component decoders yields the extrinsic information which has to be fed back to the a-priori input of the equalizer. The best performance is achieved if some iterations of the outer turbo-TCM decoder are performed before extrinsic information is fed back to the equalizer for a turbo-equalizer iteration. However, if each of the component decoders is decoded only once before a new equalizer iteration is performed, the information fed back for the code symbols of the first component decoder includes the extrinsic information of the first component decoder only, but no extrinsic information of the second component decoder.

C. Simulation Results

Simulation results for the bit error rate and frame error rate of the double iterative scheme are depicted in Figures 19 and 20.
We consider a quasi-static frequency-selective channel with three independent complex Gaussian taps of equal mean energy per transmit antenna, i.e. D = 2, and nR = 1 receive antenna. The BCJR-MAP algorithm is used for both equalization and decoding. All other parameters are chosen as in Section V-B. For comparison, results for the double iterative scheme without transmit antenna diversity, i.e. nT = 1, are also included. Three iterations of the turbo-TCM decoder are performed each time before extrinsic information is fed back to the equalizer. The turbo-TCM decoder converges after two or three decoder iterations. For nT = 2 transmit antennas, a gain of 0.8 dB in BER and 1.3 dB in PER is obtained due to turbo-TCM decoding. A further improvement can be achieved by feedback to the equalizer. After two equalizer iterations, the gain increases to 1.3 dB in BER and 2.3 dB in PER,
Figure 18. Double iterative scheme for turbo-equalization and turbo-TCM decoding.
Figure 19. Bit error rate of the double iterative scheme. nR = 1, D = 2, 1000 information bits per block.

respectively. Again, we observe that a small number of iterations is sufficient in fading environments. The comparison of the curves with and without transmit antenna diversity shows that, at least for moderate channel memory, a significant diversity gain is obtained by the space-time block code also in frequency-selective channels.
VII. Conclusions We considered coded transmission over fading MIMO channels with transmit diversity. We gave a new view on space-time block codes and showed that basically the space-time block code together with linear combining at the receiver provides a transformation of the fading MIMO channel to an equivalent SISO channel with a significantly smaller probability of deep fades. Using the equivalent channel model we gave analytical results on the BER. Furthermore, we described the concatenation of Turbo-TCM with a space-time block code and showed that in addition to the transmit diversity substantial benefits can be obtained by turbo iterations as long as the channel is time-varying during transmission of a coded block or frequency hopping is applied. The separate design of transmit diversity and channel coding has the advantage of flexibility.
Figure 20. Frame error rate of the double iterative scheme. nR = 1, D = 2, 1000 information bits per block.
The system can easily be adapted to different numbers of transmit antennas. Furthermore, standard channel coding techniques can be applied. Finally, in frequency-selective MIMO channels, space-time block codes provide a significant diversity gain. In this case, the concatenation of a space-time block code and turbo-TCM can be detected in a double iterative scheme of turbo-equalization and turbo-decoding.

REFERENCES

[1] ALAMOUTI (S.), "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, Special Issue on Signal Processing for Wireless Communications, 1998.
[2] TAROKH (V.), JAFARKHANI (H.), CALDERBANK (A.R.), "Space-time block codes from orthogonal designs," IEEE Transactions on Information Theory, vol. 45, n° 5, pp. 1456-1467, June 1999.
[3] ROBERTSON (P.), WÖRZ (T.), "A novel bandwidth efficient coding scheme employing turbo codes," in International Conference on Communications (ICC), IEEE, 1996, pp. 962-967.
Turbo processing in transmit antenna diversity systems
175
[4] FOSCHINI (G.J.), GANS (M.J.), "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Communications, vol. 6, pp. 311-335, 1998.
[5] TELATAR (E.), TSE (D.N.C.), "Capacity and mutual information of wideband multipath fading channels," IEEE Transactions on Information Theory, vol. 46, n° 4, pp. 1384-1400, July 2000.
[6] FOSCHINI (G.J.), "Layered space-time architecture for wireless communication in a fading environment when using multiple antennas," Bell Labs Technical Journal, vol. 1, n° 2, pp. 41-59, 1996.
[7] BARO (S.), BAUCH (G.), PAVLIC (A.), SEMMLER (A.), "Improving BLAST performance using space-time block codes and turbo decoding," in GLOBECOM 2000, IEEE, November 2000, pp. 1067-1071.
[8] BJERKE (B.A.), PROAKIS (J.G.), "Multiple-antenna diversity techniques for transmission over fading channels," in Wireless Communications and Networking Conference (WCNC), September 1999, vol. 1, pp. 261-265.
[9] TAROKH (V.), JAFARKHANI (H.), CALDERBANK (A.R.), "Space-time block coding for wireless communications: Performance results," IEEE Journal on Selected Areas in Communications, vol. 17, n° 3, pp. 451-460, March 1999.
[10] RADON (J.), "Lineare Scharen orthogonaler Matrizen," Abhandlungen aus dem Mathematischen Seminar der Hamburgischen Universität, vol. 1, pp. 1-14, 1922.
[11] TIRKKONEN (O.), HOTTINEN (A.), "Complex space-time block codes for four Tx antennas," in GLOBECOM, IEEE, November/December 2000, pp. 1005-1009.
[12] PROAKIS (J.G.), Digital Communications, McGraw-Hill, New York, third edition, 1995.
[13] PROAKIS (J.G.), "Probabilities of error for adaptive reception of M-phase signals," IEEE Transactions on Communications, vol. COM-16, pp. 71-81, February 1968.
[14] KAMMEYER (K.), Nachrichtenübertragung, Informationstechnik, Teubner, Stuttgart, second edition, 1996 (in German).
[15] JAMALI (S.H.), LE-NGOC (T.), Coded-Modulation Techniques for Fading Channels, Kluwer Academic Publishers, Boston, 1994.
[16] SCHLEGEL (C.), COSTELLO (D.J.), "Bandwidth efficient coding for fading channels: Code construction and performance analysis," IEEE Journal on Selected Areas in Communications, vol. SAC-7, pp. 1356-1368, December 1989.
[17] LE GOFF (S.), GLAVIEUX (A.), BERROU (C.), "Turbo-codes and high spectral efficiency modulation," in International Conference on Communications (ICC), IEEE, May 1994, pp. 645-649.
[18] BAHL (L.R.), COCKE (J.), JELINEK (F.), RAVIV (J.), "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. IT-20, pp. 284-287, March 1974.
[19] DOUILLARD (C.), JEZEQUEL (M.), BERROU (C.), PICART (A.), DIDIER (P.), GLAVIEUX (A.), "Iterative correction of intersymbol interference: Turbo-equalization," European Transactions on Telecommunications, vol. 6, n° 5, pp. 507-511, September-October 1995.
[20] BAUCH (G.), KHORRAM (H.), HAGENAUER (J.), "Iterative equalization and decoding in mobile communications systems," in The Second European Personal Mobile Communications Conference (2. EPMCC '97) together with 3. ITG-Fachtagung "Mobile Kommunikation", VDE/ITG, September/October 1997, pp. 307-312.
[21] ALAMOUTI (S.), TAROKH (V.), POON (P.), "Trellis-coded modulation and transmit diversity: Design criteria and performance evaluation," in ICUPC, October 1998.
[22] BAUCH (G.), AL-DHAHIR (N.), "Iterative equalization and decoding with channel shortening filters for space-time coded modulation," in IEEE Vehicular Technology Conference (VTC), September 2000, pp. 1575-1582.
[23] BAUCH (G.), AL-DHAHIR (N.), "Reduced-complexity turbo equalization with multiple transmit and receive antennas over multipath fading channels," in Conference on Information Sciences and Systems (CISS), March 2000, pp. WP3-13-WP3-18.
[24] BAUCH (G.), "Concatenation of space-time block codes and "turbo"-TCM," in International Conference on Communications (ICC), IEEE, June 1999, pp. 1202-1206.
[25] BARO (S.), BAUCH (G.), HANSMANN (A.), "Improved codes for space-time trellis coded modulation," IEEE Communications Letters, vol. 4, n° 1, pp. 20-22, January 2000.
[26] BAUCH (G.), FRANZ (V.), "A comparison of soft-in/soft-out algorithms for "turbo-detection"," in International Conference on Telecommunications (ICT), June 1998.
[27] BERROU (C.), GLAVIEUX (A.), THITIMAJSHIMA (P.), "Near Shannon limit error-correcting coding and decoding: Turbo-codes (1)," in International Conference on Communications (ICC), IEEE, May 1993, pp. 1064-1070.
[28] BAUCH (G.), NAGUIB (A.), SESHADRI (N.), "MAP equalization of space-time coded signals over frequency selective channels," in Wireless Communications and Networking Conference (WCNC), September 1999.
[29] GLAVIEUX (A.), LAOT (C.), LABAT (J.), "Turbo-equalization over a frequency selective channel," in International Symposium on Turbo Codes, ENST de Bretagne, September 1997, pp. 96-102.
[30] HAGENAUER (J.), "The turbo principle: Tutorial introduction and state of the art," in International Symposium on Turbo Codes, ENST de Bretagne, September 1997, pp. 1-11.
[31] HAGENAUER (J.), OFFER (E.), PAPKE (L.), "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. IT-42, pp. 429-445, March 1996.
[32] PICART (A.), DIDIER (P.), GLAVIEUX (A.), "Turbo-detection: A new approach to combat channel frequency selectivity," in International Conference on Communications (ICC), IEEE, June 1997, pp. 1498-1502.
[33] ROBERTSON (P.), "An overview of bandwidth efficient turbo coding schemes," in International Symposium on Turbo Codes, 1997, pp. 103-110.
[34] ROBERTSON (P.), WÖRZ (T.), "Coded modulation scheme employing turbo codes," Electronics Letters, vol. 31, n° 18, pp. 1546-1547, August 1995.
[35] ROBERTSON (P.), WÖRZ (T.), "Extensions of turbo trellis coded modulation to high bandwidth efficiencies," in International Conference on Communications (ICC), IEEE, 1997, pp. 1251-1255.
[36] RAPHAELI (D.), ZARAI (Y.), "Combined turbo equalization and turbo decoding," in International Symposium on Turbo Codes, 1997, pp. 180-183.
[37] TAROKH (V.), NAGUIB (A.), SESHADRI (N.), CALDERBANK (A.R.), "Space-time codes for high data rate wireless communication: Performance criteria in the presence of channel estimation errors, mobility and multiple paths," IEEE Transactions on Communications, vol. 47, n° 2, pp. 199-207, February 1999.
[38] WITTNEBEN (A.), "A new bandwidth efficient transmit antenna modulation diversity scheme for linear digital modulation," in International Conference on Communications (ICC), IEEE, 1993, pp. 1630-1633.
Biographies

Gerhard Bauch received the Dipl.-Ing. and Dr.-Ing. degrees in electrical engineering from Munich University of Technology in 1995 and 2001, respectively. In 1996 he was a research assistant at the German Aerospace Research Establishment (DLR), Oberpfaffenhofen. Since 1996 he has been with the Institute for Communications Engineering, Munich University of Technology. In the summers of 1998 and 1999 he visited AT&T Labs Research, Shannon Laboratory, Florham Park, NJ, USA, working on space-time codes. His research interests include turbo codes, equalization and space-time processing.

Joachim Hagenauer received the Dr.-Ing. degree in electrical engineering in 1974 from the Technical University of Darmstadt, where he served as an Assistant Professor. From May 1975 to September 1976 he held a postdoctoral fellowship at the IBM T.J. Watson Research Center, Yorktown Heights, NY, USA. Since 1977 he has been with the German Aerospace Research Establishment (DLR), Oberpfaffenhofen. During 1986-1987 he spent a sabbatical year as Otto Lilienthal Fellow at AT&T Bell Laboratories, Crawford Hill, NJ. Since 1993 he has been a Full Professor at the Munich University of Technology. He is currently President of the IEEE Information Theory Society.

Nambi Seshadri received the B.E. degree in electronics and communication from the University of Madras, India, in 1982, and the M.S. and Ph.D. degrees in computer and systems engineering from Rensselaer Polytechnic Institute, Troy, New York, in 1984 and 1986, respectively. He was a Department Head at AT&T Labs Research, NJ, and joined Broadcom Corp. in 1999.
Chapter 9
Quantum factor graphs

M.G. Parker
Institutt for Informatikk, University of Bergen, Norway
I. Introduction

Recent interest in Turbo Codes [2] and Low Density Parity Check Codes [4, 6] has fuelled the development of Factor Graphs and the associated Sum-Product Algorithm (SPA) [5, 1], with applications to error-correction, signal processing, statistics, neural networks, and system theory. Meanwhile the possibility of Quantum Computing has sparked much interest [9, 10], and Quantum Bayesian Nets have been proposed to help analyse and design Quantum Computers [12, 11]. This paper links these areas of research, showing that quantum resources can achieve maximum-likelihood (ML) decoding of classical information. The natural Hilbert space of a quantum particle encodes a probability vector, and the joint-state of quantum particles realises the "products" associated with SPA. SPA summary is omitted, as quantum bits (qubits) naturally encode the total joint-probability state. Dependencies between vector indices become "entanglement" in quantum space, with the Factor Graph defining dependency (entanglement) between qubits. Graph function nodes are implemented as unitary matrix-vector products1, followed by quantum measurement. This is the Quantum Product Algorithm (QPA). As QPA avoids summary, it avoids the problems encountered by SPA on graphs with short cycles. Moreover, whereas SPA is iterative, using message-passing and activating each node more than once, QPA does not iterate but need only activate each node successfully once. However, the (severe) drawbacks of QPA are as follows: 1) Each function node must be repeatedly activated until it successfully "prepares" its local variable nodes (qubits) in the correct entangled state; any activation failure destroys evolution in all variable nodes already entangled with the local variables. 2) Once a complete Factor Graph has successfully evolved, final quantum measurement only delivers the ML codeword with a certain (largest) probability.
Repeated successful evolutions then determine the ML codeword to within any degree of confidence. This second drawback can be overcome by suitable "ML Amplification" of the QPA output, prior to measurement.

1. "Unitary" means that U satisfies UU† = I, where "†" means "conjugate transpose".
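Drawback 1 is statistically simple: if each activation of a function node succeeds independently with some probability p (fixed, as shown later in the paper, by the node's input state), then the number of activations needed for one success is geometrically distributed with mean 1/p. A minimal simulation sketch of this retry process (the function name and the value of p are illustrative, not from the paper):

```python
import random

def activations_until_success(p, rng):
    """Count activations of a function node until its ancilla measures 0."""
    n = 1
    while rng.random() >= p:  # ancilla measured as 1: activation failed, retry
        n += 1
    return n

rng = random.Random(0)
p = 0.48  # illustrative per-activation success probability
mean = sum(activations_until_success(p, rng) for _ in range(100_000)) / 100_000
print(round(mean, 2))  # close to the geometric mean 1/p = 2.08...
```

The destructive part of the drawback is not captured here: a failure also resets any neighbouring evolution already entangled with the node, so on a full graph the cost compounds beyond 1/p per node.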
Section II presents QPA, highlighting its ability to deliver the optimal output joint-state, unlike SPA. Quantum systems describe the exact joint-state by appropriate "entanglement" with and measurement of ancillary qubits. Section III considers a simple example of QPA on Quantum Factor Graphs, showing that iteration on graphs with cycles is unnecessary because QPA avoids premature summary. Section IV shows how to amplify the likelihood of measuring the ML codeword from QPA output. Unfortunately, QPA must be repeated many times and/or executed in parallel to have a hope of successful completion. Suitable distributed QPA scheduling is discussed in Section V, and it is argued that successful QPA completion is conceivable using asynchronous distributed processing on many-node Factor Graphs. This paper does not deal with phase properties of quantum computers. It is expected that the inclusion of phase and non-diagonal unitary matrices will greatly increase functionality of the Quantum Factor Graph. The aim of this paper is not to propose an immediately realisable implementation of a quantum computer. Rather, it is to highlight similarities between graphs for classical message-passing, and graphs that "factor" quantum computation. The paper also highlights the differences between the two graphs: whereas classical graphs can only ever compute over a tensor product space, the quantum graph can compute over the complete entangled (tensor-irreducible) space.
II. The Quantum Product Algorithm (QPA)

II.1. Preliminaries

Consider the Factor Graph of Fig 1.
Figure 1. Two-Qubit Factor Graph.
Let Uf = diag(f0, f1, f2, f3) and Ug = diag(g0, g1, g2, g3), where |gk|² = 1 − |fk|², fk*gk + fkgk* = 0, ∀k, and |fk| ≤ 1. "*" means complex conjugate. Let Ufg be the 2 × 2 block matrix

Ufg = ( Uf  Ug )
      ( Ug  Uf )

Ufg is unitary, and the Uf of Fig. 1 and subsequent figures always implies the action of Ufg together with the measurement of an ancillary qubit, z, as described below. A qubit, xi, can be in states 0 or 1 or in a statistical
superposition of 0 and 1. Let qubits x0, x1 be initialised (by the black boxes) to states x0 = (α0, β0)T and x1 = (α1, β1)T, where the αi, βi are complex amplitudes such that |αi|² + |βi|² = 1. For instance, x0 is in states 0 and 1 with probabilities |α0|² and |β0|², respectively. Let an ancillary qubit, z, be initialised to state 0, i.e. z = (1, 0)T. Then the initial joint probability product-state of qubits x0, x1, z, formed with the tensor product "⊗" and indexed so that z is the most significant bit and x0 the least significant, is A = (α0α1, β0α1, α0β1, β0β1, 0, 0, 0, 0)T = (s0, s1, s2, s3, 0, 0, 0, 0)T, where Σk |sk|² = 1. The element at vector index v is the amplitude whose magnitude-squared is the probability that the qubits are in joint-state v. For instance, qubits x0x1z are in joint-state 010 with probability |s2|². Subsequent measurement of a subset of the qubits projects the measured qubits to a fixed substate with a certain probability, and "summarises" the vector for the remaining non-measured qubits. Thus QPA is as follows:

• Compute S = UfgA. Measure qubit z. With probability pf = Σk |skfk|² we collapse z to 0, and x0, x1 to joint-state Sf = (1/√pf)(s0f0, s1f1, s2f2, s3f3)T. With probability pg = Σk |skgk|² we collapse z to 1, and x0, x1 to joint-state Sg = (1/√pg)(s0g0, s1g1, s2g2, s3g3)T. 1/√pf and 1/√pg are normalisation constants, and pf + pg = 1. Sf is our desired QPA result. Successful QPA completion is self-verified when we measure z = 0.

In contrast, classical SPA computes Sf = UfA (with probability 1) and must then perform a subsequent "summary" step on Sf before returning a result for each variable separately: each variable's result is the marginal obtained by summing the magnitude-squareds of Sf over the states of the other variable. For instance, for x0 = 0 we sum the two classical2 probabilities of Sf where x0 = 0 to get |s0f0|² + |s2f2|². Similarly, for x0 = 1 we summarise to |s1f1|² + |s3f3|². It is in this sense that SPA is a "tensor-approximation" of QPA. We identify the following successively accurate computational scenarios (decoding modes) for a space of N binary-state variables:

• Hard-Decision operates on a probability space, (α0, β0)T ⊗ (α1, β1)T ⊗ ... ⊗ (αN−1, βN−1)T, α, β ∈ {0, 1}
• Soft-Decision operates on a probability space, (α0, β0)T ⊗ (α1, β1)T ⊗ ... ⊗ (αN−1, βN−1)T, α, β ∈ {Real numbers 0 → 1}

2. Classical SPA probabilities in this paper are always represented as the magnitude-squared of their quantum counterparts.
• Quantum Soft-Decision operates on a probability space, (α0, β0)T ⊗ (α1, β1)T ⊗ ... ⊗ (αN−1, βN−1)T, α, β ∈ {Complex numbers}
• Entangled-Decision operates on a probability space, (s0, s1, s2, ..., s2^N−1)T, s ∈ {Complex numbers}

All four of the above decision modes satisfy the probability restriction that the sum of the magnitude-squareds of the vector elements is 1. Both Quantum Soft-Decision and Entangled-Decision make use of the natural quantum statistical properties of matter, including the property of superposition. Moreover, Entangled-Decision operates over an exponentially larger space. Classical SPA operates in Soft-Decision mode. QPA operates in Entangled-Decision mode. In the previous discussion it was assumed that QPA was operating on input of the form (α0, β0)T ⊗ (α1, β1)T ⊗ (1, 0)T. More generally, QPA can operate on input and deliver output in Entangled-Decision mode. This is in strong contrast to SPA, which must summarise both input and output down to Soft-Decision mode. It is this approximation that forces SPA to iterate, and to sometimes fail on graphs with cycles. Consider the following example. If the diagonal of Uf is (1, 0, 0, 1), then Uf represents XOR, and Fig 1 decodes to codeset C = {00, 11} (i.e. x0 + x1 = 0, mod 2). C has distance 2, which is optimal for length-2 binary codes: in general, if Uf cannot be tensor-decomposed then it represents a code C with good distance properties. Initially, let x0 = (√0.4, √0.6)T, x1 = (√0.6, √0.4)T. Then A = (√0.24, 0.6, 0.4, √0.24, 0, 0, 0, 0)T, and Sf = (1/√2)(1, 0, 0, 1)T. pf = 0.48, so, on average, 48 Sf outputs are computed for every 100 QPA attempts. The ML codeword is both 00 and 11, and when Sf is measured, 00 and 11 are equally likely to be returned. In contrast, classical SPA for the same input returns x0 = x1 = (1/√2, 1/√2)T, implying (wrongly) an equally likely decode to any of the words 00, 01, 10, 11. So even in this simplest example the advantage of QPA over SPA is evident.
II.2. Product Space for Classical SPA

Because x0 and x1 are separated in Fig 1, their classical joint-state can only represent tensor product states (Soft-Decision mode). An equivalent Factor Graph to that of Fig 1 could combine x0 and x1 into one quaternary variable, which would reach all non-product quaternary states. But this requires "thickening" of graph communication lines and an exponential increase in SPA computational complexity. Consequently, only limited variable "clustering" is desirable, although too little clustering "thins out" the solution space to an insufficient, highly-factored product
space. This is the fundamental Factor Graph trade-off: good Factor Graphs achieve efficient SPA by careful variable "separation", ensuring the joint product space is close enough to the exact (non-summarised) non-product space.

II.3. Entangled Space for QPA

In contrast, although x0 and x1 are physically separated in Fig 1, quantum non-locality must take into account correlations between x0 and x1. Their joint-state now occurs over the union of product and (much larger) non-product (entangled) space (Entangled-Decision mode). An entangled joint-state vector cannot be tensor-factorised over constituent qubits. QPA does not usually output to product space, because the joint-state of output qubits is usually entangled. In fact QPA is algorithmically simpler than SPA, as SPA is a subsequent tensor approximation of QPA output at each local function.

II.4. Example

Let the diagonal of Uf be (1, 0, 0, 1). Initialise x0 and x1 to the joint-product-state x0 = (1/√3)(1,
√2)T, x1 = (1/√2)(1, 1)T. With probability pf = 0.5, QPA measures z = 0 and computes the joint-state of x0, x1 as Sf = (1/√3)(1, 0, 0, √2)T. A final measurement of qubits x0 and x1 yields codewords 11 and 00 with probability 2/3 and 1/3, respectively. In contrast, SPA summarises Sf to x0 = x1 = (1/√3)(1, √2)T. Although a final "hard-decision" on x0 and x1 chooses, correctly, the ML codeword x0 = x1 = 1, the joint-product-state output, (1/√3)(1, √2)T ⊗ (1/√3)(1, √2)T = (1/3)(1, √2, √2, 2)T, assigns, incorrectly, a non-zero probability to words 01 and 10.

II.5. A Priori Initialisation

To initialise x0 to (α0, β0)T, we again use QPA. Let the diagonal of Uf (for the left-hand black box of Fig 1) be (α0, β0). Then the diagonal of Ug may be taken as

±i (α0√(1 − |α0|²)/|α0|, β0√(1 − |β0|²)/|β0|).

Measurement of z = 0 initialises x0 to
(α0, β0)T, and this occurs with probability pf = 0.5. x1 is initialised likewise.

II.6. Comments

The major drawback of QPA is the significant probability of QPA failure, occurring when z is measured as 1. This problem is amplified for larger Quantum Factor Graphs, where a different z is measured at each local function; QPA evolution failure at a function node not only destroys the states of variables connected
with that function, but also destroys all states of variables entangled with those variables. QPA is more likely to succeed when input variable probabilities are already skewed somewhat towards a valid codeword. Section III shows how QPA can operate successfully even when SPA fails.
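Before moving to graphs with cycles, the two-qubit XOR example of Section II is small enough to check numerically. A sketch (assuming NumPy; x0 is taken as the least significant index bit, as in the text):

```python
import numpy as np

# Initial amplitudes of the two qubits.
x0 = np.array([np.sqrt(0.4), np.sqrt(0.6)])
x1 = np.array([np.sqrt(0.6), np.sqrt(0.4)])
# s[2*b1 + b0] = amplitude of (x1 = b1, x0 = b0).
s = np.array([x0[0]*x1[0], x0[1]*x1[0], x0[0]*x1[1], x0[1]*x1[1]])

f = np.array([1, 0, 0, 1])           # diagonal of U_f: XOR check, C = {00, 11}
pf = np.sum(np.abs(s * f) ** 2)      # probability of measuring z = 0
Sf = (s * f) / np.sqrt(pf)           # entangled QPA output on x0, x1

print(round(pf, 2))                  # 0.48
print(Sf.round(4))                   # (1/sqrt(2)) * (1, 0, 0, 1)

# Classical SPA summary for x0: sum |.|^2 over the states of x1.
p_x0 = np.array([abs(Sf[0])**2 + abs(Sf[2])**2,
                 abs(Sf[1])**2 + abs(Sf[3])**2])
print(p_x0)                          # [0.5 0.5]: the 00/11 correlation is lost
```

The last line makes the "tensor-approximation" concrete: the entangled output Sf contains only the codewords 00 and 11, while the summarised marginals are indistinguishable from a uniform distribution over all four words.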
III. Quantum Product Algorithm on Factor Graphs with Cycles

This section shows that graph cycles do not compromise QPA performance. Consider the Factor Graph of Fig 2. Functions Uf0 and Uf1 are both 8 × 8 XOR diagonal matrices with diagonal elements (10010110). Acting on the combined four-qubit space x0, x1, x2, x3, they are the functions Uf0 ⊗ I2 and I2 ⊗ Uf1, respectively, with diagonal elements (1001011010010110) and (1100001100111100), respectively, where I2 is the 2 × 2 identity matrix. QPA on Fig 2 performs the global function UF = (Uf0 ⊗ I2)(I2 ⊗ Uf1) on the four-qubit space, with diagonal elements (1000001000010100), forcing output into codeset C = {0000, 0110, 1011, 1101}. Functions Uf0, Uf1, and UF "sieve" the input joint-state, where UF is the combination of two "sub-sieves", Uf0 and Uf1. QPA iteration (i.e. successfully completing a sub-function more than once on the same qubits) has no purpose, as one need only apply a particular sieve once. So graph cycles have no bearing on QPA. (However, iteration may be useful to maintain the entangled result in the presence of quantum decoherence and noise.) To underline cycle-independence, consider the action of SPA, then QPA, on Fig 2. Initialise as follows (using classical probabilities), (0.1, 0.9)T ⊗ (0.6, 0.4)T ⊗ (0.6, 0.4)T ⊗ (0.6, 0.4)T. Hard-decision gives x0x1x2x3 = 1000, which can then be decoded algebraically to codeword 0000. However, optimal soft-decision would decode to either x3x2x1x0 = 1011 or 1101, with equal probability. Because of the small graph cycle SPA fails to decode correctly, and settles to the joint-product-state,
Figure 2. Factor Graph with a Cycle.
(0.108, 0.892)T ⊗ (0.521, 0.479)T ⊗ (0.521, 0.479)T ⊗ (0.601, 0.399)T. A final hard decision on this output gives the non-codeword x0x1x2x3 = 1000, which can then be decoded algebraically, again to codeword 0000. In contrast, successful QPA outputs the optimal entangled joint-state,

SF = (1/√2040)(√216, 0, 0, 0, 0, 0, √96, 0, 0, 0, 0, √864, 0, √864, 0, 0)T.
Final measurement of SF always outputs a codeword from C, and with probability 2 × 864/2040 ≈ 0.847 outputs either 1011 or 1101. QPA evolves on Fig 2 correctly with probability 0.204, so 1000 attempts produce around 204 correctly entangled joint-states. To underline the QPA advantage, consider the single-variable extension of Fig 2 in Fig 3, where x4 is initialised to (√0.5, √0.5)T. As x4 = x0 ⊕ x3, and our original code, C = {0000, 0110, 1011, 1101}, always had x0 = x3, x4 should always be 0. But SPA on Fig 3 computes x4 = (0.421, 0.579)T, and subsequent hard-decision gives x4 = 1. In contrast, successful QPA computes the optimal non-product joint-state,

SF = (1/√2040)(√216, 0, 0, 0, 0, 0, √96, 0, 0, 0, 0, √864, 0, √864, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)T.

Final measurement of SF always outputs x4 = 0. QPA evolves on Fig 3 correctly with probability 0.204 × 0.5 = 0.102.
IV. Maximum-Likelihood (ML) Amplification

IV.1. Preliminaries

The ML codeword is the one most likely to be measured from QPA output, with
Figure 3. Extended Factor Graph with a Cycle.
probability, pM, say. For instance, if QPA output of Fig 1 is Sf = (1/√3)(1, 0, 0, √2)T, say, then 11 is the ML codeword, and it is measured with probability pM = 2/3. Numerous executions of QPA on the same input will verify that 11 is, indeed, the ML codeword. However, these numerous executions must output to a length-2^N final averaging probability vector (for N qubits). We do not want to store such an exponential vector. Instead, therefore, we "amplify" the statistical advantage of 11 over 00 prior to measurement, thereby making 11 significantly more likely to be read. This is achieved by computing the square of each quantum vector element, as follows. Consider two independent QPA executions on the same input, both outputting Sf. Associate these outputs with qubits x0,0, x1,0, and x0,1, x1,1. The joint-state of qubits x0,0, x1,0, x0,1, x1,1 is

V0 = Sf ⊗ Sf = (1/3)(1, 0, 0, √2, 0, 0, 0, 0, 0, 0, 0, 0, √2, 0, 0, 2)T.

Consider a unitary permutation matrix P whose first four rows collect the "diagonal" positions of V0 into the first four entries; only the "1" positions in the first four rows are important. Performing P on V0 gives

PV0 = (1/3)(1, 0, 0, 2, 0, 0, √2, 0, 0, 0, 0, 0, 0, √2, 0, 0)T.

We then measure qubits x0,1, x1,1. With probability pa0 = 5/9 we read x0,1 x1,1 = 00, in which case x0,0 and x1,0 are forced into joint state Sf,1 = (1/√5)(1, 0, 0, 2)T, which is the element-square of Sf. A measurement of Sf,1 returns 11 with probability pM = 4/5, which is a significant improvement over pM = 2/3. Likewise we compute the element fourth-powers of Sf by preparing two
independent qubit pairs in Sf,1 and permuting the (unmeasured) joint state vector V1 = Sf,1 ⊗ Sf,1 to give PV1, and then measuring the second pair of qubits. With probability pa1 = 17/25 we read this pair as 00, in which case the first two qubits are forced into the joint-state Sf,2 = (1/√17)(1, 0, 0, 4)T, which is the element fourth-power of Sf. A measurement of Sf,2 returns 11 with probability pM = 16/17, which is a further improvement over pM = 2/3. In this way we amplify the likelihood of measuring the ML codeword. To compute the element 2^k-th power, Sf,k, we require, on average, 2/pa,k−1 independent preparations, Sf,k−1, each of which requires, on average, 2/pa,k−2 independent preparations, Sf,k−2, and so on. We can perform QPA on large Factor Graphs, then amplify the result k times to ensure a high likelihood of measuring the ML codeword, as described above. However, the above amplification acts on the complete graph with one operation, P. It would be preferable to decompose P into 4 × 4 unitary matrices which act only on independent qubit pairs xi,0 and xi,1, thereby localising amplification. Consider, once again, Fig 1. From the point of view of x0,1, x0,0 appears to be in summarised state3 sf = (1/√3)(1, √2)T. Similarly, from the point of view of x0,0, x0,1 appears to be in state sf. Thus x0,0, x0,1 appear to be in joint product state v0 = (1/3)(1, √2, √2, 2)T. Consider the unitary permutation matrix

Q = ( 1 0 0 0 )
    ( 0 0 0 1 )
    ( 0 1 0 0 )
    ( 0 0 1 0 )

We compute Qv0 = (1/3)(1, 2, √2, √2)T on qubits x0,0, x0,1 and measure qubit x0,1. With probability pa0 = 5/9 we read x0,1 = 0, in which case x0,0 is forced into state sf,1 = (1/√5)(1, 2)T, which is the element-square of sf. Due to the exact form of our joint-state vector, Sf, this single measurement is enough to also force x0,0, x1,0 into joint state Sf,1. However, for a general function Sf, we should perform Q on every qubit pair, xi,0 xi,1, then measure xi,1 ∀i. This is equivalent to performing P' = Q ⊗ Q on the (re-ordered) joint-state vector of x0,0 x0,1 x1,0 x1,1, and this is identical to performing P on x0,0 x1,0 x0,1 x1,1. The probability of measuring x0,1 = x1,1 = 0 is the same whether P or Q is used. The same process is followed to achieve element 2^k-th powers.

3. x0,0 is generally not in this summarised state, due to phase considerations, but the viewpoint is valid for our purposes as long as subsequent unitary matrix operations on x0 only have one non-zero entry per row.
onx0,0x1,0x0,1x1,1.The probability of measuring x1,0 = x1,1 = 0 is the same whether P or Q is used. The same process is followed to achieve element 2kth powers. IV.2. The Price of Amplification There is a statistical cost to qubit amplification. Let s = (α, β)T be the initial state of a qubit x, where, for notational convenience, we assume that a and β are both real. Then a2 + β2 = 1 and, given 2k qubits all identically prepared in state s, the likelihood of preparing one qubit in (unnormalised) state sk = (a2k,β2k)T, is γk, where
and rk = a2k + β2k. For a qubit in state sk, the probability of selecting the ML codebit is Mk
β2
(assuming a β). We can plot γk against PMk for various a2 as k varies, as shown in Fig 4. Each of the 25 lines in Fig 4. refers to a different value of a2, for a2 from 0.5 up to 0.98 in steps of 0.2. The initial state, s, when k = 0, occurs with probability
0.95
Figure 4. Amplification Success Probability, γk, v ML Advantage, PMk.
γk = 1, and is marked on the right-hand side of Fig 4 for each of the 25 lines. After one amplification step, k = 1, and another 25 points are marked on the graph to the left of the points for k = 0, indicating that a successful amplification step has occurred with probability γk < 1. Similarly, points for k = 2, k = 3, etc. are marked successively to the left on Fig 4. The y-axis shows the ML advantage, PMk, which can be achieved with probability γk after k steps for each value of α². For instance, when s = (α, β)T = (√0.62, √0.38)T, an ML advantage of PMk = 0.9805 can be ensured after k = 3 steps, and this can be achieved with probability γk = 0.0223, given 2³ = 8 independently prepared qubits, all in state s. Amplification is more rapid if s already has significant ML advantage (i.e. when α is high). In contrast, if α² = 0.5 then no amplification of that qubit is possible. This is quite reasonable as, in this case, both states 0 and 1 are equally likely, so there is no ML state. Successively measuring zero on all second qubits of the qubit pairs self-verifies that successful amplification has been obtained. If, at any step k, the second qubit of the qubit pair is measured as one, then amplification fails and the part of the graph local to this qubit which has been successfully entangled up until now is completely destroyed.
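The trade-off can be tabulated directly. A sketch (assuming the closed forms γk = α^(2^(k+1)) + β^(2^(k+1)) and PMk = α^(2^(k+1))/γk, which match the worked numbers in the text) reproducing the example s = (√0.62, √0.38)T:

```python
def amplification_stats(alpha_sq, k):
    """gamma_k (success probability) and P_Mk (ML advantage) after k
    amplification steps, for an initial ML-bit probability alpha_sq = alpha^2."""
    a, b = alpha_sq, 1.0 - alpha_sq   # alpha^2 and beta^2
    e = 2 ** k                        # (alpha^2)^(2^k) = alpha^(2^(k+1))
    gamma_k = a ** e + b ** e         # = r_{k+1}
    p_mk = a ** e / gamma_k           # assuming alpha >= beta
    return gamma_k, p_mk

g, pm = amplification_stats(0.62, 3)  # the worked example in the text
print(round(g, 4), round(pm, 4))      # 0.0223 0.9805
```

The degenerate case is also visible in the formula: for alpha_sq = 0.5 the two powers stay equal, so p_mk remains 0.5 for every k and no amplification is possible.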
V. Distributed QPA on Many-Node Factor Graphs

V.1. Preliminaries

In classical systems it is desirable to implement SPA on Factor Graphs which "tensor-approximate" the variable space using many small-state variables (e.g. bits), linked by small-dimensional constituent functions, thereby minimising computational complexity. In quantum systems it is similarly desirable to implement QPA on Factor Graphs using many small-state variables (e.g. qubits), linked by small-dimensional constituent unitary functions. Any quantum computation can be decomposed into a sequence of one- or two-bit "universal" gate unitary operations [3]4. Computational complexity is minimised by using small-dimensional unitary matrices for constituent functions. Moreover, fine granularity of the Factor Graph allows distributed node processing. This appears to be essential for large Quantum Factor Graphs to have acceptable probability of successful global evolution, as we will show. Distributed QPA allows variable nodes to evolve entanglement only with neighbouring variable nodes so that, if a local function measurement or amplification is unsuccessful, only local evolution is destroyed. Remember that local evolution is OFTEN unsuccessful, as failure occurs when a local ancillary qubit, z, is measured as 1, or when a local amplifying qubit is measured as 1. Therefore node localities with high likelihood of successful evolution (i.e. with positively skewed input probabilities) are likely to evolve first. These will then encourage other, self-contradictory node localities to evolve successfully. In contrast, non-distributed QPA on large Factor Graphs

4. This also implies that any classical Factor Graph can be similarly decomposed.
using one large global function is very unlikely ever to succeed, especially for graphs encoding low-rate codes. To illustrate the advantage of distributed QPA, consider the low-rate code of Fig 5, where Ufij = diag(1, 0, 0, 1). Both top and bottom graphs represent the code C = {0000, 1111}, where U is a combination of XOR sub-matrices, Uf01, Uf12, and Uf23. The top graph distributes processing. We allow Uf01 and Uf23 to operate independently and in parallel. Moreover, if Uf01 fails to establish, then it does not destroy any successful evolution of Uf23, as the two localities are not currently entangled. Once both Uf01 and Uf23 have completed successfully, the subsequent probability of successful completion of Uf12 is, in general, likely to increase. So distributing QPA increases the likelihood of successful evolution of the complete Factor Graph. We now demonstrate this graphically. Let qubits x0, x1, x2, x3 of Fig 5 initially be in states x0 = (α0, β0)T, x1 = (α1, β1)T, x2 = (α2, β2)T, x3 = (α3, β3)T, where, for notational convenience, we assume all values are real. Then αi² + βi² = 1, ∀i. The probability of successful completion of Uf01 is pf01 = (α0α1)² + (β0β1)², and the probability of successful completion of Uf23 is pf23 = (α2α3)² + (β2β3)². Therefore the probability of successful completion of both Uf01 and Uf23 after exactly q parallel attempts (no less) is

P0-3(q) = (1 − (1 − pf01)^q)(1 − (1 − pf23)^q) − (1 − (1 − pf01)^(q−1))(1 − (1 − pf23)^(q−1)).
Given successful completion of Uf01 and Uf23, the probability of subsequent successful completion of Uf12 is

p′f12 = ((α0α1α2α3)² + (β0β1β2β3)²) / (pf01 pf23).
Figure 5. Distributed QPA (top) and non-distributed QPA (bottom), 4-bit code.
Therefore the probability of successful completion of Uf01 and Uf23, immediately followed by successful completion of Uf12, is p0-3(q) = P0-3(q − 1) p′f12, and the probability of successful completion of Uf01 and Uf23, immediately followed by completion failure of Uf12, is p̄0-3(q) = P0-3(q − 1)(1 − p′f12). Therefore the probability of successful completion after exactly t steps of Uf01 and Uf23 in parallel, followed by Uf12, is
where D(k) is the set of unordered partitions of k. Therefore the probability of successful completion after at most t steps of Uf01 and Uf23 in parallel, followed by Uf12, is
In contrast, for non-distributed QPA, the probability of successful completion of U (the bottom graph of Fig 5) after at most t steps is P(t) = 1 − (1 − (α0α1α2α3)² − (β0β1β2β3)²)^t. Figs 6 and 7 show plots of pm(t) and P(t) versus t for
Figure 6. N° of Steps v Non-Distributed and Distributed QPA: Completion Probabilities.
Figure 7. N° of Steps v Non-Distributed and Distributed QPA: w² = 0.9, α0 varies.
α0 = α1 = α2 = α3 = w as w varies, and α0 = u, α1 = α2 = α3 = w with w² = 0.9 as u varies, respectively. For Fig 7, low values of u indicate a contradiction between x0 and the other three variables. In particular, the contradiction is so pronounced when α0 = 0.0 that successful QPA completion is highly unlikely. More generally, this indicates that severe internal Factor Graph contradictions are fatal to QPA (as they are for SPA). Both Figs 6 and 7 indicate that, due to the initial latency of distributed processing, non-distributed QPA appears marginally faster for the first few steps. However, after a few steps distributed QPA in general becomes marginally faster. In fact the results are unfairly biased towards the non-distributed case, as it is assumed that attempts to complete U and the Ufij have the same space-time complexity cost, whereas U is far more costly. Hence, even for this smallest example, Distributed QPA outperforms non-Distributed QPA. The example of Fig 5 only achieves a marginal advantage using Distributed QPA because it has so few nodes. The advantage is more pronounced in Fig 8, which represents the code C = {000000000, 111111111}5, where Ufijk = diag(1, 0, 0, 1, 0, 1, 1, 0). We allow Uf012, Uf345, and Uf678 to operate independently and

5. This code is trivial but demonstrates a "worst-case" low-rate scenario. In general, codes of higher rate, with or without cycles, decode more quickly.
192
Turbo codes
Figure 8. Distributed QPA, 9-qubits.
in parallel. If Uf012 fails to establish, it does not destroy any successful evolution of Uf345 or Uf678, as the three localities are not currently entangled. Once Uf012, Uf345, and Uf678 have completed successfully, the probability of successful subsequent completion of Uf258 is, in general, amplified. Let qubits xi, 0 ≤ i ≤ 8, of Fig 8 initially be in states xi = (αi, βi)T, with all values real.
Figure 9. N° of Steps v Non-Distributed and Distributed QPA, 9 qubits.
as the presented results are unfairly biased towards the non-distributed case: it is assumed that attempts to complete the non-distributed U and each constituent Ufijk have the same space-time complexity cost, whereas U is far more costly. We conclude that Distributed QPA is essential for large Quantum Factor Graphs.

V.2. Free-Running Distributed QPA

Consider the notional Factor Graph of Fig 13. Each (square) function node activates time-independently on its local (circular) variable nodes. Functions successfully completed are marked with an "X". After a certain time, say, three "areas of success" evolve, due to general agreement between input variable states at these localities. This means that variables on the perimeter of each area of success are "encouraged" to agree with the "general view" of the associated area of success. Unfortunately, in the bottom left of the graph is a variable (dark circle) which strongly contradicts the rest of the graph. No area of success evolves around it, and it is difficult for other areas of success to "swallow" it. Assuming the contradiction is not too strong, then eventually, after numerous attempts, the complete graph is marked with "X"s and the graph evolves
Figure 10. N° of Steps v Non-Distributed and Distributed QPA: w² = 0.9, α0 varies, 9 qubits.
successfully. At this point the contents of each qubit variable can be amplified, and final measurement of all qubits provides the ML codeword with high probability. The advantage of a free-running strategy, where each function node is free to activate asynchronously, is that regions of general agreement develop first and influence other areas of the graph to "follow their opinion". Fig 13 also shows that one "bad" (contradictory) qubit can be a fatal stumbling block to successful evolution of the whole graph (as it can be for SPA on classical graphs). Thus Distributed QPA requires Fault-Tolerance, where only an arbitrary subset of entangled nodes is required as a final result (node redundancy). The free-running schedule of Fig 13 naturally avoids the "bad" qubits, and sufficient evolution occurs when enough function nodes complete. Alternatively, bad qubits could be reset to (√0.5, √0.5)T after a time-out. A more detailed proposal for Fault-Tolerant QPA is left for future work. Fig 13 also serves to illustrate the "template" for a Reconfigurable Quantum Graph Array. One can envisage initialising an array of quantum variables so that two local variables can be strongly or weakly entangled by identifying the mutual square function nodes with strongly or weakly-entangling matrices, respectively. In particular, two neighbouring nodes may be "locally disconnected" by setting
Figure 11. N° of Steps v Non-Distributed and Distributed QPA: w² = 0.9, α1 varies, 9 qubits.
the function node joining them to a tensor-decomposable (i.e. zero-entangling) matrix. The quantum computer is then measurement-driven. The concept of measurement-driven quantum computation has also recently been pursued in [8], where a uniform entanglement is set up throughout the array6 prior to computation via measurement. Fig 14 shows the system view of QPA. A continual stream of pure qubits needs to be initialised, entangled, and then amplified, so as to ensure at least one successfully entangled and amplified output from the whole apparatus.
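The free-running schedule just described can be caricatured with a classical toy simulation. Everything below is our own sketch, not the paper's construction: the function name, the chain topology, and the rule that a failed check also clears its neighbours' progress are assumptions. Each function node fires independently every step, and a strongly contradictory locality, like the dark-circle variable of Fig 13, can block completion outright.

```python
import random

def free_running_qpa(p_success, max_steps, seed=0):
    """Caricature of free-running distributed QPA on a chain of function
    nodes.  Each step, every node fires independently: success marks it
    'X'; failure clears the node and (an assumption of this sketch) its
    established neighbours, mimicking destruction of local evolution.
    Returns the first step at which all nodes are marked, else None."""
    rng = random.Random(seed)
    done = [False] * len(p_success)
    for step in range(1, max_steps + 1):
        for i, p in enumerate(p_success):
            if rng.random() < p:
                done[i] = True
            else:
                done[i] = False            # local failure destroys local evolution
                if i > 0:
                    done[i - 1] = False    # ...and its neighbours' progress
                if i + 1 < len(done):
                    done[i + 1] = False
        if all(done):
            return step
    return None

# a chain in general agreement completes within a few steps
print(free_running_qpa([0.9] * 5, max_steps=500))
# a hard contradiction (cf. the fatal case α0 = 0.0 above) never completes
print(free_running_qpa([0.9, 0.9, 0.0, 0.9, 0.9], max_steps=500))   # → None
```

With the contradictory node's success probability forced to zero, the chain can never be fully marked, mirroring the observation that a severe internal contradiction is fatal; softening it to a small positive value merely slows completion.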
VI. Phase QPA

The above discussions have ignored the capacity of quantum systems to carry phase information. In fact QPA, as presented so far, is immune to phase modification, as classical probabilities have no phase component. However, QPA should be generalised to cope with phase shifts in order to decode quantum information. This is the subject of ongoing research.

6. It is interesting that this entanglement is strongly related to Rudin-Shapiro and quadratic constructions [7].
Figure 12. N° of Steps v Non-Distributed and Distributed QPA: w² = 0.9, α0 = α8 varies, 9 qubits.
VII. Conclusion and discussion

The Quantum Product Algorithm (QPA) on a Factor Graph has been presented for Maximum-Likelihood (ML) decoding of classical "soft" information using quantum resources. The relationship of QPA to the Sum-Product Algorithm (SPA) has been indicated, where avoidance of summary allows QPA to overcome small graph cycles. Quantum Factor Graphs use small unitary matrices which each act on only a few qubits. QPA is measurement-driven and is only statistically likely to succeed after many attempts. The ML codeword is obtained with maximum likelihood by measuring the entangled vector resulting from successful QPA. To ensure a high probability of measuring the ML codeword, the QPA output can be amplified prior to measurement. The complete ML decoder is only successful after many attempts. Finally, free-running Distributed QPA is proposed to improve the likelihood of successful QPA completion. The free-running distributed structure suggests further benefit will be obtained by introducing Fault-Tolerance in the form of redundant function and variable nodes. Phase aspects of QPA have yet to be explored. This paper has been written to demonstrate the exponential capacity
Figure 13. Free-Running Distributed QPA with one "Bad" Variable.
Figure 14. QPA with Amplification.
of quantum systems, and their natural suitability for graph decompositions such as the Factor Graph. The paper has not tried to deal with quantum noise and quantum decoherence, but one can expect the Factor Graph form to "gracefully" expand to cope with the extra redundancy necessary to protect qubits from decoherence and noise. When viewed in the context of entangled space, it is surprising how successful classical message-passing algorithms are, even though
they are restricted to operate in tensor product space. This suggests that methods to improve the likelihood of successful QPA completion may include the possibility of hybrid QPA/SPA graphs, where SPA operates on non-cyclic and resolvable parts of the graph, leaving QPA to cope with small cycles or unresolved areas of the graph.
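To recap the measurement-driven step underlying QPA concretely, here is a minimal numerical sketch. The helper names are ours and real amplitudes are assumed, as in Section V: a single local XOR-check attempt on two product-state qubits keeps only the |00⟩ and |11⟩ components of diag(1, 0, 0, 1), succeeding with probability (α0α1)² + (β0β1)² and leaving the normalised entangled pair on success.

```python
import math
import random

def local_check_success_prob(x0, x1):
    """Success probability of one local XOR-check measurement on the
    product state of two real-amplitude qubits x = (alpha, beta):
    only the |00> and |11> components survive diag(1, 0, 0, 1)."""
    (a0, b0), (a1, b1) = x0, x1
    return (a0 * a1) ** 2 + (b0 * b1) ** 2

def attempt_local_check(x0, x1, rng):
    """One QPA evolution attempt: the ancilla is measured as 0 with
    probability local_check_success_prob, leaving the entangled state
    a|00> + b|11>; otherwise the attempt fails and returns None."""
    p = local_check_success_prob(x0, x1)
    if rng.random() < p:
        return (x0[0] * x1[0] / math.sqrt(p),
                x0[1] * x1[1] / math.sqrt(p))
    return None

x = (math.sqrt(0.9), math.sqrt(0.1))
print(local_check_success_prob(x, x))   # ≈ 0.9^2 + 0.1^2 = 0.82
```

Repeating `attempt_local_check` until it returns a state is exactly the "many attempts" behaviour described above: each failure destroys only this locality's evolution, and the expected number of attempts is 1/p.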
VIII. Appendix A - deriving pm(t) and P(t) for Fig 8

The probability of successful completion of Uf012 is pf012 = (α0α1α2)² + (β0β1β2)², and similarly for pf345 and pf678. Let h012 = (1 − pf012)^(q−1), h345 = (1 − pf345)^(q−1), h678 = (1 − pf678)^(q−1). Then the probability of successful completion of Uf012, Uf345, and Uf678 after exactly q parallel attempts is
Given successful completion of Uf012, Uf345, and Uf678, the probability of subsequent successful completion of Uf258 is
Therefore the probability of successful completion of Uf012, Uf345, and Uf678, immediately followed by successful completion of Uf258, is p0-8(q) = P0-3-6(q − 1) p′f258, and the probability of successful completion of Uf012, Uf345, and Uf678, immediately followed by completion failure of Uf258, is p̄0-8(q) = P0-3-6(q − 1)(1 − p′f258). The probability of successful completion after exactly t steps of Uf012, Uf345, and Uf678 in parallel, followed by Uf258, is
where D(k) is the set of unordered partitions of k. Therefore the probability of successful completion after at most t steps of Uf012, Uf345, and Uf678 in parallel, followed by Uf258, is
In contrast, the probability of successful completion, after at most t steps, of a non-distributed version of Fig 8 is

P(t) = 1 − (1 − (α0α1···α8)² − (β0β1···β8)²)^t.
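The quantities used in this appendix can be checked numerically. The sketch below (helper names are ours) builds the diagonal of the 3-qubit XOR-check Ufijk as the even-parity indicator, evaluates a triplet success probability such as pf012 = (α0α1α2)² + (β0β1β2)², and evaluates the non-distributed completion probability, extending the 4-qubit closed form of Section V.1 to any number of qubits (our extension, assuming real amplitudes).

```python
import math

def xor_check_diag(n):
    """Diagonal of the n-qubit XOR-check matrix: 1 exactly at the basis
    indices of even parity, e.g. diag(1,0,0,1,0,1,1,0) for n = 3."""
    return [1 if bin(i).count("1") % 2 == 0 else 0 for i in range(2 ** n)]

def p_check(alphas):
    """Success probability of one XOR-check attempt on a product state:
    (prod alpha)^2 + (prod beta)^2, with beta_i = sqrt(1 - alpha_i^2)."""
    pa = math.prod(alphas) ** 2
    pb = math.prod(math.sqrt(1 - a * a) for a in alphas) ** 2
    return pa + pb

def p_nondistributed(alphas, t):
    """P(t) = 1 - (1 - p_check(all qubits))^t: probability that a single
    global check U succeeds within t attempts."""
    return 1 - (1 - p_check(alphas)) ** t

w = math.sqrt(0.9)
print(xor_check_diag(3))        # [1, 0, 0, 1, 0, 1, 1, 0], matching Ufijk
print(p_check([w, w, w]))       # ≈ 0.9^3 + 0.1^3 = 0.73
print(p_nondistributed([w] * 9, 50))
```

Note how quickly `p_check` shrinks as qubits are added: for nine qubits with w² = 0.9 each global attempt succeeds with probability about 0.39, which is the reason the distributed schedule of Section V, working on three-qubit localities with per-attempt probability 0.73, pulls ahead.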
REFERENCES

[1] AJI (S.M.), MCELIECE (R.J.), "The Generalized Distributive Law", IEEE Trans. Inform. Theory, Vol. IT-46, pp. 325-343, March 2000.
[2] BERROU (C.), GLAVIEUX (A.), THITIMAJSHIMA (P.), "Near Shannon-Limit Error-Correcting Coding and Decoding: Turbo-Codes", Proc. 1993 IEEE Int. Conf. on Communications (Geneva, Switzerland), pp. 1064-1070, 1993.
[3] DIVINCENZO (D.P.), "Two-Bit Gates are Universal for Quantum Computation", Phys. Rev. A, Vol. 51, p. 1015, 1995. Also LANL arXiv:cond-mat/9407022.
[4] GALLAGER (R.G.), "Low Density Parity Check Codes", IRE Trans. Inform. Theory, Vol. IT-8, pp. 21-28, Jan. 1962.
[5] KSCHISCHANG (F.R.), FREY (B.J.), LOELIGER (H.-A.), "Factor Graphs and the Sum-Product Algorithm", IEEE Trans. Inform. Theory, Vol. 47, n° 2, pp. 498-519, Feb. 2001.
[6] MACKAY (D.J.C.), NEAL (R.M.), "Near Shannon Limit Performance of Low Density Parity Check Codes", Electronics Letters, Vol. 32, n° 18, pp. 1645-1646, Aug. 1996. Reprinted Vol. 33, n° 6, pp. 457-458, Mar. 1997.
[7] PARKER (M.G.), RIJMEN (V.), "The Quantum Entanglement of Bipolar Sequences", Sequences and their Applications, SETA'01, Bergen, Norway, 13-17 May 2001. Also http://www.ii.uib.no/~matthew/MattWeb.html.
[8] RAUSSENDORF (R.), BRIEGEL (H.J.), "Quantum Computing via Measurements Only", LANL arXiv:quant-ph/0010033, 7 Oct. 2000.
[9] SHOR (P.W.), "Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer", SIAM J. Computing, Vol. 26, pp. 1484-1509, 1997. First appeared in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, ed. S. Goldwasser, IEEE Computer Society Press, Los Alamitos, CA, p. 124, 1994. Expanded version: LANL arXiv:quant-ph/9508027.
[10] STEANE (A.M.), "Quantum Computing", Rept. Prog. Phys., Vol. 61, pp. 117-173, 1998. Also LANL arXiv:quant-ph/9708022.
[11] TUCCI (R.R.), "How to Compile a Quantum Bayesian Net", LANL arXiv:quant-ph/9805016, 7 May 1998.
[12] TUCCI (R.R.), "Quantum Information Theory - A Quantum Bayesian Net Perspective", LANL arXiv:quant-ph/9909039, 13 Sep. 1999.
Biography

Matthew Parker is currently a Postdoctoral Researcher in the Coding Theory Group at the University of Bergen, Norway. Prior to this he was a Postdoctoral Researcher in the Telecommunications Research Group at the University of Bradford, UK. His research interests are Iterative Computation, Sequence Design, Quantum Computation, and Coding Theory.
Index
algorithm APP, symbol-by-symbol, 112 general description 112 quantum product, the, (QPA) 179, 182, 183, 188, 193, 195 distributed 188,193 free-running 193 many-node factor graphs 188 entangled space for 182 factor graphs with cycles 183 phase 195 reduced-state 63 amplification, maximum-likelihood (ML) 184 APP-algorithm, symbol-by-symbol 112 general description 112 asynchronous DS/CDMA, coded 82 et seq multipath Rayleigh fading channel 89 transmission system 101 multipath Rayleigh fading, with, 101 turbo soft interference cancellation 82 et seq AWGN 99 synchronous transmission system with 99 BER from EXIT chart 37
bit interleaving 62 bit 62,157 error probability 157 space-time block codes 157 interleaving 62 block turbo code (BTC) 127 et seq near optimum decoding 127 capacity limit, systematic doping 44 CDMA, synchronous, 87 SISO multiuser detectors 87 channel-coded optimal estimation (CCOE) 106 classical SPA 181 product space for 181 code 1,4,11, 15,29,30,38,40, 43, 59 et seq, 70, 82 et seq, 106, 127, 146, 148, 150, 154, 157, 159, 161, 162, 169, 171 asynchronous DS/CDMA 82 et seq turbo soft interference cancellation 82 et seq block 127, 146, 148, 150, 154, 157, 159, 161, 162, 169,171 space-time 146, 148,150, 154,157, 159, 161, 162, 169,171 bit error probability 157 frequency-selective channels 169 MAP-equalization 169
outage capacity 159 outer channel code 161 principle of 146 transformation of the channel 154 turbo-TCM 162, 171 turbo-equalization and turbo-decoding 171 two transmit antennas 148 turbo (BTC) 127 near optimum decoding 127 channel 106,161 optimal estimation (CCOE) 106 outer, space-time block code and, 161 characteristic matching 29, 43 iterative decoding of serially concatenated codes 29 concatenated, serially 29, 30 iterative decoding 29, 30 code characteristic matching 29 design, with EXIT chart 38 modulations, spectrally efficient, 59 et seq noncoherent iterative decoding 59 et seq parameters, changing inner and outer 40 performance analysis 11 turbo 1, 15, 127 block (BTC) 127 interleaving 15 trellis terminations 1 Ungerboeck, inner, 70 words 4 edge-effect 4 concatenated codes, serially 29, 30 iterative decoding 29, 30
code characteristic matching 29 conditional weight enumerating function 2 cycles, factor graphs with, 183 quantum product algorithm 183 cyclic shift interleaver 10 et seq, 17, 20, 25 design, performance evaluation 20 decoder 29, 30, 31, 34, 37, 59 et seq, 61, 62, 67, 68, 105 et seq, 106,109, 111,113, 117, 127, 131, 132, 171 detection, joint, 68 elementary architecture 132 individual, transfer characteristics 37 inner, transfer characteristics 31 iterative 29, 30,59 et seq, 105 et seq, 106, 111, 113, 117, 131 performance 62 process, the, 131 serially concatenated codes 29,30 code characteristic matching 29 source-channel (ISCD) 105 et seq, 106, 111,113, 117 derivation of 113 implementation of 117 near optimum 127 block turbo code 127 noncoherent 59 et seq, 61 iterative 59 et seq spectrally efficient coded modulation 59 et seq soft-output 61 outer, transfer characteristics 34
separate, detection and, 67 source-channel 105 et seq, 106, 109,111,113, 117 iterative (ISCD) 105 et seq, 106,111, 113,117 derivation of 113 implementation of 117 joint (JSCD) 105, 109 optimal 109 turbo, space-time block codes with turbo-TCM of, 171 turbo-equalization 171 decoding, iterative 62,131 non-coherent soft-output 61 trajectory 37 detection 67, 68, 87, 88, 153 decoding, joint, 68 MAP, soft-output, 153 multiuser 87, 88 SISO 87 synchronous CDMA 87 suboptimum, with soft outputs 88 separate, decoding and, 67 distance spectra 2 distributions 50, 54, 188,193 extrinsic 50,54 L-value 54 measuring 50 QPA 188, 193 free-running 193 many-node factor graphs 188 doping, systematic 44 capacity limit 44 DS/CDMA, coded asynchronous, 82 et seq turbo soft interference cancellation 82 et seq edge effect
1, 4
codewords 4 interleaver 1 encoder 5, 8 both, termination 5, 8 first, termination 5 equalization 169, 171 MAP, space-time block codes of, 169 turbo 171 decoding 171 space-time block codes with turbo-TCM 171 EXIT chart 37,38 BER from 37 code design 38 extrinsic distributions 50, 54 information transfer chart 36 L-value 54 measuring 50 transfer characteristics, re-computing 55 factor graphs 178 et seq, 183, 188 cycles, with, 183 quantum product algorithm 183 many-node, distributed QPA, 188 quantum 178 et seq first encoder, termination 5 flat fading MIMO channel model 143 flushing, post-interleaver 6, 8 free-running distributed QPA 193 frequency-selective channels, spacetime block codes, in, 169 inner 31,40,43,70 code 40,70 parameters, changing 40 outer, and, 40
Index
Ungerboeck 70 decoder transfer characteristics 31 matching 43 outer, and, 43 input 2,3 redundancy weight enumerating function 2 sequences, zero-terminating 3 iterative 29, 30, 59 et seq, 62, 85, 105etseq, 106,111, 113, 117, 131 decoding 29, 30, 59 et seq, 62, 85,105 et seq, 106, 111, 113, 117,131 noncoherent 59 et seq spectrally efficient coded modulations 59 et seq performance 62 process, the, 131 serially concatenated codes 29 et seq, 30 code characteristic matching 29 source-channel (ISCD) 105 et seq, 106, 111 derivation of 113 implementation of 117 turbo receiver 85 interference 82 et seq turbo soft cancellation 82 et seq coded asynchronous DS/CDMA 82 et seq interleaver 1, 2, 6, 8, 10 et seq, 16, 17, 19, 20, 25, 62 bit 62 cyclic shift 10 et seq, 17, 25 design, performance evaluation 20 design 16,20 edge effects 1
function 15 post, flushing 6, 8 S-random 19 turbo coding 15 uniform 2 interleaving 15 function 15 in turbo coding 15 joint source-channel decoding (JSCD) 105,109 optimal 109 linear combining 152 L-value distributions, extrinsic,
54
many-node factor graph 188 distributed QPA 188 MAP 153, 169 detection, soft-output, 153 equalization, space-time block codes, 169 maximum-likelihood (ML) amplification 184 MEMO channel capacity of 145 model 143, 145 flat fading 143 ISI 145 MRC (maximum ratio combining) device 93 multiuser detector 87, 88 SISO 87 synchronous CDMA 87 suboptimum, soft outputs with, 88 near optimum decoding 127 block turbo code (BTC) 127 noncoherent iterative decoding 59 etseq
soft-output decoding 61 spectrally efficient coded modulations 59 et seq optimal 106, 109 channel-coded estimation (CCOE) 106 joint source-channel decoding 109 outage capacity, space-time block codes of, 159 outer 34, 40,43, 161 channel code, space-time block codes and, 161 code parameters, changing 40 inner, and, 40 decoder 34 transfer characteristics 34, 43 transfer characteristics, matching inner and, 43 phase QPA 195 post-interleaver flushing 6, 8 product 179, 181, 183, 188, 193, 195 quantum algorithm, the, (QPA) 179, 182, 183, 188, 193, 195 distributed 188, 193 free-running 193 many-node factor graphs 188 entangled space for 182 factor graphs with cycles 183 phase 195 space, classical SPA, 181 QPA (Quantum Product Algorithm) 179, 182, 183, 188, 193, 195 distributed 188,193 free-running 193
many-node factor graphs 188 entangled space for 182 factor graphs with cycles 183 phase 195 quantum factor graphs 178 et seq product algorithm, the, (QPA) 179, 182, 183, 188, 193, 195 distributed 188, 193 free-running 193 many-node factor graphs 188 entangled space for 182 factor graphs with cycles 183 phase 195 Rayleigh, multipath, 89, 101 fading 89, 101 asynchronous transmission system 101 channel, asynchronous multipath, 89 re-computing extrinsic transfer characteristics 55 reduced-state algorithm 63 relative contribution integral 12 SISO multiuser detectors 87 synchronous CDMA 87 soft-output decoding, noncoherent, 61 MAP-detection 153 suboptimum multiuser detector 88 source-channel decoding 105 et seq, 106, 109, 111,113,117 iterative (ISCD) 105 et seq, 106, 111,113,117 derivation of 113 implementation of 117 joint (JSCD) 105, 109 optimal 109
SPA, classical 181 product space for, 181 space, entangled 182 product 181 classical SPA 181 QPA 182 space-time block codes bit error probability 157 frequency-selective channels 169 MAP-equalization of 169 outage capacity 159 outer channel code, and, 161 principle of 146 transformation of the channel 154 turbo-TCM 162, 171 turbo-equalization and turbodecoding17171 two transmit antennas 148 spectra 2, 59 et seq distance 2 efficient coded modulation 59 et seq noncoherent iterative decoding of 59 et seq S-random interleaves 19 suboptimum multiuser detector 88 soft outputs, with, 88 symbol-by-symbol APP-algorithm, general description, 112 synchronous 87,99 CDMA 87 SISO multiuser detectors 87 transmission system, AWGN with, 99 systematic doping, capacity limit, 44 termination encoder both 5,8 first 5
no 8 trellis 1,4 no 4 turbo codes 1 zero, input sequences 3 transfer characteristics 31, 34, 37, 43,55 chart, extrinsic information 36 decoders 31, 34, 37 individual 37 inner 31 outer 34 extrinsic, re-computing, 55 matching inner and outer 43 transformation of the channel 154 space-time block codes 154 transmission system 99, 101 asynchronous, with multipath Rayleigh fading, 101 synchronous, with AWGN, 99 transmit antenna 142 et seq, 148 diversity systems 142 et seq turbo processing 142 et seq two 148 space-time block code 148 trellis termination 1 et seq, 4 no 4 and turbo codes 1 turbo codes 1, 15, 127 block (BTC) 127 near optimum decoding 127 decoding 171 space-time block codes with turbo-TCM 171 turbo equalization 171 equalization 171 space-time block codes with turbo-TCM 171 turbo-decoding 171 interleaving 15
processing 142 et seq transmit antenna diversity systems 142 et seq receiver, iterative, 85 soft interference cancellation 82 et seq coded asynchronous DS/CDMA 82 et seq TCM, space-time block codes and, 162, 171 turbo-equalization and turbodecoding of 171
trellis termination 1
Ungerboeck, inner, 70 code 70 uniform interleaver 2 weight enumerating function 2 conditional 2 input-redundancy 2
zero-terminating input sequences 3