CHAPTER 1

1.1
Let
    r_u(k) = E[u(n) u^*(n-k)]                                          (1)
    r_y(k) = E[y(n) y^*(n-k)]                                          (2)
We are given that
    y(n) = u(n+a) - u(n-a)                                             (3)
Hence, substituting Eq. (3) into (2), and then using Eq. (1), we get
    r_y(k) = E[(u(n+a) - u(n-a))(u^*(n+a-k) - u^*(n-a-k))]
           = 2 r_u(k) - r_u(2a+k) - r_u(-2a+k)

1.2
We know that the correlation matrix R is Hermitian; that is,
    R^H = R
Given that the inverse matrix R^{-1} exists, we may write
    R^{-1} R^H = I
where I is the identity matrix. Taking the Hermitian transpose of both sides:
    R R^{-H} = I
Hence,
    R^{-H} = R^{-1}
That is, the inverse matrix R^{-1} is Hermitian.
1.3
For the case of a two-by-two matrix, we may write
    R_u = R_s + R_\nu
        = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}
          + \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}
        = \begin{bmatrix} r_{11}+\sigma^2 & r_{12} \\ r_{21} & r_{22}+\sigma^2 \end{bmatrix}
For R_u to be nonsingular, we require
    \det(R_u) = (r_{11}+\sigma^2)(r_{22}+\sigma^2) - r_{12} r_{21} > 0
With r_{12} = r_{21} for real data, this condition reduces to
    (r_{11}+\sigma^2)(r_{22}+\sigma^2) - r_{12}^2 > 0
Since this is quadratic in \sigma^2, we may impose the following condition on \sigma^2 for nonsingularity of R_u:
    \sigma^2 > \frac{1}{2}(r_{11}+r_{22})\left[\left(1 - \frac{4\Delta_r}{(r_{11}+r_{22})^2}\right)^{1/2} - 1\right]
where \Delta_r = r_{11} r_{22} - r_{12}^2.
1.4
We are given
    R = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
This matrix is nonnegative definite because
    a^T R a = [a_1, a_2] \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
            = a_1^2 + 2 a_1 a_2 + a_2^2
            = (a_1 + a_2)^2 \ge 0   for all a_1 and a_2
It is not positive definite, however, since the quadratic form vanishes for the nonzero choice a_1 = -a_2. (Positive definiteness is stronger than nonnegative definiteness.) Moreover, the matrix R is singular because
    \det(R) = (1)(1) - (1)(1) = 0
Hence, it is possible for a matrix to be nonnegative definite and yet singular.
1.5
(a) Partition the (M+1)-by-(M+1) correlation matrix as
    R_{M+1} = \begin{bmatrix} r(0) & r^H \\ r & R_M \end{bmatrix}                                   (1)
Let
    R_{M+1}^{-1} = \begin{bmatrix} a & b^H \\ b & C \end{bmatrix}                                   (2)
where the scalar a, the vector b, and the matrix C are to be determined. Multiplying (1) by (2):
    I_{M+1} = \begin{bmatrix} r(0) & r^H \\ r & R_M \end{bmatrix} \begin{bmatrix} a & b^H \\ b & C \end{bmatrix}
where I_{M+1} is the identity matrix. Therefore,
    r(0) a + r^H b = 1                                                                              (3)
    r a + R_M b = 0                                                                                 (4)
    r b^H + R_M C = I_M                                                                             (5)
    r(0) b^H + r^H C = 0^T                                                                          (6)
From Eq. (4):
    b = -R_M^{-1} r a                                                                               (7)
Hence, from (3) and (7):
    a = \frac{1}{r(0) - r^H R_M^{-1} r}                                                             (8)
Correspondingly,
    b = -\frac{R_M^{-1} r}{r(0) - r^H R_M^{-1} r}                                                   (9)
From (5):
    C = R_M^{-1} - R_M^{-1} r b^H
      = R_M^{-1} + \frac{R_M^{-1} r r^H R_M^{-1}}{r(0) - r^H R_M^{-1} r}                            (10)
As a check, the results of Eqs. (9) and (10) should satisfy Eq. (6):
    r(0) b^H + r^H C = -\frac{r(0) r^H R_M^{-1}}{r(0) - r^H R_M^{-1} r} + r^H R_M^{-1}
                       + \frac{r^H R_M^{-1} r \, r^H R_M^{-1}}{r(0) - r^H R_M^{-1} r}
                     = 0^T
We have thus shown that
    R_{M+1}^{-1} = \begin{bmatrix} 0 & 0^T \\ 0 & R_M^{-1} \end{bmatrix}
                   + a \begin{bmatrix} 1 \\ -R_M^{-1} r \end{bmatrix} [\, 1 \;\; -r^H R_M^{-1} \,]
where the scalar a is defined by Eq. (8).

(b) This time, partition the correlation matrix as
    R_{M+1} = \begin{bmatrix} R_M & r^{B*} \\ r^{BT} & r(0) \end{bmatrix}                           (11)
Let
    R_{M+1}^{-1} = \begin{bmatrix} D & e \\ e^H & f \end{bmatrix}                                   (12)
where the matrix D, the vector e, and the scalar f are to be determined. Multiplying (11) by (12):
    I_{M+1} = \begin{bmatrix} R_M & r^{B*} \\ r^{BT} & r(0) \end{bmatrix} \begin{bmatrix} D & e \\ e^H & f \end{bmatrix}
Therefore,
    R_M D + r^{B*} e^H = I_M                                                                        (13)
    R_M e + r^{B*} f = 0                                                                            (14)
    r^{BT} e + r(0) f = 1                                                                           (15)
    r^{BT} D + r(0) e^H = 0^T                                                                       (16)
From (14):
    e = -R_M^{-1} r^{B*} f                                                                          (17)
Hence, from (15) and (17):
    f = \frac{1}{r(0) - r^{BT} R_M^{-1} r^{B*}}                                                     (18)
Correspondingly,
    e = -\frac{R_M^{-1} r^{B*}}{r(0) - r^{BT} R_M^{-1} r^{B*}}                                      (19)
From (13):
    D = R_M^{-1} - R_M^{-1} r^{B*} e^H
      = R_M^{-1} + \frac{R_M^{-1} r^{B*} r^{BT} R_M^{-1}}{r(0) - r^{BT} R_M^{-1} r^{B*}}            (20)
As a check, the results of Eqs. (19) and (20) must satisfy Eq. (16). Thus
    r^{BT} D + r(0) e^H = r^{BT} R_M^{-1}
                          + \frac{r^{BT} R_M^{-1} r^{B*} \, r^{BT} R_M^{-1}}{r(0) - r^{BT} R_M^{-1} r^{B*}}
                          - \frac{r(0) r^{BT} R_M^{-1}}{r(0) - r^{BT} R_M^{-1} r^{B*}}
                        = 0^T
We have thus shown that
    R_{M+1}^{-1} = \begin{bmatrix} R_M^{-1} & 0 \\ 0^T & 0 \end{bmatrix}
                   + f \begin{bmatrix} -R_M^{-1} r^{B*} \\ 1 \end{bmatrix} [\, -r^{BT} R_M^{-1} \;\; 1 \,]
where the scalar f is defined by Eq. (18).

1.6
(a) We express the difference equation describing the first-order AR process u(n) as
    u(n) = v(n) + w_1 u(n-1)
where w_1 = -a_1. Solving this equation by repeated substitution, we get
    u(n) = v(n) + w_1 v(n-1) + w_1^2 u(n-2)
         = ...
         = v(n) + w_1 v(n-1) + w_1^2 v(n-2) + ... + w_1^{n-1} v(1)                                  (1)
Here we have used the initial condition u(0) = 0, or equivalently u(1) = v(1). Taking the expected value of both sides of Eq. (1) and using
    E[v(n)] = \mu   for all n,
we get the geometric series
    E[u(n)] = \mu + w_1 \mu + w_1^2 \mu + ... + w_1^{n-1} \mu
            = \mu \frac{1 - w_1^n}{1 - w_1},   w_1 \ne 1
            = \mu n,                           w_1 = 1
This result shows that if \mu \ne 0, then E[u(n)] is a function of time n. Accordingly, the AR process u(n) is not stationary. If, however, the AR parameter satisfies the condition |a_1| < 1, or |w_1| < 1, then
    E[u(n)] \to \frac{\mu}{1 - w_1}   as n \to \infty
Under this condition, we say that the AR process is asymptotically stationary to order one.

(b) When the white noise process v(n) has zero mean, the AR process u(n) will likewise have zero mean. Then
    var[v(n)] = \sigma_v^2
    var[u(n)] = E[u^2(n)]                                                                           (2)
Substituting Eq. (1) into (2), and recognizing that for the white noise process
    E[v(n) v(k)] = \sigma_v^2,   n = k
                 = 0,            n \ne k                                                            (3)
we get the geometric series
    var[u(n)] = \sigma_v^2 (1 + w_1^2 + w_1^4 + ... + w_1^{2n-2})
              = \sigma_v^2 \frac{1 - w_1^{2n}}{1 - w_1^2},   |w_1| \ne 1
              = \sigma_v^2 n,                                |w_1| = 1
When |a_1| < 1, or |w_1| < 1, then
    var[u(n)] \approx \frac{\sigma_v^2}{1 - w_1^2} = \frac{\sigma_v^2}{1 - a_1^2}   for large n

(c) The autocorrelation function of the AR process u(n) equals E[u(n)u(n-k)]. Substituting Eq. (1) into this formula, and using Eq. (3), we get
    E[u(n) u(n-k)] = \sigma_v^2 (w_1^k + w_1^{k+2} + ... + w_1^{k+2n-2})
                   = \sigma_v^2 w_1^k \frac{1 - w_1^{2n}}{1 - w_1^2},   |w_1| \ne 1
                   = \sigma_v^2 n,                                      |w_1| = 1
For |a_1| < 1, or |w_1| < 1, we may therefore express this autocorrelation function as
    r(k) = E[u(n) u(n-k)] \approx \frac{\sigma_v^2 w_1^k}{1 - w_1^2}   for large n
Case 1: 0 < a_1 < 1. In this case, w_1 = -a_1 is negative, and r(k) alternates in sign as it decays with increasing lag k.
[Figure: r(k) versus lag k for Case 1, alternating in sign about zero.]
Case 2: -1 < a_1 < 0. In this case, w_1 = -a_1 is positive, and r(k) decays with increasing lag k while remaining of one sign.
[Figure: r(k) versus lag k for Case 2, decaying without sign changes.]
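The limiting mean, variance, and autocorrelation formulas derived in this problem are easy to check by simulation. The following is a minimal sketch (Python/NumPy; the parameter values w_1 = 0.5, \sigma_v^2 = 1, zero mean, the sample size, and the random seed are illustrative assumptions, not taken from the problem):

```python
import numpy as np

# Illustrative check of the AR(1) results of Problem 1.6:
#   u(n) = w1*u(n-1) + v(n),  w1 = -a1, |w1| < 1.
# For zero-mean white v(n): var[u(n)] -> sigma_v^2 / (1 - w1^2) and
# r(k) -> sigma_v^2 * w1^k / (1 - w1^2) for large n.
rng = np.random.default_rng(0)
w1, sigma_v2, N = 0.5, 1.0, 200_000          # assumed values for illustration

v = rng.normal(scale=np.sqrt(sigma_v2), size=N)
u = np.zeros(N)
for n in range(1, N):
    u[n] = w1 * u[n - 1] + v[n]

u = u[1000:]                                  # discard the start-up transient
var_theory = sigma_v2 / (1 - w1**2)
print("variance: simulated %.3f, theory %.3f" % (u.var(), var_theory))

for k in range(4):                            # a few autocorrelation lags
    r_k = np.mean(u[k:] * u[:len(u) - k])
    print("r(%d): simulated %.3f, theory %.3f" % (k, r_k, var_theory * w1**k))
```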
1.7
(a) The second-order AR process u(n) is described by the difference equation
    u(n) = u(n-1) - 0.5 u(n-2) + v(n)
Hence
    w_1 = 1,   w_2 = -0.5
and the AR parameters equal
    a_1 = -1,   a_2 = 0.5
Accordingly, we write the Yule-Walker equations as
    \begin{bmatrix} r(0) & r(1) \\ r(1) & r(0) \end{bmatrix} \begin{bmatrix} 1 \\ -0.5 \end{bmatrix}
        = \begin{bmatrix} r(1) \\ r(2) \end{bmatrix}
(b) Writing the Yule-Walker equations in expanded form:
    r(0) - 0.5 r(1) = r(1)
    r(1) - 0.5 r(0) = r(2)
Solving the first relation for r(1):
    r(1) = (2/3) r(0)                                                                               (1)
Solving the second relation for r(2):
    r(2) = (1/6) r(0)                                                                               (2)
(c) Since the noise v(n) has zero mean, so will the AR process u(n). Hence,
    var[u(n)] = E[u^2(n)] = r(0)
We know that
    \sigma_v^2 = \sum_{k=0}^{2} a_k r(k) = r(0) + a_1 r(1) + a_2 r(2)                               (3)
Substituting (1) and (2) into (3), and solving for r(0), we get
    r(0) = \frac{\sigma_v^2}{1 + \frac{2}{3} a_1 + \frac{1}{6} a_2} = 1.2
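The relations just derived can be verified numerically. The sketch below (Python/NumPy) evaluates them directly; the noise variance \sigma_v^2 = 0.5 used here is an assumed value, chosen only so that r(0) comes out to 1.2 as quoted above:

```python
import numpy as np

# Numerical check of Problem 1.7 for the AR(2) process
#   u(n) = u(n-1) - 0.5 u(n-2) + v(n),  i.e.  a1 = -1, a2 = 0.5.
a1, a2 = -1.0, 0.5
sigma_v2 = 0.5                       # assumed value (gives r(0) = 1.2)

# Part (b): r(1) = (2/3) r(0) and r(2) = (1/6) r(0) from the Yule-Walker equations.
# Part (c): sigma_v^2 = r(0) + a1 r(1) + a2 r(2), solved for r(0):
r0 = sigma_v2 / (1 + a1 * (2.0 / 3.0) + a2 * (1.0 / 6.0))
r1, r2 = (2.0 / 3.0) * r0, (1.0 / 6.0) * r0
print("r(0) = %.4f, r(1) = %.4f, r(2) = %.4f" % (r0, r1, r2))

# Cross-check: the Yule-Walker system R w = [r(1), r(2)]^T with w = [1, -0.5]^T.
R = np.array([[r0, r1], [r1, r0]])
print(R @ np.array([1.0, -0.5]))     # should reproduce [r(1), r(2)]
```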
1.8
By definition,
    P_0 = average power of the AR process u(n)
        = E[|u(n)|^2]
        = r(0)                                                                                      (1)
where r(0) is the autocorrelation function of u(n) for zero lag. We note that the normalized correlations and the AR parameters are in one-to-one correspondence:
    { r(1)/r(0), r(2)/r(0), ..., r(M)/r(0) }  <-->  { a_1, a_2, ..., a_M }
Equivalently, except for the scaling factor r(0),
    { r(1), r(2), ..., r(M) }  <-->  { a_1, a_2, ..., a_M }                                         (2)
Combining Eqs. (1) and (2):
    { r(0), r(1), r(2), ..., r(M) }  <-->  { P_0, a_1, a_2, ..., a_M }

1.9
(a) The transfer function of the MA model of Fig. 2.3 is
    H(z) = 1 + b_1^* z^{-1} + b_2^* z^{-2} + ... + b_K^* z^{-K}
(b) The transfer function of the ARMA model of Fig. 2.4 is
    H(z) = \frac{b_0^* + b_1^* z^{-1} + b_2^* z^{-2} + ... + b_K^* z^{-K}}{1 + a_1^* z^{-1} + a_2^* z^{-2} + ... + a_M^* z^{-M}}
(c) The ARMA model reduces to an AR model when
    b_1 = b_2 = ... = b_K = 0
It reduces to an MA model when
    a_1 = a_2 = ... = a_M = 0

1.10
We are given
    x(n) = v(n) + 0.75 v(n-1) + 0.25 v(n-2)
Taking the z-transforms of both sides:
    X(z) = (1 + 0.75 z^{-1} + 0.25 z^{-2}) V(z)
Hence, the transfer function of the MA model is
    X(z)/V(z) = 1 + 0.75 z^{-1} + 0.25 z^{-2}
              = \frac{1}{(1 + 0.75 z^{-1} + 0.25 z^{-2})^{-1}}                                      (1)
Using long division, we may perform the following expansion of the denominator in Eq. (1):
    (1 + 0.75 z^{-1} + 0.25 z^{-2})^{-1}
      = 1 - \frac{3}{4} z^{-1} + \frac{5}{16} z^{-2} - \frac{3}{64} z^{-3} - \frac{11}{256} z^{-4}
          + \frac{45}{1024} z^{-5} - \frac{91}{4096} z^{-6} + \frac{93}{16384} z^{-7}
          + \frac{85}{65536} z^{-8} - \frac{627}{262144} z^{-9} + \frac{1541}{1048576} z^{-10} + ...
      \approx 1 - 0.75 z^{-1} + 0.3125 z^{-2} - 0.0469 z^{-3} - 0.043 z^{-4} + 0.0439 z^{-5}
              - 0.0222 z^{-6} + 0.0057 z^{-7} + 0.0013 z^{-8} - 0.0024 z^{-9} + 0.0015 z^{-10}      (2)
(a) M = 2. Retaining terms in Eq. (2) up to z^{-2}, we may approximate the MA model with an AR model of order two as follows:
    X(z)/V(z) \approx \frac{1}{1 - 0.75 z^{-1} + 0.3125 z^{-2}}
(b) M = 5. Retaining terms in Eq. (2) up to z^{-5}, we obtain the following approximation in the form of an AR model of order five:
    X(z)/V(z) \approx \frac{1}{1 - 0.75 z^{-1} + 0.3125 z^{-2} - 0.0469 z^{-3} - 0.043 z^{-4} + 0.0439 z^{-5}}
(c) M = 10. Finally, retaining terms in Eq. (2) up to z^{-10}, we obtain the following approximation in the form of an AR model of order ten:
    X(z)/V(z) \approx \frac{1}{D(z)}
where D(z) is given by the polynomial on the right-hand side of Eq. (2).
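The long-division coefficients above are easy to reproduce mechanically. A minimal sketch follows (Python/NumPy; the recursion is simply the standard power-series inversion of the MA polynomial, written for this note rather than taken from the text):

```python
import numpy as np

# Reproduce the expansion 1 / (1 + 0.75 z^-1 + 0.25 z^-2) used in Problem 1.10.
# If c[k] are the series coefficients, then c[0] = 1 and, for k >= 1,
#   c[k] = -(0.75*c[k-1] + 0.25*c[k-2])   (terms with negative index taken as 0).
b = np.array([1.0, 0.75, 0.25])      # MA polynomial coefficients
order = 10

c = np.zeros(order + 1)
c[0] = 1.0
for k in range(1, order + 1):
    c[k] = -sum(b[j] * c[k - j] for j in range(1, min(k, 2) + 1))

print(np.round(c, 4))
# Expected: [1, -0.75, 0.3125, -0.0469, -0.043, 0.0439, -0.0222, 0.0057, 0.0013, -0.0024, 0.0015]

# Sanity check: convolving the series with the MA polynomial should give ~[1, 0, 0, ...].
print(np.round(np.convolve(b, c)[:order + 1], 6))
```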
1.11
(a) The filter output is
    x(n) = w^H u(n)
where u(n) is the tap-input vector. The average power of the filter output is therefore
    E[|x(n)|^2] = E[w^H u(n) u^H(n) w]
                = w^H E[u(n) u^H(n)] w
                = w^H R w
(b) If u(n) is extracted from a zero-mean white noise of variance \sigma^2, we have
    R = \sigma^2 I
where I is the identity matrix. Hence,
    E[|x(n)|^2] = \sigma^2 w^H w

1.12
(a) The process u(n) is a linear combination of Gaussian samples. Hence, u(n) is Gaussian. (b) From inverse filtering, we recognize that v(n) may also be expressed as a linear combination of samples represented by u(n). Hence, if u(n) is Gaussian, then v(n) is also Gaussian.
1.13
(a) From the Gaussian moment-factoring theorem:
    E[(u_1 u_2^*)^k] = E[u_1 ... u_1 u_2^* ... u_2^*]   (k factors of each)
                     = k! \, E[u_1 u_2^*] ... E[u_1 u_2^*]
                     = k! \, (E[u_1 u_2^*])^k                                                       (1)
(b) Putting u_2 = u_1 = u, Eq. (1) reduces to
    E[|u|^{2k}] = k! \, (E[|u|^2])^k

1.14
It is not permissible to interchange the order of expectation and limiting operations in Eq. (1.113). The reason is that the expectation is a linear operation, whereas the limiting operation with respect to the number of samples N is nonlinear.
1.15
The filter output is
    y(n) = \sum_i h(i) u(n-i)
Similarly, we may write
    y(m) = \sum_k h(k) u(m-k)
Hence,
    r_y(n, m) = E[y(n) y^*(m)]
              = E\left[ \sum_i h(i) u(n-i) \sum_k h^*(k) u^*(m-k) \right]
              = \sum_i \sum_k h(i) h^*(k) E[u(n-i) u^*(m-k)]
              = \sum_i \sum_k h(i) h^*(k) r_u(n-i, m-k)
1.16
The mean-square value of the filter output in response to white-noise input is
    P_o = \frac{2 \sigma^2 \Delta\omega}{\pi}
The value P_o is linearly proportional to the filter bandwidth \Delta\omega. This relation holds irrespective of how small \Delta\omega is compared to the mid-band frequency of the filter.

1.17
(a) The variance of the filter output is
    \sigma_y^2 = \frac{2 \sigma^2 \Delta\omega}{\pi}
We are given
    \sigma^2 = 0.1 volt^2
    \Delta\omega = 2\pi \times 1 radians/sec
Hence,
    \sigma_y^2 = \frac{2 \times 0.1 \times 2\pi}{\pi} = 0.4 volt^2
(b) The pdf of the filter output y is
    f(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y} e^{-y^2 / 2\sigma_y^2}
         = \frac{1}{0.63\sqrt{2\pi}} e^{-y^2 / 0.8}
1.18
(a) We are given
    U_k = \sum_{n=0}^{N-1} u(n) \exp(-j n \omega_k),   k = 0, 1, ..., N-1
where u(n) is real valued and
    \omega_k = \frac{2\pi}{N} k
Hence,
    E[U_k U_l^*] = E\left[ \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} u(n) u(m) \exp(-j n \omega_k + j m \omega_l) \right]
                 = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \exp(-j n \omega_k + j m \omega_l) E[u(n) u(m)]
                 = \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} \exp(-j n \omega_k + j m \omega_l) r(n-m)
                 = \sum_{m=0}^{N-1} \exp(j m \omega_l) \sum_{n=0}^{N-1} r(n-m) \exp(-j n \omega_k)   (1)
By definition, we also have
    \sum_{n=0}^{N-1} r(n) \exp(-j n \omega_k) = S_k
Moreover, since r(n) is periodic with period N, we may invoke the time-shifting property of the discrete Fourier transform to write
    \sum_{n=0}^{N-1} r(n-m) \exp(-j n \omega_k) = \exp(-j m \omega_k) S_k
Thus, recognizing that \omega_k = (2\pi/N) k, Eq. (1) reduces to
    E[U_k U_l^*] = S_k \sum_{m=0}^{N-1} \exp(j m (\omega_l - \omega_k))
                 = S_k,   l = k
                 = 0,     otherwise
(b) Part (a) shows that the complex spectral samples U_k are uncorrelated. If they are Gaussian, then they are also statistically independent. Hence,
    f_U(U_0, U_1, ..., U_{N-1}) = \frac{1}{(2\pi)^N \det(\Lambda)} \exp\left(-\frac{1}{2} U^H \Lambda^{-1} U\right)
where
    U = [U_0, U_1, ..., U_{N-1}]^T
    \Lambda = \frac{1}{2} E[U U^H] = \frac{1}{2} \mathrm{diag}(S_0, S_1, ..., S_{N-1})
    \det(\Lambda) = \frac{1}{2^N} \prod_{k=0}^{N-1} S_k
Therefore,
    f_U(U_0, U_1, ..., U_{N-1}) = \frac{1}{(2\pi)^N 2^{-N} \prod_{k=0}^{N-1} S_k}
                                  \exp\left(-\frac{1}{2} \sum_{k=0}^{N-1} \frac{|U_k|^2}{\frac{1}{2} S_k}\right)
                                = \pi^{-N} \exp\left( \sum_{k=0}^{N-1} \left( -\frac{|U_k|^2}{S_k} - \ln S_k \right) \right)
1.19
The mean-square value of the increment process dz(\omega) is
    E[|dz(\omega)|^2] = S(\omega) \, d\omega
Hence E[|dz(\omega)|^2] is measured in watts.

1.20
The third-order cumulant of a process u(n) is
    c_3(\tau_1, \tau_2) = E[u(n) u(n+\tau_1) u(n+\tau_2)] = third-order moment
All odd-order moments of a Gaussian process are known to be zero; hence,
    c_3(\tau_1, \tau_2) = 0
The fourth-order cumulant is
    c_4(\tau_1, \tau_2, \tau_3) = E[u(n) u(n+\tau_1) u(n+\tau_2) u(n+\tau_3)]
                                  - E[u(n) u(n+\tau_1)] E[u(n+\tau_2) u(n+\tau_3)]
                                  - E[u(n) u(n+\tau_2)] E[u(n+\tau_1) u(n+\tau_3)]
                                  - E[u(n) u(n+\tau_3)] E[u(n+\tau_1) u(n+\tau_2)]
For the special case \tau_1 = \tau_2 = \tau_3 = 0, the fourth-order moment of a zero-mean Gaussian process of variance \sigma^2 is 3\sigma^4, and its second-order moment is \sigma^2. Hence, the fourth-order cumulant is zero. Indeed, all cumulants of order higher than two are zero for a Gaussian process.
1.21
The trispectrum is
    C_4(\omega_1, \omega_2, \omega_3) = \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} \sum_{\tau_3=-\infty}^{\infty}
        c_4(\tau_1, \tau_2, \tau_3) e^{-j(\omega_1 \tau_1 + \omega_2 \tau_2 + \omega_3 \tau_3)}
Let the process be passed through a three-dimensional band-pass filter centered on \omega_1, \omega_2, and \omega_3. We assume that the bandwidth (along each dimension) is small compared to the respective center frequency. The average power of the filter output is then proportional to the trispectrum C_4(\omega_1, \omega_2, \omega_3).
1.22
(a) Starting with the formula
    c_k(\tau_1, \tau_2, ..., \tau_{k-1}) = \gamma_k \sum_{i=-\infty}^{\infty} h_i h_{i+\tau_1} ... h_{i+\tau_{k-1}}
the third-order cumulant of the filter output is
    c_3(\tau_1, \tau_2) = \gamma_3 \sum_{i=-\infty}^{\infty} h_i h_{i+\tau_1} h_{i+\tau_2}
where \gamma_3 is the third-order cumulant of the filter input. The bispectrum is
    C_3(\omega_1, \omega_2) = \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} c_3(\tau_1, \tau_2) e^{-j(\omega_1 \tau_1 + \omega_2 \tau_2)}
                            = \gamma_3 \sum_{i=-\infty}^{\infty} \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty}
                              h_i h_{i+\tau_1} h_{i+\tau_2} e^{-j(\omega_1 \tau_1 + \omega_2 \tau_2)}
Hence,
    C_3(\omega_1, \omega_2) = \gamma_3 H(e^{j\omega_1}) H(e^{j\omega_2}) H^*(e^{j(\omega_1+\omega_2)})
(b) From this formula, we immediately deduce that
    \arg[C_3(\omega_1, \omega_2)] = \arg[H(e^{j\omega_1})] + \arg[H(e^{j\omega_2})] - \arg[H(e^{j(\omega_1+\omega_2)})]
1.23
The output of a filter of impulse response h_i due to an input u(i) is given by the convolution sum
    y(n) = \sum_i h_i u(n-i)
The third-order cumulant of the filter output is, for example,
    c_3(\tau_1, \tau_2) = E[y(n) y(n+\tau_1) y(n+\tau_2)]
                        = E\left[ \sum_i h_i u(n-i) \sum_k h_k u(n+\tau_1-k) \sum_l h_l u(n+\tau_2-l) \right]
                        = E\left[ \sum_i h_i u(n-i) \sum_k h_{k+\tau_1} u(n-k) \sum_l h_{l+\tau_2} u(n-l) \right]
                        = \sum_i \sum_k \sum_l h_i h_{k+\tau_1} h_{l+\tau_2} E[u(n-i) u(n-k) u(n-l)]
For an input sequence of independent and identically distributed random variables, we note that
    E[u(n-i) u(n-k) u(n-l)] = \gamma_3,   i = k = l
                            = 0,          otherwise
Hence,
    c_3(\tau_1, \tau_2) = \gamma_3 \sum_{i=-\infty}^{\infty} h_i h_{i+\tau_1} h_{i+\tau_2}
In general, we may thus write
    c_k(\tau_1, \tau_2, ..., \tau_{k-1}) = \gamma_k \sum_{i=-\infty}^{\infty} h_i h_{i+\tau_1} ... h_{i+\tau_{k-1}}
1.24
By definition,
    r^{(\alpha)}(k) = \frac{1}{N} \sum_{n=0}^{N-1} E[u(n) u^*(n-k) e^{-j2\pi\alpha n}] e^{j\pi\alpha k}
Hence,
    r^{(\alpha)}(-k) = \frac{1}{N} \sum_{n=0}^{N-1} E[u(n) u^*(n+k) e^{-j2\pi\alpha n}] e^{-j\pi\alpha k}
    r^{(\alpha)*}(k) = \frac{1}{N} \sum_{n=0}^{N-1} E[u^*(n) u(n-k) e^{j2\pi\alpha n}] e^{-j\pi\alpha k}
We are told that the process u(n) is cyclostationary, which means that
    E[u(n) u^*(n+k) e^{-j2\pi\alpha n}] = E[u^*(n) u(n-k) e^{j2\pi\alpha n}]
It follows therefore that
    r^{(\alpha)}(-k) = r^{(\alpha)*}(k)

1.25
For \alpha = 0, the input to the time-average cross-correlator reduces to the squared amplitude of the output of a narrow-band filter with mid-band frequency \omega. Correspondingly, the time-average cross-correlator reduces to an average power meter. Thus, for \alpha = 0, the instrumentation of Fig. 1.16 reduces to that of Fig. 1.13.
CHAPTER 2

2.1
(a) Let
    w_k = x + jy,   p(-k) = a + jb
We may then write
    f = w_k p^*(-k) = (x + jy)(a - jb) = (ax + by) + j(ay - bx)
Let f = u + jv, with
    u = ax + by,   v = ay - bx
Hence,
    \partial u/\partial x = a,   \partial u/\partial y = b
    \partial v/\partial y = a,   \partial v/\partial x = -b
From these results we immediately see that
    \partial u/\partial x = \partial v/\partial y
    \partial v/\partial x = -\partial u/\partial y
In other words, the product term w_k p^*(-k) satisfies the Cauchy-Riemann equations, and so this term is analytic.
(b) Let
    f = w_k^* p(-k) = (x - jy)(a + jb) = (ax + by) + j(bx - ay)
Let f = u + jv, with
    u = ax + by,   v = bx - ay
Hence,
    \partial u/\partial x = a,   \partial u/\partial y = b
    \partial v/\partial x = b,   \partial v/\partial y = -a
From these results we immediately see that
    \partial u/\partial x \ne \partial v/\partial y
    \partial u/\partial y \ne -\partial v/\partial x
In other words, the product term w_k^* p(-k) does not satisfy the Cauchy-Riemann equations, and so this term is not analytic.
2.2
(a) From the Wiener-Hopf equation, we have
    w_o = R^{-1} p                                                                                  (1)
We are given
    R = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix},
    p = \begin{bmatrix} 0.5 \\ 0.25 \end{bmatrix}
Hence, the inverse matrix R^{-1} is
    R^{-1} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}^{-1}
           = \frac{1}{0.75} \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}
Using Eq. (1), we therefore get
    w_o = \frac{1}{0.75} \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.25 \end{bmatrix}
        = \frac{1}{0.75} \begin{bmatrix} 0.375 \\ 0 \end{bmatrix}
        = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}
(b) The minimum mean-square error is
    J_{min} = \sigma_d^2 - p^H w_o
            = \sigma_d^2 - [0.5, 0.25] \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}
            = \sigma_d^2 - 0.25
(c) The eigenvalues of matrix R are roots of the characteristic equation
    (1 - \lambda)^2 - (0.5)^2 = 0
That is, the two roots are
    \lambda_1 = 0.5   and   \lambda_2 = 1.5
The associated eigenvectors are defined by R q = \lambda q. For \lambda_1 = 0.5, we have
    \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \begin{bmatrix} q_{11} \\ q_{12} \end{bmatrix}
        = 0.5 \begin{bmatrix} q_{11} \\ q_{12} \end{bmatrix}
Expanding,
    q_{11} + 0.5 q_{12} = 0.5 q_{11}
    0.5 q_{11} + q_{12} = 0.5 q_{12}
Therefore, q_{11} = -q_{12}. Normalizing the eigenvector q_1 to unit length, we therefore have
    q_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}
Similarly, for the eigenvalue \lambda_2 = 1.5, we may show that
    q_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
Accordingly, we may express the Wiener filter in terms of its eigenvalues and eigenvectors as follows:
    w_o = \sum_{i=1}^{2} \frac{1}{\lambda_i} q_i q_i^H p
        = \left( \frac{1}{0.5} \cdot \frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
                 + \frac{1}{1.5} \cdot \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right)
          \begin{bmatrix} 0.5 \\ 0.25 \end{bmatrix}
        = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix}
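These numbers are easy to confirm numerically. A minimal sketch follows (Python/NumPy; \sigma_d^2 is left symbolic in the text, so only the term p^H w_o is evaluated here):

```python
import numpy as np

# Numerical check of Problem 2.2: Wiener solution and its eigen-decomposition form.
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])
p = np.array([0.5, 0.25])

w_o = np.linalg.solve(R, p)
print("w_o =", w_o)                        # expect [0.5, 0.0]
print("p^H w_o =", p @ w_o)                # expect 0.25, so J_min = sigma_d^2 - 0.25

# Eigen-decomposition form: w_o = sum_i (1/lambda_i) q_i q_i^H p
lam, Q = np.linalg.eigh(R)
w_eig = sum((1.0 / lam[i]) * np.outer(Q[:, i], Q[:, i]) @ p for i in range(len(lam)))
print("eigen-form w_o =", w_eig)           # should match w_o
```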
2.3
(a) From the Wiener-Hopf equation we have
    w_o = R^{-1} p                                                                                  (1)
We are given
    R = \begin{bmatrix} 1 & 0.5 & 0.25 \\ 0.5 & 1 & 0.5 \\ 0.25 & 0.5 & 1 \end{bmatrix},
    p = [0.5, 0.25, 0.125]^T
Hence, the use of these values in Eq. (1) yields
    w_o = R^{-1} p
        = \begin{bmatrix} 1.33 & -0.67 & 0 \\ -0.67 & 1.67 & -0.67 \\ 0 & -0.67 & 1.33 \end{bmatrix}
          \begin{bmatrix} 0.5 \\ 0.25 \\ 0.125 \end{bmatrix}
        = [0.5, 0, 0]^T
(b) The minimum mean-square error is
    J_{min} = \sigma_d^2 - p^H w_o
            = \sigma_d^2 - [0.5, 0.25, 0.125] \begin{bmatrix} 0.5 \\ 0 \\ 0 \end{bmatrix}
            = \sigma_d^2 - 0.25
(c) The eigenvalues of matrix R are
    \lambda = 0.4069,  0.75,  1.8431
The corresponding eigenvectors constitute the orthogonal matrix
    Q = \begin{bmatrix} -0.4544 & -0.7071 & 0.5418 \\ 0.7662 & 0 & 0.6426 \\ -0.4544 & 0.7071 & 0.5418 \end{bmatrix}
Accordingly, we may express the Wiener filter in terms of its eigenvalues and eigenvectors as follows:
    w_o = \sum_{i=1}^{3} \frac{1}{\lambda_i} q_i q_i^H p
        = \left( \frac{1}{0.4069} \begin{bmatrix} 0.2065 & -0.3482 & 0.2065 \\ -0.3482 & 0.5871 & -0.3482 \\ 0.2065 & -0.3482 & 0.2065 \end{bmatrix}
          + \frac{1}{0.75} \begin{bmatrix} 0.5 & 0 & -0.5 \\ 0 & 0 & 0 \\ -0.5 & 0 & 0.5 \end{bmatrix}
          + \frac{1}{1.8431} \begin{bmatrix} 0.2935 & 0.3482 & 0.2935 \\ 0.3482 & 0.4129 & 0.3482 \\ 0.2935 & 0.3482 & 0.2935 \end{bmatrix} \right)
          \begin{bmatrix} 0.5 \\ 0.25 \\ 0.125 \end{bmatrix}

2.4
By definition, the correlation matrix is
    R = E[u(n) u^H(n)]
where
    u(n) = [u(n), u(n-1), ..., u(0)]^T
Invoking the ergodicity theorem, R may be estimated by the time average
    R(N) = \frac{1}{N+1} \sum_{n=0}^{N} u(n) u^H(n)
Likewise, we may compute the cross-correlation vector
    p = E[u(n) d^*(n)]
as the time average
    p(N) = \frac{1}{N+1} \sum_{n=0}^{N} u(n) d^*(n)
The tap-weight vector of the Wiener filter is thus estimated by
    w_o(N) = \left( \sum_{n=0}^{N} u(n) u^H(n) \right)^{-1} \sum_{n=0}^{N} u(n) d^*(n)
which is dependent on the length (N+1) of the time series.
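These time-average estimates are straightforward to compute. A minimal sketch follows (Python/NumPy; a fixed number of taps M, the synthetic two-tap channel, the noise level, and the seed are illustrative assumptions made only for this note):

```python
import numpy as np

# Illustration of Problem 2.4: estimate R and p by time averaging and form
# the Wiener solution from the estimates.
rng = np.random.default_rng(1)
N, M = 5000, 3                               # number of samples and of taps (assumed)

u = rng.standard_normal(N)                   # input process (assumed white)
d = 0.7 * u + 0.2 * np.roll(u, 1)            # desired response: an assumed simple channel
d += 0.1 * rng.standard_normal(N)            # plus observation noise

R_hat = np.zeros((M, M))
p_hat = np.zeros(M)
for n in range(M, N):
    u_vec = u[n - M + 1:n + 1][::-1]         # tap-input vector [u(n), u(n-1), ..., u(n-M+1)]
    R_hat += np.outer(u_vec, u_vec)
    p_hat += u_vec * d[n]
R_hat /= (N - M)
p_hat /= (N - M)

w_o = np.linalg.solve(R_hat, p_hat)
print("estimated Wiener weights:", np.round(w_o, 3))   # ~[0.7, 0.2, 0.0] for this toy model
```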
2.5
(a) The correlation matrix of the input vector is
    R = E[u(n) u^H(n)]
      = E[(\alpha(n) s(n) + v(n))(\alpha^*(n) s^H(n) + v^H(n))]
With \alpha(n) uncorrelated with v(n), we have
    R = E[|\alpha(n)|^2] s(n) s^H(n) + E[v(n) v^H(n)]
      = \sigma_\alpha^2 s(n) s^H(n) + R_v                                                           (1)
where R_v is the correlation matrix of v(n).
(b) The cross-correlation vector between the input vector u(n) and the desired response d(n) is
    p = E[u(n) d^*(n)]                                                                              (2)
If d(n) is uncorrelated with u(n), we have p = 0. Hence, the tap-weight vector of the Wiener filter is
    w_o = R^{-1} p = 0
(c) With \sigma_\alpha^2 = 0, Eq. (1) reduces to
    R = R_v
With the desired response
    d(n) = v(n-k),
Eq. (2) yields
    p = E[(\alpha(n) s(n) + v(n)) v^*(n-k)]
      = E[v(n) v^*(n-k)]
      = E\left[ \begin{bmatrix} v(n) \\ v(n-1) \\ \vdots \\ v(n-M+1) \end{bmatrix} v^*(n-k) \right]
      = \begin{bmatrix} r_v(k) \\ r_v(k-1) \\ \vdots \\ r_v(k-M+1) \end{bmatrix},   0 \le k \le M-1  (3)
where r_v(k) is the autocorrelation of v(n) for lag k. Accordingly, the tap-weight vector of the (optimum) Wiener filter is
    w_o = R^{-1} p = R_v^{-1} p
where p is as defined in Eq. (3).
(d) For a desired response
    d(n) = \alpha(n) e^{-j\omega\tau}
the cross-correlation vector p is
    p = E[u(n) d^*(n)]
      = E[(\alpha(n) s(n) + v(n)) \alpha^*(n) e^{j\omega\tau}]
      = s(n) e^{j\omega\tau} E[|\alpha(n)|^2]
      = \sigma_\alpha^2 e^{j\omega\tau} [1, e^{-j\omega}, ..., e^{-j\omega(M-1)}]^T
      = \sigma_\alpha^2 [e^{j\omega\tau}, e^{j\omega(\tau-1)}, ..., e^{j\omega(\tau-M+1)}]^T
The corresponding value of the tap-weight vector of the Wiener filter is
    w_o = \sigma_\alpha^2 (\sigma_\alpha^2 s(n) s^H(n) + R_v)^{-1}
          [e^{j\omega\tau}, e^{j\omega(\tau-1)}, ..., e^{j\omega(\tau-M+1)}]^T
        = \left( s(n) s^H(n) + \frac{1}{\sigma_\alpha^2} R_v \right)^{-1}
          [e^{j\omega\tau}, e^{j\omega(\tau-1)}, ..., e^{j\omega(\tau-M+1)}]^T

2.6
The optimum filtering solution is defined by the Wiener-Hopf equation
    R w_o = p                                                                                       (1)
for which the minimum mean-square error equals
    J_{min} = \sigma_d^2 - p^H w_o                                                                  (2)
Combine Eqs. (1) and (2) into the single relation
    \begin{bmatrix} \sigma_d^2 & p^H \\ p & R \end{bmatrix} \begin{bmatrix} 1 \\ -w_o \end{bmatrix}
        = \begin{bmatrix} J_{min} \\ 0 \end{bmatrix}
Define
    A = \begin{bmatrix} \sigma_d^2 & p^H \\ p & R \end{bmatrix}                                     (3)
Since
    \sigma_d^2 = E[d(n) d^*(n)],   p = E[u(n) d^*(n)],   R = E[u(n) u^H(n)],
we may rewrite Eq. (3) as
    A = \begin{bmatrix} E[d(n) d^*(n)] & E[d(n) u^H(n)] \\ E[u(n) d^*(n)] & E[u(n) u^H(n)] \end{bmatrix}
      = E\left[ \begin{bmatrix} d(n) \\ u(n) \end{bmatrix} [\, d^*(n) \;\; u^H(n) \,] \right]
The mean-square error for an arbitrary tap-weight vector w is
    J(w) = \sigma_d^2 - p^H w - w^H p + w^H R w                                                     (4)
Eliminating \sigma_d^2 between Eqs. (2) and (4):
    J(w) = J_{min} + p^H w_o - p^H w - w^H p + w^H R w                                              (5)
Eliminating p between Eqs. (1) and (5):
    J(w) = J_{min} + w_o^H R w_o - w_o^H R w - w^H R w_o + w^H R w                                  (6)
where we have used the property R^H = R. We may rewrite Eq. (6) simply as
    J(w) = J_{min} + (w - w_o)^H R (w - w_o)
which clearly shows that J(w_o) = J_{min}.

2.7
The minimum mean-square error equals
    J_{min} = \sigma_d^2 - p^H R^{-1} p                                                             (1)
Using the spectral theorem, we may express the correlation matrix R as
    R = Q \Lambda Q^H = \sum_{k=1}^{M} \lambda_k q_k q_k^H
Hence, the inverse of R equals
    R^{-1} = \sum_{k=1}^{M} \frac{1}{\lambda_k} q_k q_k^H                                           (2)
Substituting Eq. (2) into (1):
    J_{min} = \sigma_d^2 - \sum_{k=1}^{M} \frac{1}{\lambda_k} p^H q_k q_k^H p
            = \sigma_d^2 - \sum_{k=1}^{M} \frac{1}{\lambda_k} |p^H q_k|^2
2.8
When the length of the Wiener filter is greater than the model order m, the tail end of the tap-weight vector of the Wiener filter is zero; thus,
    w_o = \begin{bmatrix} a_m \\ 0 \end{bmatrix}
Therefore, the only possible solution for the case of an over-fitted model is
    w_o = \begin{bmatrix} a_m \\ 0 \end{bmatrix}
2.9
(a) The Wiener solution is defined by R_M a_M = p_M, which in partitioned form reads
    \begin{bmatrix} R_m & r_{M-m} \\ r_{M-m}^H & R_{M-m,M-m} \end{bmatrix}
    \begin{bmatrix} a_m \\ 0_{M-m} \end{bmatrix}
        = \begin{bmatrix} p_m \\ p_{M-m} \end{bmatrix}
Hence,
    R_m a_m = p_m
    r_{M-m}^H a_m = p_{M-m}
and therefore
    p_{M-m} = r_{M-m}^H a_m = r_{M-m}^H R_m^{-1} p_m                                                (1)
(b) Applying the condition of Eq. (1) to the example in Section 2.7:
    r_{M-m}^H = [-0.05, 0.1, 0.15]
    a_m = [0.8719, -0.9129, 0.2444]^T
The last entry in the 4-by-1 vector p is therefore
    r_{M-m}^H a_m = -0.0436 - 0.0912 + 0.1222 = -0.0126
2.10
The minimum mean-square error is
    J_{min} = \sigma_d^2 - p^H w_o = \sigma_d^2 - p^H R^{-1} p
When m = 0:
    J_{min} = \sigma_d^2 = 1.0
When m = 1:
    J_{min} = 1 - 0.5 \times \frac{1}{1.1} \times 0.5 = 0.9773
When m = 2:
    J_{min} = 1 - [0.5, -0.4] \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}^{-1} \begin{bmatrix} 0.5 \\ -0.4 \end{bmatrix}
            = 1 - 0.6781 = 0.3219
When m = 3:
    J_{min} = 1 - [0.5, -0.4, -0.2] \begin{bmatrix} 1.1 & 0.5 & 0.1 \\ 0.5 & 1.1 & 0.5 \\ 0.1 & 0.5 & 1.1 \end{bmatrix}^{-1}
              \begin{bmatrix} 0.5 \\ -0.4 \\ -0.2 \end{bmatrix}
            = 1 - 0.6859 = 0.3141
When m = 4:
    J_{min} = 1 - 0.6859 = 0.3141
Thus any further increase in the filter order beyond m = 3 does not produce any meaningful reduction in the minimum mean-square error.
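A short numerical check of these J_min values follows (Python/NumPy; the data needed for the m = 4 case is not listed above, so only m = 1, 2, 3 are evaluated, with \sigma_d^2 = 1):

```python
import numpy as np

# Check of the J_min values in Problem 2.10 (sigma_d^2 = 1).
cases = {
    1: (np.array([[1.1]]), np.array([0.5])),
    2: (np.array([[1.1, 0.5],
                  [0.5, 1.1]]), np.array([0.5, -0.4])),
    3: (np.array([[1.1, 0.5, 0.1],
                  [0.5, 1.1, 0.5],
                  [0.1, 0.5, 1.1]]), np.array([0.5, -0.4, -0.2])),
}

for m, (R, p) in cases.items():
    j_min = 1.0 - p @ np.linalg.solve(R, p)
    print("m = %d:  J_min = %.4f" % (m, j_min))
# Expected: 0.9773, 0.3219, 0.3141 - levelling off beyond m = 3.
```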
2.11
(a) The observed and desired processes are described by
    u(n) = x(n) + \nu_2(n)                                                                          (1)
    d(n) = -0.8458 d(n-1) + \nu_1(n)                                                                (2)
    x(n) = d(n) + 0.9458 x(n-1)                                                                     (3)
From Eq. (3),
    d(n) = x(n) - 0.9458 x(n-1)
Using Eqs. (2) and (3):
    x(n) - 0.9458 x(n-1) = 0.8458 [-x(n-1) + 0.9458 x(n-2)] + \nu_1(n)
Rearranging terms:
    x(n) = (0.9458 - 0.8458) x(n-1) + 0.8 x(n-2) + \nu_1(n)
         = 0.1 x(n-1) + 0.8 x(n-2) + \nu_1(n)
[Block diagram: (a) generation of d(n) by feeding \nu_1(n) into a first-order loop with coefficient -0.8458; (b) generation of x(n) by feeding d(n) into a first-order loop with coefficient 0.9458, with u(n) = x(n) + \nu_2(n).]
(b) u(n) = x(n) + \nu_2(n), where x(n) and \nu_2(n) are uncorrelated. Therefore,
    R = R_x + R_{\nu_2}
with
    R_x = \begin{bmatrix} r_x(0) & r_x(1) \\ r_x(1) & r_x(0) \end{bmatrix}
    r_x(0) = \sigma_x^2 = \frac{1 + a_2}{1 - a_2} \cdot \frac{\sigma_1^2}{(1 + a_2)^2 - a_1^2} = 1
    r_x(1) = \frac{-a_1}{1 + a_2} r_x(0) = 0.5
Hence,
    R_x = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix},
    R_{\nu_2} = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix}
    R = R_x + R_{\nu_2} = \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}
The cross-correlation vector is
    p = \begin{bmatrix} p(0) \\ p(1) \end{bmatrix},
    p(k) = E[u(n-k) d(n)],   k = 0, 1
    p(0) = r_x(0) + b_1 r_x(-1) = 1 - 0.9458 \times 0.5 = 0.5272
    p(1) = r_x(1) + b_1 r_x(0) = 0.5 - 0.9458 = -0.4458
Therefore,
    p = \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}
(c) The optimum weight vector is
    w_o = R^{-1} p
        = \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}^{-1} \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}
        = \begin{bmatrix} 0.8363 \\ -0.7853 \end{bmatrix}

2.12
(a) For M = 3 taps, the correlation matrix of the tap inputs equals
    R = \begin{bmatrix} 1.1 & 0.5 & 0.85 \\ 0.5 & 1.1 & 0.5 \\ 0.85 & 0.5 & 1.1 \end{bmatrix}
The cross-correlation vector between the tap inputs and the desired response equals
    p = [0.527, -0.446, 0.377]^T
(b) The inverse of the correlation matrix equals
    R^{-1} = \begin{bmatrix} 2.234 & -0.304 & -1.666 \\ -0.304 & 1.186 & -0.304 \\ -1.666 & -0.304 & 2.234 \end{bmatrix}
Hence, the optimum weight vector equals
    w_o = R^{-1} p = [0.738, -0.803, 0.138]^T
The minimum mean-square error equals
    J_{min} = 0.15

2.13
(a) The correlation matrix R is
    R = E[u(n) u^H(n)]
      = E[|A_1|^2] \begin{bmatrix} e^{-j\omega_1 n} \\ e^{-j\omega_1(n-1)} \\ \vdots \\ e^{-j\omega_1(n-M+1)} \end{bmatrix}
        [\, e^{+j\omega_1 n}, e^{+j\omega_1(n-1)}, ..., e^{+j\omega_1(n-M+1)} \,] + E[v(n) v^H(n)]
      = E[|A_1|^2] s(\omega_1) s^H(\omega_1) + I \, E[|v(n)|^2]
      = \sigma_1^2 s(\omega_1) s^H(\omega_1) + \sigma_v^2 I
where I is the identity matrix.
(b) The tap-weight vector of the Wiener filter is
    w_o = R^{-1} p
From part (a),
    R = \sigma_v^2 I + \sigma_1^2 s(\omega_1) s^H(\omega_1)
We are given
    p = \sigma_0^2 s(\omega_0)
To invert the matrix R, we use the matrix inversion lemma (see Chapter 9), as described here: if
    A = B^{-1} + C D^{-1} C^H
then
    A^{-1} = B - B C (D + C^H B C)^{-1} C^H B
In our case,
    A = R,   B^{-1} = \sigma_v^2 I,   D^{-1} = \sigma_1^2,   C = s(\omega_1)
Hence,
    R^{-1} = \frac{1}{\sigma_v^2} \left[ I - \frac{s(\omega_1) s^H(\omega_1)}
             {\sigma_v^2/\sigma_1^2 + s^H(\omega_1) s(\omega_1)} \right]
The corresponding value of the Wiener tap-weight vector is
    w_o = R^{-1} p
        = \frac{\sigma_0^2}{\sigma_v^2} \left[ s(\omega_0) - \frac{s(\omega_1) s^H(\omega_1) s(\omega_0)}
          {\sigma_v^2/\sigma_1^2 + s^H(\omega_1) s(\omega_1)} \right]
We note that
    s^H(\omega_1) s(\omega_1) = M
    s^H(\omega_1) s(\omega_0) = a scalar
Hence,
    w_o = \frac{\sigma_0^2}{\sigma_v^2} s(\omega_0)
          - \frac{\sigma_0^2}{\sigma_v^2} \cdot \frac{s^H(\omega_1) s(\omega_0)}{\sigma_v^2/\sigma_1^2 + M} \, s(\omega_1)
2.14
The output of the array processor equals
    e(n) = u(1, n) - w u(2, n)
The mean-square error equals
    J(w) = E[|e(n)|^2]
         = E[(u(1, n) - w u(2, n))(u^*(1, n) - w^* u^*(2, n))]
         = E[|u(1, n)|^2] + |w|^2 E[|u(2, n)|^2] - w E[u(2, n) u^*(1, n)] - w^* E[u(1, n) u^*(2, n)]
Differentiating J(w) with respect to w:
    \frac{\partial J}{\partial w} = -2 E[u(1, n) u^*(2, n)] + 2 w E[|u(2, n)|^2]
Putting \partial J(w)/\partial w = 0 and solving for the optimum value of w:
    w_o = \frac{E[u(1, n) u^*(2, n)]}{E[|u(2, n)|^2]}

2.15
Define the index of performance (i.e., cost function)
    J(w) = E[|e(n)|^2] + c^H S^H w + w^H S c - 2 c^H D^{1/2} 1
         = w^H R w + c^H S^H w + w^H S c - 2 c^H D^{1/2} 1
where c is a vector of Lagrange multipliers. Differentiate J(w) with respect to w and set the result equal to zero:
    \frac{\partial J}{\partial w} = 2 R w + 2 S c = 0
Hence,
    w_o = -R^{-1} S c
But we must constrain w_o as
    S^H w_o = D^{1/2} 1
This constraint yields
    -S^H R^{-1} S c = D^{1/2} 1
Therefore, the vector c equals
    c = -(S^H R^{-1} S)^{-1} D^{1/2} 1
Correspondingly, the optimum weight vector equals
    w_o = R^{-1} S (S^H R^{-1} S)^{-1} D^{1/2} 1
w Rs w ( SNR ) o = -------------------H w Rv w is derived in part (b) of the solution to Problem 2.18. There it is shown that the optimum weight vector wSN so defined is given by
41
–1
w SN = R v s
(1)
where s is the signal component and Rv is the correlation matrix of the noise comment v(n). On the other hand, the optimum weight vector of the LCMV beamformer is defined by
wo =
–1 R s(φ) * g ----------------------------------H –1
(2)
s ( φ )R s ( φ )
where s(φ) is the steering vector. In general, the formulas (1) and (2) yield different values for the weight vector of the beamformer. 2.17
Let τi be the propagation delay, measured from the zero-time reference to the ith element of a nonuniformly spaced array, for a plane wave arriving from a direction defined by angle θ with respect to the perpendicular to the array. For a signal of angular frequency ω, this delay amounts to a phase shift equal to -ωτi. Let the phase shifts for all elements of the array be collected together in a column vector denoted by d(ω,θ). The response of a beamformer with weight vector w to a signal (with angular frequency ω) originating from angle θ = wHd(ω,θ). Hence, constraining the response of the array at ω and θ to some value g involves the linear constraint wHd(ω,θ) = g Thus, the constraint vector d(ω,θ) serves the purpose of generalizing the idea of an LCMV beamformer beyond simply the case of a uniformly spaced array. Everything else is the same as before, except for the fact that the correlation matrix of the received signal is no longer Toeplitz for the case of a nonuniformly spaced array.
2.18
2.18
(a) Under hypothesis H_1, we have
    u = s + v
The correlation matrix of u equals
    R = E[u u^T] = s s^T + R_N
where R_N = E[v v^T]. The tap-weight vector w_k is chosen so that w_k^T u yields an optimum estimate of the kth element of s. Thus, with s(k) treated as the desired response, the cross-correlation vector between u and s(k) equals
    p_k = E[u s(k)] = s \, s(k),   k = 1, 2, ..., M
Hence, the Wiener-Hopf equation yields the optimum value of w_k as
    w_{ko} = R^{-1} p_k = (s s^T + R_N)^{-1} s \, s(k),   k = 1, 2, ..., M                          (1)
To apply the matrix inversion lemma (introduced in Problem 2.13), we let
    A = R,   B^{-1} = R_N,   C = s,   D = 1
Hence,
    R^{-1} = R_N^{-1} - \frac{R_N^{-1} s s^T R_N^{-1}}{1 + s^T R_N^{-1} s}                          (2)
Substitute Eq. (2) into (1):
    w_{ko} = \left[ R_N^{-1} - \frac{R_N^{-1} s s^T R_N^{-1}}{1 + s^T R_N^{-1} s} \right] s \, s(k)
           = \frac{R_N^{-1} s (1 + s^T R_N^{-1} s) - R_N^{-1} s \, s^T R_N^{-1} s}{1 + s^T R_N^{-1} s} \, s(k)
           = \frac{s(k)}{1 + s^T R_N^{-1} s} \, R_N^{-1} s
(b) The output signal-to-noise ratio equals
    SNR = \frac{E[(w^T s)^2]}{E[(w^T v)^2]}
        = \frac{w^T s s^T w}{w^T E[v v^T] w}
        = \frac{w^T s s^T w}{w^T R_N w}                                                             (3)
Since R_N is positive definite, we may write
    R_N = R_N^{1/2} R_N^{1/2}
Define the vector
    a = R_N^{1/2} w
or, equivalently,
    w = R_N^{-1/2} a                                                                                (4)
Accordingly, we may rewrite Eq. (3) as
    SNR = \frac{a^T R_N^{-1/2} s s^T R_N^{-1/2} a}{a^T a}                                           (5)
where we have used the symmetric property of R_N. Define the normalized vector
    \bar{a} = \frac{a}{\|a\|}
where \|a\| is the norm of a. Then we may rewrite Eq. (5) as
    SNR = \bar{a}^T R_N^{-1/2} s s^T R_N^{-1/2} \bar{a}
        = |\bar{a}^T R_N^{-1/2} s|^2
Thus the output signal-to-noise ratio SNR equals the squared magnitude of the inner product of the two vectors \bar{a} and R_N^{-1/2} s. This inner product is maximized when \bar{a} is aligned with R_N^{-1/2} s; that is,
    a_{SN} = R_N^{-1/2} s                                                                           (6)
Let w_{SN} denote the value of the tap-weight vector that corresponds to Eq. (6). Hence, the use of Eq. (4) in (6) yields
    w_{SN} = R_N^{-1/2} (R_N^{-1/2} s) = R_N^{-1} s
(c) Since the noise vector v(n) is Gaussian, its joint probability density function equals
    f_V(v) = \frac{1}{(2\pi)^{M/2} (\det R_N)^{1/2}} \exp\left(-\frac{1}{2} v^T R_N^{-1} v\right)
Under hypothesis H_0 we have u = v, and
    f_U(u | H_0) = \frac{1}{(2\pi)^{M/2} (\det R_N)^{1/2}} \exp\left(-\frac{1}{2} u^T R_N^{-1} u\right)
Under hypothesis H_1 we have u = s + v, and
    f_U(u | H_1) = \frac{1}{(2\pi)^{M/2} (\det R_N)^{1/2}} \exp\left(-\frac{1}{2} (u - s)^T R_N^{-1} (u - s)\right)
Hence, the likelihood ratio equals
    \Lambda = \frac{f_U(u | H_1)}{f_U(u | H_0)}
            = \exp\left(-\frac{1}{2} s^T R_N^{-1} s + s^T R_N^{-1} u\right)
The natural logarithm of the likelihood ratio equals
    \ln\Lambda = -\frac{1}{2} s^T R_N^{-1} s + s^T R_N^{-1} u
The first term is a constant. Hence, testing \ln\Lambda against a threshold is equivalent to the test
    s^T R_N^{-1} u \underset{H_0}{\overset{H_1}{\gtrless}} \lambda
where \lambda is some threshold. Equivalently, we may write
    w_{ML} = R_N^{-1} s
where w_{ML} is the maximum-likelihood weight vector. The results of parts (a), (b), and (c) show that the three criteria discussed here yield the same optimum value for the weight vector, except for a scaling factor.
2.19
(a) Assuming the use of a noncausal Wiener filter, we write
    \sum_{i=-\infty}^{\infty} w_{oi} \, r(i - k) = p(-k),   k = 0, \pm 1, \pm 2, ...                (1)
where the sum now extends from i = -\infty to i = \infty. Define the z-transforms
    S(z) = \sum_{k=-\infty}^{\infty} r(k) z^{-k}
    H_u(z) = \sum_{k=-\infty}^{\infty} w_{o,k} z^{-k}
    P(z) = \sum_{k=-\infty}^{\infty} p(k) z^{-k}
so that the z-transform of the sequence p(-k) is P(1/z). Hence, applying the z-transform to Eq. (1):
    H_u(z) S(z) = P(1/z)
which gives
    H_u(z) = \frac{P(1/z)}{S(z)}                                                                    (2)
(b) We are given
    P(z) = \frac{0.36}{(1 - 0.2 z^{-1})(1 - 0.2 z)}
    P(1/z) = \frac{0.36}{(1 - 0.2 z)(1 - 0.2 z^{-1})}
    S(z) = 1.37 \, \frac{(1 - 0.146 z^{-1})(1 - 0.146 z)}{(1 - 0.2 z^{-1})(1 - 0.2 z)}
Thus, applying Eq. (2) yields
    H_u(z) = \frac{0.36}{1.37 (1 - 0.146 z^{-1})(1 - 0.146 z)}
           = \frac{0.2685}{1 - 0.146 z^{-1}} + \frac{0.0392}{z^{-1} - 0.146}
Clearly, this system is noncausal; its region of convergence is the annulus 0.146 < |z| < 1/0.146 = 6.849. Its impulse response is
    h_u(n) = 0.2685 (0.146)^n u_{step}(n) + 0.2685 (6.849)^n u_{step}(-n-1)
           = 0.2685 (0.146)^{|n|}
where u_{step}(n) is the unit-step function:
    u_{step}(n) = 1 for n = 0, 1, 2, ...
                = 0 for n = -1, -2, ...
Evaluating h_u(n) for varying n:
    h_u(0) = 0.2685
    h_u(\pm 1) = 0.0392,   h_u(\pm 2) = 0.0057,   h_u(\pm 3) = 0.0008
so the impulse response is symmetric about n = 0 and decays rapidly on both sides.
[Figure: h_u(n) versus time n, a two-sided exponential centered on n = 0.]
(c) A delay of about 3 time units applied to the impulse response (together with truncation of the negligible remaining anticausal tail) will make the system causal and therefore realizable.
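The impulse response quoted above can be obtained numerically by sampling H_u(e^{j\omega}) on the unit circle and applying an inverse FFT. A minimal sketch follows (Python/NumPy):

```python
import numpy as np

# Numerical check of the noncausal Wiener filter of Problem 2.19:
#   H_u(z) = 0.36 / [1.37 (1 - 0.146 z^-1)(1 - 0.146 z)]
# Sample it on the unit circle and inverse-FFT to recover h_u(n).
N = 1024
w = 2 * np.pi * np.arange(N) / N
z = np.exp(1j * w)

H = 0.36 / (1.37 * (1 - 0.146 / z) * (1 - 0.146 * z))
h = np.fft.ifft(H).real                       # h[n] for n >= 0; h[-n] wraps to h[N-n]

for n in range(4):
    print("h(%2d) = %.4f   h(%2d) = %.4f" % (n, h[n], -n, h[(-n) % N]))
# Expected: h(0) = 0.2685 and h(+/-n) = 0.2685 * 0.146**n, symmetric in n.
```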
CHAPTER 3

3.1
(a) Let a_M denote the tap-weight vector of the forward prediction-error filter. With a tap-input vector u_{M+1}(n), the forward prediction error at the filter output equals
    f_M(n) = a_M^H u_{M+1}(n)
The mean-square value of f_M(n) equals
    E[|f_M(n)|^2] = E[a_M^H u_{M+1}(n) u_{M+1}^H(n) a_M]
                  = a_M^H E[u_{M+1}(n) u_{M+1}^H(n)] a_M
                  = a_M^H R_{M+1} a_M
where R_{M+1} = E[u_{M+1}(n) u_{M+1}^H(n)] is the correlation matrix of the tap-input vector.
(b) The leading element of the vector a_M equals 1. Hence, the constrained cost function to be minimized is
    J(a_M) = a_M^H R_{M+1} a_M + \lambda a_M^H 1 + \lambda^* 1^T a_M
where \lambda is the Lagrange multiplier and 1 is the first unit vector, defined by 1 = [1, 0, ..., 0]^T. Differentiating J(a_M) with respect to a_M and setting the result equal to zero yields
    2 R_{M+1} a_M + 2\lambda 1 = 0
Solving for a_M:
    R_{M+1} a_M = -\lambda 1                                                                        (1)
However, we may partition R_{M+1} as
    R_{M+1} = \begin{bmatrix} r(0) & r^H \\ r & R_M \end{bmatrix}
Hence, the first row of Eq. (1) gives
    -\lambda = [r(0), r^H] a_M = P_M
where P_M is the minimum prediction-error power. Accordingly, we may rewrite Eq. (1) as
    R_{M+1} a_M = P_M 1 = \begin{bmatrix} P_M \\ 0 \end{bmatrix}

3.2
(a) Let a_M^{B*} denote the tap-weight vector of the backward prediction-error filter. With a tap-input vector u_{M+1}(n), the backward prediction error equals
    b_M(n) = a_M^{BT} u_{M+1}(n)
The mean-square value of b_M(n) equals
    E[|b_M(n)|^2] = E[a_M^{BT} u_{M+1}(n) u_{M+1}^H(n) a_M^{B*}]
                  = a_M^{BT} E[u_{M+1}(n) u_{M+1}^H(n)] a_M^{B*}
                  = a_M^{BT} R_{M+1} a_M^{B*}
(b) The last element of a_M^{B*} equals 1. Hence, the constrained objective function to be minimized is
    J(a_M^B) = a_M^{BT} R_{M+1} a_M^{B*} + \lambda a_M^{BT} 1^B + \lambda^* 1^{BT} a_M^{B*}
where \lambda is the Lagrange multiplier and
    1^{BT} = [0, 0, ..., 1]
Differentiating J(a_M^B) with respect to a_M^{B*} and setting the result equal to zero,
    2 R_{M+1} a_M^{B*} + 2\lambda 1^B = 0
Solving for a_M^{B*}, we get
    R_{M+1} a_M^{B*} = -\lambda 1^B                                                                 (1)
However, we may express R_{M+1} in the partitioned form
    R_{M+1} = \begin{bmatrix} R_M & r^{B*} \\ r^{BT} & r(0) \end{bmatrix}
Therefore, the last row of Eq. (1) gives
    -\lambda = [r^{BT}, r(0)] a_M^{B*} = P_M
where P_M is the minimum backward prediction-error power. We may thus rewrite Eq. (1) as
    R_{M+1} a_M^{B*} = P_M 1^B = \begin{bmatrix} 0 \\ P_M \end{bmatrix}
51
..
...
...
r ( – M+1 ) r ( – M+2 ) …
g2
r(M ) = r ( M-1 )
...
...
g1
… r ( M-1 ) … r ( M-2 ) ...
r(1) r(0)
.
r(0) r ( –1 )
r(0)
r(1)
gM
Equivalently, we may write M
∑ gk r ( k – i ) = r ( M + 1 – i ),
i = 1, 2, …, M
k=1
Let k = M-l+1, or l = M-k+1. Then M
∑ g M -l+1 r ( M – l + 1 – i )
i = 1, 2, …, M
= r ( M + 1 – i ),
l=1
Next, put M+1-i = j, or i = M+1-j. Then M
∑ g M -l+1 r ( j – l )
j = 1, 2, …, M
= r ( j ),
l=1
Putting this relation into matrix form, we write
r(0)
g M-1
=
g1
This, in turn, may be put in the compact form RTgB = r* (b) The product rBTg equals
52
r(1) r(2) ...
gM ...
r ( M-1 ) r ( M-2 ) …
...
.
..
r ( – 1 ) … r ( – M+1 ) r ( 0 ) … r ( – M +2 ) ...
...
r(0) r(1)
r(M )
g1 BT
g = [ r ( – M ), r ( – M+1 ), …, r ( – 1 ) ]
g2 ...
r
gM M
=
∑ gk r ( k – 1 – M )
(1)
k=1
The product rTgB equals gM g M-1
= [ r ( – 1 ), r ( – 2 ), …, r ( – M ) ]
...
T B
r g
g1 M
=
∑ g M+1-k r ( –k ) k=1
Put M+1-k = l, or k = M+1-l. Hence, T B
r g
M
=
∑ gl r ( l-1-M )
(2)
l=1
From Eqs. (1) and (2): rBTg = rTgB 3.4
Starting with the formula m-1
r(m) = –
κ *m P m-1 –
∑ a*m-1, k r ( m-k )
k=1
and solving for κm, we get
53
m-1 r *m 1 κ m = ----------- – ----------- ∑ a m-1, k r * ( m-k ) P m-1 P m-1
(1)
k=1
We also note a m, k = a m-1,k + κ m a *m-1,m-k ,
k = 0, 1, …, m
2
P m = P m-1 ( 1 – κ m )
(2) (3)
(a) We are given r(0) = 1 r(1) = 0.8 r(2) = 0.6 r(3) = 0.4 We also note that P0 = r(0) = 1 Hence, the use of Eq. (1) for m = 1 yields r*(1) κ 1 = - ------------- = – 0.8 P0 The use of Eq. (3) for m = 1 yields 2
p 1 = P 0 ( 1 – κ 1 ) = 1 – 0.64 = 0.36 We next reapply Eq. (1) for m = 2: r*(2) r*(1) κ 2 = – ------------- – ------------- κ 1 P1 P1 where we have noted that κ1 = a1,1
54
Hence, 0.6 0.8 × 0.8 0.04 1 κ 2 = – ---------- + --------------------- = ---------- = --- = 0.1111 9 0.36 0.36 0.36 The use of Eq. (3) for m = 2 yields 2
P2 = P1 ( 1 – κ2 ) 4 1 = 0.36 1 – ------ = ------ = 0.0444 90 81 Next, we reapply Eq. (1) for m = 3: r*(3) 1 κ 3 = – ------------- – ------ ( a 2, 1 r * ( 2 ) + a 2, 2 r * ( 1 ) ) P2 P2 From Eq. (2) we have 1 a 2, 2 = κ 2 = --- = 0.1111 9 a 2, 1 = a 1, 1 + κ 2 a *1, 1 = κ 1 + κ 2 κ *1 4 4 4 188 = – --- – ------ × --- = – --------- = – 0.8356 5 90 5 225 Hence, 1 0.4 κ 3 = – ---------------- – ---------------- ( – 0.8356 × 0.6 + 0.111 × 0.8 ) 0.0444 0.0444 0.4 0.41248 0.01248 = – ---------------- + ------------------- = ------------------- = 0.281 0.0444 0.0444 0.0444 Note that all three reflection coefficients have a magnitude less than unity; hence, the lattice-predictor (prediction-error filter) representation of the process is minimum phase.
55
(b) This lattice-predictor representation is as shown in the following figure:
.
f0(n)
Σ
.
f1(n)
.
Σ
f3(n)
Σ
b3(n)
0.281
0.111
-0.8
u(n)
f2(n)
Σ
. 0.111
-0.8
.
b0(n) z-1
Σ
Stage 1
b1(n) z-1
0.281
.
Σ
b2(n) z-1
Stage 2
.
Stage 3
(c) From the calculations presented in part (a), we have P0 = 1 P1 = 0.36 P2 = 0.0444 To complete the calculations required, we note that P3 = P2(1 - |κ3|2) = 0.0444 (1-0.2812) = 0.0444 (1-0.079) = 0.0444 x 0.921 = 0.0409 From the power plot
1.0
Pm
0
1
2
3
4
5
m we note that the average power Pm decreases exponentially with the prediction order m.
56
The estimation error e(n) is e(n) = u(n) - wHx(n) where x ( n ) = u ( n – ∆ ) = [ u ( n-∆ ), u ( n-1-∆ ), …, u ( n-M-∆ ) ]
T
The mean-square value of the estimation error is 2
J = E [ e(n) ] H
H
= E [ (u ( n ) – w x ( n ))(u * ( n ) – x ( n )w) ] 2
H
H
H
H
= E [ u ( n ) ] – w E [ x ( n )u * ( n ) ] – E [ u ( n )x ( n ) ]w + w E [ x ( n )x ( n ) ]w H
H
H
H
= P 0 – w E [ u ( n-∆ )u * ( n ) ] – E [ u ( n )u ( n-∆ ) ]w + w E [ u ( n-∆ )u ( n-∆ ) ]w We now note the following:
...
u ( n-∆ ) * E [ u ( n-∆ )u ( n ) ] = E u ( n-1-∆ ) u ( n-M-∆ ) r ( –∆ ) = r ( – ∆-1 )
* u ( n )
= r∆
...
3.5
r ( – ∆ -M ) r ( –∆ ) H E [ u ( n )u ( n-∆ ) ] = r ( – ∆-1 ) r ( – ∆ -M )
H H
= r∆
H
E [ u ( n-∆ )u ( n-∆ ) ] = R
57
(1)
We may thus rewrite Eq. (1) as H
H
J = P0 – w r∆ – r∆ w + R The optimum value of the weight vector is –1
wo = R r∆ where R-1 is the inverse of the correlation matrix R. 3.6
We are given the difference equation u ( n ) = 0.9u ( n – 1 ) + v ( n ) (a) For a prediction-error filter of under two, we have a2,1 = -0.9 a2,2 = 0 The prediction-error filter representation of the process is therefore u(n)
.
u(n-1) z-1 -0.9
Σ
v(n)
(b) The corresponding reflection coefficients of the lattice predictor are κ1 = a2,1 = -0.9 κ2 = a2,2 = 0 We are given a first-order difference equation as the description of the AR process u(n). It is therefore natural that we use a first-order predictor for the representation of this process. 3.7
(a) (i) The tap-weight vector of the prediction-error filter of order M is
aM =
1 –wo
(1)
58
where –1
wo = R M r M
(2)
r M = E [ u M ( n-1 )u * ( n ) ]
e
– j2ω
e
2
= σα e
...
2 = σα e
– jω – jω
sM ( ω )
(3)
– jMω
2
H
2
R M = σ α s M ( ω )s M ( ω ) + σ v I M 1 – jω
e
(4)
...
sM ( ω ) = e
– jω ( M-1 )
From the matrix inversion lemma (see Chapter 9), we have: If A = B-1 + CD-1CH then A-1 = B - CD(D + CHBC)-1CHB
For our application B
–1
= σv I M
–1
= σα
D
2
2
C = sM ( ω ) Hence,
59
4
–1 RM
1 ⁄ σv H 1 = ------ I M – -----------------------------------------------------s M ( ω )s M ( ω ) 2 1 H 1 σv ------- + ------ s M ( ω )s M ( ω ) 2 2 σα σv 2
1 ⁄ σv H 1 = ------ I M – -------------------------------------------------------------s M ( ω )s M ( ω ) 2 2 2 H σv ( σ v ⁄ σ α ) + s M ( ω )s M ( ω ) We also note that H
s M ( w )s M ( w ) = M Hence, 2
–1 RM
1 ⁄ σv H 1 = ------ I M – ---------------------------------s M ( ω )s M ( ω ) 2 2 2 ( σv ⁄ σα ) + M σv
(5)
Substituting Eqs. (3) and (5) into (2) yields 2
wo =
2 2 – jω ( σ α ⁄ σ v )e sM ( ω )
2
2
= ( σ α ⁄ σ v )e
– jω
2
– jω
( σ α ⁄ σ v )e H – ---------------------------------s M ( ω )s M ( ω )s M ( ω ) 2 2 ( σv ⁄ σα ) + M
M 1 – --------------------------------- s M ( ω ) 2 2 ( σv ⁄ σα ) + M
– jω e = --------------------------------- s M ( ω ) ( σ2 ⁄ σ2 ) + M v α
(6)
Equations (1) and (6) define the tap-weight vector of the prediction-error filter. Moreover, the final value of the prediction-error power is H
P M = r ( 0 ) – r M wo
60
2 σα H = r ( 0 ) – --------------------------------- s M ( ω )s M ( ω ) 2 2 ( σv ⁄ σα ) + M
(7)
Using (4) and the fact that 2
2
r ( 0 ) = σα + σv
we may rewrite Eq. (7) as 2
PM =
2 σα +
2 σv –
Mσ α --------------------------------2 2 ( σv ⁄ σα ) + M
2
2
2
σv [ 1 + M + ( σv ⁄ σα ) ] = -----------------------------------------------------2 2 ( σv ⁄ σα ) + M
(8)
(a) (ii) The mth reflection coefficient is [from Eq. (3.56) of the text] ∆ m-1 κ m = – ----------P m-1 where BT
∆ m-1 = r m a m-1 Hence, BT
r m a m-1 κ m = – --------------------P m-1
(9)
From Eq. (3) we deduce that B
2 – jω B sm ( ω )
rm = σα e
61
2 = σα e
– jmω – j ( m-1 )ω
e
(10)
...
e
– jω
From Eq. (8): 2
2
2
σv [ m + ( σv ⁄ σα ) ] P m-1 = -------------------------------------------2 2 ( σ v ⁄ σ α ) + m-1
(11)
From Eqs. (1) and (6): 1 a m-1 =
– jω e – -------------------------------------- s m-1 ( ω ) ( σ 2 ⁄ σ 2 ) + m-1 v
(12)
α
Hence, the combined use of Eqs. (9)-(12) yields 2
2
2
– jmω
4 – jmω
σ α [ σ v + ( m-1 )σ α ]e ( m-1 ) σα e κ m = – -------------------------------------------------------------- + ------------------------------------------------2 2 2 2 2 2 2 2 σ v ( σ v + mσ α – σ α ) σ v ( σ v + mσ α – σ α ) 2 – jmω
σα e = – ------------------------------------2 2 2 σ v + mσ α – σ α 2
(a) (iii) When we let the noise variance σ v approach zero, the mth reflection coefficient 1 – jmω κm approaches ---------e , the magnitude of which, in turn, approaches zero as m m-1 approaches infinity. (b) (i) The tap-weight vector of the prediction-error filter of order M is
62
1 a M = α * e – jω 0 M-1 where 0M-1 is a null vector of dimension M-1. (ii) The reflection coefficients of a lattice predictor of order M are κ1 = α* e
– jω
for m = 2, …, M .
κm = 0
(c) We may consider u1(n) = u2(n) under the limiting conditions: α →1 and 2
σv → 0 2
where σ v refers to the noise in the AR process.
am =
a m-1 0
+ κm
0 B*
a m-1
In expanded form, we have
a m-1, 0
a m, 1
a m-1, 1 ...
=
0 a *m-1, m-1 + κm
...
a m, 0 ...
3.8
a m, m-1
a m-1, m-1
a *m-1, 1
a m, m
0
a *m-1, 0
or equivalently,
63
k = 0, 1, …, M
a m, k = a m-1, k + κ m a *m-1,m-k, Put m-k = l or k = m-l. Then,
l = 0, 1, …, m
a m, m-l = a m-1, m-l + κ m a *m-1, l , Complex conjugate both sides:
l = 0, 1, …, m
a *m, m-l = a *m-1, m-l + κ *m a m-1, l, Rewrite this relation in matrix form:
a *m, m-1
a *m-1, m-1 + κ *m
...
=
a m-1,0 a m-1,1 ...
0
...
a *m, m
a *m-1
a *m-1,1
a m-1, m-1
a *m-0
a *m-1,0
0
or equivalently,
B*
am =
0 B* a m-1
+ κ *m
a m-1 0
Note that (for all m) a m, k = 1, 0, 3.9
k=0 k>m
We start with the formula m-1
∆ m-1 =
∑ am-1,k r ( k – m )
(1)
k=0
The autocorrelation function r(k - m) equals (by definition)
64
r ( k – m ) = E [ u ( n – m )u * ( n – k ) ]
(2)
Hence, substituting Eq. (2) in (1): m-1
∆ m-1 =
∑ am-1,k E [ u ( n – m )u* ( n – k ) ]
k=0 m-1
= E u ( n – m ) ∑ a m-1,k u * ( n – k )
(3)
k=0
But, by definition, m-1
f m-1 ( n ) =
∑ a*m-1,k u ( n – k )
k=0
Hence, we may rewrite Eq. (3) as ∆ m-1 = E [ u ( n – m ) f *m-1 ( n ) ]
(4)
Next we note that u ( n – m ) = uˆ ( n – m U n-1 ) + b m-1 ( n-1 )
(5)
where Un-1 is the space spanned by u ( n-1 ), …, u ( n-m+1 ) , and bm-1 is the backward prediction error produced by a predictor of order m-1. The estimate uˆ ( n-m U n-1 ) equals m-1
uˆ ( n-m U n-1 ) =
∑ w*b, k u ( n – k )
(6)
k=1
Accordingly, using Eqs. (5) and (5) in (3): m-1
∆ m-1 =
E [ b m-1 ( n-1 ) f *m-1 ( n ) ] +
E
∑ w*b, k u ( n – k ) f *m-1 ( n )
k=1
65
m-1
=
E [ b m-1 ( n-1 ) f *m-1 ( n ) ]
∑ w*b, k E [ u ( n – k ) f m-1 ( n ) ]
+
*
(7)
k=1
But E [ f m-1 ( n )u * ( n – k ) ] = 0,
1 ≤ k ≤ m-1
Hence, Eq. (7) simplifies to ∆ m-1 = E [ b m-1 ( n-1 ) f *m-1 ( n ) ] 3.10
The polynomial x(z) equals x ( z ) = a M, M z M
=
M
∑ a M, k z
+ a M , M-1 z
M-1
+ … + a M, 0
k
k=0 M
= z H b, M ( z ) where Hb,M(z) is the transfer function of a backward prediction-error filter of order M. The reciprocal polynomial x′ ( z ) equals ′
x ( z ) = a M , M + a M , M-1 z + … + a M , 0 z *
*
*
M
M
= z H f , M (z) where Hf,M(z) is the transfer function of a forward prediction-error filter of order M. Next, we note, by definition, that ′
*
T [ x ( z ) ] = a M, 0 x ( z ) – a M, M x ( z ) M
=
∑ a M, k z k=0
k
M
– a M , M ∑ a M , M-k z *
k=0
66
k
M
=
∑ ( a M, k – a M, M a M, M-k )z *
k
k=0
But from the inverse recursion: *
a M , k – a M , M a M , M-k a M-1, k = ----------------------------------------------------, 2 1 – a M, M
k = 0, 1, …, M
Therefore, M
T [ x ( z ) ] = ( 1 – a M , M ) ∑ a M-1, k z 2
k
k=0 2
= ( 1 – a M, M ) [ z
M-1
H b, M-1 ( z ) ]
where we have used the fact that aM-1,M is zero. This shows that T[x(z)] is of order M-1, one less than the order of the original polynomial x(z). Similarly, we have 2
2
2
T [ x ( z ) ] = ( 1 – a M , M ) ( 1 – a M-1, M-1 ) [ z We also note that 1
T [ x ( 0 ) ] = 1 – a M, M 2
2
2
2
T [ x ( 0 ) ] = ( 1 – a M , M ) ( 1 – a M-1, M-1 ) Thus 0
0
M
1
1
M-1
H b, M-1 ( z )
2
2
M-2
H b, M-2 ( z )
T [ x ( z ) ] = T [ x ( 0 ) ]z H b, M ( z ) T [ x ( z ) ] = T [ x ( 0 ) ]z T [ x ( z ) ] = T [ x ( 0 ) ]z
67
M-2
H b, M-2 ( z ) ]
where, in the first line, we have 0
T [ x(z)] = x(z) and x(0) = 1 We may generalize these results by writing i
i
T [ x ( z ) ] = T [ x ( 0 ) ]z
M-i
H b, M-i ( z )
where i-1
i
T [ x(0)] =
∏ (1 –
2
a M – j, M – j ),
1≤i≤M
j=0
3.11
(a) The AR parameters equal a1 = –1 1 a 2 = --2 Since v(n) has zero mean, the average power of u(n) equals P0 = r ( 0 ) 2
σv 1 + a 2 = --------------- ------------------------------------ = 1.2 1 – a 2 [ ( 1 + a ) – a 2 ] 2 1 (b) For prediction order M = 2, the prediction-error filter coefficients equal the AR parameters: a 2, 1 = a 1 = – 1 1 a 2, 2 = a 2 = --2
68
The use of the inverse Levinson-Durbin recursion for real-valued data yields a m, k – a m, m a m, m-k a m-1, k = ------------------------------------------------, 2 1 – a m, m
k = 0, …, M
For m = 2, we have a 2, k – a 2, 2 a 2, 2-k a 1, k = -----------------------------------------2 1 – a 2, 2
k = 0, 1
Hence, a1,0 = 1 a 2, 1 – a 2, 2 a 2, 1 a 1, 1 = -------------------------------------2 1 – a 2, 2 2 = – --3 The reflection coefficients are thus as follows 2 κ 1 = a 1, 1 = – --3 1 κ 2 = a 2, 2 = --2 (c) Use of the formula (for real-valued data) 2
P m = P m-1 ( 1 – κ m ) yields the following values for the average prediction-error powers: 2 2 P 1 = P 0 ( 1 – κ 1 ) = --3 2 1 P 2 = P 1 ( 1 – κ 2 ) = --2
69
3.12
For real data, we have m-1
∑ am-1, k r ( m – k )
r ( m ) = – κ m P m-1 –
k=1
For m = 1, r ( 1 ) = –κ1 P0 = 0.8 For m = 2, r ( 2 ) = – κ 2 P 1 – a 1, 1 r ( 1 ) = 0.2 3.13
(a) The transfer function of the forward prediction-error filter equals –1
H f , M ( z ) = ( 1 – z i z )c i ( z ) where zi = ρi e
jω i
The power spectral density of the prediction error fM(n) equals S f (ω) = H f , M (e
jω 2
) S(ω)
where S(ω) is the power spectral density of the input process u(n). Hence, the meansquare value of the prediction error fM(n) equals ε =
=
π
∫–π H f , M ( e π
∫– π
1 – ρi e
jω 2
) S ( ω ) d( ω )
jω i – jω 2
e
Ci(e
jω 2
) S ( ω ) d( ω )
70
=
π
2
∫–π [ 1 – 2ρi cos ( ( ω – ωi ) + ρi ) ] C i ( e
jω 2
) S ( ω ) d( ω )
(b) Differentiating ε with respect to ρi: π jω 2 ∂ε -------- = 2 ∫ [ – cos ( ω – ω i ) + ρ i ] C i ( e ) S ( ω ) d( ω ) ∂ρ i –π
If zi lies outside the unit circle, ρi > 1. We note that (regardless of ρi) – 1 ≤ cos ( ω – ω i ) ≤ 1,
– π ≤ ( ω, ω i ) ≤ π
Hence, – cos ( ω – ω i ) + ρ i > 0, Since C i ( e ∂ε -------- > 0, ∂ρ i
jω 2
)
if ρ i > 1
and S(ω) are both positive, it follows that
if ρ i > 1
If the prediction-error filter is to be optimum, its parameters (and therefore the ρi) ∂ε must be chosen in such a way that -------- = 0 . ∂ρ i ∂ε Hence it is not possible for ρi > 1 and yet satisfy the optimality condition -------- = 0 . ∂ρ i The conclusion to be drawn is that the transfer function of a forward prediction-error filter (that is optimum) cannot have any of its zeros outside the unit circle. In other words, a forward prediction-error filter is necessarily minimum phase.
3.14
An AR process u(n) of order M is described by the difference equation u ( n ) = – a1 u ( n – 1 ) – … – a M u ( n – M ) + v ( n ) *
*
Equivalently, in the z-domain we have 1 U ( z ) = -----------------------------------------------------------V ( z ) * –1 * –M 1 + a1 z + … + a M z
71
When the process u(n) is applied to a forward prediction-error filter described by the transfer function * –1
H f ( z ) = 1 + a1 z
* –M
+ … + aM z
The z-transform of the resulting output is H f ( z )U ( z ) = V ( z ) In other words, the output consists of a white noise sequence v(n). Suppose next the process u(n) is applied to a backward prediction-error filter described by the transfer function H b ( z ) = a M + a M-1 z
–1
+ … + a1 z
– M+1
+z
–M
The z-transform of the resulting output is +z a M + a M-1 z + … + a 1 z H b ( z )U ( z ) = -----------------------------------------------------------------------------------------* –1 * –M 1 + a1 z + … + a M z –1
– M+1
–M
This rational function is recognized as an all-pass (nonminimum phase) function with unit magnitude but non-zero phase. Equivalently, we may state that the corresponding output sequence is an anticausal realization of white noise. 3.15
Let 1 I = -------2πj
°∫ C
–1 1 --- φ m ( z )φ k ( z )S ( z )dz z
On the unit circle, z = ejω dz = jejωdω Hence, jω – jω 1 π )S ( ω ) dω I = ------ ∫ φ m ( e )φ k ( e 2π – π
(1)
72
From the definition of φm(z), we have φm ( e
jω
m
jω ( m-i ) 1 ) = ----------- ∑ a m, i e P m i=0
Hence, we may rewrite Eq. (1) as 1 I = ------------------------2π P m P k
m
π
k
∫–π ∑ ∑ am, i ak, l e
jω ( m-i ) – j ω ( k-l )
e
S ( ω ) dω
i=0 l=0
k
k
1 = ------------------------- ∑ ∑ a m, i a k, l 2π P m P k i=0 l=0
π
∫–π S ( ω )e
jω ( m-k+l-i )
dω
(2)
From the Einstein-Wiener-Khintchine relations: jω ( m-k+l-i ) 1 π ------ ∫ S ( ω )e dω = r ( m-k+l-i ) 2π – π
Accordingly, we may simplify Eq. (2) as m
k
1 I = ------------------ ∑ ∑ a m, i a k, l r ( m-k+l-i ) P m P k i=0 l=0
(3)
From the augmented Wiener-Hopf equation for linear prediction: k
∑ am, i r ( m-k+l-i ) l=0
P , = k 0,
if m=k and l=0 otherwise
Substituting Eq. (4) into (3), we get I = 1, 0,
if m=k otherwise
where it is noted that ak,l = 1 for l = 0. Equivalently, I = δ mk
73
(4)
as required. 3.16
From Eqs. (3.81) and (3.82): m
H f , m(z) =
∏ ( 1 – zi z
–1
)
i=1 m
H b, m ( z ) =
∏ (z
*
–1
– zi )
i=1
z–1 – z* –1 i = ∏ --------------------- ( 1 – z i z ) – 1 i=1 1 – z i z m
z–1 – z* i = H f , m ( z ) ∏ --------------------- – 1 i=1 1 – z i z m
The factor ( z
–1
*
–1
– z i ) ⁄ ( 1 – z i z ) represents an all-pass structure with a magnitude
response equal to unity for z = ejω and all ω. Hence, H b, m ( e
jω
) = H f , m(e
jω
)
Given an input u(n) of power spectral density Su(ω) applied to both Hf,m(ejω) and Hb,m(ejω), we immediately see that S f , m( ω ) = Su( ω ) H f , m( e = S u ( ω ) H b, m ( e S b, m ( ω ) 3.17
jω 2
)
jω 2
)
for all m.
(a) The reflection coefficients of the two-stage lattice predictor equal 2 κ 1 = – --3
74
1 κ 2 = --2 Hence, the structure of this lattice predictor is as follows
.
.
Σ
−2/3
1/2
−2/3
1/2
Σ
f2(n)
Σ
b2(n)
u(n)
.
z−1
.
z−1
Σ
(b) The inverse lattice filter for generating the second-order AR process from a white noise process is as follows (see Fig. 3.11): White noise v(n)
Σ
Σ
.
Σ
.
1/2
−2/3
−1/2
2/3
.
z−1
Σ
.
.
AR process u(n)
z−1
From this latter structure, we see that 2 1 1 2 u ( n ) = --- u ( n-1 ) + – --- – --- u ( n-1 ) + – --- u ( n-2 ) + v ( n ) 2 3 2 3 That is, u ( n ) = u ( n-1 ) – 0.5u ( n-2 ) + v ( n ) which is exactly the same as the difference equation specified in Problem 3.11. 3.18
(a) From Fig. 3.10, fM(n) is obtained by passing u(n) through a minimum-phase prediction-error filter of order M, whereas bM(n) is obtained by passing u(n) through a maximum-phase prediction-error filter of order M. Hence, going through the steps outlined in Problem 3.16, we readily see that in passing through the path from the input fM(n) to the output bM(n), we will have gone through all-pass filter of order M.
75
(b) In going from the input fM(n) to the output u(n), we will have passed through the inverse of a forward prediction-error filter of order M. Since such a filter is minimum phase with all its zeros confined to the interior of the unit circle, it follows that its inverse is an all-pole filter with all its confined to the interior of the unit circle; hence its physical realizability is assured.
3.19
(a) The (M+1)-by-(M+1) lower triangular matrix L is defined by

L = [ 1          0          0    ...  0
      a_{1,1}    1          0    ...  0
      a_{2,2}    a_{2,1}    1    ...  0
      ...        ...        ...       ...
      a_{M,M}    a_{M,M-1}  ...       1 ]

and the (M+1)-by-(M+1) correlation matrix R is defined by

R = [ r(0)     r(1)      ...  r(M)
      r(-1)    r(0)      ...  r(M-1)
      ...      ...       ...  ...
      r(-M)    r(-M+1)   ...  r(0) ]

Let

Y = LR

Hence, the km-th element of the matrix product LR equals

y_{km} = Σ_{l=0}^{k} a_{k,k-l} r(m-l),        (m, k) = 0, 1, ..., M        (1)

For k = m, we thus have

y_{mm} = Σ_{l=0}^{m} a_{m,m-l} r(m-l),        m = 0, 1, ..., M

However, from the augmented Wiener-Hopf equation for backward prediction we have

Σ_{l=0}^{m} a*_{m,m-l} r(l-i) = { 0,    i = 0, 1, ..., m-1
                                { P_m,  i = m

We therefore find that y_{mm} is real-valued and that

y_{mm} = P_m,        m = 0, 1, ..., M.
(b) The m-th column of matrix Y equals

[ y_{0m} ]   [ r(m)                         ]
[ y_{1m} ]   [ Σ_{l=0}^{1} a_{1,1-l} r(m-l) ]
[  ...   ] = [ ...                          ]
[ y_{mm} ]   [ Σ_{l=0}^{m} a_{m,m-l} r(m-l) ]
[  ...   ]   [ ...                          ]
[ y_{Mm} ]   [ ...                          ]

The k-th element of this column vector equals the y_{km} defined by Eq. (1). The element y_{km} is recognized as the output produced by a backward prediction-error filter with tap weights a*_{k,k}, a*_{k,k-1}, ..., a*_{k,0} and tap inputs r(m), r(m-1), ..., r(m-k), respectively. By summing the inner products of the respective tap weights and tap inputs, we get y_{km}. Hence, the m-th column of matrix Y is obtained by passing the autocorrelation sequence {r(0), r(1), ..., r(m)} through the sequence of backward prediction-error filters whose transfer functions equal H_{b,0}(z), H_{b,1}(z), ..., H_{b,m}(z).
(c) Apply the autocorrelation sequence {r(0), r(1), ..., r(m)} to the input of a lattice predictor of order m. Denote the variables appearing at the various points on the lower line of the predictor as x_0, x_1, ..., x_m, as shown in Figure 1.

[Figure 1: m-stage lattice predictor with the sequence {r(0), ..., r(m)} applied to its input, reflection coefficients κ_1, ..., κ_m, unit delays z^{-1} on the lower line, and lower-line variables x_0, x_1, ..., x_m.]

At time m we may express the resulting values of these outputs as follows:

x_0 = r(m)

x_1 = output of a backward prediction-error filter of order 1 with tap inputs r(m-1), r(m)
    = Σ_{l=0}^{1} a_{1,1-l} r(m-l)

...

x_m = output of a backward prediction-error filter of order m with tap inputs r(0), r(1), ..., r(m)
    = Σ_{l=0}^{m} a_{m,m-l} r(m-l)

The various sets of prediction-error filter coefficients are related to the reflection coefficients κ_1, κ_2, ..., κ_m in accordance with the Levinson-Durbin recursion. Hence, the variables appearing at the various points on the lower line of the lattice predictor in Fig. 1 at time m equal the elements of the m-th column of matrix Y.

(d) The lower output of stage m at time m equals the mm-th element y_{mm} of matrix Y. This output equals P_m, as shown in part (a). The upper output of stage m in the lattice predictor is equivalent to the output of a forward prediction-error filter of order m. Hence, this output at time m+1, in response to the autocorrelation sequence {r(1), r(2), ..., r(m+1)} used as input, equals

Σ_{l=0}^{m} a*_{m,l} r(m+1-l) = Δ*_m

Thus, we deduce that (except for a minus sign) the ratio of the upper output of stage m in the lattice predictor of Fig. 1 at time m+1 to the lower output of this stage at time m equals the complex conjugate of the reflection coefficient κ_{m+1} for stage m+1 in the lattice predictor.

(e) Using the autocorrelation sequence {r(0), r(1), ..., r(m)} as input into the lattice predictor of Fig. 1, we may thus compute the corresponding sequence of reflection coefficients of the predictor as follows:

(i) At the input of the lattice predictor (i.e., m = 0), the upper output equals r(1) at time 1, and the lower input equals r(0) at time 0. Hence, the ratio of these two outputs equals r(1)/r(0) = -κ*_1.

(ii) The upper output of stage 1 at time 2 equals Δ*_1, and the lower output of this stage at time 1 equals P_1. Hence, the ratio of these two outputs equals Δ*_1/P_1 = -κ*_2.

(iii) The ratio of the upper output of stage 2 at time 3 to the lower output of this stage at time 2 equals Δ*_2/P_2 = -κ*_3, and so on for the higher stages of the lattice predictor.
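The procedure of part (e) is numerically equivalent to the Levinson-Durbin recursion. The sketch below, which assumes real-valued autocorrelations and uses an illustrative input sequence, computes the reflection coefficients and the final prediction-error power from a given autocorrelation sequence {r(0), ..., r(M)}.

```python
import numpy as np

def levinson_durbin(r):
    """Reflection coefficients and prediction-error power from r(0..M).

    A minimal sketch of the Levinson-Durbin recursion that underlies the
    lattice procedure of part (e); real-valued autocorrelations are assumed.
    """
    r = np.asarray(r, dtype=float)
    M = len(r) - 1
    a = np.zeros(M + 1)          # prediction-error filter coefficients, a[0] = 1
    a[0] = 1.0
    P = r[0]                     # P_0 = r(0)
    kappa = np.zeros(M)
    for m in range(1, M + 1):
        delta = np.dot(a[:m], r[m:0:-1])              # Delta_{m-1}
        k_m = -delta / P                              # reflection coefficient kappa_m
        kappa[m - 1] = k_m
        a[1:m + 1] = a[1:m + 1] + k_m * a[m - 1::-1]  # order update of the coefficients
        P *= (1.0 - k_m ** 2)                         # P_m = (1 - kappa_m^2) P_{m-1}
        if P <= 0:
            break
    return kappa, P

# The true autocorrelation of the AR(2) process of Problem 3.17 would give
# kappa = [-2/3, 1/2]; here an arbitrary illustrative sequence is used instead.
r = [1.0, 0.5, 0.1]
print(levinson_durbin(r))
```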
3.20
A lattice filter exhibits some interesting correlation properties between the forward and backward prediction errors developed at the various stages of the filter. Basically, these properties are consequences of the principle of orthogonality, as described below.

Property 1. The forward prediction error f_m(n) and the input signal u(n) are orthogonal:

E[f_m(n) u*(n-k)] = 0,        1 ≤ k ≤ m        (1)

Similarly, the backward prediction error b_m(n) and the input signal u(n) are orthogonal:

E[b_m(n) u*(n-k)] = 0,        0 ≤ k ≤ m-1        (2)

Note the difference between the ranges of the index k in Eqs. (1) and (2).

Equations (1) and (2) are both restatements of the principle of orthogonality. By definition, the forward prediction error f_m(n) equals the difference between u(n) and the prediction of u(n), given the tap inputs u(n-1), u(n-2), ..., u(n-m). By the principle of orthogonality, the error f_m(n) is orthogonal to u(n-k), k = 1, 2, ..., m. This proves Eq. (1). The backward prediction error, by definition, equals the difference between u(n-m) and the prediction of u(n-m), given the tap inputs u(n), u(n-1), ..., u(n-m+1). Here, again, by the principle of orthogonality, the error b_m(n) is orthogonal to u(n-k), k = 0, 1, ..., m-1. This proves Eq. (2).

Property 2. The cross-correlation of the forward prediction error f_m(n) and the input u(n) equals the cross-correlation of the backward prediction error b_m(n) and the time-shifted input u(n-m), as shown by

E[f_m(n) u*(n)] = E[b_m(n) u*(n-m)] = P_m        (3)

where P_m is the corresponding prediction-error power.

To prove the first part of this property, we note that u(n) equals the forward prediction error f_m(n) plus the prediction of u(n), given the samples u(n-1), u(n-2), ..., u(n-m). Since this prediction is orthogonal to the error f_m(n) [which is a corollary to the principle of orthogonality], it follows that

E[f_m(n) u*(n)] = E[f_m(n) f*_m(n)] = P_m

To prove the second part of the property, we note that u(n-m) equals the backward prediction error b_m(n) plus the prediction of u(n-m), given the samples u(n), u(n-1), ..., u(n-m+1). Since this prediction is orthogonal to the error b_m(n), it follows that

E[b_m(n) u*(n-m)] = E[b_m(n) b*_m(n)] = P_m

This completes the proof of Eq. (3).

Property 3. The backward prediction errors are orthogonal to each other, as shown by

E[b_m(n) b*_i(n)] = { P_m,  m = i
                    { 0,    m ≠ i

The forward prediction errors do not, however, exhibit the same orthogonality property as the backward prediction errors; rather, they are correlated, as shown by

E[f_m(n) f*_i(n)] = P_m,        m ≥ i        (4)

Without loss of generality, we may assume that m > i. To prove the orthogonality of the backward prediction errors, we express the backward prediction error b_i(n) in terms of the input u(n) as the convolution sum

b_i(n) = Σ_{k=0}^{i} a_{i,i-k} u(n-k)        (5)

where the a_{i,i-k}, k = 0, 1, ..., i, define a backward prediction-error filter of order i. Hence, we may write

E[b_m(n) b*_i(n)] = E[ b_m(n) Σ_{k=0}^{i} a*_{i,i-k} u*(n-k) ]        (6)

Now, by Property 1, we have

E[b_m(n) u*(n-k)] = 0,        0 ≤ k ≤ m-1

For the case when m > i, and with 0 ≤ k ≤ i, we therefore find that all the expectation terms inside the summation on the right side of Eq. (6) are zero. Correspondingly,

E[b_m(n) b*_i(n)] = 0,        m ≠ i

When m = i, Eq. (6) reduces to

E[b_m(n) b*_i(n)] = E[b_m(n) b*_m(n)] = P_m,        m = i

This completes the proof of the orthogonality of the backward prediction errors.

To prove Eq. (4), we express the forward prediction error f_i(n) in terms of the input u(n) as the convolution sum

f_i(n) = Σ_{k=0}^{i} a*_{i,k} u(n-k)        (7)

where the a_{i,k}, k = 0, 1, ..., i, are the coefficients of a forward prediction-error filter of order i. Hence,

E[f_m(n) f*_i(n)] = E[ f_m(n) Σ_{k=0}^{i} a_{i,k} u*(n-k) ]

                  = E[f_m(n) u*(n)] + Σ_{k=1}^{i} a_{i,k} E[f_m(n) u*(n-k)]        (8)

where we have used the fact that a_{i,0} = 1. However, by Property 1, we have

E[f_m(n) u*(n-k)] = 0,        1 ≤ k ≤ m

Also, by Property 2, we have

E[f_m(n) u*(n)] = P_m

Therefore, Eq. (8) reduces to

E[f_m(n) f*_i(n)] = P_m,        m ≥ i

This completes the proof of Eq. (4).

Property 4. The time-shifted versions of the forward and backward prediction errors are orthogonal, as shown by, respectively,

E[f_m(n) f*_i(n-l)] = E[f_m(n+l) f*_i(n)] = 0,        1 ≤ l ≤ m-i,  m > i        (9)

E[b_m(n) b*_i(n-l)] = E[b_m(n+l) b*_i(n)] = 0,        0 ≤ l ≤ m-i-1,  m > i        (10)

where l is an integer lag.

To prove Eq. (9), we use Eq. (7) to write

E[f_m(n) f*_i(n-l)] = E[ f_m(n) Σ_{k=0}^{i} a_{i,k} u*(n-l-k) ]

                    = Σ_{k=0}^{i} a_{i,k} E[f_m(n) u*(n-l-k)]        (11)

By Property 1, we have

E[f_m(n) u*(n-l-k)] = 0,        1 ≤ l+k ≤ m        (12)

In the summation on the right side of Eq. (11) we have 0 ≤ k ≤ i. For the orthogonality relationship of Eq. (12) to hold for all values of k inside this range, the lag l must correspondingly satisfy the condition 1 ≤ l ≤ m-i. Thus, with the lag l bounded in this way, and with m > i, all the expectation terms inside the summation on the right side of Eq. (11) are zero. We therefore have

E[f_m(n) f*_i(n-l)] = 0,        1 ≤ l ≤ m-i,  m > i

By definition, we have

E[f_m(n) f*_i(n-l)] = E[f_m(n+l) f*_i(n)]

Therefore, if the expectation E[f_m(n) f*_i(n-l)] is zero, then so is the expectation E[f_m(n+l) f*_i(n)]. This completes the proof of Eq. (9).

To prove Eq. (10), we use Eq. (5) to write

E[b_m(n) b*_i(n-l)] = E[ b_m(n) Σ_{k=0}^{i} a*_{i,i-k} u*(n-l-k) ]

                    = Σ_{k=0}^{i} a*_{i,i-k} E[b_m(n) u*(n-l-k)]        (13)

By Property 1, we have

E[b_m(n) u*(n-l-k)] = 0,        0 ≤ l+k ≤ m-1        (14)

For the orthogonality relationship to hold for 0 ≤ k ≤ i, the lag l must satisfy the condition 0 ≤ l ≤ m-i-1. Then, with m > i, we find that all the expectations inside the summation on the right side of Eq. (13) are zero. We therefore have

E[b_m(n) b*_i(n-l)] = 0,        0 ≤ l ≤ m-i-1,  m > i

By definition, we have

E[b_m(n) b*_i(n-l)] = E[b_m(n+l) b*_i(n)]

Hence, if the expectation E[b_m(n) b*_i(n-l)] is zero, then so is the expectation E[b_m(n+l) b*_i(n)]. This completes the proof of Eq. (10).

Property 5. The time-shifted forward prediction errors f_m(n+m) and f_i(n+i) are orthogonal, as shown by

E[f_m(n+m) f*_i(n+i)] = { P_m,  m = i
                        { 0,    m ≠ i        (15)

The corresponding time-shifted backward prediction errors b_m(n+m) and b_i(n+i), on the other hand, are correlated, as shown by

E[b_m(n+m) b*_i(n+i)] = P_m,        m ≥ i        (16)

Equations (15) and (16) are the duals of Eqs. (3) and (4), respectively.

Without loss of generality, we may assume m > i. To prove Eq. (15), we first recognize that

E[f_m(n+m) f*_i(n+i)] = E[f_m(n) f*_i(n-m+i)]
                      = E[f_m(n) f*_i(n-l)]        (17)

where l = m-i. Therefore, with m > i, we find from Property 4 that the expectation in Eq. (17) is zero. When, however, m = i, the lag l is zero, and this expectation equals P_m, the mean-square value of f_m(n). This completes the proof of Eq. (15).

To prove Eq. (16), we recognize that

E[b_m(n+m) b*_i(n+i)] = E[b_m(n) b*_i(n-m+i)]
                      = E[b_m(n) b*_i(n-l)]        (18)

where l = m-i. The value of l lies outside the range for which the expectation in Eq. (18) is zero [see Eq. (10)]. This means that b_m(n+m) and b_i(n+i) are correlated. To determine this correlation, we use Eq. (5) to write

E[b_m(n+m) b*_i(n+i)] = E[ b_m(n+m) Σ_{k=0}^{i} a*_{i,i-k} u*(n+i-k) ]

                      = E[b_m(n+m) u*(n)] + Σ_{k=0}^{i-1} a*_{i,i-k} E[b_m(n+m) u*(n+i-k)]        (19)

where we have used a_{i,0} = 1. By Property 1, we have

E[b_m(n+m) u*(n+i-k)] = E[b_m(n) u*(n+i-k-m)] = 0,        0 ≤ k+m-i ≤ m-1        (20)

The orthogonality relationship of Eq. (20) holds for i-m ≤ k ≤ i-1. The summation on the right side of Eq. (19) applies for 0 ≤ k ≤ i-1. Hence, with m > i, all the expectation terms inside this summation are zero. Correspondingly, Eq. (19) reduces to

E[b_m(n+m) b*_i(n+i)] = E[b_m(n+m) u*(n)]
                      = E[b_m(n) u*(n-m)]
                      = P_m,        m ≥ i

where we have made use of Property 2. This completes the proof of Eq. (16).

Property 6. The forward and backward prediction errors exhibit the following cross-correlation property:

E[f_m(n) b*_i(n)] = { κ*_i P_m,  m ≥ i
                    { 0,         m < i        (21)

To prove this property, we use Eq. (5) to write

E[f_m(n) b*_i(n)] = E[ f_m(n) Σ_{k=0}^{i} a*_{i,i-k} u*(n-k) ]

                  = a*_{i,i} E[f_m(n) u*(n)] + Σ_{k=1}^{i} a*_{i,i-k} E[f_m(n) u*(n-k)]        (22)

By Property 1, we have

E[f_m(n) u*(n-k)] = 0,        1 ≤ k ≤ m

Assuming that m ≥ i, we therefore find that all the expectation terms in the second term (the summation) on the right side of Eq. (22) are zero. Hence,

Σ_{k=1}^{i} a*_{i,i-k} E[f_m(n) u*(n-k)] = 0        for m ≥ i

By Property 2, we have

E[f_m(n) u*(n)] = P_m

Therefore, with a_{i,i} = κ_i, we find that Eq. (22) reduces to

E[f_m(n) b*_i(n)] = κ*_i P_m,        m ≥ i

For the case when m < i, we adapt Eq. (7) to write

E[f_m(n) b*_i(n)] = E[ Σ_{k=0}^{m} a*_{m,k} u(n-k) b*_i(n) ]

                  = Σ_{k=0}^{m} a*_{m,k} E[u(n-k) b*_i(n)]        (23)

By Property 1, we have

E[b_i(n) u*(n-k)] = 0,        0 ≤ k ≤ i-1

Therefore, with m < i, we find that all the expectation terms inside the summation on the right side of Eq. (23) are zero. Thus,

E[f_m(n) b*_i(n)] = 0,        m < i
This completes the proof of Property 6. 3.21
The entropy of the input vector u(n) is defined by the multiple integral ∞
H u = –∫
f ( u ) ln [ f U ( u ) ] du –∞ U
(1)
But, –1
u(n) = L b(n)
(2)
where L is a lower triangular matrix defined by a sequence of backward prediction-error filters. We may therefore express the joint probability density function fB(b) of the backward prediction-error vector b(n) in terms of fU(u) as –1
–1
f B ( b ) = det ( L ) f U ( L b )
(3)
where the term inside the absolute value signs is the Jacobian of the transformation described in Eq. (2). We also note –1
d ( u ) = det ( L ) db
(4)
Next, we note that the entropy of the backward prediction-error vector b(n) is ∞
H b = –∫
f ( b ) ln [ f B ( b ) ] db –∞ B ∞
= –∫
–1
–1
–1
–1
det ( L ) f U ( L b ) ln [ det ( L ) f U ( L b ) ] db –∞
∞
–1
= –∫
f ( u ) ln [ det ( L ) f U ( u ) ] du –∞ U
where in the second line we have used Eq. (3) and in the third line we have used Eqs. (2) and (4). The transformation matrix L has unity for all its diagonal elements; hence, the Jacobian of the transformation is unity. Accordingly, we may simplify the entropy Hb as ∞
H b = –∫
f ( u ) ln [ f u ( u ) ] du –∞ u
= Hu This shows that the backward prediction-error vector b(n) has the same entropy, and therefore contains the same amount of information, as the input vector u(n). 3.22
(a) The index of performance to be optimized for stage m equals 2
2
J m = aE [ f m ( n ) ] + ( 1 – a )E [ b m ( n ) ]
(1)
where a is a constant that determines the mix of contributions from the forward and backward prediction errors to the index of performance. The forward prediction error equals
f m ( n ) = f m-1 ( n ) + κ *m b m-1 ( n – 1 )
(2)
The backward prediction error equals b m ( n ) = b m-1 ( n ) + κ m f m-1 ( n )
(3)
Substituting Eqs. (2) and (3) in (1) yields 2
2
J m = aE [ f m-1 ( n ) ] + ( 1 – a )E [ b m-1 ( n – 1 ) ] 2 2 2 + κ m ( 1 – a )E [ f m-1 ( n ) ] + aE [ b m-1 ( n – 1 ) ]
+ 2κ *m E [ f *m-1 ( n )b m-1 ( n – 1 ) ] + 2κ m E [ f m-1 ( n )b *m-1 ( n – 1 ) ] Hence, differentiating Jm with respect to κm and setting the result equal to zero, we find that the optimum value of κm equals E [ f *m-1 ( n )b m-1 ( n – 1 ) ] κ m, o ( a ) = – -------------------------------------------------------------------------------------------------------2 2 ( 1 – a )E [ f m-1 ( n ) ] + aE [ b m-1 ( n – 1 ) ] (b) The three special cases of interest are (i)
a=1 The index of performance equals 2
J m = E [ f m(n) ] Correspondingly, the optimum value of the reflection coefficient equals E [ f *m-1 ( n )b m-1 ( n – 1 ) ] κ m, o ( 1 ) = – --------------------------------------------------------2 E [ b m-1 ( n – 1 ) ] We refer to this method of optimization as the forward method. (ii)
a=0
The index of performance equals 2
J m = E [ b m-1 ( n ) ] Correspondingly, the optimum value of the reflection coefficient equals E [ f *m-1 ( n )b m-1 ( n – 1 ) ] κ m, o ( 0 ) = – --------------------------------------------------------2 E [ f m-1 ( n – 1 ) ] We refer to this method of optimization as the backward method. (iii)
a = 1/2 The index of performance equals 2 2 1 J m = --- E [ f m ( n ) ] + E [ b m ( n ) ] 2
Correspondingly, the optimum reflection coefficient yields the Burg formula 2E [ f *m-1 ( n )b m-1 ( n – 1 ) ] 1 κ m, o --- = – ----------------------------------------------------------------------------------- 2 2 2 E [ f m-1 ( n ) ] + E [ b m-1 ( n – 1 ) ] We refer to this method of optimization as the forward-backward method. 3.23
(a) The harmonic mean of the optimum values of the reflection coefficient produced by the forward and backward methods equals 2
2
E [ f m-1 ( n ) ] + E [ b m-1 ( n – 1 ) ] 1 1 1 --- -------------------- + -------------------- = – ----------------------------------------------------------------------------------2 κ m, o ( 1 ) κ m, o ( 0 ) 2E [ f * ( n )b ( n – 1 ) ] m-1
m-1
1 = -----------κ m, o where κm,o is the optimum value of the reflection coefficient produced by the Burg formula.
(b) Define the correlation coefficient or the normalized value of the cross-correlation function between the forward prediction error fm-1(n) and the delayed backward prediction error bm-1(n-1) as *
E [ f m-1 ( n )b m-1 ( n – 1 ) ] ρ = ---------------------------------------------------------------------------------------2 2 E [ f m-1 ( n ) ] + E [ b m-1 ( n – 1 ) ] Hence, we may redefine the optimum values κm,o(1) and κm,o(0) for the reflection coefficient that are produced by the forward and backward methods, respectively, as follows κ m, o ( 1 ) = – αρ and ρ κ m, o ( 0 ) = – --α where the parameter α is itself defined by 2
α =
E [ f m-1 ( n ) ] ------------------------------------------2 E [ b m-1 ( n – 1 ) ]
Accordingly, using the result of part (a), we may redefine the optimum value of the reflection coefficient produced by the Burg formula as 1 1 1 ------------ = – ------ --- + α 2ρ α κ m, o We note that the parameter α is a nonnegative real-valued scalar that lies in the range 0 ≤ α ≤ ∞ . Moreover, the factor 1 --- + α α attains its minimum value of 2 at α = 1. We may therefore write 1 --- + α ≥ 2 α
The correlation coefficient ρ is likewise a nonnegative scalar that lies in the range 0 ≤ ρ ≤ 1. That is, 1 ≤ ( 1 ⁄ ρ ) ≤ ∞ . We conclude therefore that 1 --------------- ≥ 1 κ m, o or, equivalently, κ m, o ≤ 1 for all m. 2
(c) The mean-square value of the forward prediction error fm(n) equals E [ f m ( n ) ] , where f m ( n ) = f m-1 ( n ) + κ *m, o b m-1 ( n-1 ) where the reflection coefficient is assigned the optimum value κm,o in accordance with the Burg formula. Hence, 2
2
2
2
E [ f m ( n ) ] = E [ f m-1 ( n ) ] + κ m, o E [ b m-1 ( n-1 ) ] + κ *m, o E [ f *m-1 ( n )b m-1 ( n-1 ) ] + κ m, o E [ f m-1 ( n )b *m-1 ( n-1 ) ] Using the Burg formula *
2E [ f m-1 ( n )b m-1 ( n-1 ) ] κ m, o = – -------------------------------------------------------------------------------2 2 E [ f m-1 ( n ) ] + E [ b m-1 ( n-1 ) ] to eliminate the factor E [ f *m-1 ( n )b m-1 ( n-1 ) ] , we may thus rewrite the mean-square value of fm(n) as 2
2
2
2
E [ f m ( n ) ] = E [ f m-1 ( n ) ] + κ m, o E [ b m-1 ( n-1 ) ]
2 2 2 – κ m, o E [ f m-1 ( n ) ] + E [ b m-1 ( n ) ] 2
2
= ( 1 – κ m, o )E [ f m-1 ( n ) ] Similarly, we may express the mean-square value of the backward prediction error as follows 2
2
2
2
E [ b m-1 ( n ) ] = E [ b m-1 ( n-1 ) ] + κ m, o E [ f m-1 ( n ) ] + κ m, o E [ f m-1 ( n )b *m-1 ( n-1 ) ] + κ *m, o E [ f *m-1 ( n )b m-1 ( n-1 ) ] 2
2
2
= E [ b m-1 ( n ) ] + κ m, o E [ f m-1 ( n ) ] 2 2 2 – κ m, o E [ f m-1 ( n ) ] + E [ b m-1 ( n-1 ) ] 2
2
= ( 1 – κ m, o )E [ b m-1 ( n-1 ) ]
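The bound |κ_{m,o}| ≤ 1 for the Burg (harmonic-mean) estimate can be checked numerically. The sketch below assumes real-valued data, replaces expectations by time averages over an illustrative white-noise record, and evaluates the first-stage (m = 1) forward, backward, and Burg estimates; only the last is guaranteed not to exceed unity in magnitude.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
u = rng.standard_normal(N)                  # illustrative stationary input

# Stage-0 prediction errors: f_0(n) = b_0(n) = u(n)
f = u[1:]                                   # f_{m-1}(n)
b = u[:-1]                                  # b_{m-1}(n-1)

num = np.dot(b, f)                          # time average of b_{m-1}(n-1) f_{m-1}(n)
k_forward  = -num / np.dot(b, b)            # normalized by the sum of |b|^2
k_backward = -num / np.dot(f, f)            # normalized by the sum of |f|^2
k_burg     = -2.0 * num / (np.dot(f, f) + np.dot(b, b))   # harmonic-mean (Burg) estimate

# |k_burg| <= 1 follows from the Cauchy-Schwarz inequality; the forward and
# backward estimates carry no such guarantee.
print(k_forward, k_backward, k_burg, abs(k_burg) <= 1.0)
```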
3.24
Assuming ergodicity, we may set up the following table for time series of length N. Ensemble average
Time-average N
E b ( n-1 ) f * ( n ) m-1 m-1 ---------------------------------------------------------2 E b ( n-1 ) m-1
∑ bm-1( n-1 ) f *m-1( n ) n=1 ---------------------------------------------------------N 2 b ( n-1 ) m-1 n=1
∑
N
E b ( n-1 ) f * ( n ) m-1 m-1 ---------------------------------------------------------2 E f (n) m-1
∑ bm-1( n-1 ) f *m-1( n ) n=1 ---------------------------------------------------------N 2 f (n) m-1 n=1
∑
N
E b ( n-1 ) f * ( n ) m-1 m-1 ----------------------------------------------------------------------------------------------2 2 1⁄2 E f (n) E b ( n-1 ) m-1 m-1
∑ bm-1( n-1 ) f *m-1( n ) n=1 -------------------------------------------------------------------------------------N N 2 2 f (n) b ( n-1 ) m-1 m-1 n=1 n=1
∑
∑
Applying the Cauchy-Schwartz inequality, we now see that only the last entry assures the condition |κm| < 1 for all m: N
2
∑
b m-1 ( n-1 ) f *m-1 ( n )
n=1
N
N
2 2 ≤ ∑ f m-1 ( n ) ∑ b m-1 ( n-1 ) n=1 n=1
2
3.25
z (a) G ( z ) = -------------------------------------------------------( 1 – z ⁄ 0.4 ) ( 1 + z ⁄ 0.8 ) – 0.4 × 0.8 = --------------------------------------------0.8 1 – 0.4 ------- 1 + ------- z z We may therefore set a 21 = 0.4,
a 22 = – 0.32
94
a 21 – a 22 ⋅ a 21 a 1 = ----------------------------------2 1 – a 22 0.4 + 0.4 × 0.32 = -------------------------------------1–a 2 32
= 0.5882 The reflection coefficients are therefore κ 2 = – 0.32 κ 1 = 0.5882 (b) F2(z)
Σ
.
.
Σ
κ1
κ2
.−κ
−κ2
B2(z)
.
Σ
.
1
Σ
z-1 B1(z)
z-1 B0(z)
Fig. 1 B 2 ( z ) = [ – κ 2 B 1 ( z )z
–1
+ F 2 ( z ) ]κ 2 + B 1 ( z )z
B0 ( z ) 1 -------------- = -----------------------------------------------–1 –2 F2(z) 1 + a 21 z + a 22 z B 1 ( z ) = B 0 ( z )z
–1
+ B 0 ( z )κ 1 –1
B1 ( z ) κ1 + z -------------- = -----------------------------------------------–1 –2 F2(z) 1 + a 21 z + a 22 z B1 ( z ) –1 B2 ( z ) 2 –1 -------------- = -------------- [ z – κ 2 z ] + κ 2 F2(z) F2(z)
95
–1
(1)
–1
–1
2 –1
[ κ1 + z ][ z – κ2 z ] = ---------------------------------------------------------- + κ 2 –1 –2 1 + a 21 z + a 22 z –1
–1
–2
κ1z + κ1κ2z + κ2 + z = ------------------------------------------------------------------–1 –2 1 + ( κ 1 + κ 1 κ 2 )z + κ 2 z –2
–1
z + 0.4z – 0.32 = ------------------------------------------------–1 –2 1 + 0.4z – 0.32z Therefore, the all pass transfer function realized by the filter of Fig. 1 is –2
–1
z + 0.4z – 0.32 ------------------------------------------------–1 –2 1 + 0.4z – 0.32z 3.26
(a) Prediction order M=1 The inverse lattice filter consists of a single stage. By initializing the single stage of the filter to 1 and operating it with zero input, we have the following configuration to consider:
zero input
Σ
Output κ1
−κ∗1
Σ
z-1
Initial state = 1
Output for lag of one = – κ *1 r(1) = ---------r(0) (b) Prediction order M=2 In this case the inverse lattice filter consists of two stages. By initializing the two states of the filter to 1,0 and operating it with zero input, we have the following configuration to consider:
96
zero input
.
Σ
.
Σ
κ2
κ1
.
−κ∗1
−κ∗2
Σ
z-1
Output
.
Σ Initial state = 0
z-1
Initial state = 1
r(1) Output for lag of one = – κ *1 = ---------r(0) Output for lag of two = ( – κ *2 ) + ( – κ *1 ) ( κ 1 ) ( – κ *2 ) + ( – κ *1 ) ( – κ *1 ) r(2) = ---------r(0) (c) Prediction order M=3 In this case, the inverse lattice filter consists of three stages. By initializing the three states of the filter to 1,0,0 and operating it with zero input, we have the following configuration to consider zero input
Σ
.
Σ
Σ
.
Σ
κ2
κ3
−κ∗3
.
Initial state = 0 z-1
−κ∗2 Σ
.
. κ1
Initial state = 0 z-1
−κ∗1
Σ
.
Output for lag of one = – κ *1 r(1) = ---------r(0) Output for lag of two = ( – κ *2 ) + ( – κ *1 ) ( κ 1 ) ( – κ *2 ) + ( – κ *1 ) ( – κ *1 ) r(2) = ---------r(0)
97
Output
z-1
Initial state = 1
Output for lag of three = ( – κ *3 ) + ( – κ *2 ) ( κ 2 ) ( – κ *3 ) + ( – κ *1 ) ( κ 1 ) ( – κ *3 ) +( – κ *1 ) ( κ 1 ) ( – κ *2 ) ( κ 2 ) ( – κ *3 ) + ( – ( κ *1 ) ) ( – κ *2 ) + ( – κ *2 ) ( κ *1 ) + ( – κ *2 ) ( κ 1 ) ( – κ *2 ) + ( – κ *1 ) ( – κ *1 ) ( κ 1 ) ( – κ *2 ) + ( – κ *1 ) ( κ 1 ) ( – κ *2 ) ( κ 1 ) ( – κ *2 ) + ( – κ *1 ) ( – κ *1 ) ( – κ *1 ) + ( – κ *2 ) ( – κ *1 ) + ( – κ *1 ) ( – κ *1 ) ( κ 1 ) ( – κ *2 ) r(3) = ---------r(0) Note that a lag of one corresponds to signal flow through a single delay element z-1, a lag of two corresponds to signal flow through two delay elements, and a lag of three corresponds to signal flow through three delay elements, and so on. 3.27
Backward prediction errors are orthogonal to each other. On the other hand, forward prediction errors are correlated, which follows from Property 3 discussed in Problem 3.20.
3.28 Fm(z)
.
Σ
.
Σ
...
κm
Bm(z)
.
κ1
−κ2 Σ
-1
z
.
..
...
h*m-1
* hm
F0(z)
Σ
.
z-1
.
B0(z)
h*1
h*0
Σ
Σ
...
Σ
−κ1
Y(z)
B0 ( z ) B1 ( z ) Bm ( z ) Y (z) --------------- = h 0 --------------- + h 1 --------------- + … + h m --------------Fm(z) Fm(z) Fm(z) Fm(z) –1
–m
h 0 + h 1 [ a 11 + a 10 z ] + … + h m [ a mm + …a m0 z ] = -------------------------------------------------------------------------------------------------------------------------------, –1 –m 1 + a m1 z + … + a mm z
98
a m0 = 1
–1
–m
( h 0 + h 1 a 11 + …h m a mm ) + ( h 1 a 10 + h 2 a 21 + …h m a m˙ m-1 )z + … + h m z -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------–1 –m 1 + a m1 z + … + a mm z –1
–m
P 0 + P 1 z + …P m z = --------------------------------------------------------, –1 –m 1 + d 1 z + …d m z
d i = a mi
–1
–m
[ 1 + P1 ⁄ P0 z + … + Pm ⁄ P0 z ] = P 0 ------------------------------------------------------------------------------------–1 –m 1 + d 1 z + …d m z
(1)
m
∏ ( 1 – z ⁄ zi ) i=1 = G 0 -------------------------------m ∏ ( 1 – z ⁄ pi ) i=1 m
∏ ( 1 – zi ⁄ z ) G 0 p 1 … p m i=1 = --------------------------- -------------------------------m z 1 …z m ∏ ( 1 – pi ⁄ z )
(2)
i=1
1 + ( –1 ) G0 p1 … pm = --------------------------z 1 …z m
1
m m
m
∑ zi z
–1
+ ( –1 )
∑ ∑ zi zh z
2
–m
m
( – 1 ) z 1 …z m
i=1 k=1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------m m m 1 + ( –1 )
1
∑ pi z
–1
+ ( –1 )
2
∑ ∑
i=1 i≠k
(a) By equating the denominators of Eqs. (1) and (2):
1 + a m1 z
+…+z
i≠k
i=1
i=1
–1
–2
+ …a mm z
–m
m
=
∏ ( 1 – pi ⁄ z ) i=1
k=1
pi pk z
–2
m
+ … + ( –1 ) p1 … pm z
–m
= 1 + ( –1 )
1
m
∑ pi z
m m
–1
p p z +∑∑ i k
i=1
–2
m
+ … + ( –1 ) p1 … pm z
–m
i=1 k=1 i≠k
By equating the coefficients of z-i, am,i can be calculated for i = 1...m. Then by using the equation a mk – a mm a *m, m-k a m-1, k = -------------------------------------------2 1 – a mm
k = 0, 1, …, m
ai,i, i = 1,...m can be calculated. Reflection coefficient ki - ai i
i = 1,...m
(b) By equating the numerator of Eqs. (1) and (2): h 0 + h 1 a 11 + …h m a mm + … + h m z
–m
G0 pi … pm m = -------------------------- ∏ ( 1 – z i ⁄ z ) z 1 …z m i=1
m G0 p1 … pm 1 –1 m –m = --------------------------- 1 + ( +1 ) ∑ z i z + … ( – 1 ) z 1 …z m z z 1 …z m i=1
By equating the coefficients of z-i, from z-m to z-1, the regression coefficients hm,...,h0 can be calculated G0 p1 … pm ( h 0 + h 1 a 11 + …h m a m1 ) = --------------------------z 1 …z m Coefficient of z-m is G0 p1 … pm m h m = --------------------------- ( – 1 ) z 1 …z m z 1 …z m m
= ( –1 ) G0 p1 … pm By equating the coefficients of z-m+1
(1)
G0 p1 … pm h m-1 a m-1,0 + h m-1 a m-1 = --------------------------- {coefficients of z-m+1] z 1 …z m By back substitution, hm-1 can be calculated in the same way as hm-2,...h1,h0 can be calculated. 3.29
( 1 + z ⁄ 0.1 ) ( 1 + z ⁄ 0.6 ) (i) G ( z ) = 10 -------------------------------------------------------( 1 – z ⁄ 0.4 ) ( 1 + z ⁄ 0.8 ) 0.6 0.1 ------- + 1 ------- + 1 z 10 × 0.4 × 0.8 z = --------------------------------- --------------------------------------------0.1 × 0.6 0.2 0.4 ------- – 1 1 + ------- z z 0.6 1 + 0.1 ------- 1 + ------- z z 32 = – ------ × 10 --------------------------------------------6 0.8 1 – 0.4 ------- 1 + ------- z z –1
–2
[ 1 + 0.7z + 0.06z ] 32 = – ------ × 10 -----------------------------------------------------–1 –2 6 1 + 0.4z – 0.32z Therefore, a 21 = 0.4, a 22 = – 0.32 a 21 – a 22 a 21 a 11 = ------------------------------2 1 – a 22 0.4 + 0.32 × 0.4 = -------------------------------------2 1 – 0.32 0.528 = ---------------0.8976 = 0.5882 Hence, the reflection coefficients are κ 1 = 0.5882 and
κ 2 = – 0.32 The regression coefficients are calculated as follows: – 32 h 2 = --------- × 10 × 0.06 6 = – 3.2 – 32 h 1 a 10 + h 2 a 21 = 0.7 × --------- × 10 6 Therefore h 1 = – 34.13 – 32 h 0 + h 1 a 11 + h 2 a 22 = --------- × 10 6 Therefore h 0 = – 34.28 2
10 ( 1 + z + z ) (ii) G ( z ) = -------------------------------------------------------( 1 – z ⁄ 0.4 ) ( 1 + z ⁄ 0.8 ) –1
–2
[1 + z + z ] = – ( 10 × 0.4 × 0.8 ) ------------------------------------------------- 1 – 0.4 ------- ( 1 + 0.8 ⁄ z ) z From part (i) a 21 = 0.4, a 22 = – 0.32
a 11 = 0.5882
κ 1 = 0.5882 κ 2 = – 0.32
h 2 = – 10 × 0.4 × 0.8 = – 3.2 h 1 a 10 + h 2 a 21 = – 10 × 0.4 × 0.8 h 1 = – 1.92
h 0 + h 1 a 11 + h 2 a 22 = – 10 × 0.4 × 0.8 h 0 = – 3.095 10 ( 1 + z ⁄ 0.6 ) (iii) G ( z ) = -------------------------------------------------------( 1 – z ⁄ 0.4 ) ( 1 + z ⁄ 0.8 ) 0.6 ------- + 1 ( 1.5 ⁄ ( z + 1 ) ) z 10 × 0.4 × 0.8 = – --------------------------------- -------------------------------------------------------( 1 – 0.4 ⁄ z ) ( 1 + 0.3 ⁄ z ) 0.6 × 1.5 From part (i), a 21 = 0.4, a 22 = – 0.32 κ 2 = – 0.32
a 11 = 0.5882
κ 1 = 0.5882
– 10 × 32 h 2 = --------------------- × 0.6 × 1.5 = – 3.2 90 – 10 × 32 h 1 a 10 + h 2 a 21 = 2.1 × --------------------- 80 Therefore h 1 = – 6.19 – 10 × 32 h 0 + h 1 a 11 + h 2 a 22 = --------------------90 h 0 = – 0.938 2
10 ( 1 + z + 0.5z ) (iv) -------------------------------------------------------( 1 – z ⁄ 0.4 ) ( 1 + z ⁄ 0.8 ) –2
–1
z + z + 0.5 = – 10 × 0.4 × 0.8 --------------------------------------------0.8 1 – 0.4 ------- 1 + ------- z z From part (i), a 21 = 0.4, a 22 = – 0.32
κ 1 = 0.5882
a 11 = 0.5882 h 2 = 10 × 0.4 × 0.8
κ 2 = – 0.32
= – 3.2
h 1 a 10 + h 2 a 21 = – 10 × 0.4 × 0.8 h 1 = – 1.92 h 0 + h 1 a 11 + h 2 a 22 = – 0.5 × 10 × 0.4 × 0.8 h 0 = – 1.495
3.30
(a) A ( e
jω
M
) =
∑ ak e
– jω k
k=0
ˆ (e H
jω
N
) =
∑ hˆ ( i )e
– jω i
k=0
ˆ e H
jω m
A e jω m = 1
(1)
ˆ (ω) 2 pˆ ( ω ) = H ˆ *e H
jω m
jω jω jω H ˆ e m A e m = H ˆ * e m
pˆ ( ω m ) A e p
∑ pˆ e
k=0 p
N
∑∑ k=1 m=1
jω m
jω = H ˆ * e m
jω m
jω a e – jω m k = H ˆ * e m k
jω m – jω m k jω mi pˆ e a k e e =
104
N
∑ Hˆ m=1
* jω m jω mi
e
e
From definition 1 Rˆ ( i ) = ---N
N
∑ pˆ ( ωm ) cos ωmi m=1
and averaging over ω m εΩ p
1 ∑ ak Rˆ ( i – k ) = ---N-
k=0
N
∑ Hˆ ( ωm )e
– jω mi
m=1
= hˆ ( – 1 ) for all i
1 = ---N
N
∑ m=1
exp ( – jω mi ) -------------------------------------------M
∑ ak exp ( – jωmk )
k=0 M
(b) D ( w ) =
∑ d k cos ωk k=0 2
N -1
2
Uk Uk D IS = ∑ ------------- – ln ------------- – 1 S (a) Sk ( a ) k=0 k N -1
=
∑
Uk
k=0
2
M
∑ cos ωk – ln
Uk
k=0
2
M
∑ d k cos ωk – 1 k=0
∂D JS ------------- = 0 ∂d i 2
U i cos ω i = U i cos ω i – ----------------------------------------M 2 U i ∑ d k cos ω k 2
k=0
105
Therefore U i
2
M
∑ d k cos ωk = 1
for j = 0…N -1
k=0
Ui
2
ˆ ( e jω ) = H
p ( wm ) A e
2
jω m
jω = H ˆ * e m
As in part (a), it can be shown: p
∑ ak R ( i – k )
= hˆ ( – i )
k=0
1 where R ( i ) = ---N
N
∑ p ( wm ) cos ωmi m=1
By comparing with part (a) the results are: p
p
∑ ak Rˆ ( j – k ) =
∑ ak R ( j – k )
k=0
k=0
for 0 ≤ i ≤ p
Therefore Rˆ ( i ) = Rˆ ( i ) 1 (c) S k ( a ) = -------------------------------m 2 – jω k ∑ ak e k=0 N -1
2
2
Uk Uk D IS ( a ) = ∑ ------------- – ln ------------- – 1 S (a) Sk ( a ) k=0 k ∂D IS ------------ = 0 ∂a i
for all i = 0,...m
106
N -1
D IS ( a ) =
∑
Uk
2
k=0
m
∑ ak e
2
– jω k
– ln U k
2
k=0
= Ui
e
2 – jω i 2
Ui e – -------------------------------------------m Ui
2
∑ ak e
– jω k
k=0
= 0
Ui
2
m
∑ ak e
– jω k
2
= 1 for all i = 0,...,m
k=0
Ui
2
ˆ e = H
jω m 2
From part (a) it can be shown that m
∑ ak R ( i – k )
= hˆ ( – i )
∑ ak e k=0
∂D IS ( a ) -------------------- = 0 ∂a i
2 – jω i 2
m
0
k=0
107
2
– jω k
2
–1
CHAPTER 4

4.1
(a) For convergence of the steepest-descent algorithm: 2 0 < µ < -----------λ max where λ max is the largest eigenvalue of the correlation matrix R. We are given
R =
1 0.5
0.5 1
The two eigenvalues of R are λ1 = 0.5 λ2 = 1.5 Hence λ max = 1.5 . The step-size parameter µ must therefore satisfy the condition 2 0 < µ < ------- = 1.334 1.5 We may thus choose µ = 1.0. (b) From Eq. 4.9 of the text, w ( n + 1 ) = w ( n ) + µ [ p – Rw ( n ) ] With µ = 1 and
p =
0.5 0.25
we therefore have w ( n + 1 ) = w ( n ) + 0.5 – 1 0.5 0.25
0.5 w ( n ) 1
= 1 0 – 1 0.5 0 1 0 – 0.5
=
0.5 1
0.5 w ( n ) – 0.25
– 0.5 w ( n ) – 0.5 0 0.25
That is, w 1 ( n+1 )
=
w 2 ( n+1 )
0 – 0.5
– 0.5 w 1 ( n ) – 0.5 1 0.25 w2 ( n )
Equivalently, we may write

w_1(n+1) = -0.5 w_2(n) + 0.5
w_2(n+1) = -0.5 w_1(n) + 0.25

(c) To investigate the effect of varying the step-size parameter µ on the trajectory, we find it convenient to work with v(n) rather than with w(n). The kth natural mode of the steepest-descent algorithm is described by

ν_k(n+1) = (1 - µλ_k) ν_k(n),
k = 1, 2
Specifically, ν 1 ( n+1 ) = ( 1 – 0.5µ )ν 1 ( n ) ν 2 ( n+1 ) = ( 1 – 1.5µ )ν 2 ( n ) For the initial condition, we have
v(0) =
ν1 ( 0 ) ν2 ( 0 ) H
= Q wo From the solution to Problem 2.2:
w o = 0.5 0 1 Q = ------2
1 1 –1 1
Hence, 1 v ( 0 ) = ------- 1 – 1 0.5 2 1 1 0 1 = ------- 0.5 2 0.5 That is, 0.5 ν 1 ( 0 ) = ν 2 ( 0 ) = ------2 For n > 0, we have n
ν k ( n ) = ( 1 – µλ k ) ν k ( 0 ) ,
k = 1,2
Hence, n
ν 1 ( n ) = ( 1 – 0.5µ ) ν 1 ( 0 ) n
ν 2 ( n ) = ( 1 – 1.5µ ) ν 2 ( 0 ) µ = 1 n
ν 1 ( n ) = ( 0.5 ) ν 1 ( 0 ) n
ν 2 ( n ) = ( – 0.5 ) ν 2 ( 0 ) Solution - This represents an oscillatory trajectory.
110
µ = 0.1 n
ν 1 ( n ) = ( 0.59 ) ν 1 ( 0 ) n
ν 2 ( n ) = ( 0.85 ) ν 2 ( 0 ) This second solution represents a damped trajectory. The transition from a damped to an oscillatory trajector occurs at 1 µ = ------- = 0.667 1.5 Specifically, for 0 < µ < 0.667, the trajectory is damped. On the other hand, for 0.667 < µ < 1.334 the trajectory is oscillatory. 4.2
We are given J ( w ) = J min + r ( 0 ) ( w – w o )
2
(a) The correlation matrix R = r(0). Hence, λmax = r(0) 2 2 Correspondingly, µ max = ------------ = ---------λ max r(0) (b) Time constant of the filter is 1 1 τ 1 ≈ --------- = -------------µλ 1 µr ( 0 ) (c) J(w) Slope = 2r(0)(w - wo) Jmin 0
wo
w
w
∂J ------- = 2r ( 0 ) ( w – w o ) ∂w 4.3
(a) There is a single mode with eigenvalue λ1 = r(0), and q1 = 1, Hence, J ( n ) = J min + λ 1 v 1 ( n )
2
where v1(n) = q1(wo - w(n)) = (wo - w(n)) 2 2 ∂J ( n ) (b) -------------- = v 1 ( n ) = ( w o – w ( n ) ) ∂λ 1
4.4
The estimation error e(n) equals H
e ( n ) = d ( n ) – w ( n )u ( n ) where d(n) is the desired response, w(n) is the tap-weight vector, and u(n) is the tap-input vector. Hence, the gradient of the instantaneous squared error equals 2 ∂ ∂ ˆ J ( n ) = ------ [ e ( n ) ] = ------- [ e ( n )e * ( n ) ] ∇ ∂w ∂w
∂e * ( n ) ∂e ( n ) = e ( n ) ---------------- + e * ( n ) -------------∂w ∂w H
= – 2e * ( n )u ( n ) = 2u ( n )d * ( n ) + 2u ( n )u ( n )w ( n ) 4.5
Consider the approximation to the inverse of the correlation matrix: n
R ( n+1 ) ≈ µ ∑ ( I – µR ) –1
k
k=0
where µ is a positive constant bounded in value as 2 0 < µ < -----------λ max where λ max is the largest eigenvalue of R. Note that according to this approximation, we have R-1(1) = µI. Correspondingly, we may approximate the optimum Wiener solution as
112
–1
w ( n+1 ) = R ( n+1 )p n
≈ µ ∑ ( I – µR ) p k
k=0 n
= µp + µ ∑ ( I – µR ) p k
k=1
In the second term, put k = i+1 or i = k-1: n
w ( n+1 ) = µp + µ ( I – µR ) ∑ ( I – µR ) p i
k=1
= µp + ( I – µR )w ( n )
(1)
where, in the second line, we have used the fact that n
µ ∑ ( I – µR ) p = w ( n ) i
i=0
Hence, rearranging Eq. (1): w ( n+1 ) = w ( n ) + µ [ p – Rw ( n ) ] which is the standard formula for the steepest descent algorithm. 4.6
2 1 J ( w ( n+1 ) ) = J ( w ( n ) ) – --- µ g ( n ) 2
For stability of the steepest-descent algorithm, we therefore require J ( w ( n+1 ) ) < J ( w ( n ) ) To satisfy this requirement, the step-size parameter µ should be positive, since µ g(n)
2
> 0.
Hence, the steepest-descent algorithm becomes unstable when the step-size parameter is negative.
113
4.7
Let δw ( n+1 ) = w ( n+1 ) – w ( n ) = µ [ p – Rw ( n ) ] By definition, p = E [ u ( n )d * ( n ) ] H
R = E [ u ( n )u ( n ) ] Hence, H
δw ( n+1 ) = { µE [ u ( n )d * ( n ) ] – E [ u ( n )u ( n ) ]w ( n ) } H
= µE [ u ( n ) { d * ( n )u ( n )w ( n ) } ] But, the estimation error is H
e ( n ) = d ( n ) – w ( n )u ( n ) Therefore, δw ( n+1 ) = µE [ u ( n )e * ( n ) ] At the minimum point of the error-performance surface, the correction δw ( n+1 ) is zero. Hence, at this point, we have *
E [ u ( n )e ( n ) ] = 0 which is a restatement of the principle of orthogonality. 4.8
We are given J approx ( n ) = [ J ( 0 ) – J ( ∞ ) ]e
–n ⁄ τ
+ J (∞)
For n = 1, we have
114
J approx ( 1 ) = J ( 1 ) = [ J ( 0 ) – J ( ∞ ) ]e
–1 ⁄ τ
+ J (∞)
Solving for e1/τ, we get e
1⁄τ
J (0) – J (∞) = ----------------------------J (1) – J (∞)
Taking natural logarithms: J (0) – J (∞) 1 --- = ln ----------------------------J (1) – J (∞) τ or J (1) – J (∞) τ = ln ----------------------------J (0) – J (∞) 4.9
(a) The cross-correlation between the “desired response” u(n) and the tap input u(n-1) is p = E [ u ( n )u ( n-1 ) ] = r ( 1 ) The mean-square value of the tap input u(n-1) is 2
E [ u ( n-1 ) ] = r ( 0 ) Hence, w ( n+1 ) = w ( n ) + µ ( r ( 1 ) – r ( 0 )w ( n ) ) For the recursive computation of the parameter a, we note that a(n) = -w(n); hence a ( n+1 ) = a ( n ) – µ [ r ( 1 ) + r ( 0 )a ( n ) ] (b) The error-performance surface is defined by 2
J ( n ) = r ( 0 ) – 2r ( 1 )w ( n ) + r ( 0 )w ( n ) 2
= r ( 0 ) + 2r ( 1 )a ( n ) + r ( 0 )a ( n )
115
The corresponding plot of the error performance surface is therefore J(n)
r(0) r2(1)
r(0) - ------r(0) 0
-r(1)/r(0)
a
(c) The condition on the step-size parameter is 2 0 < µ < ---------r(0) 4.10
The second-order AR process u(n) is described by the difference equation u ( n ) = – 0.5u ( n-1 ) + u ( n-2 ) + v ( n )
(1)
Hence, w 1 = – 0.5, w 2 = 1 and the AR parameters equal a 1 = 0.5, a 2 = – 1 Accordingly, we write the Yule-Walker equations as r(0) r(1)
2 σv
r ( 1 ) – 0.5 = r ( 1 ) r(0) 1 r(2)
(2)
2
=
∑ ak r ( k ) k=0
1 = a0 r ( 0 ) + a1 r ( 1 ) + a2 r ( 2 ) = r ( 0 ) + 0.5r ( 1 ) – r ( 2 )
(3)
Eqs. (1), (2) and (3) yield
116
r(0) = 0 r(1) = 1 r(2) = -1/2 Hence, R = 0 1
1 0
The eigenvalues of R are -1, +1. For convergence of the steepest descent algorithm: 2 0 < µ < -----------λ max where λmax is the largest eigenvalue of the correlation matrix R. Hence, with λmax = 1. 0<µ<2 4.11
u ( n-2 ) = 0.5u ( n-1 ) + u ( n ) – v ( n )
(1)
Hence, w 1 = 1 w 2 = 0.5 Accordingly, we may write Rw b = r r(0) r(1)
2 σv
B*
as
r(1) 1 = r(2) r ( 0 ) 0.5 r(1) 2
=
∑ ak r ( k ) k=0
1 = r ( 0 ) – r ( 1 ) – 0.5r ( 2 )
(2)
r(0) = 0 r(1) = -2/3
Therefore, R =
0 –2 ⁄ 3
–2 ⁄ 3 0
117
The eigenvalues of R are λ = -2/3, 2/3 For convergence of the steepest descent algorithm: 2 0 < µ < -----------λ max 2 0 < µ < ---------2⁄3 0<µ<3 4.12
The third-order AR process u(n) is described by the difference equation u ( n ) = – 0.5u ( n-1 ) – 0.5u ( n-2 ) + 0.5u ( n-3 ) + v ( n ) Hence w1 = -0.5,
w2 = -0.5,
w3 = 0.5
and the AR parameters equal a1 = 0.5,
a2 = 0.5,
a3 = -0.5
Accordingly, we write the Yule-Walker equations as r(1) r ( 0 ) r ( 1 ) r ( 2 ) – 0.5 r ( 1 ) r ( 0 ) r ( 1 ) – 0.5 = r ( 2 ) r(3) r ( 2 ) r ( 1 ) r ( 0 ) 0.5
2 σv
3
=
∑ ak r ( k ) k=0
1 = r ( 0 ) + a1 r ( 1 ) + a2 r ( 2 ) + a3 ( 3 )
(3)
Equations (1), (2) and (3) yield r(0) = 1 r(1) = -1/2 r(2) = -1/2 r(3) = +1
118
(a) The correlation matrix is 1 –1 ⁄ 2 –1 ⁄ 2 R = –1 ⁄ 2 1 –1 ⁄ 2 –1 ⁄ 2 –1 ⁄ 2 1 (b) The eigenvalues of R are 0, 1.5, 1.5 (c) For convergence of the steepest descent algorithm, we require 2 0 < µ < -----------λ max Hence, with λmax = 1.5, we have 2 0 < µ < ------1.5 4.13
(a) The first-order MA process is described by the difference equation u ( n ) = v ( n ) – 0.2v ( n-1 ) Taking the z-transform of both sides: U(z) = (1 - 0.2z-1)V(z) U (z) 1 ------------ = ---------------------------------V (z) –1 –1 ( 1 – 0.2z ) 1 ≈ -----------------------------------------------------–1 –2 1 + ( 0.2 )z + 0.04z Accordingly, u ( n ) + 0.2u ( n-1 ) + 0.04u ( n-2 ) = v ( n ) u ( n ) = – 0.2u ( n-1 ) – 0.04u ( n-2 ) + v ( n ) for which we infer w1 = -0.2,
w2 = -0.04
119
Correespondingly, the AR parameters equal a1 = 0.2,
a2 = 0.04
Yule-Walker equations are r ( 0 ) r ( 1 ) – 0.2 = r ( 1 ) r ( 1 ) r ( 0 ) – 0.04 r(2)
2
σv =
(1)
3
∑ ak r ( k )
(2)
k=0
1 = r ( 0 ) + a1 r ( 1 ) + a2 r ( 2 )
(3)
Equations (1), (2) and (3) yield r(0) = 1.0401 r(1) = -0.2000 r(2) = -0.0016 The correlation matrix is
R =
r(0) R(1)
r ( 1 ) = 1.0401 R(0) – 0.2000
– 0.2000 1.0401
The eigenvalues of R are = 1.2401, 0.8401, λmax = 1.2401 For convergence of the steepest descent algorithm: 2 0 < µ < ---------------1.2401 That is, 0 < µ < 1.6127 U (z) 1 (b) ------------ = ---------------------------------V (z) –1 –1 ( 1 – 0.2z )
120
1 ≈ -----------------------------------------------------------------------------–1 –2 –3 1 + 0.2z + 0.04z + 0.008z Accordingly, u ( n ) + 0.2u ( n-1 ) + 0.04u ( n-2 ) + 0.008u ( n-3 ) = v ( n ) w1 = -0.2,
w2 = -0.04,
w3 = -0.008
The AR parameters are a1 = 0.2,
a2 = 0.04,
a3 = 0.008
The Yule-Walker equations are r(1) r ( 0 ) r ( 1 ) r ( 2 ) – 0.2 r ( 1 ) r ( 0 ) r ( 1 ) – 0.04 = r ( 2 ) r(3) r ( 2 ) r ( 1 ) r ( 0 ) 0.008
2 σv
(4)
3
=
∑ ak r ( k )
(5)
k=0
1 = r ( 0 ) + a1 r ( 1 ) + a2 r ( 2 ) + a3 r ( 3 ) Equations (4), (5) and (6) yield r(0) = 1.04 r(1) = -0.2 r(2) = 0.0 r(3) = -0.003 1.04 – 0.2 0 The correlation matrix R = – 0.2 1.04 – 0.2 0 – 0.2 1.04 and its eigenvalues are λ = 0.7572, 1.04 1.3228,
λmax = 1.3228
For convergence of the steepest descent algorithm:
121
(6)
2 0 < µ < ---------------1.3228 0 < µ < 1.5119 (c) When the system is approximated by a second-order AR model, 0 < µ < 1.6127 When it is approximated by a third-order AR model, 0 < µ < 1.5119 These results show that the upperbound on the step-size parameters becomes tighter as the approximating model order is increased. 4.14
(a) u ( n ) = – 0.5u ( n-1 ) + v ( n ) – 0.2v ( n-1 ) Take z-transforms of both sides: U(z) = -0.5z-1U(z) + V(z) - 0.2z-1V(z) –1
U (z) 1 – 0.2z ------------ = ------------------------–1 V (z) 1 + 0.5z 1 = ----------------------------------------------------------------–1 –1 –1 ( 1 + 0.5z ) ( 1 – 0.2z ) Approximation to third-order yields U (z) 1 ------------ = -------------------------------------------------------------------------------------------------------------–1 –1 –2 –3 V (z) ( 1 + 0.5z ) ( 1 + 0.2z + 0.04z + 0.08z ) 1 ≈ -----------------------------------------------------------------------------–1 –2 –3 1 + 0.7z + 0.14z + 0.028z Accordingly, u ( n ) + 0.7u ( n-1 ) + 0.14u ( n-2 ) + 0.028u ( n-3 ) = v ( n ) u ( n ) = – 0.7u ( n-1 ) – 0.14u ( n-2 ) – 0.028u ( n-3 ) + v ( n )
122
w1 = -0.7,
w2 = -0.14,
w3 = -0.028
The AR parameters are a1 = 0.7,
a2 = 0.14,
a3 = 0.028
The Yule-Walker equations are r ( 0 ) r ( 1 ) r ( 2 ) – 0.7 r(1) = r ( 1 ) r ( 0 ) r ( 1 ) – 0.14 r(2) r ( 2 ) r ( 1 ) r ( 0 ) 0.028 r(3)
2 σv
(1)
3
=
∑ ak r ( k )
(2)
k=0
r ( 0 ) + a1 r ( 1 ) + a2 r ( 2 ) + a3 r ( 3 ) = 1
(3)
Equations (1), (2), and (3) yield r(0) = 1.6554 r(1) = -1.0292 r(2) = 0.5175 r(3) = -0.2646 r(0) r(1) r(2) 1.6554 – 1.0292 0.5175 The correlation matrix R = r ( 1 ) r ( 0 ) r ( 1 ) = – 1.0292 1.6554 – 1.0292 r(2) r(1) r(0) 0.5175 – 1.0292 1.6554 (b) The eigenvalues of R are 0.4358, 1.1379, 3.3925,
λmax = 3.3925
(c) For convergence of the steepest descent algorithm we require 2 0 < µ < ------------, λ max = 3.3925 λ max 2 0 < µ < ---------------3.3925 0 < µ < 0.589
123
4.15
–1
(a) w ( n+1 ) = ( 1 – µ )w ( n ) + µR p = ( 1 – µ )w ( n ) + µw o,
–1
(R p = w o )
Substract w o from both sides w ( n+1 ) – w o = ( 1 – µ )w ( n ) + µw o – w o = ( 1 – µ ) ( w ( n ) – wo ) Starting with the initial value w o and iterating: n
w ( n ) – wo = ( 1 –µ ) ( w ( 0 ) – wo ) From Eq. (2.52) of the text: H
J ( n ) = J min + ( w – w o ) R ( w – w o ) n
H
(1) n
= J min + ( 1 – µ ) [ w ( 0 ) – w o ] R ( 1 – µ ) [ w ( 0 ) – w o ] 2n
H
= J min + ( 1 – µ ) [ w ( 0 ) – w o ] R [ w ( 0 ) – w o ] 2n
= J min + ( 1 – µ ) ( J ( 0 ) – J min ) The transient behavior of Newton’s algorithm is thus characterized by a single exponential whose corresponding time constant is ( 1 –µ )
2k
= e
–k ⁄ τ
(2)
where µ « 1 . (b) Taking the logarithm of both sides of Eq. (2): 2k ln ( 1 – µ ) = – k ⁄ τ When µ « 1 , ln ( 1 – µ ) ≈ – µ . Hence
124
– 2kµ ≈ – k ⁄ τ from which we readily deduce the time constant 1 τ = ------ , 2µ
µ « 1.
125
CHAPTER 5

5.1
From Fig. 5.2 of the text we see that the LMS algorithm requires 2M+1 complex multiplications and 2M complex additions per iteration, where M is the number of tap weights used in the adaptive transversal filter. Therefore, the computational complexity of the LMS algorithm is O(M).
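For reference, a minimal sketch of one such iteration is given below (the function name and signature are illustrative); the single inner product and the single scaled vector addition account for the O(M) cost.

```python
import numpy as np

def lms_update(w, u, d, mu):
    """One iteration of the complex LMS algorithm.

    w : current tap-weight vector (length M)
    u : current tap-input vector (length M)
    d : desired response at this time instant
    Each call costs O(M): one inner product plus one scaled vector addition.
    """
    e = d - np.vdot(w, u)          # e(n) = d(n) - w^H(n) u(n)
    w = w + mu * u * np.conj(e)    # w(n+1) = w(n) + mu u(n) e*(n)
    return w, e

# Usage: call lms_update once per input sample inside a loop over n.
```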
5.2 d(n) Primary signal
Reference signal
o
o
u(n)
^w*(n)
_
+ Σ
.
e(n)
e ( n ) = d ( n ) – wˆ *1 ( n )u ( n ) ˆ ( n+1 ) = w ˆ ( n ) + µu ( n )e * ( n ) w 5.3
For backward prediction, we have (assuming real data) T
ˆ ( n )u ( n ) uˆ ( n – n 0 ) = w y ( n ) = u ( n ) – uˆ ( n ) ˆ ( n+1 ) = w ˆ ( n ) + µ [ u ( n – n 0 ) – uˆ ( n – n 0 ) ]u ( n ) w 5.4
The adaptive line enhancer minimizes the mean-square error, E[|e(n)|2]. For the problem at hand, the cost function J = E[|e(n)|2] consists of the sum of three components: 2
(1) The average power of the primary input noise, denoted by σ ν (2) The average power of the noise at the filter output (3) The average power of contribution produced by the sinusoidal components at the input and output of the filter Let the peak value of the transfer function be denoted by a. We may then approximate the peak value of the weights as 2a/M, where M is the length of the filter. On this basis, we 2
may approximate the average power of the noise at the filter output as (2a2/M) σ ν , which takes care of term (2). For term (3), we assume that the input and output sinusoidal
126
components subtract coherently, thereby yielding the average power (A2/2)(1-a)2. Hence, we may express the cost function J as 2
2
2 2 A 2 2a J ≈ σ ν + --------- σ ν + ------ ( 1 – a ) M 2
Differentiating J with respect to a and setting the result equal to zero yields the optimum scale factor 2
A a opt = ------------------------------------2 2 A + 4 ( σν ⁄ M ) 2
2
( A ⁄ 2σ ν ) ( M ⁄ 2 ) = ---------------------------------------------------2 2 1 + ( A ⁄ 2σ ν ) ( M ⁄ 2 ) ( M ⁄ 2 )SNR = --------------------------------------1 + ( M ⁄ 2 )SNR 2
2
where SNR = A ⁄ 2σ ν 5.5
The index of performance equals J ( w, K ) = E [ e
2K
( n ) ],
K = 1, 2, 3, …
The estimation error e(n) equals T
e ( n ) = d ( n ) – w ( n )u ( n )
(1)
where d(n) is the desired response, w(n) is the tap-weight vector of the transversal filter, and u(n) is the tap-input vector. In accordance with the multiple linear regression model for d(n), we have T
d ( n ) = w o ( n )u ( n )
(2)
where wo is the parameter vector, and v(n) is a white-noise process of zero mean and variance σ 2v . (a) The instantaneous gradient vector equals
127
∂ ˆ ( n, K ) = ------J ( w, K ) ∇ ∂w ∂ 2K = ------- [ e ( n ) ] ∂w = 2K e
2K -1
= – 2K e
∂e ( n ) ( n ) -------------∂w
2K -1
( n )u ( n )
Hence, we may express the new adaptation rule for the estimate of the tap-weight vector as 1 ˆ ( n, K ) ) ˆ ( n+1 ) = w ˆ ( n ) – --- µ ( – ∇ w 2 ˆ ( n ) + µKu ( n )e = w
2K -1
(n)
(3)
ˆ ( n ) used in place of (b) Eliminate d(n) between Eqs. (1) and (2), with the estimate w w(n): T
ˆ (n)) u(n) + v(n) e ( n ) = ( wo – w T
= ∈ ( n )u ( n ) + v ( n ) T
= u (n) ∈ (n) + v(n)
(4)
Subtract wo from both sides of Eq. (3): ∈ ( n+1 ) = ∈ ( n )-µKu ( n )e
2K -1
(n)
(5)
ˆ ( n ) is close to wo), we may use Eq. (4) For the case when ∈(n) is close to zero (i.e., w to write e
2K -1
T
(n) = [u (n) ∈ (n) + v(n)]
= v
2K -1
2K -1
T
u (n) ∈ (n) ( n ) 1 + -----------------------------v(n)
128
2K -1
≈v
2K -1
= v
T
u (n) ∈ (n) ( n ) 1 + ( 2K -1 ) -----------------------------v(n)
2K -1
T
( n ) + ( 2K -1 )u ( n ) ∈ ( n )v
2 ( K -1 )
(n)
(6)
Substitute Eq. (6) into (5): ∈ ( n+1 ) ≈ [ I – µK ( 2K -1 )v
2 ( K -1 )
T
( n )u ( n )u ( n ) ] ∈ ( n )-µKv
2K -1
u(n)
Taking the expectation of both sides of this relation and recognizing that (1) ∈(n) is independent of u(n) by low-pass filtering action of the filter, (2) u(n) is independent of v(n) by assumption, and (3) u(n) has zero mean, we get E[ ∈ ( n+1 )] = { I – µK ( 2K -1 )E [ v
2 ( K -1 )
( n ) ]R }E[ ∈ ( n )]
(7)
where T
R = E [ u ( n )u ( n ) ] (c) Let R = QΛQ
T
(8)
where Λ is the diagonal matrix of eigenvalues of R, and Q is a matrix whose column vectors equal the associated eigenvectors. Hence, substituting Eq. (8) in (7) and using T
υ ( n ) = Q E[ ∈ ( n )] we get υ ( n+1 ) = { I – µK ( 2K -1 )E [ v
2 ( K -1 )
( n ) ]Λ }υ ( n )
That is, the ith element of this equation is 2 ( K -1 ) υ i ( n+1 ) = 1 – µK ( 2K -1 )E [ v ( n ) ]λ i υ i ( n )
(9)
where υ i ( n ) is the ith element of υ ( n ) , and i = 1, 2, …, M . Solving the first-order difference equation (9):
129
n-1
2 ( K -1 ) ( n ) ]λ i υ i ( n ) = 1 – µK ( 2K -1 )E [ v
υi ( 0 )
where υ i ( 0 ) is the initial value of υ i ( n ) . Hence, for υ i ( n ) to converge, we require that 1 – µK ( 2K -1 )E [ v
2 ( K -1 )
( n ) ] λ max < 1
where λmax is the largest eigenvalue of R. This condition on µ may be rewritten as 2 0 < µ < ---------------------------------------------------------------------2 ( K -1 ) K ( 2K -1 )λ max E [ v (n)]
(10)
When this condition is satisfied, we find that υi ( ∞ ) → 0
for all i
ˆ ( ∞ ) → wo . That is, ∈(∞) → 0 and, correspondingly, w (d) For K = 1, the results described in Eqs. (3), (7) and (10) reduce as follows, respectively, ˆ ( n+1 ) = w ˆ ( n ) + µu ( n )e ( n ) w E[ ∈ ( n+1 )] = ( I – µR )E[ ∈ ( n )] 2 0 < µ < -----------λ max These results are recognized to be the same as those for the conventional LMS algorithm for real-valued data. 5.6
(a) We start with the equation E[ ∈ ( n+1 )] = ( I – µR ) E[ ∈ ( n )] where ˆ (n) ε ( n ) = wo – w We note that
130
(1)
ˆ (n)] m(n) = E [w wo = m ( ∞ ) Hence, we may rewrite Eq. (1) as m ( n+1 ) – m ( ∞ ) = ( I – µR ) ( m ( n ) – m ( ∞ ) ) This is a first-order difference equation in m(n) - m(∞). Solving it, we get n
m ( n ) – m ( ∞ ) = ( I – µR ) ( m ( 0 ) – m ( ∞ ) ) where m(0) is the initial value of m(n). Hence, n
m ( n ) = m ( ∞ ) + ( I – µR ) ( m ( 0 ) – m ( ∞ ) )
(2)
(b) Using the expansion H
R = Q ΛQ and the fact that H
Q Q = I we may rewrite Eq. (2) as H
H
m ( n ) = m ( ∞ ) + (Q Q – µQ ΛQ) ( m ( 0 ) – m ( ∞ ) )
(3)
Let υ(n) = Q(m(n) – m(∞)) υ(0) = Q(m(0) – m(∞)) We may then rewrite Eq. (3) as n
υ ( n ) = ( I – µΛ ) υ ( 0 ) For an arbitrary υ ( 0 ), υ ( n ) → 0 and the LMS algorithm is therefore convergent in the mean value if and only if
131
1 – µλ max < 1 that is 2 µ < -----------λ max 5.7
(a) For convergence of the LMS algorithm in the mean value, we require that the step-size parameter µ satisfy the following condition: 2 0 < µ < -----------λ max
(1)
where λmax is the largest eigenvalue of the correlation matrix of the input vector u(n). For a white noise input of zero mean and variance σ2, we have R = σ2I For such an input, all the eigenvalues assume the common value σ2. Correspondingly, Eq. (1) takes on the form 2 0 < µ < -----2 σ (b) The excess mean-square error produced by the LMS algorithm is J ex = J ( ∞ ) – J min 2 = ----------M
(2)
∑ λi
i=1
With λi = σ2 for all i, Eq. (2) takes on the form 2 J ex = -----------2 Mσ
132
where M is the number of taps in the transversal filter. 5.8
(a) We are given 2
J (n) = e(n) + α w(n)
2
H
= e ( n )e * ( n ) + αw ( n )w ( n ) Hence, using the material described in Appendix B: ∂J ( n ) ∇J ( n ) = -----------------∂w * ( n ) *
∂e ( n ) * ∂e ( n ) = ------------------ e ( n ) + ------------------ + 2αw ( n ) ∂w * ( n ) ∂w * ( n )
(1)
By definition, H
e ( n ) = d ( n ) – w ( n )u ( n ) *
H
e* ( n ) = d ( n ) – u ( n )w ( n ) Therefore, *
∂e ( n ) ∂e ( n ) ------------------ = – u ( n ) and ------------------ = 0 ∂w * ( n ) ∂w * ( n ) Thus, Eq. (1) reduces to *
∇J ( n ) = – 2u ( n )e ( n ) + 2αw The update for the tap-weight vector in the leaky LMS algorithm is therefore as follows: 1 ˆ ( n+1 ) = w ˆ ( n ) – --- µ∇J ( n ) w 2 *
ˆ ( n ) + µ ( u ( n )e ( n ) – αw ( n ) ) = w
133
*
ˆ ( n ) + µ u ( n )e ( n ) = ( 1 – µα )w
(2)
(b) Taking expectations of both sides of Eq. (2): ˆ ( n+1 ) ] = ( 1 – µα )E [ w ˆ ( n ) ] + µE [ u ( n ) ] E[w But H
ˆ ( n )u ( n ) e(n) = d (n) – w Hence, ˆ ( n+1 ) ] = ( 1 – µα )E [ w ˆ ( n ) ] + E [ u ( n )d * ( n ) ] E[w H
ˆ (n)] – µE [ u ( n )u ( n )w
(3)
ˆ ( n ) is essentially independent of u(n), we may Invoking the fact that for small µ, w write H
H
ˆ ( n ) ] = E [ u ( n )u ( n ) ]E [ w ˆ (n)] E [ u ( n )u ( n )w ˆ (n)] = RE [ w We also recognize that E [ u ( n )d * ( n ) ] = p We may therefore rewrite Eq. (3) as ˆ ( n+1 ) ] = ( 1 – µα )E [ w ˆ ( n ) ] + µp – µRE [ w ˆ (n)] E[w
(4)
ˆ ( n+1 ) ] and E [ w ˆ ( n ) ] become indistinguishable, in which case we As n → ∞, E [ w may simplify Eq. (4) to ˆ ( n ) ] + µp – µRE [ w ˆ ( n ) ] for n → ∞ 0 = – µαE [ w Equivalently, we may write –1
ˆ ( n ) ] = ( R + αI ) p lim E [ w
n→∞
134
(c) Let the input vector applied to the standard LMS algorithm be written as u eq ( n ) = u ( n ) + v ( n )
(5)
where v(n) is a white noise process of zero mean and variance α. The correlation matrix of ueq(n) is therefore Req = R + αI Correspondingly, we may write –1
ˆ ( n ) = R eq p lim w n→∞ –1
= ( R + αI ) p Thus, the standard LMS algorithm is equivalent to a leaky LMS algorithm with its input vector ueq(n) related to the input vector u(n) of the leaky LMS algorithm in the manner described in Eq. (5). 5.9
We start with 2
K ( n+1 ) = ( I – µR )K ( n ) ( I – µR ) + µ J min R
(1)
We are given 2
R = σv I
(2)
Hence, Eq. (1) reduces to 2 2
2 2
K ( n+1 ) = ( 1 – µσ v ) K ( n ) + µ σ v J min I For n → ∞, we thus have 2 2
2 2
K ( ∞ ) = ( 1 – µσ v ) K ( ∞ ) + µ σ v J min I That is, 2
( 2 – µσ v )K ( ∞ ) = µJ min I
(3)
135
For convergence in the mean square, we choose the step-size parameter µ to satisfy the condition 2 0 < µ < ----------M
∑ λi
i=1
For an input process u(n) whose correlation matrix R is described in Eq. (2), we have λ i = σ v for i = 1, 2, …, M 2
where M is the number of taps in the transversal filter. Hence, 2 0 < µ < -----------2 Mσ v 2
For large M, it follows that ( 2 – µσ v ) is closely approximated by 2, in which case Eq. (3) takes on the approximate form µ K ( ∞ ) ≈ --- J min I 2 5.10
5.10

We start with Eq. (5.58) of the text:

∈₀(n+1) = (I – µR) ∈₀(n) + µ u(n) e₀*(n)                                          (1)

This difference equation has a solution in convolutional form

∈₀(n) = H(n) ✪ [µ u(n) e₀*(n)]
      = Σ_{m=-∞}^{∞} H(m) [µ u(n–m) e₀*(n–m)]

where ✪ denotes convolution, and the matrix

H(n) = (I – µR)^{n-1}   for n > 0,   and   H(n) = 0   for n ≤ 0                   (2)

represents the causal, symmetric, M-by-M matrix impulse response of the system. We are interested in the transmission of stationary stochastic signals, in particular the transformation of the input autocorrelation (use is made of the statistical independence of u(n) and e₀*(n))

F^(l) = E[ µ u(n) e₀(n) · µ u^H(n–l) e₀*(n–l) ]
      = E[e₀(n) e₀*(n–l)] E[ µ u(n) · µ u^H(n–l) ]
      = µ² J_min R^(l)                                                            (3)

into the weight-error correlation matrix K₀ = E[∈₀(n) ∈₀^H(n)]. Using standard results of linear system theory, this is found as

K₀ = Σ_l Σ_m H(m) F^(l) H(m+l) = Σ_l T^(l),   where   T^(l) = Σ_m H(m) F^(l) H(m+l)   (4)

In an attempt to evaluate the last sum in Eq. (4), we encounter the difficulty that H(m) and F^(l), in general, do not commute, thus prohibiting the extraction of F^(l) from the sum. In fact, the last sum does not admit an explicit summation. Instead, we show that for µ → 0 the matrix T^(l) satisfies a Lyapunov equation. To this end, we write the impulse response of Eq. (2) in the recursive form [where the Kronecker delta δ(m) equals unity for m = 0 and zero elsewhere]

H(m+1) = (I – µR) H(m) + δ(m) I = H(m) (I – µR) + δ(m) I                          (5)

which yields

T^(l) = Σ_m H(m+1) F^(l) H(m+l+1)
      = Σ_m [(I – µR) H(m) + δ(m) I] F^(l) [H(m+l)(I – µR) + δ(m+l) I]
      = (I – µR) T^(l) (I – µR) + F^(l) H(l)(I – µR) + (I – µR) H(–l) F^(l) + δ(l) F^(l)   (6)

For small µ we may simplify Eq. (6) by neglecting the terms µR T^(l) µR, F^(l) H(l) µR, and µR H(–l) F^(l), and by approximating F^(l) H(l) + H(–l) F^(l) + δ(l) F^(l) by F^(l). Thus we arrive at the Lyapunov equation

R T^(l) + T^(l) R = µ^{-1} F^(l)                                                  (7)

and, after summation over l, using K₀ = Σ_l T^(l), we get

R K₀ + K₀ R = Σ_l µ^{-1} F^(l) = µ Σ_l J_min R^(l)                                (8)

which is the desired result.
5.11

(a) We start with Eq. (5.80) of the text:

v_k(n) = (1 – µλ_k)^n v_k(0) + Σ_{i=0}^{n-1} (1 – µλ_k)^{n-1-i} φ_k(i)

Applying the expectation operator and noting that E[φ_k(n)] = 0 for all k, we get

E[v_k(n)] = (1 – µλ_k)^n v_k(0),   k = 1, 2, ..., M

(b) The mean-square value of v_k(n) is

E[|v_k(n)|²] = (1 – µλ_k)^{2n} |v_k(0)|²
             + Σ_{i=0}^{n-1} Σ_{j=0}^{n-1} (1 – µλ_k)^{n-1-i} (1 – µλ_k)^{n-1-j} E[φ_k(i) φ_k*(j)]

where we have ignored the cross-product terms, because the initial value v_k(0) is independent of φ_k(i). Next we note

E[φ_k(i) φ_k*(j)] = µ² J_min λ_k   for i = j,   and 0 otherwise

Hence,

E[|v_k(n)|²] = (1 – µλ_k)^{2n} |v_k(0)|² + Σ_{i=0}^{n-1} (1 – µλ_k)^{2n-2-2i} µ² J_min λ_k
             = (1 – µλ_k)^{2n} |v_k(0)|² + µ² J_min λ_k (1 – µλ_k)^{2n-2} Σ_{i=0}^{n-1} (1 – µλ_k)^{-2i}   (1)

The sum

Σ_{i=0}^{n-1} (1 – µλ_k)^{-2i}

represents a geometric series with a first term equal to 1, a geometric ratio of (1 – µλ_k)^{-2}, and a total number of terms equal to n. We thus have

Σ_{i=0}^{n-1} (1 – µλ_k)^{-2i} = [1 – (1 – µλ_k)^{-2n}] / [1 – (1 – µλ_k)^{-2}]
                               = – [(1 – µλ_k)² – (1 – µλ_k)^{-2n+2}] / [µλ_k (2 – µλ_k)]

Hence, Eq. (1) reduces to

E[|v_k(n)|²] = (1 – µλ_k)^{2n} |v_k(0)|²
             – µ J_min (1 – µλ_k)^{2n-2} [(1 – µλ_k)² – (1 – µλ_k)^{-2n+2}] / (2 – µλ_k)
             = µ J_min / (2 – µλ_k) + (1 – µλ_k)^{2n} [ |v_k(0)|² – µ J_min / (2 – µλ_k) ]
5.12

(a) The mean-square deviation is defined by

D(n) = E[||∈(n)||²]
     ≈ E[||∈₀(n)||²]   for small step-size µ
     = E[||Q v(n)||²],   Q = orthonormal matrix
     = E[||v(n)||²]

For n → ∞, we thus have

D(∞) ≈ E[||v(∞)||²] = E[ Σ_{k=1}^{M} |v_k(∞)|² ] = Σ_{k=1}^{M} E[|v_k(∞)|²]

From Eq. (5.82),

E[|v_k(n)|²] = µ J_min / (2 – µλ_k) + (1 – µλ_k)^{2n} [ |v_k(0)|² – µ J_min / (2 – µλ_k) ]

Hence, with µ small compared to 2/λ_max,

E[|v_k(∞)|²] = µ J_min / (2 – µλ_k) ≈ µ J_min / 2   for small µ

so that D(∞) ≈ M µ J_min / 2.

(b) From the Lyapunov equation derived in Problem 5.10, we have

R K₀(n) + K₀(n) R ≈ µ J_min R,   µ small

where only the first term of the summation on the right-hand side of Eq. (8) in the solution to Problem 5.10 is retained. Taking the trace of both sides of this equation, and recognizing that

tr[R K₀(n)] = tr[K₀(n) R]

we get, for n = ∞,

2 tr[R K₀(∞)] ≈ µ J_min tr[R]

From Eq. (5.90) of the text,

J_ex(∞) = tr[R K₀(∞)] = (µ/2) J_min tr[R]

Hence, the misadjustment is

M = J_ex(∞) / J_min ≈ (µ/2) tr[R]
5.13

The weight-error correlation matrix K(n) equals

K(n) = E[∈(n) ∈^H(n)]

The trace of K(n) equals

tr[K(n)] = tr{ E[∈(n) ∈^H(n)] } = E{ tr[∈(n) ∈^H(n)] }

Since

tr[∈(n) ∈^H(n)] = tr[∈^H(n) ∈(n)]

we may express the trace of K(n) as

tr[K(n)] = E{ tr[∈^H(n) ∈(n)] }

The inner product ∈^H(n) ∈(n) equals the squared norm of ∈(n), which is a scalar. Hence

tr[K(n)] = E[||∈(n)||²]                                                           (1)

From the convergence analysis of the LMS algorithm, we have

K(n+1) = (I – µR) K(n) (I – µR) + µ² J_min R                                      (2)

Initially, ||∈(n)||² is so large that we may justifiably ignore the term µ² J_min R, in which case Eq. (2) may be approximated as

K(n+1) ≈ (I – µR) K(n) (I – µR),   n small                                        (3)

Assuming that

R = σ_u² I

we may further reduce Eq. (3) to

K(n+1) ≈ (1 – µσ_u²)² K(n)

Thus, in light of Eq. (1), we may write

E[||∈(n+1)||²] ≈ (1 – µσ_u²)² E[||∈(n)||²],   n small

The convergence ratio is therefore approximately

c(n) = E[||∈(n+1)||²] / E[||∈(n)||²] ≈ (1 – µσ_u²)²
5.14

(a) The tap-weight update in the LMS algorithm is defined by

ŵ(n+1) = ŵ(n) + µ u(n) e*(n)

where u(n) is the tap-input vector and e(n) is the estimation error (i.e., error signal). There are two different cases to be considered:

1. The process from which u(n) is drawn is deterministic, and so is the desired response d(n). Under these conditions, the estimation error e(n) is likewise deterministic. Correspondingly, the correction term µ u(n) e*(n) computed for each iteration of the algorithm is a deterministic quantity. Hence, starting from a known initial value ŵ(0), the trajectory traced out by the evolution of ŵ(n) with an increasing number of iterations n can be determined exactly. In other words, when the LMS algorithm operates in a completely deterministic environment, exemplified by the sinusoidal example presented in Section 5.9, the algorithm exhibits deterministic behavior.

2. The environment in which the LMS algorithm operates is stochastic. In this second case, the correction term µ u(n) e*(n) assumes a stochastic form. Remembering that the product u(n) e*(n) represents an estimate of the gradient vector, we may (in this case) view the LMS algorithm as a stochastic gradient algorithm.

(b) When white noise is added to the sinusoidal process of Example 7 in Section 5.9, the input vector u(n) takes on a stochastic form. Specifically, the correlation matrix R of u(n) now consists of the expression described in Eq. (5.135) of the text, plus σ_v² I. In this new situation the evolution of ŵ(n) with increasing n follows a stochastic (random) trajectory. The ensemble-averaged value of this trajectory follows a path defined in terms of R.
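The stochastic case described above is easy to reproduce numerically. The sketch below (my own illustration, not part of the original solution) implements the complex LMS recursion ŵ(n+1) = ŵ(n) + µ u(n) e*(n); the system-identification setup, the assumed "true" weight vector and the chosen parameter values are all assumptions made for the example.

```python
import numpy as np

def lms(u, d, num_taps, mu):
    """Standard (complex) LMS: w(n+1) = w(n) + mu * u(n) * conj(e(n))."""
    w = np.zeros(num_taps, dtype=complex)
    e = np.zeros(len(u), dtype=complex)
    for n in range(num_taps, len(u)):
        u_n = u[n:n - num_taps:-1].astype(complex)   # tap-input vector [u(n), ..., u(n-M+1)]
        e[n] = d[n] - np.vdot(w, u_n)                # a priori error e(n) = d(n) - w^H(n) u(n)
        w = w + mu * u_n * np.conj(e[n])             # stochastic-gradient correction term
    return w, e

# Illustrative stochastic environment: identify an assumed FIR system in white noise.
rng = np.random.default_rng(0)
w_o = np.array([0.8, -0.4, 0.2])                     # hypothetical "true" system
u = rng.standard_normal(5000)
d = np.convolve(u, w_o, mode="full")[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_hat, e = lms(u, d, num_taps=3, mu=0.01)
print(np.round(w_hat.real, 3))                       # should approach w_o on average
```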
5.15
(a) The update equations for the tandem configuration of the LMS filters are formulated as follows:

e₁(n) = d(n) – ŵ₁^H(n) u(n)
ŵ₁(n+1) = ŵ₁(n) + µ u(n) e₁*(n)
e₂(n) = e₁(n) – ŵ₂^H(n) u(n)
ŵ₂(n+1) = ŵ₂(n) + µ u(n) e₂*(n)

This assumes that both filters are updated with the same step size µ.

(b) As can be seen from Fig. P5.2, as well as from the equations in part (a), the adaptation of ŵ₁ follows the exponential convergence of the LMS algorithm, and its behavior is unaffected by ŵ₂. The adaptation of ŵ₂ is, however, dependent on ŵ₁. Let R = E[u(n) u^H(n)] = Q Λ Q^H, where R is the correlation matrix of the input vector, and use this eigendecomposition of R. If we consider the rotated representations of the weight vectors

v̂₁(n) = Q^H ŵ₁(n)
v̂₂(n) = Q^H ŵ₂(n)

and keep in mind that the signal is modeled by d(n) = w_o^H u(n) + v(n), where v(n) is zero-mean random noise, we may show that the total mean-square error decomposes as

MSE = MSE₁₁ + MSE₁₂ + MSE₂₂

where MSE_ij = E[∈_i^H(n) ∈_j(n)], with (i, j) = 1, 2 indexing the two LMS filters and ∈_i(n) denoting the weight-error vector of LMS filter i. This result clearly indicates that the tandem system converges in the mean-square sense if MSE₁₁, MSE₁₂, and MSE₂₂ all converge in the mean square.

5.16
(a) The cost function is defined by

J(n) = (1/2) |e(n)|²

The derivative of J(n) with respect to the step-size parameter µ is

∂J(n)/∂µ = (1/2) [ e(n) ∂e*(n)/∂µ + e*(n) ∂e(n)/∂µ ]

But

e(n) = d(n) – ŵ^H(n) u(n)

Hence,

∂e(n)/∂µ = – (∂ŵ^H(n)/∂µ) u(n) = – γ^H(n) u(n)

and the step-size adjustment is

∆µ(n) = – ρ ∂J(n)/∂µ = (ρ/2) [ e(n) u^H(n) γ(n) + e*(n) γ^H(n) u(n) ]

where ρ is the learning-rate parameter for adjusting the step size in an iterative manner. From the LMS algorithm we have (assuming a variable step size)

ŵ(n+1) = ŵ(n) + µ(n+1) u(n) e*(n)

Hence

∂ŵ(n+1)/∂µ = ∂ŵ(n)/∂µ + u(n) e*(n) + µ(n+1) u(n) ∂e*(n)/∂µ

That is,

γ(n+1) = γ(n) + u(n) e*(n) – µ(n+1) u(n) u^H(n) γ(n)
       = [I – µ(n+1) u(n) u^H(n)] γ(n) + u(n) e*(n)

Finally, stepping back by one iteration, we get the desired result

γ(n) = [I – µ(n) u(n-1) u^H(n-1)] γ(n-1) + u(n-1) e*(n-1)

(b) We may now formulate the two recursions for an LMS algorithm with adjustable step size as follows:

µ(n) = µ(n-1) + (ρ/2) [ e(n) u^H(n) γ(n) + e*(n) γ^H(n) u(n) ]
     = µ(n-1) + ρ Re[ e*(n) γ^H(n) u(n) ]

ŵ(n+1) = ŵ(n) + µ(n) u(n) e*(n)
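The pair of recursions just derived can be exercised with the following sketch. It is an illustrative implementation under my own assumptions (the clipping of µ(n) to a safe interval and all parameter values are not part of the derivation above).

```python
import numpy as np

def adaptive_stepsize_lms(u, d, num_taps, mu0=0.01, rho=1e-4):
    """LMS with an adjustable step size, following the coupled recursions:
       mu(n)    = mu(n-1) + rho * Re{ e*(n) gamma^H(n) u(n) }
       w(n+1)   = w(n) + mu(n) u(n) e*(n)
       gamma(n) = [I - mu(n) u(n-1) u^H(n-1)] gamma(n-1) + u(n-1) e*(n-1)."""
    M = num_taps
    w = np.zeros(M, dtype=complex)
    gamma = np.zeros(M, dtype=complex)          # derivative of w_hat with respect to mu
    mu = mu0
    e_prev, u_prev = 0.0, np.zeros(M, dtype=complex)
    for n in range(M, len(u)):
        u_n = u[n:n - M:-1].astype(complex)
        e = d[n] - np.vdot(w, u_n)
        mu = mu + rho * np.real(np.conj(e) * np.vdot(gamma, u_n))
        mu = float(np.clip(mu, 0.0, 0.1))       # assumed safeguard to keep mu in a stable range
        gamma = gamma - mu * u_prev * np.vdot(u_prev, gamma) + u_prev * np.conj(e_prev)
        w = w + mu * u_n * np.conj(e)
        e_prev, u_prev = e, u_n
    return w, mu
```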
5.17

The tightening of the inequality in Eq. (5.145) is justified by invoking two relations:

• The update equation for the LMS algorithm:

  ŵ(n+1) = ŵ(n) + µ u(n) e*(n)

  or, equivalently,

  ∈(n+1) = ∈(n) – µ u(n) e*(n)
         = ∈(n) – µ u(n) [d*(n) – u^H(n) ŵ(n)]

• The multiple regression model

  d(n) = w^H u(n) + ν(n)

  where w is the parameter vector of the model.

Here it is assumed that the step-size parameter µ is maintained smaller than 1/||u(n)||², in accordance with Eq. (5.146).
5.18

The input signal is the sum of two uncorrelated sinusoidal inputs:

u(n) = A₀ cos(ω₀ n) + A₁ cos(ω₁ n + φ)                                            (1)

where the phase difference φ is a random variable. [In the first printing of the book, the phase φ was unfortunately missed out.] Hence, the autocorrelation function of the input u(n) is the sum of the autocorrelation functions of the two sinusoidal inputs.

The transfer-function approach works on the premise that the LMS filter length M is large. Under this condition, the nonlinear LMS filter may be approximated as a linear discrete-time system with an excitation

x(n) = d(n) – ŵ^H(0) u(n),   where d(n) is the desired response and ŵ(0) is the initial tap-weight vector,

and the error

e(n) = d(n) – ŵ^H(n) u(n)

being the resulting response. Whenever we have a linear system, the principle of superposition applies. Invoking this principle, we may then state that, under the above-mentioned conditions, the LMS filter, with its input defined by Eq. (1), produces an output equal to the sum of two components, one produced by the sinusoidal input of frequency ω₀ and the other by the second sinusoidal input of frequency ω₁.

Now, from the material on adaptive noise cancellation presented in Section 5.3 of the text, we know that the LMS filter with a sinusoidal input of frequency ω₀ acts as a notch filter centered on ω₀; see Fig. 5.8(b) of the text. Accordingly, we conclude that the LMS filter acting on the composite sinusoidal input of Eq. (1) acts as the sum of two notch filters, one centered on ω₀ and the other on ω₁.
5.19

From Eq. (2.77) of the text, the cost function for a constrained MVDR beamformer is

J = Σ_{k=0}^{m-1} Σ_{i=0}^{m-1} w_k* w_i r(i – k) + Re[ λ* ( Σ_{k=0}^{m-1} w_k* e^{-jθ₀ k} – g ) ]      (1)

where λ is the Lagrange multiplier defined in Eq. (2.84):

λ = – 2g / [s^H(θ₀) R^{-1} s(θ₀)]                                                 (2)

where g is a specified complex-valued gain, R is the correlation matrix of the input vector u(n), and s(θ₀) is the steering vector directed at the target of interest. Combining Eqs. (1) and (2) and rewriting the result in matrix form:

J = w^H R w – ( k* w^H s(θ₀) + k s^H(θ₀) w + Re[g] )                              (3)

where

k = g / [s^H(θ₀) R^{-1} s(θ₀)]                                                    (4)

acts merely as a scaling factor ensuring that the gain of the beamformer along the direction θ₀ is equal to g. Differentiating the cost function J with respect to the weight vector w:

∂J/∂w = 2 R w – 2 s(θ₀) k*                                                        (5)

Substituting R = u(n) u^H(n) into Eq. (5), where u(n) is the input vector:

∂J(n)/∂w = 2 u(n) u^H(n) ŵ(n) – 2 s(θ₀) k*                                        (6)

Hence, the LMS algorithm for the MVDR beamformer is

ŵ(n+1) = ŵ(n) – (1/2) µ ∂J/∂w
        = [I – µ u(n) u^H(n)] ŵ(n) + µ s(θ₀) k*                                   (7)

where µ is the step-size parameter. Comparing this algorithm with the LMS algorithm for temporal processing, we see that s(θ₀) k* plays the role of the product of desired response and input vector.

As a check on the algorithm described in Eq. (7), suppose the step-size parameter µ is small enough to justify using the direct method. We may then replace u(n) u^H(n) by R and write

ŵ(n+1) ≈ [I – µR] ŵ(n) + µ k* s(θ₀)                                               (8)

For n → ∞, ŵ(n) ≈ ŵ(n+1), in which case Eq. (8) reduces to

R ŵ(n) ≈ k* s(θ₀),   n large

or

ŵ(n) ≈ k* R^{-1} s(θ₀),   n large

The corresponding beamformer output along θ₀ is, as desired,

s^H(θ₀) ŵ(n) = k* s^H(θ₀) R^{-1} s(θ₀) = g*,   n large
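A small numerical sketch of the recursion in Eq. (7) is given below. This is my own illustration: the uniform-linear-array geometry, the interference scenario, and all parameter values are assumptions made for the example, and the constant k is formed from a batch estimate of R purely for checking the look-direction gain.

```python
import numpy as np

def steering_vector(M, theta, spacing=0.5):
    """Steering vector of an M-element uniform linear array (half-wavelength spacing)."""
    k = np.arange(M)
    return np.exp(-2j * np.pi * spacing * k * np.sin(theta))

rng = np.random.default_rng(1)
M, mu, g = 8, 5e-3, 1.0
theta0 = np.deg2rad(10.0)                       # assumed look direction
s0 = steering_vector(M, theta0)
s_i = steering_vector(M, np.deg2rad(40.0))      # assumed interference direction

# Snapshots: directional interference plus spatially white noise.
N = 4000
U = (s_i[:, None] * rng.standard_normal(N)
     + 0.7 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))))
R = U @ U.conj().T / N
k_const = g / np.real(s0.conj() @ np.linalg.solve(R, s0))

# LMS-style MVDR recursion: w(n+1) = [I - mu u u^H] w(n) + mu k* s(theta0)
w = np.zeros(M, dtype=complex)
for n in range(N):
    u_n = U[:, n]
    w = w - mu * u_n * np.vdot(u_n, w) + mu * np.conj(k_const) * s0

print(abs(np.vdot(w, s0)))                      # close to |g| = 1, up to adaptation noise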
CHAPTER 6

6.1

(a) We note that

J(n+1) = E[|e(n+1)|²]

where

e(n+1) = d(n+1) – w^H(n+1) u(n+1)

Hence, with w(n+1) = w(n) – (1/2) µ(n) ∇(n),

J(n+1) = σ_d² – [w(n) – (1/2) µ(n) ∇(n)]^H p – p^H [w(n) – (1/2) µ(n) ∇(n)]
        + [w(n) – (1/2) µ(n) ∇(n)]^H R [w(n) – (1/2) µ(n) ∇(n)]

Dropping the terms that are not a function of µ(n), and differentiating J(n+1) with respect to µ(n), we get

∂J(n+1)/∂µ(n) = (1/2) [∇^H(n) p + p^H ∇(n)] – (1/2) [∇^H(n) R w(n) + w^H(n) R ∇(n)]
                + (1/2) µ(n) ∇^H(n) R ∇(n)

Setting the result equal to zero, and solving for µ(n):

µ_o(n) = [∇^H(n) R w(n) + w^H(n) R ∇(n) – ∇^H(n) p – p^H ∇(n)] / [∇^H(n) R ∇(n)]      (1)

(b) We are given that

∇(n) = 2 (R w(n) – p)

Hence, Eq. (1) simplifies to

µ_o(n) = ∇^H(n) ∇(n) / [∇^H(n) R ∇(n)]

Using instantaneous estimates for R and ∇(n):

R̂(n) = u(n) u^H(n)
∇̂(n) = 2 [u(n) u^H(n) ŵ(n) – u(n) d*(n)] = – 2 u(n) e*(n)

we find that the corresponding value of µ_o(n) is

µ̂_o(n) = [u^H(n) u(n) |e(n)|²] / [(u^H(n) u(n))² |e(n)|²] = 1 / (u^H(n) u(n)) = 1 / ||u(n)||²

Correspondingly, introducing the scaling factor µ̃ to control the size of the change, the update ŵ(n+1) = ŵ(n) – (1/2) µ̂_o(n) ∇̂(n) takes the form

ŵ(n+1) = ŵ(n) + (µ̃ / ||u(n)||²) u(n) e*(n)

which is recognized as the normalized LMS algorithm.
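For reference, a compact implementation of the normalized LMS recursion obtained above is sketched below. This is an illustrative implementation, not code from the text; the small constant `eps` guarding against division by a vanishing input norm is my own addition.

```python
import numpy as np

def nlms(u, d, num_taps, mu_tilde=0.5, eps=1e-8):
    """Normalized LMS: w(n+1) = w(n) + (mu_tilde / (eps + ||u(n)||^2)) u(n) e*(n)."""
    M = num_taps
    w = np.zeros(M, dtype=complex)
    e = np.zeros(len(u), dtype=complex)
    for n in range(M, len(u)):
        u_n = u[n:n - M:-1].astype(complex)          # [u(n), u(n-1), ..., u(n-M+1)]
        e[n] = d[n] - np.vdot(w, u_n)                # a priori estimation error
        norm_sq = np.real(np.vdot(u_n, u_n))         # ||u(n)||^2
        w = w + (mu_tilde / (eps + norm_sq)) * u_n * np.conj(e[n])
    return w, e
```

Because the effective step size is divided by ||u(n)||², the update is invariant to a scaling of the input data, which is the property exploited in Problem 6.6.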
6.2
δŵ(n+1) = ŵ(n+1) – ŵ(n)
        = (1/||u(n)||²) u(n) u^H(n) [ŵ(n+1) – ŵ(n)]
        = (1/||u(n)||²) u(n) [u^H(n) ŵ(n+1) – u^H(n) ŵ(n)]
        = (1/||u(n)||²) u(n) [d*(n) – u^H(n) ŵ(n)]
        = (1/||u(n)||²) u(n) e*(n)
6.3
The second statement is the correct one. The justification is obvious from the solution to Problem 6.2; see also Eq. (6.10) of the text. By definition,

u(n) = [u(n), u(n-1), ..., u(n – M+1)]^T
ŵ(n) = [w₀(n), w₁(n), ..., w_{M-1}(n)]^T
ŵ(n+1) = [w₀(n+1), w₁(n+1), ..., w_{M-1}(n+1)]^T

Substituting these terms into the NLMS formula

ŵ(n+1) – ŵ(n) = (µ̃ / ||u(n)||²) u(n) e*(n)

and looking at each element of the vector, we find that the correct answer is

ŵ_k(n+1) – ŵ_k(n) = (µ̃ / ||u(n)||²) u(n – k) e*(n),   k = 0, 1, ..., M-1
6.4
δŵ(n+1) = ŵ(n+1) – ŵ(n)
        = A^{-1}(n) A(n) [ŵ(n+1) – ŵ(n)]
        = A^{-1}(n) [A(n) ŵ(n+1) – A(n) ŵ(n)]
        = A^{-1}(n) [d(n) – A(n) ŵ(n)]
        = A^{-1}(n) · (1/2) A(n) A^H(n) λ,        from Eq. (6.44) of the text
        = (1/2) A^H(n) λ
        = (1/2) A^H(n) · 2 (A(n) A^H(n))^{-1} e(n)
        = A^H(n) (A(n) A^H(n))^{-1} e(n)

6.5

LMS
  Virtues: simple; stable; H∞-robust; model-independent.
  Limitations: slow convergence; the learning rate must have the dimension of inverse power.

NLMS
  Virtues: convergent in the mean sense; stable in the mean-square sense; H∞-robust; invariant to a scaling factor of the input; dimensionless step size (a special case of the APAF).
  Limitations: a small increase in computational complexity (compared with the LMS algorithm).

APAF
  Virtues: more accurate, due to the use of more information; semi-batch learning; faster convergence.
  Limitations: increased computational complexity.

6.6
Provided we can show, under the different scaling situations, that the learning rules (weight-update formulas) are the same, then the final solutions are the same under the same initial condition.

(1) NLMS, unscaled:

ŵ(n+1) = ŵ(n) + (µ̃ / ||u(n)||²) u(n) e*(n)

NLMS, scaled (input a u(n)):

ŵ(n+1) = ŵ(n) + (µ̃ / ||a u(n)||²) a u(n) [d(n) – a ŵ^H(n) (u(n)/a)]*   (with scaled data throughout)
        = ŵ(n) + (µ̃ / (a² ||u(n)||²)) a² u(n) [d(n) – ŵ^H(n) u(n)]*
        = ŵ(n) + (µ̃ / ||u(n)||²) u(n) e*(n)

which is the same as the unscaled NLMS.

(2) APAF. Denote

A_scaled(n) = [a u(n), a u(n-1), ..., a u(n-N+1)]^H
d_scaled(n) = A_scaled(n) ŵ(n+1)

Unscaled case:

ŵ(n+1) = ŵ(n) + A^H(n) (A(n) A^H(n))^{-1} e(n)

Scaled case:

ŵ(n+1) – ŵ(n) = A_scaled^{-1}(n) A_scaled(n) [ŵ(n+1) – ŵ(n)]
              = A_scaled^{-1}(n) [A_scaled(n) ŵ(n+1) – A_scaled(n) ŵ(n)]
              = A_scaled^{-1}(n) [d_scaled(n) – A_scaled(n) ŵ(n)]
              = A_scaled^{-1}(n) · (1/2) A_scaled(n) A_scaled^H(n) λ
              = (1/2) A_scaled^H(n) · 2 (A_scaled(n) A_scaled^H(n))^{-1} e_scaled(n)
              = a A^H(n) (a² A(n) A^H(n))^{-1} a e(n)
              = A^H(n) (A(n) A^H(n))^{-1} e(n)

which is the same as the unscaled APAF.
6.7
u(n) = Σ_{k=1}^{N-1} w_k* u(n – k) + v(n)

(a) The APAF can be formulated for this autoregressive model as follows:

Step 1: Initialize the AR coefficients w₁, w₂, ..., w_{N-1} with random values.

Step 2: Let

w(n) = [w₁(n), ..., w_{N-1}(n)]^H
u(n) = [u(n-1), ..., u(n-N+1)]^H

so that u(n) = ŵ^H(n) u(n-1) + v(n). Let

A^H(n) = [u(n-1), ..., u(n-N+1)]

and calculate

e(n) = u(n) – A(n) ŵ(n)

If ||e(n)||² ≤ α (where α is a predefined small positive value), stop; otherwise go to Step 3.

Step 3: Update ŵ(n):

ŵ(n+1) = ŵ(n) + µ̃ A^H(n) (A(n) A^H(n))^{-1} e(n)

and go back to Step 2. A sketch of this update appears after this solution.

(b) φ(n) = [I – A^H(n) (A(n) A^H(n))^{-1} A(n)] u(n)
         = u(n) – A^H(n) (A(n) A^H(n))^{-1} A(n) u(n)
         = u(n) – P u(n)

where P is the projection operator. Geometrically, the difference between u(n) and its projected vector P u(n) is the error (noise) vector. Hence, with ν(n) assumed to be white Gaussian with zero mean, it follows that the elements of the vector φ(n) are themselves zero-mean white Gaussian processes.
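The affine-projection update used in Steps 2 and 3 above can be sketched as follows. This is illustrative code, not from the text; the small ridge term `delta` added to A(n)A^H(n) for numerical safety is my own assumption.

```python
import numpy as np

def apa_update(w, A, d, mu_tilde=0.5, delta=1e-6):
    """One affine-projection step.
       A : N-by-M data matrix whose rows are the N most recent tap-input vectors.
       d : length-N vector of the corresponding desired responses.
       Update: w(n+1) = w(n) + mu_tilde * A^H (A A^H + delta I)^(-1) e(n),
       with the error vector e(n) = d - A w(n)."""
    e = d - A @ w
    gain = A.conj().T @ np.linalg.solve(A @ A.conj().T + delta * np.eye(A.shape[0]), e)
    return w + mu_tilde * gain, e
```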
6.8

(a) From Eq. (6.19),

µ̃_opt = Re{ E[ξ_u(n) e*(n) / ||u(n)||²] } / E[|e(n)|² / ||u(n)||²]

Assuming that the undisturbed estimation error ξ_u(n) is equal to the disturbed estimation error (i.e., the normal error signal) e(n), we may put µ̃_opt ≈ 1, in which case Eq. (6.18) gives the bounds on µ̃ as

0 < µ̃ < 2

(b) For the APAF, from Eq. (6.56) we know

0 < µ̃ < 2 E[ Re{ ξ_u^H(n) (A(n) A^H(n))^{-1} e(n) } ] / E[ e^H(n) (A(n) A^H(n))^{-1} e(n) ]

where ξ_u(n) = A(n) w – A(n) ŵ(n) is the undisturbed error vector and e(n) = A(n) ŵ(n+1) – A(n) ŵ(n) is the disturbed error vector. Here again, if these two error vectors are assumed to be equal, then the bounds on µ̃ given in Eq. (6.56) reduce to 0 < µ̃ < 2.
(a) For the NLMS filter, suppose M is the length of the filter. Examine the weight-update formula * u ( n )e ( n ) ˆ ( n+1 ) ˆ ( n ) + u˜ ------------------------= w w 2 u(n) H e(n) = d (n) - w ˆ ( n )u ( n ) The computation involving these calculations involves 5M multiplications (or division) and 2M additions. Hence, the computation complexity is O(M). (b) For the APAF, suppose N is the order of the filter. Examine the weight-update formula –1
H H ˆ ( n+1 ) = w ˆ ( n ) + u˜ A ( n ) ( A ( n )A ( n ) ) e ( n ) w H
A ( n ) = [ u ( n ), u ( n-1 ), …, u ( n-N +1 ) ] ˆ (n) e ( n ) = d ( n ) – A ( n )w d ( n ) = [ d ( n ), d ( n-1 ), …, d ( n-N +1 ) ] Here we see that the computation involved in APAF is about N times as that of NLMS. Hence, the computational complexity of APAF is O(MN).
157
CHAPTER 7 7.1
When M = 6, L = 4, we have for the kth block w0 ( k ) u ( 4k ) u ( 4k+1 ) u ( 4k+2 ) u ( 4k+3 )
u ( 4k-1 ) u ( 4k ) u ( 4k+1 ) u ( 4k+2 )
u ( 4k-2 ) u ( 4k-1 ) u ( 4k ) u ( 4k+1 )
u ( 4k-3 ) u ( 4k-2 ) u ( 4k-1 ) u ( 4k )
u ( 4k-4 ) u ( 4k-3 ) u ( 4k-2 ) u ( 4k-1 )
w1 ( k ) y ( 4k ) u ( 4k-5 ) u ( 4k-4 ) w 2 ( k ) = y ( 4k+1 ) y ( 4k+2 ) u ( 4k-3 ) w 3 ( k ) y ( 4k+3 ) u ( 4k-2 ) w ( k ) 4 w5 ( k )
For the (k+j)th, with j = +1, +2,..., we have w0 ( k ) u ( uk+4 j ) u ( uk+4 j+1 ) u ( uk+4 j+2 ) u ( uk+4 j+3 )
7.2
u ( uk+4 j-1 ) u ( uk+4 j ) u ( uk+4 j+1 ) u ( uk+4 j+2 )
u ( uk+4 j-2 ) u ( uk+4 j-1 ) u ( uk+4 j ) u ( uk+4 j+1 )
u ( uk+4 j-3 ) u ( uk+4 j-2 ) u ( uk+4 j-1 ) u ( uk+4 j )
u ( uk+4 j-4 ) u ( uk+4 j-3 ) u ( uk+4 j-2 ) u ( uk+4 j-1 )
w1 ( k ) u ( uk+4 j-5 ) y ( 4k+4 j ) u ( uk+4 j-4 ) w 2 ( k ) = y ( 4k+4 j+1 ) u ( uk+4 j-3 ) w 3 ( k ) y ( 4k+4 j+2 ) u ( uk+4 j-2 ) w ( k ) y ( 4k+4 j+3 ) 4 w5 ( k )
(a) Using the 2M-by-2M constraint matrix G, we may rewrite the cross-correlation vector φ(k) as follows: [ φ ( k ), 0, …, 0 ] T
T
–1 H
= G 1 F U ( k )E ( k )
Accordingly, the weight-update is rewritten in the compact form: ˆ ( k+1 ) = W ˆ ( k ) + µFG F – 1 U H ( k )E ( k ) W 1 ˆ ( k ) + µGU H ( k )E ( k ) = W where G = FG1F-1
158
The matrix operator F denotes discrete Fourier transformation, and F-1 denotes inverse discrete Fourier transformation. (b) With G2 = [0,I] the error vector E(k) is readily rewritten as follows: E ( k ) = F [ 0, …, 0, e ( k ) ] T
T
T T
= FG 2 e ( k ) (c) Using these matrix notations, the fast LMS algorithm may be reformulated as follows: Initialization: ˆ ( 0 ) = [ 0, 0, …, 0 ] T W i = 0, 1, …, 2M-1
P i ( 0 ) = δ i,
where M is the number of taps, and δi is a small positive number. Computation: For each new block of M input samples, compute the following: U ( k ) = diag ( F [ u ( kM-M ), …, u ( kM-1 ), u ( kM ), …, u ( kM+M-1 ) ] ) T
Y ( k ) = U ( k )W ( k ) –1
y ( k ) = G2 F Y ( k ) e(k ) = d(k ) – y(k ) T
E ( k ) = FG 2 e ( k ) 2
P i ( k ) = γ P i ( k-1 ) + ( 1 – γ ) U i ( k ) ,
i = 0, 1, …, 2M-1
µ ( k ) = diag ( P 0 ( k ), P 1 ( k ), …, P 2M-1 ( k ) ) –1
–1
–1
159
ˆ ( k+1 ) = W ˆ ( k ) + FG F – 1 µ ( k )U ( k )E ( k ) W 1 ˜ (d) When the constraint gradient G is equal to the identity matrix, the fast LMS algorithm reduces to its unconstrained frequency-domain form. 7.3
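The computation listed in part (c) maps directly onto FFTs. The following overlap-save sketch is my own illustration of the constrained (gradient-constrained) frequency-domain block LMS data flow; for brevity it uses a fixed step size rather than the power normalization P_i(k), and all parameter values are assumptions.

```python
import numpy as np

def fast_block_lms(u, d, M, mu=0.05):
    """Constrained frequency-domain (fast) block LMS, block length L = M, overlap-save."""
    W = np.zeros(2 * M, dtype=complex)                  # frequency-domain weight vector
    errors = []
    for k in range(1, len(u) // M):
        u_blk = u[(k - 1) * M:(k + 1) * M]              # previous block followed by new block
        U = np.fft.fft(u_blk)                           # diagonal of U(k)
        y = np.fft.ifft(U * W)[M:]                      # keep last M samples (linear convolution part)
        e = d[k * M:(k + 1) * M] - y
        errors.append(e)
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))        # E(k) = F G2^T e(k)
        phi = np.fft.ifft(np.conj(U) * E)[:M]           # gradient constraint: keep first M samples
        W = W + mu * np.fft.fft(np.concatenate([phi, np.zeros(M)]))
    return W, np.concatenate(errors)
```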
7.3

Removing the gradient constraint shown in Fig. 7.3 permits the adaptive filter to perform circular convolution instead of linear convolution. (Circular convolution is the form of convolution performed by the discrete Fourier transform.) This modification may be acceptable in those applications where there is no particular concern as to how the input signal is used to minimize the mean-square value of the error signal (estimation error). One such application is noise (interference) cancellation.
7.4
(a) From Fig. 7.1, we note that the z-transform of x_k(n) is related to the z-transform of the input u(n) as follows:

Z[x_k(n)] = (1 – z^{-M}) / (1 – e^{-j2πk/M} z^{-1}) · Z[u(n)],   k = 0, 1, ..., M-1

Cross-multiplying and rearranging terms:

Z[x_k(n)] = e^{-j2πk/M} z^{-1} Z[x_k(n)] + Z[u(n)] – z^{-M} Z[u(n)]

Taking inverse z-transforms:

x_k(n) = e^{-j2πk/M} x_k(n-1) + u(n) – u(n-M),   k = 0, 1, ..., M-1
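The recursion just obtained is the familiar sliding DFT. A small numerical check of it is sketched below (my own illustration, with an assumed random input); it compares the recursively updated bank x_k(n) against a directly computed M-point DFT of the most recent window.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 8
u = rng.standard_normal(256)

rot = np.exp(-2j * np.pi * np.arange(M) / M)    # e^{-j 2 pi k / M}, k = 0, ..., M-1
x = np.zeros(M, dtype=complex)
for n in range(len(u)):
    u_old = u[n - M] if n >= M else 0.0
    x = rot * x + u[n] - u_old                  # x_k(n) = e^{-j2πk/M} x_k(n-1) + u(n) - u(n-M)

# Direct check: x_k(n) should equal the DFT of [u(n), u(n-1), ..., u(n-M+1)].
window = u[len(u) - 1:len(u) - 1 - M:-1]
print(np.allclose(x, np.fft.fft(window)))       # expected: True
```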
(b) We are given that

h(n+1) = h(n) + µ̃ D^{-1} x(n) e*(n)                                               (1)

The error signal is

e(n) = d(n) – h^H(n) x(n) = d(n) – x^T(n) h*(n)                                   (2)

Hence, substituting Eq. (2) into (1) and rearranging:

h(n+1) = [I – µ̃ D^{-1} x(n) x^H(n)] h(n) + µ̃ D^{-1} x(n) d*(n)

Taking the expectation of both sides of this equation, and invoking the properties of LMS filters with a small step size:

E[h(n+1)] = [I – µ̃ D^{-1} R_x] E[h(n)] + µ̃ D^{-1} p_x                             (3)

where R_x is the correlation matrix of the input vector x(n) and p_x is the cross-correlation vector between x(n) and d(n). Assuming convergence in the mean,

E[h(n+1)] = E[h(n)] = h_o   as n → ∞

Hence, from Eq. (3) we deduce that

R_x h_o = p_x                                                                     (4)

But x_k(n) is related to the input vector u(n) by the DFT formula:

x_k(n) = Σ_{i=0}^{M-1} u(n – i) e^{-j2πik/M} = u^T(n) ε_k* = ε_k^H u(n)

where

ε_k = [1, e^{j2πk/M}, ..., e^{j2πk(M-1)/M}]^T

Hence, the vector x(n) = [ε₀^H; ε₁^H; ...; ε_{M-1}^H] u(n) is related to u(n) by

x(n) = √M Q u(n)

where Q is the M-by-M unitary DFT matrix whose (k, l)th element is (1/√M) e^{-j2πkl/M}, for k, l = 0, 1, ..., M-1.   (5)

Absorbing the inessential scale factor √M into the definition of x(n), we may simply write x(n) = Q u(n). Accordingly, we have

R_x = E[x(n) x^H(n)] = E[Q u(n) u^H(n) Q^H] = Q R Q^H                              (6)

where R is the correlation matrix of the input vector u(n). Similarly, we may write

p_x = E[x(n) d*(n)] = E[Q u(n) d*(n)] = Q p                                        (7)

where p is the cross-correlation vector between u(n) and d(n). Next, substituting Eqs. (6) and (7) into (4):

Q R Q^H h_o = Q p

The matrix Q is nonsingular, and its inverse matrix therefore exists. Hence,

R Q^H h_o = p

We recognize the Wiener solution as

w_o = R^{-1} p

It follows therefore that

Q^H h_o = w_o

The matrix Q is in actual fact a unitary matrix; we may thus also write

h_o = Q w_o

The unitary matrix Q is related to the DFT by Eq. (5).

(c)(i) From part (b), we have

h_o = Q w_o = Q R^{-1} p = Q R^{-1} Q^H Q p

We recognize Q R^{-1} Q^H as the inverse of the correlation matrix of the tap-input vector x(n), and Q p as the cross-correlation vector between x(n) and the desired response d(n). From the unitary similarity transformation, Q R^{-1} Q^H equals a diagonal matrix. This is functionally equivalent to prewhitening the original input vector u(n) characterized by the correlation matrix R.

(ii) The eigenvalue spread of the correlation matrix of the original input vector u(n), χ(R), is in general greater than unity. On the other hand, the eigenvalue spread of the correlation matrix of the DFT output vector x(n), which is prewhitened by the DFT, is closer to unity. In general, we therefore find that the eigenvalue spread associated with u(n) is larger than that associated with x(n).

(iii) The frequency-domain LMS algorithm has a faster rate of convergence than the conventional LMS algorithm because of the result reported in sub-part (ii).
163
(1) (2) 1 = --- k m [ C m ( n ) + C m ( n ) ], 2
m = 0, 1, …, M-1
where (1)
m
m ⁄ 2 (1)
m
m⁄2
C m ( n ) = ( – 1 ) W 2M A m ( n ) (1)
m
m
= ( – 1 ) W 2M [ W 2M A m ( n-1 ) + u ( n ) – ( – 1 ) u ( n-M ) ] m⁄2
m ⁄ 2 (1)
m
= W 2M [ W 2M C m ( n-1 ) + ( – 1 ) u ( n ) – u ( n-M ) ] (2)
m
m ⁄ 2 (2)
m
m⁄2
C m ( n ) = ( – 1 ) W 2M A m ( n ) (2)
–m
–m
m
= ( – 1 ) W 2M [ W 2M A m ( n-1 ) + W 2M ( u ( n ) – ( – 1 ) u ( n-M ) ) ] –m ⁄ 2
–m ⁄ 2 ( 2 )
m
= W 2M [ W 2M C m ( n-1 ) + ( – 1 ) u ( n ) – u ( n-M ) ] m⁄2
(1)
(b) Basically, the terms W 2M , C m ( n-1 ) and u(n-m) involved in the computation of (1)
–m ⁄ 2
(2)
C m ( n ) , and likewise the terms W 2M , C m ( n-1 ) and u(n-m) involved in the (2)
computation of C m ( n ) are all multiplied by the constant β, to ensure stability of the algorithm. 7.6
Using noble identity 1,

u(n) → [↓L] → G(z) → y(n)    is equivalent to    u(n) → G(z^L) → [↓L] → y(n)

Now put L = 2. Noting that, in the z-domain, the two-band down-sampling operation x(n) → [↓2] → y(n) can be expressed as

Y(z) = (1/2) [X(z^{1/2}) + X(–z^{1/2})]                                            (1)

and observing the bottom part of Fig. 7.13 in the text, we may formulate the synthesis-filter outputs as

upper:  (1/2) [U(z^{1/2}) Ŵ(z^{1/2}) H₁(z^{1/2}) + U(–z^{1/2}) Ŵ(–z^{1/2}) H₁(–z^{1/2})]        (2)

lower:  (1/2) [U(z^{1/2}) Ŵ(z^{1/2}) H₀(z^{1/2}) + U(–z^{1/2}) Ŵ(–z^{1/2}) H₀(–z^{1/2})]        (3)

By polyphase decomposition of Ŵ(z) [see Eq. (7.79) of the text],

Ŵ(z) = Ŵ₁(z²) + z^{-1} Ŵ₂(z²)

it follows that

Ŵ(z^{1/2}) = Ŵ₁(z) + z^{-1/2} Ŵ₂(z)                                                (4)

Substituting Eq. (4) into (2) and rearranging terms, we have

(1/2) [U(z^{1/2}) H₁(z^{1/2}) + U(–z^{1/2}) H₁(–z^{1/2})] Ŵ₁(z)
+ (1/2) [U(z^{1/2}) H₁(z^{1/2}) z^{-1/2} – U(–z^{1/2}) H₁(–z^{1/2}) z^{-1/2}] Ŵ₂(z)             (5)

Similarly, substituting Eq. (4) into (3), we have

(1/2) [U(z^{1/2}) H₀(z^{1/2}) + U(–z^{1/2}) H₀(–z^{1/2})] Ŵ₁(z)
+ (1/2) [U(z^{1/2}) H₀(z^{1/2}) z^{-1/2} – U(–z^{1/2}) H₀(–z^{1/2}) z^{-1/2}] Ŵ₂(z)             (6)

Equation (5) corresponds to the description of the upper-bottom part of Fig. 7.15 (following from Eq. (1)), and Eq. (6) corresponds to the description of the lower-bottom part of Fig. 7.15. The equivalence between the two figures, 7.13 and 7.15, is thereby established.
7.7

Consider the case in which the input u(n) is complex valued. Suppose

u(n) = u_I(n) + j u_Q(n)
ŵ(n) = ŵ_I(n) + j ŵ_Q(n)
e(n) = [e_{1,I}(n) + j e_{1,Q}(n),  e_{2,I}(n) + j e_{2,Q}(n)]
     = [e_{1,I}(n), e_{2,I}(n)] + j [e_{1,Q}(n), e_{2,Q}(n)]

and

P = diag(α₁, α₂)

Define the cost function

J(n) = (1/2) e^H(n) P e(n)
     = (1/2) [ α₁ (e_{1,I}²(n) + e_{1,Q}²(n)) + α₂ (e_{2,I}²(n) + e_{2,Q}²(n)) ]    (1)

Suppose

Ŵ_{1,k} = Ŵ^(I)_{1,k} + j Ŵ^(Q)_{1,k}
Ŵ_{2,k} = Ŵ^(I)_{2,k} + j Ŵ^(Q)_{2,k}

Similarly to the real-valued case, we may use the four real-valued update recursions

Ŵ^(I)_{1,k}(n+1) = Ŵ^(I)_{1,k}(n) + µ ∂J(n)/∂Ŵ^(I)_{1,k}
Ŵ^(I)_{2,k}(n+1) = Ŵ^(I)_{2,k}(n) + µ ∂J(n)/∂Ŵ^(I)_{2,k}
Ŵ^(Q)_{1,k}(n+1) = Ŵ^(Q)_{1,k}(n) + µ ∂J(n)/∂Ŵ^(Q)_{1,k}
Ŵ^(Q)_{2,k}(n+1) = Ŵ^(Q)_{2,k}(n) + µ ∂J(n)/∂Ŵ^(Q)_{2,k}

From Eq. (1), for each of these four real weight components W we have the chain rule

∂J(n)/∂W = α₁ [ e_{1,I}(n) ∂e_{1,I}(n)/∂W + e_{1,Q}(n) ∂e_{1,Q}(n)/∂W ]
         + α₂ [ e_{2,I}(n) ∂e_{2,I}(n)/∂W + e_{2,Q}(n) ∂e_{2,Q}(n)/∂W ]              (2)-(5)

Similarly, for i = 1, 2 and j = 1, 2 we obtain

∂e_{i,I}(n)/∂Ŵ^(I)_{j,k} = – X^(I)_{ij}(n–k),    ∂e_{i,Q}(n)/∂Ŵ^(I)_{j,k} = – X^(Q)_{ij}(n–k)      (6)
∂e_{i,Q}(n)/∂Ŵ^(Q)_{j,k} = – X^(Q)_{ij}(n–k),    ∂e_{i,I}(n)/∂Ŵ^(Q)_{j,k} = – X^(I)_{ij}(n–k)      (7)

Substituting Eqs. (6) and (7) into (2)-(5), we finally get

Ŵ^(I)_{1,k}(n+1) = Ŵ^(I)_{1,k}(n) + µ [ α₁ e_{1,I}(n) X^(I)_{11}(n–k) – α₁ e_{1,Q}(n) X^(Q)_{11}(n–k)
                                        + α₂ e_{2,I}(n) X^(I)_{21}(n–k) – α₂ e_{2,Q}(n) X^(Q)_{21}(n–k) ]

Ŵ^(Q)_{1,k}(n+1) = Ŵ^(Q)_{1,k}(n) + µ [ α₁ e_{1,I}(n) X^(Q)_{11}(n–k) – α₁ e_{1,Q}(n) X^(I)_{11}(n–k)
                                        + α₂ e_{2,I}(n) X^(Q)_{21}(n–k) – α₂ e_{2,Q}(n) X^(I)_{21}(n–k) ]

Ŵ^(I)_{2,k}(n+1) = Ŵ^(I)_{2,k}(n) + µ [ α₁ e_{1,I}(n) X^(I)_{12}(n–k) + α₁ e_{1,Q}(n) X^(Q)_{12}(n–k)
                                        + α₂ e_{2,I}(n) X^(I)_{22}(n–k) + α₂ e_{2,Q}(n) X^(Q)_{22}(n–k) ]

Ŵ^(Q)_{2,k}(n+1) = Ŵ^(Q)_{2,k}(n) + µ [ α₁ e_{1,I}(n) X^(Q)_{12}(n–k) + α₁ e_{1,Q}(n) X^(I)_{12}(n–k)
                                        + α₂ e_{2,I}(n) X^(Q)_{22}(n–k) + α₂ e_{2,Q}(n) X^(I)_{22}(n–k) ]
Applying the NLMS version of Eq. (7.91) of the text, we may write α 1 e 1 ( n )x 11 ( n-k ) α 2 e 2 ( n )x 21 ( n-k ) ˆ 1, k ( n+1 ) = W ˆ 1, K ( n ) + µ˜ ----------------------------------------W - + -----------------------------------------2 2 x 1, 1 x 2, 1 Similarly, the NLMS version of Eq. (7.92) is α 1 e 1 ( n )x 12 ( n-k ) α 2 e 2 ( n )x 22 ( n-k ) ˆ 2, k ( n+1 ) = W ˆ 2, k ( n ) + µ˜ ----------------------------------------W - + -----------------------------------------2 2 x 1, 2 x 2, 2 where T M x 1, 2 = x 12 ( n ), x 12 ( n-1 ), …, x 12 n – ----- +1 2 T M x 2, 2 = x 22 ( n ), x 22 ( n-1 ), …, x 22 n – ----- +1 2
168
The advantages of using the NLMS algorithm in subband adaptive filters: 1. Time-varying learning rate, 2. faster convergence speed, and 3. invariance to scaling of input data.
CHAPTER 8

8.1

The Hermitian transpose of the data matrix A is defined by
=
...
H
...
A
u ( 1, 1 ) u ( 1, 2 ) … u ( 1, n ) u ( 2, 1 ) u ( 2, 2 ) … u ( 2, n ) ...
u ( M, 1 ) u ( M, 2 ) … u ( M, n ) where u(k,i) is the output of sensor k in the linear array at time i, where k = 1, 2, …, M , and i = 1, 2, …, n . (a) The matrix product AHA equals: n
n
∑ u ( 1, i )u* ( 1, i )
∑ u ( 1, i )u* ( M, i )
i=1 n
i=1 n
i=1 n
i=1
i=1
∑ u ( 2, i )u* ( 2, i )
n
...
...
∑ u ( 2, i )u* ( 1, i )
n
… …
∑ u ( 2, i )u* ( M, i )
i=1
...
H
A A =
n
∑ u ( 1, i )u* ( 2, i ) …
n
∑ u ( M, i )u* ( 1, i )
∑ u ( M, i )u* ( 2, i ) …
∑ u ( M, i )u* ( M, i )
i=1
i=1
i=1
This represents the M-by-M spatial (deterministic) correlation matrix of the array with temporal averaging applied to each element of the matrix. This form of averaging assumes that the environment in which the array operates is temporally stationary. (b) The matrix product AAH equals
H
=
M
∑ u* ( k, 1 )u ( k, 2 ) … ∑ u* ( k, 1 )u ( k, n )
k=1 M
k=1 M
k=1 M
k=1
k=1
k=1
∑ u* ( k, 2 )u ( k, 1 ) M
∑ u* ( k, 2 )u ( k, 2 ) … ∑ u* ( k, 2 )u ( k, n ) ...
AA
∑ u* ( k, 1 )u ( k, 1 )
M
...
M
...
8.1
∑ u* ( k, n )u ( k, 1 )
M
M
∑ u* ( k, n )u ( k, 2 ) … ∑ u* ( k, n )u ( k, n )
k=1
k=1
k=1
170
This second matrix represents the n-by-n temporal (deterministic) correlation matrix of the array with spatial averaging applied to each element of the matrix. This form of averaging assumes that the environment is spatially stationary. 8.2
ˆ is consistent if We say that the least-squares estimate w 2
ˆ – wo ] = 0 lim E [ w
(1)
N→∞
We note that 2
H
ˆ – wo ] = E [ ( w ˆ – wo ) ( w ˆ – wo ) ] E[ w H ˆ – wo ) ( w ˆ – wo ) ] = tr E [ ( w H ˆ – wo ) ( w ˆ – wo ) ] = E tr [ ( w H ˆ – wo ) ( w ˆ – wo ) ] = E tr [ ( w H ˆ – wo ) ] ˆ – wo ) ( w = tr E [ ( w = tr [ K ] 2
–1
= σ tr [ Φ ]
(2)
where the correlation matrix Φ is dependent on n. Substituting Eq. (2) in (1): 2
ˆ – wo ] = lim E [ w
N→∞
2
–1
lim σ tr [ Φ ]
N→∞
ˆ is consistent if This result shows that w –1
lim tr [ Φ ] = 0
N→∞
171
8.3
The data matrix is
A =
2 3 1 2 –1 1
The desired response vector is 2 d = 1 1 ⁄ 34 The tap-weights vector of the linear least-squares filter is T
–1 T
ˆ = (A A) A d w
(1)
We first note that
T A A = 2 1 –1 3 2 1
2 3 1 2 –1 1
= 6 7 7 14 T det ( A A ) = 6 7 = 35 7 14
T
(A A)
–1
1 = ------ 14 7 = 35 – 7 6
2 --5 1 – --5
1 – --5 6 -----35
Hence, using Eq. (1) we get
172
2 --ˆ = 5 w 1 – --5 2 --= 5 1 – --5
1 2 – --5 2 1 –1 1 6 3 2 1 -----1 ⁄ 34 35
1 – --5 6 -----35
169 --------34 273 --------34
= 0.382 0.382 8.4
Express the transfer function of the (forward) prediction-error filter as follows (see Fig. 1): ′
–1
H f ( z ) = ( 1 – z i z )H f ( z ) where M
H f (z) =
∑ a*M, k z
–k
,
a M, 0 = 1
k=0
and zi = ρi e
jθ i
Fig. 1:
u(n)
Hf′ (z)
g(n) 1 - ziz-1
From this figure we note that –1
Z [ f M ( n ) ] = ( 1 – z i z )Z [ g ( n ) ] Hence f M ( n ) = g ( n ) – z i g ( n-1 )
173
fM(n)
The prediction-error energy equals (according to the autocorrelation method) ∞
∑
εf =
f M (n)
2
n=1 ∞
∑ [ g ( n ) – zi g ( n-1 ) ] [ g* ( n ) – z*i g* ( n-1 ) ]
=
n=1
With z i = ρ i e ∞
εf =
jθ i
, we may expand the expression for the prediction-error energy as
∑ g(n)
2
n=1
2
2
+ ρ i g ( n-1 ) – 2Re ρ i e
jθ i
g ( n-1 )g * ( n )
Differentiate εf with respect to ρi: ∞ ∞ ∂ε f jθ i 2 * --------- = 2ρ i ∑ g ( n-1 ) – 2Re ∑ e g ( n-1 )g ( n ) ∂ρ i n=1 n=1
(1)
From the Cauchy-Schwartz inequality, we have ∞
Re
∑e
jθ i
∞
*
g ( n-1 )g ( n ) ≤
n=1
∑
*
g (n)
2
1⁄2
n=1
∞
∑
e
jθ i
g ( n-1 )
2 1⁄2
n=1
Here we note that ∞
∑
e
jθ i
g ( n-1 )
∞
2
=
n=1
∑
g ( n-1 )
2
n=1 ∞
=
∑
g(n)
2
(2)
n=0
where it is recognized that in the autocorrelation method g(n) is zero for n < 0. Accordingly, we may rewrite the Cauchy-Schwartz inequality as
174
∞
Re
∑e
jθ i
∞
*
g ( n-1 )g ( n ) ≤
∑
g(n)
2
(3)
n=0
n=1
Using Eqs. (1) - (3), we thus have ∞
2 ∂ε -------- ≥ 2 ( ρ i – 1 ) ∑ g ( n ) ∂ρ i
(4)
n=0
For ρi > 1, the right-hand side of Eq. (4) is always greater than or equal to zero. The jθ
derivative ∂ε f ⁄ ( ∂ρ i ) is zero if, and only if, ρi = 1 and g * ( n ) = g ( n-1 )e i for n > 0. In such a case u(n) and g(n) are identically zero. Thus ∂ε ⁄ ( ∂ρ i ) > 1 for ρi > 1, and so we conclude that all the zeros of the transfer function Hf(z) must lie inside the unit circle. That is, Hf(z) is minimum phase. 8.5
For forward linear prediction we have (a) The data matrix is
u(1)
...
...
...
u ( M ) u ( M+1 ) … u ( N -1 ) H A = u ( M-1 ) u ( M ) … u ( N -2 ) u(2)
… u ( N -M ) H
The correlation matrix is Φ = A A (b) The desired response vector is d
H
= [ u ( M+1 ), u ( M+2 ), …, u ( N ) ]
The cross-correlation vector is AHd. (c) The minimum value of Ef is H
H
H
–1 H
E f , min = d d – d ( A A ) A d 8.6
For backward linear prediction we have
175
u* ( 3 )
… u * ( N -M+1 )
u* ( 3 )
u* ( 4 )
… u * ( N -M+2 )
...
...
u* ( 2 ) ...
(a) The data matrix is
u * ( M+1 ) u * ( M+2 ) …
u* ( N )
Note the difference between this data matrix and that for forward linear prediction. The correlation matrix is H
Φ = A A (b) The data vector is d
H
= [ u * ( 1 ), u * ( 2 ), …, u * ( N -M ) ]
The cross-correlation vector is AHd. (c) The minimum value of Eb is H
H
H
–1
E b, min = d d – d ( A A ) d 8.7
The data matrix is A
H
= [ u ( M ), u ( M+1 ), …, u ( N ) ]
The desired response vector is d
H
= [ d ( M ), d ( M+1 ), …, d ( N ) ]
The cost function is H
H
H
H
E ( w ) = d d – w z – z w + w Φw where z is the cross-correlation vector: z = AHd
176
and Φ is the correlation matrix H
Φ = A A Differentiating the cost function with respect to the weight vector: ∂E ( w ) ----------------- = – 2z + 2Φw ∂(w) Setting this result equal to zero and solving for the optimum w, we get –1
ˆ = Φ z w –1 H
H
= (A A) A d K
8.8
E reg =
∑
H
2
H
w u(n) + λ(w s(θ) – 1) + δ w
2
n=1
Taking the derivative of Ereg with respect to the weight vector w and setting the result equal to zero K ∂E reg H H -------------- = ∑ 2w u ( n )u ( n ) + λs ( θ ) + 2δw = 0 ∂w n=1
Hence, –λ s ( θ ) ˆ = --------------------------------------------------------w K H 2 ∑ u ( n )u ( n ) + δI n=1 H
ˆ s(θ) = 1 , Since w 2
(1)
K
H ˆ ∑ u ( n )u ( n ) + δI –λ = 2 w n=1
(2)
By the virtue of Eq. (8.80) in the text, we may have
177
1 ˆ = – --- λs ( θ ) Φw 2
(3)
Use Eqs. (1) and (2) in (3) and rearrange terms, obtaining K
Φ =
∑ u ( n )u
H
( n ) + δI
n=1
8.9
Starting with the cost function N
E =
∑
2
2
( f M ( i ) + bM ( i ) )
i=M+1 H
f M ( i ) = a M u M+1 ( i ) H
T
a M = [ 1, – w 1, – w 2, …, – w M ] = [ 1, – w ] H
u M+1 ( i ) = [ u ( i ), u ( i-1 ), …, u ( i-M ) ] H B
b M ( i ) = a M u M+1 ( i ) BH
u M+1 ( i ) = [ u ( i-M ), u ( i-M+1 ), …, u ( i ) ] (a) Using these definitions, rewrite the cost function as N
∑
E =
H
H
H B
i=M+1
By definition,
aM =
BH
( a M u M+1 ( i )u M+1 ( i )a M + a M u M+1 ( i )u M+1 ( i )a M )
u(i) 1 and u M+1 ( i ) = u ( i-1 ) –w M
Hence,
178
N
∑
E =
2
H
H
H
( u(i) + w Φ f w – θ f w – w θ f
i=M+1 2
H
H
H
+ u ( i – M ) + w Φb w – θb w – w θb ) where N
∑
Φf =
H
u M ( i-1 )u M ( i-1 )
i=M+1 N
∑
θf =
*
u M ( i-1 )u ( i )
i=M+1 N
∑
Φb =
B*
BH
u M ( i )u M ( i )
i=M+1 N
∑
θb =
B
*
u M ( i )u ( i-M )
i=M+1
Setting ∂E ------- = 0 ∂w and solving for the optimum value of w, we obtain –1
ˆ = Φ θ w where N
Φ = Φ f + Φb =
∑
H
B*
BH
[ u M ( i-1 )u M ( i-1 ) + u M ( i )u M ( i ) ]
i=M+1 N
θ = θ f + θb =
∑
B*
[ u M ( i-1 )u ( i ) + u M ( i )u ( i-M ) ]
i=M+1
179
(b) Finally N
∑
E min = E f , min + E b, min =
2
2
H
ˆ [ u ( i ) + u ( i-M ) ] – θ w
i=M+1
(c) aˆ =
1 , ˆ –w
Φaˆ =
E min 0
N
∑
Φ = Φ f + Φb =
H
B*
BH
[ u M ( i-1 )u M ( i-1 ) + u M ( i )u M ( i ) ]
(1)
i=M+1
Examining Eq. (1) and noting that H
u M ( i ) = [ u ( i ), u ( i-1 ), …, u ( i-M ) ] BH
u M ( i ) = [ u ( i-M ), u ( i-M+1 ), …, u ( i ) ] we easily find that
8.10
φ ( M-k, M-t ) = φ ( t, k ) ,
*
0 ≤ ( k, t ) ≤ M
*
φ ( k, t ) = φ ( t, k ) ,
0 ≤ ( k, t ) ≤ M
The SIR maximization problem may be stated as follows: w H ss H w max --------------------- w H w Rw
H
subject to C N -1 w = f N -1
(a) Using Lagrang’s method, the SIR maximization problem can be written as a minimization problem 1 H H H minJ ( w ) = min --- w Rw + λ ( C N -1 w – f N -1 ) 2 where λ
H
= [ λ 1 …λ N -1 ] is the Lagrange vector (0 < λi < 1)
180
∂J ( w ) Set --------------- = 0 , obtaining ∂w H
That is, Rw + C N -1 λ = 0 –1
w opt = – R C N -1 λ
(1)
H
H
–1
Since C N -1 w opt = f N -1 = – C N -1 R C N -1 λ H
–1
λ = – [ C N -1 R C N -1 ]
–1
f N -1
(2)
Using Eq. (2) in (1): –1
H
–1
w opt = R C [ C R C ] –1
H
–1
f
–1
(b) When w opt = R C [ C R C ] H
H
–1
f , the SIR becomes
H
H
H
–H
w ss w w ss w ---------------------- = --------------------------------------------------------------------------------------------------------H H H –1 –H H –H H –1 –1 w Rw f [C R C] C R C[C R C] f H
H
–1
–H
H
–1
H
–1
–1
f [ C R C ] C R ss R C [ C R C ] f = ----------------------------------------------------------------------------------------------------------------------------H H –1 –H H –H H –1 –1 f [C R C] C R C[C R C] f (c) When there are no auxiliary constraints Cn-1 = 0, the fixed value fo will only determine the normalization of the solution yielding an unconditional maximum for SIR: –1
w opt = αR s H
(α is constant) –H
H
–1
H –1 s R ss R s SIR max = ---------------------------------------- = s R s H –H s R s
ˆ ( n ) = 1--(d) R n
n
∑ u ( i )u
H
(i)
i=1
ˆ ( n ) = λR ˆ ( n-1 ) + u ( n )u H ( n ) R
181
where 0 < λ < 1; the parameter λ is a weighting factor, not to be confused with the Lagrange multiplier. 8.11
(a) We are given
A =
1 –1 0.5 2
Therefore, T
1 0.5 1 – 1 – 1 2 0.5 2
A A =
= 1.25 0 0 5 This is a diagonal matrix. Hence, its eigenvalues are λ1 = 1.25 λ2 = 5 The singular values of matrix A are therefore σ1 =
λ1 =
5 1.25 = ------2
σ2 =
λ2 =
5
The eigenvectors of ATA are the right singular vectors of A. For the problem at hand, the eigenvectors of ATA constitute a unit matrix. We therefore have V = 10 01 (b) The matrix product AAT is
AA
T
=
1 – 1 1 0.5 0.5 2 – 1 2
182
2 – 1.5 – 1.5 4.25
=
The eigenvalues of AAT are the roots of the characteristic equation 2
( 2 – λ ) ( 4.25 – λ ) – ( 1.5 ) = 0 Expanding this equation: 2
λ – 6.25λ + 6.25 = 0 Solving for the roots of this quadratic equation: λ1 = 1.25 λ2 = 5 The singular values of the data matrix A are therefore σ1 =
5 λ 1 = ------2
σ2 =
λ2 =
5
which are identical to the values calculated in part (a). To find the eigenvectors of AAT, we note that 2 – 1.5 q 11 = 1.25 q 11 – 1.5 4.25 q 12 q 12 Hence, 0.75 q11 - 1.5 q12 = 0 Equivalently, q11 = 2 q12 2
2
Setting q 11 + q 12 = 1 , we find that
183
1 2 q 12 = ± ------- and q 12 = ± ------5 5 We may thus write 1 q 1 = ------- 1 5 2 Similarly, we may show that 1 q 2 = ------- 2 5 –1 The eigenvectors of AAT are the left singular vectors of A. We may thus express the left singular vectors of matrix A as 1 U = ------- 1 2 5 2 –1 The pseudoinverse of the matrix A is given by
A
+
–1
T = V Σ 0 U 0 0
2 ------- 0 1 5 ------- 1 2 = 1 0 5 2 –1 0 1 1 0 ------5 1 = --- 2 0 1 2 5 0 1 2 –1 1 = --- 2 4 5 2 –1 8.12
Given the 2 x 2 complex matrix
184
1 + 0.5 j A = 1+ j 0.5 – j 1 – j we have H A A = 3.25 3 , 3 3.25
AA
H
= 3.25 3 j – 3 j 3.25
The eigenvalues and eigenvectors of matrix A are found in the usual way by first solving A – λI = 0 for the eigenvalues λi for i = 1,2, and then using these values in x x A 1 = λi 1 x x 2 2
to determine the eigenvectors x. The two eigenvalues in descending order, for both AHA and AAH are λ1 = 6.25 and λ2 = 0.25. Regarding the eigenvectors, we have for AHA 1 V = x y = ------- 1 1 2 1 –1 x –y where x and y are arbitrary real numbers, and we arbitrarily make the simplest choice that gives us an orthonormal set of vectors. For AAH we similarly have 1 U = x y = ------- j 1 2 1 j jx – jy for another arbitrary choice of x and y. The SVD of A is given by A = UΣVH
185
where the right singular matrix V is the eigen-matrix of AHA and the left singular matrix U is the eigen-matrix of AAH. The singular values should be the square roots of the common eigenvalues: σ 1 = σ 0 Σ = 1 0 σ 2
λ 1 = 2.5 and σ 2 = 0.5 with
Interestingly enough
UΣV
H
= B = 1.25 + j0.25 0.25 + j1.25
– 1.25 + j0.25 0.25 – j1.25
where, obviously B ≠ A, but, BHB = AHA and BBH = AAH. In order to have B = A, the arbitrariness in choosing the eigenvectors of AHA and AAH above, should be removed. The natural constraint relation is that A = UΣVH holds true. Starting from that premise, we take V as is and define U using
U = AVΣ
–1
= 0.5657 + j0.4243 0.4243 – j0.5657
j0.7071 – 0.7071
We can easily verify that the new matrix U is indeed composed of eigenvectors of AAH determined above. Clearly, the new U is related to the old one by some similarity transformation H
U new = W U old W where W is a unitary matrix. This problem illustrates the need for a less arbitrary procedure for determining the SVD of a matrix, even in the simplest of cases, with a 2 x 2 matrix. 8.13
We are given
A =
2 3 1 2 –1 1
186
We note that
A
T
= 2 1 –1 3 2 1
Next, we set up
T
A A = 2 1 –1 3 2 1
= 6 7
2 3 1 2 –1 1
7 14
The eigenvalues of ATA are roots of the characteristic equation: (6-λ)(14-λ) - (7)2 = 0 or, λ2 - 20λ + 35 = 0 Solving for the roots of this equation, we thus get λ = 10 ± 65 That is, λ 1 = 10 – 65 ≈ 1.94 ,
σ1 =
λ 1 = 1.393
λ 2 = 10 + 65 ≈ 18.06 ,
σ2 =
λ 2 = 4.25
To find the eigenvectors of ATA, we note that 6 7 q 11 = 1.94 q 11 7 14 q 12 q 12 Hence,
187
4.06 q11 + 7 q12 = 0 or q12 = - 0.58 q11 We also note that 2
2
q 11 + q 12 = 1 Therefore, 2
2 2
q 11 + ( 0.58 ) q 11 = 1 or 1 q 11 = ± ----------------- = ± 0.87 1.336 Hence, q 12 = − + 0.505 Similarly, we may show that the eigenvector associated with λ2 = 18.06 is defined by q21 = -0.505 q22 = -0.87 Therefore, the right singular vectors of the data matrix A constitute the matrix
V =
0.87 – 0.505 – 0.505 – 0.87
(1)
Next, we set up
AA
T
=
2 3 2 1 –1 1 2 32 1 –1 1
188
=
13 8 1 8 51 1 12
The eigenvalues of AAT are the roots of the third-order characteristic equation: 13-λ 8 1 8 5-λ 1 = 0 1 1 2-λ or λ(λ2 - 20λ + 25) = 0 The eigenvalues of AAT are therefore as follows λ0 = 0 λ 1 ≈ 1.94 λ 2 ≈ 18.06 The two nonzero eigenvalues of AAT are the same as those found for ATA. To find the eigenvectors of AAT, we note that for λ1 = 1.94: q 11 13 8 1 q 11 8 5 1 q 12 = 1.94 q 12 1 1 2 q q 13 13 11.06 q11 + 8 q12 + q13 = 0 8 q11 + 3.06 q12 + q13 = 0 q11 + q12 + 0.06 q13 = 0 Using the first two equations to eliminate q13: 3.06 q11 + 4.94 q12 = 0 Hence,
189
q 12 ≈ – 0.619q 11 Using the first and third equations to eliminate q12: 3.06 q11 + 0.52 q13 = 0 Hence, q 13 ≈ – 5.88q 11 Next, we note that 2
2
2
q 11 + q 12 + q 13 = 1 Therefore, 2
2
2
q 11 + 0.383q 11 + 34.574q 11 = 1 2
36q 11 ≈ 1 q 11 ≈ ± 0.167 Correspondingly, q 12 ≈ − + 0.109 q 13 ≈ − + 0.98
We may thus set 0.167 q 1 ≈ – 0.109 – 0.981
(2)
For the eigenvalue λ2 = 18.06, we may write q 21 13 8 1 q 21 8 5 1 q 22 = 18.06 q 22 1 1 2 q q 23 23
190
-5.06 q21 + 8 q22 + q23 = 0 8 q21 - 13.06 q22 + q23 = 0 q21 + q22 - 16.06 q23 = 0 Using the first two equations to eliminate q23: -13.06 q21 + 21.06 q22 = 0 q22 = 0.62 q21 Using the first and third equations to eliminate q22: -13.06 q21 + 129.48 q23 = 0 q 23 ≈ 0.101q 21 We next note that 2
2
2
q 21 + q 22 + q 23 = 1 We therefore obtain 2
2
2
q 21 + 0.384q 21 + 0.01q 21 = 1 q 21 ≈ ± 0.845 Hence, q 22 ≈ ± 0.524 q 23 ≈ ± 0.085 We may thus set – 0.845 q 2 = – 0.524 – 0.0845
(3)
Note that q2 is orthogonal to q1, as it should be; that is, T
q1 q2 = 0 .
191
To complete the eigencomputation, we need to determine the eigenvector associated with the zero eigenvalue of matrix AAT. Here we note 13 8 1 8 5 1 q0 = 0 , 1 1 2 where q 0
2
λ0 = 0
= 1 . Solving for q0, we get
0.5071 q 0 = – 0.8452 0.169
(4)
Putting together Eqs. (2) through (4) for the eigenvectors q1, q2, and q0, we may express the left singular vectors of the data matrix A as the matrix 0.167 0.845 0.5071 U = – 0.109 0.524 – 0.8452 – 0.981 0.088 0.169 where the third column corresponds to the zero eigenvalue of AAT. The singular value decomposition of the data matrix A may therefore be expressed as 0.167 – 0.845 0.5071 1.393 A = – 0.198 – 0.524 – 0.8452 0 – 0.98 – 0.085 0.169 0
0 0.87 4.25 – 0.505 0
– 0.505 – 0.87
0.167 – 0.845 0.87 – 0.505 = – 0.109 – 0.524 1.393 0 0 4.25 – 0.505 – 0.87 – 0.98 – 0.085 As a check, carrying out the matrix multiplication given here, we get 2.016 3.0069 A = 0.9925 2.0142 – 1.0065 1.0044
192
which is very close to the original data matrix. (a) The pseudoinverse of matrix A is
A
+
2
=
∑ i=1
T 1 ----- v i u i σi
(b) The least-squares weight vector is +
ˆ = A d w T
2
=
∑ i=1
( ui d ) --------------- v i σi
T
T
( ui d ) ( u2 d ) = --------------- v 1 + --------------- v 2 σ1 σ2
1 = ------------- 0.167 0.109 – 0.981 1.393
1 + ------------- 0.845 0.524 0.085 4.250
2 0.87 1 – 0.505 1 ⁄ 24 2 0.505 1 0.87 1 ⁄ 34
= 0.3859 0.3859
193
8.14
First we set up the following identifications: Least-squares
Normalized LMS
Data matrix
A
xH(n)
Desired data vector
d
e*(n)
Parameter vector
w
c(n+1)
Eigenvalue
σi2
||x(n)||2
ui
1
Eigenvector
Hence, the application of the linear least-squares solution yields 1 c ( n+1 ) = ------------------x ( n )e * ( n ) 2 x(n) That is to say, 1 ˆ ( n+1 ) = ---------------------------- u ( n )e * ( n ) δw 2 u(n) + a Correspondingly, we may write µ ˆ ( n+1 ) = w ˆ ( n ) + ---------------------------- u ( n )e * ( n ) w 2 u(n) + a where µ is the step-size parameter. 8.15
We are given an SVD computer that may be pictured as follows:
Data matrix A
singular values: σ1,σ2,...,σW SVD Computer
left singular vectors: u1,u2,...,uK right singular vectors v1,v2,...,vM
W: rank of data matrix A K: number of rows of A M: number of columns of A The MVDR spectrum is defined by
194
1 S MVDR ( ω ) = -------------------------------------H –1 s ( ω )Φ s ( ω )
(1)
–1
where s(ω) is the steering vector, and Φ is the inverse of the correlation matrix Φ . Specifically, Φ is defined in terms ofthe data matrix A by H
Φ = A A where W
∑ σi ui vi
A =
H
i=1
A
H
W
∑ σi vi ui
=
H
i=1
That is, W
Φ =
∑ σi vi vi 2
H
ui
,
2
= 1 for all i
i=1
Correspondingly, we may express the inverse matrix Φ Φ
–1
W
=
1
–1
as
∑ -----2- vi vi
H
(2)
i=1 σ i
Hence, the denominator of the MVDR spectrum may be expressed as H
W
–1
s ( ω )Φ s ( ω ) =
i=1 σ i W
=
1 H
∑ -----2- s
H
( ω )v i v i s ( ω )
∑ zi ( ω )
2
(3)
i=1
where zi(ω) is a frequency-dependent scalar that is defined by the inner product:
195
1 H z i ( ω ) = ----- s ( ω )v i, σi
i = 1, 2, …, W
(4)
Accordingly, Eq. (2) may be rewritten as 1 S MVDR ( ω ) = ------------------------W 2 ∑ zi ( ω ) i=1
(5)
Thus, to evaluate the MVDR spectrum, we may proceed as follows: •
Compute the SVD of the data matrix A.
•
Use the right-singular vectors vi and the corresponding singular values σi in Eq. (4) to evaluate zi(ω) for i = 1,2,...,W, where W is the rank of A.
•
Use Eq. (5) to evaluate the MVDR spectrum.
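A direct numerical transcription of this SVD-based procedure for the MVDR spectrum is sketched below. This is my own illustration: the example data matrix, the tap-delay-line steering vector, and the frequency grid are assumptions, and the rank threshold is chosen only for numerical safety.

```python
import numpy as np

def mvdr_spectrum_from_svd(A, omegas):
    """S_MVDR(omega) = 1 / sum_i |z_i(omega)|^2 with z_i = s^H(omega) v_i / sigma_i,
       computed directly from the SVD of the data matrix A (rows = snapshots)."""
    _, sigma, Vh = np.linalg.svd(A, full_matrices=False)
    V = Vh.conj().T                                   # columns are right-singular vectors v_i
    keep = sigma > 1e-12 * sigma[0]                   # numerical rank W of A
    V, sigma = V[:, keep], sigma[keep]
    M = A.shape[1]
    S = np.empty(len(omegas))
    for idx, w in enumerate(omegas):
        s = np.exp(1j * w * np.arange(M))             # frequency (steering) vector s(omega)
        z = (s.conj() @ V) / sigma                    # z_i(omega)
        S[idx] = 1.0 / np.sum(np.abs(z) ** 2)
    return S

# Example: two complex sinusoids in noise, N = 64 snapshots of an M = 10 tap-delay line.
rng = np.random.default_rng(3)
M, N = 10, 64
n = np.arange(N + M)
sig = (np.exp(1j * 0.9 * n) + 0.7 * np.exp(1j * 1.8 * n)
       + 0.05 * (rng.standard_normal(N + M) + 1j * rng.standard_normal(N + M)))
A = np.array([sig[i:i + M] for i in range(N)])        # N-by-M data matrix
omegas = np.linspace(0, np.pi, 512)
S = mvdr_spectrum_from_svd(A, omegas)
print(omegas[np.argmax(S)])                           # strongest peak, near 0.9 rad/sample
```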
196
CHAPTER 9

9.1
Assume that β ( n, i ) = λ ( i )β ( n, i-1 ),
i = 1, …, n
Hence, for i = n: β ( n, n ) = λ ( n )β ( n, n-1 ) Since β(n,n) = 1, we have –1
λ ( n ) = β ( n, n-1 ) Next, for i = n-1, β ( n, n-1 ) = λ ( n-1 )β ( n, n-2 ) or equivalently, –1
β ( n, n-2 ) = λ ( n-1 )β ( n, n-1 ) –1
–1
= λ ( n-1 )λ ( n ) Proceeding in this way, we may thus write –1
–1
–1
β ( n, i ) = λ ( i+1 )…λ ( n-1 )λ ( n ) n
=
∏
–1
λ (k)
k=i+1
For β(n,i) to equal λn-i, we must have –1
–1
–1
λ ( i+1 )…λ ( n-1 )λ ( n ) = λ
n-i
This requirement is satisfied by choosing λ(k ) = λ
–1
for all k
197
We thus find that
β ( n, i ) =
n – i terms λ…λλ
= λ 9.2
n-i
The matrix inversion lemma states that if A = B-1 + CD-1C
(1)
then A-1 = B - BC(D + CHBC)-1CHB
(2)
To prove this lemma, we multiply Eq. (1) by (2): AA-1 = (B-1 + CD-1CH)[B - BC(D + CHBC)-1CHB] = B-1B - B-1BC(D + CHBC)-1CHB - CD-1CHBC(D + CHBC)-1CHB + CD-1CHB We have to show that AA-1 = I. Since (D + CHBC)-1(D + CHBC) = I, and B-1B = I, we may rewrite this result as AA-1 = I - C( D + CHBC)-1CHB + CD-1(D + CHBC)( D + CHBC)-1CHB - CD-1CHBC(D + CHBC)-1CHB = I - [C - CD-1(D + CHBC) - CD-1CHBC] · (D + CHBC)-1CHB = I - (C - CD-1D)(D + CHBC)-1CHB Since D-1D = I, the second term in this last line is zero; hence, AA-1 = I which is the desired result.
198
9.3
We are given H
Φ ( n ) = δI + u ( n )u ( n )
(1)
Let A = B-1 + CD-1CH
(2)
Then, according to the matrix inversion lemma: A-1 = B - BC(D + CHBC)-1CHB
(3)
From Eqs. (1) and (2), we note: A = Φ(n) B-1 = δI C = u(n) D = I Hence, using Eq. (3): –1 H –1 1 H 1 1 Φ ( n ) = --- I – -----u ( n ) 1 + --- u ( n )u ( n ) u ( n ) δ δ δ2 H 1 u ( n )u ( n ) = --- ------------------------------------ δ δ + u H ( n )u ( n )
1 δI = --- ------------------------------------ δ δ + u H ( n )u ( n ) 1 = ------------------------------------ I H δ + u ( n )u ( n ) 9.4
From Section 9.6, we have *
ˆ (n) = w ˆ ( n-1 ) + k ( n )ξ ( n ) w
(1)
199
Define ˆ (n) ε ( n ) = wo – w We may thus write *
ε ( n ) = ε ( n-1 ) – k ( n )ξ ( n )
(2)
Since *
ξ ( n ) = d ( n ) – wˆ ( n-1 )u ( n ) *
eo ( n ) = d ( n ) – wo u ( n ) We may expand Eq. (2) as follows *
ˆ ( n-1 )u ( n ) ) ε ( n ) = ε ( n-1 ) – k ( n ) ( d ( n ) – w *
*
*
ˆ ( n-1 )u ( n ) ) = ε ( n-1 ) – k ( n ) ( d ( n ) – w o u ( n ) – w *
= ε ( n-1 ) – k ( n ) ( e o ( n ) + ε ( n-1 )u ( n ) ) *
*
*
*
= ε ( n-1 ) – k ( n )e o ( n ) – k ( n )u ( n )ε ( n-1 ) *
*
= 1 – k ( n ) ( u ( n ) )ε ( n-1 ) – k ( n )e o ( n ) Hence, *
a ( n ) = 1 – k ( n )u ( n ) < 1, provided that 0 < |k(n)u*(n)| < 1. Then under this condition, ε ( n ) is guaranteed to decrease with increasing n. The convergence process is perturbed by the white noise eo(n). 9.5
From Eq. (9.25) of the text: *
ˆ (n) = w ˆ ( n-1 ) + k ( n )ξ ( n ) w
200
where we have *
ˆ (1) = w ˆ ( 0 ) + k ( 1 )ξ ( 1 ) w *
...
ˆ (2) = w ˆ ( 1 ) + k ( 2 )ξ ( 1 ) w *
ˆ (n) = w ˆ ( n-1 ) + k ( n )ξ ( n ) w Summing the above n equations, we have n
ˆ (n) = w ˆ ( 0 ) + ∑ k ( i )ξ ( i ) w *
i=1
Hence ˆ (n) ε ( n ) = wo – w n
ˆ ( 0 ) – ∑ k ( i )ξ ( i ) = wo – w *
i=1 n
= ε ( 0 ) – ∑ k ( i )ξ ( i ) *
i=1
9.6
The a posteriori estimation error e(n) and the a priori estimation error ξ(n) are related by e ( n ) = γ ( n )ξ ( n ) where γ ( n ) is defined by [see Eq. (9.42) of the text] 1 γ ( n ) = --------------------------------------------------------------------–1 H –1 1 + λ u ( n )Φ ( n-1 )u ( n ) Hence, for n = 1 we have e ( 1 ) = γ ( 1 )ξ ( 1 ) where
201
1 γ ( 1 ) = ---------------------------------------------------------------–1 H –1 1 + λ u ( 1 )Φ ( 0 )u ( 1 ) 1 = ------------------------------------------------------, –1 –1 H 1 + λ δ u ( 1 )u ( 1 ) λδ = -----------------------------, 2 λδ + u ( 1 ) λδ ≈ ----------------, 2 u(1)
–1
–1
Φ (0) = δ I
u(1) = u(1) 0 δ << 1
Hence, γ ( 1 ) is small. Next, we observe that H
ˆ ( n-1 )u ( n ) ξ(n) = d (n) – w For n = 1: H
ˆ ( 0 )u ( 1 ) ξ(1) = d (1) – w = d ( 1 ),
ˆ (0) = 0 w
We thus conclude that e(1) equals a very small quantity namely γ ( 1 ) , multiplied by d(1), which, in turn, makes e(1) small. As n increases, γ ( n ) approaches towards the final value of 1. Correspondingly, the meansquare value of the estimation error e(n) increases towards its steady-state value, Jmin. 9.7
9.7
(a) E[ε(n)] = E[w_o - ŵ(n)] = w_o - E[ŵ(n)]
From Eq. (9.56) of the text we know that, for large n,
E[ŵ(n)] = w_o - (δ/n)p
Therefore,
E[ε(n)] = (δ/n)p
E[ε(n-1)] = (δ/(n-1))p
Hence,
E[ε(n)] - E[ε(n-1)] = -δ/(n(n-1)) p = -(1/n)E[ε(n-1)]
Rearranging terms:
E[ε(n)] ≈ (1 - 1/n)E[ε(n-1)]
(1)
(b) Compare this with the self-orthogonalizing adaptive filter discussed in Section 7.4, for which we can see from Eq. (7.36) that
E[ε(n)] = (1 - α)E[ε(n-1)]
(2)
where the constant α lies in the range 0 < α < 1. For α = 1/n, Eqs. (1) and (2) are the same. This is not surprising, because in the RLS filter the step size µ is the inverse correlation matrix Φ^{-1}(n), whereas in the self-orthogonalizing filter the step size is αR^{-1}, where R is the correlation matrix.
9.8
(a) From Appendix G, using a property of the complex Wishart distribution, we have: for any given nonzero α in R^M, the ratio
α^H R^{-1} α / α^H Φ^{-1}(n) α
is χ^2 (chi-square) distributed with n - M + 1 degrees of freedom. Hence,
E[α^H Φ^{-1}(n) α] = α^H R^{-1} α · E[1/χ^2(n-M+1)]
                   = α^H R^{-1} α / (n - M + 1),        n > M + 1
which further implies that
E[Φ^{-1}(n)] = R^{-1} / (n - M + 1)
Usually n >> M + 1. Hence
E[Φ^{-1}(n)] ≈ (1/n) R^{-1}
(b) By definition, D(n) = E[ε^H(n)ε(n)] and K(n) = E[ε(n)ε^H(n)]; thus
D(n) = tr[K(n)]
From Eq. (9.58), we may obtain
K(n) = σ_o^2 E[Φ^{-1}(n)] = (1/n) σ_o^2 R^{-1}
Hence,
D(n) = (1/n) σ_o^2 tr[R^{-1}]
     = (1/n) σ_o^2 Σ_{i=1}^{M} 1/λ_i
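As a small numerical illustration of the final formula (added here; the eigenvalues and σ_o^2 are arbitrary), D(n) is dominated by the reciprocals of the smallest eigenvalues of R:

import numpy as np

sigma_o2, n = 0.01, 1000                      # arbitrary noise variance and sample size
eigvals = np.array([2.0, 1.0, 0.5, 0.05])     # arbitrary eigenvalue spread of R (M = 4)

D_n = sigma_o2 / n * np.sum(1.0 / eigvals)    # D(n) = (sigma_o^2/n) * sum_i 1/lambda_i
print(D_n, sigma_o2 / n / eigvals)            # total and per-eigenvalue contributions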
9.9
Consider the difference
X(n) = ε^H(n)Φ(n)ε(n) - ε^H(n-1)Φ(n-1)ε(n-1)
(1)
where ε(n) denotes the weight-error vector. From RLS theory:
ε(n) = ε(n-1) - Φ^{-1}(n)u(n)ξ*(n)
(2)
We may therefore rewrite Eq. (1) as
X(n) = [ε(n-1) - Φ^{-1}(n)u(n)ξ*(n)]^H Φ(n) [ε(n-1) - Φ^{-1}(n)u(n)ξ*(n)] - ε^H(n-1)Φ(n-1)ε(n-1)
     = ε^H(n-1)[Φ(n) - Φ(n-1)]ε(n-1)
       - ξ(n)u^H(n)Φ^{-1}(n)Φ(n)ε(n-1)
       - ξ*(n)ε^H(n-1)Φ(n)Φ^{-1}(n)u(n)
       + |ξ(n)|^2 u^H(n)Φ^{-1}(n)Φ(n)Φ^{-1}(n)u(n)
     = ε^H(n-1)[Φ(n) - Φ(n-1)]ε(n-1)
       - ξ(n)u^H(n)ε(n-1) - ξ*(n)ε^H(n-1)u(n)
       + |ξ(n)|^2 u^H(n)Φ^{-1}(n)u(n)
(3)
Moreover, from RLS theory with λ = 1:
Φ(n) = Φ(n-1) + u(n)u^H(n)
(4)
r(n) = 1 + u^H(n)Φ^{-1}(n-1)u(n)
(5)
so that u^H(n)Φ^{-1}(n)u(n) = 1 - r^{-1}(n). Hence, using Eqs. (4) and (5) in (3) yields
X(n) = ε^H(n-1)u(n)u^H(n)ε(n-1)
       - ξ(n)u^H(n)ε(n-1) - ξ*(n)ε^H(n-1)u(n)
       + |ξ(n)|^2 (1 - r^{-1}(n))
(6)
Again, from RLS theory:
ε^H(n-1)u(n) = ξ_u(n)
(7)
ν(n) = ξ(n) - ξ_u(n)
(8)
where ξ_u(n) is the undisturbed a priori estimation error and ν(n) is the noise term. Hence, substituting Eqs. (7) and (8) into Eq. (6), we finally get
X(n) = |ξ_u(n)|^2 - ξ(n)ξ_u*(n) - ξ*(n)ξ_u(n) + |ξ(n)|^2 - r^{-1}(n)|ξ(n)|^2
     = |ξ(n) - ξ_u(n)|^2 - r^{-1}(n)|ξ(n)|^2
     = |ν(n)|^2 - r^{-1}(n)|ξ(n)|^2
(9)
Recalling the definition of X(n) given in Eq. (1), we may thus write
ε^H(n)Φ(n)ε(n) - ε^H(n-1)Φ(n-1)ε(n-1) = |ν(n)|^2 - r^{-1}(n)|ξ(n)|^2
or equivalently
ε^H(n)Φ(n)ε(n) + |ξ(n)|^2/r(n) = ε^H(n-1)Φ(n-1)ε(n-1) + |ν(n)|^2
which is the desired result.
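The relation just derived can be checked numerically (a sketch added here, not in the original solution): running growing-window RLS (λ = 1, Φ(0) = δI, ŵ(0) = 0) on data d(n) = w_o^T u(n) + ν(n), the quantity ε^H(n)Φ(n)ε(n) + |ξ(n)|^2/r(n) should equal ε^H(n-1)Φ(n-1)ε(n-1) + |ν(n)|^2 at every step. The model, dimensions, and noise level are arbitrary choices.

import numpy as np

rng = np.random.default_rng(3)
M, N, delta = 3, 100, 1e-2
w_o = rng.standard_normal(M)                    # true regression vector (arbitrary)

Phi = delta * np.eye(M)                         # Phi(0) = delta*I, lambda = 1 (growing window)
w = np.zeros(M)                                 # w_hat(0) = 0
V_prev = (w_o - w) @ Phi @ (w_o - w)            # eps^H(0) Phi(0) eps(0)

for n in range(N):
    u = rng.standard_normal(M)
    nu = 0.1 * rng.standard_normal()            # noise nu(n) = e_o(n)
    d = w_o @ u + nu
    xi = d - w @ u                              # a priori estimation error xi(n)
    r = 1.0 + u @ np.linalg.solve(Phi, u)       # r(n) = 1 + u^H(n) Phi^{-1}(n-1) u(n)
    Phi = Phi + np.outer(u, u)                  # Phi(n) = Phi(n-1) + u(n)u^H(n)
    w = w + np.linalg.solve(Phi, u) * xi        # w_hat(n) = w_hat(n-1) + Phi^{-1}(n)u(n)xi(n)
    eps = w_o - w
    lhs = eps @ Phi @ eps + xi**2 / r
    assert np.isclose(lhs, V_prev + nu**2)      # the identity of Problem 9.9 holds exactly
    V_prev = eps @ Phi @ eps

print("identity verified at every step")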
CHAPTER 10 10.1
Let α(1) = y(1)
(1)
α ( 2 ) = y ( 2 ) + A 1, 1 y ( 1 )
(2)
where the matrix A_{1,1} is to be determined. This matrix is chosen so as to make the innovations processes α(1) and α(2) uncorrelated with each other; that is,
E[α(2)α^H(1)] = 0
(3)
Substitute Eqs. (1) and (2) into (3):
E[y(2)y^H(1)] + A_{1,1}E[y(1)y^H(1)] = 0
Postmultiplying both sides of this equation by the inverse of E[y(1)y^H(1)] and rearranging:
A_{1,1} = -E[y(2)y^H(1)]{E[y(1)y^H(1)]}^{-1}
We may rewrite Eqs. (1) and (2) in the compact form
[α(1)]   [   I      0 ] [y(1)]
[α(2)] = [ A_{1,1}  I ] [y(2)]
This relation shows that, given the observation vectors y(1) and y(2), we may compute the innovations processes α(1) and α(2). The block lower triangular transformation matrix
[   I      0 ]
[ A_{1,1}  I ]
is invertible, since its determinant equals 1. Hence, we may recover y(1) and y(2) from α(1) and α(2) by using the relation
[y(1)]   [   I      0 ]^{-1} [α(1)]   [    I      0 ] [α(1)]
[y(2)] = [ A_{1,1}  I ]      [α(2)] = [ -A_{1,1}  I ] [α(2)]
In general, we may express the innovations process α(n) as a linear combination of the observation vectors y(1), y(2), ..., y(n) as follows:
α(n) = y(n) + A_{n-1,1}y(n-1) + … + A_{n-1,n-1}y(1)
     = Σ_{k=1}^{n} A_{n-1,k-1} y(n-k+1),        n = 1, 2, …,
where A_{n-1,0} = I. The set of matrices {A_{n-1,k}} is chosen to satisfy the conditions
E[α(n+1)α^H(n)] = 0,        n = 1, 2, …
We may thus write
[α(1)]   [      I            0       …   0 ] [y(1)]
[α(2)]   [   A_{1,1}         I       …   0 ] [y(2)]
[  .  ] = [      .            .       .   . ] [  .  ]
[α(n)]   [ A_{n-1,n-1}  A_{n-1,n-2}  …   I ] [y(n)]
The block lower triangular transformation matrix is invertible, since its determinant equals one. Hence, we may go back and forth between the given set of observation vectors {y(1), y(2), …, y(n)} and the corresponding set of innovations processes {α(1), α(2), …, α(n)} without any loss of information.
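The block lower triangular transformation described above is exactly what a unit-lower-triangular (Cholesky/LDL) factorization of the observation covariance delivers. The sketch below (not part of the original solution, and using an arbitrary AR(1)-type covariance for scalar observations y(1), ..., y(N)) applies T = L^{-1} and confirms that the resulting innovations are uncorrelated:

import numpy as np

N, rho = 6, 0.8
idx = np.arange(N)
R_y = rho ** np.abs(np.subtract.outer(idx, idx))   # arbitrary covariance of [y(1), ..., y(N)]

C = np.linalg.cholesky(R_y)                        # R_y = C C^T, C lower triangular
L = C / np.diag(C)                                 # unit lower triangular factor of R_y
D = np.diag(np.diag(C) ** 2)                       # diagonal matrix of innovation variances
T = np.linalg.inv(L)                               # lower triangular transform: alpha = T y

print(np.allclose(T @ R_y @ T.T, D))               # True: the innovations are uncorrelated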
10.2
First, we note that
E[ε(n,n-1)v_1^H(n)] = E[x(n)v_1^H(n)] - E[x̂(n|Y_{n-1})v_1^H(n)]
Since the estimate x̂(n|Y_{n-1}) consists of a linear combination of the observation vectors y(1), …, y(n-1), and since
H
E [ y ( k )v 1 ( n ) ] = 0,
0≤k≤n
it follows that H
E [ xˆ ( n y n-1 )v 1 ( n ) ] = 0 We also have H E [ x ( n )v 1 ( n ) ]
=
H Φ ( n, 0 )E [ x ( 0 )v 1 ( n ) ] +
n-1
∑ Φ ( n, i )E [ v1 ( i )v1 ( n ) ] H
i=1
Since, by hypothesis H
E [ x ( 0 )v 1 ( n ) ] = 0 and H
E [ v 1 ( i )v 1 ( n ) ] = 0,
0≤i≤n
it follows that H
E [ x ( n )v 1 ( n ) ] = 0 Accordingly, we deduce that H
E [ ε ( n, n-1 )v 1 ( n ) ] = 0 Next, we note that H
H
H
E [ ε ( n, n-1 )v 2 ( n ) ] = E [ x ( n )v 2 ( n ) ] – E [ xˆ ( n y n-1 )v 2 ( n ) ] We have H
E [ x ( n )v 2 ( n ) ] = 0 Also, since ( xˆ ( n ) y n-1 ) consists of a linear combination of y(1),...,y(n-1) and since
H
E [ y ( k )v 2 ( n ) ] = 0 ,
1 ≤ k ≤ n-1
it follows that H
E [ xˆ ( n y n-1 )v 2 ( n ) ] = 0 We therefore conclude that H
E [ ε ( n, n-1 )v 2 ( n ) ] = 0 10.3
The estimated state-error vector equals ε ( i, n ) = x ( i ) – xˆ ( i y n ) n
= x(i) –
∑ bi ( k )α ( k ) k=1
The expected value of the squared norm of ε(i,n) equals 2
H
E [ ε ( i, n ) ] = E [ ε ( i, n )ε ( i, n ) ] n
=
n
∑ ∑
H b i ( k )b i ( l )E [ α * ( k )α ( l ) ] –
k=1 l=1 n
–
∑ E[x
H
n
∑ bi
H
( k )E [ x ( i )α * ( k ) ]
k=1
H
( i )α ( k ) ]b i ( k ) + E [ x ( i )x ( i ) ]
k=1
Differentiating this index of performance with respect to the vector b(k) and setting the result equal to zero, we find that the optimum value of bi(k) is determined by 2b i ( k )E [ α ( k )α * ( k ) ] – 2E [ x ( i )α * ( k ) ] = 0 Hence, the optimum value of bi(k) equals –2
b i ( k ) = E [ x ( i )α * ( k ) ]σ α
where 2
2
σα = E [ α ( k ) ] Correspondingly, the estimate of the state vector equals n
xˆ ( i y n ) =
∑ E [ x ( i )α* ( k ) ]σα α ( k ) –2
k=1 n
=
∑ E [ x ( i )ϕ* ( k ) ]ϕ ( k ) k=1
where ϕ ( k ) is the normalized innovation: α(k ) ϕ ( k ) = ----------σα 10.4
(a) The matrices K(n,n-1) and Q2(n) are both correlation matrices and therefore nonnegative definite. In particular, we note K(n,n-1) = E[ε(n,n-1)εH(n,n-1)] Q2(n) = E[v2(n)v2H(n)] where ε(n,n-1) is the predicted state-error vector, and v2(n) is the measurement noise vector. We may therefore express R(n) in the equivalent form R(n) = E[e(n)eH(n)] where e(n) = C(n)ε(n,n-1) + v2(n) where it is noted that ε(n,n-1) and v2(n) are uncorrelated. We now see that R(n) is a correlation matrix and therefore nonnegative definite. (b) For R(n) to be nonsingular, and therefore for the inverse matrix R-1(n) to exist, we demand that Q2(n) be positive definite such that the determinant of
C(n)K(n,n-1)CH(n) + Q2(n) is nonzero. This requirement, in effect, says that no measurement is exact, hence the unavoidable presence of measurement noise and therefore Q2(n). Such a requirement is reasonable on physical grounds. 10.5
In the limit, when n approaches infinity, we may put K(n+1,n) = K(n,n-1) = K Under this condition, Eq. (10.54) of the text simplifies to K = (I - GC)K(I - CHGH) + Q1 + GQ2GH
(1)
where it is assumed that the state-transition matrix equals the identity matrix, and G, C, Q1 and Q2 are the limiting values of the matrices G(n), C(n), Q1(n) and Q2(n), respectively. From Eq. (10.49) we find that the limiting value of the Kalman gain equals G = KCH(CKCH + Q2)-1
(2)
Expanding Eq. (1) and then using Eq. (2) to eliminate G, we get KCH(CKCH + Q2)-1CK - Q1 = O
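A quick way to exhibit such a limiting K numerically (a sketch added here, not part of the original solution) is to iterate the Riccati recursion with F = I until it converges and then confirm that the displayed algebraic equation is satisfied; the matrices C, Q1, and Q2 below are arbitrary choices.

import numpy as np

rng = np.random.default_rng(5)
M = 2
C = rng.standard_normal((M, M))          # arbitrary (observable) measurement matrix
Q1 = np.eye(M)                           # process-noise covariance
Q2 = 0.5 * np.eye(M)                     # measurement-noise covariance

K = np.eye(M)                            # initial guess for the limiting matrix K
for _ in range(500):                     # Riccati recursion with F = I
    G = K @ C.T @ np.linalg.inv(C @ K @ C.T + Q2)
    K = (np.eye(M) - G @ C) @ K @ (np.eye(M) - G @ C).T + Q1 + G @ Q2 @ G.T

residual = K @ C.T @ np.linalg.inv(C @ K @ C.T + Q2) @ C @ K - Q1
print(np.linalg.norm(residual))          # approximately zero at the fixed point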
10.6
x ( n+1 ) = 0 1 x ( n ) + v 1 ( n ) 1 1
Q1 = I ; E [ v1 ] = 0
y ( n ) = 1 0 x ( n ) + v2 ( n )
Q2 = 1 ; E [ v2 ] = 0
(a) Using Table 10.2, we formulate the recursions for computing the Kalman filter Known parameters: F ( n+1, n ) = F = 0 1 1 1 c(n) = c = 1 0 Q1 ( n ) = I Q2 ( n ) = 1
Unknown parameters:
K ( n ), n-1 ) = K =
k 11 k 12 k 21 k 22
We may then write
G ( n ) = 0 1 K ( n, n-1 ) 1 1 0 K ( n, n-1 ) 1 + 1 1 1 0 0
–1
=
k 21 ( n, n-1 ) -----------------------------------k 11 ( n, n-1 ) + 1 k 11 ( n, n-1 ) + k 21 ( n, n-1 ) ------------------------------------------------------------k 11 ( n, n-1 ) + 1
α ( n ) = y ( n ) – 1 0 xˆ ( n Y n-1 ) = y ( n ) – xˆ 1 ( n Y n-1 ) { k 11 ( n, n-1 ) – k 21 ( n, n-1 ) }xˆ ( n Y ) + k 21 ( n, n-1 )y ( n ) 1 n-1 --------------------------------------------------------------------------------------------------------------------------------------------------k 11 ( n, n-1 ) + 1
) + G ( n )α ( n ) = xˆ ( n+1 Y ) = 0 1 xˆ ( n Y n-1 n 1 1 { 1 – k 21 ( n, n-1 ) }xˆ ( n Y ) + { k 11 ( n, n-1 ) + k 21 ( n, n-1 )y ( n ) } 1 n-1 -------------------------------------------------------------------------------------------------------------------------------------------------------------------
k 11 ( n, n-1 ) + 1
(1) K ( n ) = K ( n, n-1 ) – 0 1 G ( n ) 1 0 K ( n, n-1 ) = 11
k 11 + k 21
------------------------- 0 k 11 + 1 = K ( n, n-1 ) – K ( n, n-1 ) = K ( n, n-1 ) – k 11 + 2 k 21 ---------------------------- 0 k 11 + 1
=
– k 21 + 1 k 11 ---------------------k 11 + 1
– k 21 + 1 k 12 ---------------------k 11 + 1
k 11 + 2 k 21 k 21 – k 11 ---------------------------k 11 + 1
k 11 + 2k 21 k 22 – k ---------------------------12 k + 1 11
k 11 + k 21 k 11 -----------------------k 11 + 1
k 11 + k 21 k 12 -----------------------k 11 + 1
k 11 + 2 k 21 k 11 ---------------------------k 11 + 1
k 11 + k 21 k 12 ---------------------------k 11 + 1
K ( n+1, n ) = 0 1 K ( n ) 0 1 + I = 1 1 1 1
=
k 11 + 2k 21 1 + k 22 – k ------------------------ 12 k 11 + 1 k 11 + k 21 k 22 – k --------------------- 12 k + 1 11
k 21 + k
k 11 + 2k 21 – ( k 11 + k 12 ) ------------------------ 22 k 11 + 1
(2)
1 – k 11 + k 21 1 + k 21 + k + ( k 11 + k 12 ) ------------------------------ 22 k 11 + 1
For this particular case, the Kalman filtering process is entirely described by the pair of equations (1) and (2). (b) The generalized form of the algebraic Riccati equation is: H
H
–1
H
H
K – FKF + FKC ( CKC + Q 2 ) CKF – Q 1 = 0 here we use the fact that the predicted state-error correlation matrix K is symmetric
For our problem:
F = 0 1 , 1 1
Q1 = I ,
C = 10 ,
Q2 = [ 1 ] ,
K =
k1 k2 k2 k3
Therefore, we have:
C k1 k2 – k2 k3
0 1 k1 1 1 k2
k2 0 1 + k3 1 1
K=
B k2 1 + 1 k3 0
–1
1 0
k1 k2 0 1 –I = 0 k2 k3 1 1
k1 0 1 k1 k2 1 1 0 1 1 k 2 k 3 0 k2
A
k k k k 1 A = ------------ 0 1 1 2 1 0 1 2 0 1 k 1 +1 1 1 k k 0 0 k k 1 1 2 3 2 3 k3 k2 1 1 0 k 2 k 1 +k 2 = -----------k 1 +1 k +k k +k 0 0 k k +k 1 2 2 3 3 2 3
k2 0 k2 1 = -----------k 1 +1 k +k 0 k 1 2 3
=
k 2 +k 3
k2 -----------k 1 +1
2
k 2 ( k 1 +k 2 ) ------------------------k 1 +1
k 2 ( k 1 +k 2 ) ------------------------k 1 +1
( k 1 +k 2 ) ---------------------k 1 +1
k B = 01 1 1 1 k2
=
k 1 +k 2
k2
1 = -----------k 1 +1
k2 1 0 k3 k2 0 1 = k3 0 0 k 1 +k 2 k 2 +k 3 1 1 k 2 +k 3
k 1 -k 3
–k 3
–k 3
– k 1 -2k 2
C+A–I = 0
k 2 ( k 1 +k 2 )
k 2 ( k 1 +k 2 ) ( k 1 +k 2 )
2
k 2 +k 3 k 1 +2k +k 3 2
C = K–B =
2
k2
2
2
k2 ------------ + k 1 -k 3 -1 k 1 +1
k 2 ( k 1 +k 2 ) ------------------------- – k 3 k 1 +1
k 2 ( k 1 +k 2 ) ------------------------- – k 3 k 1 +1
( k 1 +k 2 ) ---------------------- – k 1 -2k 2 -1 k 1 +1
2
= 00 00
that is,
k3 = k2(k1 + k2)/(k1 + 1)
-k1 k2/(k1 + 1) + k1 - 1 = 0
(k1 + k2)^2/(k1 + 1) - k1 - 2k2 - 1 = 0
From the second of these equations,
k2 = (k1^2 - 1)/k1
while the third reduces to
k2^2 - 2k1 - 2k2 - 1 = 0
Substituting k2 = (k1^2 - 1)/k1 into this condition gives
((k1^2 - 1)/k1)^2 - 2(k1^2 - 1)/k1 - 2k1 - 1 = 0
and multiplying through by k1^2:
k1^4 - 2k1^2 + 1 - 2k1^3 + 2k1 - 2k1^3 - k1^2 = 0
k1^4 - 4k1^3 - 3k1^2 + 2k1 + 1 = 0
This equation has four different real-valued solutions, but it is easy to show that only one of them is meaningful in the context of our problem.
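A short numerical root check (added here, not part of the original solution) makes the last remark concrete: solving the quartic and reconstructing K from each real root shows that only one root gives a positive semidefinite steady-state matrix.

import numpy as np

roots = np.roots([1.0, -4.0, -3.0, 2.0, 1.0])     # k1^4 - 4k1^3 - 3k1^2 + 2k1 + 1 = 0
for k1 in roots[np.abs(roots.imag) < 1e-9].real:
    k2 = (k1**2 - 1.0) / k1
    k3 = k2 * (k1 + k2) / (k1 + 1.0)
    K = np.array([[k1, k2], [k2, k3]])
    ok = np.all(np.linalg.eigvalsh(K) >= -1e-9)   # K must be positive semidefinite
    print(round(k1, 4), "admissible" if ok else "rejected")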
10.7
(a) The state equations are x(n+1) = F(n+1,n)x(n) + v1(n)
(1)
y(n) = C(n)x(n) + v2(n)
(2)
We are given x(n) = [a1(n),...,aM(n),...,aM+N(n)]T C(n) = [-y(n-1),...,-y(n-M),v(n-1),...,v(n-N)] We are also given ak(n+1) = ak(n) + wk(n),
k = 1,...,M+N
which means that x(n+1) = x(n) + w(n)
(3)
Comparing Eqs. (1) and (3), we therefore deduce that for the problem at hand
F(n+1,n) = I and v_1(n) = w(n)
We next note that the difference equation describing the given ARMA process may be recast as follows:
y(n) = -Σ_{k=1}^{M} a_k(n)y(n-k) + Σ_{k=1}^{N} a_{M+k}(n)v(n-k) + v(n)
     = C(n)x(n) + v(n)
(4)
Comparing Eqs. (2) and (4) we therefore deduce y(n) = y(n) v2(n) = v(n) This completes the evaluation of the state equations. (b) Since the transition matrix equals the identity matrix, we find that the predicted and filtered forms of the state vector are the same. Hence,
G(n) = K(n,n-1)CH(n)[C(n)K(n,n-1)CH(n) + Q2(n)]-1 α(n) = y(n) - C(n)x(n|Yn-1) xˆ ( n+1 Y n ) = xˆ ( n Y n-1 ) + G ( n )α ( n ) x ( n Y n ) = xˆ ( n+1 Y n ) K(n) = K(n,n-1) - G(n)C(n)K(n,n-1) K(n+1,n) = K(n) + Q1(n) (c) To initialize the algorithm, we set xˆ ( 0 Y 0 ) = E [ x ( 0 ) ] , H
K ( 0 ) = E [ x ( 0 )x ( 0 ) ] , 10.8
v ( n ) = ν1 ( n ) w ( n ) = ν2 ( n )
We are given the state equations x(n+1) = A x(n) + b v(n)
(1)
y(n) = hTx(n) + w(n)
(2)
We may therefore make the following identifications: F(n+1,n) = A v(n) = b v(n) C(n) = hT v2(n) = w(n) Using the Kalman filtering algorithm, we have 2 –1
T
G ( n ) = AK ( n, n-1 )h [ h K ( n, n-1 )h + σ w ] T
α ( n ) = y ( n ) – h xˆ ( n Y n-1 )
218
xˆ ( n+1 Y n-1 ) = Axˆ ( n Y n-1 ) + G ( n )α ( n ) xˆ ( n Y n ) = Axˆ ( n+1 Y n ) T
K ( n ) = K ( n, n-1 ) – AG ( n )h K ( n, n-1 ) T
2
K ( n+1,n ) = AK ( n )A + σ v b b
T
We note the following: 2
T
1.
The factor ( h K ( n, n-1 )h + σ w ) is a scalar.
2.
The Kalman gain G(n) is a vector.
3.
The matrix A is M-by-M. The matrix K(n,n-1) is also M-by-M. The vector h is M-by-1. Hence, the Kalman gain is an M-by-1 vector.
The dynamics of the message source are represented by the state Eq. (1), where the M-byM matrix A and the M-by-1 vector b are defined by
...
...
0 . . . . . . 0 10 . . . . . 0 A = 0 1 0 . . . . 0 0 . . . . 0 10
...
1 b = 0 0
The elements of the state vector x(n) consist of the channel input represented by u(n) = v(n) and successively delayed versions of it; v(n) is modeled as a random binary white noise sequence. Equation (1) simply states that each succeeding component at time n is equal to the previous component at time n-1. The channel output is described by Eq. (2), where y(n) is the measured output, hT(n) is the 1-by-M row vector of channel coefficients, and w(n) is a Gaussian white noise sequence independent of v(n).
For the digital communication system described by Eqs. (1) and (2), use of the Kalman filter yields xˆ ( n+1 Y n ) = Axˆ ( n Y n-1 ) + G ( n )α ( n ) where G(n) is the Kalman gain, and T
α(n) = y(n) - h^T x̂(n|Y_{n-1})
The resulting general form for the Kalman filter is depicted in Fig. 1. We see from this figure that the Kalman filter consists of an IIR filter with forward coefficients defined by the channel impulse response and feedback coefficients defined by the Kalman gain (which in this problem is a column vector).
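The two equations above, together with the recursions listed earlier in this problem, are straightforward to simulate (a sketch added here, not part of the original solution). A is the shift matrix and b the first unit vector, as specified above; the particular channel h and the noise variances are arbitrary choices.

import numpy as np

rng = np.random.default_rng(6)
M = 4
h = np.array([0.8, 0.4, 0.2, 0.1])               # arbitrary channel impulse response
sigma_v2, sigma_w2 = 1.0, 0.1                    # input and measurement noise variances

A = np.eye(M, k=-1)                              # shift matrix: state holds delayed inputs
b = np.zeros(M); b[0] = 1.0                      # channel input enters the first component

x = np.zeros(M)                                  # true state x(n)
x_hat = np.zeros(M)                              # predicted estimate x_hat(n|Y_{n-1})
K = np.eye(M)                                    # prediction-error covariance K(n, n-1)
for n in range(200):
    v = rng.choice([-1.0, 1.0])                  # random binary channel input v(n)
    y = h @ x + np.sqrt(sigma_w2) * rng.standard_normal()   # channel output y(n)

    denom = h @ K @ h + sigma_w2                 # scalar innovation variance
    g = K @ h / denom                            # filtered gain vector
    G = A @ g                                    # Kalman (predictor) gain G(n)
    alpha = y - h @ x_hat                        # innovation alpha(n)
    x_hat = A @ x_hat + G * alpha                # x_hat(n+1|Y_n)
    K = A @ (K - np.outer(g, h @ K)) @ A.T + sigma_v2 * np.outer(b, b)

    x = A @ x + b * v                            # advance the true state

print(np.round(x, 2), np.round(x_hat, 2))        # delayed channel inputs vs. their predictions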
Figure 1: Kalman filter for the channel model — an IIR structure whose forward coefficients h_0, h_1, ..., h_{M-1} are the channel impulse response, whose feedback coefficients g_1, ..., g_M are the elements of the Kalman gain, and whose innovation α(n) is formed by subtracting the estimate h^T x̂(n|Y_{n-1}) from the channel output y(n); the mean of the input v(n) is given to be zero.
10.9
We start with α(n) = y(n) - C(n)x(n)|Yn-1)
(1)
xˆ (n+1|Yn) = F(n+1,n) xˆ (n|Yn-1) + G(n)α(n)
(2)
Substituting Eq. (1) into (2): xˆ ( n+1 Y n ) = F ( n+1, n )xˆ ( n Y n-1 ) + G ( n ) [ y ( n ) – C ( n )xˆ ( n Y n-1 ) ]
The filtered state estimate is xˆ ( n Y n ) = F ( n, n+1 )xˆ ( n+1 Y n ) = xˆ ( n Y n-1 ) + F ( n+1, n )G ( n ) [ y ( n ) – C ( n )xˆ ( n Y n-1 ) ] = [ I – F ( n+1, n )G ( n )C ( n ) ]xˆ ( n Y n-1 ) + F ( n+1, n )G ( n )y ( n ) But y(n) = C(n)x(n) + v2(n) Hence, xˆ ( n Y n ) = [ I – F ( n, n+1 )G ( n )C ( n ) ]xˆ ( n Y n-1 ) + F ( n, n+1 )G ( n )C ( n )x ( n ) + F ( n, n+1 )G ( n )v 2 ( n ) Taking expectations, and recognizing that the measurement noise vector v2(n) has zero mean: E [ xˆ ( n Y n ) ] = [ I – F ( n, n+1 )G ( n )C ( n ) ]E [ xˆ ( n Y n-1 ) ] + F ( n, n+1 )G ( n )C ( n )E [ xˆ ( n ) ] For n=1, we thus get E [ xˆ ( 1 Y 1 ) ] = [ I – F ( 1, 2 )G ( 1 )C ( 1 ) ]E [ xˆ ( 1 Y 0 ) ] + F ( 1, 2 )G ( 1 )C ( 1 )E [ x ( 1 ) ] Since the one-step prediction xˆ ( 1 Y 0 ) must be specified, we have E [ xˆ ( 1 Y 0 ) ] = xˆ ( 1 Y 0 ) Accordingly, substituting the choice of the initial condition xˆ ( 1 Y 0 ) = E [ x ( 1 ) ] in Eq. (1) yields
(1)
E [ xˆ ( 1 Y 1 ) ] = [ I – F ( 1, 2 )G ( 1 )C ( 1 ) ]E [ x ( 1 ) ] + F ( 1, 2 )G ( 1 )C ( 1 )E [ x ( 1 ) ] = E [x(1)] By induction, we may go on to show that in general: E [ xˆ Y n ] = E [ x ( n ) ] In other words, the filtered estimate xˆ ( n y n ) produced by the Kalman filter is an unbiased estimate for the specified method of initialization. 10.10 MAP derivation of Kalman filter f XY ( x ( n ), Y n ) f XY ( x ( n ), y ( n ), Y n-1 ) (a) f X ( x ( n ) y ( n ) ) = ------------------------------------ = -----------------------------------------------------f Y (Y n) f Y ( y ( n ), Y n-1 )
(1)
where f XY ( x ( n ), y ( n ), Y n-1 ) = f Y ( y ( n ) x ( n ), Y n-1 ) f XY ( x ( n ), Y n-1 ) = f Y ( y ( n ) x ( n ), Y n-1 ) f X ( x ( n ) Y n-1 ) f Y ( Y n-1 ) = f Y ( y ( n ) x ( n ) ) f X ( x ( n ) Y n-1 ) f Y ( Y n-1 )
(2)
where we have used the fact that y ( n ) = C ( n )x ( n ) + v 2 ( n ) , and v 2 ( n ) does not depend on Yn-1. Consequently, f Y ( y ( n ) x ( n ), Y n-1 ) = f Y ( y ( n ) x ( n ) ) Substituting Eq. (2) into (1), we thus get f Y ( y ( n ) x ( n ) ) f X ( x ( n ) Y n-1 ) f Y ( Y n-1 ) f X ( x ( n ) Y n ) = ------------------------------------------------------------------------------------------------f Y ( y ( n ), Y n-1 ) f Y ( y ( n ) x ( n ) ) f X ( x ( n ) Y n-1 ) f Y ( Y n-1 ) = ------------------------------------------------------------------------------------------------f Y ( y ( ( n ) Y n-1 ) ) f Y ( Y n-1 )
f Y ( y ( n ) x ( n ) ) f X ( x ( n ) Y n-1 ) = -------------------------------------------------------------------------f Y ( y ( n ), Y n-1 )
(1)
(b) Examine first f Y ( y ( n ) x ( n ) ) ; its mean is E [ y ( n ) x ( n ) ] = E [ C ( n )x ( n ) + v 2 ( n )x ( n ) ] = C ( n )x ( n ) The variance is var [ y ( n ) x ( n ) ] = var [ v 2 ( n ) x ( n ) ] = Q 2 ( n ) , where Q2(n) = correlation matrix of measurement noise v2(n). Thus, assuming Gaussianity, we may write H –1 1 f Y ( y ( n ) x ( n ) ) = A 1 exp – --- ( y ( n ) – C ( n )x ( n ) ) Q 2 ( n ) ( y ( n ) – C ( n )x ( n ) ) 2
(2)
where the constant A1 is a proper scaling factor. Consider next f X ( x ( n ) Y n-1 ) ; its mean is E [ x ( n ) Y n-1 ] = E [ F ( n, n-1 )xˆ ( n-1 ) + v 1 ( n-1 ) Y n-1 ] = F ( n, n-1 )xˆ ( n-1 ) = xˆ ( n Y n-1 ) The variance is var [ x ( n ) Y n-1 ] = var [ x ( n ) – xˆ ( n ) Y n-1 ] = var [ ε ( n, n-1 ) ] where ε(n,n-1) is the state-error vector. Denote this variance by K(n,n-1), which is to be determined. Again, assuming Gaussianity, we may write
H –1 1 f X ( x ( n ) Y n-1 ) = A 2 exp – --- ( x ( n )-xˆ ( n ) Y n-1 ) K ( n, n-1 ) ( x ( n )-xˆ ( n ) Y n-1 ) (3) 2
where A2 is another appropriate scaling factor. Thus, substituting Eqs. (2) and (3) into (1), we get H –1 1 f X ( x ( n ) Y n ) = A exp – --- ( y ( n ) – C ( n )x ( n ) ) Q 2 ( n ) ( y ( n ) – C ( n )x ( n ) ) 2 H –1 1 – --- ( x ( n ) – xˆ ( n ) Y n-1 ) K ( n, n-1 ) ( x ( n ) – xˆ ( n ) Y n-1 )) 2
(4)
)
where A = A1A2 is another constant. (v) By definition, the MAP estimate of the state is defined by the condition ∂ ln f X ( x ( n ) Y n ) = 0 ----------------------------------------x ( n ) = xˆ MAP ( n ) ∂(x(n))
(5)
Hence, substituting Eq. (4) into (5) yields H
–1
–1
–1
–1
xˆ MAP ( n ) = [ C ( n )Q 2 ( n )C ( n ) + K ( n, n-1 ) ] [ K ( n, n-1 )xˆ ( n Y n-1 ) H
–1
+ C ( n )Q 2 ( n )y ( n ) ]
(6)
From a computational point of view, we need to put the first inverse matrix into a more convenient form. To that end, we apply the matrix inversion lemma, which may be stated as follows (see Section 9.2 of the text). If A = B
–1
–1 H
+ CD C
then A
–1
H
–1 H
= B – BC ( D + C BC ) C B H
–1
–1
With the expression [ C ( n )Q 2 ( n )C ( n ) + K ( n, n-1 ) ] we note the following B = K ( n, n-1 )
–1
as the issue of concern,
H
C = C (n) D = Q2 ( n ) Hence, applying the matrix inversion lemma: H
–1
–1
[ C ( n )Q 2 ( n )C ( n ) + K ( n, n-1 ) ]
–1
H
H
–1
= K ( n, n-1 )-K ( n, n-1 )C ( n ) ( Q 2 ( n )+C ( n )K ( n, n-1 )C ( n ) ) C ( n )K ( n, n-1 ) (7) Substituting Eq. (7) into (6), and then going through some lengthy but straightforward algebraic manipulations, we get xˆ MAP ( n ) = xˆ ( n Y n-1 ) + G ( n ) [ y ( n ) – C ( n )xˆ ( n Y n-1 ) ]
(8)
where G(n) is the Kalman gain defined by H
H
G ( n ) = F ( n+1, n )K ( n, n-1 )C ( n ) [ C ( n )K ( n, n-1 )C ( n ) + Q 2 ( n ) ]
–1
(9)
The one issue that is yet to be determined is K(n,n-1). Here we note ∈ ( n, n-1 ) = x ( n ) – xˆ ( n Y n-1 ) = F ( n, n-1 )x ( n-1 ) + v 1 ( n-1 ) – F ( n, n-1 )xˆ MAP ( n – 1 ) = F ( n, n-1 ) ∈
MAP
( n-1 ) + v 1 ( n-1 )
Therefore, H
K ( n, n-1 ) = F ( n, n-1 )K ( n-1 )F ( n, n-1 ) + Q 1 ( n-1 ) where K ( n ) = var[ ∈ MAP( n )] There only remains for us to determine K(n-1). Here we note ∈ MAP( n ) = x ( n ) – xˆ MAP ( n )
(10)
= x ( n ) + ( – xˆ ( n Y n-1 ) ) + G ( n ) [ y ( n ) – C ( n )xˆ ( n Y n-1 ) But y ( n ) = C ( n )x ( n ) + v 2 ( n ) Hence, noting that ∈ ( n, n-1 ) = x ( n ) – xˆ ( n Y n-1 ) , ∈ MAP( n ) =
∈ ( n, n-1 ) – G ( n ) [ C ( n ) ∈ ( n, n-1 ) + v 2 ( n ) ]
= [I – G ( n )C ( n )] ∈ ( n, n-1 ) – G ( n )v 2 ( n ) which yields K ( n ) = var[ ∈ MAP( n )] H
H
= [ I – G ( n )C ( n ) ]K ( n, n-1 ) [ I – G ( n )C ( n ) ] + G ( n )Q 2 ( n )G ( n ) After some manipulations, this formula reduces to K ( n ) = K ( n, n-1 ) – F ( n, n+1 )G ( n )C ( n )K ( n, n-1 )
(11)
The algorithm for computing the MAP estimate x̂_MAP(n) is now complete. It is made up of Eqs. (8) through (11). Indeed, comparing these equations with the Kalman filtering algorithm summarized in Table 10.2 of the text, we find that the MAP estimate x̂_MAP(n) is nothing but the filtered estimate x̂(n) of standard Kalman filter theory.
(d) The second derivative of ln f_X(x(n)|Y_n) is given by the expression -K^{-1}(n,n-1), which is always negative. Hence, the condition of Eq. (2) in Problem 10.10 is satisfied by the MAP estimate x̂_MAP(n).
10.11
Given the system
x(n+1) = Fx(n)
y(n) = Cx(n)
(a)
Show that
xˆ ( n Y n ) = F ( I – G ( n )C )xˆ ( n Y n-1 ) + CG ( n )y ( n ) α ( n ) = y ( n ) – Cxˆ ( n Y n-1 ) ˜ By definition, the innovation is α ( n ) = y ( n ) – yˆ ( n Y n-1 ) ˜ Following the same reasoning used to derive Eq. (10.31) of the text, we have yˆ ( n Y n-1 ) = Cxˆ ( n Y n-1 ) Therefore, α ( n ) = y ( n ) – Cxˆ ( n Y n-1 ) which confirms the second equation of the problem. ˜ Next, y ( n ) = Cxˆ ( n ) ⇒ α ( n ) = [ Cx ( n ) – xˆ ( n Y n-1 ) ] = Cε ( n, n-1 ) ˜ H
H
R ( n ) = E [ α ( n )α ( n ) ] = CE [ ε ( n, n-1 )ε ( n, n-1 ) ]C ˜ ˜ ˜
H
= CK ( n, n-1 )C
H
Similarly to Eq. (10.45) of the text, we can write: xˆ ( n Y n-1 ) = Fxˆ ( n Y n-1 ) + G ( n )α ( n ) ˜ H
–1
H
–1
G ( n ) = E [ x ( n+1 )α ( n ) ]R ( n ) = FK ( n, n-1 )C R ( n ) ˜ Here we have used Eqs. (10-44)-(10.49) of the text as a model. ε ( n+1,n ) = [ F – G ( n )C ]ε ( n, n-1 ) ˜ ˜ H
H
H
H
K ( n+1,n ) = E [ ε ( n+1,n )ε ( n+1,n ) ] = [ F – G ( n )C ]K ( n, n-1 ) [ F – C G ( n ) ] ˜ ˜ H = FK ( n )F –1
K ( n ) = ( I – F G ( n )C )K ( n, n-1 ) Similarly to Eq. (10.58) of the text,
–1
–1
xˆ ( n Y n ) = F xˆ ( n+1 Y n ) = xˆ ( n Y n-1 ) + F G ( n )α ( n ) ˜ –1
H
= xˆ ( n Y n-1 ) + ( I – F G ( n )C )K ( n, n-1 )F y ( n ) – Cxˆ ( n Y n-1 ) –1
–1
= xˆ ( n Y n-1 ) – F G ( n )C ( n Y n-1 ) + F G ( n )y ( n ) –1
–1
= ( I – F G ( n )C )xˆ ( n Y n-1 ) + F G ( n )y ( n ) (Note: An error was made in the first printing of the book.) (b)
The innovations sequence α(1), α(2),...,α(n) consists of samples that are orthogonal to each other; see Property 2, p.467 of the text. Orthogonality is synonymous with whitening. Moreover, invoking the very definition of the innovations process, we may refer to the Kalman filter as a whitening filter.
H 10.12 K XY ( n ) = E [ x ( n ) – xˆ ( n Y n-1 ) ] [ y ( n ) – yˆ ( n Y n-1 ) ] H K YY ( n ) = E [ y ( n ) – yˆ ( n Y n-1 ) ] [ y ( n ) – yˆ ( n Y n-1 ) ] –1
Show that G f ( n ) = K XY ( n )K YY ( n ) The innovation vector α ( n ) = y ( n ) – yˆ ( n Y n-1 ) - by definition. H
Therefore, K YY ( n ) = E [ α ( n )α ( n ) ] = R ( n ) Also, by definition, –1
H
–1
G f ( n ) = F ( n+1, n )G ( n ) = F ( n, n+1 )E [ x ( n+1 )α ( n ) ]R ( n )
H H H E [ x ( n+1 )α ( n ) ] = F ( n+1, n )E [ x ( n )α ( n ) ] = F ( n+1, n )E [ x ( n )-xˆ ( n Y n-1 ) ]α ( n ) ˜ ˜ ˜ = F ( n+1, n )K XY ( n )
H
Here we have used the fact that the estimate xˆ ( n Y n-1 ) is orthogonal to α ( n ) . ˜ Therefore, adding it inside the expectation term does not change the expectation value. Using the results obtained: –1
–1
G f ( n ) = F ( n, n-1 )F ( n+1, n )K XY ( n )R ( n ) = K XY ( n )K YY ( n ) 10.13 Let us consider the unforced dynamical model x ( n+1 ) = F ( n+1, n )x ( n )
(1)
H
y ( n ) = u ( n )x ( n ) + v ( n )
(2)
1 where F ( n+1, n ) = ------- I λ From (1) we can see that x(n) = λ
–n ⁄ 2
x(0)
(3) H
(a) Considering a deterministic multiple regression model d ( n ) = e o ( n ) + w o u ( n ) , we can write:
...
...
...
* H * d ( 0 )=u ( 0 )w o + e o ( 0 ) d * ( 1 )=u H ( 1 )w + e * ( 1 ) o o d * ( n )=u H ( n )w + e * ( n ) o o
which represents a deterministic system of linear equations (b) From Eqs. (1), (2) and (3), we have:
H y ( 0 )=u ( 0 )x ( 0 ) + v ( 0 ) y ( 0 )=u ( 0 )x ( 0 ) + v ( 0 ) H –1 ⁄ 2 1⁄2 H –1 ⁄ 2 y ( 1 )=u ( 1 )x ( 0 )λ + v ( 0 ) ⇔ λ y ( 1 )=u ( 1 )x ( 0 )λ + v(0) H –1 ⁄ 2 n⁄2 H –1 ⁄ 2 y ( n )=u ( n )x ( 0 )λ + v(0) λ y ( n )=u ( n )x ( 0 )λ + v(0) ...
...
...
...
...
...
H
which represents a stochastic system of linear equations. (c) Both the stochastic and the determinitic systems of linear simultaneous equations describe the same problem, and therefore should have a common solution. Comparing the two systems, we can set x ( 0 )=w o –n ⁄ 2 * y ( n )=λ d (n) – n ⁄ 2 * v ( n )=λ eo ( n ) 10.14 The reason for the RLS operating satisfactorily is the fact that the minimum mean-square error follows the recursion (see Eq. (9.30)) *
E min ( n ) = λE min ( n-1 ) + ξ ( n )e ( n ) Hence, with λ in the interval 0 < λ < 1, the stored value of the minimum mean-square error, Emin(n-1), is reduced by the factor λ as the minimum mean-square error is recursively updated. 10.15 Consider first the correspondence between xˆ ( 1 Y 0 ) prediction, we have xˆ ( n + 1 Y n ):
λ
–n ⁄ 2
ˆ ( 0 ) . For the one-step and w
ˆ (n) w
ˆ ( 0 ) in Therefore, setting n=0, we see that xˆ ( 1 Y 0 ) in the Kalman filter corresponds to w the RLS algorithm. Consider next the correlation matrix of the error in state prediction, which is defined by K(n):
λ-1P(n)
230
Putting n=0, we readily see that K(0) in the Kalman filter corresponds to λ-1P(0) in the RLS algorithm. 10.16 The condition number of K(n) is λ max χ ( K ) = -----------λ min Given that K(n) = U(n)D(n)UH(n) we may write H
χ ( K ) = χ ( UDU ) H
≤ χ ( U )χ ( D )χ ( U )
(1)
The eigenvalues of an upper triangular matrix are the same as the diagonal elements of the matrix. For the situation at hand, the upper triangular matrix has 1’s for all its diagonal elements. Hence, the λmax and λmin of U(n) are both equal to one, and so H
χ(U) = χ(U ) = 1 Accordingly, Eq. (1) simplifies to χ(D) ≥ χ(K) 10.17 Let K(n) = K1/2(n)KH/2(n) Hence, χ(K) ≤ χ(K
1⁄2
)χ ( K
H⁄2
) = (χ(K
1⁄2
))
2
The implication of this result is that the condition number of the square root K1/2(n) is the square root of the condition number of the original matrix K(n). 10.18 Adding the known vector d(n) to the state equation is equivalent to assuming that the state noise vector v1(n) has a mean equal to d(n). In general, we have
xˆ ( n + 1 Y n ) = F ( n + 1, n )xˆ ( n Y n ) + vˆ 1 ( n Y n ) If therefore v1(n) has a mean d(n), it follows that xˆ ( n + 1 Y n ) = F ( n + 1, n )xˆ ( n Y n ) + d ( n )
CHAPTER 11 11.1
To drive the square-root information filter, we proceed as follows. Let λ
1⁄2
K
–H ⁄ 2
( n-1 )
λ
A ( n ) = xˆ H ( n Y )K – H ⁄ 2 ( n-1 ) n-1 0
T
1⁄2
u(n)
y* ( n ) 1
Taking the Hermitian transpose:
H
A (n) =
λ
1⁄2
λ
K
–1 ⁄ 2
( n-1 )
1⁄2 H
u (n)
K
–1 ⁄ 2
( n-1 )xˆ ( n Y n-1 )
y(n)
0 1
Hence, postmultiplying A(n) by AH(n):
λK H A ( n )A ( n ) =
(
λ
λ
–1
H ( n-1 )+λu ( n )u ( n )
1⁄2
–1 xˆ ( n Y n-1 )K ( n-1 ) 1⁄2 H +λ u ( n )y* ( n )
1⁄2 H u (n)
λ
)
1 ⁄ 2 –1 1⁄2 K ( n-1 )xˆ ( n Y n-1 )+λ u ( n )y ( n ) –1 H λxˆ ( n Y n-1 )K ( n-1 )xˆ ( n-1 Y n )
y(n)
This result pertains to the pre-array of the square-root information filter. Consider next the post-array of the square-root information filter. Let
B(n) =
B 11 ( n ) b 21 ( n ) b 31 ( n ) 0
T
b 22 ( n ) b 32 ( n )
Taking the Hermitian transpose:
λ
1⁄2
u(n)
y* ( n )
1
H
B 11 ( n ) H
B (n) =
0
H
*
H
*
b 21 ( n ) b 22 ( n ) b 31 ( n ) b 32 ( n )
Hence, pre-multiplying B(n) by BH(n): H B 11 ( n )B 11 ( n ) H B ( n )B ( n )
= b ( n )B ( n ) 21 11 H b 31 ( n )B 11 ( n )
H B 11 ( n )b 21 ( n )
H H B 11 ( n )b 31 ( n )
H 2 b 21 ( n )b 21 ( n ) + b 22 ( n ) H * b 31 ( n )b 31 ( n ) + b 32 ( n )b 22 ( n )
H * b 21 ( n )b 31 ( n ) + b 22 ( n )b 32 ( n ) H 2 b 31 ( n )b 31 ( n ) + b 32 ( n )
Equating terms in A(n)AH(n) to corresponding terms in BH(n)B(n), we get, H
–1
H
1. B 11 ( n )B 11 ( n ) = λK ( n-1 ) + λu ( n )u ( n ) –1
= K (n) = K
–H ⁄ 2
( n )K
–1 ⁄ 2
(n)
Hence, H
B 11 ( n ) = K
–H ⁄ 2
(n)
H
2. B 11 ( n )b 21 ( n ) = λ
1⁄2
–1
K ( n-1 )xˆ ( n Y n-1 ) + λ
–1
= K ( n )xˆ ( n+1 Y n ) Hence, b 21 ( n ) = K H
–H ⁄ 2
( n )xˆ ( n+1 Y n )
3. B 11 ( n )b 31 ( n ) = λ
1⁄2
u(n)
Hence,
1⁄2
u ( n )y ( n )
b 31 ( n ) = λ
1⁄2
K
H⁄2
( n )u ( n )
H
4. b 31 ( n )b 31 ( n ) + b 32 ( n )
2
= 1
Hence, b 32 ( n )
2
H
= 1 – λu ( n )K ( n )u ( n ) –1
= r (n) That is, b 32 = r
–1 ⁄ 2
(n)
H
5. b 21 ( n )b 21 ( n ) + b 22 ( n )
2
H
–1
= xˆ ( n Y n-1 )K ( n-1 )xˆ ( n Y n-1 ) + y ( n )
2
Hence, b 22 ( n )
2
H
–1
= xˆ ( n Y n-1 )K ( n-1 )xˆ ( n Y n-1 ) + y ( n )
2
–1
– xˆ ( n+1 Y n )K ( n )xˆ ( n+1 Y n )
(1)
H
6. b 31 ( b )b 21 ( n ) + b *32 ( n )b 22 ( n ) = y ( n ) Hence, r
–1 ⁄ 2
( n )b 22 ( n ) = y ( n ) – λ
1⁄2 H
u ( n )K
1⁄2
( n )K
–1 ⁄ 2
( n )xˆ ( n+1 Y n )
That is, b 22 ( n ) = r
1⁄2
(n)[ y(n) – λ
1⁄2 H
u ( n )xˆ ( n+1 Y n ) ]
But, we know that xˆ ( n+1 Y n ) = λ
–1 ⁄ 2
xˆ ( n Y n-1 ) + α ( n )g ( n )
where g(n) is the Kalman gain and α(n) is the innovation. Therefore,
b 22 ( n ) = r = r = r
1⁄2
1⁄2 1⁄2
H
( n ) [ y ( n ) – u ( n )xˆ ( n Y n-1 ) – λ (n)[α(n) – λ
1⁄2
( n )α ( n ) [ 1 – λ
1⁄2
H
α ( n )u ( n )g ( n ) ]
H
α ( n )u ( n )g ( n ) ]
1⁄2 H
u ( n )g ( n ) ]
But, λ
1⁄2
–1
–1
g ( n ) = K ( n )u ( n ) ⋅ r ( n )
Therefore, b 22 ( n ) = r = r = r
1⁄2
H
–1
–1
( n )α ( n ) [ 1 – u ( n )K ( n )u ( n )r ( n ) ]
–1 ⁄ 2 –1 ⁄ 2
H
–1
( n )α ( n ) [ r ( n ) – u ( n )K ( n )u ( n ) ] ( n )α ( n )
(2)
where we have used r(n) = 1 + uH(n)K-1(n)u(n). Final Check In a laborious way, it can be shown that Eq. (1) is satisfied exactly by the value of b22(n) defined in Eq. (2). 11.2
Consider a linear dynamical system described by the state-space model:
x(n+1) = λ^{-1/2} x(n)
(1)
y(n) = u^H(n)x(n) + v(n)
(2)
where v(n) is a Gaussian variable of zero mean and variance Q(n). The Kalman filtering algorithm for this model is described by:
g(n) = λ^{-1/2} K(n-1)u(n)R^{-1}(n)
R(n) = u^H(n)K(n-1)u(n) + Q(n)
α(n) = y(n) - u^H(n)x̂(n|Y_{n-1})
x̂(n+1|Y_n) = λ^{-1/2} x̂(n|Y_{n-1}) + g(n)α(n)
K(n) = λ^{-1} K(n-1) - λ^{-1/2} g(n)u^H(n)K(n-1)
Then, proceeding in a manner similar to that described in Chapter 11, we may formulate the extended square-root information filter for the state-space model of Eqs. (1) and (2) as follows: –1
–1
–1
H
K ( n ) = λ [ K ( n-1 ) + Q ( n )u ( n )u ( n ) ] –1
K ( n )xˆ ( n+1 Y n ) = λ
–1 ⁄ 2
(3)
–1
–1
[ K ( n-1 )xˆ ( n Y n-1 ) + Q ( n )u ( n )y ( n ) ]
(4)
where g(n) is the Kalman gain (vector). Thus, in light of Eqs. (3) and (4), we may formulate the following array structure for the square-root information filter:
λ
1 ⁄ 2 –H ⁄ 2
K
( n-1 )
H
xˆ ( n Y n-1 )K T 0
–H ⁄ 2
λ ( n-1 )
Q
–1 ⁄ 2 –1 ⁄ 2
Q
–1 ⁄ 2
Q
( n )u ( n )
–H ⁄ 2
(n)
0
Θ ( n ) = xˆ H ( n+1 Y )K – H ⁄ 2 ( n ) n
( n )y ( n )
–1 ⁄ 2
K
λ
(n)
–1 ⁄ 2 –1
Q
H
( n )u ( n )K
R
1⁄2
(n)
–1 ⁄ 2
R
( n )α * ( n )
–1 ⁄ 2
(n)
This equation includes an ordinary square-root information filter as a special case. Specifically, putting Q(n) = 0 we get the pre-array-to-post-array transformation for that filter. 11.3
(a) Verify the expression for the extended square-root information filter given by λ
1 ⁄ 2 –H ⁄ 2 K ( n-1 )
xˆ ( n Y n-1 ) K 0 λ
–H ⁄ 2
( n-1 )
T
1⁄2 1⁄2 K ( n-1 )
λ
1⁄2
u(n)
*
K
–H ⁄ 2
( n-1 )
y (n)
xˆ ( n+1 Y n ) K Θ ( n )=
1
λ
0
( n-1 )
1⁄2 H 1⁄2 u (n)K (n) K
Let
–H ⁄ 2
0
1⁄2
(n)
r
1⁄2
r
*
( n )α ( n )
1⁄2
– g ( n )r
(n)
1⁄2
(n)
λ
1 ⁄ 2 –H ⁄ 2 K ( n-1 )
xˆ ( n Y n-1 ) K A(n) = 0 λ
–H ⁄ 2
u(n)
*
( n-1 )
y (n) 1
1 ⁄ 2 –1 ⁄ 2 ( n-1 ) K
H
B (n) =
1⁄2
T
B 11 ( n ) H
λ
0
0
H
b 22 ( n )
*
H
b 32 ( n )
H
B 42 ( n )
b 21 ( n )
*
b 31 ( n )
H
B 41 ( n )
Then, equating the matrix product A(n) AH(n) to the matrix product BH(n) B(n), and comparing their corresponding terms, we may say the following: •
Points 1 through 6 discussed in the solution to Problem 11.1 remain valid. Accordingly, the entries B11(n), b21(n), b31(n), b22(n), and b32(n) will all have the values determined there.
•
We have the following additional equations to consider: H
H
1. B 41 ( n )B 11 ( n ) = I since B 11 ( n ) = K H
B 41 ( n ) = K
1⁄2
–1 ⁄ 2
( n ) , it follows that
(n)
H
(1)
H
2. B 41 ( n )b 31 ( n ) + B 42 ( n )b 32 ( n ) = 0 Hence, H
H
B 42 ( n ) = B 41 ( n )b 31 ( n ) ⁄ b 32 ( n ) Since b 32 ( n ) = r
–1 ⁄ 2
H
( n ) and b 31 ( n ) = λ
1⁄2 H
u ( n )K
1⁄2
(n) ,
it follows that H
B 42 ( n ) = λ
1⁄2 1⁄2
= –λ
r
( n )K
1⁄2 1⁄2
r
1⁄2
( n )K
–H ⁄ 2
( n )u ( n )
( n )K ( n )u ( n )
(2)
But from Kalman filter theory for the dynamical system being considered here, we know that λ
1⁄2
K ( n )u ( n ) = g ( n )
Accordingly, we may simplify Eq. (2) to H
B 42 ( n ) = – r
1⁄2
( n )g ( n )
where g(n) is the Kalman gain. This completes the evaluation of the post-array for the extended square-root information filter. 3. In addition to the two equations described above, we have several other relations that follow from A(n) AH(n) = BH(n) B(n); these additional relations merely provide a means to check the values already determined for the entries of the post-array. (b) Using results from part (a), derive the extended QR-RLS algorithm. In order to derive the extended QR-RLS algorithm we make use of the following relationships between the Kalman and RLS variables: K
–1
( n ) → λΦ ( n )
α(n) → λ
–1
r (n) → γ (n) g(n) → λ
–1 ⁄ 2
y(n) → λ
k(n)
–n ⁄ 2 *
ξ (n)
–n ⁄ 2 *
d (n)
xˆ ( n Y n-1 ) → λ
–n ⁄ 2
ˆ ( n-1 ) w
Therefore, the extended QR-RLS could be written as: λ λ
1⁄2
Φ
(n)
1⁄2 H
p ( n-1 ) 0
λ
1⁄2
–1 ⁄ 2
Φ
T
–H ⁄ 2
( n-1 )
u(n)
Φ
1⁄2
(n)
H
d ( n ) Θ ( n )= p (n) H –H ⁄ 2 1 u ( n )Φ (n) Φ
0
–H ⁄ 2
(n)
0 ξ ( n )γ γ
1⁄2
1⁄2
– k ( n )γ
(n)
(n)
–1 ⁄ 2
(n)
where we used the same reasoning as while obtaining equations (11.40)-(11.47)
239
ˆ (n) = w ˆ ( n-1 ) + [ k ( n )γ w 11.4
–1 ⁄ 2
( n ) ] [ ξ ( n )γ
–1 ⁄ 2
(n)]
*
We start with Q ( n )A ( n ) = R ( n ) O Let
Q(n) =
Q1 ( n ) Q2 ( n )
Hence, Q1 ( n )
A(n) = R(n) Q2 ( n ) O H H A ( n ) = [ Q1 , Q2 ( n ) ] R ( n ) O H
= Q 1 ( n )R ( n ) H
H
A ( n ) = R ( n )Q 1 ( n ) The projection matrix is therefore –1 H
H
P ( n ) = A ( n ) ( A ( n )A ( n ) ) A ( n ) H
H
H
–1 H
H
= Q 1 ( n )R ( n ) ( R ( n )Q 1 ( n )Q 1 ( n )R ( n ) ) R ( n )Q 1 ( n ) H
Since Q 1 ( n )Q 1 ( n ) = I , we have H
H
–1 H
P ( n ) = Q 1 ( n )R ( n ) ( R ( n )R ( n ) ) R ( n )Q 1 ( n )
240
We also note that for an upper triangular matrix R(n), –1 H
H
R ( n ) ( R ( n )R ( n ) ) R ( n ) = identity matrix Hence, H
P ( n ) = Q 1 ( n )Q 1 ( n ) 11.5
In a prediction-error filter, the input u(n) represents the desired response and the tap inputs u(n-1), ..., u(n-M) represent the variables used to estimate u(n). Hence, we may restructure the inputs of the systolic array in Fig. 11.4 of the text in the following manner so that it operates as a prediction-error filter (illustrated here for order M = 3):
[Figure: systolic array with its inputs rearranged so that it operates as a prediction-error filter, producing the prediction error f_m(n)]
11.6
We are given that H
R ( n )a ( n ) = s Hence, a(n) = R
–H
( n )s
We note that RH(n) is a lower triangular matrix. Hence, given R(n) and s, the vector a(n) may be computed by means of a linear section using forward substitution.
241
11.7
The output of the last interval cell in the bottom row of the triangular section is given by (see Fig. 11.2(a) of the text): * 1⁄2
u out = c u in – s λ
x
where c and s are the Givens parameters, uin is the input to the cell and x is the stored value to the cell. At time n, we have u in ( n ) = d ( n ) *
s ( n )λ
1⁄2
H
ˆ ( n-1 )u ( n ) x ( n-1 ) = c ( n )w
ˆ ( n-1 ) is the previous where d(n) is the desired response, u(n) is the input vector, and w value of the least-squares weight vector. Hence, H
ˆ ( n-1 )u ( n ) u out = c ( n )d ( n ) – c ( n )w H
ˆ ( n-1 )u ( n ) ] = c(n)[d (n) – w Recognizing that c(n) = γ
1⁄2
(n)
and H
ˆ ( n-1 )u ( n ) = ξ ( n ) , d(n) – w we finally get u out = γ 11.8
1⁄2
( n )ξ ( n ) .
(a) We note that H
R ( n )a ( n ) = s ( φ ) Hence, a(n) = R
–H
( n )s ( φ )
242
Taking Hermitian transpose: H
H
–1
a ( n ) = s ( φ )R ( n ) (b) For an MVDR beamformer: a(n) ˆ ( n ) = -------------------------R ( n )w H a ( n )a ( n ) Hence, –1
R ( n )a ( n ) ˆ ( n ) = ----------------------------w H a ( n )a ( n ) We note that aH(n)a(n) is a scalar. With R(n) being an upper triangular matrix, it follows that the linear section performs backward substitution. The resulting output of ˆ (n) . this section is the weight vector w 11.9
Reformulating the prearray of the extended QR-RLQ algorithm in Problem 2 of Chapter 11 by making use of the correspondences in Table 11.3, we may write λ
1⁄2
λ 0
Φ
1⁄2
( n-1 )
u(n)
1⁄2 H
a ( n-1 )
0
T
Φ
1⁄2
(n)
0
Θ ( n ) = aH ( n ) H
u ( n )Φ
1
– e′ ( n )γ –H ⁄ 2
(n)
γ
1⁄2
–1 ⁄ 2
(n)
(n)
Squaring both sides of this equation, and retaining the terms on the second rows: λ a ( n-1 )
2
= a(n)
The term e′ ( n ) ⁄ γ λ a ( n-1 )
2
1⁄2
= a(n)
2
+γ
–1
( n ) e′ ( n )
2
( n ) is recognized as an estimation error denoted by ε(n). Hence, 2
+ ε(n)
2
11.10 The standard RLS filter is the covariance version of the Kalman filter. The inverse QRRLS filter is the square-root version of the covariance Kalman filter. It follows therefore that the inverse QR-RLS filter is the square-root RLS filter.
CHAPTER 12 12.1
2 2 2 1 J fb, m = --- ( E [ f m-1 ( n ) ] + E [ b m-1 ( n-1 ) ] ) ( 1 + κ m ) 2 *
*
*
+ κ m E [ f m-1 ( n )b m-1 ( n-1 ) ] + κ m E [ b m-1 ( n-1 ) f m-1 ( n ) ] Differentiating with respect to a complex variable: ∂J fb, m 2 2 ----------------- = κ m ( E [ f m-1 ( n ) ] + E [ b m-1 ( n-1 ) ] ) ∂κ m *
*
+ E [ b m-1 ( n-1 ) f m-1 ( n ) ] + E [ b m-1 ( n-1 ) f m-1 ( n ) ] 2
2
= κ m ( E [ f m-1 ( n ) ] + E [ b m-1 ( n-1 ) ] ) *
+ 2E [ b m-1 ( n-1 ) f m-1 ( n ) ] 12.2
(a) Suppose we write f m ( i ) = f m-1 ( i ) + κˆ m ( n )b m-1 ( i-1 ) b m ( i ) = b m-1 ( i ) + κˆ m ( n ) f m-1 ( i ) Then substituting these relations into Burg’s formula: n
2 ∑ b m-1 ( i-1 ) f m-1 ( i ) *
i=1 κˆ m ( n ) = – -----------------------------------------------------------------n 2 2 ∑ f m-1 ( i ) + bm-1 ( i-1 ) i=1 n
2 ∑ b m-1 ( i-1 ) f m-1 ( i ) + 2b m-1 ( n-1 ) f m-1 ( n ) *
*
i=1 = – -----------------------------------------------------------------------------------------------------------, 2 2 E m-1 ( n-1 ) + f m-1 ( n ) + b m-1 ( n-1 )
244
Cross-multiplying and proceeding in a manner similar to that described after Eqs. (12.12) and (12.13) in the text, we finally get *
*
f m-1 ( n )b m ( n ) + b m-1 ( n-1 ) f m ( n ) κˆ m ( n ) = κˆ m ( n-1 ) – ---------------------------------------------------------------------------------E m-1 ( n-1 ) (b) The algorithm so formulated is impractical because to compute the updated forward prediction error fm(n) and backward prediction error bm(n) we need to know the updated κˆ m ( n ) . This is not possible to do because the correction term for κˆ m ( n ) requires knowledge of fm(n) and bm(n). 12.3
For the transversal filter of Fig. 12.6 in the text we have: tap-weight vector
= kM(n)
tap-input vector
= uM(i),
desired response, d(i) = 1 0
i = 1,2,...,n i=n i = n-1,…, 1
The a posteriori estimation error equals H
e ( i ) = d ( i ) – k M ( n )u M ( i ),
i = 1, 2, …, n
The deterministic cross-correlation vector φ ( n ) equals n
φ(n) =
∑λ
n-i
uM d *( i )
i=1
= uM ( n ) We also note that n
E d(n) =
∑λ
n-i
d(i)
2
= 1
i=1
Hence, the sum of weighted error squares equals
245
H
ˆ (n) E min ( n ) = E d ( n ) – φ ( n )w H
= 1 – u M ( n )k m ( n ) H
We note that the inner product u M ( n )k m ( n ) is a real-valued scalar. Hence, H
E min ( n ) = 1 – k m ( n )u m ( n ) = γ M (n) 12.4
We start with H
Φ m ( n ) = λΦ m ( n-1 ) + u m ( n )u m ( n ) where u M ( n ) is the tap-input vector, Φ m ( n-1 ) is the past value of the deterministic correlation matrix, and Φ m ( n ) is its present value. Hence, H
λΦ m ( n-1 ) = Φ m ( n ) – u m ( n )u ( n ) m H
–1
= Φ m ( n ) [ I – u m ( n )u ( n )Φ m ( n ) ] m where I is the identity matrix. Hence, taking the determinants of both sides: H m
–1
λdet [ Φ m ( n-1 ) ] = det [ Φ m ( n ) ]det [ I – u m ( n )u ( n )Φ m ( n ) ] But, H m
–1
–1
H
det [ I – u m ( n )u ( n )Φ m ( n ) ] = det I – u m ( n )Φ m ( n )u m ( n ) H
–1
= 1 – u m ( n )Φ m ( n )u m ( n ) = γ m(n) We may therefore rewrite Eq. (1) as
246
(1)
λdet [ Φ m ( n-1 ) ] = det [ Φ m ( n ) ]γ m ( n ) Hence, we may express the conversion factor γ m ( n ) as det [ Φ m ( n-1 ) ] γ m ( n ) = λ ----------------------------------det [ Φ m ( n ) ] 12.5
(a) The (m+1)-by-(m+1) correlation matrix Φ m+1 may be expressed in the form H
U (n)
Φ m+1 ( n ) =
φ1 ( n )
(1)
φ 1 ( n ) Φ m ( n-1 )
Define the inverse of this matrix as H
α1 β1 β1 Γ1
–1
Φ m+1 ( n ) =
(2)
Hence, from Eqs. (1) and (2): –1
I m+1 = Φ m+1 ( n )Φ m+1 ( n ) U (n)
H
φ1 ( n )
H
α1 β1 = φ 1 ( n ) Φ m ( n-1 ) β 1 Γ 1 H
=
H
H
U ( n )α 1 + φ 1 ( n )β 1
U ( n )β 1 + φ 1 ( n )Γ 1
φ 1 ( n )α 1 + Φ m ( n-1 )β 1
φ 1 ( n )β 1 ( n ) + Φ m ( n-1 )Γ 1
H
From this relation we deduce the following four equations: H
U ( n )α 1 + φ 1 ( n )β 1 = 1 H
(3)
H
U ( n )β 1 + φ 1 ( n )Γ 1 = 0
(4)
247
φ 1 ( n )α 1 + Φ m ( n-1 )β 1 = 0
(5)
H
φ 1 ( n )β 1 + Φ m ( n-1 )Γ 1 = I m
(6)
Eliminate β1 between Eqs. (3) and (5): H
–1
U ( n )α 1 – φ 1 ( n )Φ m ( n-1 )φ 1 ( n )α 1 = 1 Hence, 1 α 1 = -----------------------------------------------------------------------H –1 U ( n ) – φ 1 ( n )Φ m ( n-1 )φ 1 ( n )
(7)
which is real-valued. Correspondingly, –1
β 1 = – Φ m ( n-1 )φ 1 ( n )α 1 –1
– Φ m ( n-1 )φ 1 ( n ) = -----------------------------------------------------------------------H –1 U ( n ) – φ 1 ( n )Φ m ( n-1 )φ 1 ( n )
(8)
From Eq. (6): –1
–1
H
Γ 1 = Φ m ( n-1 ) – Φ m ( n-1 )φ 1 ( n )β 1 –1
=
–1 Φ m ( n-1 ) +
H
–1
Φ m ( n-1 )φ 1 ( n )φ 1 ( n )Φ m ( n-1 ) ----------------------------------------------------------------------------H –1 U ( n ) – φ 1 ( n )Φ m ( n-1 )φ 1 ( n )
Check: Substitute Eqs. (8) and (9) into the left hand side of Eq. (4):
H U ( n )β 1
H + φ 1 ( n )Γ 1
H –1 – φ 1 ( n )Φ m ( n-1 ) = U ( n ) ------------------------------------------------------------------------ H –1 U ( n ) – φ 1 ( n )Φ m ( n-1 )φ 1 ( n ) H
–1
+ φ 1 ( n )Φ m ( n-1 )
248
(9)
H
–1
H
–1
φ 1 ( n )Φ m ( n-1 )φ 1 ( n )φ 1 ( n )Φ m ( n-1 ) + --------------------------------------------------------------------------------------------H –1 U ( n ) – φ 1 ( n )Φ m ( n-1 )φ 1 ( n ) = 0 This agrees with the right-hand side of Eq. (4). From Eq. (7) we note that 1 α 1 = --------------Fm(n) From Eq. (8) we note that ˆ f (n) w β 1 = – --------------Fm(n) From Eq. (9) we note that
Γ1 =
H
T
0
0m
0m
Φ m ( n-1 )
ˆ f (n) ˆ f ( n )w w + -------------------------------Fm(n)
–1
Hence, we may express the inverse matrix of Eq. (1) as follows:
–1 Φ m+1 ( n )
=
=
0m
0m
Φ m ( n-1 )
0
0m
0m
=
T
0
–1
1 + --------------Fm(n)
H
1
ˆ f (n) –w
ˆ f (n) –w
ˆ f ( n )w ˆ f (n) w
T
H
1 H 1 ˆ f (n)] + --------------[ 1, – w F m ( n ) –w ˆ f (n) –1 Φ m ( n-1 ) T
0
0m
0m
Φ m ( n-1 )
–1
H 1 + ---------------a m ( n )a m ( n ) Fm(n)
249
(b) Consider next the second form of the (m+1)-by-(m+1) correlation matrix Φ m+1 ( n ) given by
Φ m+1 ( n ) =
Φm ( n )
φ2 ( n )
H φ2 ( n )
U ( n-m )
(10)
Define the inverse of this matrix as
–1
Φ m+1 ( n ) =
Γ2
β2
H
α2
β2
(11)
Using Eqs. (10) and (11): –1
I m+1 = Φ m+1 ( n )Φ m+1 ( n )
=
Φm ( n ) H
φ2 ( n )
φ2 ( n )
Γ2
β2
H
α2
U ( n-m ) β 2 H
=
Φ m ( n )Γ 2 + φ 2 ( n )β 2 H
H
φ 2 ( n )Γ 2 + U ( n-m )β 2
Φ m ( n )β 2 + φ 2 ( n )α 2 H
φ 2 ( n )β 2 + U ( n-m )α 2
We thus deduce the following four relations: H
Φ m ( n )Γ 2 + φ 2 ( n )β 2 = I m
(12)
Φ m ( n )β 2 + φ 2 ( n )α 2 = 0
(13)
H
H
φ 2 ( n )Γ 2 + U ( n-m )β 2 = 0
(14)
H
φ 2 ( n )β 2 + U ( n-m )α 2 = 1
(15)
Eliminate β2 between Eqs. (13) and (15):
250
H
–1
– φ 2 ( n )Φ m ( n )φ 2 ( n )α 2 + U ( n-m )α 2 = 1 Hence, 1 α 2 = -------------------------------------------------------------------------------------------H –1 –1 U ( n-m ) – φ 2 ( n )Φ m ( n )Φ m ( n )θ 2 ( n )
(16)
Correspondingly, β2 equals –1
β 2 = – Φ m ( n )φ 2 ( n )α 2 –1
Φ m ( n )φ 2 ( n ) = – -------------------------------------------------------------------------H –1 U ( n-m ) – φ 2 ( n )Φ m ( n )φ 2 ( n )
(17)
Substitute Eq. (17) in (12) and solve for Γ 2 : –1
–1
H
Γ 2 = Φ m ( n ) – Φ m ( n )φ 2 ( n )β 2 –1
=
–1 Φm ( n )
H
–1
Φ m ( n )φ 2 ( n )φ 2 ( n )Φ m ( n ) + -------------------------------------------------------------------------H –1 U ( n-m ) – φ 2 ( n )Φ m ( n )φ 2 ( n )
Check: Substitute Eqs. (17) and (18) into the left-hand side of Eq. (14): H –1 H –1 φ 2 ( n )Φ m ( n )φ 2 ( n )φ 2 ( n )Φ m ( n ) H –1 φ 2 ( n )Φ m ( n ) + ---------------------------------------------------------------------------------H –1 U ( n-m ) – φ 2 ( n )Φ m ( n )φ 2 ( n ) H
–1
φ 2 ( n )Φ m ( n ) – U ( n-m ) -------------------------------------------------------------------------- = 0 H –1 U ( n-m ) – φ 2 ( n )Φ m ( n )φ 2 ( n ) which agrees with the right-hand side of Eq. (14). Next, we note that
251
(18)
1 α 2 = --------------Bm ( n ) g(n) β 2 = – --------------Bm ( n ) –1
Γ2 =
Φm ( n ) 0 0
T
0
H 1 + ---------------w b ( n )w b ( n ) Bm ( n )
–1
Hence, we may express the inverse matrix Φ m+1 ( n ) in the alternative form:
–1 Φ m+1 ( n )
–1
=
0
1 w ( n )w b ( n ) + --------------- b Bm ( n ) –wb 0
T
–1
=
1
–wb ( n ) H 1 + --------------[ – w b ( n ), 1 ] Bm ( n ) 1 0
T
–1
Φm ( n ) 0 0
12.6
–wb ( n )
Φm ( n ) 0 0
=
H
Φm ( n ) 0
T
0
H 1 + ---------------c m ( n )c m ( n ) Bm ( n )
(a) From the solution to part (a) of Problem 12.5, we have
–1 Φ m+1 ( n )
=
T
0
0m
0m
–1 Φ m ( n-1 )
H 1 + ---------------a m ( m )a m ( n ) Fm(n)
(1)
Correspondingly, the input vector um+1(n) is partitioned as follows: u m+1 ( n ) =
u(n) u m+1 ( n-1 )
(2)
From the definition of the conversion factor, we have
252
H
–1
γ m+1 ( n ) = 1 – u m+1 ( n-1 )Φ m+1 ( n )u m+1 ( n )
(3)
Therefore, substituting Eqs. (1) and (2) into (3): H
–1
γ m+1 ( n ) = 1 – u m ( n-1 )Φ m ( n-1 )u m ( n-1 ) H H 1 – ---------------u m+1 ( n )a m ( m )a m ( n )u m+1 ( n ) Fm(n) 2
f m(n) = γ m+1 ( n-1 ) – --------------------Fm(n)
(4)
where we have used the fact that H
f m ( n ) = a m ( n )u m+1 ( n ) (b) From the solution to part (b) of Problem 12.5, we have –1
–1 Φ m+1 ( n )
Φm ( n ) 0m H 1 = + ---------------c m ( n )c m ( n ) Bm ( n ) T 0m 0
(5)
This time, we partition the input vector um+1(n) as follows: u m+1 ( n ) =
um ( n )
(6)
u(n – m)
Therefore, substituting Eqs. (5) and (6) into (3): H
–1
γ m+1 ( n ) = 1 – u m ( n )Φ m ( n )u m ( n ) H H 1 – ---------------u m+1 ( n )c m ( n )c m ( n )u m+1 ( n ) Bm ( n ) 2
= γ m ( n ) – bm ( n ) ⁄ Bm ( n )
253
(7)
where we have made use of the fact that H
b m ( n ) = c m ( n )u m+1 ( n ) (c) We invoke the following property of the conversion factor: f m(n) γ m ( n-1 ) = --------------ηm ( n )
(8)
Also, we note that F m ( n ) = λF m ( n-1 ) + η m ( n ) f *m ( n )
(9)
Therefore eliminating η m ( n ) between Eqs. (8) and (9): 2
f m(n) F m ( n ) = λF m ( n-1 ) + --------------------γ m ( n-1 ) Finally, eliminating f m ( n )
2
(10)
between Eqs. (4) and (10):
γ m ( n-1 ) γ m+1 ( n ) = γ m ( n-1 ) – -------------------- [ F m ( n ) – λF m ( n-1 ) ] F (n) m
= λF m ( n-1 )γ m ( n-1 ) ⁄ F m ( n ) (d) Next, we invoke another property of the conversion factor: bm ( n ) γ m ( n ) = --------------βm ( n )
(11)
We also note that B m ( n ) = λB m ( n-1 ) + b *m ( n )β m ( n ) Therefore, eliminating β m ( n ) between Eqs. (11) and (12):
254
(12)
2
B m ( n ) = λB m ( n-1 ) + b m ( n ) ⁄ γ m ( n ) Eliminating b m ( n )
2
(13)
between Eqs. (7) and (13):
γ m(n) γ m+1 ( n ) = γ m ( n ) – --------------- [ B m ( n ) – λB m ( n-1 ) ] B (n) m
= λB m ( n-1 )γ m ( n ) ⁄ B m ( n ) n
12.7
(a) F m ( n ) =
∑λ
n-i
f m(i)
2
i=1 n
=
∑λ
n-i
f m(i) f m(i)
n-i
ˆ f , m ( n )u m ( i-1 ) ] f m ( i ) [u(i) – w
n-i
* u(i) f m(i) –
n-i
u(i) f m(i)
n-i
ˆ f , m ( n-1 )u m ( i-1 ) ] f m ( i ) [ ηm ( i ) + w
n-i
* ηm ( i ) f m ( i )
n-i
ηm ( i ) f m ( i )
*
i=1 n
=
∑λ
H
*
i=1 n
=
∑λ i=1 n
=
∑λ
n
H n-i * ˆ f , m ( n ) ∑ λ u m ( i-1 ) f m ( i ) = 0 w i=1
*
i=1 n
=
∑λ
H
*
i=1 n
=
∑λ i=1 n
=
∑λ
n
H * ˆ f , m ( n-1 ) ∑ u m ( i-1 ) f m ( i ) = 0 + w i=1
*
i=1
255
n
=
∑λ
n-i
*
*
ηm ( i ) f m ( i ) + ηm ( n ) f m ( n )
i=1 *
= λF m ( n-1 ) + η m ( n ) f m ( n ) (b) Following a similar procedure, we may use the relations n
Bm ( n ) =
∑λ
n-i
bm ( i )
2
i=1
and n
∑λ
n-i
*
u m ( i )b m ( i ) = 0
i=1
to derive the recursion *
B m ( n )λB m ( n-1 ) + β m ( n )b m ( n ) 12.8
(a) We start with n
∆ m-1 ( n ) =
∑λ
n-i
*
b m-1 ( i-1 ) f m-1 ( i )
i=1 n
=
∑λ
n-i
H
*
ˆ b, m-1 ( n-2 )u m-1 ( i-1 ) ] f m-1 ( i ) [u(i – m) – w
i=1 n
=
∑λ
n-i
* m ) f m-1 ( i )
u(i –
n-i
u ( i – m ) f m-1 ( i )
i=1 n
=
∑λ
n
H * ˆ b, m-1 ( n-2 ) ∑ u m-1 ( i-1 ) f m-1 ( i ) = 0 – w i=1
*
i=1
256
n
=
∑λ
H
*
n-i
ˆ b, m-1 ( n-1 )u m-1 ( i ) ] f m-1 ( i ) [ β m-1 ( i-1 ) + w
n-i
* β m-1 ( i-1 ) f m-1 ( i ) +
n-i
β m-1 ( i-1 ) f m-1 ( i )
n-i
β m-1 ( i-1 ) f m-1 ( i ) + β m-1 ( n-1 ) f m-1 ( n )
i=1 n
=
∑λ i=1 n
=
∑λ
n
H * ˆ b, m-1 ( n-1 ) ∑ u m-1 ( i-1 ) f m-1 ( i ) = 0 w i=1
*
i=1 n
=
∑λ
*
*
i=1 *
= λ∆ m-1 ( n-1 ) + β m-1 ( n-1 ) f m-1 ( n )
(1)
Comparing this equation with Eq. (12.69) of the text, we deduce the equivalence *
*
η m-1 ( n )b m-1 ( n-1 ) = β m-1 ( n-1 ) f m-1 ( n )
(2)
(b) Applying Eqs. (12.48) and (12.49) of the text, we may write *
* η m-1 ( n )
f m-1 ( n ) = -----------------------γ m-1 ( n-1 )
(3)
b m-1 ( n-1 ) = γ m-1 ( n-1 )β m-1 ( n-1 )
(4)
Multiplying Eqs. (3) and (4): *
* η m-1 ( n )b m-1 ( n-1 )
f m-1 ( n ) = ------------------------ ⋅ γ m-1 ( n-1 )β m-1 ( n-1 ) γ m-1 ( n-1 ) *
= f m-1 ( n )β m-1 ( n-1 ) which proves Eq. (2)
12.9
(a) Let Φ m+1 ( n ) denote the (m+1)-by-(m+1) correlation matrix of the tap-input vector ˜ um+1(i) applied to the forward prediction error filter of order m, where 1 < i < n. Let am(n) denote the tap-weight vector of this filter, and Fm(n) denote the corresponding sum of weighted prediction-error squares. We may characterize this filter by the augmented normal equations
Φ m+1 ( n )a m ( n ) = ˜
F m(n)
(1)
0m
where 0m is the m-by-1 null vector. The correlation matrix Φ m+1 ( n ) may be ˜ partitioned in two different ways, depending on how we interpret the first or last element of the tap-input vector um+1(i). The form of partitioning that we like to use first is the one that enables us to relate the tap-weight vector am(n), pertaining to prediction order m, to the tap-weight vector am-1(n), pertaining to prediction order m-1. This aim is realized by using
Φ m+1 ( n ) = ˜
Φm ( n ) ˜ H
φ2 ( n ) ˜
φ2 ( n ) ˜
(2)
U 2(n)
where Φ m ( n ) is the m-by-m correlation matrix of the tap-input vector u m ( i ), φ 2 ( n ) is ˜ the m-by-1 cross-correlation vector between um(i) and u(i-m), and U2(n) is the˜ sum of weighted squared values of the input u(i-m) for 1 < i < n. Note that U2(n) is zero for n - m < 0. We postmultiply both sides of Eq. (2) by an (m+1)-by-1 vector whose first m elements are defined by the vector am-1(n) and whose last element equals zero. We may thus write
Φ m+1 ( n ) ˜
a m-1 ( n ) 0
Φm ( n ) = ˜ H φ2 ( n ) ˜
φ2 ( n ) a m-1 ( n ) ˜ 0 U 2(n)
Φ m ( n )a m-1 ( n ) = ˜ H φ 2 ( n )a m-1 ( n ) ˜
(3)
Both Φ m ( n ) and am-1(n) have the same time argument n. Furthermore, in the first line ˜ of Eq. (3), they are both positioned in such a way that when the matrix multiplication
258
is performed Φ m ( n ) becomes postmultiplied by am-1(n). For a forward prediction˜ error filter of order m-1, evaluated at time n, the set of augmented normal equations defined in Eq. (1) takes the form
Φ m ( n )a m-1 ( n ) = ˜
F m-1 ( n ) 0 m-1
Define the scalar H
∆ m-1 ( n ) = φ 2 ( n )a m-1 ( n ) ˜
(4)
Accordingly, we may rewrite Eq. (3) as
Φ m+1 ( n ) ˜
a m-1 ( n ) 0
F m-1 ( n ) =
(5)
0 m-1 ∆ m-1 ( n )
(b) For a definition of ∆ m-1 ( n ) , we have n
∆ m-1 ( n ) =
∑λ
n-i * f m-1 ( i )b m-1 ( i-1 )
(6)
i=1
For another definition of this same quantity, we have H
∆ m-1 ( n ) = φ 2 ( n )a m-1 ( n ) ˜
(7)
where n
φ2 ( n ) = ∑ λ ˜ i=1
n-i
*
u m ( i )u ( i-1 )
(8)
To show that these two definitions are equivalent, we first substitute Eq. (8) into (7): n
∆ m-1 ( n ) =
∑λ
n-i H u m ( i )a m-1 ( n )u ( i-m )
i=1
259
(9)
From the definition of the forward prediction error, we have H
f m-1 ( i ) = a m-1 ( n )u m ( i ) ,
1≤i≤n
We may therefore rewrite Eq. (9) as n
∆ m-1 ( n ) =
∑λ
n-i * f m-1 ( i )u ( i-m )
(10)
i=1
Next, from the definition of the backward prediction error, we have m-1
b m-1 ( i ) = u ( i-m ) –
∑ wb, m-1,k ( n )u ( i-k ) *
(11)
k=1
where w b, m-1,k ( n ) is the kth element of the backward predictor’s coefficient vector. Therefore, eliminating u(i-m) between Eqs. (10) and (11): n
∆ m-1 ( n ) =
∑λ
n-i * f m-1 ( i )b m-1 ( i )
i=1 n
+∑
n
∑λ
n-i * * w b, m-1,k ( n ) f m-1 ( i )u ( i-k )
(12)
i=1 k=1
But, the tap inputs u(i-1),u(i-2),...,u(i-m+1) are the very inputs involved in computing the forward prediction error. From the principle of orthogonality, we therefore have m-1
∑λ
n-i * f m-1 ( i )u ( i-k )
= 0
for all i
k=1
That is, n
∆ m-1 ( n ) =
∑λ
n-i * f m-1 ( i )b m-1 ( i )
i=1
H
= φ 2 ( n )a m-1 ( n ) ˜
which is the desired result.
260
(c) Consider next the backward prediction-error filter of order m. Let cm(n) denote its tapweight vector, and Bm(n) denote the corresponding sum of weighted prediction-error squares. This filter is characterized by the augmented normal equations written in the matrix form: 0m Φ m+1 ( n )c m ( n ) = ˜ B m(n)
(13)
where Φ m+1 ( n ) is as defined previously, and 0m is the m-by-1 null vector. This time ˜ we use the other partitioned form of the correlation matrix Φ m ( n ), as shown by ˜
Φ m+1 ( n ) = ˜
U 1(n) φ1 ( n ) ˜
H
φ1 ( n ) ˜
(14)
Φ 1 ( n-1 ) ˜
where U1(n) is the sum of weighted squared values of the input u(i) for the time interval 1 < i < n, φ 1 ( n ) is the m-by-1 cross-correlation vector between u(i) and the ˜ tap-input vector um(i-1), and Φ 1 ( n-1 ) is the m-by-m correlation matrix of um(i-1). ˜ Correspondingly, we postmultiply Φ m+1 ( n ) by an (m+1)-by-1 vector whose first ˜ element is zero and whose m remaining elements are defined by the tap-weight vector cm-1(n-1) that pertains to a backward prediction-error filter of order m-1. We may thus write H
U 1(n) φ1 ( n ) 0 0 = Φ m+1 ( n ) ˜ ˜ c m-1 ( n-1 ) φ 1 ( n ) Φ m ( n-1 ) c m-1 ( n-1 ) ˜ ˜ H
φ ( n )c m-1 ( n-1 ) = ˜1 Φ m ( n-1 )c m-1 ( n-1 ) ˜
(15)
Both Φ m ( n-1 ) and cm-1(n-1) have the same time argument, n-1. Also, they are both ˜ positioned in the first line of Eq. (15) in such a way that, when the matrix multiplication is performed, Φ m-1 ( n-1 ) becomes postmultiplied by cm-1(n-1). For a ˜
backward prediction-error filter of order m-1, evaluated at time n-1, the set of augmented normal equations in Eq. (13) takes the form 0 m-1 Φ m ( n-1 )c m-1 ( n-1 ) = ˜ B m-1 ( n-1 ) Define the second scalar H
∆′ m-1 ( n-1 ) = φ 1 ( n )c m-1 ( n-1 ) ˜
(16)
where the prime is intended to distinguish this new parameter from ∆ m-1 ( n-1 ) . Accordingly, we may rewrite Eq. (15) as
0 Φ m+1 ( n ) = ˜ c m-1 ( n-1 )
∆′ m-1 ( n ) (17)
0 m-1 B m-1 ( n-1 )
(d) The parameters \Delta_{m-1}(n) and \Delta'_{m-1}(n), defined by Eqs. (4) and (16), respectively, are in actual fact the complex conjugates of one another; that is,

    \Delta'_{m-1}(n) = \Delta_{m-1}^*(n)          (18)

where \Delta_{m-1}^*(n) is the complex conjugate of \Delta_{m-1}(n). We prove this relation in three stages:

1. We premultiply both sides of Eq. (5) by the row vector [0, c_{m-1}^H(n-1)], where the superscript H denotes Hermitian transposition. The result of this matrix multiplication is the scalar

    [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n) \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix}
      = [0, c_{m-1}^H(n-1)] \begin{bmatrix} F_{m-1}(n) \\ \mathbf{0}_{m-1} \\ \Delta_{m-1}(n) \end{bmatrix}
      = \Delta_{m-1}(n)          (19)

where we have used the fact that the last element of c_{m-1}(n-1) equals unity.

2. We apply Hermitian transposition to both sides of Eq. (17), and use the Hermitian property of the correlation matrix \tilde{\Phi}_{m+1}(n), thereby obtaining

    [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n) = [(\Delta'_{m-1}(n))^*, \mathbf{0}_{m-1}^T, B_{m-1}(n-1)]

where (\Delta'_{m-1}(n))^* is the complex conjugate of \Delta'_{m-1}(n) and B_{m-1}(n-1) is real valued. Next we use this relation to evaluate the scalar

    [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n) \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix}
      = [(\Delta'_{m-1}(n))^*, \mathbf{0}_{m-1}^T, B_{m-1}(n-1)] \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix}
      = (\Delta'_{m-1}(n))^*          (20)

where we have used the fact that the first element of a_{m-1}(n) equals unity.

3. Comparison of Eqs. (19) and (20) immediately yields the relation of Eq. (18) between the parameters \Delta_{m-1}(n) and \Delta'_{m-1}(n).

(e) We are now equipped with the relations needed to derive the desired time-update for recursive computation of the parameter \Delta_{m-1}(n).

Consider the m-by-1 tap-weight vector a_{m-1}(n-1) that pertains to a forward prediction-error filter of order m-1, evaluated at time n-1. The reason for considering time n-1 will become apparent presently. Since the leading element of the vector a_{m-1}(n-1) equals unity, we may express \Delta_{m-1}(n) as follows [see Eqs. (18) and (20)]:

    \Delta_{m-1}(n) = [\Delta_{m-1}(n), \mathbf{0}_{m-1}^T, B_{m-1}(n-1)] \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix}          (21)

Taking the Hermitian transpose of both sides of Eq. (17), recognizing the Hermitian property of \tilde{\Phi}_{m+1}(n) and using the relation of Eq. (18), we get

    [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n) = [\Delta_{m-1}(n), \mathbf{0}_{m-1}^T, B_{m-1}(n-1)]          (22)

Hence, substitution of Eq. (22) into (21) yields

    \Delta_{m-1}(n) = [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix}          (23)

But the correlation matrix \tilde{\Phi}_{m+1}(n) may be time-updated as follows:

    \tilde{\Phi}_{m+1}(n) = \lambda \tilde{\Phi}_{m+1}(n-1) + u_{m+1}(n)\, u_{m+1}^H(n)          (24)

Accordingly, we may use this relation for \tilde{\Phi}_{m+1}(n) to rewrite Eq. (23) as

    \Delta_{m-1}(n) = \lambda [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n-1) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix}
                    + [0, c_{m-1}^H(n-1)]\, u_{m+1}(n)\, u_{m+1}^H(n) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix}          (25)

Next, we recognize from the definition of the forward a priori prediction error that

    u_{m+1}^H(n) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix} = [u_m^H(n), u^*(n-m)] \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix} = u_m^H(n)\, a_{m-1}(n-1) = \eta_{m-1}^*(n)          (26)

and from the definition of the backward a posteriori prediction error that

    [0, c_{m-1}^H(n-1)]\, u_{m+1}(n) = [0, c_{m-1}^H(n-1)] \begin{bmatrix} u(n) \\ u_m(n-1) \end{bmatrix} = c_{m-1}^H(n-1)\, u_m(n-1) = b_{m-1}(n-1)          (27)

Also, by substituting n-1 for n in Eq. (5), we have

    \tilde{\Phi}_{m+1}(n-1) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix} = \begin{bmatrix} F_{m-1}(n-1) \\ \mathbf{0}_{m-1} \\ \Delta_{m-1}(n-1) \end{bmatrix}

Hence, using this relation and the fact that the last element of the tap-weight vector c_{m-1}(n-1), pertaining to the backward prediction-error filter, equals unity, we may write the first term on the right-hand side of Eq. (25), except for \lambda, as

    [0, c_{m-1}^H(n-1)]\, \tilde{\Phi}_{m+1}(n-1) \begin{bmatrix} a_{m-1}(n-1) \\ 0 \end{bmatrix}
      = [0, c_{m-1}^H(n-1)] \begin{bmatrix} F_{m-1}(n-1) \\ \mathbf{0}_{m-1} \\ \Delta_{m-1}(n-1) \end{bmatrix}
      = \Delta_{m-1}(n-1)          (28)

Finally, substituting Eqs. (26), (27), and (28) into (25), we may express the time-update recursion for \Delta_{m-1}(n) simply as

    \Delta_{m-1}(n) = \lambda \Delta_{m-1}(n-1) + b_{m-1}(n-1)\, \eta_{m-1}^*(n)          (29)

which is the desired result.
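As a quick numerical illustration of the time update in Eq. (29), the following Python sketch (illustrative code, not taken from the text; the error sequences are random placeholders rather than genuine lattice outputs) shows that the O(1) recursion reproduces the corresponding exponentially weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.99          # exponential weighting factor lambda
N = 200             # number of time steps

# Stand-in error sequences: in a real LSL lattice these would be the
# backward a posteriori error b_{m-1}(n-1) and the forward a priori
# error eta_{m-1}(n) produced by stage m-1 of the lattice.
b_prev = rng.standard_normal(N) + 1j * rng.standard_normal(N)
eta = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Time-update recursion of Eq. (29): Delta(n) = lam*Delta(n-1) + b(n-1)*conj(eta(n))
delta = 0.0 + 0.0j
for n in range(N):
    delta = lam * delta + b_prev[n] * np.conj(eta[n])

# Direct evaluation of the equivalent exponentially weighted sum
direct = sum(lam ** (N - 1 - i) * b_prev[i] * np.conj(eta[i]) for i in range(N))

print(abs(delta - direct))   # ~1e-13: the recursion matches the weighted sum
```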
12.10 (a) We start with the relations

    \Phi_{m+1}(n) \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix} = \begin{bmatrix} F_{m-1}(n) \\ \mathbf{0}_{m-1} \\ \Delta_{m-1}(n) \end{bmatrix}          (1)

and

    \Phi_{m+1}(n) \begin{bmatrix} 0 \\ c_{m-1}(n-1) \end{bmatrix} = \begin{bmatrix} \Delta_{m-1}^*(n) \\ \mathbf{0}_{m-1} \\ B_{m-1}(n-1) \end{bmatrix}          (2)

Multiplying Eq. (2) by the ratio \Delta_{m-1}(n) / B_{m-1}(n-1) and subtracting the result from Eq. (1):

    \Phi_{m+1}(n) \left( \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix} - \frac{\Delta_{m-1}(n)}{B_{m-1}(n-1)} \begin{bmatrix} 0 \\ c_{m-1}(n-1) \end{bmatrix} \right)
      = \begin{bmatrix} F_{m-1}(n) - |\Delta_{m-1}(n)|^2 / B_{m-1}(n-1) \\ \mathbf{0}_m \end{bmatrix}          (3)

Equation (3) represents the augmented normal equations for forward prediction with order m, as shown by

    \Phi_{m+1}(n)\, a_m(n) = \begin{bmatrix} F_m(n) \\ \mathbf{0}_m \end{bmatrix}          (4)

on the basis of which we may immediately write

    F_m(n) = F_{m-1}(n) - \frac{|\Delta_{m-1}(n)|^2}{B_{m-1}(n-1)}

(b) Multiplying Eq. (1) by \Delta_{m-1}^*(n) / F_{m-1}(n) and subtracting the result from Eq. (2):
    \Phi_{m+1}(n) \left( \begin{bmatrix} 0 \\ c_{m-1}(n-1) \end{bmatrix} - \frac{\Delta_{m-1}^*(n)}{F_{m-1}(n)} \begin{bmatrix} a_{m-1}(n) \\ 0 \end{bmatrix} \right)
      = \begin{bmatrix} \mathbf{0}_m \\ B_{m-1}(n-1) - |\Delta_{m-1}(n)|^2 / F_{m-1}(n) \end{bmatrix}          (5)

Equation (5) represents the augmented normal equations for backward prediction with order m, as shown by

    \Phi_{m+1}(n)\, c_m(n) = \begin{bmatrix} \mathbf{0}_m \\ B_m(n) \end{bmatrix}          (6)

on the basis of which we may immediately write

    B_m(n) = B_{m-1}(n-1) - \frac{|\Delta_{m-1}(n)|^2}{F_{m-1}(n)}

12.11 (a) From part (a) of Problem 12.5, we have
–1 Φ M+1 ( n )
=
T
0
0M
0M
Φ M ( n-1 )
–1
H 1 + ----------------a M ( n )a M ( n ) . FM (n)
Similarly, from part (b) of the same problem, we have –1
–1 Φ M+1 ( n )
=
ΦM ( n ) T
0M
0M 0
H 1 + ----------------c M ( n )c M ( n ) . BM ( n )
–1
–1
Subtracting these two equations gives Φ M+1 ( n ) – Φ M+1 ( n ) = 0 , or
0 =
0 0M
T
–1
ΦM ( n ) H 1 + ----------------a M ( n )a M ( n ) – FM(n) –1 T Φ M ( n-1 ) 0M 0M
This is easily rearranged as
267
0M 0
H 1 – ----------------c M ( n )c M ( n ) BM ( n )
–1
ΦM ( n )
0M
T
0M
–
0
T
0
0M
0M
Φ M ( n-1 )
H H 1 1 = ----------------a M ( n )a M ( n ) – ----------------c M ( n )c M ( n ) . BM ( n ) FM (n)
–1
(b) From page 441, equation (9.16) we have the basic matrix recursion –1 ΦM ( n )
= λ
–1
–1 Φ M ( n-1 )
–1 λ
–1
H
–1
–1
Φ M ( n-1 )u ( n )u ( n )Φ M ( n-1 ) – λ -----------------------------------------------------------------------------------–1 –1 H 1 + λ u ( n )Φ M ( n-1 )u ( n )
From Eq. (9.18) of the text, we may introduce the gain vector –1
–1
λ Φ M ( n-1 )u ( n ) k M ( n ) = --------------------------------------------------------------------–1 H –1 1 + λ u ( n )Φ M ( n-1 )u ( n ) which yields the expression –1
–1
–1
–1
H
–1
Φ M ( n ) = λ Φ M ( n-1 ) – λ k M ( n )u ( n )Φ M ( n-1 ) Now, again from Eq. (9.18) of the text, we have –1 H
–1
H
–1 H
–1
λ u ( n )Φ M ( n-1 ) = k M ( n ) ( 1 + λ u ( n )Φ M ( n-1 )u ( n ) ) But from Eq. (10.100) of the text, we may introduce the conversion factor as 1 γ M ( n ) = --------------------------------------------------------------------–1 H –1 1 + λ u ( n )Φ M ( n-1 )u ( n ) so that the basic matrix inversion lemma becomes H
–1 ΦM ( n )
= λ
–1
–1 Φ M ( n-1 )
k M ( n )k M ( n ) – -------------------------------γ M (n)
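The recursion just obtained can be checked numerically. The Python sketch below is an illustrative verification (the data, forgetting factor, and filter length are arbitrary choices, not taken from the text): it forms the gain vector k_M(n) and the conversion factor γ_M(n) exactly as defined above, and confirms that λ^{-1}Φ_M^{-1}(n-1) − k_M(n)k_M^H(n)/γ_M(n) equals the inverse of the time-updated correlation matrix λΦ_M(n-1) + u(n)u^H(n).

```python
import numpy as np

rng = np.random.default_rng(1)
M, lam = 4, 0.98

# Hermitian positive-definite stand-in for Phi_M(n-1), built from random regressors
X = rng.standard_normal((50, M)) + 1j * rng.standard_normal((50, M))
Phi_prev = X.conj().T @ X
P_prev = np.linalg.inv(Phi_prev)                       # Phi_M^{-1}(n-1)

u = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # new input vector u(n)

# Gain vector k_M(n) and conversion factor gamma_M(n), as defined above
quad = (u.conj() @ P_prev @ u).real                    # u^H Phi^{-1}(n-1) u
k = (P_prev @ u) / (lam + quad)                        # = lam^{-1} P u / (1 + lam^{-1} quad)
gamma = lam / (lam + quad)                             # = 1 / (1 + lam^{-1} quad)

# Recursion of part (b): Phi^{-1}(n) = lam^{-1} Phi^{-1}(n-1) - k(n) k^H(n) / gamma(n)
P_new = P_prev / lam - np.outer(k, k.conj()) / gamma

# Direct inversion of the time-updated correlation matrix for comparison
P_direct = np.linalg.inv(lam * Phi_prev + np.outer(u, u.conj()))

print(np.max(np.abs(P_new - P_direct)))                # agreement to machine precision
```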
(c) The result of part (b) can be rearranged as H
–1 Φ M ( n-1 )
=
–1 λΦ M ( n ) +
k M ( n )k M ( n ) λ -------------------------------γ M (n)
Inserting this recursion into part (a), we have –1
ΦM ( n )
0M
T 0M
0
–
T
0
0M –1
H
λΦ M ( n ) + λk M ( n )k M ( n ) ⁄ γ M ( n )
0M
H H 1 1 = ----------------a M ( n )a M ( n ) – ----------------c M ( n )c M ( n ) FM (n) BM ( n )
This, in turn, can be rearranged as –1
ΦM ( n )
0M
T 0M
0
T
0
0M
0M
ΦM ( n )
–λ
–1
H
H 0 k M ( n ) c ( n )c H ( n ) a M ( n )a M ( n ) 0 M M -------------------------- – -----------------------------= -------------------------------- + λ -. FM (n) BM ( n ) kM ( n ) γ M ( n )
(d) Now we multiply the solution from (c) from the left by 2
M
[ 1, ( z ⁄ λ ), ( z ⁄ λ ) , …, ( z ⁄ λ ) ] , and from the right by M H
2
[ 1, ( w ⁄ λ ), ( (w ) ⁄ λ , …, w ⁄ λ) ] . First note that, if we set 1 0M 0
w* ⁄ λ ...
M
P ( z, w* ) = [ 1 ( z ⁄ λ )… ( z ⁄ λ ) ]
–1 ΦM ( n ) T 0M
( w* ⁄ λ )
M
then, upon displacing this matrix one position along the main diagonal, we obtain
1 0 0M
w* ⁄ λ
= zw*P ( z, w* )
...
M
[ 1 ( z ⁄ λ )… ( z ⁄ λ ) ]λ
T 0M –1 ΦM ( n )
( w* ⁄ λ )
M
Thus the left-hand side of part (c) yields the two-variable polynomial ( 1 – zw* )P ( z, w* ) Similarly, rewrite the right-hand side as the sum and difference of dyads: H
H
cM ( n ) aM ( n ) cM ( n ) aM ( n ) 0 λ λ -------------- – -------------------H -------------------- -------------------- + ---------------------------------γ M ( n ) kM ( n ) γ M ( n ) 0 kM ( n ) FM (n) FM (n) BM ( n ) BM ( n ) Accordingly, if we set M aM ( n ) A ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------FM(n) M 0 λ K ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] --------------γ M ( n ) kM ( n ) M cM ( n ) C ( z ) = [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------BM ( n )
then we will have
M aM ( n ) [ 1, z ⁄ λ … ( z ⁄ λ ) ] -------------------- -------------------FM (n) FM (n)
1 w* ⁄ λ
= A ( z ) A* ( w ),
...
H aM ( n )
( w* ⁄ λ )
M
and likewise for the remaining terms. Putting all this together, we finally obtain ( 1 – zw* )P ( z, w* ) = A ( z ) A* ( w ) + K ( z )K * ( w ) – B ( z )B* ( w )
(e) If we set z = w = e jω, then the result of part (d) reads as jω ( – j )ω
e = 0
jω
) P(e , e
– jω
(1 – e
) = A(e
jω
) A* ( e
jω
) + K (e
jω
)K * ( e
jω
) – C(e
jω
)C* ( e
This gives A(e
jω 2
jω 2
) + K (e
)
= C(e
jω 2
) ,
for all ω.
(f) If we set w* = z* in the result of part (d), we have 2
( 1 – z )P ( z, z* ) = A ( z ) A* ( z ) + K ( z )K * ( z ) – C ( z )C* ( z ) 2
2
= A(z) + K (z) – C (z)
2
(1)
–1
Now, since Φ M ( n ) is a positive definite matrix, it is true that 1 M-1
–1
]Φ m ( n )
z* ⁄ λ
> 0,
...
P ( z, z* ) = [ 1, z ⁄ λ, …, ( z ⁄ λ )
( z* ⁄ λ )
for all z.
M
Accordingly, the right-hand side of Eq. (1) above must take the same sign as (1 - |z|2), which gives <0, A ( z ) + K ( z ) – C ( z ) =0, >0, 2
2
2
z >1; z =1; z <1.
(g) From the first inequality of part (f), we have 2
2
2
A ( z ) + K ( z ) – C ( z ) < 0,
in z >1.
Now, if C(z) were to have a zero with modulus greater than one, that is, C ( z 0 ) = 0,
with z 0 >1
(2)
jω
)
then we would have 2
C ( z0 )
2
2
2
= A ( z 0 ) + K ( z 0 ) ≥ 0.
2
A ( z0 ) + K ( z0 ) –
With |z0| > 1, we obtain a contradiction to Eq. (2) above. Accordingly, C(z) can have no zeros in |z| > 1, so that it may be determined without phase ambiguity from A(z) and K(z) by way of spectral factorization, using the result of part (e).

12.12 (a) Backward prediction. The first three lines of Table 12.4 follow from the state-space models of Eqs. (12.87) through (12.95), together with Table 10.3 of Chapter 10 in the text. For the remaining three entries, we may proceed as follows:
–1
n-1
(a) K ( n-1 ) ↔ λ Φ ( n-1 ) = λ ∑ λ i=1 –1
–1
n – i+1
2
ε f , m-1 ( i-1 )
–1 –1
= λ F m-1 ( n-1 ) (b) g ( n ) ↔ λ
–1 ⁄ 2
k(n) = λ
–1 ⁄ 2 –1 F m-1 ( n )ε f , m-1 ( n )
= λ
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 )β m-1 ( n-1 )
= λ
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 )β m-1 ( n )
e(n) ↔ λ
–n ⁄ 2
*
*
+ η m-1 ( n )κ b, m ( n-1 )
*
( ε b, m-1 ( n-1 ) + ε f , m-1 ( n )κ b, m ( n ) )
= λ
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 ) ( b m-1 ( n-1 )
= λ
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 )b m ( n )
Hence, B m ( n ) α(n) r ( n ) = ----------- = γ m-1 ( n-1 ) --------------- e(n) bm ( n )
*
+ f m-1 ( n )κ b, m ( n ) )
–1
γ m-1 ( n-1 ) = -----------------------γ m(n) (b) The relations for joint-process estimation follow a similar procedure. 12.13 (a) The correction in the update equation for Emin(n) is defined by the product term α m ( n )e *m ( n ) . This correction term is also equal to |εm+1(n)|2. Hence ε m+1 ( n )ε *m+1 ( n ) = α m ( n )e *m ( n ) By definition εm ( n ) =
em ( n ) ⋅ αm ( n )
We know α m ( n )e *m ( n ) is real. Hence, we must have arg [ ε m ( n ) ] = arg [ e m ( n ) ] Moreover, it is natural to have arg [ ε m ( n ) ] = arg [ e m ( n ) ] which goes to show that arg [ ε m ( n ) ] = arg [ e m ( n ) ] = arg [ α m ( n ) ] (b) The correction term in the update equation for Bmin(n) is defined by the product term ψ m ( n )b *m ( n ) . This correction term is also equal to |εb,m(n)|2. Hence, ε b, m ( n )ε *b, m ( n ) = ψ m ( n )b *m ( n ) By definition, ε b, m ( n ) =
bm ( n ) ⋅ ψ m ( n )
We know ψ m ( n )b *m ( n ) is real. Hence, we must have arg [ b m ( n ) ] = arg [ ψ m ( n ) ]
Moreover, it is natural to have arg [ ε b, m ( n ) ] = arg [ b m ( n ) ] Therefore, arg [ ε b, m ( n ) ] = arg [ b m ( n ) ] = arg [ ψ m ( n ) ] (c) The correction term in the update equation for Fm(n) is η m ( n ) f *m ( n ) which is also equal to |εf,m(n)|2. Hence, ε f , m ( n )ε *f , m ( n ) = η m ( n ) f *m ( n ) By definition, ε f , m(n) =
f m ( n ) ⋅ ηm ( n )
We know η m ( n ) f *m ( n ) is real. Hence, we must have arg [ f m ( n ) ] = arg [ η m ( n ) ] Moreover, it is natural to have arg [ ε f , m ( n ) ] = arg [ f m ( n ) ] Therefore, arg [ ε f , m ( n ) ] = arg [ f m ( n ) ] arg [ η m ( n ) ] 12.14 We start with 1⁄2 1⁄2
λ B m-1 ( n-1 ) c b, m-1 ( n ) = -------------------------------------1⁄2 B m-1 ( n )
(1)
Also, we note that
1⁄2 1⁄2
λ F m-1 ( n-1 ) c f , m-1 ( n ) = -------------------------------------1⁄2 F m-1 ( n )
(2)
But from the solution to part (d) of Problem 12.6: 1⁄2 1⁄2
λ B m-1 ( n-1 ) 1 ⁄ 2 1⁄2 -------------------------------------- γ m-1 ( n ) = γ m ( n ) 1⁄2 B m-1 ( n )
(3)
We next adapt the relation of part (c) of Problem 12.6 to our present situation: 1⁄2 γ m (n)
1⁄2 1⁄2
λ F m-1 ( n-1 ) 1 ⁄ 2 = -------------------------------------- γ m-1 ( n ) 1⁄2 Fm (n)
(4)
Hence, using Eqs. (1) and (3): 1⁄2
1⁄2
γ m ( n ) = c b, m-1 ( n )γ m-1 ( n )
(5)
and using Eqs. (2) and (4): 1⁄2
1⁄2
γ m ( n ) = c f , m-1 ( n )γ m-1 ( n )
(6)
Multiplying Eqs. (1) and (2), we get γ m ( n ) = c b, m-1 ( n )c f , m-1 ( n )γ m-1 ( n )
(7)
which is the desired result. H
12.15 The matrix product Φ m+1 ( n )L m ( n ) equals 1 c 1, 1 ( n ) … c m-1,m-1 ( n ) 0 1 … c m-1,m-2 ( n ) H : : Φ m+1 ( n )L m ( n ) = Φ m+1 ( n ) : 0 0 … 1 0
0
…
0
c m, m ( n ) c m,m-1 ( n ) : c m, 1 ( n ) 1
(1)
(1) From the augmented system of normal equations, we have c m, m ( n )
...
=
...
Φ m+1 ( n )
0 0
c m, m – 1 ( n ) c m, 1 ( n )
Bm ( n )
1 (2) From the solution to Problem 12.9 we note that
Φ m+1 ( n ) =
Φm ( n ) H
φ2 ( n )
φ2 ( n ) U 2(n)
Hence c m-1, m-1 ( n )
...
Φ m+1 ( n )
c m-1, m – 2 ( n )
=
Φ m ( n ) c m-1, m – 2 ( n ) ...
c m-1, m-1 ( n )
1
1 0
H φ 2 ( n )c m-1 ( n )
]
...
0 0 =
B m-1 ( n ) H
φ 2 ( n )c m-1 ( n ) (3) Thus far we have dealt with the last two columns of the matrix on the right-hand side of Eq. (1). Similarly, we may go onto show that
c 1, 1 ( n ) 1
...
=
...
Φ m+1 ( n )
0 B1 ( n ) x x
0 0
where the crosses refer to some nonzero elements. Finally, we arrive at
x =
...
...
Φ m+1 ( n ) =
B0 ( n )
1 0 0 0
x x
where again the crosses refer to some other nonzero elements.
0
B1 ( n ) …
0
0
...
0
...
x
0
...
B0 ( n ) H Φ m+1 ( n )L m ( n )
…
...
Putting all of these pieces together, we conclude that
x
x
x
x
… B m-1 ( n ) 0 … x Bm ( n )
H
This shows that Φ m+1 ( n )L m ( n ) is a lower triangular matrix. Since L m ( n ) is itself a lower triangular matrix, all the elements of L m ( n ) above the main diagonal are zero. H
Moreover, since Φ m+1 ( n ) is Hermitian symmetric, L m ( n )Φ m+1 ( n )L m ( n ) is likewise Hermitian symmetric. Accordingly, all the elements of this product below the main diagonal are also zero. Finally, since all the diagonal elements of L m ( n ) equal unity, we H
conclude that L m ( n )Φ m+1 ( n )L m ( n ) is a diagonal matrix, as shown by H
L m ( n )Φ m+1 ( n )L m ( n ) = diag [ B 0 ( n ), B 1 ( n ), …, B m-1 ( n ), B m ( n ) ]
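The diagonalization just established is easy to check numerically. In the sketch below (illustrative Python, not code from the text), each row of the transformation holds the backward prediction-error filter vector of the corresponding order, obtained by solving the augmented normal equations for an arbitrary Hermitian positive-definite stand-in for Φ_{m+1}(n); this row arrangement may differ from the text's L_m(n) by a transposition, but the product comes out diagonal with the B_k values on its diagonal, as claimed.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5                                          # prediction orders 0..M

# Hermitian positive-definite stand-in for Phi_{M+1}(n)
X = rng.standard_normal((80, M + 1)) + 1j * rng.standard_normal((80, M + 1))
Phi = X.conj().T @ X

# For each order k, solve Phi_{k+1} c_k = [0_k; B_k] with the last element of c_k equal to 1
L = np.zeros((M + 1, M + 1), dtype=complex)
B = np.zeros(M + 1)
for k in range(M + 1):
    Phi_k = Phi[:k + 1, :k + 1]
    e_last = np.zeros(k + 1)
    e_last[-1] = 1.0
    c = np.linalg.solve(Phi_k, e_last)         # proportional to the backward predictor
    c = c / c[-1]                              # normalize so the last coefficient is 1
    B[k] = (Phi_k @ c)[-1].real                # B_k = weighted sum of backward error squares
    L[k, :k + 1] = c.conj()                    # row k holds c_k^H, padded with zeros

D = L @ Phi @ L.conj().T                       # should equal diag(B_0, ..., B_M)
off_diag = D - np.diag(np.diag(D))
print(np.max(np.abs(off_diag)), np.max(np.abs(np.diag(D).real - B)))   # both ~1e-12
```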
12.16 The joint probability density function of the time series u(n),u(n-1),...,u(n-M) equals –1 1 H 1 f U ( u ) = -------------------------------------- exp – --- u ( n )R u ( n ) 1⁄2 2 [ 2πdet ( R ) ]
where uT(n) = [u(n),u(n-1),...,u(n-M)] and R is the (M+1)-by-(M+1) ensemble-averaged correlation matrix of the input vector u(n). The log-likelihood function equals L = ln f U ( u ) –1 1 1 H = – --- ln [ 2πdet ( R ) ] – --- u ( n )R u ( n ) 2 2
(1)
For n > M+1, we may approximate the correlation matrix R in terms of the deterministic correlation matrix Φ ( n ) as 1 R ≈ --- Φ ( n ), n
n ≥ M+1
(2)
Hence, we may rewrite Eq. (1) as –1 1 1 H L ≈ – --- ln [ 2πdet ( R ) ] – ------ u ( n )Φ ( n )u ( n ) 2 2n
(3)
The second term on the right side of Eq. (3) equals –1 1 H 1 ------ u ( n )Φ ( n )u ( n ) = ------ [ 1 – γ ( n ) ] 2n 2n
where γ(n) is the conversion factor.

12.17 The a posteriori estimation error e_m(n) is order-updated by using the recursion

    e_m(n) = e_{m-1}(n) - h_m^*(n)\, b_m(n),
m = 1, 2, …, M
(1)
By definition, we have e m ( n ) = d ( n ) – dˆ ( n U n-m )
(2)
where d(n) is the desired response and dˆ ( n U n-m ) is the least-squares estimate of d(n) given the input samples u(n),...,u(n-m+1),u(n-m) that span the space Un-m. Similarly, e m-1 ( n ) = d ( n ) – dˆ ( n U n-m+1 )
(3)
where dˆ ( n U n-m+1 ) is the least-squares estimate of d(n) given the input samples u(n),...,u(n-m+1). Substituting Eqs. (1) and (2) into (3), we get dˆ ( n U n-m ) = dˆ ( n U n-m+1 ) + h *m ( n )b m ( n )
(4)
Equation (4) shows that given the regression coefficient hm, we only need bm(n) as the new piece of information for updating the least-squares estimate of the desired response. Hence, bm(n) may be viewed as a form of innovation, which is in perfect agreement with the discussion presented in Section 10.1. 12.18 From Eqs. (12.159) of the text: H
D m+1 ( n ) = L m ( n )Φ m+1 ( n )L m ( n ) Equivalently, we may write –1
H
–1
Φ m+1 ( n ) = L m ( n )D m+1 ( n )L m ( n ) Hence, –1
P ( n ) = Φ m+1 ( n ) = [D
–1 ⁄ 2
H
( n )L ( n ) ] [ D
–1 ⁄ 2
( n )L ( n ) ]
12.19 (a) From Eq. (12.152), xˆ ( n+1 Y n ) = λ
–1 ⁄ 2
xˆ ( n+1 Y n-1 ) + g ( n )α ( n )
(1)
From Eqs. (12.97) and the solution to Problem 12.12: g(n) ↔ λ
–1 ⁄ 2 –1 F m-1 ( n )ε f , m-1 ( n )
(2)
279
α(n) ↔ λ
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 )β m ( n )
(3)
(It is important to note that the relation corresponding to α(n) must involve the error signal appropriate to the estimation problem of interest.) Also from Table 12.4 (under backward prediction) xˆ ( n+1 Y n-1 ) ↔ ( – λ
–n ⁄ 2
κ b, m ( n-1 ) )
(4)
Hence, substituting Eqs. (2), (3) and (4) into (1), and cancelling the common term λ
–( 1 ⁄ 2 ) ⁄ 2
: 1⁄2
γ m-1 ( n-1 )ε f , m-1 ( n ) * κ b, m ( n ) = κ b, m ( n-1 ) – ------------------------------------------------- β m ( n ) F m-1 ( n )
(5)
But, from Eq. (12.80): 1⁄2
ε f , m-1 ( n-1 ) = γ m-1 ( n-1 )η m-1 ( n )
(6)
Thus Eq. (5) simplifies to 1⁄2
γ m-1 ( n-1 )η m-1 ( n ) * κ b, m ( n ) = κ b, m ( n-1 ) – --------------------------------------------- β m ( n ) F m-1 ( n )
(7)
which is the desired result; see Eq. (12.154). (b)From Table 12.4 (under joint-process estimation) and the solution to Problem 12.12: xˆ ( n Y n-1 ) ↔ λ g(n) ↔ λ
–n ⁄ 2
h m-1 ( n-1 )
(10)
–1 ⁄ 2 –1 B m-1 ( n )ε b, m-1 ( n )
(11)
–n ⁄ 2 1 ⁄ 2 * γ m-1 ( n-1 )ξ m ( n )
(12)
α(n) ↔ λ
Hence, using Eqs. (10) through (12) in (1), we get
280
1⁄2
γ m-1 ( n-1 )ε b, m-1 ( n ) * h m-1 ( n ) = h m-1 ( n-1 ) + ------------------------------------------------- ξ m ( n ) B m-1 ( n )
(13)
From Eq. (12.80): ε b, m-1 ( n ) = γ m-1 ( n-1 )β m-1 ( n-1 ) Hence, γ m-1 ( n )β m-1 ( n ) * h m-1 ( n ) = h m-1 ( n-1 ) + --------------------------------------- ξ m ( n ) B m-1 ( n )
(14)
which is exactly the same as Eq. (12.155). 12.20 For all m, when we put *
*
κ b, m ( n ) = κ f , m ( n ) = κ m ( n ) γm = 1 in Table 12.7, we get F m-1 ( n ) = λF m-1 ( n-1 ) + f m-1 ( n )
2
(1)
B m-1 ( n-1 ) = λB m-1 ( n-2 ) + b m-1 ( n-1 )
2
*
(2)
f m ( n ) = f m-1 ( n ) + κ m ( n-1 )b m-1 ( n-1 )
(3)
b m ( n ) = b m-1 ( n-1 ) + κ m ( n-1 ) f m-1 ( n )
(4)
*
b m-1 ( n-1 ) f m ( n ) κ m ( n ) = κ m ( n-1 ) – ---------------------------------------B m-1 ( n-1 )
(5)
*
f m-1 ( n )b m ( n ) = κ m ( n-1 ) – ----------------------------------F m-1 ( n )
(6)
Adding Eqs. (1) and (2), and setting E m-1 ( n ) = F m-1 ( n ) + B m-1 ( n-1 ) we get 2
2
E m-1 ( n ) = λE m-1 ( n-1 ) + ( f m-1 ( n ) + b m-1 ( n-1 ) )
(7)
Equation (7) is recognized as the update equation (12.9) for the GAL algorithm in the text. The only outstanding issue is the recursive relation for the reflection coefficient. Adding Eqs. (5) and (6) and setting κ m ( n ) = κˆ m ( n ) and dividing by two, we get *
*
1 b m-1 ( n-1 ) f m ( n ) f m-1 ( n )b m ( n ) κˆ m ( n ) = κˆ m ( n-1 ) – --- ---------------------------------------- + ----------------------------------2 B m-1 ( n-1 ) F m-1 ( n )
(8)
Now assuming that 1 B m-1 ( n-1 ) = F m-1 ( n ) = --- E m-1 ( n ) 2
(9)
then Eq. (8) takes the form * * 1 κˆ m ( n ) = κˆ m ( n-1 ) – -------------------- [ b m-1 ( n-1 ) f m ( n ) + f m-1 ( n )b m ( n ) ] E (n)
(10)
m-1
Equation (10) is recognized as the update equation (12.15) of the algorithm in the text. Conclusions: The GAL algorithm is a special case of the recursive LSL algorithm, using a posteriori estimation errors under the following settings: κ b, m ( n ) = κˆ f , m ( n ) γ m(n) = 1 1 F m-1 ( n ) = B m-1 ( n-1 ) = --- E m-1 ( n ) 2
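To make the conclusion concrete, here is a minimal Python sketch of the GAL recursions that result from these settings, written for real-valued data so that the conjugates drop out. The AR(1) test signal, the initialization constants, and the number of stages are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.99        # exponential weighting factor
M = 3             # number of lattice stages
N = 2000

# Real-valued AR(1) test signal u(n); any real stationary signal will do
u = np.zeros(N)
for n in range(1, N):
    u[n] = 0.8 * u[n - 1] + rng.standard_normal()

kappa = np.zeros(M)          # reflection coefficients kappa_m(n-1)
E = np.full(M, 1e-3)         # energies E_{m-1}(n), small positive initial value
b_prev = np.zeros(M)         # delayed backward errors b_{m-1}(n-1)

for n in range(N):
    f = u[n]                 # f_0(n) = b_0(n) = u(n)
    b = u[n]
    b_store = np.empty(M)
    for m in range(M):
        # Energy update, Eq. (7): E(n) = lam*E(n-1) + f_{m-1}(n)^2 + b_{m-1}(n-1)^2
        E[m] = lam * E[m] + f ** 2 + b_prev[m] ** 2
        # Order updates, Eqs. (3) and (4) (real data, so no conjugates)
        f_next = f + kappa[m] * b_prev[m]
        b_next = b_prev[m] + kappa[m] * f
        # Reflection-coefficient update, Eq. (10)
        kappa[m] -= (b_prev[m] * f_next + f * b_next) / E[m]
        b_store[m] = b       # b_{m-1}(n), used as b_{m-1}(n-1) at the next time step
        f, b = f_next, b_next
    b_prev = b_store

print("reflection coefficients:", np.round(kappa, 3))
```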
12.21 The forward reflection coefficient equals ∆ m-1 ( n ) κ f , m ( n ) = ------------------------B m-1 ( n-1 ) *
∆ m-1 ( n-1 ) b m-1 ( n-1 ) f m-1 ( n ) = – λ ------------------------- – -------------------------------------------------B m-1 ( n-1 ) γ m-1 ( n-1 )B m-1 ( n-1 ) *
b m-1 ( n-1 ) f m-1 ( n ) * B m-1 ( n-2 ) ∆ m-1 ( n-1 ) = – λ ------------------------- ------------------------- – --------------------------------------------------ξ m ( n ) B m-1 ( n-1 ) B m-1 ( n-2 ) γ m-1 ( n-1 )B m-1 ( n-1 ) where, in the first term, we have multiplied and divided by B m-1 ( n-2 ) . But ∆ m-1 ( n-1 ) κ f , m ( n-1 ) = – ------------------------B m-1 ( n-2 ) Hence, *
B m-1 ( n-2 ) b m-1 ( n-1 ) f m-1 ( n ) κ f , m ( n ) = ------------------------- κ f , m ( n-1 ) – -----------------------------------------------------B m-1 ( n-1 ) λB m-1 ( n-2 )γ m-1 ( n-1 ) We now use the relations (see part d of Problem 12.6) γ m ( n-1 ) λB m-1 ( n-2 ) ----------------------------- = -----------------------B m-1 ( n-1 ) γ m-1 ( n-1 ) Hence, we may rewrite Eq. (1) as *
γ m ( n-1 ) b m-1 ( n-1 ) f m-1 ( n ) κ f , m ( n ) = ------------------------ κ f , m ( n-1 ) – -----------------------------------------------------γ m-1 ( n-1 ) λB m-1 ( n-2 )γ m-1 ( n-1 ) Next, the backward reflection coefficient equals ∆ *m-1 ( n ) κ b, m ( n ) = – -------------------F m-1 ( n )
283
(1)
∆ *m-1 ( n-1 ) b *m-1 ( n-1 ) f m-1 ( n ) = – λ ------------------------- – --------------------------------------------F m-1 ( n ) F m-1 ( n )γ m-1 ( n-1 ) F m-1 ( n-1 ) b *m-1 ( n-1 ) f m-1 ( n ) = λκ b, m ( n-1 ) ------------------------- – --------------------------------------------F ( n ) F ( n )γ ( n-1 ) m-1
m-1
m-1
F m-1 ( n-1 ) b *m-1 ( n-1 ) f m-1 ( n ) = λ ------------------------- κ b, m ( n-1 ) – -----------------------------------------------------F m-1 ( n ) λF m-1 ( n-1 )γ m-1 ( n-1 )
(2)
We now recognize that (see part c of Problem 12.6) F m-1 ( n-1 ) γ m(n) λ ------------------------- = -----------------------F m-1 ( n ) γ m-1 ( n-1 ) Hence, we may rewrite Eq. (2) as γ m(n) b *m-1 ( n-1 ) f m-1 ( n ) κ b, m ( n ) = ------------------------ κ b, m ( n-1 ) – -----------------------------------------------------γ m-1 ( n-1 ) λF m-1 ( n-1 )γ m-1 ( n-1 ) 12.22 The forward a posteriori prediction error equals f m ( n ) = f m-1 ( n ) + κ *f , m ( n )b m-1 ( n-1 ) ∆ *m-1 ( n ) = f m-1 ( n ) – ------------------------- b m-1 ( n-1 ) B ( n-1 ) m-1
Hence, the normalized value of fm(n) equals f m(n) f m ( n ) = ---------------------------------------------1⁄2 1⁄2 F m ( n )γ m ( n-1 ) ∆ *m-1 ( n ) f m-1 ( n ) = ---------------------------------------------- – -----------------------------------------------------------------------b m-1 ( n-1 ) 1⁄2 1⁄2 1⁄2 1⁄2 F m ( n )γ m ( n-1 ) B m-1 ( n-1 )F m ( n )γ m ( n-1 ) But
284
(1)
2
∆ m-1 ( n ) F m ( n ) = F m-1 ( n ) 1 – ---------------------------------------------F m-1 ( n )B m-1 ( n-1 ) = F m-1 ( n ) [ 1 – ∆ m-1 ( n ) ]
(2)
where ∆ m-1 ( n ) ∆ m-1 ( n ) = ---------------------------------------------1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 ) Similarly, 2
∆ m-1 ( n ) B m ( n ) = B m-1 ( n-1 ) 1 – ---------------------------------------------B m-1 ( n-1 )F m-1 ( n ) 2
= B m-1 ( n-1 ) [ 1 – ∆ m-1 ( n ) ]
(3)
Also, we may write 2
b m-1 ( n-1 ) γ m ( n-1 ) = γ m-1 ( n-1 ) 1 – -------------------------------------------------γ m-1 ( n-1 )B m-1 ( n-1 ) 2
= γ m-1 ( n-1 ) [ 1 – b m-1 ( n-1 ) ]
(4)
where b m-1 ( n-1 ) b m-1 ( n-1 ) = --------------------------------------------------1⁄2 1⁄2 γ m-1 ( n-1 )B m-1 ( n-1 )
(5)
Hence, we may use Eqs. (2) and (4) to express the first term on the right side of Eq. (1) as 1⁄2
1⁄2
f m-1 ( n ) ⁄ [ F m-1 ( n )γ m-1 ( n-1 ) ] f m-1 ( n ) ---------------------------------------------- = ----------------------------------------------------------------------------------------------------1⁄2 1⁄2 2 1⁄2 2 1⁄2 F m ( n )γ m ( n-1 ) [ 1 – ∆ m-1 ( n ) ] [ 1 – b m-1 ( n-1 ) ]
285
f m-1 ( n ) = ----------------------------------------------------------------------------------------------------2 1⁄2 2 1⁄2 [ 1 – ∆ m-1 ( n ) ] [ 1 – b m-1 ( n-1 ) ]
(6)
Next, we use Eqs. (2) to (4) to express the second term on the right side of Eq. (1) as
∆ *m-1 ( n )b m-1 ( n-1 ) ∆ *m-1 ( n )b m-1 ( n-1 ) -------------------------------------------------------------------------- = --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------1⁄2 1⁄2 2 1⁄2 2 1⁄2 1⁄2 1⁄2 B m-1 ( n-1 )F m ( n )γ m ( n-1 ) B m-1 ( n-1 )F m ( n )γ m ( n-1 ) 1 – ∆ m-1 ( n ) 1 – b m-1 ( n-1 )
∆ *m-1 ( n )b m-1 ( n-1 )
= ----------------------------------------------------------------------------------------1⁄2 1⁄2 1 – ∆ m-1 ( n )
2
1 – b m-1 ( n-1 )
2
(7)
Substituting Eqs. (6) and (7) into (1), we may thus write f m-1 ( n ) – ∆ *m-1 ( n )b m-1 ( n-1 ) f m ( n ) = ----------------------------------------------------------------------------------------------------2 1⁄2 2 1⁄2 [ 1 – ∆ m-1 ( n ) ] [ 1 – b m-1 ( n-1 ) ]
(8)
Next, the backward a posteriori prediction error equals ∆ m-1 ( n ) b m ( b ) = b m-1 ( n-1 ) – -------------------- f m-1 ( n ) F m-1 ( n ) Hence, the normalized value of bm(n) equals ∆ m-1 ( n ) f m-1 ( n ) b m-1 ( n-1 ) b m ( b ) = ---------------------------------------- – ------------------------------------------------------------1⁄2 1⁄2 1⁄2 1⁄2 B m ( n )γ m ( n ) F m-1 ( n )B m ( n )γ m ( n ) The conversion factor may be updated by using the recursion: 2
f m-1 ( n ) γ m ( n ) = γ m-1 ( n-1 ) – ------------------------F m-1 ( n )
2
f m-1 ( n ) = γ m-1 ( n-1 ) 1 – --------------------------------------------F m-1 ( n )γ m-1 ( n-1 )
286
(9)
2
= γ m-1 ( n-1 ) [ 1 – f m-1 ( n ) ]
(10)
where f m-1 ( n ) f m-1 ( n ) = ---------------------------------------------1⁄2 1⁄2 F m-1 ( n )γ m-1 ( n-1 ) Using Eqs. (3) and (10), we may express the first term on the right side of (Eq. 9) as 1⁄2
1⁄2
b m-1 ( n-1 ) b m-1 ( n-1 ) ⁄ [ B m-1 ( n-1 )γ m-1 ( n-1 ) ] ---------------------------------------- = -----------------------------------------------------------------------------------------------1⁄2 1⁄2 2 1⁄2 2 1⁄2 B m ( n )γ m ( n ) [ 1 – ∆ m-1 ( n ) ] [ 1 – f m-1 ( n ) ] b m-1 ( n-1 ) = -----------------------------------------------------------------------------------------------2 1⁄2 2 1⁄2 [ 1 – ∆ m-1 ( n ) ] [ 1 – f m-1 ( n ) ]
(11)
Using Eqs. (3) and (9), we may also rewrite the second term on the right side of Eq. (9) as ∆ m-1 ( n ) f m-1 ( n ) ∆ m-1 ( n ) f m-1 ( n ) ------------------------------------------------------------- = ------------------------------------------------------------------------------------------------------------------------------------------------------------------------1⁄2 1⁄2 2 1⁄2 2 1⁄2 1⁄2 1⁄2 F m-1 ( n )B m ( n )γ m ( n ) F m-1 ( n )B m-1 ( n-1 )γ m-1 ( n-1 ) [ 1 – ∆ m-1 ( n ) ] [ 1 – f m-1 ( n ) ] ∆ m-1 ( n ) f m-1 ( n ) = -----------------------------------------------------------------------------------------------2 1⁄2 2 1⁄2 [ 1 – ∆ m-1 ( n ) ] [ 1 – f m-1 ( n ) ]
(12)
Hence, the use of Eqs. (11) and (12) in (9) yields b m-1 ( n-1 ) – ∆ m-1 ( n ) f m-1 ( n ) b m ( n ) = -----------------------------------------------------------------------------------------------2 1⁄2 2 1⁄2 [ 1 – ∆ m-1 ( n ) ] [ 1 – f m-1 ( n ) ] Finally, we note that b m-1 ( n-1 ) f *m-1 ( n ) ∆ m-1 ( n ) = λ∆ m-1 ( n-1 ) + --------------------------------------------γ m-1 ( n-1 )
287
(13)
The normalized value of ∆ m-1 ( n ) equals ∆ m-1 ( n ) ∆ m-1 ( n ) = ---------------------------------------------1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 ) ∆ m-1 ( n ) b m-1 ( n-1 ) f *m-1 ( n ) ---------------------------------------------= λ + ----------------------------------------------------------------------1⁄2 1⁄2 1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 ) F m-1 ( n )B m-1 ( n-1 )γ m-1 ( n-1 )
(14)
Using Eqs. (2) and (3), we may rewrite the first term on the right side of Eq. (14) as 1⁄2
1⁄2
∆ m-1 ( n-1 ) ∆ m-1 ( n-1 ) γ F m-1 ( n-1 )B m-1 ( n-2 ) λ ---------------------------------------------- = ---------------------------------------------------- ------------------------------------------------------1⁄2 1⁄2 1⁄2 1⁄2 1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 ) F m-1 ( n-1 )B m-1 ( n-2 ) F m-1 ( n )B m-1 ( n-1 )
(15)
We next use the relations: λF m-1 ( n-1 ) γ m(n) ----------------------------- = -----------------------F m-1 ( n ) γ m-1 ( n-1 ) λB m-1 ( n-2 ) γ m ( n-1 ) ----------------------------- = -----------------------B m-1 ( n-1 ) γ m-1 ( n-1 ) ∆ m-1 ( n-1 ) ∆ m-1 ( n-1 ) = ---------------------------------------------------1⁄2 1⁄2 F m-1 ( n-1 )B m ( n-2 ) Hence, we may rewrite Eq. (15) as 1⁄2
1⁄2
∆ m-1 ( n-1 ) γ m ( n )γ m ( n-1 ) λ ---------------------------------------------- = ∆ m-1 ( n-1 ) --------------------------------------------1⁄2 1⁄2 γ m-1 ( n-1 ) F m-1 ( n )B m-1 ( n-1 ) Next, multiplying Eq. (4) by (10), we have γ m ( n )γ m ( n-1 ) 2 2 ----------------------------------- = [ 1 – f m-1 ( n ) ] [ 1 – b m-1 ( n-1 ) ] 2 γ m-1 ( n-1 ) Accordingly, we may rewrite (16) as
288
(16)
∆ m-1 ( n-1 ) 2 1⁄2 2 1⁄2 λ ---------------------------------------------- = ∆ m-1 ( n-1 ) [ 1 – f m-1 ( n ) ] [ 1 – b m-1 ( n-1 ) ] 1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 )
(17)
Next, the second term on the right side of Eq. (14) equals b m-1 ( n-1 ) f *m-1 ( n ) ----------------------------------------------------------------------- = b m-1 ( n-1 ) f *m-1 ( n ) 1⁄2 1⁄2 F m-1 ( n )B m-1 ( n-1 )γ m-1 ( n-1 )
(18)
Thus, substituting Eqs. (17) and (18) into (14), we get the desired time-update 2 1⁄2
∆ m-1 ( n ) = ∆ m-1 ( n-1 ) [ 1 – f m-1 ( n ) ]
2 1⁄2
[ 1 – b m-1 ( n-1 ) ]
+ b m-1 ( n-1 ) f *m-1 ( n )
(19)
Equations (19), (8) and (13), in that order, constitute the normalized LSL algorithm.
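For reference, the three recursions can be collected into a single per-stage routine. The Python sketch below merely transcribes Eqs. (19), (8) and (13); generating the normalized order-zero errors that would drive a complete filter, and cascading the stages over time, is omitted, and the sample arguments are arbitrary values of magnitude less than one.

```python
import numpy as np

def normalized_lsl_stage(delta_prev, f_in, b_in_delayed):
    """One stage of the normalized LSL recursions, Eqs. (19), (8) and (13).

    delta_prev   -- normalized Delta_{m-1}(n-1)
    f_in         -- normalized forward error  f_{m-1}(n)
    b_in_delayed -- normalized backward error b_{m-1}(n-1)
    All three inputs are assumed to have magnitude less than one.
    """
    # Eq. (19): time-update of the normalized parameter Delta_{m-1}(n)
    delta = (delta_prev * np.sqrt(1 - abs(f_in) ** 2) * np.sqrt(1 - abs(b_in_delayed) ** 2)
             + b_in_delayed * np.conj(f_in))
    # Eq. (8): order-update of the normalized forward prediction error
    f_out = (f_in - np.conj(delta) * b_in_delayed) / (
        np.sqrt(1 - abs(delta) ** 2) * np.sqrt(1 - abs(b_in_delayed) ** 2))
    # Eq. (13): order-update of the normalized backward prediction error
    b_out = (b_in_delayed - delta * f_in) / (
        np.sqrt(1 - abs(delta) ** 2) * np.sqrt(1 - abs(f_in) ** 2))
    return delta, f_out, b_out

# Example call with illustrative normalized values
print(normalized_lsl_stage(0.2 + 0.1j, 0.3 - 0.2j, -0.4 + 0.1j))
```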
CHAPTER 13 13.1
The analog (infinite-precision) form of the LMS algorithm is described by T
ˆ ( n+1 ) = w ˆ ( n ) + µu ( n ) [ d ( n ) – w ( n )u ( n ) ], w
n = 0, 1, 2, …
ˆ ( n ) is the tap-weight vector estimate at time n, u(n) is the tap-input vector, d(n) is where w the desired response, and µ is the step-size parameter. The digital (finite-precision) counterpart of this update may be expressed as ˆ q ( n+1 ) = w ˆ q ( n ) + Q [ µu q ( n )e q ( n ) ] w where T
e ( n ) = d ( n ) – w ( n )u ( n ) and the use of subscripts q signifies the use of finite-precision arithmetic. Let Q [ µu q ( n )e q ( n ) ] = µu q ( n )e q ( n ) + v ( n ) where the quantizing noise vector v(n) is determined by the manner in which the term u q ( n )e q ( n ) is computed. Hence, ˆ q ( n ) + µu q ( n )e q ( n ) + v ( n ) ˆ q ( n+1 ) = w w
(1)
The quantized value of the estimation error e(n) may be expressed as T
T
e q ( n ) = e ( n ) – ∆u ( n )w ( n ) – u ( n )∆w ( n ) + ζ ( n ) where ζ ( n ) denotes residual error. The quantized value of u(n) equals u q ( n ) = u ( n ) + ∆u ( n ) Hence, we may express u q ( n )e q ( n ), ignoring second-order effects, as follows T
ˆ (n) u q ( n )e q ( n ) = u ( n )e ( n ) + ∆u ( n )e ( n ) – u ( n )∆u ( n )w T
ˆ ( n ) + u ( n )ζ ( n ) – u ( n )u ( n )∆w
290
We may therefore rewrite Eq. (1) as ˆ q ( n ) + µu q ( n )e q ( n ) + µ∆u ( n )e ( n ) ˆ q ( n+1 ) = w w T
T
ˆ ( n ) – µu ( n )u ( n )∆w ˆ ( n ) + µu ( n )ζ ( n ) – µu ( n )∆u ( n )w + v(n)
(2)
But ˆ q ( n+1 ) = w ˆ ( n+1 ) + ∆w ˆ ( n+1 ) w ˆ q(n) = w ˆ ( n ) + ∆w ˆ (n) w ˆ ( n+1 ) = w ˆ ( n ) + µu ( n )e ( n ) w Hence, from Eq. (2) we deduce that ∆w ( n+1 ) = F ( n )∆w ( n ) + t ( n ) where T
F ( n ) = I – µu ( n )u ( n ) T
ˆ ( n )∆u ( n ) + µu ( n )ζ ( n ) + v ( n ) t ( n ) = µ∆u ( n )e ( n ) – µu ( n )w Note that T
T
ˆ (n) = w ˆ ( n )∆u ( n ) ∆u ( n )w 13.2
Building on the solution to Problem 13.1, assume that ∆w ( n ) is stationary. Then the expectation of ∆w ( n ) is zero because the expectation of t(n) is zero.
13.3
We note that yI ( n ) =
∑ wi u ( n – i ) i
y II ( n ) =
∑ wiq u ( n – i ) i
291
Hence, ε ( n ) = y I ( n ) – y II ( n ) =
∑ ( wi – wiq )u ( n – i ) i
The mean-square value of ε(n) is
∑ ∑ ( wi – wiq ) ( w j – w jq )u ( n – i )u ( n – j )
2
E[ε (n)] = E
i
=
j
∑ ∑ ( wi – wiq ) ( w j – w jq )E [ u ( n – i )u ( n – j ) ] i
j
Assuming that 2 E [ u ( n – i )u ( n – j ) ] = A rms , 0,
j=i j≠i
we may simplify Eq. (1) as E [ ε ( n ) ] = A rms ∑ ( w i – w iq ) 2
2
2
i
13.4
(a) The digital residual error is LSB e D ( n ) = --------------µ A rms With 12-bit quantization, the least significant bit is LSB = 2
– 12
≈ 0.25 × 10
–3
We are given µ = 0.07 A rms = 1 Hence, the digital residual error is
292
(1)
– 12
–3
–2 0.25 × 10 2 e D ( n ) = ------------------- ≈ ---------------------------- = 0.35 × 10 0.07 × 1 0.07
(b) The rms quantization error is ( QE ) rms =
M-1
2
2 1⁄2 E [ ε ( n ) ] = A rms ∑ ( w i – w iq ) ≤ ( A rms M ) ( LSB ) i=0 2
For the problem at hand we have ( QE ) rms ≈ 17 × 0.25 × 10
–3
≈ 10
–3
We thus see that (QE)rms is about 3.5 times worse than the digital residual error. 13.5
We start with 1 1 -------------- = ----------------------------------κq( n ) λ + π q ( n )u ( n ) η π ( n )u ( n ) 1 ≈ --------------------------------- 1 – --------------------------------- λ + π ( n )u ( n ) λ + π ( n )u ( n )
(1)
where it is noted that πq ( n ) = π ( n ) + ηπ ( n ) Next, we note that P q ( n-1 )u ( n ) k q ( n ) = ------------------------------κq( n )
(2)
Let P q ( n-1 ) = P ( n-1 ) + η P ( n-1 )
(3)
Therefore, using Eqs. (1) and (3) in (2): ηk ( n ) = kq ( n ) – k ( n )
293
η π ( n )u ( n ) 1 = [ ( P ( n-1 ) + η P ( n-1 ) )u ( n ) ] --------------------------------- 1 – --------------------------------- – k ( n ) λ + π ( n )u ( n ) λ + π ( n )u ( n ) P ( n-1 )u ( n ) η P ( n-1 )u ( n ) P ( n-1 )u ( n )η π ( n )u ( n ) ≈ --------------------------------- + --------------------------------- – -------------------------------------------------------- – k ( n ) 2 λ + π ( n )u ( n ) λ + π ( n )u ( n ) ( λ + π ( n )u ( n ) ) η P ( n-1 )u ( n ) P ( n-1 )u ( n )η π ( n )u ( n ) = --------------------------------- – -------------------------------------------------------2 λ + π ( n )u ( n ) ( λ + π ( n )u ( n ) ) H
We next calculate (using the relation η π ( n ) = u ( n )η P ( n-1 ) ) η P′ ( n ) = η k ( n )π ( n ) + k ( n )η π ( n ) η P ( n-1 )u ( n )π ( n ) P ( n-1 )u ( n )η π ( n )u ( n )π ( n ) P ( n-1 )u ( n )η π ( n ) ≈ -------------------------------------------- – -------------------------------------------------------------------- + -------------------------------------------2 λ + π ( n )u ( n ) λ + π ( n )u ( n ) ( λ + π ( n )u ( n ) ) H
η P ( n-1 )u ( n )π ( n ) P ( n-1 )u ( n )u ( n )η P ( n-1 )u ( n )π ( n ) = -------------------------------------------- – ----------------------------------------------------------------------------------------2 λ + π ( n )u ( n ) ( λ + π ( n )u ( n ) ) H
P ( n-1 )u ( n )u ( n )η P ( n-1 ) + ----------------------------------------------------------------λ + π ( n )u ( n ) Finally, we calculate 1 η P ( n ) = --- ( η P ( n-1 ) – η P′ ( n-1 ) ) λ H
η P ( n-1 )u ( n )π ( n ) P ( n-1 )u ( n )u ( n )η P ( n-1 )u ( n )π ( n ) 1 = --- η P ( n-1 ) – -------------------------------------------- + ----------------------------------------------------------------------------------------- 2 λ λ + π ( n )u ( n ) ( λ + π ( n )u ( n ) ) H
P ( n-1 )u ( n )u ( n )η P ( n-1 ) – ----------------------------------------------------------------- λ + π ( n )u ( n ) We now note that
294
P ( n-1 )u ( n ) --------------------------------- = k ( n ) λ + π ( n )u ( n ) H
π ( n ) = u ( n )P ( n-1 ) Hence, assuming that P(n-1) is Hermitian: H H H 1 η P ( n ) = --- η P ( n-1 ) – η P ( n-1 )u ( n )k ( n ) + k ( n )u ( n )η P ( n-1 )u ( n )k ( n ) λ
(
H
– k ( n )u ( n )η P ( n-1 )
)
H H H 1 1 = --- ( I – k ( n )u ( n ) )η cP ( n ) – --- ( I – k ( n )u ( n ) )η P ( n-1 )u ( n )k ( n ) λ λ H H H 1 = --- ( I – k ( n )u ( n ) )η P ( n-1 ) ( I – k ( n )u ( n ) ) λ T
From this result we readily see that η P ( n ) = η P ( n ) which demonstrates the symmetrypreserving-property of the RLS algorithm summarized in Table 13.1. 13.6
The condition for persistent excitation (assuming real data) may be expressed as n
aI ≤
∑λ
n-i
T
u ( i )u ( i ) ≤ bI
i=n 0
where a and b are both positive numbers. Premultiplying by zT and postmultiplying by z: T
az z ≤
n
∑λ
n-i T
T
T
z u ( i )u ( i )z ≤ bz z
i=n 0
We now recognize that T
T
z u ( i ) = u ( i )z Hence,
295
n
∑λ
T
az z ≤
2
n-i T
T
z u ( i ) ≤ bz z
i=n 0
For a nonzero vector z, this condition requires that we have T
z u(i) > α
for n 0 ≤ i ≤ n
This is another way of defining the condition for persistent excitation. 13.7
We start with the matrix relation: λ
1⁄2
λ
Φ
1⁄2
u(n)
1⁄2 H
1⁄2
Φ
1⁄2
(n)
d (n) Θ(n) = p (n) H –H ⁄ 2 1 (n) u ( n )Φ
T
H⁄2
Φ
H
p ( n-1 ) 0
λ
( n-1 )
( n-1 )
0
Φ
–H ⁄ 2
(n)
0 ξ ( n )γ γ
1⁄2
1⁄2
– k ( n )γ
(n)
(1)
(n)
–1 ⁄ 2
(n)
where Θ ( n ) is unitary rotation. Let X(n) = Φ
–H ⁄ 2
(n) + ηx(n)
(2)
where η x ( n ) represents the effect of round-off error. Assuming there are no additional local errors introduced at time n, the recursion pertaining to the bottom parts of the prearray and postarray of Eq. (1) takes on the following form: λ
–1 ⁄ 2
X ( n-1 ),
0 Θ ( n ) = X ( n ),
–y ( n )
(3)
The vector y(n) is the quantized version of k(n)γ-1/2(n): y ( n ) = k ( n )γ
–1 ⁄ 2
(n) + ηy(n)
(4)
where k(n) is the gain vector, γ(n) is the conversion factor, and η y ( n ) is the round-off error.
296
Substituting Eq. (2) and (4) inro (3), we may express the quantized version of the last rows in Eq. (1) as follows:
λ
–1 ⁄ 2 –H ⁄ 2 –1 ⁄ 2 Φ ( n-1 ) + λ η x ( n-1 ),
0 Θ(n)
= Φ – H ⁄ 2 ( n ) + η ( n ), x
– k ( n )γ
–1 ⁄ 2
(n) – ηy(n)
(5)
Under infinite-precision arithmetic, we have from the last rows of Eq. (1):
λ
–1 ⁄ 2 –H ⁄ 2 Φ ( n-1 )
0
Θ(n) =
Φ
–H ⁄ 2
( n ),
– k ( n )γ
–1 ⁄ 2
(n)
(6)
Hence, comparing Eqs. (5) and (6), we infer λ
–1 ⁄ 2
0 θ(n) = ηx(n)
η x ( n-1 )
–η y ( n )
(7)
Equation (7) reveals that the error propagation due to η x ( n-1 ) is NOT necessarily stable, in that the local errors tend to grow unboundedly. The unlimited error growth is due to (1) the amplification produced by the factor λ-1/2 for λ < 1, and (2) the fact that the unitary rotation Θ is independent of the error η x . Consequently, as the recursion progresses the –H ⁄ 2
1⁄2
stored values of Φ and Φ deviate more and more from each other’s Hermitian transpose, thereby contradicting the very premise on which the extended QR-RLS algorithm is based. 13.8
We start with the relations (see Problem 12.5): –1
–1 Φ M+1 ( n )
=
ΦM ( n ) T
0M
u M+1 ( n ) =
0M 0
H 1 + ----------------c M ( n )c ( n ) M BM ( n )
uM ( n ) u ( n-M )
Hence, –1
k M+1 ( n ) = Φ M+1 ( n )u M+1 ( n )
297
=
kM ( n ) 0
bM ( n ) + ----------------c M ( n ) BM ( n )
(1)
Let kM+1,M+1(n) denote the last element of the gain vector kM+1(n). Then, recognizing that the last element of cM(n) is unity by definition, we immediately deduce from Eq. (1) that bM ( n ) k M+1,M+1 ( n ) = ---------------BM ( n ) Normalizing with respect to γ M+1 ( n ) , we write k M+1,M+1 ( n ) bM ( n ) k˜ M+1,M+1 ( n ) = -------------------------------- = -------------------------------------γ M+1 ( n ) γ M+1 ( n )B M ( n )
(2)
Next, we use the relation (see part (d) of the solution to Problem 12.6) B M ( n-1 ) γ M+1 ( n ) = λ ---------------------γ M ( n ) BM ( n )
(3)
But B M ( n ) = λB M ( n-1 ) + b *M ( n ) β M ( n ) or b *M ( n )β M ( n ) λB M ( n-1 ) ------------------------- = 1 – -------------------------------BM ( n ) BM ( n ) We may therefore rewrite Eq. (3) in the equivalent form: b *M ( n )β M ( n ) γ M ( n ) = 1 – -------------------------------- BM ( n )
–1
γ M+1 ( n )
The rescue variable is thus defined by γ M+1 ( n ) b *M ( n )β M ( n ) R = --------------------- = 1 – -------------------------------γ M (n) BM ( n )
(4)
298
Ideally, we have 0 < R < 1. Eliminating bM(n)/BM(n) between Eqs. (2) and (4), we may express the rescue variable in the equivalent form: R = 1 – γ M+1 ( n )k˜ M+1,M+1 ( n )β M ( n )
CHAPTER 14 14.1
In an adaptive equalizer, the input signal equals the channel output and the desired response equals the channel input (i.e., transmitted signal). In a stationary environment, both of these signals are stationary with the result that the error-performance surface is fixed in all respects. On the other hand, in a nonstationary environment, the channel output (i.e., equalizer input) is nonstationary with the result that both the correlation matrix R of the input vector and the cross-correlation vector p between the input vector and desired response take on time-varying forms. Consequently, the error-performance surface is continually changing its shape and is also in a constant state of motion.
14.2
In adaptive prediction applied to a nonstationary process, both the input vector (defined by a set of past values of the process) and the desired response (defined by the present value of the process) are nonstationary. Accordingly, in such a case the error-performance surface behaves in a manner similar to that described for adaptive equalization in Problem 14.1. Specifically, the error-performance surface constantly changes its shape and constantly moves. In contrast, the error-performance surface for the adaptive prediction of a stationary process is completely fixed.
14.3
We have, by definition, ˆ (n) – E[w ˆ (n)] ∈1( n ) = w ˆ ( n ) ] – wo ∈2( n ) = E [ w We may therefore expand H 1
H
ˆ ( n )- [ w ˆ (n)]) (E[w ˆ ( n ) ] – wo ) ] E [ ∈ ( n ) ∈2( n )] = E [ ( w H
H
ˆ (n)] – E[w ˆ ( n ) ]E [ w ˆ (n)] ˆ ( n )E [ w = E[w H
H
ˆ ( n )w o + E [ w ˆ ( n ) ]w o ] –w H
H
ˆ ( n )w o ] + E [ w ˆ ( n ) ]E [ w o ] = – E[w Invoking the assumption that w(n) and wo are statistically independent, we may go on to write H 1
H
H
ˆ ( n ) ]E [ w o ] ) + E [ w ˆ ( n ) ]E [ w o ] E [ ∈ ( n ) ∈2( n )] = ( – E [ w = 0
(1)
300
From this result we immediately deduce that we also have H 2
E [ ∈ ( n )∈1( n )] = 0
(2)
Finally, we note that E[ ∈ (n)
2
H
] = E [ ∈ (n) ∈ (n) ] H
= E[( ∈ 1 ( n )+ ∈ 2 ( n ) ) ( ∈ ( n )+ ∈ 2 ( n )] 1
= E [ ∈ 1( n )
2
H 1
]+E [ ∈ ( n ) ∈2( n ) ] H 2
+ E[ ∈ ( n ) ∈1( n )] = E[
∈1( n )
2
] + E[
∈2( n )
+ E[ 2
∈2( n )
2
]
]
where in the last line we have made use of Eqs. (1) and (2). 14.4
Invoking the low-pass filtering action of the LMS filter for small µ, we note that ∈1(n) and ∈2(n) are both independent of the input vector u(n). We may therefore write:
1.
H H H H E[ ∈ 1 ( n )u ( n )u ( n ) ∈1( n )] = tr E[ ∈ 1 ( n ) u ( n )u ( n ) ∈1( n )] H H = E tr[ ∈ 1( n )u ( n )u ( n ) ∈1( n )]
H H = E tr[u ( n )u ( n ) ∈1( n )∈ 1( n )]
H H = tr E[u ( n )u ( n ) ∈1( n ) ∈ 1( n ) ] H H = tr E [ u ( n )u ( n ) ]E[ ∈1( n ) ∈ 1( n ) ]
301
= tr [ RK 1 ( n ) ] 2. Similarly, we may show that H
H
H
E[ ∈ 2( n )u ( n )u ( n ) ∈ 2( n )]
3.
H
H
H
E[ ∈ 1( n )u ( n )u ( n ) ∈ 2( n )]
= tr [ RK 2 ( n ) ] H H = tr E[ ∈ 1( n )u ( n )u ( n )∈ 2( n )]
H H = tr E [ u ( n )u ( n ) ]E[ ∈ 2( n ) ∈ 1( n )] H = tr RE[ ∈ 2( n ) ∈ 1( n )]
(1)
Next we note that H
H
E[ ∈ 2( n ) ∈ 1( n )]
= 0
It follows therefore that H
H
H
E[ ∈ 1( n )u ( n )u ( n ) ∈ 2( n )]
= 0
Similarly, we may show that H
H
H
E[ ∈ 2( n )u ( n )u ( n ) ∈ 1( n )] 14.5
= 0
Evaluating the mean-square values of both sides of Eq. (14.27) yields 2
2
2
2
E [ v k ( n+1 ) ] = ( 1 – µλ k ) E [ v k ( n ) ] + E [ φ k ( n ) ] 2 2
2
2
= ( 1 – 2µλ k + µ λ k )E [ v k ( n ) ] + σ φ k
302
(1)
2 2
For small m, we may ignore the term µ λ k in comparison to the unity term, and so approximate Eq. (1) as 2
2
2
E [ v k ( n+1 ) ] ≈ ( 1 – 2µλ k )E [ v k ( n ) ] + σ φ k
(2)
Under steady-state conditions, v k ( n+1 ) → v k ( n ) as n → ∞ , in which case Eq. (2) reduces further to 2
2
2µλ k E [ v k ( n ) ] ≈ σ φ k 2 2
≈ µ σ ν λ u, k + λ ω, k 2
Solving for E [ v k ( n ) ] , we get 2
σν λ ω, k E [ v k ( n ) ] ≈ ------ µ + -------------2 2λ u, k 2
(3)
Hence, the mean-square deviation is (see Eq. (14.31)) 2
D ( n ) = E [ ε0 ( n ) ] M
=
∑ E [ vk ( n )
2
]
k=1
λ ω, k M 2 1 = ----- σ ν µ + --- ∑ ----------2 2 λ u, k M
k=1
–1 M 2 1 = ----- σ ν µ + --- tr [ R u R ω ] 2 2 H
where R u = E [ u ( n )u ( n ) ], H
R ω = E [ ω ( n )ω ( n ) ],
(4) u ( n ) = tap-input vector ω ( n ) = process noise
303
14.6
The misadjustment of the LMS algorithm is given by (see Eqs. (5.91) and (14.36) M
2 1 M ≈ ------ ∑ λ u, k E [ v k ( n ) ] 2 σ ν k=1
(1)
where M is the filter length. From the solution to Problem 14.5, we have 2 µ 2 1 λ ω, k E [ v k ( n ) ] ≈ --- σ ν + ------ ----------2µ λ 2
(2)
u, k
Substituting Eq. (2) into (1), we get M µ 2 1 λ ω, k 1 M ≈ ------ ∑ λ u, k --- σ ν + ------ ----------- 2 2µ λ u, k 2 σ ν k=1
µ = --2
M
M
1 ∑ λu, k + ------------2- ∑ λω, k 2µσ ν k=1 k=1
1 µ = --- + r [ R u ] + -------------tr [ R ω ] 2 2 2µσ
(3)
ν
which is the desired result. 14.7
To simplify the presentation, we use the following notations in the solution to this problem: R u = R and R w = Q (a) The minimum misadjustment for the LMS algorithm is LMS 1 M min = ------ tr [ R ]tr [ Q ] σν
(1)
The corresponding value for the RLS algorithm is RLS 1 M min = ------ Mtr [ RQ ] σν
(2)
304
For Q = c1R, Eqs. (1) and (2) yield the ratio: LMS
M min tr [ R ] ---------------- = ------------------------RLS 2 M min Mtr [ R ]
(3)
Now M
∑ λi qi qi
R =
H
i=1 M
2
R =
∑ λi qi qi 2
H
i=1
We may therefore write M
tr [ R ] =
∑ λi
(4)
i=1
2
tr [ R ] =
M
∑ λi
2
(5)
i=1
Let λ = [ λ 1, λ 2, …, λ M ] 1 = [ 1, 1, …, 1 ]
T
T
We may then reformulate Eqs. (4) and (5) as follows, respectively: T
tr [ R ] = λ 1 2
T
tr [ R ] = λ λ = λ
2
Applying the Cauchy-Schwarz inequality to the matrix product λT1:
305
T
2
λ 1 ≤ λ Since 1
2
2
⋅ 1
2
= M , it follows that
2
2
( tr [ R ] ) ≤ tr [ R ] ⋅ M or, equivalently, tr [ R ] -------------------------- ≤ 1 2 Mtr [ R ]
(6)
Accordingly, we may rewrite Eq. (3) as LMS
M min ---------------- ≤ 1 RLS M min or LMS
RLS
M min ≤ M min ,
Q = c1 R
(b) Consider next the minimum mean-square deviation as the criterion of interest. For the LMS algorithm, we have LMS
–1
D min = σ ν Mtr [ R Q ] and for the RLS algorithm: RLS
–1
D min = σ ν tr [ R ]tr [ Q ] Therefore, LMS
D min --------------- = RLS D min
–1
Mtr [ R Q ] --------------------------------–1 tr [ R ]tr [ Q ]
For Q = c2R-1,
306
LMS
–2 D min Mtr [ R ] --------------- = ---------------------------RLS –1 tr [ R ] D min
(7)
Since,
R
–1
M
=
∑ λi
–1
H
qi qi
i=1
and
R
–2
M
=
∑ λi
–2
H
qi qi ,
i=1
it follows that M
–1
tr [ R ] =
∑ λi
–1
i=1
and –2
M
tr [ R ] =
∑ λi
–2
i=1
Let –1
–1
–1 T
λ inv = [ λ 1 , λ 2 , …, λ M ] 1 = [ 1, 1, …, 1 ]
T
Hence, –1
T
–2
T
tr [ R ] = λ inv 1 tr [ R ] = λ inv λ inv = λ inv
2
307
T
Applying the Cauchy-Schwarz inequality to the matrix product λ inv 1 , we may write 2
T
λ inv 1 ≤ λ inv
2
1
2
That is, –1
2
–2
( tr [ R ] ) ≤ tr [ R ] ⋅ M or equivalently, –2
Mtr [ R ] ---------------------------- ≥ 1 –1 tr [ R ]
(8)
Accordingly, we may rewrite Eq. (7) as LMS
D min --------------- ≥ 1 RLS D min That is, LMS
RLS
D min ≥ D min , 14.8
Q = c2 R
–1
As with Problem 14.7, here again we simplify the presentation by using the notations R u = R and R ω = Q (a) LMS Algorithm For Q = c1R, or equivalently, R-1Q = c1I: (i)
LMS
–1
D min = σ ν M ( tr [ R Q ] ) = σ ν M ( tr [ c 1 I ] )
1⁄2
1⁄2
= σν M c1
(1)
308
1⁄2 –1 1 µ opt = ---------------- ( tr [ R Q ] ) σν M
=
(ii)
c1 ⁄ σν
(2)
LMS 1⁄2 1 M min = ------ ( tr [ R ]tr [ Q ] ) σν
c1 = ---------tr [ R ] σν
(3)
Given the two-by-two correlation matrix
R =
r 11
r 21
r 21
r 22
we may write tr [ R ] = r 11 + r 22
(4)
Therefore, substituting Eq. (4) into (3): c 1 ( r 11 + r 22 ) LMS M min = -----------------------------------σν
(5)
The optimum step-size parameter is c1 1 tr [ Q ] 1 ⁄ 2 µ opt = ------ ------------- = --------σ ν tr [ R ] σν
(6)
which is the same as the µopt for minimum DLMS. Consider next the case of Q = c2R-1: LMS
–1
(iii) D min = σ ν M ( tr [ R Q ] )
1⁄2
309
–2
= σ ν 2c 2 ( tr [ R ] )
1⁄2
(7)
R =
R
–1
r 11
r 21
r 21
r 22
1 r = ----- 22 ∆r –r 21
– r 21
2
∆ r = r 11 r 22 – r 21
,
r 11 2
R
–2
–1 –1
= R R
2
r 22 + r 21 1 = -----2 ∆r –r ( r + r ) 21 11 22
– r 21 ( r 11 + r 22 ) 2
2
r 11 + r 21
–2 2 2 1 2 tr [ R ] = ------ [ r 11 + 2r 21 + r 22 ] 2 ∆r
(8)
Substituting Eq. (8) into (7), and noting that M = 2: 2
LMS D min
2
2
r 11 + 2r 21 + r 22 = σ ν 2c 2 ------------------------------------------2 r 11 r 22 – r 21
(9)
The optimum step-size parameter is 1⁄2 –1 1 µ opt = ---------------- ( tr [ R Q ] ) σν M –2 1 ⁄ 2 1 c2 = ------ ----- ( tr [ R ] ) σν 2 2
2
2
1 c 2 r 11 + 2r 21 + r 22 = ------ ----- ------------------------------------------2 σν 2 r 11 r 22 – r 21 LMS 1⁄2 1 (iv) M min = ------ ( tr [ R ]tr [ Q ] ) σν
310
(10)
c2 –1 1 ⁄ 2 = --------- ( tr [ R ]tr [ R ] ) σν c2 r 11 + r 22 = --------- -----------------------------------------σν 2 1⁄2 ( r 11 r 22 – r 21 ) LMS
µ opt
(11)
1 tr [ Q ] 1 ⁄ 2 = ------ ------------- σ ν tr [ R ] c 2 tr [ R – 1 ] 1 ⁄ 2 = --------- ------------------- σ ν tr [ R ] c2 2 –1 ⁄ 2 = --------- ( r 11 r 22 – r 21 ) σν
(12)
which is different from the µopt for minimum DLMS. (b) RLS Algorithm For Q = c1R: (i)
RLS
–1
D min = σ ν ( tr [ R ]tr [ Q ] )
1⁄2
–1
= σ ν c 1 ( tr [ R ]tr [ R ] )
1⁄2
r 11 + r 22 = σ ν c 1 -----------------------------------------2 1⁄2 ( r 11 r 22 – r 21 )
(13)
1 tr [ Q ] 1 ⁄ 2 λ opt = 1 – ------ ------------------- σ ν tr [ R – 1 ] c1 2 = 1 – --------- r 11 r 22 – r 21 σν
311
(14)
RLS 1⁄2 1 (ii) M min = ------ ( Mtr [ RQ ] ) σν
c1 2 1⁄2 , = --------- ( 2tr [ R ] ) σν
M=2
2c 1 2 2 2 = ------------ r 11 + 2r 21 + r 22 σν 1⁄2 1 1 , λ opt = 1 – ------ ----- tr [ RQ ] σν M
(15)
M=2
2 2 1 c1 2 = 1 – ------ ----- r 11 + 2r 21 + r 22 σν 2
(16)
which is different from the λopt for minimum DRLS. Consider next the case of Q = c2R-1 or equivalently RQ = c2I: RLS
–1
(iii) D min = σ ν ( tr [ R ]tr [ Q ] )
1⁄2
–1
= σ ν c 2 ( tr [ R ] ) r 11 + r 22 = σ ν c 2 ----------------------------2 r 11 r 22 – r 21
(17)
1 tr [ Q ] 1 ⁄ 2 λ opt = 1 – ------ ------------------- σ ν tr [ R – 1 ] c2 2 = 1 – --------- r 11 r 22 – r 21 σν RLS 1⁄2 1 (iv) M min = ------ ( Mtr [ RQ ] ) , σν
(18)
M=2
312
2 = ------ c 2 σν
(19)
1 1 λ opt = 1 – ------ ----- tr [ c 2 I ] , σν M
M=2
c2 = 1 – --------σν
(20)
RLS
which is different from the λopt for minimum D min . Comparisons of LMS and RLS algorithms: 1.
Q = c1 R LMS
2
D min 2 r 11 r 22 – r 21 --------------- = ------------------------------------RLS r 11 + r 22 D min LMS
M min r 11 + r 22 ---------------- = --------------------------------------------------RLS 2 2 2 M min 2 r 11 + 2r 21 + r 22 2.
Q = c2 R LMS
D min --------------- = RLS D min
–1
2
2
2
r 11 + 2r 21 + r 22 2 ------------------------------------------r 11 + r 22
LMS
M min r 11 + r 22 ---------------- = ------------------------------------RLS 2 M min 2 r 11 r 22 – r 21
313
CHAPTER 15 15.1
For the output error method M
N
∑ ai ( n )u ( n-i ) + ∑ bi ( n )y ( n-i )
y(n) =
i=0
(1)
i=1
Taking the z-transform of both sizes of Eq. (1) Y ( z ) = A ( z )U ( z ) + B ( z )Y ( z )
(2)
For the equation error method y′ ( n ) =
=
=
M
N
i=0
i=1
M
N
i=0
i=1
M
N
∑ ai ( n )u ( n-i ) + ∑ bi ( n )d ( n-i ) ∑ ai ( n )u ( n-i ) + ∑ bi ( n ) [ e′ ( n-i ) + y′ ( n-i ) ] N
∑ ai ( n )u ( n-i ) + ∑ bi ( n )e′ ( n-i ) + ∑ bi ( n )y′ ( n-i ) i=0
i=1
(3)
i=1
Taking the z-transform of both sides of Eq. (3): Y′ ( z ) = A ( z )U ( z ) + B ( z ) E′ ( z ) + B ( z )Y′ ( z ) = A ( z )U ( z ) + B ( z ) ( 1 – B ( z ) )E ( z ) + B ( z )Y′ ( z ) ( 1 – B ( z ) )Y′ ( z ) = A ( z )U ( z ) + B ( z ) ( 1 – B ( z ) )E ( z ) 1 Y′ ( z ) = -------------------- A ( z )U ( z ) + B ( z )E ( z ) , 1 – B(z) 1 which explains why the transfer function -------------------- comes in. 1 – B(z)
314
(4)
15.2
The approximation used in Eqs. (15.18) and (15.19), reproduced here for convenience of presentation, ∂y ( n- j ) α j n ≈ α 0 ( n- j ) = ---------------------∂a 0 ( n- j )
j = 1, 2, …, M j = 2, …, N
β j ( n ) ≈ β 1 ( n- j+1 )
is based on the following observation: When the adaptive filtering algorithm reaches a convergent point, then the parameters of the filter could be held constant, at which point the two equations become exact. Hence, ∂y ( n ) ∂y ( n- j ) α j ( n ) = ---------------- ≈ ------------------- = α 0 ( n- j ) ∂a j ( n ) ∂a 0 ( n ) ∂y ( n ) ∂y ( n- j+1 ) β j ( n ) = ---------------- ≈ -------------------------- = β 1 ( n- j+1 ) ∂b 1 ( n ) ∂b j ( n ) 15.3
j = 1, 2, …, M
j = 2, …, N
To mitigate the stability problem in adaptive IIR filters, the following measures can be born in mind in formulating the algorithm: 1. Use of the “equation error method” to replace “output error method” for rough approximation. 2. Use of a lattice structure for the IIR filter1 3. Combine IIR and FIR structures to devise hybrid filter, e.g., Laguerre transversal filter.
15.4
LMS Algorithm for Laguerre transversal filter: Initialization: Initialize the weights w0,w1,...,wM by setting them equal to zero, or else by assigning them small randon values. Computation: –1
z –α L ( z, α ) = -------------------–1 1 – αz L i ( z , α ) = L o ( z, α ) ( L ( z , α ) ) U i ( z, α ) = L i ( z, α )U ( z )
1.
i
i = 0, 1, 2, … i = 0, 1, 2, …
Miao, K.X., et al., IEEE Trans. Signal Processing, 42, pp. 721-742, 1994.
315
M
y(n) =
∑ wi Z
–1
M
[ U i ( z, α ) ] =
i=0
∑ wi ui ( n, α ) i=0
e(n) = d (n) – y(n) Update: –1
w i ( n+1 ) = w i ( n ) + µ˜ z [ U i ( z, α ) ]e ( n ) = w i ( n ) + µ˜ u M ( n, α )e ( n ) 15.5
In order to prove the shift invariance property, we need to prove a Lemma first. Lemma: Let y1(n) and y2(n) be two random signals obtained by linear time-invariant filtering of the (wide-sense) stationary signal x(n), then E L [ y 1 ( n ) ] { L [ y 2 ( n ) ] } * = E y 1 ( n )y *2 ( n ) where L denotes the filtering operation. Proof: Let H1(z) and H2(z) be the filters used to get y1(n) and y2(n) from x(n). If y1(n) and y2(n) are jointly wide-sense stationary, and * 1 dz E y 1 ( n )y *2 ( n ) = -------- ∫ jω H 1 ( z )H 2 ( z )S x ( z ) ----° 2πj z z=e where Sx(ejω) is the power spectrum of input signal x(n), then we may write * E L [ y1 ( n ) ] { L [ y2 ( n ) ] } 2 * 1 dz = -------- ∫ jω L 1 ( z ) H 1 ( z )H ( z )S x ( z ) ----2 ° 2πj z=e z * 1 dz = -------- ∫ jω H 1 ( z )H ( z )S x ( z ) ----2 2πj°z=e z
316
2
for L 1 ( z ) =1
= E y 1 ( n )y *2 ( n ) *
With this Lemma at hand, now lexamine ξ j ( n ) and ξ k ( n ) and assume j > k without loss of generality. Applying the Lemma given above, we have * jω * jω jω 1 π E ξ j ( n )ξ k ( n ) = ------ ∫ L j ( e )L k ( e )S x ( e ) dω 2π – π j
–1 z–1 – a z – a = 1 , for z = 1 it follows that Since L j ( z ) = L 0 ( z ) -------------------- , and ------------------–1 1 – az – 1 1 – az
* E ξ j ( n )ξ k ( n ) – jω jω 2 e 1 π –a = ------ ∫ L 0 ( e ) ----------------------- 2π – π 1 – ae – j ω
j-k
Sx(e
jω
) dω
jω * jω jω 1 π = ------ ∫ L j – k ( e )L 0 ( e )S x ( e ) dω 2π – π
* = E ξ j – k ( n )ξ 0 ( n ) The above proof naturally holds for the real-valued situation as a special case: E { ξ j ( n )ξ k ( n ) } = E { ξ j – k ( n )ξ 0 ( n ) } 15.6
ξ ( n ) = [ ξ 0 ( n ), ξ 1 ( n ), …, ξ M ( n ) ]
T
From the solution to Problem 15.5, we know ξ j ( n ) satisfies the shift-invariant property E [ ξ j ( n )ξ k ( n ) ] = E [ ξ j – k ( n )ξ 0 ( n ) ] = C ξ ( j – k )
(1)
T
Now examine the correlation matrix R = E [ ξ ( n )ξ ( n ) ] which is a M-by-M matrix. For any element (j,k) of this matrix (1 < j, k < M) we know that Eq. (1) holds. For any (j+1, k+1) element, E [ ξ j+1 ( n )ξ k+1 ( n ) ] = E [ ξ j-k ( n )ξ 0 ( n ) ] = E [ ξ j ( n )ξ k ( n ) ] = Cξ( j – k ) For the (j+M, k+M) element we have E [ ξ j+M ( n )ξ k+M ( n ) ] = E [ ξ j ( n )ξ k ( n ) ] Thus the diagonal and sub-diagonal elements of R will be the same, hence R is a Toeplitz matrix. 15.7
15.7

(a) The derivation of the algorithm summarized in Table 15.3 is based on the Burg formula for the reflection coefficients of a lattice predictor for real-valued data [see Eq. (2.7)]:
κ̂_m(n) = -[2 Σ_{i=1}^{n} b_{m-1}(i-1) f_{m-1}(i)] / [Σ_{i=1}^{n} (f²_{m-1}(i) + b²_{m-1}(i-1))]
        = -Δ_m(n)/D_m(n)    (1)
where
Δ_m(n) = 2 Σ_{i=1}^{n} b_{m-1}(i-1) f_{m-1}(i)
D_m(n) = Σ_{i=1}^{n} (f²_{m-1}(i) + b²_{m-1}(i-1))
For recursive computation of the numerator Δ_m(n) and denominator D_m(n), we proceed by writing
Δ_m(n) = 2 Σ_{i=1}^{n-1} b_{m-1}(i-1) f_{m-1}(i) + 2 b_{m-1}(n-1) f_{m-1}(n)
       = Δ_m(n-1) + 2 b_{m-1}(n-1) f_{m-1}(n)
D_m(n) = Σ_{i=1}^{n-1} (f²_{m-1}(i) + b²_{m-1}(i-1)) + f²_{m-1}(n) + b²_{m-1}(n-1)
       = D_m(n-1) + f²_{m-1}(n) + b²_{m-1}(n-1)
To exercise designer control over these two recursive computations, we modify them as follows:
Δ_m(n) = λ Δ_m(n-1) + 2 b_{m-1}(n-1) f_{m-1}(n)    (2)
D_m(n) = λ D_m(n-1) + f²_{m-1}(n) + b²_{m-1}(n-1)    (3)
where λ is a design parameter, 0 < λ < 1. Finally, we apply Eq. (15.35) of the text in accordance with the Laguerre formulation of the lattice filter:
b̃_{m-1}(n) = b_{m-1}(n-1) + α(b̃_{m-1}(n-1) - b_{m-1}(n))    (4)
where α is another design parameter. Thus, using b̃_{m-1}(n) in place of b_{m-1}(n-1) in Eqs. (2) and (3), we get the entries in the first two lines of the time updates. All that remains is to use Eq. (1) to compute the reflection coefficient κ̂_m(n); hence the third line of the time updates.

(b) Compared to the conventional GAL algorithm, this lattice structure exhibits faster initial convergence and lower computational complexity.
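The time updates of part (a) can be sketched in code for a single lattice stage; f and b stand for the forward and backward prediction errors delivered by stage m-1, and the parameter values are placeholders. The sketch only illustrates Eqs. (1)-(4), not the full multistage algorithm of Table 15.3.

```python
import numpy as np

def laguerre_gal_stage(f, b, lam=0.99, alpha=0.5, eps=1e-8):
    """Recursive Burg estimate of one reflection coefficient (sketch).

    f, b  : forward and backward prediction errors of stage m-1
    lam   : exponential forgetting factor, 0 < lam < 1   (Eqs. (2), (3))
    alpha : Laguerre smoothing parameter                 (Eq. (4))
    """
    Delta, D = 0.0, eps
    b_smooth = 0.0          # b~_{m-1}(n), the smoothed delayed backward error
    b_prev = 0.0            # b_{m-1}(n-1)
    kappa = np.zeros(len(f))
    for n in range(len(f)):
        # Eq. (4): b~_{m-1}(n) = b_{m-1}(n-1) + alpha (b~_{m-1}(n-1) - b_{m-1}(n))
        b_smooth = b_prev + alpha * (b_smooth - b[n])
        # Eqs. (2), (3), with b~ used in place of b_{m-1}(n-1)
        Delta = lam * Delta + 2.0 * b_smooth * f[n]
        D = lam * D + f[n] ** 2 + b_smooth ** 2
        kappa[n] = -Delta / D            # Eq. (1)
        b_prev = b[n]
    return kappa
```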
CHAPTER 16

16.1
(a) The received signal of a digital communication system in baseband form is given by
u(t) = Σ_{m=-∞}^{∞} x_m h(t - mT) + v(t)
where x_m is the transmitted symbol, h(t) is the overall impulse response, T is the symbol period, and v(t) is the channel noise. Evaluating u(t) at times t_1 and t_2:
u(t_1) = Σ_{m=-∞}^{∞} x_m h(t_1 - mT) + v(t_1)
u(t_2) = Σ_{l=-∞}^{∞} x_l h(t_2 - lT) + v(t_2)
Hence, the autocorrelation function of u(t) is
r_u(t_1, t_2) = E[u(t_1) u*(t_2)]
= E[Σ_{m=-∞}^{∞} Σ_{l=-∞}^{∞} x_m x_l* h(t_1 - mT) h*(t_2 - lT)] + E[v(t_1) v*(t_2)]
= Σ_{m=-∞}^{∞} Σ_{l=-∞}^{∞} r_x(mT - lT) h(t_1 - mT) h*(t_2 - lT) + σ_v² δ(t_1 - t_2)    (1)
where r_x(mT - lT) is the autocorrelation function of the transmitted symbol sequence. From Eq. (1) we immediately see that
r_u(t_1 + T, t_2 + T) = r_u(t_1, t_2)
which shows that u(t) is indeed cyclostationary in the wide sense.
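The period-T invariance of r_u(t_1, t_2) can be checked by Monte Carlo simulation of an oversampled baseband PAM signal. The sketch below uses a rectangular transmit pulse and BPSK symbols purely for illustration; any other pulse and symbol alphabet would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(1)
P = 8                                   # samples per symbol period T
Nsym, trials = 16, 50_000
k1, k2 = 30, 35                         # two sample instants t1, t2
acc = np.zeros(3)
for _ in range(trials):
    x = rng.choice([-1.0, 1.0], size=Nsym)      # i.i.d. BPSK symbols
    u = np.repeat(x, P)                         # rectangular pulse (illustrative)
    u = u + 0.1 * rng.standard_normal(len(u))   # white channel noise
    acc += [u[k1] * u[k2],                      # r_u(t1, t2)
            u[k1 + P] * u[k2 + P],              # r_u(t1 + T, t2 + T)
            u[k1 + P // 2] * u[k2 + P // 2]]    # r_u(t1 + T/2, t2 + T/2)
acc /= trials
print(acc)   # the first two estimates agree; the third differs, as predicted
```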
(b) Applying the definitions
r_u^α(τ) = (1/T) ∫_{-T/2}^{T/2} r_u(t + τ/2, t - τ/2) e^{j2παt} dt
S_u^α(ω) = ∫_{-∞}^{∞} r_u^α(τ) e^{-j2πfτ} dτ,   ω = 2πf
α = k/T,   k = 0, ±1, ±2, …
to the result obtained in part (a), we may show that
S_u^{k/T}(ω) = (1/T) H(e^{j(ω + kπ/T)}) H*(e^{j(ω - kπ/T)}) S_x(ω + kπ/T) + σ_v² δ(k),   k = 0, ±1, ±2, …    (2)
As a check, we see that for k = 0, Eq. (2) reduces to the standard result:
S_u(ω) = (1/T) |H(e^{jω})|² S_x(ω) + σ_v²
Let Ψ_k(ω) denote the phase response of S_u^{k/T}(ω) and Φ(ω) denote the phase response of the channel. Then, recognizing that the power spectral density S_x(ω) of the transmitted signal is real valued, we readily find from Eq. (2) that
Ψ_k(ω) = Φ(ω + kπ/T) - Φ(ω - kπ/T),   k = 0, ±1, ±2, …    (3)
(c) From the formula for the inverse Fourier transform, we have
ψ_k(τ) = (1/2π) ∫_{-∞}^{∞} Ψ_k(ω) e^{jωτ} dω
φ(τ) = (1/2π) ∫_{-∞}^{∞} Φ(ω) e^{jωτ} dω
Applying these definitions to Eq. (3):
ψ_k(τ) = φ(τ) e^{-jπkτ/T} - φ(τ) e^{jπkτ/T}
       = -2jφ(τ) sin(kπτ/T),   k = 0, ±1, ±2, …    (4)
On the basis of Eq. (4), we may make two important observations:
(1) For k = 0, or for kτ/T equal to an integer, ψ_k(τ) is identically zero, and φ(τ) cannot be determined. This means that, for an arbitrary channel, the unknown phase response Φ(ω) cannot be identified from cyclostationary second-order statistics of the channel output at k = 0 or at kτ/T = integer.
(2) For values of k equal to ±2 and higher in absolute value, the use of ψ_k(τ) does not reveal any more information about the channel phase response than can be obtained with k = ±1. We may therefore just as well work with k = 1, for which ψ_k(τ) has the largest support, as shown by
ψ_1(τ) = -2jφ(τ) sin(πτ/T)
That is,
φ(τ) = jψ_1(τ)/(2 sin(πτ/T))
which shows that φ(τ) is identifiable from ψ_1(τ) except for τ = mT, where m is an integer.
16.2

In the noise-free case, we have
u_n = H x_n    (1)
where H is the LN-by-(M+N) multichannel filtering matrix, x_n is the (M+N)-by-1 transmitted signal vector, and u_n is the LN-by-1 multichannel received signal vector. Let u_n be applied to a multichannel structure characterized by the (M+N)-by-LN filtering matrix T such that we have
T u_n = x_n    (2)
This zero-forcing condition ensures the perfect recovery of x_n from u_n. Substituting Eq. (1) into (2):
T H = I    (3)
where I is the identity matrix. From Eq. (3) we immediately deduce that
T = H⁺
where H⁺ is the pseudoinverse of H.
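Numerically, the zero-forcing structure T is obtained with a pseudoinverse routine. The sketch below builds a hypothetical two-channel filtering matrix H of size LN-by-(M+N) and verifies that T = H⁺ satisfies TH = I; all sizes and channel coefficients are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, N = 2, 3, 6                      # channels, channel order, equalizer window
h = rng.standard_normal((L, M + 1))    # hypothetical subchannel impulse responses

# Build the LN-by-(M+N) multichannel filtering matrix H (block Toeplitz)
H = np.zeros((L * N, M + N))
for l in range(L):
    for r in range(N):
        H[l * N + r, r:r + M + 1] = h[l]

T = np.linalg.pinv(H)                       # zero-forcing equalizer bank, T = H^+
print(np.allclose(T @ H, np.eye(M + N)))    # True when H has full column rank
```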
16.3

The relationship between g_k and H on the one hand, and h and G_k on the other, as described in Eq. (16.27), namely
g_k^H H H^H g_k = h^H G_k G_k^H h
follows directly from the two sets of definitions
H = [ H^(0)
      H^(1)
      ⋮
      H^(L-1) ]
where each H^(l) is the N-by-(M+N) Toeplitz filtering matrix
H^(l) = [ h_0^(l)  …  h_M^(l)   0    …    0
            0   h_0^(l)  …  h_M^(l)  …   0
            ⋮                             ⋮
            0    …    0   h_0^(l)  …  h_M^(l) ]
and
g_k = [ g_k^(0)
        g_k^(1)
        ⋮
        g_k^(L-1) ] ,    G_k = [ G_k^(0)
                                 G_k^(1)
                                 ⋮
                                 G_k^(L-1) ] ,    h = [ h^(0)
                                                        h^(1)
                                                        ⋮
                                                        h^(L-1) ]
where each G_k^(l) is the (M+1)-by-(M+N) Toeplitz filtering matrix
G_k^(l) = [ g_{k,0}^(l)  …  g_{k,N-1}^(l)   0    …    0
              0   g_{k,0}^(l)  …  g_{k,N-1}^(l)  …   0
              ⋮                                       ⋮
              0    …    0   g_{k,0}^(l)  …  g_{k,N-1}^(l) ]
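The identity rests on the commutativity of convolution: with the Toeplitz structures above, g_k^H H and h^H G_k are two ways of writing the same convolution of the equalizer and channel coefficients. A small real-valued numerical check, with the channel index l dropped and all sizes chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 4, 6
h = rng.standard_normal(M + 1)         # channel taps h_0 ... h_M
g = rng.standard_normal(N)             # equalizer taps g_{k,0} ... g_{k,N-1}

H = np.zeros((N, M + N))               # N-by-(M+N) filtering matrix built from h
for r in range(N):
    H[r, r:r + M + 1] = h
G = np.zeros((M + 1, M + N))           # (M+1)-by-(M+N) filtering matrix built from g
for r in range(M + 1):
    G[r, r:r + N] = g

print(np.allclose(g @ H, h @ G))                       # both equal the convolution g * h
print(np.isclose(g @ H @ H.T @ g, h @ G @ G.T @ h))    # hence g^T H H^T g = h^T G G^T h
```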
16.4

For a noiseless channel, the received signal is
u_n^(l) = Σ_{m=0}^{M} h_m^(l) x_{n-m},   l = 0, 1, …, L-1    (1)
By definition, we have
H^(l)(z) = Σ_{m=0}^{M} h_m^(l) z^{-m}
where z^{-1} is the unit-delay operator. We may therefore rewrite Eq. (1) in the equivalent form
u_n^(l) = H^(l)(z)[x_n]    (2)
where H^(l)(z) acts as an operator. Multiplying Eq. (2) by G^(l)(z) and then summing over l:
Σ_{l=0}^{L-1} G^(l)(z)[u_n^(l)] = Σ_{l=0}^{L-1} G^(l)(z) H^(l)(z)[x_n]    (3)
According to the generalized Bezout identity,
Σ_{l=0}^{L-1} G^(l)(z) H^(l)(z) = 1
We may therefore simplify Eq. (3) to
Σ_{l=0}^{L-1} G^(l)(z)[u_n^(l)] = x_n
Let
y^(l)(n) = G^(l)(z)[u_n^(l)],   l = 0, 1, …, L-1
and let G^(l)(z) be written in the expanded form
G^(l)(z) = Σ_{i=0}^{M} g_i^(l) z^{-i}
Hence,
y^(l)(n) = Σ_{i=0}^{M} g_i^(l) z^{-i}[u_n^(l)] = Σ_{i=0}^{M} g_i^(l) u_{n-i}^(l)
From linear prediction theory, we recognize that
û_{n+1}^(l) = Σ_{i=0}^{M} g_i^(l) u_{n-i}^(l)
It follows therefore that
y^(l)(n) = z^{-1}[û_{n+1}^(l)]
and so we may write
x_n = Σ_{l=0}^{L-1} z^{-1}[û_{n+1}^(l)]
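When the subchannels H^(l)(z) have no common zeros, coefficients G^(l)(z) satisfying the generalized Bezout identity can be computed by solving a Sylvester-type linear system. A minimal two-channel sketch, with illustrative orders:

```python
import numpy as np

rng = np.random.default_rng(4)
L, M = 2, 3                                   # two subchannels of order M
Ng = M                                        # order of each G^(l)(z); Ng >= M suffices generically
h = rng.standard_normal((L, M + 1))           # subchannel coefficients

# Stack convolution matrices so that sum_l conv(g_l, h_l) = [1, 0, ..., 0]
cols = []
for l in range(L):
    C = np.zeros((M + Ng + 1, Ng + 1))
    for c in range(Ng + 1):
        C[c:c + M + 1, c] = h[l]
    cols.append(C)
A = np.hstack(cols)                           # (M+Ng+1) x L(Ng+1) Sylvester-type matrix
target = np.zeros(M + Ng + 1)
target[0] = 1.0

g, *_ = np.linalg.lstsq(A, target, rcond=None)
g = g.reshape(L, Ng + 1)                      # coefficients of G^(0)(z), G^(1)(z)

check = sum(np.convolve(g[l], h[l]) for l in range(L))
print(np.round(check, 6))                     # approximately [1, 0, ..., 0]
```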
16.5
We are given
x̂ = (1/f_Y(y)) ∫_{-∞}^{∞} x f_V(y - c_0 x) f_X(x) dx
where
f_X(x) = 1/(2√3)  for -√3 ≤ x < √3,  and 0 otherwise
f_V(v) = (1/(√(2π)σ)) e^{-v²/2σ²}
and
f_Y(y) = ∫_{-∞}^{∞} f_X(x) f_V(y - c_0 x) dx
Hence,
x̂ = [∫_{-√3}^{√3} x e^{-(y - c_0 x)²/2σ²} dx] / [∫_{-√3}^{√3} e^{-(y - c_0 x)²/2σ²} dx]    (1)
Let
t = (y - c_0 x)/σ,   dt = -c_0 dx/σ
Then we may rewrite Eq. (1) as
x̂ = [∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} (σ/c_0²)(y - tσ) e^{-t²/2} dt] / [∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} (σ/c_0) e^{-t²/2} dt]
  = (1/c_0) y - (σ/c_0) [∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} t e^{-t²/2} dt] / [∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} e^{-t²/2} dt]    (2)
Next we recognize the following two results:
∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} t e^{-t²/2} dt = [-e^{-t²/2}] evaluated from (y-√3c_0)/σ to (y+√3c_0)/σ
  = √(2π) [Z((y - √3c_0)/σ) - Z((y + √3c_0)/σ)]
∫_{(y-√3c_0)/σ}^{(y+√3c_0)/σ} e^{-t²/2} dt = ∫_{(y-√3c_0)/σ}^{∞} e^{-t²/2} dt - ∫_{(y+√3c_0)/σ}^{∞} e^{-t²/2} dt
  = √(2π) [Q((y - √3c_0)/σ) - Q((y + √3c_0)/σ)]
where Z(·) denotes the standard Gaussian density and Q(·) the Gaussian tail probability. We may therefore rewrite Eq. (2) in the compact form
x̂ = (1/c_0) [ y + σ (Z((y + √3c_0)/σ) - Z((y - √3c_0)/σ)) / (Q((y - √3c_0)/σ) - Q((y + √3c_0)/σ)) ]

16.6

For convergence of a Bussgang algorithm,
E[y(n) y(n+k)] = E[y(n) g(y(n+k))]
For large n,
E[y²(n)] = E[y(n) g(y(n))]
For y(n) = x(n) to achieve perfect equalization,
E[x̂²] = E[x̂ g(x̂)]
With x̂ being of zero mean and unit variance, we thus have
E[x̂ g(x̂)] = 1

16.7

We start with (for real-valued data)
y(n) = Σ_i ŵ_i(n) u(n - i) = ŵ^T(n) u(n) = u^T(n) ŵ(n)
Let
y = [y(n_1), y(n_2), …, y(n_K)]^T
U = [ u^T(n_1)
      u^T(n_2)
      ⋮
      u^T(n_K) ]
Assuming that ŵ(n) has a constant value ŵ, averaged over the whole block of data, we then have
y = U ŵ
The solution for ŵ is
ŵ = U⁺ y
where U⁺ is the pseudoinverse of the matrix U.
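The closed-form conditional-mean estimator derived in Problem 16.5 is easy to evaluate and to check against brute-force numerical integration of the defining integral. In the sketch below, Z(·) is the standard Gaussian density, Q(·) the Gaussian tail probability, and the values of c_0 and σ are placeholders.

```python
import numpy as np
from math import erfc, exp, pi, sqrt

def Z(u):                      # standard Gaussian density
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def Q(u):                      # Gaussian tail probability
    return 0.5 * erfc(u / sqrt(2.0))

def x_hat(y, c0=1.0, sigma=0.3):
    """Conditional-mean (Bussgang) nonlinearity for a uniform source on [-sqrt(3), sqrt(3)]."""
    a = (y - sqrt(3.0) * c0) / sigma
    b = (y + sqrt(3.0) * c0) / sigma
    return (y + sigma * (Z(b) - Z(a)) / (Q(a) - Q(b))) / c0

def x_hat_numeric(y, c0=1.0, sigma=0.3, n=20001):
    """Brute-force evaluation of the same conditional mean by numerical integration."""
    x = np.linspace(-sqrt(3.0), sqrt(3.0), n)
    w = np.exp(-0.5 * ((y - c0 * x) / sigma) ** 2)
    return np.trapz(x * w, x) / np.trapz(w, x)

for y in (-1.5, 0.0, 0.7, 2.0):
    print(y, round(x_hat(y), 6), round(x_hat_numeric(y), 6))   # the two columns agree
```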
16.8
(a) For the binary PSK system, a plot of the error signal versus the equalizer output has the form shown by the continuous curve in Fig. 1:
Fig. 1: Error signal e(n) versus equalizer output y(n) for binary PSK. The continuous curve is the CMA error characteristic; the dashed curve is its signed-error (SE-CMA) counterpart.
(b) The corresponding plot for the signed-error (SE) version of the CMA is shown by the dashed line in Fig. 1.

(c) The CMA is a stochastic algorithm that minimizes the Godard criterion
J_cm = (1/4) E[(|y_n|² - R_2)²]
where the positive constant R_2 is the dispersion constant, which is chosen in accordance with the source statistics. For a fractionally spaced equalizer (FSE), the update algorithm is described by
w(n+1) = w(n) + μ u(n) ψ(y(n)),   ψ(y(n)) = y*(n)(γ² - |y(n)|²),   γ² = 1 + R_2
where μ is a small positive step size and ψ(·) is called the CMA error function. The signed-error (SE) CMA algorithm is described by
w(n+1) = w(n) + μ u(n) sgn[ψ(y(n))]
where sgn(·) denotes the signum function; specifically,
sgn(x) = 1 for x > 0,  -1 for x < 0
The SE-CMA is computationally more efficient than the CMA.
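For real-valued (binary PSK) data the CMA error function of part (c) reduces to ψ(y) = y(γ² - y²), and its signed version to sgn[ψ(y)]. The short sketch below tabulates both over a grid of equalizer outputs, mirroring the two curves of Fig. 1; R_2 = 1 assumes unit-amplitude binary symbols.

```python
import numpy as np

R2 = 1.0                       # dispersion constant for unit-amplitude binary symbols (assumed)
gamma2 = 1.0 + R2              # gamma^2 = 1 + R_2, as defined in this solution
y = np.linspace(-2.0, 2.0, 9)

psi = y * (gamma2 - y**2)      # CMA error function (continuous curve in Fig. 1)
psi_se = np.sign(psi)          # signed-error CMA (dashed curve in Fig. 1)
for yi, a, b in zip(y, psi, psi_se):
    print(f"y = {yi:5.2f}   psi = {a:7.3f}   sgn(psi) = {b:4.0f}")
```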
16.9

The update rule for the constant modulus algorithm (CMA) is described by
ŵ(n+1) = ŵ(n) + μ u(n) e*(n)
where
e(n) = y(n)(γ² - |y(n)|²),   γ² = 1 + R_2
In quadriphase-shift keying (QPSK), the output signal y(n) is complex valued, as shown by
y(n) = y_I(n) + j y_Q(n)
where y_I(n) is the in-phase component and y_Q(n) is the quadrature component. Hence,
e_I(n) = y_I(n)(R_2 - y_I²(n) - y_Q²(n))
e_Q(n) = y_Q(n)(R_2 - y_I²(n) - y_Q²(n))
For the signed CMA, we thus have
ŵ(n+1) = ŵ(n) + μ u(n) sgn[e(n)]
        = ŵ(n) + μ u(n) sgn[e_I(n) + j e_Q(n)]
The weights are complex valued. Hence, following the analysis presented in Section 5.3 of the text, we may write
ŵ_I(n+1) = ŵ_I(n) + μ (u_I(n) sgn[e_I(n)] + u_Q(n) sgn[e_Q(n)])    (1)
ŵ_Q(n+1) = ŵ_Q(n) + μ (u_Q(n) sgn[e_I(n)] - u_I(n) sgn[e_Q(n)])    (2)
where
ŵ(n) = ŵ_I(n) + j ŵ_Q(n)
u(n) = u_I(n) + j u_Q(n)
The standard version of the complex CMA is as follows:
ŵ_I(n+1) = ŵ_I(n) + μ (u_I(n) e_I(n) + u_Q(n) e_Q(n))    (3)
ŵ_Q(n+1) = ŵ_Q(n) + μ (u_Q(n) e_I(n) - u_I(n) e_Q(n))    (4)
Both versions of the CMA, the signed version of Eqs. (1) and (2) and the standard version of Eqs. (3) and (4), can now be treated in the same way as the real-valued CMA in Problem 16.8.
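Both update rules can be written compactly in complex arithmetic, which is equivalent to the component-wise Eqs. (1)-(4). The sketch below assumes the filtering convention y(n) = ŵ^H(n)u(n); the data and parameter values are placeholders.

```python
import numpy as np

def csgn(z):
    """Component-wise signum: sgn(Re z) + j sgn(Im z)."""
    return np.sign(z.real) + 1j * np.sign(z.imag)

def cma_step(w, u, mu, gamma2):
    """Standard complex CMA update, Eqs. (3)-(4) in vector form (sketch)."""
    y = np.vdot(w, u)                       # y(n) = w^H(n) u(n), assumed convention
    e = y * (gamma2 - abs(y) ** 2)          # e(n) = y(n)(gamma^2 - |y(n)|^2)
    return w + mu * u * np.conj(e)          # w(n+1) = w(n) + mu u(n) e*(n)

def signed_cma_step(w, u, mu, gamma2):
    """Signed complex CMA update, Eqs. (1)-(2) in vector form (sketch)."""
    y = np.vdot(w, u)
    e = y * (gamma2 - abs(y) ** 2)
    return w + mu * u * np.conj(csgn(e))    # signum applied to each component of e(n)

# toy usage with placeholder data
rng = np.random.default_rng(5)
w = np.zeros(11, dtype=complex)
w[5] = 1.0                                  # center-spike initialization
u = rng.standard_normal(11) + 1j * rng.standard_normal(11)
w = cma_step(w, u, mu=1e-3, gamma2=2.0)
w = signed_cma_step(w, u, mu=1e-3, gamma2=2.0)
```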
16.10

Dithered signed-error version of the CMA, hereafter referred to as DSE-CMA:

(a) According to quantization theory, the operator α sgn(v(n)) has an effect equivalent to that of the two-level quantizer
Q(v(n)) = Δ/2 for v(n) ≥ 0,  -Δ/2 for v(n) < 0
where
v(n) = e(n) + αε(n)
Furthermore, since the samples of the dither ε(n) are i.i.d. over the interval [-1, 1], {αε(n)} satisfies the requirement for a valid dither process if the constant α satisfies the condition
α ≥ |e(n)|
The equivalent model is illustrated in Fig. 1.

Fig. 1: Equivalent model of the dithered sign operation: the error e(n) plus the scaled dither αε(n) drives a hard limiter (two-level quantizer).

Hence, we may rewrite the DSE-CMA update formula as
w(n+1) = w(n) + μ u(n)(e(n) + ε(n))    (1)
Also, since ε(n) is an uncorrelated random process, its first moment satisfies
E[ε(n) | e(n)] = E[ε(n)] = 0    (2)
Taking the expectation of the DSE-CMA error function, we find that it is a hard-limited version of e(n), as shown by
E[v(n) | y(n)] = α for e(n) > α,  e(n) for |e(n)| ≤ α,  -α for e(n) < -α
which follows from Eqs. (1) and (2).
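The hard-limiting form of this expectation can be confirmed by direct Monte Carlo averaging of α sgn(e + αε) with ε uniform on [-1, 1]; the value of α and the grid of error values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, trials = 0.5, 200_000
eps = rng.uniform(-1.0, 1.0, trials)            # dither samples, i.i.d. on [-1, 1]

for e in (-1.0, -0.5, -0.3, 0.0, 0.2, 0.5, 1.0):
    avg = np.mean(alpha * np.sign(e + alpha * eps))   # E[alpha sgn(e + alpha eps)]
    clipped = np.clip(e, -alpha, alpha)               # hard-limited e, the predicted mean
    print(f"e = {e:5.2f}   average = {avg:6.3f}   clip = {clipped:6.3f}")
```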
16.11

(a) The Shalvi-Weinstein equalizer is based on the cost function
J(n) = E[|y(n)|⁴]   subject to the constraint   E[|y(n)|²] = σ_x²
where y(n) is the equalizer output and σ_x² is the variance of the original data sequence applied to the channel input. Applying the method of Lagrange multipliers, we may define a cost function for the Shalvi-Weinstein equalizer that incorporates the constraint as follows:
J_SW(n) = E[|y(n)|⁴] + λ(E[|y(n)|²] - σ_x²)    (1)
where λ is the Lagrange multiplier. From Eq. (16.105) of the text, we find that the cost function for the Godard algorithm takes the following form for p = 2:
J_G = E[(|y(n)|² - R_2)²]
    = E[|y(n)|⁴] - 2R_2 E[|y(n)|²] + R_2²    (2)
where R_2 is a positive real constant. Comparing Eqs. (1) and (2), we see that these two cost functions have the same mathematical form. Hence, we may infer that the two equalization algorithms share the same optimization criterion.

(b) For a more detailed discussion of the equivalence between the Godard and Shalvi-Weinstein algorithms, we may proceed as follows. First, rewrite the tap-weight vector of the equalizer in polar form (i.e., a unit-norm vector times a radial scale factor), and then optimize the Godard cost function with respect to the radial factor. The "reduced" cost function that results from this transformation is then recognized as a monotonic transformation of the corresponding Shalvi-Weinstein cost function. Since the transformation relating the two criteria is monotonic, their stationary points and local/global minima coincide to within a radial factor. By taking this approach, the equivalence between the Godard and Shalvi-Weinstein equalizers is seen to hold under general conditions (linear or nonlinear channels, i.i.d. or correlated data sequences applied to the channel input, Gaussian or non-Gaussian channel noise, etc.).¹
1.
For further details on the issues raised herein, see P.A. Regalia, “On the equivalence between the Godard and Shalvi-Weinstein schemes of blind equalization”, Signal Processing, vol. 73, pp.185-190, 1999.
16.12

For the derivation of Eq. (16.116), see the Appendix of the paper by Johnson et al.² Note, however, that the CMA cost function for binary PSK given in Eq. (63) of that paper is four times that of Eq. (16.116).
2.
C.R. Johnson, et al., “Blind equalization using the constant modulus criterion: A review”, Proc. IEEE, vol. 86, pp. 1927-1950, October 1998.
CHAPTER 17

17.1
(a) The complementary error function
φ(x) = (1/√(2π)) ∫_{-∞}^{x} e^{-t²/2} dt
qualifies as a sigmoid function for two reasons:
1. The function is a monotonically increasing function of x, with
φ(-∞) = 0,  φ(0) = 0.5,  φ(∞) = 1
For x = ∞, φ equals the total area under the probability density function of a Gaussian random variable with zero mean and unit variance; this area is unity by definition.
2. The function φ(x) is continuously differentiable:
dφ/dx = (1/√(2π)) e^{-x²/2}
(b) The inverse tangent function
φ(x) = (2/π) tan^{-1}(x)
also qualifies as a sigmoid function for two reasons:
1. φ(-∞) = -1,  φ(0) = 0,  φ(∞) = +1
2. φ(x) is continuously differentiable:
dφ/dx = (2/π) · 1/(1 + x²)
The complementary error function and the inverse tangent function differ from each other in the following respects:
• The complementary error function is unipolar (nonsymmetric).
• The inverse tangent function is bipolar (antisymmetric).

17.2
The incorporation of momentum modifies the update rule for the synaptic weight w_kj as follows:
Δw_kj(n) = α Δw_kj(n-1) - η ∂E(n)/∂w_kj    (1)
where
α = momentum constant
η = learning-rate parameter
E(n) = cost function to be minimized
n = iteration number
Equation (1) represents a first-order difference equation in Δw_kj(n). Solving it for Δw_kj(n), we get
Δw_kj(n) = -η Σ_{t=0}^{n} α^{n-t} ∂E(t)/∂w_kj    (2)
For -1 < α < 0, we may rewrite Eq. (2) as
Δw_kj(n) = -η Σ_{t=0}^{n} (-1)^{n-t} |α|^{n-t} ∂E(t)/∂w_kj
Thus, the use of -1 < α < 0 in place of the commonly used range 0 < α < 1 merely introduces the multiplying factor (-1)^{n-t}, which (for a specified n) alternates in algebraic sign as t increases.
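The effect of the momentum term is easy to observe in code. The sketch below applies update rule (1) to a hypothetical quadratic cost E(w) = ||w||²; both a positive and a negative momentum constant are tried.

```python
import numpy as np

def gd_momentum(grad, w0, eta=0.1, alpha=0.9, steps=200):
    """Gradient descent with momentum: dw(n) = alpha dw(n-1) - eta dE/dw  (Eq. (1))."""
    w = np.array(w0, dtype=float)
    dw = np.zeros_like(w)
    for _ in range(steps):
        dw = alpha * dw - eta * grad(w)
        w = w + dw
    return w

grad = lambda w: 2.0 * w                            # gradient of the toy cost E(w) = ||w||^2
print(gd_momentum(grad, [1.0, -2.0], alpha=0.9))    # converges toward the minimum at the origin
print(gd_momentum(grad, [1.0, -2.0], alpha=-0.5))   # -1 < alpha < 0: sign-alternating correction terms
```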
17.3

Consider the real-valued version of the back-propagation algorithm summarized in Table 17.2 of the text. In the backward pass, starting from the output layer, the error signal becomes progressively smaller the farther a layer is from the output. This suggests that the learning rates used to adjust the weights in the multilayer perceptron should be increased in the earlier layers to make up for the decrease in the error signal as we move away from the output layer. In so doing, the rate at which learning takes place in the different layers of the network is equalized, which is highly desirable.
17.4
We are given the time series
u(n) = Σ_{i=1}^{3} a_i v(n-i) + Σ_{i=1}^{2} Σ_{j=1}^{2} a_{ij} v(n-i) v(n-j)
We may implement it with the structure described next: a tapped-delay line of three unit delays z^{-1} produces v(n-1), v(n-2), v(n-3); the linear branch weights these taps by a_1, a_2, a_3; the quadratic branch forms the products v(n-1)v(n-1), v(n-1)v(n-2), and v(n-2)v(n-2) weighted by a_11, a_12 + a_21, and a_22, respectively; and all branch outputs are summed to produce u(n).
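The given time series is a second-order Volterra-type model and can be synthesized directly from its definition; the coefficient values in the sketch below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(7)
a = {1: 0.5, 2: -0.3, 3: 0.1}                                    # linear coefficients a_i (illustrative)
aij = {(1, 1): 0.2, (1, 2): 0.05, (2, 1): 0.05, (2, 2): -0.1}    # quadratic coefficients a_ij

v = rng.standard_normal(1000)          # driving sequence v(n)
u = np.zeros_like(v)
for n in range(3, len(v)):
    u[n] = sum(a[i] * v[n - i] for i in (1, 2, 3)) \
         + sum(aij[i, j] * v[n - i] * v[n - j] for i in (1, 2) for j in (1, 2))
print(u[:5])
```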
17.5

The minimum description length (MDL) criterion strives to optimize the model order. In particular, it provides the best compromise between its two components: a likelihood function and a penalty function. Realizing that model order has a direct bearing on model complexity, it may therefore be argued that the MDL criterion tries to match the complexity of a model to the underlying complexity of the input data. The risk R of Eq. (17.63) also has two components: one determined by the training data and the other determined by the network complexity. Loosely speaking, the roles of these two components are comparable to those of the likelihood function and the penalty function in the MDL criterion, respectively. Increasing the first component of the risk R at the expense of the second component implies that the training data are highly reliable, whereas increasing the second component of R at the expense of the first component implies that the training data are of poor quality.
17.6
(a) The Laguerre-based version of the MLP has the following structure: the input signal u(n) drives a cascade of Laguerre sections (a leading section L_0(z, α) followed by identical sections L(z, α)), producing the tap signals u_i(n, α), i = 0, 1, …, N-1. These tap signals feed the hidden layer through the input-to-hidden weights w_{i,j}, j = 0, 1, …, M-1; the hidden-unit outputs are then combined through the output weights v_1, …, v_M to form the network output y(n), which is compared with the desired output d(n).
(b) A new BP algorithm can be devised for the Laguerre-based MLP in a manner similar to the LMS algorithm formulated for the Laguerre filter in Problem 15.4. The only difference between the new BP algorithm and the conventional BP algorithm lies in the adjustment of the input-to-hidden layer weights and in the calculation of each hidden-unit output. Recalling the solution to Problem 15.4, we have the following.

For the hidden-unit outputs:
h_j = Σ_{i=0}^{N-1} w_{i,j} u_i(n, α) = Σ_{i=0}^{N-1} w_{i,j} Z^{-1}[U_i(z, α)]
φ(h_j) = tanh(h_j),   j = 0, 1, …, M-1
y = tanh(Σ_{j=0}^{M-1} v_j φ(h_j))

For the adaptation of the weights:
1. Output layer (for simplicity, consider only one output unit):
Δv_j(n) = μ̃ φ(h_j(n)) φ′(Σ_{j=0}^{M-1} v_j φ(h_j)) [d - y(n)]
ΔBias(n) = μ̃ φ′(Σ_{j=0}^{M-1} v_j φ(h_j)) [d - y(n)]
2. Hidden layer (for simplicity, consider only one hidden layer):
Δw_{ij}(n) = μ̃ u_i(n, α) φ′(h_j) v_j φ′(Σ_{j=0}^{M-1} v_j φ(h_j)) [d - y(n)]
ΔBias(n) = μ̃ φ′(h_j) v_j φ′(Σ_{j=0}^{M-1} v_j φ(h_j)) [d - y(n)]