NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:85–87 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.586
Editorial
Multigrid Methods

SUMMARY

This special issue contains papers from the Thirteenth Copper Mountain Conference on Multigrid Methods, held in the Colorado Rocky Mountains on March 19–23, 2007, co-chaired by Van Henson and Joel Dendy. The papers address a variety of applications and cover a breadth of topics, ranging from theory to high-performance computing. Copyright © 2008 John Wiley & Sons, Ltd.

KEY WORDS:
multigrid; image processing; adaptive refinement; domain decomposition; Karhunen–Loève expansion; eigensolver; Hodge decomposition
The First Copper Mountain Conference on Multigrid Methods was organized in 1983 by Steve McCormick, who persevered to chair nine more in this biennial series before handing over the reins in 2003. Today, the conference is widely regarded as one of the premier international conferences on multigrid methods. In 1990, it was joined by the equally successful conference on iterative methods, chaired by Tom Manteuffel. The 2007 multigrid meeting was co-chaired by the now three-time veterans Van Henson and Joel Dendy.

The conference began with three tutorial sessions given by Van Henson and Craig Douglas. The sessions covered multigrid basics as well as more advanced topics such as nonlinear multigrid and algebraic multigrid (AMG). The remaining five days of the conference were organized around a series of 25-min talks, allowing ample time for individual research discussions with colleagues. The student paper competition produced three winners, Hengguang Li (Penn State University), Christian Mense (Technical University of Bonn), and Hisham Zubair (University of Delft), who presented their papers in the student session.

This special issue contains 10 papers from the Thirteenth Copper Mountain Conference on Multigrid Methods, held in the Colorado Rocky Mountains on March 19–23, 2007. The papers address a variety of applications and cover a breadth of topics, ranging from theory to high-performance computing.

De Sterck et al. [1] explore two efficiency-based refinement strategies for the adaptive finite element solution of partial differential equations (PDEs). The goal is to reach a pre-specified bound on the global discretization error with a minimal amount of work. The methods described require a multigrid method that is optimal on adaptive grids with potentially higher-order elements. De Sterck et al. [2] introduce long-range interpolation strategies for AMG. The resulting AMG methods exhibit dramatic reductions in complexity costs on parallel computers while maintaining near-optimal multigrid convergence properties. Rosseel et al. [3] describe an AMG method for solving stochastic PDEs. The stochastic finite element method is used to transform the problem to a large system of coupled PDEs, and the AMG method is used to solve the system. Bell and Olson [4] propose a general AMG approach for the solution of discrete k-form Laplacians. The method uses an aggregation approach and maintains commutativity of the coarse and fine de Rham complexes.
Stürmer et al. [5] introduce a fast multigrid solver for applications in image processing, including image denoising and non-rigid diffusion-based image registration. The solver utilizes architecture-aware optimizations and is compared with solvers based on fast Fourier transforms. Köstler et al. [6] develop a geometric multigrid solver for optical flow and image registration problems. The collective pointwise smoothers used are analyzed with Fourier analysis, and the method is applied to synthetic and real-world images. Michelini and Coyle [7] introduce an alternative to classical local Fourier analysis (LFA) as a tool for designing intergrid transfer operators in multigrid methods. A harmonic aliasing property is introduced and the approach is compared and contrasted with LFA. Brezina et al. [8] introduce an eigensolver based on the smoothed aggregation (SA) method that produces an approximation to the minimal eigenvector of the system. The ultimate aim of the work is to improve the so-called adaptive SA method, which has been shown to be a highly robust solver. Zhu [9] derives convergence theory for overlapping domain decomposition methods for second-order elliptic equations with large jumps in coefficients. It is shown that the convergence rate is nearly uniform with respect to the jumps and mesh size. Brannick et al. [10] analyze a multigrid V-cycle scheme for solving the discretized 2D Poisson equation with corner singularities. The method is proven to be uniformly convergent for finite element discretizations of the Poisson equation on graded meshes, and supporting numerical experiments are supplied.

The 2007 conference was held in cooperation with the Society for Industrial and Applied Mathematics and sponsored by the Lawrence Livermore and Los Alamos National Laboratories, Front Range Scientific Computation, Inc., the Department of Energy, the National Science Foundation, and IBM Corporation. The Program Committee members for the conference were Susanne Brenner, Craig Douglas, Robert Falgout, Jim Jones, Kirk Jordan, Tom Manteuffel, Steve McCormick, David Moulton, Kees Oosterlee, Joseph Pasciak, Ulrich Rüde, John Ruge, Klaus Stüben, Olof Widlund, Ulrike Yang, Irad Yavneh, and Ludmil Zikatanov. The Program Committee served as Guest Editors for the special issue. We thank the editors of Numerical Linear Algebra with Applications for hosting this special issue, especially Panayot Vassilevski, for his invaluable help and guidance. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

REFERENCES

1. De Sterck H, Manteuffel T, McCormick S, Nolting J, Ruge J, Tang L. Efficiency-based h- and hp-refinement strategies for finite element methods. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.567.
2. De Sterck H, Falgout RD, Nolting JW, Yang UM. Distance-two interpolation for parallel algebraic multigrid. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.559.
3. Rosseel E, Boonen T, Vandewalle S. Algebraic multigrid for stationary and time-dependent partial differential equations with stochastic coefficients. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.568.
4. Bell N, Olson LN. Algebraic multigrid for k-form Laplacians. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.577.
5. Stürmer M, Köstler H, Rüde U. A fast full multigrid solver for applications in image processing. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.563.
6. Köstler H, Ruhnau K, Wienands R. Multigrid solution of the optical flow system using a combined diffusion- and curvature-based regularizer. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.576.
7. Michelini PN, Coyle EJ. A semi-algebraic approach that enables the design of inter-grid operators to optimize multigrid convergence. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.579.
8. Brezina M, Manteuffel T, McCormick S, Ruge J, Sanders G, Vassilevski P. A generalized eigensolver based on smoothed aggregation (GES-SA) for initializing smoothed aggregation (SA) multigrid. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.575.
9. Zhu Y. Domain decomposition preconditioners for elliptic equations with jump coefficients. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.566.
10. Brannick JJ, Li H, Zikatanov LT. Uniform convergence of the multigrid V-cycle on graded meshes for corner singularities. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.574.
ROBERT D. FALGOUT
GUEST EDITOR
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Livermore, CA, U.S.A.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:89–114 Published online 17 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.567
Efficiency-based h- and hp-refinement strategies for finite element methods

H. De Sterck¹,∗,†, T. Manteuffel², S. McCormick², J. Nolting², J. Ruge² and L. Tang¹

¹Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada
²Department of Applied Mathematics, University of Colorado at Boulder, Boulder, CO, U.S.A.

∗Correspondence to: H. De Sterck, Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada.
†E-mail: [email protected]
SUMMARY

Two efficiency-based grid refinement strategies are investigated for adaptive finite element solution of partial differential equations. In each refinement step, the elements are ordered in terms of decreasing local error, and the optimal fraction of elements to be refined is determined based on efficiency measures that take both error reduction and work into account. The goal is to reach a pre-specified bound on the global error with a minimal amount of work. Two efficiency measures are discussed, 'work times error' and 'accuracy per computational cost'. The resulting refinement strategies are first compared for a one-dimensional (1D) model problem that may have a singularity. Modified versions of the efficiency strategies are proposed for the singular case, and the resulting adaptive methods are compared with a threshold-based refinement strategy. Next, the efficiency strategies are applied to the case of hp-refinement for the 1D model problem. The use of the efficiency-based refinement strategies is then explored for problems with spatial dimension greater than one. The 'work times error' strategy is inefficient when the spatial dimension, d, is larger than the finite element order, p, but the 'accuracy per computational cost' strategy provides an efficient refinement mechanism for any combination of d and p. Copyright © 2008 John Wiley & Sons, Ltd.

Received 19 April 2007; Accepted 1 November 2007

KEY WORDS:
adaptive refinement; finite element methods; hp-refinement
1. INTRODUCTION

Adaptive finite element methods are being used extensively as powerful tools for approximating solutions of partial differential equations (PDEs) in a variety of application fields; see, e.g. [1–3]. This paper investigates the behavior of two efficiency-based grid refinement strategies for adaptive
finite element solution of PDEs. It is assumed that a sharp, easily computed local a posteriori error estimator is available for the finite element method. In each refinement step, the elements are ordered in terms of decreasing local error, and the optimal fraction of elements to be refined in the current step is determined based on efficiency measures that take both error reduction and work into account. The goal is to reach a pre-specified bound on the global error with a minimal amount of work. It is assumed that optimal solvers are used for the discrete linear systems and that the computational work for solving these systems is, thus, proportional to the number of degrees of freedom (DOF).

Two efficiency measures are discussed. The first efficiency measure is 'work times error' efficiency (WEE), which was originally proposed in [4]. A second measure proposed in this paper is called 'accuracy per computational cost' efficiency (ACE). In the first part of the paper, the performance of the two measures is compared for a standard one-dimensional (1D) model problem with solution u = x^α, which may exhibit a singularity at the origin, depending on the value of the parameter α. The accuracy of the resulting grid is compared with the asymptotically optimal 'radical grid' [3, 5]. Modified versions of the efficiency strategies are proposed for the singular case, and the resulting adaptive methods are compared with a threshold-based refinement strategy. The efficiency strategies are also applied to the hp-refinement case for the 1D model problem, and the results are compared with the 'optimal geometric grid' for hp-refinement that was derived in [5]. In the last part of the paper, the use of the efficiency-based refinement strategies is explored for problems with spatial dimension d > 1. The 'work times error' strategy turns out to be inefficient when the spatial dimension, d, is larger than the finite element order, p, but the 'accuracy per computational cost' strategy provides an efficient refinement mechanism for any combination of d and p. This is illustrated for a model problem in two dimensions (2D).

This paper is organized as follows. In the following section, the efficiency-based h-refinement strategies are described, along with the notation used in this paper, the model problem, and assumptions on the PDE problems, finite element methods, error estimators, and linear solvers considered. The performance of the WEE and ACE refinement strategies for the 1D model problem is discussed in Section 3. Modified WEE and ACE refinement strategies for the singular case are considered in Section 4. In Section 5, efficiency-based hp-refinement strategies are discussed and illustrated for the 1D test problem. Section 6 describes how the efficiency-based refinement strategies can be applied to 2D problems. Throughout the paper, numerical tests illustrate the performance of the proposed methods. Smooth and singular 1D model problems are introduced in Section 2.2, and the performance of the proposed h- and hp-refinement strategies in 1D is discussed in Sections 3–5. A smooth 2D test problem is proposed in Section 6.2, and 2D h-refinement results are discussed in Section 6.3. Conclusions are formulated in Section 7.
2. EFFICIENCY-BASED h-REFINEMENT STRATEGIES

2.1. Assumptions on PDE problem, error estimate, refinement process, and linear solver

Consider a PDE expressed abstractly as

Lu = f  in Ω ⊂ R^d    (1)
with appropriate boundary conditions and solution space V. Assume that continuity and coercivity bounds for the corresponding bilinear form can be verified in some suitable norm. Let
T_h be a regular partition of the domain Ω into finite elements [3, 6], i.e. Ω̄ = ∪_{τ∈T_h} τ̄, with h = max{diam(τ) : τ ∈ T_h}. In this paper we assume, for simplicity, that the elements are squares in 2D and cubes in three dimensions (3D). Let V_h be a finite-dimensional subspace of V and u_h ∈ V_h a finite element approximation such that the following error estimate holds:

‖u − u_h‖_{H^m(Ω)} ≤ C h^{s−m} ‖u‖_{H^s(Ω)}    (2)

where 0 ≤ m < s ≤ p + 1, m and p are integers and s is a real number. Here, p is the polynomial order of the finite element method. Furthermore, assume that we obtain a sharp a posteriori error estimate E(u_h, f) that is equivalent to ‖u − u_h‖_{H^m(Ω)}. The associated error functional is given by F(u_h, f) = E²(u_h, f). For example, the L² functional is a natural a posteriori error estimate for first-order system least squares (FOSLS) finite element methods, and equivalence to the H¹ norm has been proved for several relevant second-order PDE systems of elliptic type [4, 7–9]. The local value of the error, E, on element j is denoted by ε_j.

Consider an adaptive hp-refinement process of the following form. The refinement process starts on a coarse grid with uniform element size h and order p = 1 (level 0) and proceeds through levels ℓ = 1, 2, …, until the error measure, E_ℓ(u_h, f), has a value less than a given bound. In each step, some elements may be refined in h by splitting them into 2^d sub-elements, and some elements may be refined in p by doubling the element order. The decision of which elements to refine is based on the information provided by the local error estimator, and by heuristics that may take into account predicted error reduction and work. In particular, we consider strategies where the elements are ordered in terms of decreasing local error, such that elements with larger error are considered for refinement first. Standard threshold-based approaches then may refine, for example, a fixed fraction of the elements in every step or a fixed fraction of the total error functional.

Let the work needed to solve the discrete linear system on level ℓ be given by W_ℓ. Our goal is to reach a pre-specified bound on the global error, E_L(u_h, f), with a minimal amount of total work, ∑_{ℓ=1}^{L} W_ℓ. Finding this optimal grid sequence may be difficult, even if we restrict the process to h-refinement alone. Hence, we turn to seeking nearly optimal solutions by using heuristics of greedy type. We consider refinement heuristics that determine the fraction of elements to be refined based on optimizing an efficiency measure in every step. We expect that a desirable grid sequence needs to be a high-accuracy sequence, i.e. a grid sequence for which the error, E_ℓ(N_ℓ), decreases with nearly optimal order as a function of the number of DOF, N_ℓ, on grid level ℓ. Note that our strategy also results in an approximate solution to the following problem: find a mesh with a fixed number of DOF that minimizes the error. To this end, one can simply stop the above-described process when the specified number of DOF is reached.

We allow the domain to contain singularities, i.e. points or lines in whose neighborhood the full convergence order of the finite element method cannot be attained due to lack of smoothness of the solution. For simplicity, assume that those singular points or lines can be located only at coarse-level grid points or grid lines and that their power and location are known. This includes the case where the singularities occur at the boundaries of the simulation domain. If the location and strength of the singularities are not known in advance, they can be estimated by monitoring reduction rates of local error functionals during a few steps of initial uniform refinement. It is assumed that optimal solvers, e.g. multigrid, are used for the discrete linear systems. The computational work for solving these systems is, thus, assumed to be a fixed constant times the number of DOF: W_ℓ = c N_ℓ.
2.2. 1D model problem and finite element method

In the first part of this paper, we study the performance of the proposed efficiency-based refinement strategies for a standard model problem in 1D [3, 5]:

u″ = α(α − 1) x^{α−2},  u(0) = 0,  u(1) = 1    (3)
with exact solution given by u = x^α. While the efficiency-based refinement strategies can be applied to various types of finite element methods and associated error estimates, we choose to illustrate the refinement strategies for model problem (3) using standard Galerkin finite element methods of order p, with the error estimated by the H¹ seminorm of the actual error, e = u − u_h, i.e. F(u_h, f) = ‖(u − u_h)′‖²_{L²(Ω)} and ε_j² = ‖(u − u_h)′‖²_{L²(τ_j)}. These are equivalent to the H¹ norm, since it turns out that e(x_i) = 0 at each grid point for our model problem [3, 5]. Note that u ∈ H^{1+α−1/2−ε}((0,1)) for any ε > 0. If we choose 1/2 < α < 3/2, such that u ∈ H¹((0,1)) but u ∉ H²((0,1)), then there is an x^α-type singularity at x = 0. We choose this model problem and this error estimator because asymptotically optimal h- and hp-finite element grids have been developed for them [3, 5], which can be used as a point of comparison for the refinement strategies to be presented in this paper. In addition, it turns out that the finite element approximations can be obtained easily, namely, by interpolation for p = 1 and by integrating a truncated Legendre expansion of u′(x) for p > 1. The refinement strategies presented in this paper can be equally applied to other finite element methods, as is illustrated in the second part of the paper, where we present results for a 2D problem using the FOSLS finite element method [4, 8].

2.3. 'WEE' and 'ACE' strategies

On each level, order the elements such that the local error, ε_j, satisfies ε_1 ≥ ε_2 ≥ ··· ≥ ε_N. With r ∈ (0, 1] denoting the to-be-determined fraction of elements that will be refined, let f(r) ∈ [0, 1] be the fraction of the total error functional in the refinement region, γ(r) ∈ [0, 1] the predicted functional reduction, and η(r) ∈ [1, 2] the ratio of the number of DOF on level ℓ+1 and level ℓ, i.e. N_{ℓ+1} = η(r) N_ℓ.

The first refinement strategy, WEE, was initially proposed in [4]. Here, the fraction, r, of elements to be refined on the current level is determined by minimizing the following efficiency measure:

work × error reduction = η(r) √γ(r)    (4)

i.e.

r_opt = argmin_{r ∈ (0,1]} η(r) √γ(r)    (5)
The motivation for this heuristic is as follows: more work on the current level is justified when it results in increased error reduction that offsets the extra work. While this choice does not guarantee that a globally optimal grid sequence is obtained, this local optimization in each step results in an overall strategy of greedy type, which can be expected to lead to a reasonable approximation to the optimal grid sequence.

We also propose a second strategy, ACE. We define the predicted effective functional reduction factor

γ(r)_eff = γ(r)^{1/η(r)}    (6)
The fraction, r, of elements to be refined on the current level is determined by minimizing this effective reduction factor, which is the same as minimizing log(γ(r)_eff), i.e.

r_opt = argmin_{r ∈ (0,1]} log(γ(r)) / η(r)    (7)
The effective functional reduction factor, γ(r)_eff, measures the functional reduction per unit work. Indeed, compare two hypothetical error-reducing processes with functional reduction factors γ₁ and γ₂, and work proportional to η₁ and η₂. Assume that process 2 requires double the work of process 1, η₂ = 2η₁. Then the two processes would be equally effective when γ₂ = γ₁², because process 1 could be applied twice to obtain the same error reduction as process 2, using the same total amount of work as process 2. Minimizing the effective functional reduction in every step, thus, chooses the fraction, r, of elements to be refined by locally minimizing the functional reduction per unit work. Both the strategy of minimizing work times error reduction and that of minimizing the effective functional reduction factor are ways of optimizing the efficiency of the refinement process at each level. Hence, we call the two proposed refinement strategies efficiency-based.

2.4. Error and work estimates for h-refinement in 1D

The predicted functional reduction ratio, γ(r), and element growth ratio, η(r), can be determined as follows for the case of h-refinement in 1D with fixed finite element order p. The element growth ratio, η(r), can be determined easily. We have N elements on level ℓ. Of these, rN are refined into two new elements each, while (1−r)N elements are left unrefined. Thus, the number of elements on level ℓ+1 is N_{ℓ+1} = (1−r)N + 2rN = (1+r)N. This yields

η(r) = 1 + r    (8)
The predicted functional reduction factor, γ(r), depends on the error estimate and the smoothness of the solution. As mentioned above, we consider the case that the error estimate is equivalent to the H¹ norm of u − u_h, i.e. F(u_h, f) ≈ ‖u − u_h‖²_{H¹(Ω)} and ε_j² ≈ ‖u − u_h‖²_{H¹(τ_j)}. The error has the following asymptotic behavior [6]. For elements τ_j in which the solution is smooth (at least in H^{p+1}(τ_j) if order p elements are used), we have

ε_j² ≈ ‖u_h − u‖²_{H¹(τ_j)} ≤ C h_j^{2p} ‖u‖²_{H^{p+1}(τ_j)} ≤ C M_{p+1} h_j^{2p} h_j    (9)
Here, we can take M_{p+1} = ∑_{i=0}^{p+1} ‖u^{(i)}‖²_{∞,τ_j}, such that ‖u‖²_{H^{p+1}(τ_j)} ≤ M_{p+1} h_j. If τ_j is split into two equal parts, we have two new elements, τ_{j,1} and τ_{j,2}, and we can assume that

(ε_{j,1}² + ε_{j,2}²) / ε_j² ≈ (1/2)^{2p}    (10)
However, if u is less smooth in some element τ_j, i.e. if we can assume only that u ∈ H^s(τ_j) with s ∈ R, s < p + 1, then we have

ε_j² ≤ C h_j^{2(s−1)} ‖u‖²_{s,τ_j}    (11)
For simplicity, we consider only the highly singular case here, for which s ≪ p + 1. If, again, τ_j is split into two, assuming element τ_{j,1} contains the singularity, then ε_{j,1} ≫ ε_{j,2} and ‖u‖_{s,τ_{j,1}} ≈ ‖u‖_{s,τ_j}. We then obtain

(ε_{j,1}² + ε_{j,2}²) / ε_j² ≈ ε_{j,1}² / ε_j² ≈ (1/2)^{2(s−1)}    (12)
Suppose the solution is sufficiently smooth in the whole domain. Then the predicted functional reduction factor, γ(r), can be obtained as follows. We apply (10) to the elements that are refined. The elements that do not get refined contain a fraction, 1 − f(r), of the error functional; hence, we assume that their errors are not reduced. This results in

γ(r) = 1 − f(r) + (1/2)^{2p} f(r)    (13)
It is cumbersome to give a general expression for the singular case. However, assuming that we know the power and location of the singularities in advance, one can easily compute γ(r) using (10) and (12).
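To make the WEE and ACE selection rules concrete, the following minimal sketch (in Python; function and variable names are ours, and the scan over candidate fractions r = k/N is one simple way to carry out the minimizations in (5) and (7)) computes r_opt for the smooth 1D case from the local error functionals, using η(r) from (8) and γ(r) from (13):

```python
import numpy as np

def optimal_fraction(eps2, p, strategy="ACE"):
    """Pick the fraction r of elements to refine on the current level.

    eps2: local error functionals eps_j^2; p: finite element order.
    "WEE" minimizes eta(r)*sqrt(gamma(r)) as in (4)-(5);
    "ACE" minimizes log(gamma(r))/eta(r) as in (7).
    Smooth 1D h-refinement: eta(r) = 1 + r, gamma(r) = 1 - f(r) + (1/4)^p f(r).
    """
    eps2 = np.sort(np.asarray(eps2, dtype=float))[::-1]  # decreasing local error
    N = eps2.size
    f = np.cumsum(eps2) / eps2.sum()         # f(k/N): refined error fraction
    r = np.arange(1, N + 1) / N              # candidate fractions r = k/N
    gamma = 1.0 - f + 0.25**p * f            # predicted reduction, eq. (13)
    eta = 1.0 + r                            # DOF growth ratio, eq. (8)
    if strategy == "WEE":
        measure = eta * np.sqrt(gamma)       # work x error, eqs. (4)-(5)
    else:
        measure = np.log(gamma) / eta        # log effective reduction, eq. (7)
    return r[int(np.argmin(measure))]
```

The singular case only changes how γ is predicted: entries of eps2 belonging to singular elements reduce by (1/2)^{2(s−1)} instead of (1/4)^p, as in (12).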
3. PERFORMANCE OF THE WEE AND ACE h-REFINEMENT STRATEGIES IN 1D

3.1. Performance of WEE and ACE for smooth solutions

We apply the WEE and ACE strategies to our 1D model problem (3) with p = 1. On each level ℓ, each element is allowed to be refined at most once. We first consider the nonsingular case and choose α > 3/2, such that u ∈ H²((0,1)). It follows that the predicted functional reduction factor, γ(r), is given by

γ(r) = 1 − (3/4) f(r)    (14)
Note that, for a given error bound, our ultimate goal is to choose a grid sequence that minimizes the total work, ∑_{ℓ=1}^{L} W_ℓ, which is the same as minimizing ∑_{ℓ=1}^{L} N_ℓ, based on our assumption that the work is proportional to N_ℓ. For a given error bound, the number of elements on the final grid, N_L, is determined by the convergence rate of the global error w.r.t. the DOF, which in turn is determined by the refinement strategy. For our model problem, it has been shown in [5] that the rate of convergence is never better than (Np)^{−p}, where N is the number of elements and p is the degree of the polynomial.
Theorem 1 (Gui and Babuška [5])
Let E = (∑_i ε_i²)^{1/2}. Then there is a constant, C = C(α, p) > 0, such that, for any grid {0 = x_0 < x_1 < ··· < x_N = 1},

E ≥ C (Np)^{−p}    (15)
For our example problem, an asymptotically optimal final grid, called a radical grid, is described in [3, 5]:

x_j = (j/N)^{(p+1/2)/(α−1/2)},  j = 0, …, N    (16)
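For reference in the comparisons below, the radical grid (16) is straightforward to tabulate; a two-line sketch (function name ours):

```python
import numpy as np

def radical_grid(N, p, alpha):
    """Grid points x_j = (j/N)^((p+1/2)/(alpha-1/2)) of eq. (16)."""
    return (np.arange(N + 1) / N) ** ((p + 0.5) / (alpha - 0.5))

x = radical_grid(32, p=1, alpha=2.1)   # the smooth test case used below
```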
This grid is optimal in the sense that, in the limit of large N, it results in the smallest error as a function of the number of DOF. If the WEE or the ACE strategy results in a grid sequence with approximately optimal convergence rate of the global error w.r.t. DOF, then the number of elements on the final grid must be close to the optimal number of elements, which depends only on the given error bound. Because we wish to minimize work, it follows that, among the methods with approximately optimal convergence rate, the methods for which the sequence {N_ℓ} increases fast are preferable. Large refinements are, thus, advantageous.

We compare the numerical results of the WEE and ACE strategies and the radical grid for α = 2.1 and p = 1 in Figures 1–6. In the numerical results, we carry out the refinement process until E_L(u_h, f) ≤ 2e−5 on the final grid level L. From Figure 1, it can be observed that both strategies result in a highly accurate grid sequence. Thus, for a given error bound, the difference in the number of elements on the final grid is very small. This can be verified in Figure 2. Figures 3 and 4 show that the ACE strategy is slightly more efficient than the WEE strategy for our model problem in the smooth case. There are two small refinements in the WEE refinement process, while there are no small refinements for the ACE strategy.
Figure 1. Error versus DOF, α = 2.1 (no singularity), p = 1.
Figure 2. Local error functional, ε_i², versus grid location on the final grid, α = 2.1 (no singularity), p = 1: (a) WEE: N_L = 32 741, E_L = 1.859e−5, L = 18, total work = 102 313 and (b) ACE: N_L = 32 760, E_L = 1.858e−5, L = 16, total work = 65 520.
Figure 3. Refined fraction of error functional, f(r_opt), versus level, ℓ, and refined fraction of elements, r_opt, versus level, ℓ, α = 2.1 (no singularity), p = 1: (a) WEE and (b) ACE.
It follows that, for a given error bound on the final grid, the WEE strategy may require slightly more total work than the ACE strategy; see Figure 5. Figure 2 shows that, for both strategies, the local errors in all elements tend to be equally distributed. This explains why the values of f(r_opt) and r_opt are close in Figure 3. From Figure 6 one can see that the predicted reduction factor γ(r_opt) is very accurate. This suggests a modification of the refinement process that can be considered to increase performance: one does not need to solve the linear systems until the new level is refined enough to have a significant number of additional elements in it. In this way complexity is never a problem, and we can still have a highly accurate grid sequence.
Figure 4. Number of elements, N_ℓ, versus level, ℓ, α = 2.1 (no singularity), p = 1: (a) WEE and (b) ACE.
Figure 5. Final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ, α = 2.1 (no singularity), p = 1.
3.2. Performance of WEE and ACE for singular solutions

Next, we consider a singular example: let α = 0.6, so that u ∈ H^{1.1}((0,1)). In the numerical results, we carry out the refinement process until E_L(u_h, f) ≤ 7e−4 on the final grid level L. For p = 1, the error reduction in the element that contains x = 0 is approximately given by (1/2)^{0.2}; see (12). The predicted reduction factor γ(r) is given by

γ(r) = 1 − (3/4) f(r) + ((1/2)^{0.2} − 1/4) f(1/N)    (17)

Here, we assume that the local error in the element that contains x = 0 is always the largest.
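In code, the only change relative to the smooth case is the correction term for the singular element; a hedged sketch of (17) (names ours; f_first stands for the error fraction f(1/N) in the element containing x = 0):

```python
def gamma_singular(f_r, f_first, p=1, s=1.1):
    """Predicted functional reduction, eq. (17): refined smooth elements
    reduce by (1/2)^(2p) = (1/4)^p, the singular element by (1/2)^(2(s-1))."""
    return 1.0 - f_r + 0.25**p * (f_r - f_first) + 0.5**(2.0*(s - 1.0)) * f_first
```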
Figure 6. Predicted functional reduction factor, γ(r_opt), and actual functional reduction factor, g, versus level, ℓ, α = 2.1 (no singularity), p = 1: (a) WEE and (b) ACE.
Figure 7. Error versus DOF, α = 0.6 (singular case), p = 1.
The numerical results in Figures 7–12 show that the two refinement strategies fail for this singular case. Figure 7 shows that the WEE strategy results in a highly accurate grid sequence, while the ACE strategy becomes inaccurate by comparison with the radical grid. For both strategies, the local error in the first element, which contains the singularity, is always the largest; see Figure 8. Hence, it is refined by both the WEE and the ACE strategies in every step. This also confirms that the predicted reduction factor can be given by (17). The WEE strategy generates a grid sequence with local errors nearly equally distributed, but the ACE strategy does not: more than 90% of the global error accumulates in only 10% of the elements; see Figures 8 and 9.
Figure 8. Local error functional, ε_i², versus grid location on the final grid, α = 0.6 (singular case), p = 1: (a) WEE: N_L = 6925, E_L = 6.169e−4, L = 154, total work = 192 775 and (b) ACE: N_L = 24 986, E_L = 6.411e−4, L = 106, total work = 365 420.
Figure 9. Refined fraction of error functional, f(r_opt), versus level, ℓ, and refined fraction of elements, r_opt, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) WEE and (b) ACE.
Most refinement steps of the WEE strategy are small refinements: only the first element (possibly with a few other elements) is continuously being refined (see Figures 9 and 10). This implies that the number of elements increases slowly as a function of refinement level. It follows that the total work is very large. The ACE strategy does choose a refinement region with a large fraction of the error in it. However, this large fraction of error is contained in only a few elements. As a result, only a small fraction of elements are refined. Thus, the required total work is still large; see Figures 10 and 11.
Figure 10. Number of elements, N_ℓ, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) WEE and (b) ACE.
Figure 11. Final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ, α = 0.6 (singular case), p = 1.
Compared with the nonsingular case (Figure 5), the slope of the error versus total work plot in Figure 11 is much less steep, especially in the initial phase of the refinement process. The predicted reduction factors for both strategies are accurate; see Figure 12. This suggests that we can make the same modification as for the smooth case to increase performance: one can wait on solving the linear systems until the number of elements has increased sufficiently. In this way, one can ensure that complexity is never a problem, but calculating and minimizing the WEE and ACE functions many times may be costly as well.
Figure 12. Predicted functional reduction factor, γ(r_opt), and actual functional reduction factor, g, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) WEE and (b) ACE.
In conclusion, for the highly singular case, the WEE strategy results in an accurate grid sequence but is not efficient due to too many small refinements; the ACE strategy is worse than the WEE strategy in this case, because the grid sequence is not accurate and many small refinements are performed.
4. MODIFIED WEE AND ACE h-REFINEMENT STRATEGIES FOR SINGULAR SOLUTIONS IN 1D

4.1. Modified WEE and ACE h-refinement strategies

The inefficiency of the WEE and ACE strategies for the highly singular solution is due to many steps of small refinement for the singular elements. Therefore, we attempt to avoid these steps by using a geometrically graded grid starting from the singular point, with the aim of saving work while attempting to keep the grid sequence accurate. As was discussed before, we assume that singularities can be located only at coarse-level grid points and that we know the location and the power of the singularities in advance. We propose to do graded grid refinement for elements containing a singularity, in such a way that we obtain the same error reduction factor as in elements in which the solution is smooth. For example, for a singularity located at a domain boundary, the element at the boundary is split into two, and then, within the same refinement step, the new element at the singularity is repeatedly split into two again, until the predicted error reduction factor matches the desired error reduction. We modify the predicted functional reduction factor, γ(r), and the work increase ratio, η(r), accordingly. We expect the correspondingly modified WEE and ACE strategies (MWEE and MACE) to generate a highly accurate grid sequence in an efficient way. This results in the following modified efficiency-based refinement strategies:

(1) Order the elements such that the local error, ε_j, satisfies ε_1 ≥ ε_2 ≥ ··· ≥ ε_N.
(2) Perform graded grid refinement for elements containing a singularity, i.e. if u ∈ H^{s_j}(τ_j), then graded grid refinement with m_j levels is used for any τ_j that needs to be refined, with
m_j satisfying

(1/2)^{2 m_j (s_j − 1)} ≈ (1/2)^{2p}  ⇒  m_j = ⌈ p / (s_j − 1) ⌉

Note that we assume here that the error in the first, singular new element dominates the sum of the errors in the other new elements of the graded grid. This is a good approximation for a strong singularity. For elements in which the solution is smooth, single refinement is performed: m_j = 1. Let k_j be the number of new elements after τ_j is refined: k_j = m_j + 1.
(3) The predicted functional reduction factor, γ(r), and the work increase ratio, η(r), are given by

η(r) = 1 − r + (1/N) ∑_{j ≤ rN} k_j
γ(r) = 1 − f(r) + (1/2)^{2p} f(r)    (18)

(4) Find the optimal r defined in (5) for the MWEE strategy and in (7) for the MACE strategy.
(5) Repeat.

(A code sketch of steps (2) and (3) is given below.)

4.2. Performance of the modified WEE and ACE h-refinement strategies for singular solutions

We again choose α = 0.6 and p = 1 for our example problem. There is a singularity at x = 0, with error reduction factor bound (1/2)^{0.2}. Therefore, for the element that contains x = 0, we use graded refinement with m = ⌈1/0.1⌉ = 10 levels, i.e. the singular element is replaced by 11 new elements. Numerical results are shown in Figures 13–18.
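The sketch referred to above: a minimal illustration (names ours) of how the graded-refinement depth m_j and the modified ratios (18) can be computed; it assumes, as the text does, that the element-wise smoothness exponents s_j are known:

```python
import math

def graded_levels(s_j, p):
    """m_j of step (2): (1/2)^(2*m_j*(s_j-1)) ~ (1/2)^(2p)."""
    return math.ceil(p / (s_j - 1.0))

def modified_ratios(r, f_r, k, N, p):
    """eta(r) and gamma(r) of eq. (18); k[j] = number of new elements
    created when element j (ordered by decreasing error) is refined,
    i.e. k_j = m_j + 1 for singular elements and k_j = 2 otherwise."""
    n_ref = max(1, round(r * N))
    eta = 1.0 - r + sum(k[:n_ref]) / N
    gamma = 1.0 - f_r + 0.25**p * f_r
    return eta, gamma

# alpha = 0.6, p = 1 gives s = 1.1, so m = ceil(1/0.1) = 10 graded levels
# and the singular element is replaced by k = m + 1 = 11 new elements:
print(graded_levels(1.1, 1))   # -> 10
```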
Figure 13. Error versus DOF, α = 0.6 (singular case), p = 1. (Legend fits: MWEE: −1.0157x + 0.68384; MACE: −1.0119x + 0.70888; radical grid: −0.9938x + 0.58166.)
Figure 14. Local error functional, ε_i², versus grid location on the final grid, α = 0.6 (singular case), p = 1: (a) MWEE: N_L = 6975, E_L = 6.125e−4, L = 15, total work = 21 176 and (b) MACE: N_L = 8517, E_L = 5.443e−4, L = 12, total work = 17 044.
Figure 15. Refined fraction of error functional, f(r_opt), versus level, ℓ, and refined fraction of elements, r_opt, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) MWEE and (b) MACE.
By comparing the numerical results for the modified strategies with the results for the original methods, we see the following. Both the MWEE and MACE strategies result in highly accurate grid sequences: the convergence rate is very close to the optimal rate (Figure 13). Local error functionals on the final MWEE grid are more equally distributed than for the MACE grid. For the MWEE strategy, the local error functional in the singular element is only three times larger than in the smooth elements. However, for the MACE strategy, that ratio is as large as 1000 (Figure 14). For the MWEE strategy, the number of elements, N_ℓ, increases much faster than for the WEE strategy, which reduces the work considerably (Figure 15). However, there still exist a few small refinement steps. For the MACE strategy, it seems that the strategy tends to do uniform refinement after several initial steps (Figure 15(b)).
Figure 16. Number of elements, N_ℓ, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) MWEE and (b) MACE.
Figure 17. Final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ, α = 0.6 (singular case), p = 1. (Curves: MWEE, MACE, WEE, ACE.)
Similar to the smooth solution case, the MWEE strategy may need slightly more work than the MACE strategy to reach the same error bound, due to a few steps of small refinement (Figure 17). However, since the MWEE strategy is slightly more accurate, the difference is very small. Again, the predicted functional reduction factors are good approximations of the actual factors for both strategies (Figure 18).

4.3. Comparison with threshold-based refinement strategy

It is instructive to compare the MWEE and MACE strategies with the threshold-based refinement strategy that chooses to refine a fixed fraction of the error functional on each level, i.e. f(r) ≡ δ. The same graded grid refinement strategy is used for the elements that contain a singularity. We find the following for our example problem.
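For comparison purposes, the threshold rule f(r) ≡ δ amounts to marking elements in decreasing order of error until their cumulative share of the error functional reaches δ; a minimal sketch (names ours):

```python
import numpy as np

def threshold_refine_set(eps2, delta):
    """Indices of the elements refined by the f(r) = delta threshold rule."""
    eps2 = np.asarray(eps2, dtype=float)
    order = np.argsort(eps2)[::-1]                 # decreasing local error
    share = np.cumsum(eps2[order]) / eps2.sum()    # cumulative error fraction
    n_ref = int(np.searchsorted(share, min(delta, 1.0))) + 1
    return order[:n_ref]
```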
Figure 18. Predicted functional reduction factor, γ(r_opt), and actual functional reduction factor, g, versus level, ℓ, α = 0.6 (singular case), p = 1: (a) MWEE and (b) MACE.
Figure 19. Efficiency-based and threshold-based refinement strategies: (a) error versus DOF and (b) final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ. (Both for α = 0.6 (singular case), p = 1; curves: MWEE, MACE, graded threshold 1.0, 0.8, 0.2.)
If we choose to refine a fixed fraction of the global error that is too small (less than the average of f(r_opt) in the modified efficiency-based strategies), e.g. δ = 0.2 in Figure 19, then the resulting grid sequence is almost of optimal accuracy, but the total work increases significantly since N_ℓ increases slowly. A threshold value that is too large (larger than the average of f(r_opt) in the modified efficiency-based strategies), e.g. δ = 1.0 in Figure 19, makes the number of elements, {N_ℓ}_{ℓ=1}^{L}, increase faster, but the large threshold results in a less accurate grid sequence. This implies that more total work is required to reach the same error bound. A threshold value that is close to the average of f(r_opt) in the modified efficiency-based strategies, namely, δ = 0.8 in Figure 19, results in a refinement process that performs similarly to the efficiency-based refinement processes.
In conclusion, the efficiency-based refinement strategies automatically and adaptively choose a nearly optimal fraction of the error to be refined. As a result, they generate nearly optimal grid sequences in an efficient way, and there is no need to determine the optimal value of a threshold parameter.

4.4. Results for p = 2

In this section, we briefly illustrate how the (M)WEE and (M)ACE strategies perform for finite element polynomial order p = 2. First, consider a smooth case with α = 3.1, such that u ∈ H³ and u ∉ H⁴. Error versus DOF and total work are plotted for WEE and ACE in Figure 20. Both strategies lead to global refinement in every step for this example and produce a sequence of grids that are very close to optimal radical grids.

Figure 21 shows results for a highly singular case, with α = 0.6, such that u ∈ H¹ and u ∉ H².
Figure 20. Efficiency-based refinement strategies for a smooth problem with p = 2 (α = 3.1): (a) error versus DOF and (b) final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ.
Figure 21. Efficiency-based refinement strategies for a singular problem with p = 2 (α = 0.6): (a) error versus DOF and (b) final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ.
WEE and ACE produce small refinements, but this is remedied by the MWEE and MACE strategies, resulting, as before, in much less work for the modified strategies. It has to be noted, however, that the MWEE and MACE grids contain many more elements than optimal graded grids. This is probably due to the fact that the singularity is very strong for α = 0.6 and p = 2, so that a geometrically graded grid with a grading factor of 1/2 does not decrease the grid size fast enough in the vicinity of the singularity. Nevertheless, we can conclude that, within the constraint of refinement based on splitting cells in two, the MWEE and MACE strategies lead to an efficient refinement process.
5. EFFICIENCY-BASED hp-REFINEMENT STRATEGIES IN 1D

Assuming that we know a good approximation for the p-refinement error reduction factor for each element, we can apply the efficiency-based refinement strategies to hp-refinement processes.

5.1. hp-version of the (M)WEE and (M)ACE refinement strategies

Consider an hp-finite element method for our simple example problem (3). Let T_h = {0 = x_0 < x_1 < ··· < x_N = 1} be the grid and let p = {p_1, p_2, …, p_N} be the degrees of the polynomials in the elements. Let u_h be the Galerkin finite element solution of (3) and ε_j²(p_j) = ‖(u − u_h)′‖²_{0,τ_j} the local error functional in element τ_j = [x_{j−1}, x_j] with polynomial of degree p_j. We choose local Legendre polynomials as the modal base functions [3]. Then we have the following theorem:

Theorem 2 (Gui and Babuška [5])
Let ε_j²(p_j) be the local error of the finite element solution of problem (3), and let

τ_j = [x_{j−1}, x_j],  κ_j = (√x_j − √x_{j−1}) / (√x_j + √x_{j−1})

Then
ε_1²(p_1) ≈ h_1^{2α−1} / p_1^{4α−2}    (19)
If κ_j (2 ≤ j ≤ N) is not close to 1, then

ε_j²(p_j) ≈ h_j { (1 − κ_j²) x_{j−1/2}^{α−1} κ_j^{p_j} / (2 κ_j p_j) }²    (20)
We only consider h-refinement for the first element, which contains the singularity. Then we have the error functional reduction factor bound (1/2)^{2α−1}, as in (12). For an element τ_j that does not contain the singularity, note that κ_j is small, and again we obtain the same h-reduction factor bound, (1/2)^{2p_j}, as before (see (10)). Moreover, if we double the degree of polynomial p_j, we obtain the p-reduction factor bound as follows:

ε_j(2p_j) / ε_j(p_j) ≈ κ_j^{p_j} / 2    (21)
We can then develop an hp-version of the MWEE strategy as follows:

(1) Order the elements such that the local error, ε_j, satisfies ε_1 ≥ ε_2 ≥ ··· ≥ ε_N.
(2) Let p_max be the maximal polynomial order to be used in the refinement process. Three types of refinement are used, depending on the element. We use a graded grid with p = 1 for the elements containing a singularity, in such a way that the predicted error-reduction factor attains 1/4. (Note that a target reduction factor of up to (1/2)^{2p_max} could be used, but we choose 1/4 for simplicity in our numerical tests.) For elements without a singularity, p-refinement (doubling p) is used if the solution is locally smooth enough (which, in general, can be detected a posteriori by comparing predicted and observed error-functional reduction ratios) and p < p_max. Otherwise, h-refinement is used and the degree p is inherited by both sub-elements. As before, we assume that the work of solving the linear systems is proportional to the number of DOF. Then, doubling p or splitting the element into two elements with order p has the same computational complexity.
(3) Calculate the MWEE or MACE efficiency function and find the optimal fraction of elements to be refined, r_opt.
(4) Refine elements τ_j, 1 ≤ j ≤ r_opt N.
(5) Repeat.

For a general problem different from (3), it may be difficult to find a sharp approximation formula for the error reduction in the case of p-refinement. Hence, we are interested in seeking a more general but possibly less sharp p-error reduction factor. Recall that for elements τ_j in which the solution is smooth (at least in H^{p_j+1}(τ_j) if order p_j elements are used), we have

ε_j²(p_j) ≤ C(p_j) h_j^{2p_j} ‖u‖²_{H^{p_j+1}(τ_j)}
More precisely, we have the following approximation [3]:

ε_j²(p_j) ≤ c (h_j/2)^{2p_j} (1/(2p_j)!) ‖u‖²_{H^{p_j+1}(τ_j)}    (22)
2 p j
1 u2 p j +1 ( j ) H (2 p j )!
(22)
M, where M is a constant, we obtain the following general
p-error reduction factor 2j (2 p j ) 2j ( p j )
≈
hj 2
2 p j (23)
for elements j that do not contain a singularity. 5.2. Optimal geometric hp-grid for the model problem Just as in the case of h-refinement, we seek some kind of optimal grid for comparison. Suppose the locations of the grid points are given by x j =q N− j , Copyright q
2008 John Wiley & Sons, Ltd.
0
(24)
Numer. Linear Algebra Appl. 2008; 15:89–114 DOI: 10.1002/nla
109
EFFICIENCY-BASED h- AND hp-REFINEMENT STRATEGIES
10
E
10
10 geometric qopt geometric q=0.5
10 10
10
10
10
DOF
Figure 22. Error versus DOF, = 0.6 (singular case), p = 1.
√ √ Let j = = (1− q)/(1+ q), ∀ j : 1 jN . It was shown in [5] that the optimal degree distribution of p for these grid locations tends to a linear distribution with slope so = (−1/2)
log q log
(25)
Furthermore, the optimal geometric grid factor q and linear slope so combination is given by √ qopt = ( 2−1)2 ,
sopt = 2−1
(26)
5.3. Numerical results and comparisons We apply the hp-version MWEE and MACE strategies with the two p-refinement reduction factors given by (21) and (23) to our model problem 3 with = 0.6 and compare the numerical results with the optimal geometric grid with q = qopt and q = 12 ; see Figures 22 and 23. In the numerical results, we carry out the refinement process until E L (u h , f )5e−3 on final grid level L. Observe that the hp-finite element methods result in much faster error convergence rates than the h-finite element method. Both the hp-MACE and hp-MWEE strategies result in a highly accurate grid sequence with rate-of-error convergence very close to the geometrical grid with grading number q = 0.5. Also, the refinement process is efficient, i.e. the number of DOF increases fast w.r.t. the refinement level. Surprisingly, hp-refinement strategies using the more general, but less accurate, error reduction factor (23), result in a better grid sequence than with the more accurate Babuˇska factor, (21). The results are even better than the optimal geometric grid sequence when the number of DOF is small. More work needs to be done to verify whether the general factor (23) works well for more general problems. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:89–114 DOI: 10.1002/nla
110
H. DE STERCK ET AL.
error on final grid
10
10
10
10 10
10
10
10
10
total work
Figure 23. Final error, E L , versus total work,
L
=1
N , = 0.6 (singular case), p = 1.
6. 2D RESULTS In this section, we explore the use of the proposed efficiency-based refinement strategies in two spatial dimensions. In these initial considerations, we discuss only problems with sufficiently smooth solutions. 6.1. Efficiency strategies in Rd The efficiency-based WEE and ACE refinement strategies presented above can readily be applied to problems in d spatial dimensions. Let ⊂ Rd . Assume again that the error estimator, F(u h , f ), is equivalent to the H 1 norm of u −u h : F(u h , f ) ≈ u −u h 2H 1 () . Assume that the refinement process, in each step, splits elements into 2d sub-elements. Then the element growth ratio, (r ), is given by (r ) = 1+(2d −1)r
(27)
Suppose the solution is sufficiently smooth in the whole domain. As in the 1D case, the predicted functional reduction factor, (r ), is given by (r ) = 1− f (r )+( 12 )2 p f (r )
(28)
The WEE and ACE strategies can then be used to determine the fraction of elements to be refined, ropt , according to Equations (5) and (7), respectively. It should be noted here that the WEE measure may be problematic in dimensions higher than √one. This can be seen as follows. The WEE measure determines ropt by minimizing MWEE ≡ (r ) (r ) over r ∈ [1/N , 1]. For smooth solutions, (1/N ) ≈ 1 and (1/N ) ≈ 1, such that MWEE (1/N ) ≈ 1. For r = 1, however, it can be observed that (1) = 2d and (1) = ( 12 )2 p , such that MWEE (1) = 2d− p . This means that MWEE >1 when d> p. MWEE (r ) is often a very smooth function; hence, ropt is likely to be close to 1/N when d> p, resulting in small refinements, which are inefficient. We, Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:89–114 DOI: 10.1002/nla
EFFICIENCY-BASED h- AND hp-REFINEMENT STRATEGIES
111
thus, expect that the WEE strategy may not be efficient when d> p. We investigate this issue in the numerical results presented below. Also, it can be noted that this problem does not occur for the ACE strategy.
6.2. Model problem and finite element method The following 2D finite element problem is considered to illustrate the efficiency-based refinement strategies. We solve the Poisson boundary value problem (BVP) − p = f p=g
in on *
(29)
= (0, 1)×(0, 1) with the right-hand side f and boundary conditions g chosen such that the solution is given by ⎧ 1, r r0 ⎪ ⎨ p(r, ) = h(r ), r0 r r1 ⎪ ⎩ 0, r1 r
(30)
Here, (r, ) are the usual polar coordinates and h(r ) is the unique polynomial of degree five such that p ∈ C 2 (). We choose r0 = 0.7 and r1 = 0.8. The solution of this test problem takes on the unit value in the lower left corner of the domain and is zero elsewhere, except for a steep gradient in the thin strip 0.7r 0.8. Figure 24(a) shows the grid obtained for this model problem after several refinement steps.
Figure 24. Adaptively refined grids using the ACE refinement strategy for 2D problems with p = 2: (a) single arc on a unit square domain and (b) double arc on a unit square domain. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:89–114 DOI: 10.1002/nla
112
H. DE STERCK ET AL.
To illustrate the broad applicability of our refinement strategies, we solve this model problem using a FOSLS finite element method, rather than the Galerkin method that was used for the 1D test problems. BVP (29) is rewritten as a first-order system BVP [8] −∇ ·U = f
in
U = ∇p ∇ ×U = 0 p=g
on *
(31)
*g * = (0, 1)×(0, 1)
s·U =
where U is a vector of auxiliary unknowns, and s is the unit vector tangent to *. The FOSLS error estimator is given by F( ph ,Uh ; f ) = ∇ ·Uh + f 2L 2 () +Uh −∇ ph 2(L 2 ())2 +∇ ×Uh 2L 2 () .
Under certain smoothness assumptions, the FOSLS error estimator is equivalent to the H 1 -norm [8]: F( ph ,Uh ; f ) ≈ p − ph 2H 1 () +U −Uh 2(H 1 ())2 . Note that in our approach refinement is performed in such a way that new nodes are introduced on element edges and faces; hence, local refinement introduces hanging nodes (see Figure 24(a)). To maintain a C 0 solution, we treat these as slave nodes, enforcing a continuity constraint across element boundaries. This results in a conforming finite element method, and the approximation properties discussed in this paper still hold on this type of grid. 6.3. Numerical results We present numerical results for the 2D model problem with p = 1 and 2 in Figures 25 and 26, respectively. The figures show error versus DOF and total work for the WEE and ACE refinement strategies, compared with global refinement in every step. For p = 1, the ACE strategy results in an efficient algorithm, but, as expected, the WEE strategy produces many small refinement steps for this case where d> p and is, thus, not efficient (Figure 25).
Figure 25. Efficiency-based refinement strategies for the 2D model problem with p = 1: (a) error versus DOF and (b) final error, E_L, versus total work, ∑_{ℓ=1}^{L} N_ℓ.
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:89–114 DOI: 10.1002/nla
Figure 26. Efficiency-based refinement strategies for the 2D model problem with p = 2: (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.
Figure 26 shows that, for p = 2 (d = p), both the ACE and WEE strategies produce an efficient refinement process. Figure 24(b) shows the resulting grid when the ACE strategy is applied to a slightly more complicated test problem, in which two circular steps are superimposed (u = 1 in the lower left corner, u = 2 in the lower right corner, u = 3 where the two steps overlap, and u = 0 in the top part of the domain). The adaptive refinement process adequately captures the error generated at the steep gradients.
7. CONCLUSIONS

Two efficiency-based adaptive refinement strategies for finite element methods, WEE and ACE, were discussed. Both strategies take error reduction as well as work into account.

The two strategies were first compared for a 1D model problem. For the case of h-refinement with smooth solutions, the efficiency-based strategies generate a highly accurate grid sequence and an efficient refinement process. However, for singular solutions, the refinement process becomes inefficient due to many steps of small refinements. Use of a graded grid for elements with a singularity leads to significant improvement. For both the WEE and ACE strategies, this modification saves a substantial amount of work and also results in a highly accurate grid sequence. For the hp-refinement case, similar conclusions are obtained. However, for general problems, the difficulty may lie in how to find a good approximation for the p-error reduction factor.

Application to problems with spatial dimension larger than one shows that the WEE strategy is inefficient when the dimension, d, is larger than the finite element order, p. The ACE strategy, however, produces an efficient refinement process for any combination of d and p.

Future work will include application of these grid refinement strategies to problems with singularities in multiple spatial dimensions. Another idea to be explored in the future is to enhance the refinement strategies by allowing double or triple refinement for some elements, determining in each step the optimal number of elements to be refined once, twice, and three times. More realistic measures of computational work must also be considered; these may, for instance, take into account matrix assembly costs and multigrid convergence factors, and their dependence on the finite element order and the spatial dimension of the problem. Another topic of interest is the
parallelization of the efficiency-based refinement strategies. Binning strategies need to be considered to reduce the work for minimizing the efficiency measures and to reduce the communication between processors [4]. Load balancing issues are also important for parallel adaptive methods (see, e.g. [10]). After initial solution of a coarse-level problem on a single processor, the domain may be partitioned such that each parallel processor receives a subdomain with approximately the same amount of error. This may be a fruitful strategy for load balancing in that, as the grid becomes finer, the optimal refinement approaches global refinement, which requires minimal load balancing. This will be explored in future research.

REFERENCES

1. Rüde U. Mathematical and Computational Techniques for Multilevel Adaptive Methods. Frontiers in Applied Mathematics, vol. 13. SIAM: Philadelphia, 1993.
2. Verfürth R. A Review of a Posteriori Error Estimation and Adaptive Mesh-Refinement Techniques. Teubner/Wiley: Stuttgart, 1996.
3. Schwab C. p- and hp-Finite Element Methods. Clarendon Press: Oxford, 1998.
4. Berndt M, Manteuffel TA, McCormick SF. Local error estimates and adaptive refinement for first-order system least squares (FOSLS). Electronic Transactions on Numerical Analysis 1997; 6:35-43.
5. Gui W, Babuška I. The h, p and hp versions of the finite element method in 1 dimension, parts I, II, III. Numerische Mathematik 1986; 49:577-683.
6. Brenner SC, Scott LR. The Mathematical Theory of Finite Element Methods. Springer: New York, 1996.
7. Cai Z, Lazarov R, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial differential equations. I. SIAM Journal on Numerical Analysis 1994; 31:1785-1799.
8. Cai Z, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial differential equations. II. SIAM Journal on Numerical Analysis 1997; 34:425-454.
9. Bochev PB, Gunzburger MD. Finite element methods of least-squares type. SIAM Review 1998; 40:789-837.
10. Bank RE, Holst MJ. A new paradigm for parallel adaptive meshing algorithms. SIAM Review 2003; 45:292-323.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:115–139 Published online 29 October 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.559
Distance-two interpolation for parallel algebraic multigrid

Hans De Sterck¹, Robert D. Falgout², Joshua W. Nolting³ and Ulrike Meier Yang²,*,†

¹ Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada N2L 3G1
² Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551, U.S.A.
³ Department of Applied Mathematics, University of Colorado at Boulder, Campus Box 526, Boulder, CO 80302, U.S.A.
SUMMARY

Algebraic multigrid (AMG) is one of the most efficient and scalable parallel algorithms for solving sparse linear systems on unstructured grids. However, for large 3D problems, the coarse grids that are normally used in AMG often lead to growing complexity in terms of memory use and execution time per AMG V-cycle. Sparser coarse grids, such as those obtained by the parallel modified independent set (PMIS) coarsening algorithm, remedy this complexity growth but lead to nonscalable AMG convergence factors when traditional distance-one interpolation methods are used. In this paper, we study the scalability of AMG methods that combine PMIS coarse grids with long-distance interpolation methods. AMG performance and scalability are compared for previously introduced interpolation methods as well as new variants of them for a variety of relevant test problems on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers. Copyright © 2007 John Wiley & Sons, Ltd.

Received 11 May 2007; Revised 20 September 2007; Accepted 21 September 2007

KEY WORDS: algebraic multigrid; long-range interpolation; parallel implementation; reduced complexity; truncation
*Correspondence to: Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551, U.S.A.
†E-mail: [email protected]
Contract/grant sponsor: U.S. Department of Energy; contract/grant number: W-7405-Eng-48

1. INTRODUCTION

Algebraic multigrid (AMG) [1-4] is an efficient, potentially scalable algorithm for sparse linear systems on unstructured grids. However, when applied to large 3D problems, the classical algorithm
often generates unreasonably large complexities with regard to memory use as well as computational operations. Recently, we suggested a new parallel coarsening algorithm, called the parallel modified independent set (PMIS) algorithm [5], which is based on a parallel independent set algorithm suggested in [6]. The use of this coarsening algorithm in combination with a slight modification of Ruge and Stüben's classical interpolation scheme [2] leads to significantly lower complexities as well as significantly lower setup and cycle times. For various test problems, such as isotropic and grid-aligned anisotropic diffusion operators, one obtains scalable results, particularly when AMG is used in combination with Krylov methods. However, AMG convergence factors are severely impacted for more complicated problems, such as problems with rotated anisotropies or highly discontinuous material properties. Since classical interpolation methods, which use only distance-one neighbors for their interpolatory set, are not sufficient for these coarse grids, we investigate interpolation operators that also include distance-two neighbors.

In this paper, we focus on the following distance-two interpolation operators: we study three methods proposed in [3], namely, standard interpolation, multipass interpolation, and the use of Jacobi interpolation to improve other interpolation operators, and we investigate two extensions of classical interpolation, which we denote 'extended' and 'extended+i' interpolation. Our investigation shows that all of the long-distance interpolation strategies, except for multipass interpolation, significantly improve AMG convergence factors compared with classical interpolation. Multipass interpolation has very small computational complexity, but shows poor numerical scalability, which can, however, be improved with a Krylov accelerator. All other long-distance interpolation operators showed increased complexities. While the increase is not very significant for 2D problems, it is of concern in the 3D case. Therefore, we also investigated complexity reducing strategies, such as the use of smaller sets of interpolation points and interpolation truncation. The use of these strategies led to AMG methods with significantly improved overall scalability.

The paper is organized as follows. In Section 2, we briefly describe AMG. In Section 3, distance-one interpolation operators are presented, and Section 4 describes long-range interpolation operators. In Section 5, the computational cost of the interpolation strategies is investigated, and in Section 6 some sequential numerical results are given, which motivate the following sections. Section 7 presents various complexity reducing strategies. Section 8 investigates the parallel implementation of the methods. Section 9 presents parallel scaling results for a variety of test problems, and Section 10 contains the conclusions.
2. ALGEBRAIC MULTIGRID

In this section, we give an outline of the basic principles and techniques that comprise AMG, and we define terminology and notation. Detailed explanations may be found in [2, 3, 7]. Consider a problem of the form

$$Au = f \tag{1}$$

where $A$ is an $n \times n$ matrix with entries $a_{ij}$. For convenience, the indices are identified with grid points, so that $u_i$ denotes the value of $u$ at point $i$, and the grid is denoted by $\Omega = \{1, 2, \ldots, n\}$.
In any multigrid method, the central idea is that 'smooth error' $e$ that is not eliminated by relaxation must be removed by coarse-grid correction. This is done by solving the residual equation $Ae = r$ on a coarser grid, then interpolating the error back to the fine grid and using it to correct the fine-grid approximation. Using superscripts to indicate the level number, where 1 denotes the finest level so that $A^1 = A$ and $\Omega^1 = \Omega$, AMG needs the following components: grids $\Omega^1 \supset \Omega^2 \supset \cdots \supset \Omega^M$, grid operators $A^1, A^2, \ldots, A^M$, interpolation operators $P^k$, restriction operators $R^k$ (often $R^k = (P^k)^T$), and smoothers $S^k$, where $k = 1, 2, \ldots, M-1$.

Most of these components are determined in a first step, known as the setup phase. During the setup phase, on each level $k$, $k = 1, \ldots, M-1$, $\Omega^{k+1}$ is determined using a coarsening algorithm, $P^k$ and $R^k$ are defined, and $A^{k+1}$ is determined via the Galerkin condition $A^{k+1} = R^k A^k P^k$. Once the setup phase is completed, the solve phase, a recursively defined cycle, can be performed as follows:

Algorithm MGV($A^k$, $R^k$, $P^k$, $S^k$, $u^k$, $f^k$):
  If $k = M$, solve $A^M u^M = f^M$ with a direct solver. Otherwise:
    Apply smoother $S^k$ $\nu_1$ times to $A^k u^k = f^k$.
    Perform coarse-grid correction:
      Set $r^k = f^k - A^k u^k$.
      Set $r^{k+1} = R^k r^k$.
      Set $e^{k+1} = 0$.
      Apply MGV($A^{k+1}$, $R^{k+1}$, $P^{k+1}$, $S^{k+1}$, $e^{k+1}$, $r^{k+1}$).
      Interpolate $e^k = P^k e^{k+1}$.
      Correct the solution: $u^k \leftarrow u^k + e^k$.
    Apply smoother $S^k$ $\nu_2$ times to $A^k u^k = f^k$.

In the remainder of the paper, the index $k$ is dropped for simplicity. The algorithm above describes a V($\nu_1$, $\nu_2$)-cycle; other, more complex cycles such as W-cycles are described in [7]. In every V-cycle, the error is reduced by a certain factor, called the convergence factor. A sequence of V-cycles is executed until the error is reduced below a specified tolerance. For a scalable AMG method, the convergence factor is bounded away from one independently of the problem size $n$, and the computational work in both the setup and solve phases is linearly proportional to the problem size $n$. While AMG was originally developed in the context of symmetric M-matrix problems, it has been applied successfully to a much wider class of problems. We assume in this paper that $A$ has positive diagonal elements.
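To make the cycle concrete, the following is a minimal Python sketch of the MGV algorithm above (illustrative only, not the solver used in this paper). It assumes the setup phase has already produced the lists A, R, P of level operators, with A[k+1] = R[k] A[k] P[k] per the Galerkin condition, and it uses Gauss-Seidel sweeps in the role of the smoothers $S^k$; all names are ours:

```python
# Minimal V(nu1, nu2)-cycle sketch built from scipy.sparse level operators.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def smooth(A, u, f, sweeps):
    # Forward Gauss-Seidel sweeps playing the role of the smoother S^k.
    L = sp.tril(A, format='csr')                 # lower triangle + diagonal
    U = (A - L).tocsr()
    for _ in range(sweeps):
        u = spla.spsolve_triangular(L, f - U @ u, lower=True)
    return u

def mgv(A, R, P, u, f, k=0, nu1=1, nu2=1):
    if k == len(A) - 1:                          # coarsest grid: direct solve
        return spla.spsolve(A[k].tocsc(), f)
    u = smooth(A[k], u, f, nu1)                  # pre-smoothing
    r = f - A[k] @ u                             # residual
    e = mgv(A, R, P, np.zeros(R[k].shape[0]), R[k] @ r, k + 1, nu1, nu2)
    u = u + P[k] @ e                             # coarse-grid correction
    return smooth(A[k], u, f, nu2)               # post-smoothing
```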
3. DISTANCE-ONE INTERPOLATION STRATEGIES

In this section, we first give some definitions as well as some general remarks, and then recall possibly the simplest interpolation strategy, the so-called direct interpolation strategy [3]. This is followed by a description of the classical distance-one AMG interpolation method that was introduced by Ruge and Stüben [2].
3.1. Definitions and remarks

One of the concepts used in the following sections is strength of connection. A point $j$ strongly influences a point $i$, or $i$ strongly depends on $j$, if

$$-a_{ij} > \theta \max_{k \ne i} (-a_{ik}) \tag{2}$$

where $0 < \theta < 1$ is the strength threshold. We set $\theta = 0.25$ in the remainder of the paper. We define the measure of a point $i$ as the number of points that strongly depend on $i$. When PMIS coarsening is used, a positive random number smaller than 1 is added to the measure to distinguish between neighboring points that strongly influence the same number of points. In the PMIS coarsening algorithm, points that do not strongly influence any other points are initialized as F-points. Using this concept of strength of connection, we define the following sets:

$$N_i = \{ j \ne i \mid a_{ij} \ne 0 \}, \quad S_i = \{ j \in N_i \mid j \text{ strongly influences } i \}, \quad F_i^s = F \cap S_i, \quad C_i^s = C \cap S_i, \quad N_i^w = N_i \setminus (F_i^s \cup C_i^s)$$

In classical AMG [2], the interpolation of the error at the F-point $i$ takes the form

$$e_i = \sum_{j \in C_i} w_{ij} e_j \tag{3}$$

where $w_{ij}$ is an interpolation weight determining the contribution of the value $e_j$ to $e_i$, and $C_i \subset C$ is the coarse interpolatory set of F-point $i$. In most classical approaches to AMG interpolation, $C_i$ is a subset of the nearest neighbors of grid point $i$, i.e. $C_i \subset N_i$, and longer-range interpolation is not considered. The points to which $i$ is connected comprise three sets: $C_i^s$, $F_i^s$ and $N_i^w$. Based on assumptions on small residuals for smooth error [1-3, 7], an interpolation formula can be derived as follows. The assumption that algebraically smooth error has small residuals after relaxation, $Ae \approx 0$, can be rewritten as

$$a_{ii} e_i \approx -\sum_{j \in N_i} a_{ij} e_j \tag{4}$$

or

$$a_{ii} e_i \approx -\sum_{j \in C_i^s} a_{ij} e_j - \sum_{j \in F_i^s} a_{ij} e_j - \sum_{j \in N_i^w} a_{ij} e_j \tag{5}$$

From this expression, various interpolation formulae can be derived. We use the terminology of [3] for the various interpolation strategies.
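For illustration, the strength test (2) and the sets above can be sketched as follows for one row of a sparse matrix (a minimal sketch under the stated definitions; the function and argument names, including the C/F marker array cf, are ours, not from the paper):

```python
# Sketch of the strength-of-connection test (2) and the sets N_i, S_i,
# F_i^s, C_i^s, N_i^w for row i of a scipy.sparse matrix; theta is the
# strength threshold (0.25 in this paper). Illustrative only.
import numpy as np
import scipy.sparse as sp

def neighbor_sets(A, i, cf, theta=0.25):
    row = A.tocsr().getrow(i)
    cols, vals = row.indices, row.data
    off = cols != i                            # off-diagonal entries of row i
    N_i = set(cols[off])
    # j strongly influences i if -a_ij > theta * max_{k != i}(-a_ik)
    cutoff = theta * (-vals[off]).max() if off.any() else 0.0
    S_i = {j for j, a in zip(cols[off], vals[off]) if -a > cutoff}
    F_i_s = {j for j in S_i if cf[j] == 'F'}   # strong F-neighbors
    C_i_s = {j for j in S_i if cf[j] == 'C'}   # strong C-neighbors
    N_i_w = N_i - (F_i_s | C_i_s)              # weak neighbors
    return N_i, S_i, F_i_s, C_i_s, N_i_w
```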
3.2. Direct interpolation

The so-called 'direct interpolation' strategy [3] has one of the simplest interpolation formulae. The coarse interpolatory set is chosen as $C_i = C_i^s$, and

$$w_{ij} = -\frac{a_{ij}}{a_{ii}} \frac{\sum_{k \in N_i} a_{ik}}{\sum_{k \in C_i^s} a_{ik}}, \quad j \in C_i^s \tag{6}$$
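A corresponding sketch of the direct weights (6), reusing the hypothetical neighbor_sets helper from the previous sketch (again illustrative, not a production implementation):

```python
# Sketch of the direct interpolation weights (6) for a single F-point i.
def direct_weights(A, i, cf, theta=0.25):
    N_i, _, _, C_i_s, _ = neighbor_sets(A, i, cf, theta)
    row = A.tocsr().getrow(i).toarray().ravel()
    scale = sum(row[k] for k in N_i) / sum(row[k] for k in C_i_s)
    return {j: -(row[j] / row[i]) * scale for j in C_i_s}  # w_ij, j in C_i^s
```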
This leads to an interpolation which is often not accurate enough. Nevertheless, we mention this approach here, since various other interpolation operators that we consider are based on it. This method is denoted by 'direct' in the tables presented below. In [3] it is also suggested to separate positive and negative coefficients when determining the weights, a strategy that can help when one encounters large positive off-diagonal matrix coefficients. We do not consider this approach here, since it did not lead to an improvement for the problems considered.

3.3. Classical interpolation

A generally more accurate distance-one interpolation formula is the interpolation suggested by Ruge and Stüben in [2], which we call 'classical interpolation' ('clas'). Again, $C_i = C_i^s$, but the contribution from strongly influencing F-points (the points in $F_i^s$) in (5) is taken into account more carefully. An appropriate approximation for the errors $e_j$ of those strongly influencing F-points may be defined as

$$e_j \approx \frac{\sum_{k \in C_i} a_{jk} e_k}{\sum_{k \in C_i} a_{jk}} \tag{7}$$

This approximation can be justified by the observation that smooth error varies slowly in the direction of strong connection. The denominator simply ensures that constants are interpolated exactly. Replacing $e_j$ with a sum over the elements $k$ of the coarse interpolatory set $C_i$ corresponds to taking into account strong F-F connections using C-points that are common between the F-points. Note that, when the two F-points $i$ and $j$ do not have a common C-point in $C_i^s$ and $C_j^s$, the denominator in (7) is small or vanishing. Weak connections (from the points in $N_i^w$) are generally not important and, in (5), the errors $e_j$, $j \in N_i^w$, are replaced by $e_i$. This leads to the following formula for the interpolation weights:

$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} \frac{a_{ik} a_{kj}}{\sum_{m \in C_i^s} a_{km}} \right), \quad j \in C_i^s \tag{8}$$

In our experiments this interpolation is further modified as proposed in [8] to avoid extremely large interpolation weights that can lead to divergence. The interpolation above was suggested based on a coarsening algorithm that ensured that two strongly connected F-points always have a common coarse neighbor. Since this condition is no longer guaranteed when using PMIS coarsening [5], it may happen that the term $\sum_{m \in C_i^s} a_{km}$ in Equation (8) vanishes. In our previous paper on the PMIS coarsening method [5], we modified interpolation formula (8) such that, if this case occurs, $a_{ik}$ is added to the diagonal term (the term $a_{ii} + \sum_{k \in N_i^w} a_{ik}$ in Equation (8)), i.e. the strongly influencing neighbor point $k$ of $i$ is treated similarly to a weak connection of $i$. In what follows, we denote the set of strongly connected neighbors $k$ of $i$ that are F-points but do not have a common C-point with $i$, i.e. $C_i^s \cap C_k^s = \emptyset$, by $F_i^{s*}$.
Figure 1. Example illustrating a situation occurring with PMIS coarsening, which will not correctly be treated by direct or classical interpolation. Black points denote C-points, white points denote F-points, and the arrow from i to l denotes that i strongly depends on l.
Combining this with the modification suggested in [8], we obtain the following interpolation formula:

$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \cup F_i^{s*}} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s \setminus F_i^{s*}} \frac{a_{ik} \bar{a}_{kj}}{\sum_{m \in C_i^s} \bar{a}_{km}} \right), \quad j \in C_i^s \tag{9}$$

where

$$\bar{a}_{ij} = \begin{cases} 0 & \text{if } \mathrm{sign}(a_{ij}) = \mathrm{sign}(a_{ii}) \\ a_{ij} & \text{otherwise} \end{cases}$$
In this paper we refer to formula (9) as 'classical interpolation'. The numerical results presented in [5] showed that this interpolation formula, which is based on Ruge and Stüben's original distance-one interpolation formula [2], resulted in AMG methods with acceptable performance when used with PMIS-coarsened grids for various problems, but only when the AMG cycle is accelerated by a Krylov subspace method. Without such acceleration, interpolation formula (9) is not accurate enough on PMIS-coarsened grids: AMG convergence factors deteriorate quickly as a function of problem size, and scalability is lost. For various problems, such as problems with rotated anisotropies or problems with large discontinuities, adding Krylov acceleration did not remedy the scalability problems.

One of the issues is that distance-one interpolation schemes do not treat situations similar to the one illustrated in Figure 1 correctly. Here we have an F-point with measure smaller than 1 that has no coarse neighbors. This situation can occur, for example, for a fairly large strength threshold. For both classical and direct interpolation, the interpolated error at this point vanishes, and coarse-grid correction is not able to reduce the error there.

A major topic of this paper is to investigate whether distance-two interpolation methods are able to restore grid-independent convergence to AMG cycles that use PMIS-coarsened grids, without compromising scalability in terms of memory use and execution time per AMG V-cycle.
4. LONG-RANGE INTERPOLATION STRATEGIES

In this section, various long-distance interpolation methods are described. The parallel implementation of some of these interpolation methods and parallel scalability results on PMIS-coarsened grids are discussed later in this paper.

4.1. Multipass interpolation

Multipass interpolation ('mp') is suggested in [3] and is useful for low-complexity coarsening algorithms, particularly so-called aggressive coarsening [3]. We suggested it in [5] as a possible
interpolation scheme to fix some of the problems that we saw when using our classical interpolation scheme (9). Multipass interpolation proceeds as follows:

1. Use direct interpolation for all F-points $i$ for which $C_i^s \ne \emptyset$. Place these points in the set $F^*$.
2. For all $i \in F \setminus F^*$ with $F^* \cap F_i^s \ne \emptyset$, replace, in Equation (4), for all $j \in F_i^s \cap F^*$, $e_j$ by $\sum_{k \in C_j} w_{jk} e_k$, where $C_j$ is the interpolatory set for $e_j$. Apply direct interpolation to the new equation. Add $i$ to $F^*$. Repeat step 2 until $F^* = F$.

Multipass interpolation is fairly cheap. However, it is not very powerful, since it is based on direct interpolation. If applied to PMIS, it still ends up being direct interpolation for most F-points. However, it fixes the situation illustrated in Figure 1: if we apply multipass interpolation, the point $i$ will be interpolated by the coarse neighbors (black points) of F-points $k$ and $l$.

4.2. Jacobi interpolation

Another approach that remedies convergence issues caused by distance-one interpolation formulae is Jacobi interpolation [3]. This approach uses an existing interpolation operator $P^{(0)}$ and applies one or more Jacobi iteration steps to the F-point portion of the interpolation operator, leading to a more accurate interpolation operator $P^{(n)}$. Assuming that $A$ and the interpolation operator $P^{(n)}$ are reordered according to the C/F-splitting and can be written as

$$A = \begin{pmatrix} A_{FF} & A_{FC} \\ A_{CF} & A_{CC} \end{pmatrix}, \qquad P^{(n)} = \begin{pmatrix} P_{FC}^{(n)} \\ I_{CC} \end{pmatrix} \tag{10}$$

then Jacobi iteration on $A_{FF} e_F + A_{FC} e_C = 0$, with initial guess $e_F = P_{FC}^{(0)} e_C$, leads to

$$P_{FC}^{(n)} = (I_{FF} - D_{FF}^{-1} A_{FF}) P_{FC}^{(n-1)} - D_{FF}^{-1} A_{FC} \tag{11}$$

where $D_{FF}$ is the diagonal matrix containing the diagonal of $A_{FF}$, and $I_{FF}$ and $I_{CC}$ are identity matrices. If we apply this approach to a distance-one interpolation operator similar to classical interpolation, we obtain an improved long-distance interpolation operator. This approach is also recommended for improving multipass interpolation. We include results where classical interpolation is used followed by one step of Jacobi interpolation in our numerical experiments and denote them by 'clas+j'.
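One Jacobi step (11) is straightforward to express with sparse matrix blocks; the following minimal sketch assumes A_FF, A_FC, and P_FC are already extracted in C/F ordering (names are ours, not from the paper):

```python
# Sketch of one Jacobi interpolation step (11). A_FF, A_FC are the
# C/F-ordered blocks from (10) and P_FC is the F-row block of the current
# interpolation operator; D_FF is the diagonal of A_FF. Illustrative only.
import scipy.sparse as sp

def jacobi_interpolation_step(A_FF, A_FC, P_FC):
    Dinv = sp.diags(1.0 / A_FF.diagonal())
    I_FF = sp.identity(A_FF.shape[0], format='csr')
    # P_FC^(n) = (I_FF - D_FF^{-1} A_FF) P_FC^(n-1) - D_FF^{-1} A_FC
    return (I_FF - Dinv @ A_FF) @ P_FC - Dinv @ A_FC
```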
4.3. Standard interpolation

Standard interpolation ('std') extends the interpolatory set that is used for direct interpolation [3]. This is done by extending the stencil obtained through (4) via substitution of every $e_j$ with $j \in F_i^s$ by $-(1/a_{jj}) \sum_{k \in N_j} a_{jk} e_k$. This leads to the formula

$$\hat{a}_{ii} e_i + \sum_{j \in \hat{N}_i} \hat{a}_{ij} e_j \approx 0 \tag{12}$$

with the new neighborhood $\hat{N}_i = N_i \cup \bigcup_{j \in F_i^s} N_j$ and the new coarse point set $\hat{C}_i = C_i \cup \bigcup_{j \in F_i^s} C_j$. This can greatly increase the size of the interpolatory set.
Figure 2. Example of the interpolatory points for a 5-point stencil (left) and a 9-point stencil (right). The gray point is the point to be interpolated, black points are C-points and white points are F-points.
See the left example in Figure 2 and consider point $i$. Using direct or classical interpolation, $i$ would only be interpolated by the two distance-one coarse points. However, when we include the coarse points of its strong fine neighbors $m$ and $n$, two additional interpolatory points $k$ and $l$ are added, leading to a potentially more accurate interpolation formula. Standard interpolation is now defined by applying direct interpolation to the new stencil, leading to

$$w_{ij} = -\frac{\hat{a}_{ij}}{\hat{a}_{ii}} \frac{\sum_{k \in \hat{N}_i} \hat{a}_{ik}}{\sum_{k \in \hat{C}_i} \hat{a}_{ik}} \tag{13}$$

4.4. Extended interpolation

It is possible to extend the classical interpolation formula so that the interpolatory set includes C-points that are a distance of two away from the F-point to be interpolated, i.e. to apply the classical interpolation formula while using the same interpolatory set that is used in standard interpolation (see Figure 2): $\hat{C}_i = C_i \cup \bigcup_{j \in F_i^s} C_j$. Using the same reasoning that leads to the classical interpolation formula (8), the following approximate statement can be made regarding the error at an F-point $i$:

$$\left( a_{ii} + \sum_{j \in N_i^w} a_{ij} \right) e_i \approx -\sum_{j \in \hat{C}_i} a_{ij} e_j - \sum_{j \in F_i^s} a_{ij} \frac{\sum_{k \in \hat{C}_i} a_{jk} e_k}{\sum_{k \in \hat{C}_i} a_{jk}} \tag{14}$$

It then follows immediately that the interpolation weights using the extended coarse interpolatory set $\hat{C}_i$ can be defined as

$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \setminus \hat{C}_i} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} \frac{a_{ik} \bar{a}_{kj}}{\sum_{m \in \hat{C}_i} \bar{a}_{km}} \right), \quad j \in \hat{C}_i \tag{15}$$
Figure 3. Finite difference 1D Laplace example.
Note that this may lead to some weak coarse points in $N_i^w$ being included in the interpolatory set $\hat{C}_i$, if they are strongly connected to a neighbor point of $i$. This new interpolation formula deals efficiently with strong F-F connections that do not share a common C-point. We call this interpolation strategy 'extended interpolation' ('ext').

4.5. Extended+i interpolation

While extended interpolation remedies many problems that occur with classical interpolation, it does not always lead to the desired weights. Consider the case given in Figure 3. Here we have a 1D Laplace problem generated by finite differences. Points 1 and 2 are strongly connected F-points, and points 0 and 3 are coarse points. Clearly $\{0, 3\}$ is the interpolatory set for point 1 in the case of extended interpolation. If we apply formula (15) to this example to calculate $w_{1,0}$ and $w_{1,3}$, we obtain

$$w_{1,0} = 0.5, \qquad w_{1,3} = 0.5$$

This is a better result than we would obtain for direct interpolation (6) and classical interpolation (9),

$$w_{1,0} = 1, \qquad w_{1,3} = 0$$

but worse than standard interpolation (13), for which we obtain the intuitively best interpolation weights:

$$w_{1,0} = \tfrac{2}{3}, \qquad w_{1,3} = \tfrac{1}{3} \tag{16}$$

This can be remedied if we include not only connections $a_{jk}$ from strong fine neighbors $j$ of $i$ to points $k$ of the interpolatory set, but also connections $a_{ji}$ from $j$ to point $i$ itself. An alternative to expression (7) for the error in strongly connected F-points is then given by

$$e_j \approx \frac{\sum_{k \in C_i \cup \{i\}} a_{jk} e_k}{\sum_{k \in C_i \cup \{i\}} a_{jk}} \tag{17}$$

This can be rewritten as

$$e_j \approx \frac{\sum_{k \in C_i} a_{jk} e_k}{\sum_{k \in C_i \cup \{i\}} a_{jk}} + \frac{a_{ji} e_i}{\sum_{k \in C_i \cup \{i\}} a_{jk}} \tag{18}$$

which then, in a similar way as before, leads to the interpolation weights

$$w_{ij} = -\frac{1}{\tilde{a}_{ii}} \left( a_{ij} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{kj}}{\sum_{l \in \hat{C}_i \cup \{i\}} \bar{a}_{kl}} \right), \quad j \in \hat{C}_i \tag{19}$$

with now

$$\tilde{a}_{ii} = a_{ii} + \sum_{n \in N_i^w \setminus \hat{C}_i} a_{in} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{ki}}{\sum_{l \in \hat{C}_i \cup \{i\}} \bar{a}_{kl}} \tag{20}$$

We call this modified extended interpolation 'extended+i', and refer to it as 'ext+i' (or sometimes 'e+i' to save space) in the tables below. If we apply it to the example illustrated in Figure 3, we obtain the weights (16).
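The Figure 3 example can be checked numerically. The following sketch hand-codes formulas (19) and (20) for F-point 1 of the four-point 1D Laplace system (here $N_1^w$ is empty, so the weak-connection sum in (20) drops out) and confirms the weights (16); all variable names are ours:

```python
# Numeric check (illustrative) of the Figure 3 example: 1D Laplace on
# points 0..3 with C = {0, 3}, F = {1, 2}; evaluate ext+i for F-point 1.
import numpy as np

A = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
i, k = 1, 2                       # point to interpolate, strong F-neighbor
C_hat = [0, 3]                    # extended interpolatory set of point 1
abar = np.where(np.sign(A) == np.sign(A[i, i]), 0.0, A)   # sign filter a-bar
denom = sum(abar[k, l] for l in C_hat + [i])              # = -2
a_tilde = A[i, i] + A[i, k] * abar[k, i] / denom          # (20): 2 - 1/2
w = {j: -(A[i, j] + A[i, k] * abar[k, j] / denom) / a_tilde for j in C_hat}
assert abs(w[0] - 2/3) < 1e-12 and abs(w[3] - 1/3) < 1e-12  # weights (16)
```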
5. COMPUTATIONAL COST OF INTERPOLATION STRATEGIES

In this section we consider the cost of some of the interpolation operators described in the previous sections. We use the following notation:

N_c: total number of coarse points
N_f: total number of fine points
n_k: average number of distance-k neighbor points
c_k: average number of distance-k interpolatory points
f_k: average number of strong fine distance-k neighbors
w_k: average number of weak distance-k neighbors
s_k: average number of common distance-k interpolatory points
f_w: average number of strong neighbors treated weakly

Here, $f_w$ indicates the number of strong F-neighbors that are treated weakly, which occur only for classical interpolation (8). Also, $s_k$ denotes the average number of C-points that are distance-one neighbors of $j \in F_i^s$ and also distance-k interpolatory points for $i$, the point to be interpolated; i.e. $s_k$ is the number of nonzero coefficients $a_{jl}$, where $j \in F_i^s$ and $l$ is a distance-k interpolatory point, divided by the number of distance-k interpolatory points for $i$. Note that $s_k$ is usually smaller than, and at most equal to, $c_k$. Note also that $n_k = f_k + c_k + w_k$.

In our considerations we assume a compressed sparse row data format, i.e. three arrays are used to store the matrix: a real array that contains the coefficients of the matrix, an integer array that contains the column indices for each coefficient, and an integer array that contains pointers to the beginning of each row for the other two arrays. We also assume an additional integer array that indicates whether a point is an F- or a C-point.

For all interpolation operators mentioned before, it is necessary to first determine the interpolatory set. At the same time, the data structure for the interpolation operator can be determined. This can be accomplished by sweeping through each row that belongs to an F-point: coarse neighbors are identified via integer comparisons, and the pointer array for the interpolation operator is generated. For the distance-two interpolation schemes, it is also necessary to check neighbors of strong fine neighbors. This requires $n_1$ comparisons for direct and classical interpolation, and $(f_1 + 1)n_1$ comparisons for extended, extended+i and standard interpolation. The final data structure contains $N_c + N_f c_1$ coefficients for classical and direct interpolation, and $N_c + N_f(c_1 + c_2)$ coefficients for extended(+i) and standard interpolation. Next, the interpolation data structure is filled.
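For concreteness, the assumed storage layout looks as follows (a small illustrative fragment, not actual solver data):

```python
# Sketch of the compressed sparse row storage assumed above: one real array
# of coefficients, one integer array of column indices, one row-pointer
# array, plus the F/C marker array. The values below are illustrative.
import numpy as np

data      = np.array([2.0, -1.0, -1.0, 2.0, -1.0])  # matrix coefficients
col_index = np.array([0,    1,    0,   1,    2  ])  # column of each entry
row_ptr   = np.array([0, 2, 5])       # row i is data[row_ptr[i]:row_ptr[i+1]]
cf_marker = np.array([1, -1])         # e.g. 1 marks a C-point, -1 an F-point
```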
For direct interpolation, all that is required is to sweep through a whole row once to compute the common factor $-\sum_{k \in N_i} a_{ik} / (a_{ii} \sum_{k \in C_i^s} a_{ik})$ and then multiply the relevant matrix elements $a_{ij}$ by this factor. The sum in the denominator requires an additional $n_1$ comparisons, and the two summations require $n_1 + 1$ additions.

For classical, extended, and extended+i interpolation, one needs to first compute, for each point $k \in F_i^s \setminus F_i^{s*}$, the factor $a_{ik}/(\sum_{m \in D_i^s} a_{km})$, where $D_i^s = C_i^s$ for classical, $D_i^s = \hat{C}_i$ for extended, and $D_i^s = \hat{C}_i \cup \{i\}$ for ext+i interpolation. For example, for classical interpolation, this requires $f_1 n_1$ comparisons. After this step, all these coefficients need to be processed again in order to add their contributions to the appropriate weights, which requires an additional $f_1 n_1$ comparisons. The numbers of additions, multiplications, and divisions can be determined similarly.

For standard interpolation, the new stencil needs to be computed first, leading to $f_1 n_1$ additions and multiplications and $f_1$ divisions. This can be done when setting up the data structure, avoiding $n_1$ comparisons. After this, one proceeds just as for direct interpolation, with a much larger stencil of size $n_1 + n_2$.

The numbers of floating point additions, multiplications, and divisions needed to compute all interpolation weights for each F-point are given in Table I. Note that a sum over $m$ elements is counted as $m$ additions, assuming that we are adding to a variable that was originally 0. Also note that occurrences of products of variables, such as $f_i c_i$ or $f_i s_i$, are of order $n_i^2$, since $f_i$, $c_i$, and $s_i$ depend on $n_i$. This is also reflected in the results given in Table II for two specific examples.

Let us look at some examples to get an idea of the actual cost involved. First, consider a 5-point stencil as in Figure 2. Here, we have the following parameters: $c_1 = f_1 = 2$, $w_1 = w_2 = 0$, $n_1 = 4$, $f_w = 2$, $s_1 = 0$, $c_2 = 2$, $f_2 = 3$, $n_2 = 5$, $s_2 = 1.5$. Table II shows the resulting interpolation cost. Next, we look at an example with a bigger stencil: see the 9-point stencil in Figure 2 and Table II. The parameters are now $c_1 = 2$, $f_1 = 6$, $w_1 = w_2 = 0$, $n_1 = 8$, $f_w = 1$, $s_1 = 1$, $c_2 = 3$, $f_2 = 12$, $n_2 = 15$, $s_2 = 1$. We clearly see that a larger stencil significantly increases the ratio of classical over direct interpolation, as well as that of distance-two over distance-one interpolation.

Table III shows the times for calculating these interpolation operators for matrices with stencils of various sizes. Two 2D examples, one with a 5-point and another with a 9-point stencil, were examined on a 1000×1000 grid. The 3D examples, with a 7-point and a 27-point stencil, were examined on an 80×80×80 grid. We have also included actual measurements of the average number of interpolatory points for these examples. As expected, larger stencils lead to a larger number of operations for each interpolation operator, with a much more significant increase for distance-two interpolation operators, particularly for the 3D problems. These effects are significant, especially since on coarser levels the stencils become larger and, thus, impact the total setup time.
Table I. Computational cost for various interpolation operators.

Interpolation   Additions               Multiplications           Divisions     Comparisons
direct          n_1+1                   c_1+1                     1             2n_1
clas            2f_1s_1+w_1+f_w         f_1s_1+c_1                f_1-f_w+1     (2f_1+1)n_1
std             f_1n_1+n_1+n_2+1        f_1n_1+c_1+c_2+1          f_1+1         (f_1+2)n_1+n_2
ext             2f_1(s_1+s_2)+w_1       f_1(s_1+s_2)+c_1+c_2      f_1+1         (3f_1+1)n_1
ext+i           2f_1(s_1+s_2+1)+w_1     f_1(s_1+s_2+1)+c_1+c_2    f_1+1         (3f_1+1)n_1
Table II. Cost for the examples in Figure 2.

                Left example in Figure 2        Right example in Figure 2
Interpolation   Adds   Mults  Divs   Comps      Adds   Mults  Divs   Comps
direct          5      3      1      8          9      3      1      16
clas            2      2      1      20         13     8      6      104
std             18     13     3      21         72     54     7      79
ext             6      7      3      28         24     17     7      152
ext+i           10     9      3      28         36     23     7      152
Table III. Average number of distance-one (c_1) and distance-two (c_2) interpolatory points and times for various interpolation operators.

                            Interpolation time
Stencil    c_1    c_2    direct   clas   std    ext    ext+i
5-point    2.3    1.9    0.27     0.35   0.64   0.51   0.54
9-point    1.8    2.8    0.36     1.11   2.16   2.09   2.48
7-point    2.7    4.1    0.19     0.31   0.80   0.73   0.81
27-point   2.3    7.2    0.40     3.72   8.00   7.43   8.32
6. SEQUENTIAL NUMERICAL RESULTS

While the previous section examined the computational cost of the interpolation operators, we are of course mainly interested in the performance of the complete solver, which also includes coarsening and the generation of the coarse-grid operators, as well as the solve phase. We apply the new and old interpolation operators here to a variety of test problems from [5] to compare their efficiency. We did not include results using direct interpolation, since it performs worse than classical and multipass interpolation for the problems considered, nor results using multipass interpolation followed by Jacobi interpolation, since those results were very similar to the ones obtained for 'clas+j'. All these tests were run using AMG as a solver with a strength threshold of θ = 0.25 and coarse-fine Gauss-Seidel as the smoother. The iterations were stopped when the relative residual was smaller than 10⁻⁸.

We also report the operator complexity $C_{op}$, which is defined as the sum of the numbers of nonzeros of all matrices $A^k$ divided by the number of nonzeros of the original matrix $A = A^1$. $C_{op}$ is an indicator of computational complexity and memory use, i.e. large operator complexities lead to large setup times, times per cycle, and memory requirements.
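As a small illustration, $C_{op}$ can be computed directly from a level hierarchy (a one-line sketch, assuming the level matrices are available as sparse matrices):

```python
# Sketch of the operator complexity C_op defined above: total nonzeros over
# all level operators A^1,...,A^M divided by the nonzeros of A^1. 'levels'
# is assumed to be the list of scipy.sparse level matrices.
def operator_complexity(levels):
    return sum(A.nnz for A in levels) / levels[0].nnz
```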
Table IV shows results for the 2D Poisson problem $-\Delta u = f$ using a 5-point finite difference discretization and a 9-point finite element discretization.

Table IV. AMG for the 5- and 9-point 2D Laplace problems on a 1000×1000 square with random right-hand side using different interpolation operators.

            5-point                      9-point
Method    Cop    # its   Time        Cop    # its   Time
clas      1.92   244     151.60      1.24   157     100.44
clas+j    2.65   15      26.09       1.65   9       21.03
mp        1.92   244     152.34      1.24   183     115.72
ext       2.54   16      20.24       1.60   10      18.26
ext+i     2.57   11      16.93       1.60   10      18.40
std       2.56   16      20.63       1.60   17      23.06
Table V. AMG for a problem with 45° and 60° rotated anisotropy on a 512×512 square using different interpolation operators.

            45°                          60°
Method    Cop    # its   Time        Cop    # its    Time
clas      1.90   168     38.60       1.82   >1000
clas+j    2.39   29      10.50       3.40   424      131.85
mp        1.90   163     37.16       1.82   >1000
ext       2.07   31      8.75        2.69   217      59.70
ext+i     2.07   11      4.05        2.89   97       29.78
std       2.07   13      4.53        2.89   148      43.68
Table V shows results for the 2D rotated anisotropic problem

$$-(c^2 + \varepsilon s^2) u_{xx} + 2(1-\varepsilon) s c\, u_{xy} - (s^2 + \varepsilon c^2) u_{yy} = 1 \tag{21}$$

with $s = \sin\beta$, $c = \cos\beta$, and $\varepsilon = 0.001$, for rotation angles $\beta = 45°$ and $60°$.

The use of the distance-two interpolation operators combined with PMIS shows significant improvements over classical and multipass interpolation with regard to the number of iterations as well as time. The best interpolation operator here is ext+i, which has the lowest numbers of iterations and times in general. The difference is especially significant for the problems with rotated anisotropies. The operator complexity is larger, however, as was expected.

This increase becomes more significant for 3D problems. Here we consider the partial differential equation

$$-(a u_x)_x - (a u_y)_y - (a u_z)_z = f \tag{22}$$

on an $n \times n \times n$ cube. For the Laplace problem, $a(x, y, z) = 1$. For the problem denoted by 'Jumps' we consider $a(x, y, z) = 1000$ for the interior cube $0.1 < x, y, z < 0.9$, $a(x, y, z) = 0.01$ for $0 < x, y, z < 0.1$ and for the other cubes of size 0.1×0.1×0.1 located at the corners of the unit cube, and $a(x, y, z) = 1$ elsewhere. The 27-point problem is a matrix with a 27-point stencil with the value 26 in the interior and -1 elsewhere; it is included because we also wanted to consider a problem with a larger stencil.
Table VI. AMG for a 7-point 3D Laplace problem, a problem with a 27-point stencil, and a 3D structured PDE problem with jumps on a 60×60×60 cube with a random right-hand side using different interpolation operators.

            7-point                   27-point                   Jumps
Method    Cop    # its  Time       Cop    # its  Time        Cop    # its   Time
clas      2.34   45     10.21      1.09   28     10.58       2.50   >1000
clas+j    5.12   11     20.35      1.34   8      17.10       5.37   15      20.99
mp        2.35   47     10.40      1.10   30     9.39        2.50   80      17.37
ext       4.93   11     16.70      1.35   8      21.32       5.27   15      16.89
ext+i     4.27   9      14.48      1.35   8      21.55       5.10   11      15.96
std       4.20   10     12.78      1.38   10     18.58       5.21   18      17.47
Figure 4. Number of iterations for PMIS with various interpolation operators for a 3D 7-point Laplace problem on a n ×n ×n-grid.
While for these problems AMG convergence factors for distance-two interpolation improve significantly compared with classical and multipass interpolation, as can be seen in Table VI, overall times are worse for the 7-point 3D Laplace problem as well as the 27-point problem on a 60×60×60 grid. The only problem on the 60×60×60 grid that benefits from distance-two interpolation operators with regard to time as well is the problem with jumps, which requires long-distance interpolation to converge at all. Using distance-two interpolation operators leads to complexities about twice as large as those obtained when using classical or multipass interpolation, which work relatively well for the 7- and 27-point problems on the 60×60×60 grid. However, when we scale up the problem sizes, the distance-two operators show very good scalability in terms of AMG convergence factors, as can be seen in Figure 4, which shows the number of iterations for a 3D 7-point Laplace problem on an $n \times n \times n$ grid for increasing $n$. The anticipated large differences in numbers of iterations between distance-one and distance-two interpolation show up in the 2D results of Tables IV and V on grids with 1000 points per direction, but are not yet particularly significant in the 3D results of Table VI with only 60 points per direction.
It is expected, however, that for the large problems that we want to solve on a parallel computer, distance-two interpolation operators will lead to better overall times than classical or multipass interpolation due to scalable AMG convergence factors, provided the operator complexity can be kept under control. See Section 9 for actual test results.
7. REDUCING OPERATOR COMPLEXITY

While the methods described in the previous section largely restore grid-independent convergence to AMG cycles that use PMIS-coarsened grids, they also lead to much larger operator complexities for the V-cycles. Therefore, it is necessary to consider ways to reduce these complexities while (hopefully) retaining the improved convergence. In this section we describe a few ways of achieving this.

7.1. Choosing smaller interpolatory sets

It is certainly possible to consider other interpolatory sets that are larger than $C_i^s$ but smaller than $\hat{C}_i$. In particular, a good interpolatory set would appear to be one that only extends $C_i^s$ for strong F-F connections without a common C-point, since in the other cases point $i$ is likely already surrounded by interpolatory points and an extension is not necessary. If we look at the right example in Figure 2, we see that neighbor $k$ of $i$ is the only fine neighbor that does not share a C-point with $i$. Consequently, it may be sufficient to only include points $n$ and $l$ in the extended interpolatory set. Applying this approach to extended interpolation leads to the so-called F-F interpolation [9]. The size of the interpolatory set can be decreased further if we limit the number of C-points added when an F-point without common coarse neighbors is encountered. This is done in the so-called F-F1 interpolation [9], where only the first such C-point is added. For the right example in Figure 2 this means that only point $n$ or $l$ would be added to the interpolatory set for $i$.

Choosing a smaller interpolatory set decreases $c_2$ and with it $s_2$, leading to fewer multiplications and additions for the extended interpolation methods. On the other hand, additional operations are needed to determine which coarse neighbors of strong F-points are common C-points. This means that the actual determination of the interpolation operator might not be faster than creating the extended interpolation operators. The real benefit is achieved by the fact that smaller interpolatory sets lead to smaller stencils for the coarse-grid operator and hence to smaller overall operator complexities.

Applying these methods to some of our previous test problems, we obtain the results shown in Tables VII and VIII. Here, 'x-cc' denotes that interpolation 'x' is used, but the interpolatory set is only extended when there are no common C-points. Similarly, 'x-ccs' is just like 'x-cc', except that for every strong F-point without a common C-point only a single C-point is added. The results show that 2D problems do not benefit from this strategy, since operator complexities are only slightly decreased while the number of iterations increases; therefore, total times increase. However, 3D problems can be solved much faster due to significantly decreased setup times, leading to only half the total times when the 'ccs' strategy is employed. Again, these beneficial effects are expected to be stronger on larger grids.
Table VII. AMG for the 9-point 2D Laplace problem on a 1000×1000 square with random right-hand side, and for the problems with rotated anisotropies of 0.001 on a 512×512 grid, using different interpolation operators.

             9-point                   45°                       60°
Method       Cop    # its  Time     Cop    # its  Time       Cop    # its  Time
ext          1.60   10     18.26    2.07   31     8.67       2.69   217    59.70
ext-cc       1.45   14     17.35    2.06   34     9.33       2.62   247    66.22
ext-ccs      1.43   15     17.46    2.05   34     9.13       2.42   270    67.96
ext+i        1.60   10     18.40    2.07   11     4.05       2.89   97     29.78
ext+i-cc     1.45   14     17.91    2.05   14     4.72       2.80   117    34.63
ext+i-ccs    1.42   15     17.98    2.04   14     4.73       2.51   143    38.87
Table VIII. AMG for the 7- and 27-point 3D Laplace problems and a 3D structured PDE problem with jumps on a 60×60×60 cube with a random right-hand side using different interpolation operators.

             7-point                  27-point                 Jumps
Method       Cop    # its  Time     Cop    # its  Time      Cop    # its  Time
ext          4.93   11     16.70    1.35   8      21.32     5.27   15     16.89
ext-cc       4.62   12     11.11    1.33   7      11.82     4.86   16     11.59
ext-ccs      4.00   12     8.46     1.31   7      10.34     4.23   17     9.61
ext+i        4.27   9      14.48    1.35   8      21.55     5.10   11     15.96
ext+i-cc     4.12   9      9.16     1.33   7      12.48     4.66   13     10.35
ext+i-ccs    3.64   9      7.23     1.31   7      10.95     4.00   14     8.37
Indeed, additional numerical tests (not presented here, see [9]) also show that the 'x-cc' and 'x-ccs' distance-two interpolations result in algorithms that are highly scalable as a function of problem size: $C_{op}$ tends to a constant that is significantly smaller than the $C_{op}$ value for the 'x' interpolations, and the number of iterations is nearly constant as a function of problem size and only slightly larger than the number of iterations for the full 'x' interpolation formulae [9]. This shows that using distance-two interpolation formulae with reduced complexities restores the grid-independent convergence and scalability of AMG on PMIS-coarsened grids, without the need for GMRES acceleration. This makes these methods suitable for large problems on parallel computers, as is discussed below.

7.2. Interpolation truncation

Another very effective way to reduce complexities is interpolation truncation. There are essentially two ways to truncate interpolation operators: we can choose a truncation factor $\tau$ and eliminate every weight whose absolute value is smaller than this factor, i.e. for which $|w_{ij}| < \tau$ [3], or we can limit the number of coefficients per row, i.e. keep only the $k_{max}$ largest weights in absolute value. In both cases, the remaining weights need to be rescaled so that the total sums remain unchanged. Both approaches can lead to significant reductions in setup times and operator complexities, particularly for 3D problems; but if too much is truncated, the number of iterations rises significantly, as one would expect.

We only report results for one interpolation formula (ext+i) for a 3D example here, see Table IX; similar results can be obtained using the other interpolation operators. For 2D problems, truncation leads to an increase in total time, similar to what was reported for interpolatory set restriction in the previous section.
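Both truncation variants are simple to express for a single row of interpolation weights; the following sketch (illustrative; names are ours) applies the threshold and the $k_{max}$ rule and rescales the survivors so that the row sum is preserved:

```python
# Sketch of the two truncation strategies for one row of interpolation
# weights: drop entries with |w_ij| < tau, or keep only the kmax largest in
# magnitude; surviving weights are rescaled to preserve the row sum.
import numpy as np

def truncate_row(w, tau=0.0, kmax=None):
    w = np.asarray(w, dtype=float)
    keep = np.abs(w) >= tau                        # threshold truncation
    if kmax is not None:
        largest = np.argsort(-np.abs(w))[:kmax]    # kmax largest |w_ij|
        mask = np.zeros(w.size, dtype=bool)
        mask[largest] = True
        keep &= mask
    out = np.where(keep, w, 0.0)
    if out.sum() != 0.0:
        out *= w.sum() / out.sum()                 # rescale: row sum unchanged
    return out
```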
Table IX. Effect of truncation on AMG with ext+i interpolation for a 7-point 3D Laplace problem on a 60×60×60 cube with a random right-hand side.

Truncation factor                       Max. # of weights
τ      Cop    # its  Time               k_max   Cop    # its  Time
0      4.27   9      14.48              7       3.75   9      8.63
0.1    4.13   9      10.72              6       3.42   9      7.41
0.2    3.88   9      8.52               5       3.01   10     6.42
0.3    3.39   10     6.82               4       2.73   14     6.30
0.4    3.02   13     6.60               3       2.48   24     7.41
0.5    2.75   20     7.67
8. PARALLEL IMPLEMENTATION

This section describes the parallel implementation and gives a rough idea of the cost involved, with particular focus on the increase in communication required for the distance-two interpolation formulae compared with distance-one interpolation. Since the core computation for the interpolation routines is approximately the same as in the sequential case, we only focus on the additional computations that are required for communication between processors.

In parallel, each matrix is stored using a parallel data format, the ParCSR matrix data structure, which is described and analyzed in detail in [10]. Matrices are distributed across processors by contiguous blocks of rows, which are stored via two compressed sparse row matrices, one storing the local entries and the other storing the off-processor entries. There is an additional array containing a mapping for the off-processor neighbor points. The data structure also contains the information necessary to retrieve information from distance-one off-processor neighbors. It does not, however, contain information on off-processor distance-two neighbors, which complicates the parallel implementation of distance-two interpolation operators.

When determining these neighbors, there are four scenarios that need to be considered; see Figure 5. Consider point $i$, which is the point to be interpolated and resides on Processor 0. A distance-two neighbor can reside on the same processor as $i$, like point $j$; it can be a distance-one neighbor of another point on Proc. 0, like point $l$, and therefore already be contained in the off-processor mapping; it can be a new point on a neighbor processor, like point $k$; or it can be located on a processor that is currently not a neighbor processor of Proc. 0, like point $m$.

There are basically five additional parts required for the parallel implementation, for which we give rough cost estimates below. Operations include floating point and integer operations as well as the message passing (sends and receives) required to communicate data across processors. We use the following notation: $n_1$ denotes the average number of distance-one neighbors per point, as defined previously, $p$ is the total number of processors, $q_i$ is the average number of distance-$i$ neighbor processors per processor, and $N_i^o$ is the average number of distance-$i$ off-processor points and equals the sum of the average number of distance-$i$ off-processor C-points, $C_i^o$, and distance-$i$ off-processor F-points, $F_i^o$.
Figure 5. Example of off-processor distance-two neighbors of point i. Black points are C-points, and white points are F-points.
Note that the estimates of the numbers of operations and of processors involved given below are per processor.

1. Communication of the C/F splitting for all off-processor neighbor points: This is required for all interpolation operators and takes $O(N_1^o) + O(q_1)$ operations.
2. Obtaining the row information for off-processor distance-one F-points: This step is necessary for classical and distance-two interpolation, but not for direct interpolation, which only uses local matrix coefficients to generate the interpolation formula. It requires $O(n_1 F_1^o) + O(F_1^o) + O(q_1)$ operations.
3. Determining off-processor distance-two points and additional communication information: This step is only required for distance-two interpolation operators. Finding the new off-processor points, which requires checking whether they are already contained in the map and describing the off-processor connections, takes $O(n_1 F_1^o \log(N_1^o))$ operations. Sorting the new information takes $O(N_2^o \log(N_2^o))$ operations. Obtaining the communication information for the new points using an assumed partition algorithm [11] requires $O(N_2^o) + O(\log p) + O((q_1 + q_2)\log(q_1 + q_2))$ operations. Obtaining the additional C/F splitting information takes $O(N_2^o) + O(q_1 + q_2)$ operations.
4. Communication of fine-to-coarse mappings: This step requires $O(N_1^o) + O(q_1)$ operations for distance-one interpolation and $O(N_1^o + N_2^o) + O(q_1 + q_2)$ operations for the distance-two interpolation schemes.
5. Generating the interpolation matrix communication package: This step requires $O(C_1^o) + O(\log p) + O(q_1 \log q_1)$ operations for distance-one interpolation and $O(C_1^o + C_2^o) + O(\log p) + O((q_1 + q_2)\log(q_1 + q_2))$ operations for distance-two interpolation. Note that if truncation is used, $C_i^o$ should be replaced by $\tilde{C}_i^o$ with $\tilde{C}_i^o \le C_i^o$.
9. PARALLEL NUMERICAL RESULTS

In this section, we investigate the weak scalability of the new interpolation operators by applying the resulting AMG methods to various problems. Unless stated otherwise, the following problems were run on Thunder, an Intel Itanium2 machine with 1024 nodes of four processors each, located at Lawrence Livermore National Laboratory. In this section, $p$ denotes the number of processors used.

9.1. Two-dimensional problems

We first consider 2D Laplace problems, which perform very poorly for PMIS with classical interpolation. The results we obtained for 5-point and 9-point stencils are very similar; therefore, we list only the results for the 9-point 2D Laplace problem here. Table X, which contains the numbers of iterations and total times for this problem, shows that classical interpolation performs very poorly, and multipass interpolation even worse. Nevertheless, these methods lead to the lowest operator complexity: 1.24. All long-range interpolation schemes lead to good scalable convergence, with standard interpolation performing slightly worse than classical interpolation followed by Jacobi, extended, or extended+i interpolation, which are the overall fastest methods here with the best scalability. Operator complexities are highest for clas+j at 1.65, and about 1.6 for the other three interpolation operators. When choosing the lower-complexity versions e+i-cc and e+i-ccs, with complexities of 1.45 and 1.43, convergence deteriorates somewhat compared with e+i. Since for the 2D problems setup times are fairly small and the improvement in complexities is not very significant, this increase in the number of iterations hurts the total times, and therefore there is no advantage in using low-complexity schemes for this problem. Truncated versions lead to even larger total times.

Next, we consider the 2D problem with rotated anisotropy (21). The first problem has an anisotropy of 0.001 rotated by 45°; see Table XI. Operator complexities for classical and multipass interpolation are 1.9 here; they are 2.4 for classical interpolation followed by Jacobi, and 2.1 for all remaining interpolation operators. Here, extended interpolation performs worse than standard and extended+i interpolation, with the latter giving the best results.

In Table XII, we consider the harder problem, where the anisotropy is rotated by 60°. Operator complexities are now 1.8 for classical and multipass, 3.4 for clas+j, 2.9 for e+i and std, 2.7 for ext, 2.8 for e+i-cc, and 2.5 for e+i-ccs. The fastest convergence is obtained for extended+i interpolation, followed by e+i-cc, e+i-ccs, std, and ext. The other interpolations fail to converge within 500 iterations. While long-range interpolation operators improve convergence, it is still not good enough; hence, this problem should be solved using Krylov subspace acceleration.
Table X. Times in seconds (number of iterations) for a 9-point 2D Laplace problem with 300×300 points per processor; 'n.c.' denotes 'not converging within 500 iterations'.

p      clas       clas+j   mp        std      ext      e+i      e+i-cc   e+i-ccs
1      15(88)     3(9)     18(105)   4(15)    3(10)    3(10)    3(12)    3(13)
64     48(245)    4(11)    57(278)   6(20)    4(12)    4(12)    5(16)    5(19)
256    79(400)    5(12)    85(436)   8(27)    5(13)    5(13)    5(19)    6(21)
1024   104(494)   6(13)    n.c.      9(27)    6(14)    6(14)    7(21)    7(21)
Table XI. Times in seconds (number of iterations) for a 2D problem with a 45◦ rotated anisotropy of 0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’. p 1 64 256 1024
clas
clas+j
mp
std
ext
e+i
e+i-cc
e+i-ccs
24(116) n.c. n.c. n.c.
7(27) 12(36) 13(37) 15(40)
24(119) 96(401) n.c. n.c.
3(11) 7(22) 8(25) 10(29)
7(29) 11(39) 12(42) 14(45)
3(10) 5(16) 6(18) 8(21)
3(12) 7(21) 8(25) 10(29)
3(12) 7(23) 8(27) 11(31)
Table XII. Times in seconds (number of iterations) for a 2D problem with a 60◦ rotated anisotropy of 0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’. p
clas
clas+j
mp
std
ext
e+i
e+i-cc
e+i-ccs
1 64 256 1024
n.c. n.c. n.c. n.c.
105(342) n.c. n.c. n.c.
n.c. n.c. n.c. n.c.
30(107) 79(256) 95(305) 113(357)
45(172) 96(330) 110(374) 123(408)
22(79) 47(152) 56(176) 62(193)
24(87) 59(196) 70(227) 82(263)
28(112) 70(254) 84(299) 100(347)
Table XIII. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem with 40×40×40 points per processor. p 1 64 512 1000 1728
clas
clas+j
mp
5(33) 17(80) 33(149) 39(175) 51(229)
8(11) 18(12) 26(12) 41(12) 63(12)
6(34) 16(79) 28(126) 31(138) 37(159)
std
ext
e+i
ext-ccs
e+i-ccs
5(9) 14(18) 20(26) 26(31) 35(41)
7(11) 16(12) 20(14) 30(13) 46(13)
6(8) 12(10) 17(15) 31(39) 40(33)
4(12) 9(14) 11(14) 15(15) 22(15)
3(9) 7(11) 11(11) 16(14) 24(16)
9.2. Three-dimensional structured problems We now consider 3D problems. Based on the sequential results in Section 6 we expect complexity reduction schemes to make a difference here. The first problem is a 7-point 3D Laplace problem on a structured cube with 40×40×40 unknowns per processor. Table XIII shows total times in seconds, and number of iterations. While classical interpolation solves the problem, the number of iterations increases rapidly with increasing number of processors and problem size. Multipass interpolation performs better for larger number of processors, but still shows unscalable convergence factors. Applying one step of Jacobi interpolation to classical interpolation leads to perfect scalability in terms of convergence factors, but unfortunately also to rising operator complexities (4.9–5.7), which are twice as large as for classical and multipass interpolation (2.3–2.4). Interestingly, while both standard and extended+i interpolations need less iterations for a small number of processors than extended interpolation, they show worse numerical scalability leading to far less iterations for extended interpolation for large number of processors. However, extended interpolation leads to larger complexities (4.7–5.3) compared with extended+i (4.2–4.5) and standard interpolation (4.1–4.4). The complexity reducing strategies lead to the following complexities: ext-cc (4.5–4.9), Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
135
DISTANCE-TWO INTERPOLATION FOR PARALLEL AMG
Table XIV. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem with 40×40×40 points per processor. p 1 64 512 1000 1728
ext4
ext5
e+i4
e+i5
ext-cc5
e+i-cc5
std5
clas+j0.1
3(13) 6(19) 9(25) 10(25) 12(29)
3(11) 7(15) 8(18) 11(18) 12(21)
3(12) 7(19) 11(28) 11(30) 13(35)
3(9) 7(13) 10(19) 12(20) 14(24)
3(11) 6(14) 8(17) 10(18) 11(21)
3(9) 6(13) 8(17) 9(17) 11(20)
4(12) 9(25) 15(39) 17(39) 28(46)
3(13) 9(23) 13(36) 15(37) 19(45)
Table XV. Total times in seconds (number of iterations) for a structured 3D problem with jumps with 40×40×40 points per processor. p 1 64 512 1000 1728
mp
clas+j
ext
e+i
ext-ccs
e+i-ccs
std4
ext-cc5
11(64) 35(176) 58(280) 65(306) 77(350)
8(14) 20(17) 31(20) 35(21) 60(21)
7(14) 17(17) 24(24) 27(20) 73(70)
6(10) 15(14) 21(20) 26(21) 43(26)
5(18) 11(21) 15(24) 19(24) 25(29)
4(15) 9(19) 13(21) 18(22) 29(23)
6(26) 18(71) 27(98) 33(113) 53(169)
5(17) 11(24) 11(30) 14(33) 17(36)
ext-ccs (3.9–4.2), e+i-cc (4.0–4.3), and e+i-ccs (3.6–3.8). For the sake of saving space, we did not record the results for ext-cc or e+i-cc, but the times and number of iterations for these methods were in between those of ext and ext-ccs, or e+i and e+i-ccs, respectively. Interestingly, the complexity reducing strategies e+i-cc and e+i-ccs show not only better scalability with regard to time, but also better scalability of convergence factors than e+i interpolation in this case. For this problem, complexity reducing strategies, thus, are paying off. Table XIV shows results for various truncated interpolation schemes. We used the truncation strategy that restricts the number of weights per row using either 4 or 5 for the maximal number of elements. While we present both results for ext and e+i, we present only the faster results for the remaining interpolation schemes for the sake of saving space. We used a truncation factor of 0.1 for clas+j. Operator complexities were fairly consistent here across increasing numbers of processors: we obtained 2.9 for ext4, 3.2 for ext5, 2.8 for e+i4, 3.1 for e+i5, 3.2 for ext-cc5, 3.1 for e+i-cc5, 3.2 for std5, and 3.0 for clas+j0.1. Clearly, using four compared with five weights leads to lower complexities, but larger number of iterations. Total times are not significantly different. Comparing the fastest method, e+i-cc5, on 1728 processors to PMIS with classical interpolation, we see a factor of 11 in improvement with regard to number of iterations and a factor of 5 in improvement with regard to total time with a slight increase in complexity. Table XV shows results for the problem with jumps (22), for which PMIS with classical interpolation was shown to completely fail. Multipass interpolation converges here with highly degrading scalability but good complexities of 2.4. Applying Jacobi interpolation to classical interpolation leads to very good convergence, but, due to operator complexities between 5.1 and 5.7, it leads to a much more expensive setup and solve cycle. Applying a truncation factor of 0.1 as in the previous example leads to extremely bad convergence and is not helpful here. Standard interpolation converges very well for small number of processors, but diverges if p is greater or equal to 64. Interestingly enough std4 converges, albeit not very well. Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
136
H. DE STERCK ET AL.
9.3. Unstructured problems In this section, we consider various linear systems on unstructured grids that have been generated by finite element discretizations. All of these problems were run on an Intel Xeon Linux cluster at Lawrence Livermore National Laboratory. The first problem is the 3D diffusion problem −a1 (x, y, z)u x x −a2 (x, y, z)u yy −a3 (x, y, z)u zz = f with Dirichlet boundary conditions on an unstructured cube. The material properties are discontinuous, and there are approximately 90 000 degrees of freedom per processor. See Figure 6 for an illustration of the grid used. There are five regions: four layers and the thin stick in the middle of the domain. This grid is further refined when a larger number of processors are used. The functions ai (x, y, z), i = 1, 2, 3 are constant within each of the five regions of the domains with the following values (4, 0.2, 1, 1, 104 ) for a1 (x, y, z), (1, 0.2, 3, 1, 104 ) for a2 (x, y, z), and (1, 0.01, 1, 1, 104 ) for a3 (x, y, z). We also include some results obtained with CLJP coarsening, which is a parallel coarsening scheme that was designed to ensure that two fine neighbors always have a common coarse neighbor and for which classical interpolation is therefore suitable [12, 8]. As a smoother we used hybrid Gauss–Seidel, which leads to a nonsymmetric preconditioner. Since in practice more complicated problems are usually solved using AMG as a preconditioner for Krylov subspace methods, we use AMG here as a preconditioner for GMRES(10). Note that both classical and multipass interpolations do not converge within 1000 iterations for these problems if they are used without a Krylov subspace method, whereas both extended and extended+i interpolations, as well as classical interpolation on CLJP-coarsened grids, converge well without it, with a somewhat larger number of iterations and slightly slower total times. The results in Table XVI show that the long-range interpolation operators, with the exception of multipass interpolation, restore the good convergence that was obtained with CLJP. CLJP has very large complexities, however. We also used a truncated version of classical interpolation, restricting the number of weights per fine point to at most 4 to control the complexities. While this hardly affected convergence factors, it significantly improved the total times to solution, see Figure 7, but
Figure 6. Grid for the elasticity problem. Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
137
DISTANCE-TWO INTERPOLATION FOR PARALLEL AMG
Table XVI. Number of iterations (operator complexities) for the unstructured 3D problem with jumps. AMG is used here as a preconditioner for GMRES(10). CLJP p 1 64 256 512 1024
PMIS
clas
clas4
clas
mp
e+i
e+i-cc
e+i4
9(5.6) 11(6.7) 11(7.8) 11(7.2) 10(8.6)
9(4.2) 12(4.6) 12(5.0) 13(4.6) 12(5.2)
18(1.5) 62(1.5) 72(1.5) 118(1.5) 162(1.6)
20(1.5) 34(1.5) 34(1.5) 35(1.5) 39(1.6)
9(2.7) 11(3.0) 12(2.9) 12(3.0) 12(3.4)
10(2.2) 13(2.3) 12(2.3) 13(2.4) 12(2.6)
9(1.8) 13(1.9) 13(1.8) 12(1.8) 14(2.0)
200
Seconds
150 CLJP/clas CLJP/clas4 PMIS/clas PMIS/e+i PMIS/mp PMIS/e+i-cc PMIS/e+i4
100
50
0 0
200
400 600 no. of procs
800
1000
Figure 7. Total times for a diffusion problem with highly discontinuous material properties. AMG is used here as a preconditioner for GMRES(10).
still did not achieve perfect scalability. Total times for CLJP with clas4 interpolation are comparable with PMIS with classical interpolation due to the small complexities of PMIS in spite of its significantly worse convergence factors. The use of extended+i and e+i-cc interpolations leads to better scalability than the methods mentioned before due to their lower complexities if compared with CLJP, or their better convergence factors if compared with PMIS with classical interpolation. Multipass interpolation leads to even better timings, but the overall best time and scalability are achieved by applying truncation to four weights per fine point to extended+i interpolation. For this problem extended interpolation performs similar to extended+i interpolation. Standard interpolation gives similar results on one processor, but the number of iterations gradually increases from 11 on one processor to 34 on 1024 processors. The second problem is a 3D linear elasticity problem using the same domain as above. However, a smaller grid size is used, since this problem requires more memory, leading to about 30 000 degrees of freedom per processor. The Poisson ratio chosen for the pile driver in the middle of the domain was chosen to be 0.4 and the Poisson ratios in the surrounding regions were 0.1, 0.3, 0.3 Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
138
H. DE STERCK ET AL.
Table XVII. Number of iterations for the 3D elasticity problem; range of operator complexities. AMG is used here as a preconditioner for conjugate gradient. CLJP
PMIS
p
clas
clas4
clas
mp
e+i
e+i-cc
e+i-ccs
e+i4
1 8 64 512
64 83 92 —
63 84 96 112
94 159 210 319
93 131 179 247
68 89 97 108
69 95 105 109
72 96 112 123
72 90 107 123
Cop
4.5–7.3
3.6–5.4
1.5
1.5
2.5–3.0
2.1–2.4
1.9–2.1
1.9–2.0
400 350
times in seconds
300 CLJP/clas CLJP/clas4 PMIS/clas PMIS/mp PMIS/e+i PMIS/e+i-cc PMIS/e+i-ccs PMIS/e+i4
250 200 150 100 50 0 0
100
200 300 no. of procs
400
500
Figure 8. Total times for the 3D elasticity problem. AMG is used here as a preconditioner for conjugate gradient.
and 0.2. Since this is a systems problem, the unknown-based AMG method for systems of PDEs was used. For this problem, the conjugate gradient method was used as an accelerator, and hybrid symmetric Gauss–Seidel as a smoother. The results are given in Table XVII and Figure 8. CLJP ran out of memory for the 512 processor run. Here also extended+i interpolation with truncation leads to the lowest run times and best scalability. Extended interpolation performed similar to extended+i interpolation. While standard interpolation performs similar to the other distance-two interpolation methods for a small number of processors, it performed significantly worse on 512 processors.
10. CONCLUSIONS We have studied the performance of AMG methods using the PMIS-coarsening algorithm in combination with various interpolation operators. PMIS with classical, distance-one interpolation Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
DISTANCE-TWO INTERPOLATION FOR PARALLEL AMG
139
leads to an AMG method with low complexity, but has bad scalability in terms of AMG convergence factors. The use of distance-two interpolation operators restores this scalability. However, it leads to an increase in operator complexity. While this increase was fairly small for 2D problems and was far outweighed by the much improved convergence, for 3D problems complexities were often twice as large, and impacted scalability. To counter this complexity growth, we implemented various complexity reducing strategies, such as the use of smaller interpolatory sets and interpolation truncation. The resulting AMG methods, particularly the extended+i interpolation in combination with truncation, lead to very good scalability for a variety of difficult PDE problems on large parallel computers.
ACKNOWLEDGEMENTS
We thank Tzanio Kolev for providing the unstructured problem generator and Jeff Painter for the Jacobi interpolation routine. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. REFERENCES 1. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984. 2. Ruge JW, St¨uben K. Algebraic multigrid (AMG). In Multigrid Methods, Vol. 3 of Frontiers in Applied Mathematics, McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130. 3. St¨uben K. Algebraic multigrid (AMG): an introduction with applications. In Multigrid, Trottenberg U, Oosterlee C, Sch¨uller A (eds). Academic Press: New York, 2000. 4. Cleary AJ, Falgout RD, Henson VE, Jones JE, Manteuffel TA, McCormick SF, Miranda GN, Ruge JW. Robustness and scalability of algebraic multigrid. SIAM Journal on Scientific Computing 2000; 21:1886–1908. 5. De Sterck H, Yang UM, Heys JJ. Reducing complexity in parallel algebraic multigrid preconditioners. SIAM Journal on Matrix Analysis and Applications 2006; 27:1019–1039. 6. Luby M. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing 1986; 15:1036–1053. 7. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000. 8. Henson VE, Yang UM. BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Applied Numerical Mathematics 2002; 41:155–177. 9. Butler J. Improving coarsening and interpolation for algebraic multigrid. Master’s Thesis, Applied Mathematics, University of Waterloo, 2006. 10. Falgout RD, Jones JE, Yang UM. Pursuing scalability for hypre’s conceptual interfaces. ACM Transactions on Mathematical Software 2005; 31:326–350. 11. Baker A, Falgout RD, Yang UM. An assumed partition algorithm for determining processor inter-communication. Parallel Computing 2006; 32:394–414. 12. Cleary AJ, Falgout RD, Henson VE, Jones JE. Coarse grid selection for parallel algebraic multigrid. In Proceedings of the Fifth International Symposium on Solving Irregularly Structured Problems in Parallel. Springer: New York, 1998.
Copyright q
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:115–139 DOI: 10.1002/nla
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:141–163 Published online 28 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.568
Algebraic multigrid for stationary and time-dependent partial differential equations with stochastic coefficients E. Rosseel, T. Boonen and S. Vandewalle∗, † Computer Science Department, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium
SUMMARY We consider the numerical solution of time-dependent partial differential equations (PDEs) with random coefficients. A spectral approach, called stochastic finite element method, is used to compute the statistical characteristics of the solution. This method transforms a stochastic PDE into a coupled system of deterministic equations by means of a Galerkin projection onto a generalized polynomial chaos. An algebraic multigrid (AMG) method is presented to solve the algebraic systems that result after discretization of this coupled system. High-order time integration schemes of an implicit Runge–Kutta type and spatial discretization on unstructured finite element meshes are considered. The convergence properties of the AMG method are demonstrated by a convergence analysis and by numerical tests. Copyright q 2008 John Wiley & Sons, Ltd. Received 14 May 2007; Revised 8 November 2007; Accepted 8 November 2007 KEY WORDS:
partial differential equations with random coefficients; Karhunen–Lo`eve expansion; polynomial chaos; algebraic multigrid; implicit Runge–Kutta time discretization
1. INTRODUCTION Randomness in a physical problem can be modelled mathematically by using stochastic partial differential equations (PDEs). These may contain some stochastic or random parameters, for example, in the coefficients of the differential operator, in the boundary and initial conditions, or in the forcing term. Their solution allows to extract statistical information concerning the solution to the physical model, as required, e.g. in uncertainty propagation problems. The solution of a stochastic PDE can be obtained by a statistical or deterministic approach [1, 2]. The former are typically based on Monte Carlo simulation techniques, see, for example, [3, 4]. ∗ Correspondence
to: S. Vandewalle, Computer Science Department, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium. † E-mail:
[email protected] Contract/grant sponsor: Belgian State, Science Policy Office Contract/grant sponsor: Research Council K.U.Leuven, CoE EF/05/006 Optimization in Engineering (OPTEC)
Copyright q
2008 John Wiley & Sons, Ltd.
142
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
Monte Carlo methods are often easy to implement but rapidly become prohibitively expensive with increasing accuracy demands. Examples of deterministic approaches include perturbation methods [5], Neumann expansion methods [6] and the spectral stochastic finite element method [7, 8]. Perturbation and Neumann expansion methods are restricted to small parameter variances and calculate only a few statistical moments of the solution. These restrictions do not hold for the stochastic finite element method, which, in principle, enables to compute the full statistical characteristics of the solution. That is, also the probability distribution of the solution can be extracted. As such, it provides a valuable alternative to Monte Carlo simulations, see [4] for a comparison. The stochastic finite element method transforms a stochastic PDE into a system of coupled deterministic PDEs after a projection of the random solution onto a suitable finite dimensional random space. We will use the stochastic finite element approach to discretize the random part of the PDE. For time-dependent PDEs, we will employ an implicit Runge–Kutta (IRK) time integration scheme [9]. For IRK methods, the dimension of the linear systems to be solved at each time step is proportional to the number of IRK stages. Multigrid methods are available for IRK discretizations of deterministic parabolic PDEs [10, 11]. In this paper, we extend these methods towards PDEs with random coefficients. In particular, we shall study an algebraic multigrid (AMG) approach, suited for unstructured finite element meshes. The paper is organized as follows. Section 2 describes the discretization of time-dependent stochastic PDEs by means of the stochastic finite element and IRK method. The AMG method is presented in Section 3. Its convergence properties are analyzed in Section 4 and further discussed in Section 5. Section 6 addresses some implementation issues. In Section 7, numerical experiments are given to illustrate the convergence behavior. Conclusions are presented in Section 8.
2. THE STOCHASTIC MODEL PROBLEM AND ITS DISCRETIZATION 2.1. A two-dimensional diffusion equation To describe the model problem, we first introduce some concepts from probability theory. Consider a complete probability space (, F, E) defined by a sample space , a -algebra F and a probability measure E. A random variable () is defined as the function : → R. Further on we will simply express instead of (). The expected value of corresponds to = dE = yp(y) dy. In the last equality p(y) represents the probability density function of , with support and y ∈ . A random field (x, ) is defined by the mapping : D⊗ → R, with D being a spatial domain. Hence, at every point x ∈ D, a random field corresponds to a certain random variable. A typical application is a stochastic material parameter that represents the properties of a heterogeneous mixture of materials. A random process (t, ), with t ∈ T = [0, T ], is defined as : T⊗ → R. This concept models stochastic time series, for example, the evolution of shares at a stock market. A random wave (x, t, ) generalizes the previous concepts and is defined by the mapping : D⊗T⊗ → R. We consider a diffusion problem with a random, spatially varying and time-dependent diffusion coefficient (x, t, ), defined over a two-dimensional domain D: *u(x, t, ) −∇ ·((x, t, )∇u(x, t, )) = b(x, t, ) *t Copyright q
2008 John Wiley & Sons, Ltd.
(1)
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
143
where x ∈ D, t ∈ T = [0, T ] and ∈ , a sample space. Further on, we shall consider only the case of a deterministic source term b(x, t). The method, however, immediately extends to the more general case of a stochastic source term. Model problem (1) is completed with suitable boundary conditions and initial conditions in the time-dependent case; only deterministic conditions are considered here. 2.2. Discretization of the random part of the problem To transform the stochastic PDE (1) into a system of deterministic time-dependent PDEs, we will follow the three-step procedure of [12]. First, we express the random inputs by a finite number of random variables; second, the solution is approximated using a finite-term expansion with random basis polynomials and third, we perform a Galerkin projection onto the set of polynomial basis functions. The second step of this procedure will be addressed in Section 2.2.1; steps one and three will be explained in Sections 2.2.2 and 2.2.3, respectively. 2.2.1. Generalized polynomial chaos. Consider a Hilbert space L 2 (, F, E) of square integrable functions of L independent random variables i on (, F, E). A finite dimensional subspace S of L 2 (, F, E) is defined through a set of Q basis functions {q }q=1,...,Q in the random variables 1 , . . . L . Let denote a vector containingthe random variables 1 , . . . L . The space S is equipped with an inner product defined by ab = abw(y) dy, with w(y) denoting the joint probability density corresponding to , the support of and a, b ∈ S. This inner product actually corresponds to an expectation of the product of its arguments. Several approaches have been proposed to construct S, e.g. [7, 12–14]. Here, we shall employ an orthonormal basis of multivariate polynomials l that are globally defined in each random variable i . These multivariate polynomials are built as a product of univariate polynomials {m i }i=1,...,L of degree m i in i and orthonormal w.r.t. the probability measure corresponding to i . Two criteria are often considered to determine the basis functions. One may limit the total degree L of the polynomial to a given value P, i.e. i=1 m i P. The total number of basis functions, Q, is then given by (L + P)!/L!P! [15]. Alternatively, one may limit the degrees of the univariate factors L separately, i.e. m i pi , i = 1, . . . , L, for a given set of pi -values. In this case Q = i=1 ( pi +1) [4]. Using the first criterion, a so-called generalized polynomial chaos basis [1, 12] can be constructed. The univariate polynomials are chosen from the Wiener–Askey scheme according to the probability distributions of the random variables i . The second criterion can be used to create an alternative set of basis functions {q } [4, 13, 16], which possess a double orthogonality property: j k = j,k
and i j k = i jk j,k
(2)
with i jk being a constant and j,k the Kronecker delta. This property allows to transform a linear stochastic PDE into a system of uncoupled deterministic PDEs. Having specified an appropriate random basis, the solution u(x, t, ) can be approximated by a linear combination of basis functions with deterministic coefficients u q (x, t). When the basis functions are collected in the column vector and the coefficients in a column vector u(x, t), we can express u(x, t, ) ≈
Q
u q (x, t)q () = T u(x, t)
(3)
q=1
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
144
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
2.2.2. Discretization of random inputs. The random inputs are typically discretized either by a generalized polynomial chaos expansion approach similar to (3), see e.g. [17, 18], or by a Karhunen–Lo`eve (KL) expansion [19]. The former leads to an approximation of the form (x, t, ) ≈
Q
i (x, t)i ()
with i (x, t) =
(x, t, )q
i=1
q2
(4)
We will apply this type of discretization to model random inputs with a lognormal marginal distribution. Analytical expressions for the corresponding i (x, t) coefficients are given in [20]. The truncated KL expansion approximates a random wave (x, t, ) as (x, t, ) ≈ 1 (x, t)+
L
i+1 (x, t)i ()
(5)
i=1
The function 1 (x, t) corresponds to the mean of (x, t, ). The functions i+1 (x, t) are eigenfunctions of the covariance function C (x1 , t1 ; x2 , t2 ), scaled by the square root of the corresponding eigenvalues. The random variables i are uncorrelated random variables with zero mean and unit variance [21]. We assume that these random variables are independent. Note that L +1 terms are needed to express a random input in an L-dimensional random space by a KL expansion, in comparison with Q terms in the case of a chaos expansion. Hence, a chaos expansion will be used only when the KL expansion is difficult to compute. 2.2.3. Galerkin approach. The stochastic PDE (1) can be converted into a system of deterministic PDEs for the unknown coefficients u q (x, t) that appear in (3). This is done by replacing (x, t, ) by its approximation (4) or (5), by inserting the right-hand side of (3) into the PDE and by imposing orthogonality of the resulting residual w.r.t. the chosen random basis. This results in C1
L∗ *u(x, t) − Ci ∇ ·[i (x, t)∇u(x, t)] = b(x, t)c *t i=1
(6)
with the vector c = and the matrices Ci defined as Ci = i−1 T , = i T ,
i = 1, . . . , L ∗ := L +1 i = 1, . . . , L ∗ := Q
if (x, t, ) represented by a KL expansion (7)
if (x, t, ) represented by a chaos expansion
(8)
and 0 = 1. The matrix C1 equals the identity matrix I Q of dimension Q if the polynomial chaos functions are suitably normalized. Analytical expressions for i T can be found in [22]; in Appendix A we present expressions for i T , see Equation (A1). Remark 2.1 In case of a double orthogonal polynomial chaos and a KL expansion of the random input, the PDEs are uncoupled. Indeed each matrix Ci is diagonal due to the orthogonality properties (2). However, the resulting number of PDEs becomes rapidly too large for practical purposes when the number of random variables or the degree of the polynomials increases. Remark 2.2 The uniqueness and existence of the solution to (6) can be proved from the Lax–Milgram lemma under certain conditions on the stochastic parameters, as detailed in [4, 13]. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
145
2.3. Spatial finite element discretization We shall use classical finite elements for spatial discretization and assume that each of the deterministic coefficients is discretized on the same mesh, with the same (number of) elements. Hence, each u q (x, t) for q = 1, . . . , Q is approximated as a linear combination of the form u q (x, t) ≈ N n=1 u q,n (t)sn (x), in terms of N basis functions sn (x). The coefficients are grouped together in the vectors b(t), uq (t) ∈ R N . The discretization of (6) can be compactly expressed after the introduction of a set of L ∗ deterministic stiffness matrices K i ∈ R N ×N defined as K i = K(i (x, t)) with [K(i (x, t))]kl = i (x, t)∇sk (x)·∇sl (x) dx, i = 1, . . . , L ∗ ; k,l = 1, . . . , N (9) D
After spatial discretization, the stochastic finite element method yields a system of ordinary differential equations (ODE): ⎤ ⎡ u1 (t) ⎥ ⎢ L∗ du(t) ⎥ ⎢ C1 ⊗ M + Ci ⊗ K i u(t) = c ⊗b(t) with u(t) = ⎢ ... ⎥ and uq (t) ∈ R N (10) ⎦ ⎣ dt i=1 u Q (t) Here, M ∈ R N ×N is the mass matrix defined as [M]kl = R Q×Q and the vector c ∈ R Q are defined by (7) or (8).
D sk (x)sl (x) dx,
and the matrices Ci ∈
2.4. Time discretization We consider time discretization by an IRK method [9]. To introduce some notation, we shall recall the basic formula of such a method, as applied to an ODE of the form du/dt = f (t, u). An IRK method computes an approximation u m+1 to the solution u(tm+1 ) at time tm+1 from an approximation u m at time tm . To this end, it introduces a number of auxiliary variables x j , j = 1, . . . , s, called stage values or stage vectors, at times tm +c j t with t = tm+1 −tm . The procedure corresponds to the following set of equations: u m+1 = u m +t
s
b j f (tm +c j t, x j )
(11)
j=1
xi = u m +t
s
ai j f (tm +c j t, x j ),
i = 1, . . . , s
(12)
j=1
Equation (11) expresses u m+1 as an update to u m in terms of the stage values {xi }i=1,...,s . Equation (12) describes the system of equations to be solved to compute the stage values. The method is fully characterized by the parameters Airk = [ai j ], birk = [b1 . . . bs ]T and cirk = [c1 . . . cs ]T . Equations (11) and (12) are often rewritten in terms of the stage value increments x j := x j −u m : u m+1 = u m +[x1 · · · xs ]A−T irk birk xi = t
s
(13)
ai j f (tm +c j t, u m +x j ),
i = 1, . . . , s
(14)
j=1
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
146
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
We will apply the IRK method to system (10). The approximation at time tm+1 to the solution u(tm+1 ) will be represented as u m+1 . Formulation (13)–(14) is used with stage vector increments denoted simply as x j , j = 1, . . . , s. They are grouped together into a long vector x ∈ RNQs , where the increments are numbered first along the random dimension, then along the spatial dimension and finally according to the stages. When the coefficient (x, ) is time independent, system (14) discretizing (10) becomes L∗ C1 ⊗ M ⊗ Is +t b (15) Ci ⊗ K i ⊗ Airk x = i=1
with b being a known vector depending on u m and on the right-hand side of (10) ⎤⎞ ⎛ ⎡ ⎞ ⎛ b(tm +c1 t) ∗ ⎥⎟ ⎜ ⎢ ⎟ ⎜ ⎥⎟ L ⎢ ⎟ ⎜ . T⎜
. b = t ⎜ INQ ⊗ Airk P ⎜c ⊗ ⎢ ⎥⎟ − Ci ⊗ K i ⊗ Airk [u m ⊗1s ]⎟ . ⎦⎠ i=1 ⎝ ⎣ ⎠ ⎝ b(tm +cs t)
(16)
and 1s = [1 . . . 1]T ∈ Rs . The matrix P T is such that it permutes the rows of the vector it multiplies so that all variables are grouped in the same order as the unknowns x. In case of a time-dependent stochastic coefficient (x, t, ), each of the elements of the stiffness matrices K i (9) is time dependent. According to Equation (14), every stiffness matrix K i (t) is evaluated at s time positions t = tm +c j t, j = 1, . . . , s. This leads to a total of L ∗ ·s stiffness matrices at each time step. Applying (14) yields the following system to be solved for the stage vector increments: ∗ L (C1 ⊗ M ⊗ Is )+t Ci ⊗ K i (tm +c1 t)⊗ Airk (:, 1) . . . i=1 L∗
Ci ⊗ K i (tm +cs t)⊗ Airk (:, s) P x = B
(17)
i=1
Matrix P is an NQs×NQs permutation matrix. It permutes the columns of the matrix that it is multiplied with so that consecutive IRK stages are grouped together in blocks of s columns. In the remainder of the paper, the multigrid formulation and analysis are presented for time-independent (x, ). The extension to the general case of (x, t, ) is straightforward. Remark 2.3 In Equation (15) the unknowns are ordered block-wise. The vector x consists of Q consecutive blocks, with each block corresponding to the unknowns associated with a random mode. These blocks can further be subdivided in N blocks, where each one contains the IRK unknowns per spatial node. Similar to the discussion in [23] on unknown-based and point-wise ordering of variables, the unknowns in Equation (15) can be reordered per spatial point. This yields the system L∗ M ⊗C1 ⊗ Is +t (18) K i ⊗Ci ⊗ Airk xˆ = bˆ i=1
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
147
with bˆ being the reordered version of b (16). The vector xˆ contains N blocks, with each block corresponding to the Qs unknowns related to a spatial point. This point-based ordering is more convenient to illustrate the block operations of the point-based AMG method presented in Section 3, see Remark 3.1.
3. AMG FOR THE STOCHASTIC FINITE ELEMENT METHOD Next, we present an AMG method to solve the stochastic finite element discretization (15) or (17). We will also consider the case of a stationary, i.e. time-independent problem. In that case, the discretization reads ⎡ L∗
Ci ⊗ K i u = b
i=1
u1
⎤
⎢ ⎥ ⎢ ⎥ with u = ⎢ ... ⎥ , uq ∈ R N and b = c ⊗b ⎣ ⎦ uQ
(19)
The basis of the method is the classical multigrid iteration as shown in Algorithm 1. The algorithm uses a hierarchy of K levels, k = 1, . . . , K , with A K u K = b K being the discretization of the (stochastic) PDE on the given (fine) mesh. The recursion scheme is determined by a parameter ; for example, the case = 1 is called a V-cycle, the case = 2 a W-cycle. An AMG method requires a setup phase to algebraically construct the restriction and prolongation operators, Rkk−1 k , k = 2, . . . , K . The coarse level operators A and Pk−1 k−1 , k = 2, . . . , K , are assembled by using the k+1 k Galerkin principle [24], i.e. Ak = Rk+1 Ak+1 Pk . To construct an AMG method for stochastic finite element and IRK discretizations, the AMG components are built so that all unknowns per spatial node are updated together. A block smoother will be used, and prolongation and restriction operators will have a tensor structure. Algorithm 1. Standard multigrid iteration for Au = b. ( = 1: V-cycle, = 2: W-cycle) (1)
(0)
u k = multigrid(u k , Ak , bk , k) (0)
(0)
• Presmoothing: u k = smooths1 (u k , Ak , bk ) (0) • Restrict residual: bk−1 = Rkk−1 (bk − Ak u k ) (0) • Coarse grid correction: solve Ak−1 vˆk−1 = bk−1 (0)
— if k = 1, vˆk−1 = A−1 k−1 bk−1 (0)
— if k>1, vˆk−1 = multigrid(0, Ak−1 , bk−1 , k −1) (0)
(0)
vˆk−1 = multigrid−1 (vˆk−1 , Ak−1 , bk−1 , k −1) (0)
(0)
(0)
k vˆ • Prolongate correction and update solution: uˆ k = u k + Pk−1 k−1 (1)
(0)
• Postsmoothing: u k = smooths2 (uˆ k , Ak , bk ) Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
148
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
3.1. A smoother for the stochastic finite element method We suggest to use a block lexicographic Gauss–Seidel smoothing method. More precisely, one smoothing step consists of a loop over all spatial nodes, in which all random and IRK unknowns per node are updated simultaneously. Hence, every iteration involves a sequence of N local solves of a linear system. The local system at node n corresponds to the Qs×Qs system: L∗ M[n,n] C1 ⊗ Is +t K i[n,n] Ci ⊗ Airk x[n] i=1
= b[n] −
m =n
M[n,m] C1 ⊗ Is +t
L∗ i=1
K i[n,m] Ci ⊗ Airk x[m]
(20)
with x[n] ∈ RQs being the unknowns associated with node n. For stationary problems, the local system simplifies to the Q × Q system: L∗ i=1
K i[n,n] Ci u [n] = b[n] −
L∗ m =n i=1
K i[n,m] Ci u [m]
(21)
The block Gauss–Seidel iteration step can be expressed as a linear iteration based on a matrix splitting of the stiffness matrices K i , K i = K i+ + K i− (i = 1, . . . , L ∗ ), and the mass matrix M, M = M + + M − . Here, K i+ and M + are the lower triangular parts of K i and M, respectively. The block Gauss–Seidel iteration in the th iteration step can then be formulated as L∗ + C1 ⊗ M + ⊗ Is +t Ci ⊗ K i ⊗ Airk x ( +1) i=1
= b − C1 ⊗ M ⊗ Is +t −
L∗ i=1
Ci ⊗ K i− ⊗ Airk
x ( )
(22)
Remark 3.1 The block Gauss–Seidel method entails every iteration a block triangular system solve. The triangular shape of these systems can be visualized by reordering the unknowns according to Equation (18). The block Gauss–Seidel iteration (22) can then be formulated as ⎡ ⎤ L∗ irk K i[1,1] Ci 0 ⎢ M[1,1] IQs +t ⎥ ⎢ ⎥ i=1 ⎢ ⎥ ⎢ ⎥ ( +1) .. ⎢ ⎥ xˆ = bˆGS . ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ L∗ L∗ ⎣ ⎦ M[N ,1] IQs +t K i[N ,1] Ciirk . . . M[N ,N ] IQs +t K i[N ,N ] Ciirk i=1
i=1
with Ciirk = Ci ⊗ Airk , C1 replaced by I Q and bˆGS = bˆ −(M − ⊗ IQs +t Copyright q
2008 John Wiley & Sons, Ltd.
L ∗
− irk ( ) i=1 K i ⊗Ci ) xˆ .
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
149
3.2. Multigrid hierarchy and intergrid transfer operators We suggest to derive the multigrid hierarchy from the dominant term in (4) and (5), i.e. from the stiffness matrix of the averaged deterministic problem. This is the PDE that results from the stochastic PDE by replacing all the random parameters by their mean value. Such a hierarchy can be derived by using a classical AMG strategy applied to the stiffness matrix K 1 . Suppose Pd is a prolongation operator constructed for K 1 , then tensor prolongation and restriction operators for (15), denoted as P and R, respectively, are built as P = I Q ⊗ Pd ⊗ Is
and R = I Q ⊗ PdT ⊗ Is
(23)
and R = I Q ⊗ PdT
(24)
This simplifies in the stationary case to P = I Q ⊗ Pd
Note that all random and IRK unknowns associated with a particular spatial node are prolongated and restricted in a decoupled way. Only the spatial dimension is coarsened; the stochastic and time discretization is kept unaltered throughout this multigrid hierarchy. The coarse grid operator, denoted as A H , is deduced from (24) and (23) by using the Galerkin principle. That is, the coarse grid operator corresponds to RAh P, with Ah being the fine grid operator. Thus, applying formulas (24)–(23) to Equations (19)–(15) results in A H = C1 ⊗ Pd M PdT ⊗ Is +t
L∗ i=1
Ci ⊗ Pd K i PdT ⊗ Airk
for the time-dependent case. For the time-independent case, we have AH =
L∗ i=1
Ci ⊗ PdT K i Pd
4. CONVERGENCE ANALYSIS Using Fourier analysis [24, 25], valuable insights in the convergence behavior of geometric multigrid methods can be obtained. A local Fourier analysis of geometric multigrid for stochastic, stationary PDEs can be found in [16]. This analysis cannot be directly applied to AMG methods. Instead, the methodology from [10, 11] is followed. Our analysis for stationary and time-dependent problems as will be detailed in Sections 4.1 and 4.2, respectively, is restricted to the case of L ∗ = 2. This corresponds to a diffusion coefficient discretized with one random variable, see also Equation (5). In Section 5, the extension to the general case, L ∗ >2, is discussed. 4.1. Stationary problems First, we derive some results for the block smoother. We start from the error iteration: e( +1) = Se( )
with S = −(C1 ⊗ K 1+ +C2 ⊗ K 2+ )−1 (C1 ⊗ K 1− +C2 ⊗ K 2− )
(25)
and e( ) = u exact −u ( ) the error at iteration step . The asymptotic convergence is characterized by the spectral radius of the iteration operator S, denoted by (S). Assume that the random basis Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
150
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
functions 1 , . . . , Q are normalized in such a way that C1 equals the identity matrix I Q . The matrix C2 is a real symmetric matrix (7)–(8) with eigenvalue decomposition C2 = VC2 C2 VCT2 . Applying the similarity transform VC2 ⊗ I N to S leads to (S) = ((VCT2 ⊗ I N )S(VC2 ⊗ I N )) = (−(I Q ⊗ K 1+ +C2 ⊗ K 2+ )−1 (I Q ⊗ K 1− +C2 ⊗ K 2− )) =
Q q=1
=
Q
(−(K 1+ +q K 2+ )−1 (K 1− +q K 2− )) with q ∈ (C2 ) ˆ q )) ( S(
q=1
with Sˆ being a matrix-valued function defined as ˆ ) = −(K + +r K + )−1 (K − +r K − ) S(r 1 2 1 2
(26)
Thus, the asymptotic convergence factor of block Gauss–Seidel is given by ˆ q ))
(S) = max ( S( q ∈(C2 )
(27)
To characterize the convergence properties of a two-level multigrid cycle, we define the matrixvalued function Tˆ (r ): ˆ ))s1 ˆ ))s2 (I N − Pd (PdT (K 1 +r K 2 )Pd )−1 PdT (K 1 +r K 2 ))( S(r Tˆ (r ) = ( S(r ˆ ) being defined by (26), s1 and s2 are the number of pre- and postsmoothing iterations. with S(r An analogous derivation as above shows that the asymptotic convergence factor of the two-level cycle can be determined from the spectral radius of the corresponding iteration matrix T as
(T ) = max (Tˆ (q )) q ∈(C2 )
(28)
Formulas (27) and (28) allow the following intuitive interpretation. The convergence for the stationary stochastic finite element discretization with L ∗ = 2 equals the worst convergence of the corresponding Gauss–Seidel or multigrid method, applied to a set of deterministic problems of the form: (K 1 +q K 2 )u = b
with q ∈ (C2 )
(29)
4.2. Time-dependent problems The error iteration of the block Gauss–Seidel smoother (22) for C1 = I Q and L ∗ = 2 is given by (I Q ⊗ M + ⊗ Is +t (I Q ⊗ K 1+ +C2 ⊗ K 2+ )⊗ Airk )e( +1) = −(I Q ⊗ M − ⊗ Is +t (I Q ⊗ K 1− +C2 ⊗ K 2− )⊗ Airk )e( ) Copyright q
2008 John Wiley & Sons, Ltd.
(30)
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
151
with corresponding iteration matrix denoted as S. This matrix can be decoupled by applying the similarity transform VC2 ⊗ I N ⊗ Virk , with Virk resulting from the eigenvalue decomposition −1 Airk = Virk irk Virk and VC2 from C2 = V2 C2 V2T . This enables to express the spectrum of S as (S) =
Q s r =1 q=1
ˆ q , trirk )), ( S(
rirk ∈ (Airk ), q ∈ (C2 )
with Sˆ being the matrix-valued function defined as ˆ z) = −(M + + z K + + zr K + )−1 (M − + z K − + zr K − ) S(r, 1 2 1 2
(31)
Thus, the asymptotic convergence factor of lexicographic block Gauss–Seidel applied to system (15) with L ∗ = 2 corresponds to
(S) =
max
ˆ q , tirk )) max ( S(
irk ∈(Airk ) q ∈(C2 )
(32)
The analysis of the two-level multigrid cycle proceeds in a similar way. It is based on the matrix-valued function Tˆ (r, z) defined as ˆ z))s1 ˆ z))s2 (I N − Pd (PdT (M + z K 1 + zr K 2 )Pd )−1 PdT (M + z K 1 + zr K 2 ))( S(r, Tˆ (r, z) = ( S(r, ˆ z) given by (31) and s1 and s2 being the number of pre- and postsmoothing steps. Using with S(r, this matrix function, the asymptotic convergence factor of the two-level multigrid cycle becomes
(T ) =
max
max (Tˆ (q , tirk ))
irk ∈(Airk ) q ∈(C2 )
(33)
As in the stationary case, this value corresponds to the worst case asymptotic convergence factor of multigrid applied to the set of deterministic problems: (M +tirk K 1 +tirk q K 2 )x = b
with q ∈ (C2 ), irk ∈ (Airk )
(34)
These deterministic systems can be derived from backward Euler discretizations with scaled time step tirk of ODE systems: M
dx +(K 1 +q K 2 )x = b dt
(35)
4.3. General discretizations with L ∗ >2 The case L ∗ = 2 enables a decoupling of the stochastic and spatial dimensions, using a similarity transform based on C2 . Hence, the analysis can be reduced to the analysis of smaller problems of the form (29) and (34). For these problems, local Fourier analysis [24, 26] allows to derive sharp convergence factors, at least for the geometric multigrid variant on regular meshes. In general, no decoupling between the spatial and random part of the discretization is possible since the matrices Ci cannot be diagonalized simultaneously. An exception to this occurs when double orthogonal polynomials are used as basis functions. Indeed, then all matrices Ci are diagonal, see (2) and (7). Denoting the double orthogonal random basis by and the corresponding matrices Ci by G i , we can determine the spectral radius of the two-level multigrid iteration matrix T as
(T ) = max . . . 1 ∈(G 1 )
Copyright q
2008 John Wiley & Sons, Ltd.
max
L ∗ ∈(G L ∗ )
(Tˆ (1 , . . . , L ∗ )) Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
152
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
with the matrix-valued function Tˆ (r1 , . . . ,r ∗L ) being defined as ⎡ ∗ −1 ∗ ⎤ L L ˆ 1 , . . . ,r L ∗ ))s2 ⎣ I N − Pd PdT Tˆ (r1 , . . . ,r L ∗ ) = ( S(r ri K i Pd PdT ri K i ⎦ i=1
i=1
ˆ 1 , . . . ,r L ∗ ))s1 · ( S(r L ∗ L ∗ ˆ 1 , . . . ,r L ∗ ) = ( i=1 and the matrix-valued function S(r ri K i+ )−1 ( i=1 ri K i− ). In Section 5 we will point out a relation between the eigenvalues of the matrices Ci and the diagonal elements of the matrices G i . On the basis of that relation, we can show that the AMG convergence properties in case of a double orthogonal random basis are similar to the case L ∗ = 2 treated in the previous section. Moreover, also when the double orthogonal basis is not used, we can argue that the analysis of the case L ∗ = 2 is likely to provide valuable insights for the general case L ∗ >2. The first terms of system (19), i.e. C1 ⊗ K 1 +C2 ⊗ K 2 , represent the mean behavior of the stochastic PDE and the main stochastic variation. This follows from the stochastic discretization of the random coefficient as a truncation of a series of terms of decreasing importance, see Section 2.2. The sum involving the matrices C3 , . . . , C L ∗ can be seen as a perturbation of the system matrix. A more thorough (geometric) multigrid analysis for the general stationary case can be found in [16].
5. A DISCUSSION OF THE THEORETICAL CONVERGENCE PROPERTIES The convergence analysis of the previous section shows that both the matrices K 1 and K 2 as well as the eigenvalues of C2 and Airk determine the AMG convergence, see Equations (28) and (33). In this section we discuss the AMG convergence behavior with respect to the stochastic discretization. The conclusions agree with the properties of the geometric multigrid variant, as observed in [15] and theoretically analyzed in [16, 27]. The AMG convergence behavior with respect to the IRK discretization, i.e. the influence of the eigenvalues of Airk , is detailed in [11]. 5.1. A bound for the eigenvalues of C2 The range of the eigenvalues of C2 can be computed by using properties of double orthogonal random polynomials [16]. Consider the set of Q generalized polynomial chaos functions = [1 , . . . , Q ]T . This set can be expanded to an orthonormal set = [1 , . . . , Q , Q+1 , . . . , Q ]T that spans the same vector space as the double orthogonal polynomial chaos basis . Hence, an orthogonal matrix Z exists so that = Z . As a consequence, the matrix C 2 can be diagonalized by the matrix Z : T
C 2 := 1 = 1 Z ( )T Z T = Z 1 ( )T Z T = Z G 2 Z T
(36)
Thus, the eigenvalues of C 2 correspond to the diagonal entries of the diagonal matrix G 2 . It can be shown that these values coincide with the roots of univariate orthogonal polynomials from the Askey scheme, as explained in [16]. Moreover, as the matrix C2 is a principal submatrix of C 2 , the eigenvalues of C 2 determine upper and lower bounds for the eigenvalues of C2 . This allows one to determine bounds for the eigenvalues of C2 from the roots of certain univariate orthogonal polynomials. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
153
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
6
2
4 1
Eigenvalues
Eigenvalues
2
0
0
(a)
2
4
6
8
Hermite chaos order
10
0
0
(b)
5
10
15
20
Number of random dimensions
Figure 1. Effect of the polynomial chaos order and the number of random variables on the eigenvalues of C2 (7) in case of a Hermite polynomial chaos: (a) fixed number of random variables and increasing P, L = 4 and (b) fixed order and increasing value of L, P = 2.
5.2. Polynomial chaos type and order The influence of the polynomial chaos type and the polynomial order can be derived from the properties of the roots of the corresponding orthogonal polynomials [28]. In the case of a Legendre chaos, the eigenvalues of the matrix C2 take values between −1 and 1. Thus, AMG convergence is asymptotically independent of the polynomial chaos order. In case of a Hermite chaos, the eigenvalues can become arbitrarily large with a range that increases with increasing polynomial order. This effect is illustrated in Figure 1(a). Whether the large eigenvalue range affects the convergence or not depends on the particular PDE problem. Consider, e.g. model problem (1) √ with a Gaussian variable with mean 1 and variance , := 1+ . Then, K 1 is a discretized 1 √ Laplacian and K 2 = K 1 . According to (29), the AMG convergence corresponds to the worst convergence of multigrid applied to a set of problems of the form: √ (1+q )K 1 u = b with q ∈ (C2 ), C2 = 1 T (37) √ The multiplicative factor 1+q can be shifted to the right-hand side. Hence, the AMG convergence is independent of the variance and of the polynomial chaos order. However, in case of modelled as a random field (x, ), and L ∗ = 2, system (29) corresponds to K˜ u = b, with K˜ being a discretized diffusion problem with diffusion coefficient ˜ (x) := 1 (x)+q 2 (x). When ˜ (x) violates the ellipticity conditions of the bilinear form upon which K˜ is based (cf. Equation (9)), e.g. due to a large negative q , the AMG convergence degrades severely and eventually divergence is possible. This typically occurs only for a large polynomial chaos order. For other problems, the Hermite polynomial order can have an even more serious impact on AMG √ convergence. For example, consider the following problem with a Gaussian random variable 1 with zero mean and finite variance : 2 2 √ * u * u − +(1+ 1 ) 2 u = b *x 2 *y Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
154
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
The stiffness matrix K 1 corresponds to a discrete Laplace operator, while the second stiffness 2 matrix K 2 contains only contributions from * u/*y 2 . The AMG convergence rate equals the worst multigrid convergence for deterministic problems of the form: 2 √ *2 u * u − +(1+q ) 2 u = b with q ∈ (C2 ), C2 = 1 T *x 2 *y Increasing the polynomial order broadens the range of q and consequently increases the anisotropy of the problem. This results in a decreased AMG convergence. Eventually, for a sufficiently large order, the problem will lose ellipticity and AMG will diverge. 5.3. Number of random variables As our convergence analysis is restricted to L ∗ = 2, it does not provide direct information about the influence of the number of random variables L. However, the eigenvalue bounds of the matrices Ci (7) do not depend on the number of random variables. This follows from the fact that the eigenvalues of G i (36) equal the roots of one-dimensional polynomials [16] and thus are independent of the number of random variables. This property suggests an independence of AMG convergence on the number of random variables. Figure 1(b) shows the eigenvalues of C2 as a function of the number of random variables. The same eigenvalues can be found for C3 , . . . , C L ∗ (7), while all eigenvalues of C1 equal 1. Only in case of a double orthogonal polynomial chaos basis, a quantitative analysis of L ∗ >2 is possible. Then the independence of AMG convergence on the number of random variables can be demonstrated theoretically. 6. IMPLEMENTATION ASPECTS The effectiveness of an AMG method depends strongly on the efficiency of its implementation. In this section we point out some implementation issues that allow to reduce the computation time and memory usage. 6.1. Matrix formulation and storage Reordering the unknowns shows that the tensor product formulations (19) and (15) are mathematically equivalent to the matrix systems: L∗
K i U Ci = B
and
i=1
M X (C1 ⊗ Is )+t
L∗ i=1
K i X (Ci ⊗ ATirk ) = B˜
with the unknowns u and x being collected in the multivectors U ∈ R N ×Q and X ∈ R N ×Qs . Note that the N rows of X equal the N blocks of the unknown vector xˆ in Equation (18). This matrix representation allows an easy access of all the unknowns per nodal point: they correspond to a row in the matrix U or X . Such access is frequently needed for the block smoothing operator, the matrix–vector multiplication in the residual computation, and the block restriction and prolongation operators. Note also that storing these multivectors in a row-by-row storage format enables a cache efficient implementation. With one memory access, a whole set of values can be retrieved from memory that will be used in the subsequent operations. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
155
Obviously, the entire system of dimension NQ×NQ (in the stationary case), or NQs×NQs (in the time-dependent case), is never stored or constructed explicitly. Only the storage of one mass matrix M, of L ∗ stiffness matrices K i and L ∗ matrices Ci is required. These matrices can be stored in sparse matrix format. In general, all stiffness matrices K i have the same sparsity structure; hence, the description of this structure has to be stored just once. 6.2. Krylov acceleration Typically, AMG is used as a preconditioner for a Krylov method. This makes the scheme more robust and often significantly improves the convergence rates. The matrix–vector multiplication needed for Krylov methods can be implemented in a cache efficient way by using the row-by-row storage format suggested above. As explained in [11], the matrix–vector product Y = AX of a sparse matrix A ∈ R N ×N and a multivector X ∈ R N ×Q is implemented as a sequence of three nested loops, where the inner loop runs over the columns of the multivectors instead of over their rows. This results in an optimal reuse of the cache since the data access patterns of X and Y match their storage layout. For the stationary systems, conjugate gradients (CG) can be used as the matrices Ci and the stiffness matrices K i are symmetric. In the time-dependent case, we shall use BiCGStab or one of the GMRES variants because of the non-symmetry of matrix Airk . 6.3. Block smoothing A large part of the computation time is spent in the smoothing steps. At each smoothing iteration N , systems of size Q × Q or Qs×Qs have to be solved. Optimizing the solution time of these local systems is therefore of utmost importance. One possible approach is to factorize these systems already during setup so that every smoothing step only matrix–vector multiplications or back substitutions are required. However, the storage of N matrix factorizations may lead to excessive memory requirements for large values of N and Q. Hence, we will not consider this further. In our experiments with direct solvers, the factorization will be done on the fly. Depending on the properties of the local systems, different solution methods can be selected. Figure 2 shows the average computation time of several solution approaches to solve one local system. The considered methods include an LU solver without pivoting, a sparse LU solver (UMFPACK [29] and SuperLU [30]) and a Krylov method. The tests were performed on a Pentium IV 2.4 GHz machine with 512 MByte RAM. Values for Q as a function of the number of random variables L and the chaos order P are given in Table I. These values are to be multiplied by s to get the system dimension in the IRK case. In the stationary case, considering our model problem (1) discretized with a Hermite or a Legendre chaos, the local systems (21) are sparse and symmetric positive-definite, with clustered eigenvalues and a condition number O(1). For large problem sizes, the CG solver leads to the best performance. No preconditioning is necessary because the systems are well conditioned. In the time-dependent case, the local systems (20) are non-symmetric and sparse. The matrices have clustered, complex eigenvalues and a condition number typically of the order O(10). The sparse LU solver SuperLU [30] yields the smallest execution times for non-trivial problem sizes. In both the stationary and time-dependent cases, a direct solver is the most efficient method if the dimension, Q or Qs, is small enough. 
When the random model parameter is discretized by a generalized polynomial chaos (4) instead of by a KL expansion (5), the local systems have the same dimension Q but become less Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
156
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
10
Average solution time (sec.)
Average solution time (sec.)
10
10
10
10
10
Gauss elimination CG Sparse LU (umfpack) Sparse LU (SuperLU)
0
50
(a)
100
150
10
10
10
Gauss elimination BiCGStab Sparse LU (umfpack) Sparse LU (SuperLU)
10
10
200
0
50
(b)
Dimension local system
100
150
200
250
300
350
400
Dimension local system
Figure 2. Average computation time to solve one local system (21) or (20) in case of the model problem (1) with (x, t, ) modelled as a Gaussian random field (x, ) with an exponential covariance function: (a) stationary problem and (b) time-dependent problem. A Hermite chaos random discretization is used and a Radau IIA IRK method.
Table I. The number of random unknowns Q as a function of the number of random variables L and of the polynomial chaos order P. L P
1
2
4
8
10
15
20
1 2 4
2 3 5
3 6 15
5 15 70
9 45 495
11 66 1001
16 136 3876
21 231 10 626
sparse, see [31]. As a consequence, the local system solves are more time consuming. The corresponding computation times for different solution methods follow, however, the same pattern as in Figure 2.
7. NUMERICAL RESULTS In this section we present some numerical results obtained with the AMG method. First, we investigate the AMG convergence with respect to several discretization parameters for the stationary diffusion equation. The tests use a square spatial domain, D = [0, 1]2 , and piecewise linear, triangular finite elements. We consider homogeneous Dirichlet boundary conditions, and the source term b(x, t) is set to zero. The AMG prolongation operators are built with classical Ruge–St¨uben AMG [32]. The stopping criterion for the AMG method is a residual norm smaller than 10−10 . A random initial approximation to the solution was used. We consider several configurations for the random input (x, t, ). In case of a random field, (x, ), the stochastic diffusion coefficient depends on the spatial position, e.g. representing a heterogeneous material. In case of a random process, (t, ), the stochastic diffusion coefficient remains the same at all spatial points but evolves in Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
157
Table II. Configurations of the random coefficient (x, t, ) in Equation (1). Name
Random discretization
Distribution
Random field
g (x, ) u (x, ) ln (x, ) = exp(g )
Karhunen–Lo`eve expansion Karhunen–Lo`eve expansion Polynomial chaos expansion
Standard normal Uniform on [−1, 1] Standard normal
Random process
t (t, )
Karhunen–Lo`eve expansion
Standard normal
Table III. Number of iterations required to solve the steady-state diffusion equation corresponding to (1) with W (2, 1)-cycles, using AMG as standalone solver, or as preconditioner for CG (between brackets). Spatial nodes Q = 21, P = 2, L = 5
N = 10 177
N = 50 499
N = 113 981
N = 257 488
N = 356 806
31 (15) 31 (15) 32 (16)
35 (16) 34 (16) 37 (17)
36 (17) 36 (17) 39 (18)
36 (17) 36 (17) 38 (17)
37 (17) 37 (17) 39 (18)
L =1 Q =3
L =5 Q = 21
L = 10 Q = 66
L = 15 Q = 136
L = 20 Q = 231
32 (15) 32 (15) 35 (16)
34 (16) 33 (16) 36 (17)
34 (16) 34 (16) 36 (17)
35 (16) 35 (16) 36 (17)
35 (16) 35 (16) 37 (17)
P =1 Q =6
P =2 Q = 21
P =3 Q = 56
P =4 Q = 126
P =5 Q = 252
33 (15) 33 (15) 34 (16)
34 (16) 33 (16) 36 (17)
34 (16) 34 (16) 37 (17)
35 (16) 35 (16) 37 (17)
36 (17) 35 (17) 38 (18)
g u ln Random variables (N = 20 611, P = 2)
g u ln Chaos order (N = 20 611, L = 5)
g u ln
time. For each case, Table II indicates which expansion is used to construct the random input and what type of random variables are present in that expansion. In case of a KL expansion, an exponential covariance function is assumed, C (x, x ) = exp(−|x−x |/lc ), with variance = 0.1 and correlation length lc = 1. In case of the lognormal random field ln , the variance of the underlying Gaussian field g equals 0.3. For each configuration of , the mean value of the random input always equals the constant function 1. When the stochastic discretization is based on uniformly distributed random variables, a Legendre polynomial chaos is used, in the case of standard normal distributed random variables a Hermite chaos. Next, the AMG performance will be illustrated for a more complex test problem. 7.1. Stationary problems The dependence of the AMG convergence properties on the spatial and stochastic discretization parameters is illustrated by the numerical results displayed in Table III. As AMG cycle type, W (2, 1)-cycles are used since these result in a lower overall solution time compared with V - or F-cycles, see also Figure 3. As expected from the discussion in Section 5, the number of AMG Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
158
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
(a)
(b)
Figure 3. Total solution time when solving the steady-state problem with u (x, ), L = 5 and (2, 1)-cycles of AMG iterations: (a) a second-order Legendre chaos is used, resulting in Q = 21 and (b) the discretization is based on a first until a fifth-order Legendre chaos and a mesh with 20 611 nodes. Table IV. The number of iterations required to solve problem (38) with W (2, 1)-cycles, using AMG as standalone solver, or as preconditioner for CG (between brackets), until residual <10−10 . Polynomial chaos order
g (Hermite chaos) u (Legendre chaos) ln (Hermite chaos)
P =1
P =2
P =3
P =4
P =5
24 (16) 24 (16) 43 (25)
25 (17) 24 (16) 48 (28)
27 (19) 25 (17) 58 (33)
35 (22) 25 (17) 72 (40)
86 (61) 25 (17) 97 (53)
The finite element mesh consists of N = 20 611 nodes, and five random variables are used to discretize the random space.
Table V. The number of iterations required to solve the time-dependent problem (1) with V (2, 1)-cycles, using AMG as standalone solver, or as preconditioner for BiCGStab (between brackets). Time discretization order IRK stages
g t
1
3
5
7
9
11
s =1
s =2
s =3
s =4
s =5
s =6
41 (18) 39 (18)
33 (19) 32 (19)
29 (19) 28 (19)
27 (20) 27 (18)
27 (18) 27 (19)
27 (19) 27 (19)
The discretization is based on a finite element mesh with 20 611 nodes, a second-order Hermite chaos with L = 5, corresponding to Q = 21, and a Radau IIA implicit Runge–Kutta scheme with t = 0.01.
iterations is independent of the stochastic and spatial discretization when applied to our model problem. The independence on the polynomial chaos order is maintained in the case of a Hermite chaos for a low to moderate chaos order. Applying Krylov acceleration results in a more robust convergence and reduced computing times. The computation times for the calculations in Table III are presented graphically in Figure 3. The total AMG solution time is shown as a function of the number of spatial nodes and of the number of random unknowns. For this problem, the matrices Ci are defined as in Equation (7). By Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
159
increasing the number of spatial nodes, the number of local solves in the blocksmoother increases proportionally. In addition, extra coarse levels may be introduced so that the total increase in computation time is no longer linear. The results from the right figure were obtained by increasing the polynomial chaos order while keeping the number of random variables L constant. Thus, only the dimension of the matrices Ci increases; the number of stiffness matrices L ∗ remains the same. This mainly affects the cost of the block solves in the smoother. With CG, the number of iterations required is proportional to the square root of the condition number of (21). In practice this condition number is close to 1 so that the number of CG iterations is more or less independent of the dimension of the systems. The cost of each CG iteration depends on the sparsity of the matrix, which, with Ci defined by (7), is of the order O(Q). This results in a cost O(Q) to solve one block system in the smoother. The linear increase of the computation time in function of the number of random unknowns Q is clearly observed in Figure 3. If the number of stiffness matrices is also increased, then the total computing time tends to grow faster than linear. Also in the case of a polynomial chaos expansion of the random input, as in the lognormal field example, higher computing times are observed. This is caused by the larger number of stiffness matrices, Q instead of L +1, and by the decreased sparsity of the matrices Ci (8). As discussed in Section 5, the convergence analysis indicates that the convergence of AMG is asymptotically independent of the polynomial chaos order in case of a Legendre chaos but not in the case of a Hermite chaos. For model problem (1), solely a large polynomial chaos order has an impact on the multigrid convergence. For some problems, however, also small values of the polynomial chaos order affect the AMG convergence rate. This is illustrated by the problem 2
2
* u(x, ) * u(x, ) +(x, ) =0 *x 2 *y 2
(38)
which is discretized similar to our model problem. Table IV shows the AMG convergence for increasing values of the polynomial chaos order. In case of a Hermite chaos, the deteriorating AMG convergence is observed. As expected, the number of iterations remains unchanged in case of a Legendre chaos. 7.2. Time-dependent problems First, we illustrate the influence of the number of IRK stages on the AMG convergence, for model problem (1). The results are presented in Table V for a number of stages s, increasing from 1 up to 6. The first corresponds to a first-order method, while the later leads to a time integration scheme of order 11. As a test case, a Gaussian field g (x, ) and a Gaussian process t (t, ) were used, see Table II for the characteristics. Note that the convergence analysis predicts an increased AMG convergence rate when the number of IRK stages is increased. This effect is visible in the numerical results. Finally, we consider a more challenging transient potential equation: *V (x, t, ) −∇ ·( (x, )∇V (x, t, )) = 0 *t
(39)
on a complex domain. Here, V denotes the electric potential and the electric permittivity. No charge density is present. The domain, the boundary conditions and the setup for are presented in Figure 4. The model represents a three-phase cable, with four constant potentials along the outer and inner boundaries. The permittivity is expressed as a piecewise constant random field Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
160
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
Figure 4. Configuration of a two-dimensional stochastic problem. The random variables 1 , 2 , 3 are independent and each uniformly distributed on [−1, 1]. The permittivity of the free space, 0 = 8.85×10−12 .
-0.906
0
Mean at t = 0.1 -0.0436
0.819
Variance at t = 0.1 0.000529 0.00106
-0.906
0
Mean at t = 0.5 -0.0436
0.819
Variance at t = 0.5 0.000934 0.00187
-0.906
0
Mean at t = 2 -0.0436
0.819
Variance at t = 2 0.000999
0.002
Figure 5. Mean and variance of the solution of Equation (39). The configuration of Figure 4 is used with a three-stage Radau IIA IRK discretization and time step 0.05. The stochastic discretization is based on a second-order Legendre chaos. The electric potential is zero initially.
corresponding to the different material regions of the cable. The stochastic PDE models the effect of deviations in permittivity on the resulting electric potential as a function of space and time. Figure 5 shows the mean value and the variance of the electric potential at several instances in Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
(a)
161
(b)
Figure 6. Residual norms as a function of the number of iterations when solving Equation (39) on the domain represented by Figure 4, discretized with 166 245 nodes. A three-stage Radau IIA IRK discretization with time step 0.05 is used together with a second-order Legendre chaos resulting in a total of 1.7×106 unknowns in the steady-state case and 5.0×106 in the time-dependent case: (a) Steady-state problem, F(2, 1) cycles and (b) transient problem, V(2, 1) cycles.
time. Applying AMG results in similar convergence properties as the ones described above. An illustration of the convergence history as a function of the iteration index is given in Figure 6. Observe that the use of GMRESR [33] results in a more robust convergence than BiCGStab. This is typically also the case for classical deterministic PDEs. To limit the memory requirements of GMRESR, the method is restarted every five iterations.
8. CONCLUSIONS We have constructed and analyzed an AMG method for stochastic finite element discretizations of time-dependent stochastic PDEs. This work extends previous research on multigrid for stochastic finite element problems [16, 27] towards unstructured finite element meshes and high-order time discretizations. The presented AMG method has very favorable convergence properties with respect to the spatial, random and time discretization. To solve real engineering stochastic PDEs by the stochastic finite element method, however, further research is necessary. By using the knowledge of the stochastic discretization, the AMG components may be enhanced and optimized.
APPENDIX A: THE COMPUTATION of i j k IN CASE OF A HERMITE CHAOS The Hermite chaos of order P and defined over L standard normal variables 1 , . . . , L is constructed as a set of Q multivariate Hermite polynomials q , each defined as [34] q =
Copyright q
2008 John Wiley & Sons, Ltd.
L
1 H q,i (i ) i=1 q,i ! Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
162
E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
with Hn (z) being a one-dimensional Hermite polynomial of order n and q = ( q,i , . . . , q,L ) a set L of non-negative integers with only a finite number non-zeros and i=1
q,i
H1 (z) = z,
Hn+1 (z) = z Hn (z)−n Hn−1 (z)
Based on the properties of Hermite polynomials [28, p. 390], the inner product of three multivariate Hermite polynomials can be calculated as
im ! jm ! km ! L i j k = (A1) m=1 (sm − im )!(sm − jm )!(sm − km )! if 2sm = im + jm + km is an even integer and sm im , sm jm , sm km , otherwise the inner product is zero.
ACKNOWLEDGEMENTS
This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. The scientific responsibility rests with its authors. This research was supported in part by the Research Council K.U.Leuven, CoE EF/05/006 Optimization in Engineering (OPTEC). REFERENCES 1. Karniadakis G, Su C-H, Xiu D, Lucor D, Schwab C, Todor R. Generalized polynomial chaos solution for differential equations with random inputs. Research Report 2005-01, Seminar for Applied Mathematics, ETH Z¨urich, January 2005. 2. Xiu D, Karniadakis G. Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos. Computer Methods in Applied Mechanics and Engineering 2002; 191:4927–4948. 3. Schu¨eller GI. A state-of-the-art report on computational stochastic mechanics. Probabilistic Engineering Mechanics 1997; 12(4):197–322. 4. Babuˇska I, Tempone R, Zouraris GE. Solving elliptic boundary value problems with uncertain coefficients by the finite element method: the stochastic formulation. Computer Methods in Applied Mechanics and Engineering 2005; 194:1251–1294. 5. Babuˇska I, Chatzipantelidis P. On solving elliptic stochastic partial differential equations. Computer Methods in Applied Mechanics and Engineering 2002; 191:4093–4122. 6. Shinozuka M, Deodatis G. Response variability of stochastic finite element systems. Journal of Engineering Mechanics 1988; 114:499–519. 7. Ghanem R, Spanos P. Stochastic Finite Elements, a Spectral Approach. Dover: Mineola, NY, 2003. 8. Ghanem R, Spanos P. A spectral stochastic finite element formulation for reliability analysis. Journal of Engineering Mechanics (ASCE) 1991; 17:2351–2372. 9. Hairer E, Wanner G. Solving Ordinary Differential Equations II: Stiff and Differential-algebraic Problems. Springer: Berlin, Germany, 1991. 10. Van lent J, Vandewalle S. Multigrid methods for implicit Runge–Kutta and boundary value method discretizations of parabolic pdes. SIAM Journal on Scientific Computing 2005; 27(1):67–92. 11. Boonen T, Van lent J, Vandewalle S. An algebraic multigrid method for high order time-discretization of the div–grad and the curl–curl equations. Applied Numerical Mathematics 2007; in press. 12. Xiu D, Karniadakis G. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing 2002; 24(2):619–644. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS
163
13. Babuˇska I, Tempone R, Zouraris GE. Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM Journal on Numerical Analysis 2004; 42:800–825. 14. Wan X, Karniadakis GE. An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. Journal of Computational Physics 2005; 209(2):617–642. 15. Le Maˆıtre O, Knio O, Debusschere B, Najm H, Ghanem R. A multigrid solver for two-dimensional stochastic diffusion equations. Computer Methods in Applied Mechanics and Engineering 2003; 192:4723–4744. 16. Seynaeve B, Rosseel E, Nicola¨ı B, Vandewalle S. Fourier mode analysis of multigrid methods for partial differential equations with random coefficients. Journal of Computational Physics 2007; 224:132–149. 17. Ghanem R, Saad G, Doostan A. Efficient solution of stochastic systems: application to the embankment dam problem. Structural Safety 2007; 29(3):238–251. 18. Xiu D, Karniadakis G. Modeling uncertainty in flow simulations via generalized polynomial chaos. Journal of Computational Physics 2003; 187:137–167. 19. Lo`eve M. Probability Theory. Springer: New York, U.S.A., 1977. 20. Ghanem R. The nonlinear Gaussian spectrum of log-normal stochastic processes and variables. Journal of Applied Mechanics—Transactions of the ASME 1999; 66(4):964–973. 21. Phoon K, Huang S, Quek S. Simulation of second-order processes using Karhunen–Lo`eve expansion. Computers and Structures 2002; 80:1049–1060. 22. Sudret B, Der Kiureghian A. Stochastic finite elements and reliability: a state-of-the-art report. Technical Report UCB/SEMM-2000/08, University of California, Berkeley, 2000. 23. Ruge JW, St¨uben K. Algebraic multigrid. In Multigrid Methods, McCormick SF (ed.). Frontiers in Applied Mathematics. SIAM: Philadelphia, U.S.A., 1987; 73–130. 24. Trottenberg U, Oosterlee C, Sch¨uller A. Multigrid. Academic Press: San Diego, U.S.A., 2001. 25. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31: 333–390. 26. Wienands R, Joppich W. Practical Fourier Analysis for Multigrid Methods. CRC Press: Boca Raton, FL, U.S.A., 2005. 27. Elman H, Furnival D. Solving the stochastic steady-state diffusion problem using multigrid. IMA Journal of Numerical Analysis 2007; 27(4):675–688. 28. Szeg¨o G. Orthogonal Polynomials (4th edn). American Mathematical Society: Providence, U.S.A., 1967. 29. Davis TA. Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method. ACM Transactions on Mathematical Software 2004; 30(2):196–199. 30. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JWH. A supernodal approach to sparse partial pivoting. SIAM Journal on Matrix Analysis and Applications 1999; 20(3):720–755. 31. Eiermann M, Ernst OG, Ullmann E. Computational aspects of the stochastic finite element method. Computing and Visualization in Science 2007; 10(1):3–15. 32. St¨uben K. A review of algebraic multigrid. Journal of Computational and Applied Mathematics 2001; 128: 281–309. 33. Van der Vorst H, Vuik C. GMRESR: a family of nested GMRES methods. Numerical Linear Algebra with Applications 1994; 1(4):369–386. 34. Matthies H, Keese A. Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations. Computer Methods in Applied Mechanics and Engineering 2005; 194:1295–1331. 35. Soize C, Ghanem R. Physical systems with random uncertainties: chaos representations with arbritary probability measure. SIAM Journal on Scientific Computing 2004; 26(2):395–410.
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:141–163 DOI: 10.1002/nla
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:165–185 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.577
Algebraic multigrid for k-form Laplacians Nathan Bell∗, † and Luke N. Olson Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801, U.S.A.
SUMMARY In this paper we describe an aggregation-based algebraic multigrid method for the solution of discrete k-form Laplacians. Our work generalizes Reitzinger and Sch¨oberl’s algorithm to higher-dimensional discrete forms. We provide conditions on the tentative prolongators under which the commutativity of the coarse and fine de Rham complexes is maintained. Further, a practical algorithm that satisfies these conditions is outlined, and smoothed prolongation operators and the associated finite element spaces are highlighted. Numerical evidence of the efficiency and generality of the proposed method is presented in the context of discrete Hodge decompositions. Copyright q 2008 John Wiley & Sons, Ltd. Received 14 May 2007; Revised 4 December 2007; Accepted 5 December 2007 KEY WORDS:
algebraic multigrid; Hodge decomposition; discrete forms; mimetic methods; Whitney forms
1. INTRODUCTION Discrete differential k-forms arise in scientific disciplines ranging from computational electromagnetics to computer graphics. Examples include stable discretizations of the eddy-current problem [1–3], topological methods for sensor network coverage [4], visualization of complex flows [5, 6], and the design of vector fields on meshes [7]. In this paper we consider solving problems of the form ddk = k
(1)
where d denotes the exterior derivative and d the codifferential relating k-forms and . For k = 0, 1, 2, dd is also expressed as ∇·∇, ∇×∇×, and ∇∇·, respectively. We refer to operator dd generically as a Laplacian, although it does not correspond to the Laplace–de Rham operator D = dd+dd except for the case k = 0. We assume that (1) is discretized with mimetic first-order elements ∗ Correspondence †
to: Nathan Bell, Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801, U.S.A. E-mail:
[email protected],
[email protected]
Copyright q
2008 John Wiley & Sons, Ltd.
166
N. BELL AND L. N. OLSON
such as Whitney forms [8, 9] on simplicial meshes or the analog on hexahedral [10] or polyhedral elements [11]. In general, we use Ik to denote the map from discrete k-forms (cochains) to their respective finite elements. Such discretizations give rise to a discrete exterior k-form derivative Dk and discrete k-form innerproduct Mk (i, j) = Ik ei , Ik e j, which allows implementation of (1) in weak form as DTk Mk+1 Dk x = b
(2)
under the additional assumption that d commutes with I , i.e. Ik+1 Dk = dk Ik . This relationship is depicted as k 6
dk -
Ik
kd
k+1 6 Ik+1
Dk -
(3)
k+1 d
where k and kd denote the spaces of differential k-forms and discrete k-forms, respectively. For the remainder of the paper, we restrict our attention to solving (2) on structured or unstructured meshes of arbitrary dimension and element type, provided the elements satisfy the aforementioned commutativity property.
Figure 1. Enumeration of nodes (left), oriented edges (center), and oriented triangles (right) for a simple triangle mesh. We say that vertices 2 and 3 are upper adjacent since they are joined by edge 4. Similarly, edges 5 and 6 are both faces of triangle 2 and therefore upper adjacent.
Figure 2. Forms I0 0 , I1 D0 0 , and I1 1 where I denotes Whitney interpolation. The left and center figures illustrate property (3). Whether the derivative is applied before or after interpolation, the result is the same. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
167
AMG FOR k-FORM LAPLACIANS
1.1. Example Although our results hold more generally, it is instructive to examine a concrete example that satisfies the assumptions set out in Section 1. To this end, consider the three-element simplicial mesh depicted in Figure 1, with the enumeration and orientation of vertices, edges, and triangles as shown. In this example, we choose Whitney forms [8] to define the interpolation operators I0 , I1 , I2 which in turn determine the discrete innerproducts M0 , M1 , M2 . Finally, sparse matrices ⎡
−1 1 0 0 ⎢ −1 0 0 1 ⎢ ⎢ 0 ⎢ 0 −1 1 ⎢ 0 −1 0 1 D0 = ⎢ ⎢ ⎢ 0 0 −1 1 ⎢ ⎢ ⎣ 0 0 −1 0 0 0 0 −1 ⎤ −1 0 1 0 0 0 ⎥ 0 1 −1 1 0 0 ⎦ , 0 0 0 −1 1 −1
⎡ ⎤ 0 ⎢ 0⎥ ⎢ ⎥ ⎢ ⎥ D−1 = ⎢ 0⎥ , ⎢ ⎥ ⎣ 0⎦ 0 ⎡
1 ⎢ D1 = ⎣ 0 0
⎤ 0 0⎥ ⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎥ ⎥ 1⎦
(4)
1 D2 = [0 0 0]
(5)
implement the discrete k-form derivative operators. A discrete k-form (cochain), denoted k , is represented by a column vector with entries corresponding to each of the k-simplices in the mesh. For example, the Whitney-interpolated fields corresponding to 0 = [0, 1, 2, 1, 2]T , the gradient D0 0 = [1, 1, 1, 0, −1, 0, 1]T , and another 1-form 1 = [1, 0, 1, 0, 0, 1, 0]T are shown in Figure 2. By convention, D−1 and D2 are included to complete the exact sequence. 1.2. Related work There is significant interest in efficient solution methods for Maxwell’s eddy-current problem ∇×∇×E+E = f
(6)
In particular, recent approaches focus on multilevel methods for both structured and unstructured meshes [12–15]. Scalar multigrid performs poorly on edge element discretizations of (6) since error modes that lie in the kernel of ∇×∇× are not effectively damped by standard relaxation methods. Fortunately, the problematic modes are easily identified by the range of the discrete gradient operator D0 , and an appropriate hybrid smoother [12, 13] can be constructed. An important property of these multigrid methods is commutativity between coarse and fine finite element spaces. The relationship is described as 0d 6
P0 0 d
Copyright q
2008 John Wiley & Sons, Ltd.
D0 - 1 d 6
P1
(7)
0 D 1 - d Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
168
N. BELL AND L. N. OLSON
k 0 the coarse gradient operator, and P0 and where d is the space of coarse discrete k-forms, D P1 are the nodal and edge prolongation operators, respectively. Combining (7) with (3) yields the same result for the corresponding fine and coarse finite element spaces. In [14], Reitzinger and Sch¨oberl describe an algebraic multigrid method for solving (6) on unstructured meshes. In their method, property (7) is maintained by choosing nodal aggregates and using these aggregates to obtain compatible edge aggregates. The nodal and edge aggregates then give rise to piecewise-constant prolongators P0 and P1 , which can be smoothed to achieve better multigrid convergence rates [15] while retaining property (7). The method we present can be viewed as a natural extension of Reitzinger and Sch¨oberl’s work from 1-forms to general k-forms. Commutativity of the coarse and fine de Rham complexes is maintained for all k-forms, and their associated finite element spaces Ik kd ⊂ k . The relationship is described by
0d
D0 - 1 d 6
6 P0 0 d
D1 - 2 d 6
P1
0 D 1 - d
...
6
P2
1 D 2 - d
kd Pk
...
k d
Dk -
k+1 d 6 Pk+1
(8)
k D k+1 - d
where Pk denotes either the tentative prolongator Pk or smoothed prolongator Sk Pk . 1.3. Focus and applications While our work is largely inspired by multigrid solvers for (6), our intended applications do not focus specifically on the eddy-current problem. Indeed, recent work suggests that the emphasis on multilevel commutativity, a property further developed in this paper, is at odds with developing efficient solvers for (6) in the presence of highly variable coefficients [16]. Although our method generalizes the work of Reitzinger and Sch¨oberl [14] and Hu et al. [15], this additional generality does not specifically address the aforementioned eddy-current issues. In Section 3, we discuss computing Hodge decompositions of discrete k-forms with the proposed method. The Hodge decomposition is a fundamental tool in both pure and applied mathematics that
Figure 3. The two harmonic 1-forms of a rocker arm surface mesh. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
AMG FOR k-FORM LAPLACIANS
169
exposes topological information through differential forms. For example, the two harmonic 1-forms shown in Figure 3 exist because the manifold has genus 1. The efficient solution of discrete k-form Laplacians has substantial utility in computational topology. For instance, sufficient conditions on the coverage of sensor networks reduce to the discovery of harmonic forms on the simplicial Rips complex [4]. In such applications, we do not encounter variable coefficients and often take the identity matrix for Mk .
2. PROPOSED METHOD 2.1. Complex coarsening k , In this section we describe the construction of tentative prolongators Pk and coarse operators D which satisfy (8). In practice, the two-level commutativity depicted in (8) is extended recursively for use in a multilevel method. Also, it is important to note that when solving (2) for a specific k, it is not necessary to coarsen the entire complex. As in [14], we presume the existence of a nodal aggregation algorithm that produces a piecewiseconstant tentative prolongator P0 . This procedure, called aggregate nodes in Algorithm 1, is fulfilled by either smoothed aggregation [17] or a graph partitioner on matrices DT0 M1 D0 or DT0 D0 . Ideally, the nodal aggregates are contiguous and have a small number of interfaces with other aggregates. Algorithm 1. coarsen complex(D−1 , D0 , . . . , D N ) 1 2 3 4 5 6 7 8
P0 ⇐ a g g r e g a t e n o d e s (D0 , . . .) f o r k = 0 t o N −1 Pk+1 ⇐ i n d u c e d a g g r e g a t e s (Pk , Dk , Dk+1 ) k ⇐ (P T Pk+1 )−1 P T Dk Pk D k+1 k+1 end −1 ⇐ P T D−1 D 0 N ⇐ D N PN D −1 , D 0, . . . , D N r e t u r n P0 , P1 , . . . , PN and D
2.2. Induced aggregates The key concept in [14], which we apply and extend here, is that nodal aggregates induce edge aggregates; we denote P1 as the resulting edge aggregation operator. As depicted in Figure 4, a coarse edge exists between two coarse nodal aggregates when any fine edge joins them. Multiple fine edges between the same two coarse nodal aggregates interpolate from a common coarse edge with weight 1 or −1 depending on their orientation relative to the coarse edge. The coarse nodes 0 , which satisfies diagram (7). and coarse edges define a coarse derivative operator D We now restate the previous process in an algebraic manner that generalizes to arbitrary k-forms. Given P0 as before, form the product D = D0 P0 that relates coarse nodes to fine edges. Observe that each row of D corresponds to a fine edge and each column to a coarse node. Notice that the ith row of D is zero when the end points of fine edge i lie within the same nodal aggregate. Conversely, the ith row of D is nonzero when the end points of fine edge i lie in different nodal Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
170
N. BELL AND L. N. OLSON
Figure 4. Nodal aggregates (left) determine coarse edges (center) through the algorithm induced aggregates. Fine edges crossing between node aggregates interpolate from the corresponding coarse edge with weight 1 or −1 depending on their relative orientation. Edges contained within an aggregate do not correspond to any coarse edge and receive weight 0. These weights are determined by lines 10–13 of induced aggregates.
aggregates. Furthermore, when two nonzero rows are equal up to a sign (i.e. linearly dependent), they interpolate from a common coarse edge. Therefore, the procedure of aggregating edges reduces to computing sets of linearly dependent rows in D. Each set of dependent rows yields a coarse edge and thus a column of P1 . In the general case, sets of dependent rows in D = Dk Pk are identified and used to produce Pk+1 . The process can be repeated to coarsen the entire de Rham complex. Alternatively, the coarsening can be stopped at a specific k < N . In Section 2.5, we discuss the coarse derivative operator k ⇐ (P T Pk+1 )−1 P T Dk Pk and show that it satisfies diagram (8). D k+1 k+1 Algorithm 2. induced aggregates(Pk , Dk , Dk+1 ) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
D ⇐ Dk Pk G ⇐ DTk+1 Dk+1 V ⇐ {} n ⇐0 f o r i i n r ows(D) s u c h t h a t D(i, :) = 0 i f i ∈ V An ⇐ d e p e n d e n t r o w s (G, D, i) f o r j ∈ An i f D(i, :) = D( j, :) Pk+1 ( j, n) ⇐ 1 else Pk+1 ( j, n) ⇐ −1 end end n ⇐ n +1 V ⇐ V ∪ An end end r e t u r n Pk+1
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
AMG FOR k-FORM LAPLACIANS
171
Intuitively, linear dependence between rows in D = Dk Pk indicates redundancy created by operator Pk . Aggregating dependent rows together removes redundancy from the output of D and compresses the remaining degrees of freedom into a smaller set of variables. By construction, the tentative prolongators have full column rank and satisfy R(Dk Pk ) ⊂ R(Pk+1 )
(9)
where R(A) denotes the range of matrix A. Note that property (9) is clearly necessary to satisfy diagram (8). Using disjoint sets of dependent rows A0 , A1 , . . ., the function induced aggregates constructs the aggregation operator Pk+1 described above. Nonzero entry Pk+1 (i, j) indicates membership of the ith row of D—i.e. the ith k +1-dimensional element—to the jth aggregate A j .
2.3. Computing aggregates For a given row index i, the function dependent rows constructs a set of rows that are linearly dependent to D(i, :). In the matrix graph of G, a nonzero entry G(i, j) indicates that the k + 1-dimensional elements with indices i and j are upper adjacent [18]. In other words, i and j are both faces of some k +2-dimensional element. For example, two edges in a simplicial mesh are upper adjacent if they belong to the same triangle. All linearly dependent rows that are adjacent in the matrix graph of G are aggregated together. This construction ensures that the aggregates produced by dependent rows are contiguous. As shown in Figure 5, such aggregates are more T natural than those that result from aggregating all dependent rows together (i.e. using G = D D ). Algorithm 3. dependent rows(G, D, i) 1 2 3 4 5 6 7 8 9 10 11 12 13
Q ⇐ {i} A ⇐ {i} w h i l e Q = {} j ⇐ pop(Q) Q ⇐ Q \{ j} f o r k s u c h t h a t G( j, k) = 0 i f k ∈ A and D(i, :) = ±D(k, :) A ⇐ A ∪{k} Q ⇐ Q ∪{k} end end end return A
2.4. Example In this section, we describe the steps of our algorithm applied to the three-element simplicial mesh depicted in Figure 1. Matrices D−1 , D0 , D1 , and D2 , shown in Section 1.1, are first computed Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
172
N. BELL AND L. N. OLSON
Figure 5. Example where contiguous (center) and noncontiguous (right) aggregation differs. Contiguous aggregates are reflected through our choice of G defined in induced aggregates and later used in dependent rows.
and then passed to coarsen complex. The externally defined procedure aggregate nodes is then called to produce the piecewise-constant nodal aggregation operator ⎤ 1 0 0 ⎥ ⎢ ⎢ 1 0 0⎥ ⎥ ⎢ ⎥ ⎢ P0 = ⎢ 0 1 0⎥ ⎥ ⎢ ⎢ 1 0 0⎥ ⎦ ⎣ ⎡
(10)
0 0 1 whose corresponding aggregates are shown in Figure 6. At this stage of the procedure, a more general nodal problem DT0 M1 D0 may be utilized in determining the coarse aggregates. Next, induced aggregates is invoked with arguments P0 , D0 , D1 and the sparse matrix ⎡
0
0
0
⎤
⎥ ⎢ ⎢ 0 0 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢ −1 1 0⎥ ⎥ ⎢ ⎥ ⎢ 0 0⎥ D = D0 P0 = ⎢ 0 ⎥ ⎢ ⎢ 1 −1 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 0 −1 1⎥ ⎦ ⎣ −1
0
(11)
1
is constructed. Recall from Section 2.2 that the rows of D are used to determine the induced edge aggregates. The zero rows of D, namely rows 0, 1, and 3, correspond to interior edges, which is confirmed by Figure 6. Linear dependence between rows 2 and 4 indicates that edges 2 and 4 have common coarse endpoints, with the difference in sign indicating opposite orientations. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
173
AMG FOR k-FORM LAPLACIANS
Figure 6. Original mesh with nodal aggregates (left), coarse nodes (center), and coarse edges (right).
For each nonzero and un-aggregated row of D, dependent rows traverses ⎡
1
−1
0
1
0
0
0
⎤
⎥ ⎢ ⎢ −1 1 0 −1 0 0 0⎥ ⎥ ⎢ ⎥ ⎢ 0 0 1 −1 1 0 0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ T 0⎥ G = D1 D1 = ⎢ 1 −1 −1 2 −1 0 ⎥ ⎢ ⎢ 0 0 1 −1 2 −1 1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 0 0 0 0 −1 1 −1⎥ ⎦ ⎣ 0
0
0
0
−1
1
(12)
1
to find dependent rows among upper-adjacent edges. In this case, edges 3 and 4 are upper adjacent to 2; however, only row 4 in D is linearly dependent to row 2 in D. Rows 5 and 6 of D are not linearly dependent to any other rows, thus forming single aggregates for edges 5 and 6. The resulting aggregation operator ⎤ 0 0 ⎥ ⎢ ⎢ 0 0 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 1 0 0⎥ ⎥ ⎢ ⎥ ⎢ P1 = ⎢ 0 0 0⎥ ⎥ ⎢ ⎢ −1 0 0⎥ ⎥ ⎢ ⎥ ⎢ ⎢ 0 1 0⎥ ⎦ ⎣ ⎡
0
0
(13)
0 1
is then used to produce the coarse discrete derivative operator ⎡
−1
0 = (P T P1 )−1 P T D0 P0 = ⎢ D ⎣ 0 1 1
−1 Copyright q
2008 John Wiley & Sons, Ltd.
⎤ 0 ⎥ −1 1⎦ 1
0
(14)
1
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
174
N. BELL AND L. N. OLSON
for the mesh in Figure 6. Subsequent iterations of the algorithm produce operators ⎡ ⎤ 0 ⎢ ⎥ 1 = (P T P2 )−1 P T D1 P1 = [1 1 −1], D 2 = D2 P2 = [0] P2 = ⎣ 0⎦ , D 2 2
(15)
1 which complete the coarse de Rham complex. 2.5. Commutativity 0, D 1, . . . , D K We now prove tentative prolongators P0 , P1 , . . . , PK and coarse derivative operators D produced by Algorithm 1 satisfy commutative diagram (8). The result is summarized by the following theorem. Theorem 1 k Let Pk : d → kd denote the discrete k-form prolongation operators with the following properties: Pk+1
has full column rank
(16a)
R(Dk Pk ) ⊂ R(Pk+1 )
(16b)
k ⇐ (P T Pk+1 )−1 P T Dk Pk D k+1 k+1
(16c)
Then, diagram (8) holds. That is, k Dk Pk = Pk+1 D
(17)
Proof Since Pk+1 has full column rank, the pseudoinverse is given by + T T Pk+1 = (Pk+1 Pk+1 )−1 Pk+1
(18)
Recall that for an arbitrary matrix A, the pseudoinverse satisfies A A+ A = A. Furthermore, R(Dk Pk ) ⊂ R(Pk+1 ) implies that Dk Pk = Pk+1 X for some matrix X. Combining these observations, k = Pk+1 P + Dk Pk Pk+1 D k+1 + = Pk+1 Pk+1 Pk+1 X
= Pk+1 X = Dk Pk
Since Algorithm 1 meets assumptions (16a)–(16c) it follows that diagram (8) is satisfied. Also, T P assuming disjoint aggregates, the matrix (Pk+1 k+1 ) appearing in (18) is a diagonal matrix; hence, its inverse is easily computed. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
175
AMG FOR k-FORM LAPLACIANS
2.6. Exact sequences The de Rham complex formed by the fine-level discrete derivative operators 0
D−1 -
D0 - 1 d
0d
D1 ···
D N −1-
dN
DN 0
(19)
is an exact sequence, i.e. img(Dk ) ⊂ ker(Dk+1 ) or equivalently Dk+1 Dk = 0. A natural question to ask is whether the coarse complex retains this property. As argued in Section 2.5, Dk Pk = Pk+1 X for some matrix X; therefore, it follows k+1 D k = P + Dk+1 Pk+1 P + Dk Pk D k+2 k+1 + + = Pk+2 Dk+1 Pk+1 Pk+1 Pk+1 X + Dk+1 Pk+1 X = Pk+2 + = Pk+2 Dk+1 Dk Pk
=0 since Dk+1 Dk = 0 by assumption. From diagram (3), we infer the same result for the associated finite element spaces. 2.7. Smoothed prolongators While the tentative prolongators P0 , P1 , . . . produced by coarsen complex commute with Dk and give rise to an coarse exact sequence, their piecewise-constant nature leads to suboptimal multigrid scaling [14, 15]. In smoothed aggregation [17], the tentative prolongator P is smoothed to produce another prolongator P = S P with superior approximation characteristics. We consider prolongation smoothers of the form S = (I −SA). Possible implementations include Richardson S = I , Jacobi S = diag(A)−1 , and polynomial S = p(A) [19]. Smoothed prolongation operators are desirable, but straightforward application of smoothers to each of P0 , P1 , . . . violates commutativity. The solution proposed in [15] smooths P0 and P1 with compatible smoothers S0 , S1 such that commutativity of the smoothed prolongators 0 . In the following theorem, we generalize this result to P0 , P1 is maintained, i.e. D0 P0 = P1 D arbitrary k. Theorem 2 k Given discrete k-form prolongation operators Pk satisfying (16a)–(16c), let Pk : d → kd denote the smoothed discrete k-form prolongation operators with the following properties: Pk = Sk Pk
(20a)
S0 = (I −S0 DT0 M1 D0 )
(20b)
Sk = (I −Sk DTk Mk+1 Dk −Dk−1 Sk−1 DTk−1 Mk ) Copyright q
2008 John Wiley & Sons, Ltd.
for k>0
(20c)
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
176
N. BELL AND L. N. OLSON
where Sk defines the type of prolongation smoother. Then, diagram (8) holds. That is, k Dk Pk = Pk+1 D
(21)
Dk Sk = Sk+1 Dk
(22)
Proof First, if
then k = Sk+1 Pk+1 D k Pk+1 D T T = Sk+1 Pk+1 (Pk+1 Pk+1 )−1 Pk+1 Dk Pk
= Sk+1 Dk Pk = Dk Sk Pk = Dk Pk Therefore, it suffices to show that (22) holds for all k. For k = 0, we have S1 D0 = (I −S1 DT1 M2 D1 −D0 S0 DT0 M1 )D0
= (D0 −S1 DT1 M2 D1 D0 −D0 S0 DT0 M1 D0 ) = (D0 −D0 S0 DT0 M1 D0 ) = D0 (I −S0 DT0 M1 D0 ) = D0 S0 and for all k>1 we have Sk+1 Dk = (I −Sk+1 DTk+1 Mk+2 Dk+1 −Dk Sk DTk Mk+1 )Dk
= (Dk −Sk+1 DTk+1 Mk+2 Dk+1 Dk −Dk Sk DTk Mk+1 Dk ) = (Dk −Dk Sk DTk Mk+1 Dk ) = (Dk −Dk Sk DTk Mk+1 Dk −Dk Dk−1 Sk−1 DTk−1 Mk ) = Dk (I −Sk DTk Mk+1 Dk −Dk−1 Sk−1 DTk−1 Mk ) = Dk Sk which completes the proof of (21).
k replace Mk k = P T Mk Pk and derivatives D On subsequent levels, the coarse innerproducts M k T k = P Ak Pk can also be and Dk in the definition of Sk . As shown below, the Galerkin product A k Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
177
AMG FOR k-FORM LAPLACIANS
expressed in terms of the coarse operators k = PkT Ak Pk A = PkT DTk Mk+1 Dk Pk k k P T Mk+1 Pk+1 D =D k+1 T
k+1 D k k M =D T
2.8. Extensions and applications Note that condition (9) permits some freedom in our choice of aggregates. For instance, in restricting ourselves to contiguous aggregates we have slightly enriched the range of Pk+1 beyond what is necessary. Provided that Pk+1 already satisfies (9), additional coarse basis functions can be introduced to better approximate low-energy modes. As in smoothed aggregation, these additional columns of Pk+1 can be chosen to exactly interpolate given near-nullspace vectors [17]. So far we have only discussed coarsening the cochain complex (8). It is worth noting that coarsen complex works equally well on the chain complex formed by the mesh boundary operators *k = DTk−1 , 0
DT−1
0d
DTN −2 N −1 DTN −1 ··· d
DT0
dN
DTN
0
(23)
by simply reversing the order of the complex, i.e. (D−1 , D,0 , . . . , D N ) ⇒ (DTN , DTN −1 , . . . , D−1 ). In this case, aggregate nodes will aggregate the top-level elements, for instance, the triangles in Figure 1. Intuitively, *k acts like a derivative operator that maps k-cochains to (k +1)-cochains; however, one typically refers to these as k-chains rather than cochains [20]. In Section 3, we coarsen both complexes when computing Hodge decompositions.
3. HODGE DECOMPOSITION The Hodge decomposition [21] states that the space of k-forms on a closed manifold can be decomposed into three orthogonal subspaces k = dk−1 k−1 ⊕dk+1 k+1 ⊕ Hk
(24)
where Hk is the space of harmonic k-forms, Hk = {h ∈ k |Dk h = 0}. The analogous result holds for the space of discrete k-forms kd , where the derived codifferential [22] T dk = M−1 k−1 Dk−1 Mk
(25)
is defined to be the adjoint of Dk−1 in the discrete innerproduct Mk . Convergence of the discrete approximations to the Hodge decomposition is examined in [23]. In practice, for a discrete k-form k we seek a decomposition k+1 T +h k k = Dk−1 k−1 +M−1 k Dk Mk+1
(26)
k+1 k k k k k−1 and k+1 are for some k−1 ∈ k−1 ∈ k+1 d , d , and h ∈ d , where D h = 0. Note that −1 T generally not unique, since the kernels of Dk−1 and Mk Dk Mk+1 are nonempty. However, the
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
178
N. BELL AND L. N. OLSON
k+1 T discrete k-forms (Dk−1 k−1 ) and (M−1 ) are uniquely determined. We decompose k Dk Mk+1 k into (26) by solving DTk−1 Mk Dk−1 k−1 = DTk−1 Mk k (27)
k+1 T = Dk k Dk M−1 k Dk Mk+1
(28)
k+1 T h k = k −Dk−1 k−1 −M−1 k Dk Mk+1
(29)
Note that (28) involves the explicit inverse M−1 which is typically dense.‡ In the following k sections, we first consider the special case Mk = I and then show how (28) can be circumvented in the general case. Equation (27) is obtained by left multiplying Mk−1 DTk−1 Mk on both sides of (26). Likewise, applying Dk to both sides of (26) yields (28). Equivalently, one may seek minima of the following functionals: Dk−1 k−1 −k Mk ,
k+1 T M−1 −k Mk k Dk Mk+1
(30)
3.1. Special case Taking the appropriate identity matrix for all discrete innerproducts Mk in (27)–(29) yields DTk−1 Dk−1 k−1 = DTk−1 k
(31)
Dk DTk k+1 = Dk k
(32)
h k = k −Dk−1 k−1 −DTk k+1
(33)
Although (31)–(33) are devoid of metric information, some fundamental topological properties of the mesh are retained. For instance, the number of harmonic k-forms, which together form a cohomology basis, is independent of the choice of innerproduct.§ In applications where metric information is either irrelevant or simply unavailable [4], these ‘nonphysical’ equations are sufficient. Algorithm 4. construct solver(k, Mk , D−1 , D0 , . . . , D N ) 1 2 3 4 5 6 7 8 9 10 ‡
A0 ⇐ DTk−1 Mk Dk−1 D0−1 , . . . , D0N ⇐ D−1 , . . . , D N f o r l = 0 t o NUM LEVELS − 1 l+1 l l P0l , . . . , PNl , Dl+1 −1 , . . . , D N ⇐ c o a r s e n c o m p l e x ( D−1 , . . . , D N ) end f o r l = 0 t o NUM LEVELS − 1 l Pl ⇐ s m o o t h p r o l o n g a t o r ( Al , Pk−1 ) T Al+1 ⇐ Pl Al Pl end r e t u r n MG solver ( A0 , A1 , . . . , ANUM LEVELS , P0 , P1 , . . . , PNUM LEVELS−1 )
The covolume Hodge star is a notable exception. the case of M = I , the cohomology basis is actually a homology basis also.
§ In
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
AMG FOR k-FORM LAPLACIANS
179
Algorithm 5. decompose special(k , D−1 , D0 , . . . , D N ) 1 2 3 4 5 6 7 8
s o l v e r 1 ⇐ c o n s t r u c t s o l v e r ( k, I, D−1 , D,0 , . . . , D N ) s o l v e r 2 ⇐ c o n s t r u c t s o l v e r ( N −k −1, I, DTN , DTN −1 , . . . , DT−1 ) k−1 ⇐ s o l v e r 1 ( DTk−1 k ) k+1 ⇐ s o l v e r 2 ( Dk k ) h ⇐ k −Dk−1 k−1 −DTk k+1 r e t u r n k−1 , k+1 , h k
Algorithm 5 demonstrates how the proposed method is used to compute Hodge decompositions in the special case. Multigrid solvers solver1 and solver2 are constructed for the solution of linear systems (31) and (32), respectively. In the latter case, the direction of the chain complex is reversed when being passed as an argument to construct solver. As mentioned in Section 2.8, coarsen complex coarsens the reversed complex with this simple change of arguments. Using the identity innerproduct, construct solver applies the proposed method recursively to produce a progressively coarser hierarchy of tentative prolongators Pkl and discrete derivatives Dlk . The tentative prolongators are then smoothed by a user-defined function smoothprolongator to produce the final prolongators Pl and Galerkin products Al+1 ⇐ PlT Al Pl . Finally, the matrices A0 , . . . , ANUM LEVELS and P0 , . . . , PNUM LEVELS−1 determine the multigrid cycle in a user-defined class MGsolver. Choices for smoothprolongator and MGsolver are discussed in Section 4. 3.2. General case The multilevel solver outlined in Section 3.1 can be directly applied to linear system (27) by passing the innerproduct Mk , instead of the identity, in the arguments to construct solver. However, a different strategy is needed to solve (28) since M−1 k is generally dense and cannot be formed explicitly. In the following, we outline a method for computing Hodge decompositions in the general case. We first remark that if a basis for the space of Harmonic k-forms, Hk = span{h k0 , h k1 , . . . h kH }, is known, then the harmonic component of the Hodge decomposition is easily computed by projecting k onto the basis elements. Furthermore, since k−1 in (27) can also be obtained, we can compute the value of the remaining component (k −Dk−1 k−1 −h k ) which must lie in the T range of M−1 k Dk Mk+1 due to orthogonality of the three spaces. Therefore, the task of computing general Hodge decompositions can be reduced to computing a basis for Hk . Sometimes, a basis is known a priori. For instance, H0 , which corresponds to the nullspace of the pure-Neumann problem, is spanned by constant vectors on each connected component of the domain. Furthermore, if the domain is contractible then Hk = {} for k>0. However, in many cases of interest we cannot assume that a basis for Hk is known and, therefore, it must be computed. Note that decompose special can be used to determine a Harmonic k-form basis for the identity innerproduct by decomposing randomly generated k-forms until their respective harmonic components become linearly dependent. We denote this basis {h k0 , h k1 , . . . h km } and their span Hk . Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
180
N. BELL AND L. N. OLSON
Using these k-forms, a basis for the harmonic k-forms with innerproduct Mk can be produced by solving DTk−1 Mk Dk−1 ik−1 = DTk−1 Mk h ik h ik = h ik −Dk−1 ik−1
(34) (35)
It is readily verified that h k0 , . . . , h km are harmonic Dk h ik = Dk h ik −Dk Dk−1 ik−1 = 0 −1 k−1 T k T T k k )=0 M−1 k−1 Dk−1 Mk h i = Mk−1 (Dk−1 Mk h i −Dk−1 Mk h i Dk−1 i
(36) (37)
since Dk Dk−1 = 0 and Dk h ik = 0 by assumption. It remains to be shown that h k0 , . . . , h km are linearly independent. Supposing h k0 , . . . , h km to be linearly dependent, there exist scalars c0 , . . . , c H not all zero such that 0=
m
i=0
=
m
i=0
=
m
i=0
ci h ik ci (h ik −Dk−1 ik−1 ) ci h ik −
m
i=0
ci Dk−1 ik−1
N −1 k k k which is a contradiction, since ( i=0 ci h i ) ∈ H is nonzero and H ⊥ R(Dk−1 ). Note that the harmonic forms h k0 , . . . , h km are not generally the same as the harmonic components of the random k-forms used to produce h k0 , . . . h km . 4. NUMERICAL RESULTS We have applied the proposed method to a number of structured and unstructured problems. In all cases, a multigrid V (1, 1)-cycle is used as a preconditioner to conjugate gradient iteration. Unless stated otherwise, a symmetric Gauss–Seidel sweep is used during pre- and post-smoothing stages. Iteration on the positive-semidefinite systems DTk Dk ,
Dk DTk ,
DTk Mk+1 Dk
(38)
proceeds until the relative residual is reduced by 10−10 . The matrix DT0 M1 D0 corresponds to a Poisson problem with pure-Neumann boundary conditions. Similarly, DT1 M2 D1 is an eddycurrent problem (6) with = 0. As explained in Section 3, matrices (38) arise in discrete Hodge decompositions. The multigrid hierarchy extends until the number of unknowns falls below 500, at which point a pseudoinverse is used to perform the coarse level solve. The tentative prolongators are smoothed Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
181
AMG FOR k-FORM LAPLACIANS
twice with a Jacobi smoother S=I−
4 diag(A)−1 A 3max
(39)
P = SS P
(40)
where max is an upper bound on the spectral radius of diag(A)−1 A. When zero or near zero values appear on the diagonal of the Galerkin product P T AP , the corresponding rows and columns are zeroed and ignored during smoothing. We discuss this choice of prolongation smoother in Section 4.1. Tables I and II show the result of applying the proposed method to regular quadrilateral and hexahedral meshes of increasing size. In both cases, the finite element spaces described in [10] are used to produce the innerproducts Mk . The systems are solved with a random initial value for x. Since the matrices are singular, the solution x is an arbitrary null vector. Column labels are explained as follows: • ‘Grid’—dimensions of the quadrilateral/hexahedral grid. √ • ‘Convergence’—geometric mean of residual convergence factors N r N / r0 . 1 • ‘Work/Digit’—averaged operation cost of 10 residual reduction in units of nnz(A).¶ Table I. Two-dimensional scaling results. System
Grid
Unknowns
Convergence
Work/digit
Complexity
Levels
DT0 D0
2502 5002 10002
63 001 251 001 1 002 001
0.075 0.100 0.063
8.172 9.321 7.866
1.636 1.661 1.686
4 4 5
DT1 D1
2502 5002 10002
125 500 501 000 2 002 000
0.096 0.103 0.085
8.370 8.741 8.142
1.506 1.527 1.545
4 5 5
D0 DT0
2502 5002 10002
125 500 501 000 2 002 000
0.124 0.133 0.094
9.529 9.932 8.550
1.530 1.542 1.553
4 5 5
D1 DT1
2502 5002 10002
62 500 250 000 1 000 000
0.063 0.063 0.063
7.664 7.758 7.868
1.641 1.664 1.687
4 4 5
DT0 M1 D0
2502 5002 10002
63 001 251 001 1 002 001
0.043 0.055 0.041
5.894 6.480 5.963
1.415 1.432 1.448
4 4 5
DT1 M2 D1
2502 5002 10002
125 500 501 000 2 002 000
0.095 0.103 0.085
8.362 8.738 8.140
1.506 1.527 1.545
4 5 5
¶ Including
Copyright q
the cost of conjugate gradient iteration. 2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
182
N. BELL AND L. N. OLSON
Table II. Three-dimensional scaling results. System
Grid
Unknowns
Convergence
Work/digit
Complexity
Levels
DT0 D0
253 503 1003
17 576 132 651 1 030 301
0.120 0.151 0.105
7.976 9.118 7.960
1.268 1.300 1.358
3 3 4
DT1 D1
253 503 1003
50 700 390 150 3 060 300
0.192 0.216 0.208
10.432 11.587 11.849
1.296 1.342 1.415
3 4 4
DT2 D2
253 503 1003
48 750 382 500 3 030 000
0.188 0.218 0.267
9.342 10.447 12.350
1.156 1.180 1.217
3 3 4
D0 DT0
253 503 1003
50 700 390 150 3 060 300
0.287 0.391 0.323
13.323 17.594 14.811
1.246 1.235 1.252
3 4 4
D1 DT1
253 503 1003
48 750 382 500 3 030 000
0.187 0.264 0.194
10.928 13.855 11.630
1.389 1.403 1.455
3 4 4
D2 DT2
253 503 1003
15 625 125 000 1 000 000
0.089 0.102 0.103
7.152 7.649 7.949
1.302 1.318 1.368
3 3 4
DT0 M1 D0
253 503 1003
17 576 132 651 1 030 301
0.037 0.053 0.038
4.804 5.495 5.054
1.178 1.200 1.241
3 3 4
DT1 M2 D1
253 503 1003
50 700 390 150 3 060 300
0.097 0.113 0.088
6.838 7.461 6.932
1.184 1.214 1.264
3 4 4
DT2 M3 D2
253 503 1003
48 750 382 500 3 030 000
0.188 0.223 0.265
9.334 10.585 12.294
1.156 1.180 1.217
3 3 4
• ‘Complexity’—total memory cost of multigrid hierarchy relative to ‘System’. • ‘Levels’—number of levels in the multigrid hierarchy. For each k, the algorithm exhibits competitive convergence factors while maintaining low operator complexity. Together, the work per digit-of-accuracy remains bounded as the problem size increases. In Table III, numerical results are presented for the unstructured tetrahedral mesh depicted in Figure 7. As with classical algebraic multigrid methods, performance degrades in moving from a structured to an unstructured tessellation. However, the decrease in performance for the scalar problems DT0 D0 and DT0 M1 D0 is less significant than that of the other problems. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:165–185 DOI: 10.1002/nla
183
AMG FOR k-FORM LAPLACIANS
Table III. Solver performance on the unstructured tetrahedral mesh in Figure 7. System
Unknowns
Convergence
Work/digit
Complexity
Levels
DT0 D0 DT1 D1 DT2 D2
84 280
0.073
6.601
1.304
3
554 213
0.378
18.816
1.391
4
920 168
0.366
15.856
1.186
4
D0 DT0
554 213
0.236
19.848
2.289
4
D1 DT1
920 168
0.390
17.068
1.197
4
D2 DT2 DT0 M1 D0 DT1 M2 D1 DT2 M3 D2
450 235
0.370
14.400
1.043
3
84 280
0.144
8.949
1.304
3
554 213
0.518
29.428
1.483
4
920 168
0.348
15.111
1.187
4
Figure 7. Titan IV rocket mesh.
4.1. Prolongation smoother

On the nonscalar problems considered, we found second-degree prolongation smoothers (39) to be noticeably more efficient than first-degree prolongation smoothers. While additional smoothing operations generally improve the convergence rate of smoothed aggregation methods, this improvement is typically offset by an increase in operator complexity; the resultant work per digit of accuracy is therefore not improved. However, there is an important difference between the tentative prolongators in the scalar and nonscalar problems. In the scalar case, all degrees of freedom are associated with a coarse aggregate; therefore, the tentative prolongator has no zero rows. As described in Section 2.4, the tentative prolongator for nonscalar problems has zero rows for elements contained in the interior of a nodal aggregate. In the nonscalar case, additional smoothing operations incorporate a greater proportion of these degrees of freedom into the range of the final prolongator.

The influence of higher-degree prolongation smoothers on solver performance is reported in Table IV. The column ‘Degree’ records the degree d of the smoother applied to the tentative prolongator P̂, i.e. P = S^d P̂ (a schematic form of this construction is sketched after Table IV), whereas ‘Percent zero’ reflects the percentage of zero rows in the first-level prolongator. As expected, the operator complexity increases with the smoother degree. However, up to a point, this increase is less significant than the corresponding reduction in solver convergence. Second-degree smoothers exhibit the best efficiency in both instances of the problem DT1 M2 D1 and remain competitive with higher-degree smoothers in the last test. Since the work-per-digit figures exclude the cost of constructing the multigrid transfer operators, these higher-degree smoothers may be less efficient in practice.

Table IV. Comparison of prolongation smoothers.

System       Grid    Degree   Percent zero   Convergence   Work/digit   Complexity
DT1 M2 D1    250²    0        66.8           0.697         42.255       1.123
DT1 M2 D1    250²    1        66.8           0.357         14.774       1.123
DT1 M2 D1    250²    2        22.9           0.096         8.379        1.506
DT1 M2 D1    250²    3        0.4            0.063         9.515        2.084
DT1 M2 D1    250²    4        0.0            0.063         10.188       2.250
DT1 M2 D1    50³     0        67.6           0.567         25.043       1.034
DT1 M2 D1    50³     1        66.5           0.290         11.497       1.035
DT1 M2 D1    50³     2        8.8            0.096         7.460        1.214
DT1 M2 D1    50³     3        0.3            0.063         9.011        1.577
DT1 M2 D1    50³     4        0.0            0.063         9.074        1.632
DT2 M3 D2    50³     0        89.63          0.549         23.670       1.034
DT2 M3 D2    50³     1        89.63          0.382         14.753       1.034
DT2 M3 D2    50³     2        63.93          0.214         10.304       1.180
DT2 M3 D2    50³     3        23.77          0.122         9.203        1.481
DT2 M3 D2    50³     4        6.48           0.098         8.348        1.487
DT2 M3 D2    50³     5        2.07           0.089         10.267       1.953
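For reference, the standard smoothed-aggregation form of this construction reads as follows; the damped-Jacobi smoother and the weight ω given here are the usual choices of [17] and are stated as an assumption, not necessarily identical to (39):

\[
P \;=\; S^{d}\,\hat P, \qquad S \;=\; I - \omega\, D^{-1} A, \qquad \omega \;=\; \frac{4}{3\,\rho(D^{-1}A)},
\]

where P̂ denotes the tentative prolongator and D the diagonal of A.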
5. CONCLUSION

We have described an extension of Reitzinger and Schöberl's methodology [14] to higher-dimensional k-forms with the addition of smoothed prolongation operators. Furthermore, we have detailed properties of the prolongation operator that arise from this generalized setting. Specifically, we have identified necessary and sufficient conditions under which commutativity is maintained. The prolongation operators give rise to a hierarchy of exact finite element sequences. The generality of the method is appealing, since its components are constructed independently of a particular mimetic discretization. Finally, we have initiated a study of algebraic multigrid for the Hodge decomposition of general k-forms.
REFERENCES
1. Yee KS. Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media. IEEE Transactions on Antennas and Propagation 1966; AP-14(3):302–307.
2. Bossavit A. On the numerical analysis of eddy-current problems. Computer Methods in Applied Mechanics and Engineering 1981; 27(3):303–318.
3. Arnold DN. Differential complexes and numerical stability. Proceedings of the International Congress of Mathematicians, Beijing. Plenary Lectures, vol. 1, 2002.
4. de Silva V, Ghrist R. Homological sensor networks. Notices of the American Mathematical Society 2007; 54:10–17.
5. Polthier K, Preuss E. Identifying vector field singularities using a discrete Hodge decomposition. In Visualization and Mathematics, VisMath, Hege HC, Polthier K (eds). Springer: Berlin, 2002.
6. Tong Y, Lombeyda S, Hirani AN, Desbrun M. Discrete multiscale vector field decomposition. ACM Transactions on Graphics (Special issue of SIGGRAPH 2003 Proceedings) 2003; 22(3):445–452.
7. Fisher M, Schröder P, Desbrun M, Hoppe H. Design of tangent vector fields. SIGGRAPH '07: ACM SIGGRAPH 2007 Papers, New York, NY, U.S.A. ACM: New York, 2007; 56.
8. Whitney H. Geometric Integration Theory. Princeton University Press: Princeton, NJ, 1957.
9. Bossavit A. Whitney forms: a class of finite elements for three-dimensional computations in electromagnetism. IEE Proceedings 1988; 135(Part A(8)):493–500.
10. Bochev PB, Robinson AC. Matching algorithms with physics: exact sequences of finite element spaces. In Collected Lectures on Preservation of Stability Under Discretization, Chapter 8, Estep D, Tavener S (eds). SIAM: Philadelphia, PA, 2002; 145–166.
11. Gradinaru V, Hiptmair R. Whitney elements on pyramids. Electronic Transactions on Numerical Analysis 1999; 8:154–168.
12. Hiptmair R. Multigrid method for Maxwell's equations. SIAM Journal on Numerical Analysis 1999; 36(1):204–225.
13. Arnold DN, Falk RS, Winther R. Multigrid in H(div) and H(curl). Numerische Mathematik 2000; 85(2):197–217.
14. Reitzinger S, Schöberl J. An algebraic multigrid method for finite element discretizations with edge elements. Numerical Linear Algebra with Applications 2002; 9:223–238.
15. Hu JJ, Tuminaro RS, Bochev PB, Garasi CJ, Robinson AC. Toward an h-independent algebraic multigrid method for Maxwell's equations. SIAM Journal on Scientific Computing 2006; 27:1669–1688.
16. Jones J, Lee B. A multigrid method for variable coefficient Maxwell's equations. SIAM Journal on Scientific Computing 2006; 27(5):1689–1708.
17. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 1996; 56(3):179–196.
18. Muhammad A, Egerstedt M. Control using higher order Laplacians in network topologies. Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, 2006; 1024–1038.
19. Adams M, Brezina M, Hu J, Tuminaro R. Parallel multigrid smoothing: polynomial versus Gauss–Seidel. Journal of Computational Physics 2003; 188(2):593–610.
20. Hirani AN. Discrete exterior calculus. Ph.D. Thesis, California Institute of Technology, May 2003.
21. Frankel T. The Geometry of Physics: An Introduction (2nd edn). Cambridge University Press: Cambridge, 2004.
22. Bochev PB, Hyman JM. Principles of mimetic discretizations of differential operators. In Compatible Spatial Discretizations, Arnold DN, Bochev PB, Lehoucq RB, Nicolaides RA, Shashkov M (eds). The IMA Volumes in Mathematics and its Applications, vol. 142. Springer: Berlin, 2006; 89–119.
23. Dodziuk J. Finite-difference approach to the Hodge theory of harmonic forms. American Journal of Mathematics 1976; 98(1):79–104.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:187–200 Published online 7 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.563
A fast full multigrid solver for applications in image processing
M. Stürmer∗,†, H. Köstler and U. Rüde
Department of Computer Science 10, University of Erlangen-Nuremberg, Cauerstrasse 6, 91058 Erlangen, Germany
SUMMARY
We present a fast, cell-centered multigrid solver and apply it to image denoising and non-rigid diffusion-based image registration. In both applications, real-time performance is required in 3D, and the multigrid method has to be compared with solvers based on the fast Fourier transform (FFT). For image denoising, the optimization of the underlying variational approach results directly in one time step of a parabolic linear heat equation; for image registration, a nonlinear second-order system of partial differential equations is obtained. This system is solved by a fixed-point iteration using a semi-implicit time discretization, where each time step again results in an elliptic linear heat equation. The multigrid implementation comes close to real-time performance for medium-size medical images in 3D for both applications and is compared with a solver based on FFT using available libraries. Copyright © 2007 John Wiley & Sons, Ltd.

Received 15 May 2007; Accepted 21 September 2007

KEY WORDS:
multigrid; performance optimization; FFT; image processing; image registration; image denoising
∗Correspondence to: M. Stürmer, Department of Computer Science 10, University of Erlangen-Nuremberg, Cauerstrasse 6, 91058 Erlangen, Germany.
†E-mail: [email protected]
Contract/grant sponsor: Deutsche Forschungsgemeinschaft (German Science Foundation); contract/grant number: Ru 422/7-1, 2, 3
Contract/grant sponsor: Bavarian KONWIHR supercomputing research consortium

1. INTRODUCTION

In recent years, data sizes in image-processing applications have increased drastically owing to improved image acquisition systems. Modern computed tomography (CT) scanners can create volume data sets of 512³ voxels or more [1, 2]. However, users expect real-time image manipulation and analysis, so fast algorithms and implementations are needed to fulfill these tasks. Many image-processing problems can be formulated in a variational framework and require the solution of a large, sparse, linear system arising from the discretization of partial differential equations (PDEs). Often, these PDEs are inherently based on some kind of diffusion process. In simple cases, it is possible to solve them with fast Fourier transform (FFT)-based techniques of complexity O(n log n). The FFT algorithm was introduced in 1965 by Cooley and Tukey [3]; for an overview of Fourier transform methods, we refer, e.g. to [4–6]. Multigrid methods, as an alternative, are more general and can reach an asymptotically optimal complexity of O(n). For discrete Fourier transforms, flexible and highly efficient libraries optimized for specific CPU architectures, such as the FFTW library [7] or the Intel Math Kernel Library (MKL) [8], are available. However, we are currently not aware of similarly tuned multigrid libraries in 3D; for 2D problems we know only of DiMEPACK [9]. The purpose of this paper is to close this gap: we implement a multigrid solver optimized especially for the Intel x86 architecture that is competitive with highly optimized FFT libraries, and we apply it to typical applications in the area of image processing.

The outline of this paper is as follows. We describe the multigrid scheme, including some results on its convergence, and discuss implementation and optimization issues in Section 2. Then, the variational approaches used for image denoising and non-rigid diffusion registration are introduced in Section 3. Finally, we compare the computational times of our multigrid solver and of the FFTW package for image denoising and non-rigid registration of medical CT images.
2. MULTIGRID

For a comprehensive overview of multigrid methods we refer, e.g. to [10–15]. In this paper, we implement a multigrid solver for the linear heat equation

\[
\frac{\partial u}{\partial t}(x,t) - \Delta u(x,t) = f(x), \qquad u(x,0) = u_0(x) \tag{1}
\]

with time t ∈ R⁺, u, f : Ω ⊂ R³ → R, x ∈ Ω, initial solution u₀ : Ω ⊂ R³ → R and homogeneous Neumann boundary conditions. Note that in practice u(x,t) is often computed only up to a finite time t = τ, and that the solution tends to that of the well-known Poisson equation in the limit τ → ∞. We discretize (1) with finite differences,

\[
\frac{u_h(x,\tau) - u_0(x)}{\tau} - \Delta_h u_h(x,\tau) = f_h(x) \tag{2}
\]

on a regular grid Ω_h with mesh size h and time step τ, where Δ_h denotes the well-known 7-point stencil for the Laplacian. In the following we consider only a single time step, for which we have to solve the elliptic equation

\[
(I - \tau\Delta_h)\,u_h(x,\tau) = \tau f_h(x) + u_0(x) \tag{3}
\]

In this paper we are dealing with image-processing problems, where the discrete voxels can be thought of as located in the cell centers. Therefore, we have chosen a cell-centered multigrid scheme with constant interpolation and 8-point restriction. Note that this combination of intergrid transfer operators leads to multigrid convergence rates significantly worse than what could ideally be obtained [15, 16]; this will be shown by local Fourier analysis (LFA) and numerical experiments. However, it yields a relatively simple algorithm that satisfies our numerical requirements and is well suited to careful machine-specific performance optimization.
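To make the discrete operator in (3) concrete, the following minimal C++ sketch applies (I − τΔ_h) with the 7-point stencil at the interior points of a padded grid; function and variable names are our own illustration, not the authors' implementation, and the ghost layer is assumed to have been filled with copies of the adjacent interior values (realizing the homogeneous Neumann boundary condition):

```cpp
#include <vector>

// y = (I - tau * Lap_h) x on an (nx+2)^3 padded grid with unit mesh width.
// Ghost layers are assumed to mirror the adjacent interior values.
void applyHeatOperator(const std::vector<float>& x, std::vector<float>& y,
                       int nx, float tau)
{
    const int s = nx + 2;  // padded line length
    auto idx = [s](int i, int j, int k) { return (k * s + j) * s + i; };
    for (int k = 1; k <= nx; ++k)
        for (int j = 1; j <= nx; ++j)
            for (int i = 1; i <= nx; ++i) {
                // 7-point Laplacian at the cell center (i,j,k)
                float lap = x[idx(i-1,j,k)] + x[idx(i+1,j,k)]
                          + x[idx(i,j-1,k)] + x[idx(i,j+1,k)]
                          + x[idx(i,j,k-1)] + x[idx(i,j,k+1)]
                          - 6.0f * x[idx(i,j,k)];
                y[idx(i,j,k)] = x[idx(i,j,k)] - tau * lap;
            }
}
```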
2007 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:187–200 DOI: 10.1002/nla
A FAST FULL MULTIGRID SOLVER FOR APPLICATIONS IN IMAGE PROCESSING
189
For relaxation we choose an ω-Red–Black Gauss–Seidel smoother (RBGS) with ω = 1.15, which is known to be a better choice in 3D for the given problem than simple Gauss–Seidel relaxation [13, 17].

2.1. Efficient multigrid implementation

This section describes our multigrid implementation. All floating-point calculations are done in single precision (four bytes per value), as this accuracy is already far beyond that of the source image data. The performance of multigrid implementations can be improved significantly if code optimization techniques are used, as shown in [18–21]. In this paper we focus on the x86 processor architecture, since it is currently the most common desktop PC platform.

2.1.1. Memory layout. The best performance on current x86 processors is achieved by using the SIMD (single instruction, multiple data) unit, which was introduced to the architecture in 1999 with the Pentium III as the streaming SIMD extension (SSE). These instructions perform vector-like operations on units of 16 bytes, which can be seen as a SIMD vector data type containing, in our case, four single-precision floating-point numbers. Operating on naturally aligned SIMD vectors (i.e. at addresses that are multiples of their size), the SSE unit provides high bandwidth, especially to the caches. Consequently, the memory layout must support aligned data accesses in as many multigrid components as possible. To enable efficient handling of the boundary conditions, we chose to store boundary points explicitly around the grid; by copying the outer unknowns before smoothing or before calculating the pointwise residuals, we need no special handling of the homogeneous Neumann boundary conditions. The first unknown of every line is further aligned to a multiple of 16 bytes by padding, i.e. by filling up the line with unused values to a length that is a multiple of four. This enables SIMD processing for any line length, as the boundary values, which are generated just in time, and the padding area can be overwritten with dummy results.

2.1.2. SIMD-aware implementation. Unfortunately, current compilers fail to generate SIMD instruction code from a scalar description in most real-world programs. The SIMD unit can be programmed in assembly language but, to keep the code more portable and maintainable, our C++ implementation uses compiler intrinsics, which extend the programming language with assembly-like instructions for SIMD vector data types. Implementing the RBGS relaxation in SIMD is not straightforward, as only red or only black points must be updated, while every SIMD vector contains two values of each color. The idea of the SIMD-aware RBGS is to first calculate a SIMD vector of relaxed values, as for a Jacobi method. Subsequently, a SIMD multiply–add with appropriately initialized SIMD registers is performed such that some values are preserved and the others are relaxed, which can be illustrated as

\[
\begin{pmatrix}
u_{\mathrm{new}}(x,y)\\
u_{\mathrm{new}}(x+1,y)\\
u_{\mathrm{new}}(x+2,y)\\
u_{\mathrm{new}}(x+3,y)
\end{pmatrix}
=
\begin{pmatrix}
1-\omega\\ 1\\ 1-\omega\\ 1
\end{pmatrix}
\ast
\begin{pmatrix}
u_{\mathrm{old}}(x,y)\\
u_{\mathrm{old}}(x+1,y)\\
u_{\mathrm{old}}(x+2,y)\\
u_{\mathrm{old}}(x+3,y)
\end{pmatrix}
+
\begin{pmatrix}
\omega\\ 0\\ \omega\\ 0
\end{pmatrix}
\ast
\begin{pmatrix}
u_{\mathrm{relax}}(x,y)\\
u_{\mathrm{relax}}(x+1,y)\\
u_{\mathrm{relax}}(x+2,y)\\
u_{\mathrm{relax}}(x+3,y)
\end{pmatrix}
\]

where ∗ denotes componentwise multiplication.
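In C++ with SSE intrinsics, this masked blend can be written roughly as follows — a minimal sketch under our own naming, not the authors' code; it assumes the line length is padded to a multiple of four and the arrays are 16-byte aligned, as described above:

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Blend a vector of Jacobi-relaxed values with the old values so that only
// every other point (one color) is actually updated with weight omega.
void maskedRBGSUpdate(float* u, const float* u_relax, int n, float omega)
{
    // keep = (1-w, 1, 1-w, 1), take = (w, 0, w, 0); _mm_setr_ps lists the
    // four lanes in memory order.
    const __m128 keep = _mm_setr_ps(1.0f - omega, 1.0f, 1.0f - omega, 1.0f);
    const __m128 take = _mm_setr_ps(omega, 0.0f, omega, 0.0f);

    for (int i = 0; i < n; i += 4) {               // n padded to 4
        __m128 oldv = _mm_load_ps(u + i);          // aligned load
        __m128 relv = _mm_load_ps(u_relax + i);
        __m128 newv = _mm_add_ps(_mm_mul_ps(keep, oldv),
                                 _mm_mul_ps(take, relv));
        _mm_store_ps(u + i, newv);
    }
}
```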
The better internal and external bandwidth of SIMD compared with the scalar floating-point unit leads to a real performance gain, even though we effectively double the number of floating-point operations. The cell-centered approach is advantageous especially for restriction and interpolation. Coarsening is done by averaging eight neighboring fine-grid residuals, where every fine-grid residual
contributes only to a single coarse-grid point. Hence, the calculation of the residual and its restriction can be done in SIMD without storing residuals to memory: four SIMD registers containing residuals from four neighboring lines are computed and averaged into a single SIMD vector first; its values are then reordered by special shuffle instructions, so that two coarse-grid right-hand side values can be generated by averaging its first and second, and its third and fourth, values. By reusing common subexpressions, this can be simplified further. The constant interpolation can also be executed very efficiently in the SIMD unit with shuffle operations. Additionally, the loops are unrolled and the instructions scheduled carefully by hand to help the compiler produce fast code.

2.1.3. Blocking and fusion of components. SIMD optimization is most useful when combined with techniques that enhance spatial and temporal data locality, developed in [20, 22–24], and thereby exploit the higher bandwidth of the caches. For smaller grids, the post-smoother uses a simple plane-blocking method as illustrated in Figure 1(I): after preparing the first boundary (I(a)), the red update in line (y, z) is immediately followed by the black update in line (y, z−1) (I(b)); this proceeds through the whole grid (I(c)), and the sweep finishes with a black update in the last plane (I(d)). As long as the data from the last block can be held in the cache hierarchy, the solution and right-hand side grids must be transferred from and to memory only once. For larger grids this is no longer possible, and another blocking level must be introduced, as illustrated in Figure 1(II): the grid is then divided in the x–z direction, and every resulting super-block is processed in a similar manner as in the simple case, except that the red update in line (y, z) is followed by the black update in line (y−1, z) to respect the data dependencies between two super-blocks. Therefore, the first and last super-blocks need special boundary handling (II(a–d)). This two-fold blocking method is slightly less effective, since the super-blocks overlap and some values are read from main memory twice. The optimal super-block height depends on the cache size and the line length.

The pre-smoother extends these blocking methods further by fusing the smoothing step with the calculation and restriction of the residuals. For smaller grids, the simpler blocking method working on whole planes (I) is extended: the right-hand side values of coarser-grid plane z are computed immediately after smoothing in planes 2z and 2z+1 is done. This leads to slightly more complex handling of the first and last planes. For larger planes, however, super-blocks must be used again, as depicted in Figure 1(III).

Figure 1. Illustration of the different blocking methods on a 10×10×10 cube. (I) Simple plane blocking of one RBGS update: (a) initial boundary handling; (b) first block; (c) blocking complete; and (d) final boundary handling. (II) Super-blocking of one RBGS update: (a) first sub-block of first super-block; (b) first super-block complete; (c) middle super-block complete; and (d) last super-block complete. (III) Super-blocking of one RBGS update fused with calculation of residual and restriction: (a) initial boundary handling; (b) first sub-block of first super-block; (c) first super-block complete; and (d) only final boundary handling missing.

2.2. Convergence rates

The asymptotic convergence rates of our algorithm are evaluated by a power iteration for Equation (3), i.e. by setting the right-hand side f_h and u₀ to zero and scaling the discrete L₂-norm of the solution u_h to 1 after each multigrid V-cycle iteration step. Table I shows asymptotic convergence rates (after 100 iterations) for different sizes. These values refer to the case when (1) degenerates to the Poisson equation, simulated by setting τ = 10³⁰; as expected, the convergence rates are even better for finite and smaller τ. We compare these results with LFA predictions computed by the lfa package [14] in Table II, again for the case of Poisson's equation. This confirms our observation that, owing to the constant interpolation, the asymptotic convergence rates deteriorate for smaller mesh sizes. Note that using a simple RBGS smoother, i.e. setting ω = 1, leads to a worse asymptotic convergence factor.

Table I. Asymptotic convergence rates for different time steps τ, measured experimentally with mesh size h = 1.0 on the finest grid and one grid point on the coarsest level.

Size   V(1,1)   V(2,2)
64³    0.27     0.07
128³   0.29     0.07
256³   0.31     0.07
512³   0.34     0.07

Note: For τ → ∞, this effectively results in the Poisson equation.
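In formulas, the quantity measured by this power iteration is the asymptotic convergence factor (our notation):

\[
\hat u_h^{(k)} = \frac{u_h^{(k)}}{\|u_h^{(k)}\|_{L_2^h}}, \qquad
u_h^{(k+1)} = \mathrm{MG}\bigl(\hat u_h^{(k)}\bigr), \qquad
\rho_\infty = \lim_{k\to\infty} \bigl\|u_h^{(k+1)}\bigr\|_{L_2^h},
\]

where MG denotes one V-cycle applied to (3) with zero right-hand side.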
Table II. Smoothing factor μ(S) and three-grid asymptotic convergence factor ρ(M3L) for size 64³ and τ = 10³⁰, obtained by LFA.

ω      Interpolation   V(1,1): μ(S)   V(1,1): ρ(M3L)   V(2,2): μ(S)   V(2,2): ρ(M3L)
1.00   Constant        0.20           0.47             0.04           0.07
1.15   Constant        0.08           0.20             0.04           0.06
1.15   LIN             0.08           0.10             0.04           0.06

Note: Settings are equivalent to Table I.
2.3. Performance results

Next we discuss performance results measured on two different test platforms. As a reference, we present the run time of a forward plus backward FFT, used for periodic boundary conditions, and of a discrete cosine transform (DCT), used for Neumann boundary conditions, respectively.

Table III. Wallclock times in ms for the FFT (real type, out of place, forward and backward) and the optimized multigrid on an AMD Opteron 248 2.2 GHz cluster node.

Size   V(1,1)   FMG V(1,1)   FMG V(2,2)   FFT (FFTW)   DCT (FFTW)
32     0.63     0.80         1.38         0.85         2.27
64     6.97     9.55         14.9         10.4         19.1
128    56.0     78.7         122          107          197
256    445      622          976          992          2024
512    3669     5175         7943         9274         67 766
Table IV. Wallclock times in ms for the FFT (real type, out of place, forward and backward) and the optimized multigrid on an Intel Core2 Duo 2.4 GHz (Conroe) workstation.

Size   V(1,1)   FMG V(1,1)   FMG V(2,2)   FFT (FFTW)   DCT (FFTW)   FFT (MKL)
32     0.43     0.55         0.93         0.40         1.43         0.71
64     3.33     4.29         7.12         3.73         12.2         5.27
128    31.6     44.1         68.3         50.4         123          45.8
256    264      370          574          473          1246         401
512    2168     3026         4699         4174         11 067       3510
These timings do not contain the time necessary for actually solving the problem in Fourier space (cf. (8) in Section 3), which is highly dependent on the code quality. For our applications, the accuracy of a simple FMG-V(1,1) or even a single V(1,1)-cycle is often sufficient, as will be explained in Section 3. On both platforms, we compare the performance of our code-optimized multigrid implementation with that of the well-known FFTW package [25] (version 3.1.2). The first test platform is an AMD Opteron 248 cluster node. The CPUs run at 2.2 GHz, provide a 1 MB unified L2 cache and a 64 kB L1 data cache, and are connected to DDR-333 memory. For this platform, the GNU C and C++ compiler (version 4.1.0 for the 64-bit environment) was used. The measurements (see Table III) show that a full multigrid with V(1,1)-cycles can outperform FFTW's FFTs and is much faster than its DCTs even with V(2,2)-cycles. The second test platform is an Intel Core2 Duo (Conroe) workstation. The CPU runs at 2.4 GHz; both cores have an L1 data cache of 16 kB, share 4 MB of unified L2 cache and are connected to DDR2-667 memory. For this platform, the Intel 64 compiler suite (version 9.1) was used. We also
present results for a beta version of the Intel MKL [8] (version 9.0.06 beta for the 64-bit environment), which provides an FFTW-compatible interface to its FFTs through wrapper functions, but no DCT functions at all. Although an instruction scheduling slightly better suited to this CPU type is used, all multigrid variants are slower than the FFTs of FFTW and the MKL at smaller problem sizes on this platform (see Table IV), and the FMG with V(2,2)-cycles is slower at all problem sizes. Again, the DCTs take much more time than the code-optimized multigrid at all problem sizes tested.
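For orientation, a DCT reference timing of the kind reported above is typically obtained with FFTW's real-to-real interface roughly as follows; this is a minimal sketch of our own, not the authors' benchmark code (FFTW_REDFT10 is FFTW's DCT-II kind):

```cpp
#include <fftw3.h>

// Plan and execute a 3D DCT-II on an n^3 single-precision volume, in place.
// Link with -lfftw3f; a timing harness would bracket fftwf_execute().
void dctForward(int n, float* data)
{
    fftwf_plan plan = fftwf_plan_r2r_3d(n, n, n, data, data,
                                        FFTW_REDFT10, FFTW_REDFT10,
                                        FFTW_REDFT10, FFTW_ESTIMATE);
    fftwf_execute(plan);          // the transform itself
    fftwf_destroy_plan(plan);
}
```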
3. VARIATIONAL APPROACHES IN IMAGE PROCESSING

Variational approaches in image processing are often considered too slow for real-time applications, especially in 3D. Nevertheless, they are attractive due to their flexibility and the quality of their results, see e.g. [1, 26–31]. In the following, we introduce two very simple variational prototype problems. Most of the more complicated image-processing tasks are extensions of these approaches, obtained, e.g. by introducing local anisotropy in the PDEs. The reason why we restrict ourselves to these simple approaches is that they can be solved both by FFT-based methods and by multigrid; they are therefore good benchmark problems to test the best possible speed of variational image-processing methods.

3.1. Image denoising

The task of image denoising is to remove the noise from a given d-dimensional image u₀ : Ω ⊂ R^d → R. One simple variational model based on Tikhonov regularization [32] is to minimize the functional

\[
E_1(u) = \int_\Omega |u_0 - u|^2 + \alpha|\nabla u|^2 \,\mathrm{d}x \tag{4}
\]

with x ∈ Ω ⊂ R^d and α ∈ R⁺ over the image domain Ω ⊂ R^d. A necessary condition for a minimizer u : Ω → R, the denoised image, is characterized by the Euler–Lagrange equation

\[
u - u_0 - \alpha\Delta u = 0 \tag{5}
\]

with homogeneous Neumann boundary conditions. This is equivalent to (3) with f_h = 0 and τ = α. In an infinite domain, an explicit solution is given by

\[
u(x,t) = \int_{\mathbb{R}^d} G_{\sqrt{2t}}(x-y)\,u_0(y)\,\mathrm{d}y = (G_{\sqrt{2t}} \ast u_0)(x) \tag{6}
\]

where the operator ∗ denotes the convolution of the grid function u₀ and the Gaussian kernel

\[
G_\sigma(x) = \frac{1}{2\pi\sigma^2}\, e^{-|x|^2/(2\sigma^2)} \tag{7}
\]

with standard deviation σ ∈ R⁺. This is equivalent to applying a low-pass filter and can be transformed into Fourier space, where a convolution corresponds to a multiplication of the transformed signals. If we denote the Fourier transform of a signal f : R^d → R by F[f] and use F[G_σ](w) = e^{-σ²|w|²/2}, w ∈ R^d, it follows that

\[
F[G_\sigma \ast u_0](w) = e^{-\sigma^2|w|^2/2}\, F[u_0](w) \tag{8}
\]
Summarizing, we have three choices to compute the denoised image:

1. convolution of the image with a discrete version of the Gaussian kernel (7),
2. use of an FFT to apply (8), or
3. application of a multigrid method to (3).

In the first two methods, we extend the image symmetrically and use periodic boundary conditions, while we assume homogeneous Neumann boundary conditions for the third method. In most applications, applying a filter mask constructed from a discrete version of the Gaussian kernel (7) is an easy and efficient way to denoise the image. However, if a large σ (and thus a large t) is required, the filter masks become large and computationally inefficient. To show this, we add Gaussian noise to a rendered 3D MRI image (size 256×256×160) of a human head (see Figure 2) and filter it using masks of sizes 5×5×5 and 3×3×3. We apply the masks to the image separately in each direction, but do not decompose them as described in [28] to speed up the computation further. Then we use our cell-based multigrid method to solve (3) for τ = 1.21; Figure 2 shows the resulting blurred volume. Larger time steps would blur image edges too much. Runtimes for the different methods, measured on the AMD Opteron platform described in Section 2.3, are shown in Table V. The times for FFT-based denoising include applying (8) in addition to the forward and backward transforms; the multiplication with the exponential was not optimized and took about 50% of the time. Note that the Laplacian has very strong isotropic smoothing properties and does not preserve edges. Therefore, in practice, model (4) is not used to restore deteriorated images, but to pre-smooth an image, e.g. in order to ensure a robust estimation of the image gradient.
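To illustrate why choice 1 degrades for large σ, the following sketch builds the 1D mask and applies it along one axis with symmetric extension; this is our own illustrative code, not the authors', and the mask radius of 3σ is an assumption:

```cpp
#include <cmath>
#include <vector>

// Discrete 1D Gaussian mask; its length grows linearly with sigma, which is
// what makes mask-based filtering inefficient for strong smoothing.
std::vector<float> gaussianMask(float sigma) {
    int radius = (int)std::ceil(3.0f * sigma);   // covers ~99.7% of the mass
    std::vector<float> m(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        m[i + radius] = std::exp(-0.5f * i * i / (sigma * sigma));
        sum += m[i + radius];
    }
    for (float& v : m) v /= sum;                 // normalize discretely
    return m;
}

// Convolve one image line (length n > mask radius) with the mask,
// mirroring the image at the boundaries (symmetric extension).
void blurLine(const float* in, float* out, int n, const std::vector<float>& m) {
    int radius = (int)m.size() / 2;
    for (int x = 0; x < n; ++x) {
        float acc = 0.0f;
        for (int i = -radius; i <= radius; ++i) {
            int j = x + i;
            if (j < 0) j = -j;                   // mirror left
            if (j >= n) j = 2 * n - 2 - j;       // mirror right
            acc += m[i + radius] * in[j];
        }
        out[x] = acc;
    }
}
```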
Figure 2. Rendered 3D MRI image with Gaussian noise (σ = 10) added (left) and after denoising (right) using a V(1,1)-cycle of the cell-centered multigrid method.
Table V. Runtime for denoising a 3D MRI image (size 256×256×160) of a human head with added Gaussian noise, measured on the AMD Opteron platform.

Method                                  Runtime (ms)
Filtering with a mask of size 5×5×5     1200
Filtering with a mask of size 3×3×3     680
FMG-V(1,1)                              390
FFT                                     1140

Next, we turn to another prototype problem in image processing that also involves the solution of several problems of type (3).
3.2. Non-rigid image registration

The task of image registration is to align two or more images from the same or from different modalities [33, 34]; we consider here only mono-modal registration. This requires finding a suitable spatial transformation such that a transformed image becomes similar to another one, see e.g. [29, 35–39]. This deformation is independent of the motion of the object, e.g. a rotation. For image registration, two d-dimensional images are given by

\[
T, R : \Omega \subset \mathbb{R}^d \to \mathbb{R} \tag{9}
\]

where T and R are the template image and the reference image, respectively, and Ω is the image domain. The task of non-rigid image registration is to find a transformation φ_u(x) such that the deformed image T(φ_u(x)) can be matched to the image R(x). The transformation is defined as

\[
\varphi_u(\cdot) : \mathbb{R}^d \to \mathbb{R}^d, \qquad \varphi_u(x) := x - u(x), \qquad x \in \Omega
\]

where the displacement u(x) : R^d → R^d, u = (u₁, …, u_d), is a d-dimensional vector field. Mathematically, we again use a variational approach and minimize the energy functional

\[
E_2(u) = \int_\Omega (T(x-u(x)) - R(x))^2 + \alpha \sum_{l=1}^{d} \|\nabla u_l\|^2 \,\mathrm{d}x \tag{10}
\]
that consists of two parts. The first term, (T(x−u(x)) − R(x))², is a distance measure that evaluates the similarity of the two images. Here, we restrict ourselves to the sum of squared differences (SSD), as represented in the integral in (10); when discretized, this results in a pointwise 'least-squares' difference of gray values. The second term, the regularizer, controls the smoothness or regularity of the transformation. Many different regularizers have been discussed in the literature [29]; we restrict ourselves here to the so-called diffusion regularizer \(\sum_{l=1}^{d} \|\nabla u_l\|^2\) [35]. By choosing different parameters α ∈ R⁺, one can control the relative weight of the two terms in the functional [40, 41]. The optimization of the energy functional results in the nonlinear Euler–Lagrange equations

\[
\nabla T(x-u(x))\,(T(x-u(x)) - R(x)) + \alpha\Delta u = 0 \tag{11}
\]

with homogeneous Neumann boundary conditions, which can be discretized by finite differences on a regular grid Ω_h with mesh size h. To treat the nonlinearity, an artificial time is often introduced,

\[
\partial_t u(x,t) - \alpha\Delta u(x,t) = \nabla T(x-u(x,t))\,(T(x-u(x,t)) - R(x)) \tag{12}
\]

which is discretized by a semi-implicit scheme with a discrete time step τ, where the nonlinear term is evaluated at the old time level k:

\[
\frac{u_h^{k+1} - u_h^k}{\tau} - \alpha\Delta_h u_h^{k+1} = \nabla_h T(x-u_h^k)\,(T(x-u_h^k) - R(x)) \tag{13}
\]

Algorithm 1. Image registration scheme.
1. Set u⁰; f⁰ = ∇_h T(u⁰)(T(u⁰) − R);
2. for timestep k = 0 to K do
3.   Compute f^k = ∇_h T(u^k)(T(u^k) − R);
4.   Update τ and α if necessary;
5.   Compute r^k = τ f^k + u^k;
6.   Solve (I − τα Δ_h) u^{k+1} = r^k;
7. end for
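The following self-contained 1D toy illustrates the core loop of this scheme; it is our own sketch, not the authors' code, with an inner Jacobi iteration standing in for the FMG solver of line 6 and linear interpolation standing in for the image warping:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Clamped linear interpolation (crude Neumann-like image extension).
double sampleLinear(const std::vector<double>& img, double x) {
    if (x <= 0) return img.front();
    if (x >= img.size() - 1) return img.back();
    int i = (int)x; double w = x - i;
    return (1 - w) * img[i] + w * img[i + 1];
}

int main() {
    const int n = 128; const double tau = 10.0, alpha = 0.05;
    std::vector<double> T(n), R(n), u(n, 0.0);
    for (int i = 0; i < n; ++i) {            // two shifted Gaussian blobs
        T[i] = std::exp(-0.01 * (i - 54) * (i - 54));
        R[i] = std::exp(-0.01 * (i - 64) * (i - 64));
    }
    for (int step = 0; step < 50; ++step) {
        std::vector<double> f(n), unew = u;
        for (int i = 0; i < n; ++i) {        // force term f^k as in (13)
            double xw = i - u[i];
            double Tw = sampleLinear(T, xw);
            double dT = 0.5 * (sampleLinear(T, xw + 1) - sampleLinear(T, xw - 1));
            f[i] = dT * (Tw - R[i]);
        }
        // Solve (I - tau*alpha*Lap_h) u^{k+1} = u^k + tau*f^k by Jacobi
        // sweeps (stand-in for the multigrid solver; Neumann via mirroring).
        for (int it = 0; it < 200; ++it) {
            std::vector<double> next(n);
            for (int i = 0; i < n; ++i) {
                double l = unew[i > 0 ? i - 1 : 1];
                double r = unew[i < n - 1 ? i + 1 : n - 2];
                next[i] = (u[i] + tau * f[i] + tau * alpha * (l + r))
                          / (1.0 + 2.0 * tau * alpha);
            }
            unew.swap(next);
        }
        u = unew;
    }
    double ssd = 0;
    for (int i = 0; i < n; ++i) {
        double d = sampleLinear(T, i - u[i]) - R[i]; ssd += d * d;
    }
    std::printf("final SSD: %g\n", ssd);   // decreases as T aligns with R
    return 0;
}
```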
The complete image registration scheme can be found in Algorithm 1. Note that in each time step, line 6 of Algorithm 1 requires the solution of d decoupled scalar linear heat equations of type (3). This can be accomplished by the same multigrid algorithms as for the image denoising in the last section. To minimize the number of time steps, we use a technique described in [42] to adapt the α and τ parameters. The idea is to start with large α and τ (we use α = 1000, τ = 10), penalizing higher oscillations in the solution and preferring global transformations, and then to decrease the parameters by factors of 0.1 and 0.5 when the improvement of the SSD stagnates. Note that for small α the transformations are localized and sensitive to discontinuities or noise in the images.

The development of the relative SSD error for an image registration example is shown in Figure 3. As the initial deformation for the first time step we take an interpolated solution of the image registration from the next coarser grid, which explains why the initial relative SSD error is below 1.0. The bends in the curve arise when α and τ are adapted. Figure 4 shows slices of the corresponding medical data sets and the registration result. For medical applications, it is not always useful to drive the registration problem to a very small SSD; maintaining the topology of the medical data matters more. Table VI summarizes the runtimes for different methods to solve (13). A whole time step in the registration algorithm, including three linear solves and the computation of the new right-hand side and the SSD error, takes 1.4 s. Starting with an FMG-V(2,1) for the first iterations, it is sufficient to perform an FMG-V(1,1) once the time steps become smaller, without losing any accuracy in the solution. The DCT-based implementation is described, e.g. in [29]; here, about 65% of the time was spent computing the forward and backward transforms, the rest on the non-optimized multiplication with the inverse eigenfunctions.

Figure 3. Relative SSD error for image registration over time.

Figure 4. Slice of the reference image (upper left), template image (upper right), distance image T_k − R (lower left) and registered image (lower right).
Table VI. Runtime for one linear solve in one time step of the image registration algorithm for an image of size 256×256×160.

Method        Runtime (ms)
FMG-V(2,2)    608
FMG-V(2,1)    499
FMG-V(1,1)    390
DCT           2107
AOS           1971
In practice, an additive operator splitting (AOS) scheme is also sometimes used to solve the registration problem [29, 43]. It is fast, but the time step has to be chosen sufficiently small [29].

4. CONCLUSIONS AND FURTHER WORK

A fast cell-based full multigrid implementation for variational image-processing problems has been shown to be highly competitive, in terms of computing times, with alternative techniques such as FFT-based approaches. However, this requires careful machine-specific code optimization. As a next step, this work has to be extended to an arbitrary number of grid points in each direction and to anisotropic or nonlinear diffusion models. Furthermore, we will consider parallelization of the optimized multigrid solver.
ACKNOWLEDGEMENTS

This research is supported in part by the Deutsche Forschungsgemeinschaft (German Science Foundation), projects Ru 422/7-1, 2, 3, and by the Bavarian KONWIHR supercomputing research consortium [44, 45].

REFERENCES
1. Jain AK. Fundamentals of Digital Image Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
2. Oppenheim A, Schafer R. Discrete-time Signal Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
3. Cooley J, Tukey J. An algorithm for the machine computation of the complex Fourier series. Mathematics of Computation 1965; 19:297–301.
4. Duhamel P, Vetterli M. Fast Fourier transforms: a tutorial review and a state of the art. Signal Processing 1990; 19:259–299.
5. Rader CM. Discrete Fourier transforms when the number of data samples is prime. Proceedings of the IEEE 1968; 56:1107–1108.
6. Pennebaker W, Mitchell J. JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold: New York, 1993.
7. Frigo M, Johnson S. FFTW: an adaptive software architecture for the FFT. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, U.S.A., vol. 3, 1998; 1381–1384.
8. Intel Math Kernel Library (MKL). http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/.
9. Kowarschik M, Weiß C, Rüde U. DiMEPACK—a cache-optimized multigrid library. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), vol. I, Las Vegas, NV, U.S.A., Arabnia HR (ed.). CSREA Press: Irvine, CA, U.S.A., 2001; 425–430.
10. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31(138):333–390.
11. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
12. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
13. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
14. Wienands R, Joppich W. Practical Fourier Analysis for Multigrid Methods. Numerical Insights, vol. 5. Chapman & Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
15. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
16. Mohr M, Wienands R. Cell-centred multigrid revisited. Computing and Visualization in Science 2004; 7(3):129–140.
17. Yavneh I. On red–black SOR smoothing in multigrid. SIAM Journal on Scientific Computing 1996; 17(1):180–192.
18. Barkai D, Brandt A. Vectorized multigrid Poisson solver for the CDC CYBER 205. Applied Mathematics and Computation 1983; 13(3–4):215–228. (Special Issue, Proceedings of the First Copper Mountain Conference on Multigrid Methods, Copper Mountain, CO, McCormick S, Trottenberg U (eds).)
19. Kowarschik M, Rüde U, Thürey N, Weiß C. Performance optimization of 3D multigrid on hierarchical memory architectures. Proceedings of the 6th International Conference on Applied Parallel Computing (PARA 2002), Lecture Notes in Computer Science, vol. 2367. Springer: Berlin, Heidelberg, New York, 2002; 307–316.
20. Kowarschik M. Data Locality Optimizations for Iterative Numerical Algorithms and Cellular Automata on Hierarchical Memory Architectures. Advances in Simulation, vol. 13. SCS Publishing House: Erlangen, Germany, 2004.
21. Bergen B, Gradl T, Hülsemann F, Rüde U. A massively parallel multigrid method for finite elements. Computing in Science and Engineering 2006; 8(6):56–62.
22. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid multigrid. Electronic Transactions on Numerical Analysis (ETNA) 2000; 10:21–40.
23. Weiß C. Data locality optimizations for multigrid methods on structured grids. Ph.D. Thesis, Lehrstuhl für Rechnertechnik und Rechnerorganisation, Institut für Informatik, Technische Universität München, Germany, 2001.
24. Stürmer M. Optimierung von Mehrgitteralgorithmen auf der IA-64 Rechnerarchitektur. Diplomarbeit, Lehrstuhl für Informatik 10 (Systemsimulation), Institut für Informatik, University of Erlangen-Nuremberg, Germany, May 2006.
25. FFTW. http://www.fftw.org.
26. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
27. Lehmann T, Oberschelp W, Pelikan E, Repges R. Bildverarbeitung für die Medizin. Springer: Berlin, Heidelberg, New York, 1997.
28. Jähne B. Digitale Bildverarbeitung (6th edn). Springer: Berlin, Heidelberg, New York, 2006.
29. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
30. Morel J, Solimini S. Variational Methods in Image Segmentation. Progress in Nonlinear Differential Equations and their Applications, vol. 14. Birkhäuser: Boston, 1995.
31. Weickert J. Anisotropic Diffusion in Image Processing. Teubner Verlag: Stuttgart, Germany, 1998.
32. Tikhonov AN, Arsenin VY. Solution of Ill-posed Problems. Winston and Sons: New York, NY, U.S.A., 1977.
33. Hermosillo G. Variational methods for multi-modal image matching. Ph.D. Thesis, Université de Nice, France, 2002.
34. Viola P, Wells W. Alignment by maximization of mutual information. International Journal of Computer Vision 1997; 24(2):137–154.
35. Fischer B, Modersitzki J. Fast diffusion registration. AMS Contemporary Mathematics, Inverse Problems, Image Analysis, and Medical Imaging 2002; 313:117–129.
36. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006; 27(5):1594–1607.
37. Clarenz U, Droske M, Henn S, Rumpf M, Witsch K. Computational methods for nonlinear image registration. Technical Report, Mathematical Institute, Gerhard-Mercator University Duisburg, Germany, 2006.
38. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003; 18(1):81–85.
39. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM Journal on Scientific Computing 2005; 27(3):831–849.
40. Jäger F, Han J, Hornegger J, Kuwert T. A variational approach to spatially dependent non-rigid registration. In Proceedings of SPIE, vol. 6144, Reinhardt J, Pluim J (eds). SPIE: Bellingham, U.S.A., 2006; 860–869.
41. Kabus S, Franz A, Fischer B. On elastic image registration with varying material parameters. In Proceedings of Bildverarbeitung für die Medizin (BVM), Meinzer H-P, Handels H, Horsch A, Tolxdorff T (eds). Springer: Berlin, Heidelberg, New York, 2005; 330–334.
42. Henn S, Witsch K. Image registration based on multiscale energy information. Multiscale Modeling and Simulation 2005; 4(2):584–609.
43. Weickert J, ter Haar Romeny B, Viergever M. Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Transactions on Image Processing 1998; 7(3):398–410.
44. Hülsemann F, Meinlschmidt S, Bergen B, Greiner G, Rüde U. Gridlib—a parallel, object-oriented framework for hierarchical-hybrid grid structures in technical simulation and scientific visualization. In High Performance Computing in Science and Engineering, KONWIHR Results Workshop, Garching, Bode A, Durst F (eds). Springer: Berlin, Heidelberg, New York, 2005; 117–128.
45. Freundl C, Bergen B, Hülsemann F, Rüde U. ParEXPDE: expression templates and advanced PDE software design on the Hitachi SR8000. In High Performance Computing in Science and Engineering, KONWIHR Results Workshop, Garching, Bode A, Durst F (eds). Springer: Berlin, Heidelberg, New York, 2005; 167–179.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:201–218 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.576
Multigrid solution of the optical flow system using a combined diffusion- and curvature-based regularizer
H. Köstler¹, K. Ruhnau² and R. Wienands²,∗,†
¹Department of Computer Science 10, University of Erlangen-Nuremberg, Erlangen, Germany
²Mathematical Institute, University of Cologne, Cologne, Germany
SUMMARY
Optical flow techniques are used to compute an approximate motion field in an image sequence. We apply a variational approach to the optical flow problem using a simple data term but introducing a combined diffusion- and curvature-based regularizer. The same data term arises in image registration problems, where a deformation field between two images is computed. For optical flow problems, a diffusion-based regularizer should usually dominate, whereas for image registration a curvature-based regularizer is more appropriate. The combined regularizer enables us to handle optical flow and image registration problems with the same solver, and it improves on the results of each of the two regularizers used on its own. We develop a geometric multigrid method for the solution of the resulting fourth-order systems of partial differential equations associated with the variational approach for optical flow and image registration problems. The adequacy of (collective) pointwise smoothers within the multigrid algorithm is demonstrated with the help of local Fourier analysis. Galerkin-based coarse-grid operators are applied for an efficient treatment of jumping coefficients. We show some multigrid convergence rates and timings, and we investigate the visual quality of the approximated motion or deformation field for synthetic and real-world images. Copyright © 2008 John Wiley & Sons, Ltd.

Received 15 May 2007; Revised 6 December 2007; Accepted 6 December 2007

KEY WORDS:
multigrid; optical flow; image registration; variational approaches in computer vision
∗Correspondence to: R. Wienands, Mathematical Institute, University of Cologne, Weyertal 86-90, 50931 Cologne, Germany.
†E-mail: [email protected]

1. INTRODUCTION

Optical flow is commonly defined as the motion of brightness patterns in a sequence of images. It was introduced by Horn and Schunck [1], who proposed a differential method to compute the optical flow from pairs of images using a brightness constancy assumption and an additional smoothness constraint on the magnitude of the gradient of the velocity field in order to regularize the problem; we call this diffusion-based regularization. Since then, optical flow has been studied intensively, and many extensions of that simple variational approach, e.g. considering different regularizing terms, have been investigated [2–9]. Optical flow applications range from robotics to video compression and particle image velocimetry (PIV), where optical flow provides the approximate motion of fluid flows. Especially for PIV, it is necessary to incorporate physically more meaningful regularizers, so as to be able to impose, e.g. an incompressibility condition on the velocity field. Suter [10] therefore introduced a smoothness constraint on the divergence and curl of the velocity field that was subsequently used intensively [11–14]. A well-known regularizer in image registration, which is related to optical flow [15] and is a special case of a second-order div–curl-based regularizer [10], is the curvature-based regularizer [16]. Its purpose is to leave affine motion unpenalized while higher-order motions are still used to enforce smoothness. Another advantage of a higher-order regularizer is that for some applications additional information from features or landmarks is given for the optical flow computation [17]; here, the higher-order regularizer is required to avoid singularities in the solution [18, 19].

We present a variational approach for optical flow with a combined diffusion- and curvature-based regularizer in Section 2. Please note that the accuracy of optical flow models is usually dominated by the data term. Our main focus is on the impact of the regularization, and we use a rather simple data term that also arises in image registration, in order to treat both applications with the same solver. As a consequence, we cannot expect to achieve the same accuracy as is obtained, for example, in [20], where very accurate optical flow models based on an advanced data term are presented. Besides the accuracy of the approximate motion field obtained by optical flow, an important goal in many applications is to achieve real-time or close to real-time performance, which makes an efficient numerical solution of the underlying system of partial differential equations (PDEs) mandatory. First attempts to use multilevel techniques to speed up optical flow computations are due to Glazer [21] and Terzopoulos [22]. After that, several multigrid-based solvers were proposed for different optical flow regularizers (see, e.g. [23–27]). In [28, 29], efficient cell-centered (nonlinear) multigrid solvers for various optical flow models with diffusion-based regularizers are discussed. Multigrid methods for image registration are presented, e.g. in [30–32]. We develop a geometric multigrid method in Section 3 in order to solve the fourth-order system of PDEs derived from our variational approach efficiently. In particular, the existence and efficiency of point smoothing methods are investigated in some detail. Here, we do not apply the classical multigrid theory based on the smoothing and approximation properties [33], as is done in [34] for a similar application; instead, we use local Fourier analysis techniques [35–37]. In Section 4, optical flow and image registration results using the combined diffusion and curvature regularizer are given, both for synthetic and for real-world images. We end this paper with an outlook on future developments, e.g. the extension to isotropic or anisotropic versions of the combined regularizer to deal with discontinuities in the velocity field.
2. VARIATIONAL MODEL AND DISCRETIZATION

2.1. Optical flow

The variational approach to computing the motion field, as proposed by Horn and Schunck [1], is composed of a data term and a regularizer. The data term is based on the assumption that a moving
object in the image does not change its gray values, which means that, for example, changes of illumination are neglected. For an image sequence I : Ω×T → R, Ω ⊂ R², describing the gray value intensities for each point x = (x, y) in the regular image domain Ω at time t ∈ T = [0, t_max], t_max ∈ N, this so-called brightness constancy assumption reads

\[
\frac{\mathrm{d}I}{\mathrm{d}t} = 0 \tag{1}
\]

This yields the following identity for the movement of a gray value at (x, y, t):

\[
I(x, t) = I(x + \mathrm{d}x,\; y + \mathrm{d}y,\; t + \mathrm{d}t) \tag{2}
\]

Taylor expansion of I(x+dx, y+dy, t+dt) around (x, y, t), neglecting higher-order terms and using (2), gives I_x u + I_y v + I_t ≈ 0, with the partial image derivatives ∂I/∂x = I_x, ∂I/∂y = I_y, ∂I/∂t = I_t and the optical flow velocity vector u = (u, v)ᵀ, u := dx/dt, v := dy/dt. Please note that, in general, I is not differentiable for real-world images. However, these images are usually preprocessed by several steps of a Gaussian filter [2], ensuring that the function I is sufficiently smooth. The brightness constancy assumption (1) is used throughout this paper but, by itself, results in an ill-posed, under-determined problem; therefore, additional regularization is required. Horn and Schunck proposed as a second assumption a smoothness constraint or diffusion-based regularizer

\[
S_1(\mathbf{u}) = \|\nabla u\|^2 + \|\nabla v\|^2
\]

and combined both in an energy functional

\[
E_1(\mathbf{u}) := \int_\Omega (I_x u + I_y v + I_t)^2 + \alpha S_1(\mathbf{u}) \,\mathrm{d}x \tag{3}
\]

that is to be minimized; α ∈ R⁺ represents a weighting parameter. The curvature-based regularizer penalizes second derivatives instead and can be expressed as

\[
S_2(\mathbf{u}) = (\Delta u)^2 + (\Delta v)^2
\]

As already mentioned, it is a special case of the div–curl-based regularizer [10]

\[
S_2(\mathbf{u}) = \alpha_1 \|\nabla \operatorname{div} \mathbf{u}\|^2 + \alpha_2 \|\nabla \operatorname{curl} \mathbf{u}\|^2
\]

where α₁ = α₂ = 1. We propose a combination of the regularizers S₁(u) and S₂(u), resulting in the combined diffusion- and curvature-based regularizer

\[
S_3(\mathbf{u}) = \beta S_1(\mathbf{u}) + (1-\beta) S_2(\mathbf{u})
\]

where β ∈ [0, 1]. The corresponding energy functional E₃(u) is obtained by simply replacing S₁ by S₃ in (3). The resulting minimization problem is indeed well posed, which can be seen as follows. Considering only the regularizing part of the energy functional, it can easily be interpreted as a symmetric, positive and elliptic bilinear form for u and v; in such cases, it is well known that the corresponding minimization problem has a unique solution. Since the data term is assumed to be sufficiently smooth (see above), the well-posedness can be concluded for
the complete variational problem based on E₃(u); compare with [15] and the references therein. Considerations concerning the well-posedness in a less regular case are covered in [34]. The diffusion-based regularizer allows only small changes between neighboring vectors and produces very smooth motion fields, but it also smooths edges out. The curvature-based regularizer leaves affine motions unpenalized, since they are in its kernel; here, smoothness is achieved by using higher-order motions. We will show, for the problems under consideration, that the optical flow (and the deformation field derived in image registration, see below) based on the combined regularizer can be computed efficiently and that we obtain more accurate solutions than are produced by each of the two regularizers used on its own.

2.2. System of PDEs

To solve the variational problem introduced above, we consider the corresponding Euler–Lagrange equations. Equipped with natural homogeneous Neumann boundary conditions on u, v, Δu and Δv, they form a well-posed boundary value problem, which constitutes a necessary condition for a minimum of E₃(u) (see, e.g. [15]). The Euler–Lagrange equations in the image domain read

\[
\alpha\bigl((1-\beta)(-\Delta)^2 u + \beta(-\Delta)u\bigr) + I_x(I_x u + I_y v + I_t) = 0 \tag{4a}
\]
\[
\alpha\bigl((1-\beta)(-\Delta)^2 v + \beta(-\Delta)v\bigr) + I_y(I_x u + I_y v + I_t) = 0 \tag{4b}
\]

The appropriate set of four boundary conditions for β ≠ 1 is given by

\[
\langle \nabla u, \mathbf{n} \rangle = 0, \qquad \langle \nabla v, \mathbf{n} \rangle = 0 \tag{5a}
\]
\[
\langle \nabla(\Delta u), \mathbf{n} \rangle = 0, \qquad \langle \nabla(\Delta v), \mathbf{n} \rangle = 0 \tag{5b}
\]
with outward normal n. For β = 0, we obtain a fourth-order system, whereas for β = 1 the original second-order system of Horn and Schunck results, for which only the two boundary conditions (5a) are required. The biharmonic operator Δ², which appears in (4a), is known to lead to poor multigrid performance. Therefore, it is a common approach to split up the biharmonic operator into a system of two Poisson-type equations [36]. Employing this idea, (4a) and (4b) can be transformed into the following system using the additional unknown functions w¹ = −Δu and w² = −Δv:

\[
L \begin{pmatrix} u \\ v \\ w^1 \\ w^2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ -I_x I_t \\ -I_y I_t \end{pmatrix} \tag{6a}
\]

with

\[
L = \begin{pmatrix}
-\Delta & 0 & -1 & 0 \\
0 & -\Delta & 0 & -1 \\
I_x^2 & I_x I_y & \alpha\bigl(-(1-\beta)\Delta + \beta\bigr) & 0 \\
I_x I_y & I_y^2 & 0 & \alpha\bigl(-(1-\beta)\Delta + \beta\bigr)
\end{pmatrix} \tag{6b}
\]
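Substituting w¹ = −Δu into the third row of (6a) makes the splitting explicit:

\[
I_x(I_x u + I_y v) + \alpha\bigl(-(1-\beta)\Delta + \beta\bigr)(-\Delta u)
= I_x(I_x u + I_y v) + \alpha\bigl((1-\beta)(-\Delta)^2 u + \beta(-\Delta)u\bigr) = -I_x I_t,
\]

which is exactly (4a); the fourth row recovers (4b) in the same way.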
The boundary conditions (5a) and (5b) are transferred into

\[
\langle \nabla u, \mathbf{n} \rangle = 0, \qquad
\langle \nabla v, \mathbf{n} \rangle = 0, \qquad
\langle \nabla w^1, \mathbf{n} \rangle = 0, \qquad
\langle \nabla w^2, \mathbf{n} \rangle = 0
\]
The determinant of (6b) is given by

\[
\det(L) = \alpha^2(\beta-1)^2(-\Delta)^4 + 2\alpha^2(\beta-\beta^2)(-\Delta)^3
+ \bigl(\alpha^2\beta^2 + \alpha(1-\beta)(I_x^2+I_y^2)\bigr)(-\Delta)^2
+ \alpha\beta\,(I_x^2+I_y^2)(-\Delta) \tag{7}
\]

For the special cases β = 0 and 1, we obtain

\[
\det(L) = \alpha^2\Delta^4 + \alpha(I_x^2+I_y^2)\Delta^2
\qquad\text{and}\qquad
\det(L) = \alpha^2\Delta^2 - \alpha(I_x^2+I_y^2)\Delta
\]

respectively. The principal part of det(L) is Δ^m, with m = 4 for β ∈ [0,1) and m = 2 for β = 1, because α > 0. Hence, four boundary conditions are required for β ≠ 1 and two boundary conditions for β = 1 (see, e.g. [35, 36]). This requirement is met by our choice of boundary conditions, since we use natural homogeneous Neumann boundary conditions on u, v and additionally on −Δu = w¹, −Δv = w², if β ≠ 1, according to the minimization of the energy functional, see above.

2.3. Discretization

The continuous system (6a), (6b) of four PDEs is discretized by finite differences using the standard five-point central discretization Δ_h of the Laplacian (see, e.g. [36]), with x ∈ Ω_h and discrete functions u_h, v_h, w_h¹, w_h². Here, Ω_h denotes the discrete image domain, i.e. each x ∈ Ω_h refers to a pixel; the mesh size h is usually set to 1 for optical flow applications. The corresponding homogeneous Neumann boundary conditions for the four unknown functions are discretized by central differences as well. Finally, the image derivatives have to be approximated by sufficiently accurate finite difference schemes; a proper accuracy of these derivatives is often essential for the quality of the image-processing result. The discrete operator L_h is then simply given by (6b), where Δ has to be replaced by Δ_h and I_x, I_y by their finite difference approximations I_x^h, I_y^h.

2.4. Image registration

Image registration is closely related to the optical flow problem. Here, the goal is to compute a deformation field between two images, called the reference image (R(x) := I(x+dx, y+dy, t+dt)) and the template image (T(x−u(x)) := T_u := I(x, t)) in the following. We briefly summarize the mathematical model. We again use assumption (1) but do not linearize the data term as for optical flow. That means we try to minimize the energy functional

\[
E_{\mathrm{reg}}(\mathbf{u}) := \int_\Omega (R(\mathbf{x}) - T_{\mathbf{u}})^2 + \alpha S_3(\mathbf{u}) \,\mathrm{d}\mathbf{x} \tag{8}
\]
with the same boundary conditions as above. Please note that now the data term is nonlinear. To minimize (8), we linearize the whole energy functional and apply an inexact Newton method as described in detail in [30, 32]. Then, starting with an initial approximation u0 the (k +1)th iterate is computed via uk+1 = uk +k v Copyright q
where we choose the parameter τ_k ∈ ℝ⁺ such that the energy becomes smaller after each step, and the correction v is derived from

H_E(u^k) v = −J_E(u^k)    (9)

Here, J_E := ∇T_u (R(x) − T_u) + α(βΔu + (1−β)Δ²u) denotes the Jacobian and H_E the Hessian of (8), which is approximated by H_E ≈ (∇T_u)² + α(βΔ + (1−β)Δ²). We drop the term ∇²T_u (R(x) − T_u), since the difference R(x) − T_u should be small for registered images and since second image derivatives are very sensitive to noise and are hard to estimate robustly. System (9) is equivalent to the optical flow system (4) with a slightly different right-hand side and can be treated numerically in the same way.
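A minimal sketch of this Gauss–Newton-type outer iteration follows. The step-size rule (simple backtracking) and all function names are our own illustrative choices, not taken from the paper; `solve_flow_system` stands for a solver applied to (9), e.g. the multigrid method of Section 3.

```python
# Sketch (ours): inexact Newton outer loop for (8)-(9).
# 'energy' evaluates E_reg(u); 'solve_flow_system' approximately solves
# H_E(u) v = -J_E(u).
def register(u0, energy, solve_flow_system, steps=5):
    u = u0
    for k in range(steps):
        v = solve_flow_system(u)           # correction from H_E(u) v = -J_E(u)
        tau, E_old = 1.0, energy(u)
        while energy(u + tau * v) >= E_old and tau > 1e-6:
            tau *= 0.5                     # backtrack until the energy decreases
        u = u + tau * v                    # u^{k+1} = u^k + tau_k v
    return u
```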
3. MULTIGRID SOLVER

In recent applications, a real-time solution of the optical flow system becomes more and more important. Hence, an appropriate multigrid solver is an obvious choice for the numerical solution of the resulting linear system, since multigrid methods are known to be among the fastest solvers for discretized elliptic PDEs. Multigrid methods (see, e.g. [33, 35, 36, 38, 39]) are mainly motivated by two basic principles:

1. Smoothing principle: Many iterative methods have a strong error smoothing effect if they are applied to discrete elliptic problems.
2. Coarse grid correction principle: A smooth error term can be well represented on a coarser grid, where its approximation is substantially less expensive.

These two principles suggest the following structure of a two-grid cycle: perform ν₁ steps of an iterative relaxation method S_h on the fine grid (pre-smoothing), compute the defect of the current fine grid approximation, restrict the defect to the coarse grid, solve the coarse grid defect equation, interpolate the obtained error correction to the fine grid, add the interpolated correction to the current fine grid approximation (coarse grid correction), and perform ν₂ steps of an iterative relaxation method on the fine grid (post-smoothing). Instead of an exact solution of the coarse grid equation, it can be solved by a recursive application of the two-grid iteration, yielding a multigrid method; a sketch is given below. We assume standard coarsening here, i.e. the sequence of coarse grids is obtained by repeatedly doubling the mesh size in each space direction, i.e. h → 2h. The crucial point for any multigrid method is to identify the 'correct' multigrid components (i.e. relaxation method, restriction, interpolation, etc.) yielding an efficient interplay between relaxation and coarse grid correction. A useful tool for a proper selection is local Fourier analysis.
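The cycle structure just described translates into a short recursion. The sketch below is our own generic rendering, not the authors' implementation; the relaxation and transfer routines are assumed given, and the coarsest system is solved directly for simplicity.

```python
# Sketch of the recursive multigrid cycle described above. Assumed interfaces:
# 'smooth' performs one relaxation sweep, 'restrict'/'interpolate' are the
# inter-grid transfers, 'coarsen' returns the coarse-grid operator.
import numpy as np

def v_cycle(A, u, f, smooth, restrict, interpolate, coarsen,
            nu1=2, nu2=2, level=0, max_level=5):
    for _ in range(nu1):                        # pre-smoothing
        u = smooth(A, u, f)
    d = f - A @ u                               # defect on the fine grid
    dc = restrict(d)                            # restrict defect to coarse grid
    Ac = coarsen(A)
    if level + 1 == max_level:
        ec = np.linalg.solve(Ac.toarray(), dc)  # direct coarsest solve (dense
                                                # for simplicity; A assumed sparse)
    else:                                       # recursion on the defect equation
        ec = v_cycle(Ac, np.zeros_like(dc), dc, smooth, restrict,
                     interpolate, coarsen, nu1, nu2, level + 1, max_level)
    u = u + interpolate(ec)                     # coarse grid correction
    for _ in range(nu2):                        # post-smoothing
        u = smooth(A, u, f)
    return u
```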
3.1. Basic elements of local Fourier analysis

Local Fourier analysis [35–37] is mainly valid for operators with constant or smoothly varying coefficients. It is based on the simplification that boundary conditions are neglected and all occurring operators are extended to an infinite grid

G_h := {x = (x, y)ᵀ = h(n_x, n_y)ᵀ with (n_x, n_y) ∈ ℤ²}

On an infinite grid, the discrete solution, its current approximation and the corresponding error or residual can be represented by linear combinations of certain exponential functions—the Fourier components—which form a unitary basis of the space of bounded infinite grid functions, the Fourier space. Regarding our optical flow system composed of four discrete equations, a proper unitary basis of vector-valued Fourier components is given by

u_h(θ, x) := exp(iθᵀx/h)·I    with I = (1, 1, 1, 1)ᵀ, θ ∈ Θ := (−π, π]², x ∈ G_h

and complex unit i = √(−1), yielding the Fourier space

F(G_h) := span{u_h(θ, x) : θ ∈ Θ}

Then, the main idea of local Fourier analysis is to analyze different multigrid components, or even complete two-grid cycles, by evaluating their effect on the Fourier components. Especially, the analysis of the smoothing method is based on a distinction between 'high' and 'low' Fourier frequencies, governed by the coarsening strategy under consideration. If standard coarsening is selected, each 'low frequency' θ = θ⁰⁰ ∈ Θ_low := (−π/2, π/2]² is coupled with three 'high frequencies'

θ¹¹ := θ⁰⁰ − π(sign(θ₁), sign(θ₂)),    θ¹⁰ := θ⁰⁰ − π(sign(θ₁), 0),    θ⁰¹ := θ⁰⁰ − π(0, sign(θ₂))

(θ¹¹, θ¹⁰, θ⁰¹ ∈ Θ_high := Θ \ Θ_low) in the transition from G_h to G_2h. That is, the related three high-frequency components are not visible on the coarse grid G_2h, as they coincide with the coupled low-frequency component:

u_h(θ⁰⁰, x) = u_h(θ¹¹, x) = u_h(θ¹⁰, x) = u_h(θ⁰¹, x)    for x ∈ G_2h

This is of course due to the 2π-periodicity of the exponential function.
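This aliasing is easy to verify numerically. The small check below is our own illustration: it samples a low-frequency component and its three coupled high frequencies on the even-numbered grid points (h = 1, standard coarsening); the chosen frequency is arbitrary.

```python
# Sketch: the three coupled high frequencies alias with theta00 on the
# coarse grid (every second point).
import numpy as np

theta00 = np.array([0.3, -0.7])                       # a frequency in (-pi/2, pi/2]^2
couplings = [(-np.sign(theta00[0]), -np.sign(theta00[1])),
             (-np.sign(theta00[0]), 0.0),
             (0.0, -np.sign(theta00[1]))]
n = np.arange(0, 32, 2)                               # coarse-grid indices (h -> 2h)
X, Y = np.meshgrid(n, n, indexing='ij')
low = np.exp(1j * (theta00[0] * X + theta00[1] * Y))  # u_h(theta00, x)
for s in couplings:
    theta = theta00 + np.pi * np.array(s)             # theta11, theta10, theta01
    high = np.exp(1j * (theta[0] * X + theta[1] * Y))
    assert np.allclose(low, high)                     # identical on G_2h
```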
3.2. Measure of h-ellipticity

A well-chosen relaxation method obviously has to take care of the high-frequency error components, since they cannot be reduced on coarser grids by the coarse grid correction. The measure of h-ellipticity is often used to decide whether or not this can be accomplished by a point relaxation method [35–37]. A sufficient amount of h-ellipticity indicates that pointwise error smoothing procedures can be constructed for the discrete operator under consideration. Dealing with operators based on variable coefficients prevents a direct application of local Fourier analysis. In our discrete system, variable coefficients occur for the image derivatives. However, the analysis can be applied to the locally frozen operator at a fixed grid point n. Replacing the variable x by a constant n, one obtains an operator L_h(n) with constant frozen coefficients. The measure of h-ellipticity for our frozen system of equations is then defined by

E_h(L_h(n)) := min{|det(L̃_h(n, θ))| : θ ∈ Θ_high} / max{|det(L̃_h(n, θ))| : θ ∈ Θ}
where the complex (4×4)-matrix

             ⎛  −Δ̃_h(θ)             0                   −1                     0                     ⎞
L̃_h(n, θ) =  ⎜   0                 −Δ̃_h(θ)              0                     −1                    ⎟
             ⎜  (I_x^h(n))²         I_x^h(n) I_y^h(n)   α(−(1−β)Δ̃_h(θ) + β)    0                     ⎟
             ⎝  I_x^h(n) I_y^h(n)  (I_y^h(n))²           0                    α(−(1−β)Δ̃_h(θ) + β)    ⎠
is the Fourier symbol of L_h(n) (for details concerning Fourier symbols for systems of equations, etc., we refer to [35–37]), i.e.

L_h(n) u_h(θ, x) = L̃_h(n, θ) u_h(θ, x)

The Fourier symbol L̃_h(n, θ) for the system of PDEs is composed of the Fourier symbol of the Laplacian and several constants. The Fourier symbol of the Laplacian reads (compare with [35–37])

−Δ̃_h(θ) = (4/h²)(sin²(θ₁/2) + sin²(θ₂/2))    with θ ∈ Θ

Now, det(L̃_h(n, θ)) is simply given by (7), where −Δ has to be replaced by −Δ̃_h(θ) and the image derivatives by the related frozen constants. For the derivation of E_h(L_h(n)), it is important to note that −Δ̃_h(θ) ≥ 0. Moreover, for the four coefficients

c₁ := αβ I_c,    c₂ := α²β² + α(1−β) I_c,    c₃ := 2α²(β−β²),    c₄ := α²(β−1)²

with I_c = (I_x^h(n))² + (I_y^h(n))² occurring in det(L̃_h(n, θ)), we have c₁, c₂, c₃, c₄ ≥ 0 for α > 0, β ∈ [0, 1]. Since f(x) = c₁x + c₂x² + c₃x³ + c₄x⁴ is monotonically increasing for x, c₁, c₂, c₃, c₄ ≥ 0, the minimal (θ ∈ Θ_high) and maximal (θ ∈ Θ) values of −Δ̃_h(θ) and |det(L̃_h(n, θ))| coincide. In particular, we have

min_{θ ∈ Θ_high}(−Δ̃_h(θ)) = −Δ̃_h(−π/2, 0) = 2/h²,    max_{θ ∈ Θ}(−Δ̃_h(θ)) = −Δ̃_h(π, π) = 8/h²
As a consequence, the measure of h-ellipticity for the discrete operator L_h(n) turns out to be

E_h(L_h(n)) = (8α(β−1)² + 8α(β−β²)h² + 2(αβ² + (1−β)I_c)h⁴ + βI_c h⁶) / (2048α(β−1)² + 512α(β−β²)h² + 32(αβ² + (1−β)I_c)h⁴ + 4βI_c h⁶)

For the special cases β = 0 and β = 1, this gives

E_h(L_h(n)) = (4α + I_c h⁴) / (1024α + 16 I_c h⁴)    and    E_h(L_h(n)) = (2α + I_c h²) / (32α + 4 I_c h²)

respectively. Note that E_h(L_h(n)) > 0 for all possible choices of α, h > 0, β ∈ [0, 1], I_c ≥ 0. In particular, this means that E_h(L_h(n)) > 0 for all possible values of I_x^h(n), I_y^h(n) over the whole discrete image domain, i.e. for arbitrary n ∈ Ω_h.
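The following sketch (ours) evaluates E_h(L_h(n)) by brute force from the symbol and compares it with the closed form above; the grid resolution and the parameter values are arbitrary test choices.

```python
# Sketch: numerical evaluation of the h-ellipticity measure E_h(L_h(n)).
# det(L~) is taken from (7) with -Delta replaced by its Fourier symbol.
import numpy as np

def Eh(alpha, beta, Ic, h=1.0, m=201):
    t = np.linspace(-np.pi, np.pi, m)
    t1, t2 = np.meshgrid(t, t)
    lap = 4.0 / h**2 * (np.sin(t1 / 2)**2 + np.sin(t2 / 2)**2)   # -Delta~_h(theta)
    c1 = alpha * beta * Ic
    c2 = alpha**2 * beta**2 + alpha * (1 - beta) * Ic
    c3 = 2 * alpha**2 * (beta - beta**2)
    c4 = alpha**2 * (beta - 1)**2
    det = c1 * lap + c2 * lap**2 + c3 * lap**3 + c4 * lap**4     # det from (7)
    high = (np.abs(t1) > np.pi / 2) | (np.abs(t2) > np.pi / 2)   # Theta_high
    return np.abs(det[high]).min() / np.abs(det).max()

alpha, beta, Ic, h = 1500.0, 0.4, 10.0, 1.0
num = 8*alpha*(beta-1)**2 + 8*alpha*(beta-beta**2)*h**2 \
      + 2*(alpha*beta**2 + (1-beta)*Ic)*h**4 + beta*Ic*h**6
den = 2048*alpha*(beta-1)**2 + 512*alpha*(beta-beta**2)*h**2 \
      + 32*(alpha*beta**2 + (1-beta)*Ic)*h**4 + 4*beta*Ic*h**6
print(Eh(alpha, beta, Ic, h), num / den)   # the two values should roughly agree
```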
This is a strong and very satisfactory robustness result for such a complicated system involving several parameters. Even in the limit of small mesh size h → 0, the measure of h-ellipticity is bounded away from zero, since we have

lim_{h→0} E_h(L_h(n)) = 1/256 for β ≠ 1    and    lim_{h→0} E_h(L_h(n)) = 1/16 for β = 1

3.3. Smoothing method

Owing to the above derivations, it can be expected that the optical flow system under consideration is amenable to point smoothing. The straightforward generalization of a scalar smoothing method to a system of PDEs is a collective relaxation method. This relaxation method sweeps over all grid points x ∈ Ω_h in a certain order, for example, in a lexicographic or a red–black manner. At each grid point, the four difference equations are solved simultaneously, i.e. the corresponding variables u_h(x), v_h(x), w_h¹(x) and w_h²(x) are updated simultaneously. This means that a (4×4)-system has to be solved at each grid point.

First of all, we have to note that the large sparse matrix that corresponds to the discrete system is neither symmetric nor diagonally dominant. Furthermore, it is not an M-matrix, due to positive off-diagonal entries. As a consequence, most of the classical convergence criteria for standard iterative methods such as Jacobi or Gauss–Seidel relaxation do not apply, and it has to be expected that these methods might diverge for certain parameter choices. In our numerical tests for collective lexicographic or red–black Gauss–Seidel relaxation (abbreviated by GS-LEX and GS-RB, respectively) we always observed overall convergence, although for certain combinations of α, β, I_x, I_y there were single relaxation steps with an increasing residual. An example of such a convergence history is shown in Figure 1 for collective Jacobi, GS-LEX and GS-RB relaxation.

However, if a relaxation method is applied within a multigrid algorithm, then we are mainly interested in its smoothing properties. That is, the relaxation is aimed at a sufficient reduction of the high-frequency components of the error between the exact solution and the current approximation, see above. A quantitative measure of its efficiency is the smoothing factor μ_loc obtained by local Fourier analysis.
Figure 1. Residual improvement of relaxations (residual norm versus number of iterations for collective Jacobi, GS-LEX and GS-RB relaxation).
μ_loc is defined as the worst asymptotic error reduction, by one relaxation step, of all high-frequency error components. For more details on local Fourier smoothing analysis, we refer to the literature [35–37]. In the case of smoothly varying coefficients, the smoothing factor for L_h(x) can be bounded by the maximum over the smoothing factors for the locally frozen operators, i.e.

μ_loc(L_h(x)) = max_{n ∈ Ω_h} μ_loc(L_h(n))    (10)
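As an illustration of such a smoothing analysis, the sketch below (our own) estimates μ_loc for the frozen operator by scanning the high-frequency region. It uses collective damped Jacobi rather than the Gauss–Seidel variants analyzed in the paper, since its symbol is particularly simple; the damping factor and frozen derivative values are arbitrary choices.

```python
# Sketch: LFA smoothing factor of collective damped Jacobi for the frozen
# operator L_h(n). D~ freezes the Laplacian symbol at its central coefficient.
import numpy as np

def symbol(lap, Ix, Iy, alpha, beta):
    r = alpha * ((1 - beta) * lap + beta)
    return np.array([[lap, 0.0, -1.0, 0.0],
                     [0.0, lap, 0.0, -1.0],
                     [Ix * Ix, Ix * Iy, r, 0.0],
                     [Ix * Iy, Iy * Iy, 0.0, r]])

def mu_loc_jacobi(Ix, Iy, alpha, beta, h=1.0, omega=0.8, m=101):
    Dinv = np.linalg.inv(symbol(4.0 / h**2, Ix, Iy, alpha, beta))
    mu = 0.0
    for t1 in np.linspace(-np.pi, np.pi, m):
        for t2 in np.linspace(-np.pi, np.pi, m):
            if abs(t1) <= np.pi / 2 and abs(t2) <= np.pi / 2:
                continue                           # skip low frequencies
            lap = 4.0 / h**2 * (np.sin(t1 / 2)**2 + np.sin(t2 / 2)**2)
            S = np.eye(4) - omega * Dinv @ symbol(lap, Ix, Iy, alpha, beta)
            mu = max(mu, np.abs(np.linalg.eigvals(S)).max())
    return mu

print(mu_loc_jacobi(Ix=1.0, Iy=0.5, alpha=1500.0, beta=0.4))
```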
As a popular test case, we consider frame 8 of the Yosemite sequence shown in Figure 4. Table I presents the corresponding smoothing factors calculated via (10) for GS-LEX and GS-RB with varying β. α is fixed at 1500, which turned out to be a proper choice w.r.t. the average angular error (AAE) (11) in many situations, see below. Obviously, there is hardly any influence of the parameter β on the resulting smoothing factor. We always observe nearly the same smoothing factors as they are well known for the Poisson equation (i.e. μ = 0.5 for GS-LEX and μ = 0.25 for GS-RB). Systematic tests show that the same statement is also valid for the parameter α. As a consequence, we can expect to obtain the typical multigrid efficiency as long as the coarse grid correction works properly, compare with Section 3.4. The situation is considerably more complicated if we apply decoupled relaxations (compare with [36]), which will be discussed elsewhere.

Note that I_x and I_y are not varying smoothly over the image domain Ω_h for this test case. Instead, we have moderate jumps in the coefficients. As a consequence, the smoothing factors from Table I are not justified rigorously. However, from practical experience, they can be considered as heuristic but reliable estimates for the actual smoothing properties, especially since we only have moderate jumps. To back up the theoretical results from smoothing analysis, we also tested the smoothing effect of the collective relaxations numerically. The smoothing effect of GS-LEX can be clearly seen from Figure 2. Here, the initial (random) error on a 33×33 grid (a scaled-down version of frame 8 from the Yosemite sequence) and the error after five collective GS-LEX steps are shown for the first component u of the optical flow velocity vector. Summarizing, there is sufficient evidence that collective damped Jacobi, GS-LEX and GS-RB relaxation are reasonable smoothing methods, even though they might diverge for single relaxation steps as stand-alone solvers.

3.4. Coarse grid correction

Next to the collective GS relaxation, standard multigrid components are applied. To handle the jumping coefficients in I_x and I_y, we use Galerkin coarse grid operators. Since there are only moderate jumps, it is not necessary to consider operator-dependent transfers; we can stay with straightforward geometric transfers like full-weighting and bilinear interpolation. Throughout our numerical experiments, V(2,2)-cycles are employed (i.e. ν₁ = 2 pre-relaxations and ν₂ = 2 post-relaxations).
Table I. Smoothing factors for GS-LEX and GS-RB, α = 1500.

          β = 0     β = 0.4   β = 1
GS-LEX    0.49973   0.49980   0.49970
GS-RB     0.25003   0.25009   0.25000
Figure 2. Error smoothing of GS-LEX relaxation for a scaled-down version of frame 8 from the Yosemite sequence.
For details concerning these multigrid components, we refer to the well-known literature again [33, 35, 36, 38, 39]. Since we are interested in a real-time solution, it is necessary to use the full multigrid (FMG) technique (see, e.g. [35, 36]). Here, the initial approximation on the fine grid is obtained by the computation and interpolation of approximations on coarser grids. A properly adjusted FMG algorithm yields an asymptotically optimal method, i.e. the number of arithmetic operations is proportional to the number of grid points and, at the same time, the error of the resulting fine grid solution is approximately equal to the discretization error.
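Building on the V-cycle sketch from the beginning of this section, an FMG driver can be outlined as follows. This is again our own generic sketch; `v_cycle_on` and `fmg_interpolate` are assumed helpers, not names from the paper.

```python
# Sketch: full multigrid (FMG). The initial guess on each finer grid is the
# interpolated solution from the next coarser grid, followed by one V-cycle.
def fmg(problems, v_cycle_on, fmg_interpolate):
    """problems: list of (A, f) pairs ordered from coarsest [0] to finest [-1]."""
    A, f = problems[0]
    u = v_cycle_on(A, 0 * f, f)              # (approximately) solve coarsest grid
    for A, f in problems[1:]:
        u = fmg_interpolate(u)               # first guess on the finer grid
        u = v_cycle_on(A, u, f)              # one V(2,2)-cycle per level
    return u
```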
4. EXPERIMENTAL RESULTS

Next, the numerical performance of the multigrid solver described above is investigated, and the quality of the variational model is demonstrated.

4.1. Optical flow

In general, it is very hard to quantify the quality of the optical flow velocity field. For synthetic image sequences, often a ground truth motion field (see [40] for details) is used to measure the quality of a computed optical flow field by the AAE. It is calculated via (cf. [28])

AAE(u_c, u_e) = (1/|Ω|) ∫_Ω arccos( u_cᵀ u_e / (|u_c| |u_e|) ) dx    (11)

where u_c = (u_c, v_c, 1) is the ground truth and u_e = (u_e, v_e, 1) the estimated optical flow vector. Most real-world image sequences do not offer a ground truth motion field; therefore, in this case the quality of the optical flow is often measured visually by plotting the vector field and comparing it with the expected result. For example, one can check whether the vector field is smooth inside objects and whether edges from different movements are preserved, e.g. objects moving over a static background.
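In discrete form, (11) is a mean over pixels. A small sketch (ours, reporting the angle in degrees as in Figure 3):

```python
# Sketch: discrete average angular error (11) for flow fields of shape (H, W, 2).
import numpy as np

def aae(flow_true, flow_est):
    uc = np.concatenate([flow_true, np.ones(flow_true.shape[:2] + (1,))], axis=2)
    ue = np.concatenate([flow_est, np.ones(flow_est.shape[:2] + (1,))], axis=2)
    cosang = (uc * ue).sum(axis=2) / (
        np.linalg.norm(uc, axis=2) * np.linalg.norm(ue, axis=2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))).mean()
```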
4.1.1. Multigrid performance. All experiments for different combinations of α and β (see below) were performed using a single FMG-V(2,2) cycle with collective GS-RB as the smoother. The same visual and AAE results can also be obtained by five V(2,2) cycles. Input images are smoothed by a discrete Gaussian filter mask (standard deviation σ = 1.2) in order to ensure a robust computation of the image derivatives by finite difference approximations. For constant coefficients I_x and I_y, one obtains the typical multigrid convergence factors, similar as for the Poisson equation, which can be nicely predicted by local Fourier analysis. For jumping coefficients, a slight deterioration of the convergence rate can be observed. Table II lists some representative results. Different values of β that are useful for the application do not have a substantial impact on the convergence rates. The best convergence rates are achieved when the combination of α and β is optimal with respect to the quality of the solution, which is an interesting observation by itself. Figure 3 shows an AAE (11) plot over β for α = 1500. The best quality with respect to AAE is obtained for β ≈ 0.4.
Table II. Convergence rates for the computation of the optical flow from frames 8 and 9 of the Yosemite sequence with α = 1500.

                  GS-LEX                         GS-RB
Cycle    β = 0     β = 0.4   β = 1      β = 0     β = 0.4   β = 1
1        0.053     0.051     0.048      0.091     0.090     0.074
2        0.054     0.042     0.045      0.070     0.055     0.044
3        0.096     0.065     0.148      0.115     0.069     0.127
4        0.124     0.086     0.196      0.156     0.093     0.181
5        0.131     0.093     0.232      0.172     0.110     0.233
Figure 3. AAE plot of the calculated optical flow between pictures 8 and 9 from the Yosemite sequence for α = 500, 1500 and 5000 (AAE versus β).
Table III. Runtimes of the optical flow FMG-V(2,2) multigrid solver for different image sizes.

Size        Runtime (ms)
256×192     305
256×256     420
316×252     560
640×480     1900
On the other hand, the best convergence rates for α = 1500 are also obtained for β ≈ 0.4 (see Table II). To give an impression of the performance of our optical flow algorithm, we list in Table III runtimes of an FMG-V(2,2) cycle for different image sizes. The time measurements were done on an AMD Opteron 248 cluster node with 2.2 GHz, 64 kB L1 cache, 1 MB L2 cache and 4 GByte DDR-333 RAM. Of course, by a hardware-specific performance optimization of the multigrid solver on current architectures, these times can be improved for real applications [41, 42].

Summarizing, the multigrid algorithm exhibits a very robust behavior, as was indicated by the investigation of the measure of h-ellipticity. For all possible choices of α, β and the image derivatives, one obtains nearly the same (excellent) convergence factors as they are known for the Poisson equation.

4.1.2. Quality of the optical flow model. In the following we use two sequences, one synthetic and one real world [43], to evaluate our optical flow model. The Yosemite sequence with clouds, created by Lynn Quam [44], is a rather complex test case (see Figure 4). It consists of 15 frames of size 316×252 and depicts a flight through the Yosemite national park. In this sequence, translational (clouds) and divergent (flight) motion is present. Additionally, we have varying illumination in the region of the clouds; thus, our constant brightness assumption is not fulfilled there. All tests were performed with frames 8 and 9 of the Yosemite sequence.

First, we consider in Figure 3 the AAE for α = 500, 1500, 5000 and varying β. α = 500 was chosen because it was tested to give the optimal value—w.r.t. a minimal AAE—for the second-order system. The combined regularizer produces the best result. It is able to outperform both the diffusion-based and the curvature-based regularizer. Since the AAE is measured over the whole image domain, even small improvements of the AAE can lead to a substantial improvement in the local visual quality of the resulting optical flow field.

Figure 4 shows image details of the resulting velocity fields for the Yosemite sequence, where we choose α = 1500 for a visual comparison of different values of β. The right half of this detail includes the high mountain from the middle of the images. The mountains are moving from right to left, whereas the clouds region is moving (purely horizontally) from left to right. For β = 1, one can see the usual behavior of the original Horn and Schunck regularizer, which tries to produce a smooth solution even over the mountain crest. The fourth-order system performs better in this regard, as the region of influence is notably smaller, for example, at the right crossover. The combined regularizer with β = 0.4 exhibits a mixture of both effects and leads to a smaller AAE over the whole image. One can also observe that all methods fail to calculate the purely horizontal flow in the clouds region. That is due to the fact that the brightness varies here, and thus the constant brightness assumption of the data term does not hold.
Figure 4. First line: frames 8 and 9 from the Yosemite sequence. Second line: a detail from the optical flow located left of the highest mountain in the middle of the image (marked in frame 8). It was calculated with α = 1500 and (from left to right) β = 0, 0.4 and 1.
The second sequence shows rotating particles and is related to particle image velocimetry (PIV). However, we do not use standard PIV models like a div–curl regularizer, but our variational approach. Our goal is to visualize the difference between the diffusion-based and the curvature-based regularizer at a vortex, where the latter is able to resolve the vortex much better, as can be nicely observed in Figure 5.

4.2. Medical image registration

For simplicity, we quantify the registration error by the relative sum of squared differences (SSD) error (see, e.g. [15])

SSD := ‖R(x) − T_u‖ / ‖R(x) − T(x)‖
However, for medical applications it is not always useful to force a very small relative SSD error; rather, one wants to maintain the topology of the medical data, i.e. to keep structures like bones. In Figure 6, we depict two medical images of a human brain and their registration results. After five Newton steps, we achieve SSD = 0.1 for β = 0 and SSD = 0.08 for β = 0.05. A diffusion-based regularizer is not suitable here and leads to SSD = 0.3.
Figure 5. First line: two frames of a rotating particle sequence (size 512×512). Second line: the resulting optical flow field for α = 500 at the vortex for the diffusion-based regularizer (left) and the curvature-based regularizer (right).
5. CONCLUSIONS AND OUTLOOK

We presented and evaluated a combined diffusion- and curvature-based regularizer for optical flow and the related image registration. The arising fourth-order system of PDEs was solved efficiently by a geometric multigrid solver. It turns out that the best results are obtained when the weighting between the regularizer and the brightness constancy assumption is chosen such that the multigrid solver shows an optimal convergence rate. This is an interesting observation, and it has to be investigated whether this can be used to choose the weighting parameter automatically.
Figure 6. First line: template image (left) and reference image (right) showing a human brain (size 256×256). Second line: registration results (from left to right) with α = 3 for β = 0 and 0.05.
To improve the static weighting of the regularizer, which produces an equally smooth solution throughout the picture, one could allow a space-dependent parameter β in order to deal with discontinuities in the solution. Next steps are the extension of the regularizer to the physically motivated div–curl-based regularizer, or to nonlinear regularizers, where α and β depend on the velocity field. Furthermore, we wish to apply the curvature-based regularizer to motion blur computed by a combined optical flow and ray tracer motion field [17]. This should help to overcome the problem of the diffusion-based regularizer that introduces singularities in the Euler–Lagrange equations, since some motion vectors are fixed within the optical flow model. For image registration, it is an interesting task to extend the model to 3D in order to be able to register 3D medical data sets.

REFERENCES
1. Horn B, Schunck B. Determining optical flow. Artificial Intelligence 1981; 17:185–203.
2. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
3. Nagel H-H, Enkelmann W. An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986; 8(5):565–593.
4. Galvin B, McCane B, Novins K, Mason D, Mills S. Recovering motion fields: an evaluation of eight optical flow algorithms. British Machine Vision Conference, Southampton, 1998.
5. Verri A, Poggio T. Motion field and optical flow: qualitative properties. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989; 11(5):490–498.
6. Haussecker H, Fleet D. Computing optical flow with physical models of brightness variation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2001; 23(6):661–673.
7. Weickert J, Schnörr C. A theoretical framework for convex regularizers in PDE-based computation of image motion. International Journal of Computer Vision 2001; 45(3):245–264.
8. Weickert J, Schnörr C. Variational optic flow computation with a spatio-temporal smoothness constraint. Journal of Mathematical Imaging and Vision 2001; 14(3):245–255.
9. Brox T, Weickert J. Nonlinear matrix diffusion for optic flow estimation. In Pattern Recognition, van Gool L (ed.). Lecture Notes in Computer Science, vol. 2449. Springer: Berlin, 2002; 446–453.
10. Suter D. Motion estimation and vector splines. Proceedings of the Conference on Computer Vision and Pattern Recognition, Los Alamos, U.S.A., 1994; 939–948.
11. Gupta S, Prince J. Stochastic models for div–curl optical flow methods. IEEE Signal Processing Letters 1996; 3(2):32–34.
12. Corpetti T, Mémin E, Pérez P. Dense estimation of fluid flows. IEEE Transactions on Pattern Analysis and Machine Intelligence 2002; 24(3):365–380.
13. Kohlberger T, Mémin E, Schnörr C. Variational dense motion estimation using the Helmholtz decomposition. In Fourth International Conference on Scale Space Methods in Computer Vision, Griffin L, Lillholm M (eds), Isle of Skye, U.K. Lecture Notes in Computer Science, vol. 2695. Springer: Berlin, 2003; 432–448.
14. Corpetti T, Heitz D, Arroyo G, Mémin E, Santa-Cruz A. Fluid experimental flow estimation based on an optical-flow scheme. Experiments in Fluids 2006; 40(1):80–97.
15. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
16. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003; 18(1):81–85.
17. Zheng Y, Köstler H, Thürey N, Rüde U. Enhanced motion blur calculation with optical flow. Proceedings of Vision, Modeling and Visualization, RWTH Aachen, Germany. Aka GmbH, IOS Press: Berlin, 2006; 253–260.
18. Fischer B, Modersitzki J. Combining landmark and intensity driven registrations. PAMM 2003; 3(1):32–35.
19. Galic I, Weickert J, Welk M, Bruhn A, Belyaev A, Seidel H. Towards PDE-based image compression. Proceedings of Variational, Geometric, and Level Set Methods in Computer Vision. Lecture Notes in Computer Science. Springer: Berlin, Heidelberg, New York, 2005; 37–48.
20. Papenberg N, Bruhn A, Brox T, Didas S, Weickert J. Highly accurate optic flow computation with theoretically justified warping. International Journal of Computer Vision 2006; 67(2):141–158.
21. Glazer F. Multilevel relaxation in low-level computer vision. In Multi-Resolution Image Processing and Analysis, Rosenfeld A (ed.). Springer: Berlin, 1984; 312–330.
22. Terzopoulos D. Image analysis using multigrid methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986; 8:129–139.
23. Enkelmann W. Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences. Computer Vision, Graphics, and Image Processing 1988; 43:150–177.
24. Battiti R, Amaldi E, Koch C. Computing optical flow across multiple scales: an adaptive coarse-to-fine strategy. International Journal of Computer Vision 1991; 6(2):133–145.
25. Kalmoun EM, Rüde U. A variational multigrid for computing the optical flow. In Vision, Modeling and Visualization, Ertl T, Girod B, Greiner G, Niemann H, Seidel HP, Steinbach E, Westermann R (eds). Akademische Verlagsgesellschaft: Berlin, 2003; 577–584.
26. Kalmoun EM, Köstler H, Rüde U. 3D optical flow computation using a parallel variational multigrid scheme with application to cardiac C-arm CT motion. Image and Vision Computing 2007; 25(9):1482–1494.
27. Christadler I, Köstler H, Rüde U. Robust and efficient multigrid techniques for the optical flow problem using different regularizers. In Proceedings of 18th Symposium Simulations Technique ASIM 2005, Hülsemann F, Kowarschik M, Rüde U (eds). Frontiers in Simulation, vol. 15. SCS Publishing House: Erlangen, 2005; 341–346. Preprint version published as Technical Report 05-6.
28. Bruhn A. Variational optic flow computation: accurate modeling and efficient numerics. Ph.D. Thesis, Department of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany, 2006.
29. Bruhn A, Weickert J, Kohlberger T, Schnörr C. A multigrid platform for real-time motion computation with discontinuity-preserving variational methods. International Journal of Computer Vision 2006; 70(3):257–277.
30. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006; 27(5):1594–1607.
31. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM Journal on Scientific Computing 2005; 27(3):831–849.
32. Hömke L. A multigrid method for anisotropic PDEs in elastic image registration. Numerical Linear Algebra with Applications 2006; 13(2–3):215–229.
33. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
34. Keeling SL, Haase G. Geometric multigrid for high-order regularizations of early vision problems. Applied Mathematics and Computation 2007; 184(2):536–556.
35. Brandt A. Multigrid techniques: 1984 guide with applications to fluid dynamics. GMD-Studie Nr. 85, Sankt Augustin, West Germany, 1984.
36. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
37. Wienands R, Joppich W. Practical Fourier Analysis for Multigrid Methods. Numerical Insights, vol. 5. Chapman & Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
38. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
39. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
40. McCane B, Novins K, Crannitch D, Galvin B. On benchmarking optical flow. Computer Vision and Image Understanding 2001; 84(1):126–143.
41. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid multigrid. Electronic Transactions on Numerical Analysis 2000; 10:21–40.
42. Hülsemann F, Kowarschik M, Mohr M, Rüde U. Parallel geometric multigrid. In Numerical Solution of Partial Differential Equations on Parallel Computers, Chapter 5, Bruaset A, Tveito A (eds). Lecture Notes in Computational Science and Engineering, vol. 51. Springer: Berlin, Heidelberg, New York, 2005; 165–208.
43. Barron J, Fleet D, Beauchemin S. Performance of optical flow techniques. International Journal of Computer Vision 1994; 12(1):43–77.
44. Heeger D. Model for the extraction of image flow. Journal of the Optical Society of America A: Optics, Image Science, and Vision 1987; 4(8):1455–1471.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:219–247 Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.579
A semi-algebraic approach that enables the design of inter-grid operators to optimize multigrid convergence

Pablo Navarrete Michelini¹,²,∗,† and Edward J. Coyle³

¹Center for Wireless Systems and Applications, Purdue University, 465 Northwestern Ave., West Lafayette, IN 47907-2035, U.S.A.
²Department of Electrical Engineering, Universidad de Chile, Av. Tupper 2007, Santiago, RM 8370451, Chile
³School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Dr. NW, Atlanta, GA 30332-0250, U.S.A.
SUMMARY

We study the effect of inter-grid operators—the interpolation and restriction operators—on the convergence of two-grid algorithms for linear models. We show how a modal analysis of linear systems, along with some assumptions on the normal modes of the system, allows us to understand the role of inter-grid operators in the speed and accuracy of a full-multigrid step. We state an assumption that generalizes local Fourier analysis (LFA) by means of a precise description of aliasing effects on the system. This assumption condenses, in a single algebraic property called the harmonic aliasing property, all the information needed from the geometry of the discretization and the structure of the system's eigenvectors. We first state a harmonic aliasing property based on the standard coarsening strategies of 1D problems. Then, we extend this property to a more aggressive coarsening typically used in 2D problems with the help of additional assumptions on the structure of the system matrix. Under our general assumptions, we determine the exact rates at which groups of modal components of the error evolve and interact. With this knowledge, we are then able to design inter-grid operators that optimize the two-grid algorithm convergence. By different choices of operators, we verify the classic heuristics based on Fourier harmonic analysis, show a trade-off between the rate of convergence and the number of computations required per iteration, and show how our analysis differs from LFA. Copyright © 2008 John Wiley & Sons, Ltd.

Received 15 May 2007; Revised 9 November 2007; Accepted 14 December 2007
KEY WORDS: multigrid algorithms; inter-grid operators; convergence analysis; modal analysis; aliasing
∗Correspondence to: Pablo Navarrete Michelini, Departamento de Ingeniería Eléctrica, Universidad de Chile, Av. Tupper 2007, Santiago, RM 8370451, Chile.
†E-mail: [email protected]
1. INTRODUCTION

We are interested in applications of the multigrid algorithm in the distributed sensing and processing tasks that arise in the design of wireless sensor networks. In such scenarios, the inexpensive, low-power, low-complexity sensor motes that are the nodes of the network must perform all computation and communication tasks. This is very different from the scenarios encountered in the implementation of multigrid algorithms on large parallel machines, for the following reasons:

• Sensor motes are battery powered and must operate unattended for long periods of time. The design of algorithms that run on them must therefore attempt to minimize the number of computations each node must perform and the number of times it must communicate, because both functions consume energy. Of the two functions, communication is the most energy intensive per bit of data.
• Communication between sensor motes is carried out in hop-by-hop fashion, since the energy required to send data over a distance d is proportional to d^α with 2 ≤ α ≤ 4. Thus, the sensor motes communicate directly only with their nearest neighbors in any direction.
• Re-executing an algorithm after adjusting parameters or models is very difficult or might not even be possible because of the remote deployment of the network. It is thus critical that the algorithms used to perform various tasks be as robust and well understood as possible before they are deployed.

In implementations of multigrid algorithms on networks like these, as in many other applications of multigrid algorithms, it is thus essential that the convergence rate of the algorithm be optimized. This minimizes the number of communication and computation steps of the algorithm. It also leads to interesting insights in the design of each step, highlighting both trade-offs between the different costs of computations within each node and communications between nodes, and the need for low complexity in each step of the algorithm. Finally, in such applications the multigrid methods must be very robust in order to ensure the continuous operation of the whole system. This task is difficult because it is likely that the system model varies throughout the field. The current theory of algebraic multigrid (AMG) offers one possible solution to this problem [1–4]. Unfortunately, the convergence results obtained so far in the theory of AMG are not as strong as the theory for linear operators with constant stencil coefficients [5]. As optimal convergence behavior is critical under our particular distributed scenario, we seek a more flexible yet still rigorous convergence analysis.

The goal of this paper is thus to introduce a new convergence analysis based on a modal decomposition of the system and a precise description of aliasing phenomena on coarse systems. The purpose of this analysis is to provide tools that enable the design of coarsening strategies as well as inter-grid and smoothing operators. We try to stay close to the technique of local Fourier analysis (LFA)‡ introduced by Achi Brandt [5, 6], as it is a powerful technique for quantitative convergence analysis. The essential difference between LFA and our approach is that we drop the requirement of constant stencil coefficients. By doing so, the eigenvectors of a linear operator will no longer be the so-called harmonic grid functions used in LFA [7], which in this paper we call Fourier harmonic modes.

‡Originally called local mode analysis (LMA); we chose the nomenclature used in [7] as it emphasizes the essential difference with the approach introduced in this paper.
The properties of the system must thus be constrained in some way in order to develop new tools for convergence analysis. The requirement we focus on is an explicit description of the aliasing effects produced by the coarsening strategy. The aliasing of Fourier harmonic modes is present in LFA through the concept of spaces of harmonics [7]. We identify its simple form as one of the reasons why LFA is so powerful. Based on this fact, we assume a more general aliasing pattern that still allows us to characterize convergence behavior. This assumption condenses, in a single algebraic property called the harmonic aliasing property, all the information needed from the geometry of the discretization and the structure of the eigenvectors. If this property is satisfied, then no more information is needed from the system and the analysis is completely algebraic. Therefore, our analysis could be considered a semi-algebraic approach to the study of convergence issues and the design of efficient inter-grid operators.

One of the practical advantages of our approach is that we are able to separate the problem of coarsening from what we call filtering, i.e. interpolation/restriction weights and smoothing operations. The analysis of each problem makes no use of heuristics. The coarsening strategy is designed to ensure a convenient aliasing pattern, whereas the design of the filters is meant to optimize multigrid convergence. The main difficulty of our approach is the dependence of the assumptions on the eigenvectors of the system. In practical applications, it is very unlikely that this information is available. Therefore, the verification of the assumptions remains unsolved. Nevertheless, this problem is also shared by many fields in which transient or local phenomena do not allow a proper use of Fourier analysis [8]. There have been many efforts to identify suitable bases for specific problems, and the goal of this work is to open this problem in multigrid analysis. For these reasons, the results of this paper are not entirely conclusive about optimization strategies for coarsening and filtering. They are, however, an important first step toward this goal.

In Section 2 we provide the notation and the essential properties of the multigrid algorithm for further analysis. In Section 3 we list the assumptions needed on the algorithm and system in order to apply our analysis. In Section 4 we list the additional assumptions needed on 2D systems in order to extend our analysis. In Section 5 we derive the main results about the influence of inter-grid operators on multigrid convergence and verify the classic heuristics of Fourier harmonic analysis. In Section 6 we provide examples that show how to use our analysis and also how our analysis differs from the classical LFA.
2. THE ELEMENTS OF MULTIGRID ALGORITHMS

We wish to solve discrete linear systems of the form Au = f, defined on a grid Ω_h with step size h ∈ ℝ⁺, defined as the largest distance between neighboring grid nodes. A coarse grid Ω_s is defined as a set of nodes such that Ω_s ⊂ Ω_h and s > h. We define the so-called inter-grid operators, regardless of their use in the multigrid algorithm, as any linear transformation between scalar fields on Ω_h and Ω_s. That is,

I_s^h ∈ ℝ^{|Ω_h|×|Ω_s|}    and    I_h^s ∈ ℝ^{|Ω_s|×|Ω_h|}    (1)
where I_s^h is the interpolation operator and I_h^s is the restriction operator. We introduce a notation with markers 'ˇ' or 'ˆ' to indicate transfers from a finer or coarser grid, respectively. We are
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:219–247 DOI: 10.1002/nla
then interested in the following operations:

x̌ = I_h^s x,    x ∈ ℝ^{|Ω_h|}    (2)

ŷ = I_s^h y,    y ∈ ℝ^{|Ω_s|}    (3)

and

Ǎ = I_h^s A I_s^h,    A ∈ ℝ^{|Ω_h|×|Ω_h|}    (4)
The definition of the coarsening operator in (4) follows the Galerkin condition and is standard in most multigrid applications [9]. We consider a full two-grid approach consisting of a nested iteration step, as shown in Figure 1, and γ iterations of the correction scheme, including ν₁ pre-smoothing and ν₂ post-smoothing iterations, as shown in Figure 2. Here, the vector v_k is the kth approximation of the exact solution of the linear system, u ∈ ℝ^{|Ω_h|}. Similarly, the vector e_k = u − v_k is the approximation error after the kth step of the algorithm. One smoothing iteration is characterized by the smoothing operator S; after each iteration the approximation error evolves as e_{k+1} = S e_k. Because of this property we also call S the smoothing filter. From these diagrams, it follows that the approximation error between smoothing iterations in the correction scheme is given by

e_{ν₁+1} = K e_{ν₁}    (5)

Figure 1. Diagram of a nested iterations step. The dotted line separates problems from the fine and coarse grid domains. The interpolation (restriction) operation is applied to vectors crossing the dotted line from below (above).

Figure 2. Diagram of a correction scheme step using ν₁ pre-smoothing iterations and ν₂ post-smoothing iterations (e.g. Gauss–Seidel, Jacobi, Richardson, etc.). The dotted line separates problems from the fine and coarse grid domains. The interpolation (restriction) operation is applied to vectors crossing the dotted line from below (above).
and similarly, the initial approximation error e₀, using nested iteration, is given by

e₀ = K u    (6)

where u is the exact solution of the linear system and K is the so-called coarse grid correction matrix [10] defined as

K = I − I_s^h Ǎ⁻¹ I_h^s A    (7)

This matrix is the target of our analysis in Section 5, as it controls all of the convergence features of the two-grid scheme. Considering the effect of smoothing iterations, the error in the whole correction scheme evolves as

e_{ν₁+1+ν₂} = S^{ν₂} K S^{ν₁} e₀    (8)
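For concreteness, a small sketch (ours, not the authors' code) that forms K from (7) for explicit matrices and checks that K annihilates the range of the interpolation, a standard consequence of the Galerkin coarse grid correction:

```python
# Sketch: coarse grid correction matrix K = I - P Ac^{-1} R A  (Eq. (7)),
# with Galerkin coarsening Ac = R A P. P and R denote I_s^h and I_h^s.
import numpy as np

def coarse_grid_correction(A, P, R):
    Ac = R @ A @ P                       # Galerkin coarse operator (4)
    return np.eye(A.shape[0]) - P @ np.linalg.solve(Ac, R @ A)

# Toy example: 1D Poisson matrix with linear interpolation, 7 fine -> 3 coarse.
n = 7
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.zeros((n, 3))
for j in range(3):                       # hat functions at fine nodes 1, 3, 5
    P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
R = 0.5 * P.T                            # variational restriction (full weighting)
K = coarse_grid_correction(A, P, R)
print(np.allclose(K @ P, 0))             # K annihilates interpolated vectors: True
```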
In the multiple-grid case, a recursive application of nested iterations and the correction scheme is used to solve the coarse system equations, as shown in Figure 3. Since the coarse systems are not solved with exact accuracy, the approximation error evolves differently. Here, the error depends on the accuracy of the solutions from the coarse grids. Thus, the matrix K used above is replaced by a different matrix, denoted by K₁, which is obtained from the following recursions:

K_L = 0,    A₁ = A,    A_j = Ǎ_{j−1}, with j = 2, …, L−1

K_{j−1} = I − I_j^{j−1} [I − (S_j^{ν₂} K_j S_j^{ν₁})^{γ_j} K_j] (Ǎ_{j−1})⁻¹ I_{j−1}^{j} A_{j−1},    with j = L, …, 2    (9)

where S_j, I_j^{j−1}, and I_{j−1}^{j} are the smoothing, interpolation, and restriction operators chosen at level j, and γ_j is the number of iterations of the correction scheme used at level j. Then, the approximation error evolves as e₀ = K₁ u in nested iterations, and it evolves as e_{ν₁+1} = K₁ e_{ν₁} between smoothing iterations of the correction scheme. Although our analysis is technically applicable to the full multiple-grid case, the coupling between different levels makes the algebra tedious. Therefore, we concentrate on the two-level case, and for the multiple-grid case we assume that the problem on coarse levels has been solved with enough accuracy so that the matrices (S_j^{ν₂} K_j S_j^{ν₁})^{γ_j} K_j can be neglected and we can work under the two-grid assumptions.
Figure 3. Diagram of the recursive full multigrid approach using one iteration of the correction scheme per level. Each box represents a number of pre- or post-smoothing iterations. The particular choice of using the same combination of pre-/post-smoothing iterations on different correction scheme steps is considered.
3. ASSUMPTIONS ABOUT THE ALGORITHM AND THE SYSTEM

Two assumptions are needed in order to derive our convergence results. First, we introduce a decomposition of the inter-grid interpolation/restriction operators into up-/down-sampling and filtering operations, a standard approach in digital signal processing [8, 11]. Second, we assume that the operators and the system possess the same basis of eigenvectors, and we establish a condition on these eigenvectors under (up-/down-)sampling operations. These conditions are motivated by standard Fourier harmonic analysis, but they are not restricted to systems with Fourier harmonic modes as eigenvectors.

3.1. System modes

Assuming that A is a diagonalizable square matrix, we define its eigen-decomposition as

A = W Λ Vᵀ    (10)

Here, the diagonal matrix Λ contains the eigenvalues of A on its diagonal. The columns of the matrix W are the right-eigenvectors of A, i.e. AW = WΛ. The columns of the matrix V contain the left-eigenvectors of A, i.e. VᵀA = ΛVᵀ. The column vectors of W and V form a biorthogonal basis, since it follows from the above definitions that

VᵀW = I    (11)
If A is a symmetric matrix, then V = W and the column vectors of W form an orthogonal basis. It is important to note that from this point on our analysis differs from LFA. In LFA it is assumed that the stencil of A, denoted as the row vector s, does not depend on the position of the grid nodes to which it is applied. When this is true, the operation Ax can be expressed as the convolution

(Ax)_n = Σ_k (s)_k (x)_{n+k}    (12)

where (Ax)_n denotes the nth component of the vector Ax. This implies that the eigenvectors of A are Fourier harmonic modes. In other words, if (w)_k = e^{iθk}, then Aw = s̃(θ)w, where s̃(θ) is the Fourier transform of the stencil sequence. In our analysis, the stencil can depend on the position of the grid nodes to which it is applied. In this case, the operation Ax can be expressed as

(Ax)_n = Σ_k (s_n)_k (x)_{n+k}    (13)

and then the eigenvectors of A need not be Fourier harmonic modes. Later on we will make assumptions about the eigenvectors of A that are related to the coarsening strategy of the multigrid approach. This does, of course, limit the scope of our analytical approach, but it can still be applied to a broader family of operators than LFA. The examples in Sections 6.2 and 6.3 will make this point very clear.
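A quick numeric illustration of (12) (ours, using a periodic 1D Laplacian stencil so that the convolution is exact on a finite grid): Fourier modes are eigenvectors with eigenvalue s̃(θ).

```python
# Sketch: for a constant stencil with periodic wrap-around, the Fourier mode
# w_k = exp(i*theta*k) satisfies A w = s~(theta) w, with s~ the stencil's
# Fourier transform. Here s = (-1, 2, -1), the 1D Laplacian stencil.
import numpy as np

n = 16
A = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
theta = 2 * np.pi * 3 / n                       # an admissible frequency
w = np.exp(1j * theta * np.arange(n))
s_tilde = 2 - 2 * np.cos(theta)                 # Fourier transform of (-1, 2, -1)
print(np.allclose(A @ w, s_tilde * w))          # True
```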
3.2. Smoothing filters

We assume that the smoothing operator S used in the two-grid algorithm, as defined in Section 2, has the same eigenvectors as A. That is,

S = W Σ Vᵀ    (14)

where Σ is a diagonal matrix with the eigenvalues of the matrix S. The diagonal values in Σ represent the factor by which each modal component of the approximation error is multiplied after one smoothing iteration. As in LFA, our analysis is also applicable to smoothers of the form A₊ e_{k+1} = A₋ e_k with A = A₊ − A₋ [7], e.g. Gauss–Seidel with lexicographical ordering for constant stencil operators, assuming that both A₊ and A₋ have the same eigenvectors as A. The smoothing operator is then given by

S = W (Λ₊)⁻¹ Λ₋ Vᵀ    (15)
where Λ₊ and Λ₋ are diagonal matrices with the eigenvalues of A₊ and A₋, respectively.

3.3. Inter-grid filters

In our analysis of multigrid convergence, it is useful to decompose the inter-grid operators defined in Section 2 into two consecutive operations. For two grid levels, with the fine grid Ω_h and the coarse grid Ω_s, we first identify the operation of selecting nodes from the fine grid for the coarse grid. This leads to the following definitions:

Definition 1 (Down-/up-sampling matrices)
The down-sampling matrix D ∈ ℝ^{|Ω_s|×|Ω_h|} is defined as

(D)_{i,j} = 1 if node j ∈ Ω_h is the ith selected node, and 0 otherwise    (16)

The up-sampling matrix U ∈ ℝ^{|Ω_h|×|Ω_s|} is defined as

U = Dᵀ    (17)

A similar definition, for an unselecting operation which will be useful in Section 6, is

Definition 2 (Down-/up-unselecting matrices)
The down-unselecting matrix D̄ is defined as

(D̄)_{i,j} = 1 if node j ∈ Ω_h is the ith unselected node, and 0 otherwise    (18)
The up-unselecting matrix Ū is defined as Ū = D̄ᵀ. An important property that follows from these definitions is

DU = Ĩ    (19)
where Ĩ ∈ ℝ^{|Ω_s|×|Ω_s|} is the identity matrix on the coarse grid. On the other hand, the matrix UD ∈ ℝ^{|Ω_h|×|Ω_h|} is a diagonal matrix with 1 on the diagonal whenever i = j is a selected node, and 0 otherwise. Now, we can decompose the inter-grid operators I_s^h and I_h^s, as defined in Section 2, into the following matrix products:

I_s^h = F_I U,    with F_I ∈ ℝ^{|Ω_h|×|Ω_h|}    and
I_h^s = D F_R,    with F_R ∈ ℝ^{|Ω_h|×|Ω_h|}    (20)
where the square matrices F_I and F_R are called the interpolation and restriction filters, respectively. Although this kind of decomposition is widely used in digital signal processing [8, 11], it has not been used for convergence analysis of multigrid algorithms. In the case that the variational property I_h^s = c(I_s^h)ᵀ is assumed, the inter-grid filters reduce to a single filter F given by

F = F_R = c(F_I)ᵀ    (21)

The inter-grid operator decomposition applies to any kind of inter-grid operators. Now, we restrict our analysis to the set of inter-grid filters that have the same eigenvectors as the system matrix A. That is, we assume inter-grid filters of the form

F_I = W Λ_I Vᵀ    and    F_R = W Λ_R Vᵀ    (22)

where Λ_I and Λ_R are diagonal matrices whose diagonal coefficients represent the damping effect of the filters on the corresponding eigenvector.
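The decomposition (20) is easy to exhibit for a concrete pair of operators. The sketch below is our own toy example (1D linear interpolation on a periodic grid): it builds D, U and a filter F_I such that I_s^h = F_I U, and checks property (19).

```python
# Sketch: decomposition I_s^h = F_I U for 1D linear interpolation on a
# periodic fine grid with n = 2m points (coarse nodes = even fine nodes).
import numpy as np

m, n = 4, 8
D = np.zeros((m, n))
D[np.arange(m), 2 * np.arange(m)] = 1.0       # down-sampling: keep even nodes
U = D.T                                       # up-sampling, Eq. (17)
# Filter with stencil (1/2, 1, 1/2), periodic wrap-around:
FI = np.eye(n) + 0.5 * np.roll(np.eye(n), 1, axis=1) \
               + 0.5 * np.roll(np.eye(n), -1, axis=1)
P = FI @ U                                    # interpolation operator I_s^h
x_coarse = np.arange(m, dtype=float)
print(P @ x_coarse)                           # periodic linear interpolation
print(np.allclose(D @ U, np.eye(m)))          # DU = I~, Eq. (19): True
```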
3.4. The harmonic aliasing property

From its earliest formulation, multigrid heuristics have always been based on Fourier harmonic analysis. The idea of reducing high- and low-frequency components of the approximation error can be found in almost any book or tutorial on the subject. In this paper, we generalize this to a modal analysis where the eigenvectors (or modes) are not necessarily Fourier harmonic modes. We keep the notion of harmonic analysis in a more general way. By harmonic modes we now mean a set of vectors with a certain property that, generally speaking, preserves the notion of self-similarity through the aliasing of different modes after down-sampling. As an example, in Section 6.2 we will mention 'square-wave'-like functions that do not fit within the scope of LFA. We introduce this property because the aliasing effects of Fourier harmonic modes are essential to revealing the role of the smoothing and inter-grid filters in multigrid convergence. Therefore, we need to define this property for our more general modal analysis. Since the application of the following property will be constrained to 1D systems, we start using a subindex x as a label that indicates the dimension where the operations apply. Then, we state the harmonic aliasing property as follows:

Definition 3 (Harmonic aliasing property)
A set of biorthogonal eigenvectors, W_x and V_x, and a down-sampling matrix D_x have the harmonic aliasing property if there exists an ordering of eigenvectors for which

V_xᵀ U_x D_x W_x = N_x    (23)
where U_x = D_xᵀ is the up-sampling matrix and N_x is the harmonic aliasing pattern, which we define to be

N_x = (1/2) ⎛ Ĩ_x  Ĩ_x ⎞
            ⎝ Ĩ_x  Ĩ_x ⎠    (24)

We must note that the harmonic aliasing property only involves the eigenvectors of the system and the down-/up-sampling operator. Although this is a strong assumption on the system, it only involves the down-sampling operator from the multigrid algorithm. It does not depend on the smoothing and inter-grid filters. This is an important consequence of the inter-grid operator decomposition. The definition above implicitly assumes a down-sampling by a factor of 2 and naturally induces a partition of the eigenvectors into two sets, say W_x = [W_Lx  W_Hx] for the right-eigenvectors and V_x = [V_Lx  V_Hx] for the left-eigenvectors. The subscripts Lx and Hx resemble the standard Fourier harmonic analysis used to distinguish between low- and high-frequency modes (see for instance [10]). Using these partitions, we can restate the harmonic aliasing property. For that purpose we state the following definition:

Definition 4 (Surjective property)
A set of biorthogonal eigenvectors, W_x and V_x, and a down-sampling matrix D_x have the surjective property if there exists an ordering of the eigenvectors for which the partitions W_x = [W_Lx  W_Hx] and V_x = [V_Lx  V_Hx] fulfill the following conditions:

D_x W_Lx = D_x W_Hx    (25)

and

D_x V_Lx = D_x V_Hx    (26)
Theorem 1
The surjective property is equivalent to the harmonic aliasing property.

Proof
First, we have to note that, given the partitions W_x = [W_Lx  W_Hx] and V_x = [V_Lx  V_Hx], we can rewrite the harmonic aliasing property as the following set of biorthogonal relationships:

(D_x V_Lx)ᵀ (D_x W_Lx) = (1/2) Ĩ_x    (27)

(D_x V_Lx)ᵀ (D_x W_Hx) = (1/2) Ĩ_x    (28)

(D_x V_Hx)ᵀ (D_x W_Lx) = (1/2) Ĩ_x    (29)

and

(D_x V_Hx)ᵀ (D_x W_Hx) = (1/2) Ĩ_x    (30)
Then, since W_x and V_x form a biorthogonal basis, we have

W_x V_xᵀ = W_Lx V_Lxᵀ + W_Hx V_Hxᵀ = I_x    (31)
By pre-multiplication by $D_x$ and post-multiplication by $U_x$, we obtain
\[ (D_x W_x)(D_x V_x)^T = (D_x W_{Lx})(D_x V_{Lx})^T + (D_x W_{Hx})(D_x V_{Hx})^T = \tilde I_x \tag{32} \]
From here, if we assume the surjective property, then Equation (32) immediately implies the set of biorthogonal relationships above, and the harmonic aliasing property is fulfilled.

Now, we assume that the harmonic aliasing property holds and we pre-multiply Equation (32) by $(D_x V_{Lx})^T$. Using Equations (27) and (28) we obtain
\[ (D_x V_{Lx})^T (D_x W_{Lx})(D_x V_{Lx})^T + (D_x V_{Lx})^T (D_x W_{Hx})(D_x V_{Hx})^T = (D_x V_{Lx})^T \]
\[ \tfrac{1}{2}(D_x V_{Lx})^T + \tfrac{1}{2}(D_x V_{Hx})^T = (D_x V_{Lx})^T \tag{33} \]
which implies $(D_x V_{Lx})^T = (D_x V_{Hx})^T$. Similarly, we post-multiply Equation (32) by $D_x W_{Hx}$. Using Equations (28) and (30), we obtain
\[ (D_x W_{Lx})(D_x V_{Lx})^T (D_x W_{Hx}) + (D_x W_{Hx})(D_x V_{Hx})^T (D_x W_{Hx}) = D_x W_{Hx} \]
\[ \tfrac{1}{2}(D_x W_{Lx}) + \tfrac{1}{2}(D_x W_{Hx}) = D_x W_{Hx} \tag{34} \]
which implies $D_x W_{Lx} = D_x W_{Hx}$. Therefore, the harmonic aliasing property implies the surjective property. □
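Definitions 3 and 4 and Theorem 1 are easy to check numerically. A minimal sketch (in Python with NumPy) does so below for the sine eigenvectors of a 1D Laplacian, the basis used later in Section 6.1, here with N = 8. The down-sampling at the odd-indexed nodes is an assumption made for illustration, since the paper does not write $D_x$ out explicitly; that choice happens to satisfy both properties:

    import numpy as np

    N = 8                                   # fine-grid size; N/2 coarse nodes
    i = np.arange(1, N + 1)
    W = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(i, i) * np.pi / (N + 1))
    # Reverse the high-frequency half so that mode j pairs with mode N+1-j.
    order = np.concatenate([np.arange(N // 2), np.arange(N - 1, N // 2 - 1, -1)])
    W = W[:, order]
    V = W                                   # orthonormal basis, so V = W

    # Down-sampling at the odd-indexed nodes 1, 3, 5, 7 (illustrative choice).
    D = np.zeros((N // 2, N))
    D[np.arange(N // 2), 2 * np.arange(N // 2)] = 1.0
    U = D.T                                 # up-sampling, U = D^T

    WL, WH = W[:, :N // 2], W[:, N // 2:]
    print(np.allclose(D @ WL, D @ WH))      # surjective property (25)-(26): True

    # Harmonic aliasing property (23): V^T U D W equals the pattern N_x of (24).
    Ic = np.eye(N // 2)
    Nx = 0.5 * np.block([[Ic, Ic], [Ic, Ic]])
    print(np.allclose(V.T @ U @ D @ W, Nx)) # True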
4. ASSUMPTIONS FOR SEPARABLE BASIS SYSTEMS

In Section 3 we stated assumptions that allow us to understand the role of the smoothing and inter-grid filters in multigrid convergence. Those assumptions, however, do not cover many multigrid applications. Specifically, when using the multigrid algorithm on d-dimensional problems, the down-sampling is often designed to reduce the number of grid nodes by a factor of $2^d$, whereas the harmonic aliasing property, as stated in Section 3.4, is essentially applicable only to grids down-sampled by a factor of 2. Down-sampling by a factor of $2^d$ is important to reduce the computational and storage costs of the algorithm. In this section, we assume further properties of the algorithm and system so that our analysis extends to these cases. For these extensions we use the tensor product defined as:

Definition 5 (Kronecker product)
If $A$ is an $m \times n$ matrix and $B$ is a $p \times q$ matrix, then the Kronecker product $A \otimes B$ is the $mp \times nq$ block matrix
\[ A \otimes B = \begin{bmatrix} (A)_{1,1}B & \cdots & (A)_{1,n}B \\ \vdots & \ddots & \vdots \\ (A)_{m,1}B & \cdots & (A)_{m,n}B \end{bmatrix} \tag{35} \]
The most useful properties of Kronecker products for the purpose of our analysis are
\[ (A \otimes B)(C \otimes D) = AC \otimes BD \tag{36} \]
and
\[ (A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \tag{37} \]
For further properties, we refer the reader to [12, 13].

4.1. Separability assumptions

We now assume that we have a system matrix representing a 2D system with coordinates $x$ and $y$. We denote the system matrix as $A_{xy} \in \mathbb{R}^{mn \times mn}$, where the integers $m$ and $n$ represent the discretization sizes of the dimensions corresponding to $x$ and $y$, respectively. We assume that the system matrix can be expressed as a sum of Kronecker products:
\[ A_{xy} = A_{x,1} \otimes A_{y,1} + \cdots + A_{x,r} \otimes A_{y,r} \tag{38} \]
\[ \phantom{A_{xy}} = \sum_{i=1}^{r} A_{x,i} \otimes A_{y,i} \tag{39} \]
where $A_{x,i} \in \mathbb{R}^{m \times m}$ and $A_{y,i} \in \mathbb{R}^{n \times n}$, with $i = 1,\ldots,r$, represent $r$ possible operators acting on the dimensions $x$ and $y$, respectively. We assume that the matrices $A_{x,i}$, $i = 1,\ldots,r$, have the same set of eigenvectors $W_x$ and $V_x$, and the matrices $A_{y,i}$, $i = 1,\ldots,r$, have the same set of eigenvectors $W_y$ and $V_y$, but each matrix can have a different set of eigenvalues. We denote the matrix of eigenvalues as $\Lambda_{x,i}$ for each matrix $A_{x,i}$, and $\Lambda_{y,i}$ for each matrix $A_{y,i}$. Thus, we have the following eigen-decompositions:
\[ A_{x,i} = W_x \Lambda_{x,i} V_x^T, \quad i = 1,\ldots,r \tag{40} \]
and
\[ A_{y,i} = W_y \Lambda_{y,i} V_y^T, \quad i = 1,\ldots,r \tag{41} \]
for which the sets of eigenvectors satisfy the biorthogonal relationships $V_x^T W_x = I_x$ and $V_y^T W_y = I_y$, where $I_x$ is an $m \times m$ identity matrix and $I_y$ is an $n \times n$ identity matrix. It follows from these assumptions that the right-eigenvectors of the system matrix $A_{xy}$, denoted as $W_{xy}$, and its eigenvalues, denoted as $\Lambda_{xy}$, are given by
\[ W_{xy} = W_x \otimes W_y \quad\text{and}\quad \Lambda_{xy} = \sum_{i=1}^{r} \Lambda_{x,i} \otimes \Lambda_{y,i} \tag{42} \]
The left-eigenvectors, denoted as $V_{xy}$, are given by
\[ V_{xy}^T = W_{xy}^{-1} = (W_x \otimes W_y)^{-1} = W_x^{-1} \otimes W_y^{-1} = V_x^T \otimes V_y^T = (V_x \otimes V_y)^T \tag{43} \]
We refer to the assumptions above as the separability assumptions because they allow us to apply the assumptions from Section 3 to separate sets of eigenvectors. This kind of factorization of the system matrix often appears in discretizations of partial differential equations (PDEs) (e.g. in finite-difference discretizations of the Laplacian, divergence and other operators). Thus, the analysis under these extended assumptions is more suitable for applications.
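As a concrete illustration of the separability assumptions, the following sketch builds a matrix of the form (38)-(39) from operator families that share eigenvectors and verifies the eigen-structure (42)-(43) numerically; the sizes and factor matrices are arbitrary choices made for illustration only:

    import numpy as np

    def sine_basis(n):
        """Orthonormal sine vectors: shared eigenvectors of 1D constant stencils."""
        i = np.arange(1, n + 1)
        return np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))

    m, n = 6, 4
    Wx, Wy = sine_basis(m), sine_basis(n)

    # r = 2 factor pairs; all A_{x,i} share Wx and all A_{y,i} share Wy.
    Tx = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    Ty = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    Ax, Ay = [Tx, np.eye(m)], [np.eye(n), Ty]

    Axy = sum(np.kron(ax, ay) for ax, ay in zip(Ax, Ay))   # (38)-(39)

    # (42)-(43): Wx (x) Wy diagonalizes Axy, eigenvalues sum_i Lx_i (x) Ly_i.
    Wxy = np.kron(Wx, Wy)
    Lx = [np.diag(Wx.T @ a @ Wx) for a in Ax]
    Ly = [np.diag(Wy.T @ a @ Wy) for a in Ay]
    Lxy = sum(np.kron(lx, ly) for lx, ly in zip(Lx, Ly))
    print(np.allclose(Wxy.T @ Axy @ Wxy, np.diag(Lxy)))    # True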
4.2. Separable filters The purpose of the assumptions in this section is to apply more aggressive coarsening in the multi-dimensional case. We start from two down-sampling matrices Dx and D y independently designed to down-sample the nodes of the x- and y-dimensions by a factor of 2. Then, we define the down-sampling matrix for the 2D system, denoted as Dx y , as Dx y = Dx ⊗ D y
(44)
In this way the down-sampling matrix Dx y is designed to reduce the total number of nodes by a factor of 4. We use inter-grid filters, denoted by FI,x y and FR,x y , and expressed as FI,x y = FI,x ⊗ FI,y
and
FR,x y = FR,x ⊗ FR,y
(45)
where $F_{I,x}$, $F_{R,x}$ and $F_{I,y}$, $F_{R,y}$ are interpolation and restriction filters with eigenvectors $W_x$ and $W_y$, respectively, and with eigenvalues $\Lambda_{I,x}$, $\Lambda_{R,x}$ and $\Lambda_{I,y}$, $\Lambda_{R,y}$, respectively. Therefore, $F_{I,xy}$ and $F_{R,xy}$ have right-eigenvectors $W_{xy}$, left-eigenvectors $V_{xy}$ and eigenvalues given by
\[ \Lambda_{I,xy} = \Lambda_{I,x} \otimes \Lambda_{I,y} \quad\text{and}\quad \Lambda_{R,xy} = \Lambda_{R,x} \otimes \Lambda_{R,y} \tag{46} \]
We note that, due to the properties of Kronecker products, the decomposition in (20) is valid for both 1D and 2D operators. Similarly, the smoothing operator $S_{xy}$ is designed such that
\[ S_{xy} = S_x \otimes S_y \tag{47} \]
where $S_x$ and $S_y$ are smoothing operators with eigenvectors $W_x$ and $W_y$ and eigenvalues $\Sigma_x$ and $\Sigma_y$, respectively. The eigenvalues of $S_{xy}$ are given by
\[ \Sigma_{xy} = \Sigma_x \otimes \Sigma_y \tag{48} \]
4.3. The separable harmonic aliasing property

Under the separability assumptions stated in the sections above, we assume the harmonic aliasing property on each set $W_x$, $D_x$ and $W_y$, $D_y$. Then, a generalization of the harmonic aliasing property, which we call the separable harmonic aliasing property, follows for the set $W_{xy}$, $D_{xy}$. That is,
\[ V_{xy}^T U_{xy} D_{xy} W_{xy} = (V_x \otimes V_y)^T (D_x \otimes D_y)^T (D_x \otimes D_y)(W_x \otimes W_y) = (V_x^T U_x D_x W_x) \otimes (V_y^T U_y D_y W_y) = N_x \otimes N_y \tag{49} \]
where $N_x$ and $N_y$ are harmonic aliasing patterns as defined in (24).
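A short numeric check of (49), reusing the 1D construction from Section 3.4 (sine bases and odd-node down-sampling, both illustrative assumptions):

    import numpy as np

    def paired_sine_basis(n):
        i = np.arange(1, n + 1)
        W = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(i, i) * np.pi / (n + 1))
        order = np.concatenate([np.arange(n // 2), np.arange(n - 1, n // 2 - 1, -1)])
        return W[:, order]

    def downsample(n):
        """Keep the odd-indexed nodes (the assumed 1D D with the aliasing property)."""
        D = np.zeros((n // 2, n))
        D[np.arange(n // 2), 2 * np.arange(n // 2)] = 1.0
        return D

    def N_pattern(k):
        Ic = np.eye(k // 2)
        return 0.5 * np.block([[Ic, Ic], [Ic, Ic]])

    m = n = 8
    Wx, Wy = paired_sine_basis(m), paired_sine_basis(n)
    Dxy = np.kron(downsample(m), downsample(n))     # (44): coarsening by 4
    Wxy = np.kron(Wx, Wy)

    lhs = Wxy.T @ Dxy.T @ Dxy @ Wxy                 # V^T U D W in 2D, per (49)
    print(np.allclose(lhs, np.kron(N_pattern(m), N_pattern(n))))  # True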
5. ERROR ANALYSIS

In Section 2 the coarse grid correction matrix $K$ was defined as
\[ K = I - I_s^h \check A^{-1} I_h^s A \tag{50} \]
This is the main object of study in this section, as it shows the evolution of the approximation error in both nested iteration and the correction scheme. Namely, the approximation error after a full two-grid step with $\gamma_1$ correction-scheme iterations, each of them with $\nu_1$ pre-smoothing and $\nu_2$ post-smoothing iterations, is given by
\[ e_{(\nu_1+1+\nu_2)\gamma_1} = (S^{\nu_2} K S^{\nu_1})^{\gamma_1} K u \tag{51} \]
In the following sub-sections, we use the assumptions stated in Sections 3 and 4 to see how the eigenvectors of the system are affected by these iterations. Based on the partition of eigenvectors introduced in Section 3, we apply the same principle to create the following partition of eigenvalues:
\[ \Lambda_x = \begin{bmatrix} \Lambda_{Lx} & 0 \\ 0 & \Lambda_{Hx} \end{bmatrix}, \quad \Lambda_{I,x} = \begin{bmatrix} \Lambda_{I,Lx} & 0 \\ 0 & \Lambda_{I,Hx} \end{bmatrix} \quad\text{and}\quad \Lambda_{R,x} = \begin{bmatrix} \Lambda_{R,Lx} & 0 \\ 0 & \Lambda_{R,Hx} \end{bmatrix} \tag{52} \]
Within this section, we adopt the convention of omitting the subscripts $x$, $y$ or $xy$ whenever the analysis leads to the same formulas. For example, the eigen-decomposition $A = W \Lambda V^T$ is valid in both 1D and 2D, because the eigen-decomposition $A_x = W_x \Lambda_x V_x^T$ is assumed in the 1D case and the properties of Kronecker products imply $A_{xy} = W_{xy} \Lambda_{xy} V_{xy}^T$ in the 2D case.

5.1. Galerkin coarsening

From the assumptions in both Sections 3 and 4, the Galerkin condition stated in (4) can be expressed as
\[ \check A^{-1} = \{I_h^s A I_s^h\}^{-1} = \{D F_R A F_I U\}^{-1} = \{(DW)\,\Lambda_R \Lambda \Lambda_I\,(DV)^T\}^{-1} \tag{53} \]
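The error-propagation formula (51) and the Galerkin product in (53) translate directly into code. The sketch below is a generic template with names of our choosing, using dense NumPy operators; it is not tied to any particular filter:

    import numpy as np

    def two_grid_error(A, F_I, F_R, D, S, u, nu1=1, nu2=1, gamma1=1):
        """Return e = (S^nu2 K S^nu1)^gamma1 K u, per (51), with
        K = I - F_I U Acoarse^{-1} D F_R A from (50) and the Galerkin coarse
        matrix Acoarse = D F_R A F_I U from (53), where U = D^T."""
        U = D.T
        Acoarse = D @ F_R @ A @ F_I @ U
        K = np.eye(A.shape[0]) - F_I @ U @ np.linalg.solve(Acoarse, D @ F_R @ A)
        e = K @ u                             # nested-iteration step
        for _ in range(gamma1):               # correction-scheme iterations
            for _ in range(nu1):
                e = S @ e                     # pre-smoothing
            e = K @ e                         # coarse-grid correction
            for _ in range(nu2):
                e = S @ e                     # post-smoothing
        return e

With $\nu_1 = \nu_2 = \gamma_1 = 1$, this returns $(SKS)Ku = (SK)^2 u$, the configuration used in the examples of Section 6.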
From here, we first consider the assumptions in Section 3. Using the partition of eigenvectors induced by the harmonic aliasing property, we define the matrix
\[ \Theta_x = \Lambda_{R,Lx} \Lambda_{Lx} \Lambda_{I,Lx} + \Lambda_{R,Hx} \Lambda_{Hx} \Lambda_{I,Hx} \tag{54} \]
Then, we follow the last step in (53) and obtain
\[ (\check A_x)^{-1} = \{(D_x W_x)\,\Lambda_{R,x} \Lambda_x \Lambda_{I,x}\,(D_x V_x)^T\}^{-1} = \{(D_x W_{Lx})\,\Theta_x\,(D_x V_{Lx})^T\}^{-1} = 4\,(D_x W_{Lx})\,\Theta_x^{-1}\,(D_x V_{Lx})^T \tag{55} \]
where we use, first, the surjective property and, second, the biorthogonal relationships (27)–(30).
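The factor 4 and the eigen-structure in (55) can be verified numerically. The sketch below uses the Section 6.1 system (N = 16, stencil [-1 2 -1]) with the 1-pass LI/FW filter and, as before, assumes down-sampling at the odd-indexed nodes; it confirms that the down-sampled low modes $D_x W_{Lx}$ are eigenvectors of the Galerkin coarse matrix with eigenvalues $\Theta_x/2$:

    import numpy as np

    N = 16
    i = np.arange(1, N + 1)
    W = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(i, i) * np.pi / (N + 1))
    order = np.concatenate([np.arange(N // 2), np.arange(N - 1, N // 2 - 1, -1)])
    W = W[:, order]
    WL = W[:, :N // 2]

    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)      # stencil [-1 2 -1]
    F = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))  # LI/FW, [0.5 1 0.5]
    D = np.zeros((N // 2, N))
    D[np.arange(N // 2), 2 * np.arange(N // 2)] = 1.0

    Acoarse = D @ F @ A @ F @ D.T      # Galerkin coarse matrix; F_I = F_R = F

    # Theta from (54); the variational property gives Lambda_I = Lambda_R.
    lam = np.diag(W.T @ A @ W)         # system eigenvalues, paired ordering
    lamF = np.diag(W.T @ F @ W)        # filter eigenvalues
    h = N // 2
    theta = lamF[:h] ** 2 * lam[:h] + lamF[h:] ** 2 * lam[h:]

    # Per (55), D W_L are eigenvectors of Acoarse with eigenvalues theta / 2.
    print(np.allclose(Acoarse @ (D @ WL), (D @ WL) * (theta / 2)))  # True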
Now we consider the assumptions in Section 4. Similarly, for this case we define the matrices
\[ \Theta_{x,i} = \Lambda_{R,Lx} \Lambda_{Lx,i} \Lambda_{I,Lx} + \Lambda_{R,Hx} \Lambda_{Hx,i} \Lambda_{I,Hx} \tag{56} \]
\[ \Theta_{y,i} = \Lambda_{R,Ly} \Lambda_{Ly,i} \Lambda_{I,Ly} + \Lambda_{R,Hy} \Lambda_{Hy,i} \Lambda_{I,Hy} \tag{57} \]
and, based on these definitions,
\[ \Theta_{xy} = \sum_{i=1}^{r} \Theta_{x,i} \otimes \Theta_{y,i} \tag{58} \]
Then, we follow the last step in (53) to obtain
\[ (\check A_{xy})^{-1} = \{(D_{xy} W_{xy})\,\Lambda_{R,xy} \Lambda_{xy} \Lambda_{I,xy}\,(D_{xy} V_{xy})^T\}^{-1} = \{(D_x W_{Lx} \otimes D_y W_{Ly})\,\Theta_{xy}\,(D_x V_{Lx} \otimes D_y V_{Ly})^T\}^{-1} = 16\,(D_x W_{Lx} \otimes D_y W_{Ly})\,\Theta_{xy}^{-1}\,(D_x V_{Lx} \otimes D_y V_{Ly})^T = 16\,(D_{xy} W_{Lxy})\,\Theta_{xy}^{-1}\,(D_{xy} V_{Lxy})^T \tag{59} \]
where we use, first, the surjective property and, second, the biorthogonal relationships (27)–(30), and finally, we simply define $W_{Lxy} = W_{Lx} \otimes W_{Ly}$ and $V_{Lxy} = V_{Lx} \otimes V_{Ly}$. We note that in both (55) and (59) the Galerkin coarse matrix $\check A$ has an eigen-decomposition with eigenvectors given by the down-sampled eigenvectors of $A$. This is a nice property, as it ensures that the assumptions stated for the system on the fine grid are satisfied on coarser grids as well.

5.2. Convergence rates

Using the assumptions in Sections 3 and 4 and the results from Section 5.1, we can express the coarse grid correction matrix as follows:
\[ K = I - I_s^h \check A^{-1} I_h^s A = I - F_I U \check A^{-1} D F_R W \Lambda V^T = I - W \Lambda_I V^T U \check A^{-1} D W \Lambda_R \Lambda V^T = I - (2^{2d})\,W \Lambda_I\,(V^T U D W_L)\,\Theta^{-1}\,(V_L^T U D W)\,\Lambda_R \Lambda V^T \tag{60} \]
where $d$ represents the dimension of the problem. In parentheses we see how the harmonic aliasing property appears naturally in this matrix. For the assumptions from Section 3, we follow the algebra to obtain
\[ K_x = I_x - 4\,W_x \Lambda_{I,x} (V_x^T U_x D_x W_{Lx})\,\Theta_x^{-1}\,(V_{Lx}^T U_x D_x W_x)\,\Lambda_{R,x} \Lambda_x V_x^T \]
\[ \phantom{K_x} = W_x V_x^T - 4\,W_x \Lambda_{I,x} \left( \frac{1}{2} \begin{bmatrix} \tilde I_x \\ \tilde I_x \end{bmatrix} \right) \Theta_x^{-1} \left( \frac{1}{2} \begin{bmatrix} \tilde I_x & \tilde I_x \end{bmatrix} \right) \Lambda_{R,x} \Lambda_x V_x^T \]
\[ \phantom{K_x} = W_x \begin{bmatrix} \tilde I_x - \Lambda_{I,Lx} \Theta_x^{-1} \Lambda_{R,Lx} \Lambda_{Lx} & -\Lambda_{I,Lx} \Theta_x^{-1} \Lambda_{R,Hx} \Lambda_{Hx} \\ -\Lambda_{I,Hx} \Theta_x^{-1} \Lambda_{R,Lx} \Lambda_{Lx} & \tilde I_x - \Lambda_{I,Hx} \Theta_x^{-1} \Lambda_{R,Hx} \Lambda_{Hx} \end{bmatrix} V_x^T \tag{61} \]
Note that matrix $K$ is not diagonalized by the eigenvectors of the system. Instead, we obtain a $2 \times 2$ block matrix with diagonal blocks, which shows how each group of modes from $W_{Lx}$ and $W_{Hx}$ is damped and mixed. In order to simplify this result, we define the convergence operator $\Gamma_x$ such that
\[ K_x = W_x \Gamma_x V_x^T = W_x \begin{bmatrix} \Gamma_{Lx \to Lx} & \Gamma_{Hx \to Lx} \\ \Gamma_{Lx \to Hx} & \Gamma_{Hx \to Hx} \end{bmatrix} V_x^T \tag{62} \]
Each one of the four submatrices in $\Gamma_x$ is diagonal, and we call them the modal convergence operators. Their diagonal values represent the factor by which each modal component of the error is multiplied and transferred between $Lx$ and $Hx$ modes according to the subscripts. These diagonal values can be simplified as follows:
\[ (\Gamma_{Lx \to Lx})_{i,i} = \frac{1}{1+a_i b_i}, \quad (\Gamma_{Hx \to Lx})_{i,i} = \frac{-b_i}{1+a_i b_i}, \quad (\Gamma_{Lx \to Hx})_{i,i} = \frac{-a_i}{1+a_i b_i} \quad\text{and}\quad (\Gamma_{Hx \to Hx})_{i,i} = \frac{a_i b_i}{1+a_i b_i} \tag{63} \]
where
\[ a_i = \frac{(\Lambda_{R,Lx})_{i,i}\,(\Lambda_{Lx})_{i,i}}{(\Lambda_{R,Hx})_{i,i}\,(\Lambda_{Hx})_{i,i}} \quad\text{and}\quad b_i = \frac{(\Lambda_{I,Lx})_{i,i}}{(\Lambda_{I,Hx})_{i,i}} \tag{64} \]
The convergence of a two-grid algorithm depends on the smoother $S_x$ and the coarse grid correction matrix $K_x$, which in the domain of the system's eigenvectors are contained in the matrices $\Sigma_x$ and $\Gamma_x$, respectively. The matrix $\Gamma_x$ and its four modal convergence operators allow us to focus on the performance of the inter-grid operators; therefore, this is the main object of study for the design of inter-grid filters. In Section 6 we will show examples of how to apply this analysis.

From the assumptions in Section 4, we follow a different algebra. This is
\[ K_{xy} = I_{xy} - 16\,W_{xy} \Lambda_{I,xy} (V_{xy}^T U_{xy} D_{xy} W_{Lxy})\,\Theta_{xy}^{-1}\,(V_{Lxy}^T U_{xy} D_{xy} W_{xy})\,\Lambda_{R,xy} \Lambda_{xy} V_{xy}^T \]
\[ \phantom{K_{xy}} = I_{xy} - 16\,W_{xy} \Lambda_{I,xy} \left[ \frac{1}{2} \begin{bmatrix} \tilde I_x \\ \tilde I_x \end{bmatrix} \otimes \frac{1}{2} \begin{bmatrix} \tilde I_y \\ \tilde I_y \end{bmatrix} \right] \Theta_{xy}^{-1} \left[ \frac{1}{2} \begin{bmatrix} \tilde I_x & \tilde I_x \end{bmatrix} \otimes \frac{1}{2} \begin{bmatrix} \tilde I_y & \tilde I_y \end{bmatrix} \right] \Lambda_{R,xy} \Lambda_{xy} V_{xy}^T \]
\[ \phantom{K_{xy}} = I_{xy} - W_{xy} \left( \begin{bmatrix} \Lambda_{I,Lx} \\ \Lambda_{I,Hx} \end{bmatrix} \otimes \begin{bmatrix} \Lambda_{I,Ly} \\ \Lambda_{I,Hy} \end{bmatrix} \right) \Theta_{xy}^{-1} \left( \sum_{i=1}^{r} \begin{bmatrix} \Lambda_{R,Lx} \Lambda_{Lx,i} \\ \Lambda_{R,Hx} \Lambda_{Hx,i} \end{bmatrix}^T \otimes \begin{bmatrix} \Lambda_{R,Ly} \Lambda_{Ly,i} \\ \Lambda_{R,Hy} \Lambda_{Hy,i} \end{bmatrix}^T \right) V_{xy}^T = W_{xy} \Gamma_{xy} V_{xy}^T \tag{65} \]
Here, a simple structure for the convergence operator $\Gamma_{xy}$ does not appear clearly, because of the Kronecker products involved. Since the matrix $\Theta_{xy}^{-1}$ cannot in general be factored as a Kronecker
product, we cannot analyze the convergence of the algorithm for each dimension independently of the other. We then need to consider the four possible combinations of $x$-, $y$-dimensions and $L$, $H$ groups. The products for these combinations are mixed in $\Gamma_{xy}$, and we need to reorder them to identify the modal convergence operators. Thus, we introduce a permutation matrix $P \in \{0,1\}^{mn \times mn}$ such that for arbitrary matrices $X_L, X_H \in \mathbb{R}^{m/2 \times m/2}$ and $Y_L, Y_H \in \mathbb{R}^{n/2 \times n/2}$ one has
\[ P\left( \begin{bmatrix} X_L \\ X_H \end{bmatrix} \otimes \begin{bmatrix} Y_L \\ Y_H \end{bmatrix} \right) = \begin{bmatrix} X_L \otimes Y_L \\ X_H \otimes Y_L \\ X_L \otimes Y_H \\ X_H \otimes Y_H \end{bmatrix} \tag{66} \]
Then, applying this permutation to reorder the rows and columns of $\Gamma_{xy}$, we obtain the following structure:
\[ P \Gamma_{xy} P^T = \begin{bmatrix}
\Gamma_{LxLy \to LxLy} & \Gamma_{HxLy \to LxLy} & \Gamma_{LxHy \to LxLy} & \Gamma_{HxHy \to LxLy} \\
\Gamma_{LxLy \to HxLy} & \Gamma_{HxLy \to HxLy} & \Gamma_{LxHy \to HxLy} & \Gamma_{HxHy \to HxLy} \\
\Gamma_{LxLy \to LxHy} & \Gamma_{HxLy \to LxHy} & \Gamma_{LxHy \to LxHy} & \Gamma_{HxHy \to LxHy} \\
\Gamma_{LxLy \to HxHy} & \Gamma_{HxLy \to HxHy} & \Gamma_{LxHy \to HxHy} & \Gamma_{HxHy \to HxHy}
\end{bmatrix} \tag{67} \]
where we identify the modal convergence operators representing the 16 possible ways to transfer modal components of the error between the four combinations of $x$-, $y$-dimensions and $L$, $H$ groups according to the subscripts. The values of each one of these groups can be expressed in a generic form as
\[ \Gamma_{AxBy \to CxDy} = \delta_{AC}\delta_{BD} - (\Lambda_{I,Cx} \otimes \Lambda_{I,Dy})\,\Theta_{xy}^{-1} \sum_{i=1}^{r} (\Lambda_{R,Ax} \Lambda_{Ax,i}) \otimes (\Lambda_{R,By} \Lambda_{By,i}) \tag{68} \]
where $A, B, C, D \in \{H, L\}$ and $\delta_{AC}\delta_{BD}$ denotes an identity matrix if $A = C$ and $B = D$, and a zero matrix otherwise. The convergence operator $\Gamma_{xy}$ and its 16 modal convergence operators allow us to focus on the performance of the inter-grid operators, and it is again the main object of study for the design of inter-grid filters. Compared with the 1D case, the analysis is now more complicated, as the modal components of the error are transferred not only between two groups of modes but also between different dimensions. In Section 6.3 we will show an example of how to design inter-grid filters under this scenario.

5.3. The heuristics in error analysis

We consider an ideal scenario for a 1D problem in order to check the heuristic behavior of the multigrid algorithm. By using the variational property, we define the single inter-grid filter $F_{\mathrm{sharp},x}$ such that its eigenvalue matrix satisfies
\[ \Lambda_{Lx} = \tilde I \quad\text{and}\quad \Lambda_{Hx} = 0 \tag{69} \]
We call this filter the sharp inter-grid filter. In Fourier harmonic analysis, it would correspond to what is called a 'perfect low-pass filter' [11]. Our definition is more general, as we can now apply it to any basis with the harmonic aliasing property. By using the eigen-decomposition of $A$ and the sharp inter-grid filter in (63), we obtain
\[ K_{\mathrm{sharp},x} = W_{Hx} W_{Hx}^T \tag{70} \]
Therefore, for this choice of inter-grid operators, we can see that repeated applications of the coarse grid correction matrix do not help to reduce the error further: the correction simply cancels the $W_{Lx}$ components of the error. We then need to apply smoothing iterations in order to reduce the $W_{Hx}$ components of the error. We also verify that the error reduction achieved by multigrid iterations does not depend on the step size $h$, as the iteration matrix does not depend on the eigenvalues of $A$. The simplicity of this result shows the general principles of multigrid algorithm design. In Section 6 we will see how this idealized scenario does not always lead to an optimal algorithm for solving linear systems.
6. EXAMPLES OF INTER-GRID FILTER DESIGN

In Section 5 we obtained theoretical results for the convergence rates based on the assumptions stated in previous sections. In this section, we introduce examples to show how these results can be applied to different kinds of systems. We consider systems based on different sets of eigenvectors: Fourier harmonic modes, Hadamard harmonic modes, and a mixture of Fourier and Hadamard harmonic modes.

6.1. Fourier harmonic analysis: trade-off between computational complexity and convergence rate

We consider a 1D system in which $A$ is a standard finite-difference discretization of a second-order derivative with step size $h = 1$; i.e. the stencil of $A$ is $s = [-1\ \underline{2}\ -1]$ (the underline denotes the diagonal element). We apply Dirichlet boundary conditions, i.e. stencil $[\underline{2}\ -1]$ at the left corner and $[-1\ \underline{2}]$ at the right corner, which lead to an invertible system. The number of nodes in the discretization is set to $N = 16$, and we consider a two-grid algorithm with a coarse-grid step size of $2h = 2$. In addition, we assume the variational property that leads to a single inter-grid filter $F$. The eigenvectors of $A$ are given by $(W)_{i,j} = \sqrt{2/17}\,\sin(ij\pi/17)$, with $i, j = 1,\ldots,16$. The eigenvector matrix $W$ is orthonormal and, after reversing the order of the columns $j = 9,\ldots,16$, it also fulfills the harmonic aliasing property. Therefore, our modal analysis can be directly applied to this system. On the other hand, the extension of Fourier analysis from complex- to real-valued harmonic functions is well known, and LFA can therefore be applied to this system. Thus, the purpose of this example is to (i) show how our method is applied to a standard system in which the eigenvectors can be labeled by frequencies, thus giving an intuitive picture of what is happening, and (ii) show how to design inter-grid filters within our new framework and thus demonstrate the issue we discover in this process.

For the inter-grid filter, we start with the common choice of linear interpolation and full weighting (LI/FW), and we consider their application on an increasing number of neighbors per node. The standard choice for this system considers two neighbors per node, which leads to an inter-grid filter $F$ with stencil $s = [0.5\ \underline{1}\ 0.5]$ and Dirichlet boundary conditions. Considering more neighbors per node is equivalent to applying the inter-grid filter $F$ several times in interpolation
or restriction operations. Thus, the inter-grid filters $F, F^2, F^3, F^4, \ldots$ represent LI/FW operations over $2, 4, 6, 8, \ldots$ neighbors per node, respectively.

In Table I we show the spectral radii of $\Gamma_{Lx \to Lx}$, $\Gamma_{Hx \to Lx}$, $\Gamma_{Lx \to Hx}$ and $\Gamma_{Hx \to Hx}$ for a two-grid approach using different numbers of LI/FW passes. Here, the most important factor is the spectral radius of $\Gamma_{Lx \to Lx}$: it shows the worst-case reduction of modal components of the error for low-frequency modes that are mapped to themselves. In LFA the spectral radius of $\Gamma_{Lx \to Lx}$ is called the asymptotic convergence factor, $\rho_{\mathrm{loc}}$ [7]. The reduction of these components of the error is the main task of the two-grid approach. We do not see much reduction of the high-frequency components of the error that are mapped to themselves, as the spectral radius of $\Gamma_{Hx \to Hx}$ is always close to 1, leaving this task to the smoothing iterations. The cross-frequency rates $\Gamma_{Hx \to Lx}$ and $\Gamma_{Lx \to Hx}$ represent the aliasing effect, in which high- and low-frequency components of the error are reduced and mapped to low- and high-frequency components of the error, respectively.

The spectral radius of $\Gamma_{Hx \to Lx}$ in Table I appears to be close to 1, which means an almost complete transfer of high- to low-frequency components of the error at each iteration. A careful look at the convergence rates shows that this large number comes from the transfer of the highest-frequency error to the lowest-frequency error. Although this transfer is not ideal, it is not critical, because the pre-smoothing iterations will reduce the highest-frequency error very effectively. As expected, all the convergence rates in Table I are further reduced as we increase the number of LI/FW passes. The disadvantage of increasing the number of passes is that the inter-grid filter, as well as the coarse system matrix, becomes less and less sparse (see Figure 4(a)-(d)), thus increasing the computational complexity of the algorithm.

To complete the convergence analysis, we need to consider a smoothing filter and select the number of smoothing iterations. A simple choice is the Richardson iteration scheme, which leads to a smoothing filter $S = I - (1/\omega)A$, with $\omega = 4$ obtained from the Gershgorin bound of $A$. This filter satisfies our assumptions because it has the same eigenvectors as $A$. Since the task of the smoothing filter is to reduce the high-frequency components of the error, we suggest choosing the number of smoothing iterations such that the reduction of the high-frequency components of the error, given by $\Sigma_{Hx}$, is equal to or less than the reduction of low-frequency components of the error achieved by the coarse grid correction matrix, given by $\Gamma_{Lx \to Lx}$. For this example, using a 1-pass LI/FW inter-grid filter we achieve the same reduction of low-frequency error as the reduction of high-frequency error achieved by one Richardson iteration. For instance, using one pre-smoothing ($\nu_1 = 1$) and one post-smoothing ($\nu_2 = 1$) Richardson iteration in the correction scheme, the approximation error after one full two-grid step ($\gamma_1 = 1$) is given by $e_3 = (SK)^2 u$, with a convergence rate of $\rho((SK)^2) = 0.2458$.
Table I. Spectral radii of modal convergence operators for the system in Section 6.1.

    Filter            ρ(Γ_{Lx→Lx})   ρ(Γ_{Hx→Lx})   ρ(Γ_{Lx→Hx})   ρ(Γ_{Hx→Hx})
    LI/FW 1-pass      0.4539         0.9915         0.4539         0.9915
    LI/FW 2-passes    0.3647         0.5280         0.4388         1.0000
    LI/FW 3-passes    0.2839         0.4946         0.4110         1.0000
    LI/FW 4-passes    0.2149         0.4506         0.3745         1.0000
    LI/FW 5-passes    0.1590         0.4011         0.3334         1.0000
    LI/FW 6-passes    0.1155         0.3506         0.2914         1.0000

The results consider a two-grid approach using several passes of LI/FW as inter-grid operators.
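Under the same illustrative assumptions as in the earlier sketches (odd-node down-sampling, Dirichlet filter boundaries), the 1-pass row of Table I and the rate ρ((SK)²) = 0.2458 quoted above can be reproduced from (63)-(64) with a few lines of NumPy:

    import numpy as np

    N = 16
    h = N // 2
    i = np.arange(1, N + 1)
    W = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(i, i) * np.pi / (N + 1))
    order = np.concatenate([np.arange(h), np.arange(N - 1, h - 1, -1)])
    W = W[:, order]

    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)      # Section 6.1 system
    F = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))  # 1-pass LI/FW
    D = np.zeros((h, N))
    D[np.arange(h), 2 * np.arange(h)] = 1.0

    # Modal factors from (63)-(64); the variational property gives Lambda_I = Lambda_R.
    lam, lamF = np.diag(W.T @ A @ W), np.diag(W.T @ F @ W)
    a = lamF[:h] * lam[:h] / (lamF[h:] * lam[h:])
    b = lamF[:h] / lamF[h:]
    print(np.max(np.abs(1 / (1 + a * b))))        # rho(Gamma_{Lx->Lx}) ~ 0.4539
    print(np.max(np.abs(-b / (1 + a * b))))       # rho(Gamma_{Hx->Lx}) ~ 0.9915
    print(np.max(np.abs(-a / (1 + a * b))))       # rho(Gamma_{Lx->Hx}) ~ 0.4539
    print(np.max(np.abs(a * b / (1 + a * b))))    # rho(Gamma_{Hx->Hx}) ~ 0.9915

    # Full two-grid rate with one pre- and one post-smoothing Richardson step.
    S = np.eye(N) - A / 4.0                       # omega = 4, Gershgorin bound
    Acoarse = D @ F @ A @ F @ D.T
    K = np.eye(N) - F @ D.T @ np.linalg.solve(Acoarse, D @ F @ A)
    SK2 = S @ K @ S @ K
    print(np.max(np.abs(np.linalg.eigvals(SK2)))) # rho((SK)^2) ~ 0.2458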
As a different choice of inter-grid operators, we try to approach the sharp inter-grid filter with a common procedure used in signal processing. We select the eigenvalues of $F$ in analogy with a Butterworth filter of order $n$ [11]. We start at order $n = 1$ with a cut-off frequency of $\pi/16$ that tries to reduce all frequencies except for the lowest-frequency mode, and as we increase the order $n$ the cut-off frequency approaches $\pi/2$ geometrically, at which point the filter becomes perfectly sharp. That is,
\[ B_n(i) = \frac{1}{1 + \left( \dfrac{2(i-1)}{(1-(7/8)^n)(N-1)} \right)^{2n}}, \quad i = 1,\ldots,16 \tag{71} \]
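For reference, a sketch evaluating (71) and the resulting modal factors via (63)-(64); the 2(i-1) scaling reflects our reading of the cut-off convention above, and under that reading the n = 1 factors match the B1 row of Table II below:

    import numpy as np

    N = 16
    h = N // 2
    idx = np.arange(1, N + 1)
    W = np.sqrt(2.0 / (N + 1)) * np.sin(np.outer(idx, idx) * np.pi / (N + 1))
    order = np.concatenate([np.arange(h), np.arange(N - 1, h - 1, -1)])
    W = W[:, order]

    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    lam = np.diag(W.T @ A @ W)

    def butterworth_eigs(n):
        """Eigenvalues B_n(i) of (71), reordered to match the paired basis."""
        Bn = 1.0 / (1.0 + (2.0 * (idx - 1) / ((1 - (7 / 8) ** n) * (N - 1))) ** (2 * n))
        return Bn[order]

    lamF = butterworth_eigs(1)               # order n = 1
    F1 = W @ np.diag(lamF) @ W.T             # the (non-sparse) inter-grid filter
    a = lamF[:h] * lam[:h] / (lamF[h:] * lam[h:])
    b = lamF[:h] / lamF[h:]
    print(np.max(np.abs(1 / (1 + a * b))))   # rho(Gamma_{Lx->Lx}) ~ 0.4156 for B1
    print(np.max(np.abs(-b / (1 + a * b))))  # rho(Gamma_{Hx->Lx}) ~ 0.5826 for B1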
From $B_n$ we construct the inter-grid filter as $F = W \Lambda W^T$ with $\Lambda = \mathrm{diag}(B_n)$. The main reason to move the cut-off frequency with the order of the filter is to prevent the eigenvalues in $\Lambda_{Hx}$ from producing large cross-frequency convergence rates. In Table II we show the spectral radii of $\Gamma_{Lx \to Lx}$, $\Gamma_{Hx \to Lx}$, $\Gamma_{Lx \to Hx}$ and $\Gamma_{Hx \to Hx}$ for a two-grid approach using Butterworth filters of different orders. The Butterworth filter is better than LI/FW, especially in terms of the cross-frequency convergence rate $\Gamma_{Hx \to Lx}$. The main disadvantage of the Butterworth filter is that it is always non-sparse, as shown in Figure 4(e)-(h). Even though increasing the order $n$ makes the filter appear more and more sparse, the overall contribution of the small entries remains comparable to that of the largest entries. Moreover, increasing the order $n$ concentrates the largest entries close to the diagonal, and the tridiagonal elements become similar to the LI/FW entries. This hints at the optimality of LI/FW as a tridiagonal inter-grid filter for this specific problem.

Table II. Spectral radii of modal convergence operators for the system in Section 6.1.

    Filter   ρ(Γ_{Lx→Lx})   ρ(Γ_{Hx→Lx})   ρ(Γ_{Lx→Hx})   ρ(Γ_{Hx→Hx})
    B1       0.4156         0.5826         0.4493         0.9982
    B2       0.2932         0.4994         0.4150         1.0000
    B3       0.1954         0.4350         0.3615         1.0000
    B4       0.1246         0.3623         0.3011         1.0000
    B5       0.0770         0.2925         0.2431         1.0000
    B6       0.0467         0.2314         0.1923         1.0000
    B7       0.0279         0.1807         0.1502         1.0000

The results consider a two-grid approach using Butterworth filters of different orders as the inter-grid filter.

An important conclusion of these tests is that, in the design of inter-grid filters for systems with Fourier harmonic modes as eigenvectors, we face a trade-off between the number of multigrid steps that can be saved by moving toward a sharp inter-grid filter and the number of communications between neighboring nodes required for interpolation/restriction tasks. This is a consequence of the Gibbs phenomenon, which is well known in Fourier analysis [11].

Figure 4. Images of the magnitude of entries for different inter-grid filter matrices. The intensity of gray is white for the largest magnitude and black for the smallest magnitude. The scale between black and white is logarithmic in order to increase the visual difference between small and zero entries: (a) LI/FW 1-pass; (b) LI/FW 3-passes; (c) LI/FW 5-passes; (d) LI/FW 7-passes; (e) B1; (f) B3; (g) B5; and (h) B7.

6.2. Hadamard harmonic analysis: optimality of the sharp inter-grid filter

Now, we consider a system arising in an application of Markov chains. The system has a variable size with $2^{l-1}$, $l \in \mathbb{N}^+$, transient states and at least one recurrent state. We ignore the precise number of recurrent states and their interconnections, as they will not play any role in the solution of the problem.
Thus, the structure of the system is given by the transition probability matrix within the transient states, which is obtained by the following recursion:
\[ T_1 = \tfrac{1}{2} \tag{72} \]
\[ T_l = \begin{bmatrix} T_{l-1} & 2^{-l}\,\tilde I_c \\ 2^{-l}\,\tilde I_c & T_{l-1} \end{bmatrix} \quad\text{for } l > 1 \tag{73} \]
where $\tilde I_c$ is a counter-diagonal matrix of the same size as $T_{l-1}$. The recursion (73) creates a matrix $T_l \in (\mathbb{R}^+)^{2^{l-1} \times 2^{l-1}}$ that is sub-stochastic, since the sum of the entries in each row is always less than or equal to 1. In fact, the sum of the entries in each row of $T_l$ is equal to $1 - 1/2^l$. Thus, in this Markov chain, each transient state has a probability of $1/2^l$ of jumping to one or more recurrent states in one step. An example of this structure is shown in Figure 5, where we can see the state transition diagram of the transient states for $l = 4$. Since, by definition, no recurrent state is connected to any transient state, once the process jumps from a transient to a recurrent state it will never return to any transient state, and it is said to have been absorbed. Starting from a given transient state $i$, $1 \le i \le 2^{l-1}$, the number of jumps within the transient states before jumping to a recurrent state is called the absorbing time, $t_i$. There are many applications associated with these so-called absorbing chains [14]; for instance, in the study of discrete phase-type distributions in queueing theory [15].

Figure 5. State transition diagram of the transient states for the Markov chain used in Section 6.2 with $l = 4$ ($N = 8$ nodes). Each solid-line connection shows the probability of a state transition. The dashed lines with double arrows show the probability of transition to one or more recurrent states that do not appear in this figure.

Here, we consider the problem of computing the expected value of the absorbing time when we start at node $i$, denoted by $(x_l)_i = E[t_i]$. The vector $x_l \in \mathbb{R}^{2^{l-1}}$ is given by the solution of the linear system
\[ (I - T_l)\,x_l = \mathbf{1} \tag{74} \]
where $(\mathbf{1})_i = 1$ for $i = 1,\ldots,2^{l-1}$. Here, our system matrix is given by $A_l = I - T_l$, which is a non-singular, symmetric, positive-definite M-matrix. Furthermore, the matrix $A_l$ becomes ill-conditioned as we increase $l$, creating a problem similar to that found in the numerical solution of linear PDEs. In the general context of absorbing chains, the matrix $A_l = I - T_l$ is called the fundamental matrix [14]. The inversion of this matrix is important, as it also appears in the computation of moments of discrete phase-type distributions and the probability of absorption by recurrent classes, among other problems. In the transition graph of this Markov chain, each node representing a transient state is connected to $l$ neighboring nodes. However, the structure of the connections changes from node to node, so that the stencil of $A_l$ is not constant throughout the rows. For instance, in the Markov chain of Figure 5, the fundamental matrix is
\[ A_4 = \begin{bmatrix}
0.5 & -0.25 & 0 & -0.125 & 0 & 0 & 0 & -0.0625 \\
-0.25 & 0.5 & -0.125 & 0 & 0 & 0 & -0.0625 & 0 \\
0 & -0.125 & 0.5 & -0.25 & 0 & -0.0625 & 0 & 0 \\
-0.125 & 0 & -0.25 & 0.5 & -0.0625 & 0 & 0 & 0 \\
0 & 0 & 0 & -0.0625 & 0.5 & -0.25 & 0 & -0.125 \\
0 & 0 & -0.0625 & 0 & -0.25 & 0.5 & -0.125 & 0 \\
0 & -0.0625 & 0 & 0 & 0 & -0.125 & 0.5 & -0.25 \\
-0.0625 & 0 & 0 & 0 & -0.125 & 0 & -0.25 & 0.5
\end{bmatrix} \tag{75} \]
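The recursion (72)-(73) and the absorbing-time system (74) translate directly into code; a minimal sketch:

    import numpy as np

    def T(l):
        """Transient-state transition matrix from the recursion (72)-(73)."""
        if l == 1:
            return np.array([[0.5]])
        Tp = T(l - 1)
        Ic = np.fliplr(np.eye(Tp.shape[0]))          # counter-diagonal matrix
        return np.block([[Tp, 2.0 ** (-l) * Ic], [2.0 ** (-l) * Ic, Tp]])

    l = 4
    Tl = T(l)                                        # T_4; I - T_4 matches (75)
    print(np.allclose(Tl.sum(axis=1), 1 - 2.0 ** (-l)))   # sub-stochastic rows

    Al = np.eye(2 ** (l - 1)) - Tl                   # fundamental matrix A_l
    xl = np.linalg.solve(Al, np.ones(2 ** (l - 1)))  # expected absorbing times (74)
    print(xl)                                        # (x_l)_i = E[t_i]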
In (75), the stencil at the 3rd row is $s_3 = [-0.125,\ \underline{0.5},\ -0.25,\ 0,\ -0.0625]$ (the underline denotes the diagonal element), whereas the stencil at the 4th row is $s_4 = [-0.125,\ 0,\ -0.25,\ \underline{0.5},\ -0.0625]$. Therefore, the assumptions of LFA are not fulfilled, and its analysis does not apply to this system. Nevertheless, in the tests that follow we will ignore this fact, as we wish to see what convergence rates LFA predicts for a system where its assumptions do not hold. In fact, the eigenvectors of the fundamental matrix $A_l$ do not correspond to the Fourier harmonic modes of LFA but instead form a Hadamard matrix of order $N = 2^{l-1}$. One of the standard ways to construct this matrix is Sylvester's construction [16], but the basis obtained by this procedure does not fulfill the harmonic aliasing property. As in the previous example, we need to reorder the columns of the eigenvector matrix in order to obtain the right structure. Therefore, we introduce a column-reordered variation of Sylvester's construction as follows:
\[ W_1 = 1 \tag{76} \]
\[ W_{l+1} = \frac{1}{\sqrt{2}}\,[U\ \bar U] \begin{bmatrix} W_l & W_l \\ W_l & -W_l \end{bmatrix} \tag{77} \]
where $U$ and $\bar U$ correspond to uniform up-sampling and up-unselecting matrices of sizes $2^l \times 2^{l-1}$. The matrix $[U\ \bar U]$ acts as a permutation matrix that reorders the columns of the new basis. From this construction, it can be checked by induction that the matrix $W_l$ is orthonormal and that it fulfills the harmonic aliasing property. The same arguments can be used to check that $W_l$ diagonalizes the system matrix $A_l$. Furthermore, the orthogonality of $W_l$ and Equation (77) allow us to obtain a closed-form expression for the sharp inter-grid filter, as defined in (69). That is,
\[ F_{\mathrm{sharp},l+1} = W_{l+1} \Lambda W_{l+1}^T = \frac{1}{2}(I + U\bar D + \bar U D) = \frac{1}{2} \begin{bmatrix} 1 & 1 & & & \\ 1 & 1 & & & \\ & & \ddots & & \\ & & & 1 & 1 \\ & & & 1 & 1 \end{bmatrix} \tag{78} \]
The structure of the filter turns out to be very sparse, unlike the sharp filter of the previous example: this filter alternately averages the value at each node with that of its left neighbor and then its right neighbor. In our analysis, the inter-grid filter $F_l$ and the smoothing operator $S_l$ should be designed to match the structure of the system. For this reason, our analysis would not work if we used standard inter-grid operators such as LI/FW, because the eigenvectors of the LI/FW filter are Fourier harmonic modes, which differ from the Hadamard harmonic modes. As the sharp inter-grid filter in (78) has a sparse structure, we choose it as the inter-grid filter. As in the previous example, for the smoothing filter we use the Richardson iteration scheme, which leads to a smoothing filter $S_l = I - (1/\omega)A_l$, with $\omega = 1 - 2^{-l}$ obtained from the Gershgorin bound of $A_l$. Since the sharp inter-grid filter removes all the $W_{Lx}$ components of the error, the only parameters to configure are the number of smoothing iterations.
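The construction (76)-(77) and the sharp filter (78) can be checked in a few lines. In the sketch below, the interleaving realizes our reading of the pair $[U\ \bar U]$ (row 2j from the top block, row 2j+1 from the bottom block), chosen so that Definition 4 holds with down-sampling at the even (0-based) rows:

    import numpy as np

    def hadamard_basis(l):
        """Column-reordered Sylvester construction, per (76)-(77)."""
        W = np.array([[1.0]])
        for _ in range(l - 1):
            n = W.shape[0]
            Wn = np.empty((2 * n, 2 * n))
            Wn[0::2] = np.hstack([W, W])     # rows coming from U
            Wn[1::2] = np.hstack([W, -W])    # rows coming from Ubar
            W = Wn / np.sqrt(2.0)
        return W

    def T(l):
        if l == 1:
            return np.array([[0.5]])
        Tp = T(l - 1)
        Ic = np.fliplr(np.eye(Tp.shape[0]))
        return np.block([[Tp, 2.0 ** (-l) * Ic], [2.0 ** (-l) * Ic, Tp]])

    l = 4
    W = hadamard_basis(l)                       # 8 x 8 reordered Hadamard basis
    n = W.shape[0]
    print(np.allclose(W.T @ W, np.eye(n)))      # orthonormal: True

    D = np.zeros((n // 2, n))
    D[np.arange(n // 2), 2 * np.arange(n // 2)] = 1.0
    WL, WH = W[:, :n // 2], W[:, n // 2:]
    print(np.allclose(D @ WL, D @ WH))          # surjective property: True

    M = W.T @ (np.eye(n) - T(l)) @ W
    print(np.allclose(M - np.diag(np.diag(M)), 0))   # W diagonalizes A_l: True

    # Sharp filter per (78): F = W diag(I, 0) W^T, a pairwise averaging matrix.
    F = W @ np.diag(np.r_[np.ones(n // 2), np.zeros(n // 2)]) @ W.T
    print(np.round(F, 3))                       # 0.5 on 2 x 2 diagonal blocks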
This means that we need only one iteration of the full two-grid algorithm, with $O(1)$ smoothing iterations, to make the algorithm converge. On the other hand, a standard choice of LI/FW inter-grid operators does not work better than the sharp inter-grid configuration, as shown in Table III.

Table III. Convergence rates of the full two-grid algorithm for different inter-grid operators and different sizes of the system in Section 6.2.

    N      ρ((SK)²), sharp filter   ρ((SK)²), LI/FW
    2      0.0000                   0.2500
    4      0.0816                   0.2030
    8      0.1600                   0.2700
    16     0.2040                   0.3447
    32     0.2268                   0.3955
    64     0.2383                   0.4428
    128    0.2442                   0.4817
    256    0.2471                   0.5156

The configuration considers one step of the full two-grid algorithm with one pre-smoothing and one post-smoothing Richardson iteration. The results compare the convergence rates obtained by using a sharp inter-grid filter or LI/FW for the inter-grid operators.

As this scenario is rather unusual in the context of PDEs, where the eigenvectors are typically similar to Fourier harmonic modes (which come with the Gibbs phenomenon, as shown in Section 6.1), we would like to understand how the sparse inter-grid filter arranges the information to reach convergence in one step. To understand this, we need to consider three facts. First, the sharp inter-grid filter alternately averages the value at each node with that of its left and then its right neighbor. Second, the coarse grid matrix $\check A_l$ constructed from $A_l$ and $F_{\mathrm{sharp},l}$ using the Galerkin condition is equal to our definition of $A_{l-1}$ constructed by recursion (this can be checked by induction). This would not have been the case had we used a different inter-grid filter such as LI/FW. We can therefore say that the sharp inter-grid filter is able to unveil the recursive structure by which we defined the system. This is also a nice property in the sense that the coarse grid problem also represents an absorbing Markov chain; thus, the sharp inter-grid filter makes the two-grid algorithm an aggregation method similar to what is sought in [17] using a different multi-level approach. The third fact is that the structure of our system induces a hierarchical classification of nodes. Namely, we can define classes of nodes by the strength of their connections, as is usually done in AMG methods [2]. Two nodes $i$ and $j$ belong to the same class if they have a transition probability $(P)_{i,j} \ge 1/2^c$, with $1 \le c \le l$. For instance, in the system of Figure 5, for $c = 1$ we have eight singleton classes, one for each transient state; for $c = 2$ we have four classes: $\{1,2\}$, $\{3,4\}$, $\{5,6\}$ and $\{7,8\}$; for $c = 3$ we have two classes: $\{1,2,7,8\}$ and $\{3,4,5,6\}$; and finally for $c = 4$ we have one class with the whole set of nodes. This classification of nodes is shown in Figure 6.

Figure 6. Classification of nodes by the strength of their connection for the Markov chain in Figure 5. By considering only the strongest connections, we start, in white, with eight singleton classes. As we consider weaker connections, we obtain four classes, two classes and finally one class with the whole set of nodes, represented in light to dark gray, respectively. The classification leads to a nested structure of classes.

Table IV. Spectral radii of modal convergence operators for the system in Section 6.2.

    Analysis       ρ(Γ_{Lx→Lx})   ρ(Γ_{Hx→Lx})   ρ(Γ_{Lx→Hx})   ρ(Γ_{Hx→Hx})
    MA, ∀N         0              0              0              1
    LFA, N = 2     0              0              0              1
    LFA, N = 4     0.0528         0.2236         0.2236         0.9472
    LFA, N = 8     0.1702         0.3758         0.3758         0.9803
    LFA, N = 16    0.2877         0.4527         0.4527         0.9936
    LFA, N = 32    0.3739         0.4838         0.4838         0.9981
    LFA, N = 64    0.4283         0.4948         0.4948         0.9995
    LFA, N = 128   0.4602         0.4984         0.4984         0.9999
    LFA, N = 256   0.4783         0.4995         0.4995         1.0000

The results consider a two-grid approach using the sharp inter-grid filter from (78). The first row shows the results for our modal analysis (MA), which do not change with the problem size. The following rows show the estimates of LFA (working under incorrect assumptions) for systems of increasing size.

Finally, we can see how these three facts combine. The sharp inter-grid filter averages the most strongly connected nodes, which correspond alternately to nodes at the left and right of each
node. These nodes belong to the same class defined above for $c = 2$ and, since the different classes for $1 \le c \le l$ are nested (see Figure 6), the sharp inter-grid filter guarantees a similar structure on the coarse grid. This did not happen in the example of Section 6.1, because in that case we could not separate classes with a nested structure. This fact seems to be crucial in order to obtain an optimal inter-grid filter for the Markov chain problem.

In terms of convergence factors for this example, our analysis gives results different from those of LFA applied while ignoring the fact that the assumptions for LFA are not fulfilled. This is shown in Table IV, where we can see that the convergence estimated by our method and by LFA is the same only for grid size $N = 2$. This is because $N = 2$ is the only size for which the Hadamard basis coincides with the Fourier basis. For $N > 2$ we see how LFA gives increasingly pessimistic estimates of the convergence factors.

We can also check how different the convergence analysis would be if we chose LI/FW for the inter-grid operators. The multigrid algorithm lets us use these inter-grid operators, but then neither LFA nor our analysis can be applied to get information about modal convergence. This is because
the Fourier harmonic modes of the LI/FW inter-grid filter do not match the Hadamard harmonic modes of the system. If we ignore this limitation and use the Hadamard harmonic basis to estimate the convergence of a two-grid step, we obtain the results of Table V. On the other hand, if we use a Fourier harmonic basis to estimate convergence rates (which corresponds to LFA), we obtain the results in Table VI. The Hadamard analysis leads to a more pessimistic estimation, but it is not possible to determine which result is more accurate, because the definitions of the $L$ and $H$ groups of modes technically do not apply under either analysis. The conclusion of this exercise is that an arbitrary choice of inter-grid operators does not let us apply the heuristics of the multigrid methodology if we cannot define groups of $L$ and $H$ modes. The choice of LI/FW inter-grid operators still seems to make the algorithm stable, because the estimated convergence factors are always less than 1, but its performance is clearly inferior to that of the optimal sharp inter-grid filter for this system. Thus, in this case our analysis has been shown to be better than LFA in terms of its usefulness for studying convergence rates. Its main advantage appears in the design of inter-grid filters and smoothing operators.

6.3. Fourier–Hadamard harmonic analysis: the mixture of two different bases

We now consider a 2D system that corresponds to a mixture of the system from Section 6.1 and the system from Section 6.2. Let $A_x \in \mathbb{R}^{16 \times 16}$ be the system matrix from Section 6.1 and let $A_y \in \mathbb{R}^{16 \times 16}$ be the system matrix from Section 6.2 for $l = 5$, $N = 16$. Then, we define a 2D system by taking the Kronecker sum of these two operators. That is,
\[ A_{xy} = A_x \oplus A_y \tag{79} \]
\[ \phantom{A_{xy}} = A_x \otimes I_y + I_x \otimes A_y \tag{80} \]
Table V. Spectral radii of modal convergence operators for different sizes of the system in Section 6.2.

    N     ρ(Γ_{Lx→Lx})   ρ(Γ_{Hx→Lx})   ρ(Γ_{Lx→Hx})   ρ(Γ_{Hx→Hx})
    4     0.4375         0.7844         0.2296         0.8438
    8     0.5179         0.8122         0.2641         0.9183
    16    0.5843         0.8466         0.3737         0.9586
    32    0.6279         0.8893         0.4322         0.9791
    64    0.6624         0.9708         0.4645         0.9895

The results consider a two-grid approach, using LI/FW as the inter-grid operators, and assuming the Hadamard basis as eigenvectors of the system matrix (valid assumption) and of the inter-grid filter (wrong assumption).
Table VI. Spectral radii of modal convergence operators for different sizes of the system in Section 6.2.

    N     ρ(Γ_{Lx→Lx})   ρ(Γ_{Hx→Lx})   ρ(Γ_{Lx→Hx})   ρ(Γ_{Hx→Hx})
    4     0.2205         0.6765         0.3841         0.8843
    8     0.2782         0.7038         0.4527         0.9630
    16    0.3597         0.6907         0.4660         0.9915
    32    0.4150         0.7805         0.4770         0.9978
    64    0.4514         0.8945         0.4879         0.9995

The results consider a two-grid approach, using LI/FW as the inter-grid operators, and assuming Fourier harmonic modes as eigenvectors of the system matrix (wrong assumption) and of the inter-grid filter (valid assumption).
Thus, the system matrix $A_{xy} \in \mathbb{R}^{256 \times 256}$ is a mixture of matrices with different eigenvectors. Although the problem does not represent any well-known system in applications, we choose it in order to show how our analysis applies to mixtures of very different systems. A more realistic scenario of this kind would be, for example, a 2D diffusion equation with a diffusion coefficient that varies along one of the dimensions; the difficulty in that case is checking the harmonic aliasing property, which remains a problem for future research. Since $A_y$ does not have constant stencil coefficients, neither does $A_{xy}$; therefore, the assumptions of LFA are not fulfilled. However, since the system fulfills the assumptions introduced in Section 4, we are able to apply our modal analysis. Here, the eigenvectors of the system matrix $A_{xy}$ are given by $W_x \otimes W_y$, where $W_x$ are Fourier harmonic modes and $W_y$ are Hadamard harmonic modes.

From the results of Section 5.2, we know that, although the eigenvectors of a system represented by sums of Kronecker products are separable, the convergence rates are not. Thus, the problem of designing inter-grid operators cannot, in general, be considered for any one dimension independently of the others. Now, since in the $y$-dimension we can actually implement optimal inter-grid operators using the sharp inter-grid filter in (78), this allows us to decouple the two problems. Then, if we choose the inter-grid filter $F_{xy} = F_x \otimes F_y$ with the 1-pass LI/FW inter-grid filter as $F_x$ (suitable for Fourier harmonic eigenvectors) and the sharp inter-grid filter in (78) as $F_y$ (optimal for Hadamard harmonic modes), we obtain the convergence rates shown in Table VII for the two-grid algorithm. This combination of inter-grid filters completely removes the cross-modal convergence factors with modal transfers $Hy \to Ly$ and $Ly \to Hy$. For the modal transfers $Hy \to Hy$, we observe complete removal of the cross-modal error components ($HxHy \to LxHy$ and $LxHy \to HxHy$) and complete transfer of the self-mode error components ($LxHy \to LxHy$ and $HxHy \to HxHy$). For the modal transfers $Ly \to Ly$, we observe results similar to those obtained for the 1-pass LI/FW inter-grid filter in Section 6.1.

Table VII. Spectral radii of modal convergence operators for the system in Section 6.3 using our modal analysis.

    Γ_{xy}    LxLy     HxLy     LxHy   HxHy
    → LxLy    0.4532   0.8503   0      0
    → HxLy    0.4611   0.9994   0      0
    → LxHy    0        0        1      0
    → HxHy    0        0        0      1

The 16 convergence factors are organized according to the subscripts of the modal convergence operators, indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter for the x-dimension and the sharp inter-grid filter in (78) for the y-dimension.

As in the previous example, we can ignore the fact that the assumptions for LFA are not fulfilled for this problem and compute its estimates of the convergence rates. These results are shown in Table VIII, where we see that the estimates are not too far from those of our modal analysis. The disadvantage of LFA, besides working only as an approximation, is in the interpretation of these results, as it suggests that there is no decoupling between the two dimensions of the problem.
Table VIII. Spectral radii of modal convergence operators for the system in Section 6.3 using LFA (under wrong assumptions).

    Γ_{xy}    LxLy     HxLy     LxHy     HxHy
    → LxLy    0.6063   0.8420   0.4523   0.2935
    → HxLy    0.4547   0.9995   0.2080   0.2024
    → LxHy    0.4523   0.2935   0.9965   0.1878
    → HxHy    0.2080   0.2024   0.1322   1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators, indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter for the x-dimension and the sharp inter-grid filter in (78) for the y-dimension.
Finally, we consider the use of different inter-grid operators, for which we make the common choice of a 2D LI/FW operator. This operator leads to an inter-grid filter $F_{xy} = F_x \otimes F_y$ where both $F_x$ and $F_y$ are 1D, 1-pass LI/FW filters. As in the example of Section 6.2, this choice of inter-grid operators makes both our modal analysis and LFA inapplicable to this problem. In Tables IX and X, we can see the estimates of our analysis, based on a Fourier–Hadamard basis, and of LFA, respectively. The results are very similar, and our analysis gives slightly pessimistic results compared with LFA.

Table IX. Spectral radii of modal convergence operators for the system in Section 6.3 using our modal analysis (under incorrect assumptions).

    Γ_{xy}    LxLy     HxLy     LxHy     HxHy
    → LxLy    0.7126   0.8287   0.7548   0.2509
    → HxLy    0.4533   0.9997   0.1892   0.1798
    → LxHy    0.3730   0.2177   0.9982   0.2957
    → HxHy    0.1432   0.1433   0.2226   1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators, indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both the x- and y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in the x-dimension (valid assumption) and that the Hadamard basis gives eigenvectors of the operators in the y-dimension (valid for the system matrix, false for the inter-grid filter).
Table X. Spectral radii of modal convergence operators for the system in Section 6.3 using LFA (under incorrect assumptions).

    Γ_{xy}    LxLy     HxLy     LxHy     HxHy
    → LxLy    0.6722   0.8313   0.6119   0.3030
    → HxLy    0.4553   0.9996   0.2253   0.2177
    → LxHy    0.4714   0.3026   0.9999   0.2528
    → HxHy    0.2257   0.2177   0.1890   1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators, indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both the x- and y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in both the x- and y-dimensions (false only for the system matrix in the y-dimension).
There are many disadvantages to this choice of inter-grid operators. First and most important, it does not allow us to define groups of $L$ and $H$ modes. Second, under an arbitrary definition of these groups of modes, using either our analysis or LFA, we see a strong coupling in the cross-modal convergence rates. Finally, the convergence rate for the modal transfer $LxLy \to LxLy$, which is the most important task for the two-grid algorithm, is far from the convergence rate achieved by the Fourier–Hadamard inter-grid operators in Table VII. This last fact has a consequence for the final algorithm, which can be observed by using a smoothing filter $S_{xy} = S_x \otimes S_y$, where $S_x$ and $S_y$ correspond to the Richardson iteration scheme as configured in Sections 6.1 and 6.2, respectively. Then, a single full two-grid step ($\gamma_1 = 1$) with $\nu_1 = \nu_2 = 1$ shows a convergence rate of $\rho((SK)^2) = 0.2301$ for our inter-grid configuration, compared with $\rho((SK)^2) = 0.3037$ obtained by using a 2D LI/FW inter-grid operator. Here, our analysis has been found to be better than LFA for the design of a 2D inter-grid filter, as the combination of LI/FW with a sharp inter-grid filter shows good performance and perfect decoupling between the convergence rates of the different dimensions.
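Under the same illustrative assumptions used in the earlier sketches (odd-node down-sampling for the Fourier dimension, pairwise down-sampling for the Hadamard dimension), the 2D configuration of this section can be assembled with Kronecker products and its two-grid rate computed directly; the printed value can be compared against the ρ((SK)²) = 0.2301 quoted above:

    import numpy as np

    # Fourier piece (Section 6.1): N = 16 Laplacian with LI/FW and omega = 4.
    N = 16
    Ax = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    Fx = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))
    Dx = np.zeros((N // 2, N)); Dx[np.arange(N // 2), 2 * np.arange(N // 2)] = 1.0
    Sx = np.eye(N) - Ax / 4.0

    # Hadamard piece (Section 6.2): l = 5 Markov system with the sharp filter.
    def T(l):
        if l == 1:
            return np.array([[0.5]])
        Tp = T(l - 1)
        Ic = np.fliplr(np.eye(Tp.shape[0]))
        return np.block([[Tp, 2.0 ** (-l) * Ic], [2.0 ** (-l) * Ic, Tp]])

    l = 5
    Ay = np.eye(N) - T(l)
    Fy = 0.5 * np.kron(np.eye(N // 2), np.ones((2, 2)))   # sharp filter of (78)
    Dy = np.zeros((N // 2, N)); Dy[np.arange(N // 2), 2 * np.arange(N // 2)] = 1.0
    Sy = np.eye(N) - Ay / (1 - 2.0 ** (-l))               # omega = 1 - 2^{-l}

    # 2D mixture per (79)-(80) with separable operators per Section 4.2.
    Axy = np.kron(Ax, np.eye(N)) + np.kron(np.eye(N), Ay)
    Fxy, Dxy, Sxy = np.kron(Fx, Fy), np.kron(Dx, Dy), np.kron(Sx, Sy)

    Acoarse = Dxy @ Fxy @ Axy @ Fxy @ Dxy.T
    K = np.eye(N * N) - Fxy @ Dxy.T @ np.linalg.solve(Acoarse, Dxy @ Fxy @ Axy)
    SK2 = Sxy @ K @ Sxy @ K
    print(np.max(np.abs(np.linalg.eigvals(SK2))))  # cf. rho((SK)^2) = 0.2301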
7. CONCLUSIONS

In this paper we introduced new tools for the analysis of the linear multigrid algorithm. These tools allowed us to reveal and study the roles of the smoothing and inter-grid operators in multigrid convergence. In most applications of multigrid methods, these operators are designed based on the geometry and heuristics of the problem. We see this as a serious problem for distributed applications, because in such scenarios it is essential to minimize the number of iterations the algorithm requires to converge. The main contribution of this paper is the establishment of a new approach to convergence analysis and new design techniques for inter-grid and smoothing operators. We have shown how this analysis differs from LFA, which is considered the standard tool for the analysis and design of multigrid methods [7]. Our study shows the clear advantages of our approach when facing systems with non-uniform stencils.

By considering different systems, we showed that there is no general approach to optimizing the multigrid operators for a given system. For systems with Fourier harmonic modes as eigenvectors, we face a trade-off between the computational complexity and the convergence rate of each multigrid step. For systems with a Hadamard basis as eigenvectors, we are able to obtain optimal multigrid operators that make the algorithm converge in one step, with $O(1)$ smoothing iterations; this is possible due to the particular structure of the system. The same multigrid operators show a perfect decoupling in a mixture of two different systems where one of the operators has a Hadamard basis as eigenvectors. Our modal analysis has been shown to be crucial to unveil these properties and to show the exact influence of each operator on the convergence behavior of the algorithm.

We note that, given the assumptions imposed on the system, we were able to analyze multigrid convergence with no heuristics based on the geometry of the problem. This opens the possibility of designing a fully algebraic multigrid method if the correct assumptions are satisfied. Nevertheless, this is not a straightforward step, because the harmonic aliasing property is strongly connected with the geometry of the problem. The main difficulty in our approach is checking our assumptions on the eigenvectors of the system. For future research, we are studying practical methods to check these assumptions, as well as modifications that make them more flexible to check and manage.
REFERENCES

1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 19:23–56.
2. Ruge JW, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, Frontiers in Applied Mathematics, vol. 3, McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130.
3. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984.
4. Yang UM. Parallel algebraic multigrid methods: high performance preconditioners. In Numerical Solutions of PDEs on Parallel Computers, Bruaset AM, Bjørstad P, Tveito A (eds), Lecture Notes in Computational Science and Engineering. Springer: Berlin, 2005.
5. Brandt A. Rigorous quantitative analysis of multigrid, I: constant coefficients two-level cycle with L2-norm. SIAM Journal on Numerical Analysis 1994; 31(6):1695–1730.
6. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31:333–390.
7. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: London, 2000.
8. Mallat S. A Wavelet Tour of Signal Processing (2nd edn), Wavelet Analysis and its Applications. Academic Press: New York, 1999.
9. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
10. Wesseling P. An Introduction to Multigrid Methods. Wiley: Chichester, 1992.
11. Proakis JG, Manolakis DG. Digital Signal Processing: Principles, Algorithms, and Applications (2nd edn). Macmillan: Indianapolis, IN, 1992.
12. Laub AJ. Matrix Analysis for Scientists and Engineers. SIAM: Philadelphia, PA, 2005.
13. Davis PJ. Circulant Matrices. Pure and Applied Mathematics. Wiley: New York, 1979.
14. Bremaud P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer: New York, 1999.
15. Neuts MF. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press: Baltimore, MD, 1981.
16. Sylvester JJ. Thoughts on inverse orthogonal matrices, simultaneous sign-successions, and tesselated pavements in two or more colours, with applications to Newton's rule, ornamental tile-work, and the theory of numbers. Philosophical Magazine 1867; 34:461–475.
17. De Sterck H, Manteuffel T, McCormick SF, Nguyen Q, Ruge JW. Markov chains and web ranking: a multilevel adaptive aggregation method. Thirteenth Copper Mountain Conference on Multigrid Methods, Copper Mountain, CO, U.S.A., 2007.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:249–269
Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.575

A generalized eigensolver based on smoothed aggregation (GES-SA) for initializing smoothed aggregation (SA) multigrid

M. Brezina1,‡, T. Manteuffel1,‡, S. McCormick1,‡, J. Ruge1,‡, G. Sanders1,∗,†,‡ and P. Vassilevski2

1 Department of Applied Mathematics, University of Colorado at Boulder, UCB 526, Boulder, CO 80309-0526, U.S.A.
2 Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Avenue, Mail Stop L-560, Livermore, CA 94550, U.S.A.
SUMMARY

Consider the linear system $Ax = b$, where $A$ is a large, sparse, real, symmetric, and positive-definite matrix and $b$ is a known vector. Solving this system for the unknown vector $x$ using a smoothed aggregation (SA) multigrid algorithm requires a characterization of the algebraically smooth error, meaning error that is poorly attenuated by the algorithm's relaxation process. For many common relaxation processes, algebraically smooth error corresponds to the near-nullspace of $A$. Therefore, having a good approximation to a minimal eigenvector is useful for characterizing the algebraically smooth error when forming a linear SA solver. We discuss the details of a generalized eigensolver based on smoothed aggregation (GES-SA) that is designed to produce an approximation to a minimal eigenvector of $A$. GES-SA may be applied as a stand-alone eigensolver for applications that desire an approximate minimal eigenvector, but the primary purpose here is to apply an eigensolver to the specific application of forming robust, adaptive linear solvers. This paper reports the first stage in our study of incorporating eigensolvers into the existing adaptive SA framework. Copyright © 2008 John Wiley & Sons, Ltd.

Received 16 May 2007; Revised 5 December 2007; Accepted 5 December 2007
KEY WORDS: generalized eigensolver; smoothed aggregation; multigrid; adaptive solver

∗ Correspondence to: G. Sanders, Department of Applied Mathematics, University of Colorado at Boulder, UCB 526, Boulder, CO 80309-0526, U.S.A.
† E-mail: [email protected]
‡ University of Colorado at Boulder and Front Range Scientific Computing.

Contract/grant sponsor: University of California Lawrence Livermore National Laboratory; contract/grant number: W-7405-Eng-48
1. INTRODUCTION
In the spirit of algebraic multigrid (AMG) [1–5], smoothed aggregation (SA) multigrid [6] has been designed to solve a linear system of equations with little or no prior knowledge regarding the geometry or physical properties of the underlying problem. Therefore, SA is often an efficient solver for problems discretized on unstructured meshes with varying coefficients or with no associated geometry. The relaxation processes commonly used in multigrid solvers are computationally cheap, but commonly fail to adequately reduce certain types of error, which we call error that is algebraically smooth with respect to the given relaxation. If a characterization of algebraically smooth error is known, in the form of a small set of prototype vectors, the SA framework constructs intergrid transfer operators that allow such error to be eliminated on coarser grids, where relaxation is more economical. For example, in a 3D elasticity problem, six such components (the so-called rigid body modes) form an adequate characterization of the algebraically smooth error. Rigid body modes are often available from discretization packages, and a solver can be produced with these vectors in the SA framework [6]. However, such a characterization is not always readily available (even for some scalar problems) and must be developed in an adaptive process.

Adaptive SA (αSA), as presented in [7], was designed specifically to create a representative set of vectors for cases where a characterization of algebraically smooth error is not known. Initially, simple relaxation is performed on a homogeneous version of the problem on all levels of the multigrid hierarchy being constructed. These coarse-level approximations are used to achieve a global-scale update that serves as our first prototype vector that is algebraically smooth with respect to relaxation. Using this one resulting component, the SA framework is employed to construct a linear multigrid solver, and the whole process can be repeated with the updated solver playing the role of relaxation on each multigrid level. At each step, the adequacy of the solver is assessed by monitoring convergence factors; if the current solver is deemed adequate, the adaptive process is terminated and the current solver is retained.

We consider applying SA to an algebraic system of equations Ax = b, where A = (a_ij) is an n×n symmetric, positive-definite (SPD) matrix that is symmetrically scaled so that its diagonal entries are all ones. For simplicity, we use damped Jacobi for our initial relaxation. The SA framework provides an interpolation operator, P, that is used to define a coarse level with standard Galerkin variational corrections. If the relaxation process is a convergent iteration, then it is known from the literature (e.g. [1, 8]) that a sufficient condition for two-level convergence factors bounded away from one is that for any u on the fine grid, there exists a v from the coarse grid such that

‖u − Pv‖₂² ≤ C (Au, u)/‖A‖₂   (1)
with some constant C. The quality of the bound on the convergence factor depends on the size of C, as shown in [9]. This requirement is known in the literature as the weak approximation property and reflects the observation, noted in [8, 10], that any minimal eigenvector (an eigenvector associated with the smallest eigenvalue) of A needs to be interpolated with accuracy inversely proportional to the size of its eigenvalue. For this reason, this paper proposes a generalized eigensolver based on smoothed aggregation (GES-SA) to approximate a minimal eigenvector of A.

Solving an eigenvalue problem as an efficient means to developing a linear solver may appear counterintuitive. However, we aim to compute only an appropriately accurate approximation of the minimal eigenvector to develop an efficient linear solver with that approximation at O(n) cost.
In this context, many existing efficient methods for generating a minimal eigenvector are appealing (see [11, 12] for short lists of such methods). Here, we propose GES-SA because it takes advantage of the same data structures as the existing SA framework. Our intention is to eventually incorporate GES-SA into the αSA framework to enhance the robustness of our adaptive solvers for difficult problems that may benefit from such enhancement (such as system problems, corner-singularity problems, or problems with geometrically oscillatory near-kernel).

The GES-SA algorithm performs a series of iterations that minimize the Rayleigh quotient (RQ) over various subspaces, as discussed in the later sections. In short, GES-SA is a variant of algebraic Rayleigh quotient multigrid (RQMG [13]) that uses overlapping block RQ Gauss–Seidel for its relaxation process and SA RQ minimization for coarse-grid updates. In [14], Hetmaniuk developed an algebraic RQMG algorithm that performs point RQ Gauss–Seidel for relaxation and coarse-grid corrections based on a hierarchy of static intergrid transfer operators that are supplied to his algorithm. This supplied hierarchy is assumed to have adequate approximation properties. In contrast, GES-SA initializes the hierarchy of intergrid transfer operators and modifies it with each cycle, with the goal of developing a hierarchy with adequate approximation properties, as in the setup phase of αSA. This is discussed in more detail in Section 3.2.

This paper is organized as follows. The rest of Section 1 gives a simple example and background on SA multigrid. Section 2 introduces the components of GES-SA. Section 3 presents how the components introduced in Section 2 are put together to form the full GES-SA algorithm. Section 4 presents numerical examples with results that demonstrate how the linear SA solvers produced with GES-SA have desirable performance for particular problems. Finally, Section 5 makes concluding remarks.

1.1. The model problem
Example 1
Consider the linear problem Ax = b and its associated generalized eigenvalue problem Ax = λBx. Matrix A is the 1D Laplacian with Dirichlet boundary conditions, discretized with equidistant second-order central differences and symmetrically scaled so that the diagonal entries are all ones:

A = ½ tridiag(−1, 2, −1)   (2)

an n×n tridiagonal matrix. Matrix B for this example is I_n, the identity operator on Rⁿ. The full set of nodes for this problem is Ω_n = {1, 2, . . . , n}. The problem size n = 9 is used throughout this paper to illustrate various concepts regarding the algorithm. Note that the 1D problem is used merely to show concepts and is not of further interest, as its tridiagonal structure is treated with optimal computational complexity by a direct solver. However, the example is useful in the sense that it captures the concepts we present in their simplest form.
1.2. SA multigrid
In this section, we briefly recall the SA multigrid framework for constructing a multigrid hierarchy. Like any algebraic multilevel method, SA requires a setup phase. Here, we follow the version presented in [6, 15]. Given a relaxation process and a set of vectors K characterizing algebraically smooth error, the SA setup phase produces a multigrid hierarchy that defines a linear solver. For symmetric problems, such as those we consider here, standard SA produces a coarse grid using interpolation operator P and restriction operator R = Pᵀ. This gives the variational (or Galerkin) coarse-grid operator, A_c = PᵀAP, commonly used in AMG methods. This process is repeated recursively on all grids, constructing a multigrid hierarchy. The interpolation operator is produced by applying a smoothing operator, S, to a tentative interpolation operator, P̂, that satisfies the weak approximation property.

At the heart of forming P̂ is a discrete partitioning of fine-level nodes into a disjoint covering of the full set of nodes, Ω_n = {1, 2, . . . , n}. Members of this partition are locally grouped based on matrix A_G, representing the graph of strong connections [6]. A_G is created by filtering the original problem matrix A with regard to strength of coupling (Figure 1). For the scalar problems considered here, we define node i to be strongly connected to node j with respect to the parameter θ ∈ (0, 1) if

|a_ij| > θ √(a_ii a_jj)   (3)
Any connection that violates this requirement is a weak connection. Entry (A_G)_ij = 1 if the connection between i and j is strong, and (A_G)_ij = 0 otherwise.

Definition 1.1
A collection of m subsets {A_j}_{j=1}^m of Ω_n = {1, 2, . . . , n} is an aggregation with respect to A_G if the following conditions hold.
• Covering: ⋃_{j=1}^m A_j = Ω_n.
• Disjoint: For any j ≠ k, A_j ∩ A_k = ∅.
• Connected: For any j, if two nodes p, q ∈ A_j, then there exists a sequence of edges with end points in A_j that connects p to q within the graph of A_G.

Each individual subset A_j within the aggregation is called an aggregate. The method we use to form aggregations is given in [6], where each aggregate has a central node, or seed, numbered i, and covers this node's entire strong neighborhood (the support of the ith row in the graph of A_G). This is a very common way of forming aggregations, for computational benefits, but is not mandatory. We return to Example 1 to explain the aggregation concept. An acceptable aggregation of Ω₉ with respect to A would be m = 3 aggregates, each of size 3, defined
Figure 1. Graph of matrix A_G from Example 1 with n = 9. The nine nodes are enumerated, edges of the graph represent nonzero off-diagonal entries in A, and the Dirichlet boundary conditions are represented with the hollow dots at the end points.
as follows:

A₁ = {1, 2, 3},  A₂ = {4, 5, 6},  A₃ = {7, 8, 9}   (4)
It is easily verified that this partitioning satisfies Definition 1.1. This aggregation is pictured in Figure 2. 2D examples are presented in Section 4.

We find it useful to represent an aggregation {A_j}_{j=1}^m with an n×m sparse, binary aggregation matrix, which we denote by [A]. Each column of [A] represents a single aggregate, with a one in the (i, j)th entry if point i is contained in aggregate A_j, and a zero otherwise. In our 1D example, with n = 9, we represent the aggregation given in (4) as

[A] = [ê₁+ê₂+ê₃, ê₄+ê₅+ê₆, ê₇+ê₈+ê₉]   (5)

that is, the 9×3 binary matrix whose columns indicate aggregates A₁, A₂, and A₃. Based on the sparsity structure of [A], the SA setup phase constructs P̂ with a range that represents a given, small collection of linearly independent vectors, K. This is done by simply restricting the values of each vector in K to the sparsity pattern specified by [A].

Under the above construction, the vectors in K are ensured to be in R(P̂), the range of the tentative interpolation operator, and are therefore well attenuated by a corresponding coarse-grid correction. However, K is only a small number of near-kernel components. Other vectors in R(P̂) may actually be quite algebraically oscillatory, which can be harmful to the coarsening process because it may lead to a coarse-grid operator with a higher condition number than desired. This degrades the effect of coarse-grid relaxation on vectors that are moderately algebraically smooth. Of greater importance, some algebraically smooth vectors are typically not well represented by R(P̂) and are therefore not reduced by coarse-grid corrections. To remedy the situation, SA does not use P̂ as its interpolation operator directly, but instead utilizes a smoothed version, P = S P̂, where S is an appropriately chosen polynomial smoothing operator. As a result, a much richer set of algebraically smooth error is accurately represented by the coarse grid. A typical choice for S is one step of the error propagation operator of damped-Jacobi relaxation.
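Although the paper gives no code, the aggregation bookkeeping is easy to prototype. The following sketch (in Python with SciPy, which we use for all sketches here; the paper's own implementation is in MATLAB, and every function name is ours) builds the binary strength graph of (3) and the aggregation matrix [A] of (5).

```python
import numpy as np
import scipy.sparse as sp

def strength_graph(A, theta=0.1):
    """Binary strength matrix A_G: entry (i, j) is 1 iff i != j and
    |a_ij| > theta * sqrt(a_ii * a_jj), as in (3)."""
    C = A.tocoo()
    d = A.diagonal()
    strong = (C.row != C.col) & (np.abs(C.data) > theta * np.sqrt(d[C.row] * d[C.col]))
    return sp.csr_matrix((np.ones(strong.sum()), (C.row[strong], C.col[strong])),
                         shape=A.shape)

def aggregation_matrix(aggregates, n):
    """n x m binary matrix [A] of (5): column j marks the nodes of aggregate j."""
    rows = np.concatenate([np.asarray(a) for a in aggregates])
    cols = np.concatenate([np.full(len(a), j) for j, a in enumerate(aggregates)])
    return sp.csr_matrix((np.ones(len(rows)), (rows, cols)),
                         shape=(n, len(aggregates)))

# The 1D model problem, n = 9, with the aggregation (4) in 0-based indexing:
Agg = aggregation_matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 9)
```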
Figure 2. Graph of matrix A_G from Example 1 with n = 9 split into three aggregates. Each box encloses a group of nodes in its respective aggregate.
In this paper, we use damped-Jacobi smoothing under the assumption that the system is diagonally scaled so that its diagonal elements are one. The underlying set, K, that induces a linear SA solver can be either supplied, as in standard SA, or computed, as in αSA methods. We now describe a new approach to constructing K that can be used within the existing SA framework.

2. RQ MINIMIZATION WITHIN SUBSPACES
Consider the generalized eigenvalue problem, Av = λBv, where A and B are given n×n real SPD matrices, v is an unknown eigenvector of length n, and λ is an unknown eigenvalue. Our target problem is stated as follows: find an eigenvector, v₁ ≠ 0, corresponding to the smallest eigenvalue, λ₁, in the problem

Av = λBv   (6)
For convenience, v₁ is called a minimal eigenvector and the corresponding eigenvalue, λ₁, is called the minimal eigenvalue. First, to introduce our method, we review a well-known general strategy for approximating the solution of (6), an approach that has been used in [13, 16]. This strategy is to select a subspace of Rⁿ and choose a vector in the subspace that minimizes the RQ. In GES-SA, we essentially do two types of subspace selection: one uses local groupings to select local subspaces that update our approximation locally; the other uses SA to select low-resolution subspaces that use coarse grids to update our approximation globally. These two minimization schemes are used together in a typical multigrid way. We recall the RQ to introduce a minimization principle that we use to update an iterate within a given subspace.

Definition 2.1
The RQ of a vector, v, with respect to matrices A and B is the value

ρ_{A,B}(v) ≡ (vᵀAv)/(vᵀBv)   (7)
Since we restrict ourselves to the case when A and B are SPD, the RQ is always real and positive. The solution we seek minimizes the RQ:

ρ_{A,B}(v₁) = min_{v∈Rⁿ} ρ_{A,B}(v) = λ₁ > 0   (8)

If two vectors w and v are such that ρ_{A,B}(w) < ρ_{A,B}(v), then w is considered a better approximate solution to (6) than v. Therefore, problem (6) is restated as a minimization problem:

find v₁ ≠ 0 such that ρ_{A,B}(v₁) = min_{v∈Rⁿ} ρ_{A,B}(v)   (9)

Given a current approximation, ṽ, we use the minimization principle to construct a subspace, V ⊂ Rⁿ, such that dim(V) = m ≪ n and

min_{v∈V} ρ_{A,B}(v) ≤ ρ_{A,B}(ṽ)   (10)
The new approximation, w̃, is a vector in V with minimal RQ. Note that if ṽ is already of minimal RQ, then lowering the RQ is not possible. In general, we must carefully construct the subspace to ensure that the RQ is indeed lowered. To select w̃, we must solve a restricted minimization problem within V:

find w̃ ≠ 0 such that ρ_{A,B}(w̃) = min_{v∈V} ρ_{A,B}(v)   (11)
This restricted minimization problem is solved for w̃ by restating the minimization problem within the lower-dimensional vector space, Rᵐ, and then mapping the low-dimensional solution to the corresponding vector in V. To do so, we construct an n×m matrix, Q, whose m column vectors are a basis for V. Note that, for any v ∈ V, there exists a unique y ∈ Rᵐ such that v = Qy. Moreover, the RQ of v with respect to A and B and the RQ of y with respect to coarse versions of A and B are equivalent:

ρ_{A,B}(v) = (vᵀAv)/(vᵀBv) = (yᵀQᵀAQy)/(yᵀQᵀBQy) = ρ_{QᵀAQ, QᵀBQ}(y) = ρ_{A_V,B_V}(y)   (12)

for A_V = QᵀAQ and B_V = QᵀBQ. Thus, the solution of restricted minimization problem (11) is found by solving a low-dimensional minimization problem:

find y₁ ≠ 0 such that ρ_{A_V,B_V}(y₁) = min_{y∈Rᵐ} ρ_{A_V,B_V}(y)   (13)

or, equivalently, a low-dimensional eigenproblem: find an eigenvector, y₁ ≠ 0, corresponding to the smallest eigenvalue, λ₁, in the eigenproblem

A_V y = λ B_V y   (14)
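The passage from (11) to (14) is the computational core of the method: restrict A and B to the subspace, solve a small generalized eigenproblem, and interpolate back. A minimal dense sketch (our own naming; scipy.linalg.eigh solves the symmetric-definite generalized problem with eigenvalues returned in ascending order):

```python
import numpy as np
from scipy.linalg import eigh

def subspace_rq_min(A, B, Q):
    """Minimize the RQ (7) over V = range(Q): form A_V = Q^T A Q and
    B_V = Q^T B Q, solve the small generalized eigenproblem (14), and
    map the minimal eigenvector back to R^n as w = Q y1."""
    AV = Q.T @ A @ Q
    BV = Q.T @ B @ Q
    lam, Y = eigh(AV, BV)     # eigenvalues ascending, so column 0 is minimal
    y1 = Y[:, 0]
    return Q @ y1, lam[0]     # new iterate in V and its RQ
```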
After either approximating the solution to low-dimensional minimization problem (13) or solving low-dimensional eigenvalue problem (14) for y₁ with a standard eigensolver, the solution to the minimization problem restricted to V defined in (11) is w̃ ← Qy₁. The whole process is then repeated: update ṽ ← w̃, use ṽ to form a new subspace, V, and corresponding Q, solve (14) for y₁, and set w̃ ← Qy₁. The specific methods we use for constructing subspaces are the defining features of GES-SA and are explained in the following three sections. In Section 2.1, we focus on how a reasonable initial approximation is obtained using a nonoverlapping version of the subspace minimization algorithm. In Section 2.2, we present the global subspace minimization based on SA that serves as our nonlinear coarse-grid update. In Section 2.3, we describe the local subspace minimizations that play the role of nonlinear relaxation.

2.1. Initial guess development
Because the RQ minimization problem we wish to solve is nonlinear, it is helpful to develop a fairly accurate initial approximation to a minimal eigenvector. The algorithm presented in this section is very similar to the local subspace iteration that is presented in Section 2.3. The difference is that here we perform nonoverlapping additive updates with the zero vector as the initial iterate.
First, we require that an aggregation, {A_j}_{j=1}^m, be provided. Each aggregate induces a subspace, V_j ⊂ Rⁿ, defined by all vectors v whose support is contained entirely in A_j. We form a local selection matrix, Q_j, that maps R^{m_j} onto V_j, where m_j is the number of nodes in the jth aggregate. This matrix is given by

Q_j = [ê_{p₁} · · · ê_{p_{m_j}}]   (15)

where ê_p is the pth canonical basis vector and {p_q}_{q=1}^{m_j} are the nodes in the jth aggregate. We then form local principal submatrices, A_j ← QⱼᵀAQ_j and B_j ← QⱼᵀBQ_j. A solution, y₁ ≠ 0, to generalized eigenvalue problem (14) of size m_j is then found using a standard eigensolver. Nodes within the jth aggregate are set as w̃_j ← Q_j y₁. After w̃_j is found for each aggregate, the initial approximation is the sum of the disjoint locally supported vectors: ṽ ← Σ_{j=1}^m w̃_j.
Remark 2.1
There is no guarantee that w̃_j is of the same sign as the w̃_k that are supported within adjacent aggregates. For example, w̃_j may have all negative entries on A_j and w̃_k may have all positive entries on an adjacent aggregate. In fact, discrepancies in the sign of entries on neighboring aggregates usually occur in practice, because σy₁ is still a solution to the local eigenproblem for any σ ≠ 0. However, this is not an issue of concern, because the subsequent coarse-grid update presented in Section 2.2 uses the same aggregation as the initial guess development. The coarse space is invariant to such scaling; hence, the result of the coarse-grid update is independent as well. In any case, we emphasize that this may occur only in the initial guess development phase of the algorithm. Example 2 in Section 4 is designed to show the invariance of the success of GES-SA with respect to these sign changes.

We summarize initial guess development in the form of an algorithm. This algorithm is used on every level in the full GES-SA (Algorithm 3 of Section 3) as pre-relaxation, only for the first multigrid cycle.

Algorithm 1 Initial guess development.
• Function: ṽ ← IGD(A, B, {A_j}_{j=1}^m).
• Input: SPD matrices A and B, and aggregation {A_j}_{j=1}^m.
• Output: Initial approximate solution ṽ to (6).
1. For j = 1, . . . , m, do the following:
 (a) Form Q_j based on A_j as in (15).
 (b) Compute A_j ← QⱼᵀAQ_j and B_j ← QⱼᵀBQ_j.
 (c) Find any y₁, ‖y₁‖₂ = 1, by solving (14) with a standard eigensolver.
 (d) Interpolate w̃_j ← Q_j y₁.
2. Output ṽ ← Σ_{j=1}^m w̃_j.
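A direct transcription of Algorithm 1, reusing subspace_rq_min from the sketch above (dense arrays for clarity; an illustration under our own naming, not the authors' code):

```python
import numpy as np

def igd(A, B, aggregates):
    """Algorithm 1: minimize the RQ independently over each aggregate and
    sum the disjoint, locally supported results as in step 2."""
    n = A.shape[0]
    v = np.zeros(n)
    for agg in aggregates:
        idx = np.asarray(agg)
        Q = np.eye(n)[:, idx]              # selection matrix Q_j of (15)
        w, _ = subspace_rq_min(A, B, Q)    # local problem (14) on aggregate j
        v[idx] = w[idx]                    # disjoint supports: additive update
    return v
```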
Algorithm 1 is demonstrated through Example 1. The selection matrices are

Q₁ = [ê₁ ê₂ ê₃],  Q₂ = [ê₄ ê₅ ê₆],  Q₃ = [ê₇ ê₈ ê₉]   (16)

that is, the 9×3 binary matrices whose columns select the nodes of aggregates A₁, A₂, and A₃, respectively.
Here, for all aggregates, j = 1, 2, 3, the restricted matrices are identical:

A_j = ½ tridiag(−1, 2, −1) ∈ R^{3×3}  and  B_j = I₃   (17)

Hence, the solutions to the restricted eigenproblems are all of the form ỹ₁ = σ_j [½, 1/√2, ½]ᵀ, with a scaling term |σ_j| = 1. Hence, the initial guess developed is the vector

ṽ = [σ₁/2, σ₁/√2, σ₁/2, σ₂/2, σ₂/√2, σ₂/2, σ₃/2, σ₃/√2, σ₃/2]ᵀ   (18)

For the case σ_j = 1 for all three aggregates, the initial guess is seen in Figure 3. We reiterate what is stated in Remark 2.1: if, for example, σ₁ = σ₃ = 1 and σ₂ = −1, then the initial guess causes no difficulty, even though the RQ of this vector is much higher than that of the vector formed from σ₁ = σ₂ = σ₃ = 1. For either vector, the subsequent coarse-grid update uses the same subspace to find a set of coefficients that correspond to some new vector of minimal RQ within that subspace. In the context of multigrid, initial guess development is used in place of pre-relaxation for the first GES-SA multigrid cycle performed. Subsequent pre-relaxations and post-relaxations are applied as local subspace relaxation, as presented in Section 2.3. We now describe how SA is used for global subspace updates.
Figure 3. Initial guess for the 1D model problem produced by the initial guess development algorithm; the RQ has been minimized over each aggregate individually.
2.2. Global coarse-grid RQ minimization
Typically, SA has been used to form intergrid transfer operators within multigrid schemes for linear systems, as in [6, 7]. Here, we use SA in a similar fashion to form coarse subspaces of lower dimension that are used to compute iterates with lower RQ. SA defines a sparse n×m interpolation operator, P, that maps from a coarse set of m variables to the original fine set of n variables. Here, we use the same aggregation that was used for initial guess development in Section 2.1. This is essential for the initial guess to be a suitable one, as stated in Remark 2.1. Given a current iterate, ṽ, we form a space V that is designed to contain a vector with an RQ that is less than or equal to that of ṽ. Our construction is to first form a tentative interpolation, P̂, that has ṽ in its range. This is done in the usual way, by restricting the values of ṽ to individual aggregates according to the sparsity pattern defined by the aggregation matrix, [A]:

P̂ := diag(ṽ)[A]   (19)

Operator P̂ is such that ṽ ∈ R(P̂). Specifically, ṽ = P̂ 1_m, where 1_m is the column vector of all ones with length m. This means that we are guaranteed to have a vector within R(P̂) with no larger an RQ than that of ṽ:

min_{v∈R(P̂)} ρ_{A,B}(v) ≤ ρ_{A,B}(ṽ)   (20)
Many of the vectors in R(P̂) are of high RQ, because the columns of P̂ have local support and are not individually algebraically smooth with respect to relaxation. Therefore, as in standard SA, we apply a polynomial smoothing operator of low degree, S, to P̂, and use the resulting operator, instead of P̂, as a basis for our coarse space. This gives a coarse space with better approximation to the sought eigenvector at a reasonable increase in computational complexity. This smoothing consists of just one application of the error propagation operator of damped Jacobi:

S := (I_n − ωD⁻¹A)   (21)

where I_n is the identity operator on Rⁿ and ω = 4/(3‖D⁻¹A‖₂). Normalization of the columns of interpolation is also performed, which does not change the range of interpolation, but does control the scaling of the coarse-grid problems. This scaling is used so that the diagonal entries of coarse-grid matrix A_c are all one. The scaling is done by multiplying with a diagonal matrix, N, whose entries are given by

N_ii := 1/‖S(P̂)_i‖_A   (22)

where (P̂)_i is the ith column of P̂. Note that we must assume that ṽ is nonzero on every aggregate. The interpolation matrix is

P := S P̂ N   (23)
Under this construction, Sṽ is in the range of P. Therefore, if Sṽ has a lower RQ than that of ṽ, we have guaranteed that a vector in V_c = R(P) improves the RQ of our iterate. The vector of minimal RQ we select from V_c is typically of much lower RQ than that of Sṽ, due to the localization provided by prolongation.
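The coarse-space construction (19)–(23) is summarized in the following sketch. It assumes A is sparse with unit diagonal, as in the paper's setting; we substitute the easily computed Frobenius norm for ‖D⁻¹A‖₂ in (21), which only makes ω smaller, and all names are ours.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def smoothed_interpolation(A, v, Agg):
    """Build P = S * Phat * N: Phat = diag(v)[A] as in (19), S = I - omega*D^{-1}A
    as in (21), and N rescaling each column to unit A-norm as in (22)-(23)."""
    Phat = sp.diags(v) @ Agg                         # (19)
    DinvA = sp.diags(1.0 / A.diagonal()) @ A
    omega = 4.0 / (3.0 * spla.norm(DinvA))           # Frobenius stand-in for the 2-norm
    SP = Phat - omega * (DinvA @ Phat)               # S applied to Phat
    a_norms = np.sqrt(np.asarray(SP.multiply(A @ SP).sum(axis=0))).ravel()
    return SP @ sp.diags(1.0 / a_norms)              # (23): P = S Phat N
```

Note that the column A-norms in (22) are positive precisely under the paper's assumption that ṽ is nonzero on every aggregate.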
Note that a choice of ω could be computed to minimize the RQ of Sṽ, a single vector in the range of interpolation. However, this choice of ω may not be best for all other vectors in the range. Therefore, we retain the standard choice of ω known from the literature [15].

The columns of P form a basis for V_c because our construction ensures that there is at least one point in the support of each column that is not present in any other column. Forming aggregates that are at least a neighborhood in size and using damped-Jacobi smoothing does not allow columns to ever share support with an aggregate's central node. Therefore, under the assumption that ṽ is nonzero on every aggregate, A_c ← PᵀAP and B_c ← PᵀBP are both SPD. In the multigrid vocabulary, restricted problem (14) is now the coarse-grid problem. A coarse-grid update is given by interpolating the solution of the coarse-grid problem: w̃ ← Py₁. This problem, A_c y = λB_c y, is either solved using a standard eigensolver or posed as a coarse-grid minimization problem as in (13), where local and global updates may be applied in a recursive fashion. This process forms the coarse-grid update step of Algorithm 3 of Section 3, the full GES-SA algorithm. As in linear multigrid, the coarse-grid update needs to be complemented by an appropriately chosen relaxation process, on which we next focus.

2.3. Local subspace RQ relaxation
In the context of a nonlinear multilevel method, we use subspace minimization updates posed over locally supported subsets as our relaxation process, which is a form of nonlinear overlapping-block Gauss–Seidel method for minimizing the RQ. This section explains the specifics for choosing the nodes that make up each block and presents the relaxation algorithm. The original generalized eigenvalue problem, Av = λBv, is posed over a set of n nodes, Ω_n. To choose a subspace that provides a local update over a small cluster of m_j nodes, we construct W_j to be a subset of Ω_n with cardinality m_j. Subset W_j should be local and connected within the graph of A. Subspace V_j^ṽ is chosen to be the space of all vectors that differ from a constant multiple of our current approximation, ṽ, only by w, a vector with support in the subset W_j:

V_j^ṽ := {v ∈ Rⁿ | v = w₀ṽ + w, where w₀ ∈ R and supp(w) ⊂ W_j}   (24)
This is a subspace of Rⁿ with dimension (m_j + 1), used to form and solve (11) for an updated approximation, w̃, that has minimum RQ within V_j^ṽ. We allow changes to the entries of current iterate ṽ only at nodes in set W_j to minimize the RQ, while leaving ṽ unchanged at nodes outside of W_j, up to a scaling factor, w₀.

Remark 2.2
If ṽ has a relatively high RQ, then a vector in V_j^ṽ that has minimal RQ may have w₀ = 0. Essentially, the subspace iteration throws away all information outside of W_j. This is potentially disastrous to our algorithm because, for typical problems, minimal eigenvectors are globally supported. Avoiding this situation is the primary reason we develop initial guesses with Algorithm 1 instead of randomly. Our current implementation does not update the iterate for subspaces in which w₀ = 0. However, this situation did not occur for the problems presented in the numerical results in Section 4.

We now explain how subsets W_j are chosen and then explain the iteration procedure. One step of the local subspace relaxation scheme minimizes the approximate eigenvalue locally over one small portion of the full set of nodes, Ω_n. We utilize a sequence of subsets {W_j}_{j=1}^m ⊂ Ω_n that
form an overlapping covering of Ω_n. We then perform local subspace relaxation with each of these subsets in a multiplicative fashion. Similar to aggregation matrix [A], we represent these subset coverings with a sparse, binary, overlapping subset matrix, [W]. One way to obtain an overlapping subset covering is by dilating aggregates. This is accomplished by taking each aggregate A_j within the aggregation and expanding A_j once with respect to the graph of matrix A_G (Figure 4). Let [A_G] be an n×n binary version of A_G that stores strong connections in the graph of A, defined as

[A_G]_ij := 1 if (A_G)_ij ≠ 0,  0 if (A_G)_ij = 0   (25)

Then define [W] by creating a binary version of the matrix product [A_G][A], a dilation:

[W]_ij := 1 if ([A_G][A])_ij ≠ 0,  0 if ([A_G][A])_ij = 0   (26)
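Computing [W] from (26) is a single sparse boolean product; a minimal sketch, reusing the matrices built earlier:

```python
import numpy as np

def dilate_aggregates(AG, Agg):
    """Overlapping subset matrix [W] of (26): [W]_ij = 1 iff ([A_G][A])_ij != 0,
    i.e. the binary pattern of one layer of strong edges applied to each aggregate."""
    W = (AG @ Agg).tocsr()
    W.data = np.ones_like(W.data)   # keep only the binary sparsity pattern
    return W
```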
Our choice of the overlapping subsets is not limited to this construction; however, we make this choice for simplicity and convenience. In practice, each local RQ minimization is accomplished by rewriting minimization problem (11) as a generalized eigenvalue problem of low dimension, as in (14), and solving for minimal eigenvector y₁ with a standard eigensolver. Note that here we use Q_j^ṽ to represent the matrices that span each subspace, V_j^ṽ, to distinguish them from the Q_j used in the initial guess section. We construct an n×(m_j+1) matrix, Q_j^ṽ, so that its columns are an orthogonal basis for subspace V_j^ṽ. To define Q_j^ṽ explicitly, first define vector v₀ by

(v₀)_i := ṽ_i if i ∉ W_j,  0 if i ∈ W_j   (27)

For each point p ∈ W_j, define canonical basis vectors ê_p. Then, we form Q_j^ṽ by appending these (m_j+1) vectors in a matrix of column vectors:

Q_j^ṽ = [v₀ ê_{p₁} · · · ê_{p_{m_j}}]   (28)
Figure 4. Graph of matrix A_G from Example 1, with n = 9, grouped into three overlapping subsets. Each box encloses a group of nodes in a respective subset.
where the sequence of points, {p_i}_{i=1}^{m_j}, is a list of all points within local subset W_j. This makes the columns of Q_j^ṽ orthogonal, a matrix that maps from R^{m_j+1} onto V_j^ṽ. For the 1D example, with W₂ = {3, 4, 5, 6, 7}, the operator is given by

Q₂^ṽ = [v₀ ê₃ ê₄ ê₅ ê₆ ê₇],  v₀ = [ṽ₁, ṽ₂, 0, 0, 0, 0, 0, ṽ₈, ṽ₉]ᵀ   (29)

Next, we compute A_j^ṽ ← (Q_j^ṽ)ᵀAQ_j^ṽ and B_j^ṽ ← (Q_j^ṽ)ᵀBQ_j^ṽ. Then (14) is solved with a standard eigensolver for y₁, which is normalized so that (y₁)₁ = 1. This normalization is the same as requiring w₀ = 1, which leaves all nodes outside of W_j unchanged by the update. The updated iterate is then given by w̃ ← Q_j^ṽ y₁.

Local subspace relaxation is summarized in the following algorithm.

Algorithm 2 Local subspace relaxation.
Function: ṽ ← LSR(A, B, ṽ, {W_j}_{j=1}^m).
Input: SPD matrices A and B, current approximation to the minimal eigenvector ṽ, and overlapping subset covering {W_j}_{j=1}^m.
Output: Updated iterate ṽ.
1. For j = 1, . . . , m, do the following:
 (a) Form Q_j^ṽ based on ṽ and W_j as in (28).
 (b) Form A_j^ṽ ← (Q_j^ṽ)ᵀAQ_j^ṽ and B_j^ṽ ← (Q_j^ṽ)ᵀBQ_j^ṽ.
 (c) Find y₁ by solving (14) via a standard eigensolver.
 (d) If w₀ ≠ 0, normalize so that (y₁)₁ = 1 and set ṽ ← Q_j^ṽ y₁.
2. Output ṽ.

Figure 5 shows how a single sweep of local subspace relaxation acts on a random initial guess for the 1D example. Although the guess is never really random in the actual algorithm, due to the initial guess development, we show this case so that it is clear how the algorithm behaves. This algorithm gives relaxed iterate w̃ local characteristics of the actual minimal eigenvector. For problems with a large number of nodes, the global characteristics of the iterate are far from those of the actual minimal eigenvector. This is where the coarse-grid iteration complements local subspace relaxation. When done in an alternating sequence, as in a standard multigrid method, the complementary processes achieve both local and global characteristics of the approximate minimal eigenvector, forming an eigensolver. Their explicit use is presented in the following section.
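A dense sketch of Algorithm 2, under the assumption that ṽ is nonzero outside each subset (so that Q_j^ṽ has full column rank and (14) is well posed):

```python
import numpy as np
from scipy.linalg import eigh

def lsr(A, B, v, subsets):
    """Algorithm 2: one multiplicative sweep of local subspace RQ relaxation."""
    n = A.shape[0]
    for W in subsets:
        idx = np.asarray(W)
        v0 = v.copy()
        v0[idx] = 0.0                                    # (27): freeze v outside W_j
        Q = np.hstack([v0[:, None], np.eye(n)[:, idx]])  # (28): orthogonal columns
        lam, Y = eigh(Q.T @ A @ Q, Q.T @ B @ Q)          # local problem (14)
        y1 = Y[:, 0]
        if abs(y1[0]) > 1e-12:            # Remark 2.2: skip the update if w0 = 0
            v = Q @ (y1 / y1[0])          # normalize so that (y1)_1 = w0 = 1
    return v
```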
Figure 5. A typical local subspace relaxation sweep on a random iterate for the 1D example with n = 9. The top left vector is the initial iterate, ṽ; the top right shows a subspace update on subset W₁; the bottom left shows a subsequent update over W₂; and the bottom right shows the final relaxed iterate w̃ after a subsequent subspace update over W₃.
3. GES-SA
Because GES-SA is a multilevel method, to describe it, we change to multilevel notation. Any symbol with subscript l refers to an object on grid l, with l = 1 the finest or original grid and l = L the coarsest. For example, the matrix associated with the problem on level l is denoted by A_l; in particular, A₁ = A, the matrix from our original problem. Interpolation from level l+1 to level l is denoted by P_{l+1}^l instead of P, and restriction from level l to level l+1 is denoted (P_{l+1}^l)ᵀ. The dimension of A_l is written n_l. Other level-l objects are denoted with a subscript and superscript l, as appropriate.

3.1. The full GES-SA algorithm
GES-SA performs multilevel cycles that are structured in a format similar to standard multigrid. The very first cycle of the full GES-SA algorithm differs from the subsequent cycles: on each level, the initial guess development given in Algorithm 1 is used in place of pre-relaxation. Subsequent cycles use the local subspace relaxation given in Algorithm 2. Coarse-grid updates are given by the process presented in Section 2.2. A typical GES-SA cycling scheme is illustrated in Figure 6.

Algorithm 3 Generalized eigensolver based on smoothed aggregation.
Function: ṽ_l ← GESSA(A_l, B_l, ν, η, κ, l).
Input: SPD matrices A_l and B_l, number of relaxations to perform ν, number of cycles η, number of coarse-grid problem iterations κ, and current level l.
Figure 6. Diagram of how V-cycles are done in GES-SA for κ = 1. We follow the diagram from left to right as the algorithm progresses. Gray dots represent the initial guess development phase of the algorithm, done only in the first cycle. Hollow dots represent solve steps done with a standard eigensolver on the coarsest eigenproblem. Black dots represent local subspace pre- and post-relaxation steps. A dot on top stands for a step on the finest grid and a dot on bottom stands for a step on the coarsest grid.
Output: Approximate minimal eigenvector ṽ_l of the level-l problem.
0. If no aggregation of Ω_{n_l} is provided, compute {A_j^l}_{j=1}^{m_l}. Also, if no overlapping subset covering is provided, compute {W_j^l}_{j=1}^{m_l}. Step 0 is only performed once per level.
1. For k = 1, . . . , η, do the following:
 (a) If k = 1, form an initial guess, ṽ_l ← IGD(A_l, B_l, {A_j^l}_{j=1}^{m_l}). Otherwise, pre-relax the current approximation with ν sweeps of LSR(A_l, B_l, ṽ_l, {W_j^l}_{j=1}^{m_l}).
 (b) Form P_{l+1}^l with SA based on ṽ_l and {A_j^l}_{j=1}^{m_l} as in (23).
 (c) Form matrices A_{l+1} ← (P_{l+1}^l)ᵀA_l P_{l+1}^l and B_{l+1} ← (P_{l+1}^l)ᵀB_l P_{l+1}^l.
 (d) If n_{l+1} is small enough, solve (14) for y₁ with a standard eigensolver, and set ṽ_{l+1} ← y₁. Else ṽ_{l+1} ← GESSA(A_{l+1}, B_{l+1}, ν, κ, κ, l+1).
 (e) Interpolate the coarse-grid minimization, ṽ_l ← P_{l+1}^l ṽ_{l+1}.
 (f) Post-relax the current approximation with ν sweeps of LSR(A_l, B_l, ṽ_l, {W_j^l}_{j=1}^{m_l}).
2. Output ṽ_l.

3.2. A qualitative comparison with RQMG
The GES-SA algorithm differs from RQMG [13] and algebraic versions of RQMG [14] in three main aspects. Iterations in RQMG are performed as corrections, whereas iterations in GES-SA are replacements, or updates. In terms of cost, cycles of RQMG are cheaper than those of GES-SA. For one, the updates of the hierarchy that GES-SA creates are not performed with each iteration of RQMG. Also, the version of GES-SA we present here uses block relaxation, compared with the point relaxation used by the RQMG methods in the literature.

Perhaps more significant is that, while the RQMG methods are supplied with a fixed hierarchy of interpolation operators, assumed to have good approximation for the minimal eigenvector, GES-SA starts with no multigrid hierarchy and creates one, changing the entries of the interpolation operators with each cycle. This is similar in spirit to running several initialization setup-phase cycles of the original, relaxation-based αSA. The GES-SA multigrid hierarchy is iteratively improved to have coarsening and good approximation properties tailored to the problem at hand.

These differences suggest that GES-SA or a similar adaptive process may also be used to initialize RQMG by supplying it with an initial hierarchy. Of even more appeal is that RQMG
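For orientation, here is a two-level rendition of Algorithm 3 with κ = 1 (the coarse problem is solved directly rather than by recursion), assembled from the sketches above; the paper's multilevel recursion and per-level hierarchy bookkeeping are elided, and the problem matrix is assumed diagonally scaled to unit diagonal.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh

def ges_sa_two_level(A, B, nu, eta, aggregates, subsets):
    """Two-level GES-SA cycling: IGD on the first cycle, LSR pre/post-relaxation,
    and a coarse-grid RQ update through the smoothed interpolation P of (23)."""
    n = A.shape[0]
    Agg = aggregation_matrix(aggregates, n)
    for cycle in range(eta):
        if cycle == 0:
            v = igd(A, B, aggregates)                 # step 1(a), first cycle only
        else:
            for _ in range(nu):
                v = lsr(A, B, v, subsets)             # step 1(a), pre-relaxation
        P = smoothed_interpolation(sp.csr_matrix(A), v, Agg)   # step 1(b)
        P = P.toarray()
        Ac, Bc = P.T @ A @ P, P.T @ B @ P             # step 1(c)
        lam, Y = eigh(Ac, Bc)                         # step 1(d): coarse solve
        v = P @ Y[:, 0]                               # step 1(e): interpolate back
        for _ in range(nu):
            v = lsr(A, B, v, subsets)                 # step 1(f), post-relaxation
    return v
```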
could be used in subsequent cycles to develop several eigenvectors at once, which is currently not a feature of GES-SA. This would be a useful approach to initialize linear solvers for system problems. This study does not quantitatively investigate the use of RQMG in the context of an adaptive process. These possible expansions of the current adaptive methodology are under consideration for our future research.

3.3. Simple adaptive linear solvers
Our primary purpose is to use GES-SA to create an adaptive linear SA solver for the problem Ax = b. We first consider problems that require only one near-kernel vector for a successful solver. Applications of GES-SA are repeated until the RQ improvement slows. This gives an approximate minimal eigenvector, ṽ. Then, the setup phase of SA is run to form a solver that accurately represents K = {ṽ}. This solver is tested on the homogeneous problem, Ax = 0. Section 4 presents results only for such one-vector solvers.

However, if the current one-vector solver is not adequate, then we must develop a vector that represents error that is algebraically smooth with respect to this solver. Currently, our approach is to use the general setup phase of αSA in [7] to develop a secondary component, k₂. (A study regarding RQ optimization approaches for computing these secondary components is underway.) Then, the setup phase of SA is run to form a solver that accurately represents K = {ṽ, k₂}. The updated solver is again tested on the homogeneous problem. If the updated solver is also inadequate, the αSA process can be repeated until an adequate solver is built.

4. NUMERICAL RESULTS
Many linear systems that come from the discretization of scalar partial differential equations (PDEs) are solved efficiently with SA using the vector of all ones as the near-kernel, and the resulting linear solvers have decent convergence rates. However, we present examples of matrices where the vector of all ones is not a near-kernel component, and using it as one with SA may not produce a linear solver with acceptable convergence rates. All the results in this section show the result of running one GES-SA V-cycle (η = 1, κ = 1) and ν = 2 post-relaxation steps. Our implementation of GES-SA is currently in MATLAB, and we therefore make no rigorous timing comparisons with competing eigensolvers. In further investigations, we intend to explore these details. The small eigenproblems involved in GES-SA were all solved using the eigs() function with flags set for real and symmetric matrices, which implements ARPACK [17] routines. No 2D problem used more than five iterations to solve the small eigenproblems; no 3D problem used more than 10 iterations.

Example 2
We present the random-signed discrete Laplacian. Consider the d-dimensional Poisson problem with Dirichlet boundary conditions:

−Δu = f in Ω = (0, 1)^d,  u = 0 on ∂Ω   (30)
We discretize (30) with both finite element spaces with nodal bases and second-order finite differences on equidistant rectilinear grids.
Either way we discretize the problem, we have a sparse n×n matrix Â. We then define the diagonal, random-signed matrix D± to have randomly assigned positive and negative ones for entries. Finally, we form the random-signed discrete Laplacian matrix A by

A ← D± Â D±   (31)
e(25) A Asymptotic convergence factor ≈ e(20) A
1/5 (32)
for the homogeneous problem, Ax = 0, starting with a random initial guess. Operator complexity is also reported for the linear solver that uses the vector developed with GES-SA. We use
Table I. Asymptotic convergence factors for the 2D and 3D finite difference (FD) and finite element (FE) versions of the random-signed Laplacian problem. Problem size
Levels
Ones
GES-SA
Eigen
Comp
2D, FE
81 729 6561 59 049
2 3 4 5
0.620 0.892 0.965 0.977
0.074 0.176 0.193 0.215
0.074 0.179 0.196 0.214
1.078 1.108 1.119 1.123
2D, FD
81 729 6561 59 049
2 3 4 5
0.849 0.947 0.962 0.978
0.219 0.294 0.306 0.312
0.219 0.290 0.305 0.312
1.317 1.357 1.348 1.342
3D, FE
729 19 683
2 3
0.598 0.934
0.114 0.188
0.111 0.189
1.054 1.112
3D, FD
729 19 683 64 000
2 3 4
0.825 0.944 0.961
0.289 0.360 0.418
0.292 0.358 0.413
1.389 1.495 1.511
Factors in the column labeled ‘ones’ correspond to solvers created using the vectors of all ones; factors in the ‘GES-SA’ column correspond to solvers that use our approximate minimal eigenvector computed with GES-SA; and factors in the ‘eigen’ column correspond to solvers that use the actual minimal eigenvector. The last column, ‘comp’, shows the operator complexity for all three types of solvers. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:249–269 DOI: 10.1002/nla
266
M. BREZINA ET AL.
the usual definition of operator complexity, L comp =
l=1 nz(Al )
(33)
nz(A1 )
where the function nz(M) is the number of nonzeros in sparse matrix M.

Both geometric and algebraic aggregations were used in our tests. For the finite element problems, we took advantage of knowing the geometry of the grid and formed aggregates that were blocks of 3^d nodes. For the finite difference problems, no geometric information was employed and aggregation was done algebraically, as in [6]. Small 2D examples of the difference between the two types of aggregations we used are shown in Figure 7. Algebraic aggregations were based on the strength-of-connection measure given in (3), with θ = 0.1.

Although it is not the primary purpose of this study, it is also interesting to view GES-SA as a stand-alone eigensolver. For the random-signed Laplacians, Table II shows how one GES-SA V-cycle with ν = 2 produces an approximate minimal eigenvector that is very close to the actual minimal eigenvector, in the sense that the relative error between the RQ and the minimal eigenvalue is small. All the results in this section are produced using only one GES-SA cycle. However, we do not believe that a decent approximate minimal eigenvector can be produced with one GES-SA cycle for general problems. Note that the relative error of one cycle tends to increase as h decreases, or as the discretization error decreases. For most problems, we anticipate having to do more GES-SA cycles to achieve an acceptable approximate minimal eigenvector.

Example 3
We also investigate GES-SA on 'shifted' Laplacian, or Helmholtz, problems to show the invariance of performance with respect to such shifts. Consider the d-dimensional Poisson problem with Dirichlet boundary conditions, shifted by a parameter, λ_s > 0:

−Δu − λ_s u = f in Ω = (0, 1)^d,  u = 0 on ∂Ω   (34)
Figure 7. Aggregation examples displayed for 2D test problems of low dimension. On the left is an aggregation formed with the geometric aggregation method used for the finite element problems; on the right is an aggregation formed with the algebraic aggregation method used for the finite difference problems. Black edges represent strong connections within the graph of matrix A_G; each gray box represents a separate aggregate that contains the nodes enclosed.
Table II. Relative errors between the RQ of the GES-SA approximate minimal eigenvector, ρ, and the minimal eigenvalue, λ₁, for 2D and 3D finite element and finite difference versions of Example 2.

         Problem size   Levels   ρ           λ₁          Relative error
2D, FE   81             2        7.222e−02   7.222e−02   0.0000034
         729            3        9.413e−03   9.412e−03   0.0001608
         6561           4        1.101e−03   1.100e−03   0.0002491
         59 049         5        1.243e−04   1.243e−04   0.0001224
2D, FD   81             2        4.895e−02   4.894e−02   0.0000582
         729            3        6.307e−03   6.288e−03   0.0031257
         6561           4        7.501e−04   7.338e−04   0.0222547
         59 049         5        9.306e−05   8.289e−05   0.1227465
3D, FE   729            2        1.066e−01   1.066e−01   0.0000017
         19 683         3        1.412e−02   1.409e−02   0.0022805
3D, FD   729            2        4.896e−02   4.894e−02   0.0003230
         19 683         3        6.303e−03   6.288e−03   0.0024756
         64 000         4        2.981e−03   2.934e−03   0.0158771
Here, λ_s is chosen to make the continuous problem nearly singular. The minimal eigenvalue of the Laplacian operator on (0, 1)^d is dπ². Therefore, setting

λ_s = (1 − 10⁻ˢ) dπ²   (35)

for an integer s > 0 makes the shifted operator (−Δ − λ_s) have a minimal eigenvalue of λ₁ = 10⁻ˢ dπ². Here, we consider the d = 2 and 3 cases for various shifts λ_s. We discretized the 2D case with nodal bilinear functions on square elements, with h = 1/244. This gave us a system with n = 59 049 degrees of freedom. All aggregations done in these tests were geometric, and aggregate diameters were never greater than 3. For each shift, the solvers we developed (using both GES-SA and the actual minimal eigenvector) have operator complexity 1.119 and five levels with 59 049, 6561, 729, 81, and 9 degrees of freedom on each respective level. Similarly, the 3D case was discretized with nodal trilinear functions on cube elements with h = 1/37. This gave us a system with n = 46 656 degrees of freedom. Again, for each shift the solvers have operator complexity 1.033 and four levels with 46 656, 1728, 64, and 8 degrees of freedom on each respective level. In either case, the minimal eigenvalue of the discretized matrix A is λ₁ ≈ 10⁻ˢ dπ² h^d.

For all cases, we produced two SA solvers: the first solver was based on the actual minimal eigenvector of A and the second was based on the approximation to the minimal eigenvector created by one cycle of GES-SA. In Table III, we show asymptotic convergence factors (32) for these solvers for 2D and 3D and specific shift parameters. We assume that prolongation P from the first coarse grid to the fine grid satisfies the weak approximation property with constant

C := sup_{u∈R^{n_f}} min_{v∈R^{n_c}} (‖u − Pv‖₂² ‖A‖₂) / (Au, u)   (36)
Table III. Asymptotic convergence factors and measures of approximation for Example 3.

2D, FE (n = 59 049)   s = 1      s = 2      s = 3      s = 4      s = 5
  λ₁                  3.32e−05   3.32e−06   3.36e−07   3.77e−08   7.90e−09
  ρ                   3.32e−05   3.37e−06   3.88e−07   9.11e−08   6.03e−08
  eigen               0.196      0.198      0.198      0.199      0.197
  GES-SA              0.197      0.197      0.196      0.199      0.430
  M₁(P)               1.14e−05   1.13e−04   1.11e−03   1.01e−02   4.83e−02
  M₂(P)               9.45e−11   9.37e−11   9.36e−11   9.54e−11   9.54e−11

3D, FE (n = 46 656)   s = 1      s = 2      s = 3      s = 4      s = 5
  λ₁                  5.86e−05   6.17e−06   9.32e−07   4.08e−07   3.56e−07
  ρ                   5.88e−05   6.30e−06   1.06e−06   5.40e−07   4.86e−07
  eigen               0.187      0.187      0.190      0.188      0.183
  GES-SA              0.188      0.185      0.188      0.187      0.185
  M₁(P)               7.07e−05   6.67e−04   4.43e−03   1.04e−02   1.18e−02
  M₂(P)               3.85e−08   3.83e−08   3.84e−08   3.94e−08   3.91e−08

The s values in the columns give shift sizes λ_s as in (35). The first block of rows is for the 2D problem, the second is for the 3D problem. The rows labeled 'λ₁' show the minimal eigenvalue for the specific discrete problem, and those labeled 'ρ' show the RQs of the GES-SA vectors. Rows labeled 'eigen' show convergence factors for solvers based on the actual minimal eigenvector. Rows labeled 'GES-SA' show convergence factors for solvers based on the approximation to the minimal eigenvector given by one GES-SA cycle. The measures of approximation, M₁(P) and M₂(P), are in the rows with the respective labels.
Based on the knowledge that A comes from a scalar PDE, we further assume that it is most essential to approximate a minimal eigenvector, u₁. The denominator, (Au, u), is smallest for this vector, and other vectors that have comparable denominators are locally well represented by u₁. Under these assumptions, we feel it is insightful to monitor the following measure of approximation for any P that we develop:

M₁(P) := min_{v∈R^{n_c}} (‖u₁ − Pv‖₂² ‖A‖₂) / (Au₁, u₁)   (37)
where u₁ is the minimal eigenvector of A. Note that this is a lower bound: M₁(P) ≤ C. We compute min_{v∈R^{n_c}} ‖u₁ − Pv‖₂ by directly projecting u₁ onto the range of P, a computationally costly operation that is merely a tool for analyzing test problems. Table III reports M₁(P) on the finest grid for the P developed using the GES-SA method. As s increases, and the problem becomes more ill-conditioned, we see an increase in M₁(P) and eventually a degradation in the convergence factors for the 2D linear solvers that GES-SA produced.

We wish to investigate whether the degradation in the 2D GES-SA solver is due to GES-SA performing worse for the more ill-conditioned problems, or to the approximation requirements getting stricter. To this purpose, we monitor a second measure of approximation:

M₂(P) := min_{v∈R^{n_c}} ‖u₁ − Pv‖₂² / ‖u₁‖₂²   (38)
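Both measures reduce to one least-squares projection of u₁ onto range(P); a dense sketch (names ours):

```python
import numpy as np

def approximation_measures(A, P, u1):
    """M1(P) of (37) and M2(P) of (38); the minimizing v is the least-squares
    solution of P v ~ u1. Costly, but only a diagnostic, as noted in the text."""
    v, *_ = np.linalg.lstsq(P, u1, rcond=None)
    r2 = float(np.sum((u1 - P @ v) ** 2))   # min ||u1 - Pv||_2^2
    A2 = np.linalg.norm(A, 2)               # ||A||_2, dense spectral norm
    M1 = r2 * A2 / float(u1 @ (A @ u1))
    M2 = r2 / float(u1 @ u1)
    return M1, M2
```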
Again, this measure is shown in Table III for each problem. As s increases, we see that M₂(P) is essentially constant for the linear solvers that GES-SA produced, with fixed computation, indicating that the degradation is only due to the approximation requirements getting stricter.
5. CONCLUSION
This paper develops a multilevel eigensolver, GES-SA, in the SA framework for the specific application of enhancing the robustness of current adaptive linear SA solvers. We show preliminary numerical results that support approximate eigensolvers as potentially useful for initialization within the adaptive AMG process. This paper serves as a proof of concept and, due to our high-level implementation, we are not making claims about the efficiency of this algorithm versus the purely relaxation-based initialization given in [7]. This question will be investigated as we begin incorporating eigensolvers into our low-level adaptive software.
ACKNOWLEDGEMENTS
The work of the last author was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract W-7405-Eng-48.

REFERENCES
1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 19:23–56.
2. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, U.K., 1984.
3. Briggs W, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
4. Ruge J, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, vol. 5, McCormick SF (ed.). SIAM: Philadelphia, PA, 1986.
5. Trottenberg U, Oosterlee CW, Schüller A. Multigrid (Appendix A: An Introduction to Algebraic Multigrid, by K. Stüben). Academic Press: New York, 2000.
6. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 1996; 56:179–196.
7. Brezina M, Falgout R, MacLachlan S, Manteuffel T, McCormick S, Ruge J. Adaptive smoothed aggregation (αSA). SIAM Journal on Scientific Computing 2004; 25:1896–1920.
8. McCormick SF, Ruge J. Multigrid methods for variational problems. SIAM Journal on Numerical Analysis 1982; 19:925–929.
9. Brezina M. Robust iterative methods on unstructured meshes. Ph.D. Thesis, University of Colorado, Denver, CO, 1997.
10. Ruge J. Multigrid methods for variational and differential eigenvalue problems and unigrid for multigrid simulation. Ph.D. Thesis, Colorado State University, Fort Collins, CO, 1981.
11. Hetmaniuk U, Lehoucq RB. Multilevel methods for eigenspace computations in structural dynamics. In Domain Decomposition Methods in Science and Engineering, Lecture Notes in Computational Science and Engineering, vol. 55. Springer: Berlin, 2007; 103–114.
12. Neymeyr K. Solving mesh eigenproblems with multigrid efficiency. In Numerical Methods for Scientific Computing, Variational Problems and Applications, Kuznetsov Y, Neittaanmäki P, Pironneau O (eds). Wiley: Chichester, U.K., 2003.
13. Cai Z, Mandel J, McCormick SF. Multigrid methods for nearly singular linear equations and eigenvalue problems. SIAM Journal on Numerical Analysis 1997; 34:178–200.
14. Hetmaniuk U. A Rayleigh quotient minimization algorithm based on algebraic multigrid. Numerical Linear Algebra with Applications 2007; 14:563–580.
15. Vaněk P, Brezina M, Mandel J. Convergence of algebraic multigrid based on smoothed aggregation. Numerische Mathematik 2001; 88:559–579.
16. Chan TF, Sharapov I. Subspace correction multi-level methods for elliptic eigenvalue problems. Numerical Linear Algebra with Applications 2002; 9:1–20.
17. Lehoucq RB, Sorensen DC, Yang C. ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM: Philadelphia, PA, 1998.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:271–289 Published online 7 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.566
Domain decomposition preconditioners for elliptic equations with jump coefficients Yunrong Zhu∗, † Department of Mathematics, Pennsylvania State University, University Park, PA 16802, U.S.A.
SUMMARY This paper provides a proof of the robustness of the overlapping domain decomposition preconditioners for the linear finite element approximation of second-order elliptic boundary value problems with strongly discontinuous coefficients. By analyzing the eigenvalue distribution of the domain decomposition preconditioned system, we prove that only a small number of eigenvalues may deteriorate with respect to the discontinuous jump or mesh size, and all the other eigenvalues are bounded below and above nearly uniformly with respect to the jump and mesh size. As a result, we prove that the convergence rate of the preconditioned conjugate gradient methods is nearly uniform with respect to the large jump and mesh size. Copyright © 2008 John Wiley & Sons, Ltd. Received 19 May 2007; Accepted 1 November 2007
KEY WORDS:
jump coefficients; domain decomposition; conjugate gradient; effective condition number
∗Correspondence to: Yunrong Zhu, Department of Mathematics, Pennsylvania State University, University Park, PA 16802, U.S.A.
†E-mail: zhu [email protected], [email protected]
Contract/grant sponsor: NSF; contract/grant number: DMS-0609727
Contract/grant sponsor: NSFC; contract/grant number: 10528102
Contract/grant sponsor: Center for Computational Mathematics and Applications

1. INTRODUCTION
In this paper, we will discuss the overlapping domain decomposition preconditioned conjugate gradient (PCG) methods for the linear finite element approximation of the second-order elliptic
boundary value problem

−∇·(ω∇u) = f in Ω,  u = g_D on Γ_D,  ∂u/∂n = g_N on Γ_N   (1)
where Ω ⊂ R^d (d = 2 or 3) is a polygonal or polyhedral domain with Dirichlet boundary Γ_D and Neumann boundary Γ_N. The coefficient ω = ω(x) is a positive and piecewise constant function. More precisely, we assume that there are M open disjoint polygonal or polyhedral subregions Ω_m^0 (m = 1, . . . , M) satisfying ⋃_{m=1}^M Ω̄_m^0 = Ω̄ with

ω|_{Ω_m^0} = ω_m,  m = 1, . . . , M

where each ω_m > 0 is a constant. The analysis can be carried through to the more general case when ω(x) varies moderately in each subregion. We assume that the subregions {Ω_m^0 : m = 1, . . . , M} are given and fixed but may possibly have complicated geometry. We are concerned with the robustness of the PCG method with regard to both the fineness of the discretization of the overall problem and the severity of the discontinuities in ω.

This model problem is relevant to many applications, such as groundwater flow [1, 2], fluid pressure prediction [3], electromagnetics [4], semiconductor modeling [5], electrical power network modeling [6] and fuel cell modeling [7, 8], where the coefficients have large discontinuities across interfaces between regions with different material properties. When the above problem is discretized by the finite element method, for example, the conditioning of the resulting discrete system will depend on both the (discontinuous) coefficients and the mesh size. There has been much interest in the development of iterative methods (such as domain decomposition and multigrid methods) whose convergence rates are robust with respect to changes in the jump size and mesh size (see [9–14] and the references cited therein). In two dimensions, it is not too difficult to see that both domain decomposition [15–18] and multigrid [14, 19, 20] methods lead to robust iterative methods. In three dimensions, some nonoverlapping domain decomposition methods have been shown to be robust with respect to both the jump size and mesh size (see [12, 14, 21, 22]). As was pointed out in [20, Remark 6.3], in some circumstances the deterioration is not significantly severe. In fact, using the estimates related to weighted L²-projection in [23], it can be proved that κ(BA) ≤ C|log H| in some cases for d = 3, where H is the mesh size of the coarse space. For example, if the interface has no cross points, or if every subdomain touches part of the Dirichlet boundary [23–25], or if the coefficients satisfy quasi-monotonicity (cf. [26, 27]), then the multilevel or domain decomposition method was proved to be robust. However, in general, the situation for overlapping domain decomposition and multilevel methods is still unclear. Technically, the difficulty is due to the lack of uniform or nearly uniform error and stability estimates for the weighted L²-projection, as demonstrated in [24, 28].

Recently [29, 30], we have proved that both the BPX and the multigrid V-cycle preconditioners lead to nearly uniformly convergent PCG methods for the finite element approximations of (1), although the resulting condition numbers can deteriorate severely, as mentioned above. Our work was motivated by the work of Graham and Hagger [31]. In their work, they proved that a simple diagonal scaling would lead to a preconditioned system that only has a fixed number of
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
273
small eigenvalues, which are severely infected by the discontinuous jumps. More precisely, they proved that the ratio of the extreme values of the remaining eigenvalues, the effective condition number (cf. [30]), can be bounded by Ch −2 where C is a constant independent of the coefficients and mesh size. The aim of this paper is to provide a rigorous proof of the robustness of the overlapping domain decomposition preconditioners. As in [30], the main idea is to analyze the eigenvalue distribution of the preconditioned systems and to prove that except for a few ‘bad’ eigenvalues, the effective condition numbers are bounded uniformly with respect to the jump and logarithmically with respect to the mesh size. Thanks to a standard theory for the conjugate gradient method (see [31–33]), these small eigenvalues will not deteriorate the efficiency of the methodsignificantly. More specific, the asymptotic convergent rate of the PCG method will be 1−2/(C | log H |+1), which is uniform with respect to the size of discontinuous jump. When d = 3 if each subregion 0m (m = 1, . . . , M) is assumed to be a polyhedral domain with each edge length of size H0 , then the effective condition number of BA can be bounded by C (1+log H0 /H ). Consequently, the asymptotic convergence rate of the corresponding PCG algorithm is 1−2/(C 1+log H0 /H +1). In particular, if the coarse grid satisfies H H0 , then the asymptotic convergence rate of the PCG algorithm is bounded uniformly. The rest of the paper is organized as follows. In Section 2, we introduce some basic notation, the PCG algorithm and some theoretical foundations. In Section 3, we quote some main results on the weighted L 2 -projection from [23]. We also consider the approximation property and stability of weighted L 2 -projection in some special cases mentioned above. In Section 4, we analyze the eigenvalue distribution of the domain decomposition preconditioned system and prove the convergence rate of the PCG algorithm. In Section 5, we give some conclusion remarks. Following [20], we will use the following short notation: x y means xC y; xy means xcy and x y means cxyC x, where c and C are generic positive constants independent of the variables in the inequalities and any other parameters related to mesh, space and especially the coefficients.
2. PRELIMINARY 2.1. Notation We introduce the bilinear form a(u, v) =
M m=1
m (∇u, ∇v) L 2 (0 ) m
∀u, v ∈ HD1 ()
where HD1 () = {v ∈ H 1 () : v|D = 0} and introduce the H 1 -norm and seminorm with respect to any subregion 0m by |u|1,0 = ∇u0,0 , m
m
u1,0 = (u20,0 +|u|21,0 )1/2 m
m
m
Thus, a(u, u) =
M m=1
Copyright q
2008 John Wiley & Sons, Ltd.
m |u|21,0 := |u|21, m
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
274
Y. ZHU
We also need the weighted L 2 -inner product (u, v)0, =
M m=1
m (u, v) L 2 (0 ) m
and the weighted L 2 - and H 1 -norms 1/2
u0, = (u, u)0, ,
u1, = (u20, +|u|21, )1/2
For any subset O ⊂ , we denote |u|1,,O and u0,,O be the restrictions of |u|1, and u0, on the subset O, respectively. For the distribution of the coefficients, we introduce the index set I = {m : meas(* 0m ∩D ) = 0} where meas(·) is the d −1 measure, in other words, I is the index set of all subregions which do not touch the Dirichlet boundary. We assume that the cardinality of I is m 0 . We shall emphasize that m 0 is a constant that depends only on the distribution of the coefficients. 2.2. The discrete system Given a quasi-uniform triangulation Th with the mesh size h, let Vh = {v ∈ HD1 () : v| ∈ P1 (), ∀ ∈ Th } be the piecewise linear finite element space, where P1 denotes the set of linear polynomials. The finite element approximation of (1) is the function u ∈ Vh , such that gN v ∀v ∈ Vh a(u, v) = ( f, v)+ N
We define a linear symmetric positive definite (SPD) operator A : Vh → Vh by (Au, v)0, = a(u, v) The related inner product and the induced energy norm are denoted by (·, ·) A := a(·, ·), · A := a(·, ·) Then we have the following operator equation: Au = F (2) where F ∈ L 2 () such that (F, v)0, = ( f, v)+ N gN v, ∀v ∈ Vh . The space Vh has a natural n nodal basis {i }i=1 such that i (x j ) = i j for each non-Dirichlet boundary node x j . By means of these nodal basis functions, (2) can be reduced to the following linear algebra equation: A = b (3) where A = (ai j )n×n , with ai j = a(i , j ) = ∇i ·∇ j is the stiffness matrix and b = (b1 , . . . , bn ) ∈ Rn such that bi = ( f, i )+ N gN i . In this algebraic form, we shall also need the discrete weighted 2 inner product corresponding to the weighted L 2 -inner product. Let , ∈ Rn Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
275
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
be the vector representation of u, v ∈ Vh , respectively, i.e. u = Define (, )2 , =
n
n
i=1 i i
and v =
n
i=1 i i .
¯ i i i
i=1
where ¯ j = o j /|o j | is the average of the coefficient on the local patch o j = supp( j ). By definition and quasi-uniformity, we can easily see that h d (, )2 , u20, Let (A) be the condition number of A, i.e. the ratio between the largest and the smallest eigenvalues. By the standard finite element theory (cf. [14]), it is apparent that maxm m (A) = (A) h −2 J() with J() = minm m 2.3. PCG methods The well-known conjugate gradient method is the basis of all the preconditioning techniques to be studied in this paper. The PCG methods can be viewed as a conjugate gradient method applied to the preconditioned system BAu = BF Here, B is an SPD operator, known as a preconditioner of A. Note that BA is symmetric with respect to the inner product (·, ·) B −1 (or (·, ·) A ). For the implementation of the PCG algorithm, we refer to the monographs [34–36]. Let u k , k = 0, 1, 2, . . . , be the solution sequence of the PCG algorithm. It is well known that √ k (BA)−1 u −u k A 2 √ u −u 0 A (4) (BA)+1 which implies that the PCG method generally converges faster with a smaller condition number. Even though the estimate given in (4) is sufficient for many applications, in general, it is not sharp. One way to improve the estimate is to look at the eigenvalue distribution of BA (see [31–33, 37] for more details). More specifically, suppose that we can divide (BA), the spectrum of BA, into two sets, 0 (BA) and 1 (BA), where 0 consists of all ‘bad’ eigenvalues and the remaining eigenvalues in 1 are bounded above and below, then we have the following theorem. Theorem 2.1 (Axelsson [32] and Xu [33]) Suppose that (BA) = 0 (BA)∪1 (BA) such that there are m elements in 0 (BA) and ∈ [a, b] for each ∈ 1 (BA). Then k−m √ b/a −1 u −u 0 A (5) u −u k A 2K √ b/a +1 where
K = max 1− ∈1 (BA) ∈0 (BA)
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
276
Y. ZHU
If there are only m small eigenvalues in 0 , say 0< 1 2 · · · m m+1 · · · n then K=
m m 1− n n −1 = ((BA)−1)m i
i=1
1
In this case, the convergence rate estimate (5) becomes k−m √ b/a −1 u −u k A m 2((BA)−1) √ u −u 0 A b/a +1
(6)
Based on (6), given a tolerance 0< <1, the number of iterations of the PCG algorithm needed for u −u k A /u −u 0 A < is given by
2 +m log((BA)−1) c0 km + log (7)
√ √ where c0 = log( b/a +1)/( b/a −1). More detailed discussions on the iteration number of PCG methods can be found in [32, 38]. Observing the convergence estimate (6), if there are only a few small eigenvalues of BA in√ 0 (BA), then √ the convergent rate of the PCG methods will be dominated by the factor ( b/a −1)/( b/a +1), i.e. by b/a where b = n (BA) and a = m+1 (BA). We define this quantity as the ‘effective condition number.’ Definition 2.2 (Xu and Zhu [30]) Let V be a Hilbert space. The mth effective condition number of an operator A : V → V is defined by m+1 (A) =
max (A) m+1 (A)
where m+1 (A) is the (m +1)th minimal eigenvalue of A. To estimate the effective condition number, we need to estimate m+1 (A). A fundamental tool is the Courant-Fisher ‘minimax’ principle (see, e.g. [34]): Lemma 2.3 Let V be an n-dimensional Hilbert space and A : V → V is an SPD operator on V. Suppose that 1 2 · · · n are the eigenvalues of A, then (Av, v) dim(S)=m 0 =v∈S ⊥ (v, v)
m+1 (A) = max
min
for m = 1, 2, . . . , n. Especially, for any subspace V0 ⊂ V with dim(V0 ) = n −m, the following estimation of m+1 (A) holds: (Av, v) 0 =v∈V0 (v, v)
m+1 (A) min Copyright q
2008 John Wiley & Sons, Ltd.
(8)
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
277
Inequality (8) is the starting point for our analysis of eigenvalue distribution. It enables us to obtain a lower bound of every eigenvalue if we can estimate min0 =v∈V0 (Av, v)/(v, v) for some suitable subspace V0 . 3. WEIGHTED L 2 -PROJECTION Similar to [30], a major tool to analyze the overlapping domain decomposition preconditioner is 2 the weighted L 2 -projection Q H : L () → VH defined by (Q H u, v H )0, = (u, v H )0,
∀v H ∈ VH
In this section, we shall recall some main results on weighted L 2 -projection from [23]. Most of the results in this section can also be found in [30]. Lemma 3.1 (Bramble and Xu [23]) Let VH ⊂ Vh be two nested linear finite element spaces. Then for any u ∈ Vh , there (I − Q H )u0, cd (h, H )H |u|1, and |Q H u|1, cd (h, H )|u|1, where
⎧ H 1/2 ⎪ ⎪ ⎪ ⎨ log h cd (h, H ) = C · ⎪ ⎪ H 1/2 ⎪ ⎩ h
if d = 2 if d = 3
The proof of this lemma is based on the properties of the standard interpolation operator and Sobolev imbedding theorem (for details, see [23]). The above lemma is not necessary true for general H 1 () function. However, if we use the full weighted H 1 -norm, then we have Lemma 3.2 (Bramble and Xu [23]) For all u ∈ HD1 (), we have 1/2 u1, (I − Q H )u0, H | log H |
In general, we cannot replace u1, by the semi-norm |u|1, in the above lemma. For this 1 () of H 1 () as purpose, u ∈ HD1 () must satisfy certain condition. We introduce a subspace H D D follows: 1 1 v dx = 0, ∀m ∈ I HD () = v ∈ HD () : 0m
An important feature of this subspace is that the Poincar´e–Friedrichs inequality holds v0, |v|1, Copyright q
2008 John Wiley & Sons, Ltd.
D1 () ∀v ∈ H
(9)
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
278
Y. ZHU
Remark 3.3 The condition 0 v = 0 is not essential. The main idea is to introduce a subspace such that the m Poincar´e–Friedrichs inequality (9) holds. It can be replaced by some other conditions. For example, we can use v dx = 0, Fm ⊂ *0m and meas(Fm )>0 Fm
0m
for each such that m ∈ I. In this case, the Poincar´e–Friedrichs inequality (9) is still true (see [11, 14] for more details). Thanks to inequality (9), we have the following estimates for the weighted L 2 -projection: Lemma 3.4 1 () we have For any v ∈ H D 1/2 |v|1, (I − Q H )v0, H | log H |
(10)
1/2 |Q |v|1, H v|1, | log H |
(11)
and
Proof From the assumption, v satisfies the Poincar´e–Friedrichs inequality (9). Inequality (10) then follows by Lemma 3.2. The proof of inequality (11) relies on (10) and the local L 2 projection Q : L 2 () → P1 () defined by (Q u, ) = (u, ) for all ∈ P1 (). Then on each element ∈ TH , we have 2 2 2 |Q H v|1, |Q H v − Q v|1, +|Q v|1, 2 2 H −2 Q H v − Q v0, +|Q v|1, 2 2 2 H −2 (v − Q H v0, +v − Q v0, )+|Q v|1, 2 2 H −2 v − Q H v0, +|v|1,
In the last inequality, we used the stability and approximation properties of Q , see [23, Lemma 3.3]. By multiplying suitable weights and summing up over all ∈ TH on both sides, we obtain 2 −2 2 2 2 |Q H v|1, h v − Q H v0, +|v|1, | log H ||v|1,
In the last step, we used inequality (10).
Although it is true for d = 2 or 3, Lemma 3.4 is of interest only when d = 3. When d = 2, Lemma 3.1 is sufficient for our future use. From Lemma 3.4, the approximation and stability of the weighted L 2 -projection will deteriorate by | log H |. A sharper estimate can be obtained if we assume that each subregion 0m is a polyhedral domain with each edge of length H0 . Lemma 3.5 (Bramble and Xu [23]) Assume G is a polyhedral domain in R3 . Then v L 2 (E) | log h|1/2 v1,G
∀v ∈ Vh (G)
where E is any edge of G. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
279
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
1 (), we have By the Poincar´e–Friedrichs inequality (9), for each v ∈ H D v1,0 |v|1,0 m
for all 0m (m = 1, . . . , M)
m
Then by Lemma 3.5 and a standard scaling argument, H0 1/2 v L 2 (E) log |v|1,0 m H
D1 () ∀v ∈ VH (0m )∩ H
(12)
In this case, we can obtain the following approximation and stability properties for the weighted L 2 -projection: Lemma 3.6 In R3 , assume that each subregion 0m , (m = 1, . . . , M) satisfies H0 length(E) for each edge E 1 (), we have of 0m . Then for all v ∈ H D (I − Q H )v0, H
H0 log H
1/2 |v|1,
(13)
and |Q H v|1,
H0 log H
1/2 |v|1,
(14)
Proof Define w ∈ VH by
w=
⎧ w ⎪ ⎪ ⎨ m ⎪ ⎪ ⎩
at the nodes inside 0m
QFu
at the nodes inside F ⊂ *0m
0
at the nodes elsewhere
where wm = Q H v is the standard L 2 -projection of v, F ⊂ *0m is any face of 0m , and Q F : L 2 (F) → VH (F) is the orthogonal L 2 (F) projection. Then w −wm 2L 2 (0 ) H 3 m
F⊂*0m
x∈F
H3
H
(w −wm )2 (x)
x∈*0m
3
F∈*0m
F∈*0m
Copyright q
2008 John Wiley & Sons, Ltd.
(w −wm )2 (x)
x∈F
(wm − Q F u) (x)+ 2
x∈* F
2 wm (x)
(H wm − Q F u2L 2 (F) + H 2 wm 2L 2 (* F) ) Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
280
Y. ZHU
We need to bound two terms appearing in the last expression. For the first term, we have H wm − Q F u2L 2 (F) H u −wm 2L 2 (*0 ) m
F∈*0m
u −wm 2L 2 (0 ) + H 2 u −wm 21,0 m
m
H 2 u21,0
m
In the second step, we used inequality v L 2 (*0 ) −1 v0,0 + v1,0 m
m
(15)
m
The second term can be bounded by using inequality (12) H0 H0 |wm |21,0 H 2 log |u|21,0 H 2 wm 2L 2 (* F) H 2 log m m H H 0 F∈* m
In the last step, we used the stability of Q H : |wm |1,0 = |Q H u|1,0 |u|1,0 . Consequently, m
w −wm 0,0 H log m
H0 H
m
m
1/2 |u|1,0
m
This proves (13). The proof of the stability (14) is the same as in Lemma 3.4.
Remark 3.7 D (), we have In addition to the condition in Lemma 3.6, if H H0 then for all v ∈ H (I − Q w H )0,w H |v|1,w
(16)
|Q w H v|1,w |v|1,w
(17)
In fact, in this case, obviously inequality (12) becomes D1 () v L 2 (E) |v|1,0 , ∀ v ∈ VH (0m ) ∩ H m
Then inequalities (16) and (17) follows by the same proof as Lemma 3.6.
4. OVERLAPPING DOMAIN DECOMPOSITION METHODS In this section, we consider the two level overlapping domain decomposition methods. Specifically, there is a fine grid Th with mesh size h as described in Section 2.2, on which the solution is sought. There is also a coarse grid TH with mesh size H. For simplicity, we assume that each element in TH is a union of some elements in Th , and we also assume that TH aligns with the jump interface. Let V := Vh and V0 := VH be the piecewise linear continuous finite element spaces on Th and TH , respectively. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
281
We partition the domain into L nonoverlapping subdomains l (l = 1, . . . , L), such that = L l=1 l . Enlarge each subdomain l to l in such a way that the restriction of triangulation Th on l is also a triangulation of l itself, and l consists of all points in within a distance of CH from l . Here, we make no assumption on the relationship between this partition and the jump regions 0m (m = 1, . . . , M). Based on the partition, a natural decomposition of the finite element space V is V=
L
Vl
l=1
where Vl := {v ∈ V : v = 0 in \l }
As usual, we introduce the coarse space V0 to provide the global coupling between subdomains. Obviously, we have the space decomposition V=
L
Vl
l=0
For each l = 0, 1, . . . , L , we define the projections Pl , Q l : V → Vl by (Q l u, vl )0, = (u, vl )0, ∀vl ∈ Vl
a(Pl u, vl ) = a(u, vl ), and define the operator Al : Vl → Vl by
(Al u l , vl )0, = a(u l , vl )
∀u l , vl ∈ Vl
For convenience, we denote A = A L and Q −1 = 0. It follows from the definitions that Q l A = Al Pl
Q l Q k = Q k Ql = Q k
and
for kl
The additive Schwarz preconditioner is defined by B=
L l=0
Al−1 Q l
(18)
Obviously, we have BA =
L l=0
Al−1 Q l A =
L
Pl
l=0
4.1. Relation between additive Schwarz and diagonal scaling In [31], it was proved that the additive Schwarz preconditioner and diagonal scaling (Jacobi preconditioner) have the following relationship: Theorem 4.1 ([31]) There exist constants C1 1 and C2 >0 that depend only on the connectivity of the mesh such that, for all k = 1, . . . , n, k (D −1 A)C1 k (BA)C2 Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
282
Y. ZHU
By using this theorem, we have m 0 +1 (BA) m 0 +1 (D −1 A)h 2 From this relationship, we can see that the m 0 th effective condition number m 0 +1 (BA) h −2 is independent of the coefficients. We refer to [30] for a simple analytic proof of this fact. However, this estimate is too rough. It was pointed out that m 0 +1 (BA) could be much better than this estimate, but no rigorous proof was given in [31]. In the following subsection, we analyze the eigenvalue distribution of BA and prove the robustness of the additive Schwarz preconditioner. 4.2. Eigenvalue analysis of BA By a standard coloring technique [39, 40], we can easily prove max (BA)C where C is independent of the mesh and coefficient. The analysis of the lower bound of eigenvalues relies on certain stable decomposition. of V by 1 () in Section 3, we introduce a subspace V Similar to H D 1 V := HD ()∩V = v ∈ V :
m
v = 0, for m ∈ I
⊥
) = m 0 and the Poincar´e–Friedrichs inequality (9) holds for We shall emphasis here that dim(V Then we have the following stable decomposition result: any v ∈ V. Lemma 4.2 L For any v ∈ V, there exist vl ∈ Vl such that v = l=0 vl and L
a(vl , vl ) cd (h, H )2 a(v, v)
(19)
l=0
there exist vl ∈ Vl such that v = For any v ∈ V, L
L
l=0 vl
and
a(vl , vl ) | log H |a(v, v)
(20)
l=0
Furthermore, if each subdomain 0m satisfies length(E) H0 for any edge E of 0m , then for any there exist vl ∈ Vl such that v = L vl and v ∈ V, l=0 H0 a(v, v) a(vl , vl ) 1+log H l=0 L
In particular, in this case if the coarse grid satisfies H H0 the
(21) L
l=0 a(vl , vl ) a(v, v).
Proof The ideas to prove inequality (19)–(21) are the same. The main difference is that we use different properties of weighted L 2 -projection in Section 3. Here, we follow the idea from [20]. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
283
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM L be a partition of unity defined on satisfying Let {l }l=1
supp l ⊂ l ∪*,
L
l=1 l = 1
and for l = 1, 2, . . . , L ,
∇l ∞,l C H −1
0l 1,
Here ·∞,O denote the L ∞ -norm of a function defined on a subdomain O. L The construction of such a partition of unity is standard. A partition v = l=0 vl for vl ∈ Vl can then be obtained by taking v0 = Q v and 0 vl = Ih (l (v − Q 0 v)) ∈ Vl ,
l = 1, . . . , L
where Ih is the nodal value interpolant on V. From this decomposition, we prove that inequalities (19) and (20) hold. For any ∈ Th , note that h l −l, L ∞ () h∇l L ∞ () H Let w = v − Q 0 v, and by the inverse inequality |vl |1, |l, w|1, +|Ih (l −l, )w|1, |w|1, +h −1 Ih (l −l, )w0, It is easy to show that Ih (l −l, )w0,
h w0, H
Consequently, 1 w20, H2
|vl |21, |w|21, +
Summing over all ∈ Th ∩l with appropriate weights gives |vl |21, = |vl |21,,l |w|21,,l + and L l=1
L
L
1 w20,,l H2
1 w20,,l 2 H l=1 l=1 1 1 2 v| + v − Q v |v − Q 0, 0 1, 0 H2
a(vl , vl )
|vl |21,,l
|w|21,,l +
From the above inequality, for any v ∈ V, applying Lemma 3.1 we obtain inequality (19). Applying gives inequality (20), and applying Lemma 3.6 for any v ∈ V, we obtain Lemma 3.4 for v ∈ V inequality (21). This completes the proof. Theorem 4.3 For the additive Schwarz preconditioner B defined by (18), the eigenvalues of BA satisfies min (BA)cd (h, H )−2 , Copyright q
2008 John Wiley & Sons, Ltd.
m 0 +1 (BA)C|log H |−1
and max (BA)C
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
284
Y. ZHU
Moreover, when d = 3 and if each subregion 0m is a polyhedral domain with each edge of length H0 , then H0 −1 m 0 +1 (BA)C 1+log H Especially, if H H0 then m0+1 (B A)C. Proof L Pl , by a standard coloring argument, we have Note that BA = l=0 max (BA)C For the minimum eigenvalue, for any v ∈ V consider the decomposition v = Lemma 4.2. By the Schwarz inequality, we obtain a(v, v) =
L
a(vl , v) =
L l=0
1/2
a(vl , vl )
l=0
=
L
l=0 vl
as in
a(vl , Pl v)
l=0 L
L
L
1/2 a (Pl v, Pl v)
l=0
1/2 a(vl , vl )
(a (BAv, v))1/2
l=0
Followed by (19), we have a(v, v)cd (h, H )a(v, v)1/2 a(BAv, v)1/2
∀v ∈ V
This implies min (BA)cd (h, H )−2 On the other hand, by (20), we have a(v, v) | log H |1/2 a(v, v)1/2 a(BAv, v)1/2
∀v ∈ V
⊥ ) = m 0 , we obtain By Min–Max Lemma 2.3, and note that dim(V m 0 +1 (BA)| log H |−1 Similarly, from by (21) and Min–Max Lemma 2.3, H0 −1 m 0 +1 (BA)C 1+log H when each subregion satisfies length(E) H0 . This completes the proof.
Remark 4.4 Theorem 4.3 gives a direct proof of the robustness of overlapping domain decomposition preconditioner for the variable coefficient problem (1). That is, the preconditioned system has only m 0 small Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
285
eigenvalues, and the effective condition number is bounded by C| log H |, or C(1+log H0 /H ) if each subregion is a polyhedral domain with each edge of length H0 . Especially when H H0 , the effective condition number is bounded uniformly. The estimates of the maximum and minimum eigenvalues of BA are standard and can be found in many references (see, for example, [27, 39]). From the above theorem, we know that when d = 2, (BA)C(1+log H/ h) which is also quite robust. However, for the worst case in d = 3, we have (BA)C H/ h, which grows rapidly as h → 0. In this case, we have the following convergence estimate for the PCG algorithm. Theorem 4.5 In R3 , assume that each subregion 0m (m = 1, . . . , M) is a polyhedral domain with each edge of length H0 . Let u ∈ V be the exact solution to Equation (2) and {u k : k = 0, 1, 2, . . .} be the solution sequence of the PCG algorithm. Then we have m 0 u −u k A C0 H −1 2 k−m 0 for km 0 u −u 0 A h where = 1−2/(C 1+logH0 /H +1) < 1 and C0 , C are constants independent of coefficients and mesh size. Moreover, given a tolerance 0< <1, the number of iterations needed for u −u k A /u −u 0 A < satisfies
2 C0 H +m 0 log −1 | log( )| km 0 + log
h Especially, if H H0 then the asymptotic convergence rate of the PCG algorithm is uniform bounded with respect to both the coefficients and mesh size. Theorem 4.5 is a direct consequence of inequalities (6) and (7) and Theorem 4.3. Remark 4.6 From Theorem 4.5, although the convergence rate will deteriorate slightly by the condition number (BA), because m 0 is a fixed number, the asymptotic convergence rate can be bounded by < 1 which is uniform with respect to the coefficients and the mesh size. Without the assumption on the subdomains 0m (m = 1, . . . , M), Theorem 4.5 becomes k−m 0 m 0 u −u k A 2 C0 H −1 1− 2 for km 0 u −u 0 A h C1 | log H |+1 In this case, the number of iterations needed for u −u k A /u −u 0 A < with the given tolerance 0< <1 satisfies
2 C0 H km 0 + log +m 0 log −1 c0 (H )
h where
Copyright q
Cl | log H |+1 c0 (H ) = log Cl | log H |−1
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
286
Y. ZHU
Remark 4.7 By similar arguments, the results above can be generated to the inexact solver additive Schwarz preconditioners (cf. [39]) and also to the multilevel additive Schwarz preconditioners (cf. [41]). For the BPX preconditioner and the multigrid V -cycle preconditioner, similar results can be found in [30].
5. MATRIX REPRESENTATIONS So far, our analysis are based on the operator form (19). In this section, we are going to look at the algebraic representation of this preconditioner and to show that introducing the weighted L 2 projection Q l is for theoretical purpose only. That is, the matrix representation of B is independent of Q l . Let V be the finite element space, with the nodal basis {1 , . . . , n }. Then given any function v ∈ V, there exists a unique ∈ Rn such that v=
n
i i
i=1
Let v˜ = be the vector representation of v. Given two linear vector spaces V and W and a linear operator A ∈ L(V, W), the matrix representation of A with respect to a basis {1 , . . . , n } of V and a basis { 1 , . . . , k } of W is the matrix A˜ = (a˜ i j ) ∈ Rk×n satisfying A j =
k
a˜ i j i
for 1 jn
i=1
From the above definitions, it is easy to see that for any two operators A, B ∈ L(V) and v ∈ V, we have AB = A B and
Av = A˜ v˜
(22)
Given any subspace V0 ⊂ V equipped with a basis {01 , . . . , 0n 0 }. Then there exists a unique matrix I0 = (ei j ) ∈ Rn×n 0 such that 0j =
n
ei j i
for j = 1, . . . , n 0
i=1
This matrix is the matrix representation of the natural inclusion I0 : V0 → V, and it is known a prolongation matrix. Its transpose It0 is known as a restriction matrix. Define the mass matrix M and the stiffness matrix A by M = ((i , j )0, )n×n
and A = ((Ai , j )0, )n×n
˜ Obviously, By definition we have (u, v)0, = (u, ˜ Mv) ˜ 2 and we can easily show that A = M A. the prolongation and restriction matrices satisfy the following important relation: (u, v0 )0, = (Mu, ˜ I0 v˜0 )2 = (It0 Mu, ˜ v˜0 )2 Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
287
For the weighted L 2 projection Q l : V → Vl , we have by definition ˜ M V (u, vl )0, = (Q l u, vl )0, = ( Q l l ) 2 l u, Ml Vl )2 = ( Q l u,
where Ml is the mass matrix on Vl . On the other hand, note that l ) 2 (u, vl )0, = (u, Il vl )0, = (u, ˜ M Il vl )2 = (u, ˜ MIl V we deduce that the matrix representation of the weighted L 2 projection Q l , denoted by Ql , is Ql = Ml −1 Ilt M
(23)
To derive the algebraic representation of the preconditioner B=
L l=0
Al−1 Q l =
L l=0
Il Al−1 Q l
applying (22) and (23) gives B=
L l=0
l Q l = Il R
L l=0
Il (Al−1 Ml )(Ml−1 Ilt M) = BM
where Al−1 is the algebraic representation of Al−1 and B=
L l=0
Il Al−1 Ilt
is the standard matrix representation of the additive Schwarz preconditioner B. We summarize this section by the following relationship between the operator equation (2) and the algebraic equation (22). The linear iterations of (2) and (22) can be expressed as u k+1 = u k + B(F − Au k ) and
k+1 = k +B(b −Ak ), k = 0, 1, . . .
respectively. Proposition 5.1 (Xu [20]) The linear iterations u is a solution to (2) if and only if = u is the solution to (22) with b = M F. are equivalent if and only if B = BM. In this case, (BA) = (BA). 6. CONCLUSION In this paper, we discussed the eigenvalue distribution of the additive and multiplicative overlapping domain decomposition methods for second-order elliptic equations with large jump coefficients. We proved that there are only a few small eigenvalues infected by the large jump and that the effective condition number of the preconditioned system is of O(| log H |). As a result, the asymptotic convergence rate of the PCG algorithm with additive Schwarz preconditioner is 1− 2/(C | log H |+1). With additional assumptions on the subregions 0m (m = 1, . . . , M), we also proved that the effective condition number of the preconditioned system is uniform bounded with respect to the coefficients and mesh size. In this case, the asymptotic convergence rate of the PCG algorithm is bounded uniformly. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
288
Y. ZHU
ACKNOWLEDGEMENTS
This work was supported in part by NSF DMS-0609727, NSFC-10528102 and the Center for Computational Mathematics and Applications at Pennsylvania State. The author would like to thank professor Jinchao Xu for his valuable advice and comments on this paper. REFERENCES 1. Alcouffe RE, Brandt A, Dendy JE, Painter JW. The multi-grid methods for the diffusion equation with strongly discontinuous coefficients. SIAM Journal on Scientific and Statistical Computing 1981; 2:430–454. 2. Kees CE, Miller CT, Jenkins EW, Kelley CT. Versatile two-level Schwarz preconditioners for multiphase flow. Computers and Geosciences 2003; 7(2):91–114. 3. Vuik C, Segal A, Meijerink JA. An efficient preconditioned CG method for the solution of a class of layered problems with extreme contrasts in the coefficients. Journal of Computational Physics 1999; 152(1):385–403. 4. Heise B, Kuhn M. Parallel solvers for linear and nonlinear exterior magnetic field problems based upon coupled FE/BE formulations. Computing 1996; 56(3):237–258. 5. Coomer RK, Graham IG. Massively parallel methods for semiconductor device modelling. Computing 1996; 56(1):1–27. 6. Howle VE, Vavasis SA. An iterative method for solving complex-symmetric systems arising in electrical power modeling. SIAM Journal on Matrix Analysis and Applications 2005; 26(4):1150–1178. 7. Wang C. Fundamental models for fuel cell engineering. Chemical Reviews 2004; 104:4727–4766. 8. Wang Z, Wang C, Chen K. Two phase flow and transport in the air cathode of proton exchange membrane fuel cells. Journal of Power Sources 2001; 94:40–50. 9. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring. IV. Mathematics of Computation 1989; 53(187):1–24. 10. Chan T, Wan W. Robust multigrid methods for nonsmooth coefficient elliptic linear systems. Journal of Computational and Applied Mathematics 2000; 123:323–352. 11. Dryja M, Widlund OB. Schwarz methods of Neumann–Neumann type for three-dimensional elliptic finite element problems. Communications on Pure and Applied Mathematics 1995; 48(2):121–155. 12. Mandel J, Brezina M. Balancing domain decomposition for problems with large jumps in coefficients. Mathematics of Computation 1996; 65(216):1387–1401. 13. Smith BF. A domain decomposition algorithm for elliptic problems in three dimensions. Numerische Mathematik 1991; 60(2):219–234. 14. Xu J, Zou J. Some nonoverlapping domain decomposition methods. SIAM Review 1998; 40(4):857–914 (electronic). 15. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring. III. Mathematics of Computation 1988; 51(184):415–430. 16. Cho S, Nepomnyaschikh SV, Park E-J. Domain decomposition preconditioning for elliptic problems with jumps in coefficients. Technical Report rep05-22, Radon Institute for Computational and Applied Mathematics (RICAM), 2005. 17. Nepomnyaschikh SV. Preconditioning operators for elliptic problems with bad parameters. Eleventh International Conference on Domain Decomposition Methods, London. DDM.org: Augsburg, 1999; 82–88 (electronic). 18. Wang J, Xie R. Domain decomposition for elliptic problems with large jumps in coefficients. Proceedings of Conference on Scientific and Engineering Computing. National Defense Industry Press: Beijing, China, 1994; 74–86. 19. Bramble JH, Pasciak JE, Wang J, Xu J. Convergence estimates for multigrid algorithms without regularity assumption. Mathematics of Computation 1991; 57(195):23–45. 20. Xu J. 
Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34:581–613. 21. Le Tallec P. Domain decomposition methods in computational mechanics. Computational Mechanics Advances 1994; 1(2):121–220. 22. Smith BF, Bjørstad PE, Gropp WD. Domain Decomposition. Cambridge University Press: Cambridge, 1996. 23. Bramble JH, Xu J. Some estimates for a weighted L 2 projection. Mathematics of Computation 1991; 56(194): 463–476. 24. Oswald P. On the robustness of the BPX-preconditioner with respect to jumps in the coefficients. Mathematics of Computation 1999; 68:633–650. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM
289
25. Wang J. New convergence estimates for multilevel algorithms for finite-element approximations. Journal of Computational and Applied Mathematics 1994; 50(1–3):593–604. 26. Dryja M, Sarkis MV, Widlund OB. Multilevel Schwarz methods for elliptic problems with discontinuous coefficients in three dimensions. Numerische Mathematik 1996; 72(3):313–348. 27. Dryja M, Smith BF, Widlund OB. Schwarz analysis of iterative substructuring algorithms for elliptic problems in three dimensions. SIAM Journal on Numerical Analysis 1994; 31(6):1662–1694. 28. Xu J. Counter examples concerning a weighted L 2 projection. Mathematics of Computation 1991; 57:563–568. 29. Xu J, Zhu Y. Multilevel preconditioners for elliptic equations with jump coefficients on bisection grids. 2007, preprint. 30. Xu J, Zhu Y. Uniform convergent multigrid methods for elliptic problems with strongly discontinuous coefficients. Mathematical Models and Methods in Applied Sciences 2008; 18(2):1–29. 31. Graham IG, Hagger MJ. Unstructured additive Schwarz-conjugate gradient method for elliptic problems with highly discontinuous coefficients. SIAM Journal on Scientific Computing 1999; 20:2041–2066. 32. Axelsson O. Iteration number for the conjugate gradient method. Mathematics and Computers in Simulation 2003; 61(3–6):421–435. 33. Xu J. Lecture Notes Multigrid Methods. Penn State MATH 552 (Fall 2006). 34. Golub GH, van Loan CF. Matrix Computations. Johns Hopkins University Press: Baltimore, MD, 1996. 35. Kelley CT. Iterative Methods for Linear and Nonlinear Equations, vol. 16. SIAM: Philadelphia, PA, 1995. 36. Saad Y. Iterative Methods for Sparse Linear Systems. SIAM: Philadelphia, PA, 2003. 37. Hackbusch W. Iterative Solution of Large Sparse Systems of Equations, vol. 95. Springer: New York, 1994. 38. Axelsson O. Iterative Solution Methods. Cambridge University Press: Cambridge, 1994. 39. Chan TF, Mathew TP. Domain decomposition algorithms. Acta Numerica 1994; 3:61–143. 40. Toselli A, Widlund O. Domain Decomposition Methods—Algorithms and Theory, vol. 34. Springer: Berlin, 2005. 41. Zhang X. Multilevel Schwarz methods. Numerische Mathematik 1992; 63(1):521–539.
Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:271–289 DOI: 10.1002/nla
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2008; 15:291–306 Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.574
Uniform convergence of the multigrid V -cycle on graded meshes for corner singularities James J. Brannick, Hengguang Li∗, † and Ludmil T. Zikatanov Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, U.S.A.
SUMMARY This paper analyzes a multigrid (MG) V -cycle scheme for solving the discretized 2D Poisson equation with corner singularities. Using weighted Sobolev spaces K am () and a space decomposition based on elliptic projections, we prove that the MG V -cycle with standard smoothers (Richardson, weighted Jacobi, Gauss– Seidel, etc.) and piecewise linear interpolation converges uniformly for the linear systems obtained by finite element discretization of the Poisson equation on graded meshes. In addition, we provide numerical experiments to demonstrate the optimality of the proposed approach. Copyright q 2008 John Wiley & Sons, Ltd. Received 23 May 2007; Revised 30 November 2007; Accepted 3 December 2007 KEY WORDS:
multigrid methods; graded meshes; uniform convergence; corner-like singularities
1. INTRODUCTION Multigrid (MG) methods are arguably one of the most efficient techniques for solving the large systems of algebraic equations resulting from finite element discretizations of elliptic boundary value problems. Many of the known results on the convergence properties of MG methods for elliptic equations can be found in monographs and survey papers by Bramble [1], Hackbusch [2], Trottenberg et al. [3], Xu [4] and the references therein. It is well known that the geometry of the boundary and changes in the boundary condition can influence the regularity of the solution [5–12]. In particular, if the domain possesses re-entrant corners, cracks, or there exist abrupt changes in the boundary conditions, then the solution of
∗ Correspondence
to: Hengguang Li, Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, U.S.A. † E-mail: li
[email protected] Contract/grant sponsor: NSF; contract/grant numbers: DMS-0555831, DMS-058110 Contract/grant sponsor: Lawrence Livermore National Lab; contract/grant number: B568399
Copyright q
2008 John Wiley & Sons, Ltd.
292
J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
the elliptic boundary value problem may have singularities in H 2 —we hereafter refer to singularities of these types as corner-like singularities. One possible approach for obtaining accurate numerical approximations to the solutions nearby these types of singularities is to make use of graded meshes [6, 13–15], for which the quasi-optimal convergence rates of the numerical solutions can be recovered by using an analysis based on weighted Sobolev spaces. The analysis of the convergence rate of MG methods in such settings is, however, non-trivial. The difficulties that arise are due primarily to the lack of regularity of the solution and the non-uniformity of the mesh. A result for the uniform convergence of the MG method assuming full regularity was derived by Braess and Hackbusch [16]; in Brenner’s paper [17], the analysis of the convergence rate for only partial regularity was presented; Bramble et al. [18] developed the convergence estimate without regularity assumptions for an L 2 -projection-based decomposition. In addition, on graded meshes, using the approximation property in [14], Yserentant [19] proved the uniform convergence of the MG W -cycle with a particular iterative method on each level for piecewise linear functions. There are also many other more classical convergence proofs that use algebraic techniques and derive convergence results based on assumptions related to, but nevertheless different from, the regularity of the underlying partial differential equation [20, 21]. In this paper, using a space decomposition for elliptic projections and an estimate on the weighted Sobolev space K am , we prove the uniform convergence of the MG V -cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) for elliptic problems with cornerlike singularities, discretized using graded meshes. To date, this type of convergence analysis has been carried out only for problems with full elliptic regularity. The result presented here establishes the uniform convergence of the MG method for problems with less regular solutions discretized using graded meshes that appropriately capture the correct behavior of the solution near the singularities. Although the main convergence theorem can be modified for elliptic problems discretized on general graded meshes, for exposition, we restrict our discussion to the graded mesh refinement (GMR) strategy developed by B˘acut¸a˘ et al. [6]. Before proceeding, we mention that, with appropriate modifications, our analysis for linear elements can also be applied to higher-order finite element methods. 1.1. Preliminaries and notation Let be a bounded polygonal domain, possibly with cracks, in R2 and consider the following prototype elliptic equation with mixed boundary conditions: −u = f
in
u=0
on * D
*u/*n = 0
on * N
(1)
where * D and * N consist of segments of the boundary, and we assume that the Neumann boundary condition is not imposed on adjacent sides of the boundary. We note that, in the Sobolev space H m , corner-like singularities appear in the solution near vertices of the domain. Here, by ¯ where corner-like singularities in H 2 () are located, namely, vertices, we mean the points on the geometric vertices on re-entrant corners, crack points, or points with an interior angle > /2, where the boundary conditions change. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
MULTIGRID METHOD ON GRADED MESHES
293
Let H D1 () = {u ∈ H 1 ()| u = 0 on * D } be the space of H 1 () functions with zero trace on * D , Tj , 0 jJ , be a sequence of appropriately graded and nested triangulations of , and Mj , 0 jJ , be the finite element space associated with the linear Lagrange triangle [22] on Tj . Then, M0 ⊂ M1 ⊂ · · · ⊂ Mj ⊂ · · · ⊂ MJ ⊂ H D1 () Let A be the differential operator associated with Equation (1). Solving (1) amounts to finding an approximation u J ∈ MJ such that a(u J , v J ) = (Au J , v J ) = (∇u J , ∇v J ) = ( f, v J )
∀v J ∈ MJ
Denoting by N J the dimension of the space MJ , by using a GMR strategy, one can recover the following quasi-optimal rate of convergence for the finite element approximation u J ∈ MJ on TJ : −1/2
u −u J H 1 () C N J
f L 2 ()
The main objective of this paper is to prove the uniform convergence of the MG V -cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) and linear interpolation applied to the 2D Poisson equation discretized using piecewise linear functions on graded meshes obtained via the GMR strategy introduced in [6]. Moreover, we shall show that the convergence rate, c, of the MG V -cycle satisfies c
c1 c1 +c2 n
where c1 and c2 are mesh-independent constants related to the elliptic equation and the smoother, respectively, and n is the number of iterative solves on each subspace. We note that this result can also be used to estimate the efficiency of other subspace smoothers on graded meshes. The rest of this paper is organized as follows. In Section 2, we introduce the weighted Sobolev space K am () for boundary value problem (1) and review the method of subspace corrections (MSC). In addition, we briefly describe the GMR strategy under consideration here for generating the sequence of graded meshes. Then, in Section 3, we prove the approximation and smoothing properties, which in turn lead to our main MG convergence theorem. Section 4 contains numerical results of the proposed method applied to problem (1).
2. WEIGHTED SOBOLEV SPACES AND THE MSC In this section, we begin by introducing the weighted Sobolev space K am () and the mesh refinement strategy under consideration for recovering quasi-optimal rates of convergence of the finite element solution. Then, we present the MSC and a technique for estimating the norm of the product of non-expansive operators. 2.1. Weighted Sobolev spaces and graded meshes It has been shown in [6–8, 14, 23] that with a careful choice of the parameters in the weight, the singular behavior of the solution in Equation (1) can be captured well in the following weighted Sobolev spaces. Namely, there is no loss of regularity of the solution in these spaces and the corresponding refinements of meshes are optimal in the sense of Theorem 2.3. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
294
J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
¯ be an arbitrary point and S = {Si } be the set of vertices of the domain, on which Let (x, y) ∈ the solution has singularities in H 2 (). Denote by ri (x, y) the distance from (x, y) to the vertex ¯ such that =ri in the neighborhood of Si , Si ∈ S and let (x, y) be a smooth function on , and C > 0 otherwise. Then, the weighted Sobolev space K am (), m0, is defined as follows [6, 11]: i
j
m K am () = {u ∈ Hloc ()| i+ j−a *x * y u ∈ L 2 (), i + jm}
The corresponding K am -norm and seminorm for any function v ∈ K am () are v2K m () := a
|v|2K m () := a
i+ j m
i+ j=m
i
j
i+ j−a *x * y v2L 2 () i
j
m−a *x * y v2L 2 ()
Note that is equal to the distance function ri (x, y) near the vertex Si . Thus, we have the following proposition and mesh refinements as in [6, 15]. Proposition 2.1 We have |v| K 1 () = ∼ |v| H 1 () , v K 0 () Cv L 2 () , and the Poincar´e type inequality v K 0 () 1
1
1
C|v| K 1 () for v ∈ K 11 ()∩{v|* D = 0}. 1
Here, a = ∼ b means there exist positive constants C1 , C2 , such that C1 baC2 b. Definition 2.2 Let be the ratio of decay of triangles near a vertex Si ∈ S. Then, for every < min(/ti ), one can choose = 2−1/ , where i is the interior angle of vertex Si , t = 1 on vertices with both Dirichlet boundary conditions, and t = 2 if the boundary condition changes type at Si . For example, i = 2 and t = 1 on crack points with both Dirichlet boundary conditions. In the initial triangulation, we require that each triangle contains at most one point in S, and each Si needs to be a vertex of some triangle. In other words, no point in S is sitting on the edge or in the interior of a triangle. Let Tj = {Tk } be the triangulation after j refinements. Then, for the ( j +1)th refinement, if the function is bounded away from 0 on a triangle (no point in S contained), new triangles are obtained by connecting the mid-points of the old one. However, if Si is one of the vertices of a triangle Si BC, then we choose a point D on Si B and another point E on Si C such that the following holds for the ratios of the lengths = Si D/Si B = Si E/Si C In this way, the triangle Si BC is divided into four smaller triangles by connecting D, E, and the mid-point of BC (see Figure 1). We note that other refinements, for example, those found in [13, 14] also satisfy this condition, although they follow different constructions. We now conclude this subsection by restating the following theorem derived in [6, 15]. Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
MULTIGRID METHOD ON GRADED MESHES
295
Figure 1. Mesh refinements: triangulation after one refinement, = 0.2.
Theorem 2.3 Let u j ∈ Mj be the finite element solution of Equation (1) and denote by N j the dimension of Mj . Then, there exists a constant B1 = B1 (, , ), such that −1/2
u −u j H 1 () B1 N j
f K 0
−1 ()
−1/2
B1 N j
f L 2 ()
for every f ∈ L 2 (), where < 1 is determined from Definition 2.2, Mj is the finite element space of linear functions on the graded mesh Tj , as described in the introduction. Remark 2.4 m+1 For u ∈ / H 2 (), this theorem follows from the fact that the differential operator A : K 1+ ()∩{u = m−1 0, on * D } → K −1+ (), m0, in Equation (1), is an isomorphism between the weighted Sobolev spaces. 2.2. The method of subspace corrections In this subsection, we review the MSC and provide an identity for estimating the norm of the product of non-expansive operators. In addition, Lemma 2.6 reveals the connection between the matrix representation and operator representation of the MG method. Let H D1 () = {u ∈ H 1 ()|u = 0 on * D } be the Hilbert space associated with Equation (1), Tj be the associated graded mesh, as defined in the previous subsection, Mj ∈ H D1 () be the space of piecewise linear functions on Tj , and A : H D1 () → (H D1 ()) be the corresponding differential operator. The weak form for (1) is then a(u, v) = (Au, v) = (−u, v) = (∇u, ∇v) = ( f, v) ∀v ∈ H D1 () where the pairing (·, ·) is the inner product in L 2 (). Here, a(·, ·) is a continuous bilinear form on H D1 ()× H D1 () and by the Poincare inequality is also coercive. In addition, since the Tj are nested, M0 ⊂ M1 ⊂ · · · ⊂ Mj ⊂ · · · ⊂ MJ ⊂ H D1 () Define Q j , P j : H D1 () → Mj and A j : Mj → Mj as orthogonal projectors and the restriction of A on Mj , respectively, (Q j u, v j ) = (u, v j ),
a(P j u, v j ) = a(u, v j )
(Au j , v j ) = (A j u j , v j ) Copyright q
2008 John Wiley & Sons, Ltd.
∀u ∈ H D1 () ∀u j , v j ∈ Mj Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
296
J. J. BRANNICK, H. LI AND L. T. ZIKATANOV j
j
Let N j = {xi } be the set of nodal points in Tj and k (xi ) = i,k be the linear finite element nodal j basis function corresponding to node xk . Then, the jth level finite element discretization reads: Find u j ∈ Mj , such that Aju j = f j
(2)
where f j ∈ Mj satisfies ( f j , v j ) = ( f, v j ), ∀v j ∈ Mj . The MSC reduces an MG process to choosing a sequence of subspaces and corresponding operators B j : Mj → Mj approximating A−1 j , j = 1, . . . , J . For example, in the MSC framework, the standard MG backslash cycle for solving (2) is defined by the following subspace correction scheme: u j,l = u j,l−1 + B j ( f j − A j u j,l−1 ) where the operators B j : Mj → Mj , 0 jJ , are recursively defined as follows [24]. Algorithm 2.5 −1 Let R j ≈ A−1 j , j > 0, denote a local relaxation method. For j = 0, define B0 = A0 . Assume that B j−1 : Mj−1 → Mj−1 is defined. Then, 1. Fine grid smoothing: For u 0j = 0 and k = 1, 2, . . . , n, u kj = u k−1 + R j ( f j − A j u k−1 j j )
(3)
2. Coarse grid correction: Find the corrector e j−1 ∈ Mj−1 by the iterator B j−1 e j−1 = B j−1 Q j−1 ( f j − A j u nj ) Then, B j f j = u nj +e j−1 . Recursive application of Algorithm 2.5 results in an MG V -cycle for which the following identity holds: I − B vJ A J = (I − B J A J )∗ (I − B J A J ) [24], where B vJ is the iterator for the MG V -cycle. Direct computation gives the following useful result: u nj = (I − R j A j )u n−1 + Rj Aju j j = (I − R j A j )2 u n−2 −(I − R j A j )2 u j +u j j = −(I − R j A j )n u j +u j where u j is the finite element solution of (2) and u nj is the approximation after n iterations of (3) on the jth level. Let T j = (I −(I − R j A j )n )P j be a linear operator and define T0 = P0 . We have the following identity: (I − B J A J )u J = u J −u nJ −e J −1 = (I − T J )u J −e J −1 = (I − B J −1 A J −1 PJ −1 )(I − T J )u J where, for B J −1 = A−1 J −1 , this becomes a two-level method. Recursive application of this identity then yields the error propagation operator of an MG V -cycle: (I − B J A J ) = (I − T0 )(I − T1 ) · · · (I − T J ) Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
MULTIGRID METHOD ON GRADED MESHES
297
To estimate the uniform convergence of the MG V -cycle, we thus need to show that I − B vJ A J a = I − B J A J a2 c < 1 where c is independent of J and ua2 = a(u, u) = (Au, u) on . Associated with each T j , we introduce its symmetrization T¯ j = T j + T j∗ − T j∗ T j where T j∗ is the adjoint operator of T j with respect to the inner product a(·, ·). By a well-known result found in [25], the following estimate holds: c0 I − B J A J a2 = 1+c0 where c0 sup
J
va =1 j=1
a((T¯ j−1 − I )(P j − P j−1 )v, (P j − P j−1 )v)
(4)
Now, to prove the uniform convergence of the proposed MG scheme, we must derive a uniform bound on the constant c0 . Although the above presentation is in terms of operators, the matrix representation of the smoothing step (3) is often used in practice. By the matrix representation R of an operator R on Nj Mj , we here mean that with respect to the basis {i }i=1 of Mj , R(k ) =
Nj
Ri,k i
i=1
where Ri,k is the (i, k) component of the matrix R. Throughout the paper, we use boldfaced letters to denote vectors and matrices. Let A S = D−L−U be the stiffness matrix associated with the operator A j , where the matrix D consists of only the diagonal entries of A S , while matrices −L and −U are the strictly lower and upper triangular parts of A S , respectively. Denote by R M the corresponding matrix of the smoother R j on the jth level. For example, R M = D−1 for the Jacobi method, and R M = (D−L)−1 for the Gauss–Seidel method. In addition, let ul , ul−1 , and f be the vectors containing the coordinates N j l Ni l of u lj , u l−1 j , f j ∈ Mj on the basis {i }i=1 , namely u j = i=1 ui i . Then, one smoothing step for solving (2) on a single level j in terms of matrices reads ul = ul−1 +R M (Mf−A S ul−1 )
(5)
where M is the mass matrix, and Mi,k = (i , k ). Lemma 2.6 Let R be the matrix representation of the smoother R j in Equation (3). Then, R = RM M Hence, R j (k ) =
Nj i=1
Copyright q
2008 John Wiley & Sons, Ltd.
Ri,k i =
Nj
(R M M)i,k i
i=1
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
298
J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
and ul = ul−1 +R M (Mf−A S ul−1 ) = ul−1 +R(f−M−1 A S ul−1 ) Proof Denote by A the matrix representation of the operator A. Note that Nj (Ai , k ) = Am,i m , k = (∇k , ∇i ) = (A S )k,i m=1
indicates A S = MA. Moreover, in terms of matrices and vectors, Equation (3) also reads Nj i=1
uli i =
Nj i=1
ul−1 i i +
Nj Nj
Rk,i fi k −
i=1 k=1
Nj Nj Nj
Rm,k Ak,i ui m
i=1 k=1 m=1
Then, the inner product with n on both sides, 1nN j , leads to Mul = Mul−1 +MRf−MRAu Multiplication by M−1 gives ul = ul−1 +R(f−Au) Taking into account that Equations (3) and (5) represent the same iteration, we have Rf = R M Mf Note the above equation holds for any f ∈ R N j . Therefore, R = R M M, which completes the proof.
3. UNIFORM CONVERGENCE OF THE MG METHOD ON GRADED MESHES Next, we derive an estimate for the constant c0 in (4) of Section 2 and then proceed to establish the main convergence theorem of the paper. We begin by proving several lemmas that are needed ¯ for in the convergence proof. For simplicity, we assume that there is only a single point S0 ∈ , 2 which the solution of Equation (1) has a singularity in H (), and that a nested sequence of graded meshes has been constructed, as described in Definition 2.2. The same argument, however, carries over to problems on domains with multiple singularities and also for similar refinement strategies. S Denote by {Ti 0 } all the initial triangles with the common vertex S0 . Recall that the function in the weight equals the distance to S0 on these triangles. Based on the process in Definition 2.2, S after N refinements, the region ∪Ti 0 is partitioned into N +1 sub-domains (layers) Dn , 0nN , whose sizes decrease by the factor as they approach S0 (see Figure 2). In addition, (x, y) = ∼ n N on Dn for 0n < N and (x, y)C on D N . Meanwhile, sub-triangles (nested meshes) are generated in these layers Dn , 0nN , with corresponding mesh size of order O(n 2n−N ). Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
299
MULTIGRID METHOD ON GRADED MESHES
Figure 2. Initial triangles with vertex S0 (left); layer D0 and D1 after one refinement (right), = 0.2.
Note that = (∪Dn )∪(\∪ Dn ). Let *Dn be the boundary of Dn . Then, we define a piecewise ¯ as follows. constant function r p (x, y) on (1/2)n on D¯ n \*Dn−1 for 1 < nN r p (x, y) = 1 otherwise S
where N = J is the number of refinements for TJ . Therefore, the restriction of r p on every Ti 0 ∩ Dn is a constant. Recall that < 1 is the parameter for , such that = 2−1/ . Define the weighted inner product with respect to r p : (u, v)r p = (r p u,r p v) = r 2p uv
In addition, the above inner product induces the norm: 1/2
ur p = (u, u)r p Then, the following estimate holds. Lemma 3.1 (u j − P j−1 u j , u j − P j−1 u j )r p
c1 a(u j − P j−1 u j , u j − P j−1 u j ) Nj
∀u j ∈ Mj
where N j = O(22 j ) is the dimension of Mj . Proof This lemma can be proved by the duality argument as follows. Consider the following boundary value problem: −w = r 2p (u j − P j−1 u j )
in
w = 0 on * D *w/*n = 0 on * N Copyright q
2008 John Wiley & Sons, Ltd.
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
300
J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
Then, since P j−1 w ∈ Mj−1 , from the equation above, we have (r p (u j − P j−1 u j ),r p (u j − P j−1 u j )) = (r 2p (u j − P j−1 u j ), u j − P j−1 u j ) = (∇w, ∇(u j − P j−1 u j )) = (∇(w − P j−1 w), ∇(u j − P j−1 u j )) We note that w is a piecewise linear function on the graded triangulation Tj that is derived after j refinements. From the results of Theorem 2.3, we conclude |w − P j−1 w|2H 1 () (C1 /N j−1 )w2K 0 = (C1 /N j−1 )
−1 ()
j
n=0
(C/N j−1 )
j
n=0
= (C/N j−1 )
j
n=0
= (C/N j−1 )
j
n=0
1− w2L 2 (D ) +1− w2L 2 (\∪D n
n)
n(1−) w2L 2 (D ) +w2L 2 (\∪D n
n)
2
n n
w2L 2 (D ) +w2L 2 (\∪D ) n n
2 2 r −1 p w L 2 (Dn ) +w L 2 (\∪Dn )
2 = (C/N j−1 )r −1 p w L 2 ()
The inequalities above are based on the definition of , r p , and related norms. Now, since N j = O(N j−1 ), combining the results above, we have u j − P j−1 u j r2p =
|w − P j−1 w|2H 1 |u j − P j−1 u j |2H 1 (u j − P j−1 u j )r2p |w − P j−1 w|2H 1 |u j − P j−1 u j |2H 1 2 r −1 p w L 2
c1 c1 |u j − P j−1 u j |2H 1 = a(u j − P j−1 u j , u j − P j−1 u j ) Nj Nj
which completes the proof. Recall that the matrix form R M and the matrix representation R of a smoother from Lemma 2.6. Then, we have the following result regarding the smoother R tj A j R j on Mj , which is the symmetrization of R j , where R tj is the adjoint of to (·, ·). Copyright q
2008 John Wiley & Sons, Ltd.
R j are different R¯ j = R j + R tj − R j with respect
Numer. Linear Algebra Appl. 2008; 15:291–306 DOI: 10.1002/nla
301
MULTIGRID METHOD ON GRADED MESHES
Lemma 3.2
For the subspace smoother $\bar{R}_j : M_j \to M_j$, we assume that there is a constant $C > 0$ independent of $j$, such that the corresponding matrix form $\bar{R}_M$ satisfies
\[
v^T \bar{R}_M v \ge C v^T v \quad \forall v \in \mathbb{R}^{N_j}
\]
on every level $j$, where $N_j$ is the dimension of the subspace $M_j$. Then, there exists $c_2 > 0$, also independent of the level $j$, such that the following estimate holds on each graded mesh $T_j$:
\[
\frac{c_2}{N_j}\, (\bar{R}_j v, v) \le (\bar{R}_j v, \bar{R}_j v)_{r_p} \quad \forall v \in M_j
\]

Proof
For any $v = \sum_i v_i \phi_i \in M_j$, from Lemma 2.6 we have
\[
(\bar{R}_j v, v) = \Big( \sum_k (\bar{R}_M M v)_k \phi_k,\ \sum_i v_i \phi_i \Big) = v^T M^T \bar{R}_M M v
\]
On the other hand,
\[
(\bar{R}_j v, \bar{R}_j v)_{r_p} = \Big( \sum_k (\bar{R}_M M v)_k \phi_k,\ \sum_l (\bar{R}_M M v)_l \phi_l \Big)_{r_p} = v^T M^T \bar{R}_M \tilde{M} \bar{R}_M M v
\]
where $\tilde{M}$ is the matrix satisfying $(\tilde{M})_{i,l} = (r_p \phi_i, r_p \phi_l)$. Note that both $M$ and $\tilde{M}$ are symmetric positive definite (SPD). Now, suppose $\mathrm{supp}(\phi_i) \cap D_n \ne \emptyset$, $0 \le n \le j$. Then, on $\mathrm{supp}(\phi_i)$, the mesh size is $O(\kappa^n 2^{n-j})$ and $r_p \simeq (2\kappa)^{-n}$, respectively, since $\mathrm{supp}(\phi_i)$ is covered by at most two adjacent layers. Thus, all the non-zero elements in $\tilde{M}$ are positive and $(\tilde{M})_{i,k} \simeq 2^{-2j} \simeq 1/N_j$. To complete the proof, it is sufficient to show that there exists $C > 0$, such that
\[
w^T \bar{R}_M^{1/2} \tilde{M} \bar{R}_M^{1/2} w \ge (C/N_j)\, w^T w
\]
where $w = \bar{R}_M^{1/2} M v$. From the condition on $\bar{R}_M$ and the estimates on $\tilde{M}$, it follows that
\[
w^T \bar{R}_M^{1/2} \tilde{M} \bar{R}_M^{1/2} w \simeq (1/N_j)\, w^T \bar{R}_M w \ge (C/N_j)\, w^T w
\]
Remark 3.3
For our choice of graded meshes, the triangles remain shape-regular, that is, the minimum angles of the triangles are bounded away from 0. Therefore, the stiffness matrix $A_S$ has a bounded number of non-zero entries per row and each entry is of order $O(1)$. Hence, the maximum eigenvalue of $A_S$ is bounded. For this reason, standard smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) satisfy Lemma 3.2, and $(R_M)_{i,j} = O(1)$ as well, since they are all formed from part of the matrix $A_S$. Moreover, if $R_M$ is SPD and the spectral radius $\rho(R_M A_S) \le \omega$ for $0 < \omega < 1$, then based on Lemma 2.6,
\[
a(R_j A_j v, v) = (A_j R_j A_j v, v) = v^T A_S R_M A_S v \le \omega\, a(v, v)
\]
The last inequality follows from the similarity of the matrix $A_S^{1/2} R_M A_S^{1/2}$ and the matrix $R_M A_S$. Note that the above inequality implies the spectral radius $\rho(R_j A_j) \le \omega$, since $R_j A_j$ is symmetric with respect to $a(\cdot,\cdot)$.
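The following small numpy check (ours; weighted Jacobi on a 1D Laplacian stands in for $A_S$) illustrates the two facts used in the remark: $\rho(R_M A_S) < 1$, and a level-independent lower bound on the symmetrization $\bar{R}_M$, which is the hypothesis of Lemma 3.2:

```python
# Weighted Jacobi R_M = 0.5 * D^{-1} on a 1D Laplacian stand-in for A_S.
import numpy as np

m = 200
A = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)  # O(1) entries per row
RM = 0.5 * np.diag(1.0 / np.diag(A))                    # (R_M)_{ii} = O(1)
omega = np.abs(np.linalg.eigvals(RM @ A)).max()         # rho(R_M A_S)
RbarM = RM + RM.T - RM.T @ A @ RM                       # symmetrized smoother
eigs = np.linalg.eigvalsh(RbarM)
print(f"rho(R_M A_S) = {omega:.4f} < 1")
# Here spec(RbarM) stays in [0.25, 0.5] for any m, so v^T RbarM v >= C v^T v
# holds with C = 1/4 independent of the level.
print(f"spec(RbarM) in [{eigs.min():.3f}, {eigs.max():.3f}]")
```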
We then define the following operators for the MG $V$-cycle. Recall $T_j$ from Section 2 and let $R_j$ denote a subspace smoother satisfying Lemma 3.2. Recall the symmetrization $\bar{R}_j$ of $R_j$, and assume the spectral radius $\rho(\bar{R}_j A_j) \le \omega$ for $0 < \omega < 1$. Note that $R_j^t$ is the adjoint of $R_j$ with respect to $(\cdot,\cdot)$ and $T_j^*$ is the adjoint of $T_j$ with respect to $a(\cdot,\cdot)$. With $n$ smoothing steps, where $R_j$ and $R_j^t$ are applied alternatingly, the operators $G_j$ and $G_j^*$ are defined as follows:
\[
G_j = I - R_j A_j, \qquad G_j^* = I - R_j^t A_j
\]
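The algebra behind the alternating sweeps can be checked directly in matrix form; the sketch below (ours, with a random SPD stand-in for $A_j$ and Gauss–Seidel for $R_j$) verifies that the symmetrization satisfies $I - \bar{R}A = (I - R^t A)(I - R A) = G^* G$, which is what makes the expression for $\bar{T}_j$ below work out:

```python
# Verify I - Rbar A = (I - R^T A)(I - R A) for Rbar = R + R^T - R^T A R.
import numpy as np

rng = np.random.default_rng(1)
m = 30
B = rng.standard_normal((m, m))
A = B @ B.T + m * np.eye(m)          # SPD stand-in for A_j
R = np.linalg.inv(np.tril(A))        # forward Gauss-Seidel smoother
G = np.eye(m) - R @ A                # G_j = I - R_j A_j
Gs = np.eye(m) - R.T @ A             # G_j^*: adjoint w.r.t. the A-inner product
Rbar = R + R.T - R.T @ A @ R         # symmetrization of R
assert np.allclose(np.eye(m) - Rbar @ A, Gs @ G)
print("I - Rbar A == G* G: verified")
```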
With this choice,
\[
T_j = \begin{cases} P_j - (G_j^* G_j)^{n/2} P_j & \text{for even } n \\ P_j - G_j (G_j^* G_j)^{(n-1)/2} P_j & \text{for odd } n \end{cases}
\]
Therefore, if we define
\[
G_{j,n} = \begin{cases} G_j^* G_j & \text{for even } n \\ G_j G_j^* & \text{for odd } n \end{cases}
\]
then, since $P_j^2 = P_j$,
\[
\bar{T}_j = T_j + T_j^* - T_j^* T_j = (I - G_{j,n}^n) P_j
\]
Note that $\bar{T}_j$ is invertible on $M_j$, and hence $\bar{T}_j^{-1}$ exists. The main result concerning the uniform convergence of the MG $V$-cycle for our model problem is summarized in the following theorem.

Theorem 3.4
On every triangulation $T_J$, suppose that the smoother on each subspace $M_j$ satisfies Lemma 3.2. Then, following the algorithm described above, we have
\[
\|I - B_J A_J\|_a^2 = \frac{c_0}{1 + c_0} \le \frac{c_1}{c_1 + c_2 n}
\]
where $c_1$ and $c_2$ are the constants from Lemmas 3.1 and 3.2.

Proof
Recall (4) from Section 2. To estimate the constant $c_0$, we first consider the decomposition $v = \sum_j v_j$ for any $v \in M_J$, with $v_j = (P_j - P_{j-1})v \in M_j$. Then, Lemma 3.1 implies
\[
N_j (v_j, v_j)_{r_p} \le c_1\, a(v_j, v_j)
\]
Using the identity of Xu and Zikatanov [25], we have
\[
a(\bar{T}_j^{-1}(I - \bar{T}_j)v_j, v_j) = a((I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j) = (\bar{R}_j^{-1} \bar{R}_j A_j (I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j) = (\bar{R}_j^{-1} (I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j)
\]
Note that $G_{j,n}^k$, $k \le n$, is in fact a polynomial of $\bar{R}_j A_j$. Therefore, $\bar{R}_j^{-1/2}(I - G_{j,n})\bar{R}_j^{1/2}$, $\bar{R}_j^{-1/2} G_{j,n}^n \bar{R}_j^{1/2}$, and $\bar{R}_j^{-1/2}(I - G_{j,n}^n)\bar{R}_j^{1/2}$ are all polynomials of $\bar{R}_j^{1/2} A_j \bar{R}_j^{1/2}$, and $\bar{R}_j^{-1/2}(I - G_{j,n}^n)^{-1}\bar{R}_j^{1/2} = (\bar{R}_j^{-1/2}(I - G_{j,n}^n)\bar{R}_j^{1/2})^{-1}$. Thus, it can be seen that $\bar{R}_j^{-1/2}(I - G_{j,n})\bar{R}_j^{1/2}$, $\bar{R}_j^{-1/2} G_{j,n}^n \bar{R}_j^{1/2}$, and $\bar{R}_j^{-1/2}(I - G_{j,n}^n)^{-1}\bar{R}_j^{1/2}$ commute with each other; hence, $\bar{R}_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n \bar{R}_j^{1/2}$ is symmetric with respect to $(\cdot,\cdot)$.

Then, based on the above argument, defining $w_j = \bar{R}_j^{-1/2} v_j$, we have
\[
a(\bar{T}_j^{-1}(I - \bar{T}_j)v_j, v_j) = (\bar{R}_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n \bar{R}_j^{1/2} w_j, w_j) \le \max_{t\in[0,1]} (1-t)(1-t^n)^{-1} t^n\, (\bar{R}_j^{-1} v_j, v_j)
\]
\[
\le \frac{1}{n}\, (\bar{R}_j^{-1} v_j, v_j) \le \frac{N_j}{c_2 n}\, (v_j, v_j)_{r_p}
\]
where the last inequality is from Lemma 3.2. Moreover,
\[
\sum_{j=0}^{J} a(\bar{T}_j^{-1}(I - \bar{T}_j)v_j, v_j) \le \sum_{j=0}^{J} \frac{N_j}{c_2 n}\, (v_j, v_j)_{r_p} \le \frac{c_1}{c_2 n} \sum_{j=0}^{J} a(v_j, v_j) = \frac{c_1}{c_2 n}\, a(v, v)
\]
Therefore, $c_0 \le c_1/(c_2 n)$ and, consequently, the method of subspace corrections (MSC) yields the following convergence estimate for the MG $V$-cycle:
\[
\|I - B_J A_J\|_a^2 = \frac{c_0}{1 + c_0} \le \frac{c_1}{c_1 + c_2 n}
\]
which completes the proof.
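The elementary bound $\max_{t\in[0,1]}(1-t)(1-t^n)^{-1}t^n \le 1/n$ used in the proof follows from $1 - t^n = (1-t)(1 + t + \cdots + t^{n-1})$ together with $t^n \le t^k$ for $t \in [0,1]$ and $k \le n$; a quick numerical confirmation (ours):

```python
import numpy as np

t = np.linspace(0.0, 1.0 - 1e-6, 200001)
for n in (1, 2, 5, 10, 20):
    vals = (1.0 - t) * t**n / (1.0 - t**n)   # = t^n / (1 + t + ... + t^(n-1))
    assert vals.max() <= 1.0 / n + 1e-9
    print(f"n={n:2d}: max = {vals.max():.6f} <= 1/n = {1.0/n:.6f}")
```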
4. NUMERICAL ILLUSTRATION

This section contains numerical results for the proposed MG $V$-cycle applied to the 2D Poisson equation with a single corner-like singularity. The model test problem we consider here is given by
\[
-\Delta u = f \ \ \text{in } \Omega, \qquad u = 0 \ \ \text{on } \partial\Omega \qquad (6)
\]
where the singularity occurs at the tip of the crack $\{(x,y):\, 0 \le x \le 0.5,\ y = 0.5\}$ for $\Omega = (0,1)\times(0,1)$, as in Figure 3.
The MG scheme used to solve (6) is a standard MG $V$-cycle with linear interpolation. The sequence of coarse-level problems defining the MG hierarchy is obtained by re-discretizing (6) on the nested meshes constructed using the GMR strategy described in Section 2. The reported results are for $V(1,1)$-cycles with Gauss–Seidel (GS) as the smoother. The asymptotic convergence factors are computed using 100 $V(1,1)$-cycles applied to the homogeneous problem, starting with an $O(1)$ random initial approximation. The asymptotic convergence factors reported in Table I confirm our theoretical estimates, in that they are independent of the number of refinement levels.

To obtain a more complete picture of the overall effectiveness of our MG solver, we also examine storage and work-per-cycle measures. These are usually expressed in terms of the operator complexity, defined as the number of non-zero entries stored in the operators on all levels divided by the number of non-zero entries in the finest-level matrix, and the grid complexity, defined as the sum of the dimensions of the operators over all levels divided by the dimension of the finest-level operator. The grid and, especially, the operator complexities can be viewed as proportionality constants that indicate how expensive the entire $V$-cycle is compared with performing only the finest-level relaxations of the $V$-cycle. For our test problem, the grid and operator complexities were 1.2 and 1.3, respectively, independent of the number of levels. Considering the low grid and operator complexities, the performance of the resulting MG solver applied to problem (6) is comparable to that of standard geometric MG applied to the Poisson equation with full regularity, i.e. without corner-like singularities; for the Poisson equation discretized on uniformly refined grids, standard MG with a GS smoother and linear interpolation yields $\rho_{MG} \approx 0.35$.
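As an illustration of this measurement protocol (not the authors' code: a 1D uniformly refined Poisson hierarchy stands in for the 2D graded-mesh discretization of (6), so the numbers differ from Table I), the sketch below estimates the asymptotic convergence factor from 100 $V(1,1)$-cycles on the homogeneous problem with a random initial guess, and computes the grid and operator complexities exactly as defined above:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve, spsolve_triangular

def poisson1d(m):
    """Re-discretized 1D Dirichlet Laplacian on m interior nodes."""
    return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m),
                    format='csr') * (m + 1)

def interpolation(mc):
    """Linear interpolation from mc coarse to 2*mc+1 fine interior nodes."""
    P = sp.lil_matrix((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
    return P.tocsr()

def gauss_seidel(A, b, x):
    L = sp.tril(A, format='csr')                     # one forward GS sweep
    return x + spsolve_triangular(L, b - A @ x, lower=True)

def vcycle(As, Ps, lvl, b, x):
    """V(1,1)-cycle: GS pre-smoothing, coarse correction, GS post-smoothing."""
    if lvl == len(As) - 1:
        return spsolve(As[lvl].tocsc(), b)           # exact coarsest solve
    x = gauss_seidel(As[lvl], b, x)
    r = b - As[lvl] @ x
    e = vcycle(As, Ps, lvl + 1, Ps[lvl].T @ r, np.zeros(Ps[lvl].shape[1]))
    return gauss_seidel(As[lvl], b, x + Ps[lvl] @ e)

sizes = [3]
for _ in range(5):                                   # 6 levels, as in Table I
    sizes.append(2 * sizes[-1] + 1)
sizes.reverse()                                      # finest first
As = [poisson1d(n) for n in sizes]
Ps = [interpolation(sizes[k + 1]) for k in range(len(sizes) - 1)]

# Asymptotic convergence factor: V(1,1)-cycles on the homogeneous problem,
# renormalizing the error each cycle to avoid underflow.
x = np.random.default_rng(0).random(sizes[0])        # O(1) random initial error
b = np.zeros(sizes[0])
for _ in range(100):
    x = vcycle(As, Ps, 0, b, x)
    rho = np.linalg.norm(x)
    x /= rho

grid_cx = sum(A.shape[0] for A in As) / As[0].shape[0]
oper_cx = sum(A.nnz for A in As) / As[0].nnz
print(f"rho_MG ~ {rho:.3f}, grid complexity {grid_cx:.2f}, "
      f"operator complexity {oper_cx:.2f}")
```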
Figure 3. Crack: initial triangulation (left) and the triangulation after one refinement (right), $\kappa = 0.2$.
Table I. Asymptotic convergence factors ($\rho_{MG}$) for the MG $V(1,1)$-cycle applied to problem (6) with Gauss–Seidel smoother.

levels             2      3      4      5      6
$\rho_{MG}$ (GS)   0.40   0.53   0.56   0.53   0.50
ACKNOWLEDGEMENTS
We would like to thank Long Chen, Victor Nistor and Jinchao Xu for their useful suggestions and discussions during the preparation of this manuscript. The work of the second author was supported in part by NSF (DMS-0555831). The work of the first and the third author was supported in part by the NSF (DMS-058110) and Lawrence Livermore National Lab (B568399).
REFERENCES
1. Bramble JH. Multigrid Methods. Chapman & Hall/CRC Press: London/Boca Raton, FL, 1993.
2. Hackbusch W. Multi-Grid Methods and Applications. Computational Mathematics. Springer: New York, 1995.
3. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: San Diego, CA, 2001 (with contributions by A. Brandt, P. Oswald, K. Stüben).
4. Xu J. Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34(4):581–613.
5. Babuška I, Aziz AK. The Mathematical Foundations of the Finite Element Method with Applications to Partial Differential Equations. Academic Press: New York, 1972.
6. Băcuţă C, Nistor V, Zikatanov LT. Improving the rate of convergence of 'high order finite elements' on polygons and domains with cusps. Numerische Mathematik 2005; 100(2):165–184.
7. Bourlard M, Dauge M, Lubuma MS, Nicaise S. Coefficients of the singularities for elliptic boundary value problems on domains with conical points. III. Finite element methods on polygonal domains. SIAM Journal on Numerical Analysis 1992; 29(1):136–155.
8. Dauge M. Elliptic Boundary Value Problems on Corner Domains. Lecture Notes in Mathematics, vol. 1341. Springer: Berlin, 1988.
9. Grisvard P. Singularities in Boundary Value Problems. Research Notes in Applied Mathematics, vol. 22. Springer: New York, 1992.
10. Kellogg RB, Osborn JE. A regularity result for the Stokes problem in a convex polygon. Journal of Functional Analysis 1976; 21(4):397–431.
11. Kondratiev VA. Boundary value problems for elliptic equations in domains with conical or angular points. Transactions of the Moscow Mathematical Society 1967; 16:227–313.
12. Kozlov VA, Maz'ya V, Rossmann J. Elliptic Boundary Value Problems in Domains with Point Singularities. American Mathematical Society: Providence, RI, 1997.
13. Apel T, Sändig A, Whiteman JR. Graded mesh refinement and error estimates for finite element solutions of elliptic boundary value problems in non-smooth domains. Mathematical Methods in the Applied Sciences 1996; 19(1):63–85.
14. Babuška I, Kellogg RB, Pitkäranta J. Direct and inverse error estimates for finite elements with mesh refinements. Numerische Mathematik 1979; 33(4):447–471.
15. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains II: mesh refinements and interpolation estimates. 2007, in preparation.
16. Braess D, Hackbusch W. A new convergence proof for the multigrid method including the V-cycle. SIAM Journal on Numerical Analysis 1983; 20(5):967–975.
17. Brenner SC. Convergence of the multigrid V-cycle algorithm for second-order boundary value problems without full elliptic regularity. Mathematics of Computation 2002; 71(238):507–525 (electronic).
18. Bramble JH, Pasciak JE, Wang JP, Xu J. Convergence estimates for multigrid algorithms without regularity assumptions. Mathematics of Computation 1991; 57(195):23–45.
19. Yserentant H. The convergence of multilevel methods for solving finite-element equations in the presence of singularities. Mathematics of Computation 1986; 47(176):399–409.
20. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. Sparsity and its Applications (Loughborough, 1983). Cambridge University Press: Cambridge, 1985; 257–284.
21. Vassilevski P. Multilevel Block Factorization Preconditioners. Springer: Berlin, 2008.
22. Ciarlet P. The Finite Element Method for Elliptic Problems. Studies in Mathematics and its Applications, vol. 4. North-Holland: Amsterdam, 1978.
23. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains I: transmission problems and a priori estimates. CCMA Preprint AM319, 2007.
24. Xu J. An introduction to multigrid convergence theory. Iterative Methods in Scientific Computing (Hong Kong, 1995). Springer: Singapore, 1997; 169–241.
25. Xu J, Zikatanov L. The method of alternating projections and the method of subspace corrections in Hilbert space. Journal of the American Mathematical Society 2002; 15(3):573–597 (electronic).
26. Adams R. Sobolev Spaces. Pure and Applied Mathematics, vol. 65. Academic Press: New York, London, 1975.
27. Ammann B, Nistor V. Weighted Sobolev spaces and regularity for polyhedral domains. Preprint, 2005.
28. Apel T, Schöberl J. Multigrid methods for anisotropic edge refinement. SIAM Journal on Numerical Analysis 2002; 40(5):1993–2006 (electronic).
29. Băcuţă C, Nistor V, Zikatanov LT. Regularity and well posedness for the Laplace operator on polyhedral domains. IMA Preprint, 2004.
30. Bramble JH, Pasciak JE. New convergence estimates for multigrid algorithms. Mathematics of Computation 1987; 49(180):311–329.
31. Bramble JH, Xu J. Some estimates for a weighted L^2 projection. Mathematics of Computation 1991; 56(194):463–476.
32. Bramble JH, Zhang X. Uniform convergence of the multigrid V-cycle for an anisotropic problem. Mathematics of Computation 2001; 70(234):453–470.
33. Brenner S, Scott LR. The Mathematical Theory of Finite Element Methods. Texts in Applied Mathematics, vol. 15. Springer: New York, 1994.
34. Brenner SC. Multigrid methods for the computation of singular solutions and stress intensity factors. I. Corner singularities. Mathematics of Computation 1999; 68(226):559–583.
35. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. II. Crack singularities. BIT 1997; 37(3):623–643 (Direct methods, linear algebra in optimization, iterative methods, Toulouse, 1995/1996).
36. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. III. Interface singularities. Computer Methods in Applied Mechanics and Engineering 2003; 192(41–42):4687–4702.
37. Wu H, Chen Z. Uniform convergence of multigrid V-cycle on adaptively refined finite element meshes for second order elliptic problems. Science in China 2006; 49:1405–1429.
38. Yosida K. Functional Analysis (5th edn). A Series of Comprehensive Studies in Mathematics, vol. 123. Springer: New York, 1978.
39. Yserentant H. On the convergence of multilevel methods for strongly nonuniform families of grids and any number of smoothing steps per level. Computing 1983; 30(4):305–313.
40. Yserentant H. Old and new convergence proofs for multigrid methods. Acta Numerica 1993. Cambridge University Press: Cambridge, 1993; 285–326.