256x Filetype PDF File size 2.45 MB Source: engineering.purdue.edu
IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018 35
Accuracy Directly Controlled Fast Direct Solution
of General H2-Matrices and Its Application
to Solving Electrodynamic Volume
Integral Equations
Miaomiao Ma, Graduate Student Member, IEEE, and Dan Jiao, Fellow, IEEE
Abstract—The dense matrix resulting from an integral equa- fast solvers for electromagnetic analysis. Many existing fast
tion (IE)-based solution of Maxwell’s equations can be compactly solvers can be interpreted in this framework, although their
represented by an H2-matrix. Given a general dense H2- developments may predate the H2-framework. For exam-
matrix, prevailing fast direct solutions involve approximations ple, the matrix structure resulting from the well-known
whose accuracy can only be indirectly controlled. In this paper, FMM-basedmethods[5], [6] is an H2-matrix. In other words,
we propose new direct solution algorithms whose accuracy is 2
directly controlled, including both factorization and inversion, the H -matrix can be viewed as an algebraic generalization
for solving general H2-matrices. Different from the recursive of the FMM method. Multiplying an H2-matrix by a vector
inverse performed in existing H2-based direct solutions, this new has a complexity of O(NlogN) for solving electrically large
direct solution is a one-way traversal of the cluster tree from surface integral equations (SIEs). In the mathematical litera-
the leaf level all the way up to the root level. The underlying ture, such as [7], it is shown that the storage, matrix-vector
multiplications and additions are carried out as they are without
using formatted multiplications and additions whose accuracy multiplication (MVM), and matrix–matrix multiplication of an
cannot be directly controlled. The cluster bases and their rank H2-matrix can be performed in optimal O(N) complexity
of the original matrix are also updated level by level based on pre- for constant-rank cases. However, no such complexity is
scribed accuracy, without increasing computational complexity, shown for either matrix factorization or inversion, regardless
to take into account the contributions of fill-ins generated during
the direct solution procedure. For constant-rank H2-matrices, of whether the rank is a bounded constant or an increasing
the proposed direct solution has a strict O(N) complexity in both variable. Later in [8] and [9], fast matrix inversion and
time and memory. For rank that linearly grows with the electrical LU factorization algorithms are developed for H2-matrices.
size, the complexity of the proposed direct solution is O(NlogN) They have shown a complexity of O(N) for solving constant-
in factorization and inversion time, and O(N) in solution time rank cases, such as electrically small and moderate problems
andmemoryforsolvingvolumeIEs(VIEs).Rapiddirectsolutions for both SIE and volume IE (VIE) [10]–[13];and a complexity
of electrodynamic VIEs involving millions of unknownshave been
obtained on a single CPU core with directly controlled accuracy. of O(NlogN) for solving electrically large VIEs [14]–[16].
Comparisons with state-of-the-art H2-based direct VIE solvers Thereexist other significant contributions in fast direct solvers,
have also demonstrated the advantages of the proposed direct such as [17]–[25], [32], and [33]. The focus of this paper,
solution in accuracy control, as well as achieving better accuracy however, is a fast direct solution to an H2-matrix, as this
with much less CPU time. matrix structure is general suitable for solving both dense and
Index Terms—Dense matrices, electrodynamic, electromag- sparse matrices.
netic analysis, fast direct solvers, frequency domain, H2-matrix, Despite a significantly reduced complexity, in the existing
integral equations (IEs), linear complexity solvers, volume direct solutions of H2-matrices, such as [8] and [9], formatted
IEs (VIEs).
multiplications and additions are performed instead of actual
ones, where the cluster bases used to represent a matrix
I. INTRODUCTION are also used to represent the matrix’s inverse as well as
HE H2-matrix is a general mathematical frame- LU factors. During the inversion and factorization procedure,
Twork [1]–[4] for compact representation and efficient only the coupling matrix of each admissible block is com-
computation of dense matrices. It can be utilized to develop puted, while the cluster bases are kept to be the same as those
in the original matrix. Physically speaking, such a choice of
Manuscript received March 22, 2017; revised June 5, 2017 and the cluster bases for the inverse matrix often constitutes an
July 16, 2017; accepted July 20, 2017. Date of publication August 15, 2017; accurate choice. However, mathematically speaking, the accu-
date of current version January 4, 2018. This work was supported in part by
the NSF under Award 1619062, in part by the SRC (Task 1292.073), and racy of the inversion and factorization cannot be directly
in part by DARPA under Award HR0011-14-1-0057. (Corresponding author: controlled. The lack of an accuracy control is also observed
Dan Jiao.) in the H2-based formatted matrix–matrix multiplication in the
The authors are with the School of Electrical and Computer Engi-
neering, Purdue University, West Lafayette, IN 47907 USA (e-mail: mathematical literature, such as [7]. If the accuracy is to be
djiao@purdue.edu). controlled, the inverse as well as the matrix–matrix multipli-
Color versions of one or more of the figures in this paper are available cation algorithm must be completely changed, as the original
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMTT.2017.2734090 formatted framework does not offer a mechanism to control
0018-9480 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
36 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018
the accuracy without increasing complexity. Algorithmwise,
the collect operation involves approximations, whose accuracy
is not directly controlled. This operation is performed when
multiplying a nonleaf block with a nonleaf block, or multiply-
ing a nonleaf block with an admissible block, with the target
block being admissible. In this operation, the four blocks,
either admissible or inadmissible, are collected to a single
admissible block based on the cluster bases of the original
matrix. If the target block cannot be accurately represented by
the original cluster bases, then an error would occur. When
the accuracy of the direct solution is not satisfactory, one
can only change the original H2-representation, i.e., change
the cluster bases and/or the rank for representing the original
matrix (based on a prescribed accuracy), with the hope that
they can better represent the inverse and LU factors. Therefore,
the accuracy of the resulting direct solution is not directly
controlled. A direct solution with an explicit accuracy control
should perform every multiplication and addition as it is
without assuming a format of the matrix involved in the
computation.Meanwhile,everyoperationin the direct solution
should be either exact or performed based on a prescribed
accuracy. This is what is pursued and achieved in this paper.
Recently, an HSS-matrix structure [26] has also been
explored for electromagnetic analysis [27]–[30]. Exact- 2
arithmetic fast factorization and inversion algorithms have Fig. 1. Illustration of an H -matrix. (a) Block cluster tree. (b) Matrix
partition where admissible blocks are stored in the format of (1).
been developed in [28] and [29]. In other words, the direct
solution of the HSS matrix does not involve any approx- both backward and forward substitutions. The accuracy and
imation. However, the HSS matrix, as a special class of the computational complexity of the proposed direct solutions
H2-matrix, requires only one admissible block be formed for are analyzed in Section V. Section VI demonstrates the
each node in the H2-tree. All the off-diagonal matrices in the application of this paper in solving electrodynamic VIEs.
HSS matrix are represented as low-rank matrices. As a result, In Section VII, we summarize this paper.
the resultant rank can be too high to be computed efficiently,
especially for electrically large analysis. This is because as II. BACKGROUND OFTHISPAPER
long as the sources and observers are separated, even though AnH2-matrix [1]–[4] is generally stored in a tree structure,
their distance is infinitely small, their interaction must be with each node in the tree called a cluster. The number of
represented by a low-rank matrix to suite the structure of an unknowns in each cluster at the leaf level is no greater than
HSS matrix. From this perspective, an H2-matrix is a more- leafsize, a predefined constant. An H2-matrix is partitioned
efficient structure for general applications, since the distance into multilevel admissible and inadmissible blocks based on
between the sources and observers can be used to reduce the an admissibility condition. As an example, a four-level block
rank required to represent a dense matrix for a prescribed cluster H2-tree is shown in Fig. 1(a), where a green link
accuracy. However, the accuracy-controlled direct solution of denotes an admissible block, formed between a row cluster
general H2-matrices is still lacking in the open literature. and a column cluster that satisfy the admissibility condition,
The contribution of this paper is such an accuracy-controlled and a red link denotes an inadmissible block. The resultant
direct solution of general H2-matrices. This solution can be H2-matrix is shown in Fig. 1(b). An admissible block in an
applied to solve a general H2-matrix, be it real- or complex- H2-matrix is represented as
valued, symmetric or unsymmetrical. Its complexity is O(N) Z =(V) (S ) (V )T (1)
for constant-rank H2-matrices in both time and memory. For t,s t #t×k t,s k×k s #s×k
rank that linearly grows with electrical size, the complexity of where V (V ) is called cluster basis of cluster t (s), S is
the proposed direct solution is O(NlogN) in factorization and t s t,s
called coupling matrix, # denotes the cardinality of a set, and
inversion time, O(N) in solution time, and memory usage for k is rank. The cluster basis in an H2-matrix has a nested
solving VIEs. As a demonstration, we show how it is used to property. This means the cluster basis for a nonleaf cluster t,
solve large VIEs involving millions of unknowns on a single Vt, can be expressed by its two children’s cluster bases,
core CPU, and with controlled accuracy. Vt and Vt , as the following:
The rest of this paper is organized as follows. 1 2
(V ) 0 (T )
t #t ×k t k ×k
In Section II, we review the mathematical background of this (Vt)#t×k = 1 1 1 1 1 (2)
0 (V ) (T )
t #t ×k t k ×k
paper. In Section III, we present the proposed factorization and 2 2 2 2 2
inversion algorithms for general H2-matrices. In Section IV, where Tt and Tt are called transfer matrices. In an
2 1 2
wedescribe the proposed matrix solution algorithm, including H -matrix, the number of blocks formed by a single cluster at
2
MAANDJIAO:ACCURACYDIRECTLYCONTROLLEDFAST DIRECT SOLUTIONOF GENERAL H -MATRICES 37
each tree level is bounded by a constant, Csp. The size of an A. Computation at the Leaf Level
inadmissible block is leafsize, and inadmissible blocks appear 1) For the First Leaf Cluster: We start from the leaf
only at the leaf level. level (l = L). For the first cluster, whose index is denoted
In the mathematical literature [7], for constant-rank cases, by i = 1, we perform three steps as follows.
it is shown that the computational complexity of an Step 1: We compute the complementary cluster basis V⊥,
H2-matrix is optimal (linear) in storage, MVM, and i
which is orthogonal to Vi (cluster basis of cluster i). This can
matrix–matrix multiplication. In [8]–[13], it is shown that be easily done by carrying out an SVD or QL factorization
O(N) direct solutions can also be developed for H2-matrices on V . The cost is not high; instead, it is constant at the leaf
i
for constant-rank cases as well as electrically moderate cases ⊥
level. We then combine V with V to form Q matrix as the
i i i
whose rank can be controlled by a rank function. For following:
electrically large analysis, the rank of an H2-representation
⊥
Q =[(V )#i×(#ik ) (Vi)#i×k ] (3)
of IE kernels is not constant any more for achieving a i i l l
prescribed accuracy. Different H2-representations also result in which k is the rank at level l (now l = L), and # denotes
in different ranks and their growth rates with electrical l
the number of unknowns contained in a set.
size, and hence different complexities. For example, if an Step 2: Let Zcur be the current matrix that remains to be
FMM-based H2-representation is used to model an IE oper- factorized. For leaf cluster i = 1, Z = Z. We compute a
cur
ator, the asymptotic rank would scale quadratically with the transformed matrix QHZ Q, and use it to overwrite Z ;
electrical size. If an interpolation-based H2-representation is i cur i cur
thus
used to model an IE operator, the asymptotic rank would scale H
Z =Q Z Q (4)
quadratically with the electrical size in SIEs, and cubically cur i cur i
with the electrical size in VIEs. Despite its full-rank model where QH denotes Q ’s complex conjugate transpose, and
in SIEs, the FMM complexity is as low as O(NlogN) for i i
oneMVM,becauseitscouplingmatrixisdiagonalandtransfer Q =diag{I ,I ,...,Q ,...,I } (5)
matrix is sparse. On the other hand, if one uses an SVD-based i 1 2 i #{leaf clusters}
minimal-rank representation, the resultant coupling matrix is is a block diagonal matrix. The ith diagonal block in Qi
not diagonal. However, the asymptotic rank of such a minimal- is Qi, and other blocks are identity matrices I, the size of
rank representation only scales linearly with electrical size each of which is the corresponding leaf-cluster size. The
for general 3-D problems as shown by the analysis given subscripts in the right-hand side of (5) denote the indexes of
in [31]. In this analysis, there are additional findings that are diagonal blocks, which are also the indexes of leaf clusters,
not utilized in the FMM-based rank analysis or a source- and #{leaf clusters} stands for the number of leaf clusters.
If leaf cluster i = 1 is being computed, Q appears in
observer separated rank analysis, such as SVD turns to a i
Fourier analysis in a linear-shift invariant system, and the the first block of (5). In addition, Qi in (4) denotes the
Fourier transform of Green’s function reveals that not all the complex conjugate of Qi, as the cluster bases we construct
are complex-valuedto account for general H2-matrices. In (5),
plane waves are important, and only those whose wavenumber
Q only consists of one Q ,wherei is the current clus-
is closest to the given wavenumber k0 make the most important i i
contribution. These plane waves are counted for prescribed ter being computed. This is because Qi values for other
accuracy for 1-, 2-, and 3-D problems in [31]. It is found that clusters are not ready for use yet, since in the proposed
the growth rate with electrical size is no greater than linear. algorithm, we will update cluster bases to take into account
Regardless of which H2-matrix is generated to represent the the contribution from fill-ins. This will become clear in the
integral or partial differential equation operators, the proposed sequel.
direct solution is equally applicable. In this paper, when The purpose of doing this step, i.e., obtaining (4), is to
we apply the proposed direct solution to solve large VIEs, introduce zeros in the matrix being factorized, since
we employ a minimal-rank H2-representation of the H 0
(#ik )×k
×V = l l (6)
Q i
VIE operator using the technique we developed in [14]–[16]. i I
k ×k
l l
T
V ×Q =[0 I ]. (7)
i k ×(#ik ) k ×k
III. PROPOSED FACTORIZATIONAND i l l l l
INVERSIONALGORITHMS The operation of (4) thus zeros out the first (#i kl) rows
Given a general H2-matrix Z, the proposed direct solution of the admissible blocks formed by row cluster i,aswellas
is a one-way tree traversal from the leaf level all the way up the first (#i k ) columns in Z , as shown by the white
l cur
to the root level of the H2-tree. At each level, the factorization blocks in Fig. 2. The real computation of (4) is, hence, in the
proceeds from the left to the right across all the clusters. This inadmissible blocks of cluster i, as shown by the four red
is very different from the algorithm of [9], [11], [14], and [15], blocks associated with cluster i = 1inFig.2.
where a recursive procedure is performed. To help better Step 3: After Step 2, the first (#i kl)rowsandcolumns
understand the proposed algorithm, while we present a generic of Zcur in the admissible blocks of cluster i become zero.
algorithm, we will use the H2-matrix shown in Fig. 1(b), Hence, if we eliminate the first (#i kl) unknowns of cluster i
as an example to illustrate the overall procedure. In Fig. 1(b), via a partial LU factorization, the admissible blocks of cluster i
green blocks are admissible blocks, and red ones represent at these rows and columns would stay as zero, and thus not
inadmissible blocks. affected. As a result, the resulting L and U factors, as well
38 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018
Fig. 2. Illustration of QHZQ .
1 1
as the fill-in blocks generated, would only involve a constant
number of blocks, which is bounded by O(C2 ).
sp
In view of the above-mentioned property, in Step 3, we per-
form a partial LU factorization to eliminate the first (#i kl)
unknowns of cluster i in Zcur. Let this set of unknowns
be denoted by i′. Z can be cast into the following form
cur
accordingly:
F′ ′ Z′
Zcur = i i i c (8)
Z ′ Z
ci cc
Fig. 3. (a) Z after partial LU of cluster 1 with fill-in blocks marked in
in which F ′ ′ denotes the diagonal block of the (#i kl) cur
i i blue. (b) Fill-in blocks of cluster 2 turned green after cluster basis update.
unknowns, and Z ′ is the remaining rectangular block in the
i c
row of set i′. Here, c contains the rest of the unknowns, which
do not belong to i′.TheZ ′ is the transpose block of Z ′ ,and
ci i c be factorized. Thus, we let
Z is the diagonal block formed by unknowns in set c.Itis
cc I0
evident that Z ′ (Z ′) only consists of O(Csp) inadmissible Z = . (13)
i c ci cur
0 Z
blocks, as the rest of the blocks are zero. Take leaf cluster cc
i = 1 as an example. It forms inadmissible blocks with clusters This matrix is shown in Fig. 3(a), where the fill-in blocks
i = 2,4,6, as shown by the red blocks in Fig. 2. Hence, are added upon previous Zcur. If the fill-in block is added
Z′ =[F12,F14,F16] in addition to part of F11. upon an inadmissible block, we add the fill-in directly to the
i c
After performing a partial LU factorization to eliminate the inadmissible block. For the example shown in Fig. 1(b), these
unknowns contained in i′, we obtain blocks are the blocks marked in dark blue in Fig. 3(a) after the
L′ ′ 0 1 i′ set of unknowns in cluster i = 1 is eliminated. If the fill-
i i I0U′′ L′′Z′
Z = i i i i i c (9) in appears in an admissible block, we store the fill-in block
cur ′ 1
Z U I 0 Z 0I
ci i′i′ cc temporarily in this admissible block for future computation.
where These blocks are the blocks marked in light blue in Fig. 3(a).
To explain in more detail, from (11), it can be seen that the
F′ ′ = L ′ ′U ′ ′ (10)
i i i i i i fill-in block formed between cluster j and cluster k has the
and the Schur complement is following form:
′ 1 1 ′ F =Z ′U1L1Z′ (14)
Zcc = Zcc Z U′ ′L ′ ′Z (11) j,k ji ′ ′ ′ ′ i k
ci i i i i i c i i i i
′ 1 1 ′ whichis also low rank in nature. The row cluster j and column
in which Z U′ ′L ′ ′Z represents the fill-in blocks intro-
ci i i i i i c
duced due to the elimination of i′-unknowns. Because of the cluster k belong to the set of clusters that form inadmissible
zeros introduced in the admissible blocks, Zi′c and Zci′ only blocks with cluster i, whose number is bounded by Csp.Ifthe
consist of inadmissible blocks formed by cluster i, the number original block formed between j and k, Z , is admissible,
cur,jk
of which is bounded by constant Csp. Hence, the number of no computation is needed for Fj,k, i.e., we do not add it
fill-in blocks introduced by this step is also bounded by a directly to the admissible block. Instead, we store it with this
constant, which is O(C2 ). admissible block temporarily until cluster j is reached during
sp the factorization. At that time, F will be used to compute
Equation (9) can further be written in short as j,k
an updated cluster basis for j. If the elimination of different
l I0l clusters results in more than one fill-in block in admissible
Z =L U (12)
cur i i
0 Zcc Z block, the later ones are added upon existing F .
cur,jk j,k
in which the first matrix of (9), which is the L factor In other words, the fill-ins at the admissible Z block are
cur,jk
generated after a partial LU of cluster i at level l, is denoted summed up and stored in a single Fj,k.
l
by L . Similarly, the rightmost matrix of (9), which is the In summary, Step 3 is to perform a partial LU factorization,
i
U factor generated after a partial LU of cluster i at level l, as shown in (9), to eliminate the first #i kl unknowns in
l
is denoted by Ui. The center matrix is the one that remains to cluster i. The storage and computation cost of this step can
no reviews yet
Please Login to review.