Volume 2, Issue 3
Distributed-Memory $\mathcal{H}$-Matrix Algebra I: Data Distribution and Matrix-Vector Multiplication

Yingzhou Li, Jack Poulson & Lexing Ying

CSIAM Trans. Appl. Math., 2 (2021), pp. 431-459.

Published online: 2021-08

Export citation
  • Abstract

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $Ω(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $\mathscr{O}(\frac{Nlog N}{P} +αlog P+βlog^2P)$ for $\mathcal{H}$-matrices under weak admissibility condition, where $N$ is the matrix size, $α$ denotes the latency, and $β$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

  • Keywords

Parallel fast algorithm, $\mathcal{H}$-matrix, distributed-memory, parallel computing.

  • AMS Subject Headings

65F99, 65Y05

  • Copyright

COPYRIGHT: © Global Science Press

  • Email address
  • BibTex
  • RIS
  • TXT
@Article{CSIAM-AM-2-431, author = {Li , YingzhouPoulson , Jack and Ying , Lexing}, title = {Distributed-Memory $\mathcal{H}$-Matrix Algebra I: Data Distribution and Matrix-Vector Multiplication}, journal = {CSIAM Transactions on Applied Mathematics}, year = {2021}, volume = {2}, number = {3}, pages = {431--459}, abstract = {

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $Ω(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $\mathscr{O}(\frac{Nlog N}{P} +αlog P+βlog^2P)$ for $\mathcal{H}$-matrices under weak admissibility condition, where $N$ is the matrix size, $α$ denotes the latency, and $β$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

}, issn = {2708-0579}, doi = {https://doi.org/10.4208/csiam-am.2020-0206}, url = {http://global-sci.org/intro/article_detail/csiam-am/19445.html} }
TY - JOUR T1 - Distributed-Memory $\mathcal{H}$-Matrix Algebra I: Data Distribution and Matrix-Vector Multiplication AU - Li , Yingzhou AU - Poulson , Jack AU - Ying , Lexing JO - CSIAM Transactions on Applied Mathematics VL - 3 SP - 431 EP - 459 PY - 2021 DA - 2021/08 SN - 2 DO - http://doi.org/10.4208/csiam-am.2020-0206 UR - https://global-sci.org/intro/article_detail/csiam-am/19445.html KW - Parallel fast algorithm, $\mathcal{H}$-matrix, distributed-memory, parallel computing. AB -

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $Ω(P^2)$ scheduling procedure used in previous work, where $P$ is the number of processes, while data balancing is well-preserved. Based on the data distribution, our distributed-memory algorithm evenly distributes all computations among $P$ processes and adopts a novel tree-communication algorithm to reduce the latency cost. The overall complexity of our algorithm is $\mathscr{O}(\frac{Nlog N}{P} +αlog P+βlog^2P)$ for $\mathcal{H}$-matrices under weak admissibility condition, where $N$ is the matrix size, $α$ denotes the latency, and $β$ denotes the inverse bandwidth. Numerically, our algorithm is applied to address both two- and three-dimensional problems of various sizes among various numbers of processes. On thousands of processes, good parallel efficiency is still observed.

Yingzhou Li, Jack Poulson & LexingYing. (2021). Distributed-Memory $\mathcal{H}$-Matrix Algebra I: Data Distribution and Matrix-Vector Multiplication. CSIAM Transactions on Applied Mathematics. 2 (3). 431-459. doi:10.4208/csiam-am.2020-0206
Copy to clipboard
The citation has been copied to your clipboard