CS 5220

Applications of Parallel Computers

Sparse direct methods

Prof David Bindel


Reminder: Sparsity and reordering

Permute unknowns for better SpMV performance, or for:

  • Stability of Gaussian elimination,
  • Fill reduction in Gaussian elimination,
  • Improved performance of preconditioners...

Reminder: Sparsity and partitioning

Want to partition sparse graphs so that

  • Subgraphs are the same size (load balance)
  • Cut size is minimal (minimize communication)

Matrices that are “almost” diagonal are good?

Reordering for bandedness

[Figure: spy plots comparing the natural order and the RCM order]

Reverse Cuthill-McKee

  • Select “peripheral” vertex \(v\)
  • Order according to breadth first search from \(v\)
  • Reverse ordering
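
A minimal sketch of RCM in action, assuming MATLAB with the built-in symrcm and the gallery test matrices:

% Compare matrix bandwidth before and after RCM reordering.
A = gallery('poisson', 10);      % 2D model problem, 5-point stencil
p = symrcm(A);                   % reverse Cuthill-McKee ordering
[i, j] = find(A);      bw_nat = max(abs(i - j));
[i, j] = find(A(p,p)); bw_rcm = max(abs(i - j));
fprintf('bandwidth: natural %d, RCM %d\n', bw_nat, bw_rcm);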

From iterative to direct

  • RCM ordering is great for SpMV
  • But isn’t narrow banding good for solvers, too?
    • LU takes \(O(nb^2)\) where \(b\) is bandwidth.
    • Great if there’s an ordering where \(b\) is small!
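
For the 2D model problem on a \(\sqrt{N} \times \sqrt{N}\) grid, for example, the best achievable bandwidth is \(b \sim \sqrt{N}\), so banded LU costs \(O(N b^2) = O(N^2)\) flops: better than dense, but not optimal (nested dissection, below, gets \(O(N^{3/2})\)).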

Skylines and profiles

  • Profile solvers generalize band solvers
  • Skyline storage for lower triangle: for each row \(i\),
    • Start and end of storage for nonzeros in row.
    • Contiguous nonzero list up to main diagonal.
  • In each column, first nonzero defines a profile.
  • All fill-in confined to profile.
  • RCM is again a good ordering.
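
A minimal sketch of computing profile (skyline) storage, assuming MATLAB and a symmetric test matrix:

% Profile storage: for each row, first nonzero through the diagonal.
A = gallery('poisson', 10);
n = size(A, 1);
prof = 0;
for i = 1:n
    f = find(A(i, 1:i), 1, 'first');   % first nonzero at or before diagonal
    prof = prof + (i - f + 1);         % contiguous storage for this row
end
fprintf('profile storage: %d of %d lower-triangle entries\n', ...
        prof, n*(n+1)/2);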

Beyond bandedness

  • Minimum bandwidth for 2D model problem? 3D?
  • Skyline only gets us so much further

Beyond bandedness

But more general solvers have similar structure

  • Ordering (minimize fill)
  • Symbolic factorization (where will fill be?)
  • Numerical factorization (pivoting?)
  • ... and triangular solves

Troublesome Trees

One step of Gaussian elimination completely fills this matrix!

Terrific Trees

Full Gaussian elimination generates no fill in this matrix!
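
The standard example behind both slides is a star graph (one hub joined to all leaves); a minimal sketch, assuming that is the intended picture:

% Star graph under two orderings: hub first (arrowhead) vs. hub last.
n = 10;
v = ones(n-1, 1);
Ahub1 = [n, v'; v, 2*eye(n-1)];    % hub ordered first: troublesome
Ahubn = [2*eye(n-1), v; v', n];    % hub ordered last: terrific
L1 = chol(Ahub1, 'lower');         % fills the whole lower triangle
Ln = chol(Ahubn, 'lower');         % no fill beyond the original pattern
fprintf('nnz(L): hub-first %d, hub-last %d\n', nnz(L1), nnz(Ln));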

Graphic Elimination

Consider first steps of GE

A(2:end,1)     = A(2:end,1)/A(1,1);    % multipliers (first column of L)
A(2:end,2:end) = A(2:end,2:end)-...    % rank-1 update of trailing block
                 A(2:end,1)*A(1,2:end);

Nonzero in the outer product at \((i,j)\) if A(i,1) and A(1,j) are both nonzero; with a symmetric pattern, that means \(i\) and \(j\) are both connected to 1.

General: Eliminate variable, connect remaining neighbors.
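
A minimal sketch of this rule as a hypothetical MATLAB helper acting on an adjacency pattern:

function G = eliminate(G, k)
% Symbolic elimination of vertex k: clique its neighbors, then remove it.
nbrs = setdiff(find(G(:, k)), k);  % remaining neighbors of k
G(nbrs, nbrs) = 1;                 % connect them pairwise (this is the fill)
G(k, :) = 0;                       % remove k from the graph
G(:, k) = 0;
end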

Terrific Trees Redux

Order leaves to root \(\implies\)
on eliminating \(i\), parent of \(i\) is only remaining neighbor.

Nested Dissection

  • Idea: Think of block tree structures.
  • Eliminate block trees from bottom up.
  • Can recursively partition at leaves.

Nested Dissection

  • Rough cost estimate: how much just to factor dense Schur complements associated with separators?
  • Notice graph partitioning appears again!
    • And again we want small separators!

Nested Dissection

Model problem: Laplacian with 5-point stencil (for 2D)

  • ND gives optimal complexity in exact arithmetic
    (George 73, Hoffman/Martin/Rose)
  • 2D: \(O(N \log N)\) memory, \(O(N^{3/2})\) flops
  • 3D: \(O(N^{4/3})\) memory, \(O(N^2)\) flops
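
A minimal sketch comparing fill on the model problem, assuming MATLAB's symrcm and dissect (the latter in newer releases):

% Fill in the Cholesky factor under different orderings.
A = gallery('poisson', 60);        % 2D Laplacian, 5-point stencil
orders = {1:size(A,1), symrcm(A), dissect(A)};
names  = {'natural', 'RCM', 'nested dissection'};
for k = 1:3
    p = orders{k};
    fprintf('%-18s nnz(L) = %d\n', names{k}, ...
            nnz(chol(A(p,p), 'lower')));
end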

Minimum Degree

  • Locally greedy strategy
    • Want to minimize upper bound on fill-in
    • Fill \(\leq\) (degree in remaining graph)\(^2\)
  • At each step
    • Eliminate vertex with smallest degree
    • Update degrees of neighbors
  • Problem: Expensive to implement!
    • But better variants via quotient graphs
    • Variants often used in practice
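
A minimal sketch of the naive greedy rule (quadratic-time, illustration only; production codes such as MATLAB's symamd use quotient-graph tricks):

function p = mindeg(A)
% Naive minimum-degree ordering on the pattern of A.
n = size(A, 1);
G = spones(A);  G(1:n+1:end) = 0;      % adjacency pattern, no self-loops
alive = true(1, n);
p = zeros(1, n);
for step = 1:n
    deg = full(sum(G, 1));             % degrees in the remaining graph
    deg(~alive) = inf;
    [~, k] = min(deg);                 % eliminate a min-degree vertex
    p(step) = k;
    nbrs = find(G(:, k));
    G(nbrs, nbrs) = 1;                 % clique the neighbors (fill)
    G(1:n+1:end) = 0;
    G(k, :) = 0;  G(:, k) = 0;         % remove k from the graph
    alive(k) = false;
end
end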

Elimination Tree

  • Variables (columns) are nodes in a tree
  • \(j\) a descendant of \(k\) if eliminating \(j\) updates \(k\)
  • Can eliminate disjoint subtrees in parallel!
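
A minimal sketch, assuming MATLAB's built-in etree:

% Elimination tree of the model problem.
A = gallery('poisson', 4);
parent = etree(A);       % parent(j) = min{ i > j : L(i,j) ~= 0 }
treeplot(parent);        % disjoint subtrees are independent work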

Cache locality

Basic idea: exploit “supernodal” (dense) structures in factor

  • e.g. arising from elimination of separator Schur complements in ND
  • Other alternatives exist (multifrontal solvers)

Pivoting

Pivoting is painful, particularly in distributed memory!

  • Cholesky — no need to pivot!
  • Threshold pivoting — pivot when things look dangerous (sketched below)
  • Static pivoting — try to decide up front
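
As one concrete hook, MATLAB's sparse lu exposes UMFPACK's pivot threshold; a hedged sketch of the stability/sparsity trade:

% Threshold pivoting: smaller thresh accepts weaker pivots for less fill.
P0 = gallery('poisson', 30);
A = P0 + 0.1*sprandn(P0);            % unsymmetric, same sparsity pattern
[L1, U1, P1, Q1] = lu(A, 1.0);       % thresh 1.0: strict partial pivoting
[L2, U2, P2, Q2] = lu(A, 0.1);       % relaxed threshold: often sparser
fprintf('nnz(L+U): thresh 1.0 -> %d, thresh 0.1 -> %d\n', ...
        nnz(L1) + nnz(U1), nnz(L2) + nnz(U2));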

What if things go wrong with threshold/static pivoting?
Common theme: Clean up sloppy solves with residual-based corrections

Direct to iterative

Can improve solution by iterative refinement:
\[\begin{aligned}
PAQ &\approx LU \\
x_0 &\approx Q U^{-1} L^{-1} Pb \\
r_0 &= b-Ax_0 \\
x_1 &\approx x_0 + Q U^{-1} L^{-1} P r_0
\end{aligned}\]
Looks like approximate Newton on \(F(x) = Ax-b = 0\).
This is just a stationary iterative method!
Nonstationary methods work, too.
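
A minimal sketch of one refinement step, using MATLAB's four-output sparse lu (which computes P*A*Q = L*U):

% Iterative refinement on top of a sparse LU factorization.
A = gallery('poisson', 30);
b = rand(size(A, 1), 1);
[L, U, P, Q] = lu(A);               % P*A*Q = L*U
x = Q * (U \ (L \ (P*b)));          % x0: initial solve
r = b - A*x;                        % r0: residual
x = x + Q * (U \ (L \ (P*r)));      % x1: refined solution
fprintf('relative residual: %.2e\n', norm(b - A*x)/norm(b));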

Variations on a theme

If we’re willing to sacrifice some accuracy in the factorization,

  • Single precision factor + double precision refinement? (sketched below)
  • Sloppy factorizations (marginal stability) + refinement?
  • Modify \(m\) small pivots as they’re encountered (low rank updates), fix with \(m\) steps of a Krylov solver?
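
A minimal sketch of the first variation above (single-precision factorization, double-precision refinement), on a dense, well-conditioned test matrix:

% Factor in single precision, refine in double.
n = 500;
A = rand(n) + n*eye(n);              % diagonally dominant test matrix
b = rand(n, 1);
[L, U, p] = lu(single(A), 'vector'); % cheap, row-pivoted factorization
L = double(L);  U = double(U);       % promote factors for the solves
x = U \ (L \ b(p));
for k = 1:3                          % a few refinement steps in double
    r = b - A*x;
    x = x + U \ (L \ r(p));
end
fprintf('relative residual: %.2e\n', norm(b - A*x)/norm(b));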

Parting advice

  • Sparse direct methods work well for 2D problems
  • They get more expensive for 3D problems
  • Approximate direct solves make good preconditioners