Over the last several months we have been focusing on making the toolbox faster at processing large matrices. Although many updates were released along the way (see the full changelog), here we outline only the most important changes:
- High-level dense linear algebra routines have been updated to use modern performance-oriented algorithms [1–6].
- Most of the dense matrix algorithms now support parallel execution on multi-core CPUs. In the future we plan to add GPU support (at least for quadruple precision).
- Many significant improvements have been made to the toolbox architecture: low-level kernel modules have been re-designed to minimize memory fragmentation, cache misses, copy overhead in data exchange with MATLAB, etc.
All algorithms are implemented in both quadruple and arbitrary precision modes (a short sketch follows below).
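Switching between the two precision modes is a one-line change; the overloaded routines pick up the current precision automatically. A minimal sketch (matrix sizes and digit counts here are arbitrary, chosen only for illustration):

```matlab
% Quadruple precision (34 decimal digits) -- the toolbox default:
mp.Digits(34);
A = mp(rand(100));       % convert a double matrix to multiprecision
[Q,R] = qr(A);           % overloaded dense routine, parallelized on multi-core CPUs

% Arbitrary precision -- the same code, just request more digits:
mp.Digits(200);
A = mp(rand(100));
s = svd(A);
```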
Combined, these changes allow the toolbox to reach the following results:
UPDATE (April 22, 2017): Timings for Mathematica 11.1 have been added to the table, thanks to a test script contributed by Bor Plestenjak.
Timings and speed-ups for pseudo-random real 500×500 matrices A and B (timings in seconds, speed-ups as factors):

| Factorization | MATLAB (sec) | Maple (sec) | Mathematica (sec) | Advanpix (sec) | Speed-up over MATLAB | Speed-up over Maple | Speed-up over Mathematica |
|---|---|---|---|---|---|---|---|
| [L,U] = lu(A) | 249.13 | 85.16 | 15.12 | 0.47 | 534.38 | 182.67 | 32.42 |
| [Q,R] = qr(A) | 514.34 | 458.86 | 44.39 | 3.08 | 167.25 | 149.21 | 14.43 |
| [U,S,V] = svd(A) | 5595.26 | 4317.40 | 376.94 | 9.57 | 584.62 | 451.11 | 39.38 |
| S = svd(A) | 2765.11 | 645.60 | 56.96 | 2.98 | 927.17 | 216.48 | 19.10 |
| [V,D] = eig(A) | 21806.30 | 6060.90 | 584.86 | 33.75 | 646.05 | 179.57 | 17.33 |
| lambda = eig(A) | 3384.58 | 7822.30 | 205.46 | 23.55 | 143.70 | 332.11 | 8.72 |
| [V,D] = eig(A,B) | n/a | 11358.00 | 928.89 | 112.05 | ∞ | 101.36 | 8.29 |
| lambda = eig(A,B) | n/a | 5273.00 | 510.80 | 60.37 | ∞ | 87.34 | 8.46 |
On average, the Advanpix toolbox outperforms VPA by 500 times, Maple by 210 times, and Wolfram Mathematica by 18 times.
VPA requires 6 hours for the full eigendecomposition alone!
Maple was unable to handle the 500×500 matrix in GUI mode (it choked with "mserver stopped responding"), so eventually we switched to Maple's command-line interface. In the complex case (not shown), Maple was unable to provide accurate eigenvectors for the generalized eigenproblem at all, and it required 16 hours to compute the eigenvalues…
Our test environment and test scripts:
- Core i7-990X Extreme Edition, 4.27 GHz, 24GB RAM, Windows 7 64-bit
- MATLAB 2016b, Multiprecision Computing Toolbox 4.3.3
- Maple 2016
- Mathematica 11.1.0.0
- Maple test script: maple_denseset.zip
- MATLAB test script for VPA: vpa-denseset.zip
- MATLAB test script for Advanpix: advanpix-denseset.zip (a simplified sketch of this script is shown after this list)
- Mathematica test script: mathematica-denseset.zip
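For orientation, here is a minimal sketch of the kind of timing loop used for the Advanpix column; the linked advanpix-denseset.zip script is the authoritative version, and the single run per operation here is a simplification:

```matlab
% Time the factorizations from the table above in quadruple precision.
% A and B are 500x500 pseudo-random real matrices, as in the benchmark.
mp.Digits(34);
A = mp(rand(500));
B = mp(rand(500));

tic; [L,U]     = lu(A);     fprintf('lu(A)       %10.2f sec\n', toc);
tic; [Q,R]     = qr(A);     fprintf('qr(A)       %10.2f sec\n', toc);
tic; [Us,S,Vs] = svd(A);    fprintf('svd(A)      %10.2f sec\n', toc);
tic; s         = svd(A);    fprintf('svd, values %10.2f sec\n', toc);
tic; [V,D]     = eig(A);    fprintf('eig(A)      %10.2f sec\n', toc);
tic; [Vg,Dg]   = eig(A,B);  fprintf('eig(A,B)    %10.2f sec\n', toc);
```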
Future Work
As always, we will continue working on the performance of dense matrix computations, focusing on more parallelism (multi-core CPU, GPU) and advanced algorithms (e.g. multi-shift QZ [7, 8] for generalized eigenproblems and MRRR for the SVD).
However, for the next release we will prioritize the long-awaited (and delayed) Krylov–Schur eigensolver [9] for sparse matrices. The iterative solvers GMRES and BiCGSTAB will be introduced as well.
Acknowledgements
I am grateful to Andre Van Moer (Université libre de Bruxelles), who not only ran all the comparison tests on his powerful computer but also spent a lot of time and effort making the tests run properly in every software package (re-writing scripts for the Maple CLI, etc.). I also thank him for his tremendous help with the ongoing testing of alpha and beta versions of the toolbox in various configurations and settings.
I also thank Gerhard Wittreich (University of Delaware), Michael Klanner (University of Technology Graz), Brandon Langley (University of Illinois at Urbana-Champaign), Adam Rancon (Ecole Normale Superieure de Lyon), Steffen Hofmann (Technische Universität Berlin) and many others who provided detailed feedback, bug reports and ideas for improvement.
It would be impossible to move forward without such supportive users. Thank you all.
References
1. Jessup E. R., Sorensen D. C., A Parallel Algorithm for Computing the Singular Value Decomposition of a Matrix, SIAM Journal on Matrix Analysis and Applications, 15:530–548, 1994.
2. Gu M., Eisenstat S. C., A divide-and-conquer algorithm for the bidiagonal SVD, SIAM Journal on Matrix Analysis and Applications, 16:79–92, 1995.
3. Braman K., Byers R., Mathias R., The multi-shift QR algorithm, Part I: Maintaining well-focused shifts and level 3 performance, SIAM Journal on Matrix Analysis and Applications, 23:929–947, 2002.
4. Braman K., Byers R., Mathias R., The multi-shift QR algorithm, Part II: Aggressive early deflation, SIAM Journal on Matrix Analysis and Applications, 23:948–973, 2002.
5. Dhillon I. S., Parlett B. N., Vömel C., The design and implementation of the MRRR algorithm, ACM Transactions on Mathematical Software, 32:533–560, 2006.
6. Dhillon I. S., Parlett B. N., Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Linear Algebra and its Applications, 387(1):1–28, 2004.
7. Kressner D., Numerical methods and software for general and structured eigenvalue problems, PhD thesis, TU Berlin, Institut für Mathematik, 2004.
8. Kågström B., Kressner D., Multishift variants of the QZ algorithm with aggressive early deflation, SIAM Journal on Matrix Analysis and Applications, 2006.
9. Stewart G. W., A Krylov–Schur Algorithm for Large Eigenproblems, SIAM Journal on Matrix Analysis and Applications, 23:601–614, 2001.