UPDATE (April 22, 2017): Timings for Mathematica 11.1 have been added to the table, thanks to a test script contributed by Bor Plestenjak. I suggest taking a look at his excellent toolbox for multiparameter eigenvalue problems, MultiParEig.
UPDATE (November 17, 2016): All timings in the table have been updated to reflect speed improvements in the new version of the toolbox (4.3.0). The toolbox now computes elementary functions using multi-core parallelism. We have also included timings for the latest version of MATLAB, R2016b.
UPDATE (June 1, 2016): The initial version of this post stated that the newest version of MATLAB, R2016a, uses the MAPLE engine for variable precision arithmetic (instead of MuPAD, as in previous versions). After more detailed checks we found that this is not true. As it turned out, MAPLE 2016 had silently replaced the VPA functionality of MATLAB during installation. Thus we had (unknowingly) tested the MAPLE Toolbox for MATLAB instead of the MathWorks Symbolic Math Toolbox. We apologize for the misinformation. The post now provides correct comparison results with Symbolic Math Toolbox/VPA.
Thanks to Nick Higham, Massimiliano Fasi and Samuel Relton for their help in finding this mistake!
From the very beginning we have focused on improving the performance of matrix computations, linear algebra, solvers and other high-level algorithms (see, e.g., the 3.8.0 release notes).
Over time, as our advanced algorithms became faster, elementary functions started to appear more and more often at the top of the list of hot-spots. For example, the main bottleneck of the multiquadric collocation method in extended precision was the element-wise power function (.^).
Thus we decided to polish our library for computing elementary functions. Here we present intermediate results of this work, together with our traditional comparison against the latest MATLAB R2016b (Symbolic Math Toolbox/Variable Precision Arithmetic), MAPLE 2016 and Wolfram Mathematica 11.1.0.0.
Timing of logarithmic and power functions in 3.9.4.10481:
>> mp.Digits(34);
>> A = mp(rand(2000)-0.5);
>> B = mp(rand(2000)-0.5);
>> tic; C = A.^B; toc;
Elapsed time is 67.199782 seconds.
>> tic; C = log(A); toc;
Elapsed time is 22.570701 seconds.
Speed of the same functions after optimization, in 4.3.0.12057:
>> mp.Digits(34);
>> A = mp(rand(2000)-0.5);
>> B = mp(rand(2000)-0.5);
>> tic; C = A.^B; toc;    % 130 times faster
Elapsed time is 0.514553 seconds.
>> tic; C = log(A); toc;  % 95 times faster
Elapsed time is 0.238416 seconds.
The toolbox now computes 4 million logarithms in quadruple precision (including negative arguments) in less than a second!
Inspired by this result, we have applied our ideas to speed up other elementary functions as well. The summary table lists timings and speed-up factors against MATLAB R2016b (VPA), MAPLE 2016 and Wolfram Mathematica 11.1.0.0 on a Core i7 990x running Windows 7 64-bit:
Function | MATLAB (VPA), sec | Maple, sec | Mathematica, sec | Advanpix, sec | Speed-up over VPA (times) | Speed-up over Maple (times) | Speed-up over Mathematica (times) |
---|---|---|---|---|---|---|---|
Power & exponential: | |||||||
EXP | 107.34 | 756.14 | 4.54 | 0.12 | 886.34 | 6243.90 | 37.49 |
LOG | 1161.18 | 593.98 | 6.61 | 0.23 | 5133.40 | 2625.91 | 29.21 |
LOG10 | 1438.91 | 639.46 | 11.13 | 0.24 | 5958.23 | 2647.88 | 46.09 |
LOG2 | 1442.71 | 643.17 | 11.08 | 0.25 | 5789.35 | 2580.94 | 44.48 |
SQRT | 28.75 | 427.40 | 2.60 | 0.27 | 105.74 | 1571.90 | 9.55 |
Trigonometric: | |||||||
SIN | 85.28 | 736.89 | 6.07 | 0.15 | 570.80 | 4932.33 | 40.62 |
COS | 78.96 | 513.73 | 6.10 | 0.15 | 516.44 | 3359.92 | 39.89 |
TAN | 1261.92 | 844.05 | 8.91 | 0.17 | 7277.51 | 4867.64 | 51.37 |
ASIN | 105.12 | 1181.83 | 12.39 | 0.39 | 266.40 | 2995.01 | 31.39 |
ACOS | 100.49 | 1330.99 | 23.10 | 0.39 | 257.55 | 3411.03 | 59.19 |
ATAN | 131.92 | 1039.55 | 5.71 | 0.14 | 974.28 | 7677.64 | 42.17 |
SEC | 1466.09 | 778.14 | 8.00 | 0.18 | 8199.59 | 4352.01 | 44.76 |
CSC | 1503.75 | 793.87 | 8.35 | 0.18 | 8490.95 | 4482.60 | 47.13 |
COT | 1511.67 | 1014.76 | 10.46 | 0.20 | 7728.36 | 5187.95 | 53.48 |
ASEC | 1610.29 | 1962.87 | 18.45 | 0.28 | 5815.44 | 7088.72 | 66.62 |
ACSC | 1648.31 | 1720.76 | 21.96 | 0.28 | 5965.65 | 6227.86 | 79.47 |
ACOT | 140.37 | 1179.84 | 16.61 | 0.16 | 867.58 | 7291.96 | 102.63 |
SINH | 117.85 | 781.78 | 6.88 | 0.13 | 910.78 | 6041.59 | 53.17 |
COSH | 117.73 | 795.34 | 7.00 | 0.13 | 924.06 | 6242.87 | 54.92 |
TANH | 121.37 | 976.78 | 9.20 | 0.10 | 1198.14 | 9642.45 | 90.78 |
ASINH | 92.55 | 778.46 | 13.51 | 0.14 | 656.38 | 5521.02 | 95.81 |
ACOSH | 103.78 | 1349.79 | 20.51 | 0.31 | 332.10 | 4319.31 | 65.65 |
ATANH | 121.46 | 2287.94 | 11.60 | 0.32 | 378.49 | 7129.76 | 36.14 |
SECH | 1922.54 | 978.91 | 9.10 | 0.17 | 11602.53 | 5907.73 | 54.93 |
CSCH | 1947.11 | 960.78 | 8.96 | 0.17 | 11652.35 | 5749.72 | 53.63 |
COTH | 1958.51 | 1268.98 | 10.90 | 0.12 | 16266.72 | 10539.72 | 90.502 |
ASECH | 2378.24 | 2921.78 | 18.75 | 0.43 | 5476.04 | 6727.56 | 43.18 |
ACSCH | 2087.72 | 1188.18 | 17.78 | 0.16 | 12831.71 | 7302.87 | 109.26 |
ACOTH | 2117.19 | 2335.23 | 19.77 | 0.26 | 8083.95 | 8916.49 | 75.47 |
Selected special: | |||||||
gamma | 2491.81 | 7734.53 | 228.35 | 0.76 | 3266.23 | 13018.78 | 299.31 |
erf | 104.11 | 321.20 | 125.88 | 0.16 | 669.96 | 2163.26 | 810.02 |
bessely(0,x) | 7855.70 | 14923.53 | 250.38 | 0.83 | 9482.98 | 18014.89 | 302.25 |
bessely(1,x) | 7302.29 | 14964.26 | 267.94 | 0.83 | 8786.29 | 18005.36 | 322.39 |
besselj(0,x) | 7273.29 | 9998.60 | 90.54 | 0.75 | 9684.81 | 13313.72 | 120.55 |
besselj(1,x) | 5987.67 | 10153.13 | 91.89 | 0.74 | 8077.25 | 13696.38 | 123.96 |
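Each speed-up column is a ratio of timings: the competitor's time divided by the Advanpix time for the same function. A minimal MATLAB sketch of that arithmetic for a few rows of the table follows; the selection of rows and the variable names are ours, and taking the arithmetic mean over functions is an assumption about how averages are formed. The published ratios appear to have been computed from unrounded Advanpix timings, so recomputing them from the two-decimal values shown gives slightly different numbers (e.g. 107.34/0.12 = 894.5 for EXP, versus 886.34 in the table).

```matlab
% Timings in seconds copied from a few rows of the table above.
% Columns: MATLAB (VPA), Maple, Mathematica, Advanpix.
names = {'EXP', 'LOG', 'SQRT', 'gamma'};
t = [ 107.34   756.14    4.54  0.12
     1161.18   593.98    6.61  0.23
       28.75   427.40    2.60  0.27
     2491.81  7734.53  228.35  0.76];

% Speed-up of Advanpix over each competitor = competitor time / Advanpix time.
% bsxfun divides every column of t(:,1:3) by the Advanpix column t(:,4).
speedup = bsxfun(@rdivide, t(:, 1:3), t(:, 4));

% Average speed-up per competitor (arithmetic mean over the listed functions).
avgSpeedup = mean(speedup, 1);

fprintf('%-6s %10s %10s %12s\n', 'Func', 'Over VPA', 'Over Maple', 'Over Mathem.');
for k = 1:numel(names)
    fprintf('%-6s %10.2f %10.2f %12.2f\n', names{k}, speedup(k, :));
end
fprintf('%-6s %10.2f %10.2f %12.2f\n', 'mean', avgSpeedup);
```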
On average, the Advanpix toolbox is about 5000 times faster than MATLAB/VPA, 6766 times faster than MAPLE and 100 times faster than Wolfram Mathematica. Test scripts are available for download:
Run timing_elementary_advanpix to test the Advanpix toolbox, and timing_elementary_vpa to test VPA. Don't forget to add the toolbox directory to the MATLAB search path before running the toolbox tests!
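The contents of the timing scripts are not reproduced in this post. As a hypothetical sketch of the kind of measurement involved, the MATLAB loop below times several elementary functions on a 2000×2000 matrix with tic/toc, in the same way as the sessions shown earlier. It is not the actual timing_elementary_advanpix script; the names makeArray, names and funcs are our own. Set makeArray = @mp (after mp.Digits(34)) to measure the Advanpix toolbox, or leave it as @double for a quick run without the toolbox.

```matlab
% Hypothetical timing loop in the spirit of the tic/toc sessions above;
% this is NOT the contents of timing_elementary_advanpix.
% makeArray converts a double matrix to the number type under test:
%   makeArray = @mp;      % Advanpix toolbox (call mp.Digits(34) first)
%   makeArray = @double;  % plain MATLAB doubles, e.g. for a quick dry run
makeArray = @double;

n = 2000;                          % matrix size used in the sessions above
A = makeArray(rand(n) - 0.5);      % n^2 arguments in [-0.5, 0.5)

names = {'EXP', 'LOG', 'SQRT', 'SIN', 'COS', 'TAN', 'ATAN', 'SINH', 'TANH'};
funcs = {@exp,  @log,  @sqrt,  @sin,  @cos,  @tan,  @atan,  @sinh,  @tanh};

elapsed = zeros(1, numel(funcs));
for k = 1:numel(funcs)
    f = funcs{k};
    tic;
    C = f(A);                      % element-wise evaluation; negative entries make LOG and SQRT complex
    elapsed(k) = toc;
end
clear C                            % free the last result matrix

for k = 1:numel(funcs)
    fprintf('%-5s %9.3f sec\n', names{k}, elapsed(k));
end
```

A single tic/toc measurement can be noisy for sub-second timings, so repeating each call a few times and keeping the minimum may give more stable numbers.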
***
†The toolbox's timings are higher on GNU Linux and Apple Mac OS X. We can do deeper performance optimization on Windows, since we have a full license for the Intel developer tools on that platform.