OSCER Benchmarks
These benchmarks are provided with
no claims as to correctness or relevance.
HPL is the de facto standard for benchmarking
supercomputers. It's used for determining rankings on
the Top 500 List of supercomputers.
Sustained performance on 256 processors: 606.9 GFLOP/s
(57.5% of peak)
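The percentage relates the sustained rate to the theoretical peak, which this page does not state. As a minimal Python sketch of that arithmetic, using only the two reported figures (the function names are illustrative, not part of HPL):

```python
# Figures reported on this page.
SUSTAINED_GFLOPS = 606.9    # HPL result on 256 processors
FRACTION_OF_PEAK = 0.575    # "57.5% of peak"


def implied_peak(sustained, fraction):
    """Peak rate implied by a sustained rate and its fraction of peak."""
    return sustained / fraction


def efficiency(sustained, peak):
    """Fraction of peak actually sustained."""
    return sustained / peak


peak_gflops = implied_peak(SUSTAINED_GFLOPS, FRACTION_OF_PEAK)
print(f"implied peak: {peak_gflops:.1f} GFLOP/s")   # implied peak: 1055.5 GFLOP/s
```

The implied peak is only as precise as the rounded percentage it is derived from.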
(We used 256 processors instead of 264 because HPL runs
faster on numbers of processors that are perfect squares.)
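HPL distributes the matrix over a P x Q grid of processes, so a perfect-square processor count allows a grid with P equal to Q. The sketch below, in Python, finds the most nearly square grid for a given count. The helper name is hypothetical, and the P and Q values actually used for this run are those in the input file, not something this sketch recovers.

```python
import math


def squarest_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as large as possible."""
    p = math.isqrt(nprocs)
    while nprocs % p != 0:
        p -= 1
    return p, nprocs // p


for n in (256, 264):
    p, q = squarest_grid(n)
    print(f"{n} processors -> {p} x {q}")
# 256 processors -> 16 x 16
# 264 processors -> 12 x 22
```

With 264 processors the closest available grid is noticeably rectangular, which is consistent with the choice of 256 described above.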
This result was obtained using Kazushige
Goto's DGEMM kernel for the Pentium 4.
Input file
Output file
Benchmarks on various numbers of processors
The STREAM benchmark measures memory bandwidth
on one or more processors within a cluster node or an
SMP.
All data are in MB/sec.
Note that these benchmarks are based on the
generic version of the STREAM benchmark. For each entry,
only the maximum values are provided.
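For orientation, the Copy, Scale, Add and Triad columns correspond to STREAM's four kernels. The following is a minimal Python sketch, not the STREAM source: the function names are made up for illustration, and the byte counts assume 8-byte double-precision elements and 1 MB = 10^6 bytes. It shows how a bandwidth figure follows from the bytes a kernel moves and the shortest time over several trials.

```python
import time

WORD_BYTES = 8  # assumed size of one double-precision element

# Array words read plus written per element by each kernel.
WORDS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def run_kernel(name, a, b, c, scalar):
    """Apply one STREAM-style kernel in place (illustrative, not tuned)."""
    if name == "Copy":        # c = a
        c[:] = a
    elif name == "Scale":     # b = scalar * c
        b[:] = [scalar * x for x in c]
    elif name == "Add":       # c = a + b
        c[:] = [x + y for x, y in zip(a, b)]
    elif name == "Triad":     # a = b + scalar * c
        a[:] = [y + scalar * z for y, z in zip(b, c)]
    else:
        raise ValueError(f"unknown kernel: {name}")


def rate_mb_per_s(name, n_elements, seconds):
    """Bandwidth in MB/s (10**6 bytes) for one kernel pass over n elements."""
    bytes_moved = WORDS_MOVED[name] * WORD_BYTES * n_elements
    return 1.0e-6 * bytes_moved / seconds


def best_rate(name, n_elements=100_000, trials=5):
    """Time several trials and report the rate of the fastest one."""
    a = [1.0] * n_elements
    b = [2.0] * n_elements
    c = [0.0] * n_elements
    best_time = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        run_kernel(name, a, b, c, 3.0)
        best_time = min(best_time, time.perf_counter() - start)
    return rate_mb_per_s(name, n_elements, best_time)


if __name__ == "__main__":
    for kernel in ("Copy", "Scale", "Add", "Triad"):
        print(f"{kernel:6s} {best_rate(kernel):12.4f} MB/s")
```

For example, rate_mb_per_s("Triad", 2_000_000, 0.03) gives 1600 MB/s, because 48 MB are moved in 0.03 s. Interpreted Python will report far lower rates than the compiled results listed here; the sketch is meant to show the bookkeeping only.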
Using gcc -O
Procs | Copy | Scale | Add | Triad
1 | 1481.9591 | 1481.0690 | 1653.8636 | 1665.6851
Using egcs -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer
Procs | Copy | Scale | Add | Triad
1 | 1276.2237 | 1269.3378 | 1452.8294 | 1450.1064
Using pgcc -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse
Procs | Copy | Scale | Add | Triad
1 | 1365.6556 | 1364.3167 | 1533.3977 | 1535.1661
Using pgf77 -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse -mp
Procs | Copy | Scale | Add | Triad
1 | 1362.3381 | 1360.8911 | 1529.9287 | 1534.9195
Using icc -O3 -tpp7 -xW -static
Procs | Copy | Scale | Add | Triad
1 | 1372.0924 | 1370.3309 | 1535.6075 | 1535.6532
LLCbench contains three benchmarks: BLASBench, CacheBench
and MPBench.
BLASBench
BLASBench measures numerical performance using the
Basic Linear Algebra Subprograms (BLAS). The tested
routines are DAXPY (Double precision a*x+y), DGEMV
(Double precision GEneral Matrix-Vector multiply) and
DGEMM (Double precision GEneral Matrix-Matrix multiply).
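To make the three operations concrete, here is a minimal pure-Python sketch of what each routine computes. It is an illustration only: the real BLAS routines update their output argument in place and also take transpose and stride arguments, which are omitted here, and the flop counts in the comments are leading-order estimates.

```python
def daxpy(alpha, x, y):
    """DAXPY: alpha*x + y for vectors of length n (about 2n flops)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dgemv(alpha, a, x, beta, y):
    """DGEMV: alpha*A*x + beta*y for an m-by-n matrix A (about 2mn flops)."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(a, y)]


def dgemm(alpha, a, b, beta, c):
    """DGEMM: alpha*A*B + beta*C for m-by-k A and k-by-n B (about 2mnk flops)."""
    cols_b = list(zip(*b))  # columns of B, so each entry is a dot product
    return [[alpha * sum(aik * bkj for aik, bkj in zip(row, col)) + beta * cij
             for col, cij in zip(cols_b, c_row)]
            for row, c_row in zip(a, c)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]

print(daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
print(dgemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))       # [3.0, 7.0]
print(dgemm(1.0, A, B, 0.0, ZERO))                      # [[19.0, 22.0], [43.0, 50.0]]
```

The three routines do progressively more arithmetic per element loaded (vector-vector, matrix-vector, matrix-matrix), which is why tuned libraries typically show their largest gains on DGEMM.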
Using Netlib BLAS (gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer) | PDF | PostScript
Using ATLAS (gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer) | PDF | PostScript
Using ATLAS (icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report) | PDF | PostScript
CacheBench
CacheBench measures memory bandwidth much as STREAM
does, but it also determines the bandwidth of the
cache(s).
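One common way to expose cache bandwidth is to repeat a memory operation over working sets of increasing size and watch where the rate drops. The Python sketch below illustrates that sweep with a simple buffer copy. It is an assumption-laden illustration, not CacheBench itself: the function names, the size range, and the choice of a copy as the measured operation are all picked here for the example.

```python
import time


def sweep_sizes(lo=1024, hi=8 * 1024 * 1024):
    """Yield working-set sizes in bytes, doubling from lo up to hi."""
    size = lo
    while size <= hi:
        yield size
        size *= 2


def copy_bandwidth_mb_s(nbytes, repeats=20):
    """Best observed copy rate in MB/s (10**6 bytes) for an nbytes buffer."""
    src = bytearray(nbytes)
    dst = bytearray(nbytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src                      # same-length slice copy
        best = min(best, time.perf_counter() - start)
    # Count both the bytes read and the bytes written; guard against a
    # zero reading from the timer on very small buffers.
    return 1.0e-6 * 2 * nbytes / max(best, 1e-9)


if __name__ == "__main__":
    for size in sweep_sizes():
        print(f"{size:10d} bytes  {copy_bandwidth_mb_s(size):12.1f} MB/s")
```

On many machines the reported rate falls as the buffers outgrow each cache level, although timer resolution and interpreter overhead make the small-size figures noisy.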
Using icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report | PDF | PostScript
Using gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer | PDF | PostScript
MPBench: coming soon