OSCER Benchmarks

Aspen Systems Linux Pentium4 Xeon Cluster (`boomer.oscer.ou.edu`)

These benchmarks are provided with no claims as to correctness or relevance.

High Performance Linpack (HPL)

HPL is the de facto standard for benchmarking supercomputers. It's used for determining rankings on the Top 500 List of supercomputers.

Sustained performance on 256 processors: 606.9 GFLOP/s (57.5% of peak)
(We used 256 processors instead of 264 because HPL runs faster when the number of processors is a perfect square.)

This number was obtained using Kazushige Goto's DGEMM kernel for Pentium4.
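
The perfect-square remark ties in with how HPL lays out its processes: it distributes the matrix over a two-dimensional P x Q process grid, and a count such as 256 admits an exactly square 16 x 16 grid while 264 does not. The sketch below is illustrative only. The helper name `most_square_grid` is hypothetical, and the grid actually used for the run above is not stated here.

```python
import math


def most_square_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as close to Q as possible."""
    p = math.isqrt(nprocs)
    # Walk down from floor(sqrt(nprocs)) until we hit a divisor.
    while nprocs % p != 0:
        p -= 1
    return p, nprocs // p


if __name__ == "__main__":
    for nprocs in (256, 264):
        p, q = most_square_grid(nprocs)
        print(f"{nprocs} processors -> {p} x {q} grid")
```

For 264 processors the closest factorization is noticeably rectangular, which is consistent with the choice to leave eight processors idle.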

Input file
Output file

Benchmarks on various numbers of processors

Raw performance (GFLOP/s): PostScript, PDF

Percent of theoretical peak: PostScript, PDF
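
As a quick check on the arithmetic behind a percent-of-peak figure, the sketch below back-computes the theoretical peak implied by the 256-processor result quoted above (606.9 GFLOP/s at 57.5% of peak). This page does not state the clock rate or whether the peak is taken over 256 or 264 processors, so the per-processor figures are printed for both as an assumption, not a claim about the hardware.

```python
SUSTAINED_GFLOPS = 606.9   # sustained HPL rate quoted above
FRACTION_OF_PEAK = 0.575   # 57.5% of peak, as quoted above


def implied_peak(sustained, fraction):
    """Theoretical peak implied by a sustained rate and its fraction of peak."""
    return sustained / fraction


if __name__ == "__main__":
    peak = implied_peak(SUSTAINED_GFLOPS, FRACTION_OF_PEAK)
    print(f"implied theoretical peak: {peak:.1f} GFLOP/s")
    # Which processor count the peak refers to is an assumption either way.
    for procs in (256, 264):
        print(f"  spread over {procs} processors: {peak / procs:.2f} GFLOP/s each")
```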

STREAM

The STREAM benchmark measures memory bandwidth on one or more processors within a cluster node or an SMP.

All data are in MB/sec.

Note that these benchmarks are based on the generic version of the STREAM benchmark. For each entry, only the maximum values are provided.
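
For readers unfamiliar with the four column names, the sketch below shows what each STREAM kernel computes and how a MB/sec figure is derived from the bytes moved and the best (shortest) time. It is not the STREAM source and is not a valid measurement: Python lists are used purely for illustration, the 8-byte word size is nominal, and the function name `run_stream` and its parameters are hypothetical.

```python
import time


def run_stream(n=200_000, q=3.0, ntimes=5):
    """Run the four STREAM-style kernels; return (best MB/sec per kernel, arrays)."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    word = 8  # nominal bytes per double-precision element
    # Copy and Scale touch two arrays; Add and Triad touch three.
    bytes_moved = {"Copy": 2 * word * n, "Scale": 2 * word * n,
                   "Add": 3 * word * n, "Triad": 3 * word * n}
    best = {name: float("inf") for name in bytes_moved}

    def timed(name, kernel):
        start = time.perf_counter()
        kernel()
        best[name] = min(best[name], time.perf_counter() - start)

    def copy():
        c[:] = a                                      # c = a

    def scale():
        b[:] = [q * ci for ci in c]                   # b = q * c

    def add():
        c[:] = [ai + bi for ai, bi in zip(a, b)]      # c = a + b

    def triad():
        a[:] = [bi + q * ci for bi, ci in zip(b, c)]  # a = b + q * c

    for _ in range(ntimes):
        timed("Copy", copy)
        timed("Scale", scale)
        timed("Add", add)
        timed("Triad", triad)

    # Best rate uses the minimum time; 1 MB is taken as 10**6 bytes.
    # The max() guards against a zero reading on a coarse timer.
    rates = {name: 1e-6 * bytes_moved[name] / max(best[name], 1e-12)
             for name in bytes_moved}
    return rates, (a, b, c)


if __name__ == "__main__":
    rates, _ = run_stream()
    for name, rate in rates.items():
        print(f"{name:6s} {rate:12.4f} MB/sec")
```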

Using `gcc -O`
Procs	Copy	Scale	Add	Triad
1	1481.9591	1481.0690	1653.8636	1665.6851
Using `egcs -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer`
Procs	Copy	Scale	Add	Triad
1	1276.2237	1269.3378	1452.8294	1450.1064
Using `pgcc -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse`
Procs	Copy	Scale	Add	Triad
1	1365.6556	1364.3167	1533.3977	1535.1661
Using `pgf77 -O4 -Munroll -Mnoframe -Mnobounds -Mnodepchk -Mcache_align -Mdalign -Mvect=sse -mp`
Procs	Copy	Scale	Add	Triad
1	1362.3381	1360.8911	1529.9287	1534.9195
Using `icc -O3 -tpp7 -xW -static`
Procs	Copy	Scale	Add	Triad
1	1372.0924	1370.3309	1535.6075	1535.6532

LLCbench

LLCbench contains three benchmarks: BLASBench, CacheBench and MPBench.

BLASBench
BLASBench measures numerical performance using the Basic Linear Algebra Subprograms. The tested routines are: DAXPY (Double precision a*x+y), DGEMV (Double precision GEneral Matrix-Vector multiply) and DGEMM (Double precision GEneral Matrix-Matrix multiply).
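
To make the three routine names concrete, here is a minimal pure-Python sketch of the operations they perform. These are reference definitions only, written for clarity: the real BLAS routines update their output argument in place and take extra arguments (transpose flags, leading dimensions, strides) that are omitted here.

```python
def daxpy(a, x, y):
    """y <- a*x + y for vectors x and y (about 2n flops)."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def dgemv(alpha, A, x, beta, y):
    """y <- alpha*A*x + beta*y for an m-by-n matrix A (about 2mn flops)."""
    return [alpha * sum(aij * xj for aij, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]


def dgemm(alpha, A, B, beta, C):
    """C <- alpha*A*B + beta*C for m-by-k A and k-by-n B (about 2mnk flops)."""
    cols_of_B = list(zip(*B))  # iterate B by column
    return [[alpha * sum(a * b for a, b in zip(row, col)) + beta * cij
             for col, cij in zip(cols_of_B, c_row)]
            for row, c_row in zip(A, C)]


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(daxpy(2.0, [1.0, 2.0], [3.0, 4.0]))
    print(dgemv(1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0]))
    print(dgemm(1.0, A, B, 0.0, zeros))
```

The flop counts in the docstrings show why the three are benchmarked together: the work per element of data grows from DAXPY to DGEMV to DGEMM, so DGEMM has the most opportunity to reuse data held in cache.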

Using Netlib BLAS (`gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer`): PDF, PostScript

Using ATLAS (`gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer`): PDF, PostScript

Using ATLAS (`icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report`): PDF, PostScript

CacheBench
CacheBench measures memory bandwidth, much as STREAM does, but also determines the bandwidth of the cache(s).
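
The general idea of probing cache bandwidth can be sketched as timing the same read operation over working sets of increasing size, so that the smaller sets fit in cache and the larger ones spill to main memory. The code below is a rough illustration of that idea, not CacheBench itself: the function name `read_bandwidth` and its parameters are hypothetical, and interpreter overhead in Python will mask much of the cache effect that a compiled benchmark exposes.

```python
import time
from array import array


def read_bandwidth(sizes_bytes, repeats=20):
    """Return {working-set size in bytes: best observed read rate in MB/sec}."""
    results = {}
    for size in sizes_bytes:
        # Contiguous buffer of 8-byte doubles, approximately `size` bytes long.
        data = array("d", [1.0]) * (size // 8)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            sum(data)  # one pass reading every element
            best = min(best, time.perf_counter() - start)
        # max() guards against a zero reading on a coarse timer.
        results[size] = 1e-6 * len(data) * 8 / max(best, 1e-9)
    return results


if __name__ == "__main__":
    sizes = [2 ** k for k in range(10, 25)]  # 1 KB up to 16 MB
    for size, rate in read_bandwidth(sizes).items():
        print(f"{size:>10d} bytes  {rate:10.1f} MB/sec")
```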

Using `icc -O2 -mp -prec_div -pc64 -unroll -tpp7 -xW -vec_report2 -opt_report`: PDF, PostScript

Using `gcc -O3 -funroll-all-loops -fno-f2c -fomit-frame-pointer`: PDF, PostScript

MPBench: coming soon