GWU - UPC NPB Benchmark Suite
===============================
The George Washington University - High Performance Computing Laboratory
- http://hpcl.gwu.edu/
Work done under the GWU UPC Project, supervised by Professor Tarek El-Ghazawi
- [email protected]
Project website :
- http://threads.hpcl.gwu.edu/sites/npb-upc
- SVN repository of the code, bug reports, ...
I) Presentation and remarks
---------------------------
The following kernels are available:
- CG - Conjugate Gradient: This benchmark computes an approximation to the
smallest eigenvalue of a symmetric positive definite matrix. This kernel
features unstructured grid computations requiring irregular long-range
communications.
- EP - Embarrassingly Parallel: This benchmark can run on any number of
processors with little communication. It estimates the upper achievable
limits for floating point performance of a parallel computer. This
kernel generates pairs of Gaussian random deviates according to a
specific scheme and tabulates the number of pairs in successive annuli.
- FT - Fast Fourier Transform: This benchmark solves a 3D partial
differential equation using an FFT-based spectral method, also
requiring long range communication. FT performs three one-dimensional
(1-D) FFTs, one for each dimension.
- IS - Integer Sort: This benchmark is a parallel sorting program based on
bucket sort. It relies heavily on total exchange (all-to-all) communication.
- MG - MultiGrid: This benchmark uses a V-cycle multigrid method to compute
the solution of the 3-D scalar Poisson equation. It performs both short
and long range communications that are highly structured.
- BTIO - Test of different parallel I/O techniques
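As an illustration of the bucket sort idea behind the IS kernel described above, here is a minimal serial sketch in C. It is not code from this distribution: the names bucket_sort and cmp_int, the key range and the use of qsort() inside each bucket are assumptions made for the example. In a parallel run each bucket would belong to one thread, and the scatter step (step 2) is where the total exchange of keys would take place.

```c
#include <assert.h>
#include <stdlib.h>

/* Three-way integer comparison for qsort(). */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts n keys, each in [0, max_key), in place.
   Bucket b holds the keys in [b*width, (b+1)*width).
   Returns 0 on success, -1 if memory allocation fails. */
int bucket_sort(int *keys, size_t n, int max_key, int nbuckets) {
    assert(max_key > 0 && nbuckets > 0);
    int width = (max_key + nbuckets - 1) / nbuckets; /* ceiling division */
    size_t *start = calloc((size_t)nbuckets + 1, sizeof *start);
    size_t *next = malloc((size_t)nbuckets * sizeof *next);
    int *tmp = malloc((n ? n : 1) * sizeof *tmp);
    if (!start || !next || !tmp) {
        free(start); free(next); free(tmp);
        return -1;
    }

    /* 1. Count the keys per bucket, then prefix-sum the counts so that
          start[b] is the first slot of bucket b in tmp. */
    for (size_t i = 0; i < n; i++)
        start[keys[i] / width + 1]++;
    for (int b = 0; b < nbuckets; b++)
        start[b + 1] += start[b];

    /* 2. Scatter every key into its bucket (the "exchange" step). */
    for (int b = 0; b < nbuckets; b++)
        next[b] = start[b];
    for (size_t i = 0; i < n; i++)
        tmp[next[keys[i] / width]++] = keys[i];

    /* 3. Sort each bucket on its own; the buckets are already ordered
          relative to each other. */
    for (int b = 0; b < nbuckets; b++)
        qsort(tmp + start[b], start[b + 1] - start[b], sizeof *tmp, cmp_int);

    for (size_t i = 0; i < n; i++)
        keys[i] = tmp[i];
    free(start); free(next); free(tmp);
    return 0;
}
```

For example, calling bucket_sort(keys, 8, 100, 4) on eight keys below 100 distributes them over four buckets of width 25 and leaves the array in ascending order.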
II) Content of the distribution
-------------------------------
A short description of the structure of the NAS NPB UPC distribution follows:
README       <- This file
CG/          <- contains the CG kernel and Makefile files
EP/          <- contains the EP kernel and Makefile files
FT/          <- contains the FT kernel and Makefile files
IS/          <- contains the IS kernel and Makefile files
MG/          <- contains the MG kernel, global.h and Makefile files
??/variants/ <- gathers the different UPC variants of the ?? workload, including
                a dynamically allocated O0 version, a statically allocated O3
                version (and sometimes a dynamically allocated O1 version); for
                details on variants O0, O1 and O3, see the notes below
bin/         <- executables are built in this directory
common/      <- common files (C files)
config/      <- configuration files
                (see section III: Building instructions)
sys/         <- C file that creates npbparams.h for each workload
Notes:
- The notations O0, O1 and O3 refer to the paper "UPC Benchmarking
Issues" (Tarek El-Ghazawi, Sebastien Chauvin, 30th Annual Conference
IEEE International Conference on Parallel Processing, 2001 (ICPP01)
Pages 365-372).
O0: No privatization, no prefetching
O1: Privatization hand-optimization implemented (local shared accesses
converted as much as possible to private accesses)
O2: No privatization but prefetching implemented (prefetching of blocks
of shared references) (NOTE: There is no O2 version of any
NPB workload)
O3: Privatization and prefetching hand-coded.
- All these problems have been implemented using dynamically allocated
shared memory. Several (CG, FT, IS) have a statically allocated shared
memory variant.
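To make the O0, O1 and prefetching notations above more concrete, here is a small sketch. It is not taken from the workloads: the array name, the block size BLK and the function names are hypothetical, and using upc_memget() for the block fetch is only one plausible way to hand-code prefetching. The plain-C fallback macros are also an assumption of the sketch, added so that it compiles without a UPC compiler. An O3-style code would combine both techniques.

```c
#include <assert.h>
#include <string.h>

#ifdef __UPC__
#include <upc.h>
#define SHARED_BLOCKED(n) shared [n]
#else
/* Plain-C fallback (an assumption of this sketch, not part of the suite):
   one "thread", no shared qualifier, and a bulk get becomes memcpy. */
#define SHARED_BLOCKED(n)
#define MYTHREAD 0
#define THREADS 1
#define upc_memget(dst, src, nbytes) memcpy((dst), (src), (nbytes))
#endif

#define BLK 8 /* hypothetical number of elements owned by each thread */

/* Thread t has affinity to data[t*BLK] .. data[t*BLK + BLK - 1]. */
SHARED_BLOCKED(BLK) double data[BLK * THREADS];

/* Each thread fills its own block with 1, 2, ..., BLK. Under UPC the caller
   must synchronize (upc_barrier) before any thread reads a neighbor's block. */
void fill_local(void) {
    assert(MYTHREAD < THREADS);
    for (int i = 0; i < BLK; i++)
        data[MYTHREAD * BLK + i] = (double)(i + 1);
}

/* O0 style: even local elements are accessed through the shared array. */
double sum_local_o0(void) {
    double s = 0.0;
    for (int i = 0; i < BLK; i++)
        s += data[MYTHREAD * BLK + i];
    return s;
}

/* O1 style (privatization): the local block is reached through a private
   pointer, so the loop performs ordinary C memory accesses. */
double sum_local_o1(void) {
    double *mine = (double *)&data[MYTHREAD * BLK];
    double s = 0.0;
    for (int i = 0; i < BLK; i++)
        s += mine[i];
    return s;
}

/* O0 style remote read: one fine-grained shared access per element. */
double sum_neighbor_o0(void) {
    int nb = (MYTHREAD + 1) % THREADS;
    double s = 0.0;
    for (int i = 0; i < BLK; i++)
        s += data[nb * BLK + i];
    return s;
}

/* Prefetching style: copy the neighbor's whole block into a private buffer
   with one bulk transfer, then compute on the buffer. */
double sum_neighbor_prefetch(void) {
    double buf[BLK];
    int nb = (MYTHREAD + 1) % THREADS;
    double s = 0.0;
    upc_memget(buf, &data[nb * BLK], BLK * sizeof(double));
    for (int i = 0; i < BLK; i++)
        s += buf[i];
    return s;
}
```

All four functions return the same sum; the variants differ only in how the shared data is reached, which is the point of comparing the optimization levels.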
III) Building instructions
--------------------------
a) The build needs to be configured for your specific UPC compiler.
Some defaults are set in the config/Makefile.default file.
Settings in config/Makefile.in override those defaults:
cp config/Makefile.default config/Makefile.in
vim config/Makefile.in (Configure it for your specific compiler)
More advanced options are located in config/make.def
b) Go to the workload directory (e.g. cd CG)
c) Remove any binary files already present in the workload directory (e.g.
gmake clean)
d) Make the binary for the CLASS and Number of Processors chosen using the
most optimized UPC version (e.g. gmake CLASS=A NP=4)
Make options:
------------
* VARIANT: can be specified during compilation to use a different UPC
variant of a given workload (gmake CLASS=A NP=4 VARIANT=O1).
It can be O0, O1, O1static, O3 and O3static, depending on the available
implementations of a given workload.
* CLASS: can be S, W, A, B or C (from smallest to largest problem size). A
larger class (CLASS D) is also available in the NPB2.4 workloads (except IS).
* NP: Number of threads (limited by the type of workload and the number of
CPUs present).
* USE_MONOTONIC_CLOCK=1: Make use of the system monotonic clock for more
precise timing results.
e) Run the resulting UPC binary (executables are built in the bin/ directory)
V) Revision History
-------------------
v1.00: Initial Effort - Implemented in a way similar to MPI. The distribution
is no longer available on the web.
v2.00: First Release - May 9th 2003
v2.01: Minor Changes - Improvement of the Makefiles so that a 'make clean'
is no longer needed before each compilation - May 13th 2003
v2.02: Bug fix - Do a single useful upc_all_lock_alloc() call instead of
two (FT workload) - May 16th 2003
v2.03: New Workload - CG added to the kernels (O0, O1, O3) - May 19th 2003
v2.04: New Workload - IS added to the kernels (O0, O1) - May 29th 2003
v2.05: Started joint development of NAS 2.4 - CG, EP, FT, IS kernels -
June 5th 2003
v2.06: New Workload - MG added to the kernels (O0, O1, O3) - June 26th 2003
v2.07: New Makefile accepting VARIANT flag, new template for Berkeley UPC
Compiler available in config/models/ - July 14th 2003
v2.08: Portability of the support/ scripts to the HP Tru64 Unix sh shell -
July 15th 2003
v2.09: Various bug fixes in CG and MG, implementation of a file_output for MG -
November 2004
v2.20 and later releases: Please consult the ChangeLog file
After version 2.20, the version numbering scheme changed as follows:
npb-NASVERSION-YY.MM.tar.gz
So, npb-upc-2.4-11.02.tar.gz stands for NAS Parallel Benchmarks for
the 2.4 NAS specification, released February 2011.
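As an illustration of this naming scheme, the following C sketch splits a release file name into its NAS version, year and month. The helper name parse_release_name is hypothetical, and the sketch assumes that file names follow the form of the example above (with the "npb-upc-" prefix) rather than any format guaranteed by the project.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parses "npb-upc-<NAS version>-<YY>.<MM>.tar.gz".
   On success stores the NAS version string in nas (at most 15 characters),
   the two-digit year in *yy and the month in *mm, and returns 0.
   Returns -1 if the name does not match the assumed pattern. */
int parse_release_name(const char *name, char nas[16], int *yy, int *mm) {
    int end = 0; /* set by %n only if the whole pattern matched */
    assert(name != NULL);
    int got = sscanf(name, "npb-upc-%15[0-9.]-%2d.%2d.tar.gz%n",
                     nas, yy, mm, &end);
    if (got != 3 || end == 0 || name[end] != '\0')
        return -1;
    if (*mm < 1 || *mm > 12)
        return -1;
    return 0;
}
```

With the example name above, the helper reports NAS version "2.4", year 11 and month 2.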
VI) Acknowledgements
--------------------
Please consult the AUTHORS file for the complete list of people who have
contributed to this software.