Index of /download/dist/upc-tests/uts
.----------------------------------------------------------------------------.
| UTS - Unbalanced Tree Search |
| ______________________________________________________________________ |
| |
| C.-W. Tseng - University of Maryland |
| J. Prins, S. Olivier, J. Liu - UNC Chapel Hill |
| J. Dinan, G. Sabin, P. Sadayappan - The Ohio State University |
| D. Pryor - Supercomputing Research Center |
'----------------------------------------------------------------------------'
INTRODUCTION
===============
The UTS benchmark performs an exhaustive search on an unbalanced tree. The
tree is generated on the fly using a splittable random number generator (RNG)
that allows the random stream to be split and processed in parallel while still
producing a deterministic tree. The splittable RNG has been constructed using
the SHA1 secure hash algorithm. Thus, generating a node's children requires
multiple applications of the SHA1 hash algorithm to generate splittable hashes
for each child.
For more information on UTS, see [REFERENCES] below.
CONFIGURING UTS
===============
Select a configuration file for your system from the 'config/' directory using
the 'configure.sh' script. After running the configure script, edit
'config.in' to review the build parameters.
BUILDING UTS
==============
The Makefiles provided with UTS require GNU Make. To build UTS:
$ gmake [targets]
The following targets are available:
uts-seq - Sequential implementation
uts-stat - Sequential implementation for gathering tree statistics
(statistics output in file stat.txt)
uts-upc - UPC with workstealing distributed load balancing
uts-upc-enhanced - UPC with improved performance on distributed memory systems
uts-upc-dcq - UPC shared global work queue load balancer
uts-omp - OpenMP workstealing
uts-shmem - Shmem workstealing
uts-pthread - Pthread workstealing
uts-mta - MTA-2 futures-parallel recursive implementation
uts-dfs - Sequential recursive implementation (built from from uts-mta)
uts-mpi-wm - MPI with work manager centralized load balancing
uts-mpi-ws - MPI with workstealing distributed load balancing
time_rng - Microbenchmark to measure peak RNG spawn time
RUNNING UTS
==============
Sample workloads are provided in the sample_trees files. These files are
scriptable and are meant to be used as in:
$ source sample_trees.sh
$ ./uts-seq $T1
Or if using csh:
$ source sample_trees.csh
$ ./uts-seq $T1
Depending on the number and speed of processors used, we suggest
measuring performance for the following trees:
T3L (~100 million nodes) for 1-8 processors
T3XXL (~3 billion nodes) for 8-128 processors
T2WL (~300 billion nodes) for 128-1024+ processors
More and less verbose output can be gathered using the '-v #' flag. '-v 0'
provides concise data suitable for graphing, '-v 1' is the normal output, level
2 provides additional load balancing statistics, and higher levels provide
greater detail.
In addition to this, extra parameters are available for different
implementations that can be used to fine-tune load balancing or get more
verbose output. These parameters can be viewed by running one of the uts
targets with the '-h' flag.
The chunk size parameter '-c' is the main knob used to tune load balancing. It
is available on most parallel builds of UTS and it controls the granularity
with which load balancing is performed. Specifically, it controls how many
nodes are transferred from one process to another as a result of a load
balancing operation. Smaller chunk sizes allow for more fine-grained load
balancing but cost more in overhead. Larger chunk sizes have lower overhead
but can result in load imbalance. In practice, the best chunk size will vary
for a given system and parallel implementation and will also vary slightly over
different workloads. Finding a single chunk size that performs well on all
workloads can be achieved through experimentation. By default, UTS uses a
chunk size that gives reasonable performance across a range of systems.
For example, to perform a parallel search on T3 using 4 Pthreads with a chunk
size of 20:
$ ./uts-pthread -T 4 -c 20 $T3
THE OUTPUT
==============
The benchmark's result is to print out the total number of nodes explored
in the tree. An execution run is valid if the number of nodes explored
matches the number of nodes given in the sample_trees script. The benchmark
also outputs the performance that was achieved in nodes explored per second.
The peak performance for a system can be calculated using the time_rng
microbenchmark. In practice, actual performance will be less than the
value given by the microbenchmark because of additional search operations,
load imbalance, and load balancing overhead.
UTS FILES
===============
Random Number Generators (RNG):
rng/ - Location of random number generators
alfg.c - Additive lagged fibonacci generator
alfg.h
brg_sha1.c - Dr. Brian R. Gladman's SHA-1 implementation
brg_...
Configuration Files:
configure.sh - Configure UTS (generates config.in)
config/ - Location where configuration files are stored
config.in - Symlink to your configuration file in config/
Makefile - GNU Makefile, reads config.in to build UTS
Core Files:
dequeue.c - Double-ended queue data structure
dequeue.h
dlist.c - Doubly linked list data structure
dlist.h
shared_dequeue.c - UPC shared double-ended queue
shared_dequeue.h
shared_dlist.c - UPC shared doubly linked list
shared_dlist.h
uts.c,h - UTS tree generation library
uts_dm.c,h - Base for distributed memory/PGAS implementations
(MPI, UPC-DCQ)
uts_shm.c - Base for shared memory implementations
(UPC, OpenMP, Shmem, Serial, and Stats)
uts_upc_enhanced.c - UPC implementation with improved performance on
distributed memory systems
uts_dfs.c - Base for MTA-2 and serial uts-dfs. This code
performs a recursive search.
mpi_worksharing.c - Work-Manager implementation using MPI
mpi_workstealing.c - WorkStealing implementation using MPI
upc_worksharing.c - Distributed global queue using UPC
time_rng.c - RNG timing program
Debugging Files:
stats.c - Tracing and additional statistics generation code
check_ctrk.pl - Chunk tracker debugging tool
ctrk.h
Documentation:
LICENSE - The license that UTS is distributed under
AUTHORS - Contributors to UTS
README - This file
sample_trees.sh - Sample inputs to UTS (sh shell script)
sample_trees.csh - Sample inputs to UTS (csh shell script)
REFERENCES
===============
For more information, please refer to the following publications:
S. Olivier, J. Prins, "Scalable Dynamic Load Balancing Using UPC", Proc. of
37th International Conference on Parallel Processing (ICPP-08), Portland, OR,
September 2008.
J. Dinan, S. Olivier, J. Prins, G. Sabin, P. Sadayappan, C.-W. Tseng, "Dynamic
Load Balancing of Unbalanced Computations Using Message Passing", Proc. of 6th
Intl. Workshop on Performance Modeling, Evaluation, and Optimization of
Parallel and Distributed Systems (PMEO-PDS 2007), Long Beach, CA, 2007.
S. Olivier, J. Huan, J. Liu, J. Prins, J. Dinan, P. Sadayappan, C.-W. Tseng,
"UTS: An Unbalanced Tree Search Benchmark", 19th International Workshop on
Languages and Compilers for Parallel Computing (LCPC 2006), 2006.
J. Prins, J. Huan, W. Pugh, C.-W. Tseng, P. Sadayappan, "UPC Implementation of
an Unbalanced Tree Search Benchmark", UNC Dept. of Computer Science Technical
Report 03-034, Oct 2003