Re: UPC Benchmarks


From: Steven D. Vormwald (sdvormwa_at_mtu_dot_edu)
Date: Thu Feb 12 2009 - 13:48:10 PST

In reply to: Gary Funck: "Re: UPC Benchmarks"

Gary Funck wrote:
> Steven,
> 
> A student at MTU, Zhang Zhang, presented some UPC benchmark results
> back in 2004/2005:
> http://www.upc.mtu.edu/papers/ZhangIPDPS05.pdf
> http://upc.gwu.edu/~upc/upcworkshop04/MTU-upcworkshop04.pdf
> 
> We looked at his paper, and those benchmarks, and noted
> some methodological errors.  Notably, a buggy version of the NPB benchmark
> developed by GWU was utilized which skewed results and led to some
> false indications of failures when run on various platforms.
> This led to apparent "no shows" by various compilers.
> 
> A couple of years ago, we collected UPC benchmarks from various
> sources, and re-worked them so that they (1) executed enough iterations
> to be meaningful on modern hardware, (2) did not print extraneous
> output during the timed part of the benchmark, (3) were run
> in a dedicated OS environment (run level 1 on Linux) to avoid
> extraneous timing noise created by normal OS activities, and (4) were
> run enough times to obtain a representative timing
> sample.  We found that all these steps were necessary to obtain
> reasonable timing results.  During that process, we did not attempt
> to verify that each benchmark measured exactly what it was trying
> to measure in an effective fashion.  Further, we didn't try to
> verify that complex benchmarks (like NPB) produced correct results.
> 
> Although I commend Zhang Zhang for advancing knowledge in the
> area of UPC performance, it is unfortunate, given the methodological
> errors, that his paper is the seminal work in this area.
> I'd like to see his experiments re-done with the errors corrected,
> and run against current compilers and runtime systems.
> 
> A procedural recommendation: while developing and selecting
> benchmarks and collecting initial results, I'd encourage
> that the results be run by each vendor involved to ensure that
> the compiler was executed with appropriate parameters and to
> give the vendor the opportunity to fix small errors/bugs,
> and to verify that the benchmarks in fact measure the
> feature as intended.
> 
> - Gary

Gary,

Thank you for your prompt reply.  You raise some valid points about 
benchmarks in general and the history of benchmarks in UPC.  As I see 
it, there are two primary reasons for having language benchmarks.  The 
first is to measure the performance of various implementations of the 
language, and the points you brought up address this wonderfully.  The 
second is to give implementation developers and researchers examples of 
important program behaviors that they can use to model the effect that 
their "[not-so-]great new optimization" will have on "real applications".

For the work we are doing, the behavior of applications (in particular, 
the remote memory access patterns) is more important than the 
efficiency and "correctness" of the implementation of various language 
features.  We are running the benchmarks on an instrumented version of 
MuPC that records a trace of all remote memory accesses (and doesn't 
optimize them away...); the trace is then analyzed offline.  As a 
result, micro-benchmarks that focus on the performance of a few 
language features aren't as useful as benchmarks that have remote 
access behaviors more similar to those one would expect to find in a 
real application.
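
To make the local/remote distinction concrete, here is a minimal sketch 
in plain C (not UPC, and not the actual MuPC instrumentation) of how an 
access can be classified.  It assumes the standard UPC layout rule that 
element i of a shared array with block size B has affinity to thread 
(i / B) mod THREADS.  The names owner and count_remote_reads are made up 
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that owns element i of a shared array laid out round-robin
 * in blocks of `block` elements over `threads` threads.
 * block == 1 models UPC's default cyclic layout. */
size_t owner(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Number of elements in a[0..n-1] that thread `me` would have to
 * fetch remotely if it read the whole array once. */
size_t count_remote_reads(size_t n, size_t block, size_t threads, size_t me)
{
    size_t remote = 0;
    for (size_t i = 0; i < n; i++) {
        if (owner(i, block, threads) != me)
            remote++;
    }
    return remote;
}
```

With 8 elements on 4 threads, thread 0 owns two elements under both the 
cyclic layout (block 1) and a block size of 2, so six of its eight reads 
are remote in each case.  What differs is which elements are remote, and 
that is the sort of pattern a trace exposes.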

I'd be happy to get recommendations for algorithms to implement (and 
also the "proper" way to implement them in UPC) that people think would 
provide good coverage of actual application behaviors.  At the moment, I 
have a couple different simple matrix-multiply programs (naive element 
by element computed in a upc_forall loop) that differ in the 
distribution of the shared arrays (checkerboard, cyclic, block-cyclic), 
and an (equally naive) implementation of the Jacobi method that halts 
after a given number of iterations instead of when the result converges, 
though it still does the convergence check.
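
For reference, the Jacobi variant described above can be sketched 
serially in plain C as follows.  This is not the UPC benchmark itself: 
the fixed size N, the function name jacobi_fixed, and the max-change 
convergence test are assumptions made for illustration.  The point is 
only that the convergence check is evaluated on every sweep but never 
ends the loop.

```c
#include <assert.h>
#include <stddef.h>

#define N 3  /* illustrative problem size */

/* Runs exactly `iters` Jacobi sweeps on A x = b, updating x in place.
 * The convergence test (largest change in any component below tol) is
 * computed every sweep but does not stop the iteration.
 * Returns the first sweep at which the test passed, or 0 if it never did. */
size_t jacobi_fixed(double A[N][N], const double b[N], double x[N],
                    size_t iters, double tol)
{
    double next[N];
    size_t converged_at = 0;

    for (size_t it = 1; it <= iters; it++) {
        double max_delta = 0.0;
        for (size_t i = 0; i < N; i++) {
            assert(A[i][i] != 0.0);
            double sum = b[i];
            for (size_t j = 0; j < N; j++) {
                if (j != i)
                    sum -= A[i][j] * x[j];
            }
            next[i] = sum / A[i][i];
            double d = next[i] - x[i];
            if (d < 0.0)
                d = -d;
            if (d > max_delta)
                max_delta = d;
        }
        for (size_t i = 0; i < N; i++)
            x[i] = next[i];
        /* Check convergence, but keep iterating regardless. */
        if (converged_at == 0 && max_delta < tol)
            converged_at = it;
    }
    return converged_at;
}
```

Running a fixed number of sweeps keeps the amount of work, and so the 
number of accesses in the trace, the same from run to run, while the 
check still contributes its own reads of x.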

Steven Vormwald
