Re: ping pong in UPC

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jul 28 2009 - 18:11:42 PDT

Next message: Jose Vicente Espi: "Re: ping pong in UPC"

Previous message: Gary Funck: "Re: Defining block size during runtime"
In reply to: Jose Vicente Espi: "ping pong in UPC"
Next in thread: Jose Vicente Espi: "Re: ping pong in UPC"
Reply: Jose Vicente Espi: "Re: ping pong in UPC"

Jose,

Sorry we have not responded sooner.  Your mail arrived while our entire 
team was involved in an important two-day meeting.

First let me say that what you are trying to compare is a little 
tricky.  The performance results you see on the GASNet webpages are a 
comparison of the speed of MPI vs GASNet for *implementing* UPC-like 
communications patterns over various networks.  That is not quite the 
same as comparing UPC vs MPI for implementing a given application's 
communications.

Second let me say that UDP is expected to give better latency 
performance than MPI when both are running on an Ethernet network, but 
this assumes that network is "mostly reliable" as is the case with most 
switched Ethernet networks used in clusters.  However, if run over a 
wide-area network or with very inexpensive equipment, it is possible 
that reliability at the TCP level (used indirectly by MPI) may be more 
efficient than the UDP implementation that GASNet employs.

PLEASE keep in mind that both the MPI and UDP implementations of GASNet 
exist only for their portability and neither is going to be blindingly 
fast.  Comparing either of them to some other benchmark may satisfy one's 
curiosity, but I don't see any deep value in such a comparison.

Benchmarks in general:

In the Berkeley UPC distribution tarball there are upc-tests and 
upc-examples directories that contain UPC code gathered from many 
sources.  Among them are several benchmarks, some of which might even be 
correct ;-).  Have a look at that collection of code, but be aware that 
we provide it as-is, and since we wrote very little of it we may not be 
able to help much.  (We might not even know what some of them do.)

Measuring "latency":

How you define latency will depend on what really matters to your 
application.  If you want to look (as we have on the GASNet site) at 
the time required to implement a UPC-level "strict Put" operation then 
you are comparing upc_memput() against an MPI Ping-ACK 
(N bytes sent, and then a wait for a zero-byte reply).  The "ACK" in the 
MPI test is to allow the sender to know the value has reached remote 
memory before it can perform the next operation (a part of the 'strict' 
UPC memory model).  In the GASNet case, the completion of a blocking Put 
operation uses lower-level acknowledgments when available from a given 
network API, which is one of the reasons it outperforms MPI Ping-ACK on 
many high-speed networks.  In the case of UDP, however, no lower-level 
notification is provided and the comms pattern is pretty much the same 
as for MPI (a UDP-level ACK sent by the GASNet implementation).
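
The timing side of such a latency test can be sketched in plain C as 
follows.  This is only a sketch under assumptions: avg_latency_us, 
blocking_op_fn and count_op are hypothetical names, not part of UPC, MPI 
or GASNet.  The callback stands for one blocking operation; in the UPC 
test that might be a upc_memput() followed by a upc_fence, and in the MPI 
test an N-byte send followed by a receive of the zero-byte ACK.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one "put, then wait until it is known to be
 * complete at the target" operation.  Not part of any real API. */
typedef void (*blocking_op_fn)(void *ctx);

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average time per operation, in microseconds, over `iters` timed calls
 * that follow `warmup` untimed calls. */
static double avg_latency_us(blocking_op_fn op, void *ctx,
                             long warmup, long iters)
{
    assert(op != NULL && warmup >= 0 && iters > 0);
    for (long i = 0; i < warmup; i++)
        op(ctx);                      /* untimed: lets caches/paths warm up */
    double t0 = now_sec();
    for (long i = 0; i < iters; i++)
        op(ctx);                      /* timed region */
    double t1 = now_sec();
    return (t1 - t0) * 1e6 / (double)iters;
}

/* Stand-in operation that only counts its calls, so the harness can be
 * exercised without any network. */
static void count_op(void *ctx)
{
    ++*(long *)ctx;
}
```

Calling avg_latency_us(count_op, &calls, 3, 10) invokes the callback 13 
times and times only the last 10.  A real test would substitute a 
callback that performs the communication being measured.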

If what you want is a Ping-Pong in which node0 sends a message that 
requires a reply from node1, then you are trying to measure something 
quite different from what the GASNet performance webpage shows.  In MPI 
the idea of waiting for a message arrival is quite natural.  In UPC, 
however, there is no natural way to wait for "arrival" since there is no 
"message" concept.  In the Berkeley UPC implementation we address this 
lack with a "semaphore" extension that you may wish to investigate.  
Without the semaphore or a similar abstraction for point-to-point 
ordering, a true Ping-Pong is hard to write portably in UPC (and the 
portable implementation may be quite inefficient).

Measuring "bandwidth":

In the case of bandwidth the idea is pretty much the same in MPI and 
UPC: move data in one direction as fast as possible with a given 
transfer size.  Again, however, mapping this into MPI and UPC code is 
different.  In MPI one will use non-blocking sends and receives to get 
the best possible bandwidth (by overlapping the per-operation overhead 
with the communication of the previous operations).  In UPC one wants to 
do the same thing.  In an ideal world the fact that comms in UPC are done 
at a language level should allow a smart compiler to automatically 
transform things into non-blocking transfers when this does not change 
the program semantics.  However, few compilers can do this perfectly 
(ours can to a limited extent) and even if they could the typical 
benchmark is transferring to the same destination repeatedly, 
possibly preventing such a non-blocking transformation by the compiler.  
So, how does one express EXPLICITLY non-blocking comms in UPC?  Again we 
have an extension in the Berkeley UPC compiler (proposed as an extension 
to the UPC language spec) for this purpose.
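
The overlap idea behind a bandwidth test can be sketched in plain C as 
below.  Again this is a sketch under assumptions: flood, mb_per_sec, 
struct fake_net and the two callback types are hypothetical names.  In 
MPI the callbacks might wrap MPI_Isend and MPI_Waitall; in Berkeley UPC 
they would wrap the non-blocking memcpy extension.  The fake callbacks 
here only count, so nothing is actually transferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types, not part of any real API. */
typedef void (*start_xfer_fn)(void *ctx, long index); /* initiate, do not wait */
typedef void (*wait_all_fn)(void *ctx);               /* complete all outstanding */

/* Issue `total` transfers, keeping at most `window` outstanding at once.
 * Returns the number of wait calls made. */
static long flood(start_xfer_fn start, wait_all_fn wait_all, void *ctx,
                  long total, long window)
{
    assert(start != NULL && wait_all != NULL && total >= 0 && window > 0);
    long waits = 0;
    for (long base = 0; base < total; base += window) {
        long end = (base + window < total) ? base + window : total;
        for (long i = base; i < end; i++)
            start(ctx, i);   /* per-operation overhead overlaps earlier transfers */
        wait_all(ctx);       /* drain the window before reusing resources */
        waits++;
    }
    return waits;
}

/* Bandwidth in MB/s (10^6 bytes per second) for `transfers` transfers of
 * `nbytes` each completed in `seconds`. */
static double mb_per_sec(size_t nbytes, long transfers, double seconds)
{
    assert(seconds > 0.0);
    return (double)nbytes * (double)transfers / seconds / 1e6;
}

/* Fake "network" that records how many transfers were in flight. */
struct fake_net { long started; long outstanding; long max_outstanding; };

static void fake_start(void *ctx, long index)
{
    struct fake_net *n = ctx;
    (void)index;
    n->started++;
    if (++n->outstanding > n->max_outstanding)
        n->max_outstanding = n->outstanding;
}

static void fake_wait_all(void *ctx)
{
    ((struct fake_net *)ctx)->outstanding = 0;
}
```

With total = 10 and window = 4 the fake network sees three windows and 
never more than 4 transfers in flight.  The best window size for a real 
network is something to find by experiment.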

Docs:
For info on the semaphore/signaling-put extensions, see 
http://upc.lbl.gov/publications/PGAS06-p2p.pdf
For info on the non-blocking memcpy extensions, see 
http://upc.lbl.gov/publications/upc_memcpy.pdf

I have probably left you with more questions than answers, but hopefully 
the new questions lead you in the right direction.  If you could 
describe for us what you think you want to measure, we might be able to 
provide more useful answers.  However, I will caution you again that the 
UDP implementation of GASNet exists for portability (not performance) 
and comparing it to MPI benchmarks is probably of very little value.

-Paul

Jose Vicente Espi wrote:
> Hello,
>
> I'm testing performance of UPC communications in an UDP network, 
> comparing it with MPI ping pong bandwith/latency test. But  I didn't 
> get the results that I expected, based in the tests made in 
> http://gasnet.cs.berkeley.edu/performance/.
>
> I'm probably doing something wrong, functions used for transferring 
> data are upc_memput and upc_memget. Only with message sizes smaller 
> than 512 bytes I get a little better performance than MPI. For larger 
> message sizes performance become worse.
>
> Have you got an example of code for measuring performance of UPC vs 
> MPI  in ping pong bandwith test?
>
> Thanks in advance.
>
> Jose Vicente.
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
