From: Jose Vicente Espi (jvespi_at_gmail_dot_com)
Date: Thu Jul 30 2009 - 02:20:32 PDT
Hello Paul, thank you for your answer. Your comments have been very useful for clearing up some doubts I had. I was not aware of either the examples in the tarball or the Berkeley extensions of the language. Until now I was programming based on the specs alone, but performance can be improved using these libraries. I really think these extensions should be included in the UPC specs, especially those related to synchronization (the semaphores). My bandwidth results now look like those on the GASNet webpages, despite using udp-conduit. For the latency test, I think the *bupc_memput_signal_async / bupc_sem_post* routines are useful for signaling data arrival and the ACK. But I am also aware that results on a UDP cluster are not reliable; I expect to have access to an InfiniBand cluster soon.

Jose Vicente

Paul H. Hargrove wrote:
> Jose,
>
> Sorry we have not responded sooner. Your mail arrived while our
> entire team was involved in an important two-day meeting.
>
> First let me say that what you are trying to compare is a little
> tricky. The performance results you see on the GASNet webpages are a
> comparison of the speed of MPI vs GASNet for *implementing* UPC-like
> communications patterns over various networks. That is not quite the
> same as comparing UPC vs MPI for implementing a given application's
> communications.
>
> Second let me say that UDP is expected to give better latency
> performance than MPI when both are running on an Ethernet network, but
> this assumes that the network is "mostly reliable", as is the case with
> most switched Ethernet networks used in clusters. However, if run
> over a wide-area network or with very inexpensive equipment, it is
> possible that reliability at the TCP level (used indirectly by MPI)
> may be more efficient than the UDP implementation that GASNet employs.
>
> PLEASE keep in mind that both the MPI and UDP implementations of
> GASNet exist only for their portability and neither is going to be
> blindingly fast. Comparing either of them to some other benchmark may
> satisfy one's curiosity, but I don't see any deep value in such a
> comparison.
>
> Benchmarks in general:
>
> In the Berkeley UPC distribution tarball there are upc-tests and
> upc-examples directories that contain UPC code gathered from many
> sources. Among them are several benchmarks, some of which might even
> be correct ;-). Have a look at that collection of code, but be aware
> that we provide it as-is, and since we wrote very little of it, we may
> not be able to help much. (We might not even know what some of them do.)
>
> Measuring "latency":
>
> How you define latency will depend on what really matters to your
> application. If one wants to look (as we have on the GASNet site) at
> the time required to implement a UPC-level "strict Put" operation, then
> you are looking at comparing upc_memput() against an MPI Ping-ACK
> (N bytes sent, and then wait for a zero-byte reply). The "ACK" in the
> MPI test is to allow the sender to know the value has reached remote
> memory before it can perform the next operation (a part of the
> "strict" UPC memory model). In the GASNet case, the completion of a
> blocking Put operation uses lower-level acknowledgments when available
> from a given network API, which is one of the reasons it outperforms
> MPI Ping-ACK on many high-speed networks. In the case of UDP,
> however, no lower-level notification is provided and the comms pattern
> is pretty much the same as for MPI (a UDP-level ACK sent by the GASNet
> implementation).
>
> If what you want is a Ping-Pong in which node0 sends a message that
> requires a reply from node1, then you are trying to measure something
> quite different from what the GASNet performance webpage shows. In
> MPI the idea of waiting for a message arrival is quite natural. In
> UPC, however, there is no natural way to wait for "arrival" since
> there is no "message" concept. In the Berkeley UPC implementation we
> address this lack with a "semaphore" extension that you may wish to
> investigate. Without the semaphore or a similar abstraction for
> point-to-point ordering, a true Ping-Pong is hard to write portably in
> UPC (and the portable implementation may be quite inefficient).
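To make the two patterns above concrete, here is roughly what the MPI Ping-ACK looks like (a minimal sketch; the payload size and iteration count are arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    #define NBYTES 8        /* payload size; vary to sweep message sizes */
    #define ITERS  10000

    int main(int argc, char **argv) {
        char buf[NBYTES];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            double t0 = MPI_Wtime();
            for (int i = 0; i < ITERS; i++) {
                /* N-byte "Ping" to rank 1 ... */
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                /* ... then wait for the zero-byte "ACK" */
                MPI_Recv(buf, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            printf("%g us per strict-put equivalent\n",
                   1e6 * (MPI_Wtime() - t0) / ITERS);
        } else if (rank == 1) {
            for (int i = 0; i < ITERS; i++) {
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

A true Ping-Pong in UPC, built on the semaphore and signaling-put extensions discussed above, might look roughly like the following. This is only a sketch against the interface described in the PGAS06 paper linked in the Docs section below; the semaphore flag names and the shared "sems" directory used to publish each thread's semaphore pointer are assumptions that should be checked against bupc_extensions.h in an actual installation:

    #include <upc.h>
    #include <bupc_extensions.h>    /* Berkeley UPC extensions */
    #include <stdio.h>
    #include <sys/time.h>

    #define NBYTES 8
    #define ITERS  10000

    /* one contiguous NBYTES landing zone per thread; assumes 2 UPC threads */
    shared [NBYTES] char buf[THREADS * NBYTES];
    /* directory so each thread can find its peer's semaphore */
    bupc_sem_t * shared sems[THREADS];

    int main(void) {
        char src[NBYTES];
        /* flag names as in the PGAS06 paper: integer-valued semaphore,
           single producer, single consumer */
        sems[MYTHREAD] = bupc_sem_alloc(BUPC_SEM_INTEGER |
                                        BUPC_SEM_SPRODUCER | BUPC_SEM_SCONSUMER);
        upc_barrier;
        bupc_sem_t *my_sem   = sems[MYTHREAD];
        bupc_sem_t *peer_sem = sems[1 - MYTHREAD];
        if (MYTHREAD == 0) {
            struct timeval t0, t1;
            gettimeofday(&t0, NULL);
            for (int i = 0; i < ITERS; i++) {
                /* put the payload to thread 1, posting its semaphore on arrival;
                   the _async variant returns immediately, which is safe here
                   because src is never modified */
                bupc_memput_signal_async(&buf[NBYTES], src, NBYTES, peer_sem, 1);
                bupc_sem_wait(my_sem);     /* block until the pong arrives */
            }
            gettimeofday(&t1, NULL);
            double us = 1e6 * (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec);
            printf("%g us one-way\n", us / (2.0 * ITERS));
        } else if (MYTHREAD == 1) {
            for (int i = 0; i < ITERS; i++) {
                bupc_sem_wait(my_sem);     /* block until the ping arrives */
                bupc_memput_signal_async(&buf[0], src, NBYTES, peer_sem, 1);
            }
        }
        upc_barrier;
        return 0;
    }

Dividing the round-trip time by two gives a one-way figure, which measures something different from the strict-put latency on the GASNet performance page, exactly as Paul notes.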
> Measuring "bandwidth":
>
> In the case of bandwidth the idea is pretty much the same in MPI and
> UPC: move data in one direction as fast as possible with a given
> transfer size. Again, however, mapping this into MPI and UPC code is
> different. In MPI one will use non-blocking sends and receives to get
> the best possible bandwidth (by overlapping the per-operation overhead
> with the communication of the previous operations). In UPC one wants
> to do the same thing. In an ideal world the fact that comms in UPC are
> done at a language level should allow a smart compiler to
> automatically transform things into non-blocking transfers when this
> does not change the program semantics. However, few compilers can do
> this perfectly (ours can to a limited extent), and even if they could,
> the typical benchmark is transferring to the same destination
> repeatedly, possibly preventing such a non-blocking transformation by
> the compiler. So, how does one EXPLICITLY express non-blocking comms
> in UPC? Again we have an extension in the Berkeley UPC compiler
> (proposed as an extension to the UPC language spec) for this purpose.
>
> Docs:
> For info on the semaphore/signaling-put extensions, see
> http://upc.lbl.gov/publications/PGAS06-p2p.pdf
> For info on the non-blocking memcpy extensions, see
> http://upc.lbl.gov/publications/upc_memcpy.pdf
>
> I have probably left you with more questions than answers, but
> hopefully the new questions lead you in the right direction. If you
> could describe for us what you think you want to measure, we might be
> able to provide more useful answers. However, I will caution you
> again that the UDP implementation of GASNet exists for portability
> (not performance) and comparing it to MPI benchmarks is probably of
> very little value.
>
> -Paul
>
> Jose Vicente Espi wrote:
>> Hello,
>>
>> I'm testing performance of UPC communications on a UDP network,
>> comparing it with an MPI ping-pong bandwidth/latency test. But I am
>> not getting the results that I expected, based on the tests published
>> at http://gasnet.cs.berkeley.edu/performance/.
>>
>> I'm probably doing something wrong. The functions used for
>> transferring data are upc_memput and upc_memget. Only with message
>> sizes smaller than 512 bytes do I get slightly better performance
>> than MPI. For larger message sizes performance becomes worse.
>>
>> Have you got example code for measuring UPC vs MPI performance in a
>> ping-pong bandwidth test?
>>
>> Thanks in advance.
>>
>> Jose Vicente.
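Following up on the bandwidth discussion in Paul's reply: here is a sketch of an explicitly non-blocking UPC bandwidth loop using the proposed bupc_memput_async / bupc_waitsync extensions (see the upc_memcpy.pdf link above). The transfer size, pipeline depth, and handle-recycling scheme are arbitrary illustration choices, and the code assumes exactly two UPC threads:

    #include <upc.h>
    #include <bupc_extensions.h>    /* Berkeley UPC extensions */
    #include <stdio.h>
    #include <sys/time.h>

    #define MSGSZ 65536     /* bytes per transfer */
    #define DEPTH 8         /* puts kept in flight */
    #define ITERS 1000

    /* one contiguous MSGSZ block per thread */
    shared [MSGSZ] char dst[THREADS * MSGSZ];

    int main(void) {
        static char src[MSGSZ];
        bupc_handle_t h[DEPTH];
        upc_barrier;
        if (MYTHREAD == 0) {
            struct timeval t0, t1;
            gettimeofday(&t0, NULL);
            /* prime the pipeline with DEPTH outstanding puts to thread 1 */
            for (int d = 0; d < DEPTH; d++)
                h[d] = bupc_memput_async(&dst[MSGSZ], src, MSGSZ);
            for (int i = 0; i < ITERS; i++) {
                int d = i % DEPTH;
                bupc_waitsync(h[d]);    /* retire the oldest put ... */
                h[d] = bupc_memput_async(&dst[MSGSZ], src, MSGSZ); /* ... issue a new one */
            }
            for (int d = 0; d < DEPTH; d++)
                bupc_waitsync(h[d]);    /* drain the pipeline */
            gettimeofday(&t1, NULL);
            double s = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
            printf("%g MB/s\n", (double)(ITERS + DEPTH) * MSGSZ / s / 1e6);
        }
        upc_barrier;
        return 0;
    }

An MPI version of the same measurement would keep a matching window of MPI_Isend operations in flight on one rank and retire them against pre-posted MPI_Irecv operations on the other; in both cases the point is the one Paul makes above, overlapping per-operation overhead with the data movement of earlier transfers.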