Re: ping pong in UPC

From: Jose Vicente Espi (jvespi_at_gmail_dot_com)
Date: Thu Jul 30 2009 - 02:20:32 PDT

  • Next message: Paul H. Hargrove: "ANNC: Scheduled downtime for Berkeley UPC Bugzilla"
    Hello Paul,
    
    Thank you for your answer. Your comments have been very useful for 
    clearing up some doubts I had. I was not aware of either the examples in 
    the tarball or the Berkeley extensions to the language. Until now I had 
    been programming purely from the specs, but these libraries can be used 
    to improve performance. I really think these extensions should be included 
    in the UPC specs, especially the ones related to synchronization (semaphores).
    
    My bandwidth test now looks like the GASNet webpages despite using the 
    udp-conduit. For the latency test I think it would be useful to use the 
    *bupc_memput_signal_async* / *bupc_sem_post* routines to signal data 
    arrival and the ACK. But I am also aware that results on a UDP cluster 
    are not very reliable; I expect to get access to an InfiniBand cluster soon.
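
    Roughly, the ping-pong loop I have in mind is the sketch below. The 
    signatures, flag values and header name are only my reading of the 
    PGAS06 paper you point to, so they may well need fixing:

      #include <upc.h>
      #include <bupc_extensions.h>    /* Berkeley UPC extensions (header name from memory) */

      #define SZ     8                /* payload size in bytes */
      #define NITERS 1000

      shared [SZ] char buf[SZ * THREADS];   /* SZ-byte landing zone on every thread */
      bupc_sem_t * shared sems[THREADS];    /* each thread publishes its semaphore here */

      int main(void) {                /* run with >= 2 threads */
          char payload[SZ] = {0};
          int i, peer = 1 - MYTHREAD;

          sems[MYTHREAD] = bupc_sem_alloc(0);   /* 0 = default flags (?) */
          upc_barrier;

          if (MYTHREAD < 2) {
              for (i = 0; i < NITERS; i++) {
                  if (MYTHREAD == 0) {
                      /* ping: put SZ bytes and raise the peer's semaphore on arrival */
                      bupc_memput_signal_async(&buf[peer * SZ], payload, SZ,
                                               sems[peer], 1);
                      bupc_sem_wait(sems[MYTHREAD]);   /* wait for the pong */
                  } else {
                      bupc_sem_wait(sems[MYTHREAD]);   /* wait for the ping */
                      bupc_memput_signal_async(&buf[peer * SZ], payload, SZ,
                                               sems[peer], 1);
                  }
              }
          }
          upc_barrier;
          return 0;
      }

    The payload is never modified inside the loop, so I think reusing it with 
    the async put is safe; the timing would wrap the MYTHREAD < 2 block.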
    
    Jose Vicente
    
    
    Paul H. Hargrove wrote:
    > Jose,
    >
    > Sorry we have not responded sooner.  Your mail arrived while our 
    > entire team was involved in an important two-day meeting.
    >
    > First let me say that what you are trying to compare is a little 
    > tricky.  The performance results you see on the GASNet webpages are a 
    > comparison of the speed of MPI vs GASNet for *implementing* UPC-like 
    > communications patterns over various networks.  That is not quite the 
    > same as comparing UPC vs MPI for implementing a given application's 
    > communications.
    >
    > Second let me say that UDP is expected to give better latency 
    > performance than MPI when both are running on an Ethernet network, but 
    > this assumes the network is "mostly reliable", as is the case with 
    > most switched Ethernet networks used in clusters.  However, if run 
    > over a wide-area network or with very inexpensive equipment, it is 
    > possible that reliability at the TCP level (used indirectly by MPI) 
    > may be more efficient than the UDP implementation that GASNet employs.
    >
    > PLEASE keep in mind that both the MPI and UDP implementations of 
    > GASNet exist only for their portability and neither is going to be 
    > blindingly fast.  Comparing either of them to some other benchmark may 
    > satisfy one's curiosity, but I don't see any deep value in such a 
    > comparison.
    >
    > Benchmarks in general:
    >
    > In the Berkeley UPC distribution tarball there are upc-tests and 
    > upc-examples directories that contain UPC code gathered from many 
    > sources.  Among them are several benchmarks, some of which might even 
    > be correct ;-).  Have a look at that collection of code, but be aware 
    > that we provide it as-is and, since we wrote very little of it, we may 
    > not be able to help much.  (We might not even know what some of them do.)
    >
    > Measuring "latency":
    >
    > How you define latency will depend on what really matters to your 
    > application.  If one wants to look (as we have on the GASNet site) at 
    > the time required to implement a UPC-level "strict Put" operation then 
    > you are looking at comparing upc_memput() against an MPI Ping-ACK 
    > (N-bytes sent, and then wait for a zero-byte reply).  The "ACK" in the 
    > MPI test is to allow the sender to know the value has reached remote 
    > memory before it can perform the next operation (a part of the 
    > 'strict' UPC memory model).  In the GASNet case, the completion of a 
    > blocking Put operation uses lower-level acknowledgments when available 
    > from a given network API, which is one of the reasons it outperforms 
    > MPI Ping-ACK on many high-speed networks.  In the case of UDP, 
    > however, no lower-level notification is provided and the comms pattern 
    > is pretty much the same as for MPI (a UDP-level ACK sent by the GASNet 
    > implementation).
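    >
    > For reference, the MPI side of that measurement is roughly the usual 
    > (untested) Ping-ACK timing loop:
    >
    >   #include <mpi.h>
    >   #include <stdio.h>
    >
    >   #define NITERS 10000
    >   #define N      8                       /* payload size in bytes */
    >
    >   int main(int argc, char **argv) {
    >       char buf[N] = {0};
    >       int rank, i;
    >       MPI_Init(&argc, &argv);
    >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    >       MPI_Barrier(MPI_COMM_WORLD);
    >       double t0 = MPI_Wtime();
    >       for (i = 0; i < NITERS; i++) {
    >           if (rank == 0) {
    >               /* Ping: N bytes out, then wait for the zero-byte ACK */
    >               MPI_Send(buf, N, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    >               MPI_Recv(buf, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    >           } else if (rank == 1) {
    >               MPI_Recv(buf, N, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    >               MPI_Send(buf, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    >           }
    >       }
    >       double t1 = MPI_Wtime();
    >       if (rank == 0)
    >           printf("average Ping-ACK latency: %g us\n", (t1 - t0) * 1e6 / NITERS);
    >       MPI_Finalize();
    >       return 0;
    >   }
    >
    > (On the UPC side the equivalent is simply timing a loop of upc_memput() 
    > calls of N bytes.)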
    >
    > If what you want is a Ping-Pong in which node0 sends a message that 
    > requires a reply from node1, then you are trying to measure something 
    > quite different from what the GASNet performance webpage shows.  In 
    > MPI the idea of waiting for a message arrival is quite natural.  In 
    > UPC, however, there is no natural way to wait for "arrival" since 
    > there is no "message" concept.  In the Berkeley UPC implementation we 
    > address this lack with a "semaphore" extension that you may wish to 
    > investigate.  Without the semaphore or a similar abstraction for 
    > point-to-point ordering, a true Ping-Pong is hard to write portably in 
    > UPC (and the portable implementation may be quite inefficient).
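    >
    > (For illustration, the portable fallback ends up spin-polling a strict 
    > flag, roughly like this untested sketch:
    >
    >   #include <upc.h>
    >
    >   #define SZ     8
    >   #define NITERS 1000
    >
    >   shared [SZ] char buf[SZ * THREADS];  /* SZ-byte landing zone per thread */
    >   strict shared int flag[THREADS];     /* strict write is ordered after the data put */
    >
    >   int main(void) {                     /* run with >= 2 threads */
    >       char payload[SZ] = {0};
    >       int i, peer = 1 - MYTHREAD;
    >
    >       upc_barrier;
    >       if (MYTHREAD < 2) {
    >           for (i = 1; i <= NITERS; i++) {
    >               if (MYTHREAD == 0) {
    >                   upc_memput(&buf[peer * SZ], payload, SZ);  /* relaxed data movement */
    >                   flag[peer] = i;                  /* strict flag = "data arrived" */
    >                   while (flag[MYTHREAD] != i) ;    /* spin for the pong */
    >               } else {
    >                   while (flag[MYTHREAD] != i) ;    /* spin for the ping */
    >                   upc_memput(&buf[peer * SZ], payload, SZ);
    >                   flag[peer] = i;
    >               }
    >           }
    >       }
    >       upc_barrier;
    >       return 0;
    >   }
    >
    > Every iteration pays for two separate remote stores (data plus flag) and 
    > a spin-poll, which is part of why the semaphore extension exists.)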
    >
    > Measuring "bandwidth":
    >
    > In the case of bandwidth the idea is pretty much the same in MPI and 
    > UPC: move data in one direction as fast as possible with a given 
    > transfer size.  Again, however, mapping this into MPI and UPC code is 
    > different.  In MPI one will use non-blocking sends and receives to get 
    > the best possible bandwidth (by overlapping the per-operation overhead 
    > with the communication of the previous operations).  In UPC one wants 
    > to do the same thing.  In an ideal world the fact that comms in UPC are 
    > done at a language level should allow a smart compiler to 
    > automatically transform things into non-blocking transfers when this 
    > does not change the program semantics.  However, few compilers can do 
    > this perfectly (ours can to a limited extent) and even if they could 
    > the typical benchmark is transferring to the same destination 
    > repeatedly, possibly preventing such a non-blocking transformation by 
    > the compiler.  So, how does one express EXPLICITLY non-blocking comms 
    > in UPC?  Again we have an extension in the Berkeley UPC compiler 
    > (proposed as an extension to the UPC language spec) for this purpose.
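    >
    > A rough flood-bandwidth sketch using those explicitly non-blocking puts 
    > (function names are from the upc_memcpy.pdf proposal listed below; the 
    > header name is from memory, so double-check it):
    >
    >   #include <upc.h>
    >   #include <bupc_extensions.h>  /* Berkeley UPC extensions */
    >   #include <stdio.h>
    >   #include <sys/time.h>
    >
    >   #define SZ    (64*1024)       /* bytes per transfer */
    >   #define DEPTH 16              /* puts kept in flight at once */
    >   #define REPS  64              /* batches of DEPTH transfers */
    >
    >   shared [SZ] char dst[SZ * THREADS];   /* SZ-byte landing zone per thread */
    >   char src[SZ];
    >
    >   int main(void) {              /* run with >= 2 threads */
    >       bupc_handle_t h[DEPTH];
    >       struct timeval t0, t1;
    >       int r, i;
    >
    >       upc_barrier;
    >       if (MYTHREAD == 0) {
    >           gettimeofday(&t0, NULL);
    >           for (r = 0; r < REPS; r++) {
    >               for (i = 0; i < DEPTH; i++)   /* issue a batch of puts ... */
    >                   h[i] = bupc_memput_async(&dst[1 * SZ], src, SZ);
    >               for (i = 0; i < DEPTH; i++)   /* ... then wait for them all */
    >                   bupc_waitsync(h[i]);
    >           }
    >           gettimeofday(&t1, NULL);
    >           double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    >           printf("%.1f MB/s\n", (double)REPS * DEPTH * SZ / s / 1e6);
    >       }
    >       upc_barrier;
    >       return 0;
    >   }
    >
    > Keeping several transfers outstanding is what hides the per-operation 
    > overhead, just as non-blocking sends/receives do on the MPI side.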
    >
    > Docs:
    > For info on the semaphore/signaling-put extensions, see 
    > http://upc.lbl.gov/publications/PGAS06-p2p.pdf
    > For info on the non-blocking memcpy extensions, see 
    > http://upc.lbl.gov/publications/upc_memcpy.pdf
    >
    >
    > I have probably left you with more questions than answers, but 
    > hopefully the new questions lead you in the right direction.  If you 
    > could describe for us what you think you want to measure, we might be 
    > able to provide more useful answers.  However, I will caution you 
    > again that the UDP implementation of GASNet exists for portability 
    > (not performance) and comparing it to MPI benchmarks is probably of 
    > very little value.
    >
    > -Paul
    >
    >
    > Jose Vicente Espi wrote:
    >> Hello,
    >>
    >> I'm testing the performance of UPC communications on a UDP network, 
    >> comparing it with an MPI ping-pong bandwidth/latency test. But I didn't 
    >> get the results I expected, based on the tests published at 
    >> http://gasnet.cs.berkeley.edu/performance/.
    >>
    >> I'm probably doing something wrong; the functions used for transferring 
    >> data are upc_memput and upc_memget. Only with message sizes smaller 
    >> than 512 bytes do I get slightly better performance than MPI. For larger 
    >> message sizes the performance becomes worse.
    >>
    >> Do you have example code for measuring the performance of UPC vs 
    >> MPI in a ping-pong bandwidth test?
    >>
    >> Thanks in advance.
    >>
    >> Jose Vicente.
    >>
    >
    >
    
