From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jul 28 2009 - 18:11:42 PDT
Jose,

Sorry we have not responded sooner. Your mail arrived while our entire team was involved in an important two-day meeting.

First, let me say that what you are trying to compare is a little tricky. The performance results you see on the GASNet webpages compare the speed of MPI vs. GASNet for *implementing* UPC-like communication patterns over various networks. That is not quite the same as comparing UPC vs. MPI for implementing a given application's communication.

Second, UDP is expected to give better latency than MPI when both are running on an Ethernet network, but this assumes the network is "mostly reliable", as is the case with most switched Ethernet networks used in clusters. However, if run over a wide-area network or with very inexpensive equipment, it is possible that reliability at the TCP level (used indirectly by MPI) may be more efficient than the UDP implementation that GASNet employs. PLEASE keep in mind that both the MPI and UDP implementations of GASNet exist only for their portability, and neither is going to be blindingly fast. Comparing either of them to some other benchmark may satisfy one's curiosity, but I don't see any deep value in such a comparison.

Benchmarks in general:
In the Berkeley UPC distribution tarball there are upc-tests and upc-examples directories that contain UPC code gathered from many sources. Among them are several benchmarks, some of which might even be correct ;-). Have a look at that collection of code, but be aware that we provide it as-is and, since we wrote very little of it, we may not be able to help much. (We might not even know what some of them do.)

Measuring "latency":
How you define latency will depend on what really matters to your application.
If one wants to look (as we have on the GASNet site) at the time required to implement a UPC-level "strict Put" operation, then you are comparing upc_memput() against an MPI Ping-ACK (N bytes sent, then wait for a zero-byte reply). The "ACK" in the MPI test allows the sender to know the value has reached remote memory before it performs the next operation (a part of the 'strict' UPC memory model). In the GASNet case, the completion of a blocking Put operation uses lower-level acknowledgments when available from a given network API, which is one of the reasons it outperforms MPI Ping-ACK on many high-speed networks. In the case of UDP, however, no lower-level notification is provided, and the communication pattern is pretty much the same as for MPI (a UDP-level ACK sent by the GASNet implementation).

If what you want is a Ping-Pong in which node0 sends a message that requires a reply from node1, then you are trying to measure something quite different from what the GASNet performance webpage shows. In MPI the idea of waiting for a message's arrival is quite natural. In UPC, however, there is no natural way to wait for "arrival", since there is no "message" concept. In the Berkeley UPC implementation we address this lack with a "semaphore" extension that you may wish to investigate. Without the semaphore or a similar abstraction for point-to-point ordering, a true Ping-Pong is hard to write portably in UPC (and a portable implementation may be quite inefficient).

Measuring "bandwidth":
In the case of bandwidth the idea is pretty much the same in MPI and UPC: move data in one direction as fast as possible at a given transfer size. Again, however, mapping this into MPI and UPC code differs. In MPI one will use non-blocking sends and receives to get the best possible bandwidth (by overlapping the per-operation overhead with the communication of the previous operations). In UPC one wants to do the same thing.
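As a rough illustration of the semaphore-based Ping-Pong mentioned above, here is a sketch using the bupc_sem_t / bupc_memput_signal extensions from the Berkeley UPC point-to-point synchronization proposal. This is illustrative only: the exact headers, flag values, and signatures should be checked against the PGAS06 paper linked below and the Berkeley UPC documentation, and the shared declarations here are schematic.

```c
/* Sketch (not authoritative): Ping-Pong between threads 0 and 1 using
 * the Berkeley UPC semaphore / signaling-put extensions. Check the
 * PGAS06 paper and Berkeley UPC docs for the exact API before use. */
#include <upc.h>
/* header providing bupc_sem_* is an assumption -- see the docs */

#define NBYTES 512
#define ITERS  1000

shared char buf[THREADS][NBYTES];    /* per-thread landing zone */
shared bupc_sem_t *sems[THREADS];    /* schematic semaphore directory */

int main(void) {
    /* Each thread allocates a semaphore with affinity to itself;
     * the proper flags argument is described in the paper. */
    sems[MYTHREAD] = bupc_sem_alloc(0);
    upc_barrier;

    if (MYTHREAD > 1) { upc_barrier; return 0; }  /* only 0 and 1 play */

    char local[NBYTES];
    int peer = 1 - MYTHREAD;
    for (int i = 0; i < ITERS; i++) {
        if (MYTHREAD == 0) {
            /* put the "ping" and raise the peer's semaphore in one op */
            bupc_memput_signal(&buf[peer][0], local, NBYTES, sems[peer], 1);
            bupc_sem_wait(sems[MYTHREAD]);    /* block until the "pong" */
        } else {
            bupc_sem_wait(sems[MYTHREAD]);    /* wait for the ping */
            bupc_memput_signal(&buf[peer][0], local, NBYTES, sems[peer], 1);
        }
    }
    upc_barrier;
    return 0;
}
```

The point of the combined put-and-signal operation is that the receiver has something concrete to wait on, which plain upc_memput() does not provide.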
In an ideal world, the fact that communication in UPC is done at the language level should allow a smart compiler to automatically transform things into non-blocking transfers when this does not change the program semantics. However, few compilers can do this perfectly (ours can to a limited extent), and even if they could, the typical benchmark transfers to the same destination repeatedly, possibly preventing such a non-blocking transformation by the compiler. So, how does one express EXPLICITLY non-blocking communication in UPC? Again, we have an extension in the Berkeley UPC compiler (proposed as an extension to the UPC language spec) for this purpose.

Docs:
For info on the semaphore/signaling-put extensions, see http://upc.lbl.gov/publications/PGAS06-p2p.pdf
For info on the non-blocking memcpy extensions, see http://upc.lbl.gov/publications/upc_memcpy.pdf

I have probably left you with more questions than answers, but hopefully the new questions lead you in the right direction. If you could describe for us what you think you want to measure, we might be able to provide more useful answers. However, I will caution you again that the UDP implementation of GASNet exists for portability (not performance), and comparing it to MPI benchmarks is probably of very little value.

-Paul

Jose Vicente Espi wrote:
> Hello,
>
> I'm testing performance of UPC communications in an UDP network,
> comparing it with MPI ping pong bandwith/latency test. But I didn't
> get the results that I expected, based in the tests made in
> http://gasnet.cs.berkeley.edu/performance/.
>
> I'm probably doing something wrong, functions used for transferring
> data are upc_memput and upc_memget. Only with message sizes smaller
> than 512 bytes I get a little better performance than MPI. For larger
> message sizes performance become worse.
>
> Have you got an example of code for measuring performance of UPC vs
> MPI in ping pong bandwith test?
>
> Thanks in advance.
>
> Jose Vicente.

-- Paul H.
Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group         Tel: +1-510-495-2352
HPC Research Department           Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory