From: Jose Vicente Espi (jvespi_at_gmail_dot_com)
Date: Thu Jul 30 2009 - 02:20:32 PDT
Hello Paul, thank you for your answer. Your comments have been very useful for clearing up some doubts I had. I was not aware of either the examples in the tarball or the Berkeley extensions of the language. Until now I was programming based on the specs alone, but performance can be improved using these libraries. I really think these extensions should be included in the UPC specs, especially those related to synchronization (the semaphores). My bandwidth results now look like those on the GASNet webpages, despite using udp-conduit. For the latency test, I think the *bupc_memput_signal_async / bupc_sem_post* routines are useful for signaling data arrival and the ACK. But I am also aware that results on a UDP cluster are not reliable; I expect to have access to an InfiniBand cluster soon.

Jose Vicente

Paul H. Hargrove wrote:
> Jose,
>
> Sorry we have not responded sooner. Your mail arrived while our
> entire team was involved in an important two-day meeting.
>
> First let me say that what you are trying to compare is a little
> tricky. The performance results you see on the GASNet webpages are a
> comparison of the speed of MPI vs GASNet for *implementing* UPC-like
> communications patterns over various networks. That is not quite the
> same as comparing UPC vs MPI for implementing a given application's
> communications.
>
> Second let me say that UDP is expected to give better latency
> performance than MPI when both are running on an Ethernet network, but
> this assumes that the network is "mostly reliable", as is the case with
> most switched Ethernet networks used in clusters. However, if run
> over a wide-area network or with very inexpensive equipment, it is
> possible that reliability at the TCP level (used indirectly by MPI)
> may be more efficient than the UDP implementation that GASNet employs.
>
> PLEASE keep in mind that both the MPI and UDP implementations of
> GASNet exist only for their portability and neither is going to be
> blindingly fast. Comparing either of them to some other benchmark may
> satisfy one's curiosity, but I don't see any deep value in such a
> comparison.
>
> Benchmarks in general:
>
> In the Berkeley UPC distribution tarball there are upc-tests and
> upc-examples directories that contain UPC code gathered from many
> sources. Among them are several benchmarks, some of which might even
> be correct ;-). Have a look at that collection of code, but be aware
> that we provide it as-is, and since we wrote very little of it, we may
> not be able to help much. (We might not even know what some of them do.)
>
> Measuring "latency":
>
> How you define latency will depend on what really matters to your
> application. If one wants to look (as we have on the GASNet site) at
> the time required to implement a UPC-level "strict Put" operation, then
> you are looking at comparing upc_memput() against an MPI Ping-ACK
> (N bytes sent, and then wait for a zero-byte reply). The "ACK" in the
> MPI test is to allow the sender to know the value has reached remote
> memory before it can perform the next operation (a part of the
> "strict" UPC memory model). In the GASNet case, the completion of a
> blocking Put operation uses lower-level acknowledgments when available
> from a given network API, which is one of the reasons it outperforms
> MPI Ping-ACK on many high-speed networks. In the case of UDP,
> however, no lower-level notification is provided and the comms pattern
> is pretty much the same as for MPI (a UDP-level ACK sent by the GASNet
> implementation).
>
> If what you want is a Ping-Pong in which node0 sends a message that
> requires a reply from node1, then you are trying to measure something
> quite different from what the GASNet performance webpage shows. In
> MPI the idea of waiting for a message arrival is quite natural. In
> UPC, however, there is no natural way to wait for "arrival" since
> there is no "message" concept. In the Berkeley UPC implementation we
> address this lack with a "semaphore" extension that you may wish to
> investigate. Without the semaphore or a similar abstraction for
> point-to-point ordering, a true Ping-Pong is hard to write portably in
> UPC (and the portable implementation may be quite inefficient).
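To make the two patterns above concrete, here is roughly what the MPI Ping-ACK looks like (a minimal sketch; the payload size and iteration count are arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    #define NBYTES 8        /* payload size; vary to sweep message sizes */
    #define ITERS  10000

    int main(int argc, char **argv) {
        char buf[NBYTES];
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            double t0 = MPI_Wtime();
            for (int i = 0; i < ITERS; i++) {
                /* N-byte "Ping" to rank 1 ... */
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                /* ... then wait for the zero-byte "ACK" */
                MPI_Recv(buf, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            printf("%g us per strict-put equivalent\n",
                   1e6 * (MPI_Wtime() - t0) / ITERS);
        } else if (rank == 1) {
            for (int i = 0; i < ITERS; i++) {
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

A true Ping-Pong in UPC, built on the semaphore and signaling-put extensions discussed above, might look roughly like the following. This is only a sketch against the interface described in the PGAS06 paper linked in the Docs section below; the semaphore flag names and the shared "sems" directory used to publish each thread's semaphore pointer are assumptions that should be checked against bupc_extensions.h in an actual installation:

    #include <upc.h>
    #include <bupc_extensions.h>    /* Berkeley UPC extensions */
    #include <stdio.h>
    #include <sys/time.h>

    #define NBYTES 8
    #define ITERS  10000

    /* one contiguous NBYTES landing zone per thread; assumes 2 UPC threads */
    shared [NBYTES] char buf[THREADS * NBYTES];
    /* directory so each thread can find its peer's semaphore */
    bupc_sem_t * shared sems[THREADS];

    int main(void) {
        char src[NBYTES];
        /* flag names as in the PGAS06 paper: integer-valued semaphore,
           single producer, single consumer */
        sems[MYTHREAD] = bupc_sem_alloc(BUPC_SEM_INTEGER |
                                        BUPC_SEM_SPRODUCER | BUPC_SEM_SCONSUMER);
        upc_barrier;
        bupc_sem_t *my_sem   = sems[MYTHREAD];
        bupc_sem_t *peer_sem = sems[1 - MYTHREAD];
        if (MYTHREAD == 0) {
            struct timeval t0, t1;
            gettimeofday(&t0, NULL);
            for (int i = 0; i < ITERS; i++) {
                /* put the payload to thread 1, posting its semaphore on arrival;
                   the _async variant returns immediately, which is safe here
                   because src is never modified */
                bupc_memput_signal_async(&buf[NBYTES], src, NBYTES, peer_sem, 1);
                bupc_sem_wait(my_sem);     /* block until the pong arrives */
            }
            gettimeofday(&t1, NULL);
            double us = 1e6 * (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec);
            printf("%g us one-way\n", us / (2.0 * ITERS));
        } else if (MYTHREAD == 1) {
            for (int i = 0; i < ITERS; i++) {
                bupc_sem_wait(my_sem);     /* block until the ping arrives */
                bupc_memput_signal_async(&buf[0], src, NBYTES, peer_sem, 1);
            }
        }
        upc_barrier;
        return 0;
    }

Dividing the round-trip time by two gives a one-way figure, which measures something different from the strict-put latency on the GASNet performance page, exactly as Paul notes.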
> Measuring "bandwidth":
>
> In the case of bandwidth the idea is pretty much the same in MPI and
> UPC: move data in one direction as fast as possible with a given
> transfer size. Again, however, mapping this into MPI and UPC code is
> different. In MPI one will use non-blocking sends and receives to get
> the best possible bandwidth (by overlapping the per-operation overhead
> with the communication of the previous operations). In UPC one wants
> to do the same thing. In an ideal world the fact that comms in UPC are
> done at a language level should allow a smart compiler to
> automatically transform things into non-blocking transfers when this
> does not change the program semantics. However, few compilers can do
> this perfectly (ours can to a limited extent), and even if they could,
> the typical benchmark is transferring to the same destination
> repeatedly, possibly preventing such a non-blocking transformation by
> the compiler. So, how does one EXPLICITLY express non-blocking comms
> in UPC? Again we have an extension in the Berkeley UPC compiler
> (proposed as an extension to the UPC language spec) for this purpose.
>
> Docs:
> For info on the semaphore/signaling-put extensions, see
> http://upc.lbl.gov/publications/PGAS06-p2p.pdf
> For info on the non-blocking memcpy extensions, see
> http://upc.lbl.gov/publications/upc_memcpy.pdf
>
> I have probably left you with more questions than answers, but
> hopefully the new questions lead you in the right direction. If you
> could describe for us what you think you want to measure, we might be
> able to provide more useful answers. However, I will caution you
> again that the UDP implementation of GASNet exists for portability
> (not performance) and comparing it to MPI benchmarks is probably of
> very little value.
>
> -Paul
>
> Jose Vicente Espi wrote:
>> Hello,
>>
>> I'm testing performance of UPC communications on a UDP network,
>> comparing it with an MPI ping-pong bandwidth/latency test. But I am
>> not getting the results that I expected, based on the tests published
>> at http://gasnet.cs.berkeley.edu/performance/.
>>
>> I'm probably doing something wrong. The functions used for
>> transferring data are upc_memput and upc_memget. Only with message
>> sizes smaller than 512 bytes do I get slightly better performance
>> than MPI. For larger message sizes performance becomes worse.
>>
>> Have you got example code for measuring UPC vs MPI performance in a
>> ping-pong bandwidth test?
>>
>> Thanks in advance.
>>
>> Jose Vicente.
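Following up on the bandwidth discussion in Paul's reply: here is a sketch of an explicitly non-blocking UPC bandwidth loop using the proposed bupc_memput_async / bupc_waitsync extensions (see the upc_memcpy.pdf link above). The transfer size, pipeline depth, and handle-recycling scheme are arbitrary illustration choices, and the code assumes exactly two UPC threads:

    #include <upc.h>
    #include <bupc_extensions.h>    /* Berkeley UPC extensions */
    #include <stdio.h>
    #include <sys/time.h>

    #define MSGSZ 65536     /* bytes per transfer */
    #define DEPTH 8         /* puts kept in flight */
    #define ITERS 1000

    /* one contiguous MSGSZ block per thread */
    shared [MSGSZ] char dst[THREADS * MSGSZ];

    int main(void) {
        static char src[MSGSZ];
        bupc_handle_t h[DEPTH];
        upc_barrier;
        if (MYTHREAD == 0) {
            struct timeval t0, t1;
            gettimeofday(&t0, NULL);
            /* prime the pipeline with DEPTH outstanding puts to thread 1 */
            for (int d = 0; d < DEPTH; d++)
                h[d] = bupc_memput_async(&dst[MSGSZ], src, MSGSZ);
            for (int i = 0; i < ITERS; i++) {
                int d = i % DEPTH;
                bupc_waitsync(h[d]);    /* retire the oldest put ... */
                h[d] = bupc_memput_async(&dst[MSGSZ], src, MSGSZ); /* ... issue a new one */
            }
            for (int d = 0; d < DEPTH; d++)
                bupc_waitsync(h[d]);    /* drain the pipeline */
            gettimeofday(&t1, NULL);
            double s = (t1.tv_sec - t0.tv_sec) + 1e-6 * (t1.tv_usec - t0.tv_usec);
            printf("%g MB/s\n", (double)(ITERS + DEPTH) * MSGSZ / s / 1e6);
        }
        upc_barrier;
        return 0;
    }

An MPI version of the same measurement would keep a matching window of MPI_Isend operations in flight on one rank and retire them against pre-posted MPI_Irecv operations on the other; in both cases the point is the one Paul makes above, overlapping per-operation overhead with the data movement of earlier transfers.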