Re: P.S. Re: UPC program acting like no synchronization

From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Mon Nov 07 2005 - 07:21:56 PST

  • Next message: Jonathan L Brown: "GASP or other UPC profiler on Seaborg"
    At 07:37 PM 11/6/2005, Marc L. Smith wrote:
    >One thought
    >is you can't rely on the order print statements appear, as the
    >buffering can fool you into thinking barriers are being ignored.
    That's absolutely true, and in fact it's more than just buffering effects - 
    because even an fflush(stdout) after each printf doesn't always solve the 
    problem (although it sometimes helps). In most systems, stdout/stderr travel 
    over independent socket connections to the console, in many cases traveling 
    over completely different and usually slower network hardware than the compute 
    nodes are using for communication (e.g. in a Myrinet cluster, the compute nodes 
    talk over Myrinet, but the stdout/stderr often travel over Ethernet to the 
    frontend console).
    So basically you can never rely upon barriers ordering printf output across 
    threads. You can do hokey things like:
       upc_barrier; fflush(NULL); sleep(1); upc_barrier;
    to try to enforce output ordering across threads, but you obviously don't want 
    to do that often for performance reasons, and anyhow it's only 
    probabilistically correct and increasingly likely to fail at larger scales.
    The one thing you *can* rely upon is that output from any *given* thread 
    should always arrive at the console in the same order it was produced - in 
    other words, the console output is a non-deterministic interleaving of the 
    output streams of each thread, but each thread's output stream is still 
    totally ordered. On a few systems this interleaving may even happen on a byte 
    level (so characters from different threads' output lines can become 
    interleaved, which is usually completely unreadable); however, most system 
    spawners try to provide line-by-line buffering so that at worst you get 
    line-by-line interleaving. I've also seen at least one system that will remain 
    unnamed where no output appears until the job completes, and then it simply 
    concatenates the output of each thread in some random order (obviously a 
    painful system to deal with). By the way, in most cases all of this is outside 
    of the scope of the Berkeley UPC / GASNet software because stdout/stderr is 
    usually provided by the system's parallel spawner and we're just working with 
    whatever it gives us.
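
Since only the per-thread order is dependable, one workaround is to tag every line with the thread number and a per-thread sequence counter, so the interleaved console output can be separated again afterwards (for example with grep or sort). The following is a minimal sketch in plain C, not something Berkeley UPC provides: format_tagged_line and print_tagged are hypothetical helpers, and in a UPC program you would presumably pass MYTHREAD as the thread argument.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a Berkeley UPC API): formats one complete,
 * newline-terminated line of the form "[T<thread> #<seq>] <msg>".
 * Returns the number of characters written, or -1 if buf is too small. */
int format_tagged_line(char *buf, size_t size,
                       int thread, unsigned long seq, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    int n = snprintf(buf, size, "[T%d #%lu] %s\n", thread, seq, msg);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Emits the tagged line with a single fputs() call, so a spawner that
 * buffers line-by-line is more likely to keep it in one piece.
 * *seq is advanced only when the line was formatted successfully.
 * Returns 0 on success, -1 on failure. */
int print_tagged(FILE *out, int thread, unsigned long *seq, const char *msg)
{
    char line[256];
    if (format_tagged_line(line, sizeof line, thread, *seq, msg) < 0)
        return -1;
    (*seq)++;
    if (fputs(line, out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

The tags do not make the console order deterministic; they only let you recover each thread's own stream from whatever interleaving arrives.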
    In any case, if you really need a deterministic order to your output 
    statements, the best solution is often to have a specific thread perform all 
    output. If you have a large amount of output data to write, though, it should 
    probably go to the compute nodes' file system instead, for performance reasons.
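
The single-writer idea can be sketched as follows. This is plain C under stated assumptions, not Berkeley UPC code: write_in_thread_order and MAX_MSG are hypothetical names, and slots stands in for per-thread message buffers that, in a real UPC program, would presumably live in shared memory, be filled by each thread before a upc_barrier, and then be copied to local memory by the one thread that does the printing (that copy is not shown).

```c
#include <assert.h>
#include <stdio.h>

#define MAX_MSG 128   /* hypothetical per-thread message capacity */

/* Writes the messages in slots[0..nthreads-1] to out in thread order,
 * each prefixed with its thread number. Empty slots are skipped.
 * Intended to be called by exactly one thread, after all threads have
 * finished filling their slots. Returns the number of lines written,
 * or -1 on an output error. */
int write_in_thread_order(FILE *out, char slots[][MAX_MSG], int nthreads)
{
    assert(out != NULL && slots != NULL && nthreads >= 0);
    int written = 0;
    for (int t = 0; t < nthreads; t++) {
        if (slots[t][0] == '\0')
            continue;                      /* thread t had nothing to report */
        if (fprintf(out, "thread %d: %s\n", t, slots[t]) < 0)
            return -1;
        written++;
    }
    if (fflush(out) != 0)
        return -1;
    return written;
}
```

Because only one thread ever touches stdout here, the output order is fixed by the loop rather than by the spawner.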
    At 02:19 PM 11/6/2005, Steve Reinhardt wrote:
    >How do people overcome this to debug with printf?  Add time stamps?  Use some 
    >other routine?
    Well, the simple answer is "don't debug with printf, use a real debugger" :)
    Berkeley UPC now includes TotalView integration support, and that's the 
    recommended way to find bugs in your program. See:
    >Hi Steve,
    >I haven't used bupc_trace_printf, and don't really know what its
    >purpose in life is, but it sounds like it *might* be helpful.  I'll
    >wait for one of the UPC folks to comment on it.  :-)
    bupc_trace_printf() is used to write a user-provided message into the UPC 
    trace log, which is only available if you configure with --enable-trace (or 
    --enable-debug). For details about tracing, see:
    Each thread's trace log is labeled with a wall-clock time stamp on every 
    entry, so using that you could reconstruct a fairly accurate picture of event 
    ordering, modulo slight timing drift across the machine.
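
To illustrate reconstructing an ordering from time stamps, here is a minimal sketch in plain C that merges events from several threads by wall-clock time. The log_event record is a hypothetical layout chosen for the example; it is not the actual Berkeley UPC trace log format, and parsing the real logs is not shown. Ties are broken by thread and then by a per-thread sequence number, so each thread's own order is preserved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event record; NOT the real trace log format. */
typedef struct {
    double time;         /* wall-clock time stamp, in seconds */
    int thread;          /* thread that produced the entry */
    unsigned long seq;   /* position within that thread's own log */
    const char *text;    /* the logged message */
} log_event;

/* Orders by time, then by thread, then by per-thread sequence number. */
static int compare_events(const void *a, const void *b)
{
    const log_event *x = a;
    const log_event *y = b;
    if (x->time < y->time) return -1;
    if (x->time > y->time) return 1;
    if (x->thread != y->thread) return x->thread < y->thread ? -1 : 1;
    if (x->seq != y->seq) return x->seq < y->seq ? -1 : 1;
    return 0;
}

/* Sorts the combined events of all threads in place. */
void sort_events_by_time(log_event *events, size_t count)
{
    assert(events != NULL || count == 0);
    if (count < 2)
        return;                      /* nothing to reorder */
    qsort(events, count, sizeof *events, compare_events);
}
```

Events from different threads whose time stamps differ by less than the clock drift across the machine cannot be ordered reliably this way, so treat the merged sequence as approximate.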
    >I'm trying to allow some data structures to get initialized before all the 
    >other threads join the fray.  Basically it's
    >if (MYTHREAD == 0) {
    >         ...initialization stuff... (including some printf()s)
    >}
    >upc_barrier;
    >... parallel work (including some printf()s)
    >Running on 2P, it runs approximately as I'd want it to, except that thread 1 
    >appears not to participate.  If I toggle the initialization work to be done 
    >on thread 1 instead of thread 0, it appears that thread 0 goes past the 
    >barrier prematurely.
    As explained above, you cannot count on the relative order of printfs across 
    threads. However, you *can* count on barriers ordering updates to shared data 
    structures, as required by the UPC specification. So in the example above, 
    data structure reads in the "parallel work" section should observe the results 
    of any data structure writes in "initialization stuff" (that have not been 
    subsequently overwritten). If you have an example where that appears to be 
    violated, please submit a compile-able example at
    Hope this helps.
