From: Dan Bonachea (bonachea_at_cs_dot_berkeley_dot_edu)
Date: Mon Nov 07 2005 - 07:21:56 PST
At 07:37 PM 11/6/2005, Marc L. Smith wrote:
>One thought
>is you can't rely on the order print statements appear, as the
>buffering can fool you into thinking barriers are being ignored.

That's absolutely true, and in fact it's more than just buffering effects - because even an fflush(stdout) after each printf doesn't always solve the problem (although it sometimes helps). In most systems, stdout/stderr travel over independent socket connections to the console, in many cases traveling over completely different and usually slower network hardware than the compute nodes are using for communication (e.g. in a Myrinet cluster, the compute nodes talk over Myrinet, but the stdout/stderr often travel over Ethernet to the frontend console). So basically you can never rely upon barriers ordering printf output across threads.

You can do hokey things like:

    upc_barrier; fflush(NULL); sleep(1); upc_barrier;

to try to enforce output ordering across threads, but you obviously don't want to do that often for performance reasons, and anyhow it's only probabilistically correct and increasingly likely to fail at larger scales.

The one thing you *can* rely upon is that output from any *given* thread should always arrive at the console in the same order it was produced - in other words, the console output is a non-deterministic interleaving of the output streams of each thread, but each thread's output stream is still totally ordered. On a few systems this interleaving may even happen on a byte level (so characters from different threads' output lines can become interleaved, which is usually completely unreadable), however most system spawners try to provide line-by-line buffering so that at worst you get line-by-line interleaving. I've also seen at least one system that will remain unnamed where no output appears until the job completes, and then it simply concatenates the output of each thread in some random order (obviously a painful system to deal with).
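Since each thread's own output stream stays ordered, one way to make the interleaved console output easier to untangle is to tag every line with the thread id and a per-thread sequence number. The following is only a rough sketch in plain C, not anything provided by UPC or Berkeley UPC: format_tagged() and tagged_puts() are hypothetical helper names, and in a real UPC program the thread_id argument would presumably just be MYTHREAD.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not part of UPC or Berkeley UPC): format one log line
 * prefixed with the issuing thread's id and that thread's own sequence number.
 * Returns the formatted length (snprintf-style), or a negative value on error;
 * if the prefix itself does not fit, only the prefix length is returned. */
static int format_tagged(char *buf, size_t len, int thread_id,
                         unsigned long seq, const char *fmt, ...)
{
    va_list ap;
    int prefix, body;

    assert(buf != NULL && len > 0);

    /* Zero-padded fields so a plain textual sort groups by thread, then order. */
    prefix = snprintf(buf, len, "[T%04d #%06lu] ", thread_id, seq);
    if (prefix < 0 || (size_t)prefix >= len)
        return prefix;

    va_start(ap, fmt);
    body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, ap);
    va_end(ap);

    return body < 0 ? body : prefix + body;
}

/* Emit one tagged line with a single stdio call and flush it, which gives a
 * line-buffering spawner the best chance of keeping the line intact.
 * Increments the caller's per-thread sequence counter. */
static void tagged_puts(int thread_id, unsigned long *seq, const char *msg)
{
    char line[256];

    format_tagged(line, sizeof line, thread_id, (*seq)++, "%s", msg);
    puts(line);
    fflush(stdout);
}
```

With tags like these, sorting the captured output textually should regroup it per thread in the order each thread produced it. It still says nothing about the relative order of events on different threads, and it cannot help on systems that interleave at the byte level.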
By the way, in most cases all of this is outside of the scope of the Berkeley UPC / GASNet software because stdout/stderr is usually provided by the system's parallel spawner and we're just working with whatever it gives us.

In any case, if you really need a deterministic order to your output statements, the best solution is often to have a specific thread perform all output. Although if you have a large amount of output data to write, it should probably be going to the compute nodes' file system instead for performance reasons.

At 02:19 PM 11/6/2005, Steve Reinhardt wrote:
>How do people overcome this to debug with printf? Add time stamps? Use some
>other routine?

Well, the simple answer is "don't debug with printf, use a real debugger" :) Berkeley UPC now includes TotalView integration support, and that's the recommended way to find bugs in your program. See:

http://upc.lbl.gov/docs/user/index.html#debugging

>Hi Steve,
>
>I haven't used bupc_trace_printf, and don't really know what its
>purpose in life is, but it sounds like it *might* be helpful. I'll
>wait for one of the UPC folks to comment on it. :-)

bupc_trace_printf() is used to write a user-provided message into the UPC trace log, which is only available if you configure with --enable-trace (or --enable-ddebug). For details about tracing, see:

http://upc.lbl.gov/docs/user/index.html#tracing

Each thread's trace log is labeled with a wall-clock time stamp on every entry, so using that you could reconstruct a fairly accurate picture of event ordering, modulo slight timing drift across the machine.

>I'm trying to allow some data structures to get initialized before all the
>other threads join the fray. Basically it's
>
>if (MYTHREAD == 0) {
>  ...initialization stuff... (including some printf()s)
>}
>upc_barrier;
>
>... parallel work (including some printf()s)
>
>Running on 2P, it runs approximately as I'd want it to, except that thread 1
>appears not to participate. If I toggle the initialization work to be done
>on thread 1 instead of thread 0, it appears that thread 0 goes past the
>barrier prematurely.

As explained above, you cannot count on the relative order of printfs across threads. However, you *can* count on barriers ordering updates to shared data structures, as required by the UPC specification. So in the example above, data structure reads in the "parallel work" section should observe the results of any data structure writes in "initialization stuff" (that have not been subsequently overwritten). If you have an example where that appears to be violated, please submit a compilable example at http://upc-bugs.lbl.gov

Hope this helps.
Dan