From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Dec 24 2009 - 01:09:50 PST
Nikita,

Our default representation for a shared pointer packs all the required info into 64 bits. By default only 34 bits are used for "address" bits. This means that the maximum shared heap size per UPC thread is 16GB (2^34 bytes). So, I would guess that the 2 and 4 thread cases are simply using too much memory (I see --shared-heap=32G in the job script) and the addressing is getting truncated internally, leading to the crash.

Assuming you do actually have enough memory on your compute nodes, there are at least two things you could do to address this 16GB shared heap limit. Unfortunately, they both require rebuilding the Berkeley UPC runtime:

Option 1) Pass --enable-sptr-struct to configure. This will use a struct to represent a shared pointer and effectively removes any limitations on the shared heap imposed by the shared pointer representation. Unfortunately, this tends to perform less well than the 64-bit "packed" representation.

Option 2) You could keep the 64-bit packed representation but adjust how many bits are allocated to the address field. To do this, pass --with-sptr-packed-bits=P,T,A to configure, for suitable integer values of P (phase bits), T (thread bits) and A (address bits). For an explanation of these values, see the section "TRADING-OFF MAXIMUM 'THREADS', BLOCKSIZE, AND HEAP SIZE" in the INSTALL.TXT file in the Berkeley UPC source (or online at http://upc.lbl.gov/download/dist/INSTALL.TXT).

I would try "Option 1" first. If that does not work, then I am wrong about the cause of your crashes and "Option 2" will not be of any use, regardless of what P,T,A values you pass.

We should probably be sanity-checking the --shared-heap argument against the shared pointer representation, so that we produce a useful message instead of crashing. I'll enter a bug report for that issue.

-Paul

P.S. Due to the holiday season, you may not hear much from me or the rest of the UPC group in Berkeley for the next week or more. Happy Holidays to you.
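The bit budget described above can be illustrated with a small C sketch. It is not the Berkeley UPC implementation. The names `sptr_layout`, `sptr_pack`, `sptr_max_heap` and `sptr_heap_fits` are hypothetical. The field order (phase in the high bits, address in the low bits) is also an assumption. In the usage values, P=20 and T=10 are arbitrary; only A=34 comes from the message. `sptr_heap_fits` is a hypothetical version of the --shared-heap sanity check proposed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a 64-bit packed shared pointer with
 * P phase bits, T thread bits and A address bits (the P,T,A of
 * --with-sptr-packed-bits). Field order is an assumption made for
 * illustration: phase in the high bits, address in the low bits. */
typedef struct {
    unsigned p; /* phase bits   */
    unsigned t; /* thread bits  */
    unsigned a; /* address bits */
} sptr_layout;

/* Every field needs at least one bit and the total must fit in 64 bits,
 * which also keeps every shift below 64 (avoiding undefined behavior). */
static void sptr_check_layout(sptr_layout l) {
    assert(l.p >= 1 && l.t >= 1 && l.a >= 1);
    assert(l.p + l.t + l.a <= 64);
}

/* Mask with the low n bits set; valid for n < 64. */
static uint64_t low_mask(unsigned n) {
    return ((uint64_t)1 << n) - 1;
}

/* Pack (phase, thread, addr) into one word. Out-of-range values are
 * silently masked, which models the kind of internal truncation
 * suspected above. */
static uint64_t sptr_pack(sptr_layout l, uint64_t phase, uint64_t thread,
                          uint64_t addr) {
    sptr_check_layout(l);
    return ((phase & low_mask(l.p)) << (l.t + l.a)) |
           ((thread & low_mask(l.t)) << l.a) |
           (addr & low_mask(l.a));
}

static uint64_t sptr_phase(sptr_layout l, uint64_t s) {
    return (s >> (l.t + l.a)) & low_mask(l.p);
}

static uint64_t sptr_thread(sptr_layout l, uint64_t s) {
    return (s >> l.a) & low_mask(l.t);
}

static uint64_t sptr_addr(sptr_layout l, uint64_t s) {
    return s & low_mask(l.a);
}

/* Largest per-thread shared heap, in bytes, that A address bits can span. */
static uint64_t sptr_max_heap(sptr_layout l) {
    sptr_check_layout(l);
    return (uint64_t)1 << l.a;
}

/* Hypothetical sanity check of a requested per-thread heap size. */
static int sptr_heap_fits(sptr_layout l, uint64_t heap_bytes) {
    return heap_bytes <= sptr_max_heap(l);
}
```

With A=34, `sptr_max_heap` gives 2^34 bytes (16GB), so a 32GB request fails the check. An address of exactly 16GB packs to an address field of 0. A layout with more address bits, for example P=18, T=10, A=36 (illustrative only; see INSTALL.TXT for the values a given build accepts), would span 64GB per thread at the cost of phase bits.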
Andreev Nikita wrote:
> Hi,
>
> Another issue with suncc. I'm trying to run UPC NPB benchmark on Sun
> x86 machines cluster. I create jobs for every NPB kernel for
> 1, 2, 4, 8, 16 and 32 threads. 1, 8, 16 and 32 compute fine, but 2 and 4
> everytime crash (for every kernel).
>
> I'm using Sun Ceres Studio IDE 9.0 Linux_i386 2009/03/06 compiler, udp
> conduit (ibv also crashes).
>
> make.def file which is used for compiling NPB is in attachment.
>
> For instance I ran mg kernel for 2 threads and saved the output with
> debug info. It's also in the attachment. There you can find also the
> job file which was used to run the job in question.
>
> Regards,
> Nikita
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory