Re: suncc, NPB crashes

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Thu Dec 24 2009 - 01:09:50 PST

    Nikita,
    
      Our default representation for a shared pointer packs all the required
    info into 64 bits.  By default only 34 of those bits hold the "address",
    so the maximum shared heap size per UPC thread is 2^34 bytes = 16GB.
    I would therefore guess that the 2- and 4-thread cases are simply using
    too much memory (I see --shared-heap=32G in the job script), and that the
    address is being truncated internally, leading to the crash.
    
      Assuming you do actually have enough memory on your compute nodes,
    there are at least two things you could do to address this 16GB shared
    heap limit.  Unfortunately, both require rebuilding the Berkeley UPC
    runtime:
    
    Option 1)  Pass --enable-sptr-struct to configure.  This will use a
    struct to represent a shared pointer and effectively removes any
    limit on the shared heap imposed by the shared pointer
    representation.  Unfortunately, it tends to perform worse than the
    64-bit "packed" representation.
    
    Option 2) You could keep the 64-bit packed representation but adjust how
    many bits are allocated to the address field.  To do this, pass
    --with-sptr-packed-bits=P,T,A to configure, for suitable integer values
    of P (phase bits), T (thread bits) and A (address bits).  For an
    explanation of these values, see the section "TRADING-OFF MAXIMUM
    'THREADS', BLOCKSIZE, AND HEAP SIZE" in the INSTALL.TXT file in the
    Berkeley UPC source (or online at
    http://upc.lbl.gov/download/dist/INSTALL.TXT )
    
    I would try "Option 1" first.  If that does not work, then I am wrong 
    about the cause of your crashes and "Option 2" will not be of any use 
    regardless of what P,T,A values you pass.
    
    We should probably sanity-check the --shared-heap argument against
    the shared pointer representation and produce a useful message instead
    of crashing.  I'll enter a bug report for that issue.
    
    -Paul
    P.S.  Due to the holiday season, you may not hear much from me or the 
    rest of the UPC group in Berkeley for the next week or more.  Happy 
    Holidays to you.
    
    Andreev Nikita wrote:
    > Hi,
    >
    > Another issue with suncc. I'm trying to run the UPC NPB benchmarks on a
    > cluster of Sun x86 machines. I create jobs for every NPB kernel with
    > 1, 2, 4, 8, 16 and 32 threads. The 1-, 8-, 16- and 32-thread runs compute
    > fine, but the 2- and 4-thread runs crash every time (for every kernel).
    >
    > I'm using the Sun Ceres Studio IDE 9.0 Linux_i386 2009/03/06 compiler and
    > the udp conduit (the ibv conduit also crashes).
    >
    > The make.def file used to compile NPB is attached.
    >
    > For instance, I ran the mg kernel with 2 threads and saved the output with
    > debug info; it is also attached, along with the job file that was used
    > to run that job.
    >
    > Regards,
    > Nikita
    >   
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
