Re: 2.10.0 over GM

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 20 2010 - 12:44:48 PDT

    Oliver,
    
      The message you are getting is saying that the shared heap you have 
    requested is too large for our default GM support.  Use of GM requires 
    that the memory addressed remotely be "pinned" (prevents the OS from 
    swapping it out).  Our default behaviour with GM is to try to pin the 
    entire shared heap, and this configuration is known as "SEGMENT_FAST" 
    because it provides the greatest speed for remote memory access, but at 
    the possible cost of limited heap size.  The message you are getting 
    indicates that the gm_register_memory() function failed to pin the shared heap.
    
      So, the simplest fix is probably to ask for a smaller shared heap if 
    possible (you can pass --shared-heap=N to upcrun without needing to 
    recompile the executable).  You should also check whether lowering the 
    environment variable GASNET_PHYSMEM_PINNABLE_RATIO helps.  
    This variable is 0.7 by default and sets the largest fraction of 
    physical memory we'll ask GM to pin.  Of course if you /need/ the large 
    shared heap size, then you'll need the "SEGMENT_LARGE" option, below.
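    
      As a rough sketch of the two runtime-only options above (no 
    recompile), the wrapper below passes a smaller shared heap to upcrun 
    and sets GASNET_PHYSMEM_PINNABLE_RATIO for that one run.  The function 
    name run_small_heap, the thread count of 4, and the example values are 
    illustrative assumptions, not recommended settings.

```shell
# Hypothetical wrapper: rerun an already-compiled UPC program with a smaller
# shared heap and a different pinnable-memory ratio. Nothing is recompiled.
run_small_heap() {
    heap="$1"      # shared heap size, passed through to upcrun unchanged
    ratio="$2"     # largest fraction of physical memory to ask GM to pin
    shift 2
    # The variable is set only for this one upcrun invocation.
    GASNET_PHYSMEM_PINNABLE_RATIO="$ratio" \
        upcrun -n 4 --shared-heap="$heap" "$@"
}

# Example (placeholder values to experiment with):
#   run_small_heap 400 0.5 ./floyd2-parallel 8000
```

    Trying one knob at a time makes it easier to tell which change 
    actually lets the segment pin.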
    
      The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE" mode 
    to allow for a larger shared heap.  However, this is accomplished by 
    dynamically pinning and unpinning portions of memory, which can lead 
    to a reduction in speed relative to SEGMENT_FAST.  To get the "LARGE" 
    segment support you will need to reconfigure Berkeley UPC with 
    "--enable-segment-large" on the configure command line, recompile and 
    reinstall Berkeley UPC and then recompile your application with the new 
    Berkeley UPC installation.
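    
      The rebuild steps above could look roughly like the following.  
    This is a sketch under assumptions: the helper name 
    rebuild_segment_large, the separate build-large directory, the install 
    prefix, and the source file name floyd2-parallel.upc are all 
    hypothetical; only --enable-segment-large comes from the text above.

```shell
# Hypothetical helper: rebuild Berkeley UPC in SEGMENT_LARGE mode.
# $1 = unpacked Berkeley UPC source tree, $2 = install prefix (placeholders).
rebuild_segment_large() {
    src="$1"
    prefix="$2"
    (
        # Build in a separate directory so an existing build is left alone.
        mkdir -p "$src/build-large" && cd "$src/build-large" || exit 1
        ../configure --prefix="$prefix" --enable-segment-large &&
        make &&
        make install
    )
}

# Afterwards, recompile the application with the new installation, e.g.:
#   "$prefix/bin/upcc" -network=gm -shared-heap=900 floyd2-parallel.upc
```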
    
      Note that --enable-segment-large affects all the networks, meaning 
    that the IBV support in such a build will also be switched into "LARGE" 
    mode.  So, you may want to consider keeping two separate builds of the 
    Berkeley UPC runtime ("FAST" for IBV, and "LARGE" for GM).  For MPI 
    there is actually no distinction between the two segment modes.
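    
      One way to keep the two builds apart is to install them under 
    different prefixes and pick the upcc by network.  The sketch below 
    assumes hypothetical prefixes /opt/upc/fast and /opt/upc/large and a 
    hypothetical helper named upcc_for_network; substitute whatever paths 
    you actually install to.

```shell
# Hypothetical install prefixes for the two builds; adjust to your site.
UPC_FAST_PREFIX="${UPC_FAST_PREFIX:-/opt/upc/fast}"     # default (FAST) build
UPC_LARGE_PREFIX="${UPC_LARGE_PREFIX:-/opt/upc/large}"  # --enable-segment-large build

# Print the path of the upcc to use for a given network name.
upcc_for_network() {
    case "$1" in
        gm)  echo "$UPC_LARGE_PREFIX/bin/upcc" ;;  # LARGE: larger heap on GM
        ibv) echo "$UPC_FAST_PREFIX/bin/upcc" ;;   # FAST: fastest remote access
        mpi) echo "$UPC_FAST_PREFIX/bin/upcc" ;;   # no difference between modes
        *)   echo "unknown network: $1" >&2; return 1 ;;
    esac
}

# Example (source file name is a placeholder):
#   "$(upcc_for_network gm)" -network=gm -shared-heap=900 floyd2-parallel.upc
```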
    
    -Paul
    
    Oliver Perks wrote:
    > This looks like it may be a very simple fix but I honestly have no 
    > idea where to start. Sadly nothing obvious from google searches.
    > My UPC 2.10.0 build works fine over MPI( openmpi 1.4.1 - built for GM, 
    > IBV and eth), IBV but not GM, it crashes with the following error.
    >
    > *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    > NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the 
    > environment to generate a backtrace.
    > *** Caught a fatal signal: SIGABRT(6) on node 3/4
    > bash: line 1: 14500 Aborted                 /usr/bin/env 
    > GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1 
    > LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib 
    > GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1 
    > GMPI_SLAVE=10.131.56.61 
    > GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon41.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon40.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon40.deepthought.hpsg.dcs.warwick.ac.uk" 
    > GASNET_GASNETRUN_GM=1 
    > UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon41.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon40.deepthought.hpsg.dcs.warwick.ac.uk 
    > vogon40.deepthought.hpsg.dcs.warwick.ac.uk" 
    > /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
    >
    >
    > GASNET_BACKTRACE=1 Adds this extra information
    >
    > *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    > [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly 
    > '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
    > [3] [Thread debugging using libthread_db enabled]
    > [3] [New Thread 0x403190c0 (LWP 10899)]
    > [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    > [3] #0  0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    > [3] #1  0x4005946c in system (
    > [3]     cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly 
    > '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at 
    > ./libgm/gm_fork_system.c:227
    > [3] #2  0x080a01ad in gasneti_bt_gdb ()
    > [3] #3  0x080a27fb in gasneti_print_backtrace ()
    > [3] #4  0x080a0446 in gasneti_fatalerror ()
    > [3] #5  0x0809130d in gasnetc_attach ()
    > [3] #6  0x08069ee3 in upcr_startup_attach ()
    > [3] #7  0x0807de2e in bupc_init_reentrant ()
    > [3] #8  0x08061698 in main ()
    > [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz 
    > '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
    >
    >
    >
    > Program compiled with `upcc -shared-heap=900 -network=gm`
    >
    > Regards
    > Oliver Perks
    >
    > -- 
    > Oliver Perks
    > CS204 Department of Computer Science
    > University of Warwick
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
