Re: 2.10.0 over GM

From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:25:59 PDT

  • Next message: Paul H. Hargrove: "Re: 2.10.0 over GM"
    Thank you so much.
    What a fantastic explanation. I'm really happy it's such a simple problem. I
    ran my program with a smaller shared heap and it works fine.
    Does the -shared-heap flag indicate the memory in total, or per node?
    
    Oliver
    
    
    On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    
    > Oliver,
    >
    >  The message you are getting is saying that the shared heap you have
    > requested is too large for our default GM support.  Use of GM requires that
    > the memory addressed remotely be "pinned" (which prevents the OS from
    > swapping it out).  Our default behaviour with GM is to try to pin the entire
    > shared heap; this configuration is known as "SEGMENT_FAST" because it
    > provides the greatest speed for remote memory access, but at the possible
    > cost of limited heap size.  The message you are getting indicates that the
    > gm_register_memory() function failed to pin the shared heap.
    >
    >  So, the simplest fix is probably to ask for a smaller shared heap if
    > possible (you can pass --shared-heap=N to upcrun without needing to
    > recompile the executable).  You should also see if reducing the setting of
    > the environment variable GASNET_PHYSMEM_PINNABLE_RATIO might help.  This
    > variable is 0.7 by default and indicates the largest fraction of physical
    > memory we'll ask GM to pin.  Of course, if you /need/ the large shared heap
    > size, then you'll need the "SEGMENT_LARGE" option described below.
    >
    >  The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE" mode to
    > allow for a larger shared heap.  However, this is accomplished by
    > dynamically pinning and unpinning portions of memory, which can lead to a
    > reduction in speed relative to SEGMENT_FAST.  To get the "LARGE" segment
    > support you will need to reconfigure Berkeley UPC with
    > "--enable-segment-large" on the configure command line, recompile and
    > reinstall Berkeley UPC and then recompile your application with the new
    > Berkeley UPC installation.
    >
    >  Note that --enable-segment-large affects all the networks, meaning that
    > the IBV support in such a build will also be switched into "LARGE" mode.
    >  So, you may want to consider keeping two separate builds of the Berkeley
    > UPC runtime ("FAST" for IBV, and "LARGE" for GM).  For MPI there is actually
    > no distinction between the two segment modes.
    >
    > -Paul
    >
    > Oliver Perks wrote:
    >
    >> This looks like it may be a very simple fix, but I honestly have no idea
    >> where to start. Sadly, Google searches turned up nothing obvious.
    >> My UPC 2.10.0 build works fine over MPI (Open MPI 1.4.1, built for GM, IBV
    >> and eth) and over IBV, but not over GM, where it crashes with the following
    >> error:
    >>
    >> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
    >> environment to generate a backtrace.
    >> *** Caught a fatal signal: SIGABRT(6) on node 3/4
    >> bash: line 1: 14500 Aborted                 /usr/bin/env
    >> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
    >> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
    >> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1 GMPI_SLAVE=10.131.56.61
    >> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon40.deepthought.hpsg.dcs.warwick.ac.uk" GASNET_GASNETRUN_GM=1
    >> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
    >> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
    >>
    >>
    >>
    >> Running with GASNET_BACKTRACE=1 adds this extra information:
    >>
    >> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
    >> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
    >> [3] [Thread debugging using libthread_db enabled]
    >> [3] [New Thread 0x403190c0 (LWP 10899)]
    >> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    >> [3] #0  0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    >> [3] #1  0x4005946c in system (
    >> [3]     cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
    >> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at
    >> ./libgm/gm_fork_system.c:227
    >> [3] #2  0x080a01ad in gasneti_bt_gdb ()
    >> [3] #3  0x080a27fb in gasneti_print_backtrace ()
    >> [3] #4  0x080a0446 in gasneti_fatalerror ()
    >> [3] #5  0x0809130d in gasnetc_attach ()
    >> [3] #6  0x08069ee3 in upcr_startup_attach ()
    >> [3] #7  0x0807de2e in bupc_init_reentrant ()
    >> [3] #8  0x08061698 in main ()
    >> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
    >> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
    >>
    >>
    >>
    >> Program compiled with `upcc -shared-heap=900 -network=gm`
    >>
    >> Regards
    >> Oliver Perks
    >>
    >> --
    >> Oliver Perks
    >> CS204 Department of Computer Science
    >> University of Warwick
    >>
    >
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    
    
    
    -- 
    Oliver Perks
    CS204 Department of Computer Science
    University of Warwick
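Paul's reply names two knobs that bound how much memory the runtime asks GM to pin in SEGMENT_FAST mode: the requested shared heap size and GASNET_PHYSMEM_PINNABLE_RATIO (0.7 by default). The Python sketch below is a deliberately simplified model of that relationship, assuming only what the thread states: the pin request never exceeds the requested heap, nor ratio × physical memory. The function name `pin_request_mb` and the 1024 MB node in the usage lines are hypothetical; the runtime's real sizing rules, and the point at which gm_register_memory() actually fails, are not reproduced here.

```python
# Simplified, hypothetical model of how the two knobs described in the thread
# bound the memory the runtime asks GM to pin in SEGMENT_FAST mode.
# This is NOT GASNet's actual segment-sizing logic.

DEFAULT_PINNABLE_RATIO = 0.7  # default of GASNET_PHYSMEM_PINNABLE_RATIO, per the thread


def pin_request_mb(shared_heap_mb: float, physmem_mb: float,
                   ratio: float = DEFAULT_PINNABLE_RATIO) -> float:
    """Return an upper bound (in MB) on the memory requested for pinning.

    The request is capped both by the shared heap the user asked for and by
    the largest fraction (`ratio`) of physical memory the runtime will ask
    GM to pin.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if shared_heap_mb < 0:
        raise ValueError("shared_heap_mb must be non-negative")
    if physmem_mb <= 0:
        raise ValueError("physmem_mb must be positive")
    return min(shared_heap_mb, physmem_mb * ratio)


if __name__ == "__main__":
    physmem = 1024.0  # hypothetical node with 1 GB of RAM
    print(pin_request_mb(900, physmem))             # capped by the default ratio
    print(pin_request_mb(900, physmem, ratio=0.5))  # lower ratio, smaller request
    print(pin_request_mb(256, physmem))             # smaller heap, smaller request
```

Either lowering the heap or lowering the ratio shrinks the request, which matches Paul's two suggestions; whether GM can then pin it depends on the node, which this model does not try to predict.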
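Paul attributes the slowdown of SEGMENT_LARGE to pinning and unpinning portions of memory on demand instead of pinning the whole shared heap once. The toy model below illustrates that cost, assuming a fixed budget of pinnable pages and least-recently-used eviction. The class name `PinCache`, the page granularity and the LRU policy are hypothetical choices for illustration, not how GASNet implements SEGMENT_LARGE, and no real memory is pinned; the counters only tally simulated operations.

```python
from collections import OrderedDict


class PinCache:
    """Toy model of on-demand pinning under a fixed pinning budget.

    Hypothetical illustration only: a page is "pinned" when first touched and
    the least-recently-used page is "unpinned" when the budget is full.
    """

    def __init__(self, budget_pages: int):
        if budget_pages <= 0:
            raise ValueError("budget_pages must be positive")
        self.budget_pages = budget_pages
        self._pinned = OrderedDict()  # page -> None, least recently used first
        self.pin_ops = 0    # simulated pin calls
        self.unpin_ops = 0  # simulated unpin calls

    def access(self, page: int) -> bool:
        """Touch `page`; return True if it was already pinned (a hit)."""
        if page in self._pinned:
            self._pinned.move_to_end(page)  # mark as most recently used
            return True
        if len(self._pinned) >= self.budget_pages:
            self._pinned.popitem(last=False)  # unpin the least recently used page
            self.unpin_ops += 1
        self._pinned[page] = None  # pin the newly touched page
        self.pin_ops += 1
        return False

    def pinned_pages(self) -> list:
        """Currently pinned pages, least recently used first."""
        return list(self._pinned)


if __name__ == "__main__":
    cache = PinCache(budget_pages=2)
    hits = [cache.access(p) for p in (0, 1, 0, 2, 1)]
    print(hits, cache.pin_ops, cache.unpin_ops, cache.pinned_pages())
```

With a budget of two pages, the access sequence 0, 1, 0, 2, 1 gives one hit, four pins and two unpins. If the budget covers every page touched (the analogue of pinning the whole heap up front, as SEGMENT_FAST does), the same sequence needs only three pins and no unpins.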
    
