Re: 2.10.0 over GM

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 20 2010 - 15:40:25 PDT

  • Next message: Oliver Perks: "Re: 2.10.0 over GM"
    Oliver,
    
    The value is per UPC thread, and if no "K", "M" or "G" suffix is
    included, the units default to MB.
    
    The manpage or --help output from upcc will tell you:
    >        -shared-heap=NUM
    >               Specify default amount (per UPC thread) of shared memory.
    >               Defaults to megabytes: use '1GB' for 1 gigabyte.  Can override
    >               at startup via the UPC_SHARED_HEAP_SIZE environment variable.
    
    And the manpage or --help output from upcrun will tell you:
    
    >        -shared-heap <sz>
    >
    >               Requests the given amount of shared memory (per UPC thread).
    >               Units of <sz> default to megabytes; use '2GB' to request 2
    >               gigabytes per thread.
    
    -Paul
    
    
    Oliver Perks wrote:
    > Thank you so much.
    > What a fantastic explanation. I'm really happy it's such a simple 
    > problem. I ran my program with a smaller shared heap and it works fine.
    > Does the -shared-heap flag indicate the memory in total, or per node?
    >
    > Oliver
    >
    >
    > On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    >
    >     Oliver,
    >
    >      The message you are getting is saying that the shared heap you
    >     have requested is too large for our default GM support.  Use of GM
    >     requires that the memory addressed remotely be "pinned" (prevents
    >     the OS from swapping it out).  Our default behaviour with GM is to
    >     try to pin the entire shared heap, and this configuration is known
    >     as "SEGMENT_FAST" because it provides the greatest speed for
    >     remote memory access, but at the possible cost of limited heap
    >     size.  The message you are getting indicates that the
    >     gm_register_memory() function failed to pin the shared heap.
    >
    >      So, the simplest fix is probably to ask for a smaller shared heap
    >     if possible (you can pass --shared-heap=N to upcrun without
    >     needing to recompile the executable).  You should also see if
    >     reducing the setting of the environment variable
    >     GASNET_PHYSMEM_PINNABLE_RATIO might help.  This variable is 0.7 by
    >     default and indicates the largest fraction of physical memory
    >     we'll ask GM to pin.  Of course if you /need/ the large shared
    >     heap size, then you'll need the "SEGMENT_LARGE" option, below.
    >
    >      The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
    >     mode to allow for a larger shared heap.  However, this is
    >     accomplished by dynamically pinning and unpinning portions of
    >     memory, which can lead to a reduction in speed relative to
    >     SEGMENT_FAST.  To get the "LARGE" segment support you will need to
    >     reconfigure Berkeley UPC with "--enable-segment-large" on the
    >     configure command line, recompile and reinstall Berkeley UPC and
    >     then recompile your application with the new Berkeley UPC
    >     installation.
    >
    >      Note that --enable-segment-large affects all the networks,
    >     meaning that the IBV support in such a build will also be switched
    >     into "LARGE" mode.  So, you may want to consider keeping two
    >     separate builds of the Berkeley UPC runtime ("FAST" for IBV, and
    >     "LARGE" for GM).  For MPI there is actually no distinction between
    >     the two segment modes.
    >
    >     -Paul
    >
    >     Oliver Perks wrote:
    >
    >         This looks like it may be a very simple fix, but I honestly
    >         have no idea where to start. Sadly, nothing obvious turned up
    >         in Google searches.
    >         My UPC 2.10.0 build works fine over MPI (Open MPI 1.4.1, built
    >         for GM, IBV and eth) and over IBV, but not over GM, where it
    >         crashes with the following error:
    >
    >         *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >         NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
    >         the environment to generate a backtrace.
    >         *** Caught a fatal signal: SIGABRT(6) on node 3/4
    >         bash: line 1: 14500 Aborted                 /usr/bin/env
    >         GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
    >         LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
    >         GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
    >         GMPI_SLAVE=10.131.56.61
    >         GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
    >         GASNET_GASNETRUN_GM=1
    >         UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >         vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
    >         /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
    >
    >
    >
    >         Running with GASNET_BACKTRACE=1 adds this extra information:
    >
    >         *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >         [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
    >         [3] [Thread debugging using libthread_db enabled]
    >         [3] [New Thread 0x403190c0 (LWP 10899)]
    >         [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    >         [3] #0  0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    >         [3] #1  0x4005946c in system (
    >         [3]     cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at ./libgm/gm_fork_system.c:227
    >         [3] #2  0x080a01ad in gasneti_bt_gdb ()
    >         [3] #3  0x080a27fb in gasneti_print_backtrace ()
    >         [3] #4  0x080a0446 in gasneti_fatalerror ()
    >         [3] #5  0x0809130d in gasnetc_attach ()
    >         [3] #6  0x08069ee3 in upcr_startup_attach ()
    >         [3] #7  0x0807de2e in bupc_init_reentrant ()
    >         [3] #8  0x08061698 in main ()
    >         [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
    >
    >
    >
    >         Program compiled with `upcc -shared-heap=900 -network=gm`
    >
    >         Regards
    >         Oliver Perks
    >
    >         -- 
    >         Oliver Perks
    >         CS204 Department of Computer Science
    >         University of Warwick
    >
    >
    >
    >     -- 
    >     Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >     Future Technologies Group                 Tel: +1-510-495-2352
    >     HPC Research Department                   Fax: +1-510-486-6900
    >     Lawrence Berkeley National Laboratory    
    >
    >
    >
    >
    > -- 
    > Oliver Perks
    > CS204 Department of Computer Science
    > University of Warwick
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    
