From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:25:59 PDT
Thank you so much.
What a fantastic explanation. I'm really happy it's such a simple problem. I
ran my program with a smaller shared heap and it works fine.
Does the -shared-heap flag indicate the memory in total, or per node?
Oliver
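
For anyone finding this thread later: the run-time workaround from the reply quoted below (a smaller heap passed to upcrun, optionally with a lower GASNET_PHYSMEM_PINNABLE_RATIO) might look roughly like the sketch below. The values 256, 0.5 and 4 are assumptions chosen for illustration, the flag spelling is copied from the reply, and -n is upcrun's thread-count option as far as I know. The script only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Assumed values for illustration only; pick ones that suit your nodes.
HEAP_MB=256      # smaller than the 900 requested at compile time
PIN_RATIO=0.5    # below the 0.7 default mentioned in the reply
THREADS=4        # matches the 4-process run shown in the error log

# Cap the fraction of physical memory that GASNet will ask GM to pin.
export GASNET_PHYSMEM_PINNABLE_RATIO="$PIN_RATIO"

# Build the launch command; no recompile is needed for a new heap size.
RUN_CMD="upcrun --shared-heap=${HEAP_MB} -n ${THREADS} ./floyd2-parallel 8000"

# Print the command rather than executing it.
echo "$RUN_CMD"
```

If the pin still fails, lowering HEAP_MB further is probably the first thing to try before changing the ratio.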
On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
> Oliver,
>
> The message you are getting is saying that the shared heap you have
> requested is too large for our default GM support. Use of GM requires that
> the memory addressed remotely be "pinned" (prevents the OS from swapping it
> out). Our default behaviour with GM is to try to pin the entire shared
> heap, and this configuration is known as "SEGMENT_FAST" because it provides
> the greatest speed for remote memory access, but at the possible cost of
> limited heap size. The message you are getting indicates that the
> gm_register_memory() function failed to pin the shared heap.
>
> So, the simplest fix is probably to ask for a smaller shared heap if
> possible (you can pass --shared-heap=N to upcrun without needing to
> recompile the executable). You should also see if reducing the setting of
> the environment variable GASNET_PHYSMEM_PINNABLE_RATIO might help. This
> variable is 0.7 by default and indicates the largest fraction of physical
> memory we'll ask GM to pin. Of course if you /need/ the large shared heap
> size, then you'll need the "SEGMENT_LARGE" option, below.
>
> The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE" mode to
> allow for a larger shared heap. However, this is accomplished by
> dynamically pinning and unpinning portions of memory, which can lead to a
> reduction in speed relative to SEGMENT_FAST. To get the "LARGE" segment
> support you will need to reconfigure Berkeley UPC with
> "--enable-segment-large" on the configure command line, recompile and
> reinstall Berkeley UPC and then recompile your application with the new
> Berkeley UPC installation.
>
> Note that --enable-segment-large affects all the networks, meaning that
> the IBV support in such a build will also be switched into "LARGE" mode.
> So, you may want to consider keeping two separate builds of the Berkeley
> UPC runtime ("FAST" for IBV, and "LARGE" for GM). For MPI there is actually
> no distinction between the two segment modes.
>
> -Paul
>
> Oliver Perks wrote:
>
>> This looks like it may be a very simple fix but I honestly have no idea
>> where to start. Sadly, Google searches turned up nothing obvious.
>> My UPC 2.10.0 build works fine over MPI (openmpi 1.4.1, built for GM, IBV
>> and eth) and over IBV, but not over GM, where it crashes with the following error.
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>> environment to generate a backtrace.
>> *** Caught a fatal signal: SIGABRT(6) on node 3/4
>> bash: line 1: 14500 Aborted /usr/bin/env
>> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
>> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
>> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1 GMPI_SLAVE=10.131.56.61
>> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk" GASNET_GASNETRUN_GM=1
>> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>>
>>
>>
>> Setting GASNET_BACKTRACE=1 adds this extra information:
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
>> [3] [Thread debugging using libthread_db enabled]
>> [3] [New Thread 0x403190c0 (LWP 10899)]
>> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>> [3] #0 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>> [3] #1 0x4005946c in system (
>> [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at
>> ./libgm/gm_fork_system.c:227
>> [3] #2 0x080a01ad in gasneti_bt_gdb ()
>> [3] #3 0x080a27fb in gasneti_print_backtrace ()
>> [3] #4 0x080a0446 in gasneti_fatalerror ()
>> [3] #5 0x0809130d in gasnetc_attach ()
>> [3] #6 0x08069ee3 in upcr_startup_attach ()
>> [3] #7 0x0807de2e in bupc_init_reentrant ()
>> [3] #8 0x08061698 in main ()
>> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>>
>>
>>
>> Program compiled with `upcc -shared-heap=900 -network=gm`
>>
>> Regards
>> Oliver Perks
>>
>> --
>> Oliver Perks
>> CS204 Department of Computer Science
>> University of Warwick
>>
>
>
> --
> Paul H. Hargrove PHHargrove_at_lbl_dot_gov
> Future Technologies Group Tel: +1-510-495-2352
> HPC Research Department Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
>
--
Oliver Perks
CS204 Department of Computer Science
University of Warwick
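
The two-build arrangement suggested in the quoted reply (a default "FAST" build for IBV and a "LARGE" build for GM) could be organised along these lines. This is only a sketch: the install prefixes are hypothetical, --prefix is the usual configure option rather than anything named in the thread, and the pick_upcc helper is invented here for illustration. Each configure line would be followed by the normal build and install steps in its own copy of the source tree.

```shell
#!/bin/sh
# Hypothetical install prefixes; nothing in the thread names these paths.
FAST_PREFIX="$HOME/bupc-fast"
LARGE_PREFIX="$HOME/bupc-large"

# Default configuration (SEGMENT_FAST), intended here for IBV runs.
FAST_CONFIGURE="./configure --prefix=${FAST_PREFIX}"

# SEGMENT_LARGE configuration, intended here for GM runs with a big heap.
LARGE_CONFIGURE="./configure --prefix=${LARGE_PREFIX} --enable-segment-large"

# Hypothetical helper: choose the upcc from the build matching the network.
pick_upcc() {
    case "$1" in
        gm) echo "${LARGE_PREFIX}/bin/upcc" ;;
        *)  echo "${FAST_PREFIX}/bin/upcc" ;;
    esac
}

# Print the commands for review instead of running them.
echo "$FAST_CONFIGURE"
echo "$LARGE_CONFIGURE"
echo "$(pick_upcc gm) -network=gm -shared-heap=900"
```

Keeping the builds under separate prefixes means the application has to be recompiled with the matching upcc whenever the target network changes, which is the recompile step the reply mentions.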