From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:42:46 PDT
Fantastic.
Sorry, I should have read the man page. My bad.
Thank you for all your time. It is really appreciated.
Oliver
On Tue, Apr 20, 2010 at 11:40 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
> Oliver,
>
> The value is per UPC thread, and if no "K", "M" or "G" suffix is included
> the default is units of MB.
>
> The manpage or --help output from upcc will tell you:
>
>> -shared-heap=NUM
>>         Specify default amount (per UPC thread) of shared memory.
>>         Defaults to megabytes: use '1GB' for 1 gigabyte. Can override
>>         at startup via the UPC_SHARED_HEAP_SIZE environment variable.
>>
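For concreteness, suffix handling along those lines could be sketched as follows (an illustrative sketch only, not Berkeley UPC's actual parser; the function names here are made up):

```python
# Illustrative sketch: interpret a -shared-heap value, defaulting to
# megabytes when no K/M/G suffix is given, and compute the aggregate
# shared heap across all UPC threads.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

def parse_shared_heap(value: str) -> int:
    """Return the per-thread shared heap size in bytes."""
    value = value.strip().upper().removesuffix("B")
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value) * UNITS["M"]  # bare numbers default to MB

def aggregate_heap(value: str, threads: int) -> int:
    """Total shared memory requested across all UPC threads."""
    return parse_shared_heap(value) * threads

# e.g. '900' means 900 MB per thread; with 4 threads that is 3600 MB total
```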
>
> And the manpage or --help output from upcrun will tell you:
>
>> -shared-heap <sz>
>>         Requests the given amount of shared memory (per UPC thread).
>>         Units of <sz> default to megabytes; use '2GB' to request 2
>>         gigabytes per thread.
>>
>
> -Paul
>
>
> Oliver Perks wrote:
>
>> Thank you so much.
>> What a fantastic explanation. I'm really happy it's such a simple problem.
>> I ran my program with a smaller shared heap and it works fine.
>> Does the -shared-heap flag indicate the memory in total, or per node?
>>
>> Oliver
>>
>>
>> On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove
>> <PHHargrove_at_lbl_dot_gov> wrote:
>>
>> Oliver,
>>
>> The message you are getting is saying that the shared heap you
>> have requested is too large for our default GM support. Use of GM
>> requires that the memory addressed remotely be "pinned" (prevents
>> the OS from swapping it out). Our default behaviour with GM is to
>> try to pin the entire shared heap, and this configuration is known
>> as "SEGMENT_FAST" because it provides the greatest speed for
>> remote memory access, but at the possible cost of limited heap
>> size. The message you are getting indicates that the
>> gm_register_memory() function failed to pin the shared heap.
>>
>> So, the simplest fix is probably to ask for a smaller shared heap
>> if possible (you can pass --shared-heap=N to upcrun without
>> needing to recompile the executable). You should also see if
>> reducing the setting of the environment variable
>> GASNET_PHYSMEM_PINNABLE_RATIO might help. This variable is 0.7 by
>> default and indicates the largest fraction of physical memory
>> we'll ask GM to pin. Of course if you /need/ the large shared
>> heap size, then you'll need the "SEGMENT_LARGE" option, below.
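As a back-of-envelope illustration of the arithmetic above (the run placed 2 of the 4 UPC threads on each host, per the command line below; the 2 GB of node physical memory is an assumed figure, not taken from the report):

```python
# Sketch: GASNet asks GM to pin at most GASNET_PHYSMEM_PINNABLE_RATIO
# (0.7 by default) of a node's physical memory. In SEGMENT_FAST mode the
# entire shared heap of every thread on the node must fit under that cap.

def fits_in_pinnable(per_thread_mb: float, threads_per_node: int,
                     physmem_mb: float, ratio: float = 0.7) -> bool:
    """True if the node's combined FAST segments fit under the pin limit."""
    return per_thread_mb * threads_per_node <= ratio * physmem_mb

# Hypothetical 2 GB node running 2 of the 4 UPC threads:
print(fits_in_pinnable(900, 2, 2048))   # -shared-heap=900: 1800 MB > 1433.6 MB cap
print(fits_in_pinnable(400, 2, 2048))   # a smaller heap fits under the cap
```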
>>
>> The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
>> mode to allow for a larger shared heap. However, this is
>> accomplished by dynamically pinning and unpinning portions of
>> memory, which can lead to a reduction in speed relative to
>> SEGMENT_FAST. To get the "LARGE" segment support you will need to
>> reconfigure Berkeley UPC with "--enable-segment-large" on the
>> configure command line, recompile and reinstall Berkeley UPC and
>> then recompile your application with the new Berkeley UPC
>> installation.
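Under stated assumptions (the source-tree path, install prefix, and application file name are placeholders), that rebuild amounts to something like:

```shell
# Reconfigure Berkeley UPC for SEGMENT_LARGE (paths are placeholders):
cd berkeley_upc-2.10.0
./configure --enable-segment-large --prefix=/opt/upc/berkeley-large/2.10.0
make && make install

# Then recompile the application against the new installation:
/opt/upc/berkeley-large/2.10.0/bin/upcc -shared-heap=900 -network=gm \
    floyd2-parallel.upc -o floyd2-parallel
```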
>>
>> Note that --enable-segment-large affects all the networks,
>> meaning that the IBV support in such a build will also be switched
>> into "LARGE" mode. So, you may want to consider keeping two
>> separate builds of the Berkeley UPC runtime ("FAST" for IBV, and
>> "LARGE" for GM). For MPI there is actually no distinction between
>> the two segment modes.
>>
>> -Paul
>>
>> Oliver Perks wrote:
>>
>> This looks like it may be a very simple fix but I honestly
>> have no idea where to start. Sadly nothing obvious from google
>> searches.
>> My UPC 2.10.0 build works fine over MPI (openmpi 1.4.1, built
>> for GM, IBV and eth) and over IBV, but not GM; it crashes with
>> the following error.
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
>> the environment to generate a backtrace.
>> *** Caught a fatal signal: SIGABRT(6) on node 3/4
>> bash: line 1: 14500 Aborted /usr/bin/env
>> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
>>
>> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
>> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
>> GMPI_SLAVE=10.131.56.61
>> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>> GASNET_GASNETRUN_GM=1
>> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>>
>>
>>
>> Running with GASNET_BACKTRACE=1 adds this extra information:
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
>> [3] [Thread debugging using libthread_db enabled]
>> [3] [New Thread 0x403190c0 (LWP 10899)]
>> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>> [3] #0 0x401cf3ae in __waitpid_nocancel () from
>> /lib/libpthread.so.0
>> [3] #1 0x4005946c in system (
>> [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x
>> /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel'
>> 10899") at ./libgm/gm_fork_system.c:227
>> [3] #2 0x080a01ad in gasneti_bt_gdb ()
>> [3] #3 0x080a27fb in gasneti_print_backtrace ()
>> [3] #4 0x080a0446 in gasneti_fatalerror ()
>> [3] #5 0x0809130d in gasnetc_attach ()
>> [3] #6 0x08069ee3 in upcr_startup_attach ()
>> [3] #7 0x0807de2e in bupc_init_reentrant ()
>> [3] #8 0x08061698 in main ()
>> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>>
>>
>>
>> Program compiled with `upcc -shared-heap=900 -network=gm`
>>
>> Regards
>> Oliver Perks
>>
>> -- Oliver Perks
>> CS204 Department of Computer Science
>> University of Warwick
>>
>>
>>
>> -- Paul H. Hargrove PHHargrove_at_lbl_dot_gov
>>
>> Future Technologies Group Tel: +1-510-495-2352
>> HPC Research Department Fax: +1-510-486-6900
>> Lawrence Berkeley National Laboratory
>>
>>
>>
>> --
>> Oliver Perks
>> CS204 Department of Computer Science
>> University of Warwick
>>
>
>
> --
> Paul H. Hargrove PHHargrove_at_lbl_dot_gov
> Future Technologies Group Tel: +1-510-495-2352
> HPC Research Department Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
>
--
Oliver Perks
CS204 Department of Computer Science
University of Warwick