From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 20 2010 - 15:40:25 PDT
Oliver,
The value is per UPC thread, and if no "K", "M" or "G" suffix is
included the default is units of MB.
The manpage or --help output from upcc will tell you:
> -shared-heap=NUM
> Specify default amount (per UPC thread) of shared memory.
> Defaults to megabytes: use '1GB' for 1 gigabyte. Can override
> at startup via the UPC_SHARED_HEAP_SIZE environment variable.
And the manpage or --help output from upcrun will tell you:
> -shared-heap <sz>
>
> Requests the given amount of shared memory (per UPC thread).
> Units of <sz> default to megabytes; use '2GB' to request
> 2 gigabytes per thread.
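
To make the unit rule concrete, here is a minimal Python sketch of how a -shared-heap value could be interpreted. It is not the actual upcc/upcrun parsing code. The function names are hypothetical, and two details are assumptions of mine rather than something the help text states: that 1 GB is treated as 1024 MB, and that a trailing "B" after the suffix is optional.

```python
import re

# Multipliers expressed in megabytes. Binary units (1 GB = 1024 MB) are an
# assumption made for this sketch, not something stated in the help text.
_UNITS_IN_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

# A number, optionally followed by K, M or G and an optional trailing "B".
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(?:([KMG])B?)?\s*", re.IGNORECASE)


def shared_heap_mb(value: str) -> float:
    """Return the per-thread shared heap size in megabytes (illustrative only)."""
    match = _SIZE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = match.groups()
    # No suffix means the number is already in megabytes.
    factor = _UNITS_IN_MB[suffix.upper()] if suffix else 1
    return float(number) * factor


def total_shared_heap_mb(value: str, threads: int) -> float:
    """The setting is per UPC thread, so the job-wide total scales with threads."""
    return shared_heap_mb(value) * threads
```

Under those assumptions, shared_heap_mb("900") gives 900 MB per thread, and total_shared_heap_mb("900", 4) gives 3600 MB across a 4-thread job like the one in your report.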
-Paul
Oliver Perks wrote:
> Thank you so much.
> What a fantastic explanation. I'm really happy it's such a simple
> problem. I ran my program with a smaller shared heap and it works fine.
> Does the -shared-heap flag indicate the memory in total, or per node?
>
> Oliver
>
>
> On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove
> <PHHargrove_at_lbl_dot_gov> wrote:
>
> Oliver,
>
> The message you are getting is saying that the shared heap you
> have requested is too large for our default GM support. Use of GM
> requires that the memory addressed remotely be "pinned" (prevents
> the OS from swapping it out). Our default behaviour with GM is to
> try to pin the entire shared heap, and this configuration is known
> as "SEGMENT_FAST" because it provides the greatest speed for
> remote memory access, but at the possible cost of limited heap
> size. The message you are getting indicates that the
> gm_register_memory() function failed to pin the shared heap.
>
> So, the simplest fix is probably to ask for a smaller shared heap
> if possible (you can pass --shared-heap=N to upcrun without
> needing to recompile the executable). You should also see if
> reducing the setting of the environment variable
> GASNET_PHYSMEM_PINNABLE_RATIO might help. This variable is 0.7 by
> default and indicates the largest fraction of physical memory
> we'll ask GM to pin. Of course if you /need/ the large shared
> heap size, then you'll need the "SEGMENT_LARGE" option, below.
>
> The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
> mode to allow for a larger shared heap. However, this is
> accomplished by dynamically pinning and unpinning portions of
> memory, which can lead to a reduction in speed relative to
> SEGMENT_FAST. To get the "LARGE" segment support you will need to
> reconfigure Berkeley UPC with "--enable-segment-large" on the
> configure command line, recompile and reinstall Berkeley UPC and
> then recompile your application with the new Berkeley UPC
> installation.
>
> Note that --enable-segment-large affects all the networks,
> meaning that the IBV support in such a build will also be switched
> into "LARGE" mode. So, you may want to consider keeping two
> separate builds of the Berkeley UPC runtime ("FAST" for IBV, and
> "LARGE" for GM). For MPI there is actually no distinction between
> the two segment modes.
>
> -Paul
>
> Oliver Perks wrote:
>
> This looks like it may be a very simple fix, but I honestly
> have no idea where to start. Sadly, Google searches turned up
> nothing obvious.
> My UPC 2.10.0 build works fine over MPI (Open MPI 1.4.1, built
> for GM, IBV and eth) and over IBV, but not over GM, where it
> crashes with the following error.
>
> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
> the environment to generate a backtrace.
> *** Caught a fatal signal: SIGABRT(6) on node 3/4
> bash: line 1: 14500 Aborted /usr/bin/env
> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
> GMPI_SLAVE=10.131.56.61
> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> GASNET_GASNETRUN_GM=1
> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>
>
>
> Running with GASNET_BACKTRACE=1 adds this extra information:
>
> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
> [3] [Thread debugging using libthread_db enabled]
> [3] [New Thread 0x403190c0 (LWP 10899)]
> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
> [3] #0 0x401cf3ae in __waitpid_nocancel () from
> /lib/libpthread.so.0
> [3] #1 0x4005946c in system (
> [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x
> /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel'
> 10899") at ./libgm/gm_fork_system.c:227
> [3] #2 0x080a01ad in gasneti_bt_gdb ()
> [3] #3 0x080a27fb in gasneti_print_backtrace ()
> [3] #4 0x080a0446 in gasneti_fatalerror ()
> [3] #5 0x0809130d in gasnetc_attach ()
> [3] #6 0x08069ee3 in upcr_startup_attach ()
> [3] #7 0x0807de2e in bupc_init_reentrant ()
> [3] #8 0x08061698 in main ()
> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>
>
>
> Program compiled with `upcc -shared-heap=900 -network=gm`
>
> Regards
> Oliver Perks
>
> --
> Oliver Perks
> CS204 Department of Computer Science
> University of Warwick
>
>
>
> --
> Paul H. Hargrove PHHargrove_at_lbl_dot_gov
> Future Technologies Group Tel: +1-510-495-2352
> HPC Research Department Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory