From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 20 2010 - 15:40:25 PDT
Oliver,

The value is per UPC thread, and if no "K", "M" or "G" suffix is included,
the default unit is MB. The manpage or --help output from upcc will tell you:

> -shared-heap=NUM
>     Specify default amount (per UPC thread) of shared memory.
>     Defaults to megabytes: use '1GB' for 1 gigabyte. Can override
>     at startup via the UPC_SHARED_HEAP_SIZE environment variable.

And the manpage or --help output from upcrun will tell you:

> -shared-heap <sz>
>
>     Requests the given amount of shared memory (per UPC thread).
>     Units of <sz> default to megabytes; use '2GB' to request
>     2 gigabytes per thread.

-Paul

Oliver Perks wrote:
> Thank you so much.
> What a fantastic explanation. I'm really happy it's such a simple
> problem. I ran my program with a smaller shared heap and it works fine.
> Does the -shared-heap flag indicate the memory in total, or per node?
>
> Oliver
>
> On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
>
> > Oliver,
> >
> > The message you are getting is saying that the shared heap you
> > have requested is too large for our default GM support. Use of GM
> > requires that the memory addressed remotely be "pinned" (which
> > prevents the OS from swapping it out). Our default behaviour with GM
> > is to try to pin the entire shared heap. This configuration is known
> > as "SEGMENT_FAST" because it provides the greatest speed for remote
> > memory access, but at the possible cost of a limited heap size. The
> > message you are getting indicates that the gm_register_memory()
> > function failed to pin the shared heap.
> >
> > So, the simplest fix is probably to ask for a smaller shared heap
> > if possible (you can pass --shared-heap=N to upcrun without
> > needing to recompile the executable). You should also see if
> > reducing the setting of the environment variable
> > GASNET_PHYSMEM_PINNABLE_RATIO might help.
> > This variable is 0.7 by default and indicates the largest fraction
> > of physical memory we'll ask GM to pin. Of course, if you /need/ the
> > large shared heap size, then you'll need the "SEGMENT_LARGE" option,
> > described below.
> >
> > The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
> > mode to allow for a larger shared heap. However, this is
> > accomplished by dynamically pinning and unpinning portions of
> > memory, which can reduce speed relative to SEGMENT_FAST. To get the
> > "LARGE" segment support you will need to reconfigure Berkeley UPC
> > with "--enable-segment-large" on the configure command line,
> > recompile and reinstall Berkeley UPC, and then recompile your
> > application with the new Berkeley UPC installation.
> >
> > Note that --enable-segment-large affects all the networks, meaning
> > that the IBV support in such a build will also be switched into
> > "LARGE" mode. So, you may want to consider keeping two separate
> > builds of the Berkeley UPC runtime ("FAST" for IBV, and "LARGE" for
> > GM). For MPI there is actually no distinction between the two
> > segment modes.
> >
> > -Paul
> >
> > Oliver Perks wrote:
> >
> > > This looks like it may be a very simple fix, but I honestly
> > > have no idea where to start. Sadly, nothing obvious turned up in
> > > Google searches.
> > > My UPC 2.10.0 build works fine over MPI (openmpi 1.4.1, built
> > > for GM, IBV and eth) and over IBV, but not over GM: it crashes
> > > with the following error.
> > >
> > > *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> > > NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
> > > the environment to generate a backtrace.
> > > *** Caught a fatal signal: SIGABRT(6) on node 3/4
> > > bash: line 1: 14500 Aborted /usr/bin/env
> > > GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
> > > LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
> > > GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
> > > GMPI_SLAVE=10.131.56.61
> > > GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk vogon41.deepthought.hpsg.dcs.warwick.ac.uk vogon40.deepthought.hpsg.dcs.warwick.ac.uk vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> > > GASNET_GASNETRUN_GM=1
> > > UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk vogon41.deepthought.hpsg.dcs.warwick.ac.uk vogon40.deepthought.hpsg.dcs.warwick.ac.uk vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> > > /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
> > >
> > > Running with GASNET_BACKTRACE=1 adds this extra information:
> > >
> > > *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> > > [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
> > > [3] [Thread debugging using libthread_db enabled]
> > > [3] [New Thread 0x403190c0 (LWP 10899)]
> > > [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
> > > [3] #0 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
> > > [3] #1 0x4005946c in system (
> > > [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at ./libgm/gm_fork_system.c:227
> > > [3] #2 0x080a01ad in gasneti_bt_gdb ()
> > > [3] #3 0x080a27fb in gasneti_print_backtrace ()
> > > [3] #4 0x080a0446 in gasneti_fatalerror ()
> > > [3] #5 0x0809130d in gasnetc_attach ()
> > > [3] #6 0x08069ee3 in upcr_startup_attach ()
> > > [3] #7 0x0807de2e in bupc_init_reentrant ()
> > > [3] #8 0x08061698 in main ()
> > > [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
> > >
> > > Program compiled with `upcc -shared-heap=900 -network=gm`
> > >
> > > Regards
> > > Oliver Perks
> > >
> > > --
> > > Oliver Perks
> > > CS204 Department of Computer Science
> > > University of Warwick
> >
> > --
> > Paul H. Hargrove PHHargrove_at_lbl_dot_gov
> > Future Technologies Group Tel: +1-510-495-2352
> > HPC Research Department Fax: +1-510-486-6900
> > Lawrence Berkeley National Laboratory
>
> --
> Oliver Perks
> CS204 Department of Computer Science
> University of Warwick

--
Paul H. Hargrove PHHargrove_at_lbl_dot_gov
Future Technologies Group Tel: +1-510-495-2352
HPC Research Department Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory