From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Apr 20 2010 - 12:44:48 PDT
Oliver,

The message you are getting says that the shared heap you have requested is too large for our default GM support. Use of GM requires that any memory addressed remotely be "pinned" (which prevents the OS from swapping it out). Our default behaviour with GM is to try to pin the entire shared heap. This configuration is known as "SEGMENT_FAST" because it provides the greatest speed for remote memory access, but at the possible cost of limited heap size. The message you are getting indicates that the gm_register_memory() function failed to pin the shared heap.

So, the simplest fix is probably to ask for a smaller shared heap if possible (you can pass --shared-heap=N to upcrun without needing to recompile the executable). You should also see whether reducing the setting of the environment variable GASNET_PHYSMEM_PINNABLE_RATIO helps. This variable is 0.7 by default and indicates the largest fraction of physical memory we'll ask GM to pin. Of course, if you /need/ the large shared heap size, then you'll need the "SEGMENT_LARGE" option, below.

The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE" mode to allow for a larger shared heap. However, this is accomplished by dynamically pinning and unpinning portions of memory, which can lead to a reduction in speed relative to SEGMENT_FAST. To get the "LARGE" segment support you will need to:

  1. reconfigure Berkeley UPC with "--enable-segment-large" on the configure command line,
  2. recompile and reinstall Berkeley UPC, and
  3. recompile your application with the new Berkeley UPC installation.

Note that --enable-segment-large affects all the networks, meaning that the IBV support in such a build will also be switched into "LARGE" mode. So, you may want to consider keeping two separate builds of the Berkeley UPC runtime ("FAST" for IBV, and "LARGE" for GM). For MPI there is actually no distinction between the two segment modes.
-Paul

Oliver Perks wrote:
> This looks like it may be a very simple fix but I honestly have no
> idea where to start. Sadly nothing obvious from google searches.
> My UPC 2.10.0 build works fine over MPI( openmpi 1.4.1 - built for GM,
> IBV and eth), IBV but not GM, it crashes with the following error.
>
> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
> environment to generate a backtrace.
> *** Caught a fatal signal: SIGABRT(6) on node 3/4
> bash: line 1: 14500 Aborted /usr/bin/env
> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
> GMPI_SLAVE=10.131.56.61
> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> GASNET_GASNETRUN_GM=1
> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>
>
> GASNET_BACKTRACE=1 Adds this extra information
>
> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
> [3] [Thread debugging using libthread_db enabled]
> [3] [New Thread 0x403190c0 (LWP 10899)]
> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
> [3] #0 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
> [3] #1 0x4005946c in system (
> [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at
> ./libgm/gm_fork_system.c:227
> [3] #2 0x080a01ad in gasneti_bt_gdb ()
> [3] #3 0x080a27fb in gasneti_print_backtrace ()
> [3] #4 0x080a0446 in gasneti_fatalerror ()
> [3] #5 0x0809130d in gasnetc_attach ()
> [3] #6 0x08069ee3 in upcr_startup_attach ()
> [3] #7 0x0807de2e in bupc_init_reentrant ()
> [3] #8 0x08061698 in main ()
> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>
>
>
> Program compiled with `upcc -shared-heap=900 -network=gm`
>
> Regards
> Oliver Perks
>
> --
> Oliver Perks
> CS204 Department of Computer Science
> University of Warwick

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory
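
The arithmetic behind the GASNET_PHYSMEM_PINNABLE_RATIO advice in the reply can be sketched roughly as follows. This is only a back-of-the-envelope estimate, not the logic GASNet actually runs: the function names are made up for illustration, the 2048 MB node size is hypothetical (the thread does not state how much RAM the nodes have), and the idea that every process on a node pins its own heap is an assumption. Only the 0.7 default and the two-processes-per-host layout come from the messages above.

```python
# Rough estimate only; the real GASNet/GM pinning logic is more involved.

# Default value of GASNET_PHYSMEM_PINNABLE_RATIO, as stated in the reply.
DEFAULT_PINNABLE_RATIO = 0.7


def max_pinnable_mb(phys_mem_mb, ratio=DEFAULT_PINNABLE_RATIO):
    """Upper bound (in MB) on what the runtime would ask GM to pin on a node."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return phys_mem_mb * ratio


def heap_within_budget(heap_mb, procs_per_node, phys_mem_mb,
                       ratio=DEFAULT_PINNABLE_RATIO):
    """Check whether the per-process heaps on one node fit under the bound.

    ASSUMPTION: each process on a node pins its own shared heap, so the
    demand on the node is heap_mb * procs_per_node.
    """
    return heap_mb * procs_per_node <= max_pinnable_mb(phys_mem_mb, ratio)


if __name__ == "__main__":
    # Hypothetical node: 2048 MB of RAM, 2 UPC processes per host
    # (the host list in the quoted job names each machine twice).
    phys_mb = 2048
    for heap_mb in (900, 500):
        ok = heap_within_budget(heap_mb, procs_per_node=2, phys_mem_mb=phys_mb)
        print(f"heap={heap_mb} MB per process: within ratio bound = {ok}")
```

Passing this check does not mean the pin will succeed. The ratio only caps what the runtime asks for, and GM or the OS may refuse a smaller request, which is why the reply suggests lowering both the heap size and the ratio.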