From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:25:59 PDT
Thank you so much. What a fantastic explanation. I'm really happy it's such a simple problem.

I ran my program with a smaller shared heap and it works fine.

Does the -shared-heap flag indicate the memory in total, or per node?

Oliver

On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:

> Oliver,
>
> The message you are getting is saying that the shared heap you have
> requested is too large for our default GM support. Use of GM requires that
> the memory addressed remotely be "pinned" (prevents the OS from swapping it
> out). Our default behaviour with GM is to try to pin the entire shared
> heap, and this configuration is known as "SEGMENT_FAST" because it provides
> the greatest speed for remote memory access, but at the possible cost of
> limited heap size. The message you are getting indicates that
> gm_register_memory() function failed to pin the shared heap.
>
> So, the simplest fix is probably to ask for a smaller shared heap if
> possible (you can pass --shared-heap=N to upcrun without needing to
> recompile the executable). You should also see if reducing the setting of
> the environment variable GASNET_PHYSMEM_PINNABLE_RATIO might help. This
> variable is 0.7 by default and indicates the largest fraction of physical
> memory we'll ask GM to pin. Of course if you /need/ the large shared heap
> size, then you'll need the "SEGMENT_LARGE" option, below.
>
> The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE" mode to
> allow for a larger shared heap. However, this is accomplished by
> dynamically pinning and unpinning of portions of memory, which can lead to a
> reduction in speed relative to SEGMENT_FAST. To get the "LARGE" segment
> support you will need to reconfigure Berkeley UPC with
> "--enable-segment-large" on the configure command line, recompile and
> reinstall Berkeley UPC and then recompile your application with the new
> Berkeley UPC installation.
>
> Note that --enable-segment-large affects all the networks, meaning that
> the IBV support in such a build will also be switched into "LARGE" mode.
> So, you may want to consider keeping two separate builds of the Berkeley
> UPC runtime ("FAST" for IBV, and "LARGE" for GM). For MPI there is actually
> no distinction between the two segment modes.
>
> -Paul
>
> Oliver Perks wrote:
>
>> This looks like it may be a very simple fix but I honestly have no idea
>> where to start. Sadly nothing obvious from google searches.
>> My UPC 2.10.0 build works fine over MPI( openmpi 1.4.1 - built for GM, IBV
>> and eth), IBV but not GM, it crashes with the following error.
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the
>> environment to generate a backtrace.
>> *** Caught a fatal signal: SIGABRT(6) on node 3/4
>> bash: line 1: 14500 Aborted /usr/bin/env
>> GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
>> LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
>> GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1 GMPI_SLAVE=10.131.56.61
>> GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk" GASNET_GASNETRUN_GM=1
>> UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>> vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>> /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>>
>> GASNET_BACKTRACE=1 Adds this extra information
>>
>> *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>> [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
>> [3] [Thread debugging using libthread_db enabled]
>> [3] [New Thread 0x403190c0 (LWP 10899)]
>> [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>> [3] #0 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>> [3] #1 0x4005946c in system (
>> [3] cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at
>> ./libgm/gm_fork_system.c:227
>> [3] #2 0x080a01ad in gasneti_bt_gdb ()
>> [3] #3 0x080a27fb in gasneti_print_backtrace ()
>> [3] #4 0x080a0446 in gasneti_fatalerror ()
>> [3] #5 0x0809130d in gasnetc_attach ()
>> [3] #6 0x08069ee3 in upcr_startup_attach ()
>> [3] #7 0x0807de2e in bupc_init_reentrant ()
>> [3] #8 0x08061698 in main ()
>> [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
>> '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>>
>> Program compiled with `upcc -shared-heap=900 -network=gm`
>>
>> Regards
>> Oliver Perks
>>
>> --
>> Oliver Perks
>> CS204 Department of Computer Science
>> University of Warwick
>>
>
> --
> Paul H. Hargrove PHHargrove_at_lbl_dot_gov
> Future Technologies Group Tel: +1-510-495-2352
> HPC Research Department Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory
>

--
Oliver Perks
CS204 Department of Computer Science
University of Warwick
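
For anyone reading this thread later, here is a rough Python sketch of the arithmetic behind Paul's explanation. It is not GASNet's actual logic. It only models one plausible reading: that the segment the runtime tries to pin is the requested heap, capped at the pinnable fraction of physical memory shared among the processes on a node. The 0.7 default comes from Paul's message; the function name, the 2048 MB node size, the two processes per node, and the per-process split are all assumptions made for illustration.

```python
# Simplified, hypothetical model of the pinning ceiling discussed in the thread.
# The real GASNet/GM behaviour may differ; treat every number here as made up
# except the 0.7 default, which is quoted from Paul's message.

DEFAULT_PINNABLE_RATIO = 0.7  # default of GASNET_PHYSMEM_PINNABLE_RATIO (per the thread)


def attempted_segment_mb(requested_mb, physmem_mb, procs_per_node,
                         ratio=DEFAULT_PINNABLE_RATIO):
    """Return the segment size (MB) a process would try to pin in this model.

    Assumption: the pinnable budget (ratio * physical memory) is split
    evenly among the processes on a node, and the request is capped to it.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be at least 1")
    per_process_ceiling = physmem_mb * ratio / procs_per_node
    return min(requested_mb, per_process_ceiling)


if __name__ == "__main__":
    physmem_mb = 2048      # hypothetical node memory
    procs_per_node = 2     # hypothetical process placement
    for requested in (900, 256):
        size = attempted_segment_mb(requested, physmem_mb, procs_per_node)
        print(f"requested {requested} MB -> would try to pin {size:.1f} MB")
```

In this model, lowering the ratio or the requested heap both shrink the amount the runtime tries to pin, which matches the two suggestions in Paul's reply. Whether a given size can actually be pinned still depends on the GM driver and the node, which the sketch does not attempt to capture.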