From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:42:46 PDT
Fantastic. Sorry, I should have read the man page. My bad. Thank you for all your time. It is really appreciated.

Oliver

On Tue, Apr 20, 2010 at 11:40 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
> Oliver,
>
> The value is per UPC thread, and if no "K", "M" or "G" suffix is included
> the default is units of MB.
>
> The manpage or --help output from upcc will tell you:
>
>>  -shared-heap=NUM   Specify default amount (per UPC thread) of shared memory.
>>                     Defaults to megabytes: use '1GB' for 1 gigabyte. Can override
>>                     at startup via the UPC_SHARED_HEAP_SIZE environment variable.
>
> And the manpage or --help output from upcrun will tell you:
>
>>  -shared-heap <sz>  Requests the given amount of shared memory (per UPC thread).
>>                     Units of <sz> default to megabytes; use '2GB' to request 2
>>                     gigabytes per thread.
>
> -Paul
>
> Oliver Perks wrote:
>> Thank you so much.
>> What a fantastic explanation. I'm really happy it's such a simple problem.
>> I ran my program with a smaller shared heap and it works fine.
>> Does the -shared-heap flag indicate the memory in total, or per node?
>>
>> Oliver
>>
>> On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
>>
>>   Oliver,
>>
>>   The message you are getting is saying that the shared heap you have
>>   requested is too large for our default GM support. Use of GM requires
>>   that the memory addressed remotely be "pinned" (which prevents the OS
>>   from swapping it out). Our default behaviour with GM is to try to pin
>>   the entire shared heap; this configuration is known as "SEGMENT_FAST"
>>   because it provides the greatest speed for remote memory access, but
>>   at the possible cost of a limited heap size. The message you are
>>   getting indicates that the gm_register_memory() function failed to
>>   pin the shared heap.
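[Editor's note: Paul's description boils down to three equivalent ways to set the per-thread heap. A quick sketch; the flag names and the UPC_SHARED_HEAP_SIZE variable come from the manpage excerpts quoted in this thread, while the sizes, thread count, and source file name are illustrative only:]

```shell
# Per-thread shared heap; a bare number means megabytes.

# 1) Bake a 512 MB/thread default in at compile time:
upcc -shared-heap=512 -network=gm -o floyd2-parallel floyd2-parallel.upc

# 2) Override at launch without recompiling ('1GB' = 1 gigabyte per thread):
upcrun -shared-heap 1GB -n 4 ./floyd2-parallel

# 3) Or override via the environment variable instead of a flag:
env UPC_SHARED_HEAP_SIZE=256 upcrun -n 4 ./floyd2-parallel
```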
>>
>>   So, the simplest fix is probably to ask for a smaller shared heap if
>>   possible (you can pass --shared-heap=N to upcrun without needing to
>>   recompile the executable). You should also see if reducing the
>>   setting of the environment variable GASNET_PHYSMEM_PINNABLE_RATIO
>>   might help. This variable is 0.7 by default and indicates the largest
>>   fraction of physical memory we'll ask GM to pin. Of course, if you
>>   /need/ the large shared heap size, then you'll need the
>>   "SEGMENT_LARGE" option, below.
>>
>>   The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
>>   mode to allow for a larger shared heap. However, this is accomplished
>>   by dynamically pinning and unpinning portions of memory, which can
>>   lead to a reduction in speed relative to SEGMENT_FAST. To get the
>>   "LARGE" segment support you will need to reconfigure Berkeley UPC
>>   with "--enable-segment-large" on the configure command line,
>>   recompile and reinstall Berkeley UPC, and then recompile your
>>   application with the new Berkeley UPC installation.
>>
>>   Note that --enable-segment-large affects all the networks, meaning
>>   that the IBV support in such a build will also be switched into
>>   "LARGE" mode. So, you may want to consider keeping two separate
>>   builds of the Berkeley UPC runtime ("FAST" for IBV, and "LARGE" for
>>   GM). For MPI there is actually no distinction between the two
>>   segment modes.
>>
>>   -Paul
>>
>>   Oliver Perks wrote:
>>
>>     This looks like it may be a very simple fix, but I honestly have no
>>     idea where to start; sadly, nothing obvious turns up in Google
>>     searches. My UPC 2.10.0 build works fine over MPI (openmpi 1.4.1,
>>     built for GM, IBV and eth) and over IBV, but not over GM, where it
>>     crashes with the following error:
>>
>>     *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>>     NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
>>     the environment to generate a backtrace.
>>     *** Caught a fatal signal: SIGABRT(6) on node 3/4
>>     bash: line 1: 14500 Aborted    /usr/bin/env
>>       GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
>>       LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
>>       GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
>>       GMPI_SLAVE=10.131.56.61
>>       GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>>       GASNET_GASNETRUN_GM=1
>>       UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon41.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon40.deepthought.hpsg.dcs.warwick.ac.uk
>>       vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
>>       /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
>>
>>     Running with GASNET_BACKTRACE=1 adds this extra information:
>>
>>     *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
>>     [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
>>     [3] [Thread debugging using libthread_db enabled]
>>     [3] [New Thread 0x403190c0 (LWP 10899)]
>>     [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>>     [3] #0  0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
>>     [3] #1  0x4005946c in system (
>>     [3]     cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899") at ./libgm/gm_fork_system.c:227
>>     [3] #2  0x080a01ad in gasneti_bt_gdb ()
>>     [3] #3  0x080a27fb in gasneti_print_backtrace ()
>>     [3] #4  0x080a0446 in gasneti_fatalerror ()
>>     [3] #5  0x0809130d in gasnetc_attach ()
>>     [3] #6  0x08069ee3 in upcr_startup_attach ()
>>     [3] #7  0x0807de2e in bupc_init_reentrant ()
>>     [3] #8  0x08061698 in main ()
>>     [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
>>
>>     The program was compiled with `upcc -shared-heap=900 -network=gm`.
>>
>>     Regards,
>>     Oliver Perks
>>
>>     --
>>     Oliver Perks
>>     CS204 Department of Computer Science
>>     University of Warwick
>>
>>   --
>>   Paul H. Hargrove                   PHHargrove_at_lbl_dot_gov
>>   Future Technologies Group          Tel: +1-510-495-2352
>>   HPC Research Department            Fax: +1-510-486-6900
>>   Lawrence Berkeley National Laboratory
>>
>> --
>> Oliver Perks
>> CS204 Department of Computer Science
>> University of Warwick

> --
> Paul H. Hargrove                   PHHargrove_at_lbl_dot_gov
> Future Technologies Group          Tel: +1-510-495-2352
> HPC Research Department            Fax: +1-510-486-6900
> Lawrence Berkeley National Laboratory

--
Oliver Perks
CS204 Department of Computer Science
University of Warwick