Re: 2.10.0 over GM

From: Oliver Perks (olly.perks_at_googlemail_dot_com)
Date: Tue Apr 20 2010 - 15:42:46 PDT

  • Next message: Debabrata Midya: "Re: UPC on Windows"
    Fantastic.
    Sorry I should have read the man page. My bad.
    
    Thank you for all your time. It is really appreciated.
    
    Oliver
    
    On Tue, Apr 20, 2010 at 11:40 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    
    > Oliver,
    >
    > The value is per UPC thread, and if no "K", "M" or "G" suffix is included
    > the default is units of MB.
    >
    > The manpages or --help output from upcc will tell you:
    >
    >>       -shared-heap=NUM
    >>              Specify default amount (per UPC thread) of shared memory.
    >>              Defaults to megabytes: use '1GB' for 1 gigabyte.  Can override
    >>              at startup via the UPC_SHARED_HEAP_SIZE environment variable.
    >>
    >
    > And the manpage or --help output from upcrun will tell you:
    >
    >>       -shared-heap <sz>
    >>              Requests the given amount of shared memory (per UPC thread).
    >>              Units of <sz> default to megabytes; use '2GB' to request 2
    >>              gigabytes per thread.
    >>
    >
    > -Paul
    >
    >
    > Oliver Perks wrote:
    >
    >> Thank you so much.
    >> What a fantastic explanation. I'm really happy it's such a simple problem.
    >> I ran my program with a smaller shared heap and it works fine.
    >> Does the -shared-heap flag indicate the memory in total, or per node?
    >>
    >> Oliver
    >>
    >>
    >> On Tue, Apr 20, 2010 at 8:44 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    >>
    >>    Oliver,
    >>
    >>     The message you are getting is saying that the shared heap you
    >>    have requested is too large for our default GM support.  Use of GM
    >>    requires that the memory addressed remotely be "pinned" (prevents
    >>    the OS from swapping it out).  Our default behaviour with GM is to
    >>    try to pin the entire shared heap, and this configuration is known
    >>    as "SEGMENT_FAST" because it provides the greatest speed for
    >>    remote memory access, but at the possible cost of limited heap
    >>    size.  The message you are getting indicates that the
    >>    gm_register_memory() function failed to pin the shared heap.
    >>
    >>     So, the simplest fix is probably to ask for a smaller shared heap
    >>    if possible (you can pass --shared-heap=N to upcrun without
    >>    needing to recompile the executable).  You should also see if
    >>    reducing the setting of the environment variable
    >>    GASNET_PHYSMEM_PINNABLE_RATIO might help.  This variable is 0.7 by
    >>    default and indicates the largest fraction of physical memory
    >>    we'll ask GM to pin.  Of course if you /need/ the large shared
    >>    heap size, then you'll need the "SEGMENT_LARGE" option, below.
    >>
    >>     The GM support in Berkeley UPC can be compiled in "SEGMENT_LARGE"
    >>    mode to allow for a larger shared heap.  However, this is
    >>    accomplished by dynamically pinning and unpinning of portions of
    >>    memory, which can lead to a reduction in speed relative to
    >>    SEGMENT_FAST.  To get the "LARGE" segment support you will need to
    >>    reconfigure Berkeley UPC with "--enable-segment-large" on the
    >>    configure command line, recompile and reinstall Berkeley UPC and
    >>    then recompile your application with the new Berkeley UPC
    >>    installation.
    >>
    >>     Note that --enable-segment-large affects all the networks,
    >>    meaning that the IBV support in such a build will also be switched
    >>    into "LARGE" mode.  So, you may want to consider keeping two
    >>    separate builds of the Berkeley UPC runtime ("FAST" for IBV, and
    >>    "LARGE" for GM).  For MPI there is actually no distinction between
    >>    the two segment modes.
    >>
    >>    -Paul
    >>
    >>    Oliver Perks wrote:
    >>
    >>        This looks like it may be a very simple fix but I honestly
    >>        have no idea where to start. Sadly nothing obvious from google
    >>        searches.
    >>        My UPC 2.10.0 build works fine over MPI (openmpi 1.4.1, built
    >>        for GM, IBV and eth) and over IBV, but not over GM, where it
    >>        crashes with the following error.
    >>
    >>        *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >>        NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in
    >>        the environment to generate a backtrace.
    >>        *** Caught a fatal signal: SIGABRT(6) on node 3/4
    >>        bash: line 1: 14500 Aborted                 /usr/bin/env
    >>        GMPI_MASTER=10.131.56.61 GMPI_PORT=8000 GMPI_SHMEM=1
    >>
    >>  LD_LIBRARY_PATH=/opt/mpi/openmpi/1.4.1/gnu/lib:/opt/myrinet/2.1.30/lib/:/opt/upc/berkley/2.10.0/gnu//lib
    >>        GMPI_MAGIC=2016437 GMPI_ID=0 GMPI_NP=4 GMPI_BOARD=-1
    >>        GMPI_SLAVE=10.131.56.61
    >>        GASNET_SSH_SERVERS="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
    >>        GASNET_GASNETRUN_GM=1
    >>        UPC_NODES="vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon41.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon40.deepthought.hpsg.dcs.warwick.ac.uk
    >>        vogon40.deepthought.hpsg.dcs.warwick.ac.uk"
    >>        /home/fjp/Graphs/UPC/./floyd2-parallel '8000'
    >>
    >>
    >>
    >>        Running with GASNET_BACKTRACE=1 adds this extra information:
    >>
    >>        *** FATAL ERROR: Can't pin FAST Segment of 532.27 MB
    >>        [3] /usr/bin/gdb -nx -batch -x /tmp/gasnet_IXY5Ly
    >>        '/home/fjp/Graphs/UPC/./floyd2-parallel' 10899
    >>        [3] [Thread debugging using libthread_db enabled]
    >>        [3] [New Thread 0x403190c0 (LWP 10899)]
    >>        [3] 0x401cf3ae in __waitpid_nocancel () from /lib/libpthread.so.0
    >>        [3] #0  0x401cf3ae in __waitpid_nocancel () from
    >>        /lib/libpthread.so.0
    >>        [3] #1  0x4005946c in system (
    >>        [3]     cmd=0x8115ac0 "/usr/bin/gdb -nx -batch -x
    >>        /tmp/gasnet_IXY5Ly '/home/fjp/Graphs/UPC/./floyd2-parallel'
    >>        10899") at ./libgm/gm_fork_system.c:227
    >>        [3] #2  0x080a01ad in gasneti_bt_gdb ()
    >>        [3] #3  0x080a27fb in gasneti_print_backtrace ()
    >>        [3] #4  0x080a0446 in gasneti_fatalerror ()
    >>        [3] #5  0x0809130d in gasnetc_attach ()
    >>        [3] #6  0x08069ee3 in upcr_startup_attach ()
    >>        [3] #7  0x0807de2e in bupc_init_reentrant ()
    >>        [3] #8  0x08061698 in main ()
    >>        [0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_qB0Wxz
    >>        '/home/fjp/Graphs/UPC/./floyd2-parallel' 14630
    >>
    >>
    >>
    >>        Program compiled with `upcc -shared-heap=900 -network=gm`
    >>
    >>        Regards
    >>        Oliver Perks
    >>
    >>        --
    >>        Oliver Perks
    >>        CS204 Department of Computer Science
    >>        University of Warwick
    >>
    >>
    >>
    >>    --
    >>    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >>
    >>    Future Technologies Group                 Tel: +1-510-495-2352
    >>    HPC Research Department                   Fax: +1-510-486-6900
    >>    Lawrence Berkeley National Laboratory
    >>
    >>
    >>
    >> --
    >> Oliver Perks
    >> CS204 Department of Computer Science
    >> University of Warwick
    >>
    >
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    
    
    
    -- 
    Oliver Perks
    CS204 Department of Computer Science
    University of Warwick
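
For anyone landing on this thread later, the unit rule quoted from the man pages (per UPC thread, megabytes unless a K, M or G suffix is given) and the 0.7 default for GASNET_PHYSMEM_PINNABLE_RATIO can be turned into a rough back-of-envelope check. The sketch below is only an illustration and is not how upcc, upcrun or GASNet actually make the decision. The function names, the use of binary units (1 MB = 2**20 bytes) and the 2 GB of physical memory per node are all assumptions. The two threads per node come from the node list in the original report (two entries each for vogon40 and vogon41).

```python
import re

# Assumed binary multipliers; the thread does not say whether MB means 10**6 or 2**20.
_UNIT = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_heap_size(text):
    """Convert a -shared-heap style value to bytes; a bare number means megabytes."""
    m = re.fullmatch(r"(\d+)(?:([KMG])B?)?", text.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = m.groups()
    return int(number) * _UNIT[suffix or "M"]


def per_node_demand(heap, threads_per_node):
    """The heap value is per UPC thread, so a node needs one heap per thread it hosts."""
    return parse_heap_size(heap) * threads_per_node


def fits_pin_budget(heap, threads_per_node, physmem_bytes, ratio=0.7):
    """Rough check: does the per-node demand fit within ratio * physical memory?"""
    return per_node_demand(heap, threads_per_node) <= ratio * physmem_bytes


if __name__ == "__main__":
    physmem = 2 * 2**30  # hypothetical 2 GB node, not taken from the thread
    for heap in ("900", "256"):
        demand_mb = per_node_demand(heap, 2) // 2**20
        print(heap, "->", demand_mb, "MB per node, fits:",
              fits_pin_budget(heap, 2, physmem))
```

With these assumed numbers, -shared-heap=900 asks for 1800 MB per node and exceeds the budget, while 256 does not. The real limit depends on the node's memory and on what GM is able to register.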
    
