Re: Question regarding blocksize

From: Yaakoub El Khamra (yye00_at_tacc.utexas.edu)
Date: Tue Mar 23 2010 - 13:55:12 PDT

  • Next message: Junchao Zhang: "Will UPC caches remote data?"
    Thank you very much for the detailed explanation. The UPC_MAX_BLOCK_SIZE is
    not a show-stopping limitation at this point. It is quite reassuring that
    it can be increased. For the moment, more work needs to be done on the code
    before we start running production cases, and I am afraid I will bug you
    then regarding the Ranger installation.
    
    Regards
    Yaakoub El Khamra
    
    
    
    On Tue, Mar 23, 2010 at 1:57 PM, Paul H. Hargrove <PHHargrove_at_lbl_dot_gov> wrote:
    
    > Yaakoub,
    >
    > The UPC spec permits an implementation to determine its own maximum
    > block size, which is available as the preprocessor constant
    > UPC_MAX_BLOCK_SIZE.
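
A minimal sketch of how a program might guard against this limit at compile time. The name MY_BLOCK_SIZE and the helper functions are hypothetical, introduced only for illustration. The fallback #define is an assumption that lets the sketch build with a plain C compiler; a UPC compiler predefines UPC_MAX_BLOCK_SIZE itself.

```c
#include <assert.h>

/* A UPC compiler predefines UPC_MAX_BLOCK_SIZE. The fallback below is an
   assumption for this sketch only, so that it also builds with a plain C
   compiler; 1048576 is the limit reported in the error message quoted in
   this thread. */
#ifndef UPC_MAX_BLOCK_SIZE
#define UPC_MAX_BLOCK_SIZE 1048576
#endif

/* Hypothetical block size requested by the application. In UPC it would
   appear in a declaration such as:
       shared [MY_BLOCK_SIZE] double a[MY_BLOCK_SIZE * THREADS];        */
#define MY_BLOCK_SIZE 4096

/* Fail the build early instead of waiting for the compiler's own error. */
#if MY_BLOCK_SIZE > UPC_MAX_BLOCK_SIZE
#error "MY_BLOCK_SIZE exceeds UPC_MAX_BLOCK_SIZE for this implementation"
#endif

/* Returns 1 when a candidate block size fits the implementation limit. */
static int block_size_ok(unsigned long long n)
{
    return n >= 1 && n <= UPC_MAX_BLOCK_SIZE;
}

/* Run-time variant of the same check, for sizes computed from input. */
static unsigned long long checked_block_size(unsigned long long n)
{
    assert(block_size_ok(n));
    return n;
}
```

The #if form works because the limit is a preprocessor constant; the run-time helper covers block sizes that are only known when a problem size is read in.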
    >
    > In the case of Berkeley UPC we have the ability to trade off blocksize
    > limitations for thread-count limitations, by adjusting at configure time
    > how many bits in the 64-bit "packed" representation of a shared pointer
    > are used for each field.  By default on 64-bit systems we devote 20
    > bits to "phase", which yields the 2^20 limit on blocksize, and 10 bits
    > for "thread", which limits one to 1024 UPC threads.  We also have 34
    > bits left for "addressing", which limits one to 16GB of shared heap per
    > UPC thread.  OR, one can choose to use a 128-bit struct representation
    > which is, for most practical purposes, unlimited (2^32 threads, 2^32 max
    > blocksize and 2^64 shared heap per thread).  Unfortunately, the struct
    > representation results in slightly lower performance.
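
The arithmetic behind these limits can be sketched in plain C. The field widths (20/10/34) are the defaults quoted above. The field order, the function names, and the packing scheme are assumptions made for illustration and are not taken from the Berkeley UPC sources.

```c
#include <assert.h>
#include <stdint.h>

/* Default widths quoted in the message above; they sum to 64 bits. */
enum { PHASE_BITS = 20, THREAD_BITS = 10, ADDR_BITS = 34 };

/* Number of distinct values a field of `bits` bits can hold: 2^bits. */
static uint64_t field_limit(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Mask selecting the low `bits` bits. */
static uint64_t field_mask(unsigned bits)
{
    return field_limit(bits) - 1;
}

/* Pack as [ phase | thread | addr ] with addr in the low bits.
   This ordering is hypothetical, chosen only for the sketch. */
static uint64_t sptr_pack(uint64_t phase, uint64_t thread, uint64_t addr)
{
    assert(phase  < field_limit(PHASE_BITS));
    assert(thread < field_limit(THREAD_BITS));
    assert(addr   < field_limit(ADDR_BITS));
    return (phase << (THREAD_BITS + ADDR_BITS)) | (thread << ADDR_BITS) | addr;
}

static uint64_t sptr_phase(uint64_t p)
{
    return p >> (THREAD_BITS + ADDR_BITS);
}

static uint64_t sptr_thread(uint64_t p)
{
    return (p >> ADDR_BITS) & field_mask(THREAD_BITS);
}

static uint64_t sptr_addr(uint64_t p)
{
    return p & field_mask(ADDR_BITS);
}
```

With these widths, field_limit gives 2^20 = 1048576 for phase, 2^10 = 1024 for thread, and 2^34 bytes = 16 GB for addressing. Since the three widths must sum to 64, any bit given to one field has to be taken from another, which is the trade-off described above.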
    >
    > Given the relatively large node count and memory per node, I cannot see
    > a "good" trade-off being selected for Ranger - either one has too few
    > thread-bits to come close to spanning the core count of Ranger, or one
    > has too small a max blocksize to utilize the large per-core memory via
    > large blocksized arrays.  Not every UPC code/user needs all of the
    > cores, nor large blocksize arrays, but we cannot assume that no users
    > will ever need either.  I think that the next time we build BUPC for
    > Ranger we should consider lifting some of these limits (see below).
    >
    > On the subject of "the next time we build BUPC for Ranger", the version
    > available on Ranger via "module load beta upc" is 2.8.0, while 2.10.0
    > was released in Nov 2009.  I think it is time (after we pass some
    > proposal deadlines on April 6) that we look at building BUPC 2.10.0 on
    > Ranger.  AND we can see about addressing the
    > max-blocksize-vs-max-threads trade-off.  What I would propose is that we can use
    > our "multiconf" capability to build multiple versions of the runtime (as
    > we do for -g vs -O) that are selected based on command line options to
    > upcc.  With just some minor config file additions one could have 4 types
    > of builds (8 total due to debug-vs-opt) that would be selected from
    > based on the presence or absence of two flags made up for this purpose:
    > --large-block-size and --large-thread-count (as suggestions).  With
    > neither we'd use the defaults, with only --large-block-size we'd
    > increase the max block size at the expense of max thread count, with
    > only --large-thread-count we'd trade-off in the opposite direction, and
    > with both passed we'd use the 128-bit struct representation to eliminate
    > the trade off at the expense of some performance.
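
The selection logic proposed here could look roughly like the following C sketch. The two flag names are the suggestions from the message above and are not existing upcc options. The enum, the function names, and the argument scan are hypothetical and do not describe how upcc or multiconf actually work.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four runtime variants described above. Each would also exist in
   debug and optimized form, giving eight builds in total. */
typedef enum {
    SPTR_DEFAULT,       /* neither flag: default packed layout          */
    SPTR_LARGE_BLOCK,   /* larger max block size, fewer threads         */
    SPTR_LARGE_THREAD,  /* more threads, smaller max block size         */
    SPTR_STRUCT128      /* both flags: 128-bit struct representation    */
} sptr_variant;

/* Map the presence or absence of the two flags to a variant. */
static sptr_variant select_variant(bool large_block, bool large_thread)
{
    if (large_block && large_thread) return SPTR_STRUCT128;
    if (large_block)                 return SPTR_LARGE_BLOCK;
    if (large_thread)                return SPTR_LARGE_THREAD;
    return SPTR_DEFAULT;
}

/* Scan a command line for the two suggested flags (argv[0] is skipped). */
static sptr_variant variant_from_args(int argc, const char *const *argv)
{
    bool large_block = false, large_thread = false;
    assert(argc == 0 || argv != NULL);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--large-block-size") == 0)   large_block = true;
        if (strcmp(argv[i], "--large-thread-count") == 0) large_thread = true;
    }
    return select_variant(large_block, large_thread);
}
```

Only the combination of both flags selects the struct representation, so users who need just one of the larger limits keep the faster packed pointers.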
    >
    > Not sure who is responsible for what at TACC, so feel free to forward
    > the suggested build idea to Victor, Bill or Jim as appropriate.
    >
    > -Paul
    >
    > PS
    > I've set the Reply-To to upc-devel_at_lbl_dot_gov with the expectation that
    > we'll be discussing the TACC Ranger builds.
    > If instead you want to discuss UPC_MAX_BLOCK_SIZE some more, feel free
    > to reply to upc-users_at_lbl_dot_gov instead.
    >
    >
    > Yaakoub El Khamra wrote:
    > >
    > > Greetings
    > > I am starting to prototype a causal sets code with UPC and I am
    > > running into the following error: "Maximum block size in this
    > > implementation is 1048576". I am using the -O -opt options and this is
    > > on ranger.
    > >
    > > The message is obvious and decreasing the size of the block does get
    > > things working again. However I am wondering if there are any
    > > references I can read about the block size limit. Any recommendations
    > > or suggestions?
    > >
    > > Regards
    > > Yaakoub El Khamra
    > >
    >
    >
    > --
    > Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    > Future Technologies Group                 Tel: +1-510-495-2352
    > HPC Research Department                   Fax: +1-510-486-6900
    > Lawrence Berkeley National Laboratory
    >
    >
    
