Re: Question regarding blocksize

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Mar 23 2010 - 12:57:03 PDT

    The UPC spec permits an implementation to determine its own maximum
    block size, which is available as the preprocess-time constant
    UPC_MAX_BLOCK_SIZE.

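    As a minimal sketch of how code can adapt to this limit at compile time,
    the C fragment below checks a desired blocksize against
    UPC_MAX_BLOCK_SIZE with the preprocessor.  DESIRED_BLOCK and BLOCK are
    hypothetical names.  The fallback value of 1048576 (2^20, the Berkeley
    UPC default on 64-bit systems discussed next) is an assumption that only
    exists so the fragment also builds as plain C; a UPC compiler predefines
    the real constant.

```c
#include <assert.h>

/* A UPC compiler predefines UPC_MAX_BLOCK_SIZE.  This fallback is an
   ASSUMPTION so the sketch compiles as plain C (2^20 = Berkeley UPC's
   default on 64-bit systems). */
#ifndef UPC_MAX_BLOCK_SIZE
#define UPC_MAX_BLOCK_SIZE 1048576
#endif

/* Hypothetical application parameter: elements per block we would like. */
#define DESIRED_BLOCK 4000000

/* Clamp to the implementation's maximum instead of failing to compile. */
#if DESIRED_BLOCK > UPC_MAX_BLOCK_SIZE
#define BLOCK UPC_MAX_BLOCK_SIZE
#else
#define BLOCK DESIRED_BLOCK
#endif

/* C11 compile-time check that the chosen layout is legal. */
static_assert(BLOCK <= UPC_MAX_BLOCK_SIZE,
              "layout qualifier exceeds UPC_MAX_BLOCK_SIZE");

/* In UPC one would then declare, e.g.:
       shared [BLOCK] double data[BLOCK * THREADS];  */
```

    Clamping silently shrinks the layout; a code that truly depends on the
    larger block would rather stop with #error, so which behavior is
    preferable depends on the application.
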
    In the case of Berkeley UPC we have the ability to trade off blocksize 
    limitations for thread-count limitations, by adjusting at configure time 
    how many bits in the 64-bit "packed" representation of a shared pointer 
    are used for each field.  By default on 64-bit systems we devote 20
    bits to "phase", which yields the 2^20 limit on blocksize, and 10 bits
    to "thread", which limits runs to 1024 UPC threads.  That leaves 34
    bits for "addressing", which limits one to 16GB of shared heap per
    UPC thread.  OR, one can choose to use a 128-bit struct representation,
    which is, for most practical purposes, unlimited (2^32 threads, 2^32 max
    blocksize and 2^64 bytes of shared heap per thread).  Unfortunately, the
    struct representation results in slightly lower performance.

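    To make the arithmetic concrete, here is a small C sketch of packing the
    three fields into one 64-bit word with the default 20/10/34 split.  The
    field order (phase in the high bits, then thread, then address) and all
    type and helper names are assumptions for illustration, not Berkeley
    UPC's actual internal layout or API.

```c
#include <assert.h>
#include <stdint.h>

/* Default split on 64-bit systems, as described above: 20 + 10 + 34 = 64. */
#define PHASE_BITS  20u
#define THREAD_BITS 10u
#define ADDR_BITS   34u

/* ASSUMED field order: [phase | thread | addr], addr in the low bits. */
#define ADDR_SHIFT   0u
#define THREAD_SHIFT ADDR_BITS
#define PHASE_SHIFT  (ADDR_BITS + THREAD_BITS)

#define FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1u)

/* The limits that follow directly from the field widths. */
#define MAX_BLOCK_SIZE (UINT64_C(1) << PHASE_BITS)   /* 1048576 elements  */
#define MAX_THREADS    (UINT64_C(1) << THREAD_BITS)  /* 1024 UPC threads  */
#define MAX_HEAP_BYTES (UINT64_C(1) << ADDR_BITS)    /* 16 GiB per thread */

typedef uint64_t packed_sptr; /* hypothetical name */

static packed_sptr sptr_pack(uint64_t phase, uint64_t thread, uint64_t addr) {
    /* Each value must fit in its field, otherwise it would be truncated. */
    assert(phase  <= FIELD_MASK(PHASE_BITS));
    assert(thread <= FIELD_MASK(THREAD_BITS));
    assert(addr   <= FIELD_MASK(ADDR_BITS));
    return (phase << PHASE_SHIFT) | (thread << THREAD_SHIFT) | (addr << ADDR_SHIFT);
}

static uint64_t sptr_phase(packed_sptr p)  { return (p >> PHASE_SHIFT)  & FIELD_MASK(PHASE_BITS); }
static uint64_t sptr_thread(packed_sptr p) { return (p >> THREAD_SHIFT) & FIELD_MASK(THREAD_BITS); }
static uint64_t sptr_addr(packed_sptr p)   { return (p >> ADDR_SHIFT)   & FIELD_MASK(ADDR_BITS); }

/* The 128-bit alternative: 2^32 threads, 2^32 blocksize, 2^64-byte heap. */
typedef struct {
    uint64_t addr;
    uint32_t thread;
    uint32_t phase;
} struct_sptr; /* hypothetical name and field order */
```

    Every field is already full, so widening one (say, thread) can only be
    done by narrowing another (phase or address), or by switching to the
    wider struct.
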
    Given Ranger's relatively large node count and memory per node, I cannot
    see a "good" trade-off being selected for it: either one has too few
    thread bits to come close to spanning Ranger's core count, or one has
    too small a max blocksize to utilize the large per-core memory via
    large-blocksize arrays.  Not every UPC code/user needs all of the
    cores, nor large-blocksize arrays, but we cannot assume that no users
    will ever need either.  I think that the next time we build BUPC for
    Ranger we should consider lifting some of these limits (see below).

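    The trade-off can be quantified with a few lines of C.  Holding the 34
    address bits fixed leaves 30 bits to share between phase and thread; the
    sketch below (function names hypothetical) reserves just enough thread
    bits for a requested thread count and reports the largest blocksize the
    remaining phase bits allow.

```c
#include <assert.h>
#include <stdint.h>

/* Fewest bits that can index n distinct values, e.g. 1024 -> 10, 1025 -> 11. */
static unsigned bits_for(uint64_t n) {
    assert(n >= 1);
    unsigned b = 0;
    while (b < 64 && (UINT64_C(1) << b) < n)
        b++;
    return b;
}

/* Max blocksize left when `budget_bits` are split between phase and thread
   and enough thread bits are reserved for `threads`; 0 means the thread
   count cannot be represented at all. */
static uint64_t max_block_for_threads(unsigned budget_bits, uint64_t threads) {
    assert(budget_bits < 64);
    unsigned tbits = bits_for(threads);
    if (tbits > budget_bits)
        return 0;
    return UINT64_C(1) << (budget_bits - tbits);
}
```

    With a 30-bit budget and 1024 threads this reproduces the default 2^20
    limit; a run needing 65536 threads would be left with a max blocksize
    of only 16384.
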
    On the subject of "the next time we build BUPC for Ranger", the version 
    available on Ranger via "module load beta upc" is 2.8.0, while 2.10.0 
    was released in Nov 2009.  I think it is time (after we pass some 
    proposal deadlines on April 6) that we look at building BUPC 2.10.0 on 
    Ranger.  AND we can see about addressing the
    max-blocksize-vs-max-threads trade-off.  What I would propose is that we
    use our "multiconf" capability to build multiple versions of the runtime
    (as we do for -g vs -O) that are selected based on command-line options
    to upcc.  With just some minor config file additions one could have 4
    types of builds (8 total due to debug-vs-opt) that would be selected
    based on the presence or absence of two flags made up for this purpose:
    --large-block-size and --large-thread-count (as suggestions).  With
    neither, we'd use the defaults; with only --large-block-size, we'd
    increase the max block size at the expense of max thread count; with
    only --large-thread-count, we'd trade off in the opposite direction; and
    with both passed, we'd use the 128-bit struct representation to
    eliminate the trade-off at the expense of some performance.

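    The proposed flag-to-build mapping is simple enough to sketch in C.  The
    two flag names are the suggestions above; the variant names and the two
    non-default packed splits (24/6/34 and 14/16/34) are purely hypothetical
    placeholders, since the actual widths would be chosen at configure time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One runtime build variant: field widths of the shared-pointer representation. */
typedef struct {
    const char *name;        /* hypothetical variant name */
    unsigned phase_bits, thread_bits, addr_bits;
    bool struct_repr;        /* true: 128-bit struct, false: 64-bit packed */
} sptr_variant;

static sptr_variant select_variant(bool large_block, bool large_threads) {
    sptr_variant v;
    if (large_block && large_threads)
        v = (sptr_variant){"struct-128", 32, 32, 64, true};
    else if (large_block)
        v = (sptr_variant){"packed-large-block", 24, 6, 34, false};    /* hypothetical split */
    else if (large_threads)
        v = (sptr_variant){"packed-large-threads", 14, 16, 34, false}; /* hypothetical split */
    else
        v = (sptr_variant){"packed-default", 20, 10, 34, false};
    /* The fields must exactly fill the representation. */
    assert(v.phase_bits + v.thread_bits + v.addr_bits == (v.struct_repr ? 128u : 64u));
    return v;
}

/* Scan upcc-style arguments for the two proposed flags. */
static sptr_variant select_from_args(int argc, char **argv) {
    bool large_block = false, large_threads = false;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--large-block-size") == 0)
            large_block = true;
        else if (strcmp(argv[i], "--large-thread-count") == 0)
            large_threads = true;
    }
    return select_variant(large_block, large_threads);
}
```

    Each of the four variants would then exist in a debug and an optimized
    flavor, giving the 8 builds mentioned above.
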
    Not sure who is responsible for what at TACC, so feel free to forward
    the suggested build idea to Victor, Bill or Jim as appropriate.

    I've set the Reply-To to upc-devel_at_lbl_dot_gov with the expectation
    that we'll be discussing the TACC Ranger builds.  If you would rather
    discuss UPC_MAX_BLOCK_SIZE some more, feel free to reply to
    upc-users_at_lbl_dot_gov instead.

    Yaakoub El Khamra wrote:
    > Greetings
    > I am starting to prototype a causal sets code with UPC and I am 
    > running into the following error: "Maximum block size in this 
    > implementation is 1048576". I am using the -O -opt options and this is 
    > on ranger.
    > The message is obvious and decreasing the size of the block does get 
    > things working again. However I am wondering if there are any 
    > references I can read about the block size limit. Any recommendations 
    > or suggestions?
    > Regards
    > Yaakoub El Khamra

    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
