From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Mar 23 2010 - 12:57:03 PDT
Yaakoub,

The UPC spec permits an implementation to determine its own maximum block size, which is available as the preprocess-time constant UPC_MAX_BLOCK_SIZE.

In the case of Berkeley UPC we have the ability to trade off blocksize limitations for thread-count limitations, by adjusting at configure time how many bits in the 64-bit "packed" representation of a shared pointer are used for each field. By default on 64-bit systems we devote 20 bits to "phase", which yields the 2^20 limit on blocksize, and 10 bits to "thread", which limits one to 1024 UPC threads. That leaves 34 bits for "addressing", which limits one to 16GB of shared heap per UPC thread. OR, one can choose to use a 128-bit struct representation which is, for most practical purposes, unlimited (2^32 threads, 2^32 max blocksize and 2^64 shared heap per thread). Unfortunately, the struct representation results in slightly lower performance.

Given the relatively large node count and memory per node, I cannot see a "good" trade-off being selected for Ranger - either one has too few thread-bits to come close to spanning the core count of Ranger, or one has too small a max blocksize to utilize the large per-core memory via large blocksized arrays. Not every UPC code/user needs all of the cores, nor large blocksize arrays, but we cannot assume that no users will ever need either. I think that the next time we build BUPC for Ranger we should consider lifting some of these limits (see below).

On the subject of "the next time we build BUPC for Ranger", the version available on Ranger via "module load beta upc" is 2.8.0, while 2.10.0 was released in Nov 2009. I think it is time (after we pass some proposal deadlines on April 6) that we look at building BUPC 2.10.0 on Ranger.
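
For illustration, the bit budget of the default 64-bit packed representation described above (20 phase bits, 10 thread bits, 34 addressing bits) can be checked with a minimal C sketch. The identifiers PHASE_BITS, THREAD_BITS, ADDR_BITS, field_limit and fits_packed are hypothetical names chosen for this example, not Berkeley UPC internals, and the actual ordering of the fields inside the pointer is not modelled.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; these are not Berkeley UPC
 * identifiers. The widths are the 64-bit defaults quoted in this message. */
enum { PHASE_BITS = 20, THREAD_BITS = 10, ADDR_BITS = 34 };

/* The three fields share one 64-bit word, so widening one narrows another. */
static_assert(PHASE_BITS + THREAD_BITS + ADDR_BITS == 64,
              "the three fields must fill the 64-bit packed pointer");

/* Number of distinct values a field of the given width can hold (bits < 64). */
static uint64_t field_limit(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* True if a run with these requirements fits within the default widths. */
static bool fits_packed(uint64_t threads, uint64_t block_size,
                        uint64_t heap_bytes)
{
    return threads    <= field_limit(THREAD_BITS)
        && block_size <= field_limit(PHASE_BITS)
        && heap_bytes <= field_limit(ADDR_BITS);
}
```

With these widths, field_limit(PHASE_BITS) is 1048576 (the limit in the reported error message), field_limit(THREAD_BITS) is 1024, and field_limit(ADDR_BITS) is 17179869184 bytes, i.e. 16GB. A call such as fits_packed(2048, 1024, 1024) returns false because the thread count alone exceeds the 10-bit field.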

AND, as part of that rebuild, we can see about addressing the max-blocksize-vs-max-threads trade-off. What I would propose is that we use our "multiconf" capability to build multiple versions of the runtime (as we do for -g vs -O) that are selected based on command line options to upcc. With just some minor config file additions one could have 4 types of builds (8 total due to debug-vs-opt) that would be selected based on the presence or absence of two flags made up for this purpose: --large-block-size and --large-thread-count (as suggestions). With neither we'd use the defaults, with only --large-block-size we'd increase the max block size at the expense of max thread count, with only --large-thread-count we'd trade off in the opposite direction, and with both passed we'd use the 128-bit struct representation to eliminate the trade-off at the expense of some performance.

Not sure who is responsible for what at TACC, so feel free to forward the suggested build idea to Victor, Bill or Jim as appropriate.

-Paul

PS I've set the Reply-To to upc-devel_at_lbl_dot_gov with the expectation that we'll be discussing the TACC Ranger builds. If instead you want to discuss UPC_MAX_BLOCK_SIZE some more, feel free to reply to upc-users_at_lbl_dot_gov instead.

Yaakoub El Khamra wrote:
>
> Greetings
> I am starting to prototype a causal sets code with UPC and I am
> running into the following error: "Maximum block size in this
> implementation is 1048576". I am using the -O -opt options and this is
> on ranger.
>
> The message is obvious and decreasing the size of the block does get
> things working again. However I am wondering if there are any
> references I can read about the block size limit. Any recommendations
> or suggestions?
>
> Regards
> Yaakoub El Khamra
>

-- 
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group                 Tel: +1-510-495-2352
HPC Research Department                   Fax: +1-510-486-6900
Lawrence Berkeley National Laboratory