Fine-grained HPCChallenge RandomAccess (GUPS) in UPC

This directory contains three fine-grained implementations of HPCChallenge
RandomAccess (GUPS) in UPC.

All three versions use fine-grained put/get for performing updates on remote
table entries, and consequently on distributed-memory platforms they are NOT
expected to be competitive with tuned implementations of GUPS that explicitly
coalesce communication and perform target-side updates. They are provided
solely as algorithmic examples of fine-grained communication in UPC, NOT as
the best possible or even recommended implementation of HPCChallenge
RandomAccess.

The three versions:

* guppie.upc - puts/gets on table entries are performed using language-level
  shared array accesses (a minimal sketch of each version appears after this
  list).  This version was originally written for the Cray T3E, whose unique
  network hardware and custom UPC compiler (with some special flags) allowed
  it to compile to an executable that exposed some communication overlap at
  runtime.  UPC compilers on modern systems are likely to compile this
  version to an executable that uses fine-grained blocking puts and gets of
  shared memory, with no communication overlap in the case of remote data.

* guppie-async.upc - this version performs fine-grained table updates using
  the explicitly non-blocking transfer operations upc_mem{put,get}_nbi,
  introduced as an optional library in UPC spec v1.3 (see the second sketch
  below).  This explicitly exposes communication-communication overlap among
  the individual fine-grained gets and puts performed in each chunk of
  updates.  The data access pattern is otherwise unchanged from guppie.upc.

* guppie-async-pipeline.upc - this version goes a step further, additionally
  using software pipelining to schedule the asynchronous communication, in
  order to overlap computation with communication and reduce stalls waiting
  for asynchronous communication completion (see the third sketch below).

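For concreteness, here is a minimal sketch of the language-level approach.
This is not the actual guppie.upc source: TABSIZE, ran64(), and the
initialization are illustrative stand-ins, and the real benchmark uses the
HPCC random stream rather than this toy generator.

    /* Sketch: fine-grained GUPS updates via shared array accesses.
       Assumes a static-THREADS compilation (fixed table size). */
    #include <upc.h>
    #include <stdint.h>
    #include <stddef.h>

    #define TABSIZE (1UL << 20)        /* table entries; a power of two */

    shared uint64_t Table[TABSIZE];    /* default layout: cyclic over threads */

    /* toy LCG standing in for the HPCC random stream */
    static uint64_t ran64(uint64_t *s) {
      *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
      return *s;
    }

    int main(void) {
      uint64_t seed = 1 + MYTHREAD;

      upc_forall (size_t i = 0; i < TABSIZE; i++; &Table[i])
        Table[i] = i;                  /* each thread initializes its entries */
      upc_barrier;

      for (uint64_t u = 0; u < TABSIZE / THREADS; u++) {
        uint64_t r = ran64(&seed);
        /* read-modify-write of a (possibly remote) entry: typically
           compiled to a blocking fine-grained get followed by a put */
        Table[r & (TABSIZE - 1)] ^= r;
      }
      upc_barrier;
      return 0;
    }
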
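A corresponding sketch of the guppie-async.upc idea, using the
implicit-handle operations from the optional <upc_nb.h> library; CHUNK and
the other names are again hypothetical, and (as in RandomAccess itself) the
rare conflicting updates within a chunk are simply tolerated.

    /* Sketch: batch CHUNK updates, overlapping the gets with each other
       and then the puts with each other, via implicit handles. */
    #include <upc.h>
    #include <upc_nb.h>   /* optional non-blocking library, UPC spec v1.3 */
    #include <stdint.h>
    #include <stddef.h>

    #define TABSIZE (1UL << 20)
    #define CHUNK   64

    shared uint64_t Table[TABSIZE];

    static uint64_t ran64(uint64_t *s) {
      *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
      return *s;
    }

    int main(void) {
      uint64_t seed = 1 + MYTHREAD;
      uint64_t r[CHUNK], v[CHUNK];

      upc_forall (size_t i = 0; i < TABSIZE; i++; &Table[i])
        Table[i] = i;
      upc_barrier;

      for (uint64_t u = 0; u < TABSIZE / THREADS; u += CHUNK) {
        for (int j = 0; j < CHUNK; j++) {   /* issue CHUNK overlapped gets */
          r[j] = ran64(&seed);
          upc_memget_nbi(&v[j], &Table[r[j] & (TABSIZE - 1)],
                         sizeof(uint64_t));
        }
        upc_synci();                        /* complete all implicit-handle gets */
        for (int j = 0; j < CHUNK; j++) {   /* issue CHUNK overlapped puts */
          v[j] ^= r[j];
          upc_memput_nbi(&Table[r[j] & (TABSIZE - 1)], &v[j],
                         sizeof(uint64_t));
        }
        upc_synci();                        /* puts must finish before v[] reuse */
      }
      upc_barrier;
      return 0;
    }
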
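Finally, a sketch of the software-pipelining idea behind
guppie-async-pipeline.upc.  To let one chunk's gets drain while the next
chunk's are already in flight, this sketch uses the explicit-handle
upc_memget_nb/upc_sync operations from the same library; the actual file may
be structured differently, and as before occasional conflicting updates are
tolerated.

    /* Sketch: double-buffered pipeline -- while chunk c is drained and
       updated, the gets for chunk c+1 are already in flight. */
    #include <upc.h>
    #include <upc_nb.h>
    #include <stdint.h>
    #include <stddef.h>

    #define TABSIZE (1UL << 20)
    #define CHUNK   64

    shared uint64_t Table[TABSIZE];

    static uint64_t ran64(uint64_t *s) {
      *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
      return *s;
    }

    /* issue the explicit-handle gets for one chunk */
    static void issue_gets(uint64_t *seed, uint64_t r[CHUNK],
                           uint64_t v[CHUNK], upc_handle_t h[CHUNK]) {
      for (int j = 0; j < CHUNK; j++) {
        r[j] = ran64(seed);
        h[j] = upc_memget_nb(&v[j], &Table[r[j] & (TABSIZE - 1)],
                             sizeof(uint64_t));
      }
    }

    int main(void) {
      uint64_t seed = 1 + MYTHREAD;
      uint64_t r[2][CHUNK], v[2][CHUNK];
      upc_handle_t h[2][CHUNK];
      uint64_t nchunks = (TABSIZE / THREADS) / CHUNK;

      upc_forall (size_t i = 0; i < TABSIZE; i++; &Table[i])
        Table[i] = i;
      upc_barrier;

      if (nchunks) issue_gets(&seed, r[0], v[0], h[0]);  /* prologue */
      for (uint64_t c = 0; c < nchunks; c++) {
        int cur = (int)(c & 1), nxt = cur ^ 1;
        if (c + 1 < nchunks) {
          upc_synci();            /* older puts must finish before their
                                     source buffer v[nxt] is overwritten */
          issue_gets(&seed, r[nxt], v[nxt], h[nxt]);
        }
        for (int j = 0; j < CHUNK; j++) {
          upc_sync(h[cur][j]);    /* drain one get, update, put it back */
          v[cur][j] ^= r[cur][j];
          upc_memput_nbi(&Table[r[cur][j] & (TABSIZE - 1)],
                         &v[cur][j], sizeof(uint64_t));
        }
      }
      upc_synci();                /* complete all outstanding puts */
      upc_barrier;
      return 0;
    }

Note that in this sketch the upc_synci() before each new batch of gets is
what bounds the outstanding puts to roughly one chunk's worth, which is the
point of the double buffering.
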
Currently all versions also use a naive, serial verification step, which won't 
scale to large thread counts.