Berkeley UPC - Unified Parallel C

(A joint project of LBNL and UC Berkeley)


UPC Task Library


  1. Download task-0.6.1.tgz
  2. Unpack the tar file (eg. tar xvzf task-0.6.1.tgz)
  3. Compile - there are Makefile examples for each test examples under the apps directory.
    When using Makefile in the apps directory, set the UPCC_FLAGS and CC variables accordingly.


Please send your comments and questions to

User Manual

A. Overview

UPC Task Library is a simple and effective way of expressing task parallelism in UPC. It is a library that helps users perform dynamic load balancing in a partitioned global address space model. This document provides a high-level API that abstracts concurrent task management details and dynamic load balancing mechanism.

A.1 Task Programming

A parallel loop and recursive divide-and-conquer program with potential load imbalance are good candidates for dynamic tasking. Transforming a parallel loop into a tasking form is easy (see section C.1 for more information). Recursive divide-and-conquer program (a.k.a. Fork/Join parallelism) can be written in blocking style. Figure 1 shows a Fibonacci example written using UPC task library. A task function, FIB, spawns two child tasks in line 11 and 12, respectively, and waits until these child tasks complete at line 13. In line 20, the main function allocates and initializes a global task queue and upc_barrier in line 21 ensures that all threads have allocated task queue before any thread uses it. In line 23, taskq_all_run executes tasks in the task queue.

  01: #include "upc.h"
  02: #include "upc_task.h"
  04: taskq_t *taskq;
  06: void FIB( int *n, int *out ) {
  07:   int n1 = *n-1;
  08:   int n2 = *n-2;
  09:   int x, y;
  10:   if (*n < 2){ *out = *n; return;}  /* termination condition */
  11:   taskq_put(taskq, FIB, &n1, &x);
  12:   taskq_put(taskq, FIB, &n2, &y);
  13:   taskq_wait(taskq);
  14:   *out = x + y;
  15: }
  17: int main(int argc, char *argv[]) {
  18:   int N, result;
  19:   N = (int) atoi(argv[1]);
  20:   taskq = taskq_all_alloc(1, FIB, sizeof(int), sizeof(int));
  21:   upc_barrier;    
  22:   if (MYTHREAD==0) taskq_put(taskq, FIB, &N, &result);  
  23:   taskq_all_run(taskq);  /* executes tasks in the taskq */
  24:   if (MYTHREAD==0) printf("Fibonacci(%d) result = %d\n", N, result);
  25:   taskq_all_free(taskq);
  26: } 

	        Figure 1. Fibonacci task example

A.2 Task and Task Function

A task is a unit of executable code, which is defined as a task function with pointers to its input and output data as arguments. A task function is declared in the user code as follows:

void task_func(void *input, void *output);

Input and output data are stored in contiguous memory and the sizes of input and output are specified when a task queue is allocated (see Section B.1). The input data is copied into the task library space when a task is created and travels with a task whenever a task migrates between threads. Data in the input and output region can be either a value or a reference. However, if such a data is a reference type, it should be a pointer to shared data because pointers to private data are meaningless when they move.

Within its function body, a task function can read its local variables, its input pointer, pointers to shared data, and file scope const variables and can write to its local variables, its output pointer, and pointers to shared data.

User programs cannot directly call task functions, instead task functions are put into the task queue and then executed via taskq_execute library function. (taskq_all_run can also execute tasks in the task queue since it will internally call taskq_execute function.)

A.3 Terms and Definitions

B. Task Library API

B.1 Task Queue Allocation and Free

taskq_t *taskq_all_alloc(int n, void *func1, int input_sz1, int output_sz1, ..., void *funcn, int input_szn, int output_szn);

void taskq_all_free(taskq_t *taskq);

B.2 Task Queue Data Operation

The functions in this section are not collective functions and the arguments of these functions are not single-valued.

int taskq_put(taskq_t *taskq, void *func, void *in, void *out);

int taskq_execute(taskq_t *taskq);

int taskq_steal(taskq_t *taskq);

int taskq_isEmpty(taskq_t *taskq);

int taskq_all_isEmpty(taskq_t *taskq);

void taskq_all_run(taskq_t *taskq);

B.3 Taskq Synchronization

The synchronization functions are not collective functions and are called within a task function.

void taskq_wait (taskq_t *taskq);

void taskq_fence (taskq_t *taskq);

B.4 Task Queue Parameter Setting Functions

Figure 2. Global task queue distributed on four UPC threads

This section describes APIs to control the behavior of a global task queue. Figure 2 shows the block diagram of a global task queue allocated on four UPC threads. Each thread manages its local task queue, which is further split into the shared region and the private region. The shared region is subject to concurrent accesses, thus operations to the shared region are serialized through a lock, but only the local thread can access the private region. From the user's point of view, the global task queue behaves as a stack because tasks are accessed from the top of the local task queue. On the other hand, on task stealing, a thief thread steals tasks from the bottom of victim thread's local task queue.

The following 4 functions are not collective functions and the arguments are not single-valued.

void taskq_set_steal_size(taskq_t *taskq, int k);

void taskq_set_chunk_size (taskq_t *taskq, int k);

void taskq_set_threshold (taskq_t *taskq, int k);

C. Task Scheduling

C.1 Independent Task Execution - Parallel Loop

	01:  taskq_t *taskq; 
	02:  shared int A[10000]; 
	03:  void foo(int *index) { // a task function 
	04:    A[*index] = /* do some work */;  
	05:  } 
	06:  int main() { 
	07:    taskq = taskq_all_alloc(1, foo, sizeof(int), 0); 
	08:    upc_forall(int i=0; i < 10000; i++; &A[i]) {   
	09:      taskq_put(taskq, foo, &i); 
	10:    } 
	11:    do { 
	12:      while(taskq_execute(taskq)) {;}   
	13:      if ( taskq_steal(taskq) ) continue; 
	14:    } while ( !taskq_all_isEmpty(taskq) );   
	15:  } 

      Figure 3. Parallel loop tasking example - independent task execution  

C.2 Dependent Task Graph Execution - Fork/Join Parallelism

C.2.1. Blocking style

Figure 4. Blocking style fork/join parallelism (the running tasks at each step are shaded)

D. Publications


This page last modified on Sunday, 22-Apr-2012 14:57:51 PDT