2007-10-30 Berkeley UPC release 2.6.0
  - Multiconf build manager is now enabled by default, providing easy access to multiple UPCR configs (e.g. debug and opt) from the upcc command line
  - lapi-conduit now uses RDMA support on LAPI/Federation systems, when available, to improve communication performance
  - Data movement collectives now use a scalable, high-performance implementation
  - Add upcc -extern-main flag for programs with main() in non-UPC code
  - Suppress harmless warnings caused by a gcc 4.2 optimizer bug
  - Document workarounds for a gcc 4.x optimizer bug that can, in rare cases, affect the correctness of shared-local accesses

2007-09-13 Berkeley UPC release 2.5.10 (Cray XT only beta release)
  - Upgrade portals network support to be fully native
  - Add pthreads support on compute-node Linux
  - Value collectives v1.2: minor usability upgrades
  - Fully inline local put/get operations for GCCUPC+UPCR
  - GCCUPC+UPCR now requires GCCUPC v4.x or newer

2007-02-01 Berkeley UPC release 2.5.8 (Cray XT-3 only beta release)
  - Add native support for OpenIB networks via new 'ibv' network
  - Port the runtime to: CrayXT/Linux, SunC/Linux, OpenBSD/x86
  - Add multiconf config manager, allowing upcc options to select the appropriate install
  - Fix a bug with handling of multiple trans_extra files
  - Add upcrun options for backtracing and freezing
  - Add upcrun options for argument and environment encoding for buggy spawners
  - Add new test harness options to select groups of tests based on filters
  - Add a valgrind warning suppression file: gasnet.supp
  - Fix the following notable bugs in 2.4.0 (see http://upc-bugs.lbl.gov for details):
    - bug1853: compiler mismatch warnings for GCC/UPC

2006-11-02 Berkeley UPC release 2.4.0 (and 2.3.16 BETA 1)
  - Add initial native support for the Cray XT3 via new 'portals' network
  - Implement the GASP 1.5 performance instrumentation interface, supporting the Parallel Performance Wizard (PPW) and other third-party profiling tools
  - Add bupc_ticks_to_ns() - finer granularity timer query
  - Add the Berkeley implementations of the UPC collectives and UPC-IO to GCCUPC+UPCR
  - Add most of the Berkeley UPC library extensions to GCCUPC+UPCR
  - Add upcdecl command-line tool (also online at: http://upc.lbl.gov/upcdecl)
  - Add support for alloca() and stdarg.h
  - Performance improvements to the BUPC semaphore library for signalling store
  - Add bupc_thread_distance() - runtime thread layout query for hierarchical systems
  - Add a remote fetch-and-add UPC library extension (initially just for 64-bit ints)
  - Allow configure-time tuning of bit distribution in packed pointer-to-shared rep
  - Fix the following notable bugs in 2.2.2 (see http://upc-bugs.lbl.gov for details):
    - bug525: optimizer crashes on Tru64/CompaqC for libgasnet
    - bug1229: more robust preprocessing on Compaq C
    - bug1389: ANSI-aliasing violations on small local put/get copies
    - bug1531: improved lock fairness to remote lock requests
    - bug1594: timer inaccuracies on Cray X1E
    - bug1645: preprocess-time failure 'Backslash found where operator expected'
    - bug1657: PACKAGE_* symbols exposed to UPC code on GCCUPC+UPCR
    - bug1683: improve upcrun handling of -shared-heap-max
    - bug1743: more robust behavior when backend C compiler changes
  - Improved SRV-based DNS failover for upcc HTTP translation
  - Add gzip compression to HTTP netcompile, for faster compiles over slow links
  - Improved robustness for SSH netcompile to handle stray output from dotfiles
  - Numerous misc minor bug fixes

2006-03-15 Berkeley UPC release 2.2.3 (Cray XT-3 only bug-fix release)
  - Workaround for gcc 3.2.3 optimizer hang when compiling UPC code on XT-3
  - Fix GCCUPC+UPCR specific bug - broken initialization of static shared data

2006-03-07 Berkeley UPC release 2.2.2
  - Port translator to new platforms: MacOSX/PPC32, Linux/PPC64 and AIX/PPC32
  - Port runtime to: MacOSX/x86, MacOSX/PPC64 and Cray XD1
  - Translator build improvements: auto platform detection and install target
  - Numerous translator optimizer improvements
  - Fix the following bugs in 2.2.1 (see http://upc-bugs.lbl.gov for details):
    - bug990: runtime failures on PPC/Linux with XLC
    - bug1316: AMMPI workaround for bug in IBM MPI
    - bug1300: upcc tweaks to support OSX translator (auto-set shared lib paths)
    - bug1324: string.h compilation errors on RHEL4/x86-64
    - bug1327: intermittent ref-collectives crashes on ppc/xlc
    - bug1337: workaround pathscale optimizer bug breaking libupcr barriers
    - bug1358: initialization failure finding an mmap segment
    - bug1367: vapi-conduit under-utilizes physical memory
    - bug1375: update handling of mpi-incompatible conduit configs
    - bug1378: miscompilation w/ icc-9.0.027 on ia64
    - bug1392: mysterious non-collective exits from vapi-conduit
    - bug1443: compile errors when using runtimes with non-canonical install paths
    - bug1452: bad codegen for embedded struct alignment exceptions on PowerPC
    - bug1475: broken rand() behavior with pthreads on Cygwin
    - bug1490: link failures on HP C
    - bug1493: barrier mismatch failures on AIX/Power5 with pthreads
    - intermittent exit crashes when profiling with pthreads
    - intermittent gmon.out loss when profiling with multiple nodes
    - intermittent crash on realloc in debug mode
  - Improve handling of asm statements in system headers
  - Robustness improvements to header-wrapper infrastructure
  - Add auto-retry and DNS failover to upcc HTTP translation
  - Expand gcc_as_cc to improve robustness of processing for Sun C and PGI
  - Many fixes to Cray XT3 port
  - Numerous misc minor bug fixes

2005-10-20 Berkeley UPC release 2.2.1 (no translator release)
  - Fix the following bugs in 2.2 (see http://upc-bugs.lbl.gov for details):
    - bug569: upcc -version leaks temp files
    - bug1185: startup crash on Altix running Propack4 (davinci)
    - bug1226-7: gasnet-trace improvements to deal with bad tracemask values
    - bug1247: -translator flag broken for ssh netcompile
    - bug1261: stdio.h broken on Mac OSX 10.4
    - bug1262: potential memory corruption on upc_all_alloc
    - bug1263: upcrun broken for non-uniform pthread layouts
    - bug1270: limited number of shared globals per file
    - bug1287: fix blocksize units on upc_all_{fread,fwrite}_shared()
    - bug1297: incorrect behavior for puts to remote shared 'float' variables
    - other misc minor bug fixes

2005-08-27 Berkeley UPC release 2.2 (and 2.1.18, 2.2 BETA 2)
  - Fully compliant with the UPC 1.2 language specification, including support for UPC collectives and the optional UPC-IO interface.
  - Berkeley UPC programs can now be debugged with Totalview on x86 over MPI or Quadrics.
  - UPC-to-C translator now supported on Linux x86, Itanium and Opteron systems, as well as Tru64/Alpha.
  - Experimental support for faster 'symmetric' shared pointers (with blocksize == 1 or indefinite blocksize) on smp-pthreads and SHMEM.
  - Experimental support for optimizations at the UPC language level within our UPC-to-C translator: use 'upcc -opt ...' to enable.
  - Improve the performance of local shared accesses.
  - Substantial performance improvements to the upc_lock library.
  - New bupc_ptradd library extension enables pointer-to-shared arithmetic with a variable (non-constant) blocksize.
  - New bupc_tick_t library extensions expose cycle-granularity wall-clock timers to UPC code.
  - Performance improvements to bupc_mem{put,get,cpy}_async.
  - Added prototype implementation of proposed UPC semaphore library.
  - bupc_collectivev.h provides a convenience wrapper that adds simple-to-use value-based collectives.
  - Add library extensions for printing to the communication trace and for controlling communication tracing.
  - Myrinet/GM-based Berkeley UPC programs can now interoperate with MPI.
  - Many GASNet performance/functionality improvements, including improved barrier performance on many platforms, and optimized collective operations. See 'gasnet/ChangeLog' for details.
  - Cross-compilation support for the Cray X-1.
  - Experimental support for the Cray XT3 and IBM Blue Gene/L (contact us for details).
  - Improve the flexibility of max shared heap size.
  - Add automatic malloc heap debug checking in --enable-debug mode.
  - Add automatic cache alignment for large shared heap objects.
  - UPC-to-C translator now passes '#pragma' and 'restrict' in user UPC code through to the back-end C compiler.
  - Improve processor affinity of shared heap objects in pthreaded configs.
  - New upcc -pg option embeds gprof sequential profiling information, if supported by the C compiler.
  - Improved upcc heuristic detection of C/UPC header language mode.
  - Expand upcrun -i to show more useful program information.
  - Improved creation and error checking for program stacks under pthreads.
  - Removed the need for users to hard-code a maximum per-node size for shared memory at configure time.
  - Improve the performance and functionality of upc_trace.
  - Many, many bug fixes, for both language constructs and platform portability. See http://upc-bugs.lbl.gov for complete details.

2004-12-06 Berkeley UPC release 2.1.0 (2.2 BETA 1)
  - IMPORTANT: this is a BETA release, and may not be as stable as our official releases.
  - Added a reference implementation of the UPC I/O spec. This implementation is not designed for high performance, but it is believed to be complete and stable.
  - The SHMEM network API is now supported, at least on SGI Altix and Cray X1 systems.
  - Support for Quadrics' Elan 4 API added. Performance tuning is not yet complete, but the implementation is stable.
  - Support for GCC/UPC 3.3.2.9. Due to changes in the GCC/UPC interface, you must upgrade to this version of the runtime to use GCC/UPC 3.3.2.9 or later.
  - Improved the automated UPC test harness to support any UPC 1.1 compiler.
  - Numerous bugfixes.

2004-10-04 Berkeley UPC release 2.0.1
  - Fixed the GM/Myrinet network layer, which was broken in 2.0.0.
  - Fixed upc_trace and improved documentation.
  - Added documentation on running UPC over UDP networks.

2004-09-15 Berkeley UPC release 2.0
  - Full implementation of UPC collectives (as per version 1.0 of the UPC Collective specification)
  - MPI/C++/C/FORTRAN/UPC interoperability support added
  - Runtime now works with GCC UPC binary compiler (see INSTALL for more information)
  - Runtime memory allocator no longer splits shared memory 50/50 between upc_global_alloc/upc_all_alloc and upc_alloc: either type of allocation may use more than 50% of the shared region, as long as their combined total stays below 100%
  - Added '-network=udp', which runs UPC over UDP and is thus supported by any network with a standard TCP/IP stack (this is now the recommended layer for use on Ethernet hardware).
  - Added numerous Berkeley-specific extensions to the UPC memcpy libraries to support explicitly asynchronous and non-contiguous bulk data movement (see http://upc.lbl.gov/publications/upc_memcpy.pdf).
  - Experimental support for Dolphin SCI networks.
  - Ported to new platforms: Cray X1, AMD Athlon/Opteron, Sun Pro C, HP C
  - 'upc_trace' application added to allow profiling of UPC applications' network traffic
  - GM (Myrinet) network layer now supports pthreaded UPC applications
  - Added auto-detection of most network drivers
  - Fixes to ensure memory consistency behavior matches the current memory model proposal (due to be introduced in UPC spec 1.2)
  - Runtime interface has changed, so 1.x translators/runtimes will not work with their 2.x counterparts (i.e., if you have installed your own translator, you will need to upgrade it at the same time as your runtime).
  - Enhanced heuristics for detecting C-mode and UPC-mode headers when compiling for pthreads
  - Greatly expanded compiler automated test suite, which is now run nightly on many platforms
  - Many, many bugfixes, and much-improved stability/portability.

2003-11-14 UPC release 1.1.0
  - Adheres to UPC 1.1.1 specification
  - Added pthread support
  - Added support for VAPI/Infiniband networks
  - Added 'smp' (single process, no network) -network option
  - Added support for HTTP-based remote translation
  - More C compilers (Portland Group, Intel ecc) supported.
  - 'detect-upc' utility removes need to add #pragma to UPC .h files
  - upcrun improvements

2002-06-08
  -- Added an initial implementation of UPC barrier (for the trivial case of processes and the non-trivial case of pthreads)
  -- Added an initial implementation of UPC locks (using GASNet core AM calls)
  -- Both compile; testing their functionality requires the rest of the system

2002-06-07
  -- All shared ptr functions now working & tested for naive implementation.

2002-06-04
  -- Phaseless ptr arithmetic now working.

2002-06-02
  -- Phased shared ptr arithmetic now working for both positive and negative offsets, with test framework.

2002-05-15
  -- CVS tag VERSION_4: Merged changes from 0.4 of the runtime spec