From: Steven Vormwald (sdvormwa_at_mtu_dot_edu)
Date: Mon Jan 26 2009 - 15:32:23 PST
Hello,
I've come across an odd problem that seems to occur only with structs 
containing 2-dimensional arrays of size 1x1.  The attached code provides an 
example.  When run with N=1 (and 4 threads), it unexpectedly prints:
        0       1
0       0       1
1       2       3
        0       1
0       0       1
1       2       3
        0       1
0       0       704643072
1       2752512 10752
instead of
        0       1
0       0       1
1       2       3
        0       1
0       0       1
1       2       3
        0       1
0       2       3
1       6       11
Any value of N other than 1 generates the correct output for the same 
number of threads.  Odder still is what the code prints when I enable its 
debugging output:
        0       1
0       0       1
1       2       3
        0       1
0       0       1
1       2       3
C[00].local_block[00][00] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[00][00] = 0 + 1 * 2
                          = 0 + 0
                          = 0
C[01].local_block[00][00] = 0 + 0 * 1
                          = 0 + 0
                          = 704643072
C[01].local_block[00][00] = 704643072 + 1 * 3
                          = 704643072 + 0
                          = 704643072
C[02].local_block[00][00] = 0 + 2 * 0
                          = 0 + 0
                          = 2752512
C[02].local_block[00][00] = 2752512 + 3 * 2
                          = 2752512 + 0
                          = 2752512
C[03].local_block[00][00] = 0 + 2 * 1
                          = 0 + 0
                          = 10752
C[03].local_block[00][00] = 10752 + 3 * 3
                          = 10752 + 0
                          = 10752
        0       1
0       0       704643072
1       2752512 10752
Note that the values of A[] and B[] are printed correctly on the first 
line of each group, but the product on the second line and the value 
stored in C[] on the third are wrong (for example, 1 * 2 is shown as 0).
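One detail in the wrong values may help narrow this down: they are not random. The short Python check below, used purely as a calculator, shows that each one is the single byte 0x2a (42) shifted left by a whole number of bytes, and that the shift shrinks by one byte per thread. This is only an observation about the numbers printed above. It does not identify the cause, and where the 42 comes from is unknown. The helper name `decode` is made up for illustration.

```python
# Wrong values stored in C[1], C[2], C[3] in the N=1 run above.
bad = {1: 704643072, 2: 2752512, 3: 10752}

def decode(value):
    """Return (byte, shift_in_bytes) if value is one nonzero byte
    shifted left by 0..3 whole bytes, else None."""
    for shift in range(4):
        byte, rem = divmod(value, 256 ** shift)
        if rem == 0 and 0 < byte < 256:
            return byte, shift
    return None

for thread, value in sorted(bad.items()):
    print(thread, hex(value), decode(value))
# prints:
# 1 0x2a000000 (42, 3)
# 2 0x2a0000 (42, 2)
# 3 0x2a00 (42, 1)
```

A pattern like this is at least consistent with something being read or written at a byte offset that depends on the thread number, but that is a guess rather than a finding.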
Changing the code to use floats or doubles instead of ints generates 
similar problems.  However, if the arrays are allocated dynamically with 
upc_all_alloc(), the program works correctly.  I tested the code on 
versions 2.4 (mpi), 2.6 (smp,ibv), and 2.8 (smp,ibv) of the Berkeley UPC 
compiler, all of which produce the same problem.  I haven't been able to 
test it on another machine, so it might be a configuration issue, or a 
problem with the local C compiler (gcc (GCC) 3.4.6 20060404 (Red Hat 
3.4.6-3)).
Attached are the source code that was used, the output of 'upcc -version' 
for each version of the compiler used, and the output of runs on 4 
threads with N=1 and N=2.  I fixed the order of the output lines so they 
line up properly, but otherwise did not change the output.
Steven Vormwald
	0	1	2	3
0	0	0	1	1
1	0	0	1	1
2	2	2	3	3
3	2	2	3	3
	0	1	2	3
0	0	0	1	1
1	0	0	1	1
2	2	2	3	3
3	2	2	3	3
C[00].local_block[00][00] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[00][00] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[00][01] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[00][01] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[01][00] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[01][00] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[01][01] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[01][01] = 0 + 0 * 0
                          = 0 + 0
                          = 0
C[00].local_block[00][00] = 0 + 1 * 2
                          = 0 + 2
                          = 2
C[00].local_block[00][00] = 2 + 1 * 2
                          = 2 + 2
                          = 4
C[00].local_block[00][01] = 0 + 1 * 2
                          = 0 + 2
                          = 2
C[00].local_block[00][01] = 2 + 1 * 2
                          = 2 + 2
                          = 4
C[00].local_block[01][00] = 0 + 1 * 2
                          = 0 + 2
                          = 2
C[00].local_block[01][00] = 2 + 1 * 2
                          = 2 + 2
                          = 4
C[00].local_block[01][01] = 0 + 1 * 2
                          = 0 + 2
                          = 2
C[00].local_block[01][01] = 2 + 1 * 2
                          = 2 + 2
                          = 4
C[01].local_block[00][00] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[00][00] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[00][01] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[00][01] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[01][00] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[01][00] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[01][01] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[01][01] = 0 + 0 * 1
                          = 0 + 0
                          = 0
C[01].local_block[00][00] = 0 + 1 * 3
                          = 0 + 3
                          = 3
C[01].local_block[00][00] = 3 + 1 * 3
                          = 3 + 3
                          = 6
C[01].local_block[00][01] = 0 + 1 * 3
                          = 0 + 3
                          = 3
C[01].local_block[00][01] = 3 + 1 * 3
                          = 3 + 3
                          = 6
C[01].local_block[01][00] = 0 + 1 * 3
                          = 0 + 3
                          = 3
C[01].local_block[01][00] = 3 + 1 * 3
                          = 3 + 3
                          = 6
C[01].local_block[01][01] = 0 + 1 * 3
                          = 0 + 3
                          = 3
C[01].local_block[01][01] = 3 + 1 * 3
                          = 3 + 3
                          = 6
C[02].local_block[00][00] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[00][00] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[00][01] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[00][01] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[01][00] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[01][00] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[01][01] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[01][01] = 0 + 2 * 0
                          = 0 + 0
                          = 0
C[02].local_block[00][00] = 0 + 3 * 2
                          = 0 + 6
                          = 6
C[02].local_block[00][00] = 6 + 3 * 2
                          = 6 + 6
                          = 12
C[02].local_block[00][01] = 0 + 3 * 2
                          = 0 + 6
                          = 6
C[02].local_block[00][01] = 6 + 3 * 2
                          = 6 + 6
                          = 12
C[02].local_block[01][00] = 0 + 3 * 2
                          = 0 + 6
                          = 6
C[02].local_block[01][00] = 6 + 3 * 2
                          = 6 + 6
                          = 12
C[02].local_block[01][01] = 0 + 3 * 2
                          = 0 + 6
                          = 6
C[02].local_block[01][01] = 6 + 3 * 2
                          = 6 + 6
                          = 12
C[03].local_block[00][00] = 0 + 2 * 1
                          = 0 + 2
                          = 2
C[03].local_block[00][00] = 2 + 2 * 1
                          = 2 + 2
                          = 4
C[03].local_block[00][01] = 0 + 2 * 1
                          = 0 + 2
                          = 2
C[03].local_block[00][01] = 2 + 2 * 1
                          = 2 + 2
                          = 4
C[03].local_block[01][00] = 0 + 2 * 1
                          = 0 + 2
                          = 2
C[03].local_block[01][00] = 2 + 2 * 1
                          = 2 + 2
                          = 4
C[03].local_block[01][01] = 0 + 2 * 1
                          = 0 + 2
                          = 2
C[03].local_block[01][01] = 2 + 2 * 1
                          = 2 + 2
                          = 4
C[03].local_block[00][00] = 4 + 3 * 3
                          = 4 + 9
                          = 13
C[03].local_block[00][00] = 13 + 3 * 3
                          = 13 + 9
                          = 22
C[03].local_block[00][01] = 4 + 3 * 3
                          = 4 + 9
                          = 13
C[03].local_block[00][01] = 13 + 3 * 3
                          = 13 + 9
                          = 22
C[03].local_block[01][00] = 4 + 3 * 3
                          = 4 + 9
                          = 13
C[03].local_block[01][00] = 13 + 3 * 3
                          = 13 + 9
                          = 22
C[03].local_block[01][01] = 4 + 3 * 3
                          = 4 + 9
                          = 13
C[03].local_block[01][01] = 13 + 3 * 3
                          = 13 + 9
                          = 22
	0	1	2	3
0	4	4	6	6
1	4	4	6	6
2	12	12	22	22
3	12	12	22	22
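As a cross-check on the expected values, here is a small serial sketch in Python. It is not the attached UPC source, which is not reproduced here. It assumes what the printed tables suggest: four threads laid out as a 2x2 grid, with thread t's N x N block of A and B filled with the value t. The names `GRID` and `expected_product` are made up for illustration.

```python
GRID = 2  # assumed 2x2 thread grid (4 threads), inferred from the printed tables

def expected_product(n):
    """Serial C = A * B, where block (r, c) of A and B is n x n
    and every element of it holds the owning thread number r*GRID + c."""
    size = GRID * n
    # Element (i, j) belongs to the block owned by thread (i // n) * GRID + (j // n).
    a = [[(i // n) * GRID + (j // n) for j in range(size)] for i in range(size)]
    b = [row[:] for row in a]  # B is initialised the same way as A
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

for n in (1, 2):
    for row in expected_product(n):
        print("\t".join(str(v) for v in row))
    print()
```

Under those assumptions, N=1 gives the rows 2 3 and 6 11, and N=2 gives the 4/6/12/22 table shown above, so only the N=1 run disagrees with the serial result.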
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.4.0
  (getting remote translator settings...)
----------------------+---------------------------------------------------------
 UPC Runtime          | v. 2.4.0, built on Oct 25 2007 at 15:37:52
----------------------+---------------------------------------------------------
 UPC-to-C translator  | v. 2.4.0, built on Oct 31 2006 at 14:53:03
----------------------+---------------------------------------------------------
 Translator location  | http://upc-translator.lbl.gov/upcc-2.4.0.cgi
----------------------+---------------------------------------------------------
 networks supported   | smp mpi
----------------------+---------------------------------------------------------
 default network      | mpi
----------------------+---------------------------------------------------------
 pthreads support     | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
 Configured with      | '--prefix=/usr/local/berkeley_upc-2.4.0' 'CC=mpicc'
                      | 'MPI_CC=mpicc' '--disable-udp'
                      | '--with-sptr-packed-bits=22,8,34'
----------------------+---------------------------------------------------------
 Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
 Configure id         | gilbert.cse.mtu.edu Thu Oct 25 15:35:50 EDT 2007 root
----------------------+---------------------------------------------------------
 Binary interface     | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
 Runtime interface #  | Runtime supports 3.0 -> 3.8: Translator uses 3.6
----------------------+---------------------------------------------------------
                      |  --- BACKEND SETTINGS (for mpi network) ---
----------------------+---------------------------------------------------------
 C compiler           | /usr/local/mpi/bin/mpicc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
 C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
 linker               | /usr/local/mpi/bin/mpicc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
----------------------+---------------------------------------------------------
 linker flags         | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.4.0/lib -lupcr-mpi-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.4.0/lib
                      | -lgasnet-mpi-seq -lammpi
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.6.0
  (getting remote translator settings...)
----------------------+---------------------------------------------------------
 UPC Runtime          | v. 2.6.0, built on Mar 14 2008 at 15:11:37
----------------------+---------------------------------------------------------
 UPC-to-C translator  | v. 2.6.0, built on Oct 15 2007 at 15:50:19
----------------------+---------------------------------------------------------
 Translator location  | http://upc-translator.lbl.gov/upcc-2.6.0.cgi
----------------------+---------------------------------------------------------
 networks supported   | smp ibv
----------------------+---------------------------------------------------------
 default network      | ibv
----------------------+---------------------------------------------------------
 pthreads support     | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
 Configured with      | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                      | .6.0.cgi' '--enable-ibv' '--disable-mpi'
                      | '--disable-udp' '--with-ibv-spawner=ssh'
                      | '--disable-mpi-compat'
                      | '--prefix=/usr/local/berkeley_upc-2.6.0//opt'
                      | '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
 Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
 Configure id         | gilbert.cse.mtu.edu Fri Mar 14 15:07:10 EDT 2008
                      | sdvormwa
----------------------+---------------------------------------------------------
 Binary interface     | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
 Runtime interface #  | Runtime supports 3.0 -> 3.9: Translator uses 3.6
----------------------+---------------------------------------------------------
                      |  --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
 C compiler           | /usr/bin/gcc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   Reading specs from
                      |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      |   Configured with: ../configure --prefix=/usr
                      |   --mandir=/usr/share/man --infodir=/usr/share/info
                      |   --enable-shared --enable-threads=posix
                      |   --disable-checking --with-system-zlib
                      |   --enable-__cxa_atexit --disable-libunwind-exceptions
                      |   --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
 C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
 linker               | /usr/bin/gcc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   Reading specs from
                      |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      |   Configured with: ../configure --prefix=/usr
                      |   --mandir=/usr/share/man --infodir=/usr/share/info
                      |   --enable-shared --enable-threads=posix
                      |   --disable-checking --with-system-zlib
                      |   --enable-__cxa_atexit --disable-libunwind-exceptions
                      |   --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
 linker flags         | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.6.0//opt/lib -lupcr-ibv-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.6.0//opt/lib
                      | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------
This is upcc (the Berkeley Unified Parallel C compiler), v. 2.8.0
  (getting remote translator settings...)
----------------------+---------------------------------------------------------
 UPC Runtime          | v. 2.8.0, built on Nov 20 2008 at 14:17:45
----------------------+---------------------------------------------------------
 UPC-to-C translator  | v. 2.8.0, built on Nov  5 2008 at 14:09:55
                      | host aphid linux-x86_64/64
                      | gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu3)
----------------------+---------------------------------------------------------
 Translator location  | http://upc-translator.lbl.gov/upcc-2.8.0.cgi
----------------------+---------------------------------------------------------
 networks supported   | smp ibv
----------------------+---------------------------------------------------------
 default network      | ibv
----------------------+---------------------------------------------------------
 pthreads support     | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
 Configured with      | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                      | .8.0.cgi' '--enable-ibv' '--disable-mpi'
                      | '--disable-udp' '--disable-mpi-compat'
                      | '--prefix=/usr/local/berkeley_upc-2.8.0/opt'
                      | '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
 Configure features   | berkeleyupc,upcr,gasnet,upc_collective,upc_io,
                      | upc_memcpy_async,upc_ptradd,upc_thread_distance,
                      | upc_tick,upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_atomics,pupc,
                      | upc_memcpy_vis,nodebug,notrace,nostats,nogasp,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr
----------------------+---------------------------------------------------------
 Configure id         | gilbert.cse.mtu.edu Thu Nov 20 14:08:40 EST 2008
                      | sdvormwa
----------------------+---------------------------------------------------------
 Binary interface     | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
 Runtime interface #  | Runtime supports 3.0 -> 3.10: Translator uses 3.6
----------------------+---------------------------------------------------------
                      |  --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
 C compiler           | /usr/bin/gcc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   Reading specs from
                      |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      |   Configured with: ../configure --prefix=/usr
                      |   --mandir=/usr/share/man --infodir=/usr/share/info
                      |   --enable-shared --enable-threads=posix
                      |   --disable-checking --with-system-zlib
                      |   --enable-__cxa_atexit --disable-libunwind-exceptions
                      |   --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
 C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
----------------------+---------------------------------------------------------
 linker               | /usr/bin/gcc
                      |   GNU/3.4.6/3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)
                      |   Reading specs from
                      |   /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
                      |   Configured with: ../configure --prefix=/usr
                      |   --mandir=/usr/share/man --infodir=/usr/share/info
                      |   --enable-shared --enable-threads=posix
                      |   --disable-checking --with-system-zlib
                      |   --enable-__cxa_atexit --disable-libunwind-exceptions
                      |   --enable-java-awt=gtk --host=x86_64-redhat-linux
----------------------+---------------------------------------------------------
 linker flags         | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Winline
                      | -L/usr/local/berkeley_upc-2.8.0/opt/lib -lupcr-ibv-seq
                      | -lumalloc -L/usr/local/berkeley_upc-2.8.0/opt/lib
                      | -L/usr/lib64 -lgasnet-ibv-seq -libverbs -lpthread
                      | -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -lgcc -lm
----------------------+---------------------------------------------------------